Forum Discussion
Finding differences in semi-large, evolving dataset
- Feb 05, 2024
datallama I've written this macro. It gives you a new worksheet with
all data, filtered for changes. Start the macro with a cell in your source data selected:
Option Explicit Sub OutputDifferences() Dim src As Range Dim rowCt As Long Dim colCt As Long Dim data As Variant Dim modifiedData() As Variant Dim curId As String Const IDCol As Long = 1 Const dateCol As Long = 2 On Error Resume Next Set src=Application.InputBox("Please select the data area", "Data for update analysis", ActiveCell.CurrentRegion.Address(False, False), , , , , 8) If src Is Nothing Then Exit Sub src.Parent.Sort.SortFields.Clear src.Parent.Sort.SortFields.Add2 Key:=Intersect(src, src.Parent.Range("A:A")) _ , SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal src.Parent.Sort.SortFields.Add2 Key:=Intersect(src, src.Parent.Range("B:B")) _ , SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal With src.Parent.Sort .SetRange Range("A1:R13") .Header = xlYes .MatchCase = False .Orientation = xlTopToBottom .SortMethod = xlPinYin .Apply End With data = src.Value2 ReDim modifiedData(LBound(data, 1) To UBound(data, 1), LBound(data, 2) To UBound(data, 2)) For colCt = LBound(data, 2) To UBound(data, 2) modifiedData(1, colCt) = data(1, colCt) If colCt < 3 Then For rowCt = LBound(data, 1) + 1 To UBound(data, 1) modifiedData(rowCt, colCt) = data(rowCt, colCt) Next End If Next For rowCt = LBound(data, 1) + 1 To UBound(data, 1) If rowCt = 2 Then curId = data(rowCt, 1) Else If curId = data(rowCt, 1) Then modifiedData(rowCt, 3) = data(rowCt, 3) modifiedData(rowCt, 4) = data(rowCt, 4) modifiedData(rowCt, 5) = data(rowCt, 5) For colCt = LBound(data, 2) + 2 To UBound(data, 2) 'record fullname, first name and last name 'detect changes in all but the first two columns If data(rowCt, colCt) <> data(rowCt - 1, colCt) Then 'Found a difference, write to array of differences modifiedData(rowCt, colCt) = data(rowCt, colCt) & " --> " & data(rowCt - 1, colCt) End If Next Else curId = data(rowCt, 1) End If End If Next With Worksheets.Add With .Range("A1").Resize(UBound(data, 1), UBound(data, 2)) .Value = modifiedData With .Offset(, UBound(data, 2)).Resize(, 1) .FormulaR1C1 = "=COUNTIF(RC[-" & UBound(data, 2) - 2 & "]:RC[-1],""* --> *"")" .Cells(1, 1).Value = "Changes" End With src.Copy .CurrentRegion.PasteSpecial xlPasteFormats .CurrentRegion.PasteSpecial xlPasteColumnWidths .CurrentRegion.AutoFilter UBound(data, 2) + 1, ">0" End With End With End Sub
Sure, here it is:
So this would be the 'master source table' on sheet1, data merged from other sources. This master sheet is thousands of rows, continuously expanded so it will reach the 100000s sooner rather than later.
| Employee # | Date added | Employee full name | First name | Last name | Country | Region | Location | Department | Division | Work Email | Supervisor name | Hire date | Manager? | Salary | Currency | Type | Status |
| 12345 | 1/31/2023 | John Smith | John | Smith | USA | AMER | Memphis | Sales | DIV1 | mailto:jsmith@acmecorp | Kristine Manager | 5/1/1999 | No | 40000 | USD | Full Time | Active |
| 12346 | 1/31/2023 | Jane Doe | Jane | Doe | USA | AMER | Chicago | Tech | DIV1 | mailto:jdoe@acmecorp | George Supervisor | 2/15/2006 | No | 50000 | USD | Full Time | Active |
| 12347 | 1/31/2023 | William Smith | William | Smith | UK | EMEA | Liverpool | Legal | DIV1 | mailto:wsmith@acmecorp | George Supervisor | 9/23/2014 | No | 45000 | GBP | Full Time | Active |
| 12348 | 1/31/2023 | Emma Doe | Emma | Doe | Australia | APAC | Canberra | Sales | DIV2 | mailto:edoe@acmecorp | Kristine Manager | 4/3/2019 | No | 60000 | AUD | Full Time | Active |
| 12345 | 2/28/2023 | John Smith | John | Smith | USA | AMER | Memphis | Sales | DIV1 | mailto:jsmith@acmecorp | Kristine Manager | 5/1/1999 | No | 40000 | USD | Full Time | Active |
| 12346 | 2/28/2023 | Jane Doe | Jane | Doe | USA | AMER | Chicago | Tech | DIV1 | mailto:jdoe@acmecorp | George Supervisor | 2/15/2006 | No | 50000 | USD | Full Time | Active |
| 12347 | 2/28/2023 | William Smith | William | Smith | UK | EMEA | Liverpool | Legal | DIV1 | mailto:wsmith@acmecorp | George Supervisor | 9/23/2014 | No | 45000 | GBP | Full Time | Active |
| 12348 | 2/28/2023 | Emma Doe | Emma | Doe | Australia | APAC | Canberra | Sales | DIV2 | mailto:edoe@acmecorp | Kristine Manager | 4/3/2019 | No | 60000 | AUD | Full Time | Active |
| 12345 | 3/31/2023 | John Smith | John | Smith | USA | AMER | Memphis | Sales | DIV1 | mailto:jsmith@acmecorp | Kristine Manager | 5/1/1999 | No | 40000 | USD | Full Time | Active |
| 12346 | 3/31/2023 | Jane Doe | Jane | Doe | UK | AMER | Manchester | Tech | DIV1 | mailto:jdoe@acmecorp | George Supervisor | 2/15/2006 | No | 65000 | GBP | Full Time | Active |
| 12347 | 3/31/2023 | William Smith | William | Smith | UK | EMEA | Liverpool | Tech | DIV1 | mailto:wsmith@acmecorp | George Supervisor | 9/23/2014 | No | 45000 | GBP | Full Time | Active |
| 12348 | 3/31/2023 | Emma Doe | Emma | Doe | Australia | APAC | Canberra | Sales | DIV2 | mailto:edoe@acmecorp | Kristine Manager | 4/3/2019 | No | 60000 | AUD | Full Time | Active |
Then out ot the 20-something columns above, the below are 'filtered' to 2 pivot tables on sheet2:
| Filter | Date added 1 (eg. 31/01/2023) | |||||
| Values (DAX measures) | ||||||
| Row Labels (ID) | Name | Dept | Reporting to | Country | Salary in | Salary |
| 12345 | John Smith | Sales | Kristine Manager | USA | USD | 40000 |
| 12346 | Jane Doe | Tech | George Supervisor | USA | USD | 50000 |
| 12347 | William Smith | Legal | George Supervisor | UK | GBP | 45000 |
| 12348 | Emma Doe | Sales | Kristine Manager | Australia | AUD | 60000 |
This pivot is duplicated so that you can filter by entry date. You would select 2 months and the point would be to check if there are differences.
Next, the IDs from the first pivot would be copied with a VBA to sheet3 to a table, so the formulas autofill.
| ID | Changes | Transfer | Salary | Currency | Manager | Department | Country | Full name |
| 12345 | Yes/"" | Yes/"" | IF(VLOOKUP([@ID],'Sheet2'!A:F,6,FALSE)=VLOOKUP([@ID],'Sheet2'!I:N,6,FALSE),"",VLOOKUP([@ID],'Sheet2'!A:N,6,FALSE)) | |||||
| 12346 | ||||||||
| 12347 | ||||||||
| 12348 |
There, the above formula is in the columns from salary to full name, for about 3-400 rows. Change and transfer are just IF formulas checking if there's anything to the right.
The logic how it works is to select eg. Jan 2023 and Mar 2023 on Sheet2 for the pivot tables, then run the VBA for the accurate list of IDs, and by that time the formulas already found that Jane Doe who moved from the US to the UK, so by her ID, in the country column the UK is displayed and there is a YES in the change column so we can filter to only see the changes.
- JKPieterseFeb 05, 2024Silver Contributor
datallama I've written this macro. It gives you a new worksheet with
all data, filtered for changes. Start the macro with a cell in your source data selected:
Option Explicit Sub OutputDifferences() Dim src As Range Dim rowCt As Long Dim colCt As Long Dim data As Variant Dim modifiedData() As Variant Dim curId As String Const IDCol As Long = 1 Const dateCol As Long = 2 On Error Resume Next Set src=Application.InputBox("Please select the data area", "Data for update analysis", ActiveCell.CurrentRegion.Address(False, False), , , , , 8) If src Is Nothing Then Exit Sub src.Parent.Sort.SortFields.Clear src.Parent.Sort.SortFields.Add2 Key:=Intersect(src, src.Parent.Range("A:A")) _ , SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal src.Parent.Sort.SortFields.Add2 Key:=Intersect(src, src.Parent.Range("B:B")) _ , SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal With src.Parent.Sort .SetRange Range("A1:R13") .Header = xlYes .MatchCase = False .Orientation = xlTopToBottom .SortMethod = xlPinYin .Apply End With data = src.Value2 ReDim modifiedData(LBound(data, 1) To UBound(data, 1), LBound(data, 2) To UBound(data, 2)) For colCt = LBound(data, 2) To UBound(data, 2) modifiedData(1, colCt) = data(1, colCt) If colCt < 3 Then For rowCt = LBound(data, 1) + 1 To UBound(data, 1) modifiedData(rowCt, colCt) = data(rowCt, colCt) Next End If Next For rowCt = LBound(data, 1) + 1 To UBound(data, 1) If rowCt = 2 Then curId = data(rowCt, 1) Else If curId = data(rowCt, 1) Then modifiedData(rowCt, 3) = data(rowCt, 3) modifiedData(rowCt, 4) = data(rowCt, 4) modifiedData(rowCt, 5) = data(rowCt, 5) For colCt = LBound(data, 2) + 2 To UBound(data, 2) 'record fullname, first name and last name 'detect changes in all but the first two columns If data(rowCt, colCt) <> data(rowCt - 1, colCt) Then 'Found a difference, write to array of differences modifiedData(rowCt, colCt) = data(rowCt, colCt) & " --> " & data(rowCt - 1, colCt) End If Next Else curId = data(rowCt, 1) End If End If Next With Worksheets.Add With .Range("A1").Resize(UBound(data, 1), UBound(data, 2)) .Value = modifiedData With .Offset(, UBound(data, 2)).Resize(, 1) .FormulaR1C1 = "=COUNTIF(RC[-" & UBound(data, 2) - 2 & "]:RC[-1],""* --> *"")" .Cells(1, 1).Value = "Changes" End With src.Copy .CurrentRegion.PasteSpecial xlPasteFormats .CurrentRegion.PasteSpecial xlPasteColumnWidths .CurrentRegion.AutoFilter UBound(data, 2) + 1, ">0" End With End With End Sub- datallamaFeb 07, 2024Copper Contributor
Thank you! I tried running it, selecting the master source table as the data area. I also changed some details (country, region and city) for 'Emma Doe' to test it. The macro runs without an issue - but it doesn't return anything. When I disable the filter in the change column, it returns every line - as expected considering it didn't detect changes - with everything but the 'ID' and 'Date' columns cleared.
Am I trying to use it wrong?- JKPieterseFeb 07, 2024Silver Contributor
datallama See attached workbook.