Forum Discussion
Finding differences in semi-large, evolving dataset
- Feb 05, 2024
datallama I've written this macro. It gives you a new worksheet with
all data, filtered for changes. Start the macro with a cell in your source data selected:
Option Explicit Sub OutputDifferences() Dim src As Range Dim rowCt As Long Dim colCt As Long Dim data As Variant Dim modifiedData() As Variant Dim curId As String Const IDCol As Long = 1 Const dateCol As Long = 2 On Error Resume Next Set src=Application.InputBox("Please select the data area", "Data for update analysis", ActiveCell.CurrentRegion.Address(False, False), , , , , 8) If src Is Nothing Then Exit Sub src.Parent.Sort.SortFields.Clear src.Parent.Sort.SortFields.Add2 Key:=Intersect(src, src.Parent.Range("A:A")) _ , SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal src.Parent.Sort.SortFields.Add2 Key:=Intersect(src, src.Parent.Range("B:B")) _ , SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal With src.Parent.Sort .SetRange Range("A1:R13") .Header = xlYes .MatchCase = False .Orientation = xlTopToBottom .SortMethod = xlPinYin .Apply End With data = src.Value2 ReDim modifiedData(LBound(data, 1) To UBound(data, 1), LBound(data, 2) To UBound(data, 2)) For colCt = LBound(data, 2) To UBound(data, 2) modifiedData(1, colCt) = data(1, colCt) If colCt < 3 Then For rowCt = LBound(data, 1) + 1 To UBound(data, 1) modifiedData(rowCt, colCt) = data(rowCt, colCt) Next End If Next For rowCt = LBound(data, 1) + 1 To UBound(data, 1) If rowCt = 2 Then curId = data(rowCt, 1) Else If curId = data(rowCt, 1) Then modifiedData(rowCt, 3) = data(rowCt, 3) modifiedData(rowCt, 4) = data(rowCt, 4) modifiedData(rowCt, 5) = data(rowCt, 5) For colCt = LBound(data, 2) + 2 To UBound(data, 2) 'record fullname, first name and last name 'detect changes in all but the first two columns If data(rowCt, colCt) <> data(rowCt - 1, colCt) Then 'Found a difference, write to array of differences modifiedData(rowCt, colCt) = data(rowCt, colCt) & " --> " & data(rowCt - 1, colCt) End If Next Else curId = data(rowCt, 1) End If End If Next With Worksheets.Add With .Range("A1").Resize(UBound(data, 1), UBound(data, 2)) .Value = modifiedData With .Offset(, UBound(data, 2)).Resize(, 1) .FormulaR1C1 = "=COUNTIF(RC[-" & UBound(data, 2) - 2 & "]:RC[-1],""* --> *"")" .Cells(1, 1).Value = "Changes" End With src.Copy .CurrentRegion.PasteSpecial xlPasteFormats .CurrentRegion.PasteSpecial xlPasteColumnWidths .CurrentRegion.AutoFilter UBound(data, 2) + 1, ">0" End With End With End Sub
- datallamaFeb 05, 2024Copper Contributor
Sure, here it is:
So this would be the 'master source table' on sheet1, data merged from other sources. This master sheet is thousands of rows, continuously expanded so it will reach the 100000s sooner rather than later.
Employee # Date added Employee full name First name Last name Country Region Location Department Division Work Email Supervisor name Hire date Manager? Salary Currency Type Status 12345 1/31/2023 John Smith John Smith USA AMER Memphis Sales DIV1 jsmith@acmecorp Kristine Manager 5/1/1999 No 40000 USD Full Time Active 12346 1/31/2023 Jane Doe Jane Doe USA AMER Chicago Tech DIV1 jdoe@acmecorp George Supervisor 2/15/2006 No 50000 USD Full Time Active 12347 1/31/2023 William Smith William Smith UK EMEA Liverpool Legal DIV1 wsmith@acmecorp George Supervisor 9/23/2014 No 45000 GBP Full Time Active 12348 1/31/2023 Emma Doe Emma Doe Australia APAC Canberra Sales DIV2 edoe@acmecorp Kristine Manager 4/3/2019 No 60000 AUD Full Time Active 12345 2/28/2023 John Smith John Smith USA AMER Memphis Sales DIV1 jsmith@acmecorp Kristine Manager 5/1/1999 No 40000 USD Full Time Active 12346 2/28/2023 Jane Doe Jane Doe USA AMER Chicago Tech DIV1 jdoe@acmecorp George Supervisor 2/15/2006 No 50000 USD Full Time Active 12347 2/28/2023 William Smith William Smith UK EMEA Liverpool Legal DIV1 wsmith@acmecorp George Supervisor 9/23/2014 No 45000 GBP Full Time Active 12348 2/28/2023 Emma Doe Emma Doe Australia APAC Canberra Sales DIV2 edoe@acmecorp Kristine Manager 4/3/2019 No 60000 AUD Full Time Active 12345 3/31/2023 John Smith John Smith USA AMER Memphis Sales DIV1 jsmith@acmecorp Kristine Manager 5/1/1999 No 40000 USD Full Time Active 12346 3/31/2023 Jane Doe Jane Doe UK AMER Manchester Tech DIV1 jdoe@acmecorp George Supervisor 2/15/2006 No 65000 GBP Full Time Active 12347 3/31/2023 William Smith William Smith UK EMEA Liverpool Tech DIV1 wsmith@acmecorp George Supervisor 9/23/2014 No 45000 GBP Full Time Active 12348 3/31/2023 Emma Doe Emma Doe Australia APAC Canberra Sales DIV2 edoe@acmecorp Kristine Manager 4/3/2019 No 60000 AUD Full Time Active Then out ot the 20-something columns above, the below are 'filtered' to 2 pivot tables on sheet2:
Filter Date added 1 (eg. 31/01/2023) Values (DAX measures) Row Labels (ID) Name Dept Reporting to Country Salary in Salary 12345 John Smith Sales Kristine Manager USA USD 40000 12346 Jane Doe Tech George Supervisor USA USD 50000 12347 William Smith Legal George Supervisor UK GBP 45000 12348 Emma Doe Sales Kristine Manager Australia AUD 60000 This pivot is duplicated so that you can filter by entry date. You would select 2 months and the point would be to check if there are differences.
Next, the IDs from the first pivot would be copied with a VBA to sheet3 to a table, so the formulas autofill.
ID Changes Transfer Salary Currency Manager Department Country Full name 12345 Yes/"" Yes/"" IF(VLOOKUP([@ID],'Sheet2'!A:F,6,FALSE)=VLOOKUP([@ID],'Sheet2'!I:N,6,FALSE),"",VLOOKUP([@ID],'Sheet2'!A:N,6,FALSE)) 12346 12347 12348 There, the above formula is in the columns from salary to full name, for about 3-400 rows. Change and transfer are just IF formulas checking if there's anything to the right.
The logic how it works is to select eg. Jan 2023 and Mar 2023 on Sheet2 for the pivot tables, then run the VBA for the accurate list of IDs, and by that time the formulas already found that Jane Doe who moved from the US to the UK, so by her ID, in the country column the UK is displayed and there is a YES in the change column so we can filter to only see the changes.
- JKPieterseFeb 05, 2024Silver Contributor
datallama I've written this macro. It gives you a new worksheet with
all data, filtered for changes. Start the macro with a cell in your source data selected:
Option Explicit Sub OutputDifferences() Dim src As Range Dim rowCt As Long Dim colCt As Long Dim data As Variant Dim modifiedData() As Variant Dim curId As String Const IDCol As Long = 1 Const dateCol As Long = 2 On Error Resume Next Set src=Application.InputBox("Please select the data area", "Data for update analysis", ActiveCell.CurrentRegion.Address(False, False), , , , , 8) If src Is Nothing Then Exit Sub src.Parent.Sort.SortFields.Clear src.Parent.Sort.SortFields.Add2 Key:=Intersect(src, src.Parent.Range("A:A")) _ , SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal src.Parent.Sort.SortFields.Add2 Key:=Intersect(src, src.Parent.Range("B:B")) _ , SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal With src.Parent.Sort .SetRange Range("A1:R13") .Header = xlYes .MatchCase = False .Orientation = xlTopToBottom .SortMethod = xlPinYin .Apply End With data = src.Value2 ReDim modifiedData(LBound(data, 1) To UBound(data, 1), LBound(data, 2) To UBound(data, 2)) For colCt = LBound(data, 2) To UBound(data, 2) modifiedData(1, colCt) = data(1, colCt) If colCt < 3 Then For rowCt = LBound(data, 1) + 1 To UBound(data, 1) modifiedData(rowCt, colCt) = data(rowCt, colCt) Next End If Next For rowCt = LBound(data, 1) + 1 To UBound(data, 1) If rowCt = 2 Then curId = data(rowCt, 1) Else If curId = data(rowCt, 1) Then modifiedData(rowCt, 3) = data(rowCt, 3) modifiedData(rowCt, 4) = data(rowCt, 4) modifiedData(rowCt, 5) = data(rowCt, 5) For colCt = LBound(data, 2) + 2 To UBound(data, 2) 'record fullname, first name and last name 'detect changes in all but the first two columns If data(rowCt, colCt) <> data(rowCt - 1, colCt) Then 'Found a difference, write to array of differences modifiedData(rowCt, colCt) = data(rowCt, colCt) & " --> " & data(rowCt - 1, colCt) End If Next Else curId = data(rowCt, 1) End If End If Next With Worksheets.Add With .Range("A1").Resize(UBound(data, 1), UBound(data, 2)) .Value = modifiedData With .Offset(, UBound(data, 2)).Resize(, 1) .FormulaR1C1 = "=COUNTIF(RC[-" & UBound(data, 2) - 2 & "]:RC[-1],""* --> *"")" .Cells(1, 1).Value = "Changes" End With src.Copy .CurrentRegion.PasteSpecial xlPasteFormats .CurrentRegion.PasteSpecial xlPasteColumnWidths .CurrentRegion.AutoFilter UBound(data, 2) + 1, ">0" End With End With End Sub
- datallamaFeb 07, 2024Copper Contributor
Thank you! I tried running it, selecting the master source table as the data area. I also changed some details (country, region and city) for 'Emma Doe' to test it. The macro runs without an issue - but it doesn't return anything. When I disable the filter in the change column, it returns every line - as expected considering it didn't detect changes - with everything but the 'ID' and 'Date' columns cleared.
Am I trying to use it wrong?