SOLVED

Finding differences in semi-large, evolving dataset

Copper Contributor

Hi,

 

I want to find a more elegant and perhaps less reesource intensive solution for a structure I have:

 

I have an evolving dataset, with new additions every month. This is pulled together from multiple sources via queries (due to access limits on the entry side). There are unique IDs in the list, plus the date added as a part of the primary key. One unique set is a couple hundred rows.

 

Next, I have 2 pivot tables set up with date added as a filter, to be able to see the differences between 2 entries. This is less important, serves only as an intermediate step.

 

Then, I have a VBA set up to copy the IDs from one of the pivot tables into a table on a different sheet (clearing the previous data and resizing the table to match the new amount of rows), where Vlookups embedded in IF formulas check if select data changed between the entries.

 

More specifically:

In the pivot tables, next to the IDs are listed things like country, address, name, etc. This is 'duplicated' to the other pivot, then the IF is used to check if for example the country for the given ID  in the first pivot (for date 1) matches the country in the 2nd pivot (for date 2), then return either nothing if they match or the new value if the don't. Then another formula checks if any of the other columns return a nonblank value so that it's possible to easily filter and display only the IDs where something changed (the rest without changes are not relevant in this case).

 

There are 6 columns for this formula now, with the possibility of having a couple more (so ~10-12 maximum). With the VBA + pivots, there are only these 3-400 lines at any given time that are pulled out of the larger merged dataset, and only these are used for the formulas as well.

 

I'm worried about exceeding the capabilities of Excel, but I can't have this elsewhere because I need to share it with other people in my org, and it needs to be in one place.

 

My question is, is there a more elegant, possibly less resource intensive solution to getting to the same result?

6 Replies
Is it possible to show some -made up- sample records?

@Jan Karel Pieterse 

Sure, here it is:

 

So this would be the 'master source table' on sheet1, data merged from other sources. This master sheet is thousands of rows, continuously expanded so it will reach the 100000s sooner rather than later.

 

Employee  #Date addedEmployee full nameFirst nameLast nameCountryRegionLocationDepartmentDivisionWork EmailSupervisor nameHire dateManager?SalaryCurrencyTypeStatus
123451/31/2023John SmithJohnSmithUSAAMERMemphisSalesDIV1jsmith@acmecorpKristine Manager5/1/1999No40000USDFull TimeActive
123461/31/2023Jane DoeJaneDoeUSAAMERChicagoTechDIV1jdoe@acmecorpGeorge Supervisor2/15/2006No50000USDFull TimeActive
123471/31/2023William SmithWilliamSmithUKEMEALiverpoolLegalDIV1wsmith@acmecorpGeorge Supervisor9/23/2014No45000GBPFull TimeActive
123481/31/2023Emma DoeEmmaDoeAustraliaAPACCanberraSalesDIV2edoe@acmecorpKristine Manager4/3/2019No60000AUDFull TimeActive
123452/28/2023John SmithJohnSmithUSAAMERMemphisSalesDIV1jsmith@acmecorpKristine Manager5/1/1999No40000USDFull TimeActive
123462/28/2023Jane DoeJaneDoeUSAAMERChicagoTechDIV1jdoe@acmecorpGeorge Supervisor2/15/2006No50000USDFull TimeActive
123472/28/2023William SmithWilliamSmithUKEMEALiverpoolLegalDIV1wsmith@acmecorpGeorge Supervisor9/23/2014No45000GBPFull TimeActive
123482/28/2023Emma DoeEmmaDoeAustraliaAPACCanberraSalesDIV2edoe@acmecorpKristine Manager4/3/2019No60000AUDFull TimeActive
123453/31/2023John SmithJohnSmithUSAAMERMemphisSalesDIV1jsmith@acmecorpKristine Manager5/1/1999No40000USDFull TimeActive
123463/31/2023Jane DoeJaneDoeUKAMERManchesterTechDIV1jdoe@acmecorpGeorge Supervisor2/15/2006No65000GBPFull TimeActive
123473/31/2023William SmithWilliamSmithUKEMEALiverpoolTechDIV1wsmith@acmecorpGeorge Supervisor9/23/2014No45000GBPFull TimeActive
123483/31/2023Emma DoeEmmaDoeAustraliaAPACCanberraSalesDIV2edoe@acmecorpKristine Manager4/3/2019No60000AUDFull TimeActive

 

Then out ot the 20-something columns above, the below are 'filtered' to 2 pivot tables on sheet2:

 

FilterDate added 1 (eg. 31/01/2023)     
 Values (DAX measures)     
Row Labels (ID)NameDeptReporting toCountrySalary inSalary
12345John SmithSalesKristine ManagerUSAUSD40000
12346Jane DoeTechGeorge SupervisorUSAUSD50000
12347William SmithLegalGeorge SupervisorUKGBP45000
12348Emma DoeSalesKristine ManagerAustraliaAUD60000

 

This pivot is duplicated so that you can filter by entry date. You would select 2 months and the point would be to check if there are differences.

 

Next, the IDs from the first pivot would be copied with a VBA to sheet3 to a table, so the formulas autofill.

 

IDChangesTransferSalaryCurrencyManagerDepartmentCountryFull name
12345Yes/""Yes/""IF(VLOOKUP([@ID],'Sheet2'!A:F,6,FALSE)=VLOOKUP([@ID],'Sheet2'!I:N,6,FALSE),"",VLOOKUP([@ID],'Sheet2'!A:N,6,FALSE))
12346        
12347        
12348        

 

There, the above formula is in the columns from salary to full name, for about 3-400 rows. Change and transfer are just IF formulas checking if there's anything to the right.

The logic how it works is to select eg. Jan 2023 and Mar 2023 on Sheet2 for the pivot tables, then run the VBA for the accurate list of IDs, and by that time the formulas already found that Jane Doe who moved from the US to the UK, so by her ID, in the country column the UK is displayed and there is a YES in the change column so we can filter to only see the changes.

 

 

best response confirmed by datallama (Copper Contributor)
Solution

@datallama I've written this macro. It gives you a new worksheet with

all data, filtered for changes. Start the macro with a cell in your source data selected:

 

Option Explicit

Sub OutputDifferences()
    Dim src As Range
    Dim rowCt As Long
    Dim colCt As Long
    Dim data As Variant
    Dim modifiedData() As Variant
    Dim curId As String
    Const IDCol As Long = 1
    Const dateCol As Long = 2

    On Error Resume Next
    Set src=Application.InputBox("Please select the data area", "Data for update analysis", ActiveCell.CurrentRegion.Address(False, False), , , , , 8)
    If src Is Nothing Then Exit Sub
    src.Parent.Sort.SortFields.Clear
    src.Parent.Sort.SortFields.Add2 Key:=Intersect(src, src.Parent.Range("A:A")) _
                                         , SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
    src.Parent.Sort.SortFields.Add2 Key:=Intersect(src, src.Parent.Range("B:B")) _
                                         , SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal
    With src.Parent.Sort
        .SetRange Range("A1:R13")
        .Header = xlYes
        .MatchCase = False
        .Orientation = xlTopToBottom
        .SortMethod = xlPinYin
        .Apply
    End With
    data = src.Value2
    ReDim modifiedData(LBound(data, 1) To UBound(data, 1), LBound(data, 2) To UBound(data, 2))
    For colCt = LBound(data, 2) To UBound(data, 2)
        modifiedData(1, colCt) = data(1, colCt)
        If colCt < 3 Then
            For rowCt = LBound(data, 1) + 1 To UBound(data, 1)
                modifiedData(rowCt, colCt) = data(rowCt, colCt)
            Next
        End If
    Next
    For rowCt = LBound(data, 1) + 1 To UBound(data, 1)
        If rowCt = 2 Then
            curId = data(rowCt, 1)
        Else
            If curId = data(rowCt, 1) Then
                modifiedData(rowCt, 3) = data(rowCt, 3)
                modifiedData(rowCt, 4) = data(rowCt, 4)
                modifiedData(rowCt, 5) = data(rowCt, 5)
                For colCt = LBound(data, 2) + 2 To UBound(data, 2)
                    'record fullname, first name and last name
                    'detect changes in all but the first two columns
                    If data(rowCt, colCt) <> data(rowCt - 1, colCt) Then
                        'Found a difference, write to array of differences
                        modifiedData(rowCt, colCt) = data(rowCt, colCt) & " --> " & data(rowCt - 1, colCt)
                    End If
                Next
            Else
                curId = data(rowCt, 1)
            End If
        End If
    Next
    With Worksheets.Add
        With .Range("A1").Resize(UBound(data, 1), UBound(data, 2))
            .Value = modifiedData
            With .Offset(, UBound(data, 2)).Resize(, 1)
                .FormulaR1C1 = "=COUNTIF(RC[-" & UBound(data, 2) - 2 & "]:RC[-1],""* --> *"")"
                .Cells(1, 1).Value = "Changes"
            End With
            src.Copy
            .CurrentRegion.PasteSpecial xlPasteFormats
            .CurrentRegion.PasteSpecial xlPasteColumnWidths
            .CurrentRegion.AutoFilter UBound(data, 2) + 1, ">0"

        End With
    End With
End Sub

 

@Jan Karel Pieterse 

 

Thank you! I tried running it, selecting the master source table as the data area. I also changed some details (country, region and city) for 'Emma Doe' to test it. The macro runs without an issue - but it doesn't return anything. When I disable the filter in the change column, it returns every line - as expected considering it didn't detect changes - with everything but the 'ID' and 'Date' columns cleared.
Am I trying to use it wrong?

@datallama See attached workbook.

@Jan Karel Pieterse 

 

I found the problem, I had to update this line in the macro:

 

 With src.Parent.Sort
        .SetRange Range("A1:R13")

to the start and end columns of the source data in my actual file.

After that, it works like a charm. Thank you!

1 best response

Accepted Solutions
best response confirmed by datallama (Copper Contributor)
Solution

@datallama I've written this macro. It gives you a new worksheet with

all data, filtered for changes. Start the macro with a cell in your source data selected:

 

Option Explicit

Sub OutputDifferences()
    Dim src As Range
    Dim rowCt As Long
    Dim colCt As Long
    Dim data As Variant
    Dim modifiedData() As Variant
    Dim curId As String
    Const IDCol As Long = 1
    Const dateCol As Long = 2

    On Error Resume Next
    Set src=Application.InputBox("Please select the data area", "Data for update analysis", ActiveCell.CurrentRegion.Address(False, False), , , , , 8)
    If src Is Nothing Then Exit Sub
    src.Parent.Sort.SortFields.Clear
    src.Parent.Sort.SortFields.Add2 Key:=Intersect(src, src.Parent.Range("A:A")) _
                                         , SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
    src.Parent.Sort.SortFields.Add2 Key:=Intersect(src, src.Parent.Range("B:B")) _
                                         , SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal
    With src.Parent.Sort
        .SetRange Range("A1:R13")
        .Header = xlYes
        .MatchCase = False
        .Orientation = xlTopToBottom
        .SortMethod = xlPinYin
        .Apply
    End With
    data = src.Value2
    ReDim modifiedData(LBound(data, 1) To UBound(data, 1), LBound(data, 2) To UBound(data, 2))
    For colCt = LBound(data, 2) To UBound(data, 2)
        modifiedData(1, colCt) = data(1, colCt)
        If colCt < 3 Then
            For rowCt = LBound(data, 1) + 1 To UBound(data, 1)
                modifiedData(rowCt, colCt) = data(rowCt, colCt)
            Next
        End If
    Next
    For rowCt = LBound(data, 1) + 1 To UBound(data, 1)
        If rowCt = 2 Then
            curId = data(rowCt, 1)
        Else
            If curId = data(rowCt, 1) Then
                modifiedData(rowCt, 3) = data(rowCt, 3)
                modifiedData(rowCt, 4) = data(rowCt, 4)
                modifiedData(rowCt, 5) = data(rowCt, 5)
                For colCt = LBound(data, 2) + 2 To UBound(data, 2)
                    'record fullname, first name and last name
                    'detect changes in all but the first two columns
                    If data(rowCt, colCt) <> data(rowCt - 1, colCt) Then
                        'Found a difference, write to array of differences
                        modifiedData(rowCt, colCt) = data(rowCt, colCt) & " --> " & data(rowCt - 1, colCt)
                    End If
                Next
            Else
                curId = data(rowCt, 1)
            End If
        End If
    Next
    With Worksheets.Add
        With .Range("A1").Resize(UBound(data, 1), UBound(data, 2))
            .Value = modifiedData
            With .Offset(, UBound(data, 2)).Resize(, 1)
                .FormulaR1C1 = "=COUNTIF(RC[-" & UBound(data, 2) - 2 & "]:RC[-1],""* --> *"")"
                .Cells(1, 1).Value = "Changes"
            End With
            src.Copy
            .CurrentRegion.PasteSpecial xlPasteFormats
            .CurrentRegion.PasteSpecial xlPasteColumnWidths
            .CurrentRegion.AutoFilter UBound(data, 2) + 1, ">0"

        End With
    End With
End Sub

 

View solution in original post