Forum Discussion

datallama's avatar
datallama
Copper Contributor
Feb 05, 2024
Solved

Finding differences in semi-large, evolving dataset

Hi,

 

I want to find a more elegant and perhaps less reesource intensive solution for a structure I have:

 

I have an evolving dataset, with new additions every month. This is pulled together from multiple sources via queries (due to access limits on the entry side). There are unique IDs in the list, plus the date added as a part of the primary key. One unique set is a couple hundred rows.

 

Next, I have 2 pivot tables set up with date added as a filter, to be able to see the differences between 2 entries. This is less important, serves only as an intermediate step.

 

Then, I have a VBA set up to copy the IDs from one of the pivot tables into a table on a different sheet (clearing the previous data and resizing the table to match the new amount of rows), where Vlookups embedded in IF formulas check if select data changed between the entries.

 

More specifically:

In the pivot tables, next to the IDs are listed things like country, address, name, etc. This is 'duplicated' to the other pivot, then the IF is used to check if for example the country for the given ID  in the first pivot (for date 1) matches the country in the 2nd pivot (for date 2), then return either nothing if they match or the new value if the don't. Then another formula checks if any of the other columns return a nonblank value so that it's possible to easily filter and display only the IDs where something changed (the rest without changes are not relevant in this case).

 

There are 6 columns for this formula now, with the possibility of having a couple more (so ~10-12 maximum). With the VBA + pivots, there are only these 3-400 lines at any given time that are pulled out of the larger merged dataset, and only these are used for the formulas as well.

 

I'm worried about exceeding the capabilities of Excel, but I can't have this elsewhere because I need to share it with other people in my org, and it needs to be in one place.

 

My question is, is there a more elegant, possibly less resource intensive solution to getting to the same result?

  • datallama I've written this macro. It gives you a new worksheet with

    all data, filtered for changes. Start the macro with a cell in your source data selected:

     

    Option Explicit
    
    Sub OutputDifferences()
        Dim src As Range
        Dim rowCt As Long
        Dim colCt As Long
        Dim data As Variant
        Dim modifiedData() As Variant
        Dim curId As String
        Const IDCol As Long = 1
        Const dateCol As Long = 2
    
        On Error Resume Next
        Set src=Application.InputBox("Please select the data area", "Data for update analysis", ActiveCell.CurrentRegion.Address(False, False), , , , , 8)
        If src Is Nothing Then Exit Sub
        src.Parent.Sort.SortFields.Clear
        src.Parent.Sort.SortFields.Add2 Key:=Intersect(src, src.Parent.Range("A:A")) _
                                             , SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
        src.Parent.Sort.SortFields.Add2 Key:=Intersect(src, src.Parent.Range("B:B")) _
                                             , SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal
        With src.Parent.Sort
            .SetRange Range("A1:R13")
            .Header = xlYes
            .MatchCase = False
            .Orientation = xlTopToBottom
            .SortMethod = xlPinYin
            .Apply
        End With
        data = src.Value2
        ReDim modifiedData(LBound(data, 1) To UBound(data, 1), LBound(data, 2) To UBound(data, 2))
        For colCt = LBound(data, 2) To UBound(data, 2)
            modifiedData(1, colCt) = data(1, colCt)
            If colCt < 3 Then
                For rowCt = LBound(data, 1) + 1 To UBound(data, 1)
                    modifiedData(rowCt, colCt) = data(rowCt, colCt)
                Next
            End If
        Next
        For rowCt = LBound(data, 1) + 1 To UBound(data, 1)
            If rowCt = 2 Then
                curId = data(rowCt, 1)
            Else
                If curId = data(rowCt, 1) Then
                    modifiedData(rowCt, 3) = data(rowCt, 3)
                    modifiedData(rowCt, 4) = data(rowCt, 4)
                    modifiedData(rowCt, 5) = data(rowCt, 5)
                    For colCt = LBound(data, 2) + 2 To UBound(data, 2)
                        'record fullname, first name and last name
                        'detect changes in all but the first two columns
                        If data(rowCt, colCt) <> data(rowCt - 1, colCt) Then
                            'Found a difference, write to array of differences
                            modifiedData(rowCt, colCt) = data(rowCt, colCt) & " --> " & data(rowCt - 1, colCt)
                        End If
                    Next
                Else
                    curId = data(rowCt, 1)
                End If
            End If
        Next
        With Worksheets.Add
            With .Range("A1").Resize(UBound(data, 1), UBound(data, 2))
                .Value = modifiedData
                With .Offset(, UBound(data, 2)).Resize(, 1)
                    .FormulaR1C1 = "=COUNTIF(RC[-" & UBound(data, 2) - 2 & "]:RC[-1],""* --> *"")"
                    .Cells(1, 1).Value = "Changes"
                End With
                src.Copy
                .CurrentRegion.PasteSpecial xlPasteFormats
                .CurrentRegion.PasteSpecial xlPasteColumnWidths
                .CurrentRegion.AutoFilter UBound(data, 2) + 1, ">0"
    
            End With
        End With
    End Sub

     

6 Replies

    • datallama's avatar
      datallama
      Copper Contributor

      JKPieterse 

      Sure, here it is:

       

      So this would be the 'master source table' on sheet1, data merged from other sources. This master sheet is thousands of rows, continuously expanded so it will reach the 100000s sooner rather than later.

       

      Employee  #Date addedEmployee full nameFirst nameLast nameCountryRegionLocationDepartmentDivisionWork EmailSupervisor nameHire dateManager?SalaryCurrencyTypeStatus
      123451/31/2023John SmithJohnSmithUSAAMERMemphisSalesDIV1jsmith@acmecorpKristine Manager5/1/1999No40000USDFull TimeActive
      123461/31/2023Jane DoeJaneDoeUSAAMERChicagoTechDIV1jdoe@acmecorpGeorge Supervisor2/15/2006No50000USDFull TimeActive
      123471/31/2023William SmithWilliamSmithUKEMEALiverpoolLegalDIV1wsmith@acmecorpGeorge Supervisor9/23/2014No45000GBPFull TimeActive
      123481/31/2023Emma DoeEmmaDoeAustraliaAPACCanberraSalesDIV2edoe@acmecorpKristine Manager4/3/2019No60000AUDFull TimeActive
      123452/28/2023John SmithJohnSmithUSAAMERMemphisSalesDIV1jsmith@acmecorpKristine Manager5/1/1999No40000USDFull TimeActive
      123462/28/2023Jane DoeJaneDoeUSAAMERChicagoTechDIV1jdoe@acmecorpGeorge Supervisor2/15/2006No50000USDFull TimeActive
      123472/28/2023William SmithWilliamSmithUKEMEALiverpoolLegalDIV1wsmith@acmecorpGeorge Supervisor9/23/2014No45000GBPFull TimeActive
      123482/28/2023Emma DoeEmmaDoeAustraliaAPACCanberraSalesDIV2edoe@acmecorpKristine Manager4/3/2019No60000AUDFull TimeActive
      123453/31/2023John SmithJohnSmithUSAAMERMemphisSalesDIV1jsmith@acmecorpKristine Manager5/1/1999No40000USDFull TimeActive
      123463/31/2023Jane DoeJaneDoeUKAMERManchesterTechDIV1jdoe@acmecorpGeorge Supervisor2/15/2006No65000GBPFull TimeActive
      123473/31/2023William SmithWilliamSmithUKEMEALiverpoolTechDIV1wsmith@acmecorpGeorge Supervisor9/23/2014No45000GBPFull TimeActive
      123483/31/2023Emma DoeEmmaDoeAustraliaAPACCanberraSalesDIV2edoe@acmecorpKristine Manager4/3/2019No60000AUDFull TimeActive

       

      Then out ot the 20-something columns above, the below are 'filtered' to 2 pivot tables on sheet2:

       

      FilterDate added 1 (eg. 31/01/2023)     
       Values (DAX measures)     
      Row Labels (ID)NameDeptReporting toCountrySalary inSalary
      12345John SmithSalesKristine ManagerUSAUSD40000
      12346Jane DoeTechGeorge SupervisorUSAUSD50000
      12347William SmithLegalGeorge SupervisorUKGBP45000
      12348Emma DoeSalesKristine ManagerAustraliaAUD60000

       

      This pivot is duplicated so that you can filter by entry date. You would select 2 months and the point would be to check if there are differences.

       

      Next, the IDs from the first pivot would be copied with a VBA to sheet3 to a table, so the formulas autofill.

       

      IDChangesTransferSalaryCurrencyManagerDepartmentCountryFull name
      12345Yes/""Yes/""IF(VLOOKUP([@ID],'Sheet2'!A:F,6,FALSE)=VLOOKUP([@ID],'Sheet2'!I:N,6,FALSE),"",VLOOKUP([@ID],'Sheet2'!A:N,6,FALSE))
      12346        
      12347        
      12348        

       

      There, the above formula is in the columns from salary to full name, for about 3-400 rows. Change and transfer are just IF formulas checking if there's anything to the right.

      The logic how it works is to select eg. Jan 2023 and Mar 2023 on Sheet2 for the pivot tables, then run the VBA for the accurate list of IDs, and by that time the formulas already found that Jane Doe who moved from the US to the UK, so by her ID, in the country column the UK is displayed and there is a YES in the change column so we can filter to only see the changes.

       

       

      • JKPieterse's avatar
        JKPieterse
        Silver Contributor

        datallama I've written this macro. It gives you a new worksheet with

        all data, filtered for changes. Start the macro with a cell in your source data selected:

         

        Option Explicit
        
        Sub OutputDifferences()
            Dim src As Range
            Dim rowCt As Long
            Dim colCt As Long
            Dim data As Variant
            Dim modifiedData() As Variant
            Dim curId As String
            Const IDCol As Long = 1
            Const dateCol As Long = 2
        
            On Error Resume Next
            Set src=Application.InputBox("Please select the data area", "Data for update analysis", ActiveCell.CurrentRegion.Address(False, False), , , , , 8)
            If src Is Nothing Then Exit Sub
            src.Parent.Sort.SortFields.Clear
            src.Parent.Sort.SortFields.Add2 Key:=Intersect(src, src.Parent.Range("A:A")) _
                                                 , SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
            src.Parent.Sort.SortFields.Add2 Key:=Intersect(src, src.Parent.Range("B:B")) _
                                                 , SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal
            With src.Parent.Sort
                .SetRange Range("A1:R13")
                .Header = xlYes
                .MatchCase = False
                .Orientation = xlTopToBottom
                .SortMethod = xlPinYin
                .Apply
            End With
            data = src.Value2
            ReDim modifiedData(LBound(data, 1) To UBound(data, 1), LBound(data, 2) To UBound(data, 2))
            For colCt = LBound(data, 2) To UBound(data, 2)
                modifiedData(1, colCt) = data(1, colCt)
                If colCt < 3 Then
                    For rowCt = LBound(data, 1) + 1 To UBound(data, 1)
                        modifiedData(rowCt, colCt) = data(rowCt, colCt)
                    Next
                End If
            Next
            For rowCt = LBound(data, 1) + 1 To UBound(data, 1)
                If rowCt = 2 Then
                    curId = data(rowCt, 1)
                Else
                    If curId = data(rowCt, 1) Then
                        modifiedData(rowCt, 3) = data(rowCt, 3)
                        modifiedData(rowCt, 4) = data(rowCt, 4)
                        modifiedData(rowCt, 5) = data(rowCt, 5)
                        For colCt = LBound(data, 2) + 2 To UBound(data, 2)
                            'record fullname, first name and last name
                            'detect changes in all but the first two columns
                            If data(rowCt, colCt) <> data(rowCt - 1, colCt) Then
                                'Found a difference, write to array of differences
                                modifiedData(rowCt, colCt) = data(rowCt, colCt) & " --> " & data(rowCt - 1, colCt)
                            End If
                        Next
                    Else
                        curId = data(rowCt, 1)
                    End If
                End If
            Next
            With Worksheets.Add
                With .Range("A1").Resize(UBound(data, 1), UBound(data, 2))
                    .Value = modifiedData
                    With .Offset(, UBound(data, 2)).Resize(, 1)
                        .FormulaR1C1 = "=COUNTIF(RC[-" & UBound(data, 2) - 2 & "]:RC[-1],""* --> *"")"
                        .Cells(1, 1).Value = "Changes"
                    End With
                    src.Copy
                    .CurrentRegion.PasteSpecial xlPasteFormats
                    .CurrentRegion.PasteSpecial xlPasteColumnWidths
                    .CurrentRegion.AutoFilter UBound(data, 2) + 1, ">0"
        
                End With
            End With
        End Sub

         

Resources