Feb 05 2024 02:41 AM
Hi,
I want to find a more elegant and perhaps less reesource intensive solution for a structure I have:
I have an evolving dataset, with new additions every month. This is pulled together from multiple sources via queries (due to access limits on the entry side). There are unique IDs in the list, plus the date added as a part of the primary key. One unique set is a couple hundred rows.
Next, I have 2 pivot tables set up with date added as a filter, to be able to see the differences between 2 entries. This is less important, serves only as an intermediate step.
Then, I have a VBA set up to copy the IDs from one of the pivot tables into a table on a different sheet (clearing the previous data and resizing the table to match the new amount of rows), where Vlookups embedded in IF formulas check if select data changed between the entries.
More specifically:
In the pivot tables, next to the IDs are listed things like country, address, name, etc. This is 'duplicated' to the other pivot, then the IF is used to check if for example the country for the given ID in the first pivot (for date 1) matches the country in the 2nd pivot (for date 2), then return either nothing if they match or the new value if the don't. Then another formula checks if any of the other columns return a nonblank value so that it's possible to easily filter and display only the IDs where something changed (the rest without changes are not relevant in this case).
There are 6 columns for this formula now, with the possibility of having a couple more (so ~10-12 maximum). With the VBA + pivots, there are only these 3-400 lines at any given time that are pulled out of the larger merged dataset, and only these are used for the formulas as well.
I'm worried about exceeding the capabilities of Excel, but I can't have this elsewhere because I need to share it with other people in my org, and it needs to be in one place.
My question is, is there a more elegant, possibly less resource intensive solution to getting to the same result?
Feb 05 2024 05:40 AM
Feb 05 2024 07:14 AM
Sure, here it is:
So this would be the 'master source table' on sheet1, data merged from other sources. This master sheet is thousands of rows, continuously expanded so it will reach the 100000s sooner rather than later.
Employee # | Date added | Employee full name | First name | Last name | Country | Region | Location | Department | Division | Work Email | Supervisor name | Hire date | Manager? | Salary | Currency | Type | Status |
12345 | 1/31/2023 | John Smith | John | Smith | USA | AMER | Memphis | Sales | DIV1 | jsmith@acmecorp | Kristine Manager | 5/1/1999 | No | 40000 | USD | Full Time | Active |
12346 | 1/31/2023 | Jane Doe | Jane | Doe | USA | AMER | Chicago | Tech | DIV1 | jdoe@acmecorp | George Supervisor | 2/15/2006 | No | 50000 | USD | Full Time | Active |
12347 | 1/31/2023 | William Smith | William | Smith | UK | EMEA | Liverpool | Legal | DIV1 | wsmith@acmecorp | George Supervisor | 9/23/2014 | No | 45000 | GBP | Full Time | Active |
12348 | 1/31/2023 | Emma Doe | Emma | Doe | Australia | APAC | Canberra | Sales | DIV2 | edoe@acmecorp | Kristine Manager | 4/3/2019 | No | 60000 | AUD | Full Time | Active |
12345 | 2/28/2023 | John Smith | John | Smith | USA | AMER | Memphis | Sales | DIV1 | jsmith@acmecorp | Kristine Manager | 5/1/1999 | No | 40000 | USD | Full Time | Active |
12346 | 2/28/2023 | Jane Doe | Jane | Doe | USA | AMER | Chicago | Tech | DIV1 | jdoe@acmecorp | George Supervisor | 2/15/2006 | No | 50000 | USD | Full Time | Active |
12347 | 2/28/2023 | William Smith | William | Smith | UK | EMEA | Liverpool | Legal | DIV1 | wsmith@acmecorp | George Supervisor | 9/23/2014 | No | 45000 | GBP | Full Time | Active |
12348 | 2/28/2023 | Emma Doe | Emma | Doe | Australia | APAC | Canberra | Sales | DIV2 | edoe@acmecorp | Kristine Manager | 4/3/2019 | No | 60000 | AUD | Full Time | Active |
12345 | 3/31/2023 | John Smith | John | Smith | USA | AMER | Memphis | Sales | DIV1 | jsmith@acmecorp | Kristine Manager | 5/1/1999 | No | 40000 | USD | Full Time | Active |
12346 | 3/31/2023 | Jane Doe | Jane | Doe | UK | AMER | Manchester | Tech | DIV1 | jdoe@acmecorp | George Supervisor | 2/15/2006 | No | 65000 | GBP | Full Time | Active |
12347 | 3/31/2023 | William Smith | William | Smith | UK | EMEA | Liverpool | Tech | DIV1 | wsmith@acmecorp | George Supervisor | 9/23/2014 | No | 45000 | GBP | Full Time | Active |
12348 | 3/31/2023 | Emma Doe | Emma | Doe | Australia | APAC | Canberra | Sales | DIV2 | edoe@acmecorp | Kristine Manager | 4/3/2019 | No | 60000 | AUD | Full Time | Active |
Then out ot the 20-something columns above, the below are 'filtered' to 2 pivot tables on sheet2:
Filter | Date added 1 (eg. 31/01/2023) | |||||
Values (DAX measures) | ||||||
Row Labels (ID) | Name | Dept | Reporting to | Country | Salary in | Salary |
12345 | John Smith | Sales | Kristine Manager | USA | USD | 40000 |
12346 | Jane Doe | Tech | George Supervisor | USA | USD | 50000 |
12347 | William Smith | Legal | George Supervisor | UK | GBP | 45000 |
12348 | Emma Doe | Sales | Kristine Manager | Australia | AUD | 60000 |
This pivot is duplicated so that you can filter by entry date. You would select 2 months and the point would be to check if there are differences.
Next, the IDs from the first pivot would be copied with a VBA to sheet3 to a table, so the formulas autofill.
ID | Changes | Transfer | Salary | Currency | Manager | Department | Country | Full name |
12345 | Yes/"" | Yes/"" | IF(VLOOKUP([@ID],'Sheet2'!A:F,6,FALSE)=VLOOKUP([@ID],'Sheet2'!I:N,6,FALSE),"",VLOOKUP([@ID],'Sheet2'!A:N,6,FALSE)) | |||||
12346 | ||||||||
12347 | ||||||||
12348 |
There, the above formula is in the columns from salary to full name, for about 3-400 rows. Change and transfer are just IF formulas checking if there's anything to the right.
The logic how it works is to select eg. Jan 2023 and Mar 2023 on Sheet2 for the pivot tables, then run the VBA for the accurate list of IDs, and by that time the formulas already found that Jane Doe who moved from the US to the UK, so by her ID, in the country column the UK is displayed and there is a YES in the change column so we can filter to only see the changes.
Feb 05 2024 08:26 AM - edited Feb 05 2024 08:27 AM
Solution@datallama I've written this macro. It gives you a new worksheet with
all data, filtered for changes. Start the macro with a cell in your source data selected:
Option Explicit
Sub OutputDifferences()
Dim src As Range
Dim rowCt As Long
Dim colCt As Long
Dim data As Variant
Dim modifiedData() As Variant
Dim curId As String
Const IDCol As Long = 1
Const dateCol As Long = 2
On Error Resume Next
Set src=Application.InputBox("Please select the data area", "Data for update analysis", ActiveCell.CurrentRegion.Address(False, False), , , , , 8)
If src Is Nothing Then Exit Sub
src.Parent.Sort.SortFields.Clear
src.Parent.Sort.SortFields.Add2 Key:=Intersect(src, src.Parent.Range("A:A")) _
, SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
src.Parent.Sort.SortFields.Add2 Key:=Intersect(src, src.Parent.Range("B:B")) _
, SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal
With src.Parent.Sort
.SetRange Range("A1:R13")
.Header = xlYes
.MatchCase = False
.Orientation = xlTopToBottom
.SortMethod = xlPinYin
.Apply
End With
data = src.Value2
ReDim modifiedData(LBound(data, 1) To UBound(data, 1), LBound(data, 2) To UBound(data, 2))
For colCt = LBound(data, 2) To UBound(data, 2)
modifiedData(1, colCt) = data(1, colCt)
If colCt < 3 Then
For rowCt = LBound(data, 1) + 1 To UBound(data, 1)
modifiedData(rowCt, colCt) = data(rowCt, colCt)
Next
End If
Next
For rowCt = LBound(data, 1) + 1 To UBound(data, 1)
If rowCt = 2 Then
curId = data(rowCt, 1)
Else
If curId = data(rowCt, 1) Then
modifiedData(rowCt, 3) = data(rowCt, 3)
modifiedData(rowCt, 4) = data(rowCt, 4)
modifiedData(rowCt, 5) = data(rowCt, 5)
For colCt = LBound(data, 2) + 2 To UBound(data, 2)
'record fullname, first name and last name
'detect changes in all but the first two columns
If data(rowCt, colCt) <> data(rowCt - 1, colCt) Then
'Found a difference, write to array of differences
modifiedData(rowCt, colCt) = data(rowCt, colCt) & " --> " & data(rowCt - 1, colCt)
End If
Next
Else
curId = data(rowCt, 1)
End If
End If
Next
With Worksheets.Add
With .Range("A1").Resize(UBound(data, 1), UBound(data, 2))
.Value = modifiedData
With .Offset(, UBound(data, 2)).Resize(, 1)
.FormulaR1C1 = "=COUNTIF(RC[-" & UBound(data, 2) - 2 & "]:RC[-1],""* --> *"")"
.Cells(1, 1).Value = "Changes"
End With
src.Copy
.CurrentRegion.PasteSpecial xlPasteFormats
.CurrentRegion.PasteSpecial xlPasteColumnWidths
.CurrentRegion.AutoFilter UBound(data, 2) + 1, ">0"
End With
End With
End Sub
Feb 07 2024 02:59 AM
Thank you! I tried running it, selecting the master source table as the data area. I also changed some details (country, region and city) for 'Emma Doe' to test it. The macro runs without an issue - but it doesn't return anything. When I disable the filter in the change column, it returns every line - as expected considering it didn't detect changes - with everything but the 'ID' and 'Date' columns cleared.
Am I trying to use it wrong?
Feb 07 2024 05:32 AM
@datallama See attached workbook.
Feb 08 2024 07:21 AM
I found the problem, I had to update this line in the macro:
With src.Parent.Sort .SetRange Range("A1:R13")
to the start and end columns of the source data in my actual file.
After that, it works like a charm. Thank you!
Feb 05 2024 08:26 AM - edited Feb 05 2024 08:27 AM
Solution@datallama I've written this macro. It gives you a new worksheet with
all data, filtered for changes. Start the macro with a cell in your source data selected:
Option Explicit
Sub OutputDifferences()
Dim src As Range
Dim rowCt As Long
Dim colCt As Long
Dim data As Variant
Dim modifiedData() As Variant
Dim curId As String
Const IDCol As Long = 1
Const dateCol As Long = 2
On Error Resume Next
Set src=Application.InputBox("Please select the data area", "Data for update analysis", ActiveCell.CurrentRegion.Address(False, False), , , , , 8)
If src Is Nothing Then Exit Sub
src.Parent.Sort.SortFields.Clear
src.Parent.Sort.SortFields.Add2 Key:=Intersect(src, src.Parent.Range("A:A")) _
, SortOn:=xlSortOnValues, Order:=xlAscending, DataOption:=xlSortNormal
src.Parent.Sort.SortFields.Add2 Key:=Intersect(src, src.Parent.Range("B:B")) _
, SortOn:=xlSortOnValues, Order:=xlDescending, DataOption:=xlSortNormal
With src.Parent.Sort
.SetRange Range("A1:R13")
.Header = xlYes
.MatchCase = False
.Orientation = xlTopToBottom
.SortMethod = xlPinYin
.Apply
End With
data = src.Value2
ReDim modifiedData(LBound(data, 1) To UBound(data, 1), LBound(data, 2) To UBound(data, 2))
For colCt = LBound(data, 2) To UBound(data, 2)
modifiedData(1, colCt) = data(1, colCt)
If colCt < 3 Then
For rowCt = LBound(data, 1) + 1 To UBound(data, 1)
modifiedData(rowCt, colCt) = data(rowCt, colCt)
Next
End If
Next
For rowCt = LBound(data, 1) + 1 To UBound(data, 1)
If rowCt = 2 Then
curId = data(rowCt, 1)
Else
If curId = data(rowCt, 1) Then
modifiedData(rowCt, 3) = data(rowCt, 3)
modifiedData(rowCt, 4) = data(rowCt, 4)
modifiedData(rowCt, 5) = data(rowCt, 5)
For colCt = LBound(data, 2) + 2 To UBound(data, 2)
'record fullname, first name and last name
'detect changes in all but the first two columns
If data(rowCt, colCt) <> data(rowCt - 1, colCt) Then
'Found a difference, write to array of differences
modifiedData(rowCt, colCt) = data(rowCt, colCt) & " --> " & data(rowCt - 1, colCt)
End If
Next
Else
curId = data(rowCt, 1)
End If
End If
Next
With Worksheets.Add
With .Range("A1").Resize(UBound(data, 1), UBound(data, 2))
.Value = modifiedData
With .Offset(, UBound(data, 2)).Resize(, 1)
.FormulaR1C1 = "=COUNTIF(RC[-" & UBound(data, 2) - 2 & "]:RC[-1],""* --> *"")"
.Cells(1, 1).Value = "Changes"
End With
src.Copy
.CurrentRegion.PasteSpecial xlPasteFormats
.CurrentRegion.PasteSpecial xlPasteColumnWidths
.CurrentRegion.AutoFilter UBound(data, 2) + 1, ">0"
End With
End With
End Sub