Forum Discussion
Lists Comparison (Stacked & Keeping order)
- Jan 02, 2025
Interesting project. I see you've found a solution but indicate there are still performance issues with larger amounts of data. One thing to keep in mind is that iterative functions like MAKEARRAY, MAP, SCAN and REDUCE will perform very poorly when using INDEX over an array object (as opposed to a physical range). This may be a contributing factor since a number of your custom functions use iterative indexing methods (not to mention some iterative stacking as well).
In the attached file I've used an alternative approach for generating the list of matching indices and row counts, then used SORTBY with EXPAND and TOCOL-IFS-SEQUENCE to pad each array with the appropriate number of blank rows. It seems to be generating the same results as your original formulas, plus it can handle 5 separate lists with 2000 rows each in < 1 second. I've put the new lambda definitions on a separate worksheet, for your convenience. Check it out and see if it's working as expected. Cheers!
I also think you have a problem with multiple solutions and/or undefined optimization. You could easily come up with solutions that are likely far from optimal: ABCDEF could match with BCDEFA in 2 ways:
A_B_C_D_E_F
x _B_C_D_E_F_A
or
x _x_x_ x_x_A_B_C_D_E_F
B_C_D_E_F_A
The former being much "better" match but I don't believe you have any 'rule' why the latter isn't a valid solution. That said, I didn't break down your existing solution to see if it checks for that but it seams to be 'single path' iterative based so I don't think so. we also know excel works with arrays more efficiently. I have the following start that I feel could be useful but maybe you already tried and couldn't get further. So on your sample sheet I took the 1st 'DEMO' and put the 2nd col into a row and found the matix of matches (shown using complex #s for row, column. The last 2 columns (V & W) use those matches and sort the 'real'/rows based on the 'imaginary'/cols and then the reverse. Any 'path' that increases both the row and column is a potential solution, so w/r to the 2 final columns any monotonically increasing set would be a potential solution of matching values (except these :
in this case I see 4 (non-trivial, i.e. sequences that do not exclude possible matches in between) match solutions:
2+3i, 6+5i, 8+6i, 9+9i
2+3i, 6+5i, 7+10i
2+3i, 5+8i, 6+9i, 7+10i
2+3i, 5+8i, 9+9i
So although this step makes it much easier for me to find those solutions I'm not seeing a way to functionally do it. Like it would be nice if there was a trick using MMULT or something to give potential solutions and then use the one that is the longest. In THIS case you can easily go row by row or column by column to find the 2 'best' solutions but that would NOT be the case if the 1st column ended with JUL and the 2nd column ended with MAY as those would have to be 'skipped' in each corresponsing case to avoid the 2 match solution.
- amit_bholaDec 30, 2024Iron Contributor
Thanks m_tarler for the idea about holding pairs of candidate matched-indices as complex nos.
I studied (see attachment), and i expect that filtering-out the monotonically increasing nos. would still need recursion which i already implemented and due to it, as Mr. Peter pointed out, the order of complexity is expected to remain exponential. Thanks for your suggestions. When time permits, i may explore the approach further nonetheless.
- m_tarlerDec 31, 2024Bronze Contributor
So I worked your sheet a little further and you can tell me if the performance is still holding up. Basically once you find the list of 'imaginary numbers' I then do a complete list of all possibilities and then filter out the possibilities that are increasing and then pull the list with the most in the list
see attached
- amit_bholaJan 01, 2025Iron Contributor
m_tarler, Thanks!
I also followed your suggested approach while you did and made a performance comparison.
There is significant improvement till no. of elements are small, but then it does increase exponentially.
e.g. a 3-List , 100-elements each comparison took 4.5 minutes to compute! (Excel could do the calc without becoming unresponsive forever!).
Have made comparison using stop-watch on my machine in updated attached Excel file.
The crux is, earlier i was comparing each of the element of ListA with each of the element of ListB (with some truncation),
Your suggestion about looking only within the candidate matches improved the performance significantly, still slow for longer lists, but perhaps about that far only we can go.
Thanks!