Jun 01 2022 11:00 PM
I have been trying to solve a problem for some time now but unable to find an acceptable solution and hence requesting help from you. The attached has a snapshot of what I am trying to achieve. The problem I am facing is when the list goes beyond 100,000 rows, Excel runs out of resource.
The list contains duplicates as well as non-duplicates. I believe, if we restrict MMULT only for the duplicates, the "Out of Resource" error won't show up. I want the formula to do the following:
1. If unique (non-duplicate), simply consider the given value
2. If duplicate, then do MMULT (sum) for only the duplicate at hand and move to the next in sequence.
In the attached, "a", "b" and "d" are duplicated while "c" is unique. Want the formula to consider only "a" for MMULT and then separately for "b" and so on. Doing it together for "a", "b", "c" and "d" gives the "Out of Resource" error. The formula identifies "c" as unique and copies the corresponding value.
Please let me know if further clarification is required.
I wanted to upload the file but unfortunately couldn't find a link to do so.
Jun 01 2022 11:02 PM
@dipankar_de You can share your file via Onedrive, Google Docs, Dropbox or similar.
Jun 01 2022 11:26 PM
Jun 01 2022 11:33 PM
@dipankar_de Thanks! Not sure what I'm looking at but why not simply use this in C20:
=SUMIF(B2#,UNIQUE(B2#),C2#)
Jun 01 2022 11:51 PM
Jun 01 2022 11:53 PM
Jun 02 2022 12:26 AM
@dipankar_de Well, if your data set is too big for MMULT and you are short on resources, perhaps you should transform the raw data in PowerQuery (PQ) first and create a table summing each of the six columns, grouped by the unique reference (a,b,c,d, etc. as per your example). Did that for the first few columns
A data set of 100 thousand rows isn't particularly big for PQ and shouldn't cause much of a problem. But perhaps I'm missing the point completely and underestimating your problem.
Jun 02 2022 12:41 AM
Jun 02 2022 01:03 AM
@dipankar_de See attached file. It has a very basic query in it. Can't tell how this will perform on your machine with real data, of course.
Jun 02 2022 06:00 AM
As a comment practical limit for MMULT on max dimension of returned vector is about 7500 (bit less).
Jun 02 2022 06:24 AM
Jun 02 2022 07:10 AM
SUMIFS is very fast in execution and tends to beat array methods. However, provided your data is strictly sorted, it is possible to search for the first and last entries of each contiguous range and sum that.
= LET(
distinct, SORT(UNIQUE(letter)),
total, MAP(distinct, LAMBDA(chr,
LET(
firstCell, XLOOKUP(chr,letter,number,,,1),
finalCell, XLOOKUP(chr,letter,number,,,-1),
SUM(firstCell:finalCell)
)
)
),
HSTACK(distinct, total)
)
I have beaten SUMIFS with formulas of this kind in the past in situations in which each summation is only working off a small proportion of the values.
Jun 02 2022 07:19 AM
Jun 02 2022 07:42 AM
Sorry, the HSTACK was little more than cosmetic tidying up. This is the same function stripped right back to a relative reference.
= SUM(XLOOKUP(@distinct,letter,number,,, 1):XLOOKUP(@distinct,letter,number,,,-1))
Jun 02 2022 08:05 AM
Jun 02 2022 10:12 AM
Jun 02 2022 10:16 AM
Jun 02 2022 11:44 AM
So you still have something to look forward to! I couldn't wait, so put my semi-annual update aside and licenced a personal version (insider beta) in order to get to grips with the tumultuous changes.
If you are prepared to accept relative referencing and copy-fill-down then the formula
= SUM(
XLOOKUP(@distinct,letter,number,,, 1)
: XLOOKUP(@distinct,letter,number,,,-1)
)
will do the same job (green header).
Jun 02 2022 12:40 PM
It all depends on concrete data. I assume we shall sumif 100000 range with > 7500 criteria elements. On my relatively slow PC
=SUMIFS( <100000 rows sum range>, <100000 rows names range>, <10000 rows unique names range> )
took about 15 sec.
I don't think that any formula with dynamic arrays, in general any other formula, will be working faster.
To compare, refresh of Power Query aggregation for the same dataset takes about 4 sec and 2-3 sec more if aggregate 8 columns at once.
Jun 02 2022 04:32 PM