Forum Discussion
Excel Challenge - Pivoting poorly structured data
This is from an ExcelBI challenge. I thought it may be worth while posting my solution here as a demonstration of modern Excel methods.
Like many of such challenges, the natural solution approach is to use BYROW but that creates the usual 'array of arrays' error.
Solution:
Gradually I am moving to a point at which I have no formulas showing in the workbook other than calls to Lambda functions. In this case, the worksheet formuloa is
= PIVOTBYCATEGORYλ(OrderTbl)The function works row by row apportioning the amounts against the listed categories
PIVOTBYCATEGORYλ
// Groups and pivots table by category
= LAMBDA(table,
LET(
normalised, BYROWλ(table, APPORTIONλ),
// Identify fields from normalised table
dimension, TAKE(DROP(normalised,,1),,2),
category, TAKE(normalised,,1),
partCost, TAKE(normalised,,-1),
// Pivot by category
return, PIVOTBY(dimension, category, partCost, SUM,,0,,0),
return
)
);The function APPORTIONλ divides the amount between categories so each record within the source data returns a number of rows
APPORTIONλ
// Splits by category and assigns costs
= LAMBDA(record,
LET(
category, TOCOL(REGEXEXTRACT(INDEX(record,4),"\w+",1)),
amount, INDEX(record,3) / COUNTA(category),
year, YEAR(INDEX(record,1)),
region, IF(LEN(INDEX(record, 2)), INDEX(record, 2), "Unknown"),
broadcast, B∕CASTλ(HSTACK(region, year, amount), category),
return, HSTACK(category, broadcast),
return
)
);/* FUNCTION NAME: B∕CASTλ
DESCRIPTION: Broadcasts the terms of a vector over the shape of a second array */
B∕CASTλ = LAMBDA(vector, array, IF({1}, vector, array));The key to making the formula work is the function BYROWλ that I wrote to generalise the inbuilt but over-restrictive BYROW function.
The PIVOTBY function returned the required crosstab from the normalised data array
4 Replies
- djclementsSilver Contributor
Good stuff, Peter! Nice demonstration of the power of your custom function collection for evaluating nested arrays.
I see the majority of the other dynamic array solutions offered up in the comments section of that LinkedIn post use some variation of DROP-REDUCE-VSTACK to perform each step on a row-by-row basis. In this particular situation, however, it is possible to conduct a TOCOL-broadcasting method (unpivot) over the entire table and pass the normalized fields directly to PIVOTBY:
=LET( num, LEN(Categories)-LEN(SUBSTITUTE(Categories,";",))+1, arr, TEXTBEFORE(TEXTAFTER(";"&Categories&";",";",SEQUENCE(,MAX(num))),";"), fnλ, LAMBDA(err,LAMBDA(vec,TOCOL(IF(err,arr,vec),2)))(ISERROR(arr)), PIVOTBY(HSTACK(fnλ(IF(ISBLANK(Region),"Unknown",Region)),fnλ(YEAR(OrderDate))),TOCOL(arr,2),fnλ(Amount/num),SUM,0,0,,0) )Or with CHOOSEROWS and a single broadcast of the table row IDs:
=LET( num, LEN(Categories)-LEN(SUBSTITUTE(Categories,";",))+1, arr, TEXTBEFORE(TEXTAFTER(";"&Categories&";",";",SEQUENCE(,MAX(num))),";"), fnλ, LAMBDA(rId,LAMBDA(vec,CHOOSEROWS(vec,rId)))(TOCOL(IF(ISERROR(arr),arr,SEQUENCE(ROWS(arr))),2)), PIVOTBY(HSTACK(fnλ(IF(ISBLANK(Region),"Unknown",Region)),fnλ(YEAR(OrderDate))),TOCOL(arr,2),fnλ(Amount/num),SUM,0,0,,0) )Kind regards.
- PeterBartholomew1Silver Contributor
David, Interesting solutions! I think that, where possible, obtaining 2D arrays by broadcasting vectors is more efficient than resorting to arrays of functions. Against that, I have been trying to avoid idiosyncratic solutions that are specific to a problem and also to increase my use of arrays of functions (so called 1st class citizens of the new environment). The trouble is that I am so focussed on arrays of functions as the base element of all calculation that I sometimes overlook broadcasting where it is applicable.
- NikolinoDEPlatinum Contributor
This is a very solid—and very modern—Excel solution. You’ve essentially built a small, well-thought-out transformation pipeline using LAMBDA functions.
The way you normalize the data first and only then pivot is especially elegant, and your custom BYROW approach neatly sidesteps one of Excel’s more frustrating limitations.
A great example of how far “formula-only” Excel has come, and a very instructive solution to the challenge.
Thank you for sharing it with us.
- PeterBartholomew1Silver Contributor
The interface will not let me back to correct typos and has stripped out the workbook. Sorry about that!