Forum Discussion
Excel Challenge - Pivoting poorly structured data
This is from an ExcelBI challenge. I thought it may be worth while posting my solution here as a demonstration of modern Excel methods.
Like many of such challenges, the natural solution approach is to use BYROW but that creates the usual 'array of arrays' error.
Solution:
Gradually I am moving to a point at which I have no formulas showing in the workbook other than calls to Lambda functions. In this case, the worksheet formuloa is
= PIVOTBYCATEGORYλ(OrderTbl)The function works row by row apportioning the amounts against the listed categories
PIVOTBYCATEGORYλ
// Groups and pivots table by category
= LAMBDA(table,
LET(
normalised, BYROWλ(table, APPORTIONλ),
// Identify fields from normalised table
dimension, TAKE(DROP(normalised,,1),,2),
category, TAKE(normalised,,1),
partCost, TAKE(normalised,,-1),
// Pivot by category
return, PIVOTBY(dimension, category, partCost, SUM,,0,,0),
return
)
);The function APPORTIONλ divides the amount between categories so each record within the source data returns a number of rows
APPORTIONλ
// Splits by category and assigns costs
= LAMBDA(record,
LET(
category, TOCOL(REGEXEXTRACT(INDEX(record,4),"\w+",1)),
amount, INDEX(record,3) / COUNTA(category),
year, YEAR(INDEX(record,1)),
region, IF(LEN(INDEX(record, 2)), INDEX(record, 2), "Unknown"),
broadcast, B∕CASTλ(HSTACK(region, year, amount), category),
return, HSTACK(category, broadcast),
return
)
);/* FUNCTION NAME: B∕CASTλ
DESCRIPTION: Broadcasts the terms of a vector over the shape of a second array */
B∕CASTλ = LAMBDA(vector, array, IF({1}, vector, array));The key to making the formula work is the function BYROWλ that I wrote to generalise the inbuilt but over-restrictive BYROW function.
The PIVOTBY function returned the required crosstab from the normalised data array
This won't win the award for shortest function. I didn't set out to make a generalized function, but a modular approach using existing functions in my library allowed me to pull this together quickly.
//----------------------------------------------------------------------------------- //PivotbyCatλ //----------------------------------------------------------------------------------- // Author: Patrick H // Date: 1/18/2026 // Description: // Pivot by row fields (one of which is a date), categories, and values. // It's presumed the categories array must be split prior to aggregation so a // delimiter must be specified. With optional parameters for aggregation type (function) // and date display (date_format). // Parameters: // row_detail - 1D Array of non-dates. // row_dates - 1D Array of dates. // categories - 1D Array containing category information strung together with a delimiter. // values - 1D Array of numbers. // delimiter - Delimiter used in the categories array. // // Optional Parameters: // [date_format] - Format code for dates array. If omitted, defaults to "yyy". // [function] - ETA or Lambda function for aggregation. If omitted, defaults to SUM. // // Lambdas called: Streakλ, RepeatRowsλ PivotbyCatλ= LAMBDA( row_detail,row_dates,categories,values,delimiter,[date_format],[function], //Validate inputs LET( //Dimensions r₁, ROWS(row_detail), r₂, ROWS(row_dates), r₃, ROWS(categories), r₄, ROWS(values), //Halting scenarios ShapeMismatch, OR(r₁ <> r₂,r₁ <> r₃,r₁ <> r₄), InvalidDates, OR(AND(ISERROR(DATEVALUE(TEXT(row_dates,date_format))), ISERROR(VALUE(TEXT(row_dates,date_format)))), NOT(AND(ISNUMBER(row_dates)))), InvalidFn, AND(NOT(ISOMITTED(function)), ISERROR(BYROW({1,1;2,2},function))), scenarios, VSTACK(ShapeMismatch,InvalidDates,InvalidFn), errors, {"#DIMENSIONS!";"#DATES!";"#INVALID-FUNCTION!"}, //Logic gate FILTER(errors,scenarios, //Proceed LET( //Optional parameters fnλ, IF(ISOMITTED(function),SUM,function), date_format,IF(ISOMITTED(date_format),"yyy",date_format), //Local functions CountCatλ, LAMBDA(row,COUNTA(TEXTSPLIT(row,delimiter))), Parseλ, LAMBDA(array,instance, TEXTBEFORE(TEXTAFTER(delimiter&array&delimiter, delimiter,instance,,,array),delimiter,,,,categories)), //Categories in each row k, BYROW(categories,CountCatλ), //Indices i, RepeatRowsλ(SEQUENCE(r₁),k), j, Streakλ(i), //Row fields a, RepeatRowsλ(row_detail,k), detail, IF(a= 0, "Unknown",a), dates, RepeatRowsλ(TEXT(row_dates,date_format),k), sort_key, IFERROR(DATEVALUE(dates),VALUE(dates)), row_fields, HSTACK(sort_key,detail,dates), //Column fields unpack, RepeatRowsλ(categories,k), categories, Parseλ(unpack,j), //Value field values, RepeatRowsλ(values / k,k), //Aggregation agg, DROP(PIVOTBY(row_fields,categories,values,fnλ,,0,,0),,1), agg )))); //----------------------------------------------------------------------------------- //Streakλ //----------------------------------------------------------------------------------- // Author: Patrick H // Date: 12/22/25 // Description: // Calculate streaks of consecutive values within 1D or 2D arrays. // // Parameters: // array - Array (1D or 2D). // // Optional Parameters: // [Horizontal?] - For 1D arrays, calculate streaks column‑wise instead of row‑wise. // [Include_Symbol?] - TRUE to include the symbol with streak counts (e.g., W·1, W·2, L·1). Streakλ= LAMBDA( array,[Horizontal?],[Include_Symbol?], // Handle scalar input IF(AND(ROWS(array)=1,COLUMNS(array)=1),array, LET( //Dimensions i, ROWS(array), j,COLUMNS(array), Is2D?, AND(i>1,j>1), //Flatten if 2D array array, IF(Is2D?,TOCOL(array),array), //Direction for stacking NoDir?, ISOMITTED(Horizontal?), fn, IF(NoDir?,VSTACK,HSTACK), //Build true/false array for streak detection kept, IF(NoDir?,DROP(array,-1),DROP(array,,-1)), bool, array=fn("omit",kept), //Accumulate streak counts acc, SCAN(0,bool,LAMBDA(a,v,IF(a=0,1,IF(v,a+1,1)))), //Optionally include symbol with streak counts result, IF(ISOMITTED(Include_Symbol?),acc, array&UNICHAR(183)&acc), //Wrap result back to 2D if original array was 2D deliver,IF(Is2D?,WRAPROWS(result,j),result), deliver ))); //----------------------------------------------------------------------------------- //RepeatRowsλ //----------------------------------------------------------------------------------- //Author: Patrick H. //Date: 11/7/2025 //Version: 1.0 //Description: //Repeat each row in the supplied array by a corresponding repeat count. //Arrays and scalars are supported. //----------------------------------------------------------------------------------- //Parameters: //array - array to be repeated //repeat - 1D array of repeat integers RepeatRowsλ = LAMBDA( array, repeat, //Check inputs IF(OR(ISTEXT(repeat),repeat<=0),#VALUE!, LET( //Flatten input repeat, TOCOL(repeat), //Dimensions and row indexing A↕, ROWS(array), R↕,ROWS(repeat), r, IF(A↕<>R↕,EXPAND(repeat,A↕,,@TAKE(repeat,-1)), repeat), i, SEQUENCE(ROWS(r)), m, MAX(r), idx, LAMBDA(TOCOL(IF(SIGN(r-SEQUENCE(,m,0))=1,i,NA()),2)), //Unwrap idx but defer delivery until function invocation deliver, LAMBDA(CHOOSEROWS(array,idx())), deliver ))());Pivot by row fields (one of which is a date), categories, and values.
7 Replies
- PeterBartholomew1Silver Contributor
Some amazing solutions!
All of which serve to demonstrate that Excel has moved lightyears from where is used to be.
How long before such solutions are overtaken by AI, I have no idea. Mind you, that may be where I delete 365 from my computer and find other interests. You never know though. It is strange to think that mathematical notation was developed to overcome to ambiguity of natural language and here we are hell-bent on reversing the process.
- Patrick2788Silver Contributor
This won't win the award for shortest function. I didn't set out to make a generalized function, but a modular approach using existing functions in my library allowed me to pull this together quickly.
//----------------------------------------------------------------------------------- //PivotbyCatλ //----------------------------------------------------------------------------------- // Author: Patrick H // Date: 1/18/2026 // Description: // Pivot by row fields (one of which is a date), categories, and values. // It's presumed the categories array must be split prior to aggregation so a // delimiter must be specified. With optional parameters for aggregation type (function) // and date display (date_format). // Parameters: // row_detail - 1D Array of non-dates. // row_dates - 1D Array of dates. // categories - 1D Array containing category information strung together with a delimiter. // values - 1D Array of numbers. // delimiter - Delimiter used in the categories array. // // Optional Parameters: // [date_format] - Format code for dates array. If omitted, defaults to "yyy". // [function] - ETA or Lambda function for aggregation. If omitted, defaults to SUM. // // Lambdas called: Streakλ, RepeatRowsλ PivotbyCatλ= LAMBDA( row_detail,row_dates,categories,values,delimiter,[date_format],[function], //Validate inputs LET( //Dimensions r₁, ROWS(row_detail), r₂, ROWS(row_dates), r₃, ROWS(categories), r₄, ROWS(values), //Halting scenarios ShapeMismatch, OR(r₁ <> r₂,r₁ <> r₃,r₁ <> r₄), InvalidDates, OR(AND(ISERROR(DATEVALUE(TEXT(row_dates,date_format))), ISERROR(VALUE(TEXT(row_dates,date_format)))), NOT(AND(ISNUMBER(row_dates)))), InvalidFn, AND(NOT(ISOMITTED(function)), ISERROR(BYROW({1,1;2,2},function))), scenarios, VSTACK(ShapeMismatch,InvalidDates,InvalidFn), errors, {"#DIMENSIONS!";"#DATES!";"#INVALID-FUNCTION!"}, //Logic gate FILTER(errors,scenarios, //Proceed LET( //Optional parameters fnλ, IF(ISOMITTED(function),SUM,function), date_format,IF(ISOMITTED(date_format),"yyy",date_format), //Local functions CountCatλ, LAMBDA(row,COUNTA(TEXTSPLIT(row,delimiter))), Parseλ, LAMBDA(array,instance, TEXTBEFORE(TEXTAFTER(delimiter&array&delimiter, delimiter,instance,,,array),delimiter,,,,categories)), //Categories in each row k, BYROW(categories,CountCatλ), //Indices i, RepeatRowsλ(SEQUENCE(r₁),k), j, Streakλ(i), //Row fields a, RepeatRowsλ(row_detail,k), detail, IF(a= 0, "Unknown",a), dates, RepeatRowsλ(TEXT(row_dates,date_format),k), sort_key, IFERROR(DATEVALUE(dates),VALUE(dates)), row_fields, HSTACK(sort_key,detail,dates), //Column fields unpack, RepeatRowsλ(categories,k), categories, Parseλ(unpack,j), //Value field values, RepeatRowsλ(values / k,k), //Aggregation agg, DROP(PIVOTBY(row_fields,categories,values,fnλ,,0,,0),,1), agg )))); //----------------------------------------------------------------------------------- //Streakλ //----------------------------------------------------------------------------------- // Author: Patrick H // Date: 12/22/25 // Description: // Calculate streaks of consecutive values within 1D or 2D arrays. // // Parameters: // array - Array (1D or 2D). // // Optional Parameters: // [Horizontal?] - For 1D arrays, calculate streaks column‑wise instead of row‑wise. // [Include_Symbol?] - TRUE to include the symbol with streak counts (e.g., W·1, W·2, L·1). Streakλ= LAMBDA( array,[Horizontal?],[Include_Symbol?], // Handle scalar input IF(AND(ROWS(array)=1,COLUMNS(array)=1),array, LET( //Dimensions i, ROWS(array), j,COLUMNS(array), Is2D?, AND(i>1,j>1), //Flatten if 2D array array, IF(Is2D?,TOCOL(array),array), //Direction for stacking NoDir?, ISOMITTED(Horizontal?), fn, IF(NoDir?,VSTACK,HSTACK), //Build true/false array for streak detection kept, IF(NoDir?,DROP(array,-1),DROP(array,,-1)), bool, array=fn("omit",kept), //Accumulate streak counts acc, SCAN(0,bool,LAMBDA(a,v,IF(a=0,1,IF(v,a+1,1)))), //Optionally include symbol with streak counts result, IF(ISOMITTED(Include_Symbol?),acc, array&UNICHAR(183)&acc), //Wrap result back to 2D if original array was 2D deliver,IF(Is2D?,WRAPROWS(result,j),result), deliver ))); //----------------------------------------------------------------------------------- //RepeatRowsλ //----------------------------------------------------------------------------------- //Author: Patrick H. //Date: 11/7/2025 //Version: 1.0 //Description: //Repeat each row in the supplied array by a corresponding repeat count. //Arrays and scalars are supported. //----------------------------------------------------------------------------------- //Parameters: //array - array to be repeated //repeat - 1D array of repeat integers RepeatRowsλ = LAMBDA( array, repeat, //Check inputs IF(OR(ISTEXT(repeat),repeat<=0),#VALUE!, LET( //Flatten input repeat, TOCOL(repeat), //Dimensions and row indexing A↕, ROWS(array), R↕,ROWS(repeat), r, IF(A↕<>R↕,EXPAND(repeat,A↕,,@TAKE(repeat,-1)), repeat), i, SEQUENCE(ROWS(r)), m, MAX(r), idx, LAMBDA(TOCOL(IF(SIGN(r-SEQUENCE(,m,0))=1,i,NA()),2)), //Unwrap idx but defer delivery until function invocation deliver, LAMBDA(CHOOSEROWS(array,idx())), deliver ))());Pivot by row fields (one of which is a date), categories, and values.
- LorenzoSilver Contributor
PQ option
- Abuses of Table.ReplaceValue (can cause perf. issue on large table)
- Doesn't use Table.Pivot
// TableSource let Source = Excel.CurrentWorkbook(){[Name="TableSource"]}[Content], ChangedType = Table.TransformColumnTypes( Source, { {"Amount", type number} } ) in ChangedType // TablePivoted let Source = TableSource, ReplacedOrderDateAndRegion = Table.ReplaceValue( Source, each [OrderDate], "Unknown", // Works with the given source table. Otherwise would also replace null with "Unknown" in [OrderDate] (x,y,z) => if x is null then z else Date.Year( DateTime.Date(x) ), {"OrderDate", "Region"} ), FlaggedMultiCategories = Table.AddColumn( ReplacedOrderDateAndRegion, "Flag", each Text.Contains( [Categories], ";" ), type logical ), SplittedCategoriesAndAmount = Table.Combine( { fxFilterTable( FlaggedMultiCategories, "Flag", false ), Table.ExpandListColumn( Table.ReplaceValue( fxFilterTable( FlaggedMultiCategories, "Flag", true ), each [Amount], each Text.Split( [Categories], ";" ), (x,y,z) => if Value.Type(x) = type number then x / List.Count(z) else z, {"Amount", "Categories"} ), "Categories" ) } ), TotalAmounts = Table.Group( SplittedCategoriesAndAmount, {"OrderDate", "Region", "Categories"}, {"Amount", each List.Sum( [Amount] ), type number} ), DistinctCategories = List.Buffer( List.Sort( List.Distinct( Table.Column( TotalAmounts, "Categories") ) ) ), PivotedCategories = Table.Group( TotalAmounts, {"Region", "OrderDate"}, { "Pivoted", (nested_table) => fxPivotTable( nested_table, "Categories", "Amount"), fxTypeForTable( DistinctCategories, type nullable number) } ), SortedRegionAndOrderDate = Table.Sort( PivotedCategories, { {"Region", Order.Ascending}, {"OrderDate", Order.Ascending} } ), ExpandedPivotedTables = Table.ExpandTableColumn( SortedRegionAndOrderDate, "Pivoted", DistinctCategories ), ReplacedNullAmounts = Table.ReplaceValue( ExpandedPivotedTables, null, 0, Replacer.ReplaceValue, DistinctCategories ), TypedOtherColumns = Table.TransformColumnTypes( ReplacedNullAmounts, { {"Region", type text}, {"OrderDate", Int64.Type} } ) in TypedOtherColumns // fxFilterTable (table as table, filter_field as text, filter_value as any) as table => Table.RemoveColumns( Table.SelectRows( table, each Record.Field( _, filter_field ) = filter_value ), filter_field ) // fxPivotTable ( table as table, optional attributeColumn as nullable text, optional valueColumn as nullable text ) as table => Table.Transpose( Table.SelectColumns( table, if valueColumn is null then "Value" else valueColumn ), Table.Column( table, if attributeColumn is null then "Attribute" else attributeColumn ) ) // fxTypeForTable (List as list, Type as type) as type => type table Type.ForRecord( List.Accumulate( List, [], (state,current) => Record.AddField( state, current, [Type = Type, Optional=false] ) ), false )
- djclementsSilver Contributor
Good stuff, Peter! Nice demonstration of the power of your custom function collection for evaluating nested arrays.
I see the majority of the other dynamic array solutions offered up in the comments section of that LinkedIn post use some variation of DROP-REDUCE-VSTACK to perform each step on a row-by-row basis. In this particular situation, however, it is possible to conduct a TOCOL-broadcasting method (unpivot) over the entire table and pass the normalized fields directly to PIVOTBY:
=LET( num, LEN(Categories)-LEN(SUBSTITUTE(Categories,";",))+1, arr, TEXTBEFORE(TEXTAFTER(";"&Categories&";",";",SEQUENCE(,MAX(num))),";"), fnλ, LAMBDA(err,LAMBDA(vec,TOCOL(IF(err,arr,vec),2)))(ISERROR(arr)), PIVOTBY(HSTACK(fnλ(IF(ISBLANK(Region),"Unknown",Region)),fnλ(YEAR(OrderDate))),TOCOL(arr,2),fnλ(Amount/num),SUM,0,0,,0) )Or with CHOOSEROWS and a single broadcast of the table row IDs:
=LET( num, LEN(Categories)-LEN(SUBSTITUTE(Categories,";",))+1, arr, TEXTBEFORE(TEXTAFTER(";"&Categories&";",";",SEQUENCE(,MAX(num))),";"), fnλ, LAMBDA(rId,LAMBDA(vec,CHOOSEROWS(vec,rId)))(TOCOL(IF(ISERROR(arr),arr,SEQUENCE(ROWS(arr))),2)), PIVOTBY(HSTACK(fnλ(IF(ISBLANK(Region),"Unknown",Region)),fnλ(YEAR(OrderDate))),TOCOL(arr,2),fnλ(Amount/num),SUM,0,0,,0) )Kind regards.
- PeterBartholomew1Silver Contributor
David, Interesting solutions! I think that, where possible, obtaining 2D arrays by broadcasting vectors is more efficient than resorting to arrays of functions. Against that, I have been trying to avoid idiosyncratic solutions that are specific to a problem and also to increase my use of arrays of functions (so called 1st class citizens of the new environment). The trouble is that I am so focussed on arrays of functions as the base element of all calculation that I sometimes overlook broadcasting where it is applicable.
- NikolinoDEPlatinum Contributor
This is a very solid—and very modern—Excel solution. You’ve essentially built a small, well-thought-out transformation pipeline using LAMBDA functions.
The way you normalize the data first and only then pivot is especially elegant, and your custom BYROW approach neatly sidesteps one of Excel’s more frustrating limitations.
A great example of how far “formula-only” Excel has come, and a very instructive solution to the challenge.
Thank you for sharing it with us.
- PeterBartholomew1Silver Contributor
The interface will not let me back to correct typos and has stripped out the workbook. Sorry about that!