Nov 11 2022 09:17 PM
Hello All,
I am trying to split "; " separated range of text into columns. The intent is to use one single spill-formula (instead of copy-fill-down) to address each text in the range.
I have the formula to do this for a small set of values using CONCAT and TEXTSPLIT functions. However, when I try doing it with a set of values that run beyond 32,767 characters, CONCAT function runs into a #VALUE error because of Cell value length restriction in Excel.
This is because using the CONCAT function I was trying to create a long-long string taking the col_delimited values in a cell in each row and padding a row_delimiter at the end of the cell value in each row for the entire range. The objective was to submit the long-long string, so created using the CONCAT function, to the TEXTSPLIT function to split it into columns and rows based on the col_delimiter and row_delimeter characters.
The solution to this, I believe, is to split values in each row (cell) individually. However, I want to get this done using one single array formula for the entire range, which unfortunately, I am unable to accomplish. I have tried using TEXTSPLIT within MAP-LAMBDA to process each row individually, but it only splits the first element of the col_delimited string in each row - the remaining of col_delimited string in each row is ignored.
Request you all to help me with a single-formula spill-array solution to the problem I have. An example case is included in the file (TEXTSPLIT.xlsx) with the range to split (Table2). The single-formula solution that works for a small set of values is given in Cell E3. The single formula solution that failed to work as desired (TEXTSPLIT within MAP-LAMBDA) is available in Cell P3.
Please also let me know why this didn't work even when processing each row individually using MAP-LAMBDA.
Thanks in advance.
Warm regards,
Dipankar De
Nov 12 2022 06:08 AM - edited Nov 12 2022 06:17 AM
At first I thought that this could be solved quite simply with TEXTSPLIT() within BYROW(). But that always resulted in a CALC# error.
I then solved this rather oldscool with MID() SEARCH() and SEQUENCE().
There are probably other variants.
=LET(
in,SUBSTITUTE("; "&Table2[Values]&"; "," ",""),
maxi,MAX(LEN(in)-LEN(SUBSTITUTE(in,";","")))-1,
start,SEARCH(CHAR(124),SUBSTITUTE(in,";",CHAR(124),SEQUENCE(1,maxi,1))),
end,SEARCH(CHAR(124),SUBSTITUTE(in,";",CHAR(124),SEQUENCE(1,maxi,2))),
HSTACK(Table2['#],IFERROR(MID(in,start+1,end-start-1),"")))
I hope that's OK for you.
Nov 12 2022 08:10 AM
Nov 12 2022 09:07 AM
@dscheikey alternative using REDUCE and VSTACK:
=LET(colOne, Table2['#], colRest, Table2[Values], splittedData,
IFERROR(DROP(
REDUCE("",colRest,
LAMBDA(p,q, VSTACK(p, TEXTSPLIT(q, "; ", , FALSE, 1, "")))
)
,1),""),
HSTACK(colOne, splittedData))
Nov 12 2022 11:43 AM
Here's my take:
=LET(matrix,MAKEARRAY(ROWS(Table2),9,LAMBDA(r,c,CHOOSECOLS(EXPAND(TEXTSPLIT(INDEX(Table2[Values],r),";"),1,9,""),c))),HSTACK(Table2['#],matrix))
Nov 12 2022 01:07 PM - edited Nov 12 2022 01:09 PM
I think the abject failure to process arrays or arrays is the most ridiculous error of judgement made by Microsoft in what is otherwise a suburb sequence of innovations, turning Excel from amateurs' corner into something capable of creating reusable, quality solutions. One of the principal benefits of the grid is that it allows one to calculate and display arrays of arrays. To pretend otherwise is to store up huge compatibility problems for when sense eventually prevails.
To offer an absurd solution, I developed a custom Lambda function "Repackλ" to turn the comma-deliminated values into fixed width strings
Repackλ
= LAMBDA(t,
LET(
splitText, TEXTSPLIT(t, ","),
length, LEN(splitText),
CONCAT(splitText & REPT(" ", 10 - length))
)
);
giving
What appears to be a monumental waste of effort pays off because MID will generate and array of arrays if used to wrap the Lambda function.
= MID( MAP(Text, Repackλ), {1,11,21,31,41}, 10)
giving
Having to go back to MID to achieve something that 'Excel has never done?' is so annoying that I propose to continue expressing my scorn at every opportunity (and there are many opportunities, which makes the need for change so urgent). I hope you all join in to make Microsoft listen even if just to save themselves further embarrassment!
Nov 12 2022 11:37 PM
@dscheikey; @mtarler; @Patrick2788; @PeterBartholomew1
Thanks to each of you for providing me with the exact help I was looking for on this forum.
@dscheikey: Your formula might be an old school one but it did solve my problem. I created a similar formula but that was a copy-fill-down type - one formula per row which was restricting me to use the P3# spill-array addressing in subsequent processing. Many thanks for your help - I really liked the start and end formula snippets. The way I did it in my copy-fill-down formula was:
=LET(a, $B5, b, $C5, data, "|" & SUBSTITUTE(b, "; ", "|") & "|",
WrdCnt, LEN(data) - LEN(SUBSTITUTE(data, "|", "")),
start, FIND(CHAR(2), SUBSTITUTE(data, "|", CHAR(2), SEQUENCE(, WrdCnt))),
end, SEQUENCE(1, COLUMNS(start) - 1),
O, MID(data, INDEX(start, end) + 1, INDEX(start, end + 1) - INDEX(start, end) - 1), HSTACK(a, O)),
Seems, the way I had written the "end" was restricting the row-spill when extended to the complete range of rows. Thanks again.
@mtarler: You provided the new school formula that I was looking for but was encountering errors due to the "array of arrays" issue. The formula with REDUCE, VSTACK, TEXTSPLIT did the breakthrough. However, there is still an issue which I am not able to resolve - the output generated by the formula is changing the case of the strings to lowercase. Please look through the "Solution - @mtarler" worksheet in the attached. Do you see a reason why the case is getting changed and what is changing it?
https://1drv.ms/x/s!Aqyz05eCPz3-rWwakzdAFKiYsjtH?e=ydgpxa
@Patrick2788: Thanks for sharing the link to the "Array of Arrays" limitation literature. I haven't had tried your formula on the actual problem (string length more than 32,767 characters) yet. However, looking at it, I see you have hardcoded the number of columns in the MAKEARRAY function parameters. Anyway, this is a small thing to rectify by taking the max of the col_delimiter count. Will let you know once I have looked through your formula.
@PeterBartholomew1: You and Sergei are simply the Best. Though I can follow the solutions you provide, it would take quite some time for me to reach that level. I completely agree with you that going back to MID is really annoying after having seen the power of new age Excel functions.
Nov 13 2022 04:20 AM
I have 'interfered' with Matt's solution to add further refinement or complexity (however you take it) to create a worksheet function
= SplitListλ(Response,"; ",185)
which uses the Lambda funtions
SplitListλ
= LAMBDA(list,s,n,
DROP(
REDUCE("", list, SplitTextλ(s,n)),
1,0)
);
SplitTextλ
= LAMBDA(s,n, LAMBDA(p, q,
VSTACK(p,
EXPAND(TEXTSPLIT(q, s), , n, "")
)
));
Nov 13 2022 04:30 AM
I was looking through as you were refining the formula initially created by Matt. Two questions I have are:
1. In Matt's formula what was creating the lowercase?
2. What necessitates to include EXPAND with TEXTSPLIT? If we don't include EXPAND and instead provide all the parameters that TEXTSPLIT has, what would be the difference?
Nov 13 2022 04:37 AM
Nov 13 2022 06:33 AM
Nov 13 2022 08:22 AM
Nov 13 2022 11:22 AM
Nov 13 2022 11:27 AM
Nov 13 2022 11:35 AM
Nov 13 2022 11:45 AM
I'd simplify to
=LET(
data, HSTACK( Table2['#], TEXTSPLIT( TEXTJOIN("|",,Table2[Values]), ";","|",,,"") ),
header, HSTACK( {"#","Key"}, "Value" & SEQUENCE(, COLUMNS(data) - 2 ) ),
VSTACK( header, data )
)
Nov 14 2022 06:22 AM
Mar 17 2024 08:45 PM
@dipankar_de Here's an updated variant, using a TEXTBEFORE/TEXTAFTER based LAMBDA function:
=HSTACK(Table2['#], TEXTTOCOLS(Table2[Values], "; "))
- OR -
=TEXTTOCOLS(Table2['#]&"; "&Table2[Values], "; ")
Where TEXTTOCOLS is defined in Name Manager as:
=LAMBDA(array,delimiter,
LET(
delims, (LEN(array)-LEN(SUBSTITUTE(array, delimiter, )))/LEN(delimiter),
cols, MAX(delims)+1,
arr, delimiter&array&REPT(delimiter, cols-delims),
TEXTBEFORE(TEXTAFTER(arr, delimiter, SEQUENCE(, cols)), delimiter)
)
)
This method is similar to dscheikey's solution in that it avoids the character limitations imposed on TEXTJOIN and CONCAT and is more efficient than the iterative alternatives, REDUCE/VSTACK and MAKEARRAY/INDEX, when working with larger datasets.
Mar 18 2024 02:47 AM
The same idea but with slightly different strengths and weaknesses.
TEXTBETWEEN
= LAMBDA(string, start, [end], [n],
LET(
end, IF(ISOMITTED(end), start, end),
k, SEQUENCE(1, n),
TEXTAFTER(TEXTBEFORE(string, end, k,,1), start,-1,,1)
)
)
The start and end are separators that can be used to extract substrings. I have left it to the programmer to determine the number of terms to return.
Basically, though, I agree that your suggestion is a good way around the pitiful 'array of arrays' problem.
Mar 18 2024 09:47 PM - edited Mar 18 2024 11:59 PM
@PeterBartholomew1 Good stuff. Looks like you got the syntax figured out for the AFTER/BEFORE arrangement (as opposed to BEFORE/AFTER). I did notice one little glitch, though... when the length of the delimiter is greater than the length of the value in the first field (before the first delimiter), it will return #VALUE!. Interestingly enough, a similar glitch occurred for me when I tried using the "Match to end" option (with a shorter version of TEXTTOCOLS), where it omitted the value in the last field if its length was less than the delimiter.
Another version of TEXTTOCOLS that does appear to work:
=LAMBDA(array,delimiter,
LET(
arr, delimiter&array&delimiter,
cols, SEQUENCE(, MAX(LEN(arr)-LEN(SUBSTITUTE(arr, delimiter, )))/LEN(delimiter)-1),
IFERROR(TEXTBEFORE(TEXTAFTER(arr, delimiter, cols), delimiter), "")
)
)
...but it relies on the IFERROR function to handle any #VALUE! or #N/A errors.
This all came about after responding to another post, Counting the number of occurrences of an entity in a table, where I used virtually the same technique to generate a unique list of items from a range of delimited strings with varying lengths. Sure, TEXTSPLIT/TEXTJOIN could've been used there, but I don't like the character limitations of that method.
I'm with you 100% that nested arrays should be allowed. Iterative alternatives, while useful, are typically not very efficient with larger datasets. In some situations, the right combination of dynamic array helper functions can be used to efficiently simulate an array of arrays, but it usually requires multiple steps involving various manipulations/transformations, which can be a tough sell for the average user. One such example can be found here: Multi-Row Data Consolidation Into Single Row, Multi-Column
Cheers! 🙂