Forum Discussion

dipankar_de's avatar
dipankar_de
Copper Contributor
Nov 12, 2022

Using TEXTSPLIT without copy-fill-down

Hello All,

 

I am trying to split "; " separated range of text into columns. The intent is to use one single spill-formula (instead of copy-fill-down) to address each text in the range. 

 

I have the formula to do this for a small set of values using CONCAT and TEXTSPLIT functions. However, when I try doing it with a set of values that run beyond 32,767 characters, CONCAT function runs into a #VALUE error because of Cell value length restriction in Excel.

 

This is because using the CONCAT function I was trying to create a long-long string taking the col_delimited values in a cell in each row and padding a row_delimiter at the end of the cell value in each row for the entire range. The objective was to submit the long-long string, so created using the CONCAT function, to the TEXTSPLIT function to split it into columns and rows based on the col_delimiter and row_delimeter characters. 

 

The solution to this, I believe, is to split values in each row (cell) individually. However, I want to get this done using one single array formula for the entire range, which unfortunately, I am unable to accomplish. I have tried using TEXTSPLIT within MAP-LAMBDA to process each row individually, but it only splits the first element of the col_delimited string in each row - the remaining of col_delimited string in each row is ignored.

 

Request you all to help me with a single-formula spill-array solution to the problem I have. An example case is included in the file (TEXTSPLIT.xlsx) with the range to split (Table2). The single-formula solution that works for a small set of values is given in Cell E3. The single formula solution that failed to work as desired (TEXTSPLIT within MAP-LAMBDA) is available in Cell P3.

 

Please also let me know why this didn't work even when processing each row individually using MAP-LAMBDA.

 

Thanks in advance.

 

Warm regards,

Dipankar De

 

23 Replies

  • The array of arrays should not be a problem; it is simply the typical output from array calculation.  Since the original date of this post, I have written helper functions (with a bit of tidying up from djclements​ ) that return arrays of arrays.

    = MAPλ(values, LAMBDA(values, TEXTSPLIT(values, ";")))

    or by Currying the LAMBDA function

    Worksheet formula
    = MAPλ(values, TEXTSPLITλ(";"));
    where
    TEXTSPLITλ
    = LAMBDA(s, LAMBDA(v, TEXTSPLIT(v, s)));

    The function MAPλ is to be found at

    A version of Excel MAP helper function that will return an array of arrays

  • Hi, isn't the following more simple?

    =HSTACK(Table2['#],IFERROR(TEXTSPLIT(TEXTJOIN("|",,Table2[Values]),";","|"),""))

     

    • SergeiBaklan's avatar
      SergeiBaklan
      Diamond Contributor

      TEXTJOIN has the limit on combined text length, thus not safe.

  • =HSTACK(Table2['#],IFERROR(TEXTSPLIT(TEXTJOIN("|",,Table2[Values]),";","|"),""))

     

  • djclements's avatar
    djclements
    Bronze Contributor

    dipankar_de Here's an updated variant, using a TEXTBEFORE/TEXTAFTER based LAMBDA function:

     

    =HSTACK(Table2['#], TEXTTOCOLS(Table2[Values], "; "))

     

    - OR -

     

    =TEXTTOCOLS(Table2['#]&"; "&Table2[Values], "; ")

     

    Where TEXTTOCOLS is defined in Name Manager as:

     

    =LAMBDA(array,delimiter,
        LET(
            delims, (LEN(array)-LEN(SUBSTITUTE(array, delimiter, )))/LEN(delimiter),
            cols, MAX(delims)+1,
            arr, delimiter&array&REPT(delimiter, cols-delims),
            TEXTBEFORE(TEXTAFTER(arr, delimiter, SEQUENCE(, cols)), delimiter)
        )
    )

     

    This method is similar to dscheikey's solution in that it avoids the character limitations imposed on TEXTJOIN and CONCAT and is more efficient than the iterative alternatives, REDUCE/VSTACK and MAKEARRAY/INDEX, when working with larger datasets.

    • PeterBartholomew1's avatar
      PeterBartholomew1
      Silver Contributor

      djclements 

      The same idea but with slightly different strengths and weaknesses.

      TEXTBETWEEN
      = LAMBDA(string, start, [end], [n],
          LET(
              end, IF(ISOMITTED(end), start, end),
              k, SEQUENCE(1, n),
              TEXTAFTER(TEXTBEFORE(string, end, k,,1), start,-1,,1)
          )
        )

      The start and end are separators that can be used to extract substrings.  I have left it to the programmer to determine the number of terms to return. 

      Basically, though, I agree that your suggestion is a good way around the pitiful 'array of arrays' problem.

       

       

      • djclements's avatar
        djclements
        Bronze Contributor

        PeterBartholomew1 Good stuff. Looks like you got the syntax figured out for the AFTER/BEFORE arrangement (as opposed to BEFORE/AFTER). I did notice one little glitch, though... when the length of the delimiter is greater than the length of the value in the first field (before the first delimiter), it will return #VALUE!. Interestingly enough, a similar glitch occurred for me when I tried using the "Match to end" option (with a shorter version of TEXTTOCOLS), where it omitted the value in the last field if its length was less than the delimiter.

         

        Another version of TEXTTOCOLS that does appear to work:

         

        =LAMBDA(array,delimiter,
            LET(
                arr, delimiter&array&delimiter,
                cols, SEQUENCE(, MAX(LEN(arr)-LEN(SUBSTITUTE(arr, delimiter, )))/LEN(delimiter)-1),
                IFERROR(TEXTBEFORE(TEXTAFTER(arr, delimiter, cols), delimiter), "")
            )
        )

         

        ...but it relies on the IFERROR function to handle any #VALUE! or #N/A errors.

         

        This all came about after responding to another post, Counting the number of occurrences of an entity in a table, where I used virtually the same technique to generate a unique list of items from a range of delimited strings with varying lengths. Sure, TEXTSPLIT/TEXTJOIN could've been used there, but I don't like the character limitations of that method.

         

        I'm with you 100% that nested arrays should be allowed. Iterative alternatives, while useful, are typically not very efficient with larger datasets. In some situations, the right combination of dynamic array helper functions can be used to efficiently simulate an array of arrays, but it usually requires multiple steps involving various manipulations/transformations, which can be a tough sell for the average user. One such example can be found here: Multi-Row Data Consolidation Into Single Row, Multi-Column 

         

        Cheers! 🙂

  • SergeiBaklan's avatar
    SergeiBaklan
    Diamond Contributor

    dipankar_de 

    I'd simplify to

    =LET(
       data, HSTACK( Table2['#],  TEXTSPLIT( TEXTJOIN("|",,Table2[Values]), ";","|",,,"") ),
       header, HSTACK( {"#","Key"}, "Value" & SEQUENCE(, COLUMNS(data) - 2 ) ),
       VSTACK( header, data )
     )
    • dipankar_de's avatar
      dipankar_de
      Copper Contributor
      Hello Sergei,

      This doesn't work when the length of the string, containing col_delimited text in the range, is more than 32,767 characters due to Excel limitations. This is the reason I requested for help on this forum for a single spill array formula that will process each col_delimited string in each row individually but will output the results for the entire range in one go.

      Both CONCAT and TEXTJOIN have the 32,767-character restriction and hence the problem.

      the example spreadsheet can be found at https://1drv.ms/x/s!Aqyz05eCPz3-rWwakzdAFKiYsjtH?e=ydgpxa in which you can look through any of the worksheets than the TEXTSPLIT worksheet for the near real-world problem.

      Warm regards,
      Dipankar De
  • dipankar_de 

    I think the abject failure to process arrays or arrays is the most ridiculous error of judgement made by Microsoft in what is otherwise a suburb sequence of innovations, turning Excel from amateurs' corner into something capable of creating reusable, quality solutions.  One of the principal benefits of the grid is that it allows one to calculate and display arrays of arrays.  To pretend otherwise is to store up huge compatibility problems for when sense eventually prevails. 

     

    To offer an absurd solution, I developed a custom Lambda function "Repackλ" to turn the comma-deliminated values into fixed width strings

     

    Repackλ 
    = LAMBDA(t,
        LET(
            splitText, TEXTSPLIT(t, ","),
            length,    LEN(splitText),
            CONCAT(splitText & REPT(" ", 10 - length))
        )
    );

     

    giving

    What appears to be a monumental waste of effort pays off because MID will generate and array of arrays if used to wrap the Lambda function.

     

    = MID( MAP(Text, Repackλ), {1,11,21,31,41}, 10)

     

    giving

    Having to go back to MID to achieve something that 'Excel has never done?' is so annoying that I propose to continue expressing my scorn at every opportunity (and there are many opportunities, which makes the need for change so urgent).  I hope you all join in to make Microsoft listen even if just to save themselves further embarrassment!

     

     

     

    • dipankar_de's avatar
      dipankar_de
      Copper Contributor

      dscheikeymtarler; Patrick2788; PeterBartholomew1

       

      Thanks to each of you for providing me with the exact help I was looking for on this forum.

       

      dscheikey: Your formula might be an old school one but it did solve my problem. I created a similar formula but that was a copy-fill-down type - one formula per row which was restricting me to use the P3# spill-array addressing in subsequent processing. Many thanks for your help - I really liked the  start and end formula snippets. The way I did it in my copy-fill-down formula was:

      =LET(a, $B5, b, $C5, data, "|" & SUBSTITUTE(b, "; ", "|") & "|",

               WrdCnt, LEN(data) - LEN(SUBSTITUTE(data, "|", "")),
               start, FIND(CHAR(2), SUBSTITUTE(data, "|", CHAR(2), SEQUENCE(, WrdCnt))),

               end, SEQUENCE(1, COLUMNS(start) - 1),
               O, MID(data, INDEX(start, end) + 1, INDEX(start, end + 1) - INDEX(start, end) - 1), HSTACK(a, O)),

       

      Seems, the way I had written the "end" was restricting the row-spill when extended to the complete range of rows. Thanks again.

       

      mtarler: You provided the new school formula that I was looking for but was encountering errors due to the "array of arrays" issue. The formula with REDUCE, VSTACK, TEXTSPLIT did the breakthrough. However, there is still an issue which I am not able to resolve - the output generated by the formula is changing the case of the strings to lowercase. Please look through the "Solution - mtarler" worksheet in the attached. Do you see a reason why the case is getting changed and what is changing it?

       

      https://1drv.ms/x/s!Aqyz05eCPz3-rWwakzdAFKiYsjtH?e=ydgpxa

       

      Patrick2788: Thanks for sharing the link to the "Array of Arrays" limitation literature. I haven't had tried your formula on the actual problem (string length more than 32,767 characters) yet. However, looking at it, I see you have hardcoded the number of columns in the MAKEARRAY function parameters. Anyway, this is a small thing to rectify by taking the max of the col_delimiter count. Will let you know once I have looked through your formula.

       

      PeterBartholomew1: You and Sergei are simply the Best. Though I can follow the solutions you provide, it would take quite some time for me to reach that level. I completely agree with you that going back to MID is really annoying after having seen the power of new age Excel functions.

      • PeterBartholomew1's avatar
        PeterBartholomew1
        Silver Contributor

        dipankar_de 

        I have 'interfered' with Matt's solution to add further refinement or complexity (however you take it) to create a worksheet function

        = SplitListλ(Response,"; ",185)

        which uses the Lambda funtions

        SplitListλ
        = LAMBDA(list,s,n,
              DROP(
                  REDUCE("", list, SplitTextλ(s,n)),
              1,0)
          );
        
        
        SplitTextλ 
        = LAMBDA(s,n, LAMBDA(p, q, 
              VSTACK(p, 
                  EXPAND(TEXTSPLIT(q, s), , n, "")
              )
          ));

         

  • Patrick2788's avatar
    Patrick2788
    Silver Contributor

    dipankar_de 

    Here's my take:

     

    =LET(matrix,MAKEARRAY(ROWS(Table2),9,LAMBDA(r,c,CHOOSECOLS(EXPAND(TEXTSPLIT(INDEX(Table2[Values],r),";"),1,9,""),c))),HSTACK(Table2['#],matrix))

     

  • dscheikey's avatar
    dscheikey
    Bronze Contributor

    dipankar_de 

    At first I thought that this could be solved quite simply with TEXTSPLIT() within BYROW(). But that always resulted in a CALC# error.

    I then solved this rather oldscool with MID() SEARCH() and SEQUENCE().

    There are probably other variants.

     

    =LET(
    in,SUBSTITUTE("; "&Table2[Values]&"; "," ",""),
    maxi,MAX(LEN(in)-LEN(SUBSTITUTE(in,";","")))-1,
    start,SEARCH(CHAR(124),SUBSTITUTE(in,";",CHAR(124),SEQUENCE(1,maxi,1))),
    end,SEARCH(CHAR(124),SUBSTITUTE(in,";",CHAR(124),SEQUENCE(1,maxi,2))),
    HSTACK(Table2['#],IFERROR(MID(in,start+1,end-start-1),"")))

     

     

    I hope that's OK for you.

    • mtarler's avatar
      mtarler
      Silver Contributor

      dscheikey  alternative using REDUCE and VSTACK:

      =LET(colOne, Table2['#], colRest, Table2[Values], splittedData, 
          IFERROR(DROP(
             REDUCE("",colRest, 
                LAMBDA(p,q, VSTACK(p, TEXTSPLIT(q, "; ", , FALSE, 1, "")))
                )
             ,1),""),
          HSTACK(colOne, splittedData))

Resources