PQ: How to split a large column into 50 characters columns but split only at whitespace?

Iron Contributor

I have a large table (>80'000 rows) with a column (max width = 163) that I need to resize into smaller columns.

The rules are:

- split only at whitespaces

- column size max 50 characters

- 2 colums with with 50, third column with rest (charpos > 100).

 

I can use PQ to split at whitespaces OR at 50 chars.

But how can I split only at whitespaces while trying to make the first two columns 50 chars wide, if possible?

 

 

 

11 Replies

To resize a large table column in Microsoft Excel: Use the Text to Columns feature. Choose "Delimited" and select "Whitespace" as the delimiter. Set column widths to a maximum of 50 characters. For the third column, specify a width greater than 100 characters. This will efficiently split the specified column based on whitespace and meet the defined size criteria. [link removed by admin] 

 

 

@DanHuber 

 

Could you see how it goes with the attached PQ option?

Thanks, but I am looking into a PowerQuery (PQ) solution . I appologize for not writing this the right way...

Thanks @Lorenzo 

I have to admit, I am very hesitant to open an unknown excel file with external data connections on my working environment. I would like, though, since your solution interests me a lot.

 

Is there any way you could share differently? Perhaps with the M code directly?

 

I'm just very very careful. Or try to be :)

 

Daniel

 

@DanHuber 

 

I share your caution and you're absolutely right. Take a look at my profile, nobody complained so far...

If after that you still hesitate let me know and I'll see what I can do

@Lorenzo , is there a specific reason why your file is using an external data connection?

@DanHuber 

is there a specific reason why your file is using an external data connection?

Nope. That's the way it is for a moment with 365. TBH I never tried to figure out the reason

 

Query code is, assuming data formatted as table named Table, with your data in [Column1]:

 

 

 

let
    Source = Excel.CurrentWorkbook(){[Name="Table"]}[Content],

    PartOne = Table.AddColumn(Source, "Part1", each
        let
            range = try Text.Range([Column1], 0, 50)
        in
            if range[HasError] then [Column1]
            else if Text.At(range[Value], 49) =  " " then range[Value]
            else Text.BeforeDelimiter(range[Value], " ", {0,RelativePosition.FromEnd}) & " ",
        type text
    ),
    PartTwo = Table.AddColumn(PartOne, "Part2", each
        let
            after = Text.AfterDelimiter([Column1], [Part1]),
            range = try Text.Range(after, 0, 50)
        in
            if range[HasError] then after
            else if Text.At(range[Value], 49) =  " " then range[Value]
            else Text.BeforeDelimiter(range[Value], " ", {0,RelativePosition.FromEnd}) & " ",
        type text
    ),
    PartThree = Table.AddColumn(PartTwo, "Part3", each
        Text.AfterDelimiter([Column1], [Part1] & [Part2])
    ),
    KeptParts = Table.SelectColumns(PartThree, {"Part1", "Part2", "Part3"}),
    FixedNullLength = Table.ReplaceValue(KeptParts, each [Part1], null,
        (x,y,z) as nullable text => if Text.Length(x) = 0 then null else x,
        {"Part2", "Part3"}
    )
in
    FixedNullLength

 

NB: Last step (FixedNullLength) isn't necessary if you don't mind having non-null cells ("") in the output

@DanHuber 

There was an unecessary if in what I posted yesterday => Previous replies edited

@DanHuber 

 

Something cleaner with a function query named ExtractString:

 

 

// ExtractString
(string as text, max_length as number) as nullable text =>
let
    Source = try Text.Range(string, 0, max_length),
    Extracted = if Source[HasError] then string
        else if Text.At(Source[Value], max_length-1) =  " " then Source[Value]
        else Text.BeforeDelimiter(Source[Value], " ", {0, RelativePosition.FromEnd}) & " "
in
    if Text.Length(Extracted) = 0 then null else Extracted

 

 

 

Main query code:

 

 

// ColumnSplit2
let
    Source = Excel.CurrentWorkbook(){[Name="Table"]}[Content],

    PartOne = Table.AddColumn(Source, "Part1", each
        ExtractString([Column1], 50), type text
    ),
    PartTwo = Table.AddColumn(PartOne, "Part2", each
        ExtractString(Text.AfterDelimiter([Column1], [Part1]), 50), type text
    ),
    PartThree = Table.AddColumn(PartTwo, "Part3", each
        let
            remain = try Text.AfterDelimiter([Column1], [Part1] & [Part2])
        in
            if remain[HasError] then null
            else if Text.Length(remain[Value]) = 0 then null
            else remain[Value],
        type text
    ),
    KeptParts = Table.SelectColumns(PartThree, {"Part1", "Part2", "Part3"})
in
    KeptParts

 

 

EDIT

A generic solution in attached file 'PQ_SplitToColumnsByLengthAndDelim.xlsx' with 2 parameters in 'PARAM' sheet:

- Number of expected columns

- Max. characters per column

 

EDIT 2

PQ_SplitToColumnsByLengthAndDelim_Variant.xlsx only has PARAMeter Max. characters per column. The number of columns is calculated: RoundUp(MaxStringLength / MaxCharsPerColumn)

@DanHuber 

In the event where the above does what you expected feel free to Mark it as solution to help people who Search

Same goes with Split a row into several rows with almost same content. PowerQuery?

 

Thanks :)

@DanHuber 

I haven't followed the details of your requirement but, for interest sake, this is a Lambda function that wraps text to lines breaking at spaces.

"Worksheet formula"
= WrapTextλ(text, 30)

"WrapTextλ"
= LET(
    wordList, TEXTSPLIT(text, , " "),
    wordLen,  LEN(wordList) + 1,
    linePos,  SCAN(0, wordLen, LAMBDA(x, y, IF(x + y < maxLen + 1, x + y, y))),
    lineLen,  FILTER(linePos, VSTACK(DROP(linePos = wordLen, 1), TRUE)),
    lineStrt, SCAN(0, lineLen, LAMBDA(prior, line, prior + line)) + 1 - lineLen,
    MID(text, lineStrt, lineLen)
 )