Forum Discussion
Hans_Olsson
Jun 21, 2023Copper Contributor
Replace parts of text that meets certain criteria in a text file
I have a text file with approx 58000 lines. The lines holds different types of content for different purposes.
Some numbers that meets certain criterias should change to always have a fixed number of digits and remove any non-numeric characters as per the examples below.
I have struggled with the 3 scenarios below for some time now but I'm not getting there so any assistance would be much appreciated.
Scenario # 1: If exist #TRANS and "6" the number following "6", first example "4002", should have leading zeros added to always make it 8 digits within double quotes. Any non-numeric character should be removed.
Current content | What it shlould change to |
#TRANS 4319 {"6" "4002"} -22625.00 20220207 "Entreprenad AB" | #TRANS 4319 {"6" "00004002"} -22625.00 20220207 "Entreprenad AB" |
#TRANS 4319 {"6" "63-22"} -22625.00 20220207 "Entreprenad AB" | #TRANS 4319 {"6" "00006322"} -22625.00 20220207 "Entreprenad AB" |
Scenario # 2: If exist #TRANS and "25" (no "6") the number following "25", within double quotes, "162" should have leading zeros added to always be 4 digits within double quotes.
#TRANS 1930 {"25" "1"} -21940.00 20220207 "Levbet" | #TRANS 1930 {"25" "0001"} -21940.00 20220207 "Levbet" |
Scenario # 3: If exist #TRANS and both "6" and "25" the number following "6" should have any non-numeric characters removed and leading zeroes added for a total of 8 digits. Same as scenario # 1.
When both "6" and "25" exists the number following "25" should hold the 8 digit number following "6" and the 4 digit number following "25" as per scenario 2 in one sequence. 8+4=12 digits within double quotes.
#TRANS 4319 {"6" "4002-22" "25" "03"} 22625.00 20220207 "Entreprenad AB" | #TRANS 4319 {"6" "00400222" "25" "004002220003"} 22625.00 20220207 "Entreprenad AB" |
If someone can put together a formula that manages all 3 scenarios then he/she is my hero!
Hi Hans_Olsson
Thanks for the pictures. According to the few examples you posted earlier #OBJEKT strings appear to be formatted as below but with copy/paste on the Web we're never sure so please check this first with your actual data
The attached update highly/only relies on those consistent patterns (seems very consistent on your side according to your pictures :))
#OBJEKT<2 spaces>6<2 spaces>"2003"<1 space>"Kastrup 11.1"
#OBJEKT<2 spaces>25<1 space>"001"<1 space>"ŽTA 001"
- LorenzoSilver Contributor
Please re-download attachment from the previous reply. Query updated to not replace records like below if they exist:
#TRANS 4324 {"6" "4002-22" "28" "03"} 22625.00 20220207 "Entreprenad AB"
In previous version such records were updated as if the 28 = 25
- LorenzoSilver Contributor
Hi Hans_Olsson
Clear problem description + expected results :-))). With Get & Transform aka Power Query:
NB: This is highly dependent on the quality & consistency of the text between {}. Also, Power Query is case-sensitive so i.e. #TRANS <> #Trans and so far the In table is filtered on [String] starting with #TRANS (easy to adapt if necessary)
In attached file could you check that the Out table reflects what you expect against the In table
If fine add some more actuals in In > Right-click in Out > Refresh
(optimizations remain possible but I did not look at this specifically)
- Hans_OlssonCopper Contributor
Lorenzo Thank you very very much. The attached file clearly does the job. There's just one minor glitch. It requires #TRANS to start in position 1. In the actual file the #TRANS-lines begins with 3 blanks and #TRANS starting in pos 4. If I remove the 3 blanks the query works like a dream.
This is the first time I'm in contact with Power Query. I don't manage to bring up the actual query, to see what the code looks like and add the three 3 blanks or even better, make the query look for #TRANS independent of where it starts.
If it's an easy change it would be very much appreciated if you could make it and attach the updated file.
Regardless, big thanks for this!
- LorenzoSilver Contributor
Doable no problem. Question though: do you want to keep the <spaces> before #TRANS in the Out table? Or do you prefer to remove those leading <spaces> in ALL records, or only on the #TRANS record?