SOLVED

Replace Function (or Substitute) - slashes, dashes...etc

Steel Contributor

Hello,

I need to replace various extraneous characters found in invoice numbers so I can compare based only on the letters or numbers.

I need to replace the following with nothing:

  1. Dash
  2. Slash (backward)
  3. Slash (forward)
  4. Period
  5. Comma
  6. Parenth (
  7. Parenth )
  8. #
  9. Space 

How can I best do this?  I am not sure if there is a new function in excel that handles this or is it Replace or Substitute?  Or? 

thank you

I have attached a sample

 

13 Replies

@Tony2021 

SUBSTITUTE is the correct function, but you shouldn't repeat B2:

 

=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(B2,"-",""),"/",""),"\",""),".",""),",",""),"(",""),")",""),"#","")," ","")

@Tony2021  Alternatively if you have Excel 365 try this:

 

=LET(in,A2,exclude,{",","/","\","-",".","(",")","#"," "},
CONCAT(TEXTSPLIT(in,exclude)))

 

@Tony2021 

 

A recursive SUBSTITUTE called 'Remove'

 

'Remove
=LAMBDA(cel,chars,IF(chars = "", cel, Remove(SUBSTITUTE(cel, LEFT(chars), ""), RIGHT(chars, LEN(chars) - 1))))

 Sheet level formula:

=Remove(B2,"$%()/\|#-. ")

@Patrick2788 

Hello Experts, thank you for the help! I like all the formulas. Remove is interesting indeed and I try to use it in my excel file and I get a #name? I dont see any code so not sure why it wont work in my file.  I imagine the lamba is being used but I dont see where it is. 

 

Also, I need to use this formula in a much larger formula and grateful for your assistance. 

 

In the formula below, C11 is the invoice number that I need to strip I need to replace C11 with the formula. 

I thought I could simply replace C11 with the formula but I dont think I can do that?  

 

original formula: 

FYI: D11 is the invoice amount. 
=IF(TRIM(CLEAN(C11))="","",IF(ISNA(VLOOKUP(C11&D11,Project_Costs[[#All],[Invoice No Stripped]]&Project_Costs[[#Headers],[Invoice amount]],1,FALSE)),"NOT IN a DD","In Previous DD"))

 

I tried as follows: 

using Remove:

=IF(TRIM(CLEAN(C11))="","",IF(ISNA(VLOOKUP(Remove(C11,"$%()/\|#-. ")&D11,Project_Costs[[#All],[Invoice No Stripped]]&Project_Costs[[#Headers],[Invoice amount]],1,FALSE)),"NOT IN a DD","In Previous DD"))

 

using substitute: 

=IF(TRIM(CLEAN(C12))="","",IF(ISNA(VLOOKUP(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(C12,"-",""),"/",""),"\",""),".",""),",",""),"(",""),")",""),"#","")," ","")&D12,ProjectCostsPQ_FE.xlsx!Project_Costs[[#All],[Invoice '#]]&ProjectCostsPQ_FE.xlsx!Project_Costs[[#All],[Invoice amount]],1,FALSE)),"NOT IN a DD","In Previous DD"))

 

thank you very much. 

I would post the file but there is a power query in it so I hope by giving the formula is enough information.  

The substitute version should have worked. The REMOVE requires the LAMBDA which you would put into the NAME Manager. My option could look like this:
=IF(TRIM(CLEAN(C11))="","",IF(ISNA(VLOOKUP(
CONCAT(TEXTSPLIT(C11,{",","/","\","-",".","(",")","#"," "}))&D11,Project_Costs[[#All],[Invoice No Stripped]]&Project_Costs[[#Headers],[Invoice amount]],1,FALSE)),"NOT IN a DD","In Previous DD"))

@Tony2021 Since you mentioned that you already use Power Query, I wonder if you perhaps can do the "lookups" within PQ as well. The code for removing the unwanted characters from the invoice numbers is quite easy. Though, you would not specify which characters to remove. In stead, you would explicitly tell PQ to keep only letters and numbers.

Add a custom column with the following code:

Text.Select([#"Invoice #"], {"a" .. "z", "A" .. "Z", "0" .. "9"})

It will look like this:

Screenshot 2023-01-28 at 08.16.02.png 

@mtarler 

Hi Mtarler, thank you for the follow up. It seems to work but I noticed in my testing that I need to remove leading Zeros since my [Invoice No Stripped] strips out the leading zero's and I compare the 2 data sets (old data w 1000's of rows vs new data) and one has leading zeros and the other doesnt.  

 

here is how I have it:
=IF(TRIM(CLEAN(C11))="","",IF(ISNA(VLOOKUP(
CONCAT(TEXTSPLIT(C11,{",","/","\","-",".","(",")","#"," "}))&D11,Project_Costs[[#All],[Invoice No Stripped]]&Project_Costs[[#All],[Invoice amount]],1,FALSE)),"NOT IN a DD","In Previous DD"))

 

Would you happen to know how to add that part to the above?  thanks again for the expert help.  Much appreciated. 

 

 

Hi Riny, that is a crafty way to think about it...only keep what you need instead of removing. I can use that in my PQ that strips out the characters. I think I could probably somehow manage implementing what you are saying. I would rather keep it in the excel formula though since I am just a basic user of PQ and it would take a bit more thought instead of using a function in an excel formula.
best response confirmed by Tony2021 (Steel Contributor)
Solution

@Tony2021 If you know there will be at most x number you could use nested IF() statements to remove them but a recursive LAMBDA works really well.  Go to Formulas -> Name Manager and add a NEW name called strip0 and give it this definition:

=lambda(n, if(left(n,1)="0",strip0(mid(n,2,len(n))),n))

As long as you are starting down this path why not go all the way and make a custom function to do it all for you.  As shown in the next image is the name manager with 4 names:

strip0 as noted above will strip the preceding 0s off

punctuation is just a name for the array of characters to get removed

RemoveChars is the function I gave above that uses CONCAT and TEXTSPLIT to remove the characters

and

CleanUp puts them all together:

mtarler_0-1674934352688.png

so now you can use:

=IF(TRIM(CLEAN(C11))="","",IF(ISNA(VLOOKUP(
CleanUp(C11)&D11,Project_Costs[[#All],[Invoice No Stripped]]&Project_Costs[[#All],[Invoice amount]],1,FALSE)),"NOT IN a DD","In Previous DD"))

@mtarler 

mtarler, that is amazing. I have done my testing and it works! I have never used Lambda. Didnt even know it existed. You are very talented. Thank you very much for the help. I am very grateful. Have a good night.

you're very welcome. Lambda is relatively new and so I think many if not most casual users don't know about it yet. It is very powerful addition as you can see. There are also a number of other new functionality that has been introduced that if you don't know about I recommend you check out including dynamic arrays, FILTER(), LET(), the new XLOOKUP and XMATCH, the LAMBDA helper functions and the new text functions like the TEXTSPLIT() function I used. It is very exciting to be an excel user right now (and no i don't work for MS and I am NOT paid by them in any way)

@Tony2021 

Another formula but without the elegance of @mtarler 's recursive formula.  It uses a side effect of TEXTSPLIT, namely it removes the separators.

= MAP(invoiceNo, LAMBDA(inv, CONCAT(TEXTSPLIT(inv, unwanted))))
where 'unwanted' refers to
= {"-";"\";"/";".";",";"(";")";"%";"$";"|";"#";" "}

The MAP/LAMBDA calculates cell by cell but returns the entire array of results. 

thank you. I did read up on Lambda today. Interesting addition. Mostly out of my league at least at this point. :)
1 best response

Accepted Solutions
best response confirmed by Tony2021 (Steel Contributor)
Solution

@Tony2021 If you know there will be at most x number you could use nested IF() statements to remove them but a recursive LAMBDA works really well.  Go to Formulas -> Name Manager and add a NEW name called strip0 and give it this definition:

=lambda(n, if(left(n,1)="0",strip0(mid(n,2,len(n))),n))

As long as you are starting down this path why not go all the way and make a custom function to do it all for you.  As shown in the next image is the name manager with 4 names:

strip0 as noted above will strip the preceding 0s off

punctuation is just a name for the array of characters to get removed

RemoveChars is the function I gave above that uses CONCAT and TEXTSPLIT to remove the characters

and

CleanUp puts them all together:

mtarler_0-1674934352688.png

so now you can use:

=IF(TRIM(CLEAN(C11))="","",IF(ISNA(VLOOKUP(
CleanUp(C11)&D11,Project_Costs[[#All],[Invoice No Stripped]]&Project_Costs[[#All],[Invoice amount]],1,FALSE)),"NOT IN a DD","In Previous DD"))

View solution in original post