Forum Discussion
Extract text between two characters
- Aug 23, 2022
Here is a small custom VBA function:
Function ExtractURL(ByVal s As String) As String Dim p1 As Long Dim p2 As Long Dim p3 As Long s = Replace(s, Chr(160), " ") p1 = InStr(1, s, ".") p2 = InStrRev(s, " ", p1 - 1) p3 = InStr(p1 + 1, s, " ") If p3 > 0 Then s = Mid(s, p2 + 1, p3 - p2 - 1) Else s = Mid(s, p2 + 1) End If If Left(s, 4) <> "www." Then s = "www." & s End If ExtractURL = s End Function
Use like this: in a cell formula:
=ExtractURL(A2)
=LET(
a,FILTERXML("<y><z>"&SUBSTITUTE(A2," ","</z><z>")&"</z></y>","//z"),
b,SEARCH(".",a),
LOOKUP(9^99,b,a))
- MolekulaAug 31, 2022Copper Contributor
Detlef_Lewin good evening,
I am using your code on the daily basis and it works perfectly.
But when I try to understand its logic it becomes complicated, can you please explain it to me?
I don't understand why you divide the text by:FILTERXML("<y><z>"&SUBSTITUTE(A2," ","</z><z>")&"</z></y>","//z")
And what 9^99 stands for?
LOOKUP(9^99,b,a))
Thank you in advance
- mtarlerAug 31, 2022Silver Contributor
Molekula I can try to explain his code and my variation on it:
=LET( a,FILTERXML("<y><z>"&SUBSTITUTE(A2," ","</z><z>")&"</z></y>","//z"), b,SEARCH(".",a), LOOKUP(9^99,b,a))
FILTERXML will perform a filter on XML text. XML text is a markup language using <> and </> to tag sections of the text. So in this case the function will "filter" out all the sections tagged with <z>...</z> based on that last argument "//z". So the trick here is to convert the text into XML text where what you want is between the <z> and </z> tags. So step 1 is to 'initialize' the XML text with the <y><z> tags and then inside that function is a SUBSTITUTE(A2, " ", "</z><z>") which replaces every instance of a space (" ") with the end tag </z> and then starts a new z container with <z>. Finally it ends the text with </z> </y> to properly close the containers. So FILTERXML will now create an array output from the original text broken up based on the location of the spaces.
The next line (line 3) then searches for the presence of a "." inside each array element and returns a number if it is there (corresponding to its location) or an error if not
The LOOKUP in line 4 finds the numeric value from the array b (i.e. the element that found a ".") and returns the corresponding element that came out from the FILTERXML function.
In my variation:
=LET( a,FILTERXML("<y><z>"&SUBSTITUTE(A27," ","</z><z>")&"</z></y>","//z"), CONCAT(IF(IFERROR(LEN(WEBSERVICE(a)),FALSE),a,"")))
after creating that array of parts I then check if WEBSERVICE() of that item returns any data and if it finds anything it returns the text from that part back again or returns blank if not. All the elements are CONCAT together so all the blanks + the web address + blanks
- MolekulaAug 29, 2022Copper Contributor
Good evening Detlef_Lewin ,
As for now everything is working perfectly, so thank you once again.
But I am looking a bit in the future, and doubt if there is any possibility to solve the following issue that I may face:1. Apart from the clean data (table in the post above) I receive, there is a high chance of dirty data like that:
port data I receive Result I expect FABF - Google - worldwide - REP Google CRIM - RAW - Yahoo - global - worldwide
Yahoo FABF - DOM - RIM - drive - global drive.can.com limbo - FAB - RAW - Bloomberg limbo.bloomberg.com FABF - google - RIM - drive-global- google.drive.global
FABF - google - RIM - drive-global-NL google.drive.global.nl
CRIM - RAW - Yahoo - global - worldwide (only) Yahoo.worldwide (only)
CRIM - RAW - Yahoo - global - worldwide-(only)-certainperiod Yahoo.worldwide (only) certainperiod
Where there is no clear reference to the website but at least a little source from where it may came from. And thus it will help me to avoid potential blanc links....
2. Is there any way to integrate that in your formula?My solution:
I have added IFERROR=0 to your function to filter the dirty data and added multiple IF columns based on each criteria, but the file looks messy and I am no satisfied with this automation.
Thank you in advance
- MolekulaAug 24, 2022Copper Contributor
Thank you very much Detlef_Lewin you function works like a charm and if I could mention your answer as a best answer I would do it immediately.
Thank you.