Forum Discussion
How to extract multiple occurrences of a text enclosed between two same characters
- Oct 28, 2024
I hope this works for you (with the text in A5):
=LET(w, TEXTSPLIT(A5, "!"), n, COUNTA(w), s, SEQUENCE(, n), f, FILTER(w, ISEVEN(s)), TEXTJOIN(CHAR(10), TRUE, f))
Just to add some thoughts on modern Excel. I am coming to be of the opinion that spreadsheet formulas are best presented as Lambda functions rather than having the formula details exposed. In that case HansVogelaar's formula might be
ImportantText1λ
= LAMBDA(string,
LET(
newLine, CHAR(10),
extracts, IFERROR(TRIM(TEXTAFTER(TEXTSPLIT(string, "<<"), ">>")),""),
TEXTJOIN(newLine, TRUE, extracts, "")
)
);
whilst mine might be
ImportantText2λ
= LAMBDA(string,
LET(
newLine, CHAR(10),
regex, "[\w|\s]*(?=\<)",
TEXTJOIN(newLine,,REGEXEXTRACT(string, regex, 1))
)
);
The worksheet formulas would be
= MAP(notes, ImportantTextλ)
Any change to the formula made by the developer need not even be visible to the user (unless they are the same person of course) so the sheet may be updated when one chooses using the AFE (Advanced Formula Environment).
- SergeiBaklanOct 28, 2024Diamond Contributor
And if with RegEx only (e.g. without MAP and TEXTJOIN)
=REGEXREPLACE( REGEXREPLACE( A5:A7, "(?x) ( [.]*>> | <<[\n]?.*>> | <<.*[\n]?>> | <<[\n]?.*$ ) | (^.*>>) ", "\n" ), "^\n|\n$", "" )
- PeterBartholomew1Oct 29, 2024Silver Contributor
I have mixed feelings regarding regular expressions. The plusses are that it is an industry standard; it is extremely powerful; and its sheer ingenuity is amazing. That said, its conciseness makes it incredibly difficult to read. I would gladly trade 10 times the key presses for 4 times the readability.
In that vein, I did look at some solutions on Stack Overflow where contributors had described elements of the formula, so that a sequence returning "5 1/4" would be designated as <number>, when used more than once. With that in mind, I have explored the idea of using LET variables and even LAMBDA to provide meaning to expressions that would otherwise place heavy demands on the reader to mentally parse the formula. A hybrid solution that would horrify a regex purist!
- SergeiBaklanOct 30, 2024Diamond Contributor
In general we may use comments like
=REGEXREPLACE( REGEXREPLACE( A5:A7, "(?x) #allow spaces/comments ( [.]*>> | #match any text with open sep, or <<[\n]?.*>> | #match any text between close and next open sep, or <<.*[\n]?>> | #the same if newline could be before close sep, or " & " <<[\n]?.*$ )| #match any text between close sep and end of line, or (^.*>>) #match any text at the start of the line ", "\n" ), "^\n|\n$", "" )
The problem is Excel still doesn't allow in formulae text constants with more than 255 characters
Thus we need to concat parts of the pattern. Perhaps similar will be with LET/LAMBDA where bunch of names to be concat.
There are great tools on the market like
Screen Shots: Create, Edit, Test, Debug, Convert, Save and Use Regex, but that's more for people who works with regex on regular basis.
- Zdenek_MoravecOct 28, 2024Brass Contributor
PeterBartholomew1 , nice approach, I like it. I hope, Microsoft team will also improve debug possibilities for LAMBDA formula and the Name Manager dialog window. Currently, it is a kind of black box.
- SergeiBaklanOct 28, 2024Diamond Contributor
AFE from Excel Labs gives both - debugger and ability to work with names. Not IDE, but at least something.
- Zdenek_MoravecOct 28, 2024Brass Contributor
I see now, the Advanced Formula Environment mentioned by PeterBartholomew1 is suitable for LAMBDA debugging. I will have a look, thank You both.
- SergeiBaklanOct 28, 2024Diamond Contributor
As variant
=TEXTJOIN( CHAR(10),, FILTERXML( "<t>" & SUBSTITUTE( SUBSTITUTE(A5,">>","<s>"),"<<","</s>") & "</t>", "//s" ) )