Cell has a list wrapped in html. Split list into 2 separated by special character

Copper Contributor

I have an excel file that has a list of items that are wrapped in html code. I would like to split it into 2 groups and move them into 2 different cells and separate the list with a special character (such as >) that I can search and replace later on.

 

Please see attachment. I have put notes in the excel file

26 Replies

@JCountryman 

 

Revised formulas:

 

Plan:

=LET(plan,TEXTAFTER(TEXTBEFORE(B2,$D$1),"</span><div><ul><li>"),SUBSTITUTE(SUBSTITUTE(plan,"</li><li>",">"),"</li></ul><div>",""))

Benefits:

=LET(benefits,TEXTAFTER(TEXTAFTER(B2,$D$1),"</div><div><ul><li>"),cleaned,SUBSTITUTE(benefits,"</li><li>",">"),TEXTBEFORE(cleaned,"</li>"))

 

@Patrick2788 

 

Patrick I appreciate your response. Instead of a list can we set it up so that it is all one line separated by >. See column C and 7. Highlighted in red. See attached

 

 

Yes, I've revised my post above with new formulas and a new workbook.

@Patrick2788 

Patrick Im not having much luck trying to emulate what you created. Could you please take a look

That sample has different terms within the <> and listed in a different order so the TEXTBEFORE/TEXTAFTER is failing. "Includes:" and "Optional" can be used as delimiters to pull the text for Plan and Benefits then it becomes a matter of cleaning out the other stuff. I'll give it some thought.
create temp table aa as
select *,udf_strip_tags(`Long Description`) plan from Split_list_into_two_separated_by_special_character;
select `Long Description`,regexp2('Includes:([\s\S]*)Optional',plan,1) Plan_inc,regexp2('Optional[\s\S]*Include:([\s\S]+)$',plan,1) Option_inc from aa;

I don't believe there's enough of a pattern to draw up a formula to clean each string that comes in. We can plan for the tags that appear in <> but the part that makes it difficult is the style information contains a lot of text that will change by the cell. For example:
<span style="font-family: -apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, Roboto, &quot;Helvetica Neue&quot;, Arial, sans-serif, &quot;Apple Color Emoji&quot;, &quot;Segoe UI Emoji&quot;, &quot;Segoe UI Symbol&quot;; font-size: 14.4px;">

@Patrick2788 

 

This is a full list. I think we could use Plan and Optional as a pattern to clean each string.

 

For the css styling we can delete it all so it is just text. I have to export the files out in a text format which I need to do a data merge in indesign

 

Will this work with your script?

I'm afraid it's the same issue. I've explored online tools that can make the CSS neater to an extent but it doesn't do us any favors for working with it in Excel. The best solution may be a tool that can extract text from CSS.

@Patrick2788  You were able to strip out the html and css before. Is that a problem now because of my changes?

@JCountryman 

There's too much variation in the CSS. It takes a ridiculous formula to go from the below left to the below right for the Plan:

Patrick2788_0-1678474370740.png

I'm limited with a choice of delimiters for TEXTBEFORE/TEXTAFTER because some of those tags appear elsewhere or with additional tags. There's not enough of a pattern to pick the text out cleanly.  As you can see from the above, there's still a few tags in the 'after'.  SUBSTITUTE can pull those out, but how many different  possibilities might there be to look for with the CSS?

 

The benefits information is more difficult because it contains more style CSS that often changes the name of the font type and size. Lots of things to look for - theoretically possible but if I'm writing the formula for all scenarios it becomes a question if manual extraction would be easier.

@Patrick2788 I know Plans and Options are getting a contant so can you use those as delimiters?

@peiyezhu for the plan and option we would need a special character ">" in-between the each bullet in the "Plan" and "Option" cells.

 

Is there a way to run a script so this is automated? There will be a total of 8-10 Long descriptions.

 

Let me know if this can be done 

@peiyezhu This is way to complicated for me. is there a easier way to do this or can you create a automation to run this on my excel sheet?

create a automation to run this on my excel sheet?
Sorry,I have no windows PC on hand and can not run/debug vba specific for your situation.

Wait for some others idea.