Forum Discussion
JCountryman
Mar 09, 2023Copper Contributor
Cell has a list wrapped in html. Split list into 2 separated by special character
I have an excel file that has a list of items that are wrapped in html code. I would like to split it into 2 groups and move them into 2 different cells and separate the list with a special character...
Patrick2788
Mar 09, 2023Silver Contributor
Yes, I've revised my post above with new formulas and a new workbook.
JCountryman
Mar 09, 2023Copper Contributor
Patrick Im not having much luck trying to emulate what you created. Could you please take a look
- peiyezhuMar 10, 2023Bronze Contributor
- JCountrymanMar 11, 2023Copper Contributor
peiyezhu for the plan and option we would need a special character ">" in-between the each bullet in the "Plan" and "Option" cells.
Is there a way to run a script so this is automated? There will be a total of 8-10 Long descriptions.
Let me know if this can be done
- peiyezhuMar 11, 2023Bronze Contributor
php
https://www.php.net/manual/en/function.strip-tags.php
https://www.php.net/manual/en/function.preg-match-all
vba
https://software-solutions-online.com/vba-regex-guide/
https://answers.microsoft.com/en-us/msoffice/forum/all/building-and-using-regex-patterns-in-vba/955f7d59-82bb-4897-8e24-bb02b4c1137a
online tool
http://e.anyoupin.cn/ceshi/jstest/pull_up_demo.php?s=Split_list
- peiyezhuMar 10, 2023Bronze Contributorcreate temp table aa as
select *,udf_strip_tags(`Long Description`) plan from Split_list_into_two_separated_by_special_character;
select `Long Description`,regexp2('Includes:([\s\S]*)Optional',plan,1) Plan_inc,regexp2('Optional[\s\S]*Include:([\s\S]+)$',plan,1) Option_inc from aa;
- Patrick2788Mar 09, 2023Silver ContributorThat sample has different terms within the <> and listed in a different order so the TEXTBEFORE/TEXTAFTER is failing. "Includes:" and "Optional" can be used as delimiters to pull the text for Plan and Benefits then it becomes a matter of cleaning out the other stuff. I'll give it some thought.
- JCountrymanMar 13, 2023Copper Contributor
Patrick2788 were you able to give this any more thought?
- Patrick2788Mar 14, 2023Silver Contributor
According to Syed_Asad_Abbas , this is possible using BS4 which I believe he's referring to BeautifulSoup 4 which is a Python package. I'll defer to the Python experts on this one.
- JCountrymanMar 10, 2023Copper Contributor
Patrick2788 Did you have any luck with this?
- Patrick2788Mar 10, 2023Silver ContributorI don't believe there's enough of a pattern to draw up a formula to clean each string that comes in. We can plan for the tags that appear in <> but the part that makes it difficult is the style information contains a lot of text that will change by the cell. For example:
<span style="font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, "Helvetica Neue", Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol"; font-size: 14.4px;">
- JCountrymanMar 09, 2023Copper Contributor
Patrick2788 Thanks Patrick