Forum Discussion
Extract # and @ from tweet
Hello,
I have a large dataset with tweets in them. These are displayed in a single cell as text. I want to extract all hashtags and tags in this tweet and display them in their own cells. For example:
"My bilateral programme will comprise meetings with PM @GakhariaGiorgi and FM @DZalkaliani as well with the chairman of the https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=Parliament of https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=Georgia " would give" - @GakhariaGiorgi @DZalkaliani https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=Parliament https://www.mrexcel.com/forum/usertag.php?do=list&action=hash&hash=Georgia . Preferably all in their own cell.
I have already tried the following code, but this gave me an error. https://www.mrexcel.com/forum/excel-questions/672138-extracting-mentions-hashtags-column-columns-b-c.html
Any help would be very much appreciated.
2 Replies
- SergeiBaklanDiamond Contributor
I tried that with Power Query. If we add function which extracts words between "@" or "#" and end of the word
(Source as table, Delimiter as text, ResultColumnName as text) as table => let EndWordCharacters = {" ",",",".",":",";","!"}, //Source = Tweets, SplitByFirstDelimeter = Table.AddColumn( Source, "Custom", each List.RemoveFirstN( Text.Split([Tweets],Delimiter) ,1) ), CutAfterWordEnd = Table.AddColumn( SplitByFirstDelimeter, ResultColumnName, each List.Transform([Custom], each Delimiter & Text.Start(_, if Text.PositionOfAny(_,EndWordCharacters) =-1 then Text.Length(_) else Text.PositionOfAny(_,EndWordCharacters) ) ) ), ExtractValues = Table.TransformColumns( CutAfterWordEnd, {ResultColumnName, each Text.Combine( List.Transform(_, Text.From), ", "), type text } ), RemoveColumns = Table.RemoveColumns( ExtractValues, {"Custom"} ) in RemoveColumns
result could be received as
let Source = Excel.CurrentWorkbook(){[Name="Tweets"]}[Content], AddMentions = ExtractTags(Source,"@","Mentions"), AddHashtags = ExtractTags(AddMentions,"#","Hashtags") in AddHashtags
in form of
If each mention/hashtag put into separate cell, when just split columns by comma in final result.
- Mees_LCopper Contributor
Hi Sergei,
I have used your code for a school project for the University of Amsterdam and in doing so I discovered a flaw that wasn't caught in your test-case example. I believe (with my limited knowledge) that the special characters defined in your EndWordCharacters variable weren't picked out by the code (I couldn't tell you why). I have replaced the strings in the list with Character.FromNumber() for all the characters I wanted the code to cut off the text at and now it works like a charm.
Thank you for providing the code I could work with.
Kind Regards,
Mees