Sep 23 2019 11:34 AM
Hello,
I have a large dataset with tweets in them. These are displayed in a single cell as text. I want to extract all hashtags and tags in this tweet and display them in their own cells. For example:
"My bilateral programme will comprise meetings with PM @GakhariaGiorgi and FM @DZalkaliani as well with the chairman of the #Parliament of #Georgia " would give" - @GakhariaGiorgi @DZalkaliani #Parliament #Georgia . Preferably all in their own cell.
I have already tried the following code, but this gave me an error. https://www.mrexcel.com/forum/excel-...lumns-b-c.html
Any help would be very much appreciated.
Sep 23 2019 02:59 PM
I tried that with Power Query. If we add function which extracts words between "@" or "#" and end of the word
(Source as table, Delimiter as text, ResultColumnName as text) as table =>
let
EndWordCharacters = {" ",",",".",":",";","!"},
//Source = Tweets,
SplitByFirstDelimeter = Table.AddColumn(
Source,
"Custom",
each List.RemoveFirstN( Text.Split([Tweets],Delimiter) ,1) ),
CutAfterWordEnd = Table.AddColumn(
SplitByFirstDelimeter,
ResultColumnName,
each List.Transform([Custom],
each Delimiter &
Text.Start(_,
if Text.PositionOfAny(_,EndWordCharacters) =-1
then Text.Length(_)
else Text.PositionOfAny(_,EndWordCharacters)
)
)
),
ExtractValues = Table.TransformColumns(
CutAfterWordEnd,
{ResultColumnName,
each Text.Combine( List.Transform(_, Text.From), ", "),
type text
}
),
RemoveColumns = Table.RemoveColumns(
ExtractValues,
{"Custom"}
)
in
RemoveColumns
result could be received as
let
Source = Excel.CurrentWorkbook(){[Name="Tweets"]}[Content],
AddMentions = ExtractTags(Source,"@","Mentions"),
AddHashtags = ExtractTags(AddMentions,"#","Hashtags")
in
AddHashtags
in form of
If each mention/hashtag put into separate cell, when just split columns by comma in final result.
Oct 09 2022 07:56 AM
Hi Sergei,
I have used your code for a school project for the University of Amsterdam and in doing so I discovered a flaw that wasn't caught in your test-case example. I believe (with my limited knowledge) that the special characters defined in your EndWordCharacters variable weren't picked out by the code (I couldn't tell you why). I have replaced the strings in the list with Character.FromNumber() for all the characters I wanted the code to cut off the text at and now it works like a charm.
Thank you for providing the code I could work with.
Kind Regards,
Mees