Forum Discussion
Extract # and @ from tweet
I tried that with Power Query. If we add function which extracts words between "@" or "#" and end of the word
(Source as table, Delimiter as text, ResultColumnName as text) as table =>
let
EndWordCharacters = {" ",",",".",":",";","!"},
//Source = Tweets,
SplitByFirstDelimeter = Table.AddColumn(
Source,
"Custom",
each List.RemoveFirstN( Text.Split([Tweets],Delimiter) ,1) ),
CutAfterWordEnd = Table.AddColumn(
SplitByFirstDelimeter,
ResultColumnName,
each List.Transform([Custom],
each Delimiter &
Text.Start(_,
if Text.PositionOfAny(_,EndWordCharacters) =-1
then Text.Length(_)
else Text.PositionOfAny(_,EndWordCharacters)
)
)
),
ExtractValues = Table.TransformColumns(
CutAfterWordEnd,
{ResultColumnName,
each Text.Combine( List.Transform(_, Text.From), ", "),
type text
}
),
RemoveColumns = Table.RemoveColumns(
ExtractValues,
{"Custom"}
)
in
RemoveColumns
result could be received as
let
Source = Excel.CurrentWorkbook(){[Name="Tweets"]}[Content],
AddMentions = ExtractTags(Source,"@","Mentions"),
AddHashtags = ExtractTags(AddMentions,"#","Hashtags")
in
AddHashtags
in form of
If each mention/hashtag put into separate cell, when just split columns by comma in final result.
Hi Sergei,
I have used your code for a school project for the University of Amsterdam and in doing so I discovered a flaw that wasn't caught in your test-case example. I believe (with my limited knowledge) that the special characters defined in your EndWordCharacters variable weren't picked out by the code (I couldn't tell you why). I have replaced the strings in the list with Character.FromNumber() for all the characters I wanted the code to cut off the text at and now it works like a charm.
Thank you for providing the code I could work with.
Kind Regards,
Mees