Home

Extract # and @ from tweet

%3CLINGO-SUB%20id%3D%22lingo-sub-870856%22%20slang%3D%22en-US%22%3EExtract%20%23%20and%20%40%20from%20tweet%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-870856%22%20slang%3D%22en-US%22%3E%3CP%3E%3CSPAN%3EHello%2C%3C%2FSPAN%3E%3CBR%20%2F%3E%3CBR%20%2F%3E%3CSPAN%3EI%20have%20a%20large%20dataset%20with%20tweets%20in%20them.%20These%20are%20displayed%20in%20a%20single%20cell%20as%20text.%20I%20want%20to%20extract%20all%20hashtags%20and%20tags%20in%20this%20tweet%20and%20display%20them%20in%20their%20own%20cells.%20For%20example%3A%3C%2FSPAN%3E%3CBR%20%2F%3E%3CBR%20%2F%3E%3CSPAN%3E%22My%20bilateral%20programme%20will%20comprise%20meetings%20with%20PM%20%40GakhariaGiorgi%20and%20FM%20%40DZalkaliani%20as%20well%20with%20the%20chairman%20of%20the%26nbsp%3B%3C%2FSPAN%3E%3CA%20href%3D%22https%3A%2F%2Fwww.mrexcel.com%2Fforum%2Fusertag.php%3Fdo%3Dlist%26amp%3Baction%3Dhash%26amp%3Bhash%3DParliament%22%20target%3D%22_blank%22%20rel%3D%22nofollow%20noopener%20noopener%20noreferrer%20noopener%20noreferrer%20noopener%20noreferrer%22%3E%23Parliament%3C%2FA%3E%3CSPAN%3E%26nbsp%3Bof%26nbsp%3B%3C%2FSPAN%3E%3CA%20href%3D%22https%3A%2F%2Fwww.mrexcel.com%2Fforum%2Fusertag.php%3Fdo%3Dlist%26amp%3Baction%3Dhash%26amp%3Bhash%3DGeorgia%22%20target%3D%22_blank%22%20rel%3D%22nofollow%20noopener%20noopener%20noreferrer%20noopener%20noreferrer%20noopener%20noreferrer%22%3E%23Georgia%3C%2FA%3E%3CSPAN%3E%26nbsp%3B%22%20would%20give%22%20-%20%40GakhariaGiorgi%20%40DZalkaliani%26nbsp%3B%3C%2FSPAN%3E%3CA%20href%3D%22https%3A%2F%2Fwww.mrexcel.com%2Fforum%2Fusertag.php%3Fdo%3Dlist%26amp%3Baction%3Dhash%26amp%3Bhash%3DParliament%22%20target%3D%22_blank%22%20rel%3D%22nofollow%20noopener%20noopener%20noreferrer%20noopener%20noreferrer%20noopener%20noreferrer%22%3E%23Parliament%3C%2FA%3E%3CSPAN%3E%26nbsp%3B%3C%2FSPAN%3E%3CA%20href%3D%22https%3A%2F%2Fwww.mrexcel.com%2Fforum%2Fusertag.php%3Fdo%3Dlist%26amp%3Baction%3Dhash%26amp%3Bhash%3DGeorgia%22%20target%3D%22_blank%22%20rel%3D%22nofollow%20noopener%20noopener%20noreferrer%20noopener%20noreferrer%20noopener%20noreferrer%22%3E%23Georgia%3C%2FA%3E%3CSPAN%3E%26nbsp%3B.%20Preferably%20all%20in%20their%20own%20cell.%3C%2FSPAN%3E%3CBR%20%2F%3E%3CBR%20%2F%3E%3CSPAN%3EI%20have%20already%20tried%20the%20following%20code%2C%20but%20this%20gave%20me%20an%20error.%26nbsp%3B%3C%2FSPAN%3E%3CA%20href%3D%22https%3A%2F%2Fwww.mrexcel.com%2Fforum%2Fexcel-questions%2F672138-extracting-mentions-hashtags-column-columns-b-c.html%22%20target%3D%22_blank%22%20rel%3D%22noopener%20nofollow%20noopener%20noreferrer%20noopener%20noreferrer%20noopener%20noreferrer%22%3Ehttps%3A%2F%2Fwww.mrexcel.com%2Fforum%2Fexcel-...lumns-b-c.html%3C%2FA%3E%3CBR%20%2F%3E%3CBR%20%2F%3E%3CSPAN%3EAny%20help%20would%20be%20very%20much%20appreciated.%3C%2FSPAN%3E%3C%2FP%3E%3C%2FLINGO-BODY%3E%3CLINGO-LABS%20id%3D%22lingo-labs-870856%22%20slang%3D%22en-US%22%3E%3CLINGO-LABEL%3EExcel%20on%20Mac%3C%2FLINGO-LABEL%3E%3CLINGO-LABEL%3EMacros%20and%20VBA%3C%2FLINGO-LABEL%3E%3C%2FLINGO-LABS%3E%3CLINGO-SUB%20id%3D%22lingo-sub-871131%22%20slang%3D%22en-US%22%3ERe%3A%20Extract%20%23%20and%20%40%20from%20tweet%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-871131%22%20slang%3D%22en-US%22%3E%3CP%3E%3CA%20href%3D%22https%3A%2F%2Ftechcommunity.microsoft.com%2Ft5%2Fuser%2Fviewprofilepage%2Fuser-id%2F413809%22%20target%3D%22_blank%22%3E%40Marlonvandijk%3C%2FA%3E%26nbsp%3B%3C%2FP%3E%0A%3CP%3EI%20tried%20that%20with%20Power%20Query.%20If%20we%20add%20function%20which%20extracts%20words%20between%20%22%40%22%20or%20%22%23%22%20and%20end%20of%20the%20word%3C%2FP%3E%0A%3CPRE%20class%3D%22lia-code-sample%20language-markup%22%3E%3CCODE%3E(Source%20as%20table%2C%20Delimiter%20as%20text%2C%20ResultColumnName%20as%20text)%20as%20table%20%3D%26gt%3B%0A%0Alet%0A%20%20%20%20EndWordCharacters%20%3D%20%7B%22%20%22%2C%22%2C%22%2C%22.%22%2C%22%3A%22%2C%22%3B%22%2C%22!%22%7D%2C%0A%0A%20%20%20%20%2F%2FSource%20%3D%20Tweets%2C%0A%20%20%20%20SplitByFirstDelimeter%20%3D%20Table.AddColumn(%0A%20%20%20%20%20%20%20%20Source%2C%0A%20%20%20%20%20%20%20%20%22Custom%22%2C%0A%20%20%20%20%20%20%20%20each%20List.RemoveFirstN(%20Text.Split(%5BTweets%5D%2CDelimiter)%20%2C1)%20)%2C%0A%20%20%20%20CutAfterWordEnd%20%3D%20Table.AddColumn(%0A%20%20%20%20%20%20%20%20SplitByFirstDelimeter%2C%0A%20%20%20%20%20%20%20%20ResultColumnName%2C%0A%20%20%20%20%20%20%20%20each%20List.Transform(%5BCustom%5D%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20each%20Delimiter%20%26amp%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20Text.Start(_%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20if%20Text.PositionOfAny(_%2CEndWordCharacters)%20%3D-1%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20then%20Text.Length(_)%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20else%20Text.PositionOfAny(_%2CEndWordCharacters)%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20)%0A%20%20%20%20%20%20%20%20)%0A%20%20%20%20)%2C%0A%20%20%20%20ExtractValues%20%3D%20Table.TransformColumns(%0A%20%20%20%20%20%20%20%20CutAfterWordEnd%2C%0A%20%20%20%20%20%20%20%20%7BResultColumnName%2C%0A%20%20%20%20%20%20%20%20each%20Text.Combine(%20List.Transform(_%2C%20Text.From)%2C%20%22%2C%20%22)%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20type%20text%0A%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20)%2C%0A%20%20%20%20RemoveColumns%20%3D%20Table.RemoveColumns(%0A%20%20%20%20%20%20%20%20ExtractValues%2C%0A%20%20%20%20%20%20%20%20%7B%22Custom%22%7D%0A%20%20%20%20)%0Ain%0A%20%20%20%20RemoveColumns%3C%2FCODE%3E%3C%2FPRE%3E%0A%3CP%3Eresult%20could%20be%20received%20as%3C%2FP%3E%0A%3CPRE%20class%3D%22lia-code-sample%20language-markup%22%3E%3CCODE%3Elet%0A%20%20%20%20Source%20%3D%20Excel.CurrentWorkbook()%7B%5BName%3D%22Tweets%22%5D%7D%5BContent%5D%2C%0A%20%20%20%20AddMentions%20%3D%20ExtractTags(Source%2C%22%40%22%2C%22Mentions%22)%2C%0A%20%20%20%20AddHashtags%20%3D%20ExtractTags(AddMentions%2C%22%23%22%2C%22Hashtags%22)%0Ain%0A%20%20%20%20AddHashtags%3C%2FCODE%3E%3C%2FPRE%3E%0A%3CP%3Ein%20form%20of%3C%2FP%3E%0A%3CP%3E%3CSPAN%20class%3D%22lia-inline-image-display-wrapper%20lia-image-align-inline%22%20style%3D%22width%3A%20999px%3B%22%3E%3CIMG%20src%3D%22https%3A%2F%2Fgxcuf89792.i.lithium.com%2Ft5%2Fimage%2Fserverpage%2Fimage-id%2F133212iE8503FFB527B0CA8%2Fimage-size%2Flarge%3Fv%3D1.0%26amp%3Bpx%3D999%22%20alt%3D%22image.png%22%20title%3D%22image.png%22%20%2F%3E%3C%2FSPAN%3E%3C%2FP%3E%0A%3CP%3EIf%20each%20mention%2Fhashtag%20put%20into%20separate%20cell%2C%20when%20just%20split%20columns%20by%20comma%20in%20final%20result.%3C%2FP%3E%3C%2FLINGO-BODY%3E
Marlonvandijk
Occasional Visitor

Hello,

I have a large dataset with tweets in them. These are displayed in a single cell as text. I want to extract all hashtags and tags in this tweet and display them in their own cells. For example:

"My bilateral programme will comprise meetings with PM @GakhariaGiorgi and FM @DZalkaliani as well with the chairman of the #Parliament of #Georgia " would give" - @GakhariaGiorgi @DZalkaliani #Parliament #Georgia . Preferably all in their own cell.

I have already tried the following code, but this gave me an error. https://www.mrexcel.com/forum/excel-...lumns-b-c.html

Any help would be very much appreciated.

1 Reply

@Marlonvandijk 

I tried that with Power Query. If we add function which extracts words between "@" or "#" and end of the word

(Source as table, Delimiter as text, ResultColumnName as text) as table =>

let
    EndWordCharacters = {" ",",",".",":",";","!"},

    //Source = Tweets,
    SplitByFirstDelimeter = Table.AddColumn(
        Source,
        "Custom",
        each List.RemoveFirstN( Text.Split([Tweets],Delimiter) ,1) ),
    CutAfterWordEnd = Table.AddColumn(
        SplitByFirstDelimeter,
        ResultColumnName,
        each List.Transform([Custom],
            each Delimiter &
                Text.Start(_,
                    if Text.PositionOfAny(_,EndWordCharacters) =-1
                    then Text.Length(_)
                    else Text.PositionOfAny(_,EndWordCharacters)
                )
        )
    ),
    ExtractValues = Table.TransformColumns(
        CutAfterWordEnd,
        {ResultColumnName,
        each Text.Combine( List.Transform(_, Text.From), ", "),
            type text
        }
    ),
    RemoveColumns = Table.RemoveColumns(
        ExtractValues,
        {"Custom"}
    )
in
    RemoveColumns

result could be received as

let
    Source = Excel.CurrentWorkbook(){[Name="Tweets"]}[Content],
    AddMentions = ExtractTags(Source,"@","Mentions"),
    AddHashtags = ExtractTags(AddMentions,"#","Hashtags")
in
    AddHashtags

in form of

image.png

If each mention/hashtag put into separate cell, when just split columns by comma in final result.

Related Conversations
Tabs and Dark Mode
cjc2112 in Discussions on
46 Replies
Extentions Synchronization
Deleted in Discussions on
3 Replies
Stable version of Edge insider browser
HotCakeX in Discussions on
35 Replies
How to Prevent Teams from Auto-Launch
chenrylee in Microsoft Teams on
30 Replies
Security Community Webinars
Valon_Kolica in Security, Privacy & Compliance on
13 Replies