Home

Extract # and @ from tweet

%3CLINGO-SUB%20id%3D%22lingo-sub-870856%22%20slang%3D%22en-US%22%3EExtract%20%23%20and%20%40%20from%20tweet%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-870856%22%20slang%3D%22en-US%22%3E%3CP%3E%3CSPAN%3EHello%2C%3C%2FSPAN%3E%3CBR%20%2F%3E%3CBR%20%2F%3E%3CSPAN%3EI%20have%20a%20large%20dataset%20with%20tweets%20in%20them.%20These%20are%20displayed%20in%20a%20single%20cell%20as%20text.%20I%20want%20to%20extract%20all%20hashtags%20and%20tags%20in%20this%20tweet%20and%20display%20them%20in%20their%20own%20cells.%20For%20example%3A%3C%2FSPAN%3E%3CBR%20%2F%3E%3CBR%20%2F%3E%3CSPAN%3E%22My%20bilateral%20programme%20will%20comprise%20meetings%20with%20PM%20%40GakhariaGiorgi%20and%20FM%20%40DZalkaliani%20as%20well%20with%20the%20chairman%20of%20the%26nbsp%3B%3C%2FSPAN%3E%3CA%20href%3D%22https%3A%2F%2Fwww.mrexcel.com%2Fforum%2Fusertag.php%3Fdo%3Dlist%26amp%3Baction%3Dhash%26amp%3Bhash%3DParliament%22%20target%3D%22_blank%22%20rel%3D%22nofollow%20noopener%20noopener%20noreferrer%20noopener%20noreferrer%20noopener%20noreferrer%22%3E%23Parliament%3C%2FA%3E%3CSPAN%3E%26nbsp%3Bof%26nbsp%3B%3C%2FSPAN%3E%3CA%20href%3D%22https%3A%2F%2Fwww.mrexcel.com%2Fforum%2Fusertag.php%3Fdo%3Dlist%26amp%3Baction%3Dhash%26amp%3Bhash%3DGeorgia%22%20target%3D%22_blank%22%20rel%3D%22nofollow%20noopener%20noopener%20noreferrer%20noopener%20noreferrer%20noopener%20noreferrer%22%3E%23Georgia%3C%2FA%3E%3CSPAN%3E%26nbsp%3B%22%20would%20give%22%20-%20%40GakhariaGiorgi%20%40DZalkaliani%26nbsp%3B%3C%2FSPAN%3E%3CA%20href%3D%22https%3A%2F%2Fwww.mrexcel.com%2Fforum%2Fusertag.php%3Fdo%3Dlist%26amp%3Baction%3Dhash%26amp%3Bhash%3DParliament%22%20target%3D%22_blank%22%20rel%3D%22nofollow%20noopener%20noopener%20noreferrer%20noopener%20noreferrer%20noopener%20noreferrer%22%3E%23Parliament%3C%2FA%3E%3CSPAN%3E%26nbsp%3B%3C%2FSPAN%3E%3CA%20href%3D%22https%3A%2F%2Fwww.mrexcel.com%2Fforum%2Fusertag.php%3Fdo%3Dlist%26amp%3Baction%3Dhash%26amp%3Bhash%3DGeorgia%22%20target%3D%22_blank%22%20rel%3D%22nofollow%20noopener%20noopener%20noreferrer%20noopener%20noreferrer%20noopener%20noreferrer%22%3E%23Georgia%3C%2FA%3E%3CSPAN%3E%26nbsp%3B.%20Preferably%20all%20in%20their%20own%20cell.%3C%2FSPAN%3E%3CBR%20%2F%3E%3CBR%20%2F%3E%3CSPAN%3EI%20have%20already%20tried%20the%20following%20code%2C%20but%20this%20gave%20me%20an%20error.%26nbsp%3B%3C%2FSPAN%3E%3CA%20href%3D%22https%3A%2F%2Fwww.mrexcel.com%2Fforum%2Fexcel-questions%2F672138-extracting-mentions-hashtags-column-columns-b-c.html%22%20target%3D%22_blank%22%20rel%3D%22noopener%20nofollow%20noopener%20noreferrer%20noopener%20noreferrer%20noopener%20noreferrer%22%3Ehttps%3A%2F%2Fwww.mrexcel.com%2Fforum%2Fexcel-...lumns-b-c.html%3C%2FA%3E%3CBR%20%2F%3E%3CBR%20%2F%3E%3CSPAN%3EAny%20help%20would%20be%20very%20much%20appreciated.%3C%2FSPAN%3E%3C%2FP%3E%3C%2FLINGO-BODY%3E%3CLINGO-LABS%20id%3D%22lingo-labs-870856%22%20slang%3D%22en-US%22%3E%3CLINGO-LABEL%3EExcel%20on%20Mac%3C%2FLINGO-LABEL%3E%3CLINGO-LABEL%3EMacros%20and%20VBA%3C%2FLINGO-LABEL%3E%3C%2FLINGO-LABS%3E%3CLINGO-SUB%20id%3D%22lingo-sub-871131%22%20slang%3D%22en-US%22%3ERe%3A%20Extract%20%23%20and%20%40%20from%20tweet%3C%2FLINGO-SUB%3E%3CLINGO-BODY%20id%3D%22lingo-body-871131%22%20slang%3D%22en-US%22%3E%3CP%3E%3CA%20href%3D%22https%3A%2F%2Ftechcommunity.microsoft.com%2Ft5%2Fuser%2Fviewprofilepage%2Fuser-id%2F413809%22%20target%3D%22_blank%22%3E%40Marlonvandijk%3C%2FA%3E%26nbsp%3B%3C%2FP%3E%0A%3CP%3EI%20tried%20that%20with%20Power%20Query.%20If%20we%20add%20function%20which%20extracts%20words%20between%20%22%40%22%20or%20%22%23%22%20and%20end%20of%20the%20word%3C%2FP%3E%0A%3CPRE%20class%3D%22lia-code-sample%20language-markup%22%3E%3CCODE%3E(Source%20as%20table%2C%20Delimiter%20as%20text%2C%20ResultColumnName%20as%20text)%20as%20table%20%3D%26gt%3B%0A%0Alet%0A%20%20%20%20EndWordCharacters%20%3D%20%7B%22%20%22%2C%22%2C%22%2C%22.%22%2C%22%3A%22%2C%22%3B%22%2C%22!%22%7D%2C%0A%0A%20%20%20%20%2F%2FSource%20%3D%20Tweets%2C%0A%20%20%20%20SplitByFirstDelimeter%20%3D%20Table.AddColumn(%0A%20%20%20%20%20%20%20%20Source%2C%0A%20%20%20%20%20%20%20%20%22Custom%22%2C%0A%20%20%20%20%20%20%20%20each%20List.RemoveFirstN(%20Text.Split(%5BTweets%5D%2CDelimiter)%20%2C1)%20)%2C%0A%20%20%20%20CutAfterWordEnd%20%3D%20Table.AddColumn(%0A%20%20%20%20%20%20%20%20SplitByFirstDelimeter%2C%0A%20%20%20%20%20%20%20%20ResultColumnName%2C%0A%20%20%20%20%20%20%20%20each%20List.Transform(%5BCustom%5D%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20each%20Delimiter%20%26amp%3B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20Text.Start(_%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20if%20Text.PositionOfAny(_%2CEndWordCharacters)%20%3D-1%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20then%20Text.Length(_)%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20else%20Text.PositionOfAny(_%2CEndWordCharacters)%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20)%0A%20%20%20%20%20%20%20%20)%0A%20%20%20%20)%2C%0A%20%20%20%20ExtractValues%20%3D%20Table.TransformColumns(%0A%20%20%20%20%20%20%20%20CutAfterWordEnd%2C%0A%20%20%20%20%20%20%20%20%7BResultColumnName%2C%0A%20%20%20%20%20%20%20%20each%20Text.Combine(%20List.Transform(_%2C%20Text.From)%2C%20%22%2C%20%22)%2C%0A%20%20%20%20%20%20%20%20%20%20%20%20type%20text%0A%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20)%2C%0A%20%20%20%20RemoveColumns%20%3D%20Table.RemoveColumns(%0A%20%20%20%20%20%20%20%20ExtractValues%2C%0A%20%20%20%20%20%20%20%20%7B%22Custom%22%7D%0A%20%20%20%20)%0Ain%0A%20%20%20%20RemoveColumns%3C%2FCODE%3E%3C%2FPRE%3E%0A%3CP%3Eresult%20could%20be%20received%20as%3C%2FP%3E%0A%3CPRE%20class%3D%22lia-code-sample%20language-markup%22%3E%3CCODE%3Elet%0A%20%20%20%20Source%20%3D%20Excel.CurrentWorkbook()%7B%5BName%3D%22Tweets%22%5D%7D%5BContent%5D%2C%0A%20%20%20%20AddMentions%20%3D%20ExtractTags(Source%2C%22%40%22%2C%22Mentions%22)%2C%0A%20%20%20%20AddHashtags%20%3D%20ExtractTags(AddMentions%2C%22%23%22%2C%22Hashtags%22)%0Ain%0A%20%20%20%20AddHashtags%3C%2FCODE%3E%3C%2FPRE%3E%0A%3CP%3Ein%20form%20of%3C%2FP%3E%0A%3CP%3E%3CSPAN%20class%3D%22lia-inline-image-display-wrapper%20lia-image-align-inline%22%20style%3D%22width%3A%20999px%3B%22%3E%3CIMG%20src%3D%22https%3A%2F%2Fgxcuf89792.i.lithium.com%2Ft5%2Fimage%2Fserverpage%2Fimage-id%2F133212iE8503FFB527B0CA8%2Fimage-size%2Flarge%3Fv%3D1.0%26amp%3Bpx%3D999%22%20alt%3D%22image.png%22%20title%3D%22image.png%22%20%2F%3E%3C%2FSPAN%3E%3C%2FP%3E%0A%3CP%3EIf%20each%20mention%2Fhashtag%20put%20into%20separate%20cell%2C%20when%20just%20split%20columns%20by%20comma%20in%20final%20result.%3C%2FP%3E%3C%2FLINGO-BODY%3E
Highlighted
Marlonvandijk
Occasional Visitor

Hello,

I have a large dataset with tweets in them. These are displayed in a single cell as text. I want to extract all hashtags and tags in this tweet and display them in their own cells. For example:

"My bilateral programme will comprise meetings with PM @GakhariaGiorgi and FM @DZalkaliani as well with the chairman of the #Parliament of #Georgia " would give" - @GakhariaGiorgi @DZalkaliani #Parliament #Georgia . Preferably all in their own cell.

I have already tried the following code, but this gave me an error. https://www.mrexcel.com/forum/excel-...lumns-b-c.html

Any help would be very much appreciated.

1 Reply

@Marlonvandijk 

I tried that with Power Query. If we add function which extracts words between "@" or "#" and end of the word

(Source as table, Delimiter as text, ResultColumnName as text) as table =>

let
    EndWordCharacters = {" ",",",".",":",";","!"},

    //Source = Tweets,
    SplitByFirstDelimeter = Table.AddColumn(
        Source,
        "Custom",
        each List.RemoveFirstN( Text.Split([Tweets],Delimiter) ,1) ),
    CutAfterWordEnd = Table.AddColumn(
        SplitByFirstDelimeter,
        ResultColumnName,
        each List.Transform([Custom],
            each Delimiter &
                Text.Start(_,
                    if Text.PositionOfAny(_,EndWordCharacters) =-1
                    then Text.Length(_)
                    else Text.PositionOfAny(_,EndWordCharacters)
                )
        )
    ),
    ExtractValues = Table.TransformColumns(
        CutAfterWordEnd,
        {ResultColumnName,
        each Text.Combine( List.Transform(_, Text.From), ", "),
            type text
        }
    ),
    RemoveColumns = Table.RemoveColumns(
        ExtractValues,
        {"Custom"}
    )
in
    RemoveColumns

result could be received as

let
    Source = Excel.CurrentWorkbook(){[Name="Tweets"]}[Content],
    AddMentions = ExtractTags(Source,"@","Mentions"),
    AddHashtags = ExtractTags(AddMentions,"#","Hashtags")
in
    AddHashtags

in form of

image.png

If each mention/hashtag put into separate cell, when just split columns by comma in final result.

Related Conversations
Extract data from multiple sheet into one with conditions
TheCarb1 in Excel on
4 Replies
Data Extraction
Seth Pontiff in Excel on
6 Replies