Forum Discussion

DataCruncher's avatar
DataCruncher
Copper Contributor
Feb 07, 2020

Autofill YouTube data

I would like to create a spreadsheet where I paste a YouTube link into a cell in Column A, and then Column B and C autofill with the video's title and duration, respectively.  Is this possible?
  • SergeiBaklan's avatar
    SergeiBaklan
    Feb 08, 2020

    DataCruncher 

    Yes, that could be done. You need to pickup metadata from the page and extract proper content. In Google Sheets exists the function IMPORTXML() which can do that directly. There is no such function in Excel, but you could find VBA equivalents in internet. Combine one with another it's possible to build VBA solution.

     

    Perhaps another variant in Excel is combination of WEBSERVICE() and FILTERXML(), nut sure - didn't play with them.

     

    One more variant is Power Query - to query table with URL:s, load web content as table of texts, parse it and extract required metadata.

    As a sample, let use two functions

    to extract Title

    (URL as text) =>
    let
        //URL = "https://www.youtube.com/watch?v=xy9nmSQeUWg",
        GetPage = Web.Contents(URL),
        PageAsText = Table.FromColumns({Lines.FromBinary(GetPage,null,null,65001)}),
        FiltereMetaName = Table.SelectRows(
            PageAsText,
             each Text.Contains([Column1], "<meta name=""title"" content=")
        ),
        SplitByMetaName = Table.SplitColumn(
            FiltereMetaName,
            "Column1",
            Splitter.SplitTextByDelimiter("<meta name=""title"" content=""", QuoteStyle.None),
            {"Column1.1", "Title"}
        ),
        RemoveUnused = Table.RemoveColumns(
            SplitByMetaName,
            {"Column1.1"}
        ),
        GetTitle = Table.ReplaceValue(
            RemoveUnused,
            """>","",
            Replacer.ReplaceText,{"Title"}
        ){0}[Title]
    in
        GetTitle

    to extract Duration

    (URL as text) =>
    
    let
        //URL = "https://www.youtube.com/watch?v=xy9nmSQeUWg",
        
        GetPage = Web.Contents(URL),
        PageAsText = Table.FromColumns({Lines.FromBinary(GetPage,null,null,65001)}),
        FiltereMetaName = Table.SelectRows(
            PageAsText,
             each Text.Contains([Column1], "<meta itemprop=""duration"" content=")
        ),
        TrimmIt = Table.TransformColumns(
            FiltereMetaName,
            {{"Column1", Text.Trim, type text}}
        ),
        SplitbyMeta = Table.SplitColumn(
            TrimmIt,
            "Column1",
            Splitter.SplitTextByDelimiter("<meta itemprop=""duration"" content=""PT", QuoteStyle.None),
            {"Column1.1", "Duration"}
        ),
        RemoveUnused = Table.RemoveColumns(
            SplitbyMeta,
            {"Column1.1"}
        ),
        ReplaceHr = Table.ReplaceValue(
            RemoveUnused,
            "H","",
            Replacer.ReplaceText,{"Duration"}
        ),
        ReplaceMin = Table.ReplaceValue(
            ReplaceHr,
            "M","",
            Replacer.ReplaceText,{"Duration"}
        ),
        CleanDuration = Table.ReplaceValue(
            ReplaceMin,
            "S"">","",
            Replacer.ReplaceText,{"Duration"}
        ),
        Duration = Time.FromText(
            Text.PadStart(CleanDuration{0}[Duration],6,"0")
        )
    in
        Duration

    With them main query could be

    let
        Source = Excel.CurrentWorkbook(){[Name="tblVideos"]}[Content],
        AddTitle = Table.AddColumn(
            Source,
            "Title",
            each fnGetVideoTitle([URL])
        ),
        AddDuration = Table.AddColumn(
            AddTitle,
            "Duration",
            each fnGetVideoDuration([URL])
        )
    in
        AddDuration

    Result is like

    Minus is that's separate table with not clickable URL and you shall not to forget refresh it with adding new data.

Resources