SOLVED

Autofill YouTube data

Copper Contributor

I would like to create a spreadsheet where I paste a YouTube link into a cell in Column A, and then Column B and C autofill with the video's title and duration, respectively.  Is this possible?

4 Replies
Hello,you have to type in the video Title and Duration into columns B and C..
Can this be done automatically with a macro or some code?
best response confirmed by DataCruncher (Copper Contributor)
Solution

@DataCruncher 

Yes, that could be done. You need to pickup metadata from the page and extract proper content. In Google Sheets exists the function IMPORTXML() which can do that directly. There is no such function in Excel, but you could find VBA equivalents in internet. Combine one with another it's possible to build VBA solution.

 

Perhaps another variant in Excel is combination of WEBSERVICE() and FILTERXML(), nut sure - didn't play with them.

 

One more variant is Power Query - to query table with URL:s, load web content as table of texts, parse it and extract required metadata.

As a sample, let use two functions

to extract Title

(URL as text) =>
let
    //URL = "https://www.youtube.com/watch?v=xy9nmSQeUWg",
    GetPage = Web.Contents(URL),
    PageAsText = Table.FromColumns({Lines.FromBinary(GetPage,null,null,65001)}),
    FiltereMetaName = Table.SelectRows(
        PageAsText,
         each Text.Contains([Column1], "<meta name=""title"" content=")
    ),
    SplitByMetaName = Table.SplitColumn(
        FiltereMetaName,
        "Column1",
        Splitter.SplitTextByDelimiter("<meta name=""title"" content=""", QuoteStyle.None),
        {"Column1.1", "Title"}
    ),
    RemoveUnused = Table.RemoveColumns(
        SplitByMetaName,
        {"Column1.1"}
    ),
    GetTitle = Table.ReplaceValue(
        RemoveUnused,
        """>","",
        Replacer.ReplaceText,{"Title"}
    ){0}[Title]
in
    GetTitle

to extract Duration

(URL as text) =>

let
    //URL = "https://www.youtube.com/watch?v=xy9nmSQeUWg",
    
    GetPage = Web.Contents(URL),
    PageAsText = Table.FromColumns({Lines.FromBinary(GetPage,null,null,65001)}),
    FiltereMetaName = Table.SelectRows(
        PageAsText,
         each Text.Contains([Column1], "<meta itemprop=""duration"" content=")
    ),
    TrimmIt = Table.TransformColumns(
        FiltereMetaName,
        {{"Column1", Text.Trim, type text}}
    ),
    SplitbyMeta = Table.SplitColumn(
        TrimmIt,
        "Column1",
        Splitter.SplitTextByDelimiter("<meta itemprop=""duration"" content=""PT", QuoteStyle.None),
        {"Column1.1", "Duration"}
    ),
    RemoveUnused = Table.RemoveColumns(
        SplitbyMeta,
        {"Column1.1"}
    ),
    ReplaceHr = Table.ReplaceValue(
        RemoveUnused,
        "H","",
        Replacer.ReplaceText,{"Duration"}
    ),
    ReplaceMin = Table.ReplaceValue(
        ReplaceHr,
        "M","",
        Replacer.ReplaceText,{"Duration"}
    ),
    CleanDuration = Table.ReplaceValue(
        ReplaceMin,
        "S"">","",
        Replacer.ReplaceText,{"Duration"}
    ),
    Duration = Time.FromText(
        Text.PadStart(CleanDuration{0}[Duration],6,"0")
    )
in
    Duration

With them main query could be

let
    Source = Excel.CurrentWorkbook(){[Name="tblVideos"]}[Content],
    AddTitle = Table.AddColumn(
        Source,
        "Title",
        each fnGetVideoTitle([URL])
    ),
    AddDuration = Table.AddColumn(
        AddTitle,
        "Duration",
        each fnGetVideoDuration([URL])
    )
in
    AddDuration

Result is like

image.png

Minus is that's separate table with not clickable URL and you shall not to forget refresh it with adding new data.

awesome, thanks!
1 best response

Accepted Solutions
best response confirmed by DataCruncher (Copper Contributor)
Solution

@DataCruncher 

Yes, that could be done. You need to pickup metadata from the page and extract proper content. In Google Sheets exists the function IMPORTXML() which can do that directly. There is no such function in Excel, but you could find VBA equivalents in internet. Combine one with another it's possible to build VBA solution.

 

Perhaps another variant in Excel is combination of WEBSERVICE() and FILTERXML(), nut sure - didn't play with them.

 

One more variant is Power Query - to query table with URL:s, load web content as table of texts, parse it and extract required metadata.

As a sample, let use two functions

to extract Title

(URL as text) =>
let
    //URL = "https://www.youtube.com/watch?v=xy9nmSQeUWg",
    GetPage = Web.Contents(URL),
    PageAsText = Table.FromColumns({Lines.FromBinary(GetPage,null,null,65001)}),
    FiltereMetaName = Table.SelectRows(
        PageAsText,
         each Text.Contains([Column1], "<meta name=""title"" content=")
    ),
    SplitByMetaName = Table.SplitColumn(
        FiltereMetaName,
        "Column1",
        Splitter.SplitTextByDelimiter("<meta name=""title"" content=""", QuoteStyle.None),
        {"Column1.1", "Title"}
    ),
    RemoveUnused = Table.RemoveColumns(
        SplitByMetaName,
        {"Column1.1"}
    ),
    GetTitle = Table.ReplaceValue(
        RemoveUnused,
        """>","",
        Replacer.ReplaceText,{"Title"}
    ){0}[Title]
in
    GetTitle

to extract Duration

(URL as text) =>

let
    //URL = "https://www.youtube.com/watch?v=xy9nmSQeUWg",
    
    GetPage = Web.Contents(URL),
    PageAsText = Table.FromColumns({Lines.FromBinary(GetPage,null,null,65001)}),
    FiltereMetaName = Table.SelectRows(
        PageAsText,
         each Text.Contains([Column1], "<meta itemprop=""duration"" content=")
    ),
    TrimmIt = Table.TransformColumns(
        FiltereMetaName,
        {{"Column1", Text.Trim, type text}}
    ),
    SplitbyMeta = Table.SplitColumn(
        TrimmIt,
        "Column1",
        Splitter.SplitTextByDelimiter("<meta itemprop=""duration"" content=""PT", QuoteStyle.None),
        {"Column1.1", "Duration"}
    ),
    RemoveUnused = Table.RemoveColumns(
        SplitbyMeta,
        {"Column1.1"}
    ),
    ReplaceHr = Table.ReplaceValue(
        RemoveUnused,
        "H","",
        Replacer.ReplaceText,{"Duration"}
    ),
    ReplaceMin = Table.ReplaceValue(
        ReplaceHr,
        "M","",
        Replacer.ReplaceText,{"Duration"}
    ),
    CleanDuration = Table.ReplaceValue(
        ReplaceMin,
        "S"">","",
        Replacer.ReplaceText,{"Duration"}
    ),
    Duration = Time.FromText(
        Text.PadStart(CleanDuration{0}[Duration],6,"0")
    )
in
    Duration

With them main query could be

let
    Source = Excel.CurrentWorkbook(){[Name="tblVideos"]}[Content],
    AddTitle = Table.AddColumn(
        Source,
        "Title",
        each fnGetVideoTitle([URL])
    ),
    AddDuration = Table.AddColumn(
        AddTitle,
        "Duration",
        each fnGetVideoDuration([URL])
    )
in
    AddDuration

Result is like

image.png

Minus is that's separate table with not clickable URL and you shall not to forget refresh it with adding new data.

View solution in original post