Forum Discussion
Autofill YouTube data
- Feb 08, 2020
Yes, that could be done. You need to pickup metadata from the page and extract proper content. In Google Sheets exists the function IMPORTXML() which can do that directly. There is no such function in Excel, but you could find VBA equivalents in internet. Combine one with another it's possible to build VBA solution.
Perhaps another variant in Excel is combination of WEBSERVICE() and FILTERXML(), nut sure - didn't play with them.
One more variant is Power Query - to query table with URL:s, load web content as table of texts, parse it and extract required metadata.
As a sample, let use two functions
to extract Title
(URL as text) => let //URL = "https://www.youtube.com/watch?v=xy9nmSQeUWg", GetPage = Web.Contents(URL), PageAsText = Table.FromColumns({Lines.FromBinary(GetPage,null,null,65001)}), FiltereMetaName = Table.SelectRows( PageAsText, each Text.Contains([Column1], "<meta name=""title"" content=") ), SplitByMetaName = Table.SplitColumn( FiltereMetaName, "Column1", Splitter.SplitTextByDelimiter("<meta name=""title"" content=""", QuoteStyle.None), {"Column1.1", "Title"} ), RemoveUnused = Table.RemoveColumns( SplitByMetaName, {"Column1.1"} ), GetTitle = Table.ReplaceValue( RemoveUnused, """>","", Replacer.ReplaceText,{"Title"} ){0}[Title] in GetTitle
to extract Duration
(URL as text) => let //URL = "https://www.youtube.com/watch?v=xy9nmSQeUWg", GetPage = Web.Contents(URL), PageAsText = Table.FromColumns({Lines.FromBinary(GetPage,null,null,65001)}), FiltereMetaName = Table.SelectRows( PageAsText, each Text.Contains([Column1], "<meta itemprop=""duration"" content=") ), TrimmIt = Table.TransformColumns( FiltereMetaName, {{"Column1", Text.Trim, type text}} ), SplitbyMeta = Table.SplitColumn( TrimmIt, "Column1", Splitter.SplitTextByDelimiter("<meta itemprop=""duration"" content=""PT", QuoteStyle.None), {"Column1.1", "Duration"} ), RemoveUnused = Table.RemoveColumns( SplitbyMeta, {"Column1.1"} ), ReplaceHr = Table.ReplaceValue( RemoveUnused, "H","", Replacer.ReplaceText,{"Duration"} ), ReplaceMin = Table.ReplaceValue( ReplaceHr, "M","", Replacer.ReplaceText,{"Duration"} ), CleanDuration = Table.ReplaceValue( ReplaceMin, "S"">","", Replacer.ReplaceText,{"Duration"} ), Duration = Time.FromText( Text.PadStart(CleanDuration{0}[Duration],6,"0") ) in Duration
With them main query could be
let Source = Excel.CurrentWorkbook(){[Name="tblVideos"]}[Content], AddTitle = Table.AddColumn( Source, "Title", each fnGetVideoTitle([URL]) ), AddDuration = Table.AddColumn( AddTitle, "Duration", each fnGetVideoDuration([URL]) ) in AddDuration
Result is like
Minus is that's separate table with not clickable URL and you shall not to forget refresh it with adding new data.
Perhaps, but I'm not working with VBA now.