Forum Discussion
Marcio Flores
Feb 28, 2018Copper Contributor
Help with Queries and Connections, Data from Web from multiple URLs
Hello,
I have a list of about 75 webpages (within the same website) from which I need to extract an specific table on each of these pages. I have been doing this one by one, (please see file...
SergeiBaklan
Mar 01, 2018Diamond Contributor
Hi Marcio,
Didn't catch you'd like to combine all tables from web in one or keep them separate. If the first
- create the table with all your links
- based on your query create the function which extracts the table from web page
let
Source = (link as text, name as text) => let
Source = Web.Page(Web.Contents(link)),
Data0 = Source{0}[Data],
#"Changed Type" = Table.TransformColumnTypes(Data0,{{"Product number", type text}, {"", type text}, {"sheath diameter", type text}, {"length", type text}, {"approximate net size", type text}, {"Unit/box", Int64.Type}, {"Quantity", type text}, {"2", type text}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type",{{"", "Description"}}),
#"Replaced Value" = Table.ReplaceValue(#"Renamed Columns","",name,Replacer.ReplaceValue,{"Description"}),
#"Duplicated Column" = Table.DuplicateColumn(#"Replaced Value", "Description", "Description - Copy"),
#"Renamed Columns1" = Table.RenameColumns(#"Duplicated Column",{{"Description - Copy", "Product Name"}}),
#"Removed Columns" = Table.RemoveColumns(#"Renamed Columns1",{"Quantity", "2"}),
#"Reordered Columns" = Table.ReorderColumns(#"Removed Columns",{"Product number", "Description", "Unit/box", "sheath diameter", "length", "approximate net size", "Product Name"})
in
#"Reordered Columns"
in Source
- make query on your table with links
- in it add one more column with name of the product which is part of URL
- add another column with function above, parameters will be the link and name fields
- keep only latest column
- expand it - you'll have all your web tables combines
let
Source = Excel.CurrentWorkbook(){[Name="Links"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Links", type text}}),
#"Added Custom1" = Table.AddColumn(#"Changed Type", "Description", each [Links]),
#"Split Column by Delimiter" = Table.SplitColumn(#"Added Custom1", "Description", Splitter.SplitTextByDelimiter("/Products/", QuoteStyle.Csv), {"Description.1", "Description.2"}),
#"Removed Columns1" = Table.RemoveColumns(#"Split Column by Delimiter",{"Description.1"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns1",{{"Description.2", "Description"}}),
#"Replaced Value" = Table.ReplaceValue(#"Renamed Columns","-"," ",Replacer.ReplaceText,{"Description"}),
#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value",".aspx","",Replacer.ReplaceText,{"Description"}),
#"Added Custom" = Table.AddColumn(#"Replaced Value1", "Tables", each fnTransfomLinkTable([Links], [Description])),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Links", "Description"}),
#"Expanded Tables" = Table.ExpandTableColumn(#"Removed Columns", "Tables", {"Product number", "Description", "Unit/box", "sheath diameter", "length", "approximate net size", "Product Name"}, {"Product number", "Description", "Unit/box", "sheath diameter", "length", "approximate net size", "Product Name"})
in
#"Expanded Tables"
Please see file attached
Code could be optimized, it's too straight forward, just to illustrate an idea. Performance is not good, my guess it's mainly due to web connection