Forum Discussion
Help with Queries and Connections, Data from Web from multiple URLs
Hello Sergei,
I had no idea how "in over my head" I would be getting. You just did what the IT guys said couldn't be done.
But here is the issue I face: Almost every table embedded in the (now much longer) list of links on the updated sample fileI I have attached contains columns with different names, or even a different number of columns.
I should have provided with that varied sampling before. I tried to figure out on my own the logical steps you took but the variable of different column names and different number of coulmns have me at a loss.
Would you be willing to take another look at the file and teach me what I need to do to address the above? I am looking to have all of the tables contained in the links combined into one table. If not possible, I can use multiple tables and will combine them manually if I have to.
I am really thankful for the time you took with me. I realize the level of Q&A's that go on in this forum and I know I need to take classes in Querys urgently! :)
Thanks again for your help.
Cheers,
Marcio
Hi Marcio,
Yes, that's bit complicated. In general tables on the web have three parts
- 1st which is Product Number and empty column
- 3rd which is Unix/box, Quantity and empty column
(above exists in each table in same format);
- 2nd which could have different column names, different number of columns and in some cases could be missed at all.
Moreover, some pages do not have tables, nothing to pick-up.
Perhaps the logic is more complex, but that what I seen. Based on this the function to transform web tables looks like
let Source = (link as text, name as text) => let Source = Web.Page(Web.Contents(link)), Data0 = Source{0}[Data], DataEnd = Table.SelectRows(Source, each ([ClassName] = "uk-table uk-text-nowrap")){0}[Data], DataMiddle = try Table.SelectRows(Source, each ([ClassName] = "uk-table")){0}[Data] otherwise Table.FromRows({}), ListOfColumns = List.Combine({ List.FirstN(Table.ColumnNames(Data0),2), List.FirstN(Table.ColumnNames(DataEnd),2), Table.ColumnNames(DataMiddle) }), RemoveLastEmpty = Table.RemoveColumns(Data0,{"2"}), ReorderAsInList = Table.ReorderColumns(RemoveLastEmpty,ListOfColumns), RenameFirstEmpty = Table.RenameColumns(ReorderAsInList,{{"", "Product Name"}}), ProductName = Table.ReplaceValue(RenameFirstEmpty,"",name,Replacer.ReplaceValue,{"Product Name"}), DemoteHeaders = Table.DemoteHeaders(ProductName), EmptyTable = Table.FromRows({}), Result = if Source{0}[Source] = "Service" then EmptyTable else DemoteHeaders in Result in Source
It has two branches of code - if web table exists we return transformed table, otherwise empty one.
Main query Links is practically the same, I only take as the source the table in your last excel sheet with links and Product names, named the table as "Products"
let Source = Excel.CurrentWorkbook(){[Name="Products"]}[Content], ToText = Table.TransformColumnTypes(Source,{{"Link", type text}, {"Product Name", type text}}), AddFromWeb = Table.AddColumn(ToText, "Tables", each fnTransfomLinkTable([Link], [Product Name])), RemoveUnused = Table.RemoveColumns(AddFromWeb,{"Link", "Product Name"}), ExpandWebTables = Table.ExpandTableColumn(RemoveUnused, "Tables", {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7"}, {"Product number", "Product Name", "Unit/box", "Quantity", "Text1", "Text2", "Text3"}), RemoveEmptyTables = Table.SelectRows(ExpandWebTables, each ([Product number] <> null)) in RemoveEmptyTables
As result all web tables are combined in one returned to Excel. Together with headers for each web table since they are different.
Alternatively that could be unpivoted table generated by slightly modified above queries. Result is ProductsUnpivoted sheet. But it takes much more time to be generated.
- Dale HohmMar 06, 2018Copper Contributor
Apropos to this question, check out
Deep Dive into Power Query Formula Language
-
May 14, 2014 at 6:03AMby https://channel9.msdn.com/Events/Speakers/Matt-Masson, https://channel9.msdn.com/Events/Speakers/theresa-palmer-boroski
Fast forward to about 41:30.
Dale
- SergeiBaklanMar 06, 2018Diamond Contributor
Hi Dale,
Is that something specific for this concrete task here?
- Dale HohmMar 06, 2018Copper ContributorSimilar to what you have created here, the specific reference demonstrates how to create a function and then iterate across a set of URLs to pull table information.
-