Forum Discussion
anupambit1797
Dec 04, 2025
Iron Contributor
How to write a script or any PQ or in Excel to download the zip files from a Webpage
Dear Experts,
Greetings!
https://www.etsi.org/deliver/etsi_ts/138300_138399/138306/
Could you please help me download the pdf.zip files from the page above, for all the versions?
Ideally using a single command in Excel or a Power Query option.
Thanks in Advance,
Br,
Anupam
1 Reply
- Lorenzo, Silver Contributor
Hi
It's almost all about parsing the HTML code and transforming it into tables (https://learn.microsoft.com/en-us/powerquery-m/html-table). Note that a couple of times I got the error "Unable to connect..."
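As a minimal sketch of the Html.Table idea, assuming a made-up HTML snippet loosely shaped like a directory listing (the markup and the column name "Link" are illustrative, not taken from the ETSI page), each link's href can be pulled into a table column like this:

```powerquery
let
    // Hypothetical sample markup; a real listing page will differ
    SampleHtml =
        "<pre><a href=""/deliver/"">[To Parent Directory]</a><br>"
        & "<a href=""/deliver/v1/"">v1</a><br></pre>",
    // One column named "Link": CSS selector "a",
    // with the cell value read from the element's href attribute
    Links = Html.Table(
        SampleHtml,
        {{"Link", "a", each [Attributes][href]}}
    )
in
    Links
```

The third item of each column definition is an optional function applied to the matched element; without it, Html.Table returns the element's text instead of an attribute.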
I have no idea what you want to do with the content of each PDF, so the query below stops after getting the content of each file.
Power Query:
let
    Source = Web.BrowserContents(
        "https://www.etsi.org/deliver/etsi_ts/138300_138399/138306/"
    ),
    HtmlTextToTable = #table(
        type table [HtmlText = Text.Type],
        {{Source}}
    ),
    SelectedTextBetweenPreTags = Table.AddColumn(
        HtmlTextToTable,
        "BetweenPreTags",
        each Text.BetweenDelimiters([HtmlText], "<pre>", "</pre>")
    ),
    RemovedHtmlTextColumn = Table.SelectColumns(
        SelectedTextBetweenPreTags,
        {"BetweenPreTags"}
    ),
    RemovedDoubleQuotes = Table.ReplaceValue(
        RemovedHtmlTextColumn,
        """",
        "",
        Replacer.ReplaceText,
        {"BetweenPreTags"}
    ),
    PdfParentLink = Table.AddColumn(
        RemovedDoubleQuotes,
        "PdfParentLink",
        each
            let
                tableFromHtml = Html.Table(
                    [BetweenPreTags],
                    {{"ParentLink", "a", each "www.etsi.org" & [Attributes][href]}}
                )
            in
                // The root directory doesn't contain any file ==> skip the 1st record
                Table.Skip(tableFromHtml, 1),
        Table.Type
    ),
    RemovedOtherColumn = Table.SelectColumns(
        PdfParentLink,
        {"PdfParentLink"}
    ),
    ExpandedPdfParentLink = Table.ExpandTableColumn(
        RemovedOtherColumn,
        "PdfParentLink",
        {"ParentLink"}
    ),
    // There seems to be a single file per directory...
    PdfFileName = Table.AddColumn(
        ExpandedPdfParentLink,
        "PdfName",
        each
            let
                webContent = Web.BrowserContents([ParentLink]),
                betweenHrefTag1 = Text.BetweenDelimiters(webContent, "<a href=", "</a>", 1)
            in
                Text.AfterDelimiter(betweenHrefTag1, ">", {0, RelativePosition.FromEnd}),
        Text.Type
    ),
    PdfContents = Table.AddColumn(
        PdfFileName,
        "PdfContents",
        each Pdf.Tables(
            Web.Contents([ParentLink] & [PdfName]),
            [Implementation = "1.3"]
        ),
        Table.Type
    )
in
    PdfContents
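To make the PdfName step easier to follow, here is a small standalone sketch of the same two text functions applied to a made-up string (the markup and the file name file_v1.pdf are hypothetical, chosen only for illustration). It assumes, as the query does, that the first link on the page is the parent-directory link and the second one is the file:

```powerquery
let
    // Hypothetical page content: a parent-directory link followed by one file link
    SampleHtml =
        "<a href=""/up/"">[To Parent Directory]</a><br>"
        & "<a href=""/dir/file_v1.pdf"">file_v1.pdf</a>",
    // Index 1 skips the first "<a href=" occurrence and takes the second anchor,
    // giving the text between "<a href=" and the next "</a>"
    SecondAnchor = Text.BetweenDelimiters(SampleHtml, "<a href=", "</a>", 1),
    // Everything after the last ">" is the visible link text, i.e. the file name
    FileName = Text.AfterDelimiter(SecondAnchor, ">", {0, RelativePosition.FromEnd})
in
    FileName
```

With this sample input the result should be the text file_v1.pdf. If a directory ever holds more than one file, this approach only picks up the first one.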