anupambit1797
Dec 04, 2025 (Iron Contributor)
How to write a script or any PQ or in Excel to download the zip files from a Webpage
Dear Experts, Greetings! https://www.etsi.org/deliver/etsi_ts/138300_138399/138306/ Could you please help me on how to download the pdf.zip files from above for all the ver...
Lorenzo
Dec 05, 2025 (Silver Contributor)
Hi
This is almost entirely a matter of parsing the page's HTML and transforming it into tables with Html.Table (https://learn.microsoft.com/en-us/powerquery-m/html-table). Note that I got the error "Unable to connect..." a couple of times.
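To illustrate the HTML-to-table idea on its own, here is a minimal sketch rather than part of the actual query. Html.Table takes the HTML text plus a list of {column name, CSS selector, optional transform} entries. The sample markup is made up and only loosely imitates a directory listing, and the step names are hypothetical.

```
let
    // Hypothetical sample markup: one parent-directory link, then two version links
    SampleHtml = "<pre><a href=""/deliver/"">[To Parent Directory]</a><br>"
        & "<a href=""/deliver/v1/"">v1</a><br>"
        & "<a href=""/deliver/v2/"">v2</a></pre>",
    // One row per <a> element; the column holds the value of its href attribute
    Links = Html.Table( SampleHtml, {{"Link", "a", each [Attributes][href]}} ),
    // Drop the first row, i.e. the link back to the parent directory
    WithoutParent = Table.Skip( Links, 1 )
in
    WithoutParent
```

If Html.Table behaves here as it does in the documentation example, this should return a one-column table holding the two version links, with the parent-directory link skipped.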
I have no idea what you want to do with the content of each PDF, so the query below stops after getting the content of each file.
Power Query:
let
    // Load the rendered HTML of the directory listing page
    Source = Web.BrowserContents( "https://www.etsi.org/deliver/etsi_ts/138300_138399/138306/" ),
    HtmlTextToTable = #table( type table [HtmlText = Text.Type],
        {{Source}}
    ),
    // Keep only the listing itself, which sits between the <pre> tags
    SelectedTextBetweenPreTags = Table.AddColumn( HtmlTextToTable, "BetweenPreTags", each
        Text.BetweenDelimiters( [HtmlText], "<pre>", "</pre>" )
    ),
    RemovedHtmlTextColumn = Table.SelectColumns( SelectedTextBetweenPreTags, {"BetweenPreTags"} ),
    RemovedDoubleQuotes = Table.ReplaceValue( RemovedHtmlTextColumn, """", "",
        Replacer.ReplaceText, {"BetweenPreTags"}
    ),
    // One row per <a> element: build the absolute URL of each version directory
    PdfParentLink = Table.AddColumn( RemovedDoubleQuotes, "PdfParentLink", each
        let
            // The scheme is included so that the later web calls receive an absolute URL
            tableFromHtml = Html.Table( [BetweenPreTags], {{"ParentLink", "a", each "https://www.etsi.org" & [Attributes][href]}} )
        in
            // Root Directory doesn't contain any file ==> Skip 1st record
            Table.Skip( tableFromHtml, 1 ),
        Table.Type
    ),
    RemovedOtherColumn = Table.SelectColumns( PdfParentLink, {"PdfParentLink"} ),
    ExpandedPdfParentLink = Table.ExpandTableColumn( RemovedOtherColumn, "PdfParentLink", {"ParentLink"} ),
    // There seems to be a single file per Directory...
    PdfFileName = Table.AddColumn( ExpandedPdfParentLink, "PdfName", each
        let
            webContent = Web.BrowserContents( [ParentLink] ),
            // Occurrence index 1 = second link on the page (the first one points to the parent directory)
            betweenHrefTag1 = Text.BetweenDelimiters( webContent, "<a href=", "</a>", 1 )
        in
            // The file name is the link text, i.e. everything after the last ">"
            Text.AfterDelimiter( betweenHrefTag1, ">", {0, RelativePosition.FromEnd} ),
        Text.Type
    ),
    // Download each PDF and parse it into tables
    PdfContents = Table.AddColumn( PdfFileName, "PdfContents", each
        Pdf.Tables( Web.Contents( [ParentLink] & [PdfName] ), [Implementation = "1.3"] ),
        Table.Type
    )
in
    PdfContents
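
The file-name step (PdfName) relies on two text functions and can be tried in isolation. This is a minimal sketch, assuming a made-up page with one parent-directory link followed by a single file link; the sample HTML and the step names are hypothetical.

```
let
    // Hypothetical sample markup: parent-directory link first, then one file link
    SampleHtml = "<pre><a href=""/deliver/"">[To Parent Directory]</a><br>"
        & "<a href=""/deliver/v1/spec.pdf"">spec.pdf</a></pre>",
    // Occurrence index is zero-based: 1 selects the second "<a href=", skipping the parent link
    SecondAnchor = Text.BetweenDelimiters( SampleHtml, "<a href=", "</a>", 1 ),
    // Keep the text after the last ">", which is the visible link text
    FileName = Text.AfterDelimiter( SecondAnchor, ">", {0, RelativePosition.FromEnd} )
in
    FileName
```

With this sample the result should be spec.pdf. Because only the second link is read, this approach picks up a single file per directory. If a directory ever holds several files, the links would need to be listed with Html.Table instead, as is done for the version directories.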