Forum Discussion
YeuHarng
Sep 07, 2023 | Brass Contributor
Scroll Down WebPage until End
Hi, I have a problem using PowerShell to scroll down a webpage. I need to download the PDF files from a website, but if the page isn't scrolled to the end, only the first 40 PDF files are downloaded. If I add the Selenium WebDriver import before my code, will that work? Is there any other way to solve this? I'm also working over a Remote Desktop connection.
Link: https://www.fbi.gov/wanted/cyber
- LainRobertson | Silver Contributor
Hi, Yeu.
You can't "scroll down" in a native PowerShell script, but this doesn't mean you need to resort to a tool like Selenium, either.
The FBI has its own REST API, which is sufficient for what you're trying to do (noting you have a second thread going on this topic). However, the cyber category isn't individually searchable; it seems cyber records can only be obtained through the default category.
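If you want to sanity-check the API first, here's a minimal sketch using Invoke-RestMethod; the poster_classification, pageSize and page parameter names are simply the ones used in the full script below, so treat them as assumptions rather than verified API documentation.

# Minimal sketch: request one page for the "default" classification and report how many records exist in total.
$Response = Invoke-RestMethod -Method Get -Uri "https://api.fbi.gov/@wanted?poster_classification=default&pageSize=20&page=1" -ContentType "application/json";
"Total records: $($Response.total); items returned on this page: $($Response.items.Count)";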
Anyhow, here's an example script; you can add further filtering to it if you so desire.
Note: You will need to install the ThreadJob PowerShell module for this example, as noted in the comments at the top of the script.
Example
# This script leverages the FBI REST API described at:
# https://api.fbi.gov/docs#!/Wanted/get_wanted
#
# This script also relies on the ThreadJob module (for downloading files in parallel) being installed from the official PSGallery repository:
# Install-Module -Name ThreadJob -Scope AllUsers;

$SaveLocation = "D:\Data\Temp\Forum\fbi\";

# Do not set PageSize too high as this also relates to how many concurrent downloads will be kicked off.
$PageSize = 20;
$Page = 1;
$UriBase = "https://api.fbi.gov/@wanted";
$Category = "default";

while (0 -ne ($Response = Invoke-RestMethod -Method Get -Uri "$UriBase`?poster_classification=$Category&pageSize=$PageSize&page=$Page" -ContentType "application/json" -UseBasicParsing -ErrorAction:Stop).total)
{
    $Downloads = @();
    $Job = 0;

    try
    {
        foreach ($Url in ($Response.Items.Files | Where-Object { ($_.Name -eq "English") -and ($_.url.EndsWith(".pdf")) }).url)
        {
            $FileName = "$SaveLocation$(($Parts = $Url.Split("/"))[-3])_$($Parts[-2]).pdf";
            $Downloads += Start-ThreadJob -Name "fbi$($Job.ToString('X3'))" -ScriptBlock {
                Invoke-WebRequest -Uri $using:Url -OutFile $using:FileName -ErrorAction:Stop;
            };
            $Job++;
        }

        Wait-Job -Job $Downloads | Out-Null;
        Receive-Job -Job $Downloads -ErrorAction:Continue | Out-Null;
        Remove-Job -Job $Downloads -ErrorAction:SilentlyContinue -Force;
        $Downloads.Dispose();
    }
    catch
    {
        throw;
    }
    finally
    {
        $Page++;
    }
}
Edited: Removed the two-page limit I'd been using for testing (there are nearly 1,000 records in the default category).
Cheers,
Lain
- YeuHarng | Brass Contributor
So do I need to import the ThreadJob module?
- LainRobertson | Silver Contributor
The command for installing the ThreadJob module (which is a first-party Microsoft module) can be seen on line five of the script.
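For convenience, a minimal sketch of that step is below; whether you use -Scope AllUsers (as in the script's comment) or CurrentUser depends on your permissions, so treat the scope shown here as an assumption.

# Install the ThreadJob module from the PSGallery (run once).
Install-Module -Name ThreadJob -Scope CurrentUser;

# An explicit Import-Module is normally unnecessary: PowerShell auto-loads installed modules
# the first time Start-ThreadJob is called, though importing explicitly does no harm.
Import-Module -Name ThreadJob;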
I cannot help with Selenium at all as I do not work with it.
Cheers,
Lain