Sep 06 2023 11:45 PM - edited Sep 07 2023 12:26 AM
Hi, I'm having trouble using PowerShell to scroll down a webpage. I need to download the PDF files from the website, but if I don't scroll down to the end of the page, it only downloads the first 40 PDF files. If I import the Selenium WebDriver code before my code, would that work? Is there any way to solve this? Also, I'm doing this over a Remote Desktop connection.
Link: https://www.fbi.gov/wanted/cyber
Sep 07 2023 06:46 AM - edited Sep 07 2023 07:09 AM
Hi, Yeu.
You can't "scroll down" in a native PowerShell script, but this doesn't mean you need to resort to a tool like Selenium, either.
The FBI has its own REST API, which is sufficient for what you're trying to do (I note you have a second thread going on this topic). However, the cyber category isn't individually searchable; it seems the cyber posters can only be obtained through the default category.
Anyhow, here's an example script to which you can add further filtering if you so desire.
Note: You will need to install the ThreadJob PowerShell module for this example, as noted in the comments at the top of the script.
```powershell
# This script leverages the FBI REST API described at:
# https://api.fbi.gov/docs#!/Wanted/get_wanted
# This script also relies on the ThreadJob module (for downloading files in parallel) being installed from the official PSGallery repository:
# Install-Module -Name ThreadJob -Scope AllUsers;
$SaveLocation = "D:\Data\Temp\Forum\fbi\";
# Do not set PageSize too high as this also relates to how many concurrent downloads will be kicked off.
$PageSize = 20;
$Page = 1;
$UriBase = "https://api.fbi.gov/@wanted";
$Category = "default";

# Keep requesting pages until the API returns a page with no items.
while (0 -lt ($Response = Invoke-RestMethod -Method Get -Uri "$UriBase`?poster_classification=$Category&pageSize=$PageSize&page=$Page" -ContentType "application/json" -UseBasicParsing -ErrorAction:Stop).items.Count)
{
    $Downloads = @();
    $Job = 0;

    # Queue one thread job per English-language PDF found on this page.
    foreach ($Url in ($Response.Items.Files | Where-Object { ($_.Name -eq "English") -and ($_.url.EndsWith(".pdf")) }).url)
    {
        # Build "<folder>_<person>.pdf" from the two path segments before the file name.
        $FileName = "$SaveLocation$(($Parts = $Url.Split("/"))[-3])_$($Parts[-2]).pdf";
        $Downloads += Start-ThreadJob -Name "fbi$($Job.ToString('X3'))" -ScriptBlock {
            Invoke-WebRequest -Uri $using:Url -OutFile $using:FileName -ErrorAction:Stop;
        };
        $Job++;
    }

    # The job cmdlets reject an empty -Job array, so skip pages without English PDFs.
    if ($Downloads.Count -gt 0)
    {
        Wait-Job -Job $Downloads | Out-Null;
        Receive-Job -Job $Downloads -ErrorAction:Continue | Out-Null;
        Remove-Job -Job $Downloads -ErrorAction:SilentlyContinue -Force;
        $Downloads.Dispose();
    }

    $Page++;
}
```
Edited: To remove the two-page limit I'd been using for testing (as there are nearly 1,000 records in the default category).
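Since cyber can't be queried as its own category, one option for the "further filtering" is to narrow each page's items client-side before picking out the files. The sketch below assumes each item carries a `subjects` array and that cyber posters list a subject containing the word "Cyber" (for example "Cyber's Most Wanted"); neither the field nor that value is used by the script above, so check a live response first. `Select-CyberItem` is a hypothetical helper name.

```powershell
# Hypothetical helper: passes through only the API items whose "subjects"
# list contains the word "Cyber". The "subjects" field and its values are
# ASSUMPTIONS about the API response, not something the script above relies on.
function Select-CyberItem
{
    param(
        [Parameter(ValueFromPipeline)]
        $Item
    )
    process
    {
        # -match on an array returns the matching elements; a non-empty result is truthy.
        # A missing ($null) subjects value yields $false and the item is dropped.
        if ($Item.subjects -match 'Cyber')
        {
            $Item;
        }
    }
}

# Made-up sample items standing in for $Response.Items.
$SampleItems = @(
    [pscustomobject]@{ title = 'Person A'; subjects = @("Cyber's Most Wanted") }
    [pscustomobject]@{ title = 'Person B'; subjects = @('Kidnappings and Missing Persons') }
    [pscustomobject]@{ title = 'Person C'; subjects = $null }
);

$CyberItems = @($SampleItems | Select-CyberItem);
$CyberItems.title;
```

If the assumption holds, the download loop could take its files from `($Response.Items | Select-CyberItem).Files` instead of `$Response.Items.Files`, leaving the English/PDF filter unchanged.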
Cheers,
Lain
Sep 07 2023 11:13 PM
Sep 08 2023 03:45 AM
The command for installing the ThreadJob module (which is a first-party Microsoft module) is in the comments at the top of the script: Install-Module -Name ThreadJob -Scope AllUsers;
I cannot help with Selenium at all as I do not work with it.
Cheers,
Lain
Sep 10 2023 07:31 AM
Sep 10 2023 11:13 PM
The script downloads the English PDF (some people have PDFs attached in more than one language, such as Spanish) for each person in the FBI's "wanted" list.
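To see how the script picks the English PDF and names each saved file without calling the API, the same filter and naming rule can be run against a few hand-made entries. The URLs below are invented for illustration; only the `name`/`url` property names and the naming logic come from the script.

```powershell
# Hypothetical sample data shaped like the entries in $Response.Items.Files
# (property names from the script; the URLs are made up).
$Files = @(
    [pscustomobject]@{ name = 'English'; url = 'https://www.fbi.gov/wanted/cyber/jane-doe/download.pdf' }
    [pscustomobject]@{ name = 'Spanish'; url = 'https://www.fbi.gov/wanted/cyber/jane-doe/spanish.pdf' }
    [pscustomobject]@{ name = 'English'; url = 'https://www.fbi.gov/wanted/cyber/john-roe/download.pdf' }
);

# Same filter as the script: English-language attachments whose URL ends in .pdf.
$EnglishPdfs = ($Files | Where-Object { ($_.name -eq 'English') -and ($_.url.EndsWith('.pdf')) }).url;

# Same naming rule: "<third-last segment>_<second-last segment>.pdf".
$FileNames = foreach ($Url in $EnglishPdfs)
{
    $Parts = $Url.Split('/');
    "$($Parts[-3])_$($Parts[-2]).pdf";
}

$FileNames;
```

Here the Spanish attachment is skipped, and the two English posters are saved as `cyber_jane-doe.pdf` and `cyber_john-roe.pdf`, so each person gets one uniquely named file.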
Cheers,
Lain