Scroll Down WebPage until End


Hi, I have a problem using PowerShell to scroll down a webpage. I need to download the PDF files from a website, but unless the page is scrolled all the way to the end, only the first 40 PDF files are downloaded. If I import the Selenium WebDriver module before my code, would that work? Is there any way to solve this? Also, I am doing this over a Remote Desktop connection.
Link: https://www.fbi.gov/wanted/cyber 


6 Replies

@YeuHarng 

 

Hi, Yeu.

 

You can't "scroll down" in a native PowerShell script, but this doesn't mean you need to resort to a tool like Selenium, either.

 

The FBI has its own REST API, which is sufficient for what you're trying to do (noting you have a second thread going on this topic). However, the cyber category isn't individually searchable; it seems cyber records can only be obtained through the default category.
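
For illustration, a single page can be requested on its own before running anything larger. This is only a sketch: Get-WantedUri is a hypothetical helper name, the endpoint and query parameters are the ones used by the example script in this reply, and the total and items property names are taken from that script rather than verified here.

```powershell
# Hypothetical helper: builds the query URI for one page (string work only, no network access).
function Get-WantedUri {
    param(
        [string] $Category = 'default',
        [int]    $PageSize = 20,
        [int]    $Page     = 1
    )
    "https://api.fbi.gov/@wanted?poster_classification=$Category&pageSize=$PageSize&page=$Page"
}

$Uri = Get-WantedUri -Page 1

# Uncomment to fetch the page; this needs internet access.
# $Response = Invoke-RestMethod -Method Get -Uri $Uri
# $Response.total        # assumed: record count reported for the query
# $Response.items.Count  # assumed: records returned on this page
```

Requesting one page first makes it easy to look at the response and decide what extra filtering you want.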

 

 

Anyhow, here's an example script, to which you can add further filtering if you so desire.

 

Note: You will need to install the ThreadJob PowerShell module for this example, as noted in the comments at the top of the script.

 

Example

 

# This script leverages the FBI REST API described at:
# https://api.fbi.gov/docs#!/Wanted/get_wanted

# This script also relies on the ThreadJob module (for downloading files in parallel) being installed from the official PSGallery repository:
# Install-Module -Name ThreadJob -Scope AllUsers;

$SaveLocation = "D:\Data\Temp\Forum\fbi\";

# Do not set PageSize too high as this also relates to how many concurrent downloads will be kicked off.
$PageSize = 20;
$Page = 1;

$UriBase  = "https://api.fbi.gov/@wanted";
$Category = "default";

# Request pages until one comes back with no items. This assumes "total" reports the overall
# number of matches rather than the number left to fetch, so it is not used as the stop test.
while (($Response = Invoke-RestMethod -Method Get -Uri "$UriBase`?poster_classification=$Category&pageSize=$PageSize&page=$Page" -ContentType "application/json" -UseBasicParsing -ErrorAction:Stop).Items.Count -gt 0)
{
    $Downloads = @();
    $Job = 0;

    try
    {
        # Keep only the English PDF attachment of each record.
        foreach ($Url in ($Response.Items.Files | Where-Object { ($_.Name -eq "English") -and ($_.url.EndsWith(".pdf")) }).url)
        {
            # Name the file after the third- and second-to-last segments of the URL.
            $Parts = $Url.Split("/");
            $FileName = "$SaveLocation$($Parts[-3])_$($Parts[-2]).pdf";

            $Downloads += Start-ThreadJob -Name "fbi$($Job.ToString('X3'))" -ScriptBlock {
                Invoke-WebRequest -Uri $using:Url -OutFile $using:FileName -ErrorAction:Stop;
            };

            $Job++;
        }

        # Skip the job cmdlets when the page had no matching PDFs, as they reject an empty -Job list.
        if ($Downloads.Count -gt 0)
        {
            Wait-Job -Job $Downloads | Out-Null;
            Receive-Job -Job $Downloads -ErrorAction:Continue | Out-Null;
            Remove-Job -Job $Downloads -ErrorAction:SilentlyContinue -Force;
        }
    }
    finally
    {
        $Page++;
    }
}

 

 

Edited: To remove the two-page limit I'd been using for testing (as there are nearly 1,000 records in the default category).

 

Cheers,

Lain

So, do I need to import the job?
Hi, can I ask how to import Selenium? I am having many issues with the Selenium module.

@YeuHarng 

 

The command for installing the ThreadJob module (which is a first-party Microsoft module) can be seen on line five of the script.
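
If you're unsure whether the module is already present, a quick check like the following may help. It is a minimal sketch: Test-ThreadJobAvailable is a hypothetical helper name, and -Scope CurrentUser is suggested here only because it does not need an elevated session, unlike the AllUsers scope shown in the script comment.

```powershell
# Hypothetical helper: returns $true when the Start-ThreadJob command can be found.
function Test-ThreadJobAvailable {
    [bool](Get-Command -Name Start-ThreadJob -ErrorAction SilentlyContinue)
}

if (-not (Test-ThreadJobAvailable)) {
    # Nothing is installed automatically; this only prints the command to run.
    Write-Host 'Start-ThreadJob was not found. Install the module with:'
    Write-Host '  Install-Module -Name ThreadJob -Scope CurrentUser'
}
```

Once the command is available there is nothing else to import by hand, since PowerShell loads the module automatically the first time Start-ThreadJob is called.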

 

I cannot help with Selenium at all as I do not work with it.

 

Cheers,

Lain

So can the code you wrote above download all the files from the FBI link? My task is to download all the PDF files from that link.

@YeuHarng 

 

The script downloads the English PDF (some people have PDFs attached in more than one language, such as Spanish) for each person in the FBI's "wanted" list.
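
To show what that filter does, here is a small offline sketch. The sample record and its URLs are made up for illustration (they are not real API output); only the filter and the file-naming logic mirror the script.

```powershell
# Hypothetical record shaped like one entry of the response's item list; the URLs are invented.
$Item = [pscustomobject]@{
    files = @(
        [pscustomobject]@{ name = 'English';    url = 'https://example.invalid/wanted/cyber/jane-doe/download.pdf' }
        [pscustomobject]@{ name = 'En Español'; url = 'https://example.invalid/wanted/cyber/jane-doe/jane-doe-spanish.pdf' }
    )
}

# Same test as the script: the attachment must be named "English" and its URL must end in .pdf.
$EnglishPdfs = @(($Item.files | Where-Object { ($_.name -eq 'English') -and $_.url.EndsWith('.pdf') }).url)

# Build each file name from the third- and second-to-last URL segments, as the script does.
$Names = foreach ($Url in $EnglishPdfs) {
    $Parts = $Url.Split('/')
    '{0}_{1}.pdf' -f $Parts[-3], $Parts[-2]
}

$Names   # cyber_jane-doe.pdf
```

To keep the other languages as well, you could drop the name comparison and keep only the .pdf check, though the file names would then need an extra part to stay unique.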

 

Cheers,

Lain