Forum Discussion

YeuHarng
Brass Contributor
Sep 07, 2023

Scroll Down WebPage until End

Hi, I have a problem using PowerShell to scroll down a webpage. I need to download the PDF files from a website, but unless the page is scrolled down to the end, only the first 40 PDF files are downloaded. If I import the Selenium WebDriver before my code, would that work? Is there any way to solve this? I am also doing this over a Remote Desktop connection.
Link: https://www.fbi.gov/wanted/cyber 


  • LainRobertson
    Silver Contributor

    YeuHarng 

    Hi, Yeu.

    You can't "scroll down" in a native PowerShell script, but this doesn't mean you need to resort to a tool like Selenium, either.

    The FBI has its own REST API, which is sufficient for what you're trying to do (noting you have a second thread going on this topic). However, the cyber category isn't individually searchable: it seems cyber records can only be obtained via the default category.

    Anyhow, here's an example script, to which you can add further filtering if you so desire.

    Note: You will need to install the ThreadJob PowerShell module for this example, as noted in the comments at the top of the script.

    Example

    # This script leverages the FBI REST API described at:
    # https://api.fbi.gov/docs#!/Wanted/get_wanted
    
    # This script also relies on the ThreadJob module (for downloading files in parallel) being installed from the official PSGallery repository:
    # Install-Module -Name ThreadJob -Scope AllUsers;
    
    $SaveLocation = "D:\Data\Temp\Forum\fbi\";
    
    # Do not set PageSize too high as this also relates to how many concurrent downloads will be kicked off.
    $PageSize = 20;
    $Page = 1;
    
    $UriBase  = "https://api.fbi.gov/@wanted";
    $Category = "default";
    
    # Keep requesting pages until one comes back with no items.
    while (($Response = Invoke-RestMethod -Method Get -Uri "$UriBase`?poster_classification=$Category&pageSize=$PageSize&page=$Page" -ContentType "application/json" -UseBasicParsing -ErrorAction:Stop).items.Count -gt 0)
    {
        $Downloads = @();
        $Job = 0;
    
        try
        {
            foreach ($Url in ($Response.Items.Files | Where-Object { ($_.Name -eq "English") -and ($_.url.EndsWith(".pdf")) }).url)
            {
                $FileName = "$SaveLocation$(($Parts = $Url.Split("/"))[-3])_$($Parts[-2]).pdf";
        
                $Downloads += Start-ThreadJob -Name "fbi$($Job.ToString('X3'))" -ScriptBlock {
                    Invoke-WebRequest -Uri $using:Url -OutFile $using:FileName -ErrorAction:Stop;
                };
        
                $Job++;
            }
        
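            # Wait-Job rejects an empty collection, so skip pages that produced no matching PDFs.
            if ($Downloads.Count -gt 0)
            {
                Wait-Job -Job $Downloads | Out-Null;
                Receive-Job -Job $Downloads -ErrorAction:Continue | Out-Null;
                Remove-Job -Job $Downloads -ErrorAction:SilentlyContinue -Force;
    
                $Downloads.Dispose();
            }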
        }
        catch
        {
            throw;
        }
        finally
        {
            $Page++;
        }
    }
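
    As a rough sketch of the further filtering mentioned earlier: the script already treats the third-from-last URL segment as the category when it builds file names, so one option is to keep only the PDF URLs whose category segment is "cyber". The function name Select-WantedPdfUrl and the sample data are hypothetical, and the URL layout (https://www.fbi.gov/wanted/<category>/<name>/download.pdf) is an assumption inferred from the script rather than something the API documents.

    ```powershell
    # Hypothetical helper: returns the English PDF URLs from a page of API items
    # whose category segment (third from last in the URL) matches $Category.
    function Select-WantedPdfUrl
    {
        param
        (
            [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $Items,
            [string] $Category = "cyber"
        )

        foreach ($File in $Items.files)
        {
            # The -like test runs first so Split() is never called on a missing URL.
            if (($File.name -eq "English") -and ($File.url -like "*.pdf") -and ($File.url.Split("/")[-3] -eq $Category))
            {
                $File.url;
            }
        }
    }

    # Made-up sample shaped like the items the script reads (not real API output).
    $SampleItems = @(
        [pscustomobject]@{ files = @([pscustomobject]@{ name = "English"; url = "https://www.fbi.gov/wanted/cyber/example-one/download.pdf" }) },
        [pscustomobject]@{ files = @([pscustomobject]@{ name = "English"; url = "https://www.fbi.gov/wanted/murders/example-two/download.pdf" }) }
    );

    $CyberUrls = @(Select-WantedPdfUrl -Items $SampleItems);
    ```

    In the script, the foreach could then iterate over the output of Select-WantedPdfUrl -Items $Response.items (for pages that contain items) instead of the inline Where-Object filter.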

    Edited: Removed the two-page limit I'd been using for testing (there are nearly 1,000 records in the default category).

    Cheers,

    Lain

      • LainRobertson
        Silver Contributor

        YeuHarng 

        The command for installing the ThreadJob module (which is a first-party Microsoft module) can be seen on line five of the script.
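
        A minimal sketch, assuming you'd like the script to stop early with a clear message rather than fail partway through: check whether Start-ThreadJob can be found before kicking off any downloads. Test-ThreadJobAvailable is a hypothetical helper name and is not part of the module.

        ```powershell
        # Hypothetical helper: reports whether the Start-ThreadJob command can be found.
        function Test-ThreadJobAvailable
        {
            return [bool](Get-Command -Name Start-ThreadJob -ErrorAction SilentlyContinue);
        }

        if (-not (Test-ThreadJobAvailable))
        {
            Write-Warning "Start-ThreadJob was not found. Install the module first: Install-Module -Name ThreadJob -Scope AllUsers";
        }
        ```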

        I cannot help with Selenium at all as I do not work with it.

        Cheers,

        Lain
