Forum Discussion

StanL
Copper Contributor
Jan 19, 2025

WebView2 HTML web-scraping

I'm interested in a PowerShell WinForm that embeds a WebView2 control and has web-scraping capability. My initial frustration was finding a Microsoft.Web.WebView2.WinForms.dll file that would load in a script without error. I downloaded NuGet package(s), which failed, then realized I had about 20 copies of the DLL on my PC, most with different dates/sizes and associated with different apps/folders. I finally found a file with a size of 40,880 (copied from PowerBI \bin) which worked when copied to the PS script working folder.

The script below works with that DLL file. It navigates to a test web-scraping site. In addition to a close button, I added buttons to navigate to another URL or to display links for the currently displayed URL. The issue is that the links button has to perform an Invoke-WebRequest to the site, which is redundant since the site is already loaded in the WebView2 control. I realize WebView2 has no internal DOM parsing, but the HTML should be available for parsing. In the script there is a commented-out section that uses $WebView2.ExecuteScriptAsync() to obtain the HTML. No matter how I have tried to use it, it doesn't work.

Long story short: is there a way to obtain/parse the WebView2 HTML in a WinForm without having to redundantly perform an Invoke-WebRequest?

################ Script ################################################

function button1
{
   $title = 'Enter New URL to Navigate To'
   $msg   = 'Please enter URL as https://[...some site...].com/.net/.org'
   $url = [Microsoft.VisualBasic.Interaction]::InputBox($msg, $title)
   if ($url -ne "")
   {
      try
      {
         # Assigning Source starts the navigation, so no separate refresh call is needed
         $WebView2.Source = $url
      }
      catch { $pop.popup("Invalid or Unusable URL",2,"Error",4096) }
   }
}

function button2
{
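   # Sort-Object -Unique removes all duplicates; Get-Unique only drops adjacent ones
   $links = (Invoke-WebRequest -Uri $WebView2.Source).Links.Href | Sort-Object -Unique
   # Plain '/product' is intended; '/product*' would mean "/produc" followed by any number of t's
   $regex = '/product'
   $links | Select-String $regex | Select-Object Line | Out-GridView -Title "Webpage Links for Products" -PassThru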
   #$pop.popup("Code for Scraping Links",2,"Message",4096)
}
########################################################################################################################
$pop = New-Object -ComObject wscript.shell
New-Variable -Name 'data' -Value "$([Environment]::GetEnvironmentVariable('LOCALAPPDATA'))\Webview2" -Scope Global -Force
$path=$PSScriptRoot  
# Get DLLs
$WinForms = "$path\Microsoft.Web.WebView2.WinForms.dll"
$Core     = "$path\Microsoft.Web.WebView2.Core.dll"

<#
$loader = "$path\WebView2Loader.dll"
$wpf    = "$path\Microsoft.Web.WebView2.Wpf.dll"
#>

Add-Type -AssemblyName Microsoft.VisualBasic
Add-Type -Path $WinForms
#Add-Type -Path $Core
Add-Type -AssemblyName System.Windows.Forms
Add-Type -AssemblyName System.Drawing

$Form = New-Object System.Windows.Forms.Form
$button1 = New-Object System.Windows.Forms.Button
$button2 = New-Object System.Windows.Forms.Button
$cancelButton = New-Object System.Windows.Forms.Button
#
$button1.Location = New-Object System.Drawing.Point(23, 25)
$button1.Name = "button1"
$button1.Size = New-Object System.Drawing.Size(75, 23)
$button1.TabIndex = 0
$button1.Text = "New URL"
$button1.BackColor = "Green"
$button1.ForeColor = "White"
$button1.AutoSize = $true
#
$button2.Location = New-Object System.Drawing.Point(312, 25)
$button2.Name = "button2"
$button2.Size = New-Object System.Drawing.Size(75, 23)
$button2.TabIndex = 1
$button2.Text = "Links"
$button2.BackColor = "Green"
$button2.ForeColor = "White"
$button2.AutoSize = $true
#
$cancelButton.Location = New-Object System.Drawing.Point(684, 25)
$cancelButton.Name = "button3"
$cancelButton.Size = New-Object System.Drawing.Size(75, 23)
$cancelButton.TabIndex = 2
$cancelButton.BackColor = "Red"
$cancelButton.ForeColor = "White"
$cancelButton.Text = 'Close Window'
$cancelButton.AutoSize = $true
$cancelButton.DialogResult = [System.Windows.Forms.DialogResult]::Cancel
$Form.CancelButton = $cancelButton
#
$WebView2 = New-Object -TypeName Microsoft.Web.WebView2.WinForms.WebView2
$WebView2.CreationProperties = New-Object -TypeName 'Microsoft.Web.WebView2.WinForms.CoreWebView2CreationProperties'
$WebView2.CreationProperties.UserDataFolder = $data #keeps it out of $PSScriptRoot  
$WebView2.Source = "https://www.scrapingcourse.com/ecommerce/"
$Webview2.Location = New-Object System.Drawing.Point(23, 65)
$Webview2.Size = New-Object System.Drawing.Size(749, 373)
$WebView2.Anchor = 'top,right,bottom,left'
<#navigation complete
$WebView2_NavigationCompleted = {
    $htmlContent = $WebView2.ExecuteScriptAsync("window.chrome.webview.postMessage(document.documentElement.outerHTML;").Result  
    #$htmlContent = $webView2.CoreWebView2.ExecuteScriptAsync("document.documentElement.outerHTML;").Result
    Write-Host $htmlContent

}
$WebView2.add_NavigationCompleted($WebView2_NavigationCompleted)

$WebView2.add_WebMessageReceived({
    param($WebView2, $message)
    $pop.popup($message.TryGetWebMessageAsString(),3,"Message",4096)
})
#>
$Form.ClientSize = New-Object System.Drawing.Size(800, 450)
$Form.Controls.Add($Webview2)
$Form.Controls.Add($cancelButton)
$Form.Controls.Add($button2)
$Form.Controls.Add($button1)
$Form.Name = "Form"
$Form.Text = "Webview Web Scraping Sample"

$button1.add_click(
{
   button1
})

$button2.add_click(
{
   button2
})

$result=$Form.ShowDialog()
#Terminate if Cancel button pressed
if ($result -eq [System.Windows.Forms.DialogResult]::Cancel) 
{
   [System.GC]::Collect()
   [System.GC]::WaitForPendingFinalizers()
   $form.Dispose()
   Exit 
}
########################################################################################################################

1 Reply

  • luchete
    Iron Contributor

    Hello StanL 

    Here's how you can adjust your code to properly execute the script and retrieve the HTML content.

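    The commented-out call fails for two reasons. The JavaScript is missing the closing parenthesis of postMessage(...), and reading .Result inside the event handler blocks the UI thread, which WebView2 needs free in order to complete the script, so the form typically hangs. Start the script without waiting on it and let the page post the HTML back:

    $WebView2.add_NavigationCompleted({
        # Fire and forget: do not read .Result here, it would block the UI thread
        $null = $WebView2.ExecuteScriptAsync("window.chrome.webview.postMessage(document.documentElement.outerHTML);")
    })
    $WebView2.add_WebMessageReceived({
        param($source, $e)
        $script:htmlContent = $e.TryGetWebMessageAsString()
    })

    This runs document.documentElement.outerHTML inside the WebView2 control and posts the result to the WebMessageReceived handler, which stores the full HTML of the page in $htmlContent. The Links button can then parse that string instead of calling Invoke-WebRequest again.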
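
One detail worth knowing if the HTML is read from the task that ExecuteScriptAsync returns: the result arrives JSON-encoded, so a string comes back wrapped in quotes with escape sequences and needs decoding before it is parsed. Below is a minimal sketch, not a definitive implementation, of decoding it and pulling links out with a regular expression instead of a second Invoke-WebRequest. The helper names ConvertFrom-ScriptResult and Get-HrefFromHtml, the example.com URLs, and the sample $raw string are all made up for illustration, and the regex is a simplification that only handles quoted href attributes.

```powershell
# Hypothetical helper: decode the JSON string literal that ExecuteScriptAsync returns
function ConvertFrom-ScriptResult {
    param([string]$Json)
    # A JavaScript string comes back quoted and escaped, e.g. "\u003Chtml>..."
    return ($Json | ConvertFrom-Json)
}

# Hypothetical helper: pull unique href values matching a pattern out of raw HTML
function Get-HrefFromHtml {
    param([string]$Html, [string]$Pattern = '/product')
    # Simplified matching: quoted href attributes only
    $hrefs = [regex]::Matches($Html, 'href\s*=\s*["'']([^"'']+)["'']', 'IgnoreCase') |
        ForEach-Object { $_.Groups[1].Value }
    $hrefs | Where-Object { $_ -match $Pattern } | Sort-Object -Unique
}

# Stand-in (assumed sample) for a JSON-encoded outerHTML result
$raw = '"\u003Chtml>\u003Cbody>\u003Ca href=\"https://example.com/product/a\">A\u003C/a>\u003Ca href=\"https://example.com/cart\">Cart\u003C/a>\u003C/body>\u003C/html>"'

$html  = ConvertFrom-ScriptResult $raw
$found = Get-HrefFromHtml -Html $html
$found
```

In the actual script, $raw would be replaced by whatever value comes back from the WebView2 control, and the output could be piped to Out-GridView as in the Links button.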

    Hook the NavigationCompleted event, as above, so that the page has finished loading before you scrape it.
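
Once the page has loaded, another option is to collect the links inside the page itself, so that only a small JSON array crosses over to PowerShell rather than the whole document. The following is a rough sketch under assumptions: it presumes $linkScript is passed to ExecuteScriptAsync and that a WebMessageReceived handler hands the posted string to the helper. Select-LinkFromMessage, the sample $message, and the example.com URLs are hypothetical names and data used only for illustration.

```powershell
# JavaScript to run in the page: post every link's absolute URL back as one JSON array
$linkScript = 'window.chrome.webview.postMessage(JSON.stringify(Array.from(document.links, a => a.href)));'

# Hypothetical helper: turn the posted JSON array into unique links matching a pattern
function Select-LinkFromMessage {
    param([string]$Message, [string]$Pattern = '/product')
    # Assign first so the array is enumerated the same way in Windows PowerShell 5.1 and PowerShell 7
    $all = $Message | ConvertFrom-Json
    $all | Where-Object { $_ -match $Pattern } | Sort-Object -Unique
}

# Stand-in (assumed sample) for the string that TryGetWebMessageAsString() would return
$message = '["https://example.com/product/b","https://example.com/cart","https://example.com/product/a","https://example.com/product/b"]'

$products = Select-LinkFromMessage -Message $message
$products
```

Because the browser resolves each href before it is posted, the links arrive as absolute URLs, which avoids having to rebuild relative paths on the PowerShell side.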

    Let me know how it goes!
