Forum Discussion

YeuHarng
Brass Contributor
Sep 18, 2023

encoding issues or character representation problems

Hi guys, I have a problem. I'm scraping details from a web page, but the text I get back comes out as garbled characters.

I have already tried converting it, but that still doesn't work. Is there a solution you can suggest? The details on the web page are in Thai.

  • UTF-8 should be able to represent Thai text as readable characters. Could you share a snippet of your web request?
      YeuHarng
      Brass Contributor

      Harm_Veenstra here is the code. After converting the text, I need to write it to an Excel file.

      $url = "https://ww2.kanchanaburi.go.th/personal_board//?page=1&limit=99999"
      
      # Create a web request to fetch the HTML content and specify the character encoding
      $headers = @{
          "Accept-Encoding" = "UTF-8"  # Specify the correct encoding if needed
      }
      
      $response = Invoke-WebRequest -Uri $url -Headers $headers
      
      $htmlContent = $response.ParsedHtml
      
      $personBoxes = $htmlContent.getElementsByClassName("col-lg-12 person-box")
      
      # Loop through each "col-lg-12 person-box" element
      foreach ($personBox in $personBoxes) {
          $personName = $personBox.getElementsByClassName("d-flex flex-row row person-detail")[0]
          $personPosition = $personBox.getElementsByClassName("d-flex flex-row row person-detail")[2]
      
          if ($personName) {
              # Convert the inner text to UTF-8 encoding and print it
              $utf8EncodedText = [System.Text.Encoding]::UTF8.GetBytes($personName.innerText)
              $decodedText = [System.Text.Encoding]::UTF8.GetString($utf8EncodedText)
              Write-Host $decodedText
          }
      	
          if ($personPosition) {
              $utf8EncodedText = [System.Text.Encoding]::UTF8.GetBytes($personPosition.innerText)
              $decodedText = [System.Text.Encoding]::UTF8.GetString($utf8EncodedText)
              Write-Host $decodedText
          }
      }
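
One possible explanation for the garbled Thai text, offered as an assumption rather than a confirmed diagnosis for this site: Windows PowerShell 5.1 is commonly reported to decode a response as ISO-8859-1 when the server's Content-Type header does not declare a charset, so UTF-8 bytes arrive as mojibake. Encoding a string to UTF-8 and decoding it straight back, as the code above does, returns the same string, and the Accept-Encoding header negotiates compression (gzip, deflate) rather than character sets. The sketch below simulates the mis-decoding and reverses it. `Repair-Utf8Text` is a hypothetical helper name, and the repair only works if ISO-8859-1 really was the wrong decoder.

```powershell
# Hypothetical helper (not part of the original thread): reverses the case where
# UTF-8 bytes were decoded as ISO-8859-1. Assumes that is what actually happened.
function Repair-Utf8Text {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]
        [string]$Text
    )
    $latin1 = [System.Text.Encoding]::GetEncoding('ISO-8859-1')
    # ISO-8859-1 maps each character U+0000..U+00FF back to the single byte it came from
    $bytes = $latin1.GetBytes($Text)
    # Decode the recovered bytes as UTF-8, the encoding assumed to be on the wire
    [System.Text.Encoding]::UTF8.GetString($bytes)
}

# Build a Thai sample from code points so the script file's own encoding does not matter
$original = -join ([char[]](0x0E2A, 0x0E27, 0x0E31, 0x0E2A, 0x0E14, 0x0E35))

# Simulate the problem: UTF-8 bytes wrongly decoded as ISO-8859-1
$utf8Bytes = [System.Text.Encoding]::UTF8.GetBytes($original)
$garbled = [System.Text.Encoding]::GetEncoding('ISO-8859-1').GetString($utf8Bytes)

# The UTF-8 round trip used in the script above leaves the garbled text as it was
$roundTrip = [System.Text.Encoding]::UTF8.GetString([System.Text.Encoding]::UTF8.GetBytes($garbled))

# Re-encoding as ISO-8859-1 first recovers the Thai text
$repaired = Repair-Utf8Text -Text $garbled
```

In the scraping loop this would stand in for the two-step conversion, for example `$decodedText = Repair-Utf8Text -Text $personName.innerText`. If the page was mis-decoded with a different code page, this helper would not recover the text.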

      Harm_Veenstra 

      • YeuHarng You could gather the information in a pscustomobject and write it to an Excel file, something like this:

         

        # Loop through each "col-lg-12 person-box" element and collect the output in $total
        $total = foreach ($personBox in $personBoxes) {
            $personName = $personBox.getElementsByClassName("d-flex flex-row row person-detail")[0]
            $personPosition = $personBox.getElementsByClassName("d-flex flex-row row person-detail")[2]

            if ($personName) {
                # Emit the name as an object instead of printing it.
                # Note: this UTF-8 encode/decode round trip returns the same text.
                $utf8EncodedText = [System.Text.Encoding]::UTF8.GetBytes($personName.innerText)
                $decodedText = [System.Text.Encoding]::UTF8.GetString($utf8EncodedText)
                [pscustomobject]@{
                    Text = $decodedText
                }
            }

            if ($personPosition) {
                # Emit the position as a second object with the same Text property
                $utf8EncodedText = [System.Text.Encoding]::UTF8.GetBytes($personPosition.innerText)
                $decodedText = [System.Text.Encoding]::UTF8.GetString($utf8EncodedText)
                [pscustomobject]@{
                    Text = $decodedText
                }
            }
        }
        
        # Check whether the ImportExcel module is installed; install it if not
        if (-not (Get-Module -ListAvailable -Name ImportExcel)) {
            Write-Warning ("The ImportExcel module was not found on the system, installing now...")
            try {
                Install-Module -Name ImportExcel -SkipPublisherCheck -Force:$true -Confirm:$false -Scope CurrentUser -ErrorAction Stop
                Import-Module -Name ImportExcel -Scope Local -ErrorAction Stop
                Write-Host ("Successfully installed the ImportExcel module, continuing..") -ForegroundColor Green
            }
            catch {
                Write-Warning ("Could not install the ImportExcel module, exiting...")
                return
            }
        }
        else {
            try {
                Import-Module -Name ImportExcel -Scope Local -ErrorAction Stop
                Write-Host ("The ImportExcel module was found on the system, continuing...") -ForegroundColor Green
            }
            catch {
                Write-Warning ("Error importing the ImportExcel module, exiting...")
                return  
            }
            
        }
        
        
        $total | Export-Excel -Path c:\temp\output.xlsx -AutoFilter -AutoSize
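
The loop above emits the name and the position as separate rows under a single Text column. If one row per person is preferred, a minimal sketch follows. `New-PersonRow` and the column names Name and Position are hypothetical choices, not something the thread specifies. Export-Excel takes its column headers from the property names.

```powershell
# Hypothetical helper: builds one output row per person with two named columns
function New-PersonRow {
    param(
        [string]$Name,
        [string]$Position
    )
    [pscustomobject]@{
        # Trim stray whitespace that innerText may carry
        Name     = $Name.Trim()
        Position = $Position.Trim()
    }
}

# Stand-in values; in the real loop they would come from the person-detail elements:
#   $details = $personBox.getElementsByClassName("d-flex flex-row row person-detail")
#   New-PersonRow -Name $details[0].innerText -Position $details[2].innerText
$rows = @(
    New-PersonRow -Name '  Person A ' -Position ' Position A'
    New-PersonRow -Name 'Person B' -Position ''
)

# The rows can then be exported the same way as $total:
#   $rows | Export-Excel -Path c:\temp\output.xlsx -AutoFilter -AutoSize
```

This keeps each person's name and position side by side in the worksheet, which tends to be easier to filter than alternating rows.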
