Encoding issues or character representation problems


Hi guys, I have a problem. I'm scraping details from a web page, and the scraped details come out like this:

YeuHarng_1-1695000465102.png

YeuHarng_2-1695000477080.png

I have already tried converting the text, but it still does not display correctly. Is there any solution you can suggest? The text on the web page is in Thai.

3 Replies
UTF-8 should be able to convert Thai to 'normal' characters. Could you share a snippet of the webrequest?

@Harm_Veenstra Here is the code. After converting, I need to write the results to an Excel file.

$url = "https://ww2.kanchanaburi.go.th/personal_board//?page=1&limit=99999"

# Create a web request to fetch the HTML content and specify the character encoding
$headers = @{
    "Accept-Encoding" = "UTF-8"  # Specify the correct encoding if needed
}

$response = Invoke-WebRequest -Uri $url -Headers $headers

$htmlContent = $response.ParsedHtml

$personBoxes = $htmlContent.getElementsByClassName("col-lg-12 person-box")

# Loop through each "col-lg-12 person-box" element
foreach ($personBox in $personBoxes) {
    $personName = $personBox.getElementsByClassName("d-flex flex-row row person-detail")[0]
    $personPosition = $personBox.getElementsByClassName("d-flex flex-row row person-detail")[2]

    if ($personName) {
        # Convert the inner text to UTF-8 encoding and print it
        $utf8EncodedText = [System.Text.Encoding]::UTF8.GetBytes($personName.innerText)
        $decodedText = [System.Text.Encoding]::UTF8.GetString($utf8EncodedText)
        Write-Host $decodedText
    }
	
    if ($personPosition) {
        $utf8EncodedText = [System.Text.Encoding]::UTF8.GetBytes($personPosition.innerText)
        $decodedText = [System.Text.Encoding]::UTF8.GetString($utf8EncodedText)
        Write-Host $decodedText
    }
}
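
A note on the conversion step in the script above: encoding a string to UTF-8 bytes and decoding those same bytes as UTF-8 returns the string unchanged, so it cannot repair text that was already decoded incorrectly. One likely cause (an assumption, since the server's response headers are not shown in this thread) is that Windows PowerShell 5.1 decoded the UTF-8 response as ISO-8859-1 because the Content-Type header carried no charset. The sketch below simulates that situation with a sample Thai word and shows a re-decoding that reverses it. `Repair-Utf8Text` is a hypothetical helper name, not part of any module.

```powershell
# Build the Thai word "ไทย" from code points so the script file's own encoding does not matter
$original = -join ([char[]](0x0E44, 0x0E17, 0x0E22))

$utf8   = [System.Text.Encoding]::UTF8
$latin1 = [System.Text.Encoding]::GetEncoding('ISO-8859-1')

# Simulate the suspected failure: UTF-8 bytes decoded as ISO-8859-1
$garbled = $latin1.GetString($utf8.GetBytes($original))

# The UTF-8 round trip from the script changes nothing, so the text stays garbled
$roundTrip = $utf8.GetString($utf8.GetBytes($garbled))

# Hypothetical helper: recover the original bytes via ISO-8859-1, then decode them as UTF-8
function Repair-Utf8Text {
    param([string]$Text)
    $sourceEncoding = [System.Text.Encoding]::GetEncoding('ISO-8859-1')
    [System.Text.Encoding]::UTF8.GetString($sourceEncoding.GetBytes($Text))
}

$repaired = Repair-Utf8Text -Text $garbled

Write-Host "Round trip still garbled: $($roundTrip -eq $garbled)"
Write-Host "Repaired matches original: $($repaired -eq $original)"
```

If this assumption matches what the page returns, calling `Repair-Utf8Text -Text $personName.innerText` inside the loop may give readable Thai. If the text was mis-decoded with a different code page, the source encoding in the helper would need to change accordingly.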


@YeuHarng You could gather the information in a pscustomobject and write it to an Excel file, something like this:

 

# Loop through each "col-lg-12 person-box" element and collect the emitted objects in $total
$total = foreach ($personBox in $personBoxes) {
    $personName = $personBox.getElementsByClassName("d-flex flex-row row person-detail")[0]
    $personPosition = $personBox.getElementsByClassName("d-flex flex-row row person-detail")[2]

    if ($personName) {
        # Round-trip the inner text through UTF-8 and emit it as an object
        $utf8EncodedText = [System.Text.Encoding]::UTF8.GetBytes($personName.innerText)
        $decodedText = [System.Text.Encoding]::UTF8.GetString($utf8EncodedText)
        [pscustomobject]@{
            Text = $decodedText
        }
    }

    if ($personPosition) {
        # Same conversion for the position text
        $utf8EncodedText = [System.Text.Encoding]::UTF8.GetBytes($personPosition.innerText)
        $decodedText = [System.Text.Encoding]::UTF8.GetString($utf8EncodedText)
        [pscustomobject]@{
            Text = $decodedText
        }
    }
}

#Check if the ImportExcel module is installed. Install it if not
if (-not (Get-Module -ListAvailable -Name ImportExcel)) {
    Write-Warning ("The ImportExcel module was not found on the system, installing now...")
    try {
        Install-Module -Name ImportExcel -SkipPublisherCheck -Force:$true -Confirm:$false -Scope CurrentUser -ErrorAction Stop
        Import-Module -Name ImportExcel -Scope Local -ErrorAction Stop
        Write-Host ("Successfully installed the ImportExcel module, continuing..") -ForegroundColor Green
    }
    catch {
        Write-Warning ("Could not install the ImportExcel module, exiting...")
        return
    }
}
else {
    try {
        Import-Module -Name ImportExcel -Scope Local -ErrorAction Stop
        Write-Host ("The ImportExcel module was found on the system, continuing...") -ForegroundColor Green
    }
    catch {
        Write-Warning ("Error importing the ImportExcel module, exiting...")
        return  
    }
    
}


$total | Export-Excel -Path c:\temp\output.xlsx -AutoFilter -AutoSize
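
The loop above writes each name and each position as its own row under a single Text column. If one row per person is preferred, a possible variation is to emit one object with two properties. This is a minimal sketch using made-up sample data in place of the scraped elements; the property names Name and Position are illustrative choices, not something the page or the module requires.

```powershell
# Hypothetical sample data standing in for the scraped person boxes
$scraped = @(
    @{ Name = 'Person A'; Position = 'Governor' }
    @{ Name = 'Person B'; Position = 'Deputy Governor' }
)

# Emit one object per person so each property becomes its own column
$rows = foreach ($item in $scraped) {
    [pscustomobject]@{
        Name     = $item.Name
        Position = $item.Position
    }
}

# Preview the rows in the console
$rows | Format-Table -AutoSize

# With the ImportExcel module loaded, the same objects could be exported:
# $rows | Export-Excel -Path c:\temp\output.xlsx -AutoFilter -AutoSize
```

In the real script, `$item.Name` and `$item.Position` would be replaced by the inner text of `$personName` and `$personPosition`.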