SOLVED

Catch dates from website

I got this script from someone, and for Albania it works as it should. But if I change the country to, for example, Germany, the Date column gets text instead of the correct dates. I see that Germany and some other countries also have rows with the publicholiday and regional classes. Any idea how to format this to catch the date correctly for all countries?

 

$uri = 'http://www.officeholidays.com/countries/albania/index.php'
$html = Invoke-WebRequest -Uri $uri
$table = $html.ParsedHtml.getElementsByTagName('tr') |
Where-Object {$_.classname -eq 'holiday'} |
Select-Object -exp innerHTML

foreach ($t in $table){
    [void]($t -match 'SPAN title="(.*?)"') ; $m1 = $Matches[1]
    [void]($t -match 'tooltip>(.*)') ; $m2 = $Matches[1]
    [void]($t -match 'remarks>(.*) ') ; $m3 = $Matches[1]
    [PSCustomObject]@{
        Date    = $m1
        Holiday = $m2
        Remarks = if ($m2 -ne $m3) { $m3 }
    }
}

# Results
Date                       Holiday                    Remarks                 
----                       -------                    -------                 
January 01                 New Year's Day                                     
January 02                 Day after New Years Day                            
March 14                   Summer Day                                         
March 21                   Nevruz                     Spring Festival. Persi

2 Replies
best response confirmed by AlexanderHolmeset (MVP)
Solution

Try the script below. It works for most countries, such as the USA, India, and Germany, but it will not work for Albania.

 

$uri = 'http://www.officeholidays.com/countries/usa/index.php'
$html = Invoke-WebRequest -Uri $uri

# Collect the inner HTML of every row whose class is 'holiday' or 'regional'
$tables = $html.ParsedHtml.getElementsByTagName('tr') |
    Where-Object { $_.className -eq 'holiday' -or $_.className -eq 'regional' } |
    Select-Object -ExpandProperty innerHTML

foreach ($table in $tables) {
    # First plain <TD> cell (assigned to $day but not used in the output)
    $day = (($table -split "<TD>")[1] -split "</TD>")[0]
    # Date: the text inside the ad_head_728 span
    $Date = (($table -split "<SPAN class=ad_head_728>")[1] -split "</SPAN>")[0]
    # Holiday: the text between the <A title=...> tag and </A>
    $Holiday = ((($table -split "<TD><A title=")[1] -split ">")[1] -split "</A")[0]
    # Remarks: the text after class=remarks> up to the next tag
    $Remarks = (($table -split "class=remarks>")[1] -split "<")[0]

    [PSCustomObject]@{
        Date    = $Date
        Holiday = $Holiday
        Remarks = if ($Holiday -ne $Remarks) { $Remarks }
    }
}
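
Since the first script only handles the Albania layout and this one does not, one possible way to cover both is to try each date pattern in turn. The following is only a sketch: the function name ConvertFrom-HolidayRow and the sample rows are hypothetical, and the markup is assumed from the patterns used in the two scripts in this thread, not checked against the live site.

```powershell
# Hypothetical helper: parses the innerHTML of one table row.
# The markup it expects is an assumption based on the scripts above.
function ConvertFrom-HolidayRow {
    param([Parameter(Mandatory)][string]$RowHtml)

    # Date: prefer the text of the ad_head_728 span, else fall back to a SPAN title attribute
    $date = $null
    if ($RowHtml -match '<SPAN class=ad_head_728>(.*?)</SPAN>') {
        $date = $Matches[1]
    } elseif ($RowHtml -match 'SPAN title="(.*?)"') {
        $date = $Matches[1]
    }

    # Holiday: the text of the first link in the row
    $holiday = $null
    if ($RowHtml -match '<A [^>]*>(.*?)</A>') { $holiday = $Matches[1] }

    # Remarks: the text after class=remarks> up to the next tag
    $remarks = $null
    if ($RowHtml -match 'class=remarks>([^<]*)') { $remarks = $Matches[1].Trim() }

    [PSCustomObject]@{
        Date    = $date
        Holiday = $holiday
        Remarks = if ($remarks -and $remarks -ne $holiday) { $remarks }
    }
}

# Made-up sample rows, one per assumed layout
$sample1 = '<TD>Monday</TD><TD><SPAN class=ad_head_728>January 01</SPAN></TD><TD><A title="New Year" href="#">New Year''s Day</A></TD><TD class=remarks>First day of the year</TD>'
$sample2 = '<TD><SPAN title="March 14">Mon</SPAN></TD><TD><A class=tooltip href="#">Summer Day</A></TD><TD class=remarks>Summer Day</TD>'

$row1 = ConvertFrom-HolidayRow -RowHtml $sample1
$row2 = ConvertFrom-HolidayRow -RowHtml $sample2
$row1, $row2
```

Each value is only assigned when its pattern actually matches, so a failed match leaves the field empty instead of reusing the result of an earlier match. With the $tables variable from the script above, it could be called as: $tables | ForEach-Object { ConvertFrom-HolidayRow -RowHtml $_ }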

Thanks a lot. Worked well with my test.
