PowerQuery Connection to COVID-19 Data on GitHub Get and Transform

Bronze Contributor

Hello-

 

It's been a while since I have visited the forum.  I took a bit of a break there.  I know that someone will be able to help me.  I'm trying to set up a workbook that I can refresh daily with the two most recent COVID-19 files from John Hopkins GetHub Repository.  I have used powerquery for a number of tasks but have never really had the need to scrape the web.

 

The files can be found at this link: https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_report...

 

If I want to get the file from yesterday the link looks like this: https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_daily_report...

 

I was hoping there would be a way to parameterize the data with i'm pulling down with that last portion of the URL (the date).  When I try to pull the data into PowerQuery  I can see the HTML Table Tags.... Just not sure how to parse what I want out of the file.  If anyone could provide assistance it would be very much appreciated.

2020-03-23_12-20-41.png

 

 

Ideally, I would like 03-22-2020 to be pulled onto a worksheet called Current and 03-21-2020 to be pulled onto a worksheet called Previous.

 

 

 

 

5 Replies

@Matt Mickle 

Hi Matt,

 

You may use direct link on the file, add date as parameter

let
    Source = Csv.Document(
        Web.Contents(
            "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/03-22-2020.csv"
        ),
        [Delimiter=",", Columns=8, Encoding=65001, QuoteStyle=QuoteStyle.None]
    ),
    #"Promoted Headers" = Table.PromoteHeaders(
        Source,
        [PromoteAllScalars=true]
    ),
    #"Changed Type" = Table.TransformColumnTypes(
        #"Promoted Headers",
        {
            {"Province/State", type text},
            {"Country/Region", type text},
            {"Last Update", type datetime},
            {"Confirmed", Int64.Type},
            {"Deaths", Int64.Type},
            {"Recovered", Int64.Type},
            {"Latitude", type number},
            {"Longitude", type number}
        }
    )
in
    #"Changed Type"

Another sample of using this source is here https://datachant.com/2020/03/18/covid-19-build-your-own-power-bi-dashboard/

@Sergei Baklan 

 

Thanks so much for the help.  Really appreciate the quick response and reference link.  Hopefully all is well with you.  I heard the Summit was online this year due to extra precaution for the Cornavirus.  I hope that all the MVPs are doing well and you all were able to have a bit of fun at the sessions online.  Good to see you're still out here on the forum helping others.  Thanks again!

@Matt Mickle 

Matt, welcome back to TechComm. Yes, it was a great experience with virtual summit, hopefully all went smoothly. Microsoft did a great job.

Take care.

@Matt Mickle 

 

The two queries I'm using put together all the tables and they also fix some of the data inconsistencies that happened over time.

 

Other datasets out there have been better at using standardized region codes so that the same data inconsistencies from John's Hopkins don't happen.

 

The first query gets all the data sets, and calls the 2nd to get each file in turn.

let
// base URL for all daily updates, format is 01-22-2020.csv, starting 22-jan-2020
startDate = #date(2020,1,22),
// this generate is a "loop" that returns a separate table for each day's data
tableList = List.Generate(()=>startDate, each not Date.IsInNextDay(_), each Date.AddDays(_,1), MakeTableForDate),
// Now combine the data into a massive table
combinedTables = Table.Combine(tableList),
// Adjust all the types and replace nulls with blanks in the combined table
#"Changed Type" = Table.TransformColumnTypes(combinedTables,{{"Last_Update", type datetime}, {"Confirmed", Int64.Type}, {"Deaths", Int64.Type}, {"Recovered", Int64.Type}, {"Lat", type number}, {"Long_", type number}, {"Active", Int64.Type}}),
#"Replaced Value" = Table.ReplaceValue(#"Changed Type",null,"",Replacer.ReplaceValue,{"Admin2"}),
#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value",null,0,Replacer.ReplaceValue,{"Confirmed"}),
#"Replaced Value2" = Table.ReplaceValue(#"Replaced Value1",null,0,Replacer.ReplaceValue,{"Deaths"}),
#"Replaced Value3" = Table.ReplaceValue(#"Replaced Value2",null,0,Replacer.ReplaceValue,{"Recovered"}),
#"Replaced Value4" = Table.ReplaceValue(#"Replaced Value3",null,0,Replacer.ReplaceValue,{"Active"}),
#"Replaced Value5" = Table.ReplaceValue(#"Replaced Value4",null,"",Replacer.ReplaceValue,{"Combined_Key"}),
#"Replaced Value6" = Table.ReplaceValue(#"Replaced Value5","Mainland China","China",Replacer.ReplaceText,{"Country_Region"}),
outputTable = Table.AddIndexColumn( #"Replaced Value6", "RowIndex", 0, 1)
in
outputTable

 

and 

 

(inputDate as date) =>
// returns a table full of values for the particular date
// this is called by the GetFileList query
let
baseURL = "https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_daily_reports/",
csvName = Date.ToText(inputDate, "MM-dd-yyyy") & ".csv",
webName = baseURL & csvName,
// webContents = Web.Contents(baseURL, [Query=[csvName]]),
webContents = Web.Contents(baseURL, [RelativePath=@csvName]),
dataTablenh = try Csv.Document(webContents) otherwise #table({},{}),
dataTableoh = Table.PromoteHeaders(dataTablenh),
// rename columns so we'll get a consistent table design
// new field names FIPS,Admin2,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Combined_Key
dataTableR = Table.RenameColumns( dataTableoh,
{{"Province/State", "Province_State"},
{"Country/Region", "Country_Region"},
{"Last Update", "Last_Update"},
{"Latitude", "Lat"},
{"Longitude", "Long_"}}, MissingField.Ignore),
dataTable = Table.AddColumn(dataTableR, "File Date", each inputDate)

in
dataTable

 

Note if you want to be able to use the PowerBI service to refresh the data you have to use the form of the query using RelativePath.

 

Note I also use a lot of DAX to (somewhat correctly, and somewhat bug ridden- playing with that -- ) in order to combine the data into my dashboard which you can take a look at here.

https://app.powerbi.com/view?r=eyJrIjoiNjRmMGZkMzItOTc1Ny00MjhhLTk3YmEtZjFkOWUwNWNjMGQ2IiwidCI6IjYwN...

 

@ericleigh007 

 

Hey thanks so much for the additional information.  I'll have to take a closer look at code you're using to transform the data.  A few days after I got my small workbook up and running John Hopkins again adjusted the dataset.....   LOL  So I know exactly what you mean.

 

Keep up the good work! Impressive Dashboard!