Forum Discussion
PowerQuery Connection to COVID-19 Data on GitHub Get and Transform
The two queries I'm using put together all the tables and they also fix some of the data inconsistencies that happened over time.
Other datasets out there have been better at using standardized region codes so that the same data inconsistencies from John's Hopkins don't happen.
The first query gets all the data sets, and calls the 2nd to get each file in turn.
let
// base URL for all daily updates, format is 01-22-2020.csv, starting 22-jan-2020
startDate = #date(2020,1,22),
// this generate is a "loop" that returns a separate table for each day's data
tableList = List.Generate(()=>startDate, each not Date.IsInNextDay(_), each Date.AddDays(_,1), MakeTableForDate),
// Now combine the data into a massive table
combinedTables = Table.Combine(tableList),
// Adjust all the types and replace nulls with blanks in the combined table
#"Changed Type" = Table.TransformColumnTypes(combinedTables,{{"Last_Update", type datetime}, {"Confirmed", Int64.Type}, {"Deaths", Int64.Type}, {"Recovered", Int64.Type}, {"Lat", type number}, {"Long_", type number}, {"Active", Int64.Type}}),
#"Replaced Value" = Table.ReplaceValue(#"Changed Type",null,"",Replacer.ReplaceValue,{"Admin2"}),
#"Replaced Value1" = Table.ReplaceValue(#"Replaced Value",null,0,Replacer.ReplaceValue,{"Confirmed"}),
#"Replaced Value2" = Table.ReplaceValue(#"Replaced Value1",null,0,Replacer.ReplaceValue,{"Deaths"}),
#"Replaced Value3" = Table.ReplaceValue(#"Replaced Value2",null,0,Replacer.ReplaceValue,{"Recovered"}),
#"Replaced Value4" = Table.ReplaceValue(#"Replaced Value3",null,0,Replacer.ReplaceValue,{"Active"}),
#"Replaced Value5" = Table.ReplaceValue(#"Replaced Value4",null,"",Replacer.ReplaceValue,{"Combined_Key"}),
#"Replaced Value6" = Table.ReplaceValue(#"Replaced Value5","Mainland China","China",Replacer.ReplaceText,{"Country_Region"}),
outputTable = Table.AddIndexColumn( #"Replaced Value6", "RowIndex", 0, 1)
in
outputTable
and
(inputDate as date) =>
// returns a table full of values for the particular date
// this is called by the GetFileList query
let
baseURL = "https://github.com/CSSEGISandData/COVID-19/raw/master/csse_covid_19_data/csse_covid_19_daily_reports/",
csvName = Date.ToText(inputDate, "MM-dd-yyyy") & ".csv",
webName = baseURL & csvName,
// webContents = Web.Contents(baseURL, [Query=[csvName]]),
webContents = Web.Contents(baseURL, [RelativePath=@csvName]),
dataTablenh = try Csv.Document(webContents) otherwise #table({},{}),
dataTableoh = Table.PromoteHeaders(dataTablenh),
// rename columns so we'll get a consistent table design
// new field names FIPS,Admin2,Province_State,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active,Combined_Key
dataTableR = Table.RenameColumns( dataTableoh,
{{"Province/State", "Province_State"},
{"Country/Region", "Country_Region"},
{"Last Update", "Last_Update"},
{"Latitude", "Lat"},
{"Longitude", "Long_"}}, MissingField.Ignore),
dataTable = Table.AddColumn(dataTableR, "File Date", each inputDate)
in
dataTable
Note if you want to be able to use the PowerBI service to refresh the data you have to use the form of the query using RelativePath.
Note I also use a lot of DAX to (somewhat correctly, and somewhat bug ridden- playing with that -- ) in order to combine the data into my dashboard which you can take a look at here.
https://app.powerbi.com/view?r=eyJrIjoiNjRmMGZkMzItOTc1Ny00MjhhLTk3YmEtZjFkOWUwNWNjMGQ2IiwidCI6IjYwNGVkZjE0LTljZWQtNDQ5NC04NzExLTZkMjJmMzYyZTYwOSIsImMiOjJ9
Hey thanks so much for the additional information. I'll have to take a closer look at code you're using to transform the data. A few days after I got my small workbook up and running John Hopkins again adjusted the dataset..... LOL So I know exactly what you mean.
Keep up the good work! Impressive Dashboard!