Faster way to transform a time range worth of api data in excel query editor and plot a graph

Copper Contributor

I want to be able to create a trendline graph in excel by combining api data using multiple url sources from https://api.data.gov.sg/v1/environment/wind-speed?date=2021-10-XX (with XX being the days in a date parameter YYYY-MM-DD format) for the whole of Oct 2021 (basically ?date=2021-10-01, ?date=2021-10-02 ... ?date=2021-10-31) as each url can only provide daily "date & time" and "values". Plus, I only require readings from station id S115.

 

Hence, is there a faster way to get and combine all the necessary data, transform it on the query editor as well as plot the graph?

3 Replies

@BrnTan 

If API gives no options I don't think something could be improved dramatically. If only exclude extra json nodes from very beginning and fix converted table in memory.

let
    Source = Json.Document(
        Web.Contents("https://api.data.gov.sg/v1/environment/wind-speed?date=2021-10-01")
    )[items],
    #"Converted to Table" = Table.Buffer(
        Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error)
    ),
    #"Expand readings lists" = Table.ExpandRecordColumn(
        #"Converted to Table",
        "Column1",
        {"timestamp", "readings"},
        {"timestamp", "readings"}
    ),
    #"Expanded readings" = Table.ExpandListColumn(#"Expand readings lists", "readings"),
    #"Expand readings values" = Table.ExpandRecordColumn(
        #"Expanded readings",
        "readings",
        {"station_id", "value"},
        {"station_id", "value"}
    ),
    #"Filter Station" = Table.SelectRows(
        #"Expand readings values",
        each ([station_id] = "S117")
    ),
    #"Changed Type" = Table.TransformColumnTypes(
        #"Filter Station",
        {
            {"timestamp", type datetimezone},
            {"station_id", type text},
            {"value", type number}
        }
    )
in
    #"Changed Type"

In theory it could be done with lists transformations, but Power Query is optimised to work with tables.

Many thanks for the help and the file. I have also edited and rephrase the initial post for better clarity.

Would it also be possible to combine api data using multiple url sources from https://api.data.gov.sg/v1/environment/wind-speed?date=2021-10-XX (with XX being the days in a date parameter YYYY-MM-DD format) for the whole of Oct 2021 (basically ?date=2021-10-01, ?date=2021-10-02 ... ?date=2021-10-31)?

@BrnTan 

On the test

- transformed above script to function with two parameters, date (as text) and station_id

(ISOdate as text, station as text) =>

let
    Source = Json.Document(
        Web.Contents(pURL,[Query = [date = ISOdate ]])
    )[items],

    #"Converted to Table" = Table.Buffer(
        Table.FromList(
            Source,
            Splitter.SplitByNothing(),
            null, null, ExtraValues.Error)
    ),
    #"Expand readings lists" = Table.ExpandRecordColumn(
        #"Converted to Table",
        "Column1",
        {"timestamp", "readings"},
        {"timestamp", "readings"}
    ),
    #"Expanded readings" = Table.ExpandListColumn(
        #"Expand readings lists",
        "readings"),
    #"Expand readings values" = Table.ExpandRecordColumn(
        #"Expanded readings",
        "readings",
        {"station_id", "value"},
        {"station_id", "value"}
    ),
    #"Filter Station" = Table.SelectRows(
        #"Expand readings values",
        each ([station_id] = station)
    ),
    #"Removed Other Columns" = Table.SelectColumns(
        #"Filter Station",{"timestamp", "value"}
    )
in
    #"Removed Other Columns"

- to simplify created in Excel date for October, query it, get data for each date and expand into one table

let
    Source = Excel.CurrentWorkbook(){[Name="Date"]}[Content],
    DateOnly = Table.TransformColumnTypes(
        Source,
        {{"Date", type date}}),
    DateToText = Table.AddColumn(
        DateOnly,
        "textDate",
        each Date.ToText( [Date], "yyyy-MM-dd" )),
    KeepAbove = Table.RemoveColumns(
        DateToText,{"Date"}),
    GetStationsData = Table.AddColumn(
        KeepAbove,
        "Custom",
        each fnGetData( [textDate], "S117" )),
    RemoveTextdate = Table.SelectColumns(
        GetStationsData,{"Custom"}),
    ExpandData = Table.ExpandTableColumn(
        RemoveTextdate,
        "Custom",
        {"timestamp", "value"}, {"timestamp", "value"}   ),
    #"Changed Type" = Table.TransformColumnTypes(
        ExpandData,
        {
            {"timestamp", type datetimezone},
            {"value", type number}
        }
    )
in
    #"Changed Type"

On my not very new PC it took about 3 minutes to refresh and load data into Excel table. Not very dramatic.

 

But you'd like to keep data for years and many stations such straightforward approach won't work. We may try to optimize above script bit more, not sure it changes something significantly. Excel doesn't give options for pagination, thus some workarounds. Perhaps not within query, but where and how keep and combine historical data.