Geospatial joins are now supported by Azure Data Explorer
Published Sep 18 2020 01:13 PM 3,665 Views
Microsoft

This blog post covers a new geo function and the basics of geospatial joins.

 

geo_polygon_to_s2cells

geo_polygon_to_s2cells takes a polygon or multipolygon and a S2 cell level as input and creates an array of all token strings which cover it. Be aware that covering a large-area polygon with small-area cells can lead to a huge amount of covering cells. As a result, the query might return null.

 

KQL-Query:

 

let polygon = dynamic({"type":"Polygon","coordinates":[[[0,0],[0,50],[100,50],[0,0]]]});
print s2_cell_token_count = array_length(geo_polygon_to_s2cells(polygon, 5));

 

 Result:

s2_cell_token_count
286

 

Geospatial joins

This query pattern is oftentimes used in various mobility solutions (geospatial telemetry and static reference data), geospatial risk analysis and agriculture optimization using weather data. It is based on the three-dimensional S2 geometry and the functions geo_polygon_to_s2cells and geo_point_in_polygon. By use of this functionality a geospatial join consists of a coarse-grained join using the S2 cell coverage and the exact validation using the geo_point_in_polygon function.

 

The four main steps:

  1. Converting polygons to S2 cells of level k,
  2. Converting points to the same S2 cells level k,
  3. Joining on S2 cells,
  4. Filtering by geo_point_in_polygon().

 

The following picture explains the flow of the entire KQL query. First you need to choose the right S2 cell level. It should not be too big and not too small (it is not recommended to use more than 10.000 cells, 65535 are possible in theory). Second you are creating the S2 cell tokens for the static dataset. Next you are joining the polygons with the timeseries based on the tokens. This provides you with a dataset which might still has some false positives (S2 cell overlap) and that is why you need to end the statement with a check if the actual point is in the polygon.

geospatialJoin.png

 

This is a concrete example of a geospatial join based on the StormEvents and US_States table located on the help cluster. It calculates the damage in $ by state.

 

KQL-Query:

 

let join_level = 4; 
US_States 
| project State = features.properties.NAME, polygon = features.geometry 
| extend covering = geo_polygon_to_s2cells(polygon, join_level) 
| mv-expand covering to typeof(string) 
| join kind = inner hint.strategy = broadcast 
( 
  StormEvents 
  | project BeginLon, BeginLat , DamageProperty 
  | extend covering = geo_point_to_s2cell(BeginLon, BeginLat, join_level) 
) on covering 
| where geo_point_in_polygon(BeginLon, BeginLat, polygon) 
| summarize CountOfEvents=count(), DamageInDollar=sum(DamageProperty) by tostring(State) 
| top 3 by DamageInDollar desc

 

Result:

State CountOfEvents DamageInDollar

Kansas

2298

533352500

Texas

3889

488525500

Ohio

852

371630500

2 Comments

Thank you @cosh23 for this post!

Are there any other planned geo functions in the pipline?

Microsoft

Hi @elyusubov ,

 

we continously improving our geospatial capabilities. Recently we added support for multi-lines in some functions and converting polygon or multipolygon planar edges to geodesics by adding intermediate points using geo_polygon_densify(). Is there something particular that you are looking for? Please feel free to add an item to https://feedback.azure.com/forums/915733-azure-data-explorer. Thx

 

Cheers, henning.

Version history
Last update:
‎Sep 18 2020 01:19 PM
Updated by: