Azure Data Explorer Blog

9 MIN READ

Generating and Visualizing H3 Indexed Data with Kusto and Python

Former Employee

May 25, 2022

Working with and visualizing geospatial data can sometimes be a challenge, especially in cases where you need to index or group your data into clusters. While there are many approaches to geospatial indexing, one of the most common approaches is to use H3, which is a system designed by Uber. H3 works by dividing the area of a sphere (our Earth) into a series of hexagons, and each hexagon contains another series of seven hexagons, which in turn have seven hexagons, and so on until you reach the maximum “resolution” of the H3 system:

H3 Cell Composition Example (Source: Uber)

These hexagons represent an index that points can fall within. For instance, let’s say you wanted to aggregate a series of points over a large geographic area. You could plot each point, but that can be an extremely tedious and complex task, but by using H3, this indexing can help highlight areas of concentration for analysis.

H3 is very useful, and earlier this year the Kusto team integrated a suite of H3 functions into Azure Data Explorer. To help illustrate how H3 works, you can use Kusto to quickly identity which H3 “cell” the Microsoft headquarters in Redmond, WA falls into. You can try this yourself in Kusto to see the result:

print h3_cell=geo_point_to_h3cell(-122.1215, 47.6740, 7)

This function can take in a longitude and latitude coordinate and returns a hexadecimal value that represents the cell the point belongs to. Note that the length of the hexadecimal value corresponds to the resolution value (the third function parameter; if you change the 7 to a 3 (lower resolution), you’ll get more trailing “f” values and higher resolutions work the opposite.

Once you have the H3 cell ID, we can in turn use another handy function to take the resulting H3 value and convert it to a polygonal shape:

print geo_h3cell_to_polygon("862a1072fffffff")

This gives you the GeoJSON dynamic result which contains an array of coordinates of the H3 hexagon:

A practical example: OpenCellID Coverage

To illustrate how these functions can work, you can download the OpenCellID dataset and use Azure Data Explorer to load the data while enriching the data with H3 indexes. The OpenCellID dataset is free to download and use, but you’ll need to first register to get an access token. Once you have one, you can either download the entire dataset or a specific country by MCC (Mobile Country Code). For instance, let’s download the dataset for the Netherlands:

The file(s) will be in compressed gzip format with a CSV file inside. If you decompress the file and open it with an editor of your choice, the dataset looks very straightforward: Each row represents one tower (transmitter) and has columns denoting the type of radio, MCC code, MNC, and other fields. To store this data, create a “raw” table that you will load the data to that matches the data types in the file:

.create table RawOpenCellID (
    RadioType: string, 
    MCC: long, 
    MNC: long, 
    LAC: long, 
    CID: string, 
    Unit: long, 
    Lon: real, 
    Lat: real, 
    Range: long, 
    Samples: long, 
    Changeable: long, 
    Created: long, 
    Updated: long, 
    AvgSignal: long
)

Next, because your file may (or may not) have a header row, you'll want to create an ingestion mapping. This will be used when you load your CSV files to tell Kusto what format you expect the columns to be in and how to map them to your new table:

 .create table RawOpenCellID ingestion csv mapping "raw_csvMapping"
'['
' { "column":"RadioType","DataType":"string","Ordinal":"0"},'
' {"Name":"MCC","DataType":"long","Ordinal":"1"},'
' {"Name":"MNC","DataType":"long","Ordinal":"2"},'
' {"Name":"LAC","DataType":"long","Ordinal":"3"},'
' {"Name":"CID","DataType":"string","Ordinal":"4"},'
' {"Name":"Unit","DataType":"long","Ordinal":"5"},'
' {"Name":"Lon","DataType":"real","Ordinal":"6"},'
' {"Name":"Lat","DataType":"real","Ordinal":"7"},'
' {"Name":"Range","DataType":"long","Ordinal":"8"},'
' {"Name":"Samples","DataType":"long","Ordinal":"9"},'
' {"Name":"Changeable","DataType":"long","Ordinal":"10"},'
' {"Name":"Created","DataType":"long","Ordinal":"11"},'
' {"Name":"Updated","DataType":"long","Ordinal":"12"},'
' {"Name":"AvgSignal","DataType":"long","Ordinal":"13"}'
']'

Before you import, you'll need to do one last step: because Kusto supports H3 with several functions, you can create a final table with columns for our H3 cells that you want to calculate and as you load your data to the table, you can use an update policy to calculate the cells as the data is being loaded. To do this, you'll first need a function that will act as your query to transform the raw data:

.create-or-alter function with (docstring = "Enriches the data updated timestamps and some H3 resolutions",folder = "UpdatePolicyFunctions") EnrichData() {
RawOpenCellID
| extend Created = unixtime_seconds_todatetime(Created), Updated= unixtime_seconds_todatetime(Updated)
| extend H3_High = geo_point_to_h3cell(Lon, Lat, 9), H3_Medium = geo_point_to_h3cell(Lon, Lat, 6), H3_Low = geo_point_to_h3cell(Lon, Lat, 2)
}

This query will calculate the H3 Cell for each row’s Longitude and Latitude and generate 3 new columns: one for “high” resolution, “medium” resolution, and “low” resolution cells. You can also take this opportunity to transform the timestamp columns to proper datetime types, too. Next, create the final table, which looks a lot like your first table, but with appended H3 cell columns as well as changed data types of the Created and Updated columns:

.create table OpenCellH3Data (
    RadioType: string, 
    MCC: long, 
    MNC: long, 
    LAC: long, 
    CID: string, 
    Unit: long, 
    Lon: real, 
    Lat: real, 
    Range: long, 
    Samples: long, 
    Changeable: long, 
    Created: datetime, 
    Updated: datetime, 
    AvgSignal: long, 
    H3_High: string, 
    H3_Medium: string, 
    H3_Low: string
)

Finally, create and apply the update policy. This will tell Data Explorer to run this function as data is added to the source table, and append it to your new table:

.alter table OpenCellH3Data policy update 
@'[{ "IsEnabled": true, "Source": "RawOpenCellID ", "Query": "EnrichData()", "IsTransactional": false, "PropagateIngestionProperties": false}]'

With all that in place, it’s time to load the data. For this example, just use the Netherland’s source data and the OneClick ingestion process inside Kusto Web Explorer. With this process, you'll upload your file, select your target table and ingestion mapping that was created already, and start the import:

On the Ingest Data page, make sure you select your cluster and database, and select “Existing Table” and pick the raw table you created:

Then select “from file” and upload your MCC file that you downloaded earlier:

Finally, on the schema page, you’ll need to make sure check “keep existing table schema” and also select “use existing” on the mapping and choose the mapping you created above:

And then click “Start Ingestion.” Once it finishes, take a look at the resulting table:

This looks pretty good! Now you've got some nice, enriched, and summarized data, all thanks to Kusto. But how do you “see” it?

Visualizing H3

To properly visualize your H3 plots, you’ve got a lot of options; there are several graphical utilities, SDKs, and libraries to plot the geometric shapes, and some libraries even provide direct support for mapping a known H3 cell ID by hexadecimal value.

For ease of demonstration, it's possible to take your query from the previous section and plot the results using Python. Python provides a robust and easy to use ecosystem of libraries to quickly take your existing query, submit it to Azure Data Explorer, and the present the results visually. Note that the following examples were prepared via Python notebooks in Jupyter Lab, and the source code for each of these demos are available over on GitHub for you to try yourself.

Example 1: Using Plotly

A very straightforward example of viewing this data can be done using the plotly library in conjunction with pandas and geojson. First, you'll query your ADX cluster using Python. Azure Data Explorer has a Python SDK specifically for querying and returning your data. Here's a snippet of how it works (and note you need to replace the cluster URL value below:

AAD_TENANT_ID = "common"
KUSTO_CLUSTER = " <your cluster url>
KUSTO_DATABASE = "opencellid"
kcsb = KustoConnectionStringBuilder.with_aad_device_authentication(KUSTO_CLUSTER)
kcsb.authority_id = AAD_TENANT_ID
Kusto_client = KustoClient(KCSB)
query = """OpenCellH3Data
| where MCC == 204
| where isnotempty(H3_Medium)
| summarize arg_max(Updated, *) by RadioType, MCC, MNC, LAC
| summarize SignalStrength=avg(AvgSignal), dcount(LAC), UmtsCells = countif(RadioType == "UMTS"), LTE = countif(RadioType == "LTE") by H3_Medium
| extend H3_Hash_Polygon = geo_h3cell_to_polygon(H3_Medium)
| extend Features=pack(
"geometry", H3_Hash_Polygon,
"id", H3_Medium,
"properties",pack("value", UmtsCells),
"type", "Feature")"""

response = KUSTO_CLIENT.execute(KUSTO_DATABASE, query)
df = dataframe_from_result_table(RESPONSE.primary_results[0])
feature_list = df['Features'].to_list()

This code block will create all the required connections and authenticate you to your cluster via AAD authentication. If you look at the query, you’ll notice that the “Features” column is packing all the required data and geometry into a dynamic column; this will save the time of having to calculate or transform the data later in Python. Then it takes the query from before and submits it to the cluster. The resulting data lives in a Pandas data frame.

Many plotting and graphing libraries expect the data to be in a GeoJSON feature list format, and using the geojson python library you can quickly create our FeatureList object from your data frame:

from geojson import FeatureCollection
feat_collection = FeatureCollection(feature_list)

Once the data exists in the correct format, create your map:

fig = (px.choropleth_mapbox(
df, 
geojson=feat_collection, 
locations='H3_Medium', 
color='UmtsCells',
color_continuous_scale="agsunset",
range_color=(0,df.UmtsCells.mean()), 
mapbox_style='carto-positron',
zoom=6.5,
center = {"lat": 52.1326, "lon": 5.2913}, 
opacity=0.6,
hover_data=['SignalStrength','dcount_LAC','UmtsCells','LTE'],
width=700,
height=800,
))
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

You can see from the code that there are a lot of options for configuring the resulting map, from which pieces of data to plot, to what colors and themes you’d like. There are a wealth of options to explore, but when you’ve plotted the results you should see your H3 cells, indexed and sized according to the resolution you specified in the query. The map is also interactive; try mousing it over it to see the results!

Example 2: Using kepler.gl

Kepler.gl is a very impressive and high-performance plotting and visualization tool. Primary built on node.js, you can also use Jupyter Notebooks to render your data inside of cell. The results are impressive:

To do this, you need to make sure you look at the install guide for enabling the required extensions, and to create these results you will do the same work as before, except to render your kepler.gl map, you’ll first specify the area, load the data, and display the result. Kepler.gl also supports saving your configuration, and you can load it each time to present the data the way you want each time. Be sure to check out the example in the repository for a complete example.

Try This Yourself: For Free!

H3 represents a fantastic option for data scientists and analysts to quickly and efficiently cluster and index their geographic data. By using the built-in capabilities of Kusto, you can very easily enrich your geospatial data. This example represents something you can use and try yourself, for free. If you haven’t already, make sure you claim your free Kusto cluster today so you can follow along with this example. Also, be sure to check out this repository which contains all the Kusto code for creating your tables and ingestion and update policies. You will also find all the requirements and more detailed instructions for setting up your local environment. Happy querying!

Updated May 31, 2022

Version 5.0

Former Employee

Joined March 09, 2022

View Profile