# Azure Data Explorer's Advanced Geospatial Analysis: Breaking New Ground
In the rapidly evolving landscape of data analytics, Azure Data Explorer (ADX) continues to push boundaries with its latest enhancement to geospatial capabilities. These new features, meticulously developed by Michael Brichko and the Kusto team, represent a significant advancement in how organizations can analyze and derive insights from location-based data directly within the Kusto Query Language (KQL).

## The Evolution of Geospatial Analysis in ADX

Azure Data Explorer has long supported basic geospatial operations, but this latest release dramatically expands its capabilities with functions that address specific analytical challenges faced by data engineers and analysts working with spatial data.

## New Powerful Lookup Plugins and Joins

At the core of this update are powerful additions that solve complex spatial relationship problems: the geo_polygon_lookup and geo_line_lookup plugins, alongside comprehensive geospatial join capabilities.

### geo_polygon_lookup

This plugin efficiently determines relationships between points and polygons, answering questions like "which sales territory contains this customer?" or "which service zone covers this location?":

```kusto
let territories = datatable(region_name:string, polygon:dynamic)
[
    "Northeast", dynamic({"type":"Polygon","coordinates":[[[-73.97375,40.74300],[-73.98653,40.75486],[-73.99910,40.74112],[-73.97375,40.74300]]]}),
    "Southwest", dynamic({"type":"Polygon","coordinates":[[[2.57564,48.76956],[2.42009,49.05163],[2.10167,48.80113],[2.57564,48.76956]]]}),
];
let customer_locations = datatable(customer_id:string, longitude:real, latitude:real)
[
    "Customer1", -73.98000, 40.74800,
    "Customer2", 2.50000, 48.90000,
    "Customer3", 10.00000, 50.00000
];
customer_locations
| evaluate geo_polygon_lookup(territories, polygon, longitude, latitude)
| project customer_id, region_name
```

The performance benefits here are substantial: instead of complex self-joins or multi-step operations, this plugin handles the spatial relationship calculations in a single, optimized operation.

### geo_line_lookup

Similarly, the geo_line_lookup plugin identifies lines (like highways, pipelines, or power lines) within a specified distance of points:

```kusto
let infrastructure = datatable(line_id:string, line:dynamic)
[
    "Highway 101", dynamic({"type":"LineString","coordinates":[[-122.40297,37.79329],[-122.38855,37.77867]]}),
    "Main Pipeline", dynamic({"type":"LineString","coordinates":[[-118.35645,34.17247],[-118.32962,34.09873]]}),
];
let maintenance_reports = datatable(report_id:string, longitude:real, latitude:real, issue:string)
[
    "R001", -122.39500, 37.78500, "Debris",
    "R002", -118.34000, 34.15000, "Leak",
    "R003", -120.00000, 36.00000, "Damage"
];
maintenance_reports
| evaluate geo_line_lookup(infrastructure, line, longitude, latitude, 500) // within 500 meters
| project report_id, line_id, issue
```

This capability is invaluable for infrastructure management, transportation analysis, and network optimization scenarios.

## Advanced Geospatial Joins

Beyond the lookup plugins, ADX now provides comprehensive support for various geospatial join strategies. These capabilities allow for sophisticated spatial analysis using different geo-hashing approaches:
```kusto
// Join locations using S2 cells
let locations1 = datatable(name: string, longitude: real, latitude: real)
[
    "Store A", -0.12433080766874127, 51.51115841361647,
    "Store B", -0.12432651341458723, 51.511160848670585,
    "Store C", -0.12432466939637266, 51.51115959669167,
    "Store D", 1, 1,
];
let customer_visits = datatable(customer_id: string, longitude: real, latitude: real)
[
    "Customer1", -0.12432668105284961, 51.51115938802832
];
let s2_join_level = 22; // Higher level = more precision
locations1
| extend hash = geo_point_to_s2cell(longitude, latitude, s2_join_level)
| join kind = inner (
    customer_visits
    | extend hash = geo_point_to_s2cell(longitude, latitude, s2_join_level)
) on hash
| project name, customer_id
```

For more complex proximity requirements, the H3 geo-hashing system with neighbor awareness provides an elegant solution:

```kusto
// Join locations using H3 cells with neighbor awareness
let retail_locations = datatable(store_name: string, longitude: real, latitude: real)
[
    "Downtown", -0.12433080766874127, 51.51115841361647,
    "Westside", -0.12432651341458723, 51.511160848670585,
    "Eastside", -0.12432466939637266, 51.51115959669167,
    "Remote", 1, 1,
];
let customer_events = datatable(event_id: string, longitude: real, latitude: real)
[
    "Purchase123", -0.12432668105284961, 51.51115938802832
];
let to_hash = (lng: real, lat: real)
{
    let h3_hash_level = 14; // Precision level
    let h3_hash = geo_point_to_h3cell(lng, lat, h3_hash_level);
    array_concat(pack_array(h3_hash), geo_h3cell_neighbors(h3_hash))
};
retail_locations
| extend hash = to_hash(longitude, latitude)
| mv-expand hash to typeof(string)
| join kind = inner (
    customer_events
    | extend hash = to_hash(longitude, latitude)
    | mv-expand hash to typeof(string)
) on hash
| distinct store_name, event_id
```

For proximity-based joins that require precise distance calculations, the buffer-based approach provides exceptional flexibility:

```kusto
// Join locations based on precise distance buffers
let venues = datatable(venue_name: string, longitude: real, latitude: real)
[
    "O2 Entrance", 0.005889454501716321, 51.50238626916584,
    "North Gate", 0.0009625704125020596, 51.50385432770013,
    "Greenwich Park", 0.0009395106042404677, 51.47700456557013,
];
let points_of_interest = datatable(poi_id: string, longitude: real, latitude: real)
[
    "O2 Arena", 0.003159306017352037, 51.502929224128394
]
| extend buffer = geo_point_buffer(longitude, latitude, 300, 0.1); // 300-meter radius
venues
| evaluate geo_polygon_lookup(points_of_interest, buffer, longitude, latitude)
| project venue_name, poi_id
```

Beyond joins and lookups, ADX now offers specialized functions for precise geospatial calculations:

### geo_from_wkt()

The new geo_from_wkt() function bridges the gap between different geospatial systems by converting the Well-Known Text (WKT) format, a standard in GIS systems, into GeoJSON objects that ADX can process:

```kusto
// Convert WKT to GeoJSON for further analysis
let wkt_shapes = datatable(shape_id:string, wkt_representation:string)
[
    "City Boundary", "POLYGON((-122.406417 37.785834, -122.403984 37.787343, -122.401826 37.785069, -122.404681 37.782928, -122.406417 37.785834))",
    "Highway", "LINESTRING(-122.33707 47.60924, -122.32553 47.61803)"
];
wkt_shapes
| extend geojson_shape = geo_from_wkt(wkt_representation)
```
Result:

| shape_id | wkt_representation | geojson_shape |
| --- | --- | --- |
| City Boundary | POLYGON((-122.406417 37.785834, -122.403984 37.787343, -122.401826 37.785069, -122.404681 37.782928, -122.406417 37.785834)) | {"type": "Polygon", "coordinates": [[[-122.406417, 37.785834], [-122.403984, 37.787343], [-122.401826, 37.785069], [-122.404681, 37.782928], [-122.406417, 37.785834]]]} |
| Highway | LINESTRING(-122.33707 47.60924, -122.32553 47.61803) | {"type": "LineString", "coordinates": [[-122.33707, 47.60924], [-122.32553, 47.61803]]} |

This function supports all standard geometry types, including Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection.

## Route Analysis Functions

Two complementary functions provide sophisticated route analysis capabilities:

### geo_line_interpolate_point

geo_line_interpolate_point() calculates a point at a specified fraction along a line:

```kusto
// Find points along a delivery route at 25%, 50%, and 75% of the journey
let delivery_routes = datatable(route_id:string, route:dynamic)
[
    "Route A", dynamic({"type":"LineString","coordinates":[[-122.33707, 47.60924], [-122.32553, 47.61803]]})
];
delivery_routes
| extend start_point = geo_line_interpolate_point(route, 0),
         quarter_point = geo_line_interpolate_point(route, 0.25),
         midpoint = geo_line_interpolate_point(route, 0.5),
         three_quarters = geo_line_interpolate_point(route, 0.75),
         end_point = geo_line_interpolate_point(route, 1)
```

Result:

| route_id | route | start_point | quarter_point | midpoint | three_quarters | end_point |
| --- | --- | --- | --- | --- | --- | --- |
| Route A | {"type": "LineString", "coordinates": [[-122.33707, 47.60924], [-122.32553, 47.61803]]} | {"type": "Point", "coordinates": [-122.33707, 47.609240000000007]} | {"type": "Point", "coordinates": [-122.33418536369042, 47.611437608491833]} | {"type": "Point", "coordinates": [-122.33130048494128, 47.613635144663533]} | {"type": "Point", "coordinates": [-122.3284153637215, 47.615832608503482]} | {"type": "Point", "coordinates": [-122.32553000000002, 47.61803]} |

### geo_line_locate_point

geo_line_locate_point() performs the inverse operation, determining how far along a route a specific point is located:

```kusto
// Calculate what percentage of the route has been completed
let active_routes = datatable(vehicle_id:string, route:dynamic, current_long:real, current_lat:real)
[
    "Truck1", dynamic({"type":"LineString","coordinates":[[-122.33707, 47.60924], [-122.32553, 47.61803]]}), -122.33000, 47.61500
];
active_routes
| extend completion_percentage = geo_line_locate_point(route, current_long, current_lat) * 100
| project vehicle_id, completion_percentage
```

Result:

| vehicle_id | completion_percentage |
| --- | --- |
| Truck1 | 63.657018697669 |

Together, these functions enable precise tracking and analysis of movement along routes, critical for logistics, fleet management, and transportation applications.
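Because geo_line_locate_point() returns a fraction on the same scale that geo_line_interpolate_point() consumes, the two compose naturally. As a small sketch reusing the route from the examples above (the 10% look-ahead is purely illustrative), here is how one might project a point slightly ahead of a vehicle's current position:

```kusto
// Locate the vehicle on the route, then interpolate a point 10% further
// along, capping the fraction at 1.0 (the end of the line)
let route = dynamic({"type":"LineString","coordinates":[[-122.33707, 47.60924], [-122.32553, 47.61803]]});
print fraction = geo_line_locate_point(route, -122.33000, 47.61500)
| extend look_ahead = geo_line_interpolate_point(route, min_of(fraction + 0.1, 1.0))
```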
## Closest Point Calculations

Two new functions address the need to find exact points on geometric features that are closest to reference points:

### geo_closest_point_on_line

geo_closest_point_on_line() identifies the exact point on a line nearest to a reference point:

```kusto
print point_on_line = geo_closest_point_on_line(-115.199625, 36.210419, dynamic({"type":"LineString","coordinates":[[-115.115385,36.229195],[-115.136995,36.200366],[-115.140252,36.192470],[-115.143558,36.188523],[-115.144076,36.181954],[-115.154662,36.174483],[-115.166431,36.176388],[-115.183289,36.175007],[-115.192612,36.176736],[-115.202485,36.173439],[-115.225355,36.174365]]}))
```

### geo_closest_point_on_polygon

geo_closest_point_on_polygon() calculates the closest point on a polygon boundary to a given location:

```kusto
let central_park = dynamic({"type":"Polygon","coordinates":[[[-73.9495,40.7969],[-73.95807266235352,40.80068603561921],[-73.98201942443848,40.76825672305777],[-73.97317886352539,40.76455136505513],[-73.9495,40.7969]]]});
print geo_closest_point_on_polygon(-73.9839, 40.7705, central_park)
```

These functions enable precise proximity analysis for applications ranging from emergency response to facility planning.

## Technical Implementation and Performance Considerations

What makes these new geospatial features particularly impressive is their integration with ADX's query engine. The functions leverage ADX's columnar storage and parallel processing capabilities to perform complex spatial operations efficiently at scale.

For large datasets, the lookup plugins use optimized spatial indexing techniques to avoid the performance pitfalls that typically plague geospatial joins. This means that operations that might take minutes or hours in traditional GIS systems can execute in seconds on properly optimized ADX clusters.
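For context on that performance claim, here is a sketch of the manual pattern the lookup plugins replace: a cross join between points and polygons filtered with the long-standing geo_point_in_polygon() function. It produces the same matches as the geo_polygon_lookup example above, but it evaluates every point against every polygon:

```kusto
// Manual point-in-polygon matching via a cross join: correct, but every
// point is tested against every polygon, unlike the indexed lookup plugin
let territories = datatable(region_name:string, polygon:dynamic)
[
    "Northeast", dynamic({"type":"Polygon","coordinates":[[[-73.97375,40.74300],[-73.98653,40.75486],[-73.99910,40.74112],[-73.97375,40.74300]]]})
];
let customer_locations = datatable(customer_id:string, longitude:real, latitude:real)
[
    "Customer1", -73.98000, 40.74800
];
customer_locations
| extend join_key = 1
| join kind=inner (territories | extend join_key = 1) on join_key
| where geo_point_in_polygon(longitude, latitude, polygon)
| project customer_id, region_name
```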
## Real-World Applications

The new geospatial capabilities in ADX enable sophisticated solutions across industries.

### Telecommunications

Network operators can analyze signal coverage against population density polygons to identify optimization opportunities:

```kusto
let cell_towers = datatable(tower_id:string, signal_longitude:real, signal_latitude:real, signal_strength:int)
[
    "Tower1", -73.98000, 40.74800, 85,
    "Tower2", -73.97500, 40.75200, 72,
    "Tower3", -73.99000, 40.74100, 90
];
let population_zones = datatable(zone_name:string, population_density:int, zone_polygon:dynamic)
[
    "Midtown", 25000, dynamic({"type":"Polygon","coordinates":[[[-73.97375,40.74300],[-73.98653,40.75486],[-73.99910,40.74112],[-73.97375,40.74300]]]})
];
let signal_threshold = 80;
let density_threshold = 20000;
cell_towers
| evaluate geo_polygon_lookup(population_zones, zone_polygon, signal_longitude, signal_latitude)
| summarize avg_signal_strength=avg(signal_strength) by zone_name, population_density
| where avg_signal_strength < signal_threshold and population_density > density_threshold
```

### Energy

Pipeline operators can identify sensors that might be affected by maintenance on nearby infrastructure:

```kusto
let planned_maintenance = datatable(maintenance_id:string, maintenance_longitude:real, maintenance_latitude:real, start_time:datetime)
[
    "M001", -118.34500, 34.16000, datetime(2025-04-15),
    "M002", -118.33000, 34.10000, datetime(2025-04-20)
];
let pipelines = datatable(pipeline_id:string, pipeline_path:dynamic)
[
    "Main", dynamic({"type":"LineString","coordinates":[[-118.35645,34.17247],[-118.32962,34.09873]]}),
    "Secondary", dynamic({"type":"LineString","coordinates":[[-118.36000,34.15000],[-118.34000,34.08000]]})
];
let sensors = datatable(sensor_id:string, pipeline_id:string, sensor_type:string, next_scheduled_reading:datetime)
[
    "S001", "Main", "Pressure", datetime(2025-04-16),
    "S002", "Main", "Flow", datetime(2025-04-18),
    "S003", "Secondary", "Temperature", datetime(2025-04-22)
];
planned_maintenance
| evaluate geo_line_lookup(pipelines, pipeline_path, maintenance_longitude, maintenance_latitude, 500)
| join kind=inner sensors on pipeline_id
| project maintenance_id, sensor_id, sensor_type, next_scheduled_reading
```

### Transportation & Logistics

Fleet operators can optimize routing by analyzing historical trip data against road networks:

```kusto
let completed_trips = datatable(trip_id:string, trip_success:bool, incident_longitude:real, incident_latitude:real, incident_type:string)
[
    "T001", false, -122.40000, 47.61000, "Delay",
    "T002", false, -122.39500, 47.60800, "Traffic",
    "T003", false, -122.41000, 47.62000, "Weather"
];
let road_networks = datatable(road_name:string, road_type:string, road_path:dynamic)
[
    "5th Avenue", "Urban", dynamic({"type":"LineString","coordinates":[[-122.40297,47.59329],[-122.40297,47.62000]]}),
    "Highway 99", "Highway", dynamic({"type":"LineString","coordinates":[[-122.39000,47.60000],[-122.41000,47.63000]]})
];
completed_trips
| where trip_success == false
| evaluate geo_line_lookup(road_networks, road_path, incident_longitude, incident_latitude, 100)
| summarize incidents_count=count() by road_name, road_type, incident_type
| order by incidents_count desc
```

### Environmental Monitoring

Researchers can correlate sensor readings with geographic zones to track pollution dispersion:

```kusto
let sensor_readings = datatable(sensor_id:string, reading_type:string, reading_value:real, timestamp:datetime, sensor_longitude:real, sensor_latitude:real)
[
    "ENV001", "PM2.5", 35.2, datetime(2025-04-01), -122.33000, 47.61500,
    "ENV002", "PM2.5", 22.8, datetime(2025-04-01), -122.34000, 47.62000,
    "ENV003", "PM2.5", 41.3, datetime(2025-04-01), -122.32000, 47.60500
];
let air_quality_zones = datatable(zone_name:string, zone_boundary:dynamic)
[
    "Downtown", dynamic({"type":"Polygon","coordinates":[[[-122.34000,47.60000],[-122.34000,47.62000],[-122.32000,47.62000],[-122.32000,47.60000],[-122.34000,47.60000]]]})
];
sensor_readings
| where reading_type == "PM2.5" and timestamp > ago(7d)
| evaluate geo_polygon_lookup(air_quality_zones, zone_boundary, sensor_longitude, sensor_latitude)
| summarize avg_reading=avg(reading_value) by zone_name, bin(timestamp, 1h)
```

## Getting Started with ADX Geospatial Analysis

The geospatial functions are available in all ADX clusters without requiring any special configuration. For optimal performance with large spatial datasets:

- Consider partitioning strategies that account for spatial locality
- Pre-compute and cache complex geometries when dealing with static boundaries (see the sketch after this list)
- Monitor query performance to identify opportunities for optimization
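For the second tip, a minimal sketch of what pre-computing static geometries might look like, assuming a hypothetical StaticSites table of fixed locations: the buffers are materialized once with a .set-or-replace command, so later queries can read them instead of recomputing geo_point_buffer() on every run:

```kusto
// Hypothetical: materialize 300-meter buffers for static site locations once;
// StaticSites and CachedSiteBuffers are illustrative table names
.set-or-replace CachedSiteBuffers <|
    StaticSites
    | extend buffer = geo_point_buffer(longitude, latitude, 300, 0.1)
```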
## Conclusion

Azure Data Explorer's enhanced geospatial capabilities represent a significant advancement in making spatial analysis accessible and performant within a cloud analytics platform. By eliminating the need for specialized GIS tools and complex data movement, these features enable organizations to derive deeper insights from location-based data more efficiently than ever before.

Whether you're analyzing telecommunications networks, optimizing logistics operations, managing energy infrastructure, or monitoring environmental patterns, ADX now provides the tools to incorporate sophisticated geospatial analysis directly into your data workflows.

#AzureDataExplorer #ADX #Kusto #KQL #GeospatialAnalysis #DataAnalytics #CloudComputing #BigData #SpatialIntelligence #BusinessIntelligence #DataEngineering #MicrosoftAzure #DataScience #GIS #Analytics
# New ADX Dashboards Customization Features: More Control, Better Usability, and Improved Performance

We're introducing new dashboard customization features to enhance control, usability, and performance. From managing data series visibility to improving navigation and map behavior, these updates help create a clearer, more efficient dashboard experience:

- Legend Number Configuration
- Adjustable Panel Width
- Crosshair Tooltip Number Configuration
- Map Centering Configuration

## Legend Number Configuration for Dashboards

To enhance readability and performance, dashboard authors can now configure the number of data series displayed on load when multiple series are expected in a chart. Additional series remain accessible via the legend and can be rendered as needed.

For example, imagine a chart designed to display energy consumption over time for a fleet of cars. The dashboard author expects a large number of data series, one for each vehicle. To make the chart easier to interpret and improve dashboard performance, they can now set a limit on how many series are rendered initially. Users can still explore the full dataset by selecting additional series from the legend.

## Crosshair Tooltip Number Configuration

We're introducing a new setting under Display options that allows dashboard authors to control the number of data points displayed in a chart's crosshair tooltip. Depending on the expected number of data series in a chart and the specific use case, dashboard owners can now set a limit on how many data points appear in the tooltip. This helps improve readability and prevents overcrowding when dealing with a large number of series.

With this update, users can tailor the tooltip experience to focus on the most relevant insights while keeping charts clear and easy to interpret.

Note: This tile-level setting may be overridden by the general ADX web setting, "Show all series in chart tooltip."

## Adjustable Panel Width for Editors and Viewers

We're introducing a highly requested improvement: the ability to manually adjust the width of the pages pane in both edit and view modes. For dashboards with multiple pages, especially those with long names, users can now resize the panel by dragging to expand or collapse it, making navigation easier and improving usability. This flexibility ensures a more comfortable viewing experience, allowing users to see more of their page names at a glance without truncation.

## Map Centering Configuration for Dashboard Tiles

We're introducing a new setting for Map visualizations in Dashboards, giving users more control over how maps behave during data refreshes. With the new auto center setting, displayed on top of the map visualization, users can choose whether the map resets its zoom and center position upon refresh or maintains their manually adjusted view:

- Auto center OFF: The zoom level and position set by the user persist across data refreshes, preventing unwanted zoom-in/out changes. Users can still manually reset the view using the Center button.
- Auto center ON: The map automatically adjusts its zoom and center position with each data refresh, ensuring the view is always recalibrated based on the latest data.

This feature helps prevent disruptions in analysis, particularly for users who prefer a fixed view while monitoring live data updates.

The Azure Data Explorer Web UI team looks forward to your feedback at KustoWebExpFeedback@service.microsoft.com. You're also welcome to add more ideas and vote for them here: Ideas.
# Synapse Data Explorer (SDX) to Eventhouse Migration Capability (Preview)

Synapse Data Explorer (SDX), part of Azure Synapse Analytics, is an enterprise analytics service that enables you to explore, analyze, and visualize large volumes of data using the familiar Kusto Query Language (KQL). SDX has been in public preview since 2019.

## The evolution of Synapse Data Explorer

The next generation of the SDX offering is evolving to become Eventhouse, part of Real-Time Intelligence in Microsoft Fabric. Eventhouse offers the same powerful features and capabilities as SDX, but with enhanced scalability, performance, and security. Eventhouse is built on the same technology as SDX and is compatible with all the applications, SDKs, integrations, and tools that work with SDX.

For existing customers considering a move to Fabric, we are excited to offer a seamless migration capability. You can now migrate your Data Explorer pools from a Synapse workspace to Eventhouse effortlessly. To initiate the migration of your SDX cluster to Eventhouse, simply create a support ticket. Our team will collaborate with you to ensure a smooth transition. Looking ahead, we plan to release a self-service API in early 2025, enabling you to manage the migration independently.
# Query Acceleration for Delta External Tables (Preview)

An external table is a schema entity that references data stored externally to a Kusto database. Queries run over external tables can be less performant than on ingested data due to various factors, such as network calls to fetch data from storage, the absence of indexes, and more. Query acceleration allows specifying a policy on top of external delta tables. This policy defines a number of days to cache data for high-performance queries.

The query acceleration policy lets customers set a policy on top of external delta tables that defines the number of days to cache. Behind the scenes, Kusto continuously indexes and caches the data for that period, allowing customers to run performant queries on top. Query acceleration is supported by Azure Data Explorer (ADX) over ADLSgen2/blob storage and by Eventhouse over OneLake/ADLSgen2/blob storage.

## Query Acceleration policy

We are introducing a new policy to enable acceleration for delta external tables.

Syntax:

```kusto
.alter external table <TableName> policy query_acceleration 'Policy'
```

Where:

- `<TableName>` is the name of a Delta Parquet external table.
- `<Policy>` is a string literal holding a JSON property bag with the following properties:
  - IsEnabled: Boolean, required. If true, query acceleration is enabled.
  - Hot: TimeSpan, the last 'N' days of data to cache.

## Steps to enable Query Acceleration

1. Create a delta external table as described in this document:

```kusto
.create-or-alter external table <TableName> kind=delta
(
    h@'https://storageaccount.blob.core.windows.net/container;<credentials>'
)
```

2. Set a query acceleration policy:

```kusto
.alter external table <TableName> policy query_acceleration '{"IsEnabled": true, "Hot": "36500d"}'
```

3. Query the table:

```kusto
external_table('TableName')
```

Note: Indexing and caching might take some time, depending on the volume of data and cluster size. To monitor the progress, see the monitoring command, sketched below.
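The note above refers to a monitoring command. As a hedged sketch, assuming Kusto's standard pattern of `.show` management commands for policies and operations (verify the exact command names against the current documentation), checking the policy and the acceleration progress might look like this:

```kusto
// Assumed command following the usual policy-command pattern:
// show the query acceleration policy configured on the table
.show external table MyExternalTable policy query_acceleration
```

```kusto
// Assumed command: check indexing/caching progress statistics
.show external table MyExternalTable operations query_acceleration statistics
```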
## Costs/Billing

Enabling query acceleration does come with some additional costs. The accelerated data is ingested into Kusto and counts toward SSD storage, similar to native Kusto tables. You can control the amount of data to accelerate by configuring the number of days to cache.

## Conclusion

Query acceleration is a powerful feature designed to enhance your data querying capabilities over petabytes of data. By understanding when and how to use this feature, you can significantly improve the efficiency and speed of your data operations. Whether you are dealing with large datasets, complex queries, or real-time analytics, query acceleration provides the performance boost you need to stay ahead.

Get started with Azure Data Explorer. Get started with Eventhouse in Microsoft Fabric.

# Country and Region Information in current_principal_details

Discover how Eventhouse's new country and region information in the current_principal_details() function is revolutionizing data security and accessibility. By leveraging Microsoft Entra ID, this feature provides country and region information, ensuring sensitive data is accessible only to authorized individuals within specified locations.
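As a minimal sketch of how this information might be consumed in KQL: the current_principal_details() function itself is documented, but the specific Country and Region property names in the returned bag are assumptions for illustration, so check the function's output on your own cluster:

```kusto
// current_principal_details() returns a dynamic property bag describing
// the calling principal; 'Country' and 'Region' are assumed key names
print details = current_principal_details()
| extend country = tostring(details.Country), region = tostring(details.Region)
```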
# ADX Continuous Export to Delta Table - Public Preview

We're excited to announce that continuous export to Delta table is now available in preview. Continuous export in ADX allows you to export data from Kusto to an external table with a periodically run query. The results are stored in the external table, which defines the destination, such as Azure Blob Storage, and the schema of the exported data. This process guarantees that all records are exported "exactly once", with some exceptions. Continuous export previously supported the CSV, TSV, JSON, and Parquet formats.
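For orientation, here is a hedged end-to-end sketch using the documented external table and continuous export commands; the table names, interval, and storage connection string are illustrative placeholders:

```kusto
// 1. Define a delta external table as the export destination
//    (storage URI and table name are placeholders)
.create external table DeltaDestination kind=delta
(
    h@'https://mystorageaccount.blob.core.windows.net/exports;impersonate'
)
```

```kusto
// 2. Create a continuous export that periodically runs the query below
//    and appends the results to the delta external table
.create-or-alter continuous-export ExportToDelta
    over (SourceTable)
    to table DeltaDestination
    with (intervalBetweenRuns=5m)
<| SourceTable
```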