Calculating traveled distance from coordinates on works but aggregation causes abort

Juho Hanhimäki · ‎Jul 05 2018

I have location data which is timestamped. I use the data to calculate the distance traveled.

First I have the function which is used for calculating the distance between coordinates:

let getDistance = (lat1:double, lon1:double, lat2:double, lon2:double)
{
let R = 6371;
let dLat = radians(lat2-lat1);
let dLon = radians(lon2-lon1);
let a = sin(dLat/2) * sin(dLat/2) + cos(radians(lat1)) * cos(radians(lat2)) * sin(dLon/2) * sin(dLon/2);
let c = 2 * atan2(sqrt(a), sqrt(1-a));
R * c
};

Then the query:

LocationData_CL
| where TimeGenerated > ago(60d)
| where AssetId_s == "test"
| order by TimeGenerated
| project TimeGenerated, distance = getDistance(Lat_d, Lon_d, prev(Lat_d), prev(Lon_d))

This works fine and returns 390 results.

Now i try to aggregate daily distance:

| summarize sum(distance) by bin(TimeGenerated, 1d)

This fails:

QUERY ABORTED

The query was aborted by the server.
Details: It looks like the accumulated data on the server exceeded the limit.
Support id: e15414e1-8bab-47b9-aaab-6638d1e91c9f

Is there a way I can make this query work?

Meir Mendelovich · ‎Jul 05 2018

Hi,

This error is very rare and used only for queries that hog memory resources above very high threshold. There is a fraction of a percentage of those every day. There is usually a way to optimize them and make them run smoothly.
If you can contact me directly (meirm@microsoft), I would try to assist you.

Thanks,
Meir

Meir Mendelovich · ‎Jul 05 2018

Looking at your query, there is one thing that is completely redundant and may cause it to blow out of memory: the "order" before the summarize. It doesn't make sense to order something before you summarize it.

Performance-wise, you kill the parallelism of the system as ordering means that the system have to unify the results from all nodes and then to summarize while regular summarize using bins and sum functions can be done almost completely in parallel. In highly distributed system like ours (we have some clusters of more than 100 nodes) this has huge effect. I'm not saying this is the case but it is an option.

Calculating traveled distance from coordinates on works but aggregation causes abort

Calculating traveled distance from coordinates on works but aggregation causes abort

Re: Calculating traveled distance from coordinates on works but aggregation causes abort

Re: Calculating traveled distance from coordinates on works but aggregation causes abort

Products (50)

Special Topics (27)

Video Hub (462)

Most Active Hubs

Most Active Hubs

Video Hub

Calculating traveled distance from coordinates on works but aggregation causes abort

Calculating traveled distance from coordinates on works but aggregation causes abort

Re: Calculating traveled distance from coordinates on works but aggregation causes abort

Re: Calculating traveled distance from coordinates on works but aggregation causes abort