Forum Discussion
Juho Hanhimäki
Jul 05, 2018
Copper Contributor
Calculating traveled distance from coordinates works but aggregation causes abort
I have location data which is timestamped. I use the data to calculate the distance traveled. First I have the function which is used for calculating the distance between coordinates: let getDis...
Meir_Mendelovich
Microsoft
Jul 05, 2018
Looking at your query, one thing is completely redundant and may cause it to run out of memory: the "order" before the summarize. It doesn't make sense to order rows before you summarize them.
Performance-wise, ordering kills the parallelism of the system: the system has to merge the results from all nodes before it can summarize, whereas a regular summarize using bins and sum functions can run almost completely in parallel. In a highly distributed system like ours (some of our clusters have more than 100 nodes), this has a huge effect. I'm not saying this is definitely the cause, but it is a possibility.
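To illustrate why a binned sum does not need ordered input, here is a minimal Python sketch. It is not how the query engine is implemented; the function names, the `(timestamp, distance)` row shape, and the sample data are all hypothetical. It only shows that each node can sum its own rows per bin in any order, and that merging the partial results gives the same answer as summing the fully sorted data.

```python
from collections import Counter

def partial_sums(rows, bin_size):
    """Sum distance per time bin for one node's rows, in arrival order."""
    sums = Counter()
    for timestamp, distance in rows:
        bin_start = timestamp - timestamp % bin_size  # floor to the bin start
        sums[bin_start] += distance
    return sums

def merge(partials):
    """Combine per-node partial sums by adding the values of matching bins."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds values, it does not overwrite
    return total

# Hypothetical rows: (timestamp in seconds, distance in meters)
rows = [(0, 5), (30, 7), (65, 2), (119, 4), (120, 10)]

# The same rows, split across two "nodes" and deliberately unordered
node_a = [(119, 4), (0, 5)]
node_b = [(120, 10), (65, 2), (30, 7)]

merged = merge([partial_sums(node_a, 60), partial_sums(node_b, 60)])
ordered = partial_sums(sorted(rows), 60)

print(dict(merged) == dict(ordered))
```

Because addition is commutative and associative, the per-bin totals are identical whether or not the rows were sorted first, which is why the sort adds cost without changing the aggregated result.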