SOLVED

Can you turn off querying hive in Excel Power Query anytime you make a change?

Copper Contributor

We have a hive schema set up with a number of tables, some of which have over a million records. We are trying to access the tables within Excel and are using the Presto Driver by going to Get Data -> Other Sources -> ODBC -> Presto (which we installed and configured). 

 

What we found was when we connect and choose the table and select transform a query is made to the cluster if there isn't a preview. Then each time a change is made like moving a column or deselecting a column, etc. a query is made back to hive. If we are working with a larger table then this causes performance problems. Why does it do this and how can we turn this off? I thought the preview brought in a subset of data and only pressing refresh or close and load would a query be submitted back to hive.

 

I looked in the Power Query Options and Fast Load is NOT selected. I also saw an article that said to choose Specify custom default load settings and to uncheck boxes which I did. 

 

Thank you for any help on this!

6 Replies
Would it not be wise to setup a query on the database side that limits the amount of data first, rather than pulling in all 1E6 records? I know PQ does something they call query folding (which should limit the amount of network traffic when the query is running), but I don't know if your ODBC driver allows that to happen efficiently.

Hi - @Jan Karel Pieterse in our case we are also using the data in Tableau and will not be able to limit the number of records in the database. I am actually very surprised that the application would send requests to the database for any movement like moving a column to another location.

best response confirmed by awatelet (Copper Contributor)
Solution

@awatelet 

If you query the server for few millions of records and on next step filtered them down to few hundreds, Power Query in background combine these steps and returns from the server only few hundreds. That's query folding. If it doesn't work or if you break it with another step, filtering will be done on the client site and that could take years to perform.

As far as I know PowerQuery only does that during development of the query, I *think* it is trying to optimize the query along the way. But I'm no PQ expert by any means :)
Thank you for your response above and will consider filtering first moving forward.
Thank you for taking time to respond and confirming what Sergei posted as well above!
1 best response

Accepted Solutions
best response confirmed by awatelet (Copper Contributor)
Solution

@awatelet 

If you query the server for few millions of records and on next step filtered them down to few hundreds, Power Query in background combine these steps and returns from the server only few hundreds. That's query folding. If it doesn't work or if you break it with another step, filtering will be done on the client site and that could take years to perform.

View solution in original post