Forum Discussion

awatelet's avatar
awatelet
Copper Contributor
Mar 16, 2021
Solved

Can you turn off querying hive in Excel Power Query anytime you make a change?

We have a hive schema set up with a number of tables, some of which have over a million records. We are trying to access the tables within Excel and are using the Presto Driver by going to Get Data -> Other Sources -> ODBC -> Presto (which we installed and configured). 

 

What we found was when we connect and choose the table and select transform a query is made to the cluster if there isn't a preview. Then each time a change is made like moving a column or deselecting a column, etc. a query is made back to hive. If we are working with a larger table then this causes performance problems. Why does it do this and how can we turn this off? I thought the preview brought in a subset of data and only pressing refresh or close and load would a query be submitted back to hive.

 

I looked in the Power Query Options and Fast Load is NOT selected. I also saw an article that said to choose Specify custom default load settings and to uncheck boxes which I did. 

 

Thank you for any help on this!

  • awatelet 

    If you query the server for few millions of records and on next step filtered them down to few hundreds, Power Query in background combine these steps and returns from the server only few hundreds. That's query folding. If it doesn't work or if you break it with another step, filtering will be done on the client site and that could take years to perform.

6 Replies

  • JKPieterse's avatar
    JKPieterse
    Silver Contributor
    Would it not be wise to setup a query on the database side that limits the amount of data first, rather than pulling in all 1E6 records? I know PQ does something they call query folding (which should limit the amount of network traffic when the query is running), but I don't know if your ODBC driver allows that to happen efficiently.
    • awatelet's avatar
      awatelet
      Copper Contributor

      Hi - JKPieterse in our case we are also using the data in Tableau and will not be able to limit the number of records in the database. I am actually very surprised that the application would send requests to the database for any movement like moving a column to another location.

      • JKPieterse's avatar
        JKPieterse
        Silver Contributor
        As far as I know PowerQuery only does that during development of the query, I *think* it is trying to optimize the query along the way. But I'm no PQ expert by any means 🙂

Resources