Mar 16 2021 11:23 AM
We have a hive schema set up with a number of tables, some of which have over a million records. We are trying to access the tables within Excel and are using the Presto Driver by going to Get Data -> Other Sources -> ODBC -> Presto (which we installed and configured).
What we found was when we connect and choose the table and select transform a query is made to the cluster if there isn't a preview. Then each time a change is made like moving a column or deselecting a column, etc. a query is made back to hive. If we are working with a larger table then this causes performance problems. Why does it do this and how can we turn this off? I thought the preview brought in a subset of data and only pressing refresh or close and load would a query be submitted back to hive.
I looked in the Power Query Options and Fast Load is NOT selected. I also saw an article that said to choose Specify custom default load settings and to uncheck boxes which I did.
Thank you for any help on this!
Mar 18 2021 06:45 AM
Mar 18 2021 07:27 AM
Hi - @Jan Karel Pieterse in our case we are also using the data in Tableau and will not be able to limit the number of records in the database. I am actually very surprised that the application would send requests to the database for any movement like moving a column to another location.
Mar 18 2021 09:53 AM
SolutionIf you query the server for few millions of records and on next step filtered them down to few hundreds, Power Query in background combine these steps and returns from the server only few hundreds. That's query folding. If it doesn't work or if you break it with another step, filtering will be done on the client site and that could take years to perform.
Mar 18 2021 09:54 AM
Mar 18 2021 10:51 AM
Mar 18 2021 10:52 AM
Mar 18 2021 09:53 AM
SolutionIf you query the server for few millions of records and on next step filtered them down to few hundreds, Power Query in background combine these steps and returns from the server only few hundreds. That's query folding. If it doesn't work or if you break it with another step, filtering will be done on the client site and that could take years to perform.