Problem: Power Query will not load to Data Model

Copper Contributor

Background: A Power Query had been successfully combining data from a dozen large CSV files (101 columns x 100,000 rows each). The table was extensively cut, with ~80% of rows filtered out, unwanted columns removed, and some new data columns added. Output was a table loaded to the workbook only. This ran successfully over several months, with a new CSV data file being added to the list every week. The PQ got somewhat slower as the data got more extensive, but still ran fine.

 

Problem: in trying to develop this further, I wanted to be able to load the table to the Workbook Data Model, in order to add some measures based on an ISO Week calendar (e.g. show the variation of the latest week against the average of preceding weeks). Result: if the query settings are changed to 'Load to Data Model' (irrespective of load to Table or Connection-only), then the query goes into a spinning-wheel-of-doom. The query shows 'Retrieving Data...' but the row count never updates, nor does it show any source files being loaded.

 

The only way to stop it is to hit ESC; in fact, sometimes that doesn't work either and I have to start Task Manager and aim to kill the Excel process. It isn't actually necessary to kill it: just issuing the Stop Process command seems to eventually interrupt the loop.

 

Effect: this means I am restricted to loading the combined data into a Table and cannot add context-specific measures.

 

More generally: I am finding Get & Transform powerful for simple use cases, but in this example, not only have I found this 'does not load to Data Model' issue, but I'm also suffering from a lack of knowledge and resources on how to optimise the data load. I am sure there is some ability to cache the data from the older CSVs (only one new one is added each week), but it is impossible to find any decent resources.

 

 

 

25 Replies

Hi @Epee_Sharkey 

 

What version of Excel are you using?

 

If you go to Data Model > Manage Data Model, does Power Pivot open up?

 

Are you able to load a simple table into the data model?

 

In terms of caching older CSVs, unfortunately this is not currently possible. The entire data set has to be refreshed each time.

 

 

Hi @Wyn Hopkins  thanks for the quick reply.

 

I'm using 64-bit Excel 2016 (the standalone version, not the O365 one). I don't have the laptop open right now, but it is on the latest, or at least a recent, patch.

 

The Load to Data Model issue is this: I can load to the Data Model, but on refresh of a table (where Load to DM is ticked), the query doesn't work; it just stays in that kind of starting-position loop I mentioned. The only way out is to untick Load to DM and then re-tick it. Then the query will refresh and the data loads to the DM. But in the meantime any pivot(s) based on the DM, not to mention measures specific to the table, are gone!

 

I found this happens even on a 'toy' version where I made some much smaller files (10,000 rows and a lot fewer columns). So maybe something is more generally at fault with my Excel config. It is very frustrating, since without being able to load to the DM, I cannot make proper Measures etc.

 

Thanks for confirming the situation with caching of the CSVs -- I did eventually find something in the blogosphere explaining that in a bit more detail. It is frustrating not to be able to find a simple, comprehensive explanation of the capabilities AND limitations of G&T.

 

Are there any knowledge resources that explain this in more detail? While my CSVs are large (~100MB each), my actual final table is much more compact (tens of thousands of rows and tens of columns). If there were a way of reading them in more efficiently, that could work really well (or at least establish the 'art of the possible'). Maybe I can find something that works solidly for 13 weeks (one quarter), then ask my IT colleagues to provide some kind of data warehouse or similar to extend the solution.

 

You will likely find that the O365 version of Power Pivot is more stable as a lot of work has been done over the last few years.

 

Are you able to get access to the Diagram view in the Power Pivot model?

 

Maybe @Ed Hansberry  can shed some light on his use of a Data Lake to store CSV files.

Using a Data Lake works faster for me for CSV files (at least in my limited experience with one larger data set), but I'm not sure if it is any more stable for loading into tables or data models.

 

I'd be inclined to test this with a new workbook. In the old workbook, open Power Query, select ALL queries and copy. Close Power Query, then open it in the new workbook and paste.

 

Now load your desired query to the data model. That will be a fresh load. You could also test this with Power BI by pasting the queries into its Power Query module, then loading only the one you want.

 

Unfortunately, even on the Office 365 version of Power Pivot, I've seen issues where the Data Model gets munged.

 

Unfortunately if you do this and it works, you have to recreate your measures manually.

Thanks @Ed Hansberry  and @Wyn Hopkins 

 

It seems a good suggestion to try making a new workbook, but I'm struggling to try it out because PQ has essentially become unworkable in the original book.

 

One thing I did notice is under Options & Settings > Query Options > Data Load: the Data Cache Management options. The cache 'tops out' at 21.7 MB; I've tried clearing it several times and it always goes back to 21.7. The maximum allowed is the default 4096 MB.

 

In my earlier question about caching the CSV files, I was not expecting a cache to be persistent from session to session (that would be nice though!), but at least persistence within one session would be good. As far as I can see, there is no caching happening at all, and therefore any change to the query causes the entire set of files to be loaded again. Is this topping out at 21.7 MB a usual thing? Is there anything I can do to fix it?

 

@Wyn Hopkins: you say the O365 version may be more stable -- that may be so (I have it on my home PC), but my company is not ready for O365; I had to make a business case just to get Office 2016 Pro Plus on my laptop. It is a useful upgrade and the G&T capabilities are truly impressive. But if I cannot get good information on exactly what the capabilities and limitations of G&T and the Data Model are, then it becomes a whole lot less useful. To be clear, I'm not expecting this to 'just work'. When I first tried this import with G&T on a single CSV file, I was impressed, and I knew I would eventually run into a limit on the number of CSV files given their size. I was also impressed that it kept going so well over so many iterations. But now that it's broken, I cannot even seem to get back to a stable working solution.

PS @Ed Hansberry  -- what is 'Data Lake' and where can I find more info about it ?

In terms of editing queries, there are a couple of settings you can switch off, so try these (see screenshots).

 

Another question would be how much RAM you have on your machine. Hopefully at least 16GB.

 

[screenshots: image.png, image.png]

@Wyn Hopkins  - thanks - that is very helpful.

 

So far, re-creating the queries in a new blank workbook seems to be yielding promising results. I can load all the CSVs into a table, which is successfully going into the DM as well.

 

I've changed settings to match your suggestions, which hopefully will help a little bit.

 

My laptop is a basic i5 with 8GB RAM. If that were definitely a limiting factor, I could possibly justify going to 16GB, but I'm not keen to start down that route without some clear evidence. Are there any tools or indicators to benchmark performance? My impression, based on the fact that the new workbook is performing well (albeit I've not yet added all my filtering and adjustment steps), is that the issue was the original workbook somehow getting 'clogged up' -- it had been going for 3-4 months with a steadily growing folder full of these big CSV files.

 

Hmm,

 

@Ed Hansberry @Wyn Hopkins  I found that, re-building the query from the ground up in a new workbook:

 

- overall refresh time is a lot faster (partly because I am filtering more data out)

- ...but I still cannot load the resulting table to the DM without the problem described earlier.

 

What I noticed is :

 

- while I was doing ordinary point-and-click steps (finding the folder of files, promoting headers, removing unwanted columns, simple filtering, etc.), all seemed fine and the table would load to the DM.

 

- when I added some more complex steps, filtering the loaded table against some helper tables (which are small, only ~10 rows and 1 column), the issue seems to arise: the query still works, but the table will no longer go into the Data Model.

 

Any clues as to why this might be happening ?

 

One thought I have is that I could move those steps out of Power Query and do them in the Data Model instead?

Can you explain a bit more about filtering using helper tables? Are you doing merges?
Regarding RAM - the more the better when using 64-bit Excel. Given that your operating system will use around 4GB, that doesn't leave much for large data sets.

Essentially, with only 8GB of RAM you're not necessarily benefitting much from having 64-bit Excel.

If you open up Task Manager and monitor memory performance while your query is running, you'll probably see it maxing out.

If you're performing Merge operations, note that these are the slowest, most memory-intensive operations.

 

 

@Wyn Hopkins  Yes, merges... I am merging a column of my table (considerably reduced by earlier filtering and column removal) with a table from the Excel workbook (a very small 'helper' table).

In fact I am doing this a few times.

Merge may be slow, but before discovering that technique I had been using a list comparison, looking for the values in each row of a given column within an Excel table viewed as a list.

So I reckon merge probably looks speedy compared with that.
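
To show the shape of it (made-up table and column names, not my real query):

let
    // placeholder data: "MainTable" stands in for my combined CSV output,
    // "HelperTable" for the very small single-column table loaded from the workbook
    MainTable = #table({"Market", "Fruit_Desc"}, {{"US", "Apple"}, {"CA", "Pear"}, {"DE", "Plum"}}),
    HelperTable = #table({"CC_to_include"}, {{"US"}, {"CA"}}),

    // what I do now: an inner-join merge against the helper table, then drop the join column
    Merged = Table.NestedJoin(MainTable, {"Market"}, HelperTable, {"CC_to_include"}, "HelperJoin", JoinKind.Inner),
    MergeResult = Table.RemoveColumns(Merged, {"HelperJoin"}),

    // what I did before: the "list comparison", testing each row against the helper column read as a list
    ListResult = Table.SelectRows(MainTable, each List.Contains(HelperTable[CC_to_include], [Market]))
in
    MergeResult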

 

Anyway, my rebuilt query (in brand new workbook) is basically performing 'OK' in terms of time to load.

But it isn't loading to Data Model, only to a table.

 

Out of curiosity I tried loading that table into Data Model, and that throws this error :

[screenshot: PPivotError.png]

 

Is that a routine message, simply because it's not allowed (and if so, why?) to load a table that was created by PQ but NOT loaded to the DM into the Data Model?

 

Or is it a clue why I cannot load this table into the Data Model ?

Definitely a helpful article, thanks for linking 

 

Of the 3 options, #1 (save as .xlsx not .xls) and #3 ('circular references' in powerpivot) don't apply. 

#2 (possible temp-file corruption) looks interesting. Checking the location specified, I did indeed find plenty of those strangely named folders; I deleted all of them and restarted.

 

No dice. Still the same weird symptom: the PQ runs fine (it is actually running a lot better than in the previous incarnation) but just will not load into the DM.

Frustrating; I think I have to give up for now. The time that Excel G&T has saved me is being used up in this fruitless pursuit of an answer.

 

Thanks @Wyn Hopkins  & @Ed Hansberry  for your efforts. 

@Wyn Hopkins @Ed Hansberry - a quick coda on this one. 

 

I bit the bullet and re-created my workbook. This time I consciously:

 

- set up the G&T script so that it only did normal processing, with no references to other tables. 

- this successfully loaded into Data Model, where I recreated the 'merge' -- actually I managed without doing the merges. 

 

Overall, that works a lot faster in terms of refresh, and I get the main table into the Data Model.

 

Once all this was working, just as an experiment, I went back to the Power Query and inserted a new step. This was a filter to keep only rows where the value is found in a single-column table in the workbook; in fact, with the data I used, there is one row in one column here. That 'helper table' is loaded simply through the 'From Table' button on the Data ribbon. It is created with the option 'Connection Only', and I have tried both checking and un-checking Load to DM with the same results. Within Power Query the step (which I added manually through the Advanced Editor) works fine, performing the filter just as expected.

 

But now, on exiting Power Query via Load To..., the table doesn't load to the Data Model. I mean, even after 15-20 minutes it is still showing 'Loading Data...' with the spinning wheel and no sign of progress. Maybe it will load eventually (though I doubt it), but 15+ minutes to load a simple table like this is not workable.

 

So, reverting back to the Power Query, I removed my previous edit and just hard-coded the same filter as before. Now the table loads to the Data Model (and thankfully the measures etc. I had coded are still there).
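
Roughly, the change looks like this (placeholder names, not my actual query -- just to show the two versions of the step):

let
    // stand-ins: "Previous" for the output of the earlier query steps,
    // "KeepValues" for the single-column helper table loaded via From Table
    Previous = #table({"Week", "Market"}, {{"2019-Week07", "US"}, {"2019-Week07", "CA"}}),
    KeepValues = #table({"ValueToKeep"}, {{"US"}}),

    // version that hangs on load to the Data Model: filter against the workbook helper table
    FilteredViaHelper = Table.SelectRows(Previous, each List.Contains(KeepValues[ValueToKeep], [Market])),

    // version that loads fine: the same filter hard-coded
    FilteredHardCoded = Table.SelectRows(Previous, each [Market] = "US")
in
    FilteredHardCoded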

 

To me, this seems like definitive proof: when the Power Query contains a step referencing a table within the workbook, it doesn't load to the Data Model, but will load to a table only.

 

I will try some smaller examples to explore whether this reproduces. Assuming it does -- is there a place to report this as a bug or issue?

Hi @Epee_Sharkey 

 

Can you share the M code from the Advanced Editor? I'll then see if I can replicate it.

 

If you have a bug you can go to File > Feedback > Send a frown

 

[screenshot: image.png]

@Wyn Hopkins  sorry for the delayed response.

 

I've been having some issues trying to check reproducibility. 

 

My original file is way too big and in any case contains non-public information which is not appropriate to share. 

 

I made a tiny version with a self-contained table created in PQ code, but that actually filters fine against a table from a Worksheet cell range, and loads to the DM just fine. 

 

I also have a toy version of the data; this is a completely separate file, created from scratch, which embodies the same key data steps as the main file. Unfortunately it is still quite large. However, it exhibits the exact same behaviour -- loading to a Table just fine, but hanging on load to the DM.

 

I can probably make a Zip with the Toy file and the data files (in a subsidiary folder) if that helps. 

Hi

How many queries are involved? If it's just one, could you just copy and paste the M code from the Advanced Editor? There shouldn't be any security issues with that.

@Wyn Hopkins  you may well be sorry you asked, but here goes.

BTW, my present solution on my full data set (10-15 CSV files with ~100,000 rows x 100 columns) is now working really well. Dividing the munge functions between G&T and the DM is giving me what I need*, and it is working faster and more smoothly than before. So I don't have an immediate issue, just a nagging sense of something not being quite right.

(* apart from not being able to filter by table as I explained previously, but I can live with that).

 

Note - this is a 'toy' example using smaller data sets that I built just to try out filtering techniques in PQ, but it is essentially identical to what I was doing on the full data set.

 

There are 2 key queries. Query A: load from a folder of CSV files and do some basic filtering and manipulation, resulting in a table. Query B: take that table and do some more complicated operations; the key one uses a table (loaded from cells in the workbook) to test a column of the table for the presence of text strings.

 

In checking this for reproducibility, I found that with a very small data set (2 files, each of 10 rows), everything worked fine and I could load to the DM no problem. When I increased the size to 10 files, each of 10,000 rows, then:

Query A worked and either loaded to the DM or made a connection, no problem.

Query B worked on load to Table (or connection only), but ticking 'Load to DM' meant the query never refreshed. I say 'never' -- I left it for up to 15-20 minutes and found it was still saying 'Retrieving data' and never got on to incrementing rows.

 

Query A :

let
    Source = Folder.Files("H:\My Documents\SupplyAndDemand\ONDB\ToyData\CSVExtractFruit"),
    #"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File from CSVExtractFruit", each #"Transform File from CSVExtractFruit"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File from CSVExtractFruit"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File from CSVExtractFruit", Table.ColumnNames(#"Transform File from CSVExtractFruit"(#"Sample File"))),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1",{{"Source.Name", type text}, {"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}, {"Column7", type text}, {"Column8", type text}, {"Column9", type text}, {"Column10", type text}}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
    #"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"Fruit2018Week49.csv", type text}, {"Week", type text}, {"Market", type text}, {"City", type text}, {"XS_Col1", type text}, {"XS_Col2", type text}, {"What_Ships", type text}, {"Fruit_Desc", type text}, {"Shipment_size", type number}, {"Shipment_Value", Int64.Type}, {"XS_Col3", type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"XS_Col1", "XS_Col2", "XS_Col3"}),
    #"Merged Queries" = Table.NestedJoin(#"Removed Columns",{"Market"},Country_to_include,{"CC_to_include"},"Country_to_include",JoinKind.Inner),
    #"Removed Columns1" = Table.RemoveColumns(#"Merged Queries",{"Country_to_include"}),
    #"Merged Queries1" = Table.NestedJoin(#"Removed Columns1",{"Week"},Weeks_to_include,{"Week"},"Weeks_to_include",JoinKind.Inner),
    #"Removed Columns2" = Table.RemoveColumns(#"Merged Queries1",{"Weeks_to_include"})
in
    #"Removed Columns2"

 

Query B : (doesn't load to DM)

let
    Source = CSVExtractFruit,
    #"Added Custom" = Table.AddColumn(#"Source","US Fruit", each if [Market] = "US" then List.Intersect({Text.Split([Fruit_Desc]," "),{US_Fruit[US_Fruit]}{0}}){0}? else null),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "FruitKeyWord", each if Text.Contains([Fruit_Desc], "FRUIT:") then Text.BetweenDelimiters([Fruit_Desc],"FRUIT:"," ") else null),
    #"Merged Columns" = Table.CombineColumns(#"Added Custom1",{"FruitKeyWord", "US Fruit"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"AllFruitFound"),
    #"Filtered Rows" = Table.SelectRows(#"Merged Columns", each ([AllFruitFound] <> "")),
    #"Merged Queries" = Table.NestedJoin(#"Filtered Rows",{"AllFruitFound"},Top_Fruit,{"Top_Fruit"},"Top_Fruit",JoinKind.LeftOuter),
    #"Expanded Top_Fruit" = Table.ExpandTableColumn(#"Merged Queries", "Top_Fruit", {"Top_Fruit"}, {"Top_Fruit.Top_Fruit"}),
    #"Renamed Columns" = Table.RenameColumns(#"Expanded Top_Fruit",{{"Top_Fruit.Top_Fruit", "TopFruitFound"}})
in
    #"Renamed Columns"
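
In case it helps with readability, the "US Fruit" step above is equivalent to this -- the same logic, just spread over named values:

    #"Added Custom" = Table.AddColumn(#"Source", "US Fruit", each
        if [Market] = "US" then
            let
                Words     = Text.Split([Fruit_Desc], " "),       // words in the description
                FruitList = US_Fruit[US_Fruit],                   // the US_Fruit helper column, as a list
                Common    = List.Intersect({Words, FruitList})    // words that are also known US fruits
            in
                Common{0}?                                        // first match, or null if none
        else
            null),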

 

Example data file -- first few rows of 10,000:

 

Week, Market, City, XS_Col1, XS_Col2, What_Ships, Fruit_Desc, Shipment_size, Shipment_Value
2019-Week07, CA, Toronto, , , All_Fruit, An assortment of FRUIT:Honeydew Couried to dest, 10, 1
2019-Week07, CA, Ottawa, , , Vegetable, Shipment of VEG:konjac sent by overland shipping, 10, 1
2019-Week07, US, Chicago, , , All_Fruit, Shipment of FRUIT:Redcurrant sent by overland shipping, 10, 3
2019-Week07, CA, Ottawa, , , Vegetable, Shipment of VEG:butternut squash Couried to dest, 100, 84