Forum Discussion
Get Data
Before everything else, could you revise/update the title of this discussion to better reflect the challenge (something like "Get Data from CSV Performance Issue" would make sense), please? (This helps people who search.) Thanks
But I wouldn't mind having it longer, even up to a year, if possible (I just chose 6 months as, at the time, I had 6 months of data). You already suffer with 4,395 records - your call...
Spent quite a bit of time testing different things and approaches; clearly there are 2 steps that impact the performance (not talking about reading the CSV on a remote SD card...):
- Take the +/- 4400 bottom records of the CSV
- You want the above in reverse order
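For reference, the straightforward version of those two steps in M looks like the sketch below (reusing the Csv.Document call shown further down; 4395 stands in for MaxRowsToImport). This is the slow baseline, not the recommended approach:

let
    Source = Csv.Document(
        File.Contents("C:\Lz\Downloads\MsTech\SimulAndySource.csv"),
        [Delimiter=",", Columns=201, Encoding=65001, QuoteStyle=QuoteStyle.None]
    ),
    // Table.LastN has to stream through the whole file to find the end
    KeptLast = Table.LastN(Source, 4395),
    // Then the kept rows are re-ordered newest-first
    Reversed = Table.ReverseRows(KeptLast)
in
    Reversed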
Test environment
- CSV (500k rows * 201 cols) stored on local SSD
- PC: Core i7 with 16 GB of RAM. RAM doesn't really matter as long as you have, say, 6-8 GB
FYI: Power Query caps the RAM it uses at 256 MB per query (it's not really/exactly per query). Just to say that even if you had 100 GB of RAM this wouldn't change anything. For this reason, buffering (loading to RAM) the whole CSV isn't an option and would be counterproductive (swapping between RAM and HDD).
Query options (see attached pictures):
- Allow data previews to download in the background > Disabled
- Ignore the Privacy Levels and potentially improve performance
Summary
- Various approaches: +/- 24 to 27 secs to refresh
- New approach (best I could get): +/- 9 secs
Queries (in attached file zipped due to its size)
- MaxRowsToImport: You said 4395 rows/records isn't a constant number, so I implemented a query parameter for you to adjust. If you prefer to have that parameter on an Excel sheet, that's doable, but it has a tiny performance cost
- SourceIndexed: You will have to update it with the Power Query Advanced Editor (see below) to restore your Source step. That query only reads the CSV, then an [_INDEX] column is added. Note that this query doesn't load to a sheet; it stays as a Connection only
- Bins 5-8 (in a nutshell):
- Create a Temp Table with only an [_INDEX] column, where the number of rows = MaxRowsToImport
- Merge/Join Temp Table with query SourceIndexed
- Sort [_INDEX] in Descending order. Note that at that step I Buffer the table (1)
- Remove [_INDEX] + Change column types (Text to Decimal)
(1) I didn't test with MaxRowsToImport > 4,400. If you go above that, it is possible you hit the counterproductive point I mentioned earlier. To be tested in your actual environment...
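I don't know the exact steps in the attached file, but a minimal M sketch of the approach described above - assuming SourceIndexed adds a 0-based [_INDEX] column and MaxRowsToImport is the query parameter - could look like this:

let
    // Total rows in the indexed source (forces one scan of the CSV)
    RowCount = Table.RowCount(SourceIndexed),
    // Temp table holding only the last MaxRowsToImport index values
    TempIndex = Table.FromColumns(
        {List.Numbers(RowCount - MaxRowsToImport, MaxRowsToImport)},
        type table [_INDEX = Int64.Type]
    ),
    // Merge/Join the temp table with SourceIndexed to keep only those rows
    Merged = Table.NestedJoin(
        TempIndex, {"_INDEX"}, SourceIndexed, {"_INDEX"}, "Data", JoinKind.Inner
    ),
    Expanded = Table.ExpandTableColumn(
        Merged, "Data",
        List.RemoveItems(Table.ColumnNames(SourceIndexed), {"_INDEX"})
    ),
    // Sort newest-first and buffer the (now small) table in RAM - step (1)
    Sorted = Table.Buffer(Table.Sort(Expanded, {{"_INDEX", Order.Descending}})),
    // Remove [_INDEX]; the Text-to-Decimal type changes would follow here
    Result = Table.RemoveColumns(Sorted, {"_INDEX"})
in
    Result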
What to change in query SourceIndexed
It currently begins with:
let
Source = Csv.Document(
File.Contents("C:\Lz\Downloads\MsTech\SimulAndySource.csv"),
[Delimiter=",", Columns=201, Encoding=65001, QuoteStyle=QuoteStyle.None]
),
Replace above lines with:
let
Source = Csv.Document(
Web.Contents("ftp://10.0.0.196/TempLog.csv"),
[Delimiter=",", Columns=201, Encoding=1252, QuoteStyle=QuoteStyle.None]
),
I'm gonna be away for +/- a week. Not sure I'll have time to read your posts, but if you need help, go ahead and ask: other contributors to this community have PQ skills
Quick question: How often (per day/week...) do you get the data from the CSV?
- Andytcc, Feb 23, 2024, Copper Contributor: You mentioned the following in one of your posts.
Before everything else, could you revise/update the title of this discussion to better reflect the challenge (something like "Get Data from CSV Performance Issue" would make sense), please? (This helps people who search.) Thanks
How do I do that?
- Andytcc, Feb 23, 2024, Copper Contributor: I appreciate all your time and effort to help me, but you got me stumped with all this information. I'm lost.