Forum Discussion
Get Data
I'm not sure if I'm wording this right. and I'm also new to the following.
I am using the "get data from the web" in Excel. I am receiving the data no problem. but as the file gets larger it takes too long to retrieve the data. is there a way to somehow delete the data every time is retrieved or what can I do so it doesn't take so long to retrieve the data.
37 Replies
- LorenzoSilver Contributor
Quick check with CSV of 1M rows x 10 columns:
Reverse Rows + Keep Top 201 Rows is slower than Keep Bottom 201 Rows here
- LorenzoSilver Contributor
Hi Andytcc
I am using the "get data from the web", later you say I'm actually getting the data from a csv file on an SD card in a PLC. Quite confusing 🙂
Q1: So CSV only, correct?
Q2: Does PLC mean 'Programmable Logic Controller? (not necessarily important but curious)
Excel / Get Data from... = Power Query
When, within Excel (or Power BI), you Get Data from Web, CSV, Excel, SQL, PDF, JSON... there's what's called a Connector (a piece of software) between the Power Query engine and the Data Source. As you understand the in used Connector depends on the targeted Data Source (CSV in your case)
Wonder why I talk about this? There's someone else to be aware of re. query exec./refresh. Things happen in 2 steps: Query optimization, then actual query execution. I encourage you to read- Overview of query evaluation and query folding in Power Query
- Why does my query run multiple times?
- Why Does Power BI Query My Data Source More Than Once?
To my knowledge (I can't find out articles/forum article mentioning this) your CSV is accessed twice (is it read 100% each time? I don't know):
- During Query optimization
- During Query actual execution
It might be read other times depending on what your query does (cf. Chris Blog Post)
(Involving another XLS/XLSX won't help at all. Reading a CSV is faster than reading an Excel file - probably due to the complexity of Excel file format/structure)
As for the PLC, it will only write to CSV...every hour the PLC writes 201 values in the next available row in the csv file
Q3: As I understand there would be no way/chance to Filter data at the Source (PLC), correct?Since the CSV file is "New data on bottom" I use "Reverse Rows" and "Keep last Rows"
Q4: Not sure I understand the logic here. Shouldn't this be "Reverse Rows" and "Keep first Rows"?Not sure (to be tested and with Power Query expect 'surprises') but instinctively I would do - only - Keep Bottom Rows (201) instead
If there was a way to delete the data automatically once retrieved
No way with Power Query. Power Query gets data (reads only) so it won't write in any case on your SD nor anything elseTiny optimizations you can implement right away in your context (don't expect significant improvements though). In Query Options CURRENT WORKBOOK:
- Data Load: Uncheck 'Allow data previews to download in the background'
- Privacy: Check 'Ignore the Privacy Levels and potentially improve performance'EDIT (forgot)
No idea what the size of your CSV is (#rows ??) nor if this would be acceptable for the end user... Still in Query Options GLOBAL
- Data Load: Check 'Fast Data Load'
=> How Fast is Fast Data Load in Power Query?
Other things you should consider:
- If the CSV represents say 10 columns and for your calc. & graph you only need 3, get rid of the other 7 columns as early as possible in your query
- When you get data from CSV, initially columns are typed Text in Power Query Editor. Later in the APPLIED STEPS you probably have a Changed Type step, before you filter (keep top/bottom rows). The Change Type step (assuming you need Typed data for your calc. & graph) should be moved after (or even at the end of the query)If you post your complete query code I can check if there's something else you could do. But don't expect miracles if there's no way to filter/limit the amount of data before the CSV is generated
- LorenzoSilver Contributor
I might have misunderstood you. Re-reading, Andytcc said I'm actually getting the data from a csv file on an SD card in a PLC. I'm doing it with an IP address and the name of the file
So, where you trying to say he actually uses Web.Contents (probably) and not File.Contents?
(that would explain the confusion re. I am using the "get data from the web")
- mathetesSilver Contributor
What's the nature of the data you're retrieving? What's the source? Does the source itself have a way to limit based, say, on start and end dates?
- AndytccCopper ContributorI'm actually getting the data from a csv file on an SD card in a PLC. I'm doing it with an IP address and the name of the file. every hour the PLC writes 201 values in the next available row in the csv file. and i get the data from the csv file and do graphs and whatnot to my liking.
- mathetesSilver Contributor
So maybe the answer to your question is present in that methodology. Can you not just delete what's on the SD card after each time that you bring it in and convert from CSV to XLSX? Then just append the new data to what you already have in Excel?
It's a question: I'm not speaking from experience in bringing data in that way. I routinely bring in data from the internet, but via a different set of steps (specifically, I export from a financial services organization, then import that "up-to-date" data on investments).