Forum Discussion
Andytcc
Feb 20, 2024 · Copper Contributor
Get Data
I'm not sure if I'm wording this right, and I'm also new to the following. I am using the "get data from the web" in Excel. I am receiving the data no problem, but as the file gets larger it takes t...
Lorenzo
Feb 21, 2024 · Silver Contributor
Hi Andytcc
First you say "I am using the 'get data from the web'", later you say "I'm actually getting the data from a csv file on an SD card in a PLC". Quite confusing 🙂
Q1: So CSV only, correct?
Q2: Does PLC mean 'Programmable Logic Controller'? (not necessarily important, but I'm curious)
Excel / Get Data from... = Power Query
When you Get Data within Excel (or Power BI) from Web, CSV, Excel, SQL, PDF, JSON..., there's what's called a Connector (a piece of software) between the Power Query engine and the Data Source. As you can guess, the Connector used depends on the targeted Data Source (CSV in your case).
Why do I mention this? There's something else to be aware of regarding query execution/refresh. Things happen in 2 steps: query optimization, then actual query execution. I encourage you to read:
- Overview of query evaluation and query folding in Power Query
- Why does my query run multiple times?
- Why Does Power BI Query My Data Source More Than Once?
To my knowledge (I couldn't find any article or forum post confirming this), your CSV is accessed twice (is it read in full each time? I don't know):
- During Query optimization
- During Query actual execution
It might be read more times depending on what your query does (cf. Chris's blog post above).
(Involving another XLS/XLSX file won't help at all. Reading a CSV is faster than reading an Excel file, probably due to the complexity of the Excel file format/structure.)
As for the PLC, it will only write to CSV...every hour the PLC writes 201 values in the next available row in the csv file
Q3: As I understand it, there's no way to filter data at the source (PLC), correct?
Since the CSV file is "New data on bottom" I use "Reverse Rows" and "Keep last Rows"
Q4: Not sure I understand the logic here. Shouldn't this be "Reverse Rows" and "Keep first Rows"?
Not sure (to be tested, and with Power Query expect 'surprises'), but instinctively I would only do Keep Bottom Rows (201) instead.
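To make Q4 concrete, here is a tiny illustration in Python (not Power Query M) of what each combination keeps, assuming the rows are ordered oldest first ("new data on bottom"). The function names are made up for illustration and a small n is used instead of 201.

```python
# Illustration only: a plain Python list stands in for the CSV rows,
# ordered oldest first ("new data on bottom"). This is NOT Power Query code.

def reverse_then_keep_first(rows, n):
    """Reverse Rows, then keep the top n: newest n rows, newest first."""
    return list(reversed(rows))[:n]

def reverse_then_keep_last(rows, n):
    """Reverse Rows, then keep the bottom n: ends up with the OLDEST n rows."""
    reversed_rows = list(reversed(rows))
    return reversed_rows[-n:] if n > 0 else []

def keep_bottom(rows, n):
    """Keep Bottom Rows only: newest n rows, still in oldest-first order."""
    return rows[-n:] if n > 0 else []

if __name__ == "__main__":
    rows = list(range(1, 11))  # hypothetical rows 1..10, row 10 is the newest
    print(reverse_then_keep_first(rows, 3))  # [10, 9, 8]
    print(reverse_then_keep_last(rows, 3))   # [3, 2, 1]
    print(keep_bottom(rows, 3))              # [8, 9, 10]
```

So "Reverse Rows" + "Keep last Rows" would keep the oldest rows, while "Keep Bottom Rows" alone gets the newest ones in a single step (just not in reversed order).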
If there was a way to delete the data automatically once retrieved
No way with Power Query. Power Query only reads data, so it won't write to your SD card or anything else.
Tiny optimizations you can implement right away in your context (don't expect significant improvements, though). In Query Options, CURRENT WORKBOOK:
- Data Load: Uncheck 'Allow data previews to download in the background'
- Privacy: Check 'Ignore the Privacy Levels and potentially improve performance'
EDIT (forgot)
I have no idea what the size of your CSV is (# rows?), nor whether this would be acceptable for the end user... Still in Query Options, GLOBAL:
- Data Load: Check 'Fast Data Load'
=> How Fast is Fast Data Load in Power Query?
Other things you should consider:
- If the CSV has, say, 10 columns and you only need 3 for your calc. & graph, get rid of the other 7 columns as early as possible in your query
- When you get data from a CSV, columns initially come in as Text in the Power Query Editor. Your APPLIED STEPS probably include a Changed Type step before you filter (keep top/bottom rows). That Changed Type step (assuming you need typed data for your calc. & graph) should be moved after the filter, or even to the end of the query
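As a rough analogy of the step order suggested above, here is a Python sketch (not M, and not how the Power Query engine evaluates a query internally). It assumes a hypothetical CSV with a header row and numeric values; the helper name and column names are invented for illustration. Unneeded columns are dropped first, then the bottom rows are kept, and only then are the remaining text values converted to numbers.

```python
import csv
import io

def last_rows_typed(csv_text, wanted_columns, n):
    """Hypothetical helper: remove other columns, keep the bottom n rows,
    and only then change type (text -> float) on what is left."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # 1) Remove unneeded columns as early as possible (values are still text).
    projected = [{col: row[col] for col in wanted_columns} for row in reader]
    # 2) Filter: keep only the bottom n rows (the newest data).
    kept = projected[-n:] if n > 0 else []
    # 3) Change type last, so the conversion only runs on the rows we kept.
    return [{col: float(row[col]) for col in wanted_columns} for row in kept]

if __name__ == "__main__":
    sample = "Time,Temp,Pressure,Flow\n1,20.5,1.0,3\n2,21.0,1.1,4\n3,21.5,1.2,5\n"
    print(last_rows_typed(sample, ["Temp", "Flow"], 2))
```

Whether reordering the steps saves noticeable time in Power Query has to be tested on your actual file.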
If you post your complete query code, I can check whether there's something else you could do. But don't expect miracles if there's no way to filter/limit the amount of data before the CSV is generated.
- SergeiBaklan · Feb 21, 2024 · MVP
A bit more information is here: Chris Webb's BI Blog: Comparing The Performance Of Reading Data From Files With File.Contents And Web.Contents In Power Query And Power BI (crossjoin.co.uk)
- Lorenzo · Feb 21, 2024 · Silver Contributor
I might have misunderstood you. Re-reading, Andytcc said "I'm actually getting the data from a csv file on an SD card in a PLC. I'm doing it with an IP address and the name of the file".
So, were you trying to say he actually uses Web.Contents (probably) and not File.Contents?
(that would explain the confusion re. I am using the "get data from the web")
- SergeiBaklan · Feb 21, 2024 · MVP
Perhaps I was the one confused by "get data from the web". Yes, I assumed the web connector.
Anyway, the task isn't clear enough, at least to me. If the refresh happens once a week, that's one story; if every hour, that's another. Do we refresh manually (e.g. on file opening), or could a Power Automate flow triggered on CSV file update work? Could we change how the CSV is updated so it doesn't keep the entire history (or keeps it separately)? Etc.
- Lorenzo · Feb 21, 2024 · Silver Contributor
Thanks. I didn't mention that one because Curt (in Chris's post) said they would likely only support http/https moving forward (5 years later, file: at least is still supported). If Andytcc runs a version of Excel that still receives updates and they finally make that change to the Web connector...