Feb 20 2024 12:37 PM
I'm not sure if I'm wording this right. and I'm also new to the following.
I am using the "get data from the web" in Excel. I am receiving the data no problem. but as the file gets larger it takes too long to retrieve the data. is there a way to somehow delete the data every time is retrieved or what can I do so it doesn't take so long to retrieve the data.
Feb 20 2024 12:56 PM
What's the nature of the data you're retrieving? What's the source? Does the source itself have a way to limit based, say, on start and end dates?
Feb 20 2024 01:06 PM
Feb 20 2024 01:53 PM
So maybe the answer to your question is present in that methodology. Can you not just delete what's on the SD card after each time that you bring it in and convert from CSV to XLSX? Then just append the new data to what you already have in Excel?
It's a question: I'm not speaking from experience in bringing data in that way. I routinely bring in data from the internet, but via a different set of steps (specifically, I export from a financial services organization, then import that "up-to-date" data on investments).
Feb 20 2024 02:32 PM
Feb 20 2024 04:51 PM
I've thought of it, but I don't know how I would delete the data without doing it manually.
If there was a way to delete the data automatically once retrieved, I've got it made.
And I can't help you with an automated method. Given your hourly volume I can see why that would be useful. Aren't there methods with Apple's and IBM-compatible systems to automate recurring steps like that? [Let's hope somebody else with experience in that area can join in with that advice.]
What benefit would I have converting to XLSX? Doesn't the format have to be CSV for "Get Data"? As for the PLC, it will only write to CSV.
I was assuming that when you bring the CSV data into Excel, you are converting to Excel format. Is that not the case? Anyway, it was the post-import stage where I was assuming a conversion.
The other thing, I'm only the designer not the end user, even though I'll still have access.
That definitely complicates things. You'll want the process to be fool proof.
Just to explain a bit deeper how I manage retrieved Data:
Since the CSV file is "New data on bottom" I use "Reverse Rows" and "Keep last Rows". I was hoping "Keep last Rows" would solve the issue not realizing "all" the Data is retrieved every time.
Again, not knowing what kind of data this is, not knowing the source....I can only make a guess: I would hope that there'd be a way--especially if you can delete the data from that SD card--to only get the data that are fresh since the last update.
From there I use the table for all my graphs and formulas and so on. Once done and working as planned I'll have a pretty unique and sophisticated yet user friendly program.
I wish you well.
Feb 20 2024 11:13 PM - edited Feb 21 2024 12:08 AM
Hi @Andytcc
I am using the "get data from the web", later you say I'm actually getting the data from a csv file on an SD card in a PLC. Quite confusing 🙂
Q1: So CSV only, correct?
Q2: Does PLC mean 'Programmable Logic Controller? (not necessarily important but curious)
Excel / Get Data from... = Power Query
When, within Excel (or Power BI), you Get Data from Web, CSV, Excel, SQL, PDF, JSON... there's what's called a Connector (a piece of software) between the Power Query engine and the Data Source. As you understand the in used Connector depends on the targeted Data Source (CSV in your case)
Wonder why I talk about this? There's someone else to be aware of re. query exec./refresh. Things happen in 2 steps: Query optimization, then actual query execution. I encourage you to read
To my knowledge (I can't find out articles/forum article mentioning this) your CSV is accessed twice (is it read 100% each time? I don't know):
It might be read other times depending on what your query does (cf. Chris Blog Post)
(Involving another XLS/XLSX won't help at all. Reading a CSV is faster than reading an Excel file - probably due to the complexity of Excel file format/structure)
As for the PLC, it will only write to CSV...every hour the PLC writes 201 values in the next available row in the csv file
Q3: As I understand there would be no way/chance to Filter data at the Source (PLC), correct?
Since the CSV file is "New data on bottom" I use "Reverse Rows" and "Keep last Rows"
Q4: Not sure I understand the logic here. Shouldn't this be "Reverse Rows" and "Keep first Rows"?
Not sure (to be tested and with Power Query expect 'surprises') but instinctively I would do - only - Keep Bottom Rows (201) instead
If there was a way to delete the data automatically once retrieved
No way with Power Query. Power Query gets data (reads only) so it won't write in any case on your SD nor anything else
Tiny optimizations you can implement right away in your context (don't expect significant improvements though). In Query Options CURRENT WORKBOOK:
- Data Load: Uncheck 'Allow data previews to download in the background'
- Privacy: Check 'Ignore the Privacy Levels and potentially improve performance'
EDIT (forgot)
No idea what the size of your CSV is (#rows ??) nor if this would be acceptable for the end user... Still in Query Options GLOBAL
- Data Load: Check 'Fast Data Load'
=> How Fast is Fast Data Load in Power Query?
Other things you should consider:
- If the CSV represents say 10 columns and for your calc. & graph you only need 3, get rid of the other 7 columns as early as possible in your query
- When you get data from CSV, initially columns are typed Text in Power Query Editor. Later in the APPLIED STEPS you probably have a Changed Type step, before you filter (keep top/bottom rows). The Change Type step (assuming you need Typed data for your calc. & graph) should be moved after (or even at the end of the query)
If you post your complete query code I can check if there's something else you could do. But don't expect miracles if there's no way to filter/limit the amount of data before the CSV is generated
Feb 20 2024 11:57 PM
Bit more information is here Chris Webb's BI Blog: Comparing The Performance Of Reading Data From Files With File.Contents And We...
Feb 21 2024 12:40 AM
Thanks. I didn't mention that one as Curt (in Chris post) said they will likely only support http/https moving forward (5 years later file: - at least - is still supported). If @Andytcc runs a version of Excel that still receive updates and they finally make that change to the Web connector...
Feb 21 2024 12:45 AM
Quick check with CSV of 1M rows x 10 columns:
Reverse Rows + Keep Top 201 Rows is slower than Keep Bottom 201 Rows here
Feb 21 2024 01:11 AM
I might have misunderstood you. Re-reading, @Andytcc said I'm actually getting the data from a csv file on an SD card in a PLC. I'm doing it with an IP address and the name of the file
So, where you trying to say he actually uses Web.Contents (probably) and not File.Contents?
(that would explain the confusion re. I am using the "get data from the web")
Feb 21 2024 06:32 AM
With @Lorenzo and @SergeiBaklan having joined in, you're now in good hands, far more competent hands than mine.
Feb 21 2024 07:33 AM
@mathetes You're too humble and I'm not that competent, don't set the bar too high 🙂
Feb 21 2024 08:26 AM
Perhaps that's me was confused with "get data from the web". Yes, assumed web connector.
Anyway, the task is not clear enough, at least for me. If refresh is once per week that's one story, if every hour that's another. Do we do refresh manually (e.g. on file opening) or Power Automate triggered on csv file update could work. Could we modify csv file update not to keep entire history in it (or keep it separately). Etc.
Feb 21 2024 08:28 AM
Feb 21 2024 08:37 AM
Well, thank you both. I guess we have different competencies...but I do consider my expertise to be more in the region of database design, not in data retrieval with all of its (literally) ins and outs.
Feb 21 2024 08:46 AM
Perhaps that's me was confused with "get data from the web". Yes, assumed web connector
And you're probably right as after re-reading (I overlooked on 1st reading): ...with an IP address
Anyway, the task is not clear enough, at least for me
Same here. Let's see how it develops...
Feb 22 2024 07:40 AM
Feb 22 2024 08:15 AM
Feb 22 2024 08:34 AM