Forum Discussion
HELP! How do I automate data cleaning from a continually updating source
- Jan 04, 2023
(SergeiBaklan Thanks for joining)
That is why I decided to manually download the csv file instead of using the IT teams Power BI dataset, as the source data is in a tabular table form, however I will have to pull the data weekly and drop it into an excel sheet and then use Power Query to automate the cleaning process
NO. Once you have a CSV you don't have to "drop it into an Excel sheet", that step is unecessary
From Excel, you go to Data (tab) > From Text/CSV
you then select your CSV (the Power Query Editor opens) and you begin your cleaning process
Once you have completed the cleaning process and output/returned the query result to an Excel sheet, next time you download a CSV you copy it (overwrite the previous CSV) in the same folder and Refresh your query
https://support.microsoft.com/en-us/office/about-power-query-in-excel-7104fbee-9e62-4cb9-a02e-5bfb1a6c536a could be what you're looking for. The key question is: can we connect Excel to your data source?
You said: the data is received through an OLAP server feed on a daily basis (I can download...
Q: Where are the data actually stored (i.e. a SQL Database) so you can dowload them?
Thank you for your quick reply! I have been advised to connect to the data via a Power BI database within my organisation using a Pivot Table. The team who manage the database get it from the 3rd party source and do a basic clean of it before uploading it to the database. I also have the ability to download the raw data as a csv file from the third party source, however the data isn't in great shape and the data team clean it before uploading it to their Power BI database. They just don't have the product knowledge or time to go to the level of correcting the issues I have mentioned.
Unfortunately, I am unable to locate or connect to the database via Power Query, and I believe this is a common issue. I have considered work arounds such a loading the data into a Pivot Table as advised by the data team, and then using Power Query to transform the data to tabular form and edit it.
I am fairly new to Excel, so want to make sure I don't end up with long complex data chains if there is a more simple solution.
- SergeiBaklanJan 04, 2023Diamond Contributor
As variant you may ask them based on dataset to create report with visual which have raw information you need. Having access to such report you may export underlying data into csv or Excel file. Report could be in workspace or in Power BI app.
If to create paginated report it could be exported automatically with Power Automate.
But all of that requires additional work from your IT.
If connect to Power BI dataset cube perhaps that's better to convert PivotTable into cube formulae (or to generate them from scratch for that connection) and adjust them for the corrections. At least that will be refreshable.
If Power Query, formulae or PivotTable, when from another file to keep refreshing.
- LorenzoJan 05, 2023Silver Contributor
Hi SergeiBaklan
A bit off topic so only if you know/have time to check...
When you connect a PowerBI Dataset to Excel for further analysis, are the Tables and Measures "loaded" to the workbook Data Model?
If question not clear let me know. Thanks
- SergeiBaklanJan 05, 2023Diamond Contributor
Forgot to add, we may create implicit measure dragging table field into values if it is allowed for aggregation.
- Lauren_MichelleJan 04, 2023Copper Contributor
I am pretty new to Excel so not 100% versed in everything you have mentioned, but what I can say is that our in houses data team have used the same raw data sent to them automatically from the third party to create a Power BI dataset. They have told me to access this using a Pivot Table in excel, however I am unsure how to pull the underlying data from this, and accessing the Power BI dataset (OLAP) via a Pivot table is very limiting to my ability to automate the data cleaning as I can't turn the data into tabular form (Pivot table parent group restrictions) etc.
That is why I decided to manually download the csv file instead of using the IT teams Power BI dataset, as the source data is in a tabular table form, however I will have to pull the data weekly and drop it into an excel sheet and then use Power Query to automate the cleaning process.
To be honest I am very new to excel so a bit lost 🙂- LorenzoJan 04, 2023Silver Contributor
(SergeiBaklan Thanks for joining)
That is why I decided to manually download the csv file instead of using the IT teams Power BI dataset, as the source data is in a tabular table form, however I will have to pull the data weekly and drop it into an excel sheet and then use Power Query to automate the cleaning process
NO. Once you have a CSV you don't have to "drop it into an Excel sheet", that step is unecessary
From Excel, you go to Data (tab) > From Text/CSV
you then select your CSV (the Power Query Editor opens) and you begin your cleaning process
Once you have completed the cleaning process and output/returned the query result to an Excel sheet, next time you download a CSV you copy it (overwrite the previous CSV) in the same folder and Refresh your query
- LorenzoJan 04, 2023Silver Contributor
If your data source is actually a PowerBI Dataset not sure I can help as I have a limited understanding (mainly due to the fact that I don't have access to something like that) of what it is and how it can be used within Excel. So, instead of saying something wrong I prefer to stop there
However, reading https://learn.microsoft.com/en-us/power-bi/connect-data/service-datasets-understand I'm under the impression your Data Team could provide you something easier to consume (i.e. an Excel workbook with Table(s)) with Get & Transform than a PivotTable
Despite its name a PivoTable isn't a Table. Connecting Power Query to the worksheet containing a PivotTable is doable but I would take this approach as the (almost) last option. If your Data Team can provide you an Excel workbook with Table(s) or a CSV file that would be better