Forum Discussion
HELP! How do I automate data cleaning from a continually updating source
- Jan 04, 2023
(SergeiBaklan Thanks for joining)
That is why I decided to manually download the csv file instead of using the IT teams Power BI dataset, as the source data is in a tabular table form, however I will have to pull the data weekly and drop it into an excel sheet and then use Power Query to automate the cleaning process
NO. Once you have a CSV you don't have to "drop it into an Excel sheet", that step is unecessary
From Excel, you go to Data (tab) > From Text/CSV
you then select your CSV (the Power Query Editor opens) and you begin your cleaning process
Once you have completed the cleaning process and output/returned the query result to an Excel sheet, next time you download a CSV you copy it (overwrite the previous CSV) in the same folder and Refresh your query
Thank you for your quick reply! I have been advised to connect to the data via a Power BI database within my organisation using a Pivot Table. The team who manage the database get it from the 3rd party source and do a basic clean of it before uploading it to the database. I also have the ability to download the raw data as a csv file from the third party source, however the data isn't in great shape and the data team clean it before uploading it to their Power BI database. They just don't have the product knowledge or time to go to the level of correcting the issues I have mentioned.
Unfortunately, I am unable to locate or connect to the database via Power Query, and I believe this is a common issue. I have considered work arounds such a loading the data into a Pivot Table as advised by the data team, and then using Power Query to transform the data to tabular form and edit it.
I am fairly new to Excel, so want to make sure I don't end up with long complex data chains if there is a more simple solution.
As variant you may ask them based on dataset to create report with visual which have raw information you need. Having access to such report you may export underlying data into csv or Excel file. Report could be in workspace or in Power BI app.
If to create paginated report it could be exported automatically with Power Automate.
But all of that requires additional work from your IT.
If connect to Power BI dataset cube perhaps that's better to convert PivotTable into cube formulae (or to generate them from scratch for that connection) and adjust them for the corrections. At least that will be refreshable.
If Power Query, formulae or PivotTable, when from another file to keep refreshing.
- LorenzoJan 05, 2023Silver Contributor
Hi SergeiBaklan
A bit off topic so only if you know/have time to check...
When you connect a PowerBI Dataset to Excel for further analysis, are the Tables and Measures "loaded" to the workbook Data Model?
If question not clear let me know. Thanks
- SergeiBaklanJan 05, 2023Diamond Contributor
Forgot to add, we may create implicit measure dragging table field into values if it is allowed for aggregation.
- SergeiBaklanJan 05, 2023Diamond Contributor
Hi Lorenzo
Connecting to Power BI dataset you are connecting to Analysis Services cube. Thus you see all not-hided measures and tables for this cube. It is not loaded to Excel data model, we have only connection. Thus the only we can do is to create PivotTable based on such cube, or use cube formulae to generate the report.
The only modification we could add that is MDX calculated measure or member. Having into account quite poor intellisense working with MDX in Excel I'd don't recommend this way if only you are not a MDX pro.
In brief, nothing is loaded into Excel data model, we are working with the connected cube.
- LorenzoJan 05, 2023Silver Contributor
Crystal clear SergeiBaklan THANKS MUCH!!!
- Lauren_MichelleJan 04, 2023Copper Contributor
I am pretty new to Excel so not 100% versed in everything you have mentioned, but what I can say is that our in houses data team have used the same raw data sent to them automatically from the third party to create a Power BI dataset. They have told me to access this using a Pivot Table in excel, however I am unsure how to pull the underlying data from this, and accessing the Power BI dataset (OLAP) via a Pivot table is very limiting to my ability to automate the data cleaning as I can't turn the data into tabular form (Pivot table parent group restrictions) etc.
That is why I decided to manually download the csv file instead of using the IT teams Power BI dataset, as the source data is in a tabular table form, however I will have to pull the data weekly and drop it into an excel sheet and then use Power Query to automate the cleaning process.
To be honest I am very new to excel so a bit lost 🙂- LorenzoJan 04, 2023Silver Contributor
(SergeiBaklan Thanks for joining)
That is why I decided to manually download the csv file instead of using the IT teams Power BI dataset, as the source data is in a tabular table form, however I will have to pull the data weekly and drop it into an excel sheet and then use Power Query to automate the cleaning process
NO. Once you have a CSV you don't have to "drop it into an Excel sheet", that step is unecessary
From Excel, you go to Data (tab) > From Text/CSV
you then select your CSV (the Power Query Editor opens) and you begin your cleaning process
Once you have completed the cleaning process and output/returned the query result to an Excel sheet, next time you download a CSV you copy it (overwrite the previous CSV) in the same folder and Refresh your query
- Lauren_MichelleJan 04, 2023Copper ContributorThank you that makes a lot of sense 🙂