Forum Discussion
HELP! How do I automate data cleaning from a continually updating source
- Jan 04, 2023
(SergeiBaklan Thanks for joining)
That is why I decided to manually download the csv file instead of using the IT teams Power BI dataset, as the source data is in a tabular table form, however I will have to pull the data weekly and drop it into an excel sheet and then use Power Query to automate the cleaning process
NO. Once you have a CSV you don't have to "drop it into an Excel sheet", that step is unecessary
From Excel, you go to Data (tab) > From Text/CSV
you then select your CSV (the Power Query Editor opens) and you begin your cleaning process
Once you have completed the cleaning process and output/returned the query result to an Excel sheet, next time you download a CSV you copy it (overwrite the previous CSV) in the same folder and Refresh your query
Thank you for your quick reply! I have been advised to connect to the data via a Power BI database within my organisation using a Pivot Table. The team who manage the database get it from the 3rd party source and do a basic clean of it before uploading it to the database. I also have the ability to download the raw data as a csv file from the third party source, however the data isn't in great shape and the data team clean it before uploading it to their Power BI database. They just don't have the product knowledge or time to go to the level of correcting the issues I have mentioned.
Unfortunately, I am unable to locate or connect to the database via Power Query, and I believe this is a common issue. I have considered work arounds such a loading the data into a Pivot Table as advised by the data team, and then using Power Query to transform the data to tabular form and edit it.
I am fairly new to Excel, so want to make sure I don't end up with long complex data chains if there is a more simple solution.
As variant you may ask them based on dataset to create report with visual which have raw information you need. Having access to such report you may export underlying data into csv or Excel file. Report could be in workspace or in Power BI app.
If to create paginated report it could be exported automatically with Power Automate.
But all of that requires additional work from your IT.
If connect to Power BI dataset cube perhaps that's better to convert PivotTable into cube formulae (or to generate them from scratch for that connection) and adjust them for the corrections. At least that will be refreshable.
If Power Query, formulae or PivotTable, when from another file to keep refreshing.