Forum Discussion

azimer
Copper Contributor
Mar 07, 2026

Cleaning messy Excel/CSV from an ERP system

Hi,

I’m curious how people here usually deal with messy exports coming from ERP or accounting systems.

I often receive Excel or CSV files that are not really ready to use as a table. Things like:

 

- a few rows of system info before the header
- totals or notes mixed inside the data
- different date formats in the same column
- currency symbols and formats all over the place
- sometimes even another small table sitting under the first one

Before doing any analysis (Excel, Power BI, ...), I usually have to spend a bit of time cleaning the file so it becomes a normal, structured table.

Power Query definitely helps, but depending on the export I sometimes still end up doing several cleanup steps first.
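For comparison, the same cleanup steps can also be scripted outside Excel. A minimal pandas sketch, where the sample file content, column names, and markers are all invented for illustration:

```python
import io
import pandas as pd

# Synthetic stand-in for the kind of export described above:
# system info before the header, a totals row inside the data,
# mixed date formats, and currency symbols in a numeric column.
raw = io.StringIO(
    "Exported by ERP v4.2\n"
    "Company: Example GmbH\n"
    "Date,Description,Amount\n"
    "2026-03-01,Invoice 1001,\"$1,200.50\"\n"
    "03/02/2026,Invoice 1002,EUR 300\n"
    "TOTAL,,\"$1,500.50\"\n"
)

# 1. Skip the system-info rows above the real header.
df = pd.read_csv(raw, skiprows=2)

# 2. Drop totals/notes mixed into the data.
df = df[~df["Date"].str.contains("TOTAL", na=False)]

# 3. Parse mixed date formats (per-value inference: slower but tolerant).
df["Date"] = df["Date"].apply(pd.to_datetime)

# 4. Strip currency symbols/codes and thousands separators.
df["Amount"] = (
    df["Amount"]
    .str.replace(r"[^\d.\-]", "", regex=True)
    .astype(float)
)
```

This covers four of the five symptoms; a second table sitting under the first would still need a split step like the footer-detection trick discussed further down the thread.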

 

I’m curious what the usual workflow is for people here. Do you normally build a reusable Power Query transformation for these reports, or do you handle each export differently?

I recently documented the cleanup steps for one messy export example, if anyone is curious about the kind of file I mean:

https://www.mubsiraanalytics.com/blog/how-to-extract-clean-tables-from-messy-excel-csv

Mostly just interested to see how others approach this.

1 Reply

  • mathetes
    Gold Contributor

    Some interesting questions. I feel your pain. I regularly import financial data from

    • a credit card account with American Express
    • investment accounts with Fidelity

    The former, when I use their XLSX format, is utterly frustrating, because whoever designed their Excel version chose to make it a "pretty" spreadsheet: merged cells, colors, multiple lines for things like the address. As a result, the CSV file is far more useful, but I would love to see them make the Excel version usable as an Excel table.

    The latter does allow me to export a good Excel table of data, but at the top they have a huge merged cell, followed by a blank row, then the headers, then the data, beginning at row 4. Then at the bottom of the useful table there are two blank rows followed by "Disclosure" and a paragraph of legalese. To import only the data in the table, I've written:

    • a formula that finds the row in which the word "Disclosure" appears in that raw export file
    • another formula that imports only the table of data from that raw export file. My working workbook uses a dynamic array function to import the data; since I always want to ignore the first three rows, I set the import function to pull $A$4 through the final column ($T) and the row determined by the prior formula. In other words, I'm able to ignore the header and footer that Fidelity is legally obligated to clutter their file with.
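    Not the actual formulas, but for anyone working outside Excel, the same find-the-footer trick can be sketched in pandas. The file layout and column names below are invented to mimic the description above:

```python
import io
import pandas as pd

# Invented stand-in for the raw export: a banner line (a merged cell
# in the real file), a blank row, headers at row 3, data from row 4,
# then blank rows and a legal footer.
raw_lines = [
    "Fidelity Account Export",
    "",
    "Date,Symbol,Amount",
    "2026-03-01,AAPL,100",
    "2026-03-02,MSFT,200",
    "",
    "",
    "Disclosure",
    "This paragraph of legalese is not data.",
]

# Find the row where the footer starts -- the same job as the
# "find the word Disclosure" formula described above.
footer_row = next(i for i, line in enumerate(raw_lines)
                  if line.startswith("Disclosure"))

# Keep only the header and the data rows above the footer, mirroring
# "import from row 4 through the row before 'Disclosure'".
table_text = "\n".join(raw_lines[2:footer_row])
df = pd.read_csv(io.StringIO(table_text)).dropna(how="all")
```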

    Other than that, I often have to convert text dates to Excel dates, or text representations of dollar values into actual numeric fields.
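    Those last text-to-typed conversions can likewise be scripted; a small pandas sketch with invented sample values:

```python
import pandas as pd

# Invented examples of text dates and accounting-style dollar text.
s_dates = pd.Series(["March 1, 2026", "2026-03-02"])
s_money = pd.Series(["$1,234.56", "($78.90)"])  # parentheses = negative

# Parse each date string individually (tolerates mixed formats).
dates = s_dates.apply(pd.to_datetime)

# Strip "$" and thousands separators, then turn "(x)" into "-x".
money = (
    s_money.str.replace(r"[$,]", "", regex=True)
           .str.replace(r"^\((.*)\)$", r"-\1", regex=True)
           .astype(float)
)
```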