Enforcing Data Limits/Requirements in Power Query

Hi - I'm new to using Power Pivot/Power Query, but I have some files that are too large for standard Excel and I don't feel like diving straight into Access, so Power Query it is. With that out of the way -

I'm trying to create a query that I can use repeatedly to load/transform data, identify 'errors' that need to be fixed, and then save the final data as a .CSV. Ideally I would have a set of headers that the final data needs to use, and then map whatever source file I have to those column headers. I'm fine just renaming the headers in the source data to match the final headers, but what I would really like is a way to define data restrictions or requirements for the columns I'm using and then have Power Query find the values that don't meet them.

E.g. let's say one of the columns is always going to be "Service_Code_ID" and it has a limit of 50 characters. How can I write a repeatable query that finds all values in the Service_Code_ID column that exceed 50 characters and displays them so that I can update them appropriately? There are probably 12-15 columns total that I'm going to be working with, and each one has its own set of characteristics that I want to check, flagging anything that doesn't qualify.

My initial thought was to write some kind of formula for each column and then use conditional formatting to highlight the problem values. At that point I could either filter on highlights or maybe query just the rows that contain a highlighted value. But as complicated as this seems like it will be no matter what, conditional formatting seems more complicated and less easily repeatable.

Any ideas?

2 Replies

@MicrosoftMacros2011 In Power Query (PQ) you can select a column. Then, on the "Add Column" tab, choose Extract, Length. This will create a new column with the lengths of the texts in the original column. Now you can easily filter that new column to show only the rows where the length exceeds 50.
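
For reference, the steps above correspond roughly to the M below. This is only a minimal sketch: the inline sample table and the step names (Source, AddedLength, TooLong) are made up for illustration, and in a real query Source would be whatever step loads your file.

```powerquery
let
    // Hypothetical sample data standing in for the real source step
    Source = #table(
        type table [Service_Code_ID = text],
        {
            {"ABC-001"},
            {Text.Repeat("X", 51)}  // 51 characters, so over the limit
        }
    ),
    // Helper column holding the length of each Service_Code_ID value
    AddedLength = Table.AddColumn(
        Source, "Length", each Text.Length([Service_Code_ID]), Int64.Type
    ),
    // Keep only rows longer than 50 characters; the null check skips blank cells
    TooLong = Table.SelectRows(
        AddedLength, each [Length] <> null and [Length] > 50
    )
in
    TooLong
```

With this sample data, only the second row should remain after the filter.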

You don't mention what the other columns' characteristics are, so I can't say how to check those, but very likely you can do it in a similar way. Perhaps you will have to learn to write some M code as well. Once you have set this up correctly, you can connect to new data and refresh the query (or queries) over and over again.

Hmm, sounds like looking into some M code might be best eventually, but it also sounds like Power Query doesn't have something for this out of the box. An additional column with the LEN function is what I would do in Excel, but I usually only do that if I'm concerned something exceeds a length, and I wouldn't want to add an extra column per actual column. Especially not if I eventually needed one column per "characteristic." I don't have all of the requirements in front of me, but it's your standard gamut of restrictions: specific lengths, some alphanumeric, some numeric only, one column where each value has to match one of only 4 possible text options, etc.
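
One way to avoid a helper column per rule might be to keep the rules in a list and write every failure into a single "Issues" column. The sketch below is only an illustration under assumptions: the Account_No and Status columns, the four status values, the rule messages, and the helper names (OnlyChars, Rules, RowIssues) are all invented, since the actual requirements aren't listed here.

```powerquery
let
    // Hypothetical sample data; only Service_Code_ID comes from the question
    Source = #table(
        type table [Service_Code_ID = text, Account_No = text, Status = text],
        {
            {"ABC-001", "12345", "Open"},
            {Text.Repeat("X", 51), "12A45", "Unknown"}
        }
    ),
    // True when every character of a non-null text value is in the allowed list
    OnlyChars = (value as nullable text, allowed as list) as logical =>
        value <> null and Text.Length(Text.Remove(value, allowed)) = 0,
    Digits = {"0".."9"},
    // One record per rule: column to test, message, and a test returning true when valid
    Rules = {
        [Column = "Service_Code_ID", Message = "longer than 50 characters",
         IsValid = (v) => v = null or Text.Length(v) <= 50],
        [Column = "Account_No", Message = "not numeric",
         IsValid = (v) => OnlyChars(v, Digits)],
        [Column = "Status", Message = "not an allowed value",
         IsValid = (v) => List.Contains({"Open", "Closed", "Pending", "Void"}, v)]
    },
    // For one row, build "Column: message" text for every rule that fails
    RowIssues = (row as record) as text =>
        Text.Combine(
            List.Transform(
                List.Select(Rules, (r) => not r[IsValid](Record.Field(row, r[Column]))),
                (r) => r[Column] & ": " & r[Message]
            ),
            "; "
        ),
    AddedIssues = Table.AddColumn(Source, "Issues", each RowIssues(_), type text),
    // Keep only the rows that failed at least one rule
    Flagged = Table.SelectRows(AddedIssues, each [Issues] <> "")
in
    Flagged
```

With this sample data, the first row passes every rule and the second row should be flagged for all three. An alphanumeric rule could reuse OnlyChars with a wider list such as {"0".."9", "A".."Z", "a".."z"}. Adding a check for another column then means adding one record to Rules rather than another helper column.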