SOLVED

Filtering poorly formatted source data...

Occasional Contributor

Hi,

I have a weirdly arranged data set that I am importing into Excel via Power Query.

My source has the structure I have indicated in the attached example (although the example is greatly simplified), i.e. the columns are divided by sub-headers (highlighted in the example), and the values that I am interested in – the Regional data – repeat multiple times in each subsection.

 

What I need are Subsection:Region:Value associations, i.e.

1A Region1 : Value1

1A Region2 : Value2

1A : Region3 : Value3

1B Region1 : Value4

1B Region2 : Value5

etc.

(They don't have to be represented like that neccessarily, but that's the data structure I need).

Note that the subsections (1A, 1B, 1C) are not necessarily the same length: some subsections might be missing data on one or more sites, meaning that the row count for each section is unpredictable… (so, I can’t just use Group by number of rows, in the Power Query import, for instance). This could also change between imports, so any solution has to be able to adapt to variable row counts.

 

The way I initially did it was to remove the 2 Site columns altogether in the Power Query import, leaving just Region and RegData, and then just call UNIQUE() on the resulting import table, but that only works if no region has the same value in 2 different subsections, which could happen, and would break downstream calculations if it did (because my UNIQUE() function would return a range missing a row). It would also be a problem if, by chance, one subsection had no values for a particular Region. I need the solution to return a null value in that case (e.g. 1B : Region2 : NULL), or in other words, the solution has to return a 3x3 range, even if some values are empty, and/or some values are the same...

 

I hope that makes sense.

 

Many thanks for any suggestions.

 

6 Replies
best response confirmed by ColinJHarrison (Occasional Contributor)
Solution

@ColinJHarrison Perhaps something like in the attached file?

Screenshot 2022-11-11 at 08.44.03.png

 

 

@Riny_van_Eekelen as an output? Yes.

Are there formulas in your attachment? I'm looking at it on my phone, and I can't see any formulas. 

@ColinJHarrison No formulas. Just Power Query.

@Riny_van_Eekelen ah ok. Don't think I can open that on my phone! I'll check it out when I'm home. Thanks Riny!

@Riny_van_Eekelen 

Hi Riny. I think your solution has shown me the way to solve my problem, so thank you very much! (Again!)

One question however - do you write your M code directly, or is everything you've put in this solution achievable using the input wizards provided by Power Query? How for instance did you build your custom column call:

= Table.AddColumn(#"Removed columns", "Subsection", each if Text.Contains([Column1], "Region -") then [Column1] else null)

 

At this stage I would have no idea how to hand code someting like that, and I'm just wondering how much of this kind of info I need to be on top of to come up with solutions like this myself...

 

Cheers

@ColinJHarrison You can get quite far in PQ by only clicking on the correct buttons and fill in the options that pop up. But, at some point you need to learn some M-coding in order to make more complex transformation. Not difficult, but it takes time to learn.

 

The line of code you refer to isn't really that special. On the Add Column ribbon you click Custom column.

In the window that opens, you type the name you want to give to the new column in the designated space and then the formula:

if Text.Contains([Column1], "Region -") then [Column1] else null

in the box below. And OK.

PQ then comes up with the rest of the code. I.e. using the function Table.AddColumn, the previous step name, the new column name, the 'each' bit and the closing bracket.

 

By reviewing the whole code, however, you'll soon figure out how it works and how to manipulate it. And then you'll find plenty of resources on line with practical examples of the various M-functions.