Forum Discussion

ColinJHarrison's avatar
ColinJHarrison
Brass Contributor
Nov 11, 2022
Solved

Filtering poorly formatted source data...

Hi,

I have a weirdly arranged data set that I am importing into Excel via Power Query.

My source has the structure I have indicated in the attached example (although the example is greatly simplified), i.e. the columns are divided by sub-headers (highlighted in the example), and the values that I am interested in – the Regional data – repeat multiple times in each subsection.

 

What I need are Subsection:Region:Value associations, i.e.

1A Region1 : Value1

1A Region2 : Value2

1A : Region3 : Value3

1B Region1 : Value4

1B Region2 : Value5

etc.

(They don't have to be represented like that neccessarily, but that's the data structure I need).

Note that the subsections (1A, 1B, 1C) are not necessarily the same length: some subsections might be missing data on one or more sites, meaning that the row count for each section is unpredictable… (so, I can’t just use Group by number of rows, in the Power Query import, for instance). This could also change between imports, so any solution has to be able to adapt to variable row counts.

 

The way I initially did it was to remove the 2 Site columns altogether in the Power Query import, leaving just Region and RegData, and then just call UNIQUE() on the resulting import table, but that only works if no region has the same value in 2 different subsections, which could happen, and would break downstream calculations if it did (because my UNIQUE() function would return a range missing a row). It would also be a problem if, by chance, one subsection had no values for a particular Region. I need the solution to return a null value in that case (e.g. 1B : Region2 : NULL), or in other words, the solution has to return a 3x3 range, even if some values are empty, and/or some values are the same...

 

I hope that makes sense.

 

Many thanks for any suggestions.

 

6 Replies

Resources