SOLVED

Combine data from multiple rows into a single row

Copper Contributor

My apologies for the long post... I've been wrestling with this for awhile and am at wit's end...

 

I have a table of data that contains the results of chemical analyses of water samples. There is a column with a single unique ID for each sample, and over 90 columns with the various chemical parameters (i.e., variables) and the associated data. The problem is that the data are scattered across multiple rows for each sample/analysis. Here's an example: 

 

This is what I have:

SampleIDRow#Var1Var2Var3Var4Var5Var6Var7Var8Var9Var10
4660905_10 -14.8      2574 
4660905_116.73 30.1    <2 3291
4660905_12     123122   
4660905_13     123122   
            
4661103_20     110  2235 
4661103_216.94-21427.6   114<1 3213
4661103_22      114   

 

I need to consolidate the samples into this:

SampleIDRow# Var1Var2Var3Var4Var5Var6Var7Var8Var9Var10
4660905_1 6.73-14.830.1  123122<225743291
4661103_1 6.94-21427.6  110114<122353213

 

Here are the complicating issues:

  • There is no rhyme or reason to the distribution of data - it's not like, for example, row 1 always has Var2, 3, 4, 7,; row 2 always has Var1, 5, 6, etc.; so I can't assume that each sample will have a consistent distribution of data
  • Some of the variables have the same data in multiple rows (e.g., Var6 and Var7 in the example above)
  • Some of the values are not considered numbers by Excel (see Var8 above)
  • The overall data table has thousands of samples; hundreds of them have multiple rows and the number of multiple rows ranges from 2-5 rows per sample. I can do this manually, but it will take at least an entire day to get through everything.

Does anyone know of a simple way to do this? Pivot tables won't work because of the non-numeric entries. I've looked into the Power Query tool, but can't figure out how to do this with so many variable columns. Any help would be greatly appreciated.

 

-Matt

5 Replies
best response confirmed by Dr. U (Copper Contributor)
Solution

Hi Matt,

 

Attached is the sample with Power Query. That's done not optimal way to avoid coding and do all from UI, just to demonstrate the approach.

 

In brief, we unpivot other than SampleID columns, remove duplicates (as Var6, Var7), in second query Columns generate list of columns to be sure empty ones will be included (Var4, Var5), append that query to first one and pivot on attribute finally. Not sure if you need empty Row# column in final result, could be added.

Please see attached.

Sergei,

 

Thanks so much for your response. I think I understand what you did there, but I'm not sure if I can re-created it with my main spreadsheet of data. I'll give it a try, and I may ask for more guidance if you're willing to help. 

- Matt

Matt,

 

Sure, will be glad to help. The only do not promise immediate answers.

Hi Sergei,
I have a PQ merge that is spitting our double rows (62 rather than 31) because the source files include all columns the same but some cells of data reflect "---".

Would this method be the most efficient to use on larger datasets? My file has several columns.


@1002043 

Hi, - yes, it shall work

1 best response

Accepted Solutions
best response confirmed by Dr. U (Copper Contributor)
Solution

Hi Matt,

 

Attached is the sample with Power Query. That's done not optimal way to avoid coding and do all from UI, just to demonstrate the approach.

 

In brief, we unpivot other than SampleID columns, remove duplicates (as Var6, Var7), in second query Columns generate list of columns to be sure empty ones will be included (Var4, Var5), append that query to first one and pivot on attribute finally. Not sure if you need empty Row# column in final result, could be added.

Please see attached.

View solution in original post