Forum Discussion
Combine data from multiple rows into a single row
My apologies for the long post... I've been wrestling with this for awhile and am at wit's end...
I have a table of data that contains the results of chemical analyses of water samples. There is a column with a single unique ID for each sample, and over 90 columns with the various chemical parameters (i.e., variables) and the associated data. The problem is that the data are scattered across multiple rows for each sample/analysis. Here's an example:
This is what I have:
| SampleID | Row# | Var1 | Var2 | Var3 | Var4 | Var5 | Var6 | Var7 | Var8 | Var9 | Var10 |
| 4660905_1 | 0 | -14.8 | 2574 | ||||||||
| 4660905_1 | 1 | 6.73 | 30.1 | <2 | 3291 | ||||||
| 4660905_1 | 2 | 123 | 122 | ||||||||
| 4660905_1 | 3 | 123 | 122 | ||||||||
| 4661103_2 | 0 | 110 | 2235 | ||||||||
| 4661103_2 | 1 | 6.94 | -214 | 27.6 | 114 | <1 | 3213 | ||||
| 4661103_2 | 2 | 114 |
I need to consolidate the samples into this:
| SampleID | Row# | Var1 | Var2 | Var3 | Var4 | Var5 | Var6 | Var7 | Var8 | Var9 | Var10 |
| 4660905_1 | 6.73 | -14.8 | 30.1 | 123 | 122 | <2 | 2574 | 3291 | |||
| 4661103_1 | 6.94 | -214 | 27.6 | 110 | 114 | <1 | 2235 | 3213 |
Here are the complicating issues:
- There is no rhyme or reason to the distribution of data - it's not like, for example, row 1 always has Var2, 3, 4, 7,; row 2 always has Var1, 5, 6, etc.; so I can't assume that each sample will have a consistent distribution of data
- Some of the variables have the same data in multiple rows (e.g., Var6 and Var7 in the example above)
- Some of the values are not considered numbers by Excel (see Var8 above)
- The overall data table has thousands of samples; hundreds of them have multiple rows and the number of multiple rows ranges from 2-5 rows per sample. I can do this manually, but it will take at least an entire day to get through everything.
Does anyone know of a simple way to do this? Pivot tables won't work because of the non-numeric entries. I've looked into the Power Query tool, but can't figure out how to do this with so many variable columns. Any help would be greatly appreciated.
-Matt
Hi Matt,
Attached is the sample with Power Query. That's done not optimal way to avoid coding and do all from UI, just to demonstrate the approach.
In brief, we unpivot other than SampleID columns, remove duplicates (as Var6, Var7), in second query Columns generate list of columns to be sure empty ones will be included (Var4, Var5), append that query to first one and pivot on attribute finally. Not sure if you need empty Row# column in final result, could be added.
Please see attached.
5 Replies
- SergeiBaklanDiamond Contributor
Hi Matt,
Attached is the sample with Power Query. That's done not optimal way to avoid coding and do all from UI, just to demonstrate the approach.
In brief, we unpivot other than SampleID columns, remove duplicates (as Var6, Var7), in second query Columns generate list of columns to be sure empty ones will be included (Var4, Var5), append that query to first one and pivot on attribute finally. Not sure if you need empty Row# column in final result, could be added.
Please see attached.
- Dr. UCopper Contributor
Sergei,
Thanks so much for your response. I think I understand what you did there, but I'm not sure if I can re-created it with my main spreadsheet of data. I'll give it a try, and I may ask for more guidance if you're willing to help.
- Matt
- SergeiBaklanDiamond Contributor
Matt,
Sure, will be glad to help. The only do not promise immediate answers.