matching rows in complex data set

Copper Contributor

I have some large data sets that i need to query to isolate specific data rows.

 

Each row is identified by a date-based designation, with many columns of data following the date.

There may be 3000-5000 rows in each data set.

From a companion data set, I've identified ...say... 500 rows that I want to isolate in the original data set so I can examine those rows in more detail.    As an example, for the rows listed below, I've determined that I want to pull out

the rows ending in 0102,  0301, and 0305.

 

I can do this manually, spending a heck of a lot of time on each search.

 

I'm hoping that someone can suggest a better way to do this.... i.e. use the list of 500 sample numbers to isolate the rows out of a table with 5000 rows.

 

Thanks!

IRM2014010101
IRM2014010102
IRM2014010103
IRM2014010104
IRM2014010201
IRM2014010202
IRM2014010301
IRM2014010302
IRM2014010303
IRM2014010304
IRM2014010305
9 Replies

@FishDoc A quick formula based solution would be to add a column that extracts the four rightmost characters from the IRM code and then finds a match in the list of selected codes.

finally, filter all the row with TRUE.

 

Example attached.

 

Now, a more solid solution would, in my opinion be to use Power Query, where you connect to both data sets and merge ("attach") data from the full set to the codes in the companion set. It would create a new data set without the need to create formulae and manually filter out the relevant rows.

Thanks Riny!

I got lost working thru the Power Query approach....

The Match and filter approach seems simpler, but have a question about what function to use for identifying the match?

@FishDoc The formula in my example is in column A.

 

=ISNUMBER(MATCH(RIGHT(B2,4),selected,0))

 

where "selected" is a named range. So, what's your question?

OK, I copied column B from your example into a new spreadsheet, including the "selected" values in a labeled block at the bottom of the column. The formula was placed in the appropriate spot in column A. Whether I used the word "selected" as in your formula, or replaced it with the fixed Block range ($B$21:$B$23), I got a response of FALSE for all pairings......

Any clue what I'm doing wrong?

Thanks!

@FishDoc Please upload the file. Then I can have a look.

@Riny_van_Eekelen 

 

Good morning!

 

I very much appreciate your assistance, and your willingness to train me in some of the finer points of Excel.

Attached is the file of my 2017 data (I have similar files for 23 years).

 

Column A are the sample numbers for data on fish collected by a particular type of net (a 180 m seine).  Column E are the sample numbers for habitat data collected as the field crew sampled fishes using many different types of nets (this matrix actually required me to do a lot of processing to reduce/reformat the data into the the simple list contained in this file).  Since the fish we're working with were effectively caught only with the 180 m seine net, I need to extract the habitat data associated only with those sample numbers (Column A).

 

I've been doing this by brute force....  Shifting values in Column A downward to match the same values in Column E, then sorting the whole matrix on Column A and discarding rows that don't match A and E.  It's a slow and painful process, and I hadn't had much luck in trying to find a way to automate the matching process.

 

Your guidance is much appreciated!

 

Jon

@FishDoc 

You may add helper column to select data which match column A as

image.png

If you are on Excel 365 now formula could be

=COUNTIF(A4:A226, E4:E827)

otherwise

=COUNTIF($A$4:$A$226,E4)

and drag it down. Apply filter to this column and unselect zero. Better to have data from column A in separate sheet.

@FishDoc You real data is slightly different than your original example where you suggested to look at the last four numbers in the Habitat Reference. But now you seem to look for the entire IRM-code.

 

In the attached file, I moved the shorter Reference list to a separate sheet and gave it a name "reference" used in the MATCH formula. Then I also put the headers in one row (you had "Habitat" and "Reference" is two separate cells). Finally, I put a filter on the longer list, so now you can select only TRUE in column A.

 

 

Hi @FishDoc ,

 

I used power query on your sample file:

 

Step 1:

Made your exclusion list into an excel table, and did a power query on it:

Yea_So_0-1624212040447.png

Yea_So_1-1624212134631.png

 

Step 2:

Defined the Dataset into an excel table, and did a power query on it:

Yea_So_2-1624212205305.png

 

Yea_So_3-1624212229865.png

Step 3:

Did a Merge Query:

When you get to this dialog box I selected the Exclusion list as the Left (top)

and selected the Dataset as the right (bottom), then selected the Anti-Right Join,

meaning only show the rows in the Right table that does not match the Left table.

Yea_So_4-1624212394355.png

The result:

Yea_So_5-1624212534290.png

 

File is attached.

 

Cheers