Jun 17 2021 06:01 PM
Jun 17 2021 06:01 PM
I have some large data sets that i need to query to isolate specific data rows.
Each row is identified by a date-based designation, with many columns of data following the date.
There may be 3000-5000 rows in each data set.
From a companion data set, I've identified ...say... 500 rows that I want to isolate in the original data set so I can examine those rows in more detail. As an example, for the rows listed below, I've determined that I want to pull out
the rows ending in 0102, 0301, and 0305.
I can do this manually, spending a heck of a lot of time on each search.
I'm hoping that someone can suggest a better way to do this.... i.e. use the list of 500 sample numbers to isolate the rows out of a table with 5000 rows.
Jun 17 2021 09:10 PM - edited Jun 18 2021 07:48 AM
@FishDoc A quick formula based solution would be to add a column that extracts the four rightmost characters from the IRM code and then finds a match in the list of selected codes.
finally, filter all the row with TRUE.
Now, a more solid solution would, in my opinion be to use Power Query, where you connect to both data sets and merge ("attach") data from the full set to the codes in the companion set. It would create a new data set without the need to create formulae and manually filter out the relevant rows.
Jun 18 2021 07:42 AM
Jun 18 2021 12:40 PM
Jun 19 2021 05:35 AM
I very much appreciate your assistance, and your willingness to train me in some of the finer points of Excel.
Attached is the file of my 2017 data (I have similar files for 23 years).
Column A are the sample numbers for data on fish collected by a particular type of net (a 180 m seine). Column E are the sample numbers for habitat data collected as the field crew sampled fishes using many different types of nets (this matrix actually required me to do a lot of processing to reduce/reformat the data into the the simple list contained in this file). Since the fish we're working with were effectively caught only with the 180 m seine net, I need to extract the habitat data associated only with those sample numbers (Column A).
I've been doing this by brute force.... Shifting values in Column A downward to match the same values in Column E, then sorting the whole matrix on Column A and discarding rows that don't match A and E. It's a slow and painful process, and I hadn't had much luck in trying to find a way to automate the matching process.
Your guidance is much appreciated!
Jun 19 2021 05:54 AM
You may add helper column to select data which match column A as
If you are on Excel 365 now formula could be
and drag it down. Apply filter to this column and unselect zero. Better to have data from column A in separate sheet.
Jun 19 2021 05:55 AM
@FishDoc You real data is slightly different than your original example where you suggested to look at the last four numbers in the Habitat Reference. But now you seem to look for the entire IRM-code.
In the attached file, I moved the shorter Reference list to a separate sheet and gave it a name "reference" used in the MATCH formula. Then I also put the headers in one row (you had "Habitat" and "Reference" is two separate cells). Finally, I put a filter on the longer list, so now you can select only TRUE in column A.
Jun 20 2021 11:10 AM
Hi @FishDoc ,
I used power query on your sample file:
Made your exclusion list into an excel table, and did a power query on it:
Defined the Dataset into an excel table, and did a power query on it:
Did a Merge Query:
When you get to this dialog box I selected the Exclusion list as the Left (top)
and selected the Dataset as the right (bottom), then selected the Anti-Right Join,
meaning only show the rows in the Right table that does not match the Left table.
File is attached.