Nov 04 2022 12:03 PM
Below is a sample of my data. For each customer there is a 6 digit NAICS Code and an Industry Description | ||||||
NAICS Code | Industry Description | |||||
317910 | Abrasive Product Manufacturing | |||||
423430 | Computer and Computer Peripheral Equipment and Software Merchant Wholesalers | |||||
423510 | Metal Service Centers and Other Metal Merchant Wholesalers | |||||
Below is a sample of NAICS Codes sourced from SBA. | ||||||
They are either 3, 4, 5, or 6 digits long. | ||||||
It is considered a match if the first 3, 4, 5, or 6 digits in my NAICS sourced data (below) | ||||||
match my sample customer data. | ||||||
External NAICS Code | External Industry Description | |||||
211 | oll and gass estraction | |||||
42311 | Automobile and other motor vehicle Merchant Wholesalers | |||||
4235 | Metal and Mineral Merchant Wholesalers | |||||
I would expect the result to be: | ||||||
NAICS Code | Industry Description | External NAICS Code | External Industry Description | |||
317910 | Abrasive Product Manufacturing | |||||
423430 | Computer and Computer Peripheral Equipment and Software Merchant Wholesalers | 42311 | Automobile and other motor vehicle Merchant Wholesalers | |||
423510 | Metal Service Centers and Other Metal Merchant Wholesalers | 4235 | Metal and Mineral Merchant Wholesalers | |||
Can you help automate this logic? |
Nov 04 2022 12:58 PM
@redridgie In the attached I copied your message text/example and my suggested formula:
=FILTER($A$17:$B$19,IFERROR(SEARCH($A$17:$A$19,A26),0)=1,"na")
my suggested formula does NOT agree with you sample results as I do NOT agree with your sample results. the category 423430 is not a subset of 42311 and the descriptions are not compatible. Note that this formula may pull more than 1 result so you may want a variation like:
=INDEX(SORT(FILTER($A$17:$B$19,IFERROR(SEARCH($A$17:$A$19,A26),0)=1,"na"),,-1),1,)
to get the longest matching code.
Nov 04 2022 02:39 PM
Nov 04 2022 03:32 PM - edited Nov 04 2022 03:36 PM
sure. As with all formulas you start in the middle most () so:
FILTER($A$17:$B$19,IFERROR(SEARCH($A$17:$A$19,A26),0)=1,"na")
this FILTER will filter the rows in A17:B19 if the condition is true and that condition is:
IFERROR(SEARCH($A$17:$A$19,A26),0)=1
which is checking if it can find (i.e. search) the code from column A (A17:A19) inside this row's code value (A26) and it must be found starting at position 1 because a code like 123 we don't want to "match" inside a code 45123. the IFERROR is to catch when the code is NOT found and returns a 0 and hence a false.
So that filter will return all the rows from the code table that have codes that "match" and then that list is SORT from biggest to smallest
and finally it takes the 1st row (i.e. INDEX(..., 1, ) ). note we could also have used TAKE(... , 1) instead