INDEX MATCH Duplicate Match Values with Unique Data

Copper Contributor

Hello,

 

I have searched and searched, but nothing I have found works exactly for what I need. I have a workbook with 10 or so sheets, 9 of which pull data from the 10th using INDEXMATCH. The data is grouped by machine locations within a given building where there may be dozens of transactions on each given machine (ex. machine 1 has 200 item transactions but one lookup value - the machine number). 

 

The problem I am having is that I have to create unique identifiers in order to pull all of the transaction data for each machine that each spreadsheet references so when I dump new data for a new billing cycle the unique identifiers may not match up. The process I'm using to create unique identifiers is a COUNTIF function that counts the number of occurrences of each machine number so if there are more or less transactions on the next billing cycle I have to re-copy the new identifier list to each spreadsheet in order to show the correct data. 

 

What I am trying to do is somehow create a formula using INDEXMATCH (and whatever else necessary) to return ALL information related to each occurrence of a number in a given range.

 

EX. Building 1 has Machines 1, 2, and 3

1 Item A at $15.03

1 Item B at $9.71

1 Item C at $4.82

2 Item A Qty 2 at $30.06

2 Item B at $9.71

3 Item C at $4.82

3 Item A at $15.03

This is over-simplified as there are nearly 50 machines and thousands of items.

 

 

Is it possible to have a modified INDEXMATCH formula that pulls specific data for machines 1 through 3 using only the machine number (so the list dynamically updates and N/A values can simply be deleted)? My INDEXMATCH formulas are already set up and can be seen below. Obviously this would be the same formula on all 9 sheets, just with different machine numbers. 

 

My goal is for this to be a simple "dump data, delete n/a's, send to customer" process. 

 

 

Formulas I'm using:

Creating Unique Identifiers: 

LEFT(E2,FIND("-",E2)-1)    

Separates the machine number out of an alpha-numeric identifier stating where each item is located within a machine. Using the alpha-numeric identifier is not an option because there are also multiple occurrences if an item is purchased multiple times in a billing cycle and are subject to change if items are moved.

B2&COUNTIF(B2:B1885,B2)   

Counts the number of times a given machine number repeats and adds it to the machine number creating a unique identifier.

 

INDEXMATCH

These formulas obviously change slightly depending on the sheets they are referring to and the columns the data is located in.

INDEX(Sheet1!$T$2:$T$5000,MATCH('Sheet10'!$B3,Sheet1!$A$2:$A$5000,0))

Column B is where the unique identifiers are copy and pasted.

 

Apologies if I have been redundant in repeating information or over-simplifying. Let me know if more information is needed. I can't disclose very much as the information in the spreadsheets is confidential.

4 Replies

@ShockTherapii   (That's quite a user name!!)

 

I don't have an immediate solution for you. If somebody else does, that's fine.

 

As an old database person, I cringed when I read your description of coming up with NEW "unique" identifiers each billing cycles. It sounds as if each of those identifiers refers to one of the 50 machines, right? So I start with the question WHY do you need a new unique identifier each cycle?

 

Excel comes with some amazing database functions that would allow you to extract data for any given period, and any given machine, and any given transaction type (it DOES get complicated if THOSE are all unique, especially if they're created ad hoc and vary each cycle...).

 

Anyway, I'll let others answer your questions about INDEXMATCH (if it's possible)....

 

My own reaction to what you've described is to wonder whether a re-thinking of the fundamental design is warranted. And I realize that you're working with proprietary info so can't share that level of info. But maybe--if no other answer is forthcoming--you could create a dummy copy of at least a subset of the 10 sheets, nonsense data in them, and a more complete description of what you mean when you describe your ultimate goal in these words "My goal is for this to be a simple "dump data, delete n/a's, send to customer" process." Does each customer use multiple machines for multiple transactions? Or does any given customer just use one machine? To what level of detail does each customer get an itemized list of machine, date, transaction...." Etc. I.e., what does that "dump" look like?

 

I ask all this because my experience has been that often spreadsheets get developed initially as semi-automated ways of doing what was done previously on paper ledgers, and often that initial design works just fine by means of making it a bit easier to do the kind of matching up you're describing. But as volume grows, what's really needed is a radical re-thinking.....   This may or may not be true of your situation, but it would be irresponsible of me not to at least ask.

Thank you for your response. To answer your questions one at a time...

It is not necessarily true that unique identifiers need to be created each time, but that the identifiers won't always match. So in one building and in one machine EACH transaction in a billing cycle would show in the unique identifier as I couldn't see a way to create a unique ID otherwise. So for example if machine 15890 had 60 transactions in a billing cycle there would be 15890, 158901, 158902 all the way up to 1589059. Essentially just being the machine number and a number identifying the count of how many times that machine appeared on a report. The problem lies in that the number of transactions between billing cycles will likely never be the same so the identifiers have to be copied each cycle into the corresponding spreadsheet in order for no data to be missed. I'm sure it's possible to come up with a better system here.

What I meant by my goal is I want to be able to copy the new report we pull from a program each billing cycle into my data dump sheet that all the other sheets pull from and then be done with it. It will be used by people with basically no excel experience, but is a requirement of our client and the manual reporting is an 8 hour process. I refer to it as a data dump because I literally just copy and paste the data so no editing is required to get the workbook to function.

It's one customer, but thousands of employees who are not limited to one machine. They will use most of the machines and have many transactions each in a given billing cycle at their building but ALSO are able to access machines at other sites if working there.

The account managers at each building get a full report of items and item number, quantity, price, who dispensed the item, their employee number and their department. I have no problem really with pulling that information from the data dump with index match.

@ShockTherapii   I have to think it's significant that nobody else has chimed in with a neat solution; there are people on this site who are far more experienced than I with the newest capabilities of Excel. I retired 17 years ago so am most conversant with the capabilities it had at that time (and still does) but I'm realizing in the last week (beginning when I discovered this site) that Excel has added even more amazing powers since I last read the manuals. For one example that I have yet to explore--I've ordered a recent reference manual to help--I see people here talking about Power Query, and I suspect that might have a way of meeting your deeper need. But I'm also quite sure--well, I have a strong suspicion--that what Power Query does is accomplish in more user-friendly fashion what the older database functions do to this day.

 

Let me describe a solution briefly here that uses Excel's database functions. See if it resonates in concept with what you are doing. If it does, I'd be happy to dig deeper. So here's what I have working currently to support my wife in her consulting business. She has two clients, does similar kinds of work for each, with some distinctions (different categories of projects). Has a need to prepare a monthly invoice to bill those clients, and it needs to show a moderate level of detail: for each client, she will show a list of up to 10 or 12 of the different types of projects, and the total number of hours spent on each of those types or categories during the month. The invoice then applies an hourly rate to the total hours and shows that as the invoice amount. That's the output needed.

 

At the input end of things, she keeps track of all of her hours in a single database (key point here: there's no need to separate the database into two just because there are two clients; you just have a code in each record that indicates which client these hours were for), noting in each row of that single database:

  • Date
  • Project category/description
  • Hours (can be fractional)
  • Client

More data elements could be added to each row as needed. For example, depending on the level of detail expected (itemization on the invoice), one could show sub-projects. We don't need that; my point is that more granular data could be recorded if necessary.

I should add that there's a check for data integrity: the entry under Project Category is validated (Excel can do that) so that she can't inadvertently misspell and thereby have an entry missed in the final report.

 

The invoice is created by a series of data extraction queries that take advantage of the Excel functions known as database functions. Here's a sampling:

  • DSUM: will retrieve the total of a specified column from a database, for all rows that meet a set of criteria that can include 2, 3, 4....a large number of specs. In our case, it's the number of hours for each ProjCategory for each Client for the current month (i.e., three identifiers)
  • DCOUNT: counts the records that meet the given criteria
  • DMAX: will retrieve the highest number from a specified subset of the database
  • DMIN: retrievs the lowest number
  • DAVERAGE: I think you're getting the idea

 

The key in all this is alluded to above in italics, because I think it sounds as if your "unique identifiers" are serving a similar purpose--but you're doing it by creating a whole new data entry in each row, and then somehow collecting those into each of your "data dumps."  When in fact, what I'm suggesting is that you could just specify the three (or four, or five) fields that constitute your "unique identifier" in what these database functions call on as the criteria range.

  • And unless I'm totally mistaken, you could be putting all the data on all nine spreadsheets into one single database for even more efficiency. Separating them made sense in the old days when the process was paper based....but rethinking it in more contemporary database frameworks makes it possible for you (I'm suggesting, based on what you've described) to just add an identifier to each row of a single database that reflects which building/which machine is in use.
  •  
  • So let me suggest you go look at Excel's D_____ database functions, get a good description of them, play around a bit with them, and see if they could possibly meet your needs more simply than what you've been doing. I'd be happy to engage more with you if, as I said, any of this resonates as even "possible."
I think you're really onto something here and I am going to look into those database functions in great detail.

I do want to make one distinction that perhaps I wasn't very clear on though. I only have one data dump which is a very large sheet of ALL the data for every machine and building. The 9 sheets are unique to each building which is required as each building does its own billing based on what their usage was. That path was taken as the billing managers didn't want to see other sites information mixed in with theirs.

I don't think this is an obstacle for the solution you proposed, though. I'll have to copy the database formulas into each sheet, but that's not a problem as all of the formulas will reference information contained in just one data dump sheet.

Thank you for your input! I will respond when I've tested these things out to let you know how it went.