Forum Discussion
Formula speed / Laptop specification
Or, if your data can be sorted on the column containing your key, then you might make use of the approximate match option. For example, say the table is in A2:B8 and the value I'm trying to match is in D2, then use an IF statement to return the value it actually matched to and confirm it is an exact match and, if so, return the value I want out of Column B. Approximate match lookups used in this fashion (binary lookup) are exponentially faster on large data sets.
=IF(INDEX(A2:A9,MATCH(D2,A2:A9,1))=D2,INDEX(B2:B9,MATCH(D2,A2:A9,1)),0)
Thank youJMB17 I think I am starting to realise that I need to use 2 formula instead of 1.
I hadn't made in clear in my initial post that I am always working between 2 files. First to see if the record exists in both files and second the return an exact bit of data from each record. So i'm asking for 2 outcomes and i'm begging to think I should be using 2 different formulae?
I will definitely look at power queries, as I said in my last reply, I've never heard of these but will do some digging.
I work pretty much alone so no one to bounce ideas off so I am very grateful for the replies I have received.
Allison
- JMB17Feb 19, 2021Bronze Contributor
I would use two columns to make the formulas shorter/easier to follow. Also, if you are returning multiples columns, you only have to test once (the other formulas can just reference the cell that performs the test).
If your table in 20201109 MOSL_MDS_SSPID.xlsx can be sorted in ascending order on Column A, then you could try:
Test for existence (let's say this formula is in cell C2, note the match option is 1 and not 0):
=INDEX('20201109 MOSL_MDS_SSPID.xlsx'!$A2:$A229079,MATCH(B2,'20201109 MOSL_MDS_SSPID.xlsx'!$A2:$A229079,1))=B2
Then test C2 and return the value from Column I, or return "Doesn't Exist" (again - note the match option is 1):
=IF(C2,INDEX('20201109 MOSL_MDS_SSPID.xlsx'!$I2:$I229079,MATCH(B2,'20201109 MOSL_MDS_SSPID.xlsx'!$A2:$A229079,1)),"Doesn't Exist")
The key is that the data in your table MUST be able to be sorted in ascending order on your key column (in this case, Column A). And, since we are using an approximate match, you must perform the lookup twice - first to confirm the key it matched to is the key you are trying lookup. Then, if the key matches, perform the lookup and return the value you actually want (which makes the approximate match lookup an exact match lookup). Because the approximate match lookup method is much more efficient, you can speed up the calculation in spite of the fact that you are performing the lookup twice.
There's a short explanation here, but you can find more information if you google 'excel binary lookup.'
https://www.powerusersoftwares.com/post/2016/12/20/make-your-vlookup-run-a-hundred-times-faster-with-the-double-vlookup-trick- Al_DyasFeb 19, 2021Copper Contributor
I am speechless, each formula worked instantly. I will try them in my other work. I am really trying not to get too excited before I've properly tested it out but that was so much quicker.
I have google searched a few times, when I've had a few minutes spare, but never found anything that made a difference. I obviously wasn't looking in the right place or often enough.
I've just had a look at power query and yes I think that is worth spending more time investigating too think you are all amazing and so kind for sharing.
Thank you, I am so grateful. You have no idea how helpful you have just been.
Allison
- JMB17Feb 19, 2021Bronze Contributor
The method that approximate matches use are much more efficient due to the data being sorted. The exact match option is linear, the function starts at the top and looks at every value until it finds what it's looking for. But, the approximate match works by repeatedly 'guessing' by selecting the value in the middle of your table and testing it for a match. If it is greater than what you're trying to match, then it throws away the bottom half of the table. If less than, then it throws away the top half (made possible because the data is sorted).
So, in this fashion, it can throw away half the table with each guess and is exponentially faster (it would only require 20 guesses to perform a lookup on a table that took up all 1 million rows of a worksheet - and the table size could double, but it would only add 1 more guess the function would have to perform).