Feb 19 2021 04:13 AM
Hello
I am a regular user of the index match formula between spreadsheets containing over 400,000 records and this can take up to 30 minutes to complete. On smaller spreadsheets the formula regularly takes 5 to 10 minutes. I have tried a variety of ways to improve the speed:
working on my hard drive instead of my work network;
always pasting values;
turn on manual calculation;
look for ways of cutting down the data I need to analyse.
Our IT department tell me I have the maximum ram on my laptop which is a core i5 elitebook. they loaned me a notebook, the highest spec they provide, but it made little to no difference so I didn't think it was worth the expense. No one in IT seems to be able to direct me to a solution other than to tell me to use other software but then can't tell me what that is other than Tableau which just wouldn't work for me or my department. On top of the Excel 13 I already had they have me Excel 2010 64 but this just causes issues and sometimes the formula doesn't recognise i'm working between 2 spreadsheets so I have to close and reopen.
I am not well up on laptop specifications so can't tell you exactly what spec the notebook was. they even gave me a virtual desktop which made no difference at all.
The CPU on my laptop occasionally reaches 99% but even when it doesn't it is extremely slow during calculations.
I am currently working from home too but to be fair I had the same problem when I worked in the office and its been going on for 2 years. We use citrix but am more than happy to work of the hard drive if the formula worked faster, which it currently doesn't.
Is there anyone who can advise me on the spec of laptop that I would need to quickly carry out such a formula or am I destined to sit watching the % on my status bar crawl to 100% for the rest of my days? Is this what other people have to deal with?
All advice/direction is welcomed.
Allison
Feb 19 2021 04:59 AM
@Al_Dyas You haven't really explained what you are indexing and matching. Comparing/matching such large data sets could perhaps be done with Power Query. It allows you to connect to the data without loading it into an Excel sheet. Not sure, though, if this could work you.
Feb 19 2021 06:44 AM
Feb 19 2021 07:26 AM
Thank you @Riny_van_Eekelen. I have no idea what a power query is so I will find out and try that.
A sample of the formula I use is this:
=INDEX('20201109 MOSL_MDS_SSPID.xlsx'!$I2:$I229079,MATCH(B2,'20201109 MOSL_MDS_SSPID.xlsx'!$A2:$A229079,0))
There are many different scenarios but I use the same formula structure for the majority.
From a 2016 file I am looking at a 2019 file to see how the data changed, or if it has been removed. For example the MATCH is looking for the exact sewerage point id (SPID) reference numbers in both files. The index returns the status of the SPID in the 2019 file.
So the index match formula answer 2 question for me, is the SPID still active and if it is what is the status.
In another scenario: I have carried out a process to batch update the status field of a large number of SPIDs overnight and the following day I run an index match against today's report and yesterday's report to check that the update worked. So again I am looking to see if the status of the SPID has changed as it should have.
I hope this helps and makes sense.
Feb 19 2021 07:40 AM
Thank you@JMB17 I think I am starting to realise that I need to use 2 formula instead of 1.
I hadn't made in clear in my initial post that I am always working between 2 files. First to see if the record exists in both files and second the return an exact bit of data from each record. So i'm asking for 2 outcomes and i'm begging to think I should be using 2 different formulae?
I will definitely look at power queries, as I said in my last reply, I've never heard of these but will do some digging.
I work pretty much alone so no one to bounce ideas off so I am very grateful for the replies I have received.
Allison
Feb 19 2021 08:01 AM - edited Feb 19 2021 08:06 AM
I would use two columns to make the formulas shorter/easier to follow. Also, if you are returning multiples columns, you only have to test once (the other formulas can just reference the cell that performs the test).
If your table in 20201109 MOSL_MDS_SSPID.xlsx can be sorted in ascending order on Column A, then you could try:
Test for existence (let's say this formula is in cell C2, note the match option is 1 and not 0):
=INDEX('20201109 MOSL_MDS_SSPID.xlsx'!$A2:$A229079,MATCH(B2,'20201109 MOSL_MDS_SSPID.xlsx'!$A2:$A229079,1))=B2
Then test C2 and return the value from Column I, or return "Doesn't Exist" (again - note the match option is 1):
=IF(C2,INDEX('20201109 MOSL_MDS_SSPID.xlsx'!$I2:$I229079,MATCH(B2,'20201109 MOSL_MDS_SSPID.xlsx'!$A2:$A229079,1)),"Doesn't Exist")
The key is that the data in your table MUST be able to be sorted in ascending order on your key column (in this case, Column A). And, since we are using an approximate match, you must perform the lookup twice - first to confirm the key it matched to is the key you are trying lookup. Then, if the key matches, perform the lookup and return the value you actually want (which makes the approximate match lookup an exact match lookup). Because the approximate match lookup method is much more efficient, you can speed up the calculation in spite of the fact that you are performing the lookup twice.
There's a short explanation here, but you can find more information if you google 'excel binary lookup.'
https://www.powerusersoftwares.com/post/2016/12/20/make-your-vlookup-run-a-hundred-times-faster-with...
Feb 19 2021 08:45 AM
I am speechless, each formula worked instantly. I will try them in my other work. I am really trying not to get too excited before I've properly tested it out but that was so much quicker.
I have google searched a few times, when I've had a few minutes spare, but never found anything that made a difference. I obviously wasn't looking in the right place or often enough.
I've just had a look at power query and yes I think that is worth spending more time investigating too think you are all amazing and so kind for sharing.
Thank you, I am so grateful. You have no idea how helpful you have just been.
Allison
Feb 19 2021 09:09 AM
The method that approximate matches use are much more efficient due to the data being sorted. The exact match option is linear, the function starts at the top and looks at every value until it finds what it's looking for. But, the approximate match works by repeatedly 'guessing' by selecting the value in the middle of your table and testing it for a match. If it is greater than what you're trying to match, then it throws away the bottom half of the table. If less than, then it throws away the top half (made possible because the data is sorted).
So, in this fashion, it can throw away half the table with each guess and is exponentially faster (it would only require 20 guesses to perform a lookup on a table that took up all 1 million rows of a worksheet - and the table size could double, but it would only add 1 more guess the function would have to perform).
Feb 19 2021 09:43 AM
Feb 19 2021 10:31 AM
Feb 19 2021 11:59 AM
Reasons for slow computation and poor performance
Volatile functions slow your performance
The term volatile, simply put, means that these formulas are always recalculated.
Conditional formatting makes your file slow
Many complex formulas degrade your performance
Incorrect or too large references lead to a loss of speed
Large workbooks result in slower worksheets
Increase the speed and improve the performance
Save your file as a binary workbook for better performance
Use pivot tables
Use conditional formatting carefully for better performance
Avoid volatile functions for more performance
Customize the reference areas and use structured references and names
Improve your formula structure for more speed
Avoid matrix formulas in your files
Replace formulas with fixed values
Avoid links between workbooks
Sort your source data for faster calculations
Use VBA for better performance
Switch to manual calculation
Can all workbooks be made faster?
Check links in your file
Remove incorrect or unnecessary formulas
Delete empty cells from memory
Check if you are using volatile functions
Check the references in your formulas
Check your formulas and conditional formatting
Finally, I can only recommend you, if none of the above steps help then you should read this article.
These go into detail on the topic and offer alternative paths.
The articles are sometimes a bit very technical and best only for advanced users but still very helpful.
Excel performance: Improving calculation performance
Excel performance: Tips for optimizing performance obstructions
Hardware: I recommend an SSD hard drive and at least 16GB RAM (when it comes to that many data sets).
At the same time check whether all processors are activated.
Instructions for the check (example with processor i7):
Press the key combination [Ctrl] + [Alt] + [Del] and select the "Task Manager". Switch to the "Performance" tab and click on "CPU" on the left. In the right area of the window you can now read the number of cores used by Windows under. Your quad-core processor should read "Cores: 4" there.
I would be happy to know if I could help.
Nikolino
I know I don't know anything (Socrates)
* Kindly Mark and Vote this reply if it helps please, as it will be beneficial to more Community members reading here
Feb 20 2021 11:07 AM
I have found that Microsoft broke the Worksheet calculation in 64bit versions later than 1902. Stay away from xlCalculationAutomatic in newer versions of Excel you have large spreadsheets or spreadsheets in which cells are periodically changed. You must turn off calculation (Application.Calculation = xlCalculationManual) an fire the calculation of sheet only when you need to otherwise risk the total crash of Excel. I have spent a considerable amount of time and headache, including a ticket with Microsoft until I finally found that Microsoft updates specifically 2012 and 2101 have been totally screwed up!
I have about 15 spreadsheet VBA applications running at customer facilities that poll and populate sheets with operational data through a Modbus communication active X (MBAXP), all running anywhere between versions 1708 to 1902, these applications running flawlessly for weeks on end no problem whatsoever. Then the last two applications I ran into a huge problem. The same application would hang as soon as it started, I couldn't select any cell it would run for about 10 minutes then the darn Excel would crash and restart with recovered versions of the same application, the recovered application having no difference. I originally thought it was a computer problem until I started experimenting with other versions Excel! I downgraded to earlier versions and found that all was good, polling selecting was normal. Then I experimented with the newer version, specifically 2012, thinking it could be the MBAXP. After replacing the MBAXP with a timer and a random number generated sheet populating macro I found that the hangup was the sheet calculation, timing the difference between 2012 (2101 was a little better) and the version 1902 at sometimes over 50 times as long in the new versions!! I had to limit the timer for - again - after about 10 minutes the darn Excel would totally crash and restart. No one at Microsoft (or anywhere else) seems to know why there is such a huge loss in calculation speed. I can only guess the programmers at Microsoft don't know what they are doing. While I finally solve the mystery, I am left to wonder whether Microsoft is intent on screwing up the product, leaving me to consider alternatives. Its quite a bummer. BTW the following link is a step by step instruction on downgrading to a working Excel version. Try v1902 it works. https://answers.microsoft.com/en-us/msoffice/forum/msoffice_install-mso_win10-mso_o365b/revert-to-a-...
Feb 21 2021 02:48 AM
Everything I could find in microsoft's quiver was sent to you as links. As i'm just a simple member, i can't help anymore.
If this information does not help you, contact Microsoft support.
Excel performance: Improving calculation performance
Excel performance: Performance and limit improvements
Excel performance: Tips for optimizing performance obstructions
How to clean up an Excel workbook so that it uses less memory
(Check whether the workbooks were created with earlier Excel versions, if so, change this, maybe this could help).
Known issues with Office and Windows 10
Thank you for your patience and time.
Hope I could help you at least a little.
I wish you continued success with Excel...Don't forget, Excel is the best invention since there was chocolate :)).
Nikolino
I know I don't know anything (Socrates)
Feb 22 2021 03:26 AM
Thank you @NikolinoDE there is a lot to go through there but I will and I will feed back to you.
I am absolutely blown away by how helpful everyone here has been. I really didn't expect so many replies. Lots for me to review and try out and investigate.
I will certainly share the hardware spec with my manager to see what she can do for me. In the meantime I will make a start on reviewing my master workbooks to see where I can follow all the tips and cut out any unnecessary tasks for excel to do. I've promised to share with my colleagues to as there are other in my team having similar problems.
I will go through your list item by item and make a note of any improvements. Though it may take me a week or two because i'll have to do it all in between work but there will be opportunities as I am going along.
Thank you so much for taking the time to share. I've struggled for two years and there have been tears and tantrums along the way so I am excited to get a bit of control back.
Allison
Feb 22 2021 03:38 AM
By the way, on which edition of Excel you are?
Feb 22 2021 03:44 AM
Thank you @rpbenz. Noted about turning off automatic calculation, I will make more use of this than I am currently doing.
I also get a lot of 'Excel not responding' messages and am just left sitting and waiting, or putting a wash on, making a brew, doing a bit of deadheading in the garden (summer obviously).
I am not sure i'll be able to swap between excel versions as I am on my employers network which is very restricted but I have lots of tips to work through and if they don't speed things up for me I will certainly be asking the question of our IT department. I do find 2010 64 bit more problematic and not helpful in that I can't have multiple windows open.
Thank you for sharing, its all helpful and informative and I am grateful
Allison
Feb 22 2021 03:50 AM
Bloomin eck@NikolinoDE that is brilliant. I will take a look and let you know how I get on.
That must have taken a while so thank you for your time.
I'd better get crackin.
Allison
Feb 22 2021 04:04 AM
So looking in About MS Excel it says:
MS Excel 2013 (15.0.5311.1000) MSO (15.0.5275.100) 32-bit
No idea what the bits in brackets mean but this is the version I use regularly through Citrix.
I also have access to:
MS Excel 2010
version 14.0.6023.1000 (64-bit)
I was given access to this version to help with working with the larger spreadsheets but it doesn't and I can't have multiple windows open so I tend to use 2013 most of the time.
I hope this answers your question.
Allison
Jun 26 2021 07:00 AM
Look I use to have the same problem with a slow Lenovo with amd processor. One day I decide to change the hdd with an ssd and surprise! A laptop with 8g of mem was slower than a 20 year old laptop. You change ssd , do little research on the net and you tube create live usb reinstall win and you will be pleasantly surprised!