SOLVED

Mysterious bug on excel calculation, on a vlookup

Copper Contributor

As you can see on the excel file, the value on A7 and A8 looks like a 100% like the one on A11 and A12. But when using the VLookup with it, it gets to different results!

This makes many errors on calculations on our company, recently detected.

Would be nice to know what happend and how to fix it!, maybe it's something with hidden decimals floating point?

11 Replies

I don't have a good satisfying answer. They look the same and everything I do seems to indicate they are the same BUT A7 and A8 are text or something. If you re-type the value in it works fine, if you change the formula to
=VLOOKUP(NUMBERVALUE(A7),$A$2:$B$4,2,TRUE)
or
=VLOOKUP(--CLEAN(A7),$A$2:$B$4,2,TRUE)
they work.
Is it possible you used a keyboard or computer set to a different language or extended keyset where the characters are actually in the extended range?

I apologize for the EDIT notifications: but basically I noticed that I was wrong on the 2nd option above because I originally thought --A7 worked but it didn't and only worked if I also used the CLEAN() function which further points to my question above about another language keyboard or if you imported data.  CLEAN specifically removes non-printable characters in the low range of ASCII.  

@Glarrain 

The real values in A7 and A8 are 0.99999999999999989 and 0.99999999999999989.

The values were probably copied from another source.

 

best response confirmed by Glarrain (Copper Contributor)
Solution

@Glarrain  wrote: ``A7 and A8 looks like a 100% like the one on A11 and A12``

 

And looks can be deceiving.

 

The problem has nothing to do with errant characters (which would result in #VALUE errors, anyway), and using the Excel VALUE or VLOOKUP(...,TRUE) function might work-around the problem only by coincidence.

 

Instead, this is just yet-another example of binary (floating-point) arithmetic anomalies.

 

The work-around is:  whenever you expect a calculation to be accurate to some number of decimal places, explicitly round to that number of decimal places -- and not to arbitrary number, as some people suggest.  For whole percentage values, round to 2 decimal places (because 12% is 0.12).  For percentage values that should be accurate to 2 percentage decimal places, round to 4 decimal places (because 12.34% is 0.1234).

 

It appears that the values in A7 and A8 had been calculated (somewhere), then copy-and-pasted-value into A7 and A8.  And the calculated values are infinitesimally smaller (-1.11E-16) than how the values appear, even when formatted to display 15 significant digits.  We can see such differences with formulas of the form =SUM(A7,-(A7&"")) formatted as Scientific or General.

 

(That works because the expression A7&"" returns the value formatted with up to 15 significant digits.   That is Excel's arbitrary formatting limit.  It is not the limit of numeric precision, as many documents state incorrectly.)

 

Additionally, VLOOKUP is among the many Excel functions that compare exact binary values, not the values arbitrarily rounded to up to 15 significant digits, as compare operators ("=", ">=", etc) and the *IF[S] functions do (COUNTIF, SUMIFS, etc).

 

Since the value in A7 is infinitesimally less than the value in A4 (exactly 100%), VLOOKUP(...,TRUE) returns the value in column B (B3) that corresponds to the largest value less than 100%, namely 95% in A3.

 

Since the (same) value in A8 is infinitesimally different from the value in A4, VLOOKUP(...,FALSE) returns #N/A because it fails to find an exactly binary match.

 

In both cases, one work-around is to use ROUND(A7,4) and ROUND(A8,4) if you want to allow for accuracy to fractional percentages (2 percentage decimal places), as you format them.

 

Another solution is to explicitly round the calculations that sourced the values in A7 and A8.  You do not show us those formulas.

 

Neither solution is more "right" or "wrong".  The choice is yours to make, depending on your intentions.

@Detlef Lewin  wrote: ``A7 and A8 are 0.99999999999999989 and 0.99999999999999989``.

 

Not really.  That is just a 17-digit approximation, which is sufficient to reproduce the binary value exactly.  We can see those values in the XML file that "xslx" files are stored as, as you know.

 

But the exact decimal representation of the binary values is 0.999999999999999,88897769753748434595763683319091796875.

 

(I use period for the decimal place and comma to demarcate the first 15 significant digits.)

@Joe User 


But the exact decimal representation of the binary values is 0.999999999999999,88897769753748434595763683319091796875.

 


Where did you get this value from?

 

@Detlef Lewin  and @Joe User , you both indicate that the value is a round off error. how did you determine that? I expanded the number of decimal places and did not see that:

mtarler_0-1622749088046.png

is there some other trick i should be using to check for that?

Furthermore, that doesn't explain why NUMBERVALUE() or --TRIM() applied to that cell would fix that issue since if it really is 0.999999... neither of those functions would change or fix that.

@Detlef Lewin  wrote, referring to 0.999999999999999,88897769753748434595763683319091796875: ``Where did you get this value from?``

 

64-bit binary floating-point values are the sum of 53 consecutive powers of 2, scaled by a power of 2.  I wrote VBA code to calculate the exact sum.  (I wrote the code 16+ years ago.)

 

Using a browser search, I believe you can find online "calculators" that do the same thing.  But beware:  some of them use the 32-bit BFP representation, which is a sum of only 24 consecutive powers of 2.

@mtarler  wrote: ``you both indicate that the value is a round off error. how did you determine that? I expanded the number of decimal places and did not see that``

 

As I explained:  ``We can see such differences with formulas of the form =SUM(A7,-(A7&"")) formatted as Scientific or General``.

 

FYI, we cannot always use simply =A7-(A7&"") because of other anomalies that are specific to Excel arithmetic.  In particular, sometimes Excel replaces the exact infinitesimal difference of a cell formula with exactly zero, if Excel (arbitrarily) deems that the two operands of the last subtraction are "close enough".

 

And as I explained, arguably not clearly:  Excel formats ``up to 15 significant digits.   That is Excel's arbitrary formatting limit.  It is not the limit of numeric precision, as many documents state incorrectly.``

 

That is why we cannot see the infinitesimal differences in some cases, as you demonstrated.

 

Finally, as I explained (7 min before your posting, so you might have missed it), we can see values formatted with up to __17__ significant digits by looking at the XML worksheet in the "xlsx" file, among other methods.

 

The "xlsx" and "xlsm" files are zip archives.  Copy foobar.xlsx to foobar.zip, then double click foobar.zip in Windows, find the XML worksheet, and open it in Notepad.  You will see entries like 0.99999999999999989 surrounded by "v" tags.

 

The 17-digit decimal value is just yet-another approximation of the binary value.  But the IEEE 754 standard specifies that 17 significant digits (rounded) is sufficient to convert between decimal and binary representations with no loss of precision.

 

(BTW, the IEEE 754 standard __never__ mentions anything about 15 significant digits, contrary to most online documents, from MSFT and others.)

(With critical corrections for posterity.)

 

@mtarler  wrote: ``that doesn't explain why NUMBERVALUE() or --TRIM() applied to that cell would fix that issue since if it really is 0.999999... neither of those functions would change or fix that``.

 

Sorry, I overlooked that question.

 

They work-around the problem because they cause Excel to return the value rounded to 15 significant digits.  In this case, 1 - 2^-53 (approx 0.99999999999999989) rounds to exactly 1, which is a binary match for the value in A4 (100%).

 

But as I said before, that is only by coincidence.

 

Consider the following example.

 

Enter =1 + 2^-52 into A4.  That might be the result of a calculation that we expected to be exactly 100%.

 

Then, with =1 - 2^-53 in A7, note that VLOOKUP(NUMBERVALUE(A7),A4,1,FALSE) and VLOOKUP(NUMBERVALUE(A7),A4,1,TRUE) still return #N/A (no match).

 

The reason is:  NUMBERVALUE(A7) rounds A7 to exactly 1 (100%), but it does not match the exact binary value in A4.

 

In contrast, if we enter exactly 1 into A4, both VLOOKUPs return 1 (match) because the binary values now match.

 

Thank you for your explanations. Yes I finally did go into the XML file and see the actual stored value as 0.9999....98. The round off error was my first thought but naively thought I would be able to see it if i expanded the precision enough. And yes you must be right that both NUBMERVALUE() and --TRIM() must be forcing the value to be read as text and then converted back to a number and result in 1.0 (100%) as the value. That said, VALUE() function does NOT work the same and does not appear to force that conversion/rounding. But I did test and see that even if I type =1.0-1.11E-16 into a cell and then force Excel to show 20+ digits it still displays as 1.00000000... or 100.000000....% but then using those formulas it presents the same problem. I even tried tricks like looking at the cell using =TEXT(1-(1E-20),"0.0000000000000000000000") or =TEXT(1-(1E-20),"0.00000000000000000000000E+00") and excel still refuses to cough up the actual full precision value. I wish there was a way to make excel cough that up without going into the XML...

@mtarler  wrote: ``you must be right [about] NUBMERVALUE() [... but] VALUE() function does NOT work the same and does not appear to force that conversion/rounding``

 

You are right.  I made a lot of typos in that posting:  substituting VALUE for NUMBERVALUE; substituting VLOOKUP for MATCH -- or better, using VLOOKUP correctly for comparison; using decimal approximations (1.11E-16) instead of the exact binary values that I intended (2^-53); and intending to use 1+2^-53 ("1+1.11E-16" sic) in A4 instead of 1+2^-52 (approx 1+2.22E-16).

 

I think you got the gist of it (``you must be right that both NUBMERVALUE() and --TRIM() must be forcing the value to be read as text``).  But I might correct my previous posting for posterity.

 

 

-----

 

@mtarler  wrote: ``I even tried tricks like [...] =TEXT(1-(1E-20),"0.00000000000000000000000E+00") and excel still refuses to cough up the actual full precision value``

 

What part of ``Excel formats up to 15 significant digits. That is Excel's arbitrary formatting limit`` did you not understand? (wink)

 

Oh well, with all of my previous typos noted above, perhaps I deserve a little distrust (sigh).

 

BTW, -1E-20 is too small to make a difference, anyway.  Note that MATCH(1-1E-20,{1},0) returns 1, indicating a binary match.

 

In contrast, MATCH(1-2^-53,{1},0) returns #N/A, indicating no binary match. IOW, -2^-53 makes the intended difference. (But +2^-53 does not.)

1 best response

Accepted Solutions
best response confirmed by Glarrain (Copper Contributor)
Solution

@Glarrain  wrote: ``A7 and A8 looks like a 100% like the one on A11 and A12``

 

And looks can be deceiving.

 

The problem has nothing to do with errant characters (which would result in #VALUE errors, anyway), and using the Excel VALUE or VLOOKUP(...,TRUE) function might work-around the problem only by coincidence.

 

Instead, this is just yet-another example of binary (floating-point) arithmetic anomalies.

 

The work-around is:  whenever you expect a calculation to be accurate to some number of decimal places, explicitly round to that number of decimal places -- and not to arbitrary number, as some people suggest.  For whole percentage values, round to 2 decimal places (because 12% is 0.12).  For percentage values that should be accurate to 2 percentage decimal places, round to 4 decimal places (because 12.34% is 0.1234).

 

It appears that the values in A7 and A8 had been calculated (somewhere), then copy-and-pasted-value into A7 and A8.  And the calculated values are infinitesimally smaller (-1.11E-16) than how the values appear, even when formatted to display 15 significant digits.  We can see such differences with formulas of the form =SUM(A7,-(A7&"")) formatted as Scientific or General.

 

(That works because the expression A7&"" returns the value formatted with up to 15 significant digits.   That is Excel's arbitrary formatting limit.  It is not the limit of numeric precision, as many documents state incorrectly.)

 

Additionally, VLOOKUP is among the many Excel functions that compare exact binary values, not the values arbitrarily rounded to up to 15 significant digits, as compare operators ("=", ">=", etc) and the *IF[S] functions do (COUNTIF, SUMIFS, etc).

 

Since the value in A7 is infinitesimally less than the value in A4 (exactly 100%), VLOOKUP(...,TRUE) returns the value in column B (B3) that corresponds to the largest value less than 100%, namely 95% in A3.

 

Since the (same) value in A8 is infinitesimally different from the value in A4, VLOOKUP(...,FALSE) returns #N/A because it fails to find an exactly binary match.

 

In both cases, one work-around is to use ROUND(A7,4) and ROUND(A8,4) if you want to allow for accuracy to fractional percentages (2 percentage decimal places), as you format them.

 

Another solution is to explicitly round the calculations that sourced the values in A7 and A8.  You do not show us those formulas.

 

Neither solution is more "right" or "wrong".  The choice is yours to make, depending on your intentions.

View solution in original post