SOLVED

Serious bug with certain decimal numbers: Countifs and Roundup failed

Brass Contributor

I have a file that had time upto the seconds level. Some of these numbers had 99999.. after the fifth decimal digit. And all these numbers seem to throw a major bug.

I wanted to find out how many entries were made on or before the current time. Shocking excel returned the answer as zero for those times or numbers with 99 in the sixth decimal point onwards (which is impossible because the time is already in the array).

 

I even tried rounding those numbers to five decimal points but it didn't round.

OpenOffice Calc, Google Sheets and Zoho sheets does not seem to create any such trouble. I have attached the file.

 

Any thoughts?

21 Replies

@Viz 

Try using MROUND(..., 1/86400)

(1 second = 1/86400 of a day)

 

S0357.png

@Hans Vogelaar  wrote: ``Try using MROUND(..., 1/86400)``

 

I think that is ill-advised.

 

Consider Sheet1!B4.  It display 8:49, but the actual value is 2/1/2020 8:49:48.  (I write dates in the form MDY.)

 

=ISNUMBER(MATCH(MROUND(B4, 1/86400), B4, 0)) returns FALSE(!).

 

(FYI, we could write "0:0:1" with double-quotes instead 1/86400.)

 

The reason is:  the binary result from MROUND does not match the binary representation of 2/1/2020 8:49:48.  In fact, MROUND fails for more than 25% of the values in column B.

 

An exact binary match is important for lookups, for example.

 

In contrast....

 

=ISNUMBER(MATCH(INT(B4)+TEXT(B4,"h:m:s"), B4, 0)) returns TRUE for all values in column B.

 

-----

 

Caveat: INT(B4)+TEXT(B4,"h:m:s") works for constants that are accurate to the second.

 

And it works for calculated date and time that results in less than 23:59:59.500.

 

However, if date and time values might be accurate to the 1/10 second (or less), the following rounds to the second more reliably:

 

INT(B4)+TEXT(MOD(B4,1),"[h]:m:s.000")

 

The reason is:  Excel rounds to the second when formatting to the hour, minute or second.

 

Consequently, for example, 2/1/2020 23:59:59.6 displays as 2/2/2020 0:00:00.

 

But INT(B4)+TEXT(B4,"h:m:s") would incorrectly result in 2/1/2020 0:00:00.

@Viz 

I think there are two separate issues, and it is unclear if the issue that you focused on is truly important for your purposes -- although it is interesting curiosity.

 

-----

 

Yes, Excel has a formatting defect that causes 43863.89142 to be displayed as 43863.8914199999, for example.

 

(MSFT used to document the defect in KB161234.  But I cannot find that document or the equivalent online anymore.  No matter.  The KB description was inaccurate, anyway.)

 

The defect is specific to Excel.  It has nothing to do with the 64-bit binary floating-point standard.  That is why you do not see it in other applications.

 

And the formatting defect is not limited to ROUNDUP.  The formatting defect arises with relatively few binary values that meet very specific conditions.  (I can explain, if you wish.  It is TMI for most people.)

 

Moreover, the effect of the formatting defect can be confusing because when we enter 43863.89142, initially it has the binary value of 43863.89142, even though it appears to be 43863.8914199999.  But if we edit the cell, it changes to the binary value of 43863.8914199999, which is different.

 

For example, initially =ISNUMBER(MATCH(VALUE("43863.89142"), { 43863.89142 }, 0)) returns TRUE, even though it appears to be =ISNUMBER(MATCH(VALUE("43863.89142"), { 43863.8914199999 }, 0)).

 

But if we select the cell and press f2, then Enter, the formula returns FALSE.

 

OTOH, if we have =VALUE("43863.89142") in A1, =COUNTIF(A1, "="&43863.89142) always returns zero because COUNTIF sees the string "=43863.8914199999".

 

The only work-around that I know is to replace such constants with expressions like VALUE("43863.89142"), ROUND("43863.89142",5), A1+"43863.89142", etc.

 

The key is to enter such numbers in double-quotes in an arithmetic expression, when you encounter the formatting defect.

 

-----

 

But the real question is:  why are you using ROUNDUP(B4,5), in the first place?

 

In Sheet1, the date and time constants in column B are already accurate to the second.  There is no need to round them.

 

And if there is a need (for example, for calculated date and time values), rounding to 5 decimal places is the wrong way to do it.

 

Instead, you might use one of the following, depending on your requirements.

 

TEXT(B4, "m/d/yyyy h:m:s")

or

--TEXT(B4, "m/d/yyyy h:m:s")

or

INT(B4)+TEXT(B4,"h:m:s")

or

INT(B4)+TEXT(MOD(B4,1),"[h]:m:s")

 

If you provide the original use of COUNTIFS that led to the problem, we can be more specific.

 

I think what you posted in your attachment is your attempts to understand and work around the original problem.  I "cannot see the forest for the trees".

@Joe User 


The only work-around that I know is to replace such constants with expressions like VALUE("43863.89142"), ROUND("43863.89142",5), A1+"43863.89142", etc.

 

The key is to enter such numbers in double-quotes in an arithmetic expression, when you encounter the formatting defect.

Interesting. I've never sever such a workaround before. Will keep it in mind.

 

 

@Joe User , FYI

Classic Diego Oppenheimer post which explains floating point precision is here Understanding Floating Point Precision, aka “Why does Excel Give Me Seemingly Wrong Answers?” - Micr...

 

Microsoft restructured documentation with introducing of docs.microsoft.com, articles as KB161234 are here. With some updates where applicable. In particular this one is Floating-point arithmetic may give inaccurate result in Excel - Office | Microsoft Docs

best response confirmed by allyreckerman (Microsoft)
Solution

@Viz 

In addition, Excel works by default with 15 digits and many functions ignore extra digits, but some work in background with 17 digits. More about that is here 17 or 15 digits of precision - Excel Tips - MrExcel Publishing

 

For this particular case if use

=SUMPRODUCT( ($C$1:$C$11<=C4)*($B$1:$B$11=B4))

instead of COUNTIFS() it returns correct result.

Thank you for a detailed response, @Joe User and thank you for the alternative solution @Sergei Baklan 

 

I had a list of end users with various entry times and I had to count how many times an user has done entry up to a given point. So, I wanted to apply this: =Countifs([Time],"<="&[@Time],[User],[@User])

 

That is when I noticed that for some of the entries I was getting wrong answers. Roundup was more of a workaround I attempted to fix the problem.

I didn't realise this issue existed in MS Excel. Thank you for helping me understand the bug.

@Sergei Baklan  wrote:

``Classic Diego Oppenheimer post which explains floating point precision``

and

``articles as KB161234 are here. [....] In particular this one is``

 

 

First, as I stated clearly previously, this formatting defect has nothing to do with 64-bit binary floating-point precision.

 

To demonstrate that fact, note that when we enter 43863.89142, Excel incorrectly displays 43863.8914199999, but VBA correctly displays 43863.89142.

 

And in fact, the binary approximations are different.  The following shows the exact decimal representation of the two binary approximations:

 

43863.89142:

43863.8914199999,9992549419403076171875

 

43863.8914199999:

43863.8914199998,98062087595462799072265625

 

(I use period for the decimal point and comma to demarcate the first 15 significant digits.)

 

-----

 

Second, the MSFT article that you cite is a retitling of KB 78113, not KB 161234.

 

But aha! I just stumbled across an archived version of KB 161234 at Microsoft KB Archive/161234 - BetaArchive Wiki .  For posterity, it reads, in part:

 

``when you type or calculate a number between 32,768 and 65,535 that contains a decimal portion of .848, the number is evaluated and displayed in the formula bar with a decimal portion of .8479999999. [....] Microsoft has confirmed this to be a problem``

 

That KB description is incomplete.  The formatting defect is not limited to the list Excel versions (5.0, 95 and 97).  Obviously, it applies to the latest Excel versions as well.  And the emphasis should be on constants and calculations that we explicitly round.

 

(Obviously, unrounded calculations with decimal fractions are susceptible to the anomalies of binary floating-point arithmetic, which can result in unexpected decimal fraction digits.)

 

And I take issue with the statement:  ``.848 [...] is evaluated [...] with a decimal portion of .8479999999``.  As I noted previously, when we enter 43863.89142, initially the value that appears to be 43863.8914199999 has the binary approximation of 43863.89142.  It is changed to ("evaluated as") the binary approximation of 43863.8914199999 only after we edit the line in the Formula Bar.

 

More importantly, the formatting defect is not limited to that integer range and that decimal fraction.  For example, for that integer range, other 3-digit fractions are 0.098, 0.223, 0.348, 0.473, 0.598, 0.723 and 0.973 as well as 0.848.

 

And @Viz's examples include 5-digit fractions in that integer range; for example, 43863.89142.

 

(Caveat: The following is TMI for most people.)

 

It is difficult to predict the decimal values that evoke the formatting defect.

 

And it is difficult for me to explain the conditions that evoke the formatting defect without resorting to the details of the binary representation.  Those conditions are:

 

1. The integer part of the number is 65535 or less.  If the integer part is zero, the decimal fraction must exceed 0.5.  And

 

2. The binary approximation of the decimal fraction can be represented in 32 bits; that is, all the bits to the right are zero.  And

 

3. The 16th significant digit of the exact decimal representation of the binary approximation is 5 or more.

 

I'm sure the 3rd condition is not explicit in the Excel implementation.  But we notice a formatting defect only when the 15th significant digit should be rounded up, and it is not.

@Hans Vogelaar 

 

Thank you, Hans.

 

It almost sorted the problem. But there were still a few numbers that it couldn't fix. As Joe mentioned, there seems to be some inherent problem in certain odd cases.

The solution that Sergei gave to use sumproduct in place of Countif fixed the problem.

@Viz 

After a lot of rummaging around on Google, I came to this result.

The short answer is, you can't edit your numbers the way you want after you've entered the numeric value and hit enter. The moment you do this, the number is truncated to 15 digits and shown as exponential.

 

This is not a bug, it is a function.

 

Everything you do after that uses the truncated value, so no formatting trick is helpful. However, you can use the “Text to columns“ option to convert the exponential notation to text (click on the column heading, click “Text to data“ then click Delimited, Next uncheck all delimiting boxes, select “Text in Column data format "And then from" Finish ") to convert the values ​​into text and immediately display them as a 15-digit number. But it will only be 15 digits so if you had a longer number the pause is lost.

 

To have the number in the worksheet exactly as you entered it, you need to save it as text. So you have to prepend an apostrophe or format cells as text before you can enter / copy values. If it "sometimes doesn't work" then you are either doing something wrong or you have some kind of autocorrect enabled (if you use an apostrophe and a large number> 15 digits, Excel will treat it as an incorrect value and mark the cell with a warning hence it is this remark is neither personal nor critical.

 

Another way to do this in bulk is to import values ​​from a text file. Dare I say it's the only way out for most situations. Be sure to import the file and not just open it, as Excel treats the csv type like the native format and of course cuts large numbers to 15 digits.

 

The bottom line is, if you type a large numeric value into an Excel cell formatted as text, it will continue to be displayed in scientific notation as long as the cell is not wide enough. It's annoying, but a side effect of some of Excel's internal algorithms. If you just make the cell wider you will see the full value every time.

 

A pleasant task for most simple users like me, when you have nothing to do or don't want to do anything.

 

With my little knowledge, I cannot 100% verify whether all of this is the case.

 

Who am I already! ... I know that I don't know anything.

Nikolino

 

 

@NikolinoDE  wrote: ``The moment you do this, the number is truncated to 15 digits and shown as exponential.``

 

Hey, Niko, I think you responded to the wrong thread.  If you agree, you can delete your response, and I'll delete mine.

 

Nothing about Viz's problem has anything to do with trying to enter more than 15 significant digits.

 

He is entering date and time perhaps in the form 2/1/2020 8:49:48 AM.  At least, that is how column C appears to me, due to my regional settings.  (My date is in the form MDY.)

 

@Joe User 

If the author thinks that the different behavior of the cells is not based on this (different formatting) and my comment for you means that it is bad / wrong / annoying, I of course take back my comment and apologize for the inconvenience.

You do not need to delete yours as compensation.

 

Comments are there to help further, to suggest possible solutions,

to reveal other solutions and so much more.

Comments are not there to disturb or irritate or to fulfill any end in itself.

 

Thank you for your understanding and patience

 

 

Nikolino

I know I don't know anything (Socrates)

@Viz  wrote: ``thank you for the alternative solution @Sergei Baklan``


I agree with @Sergei Baklan's SUMPRODUCT alternative to solve your problem with COUNTIFS. I suggest that you mark his response as "best" or "answer".

 

Ironically, you were correct in the first place: the original problem with COUNTIFS is indeed related the formatting defect that you stumbled across with your misguided rounding work-around. The formatting defect arises even without the explicit rounding.

 

I quibble with some details of Sergei's explanation of the issue. (See below.) But his SUMPRODUCT solution works because it treats the cell values in a consistent manner, in contrast to COUNTIFS.

 

Consider the date and time in C3 (2/1/2020 8:49:48 AM, which you format as 8:49).

 

The exact decimal representation of the binary approximation is 43862.3679166666,6977107524871826171875.

 

For the COUNTIFS condition that is effectively "<="&C3, Excel replaces C3 with the decimal representation. But Excel formats only up to 15 significant digits.

 

That should result in "<=43862.3679166667". And if it did, COUNTIFS would correctly count that as 1.

 

But unfortunately, that particular binary value meets the conditions that evoke the formatting defect, to wit: the integer part is less than 65536, and the remaining binary fraction can be represented in 32 bits or less (31).

 

So Excel formats the COUNTIFS condition as "<=43862.3679166666", failing to round up the 15th digit as it should, due to the formatting defect.

 

And since that is less than the value of the date and time in C3, COUNTIFS incorrectly does not count that.

 

With Sergei's SUMPRODUCT, the comparison expression is effectively C3<=C3. Of course, that is TRUE, no matter how Excel treats C3 in that context. So SUMPRODUCT correctly counts that as 1.

 

-----

 

@Sergei Baklan  wrote: ``Excel works by default with 15 digits and many functions ignore extra digits, but some work in background with 17 digits``

 

Actually, Excel works with the full binary precision by default. That is the cause of most arithmetic anomalies.

 

Excel never "works in the background with 17 digits".

 

But it is true that binary values can be approximated with 17 significant digits, with no loss of precision when converting back to binary. So Excel does represent binary values with up to 17 significant decimal digits when a workbook is saved as an XML file, which includes "xlsx" and "xlsm" files.

 

Excel never "stores 15 digits of precision", as many articles state incorrectly.

 

But Excel does format only up to 15 significant digits. And that is what causes the problem with Viz's COUNTIFS formulas, in conjunction with the formatting defect documented in KB 161234.

 

It is also true that for some comparisons, Excel rounds operands internally to 15 significant digits just for the purpose of the comparison.  That is the case with comparison operators ("=", "<", etc) and with COUNTIF[S].

 

(But interestingly, the internal rounding is not susceptible to the formatting defect.)

 

(And for some other comparisons, Excel compares the full binary precision. That is the case with lookup functions like MATCH, VLOOKUP, etc, for example.)

 

 

@Viz 

That's very interesting,

So the essence of this rounding issue seems to be for numbers like x=43863.89142,

=COUNTIF(x,x)
=> 1

as expected, but

=COUNTIF(x,"="&x)
=> 0 ?!

A safer approach is to convert date/times to integers for comparisons by multiplying by 24*60*60.

 

@lori_m 

As I remember such behaviour is for all <DO>IF() functions, e.g. =SUMIF(x,"="&x, x) =>0 vs

=SUMIF(x,x,x) =>x 

@Sergei Baklan 

Yes, and in general any functions or operators that take text type inputs will be similarly affected since expressions such as ""&x get incorrectly rounded. The SUMPRODUCT workaround you suggest takes numeric inputs and so is not affected.

@lori_m 

 

Thank you for the solution. Especially since Sumproduct doesn't spill, for DA this seems to be the best work around.

On a side note,  interestingly this problem arises only if the integer part is above 1024. If I try 1023.89142, it seems to work very well. But the moment it crosses 2^10, there is some problem. And again it is happening for 0.89142 but there is no problem for 0.89143 or 0.89141

@Viz 

The 1024 threshold looks to be significant, a recent MS Research podcast stated that the original C++ code predated IEEE-754 double precision standard, some parts still using an old 40-point format (see https://en.wikipedia.org/wiki/Microsoft_Binary_Format )

 

Since technical documentation is lacking, I had collected together some findings on numeric formats here, one being 'for accurate timing calculations serial numbers should be scaled by the number of milliseconds per day'.

It took me a while to process what you have written. Interestingly that the problem does not happen if integer is more than 65535. And I noticed that it didn't happen for values below 1024 either.
1 best response

Accepted Solutions
best response confirmed by allyreckerman (Microsoft)
Solution

@Viz 

In addition, Excel works by default with 15 digits and many functions ignore extra digits, but some work in background with 17 digits. More about that is here 17 or 15 digits of precision - Excel Tips - MrExcel Publishing

 

For this particular case if use

=SUMPRODUCT( ($C$1:$C$11<=C4)*($B$1:$B$11=B4))

instead of COUNTIFS() it returns correct result.

View solution in original post