06-16-2019 06:18 AM - edited 06-17-2019 05:16 PM
06-16-2019 06:18 AM - edited 06-17-2019 05:16 PM
I think this is an easy-to-reproduce bug with frequency() function in Excel. Please also look at the screenshot at the end.
As the Dellef_Lewin confirmed in his response below towards the end, this seems to be the issue of floating point numbers. The problem is that even the long-format display mechanism to display more decimals via format->number->decimal places is not correct. So the only way is that we must go to the underlying XML files and verify the numbers explicitly.
In summary, this is not the infamous fundamental precision problems with floating point numbers. This is the problem with the design inconsistency among EXCEL's basic functions: IF(), SORT(), RANK() and in the latest, the FREQUENCY() in terms of the number of decimal digits that are handled accurately. For example, IF() and its family handle 15 decimal digits only, where as FREQUENCY and others handle 17 decimal digits. So when IF says two numbers are equal, SORT() or FREQUENCY() may give opposite & surprising results.
I have already anticipated the error with the floating point numbers & their accuracy in this post, and I think that it is not the case here. When I displayed the numbers with 30 decimal digits precision on excel itself, I see that the numbers are equal. Note that eventhough IF() comparison confirms the same when compared, the IF() is known to have issues by itself).
Basically, I entered a series of numbers 1,1.1,...,3 using the following steps:
Create the bins in another column using the following steps:
Notes on debugging attempts & possible causes:
First of all, note that a non-extreme bin is formed with elements X such that low < X <= high.
Now can someone kindly explain where the problem is?
My intention is to generate a neat histogram by myself after this basic use of frequency() function to generate the values on x-axis and y-axis.
The numbers, when calculated using the mathematically identical formula (=A1+0.1 etc.) are different from the same result I type them explicitly!
06-16-2019 06:46 AM
06-16-2019 07:51 AM
That's floating point error, but it's not clear, at least for me, why in FREQUENCY. Equivalent formulas like
=COUNTIFS($A$1:$A$30,"<="&C8,$A$1:$A$30,">"&C7) or =COUNTIFS($A$1:$A$30,"<="&C8:C14,$A$1:$A$30,">"&C7:C13) as CSE =SUMPRODUCT(($A$1:$A$30<=C8)*($A$1:$A$30>C7))
etc., all return correct result on this sample, and only FREQUENCY is affected.
06-16-2019 01:11 PM
The reason for this could be how the numbers are processed in the different functions.
At least COUNTIFS() is known to cut numbers after 15 decimal places.
Ah, found something in my archive:
Thanks to MrExcel for solving the case.
06-16-2019 06:04 PM - edited 06-17-2019 05:23 PM
Is this a floating point error really? because when I checked numbers displayed in 30 precision digits, it shows correctly!
Probably looking at the underlying XML files is the only way.
Thanks to @Detlef_Lewin for confirming. I can consider the alternative workarounds that use 15 digits only.
06-17-2019 02:30 PM
Detlef, thank you for the links (and Bill with Jan Karel for the investigation and explanation of the issue), I had no idea about this before. Here https://sfmagazine.com/post-entry/august-2017-excel-rank-countif-and-floating-point-errors/ is bit more in conclusion part.
In general, using of different precision for SORT, RANK and FREQUENCY is kind of design bug. As well as considering 1900 as leap year and wrong using of negation with exponentiation http://mathforum.org/library/drmath/view/69058.html
07-11-2020 02:13 AM - edited 07-11-2020 12:36 PM
(Coming late to the party, I know. But FYI....)
@HarishVangala wrote: ``the only way is that we must go to the underlying XML files and verify the numbers``.
Actually, no. We can verify the differences directly in Excel.
In your example, enter formulas of the form =MATCH(ROUND(A1,1),A1,0) into E1:E14. Or more generally, =MATCH(--(A1&""),A1,0).
The #N/A errors demonstrate that the internal binary values starting in A3 are not the same as the decimal representations, which Excel arbitrarily limits to 15 significant digits (rounded).
Moreover, you can calculate the difference by entering formulas of the form
=SUM(A1,-(A1&"")) into F1:F14, formatted as General or (better) Scientific.
0.00E+00 is exact zero.
Numbers of the form 2.22E-16 are the approximate difference between internal binary values and their decimal representation that is limited to 15 significant digits.
(Caveat: Formulas of the form =A1-(A1&"") do not always show the infinitesimal difference, due to a dubious "trick" that Excel applies inconsistently hide the differences.)
You can avoid the binary anomalies by the changing the formulas in A2:A14 to the form =ROUND(A1+0.1,1).
@HarishVangala wrote: ``this is not the infamous fundamental precision problems with floating point numbers. This is the problem with the design inconsistency among EXCEL's basic functions``
It is both.
The root cause is anomalies associated with 64-bit binary floating-point arithmetic, which is not limited to Excel. The anomalies are considered to be industry-accepted design trade-offs of that binary form. They are not considered bugs. They are not limited to Excel.
However, Excel compounds the problem by design inconsistencies, which apply to basic "operations" (notably subtraction and comparisons) as well as some "basic functions". I agree that the design inconsistencies should be considered defects.
@HarishVangala wrote: ``For example, IF() and its family handle 15 decimal digits only, where as FREQUENCY and others handle 17 decimal digits``
It is just as misleading to think that Excel "stores" or "uses" 17-digit precision as it is to think that Excel "stores" 15-digit precision, a common misstatement in documentation.
It is true that Excel stores up to 17-digit precision in XML. But the 17-digit decimal representations are still just approximations.
The IEEE 794 standard specifies that 17 significant decimal digits are necessary and sufficient to convert between the 64-bit binary floating-point and the decimal representation with no loss of precision.
Nevertheless, Excel stores the exact binary value into memory when the application is running.
For example, the 15-digit approximation of =1/3 is 0.333333333333333, and the 17-digit approximation is 0.33333333333333331.
But the exact decimal representation of the internal binary is 0.333333333333333,314829616256247390992939472198486328125.
(I use period for the decimal point and comma to demarcate the first 15 significant digits.)