Forum Discussion

Stefano66's avatar
Stefano66
Copper Contributor
Mar 02, 2023
Solved

SUMIF and SUMIFS give wrong result

Hi

with this data

 

 

 

 

 

SUMIF(A1:A4,"IE0032860565",B1:B4) results in 1.13686837721616E-13.

The same does SUMIFS(B1:B4,A1:A4,"IE0032860565").

However, SUM(B1:B4) gives zero, which is the only correct result.

How can I have SUMIF and SUMIFS too giving zero?

I'm using Microsoft 365 Apps for enterprise on Windows 10 Pro Version 22H2 64 bit

Microsoft® Excel® per Microsoft 365 MSO (Versione 2211 Build 16.0.15831.20098) 32 bit

Thanks

Stefano Leoni

  • Stefano66 

     

    "If a picture is worth a thousand words, an Excel file is worth a thousand pictures". (wink)

     

    Please attach an Excel file, not an image.

     

    We cannot give you a dispositive explanation because the data is probably more precise that you show in the image.  The extra precision might even be invisible in the Excel file because Excel displays (formats) only up to 15 significant digits, rounded.

     

    But I can offer an example that duplicates the symptoms, albeit not the exact values.

     

    Suppose B1 is, in effect, the result of the following calculation:  =2878110+"4.66E-10".

     

    And for simplicity, suppose the string in A1:A4 is "a".

     

    Then =SUMIF(A1:A4,"a",B1:B4) returns 4.66E-10, as we should expect.

     

    But =SUM(B1:B4) returns 0.00E+00, which is exactly zero in Scientific format.

     

    SUM returns exactly zero because of a dubious decision that Excel makes, to wit:  if the two non-zero operands of the last subtraction (or last addition of oppositely-signed values) are "close enough", Excel substitutes the actual infinitesimal difference with exactly zero in an attempt to hide such "close" differences.

     

    But Excel did not implement that rule consistency.  In particular, SUMIF(S) and SUMPRODUCT do not implement the rule.  In fact, it is not implemented consistently within SUM.

     

    -----

     

    The infinitesimal (and possibly hidden) differences in B1:B4 are a common problem that arises from internal binary arithmetic calculations.

     

    In general, the work-around is to explicitly round the SUMIF(S) calculation to the number of decimal places that we expect to be accurate, based on the precision of the original data.

     

     

  • JoeUser2004's avatar
    JoeUser2004
    Bronze Contributor

    Stefano66 

     

    "If a picture is worth a thousand words, an Excel file is worth a thousand pictures". (wink)

     

    Please attach an Excel file, not an image.

     

    We cannot give you a dispositive explanation because the data is probably more precise that you show in the image.  The extra precision might even be invisible in the Excel file because Excel displays (formats) only up to 15 significant digits, rounded.

     

    But I can offer an example that duplicates the symptoms, albeit not the exact values.

     

    Suppose B1 is, in effect, the result of the following calculation:  =2878110+"4.66E-10".

     

    And for simplicity, suppose the string in A1:A4 is "a".

     

    Then =SUMIF(A1:A4,"a",B1:B4) returns 4.66E-10, as we should expect.

     

    But =SUM(B1:B4) returns 0.00E+00, which is exactly zero in Scientific format.

     

    SUM returns exactly zero because of a dubious decision that Excel makes, to wit:  if the two non-zero operands of the last subtraction (or last addition of oppositely-signed values) are "close enough", Excel substitutes the actual infinitesimal difference with exactly zero in an attempt to hide such "close" differences.

     

    But Excel did not implement that rule consistency.  In particular, SUMIF(S) and SUMPRODUCT do not implement the rule.  In fact, it is not implemented consistently within SUM.

     

    -----

     

    The infinitesimal (and possibly hidden) differences in B1:B4 are a common problem that arises from internal binary arithmetic calculations.

     

    In general, the work-around is to explicitly round the SUMIF(S) calculation to the number of decimal places that we expect to be accurate, based on the precision of the original data.

     

     

    • Stefano66's avatar
      Stefano66
      Copper Contributor

      JoeUser2004 

      Thanks for your reply.

      My actual data are the result of some calculations in a bigger table; however, for the sample of this post I typed the values directly in the interval B1:B4, so I suppose that no infinitesimal difference could be involved.

      Reading what you wrote I realize that the values shown in the image I posted could be misunderstood: I live in Italy and here we use the dot as thousand separator and the comma as decimal one, so actual values in US notation are 2,878.11, -959, -959 and -960.11.

      This said, I tried to apply the following formula to all my values:

      value*10^6 - INT(value*10^6)

      If an infinitesimal difference producing the result I got in the order of E-13 exists, it will be shown in the result of the formula; but I always obtain 0.00E+00, so my assumption that no infinitesimal hidden difference, at least a significant one, exists is correct.

      It seems that SUMIF and SUMIFS manipulate the values affected in a way that sometimes gives a difference, but it's not clear how and when they do it: with the whole of my data, the result is usually correct, only in two cases it's wrong.

      I think I'll go with the rounding work-around you suggested.

       

      • JoeUser2004's avatar
        JoeUser2004
        Bronze Contributor

        Hecatonchire  wrote:  ``Floating-point arithmetic may give inaccurate results in Excel ``

         

        While that explains (very poorly, IMHO) the root cause of the infinitesimal difference in the first place, it does not explain why SUM and SUMIF(S) give different results in this context.  For a hint, refer to the section "Example when a value reaches zero" in the cited MSFT article.

         

        -----

         

        Stefano66  wrote:  ``so actual values in US notation are 2,878.11, -959, -959 and -960.11``

         

        Well, of course!  My bad!  My tired eyes misread "2.878,110" as 2,878,110, so I didn't bother to question my (mis)interpretation.

         

        But the explanation of the difference between SUM and SUMIF(S) remains the same.

         

        The infinitesimal difference is due to internal binary anomalies.

         

        But the difference between SUM and SUMIF(S) is due to Excel's arbitrary and inconsistent application of the misleading heuristic with the last difference of operands that are "close enough".

         

        To demonstrate....

         

        1. =B1+B2+B3+B4 results in exactly zero -- 0.00E+00 when formatted as Scientific

        2. =B1+B2+B3+B4+0 results in 1.14E-13 (rounded)

        3. =(B1+B2+B3+B4) results in 1.14E-13

        4. =IF(B1+B2+B3+B4 = 0, TRUE) results in FALSE

        5. =SUM(2878.11,-959,-959,-960.11) results in 1.14E-13

        6. =SUM(B1,B2,B3,B4) results in 0.00E+00

         

        Obviously, adding zero (#2) or parentheses (#3) should not make a difference.  They make a difference because the last operation is no longer subtraction of two non-zero values.  (Think of parentheses as a grouping "operation".)

         

        IF(B1+B2+B3+B4 = 0, TRUE) returns FALSE (#4) because the heuristic only applies to the last difference in a formula, not an expression within a formula.

         

        The SUM difference between #5 and #6 is due to the (even more) arbitrary application of the heuristic within SUM.

         

        -----

         

        Stefano66  wrote:  ``my assumption that no infinitesimal hidden difference, at least a significant one, exists is correct.``

         

        No, it is not.  You are thinking decimal instead of binary.

         

        The root cause of the problem is:

         

        1. Numbers are represented by the sum of 53 consecutive powers of 2 ("bits").

         

        2. Most decimal fractions cannot be represented exactly even as an infinite sum of powers of 2, much less 53 consecutive powers of 2.

         

        3. The binary approximation of a particular decimal fraction might vary depending on the magnitude of the number because some bits must be used to represent the integer part, leaving fewer bits for the approximation of the decimal fraction.  That is why, for example, 10.01 - 10 = 0.01 returns FALSE(!).

         

        4. Multiplying by 1E+06 effectively adds shifted binary values (shifted by up to 19 binary places), resulting in a loss of precision when the sum is rounded to 53 consecutive powers of 2.  (There are other technical factors, such as binary "normalization".)

         

        Applying #1 through #3, we find that the exact decimal representations of the binary approximations of 2878.11, -960.11 and 2878.11-959-959 are:

         

        2878.11000000000012732925824820995330810546875

        -960.1100000000000136424205265939235687255859375

         960.11000000000012732925824820995330810546875

         

        Ignoring signs, the difference between -960.11 and 2878.11-959-959 arises because 2878.11-959-959 merely reduces the integer part of 2878.11 without changing the original less-precise binary approximation of 0.11 in that context (2878+0.11).

         

        Again, we cannot see the exact precision of these values because Excel formats only the first 15 significant digits (highlighted), rounded.

         

        (With apologies if that was TMI.)

Resources