I hate floating point operations

  • K KaRl

    <Using MFC>

    double dValue = atof("0.1");
    ASSERT(dValue == 0.1);
    double dSecondValue = (1 + dValue + dValue + dValue + dValue);
    ASSERT(dSecondValue == 1.4); // Crash


    Where do you expect us to go when the bombs fall?

    Fold with us! ¤ flickr

    PIEBALDconsult
    wrote on last edited by
    #7

    Maybe there should be a "rookie mistakes" forum in here.

    • R Ravi Bhavnani

      Anyone who dares to equality-compare floating point values with literals probably doesn't have an understanding of basic computer architecture. :) /ravi

      KaRl
      wrote on last edited by
      #8

      I may have oversimplified. The case was more like the following:

      double dTime = 0.;
      double dT = atof(<some value read in a file>);
      double dFinal = atof(<some value read in a file>);
      do {
          ...
          dTime += dT;
          ...
      } while (dTime < dFinal);

      A loop iteration was missing because of the 'epsilon' induced by atof.
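
One possible way around an iteration count that depends on accumulated rounding is to derive the count once, as an integer, and drive the loop with it. The sketch below is not KaRl's code: `countSteps` and `countByAccumulation` are hypothetical helpers, and rounding to the nearest integer assumes that dFinal is meant to be a whole multiple of dT.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: number of steps of size dT needed to reach dFinal.
// Rounding to the nearest integer absorbs the tiny representation error in
// dFinal / dT (assumption: dFinal is intended to be a whole multiple of dT).
long countSteps(double dFinal, double dT)
{
    assert(dT > 0.0);
    return std::lround(dFinal / dT);
}

// The accumulating loop from the post, reduced to counting its iterations.
// The count depends on how the rounding errors in dTime happen to add up.
long countByAccumulation(double dFinal, double dT)
{
    assert(dT > 0.0);
    long iterations = 0;
    double dTime = 0.0;
    do {
        dTime += dT;
        ++iterations;
    } while (dTime < dFinal);
    return iterations;
}
```

With the integer count in hand, the current time can be recomputed inside the loop as `i * dT` instead of being accumulated, so the error does not grow from one iteration to the next.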


        KaRl
        wrote on last edited by
        #9

        With yours, that makes two of them.


          Chris Maunder
          wrote on last edited by
          #10

          I really do think the compiler should throw an error when you try to compare floating point values for equality.

          cheers, Chris Maunder

          CodeProject.com : C++ MVP

            PIEBALDconsult
            wrote on last edited by
            #11

            At least three, actually. Hmmmm... four... Incomplete, in no particular order, and without admitting to which ones I've committed.

            General:
            1) Trying to swap two values without an intermediary
            2) Not realizing the limitations of floating-point numbers
            3) Various ways of introducing infinite loops
            3a) Including infinite recursion
            3a1) Especially with properties
            4) Tests which either always pass or always fail

            In C languages:
            1) Accidentally using assignment in a test
            2) Accidentally falling through in switches (not in C#)

            In SQL:
            1) Not understanding implicit conversions
            2) Naming tables, columns, etc. with reserved words, and not knowing about [] (if available)

              Rick York
              wrote on last edited by
              #12

              In VS2003 the float.h header has the following definitions :

              #define DBL_EPSILON 2.2204460492503131e-016 /* smallest such that 1.0+DBL_EPSILON != 1.0 */

              This is a very handy value to use when comparing floating point values. A tactic similar to this can be used :

              double value = ComputeValue();
              double delta = fabs( value - expectedValue );
              if( delta <= DBL_EPSILON )
                  TRACE( "values are considered to be equal\n" );

                Warren Stevens
                wrote on last edited by
                #13

                Rick York wrote:

                if( delta <= DBL_EPSILON )

                This is still not foolproof, as the floating point round-off errors can accumulate, depending on your calculations. If there is enough error in your calculations (try using log() or tan() near their "blow up" values, if you want really bad results, really quickly) then DBL_EPSILON will not be sufficient. Unfortunately (having seen this problem in action for many years) there is no one-line solution to this problem. The proper comparison will depend on your calculations, the input values, and what you are using your results for.
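
As an illustration of why the tolerance has to follow the calculation, here is a minimal sketch of a comparison that combines an absolute tolerance (for values near zero) with a relative one (for everything else). `nearlyEqual` and both default tolerances are assumptions made for this example, not something proposed in the thread; suitable values depend on the calculation, as the post says.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: true when a and b differ by at most absTol, or by at
// most relTol times the larger of the two magnitudes.
// The default tolerances are placeholders, not recommendations.
bool nearlyEqual(double a, double b, double relTol = 1e-9, double absTol = 1e-12)
{
    assert(relTol >= 0.0 && absTol >= 0.0);
    const double diff = std::fabs(a - b);
    if (diff <= absTol)   // covers values that are both very close to zero
        return true;
    const double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= relTol * scale;
}
```

A fixed `DBL_EPSILON` threshold only makes sense for values near 1.0; scaling the tolerance by the magnitude of the operands is one common way to relax that restriction, and it still needs tuning when errors accumulate over many operations.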

                www.IconsReview.com <-- Huge list of stock icon collections (both free and commercial)

                  Rick York
                  wrote on last edited by
                  #14

                  Very true and that's why I said, "A tactic similar to this can be used." Personally I always compare to a "tolerance" value and that works well. The big issue is - what do you use for a tolerance value ? That varies according to the circumstances as you said.

                    Tim Smith
                    wrote on last edited by
                    #15

                    Ravi's statement still holds. Floating point addition is bad, multiplication is good. What is 20.0 + 0.000000000000000000000000000001? It is 20: there isn't enough mantissa to hold all the digits. Add in the fact that floating point is basically base 2 while our math is base 10, and floating point doesn't have much hope of representing numbers exactly. That is why banks used such things as scaled integers.
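
The absorption effect described here can be checked directly. The sketch below assumes IEEE 754 doubles; `spacingAbove` and `isAbsorbed` are hypothetical helper names introduced for the illustration.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <limits>

// Hypothetical helper: distance from x to the next representable double above it.
double spacingAbove(double x)
{
    assert(std::isfinite(x));
    return std::nextafter(x, std::numeric_limits<double>::infinity()) - x;
}

// Hypothetical helper: true when adding small to big leaves big unchanged,
// i.e. small is lost entirely when the sum is rounded.
bool isAbsorbed(double big, double small)
{
    const double sum = big + small;
    return sum == big;
}
```

Near 20.0 neighbouring doubles are about 3.6e-15 apart, so an addend of 1e-30 falls far below half that spacing and the sum rounds back to 20.0, while `DBL_EPSILON` added to 1.0 survives because it is exactly the spacing at 1.0.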

                    Tim Smith I'm going to patent thought. I have yet to see any prior art.

                      Warren Stevens
                      wrote on last edited by
                      #16

                      Rick York wrote:

                      Very true and that's why I said, "A tactic similar to this can be used."

                      Don't take any offense. I wasn't trying to be pedantic (or bust your chops on the subject); I just wanted any newbie readers to be clear that there isn't a one-liner fix to the problem. After all, this is the subtle bugs board.

                      Rick York wrote:

                      The big issue is - what do you use for a tolerance value ?

                      Yes! :sigh: the million dollar question...


                        Tim Smith
                        wrote on last edited by
                        #17

                        Do a Google search for "compare two floating point values" and you can find an article that talks about comparing two floats using their bit patterns (a.k.a. *((int *)&value)).

                          KaRl
                          wrote on last edited by
                          #18

                          Rick York wrote:

                          what do you use for a tolerance value ?

                          Something adapted to the context, but the risk of a mistaken test result will always exist.


                            KaRl
                            wrote on last edited by
                            #19

                            Tim Smith wrote:

                            Ravi's statement still holds. Floating point addition is bad, multiplication is good.

                            Mine still holds too: beware of atof. I believe you could get the same result without any addition or multiplication (and I doubt that multiplication is good either). The introduction of an epsilon by atof is not mentioned in the documentation. Some might be fooled.

                            Tim Smith wrote:

                            scaled integers

                            Replacing a double by a structure of an integer and a floating point position?


                              KaRl
                              wrote on last edited by
                              #20

                              A warning may be sufficient, like the one for a ';' after an 'if'. In my case, that would not have been enough. The guys who wrote that code didn't believe in warnings. When I reactivated the compiler option, over 1,400 warnings popped up at the first rebuild. Yippee.


                                Lost User
                                wrote on last edited by
                                #21

                                K(arl) wrote:

                                over 1,400 warnings popped up at the first rebuild

                                :omg: A cardinal sin. Everything we do here is warning level 3 or higher, with "warning as errors" on release builds.


                                Kicking squealing Gucci little piggy.
                                The Rob Blog

                                  Dan Neely
                                  wrote on last edited by
                                  #22

                                  K(arl) wrote:

                                  Tim Smith wrote: scaled integers Replacing a double by a structure of an integer and a floating point position?

                                  Possible, I suppose, but storing the value in cents, not dollars, would be a simpler method.
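
A minimal sketch of the cents idea, assuming amounts fit in a 64-bit integer and that two decimal places are enough. `Cents`, `addRepeated` and `formatCents` are hypothetical names for this example only.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical alias: a money amount held as a whole number of cents.
using Cents = std::int64_t;

// Add step to start, count times. Integer addition is exact, so no drift.
Cents addRepeated(Cents start, Cents step, int count)
{
    assert(count >= 0);
    Cents total = start;
    for (int i = 0; i < count; ++i)
        total += step;
    return total;
}

// Render an amount such as 140 as "1.40"; the decimal point exists only in the output.
std::string formatCents(Cents amount)
{
    const char* sign = amount < 0 ? "-" : "";
    const Cents magnitude = amount < 0 ? -amount : amount;
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "%s%lld.%02lld", sign,
                  static_cast<long long>(magnitude / 100),
                  static_cast<long long>(magnitude % 100));
    return buffer;
}
```

Applied to the original example, 1 + 0.1 + 0.1 + 0.1 + 0.1 becomes `addRepeated(100, 10, 4)`, which is exactly 140 and can safely be compared with `==`.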

                                  -- Rules of thumb should not be taken for the whole hand.

                                    Kochise
                                    wrote on last edited by
                                    #23

                                    Try this; it is what I use in all of my code :

                                    double dValue = atof("0.1");
                                    double dTest = 0.1;
                                    ASSERT
                                    (
                                    ((*((LONGLONG*)&dValue))&0xFFFFFFFFFFFFFF00)
                                    == ((*((LONGLONG*)&dTest)) &0xFFFFFFFFFFFFFF00)
                                    );

                                    double dSecondValue = (1 + dValue + dValue + dValue + dValue);
                                    double dTest2 = 1.4;
                                    ASSERT
                                    (
                                    (*((LONGLONG*)&dSecondValue)&0xFFFFFFFFFFFFFF00)
                                    == (*((LONGLONG*)&dTest2) &0xFFFFFFFFFFFFFF00)
                                    ); // *NO* Crash

                                    By reducing the mantissa's precision (skipping the trailing bits) through an integer cast (much like a union over a double), you can do some pretty decent comparisons with no headache... By using float (4 bytes) instead, you could simplify things to :

                                    float dValue = atof("0.1");
                                    float dTest = 0.1;
                                    ASSERT
                                    (
                                    ((*((int*)&dValue))&0xFFFFFFF0)
                                    == ((*((int*)&dTest)) &0xFFFFFFF0)
                                    );

                                    float dSecondValue = (1 + dValue + dValue + dValue + dValue);
                                    float dTest2 = 1.4;
                                    ASSERT
                                    (
                                    (*((int*)&dSecondValue)&0xFFFFFFF0)
                                    == (*((int*)&dTest2) &0xFFFFFFF0)
                                    ); // *NO* Crash

                                    The problem comes mostly because the compiler code which converts the literal in double dTest = 0.1 is *NOT* the same as the code within atof which converts the string in double dValue = atof("0.1"). So you don't get a bitwise exact match of the value, only a close approximation. By using the cast technique, you :
                                    1- can control over how many bits you want to perform the comparison
                                    2- do a full integer comparison, which is faster by far than loading floating point registers to do the same
                                    3- etc...

                                    So define the following macros :

                                    #define DCMP(x,y) (((*((LONGLONG*)&(x)))&0xFFFFFFFFFFFFFF00)==((*((LONGLONG*)&(y)))&0xFFFFFFFFFFFFFF00))
                                    #define FCMP(x,y) (((*((int*)&(x)))&0xFFFFFFF0)==((*((int*)&(y)))&0xFFFFFFF0))

                                    Use DCMP on double, and FCMP on float... But beware, you cannot do this :

                                    ASSERT(DCMP(atof("0.1"),0.1)); // atof returns a temporary, which has to be stored in a variable before its address can be taken

                                    The following code works :

                                    #define FCMP(x,y) (((*((int*)&(x)))&0xFFFFF000)==((*((int*)&(y)))&0xFFFFF000))

                                    float dSecondValue = atof("1.4"); // RAW : 0x3FB33333
                                    float dTest2 = 1.39999; // RAW : 0x3FB332DF, the last 12 bits differ, so don't compare them
                                    ASSERT(FCMP(dSecondValue,dTest2)); // *NO* Crash

                                    Kochise

                                    EDIT : you could have used a memcmp approach, which is similar in functionality, but you can only test on byte boundaries (the unit of comparison length is a byte) and x86 is little endian, so you start comparing the differing bytes first,

                                      Tim Smith
                                      wrote on last edited by
                                      #24

                                      Read any book on the issues of floating point math and it will tell you that floating point addition is inherently more imprecise than floating point multiplication. For example, this is bad. You accumulate a small error on every iteration:

                                      float x = 10;
                                      for (int i = 0; i < 1000; i++)
                                          x += 0.05;

                                      This is much better but can still have a problem with the addition:

                                      float x = 10;
                                      for (int i = 0; i < 1000; i++)
                                          float x1 = x + (i * 0.05);
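
The two loops can be put side by side as functions to make the difference measurable. This is a sketch rather than the poster's code: `accumulate` and `scale` are hypothetical names, and the step is passed as a float so that both versions round in single precision.

```cpp
#include <cassert>
#include <cmath>

// Repeated addition: every iteration rounds the running total,
// so the individual rounding errors can pile up.
float accumulate(float start, float step, int count)
{
    assert(count >= 0);
    float x = start;
    for (int i = 0; i < count; ++i)
        x += step;
    return x;
}

// Multiply once, add once: only two roundings regardless of count.
float scale(float start, float step, int count)
{
    assert(count >= 0);
    return start + static_cast<float>(count) * step;
}
```

For a start of 10, a step of 0.05 and 1000 steps the mathematically exact answer is 60. Comparing `std::fabs(accumulate(10.0f, 0.05f, 1000) - 60.0f)` with the same expression for `scale` shows how much each approach drifts on a given platform.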

                                        Tim Smith
                                        wrote on last edited by
                                        #25

                                        Your code still doesn't work since it suffers from boundary conditions. For example:

                                        0xFFFFFF00
                                        0xFFFFFEFF

                                        These are two very close floating point numbers, but your test will fail. Also, there are problems with -0 and +0.

                                        bool CMP (float x, float y, int tol)
                                        {
                                            int ix = *((int *) &x);
                                            int iy = *((int *) &y);
                                            if (ix < 0) ix = 0x80000000 - ix;
                                            if (iy < 0) iy = 0x80000000 - iy;
                                            return abs (ix - iy) <= tol;
                                        }

                                        This fixes the boundary condition and the +0, -0 issue. However, it still has problems with such things as +inf and -inf being close to +/- FLT_MAX, and other issues with special floating point bit patterns.
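
The same idea can be sketched with the remaining caveats handled explicitly. This variant is an illustration, not Tim Smith's code: it assumes 32-bit IEEE 754 floats, reads the bits with `memcpy` instead of a pointer cast, widens to 64 bits so the subtraction cannot overflow, and treats NaN and infinity as special cases. `orderedBits` and `almostEqualUlps` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::int32_t), "sketch assumes 32-bit float");

// Map a float's bit pattern onto an integer scale ordered like the floats
// themselves, with +0 and -0 both mapping to 0.
std::int64_t orderedBits(float value)
{
    std::int32_t bits;
    std::memcpy(&bits, &value, sizeof bits);   // well-defined way to read the bits
    if (bits < 0)                               // sign bit set: negative float
        return static_cast<std::int64_t>(INT32_MIN) - bits;
    return bits;
}

// True when x and y are at most maxUlps representable floats apart.
bool almostEqualUlps(float x, float y, std::int64_t maxUlps)
{
    assert(maxUlps >= 0);
    if (std::isnan(x) || std::isnan(y))
        return false;                           // NaN never compares equal
    if (std::isinf(x) || std::isinf(y))
        return x == y;                          // infinity only matches itself
    return std::llabs(orderedBits(x) - orderedBits(y)) <= maxUlps;
}
```

Because the distance is counted in representable values rather than in masked bits, two neighbours on either side of a power of two are still one step apart, which is the boundary case the masking macros miss.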

                                          Kochise
                                          wrote on last edited by
                                          #26

                                          My macro can be of great help if you know where you are stepping, e.g. when dealing with a strictly positive set of numbers, or a strictly negative set, without mixing the two. However, the test case only works with 0xFF... values padded with 0, not like your 0xFFFFFEFF example. I think you wanted to say 0xFFFFFE00, which is correct :)

                                          Kochise

                                          PS : If I remember right, there is a 'magical trick' explained in an article on CP which explains how to cast double to float and back using only integer operations; it works pretty well and fast, and also deals with the sign...

                                          In Code we trust !
