Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. General Programming
  3. Algorithms
  4. fast division algoirthm

fast division algoirthm

Scheduled Pinned Locked Moved Algorithms
questionalgorithms
8 Posts 2 Posters 0 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • K Offline
    K Offline
    khomeyni
    wrote on last edited by
    #1

    hello I want to find a fast algorithm for computing 1/d , where d is double ( albeit it can be converted to integer) what is the best algorithm for this case of many algorithms(SRT , goldschmith,newton raphson, ...)?I'm writing my program in c language. thanks in advance.

    L 1 Reply Last reply
    0
    • K khomeyni

      hello I want to find a fast algorithm for computing 1/d , where d is double ( albeit it can be converted to integer) what is the best algorithm for this case of many algorithms(SRT , goldschmith,newton raphson, ...)?I'm writing my program in c language. thanks in advance.

      L Offline
      L Offline
      Lost User
      wrote on last edited by
      #2

      It depends. On the quality of the implementation, the number of bits that you care about, the platform, and even the compiler. There's no simple answer. You'll just have to try them - unless you tell more about the intended target etc. Some more options might be: - Avoiding the division altogether. Calculate with fractions. - Using hardware-supplied division, if it exists. On some processors it's not half bad. - Using hardware-supplied approximate reciprocal, optionally with refinement step. x86 has had RCPSS[^] for a while now, it's a lot faster than actually dividing, even if you do a refinement step. 3DNow! actually had better support for fast reciprocals but it's AMD specific and obsolete. - Using crazy IEEE-format-specific tricks such as this[^]

      K 1 Reply Last reply
      0
      • L Lost User

        It depends. On the quality of the implementation, the number of bits that you care about, the platform, and even the compiler. There's no simple answer. You'll just have to try them - unless you tell more about the intended target etc. Some more options might be: - Avoiding the division altogether. Calculate with fractions. - Using hardware-supplied division, if it exists. On some processors it's not half bad. - Using hardware-supplied approximate reciprocal, optionally with refinement step. x86 has had RCPSS[^] for a while now, it's a lot faster than actually dividing, even if you do a refinement step. 3DNow! actually had better support for fast reciprocals but it's AMD specific and obsolete. - Using crazy IEEE-format-specific tricks such as this[^]

        K Offline
        K Offline
        khomeyni
        wrote on last edited by
        #3

        I,m writing a bit of a large programme and within it there is some divisions, the programme must be implemented on fpga and in fpga there is no division.so I must write a fast algorithm for all divisions , I don't have any access to hardware and I must implement it on software level. I have read some algorithm from here , but I don't know which one to use.

        L 1 Reply Last reply
        0
        • K khomeyni

          I,m writing a bit of a large programme and within it there is some divisions, the programme must be implemented on fpga and in fpga there is no division.so I must write a fast algorithm for all divisions , I don't have any access to hardware and I must implement it on software level. I have read some algorithm from here , but I don't know which one to use.

          L Offline
          L Offline
          Lost User
          wrote on last edited by
          #4

          So, you can't add a hardware divider to the FPGA then? Or fast reciprocal support? Anyway it depends. Does it have fast multiplication? If not, well, that's a problem, you could only implement the slow methods then. If you have fast multiplication and IEEE floats, you can use the weird trick I linked to in my previous post with a couple of refinement steps. That's really just Newton–Raphson division with a simpler calculation for the initial approximation (but afaik it still only takes 3 refinements for single-precision floats, just like the regular initial approximation). Fast reciprocal support works that way too - give a fast initial approximation (handling the exponent right and getting significant bits from a lookup table, if you get 12 significant bits that way you only need one refinement step for single-precision or, 13 are enough to get 2 steps for double-precision) and optionally have instructions that help implement the refinement step (like AMD's PFRCPIT1 and PFRCPIT2), for example to calculate Y = (1 - D*X) and to calculate X + X * Y. Even without those tricks Newton–Raphson division is still not bad, with the linear approximation it takes only 4 refinements for double-precision floats, but it also takes some annoying exponent adjustments to get in the right range first (in hardware that wouldn't be half as annoying). Goldschmidt division is, afaik, roughly equivalent in performance and might have a slightly less complex implementation. It's really the same sort of deal - trickery with the exponent to get in the right range, the "2 - something" estimation trick (which is rearranged in Newton-Raphson division, but it's really the same thing), and doing the refinement step until all the bits are right. It just looks a little different.

          K 1 Reply Last reply
          0
          • L Lost User

            So, you can't add a hardware divider to the FPGA then? Or fast reciprocal support? Anyway it depends. Does it have fast multiplication? If not, well, that's a problem, you could only implement the slow methods then. If you have fast multiplication and IEEE floats, you can use the weird trick I linked to in my previous post with a couple of refinement steps. That's really just Newton–Raphson division with a simpler calculation for the initial approximation (but afaik it still only takes 3 refinements for single-precision floats, just like the regular initial approximation). Fast reciprocal support works that way too - give a fast initial approximation (handling the exponent right and getting significant bits from a lookup table, if you get 12 significant bits that way you only need one refinement step for single-precision or, 13 are enough to get 2 steps for double-precision) and optionally have instructions that help implement the refinement step (like AMD's PFRCPIT1 and PFRCPIT2), for example to calculate Y = (1 - D*X) and to calculate X + X * Y. Even without those tricks Newton–Raphson division is still not bad, with the linear approximation it takes only 4 refinements for double-precision floats, but it also takes some annoying exponent adjustments to get in the right range first (in hardware that wouldn't be half as annoying). Goldschmidt division is, afaik, roughly equivalent in performance and might have a slightly less complex implementation. It's really the same sort of deal - trickery with the exponent to get in the right range, the "2 - something" estimation trick (which is rearranged in Newton-Raphson division, but it's really the same thing), and doing the refinement step until all the bits are right. It just looks a little different.

            K Offline
            K Offline
            khomeyni
            wrote on last edited by
            #5

            thanks harold for your complete and good reply I cannot add a hardware to the FPGA (the program will be complicated) and don't have any reciprocal support. I wrote some algorithms as follows( from the pages you send to me and from others):

            void test1(){
            const long MAX_Iter=10000000;//10^7
            int *ar1=new int [MAX_Iter];/// array of integers to find the reciprocal
            double *R=new double [MAX_Iter];//array of double for saving the regular division 1/x
            double *R2=new double [MAX_Iter];//array of double for saving the algorithms division 1/x
            double *ERROR=new double [MAX_Iter];
            double max_error=pow(10,-100.0);;

            /\*
            \*  generate some number bigger than 2
            \*/
            for(int i=0;i
            
            L 1 Reply Last reply
            0
            • K khomeyni

              thanks harold for your complete and good reply I cannot add a hardware to the FPGA (the program will be complicated) and don't have any reciprocal support. I wrote some algorithms as follows( from the pages you send to me and from others):

              void test1(){
              const long MAX_Iter=10000000;//10^7
              int *ar1=new int [MAX_Iter];/// array of integers to find the reciprocal
              double *R=new double [MAX_Iter];//array of double for saving the regular division 1/x
              double *R2=new double [MAX_Iter];//array of double for saving the algorithms division 1/x
              double *ERROR=new double [MAX_Iter];
              double max_error=pow(10,-100.0);;

              /\*
              \*  generate some number bigger than 2
              \*/
              for(int i=0;i
              
              L Offline
              L Offline
              Lost User
              wrote on last edited by
              #6

              "test4" is doing a value-converting cast, while the algorithm is meant to be used with a reinterpret_cast: take the bit-pattern of the float, and use it as though it was an integer. Depending on the available commands, points may not be necessary to implement that. Perhaps if you tell me how this FPGA will work I can work out some way to implement that algorithm on it. I'm not sure that Goldschmidt division is right - well actually the implementation is probably right and I've seen it before, but what it's doing (calculating the fixedpoint inverse of a fixedpoint input, I think?) doesn't match its usage there. That would also explain its super high error.

              K 1 Reply Last reply
              0
              • L Lost User

                "test4" is doing a value-converting cast, while the algorithm is meant to be used with a reinterpret_cast: take the bit-pattern of the float, and use it as though it was an integer. Depending on the available commands, points may not be necessary to implement that. Perhaps if you tell me how this FPGA will work I can work out some way to implement that algorithm on it. I'm not sure that Goldschmidt division is right - well actually the implementation is probably right and I've seen it before, but what it's doing (calculating the fixedpoint inverse of a fixedpoint input, I think?) doesn't match its usage there. That would also explain its super high error.

                K Offline
                K Offline
                khomeyni
                wrote on last edited by
                #7

                actually I don't know anymore about it's implementation , my project manager say's something to me and I must do the desired algorithm. also I don't study anything about fpga and hardware implementations. some thing that I know is that he wants to speed up his program by writing it in C , for example 6*6 inversion program which in it there is some 1/d where d is float(we can convert it to integer by multiplying it with 2^n) . he has wrote his code in labveiw and wants to convert it to C language. then it will be implemented on fpga.

                L 1 Reply Last reply
                0
                • K khomeyni

                  actually I don't know anymore about it's implementation , my project manager say's something to me and I must do the desired algorithm. also I don't study anything about fpga and hardware implementations. some thing that I know is that he wants to speed up his program by writing it in C , for example 6*6 inversion program which in it there is some 1/d where d is float(we can convert it to integer by multiplying it with 2^n) . he has wrote his code in labveiw and wants to convert it to C language. then it will be implemented on fpga.

                  L Offline
                  L Offline
                  Lost User
                  wrote on last edited by
                  #8

                  Ok I don't understand anymore. So you're making code that will be implemented on an FPGA itself, in the fabric, not as a program that is run on the processor that you're turning an FPGA into? Then why not use the special tricks that work well in hardware?

                  1 Reply Last reply
                  0
                  Reply
                  • Reply as topic
                  Log in to reply
                  • Oldest to Newest
                  • Newest to Oldest
                  • Most Votes


                  • Login

                  • Don't have an account? Register

                  • Login or register to search.
                  • First post
                    Last post
                  0
                  • Categories
                  • Recent
                  • Tags
                  • Popular
                  • World
                  • Users
                  • Groups