Distribution of floating-point operations in scientific computing

Daniel Pfeffer

I remember reading that someone had performed such an analysis, but I can't find any pointers to it. The idea was that additions/subtractions are more common than multiplications, which in turn are much more common than divisions/square root. This implies that optimizing the less common operations is likely to give a lower return than optimizing the more common operations. As I said, my Google-fu is non-functional today. :(

If you have an important point to make, don't try to be subtle or clever. Use a pile driver. Hit the point once. Then come back and hit it again. Then hit it a third time - a tremendous whack. --Winston Churchill

Daniel Pfeffer

Kornfeld Eliyahu Peter wrote:

For statistical purposes it is 25% each :)

Actually, it isn't. A review of floating-point programs that I have written shows that addition/subtraction is more common than multiplication, and these are much more common than division/square root. I am writing various floating-point libraries, and would like this information so I can know where to spend my optimization time.

If you have an important point to make, don't try to be subtle or clever. Use a pile driver. Hit the point once. Then come back and hit it again. Then hit it a third time - a tremendous whack. --Winston Churchill

Chris C B

Looks to me that every body is wrong. There are clearly more zeros than ones. Each byte is packed with leading zeros. The ones are big-time losers. QED. :laugh:

Jochen Arndt

Wrong thread?

Chris C B

Woops! Yes, it should be the thread below. :-O You will understand my difficulty when you see my next thread. :laugh:

Rage

What for ?

Do not escape reality : improve reality !

Daniel Pfeffer

I'm writing a floating-point package in C++ that provides: 1. A full implementation of the binary part of the IEEE-754-2008 Standard for Floating-Point Arithmetic (single-, double- and quad-precision) 2. Implementation of higher-precision formats, compatible with the Standard (up to binary1024). I have a basic implementation written using the "standard" algorithms, and would like some idea of where to invest time on improvements. Obviously, spending a lot of time on an operation that is rarely executed is not the best use of my time... :)

If you have an important point to make, don't try to be subtle or clever. Use a pile driver. Hit the point once. Then come back and hit it again. Then hit it a third time - a tremendous whack. --Winston Churchill

DaveAuld

Not quite, the thread below the thread below... take a step away from keyboard......

Dave Find Me On:Web|Youtube|Facebook|Twitter|LinkedIn Folding Stats: Team CodeProject

Jochen Arndt

Daniel Pfeffer wrote:

I'm writing a floating-point package in C++

That was not clear from your original question. So I will dig in here: I would not think about that. All basic operations will be used often (more or less) and should be therefore optimised as far as possible. Because division is the slowest operation it might be the first candidate even used probably less than the other operations. When a calculation uses divisions, a better implementation would probably reduce the overall calculation time by a greater factor than without division optimsation but with addition and multiplication optimisation.

Chris C B

That isn't a thread - it's just a single post. ;P :-\ Anyway, I don't use a keyboard, I just use my psychic powers to make the words appear on the screen. :laugh:

Daniel Pfeffer

OK, that makes sense. Thanks.

If you have an important point to make, don't try to be subtle or clever. Use a pile driver. Hit the point once. Then come back and hit it again. Then hit it a third time - a tremendous whack. --Winston Churchill

Jochen Arndt

You are welcome. It is an interesting and challenging topic. Did you plan to publish it as an article?

Daniel Pfeffer

Eventually - yes. The code works for the few problems that I've thrown at it, but that's not good enough (see the Pentium bug...). My biggest problem is finding an appropriate test suite; most of them cost an arm and a leg, and I can't justify spending that sort of money on a hobby. :(

If you have an important point to make, don't try to be subtle or clever. Use a pile driver. Hit the point once. Then come back and hit it again. Then hit it a third time - a tremendous whack. --Winston Churchill

patbob

The distribution of operations depends on the problem set. However, you might be able to take some general guidelines from the evolution of computers themselves. Addition/subtraction came first, with floating point units being added later. If you look at those floating point units, you'll probably see that later ones implemented more operators. On the other hand, if you look at GPUs, they've always had floating point hardware -- those problem sets were never tractable in real time until floating point hardware existed. As for testing, the best way I found was to look at the architecture of the hardware, and design a test that tested it. For example, the old VAX FPUs used a nibble lookup table for multiplication, so I concluded that I needed to test every pattern in that lookup table to know if the hardware was OK. That did not reliably happen by simply pounding a lot of math-happy code at the FPU -- it required a specially created dataset that could be proven to be exercising each entry in the lookup table. If your hardware doesn't use a nibble lookup table, that test would likely be useless since it might not achieve full coverage.

We can program with only 1's, but if all you've got are zeros, you've got nothing.

englebart

The most floating point math I have seen recently was in a mapping package. It was heavily loaded with trigonometry functions as you can imagine. I could see the optimizations for those functions varying heavily dependent on the bit size. (per your 1024 bit precision capability)

kalberts

Even "real" numbers can turn out to be misleading, if the background for the figures are not completely understood. Such as: 30+ years ago I was working on a computer which had an extreme FPU (it filled about half a square meter of circuit board). It was so fast that for integer multiply and divide, the 32 bit integer value was internally converted to a 64 bit floating point value, the operation performed by the FPU, and the result converted back to integer format. So a count of FP multiply/divide operations would count integer operations as well. Another case: At my university, the IT people running the huge mainframe (this was many years ago) attached a counter to the Divide by Zero flag, and discovered that every single day, literally tens of millions of divide by zero was performed. For a few days, there was a big uproar in the IT department over the "low code quality" causing so many exceptions - until one of the mechanical engineering guys noticed the worries and explained that this was quite normal and expected: Some of the standard matrix operations would generate partial results where some number indeed was divided by zero, but the algorithm did not make use of those partial results. So there was no "real" need to perform those divisions at all; it was just a consequence of using a standard matrix library operating on all elements rather than those actually used. If you didn't know, you might have spent lots of time speeding up the processing of the Divide by Zero exception, which might have been a waste. When you ask for other people's use of a certain mechanism, you will not know the context from which these figures were drawn. If you collect data from two dozen independent sources, you might get an idea about the "typical" figures, but they might be completely off for one specific application domain. To illustrate: This machine with the half sqare meter FPU were mostly used in engineering applications, where FP performance was at a premium. For business use, you could choose the BCD option. Business applications hardly do division at all, so there was no BCD divide hardware - it was implemented purely in microcode, and it was dead slow! But no customer complained over it: They never discovered, because they never used BCD divide. For comparison: FP divide started with a table lookup for the two operands, giving the first 11 bits correct, followed by 1-clock-cycle iterations, each iteration doubling the number of correct result bits. Finally, 1 cycle was requred for normalizi

TNCaver

Wouldn't that vary from one application to the next, depending on the purpose of the app?

If you think 'goto' is evil, try writing an Assembly program without JMP.

JRickey

If the less common operations are dramatically slower than the common ones, it still may be worth it to optimize them. Take a look at the speed comparisons at Integer and Floating-Point Arithmetic Speed vs Precision[^]. Consider the Core i7-4770 floating point graph for 32-bit operations, indicating multiplication takes about 3 times as long as addition. If addition occurs 75% of the time and multiplication 25%, you will spend the same time on each. The decision might be influenced by which operation would be easier to optimize and which would produce the greater gain once optimized. (I see Jochen Arndt gave similar advice. This puts some numbers to it for you.)

Lost User

In terms of which algorithm gets run most often, I'd guess the Sieve of Eratosthenes. :doh: You might want to profile that. Or anything "floating" in the more popular encryption techniques (if that's possible). And bitcoin mining.

"(I) am amazed to see myself here rather than there ... now rather than then". ― Blaise Pascal

obermd

Since multiplication can be done via addition and division can be done via subtractions and hardware shifts it makes a lot of sense that there are more additions and subtractions than other operations. Roots can be done via smart algorithms using multiplication, division, and subtractions.