Floating point assembler code [modified]

Tomerland

Dear gurus, I detected a runtime bottleneck in my code at the following line:

pY[j] = d1 * pY[j] + d2 * pX[j];

All variables are floating point. This line is executed millions of times. Now I am thinking about improving the runtime by using inline assembler. Can somebody give me a hint on the best way to start (e.g. by using a library)? By the way, I'm not a beginner at assembler coding. Kind regards

modified on Thursday, July 3, 2008 5:16 AM

Roger Stoltz · modified on Thursday, July 3, 2008 5:16 AM

Tomerland wrote:

All variables are floating point. This line is executed millions of times. Now I am thinking about improving the runtime by using inline assembler.

I would recommend that you not use inline asm, for various reasons, but the primary reason is that "there must be a better way". A few things to reflect upon:

1. Floating point instructions are slow. Does it have to be floating point, or would integer values suffice?

2. Updating an array of floating point values is rarely a "bottleneck" from a data administration point of view. You rarely take time-critical actions based on floating point calculations with e.g. a 15-bit exponent and a 64-bit mantissa, which is what you have when long double maps to the 80-bit x87 extended format. If you're doing this from your GUI thread, it will become unresponsive and the user will experience the UI as hung, but that's another problem. Is it not possible to calculate the new values of the array in a worker thread and post a message to the GUI thread when the calculation has finished?

3. How about using integers that are multiplied by e.g. 100? When you calculate with the values you have two "decimals", and when you present a value to the user, or store it, you can convert it to a floating point value. I used this technique in an embedded system that had no support for floating point values, but I multiplied by 16 instead, which was a good enough approximation.
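The scaled-integer idea in point 3 could look roughly like the sketch below. It assumes a scale factor of 16 (4 fractional bits) and 32-bit storage; the names fx_t, FX_ONE, fx_from_double, fx_to_double, fx_mul and scale_add_fixed are made up for this illustration and do not come from the original code. Whether the precision is sufficient depends entirely on the data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-point format: value = stored integer / 16. */
#define FX_ONE 16

typedef int32_t fx_t;

/* Convert to fixed point, rounding to the nearest representable step. */
static fx_t fx_from_double(double v)
{
    return (fx_t)(v * FX_ONE + (v >= 0 ? 0.5 : -0.5));
}

/* Convert back, e.g. for display or storage. */
static double fx_to_double(fx_t v)
{
    return (double)v / FX_ONE;
}

/* Multiply two fixed-point values. The raw product carries the scale
   factor twice, so divide once; 64-bit intermediate avoids overflow. */
static fx_t fx_mul(fx_t a, fx_t b)
{
    return (fx_t)(((int64_t)a * b) / FX_ONE);
}

/* Integer-only version of: pY[j] = d1 * pY[j] + d2 * pX[j] */
void scale_add_fixed(fx_t *pY, const fx_t *pX, fx_t d1, fx_t d2, size_t n)
{
    for (size_t j = 0; j < n; ++j)
        pY[j] = fx_mul(d1, pY[j]) + fx_mul(d2, pX[j]);
}
```

With this scale the smallest step is 1/16 = 0.0625, so coefficients such as 1.5 or 0.25 are exact, while a value like 0.1 would be rounded to the nearest step. Note that fx_mul truncates toward zero, so small errors can accumulate over many iterations.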

"It's supposed to be hard, otherwise anybody could do it!" - selfquote
"High speed never compensates for wrong direction!" - unknown

Alan Balkany · modified on Thursday, July 3, 2008 5:16 AM

A few ideas:

1. Since the line is executed millions of times, you may be able to speed it up with loop unrolling: http://en.wikipedia.org/wiki/Loop_unrolling

2. Every time you do pY[j], it requires a multiplication and an addition. If you're processing the array sequentially, it would be faster to start with a pointer to pY[0], then increment it to get the next array element.

If you show some more surrounding code, we may be able to see some other optimizations.
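A minimal sketch of both ideas in C, assuming the arrays hold float, the loop runs over j = 0 .. n-1, and pX and pY do not overlap (the original post shows none of this). The function names and the unroll factor of 4 are choices made for this illustration. An optimizing compiler may already generate similar code from the plain indexed loop, so it is worth measuring both versions.

```c
#include <stddef.h>

/* Baseline: the line from the question inside a plain indexed loop. */
void scale_add_indexed(float *pY, const float *pX,
                       float d1, float d2, size_t n)
{
    for (size_t j = 0; j < n; ++j)
        pY[j] = d1 * pY[j] + d2 * pX[j];
}

/* Same computation with walking pointers and a 4x unrolled body. */
void scale_add_unrolled(float *pY, const float *pX,
                        float d1, float d2, size_t n)
{
    float *y = pY;
    const float *x = pX;
    size_t blocks = n / 4;   /* number of full groups of four elements */
    size_t rest = n % 4;     /* leftover elements handled one by one */

    for (size_t b = 0; b < blocks; ++b) {
        y[0] = d1 * y[0] + d2 * x[0];
        y[1] = d1 * y[1] + d2 * x[1];
        y[2] = d1 * y[2] + d2 * x[2];
        y[3] = d1 * y[3] + d2 * x[3];
        y += 4;              /* advance the pointers instead of re-indexing */
        x += 4;
    }

    for (size_t r = 0; r < rest; ++r) {
        *y = d1 * *y + d2 * *x;
        ++y;
        ++x;
    }
}
```

Both functions perform the same arithmetic per element, so they should produce the same results; only the loop overhead and addressing differ.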