Programming gut feeling - ** Update! **
-
I am not asking for absolute knowledge, and I am not asking you to Google for me. I know I could measure the answer myself. I just wonder whether you share my gut feeling on this. During a review yesterday I came across the following. (The language here is Go, but I would argue the same way in C etc., and `uint64()` is resolved at compile time.)

    gopNr = reqGopNr - entryStart + uint64(entry.Offset)
    if gopNr >= uint64(entry.assetLen) { // Avoid mod operation every time since wrap is unusual
        gopNr = gopNr % uint64(entry.assetLen)
    }

I commented:

> I do not think % can have a measurable cost for small divisors. I would skip the if. It is a single IDIV operation in X86. If that is expensive, there might even be an if in the operator already...

Shooting from the hip, what is your gut feeling?

** Update! ** Thanks for all the interesting feedback. So I did measure it: Go Playground - The Go Programming Language. For some reason the code always measures zero or times out on that playground, but it measures fine locally. The verdict: running with the `if` is in fact faster, if we stick to the original assumption that the dividend is almost always smaller than the divisor. The difference is a blazing 10 nanoseconds or, if you prefer, a factor of ~4x on an old x86 laptop. I was wrong in thinking that it would not be measurable. But this will run on a monster server, and this is not the most frequently visited code. So I still vote to remove the `if` for the sake of readability.

"If we don't change direction, we'll end up where we're going"
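The code that was actually measured is not shown in the thread, so what follows is only a minimal sketch of how such a comparison might be set up. The function names (`wrapAlways`, `wrapIf`, `bench`), the asset length of 1000 and the input pattern are all assumptions made for illustration. It uses `testing.Benchmark` from the standard library so that it runs as a plain program, and it feeds each result into the next call so that the division sits on the dependency chain.

```go
package main

import (
	"fmt"
	"testing"
)

// wrapAlways applies the modulo unconditionally.
func wrapAlways(n, length uint64) uint64 {
	return n % length
}

// wrapIf only pays for the division when n is out of range.
func wrapIf(n, length uint64) uint64 {
	if n >= length {
		n %= length
	}
	return n
}

// sink keeps the compiler from discarding the benchmark loops.
var sink uint64

// bench times f on inputs that only rarely wrap (a hypothetical workload).
func bench(f func(n, length uint64) uint64) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		length := uint64(1000) // hypothetical asset length
		var acc uint64
		for i := 0; i < b.N; i++ {
			// Feeding acc back in makes each call depend on the previous result.
			acc = f(acc+uint64(i&7), length)
		}
		sink = acc
	})
}

func main() {
	fmt.Println("always %:", bench(wrapAlways))
	fmt.Println("with if: ", bench(wrapIf))
}
```

Both variants are called through a function value here, which adds the same call overhead to each side. The absolute numbers depend heavily on the CPU and on how often the input actually wraps, so treat any result as specific to the machine it ran on.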
I'm in two minds: part of me agrees with Carlos - it depends on the target processor. Classic 32-bit ARM, for example, has conditional execution on almost every instruction, so the if becomes a "skip" rather than a full-on jump. But ... the modulus operator is an integer divide with knobs on (unless the divisor is always a power of two), and divides aren't cheap, so it could be that the comparison is worth its cost even if it breaks branch prediction. And since the condition requires address calculation as well as a comparison, I'd probably say "dump it" even then. Optimization may improve it if it's in a tight loop, but I'd want to look at the assembly code before making a final decision.
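To illustrate the power-of-two remark: when the divisor is known to be a power of two, the remainder of an unsigned value can be taken with a bit mask instead of a divide. The sketch below uses made-up helper names (`isPowerOfTwo`, `modPow2`). Compilers typically apply this on their own only when the divisor is a compile-time constant, which `entry.assetLen` in the reviewed snippet is not.

```go
package main

import "fmt"

// isPowerOfTwo reports whether n has exactly one bit set.
func isPowerOfTwo(n uint64) bool {
	return n != 0 && n&(n-1) == 0
}

// modPow2 computes x % n for a power-of-two n using a mask instead of a divide.
// It panics if n is not a power of two, because the mask trick is wrong then.
func modPow2(x, n uint64) uint64 {
	if !isPowerOfTwo(n) {
		panic("modPow2: n must be a power of two")
	}
	return x & (n - 1)
}

func main() {
	// Print x, the ordinary remainder, and the masked remainder side by side.
	for _, x := range []uint64{0, 5, 8, 13, 1023} {
		fmt.Println(x, x%8, modPow2(x, 8))
	}
}
```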
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony "Common sense is so rare these days, it should be classified as a super power" - Random T-shirt AntiTwitter: @DalekDave is now a follower!
-
Way more in depth than I've ever gotten. I just avoid ifs in tight code! But maybe I shouldn't. I do know that if you can do, say, an entire DFA traversal without conditional branching (and I think it's possible?), it should be significantly faster than the traditional method, which requires a ton of branching. But then you probably wouldn't be using idiv instructions in the first place with such a beast. So I guess ultimately it depends, as you suggest. I did not know that about the ARMs. I've been mostly dealing with Tensilica Xtensa LX chips, but I'm getting sick of them. The trouble with ARMs is they're as rare as hen's teeth. Out of stock everywhere for the ones I want.
To err is human. Fortune favors the monsters.
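As a rough idea of what a DFA step without a conditional on the input could look like, here is a small table-driven sketch. The automaton (one or more ASCII digits), the state names and the tables are all invented for the example. The loop condition is still a branch, and the Go compiler may insert bounds checks on the table lookups, so this only removes the per-character `if`/`switch`, not every branch in the generated code.

```go
package main

import "fmt"

// Hypothetical DFA accepting one or more ASCII digits.
const (
	stStart   = iota // nothing read yet
	stDigits         // at least one digit read (accepting)
	stDead           // saw a non-digit
	numStates
)

// class maps every byte to an input class: 1 for '0'..'9', 0 otherwise.
var class [256]uint8

// next[state][class] is the transition table.
var next = [numStates][2]uint8{
	stStart:  {stDead, stDigits},
	stDigits: {stDead, stDigits},
	stDead:   {stDead, stDead},
}

func init() {
	for c := '0'; c <= '9'; c++ {
		class[c] = 1
	}
}

// accepts runs the DFA over s and reports whether it ends in the accepting state.
func accepts(s string) bool {
	state := uint8(stStart)
	for i := 0; i < len(s); i++ {
		// No if/switch on the input byte: just two table lookups.
		state = next[state][class[s[i]]]
	}
	return state == stDigits
}

func main() {
	for _, s := range []string{"12345", "", "12a", "7"} {
		fmt.Printf("%q -> %v\n", s, accepts(s))
	}
}
```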
-
I wouldn't care one way or the other unless this code was executed frequently. Very frequently.
Robust Services Core | Software Techniques for Lemmings | Articles
The fox knows many things, but the hedgehog knows one big thing.
-
The conditional branch should be slower on the latest desktop CPUs. Have a look at this table: [Instruction tables](https://www.agner.org/optimize/instruction_tables.pdf). Scroll down to the Intel 11th generation Tiger Lake. The IDIV only costs 4 µops. The JGE and two MOVs for the conditional will exceed that. It depends on the CPU; older architectures benefit from the branch.
-
I suppose both operands are known to always be positive? If not, you understand that the meaning of the `%` operator varies between languages? I probably wouldn't use the `if`, though I might test both ways just out of curiosity. It looks like a sophomoric inclusion. A freshman doesn't know an issue may exist. A sophomore thinks an issue may exist -- and adds protection. A master knows the suspected issue doesn't exist. A little knowledge is a dangerous thing. It's like when junior developers test an index value every time rather than simply catching an Exception (C#) when something goes awry.
-
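On the point about `%` and signs, a short Go illustration may help. In Go, `%` on signed integers truncates toward zero, so the result takes the sign of the dividend; other languages make different choices. The helper `euclidMod` and the sample values are made up for this sketch. The reviewed snippet uses `uint64` throughout, so the sign question does not arise there, but unsigned subtraction can wrap around before the `%` is applied.

```go
package main

import "fmt"

// euclidMod returns a result in [0, m) for m > 0, even when a is negative.
func euclidMod(a, m int64) int64 {
	r := a % m
	if r < 0 {
		r += m
	}
	return r
}

func main() {
	// Go's % truncates toward zero, so the result takes the sign of the dividend.
	fmt.Println(-7 % 3)           // -1
	fmt.Println(7 % -3)           // 1
	fmt.Println(euclidMod(-7, 3)) // 2

	// With unsigned operands there is no sign issue...
	fmt.Println(uint64(7) % 3) // 1

	// ...but an unsigned subtraction that goes "below zero" wraps around first.
	var reqGopNr, entryStart uint64 = 2, 5 // hypothetical values
	fmt.Println((reqGopNr - entryStart) % 10) // 3, from the wrapped value 2^64 - 3
}
```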
My gut feeling is that code that's clever at the expense of readability should only be allowed when it has a demonstrated performance impact. Show me something that indicates that it will significantly increase application performance or the PR gets rejected.
Did you ever see history portrayed as an old man with a wise brow and pulseless heart, weighing all things in the balance of reason? Is not rather the genius of history like an eternal, imploring maiden, full of fire, with a burning heart and flaming soul, humanly warm and humanly beautiful? --Zachris Topelius
-
I'll put the `if` there, especially if wrapping is unusual.

Even on modern chips with so-called "fast division", 64-bit `div` (surely we're talking about unsigned numbers here?) takes over a dozen cycles *at best*. Sure, it has only a few µops today, but they're µops with a high latency (or at least one of them is, anyway). Further in the past, `div` only gets worse. Computers with "slow division" are still extremely common: Cascade Lake still had slow division, and those are high-end computers that are only a couple of years old.

By contrast, a branch *can* be bad, but this one won't be, if the comment is to be believed. If wrapping is unusual, then the branch will usually be correctly predicted non-taken. The comparison (and associated loads, if any) that happens before the branch is also nearly irrelevant in that case, because that dependency chain ends in the branch. Code after it does not need to wait until the comparison is done.

In the normal case where there is no wrapping, an instruction that uses the new value of `gopNr` may be able to execute back-to-back with the instruction that produced it (doesn't mean it *will*, but it could). That is of course impossible if there was a `div` between them.

> If that is expensive, there might even be an if in the operator already...

Doesn't happen on any compiler I'm familiar with. I'm not familiar with the Go compiler, but still. It's not really a thing.
-
CMP+JGE (these macro-fuse) and two MOVs is only 3 µops, and they're fast µops. Why are there two MOVs anyway? Only `entry.assetLen` should be getting loaded here; we already have `gopNr`, and I don't see any immediate reason to copy it to another register. These µops are also not in the dependency chain of `gopNr`. They're only there for the compare-and-branch, so the following code could execute at the same time as this condition is being evaluated (subject to throughput limitations, of course). At least one of the µops in IDIV has bad latency and moderately bad throughput, and they're in the dependency chain from computing `gopNr` to using it (not shown).
-
megaadam wrote:

> blazing 10 nanoseconds

That's about 10 feet (3 meters) at light speed. Whenever I see 'nanoseconds', I am reminded of USN Rear Admiral Grace Hopper, one of the "greats" in early computing history. See, for example, Grace Hopper's Nanoseconds.
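For anyone who wants to check the arithmetic, the distance is just the speed of light multiplied by the time. The function name `lightDistance` is made up for this sketch; the two constants are the defined values of the speed of light in vacuum and of the international foot.

```go
package main

import "fmt"

const (
	speedOfLight  = 299_792_458.0 // metres per second, exact by definition
	metresPerFoot = 0.3048        // exact by definition
)

// lightDistance returns how far light travels in vacuum in ns nanoseconds, in metres.
func lightDistance(ns float64) float64 {
	return speedOfLight * ns * 1e-9
}

func main() {
	m := lightDistance(10)
	fmt.Printf("%.2f m, %.2f ft\n", m, m/metresPerFoot) // 3.00 m, 9.84 ft
}
```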