Reusing an Incremented Variable within a Single Statement

Niklas L

Good points.

Joe Woodbury wrote:

const int temp = src[i]; The "const" here does nothing and ends up being slower, not just because you are confusing the compiler, but because it prevents the compiler from using an indexed assembly call.

Care to elaborate on this a bit? Why would the const affect the compiler? I would assume the compiler easier could chose an indexed call just because there is a const.

home

Skippums

Joe Woodbury wrote:

This is seriously broken. You are allocating a pointer and then decrementing it. This is VERY bad (especially since you declare dst const and now can't delete it.) If you want to use 1 indexing, simply allocate one extra element and don't worry about element zero.

Actually you can delete it:

delete[] (dst + 1);

Joe Woodbury wrote:

The behavior here is technically undefined, however with VS 2008 it will increment i, then use the result as the index for both arrays.

That was my original question... Thank you for answering it! Even if it would have done what I wanted, I still probably wouldn't have used it due to code maintainability issues.

Joe Woodbury wrote:

The "const" here does nothing

Actually, the const there prevents people in the future from modifying the value stored in temp, which may prevent coder error. In the simple loop I proposed, it probably will have no effect either way, since screwing up the loop through human error is unlikely and I'm guessing VS2008 is smart enough not to allocate memory for it. Also, the actual loop I am using is more complex than this example, so the compiler couldn't have used an indexed assembly call anyway.

Joe Woodbury wrote:

Because memcpy() will very likely be faster in this case

I can't use memcpy... that is part of the complexity of my problem. See my response to the post immediately before yours if you are curious as to why I can't use that function. I do fully agree with your statement:

Joe Woodbury wrote:

Assuming you have a sound algorithm, a general rule is writing sensible, plain C/C++ code

Thanks for the help!

Sounds like somebody's got a case of the Mondays -Jeff

Joe Woodbury

for (size_t i = 0; i < srcLen; ++i)
dst[i + 1] = src[i];

for (size_t i = 0; i < srcLen; )
{
const int temp = src[i];
dst[++i] = temp;
}

Sorry, I was typing fast and was rather confusing. I should have said that the second doesn't take advantage of the indexing assembly can do. In assembly both calls use something like mov DWORD PTR [ecx+eax*4], edx, the first, however, "sees" the add by one and simply makes the final copy mov DWORD PTR [eax+edx*4+4], ecx the instruction is one extra byte. Now you could write the function as follows, however, my experience is that it will end up running slower. Partly because the compiler may not make pDst and pSrc register variables (due to not enough room) and partly because of the above.

int* pDst = &dst[1];
int* pSrc = &src[0]; // just to be more clear

for (size_t i = 0; i < srcLen; ++i)
*pDst++ = *pSrc++;

[Note: Turns out that in an admittedly isolated environment, the second algorithm and third algorithms are slightly faster than the first, which surprised me, but that's the voodoo of assembly and modern CPUs. memcpy() is a solid 15% faster. Of course, memcpy is likely using SSE2 instructions which makes it even faster than rep movsd.

modified on Tuesday, August 24, 2010 2:58 PM

Joe Woodbury

Skippums wrote:

I'm guessing VS2008 is smart enough not to allocate memory for it

It actual does since it runs out of registers (I was a little surprised by this, but it makes sense once you look at the dissassembled code.) In practice, though, I'd imagine the cache would prevent this from being a big performance hit.

Niklas L · modified on Tuesday, August 24, 2010 2:58 PM

Thanks for the explanation. It makes sense if it depends on the indexing rather than the const keyword.

home

Paul Michalik

One question: Have you measured how much faster those a-priori optimizations are running compared to the very cleanly coded algorithm from the post above (using std::copy)?

Aescleal

So what's wrong with the wrapping class solution to convert your array to a 1 based one? Seeing your code there you could use std::generate on a std::vector to get the same effect as your manual loop. Generally if you're doing low level memory fiddling, pointer arithmetic and looping at the same time in C++ there's usually a simpler way. Cheers, Ash

Joe Woodbury

Repeating my edit above: Out of curiosity, I benchmarked the various algorithms using just an int array of 10,000 and 100,000 elements.

void Test1()
{
for (size_t i = 0; i < len; ++i)
pDst[i + 1] = pSrc[i];
}

void Test2()
{
for (size_t i = 0; i < len; )
{
const int temp = pSrc[i];
pDst[++i] = temp;
}
}

void Test3()
{
memcpy(&pDst[1], pSrc, len * sizeof(int));
}

void Test4()
{
int *src_start = (int*) &pSrc[0];
int *src_end = (int*) &pSrc[len];

std::copy(src\_start, src\_end, &pDst\[1\]);

}

These are very artificial tests, but as expected Test3 & Test 4 were fastest by about 15%. Test4 was often slightly faster by a few cycles than Test3. I scratched my head since both end up at memcpy(), but Test4 has more apparent overhead. But then I realized it was the len * sizeof(int) calculation that slightly slows Test3(). Surprisingly, Test2 was ever so slightly faster than Test1 (by about 0.1% - 0.5% on my system.) I suspect the CPU cache covers the "save" of the register.

norish

This would be;

int i = 1;
i = i + 1;
int j = i + i;

:)

Paul Michalik

Thanks for the insight. A nice lesson for the "early optimizers"...

Stefan_Lang

If your intent is accessing two arrays within a loop, then - no matter the relation between indices - the fastest way would be to use two pointers to the individual elements and increment these. If you're so intent on improving performance, consider this: every direct access to an array element via an index value requires 1. loading the start address of the array 2. get the size of an element 3. multiplying that by the index(-1) 4. adding that to the start address 5. dereference this address to get to the actual element. As opposed to using pointers which just requires 1. load the pointer 2. dereference it Of course, incrementing the pointers eats up most of this advantage, as you have to add the element size each iteration. And most likely a good optimizer will convert your code into something like this anyway. What I want to say, using an index is just complicating things unneccesarily if all you want to do is access each element sequentially.

JFDR_02

"why not write efficient code when you can?" X| Don't be daft. Write readable code when you can. Optimize when necessary and use a profiler to detect inefficencies.

Skippums

Why use a performance profiler when I can just guess which parts will be inefficient during implementation? And for the record, my code is readable; the compiler understands it just fine. :-D

Sounds like somebody's got a case of the Mondays -Jeff

Niklas L

Skippums wrote:

Why use a performance profiler when I can just guess which parts will be inefficient

Unless you're writing Hello World, you will be surprised.

home

Skippums

I guess I need to be more explicit when I type sarcastic comments. Sadly, I couldn't find the appropriate voice inflection characters on my keyboard. :)

Sounds like somebody's got a case of the Mondays -Jeff

Niklas L

..or don't write what I will read :confused: :-D

home