Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. General Programming
  3. C / C++ / MFC
  4. Reusing an Incremented Variable within a Single Statement

Reusing an Incremented Variable within a Single Statement

Scheduled Pinned Locked Moved C / C++ / MFC
c++tutorialquestionlounge
26 Posts 10 Posters 0 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • J Joe Woodbury

    for (size_t i = 0; i < srcLen; ++i)
    dst[i + 1] = src[i];

    for (size_t i = 0; i < srcLen; )
    {
    const int temp = src[i];
    dst[++i] = temp;
    }

    Sorry, I was typing fast and was rather confusing. I should have said that the second doesn't take advantage of the indexing assembly can do. In assembly both calls use something like mov DWORD PTR [ecx+eax*4], edx, the first, however, "sees" the add by one and simply makes the final copy mov DWORD PTR [eax+edx*4+4], ecx the instruction is one extra byte. Now you could write the function as follows, however, my experience is that it will end up running slower. Partly because the compiler may not make pDst and pSrc register variables (due to not enough room) and partly because of the above.

    int* pDst = &dst[1];
    int* pSrc = &src[0]; // just to be more clear

    for (size_t i = 0; i < srcLen; ++i)
    *pDst++ = *pSrc++;

    [Note: Turns out that in an admittedly isolated environment, the second algorithm and third algorithms are slightly faster than the first, which surprised me, but that's the voodoo of assembly and modern CPUs. memcpy() is a solid 15% faster. Of course, memcpy is likely using SSE2 instructions which makes it even faster than rep movsd.

    modified on Tuesday, August 24, 2010 2:58 PM

    N Offline
    N Offline
    Niklas L
    wrote on last edited by
    #15

    Thanks for the explanation. It makes sense if it depends on the indexing rather than the const keyword.

    home

    1 Reply Last reply
    0
    • J Joe Woodbury

      Skippums wrote:

      I'm guessing VS2008 is smart enough not to allocate memory for it

      It actual does since it runs out of registers (I was a little surprised by this, but it makes sense once you look at the dissassembled code.) In practice, though, I'd imagine the cache would prevent this from being a big performance hit.

      P Offline
      P Offline
      Paul Michalik
      wrote on last edited by
      #16

      One question: Have you measured how much faster those a-priori optimizations are running compared to the very cleanly coded algorithm from the post above (using std::copy)?

      J 1 Reply Last reply
      0
      • S Skippums

        I don't get a list of the type I want back from the API function call... I have to call each API method on a per-element basis. Basically, I have a cell-array from Matlab being passed in:

        // prhs is of type "mxArray const *"
        size_t const indexCount = mxGetNumberOfElements(prhs[0]);
        size_t const dataSize = mxGetNumberOfElements(prhs[1]);
        double const ** data = new double const *[dataSize];
        // Get the 1-based indices into the data array
        unsigned __int32 const *const indices = static_cast<unsigned __int32 const *>(mxGetData(prhs[0]));
        for (size_t i = 0; i < dataSize; ++i) {
        // Get a double* to the data in the cell array at 0-indexed location i
        data[i] = mxGetPr(mxGetCell(prhs[1], i));
        }
        --data; // Make data 1-indexed so I don't have to modify "indices" to be 0-based

        That is why I didn't do what you suggested.

        Sounds like somebody's got a case of the Mondays -Jeff

        A Offline
        A Offline
        Aescleal
        wrote on last edited by
        #17

        So what's wrong with the wrapping class solution to convert your array to a 1 based one? Seeing your code there you could use std::generate on a std::vector to get the same effect as your manual loop. Generally if you're doing low level memory fiddling, pointer arithmetic and looping at the same time in C++ there's usually a simpler way. Cheers, Ash

        1 Reply Last reply
        0
        • P Paul Michalik

          One question: Have you measured how much faster those a-priori optimizations are running compared to the very cleanly coded algorithm from the post above (using std::copy)?

          J Offline
          J Offline
          Joe Woodbury
          wrote on last edited by
          #18

          Repeating my edit above: Out of curiosity, I benchmarked the various algorithms using just an int array of 10,000 and 100,000 elements.

          void Test1()
          {
          for (size_t i = 0; i < len; ++i)
          pDst[i + 1] = pSrc[i];
          }

          void Test2()
          {
          for (size_t i = 0; i < len; )
          {
          const int temp = pSrc[i];
          pDst[++i] = temp;
          }
          }

          void Test3()
          {
          memcpy(&pDst[1], pSrc, len * sizeof(int));
          }

          void Test4()
          {
          int *src_start = (int*) &pSrc[0];
          int *src_end = (int*) &pSrc[len];

          std::copy(src\_start, src\_end, &pDst\[1\]);
          

          }

          These are very artificial tests, but as expected Test3 & Test 4 were fastest by about 15%. Test4 was often slightly faster by a few cycles than Test3. I scratched my head since both end up at memcpy(), but Test4 has more apparent overhead. But then I realized it was the len * sizeof(int) calculation that slightly slows Test3(). Surprisingly, Test2 was ever so slightly faster than Test1 (by about 0.1% - 0.5% on my system.) I suspect the CPU cache covers the "save" of the register.

          P 1 Reply Last reply
          0
          • S Skippums

            I assume you meant to use "i++" on the third line of your second example, but understand what you meant. I really thought that the statements:

            int i = 1;
            int j = (i) + ++i;

            would be equivalent to "int j = 3;", but no matter how I apply the parentheses, it always comes out as 4, just as your response predicts it would. Thanks,

            Sounds like somebody's got a case of the Mondays -Jeff

            N Offline
            N Offline
            norish
            wrote on last edited by
            #19

            This would be;

            int i = 1;
            i = i + 1;
            int j = i + i;

            :)

            1 Reply Last reply
            0
            • J Joe Woodbury

              Repeating my edit above: Out of curiosity, I benchmarked the various algorithms using just an int array of 10,000 and 100,000 elements.

              void Test1()
              {
              for (size_t i = 0; i < len; ++i)
              pDst[i + 1] = pSrc[i];
              }

              void Test2()
              {
              for (size_t i = 0; i < len; )
              {
              const int temp = pSrc[i];
              pDst[++i] = temp;
              }
              }

              void Test3()
              {
              memcpy(&pDst[1], pSrc, len * sizeof(int));
              }

              void Test4()
              {
              int *src_start = (int*) &pSrc[0];
              int *src_end = (int*) &pSrc[len];

              std::copy(src\_start, src\_end, &pDst\[1\]);
              

              }

              These are very artificial tests, but as expected Test3 & Test 4 were fastest by about 15%. Test4 was often slightly faster by a few cycles than Test3. I scratched my head since both end up at memcpy(), but Test4 has more apparent overhead. But then I realized it was the len * sizeof(int) calculation that slightly slows Test3(). Surprisingly, Test2 was ever so slightly faster than Test1 (by about 0.1% - 0.5% on my system.) I suspect the CPU cache covers the "save" of the register.

              P Offline
              P Offline
              Paul Michalik
              wrote on last edited by
              #20

              Thanks for the insight. A nice lesson for the "early optimizers"...

              1 Reply Last reply
              0
              • S Skippums

                Well actually, I am trying to write something like the following where I am copying from a 0-indexed array to a 1-indexed array:

                const int src[] = {0, 1};
                const size_t srcLen = sizeof(src) / sizeof(int);
                int *const dst = new int[srcLen] - 1;
                for (size_t i = 0; i < srcLen; ++i)
                dst[i + 1] = src[i];

                NOTE: I have simplified the array copy in the above example... I am actually using an external API function to get the value of the src array elements, which is why I am not just using the statement "const int* const dst = &src[0] - 1;". The compiler will probably optimize this by keeping the value of i + 1 from the array assignment for the next loop iteration in a register (as opposed to recomputing it), but if I could write the for-loop as:

                for (size_t i = 0; i < srcLen; )
                dst[++i] = src[i];

                This would *almost* guarantee that i is incremented only once. Given the initial response, I can't have a single statement with two different values for i, so I'm not able to do what I wanted anyway. It would probably have a higher probability of being optimized if I wrote the loop as:

                for (size_t i = 0; i < srcLen; ) {
                const int temp = src[i];
                dst[++i] = temp;
                }

                Almost every compiler available would put temp in a register instead of allocating it on the stack, but I really don't want to write the loop like that. Guess I'll keep my fingers crossed that the compiler optimizes it! (I don't really care timing-wise as the loop is only iterating over a couple hundred thousand elements one time per run, which is an insignificant amount of time as a fraction of the program's run-time, but why not write efficient code when you can?)

                Sounds like somebody's got a case of the Mondays -Jeff

                S Offline
                S Offline
                Stefan_Lang
                wrote on last edited by
                #21

                If your intent is accessing two arrays within a loop, then - no matter the relation between indices - the fastest way would be to use two pointers to the individual elements and increment these. If you're so intent on improving performance, consider this: every direct access to an array element via an index value requires 1. loading the start address of the array 2. get the size of an element 3. multiplying that by the index(-1) 4. adding that to the start address 5. dereference this address to get to the actual element. As opposed to using pointers which just requires 1. load the pointer 2. dereference it Of course, incrementing the pointers eats up most of this advantage, as you have to add the element size each iteration. And most likely a good optimizer will convert your code into something like this anyway. What I want to say, using an index is just complicating things unneccesarily if all you want to do is access each element sequentially.

                1 Reply Last reply
                0
                • S Skippums

                  Well actually, I am trying to write something like the following where I am copying from a 0-indexed array to a 1-indexed array:

                  const int src[] = {0, 1};
                  const size_t srcLen = sizeof(src) / sizeof(int);
                  int *const dst = new int[srcLen] - 1;
                  for (size_t i = 0; i < srcLen; ++i)
                  dst[i + 1] = src[i];

                  NOTE: I have simplified the array copy in the above example... I am actually using an external API function to get the value of the src array elements, which is why I am not just using the statement "const int* const dst = &src[0] - 1;". The compiler will probably optimize this by keeping the value of i + 1 from the array assignment for the next loop iteration in a register (as opposed to recomputing it), but if I could write the for-loop as:

                  for (size_t i = 0; i < srcLen; )
                  dst[++i] = src[i];

                  This would *almost* guarantee that i is incremented only once. Given the initial response, I can't have a single statement with two different values for i, so I'm not able to do what I wanted anyway. It would probably have a higher probability of being optimized if I wrote the loop as:

                  for (size_t i = 0; i < srcLen; ) {
                  const int temp = src[i];
                  dst[++i] = temp;
                  }

                  Almost every compiler available would put temp in a register instead of allocating it on the stack, but I really don't want to write the loop like that. Guess I'll keep my fingers crossed that the compiler optimizes it! (I don't really care timing-wise as the loop is only iterating over a couple hundred thousand elements one time per run, which is an insignificant amount of time as a fraction of the program's run-time, but why not write efficient code when you can?)

                  Sounds like somebody's got a case of the Mondays -Jeff

                  J Offline
                  J Offline
                  JFDR_02
                  wrote on last edited by
                  #22

                  "why not write efficient code when you can?" X| Don't be daft. Write readable code when you can. Optimize when necessary and use a profiler to detect inefficencies.

                  S 1 Reply Last reply
                  0
                  • J JFDR_02

                    "why not write efficient code when you can?" X| Don't be daft. Write readable code when you can. Optimize when necessary and use a profiler to detect inefficencies.

                    S Offline
                    S Offline
                    Skippums
                    wrote on last edited by
                    #23

                    Why use a performance profiler when I can just guess which parts will be inefficient during implementation? And for the record, my code is readable; the compiler understands it just fine. :-D

                    Sounds like somebody's got a case of the Mondays -Jeff

                    N 1 Reply Last reply
                    0
                    • S Skippums

                      Why use a performance profiler when I can just guess which parts will be inefficient during implementation? And for the record, my code is readable; the compiler understands it just fine. :-D

                      Sounds like somebody's got a case of the Mondays -Jeff

                      N Offline
                      N Offline
                      Niklas L
                      wrote on last edited by
                      #24

                      Skippums wrote:

                      Why use a performance profiler when I can just guess which parts will be inefficient

                      Unless you're writing Hello World, you will be surprised.

                      home

                      S 1 Reply Last reply
                      0
                      • N Niklas L

                        Skippums wrote:

                        Why use a performance profiler when I can just guess which parts will be inefficient

                        Unless you're writing Hello World, you will be surprised.

                        home

                        S Offline
                        S Offline
                        Skippums
                        wrote on last edited by
                        #25

                        I guess I need to be more explicit when I type sarcastic comments. Sadly, I couldn't find the appropriate voice inflection characters on my keyboard. :)

                        Sounds like somebody's got a case of the Mondays -Jeff

                        N 1 Reply Last reply
                        0
                        • S Skippums

                          I guess I need to be more explicit when I type sarcastic comments. Sadly, I couldn't find the appropriate voice inflection characters on my keyboard. :)

                          Sounds like somebody's got a case of the Mondays -Jeff

                          N Offline
                          N Offline
                          Niklas L
                          wrote on last edited by
                          #26

                          ..or don't write what I will read :confused: :-D

                          home

                          1 Reply Last reply
                          0
                          Reply
                          • Reply as topic
                          Log in to reply
                          • Oldest to Newest
                          • Newest to Oldest
                          • Most Votes


                          • Login

                          • Don't have an account? Register

                          • Login or register to search.
                          • First post
                            Last post
                          0
                          • Categories
                          • Recent
                          • Tags
                          • Popular
                          • World
                          • Users
                          • Groups