Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. General Programming
  3. C / C++ / MFC
  4. Reusing an Incremented Variable within a Single Statement

Reusing an Incremented Variable within a Single Statement

Scheduled Pinned Locked Moved C / C++ / MFC
c++tutorialquestionlounge
26 Posts 10 Posters 0 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • S Skippums

    Does the C++ standard say anything about which side of the assignment operator is executed first? For example,

    int j[1];
    int i = 0;
    j[i++] = i;

    Will j[0] be 0 or 1 after this block of code executes? Is this per the C++ standard, or is the result of this code compiler-dependent? In general, what rules govern the order of code execution like this? Is it operator associativity (assignment is right-to-left associative, so j[0] = 0), or operator precedence (post-increment and operator[] both have higher precedence than assignment, so j[0] = 1)? Thanks,

    Sounds like somebody's got a case of the Mondays -Jeff

    S Offline
    S Offline
    Sauro Viti
    wrote on last edited by
    #2

    You get j[0] = 0; the postfix increment is meant as "use before increment" and it produce the same result than the statements below:

    j[i] = i;
    i = i + 1;

    You can try the following:

    int j[1];
    int i = 0;
    j[i] = j++;

    This again will produce j[0] = 0. The answer to your question is: the increment is applied after the whole statement has been processed, like as you wrote the same statement without the postfix increment (i.e. j[i] = i) and then another statement with only the increment (i.e. i = i + 1).

    S 1 Reply Last reply
    0
    • S Skippums

      Does the C++ standard say anything about which side of the assignment operator is executed first? For example,

      int j[1];
      int i = 0;
      j[i++] = i;

      Will j[0] be 0 or 1 after this block of code executes? Is this per the C++ standard, or is the result of this code compiler-dependent? In general, what rules govern the order of code execution like this? Is it operator associativity (assignment is right-to-left associative, so j[0] = 0), or operator precedence (post-increment and operator[] both have higher precedence than assignment, so j[0] = 1)? Thanks,

      Sounds like somebody's got a case of the Mondays -Jeff

      L Offline
      L Offline
      Luc Pattyn
      wrote on last edited by
      #3

      IMO that is irrelevant as you should not even consider writing code like that. The only things you can achieve here is confuse yourself, confuse all future readers, and make debugging tougher than necessary. If it is

      j[i]=i;
      i++;

      you want, then write exactly that. It makes perfectly clear what is going on, and, at least in an Intel-Windows environment, it does not have any influence on code size or performance. :)

      Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum

      Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.

      S 1 Reply Last reply
      0
      • L Luc Pattyn

        IMO that is irrelevant as you should not even consider writing code like that. The only things you can achieve here is confuse yourself, confuse all future readers, and make debugging tougher than necessary. If it is

        j[i]=i;
        i++;

        you want, then write exactly that. It makes perfectly clear what is going on, and, at least in an Intel-Windows environment, it does not have any influence on code size or performance. :)

        Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum

        Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.

        S Offline
        S Offline
        Skippums
        wrote on last edited by
        #4

        Well actually, I am trying to write something like the following where I am copying from a 0-indexed array to a 1-indexed array:

        const int src[] = {0, 1};
        const size_t srcLen = sizeof(src) / sizeof(int);
        int *const dst = new int[srcLen] - 1;
        for (size_t i = 0; i < srcLen; ++i)
        dst[i + 1] = src[i];

        NOTE: I have simplified the array copy in the above example... I am actually using an external API function to get the value of the src array elements, which is why I am not just using the statement "const int* const dst = &src[0] - 1;". The compiler will probably optimize this by keeping the value of i + 1 from the array assignment for the next loop iteration in a register (as opposed to recomputing it), but if I could write the for-loop as:

        for (size_t i = 0; i < srcLen; )
        dst[++i] = src[i];

        This would *almost* guarantee that i is incremented only once. Given the initial response, I can't have a single statement with two different values for i, so I'm not able to do what I wanted anyway. It would probably have a higher probability of being optimized if I wrote the loop as:

        for (size_t i = 0; i < srcLen; ) {
        const int temp = src[i];
        dst[++i] = temp;
        }

        Almost every compiler available would put temp in a register instead of allocating it on the stack, but I really don't want to write the loop like that. Guess I'll keep my fingers crossed that the compiler optimizes it! (I don't really care timing-wise as the loop is only iterating over a couple hundred thousand elements one time per run, which is an insignificant amount of time as a fraction of the program's run-time, but why not write efficient code when you can?)

        Sounds like somebody's got a case of the Mondays -Jeff

        L A J S J 5 Replies Last reply
        0
        • S Sauro Viti

          You get j[0] = 0; the postfix increment is meant as "use before increment" and it produce the same result than the statements below:

          j[i] = i;
          i = i + 1;

          You can try the following:

          int j[1];
          int i = 0;
          j[i] = j++;

          This again will produce j[0] = 0. The answer to your question is: the increment is applied after the whole statement has been processed, like as you wrote the same statement without the postfix increment (i.e. j[i] = i) and then another statement with only the increment (i.e. i = i + 1).

          S Offline
          S Offline
          Skippums
          wrote on last edited by
          #5

          I assume you meant to use "i++" on the third line of your second example, but understand what you meant. I really thought that the statements:

          int i = 1;
          int j = (i) + ++i;

          would be equivalent to "int j = 3;", but no matter how I apply the parentheses, it always comes out as 4, just as your response predicts it would. Thanks,

          Sounds like somebody's got a case of the Mondays -Jeff

          N 1 Reply Last reply
          0
          • S Skippums

            Well actually, I am trying to write something like the following where I am copying from a 0-indexed array to a 1-indexed array:

            const int src[] = {0, 1};
            const size_t srcLen = sizeof(src) / sizeof(int);
            int *const dst = new int[srcLen] - 1;
            for (size_t i = 0; i < srcLen; ++i)
            dst[i + 1] = src[i];

            NOTE: I have simplified the array copy in the above example... I am actually using an external API function to get the value of the src array elements, which is why I am not just using the statement "const int* const dst = &src[0] - 1;". The compiler will probably optimize this by keeping the value of i + 1 from the array assignment for the next loop iteration in a register (as opposed to recomputing it), but if I could write the for-loop as:

            for (size_t i = 0; i < srcLen; )
            dst[++i] = src[i];

            This would *almost* guarantee that i is incremented only once. Given the initial response, I can't have a single statement with two different values for i, so I'm not able to do what I wanted anyway. It would probably have a higher probability of being optimized if I wrote the loop as:

            for (size_t i = 0; i < srcLen; ) {
            const int temp = src[i];
            dst[++i] = temp;
            }

            Almost every compiler available would put temp in a register instead of allocating it on the stack, but I really don't want to write the loop like that. Guess I'll keep my fingers crossed that the compiler optimizes it! (I don't really care timing-wise as the loop is only iterating over a couple hundred thousand elements one time per run, which is an insignificant amount of time as a fraction of the program's run-time, but why not write efficient code when you can?)

            Sounds like somebody's got a case of the Mondays -Jeff

            L Offline
            L Offline
            Luc Pattyn
            wrote on last edited by
            #6

            OK, now I'll try and answer as good as I can, from memory: 1. in C the statement dst[i++]=i; is undefined, i.e. the exact order of the operation is not specified by the language specification. A compiler might first evaluate the RHS expression, then start working on the LHS; or it could first reduce the LHS to a simple target address, then evaluate the RHS, then assign. A compiler would even be allowed to unroll the loop (say by a factor of 2) and treat both halves differently. I don't think you'd like that. 2. in C++ things are slightly better, and AFAICR the spec got more stringent over time; the compiler builder still has options but is urged to document the choices he made; I have actually seen compiler vendors do it. So for a specific compiler, you might officially get away with such code. You still would not pass my code review though, my style guide would stop you in time. :-D

            Skippums wrote:

            why not write efficient code when you can?

            I agree to some extent. When two styles are possible: if readability is the same, I'll opt for the higher performance variant. If a trade-of is to be made, I'll judge case by case. And if performance is all and everything, I'll put the normal code as a comment and write whatever it takes. But you have me puzzled even more by your base-0 and base-1 indexing. Can't you just make it all consistent (and I do mean base-0) throughout your app and get rid of the problem entirely? :)

            Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum

            Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.

            S 1 Reply Last reply
            0
            • S Skippums

              Well actually, I am trying to write something like the following where I am copying from a 0-indexed array to a 1-indexed array:

              const int src[] = {0, 1};
              const size_t srcLen = sizeof(src) / sizeof(int);
              int *const dst = new int[srcLen] - 1;
              for (size_t i = 0; i < srcLen; ++i)
              dst[i + 1] = src[i];

              NOTE: I have simplified the array copy in the above example... I am actually using an external API function to get the value of the src array elements, which is why I am not just using the statement "const int* const dst = &src[0] - 1;". The compiler will probably optimize this by keeping the value of i + 1 from the array assignment for the next loop iteration in a register (as opposed to recomputing it), but if I could write the for-loop as:

              for (size_t i = 0; i < srcLen; )
              dst[++i] = src[i];

              This would *almost* guarantee that i is incremented only once. Given the initial response, I can't have a single statement with two different values for i, so I'm not able to do what I wanted anyway. It would probably have a higher probability of being optimized if I wrote the loop as:

              for (size_t i = 0; i < srcLen; ) {
              const int temp = src[i];
              dst[++i] = temp;
              }

              Almost every compiler available would put temp in a register instead of allocating it on the stack, but I really don't want to write the loop like that. Guess I'll keep my fingers crossed that the compiler optimizes it! (I don't really care timing-wise as the loop is only iterating over a couple hundred thousand elements one time per run, which is an insignificant amount of time as a fraction of the program's run-time, but why not write efficient code when you can?)

              Sounds like somebody's got a case of the Mondays -Jeff

              A Offline
              A Offline
              Aescleal
              wrote on last edited by
              #7

              Okay, my mind is completely boggled by this code... - you allocate an intrinsic array of size whatever - you decrement the pointer to the array - you then manually copy another array into this one starting at the same position you allocated the memory at What's wrong with...

              int *src_start = ... // from API
              int *src_end = ... // from API

              int *p = new int[ src_end - src_start ];
              std::copy( src_start, src_end, p );
              int *p_to_1_based_alias = p - 1;

              If that doesn't at least perform as well as your handrolled loop then trade your compiler in for a new one. You can also get rid of any explicit memory management using a vector. Another idea would be to just front end the array you get back from the API and convert the index:

              template<std::size_t N, typename T>
              class N_based_array
              {
              public:
              N_based_array( T *data ) : data_( data ) {}

                  T &operator\[\]( std::size\_t index )
                  {
                      return data\_\[ index - N \];
                  }
              
              private:
              	T \*data\_;
              

              };

              Cheers, Ash

              S 1 Reply Last reply
              0
              • S Skippums

                Well actually, I am trying to write something like the following where I am copying from a 0-indexed array to a 1-indexed array:

                const int src[] = {0, 1};
                const size_t srcLen = sizeof(src) / sizeof(int);
                int *const dst = new int[srcLen] - 1;
                for (size_t i = 0; i < srcLen; ++i)
                dst[i + 1] = src[i];

                NOTE: I have simplified the array copy in the above example... I am actually using an external API function to get the value of the src array elements, which is why I am not just using the statement "const int* const dst = &src[0] - 1;". The compiler will probably optimize this by keeping the value of i + 1 from the array assignment for the next loop iteration in a register (as opposed to recomputing it), but if I could write the for-loop as:

                for (size_t i = 0; i < srcLen; )
                dst[++i] = src[i];

                This would *almost* guarantee that i is incremented only once. Given the initial response, I can't have a single statement with two different values for i, so I'm not able to do what I wanted anyway. It would probably have a higher probability of being optimized if I wrote the loop as:

                for (size_t i = 0; i < srcLen; ) {
                const int temp = src[i];
                dst[++i] = temp;
                }

                Almost every compiler available would put temp in a register instead of allocating it on the stack, but I really don't want to write the loop like that. Guess I'll keep my fingers crossed that the compiler optimizes it! (I don't really care timing-wise as the loop is only iterating over a couple hundred thousand elements one time per run, which is an insignificant amount of time as a fraction of the program's run-time, but why not write efficient code when you can?)

                Sounds like somebody's got a case of the Mondays -Jeff

                J Offline
                J Offline
                Joe Woodbury
                wrote on last edited by
                #8

                Skippums wrote:

                int *const dst = new int[srcLen] - 1;

                This is seriously broken. You are allocating a pointer and then decrementing it. This is VERY bad (especially since you declare dst const and now can't delete it.) If you want to use 1 indexing, simply allocate one extra element and don't worry about element zero.

                Skippums wrote:

                for (size_t i = 0; i < srcLen; ) dst[++i] = src[i];

                The behavior here is technically undefined, however with VS 2008 it will increment i, then use the result as the index for both arrays.

                Skippums wrote:

                const int temp = src[i];

                The "const" here does nothing and ends up being slower, not just because you are confusing the compiler, but because it prevents the compiler from using an indexed assembly call.

                Skippums wrote:

                (I don't really care timing-wise as the loop is only iterating over a couple hundred thousand elements one time per run, which is an insignificant amount of time as a fraction of the program's run-time, but why not write efficient code when you can?)

                Because memcpy() will very likely be faster in this case and you aren't concentrating on where the optimizations will actually do some good. Assuming you have a sound algorithm, a general rule is writing sensible, plain C/C++ code will allow the compiler to do the best job optimizing (I've proven this many times to skeptics.) * * * * * Note: Out of curiosity, I benchmarked the various algorithms using just an int array of 10,000 and 100,000 elements.

                void Test1()
                {
                for (size_t i = 0; i < len; ++i)
                pDst[i + 1] = pSrc[i];
                }

                void Test2()
                {
                for (size_t i = 0; i < len; )
                {
                const int temp = pSrc[i];
                pDst[++i] = temp;
                }
                }

                void Test3()
                {
                memcpy(&pDst[1], pSrc, len * sizeof(int));
                }

                void Test4()
                {
                int *src_start = (int*) &pSrc[0];
                int *src_end = (int*) &pSrc[len];

                std::copy(src\_start, src\_end, &pDst\[1\]);
                

                }

                These are very artificial tests, but as expected Test3 & Test 4 were fastest by about 15%. Test4 was often slightly faster by a few cycles than Test3. I scratched my head since both end up at memcpy(), but Test4 has more apparent overhead. But then I realized it was the len * sizeof(int) calculation that slightly slows Test3(). Surprisingly, Test2 was ever so slightly faster

                N S 2 Replies Last reply
                0
                • L Luc Pattyn

                  OK, now I'll try and answer as good as I can, from memory: 1. in C the statement dst[i++]=i; is undefined, i.e. the exact order of the operation is not specified by the language specification. A compiler might first evaluate the RHS expression, then start working on the LHS; or it could first reduce the LHS to a simple target address, then evaluate the RHS, then assign. A compiler would even be allowed to unroll the loop (say by a factor of 2) and treat both halves differently. I don't think you'd like that. 2. in C++ things are slightly better, and AFAICR the spec got more stringent over time; the compiler builder still has options but is urged to document the choices he made; I have actually seen compiler vendors do it. So for a specific compiler, you might officially get away with such code. You still would not pass my code review though, my style guide would stop you in time. :-D

                  Skippums wrote:

                  why not write efficient code when you can?

                  I agree to some extent. When two styles are possible: if readability is the same, I'll opt for the higher performance variant. If a trade-of is to be made, I'll judge case by case. And if performance is all and everything, I'll put the normal code as a comment and write whatever it takes. But you have me puzzled even more by your base-0 and base-1 indexing. Can't you just make it all consistent (and I do mean base-0) throughout your app and get rid of the problem entirely? :)

                  Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum

                  Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.

                  S Offline
                  S Offline
                  Skippums
                  wrote on last edited by
                  #9

                  I have two arrays being sent to my function from an outside source. The first array is a list of 1-based indices into the the second array that contains the data. When the first array has an index of 0, this means that there is no associated data element in the second array. So, for example:

                  const size_t indexArray[] = {1, 0, 3, 2, 0};
                  DATA_TYPE *const dataArray = new DATA_TYPE[3] - 1; // This is a 1-indexed array and has three elements in it (since max(indexArray) == 3)

                  Now, as I said, no matter what I do I have to read in the values of dataArray using API calls on a per-element basis. I could very easily copy the values from indexArray into another variable in my C-code, subtracting 1 from each element as I perform the copy. However, I think it is more straightforward (and more efficient) to just get a pointer to the existing indexArray data (using an API call) and subtract one from my dynamically allocated dataArray object as I illustrated in the above code. Using this approach, I can compare each element from indexArray to 0 to test for the special condition, whereas if I did a copy and subtracted one I would have to test against numeric_limits<size_t>::max(), and pray that the max() macro from windows.h wasn't included by some other header file that I am including. I may also need to make the copy loop more complex, if I don't want to assume that the result of (0u - 1u) is the maximum unsigned integer. Hopefully that explains in better detail why I am not just converting it to 0-based indices.

                  Sounds like somebody's got a case of the Mondays -Jeff

                  1 Reply Last reply
                  0
                  • A Aescleal

                    Okay, my mind is completely boggled by this code... - you allocate an intrinsic array of size whatever - you decrement the pointer to the array - you then manually copy another array into this one starting at the same position you allocated the memory at What's wrong with...

                    int *src_start = ... // from API
                    int *src_end = ... // from API

                    int *p = new int[ src_end - src_start ];
                    std::copy( src_start, src_end, p );
                    int *p_to_1_based_alias = p - 1;

                    If that doesn't at least perform as well as your handrolled loop then trade your compiler in for a new one. You can also get rid of any explicit memory management using a vector. Another idea would be to just front end the array you get back from the API and convert the index:

                    template<std::size_t N, typename T>
                    class N_based_array
                    {
                    public:
                    N_based_array( T *data ) : data_( data ) {}

                        T &operator\[\]( std::size\_t index )
                        {
                            return data\_\[ index - N \];
                        }
                    
                    private:
                    	T \*data\_;
                    

                    };

                    Cheers, Ash

                    S Offline
                    S Offline
                    Skippums
                    wrote on last edited by
                    #10

                    I don't get a list of the type I want back from the API function call... I have to call each API method on a per-element basis. Basically, I have a cell-array from Matlab being passed in:

                    // prhs is of type "mxArray const *"
                    size_t const indexCount = mxGetNumberOfElements(prhs[0]);
                    size_t const dataSize = mxGetNumberOfElements(prhs[1]);
                    double const ** data = new double const *[dataSize];
                    // Get the 1-based indices into the data array
                    unsigned __int32 const *const indices = static_cast<unsigned __int32 const *>(mxGetData(prhs[0]));
                    for (size_t i = 0; i < dataSize; ++i) {
                    // Get a double* to the data in the cell array at 0-indexed location i
                    data[i] = mxGetPr(mxGetCell(prhs[1], i));
                    }
                    --data; // Make data 1-indexed so I don't have to modify "indices" to be 0-based

                    That is why I didn't do what you suggested.

                    Sounds like somebody's got a case of the Mondays -Jeff

                    A 1 Reply Last reply
                    0
                    • J Joe Woodbury

                      Skippums wrote:

                      int *const dst = new int[srcLen] - 1;

                      This is seriously broken. You are allocating a pointer and then decrementing it. This is VERY bad (especially since you declare dst const and now can't delete it.) If you want to use 1 indexing, simply allocate one extra element and don't worry about element zero.

                      Skippums wrote:

                      for (size_t i = 0; i < srcLen; ) dst[++i] = src[i];

                      The behavior here is technically undefined, however with VS 2008 it will increment i, then use the result as the index for both arrays.

                      Skippums wrote:

                      const int temp = src[i];

                      The "const" here does nothing and ends up being slower, not just because you are confusing the compiler, but because it prevents the compiler from using an indexed assembly call.

                      Skippums wrote:

                      (I don't really care timing-wise as the loop is only iterating over a couple hundred thousand elements one time per run, which is an insignificant amount of time as a fraction of the program's run-time, but why not write efficient code when you can?)

                      Because memcpy() will very likely be faster in this case and you aren't concentrating on where the optimizations will actually do some good. Assuming you have a sound algorithm, a general rule is writing sensible, plain C/C++ code will allow the compiler to do the best job optimizing (I've proven this many times to skeptics.) * * * * * Note: Out of curiosity, I benchmarked the various algorithms using just an int array of 10,000 and 100,000 elements.

                      void Test1()
                      {
                      for (size_t i = 0; i < len; ++i)
                      pDst[i + 1] = pSrc[i];
                      }

                      void Test2()
                      {
                      for (size_t i = 0; i < len; )
                      {
                      const int temp = pSrc[i];
                      pDst[++i] = temp;
                      }
                      }

                      void Test3()
                      {
                      memcpy(&pDst[1], pSrc, len * sizeof(int));
                      }

                      void Test4()
                      {
                      int *src_start = (int*) &pSrc[0];
                      int *src_end = (int*) &pSrc[len];

                      std::copy(src\_start, src\_end, &pDst\[1\]);
                      

                      }

                      These are very artificial tests, but as expected Test3 & Test 4 were fastest by about 15%. Test4 was often slightly faster by a few cycles than Test3. I scratched my head since both end up at memcpy(), but Test4 has more apparent overhead. But then I realized it was the len * sizeof(int) calculation that slightly slows Test3(). Surprisingly, Test2 was ever so slightly faster

                      N Offline
                      N Offline
                      Niklas L
                      wrote on last edited by
                      #11

                      Good points.

                      Joe Woodbury wrote:

                      const int temp = src[i]; The "const" here does nothing and ends up being slower, not just because you are confusing the compiler, but because it prevents the compiler from using an indexed assembly call.

                      Care to elaborate on this a bit? Why would the const affect the compiler? I would assume the compiler easier could chose an indexed call just because there is a const.

                      home

                      J 1 Reply Last reply
                      0
                      • J Joe Woodbury

                        Skippums wrote:

                        int *const dst = new int[srcLen] - 1;

                        This is seriously broken. You are allocating a pointer and then decrementing it. This is VERY bad (especially since you declare dst const and now can't delete it.) If you want to use 1 indexing, simply allocate one extra element and don't worry about element zero.

                        Skippums wrote:

                        for (size_t i = 0; i < srcLen; ) dst[++i] = src[i];

                        The behavior here is technically undefined, however with VS 2008 it will increment i, then use the result as the index for both arrays.

                        Skippums wrote:

                        const int temp = src[i];

                        The "const" here does nothing and ends up being slower, not just because you are confusing the compiler, but because it prevents the compiler from using an indexed assembly call.

                        Skippums wrote:

                        (I don't really care timing-wise as the loop is only iterating over a couple hundred thousand elements one time per run, which is an insignificant amount of time as a fraction of the program's run-time, but why not write efficient code when you can?)

                        Because memcpy() will very likely be faster in this case and you aren't concentrating on where the optimizations will actually do some good. Assuming you have a sound algorithm, a general rule is writing sensible, plain C/C++ code will allow the compiler to do the best job optimizing (I've proven this many times to skeptics.) * * * * * Note: Out of curiosity, I benchmarked the various algorithms using just an int array of 10,000 and 100,000 elements.

                        void Test1()
                        {
                        for (size_t i = 0; i < len; ++i)
                        pDst[i + 1] = pSrc[i];
                        }

                        void Test2()
                        {
                        for (size_t i = 0; i < len; )
                        {
                        const int temp = pSrc[i];
                        pDst[++i] = temp;
                        }
                        }

                        void Test3()
                        {
                        memcpy(&pDst[1], pSrc, len * sizeof(int));
                        }

                        void Test4()
                        {
                        int *src_start = (int*) &pSrc[0];
                        int *src_end = (int*) &pSrc[len];

                        std::copy(src\_start, src\_end, &pDst\[1\]);
                        

                        }

                        These are very artificial tests, but as expected Test3 & Test 4 were fastest by about 15%. Test4 was often slightly faster by a few cycles than Test3. I scratched my head since both end up at memcpy(), but Test4 has more apparent overhead. But then I realized it was the len * sizeof(int) calculation that slightly slows Test3(). Surprisingly, Test2 was ever so slightly faster

                        S Offline
                        S Offline
                        Skippums
                        wrote on last edited by
                        #12

                        Joe Woodbury wrote:

                        This is seriously broken. You are allocating a pointer and then decrementing it. This is VERY bad (especially since you declare dst const and now can't delete it.) If you want to use 1 indexing, simply allocate one extra element and don't worry about element zero.

                        Actually you can delete it:

                        delete[] (dst + 1);

                        Joe Woodbury wrote:

                        The behavior here is technically undefined, however with VS 2008 it will increment i, then use the result as the index for both arrays.

                        That was my original question... Thank you for answering it! Even if it would have done what I wanted, I still probably wouldn't have used it due to code maintainability issues.

                        Joe Woodbury wrote:

                        The "const" here does nothing

                        Actually, the const there prevents people in the future from modifying the value stored in temp, which may prevent coder error. In the simple loop I proposed, it probably will have no effect either way, since screwing up the loop through human error is unlikely and I'm guessing VS2008 is smart enough not to allocate memory for it. Also, the actual loop I am using is more complex than this example, so the compiler couldn't have used an indexed assembly call anyway.

                        Joe Woodbury wrote:

                        Because memcpy() will very likely be faster in this case

                        I can't use memcpy... that is part of the complexity of my problem. See my response to the post immediately before yours if you are curious as to why I can't use that function. I do fully agree with your statement:

                        Joe Woodbury wrote:

                        Assuming you have a sound algorithm, a general rule is writing sensible, plain C/C++ code

                        Thanks for the help!

                        Sounds like somebody's got a case of the Mondays -Jeff

                        J 1 Reply Last reply
                        0
                        • N Niklas L

                          Good points.

                          Joe Woodbury wrote:

                          const int temp = src[i]; The "const" here does nothing and ends up being slower, not just because you are confusing the compiler, but because it prevents the compiler from using an indexed assembly call.

                          Care to elaborate on this a bit? Why would the const affect the compiler? I would assume the compiler easier could chose an indexed call just because there is a const.

                          home

                          J Offline
                          J Offline
                          Joe Woodbury
                          wrote on last edited by
                          #13

                          for (size_t i = 0; i < srcLen; ++i)
                          dst[i + 1] = src[i];

                          for (size_t i = 0; i < srcLen; )
                          {
                          const int temp = src[i];
                          dst[++i] = temp;
                          }

                          Sorry, I was typing fast and was rather confusing. I should have said that the second doesn't take advantage of the indexing assembly can do. In assembly both calls use something like mov DWORD PTR [ecx+eax*4], edx, the first, however, "sees" the add by one and simply makes the final copy mov DWORD PTR [eax+edx*4+4], ecx the instruction is one extra byte. Now you could write the function as follows, however, my experience is that it will end up running slower. Partly because the compiler may not make pDst and pSrc register variables (due to not enough room) and partly because of the above.

                          int* pDst = &dst[1];
                          int* pSrc = &src[0]; // just to be more clear

                          for (size_t i = 0; i < srcLen; ++i)
                          *pDst++ = *pSrc++;

                          [Note: Turns out that in an admittedly isolated environment, the second algorithm and third algorithms are slightly faster than the first, which surprised me, but that's the voodoo of assembly and modern CPUs. memcpy() is a solid 15% faster. Of course, memcpy is likely using SSE2 instructions which makes it even faster than rep movsd.

                          modified on Tuesday, August 24, 2010 2:58 PM

                          N 1 Reply Last reply
                          0
                          • S Skippums

                            Joe Woodbury wrote:

                            This is seriously broken. You are allocating a pointer and then decrementing it. This is VERY bad (especially since you declare dst const and now can't delete it.) If you want to use 1 indexing, simply allocate one extra element and don't worry about element zero.

                            Actually you can delete it:

                            delete[] (dst + 1);

                            Joe Woodbury wrote:

                            The behavior here is technically undefined, however with VS 2008 it will increment i, then use the result as the index for both arrays.

                            That was my original question... Thank you for answering it! Even if it would have done what I wanted, I still probably wouldn't have used it due to code maintainability issues.

                            Joe Woodbury wrote:

                            The "const" here does nothing

                            Actually, the const there prevents people in the future from modifying the value stored in temp, which may prevent coder error. In the simple loop I proposed, it probably will have no effect either way, since screwing up the loop through human error is unlikely and I'm guessing VS2008 is smart enough not to allocate memory for it. Also, the actual loop I am using is more complex than this example, so the compiler couldn't have used an indexed assembly call anyway.

                            Joe Woodbury wrote:

                            Because memcpy() will very likely be faster in this case

                            I can't use memcpy... that is part of the complexity of my problem. See my response to the post immediately before yours if you are curious as to why I can't use that function. I do fully agree with your statement:

                            Joe Woodbury wrote:

                            Assuming you have a sound algorithm, a general rule is writing sensible, plain C/C++ code

                            Thanks for the help!

                            Sounds like somebody's got a case of the Mondays -Jeff

                            J Offline
                            J Offline
                            Joe Woodbury
                            wrote on last edited by
                            #14

                            Skippums wrote:

                            I'm guessing VS2008 is smart enough not to allocate memory for it

                            It actual does since it runs out of registers (I was a little surprised by this, but it makes sense once you look at the dissassembled code.) In practice, though, I'd imagine the cache would prevent this from being a big performance hit.

                            P 1 Reply Last reply
                            0
                            • J Joe Woodbury

                              for (size_t i = 0; i < srcLen; ++i)
                              dst[i + 1] = src[i];

                              for (size_t i = 0; i < srcLen; )
                              {
                              const int temp = src[i];
                              dst[++i] = temp;
                              }

                              Sorry, I was typing fast and was rather confusing. I should have said that the second doesn't take advantage of the indexing assembly can do. In assembly both calls use something like mov DWORD PTR [ecx+eax*4], edx, the first, however, "sees" the add by one and simply makes the final copy mov DWORD PTR [eax+edx*4+4], ecx the instruction is one extra byte. Now you could write the function as follows, however, my experience is that it will end up running slower. Partly because the compiler may not make pDst and pSrc register variables (due to not enough room) and partly because of the above.

                              int* pDst = &dst[1];
                              int* pSrc = &src[0]; // just to be more clear

                              for (size_t i = 0; i < srcLen; ++i)
                              *pDst++ = *pSrc++;

                              [Note: Turns out that in an admittedly isolated environment, the second algorithm and third algorithms are slightly faster than the first, which surprised me, but that's the voodoo of assembly and modern CPUs. memcpy() is a solid 15% faster. Of course, memcpy is likely using SSE2 instructions which makes it even faster than rep movsd.

                              modified on Tuesday, August 24, 2010 2:58 PM

                              N Offline
                              N Offline
                              Niklas L
                              wrote on last edited by
                              #15

                              Thanks for the explanation. It makes sense if it depends on the indexing rather than the const keyword.

                              home

                              1 Reply Last reply
                              0
                              • J Joe Woodbury

                                Skippums wrote:

                                I'm guessing VS2008 is smart enough not to allocate memory for it

                                It actual does since it runs out of registers (I was a little surprised by this, but it makes sense once you look at the dissassembled code.) In practice, though, I'd imagine the cache would prevent this from being a big performance hit.

                                P Offline
                                P Offline
                                Paul Michalik
                                wrote on last edited by
                                #16

                                One question: Have you measured how much faster those a-priori optimizations are running compared to the very cleanly coded algorithm from the post above (using std::copy)?

                                J 1 Reply Last reply
                                0
                                • S Skippums

                                  I don't get a list of the type I want back from the API function call... I have to call each API method on a per-element basis. Basically, I have a cell-array from Matlab being passed in:

                                  // prhs is of type "mxArray const *"
                                  size_t const indexCount = mxGetNumberOfElements(prhs[0]);
                                  size_t const dataSize = mxGetNumberOfElements(prhs[1]);
                                  double const ** data = new double const *[dataSize];
                                  // Get the 1-based indices into the data array
                                  unsigned __int32 const *const indices = static_cast<unsigned __int32 const *>(mxGetData(prhs[0]));
                                  for (size_t i = 0; i < dataSize; ++i) {
                                  // Get a double* to the data in the cell array at 0-indexed location i
                                  data[i] = mxGetPr(mxGetCell(prhs[1], i));
                                  }
                                  --data; // Make data 1-indexed so I don't have to modify "indices" to be 0-based

                                  That is why I didn't do what you suggested.

                                  Sounds like somebody's got a case of the Mondays -Jeff

                                  A Offline
                                  A Offline
                                  Aescleal
                                  wrote on last edited by
                                  #17

                                  So what's wrong with the wrapping class solution to convert your array to a 1 based one? Seeing your code there you could use std::generate on a std::vector to get the same effect as your manual loop. Generally if you're doing low level memory fiddling, pointer arithmetic and looping at the same time in C++ there's usually a simpler way. Cheers, Ash

                                  1 Reply Last reply
                                  0
                                  • P Paul Michalik

                                    One question: Have you measured how much faster those a-priori optimizations are running compared to the very cleanly coded algorithm from the post above (using std::copy)?

                                    J Offline
                                    J Offline
                                    Joe Woodbury
                                    wrote on last edited by
                                    #18

                                    Repeating my edit above: Out of curiosity, I benchmarked the various algorithms using just an int array of 10,000 and 100,000 elements.

                                    void Test1()
                                    {
                                    for (size_t i = 0; i < len; ++i)
                                    pDst[i + 1] = pSrc[i];
                                    }

                                    void Test2()
                                    {
                                    for (size_t i = 0; i < len; )
                                    {
                                    const int temp = pSrc[i];
                                    pDst[++i] = temp;
                                    }
                                    }

                                    void Test3()
                                    {
                                    memcpy(&pDst[1], pSrc, len * sizeof(int));
                                    }

                                    void Test4()
                                    {
                                    int *src_start = (int*) &pSrc[0];
                                    int *src_end = (int*) &pSrc[len];

                                    std::copy(src\_start, src\_end, &pDst\[1\]);
                                    

                                    }

                                    These are very artificial tests, but as expected Test3 & Test 4 were fastest by about 15%. Test4 was often slightly faster by a few cycles than Test3. I scratched my head since both end up at memcpy(), but Test4 has more apparent overhead. But then I realized it was the len * sizeof(int) calculation that slightly slows Test3(). Surprisingly, Test2 was ever so slightly faster than Test1 (by about 0.1% - 0.5% on my system.) I suspect the CPU cache covers the "save" of the register.

                                    P 1 Reply Last reply
                                    0
                                    • S Skippums

                                      I assume you meant to use "i++" on the third line of your second example, but understand what you meant. I really thought that the statements:

                                      int i = 1;
                                      int j = (i) + ++i;

                                      would be equivalent to "int j = 3;", but no matter how I apply the parentheses, it always comes out as 4, just as your response predicts it would. Thanks,

                                      Sounds like somebody's got a case of the Mondays -Jeff

                                      N Offline
                                      N Offline
                                      norish
                                      wrote on last edited by
                                      #19

                                      This would be;

                                      int i = 1;
                                      i = i + 1;
                                      int j = i + i;

                                      :)

                                      1 Reply Last reply
                                      0
                                      • J Joe Woodbury

                                        Repeating my edit above: Out of curiosity, I benchmarked the various algorithms using just an int array of 10,000 and 100,000 elements.

                                        void Test1()
                                        {
                                        for (size_t i = 0; i < len; ++i)
                                        pDst[i + 1] = pSrc[i];
                                        }

                                        void Test2()
                                        {
                                        for (size_t i = 0; i < len; )
                                        {
                                        const int temp = pSrc[i];
                                        pDst[++i] = temp;
                                        }
                                        }

                                        void Test3()
                                        {
                                        memcpy(&pDst[1], pSrc, len * sizeof(int));
                                        }

                                        void Test4()
                                        {
                                        int *src_start = (int*) &pSrc[0];
                                        int *src_end = (int*) &pSrc[len];

                                        std::copy(src\_start, src\_end, &pDst\[1\]);
                                        

                                        }

                                        These are very artificial tests, but as expected Test3 & Test 4 were fastest by about 15%. Test4 was often slightly faster by a few cycles than Test3. I scratched my head since both end up at memcpy(), but Test4 has more apparent overhead. But then I realized it was the len * sizeof(int) calculation that slightly slows Test3(). Surprisingly, Test2 was ever so slightly faster than Test1 (by about 0.1% - 0.5% on my system.) I suspect the CPU cache covers the "save" of the register.

                                        P Offline
                                        P Offline
                                        Paul Michalik
                                        wrote on last edited by
                                        #20

                                        Thanks for the insight. A nice lesson for the "early optimizers"...

                                        1 Reply Last reply
                                        0
                                        • S Skippums

                                          Well actually, I am trying to write something like the following where I am copying from a 0-indexed array to a 1-indexed array:

                                          const int src[] = {0, 1};
                                          const size_t srcLen = sizeof(src) / sizeof(int);
                                          int *const dst = new int[srcLen] - 1;
                                          for (size_t i = 0; i < srcLen; ++i)
                                          dst[i + 1] = src[i];

                                          NOTE: I have simplified the array copy in the above example... I am actually using an external API function to get the value of the src array elements, which is why I am not just using the statement "const int* const dst = &src[0] - 1;". The compiler will probably optimize this by keeping the value of i + 1 from the array assignment for the next loop iteration in a register (as opposed to recomputing it), but if I could write the for-loop as:

                                          for (size_t i = 0; i < srcLen; )
                                          dst[++i] = src[i];

                                          This would *almost* guarantee that i is incremented only once. Given the initial response, I can't have a single statement with two different values for i, so I'm not able to do what I wanted anyway. It would probably have a higher probability of being optimized if I wrote the loop as:

                                          for (size_t i = 0; i < srcLen; ) {
                                          const int temp = src[i];
                                          dst[++i] = temp;
                                          }

                                          Almost every compiler available would put temp in a register instead of allocating it on the stack, but I really don't want to write the loop like that. Guess I'll keep my fingers crossed that the compiler optimizes it! (I don't really care timing-wise as the loop is only iterating over a couple hundred thousand elements one time per run, which is an insignificant amount of time as a fraction of the program's run-time, but why not write efficient code when you can?)

                                          Sounds like somebody's got a case of the Mondays -Jeff

                                          S Offline
                                          S Offline
                                          Stefan_Lang
                                          wrote on last edited by
                                          #21

                                          If your intent is accessing two arrays within a loop, then - no matter the relation between indices - the fastest way would be to use two pointers to the individual elements and increment these. If you're so intent on improving performance, consider this: every direct access to an array element via an index value requires 1. loading the start address of the array 2. get the size of an element 3. multiplying that by the index(-1) 4. adding that to the start address 5. dereference this address to get to the actual element. As opposed to using pointers which just requires 1. load the pointer 2. dereference it Of course, incrementing the pointers eats up most of this advantage, as you have to add the element size each iteration. And most likely a good optimizer will convert your code into something like this anyway. What I want to say, using an index is just complicating things unneccesarily if all you want to do is access each element sequentially.

                                          1 Reply Last reply
                                          0
                                          Reply
                                          • Reply as topic
                                          Log in to reply
                                          • Oldest to Newest
                                          • Newest to Oldest
                                          • Most Votes


                                          • Login

                                          • Don't have an account? Register

                                          • Login or register to search.
                                          • First post
                                            Last post
                                          0
                                          • Categories
                                          • Recent
                                          • Tags
                                          • Popular
                                          • World
                                          • Users
                                          • Groups