A very small performance issue in C++
-
Hi,

I have a query on pointer arithmetic. I declared a structure like this:

    struct myStruct{ int index; char sz[124]; };

Inside main(), I created 10000 objects of the structure and assigned the array to a pointer variable.

    myStruct* obj = new myStruct[10001];
    memset( obj,0,sizeof(myStruct) * 10005);
    myStruct* obj1 = obj;
    long l1 = 0;
    for( long l=0;l<100000000;l++)
    {
        obj++;
        l1 += obj->index; //just an addition so that the compiler won't remove the previous line
        if( l % 10000 == 0 ){
            obj = obj1;
            l1 = 0; //resetting the values
        }
    }

When I executed this, it took 8 sec. But if I reduce the size of myStruct to 64, it takes only 4 sec, i.e.

    struct myStruct{ int index; char sz[60]; };

and reducing the size again to 32, it takes only 2 sec.

I examined the assembly code generated by Disassembly, and the only change was in the code for the 2nd line, obj++;

    mov edx,dword ptr [ebp-14h]
    add edx,100h
    mov dword ptr [ebp-14h],edx

That is:

    add edx,100h ( 256 b ) time : 8sec
    add edx,20h ( 32 b ) time : 2sec

Can anybody please give me an explanation for this behaviour? I was able to reproduce this behaviour on Unix also.

Thanks and Regards
Jagadeesh

"A robust program is resistant to errors -- it either works correctly, or it does not work at all; whereas a fault tolerant program must actually recover from errors."
-
Hello,

Jagadeesh VN wrote:
Inside the main(), I created 10000 objects of the structure and assigned it to a pointer variable.

    myStruct* obj = new myStruct[10001];
    memset( obj,0,sizeof(myStruct) * 10005);

Here you actually create 10,001 structs and initialize 10,005! This is very wrong: memset() writes past the end of the array.

Jagadeesh VN wrote:

    for( long l=0;l<100000000;l++)
    {
        obj++;
        l1 += obj->index; //just an addition so that the compiler won't remove the previous line
        if( l % 10000 == 0 ){
            obj = obj1;
            l1 = 0; //resetting the values
        }
    }

A couple of questions on this piece of code:
- Why the dummy addition (l1 += obj->index;)? The compiler won't remove the line, since you use the object further down your code!
- Why such a long loop?
- Your loop does nothing useful, why?

To get to an answer: I honestly don't think that the long loop is your problem. I think the problem lies with memset(). Let me explain: your execution time drops linearly with the size of the struct, and the size of the array decreases linearly as well. (You halve the size of the struct, so the size of the array is also halved.) So the amount of memory that memset() has to fill is also halved!

Besides that, your loop executes 100 million times! Don't expect that to finish in a few ms.

Behind every great black man... ... is the police. - Conspiracy brother
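To make the count mismatch pointed out above concrete, here is a minimal sketch of the same walk with a single element count used for allocation, zeroing, and traversal. The function name walk and the use of std::vector are my own choices, not part of the original code, and the wrap-around is done by position rather than by the loop counter, so this is an illustration rather than an exact replacement.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct myStruct {
    int  index;
    char sz[124];
};

// Hypothetical helper: walks `count` zeroed elements `iterations` times in total.
// One count drives allocation, initialization, and traversal, so nothing is
// written or read past the end of the array.
long walk(std::size_t count, long iterations) {
    assert(count > 0);
    // Value-initialization zeroes every member of every element,
    // which replaces the separate memset() call.
    std::vector<myStruct> objs(count);

    std::size_t pos = 0;
    long total = 0;
    for (long l = 0; l < iterations; ++l) {
        ++pos;
        if (pos == count) {
            pos = 0;  // wrap around instead of running off the end
        }
        total += objs[pos].index;  // the read keeps the loop from being trivially empty
    }
    return total;
}
```

Calling walk(10001, 100000000) mirrors the sizes from the original post. If you prefer to keep new[] and memset(), the point is the same: pass sizeof(myStruct) multiplied by the same count that was given to new[].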
-
This is an effect of the L1 cache. You did not mention the CPU you were using; I assume it was a P4. The L1 cache line size is 64 bytes, if my memory serves me right. If there are no alignment issues (your compiler often does a good job at this), the smaller the structure is, the more elements fit into a single L1 line, so each line fetched does more useful work. Nowadays people often talk about structure of arrays (SoA), which addresses such issues.
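As a rough illustration of the SoA idea, the sketch below stores all index values contiguously and counts how many cache lines a walk over the index field would touch for a given stride. The names MyStructAoS, MyStructSoA, linesTouched, and the 64-byte line size are assumptions for the example (the line size is taken from the guess above, not measured), and the count assumes the array starts on a line boundary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed cache line size in bytes; not queried from the hardware.
constexpr std::size_t kAssumedLineBytes = 64;

// Array-of-structures layout from the question: 128 bytes per element.
struct MyStructAoS {
    int  index;
    char sz[124];
};

// Structure-of-arrays layout: all `index` values sit next to each other.
struct MyStructSoA {
    std::vector<int>  index;
    std::vector<char> sz;  // element i owns bytes [i * 124, (i + 1) * 124)
    explicit MyStructSoA(std::size_t count)
        : index(count, 0), sz(count * 124, 0) {}
};

long sumIndexAoS(const std::vector<MyStructAoS>& v) {
    long total = 0;
    for (const MyStructAoS& s : v) total += s.index;  // one int used per 128-byte step
    return total;
}

long sumIndexSoA(const MyStructSoA& s) {
    long total = 0;
    for (int value : s.index) total += value;  // consecutive ints, 4-byte step
    return total;
}

// Counts the distinct cache lines touched when reading one int at the start
// of each of `count` elements laid out `strideBytes` apart.
std::size_t linesTouched(std::size_t count, std::size_t strideBytes) {
    assert(strideBytes >= sizeof(int));  // elements must not overlap
    std::size_t lines = 0;
    std::size_t previousLine = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t line = (i * strideBytes) / kAssumedLineBytes;
        if (i == 0 || line != previousLine) {
            ++lines;
            previousLine = line;
        }
    }
    return lines;
}
```

Under these assumptions, linesTouched(10000, 32) gives 5000 lines and linesTouched(10000, 4) gives 625, while strides of 64 and 128 both give 10000. So the line count alone accounts for the step from 32 to 64 bytes but not obviously for the step from 64 to 128; other factors, such as the total working set size, may also play a role there.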