Does the memory manager hate 1024 and 2048?
-
To be honest, I can hardly believe the result myself. In a simple test, I found that memory access in a matrix with a side of 1024 (or a similar size) is much slower than with a side of 1023 or 1025. I don't know the real reason.

My test program is very simple. Its aim is to measure the cost of memory copies within a matrix. It takes three parameters: M is the side of the matrix, N is the number of rows or columns to copy, and Z is a repeat count that makes the program run long enough to give a stable result.

#include
#include
#include
#include
void main(int argc, char** argv)
{
    int M,N,Z;
    M = atoi(argv[1]);
    N = atoi(argv[2]);
    Z = atoi(argv[3]);
    char *buf, *buf2;
    buf = new char[M*M];
    memset(buf, 1, M*M);
    buf2 = new char[M*M];
    memset(buf2, 2, M*M);
    int i,j,k;
    clock_t a = clock();
    for (k=0; k
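Since the listing above is cut off inside the timing loop, here is a minimal sketch of the kind of tester the post describes. It is not the poster's code: the helper names `copy_rows`, `copy_cols` and `time_cols` are hypothetical, and the loop bodies are an assumption based only on the description of M, N and Z.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <ctime>
#include <vector>

// Hypothetical helper: copy the first n rows of an m x m byte matrix.
// Rows are contiguous in memory, so this is a single copy of n*m bytes.
void copy_rows(const char* src, char* dst, int m, int n) {
    assert(n >= 0 && n <= m);
    std::memcpy(dst, src, static_cast<std::size_t>(n) * m);
}

// Hypothetical helper: copy the first n columns of an m x m byte matrix.
// Every row contributes n bytes, and successive copies start m bytes apart.
void copy_cols(const char* src, char* dst, int m, int n) {
    assert(n >= 0 && n <= m);
    for (int i = 0; i < m; ++i) {
        std::size_t off = static_cast<std::size_t>(i) * m;
        std::memcpy(dst + off, src + off, static_cast<std::size_t>(n));
    }
}

// Time z repetitions of the column copy and return the elapsed clock() ticks.
std::clock_t time_cols(int m, int n, int z) {
    std::vector<char> a(static_cast<std::size_t>(m) * m, 1);
    std::vector<char> b(static_cast<std::size_t>(m) * m, 2);
    std::clock_t start = std::clock();
    for (int k = 0; k < z; ++k)
        copy_cols(a.data(), b.data(), m, n);
    return std::clock() - start;
}
```

Calling `time_cols(1023, 3, z)`, `time_cols(1024, 3, z)` and `time_cols(1025, 3, z)` with the same `z` would give the three timings being compared; the actual numbers depend on the machine and compiler settings.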
From what I see here:
for (k=0; kN); }
the majority of the time you are just copying 3 bytes (assuming N=3), not 1023, 1024, etc. On the first iteration, however, I would think you spend more time multiplying N*M, and since N=3, in no case do you get a power of 2...
"...Ability to type is not enough to become a Programmer. Unless you type in VB. But then again you have to type really fast..." Me
-
The question is not whether the program can be sped up. It is just a speed tester. The main trouble is in the second loop. Why does 1024 perform much worse than 1023 and 1025 in the same program, even though 1025 is bigger than 1024? Any idea?
-
"Why does 1024 perform much worse than 1023 and 1025 in the same program, even though 1025 is bigger than 1024?"

You are not allocating 1024 bytes on the heap and you are not copying 1024 bytes. So what exactly do you want to test? If it is the speed of memcpy, then I'll give you an answer: the shorter the better. Just check the code: there is nothing specific to powers of 2... If you assume that an allocation of size 1024 would be aligned at a 1024 address: wrong again. That is true for HeapAlloc, but you are using new, which is the default malloc, and that in reality allocates your requested size + 4 bytes in front, to be returned by _msize()...

Now I agree that getting non-linear results seems weird. However, are you running any kind of optimization?... I would recommend trying to decrease Z while making N=M, and see what happens...
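The alignment point can be checked directly rather than assumed. A small sketch, in which `offset_in` and `print_offsets` are hypothetical helpers; what it prints depends on the runtime library and allocator in use, so no particular value should be expected.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Offset of a pointer within a block of `boundary` bytes (0 means aligned).
std::size_t offset_in(const void* p, std::size_t boundary) {
    return static_cast<std::size_t>(
        reinterpret_cast<std::uintptr_t>(p) % boundary);
}

// Allocate an m x m buffer with new[] and print where it lands relative
// to a 1024-byte boundary.
void print_offsets(int m) {
    char* buf = new char[static_cast<std::size_t>(m) * m];
    std::printf("M=%d: buffer starts %lu bytes past a 1024-byte boundary\n",
                m, static_cast<unsigned long>(offset_in(buf, 1024)));
    delete[] buf;
}
```

Running `print_offsets(1023)`, `print_offsets(1024)` and `print_offsets(1025)` shows whether the three buffers start at comparable offsets.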
-
In fact, I am working on a tiled image application in which each tile sometimes needs several rows or columns of data from a neighbouring tile. That is why I wrote the tester. In this situation you will find that 1024 is not a good choice: even a matrix with a side of 1025 (without using the extra pixel) works better than one of exactly 1024.

I don't think I turned on any special optimization; I just used the default Release settings for a console project in VS7. I also copied my tester to another PC, and the result was almost the same.

PS: my OS is Win2k, the CPU is a P4 1.7 GHz, and memory is 512 MB.
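The padding workaround mentioned above (allocate a side of 1025 but use only 1024) could be sketched roughly as follows. `PaddedTile` and `copy_left_columns` are hypothetical names for illustration, not part of the application, and the sketch makes no claim about how much time the padding saves on a given machine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical tile: size x size usable bytes, stored with a row stride of
// size + pad so that consecutive rows are not exactly `size` bytes apart.
struct PaddedTile {
    int size;                // usable side length, e.g. 1024
    int stride;              // bytes between row starts, e.g. 1025
    std::vector<char> data;  // stride * size bytes

    PaddedTile(int size_, int pad, char fill)
        : size(size_), stride(size_ + pad),
          data(static_cast<std::size_t>(size_ + pad) * size_, fill) {}

    char* row(int y) {
        return data.data() + static_cast<std::size_t>(y) * stride;
    }
    const char* row(int y) const {
        return data.data() + static_cast<std::size_t>(y) * stride;
    }
};

// Copy the first n columns of src into dst; the padding bytes are never used.
void copy_left_columns(const PaddedTile& src, PaddedTile& dst, int n) {
    assert(src.size == dst.size && n >= 0 && n <= src.size);
    for (int y = 0; y < src.size; ++y)
        std::memcpy(dst.row(y), src.row(y), static_cast<std::size_t>(n));
}
```

With `PaddedTile a(1024, 1, 0)` the tile still exposes 1024 usable bytes per row, while addressing goes through `row(y)` and therefore uses the 1025-byte stride. Setting `pad` to 0 gives the unpadded layout for comparison.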