Interesting problems

FlyingDancer

Program code:

#include <stdio.h>
#include <time.h>
#define N 1024
int i,j,k;
float slice[N][N];
void main()
{
	time_t start,end;
	float s;
	start=time(NULL);
	for(k=0;k<100;k++)
	{
		for(j=0;j<N;j++)
			for(i=0;i<N;i++)
			{
				slice[i][j]=(float)(slice[i][j]+0.01);
				slice[j][i]=(float)(slice[j][i]+0.01);
			}
		printf("%d\n",k);
	}
	end=time(NULL);
	s=difftime(end,start);
	printf(" The total time is %f:",s);
}

Questions or problems (compiled with Visual C++ 6.0):

1. If N equals 1022, 1023, 1025 or 1026, the run time is about 13 seconds, but if N = 1024 it is about 56 seconds. That is, the speed is very different.

2. "slice[j][i]=(float)(slice[j][i]+0.01);" is executed more than two times faster than "slice[i][j]=(float)(slice[i][j]+0.01);". You can try it by removing one of these statements.

3. An exception happens if "int i,j,k; float slice[N][N];" is moved into function main(). The message says "test.exe has encountered a problem and needs to close. We are sorry for the inconvenience." In addition, my program is named "test.cpp".

Do you know about these problems? Could you give me an explanation and tell me how to avoid these bad results? Any help is appreciated! Thanks!

Maxwell Chen

With VC++6 and with VC++7 (2002), I saw the same situation. Regarding what FlyingDancer wrote, that an exception happens if "float slice[N][N];" is moved into function main(): please try this, since you use the .cpp extension. It does not crash.

#include <stdio.h> // needed for printf
#include <iostream>
#include <time.h>

// #define N 1023
// float slice[N][N];
void main()
{
int i,j,k;
time_t start,end;
float s;
const int N = 1024;
float (*slice)[N] = new float[N][N];

start=time(NULL);
for(k=0;k<100;k++)
{
	for(j=0;j<N;j++)
		for(i=0;i<N;i++)
		{
			slice[i][j]=(float)(slice[i][j]+0.01F);
			slice[j][i]=(float)(slice[j][i]+0.01F);
		}
	printf("%d\n",k);
}
end=time(NULL);
s=difftime(end,start);
printf("   The total time is %f:",s);
delete[] slice;

}

Regarding 1024 taking that long, I dunno. I guess that it may be the x86 instructions... Maxwell Chen

FlyingDancer

Yeah, it works well, leaving aside its run speed. Your way is to use a pointer to an array, which seems a little different, but what exactly does using an array pointer solve for a big array like slice[N][N]? This is a difficult problem, really. Is it related to the OS or to the way it is compiled? Or maybe it is a memory allocation problem... What do you think about this?

ohadp

It crashes because, if not allocated dynamically, this array is allocated on the stack, which probably can't hold 1024*1024*4 bytes...
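The arithmetic behind this can be sketched as follows. This is only an illustration: the helper names (slice_bytes, fits_on_stack, make_slice) are made up for this example, and the 1 MB figure is the usual default stack reserve of the Microsoft linker, which is an assumption here because it can be changed per executable.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Bytes needed by an n-by-n array of float.
constexpr std::size_t slice_bytes(std::size_t n) {
    return n * n * sizeof(float);
}

// Assumption: 1 MB default stack reserve (the Microsoft linker default).
// It is a default, not a hard limit, and can be changed at link time.
constexpr std::size_t kAssumedStackBytes = 1024 * 1024;

// True if an n-by-n float array would fit within the assumed stack size.
// The real stack also holds other frames, so this is an optimistic check.
constexpr bool fits_on_stack(std::size_t n) {
    return slice_bytes(n) < kAssumedStackBytes;
}

// Heap-backed replacement for a local "float slice[n][n]", zero-filled.
// Element (i, j) lives at index i * n + j.
std::vector<float> make_slice(std::size_t n) {
    assert(n > 0);
    return std::vector<float>(n * n, 0.0f);
}
```

With 4-byte floats, slice_bytes(1024) is 4 MB, so fits_on_stack(1024) is false, while a std::vector returned by make_slice(1024) keeps the data on the heap and leaves the stack alone.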

FlyingDancer

Problem: "slice[j][i]=(float)(slice[j][i]+0.01F);" is executed faster than "slice[i][j]=(float)(slice[i][j]+0.01F);"

I think it should be answered from two aspects:

1. In VC, a two-dimensional array is stored row by row: all of the first row, then all of the second row, and so on.

2. Virtual memory technology (paging and swapping). In this problem the rows of the array cannot all get enough free space, so accessing data in another row triggers a swap. Therefore one statement runs faster than the other.

Am I right?
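The row-by-row storage in point 1 can be checked directly. The sketch below uses a small, hypothetical 4x4 array named grid (not the slice array from the program) and compares the offset measured from real addresses with the formula i * kDim + j.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kDim = 4;   // small, hypothetical size for illustration
float grid[kDim][kDim];           // stands in for slice[N][N]

// Offset of grid[i][j] from grid[0][0], counted in floats and
// measured from the addresses the compiler actually uses.
std::size_t measured_offset(std::size_t i, std::size_t j) {
    assert(i < kDim && j < kDim);
    const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(&grid[0][0]);
    const std::uintptr_t elem = reinterpret_cast<std::uintptr_t>(&grid[i][j]);
    return static_cast<std::size_t>((elem - base) / sizeof(float));
}

// Offset predicted by row-major storage: whole rows first, then the column.
std::size_t row_major_offset(std::size_t i, std::size_t j) {
    return i * kDim + j;
}
```

Stepping the second index moves one float forward in memory, while stepping the first index jumps a whole row of kDim floats. That difference in stride is what separates slice[j][i] from slice[i][j] in the inner loop.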

Maxwell Chen

:-D Maxwell Chen

Paul Ranson

1. I don't know, other than it's likely to be a virtual memory pathology.

2. This is your code,

// v1
for ( j = 0; j < N; j++ )
{
	for ( i = 0; i < N; i++ )
	{
		slice [i][j] = (float)(slice[i][j] + 0.01 ) ;
	}
}
// which is equivalent to
for ( j = 0; j < N; j++ )
{
	for ( i = 0; i < N; i++ )
	{
		float * pf = &slice[0][0] + ( i * N ) + j ;
		*pf += 0.01 ;
	}
}

// v2
for ( j = 0; j < N; j++ )
{
	for ( i = 0; i < N; i++ )
	{
		slice [j][i] = (float)(slice[j][i] + 0.01 ) ;
	}
}
// which is equivalent to
for ( j = 0; j < N; j++ )
{
	float * pf = &slice[0][0] + ( j * N ) ;
	for ( i = 0; i < N; i++ )
	{
		*pf += 0.01 ;
		++pf ;
	}
}

IOW in the first example you are asking the CPU to do an extra multiplication each time around the inner loop. The optimiser may be able to turn it into an addition (if that's faster...), but it's still extra work.

More subtly, the second example accesses memory consecutively, so the data is much more likely to be in the CPU cache, whereas the first accesses every N * sizeof ( float ) bytes, which means the next value will never be in the cache. Accessing main memory means waiting about, and accessing the cache puts that off. Since the cache is read from and written to main memory in relatively large chunks, you will get an entire 'cache line' of modified values going to main memory in the same time as it takes to write one.

Anyway it would be worth examining the generated machine code for each example to see what the optimiser actually does, and perhaps play with the options.

3. The default stack size for Win32 is 1MB. You are asking to allocate 4MB (sizeof ( float ) == 4) so the only way is to exit with an exception. You can adjust this in the linker, or with EditBin, but for a data structure of this nature either declaring it statically as in your example or allocating it on the heap as in Maxwell's is appropriate.

Paul
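For anyone who wants to measure the two access orders separately, here is a minimal sketch. The function names (bump_column_order, bump_row_order, time_passes) are invented for this example, the array is a flat std::vector rather than slice[N][N], and the timings depend entirely on the machine and compiler options, so no numbers are claimed.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// v1 order: the inner loop walks down a column, so consecutive
// accesses are n floats apart in memory.
void bump_column_order(std::vector<float>& a, std::size_t n, float delta) {
    assert(a.size() == n * n);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            a[i * n + j] += delta;
}

// v2 order: the inner loop walks along a row, so consecutive
// accesses touch neighbouring floats.
void bump_row_order(std::vector<float>& a, std::size_t n, float delta) {
    assert(a.size() == n * n);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            a[j * n + i] += delta;
}

// Wall-clock seconds taken by `passes` calls of `bump` over the array.
double time_passes(void (*bump)(std::vector<float>&, std::size_t, float),
                   std::vector<float>& a, std::size_t n, int passes) {
    const auto start = std::chrono::steady_clock::now();
    for (int k = 0; k < passes; ++k)
        bump(a, n, 0.01f);
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```

Calling time_passes(bump_column_order, a, n, 100) and time_passes(bump_row_order, a, n, 100) on a vector of n * n floats gives one timing per access order. Repeating the run with n set to 1023, 1024 and 1025 is one way to look at question 1 in isolation.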

FlyingDancer

Great! Full and clear!! Thank you very much!!! :laugh: