Unrolling a loop: why can't .NET add 1 + 2 + 3 + ...

MtnBiknGuy

Take some standard code such as shown below. It simply loops to add up a series of terms and it produces the correct result. // sum numbers with a loop public int DoSumLooping(int iterations) { int result = 0; for(int i = 1;i <=iterations;i++) { result += i; } return result; } Now translate this into a specific solution that doesn't use looping (and use the same value for the number of iterations the loop performs). This code returns an incorrect result. The method consists entirely of a very straightforward code statement, but in this case .NET adds the numbers incorrectly. public double ComputeSum( ) { // Brute force sum method // For iterations == 10000 double sum = 0+ 1+ 2+ 3+ 4+ 5+ 6+ 7+ 8+ 9+ 10+ 11+ 12+ 13+ 14+ 15+ ... + 9997+ 9998+ 9999+ 10000; return sum; } The above method returns an incorrect result with any number of terms above about 200. It will correctly add 1 + 2 + ... + 200, but it will NOT correctly add 1 + 2 + ... + 1000. I have just run across this, and I have not yet researched the possible reasons for this behavior. It may be a known issue related to either stack size or the length of a code line, but to my knowledge it hasn't been discussed in any of the "popular" literature on C# and .NET. I need to write code like this, so if anyone has already encountered this issue, please advise me. Here's another example that also creates problems, but of a somewhat different nature. Take the following code and translate it into a specific, non-looping method and try to execute it using reflection. It fails. public double LoopToCompute() { double sumOfProducts = 0; double grandTotal = 0; for (int i = 0; i < maxRows; ++i) { for (int j = 0; j < maxCols; ++j) { sumOfProducts += coeff[j] * table[i][j]; } a_point[i] = sumOfProducts; grandTotal += sumOfProducts; sumOfProducts = 0; } return grandTotal; }//LoopToCompute The above code works -- but it's equivalent code with loops unrolled (shown below) doesn't work unless the maxRows is set very small. For small values, the 2 methods (above and below) produce identical results. There is nothing "wrong" with the code in that sense. It's similar to the above situation. If the "size" of the code statement or the number of code statements is too large, .NET fails. In this case (using reflection) it doesn't return the incorrect result, as the first example did. In this case, reflection calls it an invalid program and refuses to

Daniel Turini

I'd suggest you to use ILDASM and see the generated code. This will help to isolate if the problem is on JIT or on C# compiler. Also, you could try using other .NET languages, like VB.NET. BTW, for this kind of code, managed C++ rules. With C++ you always have much more control over the compiler.

Help me dominate the world - click this link and my army will grow

leppie

Interesting situation :) What version of .NET are u using? Are you going over the 2048 character line limit? The did the same thing, but with a single " + xxxx" on a single line and get the proper result (50005000 for 10000). In fact the compiler does constant folding and only loads the result. leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.

Andy Davey

Just out of interest, are you unrolling your loops purely for performance reasons? What is the measured difference in performance between using the loop and unrolling it? Andy

MtnBiknGuy

Yes, I'm unrolling the loops for performance reasons. So far, in my timing tests, the unrolled loops are about 5-10 times SLOWER (it's not the example code I posted. In that code, the unrolled loops are faster). Figure that one out. There is a range of performance differences. Unrolling a loop AND hard-coding the logic (via reflection emit) is about 3000% faster according to Jesse Liberty in his Programming C# book. On the other hand, in my complex case where I unroll the loop but still require 2 dimensional array lookups in the unrolled lines of code, I see the slower performance I quoted above (average loop 190ms, average unrolled code 980ms). When I get a chance, I could post my code (if anyone is interested, that is). Dave

MtnBiknGuy

I found the problem. I think you could have guessed it -- it was related to the line limit. Here's what I was doing. I was using code to generate my unrolled loop code. I was then compiling this in a process and using reflection to load the assembly and invoke the method. I was not getting any compiler errors, nor was I getting any runtime errors. I was just getting incorrect math results. However, when I posted the code (without the reflection part) everyone else got the correct results. This prompted me to look at the reflection/compilation code more closely. It turns out the when I pasted the generated code into the IDE I discovered the line limit you mentioned. I inserted a new line in my generated code after every 200 terms, and presto- the math results were now correct. It's a bit disconcerting that I didn't get a compiler error from the command line compiler. I don't know why yet. Does the command line compiler have that same line limit? I assume so, given my experience. I guess the compiler just took as many terms as it could and somehow terminated the statement and produced a runnable program ... all very strange. I wouldn't have thought that possible.

Anonymous

It's working with DotGNU anyway ... Log a bug with MS

MtnBiknGuy

leppie, Do you happen to know why the following method "is not a valid program"? It compiles fine, but the runtime won't even try to run it. It contains 500 statements (I cut out 497 of them for this post) like those below. The method represents another unrolled loop. (And this is the one I mentioned that is actually 5-10 times slower than the loop after it's unrolled.) I appreciate any comments. Dave P.S. I forgot to mention that the problem is related to the number of statements in the method. 1000 doesn't work. I think it works up to a few hundred statements, then stops being "a valid program." public double DoBruteForceCompute() { double bruteForceSum = 0; point1=coeff1*table[0][0] +coeff2*table[0][1] +coeff3*table[0][2] +coeff4*table[0][3] +coeff5*table[0][4] +coeff6*table[0][5] +coeff7*table[0][6] +coeff8*table[0][7] +coeff9*table[0][8] +coeff10*table[0][9] +coeff11*table[0][10] +coeff12*table[0][11] +coeff13*table[0][12] +coeff14*table[0][13] +coeff15*table[0][14] +coeff16*table[0][15] +coeff17*table[0][16] +coeff18*table[0][17] +coeff19*table[0][18] +coeff20*table[0][19] +coeff21*table[0][20] +coeff22*table[0][21] +coeff23*table[0][22] +coeff24*table[0][23] +coeff25*table[0][24] +coeff26*table[0][25] +coeff27*table[0][26] +coeff28*table[0][27] +coeff29*table[0][28] +coeff30*table[0][29] +coeff31*table[0][30] +coeff32*table[0][31] +coeff33*table[0][32] +coeff34*table[0][33] +coeff35*table[0][34] ; point2=coeff1*table[1][0] +coeff2*table[1][1] +coeff3*table[1][2] +coeff4*table[1][3] +coeff5*table[1][4] +coeff6*table[1][5] +coeff7*table[1][6] +coeff8*table[1][7] +coeff9*table[1][8] +coeff10*table[1][9] +coeff11*table[1][10] +coeff12*table[1][11] +coeff13*table[1][12] +coeff14*table[1][13] +coeff15*table[1][14] +coeff16*table[1][15] +coeff17*table[1][16] +coeff18*table[1][17] +coeff19*table[1][18] +coeff20*table[1][19] +coeff21*table[1][20] +coeff22*table[1][21] +coeff23*table[1][22] +coeff24*table[1][23] +coeff25*table[1][24] +coeff26*table[1][25] +coeff27*table[1][26] +coeff28*table[1][27] +coeff29*table[1][28] +coeff30*table[1][29] +coeff31*table[1][30] +coeff32*table[1][31] +coeff33*table[1][32] +coeff34*table[1][33] +coeff35*table[1][34] ; [...] point500=coeff1*table[499][0] +coeff2*table[499][1] +coeff3*table[499][2] +coeff4*table[499][3] +coeff5*table[499][4] +coeff6*table[499][5] +coeff7*table[499][6] +coeff8*table[499][7] +coeff9*table[499][8] +coeff10*table[499][9] +coeff11*table[499][10] +coeff12*table[499][11] +coeff13*table[499][12] +coeff14*table

MtnBiknGuy

leppie, Does CodeDom CodeSnippetStatement take care of breaking long lines or does it too have a limit on the length of a string it can take? Dave

leppie

You pretty lazy for a mountain biker.... just make a wrapper method :) leppie::AllocCPArticle("Zee blog");
Seen on my Campus BBS: Linux is free...coz no-one wants to pay for it.

MtnBiknGuy

...probably more like I've taken too many crashes on my head