How can I avoid a for-loop and speed up my code?
-
Behind the scenes, MATLAB is running a for loop for you. If you want a faster implementation that avoids .NET bounds checking, you could try writing the innermost layer of your code in C++ (optionally with processor-specific assembly optimizations that use MMX/SSE instructions) and then using P/Invoke to call it from the main C# portion of your app.
-- Rules of thumb should not be taken for the whole hand.
Unsafe C# can get around bounds checking as well, and would be a lot easier than coding up a separate C++ library and invoking it from C#.
Tech, life, family, faith: Give me a visit. I'm currently blogging about: Check out this cutie The apostle Paul, modernly speaking: Epistles of Paul Judah Himango
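A minimal sketch of the unsafe approach, assuming `float[]` sample buffers: `fixed` pins the arrays so the garbage collector can't move them, and the loop indexes through raw pointers, which are not bounds-checked. It has to be compiled with unsafe code enabled (`/unsafe`, or `<AllowUnsafeBlocks>true</AllowUnsafeBlocks>` in the project file). The class and method names are illustrative.

```csharp
using System;

// Illustrative helper: element-wise multiply via unsafe pointer access.
// Requires compiling with unsafe code enabled.
public static class UnsafeMultiply
{
    // Computes z[i] = x[i] * y[i] without per-element array bounds checks.
    public static unsafe void Multiply(float[] x, float[] y, float[] z)
    {
        if (x == null) throw new ArgumentNullException(nameof(x));
        if (y == null) throw new ArgumentNullException(nameof(y));
        if (z == null) throw new ArgumentNullException(nameof(z));
        if (x.Length != y.Length || x.Length != z.Length)
            throw new ArgumentException("Arrays must have the same length.");

        int n = x.Length; // validated once up front instead of on every access
        // Pin all three arrays for the duration of the loop.
        fixed (float* px = x, py = y, pz = z)
        {
            for (int i = 0; i < n; i++)
            {
                pz[i] = px[i] * py[i];
            }
        }
    }
}
```

The length checks happen once before the loop, which is what makes skipping the per-element checks safe.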
-
Hi coders,
I'm doing some advanced calculations, mostly on sound, but I'm relatively new to C#, so I was wondering if anyone has advice on how I can (if possible) speed up the more primitive parts of my code, e.g. when I want to multiply two arrays element-wise. In MATLAB this would look like Z = X.*Y, but I cannot figure out a smart way to do this in C# without a for-loop like:
    for (int i = 0; i < length; i++) { Z[i] = X[i] * Y[i]; }
As you can see, this takes a long time if the arrays hold a sound segment of e.g. 3 seconds, which is 132300 samples (at 44.1 kHz). Is there a better way to do this?
Best regards,
AL
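For reference, the loop from the question as a self-contained method. This is a sketch that assumes `float[]` sample buffers (the post doesn't give the element type); the class and method names are illustrative. It is the baseline the other suggestions in the thread try to beat.

```csharp
using System;

// Illustrative helper: managed equivalent of MATLAB's Z = X .* Y.
public static class ElementwiseMultiply
{
    public static float[] Multiply(float[] x, float[] y)
    {
        if (x == null) throw new ArgumentNullException(nameof(x));
        if (y == null) throw new ArgumentNullException(nameof(y));
        if (x.Length != y.Length)
            throw new ArgumentException("Arrays must have the same length.");

        var z = new float[x.Length];
        // Plain managed loop: each array access is bounds-checked unless
        // the JIT can prove the index is in range.
        for (int i = 0; i < z.Length; i++)
        {
            z[i] = x[i] * y[i];
        }
        return z;
    }
}
```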
One way to speed up things like this on multi-processor or multi-core machines is to use multiple threads to do the processing. Say you have one of those new Intel Quad Core processors: you've got 4 hardware threads available. Split the array into 4 segments, then spawn 3 new threads, each of which processes a segment of its own, and have the current thread process the remaining segment (4 threads total). In the best case, that approaches a 4x speedup. Of course, this only helps on processors that are hyperthreaded (i.e. 2 hardware threads per physical core) or on machines that have multiple processors or multiple cores.
P.S. One should always question optimizing code like this unless you're absolutely, positively certain that there is a performance bottleneck here and the current performance is not acceptable.
Judah Himango
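The four-segment scheme described above, sketched with plain `System.Threading.Thread` objects: `segments - 1` ranges go to new threads, the calling thread handles the last range itself, and then it joins the workers. The class name is illustrative, and the default of 4 segments mirrors the quad-core example; `Environment.ProcessorCount` would be a more portable choice. Whether this beats the single-threaded loop depends on whether the loop is memory-bound, so measure first.

```csharp
using System;
using System.Threading;

// Illustrative helper: split the work into contiguous segments, one per thread.
public static class ThreadedMultiply
{
    public static void Multiply(float[] x, float[] y, float[] z, int segments = 4)
    {
        if (x.Length != y.Length || x.Length != z.Length)
            throw new ArgumentException("Arrays must have the same length.");
        if (segments < 1) throw new ArgumentOutOfRangeException(nameof(segments));

        int n = x.Length;
        int chunk = (n + segments - 1) / segments; // ceiling division
        var workers = new Thread[segments - 1];

        // Spawn segments - 1 worker threads, each with its own range.
        for (int s = 0; s < segments - 1; s++)
        {
            int start = s * chunk;                 // fresh variables per iteration,
            int end = Math.Min(start + chunk, n);  // so each lambda sees its own range
            workers[s] = new Thread(() => MultiplyRange(x, y, z, start, end));
            workers[s].Start();
        }

        // The calling thread processes the last segment itself.
        MultiplyRange(x, y, z, Math.Min((segments - 1) * chunk, n), n);

        // Wait for the other segments to finish.
        foreach (var t in workers) t.Join();
    }

    private static void MultiplyRange(float[] x, float[] y, float[] z, int start, int end)
    {
        for (int i = start; i < end; i++)
            z[i] = x[i] * y[i];
    }
}
```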
-
For a job as simple as an element-wise product, I expect memory bus bandwidth limitations to dominate over CPU limitations, so not much help from multi-threading... I do fully agree with the P.S. though :)
Luc Pattyn
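To put rough numbers on the bandwidth argument, assuming 4-byte `float` samples (the thread doesn't state the element type): one pass over the 3-second segment reads two arrays and writes one, so the memory traffic is

$$
N_{\text{bytes}} = 3 \times 132300 \times 4\ \text{B} = 1\,587\,600\ \text{B} \approx 1.6\ \text{MB},
$$

against only 132300 multiplications, i.e. one multiply per 12 bytes moved. With a sustained memory bandwidth $B$, the pass takes at least

$$
t_{\min} \approx \frac{N_{\text{bytes}}}{B},
$$

so for an illustrative (assumed, not measured) $B = 5\ \text{GB/s}$ that is about 0.32 ms, a floor that extra threads cannot lower once the bus is saturated.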
-
True, but IIRC native code is still faster than unsafe C#.
-- Rules of thumb should not be taken for the whole hand.
I'd be surprised at that. And given the P/Invoke overhead that a C++ library on the side would require, unsafe C# may well outperform it.
Judah Himango
-
The job is simple, but the bottleneck is the linear time it takes to compute one operation, move on to the next, and so on until finished. Meanwhile, one or more cores may be sitting idle that could be doing some of these operations. I'm quite certain you'd see a good speedup here. The MS Robotics team that built the CCR (Concurrency and Coordination Runtime, a .NET library for threading and coordination among threads) found big speedups, often near a multiple of the number of cores in a machine, by using multiple threads for this kind of thing. Granted, they are doing lots of I/O. Joe Duffy, a CLR architect, is busy working on the PLINQ (Parallel Language Integrated Query) project, which will allow devs to easily parallelize queries and transformations on data. This is essentially the technique I described above: using 1 thread per hardware thread to parallelize queries and transformations on data. Eric Sink has an article on his blog showing how C# can do Map, which uses this idea.
Judah Himango
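PLINQ eventually shipped with .NET Framework 4. A sketch of the element-wise product written as a PLINQ query over the index range, with illustrative class and method names; `AsOrdered()` keeps the results in index order when they are materialized with `ToArray()`. For work this cheap, the per-element delegate call may well cost more than the multiply itself, so it is worth timing against the plain loop.

```csharp
using System;
using System.Linq;

// Illustrative helper: element-wise product as a PLINQ query.
public static class PlinqMultiply
{
    public static float[] Multiply(float[] x, float[] y)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("Arrays must have the same length.");

        return ParallelEnumerable.Range(0, x.Length) // indices processed in parallel
            .AsOrdered()                             // preserve index order in the output
            .Select(i => x[i] * y[i])
            .ToArray();
    }
}
```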
-
Sure, I believe multithreading can be great for any single job that is compute-bound (including Map, the CCR stuff, and much more), as well as for most situations where a multitude of jobs come together. But my point is that multiplying two arrays isn't much more than a data mover, and I expect the loop overhead will mostly be absorbed by the CPU's out-of-order execution capabilities. So let's wait and see. :)
Luc Pattyn
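One way to settle the "wait and see" is to time each variant on the target machine. A minimal `Stopwatch` harness with illustrative names: it calls the action once as a warm-up so JIT compilation isn't included in the measurement, then reports the average time per run.

```csharp
using System;
using System.Diagnostics;

// Illustrative timing helper for comparing multiply variants.
public static class MultiplyBenchmark
{
    // Runs `action` once to warm up, then returns average milliseconds per run.
    public static double AverageMilliseconds(Action action, int repetitions)
    {
        if (repetitions < 1) throw new ArgumentOutOfRangeException(nameof(repetitions));

        action(); // warm-up: forces JIT compilation before timing starts
        var sw = Stopwatch.StartNew();
        for (int r = 0; r < repetitions; r++)
            action();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / repetitions;
    }

    // Single-threaded baseline to compare the other variants against.
    public static void SequentialMultiply(float[] x, float[] y, float[] z)
    {
        for (int i = 0; i < z.Length; i++)
            z[i] = x[i] * y[i];
    }
}
```

Wrap each candidate (plain loop, threaded, unsafe) in a lambda over the same 132300-sample buffers and compare the averages; the numbers will vary from machine to machine.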
-
There is a library at Microsoft Research called Accelerator that might be worth investigating. Accelerator provides a high-level data-parallel programming model as a library that is available for all .NET programming languages. The library translates the data-parallel operations on the fly into optimized GPU pixel shader code and API calls. Future versions will target multi-core CPUs.
-
Yeah, it looks like he commented out the unsafe portions, meaning he's back to normal array bounds checking and all that. I wonder why?
Judah Himango
-
Bet you didn't know you would get all this feedback, did you? :) Just remember that no matter what, you will still have the loop. So think of it like this: do I have a loop here with a little code, or do I have a loop somewhere else with extra code to get there? Your call, but these guys are giving some really good knowledge that you should definitely try to learn. Good luck, Jason
Programmer: A biological machine designed to convert caffeine into code. Developer: A person who develops working systems by writing and using software.
-
You can multithread the multiplication if the arrays are large enough. Another option is to drop into unsafe code and use pointer arithmetic. Another option is to use 64-bit multiplication, but you might need a bitwise transform on the result. (Check out the assembly for strlen for a non-application example.) Some processors also provide op codes for array-based operations: check the MMX instruction set (and its floating-point successor, SSE), which lets you multiply several packed values per op code instead of one at a time. http://web.cs.wpi.edu/~matt/courses/cs563/talks/powwie/p3/mmx.htm
On two occasions I have been asked [by members of Parliament], 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question. - Charles Babbage
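Later .NET versions expose SIMD directly from C# through `System.Numerics.Vector<T>`, which the JIT maps onto SSE/AVX instructions when hardware acceleration is available (`Vector.IsHardwareAccelerated` reports this). The sketch below swaps in `Vector<T>` in place of hand-written MMX code, and the class name is illustrative. It still loops, handling `Vector<float>.Count` elements per step rather than the whole array in a couple of op codes.

```csharp
using System;
using System.Numerics;

// Illustrative helper: SIMD element-wise multiply with a scalar tail.
public static class SimdMultiply
{
    public static void Multiply(float[] x, float[] y, float[] z)
    {
        if (x.Length != y.Length || x.Length != z.Length)
            throw new ArgumentException("Arrays must have the same length.");

        int width = Vector<float>.Count; // floats per SIMD register on this machine
        int i = 0;
        // Vectorized part: multiply `width` elements per iteration.
        for (; i <= z.Length - width; i += width)
        {
            var vx = new Vector<float>(x, i);
            var vy = new Vector<float>(y, i);
            (vx * vy).CopyTo(z, i);
        }
        // Scalar tail for the remaining (Length % width) elements.
        for (; i < z.Length; i++)
            z[i] = x[i] * y[i];
    }
}
```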
-
Hi coders, just wanted to say thanks for all your many inputs. As expected, there seems to be no magic answer to this question, but I will take a closer look at using pointers. I did not know this was possible in C#, so thanks a lot... AL