Parallelizing code the easy(er) way
-
Can you share these more complete versions, or do we really have to wait for you to write an article?
Okay, I'll work on the article. Just don't tell my boss. ;)
-
I've created a whole bunch of parallel wrappers for the Func... and Action... helpers in .NET. It makes it much more convenient to execute stuff in parallel from the thread pool. There are multiple overloads for each overload of Func<> and Action<>, but here's one example:
public static void Invoke<T1, T2>( Action<T1, T2> action, IEnumerable<object[]> args )
{
    // Start every call on the thread pool...
    List<IAsyncResult> results = new List<IAsyncResult>();
    foreach( object[] argList in args )
        results.Add( action.BeginInvoke( (T1)argList[ 0 ], (T2)argList[ 1 ], null, null ) );

    // ...then block until each one has finished.
    foreach( IAsyncResult result in results )
    {
        action.EndInvoke( result );
    }
}
And I can use it like this:
class Program
{
    static void Main( string[] args )
    {
        ParallelAction.Invoke<string, int>( Foo, new[]
        {
            new object[] { "string1", 1 },
            new object[] { "string2", 2 },
            new object[] { "string3", 3 },
        } );
        Console.ReadLine();
    }

    public static void Foo( string s, int i )
    {
        Console.WriteLine( "Got: s={0}, i={1}", s, i );
    }
}
I have more complete versions that handle exceptions on threads, but those are too big to post here. Maybe I'll write an article later.
-
The ApiChange tool has a nice API built upon it, with full exception handling and other goodies. The base class is WorkItemDispatcher, which lets you send work through a queue to the thread pool while limiting the maximum number of concurrent threads to some value (see WorkItemDispatcher.cs). The unit tests contain some examples of how this thing can be used (see WorkItemDispatcherTests.cs). I needed it to read files concurrently from n threads, but not all at the same time, since the hard disk would drop dead if 500 threads read 500 different files in parallel. Did you know about the ApiChange tool? Perhaps I will write an article about that one as well :) More info about this can be found here: http://geekswithblogs.net/akraus1/Default.aspx
Yours, Alois Kraus
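To make the "n readers, not 500" idea concrete, here is a minimal sketch of capping concurrency. It is not the WorkItemDispatcher API from ApiChange (that API is not shown in this thread); it uses Parallel.ForEach with ParallelOptions.MaxDegreeOfParallelism from .NET 4 instead, and BoundedFileReader and ReadLengths are made-up names for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper, not part of ApiChange.
public static class BoundedFileReader
{
    // Reads every file, but never more than maxConcurrentReads at a time.
    public static IDictionary<string, long> ReadLengths(
        IEnumerable<string> paths, int maxConcurrentReads)
    {
        var lengths = new ConcurrentDictionary<string, long>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxConcurrentReads };

        Parallel.ForEach(paths, options, path =>
        {
            // Stand-in for the real per-file work.
            byte[] bytes = File.ReadAllBytes(path);
            lengths[path] = bytes.LongLength;
        });

        return lengths;
    }
}
```

Any exception thrown while reading a file surfaces to the caller wrapped in an AggregateException once the loop has stopped.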
-
Using the thread pool that way is dangerous: you might deadlock if all threads in the pool are in use. The action being invoked cannot start because it's waiting for a thread in the pool to become free, but all threads in the pool might be busy waiting for actions to finish (see http://dotnetdebug.net/2005/07/17/threadpool-deadlocks-avoid-drowning-yourself/). The Task Parallel Library in .NET 4.0 solves this problem by detecting this scenario and executing tasks synchronously where necessary. Basically, your code is just a dangerous and (for a large number of short tasks) inefficient version of Parallel.ForEach.
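For comparison, a rough Parallel.ForEach equivalent of the two-argument wrapper could look like the sketch below. The class name ParallelInvoke and the choice of Tuple<T1, T2> instead of object[] are assumptions made for this example; they are not taken from the original poster's library.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical replacement for the BeginInvoke/EndInvoke wrapper.
public static class ParallelInvoke
{
    // Runs action once per argument pair and returns when all calls are done.
    public static void Invoke<T1, T2>(
        Action<T1, T2> action, IEnumerable<Tuple<T1, T2>> args)
    {
        // Parallel.ForEach also uses the calling thread for part of the work,
        // and rethrows failures as a single AggregateException.
        Parallel.ForEach(args, a => action(a.Item1, a.Item2));
    }
}
```

Usage stays close to the original: ParallelInvoke.Invoke<string, int>( Foo, new[] { Tuple.Create( "string1", 1 ), Tuple.Create( "string2", 2 ) } ). The typed tuples also remove the casts from object[].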
-
I think you misunderstood the thread pool deadlock blog post. It's not really a problem to have more requests pending than there are threads in the thread pool, UNLESS some of those pending tasks are waiting on other pending tasks. Which is not his case! Anyway, I concur that the Parallel class rocks! Even more than his sample code! :-D
A train station is where the train stops. A bus station is where the bus stops. On my desk, I have a work station.... _________________________________________________________ My programs never have bugs, they just develop random features.
-
Super Lloyd wrote:
it's not really a problem to have more requests pending than there are threads in the thread pool, UNLESS some of those pending tasks are waiting on other pending tasks. Which is not his case!
You don't know if that's the case. The thread calling ParallelAction.Invoke might be running on the thread pool, too. For example, if you use ParallelAction.Invoke in an ASP.NET request. Or if you use it within another parallel action. The only place to safely use ParallelAction.Invoke would be in code known to run on a thread you created yourself. But most code in your app shouldn't care which thread it runs on, so you cannot safely use ParallelAction.Invoke. In fact you cannot ever safely use Delegate.EndInvoke in most code, which shows that this was a big design mistake in the ThreadPool API. Fortunately the .NET 4 Task class (and the Parallel.ForEach built on top of it) doesn't have this problem.
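The scenario described here (pool threads blocked waiting on work that is itself queued to the pool) can be provoked on purpose. The sketch below is only an illustration under stated assumptions: it uses ThreadPool.QueueUserWorkItem and events rather than Delegate.BeginInvoke/EndInvoke, it temporarily caps the pool with ThreadPool.SetMaxThreads, and it waits with a timeout so that it reports starvation instead of hanging. PoolStarvationDemo is a made-up name, and the exact count returned depends on the runtime's scheduling.

```csharp
using System;
using System.Threading;

// Hypothetical demo class, written for this thread only.
public static class PoolStarvationDemo
{
    // Returns how many outer work items gave up waiting for their inner
    // work item, or -1 if the pool size could not be capped.
    public static int Run(TimeSpan timeout)
    {
        int oldWorkers, oldIo, minWorkers, minIo;
        ThreadPool.GetMaxThreads(out oldWorkers, out oldIo);
        ThreadPool.GetMinThreads(out minWorkers, out minIo);

        // The cap cannot go below the processor count or the pool minimum.
        int n = Math.Max(minWorkers, Environment.ProcessorCount);
        if (!ThreadPool.SetMaxThreads(n, oldIo))
            return -1;

        int timedOut = 0;
        var allQueued = new ManualResetEventSlim(false);
        var outersDone = new CountdownEvent(n);
        try
        {
            for (int i = 0; i < n; i++)
            {
                ThreadPool.QueueUserWorkItem(outerState =>
                {
                    allQueued.Wait(); // hold until every outer item is queued
                    var innerDone = new ManualResetEventSlim(false);
                    ThreadPool.QueueUserWorkItem(innerState => innerDone.Set());

                    // Same shape as EndInvoke: a pool thread blocks on pool work.
                    if (!innerDone.Wait(timeout))
                        Interlocked.Increment(ref timedOut);
                    outersDone.Signal();
                });
            }
            allQueued.Set();
            outersDone.Wait();
        }
        finally
        {
            ThreadPool.SetMaxThreads(oldWorkers, oldIo); // restore the old cap
        }
        return timedOut;
    }
}
```

With every pool thread occupied by an outer item, the inner items typically cannot start until a timeout frees a thread. Without the timeout, that same situation would be a permanent deadlock.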
-
Hmm, you're right... it will not deadlock automatically, but heavy use of BeginInvoke() and EndInvoke() all over the place is, indeed, potentially fatal. I hadn't realized it yet, I confess, because (lucky me! :-) I prefer to use my own threads (as opposed to pool threads) and prefer to run Actions asynchronously (no need for EndInvoke then!). Interesting after all! :-)
-
FYI, I came up with these when I was writing a server that had to execute complex hierarchical queries of data from a distributed cache. Since each element at each level of the object hierarchy was stored as an individual blob, simple iterative queries took forever. I parallelized the queries at each level for its children this way. There was no danger of a deadlock, but I get your point (giving someone enough rope to hang themselves... but in that case, they shouldn't be trusted with writing multi-threaded apps). BTW, this improved our performance by orders of magnitude and saved us a lot of coding time, just by having an abstraction like this. So, if you still don't like it, don't use it. ;-)
-
"In fact you cannot ever safely use Delegate.EndInvoke in most code, which shows that this was a big design mistake in the ThreadPool API."
My understanding is that Microsoft explicitly denounced fire-and-forget semantics with Delegate.BeginInvoke (as distinct from Control.BeginInvoke, where fire-and-forget is the norm). I think garbage collection will usually clear up resources left dangling by fire-and-forget code, but according to Microsoft one is supposed to use EndInvoke. How one can do so safely without deadlock I have no idea. My own approach is to simply not use the system thread pool.
-
Yes, better forget the Delegate.BeginInvoke/EndInvoke methods completely. For some reason, they are also coupled to the remoting code, making them much slower than directly using the ThreadPool class. For fire-and-forget semantics, simply use ThreadPool.QueueUserWorkItem.
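A minimal fire-and-forget sketch with ThreadPool.QueueUserWorkItem might look like this. The FireAndForget class and the onError callback are assumptions made for the example. The try/catch is there because an unhandled exception on a pool thread terminates the process.

```csharp
using System;
using System.Threading;

// Hypothetical helper for illustration.
public static class FireAndForget
{
    // Queues work on the thread pool and returns immediately.
    // Nothing ever waits on the work item, so no pool thread blocks on it.
    public static void Run(Action work, Action<Exception> onError)
    {
        ThreadPool.QueueUserWorkItem(state =>
        {
            try
            {
                work();
            }
            catch (Exception ex)
            {
                // Report the failure instead of letting it escape the pool thread.
                onError(ex);
            }
        });
    }
}
```

Because there is no EndInvoke-style wait, the caller never learns when the work finishes unless the work itself signals it, for example through an event or a callback.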
-
supercat9 wrote:
simply not use the system thread pool
Does that include the events used by timers (other than System.Windows.Forms.Timer), serial ports, FileSystemWatchers, and anything else that causes asynchronous code execution? :)
Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles]
I only read formatted code with indentation, so please use PRE tags for code snippets.
I'm not participating in frackin' Q&A, so if you want my opinion, ask away in a real forum (or on my profile page).
-
If one uses outside code that itself uses the thread pool, one's stuck with implicitly using the thread pool for those things. I'm not sure I see a good rationale for something like SerialPort to use the thread pool. Since serial ports tend to persist for a while and are relatively limited in number, I would think that having each port create a thread which dies when the port dies would be a better approach than having the port use a thread pool thread that may or may not be available.
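The "one thread per long-lived object" idea could be sketched as below. This is not how SerialPort is implemented; DedicatedWorker is a made-up class that owns one thread and drains a BlockingCollection (available from .NET 4), so its work never competes for pool threads.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical class: one private thread that lives as long as its owner.
public sealed class DedicatedWorker : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
    private readonly Thread _thread;

    public DedicatedWorker(string name)
    {
        _thread = new Thread(Drain) { IsBackground = true, Name = name };
        _thread.Start();
    }

    // Hands work to the private thread; never touches the thread pool.
    public void Post(Action work)
    {
        _queue.Add(work);
    }

    private void Drain()
    {
        // Blocks while the queue is empty and ends after CompleteAdding.
        foreach (Action work in _queue.GetConsumingEnumerable())
        {
            try { work(); }
            catch (Exception ex) { Console.Error.WriteLine(ex); }
        }
    }

    // Finishes the queued work, then lets the thread die with its owner.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _thread.Join();
        _queue.Dispose();
    }
}
```

The trade-off is one mostly idle thread per object, which is reasonable for a handful of ports but would not scale to thousands of owners.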
-
supercat9 wrote:
having each port create a thread which dies when the port dies would be a better approach
I tend to agree with you, and lacking any statement about which threads are used, I once ran some tests; my conclusion was that .NET seems to use the ThreadPool all the time, and never creates threads other than the ones it needs inside the ThreadPool. :)
modified on Friday, June 11, 2010 3:15 PM
-
Okay, now that I've had some free time to look into the article you linked in more detail, I don't think I agree with your assessment that this is dangerous. First of all, you will only deadlock if you use up all of the threads in the thread pool and those tasks are all waiting on some task that is queued and can't run. Further reading confirmed my suspicion that this scenario is more likely to happen in an ASP.NET application. I wrote these for a Windows service that has nothing to do with ASP.NET. I understand the danger posed, but I think you've greatly overstated the issue. I've used this pattern heavily and never seen a single deadlock, because the operations that I'm running are for the most part atomic. Parallel.ForEach, of course, was not available when I wrote this. So, there you are. Sadly, if you're on .NET 3.5 you can no longer download the parallel extensions (there's a deprecated version under the NET2 namespace). Otherwise, you'll have to use .NET 4.0, and that's pretty green code right now.
-
As soon as you have thread-pool threads executing your ParallelAction.Invoke, there's the possibility of deadlocks. If you're sure you only ever call ParallelAction.Invoke from non-thread-pool threads (main thread or threads you created yourself), then go ahead. Otherwise, there's the possibility of deadlock. Yes, the deadlock can occur only if the thread pool is out of threads. That usually doesn't happen in testing, but it will happen under heavy load in production.
-
I know what you're saying, and I know what the article says, but I'm just not buying it. My experience tells me otherwise. I stress tested my service at many times what the hardware could handle, and well beyond the expected load, just to make sure it could cope. I KNOW that I used up every thread in the thread pool because I did it on purpose. I had config options for the number of I/O threads and worker threads so I could set them explicitly. I set them with scaling values and pumped a massive amount of data through the system, on purpose, so I could find the sweet spot. I never saw a single deadlock. I say again, I think this is much ado about nothing.
-
I even wrote an article about it: Action Extensions
xacc.ide
IronScheme - 1.0 RC 1 - out now!
((λ (x) `(,x ',x)) '(λ (x) `(,x ',x))) The Scheme Programming Language – Fourth Edition
-
Derek Viljoen wrote:
I never saw a single dead lock.
"Works on my machine", basically? Without monitoring how many threads are in the thread pool, how many jobs are waiting for a thread, etc.? Without explicitly testing the specified deadlock condition? It might well be that you are right, but your proof is lacking.
Agh! Reality! My Archnemesis![^]
| FoldWithUs! | sighist | µLaunch - program launcher for server core and hyper-v server.
-
Do you normally have this kind of trouble with reading comprehension? I said I used up all the threads on purpose.