Seeking a better understanding of .NET multithreading and the System.Threading.Tasks.Parallel methods
-
I have a project that needs to apply dozens, hundreds, and potentially thousands of file modifications, which are fairly intensive and affect associated resource files as well. To speed things up and take advantage of parallel processing, I decided to use the System.Threading.Tasks.Parallel class to drive these file changes. There are a few things I have learned and discovered along the way that I would like to understand better. First, before I go any further: my project has a BIG need to record every change in a log file BEFORE it happens, to minimize the risk of losing data when something goes wrong. That log file is then parsed for undo actions. This requires the chain of events to be tracked, and logging each change before it happens requires several sub-tasks that use .NET's await feature. A basic picture of the process used to change the files looks something like this:

    public class MainFileType
    {
        internal async void DoSomeMajorChanges(RichTextboxBuilder builder, StreamWriter changeLog)
        {
            bool result;
            // Log each change before it happens, then apply it.
            await Task.Run(new Action(() => changeLog.LogAction(this))).ConfigureAwait(false);
            await Task.Run(new Action(() => result = coreFile.DoChanges())).ConfigureAwait(false);
            builder.Control.BeginInvoke(new Action(() => builder.NotifyUser("Some Change Occurred", Color.Red)));
            foreach (ResourceFile file in this.AssociatedFiles)
            {
                await Task.Run(new Action(() => changeLog.LogAction(file))).ConfigureAwait(false);
                await Task.Run(new Action(() => result |= file.DoChanges())).ConfigureAwait(false);
                builder.Control.BeginInvoke(new Action(() => builder.NotifyUser("Some Change Occurred", Color.Blue)));
            }
            return result;
        }
    }

This code is called by a UI that is shown any time a single or multiple files are modified. The UI regularly reports to the user:

    public class ChangeManagerUI : Form
    {
        private bool processed;
        object task;
        StreamWriter parseableActionLog;

        private void OnFormShown(object sender, EventArgs e)
        {
            if (!processed)
            {
                processed = true;
                MainFileType file;
                RichTextboxBuilder builder = null;
                List<MainFileType> batch = task as List<MainFileType>;
                Refresh();
                if (batch != null)
                {
                    RichTextboxBuilder.BeginConcurrentAppendProcess(this, batch.Count);
-
Parallel.For doesn't return a Task which can be awaited; it has no choice but to block the current thread until the processing has been completed. By wrapping it in a Task.Run, you're blocking a background thread instead of the UI thread.

The Parallel methods also don't work well with async methods. The delegate you pass in is expected to run synchronously to completion.

You've declared your DoSomeMajorChanges method as async void. You should avoid async void like the plague: Avoid async void methods | You've Been Haacked

The await Task.Run(...) lines in your DoSomeMajorChanges method serve no purpose. Since there are no other awaits in that method, you can simply make it synchronous.

    internal bool DoSomeMajorChanges(RichTextboxBuilder builder, StreamWriter changeLog)
    {
        bool result;
        changeLog.LogAction(this);
        result = coreFile.DoChanges();
        builder.Control.BeginInvoke(new Action(() => builder.NotifyUser("Some Change Occurred", Color.Red)));
        foreach (ResourceFile file in this.AssociatedFiles)
        {
            changeLog.LogAction(file);
            result |= file.DoChanges();
            builder.Control.BeginInvoke(new Action(() => builder.NotifyUser("Some Change Occurred", Color.Blue)));
        }
        return result;
    }

The delegate you're passing to Parallel.For is referencing captured variables (file and builder). This is not thread-safe, which is why you are seeing inconsistent results. Move the variable declarations inside the delegate:

    Task.Run(() =>
    {
        Parallel.For(0, batch.Count, i =>
        {
            var file = batch[i];
            var builder = RichTextboxBuilder.BeginConcurrentAppend(i);
            file.DoSomeMajorChanges(builder, parseableActionLog);
            RichTextboxBuilder.EndConcurrentAppend(i);
        });
    });

NB: You will not be able to refer to the last RichTextboxBuilder instance outside of the loop.

It's not clear what your RichTextboxBuilder methods are doing. They could potentially be harming your concurrency.
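
To make the two points above concrete (Parallel.For blocks, and per-iteration variables belong inside the delegate), here is a minimal, self-contained sketch. It is not the code from the question: ParallelBatchSketch, ProcessBatchAsync and ProcessItem are hypothetical names, and ProcessItem is only a stand-in for the real file work.

```csharp
using System;
using System.Threading.Tasks;

public static class ParallelBatchSketch
{
    // Hypothetical stand-in for the real per-file work; it touches no shared state.
    private static bool ProcessItem(int item) => item % 2 == 0;

    // Parallel.For blocks the thread that calls it, so Task.Run moves that
    // blocking onto a thread-pool thread and returns a Task the caller can await.
    public static Task<bool[]> ProcessBatchAsync(int[] batch)
    {
        return Task.Run(() =>
        {
            var results = new bool[batch.Length];
            Parallel.For(0, batch.Length, i =>
            {
                // Declared inside the delegate: every iteration gets its own copy.
                int item = batch[i];
                // Each iteration writes only to its own slot, so no lock is needed.
                results[i] = ProcessItem(item);
            });
            return results;
        });
    }
}
```

From a form's event handler this could be consumed as `bool[] results = await ParallelBatchSketch.ProcessBatchAsync(items);`, which keeps the UI thread free while the loop runs. Anything the iterations do share (a single StreamWriter, for instance, which is not thread-safe) would still need its own synchronization.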
-
Multi-tasking "file operations" isn't as great as it sounds. Often the whole thing will run slower due to head and channel contention.
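
If contention does turn out to be the bottleneck, one option worth measuring is capping how many iterations run at once through ParallelOptions.MaxDegreeOfParallelism. The sketch below is only an illustration under that assumption: ThrottledParallelSketch and RunThrottled are hypothetical names, and the right cap depends entirely on the disk and workload.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledParallelSketch
{
    // Runs 'work' for every index with at most 'maxWorkers' iterations in flight,
    // and returns the highest concurrency that was actually observed.
    public static int RunThrottled(int count, int maxWorkers, Action<int> work)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxWorkers };
        int current = 0, peak = 0;
        object gate = new object();

        Parallel.For(0, count, options, i =>
        {
            // Track how many iterations are running right now.
            lock (gate) { current++; if (current > peak) peak = current; }
            try { work(i); }
            finally { lock (gate) { current--; } }
        });

        return peak;
    }
}
```

Timing the same batch with a few different caps (1, 2, the processor count) is a cheap way to see whether parallel file access helps or hurts on a given machine.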
It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it. ― Confucian Analects: Rules of Confucius about his food
-
Thanks Richard. That helps, particularly the info about Parallel itself being synchronous. I knew about the async void issue from my research while solving issues. My real method is Task (just like yours). Being relatively new to asynchronous code, I just forgot about it when simplifying my code. Just to clear things up: so anything inside the Parallel.For/ForEach will run synchronously; except of course if it calls an asynchronous method, correct?
-
pr1mem0ver wrote:
So anything inside the Parallel.For/ForEach will run synchronously; except of course if it calls an asynchronous method, correct?
It will run synchronously. If it calls an asynchronous method, the delegate will return as soon as that method reaches its first await on an incomplete operation, and the loop can terminate before the asynchronous method has completed. There are ways around that, but they tend to cause serious problems: Should I expose synchronous wrappers for asynchronous methods? | .NET Parallel Programming
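
A small sketch of that behaviour, with hypothetical names throughout (AsyncInParallelSketch, DoWorkAsync and Task.Delay standing in for real asynchronous file work). The first method passes an async lambda to Parallel.For; the second uses Task.WhenAll, a different technique from Parallel, to show what actually waiting for the asynchronous work looks like.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncInParallelSketch
{
    // Hypothetical asynchronous operation standing in for real async file work.
    private static async Task DoWorkAsync(Action onDone)
    {
        await Task.Delay(200);
        onDone();
    }

    // The async lambda is converted to Action<int> (in effect async void), so
    // Parallel.For sees each delegate return at its first await and does not
    // wait for the remainder of the work.
    public static int CompletedWhenParallelForReturns(int count)
    {
        int completed = 0;
        Parallel.For(0, count, async i =>
        {
            await DoWorkAsync(() => Interlocked.Increment(ref completed));
        });
        // Usually far fewer than 'count' operations have finished at this point.
        return Volatile.Read(ref completed);
    }

    // Task.WhenAll yields a task that completes only after every operation has finished.
    public static async Task<int> CompletedAfterWhenAllAsync(int count)
    {
        int completed = 0;
        var tasks = Enumerable.Range(0, count)
            .Select(_ => DoWorkAsync(() => Interlocked.Increment(ref completed)));
        await Task.WhenAll(tasks);
        return completed;
    }
}
```

The first method also inherits every async void hazard: an exception thrown after the await is never observed by the loop.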
"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer
-
I forgot to thank you for your help back in April. Thanks!