Logical Processors Vs Physical Processors
-
I want to fix the number of threads to spawn for a particular task, and I don't know whether that number should depend on the number of logical processors or on the number of physical processors available to the system. The number of physical processors can be read using the ::GetSystemInfo API, while the ::GetLogicalProcessorInformation API returns the number of logical processors. Any insight, please? Thanks in advance.
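A minimal C++ sketch of querying both counts on Windows. Note that the two APIs work the other way round from what the question assumes: `SYSTEM_INFO::dwNumberOfProcessors`, filled in by `::GetSystemInfo`, reports logical processors, while `::GetLogicalProcessorInformation` returns an array of `SYSTEM_LOGICAL_PROCESSOR_INFORMATION` records from which physical cores (`RelationProcessorCore`) and level-2 caches (`RelationCache` with `Cache.Level == 2`) can be counted. The helper names are hypothetical, and the non-Windows branch is an assumed fallback built on `std::thread::hardware_concurrency()` so the sketch also compiles elsewhere.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>
#ifdef _WIN32
#define NOMINMAX  // keep <windows.h> from defining min/max macros
#include <windows.h>
#endif

// Logical processors: hardware threads the OS can run at the same time.
unsigned logical_processor_count() {
#ifdef _WIN32
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    // Counts logical processors in the calling thread's processor group only.
    return static_cast<unsigned>(si.dwNumberOfProcessors);
#else
    // Assumed portable fallback; hardware_concurrency() may return 0.
    return std::max(1u, std::thread::hardware_concurrency());
#endif
}

#ifdef _WIN32
// Fetches the SYSTEM_LOGICAL_PROCESSOR_INFORMATION array (empty on failure).
std::vector<SYSTEM_LOGICAL_PROCESSOR_INFORMATION> query_topology() {
    DWORD len = 0;
    // The first call only reports the required buffer size in bytes.
    if (GetLogicalProcessorInformation(nullptr, &len) ||
        GetLastError() != ERROR_INSUFFICIENT_BUFFER) {
        return {};
    }
    std::vector<SYSTEM_LOGICAL_PROCESSOR_INFORMATION> info(
        len / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION));
    if (!GetLogicalProcessorInformation(info.data(), &len)) return {};
    return info;
}
#endif

// Physical cores: one RelationProcessorCore record per core.
unsigned physical_core_count() {
#ifdef _WIN32
    unsigned cores = 0;
    for (const auto& e : query_topology())
        if (e.Relationship == RelationProcessorCore) ++cores;
    return cores ? cores : logical_processor_count();
#else
    // Assumption: standard C++ has no portable physical-core query,
    // so this fallback simply reuses the logical count.
    return logical_processor_count();
#endif
}

// Level-2 caches: one RelationCache record with Cache.Level == 2 per cache.
unsigned l2_cache_count() {
#ifdef _WIN32
    unsigned caches = 0;
    for (const auto& e : query_topology())
        if (e.Relationship == RelationCache && e.Cache.Level == 2) ++caches;
    return caches ? caches : physical_core_count();
#else
    // Same assumption as above: approximate by the core count.
    return physical_core_count();
#endif
}
```

On a 4-core CPU with hyper-threading one would typically expect `logical_processor_count()` to return 8 and `physical_core_count()` to return 4. Inside a virtual machine, both generally reflect the virtual processors the hypervisor exposes to the guest rather than the host hardware.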
Push Framework - now released ! http://www.pushframework.com
-
[ADDED] What follows assumes that a lot of homogeneous work, mainly calculations, is to be performed and can be organized into a (small) number of threads without needing much synchronization. It would not apply to inhomogeneous operations, say some threads performing calculations, others disk I/O, others network I/O, etc. [/ADDED]

In general I would base my considerations on the number of threads that can be active at the same time; this ignores physical aspects such as separate packaging, multi-core, hyper-threading... So the number of (logical) processors returned by GetSystemInfo() is what I would use. The one exception would be applications whose performance is dominated by cache efficiency; there, what really matters is the number of separate level-2 caches available, which could be better approximated by the number of physical processors.

Hint: what is keeping you from trying both? As your threading code would be dynamic anyway, why not perform an experiment? Or even make it automatic: run with N threads for a while, then try 2*N, see whether that is an improvement, then decide. :)
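The "try N, then 2*N" experiment could be automated along these lines. This is a sketch under assumptions: `busy_job` is a hypothetical stand-in for one unit of the homogeneous, CPU-bound work described above, workers pull jobs from a shared atomic counter, and 2*N is kept only if it is more than 5% faster so that timing noise does not decide. All names and the 5% margin are illustrative choices.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for one unit of homogeneous, CPU-bound work.
std::uint64_t busy_job(std::uint64_t seed) {
    std::uint64_t x = seed;
    for (int i = 0; i < 100000; ++i)
        x = x * 6364136223846793005ULL + 1442695040888963407ULL;  // LCG step
    return x;
}

// Runs total_jobs jobs on `threads` workers that pull job indices from a
// shared counter; returns the elapsed wall-clock time in seconds.
double time_jobs(unsigned threads, unsigned total_jobs) {
    std::atomic<unsigned> next{0};
    std::atomic<std::uint64_t> sink{0};  // keeps the results observable
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            std::uint64_t local = 0;
            for (unsigned j = next++; j < total_jobs; j = next++)
                local ^= busy_job(j);
            sink ^= local;
        });
    }
    for (auto& th : pool) th.join();
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Try N threads, then 2*N; keep 2*N only if it is clearly (>5%) faster.
unsigned pick_n_or_2n(unsigned n, unsigned total_jobs) {
    n = std::max(1u, n);
    double t_n = time_jobs(n, total_jobs);
    double t_2n = time_jobs(2 * n, total_jobs);
    return (t_2n < 0.95 * t_n) ? 2 * n : n;
}
```

In practice N would start at the logical processor count, and the timing would use a representative slice of the real workload rather than a synthetic loop.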
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.
modified on Tuesday, February 8, 2011 5:10 PM
-
Hi, and thanks for your reply. The work is inhomogeneous, so the experiment would surely show a performance increase with 2*N threads. This is what I currently do: #threads := c * #Physical Processors, where c is a constant that compensates for the non-zero probability that a thread is blocked within the task. Exceeding this limit decreases performance because of context switching. My specific concern is with virtualized environments, where most of our applications run nowadays. The guest system is often allowed to "see" fewer processors than the host, so I expect the following to hold: #Physical Processors > #Logical Processors. Thus the c * #Physical Processors limit will exceed the number of threads the system can actually execute in parallel.
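The sizing rule above can be written down directly. One way to pick c, assuming each thread is blocked an average fraction b of its time independently of the others: n threads then keep about n·(1 − b) processors busy, so covering P processors needs n = P / (1 − b), i.e. c = 1 / (1 − b). The function names and this blocking model are assumptions for illustration; P is whichever processor count you settle on.

```cpp
#include <cassert>
#include <cmath>

// #threads := c * #processors, rounded up and never below 1.
unsigned thread_budget(unsigned processors, double c) {
    assert(c > 0.0);
    double n = std::ceil(c * processors);
    return n < 1.0 ? 1u : static_cast<unsigned>(n);
}

// Assumed model: each thread is blocked a fraction b of the time
// (0 <= b < 1). On average n * (1 - b) threads are runnable, so keeping
// `processors` CPUs busy needs n = processors / (1 - b), i.e. c = 1 / (1 - b).
double c_from_blocking_fraction(double b) {
    assert(b >= 0.0 && b < 1.0);
    return 1.0 / (1.0 - b);
}
```

For example, a task that is blocked half the time gives c = 2, so `thread_budget(4, c_from_blocking_fraction(0.5))` returns 8.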
-
Here are some more thoughts:
- I'm not really familiar with virtualization; however, I expect it to widen the gap between your app's best performance and the number of physical processors even further, so I would not use the number of physical processors at all. I would rather use the number of logical processors divided by some constant that reflects how much of your system you want to dedicate to the app (maybe equal to the number of virtual systems).
- If your operations are not homogeneous and sometimes block (assuming the blocking phases are random), but you can organize them into identical jobs, then you can keep adding threads to execute those jobs until (a) performance no longer improves and/or (b) total CPU load reaches 100%, or whatever share you want to allot to your virtual machine. :)
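Both suggestions can be combined: start from the logical processor count divided by a share constant, then add one thread at a time while throughput keeps improving. In this sketch `measure` is a hypothetical callback that runs the job queue with n threads for a while and returns jobs completed per second; only criterion (a) is modeled, and criterion (b) could be approximated by capping `max_threads` or by having `measure` consult CPU load. The 2% minimum gain is an arbitrary assumption meant to filter out noise.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>

// Starting point: logical processors divided by a share constant
// (for instance the number of virtual systems sharing the host).
unsigned starting_threads(unsigned logical, unsigned share) {
    assert(share > 0);
    return std::max(1u, logical / share);
}

// Adds one thread at a time while measured throughput improves by more
// than `min_gain` (relative), never exceeding `max_threads`.
unsigned tune_threads(unsigned start, unsigned max_threads,
                      const std::function<double(unsigned)>& measure,
                      double min_gain = 0.02) {
    unsigned best = std::max(1u, std::min(start, max_threads));
    double best_rate = measure(best);
    for (unsigned n = best + 1; n <= max_threads; ++n) {
        double rate = measure(n);
        if (rate <= best_rate * (1.0 + min_gain)) break;  // no real gain
        best = n;
        best_rate = rate;
    }
    return best;
}
```

Passing the measurement in as a function keeps the search logic separate from the timing code. For example, if throughput grows linearly up to 6 threads and drops afterwards, `tune_threads(2, 16, measure)` returns 6.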
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.
-
I see, so the ideal is to use the logical processors. In your first reply I thought you meant the physical processors, because you were talking about the ::GetSystemInfo API.
Luc Pattyn wrote:
So the number of (logical) processors returned by GetSystemInfo() is what I would use.
Thanks !