Using OpenMP - # of CPUs or # of threads?
-
Just learning OpenMP. So what is the difference, if any, between a "processor" / CPU and a "thread"? They seem to be used interchangeably. And do I have to specify the # of processors / threads to be actually used by the app? I suppose I can also select which part of the code will be "multi x", but that is irrelevant for now.
// Get the number of processors in this system
int iCPU = 0;
iCPU = omp_get_num_procs();
cout << "# of CPU's " << dec << +iCPU << endl;
// Now set the number of threads ??
omp_set_num_threads(iCPU);
Processors and CPUs mean the same thing. They are the hardware devices (or parts of a chip) that execute the instructions in a program. Processes and threads are software: a process is a running program, and threads are activities that run inside a process. A thread is simply a discrete sequence of instructions executed by the CPU. Some applications run quite happily with a single thread. Others require multiple threads in order to perform different parts of the program in parallel. However, using multiple threads is not always the most efficient way to run an application, since a single CPU (or 'core' in a modern chip) can only run one thread at a time. So if you have a four-core chip you can run four threads in parallel with reasonable efficiency. With a single core it would probably not be very efficient, owing to the operating system overhead of switching between threads. Unless you really need threads, and fully understand the concept and practicalities of them, they are best avoided.
-
So the example I used retrieves the # of processors / CPU hardware and sets the # of threads per each processor / CPU equal to the # of processors / CPU hardware. That does not make much sense. The way I read the example, it really does not tell me how many processors / CPU hardware will be utilized. Perhaps I need to read the OpenMP doc.
-
I do not know exactly what you are trying to achieve here. Maybe the documentation makes it clear.
-
My objective is for the app to utilize ALL hardware processors / CPUs. From the posts so far I feel I do not need multiple threads running on each CPU; one will do for now. I am just trying to understand the relationship between processors and threads. The following code leads me to believe that the threads are NOT related to the # of processors.
// Get the number of processors in this system
int iCPU = 0;
iCPU = omp_get_num_procs();
cout << "# of CPU's " << dec << +iCPU << endl;
gets "4", which is the number of CPUs
int iMAXThread = 0;
iMAXThread = omp_get_max_threads();
cout << "# of iMAXThread " << dec << +iMAXThread << endl;
also gets "4" - apparently a default of "one thread per CPU"?
// Now set the number of threads
omp_set_num_threads(10);
iMAXThread = omp_get_max_threads();
returns "10" - per "system"?
cout << "# of iMAXThread " << dec << +iMAXThread << endl;
exit(1);
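To make the experiment above a bit more concrete, here is a minimal sketch (not a definitive implementation) of what the different numbers mean. `omp_get_num_procs()` reports the processors available to the program, `omp_get_max_threads()` reports an upper bound on the team size a following parallel region would get, and `omp_get_num_threads()`, called inside a parallel region, reports the team that was actually created. The helper name `team_size_for` is hypothetical, and the `#else` stubs are an assumption added so the code still builds serially when compiled without `-fopenmp`.

```cpp
#include <cassert>
#ifdef _OPENMP
#include <omp.h>
#else
// Assumption: serial fallbacks so the sketch builds without -fopenmp.
inline int  omp_get_num_threads() { return 1; }
inline void omp_set_num_threads(int) {}
#endif

// Hypothetical helper: request `requested` threads, open a parallel
// region, and return the size of the team the runtime really created.
int team_size_for(int requested) {
    assert(requested > 0);
    omp_set_num_threads(requested);   // a request, not a guarantee

    int team = 0;                     // shared by all threads in the region
    #pragma omp parallel
    {
        // Only one thread writes the shared variable; the implicit
        // barrier at the end of `single` makes it visible afterwards.
        #pragma omp single
        team = omp_get_num_threads();
    }
    return team;
}
```

On the 4-CPU machine described above, `team_size_for(10)` would typically return 10: the runtime creates ten threads and the operating system time-slices them over the four cores. The runtime is allowed to deliver fewer threads than requested, so the returned value is worth checking rather than assuming.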
-
I'd recommend you google for "openmp tutorial C++" and work through one or more of them. I don't know anything about OpenMP, except that it's a MultiProcessing (i.e. threading) extension to C/C++. I do know that multi-threading is easy to get wrong. Added to which, having multiple threads means that you get out-of-order (to the human brain, anyway) execution, which can lead to unexpected results, e.g. (the following code comes from one of the tutorials ... don't ask me anything further about it!)
Quote:
$ cat example.c
#include <stdio.h>
#include <omp.h>
int main()
{
#pragma omp parallel
{
int ID = omp_get_thread_num();
printf("hello(%d)", ID);
printf("world(%d)\n", ID);
}
return 0;
}
$ gcc -fopenmp example.c -o example
$ ./example
hello(1)world(1)
hello(0)world(0)
hello(3)world(3)
hello(2)world(2)
$ ./example
hello(3)world(3)
hello(0)world(0)
hello(1)world(1)
hello(2)world(2)
$ ./example
hello(3)world(3)
hello(2)hello(1)world(1)
hello(0)world(0)
world(2)
That's about as simple as an MP program gets, and you can see that a) the threads didn't run in the same order every time, and b) in the last run thread(2) was interrupted by thread(1) and thread(0) before it completed. Proceed with caution!
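One common way to keep each thread's output in one piece is to build the whole line privately and then hand it over inside a critical section. The following is only a sketch under that assumption; the function name `collect_greetings` is hypothetical, and the `#else` stub exists only so the code builds serially without `-fopenmp`. The order of the lines still varies from run to run, but no line is split by another thread.

```cpp
#include <string>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#else
// Assumption: serial fallback so the sketch builds without -fopenmp.
inline int omp_get_thread_num() { return 0; }
#endif

// Hypothetical helper: every thread contributes exactly one complete line.
std::vector<std::string> collect_greetings() {
    std::vector<std::string> lines;   // shared between the threads
    #pragma omp parallel
    {
        int id = omp_get_thread_num();
        // Build the full line in a thread-private string first...
        std::string line = "hello(" + std::to_string(id) + ")world(" +
                           std::to_string(id) + ")";
        // ...then let only one thread at a time touch the shared vector.
        #pragma omp critical
        lines.push_back(line);
    }
    return lines;
}
```

Printing the returned lines after the parallel region has ended gives one intact `hello(n)world(n)` per thread, at the cost of serializing the short hand-over step.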
-
I really appreciate all contributions to this thread so far. As with anything "new" to me, I do not have any specific / planned way to proceed with my learning, especially to cover some of the less obvious / less visible aspects. I do however maintain that learning a new technology is not always "from the bottom up"; it is a mosaic / puzzle whose pieces by themselves do not make much sense. Let me restate my "objective" in using OpenMP: utilize ALL CPUs available to the application. I have deliberately NOT specified WHAT the app does. My objective should be descriptive enough. But to attempt to refine the "puzzle", and for those curious, I do not foresee any future applications where the sequence in which the app is executed is critical. I still believe that utilizing all hardware CPUs will eventually benefit the app, but honestly I am not interested in proving / disproving that in any specific way. As far as OpenMP "tutorials" go, it is a very mature technology and most of the tutorials reflect that. And some of them are just repetitious in cautioning the reader about the "difficulties" of debugging multi-threaded applications.
Cheers
Vaclav
-
OpenMP works out how many CPU cores you have at runtime. It isn't a fixed answer: there may be cores unavailable to you, and there isn't a thing you can do about it. This call returns the number of cores available to you at the time of the call:
int omp_get_num_procs();
Inside Windows or Linux, certain threads may have a core locked with core affinity, and you can sit there and whistle Dixie: the OS is never going to give OpenMP that core. OpenMP is not the OS and is not in control of core scheduling; it asks for and locks the maximum the O/S will allow. So if you aren't getting all the cores, you need to look at the O/S level. Google something like "Affinity control outside OpenMP".
In vino veritas
-
No, threads and processors are not related in any way. Processors (CPUs) are hardware, threads are pieces of code that run in a CPU. An application can create as many threads as it likes regardless of how many processors exist in the system that it runs on. In most cases you do not need to know how many CPUs are available, since your application should be based on its business/technical design rather than the hardware it runs on. Finally, do not assume that you can control how many processors your application uses; the operating system is in control of resources, and allocates them as necessary.
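As a rough illustration of not needing to know the CPU count, here is a sketch of a loop that is simply marked as parallel and left to the OpenMP runtime and the operating system to spread over whatever cores are available. The workload and the name `sum_of_squares` are hypothetical, chosen only for the example. Compiled without `-fopenmp`, the pragma is ignored and the same code runs serially.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical workload: sum of the squares of all elements.
double sum_of_squares(const std::vector<double>& v) {
    double sum = 0.0;
    const long long n = static_cast<long long>(v.size());

    // No thread or CPU count is mentioned here: the runtime picks the
    // team size, and the reduction clause gives each thread a private
    // partial sum that is combined once the loop has finished.
    #pragma omp parallel for reduction(+:sum)
    for (long long i = 0; i < n; ++i) {
        const double x = v[static_cast<std::size_t>(i)];
        sum += x * x;
    }
    return sum;
}
```

Whether this actually runs faster than the serial loop depends on the size of the work and on how many cores the OS grants, so it is worth timing both versions on real data.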
-
Thanks Leon. Found this, and my first impression is that a simple "Hello world" MAY be a better way to get more familiar with OpenMP. Just "adding OpenMP" to my existing code is not going to tell me if it is doing anything useful until I add some controls / measurements / trace. That is contrary to my initial statement that I am not interested in analyzing OpenMP processes, but to do that I need more experience with OpenMP. Error[^]