OpenMP
-
Hi all, VS2005 ships with OpenMP extensions in the compiler. Intel seems to be pushing the OpenMP standard(?), but does anyone actually use it? Is it more useful than creating threads by hand (with the synchronisation problems that brings)? Does OpenMP exploit HyperThreading (or the AMD equivalent) better than threads? Thoughts? Thanks, Rich "Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far the Universe is winning." -- Rich Cook
-
RichardS wrote:
Does OpenMP exploit HyperThreading (or the AMD equivalent) better than threads?
In the end, threads and processes look the same to the underlying system, so once it is running, OpenMP is equivalent to any other threading model started in Windows. OpenMP relies on a common standard for describing a threaded operation to the compiler. That operation could be massively parallel (solving the same problem for every point in an array) or a set of asynchronous threads doing different jobs. Since the compiler is what turns high-level code into machine language, this makes sense. You can do everything OpenMP does with almost any threading system; how well the code is written determines how close the result is. Because OpenMP lets the compiler optimize the result, it is slightly more efficient than a library like OpenThreads, which hides multi-language threading concerns. Intel is pushing OpenMP because the supercomputing initiative is pushing OpenMP, and they like to court the big bucks. That doesn't mean it is a bad standard; it is pretty darn good, and working at the compiler level allows many optimizations for worker threads that are difficult for a programmer to write by hand. Just being honest about where it comes from. Yes, it is used. :) _________________________ Asu no koto o ieba, tenjo de nezumi ga warau. Talk about things of tomorrow and the mice in the ceiling laugh. (Japanese Proverb)
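To make the "same solution for every point in an array" case concrete, here is a minimal sketch of a loop handed to the compiler through an OpenMP directive, next to its serial equivalent. It is only an illustration, not code from anyone in this thread: the function names `sum_squares_serial` and `sum_squares_openmp` are made up, and it assumes the compiler's OpenMP switch is enabled (`/openmp` in Visual C++, `-fopenmp` in GCC). Without that switch the pragma is ignored and the loop simply runs serially.

```cpp
#include <cassert>   // only needed for the sanity check in the usage note
#include <cstddef>
#include <vector>

// Serial baseline: sum of the squares of every element.
double sum_squares_serial(const std::vector<double>& data) {
    double total = 0.0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        total += data[i] * data[i];
    }
    return total;
}

// Same loop, but the directive asks the compiler to split the iterations
// across threads. reduction(+:total) gives each thread a private partial
// sum and adds the partial sums together at the end, so there is no
// hand-written locking.
double sum_squares_openmp(const std::vector<double>& data) {
    double total = 0.0;
    // A signed int loop counter is used because older OpenMP versions
    // require a signed loop variable.
    const int n = static_cast<int>(data.size());
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += data[i] * data[i];
    }
    return total;
}
```

Calling both functions on the same vector and comparing the results with `assert` is a quick sanity check. With general floating-point data the parallel sum can differ in the last bits, because the additions happen in a different order.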
-
RichardS wrote:
Does OpenMP exploit HyperThreading (or the AMD equivalent) better than threads?
AMD doesn't have an equivalent to HyperThreading at present. I don't know whether they're going to make one in the future, or whether they feel it doesn't offer enough benefit to justify the investment and are instead rolling all their effort into N-core designs. I haven't heard anything about it, so I suspect they're going with the second approach.
-
I heard the same thing. However, with a true multi-processor architecture, I would think one would achieve better parallelism than with HyperThreading. So, to rephrase my question: does OpenMP lead to better intrinsic parallelism (i.e. from an optimisation point of view)? Thanks, Rich
-
RichardS wrote:
However with a true multi-processor architecture, I would think one would achieve better parallelism vs HyperThreading.
Yep. HT uses the fact that parts of a CPU sit idle to fake a second core. It reduces the penalties from context switches, and if the apps are doing divergent tasks it can greatly speed things up. Running two MPEG encoders at once on an HT system won't gain much over running them sequentially, because both threads are trying to do the exact same thing and so compete for the same parts of the CPU. For distributed computing you should run a different project on each of the fake cores to diversify the load. Dual core isn't quite as good as a two-chip setup either, since the two cores share a single bus for memory access. For tasks that aren't I/O bound the gap is much narrower, though.
-
http://www.theinquirer.net/?article=30609 http://www.pugetsystems.com/articles.php?id=23 Cache coherency on a real dual core (same die, not two dies on one package) has much lower latency than on a two-socket machine. As a result you often get a boost. What the articles above do not mention is what happens to server performance when the OS supports NUMA memory, as on AMD Opterons. Then you get massive memory throughput to the CPUs, and for database apps like SQL Server you see huge leaps in performance. Went from a dual 3GHz Xeon to a dual AMD Opteron 265 (1.8GHz) and saw a 3-4 fold increase in query performance. Fantastic to see. It was the memory bandwidth shining through. Using 2003 Server for the OS, which is NUMA aware, as are XP-64 and 2.6 Linux kernels. As for HT, Intel is not using it in future architectures. HT helps with the odd user app, but on the whole it actually hinders performance, and seriously so in many server apps. I've heard MS people say that you don't want to use HT with things like SQL Server. It actually degrades performance, and I've seen it in my own apps on Linux, where you are knackered if the scheduler cannot understand that HT virtual CPUs are not to be weighted the same as the main CPU, i.e. with the 2.4 Linux kernel and Windows 2000 and below.
-
That's interesting. I have an HT CPU and mainly use my computer for development in Visual Studio, so I stopwatched a huge build with HT on and with HT off. It was faster with HT on, so I've left it on. Maybe I should splurge this year on a new dual-core (or whatever is out there) processor. If only there were some benchmarks online aimed specifically at software developers...
-
I assume you have a P4. On a modern AMD, or the new Intel Conroes (not out yet), you should see a significant jump in compile performance. The reason is that these CPUs have a much shorter pipeline, and as a result don't suffer as much from bad guesses in branch prediction. P4s excel at plain big blocks of code. As soon as you get lots of conditional ifs and branching, the performance drops off badly, due to the high latency of the deep pipeline. Compiling is probably the worst thing you can do with a P4. I use a dual 3.0 GHz Xeon at work with VS 6, 2003 & 2005. My plain old AMD 2500 Athlon (old 32-bit version) beats it hands down in speed: it can do a build in 7 minutes vs 20 for the one at work. P4s are at their best doing things like MP4 video encoding, but the sheer FPU power of the 64-bit Athlons these days leaves them standing.
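As a rough illustration of the "conditional ifs and branching" point, the sketch below writes the same counting loop twice: once with a conditional jump per element, and once with the comparison result added directly so there is nothing to mispredict. The function names are hypothetical and this is not a benchmark. An optimizing compiler may well generate identical machine code for both, and any real difference depends on the data and the CPU.

```cpp
#include <cassert>   // only needed for the sanity check in the usage note
#include <cstddef>
#include <vector>

// Branchy version: one conditional jump per element. With unpredictable
// data, each wrong guess by the branch predictor flushes the pipeline,
// which costs more the deeper the pipeline is.
int count_above_branchy(const std::vector<int>& values, int threshold) {
    int count = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] > threshold) {
            ++count;
        }
    }
    return count;
}

// Branch-free version: the comparison yields false or true, which is
// converted to 0 or 1 and added unconditionally.
int count_above_branchless(const std::vector<int>& values, int threshold) {
    int count = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        count += static_cast<int>(values[i] > threshold);
    }
    return count;
}
```

Both functions return the same count for the same input, so an `assert` comparing them is an easy correctness check before timing anything.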
-
Yeah, a P4 3.06 GHz. It was smoking fast when I first got it. Then I upgraded to a striped SATA array because the hard drive was the bottleneck, and now I'm back to the CPU being the major bottleneck during a big build. I guess it's time to start shopping around for a new PC, or maybe hold off a year and get one around the Vista release. I just lived through a heavy year of intense coding with what I've got now. I always seem to buy the new PC before I actually need it the most, and then do the most coding when I've got a system that has long since peaked as the fastest thing in my price range. Stupid computers! :doh: I've always been an Intel man, but I guess that's anachronistic these days; I'll have to broaden my horizons. I wish someone would post build times for some sort of standard reference project that we could download and test against our own systems, with good metrics and examples of different systems and how fast they are. Maybe a useful "open" article for CodeProject some day.