Weird Performance
-
I am seeing an odd performance pattern that does not entirely make sense to me. I am not asking for anything to be solved - just a confirmation of my observations, or not.

First, the background : I have tried a few options for a tracing library. Among them was Paul DiLascia's old TraceWin app and library. That worked pretty well and was the last one I used. It had a few drawbacks and the main one was performance. I found it took on the order of 6 ms per message and that is just too slow for me and my app(s). I finally decided to write my own and it is looking really good. I now have an overhead of about 3 μs per message - yes, microseconds.

This is where the weird part comes in. That is the performance on my main development machine. It is an i9-9900X at 3.5 GHz and it runs W10. My testing machine is a Xeon i7-3820 at 3.6 GHz and it runs W7, thank the heavens. The weird part is the Xeon has an overhead of less than 1 μs per message, typically right at 0.9 μs, and these numbers are quite repeatable.

My goal was to make this as low-overhead as possible and I think I have succeeded. Each message formats its text into a local buffer (with sprintf), then acquires a mutex that guards the shared memory, and then copies the buffer into the shared memory using memcpy. I would not expect an i7 Xeon to be faster than an i9 Core-series processor at essentially the same clock rate at much of anything, and certainly not three times faster. This leads to my main question : has anyone else seen this kind of performance difference accessing kernel-level OS objects between W7 and W10?

As I am writing this, I just thought of a possible explanation : clock throttling. I bet the i9 has its clock throttled back during this test. I will experiment a little and see if that's it.

-edit- Upon further review, I think that is the explanation. Task Manager shows the CPU idling at 1.2 GHz, and this would explain the performance difference. The test doesn't last long enough for turbo mode to kick in, so the CPU stays at its idle clock rate. Oh well. I guess this means the trace library has a sub-1 μs overhead and I am even happier about that. Yee haw.
"They have a consciousness, they have a life, they have a soul! Damn you! Let the rabbits wear glasses! Save our brothers! Can I get an amen?"
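For readers who want to reproduce the measurement, here is a minimal C++ sketch of the write path and the timing loop described in the post above. It is not the poster's library. As assumptions: `std::mutex` stands in for the kernel mutex, a static array stands in for the shared-memory segment, `vsnprintf` replaces `sprintf` so the local buffer cannot overflow, and the names `Trace`, `MicrosecondsPerMessage`, `kSlotSize` and `kSlotCount` are all hypothetical. The untimed warm-up loop is there because of the idle-clock effect noted in the edit.

```cpp
#include <cassert>
#include <chrono>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <mutex>

// Hypothetical stand-ins: a real tracer would use a named kernel mutex and a
// mapped shared-memory view. Here std::mutex and a static array play those roles.
constexpr std::size_t kSlotSize  = 256;   // assumed maximum message size
constexpr std::size_t kSlotCount = 1024;  // assumed ring capacity

static std::mutex  g_traceLock;
static char        g_shared[kSlotCount][kSlotSize];
static std::size_t g_nextSlot = 0;

// Format into a local buffer first, then hold the lock only for the copy.
// Returns the number of bytes copied, including the terminating NUL.
std::size_t Trace(const char* fmt, ...)
{
    char buffer[kSlotSize];

    va_list args;
    va_start(args, fmt);
    int n = std::vsnprintf(buffer, sizeof buffer, fmt, args);
    va_end(args);
    if (n < 0)
        return 0;  // formatting error, nothing written

    std::size_t len = static_cast<std::size_t>(n);
    if (len >= sizeof buffer)
        len = sizeof buffer - 1;  // message was truncated to fit the slot

    std::lock_guard<std::mutex> guard(g_traceLock);
    std::memcpy(g_shared[g_nextSlot], buffer, len + 1);
    g_nextSlot = (g_nextSlot + 1) % kSlotCount;
    return len + 1;
}

// Average cost per message in microseconds; count must be positive.
// The warm-up loop runs untimed so the CPU has a chance to leave its
// idle clock before the measured loop starts.
double MicrosecondsPerMessage(int warmup, int count)
{
    using Clock = std::chrono::steady_clock;

    for (int i = 0; i < warmup; ++i)
        Trace("warm-up %d", i);

    const auto start = Clock::now();
    for (int i = 0; i < count; ++i)
        Trace("message %d from thread %d", i, 0);
    const std::chrono::duration<double, std::micro> elapsed = Clock::now() - start;

    return elapsed.count() / count;
}
```

Calling `MicrosecondsPerMessage` once with a small warm-up and once with a large one, on the same machine, is one way to check whether the idle clock is skewing the numbers. How many warm-up iterations are enough depends on the CPU and the power plan, so treat any specific count as a guess.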
I suppose you are talking about C/C++, aren't you?
M.D.V. ;) If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about? Help me to understand what I'm saying, and I'll explain it better to you Rating helpful answers is nice, but saying thanks can be even nicer.
-
The library I made is in C++.
"They have a consciousness, they have a life, they have a soul! Damn you! Let the rabbits wear glasses! Save our brothers! Can I get an amen?"
Any chance to publish an article about it? ;)
M.D.V. ;) If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about? Help me to understand what I'm saying, and I'll explain it better to you Rating helpful answers is nice, but saying thanks can be even nicer.
I am really not sure. It is so entwined with the rest of my framework it would be difficult to extract and isolate. I have stuff like multiple trace contexts, editable colors on a per-thread basis, and other stuff. I guess I could remove all of that and provide basic functionality without the accessories. I'll finish up a few other things and see what I can do.
"They have a consciousness, they have a life, they have a soul! Damn you! Let the rabbits wear glasses! Save our brothers! Can I get an amen?"
Rick York wrote:
It is so entwined with the rest of my framework it would be difficult to extract and isolate. I have stuff like multiple trace contexts, editable colors on a per-thread basis, and other stuff.
What about a series of articles? :rolleyes: ;P :laugh:
M.D.V. ;) If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about? Help me to understand what I'm saying, and I'll explain it better to you Rating helpful answers is nice, but saying thanks can be even nicer.