QueryThreadCycleTime, QueryProcessCycleTime: what do these numbers mean?
-
I've tried e.g. QueryThreadCycleTime as follows:

    #include <windows.h>
    #include <cstdint>
    #include <iostream>

    int main()
    {
        for (int i = 0; i < 10; ++i)
        {
            // Two back-to-back reads: the difference is roughly the cost of one call.
            uint64_t n1 = 0, n2 = 0;
            BOOL ok1 = QueryThreadCycleTime(GetCurrentThread(), &n1);
            BOOL ok2 = QueryThreadCycleTime(GetCurrentThread(), &n2);

            if (ok1 && ok2) std::cout << n2 - n1 << "\n"; else std::cout << "n/a\n";
        }
    }

Typical results are:
1036
1114
1734
748
706
670
652
716
652
666

The numbers vary widely - ok, maybe these are expensive calls - but what is the point then? If I replace the thread cycles with process cycles, the numbers get even weirder:

    BOOL ok1 = QueryProcessCycleTime(GetCurrentProcess(), &n1);
    BOOL ok2 = QueryProcessCycleTime(GetCurrentProcess(), &n2);

With typical results:
39666
39520
304964
145932
47486
287156
191528
208652
196176
288642

This blog post suggests that it should be no worse than QueryPerformanceCounter, but that's not what I'm seeing. Anyone with insights?
-
As quoted at "QueryProcessCycleTime function (realtimeapiset.h) - Win32 apps | Microsoft Learn":

"The number of CPU clock cycles used by the threads of the process. This value includes cycles spent in both user mode and kernel mode."

So it just shows which threads are consuming what. That may allow you to tune your application if it is largely compute bound.
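If the counter is meant for tuning compute-bound code, the more useful measurement is probably the cycles spent in a piece of work rather than the gap between two back-to-back reads. The following is a minimal sketch under that assumption. `current_thread_cycles` and `cycles_for` are hypothetical helper names, not part of the Windows API; only `QueryThreadCycleTime` and `GetCurrentThread` are real calls, and they are compiled only on Windows.

```cpp
#include <cstdint>

#ifdef _WIN32
#include <windows.h>

// Windows-only reader: cycles charged so far to the calling thread (0 on failure).
inline std::uint64_t current_thread_cycles()
{
    ULONG64 cycles = 0;
    return QueryThreadCycleTime(GetCurrentThread(), &cycles) ? cycles : 0;
}
#endif

// Hypothetical helper (not a Windows API): runs `work` between two counter
// reads and returns the delta minus an estimated per-call overhead.
// The result is clamped at zero so an overhead estimate that is too large
// cannot wrap the unsigned subtraction.
template <class Counter, class Work>
std::uint64_t cycles_for(Counter read_counter, Work work, std::uint64_t overhead = 0)
{
    const std::uint64_t before = read_counter();
    work();
    const std::uint64_t after = read_counter();
    const std::uint64_t delta = after - before;
    return delta > overhead ? delta - overhead : 0;
}
```

On Windows this could be called as `cycles_for(current_thread_cycles, [] { /* code under test */ }, overhead)`, with `overhead` taken from something like the smallest back-to-back delta measured on the same machine. Passing the counter as a callable is only a design choice that keeps the helper usable with other counters.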
-
Your code on my machine:
2716
2793
2768
2617
2977
2708
2686
2795
2731
2651

Well, not perfect, but... Let's try another time:
2703
2891
52734
2611
2623
2613
2611
2609
2608
2592

Absolute rubbish laddie!
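A single outlier such as the 52734 in the second run dominates any average, so one common way to read such samples is to look at the minimum and the median instead. The sketch below assumes that approach. `DeltaSummary` and `summarize` are hypothetical names introduced for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Lowest, median and highest of a set of cycle deltas.
struct DeltaSummary
{
    std::uint64_t lowest;
    std::uint64_t median;
    std::uint64_t highest;
};

// Hypothetical helper: sorts a copy of the samples and picks out the
// order statistics. For an even count the median is the midpoint of the
// two middle samples (rounded down). An empty input yields all zeros.
inline DeltaSummary summarize(std::vector<std::uint64_t> deltas)
{
    if (deltas.empty()) return {0, 0, 0};
    std::sort(deltas.begin(), deltas.end());
    const std::size_t n = deltas.size();
    const std::uint64_t median = (n % 2 != 0)
        ? deltas[n / 2]
        : deltas[n / 2 - 1] + (deltas[n / 2] - deltas[n / 2 - 1]) / 2;
    return {deltas.front(), median, deltas.back()};
}
```

Fed the ten samples of the second run, this gives a lowest value of 2592, a median of 2612 and a highest value of 52734, which makes it easier to see that nine of the ten samples sit within a few hundred cycles of each other.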
"In testa che avete, Signor di Ceprano?" -- Rigoletto
-
CPallini wrote:
Absolute rubbish laddie!
I always suspected you were Scottish and not Italian. :laugh:
-
peterchen wrote:
This blog post suggests that it should be no worse than QueryPerformanceCounter, but that's not what I'm seeing. Anyone with insights?

My translation of what you are asking:

peterchen should have asked:
QueryProcessCycleTime uses the RDTSC instruction. QueryPerformanceCounter historically also used the RDTSC instruction. Why aren't they giving similar outputs?

As you probably already know, the TSC is superseded by [the HPET](https://en.wikipedia.org/wiki/High_Precision_Event_Timer). There are a dozen reasons why RDTSC provides inaccurate results. The Meltdown/Spectre mitigations were probably the nail in the coffin, so to speak. The answer to your question is that [QueryPerformanceCounter](https://learn.microsoft.com/en-us/windows/win32/api/profileapi/nf-profileapi-queryperformancecounter) uses the HPET/APIC clock and QueryProcessCycleTime is still using the old [rdtsc instruction](https://learn.microsoft.com/en-us/cpp/intrinsics/rdtsc?view=msvc-170). It's apparently a huge mess: @HaroldAptroot says *sometimes* QueryPerformanceCounter uses HPET and sometimes it doesn't, depending on whether or not the TSC is invariant.
-
Randor wrote:
As you probably already know the TSC is superseded by the HPET.

Not actually true though: QPC is based on HPET *only when necessary*, which is basically if you have a CPU that does not have Invariant TSC (and you don't, unless your CPU is from the mid-2000s). QPC is based on the TSC on every reasonable computer.
-
Hmmm, do you know where I can find a list of processors with an invariant TSC? I see that CPUID leaf 80000007H indicates support, but where can I find a list of processors that support it?
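This does not produce the list being asked for, but the CPUID leaf mentioned above can at least be queried on the machine at hand. The sketch below assumes the documented layout, in which leaf 80000007H reports the invariant TSC flag in bit 8 of EDX. `invariant_tsc_bit` and `cpu_has_invariant_tsc` are hypothetical helper names; `__cpuid` (MSVC) and `__get_cpuid` (GCC/Clang) are the compiler intrinsics it relies on.

```cpp
#include <cstdint>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#elif defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

// CPUID leaf 80000007H reports the invariant TSC flag in bit 8 of EDX.
inline bool invariant_tsc_bit(std::uint32_t edx)
{
    return ((edx >> 8) & 1u) != 0;
}

// Returns true if the CPU running this code reports an invariant TSC.
// Returns false on non-x86 builds or when the leaf is not available.
inline bool cpu_has_invariant_tsc()
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, static_cast<int>(0x80000000u));   // EAX = highest extended leaf
    if (static_cast<std::uint32_t>(regs[0]) < 0x80000007u) return false;
    __cpuid(regs, static_cast<int>(0x80000007u));
    return invariant_tsc_bit(static_cast<std::uint32_t>(regs[3]));   // regs[3] = EDX
#elif defined(__i386__) || defined(__x86_64__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    // __get_cpuid returns 0 when the requested leaf is not supported.
    if (__get_cpuid(0x80000007u, &eax, &ebx, &ecx, &edx) == 0) return false;
    return invariant_tsc_bit(edx);
#else
    return false;
#endif
}
```

Calling `cpu_has_invariant_tsc()` only describes the CPU the program happens to run on, and a virtual machine may mask or alter the flag, so the result is a hint rather than a guarantee.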