Cost of a function call

Lost User

Friends, I need to know about the overhead involved in a function call with the Visual C++ compiler. I am writing a server-based application in which speed and efficiency are the primary requirements. There is a function in my program that makes some complex calculations, and it is called from a for loop about 150 times. I have declared this function inline, but I've read in a number of C++ docs that declaring a function inline is no guarantee that the compiler really inlines it; it is up to the compiler to decide whether to "paste" the code or keep it as a separate function. So my worry is this: the loop calls the function 150 times per request, and if many clients send requests, those calls will add up to a lot of CPU cycles and reduce the efficiency of my application. What do you suggest I do? Please also tell me the cost involved in calling a function, and how I can tell whether the compiler has really inlined my function. Thanks.

Michael Dunn

Don't worry about it. From your description, the process of setting up a new stack frame and calling a subroutine isn't the bottleneck. Removing those steps would not save any significant time compared to the time spent inside the big complex function.

Joe Woodbury

A function call will take up CPU time, but it is minimal compared to what you are doing inside the function. Bringing the function inline could actually make the operation slower under certain circumstances, since it could change how the surrounding code is optimized.

All this is mostly irrelevant, since optimizations shouldn't be based on theory but on measurable results, always keeping in mind the adage that 90% of your time is spent in 10% of your code. Also, the algorithm itself is more important than the implementation.

(By chance I was doing some optimizing this morning. So I modified the test a little and found that making an intermediate function call added about 8 CPU cycles per call on a Celeron 900. The result with the new algorithm was still 4x faster than the original algorithm.)

PS. You could use the keyword __forceinline if your tests show that this will improve performance.

S van Leent

Well, I know someone who makes a 3D graphics engine, and there it really does matter whether things get pushed on the stack or not. So I think a compiler should just inline when you want it to inline. That doesn't gain much for complex calculations, but for simple calculations, on the other hand, it does matter.

Joe Woodbury

S van Leent wrote: it really does matter pushing things on the stack or not

It MAY matter, performance-wise. If speed is of the utmost importance, a developer may choose to use the __fastcall modifier on functions. However, if speed is that important, you should always test your code, not simply presume it is faster one way than the other.

S van Leent wrote: So I think a compiler should just inline when you want it to inline.

I totally disagree. The standard is quite correct in this regard. For convenience, or when using templates, developers may implement a member function in the header, either explicitly inline or within the body of the class definition. If all of those were automatically inlined, it would result in bloated code and may very well result in slower code. (Beyond CPU cycle counts, bloated code will result in more paging, which will have a devastating effect on performance.)

In general, unless you really understand the CPU architecture, you should just write clean C/C++ code and let the compiler do its thing. Again, most performance bottlenecks are in a tiny part of the code and can often be fixed by using a better algorithm and by understanding and leveraging the OS better.

Lost User

Joe Woodbury wrote: By chance I was doing some optimizing this morning. So I modified the test a little and found that making an intermediate function call added about 8 CPU cycles per call on a Celeron 900.

Can you please tell me what method you normally use to find out the number of CPU cycles involved in a function call?

Baris Kurtlutepe

Shah Shehpori wrote: but i've read in number of C++ docs that making a function inline is of no guarantee that compiler really makes it inline and it depends on a compiler..

In Visual C++ you can use the __forceinline keyword (which is specific to Visual C++, as the double underscore suggests) to force the compiler to compile a function inline. There are restrictions on this too, but I don't think they will apply in your case. You can read further in MSDN.

Edit: Oops, sorry, it was already suggested above. I guess the force is not with me then. :)
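For illustration, here is a minimal sketch of how the loop from the question might look with __forceinline applied. The names FORCE_INLINE, complex_calculation and process_request are hypothetical, and the function body is only a stand-in, since the real calculation is not shown in this thread. The macro is an assumption added so that the same source also builds on compilers other than Visual C++.

```cpp
// Portable wrapper: __forceinline is Visual C++ specific, so other compilers
// fall back to their closest equivalent. FORCE_INLINE is a hypothetical name.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline
#endif

// Stand-in for the "complex calculation"; the real body is not in the thread.
FORCE_INLINE double complex_calculation(double x)
{
    return x * x + 0.5 * x + 1.0;
}

// Mirrors the loop from the question: one call per iteration, 150 iterations.
double process_request(double seed)
{
    double total = 0.0;
    for (int i = 0; i < 150; ++i)
        total += complex_calculation(seed + i);
    return total;
}
```

Whether this helps at all is something only a measurement can tell. To see what the compiler actually did, one option in Visual C++ is to generate an assembly listing (the /FAs switch) and look for a call instruction inside the loop; the compiler can also report a __forceinline function that it did not inline through warning C4714.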

Joe Woodbury

At the core, you use the rdtsc x86 instruction. The actual sequence is:

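ULARGE_INTEGER cycles;

// Not on NT: disable interrupts around the read
if (!m_onNT)
	_asm cli

_asm
{
	pushad                      // save registers (cpuid clobbers eax, ebx, ecx, edx)
	cpuid                       // serialize so earlier instructions finish first
	rdtsc                       // time-stamp counter -> edx:eax
	mov cycles.HighPart,edx
	mov cycles.LowPart,eax
	popad                       // restore registers
}

// Re-enable interrupts
if (!m_onNT)
	_asm sti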

You then calculate the overhead of the base test, time the tests, and analyze the results (I throw away the top and bottom 20% of the results and average the rest). You can also calculate the actual speed of the CPU and convert the cycles into seconds. I only do this if I really need to know the time in seconds; otherwise, I just compare cycles. There are classes posted on CodeProject to help with all this, though I use my own.
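The analysis step described above (discard the top and bottom 20% of the samples, average the rest, optionally convert to seconds) could be sketched roughly as follows. This is not the poster's own class; trimmed_mean and cycles_to_seconds are hypothetical names, and the CPU frequency is assumed to be known or measured separately.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Sort the cycle counts, drop the lowest and highest trim_fraction of them,
// and return the average of what remains.
double trimmed_mean(std::vector<std::uint64_t> samples, double trim_fraction = 0.20)
{
    assert(!samples.empty());
    assert(trim_fraction >= 0.0 && trim_fraction < 0.5);

    std::sort(samples.begin(), samples.end());

    // Number of samples to discard at each end.
    const std::ptrdiff_t drop =
        static_cast<std::ptrdiff_t>(samples.size() * trim_fraction);

    const auto first = samples.begin() + drop;
    const auto last = samples.end() - drop;

    const double sum = std::accumulate(first, last, 0.0);
    return sum / static_cast<double>(last - first);
}

// Convert a cycle count to seconds for a given clock rate in Hz
// (assumption: the clock rate was constant during the measurement).
double cycles_to_seconds(double cycles, double cpu_hz)
{
    return cycles / cpu_hz;
}
```

The cost of the call itself is then the trimmed mean of the test minus the trimmed mean of the base test. Taking the figure quoted earlier in the thread, 8 cycles per call over 150 calls is 1200 cycles per request, which at 900 MHz comes to about 1.3 microseconds.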

S van Leent

Joe Woodbury wrote: In general, unless you really understand the CPU architecture, you should just write clean C/C++ code and let the compiler do its thing.

I agree with that. I also think inlining is sometimes a lazy way of writing a macro, and the macro is sometimes much better.