Cost of a function call

    Lost User
    #1

    Friends, I need to know about the overhead involved in a function call with the Visual C++ compiler. I am writing a server-based application in which speed and efficiency are the primary requirements. One function in my program performs some complex calculations and is called from a for loop about 150 times. I have declared this function inline, but I have read in a number of C++ docs that declaring a function inline does not guarantee that the compiler will actually inline it; the compiler decides whether to "paste" the code or keep it as a separate function.

    So I am worried: the loop calls the function 150 times per request, and if many clients send requests at once, all those calls will cost a lot of CPU cycles and reduce the efficiency of my application. What do you suggest I do? Please also tell me the cost of calling a function, and how I can check whether the compiler really inlined my function. Thanks

      Michael Dunn
      #2

      Don't worry about it. From your description, the process of setting up a new stack frame and calling a subroutine isn't the bottleneck. Removing those steps would not save any significant time compared with the cost of the big, complex function itself. --Mike--

        Joe Woodbury
        #3

        A function call does take CPU time, but the cost is minimal compared with the work you are doing inside the function. Making the function inline could actually make the operation slower in some circumstances, since it can change how the compiler optimizes the surrounding code. All of this is mostly irrelevant, though: optimizations should be based on measured results, not on theory, always keeping in mind the adage that 90% of your time is spent in 10% of your code. Also, the algorithm itself matters more than the implementation.

        (By chance I was doing some optimizing this morning, so I modified the test a little and found that making an intermediate function call added about 8 CPU cycles per call on a Celeron 900. The result with the new algorithm was still 4x faster than the original algorithm.)

        PS. You could use the __forceinline keyword if your tests show that it improves performance.
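To put the numbers from this thread together, here is a rough back-of-the-envelope sketch. It assumes that the roughly 8 cycles per call measured above would also apply to the original poster's function, and that the server runs at a similar 900 MHz clock; both are assumptions, and the constant and function names are made up for illustration.

```cpp
// Figures quoted in the thread. Treating them as representative of the
// original poster's server is an assumption.
constexpr double kCyclesPerCall   = 8.0;    // measured call overhead (Celeron 900)
constexpr int    kCallsPerRequest = 150;    // loop iterations per client request
constexpr double kClockHz         = 900e6;  // 900 MHz clock

// Total call overhead per request, in CPU cycles.
constexpr double overhead_cycles_per_request() {
    return kCyclesPerCall * kCallsPerRequest;
}

// The same overhead converted to seconds at the assumed clock rate.
constexpr double overhead_seconds_per_request() {
    return overhead_cycles_per_request() / kClockHz;
}
```

With these figures the call overhead comes to 1,200 cycles, or roughly 1.3 microseconds per request, which is why it is unlikely to be the bottleneck next to the calculation itself.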

          S van Leent
          #4

          Well, I know someone who makes a 3D graphics engine, and there it really does matter whether things are pushed onto the stack or not. So I think a compiler should just inline when you ask it to. That doesn't gain much for complex calculations, but for simple calculations, on the other hand, it does matter.

          LPCTSTR Dutch = TEXT("Double Dutch :-)");

            Joe Woodbury
            #5

            S van Leent wrote: it really does matter pushing things on the stack or not

            It MAY matter, performance-wise. If speed is of the utmost importance, a developer may choose to use the __fastcall modifier on functions. However, if speed is that important, you should always test your code, not simply presume it is faster one way than the other.

            S van Leent wrote: So I think a compiler should just inline when you want it to inline.

            I totally disagree. The standard is quite correct in this regard. For convenience, or when using templates, developers may implement a member function in the header, either explicitly inline or within the body of the class definition. If all of those were automatically inlined, the result would be bloated code, which may very well be slower code. (Beyond CPU cycle counts, bloated code causes more paging, which can have a devastating effect on performance.)

            In general, unless you really understand the CPU architecture, you should just write clean C/C++ code and let the compiler do its thing. Again, most performance bottlenecks are in a tiny part of the code and can often be fixed by using a better algorithm and by understanding and leveraging the OS better.
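As a small illustration of the header case described above, here is a sketch using a hypothetical Vec2 class (not from the thread). A member function defined inside the class body is implicitly inline, while one defined in the header after the class needs the inline keyword. In both cases the keyword mainly allows the definition to live in a header, and the compiler still decides whether to expand each call.

```cpp
// Vec2 is a hypothetical class used only to illustrate the two ways a
// member function can end up "inline" in a header.
class Vec2 {
public:
    Vec2(double x, double y) : x_(x), y_(y) {}

    // Defined inside the class body: implicitly inline.
    double dot(const Vec2& other) const {
        return x_ * other.x_ + y_ * other.y_;
    }

    // Declared here, defined below with an explicit 'inline'.
    double length_squared() const;

private:
    double x_;
    double y_;
};

// Defined outside the class body but still in the header: 'inline' is
// required so that several translation units may include this definition.
inline double Vec2::length_squared() const {
    return dot(*this);
}
```

For example, Vec2(3, 4).length_squared() evaluates to 25 whether or not the compiler chooses to expand the calls in place.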

              Lost User
              #6

              Joe Woodbury wrote: By chance I was doing some optimizing this morning. So I modified the test a little and found that making an intermediate function call added about 8 CPU cycles per call on a Celeron 900.

              Can you please tell me what method you normally use to find out the number of CPU cycles involved in a function call?

                Baris Kurtlutepe
                #7

                Shah Shehpori wrote: but i've read in number of C++ docs that making a function inline is of no guarantee that compiler really makes it inline and it depends on a compiler..

                In Visual C++ you can use the __forceinline keyword (a Visual C++-specific keyword, as the double underscore suggests) to tell the compiler to inline the function regardless of its own cost/benefit analysis. There are still cases in which the function cannot be inlined, but I don't think they will apply to you. You can read further in MSDN.

                Edit: Oops, sorry, it was already suggested above. I guess the force is not with me then. :)
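A minimal sketch of how __forceinline might be applied to a function called from a 150-iteration loop like the one in the question. FORCE_INLINE, scale_and_offset and sum_over_loop are hypothetical names, and the arithmetic is only a stand-in for the real calculation. The GCC/Clang branch uses the always_inline attribute as the nearest equivalent, so the sketch also builds outside Visual C++.

```cpp
// FORCE_INLINE is a hypothetical macro name, not something from the thread.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline  // plain hint on other compilers
#endif

// Hypothetical stand-in for the "complex calculation" in the original post.
FORCE_INLINE double scale_and_offset(double x) {
    return x * 1.5 + 2.0;
}

// Calls the helper 150 times, as in the loop described in the question.
double sum_over_loop() {
    double total = 0.0;
    for (int i = 0; i < 150; ++i) {
        total += scale_and_offset(static_cast<double>(i));
    }
    return total;
}
```

Visual C++ can report a warning (C4714) when a function marked __forceinline was not inlined, which is one way to check what the compiler actually did. Inspecting the generated assembly listing is another.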

                  Joe Woodbury
                  #8

                  At the core, you use the rdtsc x86 instruction. The actual sequence is:

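                  ULARGE_INTEGER cycles;

                  // cli/sti are privileged on NT, so interrupts are only disabled elsewhere
                  if (!m_onNT)
                  	_asm cli

                  _asm
                  {
                  	pushad                   // save registers; cpuid overwrites eax, ebx, ecx, edx
                  	cpuid                    // serialize so earlier instructions have completed
                  	rdtsc                    // read the time-stamp counter into edx:eax
                  	mov cycles.HighPart,edx
                  	mov cycles.LowPart,eax
                  	popad                    // restore registers
                  }

                  if (!m_onNT)
                  	_asm sti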
                  

                  You first measure the overhead of the base test, then time the real tests and analyze the results (I throw away the top and bottom 20% of the results and average the rest). You can also work out the actual speed of the CPU and convert the cycles into seconds. I only do this if I really need to know the time in seconds; otherwise, I just compare cycles. There are classes posted on CodeProject to help with all this, though I use my own.
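The analysis step described above could be sketched roughly as follows. This is not the poster's own class, which the thread does not show. trimmed_mean and cycles_to_seconds are hypothetical helper names, and the samples are assumed to be cycle counts already collected with a sequence like the one above.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Sorts the samples, discards the lowest and highest `trim_fraction` of them
// (0.2 matches the 20% mentioned in the post) and averages what is left.
// Returns 0.0 if nothing remains after trimming.
double trimmed_mean(std::vector<std::uint64_t> samples, double trim_fraction = 0.2) {
    std::sort(samples.begin(), samples.end());
    const std::size_t drop =
        static_cast<std::size_t>(static_cast<double>(samples.size()) * trim_fraction);
    if (samples.size() <= 2 * drop) {
        return 0.0;
    }
    const auto first = samples.begin() + static_cast<std::ptrdiff_t>(drop);
    const auto last  = samples.end() - static_cast<std::ptrdiff_t>(drop);
    const double sum = std::accumulate(first, last, 0.0);
    return sum / static_cast<double>(last - first);
}

// Converts a cycle count to seconds for a given clock rate in Hz.
double cycles_to_seconds(double cycles, double clock_hz) {
    return cycles / clock_hz;
}
```

The net cost of the code under test would then be the trimmed mean of the test samples minus the trimmed mean of the base-test samples, optionally passed through cycles_to_seconds when a time in seconds is needed.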

                    S van Leent
                    #9

                    Joe Woodbury wrote: In general, unless you really understand the CPU architecture, you should just write clean C/C++ code and let the compiler do it's thing.

                    I agree with that. I also think inlining is sometimes a lazy way of writing a macro, and the macro is sometimes much better.

                    LPCTSTR Dutch = TEXT("Double Dutch :-)");
