Most efficient switch statement variable to use?

• DaveyM69 wrote:

    Luc Pattyn wrote:

    economize on memory

    Interesting point - I've never delved that deep into the CPU architecture. Are things always word aligned or byte aligned, or is it variable? If it's word aligned then there would be no advantage at all to using anything less than the native size.

    Dave
    BTW, in software, hope and pray is not a viable strategy. (Luc Pattyn)
    Visual Basic is not used by normal people so we're not covering it here. (Uncyclopedia)

    Lost User (#11):

    The CPU (assuming x86) doesn't require much; SSE requires 16-byte alignment unless you don't mind roughly 100% overhead. But the Windows (and Linux) ABI requires all items to be aligned to at least their own size, and the stack to twice the pointer size (which depends on bitness, obviously). The individual elements in an array of bytes can be byte aligned, but the starting address will usually be dword (or more) aligned. And Luc is correct, of course, in general. Some instructions, such as div and idiv, have a latency and throughput that depend on the value of the result, and small types can lead to smaller values (and thus faster computation); obviously that is not guaranteed, since it depends on the actual values. For floats, lower precision makes operations such as fdiv faster.
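
    To make that "aligned to at least their size" rule concrete, here is a minimal C# sketch (the Example struct and its field names are invented for illustration) that prints where each field of a default, sequential-layout struct lands:

        using System;
        using System.Runtime.InteropServices;

        // Hypothetical struct, purely for illustration; C# structs use
        // sequential layout by default.
        struct Example
        {
            public byte B;   // placed at offset 0
            public short S;  // placed at offset 2: aligned to its own size (2)
            public int I;    // placed at offset 4: aligned to its own size (4)
        }

        class AlignmentDemo
        {
            static void Main()
            {
                // Each field lands at the next multiple of its own size.
                Console.WriteLine(Marshal.OffsetOf(typeof(Example), "B")); // 0
                Console.WriteLine(Marshal.OffsetOf(typeof(Example), "S")); // 2
                Console.WriteLine(Marshal.OffsetOf(typeof(Example), "I")); // 4
            }
        }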

• Lost User wrote:

    Some forms of "bitfucking" are becoming more important though: with the widening CPU-RAM speed gap it becomes increasingly important not to address more memory in inner loops than the size of the L2 cache (needing less is, of course, even better), and if "bitfucks" are needed to accomplish that, then so be it. And since the conditions for store-forwarding are very restrictive, extracting a smaller type from within a larger type at a non-aligned offset should always be done with a "bitfuck"; performance will suck if you write the value to memory and read a smaller, unaligned part of it back. The reverse, inserting a small type into a large type, is even worse: it is never store-forwarded, so "bitfucking" is always needed unless the code is not in a performance-critical section.
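
    A minimal C# sketch of the kind of "bitfuck" being described (the method names are invented): extracting and inserting a byte within a uint using shifts and masks, so the value stays in a register instead of taking a store-to-load round trip through memory:

        static class BitOps
        {
            // Read bits 16..23 of 'value' with a shift; the cast to byte
            // masks off the rest. No write to memory, so no store-forwarding
            // stall.
            public static byte ExtractByte2(uint value)
            {
                return (byte)(value >> 16);
            }

            // Write 'b' into bits 16..23 of 'value', again entirely in
            // registers: clear the target byte, then OR in the new one.
            public static uint InsertByte2(uint value, byte b)
            {
                return (value & ~0x00FF0000u) | ((uint)b << 16);
            }
        }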

    Deresen (#12):

    You're right, and I agree that you should bitfuck wherever you can. Maybe it's because I like it, maybe because I started with microcontrollers. But if we do the maths, we'll see that it doesn't really matter which types you use. An average to small cache size is 1 Gb. Let's say your program will use half of it, which leaves you 500 M of bytes to work with. If you're an efficient programmer you can never make your program use the whole 500 M. Even if you store numbers that would fit in a byte in an Int128, it'll take an array of (500/16 =~) 30 million of them to fill it. The only way to fill your RAM is when you're busy with graphics. And even then it's not necessary to bitfuck, because Microsoft made some good libraries which will take care of the memory problem for you. In conclusion, I think we can say that bitfucking is fun, but not really necessary if you're just writing efficient code.

• Lost User (#13), replying to Deresen:

    Deresen wrote:

    average to small cache size is 1 Gb

    Where did you get that information? The biggest cache size I've seen so far is 12MB (as 2x6MB) of L2. That is not so much, and every so often another program will come along and trash it (if you don't do it yourself).

• Deresen (#14), replying to Lost User:

    My mistake, I was thinking of RAM. :omg:

• Lost User (#15), replying to Deresen:

    Well then, 1GB is indeed small :) But RAM is slow (compared to the CPU); a cache miss can easily cost 100 cycles - long enough to justify doing complex calculations just to avoid the cache miss, and plenty of time for 150 to 250 instructions.

    That makes me wonder what the theoretical maximum number of instructions in 100 cycles is (on Core2). 500, if looking only at the predecoder specs: macro fusion can fuse 2 instructions but can only happen once per cycle, so 5 instructions per cycle (3 of which must be 1 µop), and the size of such a "block" should satisfy N*size = 16 (to never cross a boundary), and no 66H or 67H prefixes should occur anywhere.

    But then looking at the rest: the sequence must not have any dependency chains, not all instructions are perfectly pipelined, there are only 3 "normal" ports (0, 1 and 5), and even register reading can be a bottleneck. Only 6 µops per cycle are allowed, but that includes memory reads/writes (bringing us down to 400, except for µop fusion). The predecoder throughput would be less important (only the first iteration) if we were executing a small (less than 4 times 16 bytes) loop. And I'm not even going to mention the rest.

    The best throughput of any one instruction is 3/cycle (a stream of NOPs, for example), so it should be possible to do (slightly) better than that, right? This is too complex, I'll leave it to the pros.
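
    A rough way to see that cache-miss penalty from C# (a sketch; the 64 MB working set is an assumption, chosen to be far larger than any Core2-era L2 cache): sum the same array once in sequential order and once in a pre-shuffled random order, and compare the times:

        using System;
        using System.Diagnostics;

        class CacheMissDemo
        {
            static void Main()
            {
                const int N = 1 << 24;          // 16M ints = 64 MB, far larger than L2
                int[] data = new int[N];
                int[] order = new int[N];
                Random rng = new Random(42);
                for (int i = 0; i < N; i++) order[i] = i;
                for (int i = N - 1; i > 0; i--) // Fisher-Yates shuffle of the visit order
                {
                    int j = rng.Next(i + 1);
                    int tmp = order[i]; order[i] = order[j]; order[j] = tmp;
                }

                long sum = 0;
                Stopwatch sw = Stopwatch.StartNew();
                for (int i = 0; i < N; i++) sum += data[i];        // mostly cache hits
                Console.WriteLine("sequential: " + sw.ElapsedMilliseconds + " ms");

                sw.Restart();
                for (int i = 0; i < N; i++) sum += data[order[i]]; // mostly cache misses
                Console.WriteLine("random:     " + sw.ElapsedMilliseconds + " ms (" + sum + ")");
            }
        }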

• Bruce Coward wrote:

    When I'm using switch statements in C#, what is the most efficient type of switch variable to use? Normally I have fewer than 10 cases, which makes an Int32 seem rather overkill, but am I correct in assuming that, as the machine is running 32 bits, it may be more efficient than trimming the switch variable down to 16 or even 8 bits, which may cause more code steps?

    Cheers, Bruce :cool:

    Mark Churchill (#16):

    Minor optimisation tricks like that are something that your compiler should be (and probably is) handling for you. Write clear and readable code.

    Mark Churchill, Director, Dunn & Churchill Pty Ltd
    Free Download: Diamond Binding: the simple, powerful, reliable, and effective data layer toolkit for Visual Studio.
    Entanglar: .Net game engine featuring automatic networking and powerful HLSL gfx binding.
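
    For reference, a minimal sketch of the two variants the question contrasts (DescribeInt and DescribeByte are invented names); the C# compiler loads small integral types onto the evaluation stack as 32-bit values anyway, so both methods produce essentially the same MSIL switch:

        // Identical cases; only the parameter type differs. The operand is a
        // 32-bit stack value in both cases, so the declared type should make
        // no measurable difference.
        static string DescribeInt(int code)
        {
            switch (code)
            {
                case 0: return "zero";
                case 1: return "one";
                case 2: return "two";
                default: return "other";
            }
        }

        static string DescribeByte(byte code)
        {
            switch (code)
            {
                case 0: return "zero";
                case 1: return "one";
                case 2: return "two";
                default: return "other";
            }
        }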

• Luc Pattyn (#17), replying to DaveyM69:

    DaveyM69 wrote:

    Are things always word aligned

    There is the notion of "natural alignment", which states that each item should be aligned to its size, so 2B shorts have even addresses, 4B ints have addresses that are multiples of 4, etc. (although items larger than the int size, long and double, don't need to be aligned; SIMD data does). A struct by default uses padding to achieve that when necessary, i.e. it inserts dummy bytes where required. To reduce the size bloat, the suggestion is to order members from largest to smallest. The linker and the run-time will allocate objects at a multiple of 8 or even 16B, so a struct that would only need 6B will effectively be laid out 8B apart. Warning: some Win32 APIs expect an array of structs with odd sizes, such as 6. If you don't want any padding, use Marshal attributes with explicit offsets. :)

    Luc Pattyn [Forum Guidelines] [My Articles]

    - before you ask a question here, search CodeProject, then Google
    - the quality and detail of your question reflects on the effectiveness of the help you are likely to get
    - use the code block button (PRE tags) to preserve formatting when showing multi-line code snippets

    modified on Sunday, June 12, 2011 8:36 AM
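
    A minimal sketch of the padding rules Luc describes (the struct names are invented for the example): the default layout pads to natural alignment, ordering members largest to smallest removes most of the padding, and Pack = 1 produces the kind of odd-sized struct some Win32 APIs expect:

        using System;
        using System.Runtime.InteropServices;

        // Default (sequential) layout: 1 + 3 pad + 4 + 2 + 2 tail pad = 12 bytes
        struct Padded
        {
            public byte B;
            public int I;
            public short S;
        }

        // Largest-to-smallest ordering removes most padding: 4 + 2 + 1 + 1 pad = 8 bytes
        struct Reordered
        {
            public int I;
            public short S;
            public byte B;
        }

        // Pack = 1 removes all padding, giving an odd size: exactly 7 bytes
        [StructLayout(LayoutKind.Sequential, Pack = 1)]
        struct Unpadded
        {
            public int I;
            public short S;
            public byte B;
        }

        class LayoutDemo
        {
            static void Main()
            {
                Console.WriteLine(Marshal.SizeOf(typeof(Padded)));    // 12
                Console.WriteLine(Marshal.SizeOf(typeof(Reordered))); // 8
                Console.WriteLine(Marshal.SizeOf(typeof(Unpadded)));  // 7
            }
        }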

• PIEBALDconsult (#18), replying to Mark Churchill:

    Hear hear!

• Lost User (#19), replying to Mark Churchill:

    Alas, it isn't (but it should!).

• Lost User wrote:

    Downcasting is often a 0-step operation (no operation actually; you just start using fewer bits of the register). It won't become faster though; even operations on 128 bits at a time have the same latency and throughput, on Core2 at least.

    Note: the following part is based on speculation. What code would the JIT compiler generate for a switch instruction in MSIL? With a bit of bad luck it will generate something like this (assume the switch variable is in eax):

        mov edx,SwitchTableBase ;or any other register that can be used as base
        jmp [edx+4*eax]

    Bad luck, because that would mean it needs an extra operation if it sees a downcast to a byte first: it can't just assume that the cast is a nop, because without some expensive analysis it can't know that the value won't exceed 255 anyway. So it would have to do:

        movzx eax,al
        mov edx,SwitchTableBase
        jmp [edx+4*eax]

    Or possibly:

        and eax,0xFF
        mov edx,SwitchTableBase
        jmp [edx+4*eax]

    Or something else? Who knows? How can you disassemble the code after it's JIT-compiled, anyway?

    However! (Speculation ends here.) A switch instruction (in MSIL) is only generated when the resulting table would be dense; when it isn't generated, a 'tree' of ifs is generated instead (or a chain of ifs as a special case, which .NET Reflector doesn't understand). It isn't really a tree though, it's a mess of ifs and labels (laid out linearly), but the data flow is as though it were a tree of ifs. It does the expected thing: split the range of values in two every time. Obviously this is an O(log n) algorithm, so beware: switch doesn't always perform in O(1).

    Edit: the whole point of this was to note that the size of the operand doesn't matter for speed, unless it's bigger than 64 bits, because these are comparisons, and a 64-bit comparison can be done as fast as any smaller comparison (32-bit if running in 32-bit mode). The 128-bit thing only works when working with SSE, which the .NET JIT compiler doesn't use, except for FISTTP, which is technically an SSE instruction but works with the regular floating-point stack.

    For switches on strings it's a whole different story: if a real switch is used, all possible values are put into a dictionary, every time again; the dictionary is not saved. Otherwise it generates a chain of ifs, using op_Equality (aka ==) between the value and every case. Both algorithms are O(n) in the number of cases, but the chain of ifs at least has an early out.
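
    A sketch illustrating the dense-versus-sparse distinction described above; the exact density threshold at which the compiler changes strategy is an implementation detail, so treat the comments as the expected outcome rather than a guarantee:

        // Dense labels 0..4: the C# compiler can emit a single MSIL 'switch'
        // opcode, i.e. an O(1) jump through a table.
        static int Dense(int x)
        {
            switch (x)
            {
                case 0: return 10;
                case 1: return 11;
                case 2: return 12;
                case 3: return 13;
                case 4: return 14;
                default: return -1;
            }
        }

        // Sparse labels: no sensible table exists, so the compiler falls back
        // to compare-and-branch logic, behaving like the O(log n) tree of ifs.
        static int Sparse(int x)
        {
            switch (x)
            {
                case 1: return 1;
                case 100: return 2;
                case 10000: return 3;
                case 1000000: return 4;
                default: return -1;
            }
        }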

    Lost User (#20):

    Aw, no comments? It took me quite a while to do the required research...
