Most efficient switch statement variable to use?

Bruce Coward (#1):

When I am using switch statements in C#, what is the most efficient type of switch variable to use? Normally I have fewer than 10 cases, which makes an Int32 seem rather overkill; but am I correct in assuming that, since the machine is running 32 bits, an int may actually be more efficient than trimming the switch variable down to 16 or even 8 bits, which may cause more code steps? Cheers, Bruce :cool:

PIEBALDconsult (#2), in reply to Bruce Coward:

Probably, test it and let us know.
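
A minimal sketch of such a test (the method names, case values, and iteration count here are invented for illustration; the JIT may inline or otherwise optimize this, so treat any numbers as rough):

    using System;
    using System.Diagnostics;

    class SwitchBench
    {
        // Identical 10-case switches, keyed on int and on byte.
        static int SwitchInt(int v)
        {
            switch (v)
            {
                case 0: return 10; case 1: return 11; case 2: return 12;
                case 3: return 13; case 4: return 14; case 5: return 15;
                case 6: return 16; case 7: return 17; case 8: return 18;
                case 9: return 19; default: return -1;
            }
        }

        static int SwitchByte(byte v)
        {
            switch (v)
            {
                case 0: return 10; case 1: return 11; case 2: return 12;
                case 3: return 13; case 4: return 14; case 5: return 15;
                case 6: return 16; case 7: return 17; case 8: return 18;
                case 9: return 19; default: return -1;
            }
        }

        static void Main()
        {
            const int N = 100000000;
            int sink = 0;

            Stopwatch sw = Stopwatch.StartNew();
            for (int i = 0; i < N; i++) sink += SwitchInt(i % 10);
            Console.WriteLine("int:  " + sw.ElapsedMilliseconds + " ms");

            sw = Stopwatch.StartNew();
            for (int i = 0; i < N; i++) sink += SwitchByte((byte)(i % 10));
            Console.WriteLine("byte: " + sw.ElapsedMilliseconds + " ms (sink=" + sink + ")");
        }
    }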

Luc Pattyn (#3), in reply to Bruce Coward:

Hi Bruce, in general integer operations are fastest at the native word size, meaning 8-bit or 16-bit operations are not faster than 32-bit operations on modern CPUs. This tells us byte and short mainly exist to support compatibility with existing data structures, files, etc., and of course to economize on memory when using large numbers of them, as in arrays. BTW: this may not be very easy to test, since (1) programming languages use the native size for literal values anyway, and (2) the compiler will often use int operations where byte or short were coded, when those int operations are equivalent to what you actually coded. :)
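
The language itself reflects this: C# defines arithmetic on byte and short operands to be performed as int, which is why the result must be cast back down explicitly.

    byte a = 1, b = 2;
    // byte c = a + b;       // compile error: a + b is an int
    byte c = (byte)(a + b);  // the addition itself is a 32-bit operation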


PIEBALDconsult (#4), in reply to Luc Pattyn:

Luc Pattyn wrote:

the compiler will often use int operations where byte or short were coded

That's what I would expect; a bunch of up-casting to int.

Ennis Ray Lynch Jr (#5), in reply to Bruce Coward:

I don't know about faster, but constants would improve maintainability. And while there is no speed penalty as such, it is poor practice to use strings as a switch variable.
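
A quick sketch of that suggestion (the enum and its members are invented for illustration): give the cases names instead of switching on magic numbers or strings.

    using System;

    static class Messages
    {
        // A byte-backed enum is plenty for fewer than 10 cases.
        public enum MessageKind : byte { Connect, Data, Disconnect }

        public static string Describe(MessageKind kind)
        {
            switch (kind)
            {
                case MessageKind.Connect:    return "client connected";
                case MessageKind.Data:       return "payload received";
                case MessageKind.Disconnect: return "client disconnected";
                default: throw new ArgumentOutOfRangeException("kind");
            }
        }
    }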


DaveyM69 (#6), in reply to Luc Pattyn:

Luc Pattyn wrote:

economize on memory

Interesting point - I've never delved that deep into the CPU architecture. Are things always word aligned or byte aligned, or is it variable? If it's word aligned then there would be no advantage at all to using anything less than the native size.


Deresen (#7), in reply to Luc Pattyn:

Totally correct! But in these times, with an average CPU power of 2 GHz, it is not really necessary to 'bitfuck'. The problem comes when you go programming on external devices, though today most PDAs and cell phones lack memory rather than CPU power. It gets tough when you program on microcontrollers, but then you shouldn't be using C#, but assembler!!


PIEBALDconsult (#8), in reply to DaveyM69:

Structs can be laid out exactly as required. Each element of an array of bytes comes immediately after the previous. Etc.

Lost User (#9), in reply to Bruce Coward:

Downcasting is often a 0-step operation (no operation actually; you just start using fewer bits of the register). It won't become faster though; even operations on 128 bits at a time have the same latency and throughput, on Core2 at least.

Note: the following part is based on speculation. What code would the JIT compiler generate for a switch instruction in MSIL? With a bit of bad luck it will generate something like this (assume the switch variable is in eax):

    mov edx,SwitchTableBase ; or any other register that can be used as base
    jmp [edx+4*eax]

Bad luck, because that would mean it needs an extra operation if it sees a downcast to a byte first; it can't just assume that the cast is a nop, since without some expensive analysis it can't know that the value won't exceed 255 anyway. So it would have to do:

    movzx eax,al
    mov edx,SwitchTableBase
    jmp [edx+4*eax]

Or possibly:

    and eax,0xFF
    mov edx,SwitchTableBase
    jmp [edx+4*eax]

Or something else? Who knows? How can you disassemble the code after it's JIT-compiled anyway? However! (Speculation ends here.)

A switch instruction (in MSIL) is only generated when the resulting table would be dense; when it isn't generated, a 'tree' of ifs is generated instead (or a chain of ifs as a special case, which .NET Reflector doesn't understand). It isn't really a tree though; it's a mess of ifs and labels laid out linearly, but the data flow is as though it were a tree of ifs. It does the expected thing: split the range of values in two every time. Obviously this is an O(log n) algorithm, so beware: switch doesn't always perform in O(1).

Edit: the whole point of this was to note that the size of the operand doesn't matter for speed, unless it's bigger than 64 bits (32 bits if running in 32-bit mode), because these are comparisons, and a 64-bit comparison can be done as fast as any smaller comparison. The 128-bit thing only works with SSE, which the .NET JIT compiler doesn't use, except for FISTTP, which is technically an SSE instruction but works with the regular floating-point stack.

For switches on strings it's a whole different story: if a real switch is used, all possible values are put into a dictionary, every time again; the dictionary is not saved. Otherwise it generates a chain of ifs, using op_Equality (aka ==) between the value and every case. Both algorithms are O(n) in the number of cases, but the chain of ifs at least has an early out.
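
To make the dense-versus-sparse point concrete, here is a sketch (case values invented). Compiling it and inspecting the output with ildasm shows which strategy the compiler picked, though the exact density threshold is a compiler internal:

    static class SwitchShapes
    {
        // Dense case values: the compiler can emit the MSIL 'switch' opcode,
        // which the JIT can turn into a jump table -- roughly O(1).
        public static int Dense(int v)
        {
            switch (v)
            {
                case 0: return 1;
                case 1: return 2;
                case 2: return 3;
                case 3: return 4;
                default: return 0;
            }
        }

        // Sparse case values: a jump table would be nearly all holes, so the
        // compiler falls back to compare-and-branch code instead -- the
        // 'tree of ifs' described above.
        public static int Sparse(int v)
        {
            switch (v)
            {
                case 5: return 1;
                case 1000: return 2;
                case 777777: return 3;
                default: return 0;
            }
        }
    }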


Lost User (#10), in reply to Deresen:

Some forms of "bitfucking" are becoming more important though: with the widening CPU-RAM speed gap, it becomes increasingly important not to touch more memory in inner loops than the size of the L2 cache (needing less is of course even better), and if "bitfucks" are needed to accomplish that, then so be it. And since the conditions for store-forwarding are very restrictive, extracting a smaller type from within a larger type at a non-aligned offset should always be done with a "bitfuck"; performance will suck if you write the large value to memory and read a smaller, unaligned part of it back. The reverse, inserting a small type into a large type, is even worse: it is never store-forwarded, so "bitfucking" is always needed there, unless the code is not in a performance-critical section.


Lost User (#11), in reply to DaveyM69:

The CPU (assuming x86) doesn't require much; SSE requires 16-byte alignment unless you don't care about an approximately 100% overhead. But the Windows (and Linux, too) ABI requires alignment of all things to at least their size, and of the stack to twice the pointer size (depending on bitness, obviously). The individual elements in an array of bytes can be byte aligned, but the starting address will usually be dword (or more) aligned. And Luc is correct of course, in general. Some instructions, such as div and idiv, have a latency and throughput that depend on the value of the result, and small types can lead to smaller values (thus faster computation); obviously that is not guaranteed, since it depends on the actual values. For floats, lower precision makes operations such as fdiv faster.


Deresen (#12), in reply to Lost User:

You are right, and I agree that you should bitfuck wherever you can. Maybe it's because I like it, maybe because I started with microcontrollers. But if we do the sums, we will see that it doesn't really matter which types you use. An average to small cache size is 1 GB; let's say your program will use half of it, which leaves you 500 MB to work with. If you're an efficient programmer you can never make your program use the whole 500 MB. Even if you store numbers that would fit in a byte in an Int128, it would take an array of (500/16 =~) 30 million of them to fill it. The only way to fill your RAM is when you're busy with graphics, and even then it's not necessary to bitfuck, because Microsoft made some good libraries which will take care of the memory problem. In conclusion, I think we can say that bitfucking is fun, but not really necessary if you're just writing efficient code.


Lost User (#13), in reply to Deresen:

Deresen wrote:

an average to small cache size is 1 GB

Where did you get that information? The biggest cache size I've seen so far is 12 MB (as 2x6 MB) of L2. That is not much, and every so often another program will come along and trash it (if you don't do it yourself).


Deresen (#14), in reply to Lost User:

My mistake, I was thinking of RAM. :omg:


Lost User (#15), in reply to Deresen:

Well then 1 GB is indeed small :) But RAM is slow (compared to the CPU); a cache miss can easily cost 100 cycles, long enough to justify doing complex calculations just to avoid the cache miss, and plenty of time for 150 to 250 instructions.

That makes me wonder what the theoretical maximum number of instructions in 100 cycles is (on Core2). 500, if looking only at the predecoder specs: macro-fusion can fuse 2 instructions but can only be done once per cycle, so 5 instructions (3 of which must be 1 µop), and the size of such a "block" should satisfy N*size = 16 (to never cross a boundary), and no 66H or 67H prefixes should occur anywhere.

But then looking at the rest: the sequence must not have any dependency chains, not all instructions are perfectly pipelined, there are only 3 "normal" ports (0, 1 and 5), and even register reading can be a bottleneck. Only 6 µops per cycle are allowed, and that includes memory reads/writes (bringing us down to 400, except for µop fusion). The predecoder throughput would be less important (only the first iteration) if we were executing a small loop (less than 4 times 16 bytes). And I'm not even going to mention the rest.

The best throughput of any one instruction is 3/cycle (a stream of NOPs, for example), so it should be possible to do (slightly) better than that, right? This is too complex; I'll leave it to the pros.


Mark Churchill (#16), in reply to Bruce Coward:

Minor optimisation tricks like that are something your compiler should be (and probably is) handling for you. Write clear and readable code.


Luc Pattyn (#17), in reply to DaveyM69:

DaveyM69 wrote:

Are things always word aligned

There is the notion of "natural alignment", which states that each item should be aligned to its size: 2-byte shorts get even addresses, 4-byte ints get addresses that are multiples of 4, etc. (items larger than the int size, such as long and double, don't need to be aligned, but SIMD data does). A struct by default uses padding to achieve that when necessary, i.e. it inserts dummy bytes when required. To reduce the size bloat, the suggestion is to order members from largest to smallest. The linker and the run-time will allocate objects at a multiple of 8 or even 16 bytes, so a struct that only needs 6 bytes will effectively be laid out 8 bytes apart. Warning: some Win32 APIs expect an array of structs with odd sizes, such as 6. If you don't want any padding, use Marshal attributes with explicit offsets. :)
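
A short sketch of that last point (the struct and field names are invented): with the default sequential layout this struct would marshal as 8 bytes, but Pack = 1 (or LayoutKind.Explicit with FieldOffset attributes) gives the exact 6-byte layout such APIs expect.

    using System;
    using System.Runtime.InteropServices;

    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    struct PackedRecord
    {
        public int Id;      // bytes 0..3
        public short Flags; // bytes 4..5 -- no trailing padding with Pack = 1
    }

    class LayoutDemo
    {
        static void Main()
        {
            // Prints 6; without Pack = 1 the marshaled size would be 8.
            Console.WriteLine(Marshal.SizeOf(typeof(PackedRecord)));
        }
    }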


PIEBALDconsult (#18), in reply to Mark Churchill:

Hear hear!


Lost User (#19), in reply to Mark Churchill:

Alas, it isn't (but it should!).


Lost User (#20), in reply to #9:

Aw, no comments? It took me quite a while to do the required research...
