Most efficient switch statement variable to use?

• DaveyM69 wrote:

    Luc Pattyn wrote:

    economize on memory

    Interesting point - I've never delved that deep into the CPU architecture. Are things always word aligned or byte aligned, or is it variable? If it's word aligned then there would be no advantage at all to using anything less than the native size.

    Dave
    BTW, in software, hope and pray is not a viable strategy. (Luc Pattyn)
    Visual Basic is not used by normal people so we're not covering it here. (Uncyclopedia)

    Lost User (#11):

    The CPU (assuming x86) doesn't require much; SSE requires 16-byte alignment unless you don't mind roughly 100% overhead. But the Windows (and Linux) ABI requires all items to be aligned to at least their own size, and the stack to twice the pointer size (which depends on bitness, obviously). The individual elements in an array of bytes can be byte aligned, but the starting address will usually be dword (or more) aligned. And Luc is correct, of course, in general. Some instructions, such as div and idiv, have a latency and throughput that depend on the value of the result, and small types can lead to smaller values (and thus faster computation); obviously that is not guaranteed, since it depends on the actual values. For floats, lower precision makes operations such as fdiv faster.
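
    To make that "aligned to at least their size" rule concrete, here is a minimal C# sketch (the Example struct and its field names are invented for illustration) that prints where each field of a default, sequential-layout struct lands:

        using System;
        using System.Runtime.InteropServices;

        // Hypothetical struct, purely for illustration; C# structs use
        // sequential layout by default.
        struct Example
        {
            public byte B;   // placed at offset 0
            public short S;  // placed at offset 2: aligned to its own size (2)
            public int I;    // placed at offset 4: aligned to its own size (4)
        }

        class AlignmentDemo
        {
            static void Main()
            {
                // Each field lands at the next multiple of its own size.
                Console.WriteLine(Marshal.OffsetOf(typeof(Example), "B")); // 0
                Console.WriteLine(Marshal.OffsetOf(typeof(Example), "S")); // 2
                Console.WriteLine(Marshal.OffsetOf(typeof(Example), "I")); // 4
            }
        }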

• Lost User wrote:

    Some forms of "bitfucking" are becoming more important though: with the widening CPU-RAM speed gap it becomes increasingly important not to address more memory in inner loops than the size of the L2 cache (needing less is, of course, even better), and if "bitfucks" are needed to accomplish that, then so be it. And since the conditions for store-forwarding are very restrictive, extracting a smaller type from within a larger type at a non-aligned offset should always be done with a "bitfuck"; performance will suck if you write the value to memory and read a smaller, unaligned part of it back. The reverse, inserting a small type into a large type, is even worse: it is never store-forwarded, so "bitfucking" is always needed unless the code is not in a performance-critical section.
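
    A minimal C# sketch of the kind of "bitfuck" being described (the method names are invented): extracting and inserting a byte within a uint using shifts and masks, so the value stays in a register instead of taking a store-to-load round trip through memory:

        static class BitOps
        {
            // Read bits 16..23 of 'value' with a shift; the cast to byte
            // masks off the rest. No write to memory, so no store-forwarding
            // stall.
            public static byte ExtractByte2(uint value)
            {
                return (byte)(value >> 16);
            }

            // Write 'b' into bits 16..23 of 'value', again entirely in
            // registers: clear the target byte, then OR in the new one.
            public static uint InsertByte2(uint value, byte b)
            {
                return (value & ~0x00FF0000u) | ((uint)b << 16);
            }
        }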

    Deresen (#12):

    You're right, and I agree that you should bitfuck wherever you can. Maybe it's because I like it, maybe because I started with microcontrollers. But if we do the maths, we'll see that it doesn't really matter which types you use. An average to small cache size is 1 Gb. Let's say your program will use half of it, which leaves you 500 M of bytes to work with. If you're an efficient programmer you can never make your program use the whole 500 M. Even if you store numbers that would fit in a byte in an Int128, it'll take an array of (500/16 =~) 30 million of them to fill it. The only way to fill your RAM is when you're busy with graphics. And even then it's not necessary to bitfuck, because Microsoft made some good libraries which will take care of the memory problem for you. In conclusion, I think we can say that bitfucking is fun, but not really necessary if you're just writing efficient code.

• Lost User (#13), replying to Deresen:

    Deresen wrote:

    average to small cache size is 1 Gb

    Where did you get that information? The biggest cache size I've seen so far is 12MB (as 2x6MB) of L2. That is not so much, and every so often another program will come along and trash it (if you don't do it yourself).

• Deresen (#14), replying to Lost User:

    My mistake, I was thinking of RAM. :omg:

• Lost User (#15), replying to Deresen:

    Well then, 1GB is indeed small :) But RAM is slow (compared to the CPU); a cache miss can easily cost 100 cycles - long enough to justify doing complex calculations just to avoid the cache miss, and plenty of time for 150 to 250 instructions.

    That makes me wonder what the theoretical maximum number of instructions in 100 cycles is (on Core2). 500, if looking only at the predecoder specs: macro fusion can fuse 2 instructions but can only happen once per cycle, so 5 instructions per cycle (3 of which must be 1 µop), and the size of such a "block" should satisfy N*size = 16 (to never cross a boundary), and no 66H or 67H prefixes should occur anywhere.

    But then looking at the rest: the sequence must not have any dependency chains, not all instructions are perfectly pipelined, there are only 3 "normal" ports (0, 1 and 5), and even register reading can be a bottleneck. Only 6 µops per cycle are allowed, but that includes memory reads/writes (bringing us down to 400, except for µop fusion). The predecoder throughput would be less important (only the first iteration) if we were executing a small (less than 4 times 16 bytes) loop. And I'm not even going to mention the rest.

    The best throughput of any one instruction is 3/cycle (a stream of NOPs, for example), so it should be possible to do (slightly) better than that, right? This is too complex, I'll leave it to the pros.
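
    A rough way to see that cache-miss penalty from C# (a sketch; the 64 MB working set is an assumption, chosen to be far larger than any Core2-era L2 cache): sum the same array once in sequential order and once in a pre-shuffled random order, and compare the times:

        using System;
        using System.Diagnostics;

        class CacheMissDemo
        {
            static void Main()
            {
                const int N = 1 << 24;          // 16M ints = 64 MB, far larger than L2
                int[] data = new int[N];
                int[] order = new int[N];
                Random rng = new Random(42);
                for (int i = 0; i < N; i++) order[i] = i;
                for (int i = N - 1; i > 0; i--) // Fisher-Yates shuffle of the visit order
                {
                    int j = rng.Next(i + 1);
                    int tmp = order[i]; order[i] = order[j]; order[j] = tmp;
                }

                long sum = 0;
                Stopwatch sw = Stopwatch.StartNew();
                for (int i = 0; i < N; i++) sum += data[i];        // mostly cache hits
                Console.WriteLine("sequential: " + sw.ElapsedMilliseconds + " ms");

                sw.Restart();
                for (int i = 0; i < N; i++) sum += data[order[i]]; // mostly cache misses
                Console.WriteLine("random:     " + sw.ElapsedMilliseconds + " ms (" + sum + ")");
            }
        }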

• Bruce Coward wrote:

    When I'm using switch statements in C#, what is the most efficient type of switch variable to use? Normally I have fewer than 10 cases, which makes an Int32 seem rather overkill, but am I correct in assuming that, as the machine is running 32 bits, it may be more efficient than trimming the switch variable down to 16 or even 8 bits, which may cause more code steps?

    Cheers, Bruce :cool:

    Mark Churchill (#16):

    Minor optimisation tricks like that are something that your compiler should be (and probably is) handling for you. Write clear and readable code.

    Mark Churchill, Director, Dunn & Churchill Pty Ltd
    Free Download: Diamond Binding: the simple, powerful, reliable, and effective data layer toolkit for Visual Studio.
    Entanglar: .Net game engine featuring automatic networking and powerful HLSL gfx binding.
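
    For reference, a minimal sketch of the two variants the question contrasts (DescribeInt and DescribeByte are invented names); the C# compiler loads small integral types onto the evaluation stack as 32-bit values anyway, so both methods produce essentially the same MSIL switch:

        // Identical cases; only the parameter type differs. The operand is a
        // 32-bit stack value in both cases, so the declared type should make
        // no measurable difference.
        static string DescribeInt(int code)
        {
            switch (code)
            {
                case 0: return "zero";
                case 1: return "one";
                case 2: return "two";
                default: return "other";
            }
        }

        static string DescribeByte(byte code)
        {
            switch (code)
            {
                case 0: return "zero";
                case 1: return "one";
                case 2: return "two";
                default: return "other";
            }
        }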

• Luc Pattyn (#17), replying to DaveyM69:

    DaveyM69 wrote:

    Are things always word aligned

    There is the notion of "natural alignment", which states that each item should be aligned to its size, so 2B shorts have even addresses, 4B ints have addresses that are multiples of 4, etc. (although items larger than the int size, long and double, don't need to be aligned; SIMD data does). A struct by default uses padding to achieve that when necessary, i.e. it inserts dummy bytes where required. To reduce the size bloat, the suggestion is to order members from largest to smallest. The linker and the run-time will allocate objects at a multiple of 8 or even 16B, so a struct that would only need 6B will effectively be laid out 8B apart. Warning: some Win32 APIs expect an array of structs with odd sizes, such as 6. If you don't want any padding, use Marshal attributes with explicit offsets. :)

    Luc Pattyn [Forum Guidelines] [My Articles]

    - before you ask a question here, search CodeProject, then Google
    - the quality and detail of your question reflects on the effectiveness of the help you are likely to get
    - use the code block button (PRE tags) to preserve formatting when showing multi-line code snippets

    modified on Sunday, June 12, 2011 8:36 AM
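
    A minimal sketch of the padding rules Luc describes (the struct names are invented for the example): the default layout pads to natural alignment, ordering members largest to smallest removes most of the padding, and Pack = 1 produces the kind of odd-sized struct some Win32 APIs expect:

        using System;
        using System.Runtime.InteropServices;

        // Default (sequential) layout: 1 + 3 pad + 4 + 2 + 2 tail pad = 12 bytes
        struct Padded
        {
            public byte B;
            public int I;
            public short S;
        }

        // Largest-to-smallest ordering removes most padding: 4 + 2 + 1 + 1 pad = 8 bytes
        struct Reordered
        {
            public int I;
            public short S;
            public byte B;
        }

        // Pack = 1 removes all padding, giving an odd size: exactly 7 bytes
        [StructLayout(LayoutKind.Sequential, Pack = 1)]
        struct Unpadded
        {
            public int I;
            public short S;
            public byte B;
        }

        class LayoutDemo
        {
            static void Main()
            {
                Console.WriteLine(Marshal.SizeOf(typeof(Padded)));    // 12
                Console.WriteLine(Marshal.SizeOf(typeof(Reordered))); // 8
                Console.WriteLine(Marshal.SizeOf(typeof(Unpadded)));  // 7
            }
        }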

• PIEBALDconsult (#18), replying to Mark Churchill:

    Hear hear!

• Lost User (#19), replying to Mark Churchill:

    Alas, it isn't (but it should!).

• Lost User wrote:

    Downcasting is often a 0-step operation (no operation actually; you just start using fewer bits of the register). It won't become faster though; even operations on 128 bits at a time have the same latency and throughput, on Core2 at least.

    Note: the following part is based on speculation. What code would the JIT compiler generate for a switch instruction in MSIL? With a bit of bad luck it will generate something like this (assume the switch variable is in eax):

        mov edx,SwitchTableBase ;or any other register that can be used as base
        jmp [edx+4*eax]

    Bad luck, because that would mean it needs an extra operation if it sees a downcast to a byte first: it can't just assume that the cast is a nop, because without some expensive analysis it can't know that the value won't exceed 255 anyway. So it would have to do:

        movzx eax,al
        mov edx,SwitchTableBase
        jmp [edx+4*eax]

    Or possibly:

        and eax,0xFF
        mov edx,SwitchTableBase
        jmp [edx+4*eax]

    Or something else? Who knows? How can you disassemble the code after it's JIT-compiled, anyway?

    However! (Speculation ends here.) A switch instruction (in MSIL) is only generated when the resulting table would be dense; when it isn't generated, a 'tree' of ifs is generated instead (or a chain of ifs as a special case, which .NET Reflector doesn't understand). It isn't really a tree though, it's a mess of ifs and labels (laid out linearly), but the data flow is as though it were a tree of ifs. It does the expected thing: split the range of values in two every time. Obviously this is an O(log n) algorithm, so beware: switch doesn't always perform in O(1).

    Edit: the whole point of this was to note that the size of the operand doesn't matter for speed, unless it's bigger than 64 bits, because these are comparisons, and a 64-bit comparison can be done as fast as any smaller comparison (32-bit if running in 32-bit mode). The 128-bit thing only works when working with SSE, which the .NET JIT compiler doesn't use, except for FISTTP, which is technically an SSE instruction but works with the regular floating-point stack.

    For switches on strings it's a whole different story: if a real switch is used, all possible values are put into a dictionary, every time again; the dictionary is not saved. Otherwise it generates a chain of ifs, using op_Equality (aka ==) between the value and every case. Both algorithms are O(n) in the number of cases, but the chain of ifs at least has an early out.
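
    A sketch illustrating the dense-versus-sparse distinction described above; the exact density threshold at which the compiler changes strategy is an implementation detail, so treat the comments as the expected outcome rather than a guarantee:

        // Dense labels 0..4: the C# compiler can emit a single MSIL 'switch'
        // opcode, i.e. an O(1) jump through a table.
        static int Dense(int x)
        {
            switch (x)
            {
                case 0: return 10;
                case 1: return 11;
                case 2: return 12;
                case 3: return 13;
                case 4: return 14;
                default: return -1;
            }
        }

        // Sparse labels: no sensible table exists, so the compiler falls back
        // to compare-and-branch logic, behaving like the O(log n) tree of ifs.
        static int Sparse(int x)
        {
            switch (x)
            {
                case 1: return 1;
                case 100: return 2;
                case 10000: return 3;
                case 1000000: return 4;
                default: return -1;
            }
        }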

    Lost User (#20):

    Aw, no comments? It took me quite a while to do the required research...
