MSIL CPUs?
-
peterchen wrote: 1) implementing a "top of stack" in hardware. On the plus side, all operations go to the "same location" (if you turn away from a pointer-based stack implementation). Further, the stack can be large, so we need a caching mechanism. A good first shot could be a special 1st-level cache that tracks the stack pointer itself and prefers operations on the top of stack.

Normally, the stack already ends up in the L1 and (mostly) the L2 cache anyway.

peterchen wrote: 3) call overhead. Two "intense" things: a) avoiding the branch overhead when a property.get just returns an internal variable, and b) optimization possibilities when parameters are known.

This overhead only exists because of a) cache misses (unavoidable) and b) prolog/epilog code. In the case of MSIL there is no prolog/epilog; all functions are naked, because there are no registers to save. This could virtually eliminate the need for code inlining.
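The stack-machine model under discussion can be sketched as a toy interpreter in a few lines of C++ (illustrative opcode names, not real MSIL): every arithmetic operation reads and writes the top of the stack, which is exactly the locality a dedicated top-of-stack cache would exploit.

```cpp
#include <cstdint>
#include <vector>

// Toy MSIL-style stack machine (illustrative, not real IL opcodes).
enum class Op { Push, Add, Mul };
struct Instr { Op op; int32_t arg; };

int32_t eval(const std::vector<Instr>& code) {
    std::vector<int32_t> stack;
    for (const Instr& in : code) {
        switch (in.op) {
        case Op::Push:
            stack.push_back(in.arg);
            break;
        case Op::Add: {  // both operands live at the top of the stack
            int32_t b = stack.back(); stack.pop_back();
            stack.back() += b;
            break;
        }
        case Op::Mul: {
            int32_t b = stack.back(); stack.pop_back();
            stack.back() *= b;
            break;
        }
        }
    }
    return stack.back();
}
```

Evaluating (2 + 3) * 4 is Push 2, Push 3, Add, Push 4, Mul - every step touches only the top few stack slots, so a hardware top-of-stack cache would serve nearly all operand traffic.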
// Quantum sort algorithm implementation
while (!sorted)
;
-
Daniel Turini wrote: Normally, the stack already ends up in the L1 and (mostly) the L2 cache anyway

Yep, but I think there's room for improvement - after all, the stack takes over the role of registers, and it should probably be separate from the L1 data cache. The main overhead for the simplest "get" property is still at least 4 instructions vs. 1 (and likely not single ticks): call - load to register - ret - load to target. This adds up with nicely encapsulated property getters. (And avoiding it would require a "more clever" architecture.) As you say, for "normal" functions the prolog/epilog is meaningless (and for a stack-based machine anyway). But if you have
int my_div(int x, int y) { return x/y; }
and call my_div(somevalue, 4)
repeatedly... Unless you move the optimizer into the CPU (ugh!), you will never get the benefit of turning this into a shr 2. When looking at my C++ code, I see many places where this kind of optimization is what makes the whole thing fast.

I'd omit cache misses (concerning the data cache) here, since both execution mechanisms suffer about the same from them (although they *do* put up a major speed barrier nowadays...).

So in closing, and IMO: "hardware-emulating" an MSIL CPU won't be fast, and "true" MSIL CPUs will probably be too late. But as you can see, I like the idea :rolleyes:
"Der Geist des Kriegers ist erwacht / Ich hab die Macht" StS
sighist | Agile Programming | doxygen -
peterchen wrote: But as you can see I like the idea Until MS releases generics into MSIL and you'll need a brand new CPU for running your C# STL :)
// Quantum sort algorithm implementation
while (!sorted)
; -