Code Project

MSIL CPUs?

The Lounge
Tags: csharp, question, discussion
23 Posts, 12 Posters
James Pullicino
#1

    If this whole .NET thing picks up, MS will be able to produce their own CPUs that natively support MSIL. Is this possible? Any thoughts?

NormDroid
#2

thought wrote: If this whole .NET thing picks up,

It has! :eek:

Daniel Turini
#3

thought wrote: If this whole .NET thing picks up, MS will be able to produce their own CPUs that natively support MSIL. Is this possible? Any thoughts?

It's a common misconception that MSIL is interpreted, just like Java bytecodes. MSIL is never interpreted. OK, to be fair, the Mono people did write an interpreter so they could run code while they didn't have a JIT, but the MS .NET Framework doesn't have an MSIL interpreter.

Compilers targeting MSIL don't have many chances for optimization, so MSIL code is unoptimized code (from the point of view of the CPU's assembly), targeting a very simple, high-level, stack-based machine. In other words, there are no registers, and there are lots and lots of stack operations. Such a CPU would suck and have very slow performance, because it would have to go to main memory and wait, wait, wait (*)...

Another point that defeats an MSIL CPU is that today the JIT has to build a graph of your code and optimize it; this is the slowest part. Sometimes it may even need to rewrite code to take advantage of inlining. Then comes the native code generation part, which is very quick; it's almost purely an encoding step. On most JITs and compilers, this part accounts for only about 5%~10% of the processing cost.

(*) Just as an example, my machine is a high-end one with a 3 GHz P4 and an 800 MHz FSB. This means that, in the best case, the CPU gets 4 clock cycles for each memory clock cycle - and that's assuming only the memory is using the FSB and no other activity is going on the bus. On a 3 GHz machine with a 533 MHz or 400 MHz FSB, which is much more common these days, the proportion is much worse. And it's becoming worse: newer processors are expected to have 4 GHz~6 GHz clocks, but memories are not keeping up with these speeds.

        // Quantum sort algorithm implementation
        while (!sorted)
        ;
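Daniel's description of MSIL as a register-free stack machine can be illustrated with a toy interpreter. This is only a sketch in Python: the opcode names loosely mimic MSIL's ldloc/add/stloc, and it is not how the CLR actually executes anything - but it shows why even a trivial assignment is all stack traffic.

```python
# Toy stack-machine interpreter: even c = a + b touches the
# evaluation stack four times. Opcode names loosely mimic MSIL
# (ldloc/add/stloc); purely illustrative, not real CLR behavior.

def run(program, locals_):
    stack = []
    for op, *arg in program:
        if op == "ldloc":      # push a local variable
            stack.append(locals_[arg[0]])
        elif op == "ldc":      # push a constant
            stack.append(arg[0])
        elif op == "add":      # pop two operands, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "stloc":    # pop the result into a local variable
            locals_[arg[0]] = stack.pop()
    return locals_

# c = a + b, expressed entirely as stack operations
prog = [("ldloc", "a"), ("ldloc", "b"), ("add",), ("stloc", "c")]
env = run(prog, {"a": 2, "b": 3, "c": 0})
print(env["c"])  # 5
```

Every one of those pushes and pops would hit memory on a naive hardware implementation, which is exactly the "wait, wait, wait" problem above.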

Joey Bloggs
#4

Daniel Turini wrote: [...] And it's becoming worse. Newer processors are expected to have clocks 4Ghz~6Ghz, but memories are not keeping up with these speeds.

Ignoring on-chip L1/L2 cache, prefetch, etc., I remember someone proposing a Java bytecode chip to run JVMs; I don't think it ever got off the ground. Partly for technical reasons like the above, but of much more concern was locking the design into silicon, which would make it very hard to upgrade or bug-fix the CLR/JVM. Then, of course, there would be the issue of bugs in the silicon feeding back into the CLR/JVM.

jhwurmbach
#5

Normski wrote: It has!

Huh? :wtf: Only if you mean by that that every single VB programmer is learning VB.NET at the moment. Otherwise, there are few applications in .NET (even for web gimmicks). Until now, .NET seems to be mainly wishful thinking.


            Who is 'General Failure'? And why is he reading my harddisk?!?

Rob Manderson
#6

And 20 years ago there were FORTH CPUs. Apparently they executed FORTH code very well indeed. While I don't doubt that FORTH still exists, I very much doubt it's a mainstream language.

There's a reason why CPUs execute instructions at level X while we programmers (and those who pay us) want to write at level Y. Silicon does what silicon does, and fairly efficiently; we programmers do what we do; the gap is bridged by the compiler. Whichever way you cut it, an addition and store breaks down into: fetch two operands, add them, store the result. Might as well express that operation as discrete instructions and keep the generality (and the language independence).

Rob Manderson http://www.mindprobes.net
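Rob's "fetch two operands, add them, store the result" decomposition can be sketched as a tiny three-address machine. This is a Python sketch with an invented instruction set and register names, purely to show the discrete steps he describes:

```python
# Sketch: however the source expresses "c = a + b", the work at the
# hardware level is the same discrete steps. Instruction and register
# names (LOAD/ADD/STORE, r1/r2) are invented for illustration.

def compile_add(dst, src1, src2):
    return [
        ("LOAD",  "r1", src1),        # fetch operand 1 into a register
        ("LOAD",  "r2", src2),        # fetch operand 2 into a register
        ("ADD",   "r1", "r1", "r2"),  # add them
        ("STORE", dst,  "r1"),        # store the result back to memory
    ]

def execute(code, mem):
    regs = {}
    for ins in code:
        if ins[0] == "LOAD":
            regs[ins[1]] = mem[ins[2]]
        elif ins[0] == "ADD":
            regs[ins[1]] = regs[ins[2]] + regs[ins[3]]
        elif ins[0] == "STORE":
            mem[ins[1]] = regs[ins[2]]
    return mem

mem = {"a": 2, "b": 3}
execute(compile_add("c", "a", "b"), mem)
print(mem["c"])  # 5
```

The intermediate values live in registers, which is precisely what a pure stack ISA gives up.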

Erick Sgarbi
#7

I have already worked with Java-enabled CPUs. As I remember, I was sent to Sun Microsystems to do some benchmarking of a piece of software on a SPARC with, I think it's called, HotSpot. It was "as" fast as C++ on a normal SPARC (oh god!!! :| ). But again, where are the PC Java-enabled CPUs now? Or the SPARCs? Not in my garage, I tell ya... ;P Cheers, Erick

ColinDavies
#8

Daniel Turini wrote: And it's becoming worse. Newer processors are expected to have clocks 4Ghz~6Ghz, but memories are not keeping up with these speeds.

Yup, the memory gap. This has been a real worry for several years now, from what I have read. The way things are moving, the 4-CPU-cycles-per-memory-cycle gap won't take long to become 16x, then 64x, etc. I might try to do the math on it, but once the gap forms it can only increase. CPU makers will then be forced to put larger and larger caches on board. Yikes, it might even mean the end of RAM as we know it, because it will become as useful as floppies are now. IRAM or VIRAM was meant to be the saviour, but I have never seen a board for sale, so I guess we will be screwed.

Questions: What difference will hyperthreading make to this? Will an HT CPU need faster buses and RAM, or will it actually relax the need?

Regardz, Colin J Davies

                  *** WARNING *
                  This could be addictive
                  **The minion's version of "Catch :bob: "

                  It's a real shame that people as stupid as you can work out how to use a computer. said by Christian Graus in the Soapbox
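Colin's "do the math" can be done on the numbers already in the thread: the best-case CPU cycles per bus cycle is just the clock ratio. A back-of-envelope Python sketch which, like the posts above, ignores caches, prefetch, and DDR signalling:

```python
# Back-of-envelope "memory gap" arithmetic from the thread's numbers:
# CPU clock / FSB clock = best-case CPU cycles per memory bus cycle.

def cycles_per_mem_access(cpu_mhz, fsb_mhz):
    return cpu_mhz / fsb_mhz

# Daniel's 3 GHz P4 examples: 800, 533, and 400 MHz FSBs
for cpu, fsb in [(3000, 800), (3000, 533), (3000, 400)]:
    ratio = cycles_per_mem_access(cpu, fsb)
    print(f"{cpu} MHz CPU / {fsb} MHz FSB -> {ratio:.2f} cycles per bus cycle")
```

The 800 MHz case gives 3.75, which is Daniel's "4 clock cycles for each memory clock cycle" rounded up; the slower buses push it toward 7.5.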

ColinDavies
#9

COT 16-bit Forth CPUs were great, and 32-bit Forth CPUs appeared for a while. Because of the flexibility of Forth, I bet a 64-bit CPU could easily be made as well. Regardz, Colin J Davies


peterchen
#10

These are serious limitations, but if you look at the evolution of x86 CPUs, they have turned into a kind of "hardwired interpreter for x86 assembly": breaking complex instructions up into micro-ops, funneling them into various pipes, etc. I think the biggest hurdles are:

1) Implementing a "top of stack" in hardware. On the plus side, all operations go to the "same location" (if you move away from a pointer-based stack implementation). Further, the stack can be large, so we need a caching mechanism. A good first shot could be a special first-level cache that tracks the stack pointer itself and allows/prefers operations on the top of stack.

2) Parallelizing stack-engine code. Stack-engine code is tricky to parallelize (i.e., to push through different execution pipes). However, a "code decoder" (like we very often have already) could bundle an "operate on TOS and pop result" operation and go on with the next pair (as long as they don't depend on each other).

3) Call overhead. Two "intense" things: a) avoiding the branch overhead when a property.get just returns an internal var, and b) optimization possibilities when parameters are known.

4) Experience. There's probably virtually no experience in implementing stack-based engines in hardware, and not much experience in writing an optimizer for a stack-based execution environment.

I think it's technically feasible, but economically it would take a long time - our experience with register-based parallelized execution engines is simply too far ahead.


                      "Der Geist des Kriegers ist erwacht / Ich hab die Macht" StS
                      sighist | Agile Programming | doxygen
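peterchen's "code decoder" idea in point 2 can be sketched as a peephole pass that fuses each push/push/add/pop group into one register-style operation, so that independent groups could in principle go down separate pipes. This is a Python sketch with invented opcode names; real hardware decoders are far more involved:

```python
# Sketch of a decoder that fuses stack-op groups into register-style
# operations. Opcode names are invented; purely illustrative.

def fuse(stack_ops):
    fused, i = [], 0
    while i < len(stack_ops):
        window = stack_ops[i:i + 4]
        if [op[0] for op in window] == ["push", "push", "add", "pop"]:
            # (push x, push y, add, pop z)  ->  ADD z, x, y
            fused.append(("ADD", window[3][1], window[0][1], window[1][1]))
            i += 4
        else:
            fused.append(stack_ops[i])   # leave anything else untouched
            i += 1
    return fused

ops = [("push", "a"), ("push", "b"), ("add",), ("pop", "c"),
       ("push", "x"), ("push", "y"), ("add",), ("pop", "z")]
print(fuse(ops))
# [('ADD', 'c', 'a', 'b'), ('ADD', 'z', 'x', 'y')]
```

The two fused operations have no data dependence on each other, which is exactly the property a superscalar pipeline needs.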

Daniel Turini
#11

Colin Davies wrote: What difference will hyperthreading mean to this? Will a HT CPU need faster busses and RAM or will it actually relax the need?

It can actually make things worse, because the CPU will be trying to access sparse RAM areas, and this can kill the L1 and L2 caches. Actually, I have an HT processor and it seems a bit faster with HT enabled, but I've seen some specific benchmarks showing that HT can actually make some software slower.

Colin Davies wrote: Yikes, it might even mean the end of RAM as we know it, because it will become as useful as Floppies are now.

Or maybe RAM will be the HD of the future. Today, if we want to save a trip to the HD (because it's much slower than RAM), we build a cache in RAM. Maybe in the future we'll start saving trips to RAM (actually, game programmers are doing this today).


Daniel Turini
#12

Joey Bloggs wrote: ignoring onchip L1, L2 cache prefetch etc etc

Still, registers are much faster even than L1 access.


ColinDavies
#13

Daniel Turini wrote: but I've seen some specific benchmarks showing that HT can actually make some software slower.

Yes, I have heard that it can create bottlenecks due to a lot of register swapping. It's also interesting to note the name Hyper-Threading, when it only allows two threads to execute at the same time. Maybe it should be called Few-Threading. :-) I'm sure Intel intends to have a dynamic number of threads operating in the future; who knows what the limit will be. But surely the old heat problem will only worsen, as you are still pushing more electrons through a small space. One analogy I thought of is that it will be like a CPU having its own CPU and process system. (Unsure how real that is.)

While RAM speed increases are dying off, RAM is still keeping up on the size front, and HD space is still getting ridiculous. Like you say about RAM becoming the HD, first I think we will see RAM and HDs working more in co-operation, maybe a better interpretation of how the hibernate state works. I recently tried creating a 2GB RAM disk formatted as FAT16 that was imaged on and off the HD. It worked - fast, to say the least - and the implementation I used was not the least bit clever. Where HD speed is a critical issue, I think it could be an easy solution.

With all the performance monitors we have on our boxes today, I would still like one more: one that could tell you which hardware should be upgraded, and then return an analysis in $$$ terms. E.g.: the best place you could spend $500 on this box is by purchasing xyz.

Regardz, Colin J Davies


Mike Dimmick
#14

There's also ARM's Jazelle technology, but I haven't worked with that yet. It adds Java bytecode support as an additional instruction set, in addition to the ARM (32-bit instruction word) and Thumb (16-bit instruction word) sets. Apparently it's a required part of the ARM Architecture 6 instruction set, so expect it to appear in future ARM Architecture processors.

Just to clarify something that bugs me: ARM Architecture processors aren't all made by ARM. Intel ships two ARM Architecture families: the StrongARM SA-11xx series (Architecture 4) and the PXA series, also known as XScale (Architecture 5 with Thumb, DSP, and proprietary Intel DSP extensions). To complicate things further, most third-party processors actually use an ARM-designed and licensed core with the third party's own memory controller and other ancillary components. Intel actually designs their own cores (OK, StrongARM was actually designed by DEC before they were bought out by Compaq).

NormDroid
#15

jhwurmbach wrote: Otherwise, there are few applications in .NET (even for web-gimmics) Until now, .NET seems to be mainly whishful thinking.

Nope. We're a small software house, and most of our apps are developed in C++ (MFC/WTL), but some of the newer versions of the data-centric apps are being developed in .NET. Look at www.jobserve.com and tell me customers are not asking for .NET skills. Much as I hate to admit it, .NET has taken off; if you resist, you'll be left by the wayside. I would suggest opening your eyes and taking a look at the real picture.

Daniel Turini
#16

peterchen wrote: 1) implementign a "top of stack" in hardware. [...] A "good first shot" could be implementing a special 1st level cache for this, that tracks the stack pointer itself, and allows/prefers operations on top of stack.

Normally, the stack already ends up in the L1 and (mostly) L2 cache anyway.

peterchen wrote: 3) call overhead. Two "intense" things: a) avoid the branch overhead when a property.get that just returns an internal var, and b) optimization possibilities when parameters are known.

This overhead only exists because of a) cache misses (unavoidable) and b) prolog/epilog code. In the case of MSIL there's no prolog/epilog; all functions are naked, because there are no registers to save. This could virtually eliminate the need for code inlining.

                                  // Quantum sort algorithm implementation
                                  while (!sorted)
                                  ;
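                                  Daniel's "naked functions" point can be made concrete. A trivial accessor like the C function below (the `point` struct and function name are invented for illustration) corresponds to a C# property getter that compiles to just three MSIL instructions, with no frame setup or register saves for a JIT to undo when it inlines the call:

```c
/* The MSIL equivalent of a C# property getter that returns a field is:
       ldarg.0
       ldfld int32 Point::x
       ret
   -- no prolog, no epilog, because the evaluation-stack model has no
   callee-saved registers. The C version below is a stand-in. */
struct point { int x, y; };

int point_get_x(const struct point *p) { return p->x; }
```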

                                  P 1 Reply Last reply
                                  0
                                  • N NormDroid

                                    jhwurmbach wrote: Otherwise, there are few applications in .NET (even for web gimmicks). Until now, .NET seems to be mainly wishful thinking. Nope. We're a small software house, and most of our apps are developed in C++ (MFC/WTL). Some of the newer versions of the data-centric apps are being developed in .NET. Look at www.jobserve.com and tell me customers are not asking for .NET skills. Much as I hate to admit it, .NET has taken off; if you resist, you'll be left by the wayside. I would suggest opening your eyes and taking a look at the real picture.

                                    J Offline
                                    J Offline
                                    jhwurmbach
                                    wrote on last edited by
                                    #17

                                    Normski wrote: www.jobserve.com and tell me customers are not asking for .NET skills I have no idea about the job market in Australia (or the UK), but I think that there (like here in Germany), the technology in which an application will be implemented is of secondary importance. There are a lot of managers writing .NET into the profile *just in case* one would need it. Microsoft Marketing works! Normski wrote: take a look at the real picture I am doing that. I try not to let the Microsoft Marketing blind me. Java is dead (for applications): speed lacking, no cross-platform but simply a new platform, API growing exponentially. It will be there for years to come, but it is not the great bright future. .NET has (in my eyes) not been tested yet. For real applications, at least. I can make no statement about mobile tech gimmicks, web applets and so on. A few real applications in .NET may eventually show whether .NET is worth all the ballyhoo. While I like C# more than Java (it is something like a Java V2.0), I doubt that the .NET core will fare better than Java speed-wise, and for cross-platform development, .NET is simply not usable. There is no concept of cross-platform at Microsoft (not needed from their point of view); even different versions of Windows seem to cause problems. So it is my belief that most of the noise we hear right now is generated by the Microsoft Marketing fireworks. This seems to be enough to make the newbies jump on the train (sure: when starting to learn, you try to learn the newest stuff to be ahead of the crowd).


                                    Who is 'General Failure'? And why is he reading my harddisk?!?

                                    1 Reply Last reply
                                    0
                                    • C ColinDavies

                                      Daniel Turini wrote: but I've seen some specific benchmarks showing that HT can actually make some software slower. Yes, I have heard that it can create bottlenecks due to a lot of register swapping. It's also interesting to note the name Hyper-Threading, when it only allows two threads to be executing at the same time. Maybe it should be called "few threading". :-) I'm sure Intel intends to have a dynamic number of threads operating in the future; who knows what the limit will be. But surely the old heat problem will only worsen, as you are still pushing more electrons through a small space. One analogy I thought of is that it will be like a CPU having its own CPU and process system. (Unsure how real that is.) While RAM speed increases are dying, RAM is still keeping up on size. And HD space is still getting ridiculous. Like you say about RAM becoming the HD, first I think we will see RAM and HDs working more in co-operation, maybe a better interpretation of how the hibernate state works. I recently tried creating a 2GB RAM disk formatted as FAT16 that was imaged on and off the HD. It worked fast, to say the least, and the implementation I used was not the least bit clever. Where HD speed is a critical issue, I think it could be an easy solution. With all the performance monitors we have on our boxes today, I would still like another one that could tell you which hardware should be upgraded and then return an analysis in $$$ terms. E.g. "The best place you could spend $500 on this box is by purchasing xyz." Regardz Colin J Davies

                                      *** WARNING *
                                      This could be addictive
                                      **The minion's version of "Catch :bob: "

                                      It's a real shame that people as stupid as you can work out how to use a computer. said by Christian Graus in the Soapbox

                                      J Offline
                                      J Offline
                                      John M Drescher
                                      wrote on last edited by
                                      #18

                                      Colin Davies wrote: I'm sure Intel intends to have a dynamic number of threads operating in the future, who knows what the limit will be. I think it would be 3, because there are two pipes for integer instructions. HT tries to make use of the wait states in the pipelines by allowing the execution of the other process in that space. With a long pipeline this makes sense, because there are a lot of gaps in the execution due to dependencies between the pipelines and branching. They say that on average every 5 instructions there is a branch. Well, a P4 has a pipeline of over 20 stages. A branch mispredict has to flush the pipeline, and that is a good place to use hyper-threading. John
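                                      John's numbers - a branch roughly every 5 instructions and a 20-plus-stage pipeline - make the cost easy to estimate. A back-of-envelope sketch in C, where the 95% branch predictor hit rate is an assumed figure (not from the post):

```c
/* Effective CPI = base + (branch frequency * mispredict rate * flush penalty).
   A branch every 5 instructions and a ~20-stage flush are from the post;
   the 5% mispredict rate is an assumption for illustration. */
double effective_cpi(double base_cpi, double branch_freq,
                     double mispredict_rate, int penalty_cycles)
{
    return base_cpi + branch_freq * mispredict_rate * penalty_cycles;
}
/* effective_cpi(1.0, 1.0/5.0, 0.05, 20) gives 1.2 -- roughly one cycle
   in six is a pipeline bubble that a second hardware thread could fill. */
```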

                                      1 Reply Last reply
                                      0
                                      • J James Pullicino

                                        If this whole .NET thing picks up, MS will be able to produce their own CPUs that natively support MSIL. Is this possible? Any thoughts?

                                        J Offline
                                        J Offline
                                        John M Drescher
                                        wrote on last edited by
                                        #19

                                        I thought about the idea before, but it would have to be produced by AMD or Intel to be competitive. If MS produced a .NET CPU that was only 50% as fast as what Intel and AMD are producing, then who would buy it? In the x86 market there are other players, such as VIA and Transmeta. The thing is that the processors they produce are only as fast as P2 or P3 chips, and this is the best they can do. I guess MS, with its financial situation, could invest billions into research on this, but it is a big gamble... John

                                        1 Reply Last reply
                                        0
                                        • C ColinDavies

                                          Daniel Turini wrote: but I've seen some specific benchmarks showing that HT can actually make some software slower. Yes, I have heard that it can create bottlenecks due to a lot of register swapping. It's also interesting to note the name Hyper-Threading, when it only allows two threads to be executing at the same time. Maybe it should be called "few threading". :-) I'm sure Intel intends to have a dynamic number of threads operating in the future; who knows what the limit will be. But surely the old heat problem will only worsen, as you are still pushing more electrons through a small space. One analogy I thought of is that it will be like a CPU having its own CPU and process system. (Unsure how real that is.) While RAM speed increases are dying, RAM is still keeping up on size. And HD space is still getting ridiculous. Like you say about RAM becoming the HD, first I think we will see RAM and HDs working more in co-operation, maybe a better interpretation of how the hibernate state works. I recently tried creating a 2GB RAM disk formatted as FAT16 that was imaged on and off the HD. It worked fast, to say the least, and the implementation I used was not the least bit clever. Where HD speed is a critical issue, I think it could be an easy solution. With all the performance monitors we have on our boxes today, I would still like another one that could tell you which hardware should be upgraded and then return an analysis in $$$ terms. E.g. "The best place you could spend $500 on this box is by purchasing xyz." Regardz Colin J Davies

                                          *** WARNING *
                                          This could be addictive
                                          **The minion's version of "Catch :bob: "

                                          It's a real shame that people as stupid as you can work out how to use a computer. said by Christian Graus in the Soapbox

                                          R Offline
                                          R Offline
                                          Roger Wright
                                          wrote on last edited by
                                          #20

                                          Colin Davies wrote: I think we will see RAM and HD's working more in co-operation It would be interesting to see the RAM incorporated into the HD as a single memory resource that is totally transparent to the CPU, with generic blocks of storage that can be simply plugged in when needed. A very swift RISC processor could handle the VM operations and respond to memory requests at RAM-like speeds, prefetching disk data into its own local cache using algorithms that predict future needs. Current disk interfaces would badly limit the performance, but optical connections might make performance acceptable.

                                          "Welcome to Arizona!
                                          Drive Nice - We're Armed..."
                                          - Proposed Sign at CA/AZ Border
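                                          Roger's "prefetching disk data using algorithms that predict future needs" can be illustrated with the simplest such algorithm, sequential read-ahead: on a miss, fetch the requested block and its successor together, so a streaming read only stalls on every other block. A toy C sketch - all names, sizes, and the two-slot cache are hypothetical:

```c
#include <string.h>

#define NBLOCKS 16   /* hypothetical "disk" size, in blocks */
#define BLKSZ    4   /* hypothetical block size, in bytes */

int misses = 0;      /* requests that had to wait for the "disk" */

struct cache {
    int  tag[2];            /* which block each slot holds (-1 = empty) */
    char data[2][BLKSZ];
};

/* Serve a hit from the cache; on a miss, fetch the requested block AND
   prefetch its successor (sequential read-ahead). */
void cache_read(struct cache *c, const char backing[NBLOCKS][BLKSZ],
                int n, char *out)
{
    for (int i = 0; i < 2; i++)
        if (c->tag[i] == n) { memcpy(out, c->data[i], BLKSZ); return; }

    misses++;                                  /* synchronous disk wait */
    c->tag[0] = n;
    memcpy(c->data[0], backing[n], BLKSZ);
    if (n + 1 < NBLOCKS) {                     /* read ahead one block */
        c->tag[1] = n + 1;
        memcpy(c->data[1], backing[n + 1], BLKSZ);
    }
    memcpy(out, c->data[0], BLKSZ);
}
```

A streaming read of eight blocks stalls only four times; the other four requests are served from the prefetched slot. A real controller would use smarter predictors, but the structure is the same.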

                                          1 Reply Last reply
                                          0