Does anyone know of a good guide to the MSIL JIT compiler?
-
I'm aware of that. I am generating MSIL instructions using Reflection Emit as part of my project. The other part generates source code. I would like to ensure that this source code generates IL that will then be optimized appropriately by the JITter. If not, I will generate the source code differently, but my interest is in the post-JIT native code, not the IL.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
"
Quote:
Running the code through a debugger and dropping to assembly. The only way I can do that reliably is with debug info, which may change how the JITter drops native instructions. I can't rely on it.
Probably, the answer is here: Do PDB Files Affect Performance? Generally, the answer is: No. Debugging information is just an additional file, which helps the debugger match the native instructions to the source code. Of course, if implemented correctly. The article is written by John Robbins.
-
"
Quote:
Running the code through a debugger and dropping to assembly. The only way I can do that reliably is with debug info, which may change how the JITter drops native instructions. I can't rely on it.
Probably, the answer is here: Do PDB Files Affect Performance? Generally, the answer is: No. Debugging information is just an additional file, which helps the debugger match the native instructions to the source code. Of course, if implemented correctly. The article is written by John Robbins.
I think that's about unmanaged code, and not the JITter
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
Quote:
I think that's about unmanaged code, and not the JITter
Well, buzzwords like .NET, VB .NET, C#, JIT compiler, ILDASM are used in this article only by accident. You are right.
-
Quote:
Well, buzzwords like .NET, VB .NET, C#, JIT compiler, ILDASM are used in this article only by accident. You are right.
I am tired and I read the first bit of it. Sorry. It's 3am here and I shouldn't be awake.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
Basically I am not sure about a number of things regarding how it works. For example:

if((this.current >= 'A' && this.current <= 'Z') ||
   (this.current >= 'a' && this.current <= 'z')) {
    // do something
}

In MSIL you'd have to pepper the IL you drop for that `if` construct with a bunch of extra `Ldarg_0` instructions to retrieve the `this` reference for *each* comparison. On x86 CPUs (and, well, most any CPU with registers, which IL doesn't really have unless you stretch the terminology to include its list of function arguments and locals) you'd load the `this` pointer into a register and work off that, rather than repeatedly loading it onto the stack every time you need to access it as you would in IL. On pretty much any supporting architecture this is much faster than hitting the stack. Maybe an order of magnitude. So my question is, for example: is the JIT compiler smart enough to resolve those repeated `Ldarg_0`s into register accesses?
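To make that concrete, here is roughly the kind of emit sequence I mean, as a Reflection.Emit sketch (the `Lexer` class, the `current` field, and the method name are made-up placeholders for illustration, not my actual generated code):

```csharp
// Rough sketch only: Lexer/current/IsAsciiLetter are illustrative names.
using System;
using System.Reflection;
using System.Reflection.Emit;

public class Lexer
{
    public char current;
}

public static class Demo
{
    public static void Main()
    {
        FieldInfo current = typeof(Lexer).GetField(nameof(Lexer.current));
        var method = new DynamicMethod("IsAsciiLetter", typeof(bool), new[] { typeof(Lexer) });
        ILGenerator il = method.GetILGenerator();
        Label tryLower = il.DefineLabel();
        Label isMatch = il.DefineLabel();
        Label noMatch = il.DefineLabel();
        Label done = il.DefineLabel();

        // (this.current >= 'A' && this.current <= 'Z')
        il.Emit(OpCodes.Ldarg_0);              // load 'this' (argument 0)
        il.Emit(OpCodes.Ldfld, current);
        il.Emit(OpCodes.Ldc_I4, (int)'A');
        il.Emit(OpCodes.Blt_S, tryLower);      // current < 'A' -> try the lowercase range
        il.Emit(OpCodes.Ldarg_0);              // load 'this' again...
        il.Emit(OpCodes.Ldfld, current);
        il.Emit(OpCodes.Ldc_I4, (int)'Z');
        il.Emit(OpCodes.Ble_S, isMatch);       // 'A' <= current <= 'Z' -> match

        // (this.current >= 'a' && this.current <= 'z')
        il.MarkLabel(tryLower);
        il.Emit(OpCodes.Ldarg_0);              // ...and again...
        il.Emit(OpCodes.Ldfld, current);
        il.Emit(OpCodes.Ldc_I4, (int)'a');
        il.Emit(OpCodes.Blt_S, noMatch);
        il.Emit(OpCodes.Ldarg_0);              // ...and a fourth time
        il.Emit(OpCodes.Ldfld, current);
        il.Emit(OpCodes.Ldc_I4, (int)'z');
        il.Emit(OpCodes.Bgt_S, noMatch);

        il.MarkLabel(isMatch);
        il.Emit(OpCodes.Ldc_I4_1);
        il.Emit(OpCodes.Br_S, done);
        il.MarkLabel(noMatch);
        il.Emit(OpCodes.Ldc_I4_0);
        il.MarkLabel(done);
        il.Emit(OpCodes.Ret);

        var isLetter = (Func<Lexer, bool>)method.CreateDelegate(typeof(Func<Lexer, bool>));
        Console.WriteLine(isLetter(new Lexer { current = 'Q' }));  // True
        Console.WriteLine(isLetter(new Lexer { current = '3' }));  // False
    }
}
```

Four separate `Ldarg_0`/`Ldfld` pairs for a single condition; the question is whether the JITter folds those into one register load.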
That's just one thing I want to know. Some avenues of research I considered to figure this out:

1. Running the code through a debugger and dropping to assembly. The only way I can do that reliably is with debug info, which may change how the JITter drops native instructions. I can't rely on it.
2. Using ngen and then disassembling the result, but again, that's not JITted but precompiled, so things like whole-program optimization are in play. I can't rely on it.

And I can't find any material that will help me figure that out short of the very dry and difficult specs they release, which I'm not even sure tell me that, since the JIT compiler's actual implementation details aren't part of the standard. What I'm hoping for is something some clever Microsoft employee or blogger wrote that describes the behavior of Microsoft's JITter in some detail. There are some real world implications for some C# code that my library generates. I need to make some decisions about it and I feel like I don't have all the information I need.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
Quote:
Even if it did, I wouldn't assume that it always would, or that it would do so on all systems. I would code explicitly and not use behaviour that isn't part of the doco.
Well, I didn't ask you what you would do. And this isn't bizdev
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
Probably not, since at best it uses Emit facilities and has nothing to do with the final JITter output
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
Quote:
Probably not, since at best it uses Emit facilities and has nothing to do with the final JITter output
The JIT (as well as the rest of the runtime) is also [open source](https://github.com/dotnet/runtime/tree/main/src/coreclr/jit) - there's an `optimizer.cpp` in that directory, which might be of interest. Also in that directory is a file (`viewing-jit-dumps.md`) which talks about looking at disassembly, and also mentions [a Visual Studio plugin, Disasmo](https://marketplace.visualstudio.com/items?itemName=EgorBogatov.Disasmo), that simplifies this process. [Edit]Another option - use Godbolt - [it supports C#](https://godbolt.org/z/vnqvGdqfe)![/Edit]
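For a quick local look, something along these lines should work (a sketch assuming .NET 7 or later, where `DOTNET_JitDisasm` is honoured by the regular release runtime; check `viewing-jit-dumps.md` for the exact knobs, and note that the `Lexer`/`IsAsciiLetter` names are only for illustration):

```csharp
// Run with, e.g. (Windows; use export on Linux/macOS):
//   set DOTNET_JitDisasm=IsAsciiLetter
//   set DOTNET_TieredCompilation=0     (skip the unoptimized tier-0 code)
//   dotnet run -c Release
// The JIT prints the generated x64/arm64 assembly for the named method to the console.
using System;
using System.Runtime.CompilerServices;

class Lexer
{
    public char current;

    [MethodImpl(MethodImplOptions.NoInlining)]   // keep the method from being inlined away
    public bool IsAsciiLetter()
    {
        return (current >= 'A' && current <= 'Z') ||
               (current >= 'a' && current <= 'z');
    }
}

class Program
{
    static void Main()
    {
        var lexer = new Lexer { current = 'Q' };
        Console.WriteLine(lexer.IsAsciiLetter());   // force the method to be jitted
    }
}
```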
Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p
-
Quote:
Basically I am not sure about a number of things regarding how it works [...]
Why not download ILSpy and have a nosey at the produced IL code? Just compile your application in release mode and take a look at the produced IL to see whether it's been optimised. I would hazard a guess that it probably doesn't optimise something like that, but I could be wrong!
-
Quote:
Why not download ILSpy and have a nosey at the produced IL code? Just compile your application in release mode and take a look at the produced IL to see whether it's been optimised. I would hazard a guess that it probably doesn't optimise something like that, but I could be wrong!
Because I'm not interested in the IL code, but in the post-JIT native code.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
Quote:
The JIT (as well as the rest of the runtime) is also [open source](https://github.com/dotnet/runtime/tree/main/src/coreclr/jit) - there's an `optimizer.cpp` in that directory, which might be of interest. [...]
Oh wow. I learned two new things from your post. Thanks! Will check that out.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
Quote:
Basically I am not sure about a number of things regarding how it works [...] There are some real world implications for some C# code that my library generates. I need to make some decisions about it and I feel like I don't have all the information I need.
"real world implications for some C# code that my library generates" I don't think this is a good line of research for that reason. They will change the JIT in the future. I wouldn't be surprised if there are minor version updates that change how it works. So how are you going to validate that optimizations that you put into place for one single version will continue to be valid for every version in the future and in the past?
-
"real world implications for some C# code that my library generates" I don't think this is a good line of research for that reason. They will change the JIT in the future. I wouldn't be surprised if there are minor version updates that change how it works. So how are you going to validate that optimizations that you put into place for one single version will continue to be valid for every version in the future and in the past?
If it's such a significant difference in the generated code then yes. Especially because in the case I outlined (turns out it does register access after all though) it would require relatively minor adjustments to my generated code to avoid that potential performance pitfall, and do so without significantly impacting readability. I don't like to wait around and hope that Microsoft will one day do the right thing. I've worked there.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
Quote:
If it's such a significant difference in the generated code then yes. Especially because in the case I outlined (turns out it does register access after all though) it would require relatively minor adjustments to my generated code to avoid that potential performance pitfall, and do so without significantly impacting readability. I don't like to wait around and hope that Microsoft will one day do the right thing. I've worked there.
I don't know of any developer using the gcc compiler suite who studies one or more of the code generators (quite a bunch are available) to learn how it works, in order to modify their source code to make one specific code generator produce some specific binary code. Not even "minor code adjustments". The code generator part(s) of gcc is a close parallel to the dotNet jitter. The IL is analogous to the API between the gcc source code parsers (and overall optimizer) and the gcc code generator. When you switch to a newer version of a gcc compiler, you do not adapt your C, C++, Fortran, or whatever, code to make one specific code generator create the very best code. Well, maybe you would do it, but I never met or heard of anyone else who would even consider adapting HLL source code to one specific gcc code generator.

...With one possible exception: Way back in time, when you would go to NetNews (aka Usenet) for discussions, there was one developer who very intensely claimed that the C compiler for the DEC VAX was completely useless! There was this one machine instruction that he wanted the compiler to generate for his C code, and he had found no way to force the compiler to do that. So the compiler was complete garbage! The discussion involved some very experienced VAX programmers, who could certify that this machine instruction would not speed up execution or reduce the code size at all. It would have had no advantages whatsoever. Yet the insistent developer continued insisting that when he wants that instruction, it is the compiler's d**n responsibility to provide a way to generate it. I guess that this fellow would go along with you in modifying the source code to fit one specific code generator.

This happened in an age when offline digital storage was limited to (expensive) floppies, and URLs were not yet invented. I found the arguing from this fellow so funny that I preserved it in a printout, where I can also find the specific instruction in question (I have forgotten which one), but that printout is buried deep down in one of my historical IT scrapbooks in the basement. I am not digging that up tonight.
Religious freedom is the freedom to say that two plus two make five.
-
Quote:
I don't know of any developer using the gcc compiler suite who studies one or more of the code generators (quite a bunch are available) to learn how it works, in order to modify their source code to make one specific code generator produce some specific binary code. [...]
You're comparing something that involves a total rewrite with a change that makes Advance() take an additional parameter, which it uses instead of current. So really, you're blowing this out of proportion.
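Roughly the kind of adjustment I mean, as a hypothetical sketch (the `Tokenizer` class and the method bodies are stand-ins for illustration, not the real generated code):

```csharp
// Hypothetical sketch of the generator tweak, not the actual generated code.
class Tokenizer
{
    char current;

    // Before: every comparison goes back through this.current,
    // which is a separate ldarg.0 + ldfld per test in the IL.
    bool Advance()
    {
        if ((this.current >= 'A' && this.current <= 'Z') ||
            (this.current >= 'a' && this.current <= 'z'))
        {
            // handle a letter
            return true;
        }
        return false;
    }

    // After: the caller passes the character in as an additional parameter, so the
    // comparisons work off a parameter the JIT can trivially keep in a register.
    bool Advance(char current)
    {
        if ((current >= 'A' && current <= 'Z') ||
            (current >= 'a' && current <= 'z'))
        {
            // handle a letter
            return true;
        }
        return false;
    }
}
```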
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
Quote:
Well, I didn't ask you what you would do. And this isn't bizdev
If you ask a question about some super-fine peephole optimization, an answer that says "Trying to do anything like that is a waste of your time" is an appropriate answer.

Years ago, I could spend days timing and fine-tuning code, testing out various inline assembly variants. Gradually, I came to realize that the compiler would beat me almost every time. Instruction sequences that "looked like" they were inefficient actually ran faster when I timed them. Since those days, CPUs have gotten even bigger caches, more lookahead, hyperthreading and what have you, all confusing tight timing loops to the degree of making them useless. Writing (or generating) assembler code to suppress single instructions was meaningful in the days of true RISCs (including pre-1975 architectures, when all machines were RISCs...) running at one instruction per cycle with (almost) no exceptions. Today, we are in a different world.

I really should have spent the time to hand-code the example you bring up in assembler, with and without the repeated register load, and time the two for you. But I have a very strong gut feeling of what it would show. I am so certain that I will not spend the time to do that for you.
Religious freedom is the freedom to say that two plus two make five.
-
Quote:
If you ask a question about some super-fine peephole optimization, an answer that says "Trying to do anything like that is a waste of your time" is an appropriate answer. [...]
I guess I just don't see looking at a new (to me) code generation tech to check that it's doing what I expect in terms of performance as a waste of time. To be fair, I also look at the native output of my C++ code, and I'm glad I have, even (if not especially) the times when it ruined my day, like when I realized how craptastic the ESP32 floating point coprocessor was.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
Quote:
I guess I just don't see looking at a new (to me) code generation tech to check that it's doing what I expect in terms of performance as a waste of time. [...]
If you are working on a jitter for one specific CPU, or a gcc code generator for one specific CPU, and your task is to improve the code generation, then you would study common methods for code generation and peephole optimization. If you are not developing or improving a code generator (whether gcc or a jitter), the only reason to study the architecture of one specific one is curiosity. Not to modify your source code, not even with "minor adjustments".

It can be both educational and interesting to study what resides a couple of layers below the layer you are working on. But you should remember that it is a couple of layers down. You are not working at that layer, and should not try to interfere with it.

(I should mention that I grew up in an OSI protocol world. Not the one where all you know is that some people have something they call 'layers', but one where layers were separated by solid hulls, and service and protocol were as separate as oil and water. An application entity should never fiddle with TCP protocol elements or IP routing; it shouldn't even know that they are there! 30+ years of OO programming, interface definitions, private and protected elements -- and still, developers have not learned to keep their fingers out of lower layers, neither in protocol stacks nor in general programming!)
Religious freedom is the freedom to say that two plus two make five.
-
Quote:
You're comparing something that involves a total rewrite with a change that makes Advance() take an additional parameter, which it uses instead of current. So really, you're blowing this out of proportion.
What I am saying is: leave peephole optimizing to the compiler/code generator, and trust it to do that. We have been doing that kind of optimization since the 1960s (starting in the late 1950s!). It is routine work; any reasonably well trained code generator developer will handle it with their left hand. If you think you can improve on it, you are most likely wrong. And even if you manage to dig up some special case, "percent" is likely to be much too large a unit for the reduction in the execution time of some user level operation.

Spend your optimizing efforts on considering algorithms, and not least: data structures. These are way more essential to user perceived execution speed than register allocation. Do timing at the user level, not at the instruction level.
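For what it's worth, a minimal sketch of what user-level timing could look like here, using BenchmarkDotNet (my own suggestion of a tool, not something mentioned above; the names and code shapes are made up for illustration):

```csharp
// Requires the BenchmarkDotNet NuGet package; run with: dotnet run -c Release
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class AdvanceBenchmarks
{
    private readonly string _input = new string('Q', 50_000) + new string('3', 50_000);
    private char _current;

    [Benchmark(Baseline = true)]
    public int ViaField()
    {
        int letters = 0;
        for (int i = 0; i < _input.Length; i++)
        {
            _current = _input[i];
            // comparisons go through the field, roughly mimicking the original code shape
            if ((_current >= 'A' && _current <= 'Z') ||
                (_current >= 'a' && _current <= 'z'))
                letters++;
        }
        return letters;
    }

    [Benchmark]
    public int ViaParameter()
    {
        int letters = 0;
        for (int i = 0; i < _input.Length; i++)
            letters += IsLetter(_input[i]) ? 1 : 0;   // comparisons work off a parameter
        return letters;
    }

    private static bool IsLetter(char current) =>
        (current >= 'A' && current <= 'Z') ||
        (current >= 'a' && current <= 'z');
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<AdvanceBenchmarks>();
}
```

If the two variants come out within noise of each other at this level, the register-allocation question answers itself.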
Religious freedom is the freedom to say that two plus two make five.