Does anyone know of a good guide to the MSIL JIT compiler?
-
Basically I am not sure about a number of things regarding how it works. Take this, for example:
if((this.current >= 'A' && this.current <= 'Z') ||
(this.current >= 'a' && this.current <= 'z')) {
// do something
}
In MSIL you'd have to pepper the IL you drop for that `if` construct with a bunch of extra `Ldarg_0` instructions to retrieve the `this` reference for *each* comparison. On x86 CPUs (and, well, most any CPU with registers, which IL doesn't really have unless you stretch the terminology to include its list of function arguments and locals) you'd load the `this` pointer into a register and work off that rather than repeatedly loading it onto the stack every time you need to access it, as you would in IL. On pretty much any supporting architecture this is much faster than hitting the stack. Maybe an order of magnitude.
So my question is, for example: is the JIT compiler smart enough to resolve those repeated `Ldarg_0`s into register access? That's just one thing I want to know.
Some avenues of research I considered to figure this out:
1. Running the code through a debugger and dropping to assembly. The only way I can do that reliably is with debug info, which may change how the JITter drops native instructions. I can't rely on it.
2. Using ngen and then disassembling the result, but again, that's not JITted, but rather precompiled, so things like whole program optimization are in play. I can't rely on it.
And I can't find any material that will help me figure that out short of the very dry and difficult specs they release, which I'm not even sure tell me that, since the JIT compiler's actual implementation details aren't part of the standard. What I'm hoping for is something some clever Microsoft employee or blogger wrote that describes the behavior of Microsoft's JITter in some detail. There are some real world implications for some C# code that my library generates. I need to make some decisions about it and I feel like I don't have all the information I need.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
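To make the repeated loads concrete, here is a minimal sketch that emits the upper-case half of that test through Reflection Emit. The names `Holder`, `LetterCheckEmitter` and `BuildIsUpper` are hypothetical, and the instruction sequence only approximates the shape a C# compiler would produce; the point is that `Ldarg_0` followed by `Ldfld` appears once per comparison.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical stand-in for the class that owns the "current" field.
public sealed class Holder
{
    public int current;
}

public static class LetterCheckEmitter
{
    // Builds a delegate equivalent to: h => h.current >= 'A' && h.current <= 'Z'
    public static Func<Holder, bool> BuildIsUpper()
    {
        var field = typeof(Holder).GetField("current");
        var method = new DynamicMethod("IsUpper", typeof(bool), new[] { typeof(Holder) });
        var il = method.GetILGenerator();
        var fail = il.DefineLabel();

        // First comparison: load the object reference, then the field.
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldfld, field);
        il.Emit(OpCodes.Ldc_I4, 65);   // 'A'
        il.Emit(OpCodes.Blt, fail);

        // Second comparison: the reference and the field are loaded again.
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldfld, field);
        il.Emit(OpCodes.Ldc_I4, 90);   // 'Z'
        il.Emit(OpCodes.Bgt, fail);

        il.Emit(OpCodes.Ldc_I4_1);     // both tests passed
        il.Emit(OpCodes.Ret);

        il.MarkLabel(fail);
        il.Emit(OpCodes.Ldc_I4_0);
        il.Emit(OpCodes.Ret);

        return (Func<Holder, bool>)method.CreateDelegate(typeof(Func<Holder, bool>));
    }
}
```

Whether those two loads survive into the native code is exactly the question above; the IL itself says nothing about it.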
Could this be a possible workaround to avoid those extra JIT compiler arguments?
var cur = this.current;
if( cur >= 'A' && cur <= 'Z' || cur >= 'a' && cur <= 'z') {
// do something
}
-
Not in the instance I'm using it in without a rework. I'd have to change the structure of the code, which is made more complicated by the fact that it's a CodeDOM tree instead of real code. Before I do that, I want to make sure I'm not (A) doing something for nothing, and more importantly (B) introducing clutter or extra overhead in an attempt to optimize. I've included a chunk of the state machine runner code which should illustrate the issue I hope.
int p;
int l;
int c;
ch = -1;
this.capture.Clear();
if ((this.current == -2)) {
this.Advance();
}
p = this.position;
l = this.line;
c = this.column;
// q0:
// [\t-\n\r ]
if (((((this.current >= 9)
&& (this.current <= 10))
|| (this.current == 13))
|| (this.current == 32))) {
this.Advance();
goto q1;
}
// [A-Z_hj-kmqxz]
if ((((((((((this.current >= 65)
&& (this.current <= 90))
|| (this.current == 95))
|| (this.current == 104))
|| ((this.current >= 106)
&& (this.current <= 107)))
|| (this.current == 109))
|| (this.current == 113))
|| (this.current == 120))
|| (this.current == 122))) {
this.Advance();
goto q2;
}
// [a]
if ((this.current == 97)) {
this.Advance();
goto q3;
}
// [b]
if ((this.current == 98)) {
this.Advance();
goto q22;
}
// ...snip...
q1:
// [\t-\n\r ]
if (((((this.current >= 9)
&& (this.current <= 10))
|| (this.current == 13))
|| (this.current == 32))) {
this.Advance();
goto q1;
}
return FAMatch.Create(2, this.capture.ToString(), p, l, c);
q2:
// [0-9A-Z_a-z]
if ((((((this.current >= 48)
&& (this.current <= 57))
|| ((this.current >= 65)
&& (this.current <= 90)))
|| (this.current == 95))
|| ((this.current >= 97)
&& (this.current <= 122)))) {
this.Advance();
goto q2;
}
return FAMatch.Create(0, this.capture.ToString(), p, l, c);
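For comparison, here is a rough sketch of what the local-variable variant suggested earlier might look like for a q2-style state. `IdentifierRunner`, `MatchWord` and this simplified `Advance` are hypothetical stand-ins, not the generated runner; the thing to notice is that the local has to be refreshed after every `Advance()` call, which is the structural change the generator would need to make.

```csharp
using System;
using System.Text;

// Hypothetical, heavily simplified stand-in for the generated runner.
public sealed class IdentifierRunner
{
    private readonly string input;
    private int index;      // index of the next unread character
    private int current;    // current character, or -1 at end of input
    private readonly StringBuilder capture = new StringBuilder();

    public IdentifierRunner(string input)
    {
        this.input = input;
        this.current = input.Length > 0 ? input[0] : -1;
        this.index = input.Length > 0 ? 1 : 0;
    }

    // Appends the current character to the capture and moves to the next one.
    private void Advance()
    {
        this.capture.Append((char)this.current);
        this.current = this.index < this.input.Length ? this.input[this.index++] : -1;
    }

    // Matches [0-9A-Z_a-z]* using a local copy of the field.
    public string MatchWord()
    {
        this.capture.Clear();
        int cur = this.current;
        while ((cur >= 48 && cur <= 57) ||
               (cur >= 65 && cur <= 90) ||
               cur == 95 ||
               (cur >= 97 && cur <= 122))
        {
            this.Advance();
            cur = this.current; // the field changed, so the local must be re-read
        }
        return this.capture.ToString();
    }
}
```

Whether this buys anything over the field-based version depends on what the JITter already does with the repeated field reads.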
-
Don't expect to see any optimizations in the MSIL code, even in the Release configuration. They are done by the JIT compiler, and may be more effective, since the exact CPU type is known at runtime. You may try to look at the optimized native assembly code, but this is a difficult task, since there is a huge distance from the C# source and the MSIL to the machine language instructions.
-
I'm aware of that. I am generating MSIL instructions using Reflection Emit as part of my project. The other part generates source code. I would like to ensure that this source code generates IL that will then be optimized appropriately by the JITter. If not, I will generate the source code differently, but my interest is in the post-jitted code. Not the IL.
-
If I select code generation for 'Any CPU', the main compiler will generate an IL assembly, which is processed by the JITter when the assembly is run for the first time. At the moment, I am running a 32-bit CLR, and that jitter generates exactly the same binary code as the 'x86' CPU option. I'd be very surprised if they were different. I'd be very surprised if there were two different x86 code generators. The linkers do completely different jobs, but not the code generators. I do not understand where MS could do some magic that is not visible in the generated code.
Religious freedom is the freedom to say that two plus two make five.
That seems to be assuming more than I am usually comfortable with when it comes to MS. I've worked at Microsoft and with Microsoft code enough to expect the unexpected deep in the bowels of their frameworks. You should have seen me wrestle with some of the less oft-used typelib generation functions in oleaut32.dll. I was working there at the time, and nobody could answer me about what the heck they were doing. If they made the JITter produce different code for debug builds than release, it would be totally on brand for them, is what I'm saying, no matter if it's not intuitive. You can't put anything past these people. You really can't. And I know those are names, but Debug generates debug symbols and such. Does the jitter, for example, do something different if a PDB is present? Or some other magic signaled by the linker dropping some flag in the binary's metadata? Probably not. "Probably" is doing a lot of heavy lifting there.
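On the "flag in the binary's metadata" point: one thing that can be checked directly is the assembly-level `DebuggableAttribute`. As far as I know, the C# compiler stamps Debug builds with it in a way that asks the JIT to disable optimizations, independent of whether a PDB file is sitting next to the binary. The sketch below just reads that attribute; `JitFlags` and `IsJitOptimizerDisabled` are hypothetical names, and it only reports what the metadata requests, not what any given runtime actually does with it.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

public static class JitFlags
{
    // Returns true when the assembly's metadata asks the JIT not to optimize.
    public static bool IsJitOptimizerDisabled(Assembly assembly)
    {
        var attribute = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attribute != null && attribute.IsJITOptimizerDisabled;
    }
}
```

Calling `JitFlags.IsJitOptimizerDisabled(typeof(SomeGeneratedType).Assembly)` for a type in the assembly under test (the type name is a placeholder) tells you which flavor you are about to disassemble.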
-
Quote:
Running the code through a debugger and dropping to assembly. The only way I can do that reliably is with debug info, which may change how the JITter drops native instructions. I can't rely on it.
Probably, the answer is here: "Do PDB Files Affect Performance?" Generally, the answer is: No. Debugging information is just an additional file, which helps the debugger to match the native instructions and source code. Of course, if implemented correctly. The article is written by John Robbins.
-
I think that's about unmanaged code, and not the JITter
-
Well, buzzwords like .NET, VB .NET, C#, JIT compiler, ILDASM are used in this article only by accident. You are right.
-
I am tired and I read the first bit of it. Sorry. It's 3am here and I shouldn't be awake.
-
Even if it did, I wouldn't assume that it always would and would do so on all systems. I would code explicitly and not use behaviour that isn't part of the doco.
Well, I didn't ask you what you would do. And this isn't bizdev
-
Probably not, since at best it uses Emit facilities and has nothing to do with the final JITter output
-
The JIT (as well as the rest of the runtime) is also [open source](https://github.com/dotnet/runtime/tree/main/src/coreclr/jit) - there's an `optimizer.cpp` in that directory, which might be of interest. Also in that directory is a file (`viewing-jit-dumps.md`) which talks about looking at disassembly, and also mentions [a Visual Studio plugin, Disasmo](https://marketplace.visualstudio.com/items?itemName=EgorBogatov.Disasmo), that simplifies this process. [Edit]Another option - use Godbolt - [it supports C#](https://godbolt.org/z/vnqvGdqfe)![/Edit]
Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p
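A small harness along those lines might look like the sketch below. It assumes .NET 7 or later, where I believe the `DOTNET_JitDisasm` environment variable described in `viewing-jit-dumps.md` is honored by release runtimes; `Scanner` and `IsAsciiLetter` are hypothetical names. `NoInlining` keeps the method as its own unit in the listing, and `AggressiveOptimization` asks the runtime to skip the unoptimized first tier.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical test subject mirroring the comparison from the original post.
public sealed class Scanner
{
    private int current;

    public Scanner(int current)
    {
        this.current = current;
    }

    // Run a Release build with the environment variable
    // DOTNET_JitDisasm=IsAsciiLetter set to print this method's native code.
    [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.AggressiveOptimization)]
    public bool IsAsciiLetter()
    {
        return (this.current >= 'A' && this.current <= 'Z') ||
               (this.current >= 'a' && this.current <= 'z');
    }
}
```

In the resulting listing, the thing to look for is whether `this.current` is read from memory once into a register or re-read for each comparison.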
-
Why not download ILSpy and nosey at the produced IL code? Just compile your application in release mode and take a look at the produced IL to see whether it's been optimised. I would hazard a guess that it probably doesn't optimise something like that, but I could be wrong!
-
Because I'm not interested in the IL code, but in the post jitted native code.
-
Oh wow. I learned two new things from your post. Thanks! Will check that out.
-
"real world implications for some C# code that my library generates" I don't think this is a good line of research for that reason. They will change the JIT in the future. I wouldn't be surprised if there are minor version updates that change how it works. So how are you going to validate that optimizations that you put into place for one single version will continue to be valid for every version in the future and in the past?
-
If it's such a significant difference in the generated code then yes. Especially because in the case I outlined (turns out it does register access after all though) it would require relatively minor adjustments to my generated code to avoid that potential performance pitfall, and do so without significantly impacting readability. I don't like to wait around and hope that Microsoft will one day do the right thing. I've worked there.
-
I don't know of any developer using the gcc compiler suite who studies one or more of the code generators (quite a bunch are available) to learn how it works, in order to modify their source code to make one specific code generator produce some specific binary code. Not even "minor code adjustments". The code generator part(s) of gcc is a close parallel to the dotNet jitter. The IL is analogous to the API between the gcc source code parsers (and overall optimizer) and the gcc code generator. When you switch to a newer version of a gcc compiler, you do not adapt your C, C++, Fortran, or whatever, code to make one specific code generator create the very best code. Well, maybe you would do it, but I never met or heard of anyone else who would even consider adapting HLL source code to one specific gcc code generator.
...With one possible exception: Way back in time, when you would go to NetNews (aka. Usenet) for discussions, there was one developer who very intensely claimed that the C compiler for DEC VAX was completely useless! There was this one machine instruction that he wanted the compiler to generate for his C code, but he had found no way to force the compiler to do that. So the compiler was complete garbage! The discussion involved some very experienced VAX programmers, who could certify that this machine instruction would not at all speed up execution, or reduce the code size. It would have no advantages whatsoever to use that instruction. Yet the insistent developer continued insisting that when he wants that instruction, it is the compiler's d**n responsibility to provide a way to generate it. I guess that this fellow would go along with you in modifying the source code to fit one specific code generator.
This happened in an age when offline digital storage was limited to (expensive) floppies, and URLs were not yet invented. I found the arguing from this fellow so funny that I preserved it in a printout, where I can also find the specific instruction in question (I have forgotten which one), but that printout is buried deep down in one of my historical IT scrapbooks in the basement. I am not digging that up tonight.