Code Project: The Lounge

Does anyone know of a good guide to the MSIL JIT compiler?

Tags: design, debugging, tutorial, question, csharp
54 Posts 11 Posters 4 Views 1 Watching
  • T trønderen

    If you ask a question about some super-fine peephole optimization, an answer that says "trying to do anything like that is a waste of your time" is an appropriate answer. Years ago, I could spend days timing and fine-tuning code, testing out various inline assembly variants. Gradually, I came to realize that the compiler would beat me almost every time. Instruction sequences that looked inefficient actually ran faster when I timed them. Since those days, CPUs have gained even bigger caches, more lookahead, hyperthreading and what have you, all confusing tight timing loops to the degree of making them useless. Writing (or generating) assembler code to eliminate single instructions was meaningful in the days of true RISCs (including pre-1975 architectures, when all machines were RISCs...) running at 1 instruction/cycle with (almost) no exceptions. Today, we are in a different world. I really should have spent the time to hand-code the example you bring up in assembler, with and without the repeated register load, and time both variants for you. But I have a very strong gut feeling about what it would show. I am so certain that I will not spend the time to do that for you.

    Religious freedom is the freedom to say that two plus two make five.

    honey the codewitch
    #33

    I guess I just don't see examining a new (to me) code-generation technology, to check whether it is doing what I expect in terms of performance, as a waste of time. To be fair, I also look at the native output of my C++ code. I'm glad I have, even (if not especially) for the times when it ruined my day, like when I realized how craptastic the ESP32 floating-point coprocessor was.

    Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

    • H honey the codewitch

      trønderen
      #34

      If you are working on a jitter for one specific CPU, or a gcc code generator for one specific CPU, and your task is to improve the code generation, then you would study common methods for code generation and peephole optimization. If you are not developing or improving a code generator (whether gcc or a jitter), the only reason for studying the architecture of one specific generator is curiosity. Not for modifying your source code, not even with "minor adjustments". It can be both educational and interesting to study what resides a couple of layers below the layer you are working on. But you should remember that it is a couple of layers down. You are not working at that layer, and should not try to interfere with it. (I should mention that I grew up in an OSI protocol world. Not the one where all you know is that some people have something they call 'layers', but one where layers were separated by solid hulls, and service/protocol were as separated as oil and water. An application entity should never fiddle with TCP protocol elements or IP routing; it shouldn't even know that they are there! 30+ years of OO programming, interface definitions, private and protected elements -- and still, developers have not learned to keep their fingers out of lower layers, neither in protocol stacks nor in general programming!)

      • H honey the codewitch

        You're comparing something that involves a total rewrite with a change that makes Advance() take an additional parameter, which it uses instead of current. So really, you're blowing this out of proportion.

        trønderen
        #35

        What I am saying is: leave peephole optimization to the compiler/code generator, and trust it at that. We have been doing that kind of optimization since the 1960s (starting in the late 1950s!). It is routine work. Any reasonably well trained code generator developer will handle it with his left hand. If you think you can improve on it, you are most likely wrong. And even if you manage to dig up some special case, "percent" is likely to be a much too large unit for the resulting reduction in the execution time of a user-level operation. Spend your optimizing efforts on algorithms and, not least, data structures. These are far more essential to user-perceived execution speed than register allocation. Do timing at user level, not at instruction level.
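The "do timing at user level" advice above can be sketched as follows. This is a minimal illustration, not code from the thread: `LexDocument` and its input are made-up stand-ins, and for serious measurement a harness such as BenchmarkDotNet handles warmup and statistics far better than a bare `Stopwatch`.

```csharp
// Measure a whole user-visible operation end to end, rather than timing
// individual instructions. LexDocument is a hypothetical stand-in for the
// operation being tuned.
using System;
using System.Diagnostics;

class UserLevelTiming
{
    // Stand-in for the real operation under test: count non-space chars.
    static int LexDocument(string text)
    {
        int tokens = 0;
        foreach (char c in text)
            if (!char.IsWhiteSpace(c)) tokens++;
        return tokens;
    }

    static void Main()
    {
        string doc = new string('x', 1_000_000);

        // Warm up so the jitter has compiled the method before timing starts.
        LexDocument(doc);

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 100; i++)
            LexDocument(doc);
        sw.Stop();
        Console.WriteLine($"100 end-to-end passes: {sw.ElapsedMilliseconds} ms");
    }
}
```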

        • H honey the codewitch

          Because I'm not interested in the IL code, but in the post jitted native code.

          trønderen
          #36

          Maybe you ought to be more interested in the IL code. The optimizations that are really significant to your application are not concerned with register loads, but with techniques such as moving invariant code out of loops, arithmetic simplifications, etc. Peephole optimization (done at code generation) is a combination of "already done, always" and "no real effect on execution time". I have had a few surprises with C# performance, but they were typically related to data structures, and they were discovered by timing at application level. To pick one example: I suspected that moving a variable out of a single-instance class, making it static, would make address calculation simpler and faster, compared to addressing a variable within an object instance. I was seriously wrong; that slowed down the application significantly. I could have (maybe should have) dug into the binary code to see what made addressing a static location significantly slower, but as I already knew the effect, I didn't spend the time while working on that application.
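"Moving invariant code out of loops", mentioned above, looks like this in source form. A sketch with made-up names and numbers; an optimizer may perform this hoisting itself, but doing it explicitly makes the intent visible:

```csharp
// Loop-invariant code motion: the expression Math.Sqrt(factor) * 0.5 does
// not depend on the loop variable, so it can be computed once outside the
// loop instead of on every iteration.
using System;

class InvariantHoisting
{
    static double SumScaledNaive(double[] data, double factor)
    {
        double sum = 0;
        for (int i = 0; i < data.Length; i++)
            sum += data[i] * Math.Sqrt(factor) * 0.5;  // invariant recomputed each pass
        return sum;
    }

    static double SumScaledHoisted(double[] data, double factor)
    {
        double scale = Math.Sqrt(factor) * 0.5;        // computed once, outside the loop
        double sum = 0;
        for (int i = 0; i < data.Length; i++)
            sum += data[i] * scale;
        return sum;
    }

    static void Main()
    {
        double[] data = { 1.0, 2.0, 3.0 };
        // sqrt(4) * 0.5 = 1, so both variants sum to 6.
        Console.WriteLine(SumScaledNaive(data, 4.0));
        Console.WriteLine(SumScaledHoisted(data, 4.0));
    }
}
```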

            • T trønderen

              honey the codewitch
              #37

            I'm intimately familiar with the IL code already. I both generate code that then gets compiled to it, and I Reflection.Emit it directly. I get that you don't want me to be concerned about the things that I am concerned about. Get that I am anyway. I already optimized at the application level. I should add, I inlined one method and got a 20% performance increase. That's strictly jit manipulation. You don't think it's worth it. My tests say otherwise. And one more thing: not paying attention to this, along with some broken benchmarks (which shielded me from seeing the performance issues), led me into a huge mess. Sure, if you're writing an e-commerce site you don't have to be concerned with inner-loop performance and "performance critical codepaths", because to the degree that you have them, they are measured in seconds to complete, or longer. Lexing, or regex searching, is not that. If you don't think manipulating the jitter is worth it, then why don't you ask Microsoft why they mark up their generated regex code with attributes specifically designed to manipulate the jitted code?

              • H honey the codewitch

                trønderen
                #38

              "I should add, I inlined one method and got a 20% performance increase." You are not saying 20% of what: the entire application, or that specific function call? And: inlining is not peephole optimization. The jitter didn't do that; the compiler generating the IL did. Inlining isn't quite at the same level as changing the algorithm, but much closer to that than to register allocation. In another post, I mentioned my experience with trying to make a variable static rather than local to the single instance. Inlining is more at that level. I am saying that modifying your source code to affect peephole optimization is a waste of energy. Inlining is at a different level, and might be worth it, especially if the method is small and called in only a few places.

                • T trønderen

                  honey the codewitch
                  #39

                20% of my total execution time, lexing a document end to end.

                > The jitter didn't do that. The compiler generating the IL did.

                Sorry, but that's just categorically false. The method is created and called in the IL code. It's only inlined when jitted.

                [System.Runtime.CompilerServices.MethodImpl(System.Runtime.CompilerServices.MethodImplOptions.AggressiveInlining)]

                Feel free to play around and mark your code up with that attribute. Watch the compiled results, and the jitted results. You'll see the compiler still drops your method into the assembly, and still emits the callvirt opcode to call it. THE JITTER is what inlines it.
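For readers who want to try the experiment described above, here is a minimal sketch. The attribute is the real `System.Runtime.CompilerServices` API; the `Advance*` method names are made up for illustration. Whether the jitter actually inlines can be confirmed by dumping the jitted native code (e.g. with the `DOTNET_JitDisasm` environment variable on recent .NET).

```csharp
// MethodImplOptions.AggressiveInlining is a hint to the jitter, and
// NoInlining forbids inlining; in both cases the IL still contains the
// method and the call to it.
using System;
using System.Runtime.CompilerServices;

class InliningDemo
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    static int AdvanceInlined(int position) => position + 1;

    [MethodImpl(MethodImplOptions.NoInlining)]   // for contrast: never inlined
    static int AdvanceNotInlined(int position) => position + 1;

    static void Main()
    {
        int p = 0;
        p = AdvanceInlined(p);
        p = AdvanceNotInlined(p);
        Console.WriteLine(p);   // 2
    }
}
```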

                  • H honey the codewitch

                    trønderen
                    #40

                  Another remark on micro-optimization: In the 1990s, I was teaching basic computer architecture at a community college. To get a hands-on feeling for real registers, memory and such, the students did a few hand-ins of assembler coding. I strongly stressed that we must strive to make code - especially assembler - as readable as possible. To zero the AX register, you move 0 into it: MOV AX, 0. One of the students insisted that the proper way to zero AX is XOR AX, AX. No, I did not accept that; that is incomprehensible tribal language. The student insisted: But XOR is faster! So I brought him timing tables for both the 286 and 386 (our machines had 386s) to show him that the instructions took exactly the same number of clock cycles. He returned with timing diagrams for the 8086, showing that XOR would save 1 clock cycle on that processor. He wanted to write software running at maximum speed on all machines, even old 8086s! He was not willing to sacrifice a single clock cycle for the sake of more readable code! So for the next homework problem, he handed in a listing headed by "This is how we are forced by the lecturer to program it:", and a very readable, neat solution. This was followed by a commented-out section: "And this is how real programmers do it:", with some really messy code. I didn't care to uncomment the second alternative for the purpose of timing it. Why did I come to think of this old memory right now?

                    • T trønderen

                      honey the codewitch
                      #41

                    We're not talking about xor ax,ax vs mov ax,0 though. We're talking about changing a field reference to a local variable. You couldn't even see the xor ax,ax vs mov ax,0 difference in C; you'd have to drop to asm. In this case, you certainly see it, and the code does something quite different in each case. It's up to the jitter, in this case, to make that leap of intent in the code, because the compiler doesn't do it. I suspected it did make that leap, but I won't apologize for checking, or for deciding it's worth finding out. Also: A) we're talking about generated code, and B) this change does next to nothing in terms of affecting its readability.
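The field-to-local change under discussion can be sketched like this. `Lexer`, `current` and the counting methods are hypothetical names invented for the example; the whole question in the thread is whether the jitter already performs this registerization on its own, in which case the source change buys nothing.

```csharp
// Caching a hot instance field in a local so the jitter can keep it in a
// register, instead of re-reading the field through 'this' every iteration.
using System;

class Lexer
{
    int current;             // the hot field
    readonly string input;

    public Lexer(string input) { this.input = input; }

    // Variant 1: the field is read and written inside the loop.
    public int CountLettersFieldRead()
    {
        int letters = 0;
        for (current = 0; current < input.Length; current++)
            if (char.IsLetter(input[current])) letters++;
        return letters;
    }

    // Variant 2: the field is hoisted into a local, stored back once.
    public int CountLettersLocalCopy()
    {
        int letters = 0;
        int pos;                                // local copy of the position
        for (pos = 0; pos < input.Length; pos++)
            if (char.IsLetter(input[pos])) letters++;
        current = pos;                          // write the field back once
        return letters;
    }

    static void Main()
    {
        var lex = new Lexer("abc 123 def");
        Console.WriteLine(lex.CountLettersFieldRead());  // 6
        Console.WriteLine(lex.CountLettersLocalCopy());  // 6
    }
}
```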

                      • H honey the codewitch

                        trønderen
                        #42

                      Yes, I guess I was wrong about that. In C#, inlining is not absolute, just a proposal; the compiler/jitter is free to ignore it. That depends on the cost of a call, which varies a lot with CPU architecture. I would guess that on an ARM, a lot more functions are not inlined, even if proposed by the developer, as calls are significantly cheaper on ARM than on, say, x86/x64. Nevertheless, even if the code generator makes the final decision based on that CPU's specific instruction set and instruction timing, inlining is something you relate to at source code level. Compare it to unrolling a tight loop with a small, fixed number of iterations, or the use of #define expressions in C/C++. It is not at the level of which instructions are generated. (Well, of course all source code has an effect on the code generated, but not at the level of selecting specific coding techniques.) If a method is inlined on both architecture X and architecture Y, that is the same structural code change, regardless of the X and Y instruction sets. I saw the inlining option a generation ago, when it was a new concept. Then it was a directive to be honored, not a meek proposal. That was at a time when you could also direct a variable to reside in a specific register for its entire lifetime. Experience showed that the compiler might know better ... (so we started trusting the compilers!). Note that leaving the inlining decision to the code generator might restrict the freedom of the higher-level optimizer: if the optimizer handles the inlining above the code generator level, it can e.g. combine common expressions in the inlined code with other code before or after the (inlined) call. While a code generator could in principle do a similar analysis of the surrounding code, don't expect it to be prepared to! The code to be inlined will be inlined in extenso, even if identical expressions were calculated before or after the call. The task of a (code-independent) compiler is to discover such common expressions when it takes responsibility for inlining functions, while the code generator does not have a similar responsibility for restructuring the parse tree before generating code.
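The common-subexpression point above can be made concrete with a small sketch (names and values invented for illustration): once `Scaled` is inlined early, a later optimization pass can see that `x * y` is computed both in the caller and in the inlined body, and reuse one result; a code generator that inlines at the last moment typically emits both computations.

```csharp
// Common subexpression shared between a caller and an inlined callee.
using System;

class CseAfterInlining
{
    static int Scaled(int x, int y) => x * y + 1;   // candidate for inlining

    static int Caller(int x, int y)
    {
        int a = x * y;          // same expression as inside Scaled
        int b = Scaled(x, y);   // after inlining: b = x * y + 1, so x * y
                                // can be reused from 'a' by a CSE pass
        return a + b;
    }

    static void Main()
    {
        Console.WriteLine(Caller(3, 4));   // 12 + 13 = 25
    }
}
```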

                        • H honey the codewitch

                          trønderen
                          #43

                        Caches do wonders. Even what looks like reloading the same value from RAM will be cache hits. The performance penalty will be minimal.

                          • T trønderen

                            honey the codewitch
                            #44

                          Yeah, I mean, but cache is inherently somewhat unreliable, in that even at best its availability varies from processor to processor, even on the same architecture, in terms of amount and sometimes speed. It is also hard to figure cache misses, and sometimes hard to track down why your functions aren't being held in the cache lines; maybe your locality of reference stinks or something. Look, I get your point about not bothering with this stuff in general. In general I wouldn't. But this is:

                          1. Critical code in terms of performance. I expect to be able to find tens of thousands of matches through some text in milliseconds. For code like that, it is worth doing the legwork to find out how to make it faster. Even 20% is worth it.

                          2. Generated code, so the priorities aren't strictly about readability or maintainability. But even if they were, the change I proposed doesn't really impact readability. Furthermore, since it's generated by a programmatic process, it pays even more to make that process produce the fastest code reasonable, given #1 (note I didn't say possible, I said reasonable; a 20% gain, as a general rule of thumb, is easily reasonable for generated code, even if readability is impacted somewhat). If .NET 8 changes the landscape significantly, I'll update my code to reflect that.

                          3. I'm not doing anything Microsoft hasn't done in terms of optimizing their generated code to produce optimal output. I guarantee you they smoke this thing with benchmarks before every ship, and bits get twiddled. My goal was to beat them at this task, for reasons. So I have to play at the same level they are.

                            • T trønderen

                              honey the codewitch
                              #45

                            Maybe I didn't choose the best example in the OP, but it was the one most readily in front of me. I'll say this about it: not knowing whether the JITter would "know" that a repeated access off argument zero could be registerized is a fair question. I already know the answer for a traditional compiler. Here (given my other most recent response to you, with an eye toward #1), the difference in performance would be significant if my fear about the actual generated code were realized. I predict substantially more than a 20% difference in execution speed, given how often I hit that field in my code. I can't easily test that, because I can't make the jitter do the wrong thing. So admittedly it's a bit post hoc ergo propter hoc, but I wouldn't say it's a wild guess either. Finding that out was significant. It wasn't about the CPU making adjustments to the microcode; it was higher than that level. The CPU can't figure that out. It requires, at the very least, peephole optimization or better. I know a traditional compiler will do it, but I don't know the cost-benefit calculus Microsoft engaged in to decide whether that optimization was worth doing in the JITter for most purposes; my purposes are somewhat different from most. I stand by the position that the question was worth asking, and that the code would have been worth modifying in that worst case, because of the kind of code it is. I'm not arguing general-purpose development here. You know as well as I do that generalized rules aren't meant to cover every specific scenario; that's where experience comes in: knowing when stepping outside those general rules is worth the cost and the potential risks.

                            Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

                            1 Reply Last reply
                            0
                            • H honey the codewitch

                              Yeah, I mean, but cache is inherently somewhat unreliable, in that even at best its availability varies from processor to processor, even on the same architecture, in terms of amount and sometimes speed. It's also hard to account for cache misses, and sometimes hard to track down why your functions aren't staying in the cache lines - maybe your locality of reference stinks or something. Look, I get your point about not bothering with this stuff in general. In general I wouldn't. But this is: 1. Critical code in terms of performance. I expect to find tens of thousands of matches in some text in milliseconds. For code like that it's worth doing the legwork to find out how to make it faster. Even 20% is worth it. 2. Generated code, so the priorities aren't strictly readability or maintainability - but even if they were, the change I proposed doesn't really impact readability. Furthermore, since it's produced by a programmatic process, it pays even more to make that process produce the fastest code reasonable, given #1. (Note I didn't say possible, I said reasonable. A 20% gain is easily reasonable for generated code as a rule of thumb, even if readability suffers somewhat.) If .NET 8 changes the landscape significantly, I'll update my code to reflect that. 3. Nothing Microsoft hasn't done in optimizing their own generated code. I guarantee you they smoke this thing with benchmarks before every ship, and bits get twiddled. My goal was to beat them at this task, for reasons, so I have to play at the same level they are.
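For what it's worth, the kind of measurement I mean for #1 looks something like this - a hedged BenchmarkDotNet sketch, where `FindMatches` is a hypothetical stand-in for the generated runner, not the library's actual API:

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class RunnerBench
{
    readonly string _text = new string('a', 1_000_000);

    [Benchmark(Baseline = true)]
    public int Original() => FindMatches(_text, hoistFields: false);

    [Benchmark]
    public int Hoisted() => FindMatches(_text, hoistFields: true);

    // Hypothetical: dispatches to the two variants of the generated code.
    static int FindMatches(string text, bool hoistFields) => 0;

    public static void Main() => BenchmarkRunner.Run<RunnerBench>();
}
```

BenchmarkDotNet handles the warmup, iteration, and statistics that make a 20% claim defensible, which eyeballing a stopwatch does not.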

                              Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

                              T Offline
                              trønderen
                              wrote on last edited by
                              #46

                              I see your statement about a 20% gain. Would you be willing to publish the complete source code of an application so that we can verify the application runs 20% faster at the user level when we compile it with a single "inline" either enabled or disabled, but otherwise identical code? Then we could compile it with varying optimization flags to see the effects on user-level timings. We could see the variation between x86 and x64. Maybe there are people here with access to machines running ARM processors, so we could see the effect of inlining on those too. Generally speaking: a 20% gain is significant in benchmark tests. But if you set up a double-blind test (so that not even those managing the test know which is which), presenting users with a build that is either the 100%-speed one or the 120%-speed one and asking them which they got, you cannot expect a very high hit rate.

                              Religious freedom is the freedom to say that two plus two make five.

                              H 1 Reply Last reply
                              0
                              • T trønderen

                                I see your statement about a 20% gain. Would you be willing to publish the complete source code of an application so that we can verify the application runs 20% faster at the user level when we compile it with a single "inline" either enabled or disabled, but otherwise identical code? Then we could compile it with varying optimization flags to see the effects on user-level timings. We could see the variation between x86 and x64. Maybe there are people here with access to machines running ARM processors, so we could see the effect of inlining on those too. Generally speaking: a 20% gain is significant in benchmark tests. But if you set up a double-blind test (so that not even those managing the test know which is which), presenting users with a build that is either the 100%-speed one or the 120%-speed one and asking them which they got, you cannot expect a very high hit rate.

                                Religious freedom is the freedom to say that two plus two make five.

                                H Offline
                                honey the codewitch
                                wrote on last edited by
                                #47

                                Thanks for making me check my work, even if I feel a little foolish. My initial claim was based on code that has long since evolved. The short answer is I cannot reproduce those initial findings in my current efforts. I'm not yet saying the claim doesn't hold up - I can't. I could still be missing something pretty big, and my classes have some forks in their derivation chains for compiled vs. generated vs. runtime runners. I need to comb over that stuff before I can say the claim *doesn't* hold up either, because right now I'm not seeing any difference, and that's almost certainly not right. I have some conditional compiles and other things to check, but I'm not in the headspace for running all that down at the moment. I probably will a little later tonight and repost, unless I get distracted. Feel free to ping me if you don't hear from me about it. I feel a little silly not having repro code available at this moment, but I didn't think I'd be backing this claim up in the lounge when I tested it either. :laugh: Either way, I need to focus on proving or disproving the claim as a task in and of itself, because it's not just revealing itself to me at this point. The Benchmarks project here is what I'm using, and FALIB_SMALLER *should* remove the inlining attribute from the runtime runners. The real deal, though, is the generated runners, and I need to modify both the benchmarks and the generator code to allow for that option. GitHub - codewitch-honey-crisis/VisualFA: A fast C# DFA regular expression engine[^]

                                Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

                                1 Reply Last reply
                                0
                                • H honey the codewitch

                                  Basically I am not sure about a number of things regarding how it works

                                  if((this.current >= 'A' && this.current <= 'Z') ||
                                     (this.current >= 'a' && this.current <= 'z')) {
                                      // do something
                                  }

                                  In MSIL you'd have to pepper the IL you drop for that if construct with a bunch of extra Ldarg_0 instructions to retrieve the this reference for *each* comparison. On x86 CPUs (and, well, most any CPU with registers, which IL doesn't really have unless you stretch the terminology to include its list of function arguments and locals), you'd load the this pointer into a register and work off that, rather than repeatedly loading it onto the stack every time you need to access it as you would in IL. On pretty much any supporting architecture this is much faster than hitting the stack. Maybe an order of magnitude. So my question is, for example: is the JIT compiler smart enough to resolve those repeated Ldarg_0s into register accesses? That's just one thing I want to know. Some avenues of research I considered to figure this out: 1. Running the code through a debugger and dropping to assembly. The only way I can do that reliably is with debug info, which may change how the JITter drops native instructions. I can't rely on it. 2. Using ngen and then disassembling the result - but again, that's not JITted but precompiled, so things like whole program optimization are in play. I can't rely on it either. And I can't find any material that will help me figure this out, short of the very dry and difficult specs they release, which I'm not even sure tell me, since the JIT compiler's actual implementation details aren't part of the standard. What I'm hoping for is something some clever Microsoft employee or blogger wrote that describes the behavior of Microsoft's JITter in some detail. There are some real-world implications for some C# code that my library generates. I need to make some decisions about it, and I feel like I don't have all the information I need.
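To illustrate, here's roughly what a C# compiler drops for just the first of those range tests - a sketch only; the class and field names are hypothetical, and the exact opcodes and branch targets vary:

```
ldarg.0                         // push `this`
ldfld    char Lexer::current    // push this.current
ldc.i4.s 65                     // push 'A'
blt.s    NO_MATCH               // current < 'A'? fall through to next || term
ldarg.0                         // push `this` AGAIN for the next compare
ldfld    char Lexer::current
ldc.i4.s 90                     // push 'Z'
ble.s    MATCH                  // current <= 'Z'? matched
// ...the same ldarg.0/ldfld pattern repeats for 'a'..'z'
```

The question is whether the JIT folds those repeated ldarg.0/ldfld pairs into a single register load instead of re-reading the field each time.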

                                  Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

                                  A Offline
                                  Alister Morton
                                  wrote on last edited by
                                  #48

                                  Lo and behold, today's newsletter has an article about extracting the assembly that the JIT compiler generates, if that's any use; here[^] if you haven't already seen it.

                                  H J 2 Replies Last reply
                                  0
                                  • A Alister Morton

                                    Lo and behold, today's newsletter has an article about extracting the assembly that the JIT compiler generates, if that's any use; here[^] if you haven't already seen it.

                                    H Offline
                                    honey the codewitch
                                    wrote on last edited by
                                    #49

                                    Super! Yeah I saw that, but after you posted. Thanks!

                                    Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

                                    1 Reply Last reply
                                    0
                                    • H honey the codewitch

                                      If it's such a significant difference in the generated code then yes. Especially because in the case I outlined (turns out it does register access after all though) it would require relatively minor adjustments to my generated code to avoid that potential performance pitfall, and do so without significantly impacting readability. I don't like to wait around and hope that Microsoft will one day do the right thing. I've worked there.

                                      Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

                                      J Offline
                                      jschell
                                      wrote on last edited by
                                      #50

                                      The other comment reminded me of what I did long ago. The C compiler could be configured to emit assembler, so I did that, hand-optimized the assembly, and then used it instead of the original C code in the build. You could certainly do that here.

                                      H T 2 Replies Last reply
                                      0
                                      • A Alister Morton

                                         Lo and behold, today's newsletter has an article about extracting the assembly that the JIT compiler generates, if that's any use; here[^] if you haven't already seen it.

                                         J Offline
                                        jschell
                                        wrote on last edited by
                                        #51

                                        Alister Morton wrote:

                                        has an article about extracting the assembly that the jit compiler generates,

                                        Which obviously proves that aliens, spirits and bigfoot all exist. (And you beat me to posting about that.)

                                        1 Reply Last reply
                                        0
                                        • J jschell

                                          The other comment reminded me of what I did long ago. The C compiler could be configured to emit assembler. I did that, optimized it, then I used that instead of the original C code in the build. You could certainly do that here.

                                           H Offline
                                          honey the codewitch
                                          wrote on last edited by
                                          #52

                                           I could, and indeed I do one small optimization with my Reflection Emit based compiler that isn't possible - at least not readily - in C#.

                                           if((codepoint>='A' && codepoint<='Z') || codepoint=='_' || (codepoint>='a' && codepoint<='z'))

                                           The comparison ranges are in sorted order, left to right. So rather than run through all of the || conditions, I short-circuit if the minimum of the next range in the series is greater than the codepoint. It's easy to do in IL, since all of this is already resolved to a series of jumps. Not so easy in C#. But I did it there because it was a minor change and didn't really impact anything. I'd be far more hesitant to create a total fork in my compiler vs. source generator; the performance benefits would have to be compelling. Fortunately, I didn't need to do that here, because my fears were not realized in the end. The JITter was smart enough to optimize that code. But if it hadn't been, I could have reorganized my generated source code to produce more efficient IL, in that it would translate to more efficient native code on most platforms. I'd have preferred that approach, as it keeps things from being black-boxed.
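In C# source, the same short-circuit can be approximated with cascaded ifs instead of an || chain - awkward for a generator to emit, as noted, but here's a hedged sketch (hypothetical method name, using the ranges from the condition above):

```csharp
// Ranges sorted ascending: 'A'..'Z', '_', 'a'..'z'. Once the codepoint
// falls below the minimum of the next range, no later range can match,
// so we bail out early instead of testing every remaining || term.
static bool IsIdentStart(int codepoint)
{
    if (codepoint < 'A') return false;   // below every range
    if (codepoint <= 'Z') return true;   // in 'A'..'Z'
    if (codepoint < '_') return false;   // short-circuit: below next minimum
    if (codepoint == '_') return true;
    if (codepoint < 'a') return false;   // short-circuit again
    return codepoint <= 'z';             // in 'a'..'z'
}
```

In IL this falls out naturally as a chain of conditional branches, which is why it was a minor change in the Reflection Emit path.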

                                          Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

                                          1 Reply Last reply
                                          0