Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. The Lounge
  3. I might need to optimize this XD

I might need to optimize this XD

Scheduled Pinned Locked Moved The Lounge
regexquestioncode-review
24 Posts 6 Posters 0 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • H Offline
    H Offline
    honey the codewitch
    wrote on last edited by
    #1

    L0001: jmp L0002, L0010, L0021, L0029, L0040, L0050, L0059, L0070, L0082, L0092, L0101, L0109, L0115, L0121, L0133, L0142, L0149, L0157, L0167, L0174, L0185, L0190, L0198, L0209, L0213, L0219, L0223, L0233, L0241, L0247, L0256, L0264, L0272, L0278, L0285, L0292, L0297, L0302, L0306, L0311, L0315, L0319, L0323, L0327, L0331, L0335, L0339, L0343, L0347, L0352, L0357, L0361, L0366, L0371, L0375, L0380, L0384, L0389, L0393, L0398, L0402, L0407, L0412, L0416, L0421, L0426, L0430, L0435, L0440, L0444, L0449, L0453, L0458, L0492, L0499, L0506, L0513, L0520, L0527, L0537, L0546, L0554, L0561, L0568, L0574, L0583, L0591, L0598, L0606, L0616, L0625, L0633, L0640, L0647, L0656, L0665, L0671, L0681, L0690, L0696, L0702, L0708, L0716, L0728, L0753, L0768, L0773

    Each JMP operand spawns a fiber (basically a thread). I haven't counted how many are spawned here, but 70-80 or so? I think maybe this code is a bit heavy handed. This is just to match a (quite complicated) regular expression

    Real programmers use butterflies

    L J Z 3 Replies Last reply
    0
    • H honey the codewitch

      L0001: jmp L0002, L0010, L0021, L0029, L0040, L0050, L0059, L0070, L0082, L0092, L0101, L0109, L0115, L0121, L0133, L0142, L0149, L0157, L0167, L0174, L0185, L0190, L0198, L0209, L0213, L0219, L0223, L0233, L0241, L0247, L0256, L0264, L0272, L0278, L0285, L0292, L0297, L0302, L0306, L0311, L0315, L0319, L0323, L0327, L0331, L0335, L0339, L0343, L0347, L0352, L0357, L0361, L0366, L0371, L0375, L0380, L0384, L0389, L0393, L0398, L0402, L0407, L0412, L0416, L0421, L0426, L0430, L0435, L0440, L0444, L0449, L0453, L0458, L0492, L0499, L0506, L0513, L0520, L0527, L0537, L0546, L0554, L0561, L0568, L0574, L0583, L0591, L0598, L0606, L0616, L0625, L0633, L0640, L0647, L0656, L0665, L0671, L0681, L0690, L0696, L0702, L0708, L0716, L0728, L0753, L0768, L0773

      Each JMP operand spawns a fiber (basically a thread). I haven't counted how many are spawned here, but 70-80 or so? I think maybe this code is a bit heavy handed. This is just to match a (quite complicated) regular expression

      Real programmers use butterflies

      L Offline
      L Offline
      Lost User
      wrote on last edited by
      #2

      honey the codewitch wrote:

      Each JMP operand spawns a fiber (basically a thread).

      A light-weight form of a thread. Given the amount of threads that chrome spawns, I wouldn't mind a bit if there's a few fibers running. Do you have a test-project to test the performance?

      Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^] "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

      H 1 Reply Last reply
      0
      • L Lost User

        honey the codewitch wrote:

        Each JMP operand spawns a fiber (basically a thread).

        A light-weight form of a thread. Given the amount of threads that chrome spawns, I wouldn't mind a bit if there's a few fibers running. Do you have a test-project to test the performance?

        Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^] "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

        H Offline
        H Offline
        honey the codewitch
        wrote on last edited by
        #3

        Yes I do and the performance is god awful. This is for a compound regular expression rather than a web browser, so this is more than a little excessive. Normally the machine will spawn like 2 or 3 while it's doing normal character scans, but when it has to split it quickly grows. The reason it spawns more than one is disjunctions in the regex, like foo|bar - it spawns a fiber to scan each one. In truth it spawns slightly more than 1 fiber on average because save points spawn a fiber. Plus each fiber only lives for the duration of one character.

        Real programmers use butterflies

        L Greg UtasG 2 Replies Last reply
        0
        • H honey the codewitch

          Yes I do and the performance is god awful. This is for a compound regular expression rather than a web browser, so this is more than a little excessive. Normally the machine will spawn like 2 or 3 while it's doing normal character scans, but when it has to split it quickly grows. The reason it spawns more than one is disjunctions in the regex, like foo|bar - it spawns a fiber to scan each one. In truth it spawns slightly more than 1 fiber on average because save points spawn a fiber. Plus each fiber only lives for the duration of one character.

          Real programmers use butterflies

          L Offline
          L Offline
          Lost User
          wrote on last edited by
          #4

          honey the codewitch wrote:

          Yes I do and the performance is god awful.

          I agree, and that's why I run FireFox :D

          honey the codewitch wrote:

          Plus each fiber only lives for the duration of one character.

          So, light weight threads that are short-lived? How would it compare to a threadpool, cutting back on creation cost and feeding the threads as they become available?

          Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^] "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

          H J 2 Replies Last reply
          0
          • L Lost User

            honey the codewitch wrote:

            Yes I do and the performance is god awful.

            I agree, and that's why I run FireFox :D

            honey the codewitch wrote:

            Plus each fiber only lives for the duration of one character.

            So, light weight threads that are short-lived? How would it compare to a threadpool, cutting back on creation cost and feeding the threads as they become available?

            Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^] "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

            H Offline
            H Offline
            honey the codewitch
            wrote on last edited by
            #5

            They're already allocated since they're simple structs sitting inside an array. The only field that gets set are two simple 32 bit fields on the struct =) Since they're allocated this way, at least unless .NET sucks in this arena (i haven't checked the IL) they don't need to be recycled - they're permanent instances. Furthermore, the fibers get used at maximum - they are never idle, ergo, a threadpool won't benefit me.

            Real programmers use butterflies

            1 Reply Last reply
            0
            • H honey the codewitch

              Yes I do and the performance is god awful. This is for a compound regular expression rather than a web browser, so this is more than a little excessive. Normally the machine will spawn like 2 or 3 while it's doing normal character scans, but when it has to split it quickly grows. The reason it spawns more than one is disjunctions in the regex, like foo|bar - it spawns a fiber to scan each one. In truth it spawns slightly more than 1 fiber on average because save points spawn a fiber. Plus each fiber only lives for the duration of one character.

              Real programmers use butterflies

              Greg UtasG Offline
              Greg UtasG Offline
              Greg Utas
              wrote on last edited by
              #6

              honey the codewitch wrote:

              the performance is god awful

              Ah yes, the rarely appropriate Thread Per Request pattern. Almost always better is a work queue served by a single thread, or a pool if blocking is an issue. Threads eat up memory, add context switching overhead, and introduce critical regions. I recently discovered how often Windows schedules a new thread, and I'm still flabbergasted. Fibers, being lighter weight, shouldn't be as bad, but evidently it's still plenty bad.

              honey the codewitch wrote:

              each fiber only lives for the duration of one character

              :omg: :wtf: X|

              <p><a href="https://github.com/GregUtas/robust-services-core/blob/master/README.md">Robust Services Core</a>
              <em>The fox knows many things, but the hedgehog knows one big thing.</em></p>

              H 1 Reply Last reply
              0
              • Greg UtasG Greg Utas

                honey the codewitch wrote:

                the performance is god awful

                Ah yes, the rarely appropriate Thread Per Request pattern. Almost always better is a work queue served by a single thread, or a pool if blocking is an issue. Threads eat up memory, add context switching overhead, and introduce critical regions. I recently discovered how often Windows schedules a new thread, and I'm still flabbergasted. Fibers, being lighter weight, shouldn't be as bad, but evidently it's still plenty bad.

                honey the codewitch wrote:

                each fiber only lives for the duration of one character

                :omg: :wtf: X|

                H Offline
                H Offline
                honey the codewitch
                wrote on last edited by
                #7

                The fibers are already allocated since they're simple structs sitting inside an array. The only field that gets set are two simple 32 bit fields on the struct =) Since they're allocated this way, at least unless .NET sucks in this arena (i haven't checked the IL) they don't need to be recycled - they're permanent instances. So a threadpool doesn't buy me anything. These aren't traditional threads.

                Real programmers use butterflies

                Greg UtasG 1 Reply Last reply
                0
                • H honey the codewitch

                  The fibers are already allocated since they're simple structs sitting inside an array. The only field that gets set are two simple 32 bit fields on the struct =) Since they're allocated this way, at least unless .NET sucks in this arena (i haven't checked the IL) they don't need to be recycled - they're permanent instances. So a threadpool doesn't buy me anything. These aren't traditional threads.

                  Real programmers use butterflies

                  Greg UtasG Offline
                  Greg UtasG Offline
                  Greg Utas
                  wrote on last edited by
                  #8

                  These sound like really lightweight fibers, so .NET must suck at handling them. :)

                  <p><a href="https://github.com/GregUtas/robust-services-core/blob/master/README.md">Robust Services Core</a>
                  <em>The fox knows many things, but the hedgehog knows one big thing.</em></p>

                  H 1 Reply Last reply
                  0
                  • Greg UtasG Greg Utas

                    These sound like really lightweight fibers, so .NET must suck at handling them. :)

                    H Offline
                    H Offline
                    honey the codewitch
                    wrote on last edited by
                    #9

                    No, the issue is most fibers resolve to examination of a single character in the input so if you have 10 of them the same character gets examined as much as 10 times. This is a byproduct of the design of a Pike VM, itself an artifact of the way NFA expressions work so there's very little to be done about it except convert to a DFA (the optimization process) Reduce the fibers and it speeds right up:

                    NFA ran with 10 max fibers and 3.5 average char passes
                    NFA+DFA (optimized) ran with 6 max fibers and 2.5 average char passes
                    DFA ran with 2.5 max fibers and 1 average char passes
                    Pass #1
                    NFA: Lexed in 1.575287 msec
                    NFA+DFA (optimized): Lexed in 1.054843 msec
                    DFA: Lexed in 0.901254 msec
                    Pass #2
                    NFA: Lexed in 1.529819 msec
                    NFA+DFA (optimized): Lexed in 1.100836 msec
                    DFA: Lexed in 0.830835 msec
                    Pass #3
                    NFA: Lexed in 1.523334 msec
                    NFA+DFA (optimized): Lexed in 1.049213 msec
                    DFA: Lexed in 0.851737 msec
                    Pass #4
                    NFA: Lexed in 1.400265 msec
                    NFA+DFA (optimized): Lexed in 1.03485 msec
                    DFA: Lexed in 0.829009 msec

                    Real programmers use butterflies

                    1 Reply Last reply
                    0
                    • L Lost User

                      honey the codewitch wrote:

                      Yes I do and the performance is god awful.

                      I agree, and that's why I run FireFox :D

                      honey the codewitch wrote:

                      Plus each fiber only lives for the duration of one character.

                      So, light weight threads that are short-lived? How would it compare to a threadpool, cutting back on creation cost and feeding the threads as they become available?

                      Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^] "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

                      J Offline
                      J Offline
                      Jon McKee
                      wrote on last edited by
                      #10

                      Eddy Vluggen wrote:

                      So, light weight threads that are short-lived?

                      Kinda. Their primary purpose is non-preemptive/cooperative multitasking instead of preemptive multitasking like threads. The best analogy I've seen is co-routines.

                      H 1 Reply Last reply
                      0
                      • J Jon McKee

                        Eddy Vluggen wrote:

                        So, light weight threads that are short-lived?

                        Kinda. Their primary purpose is non-preemptive/cooperative multitasking instead of preemptive multitasking like threads. The best analogy I've seen is co-routines.

                        H Offline
                        H Offline
                        honey the codewitch
                        wrote on last edited by
                        #11

                        Yep. That's about the long and short of it.

                        private struct _Fiber
                        {
                        public readonly int[][] Program;
                        public readonly int Index;
                        public int[] Saved;
                        public _Fiber(int[][] program, int index,int[] saved)
                        {
                        Program = program;
                        Index = index;
                        Saved = saved;
                        }
                        public _Fiber(_Fiber fiber, int index,int[] saved)
                        {
                        Program = fiber.Program;
                        Index = index;
                        Saved = saved;
                        }
                        }

                        All it contains is a pointer to the program array which all fibers share, the current instruction pointer, and any saved cursor position (only used in the event of the "save" instruction) Creating them is cheap since I just use a straight array to hold them all and it basically never gets resized, so all of them are already "live" just waiting to have their fields filled in.

                        Real programmers use butterflies

                        1 Reply Last reply
                        0
                        • H honey the codewitch

                          L0001: jmp L0002, L0010, L0021, L0029, L0040, L0050, L0059, L0070, L0082, L0092, L0101, L0109, L0115, L0121, L0133, L0142, L0149, L0157, L0167, L0174, L0185, L0190, L0198, L0209, L0213, L0219, L0223, L0233, L0241, L0247, L0256, L0264, L0272, L0278, L0285, L0292, L0297, L0302, L0306, L0311, L0315, L0319, L0323, L0327, L0331, L0335, L0339, L0343, L0347, L0352, L0357, L0361, L0366, L0371, L0375, L0380, L0384, L0389, L0393, L0398, L0402, L0407, L0412, L0416, L0421, L0426, L0430, L0435, L0440, L0444, L0449, L0453, L0458, L0492, L0499, L0506, L0513, L0520, L0527, L0537, L0546, L0554, L0561, L0568, L0574, L0583, L0591, L0598, L0606, L0616, L0625, L0633, L0640, L0647, L0656, L0665, L0671, L0681, L0690, L0696, L0702, L0708, L0716, L0728, L0753, L0768, L0773

                          Each JMP operand spawns a fiber (basically a thread). I haven't counted how many are spawned here, but 70-80 or so? I think maybe this code is a bit heavy handed. This is just to match a (quite complicated) regular expression

                          Real programmers use butterflies

                          J Offline
                          J Offline
                          Jorgen Andersson
                          wrote on last edited by
                          #12

                          If I have understood it correctly: yield uses fibers, foreach uses yield. So try to swap a few well chosen foreach loops for classic for loops and see what happens.

                          Wrong is evil and must be defeated. - Jeff Ello

                          H 1 Reply Last reply
                          0
                          • J Jorgen Andersson

                            If I have understood it correctly: yield uses fibers, foreach uses yield. So try to swap a few well chosen foreach loops for classic for loops and see what happens.

                            Wrong is evil and must be defeated. - Jeff Ello

                            H Offline
                            H Offline
                            honey the codewitch
                            wrote on last edited by
                            #13

                            I'm not using any foreach loops. I've already optimized the VM itself to within an inch of its life

                            Real programmers use butterflies

                            J 1 Reply Last reply
                            0
                            • H honey the codewitch

                              I'm not using any foreach loops. I've already optimized the VM itself to within an inch of its life

                              Real programmers use butterflies

                              J Offline
                              J Offline
                              Jorgen Andersson
                              wrote on last edited by
                              #14

                              How about Linq?

                              Wrong is evil and must be defeated. - Jeff Ello

                              H 1 Reply Last reply
                              0
                              • J Jorgen Andersson

                                How about Linq?

                                Wrong is evil and must be defeated. - Jeff Ello

                                H Offline
                                H Offline
                                honey the codewitch
                                wrote on last edited by
                                #15

                                I thought the goal was to speed this up?

                                Real programmers use butterflies

                                J 1 Reply Last reply
                                0
                                • H honey the codewitch

                                  L0001: jmp L0002, L0010, L0021, L0029, L0040, L0050, L0059, L0070, L0082, L0092, L0101, L0109, L0115, L0121, L0133, L0142, L0149, L0157, L0167, L0174, L0185, L0190, L0198, L0209, L0213, L0219, L0223, L0233, L0241, L0247, L0256, L0264, L0272, L0278, L0285, L0292, L0297, L0302, L0306, L0311, L0315, L0319, L0323, L0327, L0331, L0335, L0339, L0343, L0347, L0352, L0357, L0361, L0366, L0371, L0375, L0380, L0384, L0389, L0393, L0398, L0402, L0407, L0412, L0416, L0421, L0426, L0430, L0435, L0440, L0444, L0449, L0453, L0458, L0492, L0499, L0506, L0513, L0520, L0527, L0537, L0546, L0554, L0561, L0568, L0574, L0583, L0591, L0598, L0606, L0616, L0625, L0633, L0640, L0647, L0656, L0665, L0671, L0681, L0690, L0696, L0702, L0708, L0716, L0728, L0753, L0768, L0773

                                  Each JMP operand spawns a fiber (basically a thread). I haven't counted how many are spawned here, but 70-80 or so? I think maybe this code is a bit heavy handed. This is just to match a (quite complicated) regular expression

                                  Real programmers use butterflies

                                  Z Offline
                                  Z Offline
                                  ZTransform
                                  wrote on last edited by
                                  #16

                                  Lounge?

                                  H J 2 Replies Last reply
                                  0
                                  • H honey the codewitch

                                    I thought the goal was to speed this up?

                                    Real programmers use butterflies

                                    J Offline
                                    J Offline
                                    Jorgen Andersson
                                    wrote on last edited by
                                    #17

                                    I didn't tell you to use it, I'm just looking for problems. :-)

                                    Wrong is evil and must be defeated. - Jeff Ello

                                    H 1 Reply Last reply
                                    0
                                    • Z ZTransform

                                      Lounge?

                                      H Offline
                                      H Offline
                                      honey the codewitch
                                      wrote on last edited by
                                      #18

                                      Yes, this is the Lounge.

                                      Real programmers use butterflies

                                      1 Reply Last reply
                                      0
                                      • J Jorgen Andersson

                                        I didn't tell you to use it, I'm just looking for problems. :-)

                                        Wrong is evil and must be defeated. - Jeff Ello

                                        H Offline
                                        H Offline
                                        honey the codewitch
                                        wrote on last edited by
                                        #19

                                        Look away. Here's almost all of it. The stuff you don't see is very thin

                                        public static int Run(int[][] prog,LexContext input)
                                        {
                                        input.EnsureStarted();
                                        int i,match=-1;
                                        _Fiber[] currentFibers, nextFibers, tmp;
                                        int currentFiberCount=0, nextFiberCount=0;
                                        int[] pc;
                                        // position in input
                                        int sp=0;
                                        // stores our captured input
                                        var sb = new StringBuilder(64);
                                        int[] saved, matched;
                                        saved = new int[2];
                                        currentFibers = new _Fiber[prog.Length];
                                        nextFibers = new _Fiber[prog.Length];
                                        _EnqueueFiber(ref currentFiberCount, ref currentFibers, new _Fiber(prog,0, saved), 0);
                                        matched = null;
                                        var cur = -1;
                                        if (LexContext.EndOfInput != input.Current)
                                        {
                                        var ch1 = unchecked((char)input.Current);
                                        if (char.IsHighSurrogate(ch1))
                                        {
                                        if (-1 == input.Advance())
                                        throw new ExpectingException("Expecting low surrogate in unicode stream. The input source is corrupt or not valid Unicode", input.Line, input.Column, input.Position, input.FileOrUrl);
                                        var ch2 = unchecked((char)input.Current);
                                        cur = char.ConvertToUtf32(ch1, ch2);
                                        }
                                        else
                                        cur = ch1;

                                        }
                                        		
                                        while(0
                                        
                                        J 1 Reply Last reply
                                        0
                                        • Z ZTransform

                                          Lounge?

                                          J Offline
                                          J Offline
                                          Jorgen Andersson
                                          wrote on last edited by
                                          #20

                                          In the sticky post at the top: 2. Technical discussions are welcome...[^]

                                          Wrong is evil and must be defeated. - Jeff Ello

                                          1 Reply Last reply
                                          0
                                          Reply
                                          • Reply as topic
                                          Log in to reply
                                          • Oldest to Newest
                                          • Newest to Oldest
                                          • Most Votes


                                          • Login

                                          • Don't have an account? Register

                                          • Login or register to search.
                                          • First post
                                            Last post
                                          0
                                          • Categories
                                          • Recent
                                          • Tags
                                          • Popular
                                          • World
                                          • Users
                                          • Groups