I found a terrible bug (rubber duck session)

    honey the codewitch wrote (#1):

    Update: I have since fixed the terrible bug! :-D

    I'm posting this because when I rant about these things to you folks I tend to come up with a solution, and I've been at this since last night. Skip it if you'd rather not be used like that. :)

    It's not a programming question, though I will describe the problem. There's not really code as such.

    [\r\n]* (zero or more carriage returns or line feeds) yields a proper set with two transitions.

    [^\r\n]* (zero or more of anything but carriage returns or line feeds) matches any character (incorrect). The set has one range with all Unicode code points in it, and when you invert the set and then minimize the result it will actually crash.

    [^\n\r]* (functionally the same as above) works properly, yielding a set of everything except carriage return or line feed.

    This despite the sets ostensibly being sorted. I thought I had narrowed it down to a normalization routine I have that takes overlapping ranges and merges them. That still might be part of the problem. However, I removed the call to the normalization routine and it still fails my test, so something else is at fault further downstream.

    One of the issues is that this is in live code, with deployed NuGet packages and CodeProject articles, and I only just discovered it. So there's some pressure on me to fix it, albeit self-imposed. :~
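For readers following along, here is a minimal Python sketch of the expected behavior. It is not the engine's actual code. It assumes a character set is stored as a list of inclusive (first, last) code point ranges; `char_class`, `invert`, and `MAX_CODE_POINT` are hypothetical names introduced only for illustration. The point is simply that inverting the set should give the same answer for [^\r\n] and [^\n\r].

```python
# Hypothetical sketch: a character set as a list of inclusive
# (first, last) code point ranges. Not the author's implementation.
MAX_CODE_POINT = 0x10FFFF  # highest Unicode code point


def char_class(chars):
    """Build one single-code-point range per character, in the order given."""
    return [(ord(c), ord(c)) for c in chars]


def invert(ranges):
    """Return the complement of the ranges over 0..MAX_CODE_POINT."""
    result = []
    next_free = 0  # lowest code point not yet covered by an input range
    # Sorting first makes the result independent of the input order.
    for first, last in sorted(ranges):
        if first > next_free:
            result.append((next_free, first - 1))
        next_free = max(next_free, last + 1)
    if next_free <= MAX_CODE_POINT:
        result.append((next_free, MAX_CODE_POINT))
    return result


cr_lf = invert(char_class("\r\n"))  # models [^\r\n]
lf_cr = invert(char_class("\n\r"))  # models [^\n\r]
print(cr_lf)           # [(0, 9), (11, 12), (14, 1114111)]
print(cr_lf == lf_cr)  # True
```

In this sketch the call to `sorted` is what removes the dependence on the order in which the class members were written.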

    Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

      0x01AA wrote (#2, in reply to #1):

      Sometimes over-engineered code breaks your neck. And then I can't help but laugh maliciously. ;P :-D

        k5054 wrote (#3, in reply to #1):

        Does the issue only apply to control characters, or do you have issues with other match groups not in order? e.g. if [^\r\n] fails and/or crashes, does [^rn] fail and/or crash also? If the latter fails also, then maybe you have an issue in your normalization routine. If only the former fails, then I'd have to suspect that it has something to do with handling "special" (i.e. control) characters. Does the group [^\v\n] also fail? What about [^\n\r\a\v\t] or other combinations of control chars? What about [^abc\n]?
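The probes suggested above can be sketched as a small permutation test. This is only an illustration: it uses Python's built-in `re` module as a stand-in engine, since the engine under discussion is custom, and `CLASSES`, `SAMPLE`, and `failing_orders` are hypothetical names. To test another engine, the `re.compile` and `fullmatch` calls would be swapped for that engine's equivalents.

```python
import re
from itertools import permutations

# Class members taken from the probes in the post above.
CLASSES = ["\r\n", "rn", "\v\n", "\n\r\a\v\t", "abc\n"]
SAMPLE = [chr(cp) for cp in range(128)]  # ASCII is enough for these probes


def failing_orders(members):
    """Return every ordering of `members` whose negated class misbehaves."""
    failures = []
    for order in permutations(members):
        body = "".join(re.escape(ch) for ch in order)
        pattern = re.compile("[^" + body + "]")
        for ch in SAMPLE:
            matched = pattern.fullmatch(ch) is not None
            # A negated class must match exactly the characters not listed.
            if matched != (ch not in members):
                failures.append("".join(order))
                break
    return failures


results = {members: failing_orders(members) for members in CLASSES}
for members, failures in results.items():
    print(repr(members), "->", "ok" if not failures else failures)
```

If only the control-character classes fail in some orderings, that points at escape handling; if [^rn] fails too, it points at sorting or normalization.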

        "A little song, a little dance, a little seltzer down your pants" Chuckles the clown

          honey the codewitch wrote (#4, in reply to #3):

          I'll check it out. :) thanks

            jschell wrote (#5, in reply to #1):

            I would say you should never do any of those. So do not attempt to solve anything. All of those are unbounded and optional. Something like the following is correct in that it provides a bound and is not entirely optional.

            ^\d[^\r\n]*[\r\n]+
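To make the distinction concrete, here is a short sketch using Python's `re` module as a stand-in (the engine discussed in this thread is a different, custom one, so this only illustrates the general behavior). It shows that the unbounded, optional pattern succeeds with a zero-length match, while the bounded pattern either consumes a whole line or fails.

```python
import re

unbounded = re.compile(r"[^\r\n]*")        # optional: may match nothing at all
bounded = re.compile(r"^\d[^\r\n]*[\r\n]+")  # needs a digit and a line ending

# Against a bare line feed, the unbounded pattern "succeeds" with an empty match.
empty_hit = unbounded.match("\n")
print(repr(empty_hit.group()))  # ''

# The bounded pattern refuses the same input outright.
bounded_miss = bounded.match("\n")
print(bounded_miss)  # None

# Given a well-formed line, the bounded pattern consumes it including the terminator.
bounded_hit = bounded.match("1 item\r\nnext")
print(repr(bounded_hit.group()))  # '1 item\r\n'
```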

              honey the codewitch wrote (#6, in reply to #5):

              Well, I caught this as part of a larger regular expression; I'm simply taking out a portion in order to simplify. In my engine, it's perfectly fine to have a zero-length match because every subexpression is an expression. It's expressions all the way down. :) (Oh, and I get the same results with +.)

                honey the codewitch wrote (#7, in reply to #2):

                It's actually based on some simple mathematical concepts. I know enough about the present bug to say that it has to do with the way I'm sorting and categorizing ranges of characters. I wouldn't even use ranges except that I need them to make Unicode practical, but they do cause the algorithm to deviate significantly from what you'd find in a textbook.

                It all works, but basically here's what's going on:

                Range 1: 0-12
                Range 2: 10-0x1ffff

                It works fine if range two comes before range one, but how do I sort this? Normally it needs to sort such that Z-A becomes A-Z, and therein lies the issue, or at least an issue. Maybe I can sidestep it somehow. Still stewing on this.

                Edit: I just realized it's groups of ranges I'm trying to sort. Maybe I don't need to at all?
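One common way to handle the two ranges above, sketched in Python under stated assumptions: ranges are inclusive (first, last) pairs, reversed endpoints are swapped first (so Z-A becomes A-Z), and the list is then sorted by starting point and merged. `normalize` is a hypothetical name, and this covers ranges within a single set only; it is not the fix the author ended up using, and it does not address sorting groups of ranges.

```python
def normalize(ranges):
    """Swap reversed endpoints, sort by start, and merge overlapping ranges."""
    # Step 1: make every range run low-to-high, then sort by starting point.
    ordered = sorted((min(a, b), max(a, b)) for a, b in ranges)

    merged = []
    for first, last in ordered:
        # Step 2: fold a range into the previous one if they overlap or touch.
        if merged and first <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], last))
        else:
            merged.append((first, last))
    return merged


# The two ranges from the post, in both orders.
print(normalize([(0, 12), (10, 0x1FFFF)]))  # [(0, 131071)]
print(normalize([(10, 0x1FFFF), (0, 12)]))  # [(0, 131071)]

# A reversed range such as Z-A comes out as A-Z.
print(normalize([(ord("Z"), ord("A"))]))    # [(65, 90)]
```

Note that this version also merges ranges that are merely adjacent (for example 0-9 and 10-12), which is a design choice rather than a requirement.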

                  Mircea Neacsu wrote (#8, in reply to #1):

                  Might be an ordering issue: \r(0x0d) > \n(0x0a).

                  Mircea

                    honey the codewitch wrote (#9, in reply to #8):

                    It is, but I just can't find where it's creating the problem. I've been kind of avoiding it at the moment.

                      jschell wrote (#10, in reply to #6):

                      honey the codewitch wrote:

                      it's perfectly fine to have a zero length match

                      I didn't say the engine can't do it. I am saying a programmer should never, not for any reason, write expressions like that for an engine.

                        honey the codewitch wrote (#11, in reply to #10):

                        Sure, but in this case, it was sufficient for running down this bug, because of the way the engine works. I did change it to + just to dot my "i"s and cross my "t"s but same result.
