RegEx bug in .NET Framework!!

    Dominic Pettifer
    wrote on last edited by
    #1

    OK, this is interesting: I think I've found a bug in the Regex class in the .NET Framework. Run the following as a simple console app...

    using System.Text.RegularExpressions;

    namespace RegExBug
    {
        class Program
        {
            static void Main(string[] args)
            {
                string emailPattern =
                    @"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@(([0-9a-zA-Z])+([-\w]*[0-9a-zA-Z])*\.)+[a-zA-Z]{2,9})$";
                Regex emailTest = new Regex(emailPattern);

                // The input only fails at the very end: the final "n" is shorter
                // than the two letters that [a-zA-Z]{2,9} requires.
                if (emailTest.IsMatch("sad.couple.skint.tired.fedup@ukgateway.n"))
                {
                    return;
                }
            }
        }
    }

    When execution reaches 'emailTest.IsMatch', the application hangs. It doesn't throw an exception; it just sits there for ages (possibly an infinite loop bug in the Regex class??). Wrapping a try/catch around the call makes no difference either. Does anyone know what could be going on?
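A try/catch has nothing to catch here because the engine never throws; it is simply still working. Later versions of the framework (.NET Framework 4.5 onward, so not available when this thread was written) let a Regex be constructed with a match timeout, which turns the apparent hang into a RegexMatchTimeoutException. The following is a minimal sketch under that assumption; the class and method names (GuardedEmailCheck, TryIsMatch) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GuardedEmailCheck
{
    // Pattern copied unchanged from the original post.
    private const string EmailPattern =
        @"^([0-9a-zA-Z]([-.\w]*[0-9a-zA-Z])*@(([0-9a-zA-Z])+([-\w]*[0-9a-zA-Z])*\.)+[a-zA-Z]{2,9})$";

    // Returns true or false when the match attempt completes,
    // or null when the engine exceeded the timeout and gave up.
    public static bool? TryIsMatch(string input, TimeSpan timeout)
    {
        // The three-argument constructor sets a per-match time limit.
        var regex = new Regex(EmailPattern, RegexOptions.None, timeout);
        try
        {
            return regex.IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            // The pattern was still backtracking when the limit expired.
            return null;
        }
    }
}
```

Calling GuardedEmailCheck.TryIsMatch("sad.couple.skint.tired.fedup@ukgateway.n", TimeSpan.FromSeconds(2)) should come back within roughly the timeout. Whether it returns false or null depends on how quickly the runtime in use gets through the backtracking.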

    Dominic Pettifer Blog: www.dominicpettifer.co.uk

      Judah Gabriel Himango
      wrote on last edited by
      #2

      I just tested this snippet and it appears you're right: the app hangs. Actually, it just takes a really long time, thanks to the nested quantifiers. I searched the Microsoft feedback center; is this the bug you're hitting? Complex Regex evaluation hangs[^]. Microsoft has this to say on the BCL team blog[^]:

      Well, actually, it doesn't hang. It just takes a really really long time, and you haven't waited long enough for it to finish. One of the pitfalls with regular expression is that you can write expressions which don't perform very well. In particular, you can end up with expressions whose search time grows exponentially with the length of the search string. I get bugs reporting that Regex hangs about once a month, and it always turns out to be an exponentially slow expression.

      Here's a simplified example of one of them:

      ([a-z]+)*=

      There are two things interesting about this expression. First, notice that it has two quantifiers nested within each other. The inner one is the + quantifier for the character class, and the outer one is the *. Second, it has a character (the equals char) that must be matched at the end of the result.

      In English terms, this expression can be explained as

      1. match any character a-z, one or more times
      2. match step #1 zero or more times
      3. match an equals

      What will happen is that Regex will breeze through step 1 and 2 only to find that it can't match in step 3. That forces it to backtrack and try to match the first two steps differently. The trouble is that there are a lot of different ways that steps 1 and 2 can match, and Regex needs to try every single one before it can determine that the expression does not match the string.
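To put a number on "a lot of different ways": for a run of n letters, steps 1 and 2 can carve the run into consecutive non-empty pieces in 2^(n-1) ways. The sketch below only counts those splits. It is not how the .NET engine is implemented, it ignores retries from later start positions and any optimizations a particular runtime applies, and the names BacktrackingCost and CountSplits are invented for this example.

```csharp
using System;

public static class BacktrackingCost
{
    // Counts the ways a run of `length` letters can be cut into consecutive
    // non-empty pieces, i.e. the ways ([a-z]+)* can consume the whole run.
    // The result doubles per letter, so it overflows a long past 64 letters.
    public static long CountSplits(int length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));

        // ways[i] = number of ways to split the first i letters.
        var ways = new long[length + 1];
        ways[0] = 1; // empty run: the outer group iterates zero times

        for (int i = 1; i <= length; i++)
        {
            // The last piece may cover the final 1..i letters; what precedes
            // it can be split in ways[i - last] ways.
            for (int last = 1; last <= i; last++)
            {
                ways[i] += ways[i - last];
            }
        }
        return ways[length];
    }
}
```

CountSplits(10) is 512 and CountSplits(30) is 536,870,912: each extra letter doubles the number of candidate matches a naive backtracker may have to reject before it can report failure.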

      Tech, life, family, faith: Give me a visit. I'm currently blogging about: How could God prove Himself to humanity? The apos

        Malcolm Smart
        wrote on last edited by
        #3

        Which all of course leads onto the next question :- What is the best email validating regex?

        "More functions should disregard input values and just return 12. It would make life easier." - comment posted on WTF

          Scott Dorman
          wrote on last edited by
          #4

          For which the correct answer is: There isn't one.

          ----------------------------- In just two days, tomorrow will be yesterday.

            Paul Conrad
            wrote on last edited by
            #5

            I agree. I've seen numerous ones all over the net and they all seem to have some kind of shortcoming. No silver bullet or cure-all, I guess.

            "Any sort of work in VB6 is bound to provide several WTF moments." - Christian Graus

              Dan Neely
              wrote on last edited by
              #6

              There's one in the RFC that is supposed to catch all the curlicues that are allowed if almost never implemented. It's gargantuan, about the size of a large paragraph. :omg::wtf::omg:

              -- You have to explain to them [VB coders] what you mean by "typed". their first response is likely to be something like, "Of course my code is typed. Do you think i magically project it onto the screen with the power of my mind?" --- John Simmons / outlaw programmer

                Judah Gabriel Himango
                wrote on last edited by
                #7

                Scott Dorman wrote:

                There isn't one.

                No, there is one, it's just really big: :)

                (?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

                That matches all RFC 2822 email addresses. However, if you want a regex that validates 99% of the email addresses out there and still performs well, this one[^] should work well. *edit* dang smileys :-p
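On the "still performs well" point, one way to get there is to restructure the pattern from post #1 so that each part of the address can be consumed in only one way. The sketch below does that by alternating runs of letters/digits with runs of separators, which removes the overlapping nested quantifiers. It is an illustration, not the pattern linked above, and it is not claimed to accept exactly the same strings as the original (for instance, \w in .NET also admits non-ASCII letters). The names SimpleEmailCheck and LooksLikeEmail are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimpleEmailCheck
{
    // Local part: runs of letters/digits separated by runs of '-', '.', '_'.
    // Domain: one or more dot-terminated labels, then a 2 to 9 letter TLD.
    // The letter/digit class and the separator class never overlap, so each
    // run can be matched in only one way.
    private static readonly Regex Pattern = new Regex(
        @"^[0-9a-zA-Z]+(?:[-._]+[0-9a-zA-Z]+)*" +
        @"@(?:[0-9a-zA-Z]+(?:[-_]+[0-9a-zA-Z]+)*\.)+" +
        @"[a-zA-Z]{2,9}\z", // \z: end of input only, no trailing newline
        RegexOptions.CultureInvariant);

    public static bool LooksLikeEmail(string input)
    {
        return input != null && Pattern.IsMatch(input);
    }
}
```

With this shape, the failing input from post #1 should be rejected promptly, because after each failed attempt the engine has no alternative way to re-divide the earlier text. Like any short pattern, it is a plausibility check, not full RFC validation.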

                Tech, life, family, faith: Give me a visit. I'm currently blogging about: How could God prove Himself to humanity? The apostle Paul, modernly speaking: Epistles of Paul Judah Himango

                  Paul Conrad
                  wrote on last edited by
                  #8

                  dan neely wrote:

                  It's gargantuan, about the size of a large paragraph.

                  Holy moly :eek:

                  "Any sort of work in VB6 is bound to provide several WTF moments." - Christian Graus

                    Paul Conrad
                    wrote on last edited by
                    #9

                    :-D I forgot about www.regular-expressions.info[^], time to bookmark it :-O

                    "Any sort of work in VB6 is bound to provide several WTF moments." - Christian Graus

                      Michael Sync
                      wrote on last edited by
                      #10

                      thanks. it's very interesting....

                      Thanks and Regards, Michael Sync ( Blog: http://michaelsync.net)
