Regular expression not matching as expected

    tgrt wrote (#1):

    I have a regular expression that isn't matching the text into a capture group as expected. What makes this odd is that it works fine in Expresso, but not at run-time in the application. The problem is that the "ErrorMessage" capture has a value of "Error processing" at run-time; however, in Expresso it has "Error processing CallRaisesError", which is what I expect. What am I missing here?

    Regular Expression: \A\d{2}[\s\t]+(?.[^\t]*)[\s\t]+(?:.[^,]*,[\s\t])*(?:.[^;]*;)[\s\t](?\d*)[;][\s\t]+{DONE}\z
    Test data: 01\tError processing CallRaisesError\t; 10000; {DONE}
    Options: RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase | RegexOptions.Multiline

      PIEBALDconsult wrote (#2):

      I don't know, I'm fairly new to regular expressions as well. But it doesn't help that the post has an unfortunate smiley in it. You may want to modify the post with the code within a pre block (or whatever the proper term is). Also, be sure that things like \t are understood by the processor; you may need to use a literal string (preceded by an @ sign): "abc\tdef" will be different from @"abc\tdef" and "abc\\tdef".

        tgrt wrote (#3):

        Neither the pre nor the code blocks do anything about the smiley. Apparently, you're stuck with it. I figured it would be easy to ascertain that the winking smiley is a "; )" (just remove the space). The \t is a tab character in C# and is used there as such to illustrate the delimiter in the output.

          PIEBALDconsult wrote (#4):

          tgrt wrote:

          I figured it would be easy to ascertain that the winking smiley is a "; )"

          I didn't know whether or not it was safe to do so.

          tgrt wrote:

          The \t is a tab character

          Yes, but do you want the regex to contain a TAB character or the literal \t? If the latter (as I expect), then you need to either use a literal string or escape the backslash (\\t).

            tgrt wrote (#5):

            PIEBALDconsult wrote:

            Yes, but do you want the regex to contain a TAB character or the literal \t? If the latter (as I expect), then you need to either use a literal string or escape the backslash (\\t).

            The purpose of the \t in the regular expression is to match a tab character, and the \t in the test data illustrates that there is actually a tab character there (basically the string as you'd see it in the locals window, for instance). The tabs match in the expression without fail. The problem is that when run as part of my application (exact same regular expression and options as noted) it doesn't match part of the message, whereas in Expresso (a regular expression build/test application) it does.

              PIEBALDconsult wrote (#6):

              Yes, I understand that. Let's see...

              Fact 1: The text to be tested contains TAB characters that need to be matched.
              Fact 2: To match TAB characters with a regular expression you specify \t (not an actual TAB character).
              Fact 3: The string produced by the C# statement string s="\t"; will contain a TAB character, not the desired \t (even though the debugger may show it as \t).
              Fact 4: To produce a string in C# that does contain the two characters \t, use either string s=@"\t"; or string s="\\t";

              If that's not the problem, then I don't know what is and I hope someone else steps up.
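For readers who want to check Facts 3 and 4 themselves, here is a minimal sketch. The class and member names are made up for illustration and are not from the thread. It also shows that, with default options, the .NET engine accepts a raw TAB in a pattern, so the string-literal form matters most for escapes like \d and \s, which a regular C# string rejects at compile time.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of the original poster's code.
public static class TabEscapeDemo
{
    public const string Regular = "\t";    // one character: a real TAB (U+0009)
    public const string Verbatim = @"\t";  // two characters: backslash, then 't'
    public const string Escaped = "\\t";   // the same two characters as Verbatim

    // True when the pattern finds a match in a string containing a real TAB.
    public static bool MatchesTab(string pattern) => Regex.IsMatch("abc\tdef", pattern);

    public static void Main()
    {
        Console.WriteLine(Regular.Length);        // 1
        Console.WriteLine(Verbatim.Length);       // 2
        Console.WriteLine(Verbatim == Escaped);   // True
        Console.WriteLine(MatchesTab(Verbatim));  // True: the engine reads \t as TAB
        Console.WriteLine(MatchesTab(Regular));   // True: a raw TAB in the pattern matches itself
    }
}
```

Either way the tabs in the sample data get matched, which fits tgrt's report that the tabs were never the part that failed.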

                tgrt wrote (#7):

                Yes on all counts. The regular expressions in code are @-quoted. Here's a snippet of the declarations within the factory class (there are other types of messages in different formats that can be received, and I use these expressions to identify and parse them). You can tell which one is at issue, because it comes up with the smiley! :)

                private const string _patternReturnDataSet = @"\A\d{2}[\s\t]+(?\w+[,\t]\d+[,\t]\d*[,\t])+[\n\r\f]+(?(?:[^,\t\n\r\f]*[,\t])+[\n\r\f]+)*{DONE}\z";
                private const string _patternReturnVariant = @"\A\d{2}[\s\t]+(?.[^\t]*){DONE}\z";
                private const string _patternError = @"\A\d{2}[\s\t]+(?.[^\t]*)[\s\t]+(?:.[^,]*,[\s\t])*(?:.[^;]*;)[\s\t](?\d*)[;][\s\t]+{DONE}\z";
                private const string _patternInvalidCommand = @"\A(?:\d{2}[\s\t]+)?(?.[^;]*)[;\s\t]+{DONE}\z";
                private const RegexOptions _options = RegexOptions.Compiled | RegexOptions.CultureInvariant | RegexOptions.IgnoreCase | RegexOptions.Multiline;
                
                  PIEBALDconsult wrote (#8):

                  Huh, alright, now I'm stumped. But I'll try this out myself and maybe learn something. Fortunately, this latest post also came into my email so I have a no-smiley version of the line to use.

                  -- modified at 11:52 Saturday 16th June, 2007

                  Hmmm, I get:

                  System.Text.RegularExpressions.RegexParser(ScanGroupOpen)parsing "\A\d{2}[\s\t]+(?.[^\t]*)[\s\t]+(?:.[^,]*,[\s\t])*(?:.[^;]*;)[\s\t](?\d*)[;][\s\t]+{DONE}\z" - Unrecognized grouping construct.

                  -- modified at 12:17 Saturday 16th June, 2007

                  I added names for the capturing groups (I assume you specified names but used angle-brackets, try apostrophes):

                  @"\A\d{2}[\s\t]+(?'ErrorMessage'.[^\t]*)[\s\t]+(?:.[^,]*,[\s\t])*(?:.[^;]*;)[\s\t](?'Number'\d*)[;][\s\t]+{DONE}\z"

                  But it doesn't match... yet.

                  -- modified at 12:35 Saturday 16th June, 2007

                  OK, it matches, and misbehaves as described.

                  -- modified at 12:51 Saturday 16th June, 2007

                  The missing text is going to the second non-capturing group (I made it capture just to see). It appears the first group isn't being greedy enough.

                  -- modified at 13:02 Saturday 16th June, 2007

                  This

                  @"\A\d\d\t(?'ErrorMessage'[^\t]*)\t;\s*(?'Number'\d*);\s*{DONE}\z"

                  works on the sample provided.

                  -- modified at 13:16 Saturday 16th June, 2007

                  First, I'll say that \s includes \t, so [\s\t] can be simplified to \s

                  Other than that, I changed this V from a + to a * and it seems to work

                  @"\A\d{2}\s+(?'ErrorMessage'.[^\t]*)\s*(?:.[^,]*,\s)*(?:.[^;]*;)\s(?'Number'\d*);\s+{DONE}\z"

                  I would also remove these ^ ^ ^
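Two of the claims in this post are easy to check in isolation. The sketch below is only an illustration: the class and member names are invented here, and the sample string is rebuilt from the test data in post #1, on the assumption that each \t shown there stands for a real TAB. It checks that \s on its own matches a TAB, and runs the simplified 13:02 pattern against the sample.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names are illustrative only.
public static class WhitespaceClassDemo
{
    // Test data from post #1, with each \t taken to be a real TAB.
    public const string Sample = "01\tError processing CallRaisesError\t; 10000; {DONE}";

    // The simplified pattern from the 13:02 edit, copied as posted.
    public const string SimplePattern =
        @"\A\d\d\t(?'ErrorMessage'[^\t]*)\t;\s*(?'Number'\d*);\s*{DONE}\z";

    // \s already covers TAB, so [\s\t] adds nothing over \s.
    public static bool WhitespaceMatchesTab() => Regex.IsMatch("\t", @"\A\s\z");

    public static Match MatchSample() => Regex.Match(Sample, SimplePattern);

    public static void Main()
    {
        Console.WriteLine(WhitespaceMatchesTab());           // True
        Match m = MatchSample();
        Console.WriteLine(m.Success);                        // True
        Console.WriteLine(m.Groups["ErrorMessage"].Value);   // Error processing CallRaisesError
        Console.WriteLine(m.Groups["Number"].Value);         // 10000
    }
}
```

The simplified pattern only accepts a single TAB as the delimiter, so it suits this one sample rather than the looser legacy format tgrt describes later in the thread.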

                    PIEBALDconsult wrote (#9):

                    Just trying again, to see if I can have the pre tags but no smiley

                    Other than that, I changed this V from a + to a * and it seems to work
                    @"\A\d{2}\s+(?'ErrorMessage'.[^\t]*)\s*(?:.[^,]*,\s)*(?:.[^;]*; )\s(?'Number'\d*);\s+{DONE}\z"
                    I would also remove these ^ ^ ^

                    I guess not, so I added the SPACE.

                      tgrt wrote (#10):

                      Thanks for all your work on this. I'm stumped too. I hoped that I was missing something obvious. I'm actually using Regex.Match for the work I'm doing. The capturing groups seem to work fine except for the one mentioned whether I use the < or the '. I cannot get it to match if I leave out the \t -- I wonder why it works for you, are you using the same Regex options that I am? (The \s is in there as well, because the delimiter isn't necessarily always a \t -- it's a crappy legacy system I have nothing to do with.) So far no luck. This is very odd.

                        PIEBALDconsult wrote (#11):

                        I think the culprit is that +, but I don't know why it would work in one system and not the other. I am using the same options, and using .Matches to be sure I get only the one match.

                          tgrt wrote (#12):

                          I used the +, because I want to match one or more. I cannot rely on the legacy application to have only one, but it must have at least one. I'm nervous about cascading problems by switching the + to a * that I may not discover until later.

                            PIEBALDconsult wrote (#13):

                            Yes, but it's causing the problem. Because you require at least one, it uses the SPACE that should be part of the first group. By allowing zero, that doesn't happen. All in all, that regex is more complex than needed for the given sample. Are there other samples you can provide?
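The behaviour described above can be reproduced with a short sketch. Several things here are assumptions: the group names (the forum stripped the originals, so ErrorMessage comes from post #1 and Number is PIEBALDconsult's guess), the class and member names, and the sample string, which is rebuilt from post #1 with each \t taken to be a real TAB.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names are illustrative only.
public static class GreedyBacktrackDemo
{
    public const RegexOptions Options = RegexOptions.Compiled | RegexOptions.CultureInvariant
                                      | RegexOptions.IgnoreCase | RegexOptions.Multiline;

    // Test data from post #1, with each \t taken to be a real TAB.
    public const string Sample = "01\tError processing CallRaisesError\t; 10000; {DONE}";

    // Original shape: at least one whitespace character is required after the message.
    public const string PlusPattern =
        @"\A\d{2}[\s\t]+(?<ErrorMessage>.[^\t]*)[\s\t]+(?:.[^,]*,[\s\t])*(?:.[^;]*;)[\s\t](?<Number>\d*)[;][\s\t]+{DONE}\z";

    // Revised shape from post #8: the separator after the message may be empty.
    public const string StarPattern =
        @"\A\d{2}\s+(?<ErrorMessage>.[^\t]*)\s*(?:.[^,]*,\s)*(?:.[^;]*;)\s(?<Number>\d*);\s+{DONE}\z";

    // Returns the ErrorMessage capture, or null when the pattern does not match.
    public static string ErrorMessage(string pattern)
    {
        Match m = Regex.Match(Sample, pattern, Options);
        return m.Success ? m.Groups["ErrorMessage"].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ErrorMessage(PlusPattern)); // Error processing
        Console.WriteLine(ErrorMessage(StarPattern)); // Error processing CallRaisesError
    }
}
```

One way to read the result: with the +, the separator takes the TAB after the full message, so the mandatory `.` in `(?:.[^;]*;)` has to take the first semicolon and the group runs on to the second one, leaving no digits and no semicolon for the rest of the pattern. The engine then backtracks until the separator can land on the space before CallRaisesError. With the *, the separator can match nothing, the `.` takes the TAB, and the group ends at the first semicolon.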

                              tgrt wrote (#14):

                              I'm finally getting back to this after several projects sidelined the longer-term project that this regular expression is used in. It seems using the * does the trick -- all my unit tests are passing. I gave you a "5". Sorry it's belated.

                                PIEBALDconsult wrote (#15):

                                Prot a noblem, glad it worked.
