C# RegEx Match Groups

  • S Shog9 0

    peterchen wrote:

    Anytime I read anything about RegEx, I wonder why anyone in their right mind is using them.

    IMHO, they're great when you need to code up something that's almost intuitive for humans to do: recognize certain patterns in text, and then do something interesting with them. Parsing code tends to be simultaneously complex and boring - you can spend half an hour reading screen-scraper code, to be left with something as simple as "pick the numbers after the names, and associate them with the names". A half-dozen simple regexps, with a comment preceding them explaining their purpose, can be recognized and skipped in seconds, leaving me free to use the rest of my time to read and understand the interesting portions of the code. (and yes, ideally the program would be structured such that parsing code was separate from processing code, but sometimes it seems that's just too much to ask... )

    ---- Do you see what i see? Why do we live like this? Is it because it's true... ...That ignorance is bliss?

    peterchen
    wrote on last edited by
    #8

    I have a few helper functions that, ever since I wrote them, do everything I need. They are based on "remove and return". It may be 50 lines instead of 5, but I feel much safer. :cool: I've always been curious about RegExes. They are cool. But the "implementation differences" are plain scary, and you end up with a lot of write-only code.


    Developers, Developers, Developers, Developers, Developers, Developers, Velopers, Develprs, Developers!
    We are a big screwed up dysfunctional psychotic happy family - some more screwed up, others more happy, but everybody's psychotic joint venture definition of CP
    Linkify!|Fold With Us!
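
The post does not show what a "remove and return" helper looks like, so the following is only a guess at the idea: a function that cuts a leading run of characters off a string and hands it back. The names `TextHelpers` and `TakeWhile` are hypothetical and are not peterchen's actual code.

```csharp
using System;

static class TextHelpers
{
    // Hypothetical "remove and return" helper: removes the leading run of
    // characters accepted by `accept` from `text` and returns that run.
    // The returned string is empty if the first character is not accepted.
    public static string TakeWhile(ref string text, Func<char, bool> accept)
    {
        int length = 0;
        while (length < text.Length && accept(text[length]))
        {
            length++;
        }
        string taken = text.Substring(0, length);
        text = text.Substring(length);
        return taken;
    }

    static void Main()
    {
        // "Pick the number after the name", without a regex.
        string line = "alice 42";
        string name = TakeWhile(ref line, char.IsLetter);
        TakeWhile(ref line, char.IsWhiteSpace);   // discard the separator
        string number = TakeWhile(ref line, char.IsDigit);
        Console.WriteLine(name + " -> " + number);
    }
}
```

Each call consumes part of the input, so the parsing reads top to bottom at the cost of more lines than the equivalent pattern.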

    • P peterchen

      Anytime I read anything about RegEx, I wonder why anyone in their right mind is using them. One thousand gotos cannot cause that much havoc.



      Lost User
      wrote on last edited by
      #9

      What? I hope you meant this ironically :cool: Ever tried to parse incoming messages like the IRC protocol? Works with string functions, but it's a royal pain in the a**, imho. They're also great for validating certain strings. Regards

        peterchen
        wrote on last edited by
        #10

        No :cool: With the right helpers, it's quite readable / maintainable code. Not as compact as regexes, though. I haven't needed "find substring" lately, and performance wasn't the most important concern, so YMMV. [edit]I find that if you have to validate the string, RegExes become terribly complex. That's the main reason I avoid them.[/edit]



          Lost User
          wrote on last edited by
          #11

          peterchen wrote:

          No With the right helpers, it's quite readable / maintainable code. Not as compact as regexes, though.

          Depends on the Regex:

          ^([0-9]( |-)?)?(\(?[0-9]{3}\)?|[0-9]{3})( |-)?([0-9]{3}( |-)?[0-9]{4}|[a-zA-Z0-9]{7})$

          I don't like this one, either :wtf:

            Todd Smith
            wrote on last edited by
            #12

            I couldn't live without them. I'd hate to write parsers every time I wanted to extract some small piece of info from a file. As an example, in our build system we always extract the version number of the product or library from some version.h file. The format of that file has not been standardized so we need a different regex for each one and there's 10+ of them and growing. I think I ran into this regex problem the other day with groups while parsing version numbers. I couldn't figure out why my values were goofed up. I kept expecting group[0] to have the first item and it didn't. I think I ended up using named group items and worked around it that way.

            Todd Smith
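
As a rough illustration of the named-group workaround, here is a minimal sketch. The `version.h` line format, the `PRODUCT_VERSION` macro name, and the `VersionDemo` class are all assumptions made for this example; the thread does not show the real files or patterns.

```csharp
using System;
using System.Text.RegularExpressions;

static class VersionDemo
{
    // Assumed line format (not from the thread):
    //   #define PRODUCT_VERSION "1.4.2"
    // Named groups mean the code never depends on numeric group indexes.
    static readonly Regex VersionLine = new Regex(
        @"#define\s+\w*VERSION\w*\s+""(?<major>\d+)\.(?<minor>\d+)\.(?<patch>\d+)""");

    // Returns "major.minor.patch", or null if no version line is found.
    public static string ExtractVersion(string headerText)
    {
        Match match = VersionLine.Match(headerText);
        if (!match.Success)
        {
            return null;
        }
        return match.Groups["major"].Value + "."
             + match.Groups["minor"].Value + "."
             + match.Groups["patch"].Value;
    }

    static void Main()
    {
        string header = "// sample header\n#define PRODUCT_VERSION \"1.4.2\"\n";
        Console.WriteLine(ExtractVersion(header));
    }
}
```

Looking groups up by name sidesteps the question of what `Groups[0]` contains, which is probably why the workaround helped.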

              peterchen
              wrote on last edited by
              #13

              They can be quite handy, no doubt, esp. if you are "fluent" in them. (I'm not)



              • C Chris Maunder

                Chris Losinger wrote:

                so it's, like, an array, see, of sub-matches, yeah? but the first element in the array is, like, totally something else, instead

                I see your problem. You're reading the ValleyGirl .NET documentation, not the Visual C# .NET docs.

                cheers, Chris Maunder

                CodeProject.com : C++ MVP

                Maximilien
                wrote on last edited by
                #14

                :laugh:


                Maximilien Lincourt Your Head A Splode - Strong Bad

                • C Chris Losinger

                  (from memory, don't have the code in front of me) i spent a good two hours yesterday trying to port a bit of Javascript RegEx code to C#. the RegExp has four groups "(blah1)|(blah2)|(blah3)|(blah4)" and the JScript was picking results out of the captures like:

                  if (RegExp.$1 != "") ...
                  else if (RegExp.$2 != "") ...
                  else if (RegExp.$3 != "") ...
                  else if (RegExp.$4 != "") ...

                  so, it seemed like a pretty straightforward conversion, i could just do the C# Regex match, then pull the sub-matches out of the Match.Groups array:

                  match = regexp.Match(...);
                  if (match.Groups[0] != "")...
                  else if (match.Groups[1] != "")...
                  etc

                  but that didn't work - the first group (Groups[0]) was always set, even when the first RegExp subexpression shouldn't have matched anything. and there was always another Group element set, too, but not the one i was expecting... was my RegExp broken? was the concept of matched subgroups different in C#'s Regex code than in Javascript's ? the Groups MSDN entry says:

                  Group represents the results from a single capturing group. A capturing group can capture zero, one, or more strings in a single match because of quantifiers, so Group supplies a collection of Capture objects.

                  sounds like the right object to use... so, where are my &^$%^*% sub-matches ? after a couple of hours of Internet searches and head-scratching, i figured it out: what MS's documentation doesn't tell you (at least not in any place i could find) is that Groups[0] is actually the entire match, not just the first sub-group. the sub-matches actually start with match.Groups[1]. so it's, like, an array, see, of sub-matches, yeah? but the first element in the array is, like, totally something else, instead. and it's not documented! fantastic! :omg: after i figured this out, i was able to find the place in the MSDN where they mention this fact (because i knew what to search for) - it's in an overview article about RegEx usage, not on the pages for the Regex, Match or Group objects themselves. it seems like something they'd want to document right there on the pages for the relevant classes, in big bold letters. -- modified at 14:42 Saturday 2nd December, 2006

                  image processing | batch image processing | blogging
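
The behaviour described in the post can be reproduced with a short sketch. The alternatives `alpha`, `beta`, `gamma` and `delta` are placeholders standing in for the elided "(blah1)|(blah2)|(blah3)|(blah4)" expression, and `AlternationDemo` is a name invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class AlternationDemo
{
    // Four alternatives, each in its own capturing group (placeholder words).
    static readonly Regex Pattern = new Regex("(alpha)|(beta)|(gamma)|(delta)");

    // Returns the 1-based number of the alternative that matched, or 0 if none did.
    public static int MatchedAlternative(string input)
    {
        Match match = Pattern.Match(input);
        if (!match.Success)
        {
            return 0;
        }
        // Groups[0] is the whole match, so the capturing groups start at index 1.
        for (int i = 1; i < match.Groups.Count; i++)
        {
            if (match.Groups[i].Success)
            {
                return i;
            }
        }
        return 0;
    }

    static void Main()
    {
        Match match = Pattern.Match("see gamma here");
        Console.WriteLine(match.Groups[0].Value);    // the entire match
        Console.WriteLine(match.Groups[1].Success);  // did "(alpha)" take part?
        Console.WriteLine(MatchedAlternative("see gamma here"));
    }
}
```

Testing `Group.Success` rather than comparing against an empty string also distinguishes a group that did not participate from one that matched zero characters.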

                  Andy Brummer
                  wrote on last edited by
                  #15

                  Everyone knows you should write loops like

                  for i = 1 to Groups.Length
                  next

                  I don't see what the problem is. :-D

                  Using the GridView is like trying to explain to someone else how to move a third person's hands in order to tie your shoelaces for you. -Chris Maunder

                    Andy Brummer
                    wrote on last edited by
                    #16

                    Todd Smith wrote:

                    I think I ended up using named group items and worked around it that way.

                    Ah, that's why I've never run into this. I've always used named groups. Probably because I ran into this and just forgot about it.

                    Using the GridView is like trying to explain to someone else how to move a third person's hands in order to tie your shoelaces for you. -Chris Maunder

                      Monkeyget2
                      wrote on last edited by
                      #17

                      That's where inline comments come in (see http://www.regular-expressions.info/comments.html ). A good trick described at http://www.codeproject.com/dotnet/RegexTutorial.asp is:

                      "Comments please

                      Another use of parentheses is to include comments using the "(?#comment)" syntax. A better method is to set the "Ignore Pattern Whitespace" option, which allows whitespace to be inserted in the expression and then ignored when the expression is used. With this option set, anything following a number sign "#" at the end of each line of text is ignored. For example, we can format the preceding example like this:

                      31. Text between HTML tags, with comments

                      (?<=       # Search for a prefix, but exclude it
                        <(\w+)>  # Match a tag of alphanumerics within angle brackets
                      )          # End the prefix
                      .*         # Match any text
                      (?=        # Search for a suffix, but exclude it
                        <\/\1>   # Match the previously captured tag preceded by "/"
                      )          # End the suffix"
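
In C#, the "Ignore Pattern Whitespace" option corresponds to `RegexOptions.IgnorePatternWhitespace`. The sketch below applies it to the quoted pattern; the class name `CommentedRegexDemo`, the helper `TextBetweenTags`, and the sample input are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class CommentedRegexDemo
{
    // The quoted pattern, laid out over several lines in a verbatim string.
    // With IgnorePatternWhitespace, unescaped whitespace is skipped and
    // everything from an unescaped '#' to the end of the line is a comment.
    static readonly Regex BetweenTags = new Regex(@"
        (?<=         # Search for a prefix, but exclude it
          <(\w+)>    # Match a tag of alphanumerics within angle brackets
        )            # End the prefix
        .*           # Match any text
        (?=          # Search for a suffix, but exclude it
          <\/\1>     # Match the previously captured tag preceded by a slash
        )            # End the suffix",
        RegexOptions.IgnorePatternWhitespace);

    // Returns the text between the first matching pair of tags, or null.
    public static string TextBetweenTags(string html)
    {
        Match match = BetweenTags.Match(html);
        return match.Success ? match.Value : null;
    }

    static void Main()
    {
        Console.WriteLine(TextBetweenTags("<b>bold</b>"));
    }
}
```

Because the whitespace is ignored, a literal space in such a pattern has to be written as `\ ` or `[ ]`, which is easy to overlook when converting an existing expression.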
