C# RegEx Match Groups
-
peterchen wrote:
Anytime I read anything about RegEx, I wonder why anyone in their right mind is using them.
IMHO, they're great when you need to code up something that's almost intuitive for humans to do: recognize certain patterns in text, and then do something interesting with them. Parsing code tends to be simultaneously complex and boring - you can spend half an hour reading screen-scraper code, to be left with something as simple as "pick the numbers after the names, and associate them with the names". A half-dozen simple regexps, with a comment preceding them explaining their purpose, can be recognized and skipped in seconds, leaving me free to use the rest of my time to read and understand the interesting portions of the code. (and yes, ideally the program would be structured such that parsing code was separate from processing code, but sometimes it seems that's just too much to ask... )
I have a few functions that, now that I have them, do everything I need. They are based on "remove and return". It may be 50 lines instead of 5, but I feel much safer. :cool: I've always been curious about RegExes. They are cool. But the "implementation differences" are plain scary, and you end up with a lot of write-only code.
Developers, Developers, Developers, Developers, Developers, Developers, Velopers, Develprs, Developers!
We are a big screwed up dysfunctional psychotic happy family - some more screwed up, others more happy, but everybody's psychotic joint venture definition of CP
Linkify!|Fold With Us! -
Anytime I read anything about RegEx, I wonder why anyone in their right mind is using them. One thousand gotos cannot cause that much havoc.
-
What? I hope you meant this ironically :cool: Ever tried to parse incoming messages like the IRC protocol? Works with string functions, but it's a royal pain in the a**, imho. They're also great for validating certain strings. Regards
No :cool: With the right helpers, it's quite readable / maintainable code. Not as compact as regexes, though. I haven't needed "find substring" lately, and performance wasn't the most important concern, so YMMV. [edit]I find that if you have to validate the string, RegExes become terribly complex. That's the main reason I avoid them.[/edit]
-
peterchen wrote:
No With the right helpers, it's quite readable / maintainable code. Not as compact as regexes, though.
Depends on the Regex
^([0-9]( |-)?)?(\(?[0-9]{3}\)?|[0-9]{3})( |-)?([0-9]{3}( |-)?[0-9]{4}|[a-zA-Z0-9]{7})$
I don't like this one, either :wtf:
-
I couldn't live without them. I'd hate to write parsers every time I wanted to extract some small piece of info from a file. As an example, in our build system we always extract the version number of the product or library from some version.h file. The format of that file has not been standardized, so we need a different regex for each one, and there are 10+ of them and growing. I think I ran into this regex problem the other day with groups while parsing version numbers. I couldn't figure out why my values were goofed up. I kept expecting group[0] to have the first item and it didn't. I think I ended up using named group items and worked around it that way.
Todd Smith
-
They can be quite handy, no doubt, esp. if you are "fluent" in them. (I'm not)
-
Chris Losinger wrote:
so it's, like, an array, see, of sub-matches, yeah? but the first element in the array is, like, totally something else, instead
I see your problem. You're reading the ValleyGirl .NET documentation, not the Visual C# .NET docs.
cheers, Chris Maunder
CodeProject.com : C++ MVP
-
(from memory, don't have the code in front of me) i spent a good two hours yesterday trying to port a bit of Javascript RegEx code to C#. the RegExp has four groups "(blah1)|(blah2)|(blah3)|(blah4)" and the JScript was picking results out of the captures like:
if (RegExp.$1!="") ...
else if (RegExp.$2!="") ...
else if (RegExp.$3!="") ...
else if (RegExp.$4!="") ...
so, it seemed like a pretty straightforward conversion. i could just do the C# Regex match, then pull the sub-matches out of the Match.Groups array:
match = regexp.Match(...);
if (match.Groups[0].Value != "")...
else if (match.Groups[1].Value != "")...
etc
but that didn't work - the first group (Groups[0]) was always set, even when the first RegExp subexpression shouldn't have matched anything. and there was always another Group element set, too, but not the one i was expecting... was my RegExp broken? was the concept of matched subgroups different in C#'s Regex code than in Javascript's? the Groups MSDN entry says:
Group represents the results from a single capturing group. A capturing group can capture zero, one, or more strings in a single match because of quantifiers, so Group supplies a collection of Capture objects.
sounds like the right object to use... so, where are my &^$%^*% sub-matches?
after a couple of hours of Internet searches and head-scratching, i figured it out: what MS's documentation doesn't tell you (at least not in any place i could find) is that Groups[0] is actually the entire match, not just the first sub-group. the sub-matches actually start with match.Groups[1]. so it's, like, an array, see, of sub-matches, yeah? but the first element in the array is, like, totally something else, instead. and it's not documented! fantastic! :omg:
after i figured this out, i was able to find the place in the MSDN where they mention this fact (because i knew what to search for) - it's in an overview article about RegEx usage, not on the pages for the Regex, Match or Group objects themselves. it seems like something they'd want to document right there on the pages for the relevant classes, in big bold letters.
-- modified at 14:42 Saturday 2nd December, 2006
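For anyone landing here with the same port, the indexing described above can be sketched roughly as follows. The four-way pattern is a made-up stand-in for "(blah1)|(blah2)|(blah3)|(blah4)", and the class and method names are hypothetical. The sketch tests Group.Success instead of comparing against an empty string, on the assumption that the intent is "did this alternative take part in the match".

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Hypothetical four-way alternation standing in for
    // "(blah1)|(blah2)|(blah3)|(blah4)" from the post.
    private static readonly Regex Pattern =
        new Regex(@"(\d+)|([a-z]+)|([A-Z]+)|(\s+)");

    // Returns the number (1-4) of the capturing group that took part in
    // the first match, or 0 if the pattern did not match at all.
    public static int MatchedAlternative(string input)
    {
        Match match = Pattern.Match(input);
        if (!match.Success) return 0;

        // Groups[0] is the whole match, so the parenthesised
        // sub-expressions are numbered from 1.
        for (int i = 1; i < match.Groups.Count; i++)
        {
            if (match.Groups[i].Success) return i;
        }
        return 0;
    }

    public static void Main()
    {
        Match m = Pattern.Match("hello");
        Console.WriteLine(m.Groups[0].Value);    // the entire match
        Console.WriteLine(m.Groups[2].Value);    // text captured by the second sub-expression
        Console.WriteLine(m.Groups[1].Success);  // the first sub-expression did not take part
        Console.WriteLine(MatchedAlternative("HELLO"));
    }
}
```

In other words, JavaScript's $1 corresponds to Groups[1] in C#, $2 to Groups[2], and so on, while Groups[0] plays the role of the full matched text.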
Everyone knows you should write loops like
for i = 1 to Groups.Length
next
I don't see what the problem is. :-D
Using the GridView is like trying to explain to someone else how to move a third person's hands in order to tie your shoelaces for you. -Chris Maunder
-
Todd Smith wrote:
I think I ended up using named group items and worked around it that way.
Ah, that's why I've never run into this. I've always used named groups. Probably because I ran into this and just forgot about it.
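A rough sketch of the named-group approach, for illustration only: the thread never shows the real version.h files, so the "#define PRODUCT_VERSION" line format, the group names and the class name below are all assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VersionDemo
{
    // Assumed line format: #define PRODUCT_VERSION "major.minor.build"
    // (hypothetical; each real version.h would need its own pattern).
    private static readonly Regex VersionLine = new Regex(
        @"#define\s+PRODUCT_VERSION\s+""(?<major>\d+)\.(?<minor>\d+)\.(?<build>\d+)""");

    // Returns the parsed version, or null when no matching line is found.
    public static Version Parse(string headerText)
    {
        Match m = VersionLine.Match(headerText);
        if (!m.Success) return null;

        // Looking groups up by name sidesteps the question of which
        // numeric index holds which sub-match.
        return new Version(
            int.Parse(m.Groups["major"].Value),
            int.Parse(m.Groups["minor"].Value),
            int.Parse(m.Groups["build"].Value));
    }

    public static void Main()
    {
        Version v = Parse("#define PRODUCT_VERSION \"2.5.117\"");
        Console.WriteLine(v);
    }
}
```

With names in place, reordering or adding groups in the pattern does not shift the lookups, which is probably why the Groups[0] surprise never shows up.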
-
peterchen wrote:
No With the right helpers, it's quite readable / maintainable code. Not as compact as regexes, though.
Depends on the Regex
^([0-9]( |-)?)?(\(?[0-9]{3}\)?|[0-9]{3})( |-)?([0-9]{3}( |-)?[0-9]{4}|[a-zA-Z0-9]{7})$
I don't like this one, either :wtf:
That's where inline comments come in (see http://www.regular-expressions.info/comments.html). A good trick is described at http://www.codeproject.com/dotnet/RegexTutorial.asp:
"Comments please
Another use of parentheses is to include comments using the "(?#comment)" syntax. A better method is to set the "Ignore Pattern Whitespace" option, which allows whitespace to be inserted in the expression and then ignored when the expression is used. With this option set, anything following a number sign "#" at the end of each line of text is ignored. For example, we can format the preceding example like this:
31. Text between HTML tags, with comments
(?<=        # Search for a prefix, but exclude it
  <(\w+)>   # Match a tag of alphanumerics within angle brackets
)           # End the prefix
.*          # Match any text
(?=         # Search for a suffix, but exclude it
  <\/\1>    # Match the previously captured tag preceded by "/"
)           # End the suffix"
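As a sketch of how that option might be applied to the phone-number pattern quoted above: this is a simplified rewrite, not the original expression. It assumes only a yes/no answer is needed, so the capturing groups become (?: ), the redundant second "[0-9]{3}" alternative is dropped, and "( |-)" is written as "[ -]" because bare spaces in the pattern are ignored under this option while spaces inside a character class stay literal. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentedRegexDemo
{
    // Simplified, commented variant of the phone-number pattern from the thread.
    private static readonly Regex Phone = new Regex(@"
        ^
        (?: [0-9] [ -]? )?             # optional single leading digit and separator
        \(? [0-9]{3} \)?               # three digits, optionally in parentheses
        [ -]?                          # optional separator
        (?: [0-9]{3} [ -]? [0-9]{4}    # three digits, optional separator, four digits
          | [a-zA-Z0-9]{7}             # or seven letters and digits
        )
        $",
        RegexOptions.IgnorePatternWhitespace);

    // True when the whole string fits the pattern above.
    public static bool IsPhoneNumber(string s)
    {
        return Phone.IsMatch(s);
    }

    public static void Main()
    {
        Console.WriteLine(IsPhoneNumber("1-800-555-1234"));
        Console.WriteLine(IsPhoneNumber("(800) 555-1234"));
        Console.WriteLine(IsPhoneNumber("hello"));
    }
}
```

Whether this is more likeable than the one-liner is a matter of taste, but each piece now carries its own explanation next to it.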