regex name/value pairs
-
I'm a little stumped with a regex I've been working on. I'm trying to parse name/value pairs from a string. I have something close, but it chokes on certain cases.

(?# PROPERTY )(?<Property>[a-zA-Z0-9_]*)\x20*?(?# OPERATOR )(?<Operator>>=|<=|<|>|=|!=|LIKE)(?# Value )\x20*?'?\w+'?

Broken down:

// any property name of any length, captured into backreference 'Property'
(?<Property>[a-zA-Z0-9_]*)
// whitespace, 0+, minimal matching
\x20*?
// any of the operators >=, <=, <, >, =, !=, LIKE, captured into backreference 'Operator'
(?<Operator>>=|<=|<|>|=|!=|LIKE)
// whitespace, 0+, minimal matching
\x20*?
// single quote, 0-1
// followed by word character, 1+
// followed by single quote, 0-1
'?\w+'?

I am looking to match (and extract) name/operator/value pairs from a string such as:

PropertyName='SomeValue' AND IntProperty < 9 OR AnotherProperty LIKE 'this is a test'

The regex above works fine for the first two terms, but when it reaches a quoted string containing spaces, it only matches up to the first space. This is only ever going to be an issue with quoted strings; anything else will assume a word boundary on whitespace, which is the desired behaviour.

I need the last part of the regex to basically say "if we're enclosed in single quotes, get anything between the opening and closing quote; otherwise, match everything up to whitespace".

Any help is much appreciated (or if anyone spots any weak spots in the regex I have so far).

Thanks
-
You will need to test for quotes and assign the substring a name. Do this at the first place in your expression where the quotes can occur. This will test for an optional single or double quote:

(?<quote>[""']?)

(The double-quote is doubled as shown only when the pattern is written inside a VB string.)

You must then use conditional matching to test whether <quote> has been assigned. The syntax for the conditional match is:

(?(name)yes|no)

The |no portion is optional. So,

(?(quote)\k<quote>)

will test whether <quote> was previously assigned, and if so it will match it again. Otherwise, it does nothing. Hope that helps.
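For anyone who wants to try the conditional idea outside .NET, here is a minimal sketch in Python. Python's `re` module writes the named group as `(?P<quote>...)` and the backreference as `(?P=quote)`, but the conditional `(?(quote)...)` works the same way. The names `VALUE` and `is_valid_value` are made up for this illustration. Note that the sketch makes the whole group optional rather than the quote character inside it, so the conditional really does distinguish "a quote was opened" from "no quote".

```python
import re

# Optional opening quote, one or more word characters, and a closing quote
# that is required only if an opening quote was captured.
VALUE = re.compile(r"""(?P<quote>["'])?(?P<word>\w+)(?(quote)(?P=quote))""")


def is_valid_value(text):
    """Return True if text is a bare word or a word in matching quotes."""
    return VALUE.fullmatch(text) is not None


print(is_valid_value("'Blah'"))   # quoted with matching single quotes
print(is_valid_value("9"))        # bare value, no quotes at all
print(is_valid_value("'Blah"))    # opening quote without a closing one
```

This still uses `\w+` for the value, so it shares the limitation of the original pattern: a quoted value containing spaces will not match.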
-
Keith - I was headed in that direction, but couldn't quite get it. Thanks for the help. What you suggested has given me:

(?<Property>[a-zA-Z0-9_]*) \x20*? (?<Operator>>=|<=|<|>|=|!=|LIKE) \x20*? (?<quote>[""']?)\w+?\k<quote> \x20*?

which matches the following

PropertyName = 'Blah' AND PropertyTwo >= 9 OR PropertyThree = "asdf" OR AnotherProperty >= 'thisis atest'

perfectly up until the last part:

AnotherProperty='thisis atest'

The \w+? stops at the space between 'is' and 'atest', which is where I am stuck now. Using .* or something similar captures too much. I am inclined to believe that a negated character class, [^""'], is the way to go, but I am not sure how to do that; the things I have been trying wind up matching too much. Any ideas?

Thanks

-- modified at 9:21 Friday 5th January, 2007
-
You really have a grammar there, not just a regular expression. I'd recommend using something like ANTLR to parse your expressions. It's a lot less of a headache than trying to do a single RE that does the whole job.
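To give a rough feel for the tokenize-then-parse approach without pulling in a parser generator, here is a small hand-written sketch in Python. It does not use ANTLR; it only assumes the informal grammar implied by the examples in this thread (conditions of the form name, operator, value, joined by AND or OR). The names `TOKEN`, `tokenize` and `parse` are hypothetical.

```python
import re

# One token per match: a quoted literal, a symbolic operator, or a word.
TOKEN = re.compile(r"""
    \s*(?:
        (?P<string>'[^']*'|"[^"]*")     # quoted literal, spaces allowed inside
      | (?P<op>>=|<=|!=|<|>|=)          # symbolic comparison operators
      | (?P<word>[A-Za-z0-9_]+)         # names, numbers and keywords
    )
""", re.VERBOSE)


def tokenize(text):
    """Split text into (kind, lexeme) tuples, raising on unknown input."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def parse(text):
    """Return a flat list: condition, connector, condition, ..."""
    tokens = tokenize(text)
    result, i = [], 0
    while True:
        if len(tokens) - i < 3:
            raise ValueError("expected: name operator value")
        (k1, name), (k2, op), (k3, value) = tokens[i:i + 3]
        is_operator = k2 == "op" or (k2 == "word" and op == "LIKE")
        if k1 != "word" or not is_operator or k3 == "op":
            raise ValueError(f"malformed condition near {name!r}")
        if k3 == "string":
            value = value[1:-1]          # strip the surrounding quotes
        result.append((name, op, value))
        i += 3
        if i == len(tokens):
            return result
        kind, connector = tokens[i]
        if kind != "word" or connector not in ("AND", "OR"):
            raise ValueError(f"expected AND or OR, got {connector!r}")
        result.append(connector)
        i += 1


print(parse("PropertyName='SomeValue' AND IntProperty < 9 "
            "OR AnotherProperty LIKE 'this is a test'"))
```

The point of splitting the work this way is that the quoting rule lives in one small token pattern, while errors such as a missing operator are reported by the parsing step instead of silently producing no match.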
Stability. What an interesting concept. -- Chris Maunder
-
Try replacing the \w+?\k<quote> with:

.+\k<quote>

This will match everything between the quotes. The \w matches only word characters, so it does not match the spaces. The period matches any character, and since we want everything between the quotes, we use the + to denote one or more matches. Hope that helps.
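One caveat worth checking against your own input: a greedy .+ can run on to the last matching quote in the string when several quoted values are present, which is the "captures too much" effect mentioned earlier in the thread. Below is a minimal sketch in Python that uses a lazy .*? between the quotes and falls back to "everything up to whitespace" for unquoted values. It is an illustration under those assumptions, not a drop-in .NET pattern: Python spells named groups `(?P<name>...)`, and the names `PAIR` and `parse_pairs` are made up here.

```python
import re

PAIR = re.compile(r"""
    \b(?P<property>[A-Za-z0-9_]+)          # property name
    \s*
    (?P<operator>>=|<=|!=|<|>|=|LIKE)      # two-character operators listed first
    \s*
    (?:
        (?P<quote>["'])(?P<quoted>.*?)(?P=quote)   # quoted: up to the matching quote
      | (?P<bare>[^\s"']+)                         # unquoted: up to whitespace
    )
""", re.VERBOSE)


def parse_pairs(text):
    """Return (property, operator, value) tuples found in text."""
    pairs = []
    for m in PAIR.finditer(text):
        value = m.group("quoted") if m.group("quote") else m.group("bare")
        pairs.append((m.group("property"), m.group("operator"), value))
    return pairs


sample = ("PropertyName='SomeValue' AND IntProperty < 9 "
          "OR AnotherProperty LIKE 'this is a test'")
for pair in parse_pairs(sample):
    print(pair)
```

Because the closing quote is a backreference to the opening one, a value such as "it's here" in double quotes is kept whole. Escaped quotes inside a value are not handled by this sketch.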