Trying to parse legacy data with RegEx [modified]
-
smesser wrote:
All can be variable length and there isn't a delimiter. Unfortunately, you also can't depend on the tags having no spaces before the colons (i.e. VIN : vs. YRMD:).
This is true, which makes parsing strings like that pretty hard. I did, however, find a pattern in this string, though it doesn't necessarily hold true for other strings. Basically it's always a pair of arbitrary strings separated by a colon, sort of like a key/value pair. So whenever there's whitespace to the left (or also to the right?) of the colon, it doesn't count as a delimiter. Based on this, the following regex will work:
[^\ ]+\ *:[^\ ]+
If you also want to disallow whitespace to the right of the colon as a delimiter, then this will work:
[^\ ]+\ *:\ *[^\ ]+
regards
Then what, use named groups to get your values?
EDIT:
// Requires: using System; using System.Text.RegularExpressions;
private void Test9()
{
    string original = "LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800";
    // Each match is one tag/value pair, e.g. "BTM :CP".
    Regex r = new Regex(@"[^\ ]+\ *:\ *[^\ ]+");
    MatchCollection theMatches = r.Matches(original);
    foreach (Match theMatch in theMatches)
    {
        Console.WriteLine(theMatch.Value);
    }
}
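To make the named-groups idea concrete, here is a minimal sketch that splits each pair into its tag and its value. The class name TagParser, the Parse helper and the group names key and value are made up for illustration and are not from the posts above. It assumes values never contain spaces, which holds for the sample line but may not hold for other records.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagParser
{
    // Same idea as the pattern above, with two named groups added.
    // The key excludes ':' so the tag stops at the first colon.
    private static readonly Regex PairRegex =
        new Regex(@"(?<key>[^\ :]+)\ *:\ *(?<value>[^\ ]+)");

    // Hypothetical helper: maps each tag (e.g. "VIN") to the text after its colon.
    public static Dictionary<string, string> Parse(string line)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in PairRegex.Matches(line))
        {
            result[m.Groups["key"].Value] = m.Groups["value"].Value;
        }
        return result;
    }

    public static void Main()
    {
        string original = "LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800";
        foreach (KeyValuePair<string, string> pair in Parse(original))
        {
            Console.WriteLine("{0} = {1}", pair.Key, pair.Value);
        }
    }
}
```

Because the tags are not hard-coded, this also picks up tags the pattern has never seen, which may or may not be what you want for legacy data.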
modified on Friday, March 21, 2008 12:42 PM
-
I need to be able to parse the following line; I am not even sure regular expressions are the best option.
LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800
into
LIC#:ABC123
YRMD:03
MAKE:CHEV
BTM :CP
VIN :1F1JD12F137230735
And then capture the data after the colon. All can be variable length and there isn't a delimiter. Unfortunately, you also can't depend on the tags having no spaces before the colons (i.e. VIN : vs. YRMD:). I have been trying to use grouping to parse this. I am newish to regular expressions and thought maybe I should use them to solve this problem. Are regular expressions even a good fit for this problem? Any suggestions appreciated.
modified on Friday, March 21, 2008 12:01 PM
This seems to work
if ( args.Length > 0 )
{
    //LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800
    System.Text.RegularExpressions.Regex reg = new System.Text.RegularExpressions.Regex
    (
        @"^\s*LIC#\s*: (?'LIC'.*)YRMD\s*: (?'YRMD'.*)MAKE\s*: (?'MAKE'.*)BTM\s*: (?'BTM'.*)VIN\s: (?'VIN'.*)$"
    ) ;
    foreach ( System.Text.RegularExpressions.Match mat in reg.Matches ( args [ 0 ] ) )
    {
        System.Console.WriteLine
        (
            "LIC# = {0} YRMD = {1} MAKE = {2} BTM = {3} VIN = {4}"
        ,
            mat.Groups [ "LIC" ].Value
        ,
            mat.Groups [ "YRMD" ].Value
        ,
            mat.Groups [ "MAKE" ].Value
        ,
            mat.Groups [ "BTM" ].Value
        ,
            mat.Groups [ "VIN" ].Value
        ) ;
    }
}
Dagnabit! Frowny faces?! Who wrote this crap? I added a SPACE between the : and the ( to solve that little problem, but they should be eliminated from the Regex.
modified on Friday, March 21, 2008 1:52 PM
-
PIEBALDconsult wrote:
Dagnabit! Frowny faces?! Who wrote this crap?
Paging Chris Maunder. :doh:
-
Hum, unless the copy paste messed something up this is not creating a match for me.
-
With the SPACE between the : and ( you need to use System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace to ignore the extraneous SPACEs, but then the # and everything after it become a comment! :mad: So now I've escaped the # to \x23. Resulting in:
System.Text.RegularExpressions.Regex reg = new System.Text.RegularExpressions.Regex
(
    @"^\s*LIC\x23\s*: (?'LIC'.*)YRMD\s*: (?'YRMD'.*)MAKE\s*: (?'MAKE'.*)BTM\s*: (?'BTM'.*)VIN\s: (?'VIN'.*)$"
,
    System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace
) ;
Whoops, I had left out an asterisk I had meant to include:
VIN\s*
modified on Friday, March 21, 2008 2:35 PM
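The comment behaviour described above is easy to reproduce in isolation. The following is a small sketch, not code from the thread: the class PatternWhitespaceDemo and the Describe helper are invented names, and the input is shortened to the LIC# field only. It runs the same pattern fragment twice under IgnorePatternWhitespace, once with a literal # and once with \x23.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternWhitespaceDemo
{
    // Hypothetical helper: runs the pattern with IgnorePatternWhitespace and
    // reports the whole match plus the LIC group so the two spellings can be compared.
    public static string Describe(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern, RegexOptions.IgnorePatternWhitespace);
        // Groups["LIC"] has an empty Value when the group does not exist or did not match.
        return string.Format("match=[{0}] LIC=[{1}]", m.Value, m.Groups["LIC"].Value);
    }

    public static void Main()
    {
        const string input = "LIC#:ABC123";

        // Literal '#': the rest of the pattern is read as a comment,
        // so the effective pattern is just "LIC" and no group is captured.
        Console.WriteLine(Describe(@"LIC#\s*: (?'LIC'.*)", input));

        // '#' written as \x23: the whole pattern applies and the group captures the value.
        Console.WriteLine(Describe(@"LIC\x23\s*: (?'LIC'.*)", input));
    }
}
```

Writing the character as \# should work as well, since that is the form Regex.Escape produces for it.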
-
Quick little edit to get rid of the extra space that was being captured.
^\s*LIC\x23\s*: (?'LIC'.*)\sYRMD\s*: (?'YRMD'.*)\sMAKE\s*: (?'MAKE'.*)\sBTM\s*: (?'BTM'.*)\sVIN\s: (?'VIN'.*)$
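For anyone who wants to try it, here is a rough, self-contained sketch built around that pattern. It assumes the pattern is compiled with RegexOptions.IgnorePatternWhitespace, as in the earlier post, and it writes VIN\s* with the asterisk that was added there. The class LegacyLineParser and the GetField helper are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LegacyLineParser
{
    // Pattern from the thread, with VIN\s* instead of VIN\s.
    // IgnorePatternWhitespace drops the space after each ':' in the pattern,
    // which is also why '#' has to be written as \x23.
    private static readonly Regex LineRegex = new Regex(
        @"^\s*LIC\x23\s*: (?'LIC'.*)\sYRMD\s*: (?'YRMD'.*)\sMAKE\s*: (?'MAKE'.*)\sBTM\s*: (?'BTM'.*)\sVIN\s*: (?'VIN'.*)$",
        RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: value of one named group, or null if the line does not match.
    public static string GetField(string line, string name)
    {
        Match m = LineRegex.Match(line);
        return m.Success ? m.Groups[name].Value : null;
    }

    public static void Main()
    {
        string line = "LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800";
        foreach (string name in new[] { "LIC", "YRMD", "MAKE", "BTM", "VIN" })
        {
            // Brackets make any stray leading or trailing space visible.
            Console.WriteLine("{0} = [{1}]", name, GetField(line, name));
        }
    }
}
```

Unlike a generic key/value pattern, this one requires all five tags in this exact order, so a record with a missing or reordered tag will not match at all. If a field can be followed by more than one space, the extra spaces end up in the captured value and a Trim() on the result is probably the simplest fix.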
Hey, I was leaving that for the OP to do; I didn't want to solve the whole thing for him. :-D
-
Sorry, I was bored and happened to have The Regulator open. At least now I can enjoy the weekend in knowing that I accomplished something today. :laugh:
Thanks all, your comments and examples have been very enlightening.
modified on Friday, March 21, 2008 6:15 PM
-
Glad to be of service.