Trying to parse legacy data with RegEx [modified]

Steve Messer

Thanks, I would have taken a much more difficult approach. That is embarrassingly simple.

PIEBALDconsult · modified on Friday, March 21, 2008 12:01 PM

smesser wrote:

Unfortunately, you also can't depend on the tags having no spaces with the colons (i.e. VIN : vs. YRMD:).

Perhaps you could replace " :" with ":" to fix that and make processing easier. After that, if values don't contain SPACEs then you can split on SPACE and then on colon as mentioned. Certainly a Regular Expression could be used if the data is regular enough, but why not perform both tasks at once?
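For what it's worth, the replace-then-split idea could be sketched roughly like this. The class and method names (SplitParser, Parse) are made up for illustration, and it assumes values never contain SPACEs and that at most one SPACE precedes a colon.

```csharp
using System;
using System.Collections.Generic;

public static class SplitParser
{
    // Hypothetical helper: turns "TAG :value" into "TAG:value",
    // then splits on SPACE and on the first colon of each token.
    public static Dictionary<string, string> Parse(string line)
    {
        var result = new Dictionary<string, string>();

        // Assumption: at most one SPACE sits between a tag and its colon.
        string normalized = line.Replace(" :", ":");

        string[] tokens = normalized.Split(
            new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        foreach (string token in tokens)
        {
            // Split only on the first colon so a value may itself contain colons.
            string[] parts = token.Split(new[] { ':' }, 2);
            if (parts.Length == 2)
            {
                result[parts[0]] = parts[1];
            }
        }

        return result;
    }

    public static void Main()
    {
        var fields = Parse("LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800");
        foreach (var pair in fields)
        {
            Console.WriteLine("{0} = {1}", pair.Key, pair.Value);
        }
    }
}
```

If a record can have several SPACEs before a colon, a single Replace will not be enough and a Regex is probably the simpler route.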

Lost User · modified on Friday, March 21, 2008 12:01 PM

smesser wrote:

All can be variable length and there isn't a delimiter. Unfortunately, you also can't depend on the tags having no spaces with the colons (i.e. VIN : vs. YRMD:).

This is true, which makes parsing strings like that pretty hard. I did, however, find a pattern in this string, though it won't necessarily hold true for other strings. Basically it's always a pair of arbitrary strings separated by a colon, sort of like a key/value pair. So a whitespace to the left (or also to the right?) of the colon doesn't count as a delimiter. Based on this, the following regex will work:

[^\ ]+\ *:[^\ ]+

If you also want whitespace to the right of the colon not to count as a delimiter, then this will work:

[^\ ]+\ *:\ *[^\ ]+

regards

Steve Messer

Yes, I had just tried that myself. That one was looking me right in the face.

Steve Messer

Then what, use named groups to get your values? EDIT:

private void Test9()
{
    string original = "LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800";
    Regex r = new Regex(@"[^\ ]+\ *:\ *[^\ ]+");
    MatchCollection theMatches = r.Matches(original);
    foreach (Match theMatch in theMatches)
    {
        Console.WriteLine(theMatch.Value);
    }
}
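On the named-groups question, one possible way to do it is to wrap each side of the colon in a named group. This is only a sketch: the group names (tag, value) and the PairParser class are invented here, and the tag part is tightened to [^ :]+ so the colon cannot end up inside the tag, which is a small departure from the pattern above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PairParser
{
    // Variation on the thread's pattern: "tag" and "value" are illustrative names.
    private static readonly Regex PairPattern =
        new Regex(@"(?<tag>[^ :]+) *: *(?<value>[^ ]+)");

    // Returns every tag/value pair found in the line.
    public static Dictionary<string, string> Parse(string line)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in PairPattern.Matches(line))
        {
            result[m.Groups["tag"].Value] = m.Groups["value"].Value;
        }
        return result;
    }

    public static void Main()
    {
        var fields = Parse("LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800");
        foreach (var pair in fields)
        {
            Console.WriteLine("{0} = {1}", pair.Key, pair.Value);
        }
    }
}
```

Like the split approach, this assumes values contain no SPACEs. It does not care which tags appear or in what order.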

modified on Friday, March 21, 2008 12:42 PM

PIEBALDconsult · modified on Friday, March 21, 2008 12:01 PM

This seems to work

if ( args.Length > 0 )
{
//LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800

System.Text.RegularExpressions.Regex reg = new System.Text.RegularExpressions.Regex
(
    @"^\s*LIC#\s*: (?'LIC'.*)YRMD\s*: (?'YRMD'.*)MAKE\s*: (?'MAKE'.*)BTM\s*: (?'BTM'.*)VIN\s: (?'VIN'.*)$"
) ;

foreach ( System.Text.RegularExpressions.Match mat in reg.Matches ( args [ 0 ] ) )
{
    System.Console.WriteLine
    (
        "LIC# = {0} YRMD = {1} MAKE = {2} BTM = {3} VIN = {4}"
    ,
         mat.Groups [ "LIC" ].Value
    ,
         mat.Groups [ "YRMD" ].Value
    ,
         mat.Groups [ "MAKE" ].Value
    ,
         mat.Groups [ "BTM" ].Value
    ,
         mat.Groups [ "VIN" ].Value
    ) ;
}

}

Dagnabit! Frowny faces?! Who wrote this crap? I added a SPACE between the : and the ( to solve that little problem, but they should be eliminated from the Regex.

modified on Friday, March 21, 2008 1:52 PM

Dan Neely · modified on Friday, March 21, 2008 1:52 PM

PIEBALDconsult wrote:

Dagnabit! Frowny faces?! Who wrote this crap?

Paging Chris Maunder. :doh:


Steve Messer · modified on Friday, March 21, 2008 1:52 PM

Hmm, unless the copy/paste messed something up, this is not creating a match for me.

PIEBALDconsult

With the SPACE between the : and ( you need to use System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace to ignore the extraneous SPACEs, but then the # and everything after it become a comment! :mad: So now I've escaped the # as \x23, resulting in:

System.Text.RegularExpressions.Regex reg = new System.Text.RegularExpressions.Regex
(
@"^\s*LIC\x23\s*: (?'LIC'.*)YRMD\s*: (?'YRMD'.*)MAKE\s*: (?'MAKE'.*)BTM\s*: (?'BTM'.*)VIN\s: (?'VIN'.*)$"
,
System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace
) ;

Whoops, I had left out an asterisk I had meant to include: VIN\s*
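To see the comment problem in isolation, here is a small sketch that runs the same short pattern twice under IgnorePatternWhitespace, once with a bare # and once with \x23. The CommentPitfall class, the Capture helper and the shortened pattern are made up for the demonstration and are not the pattern from the post.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommentPitfall
{
    // Hypothetical helper: returns the text captured by the group named LIC,
    // or null when that group did not take part in a match.
    public static string Capture(string pattern, string input)
    {
        Match m = new Regex(pattern, RegexOptions.IgnorePatternWhitespace).Match(input);
        Group g = m.Groups["LIC"];
        return g.Success ? g.Value : null;
    }

    public static void Main()
    {
        string input = "LIC#:ABC123 YRMD:03";

        // Bare #: everything from the # to the end of the pattern is a comment,
        // so the effective pattern is just "LIC" and the named group is gone.
        string bare = Capture(@"LIC# \s* : \s* (?'LIC'\S+)", input);

        // \x23 is the character code for #, so it is matched literally.
        string escaped = Capture(@"LIC\x23 \s* : \s* (?'LIC'\S+)", input);

        Console.WriteLine("bare #  : {0}", bare ?? "(no capture)");
        Console.WriteLine("\\x23    : {0}", escaped ?? "(no capture)");
    }
}
```

Writing \# should work as well, since Regex.Escape escapes # for exactly this reason.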

modified on Friday, March 21, 2008 2:35 PM

ChrisKo 0 · modified on Friday, March 21, 2008 2:35 PM

Quick little edit to get rid of the extra space that was being captured.

^\s*LIC\x23\s*: (?'LIC'.*)\sYRMD\s*: (?'YRMD'.*)\sMAKE\s*: (?'MAKE'.*)\sBTM\s*: (?'BTM'.*)\sVIN\s: (?'VIN'.*)$
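Putting the pieces of this thread together, a complete test might look something like the sketch below. It uses the pattern above with the VIN\s* asterisk put back in and IgnorePatternWhitespace switched on. The FixedLayoutParser class and its Parse method are names invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FixedLayoutParser
{
    // Pattern from the thread, with VIN\s* instead of VIN\s.
    // The SPACEs after each colon are skipped because of IgnorePatternWhitespace.
    private static readonly Regex RecordPattern = new Regex(
        @"^\s*LIC\x23\s*: (?'LIC'.*)\sYRMD\s*: (?'YRMD'.*)\sMAKE\s*: (?'MAKE'.*)\sBTM\s*: (?'BTM'.*)\sVIN\s*: (?'VIN'.*)$",
        RegexOptions.IgnorePatternWhitespace);

    // Hypothetical helper: matches one whole record line.
    public static Match Parse(string line)
    {
        return RecordPattern.Match(line);
    }

    public static void Main()
    {
        Match m = Parse("LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800");
        if (!m.Success)
        {
            Console.WriteLine("No match");
            return;
        }

        // Brackets make any stray leading or trailing SPACE visible.
        foreach (string name in new[] { "LIC", "YRMD", "MAKE", "BTM", "VIN" })
        {
            Console.WriteLine("{0} = [{1}]", name, m.Groups[name].Value);
        }
    }
}
```

Unlike the pair-at-a-time pattern earlier in the thread, this one only matches when all five tags are present in this exact order, but in exchange values are allowed to contain SPACEs.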

PIEBALDconsult

Hey, I was leaving that for the OP to do; I didn't want to solve the whole thing for him. :-D

ChrisKo 0

Sorry, I was bored and happened to have The Regulator open. At least now I can enjoy the weekend in knowing that I accomplished something today. :laugh:

Steve Messer

Thanks all, your comments and examples have been very enlightening.

modified on Friday, March 21, 2008 6:15 PM

PIEBALDconsult · modified on Friday, March 21, 2008 6:15 PM

Glad to be of service.