Code Project

Trying to parse legacy data with RegEx [modified]

Category: C#. Tags: regex, csharp, visual-studio, help, question. 16 posts, 6 posters.
    Steve Messer wrote (#1):

    I need to be able to parse the following line, and I am not even sure regular expressions are the best option. LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800 into LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1F1JD12F137230735 and then capture the data after the colon. All fields can be variable length and there isn't a delimiter. Unfortunately, you also can't depend on the tags having no spaces before the colons (e.g. "VIN :" vs. "YRMD:"). I have been trying to use grouping to parse this. I am newish to regular expressions and thought maybe I should use them to solve this problem. Are regular expressions even a good fit for this problem? Any suggestions appreciated.

    modified on Friday, March 21, 2008 12:01 PM

      realJSOP wrote (#2):

      You don't parse with regular expressions. Try something like this:

          // Returns the text after the first delimiter, e.g. "YRMD:03" -> "03".
          private string SplitValue(string str, char delim)
          {
              string[] parts = str.Split(delim);
              return parts[1];
          }
          //--------------------------------------------------------------------------------
          private bool Test7()
          {
              string original = "LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800";
              // Splitting on spaces yields 7 tokens, because "BTM :CP" and
              // "VIN :..." each break into a tag token and a ":value" token.
              string[] parts = original.Split(' ');

              if (parts.Length != 7)
              {
                  return false;
              }
              string license = SplitValue(parts[0], ':');
              string yrmd = SplitValue(parts[1], ':');
              string make = SplitValue(parts[2], ':');
              // parts[3] and parts[5] are the bare tags "BTM" and "VIN".
              string btm = parts[4].Replace(":", "");
              string vin = parts[6].Replace(":", "");

              return true;
          }
      

      The code above is based on the string you provided. Since there appear to be spaces in places where there really should be none, you can't really get away with a single split command, so you have to improvise. Yeah, I could have used substring instead of writing another function to split the first three parts, but what the hell, I had time...

        Steve Messer wrote (#3):

        Thanks, I would have taken a much more difficult approach. That is embarrassingly simple.

          PIEBALDconsult wrote (#4):

          smesser wrote:

          Unfortunately, you also can't depend on the tags having no spaces before the colons (e.g. "VIN :" vs. "YRMD:").

          Perhaps you could replace " :" with ":" to fix that and make processing easier. After that, if values don't contain SPACEs then you can split on SPACE and then on colon as mentioned. Certainly a Regular Expression could be used if the data is regular enough, but why not perform both tasks at once?
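A minimal sketch of that normalize-then-split idea, assuming values never contain spaces and at most one space sits in front of each colon. The class name `NormalizeThenSplit` and the `Parse` helper are hypothetical names chosen for illustration; they do not come from the thread.

```csharp
using System;
using System.Collections.Generic;

public static class NormalizeThenSplit
{
    // Hypothetical helper: collapse " :" to ":" so every tag:value pair
    // becomes one space-free token, then split on space and on the colon.
    public static Dictionary<string, string> Parse(string line)
    {
        var fields = new Dictionary<string, string>();
        string normalized = line.Replace(" :", ":");

        foreach (string token in normalized.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int colon = token.IndexOf(':');
            if (colon <= 0)
            {
                continue; // Assumption: tokens without a tag before the colon are skipped.
            }
            fields[token.Substring(0, colon)] = token.Substring(colon + 1);
        }
        return fields;
    }

    public static void Main()
    {
        string sample = "LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800";
        foreach (KeyValuePair<string, string> pair in Parse(sample))
        {
            Console.WriteLine("{0} = {1}", pair.Key, pair.Value);
        }
    }
}
```

Keying the result by tag name means the caller does not depend on field order, which the fixed-index approach does.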

            Lost User wrote (#5):

            smesser wrote:

            All can be variable length and there isn't a delimiter. Unfortunately, you also can't depend on the tags having no spaces before the colons (e.g. "VIN :" vs. "YRMD:").

            This is true, which makes parsing strings like that pretty hard. I did, however, find a pattern in this string, though it doesn't necessarily hold for other strings. Basically it's always a pair of arbitrary strings separated by a colon, sort of like a key/value pair. So whenever there's whitespace to the left (or also right?) of the colon, it doesn't count as a delimiter. Based on this, the following regex will work:

            [^\ ]+\ *:[^\ ]+

            If you also want whitespace to the right of the colon not to count as a delimiter, then this will work:

            [^\ ]+\ *:\ *[^\ ]+

            regards
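The key/value pattern can be extended with named groups so that each tag and value is captured separately. This is a sketch under assumptions, not the poster's code: the group names `key` and `value`, the class `KeyValueRegex`, and the choice to exclude `:` from the key (to avoid backtracking) are all additions made here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyValueRegex
{
    // Same shape as the second pattern in the post, plus named groups.
    // Assumption: keys contain neither spaces nor colons, values contain no spaces.
    private static readonly Regex Pair =
        new Regex(@"(?<key>[^ :]+) *: *(?<value>[^ ]+)");

    public static Dictionary<string, string> Parse(string line)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in Pair.Matches(line))
        {
            fields[m.Groups["key"].Value] = m.Groups["value"].Value;
        }
        return fields;
    }

    public static void Main()
    {
        string sample = "LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800";
        foreach (KeyValuePair<string, string> pair in Parse(sample))
        {
            Console.WriteLine("{0} = {1}", pair.Key, pair.Value);
        }
    }
}
```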

              Steve Messer wrote (#6):

              Yes, I had just tried that myself. That one was looking me right in the face.

                Steve Messer wrote (#7):

                Then what, use named groups to get your values? EDIT:

                    private void Test9()
                    {
                        string original = "LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800";
                        Regex r = new Regex(@"[^\ ]+\ *:\ *[^\ ]+");
                        MatchCollection theMatches = r.Matches(original);
                        foreach (Match theMatch in theMatches)
                        {
                            // Prints each whole tag:value pair, e.g. "BTM :CP".
                            Console.WriteLine(theMatch.Value);
                        }
                    }

                modified on Friday, March 21, 2008 12:42 PM

                  PIEBALDconsult wrote (#8):

                  This seems to work

                  if ( args.Length > 0 )
                  {
                      //LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800

                      System.Text.RegularExpressions.Regex reg = new System.Text.RegularExpressions.Regex
                      (
                          @"^\s*LIC#\s*: (?'LIC'.*)YRMD\s*: (?'YRMD'.*)MAKE\s*: (?'MAKE'.*)BTM\s*: (?'BTM'.*)VIN\s: (?'VIN'.*)$"
                      ) ;

                      foreach ( System.Text.RegularExpressions.Match mat in reg.Matches ( args [ 0 ] ) )
                      {
                          System.Console.WriteLine
                          (
                              "LIC# = {0} YRMD = {1} MAKE = {2} BTM = {3} VIN = {4}"
                          ,
                              mat.Groups [ "LIC" ].Value
                          ,
                              mat.Groups [ "YRMD" ].Value
                          ,
                              mat.Groups [ "MAKE" ].Value
                          ,
                              mat.Groups [ "BTM" ].Value
                          ,
                              mat.Groups [ "VIN" ].Value
                          ) ;
                      }
                  }

                  Dagnabit! Frowny faces?! Who wrote this crap? I added a SPACE between the : and the ( to solve that little problem, but those SPACEs should be removed from the Regex.

                  modified on Friday, March 21, 2008 1:52 PM

                    Dan Neely wrote (#9):

                    PIEBALDconsult wrote:

                    Dagnabit! Frowny faces?! Who wrote this crap?

                    Paging Chris Maunder. :doh:

                      Steve Messer wrote (#10):

                      Hum, unless the copy paste messed something up this is not creating a match for me.

                        PIEBALDconsult wrote (#11):

                        With the SPACE between the : and ( you need to use System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace to ignore the extraneous SPACEs, but then the # and everything after it becomes a comment! :mad: So now I've escaped the # as \x23, resulting in:

                        System.Text.RegularExpressions.Regex reg = new System.Text.RegularExpressions.Regex
                        (
                        @"^\s*LIC\x23\s*: (?'LIC'.*)YRMD\s*: (?'YRMD'.*)MAKE\s*: (?'MAKE'.*)BTM\s*: (?'BTM'.*)VIN\s: (?'VIN'.*)$"
                        ,
                        System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace
                        ) ;

                        Whoops, I had left out an asterisk I had meant to include: VIN\s*
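The comment pitfall described here can be checked with a small sketch. The class `PatternWhitespaceDemo`, its constants, and the `Run` helper are hypothetical names; both patterns use `VIN\s*` as in the correction above. With `IgnorePatternWhitespace`, an unescaped `#` starts a comment that runs to the end of the pattern line, so the named groups after it are never defined.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternWhitespaceDemo
{
    public const string Sample =
        "LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800";

    // Literal '#': under IgnorePatternWhitespace everything after it is a comment.
    public const string Unescaped =
        @"^\s*LIC#\s*: (?'LIC'.*)YRMD\s*: (?'YRMD'.*)MAKE\s*: (?'MAKE'.*)BTM\s*: (?'BTM'.*)VIN\s*: (?'VIN'.*)$";

    // '#' written as \x23, so the whole pattern stays active.
    public const string Escaped =
        @"^\s*LIC\x23\s*: (?'LIC'.*)YRMD\s*: (?'YRMD'.*)MAKE\s*: (?'MAKE'.*)BTM\s*: (?'BTM'.*)VIN\s*: (?'VIN'.*)$";

    public static Match Run(string pattern)
    {
        return Regex.Match(Sample, pattern, RegexOptions.IgnorePatternWhitespace);
    }

    public static void Main()
    {
        // The truncated pattern still matches the start of the line,
        // but the VIN group does not exist, so Success is false.
        Console.WriteLine("Unescaped VIN captured: {0}", Run(Unescaped).Groups["VIN"].Success);
        Console.WriteLine("Escaped VIN: [{0}]", Run(Escaped).Groups["VIN"].Value);
        // Brackets make the trailing space in the greedy capture visible.
        Console.WriteLine("Escaped LIC: [{0}]", Run(Escaped).Groups["LIC"].Value);
    }
}
```

Note that the greedy `.*` groups still capture the space before the next tag, which is the leftover issue at this point in the thread.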

                        modified on Friday, March 21, 2008 2:35 PM

                          ChrisKo 0 wrote (#12):

                          Quick little edit to get rid of the extra space that was being captured.

                          ^\s*LIC\x23\s*: (?'LIC'.*)\sYRMD\s*: (?'YRMD'.*)\sMAKE\s*: (?'MAKE'.*)\sBTM\s*: (?'BTM'.*)\sVIN\s: (?'VIN'.*)$
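A sketch of how this trimmed pattern might be wrapped into a helper, assuming the tags always appear in this order and the pattern is compiled with `RegexOptions.IgnorePatternWhitespace`. The class `TrimmedCaptureDemo` and the `Parse` method are hypothetical, and `VIN\s*` is used in place of `VIN\s` so that a missing space before the final colon is also accepted.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TrimmedCaptureDemo
{
    // The \s in front of each following tag consumes the separating space,
    // so the greedy groups no longer capture it.
    private static readonly Regex Line = new Regex(
        @"^\s*LIC\x23\s*: (?'LIC'.*)\sYRMD\s*: (?'YRMD'.*)\sMAKE\s*: (?'MAKE'.*)\sBTM\s*: (?'BTM'.*)\sVIN\s*: (?'VIN'.*)$",
        RegexOptions.IgnorePatternWhitespace);

    private static readonly string[] Tags = { "LIC", "YRMD", "MAKE", "BTM", "VIN" };

    // Returns the five values keyed by tag, or null when the line does not match.
    public static Dictionary<string, string> Parse(string input)
    {
        Match m = Line.Match(input);
        if (!m.Success)
        {
            return null;
        }
        var fields = new Dictionary<string, string>();
        foreach (string tag in Tags)
        {
            fields[tag] = m.Groups[tag].Value;
        }
        return fields;
    }

    public static void Main()
    {
        var fields = Parse("LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800");
        foreach (KeyValuePair<string, string> pair in fields)
        {
            Console.WriteLine("{0} = [{1}]", pair.Key, pair.Value);
        }
    }
}
```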

                            PIEBALDconsult wrote (#13):

                            Hey, I was leaving that for the OP to do; I didn't want to solve the whole thing for him. :-D

                              ChrisKo 0 wrote (#14):

                              Sorry, I was bored and happened to have The Regulator open. At least now I can enjoy the weekend in knowing that I accomplished something today. :laugh:

                                Steve Messer wrote (#15):

                                Thanks all, your comments and examples have been very enlightening.

                                modified on Friday, March 21, 2008 6:15 PM

                                  PIEBALDconsult wrote (#16):

                                  Glad to be of service.
