Trying to parse legacy data with RegEx [modified]

  • realJSOP

    You don't parse with regular expressions. Try something like this:

    	// Returns the text after the first delimiter, e.g. "ABC123" from "LIC#:ABC123".
    	private string SplitValue(string str, char delim)
    	{
    		string[] parts = str.Split(delim);
    		return parts[1];
    	}
    	//--------------------------------------------------------------------------------
    	private bool Test7()
    	{
    		bool result = false;
    		string original = "LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800";
    		string[] parts = original.Split(' ');
    
    		// "BTM :CP" and "VIN :..." each split into two pieces, so we expect 7 parts.
    		if (parts.Length != 7)
    		{
    			return result;
    		}
    		string license = SplitValue(parts[0], ':');
    		string yrmd = SplitValue(parts[1], ':');
    		string make = SplitValue(parts[2], ':');
    		// For BTM and VIN the value is its own part with a leading colon.
    		string btm = parts[4].Replace(":", "");
    		string vin = parts[6].Replace(":", "");
    
    		return true;
    	}
    

    The code above is based on the string you provided. Since there appear to be spaces in places where there really should be none, you can't really get away with a single split command, so you have to improvise. Yeah, I could have used substring instead of writing another function to split the first three parts, but what the hell, I had time...

    "Why don't you tie a kerosene-soaked rag around your ankles so the ants won't climb up and eat your candy ass..." - Dale Earnhardt, 1997
    -----
    "...the staggering layers of obscenity in your statement make it a work of art on so many levels." - Jason Jystad, 10/26/2001

    Steve Messer wrote (#3), in reply to realJSOP:

    Thanks, I would have taken a much more difficult approach. That is embarrassingly simple.

    • Steve Messer

      I need to be able to parse the following line, and I am not even sure regular expressions are the best option.

      LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800

      into

      LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1F1JD12F137230735

      and then capture the data after the colon. All values can be variable length and there isn't a delimiter. Unfortunately, you also can't depend on the tags having no spaces before the colons (i.e. VIN : vs. YRMD:). I have been trying to use grouping to parse this. I am newish to regular expressions and thought maybe I should use them to solve this problem. Are regular expressions even a good fit for this problem? Any suggestions appreciated.

      modified on Friday, March 21, 2008 12:01 PM

      PIEBALDconsult wrote (#4), in reply to Steve Messer:

      smesser wrote:

      Unfortunately, you also can't depend on the tags having no spaces before the colons (i.e. VIN : vs. YRMD:).

      Perhaps you could replace " :" with ":" to fix that and make processing easier. After that, if values don't contain SPACEs then you can split on SPACE and then on colon as mentioned. Certainly a Regular Expression could be used if the data is regular enough, but why not perform both tasks at once?
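A minimal sketch of the replace-then-split idea described above. The class and method names (`NormalizeAndSplit`, `Parse`) are hypothetical, and the code assumes there is at most one SPACE before a colon, none after it, and no SPACEs inside values.

```csharp
using System;
using System.Collections.Generic;

public static class NormalizeAndSplit
{
    // Hypothetical helper (not from the thread): returns tag -> value pairs.
    public static Dictionary<string, string> Parse(string line)
    {
        // Step 1: turn "BTM :CP" into "BTM:CP" so each field is one space-free token.
        string normalized = line.Replace(" :", ":");

        var fields = new Dictionary<string, string>();

        // Step 2: split on SPACE; RemoveEmptyEntries tolerates repeated spaces between fields.
        string[] tokens = normalized.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string token in tokens)
        {
            // Step 3: split each token on the first colon only.
            string[] pair = token.Split(new[] { ':' }, 2);
            if (pair.Length == 2)
            {
                fields[pair[0]] = pair[1];
            }
        }
        return fields;
    }

    public static void Main()
    {
        var fields = Parse("LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800");
        foreach (KeyValuePair<string, string> field in fields)
        {
            Console.WriteLine("{0} = {1}", field.Key, field.Value);
        }
    }
}
```

If the legacy data can contain more than one SPACE before a colon, the single `Replace` call is not enough and a regular expression is probably the safer tool.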


        Lost User wrote (#5), in reply to Steve Messer:

        smesser wrote:

        All can be variable length and there isn't a delimiter. Unfortunately, you also can't depend on the tags having no spaces before the colons (i.e. VIN : vs. YRMD:).

        This is true, which makes parsing strings like that pretty hard. I did, however, find a pattern in this string, though it doesn't necessarily hold true for other strings. Basically it's always a pair of arbitrary strings separated by a colon, sort of like a key/value pair. So a whitespace directly to the left (or also right?) of the colon doesn't count as a delimiter. Based on this, the following regex will work:

        [^\ ]+\ *:[^\ ]+

        If you also want to disallow whitespace to the right of the colon as a delimiter, then this will work:

        [^\ ]+\ *:\ *[^\ ]+

        regards


          Steve Messer wrote (#6), in reply to PIEBALDconsult:

          Yes, I had just tried that myself. That one was looking me right in the face.


            Steve Messer wrote (#7), in reply to Lost User:

            Then what, use named groups to get your values? EDIT:

                private void Test9()
                {
                    string original = "LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800";
                    Regex r = new Regex(@"[^\ ]+\ *:\ *[^\ ]+");
                    MatchCollection theMatches = r.Matches(original);
                    foreach (Match theMatch in theMatches)
                    {
                        // Prints each whole tag/value pair, e.g. "BTM :CP"
                        Console.WriteLine(theMatch.Value);
                    }
                }

            modified on Friday, March 21, 2008 12:42 PM
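One way the named-groups question could be answered, as a sketch only: wrap the two halves of the pair pattern in named groups so the tag and the value can be read separately. The group names `key` and `value` and the class name are hypothetical. The key part also excludes the colon (`[^\ :]+`), which is a small change from the pattern posted above so that a colon can never end up inside a tag.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class KeyValueRegexDemo
{
    // Pair pattern from the thread with named groups added (names are assumptions).
    private static readonly Regex PairPattern =
        new Regex(@"(?<key>[^\ :]+)\ *:\ *(?<value>[^\ ]+)");

    public static Dictionary<string, string> Parse(string line)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in PairPattern.Matches(line))
        {
            // Each match is one tag/value pair; read the halves by group name.
            fields[m.Groups["key"].Value] = m.Groups["value"].Value;
        }
        return fields;
    }

    public static void Main()
    {
        var fields = Parse("LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800");
        foreach (KeyValuePair<string, string> field in fields)
        {
            Console.WriteLine("{0} = {1}", field.Key, field.Value);
        }
    }
}
```

This still assumes that values contain no SPACEs, the same limitation as the split-based approach.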


              PIEBALDconsult wrote (#8), in reply to Steve Messer:

              This seems to work

              if ( args.Length > 0 )
              {
                  //LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800

                  System.Text.RegularExpressions.Regex reg = new System.Text.RegularExpressions.Regex
                  (
                      @"^\s*LIC#\s*: (?'LIC'.*)YRMD\s*: (?'YRMD'.*)MAKE\s*: (?'MAKE'.*)BTM\s*: (?'BTM'.*)VIN\s: (?'VIN'.*)$"
                  ) ;

                  foreach ( System.Text.RegularExpressions.Match mat in reg.Matches ( args [ 0 ] ) )
                  {
                      System.Console.WriteLine
                      (
                          "LIC# = {0} YRMD = {1} MAKE = {2} BTM = {3} VIN = {4}"
                      ,
                          mat.Groups [ "LIC" ].Value
                      ,
                          mat.Groups [ "YRMD" ].Value
                      ,
                          mat.Groups [ "MAKE" ].Value
                      ,
                          mat.Groups [ "BTM" ].Value
                      ,
                          mat.Groups [ "VIN" ].Value
                      ) ;
                  }
              }

              Dagnabit! Frowny faces?! Who wrote this crap? I added a SPACE between the : and the ( to solve that little problem, but they should be eliminated from the Regex.

              modified on Friday, March 21, 2008 1:52 PM


                Dan Neely wrote (#9), in reply to PIEBALDconsult:

                PIEBALDconsult wrote:

                Dagnabit! Frowny faces?! Who wrote this crap?

                Paging Chris Maunder. :doh:

                Otherwise [Microsoft is] toast in the long term no matter how much money they've got. They would be already if the Linux community didn't have it's head so firmly up it's own command line buffer that it looks like taking 15 years to find the desktop. -- Matthew Faithfull


                  Steve Messer wrote (#10), in reply to PIEBALDconsult:

                  Hmm, unless the copy and paste messed something up, this is not creating a match for me.


                    PIEBALDconsult wrote (#11), in reply to Steve Messer:

                    With the SPACE between the : and the ( you need to use System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace to ignore the extraneous SPACEs, but then the # and everything after it becomes a comment! :mad: So now I've escaped the # as \x23, resulting in:

                    System.Text.RegularExpressions.Regex reg = new System.Text.RegularExpressions.Regex
                    (
                    @"^\s*LIC\x23\s*: (?'LIC'.*)YRMD\s*: (?'YRMD'.*)MAKE\s*: (?'MAKE'.*)BTM\s*: (?'BTM'.*)VIN\s: (?'VIN'.*)$"
                    ,
                    System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace
                    ) ;

                    Whoops, I had left out an asterisk I had meant to include: VIN\s*

                    modified on Friday, March 21, 2008 2:35 PM
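The comment pitfall described above can be checked with a small sketch. The class name, method names, and the shortened one-field input are assumptions made for illustration. Under `RegexOptions.IgnorePatternWhitespace` an unescaped `#` starts a comment that runs to the end of the pattern line, so the named group after it is never defined; writing the character as `\x23` keeps it literal.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternWhitespaceDemo
{
    // Shortened input (assumption): only the first field of the legacy line.
    public const string Line = "LIC#:ABC123";

    // Unescaped '#': everything from '#' onward is treated as a pattern comment.
    public static Match MatchWithRawHash()
    {
        return Regex.Match(Line, @"^LIC# \s* : (?'LIC'.*)$",
            RegexOptions.IgnorePatternWhitespace);
    }

    // '#' written as \x23: the whole pattern, including the named group, is used.
    public static Match MatchWithEscapedHash()
    {
        return Regex.Match(Line, @"^LIC\x23 \s* : (?'LIC'.*)$",
            RegexOptions.IgnorePatternWhitespace);
    }

    public static void Main()
    {
        Console.WriteLine("raw #: LIC group captured = {0}",
            MatchWithRawHash().Groups["LIC"].Success);
        Console.WriteLine("\\x23:  LIC group captured = {0}",
            MatchWithEscapedHash().Groups["LIC"].Success);
    }
}
```

Note that the first pattern still reports a match, because what remains of it is just `^LIC`; the failure only shows up as an empty group, which makes it easy to miss.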


                      ChrisKo 0 wrote (#12), in reply to PIEBALDconsult:

                      Quick little edit to get rid of the extra space that was being captured.

                      ^\s*LIC\x23\s*: (?'LIC'.*)\sYRMD\s*: (?'YRMD'.*)\sMAKE\s*: (?'MAKE'.*)\sBTM\s*: (?'BTM'.*)\sVIN\s: (?'VIN'.*)$
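Putting the pieces of this thread together, a runnable sketch might look like the following. It uses the pattern above with the `VIN\s*` asterisk restored and `RegexOptions.IgnorePatternWhitespace` set; the class name, the `Parse` method, and the choice to return the values as an array are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LegacyLineParser
{
    // Pattern from the thread; literal spaces after each colon are ignored
    // because of IgnorePatternWhitespace, and \x23 stands for '#'.
    private static readonly Regex LinePattern = new Regex(
        @"^\s*LIC\x23\s*: (?'LIC'.*)\sYRMD\s*: (?'YRMD'.*)\sMAKE\s*: (?'MAKE'.*)\sBTM\s*: (?'BTM'.*)\sVIN\s*: (?'VIN'.*)$",
        RegexOptions.IgnorePatternWhitespace);

    // Returns { LIC, YRMD, MAKE, BTM, VIN } values, or null if the line does not match.
    public static string[] Parse(string line)
    {
        Match m = LinePattern.Match(line);
        if (!m.Success)
        {
            return null;
        }
        return new[]
        {
            m.Groups["LIC"].Value,
            m.Groups["YRMD"].Value,
            m.Groups["MAKE"].Value,
            m.Groups["BTM"].Value,
            m.Groups["VIN"].Value
        };
    }

    public static void Main()
    {
        string[] values = Parse("LIC#:ABC123 YRMD:03 MAKE:CHEV BTM :CP VIN :1G1JC12F137230800");
        if (values == null)
        {
            Console.WriteLine("No match");
            return;
        }
        Console.WriteLine("LIC# = {0} YRMD = {1} MAKE = {2} BTM = {3} VIN = {4}",
            values[0], values[1], values[2], values[3], values[4]);
    }
}
```

Unlike the split-based approaches, this pattern depends on the tags appearing in a fixed order, so it only suits lines that always follow the LIC#, YRMD, MAKE, BTM, VIN layout.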


                        PIEBALDconsult wrote (#13), in reply to ChrisKo 0:

                        Hey, I was leaving that for the OP to do; I didn't want to solve the whole thing for him. :-D


                          ChrisKo 0 wrote (#14), in reply to PIEBALDconsult:

                          Sorry, I was bored and happened to have The Regulator open. At least now I can enjoy the weekend in knowing that I accomplished something today. :laugh:


                            Steve Messer wrote (#15), in reply to ChrisKo 0:

                            Thanks all, your comments and examples have been very enlightening.

                            modified on Friday, March 21, 2008 6:15 PM


                              PIEBALDconsult wrote (#16), in reply to Steve Messer:

                              Glad to be of service.
