Dirtying my hand with regular expression for first time

    tasumisra wrote (#1):

    Hello experts,

    This is my first attempt at using a regular expression on a string. I am partially successful, but I would like to have my approach validated. I have a sequence of characters like "ABC 34 DEX 456 NT 456 TEXT rt st NEWTEXT 4564". The all-caps words are identifiers and what follows each one is its value: the value of ABC is 34 and the value of DEX is 456.

    Problem: I need to replace the value of TEXT with *.
    1) The value can be null.
    2) The field name itself can change, e.g. TEXTVAL or TEXT:.
    3) There can be 2 or 3 spaces after TEXT.
    4) The value is at most 5 characters long and at least 2.
    5) The value can contain a space at any position.

    Considering all of the above, I concluded that it is hard to determine how many characters belong to the value, because a space can belong either to the value or to the field separator. So I have decided to always insert five stars (*****), the maximum length of a TEXT value. To achieve this I am using

    Quote:

    Regex _regex = new Regex(@"/TEXT/([a-z0-9\-\ ]+)\ $");

    for each of the fields TEXT, TEXT: and TEXTVAL, but I am not convinced by this approach. Can somebody help me here? Thanks, Tasu

    vikas da

      OriginalGriff wrote (#2):

      I'm not sure exactly what you are trying to do! Perhaps an example of your input and output strings would help? Preferably using "real" data, rather than a "mock up"? The trouble is that your pattern fixes the word "TEXT" and will detect it in two places in your example, so it's difficult to work out exactly what you are trying to achieve:

      TEXT rt st NEWTEXT 4564

      is one match, and

      TEXT 4564

      is also a match.
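
A minimal sketch of the point above, using the sample line from the question. The class and method names (`TextFieldProbe`, `CountMatches`) are made up for illustration. It shows that a bare `TEXT` also matches inside `NEWTEXT`, that a leading word boundary (`\bTEXT`) avoids this while still allowing `TEXTVAL` and `TEXT:`, and that the pattern from the question cannot match this input in .NET, because the `/` characters are treated as literal slashes rather than delimiters:

```csharp
using System;
using System.Text.RegularExpressions;

public static class TextFieldProbe
{
    // Sample line from the original question.
    public const string Sample = "ABC 34 DEX 456 NT 456 TEXT rt st NEWTEXT 4564";

    // Returns how many times the pattern matches the input.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }
}
```

With this helper, `CountMatches(Sample, "TEXT")` gives 2, `CountMatches(Sample, @"\bTEXT")` gives 1, and `CountMatches(Sample, @"/TEXT/([a-z0-9\-\ ]+)\ $")` gives 0, since the sample contains no `/`.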

      Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952) Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)

      "I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
      "Common sense is so rare these days, it should be classified as a super power" - Random T-shirt

        tasumisra wrote (#3):

        Perhaps I should have provided this earlier.

        Quote:

        "FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN 12345 ADDLINE1....."
        "FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN # 12345 CITY....."
        "FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN: 123 5 COUNTRY....."
        "FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN NUMBER ADDLINE1....."

        So in the samples above, the PIN value needs to be masked with * if it has a value. The PIN can be alphanumeric and can also contain a space. The streams above come from different screens, so the PIN label is not consistent: it can be PIN:, PIN #, PIN NUMBER or simply PIN. The field that follows (ADDLINE1 here) is not fixed either and can be CITY, COUNTRY, etc. I have to extract the value of PIN|PIN:|PIN #|PIN NUMBER and replace it with ***** in every scenario where it contains a value. Let me know if you still have any doubts.

          OriginalGriff wrote (#4):

          That's really quite nasty. If the PIN can only be numeric, then it's not too bad - but if it does contain alpha characters and spaces, then you can't find a PIN in the last example: ADDLINE1 could be PIN data... For numeric it's ok:

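          // Numeric PINs only: group 1 keeps the label (PIN, PIN #, PIN: or PIN NUMBER)
          // and its trailing whitespace; the value must start and end with a digit,
          // so "PIN NUMBER ADDLINE1" is left alone and the space after the PIN survives.
          public static Regex regex = new Regex(
          @"(PIN\s?(?:\#|:|NUMBER)?\s+)\d[\d\s]*\d",
          RegexOptions.Multiline
          | RegexOptions.CultureInvariant
          | RegexOptions.IgnorePatternWhitespace
          | RegexOptions.Compiled
          );
          // $1 already ends with whitespace, so no extra space is added before the mask.
          public static string regexReplace = "$1****";
          ...
          string result = regex.Replace(InputText,regexReplace);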

          But with alphanumerics? I'm not sure it can be done... But I do love Expresso - it makes working out and testing these things sooooo much easier!

            Richard Deeming wrote (#5):

            Based on your sample data and description, this pattern will match the three PIN numbers:

            (?<=(PIN|(PIN\s+\#)|(PIN:)|(PIN NUMBER))\s+)\b[\w\s]{2,5}\b

            You can then replace it with "*****" to mask the values:

            Regex pinNumberPattern = new Regex(@"(?<=(PIN|(PIN\s+\#)|(PIN:)|(PIN NUMBER))\s+)\b[\w\s]{2,5}\b", RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase);

            string input = @"FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN 12345 ADDLINE1.....
            FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN # 12345 CITY.....
            FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN: 123 5 COUNTRY.....
            FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN NUMBER ADDLINE1.....";

            string output = pinNumberPattern.Replace(input, "*****");

            /*
            output contains:
            FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN ***** ADDLINE1.....
            FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN # ***** CITY.....
            FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN: ***** COUNTRY.....
            FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN NUMBER ADDLINE1.....
            */
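
The pattern above can be wrapped in a small helper to try further inputs one line at a time. This is only a sketch: the names `PinMasker` and `Mask` are hypothetical, and the extra inputs in the note below are assumptions that do not appear in the sample data. They illustrate the ambiguity raised earlier in the thread: when the field after the label is short, the pattern cannot tell it apart from a PIN value.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PinMasker
{
    // Same pattern and options as in the post above: a lookbehind for the
    // label plus whitespace, then 2 to 5 word or whitespace characters
    // delimited by word boundaries.
    private static readonly Regex PinValue = new Regex(
        @"(?<=(PIN|(PIN\s+\#)|(PIN:)|(PIN NUMBER))\s+)\b[\w\s]{2,5}\b",
        RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase);

    // Replaces every match in the line with a fixed five-star mask.
    public static string Mask(string line)
    {
        return PinValue.Replace(line, "*****");
    }
}
```

`Mask("PIN 12345 ADDLINE1")` returns `"PIN ***** ADDLINE1"` and `Mask("PIN NUMBER ADDLINE1")` returns the input unchanged, as in the output shown above. However, `Mask("PIN NUMBER CITY")` returns `"PIN NUMBER *****"`, because a following field name of five characters or fewer fits the `{2,5}` quantifier, and `Mask("PIN 12 CITY")` returns `"PIN *****CITY"`, because the match absorbs the space after a short PIN. It may be worth checking real data for these cases before relying on the pattern.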


            "These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer

              tasumisra wrote (#6):

              Thanks, Richard Deeming and OriginalGriff. I will test the various scenarios and update you. Thank you so much for the help.
