Getting my hands dirty with regular expressions for the first time
-
Hello experts, this is my first attempt at using a regular expression on a sequence of strings. I am partially successful but want to get things validated. I have a sequence of characters like "ABC 34 DEX 456 NT 456 TEXT rt st NEWTEXT 4564". All-caps words are identifiers and what follows each one is its value, e.g. the value of ABC is 34 and the value of DEX is 456.
Problem: I need to replace the value of TEXT with *, given that:
1) it can be null;
2) the identifier itself can change, e.g. TEXTVAL or TEXT:;
3) there could be 2 or 3 spaces after TEXT;
4) the value is at least 2 and at most 5 characters long;
5) the value can contain spaces at any position.
Considering all of the above, I concluded that it would be hard to work out how many characters the value has, since a space can belong either to the value or to the gap before the next field. So I have decided to always insert five stars (*****), the maximum length of the TEXT value. To achieve this I am using:
Regex _regex = new Regex(@"/TEXT/([a-z0-9\-\ ]+)\ $");
for each of the fields TEXT, TEXT: and TEXTVAL, but I am not very convinced by this approach. Can somebody help me here? Thanks, Tasu
vikas da
-
I'm not sure exactly what you are trying to do! Perhaps an example of your input and output strings would help, preferably using "real" data rather than a mock-up? The trouble is that your pattern is keyed on the literal word "TEXT", which appears in two places in your example, so it's difficult to work out exactly what you are trying to achieve:
TEXT rt st NEWTEXT 4564
is one match, and
TEXT 4564
is also a match.
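A minimal sketch of how a word boundary separates the two cases, assuming identifiers are always separated from their neighbours by whitespace as in the example string. The class and method names (TextFieldMasker, CountMatches, MaskTextValue) are hypothetical. A plain "TEXT" pattern also finds the tail of NEWTEXT; putting \b in front restricts it to the standalone identifier. (In .NET the pattern is passed as a plain string, not wrapped in /.../ delimiters, so the slashes in the original attempt are matched as literal characters.)

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the word-boundary idea; not code from the thread.
public static class TextFieldMasker
{
    // Counts how many times a pattern occurs in the input.
    public static int CountMatches(string input, string pattern) =>
        Regex.Matches(input, pattern).Count;

    // Group 1 is the label: TEXT, TEXT: or TEXTVAL plus the following spaces.
    // \b before TEXT stops NEWTEXT from matching. The value is 2-5 characters,
    // starts and ends on a word character, and may contain spaces in between.
    private static readonly Regex TextValue =
        new Regex(@"(\bTEXT(?:VAL|:)?\s+)\w[\w ]{0,3}\w\b");

    // Keeps the label and replaces the value with five stars.
    public static string MaskTextValue(string input) =>
        TextValue.Replace(input, "$1*****");
}
```

With the example string, CountMatches(s, "TEXT") returns 2 while CountMatches(s, @"\bTEXT\b") returns 1, and MaskTextValue(s) gives "ABC 34 DEX 456 NT 456 TEXT ***** NEWTEXT 4564". One limitation: if TEXT is empty and the next identifier is 2 to 5 characters long (e.g. "TEXT NT 456"), that identifier is masked instead.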
Those who fail to learn history are doomed to repeat it. --- George Santayana (December 16, 1863 – September 26, 1952) Those who fail to clear history are doomed to explain it. --- OriginalGriff (February 24, 1959 – ∞)
-
Perhaps I should have provided this before:
"FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN 12345 ADDLINE1....."
"FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN # 12345 CITY....."
"FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN: 123 5 COUNTRY....."
"FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN NUMBER ADDLINE1....."
In the above, the PIN value needs to be masked with * if it has a value. The PIN can be alphanumeric and can also contain spaces. The streams above come from different screens, so the PIN label is not consistent: it can be PIN:, PIN #, PIN NUMBER or simply PIN. The field after the PIN (ADDLINE1 here) is not fixed either; it can be CITY, COUNTRY, etc. So I have to extract the value after PIN|PIN:|PIN #|PIN NUMBER and replace it with ***** in every case where there is a value. Let me know if you still have any doubts.
vikas da
-
That's really quite nasty. If the PIN can only be numeric, then it's not too bad, but if it can contain letters and spaces, then you can't reliably find the PIN in the last example: ADDLINE1 could itself be PIN data. For numeric PINs it's OK:
using System.Text.RegularExpressions;
// Numeric PINs only. Group 1 keeps the label (PIN, PIN:, PIN # or PIN NUMBER);
// group 2 is the value: digits, possibly with spaces between them (e.g. "123 5").
public static Regex regex = new Regex(
    @"(\bPIN\s*(?:#|:|NUMBER)?\s*)(\d(?:[\d ]*\d)?)",
    RegexOptions.CultureInvariant
    | RegexOptions.Compiled
);
// Keep the label and replace the value with five stars.
public static string regexReplace = "$1*****";
...
string result = regex.Replace(InputText, regexReplace);
But with alphanumerics? I'm not sure it can be done... But I do love Expresso: it makes working out and testing these things so much easier!
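To see why letters make this hard, here is a minimal sketch of a naive alphanumeric version (the class NaivePinFinder and its pattern are hypothetical, not from the thread). It assumes upper-case letters, digits and spaces, 2 to 5 characters, after any of the four labels. On the fourth sample, where PIN NUMBER has no value, it has no way to tell the next field name from a PIN and captures the first five characters of ADDLINE1.

```csharp
using System.Text.RegularExpressions;

// Hypothetical naive pattern: one of the four labels, then 2-5 upper-case
// letters, digits or spaces taken as the "value".
public static class NaivePinFinder
{
    private static readonly Regex Pin = new Regex(
        @"\bPIN(?:\s*#|:|\s+NUMBER)?\s+([A-Z0-9 ]{2,5})");

    // Returns the captured value, or an empty string when no label is found.
    public static string FindValue(string record)
    {
        Match m = Pin.Match(record);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

For the first three samples this returns "12345", "12345" and "123 5", but for the fourth it returns "ADDLI", which is exactly the ambiguity described above.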
-
Based on your sample data and description, this pattern will match the three PIN numbers:
(?<=(PIN|(PIN\s+\#)|(PIN:)|(PIN NUMBER))\s+)\b\w[\w\s]{0,3}\w\b
You can then replace it with "*****" to mask the values:
using System.Text.RegularExpressions;
// The value must start and end on a word character (2-5 characters, spaces allowed
// in between), so a short PIN does not pull the following space into the match.
Regex pinNumberPattern = new Regex(@"(?<=(PIN|(PIN\s+\#)|(PIN:)|(PIN NUMBER))\s+)\b\w[\w\s]{0,3}\w\b", RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase);
string input = @"FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN 12345 ADDLINE1.....
FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN # 12345 CITY.....
FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN: 123 5 COUNTRY.....
FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN NUMBER ADDLINE1.....";
string output = pinNumberPattern.Replace(input, "*****");
/*
output contains:
FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN ***** ADDLINE1.....
FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN # ***** CITY.....
FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN: ***** COUNTRY.....
FIRSTNAME Tasu LASTNAME Mishra DOB 02011982 PIN NUMBER ADDLINE1.....
*/
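A caveat worth checking with short values: a [\w\s]{2,5} run followed by \b can end on the space before the next field, so a PIN of 2 to 4 characters drags that space into the match and the stars run into the next identifier. Below is a minimal comparison sketch (the class name PinMaskComparison and the test input "PIN 12 CITY" are hypothetical) against a variant that makes the value start and end on a word character.

```csharp
using System.Text.RegularExpressions;

public static class PinMaskComparison
{
    // Value is any run of 2-5 word/space characters ending at a word boundary.
    private static readonly Regex Loose = new Regex(
        @"(?<=(PIN|(PIN\s+\#)|(PIN:)|(PIN NUMBER))\s+)\b[\w\s]{2,5}\b",
        RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase);

    // Same lookbehind, but the value must begin and end on a word character.
    private static readonly Regex Tight = new Regex(
        @"(?<=(PIN|(PIN\s+\#)|(PIN:)|(PIN NUMBER))\s+)\b\w[\w\s]{0,3}\w\b",
        RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase);

    public static string MaskLoose(string input) => Loose.Replace(input, "*****");
    public static string MaskTight(string input) => Tight.Replace(input, "*****");
}
```

MaskLoose("PIN 12 CITY") yields "PIN *****CITY" because the match is "12 ", while MaskTight yields "PIN ***** CITY". On the four sample lines both give the same result; the difference shows up when a PIN shorter than five characters is followed by another field.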
"These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer