Help with regex HTML form validation Part 2
-
Don't forget the characters that include diacritical marks. E.g., ö Å ç
A positive attitude may not solve every problem, but it will annoy enough people to be worth the effort.
Is there a way to check for that without having to list every Unicode character? I didn't see any accented names in our database but that certainly doesn't mean it can't happen in the future. I'd prefer to not include all Unicode characters. Just the ones with a high likelihood of showing up. I imagine that it could only be characters that would be accepted by Active Directory.
-
At least with the .NET Regex http://msdn.microsoft.com/en-us/library/20bw873z(v=vs.110).aspx#CategoryOrBlock (I don't know about others) you can specify the Unicode character category (for "Letter"), so your regex would be:
^[\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Lm}\-\s']+$
possibly even just
^[\p{L}\-\s']+$
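For anyone who wants to try the "any Unicode letter" idea outside .NET, here is a minimal sketch in Python (my choice of language, not something used in this thread). Python's built-in re module does not support \p{L}, so the sketch checks each character's Unicode general category with the standard unicodedata module instead. The function name is_plausible_name is hypothetical, and the NFC normalization step is an extra assumption so that a letter typed as a base letter plus a combining mark is still treated as one letter.

```python
import unicodedata

# Punctuation allowed in addition to letters and whitespace,
# mirroring the hyphen and apostrophe in the pattern above.
ALLOWED_PUNCTUATION = {"-", "'"}


def is_plausible_name(text: str) -> bool:
    """Rough equivalent of ^[\\p{L}\\-\\s']+$ without a regex engine.

    Returns True when the text is non-empty and every character is a
    Unicode letter (general category starting with "L"), a hyphen,
    an apostrophe, or whitespace.
    """
    # Assumption: compose "o" + combining diaeresis into a single letter first.
    text = unicodedata.normalize("NFC", text)
    if not text:
        return False
    for ch in text:
        if unicodedata.category(ch).startswith("L"):
            continue  # Lu, Ll, Lt, Lm, Lo all count as letters
        if ch in ALLOWED_PUNCTUATION or ch.isspace():
            continue
        return False
    return True


if __name__ == "__main__":
    for name in ["Åström", "O'Leary", "Jones-Smith", "john1"]:
        print(name, is_plausible_name(name))
```

Like the regex, this accepts every letter category, so it is broader than "only the characters likely to show up"; narrowing it further would mean listing specific ranges.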
-
After looking at that link, a person could go crazy trying to catch every possibility. Looks like regex can be very thorough! Thanks for the help!
Yes!! There's a reason the "Mastering Regular Expressions" book is 496 pages!!!
-
How would you allow a period only at the end of the string, for the case where a name ends in Jr. or Sr.? A period wouldn't normally appear in any other position in a last name. I'm going with the pattern below so far. I'm double-checking names in Active Directory, but I'm reasonably sure you can't use diacritical characters. I need to research that to be certain.
^[a-zA-Z\-\s']+$
-
^[a-zA-Z\-\s']+\.$
Add the \. right before the $
-
I'd be awfully surprised if the only characters allowed in Active Directory worldwide are the basic ASCII-ish letters.
-
I know we have at least one person who has an accented 'e' in their last name, but it's not that way in Active Directory. I don't know whether that is because the person making the entry didn't know how to type the accented character or because it was disallowed. I'll definitely research this to be sure before I make a final decision to leave it out. I will post my findings here.
-
Well, this pattern was working yesterday on a different computer at work. I installed Expresso on my personal computer so I could work on my project over the weekend, and now the pattern is not working.
^[a-zA-Z\-\s']+\.$
john1 = no matches
The pattern should match the number one because numbers are not allowed, but the results are blank when I run this pattern. I could have sworn that this was working yesterday.
EDIT: I did some further testing and discovered that the \. is breaking the pattern. If there is no period at the end, then count = 0. This pattern seems to require the period at the end, and then it works correctly. The period should be allowed 0 or 1 times at the end of the string. So the pattern below is working the way I want it to in Expresso, but not when I use it in an HTA using VBScript to do the pattern matching. VBScript is throwing an error at the line where the pattern is executed.
^[a-zA-Z\-\s']+?\.$
Not sure how to make a pattern that works in Expresso also work with VBScript.
SOLUTION:
^[a-zA-Z\-\s']+?\.$
This pattern works when testing in Expresso but doesn't work with VBScript, although it may work when used with other languages.
^[a-zA-Z\-\s']+\.{0,1}$
This is the pattern that behaves the same way as the pattern above but also works with VBScript.
MATCHES:
Jones
Jones-Smith
Jones Smith (no hyphen)
O'Leary
Van Allen (no hyphen)
Vander Ark (no hyphen)
Jones Sr.
Although this doesn't address diacritical characters, a few conversations with colleagues resulted in the decision that the risk is very low that they will be used in Active Directory. We currently have only 3 techs making entries into AD, so informing them of how this pattern works will reduce the risk even further. I have worked for my organization for 14 years and no diacritical characters have been used until now, so I feel pretty safe in not testing for them. It may not be the ultimate approach, such as you would want when selling a product to the public, but it does meet the needs of the specifications that were given to me. Thank you!
-
I'd like to give a shout out to everyone who helped me out with this project! I really appreciate all of you taking the time to steer me in the right direction! I would go as far as to say that CodeProject could be just as valuable as sitting in any classroom. You may not get a certification here, but the knowledge gained is invaluable. I was able to gain a solid understanding of regex in a matter of a few hours. I watched several videos but I wo
-
robwm1 wrote:
^[a-zA-Z\-\s']+?\.$
This was so... close. When I suggested the \. I forgot the conditional aspect of the dot at the end. (Sorry.) Just move the ? to be after the \.
^[a-zA-Z\-\s']+\.?$
The ? means exactly the same thing as {0,1}
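To see the difference between the variants discussed in this thread, here is a small sketch in Python (an assumption on my part; the thread itself used Expresso and VBScript, and engines can differ in details). It runs the sample names from the MATCHES list, plus two names that should be rejected, through each pattern. The constant names and the helper accepted are made up for illustration.

```python
import re

# The four variants that came up in the thread.
REQUIRED_DOT = re.compile(r"^[a-zA-Z\-\s']+\.$")         # period is mandatory
LAZY_REQUIRED_DOT = re.compile(r"^[a-zA-Z\-\s']+?\.$")   # +? only makes + lazy
OPTIONAL_DOT = re.compile(r"^[a-zA-Z\-\s']+\.?$")        # period 0 or 1 times
ZERO_OR_ONE_DOT = re.compile(r"^[a-zA-Z\-\s']+\.{0,1}$")  # same as \.?

SAMPLES = [
    "Jones",
    "Jones-Smith",
    "Jones Smith",
    "O'Leary",
    "Van Allen",
    "Vander Ark",
    "Jones Sr.",
    "john1",     # digit, should be rejected
    "Jr.Jones",  # period not at the end, should be rejected
]


def accepted(pattern, names):
    """Return the names that the anchored pattern accepts."""
    return [name for name in names if pattern.match(name)]


if __name__ == "__main__":
    for label, pattern in [
        ("required dot", REQUIRED_DOT),
        ("lazy +? with required dot", LAZY_REQUIRED_DOT),
        ("optional dot \\.?", OPTIONAL_DOT),
        ("optional dot \\.{0,1}", ZERO_OR_ONE_DOT),
    ]:
        print(label, "->", accepted(pattern, SAMPLES))
```

In this sketch a ? placed after the + changes how the + matches (lazy instead of greedy) and leaves the period mandatory, while a ? placed after the \. is what makes the period optional.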