Code Project

Help with regex HTML form validation Part 2

Tags: regex, html, database, testing, beta-testing
    robwm1
    #1

    Hi,

    Now I need a pattern to detect last name possibilities. I think this pattern will be slightly more complicated. Names that I see in the database are like:

    Jones
    Jones-Smith
    Jones Smith (no hyphen)
    O'Leary
    Van Allen (no hyphen)
    Vander Ark (no hyphen)

    I think that this pattern will work, but I would like public opinion to make sure I am getting this right:

    ^[a-zA-Z\-\s']+$

    Can you think of any last names where this will not work? In testing it seems to work out alright.

    Thanks, Rob
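
One way to check the pattern from this post is to run it against the sample names. Below is a minimal sketch in Python; the thread itself uses Expresso and VBScript, so Python is an assumption here. The helper name `is_valid` and the extra inputs are hypothetical and were not part of the original question.

```python
import re

# Pattern quoted from the post: ASCII letters, hyphen, whitespace, apostrophe.
LAST_NAME = re.compile(r"^[a-zA-Z\-\s']+$")

# Names listed in the post.
samples = ["Jones", "Jones-Smith", "Jones Smith", "O'Leary", "Van Allen", "Vander Ark"]

# Hypothetical extra inputs (not from the thread) that probe the edges.
extras = ["Müller", "Smith3", "", "   "]


def is_valid(name: str) -> bool:
    """Return True when the whole string consists of allowed characters."""
    return LAST_NAME.match(name) is not None


for name in samples + extras:
    print(f"{name!r}: {is_valid(name)}")
```

All six names from the post pass. A name with an accented letter, a name with a digit, and the empty string are rejected. A string made only of spaces is accepted, because `\s` sits in the same character class as the letters.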

      Matt T Heffron
      #2

      Don't forget the characters that include diacritical marks. E.g., ö Å ç

      A positive attitude may not solve every problem, but it will annoy enough people to be worth the effort.

        robwm1
        #3

        Is there a way to check for that without having to list every Unicode character? I didn't see any accented names in our database but that certainly doesn't mean it can't happen in the future. I'd prefer to not include all Unicode characters. Just the ones with a high likelihood of showing up. I imagine that it could only be characters that would be accepted by Active Directory.

          Matt T Heffron
          #4

          At least with the .NET Regex class (http://msdn.microsoft.com/en-us/library/20bw873z(v=vs.110).aspx#CategoryOrBlock; I don't know about others), you can specify the Unicode character category (for "Letter"), so your regex would be:

          ^[\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Lm}\-\s']+$

          possibly even just

          ^[\p{L}\-\s']+$
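
For comparison outside .NET: Python's built-in `re` module does not accept `\p{L}`, so the sketch below approximates "any letter" with `[^\W\d_]` (a word character that is neither a decimal digit nor an underscore). This is an assumption-laden stand-in, not an exact equivalent of the Unicode "Letter" category. The function name `is_valid_unicode` and the NFC normalization step are illustrative choices that the thread does not discuss.

```python
import re
import unicodedata

# Approximation of ^[\p{L}\-\s']+$ for Python's built-in re module:
# [^\W\d_] stands in for "letter"; the second alternative allows hyphen,
# whitespace, and apostrophe, as in the original pattern.
UNICODE_LAST_NAME = re.compile(r"^(?:[^\W\d_]|[-\s'])+$")


def is_valid_unicode(name: str) -> bool:
    """Validate a last name after composing accents into single characters."""
    # NFC turns "e" + combining acute accent into the single letter "é",
    # so decomposed input is judged the same way as precomposed input.
    normalized = unicodedata.normalize("NFC", name)
    return UNICODE_LAST_NAME.match(normalized) is not None


for name in ["Müller", "Åkesson", "François", "O'Leary", "Smith3", "Jones_Smith"]:
    print(f"{name!r}: {is_valid_unicode(name)}")
```

The accented names and `O'Leary` pass, while the names containing a digit or an underscore are rejected.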

            robwm1
            #5

            After looking at that link, a person could go crazy trying to catch every possibility. Looks like regex can be very thorough! Thanks for the help!

              Matt T Heffron
              #6

              Yes!! There's a reason the "Mastering Regular Expressions" book is 496 pages!!!

                robwm1
                #7

                How would you allow a period only at the end of the string, for the case where a name ends in Jr. or Sr.? A period wouldn't normally appear in any other position in a last name.

                I'm going with the pattern below so far. I'm double-checking names in Active Directory, but I'm reasonably sure you can't use diacritical characters. I need to research that to be certain.

                ^[a-zA-Z\-\s']+$

                  Matt T Heffron
                  #8

                  ^[a-zA-Z\-\s']+\.$

                  Add the \. right before the $
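
A quick check of this pattern is worth running on a name both with and without the suffix, since an unquantified `\.` before `$` matches exactly one period. The sketch below uses Python as an assumed test harness (the thread uses Expresso and VBScript); the variable name `required_dot` is hypothetical.

```python
import re

# Pattern from this reply: the escaped period is not quantified,
# so exactly one period must precede the end of the string.
required_dot = re.compile(r"^[a-zA-Z\-\s']+\.$")

for name in ["Jones Sr.", "Jones"]:
    print(f"{name!r}: {required_dot.match(name) is not None}")
```

`Jones Sr.` matches, but plain `Jones` does not, so names without a trailing period are rejected by this form.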

                    robwm1
                    #9

                    That works perfectly. I'm really starting to get the hang of this.

                      Matt T Heffron
                      #10

                      I'd be awfully surprised if the only characters allowed in Active Directory worldwide are the basic ASCII-ish letters.

                        robwm1
                        #11

                        I know we have at least one person who has an accented 'e' in their last name, but it's not that way in Active Directory. I don't know whether that is because the person making the entry didn't know how to make the accented character or because it was disallowed. I'll definitely research to be sure before I make a final decision to leave it out. I will post my findings here.

                          Matt T Heffron
                          #12

                          Check out the Expresso tool (free) to explore regular expressions!

                            robwm1
                            #13

                            Right, that is actually the tool I'm using. I bumped into it a couple of years ago but this is the first time I ever used regex.

                              robwm1
                              #14

                              Well, this pattern was working yesterday on a different computer at work. I installed Expresso on my personal computer so I could work on my project over the weekend and now the pattern is not working.

                              ^[a-zA-Z\-\s']+\.$

                              john1 = no matches

                              The pattern should match the number one because numbers are not allowed but the results are blank when I run this pattern. I could have sworn that this was working yesterday.

                              EDIT: I did some further testing and discovered that the \. is breaking the pattern. If there is no period at the end, then count = 0. This pattern seems to require the period at the end and then it works correctly. The period should be allowed 0 or 1 times at the end of the string. So the pattern below is working the way I want it to in Expresso but not when I use it in an HTA using VBScript to do the pattern matching. VBScript is throwing an error at the line where the pattern is executed.

                              ^[a-zA-Z\-\s']+?\.$

                              Not sure how to make a pattern that works in Expresso to also work with VBScript.

                              SOLUTION:

                              ^[a-zA-Z\-\s']+?\.$

                              This pattern works when testing in Expresso but doesn't work with VBScript, although this may work when used with other languages.

                              ^[a-zA-Z\-\s']+\.{0,1}$

                              This is the pattern that behaves the same way as the pattern above but also works with VBScript.

                              MATCHES:

                              Jones
                              Jones-Smith
                              Jones Smith (no hyphen)
                              O'Leary
                              Van Allen (no hyphen)
                              Vander Ark (no hyphen)
                              Jones Sr.

                              Although this doesn't address diacritical characters, a few conversations with colleagues resulted in the decision that the risk is very low that they will be used in Active Directory. We currently have only 3 techs making entries into AD so informing them of how this pattern works will reduce the risk even further. I have worked for my organization for 14 years and no diacritical characters have been used until now so I feel pretty safe in not testing for them. It may not be the ultimate approach such as selling a product to the public but it does meet the needs of the specifications that were given to me.

                              Thank you! - I'd like to give a shout out to everyone who helped me out with this project! I really appreciate all of you taking the time to steer me in the right direction! I would go as far as to say that CodeProject could be just as valuable as sitting in any classroom. You may not get a certification here but the knowledge gained is invaluable. I was able to gain a solid understanding of regex in a matter of a few hours. I watched several videos but I wo

                                Matt T Heffron
                                #15

                                robwm1 wrote:

                                ^[a-zA-Z\-\s']+?\.$

                                This was so.... close. When I suggested the \. I forgot the conditional aspect of the dot at the end. (Sorry.) Just move the ? to be after the \.

                                ^[a-zA-Z\-\s']+\.?$

                                the ? means exactly the same thing as {0,1}
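
The difference the position of `?` makes can be seen by running the three variants from this thread side by side. The sketch below uses Python as an assumed test harness, not the VBScript engine from the thread; the constant names, the `matches` helper, and the `Jones..` input are hypothetical.

```python
import re

# "+?" makes the preceding "+" lazy; the period itself is still required.
LAZY_PLUS = re.compile(r"^[a-zA-Z\-\s']+?\.$")
# "\.?" makes the period optional (zero or one occurrence).
OPTIONAL_Q = re.compile(r"^[a-zA-Z\-\s']+\.?$")
# "\.{0,1}" spells out the same zero-or-one quantifier explicitly.
OPTIONAL_BRACES = re.compile(r"^[a-zA-Z\-\s']+\.{0,1}$")


def matches(pattern: re.Pattern, name: str) -> bool:
    """Return True when the pattern accepts the whole string."""
    return pattern.match(name) is not None


for name in ["Jones", "Jones Sr.", "john1", "Jones.."]:
    row = [matches(p, name) for p in (LAZY_PLUS, OPTIONAL_Q, OPTIONAL_BRACES)]
    print(f"{name!r}: {row}")
```

In this sketch the lazy form rejects `Jones` because it still demands a period, while the `\.?` and `\.{0,1}` forms agree on every input: both accept `Jones` and `Jones Sr.` and both reject `john1` and `Jones..`.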

                                  robwm1
                                  #16

                                  I never thought to move the ? to the end. You're right though, it is the same result as {0,1}. Thanks again!
