Code Project

Help with regex HTML form validation Part 2

Forum: Regular Expressions
Tags: regex, html, database, testing, beta-testing
  robwm1 wrote (#1):

    Hi,

    Now I need a pattern to detect last-name possibilities. I think this pattern will be slightly more complicated. Names that I see in the database look like:

      • Jones
      • Jones-Smith
      • Jones Smith (no hyphen)
      • O'Leary
      • Van Allen (no hyphen)
      • Vander Ark (no hyphen)

    I think this pattern will work, but I would like other opinions to make sure I am getting it right:

    ^[a-zA-Z\-\s']+$

    Can you think of any last names where this will not work? In testing, it seems to work out all right.

    Thanks,
    Rob
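A quick way to check the pattern against the sample names is a small test harness. The sketch below uses Python's `re` module as a stand-in for Expresso/.NET and VBScript; for an ASCII character class like this one the engines behave essentially the same on ASCII input. The helper name `is_valid` is hypothetical. It also shows that the class restricts which characters may appear but not how they are arranged: digits are rejected, yet strings made only of hyphens or spaces, or containing a tab (`\s` matches any whitespace), pass.

```python
import re

# Rob's pattern from post #1, unchanged: ASCII letters, hyphen, any
# whitespace character (\s) and apostrophe, one or more times, anchored
# at both ends so the whole string must consist of these characters.
LAST_NAME = re.compile(r"^[a-zA-Z\-\s']+$")


def is_valid(name: str) -> bool:
    """Return True if every character of name is in the allowed class."""
    return LAST_NAME.match(name) is not None


if __name__ == "__main__":
    samples = ["Jones", "Jones-Smith", "Jones Smith",
               "O'Leary", "Van Allen", "Vander Ark"]
    for name in samples:
        print(f"{name!r:15} {is_valid(name)}")

    # The class limits which characters appear, not how they are arranged:
    # digits fail, but separator-only strings and tabs pass.
    for odd in ["John1", "---", "   ", "Jones\tSmith"]:
        print(f"{odd!r:15} {is_valid(odd)}")
```

If separator-only input is a concern, one possible tightening is to require a letter first, e.g. `^[a-zA-Z][a-zA-Z\-\s']*$`, and to use a literal space instead of `\s` to keep tabs and line breaks out.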

    Matt T Heffron wrote (#2):

    Don't forget characters with diacritical marks, e.g. ö, Å, ç.

    A positive attitude may not solve every problem, but it will annoy enough people to be worth the effort.

      robwm1 wrote (#3):

      Is there a way to check for that without having to list every Unicode character? I didn't see any accented names in our database, but that certainly doesn't mean they can't appear in the future. I'd prefer not to include all Unicode characters, just the ones with a high likelihood of showing up. I imagine it could only be characters that Active Directory would accept.

        Matt T Heffron wrote (#4):

        At least with .NET regular expressions (http://msdn.microsoft.com/en-us/library/20bw873z(v=vs.110).aspx#CategoryOrBlock; I don't know about other engines), you can specify a Unicode character category (here, the "Letter" categories), so your regex would be:

        ^[\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Lm}\-\s']+$

        possibly even just

        ^[\p{L}\-\s']+$
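Python's built-in `re` module does not support `\p{L}` or the other category escapes listed above (the third-party `regex` package does), so this sketch approximates "any Unicode letter" with `[^\W\d_]`, i.e. word characters that are neither digits nor underscores. That is a stand-in, not an exact equivalent of `\p{L}`: it also admits a few non-decimal numeric characters such as `²`. The helper names are hypothetical. The NFC normalization step matters because a name typed as `e` followed by a combining accent is otherwise rejected; the same applies to `\p{L}`, since combining marks (category Mn) are not letters.

```python
import re
import unicodedata

# Rob's ASCII-only pattern from post #1, for comparison.
ASCII_NAME = re.compile(r"^[a-zA-Z\-\s']+$")

# Python's built-in re has no \p{L}. [^\W\d_] ("word characters that are not
# digits or underscore") is a common approximation of "any Unicode letter";
# the alternation then adds the hyphen, whitespace and apostrophe.
UNICODE_NAME = re.compile(r"^(?:[^\W\d_]|[\-\s'])+$")


def is_valid_ascii(name: str) -> bool:
    return ASCII_NAME.match(name) is not None


def is_valid_unicode(name: str) -> bool:
    # NFC composes "e" + U+0301 (combining acute) into the single letter "é";
    # a bare combining mark is not a word character for re, so decomposed
    # input would otherwise be rejected.
    return UNICODE_NAME.match(unicodedata.normalize("NFC", name)) is not None


if __name__ == "__main__":
    for name in ["O'Leary", "Müller", "Ångström", "Çelik", "Jose\u0301", "Jones1"]:
        print(f"{name!r:12} ascii={is_valid_ascii(name)} unicode={is_valid_unicode(name)}")
```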

          robwm1 wrote (#5):

          After looking at that link, a person could go crazy trying to catch every possibility. Looks like regex can be very thorough! Thanks for the help!

            Matt T Heffron wrote (#6):

            Yes!! There's a reason the "Mastering Regular Expressions" book is 496 pages!!! :omg:

              robwm1 wrote (#7):

              How would you allow a period only at the end of the string, for cases where a name ends in Jr. or Sr.? A period wouldn't normally appear in any other position in a last name. I'm going with the pattern below so far. I'm double-checking names in Active Directory, but I'm reasonably sure you can't use diacritical characters; I need to research that to be certain.

              ^[a-zA-Z\-\s']+$

                Matt T Heffron wrote (#8):

                ^[a-zA-Z\-\s']+\.$

                Add the \. right before the $
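Note that in this pattern the `\.` has no quantifier, so the period is mandatory rather than optional; this is what surfaces later in the thread. A small Python sketch (Python's `re` treats this pattern the same way; the helper name `matches` is hypothetical) makes the effect visible: names with the suffix pass, but a plain "Jones" no longer does.

```python
import re

# Pattern from post #8. The \. has no quantifier after it, so exactly one
# literal period is required immediately before the end of the string.
WITH_PERIOD = re.compile(r"^[a-zA-Z\-\s']+\.$")


def matches(name: str) -> bool:
    return WITH_PERIOD.match(name) is not None


if __name__ == "__main__":
    for name in ["Jones Sr.", "Jones Jr.", "Jones", "Jones..", "Jones. Sr"]:
        print(f"{name!r:12} {matches(name)}")
```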

                  robwm1 wrote (#9):

                  That works perfectly. I'm really starting to get the hang of this.

                    Matt T Heffron wrote (#10):

                    I'd be awfully surprised if the only characters allowed in Active Directory worldwide are the basic ASCII-ish letters.

                      robwm1 wrote (#11):

                      I know we have at least one person who has an accented 'e' in their last name, but it's not entered that way in Active Directory. I don't know whether that is because the person making the entry didn't know how to type the accented character or because it was disallowed. I'll definitely research this before I make a final decision to leave it out, and I will post my findings here.

                        Matt T Heffron wrote (#12):

                        Check out the free Expresso tool to explore regular expressions!

                          robwm1 wrote (#13):

                          Right, that is actually the tool I'm using. I bumped into it a couple of years ago, but this is the first time I've ever used regex.

                            robwm1 wrote (#14):

                            Well, this pattern was working yesterday on a different computer at work. I installed Expresso on my personal computer so I could work on my project over the weekend, and now the pattern is not working.

                            ^[a-zA-Z\-\s']+\.$

                            john1 = no matches

                            The pattern should match the number one, because numbers are not allowed, but the results are blank when I run this pattern. I could have sworn that this was working yesterday.

                            EDIT: I did some further testing and discovered that the \. is breaking the pattern. If there is no period at the end, then count = 0. This pattern seems to require the period at the end, and then it works correctly. The period should be allowed zero or one time at the end of the string.

                            So the pattern below is working the way I want it to in Expresso, but not when I use it in an HTA that uses VBScript to do the pattern matching. VBScript throws an error at the line where the pattern is executed.

                            ^[a-zA-Z\-\s']+?\.$

                            I'm not sure how to make a pattern that works in Expresso also work with VBScript.

                            SOLUTION:

                            ^[a-zA-Z\-\s']+?\.$

                            This pattern works when testing in Expresso but doesn't work with VBScript, although it may work in other languages.

                            ^[a-zA-Z\-\s']+\.{0,1}$

                            This is the pattern that behaves the same way as the pattern above but also works with VBScript.

                            MATCHES:
                              • Jones
                              • Jones-Smith
                              • Jones Smith (no hyphen)
                              • O'Leary
                              • Van Allen (no hyphen)
                              • Vander Ark (no hyphen)
                              • Jones Sr.

                            Although this doesn't address diacritical characters, a few conversations with colleagues led to the decision that the risk of them being used in Active Directory is very low. We currently have only three techs making entries in AD, so informing them of how this pattern works will reduce the risk even further. I have worked for my organization for 14 years and no diacritical characters have been used until now, so I feel pretty safe in not testing for them. It may not be the ultimate approach for something like a product sold to the public, but it does meet the specifications I was given.

                            Thank you! I'd like to give a shout-out to everyone who helped me out with this project! I really appreciate all of you taking the time to steer me in the right direction! I would go as far as to say that CodeProject could be just as valuable as sitting in any classroom. You may not get a certification here, but the knowledge gained is invaluable. I was able to gain a solid understanding of regex in a matter of a few hours. I watched several videos but I wo
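The lazy-quantifier observation can be checked directly. A lazy `+?` only changes the order in which the engine tries lengths for the preceding `+`; it does not make the following `\.` optional, so `+?\.$` accepts exactly the same strings as `+\.$`. This sketch uses Python's `re` as a stand-in for the Expresso and VBScript engines and compares the three patterns from this post; the sample list and the `results` helper are illustrative, not from the thread.

```python
import re

# The three patterns from post #14.
GREEDY = re.compile(r"^[a-zA-Z\-\s']+\.$")
LAZY = re.compile(r"^[a-zA-Z\-\s']+?\.$")
OPTIONAL_DOT = re.compile(r"^[a-zA-Z\-\s']+\.{0,1}$")

SAMPLES = ["Jones", "Jones-Smith", "O'Leary", "Van Allen", "Jones Sr.", "john1", "Jones.."]


def results(pattern: re.Pattern) -> list:
    """Match each sample against pattern; True means the whole string is accepted."""
    return [pattern.match(s) is not None for s in SAMPLES]


if __name__ == "__main__":
    # \. is mandatory in both GREEDY and LAZY, so they agree on every input;
    # only \.{0,1} actually makes the trailing period optional.
    print("greedy ", results(GREEDY))
    print("lazy   ", results(LAZY))
    print("{0,1}  ", results(OPTIONAL_DOT))
```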

                              Matt T Heffron wrote (#15):

                              robwm1 wrote:

                              ^[a-zA-Z\-\s']+?\.$

                              This was so... close. When I suggested the \. I forgot the conditional aspect of the dot at the end. (Sorry.) Just move the ? to after the \.

                              ^[a-zA-Z\-\s']+\.?$

                              The ? means exactly the same thing as {0,1}.
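The equivalence can be checked mechanically. This Python sketch (an illustration, not the thread's VBScript code) runs both spellings over the same inputs. It also includes a hypothetical stricter variant, not suggested in the thread, that accepts a period only as part of a trailing " Jr." or " Sr.", since `\.?` on its own also lets through something like "Jones.".

```python
import re

QUESTION = re.compile(r"^[a-zA-Z\-\s']+\.?$")     # post #15
COUNTED = re.compile(r"^[a-zA-Z\-\s']+\.{0,1}$")  # post #14

# Hypothetical stricter variant (not from the thread): the period is only
# accepted as part of a trailing " Jr." or " Sr." suffix.
SUFFIX_ONLY = re.compile(r"^[a-zA-Z\-\s']+(?: [JS]r\.)?$")

SAMPLES = ["Jones", "Jones Sr.", "Jones Jr.", "Jones.", "Jones..", "john1"]

if __name__ == "__main__":
    for s in SAMPLES:
        q = QUESTION.match(s) is not None
        c = COUNTED.match(s) is not None
        strict = SUFFIX_ONLY.match(s) is not None
        print(f"{s!r:12} ?={q} {{0,1}}={c} suffix-only={strict}")
```

Whether the stricter form is worth it depends on the data; it still accepts "Jr" or "Sr" without the period, because those letters are already in the character class.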

                                robwm1 wrote (#16):

                                I never thought to move the ? to the end. You're right, though: it gives the same result as {0,1}. Thanks again!
