Regex and accents

Rome Singh wrote (#1):

I'm using Regex with [a-zA-Z] to check whether a token is alphabetic, but it does not match French accented letters such as É or Ô. Does anyone know what the expression should be?

Nathan Blomquist wrote (#2, replying to Rome Singh):

You will need to add those accented characters to the character class in your Regex. I don't know their character codes, but you should be able to look them up. [a-zA-Z] matches only the characters a through z and A through Z; it knows nothing about accented characters until you add them to the expression.

Hope this helps,
Nathan
---------------------------
Hmmm... what's a signature?

Daniel Turini wrote (#3, replying to Rome Singh):

Did you try the \w (lower case!) expression?

---------------------------
I see dumb people

Rome Singh wrote (#4, replying to Daniel Turini):

I did, but \w also matches digits, so a token like 34A is reported as alpha. That's not what I want; I have another regex that checks for alphanumeric tokens.
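For anyone comparing the two options discussed so far, here is a minimal sketch of the difference between \w and the Unicode letter category \p{L} in .NET. The class and method names (AlphaTokenDemo, IsAlpha, IsWord) are made up for illustration, and the accented letters are written as \u escapes so the result does not depend on how the source file is saved. Treat it as one possible approach, not necessarily what the posters ended up using.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlphaTokenDemo
{
    // \p{L} matches any Unicode letter (accented or not) and no digits.
    private static readonly Regex LettersOnly = new Regex(@"^\p{L}+$");

    // \w matches letters too, but also digits and the underscore.
    private static readonly Regex WordChars = new Regex(@"^\w+$");

    // Hypothetical helper: true when the token consists of letters only.
    public static bool IsAlpha(string token) => LettersOnly.IsMatch(token);

    // Hypothetical helper: true when the token consists of \w characters only.
    public static bool IsWord(string token) => WordChars.IsMatch(token);

    public static void Main()
    {
        // "\u00C9COLE" is ÉCOLE, "h\u00F4tel" is hôtel.
        foreach (var token in new[] { "\u00C9COLE", "h\u00F4tel", "34A", "abc" })
        {
            Console.WriteLine($"{token}: IsWord = {IsWord(token)}, IsAlpha = {IsAlpha(token)}");
        }
    }
}
```

With this sketch, 34A passes the \w test but fails the \p{L} test, while the accented tokens pass both. Note that \p{L} does not cover combining marks, so text stored in decomposed form (a base letter followed by a separate accent character) would need normalizing first.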

Nathan Blomquist wrote (#5, replying to Rome Singh):

I think I figured it out, at least if I understand you correctly. Take a look at this ASCII table. To match those characters, you need to enter them using their extended ASCII codes. On Windows you can do this by holding down Alt and typing the three-digit code on the numpad, for example É = Alt+144. Then, to match a range of these characters, use:

[a-zA-ZÉ-ô]

I don't know exactly which characters you need, so you will have to enter them yourself. The Alt combination doesn't seem to work in Internet Explorer, but it does work in Visual Studio on Windows XP (English).

Hope this helps clarify things,
Nathan
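As a variation on the range idea above, here is a rough sketch that spells the range out with \u escapes instead of Alt codes, so the pattern does not depend on the keyboard or the editor's encoding. The exact set of characters is my assumption, not something taken from the posts: it covers the Latin-1 letters U+00C0 to U+00FF while skipping the multiplication and division signs (U+00D7 and U+00F7) that sit inside that range, and adds Œ, œ and Ÿ, which French uses but which lie outside Latin-1. The names LatinRangeDemo and IsAlpha are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LatinRangeDemo
{
    // ASCII letters, Latin-1 letters without U+00D7 (×) and U+00F7 (÷),
    // plus U+0152 (Œ), U+0153 (œ) and U+0178 (Ÿ).
    private static readonly Regex FrenchAlpha = new Regex(
        @"^[a-zA-Z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0152\u0153\u0178]+$");

    // Hypothetical helper: true when every character is in the class above.
    public static bool IsAlpha(string token) => FrenchAlpha.IsMatch(token);

    public static void Main()
    {
        Console.WriteLine(IsAlpha("\u00C9t\u00E9"));  // Été
        Console.WriteLine(IsAlpha("c\u0153ur"));      // cœur
        Console.WriteLine(IsAlpha("x\u00D7y"));       // x×y, rejected because × is excluded
        Console.WriteLine(IsAlpha("34A"));            // rejected because of the digits
    }
}
```

An explicit class like this gives tight control over what counts as a letter, at the cost of having to maintain the list by hand.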

Rome Singh wrote (#6, replying to Nathan Blomquist):

Entering the range Ã-ÿ does seem to work. The ASCII table in your link seems to be code page 850 (American). If you bring up Character Map, under Accessories, you can see the values for those characters in the Windows code page (which I think is 1252?).

The problem is that I will be parsing data from databases, and with databases you can specify which code page to use. That would cause a problem with the range you specified.

I found a few links:

http://www.unicode.org/reports/tr18/
http://www.microsoft.com/mspress/books/sampchap/5957.asp

There is a \p{Greek} and classes for a few other languages, but none for French. I'll try your solution for now.

Thanks,
Rome
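On the code page concern: .NET strings are Unicode (UTF-16) internally, so the regex sees the same characters whatever code page the bytes were stored in, as long as they were decoded with the right encoding. The sketch below illustrates this under that assumption. The helper DecodedTokenDemo.IsAlpha and the sample byte arrays are made up for illustration, and in practice a database driver will usually hand back strings that are already decoded.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class DecodedTokenDemo
{
    // \p{L} matches any Unicode letter, independent of the original code page.
    private static readonly Regex LettersOnly = new Regex(@"^\p{L}+$");

    // Hypothetical helper: decode raw bytes with the encoding the data source
    // is assumed to use, then test the resulting string.
    public static bool IsAlpha(byte[] raw, Encoding sourceEncoding)
    {
        string token = sourceEncoding.GetString(raw);
        return LettersOnly.IsMatch(token);
    }

    public static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        byte[] latin1Bytes = { 0xC9, 0x74, 0xE9 };             // "Été" in ISO-8859-1
        byte[] utf8Bytes = { 0xC3, 0x89, 0x74, 0xC3, 0xA9 };   // "Été" in UTF-8

        // Different bytes, same string once decoded correctly.
        Console.WriteLine(latin1.GetString(latin1Bytes) == Encoding.UTF8.GetString(utf8Bytes));

        Console.WriteLine(IsAlpha(latin1Bytes, latin1));
        Console.WriteLine(IsAlpha(utf8Bytes, Encoding.UTF8));
    }
}
```

If the bytes are decoded with the wrong encoding, the string itself is already damaged, and no choice of range in the pattern can reliably fix that.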
