Regex and accents
-
I'm using a Regex to check whether a token is alphabetic ([a-zA-Z]), but it does not match French accented letters such as É or Ô. Does anyone know what the expression should be?
-
You will need to add those accented characters (or their character codes) to your Regex. I don't know the codes offhand, but you should be able to look them up. As written, the Regex only matches the characters a through z and A through Z; it knows nothing about accented characters until you add them to the expression. Hope this helps, Nathan
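A minimal sketch of that suggestion, written in Python purely for illustration (the thread does not say which language the original code uses). The list of accented letters and the names `FRENCH_EXTRA` and `is_french_alpha` are assumptions made for this example, not a complete or authoritative inventory of French characters.

```python
import re

# Hypothetical whitelist: accented letters and ligatures commonly used in French.
FRENCH_EXTRA = "ÀÂÄÆÇÉÈÊËÎÏÔŒÙÛÜŸàâäæçéèêëîïôœùûüÿ"

# Plain a-z / A-Z plus the explicit whitelist above.
FRENCH_ALPHA = re.compile("[a-zA-Z" + FRENCH_EXTRA + "]+")


def is_french_alpha(token: str) -> bool:
    """Return True if every character of token is in the whitelist."""
    return FRENCH_ALPHA.fullmatch(token) is not None


if __name__ == "__main__":
    for token in ("École", "Ôter", "34A", "señor"):
        print(token, is_french_alpha(token))
```

Listing the characters explicitly keeps the class narrow: a letter such as ñ, which is not in the list, is rejected, which may or may not be what you want.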
-
Did you try the \w (lower case!) expression?
-
I did, but then something like 34A is reported as alpha. That's not what I want. I have another regex that checks for alphanumeric tokens.
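One possible way around the 34A problem is to take \w and subtract the digits and the underscore with a negated class. The sketch below uses Python's `re` module as an illustration; the names `UNICODE_ALPHA` and `is_alpha` are made up for this example, and other regex engines may treat \w differently, so treat it as an assumption to verify rather than a guaranteed fix.

```python
import re

# [^\W\d_] reads as: "not a non-word character, not a digit, not an underscore",
# which leaves roughly the letters that \w would accept, accented ones included.
UNICODE_ALPHA = re.compile(r"[^\W\d_]+")


def is_alpha(token: str) -> bool:
    """Return True if the whole token consists of letter-like word characters."""
    return UNICODE_ALPHA.fullmatch(token) is not None


if __name__ == "__main__":
    for token in ("Élève", "Ôter", "34A", "a_b"):
        print(token, is_alpha(token))
```

Unlike a hand-written list, this accepts letters from any alphabet, so it is broader than "French only".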
-
I think I figured it out, at least if I understand you correctly. You can look at this ASCII table. To match those characters you need to enter their extended ASCII codes. In Windows you can do this by holding down Alt and typing the three-digit code on the numeric keypad. Example: É = Alt + 144. To match a range of these characters, use: [a-zA-ZÉ-ô] I don't know exactly which characters you need, so you will have to enter them yourself. The Alt combination doesn't seem to work in Internet Explorer, but it does work in Visual Studio on Windows XP English. Hope this helps clarify things, Nathan
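A range such as É-ô is evaluated by character code, so it is worth checking what actually falls inside it. The sketch below does that in Python, where strings are Unicode; the assumption is that the regex engine in use also compares Unicode code points rather than code-page byte values. The helper name `matches_range` is hypothetical.

```python
import re

# The range from the post: everything between É (U+00C9) and ô (U+00F4).
RANGE = re.compile("[a-zA-ZÉ-ô]+")


def matches_range(token: str) -> bool:
    """Return True if every character of token falls inside the class."""
    return RANGE.fullmatch(token) is not None


if __name__ == "__main__":
    print(hex(ord("É")), hex(ord("ô")))  # 0xc9 0xf4
    # Inside the range: É, ô, but also the multiplication sign × (U+00D7).
    # Outside the range: À (U+00C0), ù (U+00F9), û (U+00FB).
    for ch in ("É", "ô", "×", "À", "ù", "û"):
        print(ch, hex(ord(ch)), matches_range(ch))
```

Under that assumption the range lets a non-letter (×) through and misses some French letters (À, ù, û), so the endpoints would need adjusting or the letters listing individually.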
-
Entering the range Ã-ÿ does seem to work. The ASCII table in your link seems to be code page 850 (American). If you bring up the Character Map, under Accessories, you can see the values for those characters in the Windows code page (which I think is 1252?). The problem is that I will be parsing data from databases, and with databases you can specify the code page to use. That would cause a problem with the range you specified. I found a few links: http://www.unicode.org/reports/tr18/ http://www.microsoft.com/mspress/books/sampchap/5957.asp There is a \p{Greek} class and similar ones for a few other languages, but nothing for French. I'll try your solution for now. Thanks, Rome
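The code page concern can be illustrated with a short Python sketch: the same word has different byte values under code pages 850 and 1252, so the bytes need decoding with the code page the database actually uses before any letter test is applied. Since French is written in the Latin script, the test here checks for Latin letters by Unicode character name. The function names and the sample word are assumptions for illustration only.

```python
import unicodedata


def is_latin_alpha(token: str) -> bool:
    """Return True if token is non-empty and every character is a Latin letter."""
    return bool(token) and all(
        ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN")
        for ch in token
    )


def is_latin_alpha_bytes(raw: bytes, encoding: str) -> bool:
    """Decode raw with the given code page, then apply the Latin-letter test."""
    return is_latin_alpha(raw.decode(encoding))


if __name__ == "__main__":
    word = "Été"
    raw_850 = word.encode("cp850")    # bytes as stored under code page 850
    raw_1252 = word.encode("cp1252")  # bytes as stored under code page 1252
    print(raw_850, raw_1252)          # the byte values differ

    print(is_latin_alpha_bytes(raw_850, "cp850"))    # decoded correctly
    print(is_latin_alpha_bytes(raw_1252, "cp1252"))  # decoded correctly
    print(is_latin_alpha_bytes(raw_1252, "cp850"))   # wrong code page
```

Once the text is decoded to Unicode, the test no longer depends on which code page the data came from.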