Problem with Unicode
-
I need to get the character when its Unicode value is given. For example, I get the Unicode value as a string, "\U0061", and I need to get the character for this value. I know that I can write char c = '\U0061'; but unfortunately I receive the Unicode value as a string, so what I need is to get the character from that string. Does anyone have an idea how to do this? Thanks in advance.
modified on Wednesday, May 11, 2011 12:30 PM
-
I only know three ways to do something of that kind.
string s="\u0061"; // lowercase \u takes exactly four hex digits; "\U0061" does not compile
char c0=s[0];
char c1='\u0061';
char c2=(char)0x0061;
:)
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.
modified on Wednesday, May 11, 2011 1:04 PM
-
Could you please guide me on that? Thanks.
-
Thanks, the first one is the option I could apply, but it is not working for me either: I tried the first option and I still get only the first character of the string. Any idea why? Thanks.
-
"/U0061" gives the string /U0061; I think you mean "\U0061".
The best things in life are not things.
Yeah, sorry about that mistake.
-
prasadbuddhika wrote:
i still get the first character in the string
??? The string s holds only one character. The whole backslash-u-four-digits thing is C#'s way of specifying a single character by its Unicode character number.
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
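To make that concrete: the escape is resolved by the compiler, so a regular literal ends up holding one character, while a verbatim literal (prefixed with @) keeps the backslash and the digits as six ordinary characters. A minimal sketch, assuming C# 9 or later for top-level statements:

```csharp
using System;

// Regular literal: the compiler turns \u0061 into the single character 'a'.
string escaped = "\u0061";

// Verbatim literal: backslashes are not special, so this is six characters: \ u 0 0 6 1
string verbatim = @"\u0061";

Console.WriteLine(escaped.Length);   // 1
Console.WriteLine(escaped[0]);       // a
Console.WriteLine(verbatim.Length);  // 6
Console.WriteLine(verbatim[0]);      // prints a backslash
```

If s[0] returns a backslash, the string was most likely built the second way (for example read from a file or from user input), and it has to be decoded at run time rather than by the compiler.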
-
Thanks Luc, I had made a mistake: I had used "U" instead of "u". With "U" I get the "unrecognized escape sequence" error, but with "u" it works fine. Thank you.
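For reference, C# has two Unicode escape forms: \u takes exactly four hex digits, while \U takes exactly eight, which is why \U0061 is rejected. A small sketch (top-level statements, C# 9 or later assumed) showing that the eight-digit form denotes the same character:

```csharp
using System;

char a1 = '\u0061';          // \u: exactly four hex digits
string a2 = "\U00000061";    // \U: exactly eight hex digits
// string bad = "\U0061";    // does not compile: unrecognized escape sequence

Console.WriteLine(a1);                   // a
Console.WriteLine(a2);                   // a
Console.WriteLine(a2 == a1.ToString());  // True
```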
-
If what you have is a six-character string containing a real backslash, a U, and four hex digits, then you could turn that into a single character like so. However, this situation is rare; it would typically occur only if you plan on writing your own C# compiler!
using System;
using System.Globalization;   // needed for NumberStyles

string s=@"\U0061"; // the @ in front tells the compiler to ignore the special meaning of backslashes
int uni;
if (s==null || s.Length!=6 || s[0]!='\\' || s[1]!='U' ||
    !int.TryParse(s.Substring(2, 4), NumberStyles.HexNumber, null, out uni))
    throw new FormatException("Bad unicode string in: "+s);
char c=(char)uni;
Console.WriteLine("uni="+uni.ToString("X4")); // uni=0061
Console.WriteLine("c="+c);                   // c=a
:)
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
modified on Wednesday, May 11, 2011 1:23 PM
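The same check can be wrapped in a reusable method. The sketch below uses a hypothetical helper, TryParseUnicodeEscape (my name, not a framework API), that also accepts a lowercase u, since that is the form C# itself uses. It still handles only the four-digit form, so it can only produce characters in the range U+0000 to U+FFFF. Top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: accepts exactly "\uXXXX" or "\UXXXX",
// i.e. a real backslash, a u or U, and four hex digits.
static bool TryParseUnicodeEscape(string s, out char c)
{
    c = '\0';
    if (s == null || s.Length != 6 || s[0] != '\\' || (s[1] != 'u' && s[1] != 'U'))
        return false;
    // Reject anything but hex digits (HexNumber alone would also accept surrounding spaces).
    for (int i = 2; i < 6; i++)
        if ("0123456789abcdefABCDEF".IndexOf(s[i]) < 0)
            return false;
    // Parse the four digits as base 16: "0061" -> 0x61 -> 'a'.
    c = (char)int.Parse(s.Substring(2, 4), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    return true;
}

Console.WriteLine(TryParseUnicodeEscape(@"\U0061", out char ch) ? ch.ToString() : "bad");  // a
Console.WriteLine(TryParseUnicodeEscape(@"\u0041", out ch) ? ch.ToString() : "bad");       // A
Console.WriteLine(TryParseUnicodeEscape(@"U0061", out ch) ? ch.ToString() : "bad");        // bad
```

Returning a bool instead of throwing follows the usual TryParse pattern, which suits input that is expected to be malformed now and then.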
-
Is there a real problem here? As noted, you have a simple character, which will be in the string if the string is created correctly. However, you CANNOT use the single C# data type 'char' to represent the entire supported character range. So if that is your goal, you will fail. Read up on "surrogate pairs" to find out why.
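To illustrate the surrogate-pair point: a code point above U+FFFF does not fit in one char, so .NET stores it as two UTF-16 code units. A short sketch using the framework's char.ConvertFromUtf32 and char.ConvertToUtf32 (U+1F600 is just an arbitrary example code point; top-level statements, C# 9 or later assumed):

```csharp
using System;

// U+1F600 lies outside the Basic Multilingual Plane, so it needs two UTF-16 code units.
string smiley = char.ConvertFromUtf32(0x1F600);

Console.WriteLine(smiley.Length);                                 // 2
Console.WriteLine(char.IsHighSurrogate(smiley[0]));               // True
Console.WriteLine(char.IsLowSurrogate(smiley[1]));                // True
Console.WriteLine(char.ConvertToUtf32(smiley, 0).ToString("X"));  // 1F600

// The eight-digit \U escape in a string literal produces the same pair.
Console.WriteLine(smiley == "\U0001F600");                        // True
```

So code that must cover the full Unicode range should work with strings (or int code points), not single chars.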
-
First search for occurrences of \\U[0-9]+ (that's a regex; the backslash is escaped, and you may have to doubly escape it). Then replace each match with
(char)int.Parse(match.Value.Split('U')[1])
(though I would refactor that)
You have a problem. The four digits after the \u are HEX, not decimal. So
int.Parse
won't cut it. Cheers, Peter
Software rusts. Simon Stephenson, ca 1994.
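A sketch of the regex approach with Peter's correction applied: match a backslash, a u or U, and exactly four hex digits, and parse the digits as hexadecimal. The pattern and the helper name DecodeEscapes are my own assumptions for illustration. It ignores the eight-digit form and does not treat an escaped backslash (as in \\u0061) specially. Top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces every \uXXXX or \UXXXX (four hex digits) with the character it denotes.
static string DecodeEscapes(string input) =>
    Regex.Replace(input, @"\\[uU]([0-9A-Fa-f]{4})", m =>
        // Groups[1] holds the four hex digits; parse them as base 16, not base 10.
        ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber, CultureInfo.InvariantCulture)).ToString());

Console.WriteLine(DecodeEscapes(@"x\U0061y\u0062z"));  // xaybz
```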
-
Because that's the way \unnnn works. Big brother to the \xnn convention for single-byte characters. Borrowing a couple of sentences from the Java Language Specification, section 3.2: "A Unicode escape of the form \uxxxx, where xxxx is a hexadecimal value, represents the Unicode character whose encoding is xxxx. This translation step allows any program to be expressed using only ASCII characters."
Software rusts. Simon Stephenson, ca 1994.
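The difference in numbers: the same four digits give different code points depending on the base. A quick check with int.Parse (top-level statements, C# 9 or later assumed):

```csharp
using System;
using System.Globalization;

string digits = "0061";

int asDecimal = int.Parse(digits, CultureInfo.InvariantCulture);                      // 61
int asHex = int.Parse(digits, NumberStyles.HexNumber, CultureInfo.InvariantCulture);  // 0x61 = 97

Console.WriteLine((char)asDecimal);          // prints = (U+003D), not what was meant
Console.WriteLine((char)asHex);              // prints a (U+0061)
Console.WriteLine((char)asHex == '\u0061');  // True
```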
-
\unnnn doesn't work any particular way; it's just a string. Of course it works that way in Java and C#, and no doubt in some other places as well, but there's no guarantee that it always does, and the OP should have specified it.
Well, this is the C# forum... :doh:
Software rusts. Simon Stephenson, ca 1994.
-
Why does that matter? It's not about the string "\Uanything" (i.e. a string containing the actual character), but about a string containing "\\Uanything" that has to be converted to the first form. That input could still be in any form; nowhere did he say that it originates from C# source code.
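For the common case where the input does follow the lowercase \uXXXX convention, the framework's Regex.Unescape already converts it, along with other regex escapes such as \n and \t. A sketch under that assumption: because it is a regex-oriented API, it may reject or reinterpret input that is not in regex escape syntax (as far as I know, an uppercase \U0061 is not one of its recognized escapes and is expected to throw an ArgumentException). Top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Text.RegularExpressions;

// Real backslashes followed by a lowercase u and four hex digits.
string raw = @"\u0061\u0062c";

// Regex.Unescape converts regex escapes such as \uXXXX into the characters they denote.
string decoded = Regex.Unescape(raw);

Console.WriteLine(raw.Length);  // 13
Console.WriteLine(decoded);     // abc
```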
-
I didn't have a problem understanding what OP wanted to do. Luc didn't have a problem. Richard didn't have a problem. jschell didn't have a problem. OP didn't have a problem understanding Luc's answers. I'm not going on a troll-feeding expedition. End of discussion.
Software rusts. Simon Stephenson, ca 1994.