Problem with Unicode [modified]
-
I need to get the character for a given Unicode value. For example, I receive the Unicode value as a string, "\U0061", and I need the character it represents. I know that I can write char c = '\U0061', but unfortunately the value arrives as a string, so what I need is a way to get the character from that string. Does anyone have an idea how to do this? Thanks in advance.
modified on Wednesday, May 11, 2011 12:30 PM
-
I only know three ways to do something of that kind:
string s = "\u0061";   // lowercase \u takes exactly four hex digits
char c0 = s[0];
char c1 = '\u0061';
char c2 = (char)0x0061;
:)
Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum
Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.
modified on Wednesday, May 11, 2011 1:04 PM
-
Could you please guide me on that? Thanks.
-
Thanks. The first one is the option that I could apply, but it is not working for me either: I tried it and I still get the first character in the string. Any idea about it? Thanks.
-
"/U0061" gives the string /U0061; I think you mean "\U0061".
The best things in life are not things.
Yes, sorry about that mistake.
-
prasadbuddhika wrote:
i still get the first character in the string
??? string s only holds one character. The whole backslash-u-four-digits thing is C#'s way to specify a single character by its Unicode character number.
Luc Pattyn
-
Thanks Luc, I had made a mistake: I had used "U" instead of "u". With "U" I get an "unrecognized escape sequence" error, but with "u" it works fine. Thank you.
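For reference, here is a minimal sketch of why "U" and "u" behave differently in C# source code. The class and constant names are made up for illustration. The lowercase \u escape takes exactly four hex digits, while the uppercase \U escape takes exactly eight, so "\U0061" is rejected by the compiler.

```csharp
using System;

static class EscapeForms
{
    // \u takes exactly four hex digits.
    public const string Short = "\u0061";

    // \U takes exactly eight hex digits; "\U0061" does not compile.
    public const string Long = "\U00000061";

    // A verbatim string keeps the backslash: six characters, no escape processing.
    public const string Verbatim = @"\u0061";

    static void Main()
    {
        Console.WriteLine(Short == "a");     // True
        Console.WriteLine(Long == "a");      // True
        Console.WriteLine(Verbatim.Length);  // 6
    }
}
```

Both escapes are resolved by the compiler, so none of this applies to text that only arrives at run time.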
-
If what you have is a six-character string containing a real backslash, a U, and four hex digits, then you could turn that into a single character like so. This situation is rare, however; it would typically occur only if you plan on writing your own C# compiler!
// requires: using System; using System.Globalization;
string s = @"\U0061"; // the @ in front tells the compiler to ignore the special meaning of backslashes
int uni;
// expect exactly: a backslash, a 'U', then four hex digits
if (s == null || s.Length != 6 || s[0] != '\\' || s[1] != 'U' ||
    !int.TryParse(s.Substring(2, 4), NumberStyles.HexNumber, null, out uni))
    throw new FormatException("Bad unicode string in: " + s);
char c = (char)uni;
Console.WriteLine("uni=" + uni.ToString("X4"));
Console.WriteLine("c=" + c);
:)
Luc Pattyn
modified on Wednesday, May 11, 2011 1:23 PM
-
Is there a real problem here? As noted, you have a single character, which will be in the string if the string is created correctly. However, you CANNOT use a single C# 'char' to represent the entire supported character set range. So if that is your goal, you will fail. Read up on "surrogate pairs" to find out why.
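To illustrate the surrogate-pair point, here is a small sketch using char.ConvertFromUtf32 from the .NET base class library. The class and method names are only for illustration. A code point above U+FFFF does not fit in one 16-bit char, so the result is a two-char string.

```csharp
using System;

static class SurrogateDemo
{
    // Converts a code point to a string; code points above U+FFFF become two chars.
    public static string FromCodePoint(int codePoint)
    {
        return char.ConvertFromUtf32(codePoint);
    }

    static void Main()
    {
        string a = FromCodePoint(0x0061);      // fits in one char
        string clef = FromCodePoint(0x1D11E);  // MUSICAL SYMBOL G CLEF, needs a surrogate pair

        Console.WriteLine(a.Length);                       // 1
        Console.WriteLine(clef.Length);                    // 2
        Console.WriteLine(char.IsSurrogatePair(clef, 0));  // True
        Console.WriteLine(char.ConvertToUtf32(clef, 0).ToString("X")); // 1D11E
    }
}
```

If the incoming values can exceed U+FFFF, returning a string rather than a char is probably the safer design.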
-
First search for occurrences of \\U[0-9]+ (that's a regex; the backslash is escaped, and you may have to escape it twice). Then replace each match with
(char)int.Parse(match.Value.Split('U')[1])
(though I would refactor that).
You have a problem. The four digits after the \u are HEX, not decimal, so int.Parse won't cut it.
Cheers, Peter
Software rusts. Simon Stephenson, ca 1994.
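Putting the regex idea and the hex correction together, one possible sketch looks like this. The class name, method name, and the decision to accept both \u and \U followed by exactly four hex digits are assumptions made for illustration, not something the original posts specify.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class UnicodeEscapes
{
    // Matches a literal backslash, 'u' or 'U', then exactly four hex digits.
    private static readonly Regex Escape = new Regex(@"\\[uU]([0-9A-Fa-f]{4})");

    // Replaces each escape found in the input with the character it names.
    public static string Decode(string input)
    {
        return Escape.Replace(input, m =>
        {
            // Parse the captured digits as hexadecimal, not decimal.
            int code = int.Parse(m.Groups[1].Value, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            return ((char)code).ToString();
        });
    }

    static void Main()
    {
        Console.WriteLine(Decode(@"\U0061\u0062c")); // abc
    }
}
```

This sketch makes no attempt to honour an escaped backslash in front of the u, and it handles four-digit values only, so it cannot produce characters outside the Basic Multilingual Plane.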
-
Because that's the way \unnnn works. It is the big brother of the \xnn convention for single-byte characters. Borrowing a couple of sentences from the Java Language Specification, section 3.2: "A Unicode escape of the form \uxxxx, where xxxx is a hexadecimal value, represents the Unicode character whose encoding is xxxx. This translation step allows any program to be expressed using only ASCII characters."
-
\unnnn doesn't work any particular way; it's just a string. Of course it works that way in Java and C#, and no doubt in some other places as well, but there's no guarantee that it always does, and the OP should have specified it.
Well, this is the C# forum... :doh:
-
Why does that matter? It's not about the string "\Uanything" (i.e. a string containing the actual character), but about a string containing "\\Uanything" that has to be converted to the first form. "Anything" could still be in any form; nowhere did he say that it originates from C# source code.
-
I didn't have a problem understanding what OP wanted to do. Luc didn't have a problem. Richard didn't have a problem. jschell didn't have a problem. OP didn't have a problem understanding Luc's answers. I'm not going on a troll-feeding expedition. End of discussion.
-
Actually, they all had problems understanding what he meant. Luc's first answer doesn't answer the question, Richard's answer doesn't answer the question, Luc's first version of his second answer parsed it as decimal as well, IIRC, and jschell just noted a problem with what the OP is trying to do.
Peter_in_2780 wrote:
I'm not going on a troll-feeding expedition. End of discussion.
Fuck you. It was a legitimate discussion. You are the troll here, not me.
modified on Thursday, May 12, 2011 3:42 AM