problem with unicode. [modified]

prasadbuddhika

I need to get the character when its Unicode value is given. For example, I get the Unicode value as a string -> "\U0061", and I need to get the character for that value. I know that I can use "char c = '\U0061'", but unfortunately I get the Unicode value as a string, so what I need is to get the character from that string. Does anyone have an idea how to do this? Thanks in advance.

modified on Wednesday, May 11, 2011 12:30 PM

Luc Pattyn · modified on Wednesday, May 11, 2011 12:30 PM

I only know three ways to do something of that kind.

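string s="\u0061";     // lowercase \u takes exactly four hex digits; s holds one character
char c0=s[0];          // first (and only) character of that string
char c1='\u0061';      // character literal using the same escape
char c2=(char)0x0061;  // cast from the numeric character code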

:)

Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum

Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.

modified on Wednesday, May 11, 2011 1:04 PM

David1987 · modified on Wednesday, May 11, 2011 12:30 PM

You can parse it as int (the 0061 part) and then cast to char.

Lost User · modified on Wednesday, May 11, 2011 12:30 PM

"/U0061" gives the string /U0061; I think you mean "\U0061".

The best things in life are not things.

prasadbuddhika

Could you please guide me on that? Thanks.

prasadbuddhika · modified on Wednesday, May 11, 2011 1:04 PM

Thanks. The first one is the option that I could apply, but it is not working for me either: I tried the first option and I still get the first character in the string. Any idea about it? Thanks.

prasadbuddhika

Yeah, sorry about that mistake.

David1987

First search for occurrences of \\U[0-9]+ (that's a regex; the backslash is escaped, and you may have to doubly escape it). Then replace each match with (char)int.Parse(match.Value.Split('U')[1]) (though I would refactor that).
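A minimal sketch of that search-and-replace, under two assumptions that go beyond the description above: the digits are treated as hexadecimal (as they are in C# escapes), and exactly four of them follow the marker. The class and method names are made up for illustration.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, named here only for illustration.
static class EscapeReplacer
{
    // A literal backslash, then 'u' or 'U', then exactly four hex digits (captured).
    private static readonly Regex Escape = new Regex(@"\\[uU]([0-9A-Fa-f]{4})");

    public static string Unescape(string input)
    {
        // Replace every match with the single character whose code is the captured hex value.
        return Escape.Replace(input, m =>
            ((char)int.Parse(m.Groups[1].Value,
                             NumberStyles.HexNumber,
                             CultureInfo.InvariantCulture)).ToString());
    }
}
```

Calling EscapeReplacer.Unescape(@"x\U0061y") should give back the three-character string "xay"; text that does not match the pattern is left untouched.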

Luc Pattyn

prasadbuddhika wrote:

i still get the first character in the string

????????????????????? string s holds only one character. The whole backslash-u-four-digits thing is C#'s way to specify a single character by its Unicode character number.
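A small sketch of the difference between an escape the compiler translates and six literal characters; the class and method names are only for illustration.

```csharp
using System;

// Hypothetical demo class, not part of any library.
static class EscapeDemo
{
    // The compiler turns the escape into one character, so the length is 1.
    public static int EscapedLength() { return "\u0061".Length; }

    // A verbatim string keeps the backslash, the u and the four digits: length 6.
    public static int VerbatimLength() { return @"\u0061".Length; }

    // The only character in the escaped string is 'a'.
    public static char FirstOfEscaped() { return "\u0061"[0]; }
}
```

So s[0] on the escaped string is already the wanted character; there is nothing left to convert.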


prasadbuddhika

Thanks Luc, I had made a mistake: I had used "U" instead of "u". When I use "U" it gives me the "unrecognized escape sequence" error, but with "u" it works fine. Thank you.

Luc Pattyn · modified on Wednesday, May 11, 2011 12:30 PM

If what you have is a six-character string containing a real backslash, a U, and four hex digits, then you could turn that into a single character like so. However, this situation is rare; it would typically occur only if you plan on writing your own C# compiler!

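// needs: using System; using System.Globalization;
string s=@"\U0061"; // the @ in front tells the compiler to ignore the special meaning of backslashes
int uni;
// accept only: one backslash, one 'U', then four hex digits
if (s==null || s.Length!=6 || s[0]!='\\' || s[1]!='U' ||
    !int.TryParse(s.Substring(2, 4), NumberStyles.HexNumber, null, out uni))
    throw new Exception("Bad unicode string in: "+s);
char c=(char)uni;
// log stands for whatever output routine you use, e.g. Console.WriteLine
log("uni="+uni.ToString("X4")); // uni=0061
log("c="+c);                    // c=a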

:)


modified on Wednesday, May 11, 2011 1:23 PM

jschell · modified on Wednesday, May 11, 2011 12:30 PM

Is there a real problem here? As noted, you have a single character, which will be in the string if the string is created correctly. However, you CANNOT use a single C# 'char' to represent the entire supported character set range: a char is one 16-bit UTF-16 code unit, and code points above U+FFFF need two of them. So if that is your goal, you will fail. Read up on "surrogate pairs" to find out why.
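A short sketch of what that means in practice, using char.ConvertFromUtf32 from the framework; the wrapper class and method name are invented for illustration.

```csharp
using System;

// Hypothetical wrapper, shown only to illustrate surrogate pairs.
static class SurrogateDemo
{
    // Returns the UTF-16 string for a Unicode code point.
    // Code points up to U+FFFF give one char; higher ones give a surrogate pair (two chars).
    public static string FromCodePoint(int codePoint)
    {
        return char.ConvertFromUtf32(codePoint);
    }
}
```

For example, FromCodePoint(0x61) is the one-char string "a", while FromCodePoint(0x1F600) has Length 2 and its first char satisfies char.IsHighSurrogate. Returning a string rather than a char is the design choice that makes the whole range representable.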

Peter_in_2780

You have a problem. The four digits after the \u are HEX, not decimal, so a plain int.Parse won't cut it. Cheers, Peter
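A quick sketch of the difference, assuming the four digits have already been cut out of the string; the class and method names are made up for illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical helper comparing the two readings of the same digits.
static class HexVsDecimal
{
    // Reads the digits as a decimal number, which is what plain int.Parse does.
    public static char AsDecimal(string digits)
    {
        return (char)int.Parse(digits, CultureInfo.InvariantCulture);
    }

    // Reads the digits as hexadecimal, which is how the \u escape defines them.
    public static char AsHex(string digits)
    {
        return (char)int.Parse(digits, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    }
}
```

With "0061", AsDecimal gives character 61, which is '=', while AsHex gives character 0x61, which is 'a'. With digits such as "00E9" the decimal version throws a FormatException, because E is not a decimal digit.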

Software rusts. Simon Stephenson, ca 1994.

David1987

OP didn't say so, so how do you know?

Peter_in_2780

Because that's the way \unnnn works. Big brother to the \xnn convention for single-byte characters. Borrowing a couple of sentences from the Java Language Specification, section 3.2: "A Unicode escape of the form \uxxxx, where xxxx is a hexadecimal value, represents the Unicode character whose encoding is xxxx. This translation step allows any program to be expressed using only ASCII characters."


David1987

\unnnn doesn't work any particular way, it's just a string. Of course it works that way in Java and C# and no doubt some other places as well, but there's no guarantee that it always does, and OP should have specified it.

Peter_in_2780

Well, this is the C# forum... :doh:


David1987

Why does that matter? It's not about the string "\Uanything" (i.e. a string containing the actual character), but about a string containing "\\Uanything" that has to be converted to the first form. Anything could still be in any form - nowhere did he say that it originates from C# source code.

Peter_in_2780

I didn't have a problem understanding what OP wanted to do. Luc didn't have a problem. Richard didn't have a problem. jschell didn't have a problem. OP didn't have a problem understanding Luc's answers. I'm not going on a troll-feeding expedition. End of discussion.


David1987

And let me remind you, you are wrong. The OP did not specify that the number had to be in HEX, therefore it was not clear.