Code Project / General Programming / C#

problem with unicode. [modified]

    prasadbuddhika
    wrote on last edited by
    #1

    I need to get the character for a given Unicode value. For example, I receive the Unicode value as a string, "\U0061", and I need the character it denotes. I know that I can write char c = '\U0061', but unfortunately I get the Unicode value as a string, so what I need is to get the character from that string. Does anyone have an idea how to do this? Thanks in advance.

    modified on Wednesday, May 11, 2011 12:30 PM

      Luc Pattyn
      wrote on last edited by
      #2

      I only know three ways to do something of that kind.

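      string s = "\u0061";     // lowercase \u takes exactly four hex digits; uppercase \U needs eight
      char c0 = s[0];          // first (and only) character of the string
      char c1 = '\u0061';      // character literal with a Unicode escape
      char c2 = (char)0x0061;  // cast from the numeric code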

      :)

      Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum

      Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.

      modified on Wednesday, May 11, 2011 1:04 PM

        David1987
        wrote on last edited by
        #3

        You can parse it as int (the 0061 part) and then cast to char.

          Lost User
          wrote on last edited by
          #4

          "/U0061" gives the string /U0061; I think you mean "\U0061".

          The best things in life are not things.

            In reply to David1987 (#3):
            prasadbuddhika
            wrote on last edited by
            #5

            Could you please guide me on that? Thanks.

              In reply to Luc Pattyn (#2):
              prasadbuddhika
              wrote on last edited by
              #6

              Thanks. The first one is the option that I could apply, but it is not working for me either: I tried it and I still get the first character in the string. Any idea about it? Thanks.

                In reply to Lost User (#4):
                prasadbuddhika
                wrote on last edited by
                #7

                yea, sorry about that mistake.

                  In reply to prasadbuddhika (#5):
                  David1987
                  wrote on last edited by
                  #8

                  First search for occurrences of \\U[0-9]+ (that's a regex; the backslash is escaped, and you may have to escape it twice). Then replace each match with (char)int.Parse(match.Value.Split('U')[1]) (though I would refactor that).
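
The search-and-replace idea can be sketched with Regex.Replace and a match evaluator. This is only a sketch under assumptions that the post does not state: the digits are taken to be hexadecimal (as in C# escapes), exactly four of them, and both \u and \U are accepted. The class and method names are hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class EscapeDecoder
{
    // Hypothetical helper: replaces each literal backslash + u/U + four hex digits
    // with the single character that has that code.
    public static string Decode(string input)
    {
        return Regex.Replace(
            input,
            @"\\[uU]([0-9A-Fa-f]{4})",   // verbatim string, so \\ is a regex-escaped backslash
            m => ((char)int.Parse(m.Groups[1].Value,
                                  NumberStyles.HexNumber,
                                  CultureInfo.InvariantCulture)).ToString());
    }
}
```

With this sketch, EscapeDecoder.Decode(@"\U0061bc") yields "abc"; text that contains no escape is returned unchanged.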

                    In reply to prasadbuddhika (#6):
                    Luc Pattyn
                    wrote on last edited by
                    #9

                    prasadbuddhika wrote:

                    i still get the first character in the string

                    ????? String s holds only one character. The whole backslash-u-four-digits thing is C#'s way to specify a single character by its Unicode character number.
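
To make the one-character point concrete, here is a minimal sketch (class and method names are hypothetical) that contrasts a regular literal, where the compiler resolves the escape, with a verbatim literal, where the six characters are kept as written.

```csharp
using System;

public static class EscapeDemo
{
    // Regular literal: the compiler turns \u0061 into the single character 'a'.
    public static string Escaped() { return "\u0061"; }

    // Verbatim literal: backslash, 'u', and four digits stay as six separate characters.
    public static string Verbatim() { return @"\u0061"; }
}
```

Here EscapeDemo.Escaped().Length is 1 and EscapeDemo.Verbatim().Length is 6. Only the second case needs any parsing at run time.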

                      prasadbuddhika
                      wrote on last edited by
                      #10

                      Thanks Luc, I had made a mistake: I had used "U" instead of "u". With "U" I get an "unrecognized escape sequence" error, but with "u" it works fine. Thank you.

                        Luc Pattyn
                        wrote on last edited by
                        #11

                        If what you have is a six-character string containing a real backslash, a U, and four hex digits, then you could turn that into a single character like so. However, this situation is rare; it would typically occur only if you plan on writing your own C# compiler!

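                        using System;
                        using System.Globalization; // NumberStyles lives here

                        string s = @"\U0061"; // the @ in front tells the compiler to ignore the special meaning of backslashes
                        int uni;
                        // require exactly: a backslash, a 'U', and four hex digits
                        if (s == null || s.Length != 6 || s[0] != '\\' || s[1] != 'U' ||
                            !int.TryParse(s.Substring(2, 4), NumberStyles.HexNumber, null, out uni))
                            throw new FormatException("Bad unicode string in: " + s);
                        char c = (char)uni;
                        Console.WriteLine("uni=" + uni.ToString("X4")); // prints uni=0061
                        Console.WriteLine("c=" + c);                    // prints c=a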

                        :)

                        modified on Wednesday, May 11, 2011 1:23 PM

                          jschell
                          wrote on last edited by
                          #12

                          Is there a real problem here? As noted, you have a single simple character, which will be in the string if the string is created correctly. However, you CANNOT use a single C# 'char' to represent the entire supported character range, so if that is your goal you will fail. Read up on "surrogate pairs" to find out why.
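
A small sketch of the surrogate-pair point, assuming the input is a string of hex digits naming a Unicode code point (the class name, method name, and sample values are hypothetical). It returns a string rather than a char, because char.ConvertFromUtf32 produces two UTF-16 code units for code points above U+FFFF.

```csharp
using System;

public static class CodePointDemo
{
    // Hypothetical helper: hex digits of a code point -> string holding that character.
    // Throws ArgumentOutOfRangeException for values that are not valid code points.
    public static string FromHex(string hexDigits)
    {
        int codePoint = Convert.ToInt32(hexDigits, 16); // parse as base 16
        return char.ConvertFromUtf32(codePoint);        // one or two UTF-16 code units
    }
}
```

CodePointDemo.FromHex("0061") gives a one-char string, while CodePointDemo.FromHex("1F600") gives a two-char string that char.IsSurrogatePair reports as a surrogate pair, so a cast to a single char cannot represent it.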

                            In reply to David1987 (#8):
                            Peter_in_2780
                            wrote on last edited by
                            #13

                            You have a problem. The four digits after the \u are HEX not decimal. So int.Parse won't cut it. Cheers, Peter

                            Software rusts. Simon Stephenson, ca 1994.
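
The difference between the two readings can be shown with a short sketch (class and method names are hypothetical). Plain int.Parse treats the digits as decimal, so "0061" becomes 61; passing NumberStyles.HexNumber treats them as hexadecimal, so "0061" becomes 0x61.

```csharp
using System;
using System.Globalization;

public static class HexVersusDecimal
{
    // Decimal reading: "0061" -> 61 -> '='
    public static char FromDecimal(string digits)
    {
        return (char)int.Parse(digits, CultureInfo.InvariantCulture);
    }

    // Hexadecimal reading: "0061" -> 0x61 (97) -> 'a'
    public static char FromHex(string digits)
    {
        return (char)int.Parse(digits, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    }
}
```

The decimal version also throws a FormatException as soon as the digits include a letter, for example "00E9", whereas the hexadecimal version accepts it.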

                              David1987
                              wrote on last edited by
                              #14

                              OP didn't say so, so how do you know?

                                Peter_in_2780
                                wrote on last edited by
                                #15

                                Because that's the way \unnnn works. Big brother to the \xnn convention for single-byte characters. Borrowing a couple of sentences from the Java Language Specification, section 3.2: "A Unicode escape of the form \uxxxx, where xxxx is a hexadecimal value, represents the Unicode character whose encoding is xxxx. This translation step allows any program to be expressed using only ASCII characters."

                                  David1987
                                  wrote on last edited by
                                  #16

                                  \unnnn doesn't work any particular way; it's just a string. Of course it works that way in Java and C#, and no doubt in some other places as well, but there's no guarantee that it always does, and OP should have specified it.

                                    Peter_in_2780
                                    wrote on last edited by
                                    #17

                                    Well, this is the C# forum... :doh:

                                      David1987
                                      wrote on last edited by
                                      #18

                                      Why does that matter? It's not about the string "\Uanything" (i.e. a string containing the actual character), but about a string containing "\\Uanything" that has to be converted to the first form. "Anything" could still be in any form; nowhere did he say that it originates from C# source code.

                                        Peter_in_2780
                                        wrote on last edited by
                                        #19

                                        I didn't have a problem understanding what OP wanted to do. Luc didn't have a problem. Richard didn't have a problem. jschell didn't have a problem. OP didn't have a problem understanding Luc's answers. I'm not going on a troll-feeding expedition. End of discussion.

                                          David1987
                                          wrote on last edited by
                                          #20

                                          And let me remind you, you are wrong. The OP did not specify that the number had to be in HEX, therefore it was not clear.
