Code Project › General Programming › C#

problem with unicode. [modified]

  • prasadbuddhika wrote (#1):

    I need to get the character when its Unicode value is given. For example, I get the Unicode value as a string, "\U0061", and then I need the character for this value. I know I can write char c = '\U0061'; but unfortunately I get the Unicode value as a string, so what I need is to get the character from that string. Does anyone have an idea how to do this? Thanks in advance.

    modified on Wednesday, May 11, 2011 12:30 PM
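One option not raised in the thread, sketched here under the assumption that the input uses a lowercase \u followed by exactly four hex digits: .NET's Regex.Unescape decodes that form. It follows regex escape rules, so an uppercase \U is not accepted, and any other escapes in the input (such as \t or \n) are decoded too, which may or may not be wanted.

```csharp
using System;
using System.Text.RegularExpressions;

// Input as received at runtime: a real backslash, 'u', then four hex digits
// (six characters), not an escape already processed by the C# compiler.
string escaped = @"\u0061";

// Regex.Unescape replaces \uXXXX with the character it names.
string decoded = Regex.Unescape(escaped);
char c = decoded[0];

Console.WriteLine($"{escaped} -> '{c}' (length {decoded.Length})"); // \u0061 -> 'a' (length 1)
```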

    • Luc Pattyn wrote (#2), in reply to prasadbuddhika (#1):

      I only know three ways to do something of that kind.

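      string s="\u0061";    // the compiler turns the escape into one character, 'a'
      char c0=s[0];         // so the string's first (and only) character is 'a'
      char c1='\u0061';     // the same escape in a char literal
      char c2=(char)0x0061; // cast the numeric code (hex 61, decimal 97) to char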

      :)

      Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum

      Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.

      modified on Wednesday, May 11, 2011 1:04 PM

      • David1987 wrote (#3), in reply to prasadbuddhika (#1):

        You can parse it as int (the 0061 part) and then cast to char.
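A minimal sketch of the parse-then-cast idea, assuming the input is the six-character text \u0061 and that the four digits are hexadecimal, as they are in C# escapes (the point Peter_in_2780 raises in #13). The variable names are illustrative.

```csharp
using System;
using System.Globalization;

// Hypothetical input: the six characters \u0061 (backslash, 'u', four digits).
string text = @"\u0061";

// Take the digits after the two-character prefix and parse them as hex.
string digits = text.Substring(2, 4);
int code = int.Parse(digits, NumberStyles.AllowHexSpecifier, CultureInfo.InvariantCulture);

// Four hex digits give at most 0xFFFF, which always fits in a char.
char c = (char)code;
Console.WriteLine($"0x{code:X4} -> '{c}'"); // 0x0061 -> 'a'
```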

        • prasadbuddhika wrote (#4), in reply to David1987 (#3):

          Could you please guide me on that? Thanks.

          • Lost User wrote (#5), in reply to prasadbuddhika (#1):

            "/U0061" gives the string /U0061; I think you mean "\U0061".

            The best things in life are not things.

            • prasadbuddhika wrote (#6), in reply to Luc Pattyn (#2):

              Thanks, the first one is the option I could apply, but it isn't working for me either: I tried the first option and I still get the first character in the string. Any idea why? Thanks.

              • prasadbuddhika wrote (#7), in reply to Lost User (#5):

                Yeah, sorry about that mistake.

                • David1987 wrote (#8), in reply to prasadbuddhika (#4):

                  First search for occurrences of \\U[0-9]+ (that's a regex; the backslash is escaped, and you may have to double-escape it). Then replace each match with (char)int.Parse(match.Value.Split('U')[1]) (though I would refactor that).
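A sketch of the search-and-replace approach, with two deliberate departures from the post: the digits are read as hexadecimal (see #13) and exactly four are taken. Both u and U are accepted because the question used an uppercase U; the sample input is hypothetical.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical input mixing literal escape text with ordinary characters.
string input = @"x=\U0061, y=\u0062!";

// A backslash, 'u' or 'U', then exactly four hex digits. Inside a verbatim
// string, @"\\" is the regex for one literal backslash, so no double escaping.
var escape = new Regex(@"\\[uU]([0-9A-Fa-f]{4})");

// Replace each match with the character whose code is the captured hex value.
string output = escape.Replace(input, m =>
    ((char)int.Parse(m.Groups[1].Value, NumberStyles.AllowHexSpecifier)).ToString());

Console.WriteLine(output); // x=a, y=b!
```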

                  • Luc Pattyn wrote (#9), in reply to prasadbuddhika (#6):

                    prasadbuddhika wrote:

                    i still get the first character in the string

                    ??? string s only holds one character. The whole backslash-u-four-digits thing is C#'s way to specify a single character by its Unicode character number.
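Luc's point can be checked directly. This small sketch contrasts a regular literal, where the compiler has already turned the escape into one character, with a verbatim (@) literal, where the backslash survives as text:

```csharp
using System;

// Regular literal: the compiler turns \u0061 into the single character 'a'.
string compiled = "\u0061";

// Verbatim literal: backslashes are not escapes, so this is six characters.
string raw = @"\u0061";

Console.WriteLine($"{compiled.Length} {compiled}"); // 1 a
Console.WriteLine($"{raw.Length} {raw}");           // 6 \u0061
```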

                    • prasadbuddhika wrote (#10), in reply to Luc Pattyn (#9):

                      Thanks Luc, I had made a mistake: I had used "U" instead of "u". With "U" I get an "unrecognized escape sequence" error, but with "u" it works fine. Thank you.
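For reference, C# does have an uppercase form, but \U expects exactly eight hex digits, which is why "\U0061" is reported as an unrecognized escape sequence. A small sketch:

```csharp
using System;

// \u takes exactly four hex digits; \U takes exactly eight.
string lower = "\u0061";
string upper = "\U00000061";
Console.WriteLine(lower == upper); // True

// \U can name code points above U+FFFF, which C# stores as two chars.
string emoji = "\U0001F600";
Console.WriteLine(emoji.Length);   // 2
```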

                      • Luc Pattyn wrote (#11), in reply to prasadbuddhika (#1):

                        If what you have is a six-character string containing a real backslash, a U, and four hex digits, you could turn it into a single character as shown here. This situation is rare, though: it would typically occur only if you plan on writing your own C# compiler!

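                        using System;
                        using System.Globalization;

                        string s=@"\U0061"; // the @ in front tells the compiler to ignore the special meaning of backslashes
                        int uni;
                        // expect exactly: a backslash, a 'U', then four hex digits (no surrounding whitespace)
                        if (s==null || s.Length!=6 || s[0]!='\\' || s[1]!='U' ||
                            !int.TryParse(s.Substring(2, 4), NumberStyles.AllowHexSpecifier, null, out uni))
                            throw new FormatException("Bad unicode string in: "+s);
                        char c=(char)uni;
                        Console.WriteLine("uni="+uni.ToString("X4")); // uni=0061
                        Console.WriteLine("c="+c);                    // c=a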

                        :)

                        modified on Wednesday, May 11, 2011 1:23 PM

                        • jschell wrote (#12), in reply to prasadbuddhika (#1):

                          Is there a real problem here? As noted, you have a simple character, which will be in the string if the string is created correctly. However, you CANNOT use a single value of the C# data type 'char' to represent every character in the supported range, so if that is your goal you will fail. Read up on "surrogate pairs" to find out why.
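To illustrate the surrogate-pair point, a minimal sketch using U+1F600 purely as an example of a code point above U+FFFF; char.ConvertFromUtf32, char.IsSurrogatePair, and char.ConvertToUtf32 are standard System.Char methods.

```csharp
using System;

// U+1F600 lies above U+FFFF, outside the Basic Multilingual Plane.
int codePoint = 0x1F600;

// ConvertFromUtf32 returns a string, not a char: here a surrogate pair.
string s = char.ConvertFromUtf32(codePoint);
Console.WriteLine(s.Length);                         // 2
Console.WriteLine(char.IsSurrogatePair(s[0], s[1])); // True

// Recombine the two UTF-16 code units into the original code point.
int back = char.ConvertToUtf32(s[0], s[1]);
Console.WriteLine(back.ToString("X"));               // 1F600
```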

                          • Peter_in_2780 wrote (#13), in reply to David1987 (#8):

                            You have a problem. The four digits after the \u are HEX, not decimal, so int.Parse won't cut it. Cheers, Peter
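The difference Peter describes, in concrete terms: the same four digits name different characters depending on the base. Plain int.Parse reads decimal; passing NumberStyles.HexNumber makes it read hexadecimal. A short sketch:

```csharp
using System;
using System.Globalization;

string digits = "0061";

// Decimal parse: 0061 -> 61, which is '='.
int asDecimal = int.Parse(digits);
// Hex parse: 0x0061 -> 97, which is 'a'.
int asHex = int.Parse(digits, NumberStyles.HexNumber);

Console.WriteLine($"{asDecimal} -> '{(char)asDecimal}'"); // 61 -> '='
Console.WriteLine($"{asHex} -> '{(char)asHex}'");         // 97 -> 'a'
```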

                            Software rusts. Simon Stephenson, ca 1994.

                            • David1987 wrote (#14), in reply to Peter_in_2780 (#13):

                              OP didn't say so, so how do you know?

                              • Peter_in_2780 wrote (#15), in reply to David1987 (#14):

                                Because that's the way \unnnn works. It's the big brother of the \xnn convention for single-byte characters. Borrowing a couple of sentences from the Java Language Specification, section 3.2: "A Unicode escape of the form \uxxxx, where xxxx is a hexadecimal value, represents the Unicode character whose encoding is xxxx. This translation step allows any program to be expressed using only ASCII characters."

                                • David1987 wrote (#16), in reply to Peter_in_2780 (#15):

                                  \unnnn doesn't work any particular way; it's just a string. Of course it works that way in Java and C#, and no doubt in some other places as well, but there's no guarantee that it always does, and the OP should have specified it.

                                  • Peter_in_2780 wrote (#17), in reply to David1987 (#16):

                                    Well, this is the C# forum... :doh:

                                    • David1987 wrote (#18), in reply to Peter_in_2780 (#17):

                                      Why does that matter? It's not about the string "\Uanything" (i.e. a string containing the actual character), but about a string containing "\\Uanything" that has to be converted to the first form. The "anything" could still be in any form; nowhere did he say that it originates from C# source code.

                                      • Peter_in_2780 wrote (#19), in reply to David1987 (#18):

                                        I didn't have a problem understanding what OP wanted to do. Luc didn't have a problem. Richard didn't have a problem. jschell didn't have a problem. OP didn't have a problem understanding Luc's answers. I'm not going on a troll-feeding expedition. End of discussion.

                                        • David1987 wrote (#20), in reply to Peter_in_2780 (#19):

                                          Actually, they all had problems understanding what he meant. Luc's first answer doesn't answer the question, Richard's answer doesn't answer the question, the first version of Luc's second answer parsed it as decimal as well IIRC, and jschell just noted a problem with what the OP is trying to do.

                                          Peter_in_2780 wrote:

                                          I'm not going on a troll-feeding expedition. End of discussion.

                                          Fuck you. It was a legitimate discussion. You are the troll here, not me.

                                          modified on Thursday, May 12, 2011 3:42 AM
