problem with unicode. [modified]

    Luc Pattyn wrote:

    prasadbuddhika wrote:

    i still get the first character in the string

    ????????????????????? String s holds only one character. The whole backslash-u-four-digits thing is C#'s way to specify a single character by its Unicode character number.

    Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum

    Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.

    prasadbuddhika wrote (#10, in reply to Luc Pattyn):

    Thanks Luc, I had made a mistake: I had used "U" instead of "u". With "U" the compiler reports an unrecognized escape sequence, but with "u" it works fine. Thank you.
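
For readers who hit the same error: C# has two Unicode escape forms that differ in how many hex digits they require. \u takes exactly four and \U takes exactly eight, which is why '\U0061' is rejected while '\u0061' compiles. A minimal sketch (the class and member names are made up for illustration):

```csharp
static class EscapeForms
{
    // \u takes exactly four hex digits and fits in a single char.
    public const char LowerA = '\u0061';

    // \U takes exactly eight hex digits; for a code point up to U+FFFF
    // the resulting string still holds a single char.
    public const string LowerAFromLongForm = "\U00000061";

    // '\U0061' does not compile: only four of the eight required digits are present.
}
```

Both constants hold the letter a; only the spelling of the escape differs.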

      prasadbuddhika wrote:

      I need to get the character when its Unicode value is given. For example, I get the Unicode value as a string, "\U0061", and I need the character for that value. I know that I can write char c = '\U0061', but unfortunately I get the Unicode value as a string, so what I need is to get the character from that string. Does anyone have an idea how to do this? Thanks in advance.

      modified on Wednesday, May 11, 2011 12:30 PM

      Luc Pattyn wrote (#11, in reply to prasadbuddhika's question):

      If what you have is a six-character string containing a real backslash, a U, and four hex digits, then you could turn that into a single character like so. However, this situation is rare; it would typically occur only if you plan on writing your own C# compiler!

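      // needs: using System; using System.Globalization; (for NumberStyles)
      string s = @"\U0061"; // the @ in front tells the compiler to ignore the special meaning of backslashes
      int uni;
      // expect exactly six characters: a backslash, a U, and four hex digits
      if (s == null || s.Length != 6 || s[0] != '\\' || s[1] != 'U' ||
          !int.TryParse(s.Substring(2, 4), NumberStyles.HexNumber, null, out uni))
          throw new Exception("Bad unicode string in: " + s);
      char c = (char)uni; // the parsed number is the character's UTF-16 value
      log("uni=" + uni.ToString("X4")); // log() stands for whatever output routine you use
      log("c=" + c);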

      :)


      modified on Wednesday, May 11, 2011 1:23 PM

        jschell wrote (#12, in reply to prasadbuddhika's question):

        Is there a real problem here? As noted, you have a single simple character, which will be in the string if the string is created correctly. However, you CANNOT use a single C# 'char' to represent the entire supported character set range, so if that is your goal you will fail. Read up on "surrogate pairs" to find out why.
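
To make the surrogate pair point concrete: a C# char is one 16-bit UTF-16 code unit, so a code point above U+FFFF needs two of them. A small sketch using the framework's char.ConvertFromUtf32 (the class and method names are illustrative only):

```csharp
using System;

static class SurrogateDemo
{
    // Returns the UTF-16 text for a Unicode code point.
    // Code points up to U+FFFF give a string of Length 1; code points
    // above U+FFFF give a surrogate pair, i.e. a string of Length 2.
    // ConvertFromUtf32 throws for values that are not valid code points
    // (the surrogate range itself, or anything above U+10FFFF).
    public static string FromCodePoint(int codePoint)
    {
        return char.ConvertFromUtf32(codePoint);
    }
}
```

For example, SurrogateDemo.FromCodePoint(0x1F600) returns a two-char string, so casting such a value to a single char, as in (char)uni, cannot represent it.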

          David1987 wrote:

          First search for occurrences of \\U[0-9]+ (that's a regex; the backslash is escaped, and you may have to doubly escape it). Then replace each match with (char)int.Parse(match.Value.Split('U')[1]) (though I would refactor that).

          Peter_in_2780 wrote (#13, in reply to David1987):

          You have a problem. The four digits after the \u are HEX, not decimal, so int.Parse without NumberStyles.HexNumber won't cut it. Cheers, Peter

          Software rusts. Simon Stephenson, ca 1994.
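
Putting the two posts above together: the pattern has to accept hex digits, and the digits have to be parsed as hex. A hedged sketch, assuming the input uses the C#/Java-style four-digit form and that either \u or \U may introduce it (the class and method names are hypothetical, not from the thread):

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

static class UnicodeEscapes
{
    // A literal backslash, 'u' or 'U', then exactly four hex digits (captured in group 1).
    private static readonly Regex Escape = new Regex(@"\\[uU]([0-9A-Fa-f]{4})");

    // Replaces every such escape in the input with the character it names.
    // Text that does not match the pattern is left untouched.
    public static string Decode(string input)
    {
        return Escape.Replace(input, m =>
        {
            // Parse the four captured digits as a hexadecimal number.
            int code = int.Parse(m.Groups[1].Value, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            return ((char)code).ToString();
        });
    }
}
```

With this sketch, UnicodeEscapes.Decode(@"\u0061\u0062c") yields "abc". If the incoming digits were decimal instead, only the parsing line would need to change.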

            David1987 wrote (#14, in reply to Peter_in_2780's #13):

            OP didn't say so, so how do you know?

              Peter_in_2780 wrote (#15, in reply to David1987's #14):

              Because that's the way \unnnn works. It is the big brother of the \xnn convention for single-byte characters. Borrowing a couple of sentences from the Java Language Specification, section 3.2: A Unicode escape of the form \uxxxx, where xxxx is a hexadecimal value, represents the Unicode character whose encoding is xxxx. This translation step allows any program to be expressed using only ASCII characters.

                David1987 wrote (#16, in reply to Peter_in_2780's #15):

                \unnnn doesn't work any particular way; it's just a string. Of course it works that way in Java and C#, and no doubt some other places as well, but there's no guarantee that it always does, and the OP should have specified it.

                  Peter_in_2780 wrote (#17, in reply to David1987's #16):

                  Well, this is the C# forum... :doh:

                    David1987 wrote (#18, in reply to Peter_in_2780's #17):

                    Why does that matter? It's not about the string "\Uanything" (i.e. a string containing the actual character), but about a string containing "\\Uanything" that has to be converted to the first form. Anything could still be in any form; nowhere did he say that it originates from C# source code.

                      Peter_in_2780 wrote (#19, in reply to David1987's #18):

                      I didn't have a problem understanding what OP wanted to do. Luc didn't have a problem. Richard didn't have a problem. jschell didn't have a problem. OP didn't have a problem understanding Luc's answers. I'm not going on a troll-feeding expedition. End of discussion.

                        David1987 wrote (#20, in reply to Peter_in_2780's #19):

                        And let me remind you, you are wrong. The OP did not specify that the number had to be in HEX, therefore it was not clear.

                          David1987 wrote (#21, in reply to Peter_in_2780's #19):

                          Actually they all had problems understanding what he meant. Luc's first answer doesn't answer the question, Richard's answer doesn't answer the question, Luc's first version of his second answer parsed it as decimal as well IIRC, and jschell just noted a problem with what the OP is trying to do.

                          Peter_in_2780 wrote:

                          I'm not going on a troll-feeding expedition. End of discussion.

                          Fuck you. It was a legitimate discussion. You are the troll here, not me.

                          modified on Thursday, May 12, 2011 3:42 AM

                            David1987 wrote (#22, in reply to Peter_in_2780's #19):

                            Who the fuck upvoted this?

                              Lost User wrote (#23, in reply to David1987's #21):

                              David1987 wrote:

                              Richard's answer doesn't answer the question

                              Precisely, because I did not think the question was clear, and it had already received enough suggestions from other people. BTW I was not confused about the hex/decimal question; as you rightly pointed out the OP did not specify what the number format was.

                              The best things in life are not things.

                                Lost User wrote (#24, in reply to Peter_in_2780's #19):

                                Now look what you've done. :sigh:

                                  Keith Barrow wrote (#25, in reply to David1987's #22):

                                  Correcting Univotes now. Luckily my jedi powers are OK, so you'll gain more than you lost :)

                                  Sort of a cross between Lawrence of Arabia and Dilbert.
                                  -Or-
                                  A Dead ringer for Kate Winslett

                                    Luc Pattyn wrote (#26, in reply to David1987's #21):

                                    David1987 wrote:

                                    Luc's first version of his second answer parsed it as decimal as well IIRC

                                    Wrong. I edited it to improve the error checking; basically I added s==null || s.Length!=6 || s[0]!='\\' || s[1]!='U' to the if(...) throw statement. Now stop this silly dispute. None of us knows exactly what the OP intended, as is often the case, unfortunately. That is why I ended up providing two different answers, assuming one of them would hit the actual question. :)
