problem with unicode. [modified]

  • P prasadbuddhika

    could you please guide me on that. thanx.

    David1987
    wrote on last edited by
    #8

    First, search for occurrences of \\U[0-9]+ (that's a regex; the backslash is escaped, and you may have to escape it twice). Then replace each match with (char)int.Parse(match.Value.Split('U')[1]) (though I would refactor that).

    • P prasadbuddhika

      thanx, the first one is the option that i could apply, but it is also not working for me. i tried the first option and i still get the first character in the string. any idea about it? thanx.

      Luc Pattyn
      wrote on last edited by
      #9

      prasadbuddhika wrote:

      i still get the first character in the string

      ????????????????????? String s only holds one character. The whole backslash-u-four-digits thing is C#'s way to specify a single character by its Unicode character number.

      Luc Pattyn [Forum Guidelines] [My Articles] Nil Volentibus Arduum

      Please use <PRE> tags for code snippets, they preserve indentation, improve readability, and make me actually look at the code.

        prasadbuddhika
        wrote on last edited by
        #10

        thanx Luc, i had made a mistake: i had used "U" instead of "u". when i use "U" it gives me the "unrecognized escape sequence" error, but with "u" it works fine. thank you.
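
Editor's note: the error described above most likely comes from the different lengths of the C# escape forms. The sketch below is only an illustration written for this note (the variable names are made up); it shows the three forms producing the same character, with the failing form left as a comment.

```csharp
using System;

char a1 = '\u0061';        // \u takes exactly four hex digits
string a2 = "\U00000061";  // \U takes exactly eight hex digits
char a3 = '\x61';          // \x takes one to four hex digits
// char bad = '\U0061';    // does not compile: \U followed by only four digits

Console.WriteLine(a1);          // a
Console.WriteLine(a2);          // a
Console.WriteLine(a3);          // a
Console.WriteLine(a2.Length);   // 1
```

So "\U0061" is rejected because the compiler expects eight hex digits after a capital U, while "\u0061" is the four-digit form.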

        • P prasadbuddhika

          i need to get the character when the unicode value is given. for example: i get the unicode value as a string -> "\U0061", and then i need to get the character for this unicode value. i know that i can use "char c = '\U0061'", but unfortunately i get the Unicode value as a string, so what i need is to get the character from that string. anyone got an idea how to do this? thanx in advance.

          modified on Wednesday, May 11, 2011 12:30 PM

          Luc Pattyn
          wrote on last edited by
          #11

          If what you have is a six-character string containing a real backslash, a U, and four hex digits, then you could turn that into a single character like so. However, this situation is rare; it would typically occur only if you plan on writing your own C# compiler!

          using System;
          using System.Globalization; // for NumberStyles

          string s = @"\U0061"; // the @ in front tells the compiler to ignore the special meaning of backslashes
          int uni;
          // expect exactly: a backslash, a 'U', and four hex digits
          if (s == null || s.Length != 6 || s[0] != '\\' || s[1] != 'U' ||
              !int.TryParse(s.Substring(2, 4), NumberStyles.HexNumber, null, out uni))
              throw new Exception("Bad unicode string in: " + s);
          char c = (char)uni;
          Console.WriteLine("uni=" + uni.ToString("X4")); // uni=0061
          Console.WriteLine("c=" + c);                    // c=a

          :)

          modified on Wednesday, May 11, 2011 1:23 PM

            jschell
            wrote on last edited by
            #12

            Is there a real problem here? As noted, you have a simple character, one which will be in the string if the string is created correctly. However, you CANNOT use a single C# 'char' to represent the entire supported character set range, so if that is your goal you will fail. Read up on "surrogate pairs" to find out why.
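
Editor's note: a short sketch of the surrogate-pair point, assuming the standard System.Char helpers. The code point U+1F600 is just an arbitrary example of a value above U+FFFF; it was not mentioned in the thread.

```csharp
using System;

// U+1F600 lies outside the Basic Multilingual Plane, so it cannot fit in one 16-bit char.
string s = char.ConvertFromUtf32(0x1F600);

Console.WriteLine(s.Length);                                 // 2
Console.WriteLine(char.IsHighSurrogate(s[0]));               // True
Console.WriteLine(char.IsLowSurrogate(s[1]));                // True
Console.WriteLine(char.ConvertToUtf32(s, 0).ToString("X"));  // 1F600

// A BMP code point such as U+0061 still fits in a single char.
Console.WriteLine(char.ConvertFromUtf32(0x61).Length);       // 1
```

For four-digit escapes like the one in the question, a single char is enough; the limitation only matters for code points above U+FFFF.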

              Peter_in_2780
              wrote on last edited by
              #13

              You have a problem. The four digits after the \u are HEX not decimal. So int.Parse won't cut it. Cheers, Peter

              Software rusts. Simon Stephenson, ca 1994.
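
Editor's note: a minimal sketch combining the regex suggestion from earlier in the thread with the hex correction above. It assumes the input really contains a literal backslash, a u or U, and exactly four hex digits; DecodeEscapes is a hypothetical helper name, not something from the thread.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces every literal \uXXXX or \UXXXX with the character it names.
static string DecodeEscapes(string input)
{
    // Match a literal backslash, 'u' or 'U', then exactly four hex digits.
    return Regex.Replace(input, @"\\[uU]([0-9A-Fa-f]{4})", m =>
    {
        int code = int.Parse(m.Groups[1].Value, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
        return ((char)code).ToString();
    });
}

Console.WriteLine(DecodeEscapes(@"\U0061bc \u0041"));  // abc A

// Parsing the same digits as decimal gives code 61, which is '=' rather than 'a'.
Console.WriteLine((char)int.Parse("0061"));            // =
```

The pattern accepts A-F digits, which \\U[0-9]+ would miss, and the fixed length of four keeps it from swallowing digits that follow the escape.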

                David1987
                wrote on last edited by
                #14

                OP didn't say so, so how do you know?

                  Peter_in_2780
                  wrote on last edited by
                  #15

                  Because that's the way \unnnn works. Big brother to the \xnn convention for single byte characters. Borrowing a couple of sentences from the Java Language Specification, section 3.2: A Unicode escape of the form \uxxxx, where xxxx is a hexadecimal value, represents the Unicode character whose encoding is xxxx. This translation step allows any program to be expressed using only ASCII characters.

                    David1987
                    wrote on last edited by
                    #16

                    \unnnn doesn't work any particular way, it's just a string. Of course it works that way in Java and C# and no doubt some other places as well, but there's no guarantee that it always does, and OP should have specified it.

                      Peter_in_2780
                      wrote on last edited by
                      #17

                      Well, this is the C# forum... :doh:

                        David1987
                        wrote on last edited by
                        #18

                        Why does that matter? It's not about the string "\Uanything" (i.e. a string containing the actual character), but about a string containing "\\Uanything" that has to be converted to the first form. "Anything" could still be in any form; nowhere did he say that it originates from C# source code.

                          Peter_in_2780
                          wrote on last edited by
                          #19

                          I didn't have a problem understanding what OP wanted to do. Luc didn't have a problem. Richard didn't have a problem. jschell didn't have a problem. OP didn't have a problem understanding Luc's answers. I'm not going on a troll-feeding expedition. End of discussion.

                            David1987
                            wrote on last edited by
                            #20

                            Actually they all had problems understanding what he meant. Luc's first answer doesn't answer the question, Richard's answer doesn't answer the question, Luc's first version of his second answer parsed it as decimal as well IIRC, and jschell just noted a problem with what the OP is trying to do.

                            Peter_in_2780 wrote:

                            I'm not going on a troll-feeding expedition. End of discussion.

                            Fuck you. It was a legitimate discussion. You are the troll here, not me.

                            modified on Thursday, May 12, 2011 3:42 AM

                              David1987
                              wrote on last edited by
                              #21

                              And let me remind you, you are wrong. The OP did not specify that the number had to be in HEX, therefore it was not clear.

                                David1987
                                wrote on last edited by
                                #22

                                Who the fuck upvoted this?

                                  Lost User
                                  wrote on last edited by
                                  #23

                                  David1987 wrote:

                                  Richard's answer doesn't answer the question

                                  Precisely, because I did not think the question was clear, and it had already received enough suggestions from other people. BTW I was not confused about the hex/decimal question; as you rightly pointed out the OP did not specify what the number format was.

                                  The best things in life are not things.

                                    Lost User
                                    wrote on last edited by
                                    #24

                                    Now look[^] what you've done. :sigh:

                                      Keith Barrow
                                      wrote on last edited by
                                      #25

                                      Correcting Univotes now. Luckily my jedi powers are OK, so you'll gain more than you lost :)

                                      Sort of a cross between Lawrence of Arabia and Dilbert.[^]
                                      -Or-
                                      A Dead ringer for Kate Winslett[^]

                                        Luc Pattyn
                                        wrote on last edited by
                                        #26

                                        David1987 wrote:

                                        Luc's first version of his second answer parsed it as decimal as well IIRC

                                        Wrong. I edited to improve the error checking; basically I added s==null || s.Length!=6 || s[0]!='\\' || s[1]!='U' to the if(...) throw statement. Now stop this silly dispute; none of us know exactly what the OP intended, as is often the case, unfortunately. That is why I ended up providing two different answers, assuming one of them would hit the actual question. :)
