UTF8 Encoding - need help or explanation

    Radoslav Bielik wrote (#1):

    Hi everyone,

    I'm having a strange problem with Unicode encoding in C# / Macromedia Flash, and I think I need a little explanation to make sure I understand where the problem is. :)

    In C#, we have Encoding.UTF8 and Encoding.Unicode. As I understand it, Encoding.UTF8 encodes ASCII characters into 8 bits and all other characters (accented characters, etc.) into 16 bits. Encoding.Unicode, on the other hand, is actually UTF-16 and encodes all characters into 16 bits.

    The problem: Latin small letter s with caron - š - has character code 0x0161. This letter is encoded as 0x6101 when using Encoding.Unicode, or 0x0161 when using Encoding.BigEndianUnicode. However, when using Encoding.UTF8, it is encoded as 0xC5A1.

    In Macromedia Flash, strings are apparently encoded using UTF-8, since the basic ASCII characters are encoded into 8 bits, but the small letter s with caron - š - is encoded as 0x0161. So now I don't know why it is different from UTF-8 in C#.

    Any clues will be highly appreciated...

    Rado
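For reference, the widths described in the post can be checked with a small script. This is only a sketch in Python rather than C#. The codec names `utf-8` and `utf-16-le` are assumed here to correspond to Encoding.UTF8 and Encoding.Unicode, and the sample characters are chosen purely for illustration. It suggests that UTF-8 is not limited to 8 or 16 bits per character: it uses between one and four bytes depending on the code point, and UTF-16 needs four bytes for characters above U+FFFF.

```python
# Sample characters chosen for illustration (not taken from the original post):
# 'A' (U+0041), s with caron (U+0161), euro sign (U+20AC), an emoji (U+1F600).
SAMPLES = ["A", "\u0161", "\u20ac", "\U0001F600"]


def encoded_lengths(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for a single character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch in SAMPLES:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} byte(s), UTF-16 {utf16_len} byte(s)")
```
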


    Radoslav Bielik http://www.neomyz.com/poll - Get your own web poll

      Mike Dimmick wrote (#2):

      Yes, LATIN SMALL LETTER S WITH CARON is U+0161. In little-endian UTF-16 this is the byte sequence 0x61 0x01, in big-endian UTF-16 it is 0x01 0x61, and in UTF-8 it is 0xC5 0xA1.

      I'd look into how Flash encodes characters. See, for example: http://www.macromedia.com/support/flash/languages/unicode_in_flmx/

      Stability. What an interesting concept. -- Chris Maunder
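The three byte sequences quoted above can be reproduced with a short script. This is a minimal sketch in Python, not C#. The codec names `utf-16-le`, `utf-16-be` and `utf-8` are assumed to play the roles of Encoding.Unicode, Encoding.BigEndianUnicode and Encoding.UTF8, and `hex_bytes` is a hypothetical helper introduced only for this example.

```python
S_CARON = "\u0161"  # LATIN SMALL LETTER S WITH CARON


def hex_bytes(text, codec):
    """Encode text with the given codec and format each byte as 0xNN."""
    return " ".join(f"0x{b:02X}" for b in text.encode(codec))


print("UTF-16 LE:", hex_bytes(S_CARON, "utf-16-le"))  # 0x61 0x01
print("UTF-16 BE:", hex_bytes(S_CARON, "utf-16-be"))  # 0x01 0x61
print("UTF-8:    ", hex_bytes(S_CARON, "utf-8"))      # 0xC5 0xA1
```

Seeing 0x01 0x61 in the Flash data would therefore point to big-endian UTF-16 (or a raw code point value) rather than UTF-8.
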

        Radoslav Bielik wrote (#3):

        Thanks Mike, this makes sense! :) It now seems that Flash is actually using UTF-16 internally, not UTF-8. I will forward this to our Flash guy.

        One more question: is there an easy, straightforward way to convert the UTF-16 representation of a letter to its UTF-8 representation?

        [EDIT] I was thinking of an algorithm or a simple script, not the C# Encoding.Convert method. [/EDIT]

        Thanks again!

        Rado
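One possible shape for such an algorithm, sketched in Python as a simple script: decode the UTF-16 code units into code points (combining surrogate pairs where they occur), then emit one to four UTF-8 bytes per code point according to the standard UTF-8 bit layout. The function name `utf16_units_to_utf8` is hypothetical, and the input is assumed to be a list of 16-bit integers that have already been read in the correct byte order.

```python
def utf16_units_to_utf8(units):
    """Convert a sequence of 16-bit UTF-16 code units to UTF-8 bytes."""
    out = bytearray()
    i = 0
    while i < len(units):
        cp = units[i]
        i += 1
        # A high surrogate must be followed by a low surrogate; together
        # they encode one code point above U+FFFF.
        if 0xD800 <= cp <= 0xDBFF:
            if i >= len(units) or not 0xDC00 <= units[i] <= 0xDFFF:
                raise ValueError("unpaired high surrogate")
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        elif 0xDC00 <= cp <= 0xDFFF:
            raise ValueError("unpaired low surrogate")

        if cp < 0x80:            # 1 byte:  0xxxxxxx
            out.append(cp)
        elif cp < 0x800:         # 2 bytes: 110xxxxx 10xxxxxx
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif cp < 0x10000:       # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:                    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
    return bytes(out)


# U+0161 falls in the 2-byte range: 0xC0 | (0x161 >> 6) = 0xC5, 0x80 | (0x161 & 0x3F) = 0xA1
print(utf16_units_to_utf8([0x0161]).hex())  # c5a1
```

For the letter in question only the two-byte branch is exercised; the other branches are included so that the sketch covers the full range of code points.
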


        Radoslav Bielik http://www.neomyz.com/poll - Get your own web poll
