Code Project forums: C / C++ / MFC

UTF-8 and MultiByte

  • sandeepkavade (#1)

    Hi all, can anybody tell me what the difference is between UTF-8 and multibyte, and where I can find more information on the subject? Thanks in advance.

      • Hans Dietrich (#2)

      Google for "UTF-8 multibyte" and you will get many hits. Try Wikipedia first.

      Best wishes, Hans


      [CodeProject Forum Guidelines] [How To Ask A Question] [My Articles]

        • Nemanja Trifunovic (#3)

        UTF-8 is a Unicode encoding scheme. Multibyte is a common name for a number of legacy encodings that typically store strings in char arrays (in C) as opposed to wchar_t arrays.


        Programming Blog utf8-cpp
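For illustration, here is a minimal C++ sketch of how UTF-8 turns one Unicode code point into 1 to 4 bytes of an ordinary char-based string. The helper name encode_utf8 is hypothetical (it is not part of any library mentioned in this thread), and validation is limited to an assert that rejects surrogate code points and values above U+10FFFF.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: append the UTF-8 encoding of code point cp to out.
// Precondition (checked with assert): cp is a Unicode scalar value,
// i.e. not a surrogate (U+D800..U+DFFF) and not above U+10FFFF.
void encode_utf8(char32_t cp, std::string& out) {
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {                  // 1 byte:  0xxxxxxx (plain ASCII)
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {          // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                          // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

Appending U+0041, U+20AC (euro sign) and U+1F600 in turn yields the 8 bytes 41 E2 82 AC F0 9F 98 80: three characters, three different byte lengths, all stored in a plain std::string.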

          • jhwurmbach (#4)

          Nemanja Trifunovic wrote:

          Multibyte is a common name for a number of legacy encodings that typically store strings in char arrays

          Thus, by your definition, UTF-8 is a multibyte format. UTF-16 (not that anyone in his right mind would use it) isn't.


          Though I speak with the tongues of men and of angels, and have not money, I am become as a sounding brass, or a tinkling cymbal.
          George Orwell, "Keep the Aspidistra Flying", Opening words

            • bob16972 (#5)

            UTF explained[^]

              • Nemanja Trifunovic (#6)

              jhwurmbach wrote:

              UTF-8 is a multibyte-format.

              It is, in the sense that it is usually stored in char arrays and is a variable-length encoding, but as I said, the term "multibyte" is usually used for various legacy ASCII extensions such as Shift_JIS. UTF-8 is really a Unicode encoding.
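The "variable length, stored in char arrays" point can be sketched as follows, assuming valid UTF-8 input and a hypothetical helper named utf8_length. Because UTF-8 continuation bytes always have the bit pattern 10xxxxxx, code points can be counted without any locale or lead-byte table, whereas strlen counts bytes. Legacy multibyte encodings such as Shift_JIS lack this property: their trail bytes can fall in the ASCII range 0x40 to 0x7E (including the backslash, 0x5C), so a scanner has to know the lead-byte ranges of the specific code page.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: count code points in a NUL-terminated UTF-8 string.
// Assumes valid UTF-8. Every byte that is NOT a continuation byte
// (bit pattern 10xxxxxx) starts a new code point.
std::size_t utf8_length(const char* s) {
    assert(s != nullptr);
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        unsigned char byte = static_cast<unsigned char>(*s);
        if ((byte & 0xC0) != 0x80)  // lead byte or ASCII byte
            ++count;
    }
    return count;
}
```

For "A\xE2\x82\xAC\xF0\x9F\x98\x80" (A, euro sign, U+1F600) it returns 3, while strlen on the same string returns 8.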

              jhwurmbach wrote:

              UTF-16 (not that someone in his right mind would use that) isn't.

              You probably mean UTF-32.
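To see why UTF-32 rather than UTF-16 is the fixed-width form, here is a small C++11 sketch that stores the same three characters (U+0041, U+20AC and U+1F600, chosen only for illustration) in each encoding and checks the number of code units. The UTF-8 bytes are spelled out as escapes so the example does not depend on the compiler's execution character set; the function name check_code_unit_counts is hypothetical.

```cpp
#include <cassert>
#include <string>

// The same three characters in each Unicode encoding form:
//   U+0041 'A', U+20AC EURO SIGN, U+1F600 (beyond the 16-bit range).
const std::string    utf8  = "A\xE2\x82\xAC\xF0\x9F\x98\x80";
const std::u16string utf16 = u"A\u20AC\U0001F600";
const std::u32string utf32 = U"A\u20AC\U0001F600";

// Checks how many code units each encoding needs for the same text.
void check_code_unit_counts() {
    assert(utf8.size() == 8);   // 1 + 3 + 4 bytes: variable length
    assert(utf16.size() == 4);  // 1 + 1 + 2 units: U+1F600 needs a surrogate pair
    assert(utf32.size() == 3);  // one 32-bit unit per code point: fixed length
}
```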


                • jhwurmbach (#7)

                Nemanja Trifunovic wrote:

                jhwurmbach wrote: UTF-16 (not that someone in his right mind would use that) isn't. You probably mean UTF-32.

                I meant UTF-8 in the original meaning. According to the link[^] given in the posting below, UTF-16 is fixed 16-bit (and seems to be what the Windows designers had in mind when they added the UNICODE functions taking wchar_t). It seems as if standards bodies have tampered with UTF-16. UTF-8 uses bytes, but it leaves the fixed relationship one code number <-> one character (which UTF-16) retained.


                  • Nemanja Trifunovic (#8)

                  jhwurmbach wrote:

                  According to the link[^] given in the posting below, UTF-16 is fixed 16-bit

                  Don't know about the link, but UTF-16 is definitely not fixed 16-bit per character. There are surrogate pairs[^] that cover the code points above U+FFFF, which do not fit in 16 bits. On the other hand, with UTF-32 each code point is encoded as a single 32-bit number, and it is the only fixed-length Unicode encoding scheme.
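The surrogate-pair arithmetic can be sketched as follows; the helper names to_surrogates and from_surrogates are hypothetical. A code point in U+10000..U+10FFFF has 0x10000 subtracted, leaving a 20-bit value whose top 10 bits go into a high surrogate (0xD800 to 0xDBFF) and whose bottom 10 bits go into a low surrogate (0xDC00 to 0xDFFF).

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper: split a supplementary code point (U+10000..U+10FFFF)
// into a UTF-16 surrogate pair (high, low).
std::pair<char16_t, char16_t> to_surrogates(char32_t cp) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    std::uint32_t v = cp - 0x10000;                               // 20 bits remain
    char16_t high = static_cast<char16_t>(0xD800 + (v >> 10));    // top 10 bits
    char16_t low  = static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // bottom 10 bits
    return {high, low};
}

// Hypothetical inverse: combine a high/low surrogate pair into one code point.
char32_t from_surrogates(char16_t high, char16_t low) {
    assert(high >= 0xD800 && high <= 0xDBFF);
    assert(low >= 0xDC00 && low <= 0xDFFF);
    return 0x10000 + ((static_cast<char32_t>(high) - 0xD800) << 10)
                   + (static_cast<char32_t>(low) - 0xDC00);
}
```

For example, U+1F600 becomes the pair 0xD83D 0xDE00, the same two code units a C++11 compiler emits for u"\U0001F600".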

