3-byte Unicode characters?

    Anthony Appleyard
    wrote on last edited by
    #1

    Windows functions that take character arguments currently come in two versions: one for 1-byte characters (the old ASCII characters, from 0 to 255) and one for 2-byte characters (WCHAR etc.) for the Unicode characters from 0 to 65535 (0x0000 to 0xFFFF). But some defined Unicode characters need 3 bytes, e.g. 0x012000 to 0x01236E for cuneiform. I have already found a Wikipedia page that displays cuneiform characters, or would if I had a font for cuneiform. How do Windows C++ programs usually handle, read, and write such exotica? (See the Wikipedia page for "Cuneiform".)

      Lost User
      wrote on last edited by
      #2

      Those characters fall into the Multi-byte Character Set (MBCS) types, and require fonts that can display them.

        Anthony Appleyard
        wrote on last edited by
        #3

        Are there any C++ functions to handle MBCS characters?

          Lost User
          wrote on last edited by
          #4

          What specifically are you trying to do?

            jschell
            wrote on last edited by
            #5

            First you figure out what you want to do. Second you figure out what character set or character sets (plural) you need to solve your problem. Third you determine what technology you need to solve that problem. At best you haven't identified the second part of the above. At the least, you haven't stated what character set you think you will be working with.

              Anthony Appleyard
              wrote on last edited by
              #6

              The printf function has a version that prints one-byte characters and a version that prints two-byte characters. The same goes for many other Windows C++ functions. But if I want to print a cuneiform character to the screen, which is a 3-byte Unicode character, am I advised to stick to one-byte-character mode, build the byte sequence myself that switches Unicode into the mode for 3-byte characters, and then send the 3-byte character as three one-byte characters?

                jschell
                wrote on last edited by
                #7

                Anthony Appleyard wrote:

                that is a 3-byte Unicode character

                You are confusing the conceptual with the practical. "Unicode" in its broadest sense is an attempt to regularize how characters are used in computing. It does that by defining characters. Those characters are then represented in character sets. There are quite a few of those (although fewer than there would be without the standardization of Unicode). Two examples of such character sets are: http://en.wikipedia.org/wiki/UTF-8 http://en.wikipedia.org/wiki/UTF-16 And those only describe what is supposed to be in the data; they say nothing about whether any given technology X supports them partially, much less fully. You seem to be suggesting that you might be attempting to use UTF16. However I am rather certain that there are variants of that.

                  Anthony Appleyard
                  wrote on last edited by
                  #8

                  jschell wrote:

                  You seem to be suggesting that you might be attempting to use UTF16. However I am rather certain that there are variants of that.

                  I have successfully read, printed, and displayed on screen the 2-byte Unicode characters in a C++ application called Typecase, which I wrote and which is somewhat like Windows Character Map; it outputs by putting its text in the clipboard. I have also successfully written Unicode text to UTF-16 files.

                    jschell
                    wrote on last edited by
                    #9

                    As I stated, there are variants of UTF16. I am rather certain that one of them has no extensions at all. Another uses two bytes (taken from a reserved range of two-byte values) to signal that the following two bytes are combined with them (4 bytes in total) to create a code point. I believe there is a variant of UTF8 that can have a 3-byte character code point. But I am not as sure that there is a UTF16 that does.
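The "two bytes followed by two more bytes" mechanism mentioned above is what UTF-16 calls a surrogate pair. As a rough illustration (not code from this thread; the function names `encode_utf16` and `decode_utf16` are hypothetical), the following C++ sketch encodes a single code point into UTF-16 code units and decodes it again. It assumes the input is a valid Unicode scalar value and does no further error handling.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encode one code point as UTF-16 code units (hypothetical helper).
// Code points up to U+FFFF fit in one 16-bit unit; anything above
// needs a surrogate pair, i.e. two 16-bit units (4 bytes).
std::u16string encode_utf16(char32_t cp) {
    // Assumption: caller passes a valid scalar value (not a lone surrogate).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::u16string out;
    if (cp < 0x10000) {
        out.push_back(static_cast<char16_t>(cp));
    } else {
        const char32_t v = cp - 0x10000;  // 20-bit offset into the supplementary range
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
    }
    return out;
}

// Decode the code point that starts at index i (hypothetical helper).
char32_t decode_utf16(const std::u16string& s, std::size_t i = 0) {
    assert(i < s.size());
    const char16_t hi = s[i];
    if (hi >= 0xD800 && hi <= 0xDBFF && i + 1 < s.size()) {
        const char16_t lo = s[i + 1];
        if (lo >= 0xDC00 && lo <= 0xDFFF) {
            return 0x10000 + ((static_cast<char32_t>(hi) - 0xD800) << 10)
                           + (lo - 0xDC00);
        }
    }
    return hi;  // single-unit code point (an unpaired surrogate is passed through as-is)
}
```

With this sketch, the first cuneiform code point, U+12000, encodes to the two code units 0xD808 0xDC00, so it occupies 4 bytes in UTF-16 rather than 3.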

                      Joe Woodbury
                      wrote on last edited by
                      #10

                      You are confusing UTF-8 and UTF-16. Both use variable length representation for characters, though with UTF-16, common languages are represented with two bytes, with things like musical notes and cuneiform in the escaped range. See the following: https://en.wikipedia.org/wiki/UTF-16
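To see the variable-length behaviour of both encodings side by side, here is a small C++ sketch (again not from the thread; `encode_utf8` and `utf16_units` are hypothetical names). It encodes a code point as UTF-8 bytes and reports how many 16-bit units UTF-16 would need, assuming the input is a valid Unicode scalar value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Encode one code point as UTF-8 bytes (hypothetical helper).
// The result is 1 to 4 bytes long depending on the code point's magnitude.
std::string encode_utf8(char32_t cp) {
    // Assumption: caller passes a valid scalar value (not a surrogate).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                       // 1 byte: ASCII
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {               // 2 bytes
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {             // 3 bytes: rest of the 16-bit range
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {                               // 4 bytes: above U+FFFF
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return out;
}

// Number of 16-bit code units UTF-16 needs for a code point:
// one unit up to U+FFFF, two units (a surrogate pair) above that.
std::size_t utf16_units(char32_t cp) {
    return cp < 0x10000 ? 1 : 2;
}
```

Under these assumptions, U+12000 (cuneiform) comes out as the four bytes F0 92 80 80 in UTF-8 and as two 16-bit units in UTF-16, so in either encoding it takes 4 bytes, not 3. A character such as U+20AC (the euro sign) takes 3 bytes in UTF-8 but a single 16-bit unit in UTF-16.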
