wchar_t in C

    Jochen Arndt wrote:

    Because "abc" is a char* string and not a wchar_t which represents a single character. wchar_t are used to store fixed length Unicode characters like UTF-16 or UTF-32 while char can store fixed length ASCII or ANSI characters (with an associated code page) or variable length characters like UTF-8 or Microsoft multi byte characters.

    jschell, post #7:

    Jochen Arndt wrote:

    wchar_t are used to store fixed length Unicode characters like UTF-16 or UTF-32

    Just noting that statement is somewhat of a generalization. For starters, Unicode at those bit sizes never represents all characters via a single character. One needs to go to 128 bits for a full representation. Maybe that isn't even big enough. Additionally, it is not limited to Unicode, although perhaps these days that would be the predominant usage in the western world.

      Jochen Arndt, post #8:

      Quote:

      "wchar_t are used to store fixed length Unicode characters like UTF-16 or UTF-32" Just noting that statement is somewhat of a generalization.

      Why? As far as I know wchar_t is not used for variable length encodings.

      jschell wrote:

      for those bit sizes never represents all characters via single character

      While that is true for UTF-16 it is not for UTF-32.

      jschell wrote:

      One needs to go to 128 bits for a full representation

      That is wrong.

      The Unicode Blog: Announcing The Unicode® Standard, Version 10.0:

      Tuesday, June 20, 2017 Version 10.0 of the Unicode Standard is now available. For the first time, both the core specification and the data files are available on the same date. Version 10.0 adds 8,518 characters, for a total of 136,690 characters. These additions include four new scripts, for a total of 139 scripts, as well as 56 new emoji characters.

      Still 14 of 32 bits unused.
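
To make the UTF-16 versus UTF-32 distinction above concrete, here is a minimal sketch in C. The function name `utf16_encode` is hypothetical and not part of any standard library. It follows the surrogate-pair rule of UTF-16: code points above U+FFFF need two 16-bit units, whereas UTF-32 always stores one 32-bit unit per code point.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
   Writes 1 or 2 code units to out and returns how many were written,
   or 0 if cp is not a valid Unicode scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFFu || (cp >= 0xD800u && cp <= 0xDFFFu))
        return 0;                       /* out of range or a lone surrogate */
    if (cp < 0x10000u) {
        out[0] = (uint16_t)cp;          /* BMP: one unit, same as UCS-2 */
        return 1;
    }
    cp -= 0x10000u;                     /* 20 bits remain */
    out[0] = (uint16_t)(0xD800u + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00u + (cp & 0x3FFu)); /* low surrogate */
    return 2;
}
```

For example, U+0041 yields a single unit, while U+1F600 yields the pair 0xD83D 0xDE00.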

        Anonygeeker, post #9:

        "fixed length Unicode characters like UTF-16 or UTF-32": a detailed explanation would be helpful. Are there any examples of them?

          Jochen Arndt, post #10:

          wchar_t is used to store "wide characters" (characters using an encoding that requires more than a byte). The most commonly used character encodings for wchar_t are UCS-2 (a subset of UTF-16) and UTF-32. Read the Wikipedia article on Unicode.
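
As a rough illustration of the point above, the following C sketch reports how wide wchar_t is on the current platform and contrasts a narrow literal with a wide one. The helper names are made up for this example. The C standard leaves the width of wchar_t implementation-defined; it is typically 16 bits with Windows compilers and 32 bits with Linux and macOS toolchains, which is why UCS-2/UTF-16 is used on the former and UTF-32 on the latter.

```c
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: number of bits in one wchar_t on this platform. */
size_t wchar_bits(void)
{
    return sizeof(wchar_t) * CHAR_BIT;
}

/* A narrow literal is an array of char, a wide literal (L prefix) is an
   array of wchar_t; the two types are not interchangeable. */
static const char    narrow_text[] = "abc";
static const wchar_t wide_text[]   = L"abc";

/* Length in characters of the wide string, excluding the terminator. */
size_t wide_text_length(void)
{
    return wcslen(wide_text);
}

/* Storage used by each array in bytes, terminator included. */
size_t narrow_text_bytes(void) { return sizeof narrow_text; }
size_t wide_text_bytes(void)   { return sizeof wide_text; }
```

Both strings hold three characters plus a terminator, but the wide array occupies `4 * sizeof(wchar_t)` bytes instead of 4.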

            Anonygeeker, post #11:

            Thanks..

              jschell, post #12:

              Jochen Arndt wrote:

              Why? As far as I know wchar_t is not used for variable length encodings.

              It is intended for any character set, not just Unicode. Most representations are not Unicode. And Unicode IS a variable length encoding to some meaning of that definition. There are escape characters in the 8/16/32 bit Unicode character sets that allow for the definition of additional characters using multiple 'character' values. So two wchar_t might be needed for a single character.

              Jochen Arndt wrote:

              While that is true for UTF-16 it is not for UTF-32.

              "UTF-32 does not make calculating the displayed width of a string easier, since even with a "fixed width" font there may be more than one code point per character position" (UTF-32 - Wikipedia). I will state that it is unlikely for this to be used.

              Jochen Arndt wrote:

              That is wrong.

              Presumably you are claiming that UTF-32 contains every possible character. So based on that logic what exactly is in UTF-64? Just UTF-32 for the first half and the empty space for the rest?

              Jochen Arndt wrote:

              Still 14 of 32 bits unused.

              That isn't relevant. It isn't how the character set is defined but rather the extent and how it is used. There are unused spots in a number of places in Unicode in general. No idea why. Perhaps they figure a specific range of the character set might have a few more characters added in the future. As far as I can recall, many character sets beyond 7 bits end up duplicating or adding to a real character set. For example, the normal extended ASCII set has several dashes and a few mathematical symbols. And that is only using 8 bits.

                Jochen Arndt, post #13:

                Quote:

                It is intended for any character set, not just Unicode. Most representations are not Unicode.

                Examples (I don't know one except when wchar_t is defined as char)?

                Quote:

                And Unicode IS a variable length encoding to some meaning of that definition

                There are multiple Unicode encodings where some are fixed length and some are variable length.

                Quote:

                So two wchar_t might be needed for a single character.

                It is intended to be used for single characters. Allowing more than one requires a much more complex implementation (like with the char based Microsoft multi byte character sets).

                Quote:

                "UTF-32 does not make calculating the displayed width of a string easier, since even with a "fixed width" font there may be more than one code point per character position"

                Fonts and their display length are not related to character encoding specifications.

                Quote:

                That isn't relevant. It isn't how the character set is defined but rather the extent and how it is used.

                It is all about the definition regarding the required storage size. The unused code points are there because the codes are grouped (each script or symbol type has an assigned range). See the Wikipedia article on Unicode blocks. So new characters / symbols can be added later to the corresponding group (a rather old example is the Euro symbol). The grouping has been chosen because with 32 bits there is enough room. Unicode already contains nearly all known scripts including ancient ones like Runes and Mayan glyphs and a wide range of symbols.
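
The storage-size argument above can be checked with a little arithmetic. The sketch below is in C, and `bits_needed` is a hypothetical helper name. It computes how many bits are required to give each of n items a distinct code, counting from zero.

```c
#include <stdint.h>

/* Hypothetical helper: smallest number of bits b such that 2^b >= n,
   i.e. enough bits to number n distinct items from 0 to n - 1. */
unsigned bits_needed(uint32_t n)
{
    unsigned bits = 0;
    uint64_t capacity = 1;              /* 2^bits, 64-bit to avoid overflow */
    while (capacity < n) {
        capacity <<= 1;
        ++bits;
    }
    return bits;
}
```

With the 136,690 characters quoted for Unicode 10.0 this gives 18 bits, which matches the "14 of 32 bits unused" figure. The whole Unicode code space, U+0000 through U+10FFFF (1,114,112 code points), needs 21 bits.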

                  jschell, post #14:

                  Jochen Arndt wrote:

                  The grouping has been chosen because with 32 bits there is enough room. Unicode already contains nearly all known scripts including ancient ones like Runes and Mayan glyphs and a wide range of symbols.

                  As I said there is a 64 bit definition. Left to you to explain what the purpose of that is.

                    Jochen Arndt, post #15:

                    I can't explain the purpose of something that does not exist. There is neither UTF-64 nor UTF-128.

                      jschell, post #16:

                      Jochen Arndt wrote:

                      There is neither UTF-64 nor UTF-128.

                      I stand corrected: as far as I can tell there is no 64-bit encoding. However, there still remain characters in the 32-bit set that require a total of two code points.
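
The remaining point, that one displayed character can take two code points even in UTF-32, most commonly arises with combining sequences. Here is a minimal sketch in C with hypothetical function names and a deliberate simplification: only the Combining Diacritical Marks block (U+0300 to U+036F) is treated as combining, whereas real text segmentation follows the much larger rule set of Unicode Standard Annex #29.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplification for illustration: only U+0300..U+036F count as combining. */
static int is_combining_mark(uint32_t cp)
{
    return cp >= 0x0300u && cp <= 0x036Fu;
}

/* Hypothetical helper: count displayed characters in a UTF-32 buffer by
   skipping combining marks, which attach to the preceding base character. */
size_t count_displayed_chars(const uint32_t *text, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; ++i) {
        if (!is_combining_mark(text[i]))
            ++count;
    }
    return count;
}
```

The sequence U+0065 U+0301 (e followed by a combining acute accent) occupies two UTF-32 units but displays as one character, so the function returns 1 for that buffer of length 2.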
