wchar_t in C

    Anonygeeker (#1):

    Hi, is there a data type wchar_t in C? If so, how does it differ from char?

      CPallini (#2, in reply to #1):

      Quote:

      Is there a data type wchar_t in C?

      Yes, apparently.

      Quote:

      If so, how does it differ from char?

      It is compiler dependent.

        Jochen Arndt (#3, in reply to #1):

        Yes, since C90. See "wchar_t - C++ Reference" and "Wide character - Wikipedia". Because it is implementation-defined (compiler and platform dependent), there is no general answer to how it differs from a char.

          Anonygeeker (#4, in reply to #3):

          Thanks. I checked its size and got 4 bytes. If so, it should be able to hold something like "abc". But that is not happening. Why?

            Jochen Arndt (#5, in reply to #4):

            Because "abc" is a char string (an array of char that decays to char*) and not a wchar_t, which represents a single character. wchar_t are used to store fixed length Unicode characters like UTF-16 or UTF-32, while char can store fixed length ASCII or ANSI characters (with an associated code page) or variable length characters like UTF-8 or Microsoft multi-byte characters.

              leon de boer (#6, in reply to #1):

              It's defined as a wide char for Unicode and UTF-16 support, primarily for file name support (FAT32 LFN, for example) and foreign-language console input. There is also another important type, wint_t (declared in <wchar.h>), which is the generic carrier form.

              You need the concept of narrowing, which takes a wide character back to its byte approximation (see function wctob: "wctob | Microsoft Docs"). The reverse concept is widening, which takes a byte character and promotes it (see function btowc: "btowc | Microsoft Docs").

              The letter conversions are controlled by the current LC_CTYPE locale, meaning the language type. Type something like this; it prints the time in Japanese :-)

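              #include <locale.h>  /* setlocale */
              #include <time.h>    /* time, localtime */
              #include <wchar.h>   /* wcsftime, wprintf */

              int main(void){

              wchar_t str[100];
              time_t t = time(0);
              /* "ja-JP" is the Windows locale name; POSIX systems typically use a name like "ja_JP.UTF-8" */
              setlocale(LC_ALL, "ja-JP");
              wcsftime(str, 100, L"%A %c", localtime(&t));
              wprintf(L"%ls\n", str); /* %ls is the standard conversion for a wchar_t string */
              return 0;
              }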

              It will look something like "金曜日 2017/12/15 2:09:13"

                jschell (#7, in reply to #5):

                Jochen Arndt wrote:

                wchar_t are used to store fixed length Unicode characters like UTF-16 or UTF-32

                Just noting that statement is somewhat of a generalization. For starters, Unicode, for those bit sizes, never represents all characters via a single character. One needs to go to 128 bits for a full representation. Maybe that isn't even big enough. Additionally, it is not limited to Unicode, although perhaps these days that would be the predominant usage in the western world.

                  Jochen Arndt (#8, in reply to #7):

                  Quote:

                  "wchar_t are used to store fixed length Unicode characters like UTF-16 or UTF-32" Just noting that statement is somewhat of a generalization.

                  Why? As far as I know wchar_t is not used for variable length encodings.

                  jschell wrote:

                  for those bit sizes, never represents all characters via a single character

                  While that is true for UTF-16 it is not for UTF-32.

                  jschell wrote:

                  One needs to go to 128 bits for a full representation

                  That is wrong.

                  The Unicode Blog, "Announcing The Unicode® Standard, Version 10.0":

                  Tuesday, June 20, 2017 Version 10.0 of the Unicode Standard is now available. For the first time, both the core specification and the data files are available on the same date. Version 10.0 adds 8,518 characters, for a total of 136,690 characters. These additions include four new scripts, for a total of 139 scripts, as well as 56 new emoji characters.

                  Still 14 of 32 bits unused.

                    Anonygeeker (#9, in reply to #5):

                    "fixed length Unicode characters like UTF-16 or UTF-32": a detailed explanation would be helpful. Are there any examples of them?

                      Jochen Arndt (#10, in reply to #9):

                      wchar_t are used to store "wide characters" (characters using an encoding that requires more than a byte). The most commonly used character encodings for wchar_t are UCS-2 (a subset of UTF-16) and UTF-32. Read "Unicode - Wikipedia".

                        Anonygeeker (#11, in reply to #10):

                        Thanks..

                          jschell (#12, in reply to #8):

                          Jochen Arndt wrote:

                          Why? As far as I know wchar_t is not used for variable length encodings.

                          It is intended for any character set, not just unicode. Most representations are not unicode. And unicode IS a variable length encoding to some meaning of that definition. There are escape characters in the 8/16/32 bit unicode character sets that allow for the definition of additional characters using multiple 'character' values. So two wchar_t might be needed for a single character.

                          Jochen Arndt wrote:

                          While that is true for UTF-16 it is not for UTF-32.

                          "UTF-32 does not make calculating the displayed width of a string easier, since even with a "fixed width" font there may be more than one code point per character position" (UTF-32 - Wikipedia). I will state that it is unlikely for this to be used.

                          Jochen Arndt wrote:

                          That is wrong.

                          Presumably you are claiming that UTF-32 contains every possible character. So based on that logic what exactly is in UTF-64? Just UTF-32 for the first half and the empty space for the rest?

                          Jochen Arndt wrote:

                          Still 14 of 32 bits unused.

                          That isn't relevant. It isn't how the character set is defined but rather the extent and how it is used. There are unused spots in a number of places in unicode in general. No idea why. Perhaps they figure a specific range of the character set might have a few more characters added in the future. Far as I can recall many character sets beyond 7 bits end up duplicating or adding to a real character set. For example the normal extended ascii set has several dashes and a few mathematical symbols. And that is only using 8 bits.

                            Jochen Arndt (#13, in reply to #12):

                            Quote:

                            It is intended for any character set, not just unicode. Most representations are not unicode.

                            Examples? (I don't know of one, except when wchar_t is defined as char.)

                            Quote:

                            And unicode IS a variable length encoding to some meaning of that definition

                            There are multiple Unicode encodings where some are fixed length and some are variable length.

                            Quote:

                            So two wchar_t might be needed for a single character.

                            It is intended to be used for single characters. Allowing more than one requires a much more complex implementation (like with the char based Microsoft multi byte character sets).

                            Quote:

                            "UTF-32 does not make calculating the displayed width of a string easier, since even with a "fixed width" font there may be more than one code point per character position"

                            Fonts and their display widths are not related to character encoding specifications.

                            Quote:

                            That isn't relevant. It isn't how the character set is defined but rather the extent and how it is used.

                            It is all about the definition regarding the required storage size. The unused code points are there because the codes are grouped (each script or symbol type has an assigned range). See "Unicode block - Wikipedia". So new characters / symbols can be added later to the group they belong to (a rather old example is the Euro symbol). The grouping has been chosen because with 32 bits there is enough room. Unicode already contains nearly all known scripts including ancient ones like Runes and Mayan glyphs and a wide range of symbols.

                              jschell (#14, in reply to #13):

                              Jochen Arndt wrote:

                              The grouping has been chosen because with 32 bits there is enough room. Unicode already contains nearly all known scripts including ancient ones like Runes and Mayan glyphs and a wide range of symbols.

                              As I said there is a 64 bit definition. Left to you to explain what the purpose of that is.

                                Jochen Arndt (#15, in reply to #14):

                                I can't explain the purpose of something that does not exist. There is neither UTF-64 nor UTF-128.

                                  jschell (#16, in reply to #15):

                                  Jochen Arndt wrote:

                                  There is neither UTF-64 nor UTF-128.

                                  I stand corrected: as far as I can tell there is no 64-bit encoding. However, there still remain characters in the 32-bit set that require a total of two code points.
