UTF-8 and MultiByte
-
Hi all, can anybody tell me what the difference is between UTF-8 and multibyte, and where I can find more information on the topic? Thanks in advance.
-
Google for "UTF-8 multibyte" and you will get many hits. Try Wikipedia first.
Best wishes, Hans
-
UTF-8 is a Unicode encoding scheme. Multibyte is a common name for a number of legacy encodings that typically store strings in char arrays (in C) as opposed to wchar_t arrays.
-
Nemanja Trifunovic wrote:
Multibyte is a common name for a number of legacy encodings that typically store strings in char arrays
Thus, in the sense of your definition, UTF-8 is a multibyte format. UTF-16 (not that anyone in his right mind would use that) isn't.
Though I speak with the tongues of men and of angels, and have not money, I am become as a sounding brass, or a tinkling cymbal.
George Orwell, "Keep the Aspidistra Flying", Opening words
-
jhwurmbach wrote:
UTF-8 is a multibyte-format.
It is, in the sense that it is usually stored in char arrays and is a variable-length encoding, but as I said, the term "multibyte" is usually used for various legacy ASCII extensions such as SHIFT_JIS. UTF-8 is really a Unicode encoding.
jhwurmbach wrote:
UTF-16 (not that someone in his right mind would use that) isn't.
You probably mean UTF-32.
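To make the variable-length point concrete, here is a minimal sketch in Python (chosen only for brevity; the helper name `bytes_per_char` and the sample string are made up for this illustration). It encodes the same two characters in SHIFT_JIS and in UTF-8 and reports how many bytes each character takes.

```python
# Illustrative sketch: per-character byte lengths in two variable-length
# encodings. The helper name and the sample text are hypothetical.

def bytes_per_char(text, encoding):
    """Return (code point label, encoded byte length) for each character."""
    return [(f"U+{ord(ch):04X}", len(ch.encode(encoding))) for ch in text]

# LATIN CAPITAL LETTER A followed by HIRAGANA LETTER A (U+3042)
sample = "A\u3042"

for enc in ("shift_jis", "utf-8"):
    print(enc, bytes_per_char(sample, enc))

# shift_jis [('U+0041', 1), ('U+3042', 2)]
# utf-8 [('U+0041', 1), ('U+3042', 3)]
```

Both encodings keep ASCII at one byte and use more bytes for other characters, so both fit in char arrays. The difference is that SHIFT_JIS maps to one legacy character set, while UTF-8 can encode any Unicode code point.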
-
Nemanja Trifunovic wrote:
jhwurmbach wrote: UTF-16 (not that someone in his right mind would use that) isn't. You probably mean UTF-32.
I meant UTF-8 in the original meaning. According to the link given in the posting below, UTF-16 is fixed 16-bit (and seems to be what the Windows designers had in mind when they added the UNICODE functions taking wchar_t). It seems as if standards bodies have tampered with UTF-16. UTF-8 uses bytes, but it gives up the fixed relationship of one code number <-> one character (which UTF-16 retained).
-
jhwurmbach wrote:
According to the link given in the posting below, UTF-16 is fixed 16-bit
Don't know about the link, but UTF-16 is definitely not fixed 16-bit per character. There are surrogate pairs that cover the code points above 16 bits. On the other hand, with UTF-32 each code point is encoded as a 32-bit number, and it is the only fixed-length Unicode encoding scheme.
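A quick way to check the surrogate-pair claim is to count code units. This is a rough sketch in Python; the helper `code_units` and the two sample characters are my own choices for illustration. The `-le` and `-be` codec variants are used so that no byte order mark is added to the output.

```python
# Illustrative sketch: count UTF-16 and UTF-32 code units per code point.
# The helper name and sample characters are hypothetical.

def code_units(text, encoding, unit_size):
    """Number of code units of unit_size bytes needed to encode text."""
    return len(text.encode(encoding)) // unit_size

bmp = "\u20ac"          # EURO SIGN, fits in 16 bits
astral = "\U0001F600"   # GRINNING FACE, above U+FFFF

for ch in (bmp, astral):
    print(f"U+{ord(ch):04X}",
          "utf-16 units:", code_units(ch, "utf-16-le", 2),
          "utf-32 units:", code_units(ch, "utf-32-le", 4))

# U+20AC utf-16 units: 1 utf-32 units: 1
# U+1F600 utf-16 units: 2 utf-32 units: 1

# The two UTF-16 units for U+1F600 are the surrogate pair D83D DE00.
print(astral.encode("utf-16-be").hex())  # d83dde00
```

So a single code point above U+FFFF takes two 16-bit units in UTF-16 but always exactly one 32-bit unit in UTF-32.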