Simple Encoding

Jorgen Sigvardsson

I'm not so sure about your definitions. I have no problem with the Unicode <-> UTF-8 part. ASCII is a strict definition of 128 characters, encoded using 7 bits. ASCII-8 is an extension of that, and defines exactly 256 characters. Windows-1252 also defines 256 characters. They're all character sets, as they define exactly where each character sits in the "alphabet". So from my point of view, "ASCII is to Windows-1252 as Unicode is to UTF-8" doesn't hold.
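To make the "where each character is in the alphabet" point concrete, here is a minimal Python sketch. It decodes the same single byte under three character sets. The helper name `decode_or_none` is made up for this illustration, and Latin-1 is used only as one example of an 8-bit set; it is not one of the sets named in the post.

```python
def decode_or_none(data: bytes, charset: str):
    """Decode bytes with the given charset, or return None if a byte is not defined in it."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None

# One byte with the value 0x80 (128), just outside the 7-bit ASCII range.
sample = b"\x80"

for charset in ("ascii", "cp1252", "latin-1"):
    result = decode_or_none(sample, charset)
    print(charset, repr(result))
```

ASCII has no character at position 128, Windows-1252 puts the euro sign there, and Latin-1 puts a control character there. Same byte, three different "alphabets".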

eggie5

So ASCII, ASCII-8 and Unicode are all character sets?

Jorgen Sigvardsson

I believe so. This list seems to imply that. IIRC, Unicode was the character set designed to kill the need for all other character sets, as it's supposed to be large enough.

Jorgen Sigvardsson

ASCII is to the alphabet, as Unicode is to the union of all alphabets (or the sum of all alphabets - to make it easier on the layman's ear). I think that would be an appropriate analogy.

jhaga

I like Joel's article on the subject: http://www.joelonsoftware.com/articles/Unicode.html

eggie5

ASCII is to the English alphabet as Unicode is to all the alphabets in the world? Thanks, I like that.

Jorgen Sigvardsson

Something like that, yes. And to explain UTF-8, use Morse code for an analogy. Morse code conveys the same thing as written language, it's just communicated a bit differently.
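A small Python sketch of that distinction, assuming nothing beyond the standard library: the character's number in Unicode (its code point) stays the same, while the bytes used to transmit it depend on the encoding chosen. The function name `describe` is hypothetical, and UTF-16 is included only as a second "way of tapping out" the same character.

```python
def describe(ch: str):
    """Return the Unicode code point of ch and its byte form in two encodings."""
    code_point = ord(ch)                 # position in the Unicode character set
    utf8 = ch.encode("utf-8")            # one way to transmit it
    utf16 = ch.encode("utf-16-be")       # another way to transmit the same character
    return code_point, utf8, utf16

for ch in ("A", "é", "€"):
    code_point, utf8, utf16 = describe(ch)
    print(f"U+{code_point:04X}  utf-8={utf8.hex()}  utf-16={utf16.hex()}")
```

Decoding either byte sequence with the matching encoding gives back the same character, which is the Morse code idea: different signals, same letter.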

eggie5

"What do web browsers do if they don't find any Content-Type, either in the http headers or the meta tag? Internet Explorer actually does something quite interesting: it tries to guess, based on the frequency in which various bytes appear in typical text in typical encodings of various languages, what language and encoding was used." Thought you might be interested in that...

Michael Dunn

You're mixing up the concepts of character set and encoding. The Joel article mentioned earlier covers this in more detail.

RandomMonkey

James's Catch22 page may also be of assistance.

Jeremy Falcon

eggie5 wrote:

Thought you might be interested in that...

Thanks. Funny thing is, that's exactly what Notepad also does to detect which set the file is in.
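For anyone curious what "guessing" can look like in code, here is a deliberately naive Python sketch. It is not what Internet Explorer or Notepad actually do: instead of byte-frequency statistics it uses a much simpler swapped-in approach of checking for a UTF-8 byte order mark and then trying strict decodes in order. The function name `guess_encoding` and the Windows-1252 fallback are assumptions made for this example.

```python
import codecs

def guess_encoding(data: bytes) -> str:
    """Naive guess: BOM check, then strict ASCII, then strict UTF-8, else a fallback."""
    # A UTF-8 byte order mark at the start is a strong hint.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    # Try the strictest candidates first; a failed decode rules the candidate out.
    for candidate in ("ascii", "utf-8"):
        try:
            data.decode(candidate)
            return candidate
        except UnicodeDecodeError:
            continue
    # Assumed fallback for this sketch only; a real detector would weigh byte statistics here.
    return "cp1252"

print(guess_encoding(b"plain text"))
print(guess_encoding("caf\u00e9".encode("utf-8")))
print(guess_encoding("caf\u00e9".encode("cp1252")))
```

A guess like this can be wrong, since many byte sequences are valid in more than one encoding, which is why an explicit Content-Type is still the safer option.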