Encoding Question

    eggie5
    wrote on last edited by
    #1

    Would it be correct to call these characters (翻訳と通訳) "non-ISO latin chars"? If not, what is the correct name for them? Unicode? I don't think that Japanese/Chinese is Latin...

      PJ Arends
      wrote on last edited by
      #2

      I'd call them 'squares';P


      "You're obviously a superstar." - Christian Graus about me - 12 Feb '03 "Obviously ???  You're definitely a superstar!!!" - mYkel - 21 Jun '04 "There's not enough blatant self-congratulatory backslapping in the world today..." - HumblePie - 21 Jun '05 Within you lies the power for good - Use it!

        Josh Martin
        wrote on last edited by
        #3

        I'm dealing heavily with Unicode data right now, and I just refer to them as Unicode characters, as that encompasses the entire range that I'm dealing with. The "non-ISO latin chars" confused me for a sec, because I initially read it as "(non-ISO) latin chars" instead of "(non-ISO latin) chars".

        Josh
        Find a penny, pick it up, and all day long you'll have a back-ache...

          eggie5
          wrote on last edited by
          #4

          So, Unicode encompasses every character there is, right? Or is that UTF? Anyway, what is a term that I can use to differentiate those symbols from, say, standard English... in encoding lingo?

          /\ |_ E X E GG

            Josh Martin
            wrote on last edited by
            #5

            The Unicode character set contains all of the characters (Latin, Hebrew, Arabic, Chinese, Japanese, etc.) laid out by the Unicode standards. UTF-8 and UTF-16 are encoding schemes for storing Unicode code points in a binary representation.

            Josh
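To make the character set vs. encoding distinction concrete, here is a minimal Python sketch. It is not from the original post, and the variable names are only illustrative. It shows that each character has one code point, while the number of bytes depends on the encoding scheme chosen.

```python
# Two of the characters from the original question, plus plain English text.
text = "翻訳"
ascii_text = "alfdkjsf"

# Code points are the abstract numbers assigned by the Unicode standard.
code_points = [f"U+{ord(ch):04X}" for ch in text]

# Encodings turn those numbers into bytes; the byte count depends on the scheme.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-be")  # big-endian, no byte-order mark

print(code_points)
print("UTF-8 :", len(utf8_bytes), "bytes:", " ".join(f"{b:02X}" for b in utf8_bytes))
print("UTF-16:", len(utf16_bytes), "bytes:", " ".join(f"{b:02X}" for b in utf16_bytes))

# Plain English letters are Unicode characters too; in UTF-8 they take one byte each.
print("ASCII text in UTF-8:", len(ascii_text.encode("utf-8")), "bytes")
```

The last line also bears on the later question in the thread: "alfdkjsf" is made of Unicode characters as well, so "Unicode characters" alone does not single out the Japanese text.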

              eggie5
              wrote on last edited by
              #6

              So what would be the most specific name for these, 三维地图, if I wanted to differentiate them from standard English text?

                Josh Martin
                wrote on last edited by
                #7

                I guess it would all depend on your target audience. For me, I'd probably just call them Japanese characters, even though they could just as well be Chinese (I studied some Japanese in school, and know that in some cases the Japanese kanji are identical to the Chinese characters, but the words are pronounced differently). Since my target audience is mainly my QA department right now, I either just say "Unicode characters" or I identify the specific portion of the character set that I'm referring to at the time (Chinese/Japanese, Hebrew, Arabic, etc.).

                Josh

                  eggie5
                  wrote on last edited by
                  #8

                  But if I said "Unicode Characters" wouldn't that mean "alfdkjsf" too?

                    cmk
                    wrote on last edited by
                    #9

                    Generally you will specify the code page ID and description. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_81rn.asp

                    Unicode assigns a number (a code point) to every character it covers, across languages. The Unicode 'number space' is carved into blocks, roughly one per script or group of related characters. http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_6bqr.asp

                    ...cmk
                    Save the whales - collect the whole set
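As a rough illustration of the "blocks" idea, the sketch below looks up which block a character falls in by its code point. The table is a hand-picked subset of block ranges, and `BLOCKS` and `block_of` are hypothetical names made up for this example, not part of any library.

```python
# A small, hand-picked subset of Unicode block ranges (inclusive).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x3040, 0x309F, "Hiragana"),
    (0x30A0, 0x30FF, "Katakana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
]


def block_of(ch: str) -> str:
    """Return the block name for a single character, or 'other' if not in the table."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "other"


for ch in "翻訳と通訳abc":
    print(f"U+{ord(ch):04X}  {ch}  {block_of(ch)}")
```

A name like "CJK Unified Ideographs" identifies a block, not a language: the same block serves Chinese, Japanese, and Korean text.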

                      Shog9 0
                      wrote on last edited by
                      #10

                      eggie5 wrote:

                      I don't think that japanese/chinese is latin...

                      That's for sure. :) I'm no expert, but here's a tool that'll let you get the script and character names (just paste the characters into the UTF-8 box, one at a time): http://isthisthingon.org/unicode/index.phtml
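If the online tool is not handy, a similar lookup can be sketched with Python's standard `unicodedata` module. This is an alternative to the linked tool, not a description of how that tool works.

```python
import unicodedata

text = "翻訳と通訳"

# Official Unicode character names, one per character in the string.
names = [unicodedata.name(ch) for ch in text]

for ch, name in zip(text, names):
    print(f"U+{ord(ch):04X}  {ch}  {name}")
```

The names make it visible that the sample string mixes scripts: the middle character is a Hiragana letter, while the others are CJK ideographs.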

                      My god, you're a genius! - Jörgen Sigvardsson, The Lounge

                        code frog 0
                        wrote on last edited by
                        #11

                        You would refer to those as Kanji. I did this a long time ago in C++. I believe that Unicode is a wide-character set (up to 4 bytes per character, depending on the encoding). Once upon a time you only used it if necessary, but internationalization has really exploded, such that most applications just use Unicode. So you are using a Unicode encoding to support Kanji. I believe it's Kanji even if it's Korean, Chinese, Japanese, etc... But for me it's been 4 years since I did this and memory fades. I think the answer you want though is "Kanji".

                        {EDIT-ADDED} This looks very similar to the document I used way back when: http://www.cl.cam.ac.uk/~mgk25/unicode.html Here's the info on Kanji; do a Ctrl+F on the page to read the whole document with links.

                        Unicode X11 font names end with -ISO10646-1. This is now the officially registered value for the X Logical Font Descriptor (XLFD) fields CHARSET_REGISTRY and CHARSET_ENCODING for all Unicode and ISO 10646-1 16-bit fonts. The *-ISO10646-1 fonts contain some unspecified subset of the entire Unicode character set, and users have to make sure that whatever font they select covers the subset of characters needed by them.

                        The *-ISO10646-1 fonts usually also specify a DEFAULT_CHAR value that points to a special non-Unicode glyph for representing any character that is not available in the font (usually a dashed box, the size of an H, located at 0x00). This ensures that users at least see clearly that there is an unsupported character. The smaller fixed-width fonts such as 6x13 etc. for xterm will never be able to cover all of Unicode, because many scripts such as Kanji can only be represented in considerably larger pixel sizes than those widely used by European users. Typical Unicode fonts for European usage will contain only subsets of between 1000 and 3000 characters, such as the CEN MES-3 repertoire.

                        You might notice that in the *-ISO10646-1 fonts the shapes of the ASCII quotation marks has slightly changed to bring them in line with the standards and practice on other platforms. {END-EDIT-ADDED}

                        Some assembly required. Code-frog System Architects, Inc.

                        -- modified at 17:34 Wednesday 30th November, 2005

                          code frog 0
                          wrote on last edited by
                          #12

                          eggie5 wrote:

                          But if I said "Unicode Characters" wouldn't that mean "alfdkjsf" too?

                          I think you would also call those, specifically, glyphs, which is generally understood to mean symbols used in writing, or as a form of writing, that convey much more meaning than a single written letter. So a glyph by itself might mean one thing, but the same glyph used with other glyphs may not have the same meaning at all; in fact, a totally different story might be told using two glyphs, for example...

                            Jorgen Sigvardsson
                            wrote on last edited by
                            #13

                            > I believe it's Kanji even if it's Korean, Chinese, Japanese, etc...

                            Both the Koreans and the Japanese use kanji (Chinese letters, developed during the Han dynasty if memory serves me right). While the Japanese use a simplified version, and the Chinese use both traditional and simplified versions, I don't know much about the Korean kanji. I believe the Koreans use kanji sparsely, because the only times I remember seeing them are in the context of martial arts. However, Koreans also use Hangul, which is a syllable symbology, although much more complex than our good old alphabet. The Japanese also use Katakana and Hiragana, which are both syllable symbologies. So, anything Korean or Japanese doesn't necessarily have to be kanji. :)

                            -- Pictures from my Japan trip.

                              Jorgen Sigvardsson
                              wrote on last edited by
                              #14

                              > The Unicode character set contains all of the characters (Latin, Hebrew, Arabic, Chinese, Japanese, etc.) laid out by the Unicode standards.

                              Which is far from complete. Not even half of the Japanese kanji are in the standard. IIRC, 20000 or so out of 50000 are in the standard. It has even worried some Japanese scholars that the use of computers may cripple the written language! (I think they're on to something, and I don't think that problem is confined to the Japanese language only :sigh:)

                                code frog 0
                                wrote on last edited by
                                #15

                                Correct! As usual, Jorgen! :) But as I recall, on the team I worked on, which was implementing Unicode support across all the major languages, we just loosely called it "Kanji" if it was glyph-based; that's more the point I was getting at. Although now, many years later, I would use the term "glyph-based languages" instead. I might piss off some people if I lumped all glyph-based languages into Kanji. So I'd call it "glyph" and then I'd call it unicode and after that I'd just call it work. If it's unicode, elastic collisions, linear algebra or whatever it's all work right. :)

                                  code frog 0
                                  wrote on last edited by
                                  #16

                                  No WAY! Computers will explode into that problem and solve it. That challenge is ripe and begging for someone at MIT to solve over a weekend with a spare 9-volt battery, a slightly used postage stamp, a magazine insert and some copper wire. ;P Seriously though, we'll get that taken care of. I'm just glad I don't have to use a Kanji keyboard. Can you imagine? 18 feet long and 6 feet tall. It would have to be, to get all those keys on it. I bet someone will implement a flexible keyboard that will auto-scroll with the motion of the hands. I should shut up now or I'll make someone rich. This idea is copyrighted by me. Don't touch it. It's mine. ;P

                                    Jorgen Sigvardsson
                                    wrote on last edited by
                                    #17

                                    > So I'd call it "glyph" and then I'd call it unicode and after that I'd just call it work. If it's unicode, elastic collisions, linear algebra or whatever it's all work right.

                                    :-D I was screwed up at the University by the use of both Lisp and Scheme, hence I got stuck on symbols. These days I work a lot with barcodes and their associated symbologies, so I'm still stuck on symbols. Glyph sounds nicer though. I remember a class hierarchy in "Design Patterns: Elements of Reusable Object-Oriented Software", which was very sweet. Ever since, the word "glyph" has always had a positive ring in my ears.

                                    Woops. This MDCO (Chief Miscellaneous Department Officer) is supposed to wake up in 7 hrs. :) (Good night, that is.. :-D)

                                    -- modified at 18:41 Wednesday 30th November, 2005

                                      Ryan Roberts
                                      wrote on last edited by
                                      #18

                                      Really good article on unicode[^] Ryan

                                      O fools, awake! The rites you sacred hold Are but a cheat contrived by men of old, Who lusted after wealth and gained their lust And died in baseness—and their law is dust. al-Ma'arri (973-1057)

                                        David Stone
                                        wrote on last edited by
                                        #19

                                        Jörgen Sigvardsson wrote:

                                        MDCO (Chief Miscellaneous Department Officer)

                                        Shouldn't that be CMDO? :~


                                        Picture a huge catholic cathedral. In it there's many people, including a gregorian monk choir. You know, those who sing beautifully. Then they start singing, in latin, as they always do: "Ad hominem..." -Jörgen Sigvardsson

                                        -- modified at 19:29 Wednesday 30th November, 2005

                                          Nemanja Trifunovic
                                          wrote on last edited by
                                          #20

                                          Generally, these characters may be encoded with different character sets. Just by looking at them one can't say a thing about how they are encoded. If you want to differentiate them from western scripts, I think your best bet would be to call them "non-western scripts". That says nothing about encoding, and I believe that's what you want.
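The point that the characters themselves say nothing about their encoding can be sketched in Python. The choice of the four encodings below is only an illustration; the same five characters come out as different byte sequences in each.

```python
text = "翻訳と通訳"

# The same characters, stored under four different encodings.
encoded = {
    name: text.encode(name)
    for name in ("utf-8", "utf-16-le", "shift_jis", "euc_jp")
}

for name, data in encoded.items():
    print(f"{name:10} {len(data):2d} bytes  {data.hex()}")

# Decoding each byte sequence with its own encoding recovers the same text.
round_trips = {name: data.decode(name) for name, data in encoded.items()}
print(all(value == text for value in round_trips.values()))
```

This is why a term such as "non-western scripts" describes the text itself, while "UTF-8" or "Shift_JIS" describes only how it happens to be stored.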


                                          My programming blahblahblah blog. If you ever find anything useful here, please let me know to remove it.
