A plea to Japanese (or Asian) Language Web Developers

dan sh

So, this is why they are slow readers. They need to first figure out what the sentence actually means.

Simon Lee Shugar

Thanks for the help; I'll have a look into it. I was wondering whether the Japanese language uses certain characters to mark a block of characters that must be read together. The web is an interesting weave indeed.

Simon Lee Shugar (Software Developer) www.simonshugar.co.uk "If something goes by a false name, would it mean that thing is fake? False by nature?" By Gilbert Durandil

Simon Lee Shugar

Gary Wheeler wrote:

I can tell you from experience in developing localized applications that the general rule is to never construct text from stock phrases.

I think this might be the best advice yet. It's not just translating text from English to Japanese, or translating from English to Japanese in the context of the application, but in fact translating English to Japanese in the context of the application and in the context of the design... Interesting. (Substitute the languages at will.)

Anthony Appleyard

Google Translate's Japanese-to-English translator makes the same mistake as yours. It seems to have a bug: it treats end-of-line as end-of-sentence. Its French-to-English version did the same with a French sentence: Le chat mangeait la souris. AS The cat ate it mouse. Le chat mangeait la souris. AS The cat ate the mouse. This may happen only with very short lines.

User 10331519

Those sentences are identical; Japanese is not grammatically affected by line wrapping. As the other commenter suggests, the problem lies with Google Translate (which, by the way, generally produces gibberish from Japanese).

Mario Prawirosudiro

It does, but this generally only happens in katakana/hiragana, not kanji. For example, a small 'tsu' indicates a long consonant. And, the presence of 'u' after anything that ends with 'o' (like 'to' or 'ko') indicates a long 'o'. The same goes for an 'i' after anything that ends with 'e'. I can't put down the exact letters here, but I'm sure you'll be able to find many sources for katakana/hiragana online. That said, I've never seen any split-up words in any Japanese texts I've seen (admittedly, they're mostly games and manga ;) ), and there are cases where two or more words are joined together to provide context. So if you're looking for a way to split words up, my suggestion is don't.

jibalt

Whoosh!

Kirk 10389821

I help maintain a site that supports 11 languages, including Japanese. We use a phrase-based system, with outsourced translators and in-house spot checkers familiar with the different languages. We build new interfaces in English (because that is our primary language). We try to design for the fact that English is usually the shortest way to say most things (in characters, due to the volume of 2-, 3- and 4-letter words) by putting labels ABOVE fields, not in front of them. Then we send out all of the phrases to be translated. We then review the pages in every language to make sure nothing drastic happened to the flow. We also use the same group(s) of translators each time, so they know our background/context. They also require that ALL translations are reviewed by a second person and edited before they get back to us. HTH

Simon Lee Shugar

Thanks Kirk. This is most likely the option I am going to suggest. Gary, in an earlier post, said something similar, and both replies seem to be the most sensible solution. We have a similar process to yours; I think it is just the design we need to be more wary of. Thanks again!

Kirk 10389821

No worries. Real experience counts for a lot. Also, we use SUBSTITUTION-based phrases; it turns out this makes many things easier to translate in larger chunks. Example: "All E-Mail replies will be sent to {EMAIL_REPLY_TO} and will be sent from {EMAIL_FROM} so be sure to whitelist this account." By doing this, if the language REQUIRES a completely different ordering, it will come back properly, for example as "XXX {EMAIL_FROM} YYY {EMAIL_REPLY_TO} ZZZ" in one language and "AAA {EMAIL_REPLY_TO} BBB {EMAIL_FROM} CCC" in another. Trust me, a lot of output has embedded data from the web site. Think about a simple email to change your password using the attached link. Without such control, where do you put the link? How would that affect the wording? How about a link in case they DID NOT choose to reset their password?
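
For anyone who wants to try the substitution idea, here is a minimal sketch in Python. The `TEMPLATES` catalog, the `"xx"` language code and the `render` helper are made up for illustration; only the placeholder names come from the post above. The point is just that each translation owns the whole sentence and decides where the placeholders go.

```python
# Hypothetical phrase catalog: one full sentence per language, with named
# placeholders. "xx" stands in for a language that needs a different order.
TEMPLATES = {
    "en": ("All E-Mail replies will be sent to {EMAIL_REPLY_TO} "
           "and will be sent from {EMAIL_FROM}."),
    "xx": "XXX {EMAIL_FROM} YYY {EMAIL_REPLY_TO} ZZZ",
}


def render(lang, **values):
    """Fill the named placeholders of the phrase for the given language."""
    return TEMPLATES[lang].format(**values)


if __name__ == "__main__":
    data = {"EMAIL_FROM": "noreply@example.com",
            "EMAIL_REPLY_TO": "support@example.com"}
    print(render("en", **data))
    print(render("xx", **data))
```

Because the placeholders are named rather than positional, the calling code stays the same no matter how a translator reorders them.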

DanKorn

If you think Japanese is fun, try some bidirectional text, that is, mixing left-to-right with right-to-left text (such as English and Arabic). Good luck trying to make sense of how lines should wrap!

RASPeter

Blocks of kanji that should be read together as a single word are quite common, actually. It's similar to how we string together Greek and Latin roots to make new words. For example: Locomotive = 機関車

User 11817871

Japanese newspapers still flow text right to left and vertically, top to bottom! What is even more impressive is that Japanese OCR software, such as Fuji Xerox's DocuWorks Desk, understands this!

Colorado_Bill

As many others have mentioned, Google Translate is awful when it comes to Japanese. I have been studying Japanese for quite some time, and even I am sometimes misled by the weird formatting of various web pages. Since no one else has mentioned it, I will try to help a bit here with the language side of it. Typically you don't see Japanese words broken up between lines, since it makes them harder to read (especially with kanji that have multiple readings). What this means is that "ideally" you would break up sentences on "particle" and word boundaries. Particles are "markers" that help identify the subject, object, etc. Parsing for these is way beyond this type of discussion and requires some working knowledge of the language. Also, to make it harder yet, spaces are neither necessary nor required in Japanese writing (Western-style punctuation has crept in, though). As a rough algorithm, though (if you can neither read nor parse for real particles/words), you could assume any kanji followed by hiragana is a word (until you hit a period, another kanji, or katakana). Also, runs of katakana are single words too (typically foreign words, like "code" in your case). In your case, 好き (すき) is a single word (to like/love, depending on context) and shouldn't be broken up with a line break (IMHO). Following this algorithm would break up the sentence like this: 私は / コード / が / 好き, which gives four "words". It turns out that in this case you have two particles, は and が (for those keeping score), but keeping them attached to their prior "word" isn't typically too confusing to read. This was probably too long-winded for this question, but I hope it helped some.

Bill
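
The rough algorithm in the post above can be sketched in a few lines of Python. This is only an illustration of that heuristic, not a real Japanese word segmenter: the `rough_chunks` name and the exact Unicode ranges are assumptions made here (basic CJK ideographs plus the 々 repeat mark for kanji, the standard Hiragana and Katakana blocks for kana).

```python
import re

# Assumed character classes (not from the original post):
KANJI = r"\u4e00-\u9fff\u3005"      # basic CJK ideographs and the repeat mark
HIRAGANA = r"\u3040-\u309f"
KATAKANA = r"\u30a0-\u30ff"         # includes the long-vowel mark U+30FC

CHUNK = re.compile(
    rf"[{KANJI}]+[{HIRAGANA}]*"     # kanji run plus any trailing hiragana
    rf"|[{KATAKANA}]+"              # a katakana run is treated as one word
    rf"|[{HIRAGANA}]+"              # leftover hiragana (e.g. a lone particle)
    rf"|.",                         # anything else, one character at a time
    re.DOTALL,
)


def rough_chunks(text):
    """Split text into chunks that a line break should not fall inside."""
    return CHUNK.findall(text)


if __name__ == "__main__":
    print(rough_chunks("私はコードが好き"))
```

For the example sentence this yields the same four chunks as in the post (私は, コード, が, 好き). A line wrapper could then break only between chunks. Real text would need a proper morphological analyzer, since the heuristic glues particles onto the preceding word and knows nothing about words written only in hiragana.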

Mario Prawirosudiro

Indeed they are. I really meant to say syllables. In most cases, kanji letters, when strung together, act as a single syllable. For example, the kanji for 'person' could be read 'hito' when standalone, or 'jin' when it's a part of a word. However, in most cases, they're standalone syllables, unlike (for example) 'ho' + 'u', which is read 'hoo' (long 'o'). At least that is what I know. I don't have much formal training when it comes to Japanese. I mostly learn it out of self interest.

RASPeter

I think you actually mean katakana, not kanji. Japanese uses two phonetic character sets: katakana and hiragana. Historically, hiragana was used by women and katakana was used by men, and you can still see that history in the characters themselves. Hiragana tend to be more round and have loops, while katakana tend to be more angular. In modern usage, hiragana is used for native Japanese words and katakana is used for foreign words. Kanji are Chinese characters, and each is a word in itself, representing a distinct concept. They are not used as syllables, because that's what katakana and hiragana were created for. When kanji are strung together it is to merge the concepts together to describe a new thing that can't be described adequately by any existing single character (again, just like we do with Greek and Latin roots). All the characters in both katakana and hiragana were derived from kanji that have the same pronunciation, and most (possibly all?) retain the meaning of the original kanji, even though that's not typically how they're used. It's pretty common to see kanji and hiragana together, and there are two general cases. First, small hiragana are sometimes placed above or beside a kanji character as a pronunciation guide, called furigana, because children are generally taught hiragana first and gradually introduced to kanji as they get older. Second, hiragana are often added to the end of a kanji word (one or more characters) to indicate verb conjugation, because Japanese has verb tenses and Mandarin (which kanji were actually created for) does not. Anyway, I hope I didn't go too overboard there. I actually do have some formal training in both Chinese and Japanese. Not enough to claim even moderate fluency, sadly, but enough to understand how the writing systems work.

Mario Prawirosudiro

Yes, sorry. I meant one kanji letter is usually one syllable, though in some cases it could be more than one (like the kanji for mountain in 'Fujiyama'). Like I said in my original post, katakana/hiragana in many cases require more than one letter to produce a syllable. When I said 'syllable', it's from the perspective of someone who's used to the Latin alphabet, thus 'yama' is two syllables, though it's written with one kanji. And 'he + i' (which is read 'hee' with a long 'e') is a single syllable, though it's written with two hiragana. So what I'm trying to say is, while you could split words consisting only of kanji across two lines with ease, that might not be the case with words containing hiragana/katakana, for that reason, as it might confuse the reader, or at least make it harder for them to read. Now, I don't know what kind of provision the Japanese language has for dealing with word splitting at line breaks, but personally, I've never seen any split words in the texts (*cough*manga*cough*) I've read. I don't know about websites though.

RASPeter

If katakana/hiragana are present in typed Japanese there will also be spaces. There might not be if it's pure kanji, but anyone who can read that is already having to figure out a lot from context (verb tense, etc) so putting a line break in the wrong spot is not likely to increase their cognitive load significantly.