A plea to Japanese (or Asian) Language Web Developers

Simon Lee Shugar

I've added this question to the Web Development forum, but it was suggested that I add a plea in the Lounge as well. I'd appreciate the help; the link and question are below.

Link: http://www.codeproject.com/Messages/5102431/How-do-you-deal-with-Japanese-Asian-languages-in-r.aspx

"I have recently come up against the problem of dealing with the Japanese language in responsive web applications. Changing the formation of a sentence can change its entire context or meaning. How do we deal with this if a sentence is too long for a field and spills onto the next line? Is there anywhere I can read up on how this is handled? Any information would be appreciated!

Example:
私は、コードが好き = I , like the code
私は、コードが好き = I , code is good

Can"

Simon Lee Shugar (Software Developer) www.simonshugar.co.uk "If something goes by a false name, would it mean that thing is fake? False by nature?" By Gilbert Durandil

Jeremy Falcon

Without a specific question I cannot give a specific answer, and keep in mind I've only studied Japanese for a few months, but your issue has nothing to do with a different language... a different font would give you the same issue. Perhaps even a different text size (for the visually impaired) would give you this problem as well. There's no one exact answer, except to remember one core, fundamental difference in design philosophy between print and the web. The web is live and fluid. Print is not. In other words, you need to design your page in such a way that it works in more than one scenario. If, for whatever reason, you switch languages and it causes a line wrap in one language where in English there was none, then your UI layout needs to handle it. In the case where this is not acceptable, consider having one layout for one language and another for a different language. http://www.nomensa.com/blog/2010/7-tips-and-techniques-for-multi-lingual-website-accessibility Points 6 and 7 in that link talk about this a bit more.


Jeremy Falcon

Also, here is a Bootstrap website supporting more than one language to help get your motor running: http://en.houbovypark.cz/


Gary Wheeler

The book is a bit old and may be out of print, but you might take a look at "CJKV Information Processing" by Ken Lunde. It was published by O'Reilly in 1999. The ISBN is 1-56592-224-7. Chapter 7, "Typography", includes a discussion of text wrapping issues. I can tell you from experience in developing localized applications that the general rule is to never construct text from stock phrases. Every piece of text in the application should be grammatically complete. The only time you should substitute one piece of text into another for display to the user is for parameter values (numbers, filenames, etc.). Absolutely nothing will impress your user base less than a clumsy UI that makes grammatical mistakes like a three-year-old on Valium.

Software Zen: delete this;

Kornfeld Eliyahu Peter

The line-breaking rules of Japanese are very permissive: you can have a line break (almost) anywhere. Unfortunately, even those rules are only implemented in Firefox... I checked some Japanese sites (not now, but in the past) and found no specific handling of line breaking, so it seems to me that a fluent Japanese reader is able to figure out what the text is about regardless of where the lines break... If, however, you insist on not breaking (as there is no 'right' or 'exact' break), you may use CSS, but that of course may (and probably will) change the exact layout of your site... Try to come up with a fluid CSS design where the exact width of a text block will change your overall page design a bit but not break it...
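To make the "break (almost) anywhere" idea concrete, here is a minimal Python sketch, not what any browser actually does. It assumes the only restriction is a short, incomplete list of characters that conventionally should not start a line; the list and the names `NO_LINE_START`, `break_points` and `wrap_anywhere` are made up for illustration.

```python
# Assumption: a small, incomplete sample of characters that conventionally
# should not begin a line (closing punctuation, small kana, the long-vowel mark).
NO_LINE_START = set("、。，．）」』ー々っゃゅょッャュョ")


def break_points(text):
    """Indexes i where a break may fall between text[i-1] and text[i]."""
    return [i for i in range(1, len(text)) if text[i] not in NO_LINE_START]


def wrap_anywhere(text, width):
    """Greedy wrap: cut at the last allowed break point within `width` characters."""
    allowed = set(break_points(text))
    lines, start = [], 0
    while len(text) - start > width:
        cut = start + width
        # Step back until the next line would start with an allowed character.
        # If nothing is allowed, cut after one character so we always progress.
        while cut > start + 1 and cut not in allowed:
            cut -= 1
        lines.append(text[start:cut])
        start = cut
    lines.append(text[start:])
    return lines


for line in wrap_anywhere("私は、コードが好き", 4):
    print(line)
```

With a width of 4, the comma stays on the first line and the long-vowel mark in コード is never pushed to the start of a line, but the words themselves are not otherwise protected.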

Skipper: We'll fix it. Alex: Fix it? How you gonna fix this? Skipper: Grit, spit and a whole lotta duct tape.

dan sh

So, this is why they are slow readers. They need to first figure out what the sentence actually means.

Simon Lee Shugar

Thanks for the help, I'll have a look into it. I was wondering if the Japanese language used certain characters to symbolise a block of characters that must be read together. The web is an interesting weave indeed.


Simon Lee Shugar

Gary Wheeler wrote:

I can tell you from experience in developing localized applications that the general rule is to never construct text from stock phrases.

I think this might be the best advice yet. It's not just translating text from English to Japanese, or translating from English to Japanese in the context of the application, but in fact translating English to Japanese in the context of the application and in the context of the design... Interesting. (Substitute the languages at will.)


Anthony Appleyard

Google Translate's Japanese-to-English translator makes the same mistake as yours. It seems to have a bug: it treats end-of-line as end-of-sentence. Its French-to-English version did the same with a French sentence:

Le chat mangeait la souris. = The cat ate it mouse.
Le chat mangeait la souris. = The cat ate the mouse.

This may happen only with very short lines.

User 10331519

Those sentences are identical; Japanese is not grammatically affected by line wrapping. As the other commenter suggests, the problem lies with Google Translate (which, by the way, generally produces gibberish from Japanese).

Mario Prawirosudiro

It does, but this generally only happens in katakana/hiragana, not kanji. For example, a small 'tsu' indicates a long consonant. And, the presence of 'u' after anything that ends with 'o' (like 'to' or 'ko') indicates a long 'o'. The same goes for an 'i' after anything that ends with 'e'. I can't put down the exact letters here, but I'm sure you'll be able to find many sources for katakana/hiragana online. That said, I've never seen any split-up words in any Japanese texts I've seen (admittedly, they're mostly games and manga ;) ), and there are cases where two or more words are joined together to provide context. So if you're looking for a way to split words up, my suggestion is don't.

jibalt

Whoosh!

Kirk 10389821

I help maintain a site that supports 11 languages, including Japanese. We use a phrase-based system, with outsourced translators and in-house spot checkers familiar with different languages. We build new interfaces in English (because that is our primary language). We try to design for the fact that English is usually the shortest way to say most things (in characters, due to the volume of 2-, 3- and 4-letter words) by putting labels ABOVE fields, not in front of them. Then we send out all of the phrases to be translated. We then review the pages in every language to make sure nothing drastic happened to the flow. We also use the same group(s) to translate, so they know our background/context. They also require that ALL translations are reviewed by a second person and edited before they get back to us. HTH

Simon Lee Shugar

Thanks Kirk. This is most likely the option I am going to suggest. Gary said something similar in an earlier post, and both replies seem to be the most sensible solution. We have a similar process to yours; I think it is just the design we need to be more wary of. Thanks again!


Kirk 10389821

No worries. Real experience counts for a lot. Also, we use SUBSTITUTION-based phrases; it turns out that translating in larger chunks makes many things easier. Example: "All E-Mail replies will be sent to {EMAIL_REPLY_TO} and will be sent from {EMAIL_FROM} so be sure to white list this account." By doing this, if the language REQUIRES a completely different ordering, it will come back properly: "XXX {EMAIL_FROM} YYY {EMAIL_REPLY_TO} ZZZ" or "AAA {EMAIL_REPLY_TO} BBB {EMAIL_FROM} CCC". Trust me. A lot of output has embedded data from the web site. Think about a simple email to change your password using the attached link. Without such control, where do you put the link? How would that affect the wording? How about a link in case they DID NOT choose to reset their password?
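A minimal Python sketch of that kind of substitution-based phrase table, under some assumptions: the `PHRASES` dictionary, the `render` helper and the "xx" language code are hypothetical, and the "xx" entry is just the placeholder filler from the example above, not a real translation.

```python
# Hypothetical phrase table: one complete sentence per language, with named
# placeholders. The "xx" entry is made-up filler that only shows reordering.
PHRASES = {
    "en": (
        "All E-Mail replies will be sent to {EMAIL_REPLY_TO} "
        "and will be sent from {EMAIL_FROM}."
    ),
    "xx": "XXX {EMAIL_FROM} YYY {EMAIL_REPLY_TO} ZZZ",
}


def render(lang, **values):
    """Fill the named placeholders of the complete phrase stored for `lang`."""
    return PHRASES[lang].format(**values)


values = {
    "EMAIL_REPLY_TO": "support@example.com",
    "EMAIL_FROM": "noreply@example.com",
}
print(render("en", **values))
print(render("xx", **values))
```

Because the placeholders are named rather than positional, the translator is free to move them around inside the sentence and the calling code does not change.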

DanKorn

If you think Japanese is fun, try some bidirectional text, that is, mixing left-to-right with right-to-left text (such as English and Arabic). Good luck trying to make sense of how lines should wrap!

RASPeter

Blocks of kanji that should be read together as a single word are quite common, actually. It's similar to how we string together Greek and Latin roots to make new words. For example: Locomotive = 機関車

User 11817871

Japanese newspapers still flow text right to left and vertically, top to bottom! What is even more impressive is that Japanese OCR software, such as Fuji Xerox's DocuWorks Desk, understands this!

Colorado_Bill

As many others have mentioned, Google Translate is awful when it comes to Japanese. I have been studying Japanese for quite some time, and even I am sometimes misled by the weird formatting of various web pages. Since no one else has mentioned it, I will try to help a bit with the language side of it -- typically you don't see Japanese words broken up between lines, since that makes them harder to read (especially with kanji that have multiple readings). What this means is that "ideally" you would break up sentences on "particle" and word boundaries. Particles are "markers" that help identify the subject, object, etc. Parsing for these is way beyond this type of discussion and requires some working knowledge of the language. Also, to make it harder yet, spaces are neither necessary nor required in Japanese writing (Western-style punctuation has crept in, though). As a rough algorithm, though (if you can neither read nor parse for real particles/words), you could assume any kanji followed by hiragana is a word (until you hit a period, another kanji, or katakana). Also, runs of katakana are single words too (typically foreign words, like "code" in your case). In your case, 好き (すき) is a single word (to like/love, depending on context) and shouldn't be broken up with a line break (IMHO). Following this algorithm would break up the sentence like: 私は, ____ コード ____ が ___ 好き which gives four "words" -- it turns out that in this case you have two particles, は and が (for those keeping score), but keeping them attached to their prior "word" isn't typically too confusing to read. This was probably too long-winded for this question, but I hope it helped some.
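The rough algorithm above can be sketched in Python along these lines. This is only an approximation under stated assumptions: the Unicode ranges are the basic kanji, hiragana and katakana blocks, trailing punctuation is glued to the chunk before it, and the names `rough_chunks` and `wrap_chunks` are made up for this example. It is no substitute for a real morphological parser.

```python
import re

# Assumption: basic Unicode blocks only (no rare kanji, no half-width kana).
KANJI = r"\u4e00-\u9fff"
HIRAGANA = r"\u3040-\u309f"
KATAKANA = r"\u30a0-\u30ff"

OTHER = re.compile(rf"[^{KANJI}{HIRAGANA}{KATAKANA}]+")
CHUNK = re.compile(
    rf"[{KANJI}]+[{HIRAGANA}]*"  # kanji run plus the hiragana that follows it
    rf"|[{KATAKANA}]+"           # a katakana run counts as one word
    rf"|[{HIRAGANA}]+"           # leftover hiragana (often a particle)
    rf"|{OTHER.pattern}"         # punctuation, spaces, Latin text, ...
)


def rough_chunks(text):
    """Split text into chunks that should not be broken across lines."""
    chunks = []
    for match in CHUNK.finditer(text):
        piece = match.group(0)
        if chunks and OTHER.fullmatch(piece):
            chunks[-1] += piece  # keep punctuation with the preceding chunk
        else:
            chunks.append(piece)
    return chunks


def wrap_chunks(text, width):
    """Greedily pack whole chunks into lines of at most `width` characters."""
    lines, line = [], ""
    for chunk in rough_chunks(text):
        if line and len(line) + len(chunk) > width:
            lines.append(line)
            line = ""
        line += chunk
    if line:
        lines.append(line)
    return lines


print(rough_chunks("私は、コードが好き"))
print(wrap_chunks("私は、コードが好き", 6))
```

For the sentence in the question this gives the same four chunks as the hand-worked split, and a line break can then only fall between chunks. A chunk longer than `width` is left on a line of its own rather than split.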

Bill

Mario Prawirosudiro

Indeed they are. I really meant to say syllables. In most cases, kanji letters, when strung together, act as a single syllable. For example, the kanji for 'person' could be read 'hito' when standalone, or 'jin' when it's a part of a word. However, in most cases, they're standalone syllables, unlike (for example) 'ho' + 'u', which is read 'hoo' (long 'o'). At least that is what I know. I don't have much formal training when it comes to Japanese. I mostly learn it out of self interest.