Lost in Unicode
-
We have some code (C#, ASP.NET code-behind) that we use to translate small amounts of text between languages. It's pretty simple: we post the text to translate to Google and then parse the returned HTML stream to get the translated text.

All was well until we needed to add Chinese to the list of translated languages. I assumed the response from Google would contain the translated Chinese as Unicode, so we parsed out the bytes that represent the returned Chinese and used this code:

    // note: UniResponse contains the byte entries for the returned Chinese only
    UnicodeEncoding Unicode = new UnicodeEncoding();
    int charCount = Unicode.GetCharCount(UniResponse, 0, UniResponse.Length);
    char[] chars = new Char[charCount];
    Unicode.GetChars(UniResponse, 0, UniResponse.Length, chars, 0);
    string s = new string(chars);

This should turn the byte stream back into a string that I can add to the dynamically created HTML for the new page. But when our page is displayed, it does show some Chinese characters, and they do not match what Google shows. We verified that we do indeed parse out the correct bytes from the returned HTML.

All the samples I could find on CG only translate within the English character set; they all seem to stop short of non-English character sets. When using Unicode, I don't have to worry about code pages or anything, do I?

If anyone knows of an example or has some words of wisdom concerning the translation process, please let me know.

TIA :-D
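One thing worth checking is whether the bytes are really UTF-16, which is what UnicodeEncoding expects. The sketch below is only an illustration under an assumption the post does not confirm: that the response bytes are in some other charset such as UTF-8, as named in the response's Content-Type header. The Decode helper and its charset parameter are hypothetical names, not part of the original code. It shows that UTF-8 bytes decoded as UTF-16 still produce characters, just the wrong ones, while decoding with the matching encoding restores the text.

```csharp
using System;
using System.Text;

public static class DecodeDemo
{
    // Hypothetical helper: decodes raw response bytes using the charset
    // name taken from the HTTP Content-Type header. Falls back to UTF-8
    // when the name is missing or not recognized (an assumption, not a rule).
    public static string Decode(byte[] body, string charset)
    {
        Encoding enc;
        try
        {
            enc = Encoding.GetEncoding(charset);
        }
        catch (ArgumentException)
        {
            // Also covers ArgumentNullException when charset is null.
            enc = Encoding.UTF8;
        }
        return enc.GetString(body);
    }

    public static void Main()
    {
        // Two Chinese characters, written as escapes so the source file's
        // own encoding cannot affect the demonstration.
        string original = "\u4E2D\u6587";

        // Simulate a server that sends the text as UTF-8 bytes.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);

        // Decoding those bytes as UTF-16 pairs them up incorrectly.
        string wrong = new UnicodeEncoding().GetString(utf8Bytes);

        // Decoding with the charset the bytes were written in works.
        string right = Decode(utf8Bytes, "utf-8");

        Console.WriteLine(wrong == original); // False
        Console.WriteLine(right == original); // True
    }
}
```

If the real response turns out to use a different charset, the same idea applies: pass that charset name instead of "utf-8". Dumping the first few bytes of UniResponse in hex is a quick way to see which encoding they actually look like.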