Unicode Pain
-
I am trying to scrape some language translations from Google's translation service. Everything works except for Japanese, Korean and Chinese. I am using the WebClient in a C# code-behind. I input the word "cheese" to be translated to Japanese, and the byte sequence returned is 131,96,129,91,131,89 (go to http://www.google.com/translate_t and give it a try). It returns 3 glyphs.

Sounds good so far: Unicode, 2 bytes per char = 6 bytes to get 3 glyphs. The return is UTF8, so I transform the 6 bytes using the UnicodeEncoding to get a 3-char (glyph?) string. But when displayed, it's not the same glyphs shown by Google. After searching high and low, I finally determined that the last glyph displayed by Google has the Unicode hex value 0x30BA, which isn't the value I get when I Unicode-decode 131,89 (decimal values).

Basically I need to get the scraped values into the form &#NNNNN; (one per glyph) so I can move them about the system as a standard string. If I open the Google return in Notepad using the Arial Unicode MS font and save it with Unicode encoding, the resulting values in the saved file match what is being displayed by Google. Namely, the value of the final glyph is ズ.

What am I missing? Read UTF8, convert to Unicode encoding, get the bytes per glyph and create strings like "&#NNNNN;". Shouldn't that work? TIA
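
One way to narrow this down is to decode the six bytes quoted above under a few candidate encodings and see which one produces a final code point of 0x30BA. The sketch below is in Python rather than C#, only to keep it short. The list of candidate encodings is a guess and not something read from the actual response headers, and `try_decode` and `to_ncr` are hypothetical helper names.

```python
# Bytes quoted in the post for the Japanese translation of "cheese".
raw = bytes([131, 96, 129, 91, 131, 89])


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


def to_ncr(text):
    """Render each character as a decimal numeric character reference (&#NNNNN;)."""
    return "".join(f"&#{ord(ch)};" for ch in text)


# Candidate encodings are an assumption, not taken from the real HTTP response.
for name in ("utf-8", "utf-16-le", "shift_jis"):
    text = try_decode(raw, name)
    if text is None:
        print(f"{name}: not valid")
    else:
        code_points = [hex(ord(ch)) for ch in text]
        print(f"{name}: {code_points} -> {to_ncr(text)}")
```

With these particular bytes, the UTF-8 attempt is rejected outright, the UTF-16 attempt yields three unrelated characters, and only the Shift_JIS attempt ends in 0x30ba. That would suggest checking the charset the server actually declares before picking a decoder. In C#, the matching decoder would presumably be obtained through `System.Text.Encoding.GetEncoding`.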