Code Project

Unicode Pain

    nemo wrote:

    I am trying to scrape some language translations from Google's translation service. All works well except for Japanese/Korean/Chinese. I am using the WebClient in a C# code-behind. I input the word "cheese" to be translated to Japanese, and the byte sequence returned is 131,96,129,91,131,89 (go to http://www.google.com/translate_t and give it a try). It returns 3 glyphs.

    Sounds good so far: Unicode at 2 bytes per char means 6 bytes for 3 glyphs. The return is UTF8, so I transform the 6 bytes using the UnicodeEncoding to get a 3 char (glyph?) string. But when displayed, it's not the same glyphs shown by Google. After searching high and low, I finally determined that the last glyph displayed by Google has the Unicode hex value 0x30BA, which isn't the value I get when I decode 131,89 (decimal values) as Unicode.

    Basically I need to get the scraped values into the form &#NNNNN; (one per glyph) so I can move them about the system as a standard string. If I open the Google return in Notepad using Arial Unicode MS and save it with Unicode encoding, the resulting values in the saved file match up with what is being displayed by Google. Namely, the value of the final glyph is ズ.

    What am I missing? Read UTF8, convert to Unicode encoding, get the bytes per glyph and create strings like "&#NNNNN;". Shouldn't that work? TIA
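One possible reading of the numbers above, offered as a sketch and not as a confirmed answer from the thread: the six bytes decode cleanly if the response is assumed to be Shift_JIS (a legacy Japanese encoding) and not UTF-8 or UTF-16. The example uses Python instead of C# so it can be run standalone. The helper name `to_numeric_refs` is hypothetical, and the choice of Shift_JIS is an assumption based only on the byte values quoted in the post. In .NET the analogous step would presumably be `Encoding.GetEncoding("shift_jis").GetString(bytes)`.

```python
# Bytes quoted in the post for the Japanese translation of "cheese".
raw = bytes([131, 96, 129, 91, 131, 89])

# Assumption (not stated in the thread): the payload is Shift_JIS, two bytes per glyph.
text = raw.decode("shift_jis")

# What the post describes: reading the same bytes as little-endian UTF-16,
# which is what .NET's UnicodeEncoding does by default. This yields three
# unrelated characters instead of the expected katakana.
misread = raw.decode("utf-16-le")

# The bytes are not valid UTF-8 at all: 0x83 cannot start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False


def to_numeric_refs(s: str) -> str:
    """Hypothetical helper: emit one decimal numeric character reference per code point."""
    return "".join(f"&#{ord(ch)};" for ch in s)


print(text)                            # チーズ
print(hex(ord(text[-1])))              # 0x30ba, the value found for the last glyph
print(to_numeric_refs(text))           # &#12481;&#12540;&#12474;
print(valid_utf8, misread == text)     # False False
```

Under this assumption the last glyph comes out as U+30BA, which matches the value the post reports for what Google displays. The same approach would need a different codec name for the Korean and Chinese responses, which the post does not give bytes for.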
