How to extract non-ASCII characters from a text

    ananttrivedi wrote (#1):

    Hi, I have text in an unknown encoding and I need to extract all non-ASCII characters from it (for example Japanese and Chinese characters), replace them with ASCII tags, do some processing, and then put the original non-ASCII characters back.

    Basically, I have a VRML scene graph that I need to render with the Open Inventor renderer, which has problems with non-ASCII characters. So this is what I am doing:

    1. Get the scene data in memory as binary data: void *sceneData= ; scenedata(sizeof(filesize));
    2. Read each BYTE from the memory buffer.
    3. Check the value of each BYTE: if it is outside the ASCII range, replace it with a tag; otherwise copy it as is.

    Is this check a correct way to do it? I mean, do both bytes of a multibyte character need to be outside the ASCII range? Is there a better alternative than this hack? Please suggest something, I am totally stuck. :-(
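The byte-scanning steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's code: `maskNonAscii`, `unmaskNonAscii`, and the `@@N@@` tag format are hypothetical names, and the sketch assumes that the tag text never occurs in the original data and that the processing step leaves the tags intact. It replaces each run of bytes with the high bit set by one numbered tag and keeps the original bytes in a side table.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: replaces every run of bytes >= 0x80 with "@@<index>@@"
// and stores the original bytes of that run in `saved`.
std::string maskNonAscii(const std::string& in, std::vector<std::string>& saved) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (static_cast<unsigned char>(in[i]) < 0x80) {
            out += in[i++];  // plain ASCII byte: copy unchanged
            continue;
        }
        const std::size_t start = i;
        while (i < in.size() && static_cast<unsigned char>(in[i]) >= 0x80) {
            ++i;  // extend the run of non-ASCII bytes
        }
        out += "@@" + std::to_string(saved.size()) + "@@";
        saved.push_back(in.substr(start, i - start));
    }
    return out;
}

// Hypothetical helper: puts the saved byte runs back in place of their tags.
// Assumes the tags survived processing and do not occur in the original text.
std::string unmaskNonAscii(const std::string& masked,
                           const std::vector<std::string>& saved) {
    std::string out = masked;
    for (std::size_t k = 0; k < saved.size(); ++k) {
        const std::string tag = "@@" + std::to_string(k) + "@@";
        const std::size_t pos = out.find(tag);
        if (pos != std::string::npos) {
            out.replace(pos, tag.size(), saved[k]);
        }
    }
    return out;
}
```

Whether every byte of a multibyte character is outside the ASCII range depends on the encoding. In UTF-8 and EUC-JP all bytes of a non-ASCII character have the high bit set, so the check catches the whole character. In Shift-JIS and Big5 the second byte can fall in the ASCII range (for example 0x5C, the backslash), so a per-byte test leaves that byte in the text, where it may still confuse a parser even though the round trip restores the original bytes.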

      vmaltsev wrote (#2):

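      Well, my suggestion would be to forget about bytes and treat each character as an unsigned short (WORD). This assumes the text is stored as 16-bit units, for example UTF-16. In that case a character is not ASCII if its value is greater than 127; use 255 as the threshold instead if you also want to let the Latin-1 characters through.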
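A minimal sketch of the 16-bit check suggested above, assuming the data really is UTF-16 (which is not known for the original scene file). `nonAsciiPositions` is a hypothetical name, and `std::u16string` stands in for a buffer of WORD values.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the index of every 16-bit unit above the ASCII range.
// Assumes `text` holds UTF-16 code units; this is an assumption about the input.
std::vector<std::size_t> nonAsciiPositions(const std::u16string& text) {
    std::vector<std::size_t> positions;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] > 0x7F) {  // use 0xFF here to let Latin-1 through as well
            positions.push_back(i);
        }
    }
    return positions;
}
```

Characters outside the Basic Multilingual Plane occupy two 16-bit units (a surrogate pair), both far above 127, so this test still flags them, once per unit.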
