Hebrew chars returned by Directory.GetFiles problem

    impeham wrote:

    I'm having the following issue: I am using Directory.GetFiles to retrieve all filenames from a path. Some of the filenames in that path contain Hebrew characters. When I inspect such a filename string in the debugger, the Hebrew characters do not look like REAL chars (their values are numbers above 1000; how can that be for a char?). This causes a problem when I write the string to a file: the characters look weird when I open the file later in a text editor, probably because each Hebrew character is written as 2 bytes instead of the single byte that represents it. How can that be solved? Thanks.

      Luc Pattyn wrote:

      Hi, you can specify which character encoding should be used when writing text to a file. If the encoding in effect cannot represent a character (ASCII, for instance, covers only 7-bit characters), that character gets mapped onto something else (e.g. accents may be dropped, or a '?' is written instead); of course for very different scripts, such mapping makes no sense. You really want a file encoding that can hold all of your characters. One way of doing this is by using a StreamWriter; one of its constructor overloads takes an Encoding object. You should consider Encoding.Unicode (UTF-16) or Encoding.UTF8.

      BTW: your Hebrew characters are real characters. If Visual Studio shows them as numbers, that's to make sure you can read them (if you're unfamiliar with the script), and you can paste them like that in an ASCII file. Normally your source files are plain single-byte text files; as soon as you paste a non-ASCII character in a string literal or so, the file has to be saved as a UTF-8 or Unicode file, and may no longer be readable by some other apps. :)
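
As a rough illustration of the StreamWriter approach described above, here is a minimal sketch. The class name FileNameWriter, the helper WriteNames, and the output file name filenames.txt are made up for this example; only Directory.GetFiles, StreamWriter, and Encoding come from the discussion.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class FileNameWriter
{
    // Hypothetical helper: writes one name per line using the given encoding.
    public static void WriteNames(IEnumerable<string> names, string outputPath, Encoding encoding)
    {
        // append: false overwrites any existing file at outputPath.
        using (var writer = new StreamWriter(outputPath, false, encoding))
        {
            foreach (string name in names)
            {
                writer.WriteLine(name);
            }
        }
    }

    static void Main()
    {
        // Assumption: list the current directory and write to the temp folder.
        string[] files = Directory.GetFiles(Directory.GetCurrentDirectory());
        string outputPath = Path.Combine(Path.GetTempPath(), "filenames.txt");

        // Encoding.UTF8 can represent Hebrew; Encoding.Unicode (UTF-16) would also work.
        WriteNames(files, outputPath, Encoding.UTF8);
        Console.WriteLine("Wrote " + files.Length + " names to " + outputPath);
    }
}
```

Whichever encoding is chosen for writing, the text editor (or the code reading the file back) has to use the same one, otherwise the Hebrew characters will still look wrong.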


        impeham wrote:

        Well - using UTF8 with the StreamWriter did the job! Man - thanks a lot! :)

          Mike Dimmick wrote:

          impeham wrote:

          they are numbers above 1000 - how can that be for char?

          You're clearly an ex-C++ programmer. char in C# is not a byte-size quantity as it is in C++; it represents a single UTF-16 code unit (i.e. it's a 16-bit unsigned value, the same size as ushort). All strings in the .NET Framework are Unicode internally, using UTF-16. Hebrew characters fall in a block between U+0590 and U+05FF, with alef encoded at U+05D0 = 1488.

          The default encoding for .NET StreamWriter objects is UTF-8. Alef does indeed turn into two bytes in the output, 0xD7 0x90. If you want to use a different encoding, for example Windows codepage 1255 for Hebrew, you need to create a suitable Encoding object and pass it to the StreamWriter's constructor.
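
The numbers above can be checked with a short sketch like the following. The class name HebrewCharDemo and the helper ToHex are invented for this example. It also assumes a runtime where CodePagesEncodingProvider is available (.NET Core 3.0 or later, or .NET Framework 4.6 or later); on older .NET Framework versions Encoding.GetEncoding(1255) should work without the RegisterProvider call.

```csharp
using System;
using System.Text;

class HebrewCharDemo
{
    // Hypothetical helper: formats bytes as hex pairs separated by dashes.
    public static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes);
    }

    static void Main()
    {
        char alef = '\u05D0';

        Console.WriteLine((int)alef);      // 1488
        Console.WriteLine(sizeof(char));   // 2 (a char is one 16-bit UTF-16 code unit)

        // UTF-8 needs two bytes for alef.
        Console.WriteLine(ToHex(Encoding.UTF8.GetBytes("\u05D0")));     // D7-90

        // Encoding.Unicode is UTF-16 little-endian: also two bytes, low byte first.
        Console.WriteLine(ToHex(Encoding.Unicode.GetBytes("\u05D0")));  // D0-05

        // Assumption: code page encodings must be registered on modern .NET.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding hebrew = Encoding.GetEncoding(1255);

        // Windows-1255 stores alef in a single byte.
        Console.WriteLine(ToHex(hebrew.GetBytes("\u05D0")));            // E0
    }
}
```

The single-byte result for codepage 1255 is what the original question expected, but a file written that way only displays correctly in programs that assume the same code page.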

