StreamWriter.WriteLine converting hex A0 to hex EF BF BD

    Mike Bluett
    wrote on last edited by
    #1

    In 2006, I wrote a C# app (using .NET 2.0 in VS 2005 Express) designed to add some PHP code to several HTML files. The HTML files are encoded as ISO 8859-1. The application worked fine until around Nov 2007, when some unknown change in my Windows XP installation resulted in hex A0 (&nbsp; in HTML) being converted to hex EF BF BD. I have since found info stating that an illegal character will be recoded as EF BF BD. However, A0 is a LEGAL character in ISO 8859-1 encoding.

    In trying to resolve this problem I have tried all of the StreamWriter class Encoding options. I have also completely uninstalled VS 2005 and all .NET SDKs and runtimes, and then installed the .NET 3.5 SDKs (which include the latest 2.0 SDK) and runtimes and also C# VS 2008. None of these actions resolved the problem. Any ideas on how to prevent this conversion from occurring?

    The code sequence which makes use of StreamWriter.WriteLine is as follows:

        private void Process_Files()
        {
            const bool OVERWRITE = true;
            const bool APPEND = true;
            ArrayList file_array = new ArrayList();
            string newString;
            bool found = false;

            foreach (string fileName in fileList)
            {
                Console.WriteLine("Processing: " + fileName + ".htm");

                // Add PHP authentication info to top of file
                FileInfo srcFile = new FileInfo("PHP-Auth.txt");
                srcFile.CopyTo(HTM_FILE_PATH + "temp.php", OVERWRITE);

                StreamReader htmFile = new StreamReader(HTM_FILE_PATH + fileName + ".htm");
                StreamWriter tmpFile = new StreamWriter(HTM_FILE_PATH + "temp.php", APPEND);

                // Read the entire .htm file into memory.
                while (!htmFile.EndOfStream)
                {
                    file_array.Add(htmFile.ReadLine());
                }
                htmFile.Close();

                // Append a blank line to the end of the temp.php file.
                tmpFile.WriteLine();
                tmpFile.WriteLine();

                // Process each line of the .htm file.
                foreach (string line in file_array)
                {
                    // Compare each name in the fileList to each line of this particular
                    // .htm file.
                    foreach (string name in fileList)
                    {
                        // If this line of the file contains one of the filenames in "fileList"

      Guffa
      wrote on last edited by
      #2

      Have you tried to simply specify the encoding when you open the reader and writer?

      Encoding isoWesternEuropean = Encoding.GetEncoding(28591);
      StreamReader htmFile = new StreamReader(HTM_FILE_PATH + fileName + ".htm", isoWesternEuropean);
      StreamWriter tmpFile = new StreamWriter(HTM_FILE_PATH + "temp.php", APPEND, isoWesternEuropean);
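A minimal sketch of why naming the encoding matters, under the assumption that the original code's reader and writer both fell back to UTF-8 (the default when a `StreamReader` or `StreamWriter` is constructed without an `Encoding` argument). A lone byte A0 is not valid UTF-8, so it decodes to the replacement character U+FFFD, which is then written back out as EF BF BD. The class and method names (`EncodingRoundTrip`, `Transcode`, `Hex`) are made up for illustration and do not come from the thread.

```csharp
using System;
using System.Text;

public static class EncodingRoundTrip
{
    // Decode raw bytes with one encoding and re-encode with another,
    // mimicking StreamReader.ReadLine followed by StreamWriter.WriteLine.
    public static byte[] Transcode(byte[] input, Encoding readAs, Encoding writeAs)
    {
        string text = readAs.GetString(input);
        return writeAs.GetBytes(text);
    }

    // Format bytes as space-separated hex, e.g. "EF BF BD".
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    public static void Main()
    {
        byte[] nbsp = { 0xA0 };                         // non-breaking space in ISO 8859-1
        Encoding latin1 = Encoding.GetEncoding(28591);  // ISO 8859-1, as in the reply above

        // UTF-8 in, UTF-8 out: A0 is an invalid sequence and becomes U+FFFD.
        Console.WriteLine(Hex(Transcode(nbsp, Encoding.UTF8, Encoding.UTF8)));  // EF BF BD

        // ISO 8859-1 in, ISO 8859-1 out: the byte survives unchanged.
        Console.WriteLine(Hex(Transcode(nbsp, latin1, latin1)));                // A0
    }
}
```

Passing the same ISO 8859-1 encoding to both the reader and the writer keeps every byte value from 00 to FF intact, because that encoding maps each byte to exactly one character.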

      If specifying the encoding doesn't work, the file you are reading probably contains a byte order mark (BOM), which overrides the encoding you specified. You can check this by adding the .bin extension to the file and opening it in Visual Studio to examine the actual binary data. If the file contains a BOM, it is broken: the BOM gives decoding information that doesn't match how the file was actually encoded. To fix this, you would either have to use or write a program that removes the BOM, or open the file as a binary stream so that you can read past the BOM before you start reading the stream with the StreamReader.
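The "read past the BOM" idea could be sketched roughly as follows. This is an illustration only: `BomCheck`, `BomLength`, and `ReadLatin1SkippingBom` are hypothetical names, the sketch works on an in-memory byte array rather than a file, and it only recognizes UTF-8 and UTF-16 marks. It also passes `false` for the `detectEncodingFromByteOrderMarks` parameter of `StreamReader`, so the reader does not second-guess the encoding it is given.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomCheck
{
    // Length in bytes of a UTF-8 or UTF-16 byte order mark at the start
    // of the data, or 0 if none of those marks is present.
    public static int BomLength(byte[] data)
    {
        if (data.Length >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) return 3; // UTF-8
        if (data.Length >= 2 && data[0] == 0xFF && data[1] == 0xFE) return 2;                    // UTF-16 LE
        if (data.Length >= 2 && data[0] == 0xFE && data[1] == 0xFF) return 2;                    // UTF-16 BE
        return 0;
    }

    // Skip any leading BOM, then decode the rest as ISO 8859-1 with
    // BOM-based encoding detection switched off.
    public static string ReadLatin1SkippingBom(byte[] data)
    {
        int skip = BomLength(data);
        Encoding latin1 = Encoding.GetEncoding(28591);
        using (MemoryStream stream = new MemoryStream(data, skip, data.Length - skip))
        using (StreamReader reader = new StreamReader(stream, latin1, false))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // A stray UTF-8 BOM followed by "A" and a non-breaking space (A0).
        byte[] data = { 0xEF, 0xBB, 0xBF, 0x41, 0xA0 };
        string text = ReadLatin1SkippingBom(data);
        Console.WriteLine(text.Length);          // 2
        Console.WriteLine((int)text[1]);         // 160, i.e. U+00A0
    }
}
```

For a real file, the same check could be applied to the first few bytes read from a `FileStream` before wrapping the stream in a `StreamReader`.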

      Despite everything, the person most likely to be fooling you next is yourself.

        Mike Bluett
        wrote on last edited by
        #3

        That did the trick. Thanks very much. I had tried all of the obvious encodings, but there is no obvious encoding for ISO Western.

        1. How did you know that ISO 8859-1 was represented by 28591 (in .NET) (i.e., where would I have looked to find such information)?

        2. The fact that you were able to respond with an answer to this type of question conveys to me that you know a fair amount about programming. What kinds of things did you do to get to the understanding of programming that you have today?

        Thanks again and have a great day!!!

          Guffa
          wrote on last edited by
          #4

          Mike Bluett wrote:

          1. How did you know that ISO 8859-1 was represented by 28591 (in .NET) (i.e., where would I have looked to find such information)?

          In the documentation, the page about the Encoding class has a list of encodings: MSDN Library: Encoding class
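Besides the documentation table, the code page can also be found from code. The sketch below assumes the IANA name "iso-8859-1" is a more natural starting point than the number; `EncodingLookup` and `CodePageFor` are illustrative names that do not appear in the thread. The final loop lists whatever encodings the current runtime reports, which varies between .NET versions and platforms.

```csharp
using System;
using System.Text;

public static class EncodingLookup
{
    // Resolve an encoding by its registered name and return its code page number.
    public static int CodePageFor(string name)
    {
        return Encoding.GetEncoding(name).CodePage;
    }

    public static void Main()
    {
        // Look up the code page from the name.
        Console.WriteLine(CodePageFor("iso-8859-1"));   // 28591

        // Or go the other way and ask code page 28591 for its name.
        Console.WriteLine(Encoding.GetEncoding(28591).WebName);

        // Enumerate every encoding the runtime knows about.
        foreach (EncodingInfo info in Encoding.GetEncodings())
        {
            Console.WriteLine(info.CodePage + "  " + info.Name + "  " + info.DisplayName);
        }
    }
}
```

Looking the encoding up by name, as in `Encoding.GetEncoding("iso-8859-1")`, avoids having to remember the number at all.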

          Mike Bluett wrote:

          2. The fact that you were able to respond with an answer to this type of question conveys to me that you know a fair amount about programming. What kinds of things did you do to get to the understanding of programming that you have today?

          Well, I did a lot of programming. :) I have used many different programming languages on several different platforms. It helps to have done a bit of machine-level programming, so that you know what really happens below the surface. Also, for the last few years I have been hanging out a lot in forums like this, helping people. You learn a lot from that. :)

          Despite everything, the person most likely to be fooling you next is yourself.
