Encoding and the StreamReader
-
Hi, I have run into a small problem reading text from a file. Consider the following line from the source file:
(Code 1956, § 486.01)
I'm using the good old StreamReader to read the file:
string line = sourceStreamReader.ReadLine();
When this line executes on the source line above, the section character "§" is read as char 65533 (U+FFFD, the Unicode replacement character). I've declared the StreamReader as follows:
StreamReader sourceStreamReader = new StreamReader(sourceFiles[i].ToString());
System.Diagnostics.Debug.WriteLine(sourceStreamReader.CurrentEncoding.EncodingName);
The debug output tells me that the encoding is UTF-8, but the section character is neither read nor written as its original value. Can someone please tell me where I went wrong?
Just because we can; does not mean we should.
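A minimal C# sketch that reproduces the symptom in memory, assuming the file was saved in an ANSI code page such as Windows-1252, where "§" is the single byte 0xA7. The names `ReplacementCharDemo`, `AnsiSampleLine` and `ReadLineWithDefaultReader` are hypothetical. A `StreamReader` created without an encoding argument decodes as UTF-8 unless it finds a byte order mark, so `CurrentEncoding` reporting UTF-8 says nothing about how the file was actually written.

```csharp
using System;
using System.IO;
using System.Text;

static class ReplacementCharDemo
{
    // Builds the bytes an ANSI (Windows-1252) editor would write for
    // "(Code 1956, § 486.01)". In that code page '§' is the single byte 0xA7.
    public static byte[] AnsiSampleLine()
    {
        byte[] prefix = Encoding.ASCII.GetBytes("(Code 1956, ");
        byte[] suffix = Encoding.ASCII.GetBytes(" 486.01)");
        var bytes = new byte[prefix.Length + 1 + suffix.Length];
        prefix.CopyTo(bytes, 0);
        bytes[prefix.Length] = 0xA7;
        suffix.CopyTo(bytes, prefix.Length + 1);
        return bytes;
    }

    // Reads one line the way the original code does: no encoding argument,
    // so StreamReader falls back to UTF-8 (there is no byte order mark here).
    public static string ReadLineWithDefaultReader(byte[] data)
    {
        using var reader = new StreamReader(new MemoryStream(data));
        return reader.ReadLine();
    }

    static void Main()
    {
        string line = ReadLineWithDefaultReader(AnsiSampleLine());
        // A lone 0xA7 is not a valid UTF-8 sequence, so it decodes to U+FFFD.
        Console.WriteLine((int)line[12]); // prints 65533
    }
}
```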
-
As it turns out, changing the code to the listing below resolved my issue.
//read the source file
FileStream fStream = File.OpenRead(sourceFiles[i].ToString());
byte[] buffer = new byte[fStream.Length];
int bytesRead;
bytesRead = fStream.Read(buffer, 0, buffer.Length);
fStream.Close();
Decoder decoder = Encoding.Default.GetDecoder();
char[] cBuffer = new char[buffer.Length];
int bytesConverted, charsConverted;
bool bCompleted;
decoder.Convert(buffer, 0, buffer.Length, cBuffer, 0, buffer.Length, false, out bytesConverted, out charsConverted, out bCompleted);
-
KaptinKrunch wrote:
changing the code to the listing below resolved my issue.
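That means the file isn't Unicode at all; it's ANSI. StreamReader assumes UTF-8 unless the file starts with a byte order mark, and in an ANSI file (for example Windows-1252) the "§" is the single byte 0xA7, which is not valid UTF-8, so it gets decoded as the replacement character U+FFFD (65533).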
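If the files really are ANSI, an alternative to decoding raw bytes by hand is to keep the original `StreamReader`/`ReadLine` approach and pass the encoding to the constructor. The sketch below assumes the files are Windows-1252 (code page 1252) and targets .NET Core 3.0+ / .NET 5+, where legacy code pages must be registered through `CodePagesEncodingProvider` before `Encoding.GetEncoding(1252)` succeeds; the class and method names are hypothetical. Note also that on .NET Core and .NET 5+ `Encoding.Default` always returns UTF-8, so an `Encoding.Default`-based fix only picks up the system ANSI code page on .NET Framework.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class AnsiReaderDemo
{
    // Register the legacy code pages once (needed on .NET Core / .NET 5+).
    static AnsiReaderDemo()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Assumption: the source files were saved as Windows-1252 ("ANSI", Western European).
    public static Encoding Windows1252() => Encoding.GetEncoding(1252);

    // Reads every line of the file, decoding with the given encoding
    // instead of StreamReader's UTF-8 default.
    public static List<string> ReadLines(string path, Encoding encoding)
    {
        var lines = new List<string>();
        using var reader = new StreamReader(path, encoding);
        string line;
        while ((line = reader.ReadLine()) != null)
            lines.Add(line);
        return lines;
    }

    static void Main()
    {
        Encoding ansi = Windows1252();
        string path = Path.GetTempFileName();
        // Simulate an ANSI source file: in code page 1252 '§' is the single byte 0xA7.
        File.WriteAllBytes(path, ansi.GetBytes("(Code 1956, \u00A7 486.01)"));
        List<string> lines = ReadLines(path, ansi);
        File.Delete(path);
        // The section sign survives as U+00A7 instead of becoming U+FFFD.
        Console.WriteLine((int)lines[0][12]); // prints 167
    }
}
```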
KaptinKrunch wrote:
bytesRead = fStream.Read(buffer, 0, buffer.Length);
Ouch. You read from the file, but you ignore the return value. The Read method doesn't have to read as much data as you request; you have to loop until all the data is read or the method returns zero. The easiest fix, of course, is to replace all that code with:
string text = File.ReadAllText(sourceFiles[i].ToString(), Encoding.Default);
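For cases where you do need to read the raw bytes yourself, here is a minimal sketch of the loop described above: keep calling `Stream.Read` with an advancing offset until the buffer is full or `Read` returns 0. The names `ReadLoopDemo` and `ReadFully` are hypothetical, and the sketch assumes a seekable stream such as a `FileStream` so that `Length` is available. In practice `File.ReadAllBytes`, or `File.ReadAllText` with an explicit encoding, already does this for you.

```csharp
using System;
using System.IO;
using System.Text;

static class ReadLoopDemo
{
    // Fills a buffer of stream.Length bytes, calling Read repeatedly because
    // a single call may return fewer bytes than requested.
    public static byte[] ReadFully(Stream stream)
    {
        var buffer = new byte[stream.Length];
        int offset = 0;
        while (offset < buffer.Length)
        {
            int read = stream.Read(buffer, offset, buffer.Length - offset);
            if (read == 0)
                break; // end of stream reached before Length bytes arrived
            offset += read;
        }
        // Trim the buffer if the stream ended early.
        if (offset < buffer.Length)
            Array.Resize(ref buffer, offset);
        return buffer;
    }

    static void Main()
    {
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, Encoding.ASCII.GetBytes("sample text"));
        byte[] data;
        using (FileStream fs = File.OpenRead(path))
            data = ReadFully(fs);
        File.Delete(path);
        Console.WriteLine(data.Length); // prints 11
    }
}
```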
Despite everything, the person most likely to be fooling you next is yourself.