XML encoding issue
-
Great, wmba! I studied it, and I think the following statement applies to my issue, right?
--------------------
Note: If the XmlDocument is saved to either a TextWriter or an XmlTextWriter, this encoding value is discarded. Instead, the encoding of the TextWriter or the XmlTextWriter is used. This ensures that the XML written out can be read back using the correct encoding.
--------------------
But it only mentions TextWriter and XmlTextWriter, which use their own encoding. I am using StringWriter, which is not mentioned in the document, right?
regards, George
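For reference, StringWriter derives from TextWriter, so the note does cover it. Below is a minimal C# sketch that asks a StringWriter which encoding it reports. The class and method names are illustrative and do not come from the thread.

```csharp
using System;
using System.IO;

class StringWriterEncodingDemo
{
    // Returns the web name of the encoding that a StringWriter reports.
    public static string ReportedEncoding()
    {
        // StringWriter derives from TextWriter, so it can be used anywhere
        // a TextWriter is expected (for example XmlDocument.Save).
        TextWriter writer = new StringWriter();
        return writer.Encoding.WebName;
    }

    static void Main()
    {
        Console.WriteLine(ReportedEncoding()); // prints utf-16
    }
}
```

This is the value that ends up in the XML declaration when an XmlDocument is saved through a StringWriter.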
-
George_George wrote:
I am using StringWriter

That doesn't write to a file, does it? Always use an XmlTextWriter for writing XML documents to files.
-
You are using StringWriter, and it "Implements a TextWriter for writing information to a string." (http://msdn.microsoft.com/en-us/library/system.io.stringwriter.aspx)
Thanks, wmba.
1. I have solved this issue with your help. Here is my code. Could you review whether it is correct, please?
using System;
using System.Text;
using System.IO;
using System.Xml;

class FSOpenWrite
{
    public static void Main()
    {
        // StringWriter collects the output in an in-memory string (UTF-16).
        StringWriter stream = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(stream);
        writer.WriteStartElement("Stock");
        writer.WriteAttributeString("Symbol", "123");
        writer.WriteElementString("Price", "456");
        writer.WriteElementString("Change", "abc");
        writer.WriteElementString("Volume", "edd");
        writer.WriteEndElement();
        // Flush so that all output has reached the StringWriter before reading it.
        writer.Flush();
        string content = stream.ToString();
    }
}
2. Why, in my original code, do I get UTF-16 encoding even though I set UTF-8? regards, George
-
Thanks PIEBALDconsult. I only need an in-memory representation (a string) of the XML; there is no need to write it to a file. My question is: why, even though I set the encoding to UTF-8, does my original code output a UTF-16 header? regards, George
-
I'm guessing that it's because .NET strings are two-byte Unicode (UTF-16), but I could easily be wrong.
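To illustrate the "two-byte Unicode" point: a .NET char is one 16-bit UTF-16 code unit, and the number of bytes the same text occupies depends on which encoding you convert it to. A small C# sketch; Utf16Demo and Measure are hypothetical names used only for this example.

```csharp
using System;
using System.Text;

class Utf16Demo
{
    // Returns the number of UTF-16 code units (chars) in the string and
    // how many bytes it takes when encoded as UTF-16LE and as UTF-8.
    public static (int Chars, int Utf16Bytes, int Utf8Bytes) Measure(string text) =>
        (text.Length, Encoding.Unicode.GetByteCount(text), Encoding.UTF8.GetByteCount(text));

    static void Main()
    {
        // A char is always two bytes.
        Console.WriteLine(sizeof(char));           // 2

        // Plain ASCII text: two bytes per char in UTF-16, one in UTF-8.
        Console.WriteLine(Measure("Stock"));       // (5, 10, 5)

        // A character outside the Basic Multilingual Plane (G clef, U+1D11E)
        // needs a surrogate pair, so Length counts two chars.
        Console.WriteLine(Measure("\U0001D11E"));  // (2, 4, 4)
    }
}
```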
-
Thanks PIEBALDconsult. I agree that .NET uses UTF-16 as its internal string encoding, but why is the UTF-8 encoding I already set in the XML header overwritten with UTF-16? regards, George
-
Because doing otherwise would be wrong. What problem are you trying to solve?
-
Thanks PIEBALDconsult. I do not quite understand why I set a UTF-8 header, but UTF-16 is output in my original sample. What internal operation steals and changes my original header? :-) regards, George
-
The XmlDocument.Save and XmlTextWriter operations will only write well-formed XML. They know that the StringWriter uses UTF-16, so they set the matching encoding. Encoding in UTF-16 while declaring UTF-8 would yield malformed XML. If you want UTF-8, write it to a file or another byte stream; a StringWriter or StringBuilder won't do it.
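A minimal C# sketch of the behaviour described above. It assumes a tiny hypothetical document in place of George's original one (which isn't shown here): the XML declaration asks for UTF-8, but saving through a StringWriter writes the StringWriter's encoding instead.

```csharp
using System;
using System.IO;
using System.Xml;

class DeclarationOverrideDemo
{
    // Builds a small document whose declaration says utf-8, saves it
    // through a StringWriter, and returns the resulting text.
    public static string SaveViaStringWriter()
    {
        XmlDocument doc = new XmlDocument();
        doc.AppendChild(doc.CreateXmlDeclaration("1.0", "utf-8", null));

        XmlElement stock = doc.CreateElement("Stock");
        stock.SetAttribute("Symbol", "123");
        doc.AppendChild(stock);

        using (StringWriter writer = new StringWriter())
        {
            // Save(TextWriter) writes the declaration using writer.Encoding,
            // which for a StringWriter is UTF-16, not the "utf-8" set above.
            doc.Save(writer);
            return writer.ToString();
        }
    }

    static void Main()
    {
        // The first line should read: <?xml version="1.0" encoding="utf-16"?>
        Console.WriteLine(SaveViaStringWriter());
    }
}
```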
-
Your code sample is missing the part that tells the XmlTextWriter which encoding to use. When you construct an XmlTextWriter over a TextWriter (such as a StringWriter), the constructor takes no encoding; the TextWriter's own encoding is used. The underlying string of a StringWriter is UTF-16, so you have no option to use a different encoding. If, however, you use a MemoryStream, or anything else derived from Stream, you can pass a different Encoding. Here is a code snippet that demonstrates this:
using System;
using System.IO;
using System.Text;
using System.Xml;

MemoryStream ms = new MemoryStream();
// Set the encoding to UTF-8:
XmlTextWriter writer = new XmlTextWriter(ms, Encoding.UTF8);
// Just makes the XML easier to read:
writer.Formatting = Formatting.Indented;
// Write out our XML document:
writer.WriteStartDocument();
writer.WriteStartElement("Stock");
writer.WriteAttributeString("Symbol", "123");
writer.WriteElementString("Price", "456");
writer.WriteElementString("Change", "abc");
writer.WriteElementString("Volume", "edd");
writer.WriteEndElement();
writer.WriteEndDocument();
// Flush the writer, then reset the stream's position so we can read back from the start:
writer.Flush();
ms.Seek(0, SeekOrigin.Begin);
// Read our memory stream (StreamReader detects and skips the UTF-8 BOM) and output to the console:
StreamReader sr = new StreamReader(ms);
string content = sr.ReadToEnd();
Console.WriteLine(content);
return;
It is important to note that you could have used a similar technique in your original code with the XmlDocument. You were getting the UTF-16 encoding because your underlying writer wrote to a string: StringWriter writes into a StringBuilder, and because .NET strings are always UTF-16, that is the encoding you got. When you write to a stream (FileStream, MemoryStream, etc.), you are not writing to a string but, conceptually, to an array of bytes, so you can specify a different encoding. Anyway, I hope this helps you out.
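Applied to the XmlDocument case, the same idea might look like the sketch below. It assumes a small hypothetical document rather than George's original one; an XmlTextWriter over a MemoryStream decides the encoding, and the bytes are decoded back into a string only for display.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class XmlDocumentUtf8Demo
{
    // Saves an XmlDocument as UTF-8 bytes in memory and returns those
    // bytes decoded back into a .NET string for display.
    public static string SaveAsUtf8()
    {
        XmlDocument doc = new XmlDocument();
        doc.AppendChild(doc.CreateXmlDeclaration("1.0", "utf-8", null));
        XmlElement stock = doc.CreateElement("Stock");
        stock.SetAttribute("Symbol", "123");
        doc.AppendChild(stock);

        using (MemoryStream ms = new MemoryStream())
        {
            // UTF8Encoding(false) omits the byte-order mark.
            XmlTextWriter writer = new XmlTextWriter(ms, new UTF8Encoding(false));
            writer.Formatting = Formatting.Indented;

            // Save(XmlWriter) lets the writer's encoding (UTF-8) go into the declaration.
            doc.Save(writer);
            writer.Flush();

            // ToArray returns all written bytes regardless of the stream position.
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

    static void Main()
    {
        Console.WriteLine(SaveAsUtf8());
    }
}
```

Using UTF8Encoding(false) is a design choice: with Encoding.UTF8 the bytes would start with a BOM, and Encoding.UTF8.GetString would keep it as a leading U+FEFF character in the string.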
-
Can I set the encoding of StringWriter from UTF-16 to UTF-8? regards, George
-
I like your sample, wmba! So cool! :-) regards, George
-
NO, goddammit! You can't! .NET strings are UTF-16, and that's it, end of story!
-
Thanks PIEBALDconsult, I have solved this issue by using MemoryStream. :-) regards, George