XML encoding issue
-
Hello everyone, my code below always outputs UTF-16 in the XML header, even though I set the XML declaration to UTF-8. My questions: 1. How do I get UTF-8 in the header instead of UTF-16? 2. Is the XML string really UTF-16 encoded or UTF-8 encoded? I think a string in C# is always UTF-16 encoded, so why would we need UTF-8 in the header? Here is the output, followed by the code.
<?xml version="1.0" encoding="utf-16"?>
<CategoryList a="12345" b="1d5458cd-a070-40cc-a3f4-cf3c394013cc" c="true" />

using System;
using System.Text;
using System.IO;
using System.Xml;

class Test
{
    public static void Main()
    {
        XmlDocument xmlDoc = new XmlDocument();

        // Write down the XML declaration
        XmlDeclaration xmlDeclaration = xmlDoc.CreateXmlDeclaration("1.0", "utf-8", null);

        // Create the root element
        XmlElement rootNode = xmlDoc.CreateElement("CategoryList");
        xmlDoc.InsertBefore(xmlDeclaration, xmlDoc.DocumentElement);

        // Set attribute name and value!
        rootNode.SetAttribute("a", "12345");
        rootNode.SetAttribute("b", Guid.NewGuid().ToString());
        rootNode.SetAttribute("c", "true");
        xmlDoc.AppendChild(rootNode);

        // Save to the XML file
        StringWriter stream = new StringWriter();
        xmlDoc.Save(stream);
        string content = stream.ToString();
        Console.Write(content);
        return;
    }
}
thanks in advance, George
Looking at the MSDN documentation: http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.createxmldeclaration.aspx
The section about encoding says:
"The value of the encoding attribute. This is the encoding that is used when you save the XmlDocument to a file or a stream; therefore, it must be set to a string supported by the Encoding class, otherwise Save fails. If this is a null reference (Nothing in Visual Basic) or String.Empty, the Save method does not write an encoding attribute on the XML declaration and therefore the default encoding, UTF-8, is used. Note: If the XmlDocument is saved to either a TextWriter or an XmlTextWriter, this encoding value is discarded. Instead, the encoding of the TextWriter or the XmlTextWriter is used. This ensures that the XML written out can be read back using the correct encoding."
So I would guess that the "note" applies in your case: the StringWriter that you are saving to is causing the encoding value to be ignored. (I imagine that the underlying StringBuilder is using UTF-16 strings.) If you were to use an XmlTextWriter over a stream instead, you could specify the encoding that you want.
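To make the "note" concrete, here is a minimal sketch that prints the encoding a writer reports, which is the value Save uses once the declaration's own encoding is discarded. The class name WriterEncodingDemo and the helper DeclaredEncoding are made up for illustration; only the TextWriter.Encoding property comes from the framework.

```csharp
using System;
using System.IO;
using System.Text;

class WriterEncodingDemo
{
    // Hypothetical helper: returns the encoding name a writer reports about itself.
    public static string DeclaredEncoding(TextWriter writer)
    {
        return writer.Encoding.WebName;
    }

    public static void Main()
    {
        // A StringWriter collects chars in memory and reports UTF-16.
        Console.WriteLine(DeclaredEncoding(new StringWriter()));   // utf-16

        // A StreamWriter turns chars into bytes, so it reports the encoding it was given.
        TextWriter toBytes = new StreamWriter(new MemoryStream(), new UTF8Encoding(false));
        Console.WriteLine(DeclaredEncoding(toBytes));               // utf-8
    }
}
```

Both are TextWriters, so the note covers both; they differ only in what they report.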
-
Great, wmba! I studied it, and I think the following statement applies to my issue, right?
--------------------
Note: If the XmlDocument is saved to either a TextWriter or an XmlTextWriter, this encoding value is discarded. Instead, the encoding of the TextWriter or the XmlTextWriter is used. This ensures that the XML written out can be read back using the correct encoding.
--------------------
But it only mentions TextWriter and XmlTextWriter, which use their own encoding. I am using StringWriter, which is not mentioned in the document, right? regards, George
-
George_George wrote:
I am using StringWriter
That doesn't write to a file, does it? Always use an XmlTextWriter for writing XML documents to files.
-
You are using StringWriter, and it "Implements a TextWriter for writing information to a string." (http://msdn.microsoft.com/en-us/library/system.io.stringwriter.aspx) Since StringWriter derives from TextWriter, the note in the documentation applies to it as well.
Thanks wmba, 1. I have solved this issue with your help. Here is my code. Could you please review whether it is correct?

using System;
using System.Text;
using System.IO;
using System.Xml;

class FSOpenWrite
{
    public static void Main()
    {
        StringWriter stream = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(stream);
        writer.WriteStartElement("Stock");
        writer.WriteAttributeString("Symbol", "123");
        writer.WriteElementString("Price", "456");
        writer.WriteElementString("Change", "abc");
        writer.WriteElementString("Volume", "edd");
        writer.WriteEndElement();
        string content = stream.ToString();
        return;
    }
}

2. Why, in the original code in my question, do I only get UTF-16 even though I set UTF-8? regards, George
-
Thanks PIEBALDconsult, I only need an in-memory representation (a string) of the XML; there is no need to write to a file. My question is: why is a UTF-16 header displayed by the code in my original question even though I set UTF-8? regards, George
I'm guessing that it's because .NET strings are two-byte Unicode (UTF-16), but I could easily be wrong.
-
Thanks PIEBALDconsult, I agree that C# uses UTF-16 as its internal encoding, but why is the UTF-8 XML header that I already set overwritten by UTF-16? regards, George
-
Because doing otherwise would be wrong. What problem are you trying to solve?
-
Thanks PIEBALDconsult, I do not quite understand why UTF-16 is output in my original sample when I set a UTF-8 header. What internal operation steals and changes my original header? :-) regards, George
-
The XmlDocument.Save and XmlTextWriter operations will only write well-formed XML. Save knows that the StringWriter uses UTF-16, so it sets the proper encoding. Encoding in UTF-16 but saying it's UTF-8 would yield malformed XML. If you want UTF-8, write it to a file; a StringBuilder won't do it.
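As a rough illustration of why the declaration has to match the actual bytes, this sketch counts the bytes the same text occupies in each encoding. The names EncodingChoiceDemo and ByteCount are invented for the example; Encoding.GetByteCount is the only framework call it relies on.

```csharp
using System;
using System.Text;

class EncodingChoiceDemo
{
    // Hypothetical helper: how many bytes the text occupies in the given encoding.
    public static int ByteCount(string text, Encoding encoding)
    {
        return encoding.GetByteCount(text);
    }

    public static void Main()
    {
        string xml = "<CategoryList a=\"12345\" />";

        // ASCII-only text: one byte per char in UTF-8.
        Console.WriteLine(ByteCount(xml, Encoding.UTF8));

        // Encoding.Unicode is UTF-16 (little endian): two bytes per char here.
        Console.WriteLine(ByteCount(xml, Encoding.Unicode));
    }
}
```

The string is the same in both calls and only the byte layout differs, so a parser reading the bytes depends on the header to know which layout it has.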
-
With your code sample, you are missing the part that tells the XmlTextWriter what encoding to use. If you hand it a class that is derived from TextWriter (like StringWriter), then you can't specify the encoding; the writer's own encoding is used. The reason for this is that the base string in a StringWriter is UTF-16, so you have no options for using a different Encoding. If, however, you use a MemoryStream, or something else derived directly from Stream, then you can specify a different Encoding. Anyway, here is a code snippet that shows this:

MemoryStream ms = new MemoryStream();

// Set the encoding to UTF8:
XmlTextWriter writer = new XmlTextWriter(ms, Encoding.UTF8);

// Just makes the xml easier to read:
writer.Formatting = Formatting.Indented;

// Write out our xml document:
writer.WriteStartDocument();
writer.WriteStartElement("Stock");
writer.WriteAttributeString("Symbol", "123");
writer.WriteElementString("Price", "456");
writer.WriteElementString("Change", "abc");
writer.WriteElementString("Volume", "edd");
writer.WriteEndElement();

// Flush the writer, then reset our stream's read pointer,
// so we can read back from our memory stream:
writer.Flush();
ms.Seek(0, SeekOrigin.Begin);

// Read our memory stream, and output to console:
StreamReader sr = new StreamReader(ms);
string content = sr.ReadToEnd();
Console.WriteLine(content);
return;

It is important to note that you could have used a similar technique in your original code when you used the XmlDocument. The reason why you were getting the UTF-16 encoding is that your underlying writer was writing to a string. StringWriter writes to an in-memory StringBuilder, and because strings in .NET are all UTF-16, that is the encoding you got. When you write directly to a stream (FileStream, MemoryStream, etc.), you are not writing to a string; conceptually you are writing to just an array of bytes. Because of that you can specify a different encoding. Anyway, I hope this helps you out.
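For the XmlDocument case, a sketch of the same technique might look like the following. It assumes the document from the original question, trimmed to one attribute; the class name SaveToStreamDemo and the method BuildXml are invented for the example.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

class SaveToStreamDemo
{
    // Builds a small document and returns what Save wrote to the stream, read back as text.
    public static string BuildXml()
    {
        XmlDocument xmlDoc = new XmlDocument();
        xmlDoc.AppendChild(xmlDoc.CreateXmlDeclaration("1.0", "utf-8", null));

        XmlElement rootNode = xmlDoc.CreateElement("CategoryList");
        rootNode.SetAttribute("a", "12345");
        xmlDoc.AppendChild(rootNode);

        using (MemoryStream ms = new MemoryStream())
        {
            // Saving to a Stream uses the encoding named in the declaration.
            xmlDoc.Save(ms);
            ms.Seek(0, SeekOrigin.Begin);

            // StreamReader skips a byte order mark if one was written.
            using (StreamReader sr = new StreamReader(ms, Encoding.UTF8))
            {
                return sr.ReadToEnd();
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(BuildXml());
    }
}
```

If the goal is bytes to send or store, ms.ToArray() can be used directly instead of reading the stream back into a string.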
-
Can I set the encoding of StringWriter from UTF-16 to UTF-8? regards, George
-
I like your sample, wmba! So, cool!! :-) regards, George
-
NO, goddammit! You can't! .net strings are UTF-16, and that's it, end of story!
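One nuance, offered as a sketch rather than as anything posted in this thread: the characters held in memory do stay UTF-16, but the encoding name that a StringWriter reports comes from a virtual property, so a subclass can report something else. Utf8StringWriter and Utf8HeaderDemo below are hypothetical names, and the approach assumes the resulting string will later be written out as UTF-8 bytes; otherwise the header would be untrue.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: a StringWriter that reports UTF-8 instead of UTF-16.
class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

class Utf8HeaderDemo
{
    // Saves a small document through the helper and returns the resulting string.
    public static string BuildXml()
    {
        XmlDocument xmlDoc = new XmlDocument();
        xmlDoc.AppendChild(xmlDoc.CreateXmlDeclaration("1.0", "utf-8", null));

        XmlElement rootNode = xmlDoc.CreateElement("CategoryList");
        rootNode.SetAttribute("a", "12345");
        xmlDoc.AppendChild(rootNode);

        using (StringWriter writer = new Utf8StringWriter())
        {
            // Save takes the encoding name from the writer, which now says UTF-8.
            xmlDoc.Save(writer);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(BuildXml());
    }
}
```

This only changes the label in the header. The string returned by ToString() is still an ordinary .NET string, so the MemoryStream approach remains the one that actually produces UTF-8 bytes.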
-
Thanks PIEBALDconsult, I have solved this issue by using MemoryStream. :-) regards, George