XML encoding issue
-
George_George wrote:
I am using StringWriter
That doesn't write to a file, does it? Always use an XmlTextWriter for writing XML documents to files.
-
Thanks PIEBALDconsult, I only need an in-memory representation (a string) of the XML. There is no need to write to a file. My question is: why, even though I set the encoding to UTF-8 in the code in my original question, is a UTF-16 header displayed? regards, George
-
I'm guessing that it's because .NET strings are UTF-16 (two-byte code units), but I could easily be wrong.
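The guess about string storage can be checked with a small sketch. The class and method names here are hypothetical, chosen only for illustration; the calls themselves (`Encoding.GetByteCount`, `sizeof(char)`) are standard .NET. It compares how many bytes the same text takes as UTF-16 and as UTF-8.

```csharp
using System;
using System.Text;

// Hypothetical demo class; not part of the original thread.
static class StringStorageDemo
{
    // Number of bytes the text would occupy in the given encoding.
    public static int ByteCount(string text, Encoding encoding)
    {
        return encoding.GetByteCount(text);
    }

    static void Main()
    {
        // 11 characters, one of which (the euro sign, U+20AC) is not ASCII.
        string text = "Price: \u20AC456";

        // Size in bytes of one C# char (one UTF-16 code unit).
        Console.WriteLine(sizeof(char));

        // Encoding.Unicode is .NET's name for little-endian UTF-16.
        Console.WriteLine(ByteCount(text, Encoding.Unicode));
        Console.WriteLine(ByteCount(text, Encoding.UTF8));
    }
}
```

The two byte counts differ because the encoding only matters once the string is turned into bytes; the string in memory is UTF-16 either way.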
-
Thanks PIEBALDconsult, I agree that C# uses UTF-16 as its internal encoding, but why is the UTF-8 XML header, which I had already set, overwritten with UTF-16? regards, George
-
Because doing otherwise would be wrong. What problem are you trying to solve?
-
Thanks PIEBALDconsult, I do not quite understand why, although I set a UTF-8 header, UTF-16 is output in my original sample. Which internal operation steals and changes my original header? :-) regards, George
-
The XmlDocument.Save and XmlTextWriter operations will only write well-formed XML. The writer knows that the StringWriter uses UTF-16 (its Encoding property reports it), so it puts the matching encoding in the XML declaration. Encoding the text in UTF-16 while declaring it as UTF-8 would yield malformed XML. If you want UTF-8, write it to a file; a StringBuilder won't do it.
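A minimal sketch of the behaviour described here, assuming a plain `XmlTextWriter` wrapped around a `StringWriter`. The class and method names are hypothetical, and the one-element document is a stand-in for the original code, which is not shown in this thread.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class; not the original poster's code.
static class DeclaredEncodingDemo
{
    // Serializes a tiny document through a StringWriter and returns the text.
    public static string WriteToString()
    {
        StringWriter sw = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(sw);

        // WriteStartDocument emits the XML declaration; the encoding name
        // in it is taken from the underlying TextWriter.
        writer.WriteStartDocument();
        writer.WriteStartElement("Stock");
        writer.WriteEndElement();
        writer.WriteEndDocument();
        writer.Flush();

        return sw.ToString();
    }

    static void Main()
    {
        // The encoding a StringWriter reports to whoever writes through it.
        Console.WriteLine(new StringWriter().Encoding.WebName);

        // The resulting declaration names that same encoding.
        Console.WriteLine(WriteToString());
    }
}
```

Nothing in this sketch asks for UTF-16 explicitly; the declaration simply follows what the `StringWriter` reports.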
-
Thanks wmba, 1. I have solved this issue with your help. Here is my code. Could you please review whether it is correct?
using System;
using System.Text;
using System.IO;
using System.Xml;

class FSOpenWrite
{
    public static void Main()
    {
        StringWriter stream = new StringWriter();
        XmlTextWriter writer = new XmlTextWriter(stream);
        writer.WriteStartElement("Stock");
        writer.WriteAttributeString("Symbol", "123");
        writer.WriteElementString("Price", "456");
        writer.WriteElementString("Change", "abc");
        writer.WriteElementString("Volume", "edd");
        writer.WriteEndElement();
        string content = stream.ToString();
        return;
    }
}
2. Why, in the code from my original question, even though I set UTF-8, do I only get UTF-16 encoding? regards, George
-
Your code sample is missing the part that tells the XmlTextWriter what encoding to use. If you hand it a class derived from TextWriter (like StringWriter), you can't specify the encoding on the XmlTextWriter; it takes the encoding from the TextWriter. The underlying string in a StringWriter is UTF-16, so you have no option to use a different Encoding. If, however, you use a MemoryStream, or something else derived directly from Stream, then you can specify a different Encoding. Anyway, here is a code snippet that shows this:
MemoryStream ms = new MemoryStream();

// Set the encoding to UTF-8:
XmlTextWriter writer = new XmlTextWriter(ms, Encoding.UTF8);

// Just makes the XML easier to read:
writer.Formatting = Formatting.Indented;

// Write out our XML document:
writer.WriteStartDocument();
writer.WriteStartElement("Stock");
writer.WriteAttributeString("Symbol", "123");
writer.WriteElementString("Price", "456");
writer.WriteElementString("Change", "abc");
writer.WriteElementString("Volume", "edd");
writer.WriteEndElement();

// Flush the writer and reset the stream's position, so we can read back from our memory stream:
writer.Flush();
ms.Seek(0, SeekOrigin.Begin);

// Read our memory stream, and output to the console:
StreamReader sr = new StreamReader(ms);
string content = sr.ReadToEnd();
Console.WriteLine(content);
return;
It is important to note that you could have used a similar technique in your original code, where you used the XmlDocument. You were getting the UTF-16 encoding because the underlying target of your writer was a string: StringWriter writes directly to a string (strictly, to a StringBuilder), and because strings in .NET are all UTF-16, that is the encoding you got. When you write directly to a stream (FileStream, MemoryStream, etc.), you are not writing to a string; conceptually you are writing to just an array of bytes, and because of that you can specify a different encoding. Anyway, I hope this helps you out.
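To illustrate the point about streams being just bytes, here is a small sketch that writes the same document to a MemoryStream twice, once as UTF-8 and once as UTF-16, and reads each result back. The class and method names are hypothetical; the library calls are the same ones used in the snippet above, plus `MemoryStream.ToArray`.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical demo class; a sketch, not a definitive implementation.
static class StreamEncodingDemo
{
    // Writes a one-element document to a MemoryStream in the given encoding
    // and returns the raw bytes.
    public static byte[] WriteToBytes(Encoding encoding)
    {
        MemoryStream ms = new MemoryStream();
        XmlTextWriter writer = new XmlTextWriter(ms, encoding);
        writer.WriteStartDocument();
        writer.WriteElementString("Price", "456");
        writer.WriteEndDocument();
        writer.Flush();
        return ms.ToArray();
    }

    // Decodes the bytes back into a .NET string (which is UTF-16 in memory).
    public static string Decode(byte[] bytes, Encoding encoding)
    {
        using (StreamReader reader = new StreamReader(new MemoryStream(bytes), encoding))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        foreach (Encoding e in new Encoding[] { Encoding.UTF8, Encoding.Unicode })
        {
            byte[] bytes = WriteToBytes(e);
            Console.WriteLine("{0}: {1} bytes", e.WebName, bytes.Length);
            Console.WriteLine(Decode(bytes, e));
        }
    }
}
```

The byte arrays differ in length and content, while the XML declaration in each one names the encoding that was actually used to produce the bytes.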
-
Can I set the encoding of StringWriter from UTF-16 to UTF-8? regards, George
-
I like your sample, wmba! So cool!! :-) regards, George
-
NO, goddammit! You can't! .NET strings are UTF-16, and that's it, end of story!
-
Thanks PIEBALDconsult, I have solved this issue by using MemoryStream. :-) regards, George