XML encoding issue

    George_George
    wrote on last edited by
    #1

    Hello everyone. My code below always outputs UTF-16 in the XML header, even though I set the XML declaration to UTF-8. The output and the code follow. My questions:

    1. How can I get UTF-8 in the header instead of UTF-16?
    2. Is the XML string really UTF-16 encoded or UTF-8 encoded? I think a C# string is always UTF-16 encoded, so why would we need UTF-8 in the header?

    <?xml version="1.0" encoding="utf-16"?>
    <CategoryList a="12345" b="1d5458cd-a070-40cc-a3f4-cf3c394013cc" c="true" />

    using System;
    using System.Text;
    using System.IO;
    using System.Xml;

    class Test
    {
        public static void Main()
        {
            XmlDocument xmlDoc = new XmlDocument();

            // Create the XML declaration, asking for UTF-8
            XmlDeclaration xmlDeclaration = xmlDoc.CreateXmlDeclaration("1.0", "utf-8", null);

            // Create the root element
            XmlElement rootNode = xmlDoc.CreateElement("CategoryList");

            // The document is still empty here, so the declaration becomes its first node
            xmlDoc.InsertBefore(xmlDeclaration, xmlDoc.DocumentElement);

            // Set attribute names and values
            rootNode.SetAttribute("a", "12345");
            rootNode.SetAttribute("b", Guid.NewGuid().ToString());
            rootNode.SetAttribute("c", "true");
            xmlDoc.AppendChild(rootNode);

            // Save to an in-memory string (not to a file)
            StringWriter stream = new StringWriter();
            xmlDoc.Save(stream);
            string content = stream.ToString();
            Console.Write(content);
        }
    }

    thanks in advance, George

      mlsteeves
      wrote on last edited by
      #2

      Looking at the MSDN documentation: http://msdn.microsoft.com/en-us/library/system.xml.xmldocument.createxmldeclaration.aspx

      The section about encoding says:

      "The value of the encoding attribute. This is the encoding that is used when you save the XmlDocument to a file or a stream; therefore, it must be set to a string supported by the Encoding class, otherwise Save fails. If this is a null reference (Nothing in Visual Basic) or String.Empty, the Save method does not write an encoding attribute on the XML declaration and therefore the default encoding, UTF-8, is used.

      Note: If the XmlDocument is saved to either a TextWriter or an XmlTextWriter, this encoding value is discarded. Instead, the encoding of the TextWriter or the XmlTextWriter is used. This ensures that the XML written out can be read back using the correct encoding."

      So I would guess that the note applies in your case. The StringWriter that you are saving to is causing the encoding value to be ignored. (I imagine that the underlying StringBuilder is using UTF-16 strings.) If you were to use an XmlTextWriter, you could specify the encoding that you want.
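
The behaviour described in that note can be reproduced in a few lines. This is a minimal sketch rather than code from the thread: the names DeclarationDemo and SaveViaStringWriter are made up for illustration, and the document is cut down to a bare root element. It declares utf-8 and then saves through a StringWriter, as the original post does.

```csharp
using System;
using System.IO;
using System.Xml;

static class DeclarationDemo
{
    // Builds a document that declares utf-8, saves it through a StringWriter,
    // and returns the resulting text.
    public static string SaveViaStringWriter()
    {
        var doc = new XmlDocument();
        doc.AppendChild(doc.CreateXmlDeclaration("1.0", "utf-8", null));
        doc.AppendChild(doc.CreateElement("CategoryList"));

        using (var sw = new StringWriter())
        {
            // Save(TextWriter) takes the encoding name from the writer,
            // not from the declaration created above.
            doc.Save(sw);
            return sw.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(SaveViaStringWriter());
    }
}
```

The declaration in the printed text should name utf-16, matching the output shown in the question.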

        George_George
        wrote on last edited by
        #3

        Great, wmba! I studied it, and I think the following statement applies to my issue, right?

        --------------------
        Note: If the XmlDocument is saved to either a TextWriter or an XmlTextWriter, this encoding value is discarded. Instead, the encoding of the TextWriter or the XmlTextWriter is used. This ensures that the XML written out can be read back using the correct encoding.
        --------------------

        But it only mentions TextWriter and XmlTextWriter, which use their own encoding. I am using StringWriter, which is not mentioned in the document, right?

        regards, George

          mlsteeves
          wrote on last edited by
          #4

          You are using StringWriter, and it "Implements a TextWriter for writing information to a string." (http://msdn.microsoft.com/en-us/library/system.io.stringwriter.aspx) So the note about TextWriter applies to it as well.
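
Both points can be checked directly. A small sketch, with the class name StringWriterFacts and its helper methods invented for this example:

```csharp
using System;
using System.IO;

static class StringWriterFacts
{
    // True when StringWriter can be used wherever a TextWriter is expected.
    public static bool IsTextWriter()
    {
        return typeof(TextWriter).IsAssignableFrom(typeof(StringWriter));
    }

    // The encoding name a StringWriter reports to callers such as XmlDocument.Save.
    public static string ReportedEncodingName()
    {
        using (var sw = new StringWriter())
        {
            return sw.Encoding.WebName;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsTextWriter());          // True
        Console.WriteLine(ReportedEncodingName());  // utf-16
    }
}
```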

            PIEBALDconsult
            wrote on last edited by
            #5

            George_George wrote:

            I am using StringWriter

            That doesn't write to a file, does it? Always use an XmlTextWriter for writing XML documents to files.
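
For the file case mentioned here, a hedged sketch of what that could look like. The names FileWriterDemo and WriteAndReadBack and the temp-file name are assumptions for the example, not anything from the thread.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class FileWriterDemo
{
    // Writes a small document to 'path' as UTF-8 and returns the file's text.
    public static string WriteAndReadBack(string path)
    {
        using (var writer = new XmlTextWriter(path, Encoding.UTF8))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("CategoryList");
            writer.WriteAttributeString("a", "12345");
            writer.WriteEndElement();
            writer.WriteEndDocument();
        } // disposing the writer flushes and closes the file

        return File.ReadAllText(path, Encoding.UTF8);
    }

    public static void Main()
    {
        // Hypothetical file name in the temp directory, used only for this demo.
        string path = Path.Combine(Path.GetTempPath(), "categorylist-demo.xml");
        Console.WriteLine(WriteAndReadBack(path));
    }
}
```

Because the writer is given the encoding explicitly, the declaration in the file and the bytes on disk agree.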

              George_George
              wrote on last edited by
              #6

              Thanks wmba. 1. I have solved this issue with your help. Here is my code. Could you please review whether it is correct?

              using System;
              using System.Text;
              using System.IO;
              using System.Xml;

              class FSOpenWrite
              {
                  public static void Main()
                  {
                      StringWriter stream = new StringWriter();
                      XmlTextWriter writer = new XmlTextWriter(stream);
                      writer.WriteStartElement("Stock");
                      writer.WriteAttributeString("Symbol", "123");
                      writer.WriteElementString("Price", "456");
                      writer.WriteElementString("Change", "abc");
                      writer.WriteElementString("Volume", "edd");
                      writer.WriteEndElement();

                      string content = stream.ToString();
                  }
              }

              2. Why, in the original code from my question, is UTF-16 used even though I set UTF-8? regards, George

                George_George
                wrote on last edited by
                #7

                Thanks PIEBALDconsult. I only need an in-memory representation (a string) of the XML; there is no need to write to a file. My question is: why is a UTF-16 header displayed in my original code, even though I set the UTF-8 property? regards, George

                  PIEBALDconsult
                  wrote on last edited by
                  #8

                  I'm guessing that it's because .net strings are two-byte Unicode, but I could easily be wrong.

                    George_George
                    wrote on last edited by
                    #9

                    Thanks PIEBALDconsult. I agree that C# uses UTF-16 as its internal encoding, but why is the UTF-8 XML header that I already set overwritten by UTF-16? regards, George

                      PIEBALDconsult
                      wrote on last edited by
                      #10

                      Because doing otherwise would be wrong. What problem are you trying to solve?

                        George_George
                        wrote on last edited by
                        #11

                        Thanks PIEBALDconsult. I do not quite understand why UTF-16 is output in my original sample when I set a UTF-8 header. What internal operation steals and changes my original header? :-) regards, George

                          PIEBALDconsult
                          wrote on last edited by
                          #12

                          XmlDocument.Save and XmlTextWriter will only write well-formed XML. They know that the StringWriter uses UTF-16, so they set the proper encoding. Encoding in UTF-16 but saying it's UTF-8 would yield malformed XML. If you want UTF-8, write it to a file; a StringBuilder won't do it.
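
To make the mismatch concrete, here is a small sketch that prints the bytes of the same four-character text under both encodings. The class name ByteComparison and the Hex helper are invented for the example.

```csharp
using System;
using System.Text;

static class ByteComparison
{
    // Encodes 'text' with 'encoding' and returns the bytes as dash-separated hex.
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        Console.WriteLine(Hex(Encoding.UTF8, "<a/>"));     // 3C-61-2F-3E
        Console.WriteLine(Hex(Encoding.Unicode, "<a/>"));  // 3C-00-61-00-2F-00-3E-00
    }
}
```

A parser told to expect UTF-8 would not read the second byte sequence as the same text, which is why the declaration has to describe the bytes actually produced.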

                            mlsteeves
                            wrote on last edited by
                            #13

                            With your code sample, you are missing the part that tells the XmlTextWriter what encoding to use. If you hand it a StringWriter (which derives from TextWriter), you can't specify the encoding: the base string in a StringWriter is UTF-16, so you have no option to use a different Encoding. If, however, you use a MemoryStream, or something else derived directly from Stream, then you can specify a different Encoding. Anyway, here is a code snippet that shows this:

                            using System;
                            using System.IO;
                            using System.Text;
                            using System.Xml;

                            class Test
                            {
                                public static void Main()
                                {
                                    MemoryStream ms = new MemoryStream();

                                    // Set the encoding to UTF-8:
                                    XmlTextWriter writer = new XmlTextWriter(ms, Encoding.UTF8);

                                    // Just makes the XML easier to read:
                                    writer.Formatting = Formatting.Indented;

                                    // Write out our XML document:
                                    writer.WriteStartDocument();
                                    writer.WriteStartElement("Stock");
                                    writer.WriteAttributeString("Symbol", "123");
                                    writer.WriteElementString("Price", "456");
                                    writer.WriteElementString("Change", "abc");
                                    writer.WriteElementString("Volume", "edd");
                                    writer.WriteEndElement();

                                    // Flush the writer, then rewind the stream so we can read it back:
                                    writer.Flush();
                                    ms.Seek(0, SeekOrigin.Begin);

                                    // Read our memory stream, and output to console:
                                    StreamReader sr = new StreamReader(ms);
                                    string content = sr.ReadToEnd();
                                    Console.WriteLine(content);
                                }
                            }
                            

                            It is important to note that you could have used a similar technique in your original code, where you used the XmlDocument. The reason you were getting the UTF-16 encoding is that your underlying writer was writing to a string. StringWriter writes to an in-memory StringBuilder, and because strings in .NET are all UTF-16, that is the encoding you got. When you write directly to a stream (FileStream, MemoryStream, etc.), you are not writing to a string; conceptually, you are just writing to an array of bytes. Because of that, you can specify a different encoding. Anyway, I hope this helps you out.
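
Applied to the XmlDocument from the original question, the same idea might look like the sketch below. The names XmlDocumentUtf8Demo and SaveAsUtf8Text are made up for the example; the document is shortened to one attribute.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class XmlDocumentUtf8Demo
{
    // Saves the document to a byte stream, then decodes those bytes as UTF-8 text.
    public static string SaveAsUtf8Text()
    {
        var doc = new XmlDocument();
        doc.AppendChild(doc.CreateXmlDeclaration("1.0", "utf-8", null));

        XmlElement root = doc.CreateElement("CategoryList");
        root.SetAttribute("a", "12345");
        doc.AppendChild(root);

        using (var ms = new MemoryStream())
        {
            // Save(Stream) writes bytes, so the declared encoding is kept.
            doc.Save(ms);
            ms.Position = 0;

            using (var reader = new StreamReader(ms, Encoding.UTF8))
            {
                return reader.ReadToEnd();
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(SaveAsUtf8Text());
    }
}
```

The returned string is still a UTF-16 .NET string in memory; what changed is that the bytes in the MemoryStream were UTF-8 and the declaration says so.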

                              George_George
                              wrote on last edited by
                              #14

                              Can I set the encoding of StringWriter from UTF-16 to UTF-8? regards, George

                                George_George
                                wrote on last edited by
                                #15

                                I like your sample, wmba! So, cool!! :-) regards, George

                                  PIEBALDconsult
                                  wrote on last edited by
                                  #16

                                  NO, goddammit! You can't! .net strings are UTF-16, and that's it, end of story!
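
One nuance worth sketching: the text held by a StringWriter is indeed always UTF-16 in memory, but the encoding name written into the declaration comes from the writer's Encoding property, which is virtual. A commonly used workaround overrides that property. Utf8StringWriter below is a hypothetical helper class, not part of .NET or of this thread, and the resulting declaration is only truthful if the string is later encoded to UTF-8 bytes before it leaves the program.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: a StringWriter that reports UTF-8 as its encoding.
// The characters it collects are still stored as a UTF-16 .NET string.
sealed class Utf8StringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

static class Utf8DeclarationDemo
{
    public static string SaveWithUtf8Declaration()
    {
        var doc = new XmlDocument();
        doc.AppendChild(doc.CreateXmlDeclaration("1.0", "utf-8", null));
        doc.AppendChild(doc.CreateElement("CategoryList"));

        using (var sw = new Utf8StringWriter())
        {
            // Save reads the encoding name from sw.Encoding.
            doc.Save(sw);
            return sw.ToString();
        }
    }

    public static void Main()
    {
        Console.WriteLine(SaveWithUtf8Declaration());
    }
}
```

If the goal is actual UTF-8 bytes, writing to a MemoryStream as described earlier in the thread remains the more direct route.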

                                    George_George
                                    wrote on last edited by
                                    #17

                                    Thanks PIEBALDconsult, I have solved this issue by using MemoryStream. :-) regards, George
