XML encoding issue

  George_George wrote:

    Great, wmba! I studied it, and I think the following statement applies to my issue, right?

    "Note: If the XmlDocument is saved to either a TextWriter or an XmlTextWriter, this encoding value is discarded. Instead, the encoding of the TextWriter or the XmlTextWriter is used. This ensures that the XML written out can be read back using the correct encoding."

    But it only mentions TextWriter and XmlTextWriter, which use their own encoding. I am using StringWriter, which is not mentioned in the documentation, right? regards, George

    mlsteeves wrote (#4):

    You are using StringWriter, and it "Implements a TextWriter for writing information to a string." (http://msdn.microsoft.com/en-us/library/system.io.stringwriter.aspx)

      PIEBALDconsult wrote (#5):

      George_George wrote:

      I am using StringWriter

      That doesn't write to a file, does it? Always use an XmlTextWriter for writing XML documents to files.

        George_George wrote (#6):

        Thanks wmba, 1. I have solved this issue with your help. Here is my code. Could you please review whether it is correct?

        using System;
        using System.IO;
        using System.Xml;

        class FSOpenWrite
        {
            public static void Main()
            {
                // StringWriter builds a .NET string in memory and reports UTF-16 as its encoding.
                StringWriter stream = new StringWriter();
                XmlTextWriter writer = new XmlTextWriter(stream);

                writer.WriteStartElement("Stock");
                writer.WriteAttributeString("Symbol", "123");
                writer.WriteElementString("Price", "456");
                writer.WriteElementString("Change", "abc");
                writer.WriteElementString("Volume", "edd");
                writer.WriteEndElement();

                // Flush so that everything written so far reaches the StringWriter.
                writer.Flush();

                string content = stream.ToString();
                Console.WriteLine(content);
            }
        }

        2. Why, in the original code from my question, do I get UTF-16 even though I set UTF-8? regards, George

          George_George wrote (#7):

          Thanks PIEBALDconsult, I only need an in-memory representation (a string) of the XML; there is no need to write to a file. My question is: why, in the code from my original question, is a UTF-16 header displayed even though I set the encoding to UTF-8? regards, George

            PIEBALDconsult wrote (#8):

            I'm guessing that it's because .net strings are two-byte Unicode, but I could easily be wrong.

              George_George wrote (#9):

              Thanks PIEBALDconsult, I agree that C# uses UTF-16 as its internal encoding, but why is the UTF-8 XML header that I already set overwritten with UTF-16? regards, George

                PIEBALDconsult wrote (#10):

                Because doing otherwise would be wrong. What problem are you trying to solve?

                  George_George wrote (#11):

                  Thanks PIEBALDconsult, I do not quite understand why, in my original sample, UTF-16 is output even though I set a UTF-8 header. What internal operation steals and changes my original header? :-) regards, George

                    PIEBALDconsult wrote (#12):

                    XmlDocument.Save and XmlTextWriter will only write well-formed XML. They know that the StringWriter uses UTF-16, so they set the matching encoding in the header. Encoding in UTF-16 but saying it's UTF-8 would yield malformed XML. If you want UTF-8, write it to a file; a StringWriter won't do it.
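
To see the header being replaced, here is a minimal sketch, not the original poster's code. The class name `DeclarationDemo` and the helper `SaveToString` are made up for illustration. It loads XML whose declaration says UTF-8, saves it through a StringWriter, and returns the resulting text so the declaration can be inspected.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class; the names are illustrative only.
class DeclarationDemo
{
    // Loads the given XML and saves it through a StringWriter.
    public static string SaveToString(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        using (StringWriter sw = new StringWriter())
        {
            // Save(TextWriter) writes the declaration using the writer's
            // encoding, not the one stored in the document.
            doc.Save(sw);
            return sw.ToString();
        }
    }

    public static void Main()
    {
        string input = "<?xml version=\"1.0\" encoding=\"utf-8\"?><Stock Symbol=\"123\" />";
        Console.WriteLine(SaveToString(input));
    }
}
```

The printed declaration should name utf-16 rather than utf-8, which matches the documentation note quoted earlier in the thread.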

                      mlsteeves wrote (#13):

                      Your code sample is missing the part that tells the XmlTextWriter what encoding to use. If you hand the XmlTextWriter any class derived from TextWriter (like StringWriter), you can't specify the encoding; the TextWriter's own encoding is used. The base string in a StringWriter is UTF-16, so you have no option to use a different encoding. If, however, you use a MemoryStream, or something else derived directly from Stream, then you can specify a different encoding. Anyway, here is a code snippet that shows this:

                              // Body of Main(); needs using System, System.IO, System.Text and System.Xml.
                              MemoryStream ms = new MemoryStream();

                              // Set the encoding to UTF-8:
                              XmlTextWriter writer = new XmlTextWriter(ms, Encoding.UTF8);

                              // Just makes the XML easier to read:
                              writer.Formatting = Formatting.Indented;

                              // Write out our XML document:
                              writer.WriteStartDocument();
                              writer.WriteStartElement("Stock");
                              writer.WriteAttributeString("Symbol", "123");
                              writer.WriteElementString("Price", "456");
                              writer.WriteElementString("Change", "abc");
                              writer.WriteElementString("Volume", "edd");
                              writer.WriteEndElement();

                              // Flush the writer, then reset the stream's position so we can read back from the memory stream:
                              writer.Flush();
                              ms.Seek(0, SeekOrigin.Begin);

                              // Read our memory stream, and output to console:
                              StreamReader sr = new StreamReader(ms);
                              string content = sr.ReadToEnd();
                              Console.WriteLine(content);

                              return;


                      It is important to note that you could have used a similar technique in your original code, where you used the XmlDocument. You were getting the UTF-16 encoding because your underlying writer was writing to a string. StringWriter writes directly to a string (or possibly a StringBuilder), and because strings in .NET are all UTF-16, that is the encoding you got. When you write directly to a stream (FileStream, MemoryStream, etc.), you are not writing to a string; conceptually, you are writing to just an array of bytes. Because of that, you can specify a different encoding. Anyway, I hope this helps you out.
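
As a rough sketch of that "similar technique" applied to an XmlDocument (the original question's code is not shown in this thread, so the class name `XmlDocumentUtf8Demo`, the helper `SaveAsUtf8`, and the sample XML are all assumptions): the document is saved through an XmlTextWriter that wraps a MemoryStream with an explicit UTF-8 encoding, and the bytes are then decoded back into a string for display.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical demo class; the names are illustrative only.
class XmlDocumentUtf8Demo
{
    // Saves the document as UTF-8 bytes and returns them decoded as text.
    public static string SaveAsUtf8(XmlDocument doc)
    {
        using (MemoryStream ms = new MemoryStream())
        {
            // UTF8Encoding(false) omits the byte order mark, so the
            // decoded string starts directly with the XML declaration.
            XmlTextWriter writer = new XmlTextWriter(ms, new UTF8Encoding(false));
            writer.Formatting = Formatting.Indented;

            doc.Save(writer);
            writer.Flush();

            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<?xml version=\"1.0\" encoding=\"utf-16\"?><Stock Symbol=\"123\"><Price>456</Price></Stock>");
        Console.WriteLine(SaveAsUtf8(doc));
    }
}
```

Here the declaration that comes out should follow the writer's encoding (utf-8), even though the loaded text claimed utf-16. In real code you would usually keep the bytes from `ms.ToArray()` rather than turning them back into a string.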

                        George_George wrote (#14):

                        Can I set the encoding of StringWriter from UTF-16 to UTF-8? regards, George

                          George_George wrote (#15):

                          I like your sample, wmba! So, cool!! :-) regards, George

                            PIEBALDconsult wrote (#16):

                            NO, goddammit! You can't! .net strings are UTF-16, and that's it, end of story!
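
A small sketch of what that means in code, with a made-up class name (`StringWriterEncodingDemo`) and made-up helpers: the `Encoding` property of a StringWriter can only be read, and UTF-8 only enters the picture when the finished string is turned into bytes.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical demo class; the names are illustrative only.
class StringWriterEncodingDemo
{
    // Returns the encoding name that a plain StringWriter reports.
    public static string ReportedEncoding()
    {
        using (StringWriter sw = new StringWriter())
        {
            // Encoding is a get-only property; there is no setter
            // that would switch it to UTF-8.
            return sw.Encoding.WebName;
        }
    }

    // Encodes an in-memory string as UTF-8 bytes, e.g. for a file or socket.
    public static byte[] ToUtf8Bytes(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    public static void Main()
    {
        Console.WriteLine(ReportedEncoding());
        Console.WriteLine(ToUtf8Bytes("<Stock />").Length);
    }
}
```

So the string itself stays UTF-16 in memory; a byte-oriented target such as a MemoryStream or a file is where a different encoding can be chosen.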

                              George_George wrote (#17):

                              Thanks PIEBALDconsult, I have solved this issue by using MemoryStream. :-) regards, George
