MSXML 3 and escape sequences
-
I have a problem with a string containing the £ symbol (GB pounds). I can set the text of a child node to a string containing £ characters using m_pXMLDoc->createTextNode(pszText), but if I then reparse the XML with MSXML, e.g. by displaying it in IE, it complains about an invalid character, i.e. the £.
No problem, I thought: I'll replace every £ with its escape sequence, &#_163; (please ignore the _; I had to include it to stop HTML replacing the escape sequence with a £). That appears to work, in that I can reparse the XML with MSXML and display it in IE, but MSXML has escaped my escape sequence: instead of &#_163; the string contains &amp;#_163;.
How can I stop MSXML doing this to my escape sequence? Or, how else can I include a £ character in a string? I have to use MSXML 3, I can't use a DTD/Schema (don't ask) and I can't use a CDATA section (again, don't ask).
Any help/advice would be much appreciated.
Gavin
-
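This is a classic problem, and you will run into it all the time. The important point is that &#163; is resolved by the parser, i.e. by the time you get a look in, it has already been converted to its character. Generally these issues are caused by the encoding attribute specified at the top of your XML, e.g. <?xml version="1.0" encoding="ISO-8859-1" ?>. If the declaration is missing, the parser assumes UTF-8, and a £ saved as the single byte 0xA3 (as in ISO-8859-1 or Windows-1252) is not valid UTF-8, hence the "invalid character" error.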
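To make the encoding point concrete, here is a minimal sketch in standard C++ (no MSXML involved) of how the same £ character looks at the byte level. The helper names encodeUtf8 and looksLikeUtf8 are made up for this illustration, and the validity check is deliberately simplified: it only checks lead and continuation byte structure, not overlong forms or surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// No validation of the input (surrogates etc.); illustration only.
std::string encodeUtf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                       // plain ASCII: 1 byte
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));         // 2-byte sequence
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));        // 3-byte sequence
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));        // 4-byte sequence
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: simplified structural check that a byte string
// could be UTF-8 (every lead byte is followed by the right number of
// continuation bytes).
bool looksLikeUtf8(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        std::size_t extra;
        if (b < 0x80)                extra = 0;
        else if ((b & 0xE0) == 0xC0) extra = 1;
        else if ((b & 0xF0) == 0xE0) extra = 2;
        else if ((b & 0xF8) == 0xF0) extra = 3;
        else return false;           // stray continuation or invalid lead byte
        if (i + extra >= bytes.size()) return false;        // sequence cut short
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;           // not a continuation byte
        }
        i += extra + 1;
    }
    return true;
}
```

With these, encodeUtf8(0xA3) gives the two bytes C2 A3, whereas the single byte A3 that ISO-8859-1 uses for £ fails looksLikeUtf8. That mismatch is what a parser trips over when the bytes in the file do not match the declared (or assumed) encoding.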
People have a nasty habit of sticking any old encoding in the declaration without considering what encoding their data is actually in; for example, different versions of Windows (95, 98, NT) have used different encoding schemes. Also be careful if you use XSLT: you have to tell it how to encode its output, with <xsl:output encoding="iso-8859-1"/>. Otherwise you will end up in a world of pain ;)
My general advice is to always replace any characters with values below 32 or above 127 with a tag, which lets you handle the problem yourself. Consider using a <character code="163"/> tag; that way you can do what you like with it. Hope this helps.
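As a rough sketch of that replacement advice, the standard C++ function below turns a string of code points into pure-ASCII XML text. The name escapeForAsciiXml is hypothetical, and the <character code="..."/> element is the made-up tag suggested above, not anything MSXML knows about. The sketch assumes you are writing the XML text yourself: if you pass its output to createTextNode, the & will be escaped again, which is exactly the problem described in the question.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: produce ASCII-only XML text from Unicode code points.
// - markup characters are escaped as usual
// - tab, LF and CR are kept (they are legal XML whitespace)
// - other control characters (< 32) are not allowed in XML 1.0, even as
//   character references, so they become the made-up <character/> tag
// - everything above 127 becomes a numeric character reference
std::string escapeForAsciiXml(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        const std::string code = std::to_string(static_cast<unsigned long>(cp));
        if (cp == U'&') {
            out += "&amp;";
        } else if (cp == U'<') {
            out += "&lt;";
        } else if (cp == U'>') {
            out += "&gt;";
        } else if (cp == U'\t' || cp == U'\n' || cp == U'\r') {
            out += static_cast<char>(cp);
        } else if (cp < 32) {
            out += "<character code=\"" + code + "\"/>";
        } else if (cp > 127) {
            out += "&#" + code + ";";
        } else {
            out += static_cast<char>(cp);                   // printable ASCII
        }
    }
    return out;
}
```

For instance, a £ followed by "5 & up" comes out as &#163;5 &amp; up, which any parser will read back as £ regardless of the declared encoding, because the output contains only ASCII bytes.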
"When the only tool you have is a hammer, a sore thumb you will have."