Need to write the ASCII (230) in XML content section using MSXML 4.0.
-
I need to write ASCII character 230 into the content section of an XML element using MSXML 4.0.

Premise: the XML I produce is incompatible with a third-party tool.

Under MSXML I convert my string to a BSTR (wide char), and it ends up back in the XML file as the ASCII equivalent. Unfortunately this does not work for accented characters in the content section of an element; that part of the string stays in wide-char form in the XML file. This is trouble because the XML file is passed to a third-party utility that supports neither this wide-char form, nor the usual mapping like [& eaccent;], nor an XML header indicating UTF-8 or another encoding.

For example, I save the string:
<Test>Char é</Test>
It will actually be saved as:
<Test>Char Ç</Test>
where 'Ç' is an example of some accented char (not necessarily 'é').

Note that you must open the XML in binary to see this. Notepad, Visual Studio and XmlPad all display the accented character fine (they detect the wide char, unlike my third-party tool). If I open the XML in a hex editor and replace the 2 characters with ASCII # 230, the third-party tool works fine.

So my question is: how can I force a char 230 (or other bad char) to actually be written into the XML file using MSXML 4.0?

My code to create the XML node:
Example: CreateElementNode( myXmlDoc, "Node_Name", "Test é" );
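BSTR AsciiToBSTR( LPCSTR pszFText )
{
    // TROUBLE with: "àáâäçèéêëìíîòóô"; (and more?)
    // First call: ask how many WCHARs are needed, including the terminating NUL.
    int wSize = ::MultiByteToWideChar( CP_ACP, 0, pszFText, -1, NULL, 0 );
    if ( wSize <= 0 )
        return NULL;
    // Allocate the BSTR directly (room for wSize - 1 chars plus the NUL) and convert into it.
    BSTR bsText = ::SysAllocStringLen( NULL, wSize - 1 );
    if ( bsText != NULL )
        ::MultiByteToWideChar( CP_ACP, 0, pszFText, -1, bsText, wSize );
    return bsText; // caller releases with SysFreeString
}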
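MSXML2::IXMLDOMNodePtr CreateElementNode( MSXML2::IXMLDOMDocument2Ptr pXMLDoc, string sName, string sNamespaceURI )
{
    // AsciiToBSTR takes a C string, so pass c_str() rather than the std::string itself.
    BSTR bsName = AsciiToBSTR( sName.c_str() );
    BSTR bsNamespaceURI = AsciiToBSTR( sNamespaceURI.c_str() );

    VARIANT vtype;
    vtype.vt = VT_I4;
    V_I4( &vtype ) = (int)MSXML2::NODE_ELEMENT;

    MSXML2::IXMLDOMNodePtr node = pXMLDoc->createNode( vtype, bsName, bsNamespaceURI );

    // Assuming the #import-generated wrapper (it takes _bstr_t and copies the strings),
    // release our own copies here.
    ::SysFreeString( bsName );
    ::SysFreeString( bsNamespaceURI );
    return node;
}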
-
I am not trying to nitpick (honestly :) ), but there is no char 230 in ASCII. ASCII is a 7-bit encoding and covers only the values 0-127. What you are doing there is using the computer's system code page (CP_ACP), which is configurable, so it may work on your machine but not on someone else's. Anyway, Ç is just the UTF-8 representation of your accented character. UTF-8 is the default encoding for XML files. If the third-party tool does not recognize UTF-8, you'll need to explicitly save your XML document in an encoding it does recognize. You'll also need to insert the "encoding" instruction accordingly.
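To make the UTF-8 point concrete, here is a minimal, portable sketch (not MSXML code) of how a single code point is turned into UTF-8 bytes. The function name encode_utf8 is hypothetical and exists only for this illustration. Any value below 0x80 (plain ASCII) stays a single byte, while a value such as 0xE9 (é) is split across two bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8.
// encode_utf8 is a hypothetical helper for illustration only.
std::string encode_utf8(std::uint32_t cp)
{
    // Reject values outside Unicode and the UTF-16 surrogate range.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::string out;
    if (cp < 0x80) {
        // 7-bit ASCII is stored unchanged in one byte.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For 0xE9 this yields the two bytes C3 A9, while 'T' (0x54) stays a single byte. That is why only the accented characters look different in a hex editor.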
-
“you'll need to explicitly save your XML document in an encoding it does recognize”

That's what I am trying to do. Here is an example in binary (the hex values are almost random; I am not sure they map to an actual accented char).

I have the string: “54, 65, 73, 74, 3a”
The BSTR becomes: “54, 00, 65, 00, 73, 00, 74, 00, 3a, 00”
The XML file opened in binary is: “54, 65, 73, 74, 00, 3a, 00”

So a mix of ASCII and wide char (???). If I manually replace “74, 00, 3a, 00” with “74, 3a”, then all is fine. Surely there must be a way to force the “extended” chars to be re-mapped like all the other chars? Is that what you mean by “you'll need to insert the "encoding" instruction accordingly”? But how can I do this through MSXML?
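For reference, the "xx, 00" pattern in the BSTR is simply UTF-16 stored little-endian. A minimal, portable sketch of that layout follows (not MSXML code; utf16le_bytes is a hypothetical name used only for this illustration):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Serialize UTF-16 code units as little-endian bytes, which is how a BSTR
// is laid out in memory on Windows. utf16le_bytes is a hypothetical helper.
std::vector<std::uint8_t> utf16le_bytes(const std::u16string& s)
{
    std::vector<std::uint8_t> out;
    out.reserve(s.size() * 2);
    for (char16_t u : s) {
        out.push_back(static_cast<std::uint8_t>(u & 0xFF)); // low byte first
        out.push_back(static_cast<std::uint8_t>(u >> 8));   // then high byte
    }
    assert(out.size() == s.size() * 2); // always two bytes per code unit
    return out;
}
```

Every character below 0x100, accented or not, comes out as its value followed by 00. What ends up in the saved file depends on the encoding chosen at save time, not on this in-memory form.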
-
You really just need to add the processing instruction to specify the correct encoding, and MSXML will save it correctly. For the details on how to do it, take a look at this MSDN article.
-
A reply to myself, for those not following the secondary thread:

=> When I save the character "E9" (é), it becomes "E9 00" in the BSTR (no problem), but MSXML writes "C3 A9" in the actual XML file. Why is it doing this at all? If I then edit the XML to replace "C3 A9" with "E9", everything works! How can I force MSXML to write "E9" (what I pass to it) instead of translating my "E9" into "C3 A9"?
-
I already tried that, to no avail. The solution indicated is to add an XML declaration header specifying the encoding, but the third-party tool does not support this. If the line is present, the tool craps out on me.

=> When I save the character "E9" (é), it becomes "E9 00" in the BSTR (no problem), but MSXML writes "C3 A9" in the actual XML file. Why is it doing this at all? If I then edit the XML to replace "C3 A9" with "E9", everything works! How can I force MSXML to write "E9" (what I pass to it) instead of translating my "E9" into "C3 A9"?
-
OK, so now I know that C3 A9 is the UTF-8 for my extended ASCII character. What I need to know is how to force MSXML to write extended ASCII without writing the XML declaration at the start of the file (if it is present, the external tool crashes).
-
So if I understand you correctly: if you add the processing instruction with encoding='ISO-8859-1', MSXML outputs just E9 and not C3 A9, but the third-party tool crashes? If that is the case, the problem really lies with the tool; MSXML's output is good.

Having said that, if you want output encoded as Windows CP 1252 or ISO-8859-1 without the processing instruction (I'm still not sure what you mean by 'extended ASCII'; there are numerous encodings that extend ASCII), one way I can think of is to write your XML document to a temporary UTF-8 file and then convert that file from UTF-8 to Windows CP 1252 (or whatever encoding you really want) in a separate step. Just remember that the result will not be valid XML: without the encoding declaration, a conforming parser assumes UTF-8 and will reject the lone E9 byte. This tool may be happy with it, though.
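The separate conversion step could look roughly like the sketch below. It is a minimal, portable illustration, not MSXML code, and it assumes the target is ISO-8859-1 (which agrees with Windows CP 1252 for é and the other accented letters from the question, but differs in the 0x80-0x9F range). The name utf8_to_latin1 is hypothetical. You would read the UTF-8 file into a string, pass it through this function, and write the result back in binary mode.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Convert UTF-8 text to ISO-8859-1 (one byte per character).
// utf8_to_latin1 is a hypothetical helper for illustration only.
// Throws on malformed input or on characters above U+00FF.
std::string utf8_to_latin1(const std::string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        if (b < 0x80) {
            // Plain ASCII is identical in both encodings.
            out += static_cast<char>(b);
        } else if ((b == 0xC2 || b == 0xC3) && i + 1 < in.size()) {
            // Lead bytes C2 and C3 cover exactly U+0080..U+00FF.
            unsigned char c = static_cast<unsigned char>(in[i + 1]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("malformed UTF-8 sequence");
            out += static_cast<char>(((b & 0x03) << 6) | (c & 0x3F));
            ++i; // skip the continuation byte just consumed
        } else {
            throw std::runtime_error("malformed UTF-8 or character outside ISO-8859-1");
        }
    }
    assert(out.size() <= in.size()); // the output can only shrink
    return out;
}
```

With this, the bytes C3 A9 in the UTF-8 file collapse to the single byte E9. Throwing on anything above U+00FF is a deliberate choice in this sketch, so that characters the target encoding cannot hold are never silently dropped.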