Shortcoming with XmlReader (2.0)
-
I lost a few hours to this problem this morning, and after I discovered the workaround I felt like ranting, but now I've cooled down a bit (and had a beer). Anyway...

I have an XML file with embedded stylesheets (XSL). One of these stylesheets transforms the XML to CSV. An important part of this transform is the insertion of linefeeds, which I do with

<xsl:text>&#10;</xsl:text>

and it worked fine until I started using an XmlReader to perform a validated read.

This works; the resultant XmlElement's text is a linefeed.

System.Xml.XmlDocument doc = new System.Xml.XmlDocument() ;
doc.Load ( @"\XMLtest.xml" ) ;

This doesn't work; the resultant XmlElement's text is empty.

System.Xml.XmlDocument doc = new System.Xml.XmlDocument() ;
System.Xml.XmlReaderSettings rs = new System.Xml.XmlReaderSettings() ;
// (set the XmlReaderSettings properties)
doc.Load ( System.Xml.XmlReader.Create ( @"\XMLtest.xml" , rs ) ) ;
So then, looking through the help for XmlReader I see: "XmlReader objects created by the Create method expand all entities automatically." So I assume that the entity gets expanded, then the linefeed (whitespace) is determined to be non-essential and removed, leaving an empty value.

Looking further I see: "If you must expand entities on request (readers created by the Create method expand all entities), or if you do not want your text content normalized, use the XmlTextReader class."

Now wait a minute! Isn't XmlTextReader no longer the recommended practice? "In the Microsoft .NET Framework version 2.0 release, the recommended practice is to create XmlReader instances using the System.Xml.XmlReader.Create method. XmlReader objects created by the Create method are, by default, more conformant than the XmlTextReader implementation."

So the workaround I chose is:
System.Xml.XmlDocument doc = new System.Xml.XmlDocument() ;
System.Xml.XmlReaderSettings rs = new System.Xml.XmlReaderSettings() ;
// (set the XmlReaderSettings properties)
doc.Load ( System.Xml.XmlReader.Create ( new System.Xml.XmlTextReader ( @"\XMLtest.xml" ) , rs ) ) ;
It gets the job done, but it seems odd that there isn't a property in XmlReaderSettings to do this. I tried the IgnoreWhitespace and CheckCharacters properties, but to no avail.

My question is: does .NET 3.0 solve this issue? Does anyone have a cleaner workaround?
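One possibly cleaner option, offered as a sketch rather than a confirmed fix for the file above: XmlDocument has a PreserveWhitespace property (false by default), and while it is false the document drops nodes that the reader reports as plain whitespace. The code below assumes that this is what happens to the linefeed. The <root>/<sep> markup and the WhitespaceProbe class are hypothetical stand-ins for the real file, and the validation settings are left out.

```csharp
using System.IO;
using System.Xml;

public static class WhitespaceProbe
{
    // Hypothetical stand-in for the stylesheet fragment: an element whose
    // only content is a linefeed written as a character reference.
    public const string Sample = "<root><sep>&#10;</sep></root>";

    // Reports how a reader built by XmlReader.Create classifies the content of <sep>.
    public static XmlNodeType ContentNodeType(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), new XmlReaderSettings()))
        {
            reader.ReadToFollowing("sep"); // position on the <sep> start tag
            reader.Read();                 // move to its first child node
            return reader.NodeType;
        }
    }

    // Loads the document through XmlReader.Create and returns the text of <sep>.
    public static string LoadSeparator(string xml, bool preserveWhitespace)
    {
        XmlDocument doc = new XmlDocument();
        doc.PreserveWhitespace = preserveWhitespace; // must be set before Load
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), new XmlReaderSettings()))
        {
            doc.Load(reader);
        }
        return doc.DocumentElement.FirstChild.InnerText;
    }
}
```

Calling WhitespaceProbe.LoadSeparator(WhitespaceProbe.Sample, false) should reproduce the empty value, and passing true should keep the linefeed. Be aware that PreserveWhitespace = true also keeps indentation whitespace everywhere else in the document, which may or may not be acceptable.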
-
I think you may need to use the xml:space attribute in your XML document. It has two values, "default" and "preserve". "default" tells the XML processor to handle whitespace as it sees fit, which is also what the processor does when the attribute is absent. "preserve" means to keep whitespace as is. Also, &#10; is not an entity, and &gt; is an entity.

Example:

<poem xml:space="preserve"> ... all whitespace will be preserved in all child nodes (remember text is considered a node) ... </poem>

Unfortunately, I found the following: http://www.stylusstudio.com/xmldev/200307/post70060.html#

-- modified at 20:16 Friday 1st June, 2007
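A minimal sketch of how the xml:space suggestion might look from the C# side. The <root>/<sep> markup and the XmlSpaceDemo class are made up for illustration, and whether the attribute can be added to the stylesheet in question is something only the original poster can check.

```csharp
using System.IO;
using System.Xml;

public static class XmlSpaceDemo
{
    // Hypothetical markup: the same element without and with xml:space="preserve".
    public const string Plain = "<root><sep>&#10;</sep></root>";
    public const string Preserved = "<root><sep xml:space=\"preserve\">&#10;</sep></root>";

    // Loads through XmlReader.Create with PreserveWhitespace left at its
    // default (false) and returns the text of <sep>.
    public static string SeparatorText(string xml)
    {
        XmlDocument doc = new XmlDocument();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            doc.Load(reader);
        }
        return doc.SelectSingleNode("/root/sep").InnerText;
    }
}
```

With xml:space="preserve" in scope the reader reports the linefeed as significant whitespace, which XmlDocument keeps even when PreserveWhitespace is false, so XmlSpaceDemo.SeparatorText(XmlSpaceDemo.Preserved) should return the linefeed while the Plain variant comes back empty.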
"We make a living by what we get, we make a life by what we give." --Winston Churchill
-
I'll give that a try, thanks.
-
George L. Jackson wrote:
"Also, &#10; is not an entity, and &gt; is an entity."
It's a character reference! Awesome, I didn't know there was a difference between them until now.
"Throughout human history, we have been dependent on machines to survive. Fate, it seems, is not without a sense of irony. " - Morpheus "Real men use mspaint for writing code and notepad for designing graphics." - Anna-Jayne Metcalfe