Converting a Doc file to XML and reading the XML
-
I am creating a web application that needs to convert document files into XML and then search the XML files for specific words in a specific format. I am using Microsoft.Office.Interop to convert the document files to XML. The files are generated, but they contain a lot of formatting information, which makes them very large. I need help writing code that reduces the XML files by removing the unwanted document formatting, with the option to preserve it where required. Thanks in advance.
Learner always
-
To get rid of the document formatting, consider generating the XML document yourself instead of relying on Word's own XML export. You can use Automation or a third-party component (a better option, in my opinion) to access the Word document, and then use XSL to transform the data into a light XML file.
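As a rough illustration of the "generate the XML yourself" idea, here is a minimal Python sketch. It assumes the document has been saved as .docx (Office Open XML), whose main part word/document.xml uses the WordprocessingML namespace shown below. The output element names (document, p) and the function names are made up for this example, and the sketch walks the XML directly rather than using XSL. It keeps only paragraph text and drops all run and paragraph properties, so treat it as a starting point and not a complete converter.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by word/document.xml inside a .docx package
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def load_main_part(docx_path):
    """Read the main document part from a .docx file (a zip package)."""
    with zipfile.ZipFile(docx_path) as package:
        return package.read("word/document.xml").decode("utf-8")


def to_light_xml(wordml):
    """Reduce WordprocessingML to <document><p>text</p>...</document>.

    The output element names are hypothetical, chosen for this sketch.
    """
    root = ET.fromstring(wordml)
    light = ET.Element("document")
    for para in root.iter(f"{{{W_NS}}}p"):
        # Join the text of every w:t in the paragraph; formatting
        # elements such as w:pPr and w:rPr are simply not copied.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        if text.strip():
            ET.SubElement(light, "p").text = text
    return ET.tostring(light, encoding="unicode")


def find_paragraphs(light_xml, word):
    """Return the text of each paragraph containing `word` (case-insensitive)."""
    root = ET.fromstring(light_xml)
    needle = word.lower()
    return [p.text for p in root.iter("p") if needle in (p.text or "").lower()]


# Small hand-written sample standing in for a real word/document.xml
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    '<w:p><w:pPr><w:jc w:val="center"/></w:pPr>'
    '<w:r><w:rPr><w:b/></w:rPr><w:t>Invoice </w:t></w:r>'
    '<w:r><w:t>summary</w:t></w:r></w:p>'
    '<w:p><w:r><w:t>Total due: 42</w:t></w:r></w:p>'
    '</w:body></w:document>'
)

if __name__ == "__main__":
    light = to_light_xml(SAMPLE)
    print(light)
    print(find_paragraphs(light, "total"))
```

For a real file you would pass the result of load_main_part("yourfile.docx") to to_light_xml. If some formatting has to be preserved (bold runs, for instance), the loop is the place to inspect w:rPr and emit an extra attribute or element of your own choosing.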