Which is the better choice: DOM or SAX?
-
I have a big XML file that I want to transform with an XSLT stylesheet; which is the better choice? I used the DOM to process the file, but found it needs a huge amount of memory. So I tried SAX, but it can't transform the file with the XSLT stylesheet.
-
Clark John wrote: So I tried SAX, but it can't transform the file with the XSLT stylesheet. Brief discussion on the topic[^] Not having used SAX I am no authority, but from what I know you use SAX when you want to do simple enough tasks on large XML documents. I think the main thing about SAX is that it is a streaming, forward-only method, unlike the DOM. That is why XSLT via SAX can be painful (XSL templates jump back and forth in an XML document.)
Paul Watson
Bluegrass
Cape Town, South Africa
Ray Cassick wrote: Well I am not female, not gay and I am not Paul Watson
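The forward-only point above can be sketched in code. A minimal Java example (Java since the thread later discusses the Java parsers; the XML snippet and class name are made up) that counts elements with SAX without ever building a tree in memory:

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxCount {
    // Stream the document and count <item> elements; SAX never holds
    // more than the current event in memory, unlike a DOM tree.
    static int countItems(String xml) throws Exception {
        final int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml.getBytes("UTF-8")),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    if (qName.equals("item")) count[0]++;
                }
            });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countItems("<items><item/><item/><item/></items>")); // 3
    }
}
```

Because the parser only moves forward, a stylesheet that needs to revisit earlier nodes has nothing to revisit, which is the core of the problem above.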
-
Thanks. But the DOM needs about 100 MB of memory to process a 3.89 MB XML file. How can I reduce the memory usage of the DOM?
-
Clark John wrote: But the DOM needs about 100 MB of memory to process a 3.89 MB XML file. How can I reduce the memory usage of the DOM? Hmmm, I have not had any experience working with such large XML files (I would consider 3.89 MB big for an XML file; it must have a ton of entries.) Normally I use XML as an intermediary format (database -> XML -> XHTML/whatever) so the files never get big. Just a thought, and I may be totally off the mark, but you might open the XML file as a normal text file, extract the block of nodes you want out of it, and save those as a subset file of the original XML file. Then run that through the DOM. Also, XSL on a 3.89 MB file could potentially be a killer. This CP article may have some ideas for you, and this seems like a starting point for some info. Good luck :)
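The subset idea above can be sketched roughly like this (Java, with an invented `<orders>` block standing in for the interesting nodes; the naive string search is only for illustration, not a general solution):

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class SubsetDom {
    // Treat the huge file as plain text, cut out just the block of
    // nodes we care about, and let DOM hold only that small subset.
    static Document parseSubset(String whole, String tag) throws Exception {
        int start = whole.indexOf("<" + tag + ">");
        int end = whole.indexOf("</" + tag + ">") + tag.length() + 3; // past "</tag>"
        String subset = whole.substring(start, end);
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(subset.getBytes("UTF-8")));
    }

    public static void main(String[] args) throws Exception {
        String whole = "<log><junk/><orders><order/><order/></orders><junk/></log>";
        Document doc = parseSubset(whole, "orders");
        System.out.println(doc.getElementsByTagName("order").getLength()); // 2
    }
}
```

In real code the extraction step would be done with SAX or a buffered reader rather than loading the whole file into a string, but the principle is the same: the DOM only ever sees the subset.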
-
What SAX implementation are you using? You can definitely transform an XML data set with either (assuming your implementation supports transformations.) However, what you have to understand is that the order of processing is not necessarily the same, so the XPath expressions in the XSLT definition may need to differ depending on whether a DOM implementation or a SAX implementation handles the file. For large files SAX definitely takes less memory. "We are what we repeatedly do. Excellence, then, is not an act, but a habit." - Aristotle
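For illustration, a minimal transform through the standard `javax.xml.transform` API, which accepts stream-backed or DOM-backed sources interchangeably (the document and stylesheet here are made up; swap in a `DOMSource` if the tree is already in memory):

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltDemo {
    // Compile the stylesheet and run it over a stream-backed source;
    // Transformer.transform also accepts a DOMSource or SAXSource.
    static String transform(String xml, String xsl) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<names><name>Ann</name><name>Bob</name></names>";
        String xsl = "<xsl:stylesheet version=\"1.0\""
                + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"text\"/>"
                + "<xsl:template match=\"/names\">"
                + "<xsl:for-each select=\"name\"><xsl:value-of select=\".\"/>,</xsl:for-each>"
                + "</xsl:template></xsl:stylesheet>";
        System.out.println(transform(xml, xsl)); // Ann,Bob,
    }
}
```

Note that even with a stream source, an XSLT processor is generally free to build an internal tree, so this alone does not guarantee SAX-like memory usage.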
-
Thank you! :)
-
Can you give me some advice on a SAX implementation? Thank you.
-
I definitely am willing to offer some comments. First, some of my background so you understand the limits of what I can offer. For the last 18 months I have not been programming (much), as I am now a system architect (a manager, egad.) For our needs we have been building upon the Apache Cocoon project. It is, however, Java based (again, others have been doing the programming that I have spec'd; I am still a C++ guy.) Because of that, most of our parser usage has been with the Java versions of Apache Xerces and Xalan. A warning here: there are C++ versions, but I have found that Xalan is quirky and have stayed away from it.

I am assuming you are using the MSXML parser. I have not used its SAX implementation. However, for some reference, just type MSXML and SAX into Google. I just did, and the first page was http://www.perfectxml.com/msxmlsax.asp It looks like it has an intro and several references; it might be a starting point. I then added Transformation to the search and found http://www.perfectxml.com/msxmlxslt.asp Google must like perfectxml today :)

The big issue here is what is happening between the parsers. With DOM you have the whole document and can process the data however you see fit. With SAX you are responding to events that are fired as the data stream is read. You may have to think a little here, but the impact is that the sequence in which you have to respond to those events may be different.

I suggest getting a tool that will step through, or at least display, the results of a transformation. Start with a simple example to get a feel for what is happening. Some I can recommend are Marrowsoft's ($, but a 30-day eval) and StylusStudio (same). There are some other free tools also; see http://www.garshol.priv.no/download/xmltools/name_ix.html I have used "CookTop" http://www.xmlcooktop.com/ from the above list, and for a free tool it works fairly well.

I hope this helps some. "We are what we repeatedly do. Excellence, then, is not an act, but a habit." - Aristotle
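The "responding to events as the stream is read" point can be made concrete with a tiny handler that records the callbacks in the order they fire (a Java/JAXP sketch with a made-up document; MSXML's SAX2 interfaces follow the same event model):

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTrace {
    // Record SAX callbacks in the order the parser fires them: strictly
    // document order, one pass, no way to jump back to an earlier node.
    static String trace(String xml) throws Exception {
        StringBuilder sb = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml.getBytes("UTF-8")),
            new DefaultHandler() {
                @Override
                public void startElement(String u, String l, String q, Attributes a) {
                    sb.append("start:" + q + " ");
                }
                @Override
                public void endElement(String u, String l, String q) {
                    sb.append("end:" + q + " ");
                }
            });
        return sb.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(trace("<a><b/><c/></a>")); // start:a start:b end:b start:c end:c end:a
    }
}
```

Any state you need from earlier in the document (say, a parent's attribute) has to be saved by the handler when its event fires; that bookkeeping is the price of the low memory footprint.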
-
Paul Watson wrote: Hmmm, I have not had any experience with working with such large XML files (I would consider 3.89M big for an XML file, must have a ton of entries.) Actually, you'd be surprised at how big some XML files can get. Paul Watson wrote: Just a thought and I may be totally off the mark but you might just open the XML file as a normal text file and then extract the block of nodes you want out of it and save those as a subset file of the original XML file. Then run that through DOM. That's right on. At my last place we handled transactions/data warehousing for utility companies. Every company sent data in some bizarre flat-file format, EDI, or XML. Depending on the frequency of the transactions, we could see 10 MB+ XML files. We were just doing XML -> DB or DB -> XML -> format 'X'. Working on the XML files, we would open up a stream and use SAX to quickly parse the XML. At that point one could just use XSL to verify content (we used a custom rules engine to apply business logic.) Performance wasn't bad with this approach, but I have to qualify this by saying that our implementation was done in Java, with an optimized JVM.
-
Mr. Manager ;) It's a "perfect XML", thank you!