XML and performance
-
I need to scan a very large XML file (several million records), and I have found that access to the data is very slow. On a 3 GHz Pentium 4, the get_text call below takes 9 milliseconds:

    BSTR BSTR_Result;
    IXMLDOMNodePtr pNode;
    // scan tree of nodes
    ...
    pNode = pNode->nextSibling;        // very quick (0 ms)
    // get the text of the node
    pNode->get_text(&BSTR_Result);     // takes 9 milliseconds

When scanning the whole file, the call to nextSibling is very quick, but reading the value with get_text is very slow. How can I increase performance? Best regards.
-
marcelcerdanjunior wrote:
How can I increase performance?
You can try switching to a SAX parser, but I doubt that will satisfy your 9 ms requirement. It is far more likely that you are abusing XML. XML is NOT a replacement for databases. You will probably have to use some form of optimized database to satisfy your 9 ms requirement.
Last modified: after originally posted -- clicked wrong button
led mike
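To make the SAX suggestion concrete: a SAX-style parser walks the file once, front to back, and reports events (element start, text, element end) instead of building a tree in memory. The sketch below is not the MSXML SAX2 API. It is a hand-written, forward-only scanner in standard C++ that only illustrates the idea. The tag names record, id and name are hypothetical (the thread does not show the real ones), as are the names Handlers, scanXml and findName, and the scanner ignores attributes and does not handle comments, CDATA or entities.

```cpp
#include <cassert>
#include <functional>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical event callbacks, loosely modelled on the SAX idea:
// the scanner pushes events to the caller and never builds a tree.
struct Handlers {
    std::function<void(const std::string&)> startElement;
    std::function<void(const std::string&)> characters;
    std::function<bool(const std::string&)> endElement;  // return false to stop
};

// Forward-only scan of very simple XML: <tag>, </tag>, <tag/> and text.
// Attributes are skipped; comments, CDATA and entities are NOT handled.
void scanXml(std::istream& in, const Handlers& h) {
    assert(h.startElement && h.characters && h.endElement);
    std::string text, tag;
    char c;
    while (in.get(c)) {
        if (c != '<') { text.push_back(c); continue; }
        if (!text.empty()) { h.characters(text); text.clear(); }
        tag.clear();
        while (in.get(c) && c != '>') tag.push_back(c);
        if (tag.empty() || tag[0] == '?' || tag[0] == '!') continue;  // skip <?xml ...?> etc.
        if (tag[0] == '/') {
            if (!h.endElement(tag.substr(1))) return;
            continue;
        }
        const bool selfClosing = tag.back() == '/';
        const std::string name = tag.substr(0, tag.find_first_of(" \t\r\n/"));
        h.startElement(name);
        if (selfClosing && !h.endElement(name)) return;
    }
}

// Returns the <name> text of the first <record> whose <id> equals wantedId,
// or an empty string if no such record exists. Stops at the first match.
std::string findName(std::istream& in, const std::string& wantedId) {
    std::string current, id, name, result;
    Handlers h;
    h.startElement = [&](const std::string& n) {
        current = n;
        if (n == "record") { id.clear(); name.clear(); }
    };
    h.characters = [&](const std::string& t) {
        if (current == "id") id += t;
        else if (current == "name") name += t;
    };
    h.endElement = [&](const std::string& n) {
        current.clear();
        if (n == "record" && id == wantedId) { result = name; return false; }
        return true;
    };
    scanXml(in, h);
    return result;
}

// Convenience wrapper for in-memory text; a real file would use std::ifstream.
std::string findNameInText(const std::string& xmlText, const std::string& wantedId) {
    std::istringstream in(xmlText);
    return findName(in, wantedId);
}
```

Memory use stays roughly constant regardless of file size because only the current record is held, but every lookup still reads the file from the start, which is why this may not be enough for repeated lookups.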
-
It is strange that .NET gives acceptable performance (less than 1 second to find a record in a list of a million), but the msxml4.dll interface does not.
-
marcelcerdanjunior wrote:
It is strange that .NET gives acceptable performance (less than 1 second to find a record in a list of a million), but the msxml4.dll interface does not.
Why is that strange? It would be strange if they were the same thing, but since they are not, why is it strange? There is a web page out there somewhere that lists the many XML parsers along with performance information. But again, a million records just screams "Database".
led mike
-
If you are trying to find 1 record in a list of a million, why not use selectSingleNode() instead of looping through each node yourself?
-
Thanks, I have tried your suggestion, and I get the following results with the selectSingleNode() call (ms = milliseconds):

/*
nb records   table load duration   last record identifier to access   access time
----------------------------------------------------------------------------------
1000000      75000 ms              "999999"                           ~ 43000 ms
100000       2119 ms               "99999"                            52 ms
10000        160 ms                "9999"                             5 ms
*/

My table looks like:

Bigtest
0000000 Guest Guest 0 06.00.00.00.00 true
0000001 Guest Guest 1 06.00.00.00.00 true
0000002 Guest Guest 2 06.00.00.00.00 true
...
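The access times above grow much faster than the record count: going from 10,000 to 100,000 records costs about 10 times more (5 ms to 52 ms), but going from 100,000 to 1,000,000 costs roughly 800 times more (52 ms to about 43,000 ms). One way to read the earlier "Database" advice is that repeated lookups by identifier are better served by an index built once than by a search through the document on every call. The sketch below assumes the records have already been loaded into memory by some means; the Record fields, RecordIndex, findLinear and makeTable are hypothetical names for illustration and are not part of MSXML or of the original poster's code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record layout; the real element names are not shown in the thread.
struct Record {
    std::string id;
    std::string name;
    bool enabled;
};

// Baseline: walk the table until the identifier matches (cost grows with table size).
const Record* findLinear(const std::vector<Record>& table, const std::string& id) {
    for (const Record& r : table) {
        if (r.id == id) return &r;
    }
    return nullptr;
}

// Index built once over the table; each lookup is then a hash probe
// whose average cost does not depend on the number of records.
// The table must outlive the index and must not be resized afterwards.
class RecordIndex {
public:
    explicit RecordIndex(const std::vector<Record>& table) : table_(table) {
        index_.reserve(table.size());
        for (std::size_t i = 0; i < table.size(); ++i) {
            index_.emplace(table[i].id, i);  // first occurrence wins on duplicate ids
        }
    }

    const Record* find(const std::string& id) const {
        const auto it = index_.find(id);
        return it == index_.end() ? nullptr : &table_[it->second];
    }

private:
    const std::vector<Record>& table_;
    std::unordered_map<std::string, std::size_t> index_;
};

// Builds a synthetic table with zero-padded 7-digit identifiers,
// similar in shape to the sample shown in the post.
std::vector<Record> makeTable(std::size_t count) {
    std::vector<Record> table;
    table.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        const std::string digits = std::to_string(i);
        const std::size_t pad = digits.size() < 7 ? 7 - digits.size() : 0;
        table.push_back({std::string(pad, '0') + digits,
                         "Guest " + std::to_string(i), true});
    }
    return table;
}
```

The trade-off is extra memory for the index and a one-time build cost, which only pays off when the same table is queried many times.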