Editable XML from MSXML
-
Michael A. Barnhart wrote: ... all of that formatting is contained in nodes of the parent element(s).

I'm not sure I understand that. Actual formatting can be done with XSLT to produce a revised file, typically an HTML file, but I only know that from a few examples. If I want to get something that looks like this (with newlines):

[Node1]
  [Data2]Here is data[/Data2]
[/Node1]

what I end up with is this (no newlines):

[Node1][Data2]Here is data[/Data2][/Node1]

Even if I just type this into an XML file (with appropriate headers, etc.), I can physically enter a newline at the end of each line, and that is what I am missing from the DOM. Are you saying that I can add that "formatting" (the newlines) as additional text nodes, perhaps just containing "\n"?

Dave

"You can say that again." -- Dept. of Redundancy Dept.
David,

David Chamberlain wrote: Are you saying that I can add that "formatting" (the newlines) as additional text nodes, perhaps just containing "\n"?

In short, yes.

Just from a first pass, it appears to me that in creating the edited file you are skipping "text" nodes. All formatting lives in text nodes of some element. In your example you are keeping the single text node that belongs to element Data2, i.e. "Here is data". However, you are losing the text nodes that belong to the element called Node1.

Note that the element Node1 has at least 3 child nodes, only one of which is the element Data2. Before Data2 comes a text node holding a linefeed/carriage return and 2 spaces. Then comes the element node Data2, which is followed by a text node that ends with a single linefeed/carriage return. Have I lost you here?

To keep things straight, one must be careful about what a child node is. An element is only one type of node. Yes, you can add formatting with XSLT, but the formatting already exists.

I hope this helps.

Good ideas are not adopted automatically. They must be driven into practice with courageous patience. -Admiral Rickover. ...
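To make the "three child nodes" point concrete, here is a minimal sketch in plain C++. It is a toy model, not the MSXML API: the `Node` struct and the helpers `addChild`, `serialize`, `buildCompact`, and `buildFormatted` are all hypothetical names invented for this illustration. It only shows that whitespace text nodes placed around Data2 are what produce the line breaks when the tree is written back out.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy node model for illustration only (this is not the MSXML API).
struct Node {
    enum class Kind { Element, Text };
    Kind kind;
    std::string value;  // tag name for elements, characters for text
    std::vector<std::unique_ptr<Node>> children;
    Node(Kind k, std::string v) : kind(k), value(std::move(v)) {}
};

// Append a child to an element and return a reference to the new child.
Node& addChild(Node& parent, Node::Kind kind, const std::string& value) {
    assert(parent.kind == Node::Kind::Element);  // text nodes hold no children
    parent.children.push_back(std::unique_ptr<Node>(new Node(kind, value)));
    return *parent.children.back();
}

// Write the tree back out. Text is emitted verbatim (no escaping).
std::string serialize(const Node& n) {
    if (n.kind == Node::Kind::Text) return n.value;
    std::string out = "<" + n.value + ">";
    for (const auto& child : n.children) out += serialize(*child);
    return out + "</" + n.value + ">";
}

// Node1 with only the Data2 element as a child: everything lands on one line.
std::string buildCompact() {
    Node root(Node::Kind::Element, "Node1");
    Node& data = addChild(root, Node::Kind::Element, "Data2");
    addChild(data, Node::Kind::Text, "Here is data");
    return serialize(root);
}

// Node1 with three children: whitespace text, Data2, whitespace text.
std::string buildFormatted() {
    Node root(Node::Kind::Element, "Node1");
    addChild(root, Node::Kind::Text, "\n  ");
    Node& data = addChild(root, Node::Kind::Element, "Data2");
    addChild(data, Node::Kind::Text, "Here is data");
    addChild(root, Node::Kind::Text, "\n");
    return serialize(root);
}
```

Calling `buildCompact()` yields the single-line form from the question, while `buildFormatted()` yields the three-line form. The only difference between the two trees is the pair of extra text nodes under Node1.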
-
It has been a while since I've done any work with XML, but from memory there are options to control how newlines are handled. That wasn't with MSXML, but I think it is a common issue. Have you looked to see if there are any options or flags which may resolve this?

Beware that if you are interested in working with largish XML files, MSXML can be a dog: it uses memory like there is no tomorrow and runs very, very slowly.

Neville Franks, Author of ED for Windows. www.getsoft.com
-
Neville Franks wrote: Beware that if you are interested in working with largish XML files, MSXML can be a dog: it uses memory like there is no tomorrow and runs very, very slowly.

Yes, it makes child nodes for everything. So long as the file fits in memory, version 4 is much faster than 3. For large files, SAX parsing is usually preferable to a DOM parser.

Good ideas are not adopted automatically. They must be driven into practice with courageous patience. -Admiral Rickover. ...
-
Okay. So I really should have something like this?

[Node1][EOL]"\n"[/EOL]
[Data2]Here is data[/Data2][EOL]"\n"[/EOL]
[/Node1][EOL]"\n"[/EOL]

where the EOL text nodes are effectively "hidden" because they only show up as the newlines? I guess that makes sense, but it sure isn't what I thought it should be.

This, of course, is only of use if I really do need to look at or modify the file in an editor. If I just use the web browser to view the file, none of this matters. And, if I only access the file through the application, then none of it matters.

Thanks,
Dave

"You can say that again." -- Dept. of Redundancy Dept.
-
First, my apologies for not being a better writer. I am not getting the story across very well.

An element is a node, but a node is not necessarily an element. A node could be an element, but it could also be text, a processing instruction, a comment, etc. In your example:

David Chamberlain wrote:
[Node1][EOL]"\n"[/EOL]
[Data2]Here is data[/Data2][EOL]"\n"[/EOL]
[/Node1][EOL]"\n"[/EOL]

you have added child elements to the element called Node1. What is missing are child nodes that are not elements; adding more elements is not the answer.

I guess it has been too many months since I stepped through the MS DOM model. What I did to finally get a better feel for what was going on in the MS DOM was to create a simple dialog app that, when a button was pressed, created an MS DOM instance and read in an XML file. I then added code that found the root element of the document and passed it to a function that got the list of child nodes and looked at what types they were. When it found a node that was an element, the function recursed into it. It helped me see all of the items that existed. I am not sure whether I have this saved or not. I left it there and concluded that, for my needs, the class I had written worked fine and I would use it for any manipulation of XML files. I do use the MS DOM to read in files, as well as some of the Apache code.

Good ideas are not adopted automatically. They must be driven into practice with courageous patience. -Admiral Rickover. ...
-
Let me first say that I really appreciate your help, even though what appears to be a simple matter has become quite complicated. Thanks to MS, I'm sure.

So, without creating new child elements, I should just create additional text nodes, ending up like this:

[Node1]                  (Node)
  "\n"                   (Text, child 1 of Node1)
  [Data2]                (Node, child 2 of Node1)
    "Here is data"       (Text, child 1 of Data2)
  [/Data2]
  "\n"                   (Text, child 3 of Node1)
[/Node1]
"\n"                     (Text, child ? of parent-of-Node1)

While I hate the vocabulary of "nodes" and "elements," the only real difference I could see was that "elements" allow access to attributes while "nodes" do not. Either one can have children.

Dave

"You can say that again." -- Dept. of Redundancy Dept.
-
In general, yes to adding the text nodes.

Neville's comment about some control options appears to be correct, and I just had not noticed. With the following code, at one time I received all of the nodes, and now I do not receive the nodes that contain only white space, i.e. exactly the point you made about missing the formatting! It is gone now.

Hopefully this is a start. The first function initializes the process and reads in a specific file. The second function then steps through it in two different ways. If you experiment with putting non-whitespace text data between the elements, I think you will see what I mean.

I am using Win2k with MSXML 4 and tried this out on WinMe, also with MSXML 4. Previously I had run something similar but with only MSXML 3 installed. Whether that is the difference or not, I cannot say.

Take Care

void CMsDomTestDlg::OnButtonread()
{
    row = 0;
    CComVariant varFileName = (LPCSTR)"ourtest.xml";
    VARIANT_BOOL varOkay = VARIANT_FALSE;
    HRESULT hr;
    IXMLDOMDocument *pXML = NULL;
    hr = CoCreateInstance(CLSID_DOMDocument, NULL, CLSCTX_INPROC_SERVER,
                          IID_IXMLDOMDocument, (void**)&pXML);
    ASSERT(SUCCEEDED(hr) && pXML != NULL);
    if (FAILED(hr) || pXML == NULL)
        return;
    hr = pXML->load(varFileName, &varOkay);
    IXMLDOMElement *pRoot = NULL;
    // load() reports a failed parse as S_FALSE, which SUCCEEDED() accepts,
    // so varOkay has to be checked as well.
    if (SUCCEEDED(hr) && varOkay == VARIANT_TRUE)
    {
        hr = pXML->get_documentElement(&pRoot);
        if (SUCCEEDED(hr) && pRoot != NULL)
        {
            LoadChildren((IXMLDOMNode*)pRoot, 0);
            pRoot->Release();
        }
        else
        {
            m_NodeGrid.SetItemText(row, 0, "Model Not Read In");
        }
    }
    pXML->Release();
    Invalidate(TRUE);
}

void CMsDomTestDlg::LoadChildren(IXMLDOMNode* pNode, int depth)
{
    HRESULT hr;
    IXMLDOMNode *child = NULL;
    BOOL Method1 = FALSE;   // TRUE: walk firstChild/nextSibling; FALSE: use the childNodes list
    CString data;
    CString td;
    CComBSTR txt;
    long listlen, listpos;
    DOMNodeType type;
    IXMLDOMNodeList *childlist;
    if (Method1)
    {
        hr = pNode->get_firstChild(&child);
        while (SUCCEEDED(hr) && child != NULL)
        {
            row++;
            data.Format("%d", depth);               // column 0: depth in the tree
            m_NodeGrid.SetItemText(row, 0, data);
            hr = child->get_nodeType(&type);
            data.Format("%d", type);                // column 1: numeric node type
            m_NodeGrid.SetItemText(row, 1, data);
            txt.Empty();                            // free any previous string before reuse
            hr = child->get_text(&txt);
            if (SUCCEEDED(hr))
            {
                td = txt;
                data.Format("%s length of %d", (LPCTSTR)td, td.GetLength());
                m_NodeGrid.SetItemText(row, 2, data);   // column 2: text and its length
            }
            else
            {
                data = "No Text Data";
                m_NodeGrid.SetItemText(row, 2, data);
            }
            txt.Empty();
            hr = child->get_baseName(&txt);
            if (SUCCEEDED(hr))
            {
                data = txt;
                m_NodeGrid.SetItemText(row, 3, data);   // column 3: base name
            }
            else
            {
                data = "No Base Name";
                m_NodeGrid.SetItemText(row, 3, data);
            }
            if (type == NODE_ELEMENT)
            {
                LoadChildren(child, depth + 1);     // recurse into child elements
            }
            // Fetch the sibling into a separate pointer so the current node can be released.
            IXMLDOMNode *next = NULL;
            hr = child->get_nextSibling(&next);
            child->Release();
            child = next;
        }
    }
    else
    {
        hr = pNode->get_childNodes(&childlist);
        childlist->get_length
-
From the MSXML 4 documentation:

When a text file is opened with the xmlDoc.load method or the xmlDoc.loadXML method (where xmlDoc is an XML DOM document), the parser strips most white space from the file, unless specifically directed otherwise. The parser notes within each node whether one or more spaces, tabs, newlines, or carriage returns follow the node in the text by setting a flag. This method is efficient, reducing both the size of each XML file and the number of calculations required to redisplay the XML in a browser. However, because this information is lost, an XML document stored in this manner can lose formatting information in its content. Tabs, in particular, can be lost, because they are not formally recognized in the default mode as anything but white space.

hr = pXML->put_preserveWhiteSpace(VARIANT_TRUE);

Place this before the load, and now the spaces are back :)

Good ideas are not adopted automatically. They must be driven into practice with courageous patience. -Admiral Rickover. ...
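The effect of that flag can be pictured with a rough stand-alone sketch. This is not MSXML's actual algorithm: `tokenize` and `isBlank` are hypothetical helpers, and the input is assumed to be simple markup without comments, CDATA, or attributes containing '>'. It splits the markup into tags and text runs, and drops whitespace-only runs unless a `preserve` argument is set, which is roughly the difference observed above with and without put_preserveWhiteSpace.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// True when the run contains nothing but spaces, tabs, or line breaks.
bool isBlank(const std::string& s) {
    return std::all_of(s.begin(), s.end(),
                       [](unsigned char c) { return std::isspace(c) != 0; });
}

// Split markup into tag tokens ("<...>") and text runs. With preserve set to
// false, whitespace-only text runs are discarded (a toy model of the default).
std::vector<std::string> tokenize(const std::string& xml, bool preserve) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < xml.size()) {
        const bool isTag = (xml[i] == '<');
        // A tag runs through its closing '>'; a text run stops before the next '<'.
        std::size_t end = isTag ? xml.find('>', i) : xml.find('<', i);
        if (end == std::string::npos) end = xml.size();
        else if (isTag) ++end;
        assert(end > i);  // every pass consumes at least one character
        const std::string token = xml.substr(i, end - i);
        if (isTag || preserve || !isBlank(token)) tokens.push_back(token);
        i = end;
    }
    return tokens;
}
```

For the three-line Node1/Data2 document, `tokenize(doc, true)` keeps the "\n  " and "\n" runs as separate tokens next to the tags, while `tokenize(doc, false)` returns only the tags and "Here is data", which is the view of the document that loses its formatting.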
-
Michael A. Barnhart wrote: However, because this information is lost, an XML document stored in this manner can lose formatting information in its content.

First, I sure didn't expect such investigation into this seemingly trivial matter, but I certainly appreciate all the input and help.

Apparently, the preserve white space option is on by default. I had created an XML file in the Visual Studio IDE in order to plan out the structure and content of the file that would eventually be manipulated and maintained by the application program. Once I had that file, I would call the 'load' function and then let the application do its thing, one operation being the creation of new nodes, as described in the earlier posts. At the end of execution, after calling 'save', I would load the file back into the IDE to see what happened and to check that the application had created the new nodes properly.

At that point, what I was seeing was the same file as I had originally created, with all the 'formatting' (spaces, tabs, and newlines) still intact, but the new nodes would appear on a single line. They would be in the proper location in the file, in terms of being after the last child of the node being added to, but there were no new lines.

Therefore, although I haven't updated the implementation yet, I believe that the previous suggestion about adding text nodes with newlines (and spaces or tabs if I decide to add those too) will indeed place those into the file, and that they will be preserved upon subsequent 'load' and 'save' operations, by default, even without calling 'preserve white space'.

This particular application is running on Win98 with msxml3, although I plan to update that to msxml4 for the speed and memory considerations.

I also appreciate your code, as seeing how things are done is the best teacher. But, unfortunately, and probably as no surprise, that raises a few more questions.

Based on one of the XML sample projects on CP, I am using the #import [msxml3.dll] in the header file. While it seems to me that the following should be equivalent, one worked and one did not. While I am not familiar with the intricacies of COM, I went with the one that worked.

(1) IXMLDOMNode *pNode; pNode = m_pXmlDoc->selectNode ("StartTag");
(2) IXMLDOMNodePtr pNode; pNode = m_pXmlDoc->selectNode ("StartTag");

According to the contents of the generated .tlh file, the selectNode function returns an IXMLDOMNodePtr, and option 1 bombs. While I don't really understand t
-
Dave, I think you are well along the right path. Good luck :-D

David Chamberlain wrote: Apparently, the preserve white space option is on by default.

Dave, as I said earlier, my code worked differently a while back. I now have MSXML 4 installed, and its default appears not to include the white space. So I would go ahead and add that line, unless your memory is much better than mine.

And go with what works. In COM you have interfaces, so a pointer to an interface is not the same thing as an interface to a pointer to XYZ. Don't we love this.

I needed a little refresher, especially given what I learned about the differences between 3 and 4. Take care and have a nice day.

Mike

Good ideas are not adopted automatically. They must be driven into practice with courageous patience. -Admiral Rickover. ...
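One plausible reading of why option (1) "bombs" while option (2) works is the usual smart pointer lifetime pitfall: a function that returns a reference-counting smart pointer hands back a temporary, and a raw pointer taken from that temporary is left dangling once the temporary releases its reference. The sketch below models this in plain C++ with made-up types. `ToyNode`, `ToyNodePtr`, and `selectToyNode` are hypothetical stand-ins, not the types generated by #import, and whether this is exactly what happened in the application above is an assumption.

```cpp
#include <cassert>

// Number of live toy objects, so destruction can be observed without
// dereferencing a dangling pointer.
static int g_live = 0;

// Hypothetical stand-in for a COM object with intrusive reference counting.
class ToyNode {
public:
    ToyNode() : refs_(0) { ++g_live; }
    void AddRef() { ++refs_; }
    void Release() {
        assert(refs_ > 0);
        if (--refs_ == 0) delete this;
    }
private:
    ~ToyNode() { --g_live; }  // only Release() may destroy the object
    int refs_;
};

// Hypothetical stand-in for a generated smart pointer type.
class ToyNodePtr {
public:
    explicit ToyNodePtr(ToyNode* p) : p_(p) { if (p_) p_->AddRef(); }
    ToyNodePtr(const ToyNodePtr& o) : p_(o.p_) { if (p_) p_->AddRef(); }
    ToyNodePtr& operator=(const ToyNodePtr& o) {
        if (o.p_) o.p_->AddRef();
        if (p_) p_->Release();
        p_ = o.p_;
        return *this;
    }
    ~ToyNodePtr() { if (p_) p_->Release(); }
    operator ToyNode*() const { return p_; }  // implicit conversion, no AddRef
private:
    ToyNode* p_;
};

// Returns a smart pointer by value, like a wrapper method would.
ToyNodePtr selectToyNode() { return ToyNodePtr(new ToyNode); }

// Pattern (1): the temporary smart pointer is destroyed at the end of the
// statement, taking the object with it. The raw pointer must not be used.
int liveAfterRawAssignment() {
    ToyNode* raw = selectToyNode();
    (void)raw;
    return g_live;
}

// Pattern (2): the smart pointer variable keeps its own reference.
int liveWhileSmartPointerHeld() {
    ToyNodePtr held = selectToyNode();
    return g_live;
}
```

In this model `liveAfterRawAssignment()` reports that no object is left alive by the time the raw pointer could be used, while `liveWhileSmartPointerHeld()` reports one live object for as long as the smart pointer variable is in scope. That matches the advice to go with the smart pointer form.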