Small XML problem

Martin23

Hi, I have made a small program that opens a data file in XML format and then searches through it for certain information. The problem is that if the first line of the XML text looks like; it doesn't work, but if I change it so the first line is simply; it does work. The data comes in the first way, which doesn't work for me. Can anyone explain why this might happen? I assume it is something simple, seeing as I can fix it by removing a small amount of text. (P.S. I am fairly new to XML, so forgive me if this is really dumb!) Thanks in advance! -- modified at 10:39 Thursday 5th January, 2006

Guffa

Standard question #1: What do you mean by "not working"?

Martin23

OK, good point! This is an example:

    private void button4_Click(object sender, EventArgs e)
    {
        StreamReader sr = new StreamReader(@"data.xml");
        XmlTextReader xr = new XmlTextReader(sr);
        XmlDocument docData = new XmlDocument();
        docData.Load(xr);
        // Query the loaded document for the page nodes and show how many were found.
        XmlNodeList dataNodes = docData.SelectNodes("data/page");
        lblNodes.Text = dataNodes.Count.ToString();
    }

When it works, lblNodes shows that there are 1699 nodes; when it doesn't work, it just says there are 0 nodes. The XML file looks roughly like this; informationhere informationhere informationhere but will only work if I change it to look like this; informationhere informationhere informationhere I hope that is enough info. Thank you!

Guffa

Try the "/data/page" XPath instead. Otherwise, it might be the schema that removed the nodes. Does the schema that the data references exist? Check the contents of the InnerXml property once the document is loaded, to see whether the page nodes get loaded at all.

Martin23

Hmm, that didn't work; the page nodes do get loaded, though. The data that I am trying to process comes from the Wikipedia database dump (you know, the open source encyclopedia, see http://en.wikipedia.org/wiki/Main_Page). I am practising on a small foreign-language XML dump, which can be downloaded from http://download.wikimedia.org/wikipedia/am/20051020_pages_current.xml.bz2 (the English Wikipedia dump is about 3 GB, so you don't exactly want to process that every time you test the program!). I extracted the exact first line of the XML, which is; This includes links to various websites that explain the schema. I find it strange that it works by simply removing all the attribute material from this first line. I tried removing it programmatically using the Attributes.RemoveAll() method, but as I found out it should really be called RemoveAllButOne(), so that doesn't work either. Thanks for trying to help, really appreciated. Don't worry if you don't have the time to go any further, though. Martin

leppie

You need to set up the XmlNamespaceManager. It has been a long time, but I remember there was a problem with queries on the default namespace. The workaround is to add a dummy alias in your code, then use the alias in your query. HTH :)

Guffa

I see. When you show the complete tag, it's obvious. You need to use an XmlNamespaceManager object along with the XML document to be able to reach the nodes that belong to a namespace.
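
For anyone landing on this thread later, here is a minimal sketch of the XmlNamespaceManager approach suggested above. It is not the original poster's code: the sample XML, the namespace URI http://example.org/export, and the alias "d" are all made up for illustration, and it assumes the root element declares a default namespace (as the dump's first line apparently does). In XPath 1.0 an unprefixed name only matches elements in no namespace, which would explain the count of 0.

```csharp
using System;
using System.Xml;

public static class NamespaceQueryDemo
{
    // Hypothetical stand-in for the real file: the namespace URI and the
    // element names are assumptions for illustration only.
    public const string SampleXml =
        "<data xmlns=\"http://example.org/export\">" +
        "<page>informationhere</page>" +
        "<page>informationhere</page>" +
        "<page>informationhere</page>" +
        "</data>";

    // Unprefixed XPath names match only elements that are in no namespace,
    // so this finds nothing when the root declares a default namespace.
    public static int CountWithoutNamespace(XmlDocument doc)
    {
        return doc.SelectNodes("/data/page").Count;
    }

    // Bind a dummy alias ("d", an arbitrary choice) to the document's
    // default namespace and use that alias in the query.
    public static int CountWithNamespace(XmlDocument doc)
    {
        XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("d", doc.DocumentElement.NamespaceURI);
        return doc.SelectNodes("/d:data/d:page", nsmgr).Count;
    }

    public static void Main()
    {
        XmlDocument docData = new XmlDocument();
        docData.LoadXml(SampleXml);

        Console.WriteLine(CountWithoutNamespace(docData)); // prints 0
        Console.WriteLine(CountWithNamespace(docData));    // prints 3
    }
}
```

The alias used in the query does not have to match any prefix in the file; only the namespace URI it is bound to matters. Reading the URI from DocumentElement.NamespaceURI avoids hard-coding it, on the assumption that the page elements inherit the root's default namespace.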