Structured HTML to XML
-
Hello,
My task is to convert race results like this to XML: http://www.sportstats.ca/res1997/nord10k.htm
You can see that there are 1,948 race results. I need to create 1,948 XML elements. I would like to see XML data like this:
1 29:41 2:59 1868 Rachid TBAHI Sleepy Hollow NY 1/213 1/1216 Men 30-34
I've shown just the first entry, and would expect to see 1,947 more.
A few months ago, I wrote some C# code that would convert the URL above to the kind of XML I wanted. I noticed that the race results were all after a tag. I found that I had to use a C# regular expression to parse the race results. Creating one regular expression isn't so bad, but there is so much race results data, in so many layouts that each need their own regular expression, that this becomes a large project for me.
Recently, I found out about Dapper and thought it would be able to make this a manageable project. Their website is here: http://www.dapper.net/
I have tried unsuccessfully to create a Dapp that will do this. If you look at the source HTML for the sportstats URL, you will see that there isn't anything delineating the attributes I need to capture. It's not like a CSV file; it's more like an old-fashioned mainframe fixed-width file. As a result, I can't seem to define fields for the Dapp.
Any suggestions are most welcome!
Richard Rogers
-
Yeah, a regular expression. But I don't see why more than one would be required. (Not that I'm an expert.)
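To make that concrete, here is a rough sketch in Python (the same pattern should carry over to C# with little change). It is only checked against the single row quoted in the question. The group names are my own guesses at what each number means, not names from the site. It also does not try to split the runner's name from the city, because nothing in the text marks that boundary; that split would need the column positions.

```python
import re

# Hypothetical group names; only the field order comes from the quoted row.
ROW = re.compile(
    r"""^\s*
    (?P<place>\d+)\s+              # overall place
    (?P<time>\d+:\d+(?::\d+)?)\s+  # finish time, mm:ss or h:mm:ss
    (?P<pace>\d+:\d+)\s+           # pace
    (?P<bib>\d+)\s+                # bib number
    (?P<name_and_city>.*?)\s+      # free text, not split any further here
    (?P<rank_a>\d+/\d+)\s+         # first "n/total" ranking
    (?P<rank_b>\d+/\d+)\s+         # second "n/total" ranking
    (?P<category>.+?)\s*$          # e.g. "Men 30-34"
    """,
    re.VERBOSE,
)

def parse_row(line):
    """Return the captured fields as a dict, or None if the line is not a result row."""
    m = ROW.match(line)
    return m.groupdict() if m else None

SAMPLE_ROW = "1 29:41 2:59 1868 Rachid TBAHI Sleepy Hollow NY 1/213 1/1216 Men 30-34"
fields = parse_row(SAMPLE_ROW)
```

Heading lines and rulers fail the leading `\d+` and come back as `None`, so they can simply be skipped.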
-
Hello,
The only reason more than one is required is that I need to import the following datasets:
http://www.sportstats.ca/res1997/nord10k.htm
http://www.sportstats.ca/res1997/sunny10.htm
http://www.sportstats.ca/res1997/ncm.htm
http://www.sportstats.ca/res1997/niaghalf.htm
http://www.sportstats.ca/res1997/niagmara.htm
http://www.sportstats.ca/res1997/ncmh.htm
http://www.sportstats.ca/res1997/rattle10.htm
http://www.sportstats.ca/res1997/xerox10.htm
http://www.sportstats.ca/res1997/sunny3.htm
http://www.sportstats.ca/res1997/ncm6k.htm
http://www.sportstats.ca/res1997/nordion.htm
http://www.sportstats.ca/res1998/nord5k.htm
http://www.sportstats.ca/res1998/ncm5k.htm
http://www.sportstats.ca/res1998/ncmmar.htm
http://www.sportstats.ca/res1998/ncmhalf.htm
http://www.sportstats.ca/res1998/sunny10.htm
http://www.sportstats.ca/res1998/reach5k.htm
http://www.sportstats.ca/res1998/kingbeat.htm
http://www.sportstats.ca/res1998/beat8k.htm
http://www.sportstats.ca/res1998/rattle10.htm
http://www.sportstats.ca/res1998/can10k.htm
http://www.sportstats.ca/res1998/grimhalf.htm
http://www.sportstats.ca/res1999/rsboiran.htm
http://www.sportstats.ca/res1999/nor10k1.htm
http://www.sportstats.ca/res1999/bay30k.htm
http://www.sportstats.ca/res1999/cimh.htm
http://www.sportstats.ca/res1999/sp10k.htm
http://www.sportstats.ca/res1999/cimm.htm
http://www.sportstats.ca/res1999/mara5.htm
http://www.sportstats.ca/res1999/rdkiran.htm
http://www.sportstats.ca/res1999/mara.htm
http://www.sportstats.ca/res1999/marah.htm
http://www.sportstats.ca/res2000/bay30k.htm
http://www.sportstats.ca/res2000/mds10k.htm
http://www.sportstats.ca/res2000/cimh.htm
http://www.sportstats.ca/res2000/ncmhalf.htm
http://www.sportstats.ca/res2000/ncm5k.htm
http://www.sportstats.ca/res2000/ncmmara.htm
http://www.sportstats.ca/res2000/cimm.htm
http://www.sportstats.ca/res2000/compu10.htm
http://www.sportstats.ca/res2000/casm.htm
http://www.sportstats.ca/res2000/gatorh.htm
http://www.sportstats.ca/res2001/nord10k.htm
http://www.sportstats.ca/res2001/bay30.htm
http://www.sportstats.ca/res2001/marahalf.htm
http://www.sportstats.ca/res2001/cimh.htm
http://www.sportstats.ca/res2001/mara5k.htm
http://www.sportstats.ca/res2001/mara.htm
http://www.sportstats.ca/res2001/cimm.htm
http://www.sportstats.ca/res2001/legacy5k.htm
http://www.sportstats.ca/res2001/pb5k.htm
http://www.sportstats.ca/res2001/casnm.htm
http://www.sportstats.ca/res2002/nord10k.htm
http://www.sportstats.ca/res2002/marah.htm
http://www.spor
-
Ah, one per dataset makes sense for very diverse datasets, but I looked at the first three and they seem pretty similar, so one general one might do the trick. Or perhaps two: one that examines the column headings and the dashed-line heading, and uses them to dynamically create a second one that's specific to that layout?
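Here is a minimal sketch of that two-step idea in Python, with one simplification: rather than generating a second regular expression, it reads the column offsets off the dashed line and slices each row at those offsets. It assumes every file has a heading line directly above a ruler made of runs of dashes, one run per column. The `SAMPLE` lines are made up for illustration (only the values come from the row quoted in the question), and the `results`/`result`/`field` element names are placeholders, not the XML the original poster wants.

```python
import re
import xml.etree.ElementTree as ET

def column_spans(ruler):
    """Return the (start, end) offsets of each run of dashes in the ruler line."""
    return [m.span() for m in re.finditer(r"-+", ruler)]

def parse_results(lines):
    """Yield one dict per row, keyed by the heading text above each column."""
    it = iter(lines)
    heading, spans = "", None
    for line in it:
        # The ruler is the first line made only of dashes and whitespace.
        if "-" in line and re.fullmatch(r"[-\s]+", line):
            spans = column_spans(line)
            break
        heading = line  # remember the line just above the ruler
    if not spans:
        return
    # Let the last column run to end of line in case a value overflows it.
    spans[-1] = (spans[-1][0], None)
    names = [heading[s:e].strip() or f"col{i}" for i, (s, e) in enumerate(spans)]
    for line in it:
        if line.strip():
            yield {n: line[s:e].strip() for n, (s, e) in zip(names, spans)}

def to_xml(rows):
    """Serialize rows with placeholder element names (an assumption)."""
    root = ET.Element("results")
    for row in rows:
        result = ET.SubElement(root, "result")
        for name, value in row.items():
            ET.SubElement(result, "field", name=name).text = value
    return ET.tostring(root, encoding="unicode")

# Illustrative layout only; the real pages may differ.
SAMPLE = [
    "Place Time   Name            City",
    "----- ------ --------------- ----------------",
    "    1 29:41  Rachid TBAHI    Sleepy Hollow NY",
]
rows = list(parse_results(SAMPLE))
xml_text = to_xml(rows)
```

Because the offsets come from each file's own ruler, the same code should cope with pages whose columns differ in width or order, as long as they keep the heading-plus-ruler layout. Pages that repeat the heading partway through, or that have no ruler at all, would need extra handling.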