Structured HTML to XML

    RichardInToronto wrote (#1):

    Hello,

    My task is to convert race results like these to XML: http://www.sportstats.ca/res1997/nord10k.htm

    That page lists 1,948 race results, so I need to create 1,948 XML elements. I would like the XML data to look like this:

    1 29:41 2:59 1868 Rachid TBAHI Sleepy Hollow NY 1/213 1/1216 Men 30-34

    I've shown just the first entry and would expect to see 1,947 more.

    A few months ago, I wrote some C# code that converted the URL above to the kind of XML I wanted. I noticed that the race results all came after one particular tag, and I found that I had to use a C# regular expression to parse them. Creating one regular expression isn't so bad, but there is so much race results data, in so many layouts that each need their own regular expression, that this becomes a large project for me.

    Recently, I found out about Dapper and thought it would be able to make this a manageable project. Their website is here: http://www.dapper.net/

    I have tried unsuccessfully to create a Dapp that will do this. If you look at the source HTML for the sportstats URL, you will see that nothing delimits the attributes I need to capture. It's not like a CSV file; it's more like an old-fashioned mainframe fixed-width file. As a result, I can't seem to define fields for the Dapp.

    Any suggestions are most welcome!

    Richard Rogers


      PIEBALDconsult wrote (#2):

      Yeah, a regular expression. But I don't see why more than one would be required. (Not that I'm an expert.)
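To illustrate the single-expression idea, here is a minimal sketch in Python (the thread is about C#, whose `Regex` class supports the same approach with `(?<name>...)` named groups). Everything specific in it is an assumption: the element names (`result`, `place`, `time`, and so on) are hypothetical guesses at what the columns mean, the sample line is spaced by hand, and the pattern assumes at least two spaces separate the runner's name from the city, as a fixed-width layout would normally give.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical field names; check them against the heading row of the real page.
LINE_RE = re.compile(
    r"^\s*(?P<place>\d+)\s+"
    r"(?P<time>[\d:]+)\s+"
    r"(?P<pace>[\d:]+)\s+"
    r"(?P<bib>\d+)\s+"
    r"(?P<name>.+?)\s{2,}"          # name ends where a run of 2+ spaces begins
    r"(?P<city>.+?)\s+"
    r"(?P<region>[A-Z]{2})\s+"      # two-letter state or province code
    r"(?P<category_place>\d+/\d+)\s+"
    r"(?P<gender_place>\d+/\d+)\s+"
    r"(?P<category>.+?)\s*$"
)

def line_to_element(line):
    """Return a <result> element for a results line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # heading rows, rule lines, and blank lines end up here
    result = ET.Element("result")
    for field, value in m.groupdict().items():
        ET.SubElement(result, field).text = value
    return result

# Hand-spaced sample based on the entry quoted in the first post (spacing is assumed).
SAMPLE = "    1   29:41  2:59  1868 Rachid TBAHI          Sleepy Hollow     NY    1/213   1/1216 Men 30-34"

element = line_to_element(SAMPLE)
print(ET.tostring(element, encoding="unicode"))
```

Lines that do not match return `None`, so headings and separators can be skipped, and unmatched result lines can be logged to show where a page's layout differs from the assumed one.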

        RichardInToronto wrote (#3):

        Hello,

        The only reason more than one is required is that I need to import the following datasets:

        http://www.sportstats.ca/res1997/nord10k.htm
        http://www.sportstats.ca/res1997/sunny10.htm
        http://www.sportstats.ca/res1997/ncm.htm
        http://www.sportstats.ca/res1997/niaghalf.htm
        http://www.sportstats.ca/res1997/niagmara.htm
        http://www.sportstats.ca/res1997/ncmh.htm
        http://www.sportstats.ca/res1997/rattle10.htm
        http://www.sportstats.ca/res1997/xerox10.htm
        http://www.sportstats.ca/res1997/sunny3.htm
        http://www.sportstats.ca/res1997/ncm6k.htm
        http://www.sportstats.ca/res1997/nordion.htm
        http://www.sportstats.ca/res1998/nord5k.htm
        http://www.sportstats.ca/res1998/ncm5k.htm
        http://www.sportstats.ca/res1998/ncmmar.htm
        http://www.sportstats.ca/res1998/ncmhalf.htm
        http://www.sportstats.ca/res1998/sunny10.htm
        http://www.sportstats.ca/res1998/reach5k.htm
        http://www.sportstats.ca/res1998/kingbeat.htm
        http://www.sportstats.ca/res1998/beat8k.htm
        http://www.sportstats.ca/res1998/rattle10.htm
        http://www.sportstats.ca/res1998/can10k.htm
        http://www.sportstats.ca/res1998/grimhalf.htm
        http://www.sportstats.ca/res1999/rsboiran.htm
        http://www.sportstats.ca/res1999/nor10k1.htm
        http://www.sportstats.ca/res1999/bay30k.htm
        http://www.sportstats.ca/res1999/cimh.htm
        http://www.sportstats.ca/res1999/sp10k.htm
        http://www.sportstats.ca/res1999/cimm.htm
        http://www.sportstats.ca/res1999/mara5.htm
        http://www.sportstats.ca/res1999/rdkiran.htm
        http://www.sportstats.ca/res1999/mara.htm
        http://www.sportstats.ca/res1999/marah.htm
        http://www.sportstats.ca/res2000/bay30k.htm
        http://www.sportstats.ca/res2000/mds10k.htm
        http://www.sportstats.ca/res2000/cimh.htm
        http://www.sportstats.ca/res2000/ncmhalf.htm
        http://www.sportstats.ca/res2000/ncm5k.htm
        http://www.sportstats.ca/res2000/ncmmara.htm
        http://www.sportstats.ca/res2000/cimm.htm
        http://www.sportstats.ca/res2000/compu10.htm
        http://www.sportstats.ca/res2000/casm.htm
        http://www.sportstats.ca/res2000/gatorh.htm
        http://www.sportstats.ca/res2001/nord10k.htm
        http://www.sportstats.ca/res2001/bay30.htm
        http://www.sportstats.ca/res2001/marahalf.htm
        http://www.sportstats.ca/res2001/cimh.htm
        http://www.sportstats.ca/res2001/mara5k.htm
        http://www.sportstats.ca/res2001/mara.htm
        http://www.sportstats.ca/res2001/cimm.htm
        http://www.sportstats.ca/res2001/legacy5k.htm
        http://www.sportstats.ca/res2001/pb5k.htm
        http://www.sportstats.ca/res2001/casnm.htm
        http://www.sportstats.ca/res2002/nord10k.htm
        http://www.sportstats.ca/res2002/marah.htm
        http://www.spor


          PIEBALDconsult wrote (#4):

          Ah, one per dataset makes sense for very diverse datasets, but I looked at the first three and they seem pretty similar, so one general one might do the trick. Or perhaps two: one that examines the column headings and the dashed-line heading to dynamically create another one that's specific to that layout?
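The two-step idea can be sketched as follows, again in Python and under loud assumptions: that each page has a heading row directly above a rule line made of runs of `=` or `-`, and that every data row keeps to those column positions. The function names and the sample data are hypothetical, and the sample is built with explicit widths so its alignment is unambiguous; real pages would need checking, especially for values that overflow their column.

```python
import re

def build_row_regex(header, rule):
    """Turn a heading line and its rule line into (column names, compiled regex)."""
    names, parts, pos = [], [], 0
    for n, m in enumerate(re.finditer(r"[-=]+", rule)):
        start, end = m.span()
        # The column name is whatever heading text sits above this run of dashes.
        names.append(header[start:end].strip() or f"col{n}")
        # Skip the gap before the column, then capture exactly the column's width.
        parts.append(f".{{{start - pos}}}(.{{{end - start}}})")
        pos = end
    return names, re.compile("".join(parts))

def parse_rows(header, rule, rows):
    """Apply the layout-specific regex to each data row."""
    names, row_re = build_row_regex(header, rule)
    width = len(rule)
    parsed = []
    for row in rows:
        # Pad rows whose trailing spaces were trimmed so every column can match.
        m = row_re.match(row.rstrip("\n").ljust(width))
        if m:
            parsed.append({name: value.strip() for name, value in zip(names, m.groups())})
    return parsed

# Hypothetical layout: widths 5, 7, 20 and 15, separated by single spaces.
HEADER = f"{'Place':<5} {'Time':<7} {'Name':<20} {'City':<15}"
RULE = " ".join("=" * w for w in (5, 7, 20, 15))
ROW = f"{1:>5} {'29:41':>7} {'Rachid TBAHI':<20} {'Sleepy Hollow':<15}".rstrip()

for record in parse_rows(HEADER, RULE, [ROW]):
    print(record)
```

Because the expression is generated from each page's own rule line, a page with different column widths gets its own pattern without any hand-written regular expression.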
