html file parsing help

    pokabot wrote (#1):

    Hi, could anyone show me the best way to extract the info I need from the file below? Basically, the files are hand histories from a poker site, and I would like to keep track of how often a player calls, checks, folds, and so on. The file is in HTML (hoping it posts OK). Alternatively, could anyone tell me whether it would be better to convert the file to XML (if possible) and extract the data that way? Here is a short part of the file:

    | Idx | Date/Time | Table Name | Hand ID | Stakes |
    | --- | --- | --- | --- | --- |
    | 19 | [[Jul 31 16:04:10]](#[Jul 31 16:04:10]~Casanova~10474702-12327~$0.25/$0.50) | [Casanova](#[Jul 31 16:04:10]~Casanova~10474702-12327~$0.25/$0.50) | [10474702-12327](#[Jul 31 16:04:10]~Casanova~10474702-12327~$0.25/$0.50) | [$0.25/$0.50](#[Jul 31 16:04:10]~Casanova~10474702-12327~$0.25/$0.50) |
    | 18 | [[Jul 31 16:03:20]](#[Jul 31 16:03:20]~Casanova~10474702-12326~$0.25/$0.50) | [Casanova](#[Jul 31 16:03:20]~Casanova~10474702-12326~$0.25/$0.50) | [10474702-12326](#[Jul 31 16:03:20]~Casanova~10474702-12326~$0.25/$0.50) | [$0.25/$0.50](#[Jul 31 16:03:20]~Casanova~10474702-12326~$0.25/$0.50) |
    | 17 | [[Jul 31 16:02:32]](#[Jul 31 16:02:32]~Casanova~10474702-12325~$0.25/$0.50) | [Casanova](#[Jul 31 16:02:32]~Casanova~10474702-12325~$0.25/$0.50) | [10474702-12325](#[Jul 31 16:02:32]~Casanova~10474702-12325~$0.25/$0.50) | [$0.25/$0.50](#[Jul 31 16:02:32]~Casanova~10474702-12325~$0.25/$0.50) |
    | 16 | [[Jul 31 16:01:45]](#[Jul 31 16:01:45]~Casanova~10474702-12324~$0.25/$0.50) | [Casanova](#[Jul 31 16:01:45]~Casanova~10474702-12324~$0.25/$0.50) | [10474702-12324](#[Jul 31 16:01:45]~Casanova~10474702-12324~$0.25/$0.50) | |


      Dustin Metzgar wrote (#2):

      Have you tried opening this up with an XML parser?  It looks well-formed enough that you could do that.  In this case, you could use an XmlDocument object and start grabbing data out of it.
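
A minimal sketch of that suggestion, assuming the file really does load as well-formed XML, that each table cell wraps its text in an `<a href="#date~table~handId~stakes">` element (inferred from the links in the posted sample, not confirmed), and that the markup declares no XML namespace or DOCTYPE. The names `HandIndexXml` and `ReadHands` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class HandIndexXml
{
    // Returns one string[4] per distinct hand: { dateTime, table, handId, stakes }.
    public static List<string[]> ReadHands(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException if the markup is not well-formed

        var seen = new HashSet<string>();   // every cell in a row repeats the same href
        var hands = new List<string[]>();

        foreach (XmlNode anchor in doc.SelectNodes("//a[@href]"))
        {
            string href = anchor.Attributes["href"].Value;
            if (!href.StartsWith("#", StringComparison.Ordinal) || !seen.Add(href))
            {
                continue; // not a hand link, or already recorded
            }

            // Assumed layout: "#[date]~table~handId~stakes"
            string[] parts = href.Substring(1).Split('~');
            if (parts.Length == 4)
            {
                hands.Add(parts);
            }
        }
        return hands;
    }
}
```

You could feed it the file contents with something like `HandIndexXml.ReadHands(File.ReadAllText(path))`, where `path` is wherever the hand history lives. If `LoadXml` throws, the HTML is not well-formed after all and this route will not work without cleaning the file first. If the real file declares the XHTML namespace, the `//a` query would need an `XmlNamespaceManager`.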




        coolestCoder wrote (#3):

        Hi, use Regex to search for specific patterns, for example "(\\w+)\\s*=\\s*" followed by "\"\\s*(.*?)\\s*\"". Together these match every attribute in the HTML of the form name = "value". Try forming the proper regex for your situation.
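
As a rough sketch of the regex route, the two fragments above can be joined into one pattern and run over the raw text. The class and method names (`HandIndexRegex`, `FindAttributes`) are hypothetical, and the sample `href` layout is taken from the links in the original post rather than from the real file.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HandIndexRegex
{
    // The two fragments from the post, joined and written as a verbatim string:
    // group 1 is the attribute name, group 2 the value between double quotes.
    private static readonly Regex AttributePattern =
        new Regex(@"(\w+)\s*=\s*""\s*(.*?)\s*""");

    // Returns every name = "value" pair found in the text, in document order.
    public static List<KeyValuePair<string, string>> FindAttributes(string html)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match match in AttributePattern.Matches(html))
        {
            result.Add(new KeyValuePair<string, string>(
                match.Groups[1].Value,
                match.Groups[2].Value));
        }
        return result;
    }
}
```

Filtering the result for pairs whose key is `href` and splitting the value on `~` would then give the date, table name, hand ID, and stakes. Note that this pattern only handles double-quoted values and will also match `name = "value"` text that happens to appear outside a tag, so it probably needs tightening for a real file.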


        "A good programmer is someone who looks both ways before crossing a one-way street." -- Doug Linder


        Anant Y. Kulkarni
