HTML file parsing help
-
Hi, could anyone show me the best way to extract the info I need from the file below? Basically, the files are hand histories from a poker site, and I would like to keep track of how often a player calls, checks, folds, and so on. The file is in HTML (hoping it posts OK). Or could anyone tell me whether it would be better to convert the file to XML (if possible) and extract the data that way? Here is a short part of the file:
| Idx | Date/Time | Table Name | Hand ID | Stakes |
|---|---|---|---|---|
| 19 | [Jul 31 16:04:10] | Casanova | 10474702-12327 | $0.25/$0.50 |
| 18 | [Jul 31 16:03:20] | Casanova | 10474702-12326 | $0.25/$0.50 |
| 17 | [Jul 31 16:02:32] | Casanova | 10474702-12325 | $0.25/$0.50 |
| 16 | [Jul 31 16:01:45] | Casanova | 10474702-12324 | |

(Every cell in a row links to the same in-page anchor, e.g. `#[Jul 31 16:04:10]~Casanova~10474702-12327~$0.25/$0.50` for hand 19. The posted excerpt cuts off before the Stakes cell of row 16.)
-
Have you tried opening this up with an XML parser? It looks well-formed enough that you could do that. In this case, you could use an XmlDocument object and start grabbing data out of it.
Logifusion[^] If not entertaining, write your Congressman.
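For anyone who wants to try the XML-parser route, here is a rough sketch. It is written in Python with `xml.etree.ElementTree` rather than .NET's `XmlDocument`, purely for illustration; the idea is the same. The post only shows the rendered table, so the `<table>`/`<tr>`/`<td>`/`<a>` markup in `SAMPLE` and the `parse_hand_index` helper are assumptions, not the poker site's real format. This only works if the file really is well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the real tags are not shown in the post, so this
# table layout is an assumption made for illustration only.
SAMPLE = """<table>
  <tr><th>Idx</th><th>Date/Time</th><th>Table Name</th><th>Hand ID</th><th>Stakes</th></tr>
  <tr>
    <td>19</td>
    <td><a href="#[Jul 31 16:04:10]~Casanova~10474702-12327~$0.25/$0.50">[Jul 31 16:04:10]</a></td>
    <td><a href="#[Jul 31 16:04:10]~Casanova~10474702-12327~$0.25/$0.50">Casanova</a></td>
    <td><a href="#[Jul 31 16:04:10]~Casanova~10474702-12327~$0.25/$0.50">10474702-12327</a></td>
    <td><a href="#[Jul 31 16:04:10]~Casanova~10474702-12327~$0.25/$0.50">$0.25/$0.50</a></td>
  </tr>
  <tr>
    <td>18</td>
    <td><a href="#[Jul 31 16:03:20]~Casanova~10474702-12326~$0.25/$0.50">[Jul 31 16:03:20]</a></td>
    <td><a href="#[Jul 31 16:03:20]~Casanova~10474702-12326~$0.25/$0.50">Casanova</a></td>
    <td><a href="#[Jul 31 16:03:20]~Casanova~10474702-12326~$0.25/$0.50">10474702-12326</a></td>
    <td><a href="#[Jul 31 16:03:20]~Casanova~10474702-12326~$0.25/$0.50">$0.25/$0.50</a></td>
  </tr>
</table>"""


def parse_hand_index(xml_text):
    """Return one dict per data row, keyed by the header cell text."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    rows = root.findall(".//tr")
    # The first row is assumed to hold the column headers.
    headers = ["".join(cell.itertext()).strip() for cell in rows[0]]
    hands = []
    for row in rows[1:]:
        # itertext() collects the text inside nested tags such as <a>.
        cells = ["".join(cell.itertext()).strip() for cell in row.findall("td")]
        hands.append(dict(zip(headers, cells)))
    return hands


if __name__ == "__main__":
    for hand in parse_hand_index(SAMPLE):
        print(hand["Idx"], hand["Hand ID"], hand["Stakes"])
```

Counting calls, checks, and folds would follow the same pattern once the markup of the per-hand action rows is known; that part of the file is not in the excerpt, so it is left out here.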
-
Hi, use a regex to search for specific patterns:
"(\\w+)\\s*=\\s*" "\"\\s*(.*?)\\s*\""
Taken together, these two pieces match attributes in the HTML of the form name = "value". Try forming the proper regex for your situation.
"A good programmer is someone who looks both ways before crossing a one-way street." -- Doug Linder
Anant Y. Kulkarni
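A small sketch of the regex approach, again in Python for illustration. The pattern is the one quoted above with the string escaping removed. Two things here are assumptions: that the anchor text sits in an `href` attribute (the post only shows the rendered links), and that it always has four `~`-separated fields. `find_attributes` and `split_anchor` are hypothetical helper names.

```python
import re

# Same pattern as in the reply: name = "value", with optional whitespace.
ATTR_RE = re.compile(r'(\w+)\s*=\s*"\s*(.*?)\s*"')


def find_attributes(html):
    """Return (name, value) pairs for every name="value" attribute found."""
    return ATTR_RE.findall(html)


def split_anchor(href):
    """Split an anchor like '#[date]~table~hand id~stakes' into fields.

    Assumes exactly four '~'-separated parts, as in the posted excerpt.
    """
    date, table, hand_id, stakes = href.lstrip("#").split("~")
    return {
        "date": date.strip("[]"),
        "table": table,
        "hand_id": hand_id,
        "stakes": stakes,
    }


if __name__ == "__main__":
    line = '<a href="#[Jul 31 16:04:10]~Casanova~10474702-12327~$0.25/$0.50">Casanova</a>'
    for name, value in find_attributes(line):
        if name == "href":
            print(split_anchor(value))
```

A regex like this is fine for pulling out one known attribute, but it does not understand nesting or unquoted values, so a real parser is usually the safer choice if the file is well-formed.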