HTML control parsing
-
Hi all. I have an HTML file that contains a "select" control (dropdown) with a few "option" elements. I need to read the text and the value of each of them. Is it possible to load the HTML into an HtmlDocument and then read the control with getElementByName? Or does anyone have a regex pattern to read the options and values of a dropdown control? Thanks.
-
Eli Nurman wrote:
Is it possible to load the HTML into an HtmlDocument and then read the control with getElementByName?
Yes. However, if it is XHTML, it would be simpler to use an XML parser.
led mike
How is it possible to do that?
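For the XHTML case, a minimal sketch of the XML parser route could look like the following. It uses only System.Xml from the .NET Framework. The class name `XhtmlSelectReader`, the method `ReadOptions`, and the sample markup are made up for illustration and are not from this thread. It assumes the input is well-formed XML without a DOCTYPE or HTML named entities such as `&nbsp;`, and that the select name is a plain identifier containing no quote characters.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class XhtmlSelectReader
{
    // Returns one (value, text) pair per <option> of the <select> with the given name.
    // Hypothetical helper for illustration only.
    public static List<KeyValuePair<string, string>> ReadOptions(string xhtml, string selectName)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException if the markup is not well-formed XML

        // local-name() lets the query match whether or not the XHTML namespace is declared.
        string xpath = "//*[local-name()='select'][@name='" + selectName + "']"
                     + "/*[local-name()='option']";

        var result = new List<KeyValuePair<string, string>>();
        foreach (XmlNode option in doc.SelectNodes(xpath))
        {
            string text = option.InnerText.Trim();
            XmlAttribute valueAttr = option.Attributes["value"];
            // Fall back to the visible text when no value attribute is present.
            string value = valueAttr != null ? valueAttr.Value : text;
            result.Add(new KeyValuePair<string, string>(value, text));
        }
        return result;
    }

    static void Main()
    {
        // Sample markup, assumed for this demo.
        string xhtml =
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>" +
            "<select name=\"color\">" +
            "<option value=\"r\">Red</option>" +
            "<option value=\"g\">Green</option>" +
            "</select></body></html>";

        foreach (KeyValuePair<string, string> pair in ReadOptions(xhtml, "color"))
        {
            Console.WriteLine("{0} = {1}", pair.Key, pair.Value);
        }
        // prints:
        // r = Red
        // g = Green
    }
}
```

Real-world pages are often not well-formed, in which case `LoadXml` will throw and an HTML-tolerant parser is the safer choice.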
-
Try Html Agility Pack: http://www.codeplex.com/htmlagilitypack For a one-off it may be overkill, but if you're going to be doing a lot of HTML parsing, being able to use XPath on HTML can be incredibly useful.
-
After getting the inner HTML of the control, how do I process it? I need a regex pattern.
-
Eli Nurman wrote:
After getting the inner HTML of the control, how do I process it? I need a regex pattern.
Have you ever used XPath before? With XPath, you can do something like
htmlDoc.DocumentNode.SelectNodes("//yoursubdiv/your_option_box/select/option");
This returns a collection of nodes, one per option, whose inner text is the text of each selection and whose "value" attribute holds the value. No need for regex, and no need to parse the HTML by hand. This is the best suggestion I can make. Otherwise you'll need to spend some time studying regex and working out the exact pattern required to pick out the right dropdown, which can be an irritating and mind-numbing process. If your document is guaranteed to be valid XHTML, you can even use the XML navigators straight out of the .NET Framework, with no need for the Html Agility Pack. (HAP is designed to provide XPath functionality for HTML documents, which are not strict XML and so will not load into an XmlDocument.)
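To make the Html Agility Pack route concrete, here is a rough sketch of reading each option's value and text with XPath. It assumes the HtmlAgilityPack library is referenced in the project. The class name `HapSelectReader`, the method `ReadOptions`, and the sample markup with a select named "color" are hypothetical, chosen only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party library, assumed to be referenced

static class HapSelectReader
{
    // Returns one (value, text) pair per <option> of the <select> with the given name.
    // Hypothetical helper; selectName is assumed to contain no quote characters.
    public static List<KeyValuePair<string, string>> ReadOptions(string html, string selectName)
    {
        // Some HAP versions treat <option> as an empty element, which leaves the
        // option text outside the node. Removing the flag is harmless if it is absent.
        HtmlNode.ElementsFlags.Remove("option");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<KeyValuePair<string, string>>();
        HtmlNodeCollection options =
            doc.DocumentNode.SelectNodes("//select[@name='" + selectName + "']/option");
        if (options == null)
        {
            return result; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode option in options)
        {
            string text = option.InnerText.Trim();
            // Fall back to the visible text when no value attribute is present.
            string value = option.GetAttributeValue("value", text);
            result.Add(new KeyValuePair<string, string>(value, text));
        }
        return result;
    }

    static void Main()
    {
        // Sample markup, assumed for this demo.
        string html =
            "<html><body><select name='color'>" +
            "<option value='r'>Red</option>" +
            "<option value='g'>Green</option>" +
            "</select></body></html>";

        foreach (KeyValuePair<string, string> pair in ReadOptions(html, "color"))
        {
            Console.WriteLine("{0} = {1}", pair.Key, pair.Value);
        }
    }
}
```

Selecting the option nodes directly, rather than taking the inner HTML of the select and running a regex over it, keeps the attribute and the text of each option together on one node.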