Accessing the HTML in WebBrowser difficulty
-
I am trying to grab the HTML source of a web page using the WebBrowser control. The page in question lets the user query a specific record (or set of records) from a database. Once these records have been listed, the user clicks on the desired title and JavaScript (or AJAX) overlays the current query page with the desired result display. Problem: if I try to grab the source programmatically, I get the original query page, not the overlaid result. I can right-click on the result and view its source correctly, but I can't seem to get it via code. Has anyone out there solved this issue in the past?
-
I have tried some tricks for querying a web page via code and outputting the result. Check the following topic on my blog; maybe it will help you. http://shareyour-experience.blogspot.com/2009/06/find-geo-location-through-ip-address.html
-
Michael Potter wrote:
I am trying to grab the HTML source of a web page using the WebBrowser control.
I don't know what your requirements are, but making an HTTP request will get you the HTML code. It's far simpler than using a WebBrowser control. Several Base Class Library classes can do this; one is the HttpWebRequest class.
-
Thanks for the response. I can't hide the functionality of the website I wish to scrape. I need its query interface to function as designed. I just can't get to the result source HTML. I am guessing it is inserted somewhere in the DOM, but I failed to locate it. Essentially, a small square 'frame' appears (via JavaScript) in the center of the page. If I right-click on the small square 'frame' and choose [view source], I get what I want. If I right-click OFF the small square 'frame' and choose [view source], I get the initial query HTML. I can't find the small square 'frame's HTML programmatically.
-
Michael Potter wrote:
I can't hide the functionality of the website I wish to scrape.
Not sure what that means, but if you must use a WebBrowser control, you could still use the URL from the control to make separate HTTP requests to obtain the HTML. If you are trying to capture the dynamic changes that client-side script makes to the DOM, then of course that will not help you.
Michael Potter wrote:
I am guessing it is inserted somewhere in the DOM
Yes, the DOM is the in-memory version of the HTML. Again, if you want the original stream from the server, just make an HTTP request. If you need the dynamic HTML, you will have to use the DOM. You will have to dig through the DOM documentation to find the parts you need. The basic concept is that each frame has a body, and a body element might give you access to its inner HTML as text.
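For illustration only, here is a minimal script-side sketch of that "each frame has a body" idea. The function name `collectFrameHtml` and the shape of the returned records are made up for this example; it is written in old-style JavaScript on the assumption that the page runs in an IE-based engine, and it has not been tried against the site in question.

```javascript
// Collect the live HTML of a window and every nested frame.
// `win` is a window-like object (the page's `window`, or a stub when testing).
function collectFrameHtml(win, path, out) {
  path = path || "top";
  out = out || [];
  try {
    // body.innerHTML reflects the current DOM, including script-made changes.
    var body = win.document && win.document.body;
    out.push({ path: path, html: body ? body.innerHTML : "" });
  } catch (err) {
    // Frames served from another origin deny access to their document.
    out.push({ path: path, html: null, error: String(err) });
  }
  var frames = win.frames || [];
  for (var i = 0; i < frames.length; i++) {
    collectFrameHtml(frames[i], path + "/frame[" + i + "]", out);
  }
  return out;
}
```

Called as `collectFrameHtml(window)` inside the page, this would return one entry per frame. An entry whose `html` is `null` marks a frame whose document could not be read.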
-
Is the "frame" an iFrame? If it is, that would explain your problem. An iFrame holds its contents in its own innerHTML property, so they wouldn't come back from the WebBrowser's Document.InnerHTML.
If at first you don't succeed ... post it on The Code Project and Pray.
-
After some JavaScript research: yes, it is an IFrame. I was able to capture the navigated URL and use HttpWebRequest (thanks, led mike) to re-grab the IFrame when it is unsecured. I am unable to do so when it is secured data. I can't seem to hitch onto the rights the WebBrowser object has negotiated, and I don't know how to negotiate a new set (I am not privy to the site's inner workings). So the problem remains, but it is better defined. How do I read an IFrame's source from the WebBrowser control?
-
What I would do (I'm sure there is a better way) is append a JavaScript function and a hidden textbox to the innerHTML of the loaded document. Then call InvokeScript on the WebBrowser to run your JavaScript, which should set the hidden textbox's text to the inner HTML of the iframe. Finally, get the text from the textbox by reading the innerHTML and parsing out the textbox value. Like I said, I'm sure there is a better way.
-
Any idea on what the script would look like? I have not done a lot of web programming.
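A rough sketch of what such a script might look like, under some assumptions: the function name `copyFrameHtml` and the field id passed to it are invented for this example, the optional `doc` argument exists only so the function can be exercised outside a browser, and the iframe must come from the same site as the outer page (otherwise access to its document is denied and the function returns an empty string). It sticks to old-style JavaScript because the control hosts an IE engine.

```javascript
// Copy the inner HTML of the Nth iframe into a hidden input and return it.
// frameIndex: position of the iframe on the page (0 for the first one).
// fieldId: id of the hidden input that receives the HTML (created if missing).
// doc: optional document-like object; defaults to the page's own document.
function copyFrameHtml(frameIndex, fieldId, doc) {
  doc = doc || document;
  var html = "";
  try {
    var frame = doc.getElementsByTagName("iframe")[frameIndex];
    // Older IE engines expose contentWindow.document; newer ones contentDocument.
    var inner = frame && (frame.contentDocument ||
      (frame.contentWindow && frame.contentWindow.document));
    if (inner && inner.body) {
      html = inner.body.innerHTML;
    }
  } catch (err) {
    // Cross-origin iframe: leave html empty rather than raising a script error.
    html = "";
  }
  var field = doc.getElementById(fieldId);
  if (!field) {
    field = doc.createElement("input");
    field.type = "hidden"; // set before insertion; old IE rejects later changes
    field.id = fieldId;
    doc.body.appendChild(field);
  }
  field.value = html;
  return html;
}
```

Once the function has been added to the page, the host could run it through the Windows Forms `HtmlDocument.InvokeScript` method, passing the function name and its arguments. Since `InvokeScript` hands back the script's return value, reading the hidden field afterwards may not even be necessary.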
-
I found this on the net, and it allowed me to use HttpWebRequest (as suggested earlier): http://mmarinov.blogspot.com/2007/10/using-exsiting-ie-cookies-with.html Thanks to all those who helped; refining the definition of the problem was very helpful. Special note: the WPF WebBrowser control doesn't even fire the events (IFrame navigation) necessary for the above solution. I have to use the Windows Forms version.
modified on Friday, July 24, 2009 2:07 PM