Accessing the HTML in WebBrowser difficulty

Tags: database, help, question, javascript, html
  • Michael Potter wrote (#1):

    I am trying to grab the HTML source of a web page using the WebBrowser control. The page in question lets the user query a specific record (or set of records) from a database. Once these records have been listed, the user clicks on the desired title and JavaScript (or AJAX) overlays the current query page with the desired result display. Problem: if I try to grab the source programmatically, I get the original query page, not the overlaid result. I can right-click on the result and view its source correctly, but I can't seem to get it via code. Has anyone out there solved this issue in the past?


      Muhammad Mazhar wrote (#2):

      I have tried some tricks for querying a web page via code and outputting the result. Check the following topic on my blog; maybe it helps you: http://shareyour-experience.blogspot.com/2009/06/find-geo-location-through-ip-address.html

      Share your experience with others Check my Blog...


        led mike wrote (#3):

        Michael Potter wrote:

        I am trying to grab the HTML source of a web page using the WebBrowser control.

        I don't know what your requirements are, but making an HTTP request will get you the HTML code. It's far simpler than using a WebBrowser control. Several Base Class Library types can do this; one is the HttpWebRequest class.
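
        A minimal sketch of the HTTP request suggested above, assuming a plain GET with no authentication. The class and method names (`PageFetcher`, `FetchHtml`) are made up for illustration. Note that this returns only what the server sends; markup produced later by client-side script will not be in it.

```csharp
using System;
using System.IO;
using System.Net;

static class PageFetcher
{
    // Downloads the raw HTML the server returns for a URL.
    // No scripts run, so dynamically inserted content is not included.
    public static string FetchHtml(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var stream = response.GetResponseStream())
        using (var reader = new StreamReader(stream)) // assumes a UTF-8 compatible page
        {
            return reader.ReadToEnd();
        }
    }
}
```

        Usage would be along the lines of `string html = PageFetcher.FetchHtml(webBrowser1.Url.ToString());`, where `webBrowser1` is a hypothetical control name.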


          Michael Potter wrote (#4):

          Thanks for the response. I can't hide the functionality of the website I wish to scrape. I need its query interface to function as designed. I just can't get to the result source HTML. I am guessing it is inserted somewhere in the DOM, but I failed to locate it. Essentially, a small square 'frame' appears (via JavaScript) in the center of the page. If I right-click on the small square 'frame' and choose [view source], I get what I want. If I right-click OFF the small square 'frame' and choose [view source], I get the initial query HTML. I can't find the small square 'frame's HTML programmatically.


            led mike wrote (#5):

            Michael Potter wrote:

            I can't hide the functionality of the website I wish to scrape.

            Not sure what that means, but if you must use a WebBrowser control you could still take the URL from the control and make separate HTTP requests to obtain the HTML. If you are trying to capture dynamic changes made to the DOM by client-side script, then of course that will not help you.

            Michael Potter wrote:

            I am guessing it is inserted somewhere in the DOM

            Yes, the DOM is the in-memory version of the HTML. Again, if you want the original stream from the server, just make an HTTP request. If you need the dynamic HTML, you will have to use the DOM. You will have to dig through the DOM documentation to find the parts you need. The basic concept is that each frame has a body, and a body element might give you access to its inner HTML as text.
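
            As a rough sketch of that concept in Windows Forms terms: the control's `HtmlDocument` exposes a `Window` with a `Frames` collection, and each frame has its own `Document` and `Body`. The helper name `FrameDump.GetFrameHtml` is hypothetical. As far as I know, reading a frame that comes from a different domain than the parent page is blocked and surfaces as an `UnauthorizedAccessException`, so the sketch skips such frames rather than failing.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;

static class FrameDump
{
    // Collects the current (in-memory) markup of each directly nested frame.
    // Call this after the page and its frames have finished loading.
    public static List<string> GetFrameHtml(WebBrowser browser)
    {
        var results = new List<string>();
        HtmlDocument doc = browser.Document;
        if (doc == null || doc.Window == null || doc.Window.Frames == null)
            return results;

        foreach (HtmlWindow frame in doc.Window.Frames)
        {
            try
            {
                HtmlDocument frameDoc = frame.Document;
                if (frameDoc != null && frameDoc.Body != null)
                    results.Add(frameDoc.Body.OuterHtml);
            }
            catch (UnauthorizedAccessException)
            {
                // Assumption: frames served from another domain cannot be
                // read through the DOM, so they are skipped here.
            }
        }
        return results;
    }
}
```

            This only looks one level deep; frames nested inside frames would need the same loop applied recursively.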


              Adam R Harris wrote (#6):

              Is the "frame" an iframe? If it is, that would explain your problem. An iframe holds its contents in its own document, so they won't come back in the HTML you read from the WebBrowser's top-level Document.

              If at first you don't succeed ... post it on The Code Project and Pray.


                Michael Potter wrote (#7):

                After some JavaScript research: yes, it is an IFrame. I was able to capture the navigated URL and use HttpWebRequest (thanks, led mike) to re-grab the IFrame when it is unsecured. I am unable to do so when it is secured data. I can't seem to hitch onto the rights the WebBrowser object has negotiated, and I don't know how to negotiate a new set (I am not privy to the site's inner workings). So the problem remains but is better defined. How do I read an IFrame's source from the WebBrowser control?


                  Adam R Harris wrote (#8):

                  What I would do (I'm sure there is a better way) is append a JavaScript function and a hidden textbox to the innerHTML of the loaded document, then call InvokeScript on the WebBrowser's Document to run your JavaScript (which should set the hidden textbox's text to the inner HTML of the iframe), and then get the text from the textbox by reading the innerHTML and parsing out the textbox value. Like I said, I'm sure there is a better way.
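
                  One way the idea above might look, as a sketch rather than a tested solution. It is a variation: instead of a hidden textbox, the injected function returns the iframe's markup directly as the result of `HtmlDocument.InvokeScript`. The function name `__getFrameHtml` and the helper `FrameScript.ReadFrameHtml` are invented for illustration, and the script is subject to the browser's same-origin rules, so it is expected to yield nothing for an iframe from another domain.

```csharp
using System;
using System.Windows.Forms;

static class FrameScript
{
    // Hypothetical name for the function injected into the page.
    private const string FunctionName = "__getFrameHtml";

    // Injects a small script into the loaded page and asks it for the
    // inner HTML of the frame at the given index. Returns null on failure.
    public static string ReadFrameHtml(WebBrowser browser, int frameIndex)
    {
        HtmlDocument doc = browser.Document;
        if (doc == null) return null;

        HtmlElementCollection heads = doc.GetElementsByTagName("head");
        if (heads.Count == 0) return null;

        // Setting the "text" attribute of a script element is a commonly
        // used way to give it a body from Windows Forms code.
        HtmlElement script = doc.CreateElement("script");
        script.SetAttribute("text",
            "function " + FunctionName + "(i) {" +
            "  try { return window.frames[i].document.body.innerHTML; }" +
            "  catch (e) { return null; }" +
            "}");
        heads[0].AppendChild(script);

        // Note: each call appends another copy of the script; fine for a sketch.
        object result = doc.InvokeScript(FunctionName, new object[] { frameIndex });
        return result as string;
    }
}
```

                  It would be called once the page has finished loading, for example from a `DocumentCompleted` handler: `string html = FrameScript.ReadFrameHtml(webBrowser1, 0);`.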

                  If at first you don't succeed ... post it on The Code Project and Pray.


                    Michael Potter wrote (#9):

                    Any idea what the script would look like? I have not done a lot of web programming.


                      Michael Potter wrote (#10):

                      I found this on the net, which allowed me to use HttpWebRequest (as suggested earlier): http://mmarinov.blogspot.com/2007/10/using-exsiting-ie-cookies-with.html Thanks to all those who helped; refining the definition of the problem was very helpful. Special note: the WPF WebBrowser control doesn't even fire the events (IFrame navigation) necessary for the above solution, so I have to use the Windows Forms version.
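
                      For readers who cannot reach the link, the general idea of reusing the browser's cookies with HttpWebRequest can be sketched as below. This is not the linked article's code, which is not reproduced here; it is an illustration that assumes the session cookies are visible through `HtmlDocument.Cookie`. Cookies flagged HttpOnly are not exposed that way, so a site that relies on them would need a different route. `CookieBridge` and its methods are hypothetical names.

```csharp
using System;
using System.Net;

static class CookieBridge
{
    // Converts a "name=value; name2=value2" string (the format returned by
    // HtmlDocument.Cookie) into a CookieContainer scoped to the given URI.
    public static CookieContainer FromCookieString(string cookieString, Uri uri)
    {
        var container = new CookieContainer();
        if (string.IsNullOrEmpty(cookieString)) return container;

        foreach (string pair in cookieString.Split(';'))
        {
            int eq = pair.IndexOf('=');
            if (eq <= 0) continue; // skip entries without a name

            string name = pair.Substring(0, eq).Trim();
            string value = pair.Substring(eq + 1).Trim();
            if (name.Length == 0) continue;

            // Caveat: values containing commas may be rejected with a
            // CookieException; this sketch does not handle that case.
            container.Add(uri, new Cookie(name, value));
        }
        return container;
    }

    // Builds (but does not send) a request that carries those cookies.
    public static HttpWebRequest CreateRequest(Uri uri, string cookieString)
    {
        var request = (HttpWebRequest)WebRequest.Create(uri);
        request.CookieContainer = FromCookieString(cookieString, uri);
        return request;
    }
}
```

                      With a Windows Forms control named `webBrowser1` (an assumed name), the call might be `CookieBridge.CreateRequest(frameUri, webBrowser1.Document.Cookie)`, where `frameUri` is the IFrame URL captured from the navigation event.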

                      modified on Friday, July 24, 2009 2:07 PM
