Need to capture data of currently displayed webpage

Tags: csharp, perl, html, asp-net, database
  • PJWindsor wrote:

    Hi, have you tried Fiddler? It's an HTTP debugging tool that logs what is going on as you browse; you can see the HTML and JS files that are downloaded and a lot more, so it might help. If that doesn't do the trick, look at the WebRequest object, which can be used to get the HTTP response. Since you said there is a login screen it may be harder, but you can do your own POST by converting the username, password, and word art answer into a byte array and then sending this with your request to whatever page the form points to. Try here: http://msdn.microsoft.com/en-us/library/system.net.webrequest.aspx Phil
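As a rough illustration of the "do your own POST" idea, here is a minimal sketch, not a tested login for any real site. The field names `username`, `password` and `answer`, and the class name `FormPoster`, are hypothetical; the real names would have to be read from the login form's HTML.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class FormPoster
{
    // Encodes the form fields as an application/x-www-form-urlencoded byte array.
    // The field names are placeholders (assumptions), not taken from any real form.
    public static byte[] BuildFormBody(string user, string password, string answer)
    {
        string body = "username=" + Uri.EscapeDataString(user)
                    + "&password=" + Uri.EscapeDataString(password)
                    + "&answer=" + Uri.EscapeDataString(answer);
        return Encoding.UTF8.GetBytes(body);
    }

    // Sends the bytes to the page the form points to and returns the response text.
    public static string Post(string url, byte[] body)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length;
        request.CookieContainer = new CookieContainer(); // collects any session cookie

        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Only shows the encoded body; calling Post needs a real form URL.
        byte[] body = BuildFormBody("user name", "secret", "typed answer");
        Console.WriteLine(Encoding.UTF8.GetString(body));
    }
}
```

The word art answer still has to be typed in by a person, so this only covers the mechanics of sending the request.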

    Antone Eason
    #5

    Hi Phil, Thanks for the quick reply! I will take a look at Fiddler. I am mainly an assembly language programmer, so I do not know much about scripting languages. If Fiddler can write the web page to a file, then it should work! Please understand that the login is not an issue anymore: I can log in fine and leave the Opera browser open with refresh set to 30 seconds. Then, as the page refreshes with new data, I want to write it to a file. Next, my program runs and checks the raw data to see whether what I am looking for is there. If it is, the bell will ring; if not, the program ends and waits for the next HTML file to check. Thanks, Antone

      Antone Eason
      #6, in reply to PJWindsor

      Hi Phil, Sorry, Fiddler is a no-go. It does not write data to a file. Thanks, Antone

      • Antone Eason wrote:

        It is not clear to me why it is rude. I am asking on different forums because the people on one forum do not necessarily see my question on the others. So if my question may be answered in Perl and not in C#, the Perl guys would never see it. If this is not correct, let me know. I have not been on here in many years. Thanks, Antone

        N a v a n e e t h
        #7

        Antone Eason wrote:

        Not clear why it is rude??? I am asking different forums and those that do not necessarily see my question from other forums

        Well, in such cases, you can post the question in the most appropriate forum and link to it when posting in the other forums. Putting the same text in all places is considered rude. To answer your question: you said the page you want to download requires authentication and some word art. Do you mean a CAPTCHA? If yes, getting past the authentication is tough, as you can't predict the value of the CAPTCHA. :)

        Navaneeth How to use google | Ask smart questions

          Antone Eason
          #8, in reply to #7

          OK, thanks, I will look into linking for future posts. As for my question, I do not need to authenticate or worry about login credentials. I am already logged in; the page just changes every few minutes. Think of it as simply being able to click the [view source] button on whatever web page happens to be in the browser, and then having the source (i.e. the raw HTML) written to a text file. That is what I need: a script or program that acts as the view source button, which I can invoke inside a batch file that the scheduler calls every 5 minutes or so to check for the latest update. Thanks, Antone

            PJWindsor
            #9, in reply to #6

            How about using the WebBrowser control on a Windows Form? I'm not sure, but I think you should be able to browse to the page you want and then write some code to get the HTML of the current page from the WebBrowser control. If that doesn't work, you can look at automating IE directly. I forget the DLL name, it's something like SHDocVw, and it contains the interfaces for automating IE directly. From there you can get the current document.

              Antone Eason
              #10, in reply to #9

              Thanks for the input, but I do not believe I have the knowledge to do that. I was hoping that this had been done before. The IE automation sounds interesting. Thanks, Antone

                PJWindsor
                #11, in reply to #10

                If you want to learn it, it's called Platform Invoke in .NET; it's all about using the existing Windows DLLs from .NET. Here is a good article to get started, which shows you how to use some of the functions in User32.dll: [PInvoke on CodeProject]. This website lists a lot of the Windows functions with code to call them: http://www.pinvoke.net/ Phil
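To give a feel for what Platform Invoke looks like, here is a minimal sketch that calls one User32.dll function, `MessageBeep`, which could serve as the "bell" mentioned earlier in the thread. The class name `NativeBell` and the operating system guard are illustrative assumptions. Note also that driving IE through SHDocVw is normally done with COM interop rather than P/Invoke, so this shows the general technique, not IE automation itself.

```csharp
using System;
using System.Runtime.InteropServices;

public static class NativeBell
{
    // Declaration of the native function exported by user32.dll.
    [DllImport("user32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool MessageBeep(uint uType);

    // Plays the simple default beep. Returns false when not running on Windows,
    // where user32.dll is not available, or when the native call fails.
    public static bool Ring()
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            return false;
        }
        return MessageBeep(0xFFFFFFFF); // 0xFFFFFFFF requests the simple beep
    }

    public static void Main()
    {
        Console.WriteLine(Ring() ? "Beep requested." : "Beep not available.");
    }
}
```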

                  PJWindsor
                  #12, in reply to #10

                  It is possible to get the HTML from a WebBrowser control; see the code below (sorry if you don't know C#). I just put a text box on the form to enter the URL, a button to navigate to the URL (btnLoadWebsite_Click handles this button), and a button to get the HTML from the currently displayed page (btnGetHtml_Click handles this).

                  private void btnLoadWebsite_Click(object sender, EventArgs e)
                  {
                      try
                      {
                          // Get the URL from the text box; it must be an absolute URL such as http://www.google.com
                          Uri oURL = new Uri(txtURL.Text);
                          wbMain.Navigate(oURL); // Navigate to the URL
                      }
                      catch (Exception err)
                      {
                          MessageBox.Show(err.ToString());
                      }
                  }

                  private void btnGetHtml_Click(object sender, EventArgs e)
                  {
                      MessageBox.Show(wbMain.DocumentText); // Display the HTML
                  }
                  

                  The important bit is wbMain.DocumentText. wbMain is the WebBrowser control, and DocumentText is the property that holds the HTML text of the current page. Phil

                    Antone Eason
                    #13, in reply to #11

                    Hi Phil, Thanks for all your assistance. I really thought this would have been done in the past. I will take a look at all the info you sent me. I am still waiting to hear back from the Perl guys; I really thought you could do this in Perl. Thanks, Antone

                      Antone Eason
                      #14, in reply to #7

                      Hi - Yes, that is the security used. Thanks for your input and quick response. Thanks, Antone

                        N a v a n e e t h
                        #15, in reply to #8

                        Look at the WebClient class and its DownloadString method. It does exactly what you are looking for. Try the following steps:
                        1 - Create a console application that quits automatically once the job is done.
                        2 - Inside it, use the WebClient class to request the web page. This will give you the raw HTML.
                        3 - Schedule this exe to run every 5 minutes to get the latest updates.
                        Hope that helps :)

                        Navaneeth How to use google | Ask smart questions
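The three steps above could be sketched roughly as follows. This is a minimal sketch under stated assumptions: the URL, the output file name `page.html`, the marker text `ALERT` and the class name `PageWatcher` are all placeholders, not values from this thread. One caveat: WebClient makes its own request and does not share a browser's login session, so for a page behind a login it may not receive the same HTML the browser shows.

```csharp
using System;
using System.IO;
using System.Net;

public static class PageWatcher
{
    // Returns true when the downloaded HTML contains the text being watched for.
    public static bool ContainsMarker(string html, string marker)
    {
        return html.IndexOf(marker, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    // Exit codes: 0 = marker found, 1 = marker not found, 2 = download failed.
    public static int Main(string[] args)
    {
        // All three defaults are placeholders (assumptions).
        string url = args.Length > 0 ? args[0] : "http://example.com/";
        string outputPath = args.Length > 1 ? args[1] : "page.html";
        string marker = args.Length > 2 ? args[2] : "ALERT";

        try
        {
            using (var client = new WebClient())
            {
                string html = client.DownloadString(url); // raw HTML of the page
                File.WriteAllText(outputPath, html);      // keep a copy for other tools

                if (ContainsMarker(html, marker))
                {
                    Console.Write('\a');                  // ring the console bell
                    return 0;
                }
                return 1;
            }
        }
        catch (WebException ex)
        {
            Console.Error.WriteLine("Download failed: " + ex.Message);
            return 2;
        }
    }
}
```

Because the program exits on its own, it can be run from a batch file or the Windows Task Scheduler every 5 minutes, and the exit code tells the caller whether the marker was found. On recent .NET versions WebClient is marked obsolete in favour of HttpClient, though it still works.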

                          Antone Eason
                          #16, in reply to #15

                          Hi - yes, that sounds like an excellent solution. I will try it; I am just a little short on time. Thanks again! Antone
