Need to capture data of currently displayed webpage

Antone Eason wrote (#1):

Greetings, I would like to know if anyone has any code (ASP.NET, VBScript, Perl, etc.) that can be run from a command line to capture whatever page is currently displayed in a browser and write the raw HTML to a file; MHT would also be fine. I know about using Navigate and the like for a specific web page. This is different. The page I go to requires authentication with word art, a user name, and a password. Once logged in, you must click a few times to get to the data. Therefore, running a script that directs me to a certain web page will not work. The page has to be displayed first and then downloaded. So I just want to take whatever is currently in the browser, mainly to catch refreshes of new data. The page I am getting data from is generated from a database, so I cannot simply run a script to download it. The program must simply be activated and write to disk whatever is displayed in the browser at the time it runs. I have a separate data extractor to run against the new file to find the data I need. The file can always have the same name and be overwritten. Thanks, Antone

Manas Bhardwaj wrote (#2, in reply to #1):

This is considered rude. You already asked this question in the C# forum.

Manas Bhardwaj
Please remember to rate helpful or unhelpful answers; it lets us and people reading the forums know if our answers are any good.

modified on Wednesday, July 8, 2009 10:08 AM

Antone Eason wrote (#3, in reply to #2):

It is not clear to me why it is rude. I am asking in different forums because the people there do not necessarily see my question in the other forums. If my question can be answered in Perl but not in C#, then the Perl guys will never see it. If this is not correct, let me know. I have not been on here in many years. Thanks, Antone

PJWindsor wrote (#4, in reply to #1):

Hi, have you tried Fiddler? It's an HTTP debugging tool that logs what's going on as you browse; you can see the HTML and JS files downloaded and a lot more, so it might help. If that doesn't do the trick, look at the WebRequest object, which can be used to get the HTTP response. Since you said there is a login screen, it may be harder, but you can do your own POST by converting the username, password, and word art answer into a byte array and sending it with your request to whatever page the form points to. Try here: http://msdn.microsoft.com/en-us/library/system.net.webrequest.aspx Phil
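A minimal sketch of the POST described above, assuming the login form accepts ordinary URL-encoded fields. The class name `FormPost` and any field names passed to it are hypothetical; the real names have to be read from the login page's HTML. `WebRequest.Create` is flagged as obsolete on recent .NET versions but still compiles there with a warning.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;

static class FormPost
{
    // Encode name/value pairs as an application/x-www-form-urlencoded byte array.
    public static byte[] BuildBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var sb = new StringBuilder();
        foreach (var field in fields)
        {
            if (sb.Length > 0) sb.Append('&');
            sb.Append(Uri.EscapeDataString(field.Key))
              .Append('=')
              .Append(Uri.EscapeDataString(field.Value));
        }
        return Encoding.UTF8.GetBytes(sb.ToString());
    }

    // POST the body to the form's target URL and return the response text.
    // Cookies set by the server are stored in the caller's container so that
    // later requests can reuse the same session.
    public static string Post(string url, byte[] body, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length;
        request.CookieContainer = cookies;

        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Passing the same `CookieContainer` to follow-up requests is what would keep the session alive after the login POST. The word art answer still has to be typed in by a person for each login, so this only automates the transport, not the challenge itself.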

Antone Eason wrote (#5, in reply to #4):

Hi Phil, thanks for the quick reply! I will take a look at Fiddler. I am mainly an assembly language programmer, so I do not know much about scripting languages. If Fiddler can write the web page to a file, then it should work! Please understand that the login is not an issue anymore; I can log in fine and leave the Opera browser on with refresh set to 30 seconds. Then, as the page refreshes with new data, I want to write it to a file. Next, my program runs and checks the raw data to see if what I am looking for is there. If it is, the bell rings; if not, the program ends and waits for the next HTML file to check. Thanks, Antone

Antone Eason wrote (#6, in reply to #4):

Hi Phil, sorry, Fiddler is a no-go. It does not write data to a file. Thanks, Antone

N a v a n e e t h wrote (#7, in reply to #3):

Antone Eason wrote:

Not clear why it is rude??? I am asking different forums and those that do not necessary see my question from other forums

Well, in such cases, you can post the question in the most appropriate forum and link to it when posting in the other forums. Putting the same text in all places is considered rude. To answer your question: you said the page you want to download requires authentication and some word art. Do you mean a CAPTCHA? If so, getting past the authentication is tough, as you can't predict the value of the CAPTCHA. :)

Navaneeth How to use google | Ask smart questions

Antone Eason wrote (#8, in reply to #7):

OK, thanks. I will look into linking for future posts. As for my question, I do not need to authenticate or worry about login credentials. I am already logged in fine; the page just changes every few minutes. Think of it as simply clicking the [view source] button on whatever web page happens to be in the browser, with the source (the raw HTML) then written to a text file. That is what I need: a script or program that acts as the view source button, which I can invoke from a batch file called by the scheduler every 5 minutes or so to check for the latest update. Thanks, Antone

PJWindsor wrote (#9, in reply to #6):

How about using the WebBrowser control on a Windows form? I'm not sure, but I think you should be able to browse to the page you want and then write some code to get the HTML of the current page from the WebBrowser control. If that doesn't work, you can look at automating IE directly. I forget the DLL name, it's something like SHDocVw, and it contains the interfaces for automating IE. From there you can get the current document.

Antone Eason wrote (#10, in reply to #9):

Thanks for the input, but I do not believe I have the knowledge to do that. I was hoping that this had been done before. The IE automation sounds interesting. Thanks, Antone

PJWindsor wrote (#11, in reply to #10):

If you want to learn it, it's called Platform Invoke in .NET; it's all about using the existing Windows DLLs from .NET. Here is a good article to get started, which shows you how to use some of the functions in User32.dll: [PInvoke on CodeProject]. This website lists a lot of the Windows functions with code to call them: http://www.pinvoke.net/ Phil

PJWindsor wrote (#12, in reply to #10):

It is possible to get the HTML from a WebBrowser control; see the code below (sorry if you don't know C#). I just put a text box on the form to enter the URL, a button to navigate to the URL (handled by btnLoadWebsite_Click), and a button to get the HTML of the currently displayed page (handled by btnGetHtml_Click).

    private void btnLoadWebsite_Click(object sender, EventArgs e)
    {
        try
        {
            // Read the URL from the text box; it must be absolute, e.g. http://www.google.com
            Uri oURL = new Uri(txtURL.Text);
            wbMain.Navigate(oURL); // Navigate to the URL
        }
        catch (Exception err)
        {
            MessageBox.Show(err.ToString());
        }
    }

    private void btnGetHtml_Click(object sender, EventArgs e)
    {
        MessageBox.Show(wbMain.DocumentText); // Display the HTML
    }

The important bit is wbMain.DocumentText. wbMain is the WebBrowser control, and DocumentText is the property that holds the HTML text of the current page. Phil
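Since the goal in this thread is a file on disk rather than a message box, the same property can be written out instead of displayed. This is a minimal sketch; the class name `PageSaver` and the output path in the usage note are assumptions, not part of the code above.

```csharp
using System.IO;
using System.Text;

static class PageSaver
{
    // Write the HTML to the given path, replacing any previous capture
    // so the file always has the same name.
    public static void Save(string html, string path)
    {
        File.WriteAllText(path, html, Encoding.UTF8);
    }
}
```

Inside the form, the click handler would then call something like `PageSaver.Save(wbMain.DocumentText, @"C:\capture\page.html");` where the path is only a placeholder. Because `File.WriteAllText` overwrites an existing file, the separate extractor can keep reading the same file name after every capture.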

Antone Eason wrote (#13, in reply to #11):

Hi Phil, thanks for all your assistance. I really thought this would have been done in the past. I will take a look at all the info you sent me. I am still waiting to hear back from the Perl guys; I really thought you could do this in Perl. Thanks, Antone

Antone Eason wrote (#14, in reply to #7):

Hi, yes, that is the security used. Thanks for your input and quick response. Thanks, Antone

N a v a n e e t h wrote (#15, in reply to #8):

Look at the WebClient class and its DownloadString method. It does exactly what you are looking for. Try the following steps:
1 - Create a console application that quits automatically once the job is done.
2 - Inside it, use the WebClient class to request the web page. This will give you the raw HTML.
3 - Schedule this exe to run every 5 minutes to get the latest updates.
Hope that helps :)
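The steps above can be sketched as a small console program. This is an illustration under assumptions: the class name `PageFetcher`, the argument order, and the exit codes are made up for the example. `WebClient` issues its own request rather than reading what the browser shows, so a page behind a login would also need the session cookies supplied, which this sketch does not attempt.

```csharp
using System;
using System.IO;
using System.Net;

static class PageFetcher
{
    // Download the page at url and overwrite outputPath with the raw HTML.
    public static void FetchToFile(string url, string outputPath)
    {
        using (var client = new WebClient())
        {
            string html = client.DownloadString(url);
            File.WriteAllText(outputPath, html);
        }
    }

    // Usage: PageFetcher <url> <output-file>
    // The program exits as soon as the file is written, so a scheduler can rerun it.
    static int Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.Error.WriteLine("usage: PageFetcher <url> <output-file>");
            return 1;
        }

        try
        {
            FetchToFile(args[0], args[1]);
            return 0;
        }
        catch (WebException ex)
        {
            Console.Error.WriteLine("Download failed: " + ex.Message);
            return 2;
        }
    }
}
```

For step 3, the Windows Task Scheduler can rerun the exe on an interval, for example with `schtasks /Create /SC MINUTE /MO 5 /TN CapturePage /TR "<path to exe> <url> <output-file>"`, where the task name and the arguments are placeholders.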

Antone Eason wrote (#16, in reply to #15):

Hi, yes, that sounds like an excellent solution. I will try it; I am just a little short on time. Thanks again! Antone
