Code Project

Need to capture data of currently displayed webpage

Forum: ASP.NET
Tags: csharp, perl, html, asp-net, database
    Antone Eason wrote:

    Greetings,

    Does anyone have code in ASP.NET, VBScript, Perl, etc. that can simply be run from a command line to download whatever page is currently displayed in a browser and write the raw HTML to a file? MHT would also be fine. I know about using Navigate etc. for a specific web page; this is different. The page I go to requires authentication with word art, a user name and password, etc. Once there, you must click a few times to get to the data. Therefore, running a script that directs me to a certain web page will not work. The page has to be displayed first and then downloaded.

    So I just want to capture whatever is currently in the browser, mainly to catch refreshes of new data. The page I am getting data from is generated from a database, so I cannot simply run a script to download it. The program must simply be activated and write to disk whatever is displayed in the browser at the time it is run. I have a separate data extractor to run against the new file to find the data I need. The file can always have the same name and be overwritten.

    Thanks, Antone

    PJWindsor wrote (#4):

    Hi, have you tried Fiddler? It's an HTTP debugging tool that logs what's going on as you are browsing; you can see the HTML and JS files downloaded and a lot more, so it might help. If that doesn't do the trick, look at the WebRequest object, which can be used to get the HTTP response. As you said there is a login screen, it may be harder, but you can do your own POST by converting the user name, password, and word-art answer into a byte array and sending it with your request to whatever page the form points to. Try here: http://msdn.microsoft.com/en-us/library/system.net.webrequest.aspx Phil
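The POST described above might look roughly like the following C# sketch, assuming the login is a plain HTML form. The URL passed on the command line and the field names (`username`, `password`, `captcha`) are hypothetical placeholders; the real names would have to be read from the login form's markup. The `CookieContainer` is an added assumption so that a session cookie set by the login can be reused for later requests, and the word-art answer still has to be typed in by a person for each login.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

static class FormLogin
{
    // Builds an application/x-www-form-urlencoded body and returns it as a byte array.
    // The field names are hypothetical; use the names from the real login form.
    public static byte[] BuildBody(string user, string password, string captchaAnswer)
    {
        string form = "username=" + Uri.EscapeDataString(user)
                    + "&password=" + Uri.EscapeDataString(password)
                    + "&captcha=" + Uri.EscapeDataString(captchaAnswer);
        return Encoding.UTF8.GetBytes(form);
    }

    // Sends the body with a POST and returns the response text (the HTML of the resulting page).
    public static string Post(string formUrl, byte[] body, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(formUrl);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length;
        request.CookieContainer = cookies; // keeps any session cookie the server sets

        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }

        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}

static class Program
{
    static void Main(string[] args)
    {
        if (args.Length != 4)
        {
            Console.Error.WriteLine("usage: FormLogin <form-url> <user> <password> <captcha-answer>");
            return;
        }

        var cookies = new CookieContainer();
        byte[] body = FormLogin.BuildBody(args[1], args[2], args[3]);
        Console.WriteLine(FormLogin.Post(args[0], body, cookies));
    }
}
```

Passing the same `CookieContainer` to later requests is what would let them count as part of the logged-in session, provided the site tracks the session with cookies.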

      Antone Eason wrote (#5), in reply to PJWindsor (#4):

      Hi Phil, Thanks for the quick reply! I will take a look at Fiddler. I am mainly an assembly language programmer, so I do not know much about scripting languages. If Fiddler can write the web page to a file, then it should work! Please understand that the login is not an issue anymore: I can log in fine and leave the Opera browser on with refresh set to 30 seconds. Then, as the page refreshes with new data, I want to write it to a file. Next, my program runs and checks the raw data to see whether what I am looking for is there; if it is, the bell will ring, and if not, it ends and waits for the next HTML file to check. Thanks, Antone

        Antone Eason wrote (#6), in reply to PJWindsor (#4):

        Hi Phil, Sorry, Fiddler is a no-go. It does not write data to a file. Thanks, Antone

          Antone Eason wrote:

          Not clear why it is rude??? I am asking in different forums because those who read one do not necessarily see my question in the others. So if my question may be answered in Perl and not in C#, the Perl guys will never see it. If this is not correct, let me know. I have not been on here in many years. Thanks, Antone

          N a v a n e e t h wrote (#7), in reply to Antone Eason:

          Antone Eason wrote:

          Not clear why it is rude??? I am asking different forums and those that do not necessary see my question from other forums

          Well, in such cases, you can post the question in the most appropriate forum and link to it when posting in the other forums. Putting the same text in all places is considered rude. To answer your question: you said the page you want to download requires authentication and some word art. Do you mean a CAPTCHA? If so, getting past the authentication is tough, as you can't predict the value of the CAPTCHA. :)

          Navaneeth How to use google | Ask smart questions

            Antone Eason wrote (#8), in reply to N a v a n e e t h (#7):

            OK, thanks. I will look into linking for future posts. As for my question, I do not need to authenticate or worry about login credentials. I am already logged in; the page just changes every few minutes. Think of it as simply being able to click the [view source] button on whatever web page happens to be in the browser, with the source (i.e. the raw HTML) then written to a text file. That is what I need: a script or program that acts as the view source button, which I can invoke from a batch file called by the scheduler every 5 minutes or so to check for the latest update. Thanks, Antone

              PJWindsor wrote (#9), in reply to Antone Eason (#6):

              How about using the WebBrowser control on a Windows form? I'm not sure, but I think you should be able to browse to the page you want and then write some code to get the HTML of the current page from the WebBrowser control. If that doesn't work, you can look at automating IE directly. I forget the DLL name, something like SHDocVw, but it contains the interfaces for automating IE. From there you can get the current document.

                Antone Eason wrote (#10), in reply to PJWindsor (#9):

                Thanks for the input, but I do not believe I have the knowledge to do that. I was hoping that this has been done before. The IE automation sounds interesting. Thanks, Antone

                  PJWindsor wrote (#11), in reply to Antone Eason (#10):

                  If you want to learn it, it's called Platform Invoke in .NET. It's all about using the existing Windows DLLs from .NET. Here is a good article to get started; it shows you how to use some of the functions in User32.dll: [PInvoke on CodeProject]. This website lists a lot of the Windows functions with code to use them: http://www.pinvoke.net/ Phil

                    PJWindsor wrote (#12), in reply to Antone Eason (#10):

                    It is possible to get the HTML from a WebBrowser control; see the code below (sorry if you don't know C#). I just put a text box on the form to enter the URL, a button to navigate to the URL (handled by btnLoadWebsite_Click), and a button to get the HTML of the currently displayed page (handled by btnGetHtml_Click).

                    private void btnLoadWebsite_Click(object sender, EventArgs e)
                    {
                        try
                        {
                            // Read the URL from the text box; it must be absolute, e.g. http://www.google.com
                            Uri oURL = new Uri(txtURL.Text);
                            wbMain.Navigate(oURL); // Navigate to the URL
                        }
                        catch (Exception err)
                        {
                            MessageBox.Show(err.ToString());
                        }
                    }

                    private void btnGetHtml_Click(object sender, EventArgs e)
                    {
                        MessageBox.Show(wbMain.DocumentText); // Display the HTML
                    }
                    

                    The important bit is wbMain.DocumentText: wbMain is the WebBrowser control, and DocumentText is the property that holds the HTML text of the current page. Phil

                      Antone Eason wrote (#13), in reply to PJWindsor (#11):

                      Hi Phil, Thanks for all your assistance. I really thought this would have been done in the past. I will take a look at all the info you sent me. Still waiting to hear back from the Perl guys; I really thought you could do this in Perl. Thanks, Antone

                        Antone Eason wrote (#14), in reply to N a v a n e e t h (#7):

                        Hi, yes, that is the security used. Thanks for your input and quick response. Thanks, Antone

                          N a v a n e e t h wrote (#15), in reply to Antone Eason (#8):

                          Look at the WebClient class and its DownloadString method. It does exactly what you are looking for. Try the following steps:

                          1 - Create a console application that quits automatically once the job is done.
                          2 - Inside it, use the WebClient class to request the web page. This will give you the raw HTML.
                          3 - Schedule this exe to run every 5 minutes to get the latest updates.

                          Hope that helps :)
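The steps above can be sketched in C# roughly as follows. This is a minimal sketch, not a tested solution for the site in question: the class name `PageSnapshot` and the two command-line arguments are illustrative choices. Note that `WebClient` issues its own fresh request; it does not read from an open browser window or share that browser's login session, so a page behind a login would also need the session cookies sent with the request.

```csharp
using System;
using System.IO;
using System.Net;

static class PageSnapshot
{
    // Downloads the page at url and overwrites outputPath with the raw HTML.
    public static void Save(string url, string outputPath)
    {
        using (var client = new WebClient())
        {
            string html = client.DownloadString(url);
            File.WriteAllText(outputPath, html);
        }
    }

    // Runs once and exits, so a scheduler can start it again for every check.
    static int Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.Error.WriteLine("usage: PageSnapshot <url> <output-file>");
            return 2;
        }

        try
        {
            Save(args[0], args[1]);
            return 0;
        }
        catch (WebException ex)
        {
            Console.Error.WriteLine("Download failed: " + ex.Message);
            return 1;
        }
    }
}
```

Because the output file is overwritten on every run, it always has the same name, which suits a separate extractor that reads it afterwards. On Windows, the built exe could be registered with Task Scheduler to run every 5 minutes, for example with `schtasks /Create /SC MINUTE /MO 5` plus a task name (`/TN`) and the command to run (`/TR`).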

                          Navaneeth How to use google | Ask smart questions

                            Antone Eason wrote (#16), in reply to N a v a n e e t h (#15):

                            Hi, yes, that sounds like an excellent solution. I will try it; I am just a little short on time. Thanks again! Antone
