HTML Page download in C#

    makumazan84
    wrote
    #1

    Hi, all! Here is the problem: I'm writing some code that fetches an HTML page and then extracts its content. The content is split across multiple pages, and navigation between them is done by clicking a page number below the records. (For example, 150 records are displayed as 10 pages of 15 records each, so the web page contains 10 hyperlinks to the other pages of records.) To get all the information I need, I have to loop through all the page links, download their HTML, and then parse it. The problem is that I can only download 2 pages from the list: for some unknown reason, my code freezes after it downloads 2 pages. The order of the pages does not matter; for example, if I start from page #5, I can only get pages 5 and 6. According to common sense and the VS debugger, the problem lies in the method that downloads the HTML:

    // Delegate declared as a class member; used to run the page download asynchronously.
    public delegate byte[] getHTTPdelegate(Uri address);

    public void downloadPage(string URL)
    {
        // Create a new WebClient (client is a class field).
        client = new WebClient();
        // Assign the download method to the delegate.
        getHTTPdelegate dl = client.DownloadData;

        // Start the async download.
        IAsyncResult ar = dl.BeginInvoke(new Uri(URL), null, null);

        while (!ar.IsCompleted)
        {
            Thread.Sleep(10);
        }

        // rawPage (a class field) receives the HTML as byte[], the result of the async download.
        rawPage = dl.EndInvoke(ar); // this is also the line where the exception occurs
    }

    After downloading page #2, the application stops, and after a minute or two it throws an unhandled exception stating that the operation has timed out. Please note that the question is not "why isn't it working?" but "why does it work only 2 times?" when it should be downloading all the pages. Any ideas will be highly appreciated.

      Lost User
      wrote
      #2

      The sleep loop is a bad idea, and the download is not really asynchronous, because you're just waiting for it to finish. Did you know that WebClient has a method called DownloadDataAsync? I don't know why it works exactly twice, though.
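
For readers unfamiliar with the method mentioned above, here is a minimal sketch of how `WebClient.DownloadDataAsync` is typically wired up through the `DownloadDataCompleted` event. The class name `AsyncDownloadSketch` and the `onPage`/`onError` callbacks are hypothetical names chosen for illustration, not part of the original code, and the sketch is not claimed to fix the two-page problem.

```csharp
using System;
using System.Net;

public static class AsyncDownloadSketch
{
    // Starts a non-blocking download and returns immediately.
    // onPage and onError are hypothetical callbacks supplied by the caller.
    public static void DownloadPage(string url, Action<byte[]> onPage, Action<Exception> onError)
    {
        WebClient client = new WebClient();

        // Raised once the download finishes, fails, or is cancelled.
        client.DownloadDataCompleted += (sender, e) =>
        {
            try
            {
                if (e.Cancelled)
                    onError(new OperationCanceledException());
                else if (e.Error != null)
                    onError(e.Error);   // e.Result would throw here, so it is not touched
                else
                    onPage(e.Result);   // the downloaded bytes
            }
            finally
            {
                // Release the client once this request is done.
                client.Dispose();
            }
        };

        client.DownloadDataAsync(new Uri(url));
    }
}
```

The caller continues working while the download runs, so no `Thread.Sleep` polling loop is needed; parsing would happen inside the `onPage` callback.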

        makumazan84
        wrote
        #3

        Thanks for the reply, Harlod. I know about DownloadDataAsync, but I hadn't tried it. I will try it now and post the result. UPDATE: I've implemented the download via DownloadDataAsync, but the 2-page problem remains :-( The same exception was thrown, too.

        modified on Saturday, June 26, 2010 10:11 AM
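
The thread ends without a resolution. One cause that would fit the symptoms (exactly two downloads succeed, then a timeout) is the .NET Framework default of two concurrent connections per host, exposed as `ServicePointManager.DefaultConnectionLimit`, combined with connections that are never released somewhere in the program. That is an assumption; nothing in the thread confirms it. Under that assumption, a minimal sketch would download the pages one at a time, dispose each `WebClient` as soon as it is finished, and optionally raise the limit. The class name `SequentialDownloadSketch` and the limit value of 10 are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class SequentialDownloadSketch
{
    // Downloads every URL in order and returns the raw bytes of each page.
    public static List<byte[]> DownloadAll(IEnumerable<string> urls)
    {
        // Assumption: the per-host connection limit is the bottleneck.
        // Raising it only hides a leak; releasing connections is the real remedy.
        ServicePointManager.DefaultConnectionLimit = 10;

        List<byte[]> pages = new List<byte[]>();
        foreach (string url in urls)
        {
            // "using" disposes the client after each page, so nothing
            // created here outlives its own download.
            using (WebClient client = new WebClient())
            {
                pages.Add(client.DownloadData(new Uri(url)));
            }
        }
        return pages;
    }
}
```

If other parts of the program open `HttpWebRequest` responses or streams against the same host, those would also need to be closed, since an unclosed response keeps its connection occupied.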
