Downloading PDFs from a website

    Tom Wright wrote (#1):

    I wrote a program that finds all links to PDFs on a site and downloads them. However, when I look at the downloaded PDFs, they are all the same size and corrupted. If I download them via IE, they are different sizes and open just fine. My download code is:

        WebClient wc = new WebClient();
        wc.DownloadFile("http://www.somesite.com/somepage/somefile.pdf", @"c:\site\somefile.pdf");

    I get no errors. I'm not sure if this makes a difference, but in the page source the link to the PDF is href="/somepage/somefile.pdf", so I prepend the hostname to the URL. I know I can use an HttpWebRequest, but how will I know how big to make my buffer? Suggestions? Thanks

    Tom Wright tawright915@gmail.com
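
On the buffer question above: the buffer does not have to match the file size. A minimal sketch, assuming the data arrives as a `Stream` (for an `HttpWebRequest` that would be the stream returned by `GetResponseStream()` on the response). The class name `StreamCopy`, the method name `CopyAll`, and the 8192-byte default are illustrative choices, not something from the thread.

```csharp
using System;
using System.IO;

static class StreamCopy
{
    // Copies everything from source to destination through a fixed-size buffer.
    // Read reports how many bytes it actually delivered and returns 0 at the
    // end of the stream, so the total size never has to be known in advance.
    public static long CopyAll(Stream source, Stream destination, int bufferSize = 8192)
    {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Write only the bytes that were read in this pass.
            destination.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

The same loop works whether the destination is a `FileStream` or a `MemoryStream`; the return value is the number of bytes copied.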

      Covean wrote (#2):

      Have you looked at what is in those corrupted PDFs? You may find an HTML page telling you that the file wasn't found on the server, or that you don't have the right to access the file directly.
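
One way to automate the check suggested here is to look at the first bytes of each downloaded file: a PDF normally begins with the ASCII signature `%PDF-`, while an error or logon page will begin with HTML. This is a rough sketch only; the class and method names are hypothetical, and it assumes the signature sits at the very start of the file.

```csharp
using System;
using System.IO;
using System.Text;

static class PdfCheck
{
    // Returns true when the data starts with the "%PDF-" signature.
    public static bool LooksLikePdf(byte[] data)
    {
        byte[] magic = Encoding.ASCII.GetBytes("%PDF-");
        if (data == null || data.Length < magic.Length) return false;
        for (int i = 0; i < magic.Length; i++)
        {
            if (data[i] != magic[i]) return false;
        }
        return true;
    }

    // Reads only the first five bytes of the file and checks them.
    public static bool FileLooksLikePdf(string path)
    {
        byte[] head = new byte[5];
        using (FileStream fs = File.OpenRead(path))
        {
            int read = fs.Read(head, 0, head.Length);
            if (read < head.Length) return false;
        }
        return LooksLikePdf(head);
    }
}
```

Calling `PdfCheck.FileLooksLikePdf` on each file right after the download would flag the case where the server returned a page instead of the document.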

        Tom Wright wrote (#3):

        You're right. I opened it with a hex editor and found HTML. I renamed the extension to .html, and it's the logon page. So even though I logged on outside of my app and checked the "remember me" box, my app does not use that cookie. How do I pass the username and password from my app to grab the file? Thanks

          Covean wrote (#4):

          Accessing files on a website that stores the login in a cookie is hard. You have to find the cookie the website saved on your computer and send it in the HTTP request header. (I don't know another way, unless the website allows logging in via query string, e.g. data.aspx?user=abc&pwd=pwd.)
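
Sending a cookie in the request header could look roughly like the sketch below, which sets the `Cookie` header on a `WebClient` before the download. The helper name `CreateClient` and the cookie string `session=abc` are placeholders; the real cookie name and value depend on the site and have to be taken from the browser.

```csharp
using System.Net;

static class CookieDownload
{
    // Creates a WebClient with the Cookie request header already set.
    // cookieHeader is a "name=value" string (hypothetical here).
    public static WebClient CreateClient(string cookieHeader)
    {
        WebClient wc = new WebClient();
        wc.Headers.Add(HttpRequestHeader.Cookie, cookieHeader);
        return wc;
    }
}
```

A download would then be something like `CookieDownload.CreateClient("session=abc").DownloadFile(url, path)`. Setting the header again before each call is the cautious option, and whether the site accepts a copied cookie at all is an assumption that has to be tested.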

            Tom Wright wrote (#5):

            If I loaded the HTML into a browser object in my app and the end user logged on there, would those credentials carry over to my app? Hope this makes sense.

              Covean wrote (#6):

              I think those credentials will only work within the scope of your app's browser object. But you can download those PDFs if you get your browser object to do it for you. (I built an app like yours some time ago and had the same problem, but I got my browser object to perform the steps I wanted to automate, though not downloading.)
