Downloading PDFs from a website
-
I wrote a program that finds all links to PDFs on a site and downloads them. However, the downloaded PDFs are all the same size and corrupted. If I download them via IE, they are different sizes and open just fine. My download code is:
WebClient wc = new WebClient();
wc.DownloadFile("http://www.somesite.com/somepage/somefile.pdf", @"c:\site\somefile.pdf");
I get no errors. I'm not sure if this makes a difference, but in the page source the link to the PDF is href="/somepage/somefile.pdf", so I prepend the hostname to build the full URL. I know I can use an HttpWebRequest instead, but how will I know how big to make my buffer? Suggestions? Thanks
Tom Wright tawright915@gmail.com
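On the buffer question: the buffer does not have to match the file size, because the response stream can be copied to disk in fixed-size chunks until it is exhausted. Below is a minimal sketch of that idea, which also shows resolving the relative href against the page URL with the Uri class instead of string concatenation. The page URL (index.html), the class name PdfDownloader, and the 8 KB chunk size are assumptions made for illustration, not something taken from the original program.

```csharp
using System;
using System.IO;
using System.Net;

static class PdfDownloader
{
    // Resolve an href taken from the page (possibly relative) against the page URL.
    public static Uri ResolveLink(string pageUrl, string href)
    {
        return new Uri(new Uri(pageUrl), href);
    }

    // Copy the response to disk in fixed-size chunks.
    // The buffer size is independent of the size of the file.
    public static void Download(Uri source, string targetPath)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(source);
        using (WebResponse response = request.GetResponse())
        using (Stream input = response.GetResponseStream())
        using (FileStream output = File.Create(targetPath))
        {
            byte[] buffer = new byte[8192]; // arbitrary chunk size (assumption)
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
        }
    }

    static void Main()
    {
        // Hypothetical page URL; the href is the one quoted in the question.
        Uri pdf = ResolveLink("http://www.somesite.com/somepage/index.html",
                              "/somepage/somefile.pdf");
        Download(pdf, @"c:\site\somefile.pdf");
    }
}
```

Read returns 0 only when the stream has ended, so the loop ends on its own whatever the file size is.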
-
Have you looked at what is in those corrupted PDFs? You may find an HTML page telling you that the file wasn't found on the server, or that you don't have permission to access the file directly.
-
You're right. I opened one with a hex editor and found HTML. I renamed the extension to .html and it's the logon page. So even though I logged on outside of my app and checked the "remember me" box, my app does not use that cookie. How do I pass the username and password from my app to grab the file? Thanks
Tom Wright tawright915@gmail.com
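The hex-editor check described above can also be done in code, so the program notices when it has saved a logon page instead of a PDF. This is only a sketch: it assumes the "%PDF-" signature sits at the very start of the file, which is the usual case, and the class and method names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

static class PdfCheck
{
    // Returns true if the file starts with the "%PDF-" signature.
    public static bool LooksLikePdf(string path)
    {
        byte[] signature = Encoding.ASCII.GetBytes("%PDF-");
        byte[] header = new byte[signature.Length];

        using (FileStream file = File.OpenRead(path))
        {
            int read = file.Read(header, 0, header.Length);
            if (read < signature.Length)
            {
                return false; // file is too short to be a PDF
            }
        }

        for (int i = 0; i < signature.Length; i++)
        {
            if (header[i] != signature[i])
            {
                return false;
            }
        }
        return true;
    }

    static void Main()
    {
        string path = @"c:\site\somefile.pdf";
        Console.WriteLine(LooksLikePdf(path)
            ? "Looks like a PDF"
            : "Not a PDF (possibly an HTML page)");
    }
}
```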
-
Accessing files on a website that stores the login in a cookie is hard. You have to find the cookie the website saved on your computer and send it in the HTTP request header. (I don't know another way, unless the website allows logging in via the query string, e.g. data.aspx?user=abc&pwd=pwd.)
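One possible way to send the cookie with each request, if the site uses an ordinary HTML login form, is to let the program log in itself and keep the resulting cookie in a CookieContainer. The sketch below subclasses WebClient for that purpose. The login URL and the form field names ("username", "password") are hypothetical; the real ones would have to be read from the logon page's HTML, and sites with extra hidden form fields or tokens will need more than this.

```csharp
using System;
using System.Collections.Specialized;
using System.Net;

// WebClient that keeps cookies between requests.
class CookieAwareWebClient : WebClient
{
    private readonly CookieContainer cookies = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest http = request as HttpWebRequest;
        if (http != null)
        {
            // Attach the shared container so cookies set by the login
            // response are sent again on later requests.
            http.CookieContainer = cookies;
        }
        return request;
    }
}

static class Program
{
    static void Main()
    {
        using (CookieAwareWebClient wc = new CookieAwareWebClient())
        {
            // Hypothetical login URL and field names (assumptions).
            NameValueCollection form = new NameValueCollection();
            form["username"] = "abc";
            form["password"] = "pwd";
            wc.UploadValues("http://www.somesite.com/login", "POST", form);

            // The same client now carries the session cookie.
            wc.DownloadFile("http://www.somesite.com/somepage/somefile.pdf",
                            @"c:\site\somefile.pdf");
        }
    }
}
```

Logging in from the program avoids having to locate the cookie that the browser stored on disk.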
-
If I loaded the HTML into a browser object in my app and the end user logged on there, would those credentials carry over to my app? Hope this makes sense.
Tom Wright tawright915@gmail.com
-
I think those credentials will only work within the scope of your app's browser object. But you can download those PDFs if you get the browser object to do it for you. (I built an app like yours some time ago and had the same problem, but I got my browser object to perform the steps I wanted to automate, though not the downloading.)
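If the browser object is a Windows Forms WebBrowser control, it may be possible to hand its cookies to a WebClient rather than having the control perform the download. The following is a rough sketch under that assumption; the class and method names are invented for illustration. It only sees cookies that the page exposes to script, so it will not work if the site marks its session cookie as HttpOnly.

```csharp
using System;
using System.Net;
using System.Windows.Forms;

static class BrowserSessionDownloader
{
    // Copies the cookies of the page currently shown in the WebBrowser
    // control into a WebClient request and downloads the file with them.
    public static void DownloadWithBrowserCookies(
        WebBrowser browser, string fileUrl, string targetPath)
    {
        if (browser.Document == null)
        {
            throw new InvalidOperationException("No page is loaded in the browser control.");
        }

        // Cookies visible to script for the current document;
        // HttpOnly cookies are not included here.
        string cookieHeader = browser.Document.Cookie;

        using (WebClient wc = new WebClient())
        {
            if (!string.IsNullOrEmpty(cookieHeader))
            {
                wc.Headers[HttpRequestHeader.Cookie] = cookieHeader;
            }
            wc.DownloadFile(fileUrl, targetPath);
        }
    }
}
```

It would be called after the user has logged on in the control, for example from a button click handler, with the page from the same site still loaded.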