Protecting a web site from automatic download
-
Hi there! I was just wondering whether there is some way of blocking offline-browsing software from downloading my entire web site - programs like Web Site Ripper and similar tools that can pull down a complete site. Is there any HTML for that, or some VBScript/JavaScript, or something else? Thanks and regards,
-
Hmmm. This is a tricky one. I'm going to give you the benefit of the doubt and assume that you aren't asking a programming question in the lounge - that it's more about the general philosophy. I suspect that one way to do this would be to detect the user agent coming in and then allow or disallow access as appropriate. If you were doing this in ASP.NET you could do it in an HTTP handler.
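If it were me I'd start with something along these lines - a rough, untested sketch in Global.asax rather than a dedicated handler, and the agent strings in the list are only examples (real rippers can and do fake their user agent):

// Global.asax.cs - block requests from known offline-browser user agents.
// The strings below are examples only; anything can lie about its user agent.
using System;
using System.Web;

public class Global : HttpApplication
{
    private static readonly string[] BlockedAgents =
    {
        "httrack", "webzip", "teleport", "offline explorer"
    };

    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        string agent = (Request.UserAgent ?? string.Empty).ToLowerInvariant();

        foreach (string blocked in BlockedAgents)
        {
            if (agent.Contains(blocked))
            {
                Response.StatusCode = 403;   // Forbidden
                Response.End();              // stop serving this request
                return;
            }
        }
    }
}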
Deja View - the feeling that you've seen this post before.
-
Not really. Depending on the software used, it might list itself in the user agent of the request and you could block that way. I think the most reliable method of blocking it is to block pages that are being sent to the same IP at too fast a rate. If requests are being received one after the other from the same IP with no pause, it is either a search bot or download software. They should be required to follow the same robots.txt file that search engines do, but I doubt they do.
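Something like this could do the throttling in ASP.NET - a rough sketch only, with invented names and thresholds, and shared or NATed IPs will trip it up (as gets pointed out further down):

// Per-IP request throttle sketch. Thresholds, class and field names are
// invented for illustration; a real version needs eviction of old entries
// and a kinder response than a flat refusal.
using System;
using System.Collections.Concurrent;
using System.Web;

public static class RequestThrottle
{
    // remote IP -> (start of current window, request count in that window)
    private static readonly ConcurrentDictionary<string, Tuple<DateTime, int>> Hits =
        new ConcurrentDictionary<string, Tuple<DateTime, int>>();

    private const int MaxRequestsPerWindow = 30;                 // example threshold
    private static readonly TimeSpan Window = TimeSpan.FromSeconds(10);

    // True when this IP has exceeded the allowed request rate.
    public static bool IsTooFast(string ip)
    {
        DateTime now = DateTime.UtcNow;

        Tuple<DateTime, int> entry = Hits.AddOrUpdate(
            ip,
            key => Tuple.Create(now, 1),
            (key, old) => now - old.Item1 > Window
                ? Tuple.Create(now, 1)                           // new window
                : Tuple.Create(old.Item1, old.Item2 + 1));       // same window

        return entry.Item2 > MaxRequestsPerWindow;
    }
}

// In Application_BeginRequest:
//   if (RequestThrottle.IsTooFast(Request.UserHostAddress))
//   {
//       Response.StatusCode = 503;
//       Response.End();
//   }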
Rocky <>< Blog Post: Silverlight goes Beta 2.0 Tech Blog Post: Cheap Biofuels and Synthetics coming soon?
-
Once it's in my browser on my PC there's not much you can do to stop me. I'll get it if I really want it, and if I really want it there are, oh... a hundred ways to get it. I won't need to make a single request from your site; I'll just tip-toe through my cache and pick up all the parts and pieces I want to make a nice little clone. I won't get your server-side code that way, but there are many ways around that as well. What is it you are afraid of? If it's that important... do not put it on the internet, that's all I can tell you. Look at what the music industry is going through because they were stupid enough not to heed that rule. There's no way to stop a determined internet, and the internet is very determined... Good luck though... :-D
-
Rocky Moore wrote:
I think the most reliable method of blocking it is to block pages that are being sent to the same IP at too fast a rate.
Yes, something similar to what Google does. It would make a great article too ;) But it is still not a perfect solution, due to IP spoofing.
-
What you need is a way to encrypt all of the files on the site, and then decrypt them as they're served. The following cost money, but less than $100. http://www.encrypt-html.com/ http://www.protware.com/default.htm http://www.tagslock.com/ Google is your friend.
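For what it's worth, taken literally ("decrypt them as they're served") it could be as little as the handler below - purely a sketch with made-up file naming and placeholder keys, and not how those products actually work (they mostly obfuscate on the client side). The browser still receives plain HTML, so this only keeps the files encrypted at rest on the server; it won't slow a ripper down:

// Sketch only: an IHttpHandler that serves AES-encrypted *.enc files as HTML.
// File layout, key handling and names are invented for illustration.
using System.IO;
using System.Security.Cryptography;
using System.Web;

public class EncryptedPageHandler : IHttpHandler
{
    private static readonly byte[] Key = LoadKeySomehow();   // placeholder
    private static readonly byte[] IV  = LoadIvSomehow();    // placeholder

    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        // e.g. a request for /about.html is served from /about.html.enc on disk
        string path = context.Server.MapPath(context.Request.Path + ".enc");

        using (Aes aes = Aes.Create())
        using (ICryptoTransform decryptor = aes.CreateDecryptor(Key, IV))
        using (FileStream file = File.OpenRead(path))
        using (CryptoStream plain = new CryptoStream(file, decryptor, CryptoStreamMode.Read))
        {
            context.Response.ContentType = "text/html";
            plain.CopyTo(context.Response.OutputStream);
        }
    }

    private static byte[] LoadKeySomehow() { return new byte[32]; }  // placeholder key
    private static byte[] LoadIvSomehow()  { return new byte[16]; }  // placeholder IV
}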
"Why don't you tie a kerosene-soaked rag around your ankles so the ants won't climb up and eat your candy ass..." - Dale Earnhardt, 1997
-----
"...the staggering layers of obscenity in your statement make it a work of art on so many levels." - Jason Jystad, 10/26/2001 -
I believe the OP was asking about blocking someone from making a complete copy of the website, not just that one single page you have in cache. There are plenty of reasons to want to block it: one is to prevent traffic congestion - website bandwidth can be expensive, and serving the entire website, which can be hundreds of megabytes if you include all the art and downloads (if any), to everybody who wants a copy will be expensive as well.
-
Yeah, you can do it a hundred ways. I'm not sure you can stop someone who wants it, though.
-
Rocky Moore wrote:
to block pages that are being sent to the same IP at too fast a rate
Your Chinese visitors won't like you very much. Outside the USA, Network Address Translation is alive and well, because the IPv4 numbering plan gives large US corporations and US and UK government departments 16 million public IP addresses each, while the whole of some Asian countries get a few thousand between them. That said, IANA are holding on to some pretty damn big blocks. Regardless, a unique public IP address is not a good way to distinguish 'unique' users.
DoEvents: Generating unexpected recursion since 1991
-
Rocky Moore wrote:
They should be required to follow the same robots.txt file that search engines do, but I doubt they do.
I use WinHTTrack when I want to grab an offline copy of a game guide. In its settings, it defaults to honouring robots.txt rules, but you can turn that off. Generally I'll leave it on, as most game walkthroughs I download don't use robots.txt files anyway. If it's just a single page, I'll use IE's "Save As" to make an .mht file. Flynn
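P.S. The robots.txt it honours is just a plain text file at the root of the site, something like this - the user-agent token and paths are only examples, and it's purely advisory, so a ripper is free to ignore it:

# robots.txt - advisory only; polite tools (HTTrack by default) honour it
User-agent: HTTrack
Disallow: /

User-agent: *
Disallow: /downloads/
Crawl-delay: 10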
If we can't corrupt the youth of today,
the adults of tomorrow will be no fun...