Web spider legality
-
Hi, I've been working on a web spider for a little while now and have a discussion question about the legality of it. Basically, as it runs for days, weeks, and months, this spider should be making its way through the internet, with the potential for random-walking itself right into some E-ghettos. As we all know, the dark fringes of the internet are pretty dark indeed, and I'm wondering about what could potentially happen to me legally if my spider ends up in one of those places. So let us suppose that without any bad intentions on my part, my automated spider stumbles onto some internet filth, say a pirated movie site, kiddie porn site, something like that. Legally, could this possibly get me in trouble? I mean, could I argue that it's automated? Since my spider isn't downloading any media (it might go to a kiddie porn site, but it would only download the html, not any pictures -- just reading it for the articles so to speak), would it be illegal at all? If I put some provisions in the program to try and prevent it from going to bad sites, but it does anyways, will that buy me anything? If I spoof the user-agent, will that make a difference? If my user-agent says "Python-urllib library-spider" vs "IE 6.0" in their server logs, will that make a difference in the eyes of John Law? I'm looking to try this out and have a little geeky fun, but I really don't want or need the FBI knocking on my door because this thing has been spidering some pro-terrorist site or something. If they do show up, can I show them my spider and database of millions of pages, and will that be satisfactory? What does everyone think? Discuss.
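For concreteness, the "provisions" and the HTML-only behaviour described in the question could look roughly like the sketch below. This is only an illustration, not the poster's actual code: the names `BLOCKED_HOSTS`, `is_allowed` and `is_html` and the host `blocked.example` are all made up for the example, and a real blocklist would have to come from a maintained source.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; a real spider would load this from a maintained file.
BLOCKED_HOSTS = {"blocked.example"}


def is_allowed(url, blocked=BLOCKED_HOSTS):
    """Return False for URLs with no host, a blocked host, or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return False
    return not any(host == b or host.endswith("." + b) for b in blocked)


def is_html(content_type):
    """Return True only for HTML responses, so images and other media are skipped."""
    return content_type.split(";")[0].strip().lower() == "text/html"


if __name__ == "__main__":
    print(is_allowed("http://www.blocked.example/page"))
    print(is_html("text/html; charset=utf-8"))
```

The spider would call `is_allowed` before queuing a link and `is_html` on the response's Content-Type header before storing anything. A check like this only narrows what gets fetched; it cannot guarantee that an unlisted site is never visited.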
-
I don't see any problem that doesn't also face Google, etc. If I were to go to www.jihad.com once, even if I looked around, I doubt that would raise any alarms. If I went back daily and they were watching, I guess it might. Either way, it's obvious that you're hitting plenty of other sites equally, for the purpose you're describing. I have no idea of the legality, but if it were me, I would not be worried.
Christian Graus - Microsoft MVP - C++ Metal Musings - Rex and my new metal blog
-
Nathan A. wrote:
a pirated movie site, kiddie porn site, something like that. Legally, could this possibly get me in trouble?
Yup. Just heard an interview with a law enforcement officer about that. He pointed out that when they do their investigations, they have to have a secure room with checks and double checks and multiple witnesses to make sure EVERYTHING is legit. Simply hitting the site will probably not get you into trouble, but if anything more than trivial data goes back to your computer, you could end up in BIG trouble. (Also think about the consequence of hitting government secure sites. Even if you don't get in, if you knock too much, the FBI is likely to pay you a visit.) BTW, I believe spoofing is a felony; regardless, it will make you look even more guilty. (I suspect Google heavily documents everything they are doing. Plus, I'm sure they already have a sharing agreement with the FBI to turn over the scumbags they find to them.)
Anyone who thinks he has a better idea of what's good for people than people do is a swine. - P.J. O'Rourke
-
It depends on your jurisdiction and what information is stored on your computer. The test would probably be what you do with the information, not whether your spider visited a site unintentionally.
cheers, Chris Maunder
CodeProject.com : C++ MVP
-
I think if you have significant legal concerns, you should hire a lawyer. I wouldn't stake my odds of going to jail or not on hearsay and guesses from people like us. :-D I'm no big fan of legal beagles, but this is their area of expertise...
Author of The Career Programmer and Unite the Tribes www.PracticalStrategyConsulting.com
-
User Agent spoofing is a legitimate process and I cannot imagine it being illegal at all. User Name spoofing is possibly another matter. All he is trying to do is make the receiving server handle his request in a known manner. This is necessary because there are some morons out there who don't learn what they're doing, or make any attempt to accommodate any web browsers other than IE. To make sure that their precious work of 'art' is seen 'properly' they may refuse to serve pages to any client that doesn't identify itself as IE. Being able to do this is getting even more important with the advent of mobile browsing, and it is something that most Opera and Netscape users have done for years. Now, opening the SMTP port on a mail server and pretending to be someone else when sending mails or spam is another matter entirely, and in many jurisdictions is at least frowned upon if not outright illegal. The two things are completely different.
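To show how little is involved technically, here is a minimal sketch of setting the User-Agent header with Python's standard `urllib.request`. The helper name `build_request` and the URL are made up for the example; the two strings are the ones quoted in the original question. Nothing is sent over the network here, the request object is only constructed.

```python
from urllib.request import Request

# The string from the original question; any descriptive identifier would do.
SPIDER_UA = "Python-urllib library-spider"


def build_request(url, user_agent=SPIDER_UA):
    """Return a Request object that announces the given User-Agent string."""
    return Request(url, headers={"User-Agent": user_agent})


if __name__ == "__main__":
    req = build_request("http://example.com/")
    # urllib stores header names capitalized as "User-agent".
    print(req.get_header("User-agent"))
```

Passing `user_agent="IE 6.0"` instead changes only the text the server sees in that one header; the request is otherwise identical.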
-
I am a licensed beagle. We don't know crap. We only think we do.
Years ago my then-5-yr-old son asked me "Daddy, what do you do?" And I explained to him that as an attorney you go to school, then high school, then college, then three years of law school; then you take a reasonably tough exam where half of the people taking it usually don't pass (California, anyway). And then you take an identical set of facts (say a signed contract) and go argue about it with somebody else with at least as much training, trying to decide what those facts really mean, in front of a judge who's had as much or more training and generally more experience. Then after you get done and a decision is handed down, one side is usually unhappier than the other with the result, so you can file an appeal. And the cycle continues.
Another caveat: the advice you get from an attorney today, even if reasonably accurate today, can change with political pressures, changing priorities (until recently, I never thought twice about bringing bottled water onto an airplane), or the whimsical decision of an appellate court. And then the legislature (federal, state, even local) changes a law - and there are precious few attorneys who feel an ethical obligation to go back through their files and see which advice they've given might be impacted by the (literally thousands of) changes that come out over the course of a year. And we argue about legislative intent: what did the lawmakers really mean? And with all of the posturing today, what "I" meant when I drafted, sponsored and/or voted for a statute has a highly political component.
It's just as bad on the enforcement side (I used to be a cop, too). Laws (apparently) passed to thwart terrorism have been used to charge people involved in incidents of domestic violence (with "use of terrorist threats") simply because the alleged (really "alleged" - a factually innocent person) perpetrator had gone through basic Marine Corps training. I could go on and on.
Don't trust the legislators to pass wise (or good) laws. Don't trust John Law to enforce it (gun control, borders) or to use it "appropriately" (by whose definition?). Don't trust attorneys or the system to convict or acquit appropriately (the system found OJ legally not guilty, but do YOU believe he was factually innocent?). To the root question about crawlers: the common wisdom expressed by the folks posting here is probably close to the actual result you'll get in the long run. And if you get stung because some local, state or f