Wayback Machine Safe, Settlement Disappointing
Jibbanx writes "Healthcare Advocates and the Internet Archive have finally resolved their differences, reaching an undisclosed out-of-court settlement. The suit stemmed from HA's anger over the Wayback Machine showing pages archived from their site even after they added a robots.txt file to their webserver. While the settlement is good for the Internet Archive, it's also disappointing because a trial would have tested HA's claims in court. As the article notes, you can't really un-ring the bell of publishing something online, which is exactly what HA wanted to do. Obeying robots.txt files is voluntary, after all, and if the company didn't want the information online, they shouldn't have put it there in the first place."
Autolawyers (Score:4, Insightful)
If Congress were serious about keeping the US economy "safe and effective", it would reform the "lawyers' job security" laws. Instead it will surely make them even worse, and make the lawyer tax on technology mandatory.
I sense a little two-faced opinion here (Score:5, Insightful)
So by that logic, if I didn't want AOL to release my search information, I shouldn't be mad, as it's my fault for having used them in the first place? Or if I want my copyrighted information not to be republished by someone else, I should simply not publish at all? How about this: if I don't want my GPL code resold by someone in a closed source product, I should just know better and not put it out in the open to begin with. And if I post something stupid when I'm 9, we believe it should follow me around for my entire lifetime, because a 9-year-old should know better.
If you don't want it read... (Score:4, Insightful)
People shouldn't put anything on the Internet that they wouldn't want their worst enemy, boss, NSA, or grandmother to see. Obviously, since the porn industry exists online, few people follow this rule, but it's a good one nonetheless.
I enjoy Archive.org and when I get nostalgic about my websites of the past, it's there to show me a glimpse into history.
Re:Autolawyers (Score:3, Insightful)
What REALLY pisses me OFF (Score:5, Insightful)
After a certain domain had gone unused for years, some adware/search-rank link farm (or whatever it is) added a robots.txt file to the "hijacked" domain.
So one can now get formerly accessible sites removed from archive.org, EVEN IF THE ORIGINAL OWNER NEVER INTENDED THAT.
Re:Autolawyers (Score:5, Insightful)
There's probably a way to ensure that lawyers represent people's rights better than they do now. Regular random audits of billings and practices. More "contempt of court" punishment. More suspended/revoked licenses, especially for repeated frivolous representation. More "malpractice" awards. There ought to be more competition, with more standardized reviews contextualizing all those "scores", published for consumers.
Lawyers even more than doctors hide behind consumer ignorance and blind "respect". Exposing their performance as part of the shopping process would make them more competitive, and better adhere to the required "ethics" that usually are assumed to come with the tie.
A world without cooperation (Score:5, Insightful)
Even if you don't fear the legal system, disregarding robots.txt can quickly get you in trouble. There are junk scripts that feed bots endlessly, and there are automated blocklists for misbehaving bots. If people program their bots to ignore robots.txt, these and possibly more proactive self-defense mechanisms will become the norm. Is that the net you want? Maybe obeying robots.txt is the better alternative, don't you think?
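For anyone writing a bot, checking robots.txt before fetching is not much work. Here's a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the "examplebot" user agent, and the example.com URLs are all made up for illustration; a real bot would download the file from the site it is about to crawl.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this sketch).
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = make_parser(ROBOTS_TXT)

# ia_archiver is shut out of the whole site by the first rule group.
print(allowed(parser, "ia_archiver", "http://example.com/index.html"))

# Any other bot falls through to the "*" group: only /private/ is off limits.
print(allowed(parser, "examplebot", "http://example.com/index.html"))
print(allowed(parser, "examplebot", "http://example.com/private/x.html"))
```

Nothing here enforces anything, which is the point of the post above: the bot has to choose to call the check and honor the answer.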
Retroactive robots.txt (Score:5, Insightful)
First, some background. I have a weblog I've been running since 2002, switching from B2 to WordPress and changing the permalink structure twice (with appropriate HTTP redirects each time) as nicer structures became available. Unfortunately, some spiders kept hitting the old URLs over and over again, despite the fact that they forwarded with a 301 permanent redirect to the new locations. So, foolishly, I added the old links to robots.txt to get the spiders to stop.
Flash forward to earlier this week. I've made a post on Slashdot, which reminds me of a review I did of Might and Magic IX nearly four years ago. I head to my blog, pull up the post... and to my horror, discover that it's missing half a sentence at the beginning of a paragraph and I don't remember the sense of what I originally wrote!
My backups are too recent (ironic, that), so I hit the Wayback Machine. They only have the post going back to 2004, which is still missing the chunk of text. Then I remember that the link structure was different, so I try hitting the oldest archived copies of the main page, and I'm able to pull up the summary with a link to the original location. I click on it... and I see:
Excluded by robots.txt (or words to that effect).
Now, this is a page that was not blocked at the time that ia_archiver spidered it, but that was blocked later. The Wayback Machine retroactively blocked access to the page based on the current robots.txt content. I searched through the documentation and couldn't determine whether the data had actually been removed or just blocked, so I decided to alter my site's robots.txt file, fire off a request for clarification, and see what happened.
As it turns out, several days later, they unblocked the file, and I was able to restore the missing text.
In summary, the Wayback Machine will block end-users from accessing anything that is in your current robots.txt file. If you remove the restriction from your robots.txt, it will re-enable access, but only if it had archived the page in the first place.
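The behavior I saw can be written down as a rule of thumb. The following is only a sketch of what I observed from the outside, not the Internet Archive's actual code: the function name, the set of archived URLs, and the example.com addresses are all hypothetical. It uses Python's standard urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser


def is_viewable(url, archived_urls, current_robots_txt, agent="ia_archiver"):
    """Guess whether an archived page would be shown to end users.

    Observed rule (an assumption, not documented behavior): a page is
    shown only if it was archived in the first place AND the site's
    current robots.txt does not disallow it for the archive's crawler.
    """
    if url not in archived_urls:
        return False  # never archived, so nothing to re-enable
    parser = RobotFileParser()
    parser.parse(current_robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical data for illustration.
archived = {"http://example.com/old/review.html"}
blocking_rules = "User-agent: *\nDisallow: /old/\n"
no_rules = ""

# Blocked now, even though the page was crawlable when it was archived.
print(is_viewable("http://example.com/old/review.html", archived, blocking_rules))

# Remove the restriction and the archived copy becomes reachable again.
print(is_viewable("http://example.com/old/review.html", archived, no_rules))

# A page that was never archived stays unavailable either way.
print(is_viewable("http://example.com/old/other.html", archived, no_rules))
```

Note that in my case the unblocking took several days, so the real system evidently does not re-read robots.txt on every request the way this sketch does.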
Re:I sense a little two-faced opinion here (Score:5, Insightful)
If you post something on the net then I can point my browser to it. There is no privacy, nor was there any expectation of it. I could have used wget -r -erobots=off on your page every day and got all its content, and I'd have that archive even when you deleted it or moved it into some private archive, and it would have happily ignored your robots.txt. Since obeying robots.txt is voluntary, I could simply choose not to.
News websites often want you to pay for older content, but there is nothing theoretically stopping you from saving all the content day by day. You are comparing apples and oranges.
Here's the summary: we posted evidence online that was used against us in a court of law, we lost, we sued the people who provided that evidence, and because it's cheaper to settle than deal with bloody lawyers, we settled with them.
Does anyone here know what copyright is?! (Score:3, Insightful)
Pretty much every time we have a discussion about the legality of web/Usenet archive sites, the only argument with any legal weight that's given for what would otherwise be a clear infringement of copyright is that the rightsholder is implicitly consenting to certain uses by making the material available on that medium. The degree to which this holds in general is debatable, and AFAIK has never been tested in any major court case in any jurisdiction. However, even if robots.txt is voluntary, it's a clear statement of intent. There is no way you can claim implicit permission to copy the material when the supplier explicitly indicated, using a recognised mechanism, that they did not want it copied.
That makes comments like this one by Doc Ruby [slashdot.org] and this one by saskboy [slashdot.org] seem a little presumptuous, IMNSHO.
Re:A world without cooperation (Score:5, Insightful)
Info published on the Internet... (Score:3, Insightful)
I'll no doubt have lawyers (and lawyer wannabes) protesting, but that only follows the literal and common sense meaning of "public domain," instead of the legal rationalization which has been brought about by those who want to have their cake and eat it too.
Re:I sense a little two-faced opinion here (Score:3, Insightful)
Its purpose is not to censor information but to avoid incidents caused by aggressive robots that could stress WWW servers (introduction in the first link).
HA action is revisionism. Like a politician yelling something then a few years later claiming he never said such a thing and threatening people with a piece of evidence to the contrary.
Re:Info published on the Internet... (Score:3, Insightful)
Re:Don't need no Wayback (Score:3, Insightful)
Initial impressions go a long way. It may seem silly to some people, but in business it can mean the difference between people taking you seriously and buying your product, or not.
Re:Simple post (Score:2, Insightful)
Re:Info published on the Internet... (Score:2, Insightful)
My friend's hosting service got hacked. We caught it right away, before a site had been put into place, but the individuals attempted to put up the site http://paypal-protect.org./ [paypal-protect.org.] We shut them down quick. They went on to hack another hoster, and currently have their little phishing site up and running. I suggest you go to the site, and without using ANY real information, log in with a bogus email and password, and check it out. If you take a look at the WHOIS entry for paypal-protect.org, you will see the name and address of an actual individual. We called this guy and told him that it was likely his name, info and credit card were used illegally to register the domain.
The important thing to notice is the EMAIL contact in the WHOIS entry. Go ahead and do a Google search for that email address. You will turn up two forum posts this guy made, where he is selling credit card info, bank info, CVV2 numbers and more. Now, the first result in your Google search is a post at paypalsucks.com. You would not BELIEVE what it took to get the admin there to remove the post. And his policy wasn't to remove posts normally, but to just move them to a "garbage" thread, which would still be publicly available. The second and third results in your Google search were a post left on a free board that was created at anyboard.net. I was able to get that board taken down within 12 hours of notifying the host, netbula. The board was being used by lots of CC resellers for at least 5 years before I got it shut down. How do I know? Three of those years are archived at archive.org.
However... EACH OF THOSE POSTS is still there in the Google cache. Go ahead and see. Why is this important? Because all you need to see, if you are in the market to buy stolen identities and credit cards, is the contact information. It does not matter if it is in an archive, or if it is in an active forum. Archiving it has made it virtually impossible to remove from the net, because now there is no way of knowing exactly who has archived this information.
Now, I've not provided clickable links for a reason. I've provided enough information here, that if you want to check my facts, you can do so.
A library might be public domain, but the books within are not. There are some books that ARE considered public domain, but that does not mean that EVERY book is public domain.
Re:Info published on the Internet... (Score:2, Insightful)
You know the current standard the US follows for copyright of printed works is LIFE+70 years? That means that once the author copyrights their work, the copyright is good for 70 years after they die. Only after the copyright expires without being renewed does the work become public domain.
http://onlinebooks.library.upenn.edu/okbooks.html [upenn.edu]
There are some specific exceptions based on when the work was copyrighted, when the work was published, in what country it was published, whether or not the copyright notice was properly added to the work, and more.
To continue the library analogy I started earlier, the internet is a library. Websites are the books. Each must be treated as an individual entity. If someone steals your identity through a phishing scam, and uses that info immediately, then sure, you might be able to get out of liability by appealing to your bank. Does that mean that phishers should be allowed to run their scams freely and uncontested, because they can just post your info and declare it public domain, which would then in turn give them license to use that info however they wanted?
What if YOU didn't put those photos on the internet? What if your ex-girlfriend stole them by using your spare key when you were at work? Sorry, Charlie, they are on the net now and are public domain? I don't think so.