Archive.org Sued By Colorado Woman
An anonymous reader writes "The Internet Archive is being sued by a Colorado woman for spidering her site. Suzanne Shell posted a notice on her site saying she wasn't allowing it to be crawled. When it was, she sued for civil theft, breach of contract, and violations of the Racketeer Influenced and Corrupt Organizations Act and the Colorado Organized Crime Control Act. A court ruling last month granted the Internet Archive's motion to dismiss the charges, except for the breach of contract claim. If Shell prevails on that count, sites like Google will have to get online publishers to 'opt in' before they can be crawled, radically changing the nature of Web search."
If she didn't want it crawled.. (Score:4, Informative)
robots.txt (Score:5, Informative)
Um...robots.txt?! (Score:1, Informative)
--
Franklin Brauner
Re:Posted notice? (Score:5, Informative)
No, she didn't post the notice properly:
The case should be thrown out, period. She should just have learned her lesson and used a proper robots.txt file next time. If you're going to post stuff on the Internet and don't want it to be indexed or archived, you should know what you're doing. If you don't, it's your problem. The lawsuit is frivolous and inane.
A bit about Suzanne Shell (Score:5, Informative)
Re:it would actually be nice if ... (Score:2, Informative)
Statute of Frauds (Score:4, Informative)
Court dismissed most charges (Score:5, Informative)
Check out this article here: http://www.phillipsnizer.com/library/cases/lib_ca
According to this, she requested that the site be removed from the Archive in December 2005, and they complied. They're actually countersuing her. They moved to have her claims dropped for various reasons, but the court chose to drop only the ones related to conversion, civil theft and the RICO claims. The issues of breach of contract and copyright infringement still apply.
I think it's absolutely ridiculous that this can go forward, especially when there are two established methods to stop the Archive's activity: The opt-out, which will remove history, and robots.txt (which she didn't use and appears to still not use), which will prevent that spider from ever archiving her site again.
Her site shows up in Google, I wonder why she hasn't sued them? Could it be that she likes the exposure of the big search engine, but doesn't want any history of her site archived by the Internet Archive?
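For anyone wondering what the robots.txt route would look like in practice, here is a minimal sketch using Python's standard urllib.robotparser. The file contents and the example.com URL are made up for illustration, and "ia_archiver" is assumed to be the user-agent token the Archive's crawler honors (check their FAQ rather than taking my word for it).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out the Archive's crawler, allow everyone else.
# "ia_archiver" is an assumption about which user-agent token the Archive honors.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow:
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com stands in for the real site here.
archive_ok = allowed("ia_archiver", "http://www.example.com/index.html")
google_ok = allowed("Googlebot", "http://www.example.com/index.html")
print(archive_ok, google_ok)
```

A file like this would keep the search-engine exposure while asking the archiving crawler to stay out. It only works for robots that choose to obey it.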
Re:it would actually be nice if ... (Score:3, Informative)
Re:Posted notice? (Score:5, Informative)
Re:Maybe I'm new here... (Score:2, Informative)
i'd also like to point out that the spam prevention word for this post was "sucked"
Re:GRRRRRRRRRR (Score:1, Informative)
Please. Let's try not to live up to the "geeks are frustrated male misogynists" stereotypes, shall we?
Also, robots.txt is a pretty silly standard (much like the similar practice of using favicon.ico for website icons, using magical filenames here makes the process obscure and unintuitive), a (de-facto) standard though it is. While I agree that people who care as much about spidering as Suzanne Shell obviously does should learn about robots.txt, it's not something you can expect every non-technical person who operates a website to know about.
-A.C.
Suzanne Shell - Think of The Children!!!11!1 (Score:4, Informative)
She is advertised by: http://www.profane-justice.org/ [profane-justice.org]
An example of her work: http://www.profane-justice.org/sctcomplnt.pdf [profane-justice.org]
She urges interested parties to contact her via her contact address or phone number on the letterhead in above PDF.
Let her know what you think!
Re:Posted notice? (Score:5, Informative)
http://www.phillipsnizer.com/library/cases/lib_ca
SITE SLASHDOTTED (Score:5, Informative)
14053 Eastonville Rd.
Elbert, CO 80106
719.749.2971
For those looking to share your views, Suzanne has asked that we continue to contact her organization at her official "non-web" addresses.
Re:Posted notice? (Score:5, Informative)
It isn't just a "nice convention". It's a sufficiently reasonable precaution available to plaintiff to effectively avoid the inadvertent disclosure of copyrighted documents. Failure to provide a simple robots.txt file evidences a lack of reasonable precaution and undermines plaintiff's claims to redress in a court of law.
In her defense it seems she probably needs the money after being fined $6000 in a Colorado state court a few months ago for a contempt violation (unauthorized practice of law) [coloradoconfidential.com] after she participated in three separate Colorado court cases under a power of attorney when she had no prior involvement- after having been warned on a prior occasion that this was illegal in the state of Colorado. In fact it's illegal everywhere except Slashdot. But of course it's lies, all lies! She needs a good spanking.
Re:This is so stupid (Score:4, Informative)
Re:Posted notice? (Score:2, Informative)
I'll verify archive.org do remove content quite quickly. A friend of mine was being stalked, and while she'd removed the pages on her site that had more information than she'd wanted about herself online publicly (a mistake she made in 1999 putting the pages up), archive.org kept a copy. According to their FAQ, putting a denial in robots.txt would not only stop her site being crawled, but remove old archived pages from their archive - so she added a relevant robots.txt entry that denied archive.org access, and submitted her site to their crawl engine again.
Within minutes a horde of archive.org bots from different archive.org crawl servers descended on her site, reading that robots.txt file. Within 15 minutes some archive.org searches for her domain came up with their "We're sorry, access to $SITENAME has been blocked by the site owner via robots.txt" message, and after 24 hours and more crawls, all searches came up negative. Now, archive.org checks robots.txt daily, and crawls no further than that.
There's no guarantee of getting rid of anything you publish online of course, whether it be a webpage, newsgroup post, email or IM, but archive.org do the right thing there, automatically and quickly.
Re:Posted notice? (Score:5, Informative)
ROTFL. I didn't even see that part. Since it was buried below text talking about copying/printing, the assumption is that it is a continuation of that content, but it really isn't. More on why this text is also an illegal contract a little later.
But first, the obvious flaws: the content formatting is so unreadable that it's easy to miss. And then, there are the spelling and grammatical errors in their "license" notice.
For example, "The content if this website is intended to generate income, it is not free if you intend to archive, copy, print or distribute anything electronically fixed herein." Let's see. Run-on sentence, missing serial comma, misspelled a two-letter word.... Could someone explain to me why someone incapable of writing a very simple English sentence without tons of very basic 2nd grader mistakes is trying to make money on the internet?
And then, there's the fact that this woman lacks a basic fundamental understanding of computers. "Permission and limited, non-exclusive license to reproduce this web site, by any method including but not limited to magnetically, digitally, electronically or hard copy, may be purchased for $5,000 (five thousand dollars) per printed hard copy page per copy, in advance of printing." Where to begin.... Magnetic, digital, and electronic reproduction do not involve printing! Oh, yeah, and missing a serial comma, an extra comma after "this web site", I think it needs a comma after "by any method", but I'd have to see what Strunk & White say on the subject. And "per printed hard copy page per copy" is redundant. A printed hard copy page is a single copy by definition.
But my favorite part is this: not only is there no robots.txt (still), but also nowhere in the page source is there any meta tag to indicate that the document should not be cached, so by viewing the page, you are committing an act which the license claims would require payment of $5,000, and by viewing it through AOL, you also cause AOL to commit an act (caching) which the license claims would require payment of $5,000. And by viewing it through AOL using the Google cache [64.233.167.104], you cause both companies to owe $5,000. Just in case you think that she might have been smart enough (yeah, right) to set it in the headers, here are the headers for the web page:
HTTP/1.1 200 OK
Date: Sat, 17 Mar 2007 19:25:33 GMT
Server: Apache
Last-Modified: Fri, 16 Mar 2007 16:15:52 GMT
ETag: "10d27b-f53b-45fac2b8"
Accept-Ranges: bytes
Content-Length: 62779
Content-Type: text/html
Notice anything missing? Like a Cache-Control directive?
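If you'd rather not eyeball it, a rough sketch of the same check in Python follows. It reuses the header lines quoted above (minus the status line, which is not a header) and parses them with the standard email parser. The function name cache_directives is my own invention, and this only lists directives; it makes no attempt to model real cache behavior.

```python
from email.parser import Parser

# Response headers as quoted above, without the "HTTP/1.1 200 OK" status line.
RAW_HEADERS = """\
Date: Sat, 17 Mar 2007 19:25:33 GMT
Server: Apache
Last-Modified: Fri, 16 Mar 2007 16:15:52 GMT
ETag: "10d27b-f53b-45fac2b8"
Accept-Ranges: bytes
Content-Length: 62779
Content-Type: text/html
"""


def cache_directives(raw_headers):
    """Return the lowercased Cache-Control directives found in a header block."""
    message = Parser().parsestr(raw_headers, headersonly=True)
    values = message.get_all("Cache-Control", [])
    return [
        directive.strip().lower()
        for value in values
        for directive in value.split(",")
        if directive.strip()
    ]


directives = cache_directives(RAW_HEADERS)
print(directives or "no Cache-Control directive present")
```

An empty list means the server said nothing at all about caching, which leaves browsers and proxies free to apply their own defaults.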
Here's a hint: this woman is a nearly computer-illiterate neophyte who posted tons of content online without any real understanding of how the internet works, and now is pissed off because of her own carelessness. Ignorance of the way the net operates is not a defense, folks.
There. I've copied a portion of the site contents. She'll probably sue me, too. I'm glad we have SLAPP laws here....
Their site posts directions on not being crawled (Score:2, Informative)
Re:Posted notice? (Score:3, Informative)
This woman seems to be angry just at the mere fact that it's been copied, and not with any occurrence of distribution. In a GPL violation case, one normally asks for compliance or a cessation of distribution, trying to work something out before launching into a lawsuit. That's because the violation may have been totally accidental (as it would be with a spider with no robots.txt present), and even if you "won" the lawsuit, your tiny award would not cover the costs. So, if she didn't first ask archive.org to remove the material, she'll have a difficult time showing she was acting with any good faith.
Re:no, I'm obligated to have a door (Score:2, Informative)
Her web pages appear to be HTML.
Wow, look at that. Right in the damn HTML4 spec:
The robots.txt file
When a Robot visits a Web site, say http://www.foobar.com/ [foobar.com], it firsts checks for http://www.foobar.com/robots.txt [foobar.com]. If it can find this document, it will analyze its contents to see if it is allowed to retrieve the document. You can customize the robots.txt file to apply only to specific robots, and to disallow access to specific directories or files.
robots.txt, and the meta tag defined later in the same document, are indeed rules.
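For the curious, here is a small sketch of how a well-behaved robot might read that meta tag, using Python's standard html.parser. The class name, the helper function and the sample page are hypothetical; the NOINDEX and NOFOLLOW values are the ones the spec describes.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def robots_directives(html):
    """Return the set of robots meta directives declared in an HTML page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical page that asks robots not to index it or follow its links.
SAMPLE = (
    '<html><head><META name="ROBOTS" content="NOINDEX, NOFOLLOW"></head>'
    "<body>...</body></html>"
)
found = robots_directives(SAMPLE)
print(sorted(found))
```

A crawler that respects the tag would skip indexing when "noindex" is in the set and skip the page's links when "nofollow" is.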
Re:mitigating circumstances: she's pro-child abuse (Score:3, Informative)
In some places, I think that's entirely appropriate. I might have agreed with you until I watched a friend get accused of sexually abusing his kids by his crackhead ex-wife. He's complied with every detail of the law, and even though no evidence exists against him, he'll still probably not see his kids again unless they decide later in life to contact him.
Frankly, he would have been in far less trouble had he simply killed his ex and disappeared with the kids.
While I'm not saying that CPS is universally bad, I do know that I'll never again live in Missouri after seeing how they routinely shred the constitution "for the children".
Re:This is so stupid (Score:3, Informative)
Re:Posted notice? (Score:3, Informative)
Re:it would actually be nice if ... (Score:3, Informative)
If she were to sue anyone, she should have sued Google, who, as a for-profit information service selling advertisements, cannot claim to be a public library, and whose copy of her site is every bit as much a re-publication as the Archive's, but without the defenses spelled out in copyright law that the Archive has.
Not computer illiterate, a set-up (Score:3, Informative)
It's OK (Score:5, Informative)
GET / HTTP/1.1
User-Agent: By accepting this HTTP GET request you agree to release into the public domain the entire contents of this web site.
Host: www.profane-justice.org
Pragma: no-cache
Accept: */*
HTTP/1.1 200 OK
Date: Sun, 18 Mar 2007 02:11:09 GMT
Server: Apache
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html
semiPro Se too... (Score:2, Informative)
US Court of Appeals for the Tenth Circuit (html doc online) [kscourts.org]
v.
and
I. Background
Ms. Fields's daughter was the subject of a dependency and neglect proceeding initiated by the state of Colorado in January 2003. In connection therewith, the state provided Ms. Fields a court-appointed attorney, Mr. Kender, but Ms. Fields also hired Ms. Shell, a journalist who researches and documents child protection agencies' practices, to act as an expert consultant. Shortly thereafter, Ms. Fields executed a power of attorney naming Ms. Shell as her agent. Ms. Fields also agreed to be included in Ms. Shell's documentary video project concerning child protection services.
On April 16, 2003, Mr. Meconi, the Fremont County DHS's attorney, filed a motion in state court to make Ms. Shell a special respondent in the pending dependency and neglect action. The motion sought to prevent Ms. Shell "from contacting the minor child or [Ms. Fields] . . . and from otherwise being involved in the proceedings . . . , including, but not limited to, acting as counsel for [Ms. Fields] or otherwise engaging in the unauthorized practice of law."
[. . .]
On May 9, after a hearing on the motion to make Ms. Shell a special respondent, the state court issued an order granting the motion, vesting legal custody of Ms. Fields's daughter with the Fremont County DHS, and scheduling a jury trial. In making Ms. Shell a special respondent the court observed that Ms. Shell, "in the guise of acting as the agent" for Ms. Fields, has "essentially been providing legal advice to [Ms. Fields]." The state court further ordered Ms. Shell specifically prohibited from: