It occurred to me that one of the major complications (tracking and proving the link between the search result and the file itself) could be overcome with a spider program whose output tracks pages and links in depth, collating each search result with the file it ultimately leads to. I was wondering whether any Slashdotters out there know of some F/OSS software like this?
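To make the idea concrete, here is a rough sketch of the kind of spider I have in mind, not a finished tool. It crawls breadth-first from a starting page and, for every link that lands on a file-host domain, records the full click path that led there. The names `FILE_HOSTS`, `trace_paths`, and `fetch_page` are my own placeholders, the list of hosts is only an example, and it uses nothing beyond the Python standard library. It ignores robots.txt, rate limiting, JavaScript-generated links, and ad interstitials, all of which a real tool would have to handle.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

# Hypothetical list of file-host domains to flag; extend as needed.
FILE_HOSTS = {"rapidshare.com", "depositfiles.com"}


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page as text (no retries, no rate limiting)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def is_file_host(url):
    """True if the URL's host is a listed file host or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in FILE_HOSTS)


def trace_paths(start_url, max_depth=3, fetch_page=fetch):
    """Breadth-first crawl returning the click path to each file-host link.

    Each result is a list of URLs: the start page, any intermediate pages,
    and finally the file-host URL. File-host pages are never fetched.
    Every URL is visited once, so only the first path found is recorded.
    """
    results = []
    seen = {start_url}
    queue = deque([[start_url]])
    while queue:
        path = queue.popleft()
        try:
            html = fetch_page(path[-1])
        except (OSError, ValueError):
            continue  # unreachable page or malformed URL: skip it
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            url = urljoin(path[-1], href)
            if url in seen:
                continue
            seen.add(url)
            if is_file_host(url):
                results.append(path + [url])
            elif len(path) < max_depth and urlparse(url).scheme in ("http", "https"):
                queue.append(path + [url])
    return results
```

Calling `trace_paths` on a site's front page would return one list per file-host link found, each list being the chain of pages from the site to the file. The `fetch_page` argument is there so the crawl logic can be tried against canned HTML before it is pointed at a live site.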
Full text of email:
---------- Forwarded message ----------
Date: Wed, Feb 4, 2009 at 9:29 AM

The "eBook Thief" site is a typical example of file upload/download piracy (which differs from torrent-based file sharing, the other way in which pirated books move around). Here are two more: http://www.ebookpedia.info/ http://www.avaxhome.ws/

In this model, the website (e.g., "eBook Thief") directs visitors to a file-hosting site such as RapidShare (http://en.wikipedia.org/wiki/RapidShare), where someone usually unrelated to the website operators has deposited a file that is available for free and/or paid downloading. It is also common for the link from website to file host to be channeled through another type of website that specializes in hosting URLs that lead to file-host sites, presumably because this makes it more difficult for automated anti-piracy monitoring to discover which website is guiding visitors to which file-host site.

As an example, at present "Atlas Shrugged" is the most recent post on eBook Thief (http://www.ebookthief.com). If you click the "Download eBook here" link, another file is appended to the post, a file with two URLs. This 2-step process is common because it makes it impossible for automated anti-piracy monitoring to find a book's information (e.g., author, title, ISBN) in the same web page/file as a link to a file-host site.

To continue, you'll see two links presented, one to RapidShare and one to DepositFiles (both very popular file-host sites). You can go ahead and click on these; it won't do any damage to your computer. When I clicked through the DepositFiles link this morning, I arrived at DepositFiles only after a brief stop at a third site that wanted me to see an ad first. Don't download the file from either RapidShare or DepositFiles, though, since presumably this is illegal. It is also a process whereby viruses, trojans, and so forth can be sent along with the eBook on offer. However, 99% of the time this will indeed be the eBook it purports to be.

So, here the publishers of Atlas Shrugged have three (or four) problems: one website, two file-hosting sites, and, in the case of the path from website to DepositFiles, one intermediary "link-laundering" site.

The website, eBook Thief, is a tempting target. They even invite you to contact them to have posts removed under the DMCA. Some websites will actually do so in response. But mostly they hope this "disclaimer" somehow makes it less likely they are guilty of piracy. It may also make visitors feel less guilty about stealing, I would hypothesize. However, the website probably isn't hosted in the U.S. It can be difficult to convince a web hosting firm in, say, Russia to shut down a site that directs visitors to pirated content.

Higher-priority targets would be DepositFiles and RapidShare. The latter in particular will usually remove a file in response to a DMCA request. I don't know of any examples of successful intervention against the "link-laundering" sites. Nobody seems to pay much attention to their role in eBook piracy.

Unfortunately, even if RapidShare and DepositFiles remove the files, it may take only a few days before the files show up again on another file-host site, and frequently even on RapidShare, under a new account and with a new file name. RapidShare is being pressured by German courts to do more to prevent pirated material from reappearing, but this is actually a difficult and expensive task, for technical reasons too complicated to go into at the moment. (As an aside, now you can see why the RIAA has given up on everything but trying to go after U.S. ISPs who carry all this piracy traffic for consumers in the U.S.)

By the way, you can't completely trust the Search functions on sites like eBook Thief. There are some tricks that are used to thwart visitors who may be looking for copyright violations (I'll elaborate another day, if you like).