teece - Slashdot User

Comment They simply can not scan for subsets (Score 1) 185

by teece on Tuesday December 19, 2006 @04:43PM (#17304498) Attached to: Copyright Tool Scans Web For Violations

The company claims to be able to find "a customer's content based on the appearance of as little as a few sentences of text or a few seconds of audio or video."

This is nonsense, setting aside the fact that such short excerpts are quite probably fair use. Having any kind of complete catalog of "digital fingerprints" for a given work is (practically) impossible. At best, a few select snippets of a given document could be fingerprinted. Changing even a single bit changes a one-way cryptographic hash (which is, presumably, the method used here), and it does not change the fingerprint in a predictable way. One would need to catalog a hash for every contiguous substring of the given document, and the number of such hashes grows as n^2 (n(n+1)/2 of them, to be exact), where n is the "word-size" of the document, i.e. its length in words.
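To make the two claims concrete, here is a rough Python sketch. It assumes SHA-256 as the one-way hash, which is purely my choice for illustration; I have no idea what the company actually uses. The helper names (fingerprint, count_word_substrings, all_word_substrings) are made up for this example. It shows that a one-letter change gives an unrelated digest, and that the number of contiguous word runs in an n-word text is n(n+1)/2.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Hex digest of a one-way hash of the text (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def count_word_substrings(n_words: int) -> int:
    """Number of contiguous, non-empty word runs in an n-word text."""
    return n_words * (n_words + 1) // 2


def all_word_substrings(text: str) -> list[str]:
    """Enumerate every contiguous word run (only practical for tiny inputs)."""
    words = text.split()
    return [
        " ".join(words[i:j])
        for i in range(len(words))
        for j in range(i + 1, len(words) + 1)
    ]


original = "the quick brown fox jumps over the lazy dog"
altered = "the quick brown fox jumps over the lazy cog"  # one letter changed

# The two digests share no useful structure, so one cannot be derived from the other.
print(fingerprint(original))
print(fingerprint(altered))

# Brute-force enumeration agrees with the closed form for this 9-word sentence.
runs = all_word_substrings(original)
print(len(runs), count_word_substrings(9))  # 45 45

# A 100,000-word book would need about five billion hashes.
print(count_word_substrings(100_000))  # 5000050000
```

Nine words already give 45 runs to hash; a 100,000-word book gives roughly five billion, and that is for exact matches only, before anyone changes a comma.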

I wrote two articles on it on my blog, one general, one mathematical. Read 'em if you'd like: "Beware the Digital Snake Oil" and "How Many Substrings in a Given Text?"
