Comment Re:Use pHash (Score 1) 487
How efficient is it, though? How many calls to hashdist() would you expect to make for a database of, say, 10 million songs?
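Back-of-the-envelope: a naive linear scan makes one hashdist() call per stored track, i.e. 10 million calls per query for a 10-million-song database, which is why real deployments put an index (e.g. a BK-tree or multi-index hashing over Hamming distance) in front of it. Here's a minimal sketch of the naive baseline, assuming 64-bit hashes (hashdist and the hash list are placeholders of mine, not pHash's actual API):

```python
# Placeholder sketch, not pHash's real API: the cost of one query against a
# database of 64-bit perceptual hashes using a naive linear scan.

def hashdist(a: int, b: int) -> int:
    """Hamming distance between two 64-bit hashes."""
    return bin(a ^ b).count("1")

def linear_scan(query: int, db_hashes: list[int], threshold: int = 8) -> list[int]:
    # One hashdist() call per stored track: 10 million calls for 10M songs.
    # The threshold is an arbitrary illustrative value.
    return [h for h in db_hashes if hashdist(query, h) <= threshold]
```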
This may be true in the US, but apparently not in the Netherlands: the patent lawyer he contacted told him Shazam would have a case if he published the code.
Even then, code is speech until you run it. Are we now to limit free speech by government order to protect their patents?
By that logic you could freely distribute an infringing program as long as you don't run it. So yes, free speech is limited in some way.
If the hardware store sells me a CNC mill and I make patented widgets with it, will they sue the hardware store?
No, but if they also gave you pre-milled parts of a patented widget and instructions to assemble them together they would sure as hell be liable.
Remember, in a software patent, all you need to say is "a method for identifying music playing by listening to a small sample and comparing it to a list of sonic fingerprints" and you are pretty much all set.
You're referring to the description, which has little legal effect. The stuff they can really take to court is the claims they have listed. Their main claim is:
A method of characterizing a relationship between a first and a second audio sample, the method comprising:
generating a first set of fingerprint objects for the first audio sample, each fingerprint object occurring at a respective location within the first audio sample, the respective location being determined in dependence upon the content of the first audio sample, and each fingerprint object characterising one or more features of the first audio sample at or near each respective location;
generating a second set of fingerprint objects for the second audio sample, each fingerprint object occurring at a respective location within the second audio sample, the respective location being determined in dependence upon the content of the second audio sample, and each fingerprint object characterising one or more features of the second audio sample at or near each respective location;
pairing fingerprint objects by matching a first fingerprint object from the first audio sample with a second fingerprint object from the second audio sample that is substantially similar to the first fingerprint object;
generating, based on the pairing, a list of pairs of matched fingerprint objects;
determining a relative value for each pair of matched fingerprint objects;
generating a histogram of the relative values; and
searching for a statistically significant peak in the histogram, the peak characterizing the relationship between the first and second audio samples.
which is not nearly as vague, but it still covers very basic and obvious stuff. It doesn't seem easy to implement an efficient fingerprinter that avoids this patent, since you would basically have to throw away all inter-feature timing information to avoid running into something equivalent to their peak-histogram step.
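For the curious, here is a minimal sketch of that peak-histogram step under my own assumptions (fingerprint objects reduced to (hash, time offset) pairs; every name is hypothetical, and this is not Shazam's actual code): matched pairs vote with their offset difference, and a statistically significant peak means one sample occurs inside the other at that time shift.

```python
from collections import Counter, defaultdict

# Minimal sketch of the claimed matching step, not Shazam's actual code.
# A "fingerprint object" is reduced to (hash_value, offset): a hash of local
# spectral features plus the time at which it occurs in the sample.

def match_samples(fps_a, fps_b, min_votes=10):
    """fps_a, fps_b: lists of (hash_value, offset) tuples."""
    # Index sample B's fingerprints by hash for fast pairing.
    by_hash = defaultdict(list)
    for h, t in fps_b:
        by_hash[h].append(t)

    # The "relative value" for each matched pair: the difference of offsets.
    histogram = Counter()
    for h, t_a in fps_a:
        for t_b in by_hash.get(h, ()):
            histogram[t_b - t_a] += 1

    # A statistically significant peak: many pairs agree on one offset,
    # meaning sample A occurs inside sample B at that time shift.
    if not histogram:
        return None
    offset, votes = histogram.most_common(1)[0]
    return (offset, votes) if votes >= min_votes else None
```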
I'm shocked at how such broad claims can be accepted by patent offices...
His blog post contains a lot of code, making it dangerously close to a full implementation. That said, even their lawyers don't seem entirely confident in that interpretation, since they only mentioned the blog post in their last e-mail.
Erratum: when I said variance (second central moment) I didn't mean variance but the expected value of X^2 (second moment about 0).
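For reference, the two moments in symbols (standard definitions, added here for clarity):

```latex
\operatorname{Var}(X) = E\big[(X - E[X])^2\big] = E[X^2] - (E[X])^2,
\qquad
E[X^2] = \text{second moment about } 0.
```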
Also, someone should mod up the post below about wedgelets. The useful properties I was mentioning in the parent are common to all forms of wedgelet approximation.
I completely agree that everything in the paper seems to be pulled out of thin air... But I do see two reasons why his compression algorithm might be better than JPEG or other lossy codecs in some situations:
1) the decompression performs no arithmetic on the pixels, so you can apply gamma correction or a color change losslessly (just as in a square-pixel image)
2) aside from the choice of mask, the compression is entirely deterministic, which is a plus in scientific imaging: when you have a "triangular pixel" with value 200, you know that the average of that zone was exactly 200 (with JPEG, you can't know anything for sure as the compressor could add artefacts or remove detail as it sees fit)
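To illustrate point 1, a minimal sketch under my own assumptions (hypothetical mask/values representation, using NumPy; not the paper's code): decompression is a pure table lookup, so applying a point transform such as gamma correction to the short value table is exactly equivalent to applying it to every decompressed pixel.

```python
import numpy as np

# Hedged sketch of point 1 (hypothetical format, not the paper's actual code):
# each region of the mask decodes to a stored value, with no arithmetic on
# pixels. Any later point-wise transform (gamma, color remap) can therefore
# be applied to the short value table instead of the image, losslessly.

def decompress(mask: np.ndarray, values: np.ndarray) -> np.ndarray:
    """mask[i, j] is a region index; values[k] is region k's stored average."""
    return values[mask]          # pure lookup, no pixel arithmetic

def gamma_correct(values: np.ndarray, gamma: float) -> np.ndarray:
    # Correcting the value table once is exactly equivalent to correcting
    # every pixel of the decompressed image.
    return 255.0 * (values / 255.0) ** gamma

mask = np.array([[0, 0, 1], [0, 1, 1]])      # two "wedge" regions
values = np.array([200.0, 50.0])             # stored region averages
assert np.allclose(
    gamma_correct(decompress(mask, values), 2.2),
    decompress(mask, gamma_correct(values, 2.2)),
)
```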
Why are you maximizing contrast instead of minimizing error like any sane person would do, WHY?
In fact they are equivalent, assuming that the masks are equal-area:
square of RMS error
= Variance(residual)
= Variance(maskedimage1) - average1^2 + Variance(maskedimage2) - average2^2

(as per the erratum above, Variance(maskedimageN) here really means the second moment E[X^2] over mask N; the equal-area assumption makes both masks weigh equally, so constant factors can be dropped)

Since Variance(maskedimage1) + Variance(maskedimage2) remains constant (changing the mask merely shuffles pixels between the two masked images), minimising the error is equivalent to maximising

average1^2 + average2^2
= 1/2 * ( (average1 - average2)^2 + (average1 + average2)^2 )

Since average1 + average2 also remains constant (again thanks to the equal-area assumption), this is equivalent to maximising

(average1 - average2)^2

which gives us the maximum-contrast method.
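If you want to convince yourself numerically, here is a throwaway check of my own (not from the post): brute-force all equal-area masks of a small pixel block and verify that the mask minimising the RMS error is the mask maximising the contrast.

```python
import itertools
import numpy as np

# Throwaway numeric check of the equivalence above: over all equal-area
# 2-colourings of a small pixel block, the mask minimising the RMS error is
# the mask maximising the squared contrast (average1 - average2)^2.
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=8).astype(float)

def stats(mask):
    a = pixels[list(mask)]
    b = pixels[[i for i in range(8) if i not in mask]]
    mse = np.mean(np.concatenate([(a - a.mean()) ** 2, (b - b.mean()) ** 2]))
    contrast = (a.mean() - b.mean()) ** 2
    return mse, contrast

# Equal-area masks (4 pixels each); requiring pixel 0 in the mask avoids
# counting each mask/complement pair twice.
masks = [m for m in itertools.combinations(range(8), 4) if 0 in m]
assert min(masks, key=lambda m: stats(m)[0]) == max(masks, key=lambda m: stats(m)[1])
print("min-error mask == max-contrast mask")
```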