MD5 is designed so that even a single-bit difference in the input produces a completely different checksum, which makes MD5 and similar hash algorithms useless for pattern recognition.
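To make that concrete, here is a small sketch using Python's standard `hashlib`. The helper names (`md5_hex`, `flip_bit`, `differing_bits`) and the sample text are just made up for illustration. It flips one bit of the input and counts how many of the 128 digest bits change.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Illustrative input only; any byte string works.
original = b"The quick brown fox jumps over the lazy dog"
modified = flip_bit(original, 0)  # differs from original in a single bit

digest_a = md5_hex(original)
digest_b = md5_hex(modified)

print(digest_a)
print(digest_b)
print(differing_bits(digest_a, digest_b), "of 128 digest bits differ")
```

The two digests share no useful structure, so you cannot tell "almost the same input" from "totally different input" by looking at them.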
Oh, and if you are talking about ripped media such as MP3, you naturally should not checksum the audio file as is. You should probably first build a profile of it that smooths out the differences between rips, and then take the MD5 of that profile. Still, I think you really do need something exact like MD5 if that information is going to be provided through a publicly accessible database to reduce the load on the server.
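A very rough sketch of the "profile first, then MD5" idea, assuming you already have decoded samples as floats between -1.0 and 1.0. The `coarse_profile` function, its block size and its number of levels are all hypothetical; a real audio fingerprint would be far more involved, and values near a quantization boundary can still land in different buckets.

```python
import hashlib


def coarse_profile(samples, block=4, levels=4):
    """Hypothetical profile: mean absolute amplitude per block,
    quantized to a handful of levels so small differences vanish."""
    profile = []
    for start in range(0, len(samples) - block + 1, block):
        chunk = samples[start:start + block]
        energy = sum(abs(s) for s in chunk) / block
        profile.append(min(levels - 1, int(energy * levels)))
    return bytes(profile)


def profile_md5(samples):
    """MD5 of the coarse profile rather than of the raw samples."""
    return hashlib.md5(coarse_profile(samples)).hexdigest()


# Two made-up "rips" of the same piece that differ only slightly.
rip_a = [0.10, 0.12, 0.11, 0.09, 0.60, 0.62, 0.58, 0.61]
rip_b = [0.11, 0.12, 0.10, 0.09, 0.61, 0.62, 0.57, 0.60]

# Hashing the raw values gives unrelated digests...
print(hashlib.md5(repr(rip_a).encode()).hexdigest())
print(hashlib.md5(repr(rip_b).encode()).hexdigest())

# ...while hashing the profile gives one exact key for both rips.
print(profile_md5(rip_a))
print(profile_md5(rip_b))
```

The point of keeping MD5 at the end is that the database lookup stays an exact key match, which is cheap for the server; all the fuzziness lives in the profile step.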