If you have 100 files all of one size, you'll have to do 4950 comparisons.
You only have to do all 4950 comparisons if all 100 files are unique. That's the worst case, where every one of the 100*99/2 = 4950 pairs has to be checked; as soon as duplicates turn up, whole groups drop out of the list and the count falls.
What I do is pop the first file from the list to use as the standard, and compare all the other files with it, block by block. If a block fails to match, I give up on that file matching the standard. The files that don't match generally don't get very far, and don't take much time. For the ones that do match, I would have spent all that time reading them with a hash method anyway. As for reading the standard file multiple times: that goes fast because it's already in cache.
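In rough Python terms, the per-file comparison with the early bail-out looks something like this (just a sketch, not my actual code; the 1 MiB block size and the names are illustrative):

    import os

    BLOCK_SIZE = 1024 * 1024  # 1 MiB, an arbitrary choice for illustration

    def files_match(path_a, path_b, block_size=BLOCK_SIZE):
        """Compare two files block by block, bailing out on the first mismatch."""
        if os.path.getsize(path_a) != os.path.getsize(path_b):
            return False                  # different sizes can never match
        with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
            while True:
                block_a = fa.read(block_size)
                block_b = fb.read(block_size)
                if block_a != block_b:
                    return False          # give up on this file at the first differing block
                if not block_a:           # both files exhausted at the same point: identical
                    return True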
The ones that match get taken off the list. Obviously I don't compare the ones that match with each other; that would be stupid, since they're already known to be identical.
Then I go back to the list and rinse/repeat until fewer than two files remain.
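Putting it together, the outer loop is roughly this (again only a sketch, reusing the files_match function from the sketch above; in practice you'd run it within a group of files that are all the same size):

    def find_duplicate_groups(paths):
        """Pop a standard, pull out everything that matches it, repeat on what's left."""
        remaining = list(paths)
        groups = []
        while len(remaining) >= 2:
            standard = remaining.pop(0)      # first file becomes the standard
            matches = [p for p in remaining if files_match(standard, p)]
            if matches:
                groups.append([standard] + matches)
                matched = set(matches)
                # matching files leave the list, so they are never compared again
                remaining = [p for p in remaining if p not in matched]
        return groups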
I have done this many times with a set of 3 million files that take up about 600 GB.