
Comment Re:CRC (Score 1) 440

if you have 50 files that all are *exactly* 1GB in size

Hmm. To the byte?

I very rarely find any identically sized files that large, and when I do, there are usually only two of them. They are usually video or audio files that I've copied or rsynced around so that they ended up in two places.
Of course, everyone's usage is different, but I can't imagine that scenario being common.

Comment Re:CRC (Score 1) 440

You need to compile the .java files to .class files.

You'll need something called javac (which comes with the Java Development Kit (JDK)).
It's not super easy getting the hang of it at first.

Clone the git repo, run mvn clean install (you'll need to download Maven), and then you should end up with a JAR file. Then run java -jar finddups.jar and things should start to happen.

Or should I just commit a JAR file to GitHub?

Comment Re:CRC (Score 1) 440

Yes, that's something I thought about. It's a trade-off, isn't it? If you have two 700MB files that are exactly the same size but differ in content, the way I'm doing it is quickest. If the two 700MB files are identical, it will probably take about the same time as CRC/MD5ing them.
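For comparison, the checksum route mentioned here could look roughly like the sketch below. It is only an illustration, not code from finddups: the class name ChecksumSketch and the method md5Of are made up, and MD5 is picked simply because it is one of the options named above. Note that it has to read every byte of the file before it can say anything, which is the cost being weighed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of finddups.
class ChecksumSketch {

    // Reads the whole file in 8 KB chunks and returns its MD5 digest.
    static byte[] md5Of(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("MD5");
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n); // feed only the bytes actually read
            }
        }
        return md.digest();
    }
}
```

Two files would then be treated as duplicates when java.util.Arrays.equals(md5Of(a), md5Of(b)) is true. The digest can be cached and reused across many same-sized files, which is where this approach can catch up.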

If you have many small files, then I guess the IO won't be that much anyway.
My implementation offers a parameter to ignore files smaller than a specific size, which is how I run it: java -jar finddups.jar /path 200000 (for instance).
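A size cutoff like that can be sketched as follows. This is a guess at the idea, not the finddups source: the names MinSizeFilter and candidates are hypothetical, and it assumes the numeric argument is a threshold in bytes, which the comment above does not actually state.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration; not part of finddups.
class MinSizeFilter {

    // Walks the tree under root and keeps regular files of at least minBytes
    // (assumed here to be a byte count).
    static List<Path> candidates(Path root, long minBytes) throws IOException {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .filter(p -> sizeOf(p) >= minBytes)
                       .collect(Collectors.toList());
        }
    }

    // Files.size throws a checked exception, so wrap it for use inside the stream.
    private static long sizeOf(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Dropping small files up front means they never enter the size comparison at all, so the later steps only touch files big enough to be worth the IO.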

Another reply to your comment suggested doing it adaptively, which would be easy.

Comment Re:CRC (Score 5, Informative) 440

Exactly. What I do is this:

1. Compare file sizes.
2. When multiple files have the same size, start diffing them. I don't read the whole file to compute a checksum; that's inefficient with large files. I simply read the two files byte by byte and compare them, so I can stop as soon as I hit the first differing byte.
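The two steps above can be sketched roughly like this. It is a minimal illustration written for this comment, not the code in the repo: the class SizeThenCompare and its methods groupBySize and sameContent are made-up names, and the buffered one-byte-at-a-time read is just one simple way to get the early exit.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of finddups.
class SizeThenCompare {

    // Step 1: bucket files by length. Only buckets holding two or more
    // files can contain duplicates.
    static Map<Long, List<Path>> groupBySize(List<Path> files) throws IOException {
        Map<Long, List<Path>> bySize = new HashMap<>();
        for (Path p : files) {
            bySize.computeIfAbsent(Files.size(p), k -> new ArrayList<>()).add(p);
        }
        return bySize;
    }

    // Step 2: compare two files byte by byte and return at the first mismatch,
    // so differing files are usually rejected without reading them to the end.
    static boolean sameContent(Path a, Path b) throws IOException {
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        try (InputStream inA = new BufferedInputStream(Files.newInputStream(a));
             InputStream inB = new BufferedInputStream(Files.newInputStream(b))) {
            int byteA;
            while ((byteA = inA.read()) != -1) {
                if (byteA != inB.read()) {
                    return false; // first differing byte: stop here
                }
            }
            return inB.read() == -1; // both streams must end together
        }
    }
}
```

A caller would run sameContent only on pairs taken from the same bucket of groupBySize, skipping every bucket with a single file.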

Source is at https://github.com/caluml/finddups - it needs some tidying up, but it works pretty well.

git clone, and then mvn clean install.
