
Comment Re:CRC (Score 1) 440

if you have 50 files that all are *exactly* 1GB in size

Hmm. To the byte?

I very rarely find any similarly sized files that large, and those that I do, there are usually only two of them. Usually, these are videos, or audio files that I've copied/rsynced around in such a way that they ended up in two places.
Of course, everyone's usage will be different, but I can't imagine that scenario being common.

Comment Re:CRC (Score 1) 440

You need to compile the .java files to .class files.

You'll need something called javac (which comes with the Java Development Kit (JDK)).
It's not super easy getting the hang of it at first.

Clone the git repo, run mvn clean install (you'll need to download Maven), and then you should end up with a JAR file. Then run java -jar finddups.jar and things should start to happen.

Or should I just commit a JAR file to GitHub?

Comment Re:CRC (Score 1) 440

Yes, that's something I thought about. It's a trade-off, isn't it? If you have two 700MB files that are exactly the same size but have different contents, the way I'm doing it is quickest, because the comparison stops at the first differing byte. If the two 700MB files are identical, both files have to be read in full, so it will probably take about the same time as CRC/MD5ing them.

If you have many small files, then I guess the IO won't be that much anyway.
My implementation offers a parameter to ignore files smaller than a specific size, which is how I run it: java -jar finddups.jar /path 200000 (for instance).

Another commenter to your comment suggested doing it adaptively, which would be easy.

Comment Re:CRC (Score 5, Informative) 440

Exactly. What I do is this:

1. Compare filesizes.
2. When there are multiple files with the same size, start diffing them. I don't read the whole file to compute a checksum - that's inefficient with large files. I simply read the two files byte by byte, and compare - that way, I can quit checking as soon as I hit the first different byte.
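
Roughly, those two steps could be sketched like this. It's an illustration only, not the actual code in the finddups repo: the class name DupSketch and the methods groupBySize and sameContent are made up for this example, and it only uses standard java.nio calls.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; names are not taken from the finddups source.
public class DupSketch {

    // Step 1: group files by size. Only sizes shared by two or more
    // files can possibly contain duplicates, so drop everything else.
    static Map<Long, List<Path>> groupBySize(List<Path> files) throws IOException {
        Map<Long, List<Path>> bySize = new TreeMap<>();
        for (Path f : files) {
            bySize.computeIfAbsent(Files.size(f), k -> new ArrayList<>()).add(f);
        }
        bySize.values().removeIf(group -> group.size() < 2);
        return bySize;
    }

    // Step 2: compare two files byte by byte and return as soon as
    // the first differing byte is found, without reading the rest.
    static boolean sameContent(Path a, Path b) throws IOException {
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        try (InputStream inA = new BufferedInputStream(Files.newInputStream(a));
             InputStream inB = new BufferedInputStream(Files.newInputStream(b))) {
            int x;
            while ((x = inA.read()) != -1) {
                if (x != inB.read()) {
                    return false; // early exit on first difference
                }
            }
            return inB.read() == -1; // both streams must end together
        }
    }
}
```

You'd call groupBySize on the list of files found under a directory, then run sameContent on pairs within each group. The buffered streams keep the per-byte reads from each hitting the disk.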

Source is at https://github.com/caluml/finddups - it needs some tidying up, but it works pretty well.

git clone, and then mvn clean install.

Comment Re:What? No encrypted IPs? (Score 1) 94

Assuming the admin wasn't too lazy to set it up. :)

Assuming that the DNS for the IP address range is delegated to the admin first of all.

It's all very well setting up rDNS, but sometimes the bureaucratic nightmare of getting the range pointed at your DNS server is just not worth it.

Comment Re:The TOC's location is a soft secret (Score 2) 79

What are you talking about datacentres for?

In a skyscraper in London's Canary Wharf financial district, Olympic organizers opened a Technology Operations Center (TOC) last month that acts as mission control for monitoring the health of Olympic IT systems. The TOC's location is a soft secret, and organizers did not want its exact location to be published for security reasons.
