
Comment fslint's findup deduplicator (Score 1) 440

Well yes, this is a Linux tool, but I was still quite pleased with its results on 800k files. It took some time, but it did finish.
It's basically a shell script doing what others have suggested: sort by size, then checksum the files that share a size.
/usr/share/fslint/fslint/findup
find dUPlicate files.
Usage: findup [[[-t [-m|-d]] | [--summary]] [-r] [-f] paths(s) ...]
If no path(s) specified then the current directory is assumed.
When -m is specified any found duplicates will be merged (using hardlinks).
When -d is specified any found duplicates will be deleted (leaving just 1).
When -t is specified, only report what -m or -d would do.

When --summary is specified change output format to include file sizes.
You can also pipe this summary format to /usr/share/fslint/fslint/fstool/dupwaste
to get a total of the wastage due to duplicates.
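
For example, going by the usage text above, a cautious run could look like this (the path is just a placeholder):

/usr/share/fslint/fslint/findup -t -m /data/photos    # dry run: only report what a hardlink merge would do
/usr/share/fslint/fslint/findup -m /data/photos       # actually merge duplicates using hardlinks
/usr/share/fslint/fslint/findup --summary /data/photos |
        /usr/share/fslint/fslint/fstool/dupwaste      # total up the wasted space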

As it's a single command line with dozens of pipes, the stages run as separate processes and should keep multiple cores busy when needed.
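To give an idea of what such a pipeline looks like, here is a rough sketch of the same size-then-checksum approach. It is not the real findup code: it skips findup's handling of hardlinks, symlinks and filenames containing newlines, and its inode-ordering trick, and /data is just a placeholder path.

find /data -type f -size +0c -printf '%s\t%p\n' | # list size<TAB>path, skipping empty files
sort -n |                                         # bring files of equal size together
awk -F'\t' '$1 == size { if (!printed) print prev; print $2; printed = 1 }
            $1 != size { size = $1; prev = $2; printed = 0 }' | # keep only same-size candidates
xargs -d '\n' md5sum |                            # checksum only those candidates
sort | uniq -w32 --all-repeated=separate          # identical checksums = duplicate files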
Some text from the source:

Description

      will show duplicate files in the specified directories
      (and their subdirectories), in the format:

              file1
              file2

              file3
              file4
              file5

      or if the --summary option is specified:

              2 * 2048 file1 file2
              3 * 1024 file3 file4 file5

      Where the number is the disk usage in bytes of each of the
      duplicate files on that line, and all duplicate files are
      shown on the same line.
              Output is ordered by largest disk usage first and
      then by the number of duplicate files.
Caveats/Notes:
      I compared this to any equivalent utils I could find (as of Nov 2000)
      and it's (by far) the fastest, has the most functionality (thanks to
      find) and has no (known) bugs. In my opinion fdupes is the next best but
      is slower (even though written in C), and has a bug where hard links
      in different directories are reported as duplicates sometimes.

      This script requires uniq > V2.0.21 (part of GNU textutils|coreutils)
      dir/file names containing \n are ignored
      undefined operation for dir/file names containing \1
      sparse files are not treated differently.
      Don't specify params to find that affect output etc. (e.g -printf etc.)
      zero length files are ignored.
      symbolic links are ignored.
      path1 & path2 can be files &/or directories

And the code has optimizations like this one:
sort -k2,2n -k3,3n | #NB sort inodes so md5sum does less seeking all over disk

Comment unison is bi-directional (Score 1) 153

unison has already been suggested multiple times.

I used unison. It's perfect for syncing from A to B (it only syncs the diffs), then modifying B and later syncing B back to A.
You can also modify A and B at the same time, as long as it's not the same file; after the next sync, A and B are identical again.
You can even sync in cycles: A->B->C->A, with modifications on all three directory trees, and it still works.
Unison also handles deletions on both sides fine.
Hint: use the -group -owner -times flags
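A minimal invocation along those lines (directory names are just placeholders):

unison /data/A /data/B -group -owner -times   # sync the two replicas, propagating owner, group and mtimes
# rerun the same command later to propagate whatever changed on either side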

Comment This is not the first time. Can't they test? (Score 1) 297

Several space projects have failed because of blocked fuel lines or similar problems caused by forgotten items.
Can't they "simply" test the full operation of the satellite, including the engine, before mounting it on the rocket?
Another solution would be to fit every item, even rags, with serial numbers and RFID chips, so you can quickly and easily account for the whereabouts (or absence) of everything.
