
GameboyRMH's Journal: My BTRFS dedupe script

Journal by GameboyRMH

Here's a BTRFS dedupe script I made earlier this year. I started with this and modded from there. Right now it runs in a sort of paranoid mode: even if two files have identical sizes and hashes, it still does a byte-for-byte comparison before considering them identical. It will run faster on a system that uses tmpfs for /tmp, since the hash index is kept in a temp directory there.
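The size, hash, then byte-for-byte check can be sketched as a standalone function. This is only an illustration of the idea, not code lifted from the script: the name same_content is hypothetical, and it assumes GNU stat, md5sum and cmp are installed.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if two files are byte-for-byte identical,
# running the cheap checks first.
same_content() {
    local a=$1 b=$2
    # 1. Files of different sizes can never be duplicates.
    [ "$(stat --printf %s "$a")" = "$(stat --printf %s "$b")" ] || return 1
    # 2. Different hashes rule out a match.
    [ "$(md5sum < "$a")" = "$(md5sum < "$b")" ] || return 1
    # 3. Paranoid step: matching hashes only mean "probably identical",
    #    so compare the actual bytes before trusting the result.
    cmp -s "$a" "$b"
}
```

Something like `same_content a.iso b.iso && echo identical` would then report a match only when all three checks pass.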

WARNING: When I tried this script earlier this year on an Oneiric box, it hung on one of the first few reflink operations and froze the whole PC. It damaged the BTRFS partition it was operating on beyond repair. In theory this should certainly work, but in practice it might ruin your shit. YOU HAVE BEEN WARNED.

#!/bin/bash
# Usage: pass one or more directories to scan as arguments.
DTEMPPATH="/tmp/btrfs-dedup-sums-$$"
# use trap to clean temp dir on break (signals 2 and 3: SIGINT, SIGQUIT)
trap 'rm -rf "$DTEMPPATH"; exit' 2 3
mkdir "$DTEMPPATH"
find "$@" -type f | while IFS= read -r F
do
        FHASH=$(md5sum "$F" | cut -d" " -f1)
        FSIZE=$(stat --printf %s "$F")
        # If hashed, it's probably a dupe, compare bytewise
        # and create a CoW reflink
        if [ -f "$DTEMPPATH/$FSIZE/$FHASH" ]
        then
                # resolve the index symlink to the first file seen with this size and hash
                ORIG=$(readlink -f "$DTEMPPATH/$FSIZE/$FHASH")
                # never dedupe a file against itself (overlapping arguments, hard links)
                [ "$ORIG" -ef "$F" ] && continue
                if cmp -s "$ORIG" "$F"
                then
                        echo "$F is a duplicate of $ORIG"
                        #get permissions of file to be deduped
                        FOWNERSHIP=$(stat --printf "%u:%g" "$F")
                        FPERMS=$(stat --printf %a "$F")
                        #make delete, link & permission set unbreakable
                        trap '' 2 3
                        echo -n "starting dedupe op..."
                        #---action part, comment this out for dry run---
                        echo -n "deleting..."
                        rm "$F"
                        echo -n "reflinking..."
                        cp --reflink "$ORIG" "$F"
                        echo -n "chowning..."
                        chown "$FOWNERSHIP" "$F"
                        echo -n "chmodding..."
                        chmod "$FPERMS" "$F"
                        #---action part's over---
                        echo "complete."
                        #re-set exit trap to clean temp dir
                        trap 'rm -rf "$DTEMPPATH"; exit' 2 3
                else
                        echo "HASH COLLISION BETWEEN $F -AND- $ORIG - skipping."
                fi
        else
                # It's a new file, create a hash entry.
                #echo "$F is new"
                if [ ! -d "$DTEMPPATH/$FSIZE/" ]
                then
                        mkdir "$DTEMPPATH/$FSIZE"
                fi
                # link to the absolute path so the entry still resolves from inside the temp dir
                ln -s "$(readlink -f "$F")" "$DTEMPPATH/$FSIZE/$FHASH"
        fi
done
rm -rf "$DTEMPPATH"
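
One weak spot in the action part is the window between the rm and the cp --reflink: if the copy hangs or fails, the duplicate is already gone. A possible variation, which is not what the script above does, is to reflink into a temporary name in the same directory and then rename it over the duplicate. The sketch below assumes GNU coreutils; reflink_replace and REFLINK_MODE are hypothetical names, and it does nothing to prevent the kernel-level hang described in the warning.

```shell
#!/bin/bash
# Hypothetical helper: replace "$dst" with a reflink copy of "$src" without
# ever leaving "$dst" missing. REFLINK_MODE is an assumed knob: "always"
# fails where reflinks are unsupported, "auto" falls back to a plain copy.
reflink_replace() {
    local src=$1 dst=$2 tmp
    # Create the temp file next to the target so the rename stays on one filesystem.
    tmp=$(mktemp "$(dirname "$dst")/.dedupe.XXXXXX") || return 1
    if cp --reflink="${REFLINK_MODE:-always}" "$src" "$tmp" &&
       chown --reference="$dst" "$tmp" &&
       chmod --reference="$dst" "$tmp" &&
       mv -f "$tmp" "$dst"
    then
        return 0
    fi
    # Any step failed: remove the temp file and leave "$dst" untouched.
    rm -f "$tmp"
    return 1
}
```

In the loop this would stand in for the rm, cp, chown and chmod lines, e.g. `reflink_replace "$ORIG" "$F"`. Ownership and mode are taken from the duplicate itself via --reference, so the stat calls would no longer be needed.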

This also doesn't handle SELinux contexts or xattrs. If I could get this to work, I'd try changing "cp --reflink" to "cp --preserve=mode,ownership,timestamps,context,xattr --reflink", which should also replace the chown & chmod operations if it works properly.
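
A rough sketch of that variant follows, assuming GNU cp. One caveat: --preserve copies attributes from the source of the copy, so the deduped file would take on the mode, ownership and timestamps of the first file seen rather than keeping its own, which is not quite what the chown & chmod steps do. The names reflink_preserving, PRESERVE_ATTRS and REFLINK_MODE are hypothetical. As far as I know, asking for "context" fails on a kernel without SELinux, so the sketch only adds it when selinuxenabled says SELinux is active.

```shell
#!/bin/bash
# Hypothetical variant of the action part: one cp call instead of
# cp + chown + chmod. PRESERVE_ATTRS and REFLINK_MODE are assumed knobs.
reflink_preserving() {
    local src=$1 dst=$2
    local attrs=${PRESERVE_ATTRS:-mode,ownership,timestamps,xattr}
    # Assumption: only request SELinux contexts where SELinux is enabled.
    if command -v selinuxenabled >/dev/null 2>&1 && selinuxenabled; then
        attrs="$attrs,context"
    fi
    # Attributes listed in $attrs are copied from "$src", not kept from "$dst".
    rm -f "$dst" &&
        cp --preserve="$attrs" --reflink="${REFLINK_MODE:-always}" "$src" "$dst"
}
```

If the duplicates are expected to have different owners or modes, the original stat, chown and chmod approach is the one that keeps them.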
