
GameboyRMH's Journal: My BTRFS dedupe script

Here's a BTRFS dedupe script I made earlier this year. I started with this and modded from there. Right now it runs in a sort of paranoid mode: even if two files have identical sizes and MD5 hashes, it still does a byte-for-byte comparison before treating them as identical. It keeps its size/hash index in a directory under /tmp, so it will run faster on a system that uses tmpfs for /tmp.
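The paranoid check boils down to three tests, from cheapest to most expensive. The script gets the first two through its size/hash index rather than comparing pairs, so the helper below (`same_content` is a hypothetical name, not part of the script) is only a sketch that makes the order explicit for two given files, assuming GNU `stat` and `md5sum`:

```bash
#!/bin/bash
# same_content A B: succeed only if A and B are byte-identical.
# Hypothetical helper; it mirrors the script's order of checks.
same_content() {
    local a=$1 b=$2
    # 1. Files of different sizes can never be duplicates.
    [ "$(stat --printf %s "$a")" = "$(stat --printf %s "$b")" ] || return 1
    # 2. Different MD5 sums mean different content.
    [ "$(md5sum < "$a")" = "$(md5sum < "$b")" ] || return 1
    # 3. Paranoid step: equal hashes could still be a collision,
    #    so only a full byte-for-byte compare is trusted.
    cmp -s "$a" "$b"
}
```

The size test is nearly free and rules out most pairs; `cmp` only ever runs on files that already agree on size and hash.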

WARNING: When I tried this script earlier this year on an Ubuntu Oneiric (11.10) box, it hung on one of the first few reflink operations and froze the whole PC. It damaged the BTRFS partition it was operating on beyond repair. In theory this should work, but in practice it might ruin your shit. YOU HAVE BEEN WARNED.

#!/bin/bash
# Usage: <script> DIR...   (paths on a BTRFS filesystem)
# Index layout: $DTEMPPATH/<size>/<md5> is a symlink to the first file
# seen with that size and hash.
DTEMPPATH=$(mktemp -d /tmp/btrfs-dedup-sums-XXXXXX) || exit 1
# use trap to clean temp dir on break (INT=2, QUIT=3)
trap 'rm -rf "$DTEMPPATH"; exit' INT QUIT
# Feed the loop by process substitution rather than a pipe so it runs in
# this shell and the traps apply to it; -print0 / read -d '' copes with
# spaces and newlines in filenames.
while IFS= read -r -d '' F; do
        FHASH=$(md5sum "$F" | cut -d" " -f1)
        FSIZE=$(stat --printf %s "$F")
        # If hashed, it's probably a dupe, compare bytewise
        # and create a CoW reflink
        if [ -f "$DTEMPPATH/$FSIZE/$FHASH" ]; then
                ORIG=$(readlink -f "$DTEMPPATH/$FSIZE/$FHASH")
                if cmp -s "$ORIG" "$F"; then
                        echo "$F is a duplicate of $ORIG"
                        #get permissions of file to be deduped
                        FOWNERSHIP=$(stat --printf "%u:%g" "$F")
                        FPERMS=$(stat --printf %a "$F")
                        #make delete, link & permission set unbreakable
                        trap '' INT QUIT
                        echo -n "starting dedupe op..."
                        #---action part, comment this out for dry run---
                        echo -n "deleting..."
                        rm "$F"
                        echo -n "reflinking..."
                        cp --reflink "$ORIG" "$F"
                        echo -n "chowning..."
                        chown "$FOWNERSHIP" "$F"
                        echo -n "chmodding..."
                        chmod "$FPERMS" "$F"
                        #---action part's over---
                        echo "complete."
                        #re-set exit trap to clean temp dir
                        trap 'rm -rf "$DTEMPPATH"; exit' INT QUIT
                else
                        echo "HASH COLLISION BETWEEN $F -AND- $ORIG - skipping."
                fi
        # It's a new file, create a hash entry.
        else
                #echo "$F is new"
                mkdir -p "$DTEMPPATH/$FSIZE"
                # absolute target, so the link still resolves from inside the temp dir
                ln -s "$(readlink -f "$F")" "$DTEMPPATH/$FSIZE/$FHASH"
        fi
done < <(find "$@" -type f -print0)
rm -rf "$DTEMPPATH"
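The action part deletes each duplicate before reflinking it, so on a filesystem where `cp --reflink` is not supported every `cp` fails right after its `rm`, and the duplicate's path is simply gone. A cheap guard is to probe for reflink support before touching anything. This is a minimal sketch, assuming GNU coreutils `cp` and `mktemp`; the function name `supports_reflink` and the `.reflink-probe` file name are made up for illustration:

```bash
#!/bin/bash
# supports_reflink DIR (hypothetical helper)
#   returns 0 if `cp --reflink=always` succeeds inside DIR,
#           1 if it fails (no CoW support there),
#           2 if a probe file cannot be created in DIR.
supports_reflink() {
    local dir=$1 src dst rc
    src=$(mktemp "$dir/.reflink-probe.XXXXXX" 2>/dev/null) || return 2
    dst="$src.copy"
    printf 'probe\n' > "$src"
    # --reflink=always makes cp fail instead of silently doing a full copy.
    if cp --reflink=always -- "$src" "$dst" 2>/dev/null; then
        rc=0
    else
        rc=1
    fi
    # Remove both probe files whatever happened.
    rm -f -- "$src" "$dst"
    return "$rc"
}
```

For example, `supports_reflink "$1" || { echo "no reflink support under $1" >&2; exit 1; }` placed before the main loop would make the script refuse to run rather than delete duplicates it cannot relink. It only checks the filesystem of the first argument, so with several arguments each one would need probing.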

This also doesn't preserve SELinux contexts, xattrs or timestamps, but if I could get this to work I'd try changing "cp --reflink" to "cp --preserve=mode,ownership,timestamps,context,xattr --reflink", which should also replace the chown and chmod operations if it works properly.
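One wrinkle with that option list, per the GNU coreutils documentation: explicitly requesting `context` or `xattr` makes `cp` fail when they cannot be preserved (for instance on a box without SELinux), whereas `--preserve=all` covers the same attributes but does not change the exit status for those two. A sketch of the reflink step done that way, assuming GNU coreutils `cp`; the function name `reflink_copy` and the `REFLINK` override (only there so it can be tried on a non-CoW filesystem with `REFLINK=auto`) are hypothetical:

```bash
#!/bin/bash
# reflink_copy SRC DST (hypothetical helper)
# CoW-copy SRC to DST and carry over metadata in the same cp call,
# which would replace the separate chown/chmod steps.
reflink_copy() {
    # REFLINK defaults to "always": fail rather than make a full copy.
    # --preserve=all = mode, ownership, timestamps, links, context, xattr;
    # failing to keep the SELinux context or xattrs does not make cp fail.
    cp --reflink="${REFLINK:-always}" --preserve=all -- "$1" "$2"
}
```

The trade-off is that a lost SELinux label goes unreported; if labels matter, the explicit list from the paragraph above is the stricter choice.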
