
GameboyRMH's Journal: My BTRFS dedupe script

Journal by GameboyRMH

Here's a BTRFS dedupe script I made earlier this year. I started with this and modded from there. Right now it runs in sort of a paranoid mode: even if two files have identical sizes and hashes, it will still do a byte-for-byte comparison before considering them identical. This will run faster on a system that uses tmpfs for /tmp, since the hash index is kept there.
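
The paranoid check can be sketched on its own as a small function. This is only an illustration of the size, hash, then byte-for-byte order; the name same_content is made up for this example and does not appear in the script.

```shell
#!/bin/bash
# same_content A B: succeed only if A and B pass all three checks.
# (Hypothetical helper for illustration, not part of the original script.)
same_content() {
    local a=$1 b=$2
    # 1. cheapest test first: the sizes must match
    [ "$(stat --printf %s "$a")" = "$(stat --printf %s "$b")" ] || return 1
    # 2. the MD5 hashes must match (reading from stdin keeps the file name out of the output)
    [ "$(md5sum < "$a" | cut -d" " -f1)" = "$(md5sum < "$b" | cut -d" " -f1)" ] || return 1
    # 3. paranoid step: byte-for-byte comparison
    cmp -s "$a" "$b"
}
```

Calling same_content file1 file2 returns success only when all three stages agree, so it can sit directly in an if statement.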

WARNING: When I tried this script earlier this year on an Oneiric box it would hang on one of the first few reflink operations and freeze the whole PC. It damaged the BTRFS partition it was operating on beyond repair. In theory this should certainly work but in practice it might ruin your shit. YOU HAVE BEEN WARNED


#!/bin/bash
# Usage: dedup.sh PATH_TO_HIER_WITH_MANY_EXPECTED_DUPES
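DTEMPPATH="/tmp/btrfs-dedup-sums-$$"
# use trap to clean temp dir on break
trap 'rm -rf "$DTEMPPATH"; exit' 2 3
mkdir "$DTEMPPATH" || exit 1
# NUL-delimited names so spaces, backslashes and newlines in paths survive
find "$@" -type f -print0 | while IFS= read -r -d '' F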
do
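        # hash from stdin: md5sum escapes its output line when the file name holds a backslash or newline
        FHASH=$(md5sum < "$F" | cut -d" " -f1);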
        FSIZE=$(stat --printf %s "$F");
        # If hashed, it's probably a dupe, compare bytewise
        # and create a CoW reflink
        if [ -f "$DTEMPPATH/$FSIZE/$FHASH" ];
        then
                # resolve the first-seen copy once; quoting keeps paths with spaces intact
                ORIG=$(readlink -f "$DTEMPPATH/$FSIZE/$FHASH")
                if cmp -s "$ORIG" "$F";
                then
                        echo "$F is a duplicate of $ORIG" ;
                        #get permissions of file to be deduped
                        FOWNERSHIP=$(stat --printf "%u:%g" "$F");
                        FPERMS=$(stat --printf %a "$F");
                        #make delete, link & permission set unbreakable
                        trap '' 2 3
                        echo -n "starting dedupe op..." ;
                        #---action part, comment this out for dry run---
                        echo -n "deleting..." ;
                        rm "$F" ;
                        echo -n "reflinking..." ;
                        cp --reflink "$ORIG" "$F" ;
                        echo -n "chowning..." ;
                        chown "$FOWNERSHIP" "$F" ;
                        echo -n "chmodding..." ;
                        chmod "$FPERMS" "$F" ;
                        #---action part's over---
                        echo "complete." ;
                        #re-set exit trap to clean temp dir
                        trap 'rm -rf "$DTEMPPATH"; exit' 2 3
                else
                        echo "HASH COLLISION BETWEEN $F -AND- $ORIG - skipping." ;
                fi
        # It's a new file, create a hash entry.
        else
                #echo "$F is new" ;
                if [ ! -d "$DTEMPPATH/$FSIZE/" ];
                then
                        mkdir "$DTEMPPATH"/"$FSIZE" ;
                fi
                # store an absolute target so the link still resolves from inside the temp dir
                ln -s "$(readlink -f "$F")" "$DTEMPPATH/$FSIZE/$FHASH" ;
        fi
done
rm -rf "$DTEMPPATH" ;
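
The action part deletes the duplicate before the reflink copy exists, so a failed cp leaves nothing behind. A possible alternative, sketched here under assumptions, is to reflink into a temp file next to the duplicate and rename it into place only once the copy has succeeded. The function name reflink_replace and the REFLINK_MODE variable are invented for this sketch; REFLINK_MODE defaults to "always", which matches what a bare "cp --reflink" does.

```shell
#!/bin/bash
# reflink_replace ORIG DUPE: replace DUPE with a reflink copy of ORIG.
# Hypothetical helper; REFLINK_MODE is an assumed knob, not something the script defines.
reflink_replace() {
    local orig=$1 dupe=$2 tmp
    # temp file in the same directory, so the final mv is a rename within one filesystem
    tmp=$(mktemp "$(dirname "$dupe")/.dedup.XXXXXX") || return 1
    if cp --reflink="${REFLINK_MODE:-always}" "$orig" "$tmp"; then
        # only now does the duplicate go away, replaced in a single rename
        mv -f "$tmp" "$dupe"
    else
        # copy failed: drop the temp file and leave the duplicate untouched
        rm -f "$tmp"
        return 1
    fi
}
```

The temp file starts out with mktemp's restrictive permissions, so the chown and chmod steps are still needed after the rename.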

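This also doesn't handle SELinux contexts or xattrs. If I could get this to work, I'd try changing "cp --reflink" to "cp --preserve=mode,ownership,timestamps,context,xattr --reflink", which should also replace the chown & chmod operations if it works properly. One catch: --preserve takes its attributes from the source of the copy, so the deduped file would end up with the first copy's ownership, mode and timestamps rather than its own.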
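
If the duplicate's own metadata should survive, one option might be GNU cp's --attributes-only flag, which copies attributes onto an existing file without touching its data. The sketch below assumes a reasonably recent GNU coreutils; the function name carry_attrs is made up for this example. It uses --preserve=all, which, unlike naming context and xattr explicitly, does not make cp fail when SELinux contexts or xattrs cannot be preserved.

```shell
#!/bin/bash
# carry_attrs FROM TO: copy FROM's metadata (mode, ownership, timestamps,
# and context/xattrs where supported) onto the existing file TO, leaving TO's data alone.
# Hypothetical helper; assumes a GNU cp that supports --attributes-only.
carry_attrs() {
    local from=$1 to=$2
    cp --attributes-only --preserve=all "$from" "$to"
}
```

In the script this would have to run while the duplicate still exists, for example against a temporary reflink copy before that copy is moved over the duplicate.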
