It needs an incredible amount of memory to operate effectively.
From my university notes:
5TB of data, average block size 64K = 78,125,000 blocks
For each block, dedup needs 320 bytes, so
78,125,000 x 320 bytes = 25 GB dedup table
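The same estimate as a quick script, in case you want to plug in your own numbers. This is only a sketch: it assumes decimal units (1 TB = 10^12 bytes, 64K = 64,000 bytes), which is what makes the figures above come out round, and the variable names are my own.

```python
# Rough dedup table size estimate.
# Assumption: decimal units (1 TB = 10**12 bytes, 64K = 64,000 bytes).
DATA_BYTES = 5 * 10**12   # 5 TB of data
BLOCK_BYTES = 64 * 10**3  # average block size of 64K
ENTRY_BYTES = 320         # per-block dedup table entry, from the notes

blocks = DATA_BYTES // BLOCK_BYTES
table_bytes = blocks * ENTRY_BYTES

print(f"{blocks:,} blocks")
print(f"{table_bytes / 10**9:.0f} GB dedup table")
```

Halving the block size doubles the number of blocks, so the table size doubles with it.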
Use compression instead (e.g. ZFS compression).
I'm having a hard time imagining why you're using 320 bytes for each block. If you had 100TB of data in 4KB blocks, that's 25 billion blocks. You could use an on-disk reference table for pointing out duplicate blocks and reference it with 36 bits. If you used an excessive 256-bit hashing algorithm for each block, a b-tree with two 64-bit spaces for references, and 4 control bits, that still only gets you to 424 bits, which is 53 bytes. You only need to keep location information in RAM for blocks that are actually deduped (the rest is only referenced when a dedupe happens), so two 36-bit locations on a duplicate block would bring the total for a duplicate block to 62 bytes, with an additional ~5 bytes for each additional block that is found to be a duplicate.
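Here is that bit accounting as a sketch. The field names are just labels I made up for the pieces listed above, and I'm assuming the 36-bit on-disk reference is counted as part of the entry. Summing the listed fields gives 424 bits, which rounds up to 53 bytes.

```python
import math

# Field widths in bits for one table entry (names are illustrative only).
FIELDS = {
    "on_disk_reference": 36,
    "block_hash": 256,
    "btree_references": 2 * 64,
    "control_bits": 4,
}
LOCATION_BITS = 36  # one on-disk block location


def bits_to_bytes(bits):
    """Round a bit count up to whole bytes."""
    return math.ceil(bits / 8)


# 100TB in 4KB blocks (decimal units), and the bits needed to index them.
total_blocks = (100 * 10**12) // (4 * 10**3)
min_ref_bits = (total_blocks - 1).bit_length()

base_bits = sum(FIELDS.values())
base_bytes = bits_to_bytes(base_bits)
duplicate_bytes = bits_to_bytes(base_bits + 2 * LOCATION_BITS)

print(f"{total_blocks:,} blocks need at least {min_ref_bits} reference bits")
print(f"base entry: {base_bits} bits = {base_bytes} bytes")
print(f"duplicate entry: {duplicate_bytes} bytes")
```

So 36 bits is slightly more than the minimum needed to address 25 billion blocks, and each extra duplicate adds another 36 bits (4.5 bytes) on top of the 62.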
Let's say it's 50TB duplicated exactly, for a total of 100TB. 5*10^13/4KB*62B = 775GB. That's a lot of RAM, but it's still only about 0.8% of the total. And you could get far more efficient by using a smaller hash with a reference to a larger on-disk hash table. A 32-bit hash with a 35-bit on-disk reference (containing disk block locations and longer hashes), a b-tree with two 35-bit memory offsets, plus 20 bits for whatever, and you'd be at 24 bytes, which would drop your total memory usage to 300GB.
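The totals can be checked with a few lines. Again just a sketch with decimal units and my own names; the 62-byte and 24-byte entry sizes are taken as given from the text above.

```python
# RAM needed for the in-memory table in the 50TB-duplicated-once scenario.
# Assumption: decimal units (1 TB = 10**12 bytes, 4KB = 4,000 bytes).
UNIQUE_BYTES = 50 * 10**12  # 50TB of unique data
TOTAL_BYTES = 100 * 10**12  # 100TB once every block is stored twice
BLOCK_BYTES = 4 * 10**3

entries = UNIQUE_BYTES // BLOCK_BYTES  # one entry per deduped block


def table_ram_bytes(entry_bytes):
    """Total RAM for the table at a given per-entry size."""
    return entries * entry_bytes


for label, entry_bytes in (("256-bit hash", 62), ("32-bit hash", 24)):
    ram = table_ram_bytes(entry_bytes)
    share = 100 * ram / TOTAL_BYTES
    print(f"{label}: {ram / 10**9:.0f} GB ({share:.3f}% of total data)")
```

Passing 320 to `table_ram_bytes` shows what the same scenario would cost at the per-block figure being questioned.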
Of course, 32 bytes would align much better, and you could cut the space to 1/16th by using 64KB block sizes, but the point is that 320 bytes is clearly absurd. All I can think of is that you meant to type bits instead of bytes.