Well, that's the nature of de-duplication. Either your data is full of duplicates, or it isn't. If you're not sure what your data will do, you should assume no reduction until you have actual numbers.
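If you want actual numbers before signing anything, you can get a rough estimate yourself. Here's a minimal sketch that chunks a directory tree into fixed-size blocks and counts unique fingerprints. The block size is an arbitrary assumption, and real appliances use variable-size (content-defined) chunking, so treat the result as a ballpark, not a vendor quote:

```python
# Rough dedupe-ratio estimate: fixed-size blocks, SHA-256 fingerprints.
import hashlib
import sys
from pathlib import Path

BLOCK_SIZE = 128 * 1024  # 128 KiB blocks -- an assumed, arbitrary size


def estimate(root: str) -> None:
    seen: set[bytes] = set()
    total_blocks = 0
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        with path.open("rb") as f:
            # Hash each block; duplicates collapse into one set entry.
            while block := f.read(BLOCK_SIZE):
                total_blocks += 1
                seen.add(hashlib.sha256(block).digest())
    if total_blocks:
        ratio = total_blocks / len(seen)
        print(f"{total_blocks} blocks, {len(seen)} unique -> {ratio:.2f}:1")


if __name__ == "__main__":
    estimate(sys.argv[1])
```

Run it against a copy of last week's backup set and you'll know whether you're a 20:1 site or a 1.1:1 site before the salesman does.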
We had a disk library vendor try to sell us on the dedupe in their VTL, which we put to work backing up Exchange data. They told us stories of huge compression ratios and how Exchange mailboxes compress so easily that we could fit weeks of backups onto it. The fact was, management had already bought the machine, and we just looked after the backups, not Exchange, so we set it up as we were ordered to.
Turns out their examples were all based on sites with 1 GB mailboxes or more. Our bizarre setup with 50 MB mailboxes and thousands of employees wasn't good for dedupe, because all the repeat mail got moved to people's PSTs immediately or their mailboxes would fill. So after a week of backups the VTL was full, and we were stuffed. Then we had to move it all off to tape...
Turns out that while it backed up and ran its de-dupe fine, rehydrating the data for restoration was hopeless: less than 1 MB/s throughput, so it took days to move everything off.
And that experience is why I now believe that de-dupe for backups is fundamentally flawed. You want a backup of all your data, not just the data some broken firmware decides is unique. Any corruption and, bang, all your backups are useless, because every one of them is missing that highly duplicated block.
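To make that failure mode concrete, here's a toy sketch of a content-addressed block store. It's purely illustrative (the class and method names are made up, not any real VTL firmware's API), but it shows the core property: every backup that references a popular block shares a single stored copy, so corrupting that one copy quietly breaks them all.

```python
# Toy content-addressed store illustrating the shared-block failure mode.
import hashlib


class DedupeStore:
    def __init__(self) -> None:
        self.blocks: dict[bytes, bytes] = {}  # hash -> single stored copy

    def write(self, data: bytes) -> bytes:
        key = hashlib.sha256(data).digest()
        self.blocks.setdefault(key, data)  # duplicates stored exactly once
        return key

    def read(self, key: bytes) -> bytes:
        data = self.blocks[key]
        if hashlib.sha256(data).digest() != key:
            raise IOError("stored block is corrupt")
        return data


store = DedupeStore()
# Two "backups" that both contain the same popular block:
backup_a = [store.write(b"popular attachment"), store.write(b"unique to A")]
backup_b = [store.write(b"popular attachment"), store.write(b"unique to B")]

# One bit flips in the single shared copy on disk...
shared_key = backup_a[0]
store.blocks[shared_key] = b"p0pular attachment"

# ...and now every backup that referenced it fails to restore.
for name, backup in [("A", backup_a), ("B", backup_b)]:
    try:
        for k in backup:
            store.read(k)
        print(f"backup {name}: OK")
    except IOError:
        print(f"backup {name}: unrestorable")
```

With plain tape, each backup carries its own copy of that block, so one bad spot costs you one backup, not all of them.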