Our mindset at my research institution is very different. We generate several terabytes of data per year, but the cost of storage decreases so fast that we just copy old data onto new media and never delete ANYTHING.
In fact, we consider the cost of actually figuring out what data to delete to be higher than simply buying more storage.
I would not call it "well-indexed" however.
Our backup strategy is tailored to the nature of our data, most of which is simulation results. We back up the "lightweight" data: analyzed results, input files, and log files. "Heavyweight" data we do not back up, since the cost of reproducing it (given the input files and the log files), weighted by the low probability of ever actually needing it, is lower than the cost of backing it up. As a result, our backup requirement is maybe 5% of our "live" data archive.
If it gets to the point where we can't afford the storage anymore, we'll delete the "heavyweight" data ourselves to reduce the data footprint.
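The cost comparison behind this policy can be sketched as an expected-value calculation. A minimal illustration in Python, where all function names and the numbers plugged in are hypothetical, not our actual figures:

```python
# Back-of-the-envelope model of the "back up or re-run?" decision above.
# All names and numbers are illustrative assumptions, not real cost data.

def expected_cost_of_not_backing_up(reproduce_cost: float,
                                    p_needed: float) -> float:
    """Expected cost of skipping the backup: pay to re-run the simulation
    only in the (unlikely) case the data is ever needed again."""
    return reproduce_cost * p_needed

def should_back_up(backup_cost: float,
                   reproduce_cost: float,
                   p_needed: float) -> bool:
    """Back up only when storing is cheaper than the expected re-run cost."""
    return backup_cost < expected_cost_of_not_backing_up(reproduce_cost,
                                                         p_needed)

# "Heavyweight" output: reproducible from inputs + logs, rarely needed
# again, so the expected re-run cost (500 * 0.05 = 25) is below the
# storage cost -> skip the backup.
print(should_back_up(backup_cost=100.0, reproduce_cost=500.0,
                     p_needed=0.05))   # False

# "Lightweight" inputs and logs: effectively irreplaceable (p_needed ~ 1),
# and cheap to store -> back them up.
print(should_back_up(backup_cost=1.0, reproduce_cost=10_000.0,
                     p_needed=1.0))    # True
```

The key point is that the re-run cost is discounted by the probability of ever needing the data, which is what makes not backing up the heavyweight results come out cheaper.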