Re:Which 90%?
The discussion isn't about discarding the data; it's about pushing it back to slower, cheaper storage with less frequent backups.
You could implement this like a cache. The front server interacts with the clients and holds 1 TB of data; the back server(s) hold 10 TB. You only ever talk to the front server. When you read a file, if the front server has it you get it quickly. If it doesn't, you get a 'cache miss': the server pulls the file from the back store, caches it, and hands it to you, but that takes 30 seconds. Modified data is routinely pushed back to the back server with dirty flags etc., like most other caches.
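Rough sketch of what I mean, in Python. `fetch_from_back` and `push_to_back` are made-up stand-ins for whatever transport the back server actually uses, and the sleep is just the 30-second miss penalty from above:

    import time

    BACK_STORE = {}  # stands in for the 10 TB back tier

    def fetch_from_back(path):
        time.sleep(30)              # simulated miss penalty
        return BACK_STORE[path]

    def push_to_back(path, data):
        BACK_STORE[path] = data

    class TieredStore:
        def __init__(self):
            self.front = {}         # fast tier: path -> (data, dirty flag)

        def read(self, path):
            if path in self.front:              # cache hit: served quickly
                return self.front[path][0]
            data = fetch_from_back(path)        # cache miss: slow fetch
            self.front[path] = (data, False)
            return data

        def write(self, path, data):
            self.front[path] = (data, True)     # mark dirty, flush later

        def flush(self):
            for path, (data, dirty) in self.front.items():
                if dirty:
                    push_to_back(path, data)    # routine write-back
                    self.front[path] = (data, False)

Eviction from the front tier (LRU or whatever) is left out, but that's the shape of it.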
From a user's perspective the 30-second delay is annoying, but they wouldn't notice if it only happened once a week. Keeping the incidence rate that low is where stats like the 90% come in handy.
Back to your probability figures: we don't have equal interest in every file, which is what the entire article is about. Having 35% of this data needed again is fine; a quick look at standard work practices suggests those re-reads would be spread out over time.
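To put rough numbers on that last point (every figure here is invented for illustration, none of them come from the article):

    # Back-of-the-envelope: how often would a user actually hit a miss?
    archived_files  = 100_000   # files pushed to the back tier (made up)
    reread_fraction = 0.35      # the 35% that will eventually be needed again
    spread_weeks    = 52        # re-reads spread over a year (assumed)
    users           = 200       # people sharing the store (made up)

    misses_per_user_per_week = (archived_files * reread_fraction
                                / spread_weeks / users)
    print(misses_per_user_per_week)   # ~3.4 with these numbers

Tune the tier split (what you keep on the front server) until that number is down around once a week and nobody will complain.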