Automated Tiered Storage Coming to Desktops?
roj3 writes "Tiered storage has been the scourge of administrators because the vendors tell us to hold meetings with all departments and then assign data to storage tiers based on type or relative importance. eWeek has a story about a new approach to tiered storage — sorting it all by usage patterns. Regularly used data goes on high-performance storage, idle data goes on slower/cheaper storage. Volumes and files can even span several types of drives or RAID levels. Is automated tiered storage headed to desktops?"
Networks, sure. (Score:5, Insightful)
IDE Neutrality? (Score:5, Funny)
The Coalition of Unused Files believes that the desktop is a crucial engine for personal and economic growth. They are working together to urge System Admins to preserve IDE Neutrality, the First Amendment for the Desktop Hard Drive that ensures that the Desktop remains open to innovation and progress.
Re:Networks, sure. (Score:5, Interesting)
With multiple PCs per household, it makes sense to get rid of the hard drives at the PC level and put them in a RAID enclosure that is secured into a wall.
This, however, is a threat to Microsoft because you'll be able to PXE-boot any image of your choice (just think that perhaps your employer or bank supplies their own secure image in order to connect to their resources). Someone needs to get Windows to PXE boot at the hardware level (emulate IDE or something).
This will be huge, but we've got to squeeze Microsoft into it first. Then everyone will be free to try Linux and see what we've all been jabbering about.
Re:Networks, sure. (Score:5, Insightful)
The XP file sharing wizard is too much for a lot of people, and you think a RAID array serving OS images over the network via PXE makes sense?
Re:Networks, sure. (Score:2)
It doesn't really matter whether the common person can install something; they won't be doing it anyway.
Re:Networks, sure. (Score:2)
Granted, I am an IT professional who can manage all this, but I think we will definitely see the average home user get into tiered storage. Think about digit
Re:Networks, sure. (Score:5, Insightful)
Or, if you are like many people, you have documents on your desk and in piles on the floor that you will never use; your kid's birth certificate is in a stack of papers from when you had to take it to school for registration; your file cabinets have partially labeled folders in chronological order... as in the order that you stuffed them into the filing cabinet; your will is in the "to be filed" folder at the bottom of said filing cabinet; and you could fill the bathtub with your old phone and electric bills.
Hopefully the digital equivalents will be better for the organizationally challenged.
Re:Networks, sure. (Score:2)
And your wife agrees with this system?
*dumps gf*
QUICK WHERE DID YOU GET HER??!
Re:Networks, sure. (Score:1, Interesting)
The software would analyze file usage and move files around every day. The anecdotal evidence I have that it worked at such a small scale was that my girlfriend later asked me how I got the computer to start responding faster.
I don't know how well this technology would help on newer systems. I suspect at lea
Re:Networks, sure. (Score:2)
But people don't seem to think the same. And it's their data, after all.
"I have heard too many people complain about losing something important because of hard drive failure."
Did they complain to the point of asking a hardware vendor for an off-the-shelf RAID1? (Of course, they wouldn't ask "give me a RAID1", but they'd answer positively to a hardware vendor advertising "no more data loss! our patented 'doubledisk' technology secures your data
Re:Networks, sure. (Score:2)
In our case, we do not back up the desktops and constantly remind people that if they do not sync their data to the file server, they will deserve the pain when (not if) the disk crashes. Everyone gets a quota on their personal area and we tell them not to save crap like MP3s or AVIs to the server or the files will not last long. This saves backup tapes for the actual corporate data.
Project data gets saved in project speci
Re:Networks, sure. (Score:1)
I'm a Windows web/database developer by day, and when I have 4 different
Great Idea (Score:5, Insightful)
I try to follow this idea all the time with my system. Fast stuff goes on RAID 0; slow stuff and backup stuff go on the ol' 200 GB backup drive.
Re:Great Idea (Score:5, Informative)
But it was a lot smarter than the admins who had to use it so it wasn't very popular.
Re:Great Idea (Score:2)
Re:Great Idea (Score:4, Insightful)
No.
You (and a number of other posters on this topic) have described what we look for - Geeks who want to get the most out of their systems with the least expense. If I could get killer performance with a RAID0 of tiny but fast drives (think Raptors, or even Cheetahs if you don't mind dealing with SCSI), while still having the capacity of a cheap 400GB IDE drive - of course I'd have such a setup (and in fact, many of us already do; we just manually transfer things to/from the big-n'-slow).
Most people, however, do not want this. For starters, most people don't even need the huge drives they already have - if you gave them just the pair of RAID0 36GBs, they'd never use even half that capacity, so there'd be no need to ever move files to the slow storage. Failing that, the members of the Sixpack family who manage to store hundreds of GB only fill it with downloaded porn, music, and movies - uses that really don't need fast drives, just tons of space.
So while it sounds useful in theory - in practice, such a setup would just add cost and complexity without providing any tangible benefit to most users. I suspect even most Geek users would rarely notice the difference (aside from OS load times), and would only make such a setup for bragging rights.
Re:Great Idea (Score:1)
Already have tiers... (Score:4, Insightful)
It's all about feeding that data hungry CPU, as quickly as possible.
Not so new... (Score:5, Interesting)
Looks to me like an excuse to charge 8-10x what you should be paying for storage of that size.
Re:Not so new... (Score:3, Informative)
Re:Not so new... (Score:3, Interesting)
Re:Not so new... (Score:1)
Re:Not so new... (Score:3, Informative)
You know, I did read it, and what they're talking about is that data that is less used/less critical gets moved to slower/less reliable storage automatically.
And when you have only two kinds of storage, a DASD bank and mag tape, and your system automatically writes least used data to tape and tells you to file it, and asks you for tapes when it needs them - well, I'd say the two are highly analogous. The fact that the slower storage is offline
Re:Not so new... (Score:5, Informative)
The day after vacation, when you kept getting the message, "DFHSM is recalling dataset xyz for user jkl" as it pulled all of your storage back online was a pain, and we all thought it would be neat to get rid of, as we migrated to workstations. But in retrospect, HSM was great, never having to worry about your data quantity. That's compared with having to root through $HOME every few months to take care of quota problems.
Re:Not so new... (Score:2)
Any word on something like that for Linux fileservers? I'm envisioning (as a first-pass thought) a cron job with find -ctime that replaces files with symlinks to the online compressed storage, but you'd probably need some kind of hook into Samba, or a lower-level hook into the FS itself, that grabs the file out of "cold storage", so to speak.
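Something like this, maybe, as a first pass in Python run nightly from cron (paths, the 90-day threshold, and the .gz layout are all invented; I'm checking atime rather than ctime since "last used" is the point, and the recall-on-access hook is exactly the part this leaves out):

#!/usr/bin/env python3
# Nightly migration sketch: gzip files untouched for N days into a
# "cold storage" tree and leave a symlink behind. Note the symlink
# points at the *compressed* copy, so plain reads return gzip bytes -
# you'd still need the Samba/FS hook for transparent recall.
import gzip, os, shutil, time

HOT = "/srv/share"          # the fast storage users see
COLD = "/srv/coldstore"     # big, slow, compressed
MAX_AGE = 90 * 86400        # "idle" means untouched for 90 days

now = time.time()
for dirpath, _, filenames in os.walk(HOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        if os.path.islink(path):
            continue                      # already migrated
        if now - os.lstat(path).st_atime < MAX_AGE:
            continue                      # recently used; leave it alone
        dest = os.path.join(COLD, os.path.relpath(path, HOT)) + ".gz"
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        with open(path, "rb") as src, gzip.open(dest, "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(path)
        os.symlink(dest, path)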
Re:Not so new... (Score:2, Informative)
Re:Not so new... (Score:2)
Something to check on - when I was actually looking at these kinds of systems 10 years ago, they helped that issue by using Magneto-Optical disks in between disk and tape. It may be that your installation is l
Re:Not so new... (Score:4, Informative)
Oh yeah. BITD, there was the archiver, a job that ran every night and moved files that hadn't been accessed in the last N time periods to tape. It left the VTOC entry (kind of like an inode) in place, just marked it "archived" and recorded the label of the tape. Then, the next time that file was accessed, a hook in the open() call would send a message to the console operator telling them to mount tape such-and-such. When the tape was mounted, the archiver would automatically copy the file back into place, the open() call would complete normally, and life was good. Basically transparent to the user (they'd look at their directory and all their files would be there), except for the fact that the file open would take two to three minutes. Then again, since they were paying for disk storage by the block-day, they were generally pretty happy to only pay for a fifty-cent tape mount every quarter instead of keeping that 1200-block file on-line for three months when they weren't using it.
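For the young'uns, the stub-and-recall mechanism translates to modern terms pretty directly. A toy sketch in Python - the stub format and the recall function are invented for illustration, not how any real archiver did it:

import json

def recall_from_tape(label, path):
    # In the real thing, this paged the console operator and blocked
    # for the two-to-three-minute tape mount.
    raise NotImplementedError("operator: please mount tape " + label)

def archived_open(path):
    """open() with the archiver's hook: recall the file if it's a stub."""
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(b"ARCHSTUB"):
        # The stub is the VTOC-entry equivalent: just a tape label.
        label = json.loads(data[8:])["tape"]
        recall_from_tape(label, path)
    return open(path, "rb")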
Re:Not so new... (Score:1, Informative)
Novell.
Re:Not so new... (Score:3, Insightful)
No kidding. So they find a way to put less-used data on slower disks that still COST NEARLY AS MUCH. The entry price is still listed as $50,000. Big fuckin' deal. Let me know when you can take a bunch of garden-variety servers and do this, with a super-cheap clone RAID server with 40 terabytes of SATA as the 'last tier' for the slowest files, where I can build 100 terabytes for $50,000.
And yet, managers will get a woody over this buzzword compliance and want to give these guys million
Re:Not so new... (Score:2)
Got you beat there. I was using a similar system nearly 30 years ago at university - again, disk and tapes. The O/S was GEORGE III running on an ICL xxxx (can't remember). Very useful in the days when your disk quota was measured in kilobytes; totally automatic migration to tape tended to keep your disk usage down.
One problem, though. After the summer break, all
Certainly could be done in a desktop (Score:2, Insightful)
All you would need is some software for automatically moving it around. Though most people with desktop rigs like that probably would rather control what is on which drives themselves.
Re:Certainly could be done in a desktop (Score:1)
You just described my desktop exactly.
Re:Certainly could be done in a desktop (Score:1)
Re:Certainly could be done in a desktop (Score:4, Informative)
http://www.anandtech.com/printarticle.aspx?i=2101 [anandtech.com]
Re:Certainly could be done in a desktop (Score:1)
Microsoft does a handy tool called Diskpar.exe (it's included with the Resource Kit)
Re:Certainly could be done in a desktop (Score:2)
Re:Certainly could be done in a desktop (Score:1)
Sure the drives are more likely to fail, but then again so is that single 250GB "for everything else" drive.
Re:Certainly could be done in a desktop (Score:1)
Why? For the price of a 36GB Raptor you can get 300GB el cheapo drives. I put 4 of those in a RAID5 on my SATA1 controller and get write speeds of 130 MBytes/s (reads at 180 MBytes/s according to dd). More disk space with higher reliability compared to RAID0, without the need
Re:Certainly could be done in a desktop (Score:1)
What the article says is not important, since it's about expensive hardware, whereas a desktop RAID is cheap disks + some software glue.
To flood a SATA150 bus you only need 2 high-performance disks. So your suggestion would most likely be 2 Raptors in RAID0 and 2 low-end drives in RAID1/0 (on a 4-port controller). When the cheap storage is idle you'll get max read/write, less when it's active (hard to guess
Oh....good.. (Score:5, Insightful)
So that special little something that you need once a year, but when you need it, you need it RIGHT NOW is tied to the foot of a pigeon fluttering around the warehouse somewhere. Frequency of use does NOT denote importance.
No but it is correlated (Score:3, Insightful)
Yes, there are exceptional cases, like the President's access to the Nuclear Briefcase. It hasn't been used for real in a long time if ever but when he needs it it had better be close at hand. However, these special cases can be treated as the special cases they are.
Re:No but it is correlated (Score:5, Insightful)
Oddly enough, I think most people in the world would prefer that it wasn't close at hand when Bush decides he wants it.
A better example is fire extinguishers -- most of them will literally never be used, but there's a very good reason to ensure that they are readily available.
Re:No but it is correlated (Score:4, Funny)
If that's the case for you, then I feel sorry for you. You've apparently never known the snowy, probably-toxic joy of Fire Extinguisher Expiration Day. It's the happiest day of the decade.
Re:Oh....good.. (Score:1)
Liars, damn liars, and statistics (Score:5, Informative)
No. Absent other data, it only denotes frequency of use, period. Playboy.com gets more hits than the general ledger webapp if you unblock your company firewall, but the general ledger is more important to the company.
There is actually very little correlation between what the average user wants and what s/he needs, as is empirically obvious. If the image from the "fly-fishing.com" website that they've set to come up as their background image every morning fails to load, they can still work, but if the once-a-year corporate audit checklist gets put on slow, old storage and then gets lost in a hardware failure, the company stock price may flutter and certainly heads will roll in the corporate IS department.
I don't think that word means what you think it means.
Re:Liars, damn liars, and statistics (Score:1)
Re:Oh....good.. (Score:1)
Hell, if it's a text document you need only once a year, then 1970 HD speeds might be good enough (for reading the doc, not MS Word).
Re:Oh....good.. (Score:4, Informative)
Re:Oh....good.. (Score:1)
Re:Oh....good.. (Score:2)
But look at it this way, at least the pigeon won't put you on eternal hold when you need rapid tech support in your time of crisis.
Re:Oh....good.. (Score:2)
It sounds like you don't pay the IT department for your storage. In my experience, once a department is charged for storage, they suddenly start requesting cheaper storage.
Re:Oh....good.. (Score:2)
When expensive storage == no new Blackberries this year, sales departments take notice
Re:Oh....good.. (Score:2)
Re:Oh....good.. (Score:2)
VP of Marketing: "What do you mean the chargeback for that tech is more than I make per hour?"
CIO/CFO: "Their time is more valuable to us than yours."
VP of Marketing: "What am I, a schmuck?!"
CIO/CFO: "Yes."
In my experience, that's the downside of chargebacks -- all of a sudden, everyone has an idea of what "that guy in the server room" makes... and is VERY unhappy about it.
Re:Oh....good.. (Score:4, Insightful)
As an example, financial records for past years might be very important, but you don't need to be able to access them in a tenth of a second. As long as you can get to them if you really want to (sacrificing a few seconds), then it's all right.
The way I see this translating to reality is that you'd keep all your old documents in slow-speed storage, but then keep an index in high-speed storage, so that you could easily search (both by name and by content) and decide when to pull stuff out of your archives.
This is no different than what people have been doing for centuries with paper. Just because the card catalog is located in the center of the library doesn't mean its contents are inherently more valuable than the actual books (which might be in the basement, back shelves, wherever); it just means that the catalog gets accessed much more often.
Actually, in the physical world, people often exchange speed of recall for certainty of recall. You put important documents in a safe-deposit box, rather than your kitchen counter, because even though it'll take you longer to get them out of the box, they're guaranteed to be there when you need them. Likewise, a system which traded off speed for redundancy would probably be appropriate for "important" but infrequently-accessed electronic documents.
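The "index on fast storage, documents on slow storage" part of this is easy enough to sketch (paths invented; a real version would index contents too, not just names):

import os, pickle

SLOW = "/mnt/archive"                           # big, cheap, mirrored
INDEX = os.path.expanduser("~/.archive-index")  # lives on the fast drive

def build_index():
    idx = {}
    for dirpath, _, files in os.walk(SLOW):
        for name in files:
            idx[name.lower()] = os.path.join(dirpath, name)
    with open(INDEX, "wb") as f:
        pickle.dump(idx, f)

def search(term):
    # Touches only the fast drive; the archive spins up only on a hit.
    with open(INDEX, "rb") as f:
        idx = pickle.load(f)
    return [p for n, p in idx.items() if term.lower() in n]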
Re:Oh....good.. (Score:2)
It doesn't *always* denote importance. However, if a tiered storage system improves performance a large enough percentage of the time, then I'd live with a drop in performance on the odd occasion. Similar to using spare memory for I/O and file caching.
New form of something old (Score:1)
I do like the idea of this product. Similar performance gains can be had by having the OS manage the data. It's a different-yet-similar concept, but some desktop OSes already do this with code libraries, putting them all in a single directory with little or no fragmentation within the file to allow for faster loading. Other OSes play similar tricks with system library metadata.
Low power Optimization (Score:1)
Hot File Adaptive Clustering? (Score:4, Informative)
Apple's "About disk optimization with Mac OS X [apple.com]" (basically telling you that you don't need to defrag), says "Mac OS X 10.2 and later includes delayed allocation for Mac OS X Extended-formatted volumes. This allows a number of small allocations to be combined into a single large allocation in one area of the disk."
There's also a reference to a "hot band," a region of the drive where data is written that's used during startup, in order to increase performance and I assume lessen boot times.
There's also reference to some automatic defragging in this macosxhints article on HFAC [macosxhints.com]: So that seems to be the deal; if anyone else has more information, I'd be interested to hear about it.
There's also a MacSlash article on HFAC [macslash.org] and a discussion on Ars that includes a post of the source code [arstechnica.com].
It sounds good in theory... (Score:1)
For example, take an MP3 collection. I go to open up my old Soviet music collection (which I have), but I haven't listened to it in months, possibly even years. This would put it at the low end of the priority scale, and I would have to wait for the data to
Re:It sounds good in theory... (Score:2)
Re:It sounds good in theory... (Score:2)
I would like to know what kind of paint you're using that dries in the time it takes to load an MP3 off of a slow 5400 RPM drive.
This is "new"? (Score:4, Insightful)
IBM mainframes that literally pumped water were doing this decades ago.
What, you say water cooling is coming back too?
It already is (Score:4, Insightful)
Just read TFA: (Score:5, Insightful)
Re:Just read TFA: (Score:3, Interesting)
Channels of Fiber come not cheap.
Terabytes 6 with connection of light for less than $50k you will not find.
Terabytes 6 with connections of wire you may.
SATA drives, untested are delivered.
SATA drives with fewer bearings.
SATA drives with short life.
Enterprise storage is not easy.
Re:Just read TFA: (Score:2)
Re:Just read TFA: (Score:1)
A man in a suit with a laptop and a Powerpoint presentation to demonstrate how it'll lower your TCO, increase your ROI, and boost your career.
Re:Just read TFA: (Score:2, Interesting)
Just like my kitchen (Score:5, Funny)
Beer goes in the front on the top shelf of the fridge; milk (eventually cheese, typically) goes on the bottom shelf in the back.
This is automated, since I simply shove things onto the shelves when I get home from the supermarket. Anything I consume and replace ends up at the front. Anything I buy because I 'should' be eating it (like fiber biscuits, or whatever) ends up pushed to the back.
It's automated via metatag, too. Anything tagged 'ice cream' goes in the door of the freezer, anything tagged 'vegetable' gets relegated somewhere in the back, where it quickly develops an inch of ice crystals, to slowly dry out to a freezer-burnt state of suspended animation until I buy a new fridge unit.
This costs no more than regular kitchen storage space, but if you'd like a custom design for you and your loved ones, my consulting fee is $75/hr, or a bag of chips and a six-pack.
Re:Just like my kitchen (Score:1)
Wow, that must be some sixpack!
Yes, Kinda... (Score:5, Informative)
Microsoft announced a while back that Windows Vista would support three technologies designed to improve disk speed called SuperFetch, ReadyBoost, and ReadyDrive. [msdn.com] SuperFetch is simply a way of preloading applications and data when the OS anticipates that you'll be loading those soon.
ReadyBoost and ReadyDrive both utilize persistent memory caches to speed up access to the disk.
ReadyBoost treats normal USB keys and flash disks like temporary caching locations for data from the disk.
ReadyDrive is essentially the term Microsoft uses to describe their support for hybrid hard drives, which are disks that have a built-in flash memory module that's used as a persistent cache.
Not only do hybrid disks [pcworld.com] dramatically increase performance, but they also result in huge power savings for mobile devices like laptops and media players.
I could see a use. (Score:3, Interesting)
I don't think I'm that atypical in this regard. GMail brought the idea of saving all your email, forever, to the masses; Flickr gives you an unlimited amount of photo storage; and technologies like Apple's Spotlight make it relatively easy to search through gigabytes of saved information and pull up related items. What we haven't seen yet is a lot of popular interest in redundant backup systems: that'll come later, once people start realizing how much of their lives they've stored away on the crummy OEM drive in their Dell. (Probably after a lot of them fail and we hear some real horror stories.)
It's not hard to imagine a near future where people just get used to not throwing anything away. In that situation, tiering storage -- allocating the fastest media to the most frequently accessed information -- could have big performance gains. And assuming that you have a relatively static amount of frequently-accessed information, and basically only add information to the "infrequently accessed" category, a tiered system means that you only really have to add storage to the bottom tier. It's a pyramid where the base gets larger and larger, but the upper part remains basically the same size.
So for example, as you save more and more emails (infrequently accessed information), they automatically get saved onto inexpensive, slower drives, which are then mirrored to each other for redundancy. A single, fast drive could hold the system -- maybe solid state storage? -- and more frequently-accessed data. A smart system would know what information needs to be moved up to faster storage to be very useful (uncompressed digital video, for example, wouldn't be much fun to work with off of a slow drive), and what can be left there as it's accessed (MP3s and compressed video could be played directly from slower media).
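A sketch of what such a policy might look like (tier names, extensions, and thresholds are all pulled out of the air; a real system would watch actual access patterns):

NEEDS_FAST = {".mov", ".dv"}        # e.g. video you're actually editing
FINE_OFF_SLOW = {".mp3", ".mpg"}    # sequential playback; slow media is fine

def pick_tier(ext, accesses_per_month):
    if ext in FINE_OFF_SLOW:
        return "slow-mirrored"      # frequently played but undemanding
    if ext in NEEDS_FAST or accesses_per_month > 30:
        return "fast"               # hot, random-access data
    return "slow-mirrored"          # everything idle: cheap and redundant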
I think it's an interesting technology with a lot of possible applications, but as with a lot of other things, it'll be the home user who arrives last to the party, because their storage is the least centralized. Unless there's a move away from storage on individual desktop PCs and towards storage on per-home servers, it'll be a while before most people require or see the benefit in such a thing.
How is this new? (Score:2)
This is hardly a new concept — mainframes have been migrating untouched datasets to tape for years. If this really is a new idea in the SAN market, SANs must suck worse than I'd previously supposed.
And “Is automated tiered storage headed to desktops?” Well, no, unless there's something cheaper than hard disks, which there currently really isn't.
It's not, really (Score:2)
Sounds reasonable (Score:2)
Coming full circle -- good idea! (Score:2)
This is interesting, because when you read about old operating systems that ran on computers with several types of memory--fast magnetic core memory for the active programs, slower rotating-drum memory for less active data, large and slow hard drives, and automatic tape drives--they did exactly this. It makes sense, given that we have L1 cache, L2 cache, and system RAM, each slower and larger than the last, that we would extend this to hard drives, having a small, fast drive for often-used
Re:Coming full circle -- good idea! (Score:4, Informative)
SANCTIFY and DESECRATE
"Sanctify file" moved the file to drum (basically, one-drive RAID 0 for all you young-uns). Desecrate moved it to the regular hard disk.
YMMV
Ratboy
Hierarchical Email storage, client driven (Score:1)
I'd love to see it... (Score:3, Insightful)
I should never have to empty my recycle bin manually, except when I want to perform a secure erase - which should be a function delivered with my operating system. Having to do it by hand is the height of stupidity.
It's not even a hard problem! There are functions programs use to check for free space. Lie to them: don't count files in the recycle bin against the available free space. If you're about to run out of space, delete the least recently used file. Perhaps you might also rank things by total number of accesses, or other criteria, but I believe (perhaps naively) that making the trash can an automatic FIFO, from which files are automatically deleted when disk space is low, would be about a hundred times better than what we have now.
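Roughly this, in Python (the trash path and low-water mark are made up; a real implementation would live in the OS, not a script):

import os, shutil

TRASH = os.path.expanduser("~/.trash")
LOW_WATER = 2 * 1024**3               # start purging under 2 GB truly free

def reported_free(path="/"):
    # The "lie": tell programs that trashed files are already gone.
    trashed = sum(os.path.getsize(os.path.join(TRASH, f))
                  for f in os.listdir(TRASH))
    return shutil.disk_usage(path).free + trashed

def purge_if_low(path="/"):
    # FIFO by last access: the oldest trash goes first when space runs out.
    files = sorted((os.path.join(TRASH, f) for f in os.listdir(TRASH)),
                   key=os.path.getatime)
    while files and shutil.disk_usage(path).free < LOW_WATER:
        os.remove(files.pop(0))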
Also, I want this functionality on all operating systems. Unless I explicitly request deletion, no file should ever be unlinked, deleted, or whatever you call it when I delete it, whether through the command line or the GUI.
This is not hard and it would make everyone a lot happier.
Re:I'd love to see it... (Score:4, Interesting)
The problem with this is that it causes a significant reduction in performance.
Ideally, the operating system chose the best possible spot for that file when it was written. Once that file is deleted, that spot will once again be the fastest, best possible spot - for at least something. If the operating system skips that spot for a new file, then the new file isn't going to be accessed quite as quickly.
Truly automatic tiered storage solves this problem by splitting the directory services from the storage system- that is, the file's _name_ is no longer tied to the volume that the file happens to live on (and no, this isn't the same thing as symlinks or shortcuts). This allows the decision as to what the best spot for a file is to be deferred until later- and even spanned across multiple volumes!
Unfortunately, such a beast is very difficult. If we relax our requirements - say that performance isn't very important, or perhaps that we can stop using our computer for a few hours each evening - then it's probably possible. What we need is a new kind of filesystem that supports either atomic moves between disks, or one that splits the names from the storage.
A few research projects have focused on these kinds of changes, but they all tend to break UNIX semantics (Amoeba immediately springs to mind) - and those UNIX semantics are in fact the most widely used and recognized semantics for filesystems anywhere (even Windows uses them!). People who develop a filesystem incapable of supporting them really need a good reason for breaking everyone's hard work.
While they often do, it hasn't yet been seen as good enough for general-purpose stuff.
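The name/storage split is easier to picture as a toy table (everything here - volumes, object IDs, paths - is invented purely for illustration):

namespace = {}    # user-visible path -> (volume, object id)

def migrate(path, new_volume):
    vol, oid = namespace[path]
    # ...copy object `oid` from vol to new_volume here, then:
    namespace[path] = (new_volume, oid)   # the name never changes

namespace["/home/me/thesis.tex"] = ("fast-raid0", 42)
migrate("/home/me/thesis.tex", "cheap-sata")  # file moved; path stays stable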
Re:I'd love to see it... (Score:2)
Filesystems may be automatically and intelligently defragmented (while live, if the filesystem is decent) when disk I/O is at a minimum.
Currently, some operating systems (e.g. Wind
Re:I'd love to see it... (Score:3, Informative)
But my filesystem is never idle, or even nearly so. Nonetheless, fragmentation isn't exactly a bad thing, and doesn't necessarily have to cause problems (such as lost performance) by itself.
Worse still: How does the defragmenter know to avoid using this block? Or how does it know that it's a good candidate to be moved to the other end of the disk?
We could make a record of e
Where did the article say Desktops? (Score:2)
Sure, your desktop connects over the network to a SAN attached server in some fashion, but I don't see anywhere in th
This is simple (Score:1)
def store_on_good_drive(f): ...   # the fast RAID0
def store_on_slow_drive(f): ...   # the ol' backup drive

filename = "vacation_photos.zip"
if "pr0n" in filename:
    store_on_good_drive(filename)
else:
    store_on_slow_drive(filename)
It's for ricers (Score:1)
Illegible Blobs (Score:2)
I already do this ... (Score:1)
Big files that I don't mind losing (ripped DVDs and CDs) are on a local, cheap RAID-5 array.
Everything else resides on my PC.
Every night, my PC runs an automatic rsync job that syncs it all up to my rsync.net filesystem.
I guess, theoretically, I could take it a step further, and add a layer of geographic (and even political) redundancy by making my account sync to California and Colorado, and not just the primary CA site.
rsync.net just announced sites in Switzerland and India
Veritas Storage Foundation (Score:1)
Raid 0 array... (Score:1, Funny)
How do they handle free space? Among other things (Score:2)
My biggest question is how they handle free-space tracking. Unless this box has "hooks" into the filesystem, it is not going to have the faintest clue when data has been deleted.
Also, can you say "Holy Fragmentation, Batman!"? Again, pretty intense "hooks" into the filesystem are going to be required in order to keep files even remotel
1996, Netware 4.1 & HP hardware (Score:2)
I built something like this 10 years ago: a big corporation's in-house marketing & PR department, lotsa project files full of artwork and such for campaigns, big files used daily for months then ignored for years. It was MacOS 9 & Windows 95 clients, NetWare 4.1 on an HP server with RAID 5 and 2 DLTs w/ loaders.
One DLT was for backups using ARCServe (before they got bought by CA). It was simply a matter of shipping cartridges in and out of the storage vault & off-site as required, replacing indi
Gee, a New Idea, only 43 years old! (Score:2, Informative)
Been there, done that. (Score:2)
The joys of waiting for an operator to load a tape so you could edit a file, hoping he wouldn't CANTDO.
(Little-used files got shoved off to mag tape. They still showed up in the filestore. When you accessed them, a message was sent to the operator: "PLEASE LOAD VOLUME ASBHJ123 FOR