Businesses run on data, and getting at that data as fast as possible is key to making sound business decisions. At the same time, enterprises shouldn’t have to wade through old, obsolete data when making those decisions.
Managing where the data is stored is what makes the difference in that regard, as does automating the movement of data. “Regardless of a customer’s size, SMB or enterprise,” said Bob Fine, director of product marketing for Dell’s Compellent storage systems, “all of these environments have quite a bit of inactive data, and they didn’t have an automated way to figure out what data is old and what data is new.”
Automated tiering of data, which puts the most important data in tier 1 and the least important or least accessed data in the lowest tiers, has been available for some time, though it has been constrained by the limitations of spinning hard disks. Compellent, acquired by Dell in 2010, has been in the automated tiered storage business since 2002.
TheInfoPro, a data storage research group and part of The 451 Group, lists automated tiered storage as the hottest technology of 2012, with more IT managers adopting and investing in it than any other storage technology. The key reason is efficiency.
“They are facing continued data growth and flat or slightly decreasing storage budgets. So they need a way to purchase further capacity with less money,” Marco Coulter, research director for the storage practice at TheInfoPro, said of IT departments. At the same time, he noted, database managers have to maintain consistent performance for applications. So managers are stuck in the middle of conflicting demands.
Meanwhile, storage vendors have engaged in a certain amount of fudging in order to squeeze out a bit more performance, most notably by short-stroking hard drives.
If you look at the mechanism of a hard disk, you’ll see the read/write head can move across the body of the disk platter. Short stroking limits the drive so the arm on which the head is mounted moves as little as possible. You get the fastest performance when the data is on the outside edges, so all of the data is stored there and the inner capacity of the disk goes unused.
“In a 100 terabyte array, you got way more than 100TB and got charged for it but only used a fraction of it,” said Eric Herzog, senior vice president of product management and product marketing for EMC’s Unified Storage Division.
A drive spinning at 15,000 RPM, more than twice the 7,200 RPM of the drive in a typical computer, runs very hot and consumes a great deal of power. Combine that with the drive’s high cost, and it’s clear a new solution has been needed for some time.
The first step toward that solution was the advent of solid state storage. In terms of raw performance, Herzog puts a 15k RPM drive at about 200 I/O operations per second (IOPS) at maximum, while an SSD delivers 5,000 IOPS and up. Spinning media is no match for memory.
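To put those figures in perspective, the rough sketch below estimates how many drives of each type a workload would need. It assumes IOPS scale linearly with drive count and ignores RAID overhead and controller limits. The 20,000 IOPS target and the function name `drives_needed` are illustrative choices, not figures from Herzog.

```python
import math

HDD_15K_IOPS = 200   # per-drive maximum quoted for a 15k RPM drive
SSD_IOPS = 5_000     # lower bound quoted for an SSD


def drives_needed(target_iops: int, iops_per_drive: int) -> int:
    """Smallest number of drives whose combined IOPS meets the target.

    Assumes perfectly linear scaling, which real arrays only approximate.
    """
    return math.ceil(target_iops / iops_per_drive)


target = 20_000  # hypothetical workload, chosen only for illustration
hdd_count = drives_needed(target, HDD_15K_IOPS)
ssd_count = drives_needed(target, SSD_IOPS)
print(hdd_count, ssd_count)  # prints "100 4"
```

Under these simplified assumptions, four SSDs cover the same I/O demand as 100 of the 15k RPM drives, which is the gap the vendors quoted here are describing.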
“Solid state changed the picture. Solid-state storage was significantly better than Fibre, so if you could work out the bits you were accessing you got better performance, and the bits you weren’t accessing you put on hard drives,” said Coulter.
But determining what went where was a difficult process. Controller makers and storage vendors eventually came to realize that “hot” data, i.e., data that is frequently accessed, could be written to the SSD, while “cold” data, i.e., data that is rarely accessed, could be written to hard drives.
Much of that was done through custom controllers, said Coulter. For example, Hitachi Data Systems, one of EMC’s closest competitors in automated tiered storage, has controllers with embedded Intel Xeon processors to process the data while custom ASICs move it around. That way the system can move data without impacting system performance. The controller also handles wear leveling, writes data into newly formatted locations, and performs garbage collection.
SSDs brought a big jump forward in performance but also a huge jump in price. Because they use high-end interfaces like PCI Express or Fibre Channel, these drives can cost $10,000 or more.
As a result, customers tend to rely on as few SSDs as possible to get the performance they need, said Fine, sidestepping any capacity issues by storing only “hot” data on them.
Some customers need as little as one percent of their total capacity in flash to get optimal improvements, while others need as much as 10 to 15 percent of their total storage in flash, Herzog added. One EMC customer, Scripps Networks, saw an 87 percent decrease in power and cooling and an 80 percent increase in usable IOPS thanks to hybrid disk and flash.
No Blockheads Here
Another breakthrough that automated tiered storage needed was managing data as blocks instead of individual files. Keeping a little metadata with each block of data allows for daily analysis of which blocks are being used the most and which should be moved where.
Before this technology was on the market, a customer might keep an entire database on tier 1 disk, meaning 15k RPM drives configured as RAID 10 for maximum performance. Now, they can keep only the hot blocks on SSD and move up to 90 percent of their data off the fastest storage to slower, less expensive SAS-connected drives.
Compellent uses a combination of hardware controllers and software to monitor, track and move metadata, while EMC uses custom software called FAST, for Fully Automated Storage Tiering. HDS has its own method, called paging, which divides its storage system into pages much as a mainframe memory system does.
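The idea of per-block metadata driving a daily placement decision can be sketched in a few lines. This is a toy model under stated assumptions, not how Compellent, EMC or HDS actually implement it: the class name `BlockTierer`, the `ssd_slots` parameter and the use of a simple access count as the only metadata are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class BlockTierer:
    """Toy model: count accesses per block, then pick the hottest for SSD."""

    ssd_slots: int  # hypothetical: number of blocks that fit on the SSD tier
    hits: Counter = field(default_factory=Counter)

    def record_access(self, block_id: int) -> None:
        """Update the per-block metadata (here, just an access counter)."""
        self.hits[block_id] += 1

    def daily_rebalance(self) -> set:
        """Return the block ids that belong on SSD, then reset the counters."""
        hot = {block for block, _ in self.hits.most_common(self.ssd_slots)}
        self.hits.clear()
        return hot


tierer = BlockTierer(ssd_slots=2)
for block_id in [7, 7, 7, 3, 3, 9]:
    tierer.record_access(block_id)
print(sorted(tierer.daily_rebalance()))  # prints "[3, 7]"
```

Blocks 7 and 3 were touched most often, so they would be placed on SSD, while block 9 would stay on slower disk. A real system would likely weigh recency, I/O size and the cost of moving data, none of which this sketch models.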
Hu Yoshida, vice president and CTO of HDS, suggested that moving pages instead of whole volumes is what made dynamic tiering possible. “Prior to dynamic tiering you had to move the whole volume. Not all the data on the volume needed that type of performance. So by dividing this into pages, the fastest volume gets only hot pages,” he said.
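Yoshida’s point about pages versus whole volumes can be illustrated with a small comparison. The sketch below is an assumption-laden example rather than a description of the HDS implementation: the page size, the heat values, the threshold and the function name `fast_tier_footprint` are all invented for illustration.

```python
def fast_tier_footprint(page_heat, page_size_mb, hot_threshold):
    """Compare fast-tier usage for whole-volume vs. page-level placement.

    page_heat holds one access count per page of a single volume.
    Returns (MB if the whole volume is promoted,
             MB if only hot pages are promoted,
             indices of the hot pages).
    """
    whole_volume_mb = len(page_heat) * page_size_mb
    hot_pages = [i for i, heat in enumerate(page_heat) if heat >= hot_threshold]
    paged_mb = len(hot_pages) * page_size_mb
    return whole_volume_mb, paged_mb, hot_pages


# Hypothetical volume of eight pages; only two see heavy traffic.
page_heat = [120, 2, 0, 95, 1, 0, 3, 0]
whole, paged, hot = fast_tier_footprint(page_heat, page_size_mb=32, hot_threshold=50)
print(whole, paged, hot)  # prints "256 64 [0, 3]"
```

In this made-up case, promoting the whole volume would consume 256 MB of the fastest tier, while promoting only the two hot pages consumes 64 MB and leaves the rest on cheaper disk.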
With hybrid systems instead of all-hard-disk striping, Yoshida estimates companies can reduce the number of drives they need by almost two-thirds with just a little SSD in tier 1. He estimates power and space savings of about 40 percent.
The ROI is notable because performance gains come by trading a lot of 15k RPM drives for a few SSDs. EMC said that when you combine SSD with high-capacity disks, the overall system load is reduced and the space required shrinks dramatically, because you use fewer hard drives. Herzog cited one customer who needed 70 hard drives instead of 200; the rest was done in SSDs.
Yoshida said he would like to see the apps provide a little help in the future: “What we do in storage is based upon past activity and anticipate heightened activity and adjust for it.”
He added: “What we want to do is look forward. To do that I need help from the apps. Apps can tell me what they are intending to do and I can pre-stage things to get the highest performance.”