RAID: wide striping, storage virtualization and erasure coding
Storage manufacturers have been quick to modify and adapt RAID levels to meet the needs of their customers. Technologies like wide striping, storage virtualization and erasure coding are changing the basic assumptions of RAID. Much of this work was unheralded and invisible to customers, however, and the old nomenclature persists.
EMC Corp., Hewlett-Packard (HP) Co. and others abandoned the whole-disk concept in the mid-1990s, building RAID 1 and RAID 5 sets from slices of capacity spread across multiple drives. Companies like 3PAR and Compellent Technologies Inc. took this further in the 2000s with "wide striping" technology, which places just a little data on each hard disk drive. Spreading data across many more drives improves average performance and reduces the time required to rebuild a RAID set in the event of a failure. Although many arrays still rely on rigidly defined disk groups, most high-end devices spread data more widely.
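The rebuild advantage of wide striping comes down to how many drives share the work. The Python sketch that follows is a simplified model, not any vendor's actual layout: it assumes a hypothetical pool in which each five-chunk parity stripe lands on a pseudo-randomly chosen set of drives, fails one drive, and counts how many chunks each survivor must read to rebuild it. It ignores writes to spare space, controller limits and host I/O.

```python
import random
from collections import Counter

def place_groups(num_drives, group_width, num_groups, seed=0):
    """Hypothetical wide-striping layout: each stripe of `group_width` chunks
    lands on a pseudo-randomly chosen set of distinct drives in the pool."""
    rng = random.Random(seed)
    return [rng.sample(range(num_drives), group_width) for _ in range(num_groups)]

def rebuild_reads(groups, failed_drive):
    """Count the chunk reads each surviving drive must serve to rebuild
    `failed_drive`: every stripe that had a chunk on it reads its other members."""
    reads = Counter()
    for members in groups:
        if failed_drive in members:
            for drive in members:
                if drive != failed_drive:
                    reads[drive] += 1
    return reads

# Classic layout: a 5-drive RAID set, every stripe uses all 5 drives
# (1,000 chunks per drive).
classic = [list(range(5)) for _ in range(1000)]
# Wide layout: a 40-drive pool, stripes still 5 wide, spread over the whole pool
# (8,000 stripes * 5 chunks / 40 drives = about 1,000 chunks per drive).
wide = place_groups(num_drives=40, group_width=5, num_groups=8000)

classic_reads = rebuild_reads(classic, failed_drive=0)
wide_reads = rebuild_reads(wide, failed_drive=0)
print("classic: busiest survivor reads", max(classic_reads.values()), "chunks")
print("wide:    busiest survivor reads", max(wide_reads.values()), "chunks")
```

In the five-drive set, each of the four survivors must read every chunk it holds. In the 40-drive pool, roughly the same total read work is divided among 39 survivors, so the busiest one reads only a small fraction as much, which is where the shorter rebuild comes from.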
Like its server-based cousin, storage virtualization breaks the rigid link between physical systems and their logical representation. Virtualized arrays present drives and file systems to servers without tying them to a specific set of disks. This allows the virtualization layer to move data freely between RAID sets, hard disk drives, flash storage and even across multiple arrays. Conventional RAID might still be used at the lowest level, but storage virtualization overcomes its inflexible layout and performance limits.
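At its core, a virtualized volume is a lookup table between the addresses a server uses and the place the data physically lives. This minimal Python sketch assumes hypothetical names (`VirtualVolume`, and pool names such as `raid5_hdd` and `flash_tier`) and a 1 MiB extent size; real arrays use their own extent sizes and metadata structures. Moving an extent changes only its table entry, so the server's logical address never changes.

```python
from dataclasses import dataclass, field

EXTENT_SIZE = 1 << 20  # hypothetical 1 MiB extents

@dataclass
class VirtualVolume:
    """Hypothetical virtual volume: the server addresses logical bytes, and a
    mapping table decides which back-end pool holds each extent."""
    name: str
    # logical extent number -> (pool name, physical extent number)
    extent_map: dict = field(default_factory=dict)

    def locate(self, logical_offset):
        """Translate a server-visible byte offset into (pool, physical offset)."""
        extent, within = divmod(logical_offset, EXTENT_SIZE)
        pool, phys_extent = self.extent_map[extent]
        return pool, phys_extent * EXTENT_SIZE + within

    def migrate(self, extent, new_pool, new_phys_extent):
        # A real array would copy the data first, then atomically swap the
        # pointer; this sketch only updates the mapping.
        self.extent_map[extent] = (new_pool, new_phys_extent)

vol = VirtualVolume("db01")
vol.extent_map[0] = ("raid5_hdd", 17)
vol.extent_map[1] = ("raid5_hdd", 18)

before = vol.locate(EXTENT_SIZE + 4096)  # same logical address both times
vol.migrate(1, "flash_tier", 3)          # e.g. promote a hot extent to flash
after = vol.locate(EXTENT_SIZE + 4096)
print(before, after)
```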
As discussed in my August Tech Tip, erasure coding applies data protection math that goes well beyond the simple parity checks used by classic RAID systems. Although often referred to as "dual parity," most implementations of RAID 6 actually employ advanced Reed-Solomon coding, bringing many advantages over basic parity calculation. These systems can not only recover lost data but also detect data corruption. Some systems disperse data widely across drives, storage nodes and geographies for even greater reliability. Although the underlying math was well established by the 1980s, computing power at the time hadn't advanced far enough to use it in storage arrays.
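To make the difference between simple parity and Reed-Solomon coding concrete, here is a minimal Python sketch of the P+Q scheme commonly used for RAID 6 (the Linux md driver uses this construction). P is ordinary XOR parity, as in RAID 5, while Q weights each data block by a power of 2 in the finite field GF(2^8). The function names are illustrative, and production code uses lookup tables or SIMD instructions rather than bit-by-bit multiplication. P alone can rebuild one lost block; P and Q together can rebuild any two lost data blocks.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x^2 + 1 (0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:
            a ^= 0x11D       # reduce back into 8 bits
        b >>= 1
    return result

def gf_pow(a, n):
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a):
    """Inverse of a nonzero byte: a^254, since the nonzero elements form a group of order 255."""
    return gf_pow(a, 254)

def syndromes(blocks):
    """P is plain XOR parity; Q adds g^i * D_i over all data blocks, with g = 2."""
    size = len(blocks[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(blocks):
        coef = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coef, byte)
    return bytes(p), bytes(q)

def recover_one_with_p(blocks, lost, p):
    """Single failure: XOR P with every surviving block (classic parity rebuild)."""
    out = bytearray(p)
    for i, block in enumerate(blocks):
        if i != lost:
            for j, byte in enumerate(block):
                out[j] ^= byte
    return bytes(out)

def recover_two(blocks, x, y, p, q):
    """Double failure of data blocks x and y, which plain parity cannot handle."""
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom_inv = gf_inv(gx ^ gy)  # nonzero because g^x != g^y for distinct x, y < 255
    coefs = [gf_pow(2, i) for i in range(len(blocks))]
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for j in range(len(p)):
        a, b = p[j], q[j]
        for i, block in enumerate(blocks):
            if i not in (x, y):
                a ^= block[j]
                b ^= gf_mul(coefs[i], block[j])
        # Now a = Dx ^ Dy and b = g^x*Dx ^ g^y*Dy, so b ^ g^y*a = (g^x ^ g^y)*Dx.
        dx[j] = gf_mul(b ^ gf_mul(gy, a), denom_inv)
        dy[j] = a ^ dx[j]
    return bytes(dx), bytes(dy)

data = [b"RAID", b"six!", b"P+Q ", b"demo"]
p, q = syndromes(data)
single = recover_one_with_p([data[0], None, data[2], data[3]], 1, p)   # one drive lost
recovered = recover_two([data[0], None, data[2], None], 1, 3, p, q)    # two drives lost
print(single, recovered)
```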
Living in the post-RAID world
Today's enterprise storage systems are just as likely to employ these modern data protection schemes as they are to use classic RAID levels, and most are at least somewhat virtualized. Data storage buyers are likely to encounter any number of new technologies, in combinations that make them difficult to assess. It's therefore important to discard outdated "rules of thumb" regarding RAID and focus instead on the real-world performance and manageability of systems. Once, the only way to achieve high performance was to combine RAID 1 and data striping (also called "RAID 0") into a "RAID 1+0" or "RAID 10" set. But modern systems with DRAM and flash caches, wide striping and automated tiering can perform even better without the 50% capacity hit of RAID 1. Similarly, database administrators have long been loath to use RAID 5 because of the limited performance of classic implementations. But today's systems can overcome these issues, delivering more performance than the basic mirrored disks DBAs often request.
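The capacity and performance trade-offs behind these old rules of thumb are simple arithmetic. This Python sketch uses the classic textbook write-penalty figures for small random writes, where RAID 5 and RAID 6 must read old data and parity before writing new data and parity; the eight-drive set and the 7,000-read/3,000-write workload are hypothetical. It deliberately ignores caching, which is exactly what modern arrays use to escape these numbers.

```python
# Classic rule-of-thumb back-end I/Os per small random write (read-modify-write):
#   RAID 10: write both mirrors                             -> 2
#   RAID 5:  read data + parity, write data + parity        -> 4
#   RAID 6:  read data + P + Q, write data + P + Q          -> 6
WRITE_PENALTY = {"RAID 10": 2, "RAID 5": 4, "RAID 6": 6}

def usable_fraction(level, drives):
    """Fraction of raw capacity left for data in one RAID set of `drives` disks."""
    if level == "RAID 10":
        return 0.5                      # every block is stored twice
    if level == "RAID 5":
        return (drives - 1) / drives    # one drive's worth of parity
    if level == "RAID 6":
        return (drives - 2) / drives    # two drives' worth (P and Q)
    raise ValueError(f"unknown level: {level}")

def backend_iops(level, read_iops, write_iops):
    """Disk operations needed to serve a front-end I/O mix, ignoring any cache."""
    return read_iops + WRITE_PENALTY[level] * write_iops

for level in WRITE_PENALTY:
    print(f"{level}: {usable_fraction(level, 8):.1%} usable with 8 drives, "
          f"{backend_iops(level, 7000, 3000):,} back-end IOPS "
          f"for 7,000 reads + 3,000 writes per second")
```

A write-back cache that gathers writes into full stripes can skip the read half of the RAID 5 and RAID 6 penalty, which is why measured performance on a real array matters more than these figures.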
Advances in technology have made RAID more common, but not all RAID systems are equal. The power of an array's CPU and the capacity of its cache have much more to do with performance than the arrangement of the disk drives. Higher-capacity disk drives can also make a small array with few spindles appear to be a decent alternative to a larger system, though performance will surely suffer. Put simply, one can't assume that a given system will perform well based on its RAID level or capacity alone.
The best strategy for storage buyers is to examine the real-world performance of a storage device rather than making assumptions based on RAID levels. They should request references from vendors and examine how a given system supports their applications. RAID is not dead, but the critical issues in enterprise storage have moved beyond it.