RAID
Overview
Definition of RAID
RAID, short for Redundant Array of Inexpensive Disks (later also expanded as
Redundant Array of Independent Disks), is an acronym first used in a 1988 paper
by Berkeley researchers Patterson, Gibson and Katz. RAID is a technology developed
to improve data protection and performance when storing large amounts of data, without
necessarily requiring improvements in disk drive technology.
RAID Levels
As awareness of RAID technology has grown, several RAID configurations
for storing data have been devised and standardized. These RAID "levels" are now commonly
discussed in the industry. The simplest RAID configurations either "stripe" data across two drives
to increase data transfer speed, but offer no data protection; or "mirror" redundant data onto a
second drive, without increasing performance. More advanced configurations involve three or more
drives, and simultaneously provide fault tolerance, increased performance, and the ability to
"recreate" information onto a spare drive should a drive failure occur. These more advanced RAID
configurations are preferred in server environments where maximum data availability and performance
are critical.
The applications, advantages, and disadvantages of the different RAID configurations, or levels,
are described below. The numbers assigned to each level of RAID do not indicate superiority or
effectiveness; they are only used to differentiate between them.
RAID 0 - Disk Striping
With RAID 0, a configuration known as "data striping", data is written in sequential sections
across two or more drives. RAID 0 is easy to implement, and it can dramatically improve performance:
several drives can be accessed at once, reducing the overall access time for larger files. However,
this configuration has no data redundancy and therefore no protection against data loss, so it
should not be used for business-critical applications.
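The striping described above can be sketched as a simple round-robin mapping from a logical block
number to a drive and an offset on that drive. This is a minimal illustration only: the function
name `locate_block` is hypothetical, and the one-block stripe unit is an assumption (real
controllers use configurable stripe sizes).

```python
def locate_block(logical_block: int, num_drives: int) -> tuple[int, int]:
    """Map a logical block to (drive index, block offset on that drive).

    Assumes a stripe unit of exactly one block (an illustrative simplification).
    """
    if num_drives < 2:
        raise ValueError("RAID 0 needs at least two drives")
    drive = logical_block % num_drives    # round-robin across the drives
    offset = logical_block // num_drives  # stripe row on the chosen drive
    return drive, offset


# Six consecutive logical blocks spread over three drives.
layout = [locate_block(block, 3) for block in range(6)]
print(layout)  # [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (2, 1)]
```

Consecutive blocks land on different drives, which is why several drives can work on one large
file at the same time.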
RAID 1 - Mirroring
Also known as "drive mirroring", RAID 1 simultaneously copies data to a second drive. The mirroring
method offers data protection and maintains good performance even when one of the mirrored drives
fails. RAID 1 is the simplest RAID configuration, requiring a minimum of two drives of equal
capacity, with any further drives added in pairs. The main disadvantage of RAID 1 is that it uses
100% drive overhead (the highest of all RAID levels), which can be considered an inefficient use of
drive capacity.
RAID 2 - Redundancy Using Hamming Code
A RAID 2 array stripes data across a group of drives at the bit level. A Hamming code Error Checking
and Correction (ECC) symbol for each data stripe is stored on dedicated drives. This code provides
detection and correction of data errors, allowing data to be recovered without completely duplicating
the data. Since most drives now embed ECC information within each sector as standard, however, RAID
2 doesn't offer any advantages over RAID 3.
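To illustrate how a Hamming code can both detect and correct an error without duplicating the data,
here is a sketch using the small Hamming(7,4) code: four data bits protected by three check bits.
The choice of Hamming(7,4) and the function names are assumptions made for illustration; the text
above does not say which code width a RAID 2 array uses.

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits as the 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(codeword: list[int]) -> list[int]:
    """Return the 4 data bits, repairing at most one flipped bit."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the bad bit, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                      # simulate a single-bit error
print(hamming74_correct(word))    # [1, 0, 1, 1]
```

In a RAID 2 style layout, each position of the codeword would correspond to a different drive, so
one failed or misreading drive could be corrected from the others.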
RAID 3 - Striping Plus Parity
RAID 3 stripes data across multiple drives at the byte level, with an additional drive dedicated to
parity for error correction/recovery. This configuration offers very high data transfer rates and
requires only a low ratio of parity (ECC) drives to data drives. However, RAID 3 requires a
complicated controller design and the configuration may be difficult to rebuild after a drive failure.
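Parity-based recovery rests on the XOR operation: the parity block is the XOR of the data blocks,
so any single missing block equals the XOR of everything that survives. The sketch below shows this
with three small byte strings standing in for drives; the helper name `xor_blocks` and the sample
contents are illustrative assumptions, not part of any controller's interface.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three "data drives" holding one 4-byte block each (sample contents).
data_drives = [b"RAID", b"demo", b"data"]
parity = xor_blocks(data_drives)

# Simulate losing drive 1, then rebuild it from the survivors plus parity.
survivors = [data_drives[0], data_drives[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'demo'
```

The same identity applies to the parity used by RAID 4 and RAID 5; only the placement of the parity
blocks differs.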
RAID 4 - Independent Striping Plus Parity
RAID 4 is identical to RAID 3 except that larger, block-sized stripes are used, so that records can
be read from any individual drive in the array apart from the parity drive, allowing read operations
to be overlapped. Since RAID 4 offers no significant advantages over RAID 5, the RAID 4 configuration
is now rarely implemented.
RAID 5 - Independent Striping Plus Distributed Parity
With RAID 5, data is striped in blocks across the drives, and the parity information is also
distributed across all drives rather than kept on one dedicated parity drive. RAID 5 is the most
popular of the RAID levels because it delivers data protection and good performance with a small
overhead for parity. RAID 5 offers the most efficient use of drive capacity of all the redundant
RAID levels. This configuration requires at least three drives of equal size, which can be added
one at a time.
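The idea of distributing parity can be sketched by rotating the parity position from one stripe to
the next. Several rotation schemes exist; the one below, which starts parity on the last drive and
moves it one drive to the left per stripe, is an assumption chosen for illustration, as is the
function name `raid5_stripe_layout`.

```python
def raid5_stripe_layout(stripe: int, num_drives: int) -> list[str]:
    """Describe what each drive holds for one stripe: 'P' or a data block label.

    The rotation used here is one illustrative scheme among several.
    """
    if num_drives < 3:
        raise ValueError("RAID 5 needs at least three drives")
    parity_drive = (num_drives - 1 - stripe) % num_drives
    data_index = stripe * (num_drives - 1)  # first data block in this stripe
    layout = []
    for drive in range(num_drives):
        if drive == parity_drive:
            layout.append("P")
        else:
            layout.append(f"D{data_index}")
            data_index += 1
    return layout


for stripe in range(3):
    print(raid5_stripe_layout(stripe, 3))
# ['D0', 'D1', 'P']
# ['D2', 'P', 'D3']
# ['P', 'D4', 'D5']
```

Because no single drive holds all of the parity, parity updates are spread over the whole array
instead of queuing on one drive.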
RAID 6 - RAID 5 With Double Parity (or "P+Q Redundancy")
(Not recognized by the RAID Advisory Board (RAB).)
RAID 6 is an extension of RAID 5 that uses a second independent distributed parity scheme.
Data is striped on a block level across a set of drives, and then a second set of parity is
calculated and written across all of the drives. This configuration provides extremely high
fault tolerance and can sustain two simultaneous drive failures, but it requires "n+2"
drives (two drives' worth of capacity for parity) and a very complicated controller design.
RAID 10 - Combination of RAID 1 and RAID 0
(Not recognized by original Berkeley papers or by the RAB.)
RAID 10 combines RAID 0 and RAID 1 by striping data across multiple drives without
parity and mirroring the striped data onto a second set of drives. This process delivers fast
data access (like RAID 0) and single drive fault tolerance (like RAID 1), but cuts the usable drive
space in half. RAID 10 requires a minimum of four equally sized drives, is the most expensive RAID
solution, and offers limited scalability.
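As a rough sketch of the combination described above, each logical block can be mapped to two
targets: a drive in the striped set and the matching drive in the mirror set. The drive numbering
(first half striped, second half mirror), the one-block stripe unit, and the name `raid10_targets`
are all assumptions made for illustration.

```python
def raid10_targets(logical_block: int, num_drives: int) -> list[tuple[int, int]]:
    """Return the two (drive index, block offset) locations holding a block.

    Assumes drives 0..n/2-1 form the striped set and n/2..n-1 mirror them.
    """
    if num_drives < 4 or num_drives % 2:
        raise ValueError("RAID 10 needs an even number of drives, at least four")
    half = num_drives // 2
    primary = logical_block % half   # striping within the first set
    offset = logical_block // half   # stripe row on that drive
    return [(primary, offset), (primary + half, offset)]


print(raid10_targets(5, 4))  # [(1, 2), (3, 2)]
```

Every block occupies two locations, which is where the halving of usable drive space comes from.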
RAID 53 - Combination of RAID 0 and RAID 3
(Not recognized by original Berkeley papers or by the RAB.)
The RAID 53 configuration should really be called "RAID 03". This configuration is a striped array
whose segments are essentially RAID 3 arrays. It has the same fault tolerance and high data transfer
rates as RAID 3, with the high I/O rates associated with RAID 0 (striping), plus some added
performance. This configuration is very expensive and requires at least five drives to implement.
RAID Applications, Tradeoffs and Limitations
Typically, RAID is used in systems where data accessibility is critical and fault tolerance is
required, such as in large file servers. However, RAID is now also increasingly used in
desktop systems for CAD, multimedia editing and playback, where higher transfer rates are needed.
In general, for a given price point, the performance improvement of a particular type of RAID array
"trades off" with the amount of redundancy and data security of the array. Similarly,
capacity of the array "trades off" with the price and fault tolerance. Inexpensive RAID
solutions are limited in their ability to protect your data or improve performance, whereas high-end
RAID implementations providing very high performance and very high data reliability are quite
expensive.
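The capacity side of this tradeoff can be made concrete by computing the fraction of raw drive
space that remains usable at each level, assuming drives of equal size. The overhead figures follow
the level descriptions above; the minimum drive counts for RAID 3, 4 and 6 and the function name
`usable_fraction` are assumptions of this sketch.

```python
# Minimum drive counts; the values for RAID 3, 4 and 6 are assumptions.
MIN_DRIVES = {"0": 2, "1": 2, "3": 3, "4": 3, "5": 3, "6": 4, "10": 4}


def usable_fraction(level: str, num_drives: int) -> float:
    """Fraction of raw capacity available for data, for equal-size drives."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == "0":
        return 1.0                            # no redundancy at all
    if level in ("1", "10"):
        if num_drives % 2:
            raise ValueError("mirrored levels need drives in pairs")
        return 0.5                            # every block is stored twice
    if level in ("3", "4", "5"):
        return (num_drives - 1) / num_drives  # one drive's worth of parity
    return (num_drives - 2) / num_drives      # RAID 6: two drives' worth


for level in ("0", "1", "5", "6", "10"):
    print(level, usable_fraction(level, 8))
```

With eight drives, for example, RAID 5 keeps seven eighths of the raw space and RAID 6 keeps three
quarters, while the mirrored levels keep half.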
Although RAID can greatly improve the reliability and performance of a storage system, it is
dangerous to assume that a RAID system with redundancy provides absolute data protection. Some
sources of failure still apply to RAID systems, such as viruses, environmental disturbances,
and cases where more than one drive fails at the same time, so regular
system maintenance and backup remain critical practices.