Information and Strategies for implementing RAID

Introduction

RAID stands for Redundant Array of Independent Disks. The goal of RAID is to increase the redundancy of data and the fault tolerance of the array, meaning its ability to preserve data integrity in the event of a disk failure. RAID is also frequently used to increase performance. Listed below are five common levels of RAID, with important information on each.

When RAID can be used

RAID can be used in a system whenever a RAID controller is present and a compatible operating system is used. If the motherboard has no onboard RAID controller, a PCI or PCI-Express RAID controller card can be purchased and installed to add RAID support to the system. Most operating systems support multi-disk RAID arrays in various array types, though some require additional drivers.

When RAID should be used

RAID is generally needed in three types of situations:
  1. The hard disk performance need is greater than that offered by a single disk, and multiple disks can be used to help boost read/write speed.
  2. The hard disk redundancy need is greater than that offered by a single disk, and multiple disks can be used to help preserve data integrity.
  3. A combination of numbers 1 and 2.
Home and workstation users generally will not require RAID, though there are exceptions. Home users who do run RAID mostly choose RAID 0, to increase performance in computers where disk demand is high but data integrity is not as critical as it is in servers. RAID is most commonly used in servers, where data needs to be preserved and the server needs to remain operational even during a disk crash. In these situations the disks are usually hot swappable, meaning that a failed disk can be replaced while the server is still running. The data on that disk is then rebuilt while the server remains operational.

RAID 0: Striping
  • Minimum Number of Disks: 2
  • Fault Tolerance: None
  • Performance Increase: Medium
  • Redundancy/Storage Efficiency: N/A
Pros:
  • Greatly improves read/write performance
  • Inexpensive and easy to implement
  • Has high storage efficiency
Cons:
  • Has no data redundancy at all
  • Failure probability increases with each additional disk
Disk Space:

Disk space of a RAID 0 array is the sum of the capacities of all the disks in the array. Assuming each disk has the same formatted capacity, the total capacity of the array will be equal to the capacity of any single disk multiplied by the number of disks in the array. The total capacity of a RAID 0 array can be represented by the following equation.
  • Total Capacity = [Capacity of single disk] x [Number of disks in array]
Redundancy:
The major problem with RAID 0 is that there is no data redundancy at all. The controller spreads data blocks across multiple disks without keeping a copy or parity, so the data is not protected. Every disk added to the array makes it less reliable. A RAID 0 array with two disks, for example, is roughly half as reliable as a single disk; an array with three disks is roughly a third as reliable. This is because if any one disk fails, all the data in the entire array is lost. Although redundancy/storage efficiency technically does not apply to RAID 0, it does have the highest storage efficiency because none of its capacity is spent on redundancy.
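
The effect of adding disks can be illustrated with a short Python sketch. It assumes that disks fail independently and uses a made-up per-disk survival probability of 0.97 over some fixed period; both the figure and the function name are illustrative assumptions, not measured values.

```python
def raid0_survival_probability(disk_survival: float, disks: int) -> float:
    """Chance that a RAID 0 array keeps its data over some period.

    A RAID 0 array survives only if every disk survives, so under the
    assumption of independent failures the probabilities multiply.
    """
    if not 0.0 <= disk_survival <= 1.0:
        raise ValueError("disk_survival must be between 0 and 1")
    if disks < 1:
        raise ValueError("disks must be at least 1")
    return disk_survival ** disks


if __name__ == "__main__":
    # 0.97 is a hypothetical per-disk figure chosen only for illustration.
    for n in (1, 2, 3, 4):
        chance = raid0_survival_probability(0.97, n)
        print(f"{n} disk(s): {chance:.4f} chance of keeping all data")
```

With these assumed numbers the chance of losing data roughly doubles with two disks and roughly triples with three, which is what the fractions above express.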

Performance:

The only purpose of RAID 0 is to increase performance. With more disks working together, read and write performance is greatly increased. A RAID 0 array with two disks theoretically increases read/write performance in the same way that a dual-channel pair of RAM is faster than RAM in single channel. As more disks are added to the array, the controller distributes data blocks across all available disks, increasing performance. Unfortunately, while RAID 0 arrays often look good in benchmarks, they often bring very little real-world benefit. Several factors can help improve RAID 0 performance: putting the disks of an ATA RAID 0 array on separate channels, using a dedicated RAID controller with its own RAM, and using identical disks in the array.


Data blocks in a 2-disk RAID 0 array.
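
The striped layout can be sketched in Python as a simple round-robin mapping. This is a minimal illustration that assumes one block per stripe unit; real controllers use a configurable stripe size, and the function name is hypothetical.

```python
from typing import Tuple


def locate_block(logical_block: int, disks: int) -> Tuple[int, int]:
    """Return (disk index, row on that disk) for a logical block in RAID 0."""
    if disks < 2:
        raise ValueError("RAID 0 needs at least two disks")
    if logical_block < 0:
        raise ValueError("logical_block must not be negative")
    # Consecutive blocks go to consecutive disks, wrapping around.
    return logical_block % disks, logical_block // disks


if __name__ == "__main__":
    for block in range(6):
        disk, row = locate_block(block, disks=2)
        print(f"block {block} -> disk {disk}, row {row}")
```

Because neighbouring blocks land on different disks, a large sequential read or write can keep every disk busy at once.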


RAID 1: Mirroring
  • Minimum Number of Disks: 2
  • Fault Tolerance: Excellent
  • Performance Increase: Low
  • Redundancy/Storage Efficiency: Low
Pros:
  • Greatly improves fault tolerance
  • Greatly improves read performance
  • Inexpensive and easy to implement
  • Easier to rebuild data after failure
  • Can sometimes sustain multiple failures
Cons:
  • Has no write performance increase
  • Has low redundancy/storage efficiency
Disk Space:

Disk space of a RAID 1 array always remains the same no matter how many disks are added to the array. Assuming each disk has the same formatted capacity, the total capacity of the array will be equal to the capacity of any single disk. The total capacity of a RAID 1 array can be represented by the following equation.
  • Total Capacity = [Capacity of single disk]
Redundancy:

The goal of a RAID 1 array is to make data redundant in case of a disk failure. Every data block is written to every drive, so each drive holds an exact copy of the data. In the event of one drive failing, the controller reads data blocks from the other drive(s). It can be described as an automated backup system. With each drive added to the array, the data becomes more redundant, and more drives have to fail before data is lost.
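
The mirroring behaviour can be modelled with a small Python sketch. The class below is a toy in-memory model, not how any real controller is implemented; the class and method names are hypothetical.

```python
class MirrorArray:
    """Toy model of RAID 1: every write goes to all working disks."""

    def __init__(self, disks: int = 2) -> None:
        if disks < 2:
            raise ValueError("RAID 1 needs at least two disks")
        self.disks = [dict() for _ in range(disks)]
        self.failed = set()

    def write(self, block: int, data: bytes) -> None:
        # The same block is stored on every disk that is still working.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                disk[block] = data

    def fail(self, index: int) -> None:
        # Simulate a disk crash: its contents are gone.
        self.failed.add(index)
        self.disks[index].clear()

    def read(self, block: int) -> bytes:
        # Any surviving disk can serve the read.
        for index, disk in enumerate(self.disks):
            if index not in self.failed:
                return disk[block]
        raise IOError("all mirrors have failed")


if __name__ == "__main__":
    array = MirrorArray(disks=2)
    array.write(0, b"payroll")
    array.fail(0)
    print(array.read(0))  # still readable from the second disk
```

Data is lost in this model only when every disk has failed, which is why each added mirror raises the number of failures the array can absorb.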

Performance:

An often overlooked benefit of a RAID 1 array is the improvement in read performance. While the purpose of a RAID 1 array is to protect data integrity, it also yields a large read performance increase, about equal to that of a RAID 0 array. Because the data on each disk is identical, the controller can read different blocks of data from different disks at the same time, increasing read performance in the same way as a RAID 0 array. The difference is that a RAID 1 array gains no write performance, since the controller must write the same data blocks to each disk. Performance can be increased using the same methods listed for RAID 0 arrays.


Data blocks in a 2-disk RAID 1 array.


RAID 5: Striping With Distributed Parity
  • Minimum Number of Disks: 3
  • Fault Tolerance: High
  • Performance Increase: Medium
  • Redundancy/Storage Efficiency: High
Pros:
  • Greatly improves fault tolerance
  • Greatly improves read performance
  • Has high redundancy/storage efficiency
Cons:
  • Expensive and difficult to implement
  • More difficult to rebuild data after failure
Disk Space:

Disk space of a RAID 5 array is the sum of the capacities of all the disks in the array minus the capacity of one disk. Assuming each disk has the same formatted capacity (which is highly recommended for RAID 5), the total capacity of the array will be equal to the capacity of any single disk multiplied by one less than the number of disks in the array. The total capacity of a RAID 5 array can be represented by the following equation.
  • Total Capacity = [Capacity of a single disk] x ([Number of disks in the array] - 1)
Redundancy:

Although RAID 5 uses striping to improve read performance, it can survive the failure of one disk, the same as a two-disk RAID 1 array. Unlike RAID 0+1, however, RAID 5 uses parity, as RAID 4 does, to recover the data on a failed disk. RAID 5 has a higher redundancy/storage efficiency than RAID 1 because only one disk's worth of capacity in total is needed for redundancy, rather than one extra disk for every disk of data. The difference between RAID 4 and RAID 5 is that in RAID 5 the controller distributes parity blocks over all the disks, whereas RAID 4 keeps them on a dedicated parity disk, which creates a bottleneck. In addition, downtime is limited because the controller can rebuild data on the fly once the failed disk has been replaced.
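
Parity recovery can be illustrated with XOR, the operation commonly used for RAID 5 parity. The Python sketch below uses made-up four-byte blocks and a hypothetical helper name; it shows only the arithmetic, not a controller's actual rebuild procedure.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    if not blocks:
        raise ValueError("at least one block is required")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


if __name__ == "__main__":
    # Three data blocks of one stripe (illustrative contents).
    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_blocks(data)

    # Pretend the disk holding data[1] has crashed: XOR of everything
    # that survived, parity included, gives the missing block back.
    rebuilt = xor_blocks([data[0], data[2], parity])
    print(rebuilt == data[1])
```

The same calculation works whichever single block of the stripe is missing, which is why one disk's worth of parity is enough to cover the loss of any one disk.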

Performance:

When it comes to read performance, RAID 5 is a little behind RAID 1. RAID 1 lacks the parity blocks that RAID 5 has, so RAID 1 has faster read times. RAID 1 also has better write performance than RAID 5, because RAID 5 must compute and write a parity block with every write. Neither configuration yields a significant write performance increase like RAID 0 does: RAID 1 writes the same blocks to every disk, and RAID 5 spends part of every write on parity. RAID 5 is usually used as a compromise between redundancy/storage efficiency and performance.


Data and parity blocks in a 4-disk RAID 5 array.
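
The distribution of parity can be sketched in Python as a rotating layout. The rotation order below (parity starting on the last disk and moving left one disk per stripe) is just one possible scheme and is an assumption; controllers differ in the exact order they use.

```python
from typing import List


def raid5_layout(disks: int, stripes: int) -> List[List[str]]:
    """Build a table of block labels, one row per stripe, one column per disk."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    rows = []
    data_block = 0
    for stripe in range(stripes):
        # Assumed rotation: parity moves one disk to the left each stripe.
        parity_disk = (disks - 1 - stripe) % disks
        row = []
        for disk in range(disks):
            if disk == parity_disk:
                row.append(f"P{stripe}")
            else:
                row.append(f"D{data_block}")
                data_block += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in raid5_layout(disks=4, stripes=4):
        print("  ".join(f"{label:>3}" for label in row))
```

Over four stripes on four disks, each disk holds exactly one parity block, so no single disk has to absorb all the parity writes.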


RAID 10: Mirroring Plus Striping
  • Minimum Number of Disks: 4
  • Fault Tolerance: Excellent
  • Performance Increase: High
  • Redundancy/Storage Efficiency: Low
Pros:
  • Greatly improves fault tolerance
  • Greatly improves read/write performance
  • Easier to rebuild data after failure
  • Can sometimes sustain multiple failures
Cons:
  • Expensive and difficult to implement
  • Has low redundancy/storage efficiency
Disk Space:

Disk space of a RAID 10 array is the sum of the capacities of all the disks in the array divided by two. Assuming each disk has the same formatted capacity, which is highly recommended for RAID 10, the total capacity of the array will be equal to the capacity of any single disk multiplied by the number of disks in the array divided by two. The total capacity of a RAID 10 array can be represented by the following equation.
  • Total Capacity = [Capacity of single disk] x [Number of disks in array] ÷ 2
Redundancy:

RAID 10, sometimes known as RAID 1+0, is the reverse of RAID 0+1, which is explained next. RAID 10 consists of a stripe across two or more mirrored pairs. The goal of RAID 10 is to get all the redundancy of RAID 1 while improving performance further. Since all the data is mirrored, the configuration has a high fault tolerance, the same as RAID 1. It can sustain two drive failures as long as each failure is in a different mirrored pair. If one drive does fail, its mirrored pair runs without redundancy, so that part of the array is effectively RAID 0 until the drive is replaced.

Performance:

The read performance of RAID 10 is equal to that of a RAID 0 array with the same number of disks. If one drive is slightly slower than the others, however, it becomes a bottleneck and slows down the overall transfer rate. Since each drive is mirrored and then striped, a four-disk array has the same read performance as a four-disk RAID 0 array, and as a result RAID 10 is favored over RAID 0+1 when the controller supports it. Write performance, however, is still only double that of a single disk, because every block must be written to both disks of a mirrored pair. Disks must be added in pairs, and when they are, they are used for further striping rather than further mirroring. Therefore, adding more disks results in increased performance.

Data blocks in a 4-disk RAID 10 array.


RAID 0+1: Striping Plus Mirroring
  • Minimum Number of Disks: 4
  • Fault Tolerance: High
  • Performance Increase: Medium
  • Redundancy/Storage Efficiency: Low
Pros:
  • Greatly improves fault tolerance
  • Greatly improves read/write performance
  • Easier to rebuild data after failure
Cons:
  • Expensive and difficult to implement
  • Has low redundancy/storage efficiency
Disk Space:
Disk space of a RAID 0+1 array is the same as that of a RAID 10 array: it is the sum of the capacities of all the disks in the array divided by two. Assuming each disk has the same formatted capacity, which is highly recommended for RAID 0+1, the total capacity of the array will be equal to the capacity of any single disk multiplied by the number of disks in the array divided by two. The total capacity of a RAID 0+1 array can be represented by the following equation.
  • Total Capacity = [Capacity of single disk] x [Number of disks in array] ÷ 2
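
The capacity equations given for the five levels can be gathered into one small Python helper. This is a sketch that assumes every disk has the same formatted capacity; the function name, the level labels, and the 500 GB disk size are illustrative choices.

```python
def usable_capacity(level: str, disk_size: float, disks: int) -> float:
    """Usable capacity of an array of identical disks, in the units of disk_size."""
    minimum_disks = {"0": 2, "1": 2, "5": 3, "10": 4, "0+1": 4}
    if level not in minimum_disks:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < minimum_disks[level]:
        raise ValueError(f"RAID {level} needs at least {minimum_disks[level]} disks")
    if level in ("10", "0+1") and disks % 2 != 0:
        raise ValueError(f"RAID {level} needs an even number of disks")

    if level == "0":
        return disk_size * disks        # every disk stores data
    if level == "1":
        return disk_size                # every disk stores the same data
    if level == "5":
        return disk_size * (disks - 1)  # one disk's worth goes to parity
    return disk_size * disks / 2        # RAID 10 and 0+1: half is mirror copies


if __name__ == "__main__":
    # Four hypothetical 500 GB disks at each level.
    for level in ("0", "1", "5", "10", "0+1"):
        print(f"RAID {level}: {usable_capacity(level, 500, 4)} GB usable")
```
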
Redundancy:

A RAID 0+1 array is the reverse of a RAID 10 array. It is sometimes known as RAID 01, which is not to be confused with RAID 1. In a RAID 0+1 array, the redundancy is equal to that of RAID 1 and RAID 5 arrays. The two RAID 0 arrays hold identical data, so all data blocks are written in multiple places. The difference between a RAID 1 array and a RAID 0+1 array lies in what happens when one disk fails. In RAID 0+1, the side of the array with the failed disk stops being written to, and the entire array runs as a plain RAID 0 array until the failed disk is replaced. In a RAID 1 array, the array simply runs as a single-disk system until another disk is added.
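
The practical difference between RAID 10 and RAID 0+1 shows up when a second disk fails. The Python sketch below counts which two-disk failures a four-disk array survives. It assumes a simple RAID 0+1 implementation in which a whole striped side goes offline as soon as one of its disks fails, as described above; the disk numbering and function names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Set


def raid10_survives(failed: Set[int], disks: int = 4) -> bool:
    """Disks (0,1), (2,3), ... are mirrored pairs; the pairs are striped."""
    pairs = [(i, i + 1) for i in range(0, disks, 2)]
    # Data is lost only if both disks of some mirrored pair have failed.
    return all(not (a in failed and b in failed) for a, b in pairs)


def raid01_survives(failed: Set[int], disks: int = 4) -> bool:
    """First half of the disks is one striped side, second half is its mirror."""
    half = disks // 2
    side_a = set(range(half))
    side_b = set(range(half, disks))
    # At least one striped side must be completely intact.
    return not (failed & side_a) or not (failed & side_b)


def count_survivable(survives: Callable[[Set[int], int], bool],
                     disks: int = 4, failures: int = 2) -> int:
    """Count how many combinations of failed disks the array survives."""
    return sum(survives(set(combo), disks)
               for combo in combinations(range(disks), failures))


if __name__ == "__main__":
    print("RAID 10 :", count_survivable(raid10_survives), "of 6 two-disk failures survived")
    print("RAID 0+1:", count_survivable(raid01_survives), "of 6 two-disk failures survived")
```

Under these assumptions RAID 10 survives four of the six possible two-disk failures and RAID 0+1 survives two, which is consistent with the Excellent and High fault tolerance ratings given for the two levels.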

Performance:

RAID 0+1 is basically a pair of RAID 0 arrays that are mirrored. Since each side is striped like RAID 0, read and write performance increases per drive almost as much as in a RAID 0 array. If more than four drives are used, they must be added in pairs, and they are used to increase the performance of each RAID 0 array within the entire array. The only limitation is that the two RAID 0 arrays must perform identically for data to be read and written at optimum speeds. RAID 0+1 also doesn't get the read performance increase that a RAID 1 array gets.


Data blocks in a 4-disk RAID 0+1 array.


Tips to consider when using RAID arrays:
  • Consider using RAID 1 in place of RAID 0 when read performance is more important than write performance.
  • Avoid installing operating systems on RAID 5 arrays if possible.
  • Operating systems, programs, and important data should be stored on mirrored RAID arrays like RAID 1, 10, or 0+1.
  • Use inexpensive disks on RAID arrays that require many disks like RAID 10 and 0+1.
  • Minimize the number of drives in RAID 0 arrays to reduce the chance that all data in the array will be lost.
  • Use disks with identical specifications when setting up all RAID arrays for best performance.
  • Hardware RAID using a RAID controller will yield higher overall performance than software RAID.
  • When looking into a new computer that will host a small RAID array, look for a motherboard with an onboard controller.
  • For large RAID arrays, a dedicated controller with its own dedicated RAM will offer best performance.
  • Optimize enterprise RAID 5 arrays by installing seven drives: six in the array and one as a global hot spare.


Acknowledgments: Special thanks to crazijoe, Zazula, JohnthePilot, kodi, cjessee, and Kalim for helping me revise and shape this article.

The first three images are licensed under the GNU Free Documentation License.
The last two images are not licensed and have been released into the Public Domain.
