A Case for Redundant Arrays of Inexpensive Disks (RAID)

D. A. Patterson, G. Gibson, and R. H. Katz @ UC-Berkeley

Proceedings of the ACM SIGMOD International Conference on Management of Data, 1988, pages 109-116

Motivation and Problem

Increasing performance gap between processor/memory and disk

Buffer caches or a cache in disk can't help everything since some applications, like transaction processing and seismic processing, have access patterns not well suited to caches

Amdahl's law: Effective speed-up = 1 / [ f / k + ( 1 - f ) ]

	k = speed up
	f = fraction of work done by the components whose speeds are boosted up

A solution: stripping

Array of inexpensive disks connected to SCSI bus

Each disk has its own controller

Organization

Interleaving for bulk data access, or

parallel read/write for small independent accesses

Reliability issue

simple stripping won't work well because reliability is compromised by the large number of components

MTBF (array) = MTBF (individual disk) / # of disks in the array

100 disks -> every 2 weeks

A Better Solution: RAID

Use extra disks for redundant information, e.g., checksum or parity, and repair faults of disks from those extra disks

Partition disks into groups and assign some extra check disks to each group

MTTR (Mean Time To Repair)

MTBF (group) = [MTBF (individual disk) / # of disks in the group] * [1/prob(another failure in the group before repair)]

prob(another failure) = MTTR / MTBF(remaining disks in the group)

MTBF(raid) = MTBF(group) / # of group = MTBF(individual disk)²/[(D+C*n) * (G+C-1) * MTTR]

D = total # of disks

G = # of data disks in a group

C = # of check disks in a group

n = # of groups

Some metrics

Reliability overhead = check-disks / data-disks

Usable storage capacity percentage = just reverse of reliability overhead

Performance: small independent reads Vs big reads

Effective performance = performance boost Vs check disk overhead

First Level RAID: Mirrored Disks

Organization

	Every data disk has an identical copy (a mirror)
	Write to both disks simultaneously
	Read from one of mirrors -- read can be interleaved or done independently by doubling controllers

Advantages

Simple & high reliability

Disadvantages

	Write bandwidth is halved, storage space is halved
	failure of a disk in a group leads to bottleneck (chained declustering can help with this)

Second Level RAID: Hamming Code for ECC

Organization

	Bit interleaving
	Use Hamming codes to detect and correct errors

Small reads/writes issue

	To check and/or update Hamming code, every disk in a group should be accessed
	If disk I/O unit is 512 bytes and group size = 10 with C = 4, then at least 14 * 512 bytes should be accessed per read or write operation

Advantages

	Lower cost (40% overhead usually) than level 1 (100%)
	Performance for large I/O is as good as or better than level 1

Disadvantages

Performance is bad for small reads/writes: 6 ~ 9 % of level 1

Third Level RAID: Single Check Disk Per Group

Level 2 uses Hamming code to determine which disk failed

	Most disk controllers are good enough to detect (hard) error or even to correct (soft) error from checksum per sector
	Parity per group is enough to correct error if we know which disk has failed!

Organization

	Bit interleaving
	Use parity to correct the bit of the faulty disk

Advantages

	Lower cost (4 ~ 10% overhead)
	Less disks per group means less synchronization cost

Disadvantages

Performance is still bad for small reads/writes

Fourth Level RAID: Independent Reads/Writes

Bit level interleaving

Pros: use all disks simultaneously; load balancing

Cons

	only 1 I/O at a time per group
	if disks are not synchronized, biggest rotation delay is taken account

Organization

	Sector (or disk I/O unit) interleaving
	Use parity to correct the bit of the faulty disk

Read - just read the data disk having the sector

Write

Read the data disk having the sector

Read the parity disk

Write to the data disk

Write to the parity disk

new parity = (old-data XOR new-data) XOR old-parity

Advantages

	Same cost as level 2 & 3
	Bulk I/O is as fast as level 2 & 3
	Small I/O is a lot faster than level 2 & 3

Disadvantages

Small modify is still slow. But this type of access is rare!

Parity disk should be accessed every time and thus is bottleneck

	Only one write is allowed at a time per group
	LFS can help with this since it writes out to a log instead of updating in place

Fifth Level RAID: No Single Check Disk

Organization

Removes the distinction between data and parity disks by distributing all data and parity over all disks in a group

Advantages

	small writes do not bottleneck on the check disk
	the system may be able to continue operation during a single failure since parity information is on other disks (but the system is vulnerable to data loss until reconstruction completes.)

Disadvantages

small writes still need to update parity information (a read and write unless parity information is in the host's memory), but the bottlenecking on a single check disk is less likely

Discussion

	For small I/O workload, use level 1
	For bulk I/O workload or you can't afford the capacity overload of mirroring, use level 5

Criticism

see Remzi's note