linux on intel sata raid

Posted 2004-10-1 11:27:49
                  Intel Software RAID Driver (iswraid)
                  ====================================




                              Overview
                           
The Intel Software RAID driver works in conjunction with the Intel RAID Option
ROM, distributed with most (but not all) ICH5R/ICH6R chipsets. It understands
the Intel RAID metadata and allows booting from RAID volumes, regardless of
their RAID level. It is useful when there is a need for compatibility with
other operating systems using these RAID volumes.




                     License, Copyright, Authors

Copyright (C) 2003,2004 Intel Corporation. All rights reserved.

This program is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by the
Free Software Foundation; either version 2, or (at your option) any later
version.

You should have received a copy of the GNU General Public License (for
example /usr/src/linux/COPYING); if not, write to the Free Software
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.

Authors:
Boji Tony Kannanthanam < boji dot t dot kannanthanam at intel dot com >,
Martins Krikis         < martins dot krikis at intel dot com >.




                               Features

This driver is an ataraid subdriver, albeit utilizing a very minimal set
of facilities provided by it. There are several features that currently
distinguish iswraid from other ataraid subdrivers:
* it scans the Linux SCSI subsystem's disks instead of IDE disks when
  looking for the Intel RAID metadata;
* it "claims" the disks that contain Intel RAID metadata for itself,
  disallowing direct user access to them, thus protecting against accidental
  data corruption;
* it notices and reports I/O errors for its RAID volumes;
* it updates the Intel RAID metadata when necessary upon errors, thus
  causing volumes to become degraded or even failed, and this status
  is persistent across reboots and operating system changes;
* it provides a user interface via the /proc filesystem that allows the
  inspection of the status of its RAID arrays, disks and volumes;
* it has several module load time parameters that influence its behavior;
* when necessary to split an I/O request, it does so on natural strip
  boundaries;
* it uses slab caches for efficiency;
* it generally does a lot of things its own way, thus avoiding any existing
  problems specific to ataraid subdrivers (and possibly introducing its own).
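
The splitting of an I/O request on natural strip boundaries, mentioned in
the list above, can be illustrated with a minimal sketch. This is not the
driver's code; the function name split_on_strips and its arguments are
hypothetical, and block numbers are assumed to be relative to the start of
the volume.

```python
def split_on_strips(start, count, blocks_per_strip):
    """Split the block range [start, start + count) so that no piece
    crosses a strip boundary. Returns a list of (start, length) pairs.

    Hypothetical helper for illustration only, not taken from iswraid.
    """
    if blocks_per_strip <= 0:
        raise ValueError("blocks_per_strip must be positive")
    pieces = []
    while count > 0:
        # Blocks left before the next strip boundary
        room = blocks_per_strip - (start % blocks_per_strip)
        length = min(count, room)
        pieces.append((start, length))
        start += length
        count -= length
    return pieces


# A 10-block request starting at block 6, with 8 blocks per strip,
# is cut at block 8, where the next strip begins.
print(split_on_strips(6, 10, 8))
```

A request that already lies within a single strip comes back as one piece,
so no splitting is done unless it is actually necessary.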

While they may or may not be distinguishing features, iswraid also:
* supports RAID0 (striping) and RAID1 (mirroring) over 2-disk volumes;
* supports multiple volumes per array ("Matrix RAID");
* deals with missing disks in a reasonable manner;
* can operate with volumes in degraded mode (unless instructed not to);
* implements read error thresholds;
* tries to satisfy failed RAID1 volume reads from their mirrors;
* etc.




                     Requirements and Installation

Intel RAID metadata is generally created using the Intel RAID OROM.
Currently most mainboards based on Intel chipsets with ICH5R/ICH6R
southbridges have this OROM. The "RAID" mode needs to be selected in BIOS
configuration to enable the RAID OROM. The ICH5R/ICH6R are Serial ATA
controllers and iswraid depends on the ata_piix (or any other) driver that
can present SATA disks as SCSI devices to the Linux kernel. Thus, the basic
requirements for using this driver are:
* Intel RAID OROM or Intel RAID metadata already created on disks;
* ata_piix (or any other driver) that can present such disks as SCSI disks
  (ata_piix is part of libata, by Jeff Garzik);
* ataraid (comes standard with 2.4 kernels).

Unless your kernel source came with libata you need to install it.
Please do so before installing iswraid.

The iswraid driver should compile cleanly for all 2.4 series kernels
but has seen more testing with 2.4.22 and above kernels, and such
kernels have the BH_Sync buffer_head flag that this driver likes to use.

In your kernel configuration file you should have "Support for IDE RAID
controllers" (CONFIG_BLK_DEV_ATARAID) and "Support for Intel software RAID"
(CONFIG_BLK_DEV_ATARAID_ISW) enabled (as modules or statically linked,
it does not matter). You should also enable the driver that will present
the disks with Intel RAID metadata as SCSI disks. Normally this means
enabling "Serial ATA (SATA) support" (CONFIG_SCSI_SATA) and "Intel PIIX/ICH
SATA support" (CONFIG_SCSI_ATA_PIIX). Obviously, SCSI support and SCSI disk
support are also necessary.
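
As a rough aid, the options named above can be checked against a kernel
.config file with a short script. This is only a sketch: the names
CONFIG_SCSI and CONFIG_BLK_DEV_SD are assumed here to be the options behind
"SCSI support" and "SCSI disk support" (the text above does not name them),
and the function missing_options is hypothetical.

```python
import re

REQUIRED_OPTIONS = [
    "CONFIG_SCSI",                 # assumed name for "SCSI support"
    "CONFIG_BLK_DEV_SD",           # assumed name for "SCSI disk support"
    "CONFIG_SCSI_SATA",
    "CONFIG_SCSI_ATA_PIIX",
    "CONFIG_BLK_DEV_ATARAID",
    "CONFIG_BLK_DEV_ATARAID_ISW",
]

# An option counts as enabled when set to y (built in) or m (module).
_ENABLED = re.compile(r"^(CONFIG_\w+)=([ym])$")


def missing_options(config_text):
    """Return the required options that are not enabled in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        match = _ENABLED.match(line.strip())
        if match:
            enabled.add(match.group(1))
    return [opt for opt in REQUIRED_OPTIONS if opt not in enabled]


SAMPLE_CONFIG = """\
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_SCSI_SATA=m
CONFIG_SCSI_ATA_PIIX=m
CONFIG_BLK_DEV_ATARAID=m
# CONFIG_BLK_DEV_ATARAID_ISW is not set
"""

print(missing_options(SAMPLE_CONFIG))
```

For a real kernel tree, pass the contents of its .config file instead of
SAMPLE_CONFIG.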

Note that the iswraid driver is built as part of the Linux SCSI subsystem,
not as part of the IDE modules because when statically linked it needs to
be initialized after the SCSI subsystem. When loading it as a module,
you should load the scsi low level driver first (ata_piix, typically).

Please pay special attention to whether all the necessary disks are
visible to the lower level driver. There can be some unwanted consequences
if iswraid is loaded when not all disks are available to it. Please read
below for how to use one of the module parameters as an additional safety
measure in this situation.

If all the module dependencies are current (do "depmod -a"), it is possible
to cause ata_piix to be loaded on demand when loading iswraid. For this,
add a line like this
  alias scsi_hostadapter ata_piix
to your /etc/modules.conf file or to any files that participate in
generating this file (such as Debian's /etc/modutils/*). Please only
do so once you have made sure that the lower level driver (e.g., ata_piix)
can access all the necessary devices.

When the iswraid driver runs, it scans the Linux SCSI subsystem and makes
the Intel RAID volumes available as ataraid devices. Their device nodes
typically are called /dev/ataraid/d0, /dev/ataraid/d1, etc. The individual
partitions on disk dX (where X is 0, 1, ...) are typically named
/dev/ataraid/dXpY (where Y is 1, 2, ...). These details may be distribution-
specific; the nodes can be created if necessary---ataraid's major number
is 114 and minor numbers from 16 * X to 16 * X + 15 (where X = 0, 1, ...)
belong to the same volume. Numbers in the form 16 * X are for the whole
volumes, numbers in the form 16 * X + Y (where Y > 0) are for partition Y
of volume X. For example:
  mkdir /dev/ataraid
  mknod /dev/ataraid/d2   b 114 32
  mknod /dev/ataraid/d2p8 b 114 40
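
The numbering rule above (major 114, minor 16 * X + Y) can be written out
as a small sketch that generates the mknod command lines. The function
names are hypothetical, and the /dev/ataraid/dXpY naming is the typical
one described above, which may differ between distributions.

```python
ATARAID_MAJOR = 114          # ataraid's major number, as stated above
MINORS_PER_VOLUME = 16       # minors 16 * X .. 16 * X + 15 belong to volume X


def ataraid_node(volume, partition=0):
    """Return (path, major, minor) for volume X and partition Y.

    partition 0 means the whole volume. Hypothetical helper.
    """
    if volume < 0:
        raise ValueError("volume must be 0 or greater")
    if not 0 <= partition < MINORS_PER_VOLUME:
        raise ValueError("partition must be in the range 0..15")
    path = "/dev/ataraid/d%d" % volume
    if partition:
        path += "p%d" % partition
    return path, ATARAID_MAJOR, MINORS_PER_VOLUME * volume + partition


def mknod_command(volume, partition=0):
    """Format the mknod command line that creates the block device node."""
    path, major, minor = ataraid_node(volume, partition)
    return "mknod %s b %d %d" % (path, major, minor)


print(mknod_command(2))       # whole volume 2
print(mknod_command(2, 8))    # partition 8 of volume 2
```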

When modifying the LILO configuration file for booting from volumes (or even
in the presence of iswraid that has claimed disks for RAID) you may have
to add lines like:
  disk=/dev/sda
  inaccessible
It may also be necessary sometimes to specify how BIOS will be seeing
the disks, e.g.:
  disk=/dev/ataraid/d0
  bios=0x80
  disk=/dev/hda
  bios=0x81
See the LILO (or your favorite bootloader's) documentation for more
information.



                          Module Parameters

Iswraid recognizes a few module load time parameters, explained below.

* iswraid_claim_disks:
Set to 1 by default. I.e., iswraid will claim all disks with Intel SW
RAID metadata for itself and disable direct access to their block devices.
If set to 0, this feature is turned off.

* iswraid_halt_degraded:
Set to 0 by default, i.e., not in use. If set to 1, this feature is enabled
and causes iswraid to stop using RAID1 volumes that are degraded. It will
instead fail all I/O requests for such volumes. This parameter also has a
useful side effect on RAID metadata updates done at startup, which is
described in detail later in this document.

* iswraid_never_fail:
Set to 0 by default, i.e., not in use. When a RAID1 volume is already
degraded, a failed write or exceeding the read error threshold can cause
it to become failed and this is the default and generally expected behavior.
When this parameter is set, however, such errors will not cause the volume
to be marked as failed, instead merely the I/O itself will fail. Some people
may prefer this behavior because it always makes it clear which disk has the
more up-to-date data.

* iswraid_error_threshold:
Set to 10 by default. Iswraid counts read errors on each disk and if they
exceed this threshold, it marks the disk as failed. This could cause the
volumes containing the disk to become degraded or failed (depending on
RAID levels and other module load parameters). Setting this value to 0
disables checking the error counts on disks. The error counts are not
persistent.
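
The behavior described for iswraid_error_threshold can be modeled with a
few lines of code. This is a sketch of the description above, not the
driver's implementation; the class DiskErrorCounter is hypothetical, and
"exceed" is read here as "strictly greater than the threshold".

```python
class DiskErrorCounter:
    """Model of per-disk read error counting with a failure threshold."""

    def __init__(self, threshold=10):
        if threshold < 0:
            raise ValueError("threshold must be 0 or greater")
        self.threshold = threshold   # 0 disables the check
        self.errors = 0              # not persistent: starts at 0 on each load
        self.failed = False

    def record_read_error(self):
        """Count one read error; return True once the disk is marked failed."""
        self.errors += 1
        if self.threshold > 0 and self.errors > self.threshold:
            self.failed = True
        return self.failed


disk = DiskErrorCounter()            # default threshold of 10
for _ in range(10):
    disk.record_read_error()
print(disk.failed)                   # still within the threshold
disk.record_read_error()
print(disk.failed)                   # the threshold has now been exceeded
```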




                          Proc Filesystem

The iswraid driver can output information about the state of Intel RAID
arrays, disks and volumes through the /proc filesystem. Each /proc file
generated by iswraid has a header line starting with '#' and containing
space-separated field names. The following lines each correspond to
one object (array, disk or volume) being listed and their fields are
tab-separated. Each of these real data lines is also associated with an
implicit index (starting at 0) and the objects cross-reference each other
using these indices.
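
A user-space tool could read this format along the following lines. This
is a minimal sketch, assuming the layout is exactly as described (a '#'
header with space-separated names, then tab-separated data lines); the
function parse_iswraid_proc and the added "index" key are hypothetical.

```python
def parse_iswraid_proc(text):
    """Parse one iswraid /proc listing into a list of dicts.

    Each dict maps the header's field names to the line's values (as
    strings) and also carries the implicit index under the key "index".
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("#"):
        raise ValueError("missing '#' header line")
    names = lines[0].lstrip("#").split()
    rows = []
    for index, line in enumerate(lines[1:]):
        # Fields are tab-separated; strip any padding around each value.
        values = [field.strip() for field in line.split("\t")]
        if len(values) != len(names):
            raise ValueError("line %d has %d fields, expected %d"
                             % (index, len(values), len(names)))
        row = dict(zip(names, values))
        row["index"] = index
        rows.append(row)
    return rows


# Two data lines in the format of an array listing
SAMPLE = (
    "# family generation numdisks numvolumes disks volumes\n"
    "3e37c9ab\t78\t2\t2\t0,2\t0,1\n"
    "3a57e490\t74\t2\t2\t1,3\t2,3\n"
)

rows = parse_iswraid_proc(SAMPLE)
for row in rows:
    print(row["index"], row["family"], row["disks"], row["volumes"])
```

On a system with iswraid loaded, the text would come from reading a file
such as /proc/iswraid/arrays instead of SAMPLE.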

In order to query the iswraid arrays, do "cat /proc/iswraid/arrays". Here
is a sample output:

# family generation numdisks numvolumes disks volumes
3e37c9ab        78        2        2        0,2        0,1
3a57e490        74        2        2        1,3        2,3

The first field is the "array family number", which basically distinguishes
each array from any other. The second field is the "array generation number"
that shows how many times this array's metadata have been written out to its
disks. The next fields give the number of disks and volumes in the array,
respectively. The final two fields give comma-separated listings of
disks and volumes that this array contains. The disks and volumes
are given by their implicit indices in the disk and volume listings.

In order to query the disks, do "cat /proc/iswraid/disks". Here is a sample
output:

# major minor status errorcount array serial
8         0        0x13a         0         0        3JT3L0J2
8        16        0x13a         0         1        3JT3LCX6
8        32        0x13a         0         0        3JT3KXRX
8        48        0x13a         0         1        3JT3FX3X

The first two fields are the major and minor numbers of the block devices
corresponding to the disks. The status field is next (the status field
has many bits, not all of which are actually used by iswraid). Each
disk's error count follows. The next field shows which array the disk
belongs to, using the implicit array indices. The last field gives each
disk's serial number (possibly altered by iswraid to strip spaces and
non-printable characters).

Probably the most useful information comes from the volume listing, which
can be obtained by doing "cat /proc/iswraid/volumes". A sample output
looks like this:

# node state degradedbits refcnt raidlevel sectors blocksperstrip pbaoflba0 numdisks array disks serial
d0        0x0        0x0        0        0         104026112          8                 0        2        0        0,2        RAID_Volume0
--        0x1        0x0        0        1         104287744        256          52013056        2        0        0,2        RAID_Volume1
--        0x1        0x0        0        1         104026112        256                 0        2        1        1,3        RAID_Volume2
d1        0x0        0x0        0        0         104549888          8         104026112        2        1        1,3        RAID_Volume3

The very first field gives the ataraid device name that the volume corresponds
to. (Actually, the driver does not know the name, but if ataraid's device
nodes are created in the usual manner described above, the dX should be
accurate.) If the volume is in use, it will have an ataraid device
corresponding to it, and this field will show dX (where X is 0, 1, ...).
If the volume is disabled (this only happens if it is "a hopeless volume"
on iswraid startup), then it will not have a corresponding ataraid device
and this field will be "--". When a volume gets disabled, iswraid prints
the reason for this action, so you can check the kernel log.

The second field gives volume state, which is a bitfield; ideally no bits
should be set. The third field, degradedbits, is a bitfield identifying any
disks that are degraded (and thus not in use by RAID1 volumes). The next
field, refcnt, gives the number of references to this volume (how many times
its block device has been opened). The RAID level (0 or 1), total sector
count and blocks per strip follow. The "physical block address" of the volume's
"logical block address 0" tells where (on each of its constituent disks) the
volume begins. Next comes the number of disks the volume contains (which in
theory could be less than the number of disks in the array) and the implicit
array index. The next-to-last field is a comma-separated list of the disks
that the volume contains, using the indices that are implicit in the disk
listing. Please note that this order may be different from the order in
which the volume's array lists the disks. Finally, we have the "serial
number" (symbolic name) of the volume in the last field.
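
The relation between the sectors, pbaoflba0 and numdisks fields can be
worked through with the sample volume listing. The sketch below assumes
that a RAID0 volume spreads its sectors evenly over its disks and that a
RAID1 volume occupies its full sector count on each disk; this is inferred
from the sample numbers, not taken from the driver, and the function
per_disk_sectors is hypothetical.

```python
def per_disk_sectors(raid_level, sectors, numdisks):
    """Sectors that a volume is assumed to occupy on each of its disks."""
    if raid_level == 0:
        if sectors % numdisks:
            raise ValueError("RAID0 size does not divide evenly")
        return sectors // numdisks      # striped: each disk holds a share
    if raid_level == 1:
        return sectors                  # mirrored: each disk holds it all
    raise ValueError("only RAID0 and RAID1 are handled")


# (serial, raidlevel, sectors, pbaoflba0, numdisks) from the sample listing
SAMPLE = [
    ("RAID_Volume0", 0, 104026112, 0, 2),
    ("RAID_Volume1", 1, 104287744, 52013056, 2),
    ("RAID_Volume2", 1, 104026112, 0, 2),
    ("RAID_Volume3", 0, 104549888, 104026112, 2),
]

for serial, level, sectors, pba, numdisks in SAMPLE:
    extent = per_disk_sectors(level, sectors, numdisks)
    print(serial, "starts at", pba, "and ends before", pba + extent)
```

Under these assumptions RAID_Volume0 ends exactly where RAID_Volume1
begins (sector 52013056), and RAID_Volume2 ends where RAID_Volume3 begins
(sector 104026112), which matches the pbaoflba0 values in the sample.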

The array, disk and volume indices are not present in the output
intentionally, in order to save space. Any user-space tools processing
these /proc files can easily generate these missing indices and thus
be able to cross reference the data from all 3 files.
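
One way a user-space tool might regenerate the indices and cross-reference
the listings is sketched below, resolving each volume's disk indices to
disk serial numbers. The helper names are hypothetical and the sample data
is copied from the listings in this document, assuming tab-separated
fields.

```python
def rows(text):
    """Parse a listing; a row's position in the list is its implicit index."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    names = lines[0].lstrip("#").split()
    return [dict(zip(names, [f.strip() for f in ln.split("\t")]))
            for ln in lines[1:]]


def index_list(field):
    """Turn a comma-separated index field such as "0,2" into [0, 2]."""
    return [int(item) for item in field.split(",") if item]


DISKS = (
    "# major minor status errorcount array serial\n"
    "8\t0\t0x13a\t0\t0\t3JT3L0J2\n"
    "8\t16\t0x13a\t0\t1\t3JT3LCX6\n"
    "8\t32\t0x13a\t0\t0\t3JT3KXRX\n"
    "8\t48\t0x13a\t0\t1\t3JT3FX3X\n"
)

VOLUMES = (
    "# node state degradedbits refcnt raidlevel sectors blocksperstrip "
    "pbaoflba0 numdisks array disks serial\n"
    "d0\t0x0\t0x0\t0\t0\t104026112\t8\t0\t2\t0\t0,2\tRAID_Volume0\n"
    "--\t0x1\t0x0\t0\t1\t104287744\t256\t52013056\t2\t0\t0,2\tRAID_Volume1\n"
    "--\t0x1\t0x0\t0\t1\t104026112\t256\t0\t2\t1\t1,3\tRAID_Volume2\n"
    "d1\t0x0\t0x0\t0\t0\t104549888\t8\t104026112\t2\t1\t1,3\tRAID_Volume3\n"
)


def volume_disk_serials(volumes_text, disks_text):
    """Map each volume's name to the serial numbers of its disks."""
    disks = rows(disks_text)
    return {vol["serial"]: [disks[i]["serial"] for i in index_list(vol["disks"])]
            for vol in rows(volumes_text)}


serials = volume_disk_serials(VOLUMES, DISKS)
for name in sorted(serials):
    print(name, serials[name])
```

The same approach works for the array listing, whose disks and volumes
fields use the same implicit indices.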




                     Intel RAID Metadata Updates

The iswraid driver is relatively reluctant to update the Intel RAID
metadata. There are a couple of situations when it considers updating
the metadata, explained below.

It normally does update the metadata in error cases, to mark the disks
that have failed and volumes that have changed their state. Sometimes this
can be suppressed, however, by the use of the iswraid_never_fail parameter
and some luck. If there are no volumes that need to change their state,
the RAID metadata will be unchanged.

It will also update the metadata when a formerly missing disk is found.
Unless the Intel RAID Option ROM is misbehaving, however, this should
be hard to observe. This update can only be done on module startup.

Finally, iswraid may update the RAID metadata if a disk needed by some
RAID volumes is missing. RAID0 volumes will simply be disabled in this
case (without marking them failed in the RAID metadata), but RAID1 volumes
would become degraded or failed. This update, too, can only happen during
module startup, not during its operation. Furthermore, unless the OROM is
misbehaving, it will already mark the disk as missing, so iswraid will not
have to do it.

The last update scenario _could_ unfortunately come up when it really
should not---it could be caused by the lower level driver (e.g.,
ata_piix) not seeing all the disks that it should be seeing. For example,
if 4 disks are plugged into an ICH6R-based mainboard and the OROM sees
them all but iswraid is given only 2 of them by the lower level driver
to work with, then many volumes could be missing disks and require RAID
metadata updates. Performing such updates would not be helpful overall
because they would later require lengthy array rebuild operations
(to be done with the help of OROM and other operating systems or by
using user-space utilities such as dd and your favorite hex editor).
This situation is where the above-mentioned "iswraid_halt_degraded"
parameter can be used as insurance against needless metadata updates.
The following paragraph explains how.

If iswraid_halt_degraded is set, iswraid will realize that it cannot
use the volumes requiring the missing disks because they are either the
disabled RAID0 volumes or the degraded-or-failed (but definitely not usable)
RAID1 volumes. Because of this, it will skip updating the RAID metadata
because it has no volumes to work with anyway. Therefore, for safety, it is
recommended to load iswraid with iswraid_halt_degraded set to 1 the first
time it is used. This way, even if only some disks are found, the RAID
metadata on the disks will be unaltered.
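
The startup reasoning in this section can be condensed into a simplified
model. It only restates the rules given in the text for the missing-disk
case and is not the driver's logic; the Volume class and the function
startup_update_needed are hypothetical, and other update triggers (I/O
errors, a formerly missing disk being found) are left out.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    raid_level: int      # 0 or 1
    missing_disks: int   # disks of this volume not found at startup


def startup_update_needed(volumes, halt_degraded):
    """Return True if a missing disk would lead to a metadata update."""
    for vol in volumes:
        if vol.missing_disks == 0:
            continue     # nothing is wrong with this volume
        if vol.raid_level == 0:
            continue     # RAID0: disabled, but not marked failed on disk
        if halt_degraded:
            continue     # RAID1: unusable anyway, so the update is skipped
        return True      # RAID1 would be marked degraded or failed
    return False


# Only 2 of 4 disks are visible, so every volume is missing one disk.
volumes = [
    Volume("RAID_Volume0", 0, 1),
    Volume("RAID_Volume1", 1, 1),
]
print(startup_update_needed(volumes, halt_degraded=False))
print(startup_update_needed(volumes, halt_degraded=True))
```

In this model the only case that alters the metadata is a RAID1 volume
with a missing disk while iswraid_halt_degraded is 0, which is the case
the parameter is meant to guard against.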

====================================================
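There is also another one, dwraid. The array I used to run (Intel 875 with
2 x 120 GB SATA disks) never worked under Linux, and I later lost all of my
data in an accident, so I stopped using RAID. Sigh, if only this document had
turned up earlier.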
Posted 2004-10-1 20:15:25
Who can translate this document?
Original poster | Posted 2004-10-2 08:59:15
I will.