Monday, December 20, 2010

Veritas Volume Manager (VxVM) Hot Spares

When a disk fails that is part of a redundant volume (e.g., RAID 1, RAID 5), the volume is able to continue handling I/O requests, but becomes susceptible to data loss if additional devices fail ( and in the case of RAID5 volumes, the volume will operate in a degraded state, since parity calculations are required to recreate data).
To remedy the potential impacts associated with device failures, Veritas Volume Manager (VxVM) starts the vxrelocd(1m) failure event detection and subdisk relocation daemon at system boot time. This daemon periodically scans the vxnotify(1m) output, and upon detecting a failure, attempts to relocate data to a working device.
When relocating data, Veritas will first attempt to use a device marked as a spare. If Veritas is unable to find a device marked as a spare, Veritas will attempt to relocate data to a device that contains adequate space and doesn’t have the “nohotuse” flag set. To see if a device contains the nohotuse or spare flag, the vxdisk(1m) utility can be invoked with the list option, and the device to list:
$ vxdisk list c1t6d0
Device:    c1t1d0s2
devicetag: c1t1d0
type:      auto
hostid:    pooh
disk:      name=c1t1d0 id=1123602295.10.pooh
group:     name=oradg id=1123603158.13.pooh
info:      format=cdsdisk,privoffset=256,pubslice=2,privslice=2
flags:     online ready private autoconfig spare autoimport imported
pubpaths:  block=/dev/vx/dmp/c1t1d0s2 char=/dev/vx/rdmp/c1t1d0s2
version:   3.1
iosize:    min=512 (bytes) max=2048 (blocks)
public:    slice=2 offset=2304 len=35365968 disk_offset=0
private:   slice=2 offset=256 len=2048 disk_offset=0
update:    time=1123603160 seqno=0.6
ssb:       actual_seqno=0.0
headers:   0 240
configs:   count=1 len=1280
logs:      count=1 len=192
Defined regions:
 config   priv 000048-000239[000192]: copy=01 offset=000000 enabled
 config   priv 000256-001343[001088]: copy=01 offset=000192 enabled
 log      priv 001344-001535[000192]: copy=01 offset=000000 enabled
 lockrgn  priv 001536-001679[000144]: part=00 offset=000000
Multipathing information:
numpaths:   1
c1t1d0s2        state=enabled
To mark a device as a hot spare, the vxedit(1m) utility can be used:
$ vxedit set spare=on c1t6d0
$ vxdisk list c1t6d0
Device:    c1t6d0s2
devicetag: c1t6d0
type:      auto
hostid:    winnie
disk:      name=c1t6d0 id=1127240120.14.winnie
group:     name=oradg id=1127240283.19.winnie
info:      format=cdsdisk,privoffset=256,pubslice=2,privslice=2
flags:     online ready private autoconfig spare autoimport imported
pubpaths:  block=/dev/vx/dmp/c1t6d0s2 char=/dev/vx/rdmp/c1t6d0s2
version:   3.1
iosize:    min=512 (bytes) max=2048 (blocks)
public:    slice=2 offset=2304 len=35521408 disk_offset=0
private:   slice=2 offset=256 len=2048 disk_offset=0
update:    time=1127961735 seqno=0.28
ssb:       actual_seqno=0.0
headers:   0 240
configs:   count=1 len=1280
logs:      count=1 len=192
Defined regions:
 config   priv 000048-000239[000192]: copy=01 offset=000000 disabled
 config   priv 000256-001343[001088]: copy=01 offset=000192 disabled
 log      priv 001344-001535[000192]: copy=01 offset=000000 disabled
 lockrgn  priv 001536-001679[000144]: part=00 offset=000000
Multipathing information:
numpaths:   1
c1t6d0s2        state=enabled
To request that a device not be used for relocation, the “nohotuse” flag can be set. This will cause vxrelocd(1m) to skip the device when making relocation decisions, which ensures that data doesn’t get relocated to free space on busy disks. To set the “nohotuse”flag, the vxedit(1m) utility can be used:
$ vxedit set nohotuse=on c1t6d0
$ vxdisk list c1t6d0
Device:    c1t6d0s2
devicetag: c1t6d0
type:      auto
hostid:    winnie
disk:      name=c1t6d0 id=1127240120.14.winnie
group:     name=oradg id=1127240283.19.winnie
info:      format=cdsdisk,privoffset=256,pubslice=2,privslice=2
flags:     online ready private autoconfig nohotuse autoimport imported
pubpaths:  block=/dev/vx/dmp/c1t6d0s2 char=/dev/vx/rdmp/c1t6d0s2
version:   3.1
iosize:    min=512 (bytes) max=2048 (blocks)
public:    slice=2 offset=2304 len=35521408 disk_offset=0
private:   slice=2 offset=256 len=2048 disk_offset=0
update:    time=1127961735 seqno=0.28
ssb:       actual_seqno=0.0
headers:   0 240
configs:   count=1 len=1280
logs:      count=1 len=192
Defined regions:
 config   priv 000048-000239[000192]: copy=01 offset=000000 disabled
 config   priv 000256-001343[001088]: copy=01 offset=000192 disabled
 log      priv 001344-001535[000192]: copy=01 offset=000000 disabled
 lockrgn  priv 001536-001679[000144]: part=00 offset=000000
Multipathing information:
numpaths:   1
c1t6d0s2        state=enabled
The relocation process can consumes considerable amounts of I/O and CPU resources, so it’s often beneficial to explicitly pick the hot spares by hand. This will ensure that when failures occur, data is not relocated to a chunk of free space that resides on the same spindles as your production data.

No comments:

Post a Comment