That server
Sep. 19th, 2005 12:50 pm
Well, the problem:
Server is up, Oracle is running.
Disks are mirrored, but no one can log in (telnet, ssh, rsh, or on the console).
Information to hand is from systems monitoring - out of date:
cts021 - meta
--------------------------------------------------------------------------------
red Mon Sep 19 12:50:53 BST 2005
MetaDatabases (/usr/opt/SUNWmd/sbin/metadb -i)
Database replicas are not active:
        flags           first blk       block count
     a m  p  luo        16              1034            /dev/dsk/c0t12d0s7
     a    p  luo        16              1034            /dev/dsk/c0t9d0s7
     a    p  luo        16              1034            /dev/dsk/c0t11d0s7
     a    p  luo        16              1034            /dev/dsk/c0t8d0s7
     a    p  luo        16              1034            /dev/dsk/c0t0d0s6
      W   p  l          16              1034            /dev/dsk/c0t10d0s6
o - replica active prior to last mddb configuration change
u - replica is up to date
l - locator for this replica was read successfully
c - replica's location was in /etc/opt/SUNWmd/mddb.cf
p - replica's location was patched in kernel
m - replica is master, this is replica selected as input
W - replica has device write errors
a - replica is active, commits are occurring to this replica
M - replica had problem with master blocks
D - replica had problem with data blocks
F - replica had format problems
S - replica is too small to hold current data base
R - replica had device read errors
Metadevices (/usr/opt/SUNWmd/sbin/metastat)
Metadevices are not Okay:
d1: Mirror
Submirror 0: d11
State: Okay
Submirror 1: d12
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 2049720 blocks
d11: Submirror of d1
State: Okay
Size: 2049720 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t0d0s1 0 No Okay
d12: Submirror of d1
State: Needs maintenance
Invoke: metareplace d1 c0t10d0s1
Size: 2049720 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t10d0s1 0 No Maintenance
d2: Mirror
Submirror 0: d21
State: Okay
Submirror 1: d22
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 1050776 blocks
d21: Submirror of d2
State: Okay
Size: 1050776 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t0d0s3 0 No Okay
d22: Submirror of d2
State: Needs maintenance
Invoke: metareplace d2 c0t10d0s3
Size: 1050776 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t10d0s3 0 No Maintenance
d4: Mirror
Submirror 0: d41
State: Okay
Submirror 1: d42
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 2049720 blocks
d41: Submirror of d4
State: Okay
Size: 2049720 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t0d0s5 0 No Okay
d42: Submirror of d4
State: Okay
Size: 2049720 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t10d0s5 0 No Okay
d5: Mirror
Submirror 0: d51
State: Okay
Submirror 1: d52
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 25850032 blocks
d51: Submirror of d5
State: Okay
Size: 25850032 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t0d0s7 0 No Okay
d52: Submirror of d5
State: Needs maintenance
Invoke: metareplace d5 c0t10d0s7
Size: 25850032 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t10d0s7 0 No Maintenance
d6: Mirror
Submirror 0: d61
State: Okay
Submirror 1: d62
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 35354136 blocks
d61: Submirror of d6
State: Okay
Size: 35354136 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t8d0s7 4712 Yes Okay
d62: Submirror of d6
State: Okay
Size: 35354136 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t11d0s7 4712 Yes Okay
d7: Mirror
Submirror 0: d71
State: Okay
Submirror 1: d72
State: Okay
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 35354136 blocks
d71: Submirror of d7
State: Okay
Size: 35354136 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t9d0s7 4712 Yes Okay
d72: Submirror of d7
State: Okay
Size: 35354136 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t12d0s7 4712 Yes Okay
d9: Mirror
Submirror 0: d91
State: Okay
Submirror 1: d92
State: Needs maintenance
Pass: 1
Read option: roundrobin (default)
Write option: parallel (default)
Size: 259160 blocks
d91: Submirror of d9
State: Okay
Size: 259160 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t0d0s0 0 No Okay
d92: Submirror of d9
State: Needs maintenance
Invoke: metareplace d9 c0t10d0s0
Size: 259160 blocks
Stripe 0:
Device Start Block Dbase State Hot Spare
c0t10d0s0 0 No Maintenance
Hot Spares (/usr/opt/SUNWmd/sbin/metahs -i)
metahs: cts021: no hotspare pools found
Status unchanged in 2.77 days
Status message received from 10.34.72.21
And disk layouts of:
swap               2562240     9728  2552512    1%  /tmp
/dev/md/dsk/d4      962571   212570   692247   24%  /opt
/dev/md/dsk/d6    17404618  6186904 11043668   36%  /u02
/dev/md/dsk/d7    17404618  6835357 10395215   40%  /u03
/dev/md/dsk/d9      120667    45760    62841   43%  /
/dev/md/dsk/d5    12726996  6856364  5743363   55%  /u01
/dev/md/dsk/d2      494235   259486   185326   59%  /var
/dev/md/dsk/d1      962571   774706   130111   86%  /usr
Total size on physical partitions (not including swap) is = 14335Mb
Diagnosis? c0t10d0 has failed. It's mirrored, so this _shouldn't_ have an impact. But it obviously does.
Well, there's no d3 and no d8. But every other '/dev/md/dsk' is mounted as a filesystem.
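Those two observations can be checked mechanically against the monitoring output. What follows is only a rough Python sketch: the rows are copied from the metastat and df listings above, while the function names and the idea of scripting this at all are my own assumptions, not anything that ran on the box.

```python
import re

# Device rows copied from the metastat output above (device, start, dbase, state).
METASTAT_ROWS = """\
c0t0d0s1 0 No Okay
c0t10d0s1 0 No Maintenance
c0t0d0s3 0 No Okay
c0t10d0s3 0 No Maintenance
c0t0d0s5 0 No Okay
c0t10d0s5 0 No Okay
c0t0d0s7 0 No Okay
c0t10d0s7 0 No Maintenance
c0t8d0s7 4712 Yes Okay
c0t11d0s7 4712 Yes Okay
c0t9d0s7 4712 Yes Okay
c0t12d0s7 4712 Yes Okay
c0t0d0s0 0 No Okay
c0t10d0s0 0 No Maintenance
"""

# The mirrors that metastat reports (note: no d3 and no d8).
MIRRORS = {"d1", "d2", "d4", "d5", "d6", "d7", "d9"}

# Rows copied from the disk layout listing above.
DF_ROWS = """\
swap 2562240 9728 2552512 1% /tmp
/dev/md/dsk/d4 962571 212570 692247 24% /opt
/dev/md/dsk/d6 17404618 6186904 11043668 36% /u02
/dev/md/dsk/d7 17404618 6835357 10395215 40% /u03
/dev/md/dsk/d9 120667 45760 62841 43% /
/dev/md/dsk/d5 12726996 6856364 5743363 55% /u01
/dev/md/dsk/d2 494235 259486 185326 59% /var
/dev/md/dsk/d1 962571 774706 130111 86% /usr
"""


def disks_needing_maintenance(rows):
    """Return the set of disks owning at least one slice in Maintenance state."""
    disks = set()
    for line in rows.splitlines():
        fields = line.split()
        device, state = fields[0], fields[-1]
        if state == "Maintenance":
            # Strip the trailing slice number: c0t10d0s1 -> c0t10d0
            disks.add(re.sub(r"s\d+$", "", device))
    return disks


def mounted_metadevices(rows):
    """Return the metadevice names that appear as mounted filesystems."""
    return {m.group(1) for m in re.finditer(r"^/dev/md/dsk/(d\d+)\s", rows, re.M)}


if __name__ == "__main__":
    print("disks with failed slices:", sorted(disks_needing_maintenance(METASTAT_ROWS)))
    print("mirrors not mounted:", sorted(MIRRORS - mounted_metadevices(DF_ROWS)))
```

Every failed slice sits on the one disk, and every mirror is accounted for by a filesystem, so there is no metadevice left over that could be serving as swap.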
So what I reckon has happened is that when someone mirrored the OS disk, they didn't mirror the swap partition. So a disk has failed, and 'swap' has gone bye-bye.
If I'm right, this is a relatively easy fix: boot from CD, mount the 'root' slice, edit the vfstab, and change swap from whatever it is to whatever it isn't.
The failed swap will be on c0t10d0s??, probably c0t10d0s4, since slices 0, 1, 3, 5, 6 and 7 have other uses and slice 2 is the 'backup'. The vfstab will have an entry looking like:
/dev/dsk/c0t10d0s4 - - swap - no -
So changing that 10 to a 0 _should_ fix the problem. (Or if there are two swap lines, then just deleting the failed one should do the trick.)
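For the record, here is the slice guess and the vfstab edit as a small Python sketch. It is a guess on top of a guess: I can't see the real vfstab, so the sample below is hypothetical, the function names are made up, and it assumes c0t0d0s4 exists as a matching swap slice on the good disk.

```python
FAILED_DISK = "c0t10d0"
GOOD_DISK = "c0t0d0"  # assumption: the surviving half of the OS mirror


def unused_slices(used, reserved=(2,)):
    """Slices 0-7 that are neither in use nor reserved (slice 2 is 'backup')."""
    return sorted(set(range(8)) - set(used) - set(reserved))


def is_swap_line(line):
    """True for a vfstab entry that adds a disk device as swap."""
    fields = line.split()
    return (
        len(fields) == 7
        and not line.startswith("#")
        and fields[3] == "swap"          # FS type column
        and fields[0].startswith("/dev/")
    )


def fix_swap(vfstab_text):
    """Point swap at the good disk, or drop the failed entry if another exists."""
    lines = vfstab_text.splitlines()
    failed_prefix = f"/{FAILED_DISK}s"
    swap_lines = [l for l in lines if is_swap_line(l)]
    has_good_swap = any(failed_prefix not in l.split()[0] for l in swap_lines)

    out = []
    for line in lines:
        if is_swap_line(line) and failed_prefix in line.split()[0]:
            if has_good_swap:
                continue  # two swap lines: just delete the failed one
            # one swap line: change the 10 to a 0
            line = line.replace(failed_prefix, f"/{GOOD_DISK}s", 1)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical vfstab contents, for illustration only.
SAMPLE_VFSTAB = (
    "/dev/md/dsk/d9 /dev/md/rdsk/d9 / ufs 1 no -\n"
    "/dev/dsk/c0t10d0s4 - - swap - no -\n"
    "swap - /tmp tmpfs - yes -\n"
)

if __name__ == "__main__":
    # Slices 0,1,3,5,7 are submirrors and slice 6 holds the metadb replica.
    print("candidate swap slices:", unused_slices([0, 1, 3, 5, 6, 7]))
    print(fix_swap(SAMPLE_VFSTAB), end="")
```

The tmpfs line for /tmp is left alone on purpose: its FS type is tmpfs, not swap, so it only depends on swap existing at all.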
Hopefully that'll fix it. This afternoon we find out if my remote best-guess diagnosis is correct. If it is, I get to do the 'wahoo I'm fantastic' dance. If it's not, I get to shrug and do the 'well, you can't even log in to the damn box, and I'm 100 miles away, what do you expect' fob-off.