Last night I was working late-ish.
Had some SAN work to do: our Clariion's Service Processors had become 'unmanaged', which basically meant that no config could be done on them until they were rebooted.
Now, these aren't just standalone systems. The SP of a Clariion is kinda relied upon by everything that uses the Clariion. And because SAN storage is high-performance and expensive, that of course meant 'all our business-critical systems'.
The SP is effectively the SCSI 'target' of the host bus adaptor (fibre card) in each server that's accessing SAN disk. When one SP fails, the client software _should_ transparently switch to the other SP. I was moderately confident that this worked fine, but the fact that I hadn't actually done this kind of test before was making me a little nervous.
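To make the failover idea a bit more concrete, here's a toy sketch in Python. It is not how any real multipathing driver works; the class names (`StorageProcessor`, `MultipathDevice`, `PathFailedError`) are hypothetical, and it only models a host sending I/O down a path to one SP and switching to the other SP when the first stops answering.

```python
class PathFailedError(Exception):
    """Raised when I/O can't be completed through a storage processor."""


class StorageProcessor:
    """Hypothetical stand-in for one SP (the SCSI target) on the array."""

    def __init__(self, name):
        self.name = name
        self.online = True

    def read(self, block):
        if not self.online:
            raise PathFailedError(f"{self.name} is not responding")
        return f"block {block} via {self.name}"


class MultipathDevice:
    """Toy host-side view of one LUN reachable through two SPs."""

    def __init__(self, primary, secondary):
        # paths[0] is the currently active path.
        self.paths = [primary, secondary]

    def read(self, block):
        for _ in range(len(self.paths)):
            sp = self.paths[0]
            try:
                return sp.read(block)
            except PathFailedError:
                # Fail over: demote the dead path and try the next SP.
                self.paths.append(self.paths.pop(0))
        raise PathFailedError("all paths to the LUN have failed")


spa = StorageProcessor("SP A")
spb = StorageProcessor("SP B")
lun = MultipathDevice(spa, spb)

print(lun.read(1))  # served by SP A
spa.online = False  # e.g. SP A is being rebooted
print(lun.read(2))  # the host has switched to SP B
```

In this sketch the host stays on SP B even after SP A comes back, because nothing moves the path back; whether a real setup fails back automatically is a configuration question this toy doesn't try to answer.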
Better yet, we've found that we have _some_ servers on the Clariion which can only be taken down on Sundays, and others that can only be taken down on Saturdays.
The SP reboot went relatively painlessly, apart from two 'known problems' and one 'well, it might do that' problem. It needed testing anyway, really: if one of them blows up at 2am, that's approximately the same thing happening.
Anyhow, I finally got as far as swimming last night. At about 20:20 they turned on the 'mood lights': they drop the main lighting and rely on underwater lighting and coloured lights around the pool. Managed a mile non-stop. That was rather good.