sobrique: (Default)
[personal profile] sobrique
I've heard a few people ask 'what I do', so I thought I'd take some time to scribble it down. What I do is 'storage': I'm a storage analyst. That means I'm responsible for looking after SANs, backups and archiving, and for where these services interface with the rest of the infrastructure, such as servers and networking.


SAN

A SAN is a storage area network. Conceptually, it's a network of disks. Each host (or server) has one or more host bus adaptors (HBAs), which are basically SAN network cards. Usually you use two or more, because that gives you resilience.

These host bus adaptors are connected to fabric switches, which are in turn connected to the storage arrays.
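Two HBAs only buy resilience if they don't share a failure point, which in practice means cabling them to separate fabrics. A check along those lines is easy to script. The sketch below assumes a hypothetical inventory format (a dict mapping each host to its HBAs, and each HBA to a fabric name) rather than the output of any real tool.

```python
from __future__ import annotations


def single_fabric_hosts(hba_fabric: dict[str, dict[str, str]]) -> list[str]:
    """Return hosts whose HBAs don't span at least two fabrics.

    `hba_fabric` maps host name -> {HBA name -> fabric name}. A host with one
    HBA, or with every HBA on the same fabric, has a single point of failure.
    """
    return sorted(
        host for host, hbas in hba_fabric.items() if len(set(hbas.values())) < 2
    )


# Hypothetical inventory, for illustration only.
inventory = {
    "db01": {"hba0": "fabric_A", "hba1": "fabric_B"},
    "web01": {"hba0": "fabric_A", "hba1": "fabric_A"},  # both HBAs on one fabric
    "app01": {"hba0": "fabric_A"},                      # only one HBA
}
print(single_fabric_hosts(inventory))  # ['app01', 'web01']
```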

The storage arrays vary a great deal. The ones I'm working with are EMC Symmetrixes. What they do is hold a large number of individual disk drives, and allow the creation of logical volumes - volumes made up of multiple physical devices.

In addition, they have a few other features to improve the reliability and performance of the drives. Amongst these are:

RAID - a technology that introduces redundancy into the disk configuration. It stands for Redundant Array of Inexpensive Disks.

RAID 1 is disk mirroring, such that each chunk of data is written to two (or more) separate physical locations.

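RAID 0 is striping: writing data in parallel across multiple drives, so you can read and write faster (each drive is limited in throughput, and using several overcomes this limitation). On its own it provides no redundancy; if one drive fails, the data on the whole set is lost.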
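Here's a minimal sketch of the address arithmetic behind striping: how a logical block number might map to a member disk and an offset on it. The function name, chunk size and disk count are illustrative assumptions; real arrays layer a lot more on top, and the Symmetrix's own striping details aren't covered here.

```python
from __future__ import annotations


def raid0_locate(logical_block: int, num_disks: int, blocks_per_chunk: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk).

    The volume is cut into chunks of `blocks_per_chunk` blocks, and the chunks
    are dealt out round-robin across the disks, so a large sequential read or
    write keeps every disk busy at once.
    """
    chunk, offset_in_chunk = divmod(logical_block, blocks_per_chunk)
    stripe_row, disk = divmod(chunk, num_disks)
    return disk, stripe_row * blocks_per_chunk + offset_in_chunk


# Four disks, 8-block chunks: blocks 0-7 go to disk 0, 8-15 to disk 1, and so on.
print(raid0_locate(8, num_disks=4, blocks_per_chunk=8))   # (1, 0)
print(raid0_locate(32, num_disks=4, blocks_per_chunk=8))  # (0, 8): wraps back to disk 0
```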

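RAID 1 is very expensive on disks - you need twice as many disks as the storage you actually get. So RAID 5 is also used, which relies on 'parity': you effectively 'waste' one disk's worth of capacity in each RAID 5 set, and in return the set can survive the loss of any single disk. It's a tradeoff between cost and protection. RAID 5 sets are often 4+1 or 7+1 (four or seven disks' worth of data plus one of parity), so the parity overhead is 20% or 12.5% of the raw capacity respectively.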
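The parity trick is just XOR: the parity block is the XOR of the data blocks in a stripe, so any one missing block can be recomputed by XORing everything that's left. A minimal sketch of that idea for a single stripe, with hypothetical helper names; real RAID 5 also rotates the parity block across the member disks so no one disk becomes a bottleneck, which this ignores.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(survivors: list[bytes]) -> bytes:
    """Recover one lost block of a stripe from all the other blocks (data and parity)."""
    return xor_blocks(survivors)


def usable_fraction(data_disks: int, parity_disks: int = 1) -> float:
    """Share of raw capacity left for data, e.g. 4+1 -> 0.8."""
    return data_disks / (data_disks + parity_disks)


# One stripe of a 4+1 set: four data blocks plus their parity block.
data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa", b"\x01\x02"]
parity = xor_blocks(data)

# Lose the third data block, then rebuild it from the three survivors plus parity.
rebuilt = rebuild_missing([data[0], data[1], data[3], parity])
print(rebuilt == data[2])                       # True
print(usable_fraction(4), usable_fraction(7))  # 0.8 0.875
```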

There are several other forms of RAID, of course, but 0, 1 and 5 are the most commonly used.

Caching - large amounts of memory that allow disk reads and writes (I/Os, or input/output operations) to be done faster. In the Symmetrix, every I/O goes through the cache. The cache itself is very large: 120 GB or more on the larger arrays.
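A toy model of the 'every I/O goes through the cache' idea: an LRU write-back cache in front of a slower backing store. This is a generic sketch with made-up names (`WriteBackCache`, and a dict standing in for the disks); the Symmetrix's actual cache algorithms are far more sophisticated and aren't described here.

```python
from __future__ import annotations

from collections import OrderedDict


class WriteBackCache:
    """Toy LRU write-back cache in front of a slow backing store (a dict here).

    Writes are acknowledged as soon as they're in cache and marked dirty;
    dirty blocks are written to the backing store ("destaged") when they're
    evicted or when flush() is called.
    """

    def __init__(self, backing: dict[int, bytes], capacity: int) -> None:
        self.backing = backing
        self.capacity = capacity
        self.blocks: OrderedDict[int, bytes] = OrderedDict()
        self.dirty: set[int] = set()
        self.hits = 0
        self.misses = 0

    def _store(self, block: int, data: bytes) -> None:
        self.blocks[block] = data
        self.blocks.move_to_end(block)  # most recently used lives at the end
        while len(self.blocks) > self.capacity:
            old, old_data = self.blocks.popitem(last=False)  # evict least recently used
            if old in self.dirty:
                self.backing[old] = old_data
                self.dirty.discard(old)

    def read(self, block: int) -> bytes:
        if block in self.blocks:
            self.hits += 1
            data = self.blocks[block]
        else:
            self.misses += 1
            data = self.backing[block]  # slow path: fetch from disk
        self._store(block, data)
        return data

    def write(self, block: int, data: bytes) -> None:
        self.dirty.add(block)
        self._store(block, data)

    def flush(self) -> None:
        for block in self.dirty:
            self.backing[block] = self.blocks[block]
        self.dirty.clear()


disk = {0: b"a", 1: b"b", 2: b"c"}
cache = WriteBackCache(disk, capacity=2)
cache.read(0)         # miss: fetched from disk
cache.read(0)         # hit: served from memory
cache.write(1, b"B")  # acknowledged from cache; disk still holds b"b"
cache.read(2)         # miss; evicts block 0, the least recently used (and clean)
cache.flush()         # destage the dirty block 1
print(cache.hits, cache.misses, disk[1])  # 1 2 b'B'
```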

Multipathing - the array is 'clever' enough to allow access to a disk through multiple access channels (paths). This lets you use two independent 'fabrics' (or disk networks) for your I/O. This is for resilience: it removes single points of failure.
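On the host side, multipathing software (EMC PowerPath, for example) typically decides which path each I/O takes. The sketch below shows the basic idea with a hypothetical `MultipathDevice` class: spread I/O round-robin across the available paths, and simply stop using a path once it's marked failed. The path names are made up.

```python
from __future__ import annotations

import itertools


class MultipathDevice:
    """Hypothetical multipath device: round-robin across healthy paths, with failover."""

    def __init__(self, paths: list[str]) -> None:
        self.paths = list(paths)
        self.failed: set[str] = set()
        self._rotation = itertools.cycle(self.paths)

    def mark_failed(self, path: str) -> None:
        self.failed.add(path)

    def mark_restored(self, path: str) -> None:
        self.failed.discard(path)

    def next_path(self) -> str:
        """Pick the path for the next I/O, skipping any marked as failed."""
        for _ in range(len(self.paths)):
            path = next(self._rotation)
            if path not in self.failed:
                return path
        raise OSError("all paths to the device have failed")


dev = MultipathDevice(["fabric_A/hba0", "fabric_B/hba1"])
print([dev.next_path() for _ in range(4)])  # alternates between the two fabrics
dev.mark_failed("fabric_A/hba0")            # e.g. a switch in fabric A goes down
print([dev.next_path() for _ in range(3)])  # every I/O now goes via fabric B
```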

Snapshots and Clones - a snapshot is a point-in-time 'marker' on a logical volume. Once it's activated, any change to the volume is tracked so that the original data is preserved. This uses additional storage space, but gives you a point-in-time view of the data. That can be really handy for a variety of things, but the major one is backups. Clones are similar in that they maintain a point-in-time copy of your disk, but the difference is that a clone is a 'full' copy: everything on the volume is duplicated to the 'clone' volume.
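One common way to implement snapshots is copy-on-write: before a block is overwritten for the first time after the snapshot, its old contents are saved, so the snapshot only costs space for blocks that have actually changed. The sketch below (a hypothetical `Volume` class, with blocks held in a dict) illustrates that, alongside a clone as a full up-front copy. It's an illustration of the general technique, not a description of how the Symmetrix implements it.

```python
from __future__ import annotations


class Volume:
    """A volume as a dict of block number -> data, with copy-on-write snapshots."""

    def __init__(self, blocks: dict[int, bytes]) -> None:
        self.blocks = dict(blocks)
        # Each snapshot holds only the original contents of blocks changed since it was taken.
        self.snapshots: list[dict[int, bytes | None]] = []

    def snapshot(self) -> int:
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, block: int, data: bytes) -> None:
        # Before the first overwrite of a block, save its old contents in every
        # snapshot that hasn't already preserved it.
        for saved in self.snapshots:
            if block not in saved:
                saved[block] = self.blocks.get(block)
        self.blocks[block] = data

    def read_snapshot(self, snap_id: int, block: int) -> bytes | None:
        saved = self.snapshots[snap_id]
        return saved[block] if block in saved else self.blocks.get(block)

    def clone(self) -> dict[int, bytes]:
        # A clone, by contrast, copies every block up front.
        return dict(self.blocks)


vol = Volume({0: b"old0", 1: b"old1"})
snap = vol.snapshot()
full = vol.clone()
vol.write(0, b"new0")
print(vol.blocks[0], vol.read_snapshot(snap, 0))  # b'new0' b'old0'
print(len(vol.snapshots[snap]))  # 1: only the changed block used extra space
```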

Remote Replication - SRDF (Symmetrix Remote Data Facility) allows logical volumes to be duplicated across multiple physical arrays. In our system, we have a 'primary' array and a 'standby' array at a remote site; data is continually synchronized to the remote site for disaster recovery.
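Replication products generally offer a synchronous mode (a write isn't acknowledged to the host until the remote array has it too) and an asynchronous mode (the remote copy trails slightly behind); SRDF supports both, and which one suits depends on distance and how much data loss is tolerable. This is a generic sketch of the two ideas with hypothetical `Array` and `AsyncLink` classes, not how SRDF is implemented or configured.

```python
from __future__ import annotations

from collections import deque


class Array:
    """Stand-in for a storage array: volume name -> {block number -> data}."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.volumes: dict[str, dict[int, bytes]] = {}
        self.reachable = True

    def commit(self, volume: str, block: int, data: bytes) -> None:
        if not self.reachable:
            raise ConnectionError(f"{self.name} is unreachable")
        self.volumes.setdefault(volume, {})[block] = data


def sync_write(primary: Array, standby: Array, volume: str, block: int, data: bytes) -> None:
    """Synchronous mode: the host's write completes only once both arrays have it."""
    primary.commit(volume, block, data)
    standby.commit(volume, block, data)  # if this raises, the host never gets an ack


class AsyncLink:
    """Asynchronous mode: acknowledge after the local write, ship to the remote later."""

    def __init__(self, primary: Array, standby: Array) -> None:
        self.primary = primary
        self.standby = standby
        self.pending: deque[tuple[str, int, bytes]] = deque()

    def write(self, volume: str, block: int, data: bytes) -> None:
        self.primary.commit(volume, block, data)
        self.pending.append((volume, block, data))  # remote copy now lags behind

    def drain(self) -> None:
        # Only drop an entry from the queue once the remote has accepted it.
        while self.pending:
            self.standby.commit(*self.pending[0])
            self.pending.popleft()


primary, standby = Array("primary"), Array("standby")
sync_write(primary, standby, "vol1", 0, b"x")
link = AsyncLink(primary, standby)
link.write("vol1", 1, b"y")
print(1 in standby.volumes["vol1"])  # False: not replicated yet
link.drain()
print(standby.volumes["vol1"])  # {0: b'x', 1: b'y'}
```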

Backup and Archiving

The other areas I look after are backup and archiving. These are different things: the point of a backup is to allow recovery in case of 'problems'. Typically a backup is made daily, to a tape library. Backups are kept for a defined retention period; exactly how long is a tradeoff between the volume of data you have to keep and the need to recover it. Most (over 95%) backup recovery requests are for data less than 30 days old.
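The retention tradeoff can be made concrete with two small functions: one decides which backups have aged out, the other measures what fraction of past restore requests a given retention period would have covered. Both are hypothetical helpers, and the request ages in the example are invented purely to show the calculation.

```python
from __future__ import annotations

from datetime import date, timedelta


def expired_backups(backup_dates: list[date], today: date, retention_days: int) -> list[date]:
    """Backups older than the retention period, i.e. candidates for expiry and tape reuse."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in backup_dates if d < cutoff)


def restore_coverage(request_ages_days: list[int], retention_days: int) -> float:
    """Fraction of past restore requests that a given retention period would have satisfied."""
    covered = sum(1 for age in request_ages_days if age <= retention_days)
    return covered / len(request_ages_days)


today = date(2024, 1, 31)
daily = [today - timedelta(days=n) for n in range(60)]  # 60 daily backups
print(len(expired_backups(daily, today, retention_days=30)))  # 29

# Made-up ages (in days) of the data people asked to have restored.
ages = [1, 1, 2, 3, 5, 7, 10, 14, 21, 45]
print(restore_coverage(ages, 30), restore_coverage(ages, 60))  # 0.9 1.0
```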

We back up our servers using a network backup server that writes to a tape library. It's down to me (and my team) to keep an eye on it, check that it's doing backups properly, manage tape usage, and install new clients onto the backup service.

We use some of the SAN technologies to make our backups faster and more efficient. For example, on some of our servers we take a snapshot of the disk, so the server can continue running at full speed while we back up from the snapshot. Some things, such as databases, can be 'fun' to back up while they're running, because the data is continually changing and needs to be consistent to be recoverable.

Archiving is vaguely similar to backup, in that it's about retaining data. The difference is that archiving is about keeping data, whereas backup is about recovering it after a disaster.

We will be archiving email, to allow everyone to keep their emails for as long as they wish without putting a huge burden on the mail server. Individual mailboxes can grow very large if unchecked, and... well, it's not desirable to delete all the mail coming in. So we archive anything older than a certain age: we take it off the mail server and put it onto a Centera. A Centera is an EMC storage device, cheaper and slower than the 'primary' disks the mail server uses.
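Age-based archiving boils down to partitioning each mailbox by message age. A minimal sketch with a hypothetical `Message` record and an arbitrary 180-day threshold; it leaves out the actual copy to the Centera and how users get back to archived mail.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Message:
    msg_id: str
    received: datetime
    size_bytes: int


def split_for_archive(
    mailbox: list[Message], now: datetime, max_age: timedelta
) -> tuple[list[Message], list[Message]]:
    """Partition a mailbox into (stays on the mail server, moves to the archive)."""
    keep: list[Message] = []
    archive: list[Message] = []
    for msg in mailbox:
        if now - msg.received > max_age:
            archive.append(msg)
        else:
            keep.append(msg)
    return keep, archive


now = datetime(2024, 1, 31)
mailbox = [
    Message("m1", now - timedelta(days=10), 20_000),
    Message("m2", now - timedelta(days=200), 5_000_000),  # old message with a big attachment
    Message("m3", now - timedelta(days=400), 80_000),
]
keep, archive = split_for_archive(mailbox, now, max_age=timedelta(days=180))
print([m.msg_id for m in archive])         # ['m2', 'm3']
print(sum(m.size_bytes for m in archive))  # 5080000 bytes freed on the mail server
```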

All these things have daily checks that need doing. Disks in the arrays do fail periodically: individual disks are very reliable, but when you have thousands of them, failures become a regular occurrence. Backups fail too, and I need to troubleshoot them, figure out why, and sometimes restart them. I also need to recover things from the backups from time to time.
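To put rough numbers on why thousands of disks means regular failures: with a constant, independent failure rate, expected failures scale linearly with the number of disks. The fleet size and the 2% annual failure rate below are illustrative assumptions, not figures for our arrays, and the function names are hypothetical.

```python
def expected_failures(num_disks: int, annual_failure_rate: float, days: float) -> float:
    """Expected disk failures across the whole population over `days` days."""
    return num_disks * annual_failure_rate * days / 365


def chance_of_any_failure(num_disks: int, annual_failure_rate: float, days: float) -> float:
    """Probability that at least one disk fails in the period.

    Assumes each disk fails independently, with per-disk probability
    annual_failure_rate * days / 365 for the period (fine for short periods).
    """
    per_disk = annual_failure_rate * days / 365
    return 1 - (1 - per_disk) ** num_disks


# Hypothetical figures: 2,000 disks, each with a 2% chance of failing in a year.
print(expected_failures(2000, 0.02, 365))    # about 40 failures a year
print(expected_failures(2000, 0.02, 7))      # about 0.77 a week
print(chance_of_any_failure(2000, 0.02, 7))  # roughly 0.54
```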

So, that's mostly what I do. I have to interact with lots of other areas of the IT service too, because storage, backup and archiving underpin a whole lot of things.
