Backups

Jan. 20th, 2006 11:06 am
[personal profile] sobrique
I've just been told by 'directorial fiat' that the backups of our Oracle cluster shall be stopped henceforth, because one of the apps is running a bit slow.
I don't like that.

We are tasked with looking for workarounds for the problem. The problem is that our backup data volumes have increased dramatically (we're moving around 2-3TB / night) and our network infrastructure hasn't been upgraded in 5 years. (We have 100Mb to every desktop, and a 155Mb backbone.) So we are faced with wallpapering over cracks.
Again.
I don't like that either.

Oh, and the person having the problem whinged at his director, who has in turn whinged at our director, who has whinged at our management team, who have whinged at us. Which is good really, because that was the first we'd heard about it.
I don't like that either.

Date: 2006-01-20 11:18 am (UTC)
From: [identity profile] tenuous.livejournal.com
Other than that, you're happy, though?

Date: 2006-01-20 11:59 am (UTC)
From: [identity profile] sobrique.livejournal.com
Friday; beer soon.

Date: 2006-01-20 12:01 pm (UTC)
From: [identity profile] jorune.livejournal.com
What!?

WHAT?!

No backups, are they mad?

Are they sure it's a network capacity issue? Are they even sure what a network is?

Just when is this random act of management supposed to kick in?

Date: 2006-01-20 12:07 pm (UTC)
From: [identity profile] sobrique.livejournal.com
Backups have already been stopped. :/

They are not sure it's a capacity issue. That's only what myself and my colleague have been saying; we haven't had a consultant in, therefore we cannot have an official opinion on the matter.

We have a network bottleneck that's running at 40-80% all day. We have a backup server that's (over) running at 100GB/hour+ for 14-16 hours per day. With 8 tape drives, any 'glitch' has notable knock-on effects (such as a server with a bad network link tying up a tape drive for 16 hours).
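
For a rough sense of scale, the figures in this thread can be run through a back-of-envelope calculation. This is only a sketch: it assumes the link speeds quoted are megabits per second, that the backup volumes are decimal gigabytes and terabytes, that protocol overhead is zero, and that all backup traffic crosses a single link. None of that is confirmed by the post, and the function names are made up for illustration.

```python
def link_gb_per_hour(megabits_per_second: float) -> float:
    """Theoretical ceiling of a link in decimal GB per hour (assumes zero overhead)."""
    bytes_per_second = megabits_per_second * 1_000_000 / 8
    return bytes_per_second * 3600 / 1_000_000_000


def hours_to_move(terabytes: float, megabits_per_second: float) -> float:
    """Hours needed to push a given volume through one link at its ceiling."""
    return terabytes * 1000 / link_gb_per_hour(megabits_per_second)


# Link speeds mentioned in the thread, read as megabits per second (assumption).
LINKS_MBPS = {
    "100Mb server link": 100,
    "155Mb ATM backbone": 155,
    "gigabit": 1000,
}

if __name__ == "__main__":
    for name, mbps in LINKS_MBPS.items():
        ceiling = link_gb_per_hour(mbps)
        low = hours_to_move(2, mbps)   # 2 TB nightly volume
        high = hours_to_move(3, mbps)  # 3 TB nightly volume
        print(f"{name}: {ceiling:.2f} GB/hour ceiling, "
              f"{low:.1f}-{high:.1f} hours for 2-3 TB")
```

Under those assumptions a 155Mb link tops out at 69.75 GB/hour, which is below the 100 GB/hour quoted for the backup server. Some of that traffic presumably does not cross the backbone at all, so treat this as an upper bound on one path, not a diagnosis.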

Date: 2006-01-20 12:59 pm (UTC)
From: [identity profile] darkgodfred.livejournal.com
Is it wrong of me to suggest a data failure to demonstrate precisely why the God of Backups must be appeased?

Date: 2006-01-20 01:04 pm (UTC)
From: [identity profile] jorune.livejournal.com
Just what do they expect to happen when they need to go back to backup? I imagine the business's expectation will be - it'll just happen. What do your manager and director expect to happen? Knowing that each day that goes forward the business is not supported and stands to lose all their effort.

Presumably your manager and the director have gone back to the business and told them that if they want a faster network/no backups then they will have to accept the risk, in writing. Or would that be putting too much trust and credibility in the management?

Date: 2006-01-20 02:03 pm (UTC)
From: [identity profile] sobrique.livejournal.com
I have overused the:
If a car park is full, it is not faulty, it is just full.
If a road is congested, it is not faulty, it's just busy.
analogies.

Date: 2006-01-20 02:34 pm (UTC)
From: [identity profile] jorune.livejournal.com
I think Adam and myself are wondering whether there are service agreements in place. If so, does the choice to stop the backups now violate those service agreements?

Date: 2006-01-20 02:37 pm (UTC)
From: [identity profile] sobrique.livejournal.com
Probably. I'm planning to start the backups again this evening anyway, simply because I'm not prepared to accept being the scapegoat.

Date: 2006-01-20 01:21 pm (UTC)
From: [identity profile] xarrion.livejournal.com
Make sure you have it in writing/hardcopy that you've been told to stop backups. If it goes wrong now, you know who they're likely to blame ;)

I'd also raise the issue with management about correct error reporting procedure. Do you have Service Level Requests or equivalent business mumbo-jumbo over at your end?

Having said that, it could be one of those unintentional Chinese-whisper cascades you get with management, where you say 'this network is a bit slow', your boss overhears and goes to his manager with a 'my people can't work due to slow IT', and so forth.

Date: 2006-01-20 02:02 pm (UTC)
From: [identity profile] sobrique.livejournal.com
In writing? What, so I can prove they screwed up? I don't think so.
Chinese whispers is a definite possibility, especially when you have techy -> manager -> director -> director -> manager -> techy.

Which is one of the reasons We Don't Do It.

Date: 2006-01-20 02:29 pm (UTC)
From: [identity profile] malal.livejournal.com
I'm sorry, you need to turn around and say "Stopping the backups is such a huge breach in our data security policy that I need to have it in writing before I can implement it".

Your job & reference history could depend on it.

Date: 2006-01-20 02:44 pm (UTC)
From: [identity profile] sobrique.livejournal.com
Agreed.
I'm doing arse covering at the moment.
I _do_ have a fallback plan on how to recover, but it's ugly.
Oracle databases really suck to rebuild :/

Date: 2006-01-20 03:10 pm (UTC)
From: [identity profile] warmage.livejournal.com
Oh man do I feel you on the "we haven't had a talking head brought in to tell us what we already know" thing.

When was the last time you were asked to submit a buildup plan for disapproval? I recall in the last year you've added at least one storage appliance, and most of your kit is ready for gigabit or fibre-channel, right?

The way I feel it, the data is looming ever closer to crashdown, simply by the way the fates conspire to make it increasingly difficult to do effective recovery.

What happens if you start cold calling offsite data storage shops and get big fat blue-sky quotes from them to contrast against a backbone upgrade?


This may not be a terribly dangerous situation now, Ed. You know, though, that the length of time left before it is one is directly proportional to the MTBFR! (at this point you don't *really* trust this kit, so much as add another wad of chewing gum, aye?) And didn't you just come back up from some nightmare a couple weeks ago??

Date: 2006-01-20 03:22 pm (UTC)
From: [identity profile] sobrique.livejournal.com
We have a SAN. It's about 30TB these days. Our backup server was sized for the previous incarnation which was 9TB. This is excluding the 'server storage creep' where now 100-200GB of storage is the norm.
We have a _fair_ number of gigabit capable machines. However our backbone remains ATM, 155Mb. And our 'server' network is 95% 100Mb, with a few gigabits that don't do a lot of good, because they have nowhere to go.

Basically, we've already had the 'we're understaffed' meltdown, we're getting to the 'our kit is too old' meltdown.
