sobrique: (Default)
[personal profile] sobrique
Right. I'm trying to diagnose a problem with our backup server.
I have a log file that tells me the start time, finish time, 'quantity' of data backed up and number of files, along with things like filesystem and backup 'level' (incremental or full).

The problem is, I _know_ it's running 'a bit slow'. And I need to figure out how best to represent this information, such that it's 'obvious' if there's a particular culprit.

My suspicion is that we're just 'overloaded' and need more tape drives, but they're not cheap, and so if we go that route, it'd better fix the problem...

Anyone got thoughts on the subject?

Date: 2004-09-14 06:03 am (UTC)
From: [identity profile] xarrion.livejournal.com
With that info, I'd probably go back a while (provided you keep historical logs like the one you've described) and calculate the average backup speed (perhaps one average for quantity, another for files). Then dump it all into a spreadsheet, create a graph and see if there's a certain day when the backup time 'spikes'. Or do it by eye, but it'd be harder to spot, I'd imagine.
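If a spreadsheet feels clunky, the same averages can be roughed out in a few lines of Python. This is only a sketch: the real log layout isn't shown in this thread, so the record fields, the sample numbers and the `slow_jobs` helper are all made up for illustration.

```python
from datetime import datetime
from statistics import median

# Hypothetical records: (filesystem, level, start, finish, megabytes, files).
# The field order and values are assumptions, not the real log format.
JOBS = [
    ("hostA:/export", "full", "2004-09-11 18:00", "2004-09-11 19:00", 36000, 120000),
    ("hostA:/export", "full", "2004-09-12 18:00", "2004-09-12 19:00", 36000, 120000),
    ("hostA:/export", "full", "2004-09-13 18:00", "2004-09-13 22:00", 36000, 120000),
]
FMT = "%Y-%m-%d %H:%M"


def rates(job):
    """Return (MB per second, files per second) for one job."""
    _fs, _level, start, finish, mb, files = job
    elapsed = datetime.strptime(finish, FMT) - datetime.strptime(start, FMT)
    seconds = elapsed.total_seconds()
    return mb / seconds, files / seconds


def slow_jobs(jobs, factor=0.5):
    """Jobs whose MB/s is below factor * median for the same filesystem and level."""
    groups = {}
    for job in jobs:
        groups.setdefault((job[0], job[1]), []).append(job)
    flagged = []
    for group in groups.values():
        typical = median(rates(j)[0] for j in group)
        flagged.extend(j for j in group if rates(j)[0] < factor * typical)
    return flagged


for fs, level, start, *_ in slow_jobs(JOBS):
    print(f"{start}  {fs} ({level}) ran well below its usual rate")
```

Comparing each run against the median for the same filesystem and level keeps a slow incremental from hiding behind a fast full, which a single overall average would not.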

Date: 2004-09-14 06:10 am (UTC)
From: [identity profile] sobrique.livejournal.com
Ah, I can _do_ a throughput graph. But the problem I've got is that 4 tape drives and 'backup sets' mean I get multiple concurrent backups, which makes it hard to track which one is slow, especially if one is incremental and the other 'full'.
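One way to untangle the concurrency, sketched in Python under some loud assumptions: the job tuples below are invented, times are minutes from the start of the backup window, and `concurrency` and `report` are hypothetical helpers. The idea is just to put each job's own rate next to how many jobs it shared the drives with.

```python
# Hypothetical jobs: (label, level, start_minute, end_minute, megabytes).
# All values are made up; the real logs would need parsing into this shape.
JOBS = [
    ("hostA:/export", "full", 0, 60, 36000),
    ("hostB:/home", "incr", 10, 30, 1200),
    ("hostC:/var", "incr", 20, 50, 900),
    ("hostD:/opt", "full", 70, 100, 18000),
]


def concurrency(job, jobs):
    """Count jobs (including this one) that overlap this job at any point."""
    _, _, start, end, _ = job
    return sum(1 for (_, _, s, e, _) in jobs if s < end and start < e)


def report(jobs):
    """Return (label, level, overlapping job count, MB per minute) per job."""
    rows = []
    for job in jobs:
        label, level, start, end, mb = job
        rows.append((label, level, concurrency(job, jobs), mb / (end - start)))
    return rows


for label, level, overlap, rate in report(JOBS):
    print(f"{label:15} {level:5} overlapping={overlap} {rate:8.1f} MB/min")
```

If fulls that ran alone are consistently faster than fulls that ran alongside three other jobs, that points towards contention rather than one bad filesystem.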

got thoughts

Date: 2004-09-14 06:36 am (UTC)
From: [identity profile] erisreg.livejournal.com
Are you keeping just event logs, or are you doing full logging with PID and timing? With the full info you can track the bottlenecks that are happening and pinpoint their cause. 0.0

Re: got thoughts

Date: 2004-09-14 07:15 am (UTC)
From: [identity profile] sobrique.livejournal.com
At the moment we're keeping 'backup logs', e.g. job schedule, start time, throughput, ufsdump level etc.

At the moment, I'm aiming for a staggered chart looking something like this:
18:00-------19:00-----20:00
---                           HOSTNAME:/FS
 --                           HOSTNAME:/FS
 --                           HOSTNAME:/FS
   ----                       HOSTNAME:/FS  
   --                         HOSTNAME:/FS 
   --------                   HOSTNAME:/FS
     --- 
    
Throughput
18:00-------19:00-----20:00
  |    | 
 |||   ||
 ||||  || 
|||||  ||

OK, crappy formatting, I know ;p And it'll probably not turn out right when I submit this. The idea is that I can try to correlate 'slow' jobs with throughput troughs.
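A rough Python sketch of both charts, for what it's worth. The jobs, the 10-minute column width and the function names are all invented for illustration, and the throughput line assumes each job writes at a constant rate for its whole runtime, which real ufsdump jobs almost certainly don't.

```python
# Hypothetical jobs: (label, start_minute, end_minute, megabytes), with minutes
# counted from the start of the backup window (0 = 18:00). Assumed data.
JOBS = [
    ("hostA:/export", 0, 30, 18000),
    ("hostB:/home", 10, 30, 6000),
    ("hostC:/var", 30, 70, 4000),
]
BUCKET = 10  # minutes per chart column (an arbitrary choice)


def gantt_rows(jobs, width):
    """One text row per job: dashes in the columns where the job was running."""
    rows = []
    for label, start, end, _mb in jobs:
        first = start // BUCKET
        last = -(-end // BUCKET)  # ceiling division
        bar = " " * first + "-" * (last - first)
        rows.append(bar.ljust(width) + "  " + label)
    return rows


def throughput_per_bucket(jobs, width):
    """MB written per column, assuming each job writes at a constant rate."""
    totals = [0.0] * width
    for _label, start, end, mb in jobs:
        rate = mb / (end - start)  # MB per minute
        for b in range(width):
            lo, hi = b * BUCKET, (b + 1) * BUCKET
            overlap = max(0, min(end, hi) - max(start, lo))
            totals[b] += rate * overlap
    return totals


WIDTH = 7
print("\n".join(gantt_rows(JOBS, WIDTH)))
print(throughput_per_bucket(JOBS, WIDTH))
```

Lining the per-column totals up under the job rows makes it easier to see which jobs were running during a trough.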

Full logging is about plan D at the moment, because the amount of rubbish something like a truss will grab is going to be horrible. We're talking maybe a Terabyte a night here, so...
