
November 06, 2007

Availability, availability, availability.

As this is my first entry on Cobweb’s blog since we started it, I’d like to introduce myself first: I am the Technical Manager at Cobweb, responsible for the Technical Team and the systems that they support. I joined almost a year ago, having previously worked mostly in large corporate organisations. Going from a company where I was one of 75,000 to one where I know something about everyone who works here was certainly a change from what I was used to! I’ve really enjoyed the difference though, and the fact that I can hardly believe I’ve been with Cobweb that long goes some way to explaining how exciting the last year has been. So for my first entry I thought I’d write about a subject close to the heart of my role at Cobweb: availability.

According to the Compact Oxford English Dictionary, ‘available’ describes a thing “able to be used or obtained”, and for us of course that means our systems being able to be used by our customers. If you’re interested in the numbers, a 30-day month has 43,200 minutes, so to meet an availability of 99.5%, unplanned downtime must not exceed 216 minutes, or 3.6 hours. At 99.9% it must not exceed 43.2 minutes, and at 99.99% it must not exceed 4.32 minutes.
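For anyone who wants to check the arithmetic or try other targets, here is a minimal sketch in Python. The function name `downtime_budget_minutes` and its `days` parameter are my own illustration, not part of any Cobweb tooling, and it assumes availability is measured over every minute of the month.

```python
def downtime_budget_minutes(availability_pct, days=30):
    """Return the unplanned downtime (in minutes) allowed in a month
    of `days` days while still meeting `availability_pct`."""
    total_minutes = days * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (100 - availability_pct) / 100


if __name__ == "__main__":
    # Print the budget for the three targets discussed above.
    for target in (99.5, 99.9, 99.99):
        budget = downtime_budget_minutes(target)
        print(f"{target}% -> {budget:.2f} minutes ({budget / 60:.2f} hours)")
```

Passing `days=31` gives the slightly larger budget for a longer month.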

The last three months at Cobweb have been a journey as far as this is concerned, with a difficult month in August followed by the best result of the year in September and a very good service in October: 99.78%, 99.99% and 99.95% respectively as averages for Hosted Exchange services. In August we suffered a hardware issue on one Exchange platform, the first in my eleven months at Cobweb, and repeated problems with non-paged pool memory on another. The former required an engineer to resolve the issue and restore redundancy; the latter required an initial reduction in mailbox numbers that night to bring the service within limits.

Both of these events in August served as a significant alert to me and the operational teams at Cobweb, especially after speaking first-hand with customers who were affected and disappointed by both outages. So what have we been doing since? Well, we have made a number of changes to process and service content, from simple items such as putting up-to-date availability information centre stage in front of the Technical Team, to working with Microsoft and carrying out our own lab research on how our platforms behave. Last week, for example, following lab testing, we replaced the network cards in an HP server on one platform with Intel equipment and gained an immediate reduction in non-paged pool memory usage, something we know affects stability when it increases.

Based on the figures, this activity paid off in September and October, with improvements across all systems and no repeat of these issues. And we haven’t finished yet. The change of network cards, for example, will be rolled out across the other clusters, and I’m confident that there will always be something else we can do to improve availability further. I think this is the greatest lesson of availability, as it is in business: no matter how good you are today, you need to be better and sharper tomorrow to stay ahead.
