Blackberry outage

Mario Luis

So as most know, RIM had a huge outage yesterday, with their UK server going down and leaving most of EMEA without service and in floods of tears ;P Besides programming I also handle some servers, especially our SQL and web servers. I'm wondering if fellow CP'ians are in a similar position, and what did they think about yesterday's outage? I would have thought that there would be redundancy within the data center and, if necessary, redundancy across multiple data centers with failover. That the service was down for so many hours was a huge failure on their part. Does anyone have any thoughts, ideas or knowledge as to why it took them so long to restore service, or how to prevent something similar from happening again, there or in a similar environment? I'm a happy BlackBerry user, so this is in no way a RIM troll/burn/stab etc.

Nagy Vilmos

Mario Luis wrote:

I'm a happy blackberry user

So you like RIMming?

Panic, Chaos, Destruction. My work here is done. Drink. Get drunk. Fall over - P O'H OK, I will win to day or my name isn't Ethel Crudacre! - DD Ethel Crudacre I cannot live by bread alone. Bacon and ketchup are needed as well. - Trollslayer Have a bit more patience with newbies. Of course some of them act dumb - they're often *students*, for heaven's sake - Terry Pratchett

Mario Luis

Lol, if you mean sugar on a cocktail glass, yes :-D

Lost User

I think if you put anything in Slough it is going to have a massive breakdown before too long. The server no doubt went on a massive drinking binge and tried to fight a few people before slumping in a corner and crying the rest of the night away. (This only works if the initial report I saw yesterday blaming the problem on something in Slough is true, otherwise ignore). (If you have no idea what Slough is or what it is like, under no circumstances should you try to rectify that)

Every man can tell how many goats or sheep he possesses, but not how many friends.

DanHodgson88

Lol, if I could rate this 10 I would!

BobJanova

I think perhaps the way their encrypted communications work makes it hard to have a redundant data centre that can just be switched in. But yes, in general, a major company like that should never have outages measured in hours.

Nagy Vilmos

ChrisElston wrote:

If you have no idea what Slough is or what it is like, under no circumstances should you try to rectify that

John Betjeman wrote:

Come, friendly bombs, and fall on Slough!
It isn't fit for humans now,
There isn't grass to graze a cow.
Swarm over, Death

Enough said.


Mario Luis

I'd have thought it would merely be a routing change? True, the redundant server would have to be either a mirror or a subscription that may be slightly out of date, but that would just mean losing a handful of messages during the transition, and those should be few and far between. I'd actually love to see their redundancy policy and how they handle node failure. For all we know they had one in place but, as in similar incidents, nobody tested it ;P
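To make the "routing change plus a tested standby" idea concrete, here is a minimal Python sketch. It says nothing about how RIM's infrastructure actually works; `Node`, `FailoverRouter`, `failover_drill` and the node names are all hypothetical, and a real setup would also have to deal with replication lag and encrypted sessions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical message relay node (primary or standby)."""
    name: str
    healthy: bool = True
    delivered: list = field(default_factory=list)

    def send(self, message: str) -> None:
        # A node that is down refuses traffic instead of silently dropping it.
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        self.delivered.append(message)


class FailoverRouter:
    """Routes each message to the first healthy node, in priority order."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def route(self, message: str) -> str:
        for node in self.nodes:
            try:
                node.send(message)
                return node.name  # report which node served the message
            except ConnectionError:
                continue  # fall through to the next node in priority order
        raise RuntimeError("all nodes are down")


def failover_drill(router: FailoverRouter) -> bool:
    """Take the primary down on purpose and check that traffic still flows."""
    primary = router.nodes[0]
    was_healthy = primary.healthy
    primary.healthy = False
    try:
        return router.route("drill") != primary.name
    except RuntimeError:
        return False  # no standby could take over
    finally:
        primary.healthy = was_healthy  # always restore the primary's state


if __name__ == "__main__":
    # Node names are made up for illustration.
    router = FailoverRouter([Node("uk-primary"), Node("standby")])
    print("failover works:", failover_drill(router))
```

The drill is the part that matters for the "nobody tested it" scenario: a standby only counts as redundancy if you regularly pull the primary on purpose and confirm the traffic actually moves.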