Worst deploy story?
-
OK, so we had a messy, messy day here with a deploy that went pear-shaped, so I was wondering if anyone has any true horror stories of deploys that went terribly, horrifyingly bad. The sort of thing where you no longer even visit that town because the Wanted posters are still flying from the street posts.
cheers Chris Maunder
While it's not specifically related to software deployment, the first thought that came to my mind is that anyone in the business of launching satellites and people into space has truly the worst deployment stories. Marc
Imperative to Functional Programming Succinctly | Higher Order Programming
-
Chris Maunder wrote:
I was wondering if anyone has any true horror stories of deploys that went terribly, horrifyingly bad.
In the late '80s, I was working for a company that was running two production systems on PDP-11/73s in a manufacturing environment. We had maxed out the memory and had code that would work in test, but not in production, because of the memory issues. The first system was product routing in the plant; the second was a storage system for receipt/delivery from/to the first system.

A project was created to rewrite the first system to run on a MicroVAX; similar technology, language, etc. The rewrite took a year with a team of people. When it was implemented (and there was no going back, only forward), it was discovered the code had not been compiled with array bounds checking. It hadn't been done in the old version because that would have used too much memory, and it was overlooked in the new system. Production ground to a halt for January and most of February. Changes had to be described on a paper form and signed off by on-site support before being implemented.

The next year, the second system was rewritten with the lessons learned. On implementation: round-the-clock support. Management asked when it was going to be installed; we said it was installed - two weeks ago. Much better.

I am a firm proponent of a post-mortem on all projects; see what worked, see what didn't, and learn from it. No finger pointing, just learning. Tim
-
Marc wrote:
anyone in the business of launching satellites and people into space has truly the worst deployment stories
Yeah. They win this game. :sigh:
cheers Chris Maunder
-
Chris Maunder wrote:
I was wondering if anyone has any true horror stories of deploys that went terribly, horrifyingly bad.
Great Thread. I love ghost stories.
-
We're about to deploy a major, major upgrade to the new system. I'm IT manager, and paranoid, so everything is being tested, tested, then tested again. This involves using various pre-deployment test databases to run the various update scripts against, then doing a system test against the updated database.

There were a lot of database schema changes. A lot. Somewhat more of a conversion than an update in some cases. And the Address table was particularly problematic - as we were introducing encoding using the Post Office's Unique Postal ID system - so as it converted the addresses to the new schema, it used the P.O. software to try to find the matching address. The ones thrown out were being manually cleaned by a team of cannon fodder, then the conversion run again - we'd agreed not to do a final convert until we could get a 95% hit rate. A contractor was assigned the job of tweaking and testing the conversion.

Everything else was ready, and we still had two days before the rollout, so there was a relaxed atmosphere. I'm sitting, chatting to one of the devs, and I notice the contractor standing at the end of my desk. I finish the chat and turn to the contractor, who says, "Maxxx. I just truncated the Address table." "That's OK," quoth I, "it's just the pre-deployment database, we can recreate it easy enou.." That's when I noticed his head was shaking, slowly, from side to side. "Production" he croaked. That's when my phone rang. 150 phone operators had suddenly lost every address in the system.

During the next 48 hours the contractor, myself, and one of the contractor's friends managed (just) to restore the Address table from a backup, using undocumented features of Oracle. Let me tell you, that was not in the least bit easy! And scary!

After I had bought beers, I suggested to the contractor that he might want to make the background colour of his Live vs Pre-Deployment windows a different shade - maybe FLASHING RED for the live database! It's funny, but although at the time it was very stressful, looking back it was actually bloody good fun! Time is a great healer!
PooperPig - Coming Soon
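A minimal habit that would have helped here: make the session itself tell you where it's pointed before anything destructive runs. A sketch in SQL Server terms (Maxxx's shop was on Oracle, where the rough equivalent queries V$INSTANCE, so treat this as illustrative only):

-- Sanity check before destructive work: which server, database,
-- and login is this window actually connected to?
SELECT @@SERVERNAME  AS server_name,
       DB_NAME()     AS database_name,
       SUSER_SNAME() AS login_name;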
_Maxxx_ wrote:
"Production" he croaked
Been there, done that. That feeling of icy chill that goes down your back is one you never forget.
cheers Chris Maunder
-
_Maxxx_ wrote:
everything is being tested, tested, then tested again
In my experience, that's generally not worth the trouble. Something will always go wrong; fix it when it does and move on.
-
Chris Maunder wrote:
I was wondering if anyone has any true horror stories of deploys that went terribly, horrifyingly bad.
Oh, and not a deployment story, but... There was a system I wrote that involved communication with a third-party product via a socket connection. I had a table in the database to hold all the messages to and from the socket. It quickly became rather large, so I decided to trim it down a bit, with something along the lines of
DELETE FROM messages WHERE [timestamp] < DATEADD(MONTH, -1, GETDATE())
in SSMS, and I sat there wondering when it would finish. Then the phone rang; it's the President of the Company: "the system is unresponsive, the call center is at a stand-still, are you doing anything that might be causing trouble?" Oops. :doh: Try terminating the DELETE, no go. :omg: Shut down my PC, reboot, get back into the database, see that everything is working again. I then wrote a feature that would delete a thousand messages, then sleep, and repeat until there were no more messages to delete. It took several days for the process to get the old messages cleaned out.
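That delete-sleep-repeat loop looks roughly like this (a sketch only, assuming SQL Server since SSMS is mentioned; the table and column names come from the story, while the batch size and delay are guesses):

-- Trim old rows in small batches so the table is never locked for long.
WHILE 1 = 1
BEGIN
    DELETE TOP (1000) FROM messages
    WHERE [timestamp] < DATEADD(MONTH, -1, GETDATE());

    IF @@ROWCOUNT = 0 BREAK;   -- nothing old left to delete

    WAITFOR DELAY '00:00:05';  -- let other work through between batches
END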
-
No great loss, he was a douchebag anyway!! Funny story: after he left, we had to clear out his desk, and in it found a handwritten list of porn titles, around half of which were crossed off... At least that explained the lunches away from his desk and returning smelling a little funky... :laugh: :laugh:
Quad skating his way through the world since the early 80's... Booger Mobile - My bright green 1964 Ford Falcon - check out the blog here!! | If you feel generous - make a donation to Camp Quality!!
-
Chris Maunder wrote:
I was wondering if anyone has any true horror stories of deploys that went terribly, horrifyingly bad.
I've two that vie for worst ever.

First one was a server system upgrade. We were dropping in a new SAN, changing from the old switch to an iSCSI fabric, and doing a major OS upgrade. All in the same night. Why are you groaning? My part was the (relatively) easy one; I had to make sure the databases were backed up and ready for shutdown, wait for the rest of the team to do the physical changes in the colo facility, and then initiate the upgrade/rollout to 800 thin-client machines throughout the building. The first part went just fine. The server techs pulled the plug on the switch... and that is when the magic happened.

They hadn't actually bothered to shut down any of the servers before yanking the switch. We were in a 100% virtualized environment using vSphere, and suddenly every server was basically disconnected simultaneously from the control systems. Our CPU on the servers redlined as the poor little VMs tried to figure out what had happened and fix it. Meanwhile, they're trying to dump stack traces and error logs to the database, to disk, to anywhere they can find a place... and nothing is responding. Of course, I have no idea this is happening, and the server techs are too busy trying to map a switch panel and replicate it in the fabric (I am not joking, they hadn't bothered to map this ahead of time) to notice.

Five hours later, the new fabric is in place and the team reconnects the servers. My monitoring systems are the first things back online, and they suddenly flood with the best part of 400 GB of error data and stack dumps. It was like getting hit with the biggest DDoS I've ever seen. Everything was screaming for attention, the fabric was actually misconfigured (although we didn't know at the time) and was sending packets in a round-robin loop to non-existent addresses, and the databases, hit with error-log write queries, slowed to a crawl. It took 3 hours to sort it out. Reboot everything. Flush error logs to disk. Kill db threads. You name it. And all of this masked the fabric errors. So when our lead server tech left the colo, he headed straight for vacation. Two hours later, he was somewhere in the air with his cell phone off... and we found the misconfiguration issue when people started arriving for work.

The other one started as a relatively benign little program that ran questionnaires for our call center. Basically, it was an "ask these questions and route through the top 50 most common issues" type of program. Neat little thing. Anyway, we were aske
-
Chris Maunder wrote:
I was wondering if anyone has any true horror stories of deploys that went terribly, horrifyingly bad.
My friend was on work experience from college with one of Ireland's largest banks about a decade ago. She was on the night shift, tired, deploying something, and a mistyped command accidentally shut down every ATM in the country. They had to wait ages for them all to come back online again.
-
_Maxxx_ wrote:
he might want to make the background colour of his Live vs Pre-Deployment windows a different shade
Ah yes, different colors for different systems. We had terminals and a switch box. I was working on Dev and ran out for a printout. My tests were great, so I entered the command to DELETE my test data. The phones started ringing. I picked up: the ENTIRE list of GMAC car loans was missing from Florida. Strange, I thought, that's the account I just deleted on Dev. Let me turn my switch box over to production Florida... Oh, strange, why is it ALREADY on Florida? OMG. Run into the boss's office, call an emergency meeting. Contact operations, start the restore process from last night's backup. Gather all the work done for the day and set it aside to be re-processed. OMG, what a mistake. Later, someone else killed Texas with the same command. I ultimately patched the operating systems so that the command would FAIL on production; a strange version of the command had to be used instead. Never happened again.
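That last fix - making the dangerous command refuse to run on production - is easy to mimic at the SQL level. A sketch only (the server and table names here are invented, and the original was an OS-level patch, not SQL):

-- Hypothetical guard: the destructive statement refuses to run
-- anywhere but the named dev server.
DECLARE @testAccount INT = 12345;        -- invented test account id

IF @@SERVERNAME <> 'DEV-BOX-01'          -- invented dev server name
BEGIN
    RAISERROR('This is not the dev server - refusing to delete.', 16, 1);
    RETURN;                              -- stop the batch here
END

DELETE FROM loans WHERE account_id = @testAccount;  -- only reached on dev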
-
agolddog wrote:
"Oh, you guys were using metric units?"
Yup. I remember that one. Marc
Imperative to Functional Programming Succinctly | Higher Order Programming
-
Chris Maunder wrote:
I was wondering if anyone has any true horror stories of deploys that went terribly, horrifyingly bad.
Wasn't me, thank goodness, but it was at an organization I used to work at. A co-worker upgraded a SAN storage system on a Tuesday afternoon. Everything crashed. Had he verified the backups? Nope. No backup. Everyone lost three months' worth of work, and it was two weeks before the SAN was finally up again. Not sure whatever became of the co-worker. Never saw him again.
To err is human; to really mess up you need a computer
-
_Maxxx_ wrote:
maybe FLASHING RED for the live database!
Yep, I trashed the Inspections system for a property management company once that way - and I had my systems color-coded and everything. I was at the tail end of a 32-hour shift, had just finished my God-only-knows-what-number cup of break-room coffee, and hit execute before I had really checked the command well. The only thing that saved me there was that I realized immediately what I had done and broke the backup-imaging process, so I didn't lose the same tables in the backup system. Then I copied the data back over and restored the process. Twelve-minute outage, and enough adrenaline to light up a city block...
There are lies, there are damned lies and then there are statistics. But if you want to beat them all, you'll need to start running some aggregate queries...
-
Chris Maunder wrote:
I was wondering if anyone has any true horror stories of deploys that went terribly, horrifyingly bad.
I have cleaned up after plenty of people who have had deployments go sideways. So far, knock on wood, I have always had a good back-out plan.
Common sense is admitting there is cause and effect and that you can exert some control over what you understand.
The worst deployment I remember involved an act of nature in the middle of a software upgrade at a client's site. The installation required a copy of the client's data on tape for reformatting of certain data files. As luck would have it, the building was struck by lightning, causing damage to the tape in the middle of reloading a critical index. For reasons as yet unknown, we discovered that the client had not backed up their data for three months prior to the incident. Since I was on duty during the holiday weekend, I was stuck with the task of salvaging the damaged index file from the raw data that survived... a seventeen-hour marathon.
-
Chris Maunder wrote:
I was wondering if anyone has any true horror stories of deploys that went terribly, horrifyingly bad.
Got a phone call in the middle of the night once -- system crashed, and since my tools had been used to convert it to the new OS, my name was on the on-call list. Unbeknownst to me, I was in the middle of an appendicitis attack in my sleep. I can't really describe how awful I was feeling while trying to help them with a problem that I had no clue about (turned out to be buggy filesystem code).
We can program with only 1's, but if all you've got are zeros, you've got nothing.