CodeProject outage
-
At around 2AM this morning things started to go a little pear-shaped with our servers. Our tech guys were looking into it, trying to work out what was happening, but it became an all-hands exercise in trying to work out what broke what.

We had updated CodeProject's code, so we redeployed, cleaned and deployed, rolled back and cleaned and deployed. It wasn't our code. Our requests per second were a little crazy. As in 1000x what we normally get, but nothing that set off the DoS alarms. Things were adjusted to ease the load but the load remained uneased. Finally, with zero load we still had the site pinned. It turned out the firewall needed replacing. Firewall fixed, load reduced, site back up. Mostly.

There was also a series of Windows patches installed as part of routine maintenance. These were to do with HTTP security, so they naturally got some attention. Uninstalling Windows patches can be painful, but once one of the patches was removed everything popped back up. However, that was only on one of the servers, so that patch, since it's a security update, will be reinstalled, and if it causes issues the entire server will be binned and a new one rolled in.

So: fun and games at CodeProject central, and I apologise for being down for so long. This was a bit of a trifecta, but we're now into the mopping-up and analysis stage, so :beer: all round.
cheers Chris Maunder
I felt like a crack addict ... was missing my CP fix... Nothing worse than when things go tits up! Glad to hear that you and the team have it under control. :beer: :beer: :beer:
Graeme
"I fear not the man who has practiced ten thousand kicks one time, but I fear the man that has practiced one kick ten thousand times!" - Bruce Lee
-
The hamsters probably needed a breather anyway.
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
-
Sounds like my morning, except on a much, much smaller scale. And it was the cable company coming and going.
-
Still running on AWS? This happens to us sometimes too.
-
Must... not... tell... war... story... It involved a customer replacing the 1Gb hub we have in our equipment with a 100Mb hub they had lying around. They had the audacity to (1) not tell us what they had done, (2) lie when we asked them point blank what they had changed, and (3) try to hide the evidence when a [very sleepy after a 10-hour drive] field service dude showed up. Irrelevant side note: the outage must be Jeremy's fault, since he raised the subject of uptime a few threads down.
Software Zen:
delete this;
Been there, done that :laugh: A customer called and told me things didn't work. I asked them if anything at all had happened that could've caused this? Nope, nothing had happened. Really, nothing you can think of? Nothing. This customer rarely called, so I wasn't too familiar with their software and setup. After a few hours of looking into it I found nothing. When I called them back, the IT manager "suddenly remembered" they'd restarted the server "but that can't be it, right?". We fixed the problem five minutes later :doh:
Best, Sander Azure DevOps Succinctly (free eBook) Azure Serverless Succinctly (free eBook) Migrating Apps to the Cloud with Azure arrgh.js - Bringing LINQ to JavaScript
-
Things go sideways sometimes. A bit of adrenaline to get the blood pumping. Sounds like you’ve got it figured out. Cheers, and thanks! :java:
Time is the differentiation of eternity devised by man to measure the passage of human events. - Manly P. Hall Mark Just another cog in the wheel
-
Gary R. Wheeler wrote:
Irrelevant side note: the outage must be Jeremy's fault, since he raised the subject of uptime a few threads down.
I thought the same :rolleyes: :laugh: :laugh:
M.D.V. ;) If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about? Help me to understand what I'm saying, and I'll explain it better to you Rating helpful answers is nice, but saying thanks can be even nicer.
-
Obviously it was tongue in cheek, but seriously, Windows updates too often break stuff, and Murphy's law ensures that they break said stuff at the most inconvenient times.
obeobe wrote:
Obviously it was tongue in cheek, but seriously,
The problem is that some people make that sort of suggestion while being absolutely serious.
obeobe wrote:
Windows updates too often break stuff
Can't argue there...
obeobe wrote:
they break said stuff at the most inconvenient times.
Is there ever a convenient time during which updates should be okay to break stuff...?
-
dandy72 wrote:
Is there ever a convenient time during which updates should be okay to break stuff...?
When I am on vacation and unreachable?
-
...so less qualified people rush their own fix that you then inherit when you come back...?
-
Chris Maunder wrote:
if it causes issues the entire server will be binned and a new one rolled in
So you know, Chris, ever think about hosting CP on Debian? :-\
Jeremy Falcon
We have this awful albatross of a webforms project that ruins the fun. .NET Core, Linux, PostgreSQL or MariaDB and I'd be so happy and our costs would be dramatically lower. Sigh.
cheers Chris Maunder
-
Gary R. Wheeler wrote:
It involved a customer replacing the 1Gb hub we have in our equipment with a 100Mb hub they had lying around.
10% of capacity? But what could go wrong? ;)
cheers Chris Maunder
-
Especially on a machine printing both sides of the paper at 17 feet per second, full color, front and back.
Software Zen:
delete this;