I have vanquished the creeping horror
-
My takeaway is "don't write things in assembly" :laugh:
Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix
I wouldn't necessarily argue! :-D In this particular product there's a lot of time-sensitive code, and over the years it's grown A LOT.
"the debugger doesn't tell me anything because this code compiles just fine" - random QA comment "Facebook is where you tell lies to your friends. Twitter is where you tell the truth to strangers." - chriselst "I don't drink any more... then again, I don't drink any less." - Mike Mullikins uncle
-
Now imagine weeks of debugging and seconds for the actual fix (after the problem was identified and well understood).
True, it's tough when you are going through it; for me, sleepless nights are the usual result!
dandy72 wrote:
after the problem was identified and well understood
To me there's a certain satisfaction to that, as rough as the road was to get to that point (and maybe learning a thing or two about better implementation, as I did).
-
We have a customer for whom we've done a [great steaming] pile of custom work. A couple of times a year they find a problem with the custom stuff. It always turns into a wretched slog through a Lovecraftian swamp of bubbling ichor (i.e. legacy code) I didn't write but am now required to maintain. Three months ago they reported an issue with one of their features in the new generation of product that didn't happen with the old one. I compared the code between the two and it was identical. I've spent considerable hours debugging through the code for the feature. It turns out the new code is looking in the wrong place in the registry to see if their custom features are enabled :doh: . The old code only worked accidentally. Cue the fireworks a day early, and let the naked happy dance :jig: commence!
Software Zen:
delete this;
-
Unfortunately no :sigh:, at least in my day job.
-
Wait, you said the code was identical between the new and old versions of the product. So how is there a difference that caused it to look in the wrong place? UNLESS, it was always looking in the wrong place, and changes to the surrounding code put the information into a different place that it was unable to find by accident? OR, are you saying that the code that just turned the feature on didn't work, and the code of the feature itself was unchanged?
The difficult we do right away... ...the impossible takes slightly longer.
-
Whew.
-
The difference was in a class used throughout the product to access the registry. This component (a Windows service) used that class incorrectly in the old product but still managed to find the values at the appropriate key. When the service was migrated to the new product (which changes the registry key used), its incorrect usage of the registry class caused it to look at the wrong registry key and not find the required values. Both the service and the registry class looked correct on inspection. It wasn't until I stepped through the service that I discovered it was assuming certain things about the registry class that weren't true and never had been. FWIW, I didn't write either of them.
-
Yup :-D.
-
True, it's tough when you are going through it; for me, sleepless nights are the usual result!
dandy72 wrote:
after the problem was identified and well understood
To me there's a certain satisfaction to that, as rough as the road was to get to that point (and maybe learning a thing or two about better implementation, as I did).
I get what you're saying, but the last time such a thing happened to me it was a case of a missing quote in a string that was being built with multiple layers of escaped quotes. So of course the compiler didn't know any better and was of no help. There was nothing satisfying about solving that particular problem. Just annoyance at whoever last modified the string in the first place...
-
jeron1 wrote:
Twas a happy day when I found it :beer:, and as important, a big learning experience
Same here. I've had other memorable bugs that were excruciating to recreate and diagnose. One was a GDI handle leak that took over a week of run time to show up and crash the application. Another was a piece of embedded code where the TCP/IP code we bought back in 1995 did not re-initialize properly after a network hardware error. Both of these took weeks of debugging to find and reproduce and only a couple of hours to correct.
-
My favorite was a 49-day crash bug caused by a 32-bit timer rollover. Management wanted to know why test did not find this bug.
I have a similar story about a once-per-49-days issue. Takes a long time to run those experiments, and to be patient enough to not interrupt them for some other test of the system.
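For anyone who hasn't been bitten by this one, here is a rough sketch of how a 49-day bug of this kind can arise. It assumes a free-running 32-bit counter that ticks once per millisecond (the posts above don't say what the tick rate was, so that part is a guess), and all the function names are made up for illustration. The 32-bit wrap is simulated with a mask, which is what unsigned 32-bit arithmetic would do on its own in C or C++.

```python
MASK32 = 0xFFFFFFFF  # a 32-bit counter holds values 0 .. 2**32 - 1

# Assumption: one tick per millisecond, so the counter wraps after ~49.7 days.
ROLLOVER_DAYS = 2**32 / (1000 * 60 * 60 * 24)


def naive_expired(now, start, timeout):
    """Buggy pattern: compares 'now' against an absolute deadline that can wrap."""
    deadline = (start + timeout) & MASK32
    return now >= deadline


def elapsed(now, start):
    """Wrap-safe elapsed ticks: subtract first, then reduce modulo 2**32."""
    return (now - start) & MASK32


def safe_expired(now, start, timeout):
    """Compares elapsed time against the timeout, so a wrap does not matter."""
    return elapsed(now, start) >= timeout


# A timer started 100 ms before the counter wraps, with a 500 ms timeout.
start = MASK32 - 99            # 100 ticks before rollover
now = (start + 10) & MASK32    # only 10 ms later, still before the wrap

print(naive_expired(now, start, 500))  # True: fires 490 ms too early
print(safe_expired(now, start, 500))   # False: only 10 ms have elapsed
```

The naive version only misbehaves in the short window around the wrap, which is why it can take a month and a half of continuous run time to show up. The subtraction-based check stays correct as long as the real interval is shorter than one full wrap of the counter.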