When you only have a production environment to test your code
-
Recently I had to integrate an AGV controller with a .NET application over a TCP/IP socket. The factory uses it to control robots on the factory floor. The only issue is that the factory is working at 100% capacity (by the way, it produces water heaters) and there is only one controller, which I can use only when one of the production lines is not using it. Everything worked OK when I tested it and during UAT, but strange issues kept happening, and the only way to test them was to put my untested code in the production environment. Finally I was able to figure out that the controller gives up the connection once in a while, failing to recognize a message from .NET or the other way around. I still have no idea why it does that, but at least I was able to provide a solution that reconnects and resends. Deploying untested code in production was scary. What would you do when there is no way to replicate situations happening in the production environment?
Zen and the art of software maintenance : rm -rf * Math is like love : a simple idea but it can get complicated.
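For anyone curious what "reconnect and resend" can look like, here is a minimal sketch in Python rather than .NET. It is not the code from the post. The class name `ResendingClient`, the `connect` callable and the retry parameters are all made up for illustration, and it assumes the controller treats a repeated message as harmless, which may not hold for every command.

```python
import time


class ResendingClient:
    """Hypothetical helper: send a message, wait for a reply, reconnect and resend on failure."""

    def __init__(self, connect, max_attempts=3, retry_delay=0.0):
        self._connect = connect            # callable returning an object with sendall/recv/close
        self._max_attempts = max_attempts
        self._retry_delay = retry_delay    # seconds to wait before the next attempt
        self._conn = None

    def request(self, message: bytes) -> bytes:
        last_error = None
        for _ in range(self._max_attempts):
            try:
                if self._conn is None:
                    self._conn = self._connect()
                self._conn.sendall(message)
                reply = self._conn.recv(1024)
                if not reply:
                    # An empty read means the peer closed the connection.
                    raise ConnectionError("peer closed the connection")
                return reply
            except OSError as exc:         # covers resets, timeouts and closed peers
                last_error = exc
                self._drop()
                time.sleep(self._retry_delay)
        raise ConnectionError(
            f"gave up after {self._max_attempts} attempts"
        ) from last_error

    def _drop(self):
        """Throw away the current connection so the next attempt reconnects."""
        if self._conn is not None:
            try:
                self._conn.close()
            except OSError:
                pass
            self._conn = None
```

With a real socket the `connect` argument could be something like `lambda: socket.create_connection((host, port), timeout=1.5)`, where the host, port and timeout are whatever your installation needs.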
-
I work in a similar environment where there is one "machine": a fully automated manufacturing process with robots, vacuum chambers, heaters, gauges, etc. At times I cannot get to this equipment because it is under test, and this keeps me from testing my code base. I finally bit the bullet and created a number of virtual machines which are deployed onto other computers. These vMachines use the identical messaging systems (TCP/IP, RS232, RS485), the same infrastructure used in production. The nice thing about my virtual machines is that they can be set up to fail for specific reasons. Also, the virtual machines do not need to reproduce the full list of features, just the areas that you need to test.
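As a rough illustration of a virtual machine that can be set up to fail on purpose, here is a small Python sketch of a fake device listening on TCP. Everything in it is an assumption made for the example: the line-based `ACK` protocol, the `FakeController` name and the `drop_every` knob are invented, and a real stand-in would mimic whatever the actual equipment speaks.

```python
import socketserver
import threading


class _Handler(socketserver.StreamRequestHandler):
    def handle(self):
        for raw in self.rfile:                       # one message per line (assumed protocol)
            line = raw.rstrip(b"\r\n")
            if self.server.should_drop(line):
                return                               # simulate the device giving up the connection
            self.wfile.write(b"ACK " + line + b"\n")


class FakeController(socketserver.ThreadingTCPServer):
    """Hypothetical stand-in for the real device, with an injectable fault."""

    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address=("127.0.0.1", 0), drop_every=0):
        super().__init__(address, _Handler)
        self.drop_every = drop_every   # 0 = never fail; N = drop the connection on every Nth message
        self.received = []             # everything the simulator has seen, handy for assertions
        self._count = 0
        self._lock = threading.Lock()

    def should_drop(self, line):
        with self._lock:
            self.received.append(line)
            self._count += 1
            return self.drop_every > 0 and self._count % self.drop_every == 0

    def start(self):
        """Serve in a background thread and return the (host, port) actually bound."""
        threading.Thread(target=self.serve_forever, daemon=True).start()
        return self.server_address
```

Binding to port 0 lets the operating system pick a free port, so several of these can run side by side on a test box. Only the behaviour under test needs to exist; here that is "acknowledge, but hang up on every Nth message".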
-
I used to program computer-controlled conveyors, and there was only one device to use: the production conveyor. They build only one. Initially the programmers were too worried about processor headroom and would only put static menus on the screens, all the while saying that the tests they had run showed the conveyor was only consuming 10% of the processor. (12 MHz days.)
My programming buddy and I started putting status displays on the screens that displayed every internal number we thought would be useful, even though some screamed by too fast to be of any use unless the program crashed, in which case the displayed numbers gave us a hint of what had been going on. Once we got that to work, I advanced the art by building screens with graphical displays (IBM text graphics) in the shape of the conveyor that showed photoeyes being blocked and diverters being fired. I turned one of the unreadable numbers, the index into the internal pseudo belt that we tracked the product through, into a feet-per-minute display. We once had a conveyor failing because for some reason the motor speed was wrong, and that display became a handy way to verify speed without having to put a manual tach against the hub of the motor. I also displayed numerical messages that came from the PLC by expanding each message into English on a scrollable subsection of the status display.
We were running DOS 3.3 and used the multitasker the FORTH language vendor had built into their implementation. We had 10 tasks running, and occasionally a task would die and we wouldn't know which one. So I modified the multitasker to have a flag that, when set, would display the name of the task currently in control on the 25th line of the CRT. That way the name of the offending task would be on the screen when it died.
I also modified the multitasker from being round robin to support priorities by adding a parameter that said how many times around the task loop it should wait before executing instead of immediately surrendering control. That let me put the status display to a lower priority than say, the communications handler. So in summary, start adding status displays to your controller system that can help you narrow down failure points. It also helps to have a buddy working with you. Since these systems were real time, we couldn't single step the code. We ended up sounding like doctors comparing symptoms. Maybe it's "X", but if it's "X", we should be seeing "Y", and we're not, so it's got to be "Z", and we'd eventually narrow the reasons for fa
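To make the two multitasker tweaks concrete, here is a loose sketch in Python, not the original FORTH. It shows a round-robin loop where each task has a count of passes to sit out, plus a hook that reports which task is about to run. The names `Task`, `run_loop`, `skip` and `on_switch` are invented for the example, and the exact semantics of the original parameter are an assumption.

```python
class Task:
    def __init__(self, name, step, skip=0):
        self.name = name
        self.step = step        # callable run once per turn
        self.skip = skip        # passes around the loop to sit out between turns
        self._waiting = 0


def run_loop(tasks, passes, on_switch=None):
    """Round-robin over tasks; a task with skip=N runs on every (N+1)th pass."""
    for _ in range(passes):
        for task in tasks:
            if task._waiting > 0:
                task._waiting -= 1       # low priority: surrender control immediately
                continue
            task._waiting = task.skip
            if on_switch:
                on_switch(task.name)     # e.g. paint the task name on a status line
            task.step()
```

Because `on_switch` fires before the task runs, whatever name it last showed is the task that was in control when something died. A task with `skip=0` runs on every pass, so a communications handler can stay at 0 while a status display gets a larger value.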
-
Add lots of logging until you are sure you understand what the real problem is (although sometimes you have to take your best guess or add code to fix the symptoms without finding the root cause in a truly complicated system). I worked on an automated conveyor system at one time. One customer kept having their box counts off and the shipping company was so aggravated they threatened not to ship for the company any more. The programmer tried lots of things to fix the "faulty sensors". The problem was finally solved when he was at the company late one night and saw a bored employee playing with the blinking lights. The employee had no idea it was throwing the shipping counts off. Production always seems to come up with some combination of circumstances you don't think of.
Member 8824288 wrote:
Production always seems to come up with some combination of circumstances you don't think of.
Don't they ever! I could probably share a ton of horror stories with you since I used to program conveyors as well. Of course the biggest problem we always had was what I called the U-shaped communication channel. A worker would note a problem and mention it to his supervisor, who'd mention it to his boss, until the company I worked for was called about it, and then it eventually filtered down to me. (AKA the Telephone Game.) What would start as a simple problem would turn into "The conveyor is on fire, running backwards." We once thought we were clever hiding a "Dump All" report behind "Ctrl-A", never expecting the operators to rest with their palms pressed down on the keyboard, covering CTRL and A, with the terminal autorepeating requests for reports that would print for half an hour each. We had to reset the computer because we had not added a "delete print job" to our report queue, although even if we had, there would have been hundreds to delete.
Psychosis at 10 Film at 11 Those who do not remember the past, are doomed to repeat it. Those who do not remember the past, cannot build upon it.
-
I was writing a fuel management software package, and had to write a virtual fuel pump to test with, based on the service manual.
I need an app that will automatically deliver a new BBBBBBBBaBB (beautiful blonde bimbo brandishing bountiful bobbing bare breasts and bodacious butt) every day. John Simmons / outlaw programmer
-
virang_21 wrote:
What would you do when there is no way to replicate situations happening in production environment ?
In normal server work there are ways to help with the situation.
1- Add logging (over time one would just use the log output.)
2- Build a simulator (over time it gets better.)
3- Unit testing.
4- Very rigorous design (at implementation level) and rigorous code reviews.
The fourth is often too time-consuming to apply to all code and probably too boring for most developers. Although the latter would seem to be just something that developers must do, the reality is that humans tend to get glassy-eyed if one attempts to force this for all code. That is just the way humans work. But eyeballing a very small but very critical subset can help. One can do the last as an independent developer, but one is still subject to the boredom factor as well. At a minimum, I find that I can only attempt such detailed reviews of my own code by waiting a day. If I attempt it immediately I don't really see the code. 3 and 4 can be used together. However, whether they are applicable depends on the specifics of the system.
-
jschell wrote:
In normal server work there are ways to help with the situation.
1- Add logging (over time one would just use the log output.)
2- Build a simulator (over time it gets better.)
3- Unit testing.
4- Very rigorous design (at implementation level) and rigorous code reviews.
1. That is how I was able to figure out what was going on: logging every condition, the messages sent and received, and all the variable values when it fails.
2. The controller I am using is from a company in Germany... Manual? What is that? Some old flowchart is what I had to program it with...
3. Unit testing / BA testing / UAT did not pick up those errors, because if you produce a few items it works like a charm. But when it is constantly being used and one of the production lines needs a response time of max 1.5 seconds, it gets complex, with the full factory using the application on the same network and the controller connected to the network via WiFi...
4. Code reviews? Who will do that when you are the only developer ;P ... I have to put a hand on my heart and tell them that it is not the code that is failing but something else that is causing it... I have to deal with a PM who keeps insisting that it must be my code that is wrong... No technical help, they just keep looking at the flowchart and keep insisting I must be doing something wrong...
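The kind of logging described in point 1 could be sketched roughly like this in Python. It is only an illustration, not the actual .NET code. `LoggedLink`, the logger name `agv.link` and the `state` callback are hypothetical names, and the wrapped object is assumed to offer `sendall` and `recv` the way a socket does.

```python
import logging

log = logging.getLogger("agv.link")   # logger name is an arbitrary choice for this sketch


class LoggedLink:
    """Wraps a connection and records every message in both directions."""

    def __init__(self, conn, state=None):
        self._conn = conn
        # Callable returning the variables worth dumping when something fails.
        self._state = state or (lambda: {})

    def send(self, message: bytes):
        log.info("TX %r", message)
        try:
            self._conn.sendall(message)
        except OSError:
            log.exception("TX failed; state=%r", self._state())
            raise

    def receive(self, size=1024) -> bytes:
        try:
            data = self._conn.recv(size)
        except OSError:
            log.exception("RX failed; state=%r", self._state())
            raise
        log.info("RX %r", data)
        return data
```

Logging the raw bytes with `%r` keeps unprintable characters visible, which helps when the question is whether one side failed to recognize a message.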
-
I've never had any issue with doing that when I needed to. Just be careful.
virang_21 wrote:
one controller available to use when one of the production line is not using
You actually have it pretty good. I have had jobs where there was only one of the system in the company (and there was no way they would buy or build another simply for development and test). It's one of the things that separates the men from the boys.
You'll never get very far if all you do is follow instructions.
-
This problem has *already* been solved. Many times over. Since computers were invented. By NASA, by the Military, by safety critical industries. Seek education in their solutions.
Michael Kingsford Gray wrote:
By NASA, by the Military, by safety critical industries.
Yep, when something blows up, they patch it.
To alcohol! The cause of, and solution to, all of life's problems - Homer Simpson ---- Our heads are round so our thoughts can change direction - Francis Picabia
-
virang_21 wrote:
4. Code Reviews ? Who will do that when you are the only developer
As I suggested - do it yourself. Actually I have often found that I must do that myself with critical code even when there are other developers, because they just won't take it seriously. Walking through my own code, just the critical pieces, not immediately but after a day, allows me to verify the logic if I do it in detail.