Protocol Design Bug
-
I designed a LAN protocol for low-bandwidth devices. Its first (and alas its only) use was for temperature monitoring in already well-regulated refrigerators. My protocol used RS-232 at 38.4 kbaud with the transmit and receive lines tied together, so all devices saw everything anyone put on the LAN. It should be clear from this that ’twas important for only one device to be transmitting at any given time. The LAN controller polled each device in turn and waited a reasonable amount of time for a response.

Byte parity errors, message parity errors and timeouts were considered errors. The remote “pods” handled errors by waiting for 3 ms of quiet on the line and then restarting their normal wait for a message. My controller code guaranteed an occasional 6 ms of quiet.

My bug: after listing all the things that were errors I included, “and if you don’t understand the message you receive then that is also an error.” I was responsible for sending arbitrary messages from my host to the temperature-measuring pods. My host sent a message a pod did not understand, the pod treated it as an error and sent no reply, and so the pod appeared to die. Later, in an attempt to find recently connected pods, I sent the “is anybody home” message to the apparently dead pod. The pod responded and (overt bug here) I sent its reply to my host. The host guy phoned me and basically said, “WTF is this?”

It took me a week with crap debug tools to find the thing. I have never designed another protocol. But if I do, I will never forget to include an “I don’t understand this otherwise well-formed message” response.
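As a rough illustration of the lesson (not the original protocol), here is a minimal Python sketch of a pod-side handler that separates a corrupted frame, which it answers with silence and a resync, from a well-formed message it does not recognise, which it answers with an explicit reply. The frame layout (address, opcode, length, payload, XOR checksum), the opcode values and the function names are all assumptions made up for this example; the post does not give the real message format.

```python
from typing import Optional

# Hypothetical opcodes; the real protocol's message set is not given in the post.
OP_PING = 0x01            # "is anybody home"
OP_READ_TEMP = 0x02       # ask the pod for a temperature reading
OP_NOT_UNDERSTOOD = 0x7F  # explicit "well formed, but I don't know this" reply


def checksum(body: bytes) -> int:
    """XOR of all bytes: a stand-in for the message parity check."""
    result = 0
    for b in body:
        result ^= b
    return result


def build_frame(addr: int, opcode: int, payload: bytes = b"") -> bytes:
    """Assumed layout: address, opcode, payload length, payload, checksum."""
    body = bytes([addr, opcode, len(payload)]) + payload
    return body + bytes([checksum(body)])


def handle_frame(my_addr: int, frame: bytes) -> Optional[bytes]:
    """Return the reply frame, or None when the pod should stay silent."""
    # Malformed frame: stay silent and resync on a quiet line.
    if len(frame) < 4 or frame[2] != len(frame) - 4:
        return None
    if checksum(frame[:-1]) != frame[-1]:
        return None

    addr, opcode = frame[0], frame[1]
    if addr != my_addr:
        return None  # shared line: the message is for some other pod

    if opcode == OP_PING:
        return build_frame(my_addr, OP_PING)
    if opcode == OP_READ_TEMP:
        return build_frame(my_addr, OP_READ_TEMP, bytes([4]))  # dummy reading

    # Well formed but unknown: say so, echoing the opcode we could not handle.
    return build_frame(my_addr, OP_NOT_UNDERSTOOD, bytes([opcode]))
```

With a reply like this the controller gets an answer inside its normal poll window, so an unknown opcode is reported to the host straight away instead of looking like a dead pod.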
-
Why not borrow from Microsoft and use "ASP Error 500: Internal Server Error"? It's worked for them for years! :-D
"...a photo album is like Life, but flat and stuck to pages." - Shog9