What do you do?

Adadurov

Back in the days of Visual C++ 6.0, we built .map files for our products' binary modules and kept them for every build that was approved for delivery to customers. If a user experienced a crash, all we needed from them was the crashing module's name and the crash address, both of which Windows provided, plus some information about the user actions that preceded the crash. With that we could trace the crash to a source file and line number using the .map file and try to fix it. Of course, this approach works only with simple bugs.
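For readers who have not done this kind of lookup, here is a minimal sketch of the core step: given a crash address, find the function whose start address is the closest one at or below it. The table layout, the function name FindSymbol and all addresses are made up for illustration. Parsing a real linker .map file and adjusting for the module's load address are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical, simplified symbol table: function start address -> name.
// A real .map file has more columns and needs a proper parser.
using SymbolTable = std::map<std::uint32_t, std::string>;

// Returns the name of the symbol with the greatest start address that is
// <= crashAddress, or an empty string if the address lies before every symbol.
std::string FindSymbol(const SymbolTable& symbols, std::uint32_t crashAddress)
{
    // upper_bound gives the first symbol that starts AFTER the address,
    // so the entry just before it is the one containing the address.
    auto it = symbols.upper_bound(crashAddress);
    if (it == symbols.begin())
        return std::string();
    --it;
    assert(it->first <= crashAddress);
    return it->second;
}
```

With a table holding 0x1000 for main and 0x1200 for LoadConfig, a crash at 0x1234 resolves to LoadConfig. The sketch cannot tell whether the address is past the end of the last function, so a real tool would also want the module size.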

WhiteSpy

It's a pain in the A$$, but you might try putting in additional code to monitor (write to a file) where the program is, at the top of each subroutine. When it crashes, you'll at least have an idea of where it was when it happened. We've done this on the really tough ones. Then we go in and do the same thing inside the individual routine, adding code as we think of things that may be causing the crash. We do this until we find the error and can reproduce it on the badly acting machine. We once bought our client a loaner machine so that we could take the bad actor to our office. You may not be able to do that, but it helped us find a bad driver that was causing the problem.
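A minimal sketch of that kind of breadcrumb logging, assuming a plain text file is acceptable. The class name BreadcrumbLog and the routine ProcessOrder are hypothetical. Each line is flushed as soon as it is written, so the last breadcrumb should still reach the file if the process dies right afterwards.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Appends one line per call and flushes immediately, so the most recent
// breadcrumb is handed to the OS before the next statement runs.
class BreadcrumbLog {
public:
    explicit BreadcrumbLog(const std::string& path)
        : out_(path, std::ios::app) {}

    void Mark(const char* where) {
        assert(where != nullptr);
        if (out_) {              // silently do nothing if the file could not be opened
            out_ << where << '\n';
            out_.flush();
        }
    }

private:
    std::ofstream out_;
};

// Hypothetical routine: one breadcrumb at the top, more added as suspicion narrows.
void ProcessOrder(BreadcrumbLog& log) {
    log.Mark("ProcessOrder: enter");
    // ... real work would go here ...
    log.Mark("ProcessOrder: totals computed");
}
```

After a crash, the last line in the file tells you how far the program got. Flushing on every call is slow compared with buffered output, which is usually an acceptable trade while hunting a crash.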

Patrick Etc

In my experience, the performance hit is so low that I haven't observed it, and I've used it in real time systems without problem. That said, if logging isn't necessary for any but debug operations, there's still no reason to have it turned on UNLESS you're trying to debug. No point in generating tons of log files that will never be used.

Ballpoint Penguin

I'm sorry, but I'm going to have to consult with a few other people before I can say "Aye." Could you please give me your name and phone number, and after I've done some checking I can get back to you? :-D

pgorbas

Review all the basic requirements for your app. Does the client's machine:
* Have the minimum memory and CPU power?
* If it runs in a browser, is the client using the same browser you are?
* What else is the client trying to do while running your app?
* Have the correct libraries and runtimes installed? Are you deploying a full stand-alone version with all required libs and runtimes?
* Have you ruled out the customer simply not understanding what the app is supposed to do in the first place? ("Hey, this financial calculator doesn't spell check my Latin translation correctly!")

To fix the problem, you will need to learn to reproduce it. Your two most powerful tools for this are log files and application feedback through error messages.
*) If it runs in a browser, have you tested it on the same browser version the client is using?
*) For a desktop app, or a purely client-side issue, physically go over to your client's machine and run it there. If you can't do that, do it remotely over NetMeeting or some tool like that.
*) If you created your app to write to a log file, you may need to add more log messages to box in where the problem is.
*) Of course your app is already catching all possible errors in try-catch blocks, right? And when you catch an error you can't recover from, it displays a meaningful informational message? You may have to put additional feedback in your app so that you and your customers can monitor what state the app is in.
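As one possible shape for the try-catch point, here is a small sketch that wraps a unit of work and turns any escaping exception into a message naming the step that failed. RunStep and the step names are hypothetical. What you do with the message (log it, show a dialog) is up to the app.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Runs `task` and converts any exception that escapes it into a message
// that names the failing step. Returns an empty string on success.
std::string RunStep(const std::string& stepName, const std::function<void()>& task)
{
    assert(task != nullptr);
    try {
        task();
        return std::string();
    } catch (const std::exception& e) {
        return "Step '" + stepName + "' failed: " + e.what();
    } catch (...) {
        // Something not derived from std::exception was thrown.
        return "Step '" + stepName + "' failed: unknown error";
    }
}
```

A call such as RunStep("parse", parseFile) then gives both the user and the log file a message that says where things went wrong, which is often enough to start boxing in the problem.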

r_z_aret

* Have a productive discussion with the user(s). This depends heavily on the communication skills of everyone in the discussion. It also depends on the observational skills of the users (whether they see key symptoms that help diagnose the problem).
* I have some hidden logging hooks, so I can tell the user how to turn them on and send me the logs. This has actually not been nearly as useful as I hoped.
* The most efficient method is perhaps unique to my major app. It is a data collection app that runs mostly stand-alone and usually stores the data in the same directory as the executable, so I can ask a user to zip up the contents of that directory and send it to me. The problem is often something in their data that I don't handle gracefully. And sometimes it's a legitimate case I overlooked or misunderstood.
* Travel to the user with as many debugging aids as I can muster.

Tom Archer

The problem with the last part of that advice is that the practice of "sacrificing" denotes the giving up of something of value :~

Christopher Duncan

Saaaaay... Aren't you in Management now? Boys, fire up the grill! :-D

Author of The Career Programmer and Unite the Tribes www.PracticalStrategyConsulting.com

patbob

The incense & sacrifice post is about spot-on for this :) However, I have had marginal luck with two techniques: borrow the customer's machine and remote-debug it locally, or use the log file.

Borrowing practically never works out, even when it's the sales guy's laptop that is having the problems. It is, however, the best way to find and fix the exact bug.

For a log file, we built debug logging into our product. Create the magic directory, and a debug log gets written to a new file in it whenever you run the program. Delete the directory, and there is no log file. This way, we can turn logging on and off for release builds at individual customers by having them create or delete that magic directory. The logging will theoretically change the timing of a multithreaded app, but after doing MT programming for about 20 years now, I've found that all such problems are really just latent bugs anyway, so you might as well get them fixed before they surprise you with strange failures on some customer's system that you can't easily debug.

What do we put in the log? Everything we can think of, but mostly we developers use it as one of our tools for debugging the apps, so there's all sorts of useful debugging stuff: deeper explanations of failures before the app takes a dump, mentions of detected events that are not supposed to cause failures but are unexpected, etc. The logs never have quite what we want in them, but if you have an idea of what might have happened, based on your mental model of how the code works and on the failure report, then they often have just enough clues to confirm or deny your theory... or you put more in for the next release. And when that doesn't work, well, there's always incense and sacrifice.
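The magic-directory switch could look roughly like this in C++17. This is only a sketch of the idea, not the poster's code: OpenDebugLog, the directory and the file name are all hypothetical, and for brevity it appends to one fixed file rather than creating a new file per run.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Opens a log file inside `magicDir` only if that directory already exists.
// Otherwise the returned stream stays closed, and anything written to it
// is discarded, so logging is effectively off.
std::ofstream OpenDebugLog(const fs::path& magicDir, const std::string& fileName)
{
    assert(!fileName.empty());
    std::ofstream log;
    std::error_code ec;                       // avoids exceptions on odd paths
    if (fs::is_directory(magicDir, ec))
        log.open(magicDir / fileName, std::ios::app);
    return log;
}
```

The app can then write to the stream unconditionally. The customer turns logging on by creating the directory and off by deleting it, with no special build and no settings dialog.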

dburns

I'm not vouching for it (I haven't tried it), but there's a "black box" product out there at http://www.identify.com/products/index.php that may help. Depending on the product, it's sometimes a good idea for the product to support tracing. Then in cases like this you ask the customer to enable it and send you the log. If you don't already support tracing, you could send the customer a hacked-up version that traces to a log.

dburns

Oh BTW I just hacked this tidbit together for tracing. I'll just throw it out there in case someone finds it useful. Obviously I was completely hacking...

    #include <stdio.h>

    // Prints a message when constructed and another when destroyed.
    class FT
    {
    public:
        FT(const char* pcFunc) : mpcFunc(pcFunc) { printf("Enter %s\n", mpcFunc); }
        ~FT() { printf("Exit %s\n", mpcFunc); }
        const char* mpcFunc;
    };

    // __FUNCSIG__ is specific to Microsoft's compiler.
    #define __TR FT oFT(__FUNCSIG__);

    void function()
    {
        __TR
    }

That will spit out enter/exit messages for each function that has the __TR macro.

miennaco

I will add to that: logging, logging, more logging. As well as that, try to identify differences between the development environment and the customer's, hardware being number 1. For example, I saw one where the developer had tested on a single-processor machine, but the app would crash on a multi-processor one (yes, it was a threading issue, but one that only shows up where a single mov eax is not atomic).

Richard Andrew x64

Thank you, that is brilliant! I never would have thought of using an automatic variable to automatically record the entrance and exit of a function. The beauty of it is that I don't have to code an exit message for every place in the function that has a "return" statement. Come to think of it, I'll bet this technique would also work with CRITICAL_SECTIONs, so that I don't have to manually code a LeaveCriticalSection() call everywhere the function returns.

-------------------------------- "All that is necessary for the forces of evil to win in the world is for enough good men to do nothing" -- Edmund Burke

dburns

Why thank you, and while I don't deny any claims of being brilliant :-) it's a fairly common technique. The CWaitCursor MFC class uses it to turn on and off the wait cursor automatically. It's also handy to free up resources (memory, files, whatever). In an unmanaged language like C++ this is a very important technique if you're using exceptions, since an exception can cause a resource leak otherwise.
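A minimal sketch of the same trick applied to locking. To keep it portable it uses std::mutex as a stand-in. On Windows the constructor and destructor would call EnterCriticalSection() and LeaveCriticalSection() instead. ScopedLock, Withdraw and the globals are hypothetical names, and the standard library's std::lock_guard already does this job, so the hand-rolled class is for illustration only.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Locks in the constructor and unlocks in the destructor, so the lock is
// released on every exit path, including exceptions.
class ScopedLock {
public:
    explicit ScopedLock(std::mutex& m) : m_(m) { m_.lock(); }
    ~ScopedLock() { m_.unlock(); }
    ScopedLock(const ScopedLock&) = delete;
    ScopedLock& operator=(const ScopedLock&) = delete;

private:
    std::mutex& m_;
};

std::mutex g_lock;       // hypothetical shared lock
int g_balance = 100;     // hypothetical shared state guarded by g_lock

// Two returns and one throw, yet no explicit unlock anywhere.
bool Withdraw(int amount)
{
    ScopedLock guard(g_lock);
    if (amount < 0)
        throw std::invalid_argument("negative amount");
    if (amount > g_balance)
        return false;
    g_balance -= amount;
    assert(g_balance >= 0);
    return true;
}
```

Whichever way Withdraw is left, the destructor of guard runs and releases the lock, which is exactly the resource-leak protection described above for code that uses exceptions.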

JMOdom

;P Since when has sacrificing a manager been a bad thing???? :laugh:

MrChug

* Limit network connections (try one socket only)
* Limit process threads (try with one, not with forty)
* Limit multiprocessor execution (set affinity for CPU 0)
* Run with your own DLLs, not the customer's (already suggested)
* Verify DLLs using Sysinternals Process Explorer
* OutputDebugString printf's, even from services
* Use those minidumps
* Assert with wild abandon
* Divide and conquer: disable parts, run the rest
* Code review with a trusted peer
* Follow Robbins' advice in Debugging MS .NET 2.0 Applications

That's what I do. :)