Determining reason for SendMessageTimeout() failure
-
If I call SendMessageTimeout() with the flag SMTO_ABORTIFHUNG and a finite timeout period, it can fail (return 0) for either of these reasons:

1. Timeout (the thread which owns the window started processing the message but didn't complete it within the timeout period).
2. The thread which owns the window was considered to be hung, so didn't (and won't) process the message.

In the first case GetLastError() returns ERROR_TIMEOUT (1460). I can't find any error code in winerror.h which looks relevant to the second case (hung thread).

What I actually want to know, immediately after SendMessageTimeout() returns, is whether the thread which owns the window has started, or will start, processing the message. I neither need nor want to wait until it has finished processing the message. I've spent a couple of hours searching the web and reading various articles, questions etc. but haven't found an answer to this.

A little while later...

I've set up a test case where the window-owning thread is spinning in a loop, and then I call SendMessageTimeout(hWnd, WM_MY_MESSAGE, wParam, lParam, SMTO_NORMAL|SMTO_ABORTIFHUNG, 1000, &dwResult). SendMessageTimeout() immediately returns 0 as expected, and calling GetLastError() returns [drumroll...] 18 (ERROR_NO_MORE_FILES)!!! Go figure...

So it seems there are 2 yucky ways to trap this:

1) Wrap a timer around the call to SendMessageTimeout(), and if it returns in less than the timeout period (1000ms in the example above) then we know it failed because the thread is hung.
2) Test GetLastError(): if it's ERROR_TIMEOUT then we know it's a timeout; if it's ERROR_NO_MORE_FILES then we assume it's because the thread is hung. Or maybe if it's NOT ERROR_TIMEOUT then assume it's because the thread is hung.

Method 1 smells of kludge; method 2 relies on undocumented behaviour.

And while we are on undocumented behaviour, it seems that passing a timeout value of 0 means "infinite timeout", although no doc I can find for SendMessageTimeout() mentions that.

And some time later again...

Well, the behaviour is inconsistent. I've also had it fail (return 0) immediately with ERROR_TIMEOUT, and bizarrely sometimes fail immediately with error code 0 (ERROR_SUCCESS). This is Windows 10. It seems that all bets are off in trying to interpret the return value and/or error code.
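For reference, method 1 (timing the call) can be sketched roughly as below. This is only an illustration under stated assumptions, not a confirmed way to detect a hung thread: the names `send_outcome`, `classify_send`, `send_and_classify` and the 50 ms slack are hypothetical, and the error code is kept purely for diagnostics because it proved inconsistent in the tests above. As far as I can tell, the Microsoft documentation for SendMessageTimeout() notes that the function does not always call SetLastError() on failure and suggests clearing the last error before the call, which may explain a stale value such as 18 showing up; the sketch does that. The Win32 part is guarded so the timing logic also compiles on other platforms.

```c
#include <assert.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical outcome labels; none of these names come from the Win32 API. */
typedef enum {
    SEND_OK,             /* SendMessageTimeout() returned non-zero */
    SEND_PROBABLY_HUNG,  /* failed well before the timeout expired */
    SEND_TIMED_OUT       /* failed after (roughly) the full timeout */
} send_outcome;

/*
 * Method 1 as a pure function: classify by elapsed time only.
 * slack_ms absorbs timer granularity (an assumed tuning value).
 * If timeout_ms <= slack_ms, an early failure cannot be told apart
 * from a timeout and SEND_TIMED_OUT is returned.
 */
static send_outcome classify_send(int succeeded,
                                  unsigned long long elapsed_ms,
                                  unsigned int timeout_ms,
                                  unsigned int slack_ms)
{
    unsigned int threshold;

    assert(timeout_ms > 0);  /* 0 appeared to mean "infinite" in the tests above */
    if (succeeded)
        return SEND_OK;

    threshold = (timeout_ms > slack_ms) ? timeout_ms - slack_ms : 0;
    return (elapsed_ms < threshold) ? SEND_PROBABLY_HUNG : SEND_TIMED_OUT;
}

#ifdef _WIN32
/* Windows-only wrapper: time the call and hand the result to classify_send(). */
static send_outcome send_and_classify(HWND hwnd, UINT msg, WPARAM wp, LPARAM lp,
                                      UINT timeout_ms, DWORD *last_error)
{
    DWORD_PTR result = 0;
    ULONGLONG start, elapsed;
    LRESULT ok;

    SetLastError(ERROR_SUCCESS);  /* so a stale code is not misread afterwards */
    start = GetTickCount64();
    ok = SendMessageTimeout(hwnd, msg, wp, lp,
                            SMTO_NORMAL | SMTO_ABORTIFHUNG, timeout_ms, &result);
    if (last_error)
        *last_error = GetLastError();  /* diagnostic only, not used to classify */
    elapsed = GetTickCount64() - start;

    return classify_send(ok != 0, elapsed, timeout_ms, 50 /* assumed slack */);
}
#endif
```

Two caveats on reading the result: an early failure can have other causes (an invalid window handle, for instance), and a full-length wait does not by itself prove that the target thread ever started on the message, so both failure labels are guesses rather than facts.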
-
I would suggest that you raise this on one of the Microsoft support forums, as it will then get through (in time) to the people responsible.
-
MikeBz wrote:
because the thread is hung.
In my experience, attempting to definitively trap that is just not going to work. The states that one can reliably define for a thread are:

1. It worked. "Worked" can include returning an error result (of any sort).
2. It took too long.

For the second case one might try to hypothesize about the cause, and then perhaps collect information, even over time, that allows one to narrow down the detectable types of failure. But one should never expect to detect, and certainly not to correct, all possible errors. All one can do is reduce the failure rate to a point where it can be ignored.
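The "collect information over time" idea could look something like the sketch below: a small tally of how often the send succeeded and which GetLastError() values accompanied the failures. Everything here (`send_stats`, `stats_record`, `stats_count_for`, the limit of 16 distinct codes) is a hypothetical illustration, not part of any Windows API, and it is not thread-safe as written.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CODES 16  /* assumed upper bound on distinct error codes tracked */

typedef struct {
    unsigned long code;   /* value reported by GetLastError() */
    unsigned long count;  /* how many failures carried that value */
} code_count;

typedef struct {
    unsigned long successes;
    unsigned long failures;
    code_count by_code[MAX_CODES];
    size_t distinct;         /* number of by_code entries in use */
    unsigned long overflow;  /* failures whose code did not fit in the table */
} send_stats;

/* Record one call: either a success, or a failure with its error code. */
static void stats_record(send_stats *s, int succeeded, unsigned long last_error)
{
    size_t i;

    assert(s != NULL);
    if (succeeded) {
        s->successes++;
        return;
    }
    s->failures++;
    for (i = 0; i < s->distinct; i++) {
        if (s->by_code[i].code == last_error) {
            s->by_code[i].count++;
            return;
        }
    }
    if (s->distinct < MAX_CODES) {
        s->by_code[s->distinct].code = last_error;
        s->by_code[s->distinct].count = 1;
        s->distinct++;
    } else {
        s->overflow++;
    }
}

/* Number of recorded failures that carried the given error code. */
static unsigned long stats_count_for(const send_stats *s, unsigned long code)
{
    size_t i;

    assert(s != NULL);
    for (i = 0; i < s->distinct; i++) {
        if (s->by_code[i].code == code)
            return s->by_code[i].count;
    }
    return 0;
}
```

A zero-initialised `send_stats` is a valid empty tally. Logging it periodically would show whether a code such as 1460 or 18 dominates the failures, without the calling code having to depend on any one of them.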
-
Understood, thanks for your reply. I'm looking at some not-very-well-designed legacy code and trying to work out whether its attempts to deal with a potential deadlock are flawed. Given that it's over 20 years old and nobody has complained, it's probably better to just leave it alone.