ATL COM server crash
-
Hi, I have an ATL COM in-proc server which is crashing randomly. I would not really call it a crash: the server just vanishes after running successfully for 9-10 hours, and then restarts immediately on the next call to CreateInstance() by the clients. I am unable to debug it, as it is not a crash and it happens randomly. Although I'm not providing many details because the code base is huge, any thoughts just to start with would be very much appreciated. I observed that FinalRelease() was called before the server vanished, but during the initial 10 hours many clients called FinalConstruct() and FinalRelease() without any issues. Thanks in advance, Harish
-
Harish Pulimi wrote:
the server just vanishes after running successfully for 9-10 hours
I don't understand what you mean by this. How did you reach that conclusion? As I interpret your complete description, you seem to be describing the expected behaviour. I'm quite confident that clients don't call FinalConstruct() or FinalRelease() through any interface. Those functions are called internally by the framework: FinalConstruct() is called when the server is created by a client, and FinalRelease() is called when the reference count reaches zero and the server should consequently be destroyed. This means that if you're seeing a lot of calls to FinalConstruct() and FinalRelease(), your server is being created and destroyed multiple times by its clients. If this is not your intention, I suggest you put a breakpoint in FinalRelease(); when the breakpoint is hit, you should be able to see the origin of the call in the call stack debug window. It looks like you expected the server to be created once and then stay alive until the application is closed, but given your error description this doesn't seem to be the case. Perhaps I have misunderstood what you mean by "vanish"; if so, please elaborate on the subject a bit. It could mean that CreateInstance() fails after 9-10 hours, but I think you would have described the error that way if that were the case.
"It's supposed to be hard, otherwise anybody could do it!" - selfquote
"High speed never compensates for wrong direction!" - unknown
-
Hi Roger,
Thanks a lot for your reply, and apologies for not providing more details. The architecture is as follows: I have an exe which keeps track of all the servers that are created by the clients through CreateInstance() calls. This exe needs to run all the time; if it is not already running, it is started by the next CreateInstance() call made by a client. The problem is that the exe runs fine for around 10 hours and then suddenly vanishes, i.e. the process just stops. No crash is reported, and the exe starts again immediately on the next CreateInstance() call, but it loses all the previous data. This happens only on some PCs. I tried putting my debug build on one of those PCs, but there is still no exception and no crash reported, so I thought this could be heap corruption. To reproduce it, I created a test harness in which around 100 clients continuously bombard the exe with CreateInstance() calls, and I got the following exception. I'm not sure whether it is the same crash, but it seems similar to the actual problem. Any idea how to debug exceptions in OLE32.dll?
First-chance exception in AniteLicenser.exe (OLE32.DLL): 0xC0000005: Access Violation.
OLE32! 77600f3b()
OLE32! 77600ee9()
OLE32! 77600ba0()
OLE32! 7752ad31()
OLE32! 7752ac56()
OLE32! 776007f5()
OLE32! 77602df3()
OLE32! 77600715()
RPCRT4! 77e794bd()
RPCRT4! 77e79422()
RPCRT4! 77e7934e()
RPCRT4! 77e8a384()
RPCRT4! 77e8a3c5()
RPCRT4! 77e7bcc1()
RPCRT4! 77e7bc05()
RPCRT4! 77e76caf()
RPCRT4! 77e76ad1()
RPCRT4! 77e76c97()
KERNEL32! 7c80b713()
Once I got the following call stack. It seems something is going wrong in the Release() call, yet the same exe runs fine for around 10 hours (hundreds of instances are created and released successfully in the meantime), and on a different PC it ran fine for around 3 days:
ATL::_QIThunk::Release(ATL::_QIThunk * const 0x0171d440) line 2734 + 11 bytes
OLE32! 7750d339()
OLE32! 7750d09a()
OLE32! 7752deb3()
OLE32! 7752dcc8()
OLE32! 7752db6a()
RPCRT4! 77e799f4()
RPCRT4! 77ef421a()
RPCRT4! 77ef4bf3()
OLE32! 77600c15()
OLE32! 77600bbf()
OLE32! 7752ad31()
OLE32! 7752ac56()
OLE32! 776007f5()
OLE32! 77602df3()
OLE32! 77600715()
RPCRT4! 77e794bd()
RPCRT4! 77e79422()
RPCRT4! 77e7934e()
RPCRT4! 77e8a384()
RPCRT4! 77e8a3c5()
RPCRT4! 77e7bcc1()
RPCRT4! 77e7bc05()
RPCRT4! 77e76caf()
RPCRT4! 77e76ad1()
RPCRT4! 77e76c97()
KERNEL32! 7c80b713()
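The stress test described above can be mirrored in miniature to check one common suspect in multithreaded COM code: whether the reference count survives concurrent AddRef()/Release() calls. This is a portable sketch, not ATL code; CountedObject and RunHarness are hypothetical names, and std::atomic stands in for the InterlockedIncrement/InterlockedDecrement that ATL's CComMultiThreadModel uses (CComSingleThreadModel uses a plain ++/--, which is only safe if calls to the object really are serialized). A lost update in a non-atomic count could free the object twice or while it is still in use, which would be consistent with, though not proof of, an access violation inside Release().

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a COM object's reference counting (not ATL code).
class CountedObject {
public:
    explicit CountedObject(std::atomic<int>& destroyed) : destroyed_(destroyed) {}

    unsigned long AddRef() {
        // Increments only need atomicity, not ordering.
        return refs_.fetch_add(1, std::memory_order_relaxed) + 1;
    }

    unsigned long Release() {
        // acq_rel: the thread that deletes the object sees every write made
        // by the threads that released their references before it.
        unsigned long previous = refs_.fetch_sub(1, std::memory_order_acq_rel);
        assert(previous > 0 && "Release() called more often than AddRef()");
        if (previous == 1) {
            delete this;  // last reference gone: destroy exactly once
        }
        return previous - 1;
    }

private:
    ~CountedObject() { destroyed_.fetch_add(1); }

    std::atomic<unsigned long> refs_{1};  // the creator holds the first reference
    std::atomic<int>& destroyed_;
};

// Mimics the harness: `clients` threads each AddRef()/Release() the shared
// object `iterations` times, then the creator drops its own reference.
// Returns how many times the destructor ran; anything other than 1 is a bug.
int RunHarness(int clients, int iterations) {
    std::atomic<int> destroyed{0};
    auto* obj = new CountedObject(destroyed);

    std::vector<std::thread> threads;
    threads.reserve(clients);
    for (int c = 0; c < clients; ++c) {
        threads.emplace_back([obj, iterations] {
            for (int i = 0; i < iterations; ++i) {
                obj->AddRef();
                obj->Release();
            }
        });
    }
    for (auto& t : threads) t.join();

    obj->Release();  // drops the creator's reference and destroys the object
    return destroyed.load();
}
```

With the atomic counter, the destructor runs exactly once however the threads interleave; the same harness shape can be pointed at real interface pointers once the apartments are set up correctly.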
-
Harish Pulimi wrote:
apologies for not providing more details.
No worries. There's no point in posting a lot of details before it's clear which details are needed. :) I don't really understand your architecture or how this mysterious "exe" keeps track of the running servers. It seems like the "exe" is a COM server itself... :~
Harish Pulimi wrote:
I have an exe which keeps track of all the servers
How is this accomplished if CreateInstance() isn't called from this "exe" in order to create the other servers?
Harish Pulimi wrote:
This exe needs to run all the time and if not already started, it will start through the next CreateInstance() call made by a client.
How is the "exe" started? Is it an out-of-process COM server that is "started" with CreateInstance()? Perhaps you should consider running it as a service. If the "exe" is an out-of-process COM server that doesn't run as a service, it is quite expected that it will terminate when its reference count reaches zero, unless you've provided functionality to prevent it. The call stack snippets you've provided imply that the call to the server is made on an RPC thread, which means that you're using a multithreaded solution. In that case it's fairly reasonable to suspect thread synchronization issues, but first we have to agree on how the apartments are set up. So I have a couple of questions for you:
- Are you certain that every thread that uses any kind of COM-related stuff contains a call to ::CoInitialize() or one of its equivalents?
- Are you certain that every thread that instantiates a COM server has a message pump that will not be blocked?
- What threading model have you registered below the server entry in the registry? In other words, what value is assigned to the registry value "ThreadingModel" in the registry key HKCR\CLSID\{<your server CLSID>}\InprocServer32?
- How is the apartment initialized from which you create the server? I suspect it is initialized as a multithreaded apartment.
- Are you using proper marshalling? How?
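The lifetime rules described above can be modelled in a few lines of portable C++. This is a simplified sketch, not ATL's implementation: ServerModule, ServerObject and Create() are hypothetical names. It assumes, roughly as ATL's CComObject does, that each live object holds one lock on the module, that FinalConstruct() runs right after construction and FinalRelease() when the last reference is released, and that the exe shuts down (losing any in-memory data) once the module's lock count drops to zero.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of an exe COM server's lifetime; NOT ATL's real classes.
struct ServerModule {
    int lockCount = 0;             // one lock per live object, plus any keep-alive locks
    bool running = true;           // becomes false when the "process" would exit
    std::vector<std::string> log;  // lifecycle events, in order

    void Lock() { ++lockCount; }
    void Unlock() {
        assert(lockCount > 0 && "Unlock() without matching Lock()");
        if (--lockCount == 0) {
            running = false;       // nothing holds the module: the exe shuts down
            log.push_back("module shutdown");
        }
    }
};

class ServerObject {
public:
    // Stands in for CreateInstance(): construct, run FinalConstruct(),
    // and return the object holding one reference for the caller.
    static ServerObject* Create(ServerModule& module) {
        auto* obj = new ServerObject(module);
        obj->FinalConstruct();
        return obj;
    }

    void AddRef() { ++refs_; }

    void Release() {
        if (--refs_ == 0) {
            delete this;           // last reference released: destroy the object
        }
    }

private:
    explicit ServerObject(ServerModule& module) : module_(module) { module_.Lock(); }

    ~ServerObject() {
        FinalRelease();            // called by the framework, never by clients
        module_.Unlock();          // may trigger module shutdown
    }

    void FinalConstruct() { module_.log.push_back("FinalConstruct"); }
    void FinalRelease() { module_.log.push_back("FinalRelease"); }

    int refs_ = 1;
    ServerModule& module_;
};
```

In the model, releasing one of two objects leaves the server running, while releasing the last one logs FinalRelease and shuts the module down. Taking one extra module lock at startup is one way to keep such a process alive between clients; running it as a service, as suggested above, is another.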
-
Finally found the bug. In the client code, the COM library was not being initialized properly. Because of this, the client was unable to tear down its connection to the server. Now everything works as expected.
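The thread doesn't show what the client fix looked like, so the following only illustrates the usual rule, not Harish's code: every thread that uses COM calls CoInitializeEx before its first COM call, and every successful call (including one that returns S_FALSE) is balanced by CoUninitialize on the same thread, while a failed call such as RPC_E_CHANGED_MODE is not. A scoped guard makes that pairing hard to get wrong. ScopedComInit, FakeComApi and the k-prefixed constants are hypothetical names; the Api template parameter exists only so the sketch compiles and can be exercised without Windows headers.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-ins with the same numeric values as the Windows SDK definitions,
// declared here only so this sketch compiles without <objbase.h>.
using HResult = std::int32_t;
constexpr HResult kS_OK = 0;
constexpr HResult kS_FALSE = 1;  // COM was already initialized on this thread
constexpr HResult kRPC_E_CHANGED_MODE = static_cast<HResult>(0x80010106u);
constexpr unsigned kCoinitMultithreaded = 0x0;
constexpr unsigned kCoinitApartmentThreaded = 0x2;

constexpr bool Succeeded(HResult hr) { return hr >= 0; }

// RAII guard: initializes COM for the current thread and balances it when the
// guard goes out of scope. `Api` is a hypothetical policy; on Windows its
// Init would call CoInitializeEx(nullptr, flags) and Uninit CoUninitialize().
template <typename Api>
class ScopedComInit {
public:
    explicit ScopedComInit(unsigned flags) : hr_(Api::Init(flags)) {}

    ~ScopedComInit() {
        // S_OK and S_FALSE both need a matching uninit; a failure such as
        // RPC_E_CHANGED_MODE must not be balanced.
        if (Succeeded(hr_)) Api::Uninit();
    }

    ScopedComInit(const ScopedComInit&) = delete;
    ScopedComInit& operator=(const ScopedComInit&) = delete;

    HResult result() const { return hr_; }
    bool ok() const { return Succeeded(hr_); }

private:
    HResult hr_;
};

// Hypothetical test double that imitates one thread's COM initialization state.
struct FakeComApi {
    static int initCount;   // outstanding successful inits
    static unsigned mode;   // concurrency flags used by the first init

    static HResult Init(unsigned flags) {
        if (initCount == 0) {
            mode = flags;
            ++initCount;
            return kS_OK;
        }
        if (flags != mode) return kRPC_E_CHANGED_MODE;  // different apartment type
        ++initCount;
        return kS_FALSE;  // nested init: still counts and must be balanced
    }

    static void Uninit() {
        assert(initCount > 0 && "uninit without a matching init");
        --initCount;
    }
};
int FakeComApi::initCount = 0;
unsigned FakeComApi::mode = 0;
```

Creating the guard as a local at the top of each thread procedure that touches COM (on the client as well as the server side) ensures the uninit runs on the same thread as the init, even on early returns.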
-
Can you please share this solution in more detail? I am seeing a similar kind of issue and am unable to drill down to the root cause and a solution. --Ashish