MultiByteToWideChar crashes out on longer strings [modified]

RichardBrock

I've got a code segment below that takes a block of chars read in from an RSS XML file and converts it to the Unicode equivalent using a code page (the code page ID is determined beforehand by scanning for the encoding ID in the XML). Sometimes it works, sometimes it crashes the application. I'm using the buffer requirement returned by 'MultiByteToWideChar' to allocate the buffer required (+1 because the block of chars I'm passing does not include the 0 terminator). What's really odd is that on small strings it works fine, but on a 216 character string it crashes. If I extend the buffer allocation by 1 then it works all the time. I should not have to kludge buffer allocation to make things work.

    int nSizeReq = MultiByteToWideChar(m_nCPID, 0, (const char*)m_pChars, nBlockLength, 0, 0);
    TCHAR* pszConverted = new TCHAR[nSizeReq+1];
    _tcsnset(pszConverted, 0, nSizeReq+1);
    MultiByteToWideChar(m_nCPID, 0, (const char*)m_pChars, nBlockLength, pszConverted, nSizeReq);
    // up to here it always works, but the next step crashes because pszConverted has damage
    // past its memory allocation
    CString strConverted = pszConverted;

Any ideas why this could be happening?

PS: it runs just great in debug, and if I run it in release mode whilst in Visual Studio it also works. Run the release by itself and *boom*.

[edit] Coded in C++ (MFC application) using Visual Studio 2008. Unicode is defined. Testing on Vista.

modified on Friday, March 13, 2009 10:51 AM

led mike

First thing I notice is the code you posted does not check the return value of MultiByteToWideChar. Therefore your next operations are running on blind faith. This is not usually considered a Software Development Best Practice.
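As a rough illustration of the point about return values, here is a minimal sketch (not code from the thread) that checks both MultiByteToWideChar calls and builds the result from the returned character count instead of relying on a terminator. `ToWide` is a hypothetical helper name, and the non-Windows branch is only a stand-in so the sketch compiles elsewhere; that branch is valid for 7-bit ASCII input only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: converts `len` bytes at `src` from code page
// `codePage` to wide characters. Returns false if the conversion fails.
bool ToWide(unsigned codePage, const char* src, int len, std::wstring& out)
{
    out.clear();
    if (src == nullptr || len <= 0)
        return len == 0;  // empty input succeeds; NULL or negative length fails
#ifdef _WIN32
    // First call: ask how many wide characters are needed.
    int needed = MultiByteToWideChar(codePage, 0, src, len, nullptr, 0);
    if (needed <= 0)
        return false;     // 0 signals failure; GetLastError() gives the reason
    std::vector<wchar_t> buf(static_cast<std::size_t>(needed));
    // Second call: convert, and check the count actually written.
    int written = MultiByteToWideChar(codePage, 0, src, len, buf.data(), needed);
    if (written <= 0)
        return false;
    // Copy exactly `written` characters; no terminator is assumed.
    out.assign(buf.data(), static_cast<std::size_t>(written));
    return true;
#else
    // Stand-in for other platforms (assumption: 7-bit ASCII input only).
    (void)codePage;
    for (int i = 0; i < len; ++i)
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(src[i])));
    return true;
#endif
}
```

In the MFC code from the original post, the same idea would be to keep the second call's return value and construct the string from pointer plus length, for example `CString strConverted(pszConverted, nConverted);`, so that nothing depends on a terminating null being present.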

Akt_4_U

How are you calculating this nBlockLength?

prvn

RichardBrock

Yep, you make a good point. I updated the code to check the value:

    int nSizeReq = MultiByteToWideChar(m_nCPID, 0, (const char*)m_pChars, nBlockLength, 0, 0);
    TCHAR* pszConverted = new TCHAR[nSizeReq+1];
    _tcsnset(pszConverted, 0, nSizeReq+1);
    int nConverted = MultiByteToWideChar(m_nCPID, 0, (const char*)m_pChars, nBlockLength, pszConverted, nSizeReq);
    int nTest = wcslen(pszConverted);

The results: nConverted = 214, nSizeReq = 214, but nTest = 206. Weird.

RichardBrock

I have the file open using CreateFile, and I use ReadFile to locate the start and end tags in the XML file for the field, e.g. <description>.....</description> (Internet RSS news feed in Arabic). nBlockLength is the number of characters extracted between > and <, and the m_pChars buffer holds the actual character data.

led mike

RichardBrock wrote:

Weird.

What are you compiling to? Try int nTest = _tcslen(pszConverted);

RichardBrock

The project outputs a 32-bit Windows executable (target platforms are XP and Vista), and MFC is statically linked. I tried _tcslen as you suggested, same result.

I'm testing from a live RSS feed, so the news item length has changed, but here's the latest output from my OutputDebugString call placed just after the 2nd MultiByteToWideChar call:

'return value = 140 (wcslen is 308) nSizeReq = 140 nBlockLength = 140'

So you can see the function call returns 140, the size requested was 140, and the block length read from the file is 140. But the converted string is 308 in length, obviously overrunning the memory allocated to it. Do you think compiler optimization could be causing a problem? I'm compiling with 'Enable link-time code generation (/GL)'.

Btw, a previous call for the preceding news item yields: 'return value = 121 (wcslen is 121) nSizeReq = 121 nBlockLength = 121'
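One plausible reading of a wcslen result larger than the returned count (an interpretation, not something confirmed in the thread): when the source length is passed explicitly, MultiByteToWideChar converts exactly that many bytes and does not append a terminating null, so wcslen keeps reading until it happens to meet a zero. The `_tcsnset` call may not help either, because as documented it sets at most the current string length, which is undefined for a freshly allocated, uninitialized buffer. The sketch below uses a hypothetical stand-in, `CopyCounted`, in place of the real conversion so it compiles anywhere, and shows the buffer being terminated explicitly at the returned count.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <vector>

// Hypothetical stand-in for a counted conversion: writes exactly `count`
// characters into `dst` and, like MultiByteToWideChar with an explicit
// source length, does NOT append a terminating null.
int CopyCounted(const wchar_t* src, int count, wchar_t* dst)
{
    for (int i = 0; i < count; ++i)
        dst[i] = src[i];
    return count;
}

// Converts `count` characters and returns the length wcslen reports
// once the buffer has been terminated by hand.
std::size_t TerminatedLength(const wchar_t* src, int count)
{
    // Fill with a non-zero character to imitate an uninitialized heap
    // block: nothing guarantees a zero after the converted text.
    std::vector<wchar_t> buf(static_cast<std::size_t>(count) + 1, L'?');

    int written = CopyCounted(src, count, buf.data());

    // Terminate at the position given by the returned count. The +1
    // slot in the allocation exists for exactly this element.
    buf[static_cast<std::size_t>(written)] = L'\0';

    return std::wcslen(buf.data());
}
```

With the terminator written by hand, the length reported by wcslen matches the returned count whatever the allocator left in the buffer. In the posted code the corresponding line would be `pszConverted[nConverted] = 0;` after the second call.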

led mike

You did not answer my question, I guess I didn't make it clear but I assumed you had some knowledge about what you were doing.

led mike wrote:

What are you compiling to?

Since the subject of this discussion is character sets, that's what that question is about. Are you compiling to _UNICODE or what? You should check out the example of MultiByteToWideChar in this article. Try to look for the differences in your code; there is an obvious difference. Next, I strongly urge you to study this subject thoroughly before you attempt your implementation. My experience is that working with conversion requires a sound understanding of this subject. I believe there are great articles here on Code Project that cover this topic well.