ANSI to UTF-8

Souldrift

Hi there, I found a code snippet on the web that converts an ANSI string to UTF-8. I implemented it in my own code and it works:

// to UTF-8
char text[1024]={0};
WCHAR w[1024]={0};
int erg=0;

strcpy(text, m_pData);

erg=MultiByteToWideChar(CP_ACP, 0, text, -1, w, sizeof(w) / sizeof(WCHAR)); // ANSI to UNICODE
erg=WideCharToMultiByte(CP_UTF8, 0, w, -1, text, sizeof(text), 0, 0); // UNICODE to UTF-8

After that, 'text' is nicely UTF-8 encoded. Now I was wondering: why doesn't the following (slightly altered) code work? I just used a char* instead of a char[]:

// to UTF-8
char* text = new char[1024];
WCHAR w[1024]={0};
int erg=0;

strcpy(text, m_pData);

erg=MultiByteToWideChar(CP_ACP, 0, text, -1, w, sizeof(w) / sizeof(WCHAR)); // ANSI to UNICODE
erg=WideCharToMultiByte(CP_UTF8, 0, w, -1, text, sizeof(text), 0, 0); // UNICODE to UTF-8

Thanks,
Souldrift

tolw

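In the first example, sizeof(text) gives the size of the whole array (1024 bytes). For a pointer there is no such luck: sizeof gives only the size of the pointer itself (typically 4 or 8 bytes), so WideCharToMultiByte is told the output buffer is that small, and for anything but a very short string the conversion fails. Try passing the size of the allocated memory directly (preferably by using a #define):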

#define ARRAY_SIZE 1024
erg=MultiByteToWideChar(CP_ACP, 0, text, -1, w, ARRAY_SIZE); // ANSI to UNICODE
erg=WideCharToMultiByte(CP_UTF8, 0, w, -1, text, ARRAY_SIZE, 0, 0); // UNICODE to UTF-8
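To see the difference without any Win32 calls, here is a minimal portable sketch (the two function names are made up for illustration): sizeof on an array yields the size of the whole array, while sizeof on a pointer yields only the size of the pointer, no matter how much memory it points to.

```cpp
#include <cassert>
#include <cstddef>

#define ARRAY_SIZE 1024

// sizeof on a real array: the whole array, in bytes.
std::size_t size_of_array() {
    char text[ARRAY_SIZE] = {0};
    return sizeof(text);              // ARRAY_SIZE * sizeof(char) == 1024
}

// sizeof on a pointer to a heap buffer of the same length:
// only the size of the pointer itself, not of the buffer.
std::size_t size_of_pointer() {
    char* text = new char[ARRAY_SIZE];
    std::size_t n = sizeof(text);     // same as sizeof(char*)
    delete[] text;
    return n;
}
```

On a typical 64-bit build size_of_pointer() returns 8, which is the buffer size WideCharToMultiByte is told about in the char* version.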

Nibu babu thomas

Souldrift wrote:

char* text = new char[1024]; WCHAR w[1024]={0}; int erg=0;

Try to avoid hard-coding numbers directly; store them in a constant instead.

const int SIZE = 1024; // number of elements in each buffer
char* text = new char[SIZE];
WCHAR w[SIZE]={0};

erg=MultiByteToWideChar(CP_ACP, 0, text, -1, w, SIZE); // ANSI to UNICODE
erg=WideCharToMultiByte(CP_UTF8, 0, w, -1, text, SIZE, 0, 0); // UNICODE to UTF-8

That way, if you change SIZE, the code keeps working.
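One constant works for both calls because each length parameter counts in its own unit: MultiByteToWideChar's last parameter (cchWideChar) counts WCHARs, while WideCharToMultiByte's cbMultiByte counts bytes, and each buffer holds SIZE elements of its own type. The portable sketch below shows that bookkeeping. It uses wchar_t instead of WCHAR (on Windows, WCHAR is a typedef for wchar_t) so it compiles without <windows.h>; the struct and function names are hypothetical, and the constant is called BUF_SIZE because <windows.h> already declares a type named SIZE, which clashes with a namespace-scope constant of that name.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t BUF_SIZE = 1024;  // one place to change both buffer lengths

// Stand-ins for the buffers in the post.
struct Buffers {
    char    text[BUF_SIZE];
    wchar_t w[BUF_SIZE];                // WCHAR on Windows
};

// What to pass as cchWideChar: a count of wide characters, not bytes.
std::size_t wide_count(const Buffers& b) { return sizeof(b.w) / sizeof(b.w[0]); }

// What to pass as cbMultiByte: a count of bytes (equal to the element count for char).
std::size_t narrow_bytes(const Buffers& b) { return sizeof(b.text); }
```

Passing plain sizeof(w) as cchWideChar would over-report the wide buffer by a factor of sizeof(wchar_t), which is why the original post divides by sizeof(WCHAR).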

Nibu babu thomas
Microsoft MVP for VC++
Code must be written to be read, not by the compiler, but by another human being.
Programming Blog: http://nibuthomas.wordpress.com

Souldrift

Thanks to both of you, that works. The problem is that I wanted to avoid a fixed size and instead use the actual length of m_pData (the original text). Is that possible?
Souldrift

tolw

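Since m_pData is a null-terminated string, you can size the buffers at run time. Note that strlen(m_pData) + 1 is not always enough for the UTF-8 result: a character such as 'ä' takes 1 byte in Windows-1252 but 2 bytes in UTF-8, so the UTF-8 output should not be written back into the ANSI-sized buffer. Both functions return the required buffer size (including the terminating \0) when you pass 0 as the output size, so call each one twice:

// ANSI to UNICODE: first ask how many WCHARs are needed, then convert
int wlen = MultiByteToWideChar(CP_ACP, 0, m_pData, -1, NULL, 0);
WCHAR *w = new WCHAR[wlen];
MultiByteToWideChar(CP_ACP, 0, m_pData, -1, w, wlen);

// UNICODE to UTF-8: first ask how many bytes are needed, then convert
int ulen = WideCharToMultiByte(CP_UTF8, 0, w, -1, NULL, 0, 0, 0);
char *text = new char[ulen];
WideCharToMultiByte(CP_UTF8, 0, w, -1, text, ulen, 0, 0);

delete[] w; // text now holds the UTF-8 string; delete[] it when you are done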

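Just make sure m_pData is null-terminated! Passing -1 as the input length tells both functions to process the string up to and including its terminating \0, so the converted string is terminated as well.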
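The "ask for the size, allocate, then convert" pattern, and the reason a UTF-8 buffer can need more room than its ANSI source, can be tried out without Windows. The sketch below uses a hypothetical helper, latin1_to_utf8 (not a Win32 API), that assumes ISO-8859-1 input (which agrees with Windows-1252 except for bytes 0x80 to 0x9F) and mimics WideCharToMultiByte's convention of returning the required size, including the \0, when the output size is 0.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: converts a NUL-terminated ISO-8859-1 string to UTF-8.
// dstSize == 0: returns the bytes needed, including '\0' (dst is not touched).
// Otherwise: returns the bytes written including '\0', or 0 if dst is too small.
int latin1_to_utf8(const char* src, char* dst, int dstSize) {
    const unsigned char* in = reinterpret_cast<const unsigned char*>(src);

    // Pass 1: count output bytes. ASCII stays 1 byte; 0x80..0xFF become 2.
    int needed = 1;  // room for '\0'
    for (const unsigned char* p = in; *p; ++p)
        needed += (*p < 0x80) ? 1 : 2;

    if (dstSize == 0) return needed;  // size query
    if (dstSize < needed) return 0;   // buffer too small

    // Pass 2: encode U+0080..U+00FF as 110000xx 10xxxxxx.
    char* out = dst;
    for (const unsigned char* p = in; *p; ++p) {
        if (*p < 0x80) {
            *out++ = static_cast<char>(*p);
        } else {
            *out++ = static_cast<char>(0xC0 | (*p >> 6));
            *out++ = static_cast<char>(0x80 | (*p & 0x3F));
        }
    }
    *out = '\0';
    return needed;
}

// Two-pass usage: query the size, allocate exactly that, convert, free.
std::string to_utf8(const char* latin1) {
    int n = latin1_to_utf8(latin1, nullptr, 0);
    char* buf = new char[n];
    latin1_to_utf8(latin1, buf, n);
    std::string result(buf);
    delete[] buf;
    return result;
}
```

For example, to_utf8("\xE4") (a Latin-1 'ä') yields the two bytes C3 A4, so the UTF-8 result needs 3 bytes including \0 where the input needed only 2.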