ANSI to UTF-8
-
Hi there, I found a code snippet on the web that converts an ANSI string to UTF-8. I implemented it in my own code and it works:
////////////
// to UTF-8
char text[1024]={0};
WCHAR w[1024]={0};
int erg=0;
strcpy(text, m_pData);
erg=MultiByteToWideChar(CP_ACP, 0, text, -1, w, sizeof(w) / sizeof(WCHAR)); // ANSI to UNICODE
erg=WideCharToMultiByte(CP_UTF8, 0, w, -1, text, sizeof(text), 0, 0); // UNICODE to UTF-8
//
////////////
After that, 'text' is nicely UTF-8 encoded. Now I was wondering: why doesn't the following (slightly altered) code work? I just used a char* instead of a char[]:
////////////
// to UTF-8
char* text = new char[1024];
WCHAR w[1024]={0};
int erg=0;
strcpy(text, m_pData);
erg=MultiByteToWideChar(CP_ACP, 0, text, -1, w, sizeof(w) / sizeof(WCHAR)); // ANSI to UNICODE
erg=WideCharToMultiByte(CP_UTF8, 0, w, -1, text, sizeof(text), 0, 0); // UNICODE to UTF-8
//
////////////
Thanks,
Souldrift
-
My guess is that in the first example the sizeof operator can calculate the size of the array. With a pointer, no such luck: sizeof(text) is then the size of the pointer itself (typically 4 or 8 bytes), not the 1024 bytes it points to. Try passing the size of the allocated memory directly (preferably by using a #define):
#define ARRAY_SIZE 1024
erg=MultiByteToWideChar(CP_ACP, 0, text, -1, w, ARRAY_SIZE); // ANSI to UNICODE
erg=WideCharToMultiByte(CP_UTF8, 0, w, -1, text, ARRAY_SIZE, 0, 0); // UNICODE to UTF-8
-
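The sizeof difference described in this reply can be checked with a small portable sketch (no Win32 calls; the function names are hypothetical, chosen for illustration). Applied to an array, sizeof yields the whole buffer in bytes; applied to a pointer returned by new[], it yields only the size of the pointer, no matter how much memory was allocated.

```cpp
#include <cstddef>

// sizeof on a real array: the full buffer size in bytes (1024 here).
std::size_t array_buffer_size() {
    char text[1024] = {0};
    return sizeof(text);
}

// sizeof on a pointer to a heap buffer: only sizeof(char*),
// regardless of the 1024 bytes allocated with new[].
std::size_t pointer_buffer_size() {
    char* text = new char[1024];
    std::size_t n = sizeof(text);
    delete[] text;
    return n;
}
```

array_buffer_size() returns 1024, while pointer_buffer_size() returns sizeof(char*), typically 4 on 32-bit and 8 on 64-bit builds. That is why the char* version passes far too small an output size to WideCharToMultiByte.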
Souldrift wrote:
char* text = new char[1024];
WCHAR w[1024]={0};
int erg=0;
Try to avoid hard-coding numbers directly; store them in a constant instead.
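const int SIZE = 1024; // number of elements (chars in text, WCHARs in w)
char* text = new char[SIZE];
WCHAR w[SIZE]={0};
int erg=0;
strcpy(text, m_pData); // as in the original snippet
erg=MultiByteToWideChar(CP_ACP, 0, text, -1, w, SIZE); // ANSI to UNICODE
erg=WideCharToMultiByte(CP_UTF8, 0, w, -1, text, SIZE, 0, 0); // UNICODE to UTF-8
// ... use text, then: delete[] text;
That way, when you change SIZE, this code keeps working.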
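A related safeguard, sketched here under the assumption that you want the compiler to catch the array-versus-pointer mistake: a template that only accepts real arrays. The name countof is hypothetical; MSVC ships a similar _countof macro, and C++17 offers std::size in <iterator>. Note also that MultiByteToWideChar takes the output size in WCHARs, while WideCharToMultiByte takes it in bytes, so an element count and a byte count are only the same thing for char buffers.

```cpp
#include <cstddef>

// Hypothetical helper: returns the element count N of a real array T[N].
// A plain pointer cannot bind to `const T (&)[N]`, so countof(ptr) fails
// to compile instead of silently returning sizeof(pointer).
template <typename T, std::size_t N>
constexpr std::size_t countof(const T (&)[N]) noexcept {
    return N;
}

// Example buffers mirroring the thread: same element count,
// different byte sizes (wchar_t stands in for WCHAR here).
char    text_buf[1024];
wchar_t wide_buf[1024];
```

countof(text_buf) and countof(wide_buf) both give 1024, whereas sizeof(wide_buf) is 1024 * sizeof(wchar_t) bytes; passing a char* from new[] to countof is rejected at compile time.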
Nibu babu thomas Microsoft MVP for VC++ Code must be written to be read, not by the compiler, but by another human being. Programming Blog: http://nibuthomas.wordpress.com
-
Thanks to both of you, that works. The problem is that I wanted to avoid a constant size and instead use the variable size of m_pData (the original text). Is that possible?
Souldrift
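Since m_pData is a null-terminated string, you can size the buffers with strlen. Keep in mind that the UTF-8 result can be longer than the ANSI input (non-ASCII characters take 2 or 3 bytes in UTF-8), so it needs its own, larger buffer instead of reusing text. Try this:
size_t len = strlen( m_pData );
WCHAR *w = new WCHAR[len + 1];      // an ANSI string never yields more WCHARs than it has bytes; +1 for the terminating \0
char *text = new char[3 * len + 1]; // each WCHAR needs at most 3 UTF-8 bytes; +1 for the terminating \0
int erg=0;
erg=MultiByteToWideChar(CP_ACP, 0, m_pData, -1, w, (int)(len + 1)); // ANSI to UNICODE
erg=WideCharToMultiByte(CP_UTF8, 0, w, -1, text, (int)(3 * len + 1), 0, 0); // UNICODE to UTF-8
delete[] w;
// ... use text, then: delete[] text;
Just make sure that m_pData is null-terminated: with -1 as the input length, both functions read up to and including the terminating \0, and erg is 0 if an output buffer turns out to be too small.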
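Why the UTF-8 output can outgrow the ANSI input: in Windows-1252, for example, 'ä' (0xE4) and '€' (0x80) are one byte each, but in UTF-8 they need 2 and 3 bytes. The portable sketch below uses a hypothetical helper (not a Win32 function) that counts the UTF-8 bytes a UTF-16 string needs, excluding the terminator. It shows that a single WCHAR never needs more than 3 bytes, and a surrogate pair (2 WCHARs) needs 4.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of UTF-8 bytes needed to encode a UTF-16
// string, not counting a terminating '\0'. Lone surrogates are counted
// as 3 bytes, the size of the U+FFFD replacement character.
std::size_t utf8_length(const std::u16string& s) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t c = s[i];
        if (c < 0x80) {
            bytes += 1;  // ASCII
        } else if (c < 0x800) {
            bytes += 2;  // e.g. U+00E4 'ä'
        } else if (c >= 0xD800 && c <= 0xDBFF && i + 1 < s.size() &&
                   s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            bytes += 4;  // surrogate pair: 4 bytes for 2 UTF-16 units
            ++i;
        } else {
            bytes += 3;  // rest of the BMP, e.g. U+20AC '€'
        }
    }
    return bytes;
}
```

On Windows, the exact size can also be queried instead of estimated: calling WideCharToMultiByte (or MultiByteToWideChar) with a NULL output buffer and an output size of 0 returns the required buffer size, which includes the terminating null when the input length is -1.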