ANSI to UTF-8

    Souldrift wrote (#1):

    Hi there,

    I found a code snippet on the web that converts an ANSI string to UTF-8. I implemented it in my own code and it works:

    // to UTF-8
    char text[1024]={0};
    WCHAR w[1024]={0};
    int erg=0;
    strcpy(text, m_pData);
    erg=MultiByteToWideChar(CP_ACP, 0, text, -1, w, sizeof(w) / sizeof(WCHAR)); // ANSI to UNICODE
    erg=WideCharToMultiByte(CP_UTF8, 0, w, -1, text, sizeof(text), 0, 0); // UNICODE to UTF-8

    After that, 'text' is UTF-8 formatted just nicely. Now I was wondering: why doesn't the following (slightly altered) code work? I just created a char* instead of a char[]:

    // to UTF-8
    char* text = new char[1024];
    WCHAR w[1024]={0};
    int erg=0;
    strcpy(text, m_pData);
    erg=MultiByteToWideChar(CP_ACP, 0, text, -1, w, sizeof(w) / sizeof(WCHAR)); // ANSI to UNICODE
    erg=WideCharToMultiByte(CP_UTF8, 0, w, -1, text, sizeof(text), 0, 0); // UNICODE to UTF-8

    Thanks,
    Souldrift

      tolw wrote (#2):

      My guess is that in the first example the sizeof operator yields the size of the whole array (1024 bytes). With a pointer there is no such luck: sizeof(text) yields only the size of the pointer itself (typically 4 or 8 bytes), so WideCharToMultiByte is told the output buffer is far too small. Try passing the size of the allocated memory directly (preferably by using a #define):

      #define ARRAY_SIZE 1024
      erg=MultiByteToWideChar(CP_ACP, 0, text, -1, w, ARRAY_SIZE); // ANSI to UNICODE
      erg=WideCharToMultiByte(CP_UTF8, 0, w, -1, text, ARRAY_SIZE, 0, 0); // UNICODE to UTF-8
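
      To make the sizeof difference concrete, here is a minimal portable sketch. It does not call any Win32 function; the helper names (array_count, size_of_stack_array, size_of_heap_pointer, check_sizes) are made up for illustration and are not part of the original code.

```cpp
#include <cassert>
#include <cstddef>

// Element count of a real array; N is deduced at compile time.
// Passing a pointer instead of an array does not compile.
template <typename T, std::size_t N>
constexpr std::size_t array_count(T (&)[N]) { return N; }

// For a stack array, sizeof sees the whole array in bytes.
inline std::size_t size_of_stack_array() {
    char text[1024] = {0};
    (void)text;                   // only its size is of interest here
    return sizeof(text);          // 1024
}

// For a heap allocation, sizeof sees only the pointer variable.
inline std::size_t size_of_heap_pointer() {
    char* text = new char[1024];
    std::size_t n = sizeof(text); // sizeof(char*), typically 4 or 8
    delete[] text;
    return n;
}

// Element count of a wide-character array, independent of sizeof(wchar_t).
inline std::size_t wide_array_count() {
    wchar_t w[1024] = {0};
    return array_count(w);        // 1024
}

// Usage: these checks hold on any conforming compiler.
inline void check_sizes() {
    assert(size_of_stack_array() == 1024);
    assert(size_of_heap_pointer() == sizeof(char*));
    assert(wide_array_count() == 1024);
}
```

      So in the char* version the last call was effectively given a buffer size of sizeof(char*) bytes, which is why it stopped working once the array became a pointer.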

        Nibu babu thomas wrote (#3):

        Souldrift wrote:

        char* text = new char[1024]; WCHAR w[1024]={0}; int erg=0;

        Try to avoid hard-coding numbers directly; store the size in a constant instead.

        const int SIZE = 1024; // Number of elements in each buffer
        char* text = new char[SIZE];
        WCHAR w[SIZE]={0};

        erg=MultiByteToWideChar(CP_ACP, 0, text, -1, w, SIZE); // ANSI to UNICODE
        erg=WideCharToMultiByte(CP_UTF8, 0, w, -1, text, SIZE, 0, 0); // UNICODE to UTF-8

        So when you change SIZE, this code still keeps working.

        Nibu babu thomas
        Microsoft MVP for VC++
        Code must be written to be read, not by the compiler, but by another human being.
        Programming Blog: http://nibuthomas.wordpress.com

          Souldrift wrote (#4):

          Thanks to both of you, that works. The problem is that I wanted to avoid a constant size and instead use the variable size of m_pData (the original text). Is that possible?

          Souldrift

            tolw wrote (#5):

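            Since m_pData is a string, you can size the buffers at run time. Note that strlen( m_pData ) + 1 is not a safe size for the UTF-8 output, because a non-ASCII character takes more than one byte in UTF-8. Calling each conversion function with an output size of 0 makes it return the required size instead (including the End-of-String \0, because the input length is -1). Try this:

            int wlen = MultiByteToWideChar(CP_ACP, 0, m_pData, -1, NULL, 0); // required WCHARs, incl. \0
            WCHAR *w = new WCHAR[wlen];
            MultiByteToWideChar(CP_ACP, 0, m_pData, -1, w, wlen); // ANSI to UNICODE

            int ulen = WideCharToMultiByte(CP_UTF8, 0, w, -1, NULL, 0, 0, 0); // required bytes, incl. \0
            char *text = new char[ulen];
            WideCharToMultiByte(CP_UTF8, 0, w, -1, text, ulen, 0, 0); // UNICODE to UTF-8

            delete[] w; // 'text' must also be released with delete[] once it is no longer needed

            Just make sure that m_pData is NULL terminated!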
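
            One thing to keep in mind when sizing the output buffer: UTF-8 text can take more bytes than the ANSI original, because every non-ASCII character becomes two or more bytes. The sketch below shows that growth with a small hand-written converter. It is a portable stand-in, not the Win32 API: it assumes the input is ISO-8859-1 (Latin-1), which is not necessarily what CP_ACP means on a given machine, and the function name latin1_to_utf8 is made up for this example.

```cpp
#include <cassert>
#include <string>

// Converts ISO-8859-1 (Latin-1) bytes to UTF-8.
// Assumption: the input is Latin-1, where each byte value equals its
// Unicode code point. Real ANSI code pages may map bytes differently.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);   // worst case: every byte becomes two
    for (unsigned char c : in) {
        if (c < 0x80) {
            // ASCII stays a single byte.
            out.push_back(static_cast<char>(c));
        } else {
            // Code points 0x80..0xFF need a lead byte and a continuation byte.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    assert(out.size() <= in.size() * 2);  // the bound the reserve relies on
    return out;
}
```

            For example, the 4-byte Latin-1 string "caf\xE9" comes out as 5 bytes of UTF-8, so an output buffer of strlen( m_pData ) + 1 bytes would already be one byte short for it.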
