WideCharToMultiByte vs Encoding::UTF8->GetBytes

    John Schroedl wrote (#1):

    I'm looking into perf tuning in our application, and one hot spot we've identified is converting many strings between String^ and a native array of UTF-8 chars. Currently, I use code similar to this:

    array<Byte>^ byteArray = System::Text::Encoding::UTF8->GetBytes(str);
    pin_ptr<Byte> p = &byteArray[0];

    I then memcpy from p into my own storage block. Has anyone compared Encoding::UTF8->GetBytes() to pinning a String^ and using WideCharToMultiByte(CP_UTF8, ...)? I suspect WideCharToMultiByte will be faster even if I call it twice (once to get the byte count, once to convert). I will investigate today, but I thought there might be a war story or two out there. Any lessons learned? John

      John Schroedl wrote (#2):

      Update: Well, my initial experiment showed that YES, it is noticeably faster to use WideCharToMultiByte(). The speedup varies with the language of the text being converted, of course. The time to run my tests was reduced by: English 13%, German 18%, Japanese 16%, Chinese 12%. The gist of my code is now:

      String^ str = "...the string to convert...";

      // PtrToStringChars is declared in <vcclr.h>
      pin_ptr<const wchar_t> unicode16 = PtrToStringChars(str);

      // Perf Note: Surprisingly, using -1 for length is MUCH faster than using a precomputed str->Length+1
      int const cbNeeded = WideCharToMultiByte(CP_UTF8, 0, unicode16, -1, nullptr, 0, nullptr, nullptr);

      auto converted = make_unique<char[]>(cbNeeded);

      int const cbConverted = WideCharToMultiByte(CP_UTF8, 0, unicode16, -1, converted.get(), cbNeeded, nullptr, nullptr);

      // ... use converted ...

      It was a surprise that passing -1 for the length parameter to WideCharToMultiByte resulted in an even faster conversion! I hope this helps someone out there, and I'm still interested in responses from any devs doing similar work. John
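For readers without a Windows toolchain at hand, the two-call pattern above (ask for the byte count first, then convert into a buffer of that size) can be sketched in portable C++. The helper names `utf16_to_utf8` and `to_utf8` are hypothetical and are not part of the Win32 API. The encoder is a simplified stand-in that only mimics the calling convention of WideCharToMultiByte(CP_UTF8, ...), and real code would call the Win32 function itself. The sketch also shows two details that are easy to miss: a length of -1 makes the returned count include the terminating null, and the number of UTF-8 bytes per character depends on the script.

```cpp
#include <cstring>
#include <memory>
#include <string>

// Hypothetical stand-in for WideCharToMultiByte(CP_UTF8, ...), for illustration only.
// srcLen == -1: input is null-terminated and the terminator is converted too.
// dstCap == 0:  nothing is written, only the required byte count is returned.
// Returns the number of bytes needed/written, or 0 if dst is too small.
int utf16_to_utf8(const char16_t* src, int srcLen, char* dst, int dstCap)
{
    if (srcLen < 0) {
        srcLen = static_cast<int>(std::char_traits<char16_t>::length(src)) + 1;
    }
    int out = 0;
    for (int i = 0; i < srcLen; ++i) {
        char32_t cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < srcLen &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; // unpaired surrogate: emit the replacement character
        }

        unsigned char buf[4];
        int n = 0;
        if (cp < 0x80) {            // ASCII: 1 byte
            buf[n++] = static_cast<unsigned char>(cp);
        } else if (cp < 0x800) {    // e.g. Latin letters with diacritics: 2 bytes
            buf[n++] = static_cast<unsigned char>(0xC0 | (cp >> 6));
            buf[n++] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // e.g. most CJK characters: 3 bytes
            buf[n++] = static_cast<unsigned char>(0xE0 | (cp >> 12));
            buf[n++] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
            buf[n++] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        } else {                    // supplementary planes: 4 bytes
            buf[n++] = static_cast<unsigned char>(0xF0 | (cp >> 18));
            buf[n++] = static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F));
            buf[n++] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
            buf[n++] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
        }

        if (dstCap != 0) {
            if (out + n > dstCap) return 0; // buffer too small
            std::memcpy(dst + out, buf, static_cast<std::size_t>(n));
        }
        out += n;
    }
    return out;
}

// Two-pass conversion in the same shape as the snippet above.
std::string to_utf8(const std::u16string& s)
{
    // Pass 1: query the size. With -1, the count includes the terminating null.
    const int cbNeeded = utf16_to_utf8(s.c_str(), -1, nullptr, 0);
    auto converted = std::make_unique<char[]>(static_cast<std::size_t>(cbNeeded));
    // Pass 2: convert into the exactly sized buffer.
    const int cbConverted = utf16_to_utf8(s.c_str(), -1, converted.get(), cbNeeded);
    // Drop the terminator that the -1 length caused to be counted.
    return std::string(converted.get(), static_cast<std::size_t>(cbConverted - 1));
}
```

Because the count returned for a -1 length already covers the terminator, the buffer from the first call is exactly large enough for the second call and the result is null-terminated. With an explicit length, only that many UTF-16 units are converted and no terminator is appended unless it is included in the length.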

        Richard Andrew x64 wrote (#3):

        Thanks for posting the result. This is valuable information. :)

        The difficult we do right away... ...the impossible takes slightly longer.
