WideCharToMultiByte vs Encoding::UTF8->GetBytes
-
I'm looking into perf tuning in our application, and one area we've identified is the conversion of many strings between String^ and a native array of UTF-8 chars. Currently, I use code similar to this:
array<System::Byte>^ byteArray = System::Text::Encoding::UTF8->GetBytes(str);
pin_ptr<System::Byte> p = &byteArray[0];
I then proceed to memcpy from p to my own storage block. Has anyone compared Encoding::UTF8->GetBytes() to pinning a String^ and using WideCharToMultiByte(CP_UTF8, ...)? I suspect it will be faster to use WideCharToMultiByte even if I call it twice (once to get the byte count, once to convert). I will investigate today, but I thought there might be a war story or two out there. Any lessons learned? John
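For context, a minimal self-contained sketch of that current path; the helper name, destination buffer, and size check are illustrative only, not from the real code:
#include <cstring>   // memcpy
using namespace System;
// Copy the UTF-8 encoding of str into a caller-owned native buffer.
void CopyUtf8(String^ str, char* dest, int destSize)
{
    array<Byte>^ byteArray = System::Text::Encoding::UTF8->GetBytes(str);
    if (byteArray->Length == 0 || byteArray->Length > destSize)
        return; // real code would grow the buffer or report an error
    pin_ptr<Byte> p = &byteArray[0];          // pin so the GC can't move the array during the copy
    std::memcpy(dest, p, byteArray->Length);  // copy into the caller's native storage block
}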
Update: Well, my initial experiment proved to me that YES, it's much faster to use WideCharToMultiByte(). The speedup varies with the language of the text I'm converting, of course. The time to run my tests was reduced by: English 13%, German 18%, Japanese 16%, Chinese 12%. The gist of my code is now:
// needs <vcclr.h> for PtrToStringChars and <memory> for std::make_unique
String^ str = "...the string to convert...";
pin_ptr<const wchar_t> unicode16 = PtrToStringChars(str);
// Perf Note: Surprisingly, using -1 for length is MUCH faster than using a precomputed str->Length+1.
// With -1, cbNeeded (and the converted output) includes the terminating null.
int const cbNeeded = WideCharToMultiByte(CP_UTF8, 0, unicode16, -1, nullptr, 0, nullptr, nullptr);
auto converted = std::make_unique<char[]>(cbNeeded);
int const cbConverted = WideCharToMultiByte(CP_UTF8, 0, unicode16, -1, converted.get(), cbNeeded, nullptr, nullptr);
// ... use converted ...
It was a surprise that passing -1 for the length parameter to WCtoMB resulted in an even faster conversion! I hope this helps someone out there, and I'm still interested in any responses from devs doing similar work. John
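If it helps, the same two-call pattern wraps naturally into a reusable helper; this is just a sketch, and the function name and the std::string return type are my own choices, not part of the code above:
#include <string>
#include <vcclr.h>    // PtrToStringChars
#include <windows.h>  // WideCharToMultiByte
// Convert a System::String^ to a UTF-8 std::string using the two-call pattern.
std::string ToUtf8(System::String^ str)
{
    pin_ptr<const wchar_t> unicode16 = PtrToStringChars(str);
    // First call sizes the buffer; -1 means "convert up to and including the null terminator".
    int const cbNeeded = WideCharToMultiByte(CP_UTF8, 0, unicode16, -1, nullptr, 0, nullptr, nullptr);
    if (cbNeeded <= 0)
        return std::string();
    std::string converted(cbNeeded, '\0');
    WideCharToMultiByte(CP_UTF8, 0, unicode16, -1, &converted[0], cbNeeded, nullptr, nullptr);
    converted.pop_back();   // drop the trailing null terminator written by the API
    return converted;
}
Whether the extra copy semantics of std::string matter compared with the raw make_unique buffer is worth measuring for your own workload.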
-
Thanks for posting the result. This is valuable information. :)
The difficult we do right away... ...the impossible takes slightly longer.