bstr, wchar_t, and code pages
-
Hello, I have text in a BSTR string. The text is in some code page. How do I convert the text from that code page to UTF-8 and put it in a wchar_t?
A BSTR holds a Unicode string, which has no code pages. You can use WideCharToMultiByte() with CP_UTF8 as the first parameter to convert it to UTF-8, but the destination is a char array, not wchar_t.
--Mike-- LINKS~! Ericahist | 1ClickPicGrabber | CP SearchBar v2.0.2 | C++ Forum FAQ | You Are Dumb Magnae clunes mihi placent, nec possum de hac re mentiri.
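The conversion described in this answer could look roughly like the sketch below. It is only an illustration under a few assumptions: the helper name BstrToUtf8 is made up for this example, the result is returned as a std::string (a char buffer, as the answer says), and the code is Windows-only, which is why it sits inside an #ifdef _WIN32 guard. It calls WideCharToMultiByte() twice, once to learn the required size and once to convert.

```cpp
#ifdef _WIN32  // Windows-only: relies on the Win32 and OLE Automation APIs
#include <windows.h>
#include <oleauto.h>  // SysStringLen
#include <string>

// BstrToUtf8 is a hypothetical helper name, not something from the thread.
// It returns the UTF-8 bytes of a BSTR, or an empty string on failure.
std::string BstrToUtf8(BSTR bstr)
{
    if (bstr == nullptr)
        return std::string();

    // A BSTR is length-prefixed, so ask for its length instead of
    // scanning for a terminator (it may contain embedded nulls).
    const int wideLen = static_cast<int>(SysStringLen(bstr));
    if (wideLen == 0)
        return std::string();

    // First call: output size 0 means "tell me how many bytes are needed".
    // For CP_UTF8 the last two parameters must be null.
    const int byteLen = WideCharToMultiByte(CP_UTF8, 0, bstr, wideLen,
                                            nullptr, 0, nullptr, nullptr);
    if (byteLen <= 0)
        return std::string();

    // Second call: convert into a char buffer of exactly that size.
    std::string utf8(static_cast<size_t>(byteLen), '\0');
    const int written = WideCharToMultiByte(CP_UTF8, 0, bstr, wideLen,
                                            &utf8[0], byteLen,
                                            nullptr, nullptr);
    if (written != byteLen)
        return std::string();

    return utf8;
}
#endif  // _WIN32
```

Because an explicit length is passed, no terminating null is written by the API; std::string keeps its own terminator. Building this should need linking against oleaut32.lib for SysStringLen.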
-
Michael Dunn wrote: "A BSTR holds a Unicode string, which has no code pages."
The HTML page is taken from IE as a BSTR. What about the charset that is defined in the HTML? Does that mean there is no need to convert from that charset to UTF-8?
When IE reads the HTML, it handles the encoding itself. When you get the HTML via a COM method, it's returned as a BSTR (using UTF-16 encoding, historically called UCS-2 on Windows) because that's how strings are passed around in COM. So you need to be clear about what you want. If you want to change that BSTR to UTF-8, see my previous answer.
--Mike--
-
You can't store UTF-8 in a wchar_t array because UTF-8 is a byte-oriented encoding.
--Mike--
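To make "byte-oriented" concrete, here is a small portable sketch that encodes a single Unicode code point as UTF-8. It does not come from the thread and does not use the Windows API; EncodeUtf8 is a hypothetical name chosen for illustration. The point it tries to show is that one character turns into one to four separate bytes, which is why the natural container is a char array or std::string rather than wchar_t.

```cpp
#include <cassert>
#include <string>

// EncodeUtf8 is a hypothetical helper for illustration only.
// It encodes one Unicode code point as a sequence of 1 to 4 bytes.
std::string EncodeUtf8(char32_t cp)
{
    // Precondition: a valid scalar value (in range, not a surrogate).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::string out;
    if (cp < 0x80) {
        // ASCII range: a single byte, unchanged.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For instance, EncodeUtf8(0x20AC) (the euro sign) yields the three bytes E2 82 AC. Putting each of those bytes into its own wchar_t would waste space and would not be valid UTF-16 text either.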