STL and UNICODE
-
Hi all, not using MFC for my latest project, and having a few concerns about character portability. These are the two approaches I've come up with, but I have no idea what I'm doing, so if anyone can give me a pointer I'll be very grateful :rose:

1. -- typedefs:

#ifdef _UNICODE
typedef std::wstring tstring;
#else
typedef std::string tstring;
#endif

2. -- _TCHAR strings:

typedef std::basic_string<_TCHAR> tstring;

.... which would you use? would you use either?

cheers & happy coding
nb
-
I use this:
typedef std::basic_string<TCHAR> _tstring;
Don't forget that I/O streams have different versions too:
#ifdef _UNICODE
#define _tcout wcout
#define _tcin wcin
#define _tcerr wcerr
#else
#define _tcout cout
#define _tcin cin
#define _tcerr cerr
#endif
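For instance, a minimal usage sketch combining the typedef and the stream macros above (the greeting string is just for illustration; _T() and _tmain come from <tchar.h>, and this assumes an MSVC-style build where _UNICODE selects the wide versions):

#include <iostream>
#include <string>
#include <tchar.h>
using namespace std;

typedef basic_string<TCHAR> _tstring;

#ifdef _UNICODE
#define _tcout wcout
#else
#define _tcout cout
#endif

int _tmain(int argc, TCHAR* argv[])
{
    // The same source compiles as either an ANSI or a UNICODE build.
    _tstring greeting = _T("hello, world");
    _tcout << greeting << endl;
    return 0;
}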
--Mike--
http://home.inreach.com/mdunn/
"Holding the away team at bay with a non-functioning phaser was an act of unmitigated gall. I admire gall." -- Lt. Cmdr. Worf
-
thanks Michael :) I think we need more STL warriors in this MFC-polluted world... nb
-
One thing you need to be clear on: do you want to create a single exe that works on both ASCII and UNICODE systems, or do you want to create a code base that can be compiled into an ASCII exe and a separate UNICODE exe? The entire 'TCHAR' concept that Microsoft has developed is for creating a single set of source that can be compiled into both ASCII and UNICODE versions. If you are writing a program that will only be ASCII, or only UNICODE, or if you are writing a program that must be able to handle both at once, then the TCHAR style of coding is useless.

The TCHAR/typedef approach might seem like a good idea, but there are much more significant differences between ASCII and UNICODE code than the size of the storage unit (8-bit or 16-bit). For example, upper/lower case conversion is completely different between ASCII and UNICODE, so a function like:

void ToUpper(_tstring& input);

is of limited value, since although it will compile into ASCII and UNICODE versions, it will probably only work correctly in one or the other!

In general, I have found that if Unicode is required, or likely to be needed, it's better just to write the entire program in std::wstring, and convert to/from ASCII (std::string) at the "edges".
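To illustrate that last point, here is a minimal sketch of the "convert at the edges" idea using the Win32 conversion APIs (the function names are just illustrative, error handling is omitted, and CP_ACP is assumed as the narrow code page):

#include <string>
#include <windows.h>

// Narrow text arrives at the "edge"; everything inside the program uses std::wstring.
std::wstring FromAscii(const std::string& in)
{
    if (in.empty()) return std::wstring();
    int len = MultiByteToWideChar(CP_ACP, 0, in.c_str(), (int)in.size(), NULL, 0);
    std::wstring out(len, L'\0');
    MultiByteToWideChar(CP_ACP, 0, in.c_str(), (int)in.size(), &out[0], len);
    return out;
}

// ...and wide text is narrowed again on the way out.
std::string ToAscii(const std::wstring& in)
{
    if (in.empty()) return std::string();
    int len = WideCharToMultiByte(CP_ACP, 0, in.c_str(), (int)in.size(), NULL, 0, NULL, NULL);
    std::string out(len, '\0');
    WideCharToMultiByte(CP_ACP, 0, in.c_str(), (int)in.size(), &out[0], len, NULL, NULL);
    return out;
}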
-
USES_CONVERSION; A2W("I agree with you."); (2b || !2b)
-
It is always useful to use TCHAR; it will help you port the code in the future! You might not want Unicode at the moment, but in the future, who knows?
-
You have missed the point. TCHAR addresses the storage issue (8-bit versus 16-bit), but nothing else. When you say something like "You might not want Unicode at the moment, but in the future who knows?", you are implying that the only thing that differs between a UNICODE string and an ASCII string is its size. This is completely incorrect! A program that is written using TCHAR is NOT a UNICODE program; it is a program that uses 16 bits to store each ASCII character. If "in the future" you really do want a UNICODE program, then you will have to review and probably rewrite any and all code that actually examines the strings in any way (especially I/O code), to reflect the details of the UNICODE specification.

There are two main reasons to use UNICODE in a Windows program:

1. Efficiency. Since NT/2000/XP are all UNICODE systems internally, you avoid the 'thunking' layer of the 'A' versions of the API if you use the native 'W' versions (turned on by "#define _UNICODE"). In this case you get some sort of efficiency gain, but you are NOT writing a true UNICODE program; you are writing a UNICODE program that uses only the ASCII subset. Of course you can do this, but do not fool yourself into believing you are writing a UNICODE program!

2. True UNICODE. In this case you are writing a program that needs to deal with Asian, Middle Eastern or European languages. UNICODE gives you the ability to store all of these, but you must work carefully to write code that works with strings. TCHAR is of no use at all for this!

I have found that if you are using UNICODE for point (1), then TCHAR has value: it allows you to write one code base, then compile it into different exe's, one ASCII, one UNICODE. The UNICODE version will have a slight performance boost. However, if you are writing UNICODE for reason (2), then I have found TCHAR to be more trouble than it is worth. It constantly 'hides' the real type, and allows code to be written that will compile in different environments, but which does not run correctly in those environments. In other words, it is very dangerous to write 'library' code using TCHAR, and then assume that it actually is UNICODE compliant.
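As a minimal sketch of point (1), here is the same call resolving to either the 'A' thunk or the native 'W' entry point depending on the build settings (MessageBox is used purely as an illustration; note that the Windows headers key off UNICODE, while <tchar.h> keys off _UNICODE):

#include <windows.h>
#include <tchar.h>

int WINAPI _tWinMain(HINSTANCE, HINSTANCE, LPTSTR, int)
{
    // Generic-text name: the preprocessor maps this to MessageBoxA or
    // MessageBoxW depending on whether UNICODE is defined at compile time.
    MessageBox(NULL, _T("Generic text"), _T("TCHAR build"), MB_OK);

    // Explicit versions, independent of the build settings:
    MessageBoxA(NULL, "ANSI text", "The 'A' thunk (converted internally on NT)", MB_OK);
    MessageBoxW(NULL, L"Wide text", L"The native 'W' entry point on NT/2000/XP", MB_OK);
    return 0;
}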