Unicode and codeproject article

14 Posts 4 Posters 9 Views 1 Watching
  • Daniel Pfeffer wrote:

    1. All current versions of Windows fully support Unicode. You should use ANSI functions only if your code needs to run on the Windows 95/98/Me series. In standard C++, this typically means using std::wstring rather than std::string.
    2. Your UI should definitely be in Unicode. This makes translating your code to run in a different language much easier. However, Internationalization (I18n) and Localization (L10n) are separate topics.
    3. Your text data storage should use UTF-8 encoding or something similar. Not only will this save storage for the common (in the Americas and Europe) case of Latin characters, but it is a well-defined encoding that is portable across any display language that you are likely to use.
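To make point 3 concrete, here is a minimal sketch that compares the storage needed for the same text in UTF-8 and in UTF-16. The helper names (`Utf8Bytes`, `Utf16Bytes`, `Utf8TotalBytes`, `Utf16TotalBytes`) are made up for illustration; they only count bytes for a sequence of Unicode code points and do not validate the input beyond a range check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Bytes needed to store one Unicode code point in UTF-8.
std::size_t Utf8Bytes(char32_t cp)
{
    assert(cp <= 0x10FFFF);          // largest valid code point
    if (cp < 0x80)    return 1;      // ASCII: letters, digits, punctuation
    if (cp < 0x800)   return 2;      // e.g. accented Latin, Greek, Cyrillic
    if (cp < 0x10000) return 3;      // rest of the Basic Multilingual Plane
    return 4;                        // supplementary planes
}

// Bytes needed to store one code point in UTF-16 (one or two 16-bit units).
std::size_t Utf16Bytes(char32_t cp)
{
    assert(cp <= 0x10FFFF);
    return cp < 0x10000 ? 2 : 4;
}

// Total storage for a whole string of code points, per encoding.
std::size_t Utf8TotalBytes(const std::u32string& text)
{
    std::size_t total = 0;
    for (char32_t cp : text) total += Utf8Bytes(cp);
    return total;
}

std::size_t Utf16TotalBytes(const std::u32string& text)
{
    std::size_t total = 0;
    for (char32_t cp : text) total += Utf16Bytes(cp);
    return total;
}
```

For an all-ASCII line such as `PARAM_01,16,MSB` (a made-up configuration line), this gives 15 bytes in UTF-8 against 30 bytes in UTF-16.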

    If you have an important point to make, don't try to be subtle or clever. Use a pile driver. Hit the point once. Then come back and hit it again. Then hit it a third time - a tremendous whack. --Winston Churchill

    bkelly13 wrote (#3):

    Hello Daniel, then that is the way I am going. I found a 2012 article here on Code Project titled "What are TCHAR, WCHAR, LPSTR, LPWSTR, LPCTSTR (etc.)?" and will be using it as my first reference and learning tool. Nice quote in your siggie.

    Thank you for your time.
    If you work with telemetry, please check this bulletin board: www.irigbb.com

      bkelly13 wrote (#4):

      Additional reading is not yielding a good conclusion. My application is in telemetry. Cutting this to an absolute minimum, I use Excel VBA to build a text-based file containing as many as 100,000 pieces of information. My application uses that to configure itself and determine how to translate the raw input into parameters that another application displays in real time. The application can write copious amounts to text-based log files so I can understand the data better and see how it runs. The only human interaction is to start the app, select a configuration file, and use checkboxes to set logging options.

      Everything is currently running as Unicode in Visual Studio. The app will never be used by the general public. There is no expectation of translation to other languages. But I do want to write in a style that will be useful in other projects.

      Am I OK with Unicode and strings such as L"read this"? Do I need to use the UTF-8 options?

      Thank you for your time

        Daniel Pfeffer wrote (#5):

        Given your constraints (no public release, no translation to other languages), using Unicode is not necessary. The ANSI functions are a tiny bit slower (they must convert all string data to/from Unicode), but that is not relevant to your case.

        I still believe that for new programs, Unicode is the correct way to go for UI. Among other reasons, Microsoft is slowly "deprecating" its MBCS (multi-byte character set) support - in recent versions of Visual Studio, the MBCS library was a separate download!

        As for the data processing, that depends on the input and output formats. If your input is ASCII (alphanumerics, punctuation, CR/LF), and the output is the same, there is no need or reason to convert it to Unicode for processing. Just as a (very) short example, this coding style is perfectly valid:

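        #define UNICODE // defined when you set the Windows functions to Unicode-style in VS
        #include <stdio.h>
        #include <windows.h>

        void foo(void)
        {
            FILE* fp = fopen( "bar", "rb" );
            int c;

            if ( fp == NULL )
                return; // could not open the input file

            //...

            // The assignment is parenthesized so that c holds the byte read,
            // not the result of the comparison with EOF.
            while ( (c = getc(fp)) != EOF )
            {
                if ( c == '\x42' )
                    MessageBox( NULL, L"Bad input", L"Telemetry", MB_OK ); // text, then caption
                // further processing here...
            }

            //...
            fclose( fp );
        }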

        Note that I am using char functions to read the data, but Unicode (wide char) functions for the UI.

        If you must force a Windows API to be char-based (ANSI), use the name with an 'A' suffix (e.g. MessageBoxA instead of MessageBox). If you must force it to be wide char-based (Unicode), use a 'W' suffix. This, of course, only applies to APIs that have string / character parameters.

        If you need to convert between Unicode and ASCII (or UTF-8), the best way to do so is using the WideCharToMultiByte() / MultiByteToWideChar() Windows APIs.

        I hope that this helps.
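A minimal sketch of the usual two-call pattern for these conversion APIs, assuming UTF-8 as the narrow encoding: the first call asks for the required buffer size, the second performs the conversion. `WideToUtf8` and `Utf8ToWide` are hypothetical helper names, and the block is guarded with `_WIN32` because the calls exist only on Windows.

```cpp
#ifdef _WIN32   // Windows-only sketch; compiles to nothing on other platforms
#include <windows.h>
#include <string>

// Hypothetical helper: UTF-16 (wchar_t on Windows) to UTF-8.
std::string WideToUtf8(const std::wstring& wide)
{
    if (wide.empty()) return std::string();
    const int wideLen = static_cast<int>(wide.size());

    // First call: no output buffer, returns the number of bytes needed.
    const int bytes = WideCharToMultiByte(CP_UTF8, 0, wide.data(), wideLen,
                                          NULL, 0, NULL, NULL);
    if (bytes <= 0) return std::string();   // conversion failed

    // Second call: convert into a buffer of exactly that size.
    std::string utf8(static_cast<size_t>(bytes), '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide.data(), wideLen,
                        &utf8[0], bytes, NULL, NULL);
    return utf8;
}

// Hypothetical helper: UTF-8 to UTF-16.
std::wstring Utf8ToWide(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();
    const int utf8Len = static_cast<int>(utf8.size());

    // First call: returns the number of wide characters needed.
    const int units = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), utf8Len,
                                          NULL, 0);
    if (units <= 0) return std::wstring();  // conversion failed

    std::wstring wide(static_cast<size_t>(units), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), utf8Len, &wide[0], units);
    return wide;
}
#endif
```

Passing the explicit length instead of -1 keeps the terminating null out of the result, so the returned string has exactly the converted characters.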

          Lost User wrote (#6):

          bkelly13 wrote:

          Am I OK with Unicode and strings such as L"read this"? Do I need to use the UTF-8 options?

          If you make everything Unicode, you should not have any issues, apart from perhaps converting your text files from ANSI to Unicode when you read them. Either way, Unicode is the best choice for the long term, especially as you may decide to move to Windows Forms/C# in the future.

            Daniel Pfeffer wrote (#7):

            Richard MacCutchan wrote:

            If you make everything Unicode, you should not have any issues.

            The OP is processing real-time telemetry, which is (these days) usually char-based. IMO, there is no good reason to convert the telemetry to Unicode before processing - it slows the processing, doubles the storage requirements, and adds nothing to any processing of numeric data. Similar considerations apply to the output.

              Lost User wrote (#8):

              I am well aware of what he is doing, and I only added that as a "perhaps". At the end of the day it's his choice.

                Daniel Pfeffer wrote (#9):

                I sit corrected. :)

                  Lost User wrote (#10):

                  I stand in ignorance. ;)

                    bkelly13 wrote (#11):

                    Telemetry data is usually all numbers and all binary. Economy in bandwidth is a primary goal. The only text may be things like the software version embedded in some parts. Even then those are treated as binary data and handed off to the display device.

                    The text part is where I have a "bunch" of Excel code to build configuration files. Some assembly, make that much assembly, is required to translate the vendor telemetry map (which describes all the fields of the data) to something directly usable by my application. When not running in mission mode the app can write copious log files so I can verify what it did and why. Those are all text-based for easy reading. Unicode is fine there.

                    Side note/rant: IRIG (Inter Range Instrumentation Group) defines telemetry standards for all government ranges. A range is a place where things like bombs are dropped and missiles shot. That standard defines bit 1 as being the MSB and bit N as being the LSB. It is absolutely backwards, so one of the tasks of my code is to renumber all the bit fields. But the vendors do not follow the standard anyway. In one telemetry map the LSB is sometimes bit 0 and sometimes bit 1. In almost every word that has bit field definitions they have put a note that says the MSB is numbered as bit 0 or bit 1. They just cannot understand that the need to keep putting that note in there is a not-so-subtle indication that they are doing things wrong. Further, they have at least six different formats for describing those bit fields. With 10,000 parameters in a telemetry stream, that becomes a nightmare for writing code to extract the data needed to process the parameters. End of rant.

                    It appears that when writing text files, Excel VBA code writes Unicode by default. Since Windows is now Unicode based, it seems much better to go with that. I am mostly there, but have not looked at my tokenizer code lately. (Each parameter is written to a text file, one line per parameter, with as many as a dozen pieces of data in each line.) This text file must be text rather than binary because I must be able to read it myself to check for errors. Other than log files, none of the real-time work uses any text operations. I don't care if it takes 10 bytes per character to store the configuration file.

                    Conclusion: I'll go with Unicode all the way.

                    Question: What is this deal with this WCHAR in Visual Studio? One of the articles I found said WCHAR is equivalent to wchar_t, then said no more. Ok, but being a guy with sometimes too much self dou
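The bit renumbering described in the side note can be sketched roughly as follows. This is an illustration only: `ExtractField`, its parameters, and the 16-bit word used in the usage note are assumptions, not taken from the IRIG standard or from any vendor map. It takes a field described in MSB-first numbering (bit 1 = MSB, bit N = LSB) and converts it to the shift-and-mask form that C++ code works with; the `first_bit_number` argument covers maps that call the MSB bit 0 instead.

```cpp
#include <cassert>
#include <cstdint>

// Extracts a bit field from a telemetry word whose bits are numbered
// MSB-first (bit 1 = MSB, bit N = LSB), as described in the post.
//   word              raw word, right-aligned in 32 bits
//   word_bits         N, the width of the word (1..32)
//   msb_first_start   number of the field's most significant bit
//   field_bits        width of the field in bits
//   first_bit_number  1 if the map calls the MSB "bit 1", 0 if "bit 0"
std::uint32_t ExtractField(std::uint32_t word, unsigned word_bits,
                           unsigned msb_first_start, unsigned field_bits,
                           unsigned first_bit_number = 1)
{
    assert(word_bits >= 1 && word_bits <= 32);
    assert(msb_first_start >= first_bit_number);

    // Distance of the field's top bit from the MSB (0 means the MSB itself).
    const unsigned offset_from_msb = msb_first_start - first_bit_number;
    assert(field_bits >= 1 && offset_from_msb + field_bits <= word_bits);

    // Position of the field's lowest bit, counted from the right (LSB = 0).
    const unsigned shift = word_bits - offset_from_msb - field_bits;

    // Mask with field_bits ones; a 32-bit field is handled separately
    // because shifting a 32-bit value by 32 is undefined.
    const std::uint32_t mask = (field_bits == 32)
        ? 0xFFFFFFFFu
        : ((std::uint32_t(1) << field_bits) - 1u);

    return (word >> shift) & mask;
}
```

For the 16-bit word 0xA5C3, bits 1 to 4 in this numbering are the top nibble, so `ExtractField(0xA5C3, 16, 1, 4)` yields 0xA, and bits 13 to 16 are the bottom nibble, so `ExtractField(0xA5C3, 16, 13, 4)` yields 0x3.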

                      Lost User wrote (#12):

                      bkelly13 wrote:

                      What is this deal with this WCHAR

                      If you right click your mouse on any of these types in your source code you can then select "Go to definition", which will bring up the include file where it's defined. You can see that WCHAR is defined in winnt.h as equivalent to wchar_t which is a fundamental type known by the compiler. The definition of WCHAR is required for porting to compilers that do not have that fundamental type (or did not in the days before C++). Use whichever type you are more comfortable with, although using WCHAR tends to give more flexibility if you ever need to port your code to some alternative platform.
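One portability detail is worth checking before relying on either name: WCHAR is a Windows SDK typedef, while wchar_t is the standard type, and the C++ standard does not fix its width. It is 16 bits with Visual C++ and typically 32 bits with GCC or Clang on Linux and macOS. A minimal sketch that makes the difference visible (the function names `WideCharBits` and `WideStringBytes` are made up for illustration):

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// Width of wchar_t in bits on the platform this is compiled for.
constexpr unsigned WideCharBits()
{
    return static_cast<unsigned>(sizeof(wchar_t) * CHAR_BIT);
}

// Bytes occupied by the characters of a wide string (terminator excluded).
std::size_t WideStringBytes(const std::wstring& s)
{
    return s.size() * sizeof(wchar_t);
}
```

The same `L"..."` literal therefore occupies a different number of bytes depending on the compiler, which matters if wide strings are ever written to a file or sent over a link as raw memory.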

                        bkelly13 wrote (#13):

                        Re: "Use whichever type you are more comfortable with, although using WCHAR tends to give more flexibility if you ever need to port your code to some alternative platform."

                        I have been working with Microsoft VS for a while now and have not gotten out to play with others in a long time. I will go with that and stick with the WCHAR.

                        Thank you for your time.
                        If you work with telemetry, please check this bulletin board: www.irigbb.com

                          Theo Buys wrote (#14):

                          Daniel Pfeffer wrote:

                          Richard MacCutchan wrote:

                          If you make everything Unicode, you should not have any issues.

                          It depends on what you mean by Unicode... The Windows API and UI use UTF-16 (started with Windows NT 4.0), but if you generate output for SMTP/email/web you must use UTF-8. For UTF-16 you can use CStringW or std::wstring, but for UTF-8 use CStringA or std::string. UTF-8 is a multibyte string format, but it has nothing to do with the old MBCS, which depends on code pages. In this case, using CString and depending on the UNICODE define to make the code UTF-16 aware is now outdated and can shoot you in the foot. Conversions between UTF-16 and UTF-8 can be done with the current MultiByteToWideChar and WideCharToMultiByte. But if you write more general software, do it with the STL:

                          std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;

                          The bad thing is that the current C++ Visual Studio editor can't handle UTF-8 string literals. It is a Windows application, you know...
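A minimal sketch of the STL route, assuming the converter meant here is `std::wstring_convert` combined with `std::codecvt_utf8_utf16`. Both were deprecated in C++17, so a recent compiler may warn about them. The wrapper names `ToUtf8` and `FromUtf8` are made up for illustration.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// Wide string to UTF-8 bytes.
// std::wstring_convert throws std::range_error if the input is malformed.
std::string ToUtf8(const std::wstring& wide)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.to_bytes(wide);
}

// UTF-8 bytes to wide string.
std::wstring FromUtf8(const std::string& utf8)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    return converter.from_bytes(utf8);
}
```

Writing non-ASCII characters with escapes such as `L"caf\u00E9"` sidesteps the question of how the editor saves the source file.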
