UTF problem

Suneet 03 wrote (#1):

I have Hindi text that I want to display in a dialog box, but the problem appears when I read it from a file: it shows weird characters like "ऊ भ भ भ 分有点晕,". I am using the fread function to read the Hindi text and converting it to wide characters with mbstowcs(buf0, str0, l_Len0 + 1); followed by SetDlgItemTextW(m_hWnd, IDC_MY_TEXT, buf0); where buf0 is a wchar_t buffer and str0 is the char buffer read from the file (the one that shows the weird characters above). Can anyone tell me where I am going wrong?

Cedric Moonen wrote (#2):

Suneet.03 wrote: "SetDlgItemTextW"

Why are you using the specialized version of the function? You should use SetDlgItemText instead; it maps to SetDlgItemTextW when UNICODE is defined. If you then get a compilation error, it probably means that you didn't define UNICODE, and I don't think you'll be able to display a Unicode string in that case.

Cédric Moonen, Software developer
Charting control [v1.2]

Matthew Faithfull wrote (#3):

If the characters in the file are Hindi, then they're probably already stored as wide (16-bit Unicode) characters, so you don't want to be converting them. Read wide characters from the file in the first place and just display them. (I'm guessing you're already using a Hindi-capable font such as Arial Unicode MS.) If they're not wide characters but text in a Hindi code page, things are more complex: you need to make sure your program is set to use that code page before doing the mbstowcs conversion, because mbstowcs converts according to the current C locale (set with setlocale). This will also depend on whether you're using Hindi Windows or US/English Windows with the input language set to Hindi. Welcome to the frustrating world of internationalisation (i18n for short) :)

Nothing is exactly what it seems but everything with seems can be unpicked.
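A minimal sketch of the first step, finding out what the file actually contains before deciding whether to convert at all. The portable C below checks for a byte-order mark (Notepad's "Unicode" option, for example, writes UTF-16 little-endian starting with the bytes FF FE) and assembles little-endian 16-bit code units from raw bytes read with fread on a file opened in binary mode. The names enum text_encoding, detect_bom and utf16le_units are hypothetical, and a file without a BOM can still be UTF-16 or UTF-8, so ENC_UNKNOWN only means "no BOM found".

```c
#include <stddef.h>

/* Hypothetical classification of a text buffer by its byte-order mark (BOM). */
enum text_encoding {
    ENC_UNKNOWN,  /* no BOM: could be ANSI, ISCII, or UTF-8/UTF-16 saved without a BOM */
    ENC_UTF8,     /* EF BB BF */
    ENC_UTF16LE,  /* FF FE (also the start of a UTF-32LE BOM, not handled here) */
    ENC_UTF16BE   /* FE FF */
};

/* Inspect the first bytes of a buffer filled by fread on a file opened with "rb".
   On return *bom_len holds the number of BOM bytes to skip. */
enum text_encoding detect_bom(const unsigned char *buf, size_t len, size_t *bom_len)
{
    *bom_len = 0;
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF) {
        *bom_len = 3;
        return ENC_UTF8;
    }
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) {
        *bom_len = 2;
        return ENC_UTF16LE;
    }
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) {
        *bom_len = 2;
        return ENC_UTF16BE;
    }
    return ENC_UNKNOWN;
}

/* Assemble little-endian 16-bit code units (low byte first) from raw bytes.
   A trailing odd byte is ignored. Returns the number of units written (at most cap). */
size_t utf16le_units(const unsigned char *buf, size_t len,
                     unsigned short *out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < len && n < cap; i += 2)
        out[n++] = (unsigned short)(buf[i] | (buf[i + 1] << 8));
    return n;
}
```

On Windows, where wchar_t is 16 bits, the same loop can fill a wchar_t buffer such as buf0 directly; once it is null-terminated it can go to SetDlgItemTextW with no mbstowcs call in between.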

Suneet 03 wrote (#4):

Yes, I am using US/English Windows with the input language set to Hindi. How can I make my program use the Hindi code page? And I am stuck at reading the text from the file: why does it show those characters?

Matthew Faithfull wrote (#5):

If you're using US/English Windows, then you'll want to use Unicode for your program rather than setting it up for a specific code page. Have a look at MultiByteToWideChar: you'll need to find out which code page was used to create the text file and pass that actual value as the first parameter instead of one of the standard ones such as CP_ACP. Note that Hindi has no Windows ANSI code page of its own; ISCII Devanagari text uses code page 57002, and a UTF-8 file needs CP_UTF8. Make sure your code is fully Unicode compliant: _T("") macros for the text; TCHAR, WCHAR or wchar_t everywhere; wcslen or _tcslen-type C library calls instead of strlen; and UNICODE and _UNICODE predefined as symbols for the project. Remember that whenever you see the characters, even in the debugger, you're seeing them translated in one way or another, so don't worry if the Hindi doesn't show properly everywhere as long as it works when your code puts it on the screen.
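If the file turns out to be UTF-8, the Windows call would be along the lines of MultiByteToWideChar(CP_UTF8, 0, str0, l_Len0, buf0, cchBuf0), where cchBuf0 is a hypothetical name for the size of buf0 in wide characters. To show roughly what that conversion does, here is a simplified, portable UTF-8 to UTF-16 decoder. utf8_to_utf16 is a hypothetical helper that rejects malformed input outright (the Windows API has its own rules for invalid bytes), so treat it as an illustration rather than a replacement.

```c
#include <stddef.h>

/* Decode UTF-8 bytes into UTF-16 code units (surrogate pairs above U+FFFF).
   Returns the number of units written, or -1 on malformed input or if out is too small. */
long utf8_to_utf16(const unsigned char *s, size_t len, unsigned short *out, size_t cap)
{
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char b = s[i];
        unsigned long cp;
        size_t extra;
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return -1;                      /* stray continuation byte or invalid lead byte */
        if (extra > len - i - 1) return -1;  /* sequence truncated at end of buffer */
        for (size_t k = 1; k <= extra; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return -1;  /* must be 10xxxxxx */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* reject overlong encodings, UTF-16 surrogate values and anything above U+10FFFF */
        if ((extra == 1 && cp < 0x80) || (extra == 2 && cp < 0x800) ||
            (extra == 3 && cp < 0x10000) || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;
        if (cp < 0x10000) {
            if (n >= cap) return -1;
            out[n++] = (unsigned short)cp;
        } else {
            if (n + 2 > cap) return -1;
            cp -= 0x10000;
            out[n++] = (unsigned short)(0xD800 | (cp >> 10));   /* high surrogate */
            out[n++] = (unsigned short)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
        }
        i += extra + 1;
    }
    return (long)n;
}
```

For example, the three bytes E0 A4 B6 decode to the single unit 0x0936 (श), which is why a byte-by-byte conversion of the same data produces several unrelated characters instead.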

Suneet 03 wrote (#6):

Thanks for such quick replies. If that is the case, then I fear I have to make a lot of changes, since I compiled with _MBCS and use strcpy and strlen everywhere. Can you tell me which function I can use instead of strcpy, in the same way that _tcslen replaces strlen?

Matthew Faithfull wrote (#7):

Programming for Unicode is a big topic, and you may want to do some research before embarking on a rewrite of your project. However, here's a starter. strcpy only handles 8-bit char strings, as you know. wcscpy only handles 16-bit WCHAR (wchar_t) strings. _mbscpy handles multibyte-character strings, passed as unsigned char pointers (see the _mbscpy documentation on MSDN). _tcscpy is really a macro, which turns into wcscpy if you build your project for Unicode (_UNICODE defined), _mbscpy if you build with _MBCS defined, and strcpy if you build without _UNICODE or _MBCS defined. Similarly, TCHAR becomes WCHAR (wchar_t) in a Unicode build and plain char otherwise, including in an _MBCS build.

Personally, I would stay clear of all the _MBCS stuff and stick with two builds from the same source, one with UNICODE defined and one without. Try to use the _t functions, such as _tcslen, everywhere, and remember to wrap your string constants with the _T macro, e.g. _T("Some Text"). Most if not all of your code should then work in both builds. Any features that only work in the Unicode build, like dealing with Hindi, can be excluded from the non-Unicode build with #ifdef UNICODE.

Unfortunately, things are not quite that simple. Microsoft now recommends that we all use the Safe String functions (strsafe.h, e.g. StringCchCopy) and the various _s variants of the normal C library functions (e.g. strcpy_s, wcscpy_s, _tcscpy_s), which take extra parameters to enable buffer-overrun protection. I hope this is enough to give you a start. If it looks like a mess, I'm afraid it is, and the only way through is to pick a standard way of doing things and stick to it rigorously. Of course, it helps if you pick the right way for your application, and that can be hard.
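To make the macro mapping concrete, the sketch below imitates in a few lines what <tchar.h> does. All the names (MY_UNICODE, my_tchar, MY_T, my_tcscpy, my_tcslen, copy_greeting) are hypothetical so that it compiles on any platform; the real header switches on _UNICODE and _MBCS and maps many more functions, including the _tcscpy_s family.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for TCHAR, _T, _tcscpy and _tcslen.
   Define MY_UNICODE (e.g. -DMY_UNICODE) to get the wide-character build. */
#ifdef MY_UNICODE
typedef wchar_t my_tchar;
#define MY_T(x)    L##x      /* "abc" becomes L"abc" */
#define my_tcscpy  wcscpy
#define my_tcslen  wcslen
#else
typedef char my_tchar;
#define MY_T(x)    x
#define my_tcscpy  strcpy
#define my_tcslen  strlen
#endif

/* The same source compiles in either build: copy a literal and return its length.
   dst must have room for at least 10 my_tchar elements. */
size_t copy_greeting(my_tchar *dst)
{
    my_tcscpy(dst, MY_T("Some Text"));
    return my_tcslen(dst);
}
```

Compiling with -DMY_UNICODE (or /DMY_UNICODE with the Microsoft compiler) switches the same function to wcscpy, wcslen and a wide literal, and copy_greeting still returns 9.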

Suneet 03 wrote (#8):

After making the changes I am still getting the weird characters when reading from the file (the file contains Hindi characters). Can anyone tell me how to read a text file containing Hindi characters? Do I need to set some font in my VC++ program so that I can display this Hindi text in the dialog? Confused!

Matthew Faithfull wrote (#9):

Sorry to hear you're still struggling. The font used on the dialog will need to contain Hindi (Devanagari) glyphs, e.g. Arial Unicode MS. To determine whether this is really the problem, check the values of the wide characters you're actually writing out, then look them up in the font you're using with the Windows Character Map tool (Accessories/System Tools/Character Map on the Windows Start menu). This will let you see what the font contains at that code point.

If the font is not the issue and the code points are wrong, then you need to take a step back and look again at how you get those character values. You could even examine the original file in a hex editor to see whether it contains valid character values corresponding to the characters you're expecting. Depending on where the file comes from, the byte order of the 16-bit words may have been reversed, which would make the characters appear as rubbish. For example, U+0936 (hexadecimal) is श, but U+3609 is missing even from Arial Unicode MS and would probably result in a blank or a square box. You could always post some of the original file if you can't make any sense of it.
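The byte-order check described above can be partly automated. This sketch assumes the text is Hindi, so most code units should fall in the Devanagari block (U+0900 to U+097F); looks_byte_swapped counts how many units land there as read and after swapping each unit's two bytes. swap16, looks_byte_swapped and swap_all are hypothetical helpers, and the heuristic says nothing useful about text that is mostly Latin.

```c
#include <stddef.h>

/* Swap the two bytes of a 16-bit code unit, e.g. 0x0936 <-> 0x3609. */
unsigned short swap16(unsigned short u)
{
    return (unsigned short)((u >> 8) | (u << 8));
}

/* Devanagari block: U+0900..U+097F. */
static int is_devanagari(unsigned short u)
{
    return u >= 0x0900 && u <= 0x097F;
}

/* Heuristic: returns 1 if more units fall in the Devanagari block after
   swapping than before, i.e. the data was probably read with the wrong byte order. */
int looks_byte_swapped(const unsigned short *units, size_t n)
{
    size_t as_is = 0, swapped = 0;
    for (size_t i = 0; i < n; i++) {
        as_is += (size_t)is_devanagari(units[i]);
        swapped += (size_t)is_devanagari(swap16(units[i]));
    }
    return swapped > as_is;
}

/* Fix the byte order of every unit in place. */
void swap_all(unsigned short *units, size_t n)
{
    for (size_t i = 0; i < n; i++)
        units[i] = swap16(units[i]);
}
```

A buffer read as {0x3609, 0x2E09, 0x2000} is flagged as swapped; after swap_all it holds श, म and a space (0x0936, 0x092E, 0x0020).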
