Code Project
Testing if Unicode

C / C++ / MFC
    Hans Dietrich wrote (#1):

    Does anyone have a reliable implementation of IsTextUnicode()? I'm tired of dealing with it. All I can find on the internet are people complaining about it, but so far no better alternative. :(

    Best wishes, Hans


    [Hans Dietrich Software]

      Rajesh R Subramanian wrote (#2), replying to Hans Dietrich:

      Man, that's shocking! I came here to ask the very same question. Please let me know if you find something useful (I'll do the same).

      Workout progress:
      Current arm size: 14.4in
      Desired arm size: 18in
      Next Target: 15.4in by Dec 2010

      Current training method: HIT

        Hans Dietrich wrote (#3), replying to Rajesh R Subramanian:

        It's not shocking to me, it's pathetic and annoying. This situation has existed for many years, and MS continues to ship this crappy API.

        Best wishes, Hans


          Lost User (Richard MacCutchan) wrote (#4), replying to Hans Dietrich:

          The documentation for IsTextUnicode() states: "Determines if a buffer is likely to contain a form of Unicode text."
          Which is to say, it can only test for the likelihood by sampling various bytes within the buffer. Since there is no byte or word pattern that absolutely guarantees that a buffer is Unicode, there is no way such a function can guarantee to recognise it.

          It's time for a new signature.
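A minimal, portable sketch of the kind of likelihood test described above, assuming the caller only needs to separate byte-order-marked text and BOM-less UTF-16 from everything else. It is not how IsTextUnicode() works internally; the enum, the function name GuessEncoding and the "at least half of the code units" threshold are illustrative assumptions. Only the BOM step gives a definite answer; the zero-byte step is a guess, exactly as the documentation warns.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type for this sketch.
enum class GuessedEncoding {
    Utf8Bom, Utf16LeBom, Utf16BeBom,   // unambiguous: a byte-order mark was found
    LikelyUtf16Le, LikelyUtf16Be,      // heuristic guess only
    Unknown
};

// Guess the encoding of buf[0..len). Step 1 checks for a byte-order mark;
// step 2 counts zero bytes at even and odd offsets, which only helps for
// text whose code units mostly have a zero high byte (e.g. Latin script).
GuessedEncoding GuessEncoding(const unsigned char* buf, std::size_t len)
{
    assert(buf != nullptr || len == 0);

    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return GuessedEncoding::Utf8Bom;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return GuessedEncoding::Utf16LeBom;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return GuessedEncoding::Utf16BeBom;

    // BOM-less UTF-16 must consist of whole 2-byte code units.
    if (len < 2 || len % 2 != 0)
        return GuessedEncoding::Unknown;

    std::size_t evenZeros = 0, oddZeros = 0;
    for (std::size_t i = 0; i < len; ++i) {
        if (buf[i] == 0) {
            if (i % 2 == 0) ++evenZeros;
            else            ++oddZeros;
        }
    }

    // Assumed threshold: zeros on one side in at least half of the code
    // units, and more zeros on that side than on the other.
    const std::size_t units = len / 2;
    if (oddZeros * 2 >= units && oddZeros > evenZeros)
        return GuessedEncoding::LikelyUtf16Le;   // in LE the high byte sits at the odd offset
    if (evenZeros * 2 >= units && evenZeros > oddZeros)
        return GuessedEncoding::LikelyUtf16Be;
    return GuessedEncoding::Unknown;
}
```

For example, "The rain in Spain" stored as UTF-16LE with its terminator comes out as LikelyUtf16Le, while the same text as 18 narrow bytes comes out as Unknown.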

            Rajesh R Subramanian wrote (#5), replying to Lost User:

            And I've asked them several times why they have that crappy API, and they say it could help, which is better than nothing. But the problem is - it NEVER worked, so why have it?!

              Lost User wrote (#6), replying to Rajesh R Subramanian:

              You are right, I just tried it and what do you know - it doesn't work. See other answer below.


              modified on Saturday, June 26, 2010 9:32 AM

                Software_Developer wrote (#7), replying to Hans Dietrich:

                Nothing seems to work. The output of the code below is always "Text is ASCII".

                #define UNICODE
                #define _UNICODE
                #include <windows.h>
                #include <tchar.h>
                #include <wchar.h>
                #include <stdio.h>

                int main()
                {
                    char *asciimessage = "This is an ASCII string.";
                    wchar_t *unicodemessage = L"This is a Wide Unicode string.";
                    TCHAR *automessage = TEXT("This message can be either ASCII or UNICODE!");

                    if(IsTextUnicode(unicodemessage,80,NULL))
                        printf("Text is Unicode\n");
                    else
                        printf("Text is ASCII\n");

                    if(IsTextUnicode(asciimessage,80,NULL))
                        printf("Text is Unicode\n");
                    else
                        printf("Text is ASCII\n");

                    return 0;
                }

                  Lost User wrote (#8), replying to Hans Dietrich:

                  This code works:

                  char ascii[] = "The rain in Spain";
                  WCHAR ucode[] = L"The rain in Spain";
                  
                  wcout << "Testing the ASCII string" << endl;
                  if (IsTextUnicode(ascii, sizeof ascii, NULL))
                  	wcout << "It is Unicode" << endl;
                  else
                  	wcout << "It is NOT Unicode" << endl;
                  
                  wcout << "Testing the Unicode string" << endl;
                  if (IsTextUnicode(ucode, sizeof ucode, NULL))
                  	wcout << "It is Unicode" << endl;
                  else
                  	wcout << "It is NOT Unicode" << endl;
                  

                  The issue seems to be that the buffer length must be correct, and parameter 3 should be NULL to ensure all possible tests are tried.
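The "buffer length must be correct" point comes down to bytes versus characters, which can be checked without Windows. In this sketch char16_t is an assumption standing in for the 2-byte Windows WCHAR (on Linux and macOS wchar_t is 4 bytes wide), and UcodeChars is a hypothetical helper name.

```cpp
#include <cstddef>
#include <string>

// Portable stand-ins for the arrays above. char16_t mimics the 2-byte WCHAR
// of Windows; wchar_t would be 4 bytes on Linux/macOS.
const char     ascii[] = "The rain in Spain";
const char16_t ucode[] = u"The rain in Spain";

// sizeof applied to an array gives its size in bytes, terminator included:
// 17 characters + 1 terminator, times the width of one element.
constexpr std::size_t kAsciiBytes = sizeof ascii;   // 18 * 1
constexpr std::size_t kUcodeBytes = sizeof ucode;   // 18 * sizeof(char16_t)

// A character count, by contrast, ignores both the element width and the terminator.
std::size_t UcodeChars()
{
    return std::char_traits<char16_t>::length(ucode);
}
```

Passing a character count such as UcodeChars() (17) where sizeof ucode (36) is expected hands the function less than half of the wide buffer.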

                    Lost User wrote (#9), replying to Software_Developer:

                    Your parameters are incorrect. You have specified 80 for the buffer length in both your calls, but neither of your strings is 80 bytes long, so the tests will be inspecting some random data beyond the end of the strings. See also my other reply (#8).
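Instead of a fixed constant such as 80, the length can be computed from the string itself. A minimal sketch, assuming null-terminated input; StringByteLength is a hypothetical helper, and char16_t again stands in for the 2-byte Windows WCHAR so the example runs anywhere.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes occupied by a null-terminated string,
// optionally counting the terminator. The element width is taken from the
// character type, so the same call works for narrow and wide strings.
template <typename CharT>
std::size_t StringByteLength(const CharT* s, bool includeTerminator)
{
    std::size_t chars = std::char_traits<CharT>::length(s);
    if (includeTerminator)
        ++chars;
    return chars * sizeof(CharT);
}
```

For the strings in #7 this gives 25 bytes for "This is an ASCII string." and 62 bytes (31 UTF-16 code units) for the wide one, terminators included; both are well under 80. On Windows the template also accepts wchar_t, so the Unicode call could pass StringByteLength(unicodemessage, true). Including the terminator mirrors the working examples in this thread, which pass sizeof.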

                      Software_Developer wrote (#10), replying to Lost User:

                      Thanks Richard. I ran your code and it works. Output:

                      Testing the ASCII string
                      It is NOT Unicode
                      Testing the Unicode string
                      It is Unicode
                      Press any key to continue

                      Complete code listing:

                      #define UNICODE
                      #define _UNICODE
                      #include <windows.h>
                      #include <tchar.h>
                      #include <wchar.h>
                      #include <stdio.h>
                      #include <iostream>   // <iostream.h> is a pre-standard header that current compilers no longer ship
                      using namespace std;

                      int main()
                      {
                      char ascii[] = "The rain in Spain";
                      WCHAR ucode[] = L"The rain in Spain";

                      cout << "Testing the ASCII string" << endl;
                      if (IsTextUnicode(ascii, sizeof ascii, NULL))
                      	cout << "It is Unicode" << endl;
                      else
                      	cout << "It is NOT Unicode" << endl;
                      
                      cout << "Testing the Unicode string" << endl;
                      if (IsTextUnicode(ucode, sizeof ucode, NULL))
                      	cout << "It is Unicode" << endl;
                      else
                      	cout << "It is NOT Unicode" << endl;
                      

                      return 0;

                      }

                        Rajesh R Subramanian wrote (#11), replying to Lost User:

                        I wish it were as simple as that. Try this:

                        LPCWSTR str = L"कमल"; //If you see ???, then you don't have Indic fonts installed.
                        BOOL b = false;
                        b = IsTextUnicode(str, _tcslen(str), NULL);
                        if(b)
                        AfxMessageBox(L"Text is Unicode!");
                        else
                        AfxMessageBox(L"Boo!");

                        I've worked extensively with Asian languages and I've never had this API work reliably. I know the documentation kinda confesses it, but there has been no development on this at all!

                        Richard MacCutchan wrote:

                        char ascii[] = "The rain in Spain";

                        Richard MacCutchan wrote:

                        if (IsTextUnicode(ascii, sizeof ascii, NULL))

                        Also, your example is flawed because you're wrongly passing the size of the pointer instead of passing the size of the buffer itself in bytes.
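Whatever the cause in this particular snippet (the length argument is examined in the replies), there is a general reason zero-byte heuristics have little to work with on Indic text: Latin letters encoded as UTF-16 have a zero high byte, but Devanagari code points (U+0900 to U+097F) have a high byte of 0x09. The sketch below counts zero bytes per UTF-16 code unit; CountZeroBytes is a hypothetical name, and the word is written with escapes so the example does not depend on the source file's encoding.

```cpp
#include <cstddef>

// "कमल" (U+0915 U+092E U+0932), written with universal character names.
const char16_t kamal[] = u"\u0915\u092E\u0932";

// Count the zero bytes in the first `units` UTF-16 code units of s.
// Each code unit is split into its low and high byte, independent of
// the machine's byte order.
std::size_t CountZeroBytes(const char16_t* s, std::size_t units)
{
    std::size_t zeros = 0;
    for (std::size_t i = 0; i < units; ++i) {
        const unsigned low  = s[i] & 0xFFu;
        const unsigned high = (s[i] >> 8) & 0xFFu;
        if (low == 0)  ++zeros;
        if (high == 0) ++zeros;
    }
    return zeros;
}
```

For u"Spain" every code unit contributes one zero byte, while the three Devanagari code units contribute none, so any test keyed on zero bytes gets no signal from them.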

                          Niklas L wrote (#12), replying to Rajesh R Subramanian:

                          Rajesh R Subramanian wrote:

                          Also, your example is flawed because you're wrongly passing the size of the pointer instead of passing the size of the buffer itself in bytes.

                          No. Last time I checked:

                          char ascii[] = "The rain in Spain";
                          
                          sizeof ascii == ::strlen(ascii) + 1
                          

                          home
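The distinction at stake is array versus pointer: sizeof on an array measures the whole array, while sizeof on a pointer measures only the pointer. A short sketch; ArrayByteSize is a hypothetical helper that binds only to real arrays, so passing a pointer to it is a compile-time error rather than a silent wrong length.

```cpp
#include <cstddef>
#include <cstring>

const char  ascii[]  = "The rain in Spain";
const char* asciiPtr = ascii;   // the array decays to a pointer here

// sizeof ascii    -> 18, the whole array including the terminator
//                    (std::strlen(ascii) + 1)
// sizeof asciiPtr -> the size of a pointer (commonly 4 or 8), unrelated to the text

// Hypothetical helper: accepts only arrays, so ArrayByteSize(asciiPtr)
// does not compile.
template <typename CharT, std::size_t N>
constexpr std::size_t ArrayByteSize(const CharT (&)[N])
{
    return N * sizeof(CharT);
}
```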

                            Hans Dietrich wrote (#13), replying to Rajesh R Subramanian:

                            I wonder what result you would get if the second param were the number of bytes:

                            LPCWSTR str = L"कमल"; //If you see ???, then you don't have Indic fonts installed.
                            BOOL b = false;
                            b = IsTextUnicode(str, _tcslen(str)*sizeof(WCHAR), NULL);

                            Best wishes, Hans


                              Aescleal wrote (#14), replying to Rajesh R Subramanian:

                              What pointer? Richard was taking the size of an array. Ash

                                Lost User wrote (#15), replying to Rajesh R Subramanian:

                                This works: (apologies if that is not Hindi, and I hope it's a nice word!)

                                WCHAR hindi[] = L"कमल";

                                wcout << sizeof hindi << endl;

                                if (IsTextUnicode(hindi, sizeof hindi, NULL))
                                wcout << "It is Unicode" << endl;

                                Note the result of the sizeof operator; sizeof arrayName gives the size in bytes (for this three-character word it should be 8: three WCHARs plus the terminator, 2 bytes each), which is the value required by the IsTextUnicode() function.

                                  Rajesh R Subramanian wrote (#16), replying to Lost User:

                                  I must stop replying in the middle of the night. I overlooked that it was the size of an array. The word couldn't be nicer: it means "Lotus". This probably means the API is now "improved"?! I'd be glad to go and test some of the old stuff that never worked. I'll keep you posted if they don't!

                                    Rajesh R Subramanian wrote (#17), replying to Niklas L:

                                    Yes, yes, apologies. See my reply to Richard. :)

                                      Lost User wrote (#18), replying to Rajesh R Subramanian:

                                      Rajesh R Subramanian wrote:

                                      I must stop replying in the middle of the night.

                                      If it's any consolation I do this all the time in the middle of the day!

                                        Niklas L wrote (#19), replying to Rajesh R Subramanian:

                                        No worries, I consider it a typo :)

                                          Nemanja Trifunovic wrote (#20), replying to Hans Dietrich:

                                          Michael Kaplan does not like IsTextUnicode either.

                                          utf8-cpp
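Following the utf8-cpp pointer: when the realistic candidates are UTF-8 versus a legacy ANSI code page, strict UTF-8 validation is a deterministic alternative to guessing, because legacy single-byte text that uses bytes above 0x7F rarely forms valid UTF-8 by accident. This sketch is hand-written and does not use utf8-cpp's API; IsValidUtf8 is a hypothetical name, and libraries such as utf8-cpp provide equivalent checks ready-made.

```cpp
#include <cstddef>

// Returns true if buf[0..len) is well-formed UTF-8 (RFC 3629): rejects
// stray continuation bytes, truncated sequences, overlong forms,
// UTF-16 surrogates (U+D800..U+DFFF) and values above U+10FFFF.
bool IsValidUtf8(const unsigned char* buf, std::size_t len)
{
    std::size_t i = 0;
    while (i < len) {
        const unsigned char b = buf[i];
        std::size_t extra;    // number of continuation bytes expected
        unsigned long cp;     // decoded code point
        if (b < 0x80) { ++i; continue; }                 // plain ASCII
        else if (b >= 0xC2 && b <= 0xDF) { extra = 1; cp = b & 0x1F; }
        else if (b >= 0xE0 && b <= 0xEF) { extra = 2; cp = b & 0x0F; }
        else if (b >= 0xF0 && b <= 0xF4) { extra = 3; cp = b & 0x07; }
        else return false;   // 0x80..0xC1 and 0xF5..0xFF cannot start a sequence

        if (len - i <= extra)
            return false;    // sequence runs past the end of the buffer
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = buf[i + k];
            if ((c & 0xC0) != 0x80)
                return false;                            // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (extra == 2 && cp < 0x800) return false;      // overlong 3-byte form
        if (extra == 3 && (cp < 0x10000 || cp > 0x10FFFF)) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate code point
        i += extra + 1;
    }
    return true;
}
```

"कमल" encoded as UTF-8 (E0 A4 95 E0 A4 AE E0 A4 B2) passes, while a Latin-1 "é" (0xE9) followed by an ASCII letter fails.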
