Testing if Unicode

  • H Hans Dietrich

    Does anyone have a reliable implementation of IsTextUnicode()? I'm tired of dealing with it. All I can find on the internet are people complaining about it, but so far no better alternative. :(

    Best wishes, Hans


    [Hans Dietrich Software]

    Rajesh R Subramanian
    wrote on last edited by
    #2

    Man, that's shocking! I came here to ask the very same question. Please let me know if you find something useful (I'll do the same).

    Workout progress:
    Current arm size: 14.4in
    Desired arm size: 18in
    Next Target: 15.4in by Dec 2010

    Current training method: HIT

      Hans Dietrich
      wrote on last edited by
      #3

      It's not shocking to me, it's pathetic and annoying. This situation has existed for many years, and MS continues to ship this crappy API.

        Lost User
        wrote on last edited by
        #4

        The documentation for IsTextUnicode() states: Determines if a buffer is likely to contain a form of Unicode text.
        Which is to say, it can only test for the likelihood by sampling various bytes within the buffer. Since there is no byte or word pattern which absolutely guarantees that a buffer is Unicode there is no way that such a function can guarantee to recognise it.
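To make the point concrete, here is a minimal sketch of a likelihood test of this kind. It is not how IsTextUnicode() is implemented; `looks_like_utf16le` and the sample buffers are hypothetical names invented purely for illustration. The sketch guesses UTF-16LE when more than half of the 16-bit units have a zero high byte, which holds for ASCII-range text stored as UTF-16 but not for scripts such as Devanagari.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, NOT the Windows implementation: guess "UTF-16LE" when
// most 16-bit units look like ASCII-range characters (non-zero low byte,
// zero high byte).
bool looks_like_utf16le(const unsigned char* buf, std::size_t bytes)
{
    assert(buf != nullptr || bytes == 0);
    if (bytes < 2 || bytes % 2 != 0)
        return false;                        // UTF-16 needs whole 16-bit units

    const std::size_t units = bytes / 2;
    std::size_t ascii_units = 0;
    for (std::size_t i = 0; i < units; ++i)
        if (buf[2 * i] != 0 && buf[2 * i + 1] == 0)
            ++ascii_units;

    return ascii_units * 2 > units;          // "likely" if more than half match
}

// Sample buffers (assumptions made up for this sketch).
const unsigned char kLatin16[]      = { 'H', 0, 'i', 0 };                    // "Hi" as UTF-16LE
const unsigned char kAscii[]        = { 'H', 'i', '!', '!' };                // single-byte text
const unsigned char kDevanagari16[] = { 0x15, 0x09, 0x2E, 0x09, 0x32, 0x09 }; // U+0915 U+092E U+0932
```

Under this rule kLatin16 is accepted and kAscii is rejected, but kDevanagari16 is rejected too, even though it is valid UTF-16LE. That is the kind of miss a purely statistical test cannot rule out.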

        It's time for a new signature.

          Rajesh R Subramanian
          wrote on last edited by
          #5

          And I've asked them several times why they have that crappy API, and they say it is better than nothing. But the problem is - it NEVER worked, so why have it?!

            Lost User
            wrote on last edited by
            #6

            You are right, I just tried it and what do you know - it doesn't work. See other answer below.

            modified on Saturday, June 26, 2010 9:32 AM

              Software_Developer
              wrote on last edited by
              #7

              Nothing seems to work. The output of the code below is always Text is ASCII

              #define UNICODE
              #define _UNICODE
              #include <windows.h>
              #include <tchar.h>
              #include <wchar.h>
              #include <stdio.h>

              int main()
              {
                  char *asciimessage = "This is an ASCII string.";
                  wchar_t *unicodemessage = L"This is a Wide Unicode string.";
                  TCHAR *automessage = TEXT("This message can be either ASCII or UNICODE!");

                  if (IsTextUnicode(unicodemessage, 80, NULL))
                      printf("Text is Unicode\n");
                  else
                      printf("Text is ASCII\n");

                  if (IsTextUnicode(asciimessage, 80, NULL))
                      printf("Text is Unicode\n");
                  else
                      printf("Text is ASCII\n");

                  return 0;
              }

                Lost User
                wrote on last edited by
                #8

                This code works:

                char ascii[] = "The rain in Spain";
                WCHAR ucode[] = L"The rain in Spain";
                
                wcout << "Testing the ASCII string" << endl;
                if (IsTextUnicode(ascii, sizeof ascii, NULL))
                	wcout << "It is Unicode" << endl;
                else
                	wcout << "It is NOT Unicode" << endl;
                
                wcout << "Testing the Unicode string" << endl;
                if (IsTextUnicode(ucode, sizeof ucode, NULL))
                	wcout << "It is Unicode" << endl;
                else
                	wcout << "It is NOT Unicode" << endl;
                

                The issue seems to be that the buffer length must be correct, and parameter 3 should be NULL to ensure all possible tests are tried.

                  Lost User
                  wrote on last edited by
                  #9

                  Your parameters are incorrect. You have specified 80 for the buffer length in both your calls, but neither of your strings is 80 bytes long, so the tests will be inspecting some random data beyond the end of the strings. See also my response below.

                    Software_Developer
                    wrote on last edited by
                    #10

                    Thanks Richard. I ran your code and it works. Output:

                    Testing the ASCII string
                    It is NOT Unicode
                    Testing the Unicode string
                    It is Unicode
                    Press any key to continue

                    Complete code listing

                    #define UNICODE
                    #define _UNICODE
                    #include <windows.h>
                    #include <tchar.h>
                    #include <wchar.h>
                    #include <stdio.h>
                    #include <iostream>
                    using namespace std;

                    int main()
                    {
                    char ascii[] = "The rain in Spain";
                    WCHAR ucode[] = L"The rain in Spain";

                    cout << "Testing the ASCII string" << endl;
                    if (IsTextUnicode(ascii, sizeof ascii, NULL))
                    	cout << "It is Unicode" << endl;
                    else
                    	cout << "It is NOT Unicode" << endl;
                    
                    cout << "Testing the Unicode string" << endl;
                    if (IsTextUnicode(ucode, sizeof ucode, NULL))
                    	cout << "It is Unicode" << endl;
                    else
                    	cout << "It is NOT Unicode" << endl;
                    

                    return 0;

                    }

                      Rajesh R Subramanian
                      wrote on last edited by
                      #11

                      I wish it were as simple as that. Try this:

                      LPCWSTR str = L"कमल"; //If you see ???, then you don't have Indic fonts installed.
                      BOOL b = false;
                      b = IsTextUnicode(str, _tcslen(str), NULL);
                      if(b)
                      AfxMessageBox(L"Text is Unicode!");
                      else
                      AfxMessageBox(L"Boo!");

                      I've worked extensively with Asian languages and I've never got this API to work reliably. I know the documentation kinda confesses it, but there has been no development on this at all!

                      Richard MacCutchan wrote:

                      char ascii[] = "The rain in Spain";

                      Richard MacCutchan wrote:

                      if (IsTextUnicode(ascii, sizeof ascii, NULL))

                      Also, your example is flawed because you're wrongly passing the size of the pointer instead of passing the size of the buffer itself in bytes.

                        Niklas L
                        wrote on last edited by
                        #12

                        Rajesh R Subramanian wrote:

                        Also, your example is flawed because you're wrongly passing the size of the pointer instead of passing the size of the buffer itself in bytes.

                        No. Last time I checked:

                        char ascii[] = "The rain in Spain";
                        
                        sizeof ascii == ::strlen(ascii) + 1
                        

                        home

                          Hans Dietrich
                          wrote on last edited by
                          #13

                          I wonder what result you would get if the second param was number of bytes:

                          LPCWSTR str = L"कमल"; //If you see ???, then you don't have Indic fonts installed.
                          BOOL b = false;
                          b = IsTextUnicode(str, _tcslen(str)*sizeof(WCHAR), NULL);

                            Aescleal
                            wrote on last edited by
                            #14

                            What pointer? Richard was taking the size of an array. Ash

                              Lost User
                              wrote on last edited by
                              #15

                              This works: (apologies if that is not Hindi, and I hope it's a nice word!)

                              WCHAR hindi[] = L"कमल";

                              wcout << sizeof hindi << endl;

                              if (IsTextUnicode(hindi, sizeof hindi, NULL))
                              wcout << "It is Unicode" << endl;

                              Note the result of the sizeof operator; sizeof arrayName gives size in bytes, which is the value required by the IsTextUnicode() function.
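The array-versus-pointer distinction can be sketched in portable C++. `wide_bytes` and the `k...` constants are hypothetical names for this sketch, not part of the Windows API. The helper computes the byte count that a size-in-bytes parameter expects when only a pointer is available. Since wchar_t is 2 bytes on Windows but may be wider on other platforms, everything is expressed in terms of sizeof(wchar_t).

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>

// Hypothetical helper: number of bytes occupied by a NUL-terminated wide
// string, optionally counting the terminator as well.
std::size_t wide_bytes(const wchar_t* s, bool include_terminator = false)
{
    assert(s != nullptr);
    return (std::wcslen(s) + (include_terminator ? 1 : 0)) * sizeof(wchar_t);
}

const wchar_t kText[] = L"The rain in Spain";   // 17 characters plus the terminating NUL
const wchar_t* const kTextPtr = kText;          // same data, seen through a pointer

const std::size_t kArrayBytes   = sizeof kText;     // whole array, terminator included
const std::size_t kPointerBytes = sizeof kTextPtr;  // size of the pointer, NOT of the string
```

With a pointer such as an LPCWSTR, a call along the lines of IsTextUnicode(str, (int)wide_bytes(str), NULL) would pass the length in bytes rather than in characters, assuming the documented int size parameter.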

                                Rajesh R Subramanian
                                wrote on last edited by
                                #16

                                Yes, yes, apologies. See my reply to Richard. :)

                                  Rajesh R Subramanian
                                  wrote on last edited by
                                  #17

                                  I must stop replying in the middle of the night. I overlooked that it was the size of an array. The word could not be nicer; it means "Lotus". This probably means the API is now "improved"?! I'd be glad to go and test some of the old stuff that never worked. I'll keep you posted if it still doesn't!

                                    Lost User
                                    wrote on last edited by
                                    #18

                                    Rajesh R Subramanian wrote:

                                    I must stop replying in the middle of the night.

                                    If it's any consolation I do this all the time in the middle of the day!

                                      Niklas L
                                      wrote on last edited by
                                      #19

                                      No worries, I consider it a typo :)

                                        Nemanja Trifunovic
                                        wrote on last edited by
                                        #20

                                        Michael Kaplan does not like IsTextUnicode either.

                                        utf8-cpp

                                          VeganFanatic
                                          wrote on last edited by
                                          #21

                                          One other idea that comes to mind: try { cout << mystring << endl; } catch (...) { wcout << mystring << endl; }

                                          http://www.contract-developer.tk
