API for UNICODE

    sandeepkavade wrote (#1):

    Hi all, is there any API to check whether a file contains Unicode, UTF-8 or ANSI characters?

      Abhijeet Pathak wrote (#2):

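      The IsTextUnicode() function can be used to check whether a buffer is likely to contain Unicode (UTF-16) text. The following code might help you:

          #include <windows.h>
          #include <stdio.h>

          /* Returns 2 if the start of the file looks like Unicode text,
             1 if it does not, and 0 on error. */
          int IsUnicodeFile(const char* szFileName)
          {
              FILE *fpUnicode;
              char l_szCharBuffer[80];
              size_t l_nRead;

              /* Open in binary mode so the bytes are read untranslated */
              if ((fpUnicode = fopen(szFileName, "rb")) == NULL)
                  return 0; /* Unable to open file */

              l_nRead = fread(l_szCharBuffer, 1, sizeof(l_szCharBuffer), fpUnicode);
              fclose(fpUnicode);

              if (l_nRead == 0)
                  return 0; /* Empty file or read error */

              /* Pass only the number of bytes actually read */
              if (IsTextUnicode(l_szCharBuffer, (int)l_nRead, NULL))
                  return 2; /* Text is probably Unicode */

              return 1; /* Not detected as Unicode (e.g. ASCII/ANSI) */
          }

      :)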

        Nibu babu thomas wrote (#3):

        sandeepkavade wrote:

        is there any api to check whether the file contains unicode, utf-8 or ansi characters?

        The first few bytes of a file can identify its encoding... If the first three bytes of a file are EF, BB and BF, the file is a UTF-8 file. If the first two bytes are FE and FF, the file is a big-endian UTF-16 (Unicode) file; FF and FE mean little-endian UTF-16.


        Nibu thomas A Developer Code must be written to be read, not by the compiler, but by another human being. http:\\nibuthomas.wordpress.com

          sandeepkavade wrote (#4):

          Hi Thomas, I am very new to VC++. I would be really thankful if you could tell me what EF, BB and BF are, and how to determine them. Thanks in advance.

            Nemanja Trifunovic wrote (#5):

            Nibu babu thomas wrote:

            First few bytes of a file determine the nature of a file... If the first three bytes of a file are EF, BB and BF, the file is a UTF-8 file. If the first two bytes are FE and FF, the file is a Unicode file.

            That's not a reliable way to determine whether a file contains Unicode. UTF-8 is not required to start with a byte-order mark, and files with UTF-16LE and UTF-16BE encodings are actually forbidden to start with it.
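
When there is no byte-order mark, one common fallback is to check whether the bytes are well-formed UTF-8. The sketch below is illustrative only: `is_valid_utf8` is a hypothetical helper name, not a Windows API, and it assumes the whole text is in the buffer (a multi-byte sequence cut off at the end of a partial read is reported as invalid). Pure ASCII also passes, because ASCII is a subset of UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf[0..len) is well-formed UTF-8,
   0 otherwise. This checks validity only; it cannot prove that the
   author of the file intended UTF-8. */
static int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;       /* continuation bytes expected after c */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point legal for this length */
        size_t k;

        if (c < 0x80) {                 /* single-byte ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;                   /* stray continuation or bad lead byte */
        }

        if (len - i - 1 < extra)
            return 0;                   /* sequence runs past the buffer */

        for (k = 1; k <= extra; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return 0;                   /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* UTF-16 surrogate, illegal in UTF-8 */
        if (cp > 0x10FFFF)
            return 0;                   /* beyond the Unicode range */

        i += extra + 1;
    }
    return 1;
}
```

ANSI text with accented characters usually fails this check (for example the lone byte E9), while the UTF-8 form of the same character (C3 A9) passes, so a failed check is a fairly strong hint that the file is not UTF-8.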


            Programming Blog utf8-cpp

              Rage wrote (#6):

              These are hexadecimal numbers: 0xEF = 239, 0xBB = 187, ... Simply read the first bytes of the file and compare them to these values.
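
As a rough sketch of that comparison in C: the helper names `detect_bom` and `detect_file_bom` are made up for this example, and only the UTF-8 and UTF-16 byte-order marks mentioned in this thread are checked. The file is opened in binary mode so that the bytes are read untranslated.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: classify a buffer by its byte-order mark (BOM).
   "none" only means no BOM was found; the data may still be BOM-less
   UTF-8 or UTF-16, or plain ANSI. */
static const char *detect_bom(const unsigned char *buf, size_t len)
{
    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return "UTF-8";
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return "UTF-16BE";
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return "UTF-16LE";
    return "none";
}

/* Hypothetical helper: read the first three bytes of a file and
   classify them. Returns NULL if the file cannot be opened. */
static const char *detect_file_bom(const char *path)
{
    unsigned char head[3];
    size_t n;
    FILE *fp = fopen(path, "rb");   /* binary mode: no newline translation */

    if (fp == NULL)
        return NULL;

    n = fread(head, 1, sizeof head, fp);
    fclose(fp);

    return detect_bom(head, n);
}
```

One limitation of this sketch: a UTF-32 little-endian file starts with FF FE 00 00, so it would be reported here as UTF-16LE.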

              http://www.readytogiveup.com/ - Do something special today. http://www.totalcoaching.ca/ - Give me some feedback about this site!

                Ralf Lohmueller wrote (#7):

                Nemanja Trifunovic wrote:

                UTF-8 is not required to start with a byte-order mark, and files with UTF-16LE and UTF-16BE encodings are actually forbidden to start with it.

                Sorry, why are UTF-16 (little/big endian) files actually forbidden to start with it?
