API for UNICODE
-
hi all, is there any api to check whether the file contains unicode, utf-8 or ansi characters?
-
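The IsTextUnicode() function can be used to check whether text is Unicode. It is only a heuristic test for UTF-16, so it can guess wrong on short or unusual input. The following code might help you:

    #include <windows.h>
    #include <stdio.h>

    // Returns 2 if the file looks like Unicode, 1 if not, 0 on error
    int IsUnicodeFile(const char* szFileName)
    {
        char l_szCharBuffer[80];
        size_t l_nRead;

        // Open the file in binary mode so no bytes are translated
        FILE* fpUnicode = fopen(szFileName, "rb");
        if (fpUnicode == NULL)
            return 0; // Unable to open file

        l_nRead = fread(l_szCharBuffer, 1, sizeof(l_szCharBuffer), fpUnicode);
        fclose(fpUnicode);
        if (l_nRead == 0)
            return 0; // Empty file or read error

        // Pass only the bytes that were actually read
        if (IsTextUnicode(l_szCharBuffer, (int)l_nRead, NULL))
            return 2; // Text looks like Unicode (UTF-16)
        return 1;     // Not detected as Unicode (ANSI or UTF-8)
    }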
:)
-
sandeepkavade wrote:
is there any api to check whether the file contains unicode, utf-8 or ansi characters?
The first few bytes of a file can indicate its encoding. If the first three bytes of a file are EF, BB and BF, the file is a UTF-8 file. If the first two bytes are FE and FF, the file is a big-endian UTF-16 file; FF and FE indicate little-endian UTF-16, which is what Windows usually calls a Unicode file.
Nibu thomas A Developer Code must be written to be read, not by the compiler, but by another human being. http:\\nibuthomas.wordpress.com
-
Hi Thomas, I am very new to VC++. I would be really thankful if you could tell me what EF, BB and BF are, and how to determine them. Thanks in advance.
-
Nibu babu thomas wrote:
First few bytes of a file determine the nature of a file... If the first three bytes of a file are EF, BB and BF, the file is a UTF-8 file. If the first two bytes are FE and FF, the file is a Unicode file.
That's not a reliable way to determine whether a file contains Unicode. UTF-8 is not required to start with a byte-order mark, and files with UTF-16LE and UTF-16BE encodings are actually forbidden to start with it.
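Since a UTF-8 file may have no byte-order mark at all, one alternative is to check whether the bytes themselves form well-formed UTF-8. Here is a minimal sketch of that idea; `looks_like_utf8` is a hypothetical helper name, not a Windows API. Keep in mind that plain ASCII text also passes this check, because ASCII is a subset of UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf[0..len) is well-formed UTF-8,
   0 otherwise. Pure ASCII input also returns 1. */
int looks_like_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);

    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;       /* continuation bytes expected after the lead byte */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point allowed for this length */
        size_t k;

        if (c < 0x80) { i++; continue; }  /* single-byte (ASCII) */
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else return 0;  /* stray continuation byte or invalid lead byte */

        if (len - i <= extra)
            return 0;   /* sequence is cut off at the end of the buffer */

        for (k = 1; k <= extra; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;  /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min) return 0;                      /* overlong encoding */
        if (cp > 0x10FFFF) return 0;                 /* beyond Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* surrogate code point */

        i += extra + 1;
    }
    return 1;
}
```

If you only pass the first part of a file, a multi-byte character that is cut in half at the end of the buffer is reported as invalid, so it is safer to check the whole file or to trim the buffer at a character boundary.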
-
Hi Thomas, I am very new to VC++. I would be really thankful if you could tell me what EF, BB and BF are, and how to determine them. Thanks in advance.
These are hexadecimal numbers: 0xEF = 239, 0xBB = 187, ... Simply read the first bytes of the file and compare them to these values.
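As a rough sketch of that comparison, something like the following could work. The names `bom_kind`, `detect_bom` and `detect_file_bom` are made up for this example. It only recognizes the UTF-8 and UTF-16 marks discussed in this thread, and a file without any mark simply comes back as BOM_NONE, which tells you nothing about its encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical names, used only for this sketch */
enum bom_kind { BOM_NONE, BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE };

/* Compare the first bytes of a buffer with the known byte-order marks */
enum bom_kind detect_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return BOM_UTF8;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return BOM_UTF16_LE;   /* note: FF FE 00 00 would be UTF-32LE */
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return BOM_UTF16_BE;
    return BOM_NONE;           /* no mark found; encoding is still unknown */
}

/* Read up to three bytes from the start of a file and classify them.
   Returns BOM_NONE if the file cannot be opened. */
enum bom_kind detect_file_bom(const char *path)
{
    unsigned char head[3];
    size_t n;
    FILE *fp = fopen(path, "rb");  /* binary mode: no newline translation */

    if (fp == NULL)
        return BOM_NONE;
    n = fread(head, 1, sizeof(head), fp);
    fclose(fp);
    return detect_bom(head, n);
}
```

You would call it as `detect_file_bom("myfile.txt")` and compare the result with BOM_UTF8, BOM_UTF16_LE or BOM_UTF16_BE.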
-
Nemanja Trifunovic wrote:
UTF-8 is not required to start with a byte-order mark, and files with UTF-16LE and UTF-16BE encodings are actually forbidden to start with it.
Sorry, why are UTF-16 (little/big endian) files actually forbidden to start with a byte-order mark?