Hi, if a text file is encoded using ASCII, an ANSI code page, or some other 8-bit character set, then testing for zero bytes looks acceptable. If a text file uses a 16-bit encoding such as UTF-16, then zero bytes can legitimately occur in it (e.g. the chars 0x0100, 0x0200, etc., and also every plain ASCII character, since 'A' is stored as 0x0041). You could check the first few bytes of the file: Unicode files (UTF-8, UTF-16) often start with a special marker there, the byte order mark (EF BB BF for UTF-8, FF FE or FE FF for UTF-16). If these match, you might assume it is text and skip further testing (and once in a while that assumption will be wrong); if they don't match, you could assume it is an 8-bit encoding and do the zero test. Whatever you do, since 100% confidence is not achievable, I see no point in checking more than a few hundred bytes before deciding text/no text. :)
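In case it helps, here is a rough sketch of that heuristic in Python. The function names `looks_like_text` and `file_looks_like_text` and the 512-byte sample size are just my own picks for illustration, not anything standard, and the result is only a guess: a UTF-16 file without a byte order mark will be reported as non-text.

```python
import codecs

# Byte order marks that Unicode text files may start with.
BOMS = (
    codecs.BOM_UTF8,      # EF BB BF
    codecs.BOM_UTF32_LE,  # FF FE 00 00
    codecs.BOM_UTF32_BE,  # 00 00 FE FF
    codecs.BOM_UTF16_LE,  # FF FE
    codecs.BOM_UTF16_BE,  # FE FF
)


def looks_like_text(data: bytes, sample_size: int = 512) -> bool:
    """Guess whether data is text; hypothetical helper, heuristic only."""
    sample = data[:sample_size]
    # A leading BOM: assume Unicode text and skip further testing.
    if sample.startswith(BOMS):
        return True
    # No BOM: assume an 8-bit encoding, where a zero byte suggests binary.
    return b"\x00" not in sample


def file_looks_like_text(path: str, sample_size: int = 512) -> bool:
    """Apply the same guess to the first few hundred bytes of a file."""
    with open(path, "rb") as f:
        return looks_like_text(f.read(sample_size), sample_size)
```

You would call it as `file_looks_like_text("somefile.dat")`. Only the first `sample_size` bytes are ever read, so the cost stays the same no matter how big the file is.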
Luc Pattyn