UNICODE problem: fgetws

IGx89

I'm trying to read a file (which could be using ANSI or MBCS/UNICODE encoding) using the fgetws function, but am having problems. Here's my code:

    FILE *file = fopen(tmp,"r");
    wchar_t *line = new wchar_t[2001];
    wcscpy(line,L"");
    wchar_t buf[501];
    while(fgetws(buf,500,file) != NULL)
        wcscat(line,buf);

The problem is that only the first fgetws function call seems to work correctly; all the following calls return semi-invalid strings, almost double-UNICODE encoded: if there are, say, three spaces, they are stored as 00 00 00 32 00 00 00 32 00 00 00 32 in memory! I'm completely new to UNICODE programming, so I certainly could be making a simple mistake; I just haven't been able to find it yet :(.

Mike Dimmick

If a file is opened in text mode (the default), the MS C run-time treats it as ANSI, i.e. encoded using your default locale's character set. fgetws passes the data it reads through MultiByteToWideChar to get a UTF-16 string. If the file is already UTF-16, you'll get the wrong answer, which is what you're seeing here: each byte of the file is converted as if it were a separate ANSI character, so the zero byte of every UTF-16 character becomes a wide character of its own.

Files opened in binary mode (by adding a 'b' to the mode parameter, e.g. "rb") are read as-is, with no conversions. This also means that CR+LF pairs are not converted to line feeds alone: you'll see \r as well as \n.
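To illustrate the binary-mode approach, here is a minimal sketch, not a definitive implementation. The helper names read_all_bytes and decode_text are made up for this example. It assumes a UTF-16 file starts with the little-endian byte order mark FF FE, and that any other file holds only single-byte characters that can be widened directly (true for plain ASCII; a real ANSI file would need a proper code page conversion, for example via MultiByteToWideChar on Windows).

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: read the whole file as raw bytes.
// "rb" opens the file in binary mode, so the run-time does no character
// conversion and leaves CR+LF pairs untouched.
std::vector<unsigned char> read_all_bytes(const char *path)
{
    std::vector<unsigned char> bytes;
    std::FILE *file = std::fopen(path, "rb");
    if (!file)
        return bytes;   // the caller sees an empty buffer on failure

    unsigned char chunk[4096];
    std::size_t n;
    while ((n = std::fread(chunk, 1, sizeof chunk, file)) > 0)
        bytes.insert(bytes.end(), chunk, chunk + n);

    std::fclose(file);
    return bytes;
}

// Hypothetical helper: turn the raw bytes into UTF-16 code units.
// Assumption: a leading FF FE byte order mark means UTF-16 little-endian;
// anything else is treated as single-byte text and widened byte by byte.
std::u16string decode_text(const std::vector<unsigned char> &bytes)
{
    std::u16string text;
    if (bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE) {
        // Already UTF-16: combine each low/high byte pair, skipping the BOM.
        for (std::size_t i = 2; i + 1 < bytes.size(); i += 2)
            text.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));
    } else {
        // Single-byte text: one byte becomes one code unit (ASCII only).
        for (unsigned char b : bytes)
            text.push_back(static_cast<char16_t>(b));
    }
    return text;
}
```

Calling decode_text(read_all_bytes(tmp)) would give the file contents as one string, with any \r characters still present. Checking for the byte order mark first is what avoids converting text that is already UTF-16 a second time.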

IGx89

Ok, thanks for the help! So I'll just try reading the file as binary, and see how that works. Is using fgetws the best way to fill a char array with the contents of a file, in a straight-Win32 app?