UNICODE problem: fgetws
-
I'm trying to read a file (which could be using ANSI or MBCS/UNICODE encoding) using the fgetws function, but am having problems. Here's my code:
FILE *file = fopen(tmp,"r");
wchar_t *line = new wchar_t[2001];
wcscpy(line,L"");
wchar_t buf[501];
while(fgetws(buf,500,file) != NULL)
    wcscat(line,buf);
The problem is that only the first fgetws function call seems to work correctly; all the following calls return semi-invalid strings, almost double-UNICODE encoded: if there are, say, three spaces, they are stored as 00 00 00 32 00 00 00 32 00 00 00 32 in memory! I'm completely new to UNICODE programming, so I certainly could be making a simple mistake; I just haven't been able to find it yet :(.
-
If opened in Text mode (the default), the MS C run-time treats the file as if it is ANSI (i.e. encoded using your default locale's character set). fgetws passes the data read through MultiByteToWideChar to get a UTF-16 string. If the file is already UTF-16, you'll get the wrong answer (what you're seeing here).
Files opened in Binary mode (by adding a 'b' to the mode parameter) are treated as-is, with no conversions. This also means that CR+LF pairs are not converted to line feeds alone: you'll see \r as well as \n.
Stability. What an interesting concept. -- Chris Maunder
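To illustrate the Binary mode suggestion, here is a minimal sketch rather than a definitive fix. It opens the file with "rb", reads the raw bytes with fread, and splits them into lines itself, dropping the \r that Binary mode leaves in place. The helper names readAllBytes and splitLines are hypothetical. The sketch also assumes that a UTF-16 file starts with the little-endian byte-order mark FF FE, which the original post does not state, and it widens anything else byte by byte, which is only right for plain ASCII. A real ANSI file would still need a proper code-page conversion.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: read the whole file in binary mode, so the run-time
// performs no character-set conversion and no CR+LF translation.
static bool readAllBytes(const char *path, std::vector<unsigned char> &out)
{
    FILE *file = std::fopen(path, "rb");   // 'b' = binary: bytes come back as-is
    if (file == NULL)
        return false;

    unsigned char chunk[4096];
    std::size_t n;
    while ((n = std::fread(chunk, 1, sizeof chunk, file)) > 0)
        out.insert(out.end(), chunk, chunk + n);

    std::fclose(file);
    return true;
}

// Hypothetical helper: split the raw bytes into lines of wide characters.
// Assumption: a leading FF FE marks UTF-16 little-endian; anything else is
// treated as single-byte text and widened byte by byte (ASCII only).
// UTF-16 code units are copied one per wchar_t; surrogate pairs are not combined.
static std::vector<std::wstring> splitLines(const std::vector<unsigned char> &bytes)
{
    std::vector<std::wstring> lines;
    std::wstring current;

    const bool utf16 = bytes.size() >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE;
    const std::size_t step = utf16 ? 2 : 1;

    for (std::size_t i = utf16 ? 2 : 0; i + step <= bytes.size(); i += step) {
        const wchar_t ch = utf16
            ? static_cast<wchar_t>(bytes[i] | (bytes[i + 1] << 8))
            : static_cast<wchar_t>(bytes[i]);

        if (ch == L'\r')
            continue;                 // binary mode keeps the CR, so drop it here
        if (ch == L'\n') {
            lines.push_back(current); // end of line reached
            current.clear();
        } else {
            current += ch;
        }
    }
    if (!current.empty())
        lines.push_back(current);     // last line without a trailing newline
    return lines;
}
```

Calling readAllBytes(tmp, bytes) followed by splitLines(bytes) would replace the fopen/fgetws loop from the question. Reading bytes instead of calling fgetws keeps the sketch independent of the size of wchar_t, which is 16 bits with the MS compiler but not on every platform.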