ifstream read() logic?

John R Shaw

I have been wondering what the logic is behind the fact that the ifstream read method sets the eofbit as well as the failbit on a read failure, but I can find no explanation. I can see that if you are just checking for EOF in a loop, then setting the eofbit will stop any further reading, but this also hides the cause of the error. That is, if we have not actually reached the end of the file, nothing tells us so. If you try to read a 25k file and the codecvt do_in method returns codecvt_base::error after reading only half a line of text, then both bits are set and nothing tells us that we have not just read the entire file. I can easily hack my way around the problem, but the fact that I would have to do that is ridiculous. Does anyone have an idea or explanation for this behavior?

INTP "Program testing can be used to show the presence of bugs, but never to show their absence." - Edsger Dijkstra

Stuart Dootson

Looking at the C++ standard specification for read (27.6.1.3 paragraph 28):

Characters are extracted and stored until either of the following occurs:
- n characters are stored;
- end-of-file occurs on the input sequence (in which case the function calls setstate(failbit|eofbit), which may throw ios_base::failure (27.4.4.3)).
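To see the quoted rule in action, here is a small sketch that reads from a std::istringstream and reports gcount() and the state bits. The helper try_read and the ReadResult struct are made up for illustration. Asking for more characters than are available is a short read, which sets both eofbit and failbit; a read that stops exactly on the last character sets neither.

```cpp
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical result type: what istream::read left behind.
struct ReadResult {
    std::streamsize count;  // gcount() after the read
    bool eof;               // eofbit set?
    bool fail;              // fail(): failbit (or badbit) set?
};

// Hypothetical helper: read up to n characters from `text` with istream::read.
ReadResult try_read(const std::string& text, std::streamsize n) {
    std::istringstream in(text);
    std::string buf(static_cast<std::size_t>(n), '\0');
    in.read(&buf[0], n);
    return {in.gcount(), in.eof(), in.fail()};
}
```

For example, try_read("abc", 10) should give a count of 3 with eof and fail both true, while try_read("abc", 3) gives a count of 3 with both false.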

There are two outcomes - either we can read n characters or not. If not, failbit and eofbit get set. BTW - the idiomatic way (from what I've seen) to loop on a stream isn't to check for eof, it's to check the stream status, like so:

std::ifstream f(...);
while(f)
{
    // read stuff
}
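One caveat with while(f): it tests the state before the next read, so the body still has to check each read. A common variant tests the read itself and uses gcount() to keep the final, partial chunk. A minimal sketch, assuming a plain char stream (copy_in_chunks is a hypothetical name):

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: copy a stream in fixed-size chunks.
// The loop tests the result of read() itself and still keeps the last,
// shorter chunk: a short read sets failbit (and eofbit) even when gcount() > 0.
std::string copy_in_chunks(std::istream& in, std::size_t chunk_size) {
    std::string out;
    if (chunk_size == 0) return out;  // a zero-length read never fails; avoid looping forever
    std::string buf(chunk_size, '\0');
    for (;;) {
        in.read(&buf[0], static_cast<std::streamsize>(chunk_size));
        // gcount() is the number of characters this read stored, full or short.
        out.append(buf, 0, static_cast<std::size_t>(in.gcount()));
        if (!in) break;  // short read or error: stop after keeping the partial data
    }
    return out;
}
```

After the loop, in.bad() picks out the failures the implementation reports as badbit (for example an exception thrown by the stream buffer); a plain short read still shows up only as eofbit plus failbit.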

Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p

John R Shaw

Here is the basic problem: the while loop works great, but the test calls an override that is just checking whether the EOF has been reached (overloaded … etc.). Which makes sense, except when that is not true. If an error occurs while reading, you need to know that is what happened; just saying that you have reached the EOF, when that is untrue, gives the wrong information. An error has occurred, true, but not because you reached the end of the file (EOF). This little piece of information is very important.

One workaround, which applies to C as well, is to get the size of the file before reading it and compare the number of bytes read against that size. The main difference between C and C++ (I may be wrong) is that a read error can occur without reaching the EOF, so you need to check for both while reading a file. I will grant that if you try to read past the EOF, it is an error and you have reached the EOF. But before reaching the EOF, only an error flag should be set, indicating that a read error occurred. If it sets the failbit and the eofbit every time an error occurs, it is self-defeating and is lying to the developer.
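The size-comparison workaround can be sketched roughly as below. It assumes a seekable stream and counts chars, so for an ifstream the comparison only means "bytes" when the file is opened with std::ios::binary and no converting codecvt is involved; in text mode on Windows, or with a wide stream decoding UTF-8, the number of characters read can legitimately differ from the size tellg() reports. The helper names remaining_size and read_was_complete are hypothetical.

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: number of chars from the current position to the end
// of a seekable stream; the position is restored afterwards.
// Returns -1 if the stream cannot report positions.
std::streamoff remaining_size(std::istream& in) {
    std::streampos start = in.tellg();
    if (start == std::streampos(-1)) return -1;
    in.seekg(0, std::ios_base::end);
    std::streampos end = in.tellg();
    in.seekg(start);
    if (end == std::streampos(-1)) return -1;
    return end - start;
}

// Hypothetical helper: read everything that remains into `out` and report
// whether the amount read matches the size measured beforehand. A mismatch
// means the read stopped early, even though read() sets eofbit either way.
bool read_was_complete(std::istream& in, std::string& out) {
    std::streamoff expected = remaining_size(in);
    if (expected < 0) return false;
    out.assign(static_cast<std::size_t>(expected), '\0');
    in.read(&out[0], static_cast<std::streamsize>(expected));
    out.resize(static_cast<std::size_t>(in.gcount()));  // keep only what was stored
    return in.gcount() == expected;
}
```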


Stuart Dootson

(Edit: pressed 'Post Message' with no message - DOH!) If the failbit is set, you know reading terminated because of an error. The eofbit set on its own indicates end-of-file. Sounds like you just need to change the priority of your checks round a bit?
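That ordering of checks can be written as a small helper (describe_state is a hypothetical name). It also shows that formatted extraction, unlike read(), can report failbit without eofbit: extracting an int from "abc" fails while the input is not yet exhausted.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: describe why the last input operation stopped,
// checking the most serious condition first.
std::string describe_state(const std::istream& in) {
    if (in.bad()) return "bad";           // badbit: the stream buffer failed
    if (in.fail()) {
        if (in.eof()) return "fail+eof";  // failed and input is exhausted
        return "fail";                    // failed for some other reason
    }
    if (in.eof()) return "eof";           // succeeded, but reached end-of-file
    return "good";
}
```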

John R Shaw

If an error occurs before the number of elements requested has been read, then both eofbit and failbit are set. This is done whether you have reached the end of the file or not, because all the read method knows is the number of elements read, so it is guessing.

// The standard read pattern
amountRead = rdbuf()->sgetn(pElements, amountRequested);
GCount += amountRead;
if( amountRead != amountRequested )
    State |= eofbit | failbit; // Read error. Hmm! We must be at EOF (good guess, but not always true)

// In: vc6 STL - istream standard header
_St |= eofbit | failbit;

// In: .Net 2003 STL
State |= ios_base::eofbit | ios_base::failbit; // short read

// In: GNU ISO C++ Library
__err |= (ios_base::eofbit | ios_base::failbit);
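The quoted library lines can be reduced to a stand-alone sketch that calls sgetn() on a stream buffer directly. It is not any library's actual code: the sentry object, gcount bookkeeping and exception handling of a real istream::read are left out, and read_pattern is a made-up name. What it does show is that the only input to the decision is the returned count.

```cpp
#include <ios>
#include <sstream>
#include <streambuf>

// Hypothetical sketch of the short-read rule: ask the buffer for `requested`
// characters and, if fewer arrive, set eofbit | failbit. Nothing here can tell
// a genuine end-of-file from a buffer that stopped early for another reason.
std::streamsize read_pattern(std::streambuf& sb, char* dest,
                             std::streamsize requested,
                             std::ios_base::iostate& state) {
    std::streamsize amount_read = sb.sgetn(dest, requested);
    if (amount_read != requested)
        state |= std::ios_base::eofbit | std::ios_base::failbit;  // short read
    return amount_read;
}
```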

I do know it terminated because of a read error, but since we cannot know how many characters the encoded source data represents, we cannot avoid the error. Therefore we do not know whether it was an attempt to read past the EOF or an encoding error.
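One way around the ambiguity, offered as a sketch rather than anything proposed in the thread, is to keep decoding out of the stream: read the raw bytes through an unconverted char stream (for a file, a std::ifstream opened with std::ios::binary), so the stream's EOF is no longer mixed up with decoding failures, then validate or decode the bytes separately so an encoding error comes back with its offset. Both helpers are hypothetical, and first_bad_utf8 is a deliberately simplified UTF-8 structure check, not a full decoder: it does not reject overlong forms, surrogates or code points above U+10FFFF.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: read all remaining bytes with no code conversion.
std::string read_all_bytes(std::istream& in) {
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical, simplified check: offset of the first byte that does not
// start a structurally valid UTF-8 sequence, or npos if all bytes are valid.
std::size_t first_bad_utf8(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char c = static_cast<unsigned char>(bytes[i]);
        std::size_t len;
        if (c < 0x80) len = 1;                 // 0xxxxxxx
        else if ((c & 0xE0) == 0xC0) len = 2;  // 110xxxxx
        else if ((c & 0xF0) == 0xE0) len = 3;  // 1110xxxx
        else if ((c & 0xF8) == 0xF0) len = 4;  // 11110xxx
        else return i;                         // stray continuation or invalid lead byte
        if (i + len > bytes.size()) return i;  // sequence cut off by end of data
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(bytes[i + k]);
            if ((cc & 0xC0) != 0x80) return i;  // expected a 10xxxxxx byte
        }
        i += len;
    }
    return std::string::npos;
}
```

With that split, read_all_bytes returning means the input is exhausted, and a result other than npos from first_bad_utf8 points at the byte where decoding would fail.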

INTP "Program testing can be used to show the presence of bugs, but never to show their absence." - Edsger Dijkstra "I have never been lost, but I will admit to being confused for several weeks." - Daniel Boone

modified on Tuesday, July 14, 2009 2:36 PM