Writing/Reading Large files
-
I am debugging a Visual C++ program, and my job is to speed up the file access. Currently the program is reading in a 600 MB ASCII text file using fopen and fscanf, etc., in text mode. Is there an easy way to speed this up? I have tried using streams and they aren't defined. Would changing it to binary make it faster? Thanks so much for your help! Jennifer
-
JenniferLeonard522 wrote: I have tried using streams and they aren't defined. How do you mean, they aren't defined? Regardless of speed issues, moving your file handling code to C++ is a positive step. Christian Graus - Microsoft MVP - C++
-
When I use ifstream and cin they aren't defined. The code I am debugging currently uses fopen, fscanf, fprintf, etc. Thanks! Jen
Anonymous wrote: When I use ifstream and cin they aren't defined. Did you include them? ifstream is in fstream and cin is in iostream. You need to scope them in std, or pull them into the global namespace with using declarations as well. Anonymous wrote: the code I am debugging currently uses fopen, fscanf, fprintf, etc. Yeah, the world is full of crappy code that uses those instead of C++. It's a common problem. Christian Graus - Microsoft MVP - C++
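To illustrate the reply above, here is a minimal sketch of reading a text file with the C++ stream classes, showing the two includes and the std:: scoping it mentions. The file name and helper function are placeholders, not from the original posts.

```cpp
#include <fstream>   // std::ifstream and std::ofstream live here
#include <iostream>  // std::cin and std::cout live here
#include <string>
#include <vector>

// Read every line of a text file into a vector, scoping the stream
// classes with std:: as the reply suggests.
std::vector<std::string> read_lines(const std::string& path) {
    std::ifstream in(path);  // opens in text mode by default
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))
        lines.push_back(line);
    return lines;
}
```

Alternatively, a `using std::ifstream;` declaration (or `using namespace std;`) pulls the names into the global namespace so the unqualified names work.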
-
For reading files of this size, I don't recommend using the C++ stream classes. Instead, I recommend you use the file mapping API functions, which will greatly increase the speed of your code. For more info, look in your help files for the following API functions: MapViewOfFile, CreateFileMapping, UnmapViewOfFile. For example code, check out the following links: http://code.axter.com/MapFileToMemory.h and http://code.axter.com/mapfile2mem.cpp The C++ stream classes are reliable, but not very speedy. If you have a large file, and you need speed, they're not the best choice. Top ten member of C++ Expert Exchange. http://www.experts-exchange.com/Cplusplus
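The Win32 calls named above (CreateFileMapping, MapViewOfFile, UnmapViewOfFile) have a POSIX analogue, mmap/munmap. The sketch below uses the POSIX calls so it is portable and testable; it is an illustration of the memory-mapping idea, not the Axter example code the links point to, and the file name is hypothetical.

```cpp
#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close, write
#include <string>

// Map a whole file read-only into memory and copy its bytes out.
// On Windows the corresponding steps are CreateFile, CreateFileMapping,
// MapViewOfFile, and UnmapViewOfFile, as described above.
std::string read_mapped(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return {};
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) { close(fd); return {}; }
    void* p = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (p == MAP_FAILED) return {};
    std::string data(static_cast<const char*>(p), st.st_size);
    munmap(p, st.st_size);
    return data;
}
```

In real use you would parse the mapped bytes in place rather than copy them into a string; the copy here just keeps the sketch simple.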
-
Hello. One way to speed up file handling (while keeping the stream I/O) could be to use fread() and fwrite(), with a buffer size that matches the disk sector size of the file. Since the normal sector size is 512 bytes, read and write blocks that are an even multiple of 512. (I guess the most effective disk I/O would be to read/write a complete disk cluster at once, but I'm not sure.) This means that you have to use your own code to extract/create the data (text lines) in the buffer, but that job has to take place anyhow. Whether it's done in the runtime library or in your own code doesn't really matter (provided that your own code is efficiently written). Another way could be to use the native Win32 API with overlapped I/O. But it works (logically) in the same way as fread/fwrite, which means you still have to write your own code to handle the buffer.
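As a sketch of the approach above: read the file with fread() in large blocks whose size is a multiple of 512 and scan the buffer yourself, instead of letting fscanf/fgets pay a function-call cost per field or line. The buffer size and the line-counting task are illustrative choices, not from the original post.

```cpp
#include <cstdio>
#include <cstddef>

// Count newline characters by reading in 64 KB chunks
// (65536 = 128 * 512, an even multiple of the sector size).
// Extracting the text lines from the buffer is the caller's job,
// exactly as the post describes.
long count_lines(const char* path) {
    FILE* f = std::fopen(path, "rb");  // binary mode: no CRLF translation
    if (!f) return -1;
    static char buf[64 * 1024];
    long lines = 0;
    std::size_t n;
    while ((n = std::fread(buf, 1, sizeof buf, f)) > 0) {
        for (std::size_t i = 0; i < n; ++i)
            if (buf[i] == '\n') ++lines;
    }
    std::fclose(f);
    return lines;
}
```

A real parser would carry partial lines over from one chunk to the next; counting newlines sidesteps that detail to keep the sketch short.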
-
I rated your answer as 5 (but I do not know if it took). You are correct about reading and writing a cluster at a time (or a multiple thereof). The problems may start to appear with how the required memory is managed, which the question ignores (I'm not going into that). The simplest solution (if they are using MFC) is to use a CMemFile (which sidesteps the issue entirely and allows MFC to handle it). INTP Every thing is relative...
-
Thank you for all the suggestions. I looked into CMemFile, but it said "Because CMemFile doesn't use a disk file, the data member CFile::m_hFile is not used and has no meaning." I read through the description and didn't see how to use it to read a large data file from disk. I also looked into writing the file a cluster at a time, but the problem is that my file is constantly growing (every 10 minutes it gets more information). So right now it is 600 MB, but in a couple of months it will be 800 MB. What would be the quickest way to read a large file like this? Thanks! Jen
-
The first statement, that m_hFile is not used and has no meaning, is essentially true. That is, the file handle was only needed long enough to load the file into memory. I (incorrectly) assumed that was not the case (blast it). The fastest way to read a file is a cluster at a time. You do not need to actually care about this detail, since you are only reading the file as a whole. Look into shared file read access. This will not speed up the actual read time, but will reduce the perceived time (if used properly). What this means is that, if the file is only changed by adding to it, then you just need to read the additional information added to the file. What I mentioned above is also true if you do not have file-sharing read access. What I mean is that (assuming the other application closes the file after writing to it) you can check on a regular basis whether the file size has changed, and just read the changes. What all that boils down to is this: only read what has changed and nothing more. I hope that helps, because I can't explain all the ideas that have popped into my head. INTP Every thing is relative...
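The "only read what has changed" idea above can be sketched as: remember how many bytes you have already processed, compare that to the current file size, and read only the appended tail. The function and file names below are illustrative, not from the thread.

```cpp
#include <cstdio>
#include <string>

// Return only the bytes appended since the last call. The caller keeps
// `offset` (the number of bytes already processed) between calls; we
// seek past them and read just the new tail, then advance the offset.
std::string read_appended(const char* path, long& offset) {
    FILE* f = std::fopen(path, "rb");
    if (!f) return {};
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::string fresh;
    if (size > offset) {
        fresh.resize(size - offset);
        std::fseek(f, offset, SEEK_SET);
        std::size_t got = std::fread(&fresh[0], 1, fresh.size(), f);
        fresh.resize(got);
        offset += static_cast<long>(got);
    }
    std::fclose(f);
    return fresh;
}
```

Called every 10 minutes (the growth interval Jen describes), this touches only the new data, so the cost stays constant even as the file grows from 600 MB toward 800 MB.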