Memory Mapped Files and Performance Strategies
-
Hi! Are memory-mapped files typically used for loading/saving speed, or is it just to handle incredibly large files? I'm currently in a situation where I need to load and parse large files (~20 MB) and I'm wondering about the different strategies for doing that as fast as possible. Thanks, Shawn
-
They eliminate the need to hit the disk multiple times. With a 20MB file, you will notice a difference. With smaller files, the gain, if any, is negligible.
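For a file of that size, one common strategy is simply to read the whole file into memory in a single sequential pass and then parse the in-memory buffer, so the disk is touched only once. A minimal standard-C++ sketch (`ReadWholeFile` is a hypothetical helper name, not anything from the thread):

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>

// Read the whole file into memory in one sequential pass, then parse
// from the buffer instead of seeking around on disk.
std::string ReadWholeFile(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buf;
    buf << in.rdbuf();   // one streamed read; OS read-ahead does the rest
    return buf.str();
}
```

Once the buffer is in RAM, the parsing pass never waits on the disk at all.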
-
For some situations it's more convenient to use memory-mapped files (e.g. if your file is structured with offsets to given items), and so that you can have a disk-backed data structure. It really simplifies programming such a shared structure - you don't have to explicitly read or write the file, you just read from or write to memory. Note that there isn't any particular improvement in speed, nor any fewer disk hits: pages of a mapped view of a memory-mapped file are still paged in (and out) on demand. SQL Server makes a lot of use of memory-mapped files, IIRC.

If you're just reading a file sequentially, rely on the file system caching. It'll do much the same thing and is typically less complicated to program. To optimise the caching behaviour, pass the FILE_FLAG_SEQUENTIAL_SCAN or FILE_FLAG_RANDOM_ACCESS flag to CreateFile, depending on how you're using the file. If you specify one mode but your program actually reads the other way, performance is typically worse than if you hadn't specified either flag. Essentially, FILE_FLAG_SEQUENTIAL_SCAN tells the system to read ahead a lot and not at all behind, and causes pages behind the current file pointer to be discarded aggressively, whereas FILE_FLAG_RANDOM_ACCESS tells it not to read ahead or behind very much. Specifying neither gives a compromise mode that does a little read-behind and some read-ahead.

For information on how Windows 2000 file caching works, see Inside Windows 2000, chapters 7 (Memory Management) and 11 (Cache Manager). Basically the system implements caching by memory-mapping the files.
-
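Those Win32 flags have a rough POSIX counterpart in posix_fadvise, which passes the same sequential-vs-random hint to the kernel's read-ahead logic. A sketch assuming a POSIX system (on Windows you would instead pass the flag to CreateFile when opening the handle):

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Hint the kernel that this descriptor will be read sequentially,
// so it can read ahead aggressively and drop pages already passed,
// much as FILE_FLAG_SEQUENTIAL_SCAN does on Windows.
bool AdviseSequential(int fd)
{
    return posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL) == 0;
}
```

POSIX_FADV_RANDOM is the counterpart of FILE_FLAG_RANDOM_ACCESS; as with the Win32 flags, giving the wrong hint for your actual access pattern can hurt rather than help.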
Mike Dimmick wrote: "Note that there isn't any particular improvement in speed, nor any fewer disk hits: pages of a mapped view of a memory-mapped file are still paged in (and out) on demand."

True, but for the last project I changed, I simply read from a CMemFile instead of a CFile, and the subsequent processing time could be measured in minutes instead of what used to take days. That's quite an improvement.
CMemFile is not a memory-mapped file. It's a block of memory that conforms to the CFile interface, i.e., you can treat it as if it's a file. True memory-mapped files are a different thing entirely.
-
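Outside MFC, the closest standard-C++ analogue of CMemFile is a stream over a RAM buffer: it presents the same file-like interface but never touches the disk. A small sketch of that idea (the helper name is mine, not from the thread):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// A CMemFile-style "file": the backing store is a plain memory buffer,
// but it is read through the same stream interface a disk-backed
// std::ifstream would offer -- no disk I/O happens here at all.
std::string FirstLineOf(const std::string& buffer)
{
    std::istringstream in(buffer);
    std::string line;
    std::getline(in, line);
    return line;
}
```

Because the same parsing code works against either stream type, swapping a disk-backed file for an in-memory one is often a one-line change - which is likely why the switch above paid off so dramatically.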
I disagree. A CMemFile object is a memory file that behaves like a disk file, except that the file is stored in RAM rather than on disk. An MMF, by definition, provides you with the capability to map a view of all or part of a file on disk to a specific range of addresses within your process's address space. Once that is done, accessing its content is as simple as dereferencing a pointer in the designated range of addresses. I fail to see how they are THAT different. In either case, once the file is read from disk and into memory, you operate on the file in its memory-mapped state, using the appropriate functions. The MSDN article Q142377 offers another view.
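That definition of an MMF - map a view of a disk file into the address space, then dereference a pointer - translates almost line for line into code. A sketch assuming a POSIX system (the Win32 equivalents are CreateFileMapping and MapViewOfFile); error handling is omitted for brevity:

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map a whole file into the process address space and return its first
// byte by dereferencing the mapped pointer. After mmap, no read() calls
// are needed: pages of the view fault in from disk on demand.
char FirstByteViaMapping(const char* path)
{
    int fd = open(path, O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    void* view = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    char first = static_cast<const char*>(view)[0];
    munmap(view, st.st_size);
    close(fd);
    return first;
}
```

The key difference from a CMemFile-style buffer is visible here: the mapping is disk-backed and demand-paged, so the whole file never has to be read up front.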