Memory Mapped Files and Performance Strategies
-
Hi! Are memory-mapped files typically used for loading/saving speed, or is it just to handle incredibly large files? I'm currently in a situation where I need to load and parse large files (~20 MB) and I'm wondering about the different strategies for doing that as fast as possible. Thanks, Shawn
-
They eliminate the need to hit the disk multiple times. With a 20MB file, you will notice a difference. With smaller files, the gain, if any, is negligible.
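For a file of that size, one common strategy is simply to read the whole file into memory in a single sequential pass and then parse the in-memory buffer, so the disk is touched only once. A minimal standard-C++ sketch (`ReadWholeFile` is a hypothetical helper name, not anything from the thread):

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <sstream>
#include <string>

// Read the whole file into memory in one sequential pass, then parse
// from the buffer instead of seeking around on disk.
std::string ReadWholeFile(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::ostringstream buf;
    buf << in.rdbuf();   // one streamed read; OS read-ahead does the rest
    return buf.str();
}
```

Once the buffer is in RAM, the parsing pass never waits on the disk at all.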
-
For some situations it's more convenient to use memory-mapped files (e.g. if your file is structured with offsets to given items), and so that you can have a disk-backed data structure. It really simplifies programming such a shared structure - you don't have to explicitly read or write the file, you just read from or write to memory. Note that there isn't any particular improvement in speed, nor any fewer disk hits: pages of a mapped view of a memory-mapped file are still paged in (and out) on demand. SQL Server makes a lot of use of memory-mapped files, IIRC.

If you're just reading a file sequentially, rely on the file system caching. It'll do much the same thing and is typically less complicated to program. To optimise the caching behaviour, pass the FILE_FLAG_SEQUENTIAL_SCAN or FILE_FLAG_RANDOM_ACCESS flag to CreateFile, depending on how you're using the file. If you specify one mode but your program actually reads the other way, performance is typically worse than if you hadn't specified either flag. Essentially, FILE_FLAG_SEQUENTIAL_SCAN tells the system to read ahead a lot and not at all behind, and causes pages behind the current file pointer to be discarded aggressively, whereas FILE_FLAG_RANDOM_ACCESS tells it not to read ahead or behind very much. Specifying neither gives a compromise mode that does a little read-behind and some read-ahead.

For information on how Windows 2000 file caching works, see Inside Windows 2000, chapters 7 (Memory Management) and 11 (Cache Manager). Basically the system implements caching by memory-mapping the files.
-
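Those Win32 flags have a rough POSIX counterpart in posix_fadvise, which passes the same sequential-vs-random hint to the kernel's read-ahead logic. A sketch assuming a POSIX system (on Windows you would instead pass the flag to CreateFile when opening the handle):

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Hint the kernel that this descriptor will be read sequentially,
// so it can read ahead aggressively and drop pages already passed,
// much as FILE_FLAG_SEQUENTIAL_SCAN does on Windows.
bool AdviseSequential(int fd)
{
    return posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL) == 0;
}
```

POSIX_FADV_RANDOM is the counterpart of FILE_FLAG_RANDOM_ACCESS; as with the Win32 flags, giving the wrong hint for your actual access pattern can hurt rather than help.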
Mike Dimmick wrote: "Note that there isn't any particular improvement in speed, nor any fewer disk hits: pages of a mapped view of a memory-mapped file are still paged in (and out) on demand."

True, but for the last project I changed, I simply read from a CMemFile instead of a CFile, and the subsequent processing time could be measured in minutes instead of what used to take days. That's quite an improvement.
CMemFile is not a memory-mapped file. It's a block of memory that conforms to the CFile interface, i.e., you can treat it as if it's a file. True memory-mapped files are a different thing entirely.
-
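Outside MFC, the closest standard-C++ analogue of CMemFile is a stream over a RAM buffer: it presents the same file-like interface but never touches the disk. A small sketch of that idea (the helper name is mine, not from the thread):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// A CMemFile-style "file": the backing store is a plain memory buffer,
// but it is read through the same stream interface a disk-backed
// std::ifstream would offer -- no disk I/O happens here at all.
std::string FirstLineOf(const std::string& buffer)
{
    std::istringstream in(buffer);
    std::string line;
    std::getline(in, line);
    return line;
}
```

Because the same parsing code works against either stream type, swapping a disk-backed file for an in-memory one is often a one-line change - which is likely why the switch above paid off so dramatically.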
I disagree. A CMemFile object is a memory file that behaves like a disk file, except that the file is stored in RAM rather than on disk. An MMF, by definition, provides you with the capability to map a view of all or part of a file on disk to a specific range of addresses within your process's address space. Once that is done, accessing its content is as simple as dereferencing a pointer in the designated range of addresses. I fail to see how they are THAT different. In either case, once the file is read from disk and into memory, you operate on the file in its memory-mapped state, using the appropriate functions. The MSDN article Q142377 offers another view.
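That definition of an MMF - map a view of a disk file into the address space, then dereference a pointer - translates almost line for line into code. A sketch assuming a POSIX system (the Win32 equivalents are CreateFileMapping and MapViewOfFile); error handling is omitted for brevity:

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Map a whole file into the process address space and return its first
// byte by dereferencing the mapped pointer. After mmap, no read() calls
// are needed: pages of the view fault in from disk on demand.
char FirstByteViaMapping(const char* path)
{
    int fd = open(path, O_RDONLY);
    struct stat st;
    fstat(fd, &st);
    void* view = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    char first = static_cast<const char*>(view)[0];
    munmap(view, st.st_size);
    close(fd);
    return first;
}
```

The key difference from a CMemFile-style buffer is visible here: the mapping is disk-backed and demand-paged, so the whole file never has to be read up front.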