(Potential) Memory Mapped File question

swjam

My task is as follows. I have two data files, a source (A) and a destination (B), both located on a shared network drive. A is a plain-text data file between 2 GB and 3 GB in size, made up of fixed-size records. Other processes are always writing to A (new records are appended at the end). I would like to read through the whole of A record by record, leave what I still need, and archive the rest to B. There is a 10-character string within each record that I can use as a filter. Any tips on how to go about this? Is MMF the path to take? Thanks for any help.

---------------------------------------------------------- Lorem ipsum dolor sit amet.

Joe Woodbury

Read it in chunks. fopen() does that for you through its buffered I/O, but you could optimize the processing by doing it yourself. Since you are using fixed-size records, this is even easier to code. MMF is definitely NOT the way you want to do this, for several reasons, including the size of the file and the fact that it's on a network resource.
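
A minimal sketch of the chunked approach, written with portable stdio calls. The 32-byte record size, the key sitting at the start of each record, the "exact key match means archive" rule, and the name archive_matching are all assumptions made for illustration; none of them come from the original question.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout: 32-byte records with the 10-character key at offset 0. */
enum { RECORD_SIZE = 32, KEY_OFFSET = 0, KEY_LEN = 10, RECORDS_PER_CHUNK = 2048 };

/* Hypothetical filter rule: a record is archived when its key matches exactly. */
static int should_archive(const char *record, const char *key)
{
    return memcmp(record + KEY_OFFSET, key, KEY_LEN) == 0;
}

/* Reads src_path one chunk at a time and appends every matching record to
   dst_path. Returns the number of records archived, or -1 on error. */
long archive_matching(const char *src_path, const char *dst_path, const char *key)
{
    assert(strlen(key) == KEY_LEN);

    FILE *src = fopen(src_path, "rb");
    if (!src) return -1;
    FILE *dst = fopen(dst_path, "ab");
    if (!dst) { fclose(src); return -1; }

    /* The chunk size is a whole multiple of the record size. */
    char *chunk = malloc((size_t)RECORD_SIZE * RECORDS_PER_CHUNK);
    if (!chunk) { fclose(src); fclose(dst); return -1; }

    long archived = 0;
    int ok = 1;
    size_t n;
    /* fread returns the number of COMPLETE records it read. */
    while (ok && (n = fread(chunk, RECORD_SIZE, RECORDS_PER_CHUNK, src)) > 0) {
        for (size_t i = 0; i < n; i++) {
            const char *rec = chunk + i * RECORD_SIZE;
            if (should_archive(rec, key)) {
                if (fwrite(rec, RECORD_SIZE, 1, dst) != 1) { ok = 0; break; }
                archived++;
            }
        }
    }
    if (ferror(src)) ok = 0;

    free(chunk);
    fclose(src);
    if (fclose(dst) != 0) ok = 0;
    return ok ? archived : -1;
}
```

Because fread counts only complete records, a record that another process has only half appended at the end of A is simply not processed on this pass.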

Anyone who thinks he has a better idea of what's good for people than people do is a swine. - P.J. O'Rourke

ktm TechMan

You can use the ReadDirectoryChangesW Win32 API, which registers for file changes in a directory, and the GetQueuedCompletionStatus function to wait for the changes. Whenever new records are added to A, you will get a notification of the change, and you can then read the added records and archive them to B. You can store your position in the file (such as the number of bytes read) and continue from it when the next change arrives, so that you don't have to read the whole file every time a change occurs. If you want to delete content from the file, though, you will need some kind of process synchronization to avoid simultaneous writes.
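
The offset bookkeeping could look roughly like the sketch below. It leaves out the notification part entirely and covers only the "continue from where you stopped" step, using portable stdio. The 32-byte record size and the name archive_appended are assumptions for illustration. The long offset keeps the sketch portable; for a file over 2 GB on Windows you would need a 64-bit seek such as _fseeki64 (fseeko on POSIX).

```c
#include <assert.h>
#include <stdio.h>

enum { RECORD_SIZE = 32 };  /* assumed record length in bytes */

/* Copies every complete record found at or after *offset in a_path to the
   end of b_path, and advances *offset past the records it copied.
   Returns the number of records copied, or -1 on error. */
long archive_appended(const char *a_path, const char *b_path, long *offset)
{
    /* The saved position must sit on a record boundary. */
    assert(*offset >= 0 && *offset % RECORD_SIZE == 0);

    FILE *a = fopen(a_path, "rb");
    if (!a) return -1;
    FILE *b = fopen(b_path, "ab");
    if (!b) { fclose(a); return -1; }

    long copied = 0;
    int ok = fseek(a, *offset, SEEK_SET) == 0;
    char record[RECORD_SIZE];

    /* A half-written record at the end fails the == 1 check and is left
       for the next call. */
    while (ok && fread(record, RECORD_SIZE, 1, a) == 1) {
        ok = fwrite(record, RECORD_SIZE, 1, b) == 1;
        if (ok) {
            *offset += RECORD_SIZE;
            copied++;
        }
    }
    if (ferror(a)) ok = 0;

    fclose(a);
    if (fclose(b) != 0) ok = 0;
    return ok ? copied : -1;
}
```

The caller would keep the offset variable alive between notifications (or persist it) and call the function once per change event.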

swjam

Actually, I am using MMF for this very reason (the size of the file). With MMF, I can specify in MapViewOfFile which part of the file to map, instead of opening the file the usual way, which loads the whole file into memory. Also, I can write to the file directly through the mapped region of memory. This would be tricky the usual way, as I have to remove records in place (meaning that if a record does not pass the filter, I have to write in its place the next record that does).


Joe Woodbury

Since you are reading sections of the file sequentially, memory mapped files offer zero benefit. Since shifting the window of the memory mapping requires extra operations, you are making the program slower for no reason. I fail to see how this is tricky using multiple buffers and straight I/O.


swjam

Maybe I'm missing something, but it is mainly because of the operations I am performing on file A, which is large. Say I partition file A into blocks of 64 KB. I read block 1 and filter out the records I am not interested in, then write back to the same block 1 the ones I want to keep. When I am finished with this block, I move on to block 2. I may still not be done writing to block 1, because the number of records I keep is always less than or equal to the number I read. Therefore I have two map views of file A: view 1 only reads, and is always at the same position as, or further ahead in the file than, view 2, which writes back. The other reason I think MMF is useful is that the file is too large to load entirely into memory via the usual file open operations. I hope that makes sense. Do you still think I could do this as well or better with the usual fopen?


Joe Woodbury

I understand what you're doing and still wonder why you don't just use CreateFile() and two buffers?


swjam

I've finished my small program as I described. Out of curiosity, how do you suggest I do it using CreateFile and 2 buffers? Thanks.


Joe Woodbury

Use CreateFile() to open the file. Allocate a read buffer and a write buffer whose sizes are multiples of the record size. Each buffer has an associated file offset. Start reading records. Once you find records to be extracted, set the second buffer's offset to that position and start copying the records you want to keep into it, flushing it when required.
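
Roughly, in code. This sketch swaps in portable stdio (fopen, fseek, fread, fwrite) where the description above uses CreateFile() and the Win32 read/write calls, so that it compiles anywhere; the buffer and offset logic is the same. The 32-byte record size, the key at the start of each record, the "exact key match means archive" rule, and the name compact_in_place are assumptions for illustration. It also assumes that nothing else writes to A while it runs and that A holds only complete records. The long offsets would need a 64-bit seek (_fseeki64 on Windows, fseeko on POSIX) for a file over 2 GB.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed layout: 32-byte records with the 10-character key at offset 0. */
enum { RECORD_SIZE = 32, KEY_LEN = 10, BUF_RECORDS = 1024 };

/* Writes the pending kept records at *write_off and advances it. */
static int flush_kept(FILE *f, const char *wbuf, size_t *wcount, long *write_off)
{
    if (*wcount == 0) return 0;
    if (fseek(f, *write_off, SEEK_SET) != 0) return -1;
    if (fwrite(wbuf, RECORD_SIZE, *wcount, f) != *wcount) return -1;
    *write_off += (long)(*wcount * RECORD_SIZE);
    *wcount = 0;
    return 0;
}

/* Appends records whose key matches to b_path and packs the remaining
   records toward the front of a_path. Returns the new logical length of
   A in bytes, or -1 on error. The file is NOT truncated here. */
long compact_in_place(const char *a_path, const char *b_path, const char *key)
{
    /* Static buffers keep the sketch short but make it non-reentrant. */
    static char rbuf[RECORD_SIZE * BUF_RECORDS], wbuf[RECORD_SIZE * BUF_RECORDS];
    assert(strlen(key) == KEY_LEN);

    FILE *a = fopen(a_path, "r+b");
    if (!a) return -1;
    FILE *b = fopen(b_path, "ab");
    if (!b) { fclose(a); return -1; }

    long read_off = 0, write_off = 0;  /* one file offset per buffer */
    size_t wcount = 0, n;
    int ok = 1;

    while (ok) {
        if (fseek(a, read_off, SEEK_SET) != 0) { ok = 0; break; }
        n = fread(rbuf, RECORD_SIZE, BUF_RECORDS, a);
        if (n == 0) { if (ferror(a)) ok = 0; break; }

        for (size_t i = 0; ok && i < n; i++) {
            const char *rec = rbuf + i * RECORD_SIZE;
            long pos = read_off + (long)(i * RECORD_SIZE);
            if (memcmp(rec, key, KEY_LEN) == 0) {
                ok = fwrite(rec, RECORD_SIZE, 1, b) == 1;   /* archive to B */
            } else if (wcount == 0 && write_off == pos) {
                write_off += RECORD_SIZE;   /* nothing removed yet: already in place */
            } else {
                memcpy(wbuf + wcount * RECORD_SIZE, rec, RECORD_SIZE);
                if (++wcount == BUF_RECORDS)
                    ok = flush_kept(a, wbuf, &wcount, &write_off) == 0;
            }
        }
        read_off += (long)(n * RECORD_SIZE);
    }
    if (ok && flush_kept(a, wbuf, &wcount, &write_off) != 0) ok = 0;

    if (fclose(b) != 0) ok = 0;
    if (fclose(a) != 0) ok = 0;
    return ok ? write_off : -1;
}
```

The write offset never passes the read offset, because a record is only buffered for writing after it has been read, so no unread data is overwritten. Cutting the file down to the returned length is left to the caller since standard C has no truncate call; on Windows that would be SetFilePointerEx followed by SetEndOfFile.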
