Chaotic HDD access speed problem
-
Hi. Using C++, std::fstream, Windows. I have an algorithm that processes a large file (actually it is a series of algorithms, but let's consider only one, since the others have the same problem). In general, the algorithm reads chunks of data from one file and writes them to another file, performing some realignment along the way (for reference, I'm realigning 3D volume data). My read chunk size is 512 bytes and my write chunk size is 16 KB. Usually the algorithm finishes in 1 minute 50 seconds, but I noticed that sometimes (rarely) it finishes in 24 seconds! Same file, same execution path. I've started searching for the reason this slowdown happens and how I can control it:
1) I have tried increasing the coalescing of the accesses to disk.
2) I have considered fragmentation (the output file is written in small chunks, so it ends up fragmented, about 600 fragments). When I eliminated the fragmentation (so the file is guaranteed not to be fragmented), nothing changed; the access speed is still chaotic.
3) I have investigated the possibility that Windows flushes my memory buffers to the HDD. No, that's not the case.
4) I have found that if I do the whole operation on another physical disk (not the one with the OS), the slowdown happens more rarely and the algorithm usually finishes in 40 seconds (but the HDD access speed is still chaotic).
Frequently, during a single run, the access speed rises and falls, maybe several times. It looks as if my accesses are falling out of step with some internal HDD or OS operation, I don't know. Does anyone have experience or ideas? Thanks.
If you read and write data in different files on the same disk, the read head has to move back and forth between two places on the disk, which is slow. If your input and output files are on different drives, this problem goes away. If your program alternates reads and writes (on the same physical disk) every time, the slowdown is at its worst. Note that some variation in disk speed can't be prevented in a multitasking OS, where other programs might be using the disk too. If your memory permits, you could consider reading the entire input file into memory at the start of your algorithm, then processing it and writing the results.
-
My files are always on the same drive in all tests, whether it finishes in 1:54 or in 24 seconds.
Thaddeus Jones wrote:
Note that some variation in disk speed can't be prevented in a multitasking OS where other programs might be using the disk too.
I always make sure no other work is running. I'm also watching Resource Monitor: no significant HDD accesses except from my program.
Thaddeus Jones wrote:
If your memory permits, you could consider reading the entire input file to memory at once at the beginning of your algorithm and then processing it and writing the results.
No, that's not possible; the algorithm has to work with unlimited file sizes.
-
The problem is not that the algorithm is slow, but that its speed is too chaotic. Sometimes it finishes 5 times faster than usual, which means it should always be able to finish 5 times faster. I agree that access speed can vary a little, but 5 times... I think this is something that should be puzzled out.
-
progDes wrote:
Usually this algorithm finishes in 1 minute and 50 seconds. But I noticed that sometimes (rarely) it finishes in 24 seconds! Processing the same file, the same execution path. I've started to search for the reasons of why this slowdown is happening and how can I control that.
Caching, perhaps?
-
Maybe you could increase your input buffer then, from 512 bytes to, say, 100 MB, and every time you've processed the 100 MB from memory, read a new 100 MB. Similarly, writing your output to a memory buffer (say, also 100 MB) and writing it out to the file once the buffer is full should help with speed too. The idea is to concentrate disk accesses in areas of the disk that are near each other, since those operations are much faster than ones where the head has to be repositioned.
-
There are countless factors that could be contributing to this. I would guess that the major ones are file caching and delayed writes[^].
File caching: when a file is opened and read, its contents are loaded into RAM by the OS, and sections are copied to your exe as you need them (usually in 4 KB chunks, from which you then read your smaller chunks). Once the file is closed it is marked as unused by the OS, but it is not removed from RAM. If another program then needs a lot of memory, the file will be evicted from RAM; but if that doesn't happen and your file is still in RAM, your program doesn't actually need to touch the hard disk at all. This is most noticeable with a program that opens lots of files at startup, say MS Word: if you close it and open it again shortly afterwards, without running anything else in between, it loads much more quickly the second time.
Delayed writes: these generally only happen with slow media such as USB memory sticks, but they can happen with an HDD as well. When you write to a file and the storage device is busy, the data you write will often go into a virtual file in RAM, which is then written out to the storage device at a later stage.
Other factors may include HDD head seeks (as mentioned by Thaddeus Jones) and other programs accessing the disk. If you are running Windows Vista or 7, you can watch disk accesses with the Resource Monitor (resmon.exe).
-
Thaddeus Jones wrote:
Maybe you could increase your input buffer then from 512 bytes to say 100Mb, and every time you've processed the 100Mb from memory, you'll read a new 100Mb. Similarly, writing your output to a memory buffer (say also 100Mb) and once your buffer is full writing that to file, should help with speed too.
I've tried this approach. Although it gives a slight speed increase, it doesn't resolve the problem of chaotic speed.
-
Thanks Andrew, I will look into the file-caching situation; I need to investigate this more. Meanwhile, do you think occasional disk accesses by other programs could slow my accesses down by a factor of 5? I make sure no heavy HDD operations are performed by other programs, but other programs certainly do access the disk even when idle.
-
If you've run out of ideas: have you tried disabling your anti-virus software? Maybe it performs some odd caching of scanned data. Even a long shot is a shot...