Code Project > General Programming > C / C++ / MFC
File I/O: What is the best approach

Tags: question, hardware, performance, help, tutorial
KenThompson (#1)

    I am currently involved a project that requires that I have random access to multiple files on disk. I have a single 'file writer' object that handles writes, but I am unsure about how to proceed with reads. The question is: If I want to be able to service multiple 'reads' at the same time to a single file should I have a single object (an fstream) that is synchronized (using a lock) or multiple fstream objects that are independent. I want to take advantage of my raid hardware as well as multiple processors throughout the application. My initial thought is that having multiple 'reader' objects leaves synchronization up to the OS, and that using some type of locking mechanism (such as a criticalsection/mutex) could slow performance. Any help here is much appreciated. On a side note: If I'm just writing buffers of data (or 1 byte aligned structures) does it make more sense to just use stdio functions?

led mike (#2)

      KenThompson wrote:

      My initial thought is that having multiple 'reader' objects leaves synchronization up to the OS

What OS, and where is it documented that it performs such synchronization? By this I assume you mean synchronization with the write operations.

      KenThompson wrote:

      and that using some type of locking mechanism (such as a criticalsection/mutex) could slow performance.

"Would" slow performance; it's not in question. However, without the OS performing synchronization, of which I am unaware, you would have to do it yourself to avoid reading corrupted data. You could optimize by building a far more complex mechanism, as databases do, that manages which parts of the file are locked for writing. Which of course raises the question of why you wouldn't just use a database, since they already implement everything you require. Also, you are not clear on whether this is across threads or across processes. The latter synchronization is far more expensive.
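The byte-range locking mentioned above can be illustrated with a toy lock manager: writers lock only the interval they touch, so readers of other regions proceed in parallel. This is purely a sketch; real databases use far more sophisticated lock managers (lock escalation, intent locks, deadlock detection), and all names here are invented.

```cpp
// Toy byte-range lock: a held interval blocks only overlapping requests.
#include <mutex>
#include <utility>
#include <vector>

class RangeLock {
public:
    // Try to lock [off, off+len); returns false if it overlaps a held range.
    bool try_lock(long off, long len) {
        std::lock_guard<std::mutex> g(m_);
        for (const auto& r : held_)
            if (off < r.first + r.second && r.first < off + len)
                return false;                    // overlap: caller must wait/retry
        held_.emplace_back(off, len);
        return true;
    }
    void unlock(long off, long len) {
        std::lock_guard<std::mutex> g(m_);
        for (auto it = held_.begin(); it != held_.end(); ++it)
            if (it->first == off && it->second == len) { held_.erase(it); return; }
    }
private:
    std::mutex m_;
    std::vector<std::pair<long, long>> held_;    // currently held (offset, length) pairs
};
```

For example, locking bytes 0-9 blocks a request for bytes 5-14 until the first range is released, while a request for bytes 20-29 would succeed immediately.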

KenThompson (#3)

I'm already synchronizing write and read operations. By this I mean that I keep track of what is currently being done to the file. Basically, the writer never goes backwards, so whatever has been written is fair game for reading. The only random access is reading. A database isn't acceptable for this application.

The question remains, though: what approach is best? A single reader per file handling many requests (i.e. seekg to the offset), or several fstream objects that read independently in a shared mode? I didn't mean that the OS, in this case Windows, prevents corruption when modifying files. I should have been clearer in my statement. I meant to say: my initial thought is that having multiple 'reader' objects is perfectly acceptable and not a performance hit.

In addition, in a RAID situation, would it not make more sense to create multiple file streams to the same file, given the very nature of multiple disk heads? I'm not all that aware of where there is any performance to gain based on implementation. I can only assume that if I issue two reads to the same file, via two streams, the RAID controller (in my case RAID 5) would outperform a seekg operation. Maybe not with 2 reads, but maybe with hundreds of reads per second. Does this make sense?
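The "writer never goes backwards" invariant described above can be made concrete: the writer publishes a committed length, and readers only touch bytes below it, so the data itself needs no lock. A minimal sketch, with invented class and method names (not from the original code):

```cpp
// Single appending writer, many readers. Readers may read any byte
// below the published committed length; the writer bumps that length
// only after the bytes are flushed.
#include <atomic>
#include <fstream>
#include <string>

class GrowOnlyFile {
public:
    explicit GrowOnlyFile(const std::string& path) : path_(path), committed_(0) {
        std::ofstream(path_, std::ios::binary).close();   // create/truncate
    }
    void append(const std::string& bytes) {               // called by the single writer
        std::ofstream out(path_, std::ios::binary | std::ios::app);
        out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
        out.flush();
        // Publish only after the data is written, so readers never see a gap.
        committed_.fetch_add(static_cast<std::streamoff>(bytes.size()));
    }
    bool read_at(std::streamoff off, std::size_t n, std::string& out) const {
        if (off + static_cast<std::streamoff>(n) > committed_.load())
            return false;                                 // region not yet written
        std::ifstream in(path_, std::ios::binary);        // independent reader stream
        in.seekg(off);
        out.assign(n, '\0');
        in.read(&out[0], static_cast<std::streamsize>(n));
        return in.gcount() == static_cast<std::streamsize>(n);
    }
private:
    std::string path_;
    std::atomic<std::streamoff> committed_;               // high-water mark
};
```

A read that would cross the high-water mark simply reports failure, which matches the rule that only already-written bytes are fair game.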

led mike (#4)

          KenThompson wrote:

          Does this make sense?

Absolutely. It would seem, on the surface (not a valid assessment), that multiple readers would be much more efficient. You might consider just implementing that, then running a profiler to see whether that area stands out at all. Sometimes (usually) profiler results will surprise you. ;)

KenThompson (#5)

Thanks, I needed some sanity here. What type of profiler do you use? I've been playing with a few Intel products, but the cost sours me on them.

led mike (#6)

QC mostly runs them (several). I think the last time I used one myself it was an Intel product. The company has lots of money, so cost is not an issue.

cmk (#7)

IOCP: not just for sockets. Completion ports can be used with any IFS-based HANDLE, e.g. sockets, files, pipes, ... See also the scatter/gather I/O functions: http://msdn2.microsoft.com/en-us/library/aa365472.aspx and http://en.wikipedia.org/wiki/Vectored_I/O

                ...cmk The idea that I can be presented with a problem, set out to logically solve it with the tools at hand, and wind up with a program that could not be legally used because someone else followed the same logical steps some years ago and filed for a patent on it is horrifying. - John Carmack

jhwurmbach (#8)

                  KenThompson wrote:

                  A database for this application isn't acceptable.

So it's better to write half of a database on your own instead of using, e.g., SQLite?

                  KenThompson wrote:

                  Does this make sense?

I think not. You would only ever be working on buffers managed by the OS. The hard-disk heads would ascend and descend the cylinders all the time, doing the reads and writes out of order anyway.


                  Though I speak with the tongues of men and of angels, and have not money, I am become as a sounding brass, or a tinkling cymbal.
                  George Orwell, "Keep the Aspidistra Flying", Opening words

KenThompson (#9)

SQLite is designed for databases sized in kilobytes or megabytes, not gigabytes. Therefore, it is unacceptable here.

KenThompson (#10)

                      Thank you, this suggestion has proved quite fruitful! Ken
