Code Project

Writes to the hard drive [modified]

    Jim Crafton wrote (#1):

    Let's say you have a process that's running merrily along in Windows (one of the server OSes), writing to disk, maybe 1 KB per second or something like that. Now Windows crashes. How much data in the process of being written out to disk would be lost? Is there some way to determine this? Are there some sort of stats that Microsoft publishes, or is this really dependent on the HD?

    I'm writing a program for work which gets a semi-real time feed of data. The question arose about how much of the data we would lose if the server crashes.

    Edit: Thanks to everyone for the responses. There were some really good answers, and it all helped out. Thanks again!

    ¡El diablo está en mis pantalones! ¡Mire, mire! SELECT * FROM User WHERE Clue > 0 0 rows returned Save an Orange - Use the VCF! Personal 3D projects Just Say No to Web 2 Point Blow

    modified on Wednesday, November 17, 2010 4:13 PM

      federico strati wrote (#2):

      If you use a hardware cluster, you will be able to quantify precisely what you loose when one of the nodes goes down. Otherwise you just have to rely on estimates of the amount of data you lost.

        SimulationofSai wrote (#3):

        In most cases, it's the amount of data written since the last I/O flush plus whatever is still sitting in the HDD's write-behind cache.

        SG Aham Brahmasmi!
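
The estimate described in this reply can be written out as simple arithmetic. This is only a sketch: the 1 KB per second rate comes from the question, while the 5 seconds since the last flush and the 2 KB assumed to be still in the drive's cache are made-up illustration values, and `estimated_loss_bytes` is a hypothetical helper name.

```python
def estimated_loss_bytes(rate_bytes_per_s: float,
                         seconds_since_flush: float,
                         bytes_in_drive_cache: float) -> float:
    """Rough estimate: unflushed application/OS data plus data still in the drive cache."""
    unflushed = rate_bytes_per_s * seconds_since_flush
    return unflushed + bytes_in_drive_cache


if __name__ == "__main__":
    # Assumed values: 1 KB/s feed, 5 s since the last flush, 2 KB still in the drive cache.
    print(estimated_loss_bytes(1024, 5, 2048))
```

The second term is the hard one to pin down in practice, which is why later replies treat the real figure as hardware dependent.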

          Andy Brummer wrote (#4):

          Shouldn't the controller have a battery backup, so as long as an unbuffered disk write call completes you wouldn't loose any of that data? In order for you not to loose any data, you need a two-phase commit for pulling chunks off your feed.

          Curvature of the Mind

            leppie wrote (#5):

            Flush often; that causes Windows to write out the cache.

            ((λ (x) `(,x ',x)) '(λ (x) `(,x ',x)))
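
As a minimal sketch of "flush often", assuming a Python writer that keeps one file handle open: `flush()` empties the program's own buffer and `os.fsync()` asks the operating system to push its cached data to the device. The function names are hypothetical, and as other replies point out, the drive's own cache may still hold the data afterwards.

```python
import os
from typing import BinaryIO, Iterable


def write_and_sync(f: BinaryIO, record: bytes) -> None:
    """Write one record, then flush it through both software caches."""
    f.write(record)
    f.flush()             # Python's own buffer -> operating system
    os.fsync(f.fileno())  # operating system cache -> storage device


def record_feed(path: str, records: Iterable[bytes]) -> None:
    """Append every record to path, syncing after each one."""
    with open(path, "ab") as f:
        for record in records:
            write_and_sync(f, record)
```

Syncing after every record is the slowest choice; at roughly 1 KB per second that cost is probably acceptable, but it is a trade-off rather than a free guarantee.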

              Luc Pattyn wrote (#6):

              Unfortunately, the whole picture of streaming, buffering and flushing through all the layers of I/O classes, disk drivers, OS caching and hardware caching is anything but transparent.

              FWIW: With a limited throughput such as yours, you could open/write/close the file for every record, and hence only loose one "record" on an app crash. I tend to use File.AppendAllText() for each and every line I log in my app while developing/debugging, making sure I get everything, especially the most relevant last line the app logged before crashing (an app crash being more likely than Windows crashing). :)

              Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum

              Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.
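
A rough Python analogue of the open/write/close-per-record approach is sketched below. It is not the .NET File.AppendAllText() call itself; `append_line` is a hypothetical helper, and unlike AppendAllText it adds the newline for you.

```python
def append_line(path: str, line: str) -> None:
    """Open the file, append one line, and close it again."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
    # Leaving the 'with' block closes the file, which hands the data to the OS.
    # That protects against an application crash; it does not by itself force
    # the OS to write its cache to the disk.
```

Because the file is closed after every call, an application crash can cost at most the record being written at that moment, which is the property the post is after.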

                Electron Shepherd wrote (#7):

                For a discussion of this by the technical deity Raymond Chen, see http://blogs.msdn.com/b/oldnewthing/archive/2010/09/09/10059575.aspx

                Server and Network Monitoring

                  Electron Shepherd wrote (#8):

                  You could consider using Transactional NTFS.

                  Server and Network Monitoring

                    Dave Kreskowiak wrote (#9):

                    It's impossible to determine with the limited information available. It depends on the O/S configuration, the driver implementation, the cache size on the drive, the caching capability of the controller, what the drive was up to at the time of loss, what the O/S was doing at the time, the load on the O/S, the I/O channels, how your app is written, any queuing by the data source or by the input source in the app, blah, blah, blah...

                    You can make it close to zero, however, by using a caching disk controller with a battery backup (usually a RAID controller) and the correct code to minimize the number of records lost between the receive, process and write operations.

                    If you want absolute zero, then you're going to have to step up the research into the data source capabilities, disk setup, O/S config, queuing options, application coding, correct transaction implementation, ... In other words, you spend more and more money the closer you want to get to no data loss.

                    A guide to posting questions on CodeProject
                    Dave Kreskowiak

                      Dan Neely wrote (#10):

                      Andy Brummer wrote:

                      Shouldn't the controller have a battery backup, so as long as an unbuffered disk write call completes you wouldn't loose any of that data?

                      Unfortunately, even that's not a guarantee. No link handy; it's something I read about (in a comment???) on Raymond Chen's blog a year or two ago. But as of about two years ago, most (all?) hard drives (ATA and SCSI) lie to the rest of the system about the status of writes to their internal caches. This wonderful 'feature' came about because hard drive benchmarkers look at speed fanatically but at verifiability almost never, and marketing somehow talked management into making the engineers have the firmware lie about when writes are completed, to inflate benchmark scores. X|

                      Edit: I think these are the articles I was referring to; I'm not sure because LiveJournal is blocked.

                      brad.livejournal.com/2116715.html
                      brad.livejournal.com/2094221.html

                      Sorry these aren't clickable, but the linkifier has stopped working for me...

                      3x12=36 2x12=24 1x12=12 0x12=18

                        Marc Clifton wrote (#11):

                        Jim Crafton wrote:

                        How much data in the process of being written out to disk would be lost?

                        Well, you have to consider that the hard disk controller (HDC), and the hard disk too, I imagine, will cache the write operation before performing it, so if Windows crashes, it'll still write out the data. Of course, if that also requires updating the NTFS or FAT32 info, then while the data is on the disk, it'll probably get overwritten and there'll be all sorts of hell to pay at some point.

                        [edit] By HDC, I mean the chip on your motherboard. So you might have three layers of caching occurring: the OS, the HDC on the motherboard, and the hardware physically on the drive. [/edit]

                        So, it's complicated. And on top of that, are you writing to the file system via the OS, or are you using a database? And how does the database write to its file store? In theory, any write should be able to hand off the entire operation to ensure data and linkage integrity even if the OS crashes, but who knows what really happens?

                        [edit] For example, can you set up a write operation to a disk that says "write these sectors, then update these sectors", or even more interesting, "update these sectors only after a successful write to those sectors", so that, for example, you could update the NTFS after writing the data? Who knows? You have to talk to OS folks and probably drive manufacturer folks, and probably even the folks who develop the controller chips. And then who knows if the OS even uses advanced features? [/edit]

                        Interesting question though!

                        Marc

                          Michael Kingsford Gray wrote (#12):

                          Arghh!! It is "lose", not "loose"! And you should plan for losing every bit of data in a crash, whatever prompts it. Hence the use of a database that can cope with this, such as Oracle or SQL Server, combined with programming practices that are designed to cope from the outset. Honestly, this has been standard fare since the 1950s. Read some books.

                            George2007 wrote (#13):

                            "semi-real time feed of data"

                            I assume this is coming from an external source (i.e. a network)? If your server crashes, shouldn't you be more concerned with data loss because you are not listening anymore?

                              Snorri Kristjansson wrote (#14):

                              How about using a removable drive to store the data, e.g. a USB flash memory drive? These types of drives already have the 'Optimize for quick removal' setting ON, so there is no write cache on the OS side and no write cache on the hardware side either. Also use the FILE_FLAG_WRITE_THROUGH flag in your CreateFile call and write data to the file in 'cluster'-sized chunks. IMHO the best solution for almost no money :)

                              N.B. It's possible to determine whether the 'Optimize for quick removal' setting is on for a drive, but that involves using DeviceIoControl and 'asking' the device driver.
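
The "cluster-sized chunks" part of this suggestion can be sketched in Python as follows. The 4096-byte cluster size is an assumption (the real value depends on how the volume was formatted), the helper names are hypothetical, and the sketch does not call CreateFile or set FILE_FLAG_WRITE_THROUGH; `os.fsync()` is used only as a portable stand-in, which is not the same thing.

```python
import os
from typing import Iterator

CLUSTER_SIZE = 4096  # assumed; query the real value for the target volume


def cluster_chunks(data: bytes, cluster_size: int = CLUSTER_SIZE) -> Iterator[bytes]:
    """Yield data in cluster-sized blocks, zero-padding the last one."""
    for start in range(0, len(data), cluster_size):
        chunk = data[start:start + cluster_size]
        yield chunk.ljust(cluster_size, b"\0")


def write_in_clusters(path: str, data: bytes, cluster_size: int = CLUSTER_SIZE) -> int:
    """Append data one cluster at a time; return the number of blocks written."""
    blocks = 0
    # buffering=0 disables Python's own buffer, so each write() goes straight to the OS.
    with open(path, "ab", buffering=0) as f:
        for chunk in cluster_chunks(data, cluster_size):
            f.write(chunk)
            blocks += 1
        os.fsync(f.fileno())  # stand-in only; not equivalent to FILE_FLAG_WRITE_THROUGH
    return blocks
```

Padding the last block changes the file's contents, so a reader would need some way to know where the real data ends, for example a length field in each record.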

                                Andy Brummer wrote (#15), in reply to Dan Neely:

                                Wow. That's awful.

                                Curvature of the Mind

                                  ghle wrote (#16), in reply to Michael Kingsford Gray:

                                  Huh? I don't think Jim said it was going to a database. Didn't even say it was records. Regardless, wanting to know what failed to write to HD is NOT the same as not planning to lose everything.

                                  So, if I use the nice, robust Oracle you recommend, and the power/processor/program/O.S./controller fails, how much data will be lost, oh wise one? You can know that what was written was written, but can you know what was not written in the first place? I don't think Oracle would help you out one bit here.

                                  An HDD writes sectors at a time, not bits, bytes, or kilobytes. And it reads before it writes if the sector is not in cache.

                                  Not an easy question to answer. Yes, there are ways around it, but that wasn't the question posed.

                                  Gary

                                    Luc Pattyn wrote (#17), in reply to George2007:

                                    Not necessarily. The server will have accepted the data it got, so the sender thinks everything is fine. As soon as the server goes down, it no longer acknowledges new data, and the sender should be aware of that. But the sender cannot guess how much data has been acknowledged but is nevertheless (probably) lost. Delaying the acknowledgement until the data is really written is hard and slow, as it has to go all the way down to the disk sectors and back up to the app. :)

                                    Luc Pattyn [Forum Guidelines] [Why QA sucks] [My Articles] Nil Volentibus Arduum

                                    Please use <PRE> tags for code snippets, they preserve indentation, and improve readability.
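
The "delay the acknowledgement until the data is written" idea could look roughly like this in Python. Everything here is illustrative: `store_then_ack` and the `ack` callback are hypothetical names, the sequence numbers are simply positions in the incoming stream, and `os.fsync()` only reaches as far as the operating system can push the data, not necessarily past the drive's own cache.

```python
import os
from typing import Callable, Iterable


def store_then_ack(records: Iterable[bytes], path: str,
                   ack: Callable[[int], None]) -> None:
    """Persist each record before acknowledging it to the sender."""
    with open(path, "ab") as f:
        for seq, record in enumerate(records):
            f.write(record)
            f.flush()             # Python buffer -> operating system
            os.fsync(f.fileno())  # operating system cache -> storage device
            ack(seq)              # only now tell the sender this record is stored
```

With this ordering, a sender that resends everything it has not seen acknowledged can at worst deliver a record twice, so the receiver would need to tolerate duplicates.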

                                      Michael Kingsford Gray wrote (#18), in reply to ghle:

                                      Both Oracle and SQL Server have a tried and tested logging and recovery system. A "transaction", no matter how large, will either be written in total or not at all. This is absolutely guaranteed. It doesn't matter whether it is a power failure in the middle of a disk write or a meteor smashing the server.

                                      The workstation that issued the transaction is able to tell whether the transaction succeeded or failed, and act accordingly. It is up to the client software what to do from there on in, but it WILL know if the transaction did not "go through". Guaranteed. I have had more than a little fun testing this out for myself! :-D
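
The all-or-nothing behaviour described here can be illustrated with a small sketch. It uses SQLite through Python's standard `sqlite3` module as a stand-in, not Oracle or SQL Server, and the `feed` table, its columns and the `store_batch` helper are all hypothetical. It shows a batch being rolled back when one row fails, and the caller learning the outcome from the return value.

```python
import sqlite3
from typing import Iterable, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the (hypothetical) feed table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feed (seq INTEGER PRIMARY KEY, payload TEXT)"
    )
    return conn


def store_batch(conn: sqlite3.Connection,
                rows: Iterable[Tuple[int, str]]) -> bool:
    """Insert all rows in one transaction; on any error, none of them are kept."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.executemany("INSERT INTO feed (seq, payload) VALUES (?, ?)", rows)
        return True
    except sqlite3.Error:
        return False
```

A client that gets `False` back knows the whole batch is absent and can resend it, which is the "it WILL know" property the post refers to.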
