Ultimate Database Fail [updated]

Posted in The Weird and The Wonderful
Tags: database, hosting, cloud, help, discussion
AspDotNetDev wrote:

You are right, there's not really a good reason to store that type of data using a clustered index. Still, I don't see anything you've come across that would fix the problem you described. Would inserting a value with a bad clustered index cause all the rows in the table to be moved so that the physical location on the hard drive is strictly enforced (i.e., so rows will be ordered on the HD by the clustered index)? Or is SQL smart enough to insert the data without rearranging it (i.e., by using a tree-like structure)?

    [WikiLeaks Cablegate Cables]

wizardzz
#9

Clustered indexes physically store the rows in the order defined by the index. As far as I know, it can be quite taxing on resources, so I'm not sure what logic they use for it.

    "Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming "Wow! What a Ride!" — Hunter S. Thompson

AspDotNetDev wrote:

You are right, there's not really a good reason to store that type of data using a clustered index. Still, I don't see anything you've come across that would fix the problem you described. Would inserting a value with a bad clustered index cause all the rows in the table to be moved so that the physical location on the hard drive is strictly enforced (i.e., so rows will be ordered on the HD by the clustered index)? Or is SQL smart enough to insert the data without rearranging it (i.e., by using a tree-like structure)?

      [WikiLeaks Cablegate Cables]

Andy Brummer
#10

A clustered index is by definition used to physically order the table data in the file. Non-clustered indexes are only ordered by the data in the index. http://www.simple-talk.com/sql/learn-sql-server/sql-server-index-basics/
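To make the distinction concrete, here is a minimal T-SQL sketch; the dbo.Orders table and the index names are made up for illustration:

```sql
-- Hypothetical table for illustration.
CREATE TABLE dbo.Orders
(
    OrderId  INT         NOT NULL,
    PlacedAt DATETIME2   NOT NULL,
    Symbol   VARCHAR(10) NOT NULL
);

-- The clustered index IS the table: its leaf level holds the rows,
-- stored in OrderId order. Only one per table.
CREATE UNIQUE CLUSTERED INDEX IX_Orders_OrderId
    ON dbo.Orders (OrderId);

-- A nonclustered index is a separate B-tree ordered by Symbol;
-- its leaf entries point back to the clustered key.
CREATE NONCLUSTERED INDEX IX_Orders_Symbol
    ON dbo.Orders (Symbol);
```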

      Curvature of the Mind now with 3D

Andy Brummer wrote:

A clustered index is by definition used to physically order the table data in the file. Non-clustered indexes are only ordered by the data in the index. http://www.simple-talk.com/sql/learn-sql-server/sql-server-index-basics/

        Curvature of the Mind now with 3D

AspDotNetDev
#11

Thanks, that is what I suspected. Looks like the row data is stored with the clustered index. Since the indexes, including the clustered index, are stored as B-trees, there should indeed be no huge performance hit (i.e., all the data isn't going to have to be reorganized on disk if a row is inserted into the middle of the clustered index). The only time a performance hit would occur would be during a reindexing of the table (even then, the hit would turn an O(1) operation into an O(1 · x) operation, and since x is effectively constant, that reduces back to an O(1) operation). Since those shouldn't be occurring very often, that should not cause the problem described by the OP.
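For what it's worth, the "reindexing" mentioned above would look something like this in T-SQL, reusing the hypothetical dbo.Orders table and index from earlier:

```sql
-- REBUILD recreates the index from scratch (for a clustered index this
-- rewrites the table's pages in key order); REORGANIZE compacts in place.
ALTER INDEX IX_Orders_OrderId ON dbo.Orders REBUILD;
-- or, a lighter-weight alternative:
ALTER INDEX IX_Orders_OrderId ON dbo.Orders REORGANIZE;
```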

        [WikiLeaks Cablegate Cables]

wizardzz wrote:

Clustered indexes physically store the rows in the order defined by the index. As far as I know, it can be quite taxing on resources, so I'm not sure what logic they use for it.

          "Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming "Wow! What a Ride!" — Hunter S. Thompson

AspDotNetDev
#12

          See here. Basically, the row data is stored with the clustered index, which is stored as a B-tree, so insertion is not an expensive operation (i.e., this should not be the cause of your problem).

          [WikiLeaks Cablegate Cables]

AspDotNetDev wrote:

Thanks, that is what I suspected. Looks like the row data is stored with the clustered index. Since the indexes, including the clustered index, are stored as B-trees, there should indeed be no huge performance hit (i.e., all the data isn't going to have to be reorganized on disk if a row is inserted into the middle of the clustered index). The only time a performance hit would occur would be during a reindexing of the table (even then, the hit would turn an O(1) operation into an O(1 · x) operation, and since x is effectively constant, that reduces back to an O(1) operation). Since those shouldn't be occurring very often, that should not cause the problem described by the OP.

            [WikiLeaks Cablegate Cables]

Andy Brummer
#13

What happens is the pages fill up and then get split as records are inserted in the middle. Splitting pages is significantly slower than just appending records at the end. It's not the O(whatever) complexity, it's the big-ass constant cost of rearranging data on a disk.
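One way to see this in practice: after a burst of mid-table inserts, page splits show up as fragmentation and half-empty pages in the clustered index (index_id = 1). A sketch, again using the hypothetical dbo.Orders table:

```sql
-- Inspect the clustered index after a run of out-of-order inserts.
-- Page splits show up as high fragmentation and low page density.
SELECT avg_fragmentation_in_percent,   -- logical fragmentation from splits
       avg_page_space_used_in_percent, -- how full each 8KB page is
       page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.Orders'), 1, NULL, 'SAMPLED');
```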

            Curvature of the Mind now with 3D

AspDotNetDev wrote:

Thanks, that is what I suspected. Looks like the row data is stored with the clustered index. Since the indexes, including the clustered index, are stored as B-trees, there should indeed be no huge performance hit (i.e., all the data isn't going to have to be reorganized on disk if a row is inserted into the middle of the clustered index). The only time a performance hit would occur would be during a reindexing of the table (even then, the hit would turn an O(1) operation into an O(1 · x) operation, and since x is effectively constant, that reduces back to an O(1) operation). Since those shouldn't be occurring very often, that should not cause the problem described by the OP.

              [WikiLeaks Cablegate Cables]

wizardzz
#14

I am only speaking from experience here: inserting large amounts of data at a high rate into the middle of a table with a seemingly random clustered index can really hurt performance. I've done it in the past, at a rate of ~10-30k rows per second on a table of ~100m to 1b rows. With such a clustered index, pushing data to the table out of clustered-index order made queries nearly unusable, or at the very least lengthy. However, a time-series based search on a table with a time-series clustered index was fine and allowed for a realtime monitoring solution.
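For illustration, a sketch of the pattern that worked (table and column names are made up): with an ever-increasing clustered key such as an insert timestamp, new rows always land on the last page, and time-range queries read one contiguous run of pages:

```sql
-- Hypothetical tick table clustered on the insert timestamp.
CREATE TABLE dbo.Ticks
(
    RecordedAt DATETIME2     NOT NULL,
    Symbol     VARCHAR(10)   NOT NULL,
    Price      DECIMAL(18,6) NOT NULL
);

-- Ever-increasing key: inserts append to the end, no mid-table splits.
CREATE CLUSTERED INDEX IX_Ticks_RecordedAt
    ON dbo.Ticks (RecordedAt);

-- A realtime-monitoring style query then scans one contiguous range.
SELECT Symbol, Price
FROM dbo.Ticks
WHERE RecordedAt >= DATEADD(MINUTE, -5, SYSUTCDATETIME());
```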

              "Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming "Wow! What a Ride!" — Hunter S. Thompson

Andy Brummer wrote:

What happens is the pages fill up and then get split as records are inserted in the middle. Splitting pages is significantly slower than just appending records at the end. It's not the O(whatever) complexity, it's the big-ass constant cost of rearranging data on a disk.

                Curvature of the Mind now with 3D

AspDotNetDev
#15

A page is 8KB. That's not going to take long to move around. The time to split a page also averages out in the long run to O(1). Say a page holds 50 records: it may split on a single record insertion, but the other 49 insertions would not cause a split, so the average time per insertion would be O(1).

Now, if SQL sorts data within the page for each insert, that may slow things down a bit, as the entire page would essentially be rewritten each time randomly ordered data got inserted. Not sure exactly how it works at that low a level. Even in this case, though, SQL should be able to optimize things. For example, if records are inserted in batches, adjacent records need not cause multiple sorts within a page... SQL can figure out the positions within the page and sort all the data at once before writing it to disk. And in the case that many small insertions are performed, those individual inserts shouldn't take long enough to slow other stuff down. Of course, transaction locks and such could bork things up, but that's a different issue.
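A related knob, as an aside: a fill factor leaves free space in each leaf page so random-order inserts can land without an immediate split, at the cost of more pages to read. A sketch, reusing the hypothetical index from earlier:

```sql
-- Rebuild leaving ~20% free space per leaf page to absorb
-- out-of-order inserts before a split becomes necessary.
ALTER INDEX IX_Orders_OrderId ON dbo.Orders
    REBUILD WITH (FILLFACTOR = 80);
```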

                [WikiLeaks Cablegate Cables]

wizardzz wrote:

I am only speaking from experience here: inserting large amounts of data at a high rate into the middle of a table with a seemingly random clustered index can really hurt performance. I've done it in the past, at a rate of ~10-30k rows per second on a table of ~100m to 1b rows. With such a clustered index, pushing data to the table out of clustered-index order made queries nearly unusable, or at the very least lengthy. However, a time-series based search on a table with a time-series clustered index was fine and allowed for a realtime monitoring solution.

                  "Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming "Wow! What a Ride!" — Hunter S. Thompson

AspDotNetDev
#16

Well, the clustered index you talked about is certainly not ideal and is less performant than it should be. I suppose the magnitude of the problem depends on the exact amount of data you are using and the frequency of inserts/updates/reads/deletes. I guess the "ultimate" fail here would be that the others you work with didn't find the problem sooner.

                  [WikiLeaks Cablegate Cables]

AspDotNetDev wrote:

A page is 8KB. That's not going to take long to move around. The time to split a page also averages out in the long run to O(1). Say a page holds 50 records: it may split on a single record insertion, but the other 49 insertions would not cause a split, so the average time per insertion would be O(1).

Now, if SQL sorts data within the page for each insert, that may slow things down a bit, as the entire page would essentially be rewritten each time randomly ordered data got inserted. Not sure exactly how it works at that low a level. Even in this case, though, SQL should be able to optimize things. For example, if records are inserted in batches, adjacent records need not cause multiple sorts within a page... SQL can figure out the positions within the page and sort all the data at once before writing it to disk. And in the case that many small insertions are performed, those individual inserts shouldn't take long enough to slow other stuff down. Of course, transaction locks and such could bork things up, but that's a different issue.

                    [WikiLeaks Cablegate Cables]

Andy Brummer
#17

It all depends. If the entire table can fit inside the memory of your SQL Server and there isn't a lot of contention, then it doesn't really matter one way or the other. If you have 32 GB of memory in your server and 50 databases holding about 1 TB of data, then it makes the difference between inserting 5 million records in a few minutes or in a few days. For another server, it made the difference between inserting 35,000 records a second for 22 hours straight and starting to slow down after a few hours.

                    Curvature of the Mind now with 3D

AspDotNetDev wrote:

Well, the clustered index you talked about is certainly not ideal and is less performant than it should be. I suppose the magnitude of the problem depends on the exact amount of data you are using and the frequency of inserts/updates/reads/deletes. I guess the "ultimate" fail here would be that the others you work with didn't find the problem sooner.

                      [WikiLeaks Cablegate Cables]

Andy Brummer
#18

A lot of the time has to do with speculative reads missing, and with random read/write access on the disks pushing them from under 10 ms access times to over 100 ms. SQL Server optimizes its accesses for overall throughput, and that can drive latency to ridiculous levels. As much as it doesn't sound like it should be a big deal, I can tell you from real-world experience that it is.

                      Curvature of the Mind now with 3D

Andy Brummer wrote:

It all depends. If the entire table can fit inside the memory of your SQL Server and there isn't a lot of contention, then it doesn't really matter one way or the other. If you have 32 GB of memory in your server and 50 databases holding about 1 TB of data, then it makes the difference between inserting 5 million records in a few minutes or in a few days. For another server, it made the difference between inserting 35,000 records a second for 22 hours straight and starting to slow down after a few hours.

                        Curvature of the Mind now with 3D

AspDotNetDev
#19

                        Andy Brummer wrote:

                        inserting 35,000 records a second for 22 hours straight

                        Inserting nearly 3 billion records is going to take a while no matter what you do.
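(That's 35,000 rows/s × 22 h × 3,600 s/h ≈ 2.8 billion rows.)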

                        Andy Brummer wrote:

                        it makes the difference between inserting 5 million records in a few minutes or in a few days.

                        I don't think it would cause a 1,000x slowdown. 10x, maybe... but 1,000x? That doesn't sound right.

                        [WikiLeaks Cablegate Cables]

AspDotNetDev wrote:

I wouldn't think the clustering vs. non-clustering would be a very big problem. The open_quantity and order_type index could be useful if the goal is, for example, to process all orders which have not yet been processed. Rather than scan 23,000,000 rows, you'd only have to process the 10 unprocessed orders. Are you sure you found the problem?

                          [WikiLeaks Cablegate Cables]
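A sketch of the kind of index described in the quote above, assuming a large order table with the open_quantity and order_type columns it mentions (table and index names hypothetical); a filtered index keeps only the handful of open rows, so finding unprocessed orders never touches the other 23 million:

```sql
-- Filtered nonclustered index: only rows still open are indexed.
CREATE NONCLUSTERED INDEX IX_OrderHistory_OpenByType
    ON dbo.OrderHistory (order_type)
    INCLUDE (open_quantity)
    WHERE open_quantity > 0;
```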

wizardzz
#20

I don't know why this was downvoted; hopefully I corrected it enough.

                          "Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming "Wow! What a Ride!" — Hunter S. Thompson
