Ultimate Database Fail [updated]

The Weird and The Wonderful
Tags: database, hosting, cloud, help, discussion
wizardzz:

So, I recently changed jobs (4 months ago) and have found some humorous legacy code since then. But nothing, nothing comes close to the database horror I found. For several days, the senior architect and my manager had been trying to figure out why our nightly and weekly stored procedures were taking so long that they timed out. As it turns out, neither has any DBA experience. Today the discussion turned to hiring an outsider to audit the tables, indexes, stored procedures, etc. I volunteered to have a look myself, kindly pointing out my database experience. After 15 minutes I found the problem.

There is a table called trades; it has ~23,000,000 rows and is almost always queried by the tradeid and date columns. Not surprisingly, the primary key consists of tradeid and date. I looked at the indexes and was shocked to see a clustered index made up of open_quantity and order_type, whose values are almost always 0 and 0 or 1 respectively, and which are almost never search criteria. I looked back at the primary key, and it was set to non-clustered. This table has existed like this for years, recording and reporting positions and trends on a public-facing website the entire time. I was told I earned my pay for the week, but I'm not allowed to just go home :confused:

Update: We created a clustered index on the id and time columns and kept the original clustered index as a non-clustered index. It did solve the problems we were having: primarily, that when inserts were coming in extremely fast, select queries would become impossibly slow.
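In T-SQL terms, the change was along these lines (a minimal sketch only; the index names here are made up, not our actual script):

    -- Sketch with hypothetical index names. Creating a clustered index on a
    -- ~23,000,000-row table rebuilds the whole table, so it needs a window.

    -- Drop the old clustered index on the rarely-queried columns.
    DROP INDEX IX_trades_open_qty_type ON dbo.trades;

    -- Cluster the table on the columns nearly every query filters on
    -- (the same columns as the primary key, which stays non-clustered).
    CREATE UNIQUE CLUSTERED INDEX IX_trades_tradeid_date
        ON dbo.trades (tradeid, [date]);

    -- Keep the old index around as an ordinary non-clustered index.
    CREATE NONCLUSTERED INDEX IX_trades_open_qty_type
        ON dbo.trades (open_quantity, order_type);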

    "Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming "Wow! What a Ride!" — Hunter S. Thompson

    modified on Tuesday, March 22, 2011 10:30 AM

AspDotNetDev (#6):

I wouldn't think the clustering vs. non-clustering would be a very big problem. The open_quantity and order_type index could be useful if the goal is, for example, to process all orders which have not yet been processed. Rather than scan 23,000,000 rows, you'd only have to process the 10 unprocessed orders. You sure you found the problem?
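For example (a sketch with hypothetical names), the kind of lookup such an index would help with:

    -- An index keyed on the status columns lets this query seek straight to
    -- the handful of open rows instead of scanning all ~23,000,000.
    CREATE NONCLUSTERED INDEX IX_trades_status
        ON dbo.trades (open_quantity, order_type);

    SELECT tradeid, [date]
    FROM dbo.trades
    WHERE open_quantity > 0    -- the few orders not yet fully processed
      AND order_type = 1;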

[WikiLeaks Cablegate Cables]

wizardzz (#7), replying to AspDotNetDev:

The open_quantity is never a search criterion. Its value is 0 99.99% of the time, and we never search by open_quantity or order_type; if we did, the order_id and date would be included as well. Have you ever inserted large sets of data into a table with a clustered index, where the inserted data had to be physically stored into the middle of the table because of the index? It's quite ugly. I'm not suggesting eliminating the clustered index, but there's no reason the data needs to be physically stored in order of open_quantity and order_type.

      "Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming "Wow! What a Ride!" — Hunter S. Thompson

      A 1 Reply Last reply
      0
      • W wizardzz

        The open_quantity is never a search criteria. Its value is 0 99.99% of the time, and we never search by open_quantity or order_type, if we did, the order_id and date would be included as well. Have you ever inserted large sets of data into a table that had a clustered index, while the inserted data would have to be physically stored into the middle of the table because of the index? It's quite ugly. I'm not suggesting eliminating the clustered index, but there's no reason the data need be physically stored in order according to open quantity and order type.

        "Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming "Wow! What a Ride!" — Hunter S. Thompson

        A Offline
        A Offline
        AspDotNetDev
        wrote on last edited by
        #8

You are right, there's not really a good reason to store that type of data using a clustered index. Still, I don't see anything you've come across that would fix the problem you described. Would inserting a value with a bad clustered index cause all the rows in the table to be moved so that the physical location on the hard drive is strictly enforced (i.e., so rows are ordered on disk by the clustered index)? Or is SQL Server smart enough to insert the data without rearranging everything (i.e., by using a tree-like structure)?

wizardzz (#9), replying to AspDotNetDev:

Clustered indexes physically store the rows in the order given by the index, as far as I know. It can be quite taxing on resources, so I'm not sure what logic they use for it.

          "Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming "Wow! What a Ride!" — Hunter S. Thompson

          A 1 Reply Last reply
          0
          • A AspDotNetDev

            You are right, there's not really a good reason to store that type of data using a clustered index. Still, I don't see anything that you've come across which would fix the problem you described. Would inserting a value with a bad clustered index cause all the rows in the table to be moved so that the physical location on the hard drive is stricly enforced (i.e., so rows will be ordered on the HD by the clustered index)? Or is SQL smart enough to insert the data without rearranging it (i.e., by using a tree-like structure)?

            [WikiLeaks Cablegate Cables]

            A Offline
            A Offline
            Andy Brummer
            wrote on last edited by
            #10

A clustered index is by definition used to physically order the table data in the file. Non-clustered indexes are only ordered by the data in the index. http://www.simple-talk.com/sql/learn-sql-server/sql-server-index-basics/
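You can also see which index a table is actually clustered on straight from the metadata; a quick sketch (assuming SQL Server 2005 or later, table name from the OP):

    -- Lists every index on the table and whether it is the clustered one.
    SELECT name, type_desc, is_primary_key
    FROM sys.indexes
    WHERE object_id = OBJECT_ID('dbo.trades');
    -- type_desc is CLUSTERED, NONCLUSTERED, or HEAP (no clustered index).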

Curvature of the Mind now with 3D

AspDotNetDev (#11), replying to Andy Brummer:

Thanks, that is what I suspected. Looks like the row data is stored with the clustered index. Since the indexes, including the clustered index, are stored as B-trees, there should indeed be no huge performance hit (i.e., all the data isn't going to have to be reorganized on disk when a row is inserted into the middle of the clustered index). The only time a performance hit would occur would be during a reindexing of the table (and even then, the hit would turn an O(1) operation into an O(1 * x) operation, and since x is effectively constant, that reduces to O(1)). Since those shouldn't be occurring very often, that should not cause the problem described by the OP.
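By a reindex I mean a full rebuild, something like this sketch (table name from the OP, SQL Server 2005+ syntax assumed):

    -- Rebuilds every index on the table, including the clustered one; this is
    -- the occasional expensive operation, not a cost paid on every insert.
    ALTER INDEX ALL ON dbo.trades REBUILD;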

AspDotNetDev (#12), replying to wizardzz:

See here. Basically, the row data is stored with the clustered index, which is stored as a B-tree, so insertion is not an expensive operation (i.e., this should not be the cause of your problem).

Andy Brummer (#13), replying to AspDotNetDev:

What happens is the pages fill up and then get split as records are inserted into the middle. Splitting pages is significantly slower than just appending records at the end. It's not the O(whatever) performance, it's the big-ass constant cost of rearranging data on a disk.
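The damage is easy to see after the fact; a sketch (assuming SQL Server 2005 or later, table name from the OP) showing the fragmentation and half-empty pages the splits leave behind:

    -- Logical fragmentation and page fullness for every index on the table.
    SELECT i.name,
           ips.index_type_desc,
           ips.avg_fragmentation_in_percent,
           ips.avg_page_space_used_in_percent,
           ips.page_count
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID('dbo.trades'), NULL, NULL, 'SAMPLED') AS ips
    JOIN sys.indexes AS i
        ON i.object_id = ips.object_id AND i.index_id = ips.index_id;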

wizardzz (#14), replying to AspDotNetDev:

I am only speaking from experience here: inserting large amounts of data at a high rate into the middle of a table with a seemingly random clustered index can really hurt performance. I've done it in the past, at ~10-30k rows per second on tables of ~100m to 1b rows. With that kind of clustered index, pushing data into the table out of the index's order made queries nearly unusable, or at the very least lengthy; however, a time-series search on a table with a time-series clustered index was fine and allowed for a real-time monitoring solution.

                    "Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming "Wow! What a Ride!" — Hunter S. Thompson

                    A 1 Reply Last reply
                    0
                    • A Andy Brummer

                      What happens is the pages fill up and then get split as records are inserted in the middle. Splitting pages is significantly slower than just appending records on the end. It's not the O(whatever) performance, but the big ass constant performance of re-arranging data on a disk.

                      Curvature of the Mind now with 3D

                      A Offline
                      A Offline
                      AspDotNetDev
                      wrote on last edited by
                      #15

A page is 8KB; that's not going to take long to move around. The time to split a page also averages out in the long run to O(1). Say a page holds 50 records. It may split on a single record insertion, but the other 49 insertions would not cause a split, so the average time per insertion is O(1). Now, if SQL sorts the data within the page for each insert, that may slow things down a bit, as the entire page would essentially be rewritten each time randomly ordered data got inserted. I'm not sure exactly how it works at that low a level. Even in that case, though, SQL should be able to optimize things. For example, if records are inserted in batches, adjacent records need not cause multiple sorts within a page: SQL can figure out the position within the page and sort all the data at once before writing it to disk. And in the case where many small insertions are performed, those individual inserts shouldn't take long enough to slow other stuff down. Of course, transaction locks and such could bork things up, but that's a different issue.
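And if the splits do turn out to hurt, the usual mitigation is to leave free space on each page via the fill factor; a sketch (hypothetical index name):

    -- Rebuild with a lower fill factor so each leaf page keeps ~10% free
    -- space, absorbing out-of-order inserts before a split is needed.
    ALTER INDEX IX_trades_status ON dbo.trades
        REBUILD WITH (FILLFACTOR = 90);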

AspDotNetDev (#16), replying to wizardzz:

Well, the clustered index you talked about is certainly not ideal and is less performant than it should be. I suppose the magnitude of the problem depends on the exact amount of data you are using and the frequency of inserts/updates/reads/deletes. I guess the "ultimate" fail here would be that the others you work with didn't find the problem sooner.

Andy Brummer (#17), replying to AspDotNetDev:

It all depends. If the entire table can fit inside the memory of your SQL Server and there isn't a lot of contention, then it doesn't really matter one way or the other. If you have 32 GB of memory in your server and 50 databases on it with about 1 TB of data, then it makes the difference between inserting 5 million records in a few minutes or in a few days. For another server it made the difference between inserting 35,000 records a second for 22 hours straight and starting to slow down after a few hours.

Andy Brummer (#18), replying to AspDotNetDev:

A lot of the time has to do with speculative reads missing and the random read/write access on the disks pushing them from under 10 ms access times to over 100 ms. SQL Server optimizes its accesses for overall throughput, and that can drive latency to ridiculous levels. As much as it doesn't sound like it should be a big deal, I can tell you from real-world experience that it is.

AspDotNetDev (#19), replying to Andy Brummer:

Andy Brummer wrote: "inserting 35,000 records a second for 22 hours straight"

Inserting nearly 3 billion records is going to take a while no matter what you do.

Andy Brummer wrote: "it makes the difference between inserting 5 million records in a few minutes or in a few days."

I don't think it would cause a 1,000x slowdown. 10x, maybe... but 1,000x? That doesn't sound right.

wizardzz (#20), replying to AspDotNetDev (#6):

I don't know why this was downvoted; hopefully I corrected it enough.

                                "Life should not be a journey to the grave with the intention of arriving safely in a pretty and well preserved body, but rather to skid in broadside in a cloud of smoke, thoroughly used up, totally worn out, and loudly proclaiming "Wow! What a Ride!" — Hunter S. Thompson

                                1 Reply Last reply
                                0
                                Reply
                                • Reply as topic
                                Log in to reply
                                • Oldest to Newest
                                • Newest to Oldest
                                • Most Votes


                                • Login

                                • Don't have an account? Register

                                • Login or register to search.
                                • First post
                                  Last post
                                0
                                • Categories
                                • Recent
                                • Tags
                                • Popular
                                • World
                                • Users
                                • Groups