Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. Database & SysAdmin
  3. Database
  4. Naming my clustered index column which isn't the PK (and is a non-PK clustered Identity a good idea?)

Naming my clustered index column which isn't the PK (and is a non-PK clustered Identity a good idea?)

Scheduled Pinned Locked Moved Database
databasehelpquestionsql-serversysadmin
17 Posts 4 Posters 0 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • J JChrisCompton

    > I believe you're mixing up indexes with keys. Perhaps - let me try again; maybe I didn't convey this well. The index isn't just for lookup speed, it is a unique index (same functionality as a unique constraint unless I'm missing something... I'm more of an 'app dba' than a 'real DBA') to enforce referential integrity because... referential integrity. The addition of the Identity field was due to my concern about page splits. The guid will remain the primary key for the table. The guid is generated within a vendor's application so there isn't much I can do about that at the moment. I know (or at least 'think') that fill factor is only applied at rebuild or creation. This troubled me since guids are random (in the sense that where a newly generated guid falls numerically in relation to an existing list of guids is a random position). Inserts will fill up any fill factor <100 and then splits start (and the splits are equivalent to fill factor = 100 as the table grows). So I came up with the idea of a Clustered Identity as the 'physical sorting field' (numerically increasing won't generate page split) because it seems a better idea than (1) guid inserts causing page splits, and (2) better than making the table a heap (without a clustered index). Someone mentioned the volume being low. Yes, last month was 3,000 but this will be expanded from 1 to 5 categories. This could cause 3,100 entries next month or 31,000 entries. I don't have the time of day of the inserts but it is all first shift and (I think) mostly at the start of the month. If I can cause less work for the db, it feels like I should. One new piece of information that I'll share: testing this I don't see as much of a negative impact from a pk clustered guid as I expected.

    J Offline
    J Offline
    Jorgen Andersson
    wrote on last edited by
    #8

    JChrisCompton wrote:

    The index isn't just for lookup speed, it is a unique index (same functionality as a unique constraint unless I'm missing something... I'm more of an 'app dba' than a 'real DBA') to enforce referential integrity because... referential integrity.

    You're not missing anything, but you need to differentiate the purpose. A key, (Unique or primary) needs a unique index for its function, and while it's enough to create a unique index to enforce the uniqueness of a column(s) you should really create a unique (or primary) key to show the intent if nothing else.

    From the help file

    Creating a unique index guarantees that any attempt to duplicate key values fails. There are no significant differences between creating a UNIQUE constraint and creating a unique index that is independent of a constraint. Data validation occurs in the same manner, and the query optimizer does not differentiate between a unique index created by a constraint or manually created. However, you should create a UNIQUE constraint on the column when data integrity is the objective. This makes the objective of the index clear. (emphasis mine.)

    There's also another slight difference, on a unique index you can actually add the option "Ignore duplicate values", this is not allowed for a constraint.

    JChrisCompton wrote:

    So I came up with the idea of a Clustered Identity as the 'physical sorting field'

    This is a good idea, but it would need to be the primary key in that case. Which doesn't matter at all, from a relational point of view there's no difference between a primary key and a unique key. It's only about how the data is stored in the case of clustered tables and magnetic tapes. :)

    JChrisCompton wrote:

    Someone mentioned the volume being low. Yes, last month was 3,000 but this will be expanded from 1 to 5 categories. This could cause 3,100 entries next month or 31,000 entries

    To put it in relation, at my job we insert approximately 300000 rows every week on a clustered table with 300 columns and a semi-random natural key. It takes a couple of minutes.

    Wrong is evil and must be defeated. - Jeff Ello

    J 1 Reply Last reply
    0
    • L Lost User

      JChrisCompton wrote:

      I know (or at least 'think')

      Specify Fill Factor for an Index | Microsoft Docs[^] :thumbsup:

      Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^] "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

      J Offline
      J Offline
      JChrisCompton
      wrote on last edited by
      #9

      Yes, it says "The fill-factor option is provided for... [when] an index is created or rebuilt..." "I think" was intended to mean "I think fill-factor has no influence except during create/rebuild". The issue that concerns me is page splits. From memory of a MSFT SQL Server Performance Tuning class I took five years ago, I think tables with a clustered index are a B+ tree with doubly linked leaf nodes. (class and school were long ago :–) When an insert happens, and the bottom node of the tree is full, then a split happens. When SSvr splits a node, my understanding is that it just makes a new node - the fill-factor doesn't enter in to it. If I specify my fill factor to be "only fill 1/3" it doesn't matter - a new node is created (instead of 2 extra nodes being created then each adjusted to 1/3 full). This is done without rebalancing, because it is the fastest solution. Is my mental model right, or are there things in newer versions that makes 'real life' different? (maybe some auto tuning feature I've missed)

      L 1 Reply Last reply
      0
      • J JChrisCompton

        Yes, it says "The fill-factor option is provided for... [when] an index is created or rebuilt..." "I think" was intended to mean "I think fill-factor has no influence except during create/rebuild". The issue that concerns me is page splits. From memory of a MSFT SQL Server Performance Tuning class I took five years ago, I think tables with a clustered index are a B+ tree with doubly linked leaf nodes. (class and school were long ago :–) When an insert happens, and the bottom node of the tree is full, then a split happens. When SSvr splits a node, my understanding is that it just makes a new node - the fill-factor doesn't enter in to it. If I specify my fill factor to be "only fill 1/3" it doesn't matter - a new node is created (instead of 2 extra nodes being created then each adjusted to 1/3 full). This is done without rebalancing, because it is the fastest solution. Is my mental model right, or are there things in newer versions that makes 'real life' different? (maybe some auto tuning feature I've missed)

        L Offline
        L Offline
        Lost User
        wrote on last edited by
        #10

        I never did trees in school.

        JChrisCompton wrote:

        The issue that concerns me is page splits.

        From the page I linked; "When an index is created or rebuilt, the fill-factor value determines the percentage of space on each leaf-level page to be filled with data, reserving the remainder on each page as free space for future growth." "How do you avoid them? There are strategies for doing this, one such strategy is to use a low fill factor when you create and index, but that will be at the expense of read operations and because you usually have more reads to writes then that’s bad." What is a page split? What happens? Why does it happen? Why worry? - Tony Rogerson's ramblings on SQL Server[^]

        Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^] "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

        J 1 Reply Last reply
        0
        • J Jorgen Andersson

          JChrisCompton wrote:

          The index isn't just for lookup speed, it is a unique index (same functionality as a unique constraint unless I'm missing something... I'm more of an 'app dba' than a 'real DBA') to enforce referential integrity because... referential integrity.

          You're not missing anything, but you need to differentiate the purpose. A key, (Unique or primary) needs a unique index for its function, and while it's enough to create a unique index to enforce the uniqueness of a column(s) you should really create a unique (or primary) key to show the intent if nothing else.

          From the help file

          Creating a unique index guarantees that any attempt to duplicate key values fails. There are no significant differences between creating a UNIQUE constraint and creating a unique index that is independent of a constraint. Data validation occurs in the same manner, and the query optimizer does not differentiate between a unique index created by a constraint or manually created. However, you should create a UNIQUE constraint on the column when data integrity is the objective. This makes the objective of the index clear. (emphasis mine.)

          There's also another slight difference, on a unique index you can actually add the option "Ignore duplicate values", this is not allowed for a constraint.

          JChrisCompton wrote:

          So I came up with the idea of a Clustered Identity as the 'physical sorting field'

          This is a good idea, but it would need to be the primary key in that case. Which doesn't matter at all, from a relational point of view there's no difference between a primary key and a unique key. It's only about how the data is stored in the case of clustered tables and magnetic tapes. :)

          JChrisCompton wrote:

          Someone mentioned the volume being low. Yes, last month was 3,000 but this will be expanded from 1 to 5 categories. This could cause 3,100 entries next month or 31,000 entries

          To put it in relation, at my job we insert approximately 300000 rows every week on a clustered table with 300 columns and a semi-random natural key. It takes a couple of minutes.

          Wrong is evil and must be defeated. - Jeff Ello

          J Offline
          J Offline
          JChrisCompton
          wrote on last edited by
          #11

          Just so you know (in the event that I ask a question here again) those inserts would be against the same database server which supports thousands of real-time users. I was experimenting because it seemed needlessly inefficient to have a clustered char(36) as the key. :wtf: Turns out that all my questions may be moot (aside from my own learning) :sigh: For review exec sp_helpindex <table name> gives the following description

          table #1
          clustered, unique, primary key located on PRIMARY guid column

          table #2
          clustered located on PRIMARY the int column
          nonclustered, unique, primary key located on PRIMARY guid column

          There is no detectable difference on insert speed when table size is <100k. There is a difference, but it seems trivial, when the table contains more than two million rows (14 second difference when inserting 300k rows in each table) P.S. Love the magnetic tape reference. P.P.S. > a clustered table with 300 columns and a semi-random natural key Yikes!

          J 1 Reply Last reply
          0
          • J JChrisCompton

            Just so you know (in the event that I ask a question here again) those inserts would be against the same database server which supports thousands of real-time users. I was experimenting because it seemed needlessly inefficient to have a clustered char(36) as the key. :wtf: Turns out that all my questions may be moot (aside from my own learning) :sigh: For review exec sp_helpindex <table name> gives the following description

            table #1
            clustered, unique, primary key located on PRIMARY guid column

            table #2
            clustered located on PRIMARY the int column
            nonclustered, unique, primary key located on PRIMARY guid column

            There is no detectable difference on insert speed when table size is <100k. There is a difference, but it seems trivial, when the table contains more than two million rows (14 second difference when inserting 300k rows in each table) P.S. Love the magnetic tape reference. P.P.S. > a clustered table with 300 columns and a semi-random natural key Yikes!

            J Offline
            J Offline
            Jorgen Andersson
            wrote on last edited by
            #12

            The difference is bigger than you might think. On table #1 you have one index, and on table #2 you have two indexes. Insert performance is basically linear with the number of indexes[^]. So the fact that it doesn't take twice as long to do the inserts on table #2 means that your idea was quite correct after all. And this is on a narrow table, on a wide clustered table the difference would be bigger.

            Wrong is evil and must be defeated. - Jeff Ello

            J 1 Reply Last reply
            0
            • J Jorgen Andersson

              The difference is bigger than you might think. On table #1 you have one index, and on table #2 you have two indexes. Insert performance is basically linear with the number of indexes[^]. So the fact that it doesn't take twice as long to do the inserts on table #2 means that your idea was quite correct after all. And this is on a narrow table, on a wide clustered table the difference would be bigger.

              Wrong is evil and must be defeated. - Jeff Ello

              J Offline
              J Offline
              JChrisCompton
              wrote on last edited by
              #13

              One has twice the indices... well duh. I should have thought of that... but I didn't :laugh: Thanks for sticking with me until I got it!

              1 Reply Last reply
              0
              • L Lost User

                I never did trees in school.

                JChrisCompton wrote:

                The issue that concerns me is page splits.

                From the page I linked; "When an index is created or rebuilt, the fill-factor value determines the percentage of space on each leaf-level page to be filled with data, reserving the remainder on each page as free space for future growth." "How do you avoid them? There are strategies for doing this, one such strategy is to use a low fill factor when you create and index, but that will be at the expense of read operations and because you usually have more reads to writes then that’s bad." What is a page split? What happens? Why does it happen? Why worry? - Tony Rogerson's ramblings on SQL Server[^]

                Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^] "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

                J Offline
                J Offline
                JChrisCompton
                wrote on last edited by
                #14

                Thanks for the article reference, it is good reading. It is great background and wonderfully answers "what is a page split." I do wonder, however, how much of that has changed in SSrv 2008+. I thought, for example, that VarChar columns in SQL Server can be moved off to different storage location to prevent splits due a column changing its size. (row overflow? or is that just for blobs?) I didn't find something to corroborate my thought when I did a quick search, so if someone could let me know whether I'm on/off base in the previous paragraph it would be great. Just specifically about VarChar (not text/blob; that's a different world I'm rarely in).

                L 1 Reply Last reply
                0
                • J JChrisCompton

                  Thanks for the article reference, it is good reading. It is great background and wonderfully answers "what is a page split." I do wonder, however, how much of that has changed in SSrv 2008+. I thought, for example, that VarChar columns in SQL Server can be moved off to different storage location to prevent splits due a column changing its size. (row overflow? or is that just for blobs?) I didn't find something to corroborate my thought when I did a quick search, so if someone could let me know whether I'm on/off base in the previous paragraph it would be great. Just specifically about VarChar (not text/blob; that's a different world I'm rarely in).

                  L Offline
                  L Offline
                  Lost User
                  wrote on last edited by
                  #15

                  JChrisCompton wrote:

                  It is great background and wonderfully answers "what is a page split."

                  Better than I could in a single post, and it is nice to have an actual example that you can copy/paste to try for yourself and see :)

                  JChrisCompton wrote:

                  I didn't find

                  Let me share more bookmarks ;P How are varchar values stored in a SQL Server database? - Stack Overflow[^] - first answer. I had already forgotten about the option to compress those fields.

                  Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^] "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

                  J 1 Reply Last reply
                  0
                  • L Lost User

                    JChrisCompton wrote:

                    It is great background and wonderfully answers "what is a page split."

                    Better than I could in a single post, and it is nice to have an actual example that you can copy/paste to try for yourself and see :)

                    JChrisCompton wrote:

                    I didn't find

                    Let me share more bookmarks ;P How are varchar values stored in a SQL Server database? - Stack Overflow[^] - first answer. I had already forgotten about the option to compress those fields.

                    Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^] "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

                    J Offline
                    J Offline
                    JChrisCompton
                    wrote on last edited by
                    #16

                    Thanks again. I believe the term I was looking for is "offrow pages". Here's a recent (Feb 2018) link that talks about it briefly then takes the undocumented DBCC IND command really deep into the RowID and beyond SQL Server Row Data Linking to Off Row Data[^] in case you're interested. I enjoyed it but was a little got lost :)

                    L 1 Reply Last reply
                    0
                    • J JChrisCompton

                      Thanks again. I believe the term I was looking for is "offrow pages". Here's a recent (Feb 2018) link that talks about it briefly then takes the undocumented DBCC IND command really deep into the RowID and beyond SQL Server Row Data Linking to Off Row Data[^] in case you're interested. I enjoyed it but was a little got lost :)

                      L Offline
                      L Offline
                      Lost User
                      wrote on last edited by
                      #17

                      Cool, thanks for the link :)

                      Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^] "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

                      1 Reply Last reply
                      0
                      Reply
                      • Reply as topic
                      Log in to reply
                      • Oldest to Newest
                      • Newest to Oldest
                      • Most Votes


                      • Login

                      • Don't have an account? Register

                      • Login or register to search.
                      • First post
                        Last post
                      0
                      • Categories
                      • Recent
                      • Tags
                      • Popular
                      • World
                      • Users
                      • Groups