Naming my clustered index column which isn't the PK (and is a non-PK clustered Identity a good idea?)

JChrisCompton

Good question... 3rd party software is the reason.

Lost User

JChrisCompton wrote:

2. I have the identity starting at -1 and decreasing by 1 to make it obvious that it isn't the pk. Don't know how much this will help, but I do what I can.

If it is your table, you decide what the PK is. The vendors ID should simply be a unique constraint with an index and be treated as an "alternative primary key".

JChrisCompton wrote:

3. I don't like the names ix_need_a_good_name_here, need_a_good_name_here_ix, need_a_good_name_here_id, because if I saw them I would assume that was the primary key. Suggestions?

The name should be identifying enough for you to know which constraint has been broken once the exception is thrown. To make searching easier, I use the tablename as the beginning of that name, and postfix the columname on which the constraint lies, as well as the letters "IX" for an index, or "C" for a constraint.

Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^] "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

Jorgen Andersson

JChrisCompton wrote:

1. I don't want the guid to be the clustered index because there will be inserts. There were 3,000 inserts the first month, and I'm expanding from one category to four. No way to know the distribution of inserts (except 1st shift / business days) and I don't know the frequency of the other categories. Fill factor == 100, not sure I can get it changed.

3000 inserts is really not much.

JChrisCompton wrote:

2. I have the identity starting at -1 and decreasing by 1 to make it obvious that it isn't the pk. Don't know how much this will help, but I do what I can.

The best identifier is a proper name, see Eddys response I agree fully with him. And so does ISO 11179.

JChrisCompton wrote:

3. I don't like the names ix_need_a_good_name_here, need_a_good_name_here_ix, need_a_good_name_here_id, because if I saw them I would assume that was the primary key. Suggestions?

I believe you're mixing up indexes with keys. The purpose of an index is to speed up the searching of a table, an index does not need to be unique. The purpose of a key is to ensure the integrity of the database. A key needs an index for the function, but not the other way around. If you create a clustered table, you should not use a guid as the key as you really should avoid random inserts on them. so either you use the identity column as the primary key and simply put a unique key on the guid column, or you use a nonclustered table.

Wrong is evil and must be defeated. - Jeff Ello

JChrisCompton

> I believe you're mixing up indexes with keys. Perhaps - let me try again; maybe I didn't convey this well. The index isn't just for lookup speed, it is a unique index (same functionality as a unique constraint unless I'm missing something... I'm more of an 'app dba' than a 'real DBA') to enforce referential integrity because... referential integrity. The addition of the Identity field was due to my concern about page splits. The guid will remain the primary key for the table. The guid is generated within a vendor's application so there isn't much I can do about that at the moment. I know (or at least 'think') that fill factor is only applied at rebuild or creation. This troubled me since guids are random (in the sense that where a newly generated guid falls numerically in relation to an existing list of guids is a random position). Inserts will fill up any fill factor <100 and then splits start (and the splits are equivalent to fill factor = 100 as the table grows). So I came up with the idea of a Clustered Identity as the 'physical sorting field' (numerically increasing won't generate page split) because it seems a better idea than (1) guid inserts causing page splits, and (2) better than making the table a heap (without a clustered index). Someone mentioned the volume being low. Yes, last month was 3,000 but this will be expanded from 1 to 5 categories. This could cause 3,100 entries next month or 31,000 entries. I don't have the time of day of the inserts but it is all first shift and (I think) mostly at the start of the month. If I can cause less work for the db, it feels like I should. One new piece of information that I'll share: testing this I don't see as much of a negative impact from a pk clustered guid as I expected.

Lost User

JChrisCompton wrote:

I know (or at least 'think')

Specify Fill Factor for an Index | Microsoft Docs[^] :thumbsup:

Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^] "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

Jorgen Andersson

JChrisCompton wrote:

The index isn't just for lookup speed, it is a unique index (same functionality as a unique constraint unless I'm missing something... I'm more of an 'app dba' than a 'real DBA') to enforce referential integrity because... referential integrity.

You're not missing anything, but you need to differentiate the purpose. A key, (Unique or primary) needs a unique index for its function, and while it's enough to create a unique index to enforce the uniqueness of a column(s) you should really create a unique (or primary) key to show the intent if nothing else.

From the help file

Creating a unique index guarantees that any attempt to duplicate key values fails. There are no significant differences between creating a UNIQUE constraint and creating a unique index that is independent of a constraint. Data validation occurs in the same manner, and the query optimizer does not differentiate between a unique index created by a constraint or manually created. However, you should create a UNIQUE constraint on the column when data integrity is the objective. This makes the objective of the index clear. (emphasis mine.)

There's also another slight difference, on a unique index you can actually add the option "Ignore duplicate values", this is not allowed for a constraint.

JChrisCompton wrote:

So I came up with the idea of a Clustered Identity as the 'physical sorting field'

This is a good idea, but it would need to be the primary key in that case. Which doesn't matter at all, from a relational point of view there's no difference between a primary key and a unique key. It's only about how the data is stored in the case of clustered tables and magnetic tapes. :)

JChrisCompton wrote:

Someone mentioned the volume being low. Yes, last month was 3,000 but this will be expanded from 1 to 5 categories. This could cause 3,100 entries next month or 31,000 entries

To put it in relation, at my job we insert approximately 300000 rows every week on a clustered table with 300 columns and a semi-random natural key. It takes a couple of minutes.

Wrong is evil and must be defeated. - Jeff Ello

JChrisCompton

Yes, it says "The fill-factor option is provided for... [when] an index is created or rebuilt..." "I think" was intended to mean "I think fill-factor has no influence except during create/rebuild". The issue that concerns me is page splits. From memory of a MSFT SQL Server Performance Tuning class I took five years ago, I think tables with a clustered index are a B+ tree with doubly linked leaf nodes. (class and school were long ago :–) When an insert happens, and the bottom node of the tree is full, then a split happens. When SSvr splits a node, my understanding is that it just makes a new node - the fill-factor doesn't enter in to it. If I specify my fill factor to be "only fill 1/3" it doesn't matter - a new node is created (instead of 2 extra nodes being created then each adjusted to 1/3 full). This is done without rebalancing, because it is the fastest solution. Is my mental model right, or are there things in newer versions that makes 'real life' different? (maybe some auto tuning feature I've missed)

Lost User

I never did trees in school.

JChrisCompton wrote:

The issue that concerns me is page splits.

From the page I linked; "When an index is created or rebuilt, the fill-factor value determines the percentage of space on each leaf-level page to be filled with data, reserving the remainder on each page as free space for future growth." "How do you avoid them? There are strategies for doing this, one such strategy is to use a low fill factor when you create and index, but that will be at the expense of read operations and because you usually have more reads to writes then that’s bad." What is a page split? What happens? Why does it happen? Why worry? - Tony Rogerson's ramblings on SQL Server[^]

Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^] "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

JChrisCompton

Just so you know (in the event that I ask a question here again) those inserts would be against the same database server which supports thousands of real-time users. I was experimenting because it seemed needlessly inefficient to have a clustered char(36) as the key. :wtf: Turns out that all my questions may be moot (aside from my own learning) :sigh: For review exec sp_helpindex <table name> gives the following description

table #1
clustered, unique, primary key located on PRIMARY guid column

table #2
clustered located on PRIMARY the int column
nonclustered, unique, primary key located on PRIMARY guid column

There is no detectable difference on insert speed when table size is <100k. There is a difference, but it seems trivial, when the table contains more than two million rows (14 second difference when inserting 300k rows in each table) P.S. Love the magnetic tape reference. P.P.S. > a clustered table with 300 columns and a semi-random natural key Yikes!

Jorgen Andersson

The difference is bigger than you might think. On table #1 you have one index, and on table #2 you have two indexes. Insert performance is basically linear with the number of indexes[^]. So the fact that it doesn't take twice as long to do the inserts on table #2 means that your idea was quite correct after all. And this is on a narrow table, on a wide clustered table the difference would be bigger.

Wrong is evil and must be defeated. - Jeff Ello

JChrisCompton

One has twice the indices... well duh. I should have thought of that... but I didn't :laugh: Thanks for sticking with me until I got it!

JChrisCompton

Thanks for the article reference, it is good reading. It is great background and wonderfully answers "what is a page split." I do wonder, however, how much of that has changed in SSrv 2008+. I thought, for example, that VarChar columns in SQL Server can be moved off to different storage location to prevent splits due a column changing its size. (row overflow? or is that just for blobs?) I didn't find something to corroborate my thought when I did a quick search, so if someone could let me know whether I'm on/off base in the previous paragraph it would be great. Just specifically about VarChar (not text/blob; that's a different world I'm rarely in).

Lost User

JChrisCompton wrote:

It is great background and wonderfully answers "what is a page split."

Better than I could in a single post, and it is nice to have an actual example that you can copy/paste to try for yourself and see :)

JChrisCompton wrote:

I didn't find

Let me share more bookmarks ;P How are varchar values stored in a SQL Server database? - Stack Overflow[^] - first answer. I had already forgotten about the option to compress those fields.

Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^] "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

JChrisCompton

Thanks again. I believe the term I was looking for is "offrow pages". Here's a recent (Feb 2018) link that talks about it briefly then takes the undocumented DBCC IND command really deep into the RowID and beyond SQL Server Row Data Linking to Off Row Data[^] in case you're interested. I enjoyed it but was a little got lost :)

Lost User

Cool, thanks for the link :)

Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^] "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.