
MS SQL: how to reorganize identity column values?

michal kreslik (original question):

Hello, in one of my tables I'm using Int32 as the data type for the identity column. The value is autogenerated, and it has now reached the maximum for Int32, although the table holds "only" 1.6 billion rows: some rows were deleted along the way, and for various reasons I created the column with an identity seed of 400 million. So no new rows can be added to the table, because the autogenerated identity value is hitting the Int32 ceiling.

One option would obviously be to change the identity column to another data type, say Int64 (bigint in MS SQL terms). However, changing Int32 to Int64 on 1.6 billion rows might seem like a small step for a human, but it's a big step for such a database. Even if I minimize logging by switching the recovery model to simple and drop all indexes on this table, the operation fails because there's not enough disk space. I'm using two 1000 GB hard drives in a RAID array, so the total usable space is 1000 GB, and the database with all indexes is 315 GB. For some reason, even the roughly 700 GB of free space is not enough for this operation.

Since there are still roughly 500 million Int32 values theoretically available, I'm thinking my best bet at this point is to reorganize the identity column values: renumber them so they begin at 1 and defragment them so there are no holes in the used range. But I haven't found a way to do this. Does anyone know if it's possible?

Thanks very much for any input, Michal
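A minimal sketch of checking how much Int32 headroom is left, assuming a hypothetical table dbo.BigTable (sys.identity_columns and DBCC CHECKIDENT are standard SQL Server features):

    -- compare the current identity value against the int ceiling of 2,147,483,647
    SELECT name AS column_name, seed_value, increment_value, last_value
    FROM sys.identity_columns
    WHERE object_id = OBJECT_ID('dbo.BigTable');

    -- also reports the current identity value without changing it
    DBCC CHECKIDENT ('dbo.BigTable', NORESEED);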

Mycroft Holmes (#7):

So did Gerald's ideas help?

michal kreslik (#8):

Hello guys, thanks very much for your suggestions!

Update: I have selected the 1.6 billion rows from the old table into a new table without any problems (it took a while, though), casting to bigint on the way. The growth of the log file was nothing compared to my attempts to change Int32 to Int64 directly in the old table.

I scripted the constraints from the old table and I'm now adding them to the new table (that will take a while too, I guess, since the constraints are re-checked after being added). After that I'll note which indexes were on the old table, drop the old table, rename the new one, and build the indexes on it. I'll update you on how it goes.

Obviously, I have a full backup of the last state of the database (file copy of the DB + LOG). Thanks again for your help so far! Michal
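For reference, the copy described above could look like this minimal sketch (dbo.BigTable and its payload column are stand-ins for the real schema). SELECT ... INTO is minimally logged under the simple recovery model, which is why the log growth stayed modest; note that because the key goes through a CAST, the new column does not inherit the IDENTITY property:

    -- widen the key to bigint while copying into a brand-new table
    SELECT CAST(id AS bigint) AS id,
           payload
    INTO dbo.BigTable_new
    FROM dbo.BigTable;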

Mycroft Holmes wrote:

PIEBALDconsult wrote: "I'll stick with GUIDs."

What a revolting idea. I got stuck with a GUID project in the late 90s, the worst project I ever worked on, especially in the development phase, when the numbers are useful: I can remember single or even double digit numbers, but I have no chance with a 36-character string! I use bigint on transaction tables and int on static tables and have never had a problem. We move multiple 100k+ batches into and out of transaction tables daily, and bigint works nicely, thank you.

PIEBALDconsult (#9):

Mycroft Holmes wrote:

especially in the development phase

And after a few months or years? Perhaps you could mock it up: have your own Guid class that returns

g = new System.Guid("00000000-0000-0000-0000-000000000001");
g = new System.Guid("00000000-0000-0000-0000-000000000002");

and so on during development. P.S. Maybe I should get right on that.

Mycroft Holmes (#10):

After a few more years it's the support team's problem :rolleyes: . I know that's not a reasonable answer!

PIEBALDconsult wrote:

And, perhaps, you could mock it up; have your own Guid class that returns:

And the difference between

Select * from SomeTable where ID = 1

and

Select * from SomeTable where ID = '00000000-0000-0000-0000-000000000001'

is that in the second instance I HAVE to cut and paste, or type 29 keystrokes plus 4 shifts (oh yeah, and the quotes), while I can remember 1. Nope, GUIDs are a developer's nightmare; if I knew beforehand that a contract used GUIDs for IDs, I would refuse the contract (I have not been desperate for work for a looong time). IMHO the only valid application for GUIDs is a distributed application where the data is to be merged, and even then I would opt for a location ID or only put the GUIDs on the transaction table!

PIEBALDconsult (#11):

            Well, I've used them for a few projects now (by choice) and I find them superior in ways that matter to me.

Mycroft Holmes (#12):

PIEBALDconsult wrote:

I find them superior in ways that matter to me

Would you care to elaborate? I'm very interested to hear how they are of benefit beyond the distributed requirement I'm aware of.

michal kreslik (#13):

Update: I'm back at the beginning. The attempt to set the bigint column in the new table to IDENTITY makes SQL Server grow the LOG until it fills the whole disk, at which point everything stops. Why should this seemingly simple operation be so dramatically demanding? I'll now repeat the whole process, but this time I'll create the new table with the IDENTITY column already set up, and only then insert into it from the original table. I'll appreciate any suggestions. Thanks, Michal
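The likely culprit: SQL Server cannot add the IDENTITY property to an existing column with a plain ALTER TABLE, so tools that offer it (the SSMS table designer, for one) rebuild the entire table into a copy behind the scenes, and that rebuild is what fills the log. Declaring the property up front avoids the rebuild. A minimal sketch, with a hypothetical column list:

    -- the real column list would mirror the old table
    CREATE TABLE dbo.BigTable_new
    (
        id      bigint IDENTITY(1, 1) NOT NULL,
        payload varchar(100) NOT NULL
    );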

Mycroft Holmes (#14):

Don't forget to turn IDENTITY_INSERT on.
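That is, wrap the copy like this sketch (table and column names are assumptions carried over from above):

    SET IDENTITY_INSERT dbo.BigTable_new ON;

    -- an explicit column list is required while IDENTITY_INSERT is ON
    INSERT INTO dbo.BigTable_new (id, payload)
    SELECT id, payload
    FROM dbo.BigTable;

    SET IDENTITY_INSERT dbo.BigTable_new OFF;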

michal kreslik (#15):

Yes, I know, thanks; I've done this with the same DB before. Let's see. I'll update you. Thanks very much, Michal

michal kreslik (#16), replying to Mycroft Holmes (#12):

I'm using GUIDs as keys in another database where multiple servers need to write to the same table, so that is exactly the distributed model you were talking about. But even there it's not ideal; just yesterday I was thinking about changing the structure.

In another table of the same database I'm storing rows that carry this GUID as a FK, many rows per GUID, and I need to group the rows by that GUID, so the table's clustered index is based on the GUID column. The problem is obviously performance: since new GUIDs don't arrive in sequential order (they're generated randomly), each insert with a new GUID lands in the middle of the GUID-clustered table and forces page splits in the index. That wouldn't happen with sequential autogenerated values like int. Michal
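For what it's worth, SQL Server can generate sequential GUIDs server-side with NEWSEQUENTIALID(), which keeps new rows appending at the end of the clustered index instead of splitting pages; it is only allowed as a column DEFAULT, and the sequence can restart at a lower range after a reboot. A sketch with a hypothetical table:

    CREATE TABLE dbo.Events
    (
        event_id uniqueidentifier NOT NULL
            DEFAULT NEWSEQUENTIALID()
            PRIMARY KEY CLUSTERED,  -- inserts stay roughly append-only
        payload  varchar(100) NOT NULL
    );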

Mycroft Holmes (#17):

If you opt for the identity-type column, the cost is a composite primary key on the merged table (ID & Server), and I'd think that may have a higher cost than the GUID. As PIEBALD suggested, a custom GUID based on an identity field and the server may be a good idea!
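A sketch of that composite key (all names hypothetical): each server assigns its own sequential id, and the pair identifies the row after the merge:

    CREATE TABLE dbo.Transactions
    (
        id        bigint   NOT NULL,  -- per-server sequential value
        server_id smallint NOT NULL,  -- which server generated the row
        amount    money    NOT NULL,
        CONSTRAINT PK_Transactions PRIMARY KEY CLUSTERED (id, server_id)
    );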

michal kreslik (#18):

Actually, I'm thinking it might be a good idea to also set up all the indexes (most importantly the clustered one) on the blank new table before doing the bulk INSERT INTO. That way SQL Server builds the indexes as it copies the data, which I think is potentially less costly in time and space than having SQL Server physically reorder the data into the clustered index after the INSERT INTO has completed. Let's see :)
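Concretely, that means putting the clustered key on the empty target first, as in this sketch (names assumed from above). One caveat: an INSERT ... SELECT into an indexed table is fully logged under most conditions, so this step alone won't keep the log small:

    -- cluster the empty target so rows land in key order during the copy
    ALTER TABLE dbo.BigTable_new
        ADD CONSTRAINT PK_BigTable_new PRIMARY KEY CLUSTERED (id);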

michal kreslik (#19):

Update: even this way, the LOG file grows until it fills the whole 1000 GB disk, which is ridiculous given that the DB itself is only about 110 GB without indexes and 315 GB with all of them. In the new table I'm only using 2 of the 6 original indexes; as always when one changes something, I found out I'm no longer actually using the remaining 4, so I dropped them in the new table :)

I think the problem is that although the recovery model is set to simple, the log keeps track of all the changes until the statement has finished and all data has been fully written to disk. So the solution now is to do the INSERT INTO in batches and make sure the log gets truncated after each batch completes. I wonder how much disk space the single-statement version would eventually have needed. It would be useful if SQL Server provided a way to run these bulk operations in some kind of "unsafe" mode with no logging at all, but unless I'm missing something, it's not possible to turn logging off entirely in MS SQL Server. I'll update you :)
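Two standard commands help here: under the simple recovery model the log truncates at checkpoints, but only past the oldest active transaction, so a single huge INSERT keeps the entire log "active" until it commits.

    DBCC SQLPERF (LOGSPACE);  -- log file size and percent used, per database
    CHECKPOINT;               -- under simple recovery, lets inactive log space be reused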

michal kreslik (#20):

Final update: all done. The correct way to do this was:

1. Create a new table with the same columns, changing the Int32 column to Int64 (bigint).
2. Script out all keys and indexes from the old table and create them on the new table.
3. Make sure the DB is in the simple recovery model.
4. INSERT the rows from the old table into the new table in batches, truncating the LOG after each batch. Since I was also inserting the identity column values, I set IDENTITY_INSERT to ON before the batch loop. A sketch follows below.

That's it :) Thanks for your help, guys. Michal
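A sketch of the batch loop from step 4, reusing the hypothetical dbo.BigTable / dbo.BigTable_new names from above and assuming a clustered key on id. Each batch commits on its own, so the CHECKPOINT lets the log space be reused between batches under simple recovery:

    SET IDENTITY_INSERT dbo.BigTable_new ON;

    DECLARE @last  bigint = 0;        -- highest id copied so far
    DECLARE @batch int    = 1000000;  -- batch size is a tunable assumption
    DECLARE @rows  int    = 1;

    WHILE @rows > 0
    BEGIN
        INSERT INTO dbo.BigTable_new (id, payload)
        SELECT TOP (@batch) id, payload
        FROM dbo.BigTable
        WHERE id > @last
        ORDER BY id;

        SET @rows = @@ROWCOUNT;
        IF @rows > 0
            SELECT @last = MAX(id) FROM dbo.BigTable_new;

        CHECKPOINT;  -- allow the log to truncate before the next batch
    END

    SET IDENTITY_INSERT dbo.BigTable_new OFF;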
