
MS SQL: how to reorganize identity column values?

michal kreslik (original question):

Hello, in one of my tables I'm using Int32 as the data type for the identity column. The value is autogenerated, and it has now reached the maximum for Int32, although the table holds "only" 1.6 billion rows: some rows were deleted along the way, and for various reasons I created the column with an identity seed of 400 million. So no new rows can be added to the table, because the autogenerated identity value is hitting the Int32 ceiling.

One option would obviously be to change the identity column to another data type, say Int64 (bigint in MS SQL terms). However, changing Int32 to Int64 on 1.6 billion rows might seem like a small step for a human, but it's a big step for such a database. Even if I minimize logging by switching the recovery model to simple and drop all indexes on this table, the operation fails because there's not enough disk space. I'm using two 1000 GB hard drives in a RAID array, so the total usable space is 1000 GB, and the database with all indexes is 315 GB. For some reason, even the roughly 700 GB of free space is not enough for this operation.

Since there are still roughly 500 million Int32 values theoretically available, I'm thinking my best bet at this point is to reorganize the identity column values: renumber them so they begin at 1 and defragment them so there are no holes in the used range. But I haven't found a way to do this. Does anyone know if it's possible?

Thanks very much for any input, Michal
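A minimal sketch of checking how much Int32 headroom is left, assuming a hypothetical table dbo.BigTable (sys.identity_columns and DBCC CHECKIDENT are standard SQL Server features):

    -- compare the current identity value against the int ceiling of 2,147,483,647
    SELECT name AS column_name, seed_value, increment_value, last_value
    FROM sys.identity_columns
    WHERE object_id = OBJECT_ID('dbo.BigTable');

    -- also reports the current identity value without changing it
    DBCC CHECKIDENT ('dbo.BigTable', NORESEED);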

Mycroft Holmes (#7):

So did Gerald's ideas help?

michal kreslik (#8):

Hello guys, thanks very much for your suggestions!

Update: I have selected the 1.6 billion rows from the old table into a new table without any problems (it took a while, though), casting to bigint on the way. The growth of the log file was nothing compared to my attempts to change Int32 to Int64 directly in the old table.

I scripted the constraints from the old table and I'm now adding them to the new table (that will take a while too, I guess, since the constraints are re-checked after being added). After that I'll note which indexes were on the old table, drop the old table, rename the new one, and build the indexes on it. I'll update you on how it goes.

Obviously, I have a full backup of the last state of the database (file copy of the DB + LOG). Thanks again for your help so far! Michal
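For reference, the copy described above could look like this minimal sketch (dbo.BigTable and its payload column are stand-ins for the real schema). SELECT ... INTO is minimally logged under the simple recovery model, which is why the log growth stayed modest; note that because the key goes through a CAST, the new column does not inherit the IDENTITY property:

    -- widen the key to bigint while copying into a brand-new table
    SELECT CAST(id AS bigint) AS id,
           payload
    INTO dbo.BigTable_new
    FROM dbo.BigTable;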

Mycroft Holmes wrote:

PIEBALDconsult wrote: "I'll stick with GUIDs."

What a revolting idea. I got stuck with a GUID project in the late 90s, the worst project I ever worked on, especially in the development phase, when the numbers are useful: I can remember single or even double digit numbers, but I have no chance with a 36-character string! I use bigint on transaction tables and int on static tables and have never had a problem. We move multiple 100k+ batches into and out of transaction tables daily, and bigint works nicely, thank you.

PIEBALDconsult (#9):

Mycroft Holmes wrote:

especially in the development phase

And after a few months or years? Perhaps you could mock it up: have your own Guid class that returns

g = new System.Guid("00000000-0000-0000-0000-000000000001");
g = new System.Guid("00000000-0000-0000-0000-000000000002");

and so on during development. P.S. Maybe I should get right on that.

Mycroft Holmes (#10):

After a few more years it's the support team's problem :rolleyes: . I know that's not a reasonable answer!

PIEBALDconsult wrote:

And, perhaps, you could mock it up; have your own Guid class that returns:

And the difference between

Select * from SomeTable where ID = 1

and

Select * from SomeTable where ID = '00000000-0000-0000-0000-000000000001'

is that in the second instance I HAVE to cut and paste, or type 29 keystrokes plus 4 shifts (oh yeah, and the quotes), while I can remember 1. Nope, GUIDs are a developer's nightmare; if I knew beforehand that a contract used GUIDs for IDs, I would refuse the contract (I have not been desperate for work for a looong time). IMHO the only valid application for GUIDs is a distributed application where the data is to be merged, and even then I would opt for a location ID or only put the GUIDs on the transaction table!

PIEBALDconsult (#11):

            Well, I've used them for a few projects now (by choice) and I find them superior in ways that matter to me.

Mycroft Holmes (#12):

PIEBALDconsult wrote:

I find them superior in ways that matter to me

Would you care to elaborate? I'm very interested to hear how they are of benefit beyond the distributed requirement I'm aware of.

michal kreslik (#13):

Update: I'm back at the beginning. The attempt to set the bigint column in the new table to IDENTITY makes SQL Server grow the LOG until it fills the whole disk, at which point everything stops. Why should this seemingly simple operation be so dramatically demanding? I'll now repeat the whole process, but this time I'll create the new table with the IDENTITY column already set up, and only then insert into it from the original table. I'll appreciate any suggestions. Thanks, Michal
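The likely culprit: SQL Server cannot add the IDENTITY property to an existing column with a plain ALTER TABLE, so tools that offer it (the SSMS table designer, for one) rebuild the entire table into a copy behind the scenes, and that rebuild is what fills the log. Declaring the property up front avoids the rebuild. A minimal sketch, with a hypothetical column list:

    -- the real column list would mirror the old table
    CREATE TABLE dbo.BigTable_new
    (
        id      bigint IDENTITY(1, 1) NOT NULL,
        payload varchar(100) NOT NULL
    );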

Mycroft Holmes (#14):

Don't forget to turn IDENTITY_INSERT on.
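That is, wrap the copy like this sketch (table and column names are assumptions carried over from above):

    SET IDENTITY_INSERT dbo.BigTable_new ON;

    -- an explicit column list is required while IDENTITY_INSERT is ON
    INSERT INTO dbo.BigTable_new (id, payload)
    SELECT id, payload
    FROM dbo.BigTable;

    SET IDENTITY_INSERT dbo.BigTable_new OFF;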

michal kreslik (#15):

Yes, I know, thanks; I've done this with the same DB before. Let's see. I'll update you. Thanks very much, Michal

michal kreslik (#16), replying to Mycroft Holmes (#12):

I'm using GUIDs as keys in another database where multiple servers need to write to the same table, so that is exactly the distributed model you were talking about. But even there it's not ideal; just yesterday I was thinking about changing the structure.

In another table of the same database I'm storing rows that carry this GUID as a FK, many rows per GUID, and I need to group the rows by that GUID, so the table's clustered index is based on the GUID column. The problem is obviously performance: since new GUIDs don't arrive in sequential order (they're generated randomly), each insert with a new GUID lands in the middle of the GUID-clustered table and forces page splits in the index. That wouldn't happen with sequential autogenerated values like int. Michal
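For what it's worth, SQL Server can generate sequential GUIDs server-side with NEWSEQUENTIALID(), which keeps new rows appending at the end of the clustered index instead of splitting pages; it is only allowed as a column DEFAULT, and the sequence can restart at a lower range after a reboot. A sketch with a hypothetical table:

    CREATE TABLE dbo.Events
    (
        event_id uniqueidentifier NOT NULL
            DEFAULT NEWSEQUENTIALID()
            PRIMARY KEY CLUSTERED,  -- inserts stay roughly append-only
        payload  varchar(100) NOT NULL
    );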

Mycroft Holmes (#17):

If you opt for the identity-type column, the cost is a composite primary key on the merged table (ID & Server), and I'd think that may have a higher cost than the GUID. As PIEBALD suggested, a custom GUID based on an identity field and the server may be a good idea!
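A sketch of that composite key (all names hypothetical): each server assigns its own sequential id, and the pair identifies the row after the merge:

    CREATE TABLE dbo.Transactions
    (
        id        bigint   NOT NULL,  -- per-server sequential value
        server_id smallint NOT NULL,  -- which server generated the row
        amount    money    NOT NULL,
        CONSTRAINT PK_Transactions PRIMARY KEY CLUSTERED (id, server_id)
    );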

michal kreslik (#18):

Actually, I'm thinking it might be a good idea to also set up all the indexes (most importantly the clustered one) on the blank new table before doing the bulk INSERT INTO. That way SQL Server builds the indexes as it copies the data, which I think is potentially less costly in time and space than having SQL Server physically reorder the data into the clustered index after the INSERT INTO has completed. Let's see :)
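Concretely, that means putting the clustered key on the empty target first, as in this sketch (names assumed from above). One caveat: an INSERT ... SELECT into an indexed table is fully logged under most conditions, so this step alone won't keep the log small:

    -- cluster the empty target so rows land in key order during the copy
    ALTER TABLE dbo.BigTable_new
        ADD CONSTRAINT PK_BigTable_new PRIMARY KEY CLUSTERED (id);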

michal kreslik (#19):

Update: even this way, the LOG file grows until it fills the whole 1000 GB disk, which is ridiculous given that the DB itself is only about 110 GB without indexes and 315 GB with all of them. In the new table I'm only using 2 of the 6 original indexes; as always when one changes something, I found out I'm no longer actually using the remaining 4, so I dropped them in the new table :)

I think the problem is that although the recovery model is set to simple, the log keeps track of all the changes until the statement has finished and all data has been fully written to disk. So the solution now is to do the INSERT INTO in batches and make sure the log gets truncated after each batch completes. I wonder how much disk space the single-statement version would eventually have needed. It would be useful if SQL Server provided a way to run these bulk operations in some kind of "unsafe" mode with no logging at all, but unless I'm missing something, it's not possible to turn logging off entirely in MS SQL Server. I'll update you :)
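Two standard commands help here: under the simple recovery model the log truncates at checkpoints, but only past the oldest active transaction, so a single huge INSERT keeps the entire log "active" until it commits.

    DBCC SQLPERF (LOGSPACE);  -- log file size and percent used, per database
    CHECKPOINT;               -- under simple recovery, lets inactive log space be reused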

michal kreslik (#20):

Final update: all done. The correct way to do this was:

1. Create a new table with the same columns, changing the Int32 column to Int64 (bigint).
2. Script out all keys and indexes from the old table and create them on the new table.
3. Make sure the DB is in the simple recovery model.
4. INSERT the rows from the old table into the new table in batches, truncating the LOG after each batch. Since I was also inserting the identity column values, I set IDENTITY_INSERT to ON before the batch loop. A sketch follows below.

That's it :) Thanks for your help, guys. Michal
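A sketch of the batch loop from step 4, reusing the hypothetical dbo.BigTable / dbo.BigTable_new names from above and assuming a clustered key on id. Each batch commits on its own, so the CHECKPOINT lets the log space be reused between batches under simple recovery:

    SET IDENTITY_INSERT dbo.BigTable_new ON;

    DECLARE @last  bigint = 0;        -- highest id copied so far
    DECLARE @batch int    = 1000000;  -- batch size is a tunable assumption
    DECLARE @rows  int    = 1;

    WHILE @rows > 0
    BEGIN
        INSERT INTO dbo.BigTable_new (id, payload)
        SELECT TOP (@batch) id, payload
        FROM dbo.BigTable
        WHERE id > @last
        ORDER BY id;

        SET @rows = @@ROWCOUNT;
        IF @rows > 0
            SELECT @last = MAX(id) FROM dbo.BigTable_new;

        CHECKPOINT;  -- allow the log to truncate before the next batch
    END

    SET IDENTITY_INSERT dbo.BigTable_new OFF;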
