Skip to content
  • Categories
  • Recent
  • Tags
  • Popular
  • World
  • Users
  • Groups
Skins
  • Light
  • Cerulean
  • Cosmo
  • Flatly
  • Journal
  • Litera
  • Lumen
  • Lux
  • Materia
  • Minty
  • Morph
  • Pulse
  • Sandstone
  • Simplex
  • Sketchy
  • Spacelab
  • United
  • Yeti
  • Zephyr
  • Dark
  • Cyborg
  • Darkly
  • Quartz
  • Slate
  • Solar
  • Superhero
  • Vapor

  • Default (No Skin)
  • No Skin
Collapse
Code Project
  1. Home
  2. Database & SysAdmin
  3. Database
  4. Best way to batchprocess a large update

Best way to batchprocess a large update

Scheduled Pinned Locked Moved Database
htmlcomhelptutorialquestion
18 Posts 4 Posters 22 Views 1 Watching
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • J Jorgen Andersson

    So I need to regularly update a table with data from another table. The problem is that if I update the normal way I get a table lock on the target table for half an hour, which is frowned upon by the users. So I need to run the update in batches. The other problem is that the ID sequence is having gaps in it. Larger gaps than the batch size. At the moment I have this solution:

    DECLARE
    @LastID int = 0,
    @NextID int,
    @RC int = 1;

    WHILE (@RC > 0)
    BEGIN
    SELECT TOP 5000
    @NextID = s.id
    FROM Source s
    WHERE s.id> @LastID
    ORDER BY s.id
    ;
    UPDATE t
    SET ------
    FROM Source s
    JOIN Target t ON t.id = s.id
    WHERE s.id > @LastID
    AND s.id <= @NextID
    ;
    SET @RC = @@ROWCOUNT;
    SET @LastID = @NextID ;
    END

    Which works just fine, but using two selects is getting under my skin. Any better suggestions for how to do it?

    Wrong is evil and must be defeated. - Jeff Ello

    Richard DeemingR Offline
    Richard DeemingR Offline
    Richard Deeming
    wrote on last edited by
    #2

    How about something like:

    DROP TABLE IF EXISTS #ProcessedIDs;
    CREATE TABLE #ProcessedIDs (id int NOT NULL Primary Key);

    DECLARE @RC int = 5000;

    WHILE @RC = 5000
    BEGIN
    UPDATE TOP (5000)
    T
    SET
    ...
    OUTPUT
    inserted.id INTO #ProcessedIDs
    FROM
    Target As T
    INNER JOIN Source As S
    ON S.id = T.id
    WHERE
    Not Exists
    (
    SELECT 1
    FROM #ProcessedIDs As P
    WHERE P.id = T.id
    )
    ;

    SET @RC = @@ROWCOUNT;
    

    END;

    DROP TABLE IF EXISTS #ProcessedIDs;

    NB: The DROP TABLE IF EXISTS syntax is new in SQL Server 2016. If you're using an earlier version, you'll need to use an alternative syntax[^]. The OUTPUT clause should work in SQL Sever 2005 or later.


    "These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer

    "These people looked deep within my soul and assigned me a number based on the order in which I joined" - Homer

    J 2 Replies Last reply
    0
    • Richard DeemingR Richard Deeming

      How about something like:

      DROP TABLE IF EXISTS #ProcessedIDs;
      CREATE TABLE #ProcessedIDs (id int NOT NULL Primary Key);

      DECLARE @RC int = 5000;

      WHILE @RC = 5000
      BEGIN
      UPDATE TOP (5000)
      T
      SET
      ...
      OUTPUT
      inserted.id INTO #ProcessedIDs
      FROM
      Target As T
      INNER JOIN Source As S
      ON S.id = T.id
      WHERE
      Not Exists
      (
      SELECT 1
      FROM #ProcessedIDs As P
      WHERE P.id = T.id
      )
      ;

      SET @RC = @@ROWCOUNT;
      

      END;

      DROP TABLE IF EXISTS #ProcessedIDs;

      NB: The DROP TABLE IF EXISTS syntax is new in SQL Server 2016. If you're using an earlier version, you'll need to use an alternative syntax[^]. The OUTPUT clause should work in SQL Sever 2005 or later.


      "These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer

      J Offline
      J Offline
      Jorgen Andersson
      wrote on last edited by
      #3

      Ah, yes! I always tend to forget the output clause. Thanks!

      Wrong is evil and must be defeated. - Jeff Ello

      1 Reply Last reply
      0
      • Richard DeemingR Richard Deeming

        How about something like:

        DROP TABLE IF EXISTS #ProcessedIDs;
        CREATE TABLE #ProcessedIDs (id int NOT NULL Primary Key);

        DECLARE @RC int = 5000;

        WHILE @RC = 5000
        BEGIN
        UPDATE TOP (5000)
        T
        SET
        ...
        OUTPUT
        inserted.id INTO #ProcessedIDs
        FROM
        Target As T
        INNER JOIN Source As S
        ON S.id = T.id
        WHERE
        Not Exists
        (
        SELECT 1
        FROM #ProcessedIDs As P
        WHERE P.id = T.id
        )
        ;

        SET @RC = @@ROWCOUNT;
        

        END;

        DROP TABLE IF EXISTS #ProcessedIDs;

        NB: The DROP TABLE IF EXISTS syntax is new in SQL Server 2016. If you're using an earlier version, you'll need to use an alternative syntax[^]. The OUTPUT clause should work in SQL Sever 2005 or later.


        "These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer

        J Offline
        J Offline
        Jorgen Andersson
        wrote on last edited by
        #4

        Using a WHERE NOT EXISTS turned out to be very slow because of the antijoin using index seeks for every row. I changed it to WHERE ID > (SELECT ISNULL(max(ID),0) FROM @ProcessedIDs) which allows an index scan. This is magnitudes faster than the original nonbatched update. The question is how to use this with composite keys?

        Wrong is evil and must be defeated. - Jeff Ello

        Richard DeemingR 1 Reply Last reply
        0
        • J Jorgen Andersson

          So I need to regularly update a table with data from another table. The problem is that if I update the normal way I get a table lock on the target table for half an hour, which is frowned upon by the users. So I need to run the update in batches. The other problem is that the ID sequence is having gaps in it. Larger gaps than the batch size. At the moment I have this solution:

          DECLARE
          @LastID int = 0,
          @NextID int,
          @RC int = 1;

          WHILE (@RC > 0)
          BEGIN
          SELECT TOP 5000
          @NextID = s.id
          FROM Source s
          WHERE s.id> @LastID
          ORDER BY s.id
          ;
          UPDATE t
          SET ------
          FROM Source s
          JOIN Target t ON t.id = s.id
          WHERE s.id > @LastID
          AND s.id <= @NextID
          ;
          SET @RC = @@ROWCOUNT;
          SET @LastID = @NextID ;
          END

          Which works just fine, but using two selects is getting under my skin. Any better suggestions for how to do it?

          Wrong is evil and must be defeated. - Jeff Ello

          L Offline
          L Offline
          Lost User
          wrote on last edited by
          #5

          Jörgen Andersson wrote:

          The other problem is that the ID sequence is having gaps in it.

          That doesn't change the functionality, and since it should not be visible to the outside world it should not be a problem.

          Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^] "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

          J 1 Reply Last reply
          0
          • J Jorgen Andersson

            Using a WHERE NOT EXISTS turned out to be very slow because of the antijoin using index seeks for every row. I changed it to WHERE ID > (SELECT ISNULL(max(ID),0) FROM @ProcessedIDs) which allows an index scan. This is magnitudes faster than the original nonbatched update. The question is how to use this with composite keys?

            Wrong is evil and must be defeated. - Jeff Ello

            Richard DeemingR Offline
            Richard DeemingR Offline
            Richard Deeming
            wrote on last edited by
            #6

            When you use the TOP clause with the UPDATE statement, there's no guarantee that the rows to update will be picked in any particular order. Using the MAX(id) option, you could end up missing rows. I notice you've replaced the temporary table with a table variable. Was there a reason for that? IIRC, execution plans for table variables tend to assume they contain a very low number of rows, which might explain the poor performance.


            "These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer

            "These people looked deep within my soul and assigned me a number based on the order in which I joined" - Homer

            J 2 Replies Last reply
            0
            • L Lost User

              Jörgen Andersson wrote:

              The other problem is that the ID sequence is having gaps in it.

              That doesn't change the functionality, and since it should not be visible to the outside world it should not be a problem.

              Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^] "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

              J Offline
              J Offline
              Jorgen Andersson
              wrote on last edited by
              #7

              It was important to mention so that I don't get suggestions like WHERE ID BETWEEN @LastID AND @LastID + 5000

              Wrong is evil and must be defeated. - Jeff Ello

              1 Reply Last reply
              0
              • Richard DeemingR Richard Deeming

                When you use the TOP clause with the UPDATE statement, there's no guarantee that the rows to update will be picked in any particular order. Using the MAX(id) option, you could end up missing rows. I notice you've replaced the temporary table with a table variable. Was there a reason for that? IIRC, execution plans for table variables tend to assume they contain a very low number of rows, which might explain the poor performance.


                "These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer

                J Offline
                J Offline
                Jorgen Andersson
                wrote on last edited by
                #8

                Richard Deeming wrote:

                When you use the TOP clause with the UPDATE statement, there's no guarantee that the rows to update will be picked in any particular order. Using the MAX(id) option, you could end up missing rows.

                I know, and you can't add an order by to an UPDATE or INSERT. But you can put the SELECT with TOP and ORDER BY in a CTE.

                Richard Deeming wrote:

                I notice you've replaced the temporary table with a table variable. Was there a reason for that?

                No particular reason. I like to keep the scope as local as possible, so it's mostly a habit.

                Richard Deeming wrote:

                IIRC, execution plans for table variables tend to assume they contain a very low number of rows, which might explain the poor performance.

                Table variables don't have statistics, which obviously could affect the plan, but since all ID's are unique I don't think it would make a big difference in this case But I will test it. <edit>Oh, and table variables can't go parallell, which obviously can affect performance a lot in this case.</edit>

                Wrong is evil and must be defeated. - Jeff Ello

                1 Reply Last reply
                0
                • Richard DeemingR Richard Deeming

                  When you use the TOP clause with the UPDATE statement, there's no guarantee that the rows to update will be picked in any particular order. Using the MAX(id) option, you could end up missing rows. I notice you've replaced the temporary table with a table variable. Was there a reason for that? IIRC, execution plans for table variables tend to assume they contain a very low number of rows, which might explain the poor performance.


                  "These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer

                  J Offline
                  J Offline
                  Jorgen Andersson
                  wrote on last edited by
                  #9

                  Done some testing now. And as I suspected, there is no difference in either performance or plan as long as the temp table has one column with unique values. Until the query goes parallel that is. Then the difference is quickly getting huge. But as long as I'm batching the query it stays the same until the batch is big enough to go parallel (which happens between 10000 and 20000 rows in this case). But then I will also get a table lock. And oddly enough, it also goes much slower when parallel until reaching 100000 rows per batch. I will do some more testing on this.

                  Wrong is evil and must be defeated. - Jeff Ello

                  1 Reply Last reply
                  0
                  • J Jorgen Andersson

                    So I need to regularly update a table with data from another table. The problem is that if I update the normal way I get a table lock on the target table for half an hour, which is frowned upon by the users. So I need to run the update in batches. The other problem is that the ID sequence is having gaps in it. Larger gaps than the batch size. At the moment I have this solution:

                    DECLARE
                    @LastID int = 0,
                    @NextID int,
                    @RC int = 1;

                    WHILE (@RC > 0)
                    BEGIN
                    SELECT TOP 5000
                    @NextID = s.id
                    FROM Source s
                    WHERE s.id> @LastID
                    ORDER BY s.id
                    ;
                    UPDATE t
                    SET ------
                    FROM Source s
                    JOIN Target t ON t.id = s.id
                    WHERE s.id > @LastID
                    AND s.id <= @NextID
                    ;
                    SET @RC = @@ROWCOUNT;
                    SET @LastID = @NextID ;
                    END

                    Which works just fine, but using two selects is getting under my skin. Any better suggestions for how to do it?

                    Wrong is evil and must be defeated. - Jeff Ello

                    realJSOPR Offline
                    realJSOPR Offline
                    realJSOP
                    wrote on last edited by
                    #10

                    Add WITH(NOLOCK) to your selects and joins:

                    DECLARE
                    @LastID int = 0,
                    @NextID int,
                    @RC int = 1;

                    WHILE (@RC > 0)
                    BEGIN
                    SELECT TOP 5000
                    @NextID = s.id
                    FROM Source s WITH(NOLOCK)
                    WHERE s.id> @LastID
                    ORDER BY s.id
                    ;
                    UPDATE t
                    SET ------
                    FROM Source s
                    JOIN Target t WITH(NOLOCK) ON t.id = s.id
                    WHERE s.id > @LastID
                    AND s.id <= @NextID
                    ;
                    SET @RC = @@ROWCOUNT;
                    SET @LastID = @NextID ;
                    END

                    ".45 ACP - because shooting twice is just silly" - JSOP, 2010
                    -----
                    You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010
                    -----
                    When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013

                    Richard DeemingR 1 Reply Last reply
                    0
                    • realJSOPR realJSOP

                      Add WITH(NOLOCK) to your selects and joins:

                      DECLARE
                      @LastID int = 0,
                      @NextID int,
                      @RC int = 1;

                      WHILE (@RC > 0)
                      BEGIN
                      SELECT TOP 5000
                      @NextID = s.id
                      FROM Source s WITH(NOLOCK)
                      WHERE s.id> @LastID
                      ORDER BY s.id
                      ;
                      UPDATE t
                      SET ------
                      FROM Source s
                      JOIN Target t WITH(NOLOCK) ON t.id = s.id
                      WHERE s.id > @LastID
                      AND s.id <= @NextID
                      ;
                      SET @RC = @@ROWCOUNT;
                      SET @LastID = @NextID ;
                      END

                      ".45 ACP - because shooting twice is just silly" - JSOP, 2010
                      -----
                      You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010
                      -----
                      When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013

                      Richard DeemingR Offline
                      Richard DeemingR Offline
                      Richard Deeming
                      wrote on last edited by
                      #11

                      But only if you understand the risks first. :) Using NOLOCK? Here's How You'll Get the Wrong Query Results. - Brent Ozar Unlimited®[^] Bad habits : Putting NOLOCK everywhere - SQL Sentry[^] Also, it won't work on the target table: Avoid using NOLOCK on SQL Server UPDATE and DELETE statements[^]


                      "These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer

                      "These people looked deep within my soul and assigned me a number based on the order in which I joined" - Homer

                      realJSOPR 1 Reply Last reply
                      0
                      • Richard DeemingR Richard Deeming

                        But only if you understand the risks first. :) Using NOLOCK? Here's How You'll Get the Wrong Query Results. - Brent Ozar Unlimited®[^] Bad habits : Putting NOLOCK everywhere - SQL Sentry[^] Also, it won't work on the target table: Avoid using NOLOCK on SQL Server UPDATE and DELETE statements[^]


                        "These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer

                        realJSOPR Offline
                        realJSOPR Offline
                        realJSOP
                        wrote on last edited by
                        #12

                        I didn't put nolock on the update statement - I put it on the join. You could just create a job that does the monster update at night.

                        ".45 ACP - because shooting twice is just silly" - JSOP, 2010
                        -----
                        You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010
                        -----
                        When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013

                        Richard DeemingR 1 Reply Last reply
                        0
                        • realJSOPR realJSOP

                          I didn't put nolock on the update statement - I put it on the join. You could just create a job that does the monster update at night.

                          ".45 ACP - because shooting twice is just silly" - JSOP, 2010
                          -----
                          You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010
                          -----
                          When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013

                          Richard DeemingR Offline
                          Richard DeemingR Offline
                          Richard Deeming
                          wrote on last edited by
                          #13

                          Quote:

                          UPDATE t
                          ...
                          FROM Source s
                          JOIN Target t WITH(NOLOCK) ON t.id = s.id
                          ...

                          That NOLOCK hint is on the target table. It's exactly the same as the first example from the article I linked to:

                          Avoid using NOLOCK on SQL Server UPDATE and DELETE statements[^]:

                          UPDATE t1
                          SET t1.x = something
                          FROM dbo.t1 WITH (NOLOCK)
                          INNER JOIN ...;


                          "These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer

                          "These people looked deep within my soul and assigned me a number based on the order in which I joined" - Homer

                          realJSOPR 1 Reply Last reply
                          0
                          • J Jorgen Andersson

                            So I need to regularly update a table with data from another table. The problem is that if I update the normal way I get a table lock on the target table for half an hour, which is frowned upon by the users. So I need to run the update in batches. The other problem is that the ID sequence is having gaps in it. Larger gaps than the batch size. At the moment I have this solution:

                            DECLARE
                            @LastID int = 0,
                            @NextID int,
                            @RC int = 1;

                            WHILE (@RC > 0)
                            BEGIN
                            SELECT TOP 5000
                            @NextID = s.id
                            FROM Source s
                            WHERE s.id> @LastID
                            ORDER BY s.id
                            ;
                            UPDATE t
                            SET ------
                            FROM Source s
                            JOIN Target t ON t.id = s.id
                            WHERE s.id > @LastID
                            AND s.id <= @NextID
                            ;
                            SET @RC = @@ROWCOUNT;
                            SET @LastID = @NextID ;
                            END

                            Which works just fine, but using two selects is getting under my skin. Any better suggestions for how to do it?

                            Wrong is evil and must be defeated. - Jeff Ello

                            realJSOPR Offline
                            realJSOPR Offline
                            realJSOP
                            wrote on last edited by
                            #14

                            Have you tried using a MERGE statement?

                            ".45 ACP - because shooting twice is just silly" - JSOP, 2010
                            -----
                            You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010
                            -----
                            When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013

                            J 1 Reply Last reply
                            0
                            • Richard DeemingR Richard Deeming

                              Quote:

                              UPDATE t
                              ...
                              FROM Source s
                              JOIN Target t WITH(NOLOCK) ON t.id = s.id
                              ...

                              That NOLOCK hint is on the target table. It's exactly the same as the first example from the article I linked to:

                              Avoid using NOLOCK on SQL Server UPDATE and DELETE statements[^]:

                              UPDATE t1
                              SET t1.x = something
                              FROM dbo.t1 WITH (NOLOCK)
                              INNER JOIN ...;


                              "These people looked deep within my soul and assigned me a number based on the order in which I joined." - Homer

                              realJSOPR Offline
                              realJSOPR Offline
                              realJSOP
                              wrote on last edited by
                              #15

                              We use WITH(NOLOCK) prolifically. Of course, we have indexes on all of our tables, and don't generally do massive updates in the middle of the work day. We have no issues.

                              ".45 ACP - because shooting twice is just silly" - JSOP, 2010
                              -----
                              You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010
                              -----
                              When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013

                              J 1 Reply Last reply
                              0
                              • realJSOPR realJSOP

                                We use WITH(NOLOCK) prolifically. Of course, we have indexes on all of our tables, and don't generally do massive updates in the middle of the work day. We have no issues.

                                ".45 ACP - because shooting twice is just silly" - JSOP, 2010
                                -----
                                You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010
                                -----
                                When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013

                                J Offline
                                J Offline
                                Jorgen Andersson
                                wrote on last edited by
                                #16

                                NOLOCK can cause nonclustered index corruption, and it's also deprecated[^].

                                Wrong is evil and must be defeated. - Jeff Ello

                                realJSOPR 1 Reply Last reply
                                0
                                • realJSOPR realJSOP

                                  Have you tried using a MERGE statement?

                                  ".45 ACP - because shooting twice is just silly" - JSOP, 2010
                                  -----
                                  You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010
                                  -----
                                  When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013

                                  J Offline
                                  J Offline
                                  Jorgen Andersson
                                  wrote on last edited by
                                  #17

                                  Yes. The update in my OP is greatly simplified. The actual update is a merge with more than 4000 rows.

                                  Wrong is evil and must be defeated. - Jeff Ello

                                  1 Reply Last reply
                                  0
                                  • J Jorgen Andersson

                                    NOLOCK can cause nonclustered index corruption, and it's also deprecated[^].

                                    Wrong is evil and must be defeated. - Jeff Ello

                                    realJSOPR Offline
                                    realJSOPR Offline
                                    realJSOP
                                    wrote on last edited by
                                    #18

                                    That's something for our DBAs to worry about. :)

                                    ".45 ACP - because shooting twice is just silly" - JSOP, 2010
                                    -----
                                    You can never have too much ammo - unless you're swimming, or on fire. - JSOP, 2010
                                    -----
                                    When you pry the gun from my cold dead hands, be careful - the barrel will be very hot. - JSOP, 2013

                                    1 Reply Last reply
                                    0
                                    Reply
                                    • Reply as topic
                                    Log in to reply
                                    • Oldest to Newest
                                    • Newest to Oldest
                                    • Most Votes


                                    • Login

                                    • Don't have an account? Register

                                    • Login or register to search.
                                    • First post
                                      Last post
                                    0
                                    • Categories
                                    • Recent
                                    • Tags
                                    • Popular
                                    • World
                                    • Users
                                    • Groups