large SQL table update performance
-
Overview... Client has an inventory DB in SQL with over 2 million records. The inventory DB is constantly updated by a .NET 3.5 WinForms app that queries another 3rd-party DB via web services; the inventory is essentially in constant update. The client is looking for faster performance, since it's taking days to cycle through the whole table and the number of inventoried items will continue to grow.

I have no control over the structure of the SQL table, as it's tied to other applications, and I have no control over the 3rd-party web service I'm getting the updated info from. I only have control over the WinForms app that communicates with the two. It simply queries the web service for inventory data, does some work with the data, and updates the results into the SQL DB. But it has to run down through the entire inventory table.

I've inherited this app, so most of the work I've done so far has been functional tweaks and code cleanup, but now the client is looking for more speed. Throwing five figures' worth of hardware horsepower at it, plus my "tweaks", isn't giving him the results he was expecting. While his old hardware was inadequate for what he was doing, I explained that throwing muscle at it doesn't really fix slow software.

Sending UPDATE queries one record at a time, whether via the .NET app or even by calling a stored procedure, just seems inefficient when you're dealing with a dataset of this size. And when it takes days to update, it runs 24/7, which causes other issues. My thought is to do the work with the data via a DataTable and then do one big update to SQL. I'm not talking about loading all records; currently the app loops through chunks of 500-1000 records at a time. There has to be something way better than a 1-by-1 record update encapsulated in a big for-loop. Advice, comments, opinions?

I've done similar work with DataTables, but it was for manipulating CSV files and on a much smaller scale (< 1000 rows). It seems logical that the principles would be the same, and faster than calling SQL directly to update one record at a time. Thanks in advance!
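Just for context, the record-by-record pattern I'm describing boils down to something roughly like this (table, column, and method names here are simplified placeholders, not the actual code):

    // current approach: one round trip per record (simplified, hypothetical names)
    using (SqlConnection conn = new SqlConnection(connectionString))
    {
        conn.Open();
        foreach (int itemId in queuedIds)   // the 500-1000 IDs queued for this pass
        {
            ItemInfo info = webService.GetItem(itemId);   // slow 3rd-party call
            using (SqlCommand cmd = new SqlCommand(
                "UPDATE Inventory SET Qty = @qty, Price = @price WHERE ItemId = @id", conn))
            {
                cmd.Parameters.AddWithValue("@qty", info.Qty);
                cmd.Parameters.AddWithValue("@price", info.Price);
                cmd.Parameters.AddWithValue("@id", itemId);
                cmd.ExecuteNonQuery();   // one UPDATE statement per record
            }
        }
    }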
-
A couple of things come to mind... Assuming the communication via the web service is probably the slowest component in the process, can you make your application multi-threaded, or at least run more than one instance of the application?

We had a process that processed transactions from a single table, which caused many records to be updated; the problem was that new transactions were being added to the table faster than we could process them. As a quick and dirty fix for this customer, we created multiple "processor" applications; each instance of the "processor" application would read from the transaction table and handle only those records assigned to it. The workload assignment was broken down by department: Processor-A would handle transactions from Dept-A, etc. This actually worked out for us, and the total throughput increased dramatically. The advantage of this approach was that we didn't have to modify the original "processor" application drastically in order to get it to work.

Might be something you want to consider. :rose: Good luck. David
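In your case, something along these lines might be all the change that's needed to partition the work per instance (the column and config names here are just placeholders for whatever key fits your schema):

    // each instance is launched with its own slice of the work, e.g. a department code
    // (ConfigurationManager lives in System.Configuration)
    string dept = ConfigurationManager.AppSettings["Department"];   // "A", "B", ...

    using (SqlConnection conn = new SqlConnection(connectionString))
    using (SqlCommand cmd = new SqlCommand(
        "SELECT TOP (1000) ItemId FROM Inventory WHERE Department = @dept ORDER BY LastUpdated",
        conn))
    {
        cmd.Parameters.AddWithValue("@dept", dept);
        DataTable batch = new DataTable();
        new SqlDataAdapter(cmd).Fill(batch);   // Fill opens and closes the connection itself
        // ... hand 'batch' to the existing processing loop; no other instance sees these rows ...
    }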
-
The web service is by far the slowest and also the most restrictive component. Right now the client is running between 4 and 8 instances of the app at once, but when the app is only knocking out a combined average of 30,000 records an hour (on a good day), we're still talking days to cycle the whole table running 24/7.

Each instance of the app is set up to queue up 500-1000 records at a time ("queue" as in build an array of IDs that will hit the web service). But then it updates record by record in a for-loop. Grabbing 500 IDs and adding them to an ArrayList alone is silly slow, especially when a DataTable Fill can pull thousands of records in a couple of seconds or less.

Among the "tweaks" I've done for the client, I've created the ability to break down and categorize the workload for each instance of the app. This has helped the client some for specific business needs, but in the end the entire inventory catalog needs to be updated as frequently as possible. Given the web service constraints, this project has a utopian theoretical max of 150,000 records an hour, whether it's with 1 instance or with 20; at a certain threshold the web service will start denying requests (a security measure). I don't ever expect to see 150,000/hr with the current environment, but I'd like to think I could achieve a stable 80,000-100,000/hr if I could keep the app code from looping and hitting SQL for each and every record it has queued up. Too bad .NET doesn't have a BulkUpdate class similar to BulkCopy.. :)
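Edit: the closest built-in thing I'm aware of in .NET 3.5 is SqlDataAdapter.UpdateBatchSize, which at least groups the UPDATEs into batches instead of one round trip per row. A rough, untested sketch with made-up table/column names:

    // select the current chunk into a DataTable, work on it, then push it back in batches
    using (SqlConnection conn = new SqlConnection(connectionString))
    {
        SqlDataAdapter adapter = new SqlDataAdapter(
            "SELECT ItemId, Qty, Price FROM Inventory WHERE ItemId BETWEEN @first AND @last", conn);
        adapter.SelectCommand.Parameters.AddWithValue("@first", firstId);
        adapter.SelectCommand.Parameters.AddWithValue("@last", lastId);

        SqlCommandBuilder builder = new SqlCommandBuilder(adapter);
        DataTable batch = new DataTable();
        adapter.Fill(batch);

        // ... apply the web service results to the rows in 'batch' ...

        adapter.UpdateCommand = builder.GetUpdateCommand();
        adapter.UpdateCommand.UpdatedRowSource = UpdateRowSource.None;  // required for batching
        adapter.UpdateBatchSize = 500;   // >1 groups that many statements per round trip
        adapter.Update(batch);           // only modified rows are sent
    }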
-
OK, seems like you've got a handle on the situation. Regarding the updates and looping: if you have a staging table, #StagingTable, that holds the updates, then you should be able to do a join in your UPDATE statement. Something like this:

    UPDATE Table1
    SET Table1.Field1 = S.Field1
    FROM Table1
    INNER JOIN #StagingTable S ON Table1.Field2 = S.Field2

By doing this, you will certainly reduce the amount of network traffic between the client computer and the DB server. Let the DB server do all the heavy work. Just an idea. :thumbsup:
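From the .NET side, the whole round trip could look something like this (untested sketch; table and column names are made up, and 'batch' is the DataTable you've already done the work on):

    using (SqlConnection conn = new SqlConnection(connectionString))
    {
        conn.Open();

        // 1. create a session-scoped staging table
        new SqlCommand(
            "CREATE TABLE #InventoryStage (ItemId INT PRIMARY KEY, Qty INT, Price MONEY)",
            conn).ExecuteNonQuery();

        // 2. bulk copy the worked batch into it (SqlBulkCopy moves thousands of rows quickly)
        using (SqlBulkCopy bulk = new SqlBulkCopy(conn))
        {
            bulk.DestinationTableName = "#InventoryStage";
            bulk.WriteToServer(batch);
        }

        // 3. one set-based UPDATE joins the staging rows to the real table
        new SqlCommand(
            "UPDATE i SET i.Qty = s.Qty, i.Price = s.Price " +
            "FROM Inventory i INNER JOIN #InventoryStage s ON i.ItemId = s.ItemId",
            conn).ExecuteNonQuery();
    }   // the #temp table goes away when the connection closes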
-
Seems to me there is a fundamental flaw in the process: the web services should be your bottleneck, but for a database to only be updating 30k records an hour seems ludicrous. If the cursor/loop is the problem, then look a little further afield. Can you construct an XML dataset on the UI end and pass in a bulk set of updates? How about doing a bunch of bulk inserts into another table and then using a proc to do the updates to your transaction table? Target tables could be created for each instance and truncated after the update, or you could cycle through 2 target tables per instance.
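The XML route could be as simple as building the fragment client-side and handing it to a proc in one call (sketch only: the proc name and columns are made up, and real code should XML-escape the values):

    StringBuilder sb = new StringBuilder("<Items>");
    foreach (DataRow row in batch.Rows)
    {
        sb.AppendFormat("<Item Id=\"{0}\" Qty=\"{1}\" Price=\"{2}\" />",
                        row["ItemId"], row["Qty"], row["Price"]);
    }
    sb.Append("</Items>");

    using (SqlConnection conn = new SqlConnection(connectionString))
    using (SqlCommand cmd = new SqlCommand("usp_BulkUpdateInventory", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@ItemXml", SqlDbType.Xml).Value = sb.ToString();
        conn.Open();
        cmd.ExecuteNonQuery();   // the proc shreds @ItemXml with .nodes() and does one joined UPDATE
    }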
Never underestimate the power of human stupidity RAH