SQL vs Code
-
Rama Krishna Vavilala wrote:
No my experience with previous projects has been the same code is usually lot slower than SPs. Which makes sense as no data needs to be transfered over wire or via IPC.
I've made huge performance improvements by changing queries to just stream raw table data to an app instead of doing any aggregation in SQL server. As soon as the memory required for the aggregation on sql server overflowed the available server memory, paging dragged performance to a standstill with hundreds of other queries backed up behind it. Streaming several 5-12Gigs of data from SQL required much less memory and we could use more efficient structures in C# so it was thousands of times faster.
I can imagine the sinking feeling one would have after ordering my book, only to find a laughably ridiculous theory with demented logic once the book arrives - Mark McCutcheon
Seconded.
Need custom software developed? I do custom programming based primarily on MS tools with an emphasis on C# development and consulting. A man said to the universe: "Sir I exist!" "However," replied the universe, "The fact has not created in me A sense of obligation." --Stephen Crane
-
Rama Krishna Vavilala wrote:
No my experience with previous projects has been the same code is usually lot slower than SPs. Which makes sense as no data needs to be transfered over wire or via IPC.
I've made huge performance improvements by changing queries to just stream raw table data to an app instead of doing any aggregation in SQL server. As soon as the memory required for the aggregation on sql server overflowed the available server memory, paging dragged performance to a standstill with hundreds of other queries backed up behind it. Streaming several 5-12Gigs of data from SQL required much less memory and we could use more efficient structures in C# so it was thousands of times faster.
I can imagine the sinking feeling one would have after ordering my book, only to find a laughably ridiculous theory with demented logic once the book arrives - Mark McCutcheon
Rather than remain silent and appear intelligent, I'll ask a question... Can't you use C# in SQL Server now? Wouldn't those C# optimizations work within the db itself? I haven't used any .NET in the db, so I'm interested to see what the answer is.
-
I haven't seen the code nor do I know the problem domain, but the kind of numbers you posted (6 hours vs. 15 minutes) indicate that something else is going on other than comparing two optimum approaches. I used to think transferring data over the wire is a big deal, but it really isn't. Try the following exercise for fun one day:
DateTime startDate = DateTime.Now;
...
sqlCommand.CommandText = "SELECT * FROM myTable";
...
reader = sqlCommand.ExecuteReader();
while(reader.read()){
...//Don't really have to do anything for a test
}
TimeSpan result = DateTime.Now - startDate;6 million records shouldn't take more than a few minutes at worse.
Need custom software developed? I do custom programming based primarily on MS tools with an emphasis on C# development and consulting. A man said to the universe: "Sir I exist!" "However," replied the universe, "The fact has not created in me A sense of obligation." --Stephen Crane
Well it is not just streaming but processing too. When you include processing it is streaming back and forth which effects the overall performance.
-
Your dev/test environment has a dataset as massively huge as the production one? ON the one hand I'm impressed that you've gone so far towards realistic testing, OTOH having a dataset small enough to check results with in under a minute is very useful too.
3x12=36 2x12=24 1x12=12 0x12=18
Dan Neely wrote:
with in under a minute is very useful too.
Yes. All I need to do us to enter different date ranges.
-
I haven't seen the code nor do I know the problem domain, but the kind of numbers you posted (6 hours vs. 15 minutes) indicate that something else is going on other than comparing two optimum approaches. I used to think transferring data over the wire is a big deal, but it really isn't. Try the following exercise for fun one day:
DateTime startDate = DateTime.Now;
...
sqlCommand.CommandText = "SELECT * FROM myTable";
...
reader = sqlCommand.ExecuteReader();
while(reader.read()){
...//Don't really have to do anything for a test
}
TimeSpan result = DateTime.Now - startDate;6 million records shouldn't take more than a few minutes at worse.
Need custom software developed? I do custom programming based primarily on MS tools with an emphasis on C# development and consulting. A man said to the universe: "Sir I exist!" "However," replied the universe, "The fact has not created in me A sense of obligation." --Stephen Crane
On a good connection perhaps...but what if they're across the state and have to go through several firewalls in the process. Keeping a connection open for each user for 'a few minutes' would be horrible. Plus, a few minutes is pretty hard for a user to accept. I mean, if your computer took a few minutes to bring up your homepage. Would you consider a new computer or at least a new homepage? :D
-
On a good connection perhaps...but what if they're across the state and have to go through several firewalls in the process. Keeping a connection open for each user for 'a few minutes' would be horrible. Plus, a few minutes is pretty hard for a user to accept. I mean, if your computer took a few minutes to bring up your homepage. Would you consider a new computer or at least a new homepage? :D
Whomever the user is they are already accustomed to a 30 minute wait. Heaven forbid I suggest a 2 minute wait is more usable.
Need custom software developed? I do custom programming based primarily on MS tools with an emphasis on C# development and consulting. A man said to the universe: "Sir I exist!" "However," replied the universe, "The fact has not created in me A sense of obligation." --Stephen Crane
-
Seconded.
Need custom software developed? I do custom programming based primarily on MS tools with an emphasis on C# development and consulting. A man said to the universe: "Sir I exist!" "However," replied the universe, "The fact has not created in me A sense of obligation." --Stephen Crane
Seconded the second time:~ If and almost only if it's on a local network/ethernet. Or an very fast internet connection(8+ MBytes/Sec). On my ex work place I managed to make a Core2Duo at ~2 Ghz with 2Gigs RAM outperform a Server with 20+ processing power dedicated for the task not shared for other apps. But for that I did lots of optimizations on the DataLayer. No automated ORM tools In the worst case a BinarySerach and most of times O(1) with Dictionaries and the like, instead of complex joins and calculations on the server. But I knew that the lowest of PC would be a Core2 at aprox 2GHz with 2Gigs of RAM. :)
I bug
-
Whomever the user is they are already accustomed to a 30 minute wait. Heaven forbid I suggest a 2 minute wait is more usable.
Need custom software developed? I do custom programming based primarily on MS tools with an emphasis on C# development and consulting. A man said to the universe: "Sir I exist!" "However," replied the universe, "The fact has not created in me A sense of obligation." --Stephen Crane
Correct, with respect to the situation at hand. I was responding to the general statement about the 2 minutes being fine. Sorry, I'm having flashbacks from my reporting days. ;)
-
Rather than remain silent and appear intelligent, I'll ask a question... Can't you use C# in SQL Server now? Wouldn't those C# optimizations work within the db itself? I haven't used any .NET in the db, so I'm interested to see what the answer is.
I have not heard of any experienced developer here who is actually using the CLR in SQL Server, I can't see adding another layer of abstraction over TSQL as being useful. INMHO anyone who uses a database to store data should learn the databases language (I hate LINQ as well).
Never underestimate the power of human stupidity RAH
-
Move over to Oracle. The amount of stuff I've moved into the database because it fits in better there - it's wonderful, no longer having to retrieve items to perform complex processing only to put the output back into the database.
"WPF has many lovers. It's a veritable porn star!" - Josh Smith
As Braveheart once said, "You can take our freedom but you'll never take our Hobnobs!" - Martin Hughes.
Pete O'Hanlon wrote:
retrieve items to perform complex processing
But... why, surely TSQL (or the PL flavour) can do basically anything you can do in code. I have yet to run across any requirement that I have had to go out to c# to meet. And this week we have to implement cubic spline into our procs!
Never underestimate the power of human stupidity RAH
-
Rather than remain silent and appear intelligent, I'll ask a question... Can't you use C# in SQL Server now? Wouldn't those C# optimizations work within the db itself? I haven't used any .NET in the db, so I'm interested to see what the answer is.
It's a good solution too. The problem however is that SQLCLR is disabled by default and many database admins are still reluctant to get that enabled.
-
Whomever the user is they are already accustomed to a 30 minute wait. Heaven forbid I suggest a 2 minute wait is more usable.
Need custom software developed? I do custom programming based primarily on MS tools with an emphasis on C# development and consulting. A man said to the universe: "Sir I exist!" "However," replied the universe, "The fact has not created in me A sense of obligation." --Stephen Crane
Ennis Ray Lynch, Jr. wrote:
Whomever the user is they are already accustomed to a 30 minute wait
It's a batch process and runs automatically in my case. No user will ever accept waiting for 30 minutes :)
-
I just made a tough decision by opting for a complex SQL (with recursive CTEs) in favor of complex logic in code. I hope it does not bite me later.:~
Well, my general rule is, "if it can be done by the database, it should be done by the database". In the past I have written the RDB equivalent of extended stored procedures to avoid bringing the data out needlessly. (I got chastised for it, "the metadata will bog down the server!" :rolleyes: )
-
Rather than remain silent and appear intelligent, I'll ask a question... Can't you use C# in SQL Server now? Wouldn't those C# optimizations work within the db itself? I haven't used any .NET in the db, so I'm interested to see what the answer is.
If that had allowed us to scale out it would have been an option. However, we were so resource constrained on the database servers that it probably would not have scaled better. The application was a web log processing application which processed anywhere between 10 and billions of records a day. We calculated rolling aggregates over the past week for various attributes to make predictions, and had another process to apply future data backwards to detect additional fraud. The original design used things like materialized views to generate the aggregates, however that was not able to scale with the hardware constraints that we had. The next design calculated lated loaded aggregates with a set of indexes on the data table. However inserts became too slow. It is possible that CLR aggregation would be able to help, but I was flabbergasted to find that opening up multiple threads and pulling chunks of raw data straight out the the main table while building aggregates on the fly with .net dictionaries was tremendously fast. It allowed us to load up a week of data in under a minute hitting 70% utilization of a gigabit network, and then processing at 3-5,000 transactions a second. For some of the larger clients we had a special build which was able to hit 25,000 to 30,000 with 10Gigs of data in memory. Not bad for a little C# console app. In fact, the biggest bottleneck was normalizing cookie data.
I can imagine the sinking feeling one would have after ordering my book, only to find a laughably ridiculous theory with demented logic once the book arrives - Mark McCutcheon
-
Ennis Ray Lynch, Jr. wrote:
Whomever the user is they are already accustomed to a 30 minute wait
It's a batch process and runs automatically in my case. No user will ever accept waiting for 30 minutes :)
We find LINQ and CLR to be extremely interesting innovations that enable flexible options -- in combination with stored procedures/SQL -- to solve more problems fast. SQL is mature, LINQ and CLR are just getting started, so give them a chance, and a try...
-
Pete O'Hanlon wrote:
retrieve items to perform complex processing
But... why, surely TSQL (or the PL flavour) can do basically anything you can do in code. I have yet to run across any requirement that I have had to go out to c# to meet. And this week we have to implement cubic spline into our procs!
Never underestimate the power of human stupidity RAH
We do a lot of spatial work, which we previously had to delegate out to MapInfo to process; now we do it directly in Oracle Spatial and it works a treat.
"WPF has many lovers. It's a veritable porn star!" - Josh Smith
As Braveheart once said, "You can take our freedom but you'll never take our Hobnobs!" - Martin Hughes.
-
Rama Krishna Vavilala wrote:
No my experience with previous projects has been the same code is usually lot slower than SPs. Which makes sense as no data needs to be transfered over wire or via IPC.
I've made huge performance improvements by changing queries to just stream raw table data to an app instead of doing any aggregation in SQL server. As soon as the memory required for the aggregation on sql server overflowed the available server memory, paging dragged performance to a standstill with hundreds of other queries backed up behind it. Streaming several 5-12Gigs of data from SQL required much less memory and we could use more efficient structures in C# so it was thousands of times faster.
I can imagine the sinking feeling one would have after ordering my book, only to find a laughably ridiculous theory with demented logic once the book arrives - Mark McCutcheon
This question can not be answered without knowing the details of the implementation. These are general remarks to consider : * In a performance perspective : SQL works a lot faster on processing dataset than individual row computation (in general). But as Andy said, there is some caveat : in one of my project, i do all computation by the database (around 10 millions rows), it was working fine, but slowly when we reach 60 millions, the total duration grows exponentially. Because the resources needed to run the procedure starves all the server ressources (meory, paging, ...). I think there is a balance in term of the computed dataset size where sql is better than code. You've just to find it :-) * In a design perspective : In my opinion, moving business logic (computation, calculation) inside the database should be avoid (if possible). I usally used database as data storage and few basic queries (basic agreggation). For complex data manipulation with business logic, it should be done in the BAL (Business Access Layer). Conclusion : - You have to determine what is best for you : architecture design versus performance design - You have to determine what is best approach : whole dataset processing, chunk of data processing, or streaming the process
-
We do a lot of spatial work, which we previously had to delegate out to MapInfo to process; now we do it directly in Oracle Spatial and it works a treat.
"WPF has many lovers. It's a veritable porn star!" - Josh Smith
As Braveheart once said, "You can take our freedom but you'll never take our Hobnobs!" - Martin Hughes.
Pete O'Hanlon wrote:
We do a lot of spatial work
90% of my work is CRUD financial calcs so SQL deals with that nicely. Tell me does oravle have the equivalent of the HierarchyID that SQL Server uses. We have approx 4tb of data to work on and I have opted for Oracle rather than SQL Server as the databse.
Never underestimate the power of human stupidity RAH
-
I just made a tough decision by opting for a complex SQL (with recursive CTEs) in favor of complex logic in code. I hope it does not bite me later.:~
I'm with you on that. I've always favored pure SQL for performance. In a project I worked on I advised the project leader of the possible performance pitfalls of using LINQtoSQL. Guess what? System got slow as hell and a lot of effort had to be put in in order to improve performance. The argument of productivity fell on this case. I beleive that the other case might be true when no complex operations happens with data from the database, the productivity might be good without affecting performance much.
-
This question can not be answered without knowing the details of the implementation. These are general remarks to consider : * In a performance perspective : SQL works a lot faster on processing dataset than individual row computation (in general). But as Andy said, there is some caveat : in one of my project, i do all computation by the database (around 10 millions rows), it was working fine, but slowly when we reach 60 millions, the total duration grows exponentially. Because the resources needed to run the procedure starves all the server ressources (meory, paging, ...). I think there is a balance in term of the computed dataset size where sql is better than code. You've just to find it :-) * In a design perspective : In my opinion, moving business logic (computation, calculation) inside the database should be avoid (if possible). I usally used database as data storage and few basic queries (basic agreggation). For complex data manipulation with business logic, it should be done in the BAL (Business Access Layer). Conclusion : - You have to determine what is best for you : architecture design versus performance design - You have to determine what is best approach : whole dataset processing, chunk of data processing, or streaming the process
Dark Yak wrote:
In my opinion, moving business logic (computation, calculation) inside the database should be avoid (if possible). I usally used database as data storage and few basic queries (basic agreggation). For complex data manipulation with business logic, it should be done in the BAL (Business Access Layer).
As far as scaling into large datasets, there are other solutions like greenplum, cubes, hadoop, etc. that might provide better benefit than just writing custom code. So when it comes down to it, there is no clear cut answer to the question. What works in one context can easily be a horrible solution in another. There are so many choices now, it's hard to pick the right one.
I can imagine the sinking feeling one would have after ordering my book, only to find a laughably ridiculous theory with demented logic once the book arrives - Mark McCutcheon