Code Project
Where to do the processing

Tags: csharp, database, sql-server, linq, com
16 Posts 8 Posters 0 Views 1 Watching
  • Mycroft Holmes:

    We have a requirement to do some serious record-by-record processing, and I have had suggestions to do the processing in C# rather than use a TDE or cursor. Personally I prefer T-SQL, but I suspect that is because it is what I am used to. What has been suggested is that we use Dapper [^] and loops or LINQ to process the records, then update back to the database. I would prefer using bulk copy to write the records back, but that requires a DataTable, and converting from an ORM List<> to a DataTable for multi-million-record sets may be detrimental to performance. Both processes would run on the same server, or at least on servers of the same spec. Any opinions on which would be the faster method of processing? [edit] link recovery [/edit]

    Never underestimate the power of human stupidity RAH
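For concreteness, the C# path being weighed here, processing into a List<> and bulk-copying back, would look roughly like this. This is a minimal sketch only: the `Transaction` record, column names, table name, and connection string are illustrative placeholders, not anything from the actual design.

```csharp
using System.Collections.Generic;
using System.Data;
using Microsoft.Data.SqlClient;

public record Transaction(long Id, decimal Amount, decimal Result);

public static class BulkWriter
{
    // Convert the processed List<T> to a DataTable -- this is the
    // conversion cost the post worries about for multi-million-row sets.
    public static DataTable ToDataTable(IEnumerable<Transaction> rows)
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(long));
        table.Columns.Add("Amount", typeof(decimal));
        table.Columns.Add("Result", typeof(decimal));
        foreach (var r in rows)
            table.Rows.Add(r.Id, r.Amount, r.Result);
        return table;
    }

    public static void BulkInsert(string connectionString, DataTable table)
    {
        using var bulk = new SqlBulkCopy(connectionString)
        {
            DestinationTableName = "dbo.ProcessedTransactions",
            BatchSize = 50_000   // commit in batches rather than one huge transaction
        };
        bulk.WriteToServer(table);
    }
}
```

Worth noting: `SqlBulkCopy.WriteToServer` also has overloads that accept an `IDataReader`, so the List-to-DataTable copy can be avoided entirely by streaming the list through a reader implementation (for example, FastMember's `ObjectReader`).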

    Kornfeld Eliyahu Peter (#3):

    It depends heavily on the nature of the process... 1. Are computations involved? 2. Is string processing involved? 3. Is it an in-place update, or do the results go to a new table? 4. What volume of data are we talking about?

    I'm not questioning your powers of observation; I'm merely remarking upon the paradox of asking a masked man who he is. (V)

    "It never ceases to amaze me that a spacecraft launched in 1977 can be fixed remotely from Earth." ― Brian Cox

    • Mycroft Holmes (#4), replying to Kornfeld Eliyahu Peter:

      1. Yes, per record, dependent on the previous transaction (think compound interest on steroids). 2. No, all decimal; some reference string values external to the transaction. 3. Indeterminate; this will depend on the design we take on. My preference will be for insert, as it is faster than update. 4. 4+ million records each pass. I expect the passes to be broken down into smaller chunks, but 1m is not unfeasible.


    • Lost User:

        Mycroft Holmes wrote:

        do some serious record by record processing

        Will you be writing to that record (manipulating it), or will the result be written/aggregated elsewhere?

        Mycroft Holmes wrote:

        for multi million recordsets

        Does it have to process the original data, or could you work from a copy?

        Mycroft Holmes wrote:

        Both processes would run on the same server or at least of the same spec.

        "Could" run - if you were to cut that table in five, you could use five clients to do the processing. That need not be dedicated clients; if you're on a network, then there are a few computers on there that are probably idle at some point. Easiest way to detect that is by writing a screensaver. If you go this route, please write an article and hand us the resulting code :laugh: --edit Y'er link is broken.

        Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^]

        Mycroft Holmes (#5), replying to Lost User:

        We will need to extract the records (probably up to 1m), process each record 36 to 120 times, and write the information back to the DB. The original data is already a copy, partially aggregated as much as possible. I'm really not interested in getting into a processing farm; another team failed to do this, and I'll stick to stored procs if that is a requirement. A commercial package takes approximately 4 hours to process, and we don't need their level of complexity, so it should be doable.

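If the C# route were explored, one way to keep memory bounded while chunking a 1m-record pass, as described above, is to stream rows instead of materializing them all. A sketch using Dapper's unbuffered query mode; the generic type, SQL text, and chunk size are assumptions for illustration, not the team's design.

```csharp
using System.Collections.Generic;
using Dapper;
using Microsoft.Data.SqlClient;

public static class ChunkedProcessor
{
    // Stream rows rather than loading 1m+ at once: buffered: false makes
    // Dapper yield rows as they arrive from the server, and chunks are
    // handed to the caller for processing and bulk write-back.
    public static IEnumerable<IReadOnlyList<T>> ReadInChunks<T>(
        string connectionString, string sql, int chunkSize)
    {
        using var conn = new SqlConnection(connectionString);
        var chunk = new List<T>(chunkSize);
        foreach (var row in conn.Query<T>(sql, buffered: false))
        {
            chunk.Add(row);
            if (chunk.Count == chunkSize)
            {
                yield return chunk;
                chunk = new List<T>(chunkSize);
            }
        }
        if (chunk.Count > 0)
            yield return chunk;   // final partial chunk
    }
}
```

Each yielded chunk can then be processed and written back (e.g. via bulk copy) before the next chunk is pulled, so peak memory stays near one chunk rather than the full record set.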

          • Kornfeld Eliyahu Peter (#6), replying to Mycroft Holmes:

          IMHO, do it in SQL. Do the running computations via the SELECT itself and put the result in a temporary table, then use it as you wish... Passing 4+ million records over the network can be very painful, whereas a well-configured SQL Server with a properly indexed table can handle 4+ million records with no problem... Try to turn the dependencies into parameters (pre-calculated, maybe in the previous run, or fixed from some external source) if you can, so even string manipulations will not slow you down... In the optimal case you may turn it into a SQL job and leave it alone for ages...


          • Mycroft Holmes (#7), replying to Kornfeld Eliyahu Peter:

            Kornfeld Eliyahu Peter wrote:

            do it in SQL

            That is my default option, but I don't want to eliminate what may be a better solution just because I am comfortable with T-SQL. Trust me, the various transaction and working tables will be indexed within an inch of their lives; that's one reason I want to use inserts instead of updates.


            • Kornfeld Eliyahu Peter (#8), replying to Mycroft Holmes:

              I can't see any reason not to do it in SQL; since you have nothing that is beyond the capabilities of SQL, why add networking (even in-server data transfer will add to it)? Can you explain why you doubted SQL in the first place?


              • Tim Carmichael (#9), replying to Kornfeld Eliyahu Peter:

                I don't think it's a matter of doubting SQL, but rather of examining all possibilities. I'd do the same: examine all possibilities, if only to provide justification when asked why I picked what I did. Tim

                • Mycroft Holmes (#10), replying to Kornfeld Eliyahu Peter:

                  Tim is right; in my original post I made it clear I was investigating options put forward by one of my senior devs. Ignoring the option would be a disservice to that dev.


                  • GuyThiebaut (#11), replying to Mycroft Holmes:

                    I would say that, performance-wise, a database engine is going to be faster than shifting all that data over the network to the client, processing it, and sending it back again. With the proviso that you set up the indexes correctly, I would say SQL will be the fastest method of processing the data from a computing point of view (in terms of the human side, GUI, etc., that is something only you would know about). Go with SQL, as that is what you are most comfortable with, and correctly created indexes can help things fly. I would avoid cursors and use temporary tables (don't forget to add indexes to the temporary tables too), doing the processing in steps; my experience is that this is the fastest way of processing large quantities of data.

                    “That which can be asserted without evidence, can be dismissed without evidence.”

                    ― Christopher Hitchens
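The staged temp-table approach described above can be sketched in T-SQL roughly as follows. All table, column, and index names here are illustrative placeholders, and the running-total window stands in for the actual per-record computation, which the thread does not specify.

```sql
-- Step 1: set-based computation straight into a temp table
-- (running total per account as a stand-in for the real calculation).
SELECT t.TransactionId,
       t.AccountId,
       t.Amount,
       SUM(t.Amount) OVER (PARTITION BY t.AccountId
                           ORDER BY t.TransactionDate
                           ROWS UNBOUNDED PRECEDING) AS RunningTotal
INTO   #Staged
FROM   dbo.Transactions AS t;

-- Step 2: index the temp table before the next stage reads it,
-- per the advice above about indexing temp tables too.
CREATE CLUSTERED INDEX IX_Staged ON #Staged (TransactionId);

-- Step 3: INSERT the final results (rather than UPDATE in place).
INSERT INTO dbo.ProcessedTransactions (TransactionId, Amount, RunningTotal)
SELECT TransactionId, Amount, RunningTotal
FROM   #Staged;
```

Each stage stays set-based, so no cursor is needed, and the intermediate result is indexed before the next pass reads it.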

                      • Mycroft Holmes (#12), replying to GuyThiebaut:

                      We are well experienced with T-SQL and the importance of indexing correctly; it's just that one of the senior devs suggested a C# solution, so I thought I'd get some other opinions. I have a rule of thumb: table vars for small reference-type info, temp tables with indexing for serious volume, and cursors only under duress.


                        • GuyThiebaut (#13), replying to Mycroft Holmes:

                        Sounds like you know exactly what you need to use :thumbsup: I understand checking with others when a senior dev suggests something ;)


                        • Jorgen Andersson (#14), replying to Mycroft Holmes:

                          There's a third option: you don't need to choose between C# and processing in the database. You can do both using a CLR SQL Server User-Defined Function[^]. That should appeal to both you and that senior developer. :) Note that I've never done it, so I can't say how much fuss there is to it, but I know that MS designed it with performance in mind.

                          Wrong is evil and must be defeated. - Jeff Ello[^]
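For reference, the general shape of the CLR route mentioned above is a static C# method decorated with `SqlFunction`, compiled into an assembly and registered in SQL Server. This is a minimal sketch only; the class name and the compounding formula are illustrative placeholders, not anything from the thread.

```csharp
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public static class InterestFunctions
{
    // Scalar CLR UDF: compiled .NET code invoked per row from T-SQL,
    // avoiding the round trip of pulling rows out to a client app.
    // The compounding formula here is purely illustrative.
    [SqlFunction(IsDeterministic = true, IsPrecise = false)]
    public static SqlDecimal CompoundStep(SqlDecimal balance, SqlDecimal rate)
    {
        if (balance.IsNull || rate.IsNull)
            return SqlDecimal.Null;
        return balance * (new SqlDecimal(1) + rate);
    }
}
```

The compiled assembly is then registered with `CREATE ASSEMBLY` and exposed via `CREATE FUNCTION ... EXTERNAL NAME`, after which it is callable inline from T-SQL like any scalar function.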

                            • jschell (#15), replying to Mycroft Holmes:

                            Per the other post: "process each record 36 to 120 times". Given that, it seems likely the processing isn't going to be simple, which suggests the T-SQL is going to be rather CPU-intensive. So what is the expectation of other work the database needs to do at the same time this runs, now and in the future? And what is the growth rate of the record set? Does this occur every day? Moving records out of and into the system is a concern, but given the processing numbers above, it is something I would consider. A separate application allows processing to be moved off-box (easier, at least in my experience).

                              • Nicholas Swandel (#16), replying to Mycroft Holmes:

                              I have done this on both sides, and the DB side is much, much faster. I have a big data-modeling app (35 million transactions across 20 tables) with DB procedures, and a big ETL app that has to exist and run on the Microsoft client side. The DB is much, much faster.
