Code Project
Where to do the processing

Tags: csharp, database, sql-server, linq, com
16 Posts 8 Posters 0 Views 1 Watching
  • Mycroft Holmes:

    We have a requirement to do some serious record-by-record processing, and I have had suggestions to do the processing in C# rather than use a TDE or cursor. Personally I prefer T-SQL, but I suspect that is because it is what I am used to. What has been suggested is that we use Dapper [^] and loops or LINQ to process the records, then update back to the database. I would prefer using bulk copy to write the records back, but that requires a DataTable, and converting from an ORM List<> to a DataTable for multi-million-record sets may be detrimental to performance. Both processes would run on the same server, or at least on servers of the same spec. Any opinions on which would be the faster method of processing? [edit] link recovery [/edit]

    Never underestimate the power of human stupidity RAH
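For concreteness, the C# path being weighed here, processing into a List<> and bulk-copying back, would look roughly like this. This is a minimal sketch only: the `Transaction` record, column names, table name, and connection string are illustrative placeholders, not anything from the actual design.

```csharp
using System.Collections.Generic;
using System.Data;
using Microsoft.Data.SqlClient;

public record Transaction(long Id, decimal Amount, decimal Result);

public static class BulkWriter
{
    // Convert the processed List<T> to a DataTable -- this is the
    // conversion cost the post worries about for multi-million-row sets.
    public static DataTable ToDataTable(IEnumerable<Transaction> rows)
    {
        var table = new DataTable();
        table.Columns.Add("Id", typeof(long));
        table.Columns.Add("Amount", typeof(decimal));
        table.Columns.Add("Result", typeof(decimal));
        foreach (var r in rows)
            table.Rows.Add(r.Id, r.Amount, r.Result);
        return table;
    }

    public static void BulkInsert(string connectionString, DataTable table)
    {
        using var bulk = new SqlBulkCopy(connectionString)
        {
            DestinationTableName = "dbo.ProcessedTransactions",
            BatchSize = 50_000   // commit in batches rather than one huge transaction
        };
        bulk.WriteToServer(table);
    }
}
```

Worth noting: `SqlBulkCopy.WriteToServer` also has overloads that accept an `IDataReader`, so the List-to-DataTable copy can be avoided entirely by streaming the list through a reader implementation (for example, FastMember's `ObjectReader`).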

    Kornfeld Eliyahu Peter (#3):

    It depends heavily on the nature of the process... 1. Are computations involved? 2. Is string processing involved? 3. Is it an in-place update, or do the results go to a new table? 4. What volume of data are we talking about?

    I'm not questioning your powers of observation; I'm merely remarking upon the paradox of asking a masked man who he is. (V)

    "It never ceases to amaze me that a spacecraft launched in 1977 can be fixed remotely from Earth." ― Brian Cox

    • Mycroft Holmes (#4), replying to Kornfeld Eliyahu Peter:

      1. Yes, per record, dependent on the previous transaction (think compound interest on steroids). 2. No, all decimal; some reference string values external to the transaction. 3. Indeterminate; this will depend on the design we take on. My preference will be for insert, as it is faster than update. 4. 4+ million records each pass. I expect the passes to be broken down into smaller chunks, but 1m is not unfeasible.


    • Lost User:

        Mycroft Holmes wrote:

        do some serious record by record processing

        Will you be writing to that record (manipulating it), or will the result be written/aggregated elsewhere?

        Mycroft Holmes wrote:

        for multi million recordsets

        Does it have to process the original data, or could you work from a copy?

        Mycroft Holmes wrote:

        Both processes would run on the same server or at least of the same spec.

        "Could" run - if you were to cut that table in five, you could use five clients to do the processing. That need not be dedicated clients; if you're on a network, then there are a few computers on there that are probably idle at some point. Easiest way to detect that is by writing a screensaver. If you go this route, please write an article and hand us the resulting code :laugh: --edit Y'er link is broken.

        Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^]

        Mycroft Holmes (#5), replying to Lost User:

        We will need to extract the records (probably up to 1m), process each record 36 to 120 times, and write the information back to the DB. The original data is already a copy, partially aggregated as much as possible. I'm really not interested in getting into a processing farm; another team failed to do this, and I'll stick to stored procs if that is a requirement. A commercial package takes approximately 4 hours to process, and we don't need their level of complexity, so it should be doable.

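If the C# route were explored, one way to keep memory bounded while chunking a 1m-record pass, as described above, is to stream rows instead of materializing them all. A sketch using Dapper's unbuffered query mode; the generic type, SQL text, and chunk size are assumptions for illustration, not the team's design.

```csharp
using System.Collections.Generic;
using Dapper;
using Microsoft.Data.SqlClient;

public static class ChunkedProcessor
{
    // Stream rows rather than loading 1m+ at once: buffered: false makes
    // Dapper yield rows as they arrive from the server, and chunks are
    // handed to the caller for processing and bulk write-back.
    public static IEnumerable<IReadOnlyList<T>> ReadInChunks<T>(
        string connectionString, string sql, int chunkSize)
    {
        using var conn = new SqlConnection(connectionString);
        var chunk = new List<T>(chunkSize);
        foreach (var row in conn.Query<T>(sql, buffered: false))
        {
            chunk.Add(row);
            if (chunk.Count == chunkSize)
            {
                yield return chunk;
                chunk = new List<T>(chunkSize);
            }
        }
        if (chunk.Count > 0)
            yield return chunk;   // final partial chunk
    }
}
```

Each yielded chunk can then be processed and written back (e.g. via bulk copy) before the next chunk is pulled, so peak memory stays near one chunk rather than the full record set.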

          • Kornfeld Eliyahu Peter (#6), replying to Mycroft Holmes:

          IMHO, do it in SQL. Do the running computations via the SELECT itself and put the result in a temporary table, then use it as you wish... Passing 4+ million records over the network can be very painful, whereas a well-configured SQL Server with a properly indexed table can handle 4+ million records with no problem... Try to turn the dependencies into parameters (pre-calculated, maybe in the previous run, or fixed from some external source) if you can, so even string manipulations will not slow you down... In the optimal case you may turn it into a SQL job and leave it alone for ages...


          • Mycroft Holmes (#7), replying to Kornfeld Eliyahu Peter:

            Kornfeld Eliyahu Peter wrote:

            do it in SQL

            That is my default option, but I don't want to eliminate what may be a better solution just because I am comfortable with T-SQL. Trust me, the various transaction and working tables will be indexed within an inch of their lives; that's one reason I want to use inserts instead of updates.


            • Kornfeld Eliyahu Peter (#8), replying to Mycroft Holmes:

              I can't see any reason not to do it in SQL; since you have nothing that is beyond the capabilities of SQL, why add networking (even in-server data transfer will add to it)? Can you explain why you doubted SQL in the first place?


              • Tim Carmichael (#9), replying to Kornfeld Eliyahu Peter:

                I don't think it's a matter of doubting SQL, but rather of examining all possibilities. I'd do the same: examine all possibilities, if only to provide justification when asked why I picked what I did. Tim

                • Mycroft Holmes (#10), replying to Kornfeld Eliyahu Peter:

                  Tim is right; in my original post I made it clear I was investigating options put forward by one of my senior devs. Ignoring the option would be a disservice to that dev.


                  • GuyThiebaut (#11), replying to Mycroft Holmes:

                    I would say that, performance-wise, a database engine is going to be faster than shifting all that data over the network to the client, processing it, and sending it back again. With the proviso that you set up the indexes correctly, I would say SQL will be the fastest method of processing the data from a computing point of view (in terms of the human side, GUI, etc., that is something only you would know about). Go with SQL, as that is what you are most comfortable with, and correctly created indexes can help things fly. I would avoid cursors and use temporary tables (don't forget to add indexes to the temporary tables too), doing the processing in steps; my experience is that this is the fastest way of processing large quantities of data.

                    “That which can be asserted without evidence, can be dismissed without evidence.”

                    ― Christopher Hitchens
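The staged temp-table approach described above can be sketched in T-SQL roughly as follows. All table, column, and index names here are illustrative placeholders, and the running-total window stands in for the actual per-record computation, which the thread does not specify.

```sql
-- Step 1: set-based computation straight into a temp table
-- (running total per account as a stand-in for the real calculation).
SELECT t.TransactionId,
       t.AccountId,
       t.Amount,
       SUM(t.Amount) OVER (PARTITION BY t.AccountId
                           ORDER BY t.TransactionDate
                           ROWS UNBOUNDED PRECEDING) AS RunningTotal
INTO   #Staged
FROM   dbo.Transactions AS t;

-- Step 2: index the temp table before the next stage reads it,
-- per the advice above about indexing temp tables too.
CREATE CLUSTERED INDEX IX_Staged ON #Staged (TransactionId);

-- Step 3: INSERT the final results (rather than UPDATE in place).
INSERT INTO dbo.ProcessedTransactions (TransactionId, Amount, RunningTotal)
SELECT TransactionId, Amount, RunningTotal
FROM   #Staged;
```

Each stage stays set-based, so no cursor is needed, and the intermediate result is indexed before the next pass reads it.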

                      • Mycroft Holmes (#12), replying to GuyThiebaut:

                      We are well experienced with T-SQL and the importance of indexing correctly; it's just that one of the senior devs suggested a C# solution, so I thought I'd get some other opinions. I have a rule of thumb: table vars for small reference-type info, temp tables with indexing for serious volume, and cursors only under duress.


                        • GuyThiebaut (#13), replying to Mycroft Holmes:

                        Sounds like you know exactly what you need to use :thumbsup: I understand checking with others when a senior dev suggests something ;)


                        • Jorgen Andersson (#14), replying to Mycroft Holmes:

                          There's a third option: you don't need to choose between C# and processing in the database. You can do both using a CLR SQL Server User-Defined Function[^]. That should appeal to both you and that senior developer. :) Note that I've never done it, so I can't say how much fuss there is to it, but I know that MS designed it with performance in mind.

                          Wrong is evil and must be defeated. - Jeff Ello[^]
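For reference, the general shape of the CLR route mentioned above is a static C# method decorated with `SqlFunction`, compiled into an assembly and registered in SQL Server. This is a minimal sketch only; the class name and the compounding formula are illustrative placeholders, not anything from the thread.

```csharp
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public static class InterestFunctions
{
    // Scalar CLR UDF: compiled .NET code invoked per row from T-SQL,
    // avoiding the round trip of pulling rows out to a client app.
    // The compounding formula here is purely illustrative.
    [SqlFunction(IsDeterministic = true, IsPrecise = false)]
    public static SqlDecimal CompoundStep(SqlDecimal balance, SqlDecimal rate)
    {
        if (balance.IsNull || rate.IsNull)
            return SqlDecimal.Null;
        return balance * (new SqlDecimal(1) + rate);
    }
}
```

The compiled assembly is then registered with `CREATE ASSEMBLY` and exposed via `CREATE FUNCTION ... EXTERNAL NAME`, after which it is callable inline from T-SQL like any scalar function.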

                            • jschell (#15), replying to Mycroft Holmes:

                            Per the other post: "process each record 36 to 120 times". Given that, it seems likely the processing isn't going to be simple, which suggests the T-SQL is going to be rather CPU-intensive. So what is the expectation of other work the database needs to do at the same time this runs, now and in the future? And what is the growth rate of the record set? Does this occur every day? Moving records out of and into the system is a concern, but given the processing numbers above, it is something I would consider. A separate application allows processing to be moved off-box (easier, at least in my experience).

                              • Nicholas Swandel (#16), replying to Mycroft Holmes:

                              I have done this on both sides, and the DB side is much, much faster. I have a big data-modeling app (35 million transactions across 20 tables) with DB procedures, and a big ETL app that has to exist and run on the Microsoft client side. The DB is much, much faster.
