SQL Advice needed [modified]

Forum: Database · Tags: database, question, sysadmin, hardware, help · 14 posts, 8 posters
StevenWalsh (#1):

This isn't a programming question... so that's why I'm here :) I'm working on a system that handles stock market data (a personal project at home). It's collecting about 7 million records a week, and it's really starting to slow down. I've done all the SQL optimizations I can think of (indexing and whatnot), but I think I'm hitting a hardware roadblock: during the day, while it's downloading new quotes, the CPU is flatlined at 100% all day, and I'm seeing an increasing number of timeout errors when attempting to submit more data to SQL. Currently the database engine is running on one Dell PowerEdge 2550; I have 4 other servers with similar hardware (and 5 more that aren't really servers), so I'm looking for a way to distribute the load among the other servers. I've started to think about clustering, but I believe that's only for fault tolerance (in case the server goes down). Anyone have any other suggestions?

modified on Wednesday, July 23, 2008 6:11 PM


PIEBALDconsult (#2), in reply to StevenWalsh:

Don't bother watching poorly-performing stocks? :-D Roll up and/or archive old data? Only store new records when some threshold difference is reached?

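A minimal sketch of the threshold idea in T-SQL (hypothetical quotes table and procedure names; the 0.5% tolerance is arbitrary):

    -- Hypothetical schema: quotes(symbol, price, quoted_at).
    -- Insert the incoming quote only if no quote is stored yet for the
    -- symbol, or the latest stored price differs by more than 0.5%.
    CREATE PROCEDURE dbo.SaveQuoteIfMoved
        @symbol VARCHAR(10),
        @price  DECIMAL(18,4)
    AS
    INSERT INTO dbo.quotes (symbol, price, quoted_at)
    SELECT @symbol, @price, GETDATE()
    WHERE NOT EXISTS (
        SELECT 1
        FROM dbo.quotes q
        WHERE q.symbol = @symbol
          AND q.quoted_at = (SELECT MAX(quoted_at)
                             FROM dbo.quotes
                             WHERE symbol = @symbol)
          AND ABS(q.price - @price) <= 0.005 * @price);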

Ennis Ray Lynch Jr (#3), in reply to StevenWalsh:

Distributed partitioned views are ideal for this kind of massive-data scenario.

Need a C# Consultant? I'm available.
Happiness in intelligent people is the rarest thing I know. -- Ernest Hemingway

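For reference, a minimal sketch of what a distributed partitioned view can look like (hypothetical table, database, and linked-server names; the CHECK constraints on the partitioning column are what let SQL Server route each query to the right member table):

    -- On this server: one member table holding symbols A through H.
    CREATE TABLE dbo.Quotes_AtoH (
        symbol    VARCHAR(10)   NOT NULL CHECK (symbol >= 'A' AND symbol < 'I'),
        quoted_at DATETIME      NOT NULL,
        price     DECIMAL(18,4) NOT NULL,
        PRIMARY KEY (symbol, quoted_at)
    );
    -- A matching Quotes_ItoZ member table lives on linked server SRV2.

    -- The view glues the member tables into one logical table; the
    -- non-overlapping CHECK constraints route queries to the member
    -- table (and server) that holds the requested symbols.
    CREATE VIEW dbo.AllQuotes AS
        SELECT symbol, quoted_at, price FROM dbo.Quotes_AtoH
        UNION ALL
        SELECT symbol, quoted_at, price FROM SRV2.StockData.dbo.Quotes_ItoZ;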

Pete OHanlon (#4), in reply to StevenWalsh:

With a bit of re-engineering, I'd consider moving the load process out into a message queue and then loading the data into a separate staging database on a separate server. Then, periodically do a bulk insert from that staging table into the main database.

Deja View - the feeling that you've seen this post before.

My blog | My articles

modified on Wednesday, July 23, 2008 3:49 PM

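A rough sketch of the periodic move from staging into the main table (hypothetical table names and linked server SRV2; the message queue and loader in front of the staging database are application code and not shown):

    -- Run on a schedule: move everything that accumulated in staging
    -- into the main table in one set-based statement, then clear staging.
    -- With staging on another server this becomes a distributed
    -- transaction, so MSDTC must be available.
    BEGIN TRANSACTION;

    INSERT INTO dbo.quotes (symbol, price, quoted_at)
    SELECT symbol, price, quoted_at
    FROM SRV2.Staging.dbo.quotes_in;

    DELETE FROM SRV2.Staging.dbo.quotes_in;

    COMMIT TRANSACTION;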

Graham Bradshaw (#5), in reply to StevenWalsh:

StevenWalsh wrote:

it's flatlined at 100% CPU usage all day

What process is using the CPU? What is the physical disk subsystem? IDE drives can be very processor-intensive, especially when writing.

StevenWalsh wrote:

I've done all the SQL optimizations I can think of (indexing and whatnot)

What indices do you have? Adding indices only helps query performance - it slows down inserts, updates, and deletes. Are you in auto-commit mode, with an implicit commit after every insert, or do you perform a number of inserts in a transaction and then commit? Auto-commit hurts performance, since the database engine updates the indices for every insert. As a suggestion, if you are not querying the data during the day, you could consider writing it to a formatted text file and using SQL Server's bulk copy to load it once per day.

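To make the auto-commit point concrete, a sketch of batching inserts under one explicit transaction (hypothetical table and example values; the batch size is arbitrary):

    -- Instead of one implicit transaction per INSERT (auto-commit),
    -- group a batch of rows under a single explicit transaction, so the
    -- log flush and index maintenance are amortized across the batch.
    BEGIN TRANSACTION;
        INSERT INTO dbo.quotes (symbol, price, quoted_at) VALUES ('MSFT', 27.32, GETDATE());
        INSERT INTO dbo.quotes (symbol, price, quoted_at) VALUES ('IBM', 128.86, GETDATE());
        -- ... continue up to roughly 1,000 rows per batch ...
    COMMIT TRANSACTION;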

Daniel Turini (#6), in reply to Graham Bradshaw:

Graham Bradshaw wrote:

IDE drives can be very processor intensive, especially when writing.

I thought that was true only when PIO mode was active - but that was a thing of the '90s, and newer disks use DMA. Am I wrong?

I see dead pixels | Yes, even I am blogging now!


Graham Bradshaw (#7), in reply to Daniel Turini:

It depends on the drive itself, the controller hardware and BIOS, and the driver used by the OS. Put it this way: there's a good reason why high-performance servers use SCSI/SAS.


StevenWalsh (#8), in reply to Graham Bradshaw:

Actually, the servers are using a SCSI controller with four 10k RPM drives. I'd have to check the auto-commit tip; never thought about that before :) I can't do the bulk copies, unfortunately, since I query the data during the day.


Pete OHanlon (#9), in reply to StevenWalsh:

StevenWalsh wrote:

I can't do the bulk copies, unfortunately, since I query the data during the day.

You can if you load the data into a staging database and use a linked server with a cross-database view over it. Then, in the quiet periods, do the bulk insert.

Deja View - the feeling that you've seen this post before.

My blog | My articles

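A sketch of the view Pete describes (hypothetical table names and linked server SRV2): daytime queries go against a view that unions the main table with the staging table, so fresh data stays queryable even though the bulk insert only runs in quiet periods.

    -- Queries hit this view during the day, so rows still sitting in the
    -- staging database are visible before the quiet-period bulk insert
    -- moves them into the main table.
    CREATE VIEW dbo.quotes_live AS
        SELECT symbol, price, quoted_at FROM dbo.quotes
        UNION ALL
        SELECT symbol, price, quoted_at FROM SRV2.Staging.dbo.quotes_in;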

StevenWalsh (#10), in reply to Ennis Ray Lynch Jr:

I'm thinking this is what I'm going to do.


GuyThiebaut (#11), in reply to StevenWalsh:

Are any of the indexes on your large tables clustered? If so, you may have a lot of page splitting going on, which can really slow things down when you're inserting 7m rows a week. Do you have the transaction log on a separate drive? Is the database on a separate drive from the application?

Continuous effort - not strength or intelligence - is the key to unlocking our potential. (Winston Churchill)
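One standard way to check for and mitigate page splitting, sketched with SQL Server 2005+ syntax and hypothetical table/index names (on SQL Server 2000 the rough equivalents are DBCC SHOWCONTIG and DBCC DBREINDEX):

    -- Rebuild the clustered index leaving 20% free space in each page,
    -- so mid-range inserts have room before forcing a page split.
    ALTER INDEX PK_quotes ON dbo.quotes
        REBUILD WITH (FILLFACTOR = 80);

    -- Check whether splits are actually happening: high fragmentation on
    -- the clustered index (index_id = 1) is the telltale sign.
    SELECT index_id, avg_fragmentation_in_percent, page_count
    FROM sys.dm_db_index_physical_stats(
        DB_ID(), OBJECT_ID('dbo.quotes'), NULL, NULL, 'LIMITED');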

SomeGuyThatIsMe (#12), in reply to StevenWalsh:

Depending on the DB used and how you set it up, it may have to grow itself to be able to store that data. I.e., if you originally set up the DB for 3 GB and it's taking up all 3, or has gotten bigger, any time you insert you'll have to wait on the database to go find another contiguous section of disk to put the new data on. On SQL Server 2k you can set the initial size and the growth rate as a percentage or a fixed amount. We have a 1 TB RAID array, so we can let it grow 1 GB at a time, which lets it work for a while without needing new space. We also created the DB several times larger than it was when we built the server, so it wouldn't need to grow for a long time. Contiguous disk space is your friend for data-heavy applications, because if you can keep a table or tables together it will reduce seek time.

Also, disabling the log file, or setting it up with a large amount of space, could help too; if it's logging the inserts, that file will get pretty big and have to go find more space for itself. I've greatly sped up our DB server and other databases by forcing it to clear the log file and giving it quite a bit of space, since we regularly upload (insert, delete, update) over 500 million rows in a handful of tables. We upload them to a temp database on the main server, then do compares to determine what needs to be added, removed, or changed, and then run those queries - so we aren't adding 500 million rows every few months, but we do have to mess with that much data. The inserts are usually only 20 or 30 million, but it runs fairly fast (20 to 30 minutes at most, including the initial upload done with bcp) on the new machine: a top-end Dell 5U with 15k RPM SATA2 drives... it's sweet. Hopefully this was somewhat helpful.

Please remember to rate helpful or unhelpful answers; it lets us and people reading the forums know if our answers are any good.

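A sketch of the pre-sizing idea (hypothetical database, file names, and paths; the sizes are illustrative): create the files large up front and grow in fixed 1 GB steps rather than small percentage increments, with the log on its own drive.

    -- Pre-size the data and log files so the engine rarely has to grow
    -- them mid-insert; fixed growth beats percentage growth at this scale.
    CREATE DATABASE StockData
    ON PRIMARY (
        NAME = StockData_dat,
        FILENAME = 'D:\Data\StockData.mdf',
        SIZE = 50GB,        -- start well above current usage
        FILEGROWTH = 1GB)
    LOG ON (
        NAME = StockData_log,
        FILENAME = 'E:\Logs\StockData.ldf',  -- log on a separate drive
        SIZE = 10GB,
        FILEGROWTH = 1GB);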

StevenWalsh (#13), in reply to SomeGuyThatIsMe:

Actually, that helps quite a bit. Thanks for the advice.


SomeGuyThatIsMe (#14), in reply to StevenWalsh:

Not a problem. The disadvantage is that if you wanted to make it all contiguous, you'd have to drop the DB, defrag the HDD, and then recreate the DB larger than it was before - but that would require a decent amount of downtime (a few hours, depending on your server) and enough storage someplace to back up your current data so it could be restored.

Also, on mass inserts I've found it useful to drop the indices and recreate them after the insert, if you can do it all at once. You should see a noticeable performance improvement if you try that. Doing a lot of inserts and updates to an index can (sometimes) corrupt it quite easily, slowing down performance for any action on that table, or anything related to it.

Please remember to rate helpful or unhelpful answers; it lets us and people reading the forums know if our answers are any good.

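A minimal sketch of the drop-and-recreate pattern around a bulk load (hypothetical table, index, and file names; one rebuild at the end is far cheaper than millions of incremental index updates):

    -- Drop the non-clustered index before the load
    -- (table.index is the SQL Server 2000-era DROP INDEX syntax).
    DROP INDEX quotes.IX_quotes_symbol;

    -- Load the accumulated quotes file in one bulk operation.
    BULK INSERT dbo.quotes
    FROM 'D:\feeds\quotes_20080723.txt'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK);

    -- Rebuild the index once, after all the rows are in.
    CREATE INDEX IX_quotes_symbol ON dbo.quotes (symbol, quoted_at);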