Handling big tables

Tags: help, database, sysadmin, data-structures, json
#1 · V 0

My boss just told me that I could have a table in the database that receives about 3x10^9 records a year (3 000 000 000). I could split this table into roughly four equal parts, but that still leaves a lot. The user can choose to view just a day, a year, or even the entire history (e.g. the last 2 years, which still means 1 500 000 000 records). I have generally held to a rule of keeping a production database at around 100 000 to 1 000 000 records in total and putting the rest in history, but that doesn't seem to be an option here, and I'm very worried about performance.

Has anyone got any experience with a database of this size, and with how to get the data out quickly? (Additional info: the data are datapoints that need to be plotted on a graph over a given time period. Performance is one of the top priorities.)

I see two major problems:

1. Keeping the table size reasonable: how do I split the data into different tables easily?
2. Getting the data to the user, e.g. creating an averaged graph on the server, because a graph with more than, say, 1000-2000 points is not useful on screen anyway (see the sketch below).

Many thanks in advance for any advice. (PS: The only thing I could find on Google was splitting into smaller tables, but that doesn't immediately solve the problem of sending the data to the user.)
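To make problem 2 concrete, here is a minimal sketch of server-side downsampling, assuming a hypothetical DataPoints(ts, value) table (not named in the thread): split the requested range into roughly 1000 buckets and return one averaged point per bucket, so the client never receives more points than the screen can usefully show.

```sql
-- Downsampling sketch. DataPoints(ts, value) is an assumed schema.
-- Splits [@from, @to) into ~1000 buckets and returns one averaged
-- point per bucket.
DECLARE @from    DATETIME2 = '2012-01-01',
        @to      DATETIME2 = '2014-01-01',
        @buckets INT       = 1000;
DECLARE @bucketSec INT = DATEDIFF(SECOND, @from, @to) / @buckets;

SELECT DATEADD(SECOND,
               (DATEDIFF(SECOND, @from, ts) / @bucketSec) * @bucketSec,
               @from)  AS bucket_start,
       AVG(value)      AS avg_value
FROM   DataPoints
WHERE  ts >= @from AND ts < @to
GROUP  BY DATEDIFF(SECOND, @from, ts) / @bucketSec
ORDER  BY bucket_start;
```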

    V.


#2 · Mycroft Holmes

Look into partitioning, probably by month. Your users will NEVER look at 100k rows, let alone two years' worth. They may wish to analyse that volume, but they will not "view" it! A sketch of the idea follows below.
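For reference, this is roughly what monthly partitioning looks like in SQL Server; queries constrained to a date range then only touch the relevant partitions. All object names here are illustrative, not from the thread.

```sql
-- Monthly partitioning sketch for SQL Server. Names (pfMonthly,
-- psMonthly, DataPoints) are illustrative. New boundaries are added
-- as months roll over (ALTER PARTITION FUNCTION ... SPLIT RANGE).
CREATE PARTITION FUNCTION pfMonthly (DATETIME2)
AS RANGE RIGHT FOR VALUES ('2013-01-01', '2013-02-01', '2013-03-01');

CREATE PARTITION SCHEME psMonthly
AS PARTITION pfMonthly ALL TO ([PRIMARY]);

CREATE TABLE DataPoints (
    ts    DATETIME2 NOT NULL,
    value FLOAT     NOT NULL
) ON psMonthly (ts);
```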

      Never underestimate the power of human stupidity RAH


#3 · V 0

Actually, what I think will happen is that the "view" over large ranges will be averaged, for example one point per month or week. I do realize I'll need to talk to the boss. :-D

        V.


#4 · R Giskard Reventlov

          Mycroft Holmes wrote:

          Look into partitioning, probably by month.

          Good call. :thumbsup:

          "If you think it's expensive to hire a professional to do the job, wait until you hire an amateur." Red Adair. nils illegitimus carborundum me, me, me


#5 · Lost User

            V. wrote:

how to get the data out quickly?

SQL Server can handle millions of rows, but it won't perform at all if you put a single scalar user-defined function in the T-SQL. As long as you're not doing a complex query, performance shouldn't be too much of a problem. It might pay to create a test environment with random data to identify bottlenecks as early as possible. With large amounts of data, I'd suggest large amounts of RAM (so that SQL Server can keep the large indexes in memory).
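One way to seed such a test environment, sketched under the same assumed DataPoints(ts, value) schema from earlier in the thread: cross-join a small numbers CTE against itself to get a million rows per run, with randomized values.

```sql
-- Test-data generation sketch; DataPoints(ts, value) is an assumed
-- schema. RAND(CHECKSUM(NEWID())) yields a fresh random value per row.
;WITH n AS (
    SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
    FROM sys.all_objects
)
INSERT INTO DataPoints (ts, value)
SELECT DATEADD(SECOND, a.i * 1000 + b.i, '2013-01-01'),
       RAND(CHECKSUM(NEWID())) * 100.0
FROM n AS a CROSS JOIN n AS b;  -- 1,000,000 rows per execution
```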

            Bastard Programmer from Hell :suss:


#6 · Mycroft Holmes

Using SQL Server to deal with large volumes of data takes some serious planning. If you need to get into this area, I would engage a professional; while I consider myself a good SQL developer, I would not undertake such a project without professional support. We had such a requirement and ended up using Oracle; it is faster and more robust, but you HAVE to have a full-time DBA.

              Never underestimate the power of human stupidity RAH


#7 · jschell

                V. wrote:

The user can choose to view just a day, a year, or even the entire history (e.g. the last 2 years, which still means 1 500 000 000 records)

Nonsense. Simple math will demonstrate that a user (a person) will NEVER look at that many records at once. They will either look at a summary, a summary with drill-downs, or a very, very small subset of the data.

                V. wrote:

Has anyone got any experience with a database of this size, and with how to get the data out quickly?

                Get real requirements.

                V. wrote:

The data are datapoints that need to be plotted on a graph over a given time period.

Excellent example. Graph 1.5 billion data points (from your first requirement): exactly how many pixels are on your screen? Again, simple math will demonstrate that you can't view that many data points on a graph. So either there will be a much smaller time period, or a summary of the entire period. If a summary, one solution is to build summary tables: for example, at the end of every day you create summary data for that day. Then a graph that displays every day uses the summary table rather than the raw data.
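A sketch of that summary-table idea, again assuming the hypothetical DataPoints(ts, value) schema: a nightly job rolls yesterday's raw points up into one row per day, and long-range graphs read this table instead of the raw data.

```sql
-- Summary-table sketch; all names are illustrative. Run the INSERT
-- once a night to roll up the previous day's raw points.
CREATE TABLE DailySummary (
    summary_day DATE  NOT NULL PRIMARY KEY,
    avg_value   FLOAT NOT NULL,
    min_value   FLOAT NOT NULL,
    max_value   FLOAT NOT NULL,
    n_points    INT   NOT NULL
);

DECLARE @today     DATE = CAST(SYSDATETIME() AS DATE);
DECLARE @yesterday DATE = DATEADD(DAY, -1, @today);

INSERT INTO DailySummary (summary_day, avg_value, min_value, max_value, n_points)
SELECT CAST(ts AS DATE), AVG(value), MIN(value), MAX(value), COUNT(*)
FROM   DataPoints
WHERE  ts >= @yesterday AND ts < @today
GROUP  BY CAST(ts AS DATE);
```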

                V. wrote:

1. Keeping the table size reasonable: how do I split the data into different tables easily?

You start with real usage patterns. For example, what percentage of the time does a user want to look at data for the last week, month, or year? Or hour, day, week? Or by collection point per week? And so on. Second, you get realistic estimates and anticipated growth: is 3 billion the average or the maximum? Will it be 3 billion next year, or 30 billion? Keep in mind it must be realistic, not pie in the sky. Third, how long must you keep it: 1 year, 10 years, 100 years? And if no one seems willing to discuss reality, you might want to look up the price tag of a really huge SAN and submit a request to buy it, because you will need it for testing, since the production system will need one to run.


#8 · V 0

                  jschell wrote:

Excellent example. Graph 1.5 billion data points (from your first requirement): exactly how many pixels are on your screen?

In the meantime we had already reached the same conclusion, and I told my boss we need to downsample the data. :-)

                  V.
