Best way to store MASSIVE amounts of data?

    SledgeHammer01 wrote (#1):

    What is the best way to store *MASSIVE* amounts of data? I'm thinking [EDITED] millions or billions of images totalling 10TB or so. Not sure of the exact size yet... I need to get some data samples... but somewhere in that arena. I guess at a high level, they could be split by state. I know that SQL cannot handle this kind of size. Performant lookup is also desired. How is this usually handled? Needs to be backupable of course as well. I had an old boss who was a big fan of storing the path in the DB and the files on disk. From personal experience, you end up with so many directories that the file system breaks down. Try navigating to a folder with 1000+ directories. I don't think I'll need to run very complex queries on the data, just simple lookups. Inserts should be fast as well.

      Richard Andrew x64 wrote (#2, in reply to #1):

      Never mind the DB you choose, how will you connect that many hard drives to one machine?! :omg: Have you looked at these guys: MongoDB?

      The difficult we do right away... ...the impossible takes slightly longer.

        Mycroft Holmes wrote (#3, in reply to #1):

        That sort of volume is going to require some serious iron. I would suggest you look at Oracle. I loathe Oracle, but SQL Server struggles with serious volume. We were looking at about the same record volume but a substantially smaller data size, and I think the 3-server cost topped $1m for the Oracle licences. Then you are going to want to hire an Oracle consultant/DBA to design and tune the blasted thing.

        Never underestimate the power of human stupidity RAH

          SledgeHammer01 wrote (#4, in reply to #2):

          Hmm... you're right haha... my estimates were a bit over the top. I did the math just now, and that's like 95,000 TB... lol... I think 10TB - 100TB total data is more in the right area... oops :)
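
A quick way to sanity-check an estimate like this is to multiply the image count by an average image size. The sketch below does that in Python for brevity; the counts and the 100 KB average are hypothetical inputs chosen for illustration, not figures from the thread.

```python
def total_terabytes(image_count: int, avg_image_bytes: int) -> float:
    """Return the total size in decimal terabytes (1 TB = 10**12 bytes)."""
    return image_count * avg_image_bytes / 10**12


# Hypothetical inputs, not numbers from the original post.
AVG_IMAGE_BYTES = 100_000  # assume 100 KB per image

print(total_terabytes(100_000_000, AVG_IMAGE_BYTES))    # 100 million images -> 10.0 TB
print(total_terabytes(1_000_000_000, AVG_IMAGE_BYTES))  # 1 billion images -> 100.0 TB
```

Under those assumptions, 100 million to 1 billion images lands in the 10TB - 100TB range mentioned above; a real estimate needs measured sample sizes.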

            SledgeHammer01 wrote (#5, in reply to #3):

            As I responded to the other guy, I got a little overzealous with my estimates... 10TB to 100TB is probably closer.

              Garth J Lancaster wrote (#6, in reply to #1):

              OK, there are the images, and then there's the 'metadata': think of the index fields that tell you how to locate an image. To locate the correct image, or a pointer to it, you need the metadata; this is what you search on. The question is, do you really need to store the images WITH the metadata/index data? I'd suggest not. So you have a searchable DB of metadata/indexes on fast tier 1 storage; then, once the right record is found, you retrieve the image from maybe a WORM drive or 'tier 2 storage', using the pointer/location from the search. Plenty of storage providers offer WORM storage, for example.
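
The split described above (small searchable metadata in a database, image bytes elsewhere, joined by a stored pointer) can be sketched roughly as follows. This is a minimal illustration in Python, not anyone's production design: SQLite stands in for the metadata DB, a local directory stands in for tier 2 storage, and the table layout, the `state`/`name` lookup keys, and the helper names `open_index`, `store_image` and `find_image` are all assumptions made for the example. Files are placed under two levels of hash-derived subdirectories so no directory level ever holds more than 256 subdirectories.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path
from typing import Optional


def open_index(db_file: str = ":memory:") -> sqlite3.Connection:
    """Open the metadata index; only small, searchable fields live here."""
    db = sqlite3.connect(db_file)
    db.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        " state TEXT NOT NULL, name TEXT NOT NULL, path TEXT NOT NULL,"
        " PRIMARY KEY (state, name))"
    )
    return db


def shard_path(root: Path, image_bytes: bytes) -> Path:
    """Build root/ab/cd/<sha256> from the content hash (256 x 256 leaf dirs)."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    return root / digest[:2] / digest[2:4] / digest


def store_image(db: sqlite3.Connection, root: Path, state: str,
                name: str, image_bytes: bytes) -> Path:
    """Write the image to blob storage and record its pointer in the index."""
    path = shard_path(root, image_bytes)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(image_bytes)
    db.execute(
        "INSERT INTO images (state, name, path) VALUES (?, ?, ?)",
        (state, name, str(path)),
    )
    db.commit()
    return path


def find_image(db: sqlite3.Connection, state: str, name: str) -> Optional[bytes]:
    """Search the metadata first, then follow the pointer to the image bytes."""
    row = db.execute(
        "SELECT path FROM images WHERE state = ? AND name = ?", (state, name)
    ).fetchone()
    return Path(row[0]).read_bytes() if row else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        index = open_index()
        store_image(index, Path(tmp), "CA", "example-001", b"fake image bytes")
        print(find_image(index, "CA", "example-001"))
        index.close()
```

The database stays small because it holds only keys and paths, so it can sit on fast storage and be backed up separately from the image files.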

                Mycroft Holmes wrote (#7, in reply to #5):

                If the number of records is over 100m then you are going to struggle with SQL Server, even when you go down the path suggested by Garth - that is really your only option anyway. There is no way you want the images anywhere near your searchable data.

                Never underestimate the power of human stupidity RAH

                  s_magus wrote (#8, in reply to #1):

                  Here is a relatively old article about the architecture of Flickr, but it might give you some ideas. http://highscalability.com/flickr-architecture

                    jschell wrote (#9, in reply to #1):

                    SledgeHammer01 wrote:

                    Performant lookup is also desired.

                    You need actual requirements - not off the cuff statements. For example, you could take the above statement and claim that the system must be capable of serving all of that data at the very same time. If so, then the company is going to need to buy an IP backbone company just to deliver it. The reality is that the system has business cases that dictate usage. You need to start with those.

                    SledgeHammer01 wrote:

                    Needs to be backupable of course as well.

                    Think about that in terms of the above requirements. How long is it going to take to restore the entire system from scratch? Obviously too long, so something has to be different.

                    SledgeHammer01 wrote:

                    I guess at a high level, they could be split by state.

                    You start with requirements, business cases and usage patterns and then define categorizations. Categorizations will impact how it is stored. One final suggestion - if that estimate is a pie in the sky dream then go do your own research. If it is a hard business requirement then the business needs to pay for consultants (very likely plural) with experience in very large data systems rather than trying to roll their own.
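
To put a rough number on the restore question raised in this post, total size can be divided by sustained throughput. The Python sketch below assumes a fully saturated 1 Gbit/s link and decimal terabytes; both the link speed and the helper name `restore_days` are assumptions for illustration, and real restores are usually slower because of per-file and protocol overhead.

```python
TB = 10**12                # decimal terabyte, in bytes
GBIT_PER_SEC = 10**9 / 8   # assumed 1 Gbit/s link, in bytes per second
SECONDS_PER_DAY = 86_400


def restore_days(total_bytes: float, throughput_bytes_per_sec: float) -> float:
    """Best-case days needed to copy total_bytes at a sustained throughput."""
    return total_bytes / throughput_bytes_per_sec / SECONDS_PER_DAY


print(round(restore_days(10 * TB, GBIT_PER_SEC), 1))   # about 0.9 days
print(round(restore_days(100 * TB, GBIT_PER_SEC), 1))  # about 9.3 days
```

Even the best case is measured in days at the upper end of the estimate, which is why a single monolithic backup and restore is unlikely to meet the requirement.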

                      Ennis Ray Lynch Jr wrote (#10, in reply to #1):

                      SQL is a language. I have a few hundred gigs from which I generate Google Maps tile images on the fly, so it can work. Appropriate indexing and caching is the real key. How is it usually handled? Testing, scaling, testing, and measuring. Are you on an intranet exclusively? Inexpensive servers with 4-port NICs serve as really nice in-house CDNs and scale fairly well. Not on an intranet? Well, there are public CDNs which will host the images; then it is merely about hosting the indexes.

                      Need custom software developed? I do custom programming based primarily on MS tools with an emphasis on C# development and consulting. "And they, since they Were not the one dead, turned to their affairs" -- Robert Frost "All users always want Excel" --Ennis Lynch
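
As a rough illustration of the caching point in the post above, an in-memory LRU cache can sit in front of an expensive image-generating call so that repeated requests skip the work. The sketch uses Python's standard `functools.lru_cache`; `render_tile` is a hypothetical stand-in for real tile generation, and the cache size of 1024 is an arbitrary assumption.

```python
from functools import lru_cache

render_calls = 0  # counts how often the expensive path actually runs


def render_tile(zoom: int, x: int, y: int) -> bytes:
    """Hypothetical stand-in for expensive on-the-fly tile generation."""
    global render_calls
    render_calls += 1
    return f"tile {zoom}/{x}/{y}".encode()


@lru_cache(maxsize=1024)  # keep the 1024 most recently used tiles in memory
def get_tile(zoom: int, x: int, y: int) -> bytes:
    """Return a tile, rendering it only on a cache miss."""
    return render_tile(zoom, x, y)


get_tile(3, 1, 2)  # miss: renders the tile
get_tile(3, 1, 2)  # hit: served from the cache
print(render_calls)                 # 1
print(get_tile.cache_info().hits)   # 1
```

Whether an in-process cache, a separate cache server, or a CDN is appropriate depends on the measured access pattern, which is the testing and measuring the post recommends.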
