NTFS, win7 and 20 million files

Michael Pauli
#1

Hi, I'm about to begin a small project in which I must be able to store and look up as many as 20 million files - in the best possible way and, needless to say, fast. For background I have been through:

    http://en.wikipedia.org/wiki/NTFS#Limitations
    http://www.ntfs.com/ntfs_vs_fat.htm

And now my question: dealing with a production load of around 60,000 files (pictures) per day, each around 300 KB in size, what would be the best ratio - how many files in how many directories - to get the best search time? Obviously I will not put all the files in one directory, but spread them over a number of directories. So what would be the best economy for such a thing? It seems to be hard to find information about this on the web. Thanks in advance, kind regards,

    Michael Pauli

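For scale, a rough back-of-envelope sketch of the numbers in the question (20 million files at ~300 KB each, 60,000 per day) and of how a two-level directory split would spread them. The 256 x 256 bucket count is an illustrative assumption, not a recommendation from the thread:

```csharp
using System;

class FanOutEstimate
{
    static void Main()
    {
        const long totalFiles = 20_000_000;      // target from the post
        const long avgFileBytes = 300 * 1024;    // ~300 KB per picture
        const long filesPerDay = 60_000;

        // Hypothetical two-level layout: 256 top-level dirs x 256 subdirs.
        const long leaves = 256L * 256L;

        double totalTb = totalFiles * (double)avgFileBytes / (1024.0 * 1024 * 1024 * 1024);
        Console.WriteLine($"Total data: ~{totalTb:F1} TB");
        Console.WriteLine($"Days to reach {totalFiles:N0} files: ~{totalFiles / filesPerDay:N0}");
        Console.WriteLine($"Files per leaf directory at 256 x 256: ~{totalFiles / leaves:N0}");
    }
}
```

With those assumptions the store grows to roughly 5.6 TB in about 333 days, and a 256 x 256 split leaves only ~300 files per leaf directory.
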
V 0
#2

This is a programming question and does not belong in this forum. To answer it anyway: if the number of files is that big, I might opt for a batch running during the night that puts the files into a database (if that is an option). I have found that recursive methods work very fast in this case.

      V.

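A minimal sketch of the kind of nightly batch described above, assuming SQLite (via the Microsoft.Data.Sqlite package) as the database and a made-up root folder D:\pictures - neither is specified in the thread:

```csharp
using System;
using System.IO;
using Microsoft.Data.Sqlite;

class NightlyIndexer
{
    static void Main()
    {
        const string root = @"D:\pictures";   // hypothetical image root, not from the thread

        using var conn = new SqliteConnection("Data Source=fileindex.db");
        conn.Open();

        using (var create = conn.CreateCommand())
        {
            create.CommandText =
                "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, size INTEGER, modified TEXT)";
            create.ExecuteNonQuery();
        }

        // One transaction around the whole walk keeps the bulk insert fast.
        using var tx = conn.BeginTransaction();
        using var insert = conn.CreateCommand();
        insert.Transaction = tx;
        insert.CommandText =
            "INSERT OR REPLACE INTO files (path, size, modified) VALUES ($path, $size, $modified)";
        insert.Parameters.AddWithValue("$path", "");
        insert.Parameters.AddWithValue("$size", 0L);
        insert.Parameters.AddWithValue("$modified", "");

        // Recursive walk of the picture tree, refreshed once per night.
        foreach (var path in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
        {
            var info = new FileInfo(path);
            insert.Parameters["$path"].Value = path;
            insert.Parameters["$size"].Value = info.Length;
            insert.Parameters["$modified"].Value = info.LastWriteTimeUtc.ToString("o");
            insert.ExecuteNonQuery();
        }
        tx.Commit();
    }
}
```

Lookups then hit the database index instead of the file system, and only the final open touches NTFS.
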
Michael Pauli
#3

Thank you for your answer. The thing is that we must use a file system for this and not a database. Second, I'm not sure that a recursive approach would be the first thing to go for, given the large number of files. Kind regards,

        Michael Pauli

V 0
#4

          Michael Pauli wrote:

          Second I'm not sure that a recursive principle would be first thing to go for for us due to the large number of files.

I don't understand why you would say that, but I didn't vote you down. However, I would suggest moving this post to the proper forum.

          V.

Michael Pauli
#5

            I'll move it. Sorry for the inconvenience. Kind regards,

            Michael Pauli

Sazzad Hossain
#6

At work we deal with a large number of files (publication company). If possible, I would suggest keeping the file names in a database table as an index and using some sort of folder structure based on the file name or index id, so that you don't end up with millions of files in one folder. You may have to break it down into a two-level folder structure:

    -ID[100-500]    // database index ids within 100 to 500, etc.
     \----[A]       // file names starting with A
     \----[B]

Whatever you do, you don't want to do lookups in the folders with something like Directory.GetFiles(); it will slow things down. So your best bet is to work out the file name from the database table, recreate the file path based on the folder structure, and access the file directly.

If you don't want to use a database table as the index, then use some sort of naming scheme for your files, like FILECATEGORY_FileName.ext. Then the file category gives you the top-level folders, and the first 3 characters of the file name give you many subfolders. That way the files are spread over two folder levels without causing too many issues for regular folder browsing in Explorer.

              ----------------------------- @SazzadHossain -----------------------------

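A minimal sketch of the no-database naming scheme described above: the FILECATEGORY_ prefix and the first three characters after it decide the folders, so the path can be rebuilt from the file name alone and Directory.GetFiles() is never needed. The root folder and the example file name are made up for illustration:

```csharp
using System;
using System.IO;

static class PictureStore
{
    const string Root = @"D:\pictures";   // hypothetical store root

    // Rebuild the full path from the file name alone (FILECATEGORY_Name.ext),
    // so a lookup never has to enumerate a directory.
    // e.g. "INVOICE_8450021.jpg" -> D:\pictures\INVOICE\845\INVOICE_8450021.jpg
    public static string PathFor(string fileName)
    {
        string nameOnly = Path.GetFileNameWithoutExtension(fileName);
        string[] parts = nameOnly.Split('_');
        string category = parts[0];                                   // top-level folder
        string rest = parts.Length > 1 ? parts[1] : nameOnly;
        string bucket = rest.Substring(0, Math.Min(3, rest.Length));  // second level
        return Path.Combine(Root, category, bucket, fileName);
    }

    public static void Save(string fileName, byte[] bytes)
    {
        string path = PathFor(fileName);
        Directory.CreateDirectory(Path.GetDirectoryName(path));
        File.WriteAllBytes(path, bytes);
    }

    public static byte[] Load(string fileName) => File.ReadAllBytes(PathFor(fileName));
}
```

The same idea works with a database id instead of a name: derive the bucket folders from the id's digits and the path never has to be searched for.
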
Albert Holguin
#7

That's how most fast file search engines work, including Linux's find. +5 to V for the suggestions (both the database and moving the question).

virang_21
#8

Check out ZFS by Sun Microsystems (now Oracle): http://hub.opensolaris.org/bin/view/Community+Group+zfs/whatis

                  Zen and the art of software maintenance : rm -rf * Math is like love : a simple idea but it can get complicated.
