Storing huge numbers of files

Posted in The Lounge · Tags: cryptography, question, lounge
kalberts wrote:

I hope to persuade most users to go for NTFS rather than FAT32. The most common access will be through an application, which will read the directory programmatically. Windows Explorer access can be considered an exception (although not that exceptional!).

Lost User replied (#19):

Windows Explorer will be your bottleneck: you sit and wait while it "builds" a 100k tree view. Odds are, it will "hang". "Reading" directories is not a big deal; how you "display" them is.

kalberts wrote:

This is about file systems in general, although with a primary emphasis on NTFS: If you are expecting to store a huge number of files - on the order of 100k or more - on a disk, is there any significant advantage in spreading them over a number of subdirectories (based on some sort of hash)? Or are modern file systems capable of handling a huge number of files in a single-level directory? If there are reasons to distribute the files over a series of subdirectories, what are the reasons (/explanations) why it would be an advantage? Is this different, e.g., among different FAT variants, and with NTFS?

Jorgen Andersson replied (#20):

What are you saving the files for? How will you access them? And how will you search for them? One at a time, sequentially, by date, by name...?

In reply to kalberts' original question, quoted in full above.

soulesurfer replied (#21):

Neither Windows nor Linux does well with too many files in a single folder. I've tried it with a million files, and it is very painful. Some operations, like simply listing the directory or even trying to delete the files, take absurdly long; the machinery simply isn't designed for that many files in one place. As said, around 10,000 files in a folder is a reasonable maximum; I simply make it 1,000. So for a million files, spread them across 1,000 folders. There is a nice symmetry there, and it works like a charm.
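
For illustration, a minimal Python sketch of that fan-out (the helper and file names are made up for the example). Because the bucket is computed from the file name alone, any file can be written or read back without scanning one giant directory:

```python
import hashlib
from pathlib import Path

def bucket_path(root: Path, filename: str, buckets: int = 1000) -> Path:
    """Map a file name onto one of `buckets` subfolders via a stable hash."""
    # hashlib is stable across runs, unlike Python's built-in hash().
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return root / f"{int(digest, 16) % buckets:03d}" / filename

root = Path("archive")

# Writing: a million files land roughly 1,000 per folder.
target = bucket_path(root, "report-000042.pdf")  # hypothetical name
target.parent.mkdir(parents=True, exist_ok=True)
target.write_bytes(b"...contents...")

# Reading: recompute the same path; no directory listing required.
data = bucket_path(root, "report-000042.pdf").read_bytes()
```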

In reply to kalberts' original question, quoted in full above.

JohaViss61 replied (#22):

A few years ago I worked on a system that generated around 50,000 to 100,000 files a day. We ran into trouble right away. Storing the files was not a problem, but retrieving them was impossible. A second problem was that we needed to search the contents of the files to find all files containing a certain string. We eventually chose to store all the files in a database. This was quite easy because the files were small (less than 10K). We chose an Oracle database because of the CLOB datatype, which allows for indexing and searching. We have had no problems since, and now have more than 200 million files. :cool:
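
A minimal sketch of the files-in-a-database approach, with SQLite standing in for the Oracle/CLOB setup described above (table and file names are invented for the example). The "find every file containing a string" problem becomes a single query:

```python
import sqlite3

# SQLite stands in here for the Oracle/CLOB database in the post above.
conn = sqlite3.connect("files.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS docs (
        name TEXT PRIMARY KEY,   -- the old file name
        body TEXT NOT NULL       -- the small (<10K) text content
    )
""")
conn.execute("INSERT OR REPLACE INTO docs VALUES (?, ?)",
             ("log-000001.txt", "error: overflow in unit 7"))
conn.commit()

# Search the contents of "all files" without opening them one by one.
hits = conn.execute("SELECT name FROM docs WHERE body LIKE ?",
                    ("%overflow%",)).fetchall()
print(hits)  # [('log-000001.txt',)]
```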

In reply to kalberts' original question, quoted in full above.

jarvisa replied (#23):

I worked on a system that had to stream 1 MB images to disk at 75 fps. I found that once there were about 700 files in a directory, creating new files suddenly became slower and the required transfer rate was unachievable. I ended up creating a new subdirectory every 500 files. Of course, this won't be a problem if your system is purely an archive.
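
A sketch of the "new subdirectory every 500 files" scheme this post describes (the class and directory names are hypothetical):

```python
from pathlib import Path

class RollingDir:
    """Hypothetical helper: hand out target paths, starting a fresh
    subdirectory every `limit` files so no single directory grows big
    enough to slow down file creation."""

    def __init__(self, root: Path, limit: int = 500):
        self.root, self.limit = root, limit
        self.count, self.chunk = 0, 0

    def next_path(self, name: str) -> Path:
        if self.count >= self.limit:      # current chunk is full
            self.chunk += 1
            self.count = 0
        d = self.root / f"chunk-{self.chunk:05d}"
        d.mkdir(parents=True, exist_ok=True)
        self.count += 1
        return d / name

frames = RollingDir(Path("frames"))
for i in range(1200):                     # fills chunk-00000 .. chunk-00002
    frames.next_path(f"frame-{i:06d}.raw").write_bytes(b"\x00" * 16)
```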

In reply to Lost User's post above (#19).

JohnnyCee replied (#24):

I’m “curious” why you “quoted” those “words” in your “post”.

kalberts wrote:

Can you provide an explanation of why it would be that way? Or is it at the "gut feeling" level?

Daniel Wilianto replied (#25):

Probably because a Windows folder isn't designed to contain 10,000 files, unlike a database table, which is expected to contain millions of rows. Or a spreadsheet. When we browse into a folder using Windows Explorer, it tries to read all the file names inside that folder. There's no virtualization or partial loading. Reading 10,000 file names and extensions is surely detrimental. EDIT: it's probably fine as long as you don't browse it using any Explorer view.

In reply to kalberts' question above.

obermd replied (#26):

At one point Microsoft actually recommended no more than 10,000 files per directory in NTFS. That was years ago, however. The real reason is that file-name scans inside a directory are sequential.

kalberts wrote:

Why would the size of the files matter? Very few are small enough to fit in the available space of the directory entry. Yes, they are files, by definition. Mostly, new files are added to the directory; that is the most common operation. Reading the files back is far less frequent.

obermd replied (#27):

In this case I would create a subdirectory structure based on the date of file addition.
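
A minimal sketch of such a date-based layout (the helper and file names are invented); a whole day's files can then be archived or deleted as one unit:

```python
from datetime import date
from pathlib import Path
from typing import Optional

def dated_path(root: Path, name: str, added: Optional[date] = None) -> Path:
    """Place a file under year/month/day of its addition (hypothetical helper)."""
    added = added or date.today()
    return root / f"{added:%Y}" / f"{added:%m}" / f"{added:%d}" / name

p = dated_path(Path("incoming"), "invoice-1234.xml")
p.parent.mkdir(parents=True, exist_ok=True)   # e.g. incoming/2024/01/15
p.write_text("<invoice/>")
```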

Lost User wrote:

File access; so, mostly reading "files"? A database would give you the most flexibility and performance. --edit: You can easily expand SQL Server over multiple servers if need be, with more control over sharding and backups than with a regular filesystem.


obermd replied (#28):

NTFS uses database techniques for file management.

DRHuff wrote:

People don't relate well to numbers, and this is a place where camaraderie is important. A name - even an obvious alias - will make the interactions more personable. ;)


Choroid replied (#29):

Does that mean I am friends with OriginalGiff?

In reply to kalberts' original question, quoted in full above.

agolddog replied (#30):

I don't know about access issues for a large number of files in a directory, but you might also consider security issues. If, for example, you have several different users whose files should not be accessible by the others, creating a subfolder for each user might allow you to secure them such that only that user has access to their subfolder (plus perhaps some 'admin' user you control that can see all directories). There are obvious organizational advantages as well.
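
A sketch of that per-user layout; POSIX permissions stand in here for the Windows/NTFS ACLs the post is about (on NTFS you would instead grant rights to the user plus an admin group):

```python
import os
from pathlib import Path

def user_dir(root: Path, username: str) -> Path:
    """Create a per-user subfolder only its owner can enter.
    0o700 is a POSIX stand-in for the NTFS ACLs discussed above."""
    d = root / username
    d.mkdir(parents=True, exist_ok=True)
    os.chmod(d, 0o700)  # owner: rwx; group/other: no access
    return d

docs = user_dir(Path("storage"), "alice")   # hypothetical user
(docs / "notes.txt").write_text("visible to alice (and admins) only")
```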

In reply to kalberts' question above.

milo xml replied (#31):

I've had experience with large numbers of files in a directory. They load slowly, if at all, when you try to view them in the directory.

In reply to JohnnyCee's question above (#24).

Lost User replied (#32):

Too lazy to use italics. "Builds": iterating and instantiating. "Hangs": no response, or exceeding an acceptable response time. "Reading": file I/O. "Display": where one loads a visual element for each file object. Better?

In reply to Lost User's post above (#32).

JFCee replied (#33):

I don't think emphasis is required for those words, but you do you.

Dave Kreskowiak wrote:

If users will be copying these files to a USB stick for any reason, you may run into a problem, as a stick formatted with FAT32 is a distinct possibility.


User 13224750 replied (#34):

You could always format the USB stick with NTFS.

In reply to User 13224750's suggestion above (#34).

Dave Kreskowiak replied (#35):

You could, but how many users actually read the documentation for your app?

In reply to obermd's post above (#28).

Lost User replied (#36):

Which is not the same as using a database. The Dokan libraries have proven that a DB is very capable as a FS.

In reply to kalberts' original question, quoted in full above.

Member 1231137 replied (#37):

It really depends on your use case for accessing and managing these files. If you're going to be enumerating the files a lot (or portions of them), then everything in one directory/folder may not be best; you can at least "chunk up" the enumeration by subfolder if you create subfolders, as sketched below. Also, if you break the files up into subfolders in some logical way, then managing those units and/or groupings of files becomes much easier: backups, restoring, archiving, deleting. If you are storing the path to each file in a database, then you're going to get the same lookup performance either way (subdirectories, or everyone in the pool together). Can you explain a little more about the repository and how you'll be using it?
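
A sketch of that chunked enumeration, where each bucket can be listed, backed up, or deleted as a bounded unit (the directory names assume some earlier fan-out scheme and are hypothetical):

```python
from pathlib import Path

root = Path("archive")
for bucket in sorted(p for p in root.iterdir() if p.is_dir()):
    files = list(bucket.iterdir())   # one small listing per bucket
    print(f"{bucket.name}: {len(files)} files")
    # back up / archive / delete the whole bucket here as a unit
```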

In reply to kalberts' original question, quoted in full above.

englebart replied (#38):

Consider drive corruption, backups, replication, file listeners, aging/document retention, and all the other access aspects as well. A folder per day/month/year can help with some of those items, as suggested in another post.
