Storing huge numbers of files

The Lounge
Tags: cryptography, question, lounge
42 Posts, 28 Posters
  • K kalberts

    Can you provide an explanation of why it would be that way? Or is it at the "gut feeling" level?

    milo xml
    #31

    I've had experience with large numbers of files in a directory. They load slowly, if at all, when you try to view them in the directory.

    • J JohnnyCee

      I’m “curious” why you “quoted” those “words” in your “post”. JohnnyCee

      Lost User
      #32

      Too lazy to use italics. "Builds": iterating and instantiating. "Hangs": no response, or exceeding an acceptable response time. "Reading": file I/O. "Display": where one loads a visual element for each file object. Better?

      It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it. ― Confucian Analects: Rules of Confucius about his food

      • L Lost User

        Too lazy to use italics. "Builds": iterating and instantiating. "Hangs": no response, or exceeding an acceptable response time. "Reading": file I/O. "Display": where one loads a visual element for each file object. Better?

        It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it. ― Confucian Analects: Rules of Confucius about his food

        JFCee
        #33

        I don't think emphasis is required for those words, but you do you.

        • D Dave Kreskowiak

          If users would be copying these files to a USB stick for any reason, you may run into a problem as formatting a stick using FAT32 is a distinct possibility.

          Asking questions is a skill CodeProject Forum Guidelines Google: C# How to debug code Seriously, go read these articles.
          Dave Kreskowiak

          User 13224750
          #34

          You could always format the USB stick with NTFS.

          • U User 13224750

            You could always format the USB stick with NTFS.

            Dave Kreskowiak
            #35

            You could, but how many users actually read the documentation for your app?

            Asking questions is a skill CodeProject Forum Guidelines Google: C# How to debug code Seriously, go read these articles.
            Dave Kreskowiak

            • O obermd

              NTFS uses database techniques for file management.

              Lost User
              #36

              Which is not the same as using a database. The Dokan libraries have proven that a DB is very capable as a FS.

              Bastard Programmer from Hell :suss: If you can't read my code, try converting it here[^] "If you just follow the bacon Eddy, wherever it leads you, then you won't have to think about politics." -- Some Bell.

              • K kalberts

                This is about file systems in general, although with a primary emphasis on NTFS: If you are expecting to store a huge number of files - on the order of 100 k or more - on a disk, is there any significant advantage to spreading them over a number of subdirectories (based on some sort of hash)? Or are modern file systems capable of handling a huge number of files in a single-level directory? If there are reasons to distribute the files over a series of subdirectories, what are the reasons (/explanations) why it would be an advantage? Is this different, e.g. among different FAT variants, and with NTFS?

                Member 1231137
                #37

                It really depends on your use case for accessing/managing these files. If you're going to be enumerating the files a lot (or portions of them), then keeping everything in one directory/folder may not be the best. You can at least "chunk up" the enumeration by subfolder if you create those. Also, if you break them up into subfolders in some logical way, then managing those units and/or groupings of files becomes much easier - backups, restoring, archiving, deleting. If you are storing the path to each file in a database, then you're going to get the same performance either way (subdirectories or everything in the pool together). Can you explain a little more about the repository and how you'll be using it?
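
                As a rough illustration of the "chunk up the enumeration by subfolder" idea, here is a minimal C# sketch; the root path and the per-file work are hypothetical placeholders, not anything taken from this thread.

                using System;
                using System.IO;

                class ChunkedEnumeration
                {
                    static void Main()
                    {
                        const string root = @"D:\filestore";   // hypothetical repository root

                        // Handle one subfolder at a time instead of walking the whole store,
                        // so each chunk can be backed up, archived, or deleted as a unit.
                        foreach (var subdir in Directory.EnumerateDirectories(root))
                        {
                            int count = 0;
                            foreach (var file in Directory.EnumerateFiles(subdir))
                            {
                                count++;   // per-file work would go here
                            }
                            Console.WriteLine($"{Path.GetFileName(subdir)}: {count} files");
                        }
                    }
                }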

                • K kalberts

                  This is about file systems in general, although with a primary emphasis on NTFS: If you are expecting to store a huge number of files - on the order of 100 k or more - on a disk, is there any significant advantage to spreading them over a number of subdirectories (based on some sort of hash)? Or are modern file systems capable of handling a huge number of files in a single-level directory? If there are reasons to distribute the files over a series of subdirectories, what are the reasons (/explanations) why it would be an advantage? Is this different, e.g. among different FAT variants, and with NTFS?

                  englebart
                  #38

                  Consider drive corruption, backups, replication, file listeners, aging/document retention, and all of the other access aspects as well. A folder per day/month/year can help with some of those items, as suggested in another post.

                  • K kalberts

                    This is about file systems in general, although with a primary emphasis on NTFS: If you are expecting to store a huge number of files - on the order of 100 k or more - on a disk, is there any significant advantage to spreading them over a number of subdirectories (based on some sort of hash)? Or are modern file systems capable of handling a huge number of files in a single-level directory? If there are reasons to distribute the files over a series of subdirectories, what are the reasons (/explanations) why it would be an advantage? Is this different, e.g. among different FAT variants, and with NTFS?

                    ddrogahn
                    #39

                    You might get by if you're using SSDs, or if the files are large, accessed directly and infrequently, and won't increase by orders of magnitude. Better to spread them out.

                    Huge directories in NTFS:
                    * Accessing individual files is OK
                    * Adding/removing/listing/sorting gets slow (consider EnumerateFiles instead of GetFiles)
                    * Reading metadata (mod date) is slow (makes Explorer's detail view slow)
                    * Network access is slower
                    * Defragging directories (with contig) helps some (also moving large dirs with robocopy /create)

                    Directories (and empty/tiny files) are stored in the MFT. A massive number of MFT entries can be a problem. The MFT's starting size is set when (and only when) you format the disk (controlled by a registry key). It will expand if needed (but fragment), and will contract (if possible) when space is low. Defragging the MFT is possible but slow and difficult. After a disk has been full of files, or has had the MFT filled by directories or tiny files, it may be best to reformat.

                    How to segment depends on how sparse the file IDs will be. About 4k entries per directory is a good starting target. If the files have numeric IDs, avoid bit shifts, for simplicity. Group into 3 digits (base 10) = 1000 files + 1000 subdirs, or 3 hex chars (0xFFF) = up to 4k files + 4k subdirs, e.g.:

                    000/0.dat - 999.dat
                    001/1000.dat - 1999.dat
                    999/999000.dat - 999999.dat
                    ...
                    000/001/1000000.dat - 1000999.dat
                    001/001/1001000.dat - 1001999.dat
                    123/987/987123000.dat - 987123999.dat
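
                    To make the 3-digit grouping concrete, here is a minimal C# sketch of the two-level layout shown in the deeper examples above (987123000.dat living under 123/987/); the root path and the method name are hypothetical, not from the post.

                    using System;
                    using System.IO;

                    class IdToPath
                    {
                        // Outer folder = the thousands digits of the ID, inner folder = the millions digits,
                        // matching the post's example of 987123000.dat living in 123/987/.
                        static string PathFor(long id, string root)
                        {
                            long thousands = (id / 1_000) % 1_000;
                            long millions  = (id / 1_000_000) % 1_000;
                            return Path.Combine(root,
                                                thousands.ToString("D3"),
                                                millions.ToString("D3"),
                                                id + ".dat");
                        }

                        static void Main()
                        {
                            Console.WriteLine(PathFor(987123456, @"D:\filestore"));
                            // -> D:\filestore\123\987\987123456.dat
                        }
                    }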

                    • J JFCee

                      I don't think emphasis is required for those words, but you do you.

                      Lost User
                      #40

                      It's from writing too many User Manuals. As in: the CD disk "tray" is not a "cup holder." Glad to know you and your users are more sophisticated and have time to sweat this stuff.

                      It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it. ― Confucian Analects: Rules of Confucius about his food

                      • K kalberts

                        Why would the size of the files matter? Very few are small enough to fit in the available space of the directory entry. Yes, they are files, by definition. Mostly, new files are added to the directory. This is the most common access. File access is far more infrequent.

                        JasonSQ
                        #41

                        File size is critically important. If you run past a block boundary by just a little bit, the rest of that block is dead space. Assuming a 4K block size and files storing 1K of data, that's 3K of wasted space on disk, per file. If you zip up the files, they'll store much, much more efficiently. We have this problem with hundreds of thousands of small text files. We sweep them up and zip them into archive folders on occasion to clean up the folders and reclaim disk space.
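
                        For what it's worth, that kind of sweep can be done with System.IO.Compression; this is only a rough sketch with hypothetical paths, not the poster's actual tooling.

                        using System;
                        using System.IO;
                        using System.IO.Compression;

                        class SmallFileSweep
                        {
                            static void Main()
                            {
                                const string folder  = @"D:\logs\pending";             // hypothetical source folder
                                const string archive = @"D:\logs\archive\sweep.zip";   // hypothetical archive

                                // ZipArchiveMode.Update creates the archive if it does not exist yet.
                                using var zip = ZipFile.Open(archive, ZipArchiveMode.Update);
                                foreach (var file in Directory.EnumerateFiles(folder, "*.txt"))
                                {
                                    // One entry per small file: a 1K file no longer ties up a whole 4K cluster.
                                    zip.CreateEntryFromFile(file, Path.GetFileName(file));
                                    File.Delete(file);   // remove the original once it has been archived
                                }
                            }
                        }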

                        • K kalberts

                          This is about file systems in general, although with a primary emphasis on NTFS: If you are expecting to store a huge number of files - on the order of 100 k or more - on a disk, is there any significant advantage to spreading them over a number of subdirectories (based on some sort of hash)? Or are modern file systems capable of handling a huge number of files in a single-level directory? If there are reasons to distribute the files over a series of subdirectories, what are the reasons (/explanations) why it would be an advantage? Is this different, e.g. among different FAT variants, and with NTFS?

                          Jim Knopf jr
                          #42

                          Explorer does two things: read the entries and sort them. Reading looks linear, and so does inserting each new entry into the sorted view, so you end up with N^2 time behavior. This hasn't changed for decades. Some file systems allow accessing the files with a kind of pointer, avoiding the directory once you know the pointer. Nevertheless, adding and deleting files still has to touch the directory. Looking up directory names has the same problem, so it's better to construct a directory “tree”. File size doesn't matter for name lookups.
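
                          A small illustration of that difference (directory and file names are hypothetical, and this is only a back-of-the-envelope comparison): opening a file by a path you already know resolves a single name, while building a sorted listing grows with the number of entries, which is roughly what Explorer's view has to do.

                          using System;
                          using System.Diagnostics;
                          using System.IO;
                          using System.Linq;

                          class DirectoryAccessDemo
                          {
                              static void Main()
                              {
                                  const string dir = @"D:\big";                      // hypothetical huge directory
                                  string known = Path.Combine(dir, "123456.dat");    // a file whose name we already know

                                  // Direct open: the file system resolves one name.
                                  var sw = Stopwatch.StartNew();
                                  using (File.OpenRead(known)) { }
                                  Console.WriteLine($"Open by path: {sw.ElapsedMilliseconds} ms");

                                  // List and sort: the cost grows with the number of entries.
                                  sw.Restart();
                                  var sorted = Directory.EnumerateFiles(dir)
                                                        .OrderBy(Path.GetFileName)
                                                        .ToList();
                                  Console.WriteLine($"List + sort {sorted.Count} entries: {sw.ElapsedMilliseconds} ms");
                              }
                          }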
