NTFS, win7 and 20 million files
-
Hi, I'm about to begin a small project in which I must be able to store and look up as many as 20 million files, in the best possible way. Needless to say: fast. For this I have been around http://en.wikipedia.org/wiki/NTFS#Limitations and http://www.ntfs.com/ntfs_vs_fat.htm

And now my question: dealing with a production load of around 60,000 files (pictures) per day, each around 300 KB in size, what ratio of files to directories would give the best search time? Obviously I won't put all the files in one directory, but spread them over a number of directories. So what would be the best economy for such a thing? It seems hard to find information about this on the web.
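For concreteness, here is the kind of scheme I have in mind; a minimal sketch only, where the MD5 hashing and the 256 x 256 fan-out are my own illustrative assumptions, not a worked-out design:

```csharp
using System.IO;
using System.Security.Cryptography;
using System.Text;

static class PictureStore
{
    // Map a file name to a fixed two-level directory path.
    // Two hex levels of 256 directories each = 65,536 leaf directories,
    // so 20 million files average roughly 305 files per directory.
    public static string PathFor(string root, string fileName)
    {
        using (var md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(fileName));
            string level1 = hash[0].ToString("x2");   // "00" .. "ff"
            string level2 = hash[1].ToString("x2");
            return Path.Combine(root, level1, level2, fileName);
        }
    }
}
```

A call like PathFor(@"D:\pics", "img_000123.jpg") would give a deterministic path such as D:\pics\9f\3a\img_000123.jpg, so a lookup opens exactly one small folder and never scans. Is that the right kind of economy, or is there a better fan-out?

Thanks in advance. Kind regards,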
Michael Pauli
-
This is a programming question and does not belong in this forum. To answer it: if the number of files is that big, I might opt for a batch running during the night that puts the files into a database (if that is an option). I found that recursive methods work very fast in this case.
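A minimal sketch of what I mean, assuming a tab-separated text file stands in for the database table (a real job would insert into a proper table instead; the recursive traversal is the same either way):

```csharp
using System.IO;

static class NightlyIndexer
{
    // Walk the whole tree recursively once per night and persist
    // "file name -> full path", so lookups never touch the directories.
    public static void BuildIndex(string root, string indexFile)
    {
        using (var writer = new StreamWriter(indexFile))
        {
            foreach (string path in Directory.EnumerateFiles(
                         root, "*", SearchOption.AllDirectories))
            {
                writer.WriteLine(Path.GetFileName(path) + "\t" + path);
            }
        }
    }
}
```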
V.
-
Thank you for your answer. The thing is that we must use a file system for this, not a database. Second, I'm not sure a recursive approach would be the first thing to go for, given the large number of files. Kind regards,
Michael Pauli
-
Michael Pauli wrote:
Second, I'm not sure a recursive approach would be the first thing to go for, given the large number of files.
I don't understand why you would say that, but I didn't vote you down. However, I would suggest moving this post to the proper forum.
V.
-
I'll move it. Sorry for the inconvenience. Kind regards,
Michael Pauli
-
At work we deal with a large number of files (publishing company). If possible I would suggest keeping the file names in a database table as an index and using some sort of folder structure, based on the file name or index ID, so that you don't end up with a million files in one folder. You may have to break it down into a two-level folder structure, for example:

ID[100-500]        // database index ID within 100 to 500, etc.
 |--- [A]          // file names starting with A
 |--- [B]

Whatever you do, you don't want to do a lookup in the folders with something like Directory.GetFiles(); it will slow things down. Your best bet is to work out the file name from the database table, rebuild the file path from the folder structure, and access the file directly. If you don't want to use a database table as the index, then use some sort of naming scheme in your files, like FILECATEGORY_FileName.ext. Then the file category gives you the top-level folders, and the first 3 characters of the file name give you many subfolders. That way the files are spread over two levels of folders without causing too many issues for regular folder browsing in Explorer.
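A rough sketch of that naming-scheme variant; the category prefix and the underscore separator are just the assumptions from my example above:

```csharp
using System;
using System.IO;

static class NamingScheme
{
    // "INVOICE_ab1234.jpg" -> root\INVOICE\ab1\INVOICE_ab1234.jpg
    // Top level: the category prefix; second level: the first three
    // characters of the remaining file name.
    public static string PathFor(string root, string fileName)
    {
        int sep = fileName.IndexOf('_');
        if (sep <= 0)
            throw new ArgumentException("expected CATEGORY_Name.ext", "fileName");

        string category = fileName.Substring(0, sep);
        string name = fileName.Substring(sep + 1);
        string bucket = name.Substring(0, Math.Min(3, name.Length));
        return Path.Combine(root, category, bucket, fileName);
    }
}
```

Store and fetch always go through PathFor, so nothing ever has to call Directory.GetFiles() to find a file.

@SazzadHossain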
-
V. wrote:
if the number of files is that big, I might opt for a batch running during the night that puts the files into a database
That's how most fast file search engines work... including Linux's find. +5 for the suggestions (both the database suggestion and the suggestion to move the question).