Comparing file contents is slow, so you only want to do it when you have to. You can go through the drives and build a hash table that maps each file size to the files of that size. Files with different sizes can't be duplicates. For files that share a size, the next step is to look at the contents (I'm assuming you also want to detect duplicate contents stored under different names, or with different timestamps). You could store checksums with the files, as Luc suggested, but a checksum alone isn't proof: two different files can end up with the same checksum. What checksums are good for is avoiding content comparisons. If the checksums are different, the files are different. If they're the same, you still have to compare the contents for confirmation.
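Here is a rough sketch of that three-stage filter in Python, in case it helps. The function names (`file_digest`, `find_duplicates`), the choice of SHA-256 as the checksum, and the 1 MiB chunk size are my own assumptions for illustration, not anything from the question or from Luc's answer. It computes checksums on the fly rather than storing them with the files.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks (assumed checksum)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return a list of groups (sorted lists of paths) whose contents are identical."""
    # Stage 1: group files by size. Different sizes mean no duplicates.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate

        # Stage 2: among same-size files, group by checksum.
        # Different checksums mean different contents.
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[file_digest(path)].append(path)

        for candidates in by_digest.values():
            if len(candidates) < 2:
                continue

            # Stage 3: equal checksums are only a hint, so confirm by
            # comparing the actual bytes (shallow=False forces that).
            remaining = candidates
            while remaining:
                first, rest = remaining[0], remaining[1:]
                same = [first] + [
                    p for p in rest if filecmp.cmp(first, p, shallow=False)
                ]
                if len(same) > 1:
                    groups.append(sorted(same))
                remaining = [p for p in rest if p not in same]
    return groups
```

You would call it as `find_duplicates("/some/directory")` and get back a list of lists, one per set of identical files. With a strong hash like SHA-256 the final byte comparison will practically never disagree with the checksum, but it is cheap insurance, and it becomes necessary if you swap in a weaker checksum such as CRC32.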