Code Project

Sorting files by their fingerprints

C / C++ / MFC
Tags: algorithms, data-structures, regex
    Stephan Poirier wrote (#1):

    Hi there! :-D I recently ran into a problem that might interest you. On one of my hard drives I have a folder where I put all the junk I download from the Internet. There are well over 500 files in there, because I download from WinMX and other file-sharing programs like everyone else does... What sucks with this kind of software is that you have to download many files before you get a correct one. As a result, your disk fills up with junk that you then share with others... X|

    So you end up with lots of files with different names and extensions that contain almost the same thing. (They could be jpg, gif, mp3, wav, mov, avi, asf... it doesn't matter.) You need to keep only what's good and delete everything that is partially downloaded or isn't what you were looking for. (E.g. you were looking for a Pet Shop Boys picture, you downloaded "boys_live.jpg" and got a :wtf: Backstreet Boys pic...)

    I already tried to sort and delete using Explorer. Arrrrgghhhh!!!! It took me hours and hours!!! So I gave up! :) But like every programmer, I think that when something is hard or takes ages to do by hand, you should program the computer to do it for you. That is exactly what the first computers were invented for.

    My idea is to scan the folder and look at every file in it. During the scan I do a kind of pre-sort using the first 128 bytes of each file, which should be sufficient to tell a jpg from an mp3, a mov... I read the first 128 bytes of each file and put each filename and its first 128 bytes in a resizable array called CURRENT. Note that if a file contains 0 bytes, it should be deleted immediately. After that, the 128-byte blocks are compared with one another in a loop. Every time a new match is found, a new group is created containing the 128-byte fingerprint, and all files matching this fingerprint are added to that group.
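The pre-sort on the first 128 bytes could be sketched roughly as follows. This is only an illustration in Python, not the poster's program: the names `group_by_prefix` and `FINGERPRINT_SIZE` are made up for the example, a dictionary keyed by the 128-byte block stands in for the pairwise comparison loop, and empty files are only reported, not deleted.

```python
from collections import defaultdict
from pathlib import Path

FINGERPRINT_SIZE = 128  # bytes read from the start of each file, as in the post


def group_by_prefix(folder):
    """Group the regular files directly inside `folder` by their first bytes.

    Returns (groups, empty_files): `groups` maps each 128-byte prefix to the
    list of filenames starting with it, `empty_files` lists 0-byte files.
    """
    groups = defaultdict(list)
    empty_files = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip sub-folders; the post only talks about one folder
        with path.open("rb") as handle:
            prefix = handle.read(FINGERPRINT_SIZE)
        if not prefix:
            # 0-byte file: a candidate for deletion (reported here, not deleted)
            empty_files.append(path.name)
            continue
        # Files sharing the same prefix land in the same bucket, so no
        # explicit file-against-file comparison is needed at this stage.
        groups[prefix].append(path.name)
    return dict(groups), empty_files
```

Using the prefix as a dictionary key keeps this pass linear in the number of files, whereas comparing every block against every other grows quadratically. Files shorter than 128 bytes are simply keyed by whatever bytes they contain.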
When that is done, the program will have created multiple groups containing the filenames of 'matching' files. Next it has to fully scan each file within each group. Some files may be incomplete, too. I expect this part to be long and complicated. I don't know whether I should work with checksums or scan and compare every byte of each file... It could take a long time if we have 12 files of 8 MB to compare byte by byte! (Which seems to happen frequently!) Again, every time a new match is found, a new group is created and all files matchi
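For the second pass, one possible approach (a sketch under assumptions, not necessarily what the poster ended up doing) is to compare sizes first, hash only files of equal size, and separately check whether a shorter file is a truncated copy of a longer one. The function names and the choice of SHA-256 are assumptions made for this example, and the truncation check assumes an incomplete download is a clean prefix of the full file, which may not hold for every file-sharing client.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

CHUNK_SIZE = 64 * 1024  # read files in blocks so 8 MB files never sit fully in memory


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read chunk by chunk."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exact_duplicates(paths):
    """Return lists of paths whose size and SHA-256 digest are identical."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[Path(path).stat().st_size].append(path)
    by_content = defaultdict(list)
    for size, same_size in by_size.items():
        if len(same_size) < 2:
            continue  # a unique size cannot have an exact duplicate
        for path in same_size:
            by_content[(size, file_digest(path))].append(path)
    return [group for group in by_content.values() if len(group) > 1]


def is_truncated_copy(short_path, long_path):
    """True if short_path is strictly smaller and matches the start of long_path."""
    if Path(short_path).stat().st_size >= Path(long_path).stat().st_size:
        return False
    with open(short_path, "rb") as short, open(long_path, "rb") as long_:
        while True:
            chunk = short.read(CHUNK_SIZE)
            if not chunk:
                return True  # reached the end of the short file without a mismatch
            if long_.read(len(chunk)) != chunk:
                return False
```

With this layout, 12 files of 8 MB are each read once for hashing instead of being compared pair by pair. Matching digests make identical content overwhelmingly likely; a final byte-by-byte check before deleting anything is a cheap extra safeguard.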


      Stephan Poirier wrote (#2):

      Does no one have an idea??? Programming is like taking drugs... I think I overdosed. ;-P
