we need help for this project

kenvil

We are students working on our thesis this semester, and we're looking for some guidelines on ways we could accomplish our objectives.

Details:
Title: Advanced Duplicate File Scanning Tool Over LAN
Model of the Study: NoClone

Problems:

1. Comparing files produces inaccurate results.

* The same content in different text file formats is not recognized as a duplicate. Users sometimes use Notepad (.txt) to edit or draft something to start a document, and afterwards transfer it to MS Word. The content made in Notepad is the same as the document transferred to MS Word, so both have exactly the same contents. Logically they are duplicates, but the system ignores this case because the files have different formats (Not Supported).

* Failure to recognize files in the following scenarios. test.doc was created and copied, producing files such as “copy of test.doc”, “copy (1) of test.doc”, “copy (2) of test.doc”, “copy (3) of test.doc”, “copy (4) of test.doc”. After the scanning process with NoClone, the results did not list these five files as duplicates. In addition, the proponents made a Word document, typed the word “test”, and saved it with the filename “test.doc”. Afterwards, the file was intentionally saved with “Save As” under another filename with the same content. This procedure continued until it reached the fourth similar file, “test4.doc”. Right after saving, NoClone was launched to scan for duplicate files, but surprisingly it didn't find anything. Instances like these show that it is not capable of a complete and reliable scan. Furthermore, NoClone is unable to scan inside a .zip or other compressed archive. When scanning for duplicate files, NoClone does not include the files contained in the archive; it includes ONLY the .zip file itself.

2. No option to search for a particular file name or folder to find its duplicates. The system doesn't have a field or an option for keying in a particular filename or folder that the user specifically wants to find. The ability to search for a particular file name is useful because it keeps the scanning process from producing unintended results.

3. Inability to schedule the scanning process at a specific date and time, or to identify which computer to scan and compare. As the files grow bigger every day, the

quacks_a_lot

1. For your first problem, you need to be able to read each of the file formats, extract the text each file contains, and compare that text.

2. Your second problem is simple. While you are scanning the computer for duplicate files, log each duplicate file's name and path in an array. When the scan completes, list the results with checkboxes next to them so the user can choose which files to delete and which files to keep. You can also let the user double-click an item to view it before making a choice.

3. Windows has a Scheduled Tasks folder where you can schedule a date and time to scan. If you look hard enough, you can probably find some code somewhere on the Internet that will let you add a scheduled task. I'm assuming you are using Windows.

In order to help out more, I need to know what programming language you are using. Are you making this file scanner from scratch, or do you already have some code to start with?

I'm not going to help you out much because this is a thesis. Don't get the wrong idea.
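To make the second point a bit more concrete, here is a minimal sketch of logging each duplicate's path while scanning. It is written in Python only to keep it short, not because the thread settles on it, and the names `file_digest` and `find_duplicates` are made up for illustration. It assumes "duplicate" means byte-for-byte identical content, which is narrower than the text-level comparison in the first point.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's bytes, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Map content digest -> sorted paths, keeping only groups of 2 or more."""
    # Pass 1: bucket files by size, since files of different sizes cannot match.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable or vanished file: skip it

    # Pass 2: hash only the files that share a size with at least one other.
    groups = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            try:
                groups[file_digest(path)].append(path)
            except OSError:
                continue
    return {digest: sorted(p) for digest, p in groups.items() if len(p) > 1}
```

Each returned group is the list you would show with checkboxes. Note the limitation: a byte-level hash only catches exact copies. Files that merely carry the same text, such as a Notepad draft and its Word version, or a Word file re-saved with different embedded metadata, would need their text extracted and compared instead.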

rwestgraham

kenvil wrote: The content made in notepad is the same as the document transferred to MS WORD, meaning both of them has exact contents, logically they were duplicates but this case was ignored by the system because they have different formats (Not Supported).

Maybe by your "logic", but there is absolutely no reason to assume that the user considers the file a duplicate. Case in point: I have a resume that I created with Word 2003. However, I recognize that many potential employers may have an older version of Word, so I also save the document in Word 98 format on the assumption that anyone can read that format. I have a third version saved as a simple text file, because that is the format often required when submitting a resume through a website. All three documents have the same "content", but they are by no means duplicates.

kenvil

Thanks a lot. We're planning to use VB 6.0... we want to build it from scratch. Is it possible to embed code into an existing duplicate file finder?

quacks_a_lot

kenvil wrote: is it possible to imbed a code to an existing duplicate file finder?

Only if you have the source code for the file finder. But then it wouldn't really be from scratch.

VB6 comes with the DriveListBox, DirListBox, and FileListBox controls for getting information about drives, folders, and files on the local computer. I don't think you can set them to return information about a remote computer. You will probably have to create two applications: a client program that runs in the background and lets the user clean his own machine if he wishes, and an administrator program that can connect to the client program on a remote computer. The client program scans its computer when the administrator program requests it. The administrator can then decide what to do with the duplicate files.

I don't know much about how to send this sort of information over a LAN but, if you are willing to learn it, DirectPlay can probably be of use to you.

If you don't like the DriveListBox, DirListBox, and FileListBox controls that come with VB, you can check out the FindFirstFile, FindNextFile, and FindClose functions of the kernel32 DLL. I've used these calls before and can send you snippets if you are interested.
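As a rough illustration of the client/administrator split described above, here is a small sketch. It uses plain TCP sockets with one JSON message per line rather than DirectPlay, and it is in Python rather than VB6, purely to keep it short. Everything in it is an assumption made for the example: the message fields (`command`, `folder`, `duplicates`), the function names, and the `scan` callable that the client is handed.

```python
import json
import socket
import socketserver
import threading


class ScanRequestHandler(socketserver.StreamRequestHandler):
    """Client side: answers one line-delimited JSON request per connection."""

    def handle(self):
        request = json.loads(self.rfile.readline().decode("utf-8"))
        if request.get("command") == "scan":
            # self.server.scan is a hypothetical callable: folder -> report dict
            report = self.server.scan(request["folder"])
            reply = {"ok": True, "duplicates": report}
        else:
            reply = {"ok": False, "error": "unknown command"}
        self.wfile.write((json.dumps(reply) + "\n").encode("utf-8"))


def start_client_agent(scan, host="127.0.0.1", port=0):
    """Start the client agent in a background thread; return (server, port)."""
    server = socketserver.ThreadingTCPServer((host, port), ScanRequestHandler)
    server.scan = scan  # attach the scanning function for the handler to use
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]


def request_scan(host, port, folder, timeout=10.0):
    """Administrator side: ask one client to scan a folder, return its reply."""
    message = json.dumps({"command": "scan", "folder": folder}) + "\n"
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(message.encode("utf-8"))
        with sock.makefile("rb") as stream:
            return json.loads(stream.readline().decode("utf-8"))
```

The administrator program would call `request_scan` once per machine and merge the replies before deciding what to delete. The sketch binds to localhost by default and has no authentication at all, so a real tool would need both a deliberate choice of listening address and some way to verify who is asking for a scan.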

kenvil

yup sir...