Sniffer
-
Yes, I know that's not a very appealing name, but I am working on a module that needs to detect the type of a file. Not whether it's .doc or .rtf or the like, but whether it's a CSV file, an XML file, a line printer file ... you get the idea. The point is to make the user's life easier by pre-scanning the file they have selected, so I can make some informed choices on their behalf earlier (surprise! intelligent software!!). There are additional complexities: for example, if it's a CSV file, the delimiter could vary. Are there any good algorithms anyone can point me to? Even some general coding ideas would be great.
-
You mean something like the Unix/Linux file command? Get the source from ftp.astron.com and see how it's done.
-
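For the text formats mentioned in the question, here is a rough Python sketch in the same spirit as file: look at the content, not the name. It is not how file itself works. The function name sniff_text_format and the candidate delimiter list are made up for illustration; csv.Sniffer is the standard library's delimiter guesser, and it can guess wrong on short or irregular samples. Line printer files are not handled here and simply fall through to "text".

```python
import csv


def sniff_text_format(sample: str) -> dict:
    """Guess whether a text sample looks like XML, CSV, or plain text.

    Hypothetical helper: a heuristic sketch, not a complete detector.
    """
    # Ignore a byte-order mark and leading whitespace before the first token.
    stripped = sample.lstrip("\ufeff \t\r\n")
    if stripped.startswith("<"):
        return {"type": "xml"}
    try:
        # Let csv.Sniffer pick among a few common delimiters (an assumed list).
        dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    except csv.Error:
        # No consistent delimiter found, so treat it as plain text.
        return {"type": "text"}
    return {"type": "csv", "delimiter": dialect.delimiter}
```

You would typically read only the first few kilobytes of the file and pass that in as the sample, rather than scanning the whole thing.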
There was an app on the Amiga that would do that. It gave you a single application that you could set as the default app to open any file type; it would analyze the file and then open the appropriate viewer. I can't remember its name though :( It worked by having a definition entry for each file type to be handled. E.g. a Windows Bitmap file could be described as having the extension .bmp and its binary data starting with the chars "BM". Basically it let you define regular expressions on extension / bytes at a given address / bytes searched for / file size and an assortment of other file attributes, and it seemed to be very good at picking up the correct file type for anything.
-- Help me! I'm turning into a grapefruit!
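A definition-entry scheme like that could be sketched in Python roughly as follows. This is only an illustration, not the Amiga app's actual format: FileRule, RULES, score and identify are hypothetical names, the scoring weights are arbitrary, and only the extension and bytes-at-offset checks are shown (no regular expressions or file-size tests). The BMP entry follows the description above; the PNG and PDF signatures are added as further examples.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileRule:
    """One definition entry: a name plus the attributes to test."""
    name: str
    extension: Optional[str] = None  # e.g. ".bmp"; None means "don't care"
    offset: int = 0                  # where in the file the magic bytes sit
    magic: bytes = b""               # expected bytes at that offset


# Illustrative rule table; a real one would be much longer.
RULES = [
    FileRule("Windows Bitmap", extension=".bmp", magic=b"BM"),
    FileRule("PNG image", extension=".png", magic=b"\x89PNG\r\n\x1a\n"),
    FileRule("PDF document", extension=".pdf", magic=b"%PDF-"),
]


def score(rule: FileRule, filename: str, head: bytes) -> int:
    """Content matches count for more than the (easily renamed) extension."""
    points = 0
    end = rule.offset + len(rule.magic)
    if rule.magic and head[rule.offset:end] == rule.magic:
        points += 2
    if rule.extension and filename.lower().endswith(rule.extension):
        points += 1
    return points


def identify(filename: str, head: bytes) -> Optional[str]:
    """Return the best-scoring rule's name, or None if nothing matches."""
    best = max(RULES, key=lambda r: score(r, filename, head))
    return best.name if score(best, filename, head) > 0 else None
```

Here head is the first chunk of the file's bytes, so identify("renamed.dat", head) can still recognise a PDF from its content even though the extension says nothing useful.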