Virus scanners are stupid: why not a hash database?

newton saber

So the virus scanner at work is slowing down a download for Android Studio, which is: https://dl.google.com/android/android-sdk_r22.6.2-windows.zip

Well, that's a Google zip so it's probably okay. But the stupid virus scanner could just check the domain and the hash of the original zip and then know that it is okay. Instead, it slows the installation to a terrible crawl as it:

1. unzips the files
2. copies the files to a sandbox
3. examines each file for malicious bytes

It's terrible.

VirusTotal.com

So I thought, wow, VirusTotal.com probably has the hash of this file anyway and we could instantly discover if it were malicious. I went over there and entered the link, and it reported that 1 of 63 virus scanners flags it as malicious: Blueliv reports malicious content. You can see the report at: https://www.virustotal.com/en/url/3a290d3a65cebd7a29a9c98106bfc04ef5e1f2df096412b55cd5de3f76e3b65b/analysis/1431690217/

What!?! Ugh. There has to be a better way.

***** EDIT ****

Okay, I also noticed that if you click the information tab on the report, it says the download exceeded the allowed size, so only 32 MB will be downloaded. So I guess virustotal.com basically failed. Confusing.

Lost User

Because a hash database is very easy to defeat. Just sprinkle some random bytes around in unimportant places and you pass the test again.
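To make that concrete, here is a minimal Python sketch, assuming the database is a blacklist of SHA-256 hashes of known-bad files. The byte string standing in for malware and the names `blacklist` and `is_blacklisted` are made up for illustration. Appending a single padding byte produces a different digest, so the exact-match lookup misses:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-in for a known-bad file (not real malware).
malware = b"MZ" + b"\x90" * 64 + b"pretend-malicious-payload"

# Hypothetical blacklist: hashes of files already known to be bad.
blacklist = {sha256_hex(malware)}


def is_blacklisted(data: bytes) -> bool:
    """Exact-match lookup of the file's hash in the blacklist."""
    return sha256_hex(data) in blacklist


# "Sprinkle" one unimportant byte onto the end of the file.
variant = malware + b"\x00"

print(is_blacklisted(malware))  # the original is caught
print(is_blacklisted(variant))  # the padded copy is not
```

A real attacker would put the padding somewhere that does not affect execution, but the effect on an exact-match lookup is the same.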

newton saber

Wow, really? I thought a hash of a file was extremely difficult to defeat. I mean using a proper (uncracked) hash algorithm. I know MD5 was cracked but others? Interesting.

Lost User

It's hard to generate a collision, but it's trivial to generate a non-collision.

Edit: OK, so I re-read your original post. Do you mean make a database of known-OK files? That wouldn't be easy to get past. It would also be completely unworkable, though.

Sascha Lefevre

Harold, I think you misunderstood him. Never mind, I saw your edit :)

And I think newton.saber overlooks one detail: while checking the actual hash against the expected hash obtained from an official site would probably work to detect automatic infections, it wouldn't if there was someone who "engineers" a new, infected zip file to match the hashcode of the original.

If the brain were so simple we could understand it, we would be so simple we couldn't. — Lyall Watson

newton saber

Sascha Lefèvre wrote:

it wouldn't if there was someone who "engineers" a new, infected zip file to match the hashcode of the original.

I thought of that, but I didn't think it was feasible if the hash algorithm wasn't cracked. I didn't believe that was possible. Is it programmatically easy? I'm confused by this. Are you saying that if I obtain the hash for a Windows DLL, for example, then someone could create their malicious DLL and sprinkle bytes into it to match the hash of the original DLL, thus taking over my Windows DLL? No one is safe. :)

Sascha Lefevre

A cracked hash algorithm would mean that the solution space of possible input data can be narrowed down from the hash. But you don't need that here. Someone could infect some file in the zip archive with malware and then modify "unimportant" parts of the archive so that it yields a hash collision with the original archive. It wouldn't be trivial, but it's possible (for small archives, potentially impossible).

Dave Kreskowiak

And how are you going to find out if the files you're zipping up in your installation are infected before you post it for everyone to download and install on their machines??

A guide to posting questions on CodeProject

Click this: Asking questions is a skill. Seriously, do it.
Dave Kreskowiak

Lost User

You don't; if you download your OS, you get a checksum. Verify it, and compile and build the OS. Next download the sources for the tools you need.. ;P

Bastard Programmer from Hell :suss: If you can't read my code, try converting it here (X-Clacks-Overhead: GNU Terry Pratchett)

newton saber

Ah, the singularity. This would be agreeable, to say that the original builder of the code:

1. builds the target
2. scans the target for malicious bytes
3. verifies there are no such malicious bytes in his target
4. generates the hash and publishes it along with his target.

No more virus scanning needed. If it matches the hash it _must_ be the same file. But, alas, they are saying this can be hacked. Hmmm...who'da thunk it?
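The downloader's side of that scheme could look roughly like the following Python sketch. It assumes the publisher posts a SHA-256 hex digest next to the download; the function names and the chunk size are illustrative and not taken from any real scanner.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare the file's digest with the hash the publisher posted.

    published_hex is assumed to be a SHA-256 hex string; surrounding
    whitespace and letter case are ignored.
    """
    return file_sha256(path) == published_hex.strip().lower()
```

A match only shows that the file is byte-for-byte the one that was hashed. It says nothing about whether that file was clean at the moment the hash was generated.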

mikepwilson

Yeah it's pretty trivial. It's not a matter of cracking the algorithm.

Dave Kreskowiak

:laugh:

Dave Kreskowiak

newton.saber wrote:

2. scans the target for malicious bytes

With what? A compiler/linker will output an .EXE which a virus can immediately infect, so how are you going to know what the "malicious bytes" are? How are you going to compare what the compiler/linker intended to write to disk with what was actually written? There is a window of time between when the file is written to disk and when the hash algorithm is run against it, during which a virus can infect the file. This is the piece you are forgetting about.

newton.saber wrote:

No more virus scanning needed.

Bull. Granted, virus scanners are not a perfect solution. The entire industry is stuck reacting to each new virus because no technology currently exists that can guarantee that a file written to disk is what a legitimate application intended to write.

newton saber

Dave Kreskowiak wrote:

so how are you going to know what the "malicious bytes"

Uh, isn't that what the virus scanners do? They have signatures of malicious bytes and they scan the bytes in the target to determine if there are bytes that match? Isn't that why virus scanning is so slow? It sounds as if there is no way to ever determine whether or not the code a dev builds is virus-free.
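As a rough illustration of that description, a toy signature scan in Python might look like this. The signature names and byte patterns are invented for the example; real scanners use much larger databases along with unpacking and heuristics, which this sketch does not attempt.

```python
# Hypothetical signature database: name -> byte pattern to look for.
SIGNATURES = {
    "Example.TestSig.A": b"\xde\xad\xbe\xef\x13\x37",
    "Example.TestSig.B": b"EVIL_PAYLOAD_MARKER",
}


def scan_bytes(data: bytes) -> list:
    """Return the sorted names of all signatures found anywhere in data."""
    return sorted(
        name for name, pattern in SIGNATURES.items() if pattern in data
    )


clean = b"just an ordinary file"
infected = b"header" + b"EVIL_PAYLOAD_MARKER" + b"trailer"

print(scan_bytes(clean))
print(scan_bytes(infected))
```

Every file has to be read in full and checked against every pattern, which hints at why scanning a large unzipped archive is slow.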

Dave Kreskowiak

newton.saber wrote:

Uh, isn't that what the virus scanners do?

And you said they were stupid. Maybe, but it's the only solution we have.

newton.saber wrote:

It sounds as if there is no way to ever determine whether or not the code a dev builds is virus-free.

No, there isn't, and this is where "managing risk" comes into play.
