Antivirus firms are concerned about the emergence of techniques that could render meaningless the use of checksums to mark applications as safe.
The issue concerns hash functions - one-way mathematical functions that produce a small fixed-length checksum or message digest from a much longer input, such as a batch of code or an email message. When two different input values produce the same output value, this is called a collision.
Cryptographic researchers have known for more than three years about weaknesses in hashing algorithms such as MD5 that allow collisions to be found much more quickly than would be possible using brute-force attacks.
Previous techniques meant only that one junk message might be mistaken for another junk message, a weakness of interest to cryptographers but one that carried little sting in practice. In addition, high-speed computers were needed to discover collisions.
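As a rough illustration of the definition above, the following Python sketch uses the standard hashlib module. The helper names md5_hex and is_collision are made up for this example and do not come from the researchers or from Symantec.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


def is_collision(a: bytes, b: bytes) -> bool:
    """True when two different inputs produce the same MD5 digest."""
    return a != b and md5_hex(a) == md5_hex(b)


# The digest is the same length no matter how long the input is.
short_digest = md5_hex(b"hello")
long_digest = md5_hex(b"hello" * 100_000)

print(len(short_digest), len(long_digest))  # 32 hex characters each (128 bits)
print(is_collision(b"hello", b"hello" * 100_000))
```

Ordinary inputs like these will not collide; finding a pair for which is_collision returns True is exactly what the attacks described here make practical.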
But a recent post on a full disclosure list explains a method for appending a few thousand bytes to each of two arbitrary files such that both files end up with the same MD5 value. One of the arbitrary files might be malicious. Not only that, but the researchers - Marc Stevens, Arjen K. Lenstra, and Benne de Weger - produced their proof-of-concept files using a single PC in less than two days.
Symantec reports that the approach threatens to undermine the use of hash functions to identify applications as safe (whitelisting). Malware authors might get harmless code, which generates the same MD5 output as a companion (malicious) app, whitelisted by submitting it to a classification server. Such a technique would clear the way to later distribute the companion malicious application, which generates an MD5 result previously flagged as safe.
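To see why a collision matters here, consider a minimal sketch of a digest-based whitelist in Python. It is not any vendor's implementation: the HashWhitelist class, its methods and the sample byte strings are all hypothetical, and the only assumption is that the tool remembers digests rather than whole files.

```python
import hashlib


class HashWhitelist:
    """Hypothetical whitelist that stores only the digests of approved files."""

    def __init__(self, algorithm: str = "md5") -> None:
        self.algorithm = algorithm  # any name accepted by hashlib.new()
        self._approved = set()

    def _digest(self, data: bytes) -> str:
        return hashlib.new(self.algorithm, data).hexdigest()

    def approve(self, data: bytes) -> None:
        """Record a file's digest as safe."""
        self._approved.add(self._digest(data))

    def is_approved(self, data: bytes) -> bool:
        """Check a file against the stored digests only."""
        return self._digest(data) in self._approved


whitelist = HashWhitelist("md5")
whitelist.approve(b"harmless installer bytes")

print(whitelist.is_approved(b"harmless installer bytes"))  # True
print(whitelist.is_approved(b"some other program"))        # False
```

Because is_approved never sees the original file, any second file with the same MD5 digest as an approved one would pass the check. Constructing with HashWhitelist("sha256") keeps the same design but relies on a hash with no known practical collisions.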
The approach is far from trivial but creates a means to smuggle malicious apps past whitelisting tools. Both the malicious and harmless apps might be digitally signed to make the malware look even more harmless.
"While what they have achieved is not the same as producing an identical MD5 for an existing file, it's still not a good thing. In particular it causes serious trouble for application white-listing implementations," Symantec notes.
Looking for extra bytes might seem a common-sense means of detecting the trick. But the extra bytes may look like compressed data in an installer application, or some kind of signature, so that approach to solving the problem is unreliable.
MD5 is not the only hashing function known to have cryptographic weaknesses. SHA-1 also has known collision weaknesses and is thus potentially subject to the same kinds of trickery. The solution might be to move towards more robust hashing algorithms such as SHA-2, Symantec researcher Peter Ferrie concludes. ®