This post goes into a little more detail on how our new Cryptam system detects malware documents by identifying the encrypted executables inside them. A document exploit needs to install an executable, and that executable is usually either obfuscated and embedded in the document or downloaded from a remote site.
In our experience, APT-type targeted email attacks, or "spear phishing" attacks, most commonly embed the executable trojan within the document exploit. We commonly see everything from plaintext executables to obfuscation using XOR keys of 1 to 1024 bytes, counters, and ROL/ROR bit rotation.
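The obfuscation primitives listed above can be sketched in a few lines of Python. This is an illustration only: the function names, the sample payload, and the order in which the operations are combined are assumptions, not taken from any particular sample.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte with a repeating multi-byte key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def xor_counter(data: bytes, start: int = 0) -> bytes:
    """XOR with a one-byte counter that increments per byte (00, 01, ... FF, 00, ...)."""
    return bytes(b ^ ((start + i) & 0xFF) for i, b in enumerate(data))


def rol8(value: int, shift: int) -> int:
    """Rotate an 8-bit value left by `shift` bits."""
    shift %= 8
    return ((value << shift) | (value >> (8 - shift))) & 0xFF


def ror8(value: int, shift: int) -> int:
    """Rotate an 8-bit value right by `shift` bits."""
    return rol8(value, 8 - (shift % 8))


# Hypothetical payload and key, chosen only for the demonstration.
payload = b"MZ\x90\x00This program cannot be run in DOS mode"
key = bytes([0xDE, 0xAD, 0xBE, 0xEF])

# Obfuscate: XOR with the key, then rotate each byte left by 3 (assumed order).
obfuscated = bytes(rol8(b, 3) for b in xor_bytes(payload, key))

# Reverse: rotate right by 3, then XOR with the same key.
restored = xor_bytes(bytes(ror8(b, 3) for b in obfuscated), key)
assert restored == payload
```

Note that XOR-ing a run of null bytes with a key simply writes the key into the file, which is what makes these schemes statistically recoverable.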
Conventional AV typically fails to detect malware documents: the exploit shellcode is usually heavily packed, and XOR encryption creates a huge number of variants of any potential signature. As a result, AV detection usually ends up being hash based and lags behind the attacks, with new attacks getting only 10-20% detection on VirusTotal.
Our Cryptam system uses the entropy of the file content to ignore legitimate content and focus on the higher-entropy sections. From those sections it statistically calculates the key used, based on the position in the document and the occurrences at that position, all in a single read pass. Unlike the brute-force approach other systems use, this method is extremely fast.
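A minimal sketch of this idea follows. It is not Cryptam's actual code: the block size, the entropy threshold, the known key length, and the function names are all assumptions made for illustration. It also assumes that the most common plaintext byte in an embedded executable is 0x00, so the most frequent byte at each key position is the key byte itself.

```python
import math
import random
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a block in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    return -sum((n / total) * math.log2(n / total) for n in Counter(block).values())


def guess_xor_key(data: bytes, key_len: int,
                  block_size: int = 1024, min_entropy: float = 2.0) -> bytes:
    """One pass over the data: tally byte values per key position,
    skipping low-entropy blocks, and return the most frequent value
    at each position. Threshold and block size are assumed values."""
    tallies = [Counter() for _ in range(key_len)]
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        if shannon_entropy(block) < min_entropy:
            continue  # e.g. null padding in the host document
        for offset, value in enumerate(block):
            tallies[(start + offset) % key_len][value] += 1
    return bytes(t.most_common(1)[0][0] if t else 0 for t in tallies)


# Synthetic test data: a fake "executable" that is about 60% null bytes,
# XOR-ed with a 256-byte 00-FF key and appended to 16 KB of null padding.
rng = random.Random(0)
plain = bytes(rng.randrange(256) if rng.random() < 0.4 else 0 for _ in range(8192))
key = bytes(range(256))
cipher = bytes(b ^ key[i % len(key)] for i, b in enumerate(plain))
document = b"\x00" * 16384 + cipher

recovered = guess_xor_key(document, 256)
assert recovered == key
```

The positions here are taken from the absolute file offset, so the recovered key comes out aligned only because the payload starts on a multiple of the key length; in general the result would be a rotation of the key. Without the entropy filter, the null padding in this sample would outvote the key bytes at every position.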
Cryptam can also be customized with exploit/shellcode signatures written as regexes, as well as embedded executable signatures: Windows/Mac/Linux executable traits or library references, which are scanned for using the calculated XOR key and variations of ROL.
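As a rough illustration of that scanning step, the sketch below applies a recovered key and then tries all eight ROL variations before matching regex signatures. The signature set, the function names, and the XOR-then-rotate order are hypothetical; Cryptam's real signatures are not listed here.

```python
import re
from typing import List, Tuple

# Hypothetical signatures for illustration only.
EXECUTABLE_SIGNATURES = {
    "pe_dos_stub": re.compile(rb"This program cannot be run in DOS mode"),
    "kernel32_ref": re.compile(rb"kernel32\.dll", re.IGNORECASE),
    "elf_header": re.compile(rb"\x7fELF"),
}


def rol8(value: int, shift: int) -> int:
    """Rotate an 8-bit value left by `shift` bits."""
    shift %= 8
    return ((value << shift) | (value >> (8 - shift))) & 0xFF


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte with a repeating multi-byte key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def scan_with_key(data: bytes, key: bytes) -> List[Tuple[str, int, int]]:
    """Return (signature name, ROL amount, offset) for every match."""
    hits = []
    unxored = xor_bytes(data, key)
    for rotation in range(8):
        # A 256-entry lookup table applies the rotation to the whole buffer.
        table = bytes(rol8(v, rotation) for v in range(256))
        candidate = unxored.translate(table)
        for name, pattern in EXECUTABLE_SIGNATURES.items():
            for match in pattern.finditer(candidate):
                hits.append((name, rotation, match.start()))
    return hits


# Demonstration with a made-up sample: rotate left by 5, then XOR with the key.
payload = b"MZ" + b"\x00" * 62 + b"This program cannot be run in DOS mode"
key = bytes(range(256))
sample = xor_bytes(bytes(rol8(b, 5) for b in payload), key)
hits = scan_with_key(sample, key)
```

Because the key is already known, only eight candidate buffers have to be searched, rather than one per possible key.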
Example dispersion and key detection of 256 byte key, 00-FF:
Example dispersion and key detection of 256 byte key, FF-00:
Example dispersion and key detection of 256 byte key, algorithmic:
The above graphs show the highest-occurrence byte values over 1024 bytes in red, with the 256-byte key overlaid in black. In most cases where a 256-byte key is used, the highest occurrences will be obvious, with the key pattern repeating 4 times over the 1024 bytes.
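The repetition visible in these graphs can also be checked programmatically. The sketch below is an assumption about how one might do it, not a description of Cryptam itself: it builds the most-frequent-byte profile over 1024 positions, then looks for the shortest length whose repetition reproduces that profile. Candidate lengths are limited to powers of two, since those tile a 1024-byte span exactly, and the 90% match threshold is an arbitrary choice.

```python
import random
from collections import Counter


def mode_profile(data: bytes, span: int = 1024) -> bytes:
    """Most frequent byte value at each offset modulo `span`."""
    tallies = [Counter() for _ in range(span)]
    for i, value in enumerate(data):
        tallies[i % span][value] += 1
    return bytes(t.most_common(1)[0][0] if t else 0 for t in tallies)


def smallest_period(profile: bytes, min_match: float = 0.9) -> int:
    """Shortest power-of-two prefix that, when repeated, matches at least
    `min_match` of the profile. Returns len(profile) if none does."""
    span = len(profile)
    length = 1
    while length < span:
        matches = sum(profile[i] == profile[i % length] for i in range(span))
        if matches / span >= min_match:
            return length
        length *= 2
    return span


# Synthetic data: about 60% null bytes, XOR-ed with a 256-byte FF-00 key.
rng = random.Random(1)
plain = bytes(rng.randrange(256) if rng.random() < 0.4 else 0 for _ in range(32768))
key = bytes(range(255, -1, -1))
cipher = bytes(b ^ key[i % len(key)] for i, b in enumerate(plain))

profile = mode_profile(cipher)
period = smallest_period(profile)
assert period == 256  # the key pattern repeats 4 times across 1024 positions
```

A clean document would be expected to show no such period, so the search would fall through to the full span.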
A clean document, by contrast, will appear as random noise: