Tuesday, June 15, 2010

Antivirus Struggling with Obfuscated JavaScript

As part of our offline research, we regularly test various desktop antivirus (AV) solutions to determine how effective they are at catching web-based threats. One area where I feel AV has struggled is identifying malicious content in obfuscated JavaScript. While obfuscated JavaScript can be an indication of malicious content, such as injected IFRAME attacks, the technique is also regularly used by legitimate sites, most notably by online advertising vendors. Legitimate sites obfuscate JavaScript for two reasons. First, code may be obfuscated to reduce its size, by removing whitespace and shortening variable names, so that it is smaller to download. Second, sites use obfuscation to 'protect' code by making it harder to understand and therefore harder to copy. This latter motivation is of minimal value, however, as client-side content can always be de-obfuscated with a basic level of effort. After all, the browser needs to interpret the code at some point.
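Because the browser must eventually receive plain code, recovering it can be as simple as letting the snippet run against a stand-in `document` whose `write()` records its argument instead of rendering it. The sketch below assumes Node.js; `captureWrites` is a hypothetical helper and the snippet is made up for this example. Note that `new Function` is not a sandbox, so genuinely suspicious samples belong in an isolated environment.

```javascript
// Minimal sketch: capture what an obfuscated snippet would hand to the
// browser by giving it a stub `document` that records write() calls.
function captureWrites(source) {
  const written = [];
  const fakeDocument = {
    write: (html) => { written.push(String(html)); },
  };
  // The stub is passed in as a parameter named `document`, so the snippet's
  // document.write() calls reach it. This is NOT a sandbox: the snippet can
  // still reach every global, including unescape(), which it relies on.
  const run = new Function("document", source);
  run(fakeDocument);
  return written.join("");
}

// A tiny obfuscated snippet built for this example (not from About.com).
const snippet = "document.write(unescape('%3C%62%3E%68%69%3C%2F%62%3E'))";
console.log(captureWrites(snippet)); // <b>hi</b>
```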

JavaScript obfuscation presents a challenge for AV vendors. Despite what marketing literature would suggest, detecting malicious content still relies heavily on static signatures, and signatures cannot detect what they cannot see. This leaves two options: the AV engine can first de-obfuscate the JavaScript, or signatures can be written for the obfuscated content. The latter is problematic because even a slight change in the content or the encoding algorithm can produce vastly different output, while de-obfuscation is an imperfect science, as anyone who has used tools such as Malzilla can attest.
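A small sketch of the trade-off, assuming a hypothetical literal signature (`SIGNATURE`) and a made-up payload: the same HTML escaped two different ways yields two different strings, so a signature written against one encoding misses the other, while matching after decoding catches both. `hexEscapeAll` is a hypothetical helper that handles ASCII input only.

```javascript
// Hypothetical "signature": a literal string an engine might look for.
const SIGNATURE = "<iframe src=";

// Escape every character as %XX, the style used on the About.com page.
// ASCII input only: each char code is assumed to fit in two hex digits.
function hexEscapeAll(text) {
  return Array.from(text, (ch) =>
    "%" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

const payload = '<iframe src="http://example.com/"></iframe>';
const encodedA = hexEscapeAll(payload);       // every character escaped
const encodedB = encodeURIComponent(payload); // only reserved characters escaped

// Option 1: a signature over the encoded text breaks when the encoding changes.
console.log(encodedA.includes(hexEscapeAll(SIGNATURE))); // true
console.log(encodedB.includes(hexEscapeAll(SIGNATURE))); // false

// Option 2: decode first, then match; both variants are caught.
console.log(decodeURIComponent(encodedA).includes(SIGNATURE)); // true
console.log(decodeURIComponent(encodedB).includes(SIGNATURE)); // true
```

The decoding step here is trivial; in practice it is the hard part, since real samples stack several layers and custom decoder functions.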

I recently ran across an obfuscated JavaScript sample that illustrates this challenge. The code in question is perfectly legitimate and resides at About.com, a popular website that publishes information on a wide variety of topics. On its APR Calculator page, the page source contains a large block of obfuscated JavaScript:

<script language="javascript">document.write(unescape('%3C%73%63%72%69%70%74%20%6C%61%6E%67%75%61%67%65%3D%22%6A...
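The first layer is plain `%XX` hex escaping. Decoding just the visible prefix of the string above shows that it writes out yet another `<script>` tag. This sketch uses `unescape()`, the function the page itself calls; it is deprecated but still available in browsers and Node.js.

```javascript
// The visible prefix of the About.com snippet (the rest is elided above).
const prefix = "%3C%73%63%72%69%70%74%20%6C%61%6E%67%75%61%67%65%3D%22%6A";

// unescape() turns each %XX into the character with that hex code.
// decodeURIComponent() gives the same result here because every
// escaped byte is plain ASCII.
console.log(unescape(prefix));           // <script language="j
console.log(decodeURIComponent(prefix)); // <script language="j
```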

At first glance, this appears to be a classic case of malicious content injected into an otherwise legitimate page. It is essentially a large block of hex-encoded characters, and it takes a couple of passes to decode everything fully. However, once de-obfuscation is complete, you uncover JavaScript code designed to calculate your mortgage payments...not attack your browser. For those interested in seeing the code in all its glory, please follow the links below.
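Later passes are not always plain hex. A string such as `dF('%264Dtdsjqu%2631%2631uzqf...')` hex-decodes to `&4Dtdsjqu&31&31uzqf`, which is still unreadable because each character appears to be shifted up by one (`tdsjqu` is `script`). The sketch below assumes, without having seen `dF`'s source, that it unescapes its argument, subtracts a fixed offset from every character code, and unescapes again; `decodeShifted` and the offset of 1 are inferences for illustration only.

```javascript
// Hypothetical decoder, ASSUMING dF() works as:
//   1) unescape the argument,
//   2) subtract a fixed offset from every character code,
//   3) unescape the result again.
// The offset of 1 is inferred from "tdsjqu" -> "script".
function decodeShifted(encoded, shift) {
  const firstPass = unescape(encoded); // "%26" -> "&", so "&4D", "&31", ...
  let shifted = "";
  for (const ch of firstPass) {
    shifted += String.fromCharCode(ch.charCodeAt(0) - shift); // "&4D" -> "%3C"
  }
  return unescape(shifted); // "%3C" -> "<", "%20" -> " "
}

// Leading fragment of a dF(...) argument from this sample.
const fragment = "%264Dtdsjqu%2631%2631uzqf";
console.log(JSON.stringify(decodeShifted(fragment, 1))); // "<script  type"
```

A single hex-decoding pass stops after step 1, which is why its output still looks like gibberish.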

Obfuscated/De-obfuscated JavaScript files
Despite the benign nature of this JavaScript code, at the time this post was published no fewer than 18 of 41 antivirus engines flagged it as malicious. What does this tell us about how the engines operate? Clearly, they are inspecting the obfuscated code rather than the de-obfuscated code, and flagging it based on the presence of certain functions or data. The problem with this approach is that while JavaScript obfuscation is used by the bad guys, it is used by the good guys as well. Identifying malicious content by statically inspecting obfuscated JavaScript is overly simplistic and, as this example shows, is bound to trigger false positives.
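To illustrate why such a rule misfires, here is a sketch of a hypothetical heuristic that flags any `document.write(unescape(...))` call, compared with an equally hypothetical check on the decoded output. Both samples are made up, and the `<iframe` test is a toy stand-in; real engines are more complex than either rule.

```javascript
// Hypothetical rule: flag any script that writes unescape()d data into
// the page. It never looks at what is actually written.
const SUSPICIOUS_CALL = /document\.write\s*\(\s*unescape\s*\(/;

function naiveFlag(source) {
  return SUSPICIOUS_CALL.test(source);
}

// Pull out the single-quoted argument passed to unescape() and decode it.
// Assumes one unescape('...') call per snippet, as in the samples below.
const UNESCAPE_ARG = /unescape\s*\(\s*'([^']*)'\s*\)/;

function decodedHasIframe(source) {
  const match = UNESCAPE_ARG.exec(source);
  return match !== null && /<iframe/i.test(unescape(match[1]));
}

// Illustrative samples made up for this sketch.
const benign = "document.write(unescape('%3Cscript%3Ecalc()%3C%2Fscript%3E'))";
const malicious =
  "document.write(unescape('%3Ciframe%20src%3D%22http%3A%2F%2Fexample.com%2F%22%20width%3D0%20height%3D0%3E'))";

console.log(naiveFlag(benign), naiveFlag(malicious));               // true true
console.log(decodedHasIframe(benign), decodedHasIframe(malicious)); // false true
```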

- michael

1 comment:

Anonymous said...

Hi Michael.

Very interesting post!

I have followed your steps using Malzilla, but I'm lost at the second step... How do you de-obfuscate the string "dF('%264Dtdsjqu%2631%2631uzqf......"?
If I choose [Decode Hex %] in Malzilla, the result is not right.

Thx