Automated Malware Analysis

Cuckoo Sandbox 2.0 Release Candidate 1

  • January 21, 2016
  • Claudio Guarnieri
/assets/images/blog/200rc1/6nl3pnz.jpg

The time has come for a new release of Cuckoo Sandbox, version 2.0 RC1. This release comes just shy of 10 months after our 1.2 release, but development for 2.0 had already started over one and a half years ago.

Because we consider Cuckoo Sandbox 2.0 to be our largest release yet, and because a number of features are still in an alpha or beta stage, we decided to start the release process with Release Candidate 1. In practice this means that users will see a couple more Release Candidates in the upcoming months before we hit 2.0 stable. Through this process we will be able to identify and fix bugs, extend the existing features, and complete the ones that are still unfinished. In other words, we invite our users to check this version out, use it and test it, and help us reach 2.0 stable faster. Please note: as mentioned, a few features are incomplete, and some have broken in the process (e.g. the web interface's search), so be aware of this before deploying this version in any production environment.

TL;DR: What's New?

In this blog post we will go through the details of some of the most interesting new additions to Cuckoo Sandbox 2.0-rc1, but for those who get bored quickly, here's a short list of what has been introduced in this release:

  • Monitoring 64-bit Windows applications and samples.
  • Mac OS X, Linux, and Android analysis support.
  • Integration with Suricata, Snort, and Moloch.
  • Interception and decryption of TLS/HTTPS traffic.
  • Per analysis network routing including VPN support.
  • Over 300 signatures for isolating and identifying malicious behavior.
  • Volatility baseline capture to highlight the changes during the analysis.
  • Extraction of URLs from process memory dumps.
  • Possibility to run extra services in separate VMs next to the analysis.
  • Maliciousness scoring - does this analysis show malicious behavior?
  • Many bug fixes, tweaks, and automation improvements.

64-bit analysis on Windows

Using the new monitoring component that we have been developing for the past one and a half years, it has been possible to analyze 64-bit samples and applications for quite a while. In fact, in the case of 64-bit Internet Explorer 8 on 64-bit Windows 7, one gets even more results than with the 32-bit version of Internet Explorer, as the new monitor intercepts a small set of HTML DOM and JavaScript functions.

In the following image we are looking at an HTML document with the JavaScript code as displayed embedded in a <script> tag. Right after evaluating the JavaScript block we immediately see a call to CHyperlink_SetUrlComponent. This is a rather descriptive name, but it is the one that matches the PDB symbols of mshtml.dll. As one might conclude from the JavaScript, yes, this is the underlying call made when assigning a new hyperlink to an HTML a tag. Right after that we see CScriptElement_put_src, the method for updating the URL of an HTML script element. It should be fairly obvious that with this functionality we are able to see all interesting JavaScript behavior as it runs dynamically and unpacks itself.

/assets/images/blog/200rc1/GS5rrJL.png

This is just one example of what the new analysis instrumentation can achieve. We will go into more depth on many of the new features it introduces in following blog posts.

Mac OS X Analysis

As part of GSoC 2015 (Google Summer of Code) Dmitry Rodionov built a wonderful Mac OS X Analyzer for Cuckoo Sandbox. As OS X analysis depends on having a functional OS X virtual machine, you will either have to run Mac OS X as a host system, or alternatively use a Hackintosh VM. Please be aware that the latter might be a breach of Apple's Terms of Service. We take no responsibility.

The OS X Analyzer is based on DTrace, a powerful dynamic tracing framework built right into the OS X kernel, which is capable of tracing user-land processes as well as in-kernel activity. DTrace comes with its own scripting language (which is basically a subset of C), and in order to facilitate the configuration process, the analyzer comes with a DTrace code generator based on a precompiled list of APIs of interest.
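To give a rough idea of what such a code generator does, here is a minimal sketch in Python. It is not the analyzer's actual generator: the API list, the template, and the function name are made up for illustration, and the only DTrace pieces it relies on are the standard pid provider and the built-in probefunc and arg0 variables.

```python
# Hypothetical list of APIs of interest; the real analyzer ships its own list.
APIS_OF_INTEREST = ["open", "connect", "execve"]

# One D clause per API: fire on function entry in the traced process and
# print the function name together with its first argument.
PROBE_TEMPLATE = (
    "pid$target::{api}:entry\n"
    "{{\n"
    '    printf("%s(0x%x)\\n", probefunc, arg0);\n'
    "}}\n"
)


def generate_dtrace_script(apis):
    """Return D source text containing one entry probe per API name."""
    return "\n".join(PROBE_TEMPLATE.format(api=api) for api in apis)


script = generate_dtrace_script(APIS_OF_INTEREST)
print(script)
```

The generated text could then be saved to a file and handed to dtrace together with the target process. A real generator would also need per-API argument types and return probes, which this sketch leaves out.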

Linux Analysis

In the meantime Mark Schloesser has been focusing his efforts on providing Cuckoo with proper Linux analysis. Using a couple of slick SystemTap scripts Cuckoo has learned how to properly analyze the latest samples that were dropped as part of Shellshock and ElasticSearch exploit rounds.

In theory Linux analysis is pretty simple - just trace syscalls executed by the target binary and its child processes. There are a few existing projects such as Sysdig, LTTng and SystemTap that allow us to do this, and they mostly make use of mainline kernel tracing subsystems in order to monitor the kernel. Sadly we start to run into issues when we want to cover multiple architectures. Some approaches work on x86, some on x64, some on both. It's an even bigger problem when you extend to ARM, MIPS and other platforms. In addition, some malware requires a specific environment, for example when it targets embedded devices. We looked at malware that needs an OpenWRT environment and were able to prepare that in Cuckoo and analyze the malware.

In the end the current Linux analyzer uses SystemTap, which is not our favorite design, but it worked relatively well across all platforms. In order to run the non-native platforms we implemented a QEMU machinery module, but the x86/x64 analysis can also be done with VirtualBox / KVM / etc. The VM needs to run our Python agent as always, and for system call traces it needs SystemTap or at least the "staprun" tool together with a precompiled SystemTap kernel module. The analyzer can also fall back to strace, but that has been shown to lose track of child processes and we also did not implement a parser for its output. For SystemTap traces we parse the output and thus can display it exactly like the Windows API logs in the web interface.
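As an illustration of the parsing step, the sketch below turns one syscall trace line into a dictionary that could be rendered like an API log entry. The line layout (process[pid] syscall(args) = retval) is an assumption made for this example and not necessarily what the SystemTap scripts shipped with Cuckoo emit; the regular expression and field names are hypothetical too.

```python
import re

# Assumed (hypothetical) line layout: process[pid] syscall(args) = retval
LINE_RE = re.compile(
    r"^(?P<process>\S+)\[(?P<pid>\d+)\]\s+"
    r"(?P<syscall>\w+)\((?P<args>.*)\)\s+=\s+(?P<retval>-?\w+)"
)


def parse_trace_line(line):
    """Parse one trace line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return {
        "process": m.group("process"),
        "pid": int(m.group("pid")),
        "api": m.group("syscall"),
        # Naive split: commas inside quoted strings are not handled here.
        "arguments": [a.strip() for a in m.group("args").split(",") if a.strip()],
        "return": m.group("retval"),
    }


entry = parse_trace_line('sh[1234] open("/etc/passwd", O_RDONLY) = 3')
print(entry)
```

Keying every entry on the pid is what makes it possible to group calls per process, including child processes, which is exactly where the strace fallback falls short.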

There are quite a few areas that could be improved about this Linux implementation - but it's simple and works for most of the samples we looked at.

Android Analysis

Now that we cover most major platforms for analysis, Android naturally could not stay behind. Thanks to a lot of work from Idan Revivo the Cuckoo team has been able to integrate Android analysis. Idan still actively maintains his original version of Cuckoo with Android analysis, also known as Cuckoo Droid, adding new signatures as new malicious Android samples are found.

Cuckoo Droid is based on running the Android emulator through adb (and therefore also supports running analyses on actual/native Android devices!) and intercepts behavior from samples by hooking into the Dalvik/Java runtime. To perform this interception a monitor, Droidmon, has been developed. Through the use of the Xposed framework it is loaded into every new application, where it overwrites and logs various Java functions.

Quite a few functions are intercepted and more can be added by simply adding some new entries to the following JSON file.

Integration with PCAP analysis tools

As a majority of our users is especially interested in generated network traffic (think CERT/IR teams) we could not miss out on the opportunity of integrating Suricata, Snort, and Moloch for PCAP analysis. (Note that Will Metcalf had already integrated Suricata and Moloch support in his Cuckoo fork a while back, but here we are as well).

Below we see the Suricata output in Cuckoo from a PCAP that we have imported manually from Malware Analysis Traffic. As can be seen there are a couple of Exploit Kit related alerts.

Now in order to determine whether we have seen any of these Suricata signatures (note that SID is the Suricata ID for the rule that matched), IP addresses, or domain names (if you go to their respective tab) before, we can simply click on their hyperlink. This takes us to the Moloch web interface, where Cuckoo automatically performs a query that matches only the exact criteria we are interested in.

Within seconds one will be able to see other Cuckoo analyses which matched the given IP address / domain name / SID / etc. Note that Moloch is not only able to process PCAP files but can also be used to capture the traffic of an entire company (which is actually its main purpose), so the searching capabilities with Moloch are endless. As one community project to another we also took the opportunity to report a remote stack buffer overflow, a couple of cross-site scripting vulnerabilities, and some out-of-bounds read crashes in the Moloch project, improving the stability of Moloch and thus also benefiting the Cuckoo users who use Moloch. More information can be found in this commit.

We would show Snort output as well, but unlike Suricata, where you can quickly analyze a PCAP through their unix socket support, it is required to run Snort separately for each PCAP analysis, making it a CPU intensive process to do so (taking up to 30 seconds for one processor at 100% usage - for your information this is more than the CPU performance required for doing the actual analysis in a VM).

Finally it should be noted that the usability of both Suricata and Snort is based entirely on their ruleset. Fortunately Emerging Threats (in their signatures referred to as ET as can be seen in the Suricata screenshot) provides tens of thousands of rules for free. Many of these do not really apply for our use-case, but there is definitely a gold mine of free information up for grabs that we take advantage of here :-)

HTTP/HTTPS Decryption and Parsing

Continuing with the network part of this blog post, there have been some interesting developments regarding HTTP/HTTPS traffic. Namely, as Cuckoo has been doing for over half a year now, it is able to extract TLS Master Secrets. Put in layman's terms: by intercepting the encryption keys for TLS traffic we are effectively able to decrypt HTTPS traffic. As can be read in the TLS Master Secrets blog post, a file called tlsmaster.txt will be created for each analysis which, when loaded with its associated PCAP in Wireshark, will decrypt HTTPS traffic.

However, decrypting traffic with Wireshark is one thing, but being able to automatically extract and show HTTP and HTTPS streams from a PCAP file in the Cuckoo web interface is another. In order to do so we have created a new library called HTTPReplay. With this library enabled (i.e., installed) looking at HTTPS traffic is as simple as going to the HTTP/HTTPS tab. Below is a screenshot of a local Dutch bank, ING. (Keep in mind that even without HTTPS decryption enabled we would still see all HTML DOM and JavaScript events come by as outlined in the 64-bit analysis paragraph).

One notable fact about our approach is that it intercepts TLS in a transparent way. We do not need to install a certificate on the VM in order to decrypt traffic - we only require the ability to extract the TLS Master Secrets, and obviously we require the PCAP file, which we cross-reference with the encryption keys. As a result, HTTPS decryption in Cuckoo Sandbox also works with applications that have Certificate Pinning enabled. (Just as with any other generic approach it does not support TLS interception of applications that ship their own statically linked SSL/TLS library).
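For readers who want to post-process such a file themselves, the following sketch parses it into a session-to-secret mapping. It assumes each line uses the "RSA Session-ID:<hex> Master-Key:<hex>" form that Wireshark accepts in a key log file; that the generated tlsmaster.txt follows exactly this layout is an assumption here, and the function name is made up.

```python
import re

# Assumed line layout (one of the forms Wireshark accepts in a key log file):
#   RSA Session-ID:<hex> Master-Key:<hex>
ENTRY_RE = re.compile(
    r"^RSA Session-ID:(?P<sid>[0-9a-fA-F]+) Master-Key:(?P<key>[0-9a-fA-F]+)$"
)


def parse_tls_master_secrets(text):
    """Map each TLS session ID (hex) to its master secret (hex)."""
    secrets = {}
    for line in text.splitlines():
        m = ENTRY_RE.match(line.strip())
        if m:
            secrets[m.group("sid").lower()] = m.group("key").lower()
    return secrets


# Shortened, made-up values; real master secrets are 48 bytes (96 hex digits).
sample = "RSA Session-ID:ab12 Master-Key:00ff\nnot a key line\n"
secrets = parse_tls_master_secrets(sample)
print(secrets)
```

The session ID is the piece that ties a secret to a specific TLS session in the PCAP, which is what makes the cross-referencing between keys and captured traffic possible.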

/assets/images/blog/200rc1/gXF2Orf.png

To be fair the name of the library does not really match its current usage, but one of its ultimate goals is to seed it with PCAP files (if required including TLS Master Secrets and maybe even the PRNG state of, say, JavaScript's Math.random() functionality) and turn it into a web server that perfectly replays the traffic as seen in the PCAP. This could in turn be used for unit testing exploit attempts against different versions of different browsers etc.

Per-Analysis Network Routing

After our users have been struggling with network routing in Cuckoo for about five years, it was time for us to step up our game. With some help from Erik Kooijstra and n3sfox we made quick progress. Through a couple of simple configuration options one can define a default dirty line and one or more VPNs, and in a future release it will be possible to route to services such as FakeNet and InetSim as well. Keep in mind that this functionality is currently only supported on Ubuntu/Debian and that it requires running an extra script shipped with Cuckoo as root - this script will run specific commands as instructed by Cuckoo (all of this so Cuckoo itself can be run as non-root, as we recommend).

While late in delivery to you all, Christmas Doge brought us this nifty new feature:

/assets/images/blog/200rc1/twvA9JT.png

Over 300 Cuckoo Signatures

While in the process of getting more accurate and actionable data we have also been putting in a fair bit of work on improving and adding new Signatures. With a special thanks to RedSocks, who contributed over 200 new Signatures, we are now running over 300 Signatures on each analysis.

You can download them with:

cuckoo $ ./utils/community.py -waf

In addition to that we have added a basic maliciousness score to each report to quantify an average level of suspiciousness derived from the patterns identified by the available signatures. Do note that a low score does not indicate a benign sample per se, but that a higher score definitely does indicate potential malware. In fact, from our perspective, a malicious sample with a low score is more interesting than a sample with a score of 6 to 10, as in the latter case we know right away that it is malicious.
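The exact formula is not spelled out in this post, so the following is only a sketch of how such a score could be derived from matched signatures. Both the idea of summing signature severities and the scale factor of 5.0 are assumptions made for illustration, as are the signature names.

```python
def maliciousness_score(matched_signatures, scale=5.0):
    """Sum the severities of all matched signatures and scale the result.

    Assumption: the real scoring may weigh signatures differently. Note that
    nothing here caps the result at 10.
    """
    return sum(sig["severity"] for sig in matched_signatures) / scale


# Made-up signature matches for illustration.
matched = [
    {"name": "hypothetical_persistence", "severity": 3},
    {"name": "hypothetical_injection", "severity": 2},
]
print(maliciousness_score(matched))
```

With an uncapped sum like this, a sample that triggers many high-severity signatures can end up above 10.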

In the following screenshots we are looking at a Poweliks sample - a sample performing so well in our sandbox that it scores more than 10 out of 10 points.

/assets/images/blog/200rc1/i9FIILa.png /assets/images/blog/200rc1/GAeKHUW.png

Volatility Baseline Capture

Our list of TODO items is virtually never ending, but usually when users or potential contributors reach out with specific feature requests or suggestions, we try to prioritize them. Thanks to Bart Mauritz and Joshua Beens, who made a Proof of Concept on the creation of baseline captures for each VM, Cuckoo is now able to differentiate between the Volatility results captured after an analysis and the Volatility results captured without any analysis at all.

The baseline processing module will pretty much subtract the two different Volatility result sets from each other resulting in a quick overview of the new Volatility results after the analysis and results that are no longer present after the analysis. As an example, one will be quickly able to see which services got stopped, which kernel drivers were added, etc.
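Conceptually, this subtraction is a set difference in both directions. The sketch below shows the idea on a single result set of process names; the function name and sample values are made up, and the real module works on full Volatility result structures rather than plain strings.

```python
def baseline_diff(baseline, after):
    """Return entries that appeared and entries that disappeared."""
    baseline, after = set(baseline), set(after)
    return {
        "added": sorted(after - baseline),
        "removed": sorted(baseline - after),
    }


# Made-up process lists for illustration.
diff = baseline_diff(
    baseline=["svchost.exe", "YahooAUService.exe"],
    after=["svchost.exe", "pythonw.exe"],
)
print(diff)
```

Anything present in both captures drops out, leaving only what changed during the analysis.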

In the following screenshot we are looking at the baseline difference of an analysis of http://www.google.com/. There is nothing special about that, but it does show that some random Yahoo-related processes disappeared, some other random search processes were started, and that pythonw.exe was started as well. This last pythonw is started - and keeps running until the end - in order to guide the analysis as it progresses. More notably, there are no new Internet Explorer processes in the difference. This can be explained by the fact that Internet Explorer was closed/terminated before the end of the analysis, and thus does not show up in the Volatility results.

/assets/images/blog/200rc1/UZvCeai.png

URL Extraction from Process Memory Dumps

For a while we have supported Yara rules - and we still do. But sometimes it is good enough to simply extract URLs from a memory dump, some dropped files, or the submitted binary itself. This has already helped us facilitate and systematize the extraction of Command & Control information from a number of malware families. You can expect some more automation in this regard in future releases.
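A minimal sketch of this kind of extraction is a regular expression run over the raw bytes. The pattern and function name below are illustrative and not taken from Cuckoo; in particular, this version only finds ASCII URLs and would miss UTF-16 strings, which are common in Windows process memory.

```python
import re

# Match http(s) URLs made up of characters that are valid in a URL.
URL_RE = re.compile(rb"https?://[A-Za-z0-9._~:/?#\[\]@!$&'()*+,;=%-]+")


def extract_urls(data):
    """Return the unique ASCII URLs found in a bytes blob, in order of appearance."""
    seen = []
    for m in URL_RE.finditer(data):
        url = m.group(0).decode("ascii")
        if url not in seen:
            seen.append(url)
    return seen


# Made-up blob standing in for a memory dump.
dump = (
    b"\x00\x01http://example.com/gate.php\x00junk"
    b"\xffhttps://example.org/a?b=1\x00http://example.com/gate.php"
)
print(extract_urls(dump))
```

For real dumps one would read the file in chunks rather than loading it whole, and filter out well-known benign URLs before reporting.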

Extra Services as part of an Analysis

To conclude this features preview, we come to one that has been long requested by many of our users. In some circumstances, especially in the case of malware designed to spread and target corporate networks, the sample might attempt to scan, identify, and spread through additional servers available in the local network. Or for example, it might try to access and collect resources from nearby services. From this version, Cuckoo is able to run one or more Virtual Machines next to your standard analysis Virtual Machine in order to mimic a somewhat more realistic and juicy environment.

This functionality is in a very primitive stage, but we are looking forward to supporting some more realistic honeypot scenarios. At the moment it is possible to start one or more VMs to host services such as vulnerable HTTP, SMTP, and FTP servers, but in the future we are looking to properly support Active Directory servers, in order to replicate a realistic corporate environment.

Conclusion

The length of this blog post is just a reflection of the size of the upcoming Cuckoo Sandbox 2.0 release. We are very excited about it, we invested a lot of time and effort in bringing it to you, and we are hopeful that you will welcome these recent developments with as much excitement.

We are looking forward to your feedback, bug reports, and feature requests, and we welcome everybody in our IRC channel (#cuckoosandbox on FreeNode) to discuss the future of this project with us.

In case you missed it, we also launched our new Community platform. We completely replaced the previous site with new software, therefore the old content is temporarily missing. We will try to migrate and restore it in the future.

For the moment, that is all from us. We hope you will enjoy this release.
