Tuesday, September 6, 2011

What I Did on My Summer Vacation

Over the summer I worked at Microsoft Research, which has a fantastically smart bunch of people working on really cool and interesting problems. I just noticed that they've posted the video of my end-of-internship talk, "Monitoring Untrusted Modern Applications with Collective Record and Replay". Please take a look if you're curious about what it might look like to try to monitor mobile apps in the wild with low overhead!

Saturday, May 28, 2011

Paper and Slides Available for "Virtuoso: Narrowing the Semantic Gap in Virtual Machine Introspection"

I've recently returned from Oakland, CA, where the 2011 IEEE Symposium on Security and Privacy was held. There were a lot of excellent talks, and it was great to catch up with others in the security community. Now that the conference is over, I'm happy to release the paper and slides of our work, "Virtuoso: Narrowing the Semantic Gap in Virtual Machine Introspection", which I described in an earlier post.

The slides contain some animations, and so I've made them available in three formats:
You can also get a copy of the full paper here. I'm also hoping to have the source ready for release soon; when it is available, you'll be able to find it on Google Code under the name Virtuoso.

Once again, thanks to my most excellent co-authors at MIT Lincoln Labs and Georgia Tech for helping me see this project through!

Wednesday, April 6, 2011

Applying Forensic Tools to Virtual Machine Introspection

I've just released a technical report summarizing some work I did a couple of years ago that explores the close link between forensic memory analysis and virtual machine introspection.

Abstract: Virtual machine introspection (VMI) has formed the basis of a number of novel approaches to security in recent years. Although the isolation provided by a virtualized environment provides improved security, software that makes use of VMI must overcome the semantic gap, reconstructing high-level state information from low-level data sources such as physical memory. The digital forensics community has likewise grappled with semantic gap problems in the field of forensic memory analysis (FMA), which seeks to extract forensically relevant information from dumps of physical memory. In this paper, we will show that work done by the forensic community is directly applicable to the VMI problem, and that by providing an interface between the two worlds, the difficulty of developing new virtualization security solutions can be significantly reduced.

You can read the full paper on SMARTech. Hopefully this will encourage others to start using great memory analysis tools like Volatility for live analysis of virtual machines!

Tuesday, March 15, 2011

Automatically Generating Memory Forensic Tools

Now that the IEEE Symposium on Security and Privacy program has finally been posted, I can describe some research I've been working on for the past year and a half related to virtual machine introspection (VMI) and memory forensics.

A well-known problem with VMI and memory forensics is the semantic gap. Basically, the information you want out of a memory image or a running VM is high-level (what processes are running, what files are open, and so on), but what you get is a big bunch of uninterpreted bytes (i.e., a view of physical memory). Bridging this gap is what tools like Volatility were built to do, and they do it well.
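To make the semantic gap concrete, here is a minimal Python sketch. The record layout (a 32-bit PID, a 32-bit parent PID, and a 16-byte name) is entirely hypothetical and doesn't match any real OS; the point is that the same 24 bytes mean nothing until you know a layout like this, which is exactly the kind of knowledge a tool like Volatility encodes.

```python
import struct
from typing import NamedTuple

# Hypothetical record layout, little-endian (not any real OS's structure):
#   offset 0: uint32 pid, offset 4: uint32 parent pid, offset 8: char[16] name
PROC_FMT = "<II16s"

class ProcInfo(NamedTuple):
    pid: int
    ppid: int
    name: str

def parse_proc(raw: bytes, offset: int = 0) -> ProcInfo:
    """Interpret the bytes at `offset` using the assumed layout above."""
    pid, ppid, name = struct.unpack_from(PROC_FMT, raw, offset)
    return ProcInfo(pid, ppid, name.rstrip(b"\x00").decode("ascii"))

# What you actually get from an image: bytes with no inherent meaning.
raw = struct.pack(PROC_FMT, 1234, 1, b"explorer.exe")
print(raw.hex())
# What you want: a high-level view, recoverable only with the layout.
print(parse_proc(raw))  # ProcInfo(pid=1234, ppid=1, name='explorer.exe')
```

Real kernels change these layouts between versions, which is why hand-written offsets have to be rediscovered for every new release.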

However, building a tool like Volatility takes a lot of work and a lot of knowledge about the internals of the operating system you're trying to examine. With closed-source operating systems like Windows, this kind of knowledge comes from sources like the Windows Internals book, blog posts, and good old-fashioned reverse engineering. This takes a lot of time, and the process has to be repeated every time there's a new version of Windows or a new operating system you want to support. Volatility's next release will support Vista and Windows 7, but it hasn't been easy: the networking code, for example, was rewritten for Vista, which required some reverse engineering by MHL and a new plugin.

Is there an easier way? What we want, in an ideal world, is some way that we can generate some of these tools automatically, for any OS or version. That's the problem that we set out to solve, and it's one that I think we made some good progress on -- though as with any academic work, there's still lots of room for improvement :)

The basic idea is that many of the tools we want to run on a memory image could be easily coded if we had access to the native APIs on the system – for example, we could easily write something similar to pslist if we had access to the Windows API by doing something like:
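As a rough illustration (a Linux analogue in Python, not the Windows program from the original post), the snippet below gets a process list by asking the OS rather than by parsing memory. It assumes a Linux system where each running process shows up as a numeric directory under /proc; on Windows, the equivalent program would call the Windows API instead.

```python
import os
from typing import List

def list_pids(proc_root: str = "/proc") -> List[int]:
    """List running PIDs by asking the OS (Linux /proc), not by parsing memory."""
    # Each running process appears as a directory named by its PID.
    return sorted(int(entry) for entry in os.listdir(proc_root) if entry.isdigit())

if __name__ == "__main__":
    for pid in list_pids():
        print(pid)
```

Note that the program never touches raw memory or structure offsets; all of the OS-specific knowledge lives inside the kernel code that answers the request.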


Our system, which we call Virtuoso, takes advantage of this fact. We take small programs like the one described above and run them inside a virtual machine that logs every instruction they execute, both in user mode and in the kernel. From these logs, we can then automatically generate Volatility plugins that do the same thing. Of course, I'm omitting a lot of technical detail here: there's a lot of work needed to clean up the logs, cut out irrelevant parts of the computation, and reconstitute the logs back into something that resembles a program. But that's the core idea.
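One standard way to "cut out irrelevant parts of the computation" is a backward dynamic slice: starting from the output, walk the trace backwards and keep only the instructions whose results flow into it. The sketch below applies that generic technique to a toy trace, tracking data dependences only (no control dependences). It is not necessarily the exact algorithm Virtuoso uses, and the instructions and the 0x84 offset are made up.

```python
from typing import List, NamedTuple, Set, Tuple

class Insn(NamedTuple):
    """One logged instruction: writes location `dst` using values from `srcs`."""
    text: str
    dst: str
    srcs: Tuple[str, ...]

def backward_slice(trace: List[Insn], outputs: Set[str]) -> List[Insn]:
    """Return the instructions whose results flow (by data dependence) into `outputs`."""
    live = set(outputs)          # locations whose defining write we still need
    kept: List[Insn] = []
    for insn in reversed(trace):
        if insn.dst in live:
            kept.append(insn)
            live.discard(insn.dst)   # this write defines the value we needed...
            live.update(insn.srcs)   # ...so now we need its inputs instead
    kept.reverse()
    return kept

# Toy trace; memory operands are named symbolically for simplicity.
trace = [
    Insn("mov eax, [list_head]", "eax", ("mem[list_head]",)),
    Insn("mov ecx, [tick_count]", "ecx", ("mem[tick_count]",)),  # unrelated
    Insn("mov edx, [eax+0x84]", "edx", ("eax", "mem[eax+0x84]")),
    Insn("add ecx, 1", "ecx", ("ecx",)),                         # unrelated
    Insn("mov [out_buf], edx", "out_buf", ("edx",)),
]

for insn in backward_slice(trace, {"out_buf"}):
    print(insn.text)
```

The two `ecx` instructions drop out because nothing they compute ever reaches `out_buf`.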

In our paper, we show off our technique by automatically generating six different programs on Linux, Windows, and Haiku. These programs do things like list the PIDs of currently running processes, enumerate loaded kernel modules, and retrieve the executable name for a given PID. None of them required any special knowledge to create: we just looked up the API functions that did what we wanted, wrote small programs like the one described above, and then let Virtuoso do the hard work of creating a Volatility plugin.

In future posts, I'll go deeper into the technical methods used to achieve this. I'll also post the paper itself once the conference happens (after all, I have to give people some reason to come and see the talk ;) ). And finally, I'm hoping to release the code itself once I get approval from the people who funded the research. For now, I'm going to employ a tactic known as "proof by screenshot", showing the steps involved in creating a plugin to list the PIDs of running processes under Haiku.

First we write a program that uses the Haiku API to get a list of running processes. We annotate the program with some markers that tell our logging engine where to start and stop the trace, and what the inputs and outputs are (the calls to vm_mark_buf_{in,out}):
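The role of the markers can be sketched with a toy recorder in Python. Everything here is hypothetical: the class and method names are only loosely modeled on the vm_mark_buf_{in,out} calls, and a real logger would sit in the virtual machine rather than in the program being traced. The sketch shows the two jobs the markers do: bounding the part of the execution that gets recorded, and labeling which buffers hold the inputs and outputs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class TraceRecorder:
    """Toy logger: records events only between start() and stop()."""
    recording: bool = False
    events: List[str] = field(default_factory=list)
    # Marked buffers: name -> (guest address, size in bytes)
    inputs: Dict[str, Tuple[int, int]] = field(default_factory=dict)
    outputs: Dict[str, Tuple[int, int]] = field(default_factory=dict)

    def start(self) -> None:
        self.recording = True

    def stop(self) -> None:
        self.recording = False

    def mark_buf_in(self, name: str, addr: int, size: int) -> None:
        """Label a buffer whose contents the analysis should treat as an input."""
        self.inputs[name] = (addr, size)

    def mark_buf_out(self, name: str, addr: int, size: int) -> None:
        """Label a buffer whose final contents are the result we care about."""
        self.outputs[name] = (addr, size)

    def log(self, event: str) -> None:
        if self.recording:
            self.events.append(event)

rec = TraceRecorder()
rec.log("program startup")             # outside the window: dropped
rec.start()
rec.mark_buf_in("cookie", 0x1000, 4)
rec.log("kernel walks process list")
rec.log("copy pid into output buffer")
rec.mark_buf_out("pids", 0x2000, 64)
rec.stop()
rec.log("printf results")              # outside the window: dropped
print(rec.events)  # ['kernel walks process list', 'copy pid into output buffer']
```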

We now compile and run that program inside a virtual machine running Haiku, and log the computation it performs:


Next, we run our analyzer on it, which does its magic and produces a plugin for Volatility:
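The step of reconstituting the logs into a program can be illustrated with a toy code generator: given a sliced trace reduced to a chain of memory loads, emit source for a function that replays the same dereferences against a different memory image. This is a minimal sketch under heavy assumptions (32-bit loads only, no branches or loops, a hypothetical `(dst, src, offset)` trace format); a real generated plugin has to handle much more, and this is not Virtuoso's actual output.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sliced-trace format: (dst, src, offset) means dst = read32(src + offset).
# "arg" is the marked input (for example, the address of a list head).
Load = Tuple[str, str, int]

def generate_source(loads: List[Load], result: str) -> str:
    """Emit Python source for a function that replays the recorded loads."""
    lines = ["def extracted(read32, arg):"]
    for dst, src, off in loads:
        lines.append(f"    {dst} = read32({src} + {off:#x})")
    lines.append(f"    return {result}")
    return "\n".join(lines)

def compile_plugin(source: str) -> Callable[[Callable[[int], int], int], int]:
    """Turn generated source into a callable (a real tool might write a plugin file instead)."""
    namespace: Dict[str, object] = {}
    exec(source, namespace)
    return namespace["extracted"]  # type: ignore[return-value]

# Loads observed in a trace: follow a pointer, then read a field at +0x84.
loads = [("t0", "arg", 0x0), ("t1", "t0", 0x84)]
src = generate_source(loads, "t1")
print(src)

# A different "memory image", as a dict from address to 32-bit value.
image = {0x1000: 0x2000, 0x2084: 42}
fn = compile_plugin(src)
print(fn(image.__getitem__, 0x1000))  # 42
```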


Finally, we can run that plugin within Volatility to analyze a Haiku memory image:
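What a process-listing plugin ultimately does to an image is the kind of pointer chasing sketched below: walk a kernel's circular process list and read each entry's PID field. The structure layout, the list-head address, and the offsets are hypothetical, the image is a flat bytes object with virtual and physical addresses assumed identical, and the code is plain Python rather than Volatility's plugin API.

```python
import struct
from typing import List

# Hypothetical layout (not any real kernel's):
#   struct proc { uint32 next; uint32 prev; uint32 pid; }
# Entries form a circular doubly linked list through a bare {next, prev} head.
NEXT_OFF, PREV_OFF, PID_OFF = 0, 4, 8
HEAD = 0x10

def read_u32(image: bytes, addr: int) -> int:
    """Read a little-endian 32-bit value at `addr` (addresses are image offsets)."""
    return struct.unpack_from("<I", image, addr)[0]

def walk_process_list(image: bytes, head: int, max_items: int = 10_000) -> List[int]:
    """Follow `next` pointers from `head`, collecting each entry's pid field."""
    pids: List[int] = []
    node = read_u32(image, head + NEXT_OFF)
    # Stop when we come back around to the head, or after max_items entries
    # so a corrupted (looping) list can't hang the tool.
    while node != head and len(pids) < max_items:
        pids.append(read_u32(image, node + PID_OFF))
        node = read_u32(image, node + NEXT_OFF)
    return pids

def build_image() -> bytes:
    """Build a tiny test image with three processes linked after HEAD."""
    img = bytearray(0x400)
    procs = [(0x100, 4), (0x200, 77), (0x300, 1234)]   # (address, pid)
    addrs = [HEAD] + [addr for addr, _ in procs]
    for i, addr in enumerate(addrs):
        nxt, prv = addrs[(i + 1) % len(addrs)], addrs[i - 1]
        struct.pack_into("<II", img, addr + NEXT_OFF, nxt, prv)  # next, prev
    for addr, pid in procs:
        struct.pack_into("<I", img, addr + PID_OFF, pid)
    return bytes(img)

print(walk_process_list(build_image(), HEAD))  # [4, 77, 1234]
```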


To wrap things up, I want to thank my co-authors Tim Leek, Michael Zhivich, Jonathon Giffin, and Wenke Lee. It's been a long road, but I'm hoping this research will make it a lot easier to build exciting new security tools for VMI and memory forensics!