Tuesday, March 15, 2011

Automatically Generating Memory Forensic Tools

Now that the IEEE Symposium on Security and Privacy program has finally been posted, I can describe some research I've been working on for the past year and a half related to virtual machine introspection (VMI) and memory forensics.

A well-known problem with VMI and memory forensics is the semantic gap. The information you want out of a memory image or a running VM is high-level (what processes are running, what files are open, and so on), but what you actually get is a big bunch of uninterpreted bytes (i.e., a view of physical memory). Bridging this gap is what tools like Volatility were built to do, and they do it well.
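To make the gap concrete, here is a minimal Python sketch that assumes an entirely hypothetical process record layout: a 4-byte little-endian PID, a 4-byte parent PID, and a 16-byte NUL-padded name. Real kernel structures are much larger and their layouts change between OS versions. Knowing those layouts is exactly what a tool like Volatility has to encode.

```python
import struct

# Hypothetical layout: pid (uint32), ppid (uint32), name (16 bytes, NUL-padded),
# all little-endian. Real kernel structures are far larger and version-specific.
PROC_FORMAT = "<II16s"

def decode_process(raw: bytes, offset: int = 0) -> dict:
    """Interpret the bytes at `offset` as one hypothetical process record."""
    pid, ppid, name = struct.unpack_from(PROC_FORMAT, raw, offset)
    return {"pid": pid, "ppid": ppid, "name": name.rstrip(b"\x00").decode("ascii")}

# Without knowledge of the layout, this is just a run of uninterpreted bytes.
raw = struct.pack(PROC_FORMAT, 1234, 1, b"init")
print(raw.hex())
print(decode_process(raw))  # {'pid': 1234, 'ppid': 1, 'name': 'init'}
```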

However, building a tool like Volatility takes a lot of work and a lot of knowledge about the internals of the operating system you're trying to examine. For closed-source operating systems like Windows, that knowledge comes from sources such as the Windows Internals book, blog posts, and good old-fashioned reverse engineering. This takes a lot of time, and the process has to be repeated for every new version of Windows and every new operating system you want to support. Volatility's next release will support Vista and Windows 7, but it hasn't been easy. Windows' networking code, for example, was rewritten in Vista, so supporting it required some reverse engineering by MHL and a new plugin.

Is there an easier way? Ideally, we'd like to generate some of these tools automatically, for any OS or version. That's the problem we set out to solve, and I think we made some good progress on it, though as with any academic work there's still lots of room for improvement :)

The basic idea is that many of the tools we want to run on a memory image would be easy to write if we had access to the system's native APIs. For example, with access to the Windows API we could write something similar to pslist by doing something like:


Our system, which we call Virtuoso, takes advantage of this fact. We take small programs like the one shown above and run them inside a virtual machine that logs every instruction they execute, both in user mode and in the kernel. From these logs, we then automatically generate Volatility plugins that do the same thing. Of course, I'm omitting a lot of technical detail here. There's a lot of work involved in cleaning up the logs, cutting out irrelevant parts of the computation, and reconstituting the logs into something that resembles a program, but that's the core idea.
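One standard way to cut irrelevant parts out of an execution trace is backward dynamic slicing. You start from the output locations and walk the trace in reverse, keeping only the instructions that contribute to them. The sketch below is a deliberately simplified illustration over a hypothetical toy trace format (`TraceEntry` and `step` are made up here, and registers and memory are just string names). It is not Virtuoso's actual implementation, which has to deal with real x86 semantics, addresses computed at runtime, and control flow.

```python
from typing import FrozenSet, List, NamedTuple, Set

class TraceEntry(NamedTuple):
    """One logged instruction: the locations it writes and the locations it reads."""
    text: str
    writes: FrozenSet[str]
    reads: FrozenSet[str]

def step(text: str, writes=(), reads=()) -> TraceEntry:
    """Convenience constructor for hand-written toy traces (hypothetical helper)."""
    return TraceEntry(text, frozenset(writes), frozenset(reads))

def backward_slice(trace: List[TraceEntry], outputs: Set[str]) -> List[TraceEntry]:
    """Keep only the entries that (transitively) contribute to `outputs`."""
    live = set(outputs)            # locations whose values still need explaining
    kept = []
    for entry in reversed(trace):
        if entry.writes & live:    # this entry produced a value we care about
            kept.append(entry)
            live -= entry.writes   # ...so those locations are now explained
            live |= entry.reads    # ...by whatever the entry read
    kept.reverse()                 # restore execution order
    return kept

# Toy trace of reading one PID out of a process list into an output buffer.
# Memory operands are simplified: [eax+pid] is modeled as a read of eax.
trace = [
    step("mov eax, [list_head]", writes={"eax"}, reads={"list_head"}),
    step("mov ecx, [counter]", writes={"ecx"}, reads={"counter"}),  # unrelated bookkeeping
    step("mov edx, [eax+pid]", writes={"edx"}, reads={"eax"}),
    step("inc ecx", writes={"ecx"}, reads={"ecx"}),                 # unrelated bookkeeping
    step("mov [out_buf], edx", writes={"out_buf"}, reads={"edx"}),
]

for entry in backward_slice(trace, outputs={"out_buf"}):
    print(entry.text)
```

On this toy trace, the slice keeps the three instructions that load the list head, read the PID, and store it into `out_buf`, and it drops the two `ecx` updates. Here the slice criterion is just a set of names. In a real system it would come from the output buffers the traced program marks.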

In our paper, we demonstrate our technique by automatically generating six different programs on Linux, Windows, and Haiku. These programs do things like list the PIDs of currently running processes, enumerate loaded kernel modules, and retrieve the executable name for a given PID, and none of them required any special knowledge to create: we just looked up the API functions that did what we wanted, wrote small programs like the one shown above, and let Virtuoso do the hard work of creating a Volatility plugin.
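To show what a PID-listing plugin ultimately has to do, here is a hand-written Python sketch that walks a singly linked process list inside a raw memory image. The node layout, the offsets, and the use of image offsets in place of virtual addresses are all hypothetical simplifications. A real plugin would also need virtual-to-physical address translation, and Virtuoso derives its plugins from the traced kernel code rather than having someone write them by hand like this.

```python
import struct

# Hypothetical node layout: next (uint32 offset into the image, 0 = end of list),
# then pid (uint32), little-endian.
NODE_FORMAT = "<II"

def list_pids(image: bytes, head: int) -> list:
    """Follow the singly linked process list starting at offset `head`."""
    pids, seen = [], set()
    node = head
    while node != 0 and node not in seen:  # `seen` guards against corrupt, cyclic lists
        seen.add(node)
        next_node, pid = struct.unpack_from(NODE_FORMAT, image, node)
        pids.append(pid)
        node = next_node
    return pids

# Build a tiny fake "memory image" with three nodes at offsets 16, 48 and 32.
image = bytearray(64)
struct.pack_into(NODE_FORMAT, image, 16, 48, 1)   # pid 1  -> node at 48
struct.pack_into(NODE_FORMAT, image, 48, 32, 42)  # pid 42 -> node at 32
struct.pack_into(NODE_FORMAT, image, 32, 0, 7)    # pid 7  -> end of list
print(list_pids(bytes(image), head=16))           # [1, 42, 7]
```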

In future posts, I'll go deeper into the technical methods used to achieve this. I'll also post the paper itself once the conference happens (after all, I have to give people some reason to come and see the talk ;) ). And finally, I'm hoping to release the code itself once I get approval from the people who funded the research. For now, I'm going to employ a tactic known as "proof by screenshot", showing the steps involved in creating a plugin to list the PIDs of running processes under Haiku.

First we write a program that uses the Haiku API to get a list of running processes. We annotate the program with some markers that tell our logging engine where to start and stop the trace, and what the inputs and outputs are (the calls to vm_mark_buf_{in,out}):

We now compile and run that program inside a virtual machine running Haiku, and log what computation it does:


Next, we run our analyzer on it, which does its magic and produces a plugin for Volatility:


Finally, we can run that plugin within Volatility to analyze a Haiku memory image:


To wrap things up, I want to thank my co-authors Tim Leek, Michael Zhivich, Jonathon Giffin, and Wenke Lee. It's been a long road, but I'm hoping this research will make it a lot easier to build exciting new security tools for VMI and memory forensics!