Thursday, October 4, 2007

The Types Stream

Over the past few weeks, I've been continuing to investigate the structure of the Types stream (stream 2) in Microsoft PDB files with the help of Sven Schreiber's PDB parsing code. Some issues with getting approval to publish research came up at work, but I think they're mostly ironed out now, so I'm going to devote this entry to going through some of the trickier bits involved in parsing the Types stream. Some code also accompanies this entry: a Python script to parse and print out the types contained in a stream. It works on streams that have already been extracted from a PDB file (see this earlier entry); if you don't have one around you can try it out on the Types stream from ntoskrnl.exe on Windows XP SP2.


The Type Stream Header


The types stream begins with a header that gives a few pieces of useful information. The first dword represents the version number of the PDB file, and is generally determined by the version of the compiler that created the PDB file. For XP, the version is 19990903, and for Vista, 20040203 (note that the numbers can be read as dates that approximately line up with the release of the Visual Studio version used to create them).


The next word gives the size of the header; I have not yet seen a case where this was anything other than 0x38 bytes, but in theory this could allow for extensions to the header without breaking backwards compatibility with older parsers (they could just skip any extra header data). After the header size, there are two dwords, tiMin and tiMax, that give the numerical index of the first and last type listed in the file.


These type indices are important to understanding the structure of the file format. The first type in a file will be numbered tiMin, the next tiMin+1, and so on up to tiMax. In addition, many types will refer to other types in the same file using their type index; for example, if _MMVAD is type 0x1002, a pointer to an _MMVAD would be a pointer type with its utype (underlying type) field set to 0x1002.


Finally, the header contains a dword at offset 0x0E that gives the size in bytes of the data that follows the header. The size of the whole stream should be equal to the header size plus the size of the following data. This can be used as a crude sanity check to verify file integrity.
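Putting the fields described so far together, the header can be pulled apart with a few lines of Python. This is a sketch based only on the offsets given above (the field names are my own, loosely following Schreiber's conventions):

```python
import struct

def parse_tpi_header(stream):
    """Parse the Types stream header: a dword version, a word header
    size, dwords tiMin and tiMax, and (at offset 0x0E) the dword size
    of the data that follows the header."""
    version, hdr_size, ti_min, ti_max, follow_size = \
        struct.unpack_from("<IHIII", stream, 0)
    # Crude sanity check from the text: header + data == whole stream
    assert hdr_size + follow_size == len(stream), "stream size mismatch"
    return version, hdr_size, ti_min, ti_max, follow_size
```

Feeding it a real Types stream should report the XP version number (19990903) and a header size of 0x38.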


There is an additional structure present in the header aside from the ones already mentioned, which Schreiber calls the tpiHash. Unfortunately, I have not been able to determine what purpose the values in it serve; although the names seem to refer to some sort of hash, I have not found such a structure in the type streams available to me.


Overall Structure


The structure of the file is quite simple: after the header, we find a sequence of type structures, referred to in the CodeView documentation (more on this later) and in Schreiber's code as leaf types. Each leaf begins with two word-sized fields: a size and a type (the latter could be seen as a metatype, as it is the type of that type--e.g., a structure, a field list, etc.). It is important to note that the size does not include the size field itself, so the size of the entire structure is actually two bytes more than listed in the size field (this tripped me up for a little while).


The type field indicates how the leaf type is intended to be parsed. The constants have already been kindly defined by Schreiber in sbs_sdk/include/pdb_info.h in the win_pdbx sources. Likewise, most of the leaf types themselves are defined in that same file, as C structures. The structures appear to be extremely similar to the CodeView format documented in an early format specification from Microsoft, though there appear to have been some changes made to better accommodate 32-bit architectures (many words changed to dwords, and so on). However, certain details of how they are to be parsed are ambiguous, and one must look at how Schreiber actually uses them in his code to see how they should be interpreted. For example, many of the structures end with a char data[] field, which, upon investigation, actually contains a combination of the structure's name and a numeric value that means different things depending on the leaf type.


In the following sections, I will describe those portions of the file format that seemed obscure or were difficult to figure out just by looking at the data structures defined in Schreiber's code. Due to limited space and reader interest, I will not describe each leaf type in full; those interested in the details can check out my code or look at Schreiber's win_pdbx.


Padding Scheme


Based on examination of PDB files from Windows XP SP2 and Vista, structures in the types stream are aligned on 32-bit boundaries. This means that if a structure does not align properly, it will be padded out to the appropriate size. However, in the CodeView format, the padding is not junk data or nulls, but takes a particular form: each pad byte inserted looks like (0xF0 | [number of bytes until next structure]). Less formally, the upper four bits are all set, and the lower four give the number of bytes to skip until the next structure. This results in patterns that look like "?? F3 F2 F1 [next structure]".


I am not entirely clear on the reasoning behind this padding scheme. One possibility is that it allows alignment to multiple different boundaries without requiring changes to the parser: rather than rounding up to the next aligned boundary, a parser can just read a single byte to determine how far away the next structure is. Dealing with padding is not strictly necessary, though; Schreiber's parser simply rounds up to the nearest 4-byte boundary. The scheme does impose an interesting constraint: because bytes with values of 0xF0 and above are defined to be padding, no leaf type can have a type code containing a byte in that range.
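For a parser that honors the pad bytes rather than rounding, the skip logic is a one-liner. This is a sketch assuming the 0xF0|n convention described above, where the count includes the pad byte itself:

```python
def skip_padding(data, pos):
    """Advance pos past a CodeView-style pad run. Each pad byte is
    (0xF0 | bytes-to-skip), and the count includes the pad byte
    itself, so a single read tells us where the next structure is."""
    if pos < len(data) and data[pos] >= 0xF0:
        pos += data[pos] & 0x0F
    return pos
```

Given the "F3 F2 F1" pattern from the text, a single call lands on the next structure without ever looking at the second and third pad bytes.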


Field Lists


Structures, bitfields, argument lists, and so on all need a way to refer to a list of types that they contain; for example, an _EPROCESS structure will contain a _KPROCESS, a _DISPATCHER_HEADER, and so on. To deal with this, types that refer to other types have a field that gives the type index of their corresponding field list. Field lists (leaf type 0x1203), in turn, have a very simple structure: after the standard size and type fields, the body of the structure is made up of an arbitrary number of leaf types of type LF_MEMBER, LF_ENUMERATE, LF_BCLASS, LF_VFUNCTAB, LF_ONEMETHOD, LF_METHOD, or LF_NESTTYPE. This is somewhat annoying to parse, because the number of substructures is not known in advance; the only way to know when a field list is finished is to compare the number of bytes parsed against the size of the overall structure.
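The parsing loop that results might look like the following sketch. Here parse_subleaf is a hypothetical stand-in for whatever dispatch routine handles the individual LF_MEMBER/LF_ENUMERATE/etc. records; it is assumed to return a (member, bytes_consumed) pair:

```python
import struct

def parse_fieldlist(data, offset, parse_subleaf):
    """Walk the substructures of a field list (leaf type 0x1203),
    stopping once we have consumed as many bytes as the record's
    size field says it contains."""
    size, leaf = struct.unpack_from("<HH", data, offset)
    assert leaf == 0x1203
    end = offset + size + 2          # the size word excludes itself
    pos = offset + 4                 # skip the size and type words
    members = []
    while pos < end:
        member, consumed = parse_subleaf(data, pos)
        members.append(member)
        pos += consumed
        # step over any alignment padding (0xF0 | n) between members
        if pos < end and data[pos] >= 0xF0:
            pos += data[pos] & 0x0F
    return members
```

Note the "+ 2" when computing the end position: forgetting that the size field excludes itself is exactly the off-by-two mentioned earlier.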


Values


As mentioned before, many of the structures Schreiber documents end with char data[]. This field is parsed as follows:


  • If the value of the first word is less than LF_NUMERIC (0x8000), the value data is just the value of that word. The name begins at data[2] and is a C string.

  • Otherwise, the word refers to the type of the value data, one of LF_CHAR, LF_SHORT, LF_USHORT, LF_LONG, or LF_ULONG. Then comes the actual value, and then the name as a C string. The length of the value data is determined by the value type--one byte for LF_CHAR, 2 for LF_SHORT, and so on.


The actual meaning of the value data depends on the leaf type it's embedded in. For example, for LF_STRUCTURE types, it refers to the size of the overall structure, while for LF_ENUMERATEs, it gives the value of the enum.
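A decoder for this field might look like the sketch below. The LF_* constant values are taken from my reading of the old CodeView specification, so treat them as an assumption to verify against your own data:

```python
import struct

# Numeric-leaf type codes, as given in the CodeView specification
LF_NUMERIC = 0x8000
LF_CHAR, LF_SHORT, LF_USHORT = 0x8000, 0x8001, 0x8002
LF_LONG, LF_ULONG = 0x8003, 0x8004

def parse_value(data):
    """Decode the trailing char data[] field: a numeric value
    followed by the name as a C string."""
    first, = struct.unpack_from("<H", data, 0)
    if first < LF_NUMERIC:               # the word itself is the value
        value, pos = first, 2
    else:                                # the word names the value's type
        fmt, width = {LF_CHAR: ("<b", 1), LF_SHORT: ("<h", 2),
                      LF_USHORT: ("<H", 2), LF_LONG: ("<i", 4),
                      LF_ULONG: ("<I", 4)}[first]
        value, = struct.unpack_from(fmt, data, 2)
        pos = 2 + width
    name = data[pos:data.index(b"\x00", pos)]
    return value, name
```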


Forward References


As you parse through types, you will find that almost every structure appears more than once in the stream: first as an entry claiming to have zero members and no corresponding field list, and then as a much more normal looking structure with the appropriate number of members and a valid reference to a field list. The reason for this is that as the compiler generates the debugging symbols, it may come across names for structures that it does not yet know the contents of; in this case, it creates a forward reference by creating an empty structure in the types stream and then setting the "fwdref" bit in its attributes (bit 7 of the word at offset 0x06 of an LF_STRUCTURE).
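A quick predicate for spotting these forward references, using the offset given above (counted from the start of the record's size field), can be sketched as:

```python
import struct

def is_forward_ref(record):
    """True if the fwdref bit (bit 7 of the property word at offset
    0x06 of an LF_STRUCTURE record) is set."""
    prop, = struct.unpack_from("<H", record, 0x06)
    return bool(prop & 0x80)
```

When resolving type indices, skipping records for which this returns True avoids picking up the empty, zero-member copy of a structure.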


Further Reading


The information already presented here, combined with Schreiber's already available win_pdbx source code, should be enough to build a parser for PDB type information. Specifically, the following files in Schreiber's distribution were most helpful to me:


  • sbs_sdk/include/pdb_info.h (constants and structure definitions for leaf types)

  • Program Files/DevStudio/MyProjects/sbs_pdb/sbs_pdb.h (header definitions for PDB files)

  • Program Files/DevStudio/MyProjects/win_pdbx/win_pdbx.c (the actual code that does the parsing using the structures defined in those first two files)


In addition, the CodeView documentation, while quite outdated, is extremely useful for getting an idea of how the format works, and most of the structures are still valid with a few tweaks to support 32-bit architectures (mostly changing words to dwords in some places). The documentation also covers the format of the debug stream, which Schreiber's code does not handle. This should make the development of a parser for the debug stream (coming soon!) much easier.


Code


Just to show that this all actually works, I have made available a preliminary parser that prints out parsed type information. I have also almost finished a more general parser that uses Construct, a Python library for parsing low-level binary data (though this description does not do it justice, and I will be writing more about it when I release the updated parser).

Tuesday, September 4, 2007

Administrativa

As I've mentioned to a few people, the past two posts have been the start of what I hope will become a trend: posting at least one piece of new, technically interesting content per week. With any luck, new posts will come out on Mondays (I cheated a bit this week and took Labor Day off).

Topics covered will center around memory analysis, as this is my primary area of research, but I have a few posts planned on topics like network-based exploitation and cryptography.

I hope any readers of this blog will enjoy the sudden influx of new content; don't hesitate to contact me with comments, criticisms, or ideas for new posts.

Challenges in Carving Registry Hives from Memory

As I mentioned last week, I moved to a new apartment this week, and as a result I didn't have a lot of time to do any serious work. Still, I didn't want to let the entire week go to waste, so I decided to try and tackle a problem that I thought would be relatively simple: extracting a copy of the binary registry hives out of memory. As it turned out, this was actually a bit more difficult than I expected, and I'll have to get back to the problem at a later date, but I thought in the meantime I'd write about what steps I took, where I ran into trouble, and describe the approach I hope to take when I revisit the issue.

There were several reasons to suspect that one might find at least partial copies of the registry in memory: the registry stores all of the configuration data for the Windows operating system, and its contents are referred to and updated quite often during normal operation (let Sysinternals' Regmon run for a few minutes and look at the output if you have doubts about this). In addition, it seemed to me that the binary structure of the registry hives was in many ways well-suited to an in-memory representation: the size of a block in a registry file, 0x1000 bytes (4096 bytes, or 4KB) is the same as the size of a page in memory, and the various data structures present in registry hives have a clear interpretation as C structures in memory.

Indeed, it appears that the entire registry is stored in memory, at least in Windows 2000. Quoting Russinovich and Solomon's excellent Windows Internals:

Windows 2000 keeps a version of every hive in the kernel's address space. When a hive initializes, the configuration manager determines the size of the hive file, allocates enough memory from the kernel's paged pool to store it, and reads the hive file into memory. [...]

In Windows XP and Windows Server 2003, the configuration manager maps portions of a hive into memory as it needs to access them. It uses the cache manager's file mapping functions to map in 16-KB views into the hive files.

Windows Internals, p. 203

For simplicity's sake, I'm not going to deal with XP and Server 2003 in this entry; it seems unlikely that very much of the registry will be recoverable under those OSes, given the mapping mechanism described above. Instead, we'll look at recovering registry hives from Windows 2000, specifically the two memory images released for the DFRWS 2005 Memory Analysis Challenge.

As mentioned earlier, registry hives are divided into fixed blocks of size 0x1000 bytes; the first of these blocks is called the base block, and it begins with the signature "regf". The base block also contains four dwords giving the version of the hive structure in use (at offsets 0x14, 0x18, 0x1C, and 0x20), and in Windows 2000 these values are always 1, 3, 0, and 1. These two facts together make for an excellent signature to search for the start of a hive in a Windows 2000 memory dump. Using XMagic:

# XMagic

# Windows Registry files.
# updated by Joerg Jenderek
0 string = regf Windows 2000 registry file
# Reg version
<0x14 lelong = 1
<<0x18 lelong = 3
<<<0x1c lelong = 0
<<<<0x20 lelong = 1
Using this signature in conjunction with FTimes finds four hive headers in dfrws2005-physical-memory1.dmp, and ten in dfrws2005-physical-memory2.dmp. Going to the offsets given in the memory dump and examining the results, there do not appear to be any false positives. The fact that there are so many more results from the second memory dump is likely due to the fact that the hives reside in the kernel's paged pool; this means that they can be swapped out to disk. Since the second image was taken right after the system was booted (according to the challenge details), it seems likely that some of the hive headers in the first image were paged out.
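For anyone working in Python rather than FTimes, the same check is easy to express directly. This is just a sketch of the signature above in code form:

```python
import struct

def is_w2k_hive_header(block):
    """Equivalent of the XMagic signature above: the 'regf' magic
    plus the four Windows 2000 version dwords 1, 3, 0, 1 at
    offsets 0x14 through 0x20."""
    return (block[:4] == b"regf" and
            struct.unpack_from("<4I", block, 0x14) == (1, 3, 0, 1))
```

Running it over a memory image on 0x1000-byte boundaries should turn up the same offsets FTimes reports.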

My first, naive attempt to carve out the files was to simply calculate the size of each hive and then use dd to carve out that much data from the offset given by FTimes. The size of the hive can be easily calculated by looking at offset 0x28 in the hive header; the dword at this position gives the offset of the last block in the registry hive. Since each block in the hive is 0x1000 bytes, the size of the hive as a whole should be the offset of the last block plus 0x1000. This method of extracting the registry fails, however – although it is reasonable to expect that the registry is contiguous in memory, this assumption would only hold for virtual memory, and our carving method is trying to extract a contiguous range from physical memory.
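The size computation itself is tiny; a sketch using the offset described above:

```python
import struct

def hive_size(base_block):
    """Total hive size: the dword at offset 0x28 of the base block
    gives the offset of the last 0x1000-byte block, so the hive
    ends one block after that."""
    last_block, = struct.unpack_from("<I", base_block, 0x28)
    return last_block + 0x1000
```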

Clearly, then, what's needed is the virtual address of the start of the hive in memory. To obtain this, I decided to generate a map of each process's address space, showing each virtual address and its corresponding physical address (if any). A good tool to do this is Andreas Schuster's memdump.pl; I wrote my own version of this in Python, drawing on the address translation routines from x86.py in Volatility (you'll need to copy in the "forensics" directory from Volatility into wherever you save memdump.py in order for it to work). Both memdump.pl and memdump.py take a memory dump and the address of a page directory and generate a list of all offsets by trying to translate each possible virtual address from 0x00000000 up to 0xFFFFFFFF using the given page directory. (Note: PTFinder or my own XMagic signatures can be used to locate processes and their page directories in a memory dump.)
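The core of any such map generator is the translation walk itself. Here is a minimal sketch for the non-PAE x86 case (appropriate for these Windows 2000 images); it ignores 4MB large pages and pagefile lookups, and returns None for non-present entries:

```python
import struct

def translate(mem, pgd, vaddr):
    """Walk the page directory and page table to turn a virtual
    address into a physical offset (non-PAE x86, 4KB pages).
    mem is the raw memory image; pgd is the physical offset of
    the page directory."""
    pde, = struct.unpack_from("<I", mem, pgd + ((vaddr >> 22) << 2))
    if not pde & 1:                       # PDE not present
        return None
    pt_base = pde & 0xFFFFF000
    pte, = struct.unpack_from("<I", mem, pt_base + (((vaddr >> 12) & 0x3FF) << 2))
    if not pte & 1:                       # PTE not present
        return None
    return (pte & 0xFFFFF000) | (vaddr & 0xFFF)
```

Iterating this over every page-aligned virtual address from 0x00000000 to 0xFFFFFFFF yields the same sort of map memdump.pl produces.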

Once such a map has been generated, we find that physical offset 0x01614000 (which appeared to contain the SYSTEM hive) is mapped in at 0xE1012000 in the virtual address space of every process (addresses above 0x80000000 are generally reserved for the kernel and are the same for every process). I wrote a small program to extract a range of virtual memory from an image, given a page directory, a start address, and a length. (Note: this time I've used my own memutil.py to do the address translations; it was easier to modify to continue despite invalid pages and to give verbose debug output.) After invoking it like so:

./regdump.py dfrws2005-physical-memory1.dmp \
0x00030000 0xE1012000 0x28A000 > test.sys
we get something that we might hope is the complete SYSTEM hive for the DFRWS 2005 test system (modulo any memory pages that were swapped out).

Alas, we are not so lucky. Trying to validate this structure by using a small C program I had around that attempts to walk the tree of registry keys and print their names caused it to die after reading only a small portion of the file. A bit of manual examination of the file shows why: it appears that for some reason parts of the file are no longer in the correct position. We can tell this because registry hives store their data in bins, and each bin has a header with a signature ("hbin"), its offset from the first hbin block (i.e., its position in the file), and its size. We immediately notice that at offset 0x4000, there is a block that claims its offset is 0, meaning it should be found at file position 0x1000.

Putting aside for now the question of why the hbin blocks are not in the order we expected, could we perhaps get the file to validate by taking each block and placing it in its correct position? To do this, we scan the file on block boundaries for the "hbin" header, read in its offset, and then write it out at the correct position. The program reorder_hbins.py does just this: it takes a list of hbin offsets on standard input (such a list can be trivially generated with FTimes in dig mode) and a file containing the registry data, and writes out a new version with the hbins in their "correct" places.
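The heart of reorder_hbins.py can be sketched as follows. This version scans on block boundaries itself rather than taking an offset list on standard input, and it assumes the hbin header layout described above (signature at 0, relative offset at 4, size at 8):

```python
import struct

def reorder_hbins(data, block_size=0x1000):
    """Move each hbin to the slot its recorded offset claims.
    The offset stored at hbin+4 is relative to the first hbin,
    which sits one block past the base block; the size at hbin+8
    tells us how many bytes to copy."""
    out = bytearray(len(data))
    out[:block_size] = data[:block_size]      # keep the base block
    for pos in range(block_size, len(data), block_size):
        if data[pos:pos + 4] != b"hbin":
            continue
        offset, size = struct.unpack_from("<II", data, pos + 4)
        dest = block_size + offset
        out[dest:dest + size] = data[pos:pos + size]
    return bytes(out)
```

Positions no hbin claims are left as zeroes, which is exactly the page-of-zeroes symptom seen below when a bin is missing.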

With this modification, the registry hive comes closer to being parsable – regview.c gives the following output:

$$$PROTO.HIV
ControlSet001
Control
Arbiters
AllocationOrder
ReservedResources
BackupRestore
FilesNotToBackup
KeysNotToRestore
Biosinfo
BootVerificationProgram
Class
{018713C0-14ED-11D2-88FC-0000F49094C7}
Fatal: encountered unknown subkey type at 00277dc0
However, at this point, if we look at offset 0x00277dc0, we find that it is in the middle of a page of zeroes. It seems that there was no hbin that claimed to be at that position in the original section of memory we carved out.

There could be several reasons for this. First, it is possible that the hbin we're looking for is simply paged out; if this is the case, no amount of searching through the memory image will give us the block we need. To fully recover the registry we would need the pagefile from the system, and since the challenge is now two years old, we are unlikely to get a copy of it. Still, we could take the following approach to try and at least partially reconstruct the tree: now that the hbin blocks appear to be in the correct positions, we can scan through to find individual nodes (so-called nk cells), and then reconstruct subtrees from each of those until we run into trouble again. This might at least allow us to reconstruct interesting information such as the BootKey used in the SYSKEY mechanism, which we could then use to decrypt the local password hashes for the system from the SAM hive (for more details on how to extract the necessary information from SYSKEY, take a look at the bkhive and samdump2 programs by Nicola Cuomo).

The other possible explanation for the missing block can be found by again consulting Windows Internals – which also may help explain why some of the blocks were in the wrong position.

If hives never grew, the configuration manager could perform all its registry management on the in-memory version of a hive as if the hive were a file. Given a cell index, the configuration manager could calculate the location in memory of a cell by adding the cell index, which is a hive file offset, to the base of the in-memory hive image. [...] Unfortunately, hives grow as they take on new keys and values, which means the system must allocate paged pool memory to store the new bins that contain the added keys and values. Thus, the paged pool that keeps the registry data in memory isn't necessarily contiguous. [emphasis mine]

Windows Internals, p. 204

So our earlier assumption may not be valid, especially if there have been keys or values added (which is almost certainly the case on a system that has been running for a while).

To deal with this, the configuration manager uses a scheme reminiscent of x86 virtual address translation, and has a translation table that maps each cell index (i.e. offset within the registry hive) to the appropriate location in memory.

Dealing with these new complexities, however, is a bit too much for me, at least this week. The next steps down this path would involve locating the data structures used by the configuration manager in memory, and then parsing them to get an accurate picture of the registry in memory. If anyone wants to do some exploring on their own, a good place to start would be to look at debug symbols starting with _CM, as well as the !reg extension in WinDbg. Hopefully I'll be able to return to this topic in the near future.

For next week, however, I'll be back to parsing PDB files; with some luck, I should have code to extract type information ready by then, and the beginnings of a full-fledged Python module to handle PDB files.

Sunday, August 26, 2007

PDB Stream Decomposition

I've been taking a look at the structure of PDB files recently. PDB files are Microsoft's proprietary file format for storing debug information. They are generated by the Visual Studio family of products, and Microsoft has been kind enough to provide a full set of PDB files for its own operating system files since Windows 2000, which means that we get access to a whole bunch of great information like function symbols and type information. These can provide an excellent insight into the internals of the Windows operating system.

Naturally, the file format is undocumented and proprietary (there used to be a page on MSDN stating as much, but I can't seem to find it now). However, Sven Schreiber, in his book Undocumented Windows 2000 Secrets: A Programmer's Cookbook, provides details on the format and some sample code for reading the information in such files. Although the original code only deals with PDB files created with Visual Studio 6.0 and below, he has recently released a new program that parses the most recent version of the file format. As the newer format has not been written about, I thought I'd devote this entry to describing its high-level structure, and provide some code for breaking it down into its component parts. Hopefully I'll be following it up next week with some more polished code to parse the internal format of several of the streams (such as stream 2, which contains type information), but as I'm going to be moving this week, it may get pushed off until the week after.

(By the way, in case you're wondering why I don't simply use the Debug Interface Access SDK to get at this information, the main reason is that it's not available for my preferred operating systems, Linux and OS X. I also think it's good in general to know what's really going on inside a file format; it gives you a better feel for how reliable the information presented is (as opposed to just believing what the tool tells you), and can really save you when you find that the tool you use has some limitations or doesn't do what you expect.)

PDB files start with a fairly long, 0x20 byte signature:
'Microsoft C/C++ MSF 7.00\r\n\x1ADS\0\0\0'

Next come several dwords that give vital information and allow some basic integrity checking (note: I'm using Schreiber's names here):
  • dPageBytes - the size of a page in bytes.
  • dFlagPage - page number of the table describing which pages are allocated in the file. If page n of the PDB file is in use, the nth bit of the page will be set to 0.
  • dFilePages - number of pages in this file. dFilePages * dPageBytes should equal the file's size.
  • dRootBytes - the size of the root stream, in bytes.
  • dReserved - always zero.
  • adIndexPages[] - an array of dwords giving the page numbers that contain the root stream index. In fact, I do not have any evidence that this is an array in the latest PDB format, rather than a simple dword, but Schreiber lists it as such.
All data in the PDB file is stored in blocks of a fixed size given in the dPageBytes member of the header. As a result, most offsets in the file are given as a page number, rather than a byte offset. To get the actual file offset, simply multiply the page number by dPageBytes. Overlaid on top of this block structure, information is stored in streams, much like files on a filesystem. A single structure, the root stream, provides an index of the streams, and on what pages they may be found. If this structure sounds familiar, that's because it appears to be quite a common one within Microsoft–this general layout is shared with the FAT filesystem, OLE (e.g. Word) documents, and Windows registry files.

If we seek to the page referred to by adIndexPages, we find that it contains an array of dwords giving the page numbers that make up the root stream. However, it is not immediately obvious how we are to know the length of this array. In fact, we must calculate it: since we know the size of the root stream from dRootBytes, and the size of a page from dPageBytes, the number of pages required to store the root stream is given by dRootBytes / dPageBytes (rounded up). For example, in the PDB file for ntoskrnl.exe under Windows XP SP2, dRootBytes is 15964, and the page size is 0x400 bytes, so the number of pages (and hence the length of the root stream index array) is 16. We can now read each page listed in the array to reconstruct the full root stream.
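The rounding-up calculation is the usual ceiling-division idiom, checked here against the numbers from the ntoskrnl.exe example:

```python
def page_count(nbytes, page_bytes):
    """Pages needed to hold nbytes: divide by the page size, round up."""
    return (nbytes + page_bytes - 1) // page_bytes

# The ntoskrnl.exe example: dRootBytes = 15964, dPageBytes = 0x400
# gives a 16-entry root stream index array.
```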

The root stream itself has a fairly simple structure. It starts with a dword, dStreams, giving the number of streams present in the file. Next is an array of dwords of length dStreams, giving the length of each stream in bytes. Finally, there are dStreams arrays of dwords listing the pages that make up each stream. In other words, the structure looks like:

[dStreams][length of stream 0][length of stream 1]...[length of stream n][first page of stream 0][second page of stream 0]...[last page of stream 0][first page of stream 1][second page of stream 1]

...and so on.

Once again, we have to calculate the size of each array that lists the page numbers for each stream. Luckily, it works exactly as before: the length of each array is given by the size of the stream divided by the page size, rounded up. Now that we have the list of pages that make up each stream, it is a simple matter to write out each stream separately.
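Putting the layout above into code, a sketch of a root-stream decoder (returning, for each stream, its length and the list of pages to read):

```python
import struct

def parse_root_stream(root, page_bytes):
    """Decode the root stream: a dword stream count, then per-stream
    byte lengths, then the concatenated page-number arrays."""
    n_streams, = struct.unpack_from("<I", root, 0)
    lengths = struct.unpack_from("<%dI" % n_streams, root, 4)
    pos = 4 + 4 * n_streams
    streams = []
    for length in lengths:
        n_pages = (length + page_bytes - 1) // page_bytes  # round up
        pages = list(struct.unpack_from("<%dI" % n_pages, root, pos))
        pos += 4 * n_pages
        streams.append((length, pages))
    return streams
```

Reassembling a stream is then just concatenating its pages (each page_bytes long) and truncating to the recorded length.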

Code to do just this can be found here (note: this may go down for a few days next week, as I move things to my new apartment. If anyone knows of a good shared host so I can stop hosting things off my desktop, let me know!) The usual caveats apply: use at your own risk, alpha quality code, and no support for older PDB files (some of the XPSP2 debug symbols use this format; it looks like all of the Vista symbols use the new format, however).

That's all for now! Next time, I'm hoping to present the beginnings of a full-fledged PDB parser, including support for the internal structures of some streams, such as the type information (stream 2) and function symbols (stream 3). Personally, the type information is most interesting to me; extracting this data from ntoskrnl.exe (ntkrnlmp.exe on Vista) allows us access to details of a number of internal kernel structures, which we can then use to interpret the contents of volatile memory. I used this technique with excellent results in my investigation of the VAD tree. Eventually, I hope to be able to output information on Windows kernel structures in a format that can be inserted into Volatility's vtypes.py.

Friday, May 18, 2007

VAD Tools Posted

As promised, I've posted the VAD tools. I'm hosting them on SourceForge right now. Anyone who's interested in memory forensics is encouraged to download the tools and play with them. Suggestions, bug reports, and code contributions would also be greatly appreciated.

Links:

Thursday, May 17, 2007

Oracle Forensics Articles

David Litchfield of NGS Software has put up several excellent articles on Oracle Database Forensics. Litchfield is generally considered to be the most knowledgeable person in the field of database security, so his thoughts on database forensics carry a lot of weight.

So far four articles have been posted:
  1. Oracle Forensics Part 1: Dissecting the Redo Logs
  2. Oracle Forensics Part 2: Locating Dropped Objects
  3. Oracle Forensics Part 3: Isolating Evidence of Attacks Against the Authentication Mechanism
  4. Oracle Forensics Part 4: Live Response
I haven't read through them in detail yet, but it looks like they have a ton of awesome information about the binary format of the redo logs and recovering information that has been deleted (dropped) from the database.

Monday, May 7, 2007

Virtual Address Descriptors (DFRWS 2007)

I've just had my first paper accepted to a conference! The paper, titled The VAD Tree: A Process-Eye View of Physical Memory, describes how to use the Virtual Address Descriptor structure of the Windows kernel to do a variety of nifty things, including list DLLs loaded by the process and categorize a process's memory allocations into shared regions, mapped files, and private allocations. I'll be presenting it at the 7th annual Digital Forensics Research Workshop in Pittsburgh this August.

Here's the abstract:
This paper describes the use of the Virtual Address Descriptor (VAD) tree structure in Windows memory dumps to help guide forensic analysis of Windows memory. We describe how to locate and parse the structure, and show its value in breaking up physical memory into more manageable and semantically meaningful units than can be obtained by simply walking the page directory for the process. Several tools to display information about the VAD tree and dump the memory regions it describes will also be presented.


The tools mentioned above will be released shortly; there's a small amount of cleanup I still need to do. I'll update this post with a link when that happens.

Last, but certainly not least, I want to take this opportunity to give heartfelt thanks to AAron Walters, who helped me out a ton by looking over my paper and giving suggestions. It's a much stronger and more technically interesting work as a result of his help. Also, thanks go out to Andy Bair, who got me interested in all this stuff in the first place.

Saturday, January 20, 2007

Windows Memory Forensics, Part 1

Despite my earlier bold claims that I'd be doing more analysis of "Big Yellow," I'm going to have to renege for now. At work, we recently came across a user who was trying to connect to an external IP upwards of a thousand times per day; some investigation showed that his machine had been compromised by a trojan, and so we started in on incident response. This has left me with very little time to look at Big Yellow, but it did give me something new to write about--Windows memory forensics.

Our first response was to get the user to run a cool little bundle that my friend Andy and I have been working on that collects as much forensically important information from the user's system as possible and uploads it to a server using Webjob. The package includes a bunch of Sysinternals tools, ftimes to get file hashes and search for specific strings, and a version of dd from the Forensic Acquisition Utilities that can dump physical memory. This last bit is what we'll focus on in this article--extracting useful information from a dump of Windows memory. In this first part, I'll deal with what information can be gotten with traditional utilities, and introduce ftimes and XMagic, which, together, can be used to do some really powerful stuff.

The most obvious thing to do is just grep through the raw dump for anything you might be looking for (in our case, we had the domain name of the site that the user was continually attempting to contact), or run strings on it and browse through to see what piques your interest. The nice thing about memory is that things nearby each other in the dump are often (though not always) owned by the same process, and hence related. So if you see, a few bytes away from "evil-hacker.net", a file in the Windows directory that isn't supposed to be there, you might want to investigate it more thoroughly.

Even more powerful things can be done using ftimes (part of the Integrity project). In its "dig" mode, it will go over a file or set of files and search for strings according to criteria you specify. It can search for plain strings (DigStringNormal), regular expressions (DigStringRegexp), or XMagic patterns (DigStringXMagic). Included in the ftimes tarball is a script called hipdig.pl that has several useful predefined regular expressions, including hostnames, IP addresses, password hashes, and a lot more. These can help cut down on the amount of data you have to sift through to get to the stuff that you really need, and time is often critical during incident response.

The coolest part of ftimes in dig mode, by my reckoning, is the ability to dig for XMagic patterns. XMagic is an updated and extended version of the patterns used by the venerable UNIX file(1) command, which are known as "magic". XMagic adds some powerful features to traditional magic, including tests for various cryptographic hashes, entropy, and, of course, perl-compatible regular expressions. In this case, it allows us to find files in memory based on characteristic signatures; for example, here's a signature to find all Windows executables:

# XMagic

0 string = MZ DOS Header
<&0x3c lelong x - \b, PE Offset = %lu <<(0x3c.l) regexp:4 =~ ^PE\x00\x00 \b, Windows PE

You could also just grab the most recent XMagic file and see what it turns up on physical memory; this can be a lot of fun even outside a forensic investigation, just to see what's actually being kept in memory, and for how long.

That's all I have time for right now, but tune in next time, when we really tear into the memory dump, go hunting for kernel memory structures, reconstruct executables from memory, and generally party like it's 1999. As a teaser, here's some XMagic that will find all process structures in a memory dump from an XP SP2 system:

# XMagic

0 string = \x03\x00\x1b\x00 Windows Process
# PDB is not null
<0x18> 0
# flink in kernel space
<<0x50> 0x80000000
# blink in kernel space
<<<0x54> 0x80000000
# SYNCH struct at 0xD8
<<<<0xD8 regexp =~ \x01.\x04.
# SYNCH struct at 0xFC
<<<<<0xFC regexp =~ \x01.\x04.
# Print out some pertinent info
>>>>>>0x84 lelong x - \b, PID %u
>>>>>>0x14C lelong x - \b, PPID %u
# Image Name
>>>>>>0x174 string x - \b, %s