Reverse Engineering C++ Malware With IDA Pro: Classes, Constructors, and Structs


Once C++ code has been compiled, the concept of classes and instantiation is lost, and class instances are "flattened" into structs in memory that contain the class variables. Correctly identifying and defining these structs in IDA is the key to reverse engineering C++ code. In this tutorial we cover the basics of identifying C++ structs in IDA and provide quick tips to speed up your C++ reverse engineering.

C++ Class Variables

We will use the following simple C++ class as the basis for our tutorial. The class Rectangle has two instance variables, width and height. It also has two public functions that act on these variables: set_values, which sets the width and height, and area, which returns the area of the rectangle by multiplying the width by the height.

#include "stdafx.h"
#include <iostream>
using namespace std;

class Rectangle {
	int width, height;
public:
	void set_values(int, int);
	int area() { return width*height; }
};

void Rectangle::set_values(int x, int y) {
	width = x;
	height = y;
}

int _tmain(int argc, _TCHAR* argv[]){
	Rectangle rect;
	rect.set_values(3, 4);
	cout << "rectangle: 3 x 4 ";
	cout << "area: " << rect.area();
	return 0;
}

When this example code is compiled, the instantiation Rectangle rect; is reduced to two adjacent int variables on the stack. An illustration of this is shown below in Figure 1.

Figure 1 Compiled C++ Class 
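
The flattening described above can be checked from the source side. The following is a minimal sketch, assuming a mainstream compiler with 4-byte int and no padding between the two members. RectangleLayout and read_int_at are hypothetical names introduced for illustration; the members are made public only so their offsets can be inspected.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <cstring>   // std::memcpy

// Hypothetical stand-in for the tutorial's Rectangle class. The members are
// public here only so that their offsets can be inspected.
struct RectangleLayout {
    int width;
    int height;
};

// On typical compilers the object is just two ints: no hidden fields or padding.
static_assert(sizeof(RectangleLayout) == 2 * sizeof(int), "unexpected padding");
static_assert(offsetof(RectangleLayout, width) == 0, "width is the first field");
static_assert(offsetof(RectangleLayout, height) == sizeof(int), "height follows width");

// Reads the int stored at a byte offset inside the object, which is how the
// compiled code addresses the fields once the class definition is gone.
int read_int_at(const RectangleLayout& r, std::size_t offset) {
    assert(offset + sizeof(int) <= sizeof(RectangleLayout));
    int value;
    std::memcpy(&value, reinterpret_cast<const unsigned char*>(&r) + offset, sizeof value);
    return value;
}
```

Reading at offset 0 yields the width and reading at offset sizeof(int) yields the height, which matches the two stack slots in Figure 1.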

Structs In IDA Pro

When IDA first decompiles C++ code that contains these "flattened" class instantiation structs, it is not able to identify the structs automatically. This results in decompiled code that is difficult to understand, often with variables that seem to appear with no prior assignment. An example of this is shown below.

int __thiscall sub_414D40(_DWORD *this)
{
  return this[1] * *this;
}

When analyzing decompiled C++ code, a quick way to identify these class instantiation structs is to look for variables whose address is passed (using the & operator) as the first argument to a function. By convention, the first argument passed to a class member function is a pointer to the class instantiation struct (the this pointer). An example of this is shown below.

char v5; // [esp+D0h] [ebp-10h]@1
sub_41139D(&v5, 3, 4);
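
To see why the address of the struct shows up as the first argument, it can help to write the member functions the way the compiler effectively treats them. This is a hand-written sketch, not decompiler output, and the names RectangleData, rectangle_set_values, and rectangle_area are hypothetical.

```cpp
#include <cassert>

// Hypothetical plain-struct view of a Rectangle instance.
struct RectangleData {
    int width;
    int height;
};

// Roughly what Rectangle::set_values becomes after compilation: an ordinary
// function whose first parameter is a pointer to the object.
void rectangle_set_values(RectangleData* self, int x, int y) {
    assert(self != nullptr);
    self->width = x;
    self->height = y;
}

// Likewise for Rectangle::area: the only input is the object pointer.
int rectangle_area(const RectangleData* self) {
    assert(self != nullptr);
    return self->width * self->height;
}
```

A call such as rectangle_set_values(&rect, 3, 4) has the same shape as the decompiled call sub_41139D(&v5, 3, 4) shown above.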

Once the struct has been identified, we can use a simple trick to make IDA do the heavy work of creating the struct definition with the correct size. First, locate the first function where values are assigned to the struct. This is often the constructor for the class, but in some cases it may simply be a "setter" function. In that function, right-click the first variable in the struct and select Reset pointer type. Once the variable pointer type has been reset, right-click the variable a second time and select Create new struct type. IDA will then attempt to determine how large the struct is, and you will be presented with an editable struct definition. The struct for our initial Rectangle example would look like the following.

struct struct_rectangle
{
  _DWORD dword0;
  _DWORD dword4;
};

Once the struct has been defined, every function that is passed the struct must have its function type edited to reflect that a struct is being passed to it and not a void pointer. Function types can be edited by selecting the function and pressing the y key. An example of the above rectangle struct being added to a function type is shown below in Figure 2.

Figure 2 Function Type Definition in IDA Pro

The struct members (variables in the struct) can be renamed as their purpose is identified. As more struct members are identified, the reverse engineering process becomes more intuitive. For example, once the two members of our Rectangle struct have been identified as the x and y of the rectangle, our initially confusing function becomes the following easy-to-understand code.

int __thiscall sub_414D40(struct_rectangle *this)
{
  return this->y * this->x;
}
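
The typed and untyped listings describe the same computation, which can be sketched in ordinary C++. The example below assumes that _DWORD corresponds to a 32-bit unsigned integer. The function names area_raw and area_typed are hypothetical, and x and y are the renamed dword0 and dword4 fields.

```cpp
#include <cassert>
#include <cstdint>

// The struct after renaming dword0 -> x and dword4 -> y.
struct struct_rectangle {
    std::uint32_t x;  // offset 0
    std::uint32_t y;  // offset 4
};

// Untyped view: mirrors `this[1] * *this` from the first decompiled listing.
int area_raw(const std::uint32_t* self) {
    assert(self != nullptr);
    return static_cast<int>(self[1] * self[0]);
}

// Typed view: mirrors `this->y * this->x` once the struct type is applied.
int area_typed(const struct_rectangle* self) {
    assert(self != nullptr);
    return static_cast<int>(self->y * self->x);
}
```

Both functions read the same two 32-bit values at offsets 0 and 4; the only difference is whether the fields carry names.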

Structs can be edited at any time using either the Structures window or the Local types window as shown below in Figure 3.

The Local types window is easier to edit as it provides a C++ style editor. Structs that only appear in the Structures window can be copied into the Local types window for easier editing.