Proving Equivalence of x86 programs with McSema + SeaHorn (II)
McSema

In our previous post, we started talking about our experience using McSema and SeaHorn to prove equivalence between two x86 binary programs, where one is a variant from the other. In this post, we describe through examples the LLVM bitcode generated by McSema. This will help the reader to understand the challenges of reasoning about this bitcode in case she is not familiar with McSema.

McSema is a translator from x86 binary code to LLVM bitcode. McSema works in two stages. First, it disassembles the binary program into a Control Flow Graph (CFG) representation using the mcsema-disass command-line tool. This CFG representation is built with the help of Ida Pro and contains variables, functions, and basic blocks. In the second stage, McSema lifts the CFG representation to LLVM bitcode using the mcsema-lift command-line tool.

The LLVM bitcode lifted by McSema is always well-formed and, therefore, it can be processed by any LLVM-based tool. However, it is similar to disassembled code, and reasoning about it is very challenging. To illustrate this problem, we show a simple C program:

1
2
3
4
5
6
7
8
int main(int argc, char**argv) {
  if (argc > 1) {
    int n = atoi(argv[1]);
    return n;
  } else {
    return 0;
  }
}

and the disassembled code produced by the objdump command-line tool:

0000000000400530 <main>:
  400530:       55                    push   %rbp
  400531:       48 89 e5              mov    %rsp,%rbp
  400534:       48 83 ec 20           sub    $0x20,%rsp
  400538:       c7 45 fc 00 00 00 00  movl   $0x0,-0x4(%rbp)
  40053f:       89 7d f8              mov    %edi,-0x8(%rbp)
  400542:       48 89 75 f0           mov    %rsi,-0x10(%rbp)
  400546:       83 7d f8 01           cmpl   $0x1,-0x8(%rbp)
  40054a:       0f 8e 1b 00 00 00     jle    40056b <main+0x3b>
  400550:       48 8b 45 f0           mov    -0x10(%rbp),%rax
  400554:       48 8b 78 08           mov    0x8(%rax),%rdi
  400558:       e8 d3 fe ff ff        callq  400430 <atoi@plt>
  40055d:       89 45 ec              mov    %eax,-0x14(%rbp)
  400560:       8b 45 ec              mov    -0x14(%rbp),%eax
  400563:       89 45 fc              mov    %eax,-0x4(%rbp)
  400566:       e9 07 00 00 00        jmpq   400572 <main+0x42>
  40056b:       c7 45 fc 00 00 00 00  movl   $0x0,-0x4(%rbp)
  400572:       8b 45 fc              mov    -0x4(%rbp),%eax
  400575:       48 83 c4 20           add    $0x20,%rsp
  400579:       5d                    pop    %rbp
  40057a:       c3                    retq

In the rest of this post, we show relevant fragments of the LLVM bitcode produced by McSema.

Warning: We used McSema version 1.0 because we started this work using that version. The current version of McSema is 2.0.

Our first LLVM bitcode fragment generated by McSema shows the start of the main function:

define x86_64_sysvcc void @sub_400530(%RegState* %reg_state) #1 {
entry:
  %RIP = getelementptr inbounds %RegState, %RegState* %reg_state, i32 0, i32 0 
  %RAX = getelementptr inbounds %RegState, %RegState* %reg_state, i32 0, i32 1 
  …
  %RSP = getelementptr inbounds %RegState, %RegState* %reg_state, i32 0, i32 7
  %RBP = getelementptr inbounds %RegState, %RegState* %reg_state, i32 0, i32 8
  …
  br label %block_400530

The name main has been replaced by sub_400530 where 0x400530 is the initial address of the function. We only show the first few instructions of the entry block. An important difference with respect to the C code is that the parameters of the function are not argc and argv. Instead, the function takes a single input parameter denoted here by %reg_state. This parameter is a pointer to a McSema type RegState, which models the x86 register state. This type is a struct that contains all the CPU registers and status flags such as %RIP (register instruction pointer),%RAX, %RBP (register base pointer), …, %RSP (register stack pointer), %CF (carry flag), %PF (parity flag), %AF (adjust flag), %ZF (zero flag), %SF (sign flag), %OF (overflow flag), %DF (direction flag), etc. All functions except system calls have one parameter of type RegState. The code above contains LLVM getelemenptr instructions to access to each struct field (i.e., register or flag).

Coming back to our disassembled code, the first two instructions form the function prologue or preamble. %RBP is the base pointer, which points to the base of the current stack frame, and %RSP is the stack pointer which points to the top of the current stack frame. The preamble first pushes the old base pointer onto the stack to save it for later. Then it copies the value of the stack pointer to the base pointer. After this, %RBP points to the base of main stack frame. This is the corresponding LLVM bitcode from McSema:

block_400530:
  ;;; push %rbp
  %1 = load i64, i64* %RBP
  %2 = load i64, i64* %RSP
  %3 = sub i64 %2, 8
  %4 = inttoptr i64 %3 to i64*
  store i64 %1, i64* %4
  ;;; mv %rsp, %rbp
  %5 = load i64, i64* %RSP
  store volatile i64 %5, i64* %RBP

Comments in LLVM bitcode are preceded by semicolons. This code already shows key features of the bitcode generated by McSema. There is no distinction between integers and pointers. The contents of the CPU registers are integers that can be cast to pointers (intptr instructions) as needed.

After the prologue, the program allocates space for local variables. The bitcode is shown next. Note that it corresponds to a single assembler instruction!

block_400530: 
  ...
  ;;; sub $0x20,%rsp
  %6 = load i64, i64* %RSP
  %7 = sext i8 32 to i64
  %8 = sub i64 %6, %7
  %9 = xor i64 %8, %6
  %10 = xor i64 %9, %7
  %11 = and i64 %10, 16
  %12 = icmp ne i64 %11, 0
  %13 = zext i1 %12 to i8
  store volatile i8 %13, i8* %AF
  %14 = trunc i64 %8 to i8
  %15 = call i8 @llvm.ctpop.i8(i8 %14)
  %16 = trunc i8 %15 to i1
  %17 = zext i1 %16 to i8
  %18 = xor i8 %17, 1
  %19 = trunc i8 %18 to i1
  %20 = zext i1 %19 to i8
  store volatile i8 %20, i8* %PF 
  %21 = icmp eq i64 %8, 0
  %22 = zext i1 %21 to i8
  store volatile i8 %22, i8* %ZF
  %23 = lshr i64 %8, 63
  %24 = trunc i64 %23 to i1
  %25 = zext i1 %24 to i8
  %26 = trunc i8 %25 to i1
  %27 = zext i1 %26 to i8
  store volatile i8 %27, i8* %SF
  %28 = icmp ult i64 %6, %7
  %29 = zext i1 %28 to i8
  store volatile i8 %29, i8* %CF
  %30 = xor i64 %6, %7
  %31 = xor i64 %6, %8
  %32 = and i64 %30, %31
  %33 = lshr i64 %32, 63
  %34 = trunc i64 %33 to i1
  %35 = zext i1 %34 to i8
  %36 = trunc i8 %35 to i1
  %37 = zext i1 %36 to i8
  store volatile i8 %37, i8* %OF
  store volatile i64 %8, i64* %RSP

Because the stack grows downwards, the allocation of local variables is done by subtracting 32 bytes (0x20) from the top of the stack. The first three instructions correspond to reading the content of %RSP (%6 = load i64, i64* %RSP) and then subtracting 32 (%8 = sub i64 %6, %7) to that after performing a signed extension to 64 bits (%7 = sext i8 32 to i64). The rest of the code above, except the last instruction that stores the resulting value in %RSP (store volatile i64 %8, i64* %RSP), updates the x86 status flags which are also part of the bitcode generated by McSema (%AF, %PF, %ZF, %SF, %CF, and %OF). This is another interesting aspect of McSema’s translation to bitcode. As part of the translation, it explicit adds code to update the x86 status flags which are part of the x86 semantics but they are not present explicitly in assembly code.

The bitcode above illustrates how arithmetic instructions and local variables are translated by McSema. Note that the translated code is pretty similar to the assembly version, and local variables are offsets relative to %RBP. In other words, McSema does not attempt to recover stack frames. This impedes LLVM optimizations to generate a more efficient bitcode and has a huge impact on the scalability of the verification task.

Next, we show how jumps are translated by McSema. The two commented x86 instructions in the next bitcode perform if 1 <= -0x8(%rbp) then goto 40056b else goto 400550 where the “less than or equal” is a signed comparison:

block_400530:
  ...
  ;;;;  cmpl   $0x1,-0x8(%rbp)
  ;;;;  jle    40056b <main+0x3b>
  %53 = load i64, i64* %RBP
  %54 = add i64 %53, -8
  %55 = inttoptr i64 %54 to i64*
  %56 = ptrtoint i64* %55 to i64
  %57 = inttoptr i64 %56 to i32*
  %58 = load i32, i32* %57
  %59 = sub i32 %58, 1
  ;;; skipped code that updates %AF
  ;;; skipped code that updates %PF
  %72 = icmp eq i32 %59, 0
  %73 = zext i1 %72 to i8
  store volatile i8 %73, i8* %ZF
  %74 = lshr i32 %59, 31
  %75 = trunc i32 %74 to i1
  %76 = zext i1 %75 to i8
  %77 = trunc i8 %76 to i1
  %78 = zext i1 %77 to i8
  store volatile i8 %78, i8* %SF
  ;;; skipped code that updates %CF
  %81 = xor i32 %58, 1
  %82 = xor i32 %58, %59
  %83 = and i32 %81, %82
  %84 = lshr i32 %83, 31
  %85 = trunc i32 %84 to i1
  %86 = zext i1 %85 to i8
  %87 = trunc i8 %86 to i1
  %88 = zext i1 %87 to i8
  store volatile i8 %88, i8* %OF
  %89 = load i8, i8* %ZF
  %90 = trunc i8 %89 to i1
  %91 = icmp eq i1 %90, true
  %92 = load i8, i8* %SF
  %93 = trunc i8 %92 to i1
  %94 = load i8, i8* %OF
  %95 = trunc i8 %94 to i1
  %96 = icmp eq i1 %93, %95
  %97 = icmp eq i1 %96, false
  %98 = or i1 %91, %97
  br i1 %98, label %block_40056b, label %block_400550

This corresponds to the C branch if argc > 1. The two x86 instructions are again translated to a large number of LLVM instructions. Note that the terminator of the LLVM block will transfer the control to %block_40056b if the condition %ZF or (%SF != %OF) holds, and otherwise it will transfer the control to %block_400550.

Our next bitcode snippet shows the LLVM bitcode for the system call to function atoi (last line):

%block_400550:
  ...
  ;;; mov    -0x10(%rbp),%rax
  %99 = load i64, i64* %RBP
  %100 = add i64 %99, -16
  %101 = inttoptr i64 %100 to i64*
  %102 = load i64, i64* %101
  store volatile i64 %102, i64* %RAX
  ;;; mov 0x8(%rax),%rdi
  %103 = load i64, i64* %RAX
  %104 = add i64 %103, 8
  %105 = inttoptr i64 %104 to i64*
  %106 = load i64, i64* %105
  store volatile i64 %106, i64* %RDI
  ;;;; callq  400430 <atoi@plt>
  %107 = load i64, i64* %RDI
  %111 = call x86_64_sysvcc i64 @_atoi(i64 %107)

Before the call, the program stores the actual parameters into some predefined registers following a particular calling convention. In our case, the AMD64 ABI states that the first six arguments are stored in %RDI, %RSI, %RDX, %RCX, %R8, %R9. If a call has more than six arguments, the rest are stored on the stack. The LLVM bitcode above performs the following steps:

  1. Store the base address of argv in %RAX. Recall that the variable argc is in the current frame at offset -8 while argv is at offset -16. The bitcode reads the content of %RBP, subtract 16, casts it to a pointer, reads the content of the pointer, and finally, stores the content in %RAX.

  2. Store argv[1] into %RDI. The LLVM bitcode is pretty similar to Step 1 but it reads first %RAX to get access to &argv[0], adds 8 to it in order to access to &argv[1], casts it to a pointer, reading it, and stores the result in %RDI.

  3. Read the content of %RDI and pass it to the __atoi call.

Calls to user-defined functions are pretty similar. The only difference is that these functions have only one parameter of type RegState containing all the CPU registers and status flags.

Finally, we show how global variables are translated. A problem we faced is that McSema translates all ELF sections that can be loaded into memory into a LLVM global variable. This can produce many irrelevant global variables. We show first the loaded sections as identified by the readelf -lS command-line tool:

Section to Segment mapping:
   …
   02   .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r
        .rela.dyn .rela.plt .init.plt .text .fini .rodata .eh_frame_hdr .eh_frame 
   03   .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss
   …

.rodata        PROGBITS      400600
.eh_frame_hdr  PROGBITS      400604
.eh_frame      PROGBITS      400638
.init_array    INIT_ARRAY    600e10 
.fini_array    FINI_ARRAY    600e18 
.jcr           PROGBITS      600e20 
.dynamic       DYNAMIC       600e28 
.got           PROGBITS      600ff8 
.got.plt       PROGBITS      601000 
.data          PROGBITS      601030 
.bss           NOBITS        601040 

From those, McSema produces the following LLVM bitcode:

  @data_400600 = internal constant %0 <{ [4 x i8] c"\01\00\02\00" }>, align 64
  @data_600e10 = internal global %1 <{ void ()* @callback_sub_400500 }>, align 64
  @data_600e18 = internal global %2 <{ void ()* @callback_sub_4004e0 }>, align 64
  @data_600e20 = internal global %3 zeroinitializer, align 64
  @data_600ff8 = internal global %4 <{ i64 ()* @__gmon_start__ }>, align 64
  @data_601000 = internal global %5 <{ [24 x i8] zeroinitializer, i64 (i64, i64, i64, i64, i64, i64, i64, i64)* 
  	                     @__libc_start_main, i64 ()* @__gmon_start__, i64 (i64)* @atoi }>, align 64
  @data_601030 = internal global %6 zeroinitializer, align 64
  @data_601040 = internal global %7 zeroinitializer, align 64

Note how McSema maps each loaded section starting at address X to a LLVM global variable with name data_X.

In summary, we have shown how stack variables, arithmetic operations, functions, jump, and global variables are translated by McSema to LLVM bitcode. Reasoning about McSema bitcode will be challenging because of the various issues with McSema mentioned previously: no type information, every register can be both an integer and a pointer, low-level bitcode to simulate process flags, etc.

In our next and last post, we will explain how SeaHorn can prove equivalence between the bitcode of thttpd and the bitcode of a variant obtained by applying a binary transformation based on block-level instruction layout randomization.

*****
Written by SeaHorn on 12 December 2018