In our previous post, we started talking about our experience using McSema and SeaHorn to prove equivalence between two x86 binary programs, where one is a variant from the other. In this post, we describe through examples the LLVM bitcode generated by McSema. This will help the reader to understand the challenges of reasoning about this bitcode in case she is not familiar with McSema.
McSema is a translator from
x86 binary code to LLVM bitcode. McSema works in
two stages. First, it disassembles the binary program into a Control
Flow Graph (CFG) representation using the mcsema-disass
command-line
tool. This CFG representation is built with the help
of Ida Pro and contains
variables, functions, and basic blocks. In the second stage, McSema
lifts the CFG representation to LLVM bitcode using the mcsema-lift
command-line tool.
The LLVM bitcode lifted by McSema is always well-formed and, therefore, it can be processed by any LLVM-based tool. However, it is similar to disassembled code, and reasoning about it is very challenging. To illustrate this problem, we show a simple C program:
1
2
3
4
5
6
7
8
int main(int argc, char**argv) {
if (argc > 1) {
int n = atoi(argv[1]);
return n;
} else {
return 0;
}
}
and the disassembled code produced by the objdump
command-line tool:
0000000000400530 <main>:
400530: 55 push %rbp
400531: 48 89 e5 mov %rsp,%rbp
400534: 48 83 ec 20 sub $0x20,%rsp
400538: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
40053f: 89 7d f8 mov %edi,-0x8(%rbp)
400542: 48 89 75 f0 mov %rsi,-0x10(%rbp)
400546: 83 7d f8 01 cmpl $0x1,-0x8(%rbp)
40054a: 0f 8e 1b 00 00 00 jle 40056b <main+0x3b>
400550: 48 8b 45 f0 mov -0x10(%rbp),%rax
400554: 48 8b 78 08 mov 0x8(%rax),%rdi
400558: e8 d3 fe ff ff callq 400430 <atoi@plt>
40055d: 89 45 ec mov %eax,-0x14(%rbp)
400560: 8b 45 ec mov -0x14(%rbp),%eax
400563: 89 45 fc mov %eax,-0x4(%rbp)
400566: e9 07 00 00 00 jmpq 400572 <main+0x42>
40056b: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
400572: 8b 45 fc mov -0x4(%rbp),%eax
400575: 48 83 c4 20 add $0x20,%rsp
400579: 5d pop %rbp
40057a: c3 retq
In the rest of this post, we show relevant fragments of the LLVM bitcode produced by McSema.
Warning: We used McSema version 1.0 because we started this work using that version. The current version of McSema is 2.0.
Our first LLVM bitcode fragment generated by McSema shows the start of the main function:
define x86_64_sysvcc void @sub_400530(%RegState* %reg_state) #1 {
entry:
%RIP = getelementptr inbounds %RegState, %RegState* %reg_state, i32 0, i32 0
%RAX = getelementptr inbounds %RegState, %RegState* %reg_state, i32 0, i32 1
…
%RSP = getelementptr inbounds %RegState, %RegState* %reg_state, i32 0, i32 7
%RBP = getelementptr inbounds %RegState, %RegState* %reg_state, i32 0, i32 8
…
br label %block_400530
The name main
has been replaced by sub_400530
where 0x400530
is
the initial address of the function. We only show the first few
instructions of the entry block. An important difference with respect
to the C code is that the parameters of the function are not argc
and argv
. Instead, the function takes a single input parameter
denoted here by %reg_state
. This parameter is a pointer to a McSema
type RegState
, which models the x86 register state. This type is a
struct that contains all the CPU registers and status flags such as
%RIP
(register instruction pointer),%RAX
, %RBP
(register base
pointer), …, %RSP
(register stack pointer), %CF
(carry flag),
%PF
(parity flag), %AF
(adjust flag), %ZF
(zero flag), %SF
(sign flag), %OF
(overflow flag), %DF
(direction flag), etc. All
functions except system calls have one parameter of type
RegState
. The code above contains LLVM getelemenptr
instructions to
access to each struct field (i.e., register or flag).
Coming back to our disassembled code, the first two instructions form
the function prologue or preamble. %RBP
is the base pointer, which
points to the base of the current stack frame, and %RSP
is the stack
pointer which points to the top of the current stack frame. The
preamble first pushes the old base pointer onto the stack to save it
for later. Then it copies the value of the stack pointer to the base
pointer. After this, %RBP
points to the base of main stack
frame. This is the corresponding LLVM bitcode from McSema:
block_400530:
;;; push %rbp
%1 = load i64, i64* %RBP
%2 = load i64, i64* %RSP
%3 = sub i64 %2, 8
%4 = inttoptr i64 %3 to i64*
store i64 %1, i64* %4
;;; mv %rsp, %rbp
%5 = load i64, i64* %RSP
store volatile i64 %5, i64* %RBP
Comments in LLVM bitcode are preceded by semicolons. This code
already shows key features of the bitcode generated by McSema. There
is no distinction between integers and pointers. The contents of the
CPU registers are integers that can be cast to pointers (intptr
instructions) as needed.
After the prologue, the program allocates space for local variables. The bitcode is shown next. Note that it corresponds to a single assembler instruction!
block_400530:
...
;;; sub $0x20,%rsp
%6 = load i64, i64* %RSP
%7 = sext i8 32 to i64
%8 = sub i64 %6, %7
%9 = xor i64 %8, %6
%10 = xor i64 %9, %7
%11 = and i64 %10, 16
%12 = icmp ne i64 %11, 0
%13 = zext i1 %12 to i8
store volatile i8 %13, i8* %AF
%14 = trunc i64 %8 to i8
%15 = call i8 @llvm.ctpop.i8(i8 %14)
%16 = trunc i8 %15 to i1
%17 = zext i1 %16 to i8
%18 = xor i8 %17, 1
%19 = trunc i8 %18 to i1
%20 = zext i1 %19 to i8
store volatile i8 %20, i8* %PF
%21 = icmp eq i64 %8, 0
%22 = zext i1 %21 to i8
store volatile i8 %22, i8* %ZF
%23 = lshr i64 %8, 63
%24 = trunc i64 %23 to i1
%25 = zext i1 %24 to i8
%26 = trunc i8 %25 to i1
%27 = zext i1 %26 to i8
store volatile i8 %27, i8* %SF
%28 = icmp ult i64 %6, %7
%29 = zext i1 %28 to i8
store volatile i8 %29, i8* %CF
%30 = xor i64 %6, %7
%31 = xor i64 %6, %8
%32 = and i64 %30, %31
%33 = lshr i64 %32, 63
%34 = trunc i64 %33 to i1
%35 = zext i1 %34 to i8
%36 = trunc i8 %35 to i1
%37 = zext i1 %36 to i8
store volatile i8 %37, i8* %OF
store volatile i64 %8, i64* %RSP
Because the stack grows downwards, the allocation of local variables
is done by subtracting 32 bytes (0x20
) from the top of the stack.
The first three instructions correspond to reading the content of
%RSP
(%6 = load i64, i64* %RSP
) and then subtracting 32 (%8 = sub
i64 %6, %7
) to that after performing a signed extension to 64 bits
(%7 = sext i8 32 to i64
). The rest of the code above, except the
last instruction that stores the resulting value in %RSP
(store
volatile i64 %8, i64* %RSP
), updates the x86 status flags which are
also part of the bitcode generated by McSema (%AF
, %PF
, %ZF
,
%SF
, %CF
, and %OF
). This is another interesting aspect of
McSema’s translation to bitcode. As part of the translation, it
explicit adds code to update the x86 status flags which are part of
the x86 semantics but they are not present explicitly in assembly
code.
The bitcode above illustrates how arithmetic instructions and local
variables are translated by McSema. Note that the translated code is
pretty similar to the assembly version, and local variables are
offsets relative to %RBP
. In other words, McSema does not attempt to
recover stack frames. This impedes LLVM optimizations to generate a
more efficient bitcode and has a huge impact on the scalability of the
verification task.
Next, we show how jumps are translated by McSema. The two commented
x86 instructions in the next bitcode perform if 1 <= -0x8(%rbp) then
goto 40056b else goto 400550
where the “less than or equal” is a
signed comparison:
block_400530:
...
;;;; cmpl $0x1,-0x8(%rbp)
;;;; jle 40056b <main+0x3b>
%53 = load i64, i64* %RBP
%54 = add i64 %53, -8
%55 = inttoptr i64 %54 to i64*
%56 = ptrtoint i64* %55 to i64
%57 = inttoptr i64 %56 to i32*
%58 = load i32, i32* %57
%59 = sub i32 %58, 1
;;; skipped code that updates %AF
;;; skipped code that updates %PF
%72 = icmp eq i32 %59, 0
%73 = zext i1 %72 to i8
store volatile i8 %73, i8* %ZF
%74 = lshr i32 %59, 31
%75 = trunc i32 %74 to i1
%76 = zext i1 %75 to i8
%77 = trunc i8 %76 to i1
%78 = zext i1 %77 to i8
store volatile i8 %78, i8* %SF
;;; skipped code that updates %CF
%81 = xor i32 %58, 1
%82 = xor i32 %58, %59
%83 = and i32 %81, %82
%84 = lshr i32 %83, 31
%85 = trunc i32 %84 to i1
%86 = zext i1 %85 to i8
%87 = trunc i8 %86 to i1
%88 = zext i1 %87 to i8
store volatile i8 %88, i8* %OF
%89 = load i8, i8* %ZF
%90 = trunc i8 %89 to i1
%91 = icmp eq i1 %90, true
%92 = load i8, i8* %SF
%93 = trunc i8 %92 to i1
%94 = load i8, i8* %OF
%95 = trunc i8 %94 to i1
%96 = icmp eq i1 %93, %95
%97 = icmp eq i1 %96, false
%98 = or i1 %91, %97
br i1 %98, label %block_40056b, label %block_400550
This corresponds to the C branch if argc > 1
. The two x86
instructions are again translated to a large number of LLVM
instructions. Note that the terminator of the LLVM block will transfer
the control to %block_40056b
if the condition %ZF or (%SF != %OF)
holds, and otherwise it will transfer the control to %block_400550
.
Our next bitcode snippet shows the LLVM bitcode for the system call to
function atoi
(last line):
%block_400550:
...
;;; mov -0x10(%rbp),%rax
%99 = load i64, i64* %RBP
%100 = add i64 %99, -16
%101 = inttoptr i64 %100 to i64*
%102 = load i64, i64* %101
store volatile i64 %102, i64* %RAX
;;; mov 0x8(%rax),%rdi
%103 = load i64, i64* %RAX
%104 = add i64 %103, 8
%105 = inttoptr i64 %104 to i64*
%106 = load i64, i64* %105
store volatile i64 %106, i64* %RDI
;;;; callq 400430 <atoi@plt>
%107 = load i64, i64* %RDI
%111 = call x86_64_sysvcc i64 @_atoi(i64 %107)
Before the call, the program stores the actual parameters into some
predefined registers following a particular calling convention. In our
case, the AMD64 ABI states that the first six arguments are stored in
%RDI
, %RSI
, %RDX
, %RCX
, %R8
, %R9
. If a call has more than
six arguments, the rest are stored on the stack. The LLVM bitcode
above performs the following steps:
-
Store the base address of
argv
in%RAX
. Recall that the variableargc
is in the current frame at offset -8 whileargv
is at offset -16. The bitcode reads the content of%RBP
, subtract 16, casts it to a pointer, reads the content of the pointer, and finally, stores the content in%RAX
. -
Store
argv[1]
into%RDI
. The LLVM bitcode is pretty similar to Step 1 but it reads first%RAX
to get access to&argv[0]
, adds 8 to it in order to access to&argv[1]
, casts it to a pointer, reading it, and stores the result in%RDI
. -
Read the content of
%RDI
and pass it to the__atoi
call.
Calls to user-defined functions are pretty similar. The only
difference is that these functions have only one parameter of type
RegState
containing all the CPU registers and status flags.
Finally, we show how global variables are translated. A problem we
faced is that McSema translates all ELF sections that can be loaded
into memory into a LLVM global variable. This can produce many
irrelevant global variables. We show first the loaded sections as
identified by the readelf -lS
command-line tool:
Section to Segment mapping:
…
02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r
.rela.dyn .rela.plt .init.plt .text .fini .rodata .eh_frame_hdr .eh_frame
03 .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss
…
.rodata PROGBITS 400600
.eh_frame_hdr PROGBITS 400604
.eh_frame PROGBITS 400638
.init_array INIT_ARRAY 600e10
.fini_array FINI_ARRAY 600e18
.jcr PROGBITS 600e20
.dynamic DYNAMIC 600e28
.got PROGBITS 600ff8
.got.plt PROGBITS 601000
.data PROGBITS 601030
.bss NOBITS 601040
From those, McSema produces the following LLVM bitcode:
@data_400600 = internal constant %0 <{ [4 x i8] c"\01\00\02\00" }>, align 64
@data_600e10 = internal global %1 <{ void ()* @callback_sub_400500 }>, align 64
@data_600e18 = internal global %2 <{ void ()* @callback_sub_4004e0 }>, align 64
@data_600e20 = internal global %3 zeroinitializer, align 64
@data_600ff8 = internal global %4 <{ i64 ()* @__gmon_start__ }>, align 64
@data_601000 = internal global %5 <{ [24 x i8] zeroinitializer, i64 (i64, i64, i64, i64, i64, i64, i64, i64)*
@__libc_start_main, i64 ()* @__gmon_start__, i64 (i64)* @atoi }>, align 64
@data_601030 = internal global %6 zeroinitializer, align 64
@data_601040 = internal global %7 zeroinitializer, align 64
Note how McSema maps each loaded section starting at address X
to a
LLVM global variable with name data_X
.
In summary, we have shown how stack variables, arithmetic operations, functions, jump, and global variables are translated by McSema to LLVM bitcode. Reasoning about McSema bitcode will be challenging because of the various issues with McSema mentioned previously: no type information, every register can be both an integer and a pointer, low-level bitcode to simulate process flags, etc.
In our next and last post, we will explain how SeaHorn can prove equivalence between the bitcode of thttpd and the bitcode of a variant obtained by applying a binary transformation based on block-level instruction layout randomization.