Proving Equivalence of x86 programs with McSema + SeaHorn (I)
Overview

We have been working together with SRI folks, Bruno Dutertre and Ian Mason, on proving program equivalence of x86 programs using SeaHorn. This post is the first of a series of posts.

First, a bit of background. The N-variant framework is a method for cyber security based on artificial diversity. The idea is to produce N variants of a program and execute all the variants with the same inputs while monitoring their outputs. A successful attacker requires to simultaneously compromise all system variants with the same input. Of course, a highly desired property is to ensure that the original and the variants are indeed functionally equivalent. Otherwise, the security of the whole system can be compromised.

Under the DARPA’s Cyber Fault-tolerant Attack Recovery (CFAR) program, folks at University of Virginia have built a new infrastructure that implements the N-variant framework called DoubleHelix+RAVEN. DoubleHelix is the component that generates the N variants, and Raven is the infrastructure needed to monitor the execution of the variants. The purpose of the equivalence proofs is to establish that a DoubleHelix binary transform results in a variant that is functionally equivalent to the original when not under attack. We start with two different x86 binaries: the original code and a variant.

The first step in our analysis is to convert both binaries to LLVM bitcode. We then merge the two LLVM bitcode files into a single file. The verification is done one function at a time. For every function in the original binary, we identify the corresponding function in the variant. Both of them are converted to two functions in the LLVM bitcode. We merge the bitcode for both functions and we perform an equivalence check for each function pair. We assume that both the original and variant start from identical global states and we prove that executing both functions leads them to identical states. This check is performed by SeaHorn using our bounded model checking (BMC) engine. In general, BMC searches for finite executions of a system that start from initial state and violate a safety property of interest. When applied to software, BMC amounts to setting an upper bound on loops and recursive function calls to set a limit on the length of execution traces. Given such a bound, the search for executions that violate a safety property is expressible as a satisfiability modulo theories (SMT) problem. Since we need to be bit precise to reason about binaries, all the verification conditions generated by BMC are in the theory QF_AUFBV, which means quantifier-free formulas, with constraints on bit-vectors, arrays, and uninterpreted functions.

This is the verification tool chain we have built:

McSema+SeaHorn for Proving Equivalence of x86 Programs

We use the McSema tool for converting x86 binaries to LLVM bitcode and then use SeaHorn’s BMC engine to perform verification. By default, SeaHorn uses the Z3 SMT solver via its API. Optionally, SeaHorn can also produce a textual representation of the BMC problem (in the SMT-LIB format), which allows one to use any off-the-shelf solver. In addition to Z3, we have used Yices 2 and Boolector. The results produced by the SMT solver are analyzed by SeaHorn. If the solver finds a solution, there is an execution that violates the safety property. In our case, this execution shows that the original function and its variant have different behaviors. When the internal Z3 solver is used, SeaHorn can convert the trace produced by Z3 into an executable counterexample, as described in one of our previous posts.

In the next posts, we will describe how the LLVM bitcode generated by McSema looks like and how we can prove equivalence between the binary code of thttpd and a variant obtained by applying a binary transformation based on block-level instruction layout randomization.

*****
Written by SeaHorn on 12 December 2018