Proving Memory Safety with SeaHorn
how to automatically prove memory safety?

A program is memory safe if the following memory access errors never occur:

In this post, we will focus only on buffer overflows. A buffer overflow occurs when a program while reading or writing from/to a buffer, access outside the boundaries out-of-bound of the buffer. The Array Bounds Checking (ABC) problem focuses on whether there is a pointer that can never access to a memory address which has not been previously allocated. In the rest of this post, we will show how we can use SeaHorn to tackle the ABC problem.

Consider we want to prove that all the array accesses are in-bounds for the next example abc1.c:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
#include "seahorn.h"
extern int nd();
#define N 10
int main(int argc, char**argv) {
  int i;
  int a[N];
  for (i = 0; i < N; i++)  {
    a[i] = i;
  }

#ifndef ERROR
  printf("%d\n", a[i-1]);
#else
  printf("%d\n", a[i]);
#endif   
  return 0;
}

To do that, we can instrument abc1.c by adding two ghost variables offset and size per each pointer p. The variable offset is the distance from the base address of the memory object that contains p to the actual p, and size is the actual size of the allocated memory for p. In addition, we need to instrument each array access to update offset accordingly while adding the in-bound assertions that we would like to prove. We call this instrumented program abc1_inst.c:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
#include "seahorn.h"
extern int nd();

#define N 10

int main(int argc, char**argv)
{
  int i;
  int a[N];

  // ghost variables
  int offset;
  unsigned int size;

  // initialization of the ghost variables
  offset = 0;
  size = sizeof(a);

  for (i = 0; i < N; i++) {
    // update offset
    offset = sizeof(int)*i;
	// check the array write is in-bounds
    sassert (offset < size);
    sassert (offset >= 0);

    a[i] = i;
  }

#ifndef ERROR
  // update offset
  offset = sizeof(int)*(i-1);  
  // check the array read is in-bounds
  sassert (offset < size);
  sassert (offset >= 0);

  printf("%d\n", a[i-1]);
#else
  // update offset
  offset = sizeof(int)*i;  
  // check the array read is in-bounds
  sassert (offset < size);
  sassert (offset >= 0);

  printf("%d\n", a[i]);
#endif   
  return 0;
}

Now, we are ready to prove abc1_inst.c:

sea pf -O0 abc1_inst.c --show-invars

SeaHorn can prove the program is safe! (i.e., all array accesses are in-bounds). The invariants produced by SeaHorn are:

unsat
Function: sassert
sassert@_call: true
sassert@_ret: true
Function: main
main@entry: true
main@_bb:
	(main@%i.0.i>=0)
	(main@%i.0.i<=10)
main@verifier.error.split: false

Let us also try a buggy version where the array index i is accessed when i=10:

sea pf -O0 abc1_inst.c -DERROR --show-invars

SeaHorn finds a counterexample of length 10.

As you can see, this instrumentation can be done manually in a straightforward manner if there is not many pointers in the program. Otherwise, it can be a very tedious process. Apart from that, the instrumentation can get really complicated with functions and specially when pointers are stored in memory.

Consider another program called abc2.c similar to abc1.c but where two arrays a and b are initialized instead of one:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
#include "seahorn.h"
#include <stdint.h>

extern int nd();

#define N 10

int main(int argc, char**argv) {

  int8_t a[N];
  int8_t b[N];

  int i;
  for (i = 0; i < N; i++) {
    a[i] = i;
  }

  int j;
  for (j = 0; j < N; j++) {
    b[j] = j;
  }

#ifndef ERROR
  printf("%d\n", a[i-1]);
  printf("%d\n", b[i-1]);  
#else
  printf("%d\n", b[i]);    
#endif

  return 0;
}

We propose now a simpler instrumentation that does not need to instrument each pointer individually. The idea is to choose non-deterministically a pointer and keep track of the base address and size of the region to which the pointer points to. The key insight is that the solver takes care of resolving the non-determinism. The instrumentation adds only four global variables:

The key advantage is that we keep track of only four variables regardless of the number of pointers in the program. Moreover, it is really easy to instrument pointers that are stored in memory. We show the instrumented program called abc2_inst.c following this global encoding:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
#include "seahorn.h"
#include <stddef.h>
#include <stdint.h>

extern int nd();

#define N 10

typedef ptrdiff_t sea_ptrdiff_t;
typedef ptrdiff_t sea_size_t;

extern int nd(void);
extern int8_t *nd_int8_ptr (void);
extern sea_size_t nd_sea_size_t (void);

static int8_t* base;
static int8_t* ptr;
static sea_ptrdiff_t offset;
static sea_size_t size;

int main(int argc, char**argv)
{

  // -- init
  base = nd_int8_ptr ();
  assume (base > 0);
  size = nd_sea_size_t ();
  assume (size >= 0);
  ptr = 0;  
  offset = 0;

  int8_t a[N];

  // -- allocation of a
  int8_t *b_a  = &(a[0]);  
  if (!ptr  && (b_a == base)) {
    // ptr = base;
    ptr = nd_int8_ptr ();
    assume (ptr == base);
    assume (size == sizeof(int8_t) * N);
    offset = 0;
  } else {
    assume (base+size < b_a);
  }

  int i=0;
  int8_t* q_a = b_a;
  for (; i < N; i++, q_a++) {
    // -- pointer arithmetic
    if (nd() && (q_a == ptr)) {
      ptr = nd_int8_ptr ();
      assume (ptr == (q_a + 1));
      offset += 1;
    }
    // -- a[i] = i;
    *q_a = i;
  }

  int8_t b[N];
  int8_t *b_b  = &(b[0]);

  // -- allocation of b
  if (!ptr  && (b_b == base)) {
    // ptr = base;
    ptr = nd_int8_ptr ();
    assume (ptr == base);
    assume (size == sizeof(int8_t) * N);
    offset = 0;
  } else {
    assume (base+size < b_b);
  }

  int j=0;
  int8_t* q_b = b_b;
  for (; j < N; j++, q_b++) {
    // -- pointer arithmetic
    if (nd() && (q_b == ptr)) {
      ptr = nd_int8_ptr ();
      assume (ptr == (q_b + 1));
      offset += 1;
    }
    // -- b[j] = j;
    *q_b = j;
  }

#ifndef ERROR
  // -- safe memory access
  if (ptr && (ptr  == (b_a + (i-1)))) {
    sassert (offset < size);
  }
  printf("%d\n", a[i-1]);

  // -- safe memory access
  if (ptr && (ptr  == (b_b + (j-1)))) {
    sassert (offset < size);
  }
  printf("%d\n", b[j-1]);

#else
  // -- safe memory access
  if (ptr && (ptr  == (b_a + (i-1)))) {
    sassert (offset < size);
  }
  printf("%d\n", a[i-1]);

  // -- unsafe memory access
  if (ptr && (ptr  == (b_b + j))) {
    sassert (offset < size);
  }
  printf("%d\n", b[j]);

#endif   
  return 0;
}

We can now prove that abc2_inst.c is safe by running the command:

sea pf -O0 abc2_inst.c --show-invars

We do not show the invariants but we recommend you to run this command and look at the invariants to convince yourself why the program is safe.

We can also check that the buggy version is indeed unsafe by running:

sea pf -O0 abc2_inst.c --show-invars -DERROR

We will not show either the counterexample but if you have read previous posts then you should be able to generate a harness program and find out why the program is unsafe using your favorite debugger.

So we have seen that instrumenting a program to prove absence of buffer overflows it is a lot of fun. But if you do not want to instrument manually your program, SeaHorn can add fully automatically the array bounds checks. For instance, if you execute the command:

sea abc -O0 abc2.c --show-invars

then you should see the following output:

unsat
Function: main
main@entry: true
main@_i.0.i:
		(main@%i.0.i>=0)
	(main@%i.0.i<=5)
main@_j.0.i:
		(main@%j.0.i>=0)
	(main@%i.0.i>=5)
	(main@%i.0.i<=5)
main@verifier.error.split: false

SeaHorn proves absence of buffer overflows by typing one single command!

Well, this is the end of our post where we have shown two different ways of instrumenting a program to prove absence of buffer overflows. Luckily, SeaHorn can add the instrumentation automatically so that we do not need to do it by ourselves.

*****
Written by SeaHorn on 27 May 2017