View on GitHub

compcertmc

Website for POPL artefact (Stack-Aware CompCert and CompCert MC)

An Abstract Stack Based Approach to Verified Compositional Compilation to Machine Code

This is the website hosting the artifact associated with our POPL’19 submission.

You will find our source code in this archive.

Requirements

The compiler is based on CompCert. You can find the user manual of CompCert here.

The development is known to compile with the following software:

Instructions

Download the source code and unzip it to a directory.

The easiest way to setup the development environment is to use the Nix package manager. We provide a configuration file (default.nix) for a nix environment. Simply run nix-shell in the directory will install all the dependent packages and enter into a shell for compiling our source code.

You can also setup the environment using your preferred method (such as OPAM).

To compile our development, first configure the target architecture:

./configure x86_32-linux

This will configure the development for the 32-bit x86 architecture. Note that generally speaking our approach for machine code generation is architecture independent. We are limited to the 32-bit x86 architecture because the RockSalt instruction encoder we use for this project only supports x86_32.

Then generate the dependencies:

make depend

Finally compile the whole development

make -j

This will compile CompCertMC (which Stack-Aware CompCert is a part of) and Stack-Aware CompCertX.

This takes approximately 30-40 minutes depending on your machine. Note that the compilation of RockSalt encoder consumes a lot of memory. To avoid running out of memory, we recommend to have at least 16GB memory (We use a workstation with 64GB memory).

Usage

The Stack-Aware CompCert is a part of CompCertMC which on its own works in the same way as the original CompCert. CompCertMC supports compilation to ELF files. Stack-Aware CompCertX supports contextual compilation, which will replace CompCertX in the CertiKOS project.

The CompCertMC compiler is verified up to the MC language which is very close to machine code. Further compiler to machine code (ELF_X88_32 format) is supported through an unverified process. For now we only support encoding of a subset of CompCert assembly instructions and the compilation to an ELF executable (not object) files. Moreover, the call to external functions are not supported because it would require us to implement a linker to link with the standard C libraries (we are still trying to figure out how to implement such a linker). For the purpose of demonstration, we have set up some examples in c_examples. As an example, you can compile and run the following program for computing fibonacci numbers.

  $ cd c_examples
  $ cat fib.c
    int fib (int n)
    {
      if (n <= 2) 
        return 1;
      else 
        return fib(n-1) + fib(n-2);
    }
    
    int n = 12;
    
    int main() 
    {
      return fib(n);
    }
  
  $ ../ccomp -machinecode fib.c
  $ chmod +x fib.o
  $ ./fib.o 
  $ echo $?
    144

The -machinecode flag is passed to CompCertMC for generating machine code. The resulting ELF file is fib.o. The last echo command outputs the most recent return value, which is the 12th fibonacci number.

The following is another example of compiling and running a program that computes factorials.

$ cat fac.c 

int fac(int n)
{
  int v = 0;

  if (n == 0) {
    v = 1;
  } else {
    v = n * fac(n-1);
  }

  return v;
}

int main()
{
  return fac(6);
}

$ ../ccomp -timings -machinecode fac.c 
   0.00s  Parsing
   0.00s  Elaboration
   0.00s  Emulations
   0.00s  CompCert C generation
   0.00s  Clight generation
   0.00s  Simplification of locals
   0.00s  C#minor generation
   0.00s  Cminor generation
   0.00s  Instruction selection
   0.00s  RTL generation
   0.00s  Tail calls
   0.00s  Inlining
   0.00s  Renumbering
   0.00s  Constant propagation
   0.00s  CSE
   0.00s  Redundancy elimination
   0.00s  Unused globals
   0.00s  Register allocation
   0.00s  Branch tunneling
   0.00s  CFG linearization
   0.00s  Label cleanup
   0.00s  Mach generation
   0.00s  Asm generation
   0.00s  Elimination of pseudo instruction
   0.00s  Generation of FlatAsm
   0.00s  Generation of MC
   0.01s  Generation of RockSalt program
   0.03s  Total compilation time

$ chmod +x ./fac.o 
$ ./fac.o 
$ echo $?
208

Here we use the -timings flag to show the compilation passes went through. Note that the correct result is 720. We got 208 because return value is truncated to a byte.

Our most non-trivial example is an algorithm for solving the N-queen problem, which will return the number of solutions (See the table at this wiki page to find out the number of solutions).

$ ../ccomp -timings -machinecode nqueen.c 
   0.00s  Parsing
   0.00s  Elaboration
   0.00s  Emulations
   0.00s  CompCert C generation
   0.00s  Clight generation
   0.00s  Simplification of locals
   0.00s  C#minor generation
   0.00s  Cminor generation
   0.00s  Instruction selection
   0.00s  RTL generation
   0.00s  Tail calls
   0.00s  Inlining
   0.00s  Renumbering
   0.00s  Constant propagation
   0.00s  CSE
   0.00s  Redundancy elimination
   0.00s  Unused globals
   0.00s  Register allocation
   0.00s  Branch tunneling
   0.00s  CFG linearization
   0.00s  Label cleanup
   0.00s  Mach generation
   0.00s  Asm generation
   0.00s  Elimination of pseudo instruction
   0.00s  Generation of FlatAsm
   0.00s  Generation of MC
   0.02s  Generation of RockSalt program
   0.12s  Total compilation time

$ ls
fac.c  fib.c  gcd_alt.c  gcd.c  max.c  nqueen.c  nqueen.o

$ chmod +x nqueen.o 

$ ./nqueen.o 

$ echo $?
92

Structure of the proofs

For reference, you can browse the original CompCert source code here.

We also export our code as browsable HTML files.

We have made the following changes to the original CompCert for our compilers.

Stack-Aware CompCert

The abstract stack is defined in StackADT.v. The definitions of stages, frames and abstract blocks described in the paper can be found in this file. Lower in this file are the definition of stack injection.

Then in Memimpl.v, the CompCert memory model is enriched with a stack and some invariants on this stack, described as the stack_inv predicate. The new operations record, push_stage and pop_stage are defined here.

The semantics of every language is instrumented to insert stack operations. Let us have a look for example at the semantics for Cminor. The rule step_internal_function records a new frame, based on the size of the stack block. The rule step_return pops the current stage and the step_call rule pushes a new stage.

Similar changes can be observed in the semantics of all languages except Mach and Asm: RTL, Clight for instance.

The stack access policy is stated as the stack_access predicate. It is preserved under injection, as shown here.

The size of the stack is defined here. For most compiler passes, the stacks are in the stack_equiv relation, implying they are of the same size.

For Inlining, the stacks are in the inline_sizes relation, meaning a number of stages in the source correspond to one frame in the target stack. A corollary of this relation is that the target stack is not larger than the source stack.

Similarly for Tailcall recognition: the relation is tc_sizes.

CompCertMC

CompCert MC extends Stack-Aware CompCert with the following phases (all the files listed below are in the folder x86 and are new additions we made to CompCert):

Stack-Aware CompCertX

The compcertx folder contains new files for the contextual compilation aspect of this work. Each language semantics has an X version in this folder, where the semantics are redefined so that they do not denote the meaning of a main program without arguments but rather the execution of an arbitrary function in a program, starting from an arbitrary memory state and register state (when applicable). Look for example at ClightX or RTLX. The step relation is kept but the initial and final states are redefined.

The correctness proofs in *proofX.v are mostly reused from the classic CompCert version, with some adjustments for the new initial and final states.

The final theorem for CompCertX is found in SeparateCompilerproof.v.