CompCert

CompCert is a formally verified optimizing C compiler.

Formally verified optimizing C compiler: unparalleled level of confidence

CompCert is a formally verified optimizing C compiler. Its intended use is compiling safety-critical and mission-critical software written in C and meeting high levels of assurance. It ac­cepts most of the ISO C 99 language, with some exceptions and a few extensions. It produces machine code for PowerPC, x86, ARM, AArch64, and RISC-V architectures.

CompCert is the only production compiler that is formally verified, using machine-assisted math­ematical proofs, to be exempt from mis­compilation issues. The code it pro­duces is proved to behave exactly as specified by the semantics of the source C program.

This level of confidence in the correctness of the compilation process is unprecedented and con­tributes to meeting the highest levels of software assurance.

The CompCert proof

The CompCert frontend and backend compilation passes are all formally proved to be free of miscompilation errors; as a consequence, so is their composition.

The property that is formally verified is semantic preservation between the input code and output code of every pass.

To state this property with mathematical precision, formal semantics are given for every source, intermediate and target language, from C to assembly. These semantics associate to each program the set of all its possible behaviors. Behaviors indicate whether the program terminates (normally by exiting, or abnormally by causing a runtime error such as dereferencing the null pointer) or runs forever.

Behaviors also contain a trace of all observable input/output actions performed by the program, such as system calls and accesses to “volatile” memory areas that could correspond to a memory-mapped I/O device.

To a first approximation, a compiler preserves semantics if the generated code has exactly the same set of observable behaviors as the source code (same termination properties, same I/O actions). This first approximation fails to account for two important degrees of freedom left to the compiler.

  1. First, the source program can have several possible behaviors: this is the case for C, which permits several evaluation orders for expressions. A compiler is allowed to reduce this non-determinism by picking one specific evaluation order.
  2. Second, a C compiler can “optimize away” runtime errors present in the source code, replacing them by any behavior of its choice. (This is the essence of the notion of “undefined behavior” in the ISO C standards.)
To address the two degrees of flexibility above, CompCert’s formal verification uses the following definition of semantic preservation, viewed as a refinement over observable behaviors:
 
If the compiler produces compiled code C from source code S, without reporting compile-time errors, then every observable behavior of C is either identical to an allowed behavior of S, or improves over such an allowed behavior of S by replacing undefined behaviors with more defined behaviors.
 
The semantic preservation property is a corollary of a stronger property, called a simulation diagram that relates the transitions that C can make with those that S can make. First, the simulation diagrams are proved independently, one for each pass of the frontend and backend compilers. Then, the diagrams are composed together, establishing semantic preservation for the whole compiler.
 
The proofs are very large, owing to the many passes and the many cases to be considered — too large to be carried using pencil and paper. We therefore use machine assistance in the form of the Coq proof assistant. Coq gives us the means to write precise, unambiguous specifications; conduct proofs in interaction with the tool; and automatically re-check the proofs for soundness and completeness. We therefore achieve very high levels of confidence in the proof.

REQUEST INFO