Verifying Imperative Programs with Functional Programming

fstar-logo

Nik Swamy

Trends in Functional Programming

Vancouver, BC

June 12, 2019

Program verification: Shall the twain ever meet?

Interactive proof assistants Semi-automated verifiers of imperative programs
Coq, CompCert,   air Dafny, Verve,
Isabelle, seL4, FramaC, IronClad,
Agda, Bedrock, Why3 miTLS
Lean, PVS, 4 colors   gap Vale
  • In the left corner: Very expressive dependently-typed logics,
    but only purely functional programming

  • In the right: effectful programming, SMT-based automation,
    but only first-order logic

Bridging the gap: F*

  • Functional programming language with effects

    • like OCaml, F#, Haskell,
    • F* extracted to OCaml or F# by default
    • Subset of F* compiled to efficient C code
  • Semi-automated verification system using SMT

    • like Dafny, FramaC, Why3,
  • With an expressive core language based on dependent type theory

    • like Coq, Lean, Agda, PVS,
  • A metaprogramming and tactic framework for interactive proof and user-defined automation

    • like Coq, Isabelle, Lean, PVS, etc.

A first taste

  • Write ML-like code

    let rec factorial n = 
      if n = 0 then 1 
      else n * factorial (n - 1)
  • Give it a specification, claiming that factorial is a total function from non-negative to positive integers.

    val factorial: n:int{n >= 0} -> Tot (i:int{i >= 1}) 
  • Ask F* to check it

    fstar factorial.fst
    Verified module: Factorial
    All verification conditions discharged successfully
fstar factorial.fst
Verified module: Factorial
All verification conditions discharged successfully
F* builds a typing derivation of the form:

$\Gamma_{\mbox{prelude}} \vdash \mbox{\texttt{let factorial n = e}} : t \Leftarrow \phi$

  • In a context $\Gamma_{\mbox{prelude}}$ including definitions of F* primitives

  • The program $\mbox{\texttt{let factorial n = e}}$ has type $t$, given the validity of a logical formula $\phi$

  • $\phi$ is passed to Z3 (an automated theorem prover/SMT solver) to check for validity

  • If the check succeds, then, from the metatheory of F*, the program is safe at type $t$

Running F* Programs

Extraction to OCaml and FSharp

fstar-compiler-1

Running F* Programs

Extraction to OCaml and FSharp

Extraction of a subset of F* to C

fstar-compiler-2

Some uses of F*

  • Project Everest: verify and deploy new, efficient HTTPS stack
    • miTLS: Verified reference implementation of TLS (1.2 and 1.3)
    • EverParse: Verified parsers and formatter generators
    • EverCrypt: Agile Cryptographic Provider
    • HACL*: High-Assurance Cryptographic Library
    • Vale: Verified Assembly Language for Everest
  • Verified Everest code deployed in
    • Firefox (Mozilla NSS crypto)
    • Windows (WinQUIC)
    • Azure Confidential Consortium (Verified Merkle tree for the blockchain)
    • WireGuard VPN
    • Zinc crypto library for Linux
    • Tezos and Concordium blockchains

The Current Everest and F* team

Microsoft Research, Inria Paris, CMU, MIT, Rosario, …

Everest-People

  • many former members, interns, external contributors, you?

This talk

  • Need to write low-level code for performance

    • Cryptographic routines are heavily optimized in C and assembly
    • Networking protocols are inherently effectful
  • Need to write low-level code for security

    • Hard to control side-channels with a GC, JIT etc.
  • Specifying and verifying effectful programs is hard

    • Much effort involved in designing and structuring specifications, perhaps more effort than programming effectful code itself

    • Typical code to proof ratio for functional correctness and security proofs is ~ 1 : 5 (more like 1 : 20 without SMT)

  • Pays to structure specifications well functional programming can help

  • Functional programming ~ Freedom of abstraction

    • FP holds the key to right abstractions for efficient, provably safe, low-level programming

Computations indexed by familiar FP structures


Core F* design principle:

  • Computation types are indexed by their specifications

  • Specs equipped with some algebraic structure for formal manipulation

Three examples:

  1. Weakest precondition inference: Monads indexed by WP-monads

  2. Imperative data access code: Lens-indexed imperative lenses

  3. Low-level parsers: Parser-indexed imperative parsers

    • Low-level formatters: Indexed by parsers, their inverses
  4. many more structures still to be discovered

The Functional Core of F*

Informally, like Coq or Agda, but with an extensional dependent type theory (~Nuprl)

  • Refinement types
    let nat = x:int{x>=0}
    
  • Dependent function types ($\Pi$), here together with refinements:
    val incr : x:int -> y:int{x < y}
    let incr x = x + 1

The Functional Core of F*

  • Recursive functions over inductive datatypes

    type vec a : nat -> Type =
      | Nil : vec a 0
      | Cons : #n:nat -> hd:a -> tl:vec a n -> vec a (n + 1)
    
    let rec append (v1:vec a n) (v2:vec a m) 
      : vec a (n + m) =
      match v1 with
      | Nil -> v2
      | Cons hd tl -> Cons hd (append tl v2)
  • Higher order

      let rec map (f:a -> b) (v:vec a n) 
        : vec b n 
        = match v with
          | Nil -> Nil
          | Cons hd tl -> Cons (f hd) (map f tl)
    
      map (fun x -> x + 42) (Cons 1 Nil)

Total functions in F*

  • The F* functions we saw so far were all total

  • Tot effect (default) = no side-effects, terminates on all inputs

    val factorial : nat -> Tot nat
    let rec factorial n = (if n = 0 then 1 else n * (factorial (n - 1)))
  • Quiz: How about giving this weak type to factorial?

    val factorial : int -> Tot int
  let rec factorial n = if n = 0 then 1 else n * factorial (n - 1)
                                                            ^^^^^
  Subtyping check failed; expected type x:int{x << n}; got type int

factorial (-1) loops! (int type in F* is unbounded)

  • Potentially divergent terms
    val factorial : int -> Dv int

Values and Computations

  • “Values” aka unconditionally total terms

  • Two classes of types

    • Value types (t): int, list int,
    • Computation types (C): conditionally pure, divergent, stateful,
  • F* effect system encapsulates effectful code

    • Pure code cannot call effectful code
    • But, via sub-effecting, pure code can be used in effectful contexts
  • Dependent function types of the form: x:t -> C

    • F* is call-by-value
    • argument can't have side-effects, so value type
  • Two forms of refinement types

    • Refined value types: x:t{p}
    • Refined computation types

Refined Computation Types

Monadic effects in F*

  • A classic state monad:
type st a = s -> (a * s)
let return x h = x,h
let bind f g h = let x,h' = f h in g x h'
let get () h = h,h
let put h _ = (),h
  • Turned into an abstract “computation type” and can be primitively implemented under the hood or not
    new_effect STATE { st; return; bind; get; put }
  • Can now program effectful computations directly:

    let double () = put (get () + get ())
  • And F* infers double : unit -> STATE unit w

Specifying Effectful Programs

  • Type inference in F* includes inference of weakest preconditions

  • e : STATE a w means

    forall post h0.        //for all postconditions and initial states
     w post h0 ==>       //given the weakest precondition is valid for post and h0
     e h0 ~>* (v, h1) /\ //the computation reduces to a value and final state
     post (v, h1)        //that validate the postcondition
  • What is a weakest precondition predicate transformer?

    let st_wp a : st_post a -> st_pre  //transforms postconditions to preconditions
    where st_post a : (a * s) -> prop //postconditions relate results to memories
    and   st_pre : s -> prop //preconditions are memory predicates
  • Dijkstra monad for state

WP inference $\sim$ CPS tranform

let st_wp a : st_post a -> st_pre  //transforms postconditions to preconditions
where st_post a : (a * s) -> prop //postconditions relate results to memories
and   st_pre : s -> prop //preconditions are memory predicates
  • Hey, wait a minute isn't that like the continuation monad?
cont r a = (a -> r) -> r
  • Take the continuation monad with result type prop

    s -> cont prop (a * s)
    = s -> ((a * s) -> prop) -> prop
    ~ ((a * s) -> prop) -> s -> prop
    = st_post a -> st_pre
    = st_wp a
  • WPs for state are just the state monad transformer applied to the continuation monad with result type prop

    stateT m a = s -> m (a * s)
    st_wp a = stateT (cont prop) a

Auto-generating WP calculi for monads

  • F* automatically derives a weakest precondition calculus for m-effectful computations whose WPs have the form

    m_wp a = mT (cont prop) a
    • the m transformer on the continuation monad with result type prop
  • And infers computations types of the form M a (wp : m_wp a)

  • E.g., for exceptions

    ex_wp a = ex_t (cont prop) a
          = (option a -> prop) -> prop
  • For state + exceptions

    st_ex_wp a = state_t (ex_t (cont prop)) a
             = s -> (ex_t (cont prop) a * s)
             = s -> ((option a * state) -> prop) -> prop
             ~ ((option a * state) -> prop) -> s -> prop
             = st_ex_post a -> st_ex_pre

Derived Specifications: Hoare triples

  • Reasoning about continuations: great for a core logic / not for a human

  • Hoare logic-style pre-conditions and post-conditions

    ST a (pre: s -> prop) (post: s -> a -> s -> prop) =
    STATE a (fun k s0 -> pre s0 /\ (forall x s1. post s0 x s1 ==> k (x, s1)))
  • stateful pre-condition is predicate on initial states

  • post-condition is relation on initial states, results, and final states

val get ()
  : ST s
    (requires fun s -> True)
    (ensures fun s0 result s1 -> s0 == result /\ result == s1)
    
val put s
  : ST unit
    (requires fun _ -> True)
    (ensures fun _ _ s1 -> s1 == s0)

Richer Model of Mutable Memory

with Lens-indexed Imperative Lenses

Richer memory models

  • Program libraries to model memory, e.g., the ML or C heap

  • Derive effectful actions for primitive operations (e.g., !, := etc.)

  • Write effectful programs against these libraries and verify them with refined computation types

  • Extract them to programs in ML or C with primitive effects

  • F*:

    let incr (r:ref int)
      : ST unit
        (requires fun h0 -> h0 `contains` r)
        (ensures fun h0 _ h1 -> sel h1 r = sel h0 r + 1)
      = r := !r + 1

    ML:

    let incr (r:ref int) : unit = r := !r + 1

    C:

    void incr (int *r) { *r = *r + 1; }

Modeling the ML heap

A sketch of FStar.Heap:

module Heap
  let addr = nat
  abstract let heap = {
    next_addr: addr;
    map: addr -> option (a:Type & v:a) {
       forall a. h > next_addr ==> map a == None
    }
  }
  abstract let ref t = addr  
  let contains h (r:ref t) = r < h.next_addr /\ h.map r == Some (t, _)
  let sel h (r:ref t{h `contains` r}) = let Some (_, v) = h.map r in v
  let upd h (r:ref t{h `contains` r}) v = ...

More on modeling heaps

Deriving ML-like Effectful Operations

  • Reading references

    let (!) #t (r:ref t)
      : ST t
          (requires fun h -> h `contains` r)
          (ensures fun h0 x h1 -> h0 == h1 /\ x = sel h1 r) =
      sel (get()) r
  • Writing references

    let (:=) #t (r:ref t) (v:t)
      : ST (ref t)
          (requires fun h -> h `contains` r)
          (ensures fun h0 x h1 -> h1 == upd h0 r v) =
      put (upd (get()) r v); r
  • Allocating and freeing references

Bidirectional data access, abstractly with lenses

type lens a b = {
    get : a -> b;
    put : b -> a -> a
}
  • sel and upd form a lens
    let ref_lens : lens (heap * ref a) a  = {
      get = fun (h, r) -> sel h r;
      put = fun v (h, r) -> upd h r v
    }

Imperative lenses

  • A lens-indexed computation type:

    type st_lens inv (l:lens (heap * a) b) = {
     st_get : x:a
           -> ST b
              (requires fun h -> inv h x)
              (ensures fun h0 y h1 -> h0==h1 /\ y == l.get (h0, x));
    
     st_put : y:b 
           -> x:a 
           -> ST a
              (requires fun h -> inv h x)
              (ensures fun h0 x' h1 -> h1, x' == l.put y (h0, x))
    }
  • (!) and (:=) are imperative lenses

    let iref_lens : st_lens contains ref_lens = {
      st_get = (!);
      st_put = (:=);
    }

More on lens-indexed imperative lenses

Manipulating Binary-formatted Data

with the EverParse parser combinator library

Networking Protocols: Many performance and security-critical parsers

  • TLS 1.3 message format, with variable-length data

    uint16 ProtocolVersion; opaque Random[32]; uint8 CipherSuite[2];    ​
    ​
    struct {​
         ProtocolVersion legacy_version = 0x0303;​
         Random          random;​
         opaque          legacy_session_id<0..32>;​
         CipherSuite     cipher_suites<2..2^16-2>;​
         opaque          legacy_compression_methods<1..2^8-1>;​
         Extension       extensions<8..2^16-1>;​
    } ClientHello;
  • De facto: Hand-written parsers and formatters, for performance, but also many vulnerabilities

    • Heartbleed was a parsing bug

EverParse: A Verified Low-level Parser Generator

  • Given a type description (e.g., ClientHello)

  • Produces a parser and a formatter indexed by its (inverse) parser

let parser t = b:bytes -> option (t * n:nat{n <= length b})
let format (p:parser t) = f:(t -> bytes) {
    forall x. p (f x) == Some (x, length (f x))
}
  • And a low-level zero-copy validator indexed by the parser
    let validator (p:parser t) = 
      bs:array uint8 ->
      pos:u32 { pos <= length bs } ->
      ST u32
      (requires fun h0 -> h0 `contains` bs)
      (ensures fun h0 res h1 ->
          h0 == h1 /\
          (if res < ERROR_CODE
           then exists v. p (as_bytes bs h0 pos) = Some (v, res) (* parsing succeeds *)
           else p (as_bytes bs h0 pos) = None (* parsing fails *)))

Combinator library for validators

  • Mirroring the structure of parser combinators
val return : validator unit_parser
val bind : validator p -> (t -> validator q) -> validator (p `Parser.bind` q)
val seq: validator p -> validator s -> validator (p `Parser.seq` s)
val map : f:(t -> s)  -> validator p -> -> validator (Parser.map f p)
...
  • Client-hello validator

    protocolVersion_validator​
    `seq` random_validator​
    `seq` sessionID_validator​
    `seq` clientHello_cipher_suites_validator​
    `seq` clientHello_compression_method_validator​
    `seq` clientHelloExtensions_validator​
  • Imperative code refines pure parser by construction

Data accessors

  • Structured access into validated binary data, specified by parsers and lenses

    let accessor (p1:parser t1) (p2:parser t2) (l:lens t1 t2) =
      bs:array uint8 ->
      pos:u32 { pos <= length bs } ->
      ST u32
      (requires fun h0 ->
          h0 `contains` bs /\
          Some? (p1 (as_bytes bs h0 pos))) (* bs contains a valid t1 *))
      (ensures fun h0 res h1 ->
          h0 == h1 /\
          let Some (v1, _) = p1 (as_bytes bs h0 pos) in
          p2 (as_bytes bs h0 res) == Some (l.get v1, _)) (* returns offset to t2 *)
      
  • Abstractly, structured data access into a (representation of) t1

  • Concretely, just performs arithmetic into the binary formatted data

    • Effectively, reads directly from computed offsets into network packets
    • Would be crazy to try this directly in C!

Extracted to efficient C code

  • After partial evaluation and proof erasure
uint32_t Parsers_ClientHello_clientHello_validator(LowParse_Low_Base_slice input, uint32_t pos)​
{​
  uint32_t pos10 = Parsers_ProtocolVersion_protocolVersion_validator(input, pos);​
  uint32_t pos11;​
  if (pos10 > ERROR_CODE)​
    pos11 = pos10;​
  else​
    pos11 = Parsers_Random_random_validator(input, pos10);​
  uint32_t pos12;​
  if (pos11 > ERROR_CODE)​
    pos12 = pos11;​
  else​
    pos12 = Parsers_SessionID_sessionID_validator(input, pos11);​
  uint32_t pos13;​
  if (pos12 > ERROR_CODE)​
    pos13 = pos12;​
  else​
    pos13 = Parsers_ClientHello_cipher_suites_clientHello_cipher_suites_validator(input, pos12);​
  uint32_t pos1;​
  ...
}​

Performance of extracted C code

EverParse-perf

Wrapping up

  • Write low-level code if you must

  • But, program it tastefully in a proof assistant, not directly in C or asm

  • Thoughtful structuring of imperative coding patterns can make reasoning about imperative programs similar to functional programming

  • And with proofs, partial evaluation and proof erasure, the resulting code can be significantly faster than hand-written C

What's next?

  • Steel: Separation logic and concurrency, using a mixture of SMT solving and metaprogrammed tactics
    • DSL in F* for Rust-like programming with proofs of correctness
  • Internalizing relational proofs
    • Program equivalence, information flow, etc.
  • More DSLs for specific verification tasks
    • Protocol state machines, quantum programming,
  • Learn more: