Programming and Verifying Effectful Programs in F*

fstar-logo

Nik Swamy and Guido Martinez

ECI 2019

The divergence effect (Dv)

  • Some useful code really is not always terminating
    val eval : exp -> Dv exp
    let rec eval e =
      match e with
      | App (Lam x e1) e2 -> eval (subst x e2 e1)
      | App e1 e2 -> eval (App (eval e1) e2)
      | Lam x e1 -> Lam x (eval e1)
      | _ -> e
    let main = eval (App (Lam 0 (App (Var 0) (Var 0)))
                         (Lam 0 (App (Var 0) (Var 0))))
    ./Divergence.exe

F* effect system encapsulates effectful code

  • Pure code cannot call potentially divergent code

  • Only pure code can appear in specifications

    val eval : expr -> Dv expr
        type tau = e:expr{e = eval e'}
                              ^^^^^^
    Expected a pure expression; got an expression ... with effect "DIV"
  • Sub-effecting: Tot t <: Dv t
    (e.g. divergent code can include pure code)

    val subst : list (var * expr) -> expr -> Tot expr
    eval (subst [x, Num 0] e) : Dv expr

Verifying potentially divergent programs

(partial correctness)

  • Using refinement types
    val factorial : int -> Dv int
  • Or the Div computation type (pre- and post- conditions)
    val eval_closed : e:exp{closed e} -> Dv (e':exp{Lam? e' /\ closed e'})
    let rec eval_closed e =
      match e with
      | App e1 e2 -> let Lam e1' = eval_closed e1 in
                     below_subst_beta 0 e1' e2;
                     eval_closed (subst (sub_beta e2) e1')
      | Lam e1 -> Lam e1

Monadic effects in F*

  • A classic state monad:
type st a = s -> (a * s)
let return x h = x,h
let bind f g h = let x,h' = f h in g x h'
let get () h = h,h
let put h _ = (),h
  • Turned into an abstract “computation type” and can be primitively implemented under the hood or not

    new_effect STATE { st; return; bind; get; put }
  • Can now program effectful computations directly:

    let double () = put (get () + get ())
  • And F* infers double : unit -> STATE unit w

Specifying Effectful Programs

  • Type inference in F* includes inference of weakest preconditions

  • e : STATE a w means

    forall post h0.        //for all postconditions and initial states
     w post h0 ==>       //given the weakest precondition is valid for post and h0
     e h0 ~>* (v, h1) /\ //the computation reduces to a value and final state
     post (v, h1)        //that validate the postcondition
  • What is a weakest precondition predicate transformer?

    let st_wp a : st_post a -> st_pre  //transforms postconditions to preconditions
    where st_post a : (a * s) -> prop //postconditions relate results to memories
    and   st_pre : s -> prop //preconditions are memory predicates
  • Dijkstra monad for state

WP inference $\sim$ CPS tranform

let st_wp a : st_post a -> st_pre  //transforms postconditions to preconditions
where st_post a : (a * s) -> prop //postconditions relate results to memories
and   st_pre : s -> prop //preconditions are memory predicates
  • Hey, wait a minute isn't that like the continuation monad?
cont r a = (a -> r) -> r
  • Take the continuation monad with result type prop

    s -> cont prop (a * s)
    = s -> ((a * s) -> prop) -> prop
    ~ ((a * s) -> prop) -> s -> prop
    = st_post a -> st_pre
    = st_wp a
  • WPs for state are just the state monad transformer applied to the continuation monad with result type prop

    stateT m a = s -> m (a * s)
    st_wp a = stateT (cont prop) a

Auto-generating WP calculi for monads

  • F* automatically derives a weakest precondition calculus for m-effectful computations whose WPs have the form

    m_wp a = mT (cont prop) a
    • the m transformer on the continuation monad with result type prop
  • And infers computations types of the form M a (wp : m_wp a)

  • E.g., for exceptions

    ex_wp a = ex_t (cont prop) a
          = (option a -> prop) -> prop
  • For state + exceptions

    st_ex_wp a = state_t (ex_t (cont prop)) a
             = s -> (ex_t (cont prop) a * s)
             = s -> ((option a * state) -> prop) -> prop
             ~ ((option a * state) -> prop) -> s -> prop
             = st_ex_post a -> st_ex_pre

Derived Specifications: Hoare triples

  • Reasoning about continuations: great for a core logic / not for a human

  • Hoare logic-style pre-conditions and post-conditions

    ST a (pre: s -> prop) (post: s -> a -> s -> prop) =
    STATE a (fun k s0 -> pre s0 /\ (forall x s1. post s0 x s1 ==> k (x, s1)))
  • stateful pre-condition is predicate on initial states

  • post-condition is relation on initial states, results, and final states

val get ()
  : ST s
    (requires fun s -> True)
    (ensures fun s0 result s1 -> s0 == result /\ result == s1)

val put s
  : ST unit
    (requires fun _ -> True)
    (ensures fun _ _ s1 -> s1 == s0)

Richer Model of Mutable Memory

with Lens-indexed Imperative Lenses

Richer memory models

  • Program libraries to model memory, e.g., the ML heap

  • Derive effectful actions for primitive operations (e.g., !, := etc.)

  • Write effectful programs against these libraries and verify them with refined computation types

  • Extract them to programs in ML or C with primtive effects

  • F*:

    let incr (r:ref int) : ST unit (requires (fun _ -> True))
             (ensures (fun h0 _ h1 -> modifies !{r} h0 h1 /\ sel h2 r = sel h1 r + 1))
        = r := !r + 1

    ML:

    let incr (r:ref int) : unit = r := !r + 1

    C:

    void incr (int *r) { *r = *r + 1; }

Modeling the ML heap

A sketch of FStar.Heap:

module Heap
  let addr = nat
  abstract let heap = {
    next_addr: addr;
    map: addr -> option (a:Type & v:a) {
       forall a. h > next_addr ==> map a == None
    }
  }
  abstract let ref t = addr
  let contains h (r:ref t) = r < h.next_addr /\ h.map r == Some (t, _)
  let sel h (r:ref t{h `contains` r}) = let Some (_, v) = h.map r in v
  let upd h (r:ref t{h `contains` r}) v = ...

More on modeling heaps

Deriving ML-like Effectful Operations

  • Reading references

    let (!) #t (r:ref t)
      : ST t
          (requires fun h -> h `contains` r)
          (ensures fun h0 x h1 -> h0 == h1 /\ x = sel h1 r) =
      sel (get()) r
  • Writing references

    let (:=) #t (r:ref t) (v:t)
      : ST (ref t)
          (requires fun h -> h `contains` r)
          (ensures fun h0 x h1 -> h1 == upd h0 r v) =
      put (upd (get()) r v); r
  • Allocating and freeing references

Freeing references

let free #r (r:ref t)
  : ST unit
    (requires fun h -> h `contains` r)
    (ensures fun h0 () h1 -> modifies !{r} h0 h1)
  = let h = get () in
    put ({h with map = fun a -> if a = r then None else h.map a})

Reference swapping (hand proof sketch)

val swap : r1:ref int -> r2:ref int -> ST unit
    (requires (fun h0 -> True))
    (ensures (fun h0 _ h3 -> modifies !{r1,r2} h0 h3 /\
                             sel h3 r2 == sel h0 r1 /\ sel h3 r1 == sel h0 r2))
let swap r1 r2 =
  let t = !r1 in
    (* Know (P1): exists h1 t. modifies !{} h0 h1 /\ t == sel h0 r1 *)
  r1 := !r2;
    (* Know (P2): exists h2. modifies !{r1} h1 h2 /\ sel h2 r1 == sel h1 r2 *)
  r2 := t
    (* Know (P3): modifies !{r2} h2 h3 /\ sel h3 r2 == t *)
(* `modifies !{r1,r2} h0 h3` follows directly from transitivity of modifies *)

(* `sel h3 r2 == sel h0 r1` follows immediately from (P1) and (P3) *)

(* Still to show: `sel h3 r1 == sel h0 r2`
   From (P2) we know that  `sel h2 r1 == sel h1 r2` (A)
   From (P1) we know that  modifies !{} h0 h1
     which by definition gives us  sel h1 r2 == sel h0 r2 (B)
   From (P3) we know that  modifies !{r2} h2 h3
     which by definition gives us  sel h2 r1 == sel h3 r1 (C)
   We conclude by transitivity from (A)+(B)+(C) *)
  • This variant is correct even when r1 and r2 are aliased

Bidirectional data access, abstractly with lenses

type lens a b = {
    get : a -> b;
    put : b -> a -> a
}
  • sel and upd form a lens
    let ref_lens : lens (heap * ref a) a  = {
      get = fun (h, r) -> sel h r;
      put = fun v (h, r) -> upd h r v
    }

Imperative lenses

  • A lens-indexed computation type:

    type st_lens inv (l:lens (heap * a) b) = {
     st_get : x:a
           -> ST b
              (requires fun h -> inv h x)
              (ensures fun h0 y h1 -> h0==h1 /\ y == l.get (h0, x));
    
     st_put : y:b
           -> x:a
           -> ST a
              (requires fun h -> inv h x)
              (ensures fun h0 x' h1 -> h1, x' == l.put y (h0, x))
    }
  • (!) and (:=) are imperative lenses

    let iref_lens : st_lens contains ref_lens = {
      st_get = (!);
      st_put = (:=);
    }

More on lens-indexed imperative lenses

Manipulating Binary-formatted Data

with the EverParse parser combinator library

Networking Protocols: Many performance and security-critical parsers

  • TLS 1.3 message format, with variable-length data

    uint16 ProtocolVersion; opaque Random[32]; uint8 CipherSuite[2];    ​
    ​
    struct {​
         ProtocolVersion legacy_version = 0x0303;​
         Random          random;​
         opaque          legacy_session_id<0..32>;​
         CipherSuite     cipher_suites<2..2^16-2>;​
         opaque          legacy_compression_methods<1..2^8-1>;​
         Extension       extensions<8..2^16-1>;​
    } ClientHello;
  • De facto: Hand-written parsers and formatters, for performance, but also many vulnerabilities

    • Heartbleed was a parsing bug

EverParse: A Verified Low-level Parser Generator

  • Given a type description (e.g., ClientHello)

  • Produces a parser and a formatter indexed by its (inverse) parser

let parser t = b:bytes -> option (t * n:nat{n <= length b})
let format (p:parser t) = f:(t -> bytes) {
    forall x. p (f x) == Some (x, length (f x))
}
  • And a low-level zero-copy validator indexed by the parser
    let validator (p:parser t) =
      bs:array uint8 ->
      pos:u32 { pos <= length bs } ->
      ST u32
      (requires fun h0 -> h0 `contains` bs)
      (ensures fun h0 res h1 ->
          h0 == h1 /\
          (if res < ERROR_CODE
           then exists v. p (as_bytes bs h0 pos) = Some (v, res) (* parsing succeeds *)
           else p (as_bytes bs h0 pos) = None (* parsing fails *)))

Combinator library for validators

  • Mirroring the structure of parser combinators
val return : validator unit_parser
val bind : validator p -> (t -> validator q) -> validator (p `Parser.bind` q)
val seq: validator p -> validator s -> validator (p `Parser.seq` s)
val map : f:(t -> s)  -> validator p -> -> validator (Parser.map f p)
...
  • Client-hello validator

    protocolVersion_validator​
    `seq` random_validator​
    `seq` sessionID_validator​
    `seq` clientHello_cipher_suites_validator​
    `seq` clientHello_compression_method_validator​
    `seq` clientHelloExtensions_validator​
  • Imperative code refines pure parser by construction

Data accessors

  • Structured access into validated binary data, specified by parsers and lenses
let accessor (p1:parser t1) (p2:parser t2) (l:lens t1 t2) =
    bs:array uint8 ->
    pos:u32 { pos <= length bs } ->
    ST u32
    (requires fun h0 ->
        h0 `contains` bs /\
        Some? (p1 (as_bytes bs h0 pos))) (* bs contains a valid t1 *))
    (ensures fun h0 res h1 ->
        h0 == h1 /\
        let Some (v1, _) = p1 (as_bytes bs h0 pos) in
        p2 (as_bytes bs h0 res) == Some (l.get v1, _)) (* returns offset to t2 *)
  • Abstractly, structured data access into a (representation of) t1

  • Concretely, just performs arithmetic into the binary formatted data

    • Effectively, reads directly from computed offsets into network packets
    • Would be crazy to try this directly in C!

Extracted to efficient C code

  • After partial evaluation and proof erasure
uint32_t Parsers_ClientHello_clientHello_validator(LowParse_Low_Base_slice input, uint32_t pos)​
{​
  uint32_t pos10 = Parsers_ProtocolVersion_protocolVersion_validator(input, pos);​
  uint32_t pos11;​
  if (pos10 > ERROR_CODE)​
    pos11 = pos10;​
  else​
    pos11 = Parsers_Random_random_validator(input, pos10);​
  uint32_t pos12;​
  if (pos11 > ERROR_CODE)​
    pos12 = pos11;​
  else​
    pos12 = Parsers_SessionID_sessionID_validator(input, pos11);​
  uint32_t pos13;​
  if (pos12 > ERROR_CODE)​
    pos13 = pos12;​
  else​
    pos13 = Parsers_ClientHello_cipher_suites_clientHello_cipher_suites_validator(input, pos12);​
  uint32_t pos1;​
  ...
}​

Performance of extracted C code

EverParse-perf

Wrapping up

  • Write low-level code if you must

  • But, program it tastefully in a proof assistant, not directly in C or asm

  • Thoughtful structuring of imperative coding patterns can make reasoning about imperative programs similar to functional programming

  • And with proofs, partial evaluation and proof erasure, the resulting code can be significantly faster than hand-written C

  • Homework: Try Chapters 1 7 of https://www.fstar-lang.org/tutorial