Skip to article frontmatterSkip to article content

DafnyBench and Inspect

In the world-breath’s wafting universe to drown, to sink

Isolde, Tristan und Isolde, Act III Scene 3

In this chapter, we’re going to ship a solver for DafnyBench (Loughridge & Sun et al 2024). Dafny is a verification-aware programming language developed at Microsoft Research that allows you to write programs alongside formal specifications and proofs of correctness, with compiletime knowledge of your verification status. It’s in roughly the hoare logic domain, meaning the interplay of imperative programs with preconditions, postconditions, and invariants define the rules of the game and the proof obligations. DafnyBench is a dataset of 750 Dafny program repair tasks, where each task provides a buggy program and specification, and the goal is to fix the code so it verifies.

The UK AISI incubated an evals framework called inspect_ai, though it is currently not being maintained by the AISI.

I’ll give thorough background about the tools and the paper, but I asked Claude to styletransfer it to Cervantes and Troyes. It’s important, when making formal verification agents, to have a good time.