prealpha; Formal Verification Agent Recipes

In distant land, unreachable by your steps, there stands a castle known as Monsalvat

Lohengrin, Lohengrin Act III, Scene 3

TODO: this is an ALPHA. This is not released yet.

Please don’t send it around to your friends yet.

Agents, evals, and RL environments for the working proof engineer.

1. Introduction. Setting scope, defining terms.
2. DafnyBench (Dafny) with inspect-ai. Quickest MVP with free agents and logging/dashboards.
3. DafnyBench no-framework. Dispel any suspicions about the complicated framework by feeling the pure Python and Anthropic SDK under your fingernails.
4. FVAPPS (Lean) with pydantic-ai, more flexible than inspect-ai.
5. From evals to RL envs. How much SFT do you need to bootstrap, and how to get those tokens. Curriculum design.
6. Outlook. Please measure verification burden.

Chapters 2-4 will have code in the repo under the ./evals subdirectory, and chapter 5 may as well.