Do you hear the call? Then thank god you are called to hear it.
Gurnemanz, Parsifal Act 1, Scene 1
Software is often incorrect. The Monastic Order of QA Technicians are tasked with detecting this with precision. Proof engineers are the Monastic-er Order within QA, who generally try to obtain the most assurance that the laws of syntax and semantics will abide. Considering the sheer volume of incorrect software in the world, this secretive and elite Order is far too small. It is here that we find ourselves in late 2025, the early days of agentic language systems and the dawn of a new era of program synthesis.
What is this book¶
This is your reference of agent recipes for formal methods tasks. In this book, we think language models will keep going up on the evals that get shipped, so we better make sure defensive capabilities like secure program synthesis or software correctness do not get neglected. Vitalik’s notion of “defensive acceleration” is instructive, here.
Written mostly end of 2025¶
The world is moving quickly enough that I expect this book will be useful for 6 months, before the ground shifts under its feet.
Who is the questgiver¶
This is Ole Q Doc coming at you live from the school of hard knocks. Email me. Mike Dodds told me that what’s obvious to me isn’t obvious to everyone,
Not a math book¶
We also can’t help but notice the pure math gold rush and the unapologetic absence of vision or business model that comes with it.

Figure 1:From SMBC October 20th 2025 (edits mine)
So we will be working with the Lean prover, but not with math. This is an ideological bias of the book.
Who is it for¶
The book is primarily for formal verification experts who want a piece of the evals and RL environments scene. I’m curious what evals and RL envs people will think about this-- many of them are more seasoned in evals (and especially in RL) than me. Such people are less likely to find nonobvious (to them) stuff in here, as I’m mainly doing what I perceive to be standard wisdoms in the agentdev space, but I am opinionated. An example of something I’m opinionated about is that Eun Sun Kim’s rubato choices are better than any other Parsifal conductor.
I’m assuming basic familiarity with software engineering in python. I typically say agentdev is accessible by anyone who has any background in any software. I’m not teaching formal methods, but if you don’t know formal methods you should be able to wing it.
How to read it¶
I think the book works well for the skimmers/halfassers as well as those who want to treat it like a dense textbook. It also works as a reference, where there’s no pretense of reading it front to back sequentially and you just grab a chapter when you need it. The next section covers how you should interact with the codebase.
Am I reading LLM prose?¶
I very kick off substantial prose tasks to LLMs. People tell me I have a strong voice, so you’d notice if I did. Since the first rule of formal verification agents is to have fun, I will occasionally style transfer the boring sections to different literary styles. Maybe we’ll introduce some recurring characters, etc.