Conclusion & Exercises - prealpha; Formal Verification Agent Recipes

In this chapter, we’ve taken a deep dive into the world of plain Python agent development. We’ve seen how to build a verification agent from scratch, without relying on any frameworks. ## TODO: audit the prose to make sure its not too LLMy

We’ve learned about the importance of manual tool-calling loops, specialized insertion tools, and configuration-driven design. We’ve also seen how to manage state using the message history and how to design tool APIs with the agent in mind.

The patterns we’ve explored in this chapter are not specific to DafnyBench or even to formal verification. They’re general principles that you can apply to any agent you build, regardless of the domain.

In the next chapter, we’ll turn our attention to a different proof assistant, Lean, and a different agent framework, pydantic-ai.

Exercises¶

Add a new insertion tool, insert_lemma(), for adding helper lemmas to the Dafny code.
Implement retry logic with exponential backoff for the API calls to the Anthropic API.
Add streaming support to the verify_dafny tool to show the incremental output from the Dafny compiler.
Create a visualization tool that animates the agent’s insertion decisions.
Port the inspect-ai implementation from Chapter 2 to use the specialized insertion tools from this chapter.
Add conversation persistence to the agent, so that you can save and resume agent sessions.
Implement a “dry run” mode for the agent that shows the insertions it would make without actually running the verification.

Dafny part 2: plain Python

When to “Rawdog” vs. Use Frameworks

Dafny part 2: plain Python

Outline for Chapter 03: DafnyBench with Plain Python