In the last chapter, we saw how a framework like inspect-ai can help us get a verification agent up and running quickly. But frameworks can sometimes feel like magic boxes, hiding the messy details of what’s really going on. In this chapter, we’re going to dispel that superstition. ## TODO: audit the prose to make sure its not too LLMy
We’re going to build the same DafnyBench solver, but this time, we’ll do it with plain Python. No frameworks, no magic. Just you, the Anthropic SDK, and a whole lot of code.
This will give you a much deeper understanding of how these agents work under the hood. We’ll explore some of the key innovations of our plain Python implementation, including a suite of specialized tools for inserting Dafny hints, a configuration system based on TOML, and of course, a manually implemented tool-calling loop.
So, let’s roll up our sleeves and see what inspect-ai was doing for us all along.