ProofSWE
Blog
GitHub
ProofSWE
Blog
Notes on coding agents, software engineering evaluation, and proof.
Essay · June 19, 2026
The Next Unit of AI Evaluation Is the Session, not Benchmarks.
Every lab tops the benchmark. Real software work still feels hard. The next benchmark has to be the session.
follow the creator atharva ↗