Blog

Notes on coding agents, software engineering evaluation, and proof.

Essay · June 19, 2026The Next Unit of AI Evaluation Is the Session, not Benchmarks.Every lab tops the benchmark. Real software work still feels hard. The next benchmark has to be the session.

follow agentclash ↗follow the creator atharva ↗