Agents That Are Running For Weeks
The key shift isn’t a smarter single agent - it’s delegation at duration + scale.
Cursor posted a big Autonomy dial signal - they claim they’ve been “running coding agents autonomously for weeks”, with the explicit goal of pushing agents toward projects that “typically take human teams months”.
They describes moving from brittle, lock-based coordination (agents stalling, forgetting locks, becoming risk-averse) to a role-split system - planners generate tasks, workers grind tasks to completion, and a judge decides whether to continue before the next iteration “starts fresh” to avoid tunnel vision.
Then they stress-tested it on an “ambitious goal” - building a web browser from scratch. Cursor says the agents ran for close to a week, writing 1M+ lines of code across ~1,000 files, with “hundreds of workers” pushing to the same branch with minimal conflicts. They also cite a three-week in-place Solid→React migration (+266K/−193K) and a long-running agent that made video rendering 25× faster (code merged, “in production soon”).
“At the end of each cycle, a judge agent determined whether to continue, then the next iteration would start fresh. This solved most of our coordination problems and let us scale to very large projects without any single agent getting tunnel vision.” - Cursor
This is clearly upward pressure on Autonomy because the capability frontier is now task-completion length - agents that can keep going, recover, coordinate, and ship meaningful increments with less human steering. Cursor also admits the limits (agents sometimes run too long, they still need “periodic fresh starts” to combat drift), which is exactly the failure mode that makes “weeks-long delegation” the real benchmark.
> The interface layer is thickening. If you disagree with my interpretation, or you’ve spotted a better signal then reply and tell me.


