The Moment It Broke: How 10 Agents Changed My Mental Model
By Nick Halstead
Two weeks ago my mental model of "building software" broke.
I spun up OpenClaw and had it coordinate 10+ agents. They talked to each other, installed their own tooling — Claude Code, Codex — spun up the repo scaffolding, and shipped working chunks end-to-end. Stuff that would've taken me a day of vibe coding in Cursor landed in under five minutes.
And then I sat there staring at the screen, because the obvious conclusion — wow, that's fast — wasn't actually the interesting part.
Speed is the wrong metric
When people see agents write code quickly, the natural reaction is to measure the acceleration. "10x faster!" "100x faster!" And sure, the speed is real. But if you've spent any time shipping enterprise software, you know that writing code was never the bottleneck.
The bottleneck is everything else:
- Making the UI actually complete — not just the happy path, but empty states, error handling, permissions, onboarding, settings, accessibility
- Integrating with the rest of the world — identity, billing, email, data warehouses — without creating a maintenance nightmare
- Deploying safely — migrations, rollbacks, canaries, secrets management
- Keeping it running — observability, incident response, SLOs
- Making it trustworthy — security reviews, audit trails, compliance, governance
- Surviving change — new requirements, new vendors, new policies, schema migrations
That's the gap between a demo and a system. And right now, agents are spectacular at demos.
The ability scale
I've been thinking about this as a rough score from 0 to 100. Not "how smart is the AI" — more like "how much of the delivery can you actually delegate?"
At 20/100 (where we are now), agents produce astonishing first drafts. The code works. The scaffolding is sound. Features appear fast. But the moment you need long-horizon correctness, production-grade completeness, or anything involving the trust stack — security, governance, compliance — you're back in the driver's seat.
At 100/100, you describe what you want and get a product that is secure, scalable, compliant, maintainable, and self-improving. Not a prototype. Not a demo. A system you'd bet a business on.
We're at 20. The slope is steep. And I think we cross most of the critical thresholds in the next 12 months.
What I saw in those five minutes
The agents didn't just write code fast. They coordinated. One handled the backend API. Another built the frontend. A third wrote tests. A fourth handled the database schema. They referenced each other's work, resolved conflicts, and produced a coherent result.
That's not autocomplete. That's the seed of something much bigger.
But it also exposed every gap. The output was a working prototype — not a product. The UI had no empty states. Error handling was optimistic at best. There was no deployment pipeline, no monitoring, no security review. It was, in the most literal sense, a demo.
And that's fine — for now. The question isn't whether agents can fill those gaps. It's how quickly the gaps close, and in what order.
The eight gates
I've mapped out what I think are the eight world changes that take us from 20 to 100. Each one removes a specific category of friction:
1. The babysitting ends: delegation replaces supervision
2. The demo dies: finished UX becomes the default output
3. Integrations become copy-paste: connectors that actually hold
4. Deployment becomes a button: safe, engineered, boring
5. Scale becomes a setting: verified continuously, not discovered painfully
6. Trust becomes built-in: security as a seatbelt, not a checklist
7. Products maintain themselves: closed-loop lifecycle ownership
8. Every company gets custom SaaS: "buy vs build" collapses
Each gate stacks on the last. You can't have self-maintaining products (7) without trustworthy deployment (4) and embedded security (6). You can't have custom enterprise SaaS (8) without the seven gates before it open.
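One way to make the stacking concrete is to treat the gates as a small dependency graph. The sketch below is only an illustration: the function names are hypothetical, and reading "each gate stacks on the last" as "gate n needs gate n-1" is my assumption. The only dependencies stated outright in the text are that gate 7 needs gates 4 and 6, and gate 8 needs everything before it.

```python
# Hypothetical model of the eight gates, numbered in the order listed above.
GATES = {
    1: "The babysitting ends",
    2: "The demo dies",
    3: "Integrations become copy-paste",
    4: "Deployment becomes a button",
    5: "Scale becomes a setting",
    6: "Trust becomes built-in",
    7: "Products maintain themselves",
    8: "Every company gets custom SaaS",
}

# Direct prerequisites. Assumption: gate n needs gate n-1 ("stacks on the last").
DIRECT = {n: {n - 1} for n in range(2, 9)}
DIRECT[1] = set()
# Stated in the text: self-maintaining products need deployment and trust.
DIRECT[7] |= {4, 6}


def prerequisites(gate):
    """Return every gate that must be open before `gate`, transitively."""
    seen = set()
    stack = list(DIRECT[gate])
    while stack:
        g = stack.pop()
        if g not in seen:
            seen.add(g)
            stack.extend(DIRECT[g])
    return seen


def can_open(gate, open_gates):
    """True if all transitive prerequisites of `gate` are already open."""
    return prerequisites(gate) <= set(open_gates)


if __name__ == "__main__":
    for n in sorted(GATES):
        print(n, GATES[n], "needs", sorted(prerequisites(n)))
```

Under these assumptions, asking for the prerequisites of gate 8 returns gates 1 through 7, which matches the claim that custom SaaS needs every earlier gate open.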
The full thesis is here. But the seed of it — the thing that made me write 5,000 words on a Sunday — was watching those 10 agents work together and realizing: the speed is the easy part. The trust stack is the real game.
What becomes scarce
Here's the punchline, and it's counterintuitive: when engineering capacity stops being the bottleneck, the new constraints are human.
- Clarity of intent: can you actually describe what you want?
- Good taste: do you know what "done" looks like?
- Governance: can you set the policies that keep speed safe?
The organizations that win the next era of software won't be the ones with the biggest engineering teams. They'll be the ones that can think clearly about what they need and articulate it precisely.
Software as a request. That's where this is going.