Software as a Request

By Nick Halstead

Over the last few months, code generation crossed a threshold: it stopped feeling like autocomplete and started feeling like a teammate. In my own experiments with multi‑agent setups (OpenClaw coordinating 10+ agents that install tooling, plan work, and execute in parallel), I've watched build cycles collapse from "a day of vibe coding" to "a few minutes." That speed is real. But it's also not the point.

The bottleneck for enterprise software has never been "writing code." The bottleneck is everything that makes software safe, complete, integrated, deployable, scalable, and maintainable — the parts organizations use to decide whether something is a demo or a system.

This piece is a thesis on what has to change — technically and operationally — over the next 12 months to take us from today's agentic acceleration (call it ability ~20/100) to the point where an individual can ask for "my own HubSpot" or "my own Facebook for my org/community" and get something that is not just built fast, but finished and trustworthy.

It's a roadmap of eight world changes, with the underlying technical milestones that unlock them.


The future

When enterprises adopt software, they don't buy code. They buy:

  • Completed interfaces (polish, edge cases, roles, flows)
  • Integrations (identity, billing, messaging, data sync)
  • Deployment lifecycle (CI/CD, migrations, rollbacks, environments)
  • Operations (observability, incident response, SLOs)
  • Security & governance (least privilege, audit, compliance posture)
  • Performance & scalability (latency, cost, multi‑tenant safety)
  • Lifecycle ownership (bug tracking, feedback loops, updates)

Today, agents are already strong at shipping code quickly. The thesis is: once agents can reliably generate and operate the entire "trust stack," enterprise software becomes a prompt.


A simple ability scale

Let's define a rough "AI ability score" for enterprise software delivery:

  • 20/100 (now): astonishing speed at producing working code, scaffolds, features, and iterations. Weaknesses show up in long-horizon correctness, completeness, integration depth, production operations, security nuance, and accountability under uncertainty.
  • 100/100: can take an ambiguous goal and reliably produce a secure, scalable, compliant, maintainable product — and keep it correct as it evolves — with the operational discipline of a top-tier engineering org.

This isn't a "smartness" scale. It's a delivery autonomy scale.


Why "enterprise apps" are uniquely hard

If you want a clean mental model:

"Consumer prototypes fail when they're wrong" vs. "Enterprise systems fail when they're right but not safe."

Enterprise-grade means:

  • The UI is complete enough that normal people can use it without gaps.
  • The system integrates with other systems without fragile glue code.
  • Deployments are controlled, reversible, and auditable.
  • Incidents are diagnosable and recoverable.
  • Security isn't "we didn't see a bug," it's "we designed for adversaries."
  • The product survives change: schema migrations, new roles, new policies, new vendors.

The eight world changes

This is the ladder I think matters most — the "headline shifts" that will define the future.

Step 1 — The babysitting ends

Right now, even the best agents still feel like pair‑programming with a hyper‑productive junior: you get a burst of speed, but you pay for it in attention. You're nudging constantly — "not that file," "don't refactor the whole module," "why did you change the config," "fix the import," and so on. The work gets done, but it doesn't feel like delegation. It feels like supervision.

This first step is when that dynamic flips. You stop vibe‑coding alongside the agent and start assigning outcomes. The agent can move through a real repository without thrashing the architecture or violating conventions. It understands boundaries — where the UI ends, where the service layer begins, where migrations live, what patterns the team actually uses — and it can coordinate its own "specialists" (UI, backend, tests, migrations) without you manually conducting every move.

The practical difference is simple: you're no longer watching the work happen. You're reviewing it after it happened.

Milestone: you describe the feature at outcome level, hit enter, and come back to a single cohesive pull request that builds, tests, and reads like something a solid engineer would submit — rather than a stream of partial attempts that need handholding to converge.

Step 2 — The demo dies

This is the first moment the outside world actually believes the shift.

Up to now, agents can generate a lot of functionality quickly — but the output still has that unmistakable prototype smell. Screens exist, flows almost work, and the "happy path" looks great… until a real user clicks something slightly wrong, has the wrong permissions, uploads messy data, or tries to do something mundane like export, undo, or recover from an error.

Step 2 is when that smell disappears.

"Cool prototype" stops being the default output. What you get looks and behaves like a real product because the agent is no longer optimizing for getting to working code — it's optimizing for getting to finished experience. And "finished" doesn't mean prettier UI. It means the app holds up under normal human behavior.

A complete interface is end‑to‑end, not a collection of screens. It includes the boring, essential parts people forget:

  • Empty states and loading states that guide you
  • Error handling that makes sense
  • Roles and permissions that are consistent in the UI and the API
  • Onboarding that doesn't confuse new users
  • Settings and admin tooling that don't feel bolted on
  • Audit views that explain what happened
  • Imports/exports that fit real workflows
  • Bulk actions that don't break
  • Accessibility and responsive behavior that make it usable for everyone

It uses the same design language everywhere. It has product-grade text and guardrails — the moments where the system proactively asks "are you sure?" and prevents you from doing something destructive by accident.

Under the hood, the key change is that "done" becomes enforceable. Not "it compiles," but a real Definition of Done: workflow-based UI acceptance tests, category-specific UX patterns (a CRM should feel like a CRM), and self-review loops where specialized sub-agents critique completeness the way a seasoned product team would. The agent ships with the instinct humans develop through years of building software: most of the work is in the edges.
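To make that concrete, here is a minimal sketch of what an enforceable Definition of Done could look like: completeness criteria evaluated as code, so "done" fails loudly instead of silently. Everything in it (the DoDCheck type, the example criteria) is hypothetical rather than a reference to any real framework.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class DoDCheck:
        name: str
        passed: Callable[[], bool]  # each check probes one completeness criterion

    def run_definition_of_done(checks: list[DoDCheck]) -> bool:
        failures = [c.name for c in checks if not c.passed()]
        for name in failures:
            print(f"DoD failed: {name}")
        return not failures

    # "Done" means the edges are covered, not just the happy path.
    # The lambdas stand in for real workflow-based acceptance tests.
    checks = [
        DoDCheck("every list view has an empty state", lambda: True),
        DoDCheck("permissions match between UI and API", lambda: True),
        DoDCheck("destructive actions require confirmation", lambda: True),
        DoDCheck("import/export round-trips sample data", lambda: True),
    ]

    if __name__ == "__main__":
        raise SystemExit(0 if run_definition_of_done(checks) else 1)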

Milestone: the "last 30%" — the part that usually dominates timelines — becomes automated. The output isn't a demo you show. It's a product you can hand to users.

Step 3 — Integrations become copy‑paste

Enterprise software doesn't live on an island. The moment something matters, it has to connect: identity providers, email, billing, data warehouses, help desks, calendars, accounting systems, internal tools — everything. And today, that's still where "fast build" goes to die. Integrations are the part that turns a simple product idea into weeks of edge cases, brittle glue code, and late-night debugging.

Step 3 is when that stops being true.

This step is the shift from "we can call APIs" to "we can integrate like a product company." Real integrations aren't a single request-response. They're living relationships between systems, and they break in predictable ways. You need OAuth and SSO flows that handle refresh and tenant separation correctly. You need webhooks that don't duplicate events or lose them, plus retries that don't create data corruption. You need to survive pagination, rate limiting, and vendor quirks that only show up in production. You need two-way sync that can resolve conflicts instead of silently overwriting data. You need mapping layers that can adapt when schemas drift. And you need to distinguish sandboxes from production so testing doesn't become a minefield. Above all, you need visibility: monitoring and alerts that tell you when a connector is degraded before customers notice.

The real breakthrough here isn't that the agent can write integration code. It's that integrations become standardized. A connector stops being a bespoke side quest and becomes something you stamp out with reusable patterns: a connector engine for auth/webhooks/retry/idempotency, contract tests that run in CI against mocks or sandboxes, safe schema/mapping tooling, and a policy layer that controls which data is allowed to move where. Once those primitives exist, "integrate with X" becomes composable instead of custom.
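One of those primitives is easy to show in miniature. Below is a sketch of the idempotency piece: a webhook handler that acknowledges vendor retries without reprocessing them. The event shape and the in-memory store are assumptions for illustration.

    import hashlib

    processed: set[str] = set()  # stand-in for a durable dedup store

    def handle_webhook(event_id: str, payload: bytes) -> str:
        # Dedup on the vendor's event id; fall back to a payload hash if absent.
        key = event_id or hashlib.sha256(payload).hexdigest()
        if key in processed:
            return "duplicate: acknowledged, not reprocessed"
        apply_side_effects(payload)  # the only step that must not run twice
        processed.add(key)
        return "processed"

    def apply_side_effects(payload: bytes) -> None:
        pass  # write to your own datastore, enqueue sync jobs, etc.

    print(handle_webhook("evt_123", b"{}"))  # processed
    print(handle_webhook("evt_123", b"{}"))  # duplicate: acknowledged, not reprocessed

In a real connector the dedup key and the side effects would commit in one transaction, so a crash between them can't create exactly the data corruption the retry was meant to prevent.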

Milestone: "Connect to X" becomes a reliable primitive — something you do with confidence, not an adventure you budget weeks for.

Step 4 — Deployment becomes a button

In most organizations, shipping isn't the hard part because the code is difficult. Shipping is hard because the process is fragile. It lives in tribal knowledge, late-night rituals, and a handful of people who "know how production works." Even when an agent can build features quickly, you don't really feel the speed-up if every release still requires careful manual choreography.

Step 4 is when that friction collapses.

Deployment becomes boring — in the best way. Not because it's reckless, but because it's engineered. Environments are created consistently from infrastructure-as-code, not by hand. CI/CD becomes a real gatekeeper, not a checkbox. Secrets are handled as a first-class system: least privilege by default, scoped credentials, rotation, and no accidental leakage. Database changes aren't scary because you're using safe migration patterns designed for online systems. Releases don't jump straight into the deep end; they roll out progressively via canaries or blue/green deploys, with meaningful health checks that actually verify behavior. And when something goes wrong, rollbacks aren't a panic button — they're part of the plan, with runbooks and operational readiness baked in from the beginning.

Under the hood, what changes is that the agent's work becomes reproducible and governable. Builds are hermetic: pinned dependencies, deterministic pipelines, no "works on my agent." Execution happens in sandboxes with constrained permissions, so the agent can't accidentally (or maliciously) cause damage while it's building and deploying. Release orchestration becomes a specialized capability: agents learn to stage risk, deploy in steps, and halt automatically when indicators degrade. And the most important shift: rollback triggers become automatic — if SLOs are violated, the system defends itself.
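Reduced to its core, an automatic rollback trigger can be this small: the canary is promoted only while its metrics stay inside the SLO. The thresholds and observation windows below are invented for illustration.

    SLO_MAX_ERROR_RATE = 0.01  # at most 1% of requests may fail
    SLO_MAX_P99_MS = 500.0     # p99 latency budget in milliseconds

    def canary_healthy(error_rate: float, p99_ms: float) -> bool:
        return error_rate <= SLO_MAX_ERROR_RATE and p99_ms <= SLO_MAX_P99_MS

    def evaluate_rollout(windows: list[tuple[float, float]]) -> str:
        # One (error_rate, p99_ms) pair per observation window of the canary stage.
        for error_rate, p99_ms in windows:
            if not canary_healthy(error_rate, p99_ms):
                return "ROLLBACK"  # SLO violated: the system defends itself
        return "PROMOTE"

    print(evaluate_rollout([(0.002, 310.0), (0.004, 420.0), (0.030, 390.0)]))  # ROLLBACK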

Milestone: "From PR to production" becomes an engineered system — repeatable, observable, reversible — not a hero process led by whoever's brave enough to push the button.

Step 5 — Scale becomes a setting

Most software feels fast when it's small.

A prototype with ten users is smooth. A new product with a few hundred accounts is fine. And then the first real wave of success hits — usage spikes, data grows, background jobs pile up, dashboards slow to a crawl — and suddenly the team is doing the thing everyone swears they'll never do: the rewrite. Not because the original idea was wrong, but because the system was never built to carry the weight it just earned.

Step 5 is when that pattern breaks.

"Scale becomes a setting" doesn't mean scaling is trivial. It means scaling stops being a surprise. Performance and cost are treated as first-class requirements from the beginning: you define latency budgets and throughput targets up front, and the system continuously proves it can meet them. Load testing and profiling aren't heroic efforts you do once a year — they're part of the build pipeline. The architecture naturally reaches for the right tools when pressure rises: caching where reads dominate, indexing where search matters, queues and batching where work needs smoothing, and backpressure where systems would otherwise collapse. In a multi-tenant world, it also means isolation: one noisy customer can't melt the experience for everyone else.

The deeper change is that "scale" becomes measurable rather than mythological. Performance regressions get caught the same way test regressions do — automatically, and early. The agent doesn't just generate code; it selects reference architectures that match the category of product you're building (a CRM is not a feed, a feed is not analytics). Profiling becomes routine, and recommendations become reliable enough to trust. And operational dashboards tell the full story: not just "is it slow," but what feature caused it, what it costs, and what the tradeoff is.
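For example, a latency budget can fail the build the same way a failing test does. In this sketch the budgets and the benchmarked functions are placeholders:

    import time

    BUDGET_P95_MS = {"search": 120.0, "dashboard_load": 300.0}  # per-path budgets

    def p95_latency_ms(fn, runs: int = 50) -> float:
        samples = []
        for _ in range(runs):
            start = time.perf_counter()
            fn()
            samples.append((time.perf_counter() - start) * 1000.0)
        samples.sort()
        return samples[int(len(samples) * 0.95) - 1]  # rough p95

    def gate(results: dict[str, float]) -> bool:
        over = {path: ms for path, ms in results.items() if ms > BUDGET_P95_MS[path]}
        for path, ms in over.items():
            print(f"BUDGET EXCEEDED: {path} p95={ms:.1f}ms > {BUDGET_P95_MS[path]}ms")
        return not over  # False fails the pipeline, like a failing test

    # In CI: measure each critical path (stubs here), then gate the build.
    results = {"search": p95_latency_ms(lambda: sum(range(10_000))),
               "dashboard_load": p95_latency_ms(lambda: sorted(range(50_000)))}
    raise SystemExit(0 if gate(results) else 1)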

Milestone: "It scales" becomes a verified property. Not a hope. Not a future rewrite. A measurable, tested, continuously enforced reality.

Step 6 — Trust becomes built‑in

Up to this point, the story is speed. Faster building, fewer handoffs, smoother shipping. But enterprise adoption doesn't ultimately fail because the software is slow to create. It fails because the organization can't trust it.

This is the cliff: security reviews, governance checks, privacy concerns, procurement questionnaires, "where does the data go?", "who can access it?", "can we audit it?", "what happens when something goes wrong?" For most teams, that's where momentum dies — because trust is bolted on late, and bolted on badly.

Step 6 is when trust moves from an afterthought to an embedded property of the system.

Security stops being a checklist you sprint through at the end and becomes a design posture you carry from the beginning. That means threat modeling and abuse cases are part of how features are specified. It means identity and authorization aren't "login works," but least privilege by default and provably correct permission enforcement. It means every sensitive action leaves an audit trail, and those records can't be quietly altered. It means privacy and retention policies are real workflows — data can be found, controlled, and deleted correctly. It means safe defaults: the system resists misconfiguration rather than inviting it. And it means supply-chain hygiene is treated as a first-class risk: dependencies are controlled, builds are reproducible, and provenance is explicit.
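One of those properties is simple enough to show in miniature: an audit trail whose records can't be quietly altered. A common technique is hash chaining, sketched here with illustrative record fields.

    import hashlib, json, time

    def append(log: list[dict], actor: str, action: str) -> None:
        # Each record commits to the previous one, so edits break the chain.
        prev = log[-1]["hash"] if log else "genesis"
        record = {"ts": time.time(), "actor": actor, "action": action, "prev": prev}
        record["hash"] = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()).hexdigest()
        log.append(record)

    def verify(log: list[dict]) -> bool:
        prev = "genesis"
        for r in log:
            body = {k: v for k, v in r.items() if k != "hash"}
            digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if r["prev"] != prev or r["hash"] != digest:
                return False
            prev = r["hash"]
        return True

    log: list[dict] = []
    append(log, "alice", "exported customer table")
    append(log, "bob", "changed billing plan")
    assert verify(log)
    log[0]["action"] = "viewed dashboard"  # a quiet edit...
    assert not verify(log)                 # ...is detected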

Under the hood, the key change is governance that actually constrains reality. Agents operate inside a policy engine: what tools they can run, what networks they can touch, what dependencies they're allowed to introduce, what secrets they can access, and what classes of change require extra scrutiny. Security testing becomes a pipeline gate, not a periodic project. Provenance becomes automatic: you can always answer "who/what generated this artifact, with what inputs, and through what checks." And for high-risk areas — authorization, payments, PII flows — human approval isn't removed; it's formalized, with clear escalation paths and evidence attached.
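A deliberately tiny sketch of that policy layer, with invented action names and policy fields: every agent action is checked before it runs, and high-risk areas escalate instead of auto-approving.

    POLICY = {
        "allowed_tools": {"run_tests", "open_pr", "read_logs"},
        "allowed_domains": {"api.github.com"},
        "human_approval_areas": {"authorization", "payments", "pii"},
    }

    def authorize(tool: str = "", domain: str = "", area: str = "") -> str:
        if tool and tool not in POLICY["allowed_tools"]:
            return f"DENY: tool '{tool}' is not permitted"
        if domain and domain not in POLICY["allowed_domains"]:
            return f"DENY: network access to '{domain}' is not permitted"
        if area in POLICY["human_approval_areas"]:
            return f"ESCALATE: '{area}' changes require human sign-off with evidence"
        return "ALLOW"

    print(authorize(tool="run_tests"))           # ALLOW
    print(authorize(domain="evil.example.com"))  # DENY
    print(authorize(area="payments"))            # ESCALATE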

Milestone: security becomes an automatic seatbelt. Something the system enforces continuously, so product teams can move fast without dragging a separate security team into every release just to feel safe.

Step 7 — Products maintain themselves

Most software doesn't fail because it's badly written on day one. It fails because it slowly decays.

A new system starts clean and understandable, then reality arrives: edge cases, messy customer data, vendor changes, new permissions, new workflows, unexpected load, and a steady drip of "small" fixes that quietly introduce regressions. Over time the product doesn't just accumulate features — it accumulates scar tissue. Teams call it "technical debt," but what they really mean is: the system is losing the ability to stay correct as the world changes.

Step 7 is when that rot stops being inevitable.

The key insight is simple: shipping once is easy. Keeping something correct is the product. Enterprise software isn't a release; it's a living contract. And that contract only holds if there's a real lifecycle loop: the product observes itself, turns what it sees into actionable work, fixes problems safely, and proves the fix didn't break anything else.

This step is what lifecycle ownership looks like when it's done well. Telemetry becomes signal, not noise: the system can tell the difference between a glitch and a pattern. Issues get triaged automatically into the right bucket — bug, feature request, incident — with context attached. When something breaks, the system can reproduce it reliably, shrink it into a minimal failing case, and propose a targeted fix rather than a vague rewrite. Updates arrive with prioritization, release notes, and clear customer impact. Backward compatibility isn't an afterthought; deprecations are staged and communicated. And rollouts respect the reality of multi-tenant software: customer-specific configuration, gradual deployment, safe defaults, and the ability to pause or roll back without chaos.

Under the hood, the product is now running a closed loop: observe → triage → fix → test → ship → verify. Regression prevention becomes meaningful — less "coverage theater," more tests that map to real user journeys and hard invariants. The system learns to route problems to the true source: sometimes it's code, sometimes it's an integration, sometimes it's the spec itself. And crucially, it develops calibrated confidence: low-risk fixes can flow fast, high-risk changes escalate to humans with clear evidence and a recommended path.
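The "calibrated confidence" part can almost be written as policy. A sketch, with assumed risk labels and thresholds:

    def route_fix(risk: str, confidence: float) -> str:
        # Low-risk, high-confidence fixes flow fast; everything else
        # escalates with evidence rather than guessing.
        if risk == "low" and confidence >= 0.90:
            return "auto-ship behind a flag; verify with regression tests"
        if risk == "low" or (risk == "medium" and confidence >= 0.75):
            return "stage the fix; request asynchronous human review"
        return "escalate to a human with a minimal repro and a recommended patch"

    print(route_fix("low", 0.97))     # auto-ship ...
    print(route_fix("medium", 0.80))  # stage ...
    print(route_fix("high", 0.95))    # escalate ...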

Milestone: the product becomes a self-improving system with operational memory. It doesn't just run — it remembers what went wrong, learns from it, and gets harder to break over time.

Step 8 — Every company gets custom SaaS

This is where all the earlier steps stop being "improvements to how we build software" and become a change in what software is.

At the beginning of this thesis, the breakthrough is speed — but it's fragile speed. Agents can build fast, yet they still demand attention. You're still supervising, still stitching, still polishing, still translating "almost working" into "safe to ship." Over the steps that follow, that friction collapses one gate at a time: first the babysitting ends and delegation becomes real; then the demo smell disappears because the interface is genuinely finished; then integrations stop being bespoke adventures and become standardized connectors; then deployment turns into a boring, engineered pipeline; then scale becomes something you verify continuously rather than discover painfully; then trust becomes embedded so security and governance stop being the adoption cliff; and finally, the product stops rotting because it can observe, triage, fix, ship, and verify in a closed loop.

Step 8 is what happens when those gates stack.

"Buy vs build" collapses, not because buying disappears, but because generating becomes the fastest path to a system that actually fits. Instead of renting a generic tool and spending months configuring it around your workflows, you generate a product that is shaped to your company from day one: your identity model, your permissions, your data rules, your integrations, your compliance posture, your deployment environment, your performance targets — all treated as inputs, not afterthoughts. The output isn't a clone in the shallow sense; it's something more useful: a fitted system that behaves like "HubSpot-class software" for your specific reality.

The counterintuitive part is what becomes scarce. It isn't engineering capacity anymore. The new constraints are human: clarity of intent, good taste, and governance. The organizations that win aren't the ones with the biggest dev teams — they're the ones that can articulate what they want, define what "done" means, and set the policies that keep speed safe.

Milestone: you stop renting generic software and start generating software shaped to you — with completeness, deployability, scale, trust, and ongoing maintenance as default properties, not heroic effort.