What if the right unit of work isn't an agent, but a fleet of small ones with shared memory and shared tools?
I keep watching teams reach for one heavyweight agent — pile fifty, sixty, a hundred skills onto it — and then spend most of their effort fighting the resulting monolith. The skill graph collides with itself. Tool selection gets unreliable. Context windows blow out. Every new capability makes the existing ones a little worse.
The bet I'm prototyping is the opposite shape: small, specialized agents in a fleet, with the fleet itself as the primitive — not the agent.
What goes wrong with one big agent
- Skills interfere. A single planner asked to do everything from sales outreach to log triage starts confusing tools and contexts. More skills, more drift.
- No shared substrate. Two agents on the same team can't see each other's work, can't share memory, can't pass tasks. Every team rebuilds this from scratch.
- Runtime lock-in. The framework that makes the agent usually owns the deployment, the messaging, the tool calls. Swapping the runtime means rewriting everything.
- Distribution is broken. "Here's an agent that does X" usually means "here's a half-configured Python project, good luck." There's no batteries-included unit you can hand someone.
The bet
Treat the fleet as the unit:
- Agents are declarative. Small YAML or config files that say what an agent is, not procedural code that defines it.
- The fleet owns the primitives. Task queue, shared memory, delegation, tool registry. Any agent in the fleet can use them.
- Runtimes are pluggable. A lightweight runtime for cheap, narrow agents. A heavier runtime when you need it. Same fleet contract, different engines underneath.
- Capability packs ship as bundles. Skills + integrations + MCP servers, curated together so a new fleet starts useful instead of empty.
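To make the first two bullets concrete, here is a minimal sketch of a declarative agent spec and a fleet that owns the shared primitives. It is an illustration under assumptions, not the prototype's code: `AgentSpec`, `Fleet`, `delegate`, `call_tool`, and the example agents and tool are all hypothetical names, and a plain dict stands in for the YAML file so the sketch has no dependencies.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical declarative specs. In practice these might be loaded from YAML;
# plain dicts keep the sketch dependency-free.
TRIAGE_SPEC = {"name": "log-triage", "role": "Summarize error logs"}
OUTREACH_SPEC = {"name": "outreach", "role": "Draft sales emails"}


@dataclass(frozen=True)
class AgentSpec:
    """What an agent is, expressed as data. No behavior lives here."""
    name: str
    role: str

    @classmethod
    def from_dict(cls, raw: dict) -> "AgentSpec":
        return cls(name=raw["name"], role=raw["role"])


@dataclass
class Fleet:
    """Owns the shared primitives; agents reach them through the fleet."""
    agents: dict[str, AgentSpec] = field(default_factory=dict)
    memory: dict[str, str] = field(default_factory=dict)                  # shared memory
    queue: deque = field(default_factory=deque)                           # task queue
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)  # tool registry

    def add_agent(self, spec: AgentSpec) -> None:
        self.agents[spec.name] = spec

    def register_tool(self, name: str, fn: Callable[[str], str]) -> None:
        self.tools[name] = fn

    def delegate(self, sender: str, recipient: str, task: str) -> None:
        """Delegation is a queue entry, not a direct call between agents."""
        if recipient not in self.agents:
            raise KeyError(f"no agent named {recipient!r} in this fleet")
        self.queue.append((sender, recipient, task))

    def call_tool(self, tool: str, arg: str) -> str:
        """Any agent in the fleet can call any registered tool."""
        return self.tools[tool](arg)


fleet = Fleet()
fleet.add_agent(AgentSpec.from_dict(TRIAGE_SPEC))
fleet.add_agent(AgentSpec.from_dict(OUTREACH_SPEC))
fleet.register_tool("word_count", lambda text: str(len(text.split())))
fleet.memory["incident"] = "db timeout during nightly backup"
fleet.delegate("outreach", "log-triage", "summarize incident")
```

The point of the shape is that neither agent holds a reference to the other: the memory entry, the queued task, and the tool all live on the fleet.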
Once the fleet is the primitive, "more capability" stops meaning "fatter agent" and starts meaning "another teammate with its own narrow job and shared access to the team's memory."
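The "same fleet contract, different engines" idea can be sketched the same way. This assumes a hypothetical `Runtime` protocol with a single `handle` method and a hypothetical `Message` type; the real runtime interface and messaging contract may look quite different, and both engines below are stand-ins that fake their work with strings.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Message:
    """Hypothetical messaging contract: who asked, who answers, and what for."""
    sender: str
    recipient: str
    body: str


class Runtime(Protocol):
    """The contract the fleet relies on. How an engine fulfils it is its own business."""
    def handle(self, message: Message) -> Message: ...


class LightRuntime:
    """Stand-in for a cheap engine: one fixed rule, no planning step."""
    def handle(self, message: Message) -> Message:
        return Message(message.recipient, message.sender, f"ack: {message.body}")


class HeavyRuntime:
    """Stand-in for a heavier engine; the planning step is faked with a string."""
    def handle(self, message: Message) -> Message:
        plan = f"plan({message.body})"
        return Message(message.recipient, message.sender, f"done: {plan}")


def dispatch(runtimes: dict[str, Runtime], message: Message) -> Message:
    """Route by recipient name. The fleet never inspects which engine is underneath."""
    return runtimes[message.recipient].handle(message)


# Adding a teammate is one more entry, with whichever engine suits its job.
runtimes: dict[str, Runtime] = {
    "log-triage": LightRuntime(),
    "researcher": HeavyRuntime(),
}
reply = dispatch(runtimes, Message("console", "log-triage", "scan auth.log"))
```

Swapping `log-triage` onto `HeavyRuntime` changes one dictionary entry and nothing in `dispatch`, which is the property the pluggable-runtime bullet is after.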
Where it is
Working prototype. Daemon, CLI, and a small web console for inspecting fleets. Core architecture is in: agent runtime interface, messaging contract, four-layer model. Active work is on the capability tier — fleet-level tools, MCP sidecars, and a baseline pack you'd actually want to start with.
What I'm still thinking about
- Pack curation is a knife-edge. Batteries-included enough to be delightful on day one, but not so prescriptive that teams hit the ceiling on day three. I don't have a clean answer yet.
- Coordination at scale. A three-agent fleet is easy. What do the task queue and delegation contract look like when a fleet has fifteen agents and they're all asking each other for things?
- Where's the open-core line? Single fleet, core orchestration, runtime contract — clearly open. Multi-fleet, RBAC, cost tracking, clustering — clearly the commercial layer. The wedge needs to feel honest, not crippled.
More on this once the capability tier lands and I've put a real fleet through real work.