---
title: "The Night Agents Took the Stage"
date: 2026-02-24
description: "Six demos, one panel, and the uncomfortable question every dev tool company is dodging: are you building for agents, or are agents building around you?"
tags: ["ai-hacks-on-tap","agent-experience","devtools"]
readingTime: "12 min read"
url: https://alexmoening.com/dev-thoughts/the-night-agents-took-the-stage.html
markdownUrl: https://alexmoening.com/dev-thoughts/the-night-agents-took-the-stage.md
---

# The Night Agents Took the Stage

[← Back to /dev/thoughts](/dev-thoughts/)

<p class="lead">I walked into a meetup expecting pizza and demos. I walked out realizing that every developer tool company in the room was grappling with the same identity crisis: the users of their products are increasingly not human. Six companies, six different answers, and one uncomfortable question nobody has fully solved.</p>

### 60 Pizzas and an Identity Crisis

Tuesday evening I walked into Mux's new HQ on Market Street for the monthly All Things Web meetup. A hundred developers, 60 pizzas, and six companies all trying to answer the same question: how do you make your product work for something that isn't human?

The theme was **Agent Experience (AX)** — a term Matt Biilmann from Netlify coined last year, and one that's rapidly becoming the defining design challenge for developer tools in 2026. I've spent 25 years thinking about how content moves across the internet. Lately I've been thinking about how *agents* move across developer tools. Turns out the problems are eerily similar: context, routing, and trust.

Here's what I saw.

### The MCP Backlash

Rhys Sullivan from Vercel kicked things off, and immediately went after the sacred cow: MCPs.

Six months ago, everyone was shipping MCP servers. Now? Rhys made the case that CLIs are winning. The reasons are practical:

<table class="data-table">
    <thead>
        <tr>
            <th>Problem</th>
            <th>Why CLIs Win</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Context bloat</td>
            <td>MCP servers add ~100K tokens on connect, most of it irrelevant</td>
        </tr>
        <tr>
            <td>Chaining</td>
            <td>Bash commands compose naturally; MCP tool chains don't</td>
        </tr>
        <tr>
            <td>Permissions</td>
            <td>Agents increasingly run in permissive execution modes, so shell access is already granted — no second permission surface to build</td>
        </tr>
    </tbody>
</table>

But Rhys wasn't really arguing for CLIs either. His actual pitch was more radical: **skip the intermediary entirely and let agents consume OpenAPI specs directly**.

The demo was convincing. He pointed an agent at a TypeScript LSP, called `tools.discover`, and the agent found the Vercel API on its own. No MCP server. No CLI wrapper. Just the existing API spec. The agent read the OpenAPI definition, figured out the input/output types, called the Vercel API to set a DNS TXT record, and asked for human approval only because the spec marked PUT requests as destructive.

The key insight: OpenAPI already encodes what's safe and what isn't. GET requests can run automatically. PUT and POST get flagged. The permission model is baked into the spec. No new tooling required.
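The idea is easy to sketch. Assuming nothing beyond standard OpenAPI document structure — the path and the `"auto"` / `"needs-approval"` policy labels below are my own illustrative inventions, not Vercel's — a permission plan falls straight out of the HTTP methods:

```typescript
// Derive a permission plan from an OpenAPI document: safe (read-only)
// methods run unattended, everything else gets flagged for human approval.
type Policy = "auto" | "needs-approval";

interface OpenApiDoc {
  paths: Record<string, Record<string, unknown>>;
}

const SAFE_METHODS = new Set(["get", "head", "options"]);

function permissionPlan(doc: OpenApiDoc): Record<string, Policy> {
  const plan: Record<string, Policy> = {};
  for (const [path, ops] of Object.entries(doc.paths)) {
    for (const method of Object.keys(ops)) {
      const key = `${method.toUpperCase()} ${path}`;
      plan[key] = SAFE_METHODS.has(method) ? "auto" : "needs-approval";
    }
  }
  return plan;
}

// A fragment shaped like a DNS records API (illustrative path, not Vercel's).
const doc: OpenApiDoc = {
  paths: {
    "/v4/domains/{domain}/records": {
      get: {},
      post: {},
    },
  },
};

console.log(permissionPlan(doc));
```

A real harness would also honor per-operation hints in the spec, but the point survives the simplification: the safe/unsafe split already lives in the document you've published.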

This resonated with me. I've watched the CDN industry go through similar cycles — every few years someone invents a new configuration spec that's going to fix everything. Usually the answer was already in the HTTP spec all along. OpenAPI feels like that moment for agent tooling.

### Your Docs Are Your Agent's Brain

Two talks back-to-back drove this home. Hahnbee Lee from Mintlify and Abhi Aiyer from Mastra both arrived at the same conclusion from different directions: **documentation written for humans is actively hostile to agents**.

Hahnbee's framing was diplomatic — just update your docs, agents don't know when they're wrong. But Abhi's story was visceral. Mastra built an MCP doc server so agents in Cursor could write Mastra code. It took off. Then Replit tried to use it, and the whole thing fell apart.

The reason? Every page in Mastra's docs included both English and Japanese text. Every time the doc server responded, the context window exploded with bilingual content that the agent couldn't parse. Replit's team called them up and said, essentially: your docs are unusable.

That phone call triggered a rethink. Mastra's team decided to **ship documentation inside `node_modules`**. Not on a website. Not behind an API. Inside the package itself, version-matched to the code the agent installed.

Then something unexpected happened. When Opus came out, Abhi noticed it started grepping `node_modules` on its own when it didn't know something. It would find Mastra's docs, follow breadcrumbs to related files, and recursively build context. The agent was teaching itself how to use Mastra by reading the docs that shipped alongside the code.

This is the pattern I wrote about in [Making My Website Speak Robot](making-my-website-speak-robot.html) — the web is going bilingual, serving one version for humans and another for machines. Mastra took it a step further: they put the machine version *inside the dependency tree*.

### The Skills Freshness Problem

Multiple speakers referenced the emerging **skills ecosystem** — portable Markdown files that give agents curated context for a specific tool or framework. Vercel is pushing this hard. Mintlify auto-generates them from docs. Sanity converted their entire agent tooling to skills.

But Jon Eide Johnsen from Sanity flagged the elephant in the room: **skills go stale**.

You can run `npx skills add` and get a skill installed globally. Great. But APIs change constantly. There's no `npx skills check` that auto-updates. There's no hash verification for freshness. The agent has old context and doesn't know it.
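The missing freshness check is easy to imagine, at least in outline. A minimal sketch, assuming a lockfile that records a content hash at install time — the `SkillLock` shape is hypothetical, not part of any real skills tooling:

```typescript
// Freshness check sketch: hash the installed skill file and compare it
// against the hash recorded in a (hypothetical) lockfile at install time.
import { createHash } from "node:crypto";

interface SkillLock {
  name: string;
  sha256: string; // recorded when the skill was installed
}

function sha256(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// True when the installed skill still matches its lockfile entry.
function isFresh(lock: SkillLock, installedContent: string): boolean {
  return sha256(installedContent) === lock.sha256;
}
```

Note this only catches drift between the lockfile and the installed file; catching the harder case — the upstream API changed while the skill didn't — would mean comparing against the publisher's latest published hash.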

Jon's team has 40+ tools in their MCP server, which is already close to the practical limit for context. His suggestion: **progressive disclosure** — let agents discover tools incrementally instead of loading everything at once.
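Progressive disclosure can be sketched as a two-tier registry: a cheap index the agent always sees, and full definitions loaded only when it commits to a tool. The class and field names here are my own illustration, not Sanity's implementation:

```typescript
// Two-tier tool registry: agents browse a lightweight index up front and
// pay the context cost of a full schema only for the tool they pick.
interface ToolSummary { name: string; summary: string; }
interface ToolDef extends ToolSummary { inputSchema: object; }

class ToolRegistry {
  private defs = new Map<string, ToolDef>();
  private loads = 0; // how many full definitions have been pulled into context

  register(def: ToolDef): void {
    this.defs.set(def.name, def);
  }

  // Cheap: name + one-line summary per tool, shown unconditionally.
  index(): ToolSummary[] {
    return [...this.defs.values()].map(({ name, summary }) => ({ name, summary }));
  }

  // Expensive: the full schema, loaded on demand.
  load(name: string): ToolDef | undefined {
    this.loads++;
    return this.defs.get(name);
  }

  get loadCount(): number {
    return this.loads;
  }
}
```

With 40 tools, the index costs 40 short lines instead of 40 full schemas — the same trade Jon was describing.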

This maps to a problem I've seen in CDN configuration. Customers want to push every feature flag, every A/B test, every personalization rule to the edge. At some point you hit a ceiling — not because the infrastructure can't handle it, but because the configuration surface area becomes unmanageable. The answer is always the same: layer it. Give the system what it needs at the point of decision, not everything everywhere all at once.

### Cursor's 30% Number

David Gomes from Cursor gave the most jaw-dropping demo of the night. He walked us through a bug he introduced into Cursor's own codebase — keep/undo buttons showing on every diff line instead of only on hover — and showed how Cloud Agents fixed it.

The flow:

<div class="flow-diagram flow-vertical" role="img" aria-label="Cursor Cloud Agent workflow: Prompt and Screenshot sent to 8 Models in Parallel, each spawns VM with Cursor, records video of fix, pick best solution and merge PR">
    <div class="flow-step">
        <span class="step-icon">📝</span>
        <span class="step-text">Prompt + Screenshot</span>
    </div>
    <span class="flow-arrow" aria-hidden="true">↓</span>
    <div class="flow-step">
        <span class="step-icon">🔀</span>
        <span class="step-text">8 Models in Parallel</span>
    </div>
    <span class="flow-arrow" aria-hidden="true">↓</span>
    <div class="flow-step">
        <span class="step-icon">🖥️</span>
        <span class="step-text">Each Spawns VM + Cursor</span>
    </div>
    <span class="flow-arrow" aria-hidden="true">↓</span>
    <div class="flow-step">
        <span class="step-icon">🎬</span>
        <span class="step-text">Records Video of Fix</span>
    </div>
    <span class="flow-arrow" aria-hidden="true">↓</span>
    <div class="flow-step">
        <span class="step-icon">✅</span>
        <span class="step-text">Pick Best → Merge PR</span>
    </div>
</div>

That's not pair programming. That's a hiring manager reviewing candidates. Send the same bug to eight agents, watch the video playbacks, pick the cleanest solution.

But the number that stuck: **more than 30% of Cursor's merged PRs now originate from Cloud Agents**. Up from near zero in October 2025. And the trajectory is still climbing.
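The fan-out-and-pick pattern behind that workflow is simple to sketch. The `Attempt` shape and the numeric score are hypothetical stand-ins — in Cursor's demo, "scoring" was a human watching the video playbacks:

```typescript
// Fan-out sketch: send the same task to several model runners in parallel
// and keep the highest-scoring attempt. Assumes at least one runner.
interface Attempt { model: string; patch: string; score: number; }

type Runner = (task: string) => Promise<Attempt>;

async function fanOut(task: string, runners: Runner[]): Promise<Attempt> {
  const attempts = await Promise.all(runners.map((r) => r(task)));
  return attempts.reduce((best, a) => (a.score > best.score ? a : best));
}
```

Whether the final pick is a score or a human's judgment, the structure is the same: parallel candidates, one merge.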

David also showed a security vulnerability demo. An automated agent running every 24 hours found a clipboard exfiltration bug in Cursor's embedded browser. The agent then *demonstrated the exploit* by recording a video of itself stealing a UUID from the clipboard through a crafted webpage. The Cursor team could watch the video, confirm the vulnerability was real, and kick off a separate agent to fix it.

That's the part that got the room's attention. Not the bug fix — we've all seen agents write code. It was the agent *proving its own finding was real* by recording a video demonstration. That's a fundamentally different capability.

### The Panel: AX Is a Discipline, Not a Feature

The panel brought Matt Biilmann (Netlify) and Mika Sagindyk (Agent Arena / 2027.dev) together with the speakers for what turned into the most interesting 30 minutes of the night.

Matt's core argument: **AX is a discipline, like UX and DX before it**. Not a feature you ship. Not an MCP endpoint you bolt on. It's an ongoing practice of understanding how agents approach your product, where they get stuck, and designing for that.

He told the room that autonomous agents are becoming a new persona — a new type of user that interacts with your product in fundamentally different ways than humans. If you want your product to still be relevant in a few years, you need to design for both.

Mika brought the data perspective. His company, 2027.dev, benchmarks developer tools by sending Claude Code through their getting-started guides and measuring where agents fail. The findings are humbling:

<table class="data-table">
    <thead>
        <tr><th>Finding</th><th>Detail</th></tr>
    </thead>
    <tbody>
        <tr><td>Agents get stuck where humans wouldn't</td><td>Installation prompts, interactive confirmations, ambiguous next steps</td></tr>
        <tr><td>Documentation conflicts cause loops</td><td>Step 3 contradicts step 7 — humans figure it out, agents don't</td></tr>
        <tr><td>No importance weighting</td><td>Step 1 feels as critical as step 10</td></tr>
        <tr><td>Failures are invisible</td><td>You can't detect them unless you actually test with agents</td></tr>
    </tbody>
</table>

That last point is the kicker. You can't design good AX by imagining how an agent will use your product. You have to run the eval. Just like you can't design good UX by imagining how users will navigate your product — you have to watch them try.
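In practice, "running the eval" has roughly the shape of a loop over documented steps that stops at the first failure. The `AgentRunner` abstraction below is invented for illustration — a real harness like 2027.dev's would drive an actual agent through each step:

```typescript
// AX eval sketch: walk an agent through a getting-started guide step by
// step, recording where it gets stuck. Stops at the first failure, since
// agents rarely recover once a step goes wrong.
interface StepResult { step: number; ok: boolean; note?: string; }

type AgentRunner = (instruction: string) => { ok: boolean; note?: string };

function evalGuide(steps: string[], run: AgentRunner): StepResult[] {
  const results: StepResult[] = [];
  for (const [i, instruction] of steps.entries()) {
    const { ok, note } = run(instruction);
    results.push({ step: i + 1, ok, note });
    if (!ok) break;
  }
  return results;
}
```

The output is exactly the artifact the panel said is missing today: a record of where, in your own docs, an agent falls over.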

### What I'm Taking Away

I've been at the intersection of content delivery and developer experience for a long time. Here's what crystallized for me at this meetup:

**The OpenAPI insight is underappreciated.** Instead of building new MCP servers for every API, just expose the spec and let agents discover it. The permission model is already there. The type information is already there. This is the path of least resistance, and it maps cleanly to how REST APIs already work behind services like API Gateway.

**Documentation is becoming infrastructure.** Not a nice-to-have. Not a marketing exercise. Your docs are your agent's primary interface to your product. Ship them in `node_modules`. Serve them as Markdown. Make them machine-readable. This isn't optional anymore.

**Skills freshness is the next supply chain problem.** When I wrote about supply chain trust in [Supply Chain Roulette](supply-chain-roulette.html), the core issue was trusting third-party code that could change without your knowledge. Skills have the same problem. You install them once, APIs change, and your agent has stale context. We need versioning, hash verification, and automatic freshness checks.

**Cloud agents are past the demo phase.** Cursor's 30% stat isn't an aspiration — it's a measurement from production. The shift from synchronous pair-programming with AI to asynchronous delegation is happening now. The workflow looks less like talking to a copilot and more like managing a team.

**AX is about empathy for a non-human user.** This sounds weird, but it's the right framing. Agents can't read your visual hierarchy. They can't intuit that step 1 matters more than step 10. They can't recover from conflicting instructions the way a human can. Designing for them requires the same user research discipline we've applied to humans — just pointed at a fundamentally different kind of user.

### The Bigger Picture

The room was about 100 people — mostly SF developers, founders, and devrel folks. But the conversation felt bigger than a meetup. Every speaker, whether they were from a 3-person startup or a $29B company, was grappling with the same transition: the users of their developer tools are increasingly not humans.

That's not a future prediction. It's a present-tense observation. Cursor's Cloud Agents are shipping PRs. Replit's agent is writing Mastra code. Sanity's MCP server is bootstrapping entire CMS projects. Inkeep's agents are drafting emails and posting to Slack.

The question isn't whether agents will use your product. They already are. The question is whether you're designing for them or making them work around you.

I've been moving bits across the internet since 1999. Back then, the challenge was making content fast for humans. Now it's making content *legible* for machines. Different problem. Same discipline. And honestly, it's the most interesting evolution I've seen in this industry in a decade.

---

*The All Things Web meetup runs monthly in San Francisco, organized by Andre Landgraf. The February 2026 edition was hosted at Mux HQ. Speakers included Rhys Sullivan (Vercel), Hahnbee Lee (Mintlify), Abhi Aiyer (Mastra), Jon Eide Johnsen (Sanity), Gaurav Varma (Inkeep), and David Gomes (Cursor), with a panel featuring Matt Biilmann (Netlify) and Mika Sagindyk (Agent Arena).*

---

## Navigation

- [Home](/)
- [About](/about.html)
- [Projects](/projects.html)
- [Contact](/contact.html)
- [/dev/thoughts](/dev-thoughts/)

*Copyright 2026 Alex Moening. Opinions expressed are my own.*
