---
title: "Skills vs MCP — Why Your AI Needs an Orchestration Layer"
date: 2026-03-11
description: "MCP gives your AI tools. Skills tell it when and how to use them. The ecosystem is figuring this out the hard way — and the data backs it up."
tags: ["mcp","agent-experience","architecture","skills"]
readingTime: "14 min read"
url: https://alexmoening.com/dev-thoughts/skills-vs-mcp-orchestration-layer.html
markdownUrl: https://alexmoening.com/dev-thoughts/skills-vs-mcp-orchestration-layer.md
---

# Skills vs MCP — Why Your AI Needs an Orchestration Layer

[← Back to /dev/thoughts](/dev-thoughts/)

<p class="lead">MCP solved tool access. It didn't solve tool intelligence. And the ecosystem is figuring this out the hard way — developers are hitting context walls, accuracy is degrading, and the data from Anthropic, Redis, and Chroma Research all point the same direction: more tools doesn't mean more capable.</p>

Last year everybody was hot on MCP. Every tool needed a server. Every capability got wrapped in a protocol endpoint. Now the pendulum is swinging — developers are hitting context walls, accuracy is degrading, and the smartest builders are moving to skills, CLI tools, and orchestration layers that sit *above* MCP rather than replacing it.

I rebuilt my own home network skill from a 102-line tool catalog into an intent-routed orchestration layer and measured the difference. But this isn't just my opinion — the data from Anthropic, Redis, Chroma Research, and a growing chorus of developers all point the same direction.

Here's what I learned, what the ecosystem is discovering, and why it matters for anyone building AI agents that use tools.

### What MCP Already Gives You

<p class="section-summary">MCP is elegant infrastructure — define tools, inject them, let the model call them.</p>

Model Context Protocol is elegant. You define tools — functions with names, parameters, and descriptions — and your AI agent gets them injected into its context window at session start. My home network MCP server exposes 36 tools across five systems: pfSense firewall, UniFi switches, WLED lights, Kasa smart plugs, and Home Assistant.

When I say "turn off the picture frame," Claude Code already *has* `wled_set_power` in its tool list. It can see the function signature. It knows the parameters. MCP's job is done.

So what's the problem?

### What MCP Doesn't Give You

<p class="section-summary">Tools are atoms. They don't encode workflows, topology, or operational sequences.</p>

MCP tools are atoms. They don't know about each other. They don't encode workflows. They don't carry network topology. And they definitely don't tell the AI that before changing a switch port's VLAN, it should create a DHCP static map first or the device will get a random IP.

When I asked Claude to "add a new device to the IoT VLAN," it had 36 tools and zero guidance on the 6-step provisioning workflow that actually works. Every request became a cold-start reasoning problem: scan the tool list, guess the order, hope for the best.

This is the gap between tool *access* and tool *intelligence*.

### The Ecosystem Is Hitting the Same Wall

<p class="section-summary">Context consumption, accuracy degradation, and token waste — everyone's seeing the same pattern.</p>

I'm not the first person to notice this. The evidence is piling up from every direction.

**The context consumption is brutal.** A [GitHub issue](https://github.com/anthropics/claude-code/issues/11364) on Claude Code reported that just 7 MCP servers consumed **67,300 tokens — 33.7% of the 200K budget** — before a single user message. Another user found MCP tools eating [98,700 tokens (49.3%)](https://github.com/anthropics/claude-code/issues/13717) of their context window. Simon Willison [pointed out](https://simonwillison.net/2025/Aug/22/too-many-mcps/) that adding just the GitHub MCP alone defines 93 tools and swallows **55,000 tokens**.

**Tool overload makes models dumber.** Redis published [concrete benchmarks](https://redis.io/blog/from-reasoning-to-retrieval-solving-the-mcp-tool-overload-problem/): tool selection accuracy was **42% with 50+ tools** but jumped to **85% when filtered down to relevant tools** — a 2x improvement. Token usage dropped 98%. Latency dropped 8x. Anthropic's own testing showed their [Tool Search feature](https://www.anthropic.com/engineering/advanced-tool-use) improved accuracy from **49% to 74%** on Opus 4 by dynamically filtering which tools the model sees.

**The research confirms it at scale.** Chroma tested [18 state-of-the-art LLMs](https://research.trychroma.com/context-rot) and found **20-50% accuracy drops** as context grew from 10K to 100K+ tokens — what they call "context rot." Models don't use their context uniformly; they attend to the beginning and end but lose signal in the middle. Every tool definition you stuff in there pushes your actual task further into that dead zone.

**Developers are saying it plainly:**

> "MCP does not scale. It cannot scale beyond a certain threshold... This is a fundamental limitation with the entire concept of MCP." — [HackerNews](https://www.jenova.ai/en/resources/mcp-tool-scalability-problem)

> "Most of us are now drowning in the context we used to beg for." — [CodeRabbit](https://dev.to/piotr_hajdas/mcp-token-limits-the-hidden-cost-of-tool-overload-2d5)

Even [Anthropic themselves](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) frame it explicitly: *"Context must be treated as a finite resource with diminishing marginal returns."* And: *"One of the most common failure modes we see is bloated tool sets that cover too much functionality or lead to ambiguous decision points about which tool to use."*

The message is clear: **more tools does not equal more capable**. Often it's the opposite.

### The Ecosystem Is Converging on the Same Fix

<p class="section-summary">Claude Code, Cursor, GitHub Copilot, Redis — all independently building progressive disclosure.</p>

What's interesting is that the solution isn't "fewer MCP servers." It's an orchestration layer *above* MCP. And everyone is arriving at it independently.

**Claude Code** built a [skills system](https://code.claude.com/docs/en/skills) with progressive disclosure — skill metadata loads at ~30-50 tokens, the full skill loads on invocation, and supporting resources load on demand. More than 349 community skills have been published so far.

**Cursor** migrated from monolithic `.cursorrules` files to modular `.mdc` files in `.cursor/rules/`, with glob-based activation patterns so rules only load when relevant files are touched. They also enforce a **hard 40-tool limit** — an explicit acknowledgment that unlimited tools degrade performance.

**GitHub Copilot** added [path-specific `.instructions.md`](https://docs.github.com/copilot/customizing-copilot/adding-custom-instructions-for-github-copilot) files with YAML frontmatter for scoped context. In November 2025, they added agent-specific instructions to control which agents receive which context.

**Simon Willison** [made the case directly](https://simonwillison.net/2025/Aug/22/too-many-mcps/): *"If your coding agent can run terminal commands and you give it access to GitHub's `gh` tool, it gains all of that functionality for a token cost close to zero."* He recommended building small custom CLI tools over MCP servers.

**Redis** reframed tool selection as a [retrieval problem](https://redis.io/blog/from-reasoning-to-retrieval-solving-the-mcp-tool-overload-problem/) — vector search to find 3-5 relevant tools instead of loading 167 into context.

The pattern is the same everywhere: **don't dump everything into context. Route to what's relevant. Load on demand.**
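The retrieval move is easy to sketch. Here is a minimal version in Python that uses plain word overlap in place of a real vector index; the tool names and descriptions are illustrative stand-ins, not Redis's implementation or a real MCP catalog.

```python
# Retrieval-based tool selection, sketched with word overlap instead of
# embeddings. The catalog below is a toy example, not a real MCP server.

def score(query: str, description: str) -> float:
    """Fraction of query words that appear in the tool description."""
    q = set(query.lower().split())
    d = set(description.lower().split())
    return len(q & d) / max(len(q), 1)

def select_tools(query: str, catalog: dict[str, str], k: int = 3) -> list[str]:
    """Rank the catalog against the query and keep only the top k tools."""
    ranked = sorted(catalog, key=lambda name: score(query, catalog[name]),
                    reverse=True)
    return ranked[:k]

catalog = {
    "wled_set_power": "turn a wled light strip on or off",
    "wled_get_state": "read the current state of a wled light",
    "pfsense_add_rule": "add a firewall rule to pfsense",
    "unifi_set_port_vlan": "assign a vlan to a unifi switch port",
}

print(select_tools("turn off the picture frame light", catalog, k=2))
```

A production version would embed the descriptions and the query with a real embedding model, but the shape is the same: rank, cut to *k*, and inject only the survivors into context.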

### Enter the Skill Layer

<p class="section-summary">A skill routes intent to workflows. It doesn't replace tools — it tells the AI which ones to use.</p>

A Skill sits above MCP. It doesn't replace tools — it routes to them. When the Network skill is invoked, it loads a compact SKILL.md (~62 lines) that contains exactly one thing MCP can't provide: an intent routing table.

<table class="data-table">
    <thead>
        <tr>
            <th>Request Pattern</th>
            <th>Route To</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Status, health, what's online</td>
            <td>Workflows/CheckStatus.md</td>
        </tr>
        <tr>
            <td>Turn on/off, brightness, color</td>
            <td>Workflows/ControlDevice.md</td>
        </tr>
        <tr>
            <td>Can't reach, offline, troubleshoot</td>
            <td>Workflows/Troubleshoot.md</td>
        </tr>
        <tr>
            <td>Add device, provision, static map</td>
            <td>Workflows/ProvisionDevice.md</td>
        </tr>
        <tr>
            <td>Firewall rule, block, allow</td>
            <td>Workflows/ManageFirewall.md</td>
        </tr>
        <tr>
            <td>DHCP, lease, ARP, find device</td>
            <td>Workflows/ManageDHCP.md</td>
        </tr>
        <tr>
            <td>Home Assistant, automation, entity</td>
            <td>Workflows/ManageHomeAssistant.md</td>
        </tr>
        <tr>
            <td>Switch port, VLAN, port profile</td>
            <td>Workflows/SwitchPort.md</td>
        </tr>
    </tbody>
</table>

Each workflow file encodes the specific MCP tool calls, their order, which calls can be parallelized, verification steps, and troubleshooting for common failures. The AI doesn't guess. It reads the recipe.
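The routing table above can be sketched as data plus a matcher. The keyword sets and the fallback route below are simplifications invented for illustration; the actual SKILL.md expresses the patterns in prose and lets the model do the matching.

```python
# The intent routing table as code: keyword sets per workflow, pick the
# workflow with the most overlap. Keyword choices here are illustrative.

ROUTES = {
    "Workflows/CheckStatus.md":         {"status", "health", "online"},
    "Workflows/ControlDevice.md":       {"on", "off", "brightness", "color"},
    "Workflows/Troubleshoot.md":        {"offline", "unreachable", "troubleshoot"},
    "Workflows/ProvisionDevice.md":     {"add", "provision", "static"},
    "Workflows/ManageFirewall.md":      {"firewall", "block", "allow"},
    "Workflows/ManageDHCP.md":          {"dhcp", "lease", "arp"},
    "Workflows/ManageHomeAssistant.md": {"automation", "entity", "assistant"},
    "Workflows/SwitchPort.md":          {"vlan", "port", "profile"},
}

def route(request: str) -> str:
    """Pick the workflow whose keyword set best overlaps the request."""
    words = set(request.lower().split())
    best = max(ROUTES, key=lambda wf: len(ROUTES[wf] & words))
    # No overlap at all: fall back to a safe read-only workflow.
    return best if ROUTES[best] & words else "Workflows/CheckStatus.md"

print(route("turn off the picture frame"))  # → Workflows/ControlDevice.md
```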

### The Context Savings Are Real

<p class="section-summary">From 4,533 bytes every time to loading only what the request actually needs.</p>

Here's the before and after of what actually loads into the context window.

**Before (tool catalog approach):**

Every invocation loaded the full SKILL.md — 4,533 bytes containing a 36-tool catalog, VLAN topology, device IPs, UniFi network IDs, and interface aliases. All of it, every time, regardless of whether you were turning off a light or provisioning a new device.

Worse: roughly 2,000 bytes of that file duplicated tool descriptions that MCP had *already injected*. Pure waste.

**After (intent routing approach):**

<table class="data-table">
    <thead>
        <tr>
            <th>Request</th>
            <th>What Loads</th>
            <th>Bytes</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>"Turn off the picture frame"</td>
            <td>SKILL.md (62 lines) + ControlDevice.md</td>
            <td>~5,100</td>
        </tr>
        <tr>
            <td>"Check network status"</td>
            <td>SKILL.md + CheckStatus.md</td>
            <td>~4,400</td>
        </tr>
        <tr>
            <td>"Add a device to IoT VLAN"</td>
            <td>SKILL.md + ProvisionDevice.md + Topology.md</td>
            <td>~7,700</td>
        </tr>
        <tr>
            <td>"Why is plug 3 offline?"</td>
            <td>SKILL.md + Troubleshoot.md + Topology.md</td>
            <td>~7,750</td>
        </tr>
    </tbody>
</table>

The topology reference (VLAN tables, device IPs, UniFi network IDs) only loads when the workflow actually needs it. A simple "turn off the lights" never touches it.

But the real win isn't bytes — it's *relevance*. The old model loaded everything and left the AI to figure out what mattered. The new model loads only what's needed and tells the AI exactly what to do with it.
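The composition rule behind those numbers is small enough to state as code. A sketch, assuming the topology flags below (which workflows pull in Topology.md is taken from the table above; everything else is illustrative):

```python
# On-demand loading: the router always loads, the matched workflow loads
# next, and the topology reference loads only when the workflow needs it.

NEEDS_TOPOLOGY = {"Workflows/ProvisionDevice.md", "Workflows/Troubleshoot.md"}

def context_files(workflow: str) -> list[str]:
    """Everything that enters the context window for one request."""
    files = ["SKILL.md", workflow]      # router + matched recipe
    if workflow in NEEDS_TOPOLOGY:
        files.append("Topology.md")     # reference data, only on demand
    return files

print(context_files("Workflows/ControlDevice.md"))
print(context_files("Workflows/ProvisionDevice.md"))
```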

### A Walkthrough: "Turn Off the Picture Frame"

<p class="section-summary">Two MCP calls. No guessing. No scanning 36 tools.</p>

Here's what happens with the new architecture:

**1. Intent Match.** Claude reads the SKILL.md routing table. "Turn off" matches the `ControlDevice.md` pattern.

**2. Workflow Load.** ControlDevice.md loads. It has a device name resolution table:

<table class="data-table">
    <thead>
        <tr>
            <th>User Says</th>
            <th>Device Type</th>
            <th>MCP Name</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>picture frame, frame</td>
            <td>WLED</td>
            <td>pictureFrame</td>
        </tr>
        <tr>
            <td>bath mirror, bathroom</td>
            <td>WLED</td>
            <td>bathMirror</td>
        </tr>
        <tr>
            <td>plug 1/2/3</td>
            <td>Kasa</td>
            <td>ep25-plug-N</td>
        </tr>
    </tbody>
</table>

**3. Execute.** The workflow says: call `wled_set_power(device_name="pictureFrame", on=false)`.

**4. Verify.** The workflow says: confirm with `wled_get_state(device_name="pictureFrame")`.

Two MCP calls. No guessing. No scanning 36 tools. No loading topology data that wasn't needed.
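The four steps collapse into very little code once the recipe exists. Here's a sketch with the two MCP calls stubbed out; the tool and device names match the walkthrough above, while the stub bodies and in-memory state are invented for illustration.

```python
# Steps 2-4 of the walkthrough: resolve the alias, execute, verify.
# wled_set_power / wled_get_state are stubs standing in for real MCP calls.

DEVICE_ALIASES = {
    "picture frame": ("WLED", "pictureFrame"),
    "frame":         ("WLED", "pictureFrame"),
    "bath mirror":   ("WLED", "bathMirror"),
}

STATE = {"pictureFrame": {"on": True}}  # stand-in for the real device

def wled_set_power(device_name: str, on: bool) -> None:
    STATE[device_name]["on"] = on       # stub for the real MCP call

def wled_get_state(device_name: str) -> dict:
    return STATE[device_name]           # stub for the real MCP call

def control(user_phrase: str, on: bool) -> bool:
    _, mcp_name = DEVICE_ALIASES[user_phrase]    # step 2: name resolution
    wled_set_power(device_name=mcp_name, on=on)  # step 3: execute
    # Step 4: verify the device actually reached the requested state.
    return wled_get_state(device_name=mcp_name)["on"] == on

print(control("picture frame", on=False))  # → True (verified off)
```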

### The Pattern

<p class="section-summary">Four layers, each loading only when the layer above needs it.</p>

This isn't specific to network management. The pattern applies anywhere an AI agent has many tools:

<table class="data-table steps">
    <thead>
        <tr><th>Layer</th><th>Role</th><th>Loading</th></tr>
    </thead>
    <tbody>
        <tr><td>1</td><td>MCP registers the tools — infrastructure</td><td>Always loaded, always available</td></tr>
        <tr><td>2</td><td>The Skill routes intent — decision layer</td><td>Loaded on invocation, maps natural language to workflow files</td></tr>
        <tr><td>3</td><td>Workflows encode sequences — domain knowledge</td><td>Loaded on demand: tool calls, parallelization, verification</td></tr>
        <tr><td>4</td><td>Reference data loads contextually</td><td>Topology, credentials, device lists — only when the active workflow needs them</td></tr>
    </tbody>
</table>

Each layer loads only when needed. Each layer contains only what the layer below doesn't provide.

### Why This Matters Now

<p class="section-summary">The numbers tell the story — orchestration beats brute force by every metric.</p>

A year ago, MCP was the answer to everything. The protocol is still essential — it solved the fundamental problem of giving AI agents access to external systems. But the ecosystem learned what happens when you give a model 50+ tool definitions and no guidance: it gets slower, less accurate, and more expensive.

<table class="data-table">
    <thead>
        <tr>
            <th>Metric</th>
            <th>Brute Force (all tools)</th>
            <th>With Orchestration Layer</th>
            <th>Source</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Tool selection accuracy</td>
            <td>42%</td>
            <td>85%</td>
            <td>Redis</td>
        </tr>
        <tr>
            <td>Token usage per request</td>
            <td>~23,000</td>
            <td>~400</td>
            <td>Redis</td>
        </tr>
        <tr>
            <td>Latency</td>
            <td>~3.4 sec</td>
            <td>~0.4 sec</td>
            <td>Redis</td>
        </tr>
        <tr>
            <td>Multi-step task success</td>
            <td>60%</td>
            <td>92%</td>
            <td>Jenova.ai</td>
        </tr>
        <tr>
            <td>Context consumed by tools</td>
            <td>33-50%</td>
            <td>2-5%</td>
            <td>GitHub Issues</td>
        </tr>
    </tbody>
</table>

Large context windows are a crutch. Yes, Claude can handle 200K tokens. But Chroma proved across 18 models that accuracy degrades 20-50% as context scales. Every token of irrelevant tool description pushes your actual task further into the "lost in the middle" dead zone.

The ecosystem is converging on a three-layer architecture: Skills provide procedural knowledge (how to do things) — cheap, always available. MCP provides external connectivity (access to things) — loaded as needed. Routing provides intelligent discovery — semantic search, not dump-everything-in-context.

Claude Code, Cursor, and GitHub Copilot all independently arrived at progressive disclosure — scoped, modular instruction files that load context based on relevance, not availability. Redis reframed tool selection as retrieval. Simon Willison recommended CLI tools over MCP servers. The pattern is the same everywhere.

MCP gives your AI hands. Skills give it recipes. And recipes — encoded, tested, versioned workflows that load on demand — are how you get from "the AI has access to 36 tools" to "the AI knows exactly which two to call and in what order."

Build the tools. Then build the layer that knows how to use them.

---

*Sources and further reading:*

<table class="data-table">
    <thead>
        <tr><th>Source</th><th>Topic</th></tr>
    </thead>
    <tbody>
        <tr><td><a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents">Anthropic</a></td><td>Effective Context Engineering for AI Agents</td></tr>
        <tr><td><a href="https://www.anthropic.com/engineering/writing-tools-for-agents">Anthropic</a></td><td>Writing Effective Tools for AI Agents</td></tr>
        <tr><td><a href="https://www.anthropic.com/engineering/advanced-tool-use">Anthropic</a></td><td>Introducing Advanced Tool Use</td></tr>
        <tr><td><a href="https://research.trychroma.com/context-rot">Chroma Research</a></td><td>Context Rot</td></tr>
        <tr><td><a href="https://redis.io/blog/from-reasoning-to-retrieval-solving-the-mcp-tool-overload-problem/">Redis</a></td><td>Solving the MCP Tool Overload Problem</td></tr>
        <tr><td><a href="https://simonwillison.net/2025/Aug/22/too-many-mcps/">Simon Willison</a></td><td>Too Many MCPs</td></tr>
        <tr><td><a href="https://eclipsesource.com/blogs/2026/01/22/mcp-context-overload/">EclipseSource</a></td><td>MCP and Context Overload</td></tr>
        <tr><td><a href="https://arxiv.org/html/2602.14878v1">Arxiv</a></td><td>MCP Tool Descriptions Are Smelly</td></tr>
        <tr><td><a href="https://waleedk.medium.com/the-evolution-of-ai-tool-use-mcp-went-sideways-8ef4b1268126">Waleed Kadous</a></td><td>MCP Went Sideways</td></tr>
    </tbody>
</table>

---

*Building AI agents with too many tools? I'm always up for comparing notes — find me on [LinkedIn](https://www.linkedin.com/in/alexmoening/).*

---

## Navigation

- [Home](/)
- [About](/about.html)
- [Projects](/projects.html)
- [Contact](/contact.html)
- [/dev/thoughts](/dev-thoughts/)

*Copyright 2026 Alex Moening. Opinions expressed are my own.*
