MCP, CLI, and Agent Skills: A Research View on Tool Use for LLM Agents
The central question is not whether one interface “wins.” The research question is sharper: which boundary should carry protocol, execution, workflow knowledge, and governance in an LLM agent system?
LLM agents fail in ways that look strange until you treat tool use as a cognitive load problem.
An agent may have the right tool available and still choose the wrong one. It may receive relevant information and still lose it inside a long prompt. It may execute a syntactically valid tool call while misunderstanding the workflow that should surround it. It may be given twenty reasonable tools and perform worse than it did with five.
Those failures are not only model failures. They are interface failures.
The Model Context Protocol, command-line interfaces, and Agent Skills each answer a different part of the tool-use problem. MCP standardizes discovery and invocation. CLI tools provide executable, inspectable operations. Skills encode procedure, examples, constraints, and recovery logic. HTTP APIs and governed gateways centralize authorization and telemetry.
The mistake is collapsing all of those jobs into a single local tool registry.
This article takes a research view of the current debate around MCP, CLI, and Agent Skills. The conclusion is deliberately bounded: MCP over stdio is often a weak default for complex local agents; MCP over HTTP remains appropriate for governed platforms; CLI + Skills is frequently the better local architecture because it separates execution from procedural knowledge and supports progressive disclosure.
Background: Tool Use Is a Multi-Stage Problem
Most descriptions of agent tooling compress the problem into one step: the model calls a tool.
In practice, tool use has at least five stages:
tool-use loop
├── decide whether a tool is needed
├── retrieve or notice candidate tools
├── select the right tool for the task
├── fill valid arguments
└── interpret the result inside a larger workflow

Tool-learning research repeatedly shows that these stages are separable. MetaTool studies whether models know when to use tools and which tools to choose. WTU-EVAL focuses on whether a tool should be used at all, instead of assuming tool use is always required. ToolRet treats tool retrieval as its own problem, with a large corpus of heterogeneous tools. ToolLLM explored training and evaluating models over thousands of real-world APIs.
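The five stages above can be sketched as a minimal control loop. Every function body here is a toy stand-in for what would really be a model call, a retriever, or an executor; the field names are illustrative.

```python
# Sketch of the five tool-use stages as a control loop.
# All logic here is a toy placeholder for model/retrieval calls.
def tool_use_loop(task, tools):
    # Stage 1: decide whether a tool is needed at all.
    if not task["needs_tool"]:
        return "answered directly"
    # Stage 2: retrieve or notice candidate tools (toy keyword match).
    candidates = [t for t in tools if t["topic"] in task["goal"]]
    # Stage 3: select the right tool among the candidates.
    tool = max(candidates, key=lambda t: t["relevance"])
    # Stage 4: fill valid arguments from the task.
    args = {"query": task["goal"]}
    # Stage 5: execute, then interpret the result in the larger workflow.
    result = tool["run"](**args)
    return f"{tool['name']} -> {result}"

demo_tools = [{"name": "get_weather", "topic": "weather",
               "relevance": 1.0, "run": lambda query: "sunny"}]
print(tool_use_loop({"needs_tool": True, "goal": "weather in Paris"}, demo_tools))
```

Each stage can fail independently, which is why the evaluations named above isolate them.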
The implication is important: an interface that only exposes callable functions solves the execution surface, not the whole tool-use problem.
An agent needs to know what exists, what matters now, what order to follow, what a failure means, and when not to call anything. That is why tool metadata, context layout, command discoverability, and workflow instructions are not implementation details. They are part of the agent’s reasoning environment.
MCP: A Standard Interface, Not a Complete Architecture
MCP matters because it standardizes a real integration boundary. The protocol gives clients a common way to discover capabilities, call tools, read resources, and interact with external systems. The official specification defines transports including stdio and Streamable HTTP, and the broader MCP effort has become a shared vocabulary for agent-tool connectivity.
That standardization is valuable. Before shared protocols, every tool integration could become a custom bridge: one agent plugin for one app, one connector shape for one model client, one hand-written adapter per workflow.
But MCP is not a complete agent architecture. It defines how a client and server communicate. It does not automatically solve tool retrieval, prompt budget, workflow sequencing, human debuggability, or the question of whether the tool surface should be loaded into the model context at all.
The distinction between stdio and HTTP is especially important.
MCP over stdio is convenient for local integrations. A client launches a subprocess and exchanges JSON-RPC messages over stdin/stdout. That convenience makes local experimentation easy, but it also creates a fragile process boundary: stdout must stay protocol-clean, logs belong elsewhere, environment variables become the credential channel, and debugging spans client logs, server logs, schema interpretation, and model behavior.
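To make that fragility concrete: the MCP stdio transport frames JSON-RPC messages as newline-delimited JSON, so a single stray print to stdout corrupts the stream. A minimal sketch of the message shape (the `tools/list` method is from the MCP spec; the framing helpers are illustrative):

```python
import json

def frame(msg: dict) -> bytes:
    # MCP stdio frames are newline-delimited JSON-RPC messages,
    # which is why stdout must stay protocol-clean and logs go to stderr.
    return (json.dumps(msg) + "\n").encode()

def parse(line: bytes) -> dict:
    return json.loads(line.decode())

# A tools/list request as it would cross the subprocess boundary.
request = {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
wire = frame(request)
print(parse(wire)["method"])
```

The round trip only works if nothing else ever writes to the same stream, which is exactly the discipline that makes stdio servers fragile to debug.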
MCP over HTTP is a better fit for governed platforms. HTTP-based deployments can sit behind OAuth, policy enforcement, service identity, audit logging, telemetry, and centralized lifecycle management. In that setting, the protocol boundary is doing real organizational work.
Context Is Not Free
The strongest technical objection to large local tool registries is context load.
Tool schemas, descriptions, examples, and parameter definitions are not neutral. Once they enter the prompt, they compete with the user task, code context, retrieved evidence, previous decisions, and safety constraints. More context does not automatically mean more capability.
Long-context research has been warning about this for years. In Lost in the Middle, models used information less reliably when relevant evidence appeared in the middle of long contexts. Large Language Models Can Be Easily Distracted by Irrelevant Context showed that irrelevant information can reduce problem-solving accuracy. More recent work argues that context length alone can hurt performance: even when the relevant information is retrieved perfectly, longer inputs degrade accuracy across tasks.
The same logic applies to tool definitions. If an agent sees dozens or hundreds of tool affordances that do not matter for the current task, the interface has created a distraction problem before the model has begun reasoning.
Microsoft Research describes this as tool-space interference: adding individually reasonable tools can reduce end-to-end agent performance when the combined tool space becomes harder to navigate. This is not an argument against tools. It is an argument for tool-space design.
CLI + Skills as Progressive Disclosure
CLI tools and Agent Skills are not replacements for protocols in every setting. They are a different decomposition of the local agent problem.
The CLI is the execution substrate. It gives humans and agents the same operational surface: subcommands, help text, flags, exit codes, structured output, authentication, and shell-level composability. A good CLI can be used in a terminal, a script, CI, documentation, and an agent session without changing the underlying interface.
Skills are the procedural layer. Anthropic’s Agent Skills engineering post describes a model of progressive disclosure: metadata is available first, full instructions are loaded only when relevant, and supporting references or scripts are consulted as needed.
That architecture matters because it keeps most procedural knowledge out of the base prompt. The agent starts with an index, not a full manual.
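A minimal sketch of that index-first layout, with a hypothetical skill name and path:

```python
# Sketch of progressive disclosure. Only the name and a one-line trigger
# description sit in the base context; full instructions load on demand.
# The skill name and path are hypothetical.
SKILLS = {
    "safe-deploy": {
        "description": "Use when deploying or rolling back the service.",
        "path": "skills/safe-deploy/SKILL.md",
    },
}

def base_context() -> list[str]:
    # The agent starts with an index, not a full manual.
    return [f"{name}: {s['description']}" for name, s in SKILLS.items()]

def load_skill(name: str) -> str:
    # In practice this would read SKILL.md from disk; stubbed here.
    return f"(full instructions loaded from {SKILLS[name]['path']})"

print(base_context())
```

The base prompt pays for one line per skill; the full procedure costs context only in the sessions that actually need it.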
This pattern also aligns with a basic research lesson from tool use: tool selection and tool execution are different problems. A CLI can be excellent at execution while a Skill is excellent at helping the model select, sequence, and interpret operations.
For example, a CLI can expose:
- status for current state
- list for inventory
- create for provisioning
- delete for destructive operations
- --json for structured output
- --help for on-demand discovery
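A toy CLI with that shape, sketched with argparse (the program and subcommand names are illustrative):

```python
import argparse
import json

def build_parser() -> argparse.ArgumentParser:
    # Subcommands, help text, and flags give humans and agents
    # the same discoverable surface.
    p = argparse.ArgumentParser(prog="svc")
    sub = p.add_subparsers(dest="cmd", required=True)
    status = sub.add_parser("status", help="current state")
    status.add_argument("--json", action="store_true", help="structured output")
    sub.add_parser("list", help="inventory")
    return p

def main(argv: list[str]) -> int:
    args = build_parser().parse_args(argv)
    if args.cmd == "status":
        state = {"healthy": True}
        print(json.dumps(state) if args.json else f"healthy: {state['healthy']}")
    elif args.cmd == "list":
        print("item-1\nitem-2")
    return 0  # exit codes let scripts, CI, and agents detect failure uniformly

if __name__ == "__main__":
    raise SystemExit(main(["status", "--json"]))
```

The same interface works in a terminal, a shell script, and an agent session, with --help serving as on-demand discovery instead of always-loaded schema.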
A Skill can encode:
- inspect before mutating
- prefer dry-run modes where available
- ask before destructive operations
- retry only on specific transient failures
- summarize command output instead of pasting everything
- stop when credentials or permissions are missing
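Rules like these normally live as prose in the Skill, but some can also be enforced mechanically. A toy pre-flight check, with hypothetical verb lists:

```python
# Toy guardrails mirroring the Skill rules above. The verb and error
# lists are hypothetical; a real Skill states these as instructions.
DESTRUCTIVE = {"delete", "drop", "revoke"}
TRANSIENT = {"timeout", "connection reset"}

def preflight(command: str, user_confirmed: bool) -> str:
    verb = command.split()[0]
    if verb in DESTRUCTIVE and not user_confirmed:
        return "ask"  # ask before destructive operations
    return "run"

def should_retry(error: str) -> bool:
    # retry only on specific transient failures
    return error.lower() in TRANSIENT

print(preflight("delete cluster-a", user_confirmed=False))
```

The division of labor stays clean: the Skill teaches the policy, and thin checks like these can back it up deterministically.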
MCP can expose similar capabilities, but over stdio they tend to become passive function calls with no procedural framing. CLI + Skills makes the separation explicit.
Where MCP Still Belongs
A research-oriented view should avoid replacing one universal answer with another.
MCP over HTTP remains a strong candidate when the system needs centralized governance. Large organizations often need consistent authentication, authorization, audit logs, usage telemetry, policy enforcement, and shared service ownership. Fragmented local CLI tools do not naturally provide those guarantees.
MCP over stdio remains reasonable for small local bridges with narrow tool surfaces. If a tool has two or three stable actions, no complex workflow, and no need for shared human execution, the overhead may be acceptable.
The weak default is not MCP itself. The weak default is using local MCP as a universal wrapper around every command, service, document source, database helper, browser operation, and internal workflow.
Design Principles for Agent Tool Interfaces
The practical principles are straightforward.
Minimize always-loaded context. Put only indexes, names, and trigger descriptions into the base context. Load full procedures and reference material only when needed.
Separate execution from instruction. Execution surfaces should be deterministic, scriptable, and inspectable. Procedural guidance should live in Skills, docs, policies, or task-specific prompts.
Preserve human debuggability. If an agent ran a command, a human should be able to rerun it, inspect output, and understand why it was selected.
Treat tool retrieval as a first-class problem. Large tool corpora require retrieval, routing, ranking, or scoping before tool selection. Dumping every possible tool into the model context is a brittle baseline.
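As a sketch of scoping before selection, here is a toy lexical ranker; a real system would use embeddings or a trained router, and the registry entries are illustrative:

```python
def scope_tools(task: str, registry: list[dict], k: int = 3) -> list[dict]:
    # Toy lexical ranker: score each tool by word overlap between its
    # description and the task, and expose only the top-k to the model.
    task_words = set(task.lower().split())

    def score(tool: dict) -> int:
        return len(task_words & set(tool["description"].lower().split()))

    return sorted(registry, key=score, reverse=True)[:k]

registry = [
    {"name": "db_query", "description": "query the production database"},
    {"name": "send_email", "description": "send an email to a user"},
    {"name": "deploy", "description": "deploy the service"},
    {"name": "weather", "description": "get the current weather"},
]
top = scope_tools("query the database for user emails", registry, k=2)
print([t["name"] for t in top])
```

Even this crude baseline keeps the irrelevant tools out of the prompt; the principle is the scoping step itself, not the scoring method.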
Choose transport by governance boundary. Stdio is a local process convenience. HTTP is a platform boundary. CLI is an execution boundary. Skills are a workflow boundary.
Conclusion
The debate around MCP, CLI, and Agent Skills is really a debate about context management.
LLM agents do not become reliable just because they can call more tools. They become reliable when the tool space is scoped, relevant information is loaded at the right time, workflow knowledge is explicit, and execution can be inspected.
For local developer agents, CLI + Skills often gives the better decomposition: the CLI does the work, the Skill teaches the workflow, and the agent keeps most irrelevant detail out of context until it is needed. For enterprise platforms, MCP over HTTP still has a strong role because governance is itself part of the system.
The end state is not “everything becomes an MCP server.” It is a layered agent architecture where each boundary is chosen for the job it actually performs.
Sources: MCP transport and authorization specifications; Anthropic’s MCP donation announcement and Agent Skills engineering note; Microsoft Research on tool-space interference; Liu et al. on lost-in-the-middle long-context behavior; Shi et al. on irrelevant-context distraction; Amazon Science/arXiv work on context length and performance; MetaTool, WTU-EVAL, ToolRet, ToolLLM, MCP-Atlas, and survey literature on LLM tool learning.