Custom AI Agents

Part 4 — Tools and MCP

How do agents interact with the real world — and what security problem came with the answer?

8 min · Updated June 2026

A model that can only talk is a curiosity. A model that can query your claims database, file a ticket, send an email, or read a spreadsheet is a worker. Tools are how that happens, and in 2026 the way tools are integrated has been substantially standardised by the Model Context Protocol (MCP).

Q4.1 — What is MCP and why did it take over?

MCP is an open protocol — originally from Anthropic, donated to the Linux Foundation in December 2025 and now co-stewarded by Anthropic, OpenAI, Google, Microsoft, AWS, and Cloudflare — that standardises how an agent connects to tools and data sources.

The mental model people reach for is “USB-C for AI agents”: instead of writing a bespoke integration for every tool-and-model combination, you build an MCP server once (wrapping your database, your API, your file store) and any MCP-compatible client — any agent, any IDE — can use it.

Adoption has been fast. By spring 2026 the protocol is supported natively across every major lab and IDE, there are many thousands of public MCP servers in registries, and enterprise adoption is well into majority territory. For a vertical agent builder, MCP gives you portability, discoverability through registries, and vendor independence.

Q4.2 — When should I use MCP vs. plain function calling?

You do not have to use MCP. The older approach — defining tools directly in your code as functions the model can call — is still completely valid and is often better for tools that live inside your own application.

Function calling wins on performance (no network hop, no protocol overhead) and on fine-grained control. Best for tools tightly coupled to your app.
MCP wins on portability, reuse across teams and clients, and integration with third-party tools. Best as the boundary between your agent and shared or external systems.

A reasonable rule: internal, tightly-coupled tools as native functions; anything shared, external, or reused across multiple agents as MCP servers.

Q4.3 — How do I design tools agents can actually use?

Two patterns dominate good tool design in 2026, both driven by the same problem — too many tools and too much tool output drowning the context.

Tool examples beat tool schemas. A JSON schema tells the model the shape of a tool’s arguments but not how to use it well. Adding a few concrete usage examples has been shown to lift parameter-handling accuracy substantially — one report cited an improvement from around 72% to around 90% on complex parameters. Treat tool definitions like documentation for a junior colleague: show, don’t just specify.

Let the agent call tools as code, not as round-trips. Instead of the classic loop — model emits one tool call, waits for the result, emits the next — newer approaches let the model write a small program that orchestrates many tools at once in a sandbox:

Anthropic’s Programmatic Tool Calling (generally available with Sonnet 4.6 as of February 2026): the model writes Python that runs in a managed container, calling tools as functions and only surfacing the final result to its context. Reported token reductions of around 37% on multi-tool workflows.
Cloudflare’s Code Mode: generate code-level interfaces from MCP tool schemas and let the model write JavaScript against them in a sandboxed isolate. On Cloudflare’s own API this collapsed the token cost from over a million tokens to around a thousand — a roughly 99.9% reduction.

For tool-heavy agents, generating code that calls tools is dramatically more efficient than chatting one tool call at a time. Once your agent crosses roughly twenty tools, this stops being optional.

There is also parallel tool calling— having the agent fire several independent tool calls at once. It is faster for broad search-style tasks, but be aware that the parallelism can burn on the order of 15× the tokens of a single conversation. Use it where the task value justifies the spend, not reflexively.

Q4.4 — What are Agent Skills and when do they matter?

An Agent Skill is an open standard where a capability is packaged as a folder containing a SKILL.md instructions file plus any scripts and resources. Only a few dozen summary tokens load into context until the agent actually needs the skill, at which point the full detail loads.

It is a clean way to give an agent deep, domain-specific procedures — how your firm drafts a particular contract type, how your hospital codes a particular encounter — without permanently bloating its context. Think of it as progressive disclosure applied to expertise.

Q4.5 — What is the MCP security problem, and what is the defensive posture?

The rapid, sprawling adoption of MCP has created a real and active security problem. If you are building in a regulated vertical, you cannot treat this as a footnote.

Through late 2025 and into 2026, security researchers documented a steady stream of serious vulnerabilities: an architectural flaw in the official SDKs exposing large numbers of servers to remote code execution; server-side request forgery flaws; DNS-rebinding issues in official SDKs; and at least one real supply-chain attack where a backdoored server was published to a public registry.

The OWASP Top 10 for Agentic Applications (released late 2025) names agent goal hijacking — manipulating an agent into pursuing an attacker’s objective, often via prompt injection delivered through tool output or retrieved content — as the top risk class.

The defensive posture that has emerged:

Never expose raw MCP servers directly to the model client. Front everything with an MCP gateway or portal that provides single sign-on, per-tool access curation, audit logging, and data-loss-prevention scanning. The point is a controlled front door.
Use OAuth, not static API keys, and watch the protocol’s move toward short-lived, federated workload identities.
Treat locally-installed, unvetted MCP servers as a liability, not a convenience. Pin versions, track the CVE feed, patch promptly.
Sandbox anything that executes. Run tools in an isolated worker with no host filesystem access by default and tight egress controls — never in your main application process. WebAssembly-based per-call sandboxing is emerging as a strong answer here.
Guard the inputs to tools and the outputs from retrieval, because that is where prompt injection rides in.

An agent with tools is an agent with an attack surface. Design for that from day one.

Q4.6 — What about computer use and browser automation?

Some business workflows have no API — legacy ERPs, insurance claims systems, government portals. For these, agents can now drive a screen directly via Anthropic’s Computer Use tool or Playwright-based MCP servers. This is powerful, but it dramatically amplifies the prompt-injection surface — every page the agent reads is untrusted input — so it demands the strictest guardrails and human checkpoints.