For over a decade, we’ve been designing APIs (REST, GraphQL, gRPC) with two consumers in mind: frontend apps (web/mobile) and the developers integrating them. We wrote pretty Swagger docs, thought hard about pagination, and shaped JSON responses so UI components could render them easily.
Something is shifting, though. And it’s moving fast.
A new kind of consumer
With the rise of AI Agents, LLMs, and specifications like the Model Context Protocol (MCP), our APIs now have a new type of client that works very differently: machines that can reason.
Anthropic open-sourced MCP in November 2024 as an open standard for connecting AI assistants to external systems. Think of it like USB-C for AI applications: instead of writing custom integrations for every data source, you build against a single protocol. MCP follows a client-server architecture where a host application connects to multiple servers, each exposing specific capabilities (tools, resources, prompts) through a standardized interface. In 2025 the protocol joined the Linux Foundation’s Agentic AI Foundation (AAIF), and by late 2025 it had become the de facto standard for tool and data access in agent-style LLM systems.
AI Agents don’t care whether your JSON is consistently formatted or whether you set proper caching headers. They don’t need a pretty UI. What they need is context, discoverability, and clear schema descriptions (like JSON Schema) so they can perform tool calling correctly.
Tool calling (also known as function calling) is how LLMs interface with external systems. OpenAI, Anthropic, Google, and others have all converged on a similar pattern: you describe available functions using JSON Schema, the model decides when to call them based on user intent, and your application executes the actual function with the model’s suggested arguments. The model never runs the function itself. It just returns structured data saying “call this function with these arguments,” and your code does the rest.
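That loop (a JSON Schema tool definition, the model’s structured suggestion, and application-side dispatch) can be sketched in a few lines. This is a minimal illustration, not any provider’s SDK; the `get_weather` tool and the shape of `model_output` are invented for the example, though the schema itself follows the JSON Schema convention the providers share:

```python
import json

# Hypothetical tool definition in the JSON Schema shape that
# OpenAI- and Anthropic-style APIs expect: name, description,
# and typed parameters with a required list.
GET_WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Return the current temperature for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Berlin'"},
        },
        "required": ["city"],
    },
}

def get_weather(city: str) -> dict:
    # Stand-in for a real weather lookup.
    return {"city": city, "temp_c": 21}

# The model never executes anything. It returns structured data
# like this, and the application runs the matching function.
model_output = {"tool": "get_weather", "arguments": json.dumps({"city": "Berlin"})}

def dispatch(call: dict) -> dict:
    registry = {"get_weather": get_weather}
    fn = registry[call["tool"]]
    return fn(**json.loads(call["arguments"]))

result = dispatch(model_output)
```

The key point lives in `dispatch`: the model’s output is just data, and your code stays in control of what actually executes.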
What is Agentic Architecture?
Agentic Architecture is a shift in how we design backends. Here’s what that looks like in practice:
- **Self-describing endpoints.** Endpoints don’t just return data; they also return metadata about what the client can do next. HATEOAS (Hypermedia as the Engine of Application State) is becoming relevant again, not for browsers, but for Agents trying to figure out their next step. This isn’t just theory. Darrel Miller, partner API architect at Microsoft, put it plainly: “Hypermedia is an effective way of accumulating the results of past choices and constraining the potential useful tools available for the next interaction.” When an API response includes a `_links` array with available actions, the Agent doesn’t need to guess or hallucinate. If the `pay` action disappears from the response because an invoice is already paid, the Agent physically cannot attempt that action. Mike Amundsen built a framework called GRAIL (Goal-Resolution through Affordance-Informed Logic) that demonstrates this: agents discover operations at runtime, try things, fail gracefully, learn what’s needed, and continue. No prior API knowledge required.
- **Context-heavy payloads.** When returning errors, a message like `"400 Bad Request: Invalid User ID"` is no longer sufficient. We need messages that help AI Agents fix their own mistakes: `"User ID must be a UUID v4. You passed an integer. Please call the /search/user endpoint first to resolve the UUID."` Roni Dover from Digma documented this pattern after experimenting with MCP tools: when his API returned an empty array, the agent just gave up. When he added suggested next steps to the response (“Try searching for endpoints that use this function, or suggest manual instrumentation”), the agent immediately started pulling on new threads and produced useful results. API responses, when consumed by LLMs, are essentially a reverse prompt. An ended interaction is a dead end. Any data you return gives the agent a chance to continue exploring.
- **Async by default.** AI Agents need time to reason, so APIs must be friendlier toward long-running operations surfaced via webhooks or polling. This is already on MCP’s roadmap: the protocol’s next release adds async support so servers can kick off long-running tasks while clients check back later for results. OpenAI added background mode to their Responses API for the same reason: long-running responses without holding a client connection open. Building agents went from “send request, get response” to proper event-driven system design.
- **Tool registries (MCP).** Servers don’t just serve business data. They expose themselves as dynamically registered tools so language models can “use” the system like a plugin. MCP’s roadmap includes server identity via `.well-known` URLs, so clients can discover what a server can do without connecting first, and an MCP Registry (which launched in preview in September 2025) for community-driven server discovery.
- **Machine-readable documentation.** The `llms.txt` standard, proposed by Jeremy Howard in September 2024, exists because AI agents have finite context windows and your docs site isn’t designed for machine consumption. It’s a Markdown file at `/llms.txt` that curates your most important documentation pages. Stripe’s implementation includes an instructions section telling AI agents which APIs to prefer and which deprecated patterns to avoid. Every time a developer asks an agent to integrate your API, those instructions shape the answer.
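A minimal `llms.txt` follows the shape Howard’s proposal describes: an H1 project name, a blockquote summary, and H2 sections of annotated links. The domain and the Stripe-style instructions section below are invented for illustration:

```markdown
# Acme Payments

> API for creating and managing payments.

## Docs

- [Quickstart](https://docs.acme.example/quickstart.md): create your first charge
- [API reference](https://docs.acme.example/api.md): all endpoints and schemas

## Instructions

- Prefer the v2 /payments endpoints; the v1 /charges endpoints are deprecated.
```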
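The hypermedia idea above fits in a few lines of code. Here is a sketch of a hypothetical invoice resource (the `invoice_response` helper and `/invoices` paths are invented for illustration) whose `_links` array only advertises actions that are currently legal:

```python
def invoice_response(invoice: dict) -> dict:
    """Build a hypothetical invoice payload whose _links list only
    the actions that are valid in the invoice's current state."""
    links = [{"rel": "self", "href": f"/invoices/{invoice['id']}", "method": "GET"}]
    if invoice["status"] == "open":
        # 'pay' is only advertised while the invoice is still payable.
        links.append({"rel": "pay", "href": f"/invoices/{invoice['id']}/pay", "method": "POST"})
    return {"data": invoice, "_links": links}

open_inv = invoice_response({"id": "inv_1", "status": "open"})
paid_inv = invoice_response({"id": "inv_2", "status": "paid"})
```

Because `pay` simply isn’t present in the paid invoice’s response, an agent reading `_links` cannot hallucinate that action into existence.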
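An agent-friendly error body for the UUID example might look like the sketch below. The field names (`code`, `suggested_next_steps`) are invented for illustration, not a standard; the point is that the payload carries its own recovery instructions:

```python
def invalid_user_id_error(received) -> dict:
    """Hypothetical error body that tells the agent how to recover,
    not just that it failed."""
    return {
        "status": 400,
        "code": "INVALID_USER_ID",
        "message": "User ID must be a UUID v4.",
        "received": str(received),
        # The "reverse prompt": concrete next steps the agent can act on.
        "suggested_next_steps": [
            "Call GET /search/user?name=<name> to resolve the user's UUID first.",
        ],
    }
```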
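The async-by-default pattern reduces to: accept the work, return a handle immediately, let the client poll. A minimal in-memory sketch, assuming a hypothetical `/jobs` polling path and job store (no particular framework):

```python
import uuid

JOBS: dict[str, dict] = {}  # in-memory stand-in for a real job store

def start_report(params: dict) -> dict:
    """Accept the request and immediately return 202 with a polling URL."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "running", "result": None}
    return {"status_code": 202, "poll": f"/jobs/{job_id}"}

def finish(job_id: str, result: dict) -> None:
    """Called by the worker when the long-running task completes."""
    JOBS[job_id] = {"status": "done", "result": result}

def poll(job_id: str) -> dict:
    """What the agent sees when it checks back on the job."""
    return JOBS[job_id]
```

A webhook variant is the same idea with the server pushing `finish`’s payload to a callback URL instead of waiting to be polled.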
The Agent Experience (AX)
Apideck coined the term “Agent Experience” (AX), the same discipline as Developer Experience (DX) applied to autonomous consumers. The gap matters: ambiguity that a developer resolves through experience becomes a failure mode at scale when agents are writing the integrations.
The 2025 article “Designing AI-Ready Java APIs” by Markus Eisele outlines nine principles for agent-grade APIs. Some that stood out to me:
- **Explicit discoverability over convention.** If your API requires knowledge of a secret naming pattern or framework magic, an AI agent will miss it. Favor clarity and self-description over implicit conventions.
- **Rich and actionable error context.** The best errors read like a guide. “Missing required field `first_name`” directly tells the agent what to add. “Currently in read-only mode; call `enableWrite()` to modify data” turns the error into a to-do list.
- **Performance transparency.** If your API has expensive operations, expose that. Provide pagination options, batch endpoints, and document default vs. max values for expensive parameters.
As Eisele put it: by crafting your error text well, you’re essentially scripting the agent’s next action.
Preparing for the new era
We’re slowly moving from “API for UI” to “API for Agents.”
As engineers, our job is no longer just moving data from a database to a user’s screen. We’re building the plumbing that LLMs will call autonomously, and that plumbing needs to be self-explanatory. Google already published a guide to agentic design patterns covering single-agent, multi-agent coordinator, swarm, and custom logic patterns. OpenAI’s 2025 developer recap explicitly calls 2025 “the year of agent-native APIs.”
The APIs we build today will increasingly be consumed by code that decides on its own what to call and when. Worth keeping that in mind next time you write an error message.