The Exoskeleton of Agents: Beyond the Model, What Determines the Ceiling of AI
Author: Ding Zhiyu NeverGpDzy | Research Date: 2026-04-24
Research Subject: AI Agent Engineering | Subject Type: Technical Paradigm / Engineering Architecture
1. One-Sentence Definition
Agent Harness is not a new model, nor is it simply wrapping an LLM with a tool-calling interface. It is a complete execution exoskeleton built around the model: planning, state, memory, file systems, tools, permissions, subtasks, evaluation, rollback, sandboxing, and the interaction rules between humans and models are all embedded within this exoskeleton.
LangChain stated it plainly in its March 2026 article: the best Agents today are not merely those with stronger models, but those with more mature harnesses outside the model. The model provides intelligence; the harness determines whether that intelligence can be reliably delivered into real-world tasks.
This statement may look like an engineering detail.
But it is actually the most important perspective shift in the Agent space in 2026.
In the past, the question was which model is smarter. Now, more and more people actually building applications are beginning to ask: what kind of execution system has the smart model been placed into?
Once this question is asked, the competitive landscape of Agents changes.
2. Vertical Analysis: From "Models Can Think" to "Systems Can Act"
2.1 Early Agents Were Essentially Prompt Protocols
To understand the harness, you cannot start with LangChain's 2026 article.
You need to step back a bit.
Around 2022, the most exciting capability of large language models was not that they could call APIs or manage file systems, but that they could begin to write out their reasoning processes. Chain-of-Thought prompting enabled models to generate intermediate reasoning steps before producing an answer. ReAct further placed reasoning and action in the same loop: the model thinks a step, decides which tool to call, observes the result, and continues thinking.
The ReAct lineage was critical.
Because it clarified the minimal form of an Agent. An Agent does not merely answer questions; it should be able to act within an environment. The environment at that time might only be a search engine, Wikipedia, a calculator, or a simple task environment. But once actions emerged, the Agent was no longer a pure text generator.
It began to require an external loop.
This external loop was initially quite rudimentary. The prompt specified several formats: Thought, Action, Observation. The model output according to the format, the external program parsed out the Action, called the tool, injected the result back into the context, and let the model continue. By today's standards, this system looks primitive, but it had already planted the seeds of the harness.
The model was not running on its own.
There was an external program deciding when to feed tool results back, when to stop, when to consider the task complete, and when to let the model retry.
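That external loop is small enough to sketch. The following is an illustrative skeleton, not the code of any real framework: `call_llm`, the toy tool registry, and the `Action: tool[input]` line format are all assumptions made for the example.

```python
import re

# Hypothetical tool registry -- in a real system these would call real APIs.
TOOLS = {
    "calculator": lambda expr: str(eval(expr)),  # toy example only, never eval untrusted input
    "search": lambda q: f"(stub result for '{q}')",
}

def react_loop(call_llm, question, max_steps=8):
    """Minimal ReAct-style outer loop: parse Action lines out of the model's
    text, run the tool, inject the Observation back, stop on a Final Answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = call_llm(transcript)           # model emits Thought/Action text
        transcript += reply + "\n"
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        m = re.search(r"Action:\s*(\w+)\[(.*?)\]", reply)
        if m:
            name, arg = m.group(1), m.group(2)
            obs = TOOLS.get(name, lambda _: "unknown tool")(arg)
            transcript += f"Observation: {obs}\n"  # feed the result back
    return None  # step budget exhausted: the outer program decides what happens next
```

Everything interesting here happens outside the model: the loop decides when to call tools, when to feed results back, and when to stop. That is the seed of the harness.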
Toolformer addressed another question during the same period: can models learn on their own when to call tools? Its focus was on model capabilities, but the implications for engineering systems were direct. Once models begin relying on external tools, tool calling ceases to be an ancillary feature and becomes infrastructure for models to complete complex tasks.
So from 2022 to early 2023, the core vision of an Agent was this: the model handles reasoning, and external tools handle what the model cannot do.
At this point, the harness did not yet have a name.
It was just the glue code that held the model, tools, and context together.
2.2 The AutoGPT Moment: Exposing Agent Problems in Full View
In 2023, projects like AutoGPT brought Agents into the public eye.
Many people saw for the first time that a model could decompose tasks on its own, call search engines, write files, and iterate further. Their immediate reaction was, "Is this thing going to do the work by itself?" The Agent narrative during that period ran very hot, perhaps even overheated. People began imagining dropping in a goal and having the model complete an entire project autonomously.
But problems surfaced quickly.
The model would get lost.
It would fall into loops, forget earlier constraints, misuse tools, and break a simple task into a pile of unnecessary steps. It appeared to be busy the entire time, but the final output was often hollow.
This was a highly educational moment for the Agent space.
It told everyone that merely letting the model "think about the next step" is not enough. A truly usable Agent system needs task state, plan structures, checkpoints, tool permissions, failure recovery, and observability. Otherwise, what you see is a machine working very hard, but without a steering wheel, brakes, or a dashboard.
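The "brakes" in that sentence are the easiest part to make concrete. A minimal sketch of a run guard, with thresholds that are arbitrary assumptions rather than tuned values, might track a hard step budget plus repeated actions, since an Agent re-issuing the same action is the classic symptom of a loop:

```python
from collections import Counter

class RunGuard:
    """Illustrative safety rail for an agent loop: a hard step budget plus
    detection of the same action repeating, which usually signals a loop.
    The thresholds here are arbitrary example values."""
    def __init__(self, max_steps=25, max_repeats=3):
        self.max_steps = max_steps
        self.max_repeats = max_repeats
        self.steps = 0
        self.action_counts = Counter()

    def check(self, action_signature):
        """Return None to continue, or a reason string to halt the agent."""
        self.steps += 1
        self.action_counts[action_signature] += 1
        if self.steps > self.max_steps:
            return "step budget exhausted"
        if self.action_counts[action_signature] > self.max_repeats:
            return f"action repeated too often: {action_signature}"
        return None
```

A real harness adds far more (timeouts, cost budgets, semantic progress checks), but even this much is more than the 2023-era loops had.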
This is also why frameworks like LangChain gained rapid early popularity. LangChain was born in the second half of 2022. Its initial value was straightforward: encapsulate LLMs, prompts, tools, chained calls, and retrieval so that developers could build applications faster. It did not start out being called a harness, but it was solving exactly the early problems of the harness.
How to connect the model to tools.
How to organize multi-step calls.
How to chain context, retrieval, and output into a workflow.
It was just that the industry in 2023 had not yet fully realized that the truly difficult part was not connecting these capabilities together, but making them work reliably over the long term.
2.3 In 2024, the Industry Retreated from "Agent Magic" to "Workflow Engineering"
2024 was a fascinating inflection point.
The previous year, the market loved talking about autonomous agents. By 2024, more and more developers began to acknowledge that fully unleashed Agents were not very useful. What could actually be deployed tended to be more controllable workflows.
Anthropic later drew a similar distinction in Building Effective Agents: a workflow orchestrates models and tools through predefined code paths, while an agent lets the model dynamically decide the process. This distinction is important. It is not a rejection of Agents, but a reminder that higher model freedom is not always better.
It was in this context that LangGraph became important.
LangGraph's positioning was not to give developers another flashy prompt template, but to provide Agent applications with a controllable graph runtime. State can be saved, flows can branch, nodes can retry, humans can intervene, and execution can be persisted. You can place the model at certain nodes in the graph and let it make judgments, but the entire system is not a cloud of freely floating text.
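The shape of such a graph runtime can be sketched in miniature. To be clear, this is not the LangGraph API, only the idea it embodies: named nodes transform a shared state, a router decides which edge to follow, and every step is checkpointed so a run can be inspected, resumed, or rolled back. All names here are invented for the example.

```python
import copy

class MiniGraph:
    """Toy state-graph runtime: nodes transform a state dict, a router picks
    the next node, and every step is checkpointed. Illustrative only -- not
    any real framework's API."""
    def __init__(self):
        self.nodes = {}
        self.checkpoints = []   # (node_name, state_snapshot) pairs

    def add_node(self, name, fn):
        self.nodes[name] = fn

    def run(self, start, state, router, max_steps=20):
        current = start
        for _ in range(max_steps):
            self.checkpoints.append((current, copy.deepcopy(state)))
            state = self.nodes[current](state)
            current = router(current, state)   # edges are decided here
            if current is None:                # terminal node reached
                return state
        raise RuntimeError("step budget exceeded")
```

A model can sit inside any node or inside the router, but the state, the retries, and the checkpoints belong to the system, not to the model's context window. That is the "controllable" part.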
This is the process by which the harness began to evolve from "glue code" into a "runtime."
During the same period, Microsoft AutoGen brought multi-Agent dialogue and collaboration to the forefront. Semantic Kernel emphasized plugins, planners, and the agent framework in enterprise applications. CrewAI packaged agents, tasks, crews, and flows into a more intuitive work automation model. LlamaIndex, starting from data and retrieval, placed Agents into knowledge work and RAG scenarios.
These trajectories look different.
But they are all answering the same question: if an LLM is going to work over the long term, how much engineering structure does it need on the outside?
The answer is becoming clearer.
A lot.
2.4 After MCP: The Tool Layer Begins to Standardize
In November 2024, Anthropic released the Model Context Protocol (MCP). The pitch was simple: it is an open standard for connecting AI assistants to data sources and tool systems.
But the change behind it was not simple.
Before MCP, every Agent framework could have its own tool protocol. You wrote one set of tools for LangChain, another for Claude, and yet another for Cursor or a local automation system. This works in the short term but becomes fragmented over time. As tools proliferate, integration costs become a real bottleneck.
MCP pushed things a step toward standardization.
This is critical for the harness. Because the harness's value lies not only in invoking the model, but also in managing the world the model can touch. File systems, databases, browsers, GitHub, Slack, internal knowledge bases, cloud services -- these are all the Agent's hands and feet.
If the tool layer lacks standards, the harness gets bogged down by integration costs.
If the tool layer begins to standardize, the competitive focus of the harness shifts upward: who manages state better, who plans tasks better, who protects permissions better, who turns long-term memory into usable assets.
This is the significance of protocols like MCP. They do not directly replace the harness; they are paving the road for it.
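What "standardizing the tool layer" means in practice is that every tool self-describes. MCP tool listings carry a name, a description, and a JSON Schema for the tool's input; the sketch below borrows that published shape but is a deliberate simplification, not a compliant implementation, and `read_file_tool` plus the tiny validator are invented for illustration (a real host would use a full JSON Schema library).

```python
# A tool described the MCP way: a name, a description, and a JSON Schema
# for its input. Simplified sketch -- not a compliant MCP implementation.
read_file_tool = {
    "name": "read_file",
    "description": "Read a UTF-8 text file from the workspace.",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

def validate_call(descriptor, arguments):
    """Check a call against the tool's declared schema. Only required keys
    and primitive types are checked here, for brevity."""
    schema = descriptor["inputSchema"]
    for key in schema.get("required", []):
        if key not in arguments:
            return f"missing required argument: {key}"
    types = {"string": str, "number": (int, float), "boolean": bool}
    for key, val in arguments.items():
        spec = schema["properties"].get(key)
        if spec and not isinstance(val, types.get(spec["type"], object)):
            return f"wrong type for {key}"
    return None  # valid
```

The point of the standard is that this descriptor, not framework-specific glue code, is the contract: any harness that speaks the protocol can discover the tool and validate calls against it.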
2.5 Coding Agents Made the Value of the Harness Impossible to Ignore
What truly gave the word "harness" its weight was coding Agents.
The reason is simple. Writing code is the scenario best suited to exposing an Agent system's capabilities.
A coding Agent cannot just chat. It needs to read files, search code, understand repo structure, modify multiple files, run tests, read error logs, roll back bad changes, handle user requirements injected mid-stream, and know which commands are dangerous and which files should not be touched.
The model certainly matters.
But the system outside the model matters more.
The differences among products like Claude Code, OpenAI Codex, Cursor, OpenHands, and Devin often lie not in the model itself. More critical is their harness. Is there a file tree view? A shell? A patch mechanism? Task planning? Permission confirmation? Persistent memory? Subtasks? A mechanism to compress context and continue pushing forward?
LangChain's article The Anatomy of an Agent Harness was written against this backdrop. The most striking case in the article was that, holding the model constant, they improved a coding Agent's ranking on Terminal Bench 2.0 from Top 30 to Top 5 through harness engineering. This result should not be over-glorified, since a benchmark is only one slice, but it demonstrates a very plain fact: an Agent's capability does not reside entirely in the model.
Capability also resides in the outer shell.
LangChain's article decomposed the harness into several core components: planning, filesystem, subagents, stateful middleware, context engineering, tool and permission layer, and evaluation loop. Each of these appears unremarkable in isolation, but together they change how an Agent works.
Planning addresses the model's short-term impulsiveness. The model easily gets drawn to the immediate next step; the planning tool lets it decompose tasks, track progress, and review what comes next.
Filesystem addresses the context window problem. The model cannot stuff everything into the prompt forever; the file system becomes external working memory. Intermediate results, drafts, long documents, and code snippets can all be placed outside the context and read back when needed.
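As a sketch of that idea (the class, paths, and limits are all arbitrary choices for the example), external working memory can be as simple as a directory the Agent writes artifacts into, keeping only file paths in context and reading back bounded slices on demand:

```python
import tempfile
from pathlib import Path

class Scratchpad:
    """Sketch of the filesystem as external working memory: intermediate
    artifacts live as files outside the context window, and only bounded
    slices are read back in. Names and limits are arbitrary examples."""
    def __init__(self, root=None):
        self.root = Path(root or tempfile.mkdtemp(prefix="agent_"))

    def write(self, name, content):
        path = self.root / name
        path.write_text(content, encoding="utf-8")
        return str(path)          # the agent keeps only the path in context

    def read(self, name, max_chars=2000):
        # Bounded read so a huge artifact cannot flood the context window.
        return (self.root / name).read_text(encoding="utf-8")[:max_chars]

    def listing(self):
        return sorted(p.name for p in self.root.iterdir())
```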
Subagents address context pollution and role separation. A main Agent does not need to stuff every detail into its own context; it can delegate local tasks to specialized sub-Agents and receive compressed results.
Stateful middleware addresses long-term behavioral patterns. What the model sees at each step, which tools it uses, what memory it inherits, and what policies it follows should not all depend on a static system prompt. The middleware layer can dynamically inject context and also intercept dangerous actions.
Together, these capabilities mean the Agent is no longer a bare model loop.
It has become a small operating system.
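The middleware idea in particular rewards a concrete sketch. Assuming an invented deny-list and a human-confirmation hook (neither drawn from any real product), an interception layer for tool calls might look like this:

```python
class PermissionMiddleware:
    """Illustrative middleware layer: every tool call passes through it
    before execution, so dangerous actions can be blocked or escalated
    to a human. The deny-list and policy are arbitrary examples."""
    def __init__(self, denied=("rm -rf", "DROP TABLE"), confirm=None):
        self.denied = denied
        self.confirm = confirm or (lambda desc: False)  # default: refuse
        self.log = []                                   # audit trail

    def execute(self, tool_name, arg, runner):
        if any(bad in arg for bad in self.denied):
            approved = self.confirm(f"{tool_name}: {arg}")  # ask a human
            if not approved:
                self.log.append(("blocked", tool_name, arg))
                return "BLOCKED: action requires approval"
        self.log.append(("ran", tool_name, arg))
        return runner(arg)
```

The essential property is that the model never calls tools directly; the harness sits in between, and the same choke point that blocks dangerous actions also produces the audit log.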
2.6 In 2026, the Harness Evolved from Engineering Technique to Strategic Asset
By 2026, the word "harness" began to be placed much more explicitly center stage.
LangChain's article reads almost like a manifesto: models matter, but stop staring only at the model. Your Agent's performance may be largely determined by the harness.
OpenAI is pushing in a similar direction. The Agents SDK productizes agents, handoffs, guardrails, sessions, and tools. The Responses API unifies tool calling, remote MCP, computer use, code interpreter, and file search into a single interface. It is not a purely open-source harness, but a model provider's native harness.
Anthropic's trajectory is equally clear. Claude Code and the Claude Agent SDK distill the coding Agent practice into an SDK, while MCP standardizes tool integration. Its advantage is a strong product experience; its weakness is equally obvious: the better it works, the more easily critical workflows sink into the provider's own ecosystem.
LangChain's position is somewhat different.
It aims to be a model-agnostic harness. That is, the model can be swapped, while the underlying workflows, state, memory, tools, and evaluation remain in the developer's hands. LangChain later wrote Your Harness, Your Memory, and its core concern lies precisely here: if an Agent's long-term memory and execution habits are taken over by closed-source products, what users and enterprises truly accumulate is not their own asset, but the platform's asset.
This pushes the harness from an engineering problem to a strategic problem.
Whose harness you use is, to some degree, handing over your workflows, context, memory, and control.
3. Horizontal Analysis: The Current Agent Harness Competitive Landscape
3.1 LangChain / LangGraph / Deep Agents: The Representative Open Harness Route
LangChain's position today can no longer be understood merely as an "LLM application framework."
It is more like building a three-layer structure for the Agent engineering stack. LangChain provides foundational abstractions and integrations, LangGraph provides a controllable graph runtime, and LangSmith provides tracing, evaluation, and observability. Deep Agents extracts validated harness patterns from coding Agents and packages them into a reusable pattern library.
The strengths of this route are clear.
First, it gives developers control. You can choose OpenAI, Anthropic, Google, or open-source models; you can define your own tools; you can decide where memory is stored.
Second, it prioritizes state. The hardest part of an Agent system is not getting the model to answer a single question, but keeping state consistent across multiple turns, multiple tools, and multiple failure paths. This is where LangGraph's core value lies.
Third, it is suited for complex business Agents. Many enterprise scenarios do not allow a black-box Agent to freely operate internal systems. They need human review, permissions, logging, replay, version control, and evaluation. LangChain's stack is more like a skeleton that engineering teams can modify.
But it also has weaknesses.
Openness and control typically bring complexity. Developers need to understand graphs, states, nodes, edges, memory, checkpoints, tool schemas, middleware, and tracing. For someone who just wants to quickly build an assistant, this can feel heavy.
This is LangChain's ecological niche: it is not the most hassle-free closed-source Agent product, but a toolbox for teams that want to own their harness.
3.2 OpenAI Agents SDK / Responses API: The Model Provider's Native Harness
OpenAI's Agents SDK takes a different path.
It directly productizes common Agent development elements as SDK concepts: Agent, Tool, Handoff, Guardrail, Session, Tracing. The Responses API further unifies model output and tool calling into a single interface, supporting file search, code interpreter, computer use, remote MCP, and other capabilities.
The advantage of this route is seamlessness.
The model, tools, runtime, evaluation, and logs all live within the same provider ecosystem, and in many cases developers do not need to wire things together themselves. Especially when the model itself possesses stronger tool calling, structured output, and multimodal capabilities, the native harness enjoys a natural first-mover advantage.
It suits two types of teams.
One type is teams already deeply using OpenAI models and APIs. For them, the Agents SDK reduces a lot of wiring work.
The other type is those who want to quickly validate an Agent product. Compared to building your own LangGraph, state storage, and permission system, OpenAI's native solution is faster.
But the risks are equally clear.
If your Agent's most important assets are user workflows, long-term memory, tool-calling histories, and policy preferences, then where these things are stored becomes critical. The more complete the model provider's native harness, the higher the potential switching cost. In the short term it boosts efficiency; in the long term it may create lock-in.
My judgment is that OpenAI's line will be strong in general-purpose Agents and enterprise APIs, but for teams that place particular value on control, auditability, privatization, and model substitutability, an open harness is still needed as a safety net.
3.3 Anthropic Claude Code / Claude Agent SDK / MCP: Working Backward from Product Experience to Standards
Anthropic's trajectory has a distinctive characteristic: it does not first build a large, comprehensive framework and then layer products on top. Instead, it first polished a high-frequency use case like Claude Code, then distilled the mature capabilities into SDK and protocol layers.
Claude Code gave many developers their first visceral sense that the quality of a coding Agent depends heavily on the external system. It can read repos, run commands, modify files, explain diffs, handle errors, and use permission mechanisms to control dangerous actions. All of this is the harness.
MCP, meanwhile, shows that Anthropic is not just building its own closed-source product but also competing for the standard position at the tool layer. It may not monopolize all Agent harnesses, but it aims to become the universal plug through which Agents connect to the outside world.
The advantage of this route is strong product sense.
Claude Code is not an abstract paper; it is used by developers every day to do real work. Real tasks surface real problems: what to do when context is too long, when there are too many tools, when the user changes requirements mid-stream, when tests fail, and how to control permissions. This kind of product practice, in turn, makes the SDK closer to actual needs.
The weakness is boundaries.
If you want to fully port the Claude Code experience into your own business system, you will hit provider boundaries. The Claude Agent SDK can extend some capabilities, and MCP can standardize tools, but the core experience and model remain tightly bound to the Anthropic ecosystem.
So Anthropic's ecological niche is more like a high-quality closed-source Agent product combined with a tool standard promoter. It does not necessarily give you the greatest freedom, but it will continuously demonstrate what a truly good harness looks like.
3.4 Microsoft AutoGen / Semantic Kernel: The Enterprise Multi-Agent and Orchestration Route
Microsoft's route is more like what enterprise engineers would favor.
AutoGen demonstrated the potential of Agent collaboration early on through multi-Agent dialogue, and it has since been evolving toward an event-driven, distributed, scalable agentic application framework. Semantic Kernel is closer to enterprise application development, emphasizing plugins, planners, memory, the agent framework, and integration with the .NET and Azure ecosystems.
The core question for these frameworks is not "can you build a cool demo," but "can it fit into enterprise software architecture."
The scenarios they excel in tend to be not individual coding Agents, but process automation, business system integration, multi-role collaboration, and enterprise knowledge and permission systems. For example, a sales operations Agent, an internal IT ticket Agent, or a hybrid knowledge-base Q&A and process execution Agent -- the Microsoft ecosystem handles these more smoothly.
The weakness stems from the same source.
Enterprise-grade frameworks tend to feel heavy. There are many abstraction layers, many concepts, and tight coupling with cloud platforms. For independent developers, it is not as out-of-the-box as Claude Code, nor as easy to understand as CrewAI.
Its niche is stability, not lightness.
3.5 CrewAI: Packaging Agent Collaboration as an Organizational Metaphor
Much of CrewAI's popularity comes from its sufficiently intuitive mode of expression.
Agent, task, crew, flow -- these are not heavily technical terms. Ordinary developers can easily grasp the idea: one Agent plays a role, a group of Agents form a crew, and together they complete a set of tasks.
This abstraction works well for business automation and prototyping.
If you want to build a market research workflow, a content production workflow, or a sales lead processing workflow, CrewAI lets you quickly assemble the structure. It lowers the barrier to entry and makes it easier for people without deep engineering backgrounds to participate in Agent workflow design.
But its problem is that organizational metaphors sometimes obscure real engineering issues.
Role division does not equal reliable execution. Multiple Agents conversing does not equal a smarter system. The more complex the task, the more it requires state, permissions, evaluation, exception recovery, and context governance. CrewAI is very comfortable for lightweight workflows; for high-reliability production systems, teams still need to invest significantly in harness engineering.
So CrewAI's position is built on ease of use and strong narrative.
It is suited for taking Agent collaboration from 0 to 1, but going from 1 to 10 often requires a more robust runtime and observability capabilities.
3.6 LlamaIndex Workflows / Agents: Starting from Data and Knowledge Work
LlamaIndex's starting point is not Agents, but data.
It gained widespread use because developers needed to connect private data to LLMs. RAG, indexing, retrieval, document parsing, and knowledge bases are its core foundation. Later, Agents and Workflows entered the LlamaIndex system, in effect answering a more specific question: when the model does not merely answer questions but performs multi-step knowledge work around data, how should that work be organized?
This gives LlamaIndex's Agent trajectory a distinctly data-oriented character.
It is suited for knowledge-intensive tasks. Examples include enterprise knowledge base analysis, document processing, research assistants, data retrieval, and structured organization. Its strength lies not in general-purpose coding Agents, but in enabling Agents to work better on top of the data layer.
Weaknesses correspondingly exist.
If a task requires complex OS-level execution, long-cycle code modification, or multi-tool permission control, LlamaIndex is not the most natural choice. It can connect to these, but it is not where it shines brightest.
3.7 The Current Landscape: Not a Winner-Take-All, but Layered Competition
If you place all these players on the same map, you will find that the Agent Harness is not a single racetrack.
It divides into at least four layers.
The first layer is the model provider's native harness. OpenAI and Anthropic are both competing aggressively here. The advantage is tight integration between model capabilities and tool interfaces; the downside is lock-in.
The second layer is the open engineering harness. LangChain / LangGraph is the representative. The advantage is control and composability; the downside is complexity.
The third layer is the enterprise orchestration harness. Microsoft AutoGen, Semantic Kernel, and similar frameworks are better suited for enterprise architectures. The advantage is governance and ecosystem; the downside is weight.
The fourth layer is the vertical scenario harness. Claude Code targets coding, LlamaIndex targets knowledge and data, and CrewAI targets team workflows and automation prototyping.
So the current situation is not as simple as "who will win."
The more realistic question is: what kind of task is your Agent actually performing?
If you are an individual developer writing code, a product-grade harness like Claude Code is very strong.
If you are an enterprise integrating Agents into internal systems, LangGraph, Semantic Kernel, and the OpenAI Agents SDK may all enter consideration.
If the task involves knowledge bases and documents, LlamaIndex's data stack is more natural.
If you are quickly validating a multi-role automation workflow, CrewAI will be smoother.
The harness is not a one-size-fits-all component.
It is becoming the ecological niche of the Agent product itself.
4. Cross-Insights
4.1 History Shaped the Present: Why LangChain Was the First to Articulate the Harness Clearly
The fact that LangChain could articulate the concept of the harness so clearly in 2026 is no coincidence.
From day one, it has been building the glue layer.
Early LangChain solved the problem of how to connect LLMs, tools, prompts, and retrieval. Later, the industry found that simple chains were not enough, so LangGraph emerged to handle state and workflow. Still later, coding Agents made planning, file systems, sub-agents, and middleware important in complex tasks, and Deep Agents appeared as a natural progression.
Looking at this trajectory, there was no sudden pivot.
It was the same problem going deeper each time.
From connecting models to connecting tools.
From connecting tools to connecting state.
From connecting state to connecting long-term memory and execution strategies.
From connecting execution strategies to competing over "who owns the Agent workflow."
This is the historical root of why LangChain emphasizes open harness today. It knows it is difficult to compete head-on with OpenAI and Anthropic on closed-source model capabilities, so it must place its value outside the model: runtime, memory, tools, evaluation, and portability.
4.2 The Real Divide Between Closed-Source Products and Open Harnesses Is Not Feature Count
When many people discuss the OpenAI Agents SDK, the Claude Agent SDK, and LangGraph, they naturally compare feature tables.
Is there tool calling?
Is there MCP?
Is there tracing?
Is there memory?
These certainly matter, but they are not the deepest divide.
The real divide is: who owns the Agent's "experience."
The longer an Agent is used, the more it accumulates: user preferences, organizational processes, common failures, tool usage habits, project structure understanding, historical decisions, internal jargon, and permission boundaries. These things are not all written in code, nor all written in the prompt. They are scattered across conversations, files, traces, memory, tool logs, and evaluation cases.
These are the sediment of the harness.
If this sediment remains in an open harness, enterprises and developers can migrate models, switch providers, and self-govern.
If this sediment remains in a closed-source product, the short-term experience may be better, but long-term migration costs will only increase.
This is not simply open-source sentiment.
It is a question of asset ownership.
4.3 The Moat of Agents Is Shifting from "Smart Models" to "Controllable Processes"
In 2023, people were still asking whether models could complete goals on their own.
In 2026, the more realistic question is whether the system can detect errors, limit damage, and recover.
This is the true value of the harness.
The higher-value the task, the less you can afford to let the model operate freely. Law, finance, healthcare, enterprise operations, code deployment, data modification -- in these scenarios, an Agent getting nine things right and one thing wrong can still be very costly.
So the moat will gradually shift from single-inference capability to process governance capability.
Whoever can turn the model's actions into auditable, replayable, evaluable, interruptible, and authorizable processes is closer to a production-grade Agent.
This is also why I do not believe the harness is merely "a shell around the model."
It is more like the distribution grid in an electrical power system. No matter how powerful the generator, if electricity cannot be delivered stably, safely, and on demand to every device, what the user experiences is still a blackout.
The model is the generator.
The harness is the distribution grid.
4.4 Three Future Scenarios
The most likely scenario is that hybrid harnesses become the norm.
Enterprises and developers will not choose just one. The model provider's native harness will handle part of the general-purpose tools and multimodal capabilities; the open harness will handle business state, long-term memory, audit, and model substitutability; MCP will handle tool connections. A real system may simultaneously include the OpenAI Agents SDK, LangGraph, internal permission systems, and MCP servers.
This does not sound sexy, but engineering is often like that.
The most dangerous scenario is that closed-source harnesses absorb working memory.
If a user's daily tasks, preferences, file operations, tool trajectories, and context compression strategies all sink into a closed-source Agent product, the model provider is no longer just selling models -- it is seizing control of the user's way of working. At that point, switching models is not hard, but switching harnesses is. Because what you are replacing is not just an API, but an entire set of trained work habits.
The most optimistic scenario is that tools, memory, skills, and traces all develop more open, portable standards.
MCP has already opened a breach at the tool layer. In the future, if memory, skills, evaluation traces, and permission policies can also be partially standardized, the Agent ecosystem will not be entirely absorbed by a few closed-source products. Developers could migrate a team's Agent workflow from Claude to OpenAI, from cloud to local, from one framework to another.
This would let model competition return to models, and harness competition return to experience and engineering quality.
4.5 A Decision Framework for Developers
If you are choosing a harness now, do not start by asking which one is hottest.
Start by asking four questions.
How long is the time horizon of your task? If it is just a one-off Q&A, you do not need a heavy harness. If the task spans files, days, or projects, state and memory become important.
Can your task tolerate errors? If not, prioritize looking at permissions, auditability, human-in-the-loop, tracing, and rollback.
Where are your core assets? If the core assets are organizational processes, user memory, and internal tools, try not to let them sink entirely into a closed-source product.
Does your team have engineering capability? If not, closed-source products and high-level SDKs are better for getting started. If so, an open harness like LangGraph can buy long-term control.
My overall judgment is that the Agent Harness will become the true dividing line for Agent applications over the next two years.
Model gaps will persist, but application gaps will increasingly come from outside the model.
Whoever can place a model inside a good system truly owns the Agent.
5. Sources
All access dates are 2026-04-24.
| Type | Source | Purpose |
|---|---|---|
| Core Article | LangChain, The Anatomy of an Agent Harness, https://blog.langchain.com/the-anatomy-of-an-agent-harness/ | Agent Harness definition, component breakdown, Terminal Bench 2.0 harness engineering case study |
| LangChain Extended | LangChain, Your Harness, Your Memory, https://www.langchain.com/blog/your-harness-your-memory | Open harness and long-term memory ownership |
| LangGraph Docs | LangChain Docs, LangGraph overview, https://docs.langchain.com/oss/python/langgraph/overview | State, persistent execution, human-in-the-loop, Agent runtime |
| Deep Agents | LangChain Docs / Deep Agents, https://docs.langchain.com/oss/python/deepagents/overview | Harness patterns: planning, filesystem, subagents, etc. |
| OpenAI | OpenAI Agents SDK, https://openai.github.io/openai-agents-python/ | OpenAI native Agent SDK primitives |
| OpenAI | OpenAI Platform Docs, Responses / Tools / Agents related docs, https://platform.openai.com/docs | Model provider native tools and Agent runtime capabilities |
| Anthropic | Introducing the Model Context Protocol, https://www.anthropic.com/news/model-context-protocol | MCP as an open standard for tool integration |
| Anthropic | Claude Code / Claude Agent SDK docs, https://docs.anthropic.com/en/docs/claude-code/sdk | Coding Agent SDK and productized harness |
| Microsoft | AutoGen Documentation, https://microsoft.github.io/autogen/stable/ | Multi-Agent, event-driven, distributed agentic applications |
| Microsoft | Semantic Kernel Agent Framework, https://learn.microsoft.com/en-us/semantic-kernel/frameworks/agent/ | Enterprise Agent framework, plugins, and orchestration |
| CrewAI | CrewAI Documentation, https://docs.crewai.com/ | Agents, tasks, crews, flows abstraction |
| LlamaIndex | LlamaIndex Agents docs, https://docs.llamaindex.ai/en/stable/use_cases/agents/ | Agents in data/knowledge work scenarios |
| Academic Paper | ReAct: Synergizing Reasoning and Acting in Language Models, https://arxiv.org/abs/2210.03629 | Early paradigm of the reasoning-and-action loop |
| Academic Paper | Toolformer: Language Models Can Teach Themselves to Use Tools, https://arxiv.org/abs/2302.04761 | Early research on model tool-use capability |
| Academic Paper | Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, https://arxiv.org/abs/2201.11903 | Reasoning traces as a precursor capability for Agent action |