Agent loop - Comparing AI Agent SDKs
When building AI agents, one of the first decisions is: how to run the agent loop? In all its simplicity, this is the Reason + Act (ReAct) pattern, where the agent alternates between reasoning about what to do next and taking actions using tools.
Unsurprisingly, there are already many frameworks providing this (and much more). I picked four to get an idea of how they compare - OpenAI's Agent SDK, Claude's Agent SDK, the Vercel AI SDK, and LangGraph.
I put together a minimal agent implementation across all four frameworks to find out. Not quite a “hello world”, but not much more in the context of agentic capabilities: an agent that’s tasked with planning focus time for a week.
Here’s what I learned.
The Setup
Before diving into the comparison, here’s how I structured the test:
For each SDK, I used the TypeScript/Node.js implementation.
Development environment: I used Claude Code for all implementations. These days, one key consideration for me is how well an AI coding assistant can help with the framework because it significantly affects the development speed.
The test scenario: Each agent needed to:
- Accept a minimal user prompt (e.g., “Help me plan focus time for next week”). I wanted to keep this minimal to see how well the agent can work without an overly specific request from the user.
- Use tools to check availability, create appointments, and manage scheduling. For this I set up a mock calendar to avoid the hassle of calendar authentication and the like - that’s not the focus here.
- Follow the system instructions + user preferences to plan focus time effectively.
Tool architecture: For the mock calendar service I built SDK-specific adapters for each framework. This let me see how convenient it was to integrate custom tools with each SDK’s expected format.
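The actual adapters live in the repo, but as a rough sketch of the shape a framework-neutral mock calendar might take (class and method names here are illustrative, not the repo's):

```typescript
// Hypothetical sketch of a framework-neutral mock calendar service.
// Each SDK adapter would wrap these plain methods in its own tool format.
type Slot = { start: string; end: string; title: string };

class MockCalendar {
  private events: Slot[] = [];

  // A slot is free if no existing event overlaps it.
  // ISO-8601 strings of the same format compare correctly lexicographically.
  isAvailable(start: string, end: string): boolean {
    return !this.events.some((e) => start < e.end && e.start < end);
  }

  // Creates an appointment if the slot is free; returns null on conflict.
  createAppointment(slot: Slot): Slot | null {
    if (!this.isAvailable(slot.start, slot.end)) return null;
    this.events.push(slot);
    return slot;
  }

  listEvents(): Slot[] {
    return [...this.events];
  }
}
```

With something like this, each adapter only needs to translate the SDK's tool-call arguments into these method calls.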
User context: I included a personal preferences file - this emulated preferences that a relatively savvy user would be able to provide.
The goal wasn’t to build production-ready code, but to get a feel for the basics: setup, tool integration, and actual usage patterns.
Let’s see how each SDK fared.
Claude Agent SDK
Let me start with the slightly amusing one: Claude Code doesn’t quite know how to use the Claude Agent SDK yet. Specifically, it struggled with using the query() function and the overall SDK structure. However, once I instructed it to read the code in node_modules, it found its way and produced a working implementation.
The bigger limitation: The Claude Agent SDK only works with Claude models. Every other SDK in this comparison lets you swap between providers, but this one locks you into Anthropic’s ecosystem. That’s not necessarily a dealbreaker - Claude is excellent - but it does reduce flexibility.
Those issues aside, I enjoyed the minimal API surface. The SDK is straightforward, and once I got past the initial hiccups, it was smooth sailing.
The Claude Agent SDK was the only one providing direct cost information (instead of just token usage) - a nice touch that makes it easy to see roughly what a single run of the agent costs.
OpenAI Agent SDK
This was my favorite of the bunch, and here’s why:
Clarity and structure: The SDK is very intuitive. You create an Agent and then call run(). That’s it. The concepts map intuitively to what you’re trying to build, and the setup process is straightforward.
As with all the SDKs, processing output messages is not fun. They’re deep objects with nested structures, and formatting them into readable logs is a chore. This is amplified by OpenAI’s streaming format, which emits chunks instead of full messages.
Similarly, as token usage needs to be parsed from the response metadata, getting cost estimates requires some manual work.
Despite these minor friction points, the overall experience was solid. The code felt clean, the patterns were clear, and everything worked as expected.
Vercel AI SDK
Like all Vercel’s products, the Vercel AI SDK shines in developer experience.
Setup time: Five minutes from zero to agent running. The SDK is designed for rapid development, and it shows.
Model flexibility: Want to try Anthropic’s Claude? Switch a parameter. Curious about OpenAI’s models? Switch it back. If you are familiar with using Vercel’s AI SDK to call the models, this is basically just a function switch.
Development experience with Claude Code: Once the library setup was in place, Claude Code effortlessly built the implementation in about 10 minutes total. The SDK’s conventions are clear enough that the AI assistant had no trouble following them.
If you prioritize getting something running quickly, Vercel AI SDK is hard to beat.
LangGraph
Claude Code implemented the LangGraph agent in one go - no iterations needed. That speaks to how well-structured the framework is.
The differentiator: LangGraph (and the broader LangChain ecosystem) provides significantly more than just an agent loop. You get extensive tooling, observability, and - most interestingly - the ability to build sophisticated AI workflows using graph-based patterns.
While my simple ReAct agent didn’t need these capabilities, I can see how LangGraph becomes increasingly valuable as your agent grows more complex. Need conditional logic? Multiple agent collaboration? Complex state management? LangGraph has patterns for all of it.
What I Learned
At the level of “basic ReAct pattern with a few tools,” these SDKs are very similar. They all:
- Make it easy to get started
- Handle tool integration smoothly (wrapping my calendar tools to each SDK’s format was straightforward across the board)
- Execute the fundamental agent loop reliably
Although it’s a no-brainer if you think about it a bit, it’s worth mentioning that they produce very similar results. After all, the framework here is mainly just looping between LLM calls and tool invocations. The real magic happens in the model and prompt design, not the SDK. And for that, they all used the same prompts, and either Anthropic’s or OpenAI’s models.
Output formatting is surprisingly hard: Across all frameworks, the thing Claude Code struggled with most was formatting the conversation history into a readable output. Each SDK handles message structures differently, and translating that into clean, human-readable conversation logs was consistently cumbersome. So one thing that could have separated these SDKs more is better abstractions for message handling and formatting.
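One workaround I'd consider next time: normalize every SDK's messages into one minimal log shape before printing. This is a hypothetical helper, not code from the repo, and the field names are illustrative:

```typescript
// Hypothetical normalization layer: map each SDK's message shape into one
// minimal entry type, so formatting logic is written only once.
type LogEntry =
  | { kind: "text"; role: string; text: string }
  | { kind: "tool"; name: string; args: unknown };

function formatLog(entries: LogEntry[]): string {
  return entries
    .map((e) =>
      e.kind === "text"
        ? `[${e.role}] ${e.text}`
        : `[tool] ${e.name}(${JSON.stringify(e.args)})`
    )
    .join("\n");
}
```

Each SDK then only needs a small per-framework mapper from its message objects to LogEntry, and the readable-log problem is solved in one place.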
At the end of the day, if you’re building a simple agent, any of these SDKs will deliver. In real life, aspects like deployment, data location requirements, and integration with existing systems will likely influence the choice much more than what we saw here. The good news is: if your environment restricts your options, that’s really not a big deal.
My Pick
Given everything I learned, I’m starting with OpenAI’s Agent SDK for my next project.
Why?
- Documentation quality: It’s solid and comprehensive
- Claude Code compatibility: The AI assistant works effectively with it
- The Runner API: Clean abstractions for building and executing agents
- Flexibility: I can easily swap models and providers as needed
Could I pick any of these and be fine? Absolutely. At this complexity level, they’re all capable, so I’m heavily prioritizing development speed to reach the edges and limitations as fast as possible. It will be interesting to see what happens as the agents get more complex.
That said, I’m keeping LangGraph in mind. If I find myself needing more sophisticated workflows - multi-agent systems, complex state management, or advanced graph-based patterns - I’ll revisit it. The broader ecosystem there is genuinely compelling for advanced use cases.
See the Code
Want to explore the implementations yourself? I’ve kept all four agents plus example runs in this GitHub repo.
Each implementation shows the same agent functionality built with different frameworks, so you can see exactly how they compare. The README includes setup instructions and sample outputs.