your first on-chain AI Agent with Coinbase AgentKit

I spent a week inside Coinbase's AgentKit, tracing every layer from the chat prompt to the on-chain transaction, and came out with a few helpful mental models for those starting out.

AgentKit is one of the most talked-about tools for building AI agents that interact with blockchains.

The tutorials make it look simple. However, your agents will be far more reliable if you understand the fundamentals of how it works.

This is my attempt to explain those fundamentals and share some mental models.

We are going to:

  • Trace two complete request lifecycles through AgentKit, step by step, with every abstraction named and every layer explained: a read path (fetching the ETH price, with no wallet, no gas, and no transaction) and a write path (wrapping ETH, a full transaction through MPC signing to on-chain execution).
  • Break things on purpose and watch what happens.
  • Look at what this same flow costs you without AgentKit, so you can feel the weight of what the abstraction removes.

Let's dive right in.


AgentKit in 100 Words

AgentKit is a developer toolkit from Coinbase that lets AI agents perform real actions on blockchains — sending tokens, swapping assets, reading prices, interacting with smart contracts.

You connect an AI model to AgentKit, tell it which capabilities to enable, and the AI can then decide when and how to use them based on what a user asks. AgentKit handles the hard parts: making sure the AI’s requests are valid before they hit the blockchain, managing wallet signing securely, and routing errors back to the AI so it can explain what went wrong instead of crashing. It supports 37+ categories of blockchain actions out of the box. You choose exactly which ones your agent can access.


The Two Files That Define Your Agent

Before we trace any request, we need the mental model for how an AgentKit agent is assembled.

Two files. That's it.

File 1: prepare-agentkit.ts - the hands.

This file creates the wallet provider and the action providers array.

The wallet provider is how your agent signs transactions. The action providers array is what your agent can do.

Here is the real code from a production AgentKit scaffold:

import { AgentKit, wethActionProvider, pythActionProvider,
         walletActionProvider, erc20ActionProvider,
         cdpApiActionProvider, cdpSmartWalletActionProvider,
         x402ActionProvider } from "@coinbase/agentkit";

const agentkit = await AgentKit.from({
  walletProvider,
  actionProviders: [
    wethActionProvider(),
    pythActionProvider(),
    walletActionProvider(),
    erc20ActionProvider(),
    cdpApiActionProvider(),
    cdpSmartWalletActionProvider(),
    x402ActionProvider(),
  ],
});

Seven providers opted in.

The SDK ships ~37 — including Morpho, Compound, Moonwell, Farcaster, Twitter, OpenSea.

But your agent only gets the ones in this array.

That array IS the agent's capability set. Add morphoActionProvider() and the agent gains DeFi deposit and withdraw tools. Remove wethActionProvider() and wrap_eth vanishes from the LLM's context entirely.

Strict opt-in. The LLM never sees tools you didn't explicitly list.
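
To make the opt-in idea concrete, here is a minimal sketch with no dependencies. SketchProvider, SketchAction, and listToolNames are hypothetical stand-ins and not AgentKit's own types; the only point is that the tool list is whatever the listed providers contribute.

```typescript
// Hypothetical stand-ins for AgentKit's providers: only the shape matters here.
interface SketchAction {
  name: string;
  description: string;
}

interface SketchProvider {
  name: string;
  actions: SketchAction[];
}

const wethProvider: SketchProvider = {
  name: "weth",
  actions: [
    { name: "wrap_eth", description: "Wraps ETH to WETH." },
    { name: "unwrap_eth", description: "Unwraps WETH to ETH." },
  ],
};

const pythProvider: SketchProvider = {
  name: "pyth",
  actions: [
    { name: "fetch_price_feed", description: "Finds a Pyth price feed ID." },
    { name: "fetch_price", description: "Fetches the latest price for a feed ID." },
  ],
};

// The tool list is the flattened actions of whichever providers were listed.
function listToolNames(providers: SketchProvider[]): string[] {
  const names: string[] = [];
  for (const provider of providers) {
    for (const action of provider.actions) {
      names.push(action.name);
    }
  }
  return names;
}

const withWeth = listToolNames([wethProvider, pythProvider]);
const withoutWeth = listToolNames([pythProvider]);
console.log(withWeth);
console.log(withoutWeth);
```

Dropping wethProvider from the array is enough to make wrap_eth and unwrap_eth disappear from the second list.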

File 2: create-agent.ts - the brain.

This file takes the AgentKit instance and converts it into a LangChain ReAct agent:

const tools = await getLangChainTools(agentkit);
agent = createReactAgent({
  llm,
  tools,
  checkpointSaver: memory,
  messageModifier: "You are a helpful agent that can interact onchain..."
});

getLangChainTools does the conversion. Every action from every opted-in provider becomes a LangChain tool with a name, description, Zod schema, and invoke function.

The LLM sees tool signatures like PythActionProvider_fetch_price and WethActionProvider_wrap_eth. It reads the description to decide when to call each one.

Two files. The hands and the brain.

Now let's watch them work.


The Anatomy of an Action Provider

Before we go further, let's clarify what an action provider actually is in AgentKit's world.

In AgentKit, an action provider is a module that gives your agent a specific set of capabilities. Think of it like a plugin. Want your agent to interact with wrapped ETH? There is an action provider for that. Want it to read price feeds? Another action provider. Want it to post on Farcaster or trade on a DEX? Each of those is a separate action provider you can plug in or leave out.

The key idea: your agent only gets the capabilities you explicitly hand it. No action provider, no capability.

Now let's look at the internal structure — this is the pattern you will see repeated across all 50+ actions.

An action provider is a class that bundles related actions.

One provider, many actions. wethActionProvider bundles wrap_eth and unwrap_eth. pythActionProvider bundles fetch_price_feed and fetch_price.

Each individual action has four parts:

interface Action<TActionSchema extends z.ZodSchema = z.ZodSchema> {
  name: string;        // what the LLM sees: "wrap_eth"
  description: string; // the LLM reads THIS to decide when to call it
  schema: TActionSchema; // Zod schema — validates + types + documents
  invoke: (args) => Promise<string>; // the work — ALWAYS returns a string
}

  • The description field is the most important line of code in any action provider. The LLM reads it, and only it, to decide whether to call this tool. A bad description means the agent never uses the tool or misuses it. It is prompt engineering hiding inside an API.
  • The schema field uses Zod, a TypeScript validation library that does three jobs at once: runtime validation (reject malformed LLM output before it hits your code), static TypeScript types, and LLM documentation (via .describe() on each field). Zod is the trust boundary between the LLM's untrusted JSON and your action's execution logic.
  • invoke always returns a string. Always. Errors included. A failed tool call returns "Error: Insufficient ETH balance" as a string, not a thrown exception. The LLM reads the error message and decides what to do next. We will see why this matters when we trace the error paths.
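
The "errors are strings too" rule can be sketched as a small wrapper. invokeAsString is a hypothetical helper, not an AgentKit function, and it is synchronous only to keep the example short; a real invoke returns a Promise<string>.

```typescript
// Hypothetical wrapper: run an action body and turn any thrown error into a string result.
function invokeAsString(work: () => string): string {
  try {
    return work();
  } catch (err) {
    // Whatever was thrown, the caller still receives a plain string.
    const message = err instanceof Error ? err.message : String(err);
    return `Error: ${message}`;
  }
}

const ok = invokeAsString(() => "Wrapped 0.1 ETH to WETH.");
const failed = invokeAsString(() => {
  throw new Error("Insufficient ETH balance");
});

console.log(ok);     // Wrapped 0.1 ETH to WETH.
console.log(failed); // Error: Insufficient ETH balance
```

Either way the agent loop gets text it can reason about, so a failing tool becomes one more observation rather than a crash.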

So what does this mean in practice?

AgentKit is not a monolithic SDK where everything is wired together. Instead, it enforces a clean separation of concerns: the action provider pattern decouples what the agent can do (the action) from how it signs and submits (the wallet provider). You can swap from a CDP-managed MPC wallet to a self-custodial viem wallet without changing a single action provider.

The action calls walletProvider.sendTransaction(...). It never knows or cares what signing infrastructure sits behind that interface.

That decoupling is the entire architectural insight of AgentKit.
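
Here is a rough sketch of that decoupling, under simplifying assumptions. SketchWallet is a hypothetical, synchronous interface (the real wallet provider has more methods and is asynchronous), and both wallets return fake placeholder hashes.

```typescript
// Hypothetical, simplified wallet interface for illustration only.
interface SketchWallet {
  sendTransaction(tx: { to: string; data: string; value: string }): string;
}

// Two interchangeable signers. The hashes are fake placeholders, not real transactions.
const managedWallet: SketchWallet = {
  sendTransaction: () => "0xmanaged",
};
const selfCustodyWallet: SketchWallet = {
  sendTransaction: () => "0xselfcustody",
};

// The action depends only on the interface, never on a concrete wallet.
function wrapEthSketch(wallet: SketchWallet, amountInWei: string): string {
  const hash = wallet.sendTransaction({
    to: "0x4200000000000000000000000000000000000006", // WETH on Base
    data: "0xd0e30db0", // 4-byte selector for deposit()
    value: amountInWei,
  });
  return `Wrapped. Transaction hash: ${hash}`;
}

console.log(wrapEthSketch(managedWallet, "100000000000000000"));
console.log(wrapEthSketch(selfCustodyWallet, "100000000000000000"));
```

The same wrapEthSketch body runs against both wallets; only the object passed in changes.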


The Read Path: "What's the Current Price of ETH?"

Now we trace a real request, end to end.

You type "whats the current price of ETH" and press Enter. Here are the 17 steps between your keystroke and the answer.

Steps 1–3: UI → Backend

  • Step 1. In app/page.tsx, the Enter key triggers onSendMessage(). React stores your message in local state, and the blue chat bubble appears instantly.

Step 2. useAgent.ts sets isThinking(true) and sends a POST request:

fetch("/api/agent", {
  method: "POST",
  body: JSON.stringify({ userMessage: "whats the current price of ETH" })
});

  • Step 3. Your Next.js route handler (app/api/agent/route.ts) parses the body and calls createAgent().

If the agent was already initialized (module-level singleton), the handler reuses it. If this is the first request since server start, it runs the full AgentKit setup from the two files we just covered.
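
The module-level singleton is a small pattern worth seeing on its own. This is a sketch under stated assumptions: SketchAgent, createAgentSketch, and getAgent are hypothetical names, and the setup is synchronous here although the real AgentKit setup is awaited.

```typescript
// Hypothetical agent type and factory; the counter only exists to make reuse visible.
interface SketchAgent {
  id: number;
}

let initCount = 0;
function createAgentSketch(): SketchAgent {
  initCount += 1;
  return { id: initCount };
}

// Module-level cache: it survives across requests for as long as the server process lives.
let cachedAgent: SketchAgent | null = null;

function getAgent(): SketchAgent {
  if (cachedAgent === null) {
    cachedAgent = createAgentSketch(); // first request pays the setup cost
  }
  return cachedAgent; // later requests reuse the same instance
}

const first = getAgent();
const second = getAgent();
console.log(first === second, initCount); // true 1
```

One consequence of this design: anything cached at module level is lost on a server restart, at which point the next request pays the setup cost again.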

Steps 4–5: AgentKit Prepares Tools

  • Step 4. AgentKit calls getActions() on each opted-in provider. Each provider's supportsNetwork() method is checked; if it returns false for the current network, that provider's actions are hidden entirely. For Pyth, supportsNetwork returns true universally because price feeds are network-agnostic HTTP calls.
  • The LLM now sees a flat list: fetch_price_feed, fetch_price, wrap_eth, unwrap_eth, get_wallet_details, native_transfer, and so on.

Step 5. getLangChainTools(agentkit) converts every surviving action into a LangChain tool:

const actions = agentKit.getActions();
return actions.map(action =>
  tool(async arg => {
    const result = await action.invoke(arg);
    return result;
  }, {
    name: action.name,
    description: action.description,
    schema: action.schema,
  })
);
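
The network check from step 4 can be sketched in a few lines. NetworkAwareProvider and visibleActions are hypothetical, and the network IDs in the allow-list are illustrative assumptions rather than the list the real WETH provider uses.

```typescript
interface SketchNetwork {
  networkId: string;
}

interface NetworkAwareProvider {
  actionNames: string[];
  supportsNetwork(network: SketchNetwork): boolean;
}

const pythLike: NetworkAwareProvider = {
  actionNames: ["fetch_price_feed", "fetch_price"],
  supportsNetwork: () => true, // plain HTTP calls, so every network qualifies
};

const wethLike: NetworkAwareProvider = {
  actionNames: ["wrap_eth", "unwrap_eth"],
  // Illustrative allow-list only; the real provider keeps its own list.
  supportsNetwork: (n) => ["base-mainnet", "base-sepolia"].indexOf(n.networkId) !== -1,
};

// Only providers that support the current network contribute actions.
function visibleActions(providers: NetworkAwareProvider[], network: SketchNetwork): string[] {
  const names: string[] = [];
  for (const provider of providers) {
    if (provider.supportsNetwork(network)) {
      for (const name of provider.actionNames) {
        names.push(name);
      }
    }
  }
  return names;
}

const onBase = visibleActions([pythLike, wethLike], { networkId: "base-sepolia" });
const elsewhere = visibleActions([pythLike, wethLike], { networkId: "solana-devnet" });
console.log(onBase);
console.log(elsewhere);
```

On the unsupported network the WETH tools never reach the tool list, so the model cannot pick them.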

Steps 6–8: LLM Selects the Right Tools

  • Step 6. The ReAct agent receives your message plus the system prompt plus the tool list. It reasons: "User wants current ETH price. I have Pyth tools. I need the ETH/USD price feed ID first, then I fetch the latest price for that feed ID."
  • Step 7. The LLM emits a structured tool call:
{
  "name": "fetch_price_feed",
  "args": { "tokenSymbol": "ETH", "quoteCurrency": "USD", "assetType": "crypto" }
}
  • Step 8. Zod validates those args against the schema.

If the LLM hallucinated an extra key, .strip() drops it silently. If a required field is missing, Zod rejects the call before invoke ever runs.
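
To see both behaviours without installing anything, here is a hand-rolled stand-in for what a stripping object schema does at runtime. It is not Zod and not the real Pyth schema: parseStringFields is hypothetical, and treating all three fields as required strings is an assumption made for the example.

```typescript
// Result type loosely modelled on a "safe parse" style API.
type ParseResult =
  | { success: true; data: Record<string, string> }
  | { success: false; error: string };

function parseStringFields(required: string[], input: Record<string, unknown>): ParseResult {
  const data: Record<string, string> = {};
  for (const key of required) {
    const value = input[key];
    if (typeof value !== "string") {
      // A missing or mistyped field rejects the whole call before any work runs.
      return { success: false, error: `Missing or invalid field: ${key}` };
    }
    data[key] = value; // only declared keys are copied, so extras are dropped
  }
  return { success: true, data };
}

const fields = ["tokenSymbol", "quoteCurrency", "assetType"];

const withExtra = parseStringFields(fields, {
  tokenSymbol: "ETH",
  quoteCurrency: "USD",
  assetType: "crypto",
  urgency: "high", // a key the model made up
});
const missing = parseStringFields(fields, { tokenSymbol: "ETH" });

console.log(withExtra);
console.log(missing);
```

The made-up urgency key is gone from the first result, and the second call fails before any action code would have run.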

Steps 9–11: First Tool Execution — Find the Feed ID

Step 9. The Pyth action provider's fetchPriceFeed method runs. No wallet involved. This is a pure HTTP call:

const url = `https://hermes.pyth.network/v2/price_feeds?query=${args.tokenSymbol}&asset_type=${args.assetType}`;
const response = await fetch(url);
const data = await response.json();

Step 10. The provider filters the response to match ETH/USD exactly and returns a JSON string:

{ "success": true, "priceFeedID": "...", "tokenSymbol": "ETH", "quoteCurrency": "USD" }

  • Step 11. That string goes back into the LLM context as a tool result. The ReAct loop continues.
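
The lookup in steps 9 and 10 has two parts worth sketching: building the query URL and picking the exact pair out of a fuzzy search result. The FeedEntry shape below (id, attributes.base, attributes.quote_currency) is an assumption for illustration, the sample data is made up, and buildFeedSearchUrl and pickExactFeed are hypothetical helpers.

```typescript
// Build the query with URLSearchParams so unusual symbols are escaped correctly.
function buildFeedSearchUrl(tokenSymbol: string, assetType: string): string {
  const params = new URLSearchParams({ query: tokenSymbol, asset_type: assetType });
  return `https://hermes.pyth.network/v2/price_feeds?${params.toString()}`;
}

// Assumed response entry shape; check the Hermes API docs before relying on these field names.
interface FeedEntry {
  id: string;
  attributes: { base: string; quote_currency: string };
}

// A search for "ETH" can also match other symbols, so compare base and quote exactly.
function pickExactFeed(feeds: FeedEntry[], base: string, quote: string): FeedEntry | undefined {
  for (const feed of feeds) {
    const sameBase = feed.attributes.base.toUpperCase() === base.toUpperCase();
    const sameQuote = feed.attributes.quote_currency.toUpperCase() === quote.toUpperCase();
    if (sameBase && sameQuote) {
      return feed;
    }
  }
  return undefined;
}

// Made-up sample data with placeholder IDs.
const sampleFeeds: FeedEntry[] = [
  { id: "feed-a", attributes: { base: "ETHFI", quote_currency: "USD" } },
  { id: "feed-b", attributes: { base: "ETH", quote_currency: "USD" } },
];

const feedUrl = buildFeedSearchUrl("ETH", "crypto");
const match = pickExactFeed(sampleFeeds, "eth", "usd");
console.log(feedUrl);
console.log(match);
```

The exact-match step matters because a substring search would happily return the first entry, which is a different asset.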

Steps 12–15: Second Tool Execution — Fetch the Price

  • Step 12. The LLM now has a real Pyth feed ID. It calls fetch_price with that ID.

Step 13. Another HTTP call to Pyth Hermes:

const url = `https://hermes.pyth.network/v2/updates/price/latest?ids[]=${args.priceFeedID}`;

Step 14. Pyth returns raw price data with a price and exponent:

{ "price": "184553000000", "expo": -8 }

The provider computes: 184553000000 × 10^-8 = 1845.53. No wallet. No gas. No transaction. No CDP signing.

  • Step 15. The formatted price string returns to the LLM context.
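
The price-and-exponent arithmetic is easy to get subtly wrong with floating point, so here is a string-based sketch. scalePrice is a hypothetical helper, not the provider's own code, and it only handles non-negative integer price strings.

```typescript
// Apply a Pyth-style exponent to an integer price string without floating point.
function scalePrice(price: string, expo: number): string {
  if (expo >= 0) {
    return price === "0" ? "0" : price + "0".repeat(expo);
  }
  const decimals = -expo;
  // Left-pad so there is always at least one digit before the decimal point.
  const padded =
    price.length > decimals ? price : "0".repeat(decimals - price.length + 1) + price;
  const whole = padded.slice(0, padded.length - decimals);
  // Drop trailing zeros from the fractional part.
  const fraction = padded.slice(padded.length - decimals).replace(/0+$/, "");
  return fraction.length > 0 ? `${whole}.${fraction}` : whole;
}

console.log(scalePrice("184553000000", -8)); // 1845.53
console.log(scalePrice("5", -2));            // 0.05
```

Working on the digits directly keeps every digit Pyth sent, which matters more for small-value tokens than for ETH.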

Steps 16–17: Final Answer

  • Step 16. The LLM has enough. It writes: "The current price of ETH is $1,845.53 USD."
  • Step 17. The backend streams this to the frontend. React adds the gray agent bubble. isThinking(false). Done.

The entire flow was:

  1. LLM selected tools →
  2. Zod validated args →
  3. action providers made HTTP calls →
  4. results flowed back into LLM context →
  5. LLM wrote a sentence.
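
The flow above can be sketched as a tiny loop. Everything here is a stand-in: scriptedModel plays the LLM with hard-coded decisions, the two tools return canned strings instead of making HTTP calls, and runLoop is a hypothetical name, not LangChain's implementation.

```typescript
type ModelStep =
  | { kind: "tool_call"; name: string; args: Record<string, string> }
  | { kind: "final"; text: string };

type SketchTool = (args: Record<string, string>) => string;

// Canned tools; real ones would call Pyth Hermes.
const sketchTools: Record<string, SketchTool> = {
  fetch_price_feed: () => '{"success":true,"priceFeedID":"feed-123"}',
  fetch_price: () => "1845.53",
};

// Scripted stand-in for the LLM: picks the next step from how many results it has seen.
function scriptedModel(toolResults: string[]): ModelStep {
  if (toolResults.length === 0) {
    return { kind: "tool_call", name: "fetch_price_feed", args: { tokenSymbol: "ETH" } };
  }
  if (toolResults.length === 1) {
    return { kind: "tool_call", name: "fetch_price", args: { priceFeedID: "feed-123" } };
  }
  return { kind: "final", text: `The current price of ETH is $${toolResults[1]} USD.` };
}

function runLoop(model: (results: string[]) => ModelStep, maxSteps: number): string {
  const toolResults: string[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = model(toolResults);
    if (step.kind === "final") {
      return step.text;
    }
    const tool = sketchTools[step.name];
    // Unknown tools also come back as strings, so the loop itself never throws.
    toolResults.push(tool ? tool(step.args) : `Error: unknown tool ${step.name}`);
  }
  return "Error: step limit reached";
}

console.log(runLoop(scriptedModel, 5)); // The current price of ETH is $1845.53 USD.
```

The step limit is a design choice worth copying: without it, a model that keeps calling tools would loop forever.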

So far, we have covered the assembly (two files), the anatomy (four-part action interface), and the read path (17 steps, zero transactions).

Now let's trace what changes when the agent needs to write to the blockchain.


The Write Path: "Wrap 0.1 ETH"

The read path never touched the wallet provider.

The write path is where the full stack activates — from LLM tool selection all the way through MPC signing, bundler submission, and on-chain execution.

You type "wrap 0.1 ETH."

Here is every step.

Steps 1–5: Same as the Read Path

The UI, POST request, agent creation, and tool preparation are identical. The divergence starts at step 6.

Step 6: LLM Selects wrap_eth

The LLM reads the description of wrap_eth:

This tool wraps ETH to WETH.
Inputs:
- Amount of ETH to wrap in human-readable format (e.g., 0.1 for 0.1 ETH).

It emits:

{ "name": "wrap_eth", "args": { "amountToWrap": "0.1" } }

Step 7: Zod Validates

The schema enforces structure:

const WrapEthSchema = z.object({
  amountToWrap: z.string()
    .describe("Amount of ETH to wrap in human-readable format (e.g., 0.1 for 0.1 ETH)"),
}).strip();

.strip() drops any extra keys the LLM might hallucinate. The validated args reach invoke.

Step 8: Network Gate

Before the action runs, supportsNetwork checks whether a WETH contract address exists for the current network.

If you are on a network without a known WETH deployment, the action returns an error string. The LLM never attempts the transaction.

Step 9: Unit Conversion

The invoke body converts human-readable units to atomic units:

const amountInWei = parseUnits(args.amountToWrap, 18);
// 0.1 ETH → 100000000000000000 wei

The LLM speaks in human numbers. The chain speaks in wei.

This conversion is where many hand-built agents introduce bugs — either by forgetting the decimals or by passing the wrong precision. AgentKit handles it inside the action.
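
For a feel of what the conversion involves, here is a dependency-free sketch. toAtomicUnits is a hypothetical helper and not viem's parseUnits: it only accepts non-negative plain decimals and rejects excess precision rather than rounding.

```typescript
// Convert a human-readable decimal string into atomic units using string math only.
function toAtomicUnits(amount: string, decimals: number): bigint {
  if (!/^\d+(\.\d+)?$/.test(amount)) {
    throw new Error(`Invalid amount: ${amount}`);
  }
  const parts = amount.split(".");
  const whole = parts[0];
  const fraction = parts.length > 1 ? parts[1] : "";
  if (fraction.length > decimals) {
    throw new Error(`Too many decimal places: ${amount}`);
  }
  // Right-pad the fraction to exactly `decimals` digits, then read the digits as one integer.
  const padded = fraction + "0".repeat(decimals - fraction.length);
  return BigInt(whole + padded);
}

console.log(toAtomicUnits("0.1", 18).toString()); // 100000000000000000
console.log(toAtomicUnits("1.5", 6).toString());  // 1500000
```

The second call shows why the decimals argument matters: the same code covers 6-decimal tokens, and using 18 there would overstate the amount by twelve orders of magnitude.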

Step 10: Balance Pre-Flight Check

Before submitting anything on-chain, the action checks whether you have enough ETH:

const ethBalance = await walletProvider.getBalance();
if (ethBalance < amountInWei) {
  return `Error: Insufficient ETH balance. Requested ${args.amountToWrap}, only ${formatUnits(ethBalance, 18)} available.`;
}

This is the first "so what?" moment. The pre-flight check returns an error as a string instead of throwing an exception. The LLM reads the error, understands the situation, and can tell the user "You don't have enough ETH" — instead of the agent loop crashing with an unhandled exception.

Step 11: Building the Transaction

The action constructs the transaction object:

const hash = await walletProvider.sendTransaction({
  to: wethAddress,                // the WETH contract on this network
  data: encodeFunctionData({
    abi: WETH_ABI,
    functionName: "deposit"
  }),
  value: amountInWei,             // send 0.1 ETH with the call
});

encodeFunctionData from viem encodes the function selector for WETH's deposit(). The value field sends ETH along with the call.

The WETH contract receives the ETH and mints an equal amount of WETH back to the sender.

But the action does not submit this transaction itself. It calls walletProvider.sendTransaction(...).

What happens next depends entirely on which wallet provider you configured.

Step 12: The Wallet Provider Takes Over

If you are using CdpSmartWalletProvider (the default scaffold choice), the wallet provider:

  1. Wraps the transaction into an ERC-4337 UserOperation — a data structure that describes the intended action, gas limits, and the smart account that should execute it.
  2. Signs the UserOp via CDP MPC infrastructure — no single server holds the full private key. The signing happens server-side through Coinbase's key sharding.
  3. Submits the signed UserOp to a Pimlico bundler — a service that batches UserOperations and submits them to the blockchain.
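
For orientation, here is roughly what a UserOperation looks like as data. The field names follow the struct in the ERC-4337 specification (EntryPoint v0.6 layout); which EntryPoint version a given wallet provider targets is an assumption here, and buildUnsignedOp and attachSignature are hypothetical helpers with placeholder values.

```typescript
// All numeric fields are hex strings in this sketch.
interface UserOperationSketch {
  sender: string;               // the smart account that will execute the call
  nonce: string;
  initCode: string;             // non-empty only when the account still has to be deployed
  callData: string;             // the encoded call the account should make
  callGasLimit: string;
  verificationGasLimit: string;
  preVerificationGas: string;
  maxFeePerGas: string;
  maxPriorityFeePerGas: string;
  paymasterAndData: string;     // "0x" when no paymaster sponsors the gas
  signature: string;            // filled in after signing
}

// Build an unsigned operation around some call data. Gas values are placeholders.
function buildUnsignedOp(sender: string, callData: string): UserOperationSketch {
  return {
    sender,
    nonce: "0x0",
    initCode: "0x",
    callData,
    callGasLimit: "0x0",
    verificationGasLimit: "0x0",
    preVerificationGas: "0x0",
    maxFeePerGas: "0x0",
    maxPriorityFeePerGas: "0x0",
    paymasterAndData: "0x",
    signature: "0x",
  };
}

// Signing yields a new object; the unsigned one is left untouched.
function attachSignature(op: UserOperationSketch, signature: string): UserOperationSketch {
  return { ...op, signature };
}

const unsignedOp = buildUnsignedOp("0x0000000000000000000000000000000000000001", "0xd0e30db0");
const signedOp = attachSignature(unsignedOp, "0xsig"); // "0xsig" is a placeholder, not a real signature
console.log(signedOp);
```

Note that the action's to, data, and value never appear at the top level: they are folded into callData, which the smart account unpacks when it executes.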

If you are using CdpEvmWalletProvider (simpler, EOA-based), the wallet provider signs a raw transaction via CDP MPC and broadcasts it directly — no UserOp, no bundler, no EntryPoint.

If you are using EthAccountWalletProvider (self-custodial), your own viem WalletClient signs and submits. No CDP dependency at all.

The action provider never knows which path was taken.

That is the decoupling at work.

Step 13: On-Chain Execution (Smart Account Path)

For the CdpSmartWalletProvider path, the on-chain execution is:

  1. The bundler calls EntryPoint.handleOps() with your UserOp.
  2. EntryPoint calls validateUserOp() on your ZeroDev Kernel smart account — checking the MPC signature.
  3. If valid, EntryPoint calls execute() on the smart account.
  4. The smart account calls WETH's deposit() with 0.1 ETH.
  5. WETH mints 0.1 WETH to your smart account address.
  6. The transaction completes. A tx hash exists.

Step 14: Receipt and Return

Back in the action provider:

await walletProvider.waitForTransactionReceipt(hash);
return `Wrapped ${args.amountToWrap} ETH to WETH. Transaction hash: ${hash}`;

That string — with the tx hash — goes back into the LLM context.

The LLM writes a final message to the user confirming the wrap. Done.

The universal invoke shape, visible across every write action in the SDK:

(validated args) → normalize input → pre-flight checks → do the work via walletProvider → return a string.
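
That shape can be sketched end to end in one function. This is an illustration under stated assumptions, not the SDK's code: ShapeWallet is a hypothetical synchronous interface, ethToWei is a minimal helper, and the wallet below is a fake with a placeholder hash.

```typescript
// Hypothetical, synchronous wallet interface for illustration only.
interface ShapeWallet {
  getBalance(): bigint;
  sendTransaction(tx: { to: string; data: string; value: bigint }): string;
}

// Minimal decimal-string to wei conversion (18 decimals, non-negative amounts).
function ethToWei(amount: string): bigint {
  if (!/^\d+(\.\d{1,18})?$/.test(amount)) {
    throw new Error(`Invalid amount: ${amount}`);
  }
  const parts = amount.split(".");
  const fraction = parts.length > 1 ? parts[1] : "";
  return BigInt(parts[0] + fraction + "0".repeat(18 - fraction.length));
}

function wrapEthShape(wallet: ShapeWallet, args: { amountToWrap: string }): string {
  try {
    // 1. Normalize input.
    const amountInWei = ethToWei(args.amountToWrap);
    // 2. Pre-flight check.
    if (wallet.getBalance() < amountInWei) {
      return "Error: Insufficient ETH balance.";
    }
    // 3. Do the work via the wallet.
    const hash = wallet.sendTransaction({
      to: "0x4200000000000000000000000000000000000006", // WETH on Base
      data: "0xd0e30db0", // 4-byte selector for deposit()
      value: amountInWei,
    });
    // 4. Return a string.
    return `Wrapped ${args.amountToWrap} ETH to WETH. Transaction hash: ${hash}`;
  } catch (err) {
    // Anything unexpected is also returned as a string.
    return `Error: ${err instanceof Error ? err.message : String(err)}`;
  }
}

// Fake wallet holding 0.05 ETH.
const smallWallet: ShapeWallet = {
  getBalance: () => BigInt("50000000000000000"),
  sendTransaction: () => "0xplaceholder",
};

console.log(wrapEthShape(smallWallet, { amountToWrap: "0.1" }));  // Error: Insufficient ETH balance.
console.log(wrapEthShape(smallWallet, { amountToWrap: "0.01" })); // Wrapped 0.01 ETH to WETH. Transaction hash: 0xplaceholder
```

Every exit from the function is a string, including the malformed-input case, which is what lets the agent loop treat success and failure uniformly.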


The Three invoke Flavors

We have traced two paths. Let's compress the pattern.

Every action in AgentKit falls into one of three categories:

  1. Read-only, no transaction. Tools like get_wallet_details — reads state from the wallet provider, formats prose, returns a string. No gas, no signing.
  2. Write/transaction. Tools like wrap_eth and native_transfer — converts units, pre-flight checks, calls walletProvider.sendTransaction(...), waits for receipt, returns tx hash as string.
  3. External API, no wallet. Tools like fetch_price — makes a plain HTTP fetch() to an external service (Pyth Hermes, CoinGecko, DefiLlama), returns the response as a JSON string. No wallet, no chain interaction at all.

The LLM does not know which flavor it is calling.

It sees a tool name and a description. The AgentKit abstraction handles the rest.


Now, Imagine Building Without AgentKit…

AgentKit abstracts a lot. Let's see what you would write to wrap ETH without it — using raw viem — so you can feel the weight of what the abstraction removes.

import { createWalletClient, http, parseUnits, encodeFunctionData } from "viem";
import { privateKeyToAccount } from "viem/accounts";
import { base } from "viem/chains";

const account = privateKeyToAccount("0xYOUR_PRIVATE_KEY");
const client = createWalletClient({ account, chain: base, transport: http() });

const hash = await client.sendTransaction({
  to: "0x4200000000000000000000000000000000000006", // WETH on Base
  data: encodeFunctionData({ abi: WETH_ABI, functionName: "deposit" }),
  value: parseUnits("0.1", 18),
});

Five lines, plus imports.

No LLM integration. No argument validation. No pre-flight balance check. No error-as-string return. No network gating. And you are managing your own private key in plaintext.

To get what AgentKit gives you — LLM tool selection, Zod validation, multiple wallet provider options, error handling that flows back into the LLM context, network-aware tool visibility — you would need to build:

  1. A tool registry that converts your functions into LLM-callable tools with schemas.
  2. A validation layer (Zod or equivalent) between the LLM's output and your execution code.
  3. A wallet abstraction that supports both EOA and smart account signing paths.
  4. Pre-flight balance and network checks for every write action.
  5. Error serialization that returns string results instead of throwing.

AgentKit is not a thin wrapper around viem. It is the entire agent-to-chain bridge.

The five lines above are the on-chain part. AgentKit is the other 16 steps.


Cheers Decipherers.
