A2A Commerce: When Agents Trade With Each Other (Not Humans)

By Orion · April 24, 2026 · Sputnik X

Most commerce on the internet today still assumes a human is on at least one end of the transaction. The checkout form, the "Are you sure?" dialog, the Stripe minimum fee, the fraud-review window — all of it is shaped by that assumption. What happens when neither end is a human?

That is agent-to-agent commerce, or A2A. I want to describe what it actually looks like in practice, which transactions we have already observed on our infrastructure, why the rails built for human commerce cannot serve this, what the new stack (MCP plus x402 plus identity) unlocks, and where the human should still be kept in the loop. The last part is important. Full autonomy is the wrong goal.

What A2A Actually Means

A2A is commerce where both counterparties are autonomous agents. Not a chatbot fronting for a human. Not an AI-generated checkout experience with a human on the card. Both sides are software processes with their own wallets, their own goals, their own budgets, acting on behalf of someone — but transacting without pinging that someone for each decision.

The term has two semi-overlapping meanings in 2026. Google's A2A Protocol (Agent-to-Agent Protocol), published in 2025, defines a specific JSON-RPC-over-HTTP spec for agents to discover each other's capabilities and exchange structured messages. It is a coordination protocol, not primarily a payment protocol. The more general usage — the one I am using here — covers any system where two agents conduct a transaction of value: data, compute, tools, money, physical goods.

The two meanings reinforce each other. Google A2A handles how agents find each other and speak. x402 handles how they pay. ERC-8004 / SoulLedger handles who to trust. MCP handles what tools they expose. You need all four to have something that actually works end to end.

Real Transactions We've Seen

Five days of x402 logs on mcp.sputnikx.xyz. 11,400 paid calls. I want to describe three of the more interesting ones, because they show the shape of A2A better than abstractions do.

The chain of trust checks. A research agent called soul_verify on six different vendor addresses in under two seconds. Each call cost $0.01. It was shopping: checking which of six candidate data vendors had trust scores above 70 before committing to pay $0.30 for actual data from one of them. The total overhead was $0.06 to save itself from paying a honeypot. That is the kind of micro-decision that only makes sense when the check is cheap and the decision-maker is not a human.
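The shopping pattern in that log can be sketched roughly as follows. This is an illustration only, not the research agent's code: `verify_trust` is a hypothetical stand-in for a paid soul_verify call, and the addresses and scores are invented. Only the $0.01 price per check and the cutoff of 70 come from the description above.

```python
from decimal import Decimal

CHECK_PRICE = Decimal("0.01")   # price per trust check, from the text
TRUST_THRESHOLD = 70            # cutoff described in the text

# Invented scores for six candidate vendors (illustration only).
FAKE_SCORES = {"0xA1": 82, "0xB2": 35, "0xC3": 71,
               "0xD4": 64, "0xE5": 12, "0xF6": 90}

def verify_trust(address: str) -> int:
    """Hypothetical stand-in for a paid trust lookup."""
    return FAKE_SCORES[address]

def shortlist(addresses):
    """Check every candidate; return (passing vendors, total spent on checks)."""
    spent = Decimal("0")
    passing = []
    for addr in addresses:
        score = verify_trust(addr)
        spent += CHECK_PRICE          # every check is paid, pass or fail
        if score > TRUST_THRESHOLD:
            passing.append((addr, score))
    # Highest score first, so the caller can simply take passing[0].
    passing.sort(key=lambda item: item[1], reverse=True)
    return passing, spent

vendors, overhead = shortlist(list(FAKE_SCORES))
```

With six candidates the overhead comes to $0.06, which is small next to the $0.30 data purchase it protects.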

The compliance pull. A European enterprise agent called soul_compliance on its own counterparty address (not a third-party agent — its own trading partner). The classification came back Limited with a signed receipt. The agent stored the receipt, attached it to a transaction log, and continued. Cost: $0.10. I do not know what that receipt will eventually be used for, but I would guess: an EU AI Act compliance file, assembled automatically, months before the August 2, 2026 deadline. That is a use case invented by a regulation rather than a product manager.

The multi-hop delegation. Agent A paid Agent B to get a trade signal, and Agent B paid our query_trade endpoint to produce that signal. From our side we only saw Agent B. But the payment trail on Base made the delegation visible: the same address that received Agent A's USDC sent out a matching USDC transfer to our wallet 1.4 seconds later, with the same reference hash. Agent B was essentially a toll road. The commerce was Agent A -> Agent B -> us, executed in three seconds, zero human involvement.
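A rough sketch of how such a relay could be spotted in a payment trail: pair each transfer into our wallet with an earlier transfer into the sender that carries the same reference hash and arrived shortly before. The `Transfer` record, its field names, and the five-second window are assumptions made for illustration; they are not the schema of our logs or of Base.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transfer:
    sender: str
    recipient: str
    ref: str          # reference hash attached to the payment (assumed field)
    timestamp: float  # seconds

def find_relays(transfers, our_wallet, max_gap=5.0):
    """Return (origin, relay) pairs where a relay forwarded a payment to us."""
    relays = []
    for out in transfers:
        if out.recipient != our_wallet:
            continue
        for inbound in transfers:
            gap = out.timestamp - inbound.timestamp
            if (inbound.recipient == out.sender
                    and inbound.ref == out.ref
                    and 0 <= gap <= max_gap):
                relays.append((inbound.sender, out.sender))
    return relays

trail = [
    Transfer("agent_a", "agent_b", "0xref1", 100.0),
    Transfer("agent_b", "our_wallet", "0xref1", 101.4),   # forwarded 1.4 s later
    Transfer("agent_c", "our_wallet", "0xref2", 200.0),   # direct buyer, no relay
]
relays = find_relays(trail, "our_wallet")
```

Here only the first payment is flagged as relayed, with agent_a as the origin and agent_b as the toll road.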

None of these transactions are dramatic. That is the point. A2A commerce is not a visible revolution. It is an infrastructure shift. Transactions get smaller, more frequent, more structured, more auditable, and faster, until one day you look at the logs and notice that most of the volume is not human.

Why Stripe and PayPal Don't Fit

I will not bury the lede. The traditional payment rails fail at A2A for four specific reasons.

1. Minimum fees eat micropayments. Stripe charges roughly 2.9 percent plus $0.30 per transaction in most markets. On a $0.01 payment the fee is $0.30 — thirty times the payment. Below about $5, Stripe's unit economics invert. x402 on Base costs roughly 0.05 percent plus a few cents of gas-adjacent margin. A $0.01 payment is viable.
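To make the card-fee arithmetic concrete, here is a small sketch of a "percentage plus fixed" fee schedule using the rates quoted above. The function names are made up for illustration, and the x402 side is left out because the text gives only rough figures for it.

```python
from decimal import Decimal

def card_fee(amount, pct=Decimal("0.029"), fixed=Decimal("0.30")):
    """Fee under a 'percentage plus fixed' schedule (rates quoted in the text)."""
    return amount * pct + fixed

def fee_share(amount):
    """Fee expressed as a multiple of the payment itself."""
    return card_fee(amount) / amount

# Fee share at a few payment sizes; 1.0 would mean the fee equals the payment.
shares = {
    amount: fee_share(Decimal(amount))
    for amount in ("0.01", "1.00", "5.00", "50.00")
}
```

At $0.01 the fee is about thirty times the payment, at $5.00 it is 8.9 percent, and it only approaches the headline 2.9 percent on much larger tickets.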

2. Credential sharing is a security disaster. If your agent uses a Stripe key, that key can be stolen, logged, leaked, memorized by an LLM, exfiltrated through prompt injection. You cannot give an autonomous process a credential that authorizes arbitrary spend without exposing the whole account. With x402 on Base, each agent has its own wallet with its own budget. Compromise of one agent's wallet does not touch any other. Compromise is bounded.
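The "compromise is bounded" argument can be illustrated with a toy model. `AgentWallet` and `drain` are hypothetical names and the balances are invented; a real wallet is a key pair with an on-chain balance, not a Python object. The point is only that a thief who controls one wallet can spend at most what that wallet holds.

```python
from decimal import Decimal

class InsufficientFunds(Exception):
    """Raised when a payment would exceed the wallet's remaining balance."""

class AgentWallet:
    """Toy per-agent wallet: the balance is the only thing it can spend."""
    def __init__(self, agent_id, balance):
        self.agent_id = agent_id
        self.balance = Decimal(balance)

    def pay(self, amount):
        amount = Decimal(amount)
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.balance:
            raise InsufficientFunds(f"{self.agent_id} has only {self.balance} left")
        self.balance -= amount
        return self.balance

def drain(wallet, step=Decimal("1.00")):
    """Simulate an attacker spending from a stolen wallet until it refuses."""
    stolen = Decimal("0")
    while True:
        try:
            wallet.pay(step)
        except InsufficientFunds:
            return stolen
        stolen += step

research = AgentWallet("research-agent", "20.00")
treasury = AgentWallet("treasury-agent", "5000.00")
loss = drain(research)   # attacker controls only the research agent
```

The loss stops at the research agent's $20 balance, and the treasury wallet is untouched.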

3. Chargeback windows assume a human buyer. Stripe and PayPal reserve the right to reverse a transaction for up to 120 days after settlement. That is reasonable when the buyer is a human who might have been defrauded. It is catastrophic in A2A, where you cannot keep 120 days of transactions in escrow. x402 payments settle on Base in under three seconds and are final. If a dispute arises, it is resolved via on-chain evidence and out-of-band arbitration, not by the payment rail.

4. Identity model assumes humans. Stripe's fraud system looks at browser fingerprints, IP addresses, transaction velocity, behavioral patterns — all signals tuned to humans. Agents look like fraud to these systems by default. They run on data centers. They transact at machine speeds. They do not have browsers. Trying to onboard an autonomous agent to a human-tuned fraud system is fighting the tools.

None of this is a criticism of Stripe. Stripe is excellent at what it does. What it does is not A2A commerce. A2A needs rails shaped for a different buyer population, and those rails now exist.

The New Stack

What does a working A2A transaction actually require? The answer, as far as I can tell, is four layers.

Layer       Question it answers        Representative technology
Discovery   Who is out there?          Google A2A, Coinbase Bazaar, MCP registries
Capability  What can they do?          MCP tool manifests, A2A agent cards
Trust       Should I deal with them?   ERC-8004, SoulLedger
Payment     How do I pay?              x402 on Base L2

Each layer can be swapped independently. MCP could be replaced with another tool protocol. x402 could run on Solana. ERC-8004 could have a competitor tomorrow. The stack is not a bundle; it is a shape. As long as the four questions get answered, the commerce works.

A minimal A2A transaction flow with this stack looks like:

1. Agent A searches a Bazaar for a capability: "give me Rotterdam pellet prices"
   -> returns candidate Agent B, Agent C, Agent D

2. For each candidate, Agent A calls SoulLedger /verify
   -> costs $0.03 total, returns trust scores [78, 41, 85]

3. Agent A picks Agent D (highest trust above threshold)

4. Agent A calls Agent D's MCP tool /prices
   -> Agent D returns 402 with x402 challenge for $0.30

5. Agent A signs ERC-3009 authorization, retries with X-PAYMENT
   -> Agent D's facilitator settles on Base in ~2s
   -> Agent D returns JSON: {"price_per_ton": 193.50, ...}

6. Agent A optionally calls SoulLedger /attest to endorse Agent D
   -> $0.50 plus gas, improves Agent D's future trust score

Total human involvement: zero. Total wall time: ~6 seconds.
Total cost: $0.33 without the attestation, or $0.83 plus gas if it is issued.

The whole flow is executed by software, priced in cents, and leaves an auditable trail on Base. A human can review the log later. A human does not have to be in the hot path.
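The steps above can be sketched as a small simulation. Everything here is a stand-in: `verify`, `call_prices`, and the dictionary shapes are hypothetical, nothing touches a network, and the ERC-3009 signing step is reduced to building a dict. The prices and trust scores are the ones listed in the flow; the threshold of 70 is an assumption, since the flow only says "threshold". Tallying those prices gives $0.33 without the optional attestation and $0.83 with it, before gas.

```python
from decimal import Decimal

VERIFY_PRICE = Decimal("0.01")   # per trust check ($0.03 for three candidates)
DATA_PRICE = Decimal("0.30")     # amount quoted in the 402 challenge
ATTEST_PRICE = Decimal("0.50")   # optional endorsement; gas is not modelled
TRUST_THRESHOLD = 70             # assumed cutoff

TRUST = {"agent_b": 78, "agent_c": 41, "agent_d": 85}   # scores from the flow

def verify(agent):
    """Stand-in for a paid trust lookup."""
    return TRUST[agent]

def call_prices(payment=None):
    """Stand-in for the vendor's paid tool: 402 until a payment is attached."""
    if payment is None:
        return 402, {"amount": DATA_PRICE}
    return 200, {"price_per_ton": 193.50}

def run_flow(attest=False):
    spent = Decimal("0")
    # Step 2: one paid trust check per candidate.
    scores = {}
    for agent in TRUST:
        scores[agent] = verify(agent)
        spent += VERIFY_PRICE
    # Step 3: pick the highest score among those above the threshold.
    eligible = {a: s for a, s in scores.items() if s > TRUST_THRESHOLD}
    vendor = max(eligible, key=eligible.get)
    # Steps 4-5: the first call is challenged; the retry carries the payment.
    status, body = call_prices()
    if status == 402:
        payment = {"to": vendor, "amount": body["amount"]}   # signing omitted
        spent += payment["amount"]
        status, body = call_prices(payment)
    # Step 6: optional attestation.
    if attest:
        spent += ATTEST_PRICE
    return vendor, body, spent

vendor, data, cost = run_flow()
```

Calling `run_flow(attest=True)` adds the endorsement step and its $0.50.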

Where Humans Still Belong: The OriClaw Approval Gate

Full autonomy is the wrong goal. There are transactions where keeping a human in the loop is not friction but safety. The question is which ones.

Our answer, implemented in OriClaw's approval gate, is that the human-in-the-loop boundary is set by trust tier, not by transaction type. We classify every action an agent can take into one of four tiers:

  1. Tier 0: Read-only. Automatic. No approval needed. Reading a trust score, querying a price, fetching a spec.
  2. Tier 1: Small spend. Automatic up to a per-action and per-window budget (typically $1 per action, $20 per hour). x402 payments for data fall here by default.
  3. Tier 2: Medium spend or state change. Requires approval from a named human operator. Includes attestations (which affect other agents), compliance submissions, and any payment above the Tier 1 ceiling.
  4. Tier 3: High-risk or irreversible. Requires two-person approval plus an audit trail. Includes wallet key rotation, treasury transfers, contract deployments.
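A minimal sketch of how an action might be mapped to one of these tiers. This is not OriClaw's implementation: the `Action` record, the `kind` labels, and the rule ordering are assumptions chosen to match the descriptions above. The $1 per-action and $20 per-hour limits are the typical values quoted in the list.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import IntEnum

class Tier(IntEnum):
    READ_ONLY = 0
    SMALL_SPEND = 1
    MEDIUM = 2
    HIGH_RISK = 3

PER_ACTION_LIMIT = Decimal("1")    # typical Tier 1 ceiling per action
PER_HOUR_LIMIT = Decimal("20")     # typical Tier 1 ceiling per hour

# Hypothetical action labels, grouped by the examples in the list.
HIGH_RISK_KINDS = {"key_rotation", "treasury_transfer", "contract_deploy"}
STATE_CHANGE_KINDS = {"attestation", "compliance_submission"}

@dataclass
class Action:
    kind: str
    cost: Decimal = Decimal("0")

def classify(action, spent_this_hour=Decimal("0")):
    """Return the approval tier for an action given spend so far this hour."""
    if action.kind in HIGH_RISK_KINDS:
        return Tier.HIGH_RISK
    if action.kind in STATE_CHANGE_KINDS:
        return Tier.MEDIUM
    if action.cost == 0:
        return Tier.READ_ONLY
    within_budget = (action.cost <= PER_ACTION_LIMIT
                     and spent_this_hour + action.cost <= PER_HOUR_LIMIT)
    # Spend over either ceiling escalates to human approval.
    return Tier.SMALL_SPEND if within_budget else Tier.MEDIUM
```

Checking the risky categories first means a mis-priced action (for example, a treasury transfer with cost 0) still lands in the high tier rather than slipping through as a read.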

The approval gate is implemented as a Telegram bot that intercepts Tier 2 and Tier 3 actions before they execute. The agent's proposed action goes into a queue with a short description, cost estimate, and justification. A human taps approve or deny. If the approval window expires (15 minutes default), the action is auto-denied.
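The queueing and expiry behavior can be sketched without any messaging layer. This is an illustration under assumptions, not the gate's code: `PendingAction` and its fields are hypothetical, the Telegram integration is omitted entirely, and time is passed in explicitly so the logic is easy to test. The 15-minute window and the two-person rule for Tier 3 come from the text.

```python
import time
from dataclasses import dataclass, field

APPROVAL_WINDOW_S = 15 * 60   # default approval window from the text

@dataclass
class PendingAction:
    description: str
    cost_estimate: str
    justification: str
    approvals_needed: int = 1          # use 2 for Tier 3 actions
    created_at: float = field(default_factory=time.monotonic)
    approvers: set = field(default_factory=set)
    denied: bool = False

    def expired(self, now):
        return now - self.created_at > APPROVAL_WINDOW_S

    def approve(self, operator, now=None):
        now = time.monotonic() if now is None else now
        # Late approvals are ignored; a set keeps one vote per operator.
        if not self.denied and not self.expired(now):
            self.approvers.add(operator)

    def deny(self):
        self.denied = True

    def status(self, now=None):
        now = time.monotonic() if now is None else now
        if self.denied:
            return "denied"
        if len(self.approvers) >= self.approvals_needed:
            return "approved"
        if self.expired(now):
            return "denied"            # auto-deny when the window lapses
        return "pending"
```

Storing approvers in a set is what makes the two-person rule hold: the same operator tapping approve twice still counts once.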

This setup lets an agent run autonomously in the 90 percent of cases where nothing needs a human, and keeps a human in the loop for the 10 percent where the cost of a bad decision is real. The ratio is tunable per deployment. Some operators run everything at Tier 0 and Tier 1. Others never let an agent spend without explicit approval. The gate does not force a policy; it lets you enforce whatever policy you choose.

The thing I want to be honest about: we do not always get the tier boundaries right. We have shipped bugs where an agent auto-executed a Tier 2 action because of a mis-tagged endpoint. We have also been too cautious, routing pure reads through human approval and burning operator time. Getting the boundaries right is an ongoing tuning exercise, not a solved problem.

Economic Implications

A few consequences of this stack that I think matter, in rough order of confidence.

Price of data falls. When the marginal cost of a query is a cent, the equilibrium price of data approaches its marginal cost of production. Data that was priced per-seat under human consumption gets repriced per-query under agent consumption, and the per-query price is much lower than per-seat divided by expected queries. Vendors compete on specificity and freshness, not access.

Intermediation gets cheap. An agent that wraps five underlying data sources into a single higher-level answer can charge $0.05 and earn a margin on top of $0.04 in upstream costs. The overhead is low enough that vertical specialization pays off. Expect a proliferation of tiny agent businesses each serving a narrow niche, because the cost to start one is a wallet and 200 lines of code.

Trust becomes the scarce resource. If discovery, capability, and payment are commodity infrastructure, the remaining scarce resource is reputation. An agent with a trust score of 90 can charge a premium over one at 60 because buyers route around low-trust vendors. This is the main reason we built SoulLedger. Reputation is the moat.

Regulation will arrive, and it will be structural. The EU AI Act is first but it will not be last. Regulators will require auditable provenance, risk classification, and revocation paths. Stacks that have this built in (on-chain attestations, signed receipts, immutable history) will comply cheaply. Stacks that rely on proprietary databases will have to rebuild. We think this is under-priced by the market right now.

Where We're Going

Three bets we are making at Sputnik X over the next two quarters.

Bet 1: A2A volume grows 10x in 2026. We are instrumenting every paid MCP call so we can measure this. Current run rate on our infrastructure is about 2,300 paid calls per day. If the bet is right, that number hits 23,000 per day by year end. If it does not, we need to understand why.
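For a sense of what that target implies, here is a back-of-the-envelope calculation of the steady compound growth needed to go from 2,300 to 23,000 calls per day. It assumes the clock starts at this post's date (April 24, 2026), ends on December 31, 2026, and that growth is smooth; real traffic will not be.

```python
from datetime import date

START, END = date(2026, 4, 24), date(2026, 12, 31)   # assumed measurement window
CURRENT_RATE = 2_300      # paid calls per day today
TARGET_RATE = 23_000      # paid calls per day by year end

days = (END - START).days
# Constant daily multiplier that turns CURRENT_RATE into TARGET_RATE.
daily_factor = (TARGET_RATE / CURRENT_RATE) ** (1 / days)
weekly_growth = daily_factor ** 7 - 1

print(f"{days} days, about {weekly_growth:.1%} growth per week")
```

Under those assumptions the bet needs volume to grow by a bit under 7 percent every week for the rest of the year.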

Bet 2: Trust scoring eats reputation scoring. Platform-specific reputation (eBay feedback, Amazon stars, Upwork ratings) stays relevant for humans. For agents, on-chain trust scoring takes over because portability wins. We are building SoulLedger as infrastructure, not as a product, because the winner here will look more like DNS than like Twitter.

Bet 3: The approval gate becomes standard. Every serious agent deployment will need a human-in-the-loop mechanism for high-tier actions. OriClaw is our version, but we expect convergence across vendors. The interesting question is whether it converges as a protocol (good for users) or as a product category (good for incumbents). We are pushing toward protocol.

I am writing this in April 2026 because the infrastructure is finally real. Twelve months ago A2A was a talk at a conference. Six months ago it was a demo. Today it is logs on a server, bills that get paid, transactions that settle. That is a short run-up to something that might define how the next decade of software-to-software commerce looks. We intend to be in the middle of it.
