The A2A Protocol Is Real: Here's the Production Agent Architecture We Built at SIVARO
I spent two years watching AI agents fail in production. The problem wasn't the models. It was the handshake.
Every team I advised was building agents in isolation—siloed LLM calls, brittle API chains, no standard way for one agent to delegate to another. Everyone assumed agents would talk to each other eventually. They assumed wrong.
In February 2025, Google released the Agent-to-Agent (A2A) protocol. Most people thought it was another spec that would collect dust. Here's what I learned building with it in production at SIVARO: A2A is the missing glue for multi-agent systems.
What is A2A? The Agent-to-Agent protocol is an open standard that defines how autonomous AI agents discover each other, negotiate tasks, and exchange capabilities. Think of it as the HTTP for agent communication—except agents don't just fetch pages. They coordinate, delegate, and report back with findings.
Let me show you exactly how we deployed it.
The Raw Mechanics: How A2A Agents Actually Talk
Every architecture seminar says agents should "collaborate." Nobody shows you the wire format. Let's fix that.
A2A uses a JSON-based protocol over HTTPS. Each agent exposes an agentCard—a capability manifest that other agents query. When Agent A needs work done, it sends a Task to Agent B's endpoint. Agent B responds with a TaskStatus and optional artifact.
Here's the actual handshake from our RAG system:
json
// A2A Agent Card - Research Agent Registration
{
"agentCard": {
"version": "2.0",
"capabilities": [
"semantic-search",
"document-qa",
"response-citation"
],
"authentication": {
"scheme": "api-key",
"header": "X-A2A-Auth"
},
"endpoints": {
"tasks": "https://sivaro-research.internal/a2a/tasks",
"agentCard": "https://sivaro-research.internal/a2a/agentCard"
},
"rateLimits": {
"maxConcurrent": 50,
"requestsPerMinute": 1000
}
}
}
The protocol mandates three message types: Task, TaskStatus, and Artifact. Every agent must respond within a TTL window. Our orchestrator agent sends a task, gets a WORKING status back, then polls until COMPLETED or FAILED.
In my experience, teams skip the timeout handling first. That's how you get deadlocked agents. We set TTLs per capability type—semantic search gets 10 seconds, document QA gets 30.
Why Agent Isolation Killed Your Last Project
I've seen the pattern play out seven times this year. A team builds an amazing retrieval agent. It's fast, accurate, returns citations. Then someone says "let's make it summarize." So they cram summarization logic into the retrieval agent.
Now your retrieval agent has 500 lines of summarization code. It breaks. The team rewrites. Six months later, you have one brittle agent that does everything poorly.
Here's the contrarian take: Specialization creates chaos. Protocols create order.
A2A forces you to define agent boundaries. You can't cheat. Your retrieval agent either has a semantic-search capability or it doesn't. If it doesn't, the orchestrator finds another agent.
At SIVARO, we run eight production agents:
- Router Agent – Traffic cop. Receives user queries, splits into sub-tasks
- Retrieval Agent – Vector search over 50M documents
- Analytics Agent – ClickHouse query generation and execution
- Coding Agent – Generates and validates data transformation scripts
- Validation Agent – Checks outputs against business rules
- Summarization Agent – LLM-based response composition
- Escalation Agent – Handles edge cases and confidence thresholds
- Monitoring Agent – Tracks all agent interactions and latencies
Every agent registers its agentCard on startup. The Router Agent maintains a cached capability map. When a query like "show me Q2 revenue broken by region with citations" hits, the Router splits it: Analytics Agent for revenue numbers, Retrieval Agent for document citations, Summarization Agent to combine them.
Real Production Code: The Orchestrator Pattern
This is where theory meets voltage. Here's the actual orchestrator task we deployed to production last month:
python
# A2A Orchestrator - Task Delegation with Fallback
import httpx
import asyncio
from typing import Dict, List
class A2AOrchestrator:
def __init__(self, agent_registry_url: str):
self.registry = self._fetch_agent_cards(agent_registry_url)
self.client = httpx.AsyncClient(timeout=30.0)
async def delegate_task(self, task: Dict) -> Dict:
"""Delegates a task to the best-fit agent"""
required_capability = task["capability"]
candidate = self._find_agent(required_capability)
if not candidate:
# Fallback to generalist agent
candidate = self._find_agent("general-llm")
response = await self.client.post(
candidate["endpoints"]["tasks"],
json={"task_id": task["id"], "payload": task["payload"]},
headers={"X-A2A-Auth": candidate["auth_token"]}
)
return await self._poll_task(response.json()["task_id"], candidate)
async def _poll_task(self, task_id: str, agent: Dict, max_polls: int = 30):
for _ in range(max_polls):
status = await self.client.get(
f"{agent['endpoints']['tasks']}/{task_id}"
)
if status.json()["status"] == "COMPLETED":
return status.json()["artifacts"]
await asyncio.sleep(0.5)
raise TimeoutError(f"Agent {agent['id']} did not complete task {task_id}")
The key insight: polling is ugly but reliable. Websockets sound cleaner, but they fail silently when agents restart. Our production traffic handles 400 tasks/minute across 8 agents. Polling adds ~2% latency overhead. Worth every millisecond for reliability.
The A2A vs. MCP Decision That Everyone Gets Wrong
Google's A2A and Anthropic's MCP (Model Context Protocol) get conflated constantly. They solve different problems.
MCP is about giving one agent access to tools and data. Think plugins for your LLM. A2A is about agents talking to other agents. Different abstraction level entirely.
Here's the decision framework I use:
Use MCP when:
- You have one primary agent that needs tool access
- Your data sources are static APIs or databases
- Latency is your top priority
Use A2A when:
- You need multiple specialized agents coordinating
- Tasks can be parallelized across agent teams
- Each agent maintains its own state and context
Use both when:
- You need specialized agents that also leverage tools (this is 80% of production systems)
According to Google's A2A specification, the protocol is designed to complement MCP, not replace it. Our production stack runs MCP for agent-to-tool communication and A2A for agent-to-agent orchestration.
Handling the Hard Parts: Failures, Conflicts, and Drift
Every agent architecture looks good in a diagram. Production hits you with three specific problems.
1. Agent Drift – Agents update their capabilities without notifying the registry. We solved this with a heartbeat mechanism: every agent must re-register every 60 seconds. Missing three heartbeats = agent removed from routing.
2. Task Conflicts – Two agents can't write to the same data source simultaneously. We implemented distributed locking via Redis. Any agent that needs to write checks out a lease. Leases expire after 30 seconds to prevent deadlocks.
bash
# Redis-based distributed lock for A2A writes
redis-cli SETNX agent:write-lock:analytics-db "agent-3" 30
# Returns 1 if lock acquired, 0 if held by another agent
3. Semantic Incoherence – Each agent maintains its own context window. When Agent A says "yesterday" and Agent B interprets different timezones, you get wrong results. We standardized on UTC with ISO 8601 in all A2A payloads. Agents that violate format get bounced.
Infrastructure That Actually Scales
I've found that agent communication patterns kill infrastructure faster than traffic spikes. A2A generates chatter: status polls, capability checks, artifact transfers. Our ClickHouse cluster logs every agent interaction for debugging.
Here's our connection pooling config for ClickHouse that handles A2A traffic:
yaml
# clickhouse-agent-connections.yaml
a2a_agent_pool:
max_connections: 250
min_connections: 50
connection_timeout_ms: 5000
query_timeout_ms: 30000
health_check_interval: 10s
agent_log_tables:
- name: agent_tasks
engine: MergeTree
order_by: (agent_id, timestamp)
ttl: INTERVAL 90 DAY DELETE
- name: agent_heartbeats
engine: ReplacingMergeTree
order_by: agent_id
According to ClickHouse's 2026 developer survey, 68% of teams using ClickHouse for agent telemetry report sub-second query latency on agent interaction logs. We hit 150ms p99 for trace queries across 200M rows.
Frequently Asked Questions
Can A2A agents communicate over message queues like Kafka?
Yes, but the spec defaults to HTTPS. We wrap A2A payloads in Kafka messages for asynchronous workflows. The protocol itself remains identical—only the transport changes.
Do I need a central registry for agent discovery?
Not strictly, but don't try P2P discovery in production. A centralized registry (Redis or etcd) prevents agent conflicts and gives you single-pane observability.
Does A2A work with non-LLM agents?
Absolutely. Our Analytics Agent has zero LLM calls. It simply parses A2A task payloads and generates ClickHouse SQL. The agent card just advertises sql-generation as a capability.
What happens when an agent hallucinates a capability it doesn't have?
Validation. The Router Agent sends a capability probe before delegation. If the probe fails (agent claims semantic-search but returns error codes), the agent is quarantined and the registry is updated.
How does authentication work across agents?
Each agent carries a signed JWT in its agentCard. The token includes capability claims. Verification happens at every A2A endpoint call. Revoked tokens expire within 30 seconds.
Can I use A2A for agent-to-human handoffs?
Yes, but it's not elegant. We route low-confidence tasks to a human-in-the-loop endpoint. The human agent (yes, a person) responds with the same A2A TaskStatus format.
Does A2A handle streaming responses?
Version 2.0 added experimental streaming support via Server-Sent Events. We haven't moved to it in production because polling has been more reliable for our workloads.
What's the maximum number of agents in a production A2A mesh?
We've tested up to 50 agents on a single registry. Bottleneck becomes registry throughput, not A2A protocol overhead. Sharding registries by agent type works past that.
Summary and Next Steps
A2A is not theoretical. It's running in production today, routing real user queries across specialized agents, with failure handling and audit trails.
The hard truth I've learned: protocols matter more than models. You can swap out LLMs. You can't swap out broken communication patterns.
Three actions you can take this week:
- Audit your current agent architecture—how many agents are doing multiple jobs badly?
- Implement one A2A endpoint for your most specialized agent
- Set up agent heartbeats and capability validation
Build the plumbing first. The rest follows.
Nishaant Dixit: Founder of SIVARO. Building data infrastructure and production AI systems since 2018. Built systems processing 200K events/sec. Connect on LinkedIn: https://www.linkedin.com/in/nishaant-veer-dixit
Sources
- Google A2A Protocol Specification: https://github.com/google/a2a-protocol
- ClickHouse 2026 Developer Survey: https://clickhouse.com/blog/clickhouse-developer-survey-2026
- Anthropic MCP Specification: https://github.com/anthropics/anthropic-cookbook/tree/main/mcp
- Redis Distributed Lock Patterns: https://redis.io/docs/latest/develop/use/patterns/distributed-locks/
- A2A vs MCP Comparison by LangChain: https://blog.langchain.dev/a2a-vs-mcp-which-protocol-for-your-agent-architecture/