Exactly-Once Is a Fantasy: Agent Systems Need Idempotent Tools
Agents do not fail like chatbots
A chatbot can be wrong and merely embarrass itself. An agent can be wrong and send the email twice, open two tickets, charge the card twice, or half-complete a workflow and then forget where it stopped.
That difference matters because modern agents are distributed systems wearing a language interface. They reason in tokens, but they act through networks, APIs, queues, browsers, and databases. Once you add retries and side effects, reliability stops being a prompt problem.
Exactly-once is the wrong target
A surprising amount of agent design still rests on a hidden fantasy: the model will decide once, the tool will run once, and the world will change once.
Real systems do not behave that way. Requests time out. Workers restart. Browser sessions detach. A model decides to retry because the tool output looked ambiguous. Sometimes the action succeeded, but the success signal was lost on the return path.
That is why distributed systems engineers distrust “exactly once” as a primitive. In practice, what you usually get is at-least-once delivery plus uncertainty. The safe answer is to make retries harmless.
The core pattern: idempotent tool contracts
Temporal’s documentation is blunt on this point: activities should be idempotent. AWS makes the same argument for APIs that must tolerate retries under failure.
Agent tools should follow the same rule. If a tool may be retried, resumed, or replayed, then calling it multiple times with the same operation identity should produce one durable outcome, not a pile of duplicate side effects.
What this looks like in practice
Every state-changing tool should accept a stable operation key. That key can be generated by the orchestrator before the model acts, or returned by a planning step and then reused downstream.
A solid write-path usually includes:
- an operation ID or client request token
- a declared intent such as `send_invoice` or `create_issue`
- a target resource identifier
- a deduplication store that remembers completed operations
- a canonical result that can be returned again on retries
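The write path above can be sketched in a few lines. This is a minimal illustration, assuming an in-memory deduplication store; the names (`op_id`, `send_invoice`, `completed_ops`) are hypothetical, not a real API.

```python
# op_id -> canonical result; a real system would use durable storage.
completed_ops: dict[str, dict] = {}

def send_invoice(op_id: str, customer_id: str, amount_cents: int) -> dict:
    """Perform the side effect at most once per op_id; retries return the stored result."""
    if op_id in completed_ops:
        # Retry with the same operation identity: return the canonical
        # result instead of producing a duplicate side effect.
        return completed_ops[op_id]
    # The real side effect (API call, email send, etc.) would happen here.
    result = {
        "op_id": op_id,
        "status": "sent",
        "customer": customer_id,
        "amount_cents": amount_cents,
    }
    completed_ops[op_id] = result
    return result

first = send_invoice("op-123", "cust-9", 5000)
retry = send_invoice("op-123", "cust-9", 5000)
assert retry is first  # one durable outcome, no duplicate send
```

The key design choice is that the operation ID is assigned before the side effect runs, so a retry after a lost response hits the dedup store rather than the external system again.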
This is less glamorous than autonomous planning. It is also the part that prevents your “smart” agent from becoming an expensive duplicate-action generator.
Separate planning from committing
ReAct showed why interleaving reasoning and action is powerful. The model can inspect the world, update its plan, and adapt.
But the commit boundary still needs discipline. The safest pattern is not “let the model call write tools whenever it feels ready.” The safest pattern is a two-phase flow:
Phase 1: plan and gather evidence
The agent should:
- read state
- retrieve relevant memory
- inspect prior operations
- assemble arguments for the write
- explain why the write is necessary
Phase 2: commit with a durable journal
The executor should then:
- assign or confirm the operation ID
- record the pending action in durable storage
- perform the tool call
- mark the outcome as succeeded, failed, or unknown
- return the same semantic result if the operation is retried
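The commit phase above can be sketched as a single function wrapping the tool call. This is a sketch assuming an in-memory journal; `commit` and the journal schema are illustrative, not taken from any particular framework.

```python
# op_id -> {"status": ..., "args": ..., "result": ...}
journal: dict[str, dict] = {}

def commit(op_id: str, action, args: dict) -> dict:
    """Record the pending write, run it, and mark the outcome durably."""
    entry = journal.get(op_id)
    if entry and entry["status"] == "succeeded":
        # Retried operation: return the same semantic result.
        return entry["result"]
    # Record intent BEFORE acting, so a crash leaves evidence behind.
    journal[op_id] = {"status": "pending", "args": args, "result": None}
    try:
        result = action(**args)
    except TimeoutError:
        # The side effect may already have happened: this is NOT a failure.
        journal[op_id]["status"] = "unknown"
        raise
    except Exception:
        journal[op_id]["status"] = "failed"
        raise
    journal[op_id].update(status="succeeded", result=result)
    return result
```

Note that a timeout is journaled as `unknown`, not `failed`: the distinction is what allows a later retry to reuse the same `op_id` safely instead of issuing a fresh, possibly duplicate, write.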
If this sounds like workflow engine design, that is because it is. Agent systems keep rediscovering orchestration patterns that distributed systems solved years ago.
Unknown is a first-class state
The most dangerous response class in an agent system is not failure. It is unknown.
Unknown means the tool timed out after the side effect may already have happened. Unknown means the browser clicked the button, then the socket died. Unknown means the payment API never returned, but the ledger entry exists.
Do not let the model paper over this state with optimism. Instead:
- store `unknown` explicitly
- retry only with the same operation ID
- reconcile against external state before issuing a fresh write
- escalate to a verifier or human if the outcome cannot be proven safely
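The reconciliation step can be sketched as follows. This assumes a journal entry already marked `unknown`; `check_remote_state` is a hypothetical stand-in for whatever external query can prove whether the side effect landed (an orders API, a ledger lookup, a sent-mail log).

```python
def resolve_unknown(journal, op_id, check_remote_state, retry_action):
    """Reconcile an unknown outcome against external state before any fresh write."""
    entry = journal[op_id]
    assert entry["status"] == "unknown"
    # First: ask the external system whether the side effect already happened.
    remote = check_remote_state(op_id)
    if remote is not None:
        # The action landed; record the result and do NOT act again.
        entry.update(status="succeeded", result=remote)
        return remote
    # Provably absent remotely: safe to retry, reusing the SAME op_id.
    result = retry_action()
    entry.update(status="succeeded", result=result)
    return result
```

If `check_remote_state` cannot answer definitively, the right move per the list above is to stop and escalate, not to guess.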
Reliability improves the moment you stop pretending uncertainty is rare.
Memory should record operations, not just facts
Many agent stacks talk about memory as preferences, summaries, and retrieval chunks. That is useful, but incomplete.
For reliable action, the most important memory may be the execution journal: what was attempted, with which arguments, under which operation ID, and what outcome was observed. Without that journal, a resumed agent is a goldfish with admin access.
A practical memory split looks like this:
- semantic memory for user preferences and background facts
- working memory for the current task state
- execution memory for side effects, retries, and reconciliation
Most agent failures that look like “bad reasoning” are really missing execution memory.
Bottom line
If your agent can retry, resume, or recover, then exactly-once execution is not the right design goal.
The reliable path is simpler and more honest: idempotent write tools, explicit commit boundaries, durable execution journals, and a first-class unknown state. Build that substrate first. The planning loop will look much smarter when it stops duplicating reality.