Exactly-Once Is a Fantasy: Agent Systems Need Idempotent Tools
Agents do not fail like chatbots
A chatbot can be wrong and merely embarrass itself. An agent can be wrong and send the email twice, open two tickets, charge the card twice, or half-complete a workflow and then forget where it stopped.
That difference matters because modern agents are distributed systems wearing a language interface. They reason in tokens, but they act through networks, APIs, queues, browsers, and databases. Once you add retries and side effects, reliability stops being a prompt problem.
Exactly-once is the wrong target
A surprising amount of agent design still rests on a hidden fantasy: the model will decide once, the tool will run once, and the world will change once.
Real systems do not behave that way. Requests time out. Workers restart. Browser sessions detach. A model decides to retry because the tool output looked ambiguous. Sometimes the action succeeded, but the success signal was lost on the return path.
That is why distributed systems engineers distrust “exactly once” as a primitive. In practice, what you usually get is at-least-once delivery plus uncertainty. The safe answer is to make retries harmless.
The core pattern: idempotent tool contracts
Temporal’s documentation is blunt on this point: activities should be idempotent. AWS makes the same argument for APIs that must tolerate retries under failure.
Agent tools should follow the same rule. If a tool may be retried, resumed, or replayed, then calling it multiple times with the same operation identity should produce one durable outcome, not a pile of duplicate side effects.
What this looks like in practice
Every state-changing tool should accept a stable operation key. That key can be generated by the orchestrator before the model acts, or returned by a planning step and then reused downstream.
A solid write-path usually includes:
- an operation ID or client request token
- a declared intent such as `send_invoice` or `create_issue`
- a target resource identifier
- a deduplication store that remembers completed operations
- a canonical result that can be returned again on retries
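The write path above can be sketched in a few lines. This is a minimal illustration, assuming an in-memory deduplication store; the names (`op_id`, `send_invoice`, `completed_ops`) are hypothetical, not a real API.

```python
# op_id -> canonical result; a real system would use durable storage.
completed_ops: dict[str, dict] = {}

def send_invoice(op_id: str, customer_id: str, amount_cents: int) -> dict:
    """Perform the side effect at most once per op_id; retries return the stored result."""
    if op_id in completed_ops:
        # Retry with the same operation identity: return the canonical
        # result instead of producing a duplicate side effect.
        return completed_ops[op_id]
    # The real side effect (API call, email send, etc.) would happen here.
    result = {
        "op_id": op_id,
        "status": "sent",
        "customer": customer_id,
        "amount_cents": amount_cents,
    }
    completed_ops[op_id] = result
    return result

first = send_invoice("op-123", "cust-9", 5000)
retry = send_invoice("op-123", "cust-9", 5000)
assert retry is first  # one durable outcome, no duplicate send
```

The key design choice is that the operation ID is assigned before the side effect runs, so a retry after a lost response hits the dedup store rather than the external system again.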
This is less glamorous than autonomous planning. It is also the part that prevents your “smart” agent from becoming an expensive duplicate-action generator.
Separate planning from committing
ReAct showed why interleaving reasoning and action is powerful. The model can inspect the world, update its plan, and adapt.
But the commit boundary still needs discipline. The safest pattern is not “let the model call write tools whenever it feels ready.” The safest pattern is a two-phase flow:
Phase 1: plan and gather evidence
The agent should:
- read state
- retrieve relevant memory
- inspect prior operations
- assemble arguments for the write
- explain why the write is necessary
Phase 2: commit with a durable journal
The executor should then:
- assign or confirm the operation ID
- record the pending action in durable storage
- perform the tool call
- mark the outcome as succeeded, failed, or unknown
- return the same semantic result if the operation is retried
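The commit phase above can be sketched as a single function wrapping the tool call. This is a sketch assuming an in-memory journal; `commit` and the journal schema are illustrative, not taken from any particular framework.

```python
# op_id -> {"status": ..., "args": ..., "result": ...}
journal: dict[str, dict] = {}

def commit(op_id: str, action, args: dict) -> dict:
    """Record the pending write, run it, and mark the outcome durably."""
    entry = journal.get(op_id)
    if entry and entry["status"] == "succeeded":
        # Retried operation: return the same semantic result.
        return entry["result"]
    # Record intent BEFORE acting, so a crash leaves evidence behind.
    journal[op_id] = {"status": "pending", "args": args, "result": None}
    try:
        result = action(**args)
    except TimeoutError:
        # The side effect may already have happened: this is NOT a failure.
        journal[op_id]["status"] = "unknown"
        raise
    except Exception:
        journal[op_id]["status"] = "failed"
        raise
    journal[op_id].update(status="succeeded", result=result)
    return result
```

Note that a timeout is journaled as `unknown`, not `failed`: the distinction is what allows a later retry to reuse the same `op_id` safely instead of issuing a fresh, possibly duplicate, write.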
If this sounds like workflow engine design, that is because it is. Agent systems keep rediscovering orchestration patterns that distributed systems solved years ago.
Unknown is a first-class state
The most dangerous response class in an agent system is not failure. It is unknown.
Unknown means the tool timed out after the side effect may already have happened. Unknown means the browser clicked the button, then the socket died. Unknown means the payment API never returned, but the ledger entry exists.
Do not let the model paper over this state with optimism. Instead:
- store `unknown` explicitly
- retry only with the same operation ID
- reconcile against external state before issuing a fresh write
- escalate to a verifier or human if the outcome cannot be proven safely
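The reconciliation step can be sketched as follows. This assumes a journal entry already marked `unknown`; `check_remote_state` is a hypothetical stand-in for whatever external query can prove whether the side effect landed (an orders API, a ledger lookup, a sent-mail log).

```python
def resolve_unknown(journal, op_id, check_remote_state, retry_action):
    """Reconcile an unknown outcome against external state before any fresh write."""
    entry = journal[op_id]
    assert entry["status"] == "unknown"
    # First: ask the external system whether the side effect already happened.
    remote = check_remote_state(op_id)
    if remote is not None:
        # The action landed; record the result and do NOT act again.
        entry.update(status="succeeded", result=remote)
        return remote
    # Provably absent remotely: safe to retry, reusing the SAME op_id.
    result = retry_action()
    entry.update(status="succeeded", result=result)
    return result
```

If `check_remote_state` cannot answer definitively, the right move per the list above is to stop and escalate, not to guess.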
Reliability improves the moment you stop pretending uncertainty is rare.
Memory should record operations, not just facts
Many agent stacks talk about memory as preferences, summaries, and retrieval chunks. That is useful, but incomplete.
For reliable action, the most important memory may be the execution journal: what was attempted, with which arguments, under which operation ID, and what outcome was observed. Without that journal, a resumed agent is a goldfish with admin access.
A practical memory split looks like this:
- semantic memory for user preferences and background facts
- working memory for the current task state
- execution memory for side effects, retries, and reconciliation
Most agent failures that look like “bad reasoning” are really missing execution memory.
Bottom line
If your agent can retry, resume, or recover, then exactly-once execution is not the right design goal.
The reliable path is simpler and more honest: idempotent write tools, explicit commit boundaries, durable execution journals, and a first-class unknown state. Build that substrate first. The planning loop will look much smarter when it stops duplicating reality.