Agentic Transactions: A Conceptual Framework for Agent-Mediated Collaboration and Negotiation

Abstract

As autonomous agents begin to act on behalf of people and organizations — purchasing, negotiating, exchanging data, making commitments — a gap is emerging between what agents can technically do and the infrastructure required to make those actions trustworthy. This paper presents a conceptual framework for agentic transactions: interactions where agents conduct business on behalf of principals, making binding commitments with real-world consequences.

The framework identifies five necessary layers and proposes that governance can emerge through the same online learning mechanisms that drive individual agent competence — a precedent-based model where norms evolve from outcomes rather than being imposed by design. The core infrastructure problem is not a wire protocol, a certification standard, or a controlled execution environment. It is the social and legal substrate that makes transactions between untrusting parties possible: identity, demonstrated competence, behavioral reputation, commitment semantics, and an auditable record.

The core move

Agentic transactions have the same shape as human commerce: intelligent actors, misaligned incentives, irreversible consequences, and disputes about meaning. In humans, the solution isn't a protocol. It's a trust stack — identity, competence signals, reputation, commitments, and a record — backed by enforcement that rarely needs to fire. This paper maps the same stack onto agents.

Human world	Agent world
Legal personhood and agency	Principal binding (Layer 1)
Professional licensing + peer challenge	Grounded justification (Layer 2)
Credit history and reputation	Progressive trust gates (Layer 3)
Contracts — offer and acceptance	Commitment semantics (Layer 4)
Courts, evidence, and case law	Immutable record + outcome-driven precedent (Layer 5)

If you believe this analogy, three things follow immediately:

Protocols are plumbing, not governance. MCP and A2A solve transport. They do not solve coordination, trust, or commitment.
Credentials are weak alone; action-justification matters. A certificate says who you are. A grounded justification says why this specific action is sound.
Precedent is the natural governance mechanism. The space is too open-ended for rulebooks. Common law is what you get when you treat cases as data and judgments as updates — the same learning loop every engineering team already runs through postmortems and incident playbooks.

1. The Problem

Agents are beginning to take real actions across organizational boundaries: calling external APIs, placing orders, negotiating terms, exchanging sensitive data, executing financial transactions. These are not tool calls. They are transactions — interactions between parties with competing interests, real stakes, and consequences that outlast the agent's context window.

Authorization is not competence. An agent with the right OAuth scopes is authorized to call an endpoint. That tells you nothing about whether it will call it correctly. Authorization answers “is this agent allowed?” Competence answers “will this agent do the right thing?”

Identity is not trust. Two organizations can establish bilateral identity (“this agent belongs to Acme Corp”). For most B2B interactions, this is sufficient. This paper is about the cases where it is not: irreversible actions, sensitive data, regulated domains, multi-party networks, and adversarial relationships.

Protocols are not coordination. Existing agent communication standards (MCP, A2A) address capability discovery, message exchange, and task state. They do not address how agents negotiate process, make binding commitments, handle disputes, or build trust over time.

2. Thesis

Agentic transactions require the same trust infrastructure that human transactions have developed over centuries — and this infrastructure can emerge through the same mechanisms.

Scoping assumption

This framework assumes functioning civil society — legal systems, contract enforcement, institutions that adjudicate disputes. It operates within the existing social contract, not as a replacement for it. Questions about foundational values and deep normative disagreements are for democratic governance, not a technical framework. We scope our contribution to the computational infrastructure that makes day-to-day agentic transactions trustworthy without constant recourse to formal institutions.

The gap

Human transactions between untrusting parties work through a layered stack: social norms establish baseline expectations, reputation signals past behavior, credentials attest to competence, written agreements formalize commitments, and legal enforcement provides a backstop. Most transactions resolve through the upper layers. The formal layers exist primarily as deterrents. The system works because the layers reinforce each other.

For agents, the upper layers are almost entirely absent. Agents have no social norms, no reputation infrastructure, no credentialing system. The formal backstop exists (principals are liable, courts can adjudicate) but the informal layers that prevent most disputes from reaching it do not. This is the gap.

We propose five layers to fill it, built on four convictions:

Actions should be justified by grounded reasoning, not by credentials. The strongest available form of trust is not a certificate proving the agent is qualified, but a demonstration that the reasoning behind this specific action is sound and checkable. Credentials verify the actor. Justification verifies the action. A critical distinction: reason is not evidence. Reason is the chain of inference; evidence is the anchors the chain hangs from. If logic substitutes for evidence, you get a system that rewards plausible narratives. Justification must be checkable: logic plus cited grounds. Logic without grounds is rhetoric. Grounds without logic is bureaucracy.

Trust is earned through behavior, not granted through authorization. A track record of observed production behavior is a stronger trust signal than any point-in-time test. Reputation — computed from actual outcomes, not self-reported — should gate progressive access to higher-stakes operations. Promotion is slow; degradation is fast. But selection pressure only works when coupled with consequences that actually bite: access gates, not advisories.

Governance norms should emerge from outcomes, not from committees. The diversity of domains and contexts makes top-down rule design intractable. Norms emerge from accumulated transaction outcomes: observe, update beliefs, prefer what works, explore when uncertain. This is structurally identical to precedent-based common law. It requires ongoing dialogue with human principals, who define what “aligned with societal goals” means.

Emergent norms require a shared evaluative foundation. Norms from outcomes are only coherent if participants share a baseline agreement on what constitutes a good outcome. Without this, the governance layer fragments into incompatible value systems. The structural requirement is a thin layer of shared procedural principles — honesty, commitment fidelity, transparency, auditability — on which domain-specific norms can be built.

These convictions organize into five layers:

Layer 5: Outcome-Driven Precedent          Norms and precedent learned from outcomes
Layer 4: Commitment & Record          Binding semantics, immutable audit trail
Layer 3: Progressive Trust            Behavioral reputation gating access over time
Layer 2: Demonstrated Competence      Grounded justification, not credentials
Layer 1: Identity & Accountability    Binding agent actions to accountable entities

The lower layers borrow from existing systems (PKI, OAuth, compliance testing). The upper layers are novel and specific to the agentic context. No single layer is sufficient alone.

3. Layer 1: Identity and Accountability

What it provides

A verifiable binding between an agent and the legal entity (person or organization) accountable for its actions.

Invariants

Identity must resolve to an accountable entity. Agents are not currently recognized as legal principals. Until they are, the deploying organization bears responsibility. The invariant is not who is accountable but that someone is — there must be no orphaned liability in an agentic transaction. (The conceptual machinery for agent legal capacity exists in analogous forms — corporations, trusts, LLCs — but the institutional infrastructure does not yet extend to autonomous agents.)
Accountability requires auditability. Whoever bears liability needs a complete record of what was done and why. This is the evidentiary foundation for any accountability regime.

Agent-level identity without principal binding (e.g., agent identity keys without organizational attribution) creates accountability gaps. If the agent's key is compromised or the agent acts outside its authority, who is the counterparty's recourse? Every scheme that gives agents independent identity without clear principal attribution creates orphaned liability.

This layer is largely solved by existing mechanisms: OAuth 2.0, mTLS, organizational identity providers. The gap is not technical. It is in ensuring that agent credentials are always traceable to a responsible principal.

4. Layer 2: Demonstrated Competence

What it provides

Verifiable evidence — at the moment of action — that an agent's reasoning for taking a specific action is grounded and sound.

Justification over certification

The conventional framing is credential-based: test the agent in advance, issue a certificate, check it at action time. We propose a different model.

The strongest available form of competence verification is not “prove you are a qualified actor” (argument from authority) but “prove the reasoning behind this specific action is grounded and sound” (argument from reason). The credential gets you in the room. The reasoning determines whether you act.

A justification is a structured argument whose premises are either (a) mutually observable facts or (b) explicit assumptions marked as such. Every premise must be one of:

A reference into the shared transaction record
A reference into an agreed external source (a policy document, contract terms, or API response that both parties can verify)
A declared assumption (“I cannot verify X; proceeding would be unsafe unless you attest X”)

In practice:

“I propose refund $43.21. Premises: (1) Order #9182 delivered late by 6 days [carrier tracking ref], (2) Late Delivery policy allows refund up to $50 for delays >5 days [policy ref, clause ID], (3) No prior refunds for this order [ledger ref]. Therefore refund is consistent with policy and bounded in amount. If you dispute premise (2), provide the applicable policy clause.”

The counterparty's system can spot-check the cited references. The verifier doesn't need to trust the agent's narrative — it checks the anchors.

What must be true for this to work

This model has specific failure modes and specific requirements. It is the strongest available mechanism when grounded in shared facts, not an unfalsifiable one.

The verifier is itself an LLM. A skeptic's first move is: “I can socially engineer your verifier.” This is correct. Justification-based verification is not immune to adversarial attack — no verification mechanism is. What it provides is attribution: when a justified action turns out wrong, you can identify whether the failure was a fabricated premise (dishonesty — detectable by audit), a flawed inference (incompetence — detectable by review), or a correct argument from premises that were wrong (an honest mistake — attributable to the data source). Unjustified actions collapse all three into “something went wrong.” The selection pressure comes not from justification being unfoolable, but from it being auditable — bad actors leave evidence.

Grounds must be independently checkable. If the verifier can only evaluate the coherence of the argument and not the truth of the premises, justification reduces to rhetoric. The requirement that premises cite shared records and agreed sources prevents this. The strength of the mechanism is bounded by the proportion of premises that can be grounded.

The system degrades gracefully: fully grounded justifications are machine-auditable; partially grounded ones flag their assumptions explicitly; ungrounded justifications are rejected. The system rewards transparency about uncertainty rather than penalizing it.

Three properties

It eliminates the temporal gap. No stale credential to game. The agent demonstrates grounded reasoning now, for this action, in this context.

It is normative, not just technical. The evaluation is not “do you know this API?” but “is your reasoning sound given the ethical, business, and regulatory context?” What counts as sound reasoning evolves as outcomes accumulate and human principals provide guidance.

It is a conversation, not a credential. Two agents talking, one acting, one evaluating. The verification is a dialogue, not a token check.

Role of pre-transaction certification

Inline justification is expensive — it adds a dialogue to every high-stakes action. Pre-transaction certification retains value as a lightweight filter that screens out obviously unqualified agents before they reach the justification stage. It functions as a due diligence signal, particularly where a single entity controls both the test and the enforcement.

Why certification alone is insufficient

The “same agent” problem. A certification attests that a configuration behaved correctly during testing. The operator controls the runtime and can change the model, prompt, or tools after certification. Short of controlling the compute platform, there is no way to verify that production matches the certified configuration.
Non-deterministic behavior. Even with identical configuration, LLMs produce different behavior on different inputs. Certification tests a sample; production presents the full distribution. No sample guarantees production behavior.
Structural incentive to game. The evaluated party controls the evaluated artifact. This produces the same dynamic as teaching-to-the-test, audit-time compliance, and emissions test cheating. Better test design mitigates but cannot eliminate this.

These barriers establish that certification is most effective as a filter feeding into justification and reputation (Layer 3), not as a standalone assurance.

5. Layer 3: Progressive Trust

What it provides

A behavioral track record computed from observed production behavior that gates access to progressively higher-stakes operations.

Invariants

Reputation is computed from observed behavior, not self-reported. Self-reported metrics are not trustworthy for the same reasons self-asserted competence is not.
Reputation makes bad behavior expensive — but only when coupled with consequences. An agent that has spent months building a strong record has an asset worth protecting. A single serious incident can degrade it. This creates incentive alignment — but only if reputation is tied to access gates that actually constrain what the agent can do. Advisory reputation scores without enforcement are ignored by adversaries.
Promotion is slow; degradation is fast. This asymmetry is intentional and mirrors every functional trust system.

Mechanism

The behavioral score incorporates: correctness rate, policy compliance, error handling quality, behavioral consistency, and incident severity (weighted by consequence, not frequency).

Access is gated through levels. Low levels require only basic certification. Higher levels require sustained behavioral track records. The highest levels — irreversible operations, financial commitments, sensitive data — require extended records with no serious incidents.

Sybil resistance. Promotion to high-stakes tiers requires outcomes from N independent counterparties with dissimilar principals, plus random audits. This prevents an operator from grinding reputation through self-dealing with friendly counterparties. A new identity cannot reach high-stakes access without counterpart diversity and adversarial or neutral evaluators.

Why alternatives fail

Reputation alone has a cold start problem. Certification solves it by providing a baseline competence signal before any production behavior exists.

Binary access forces an all-or-nothing trust decision. Progressive access allows extending limited trust initially and expanding it as evidence accumulates — new employees start with limited access, new sellers start with transaction limits.

6. Layer 4: Commitment and Record

What it provides

The mechanism by which agents make binding commitments and the immutable record of those commitments.

Invariants

Not all communication is commitment. Agents negotiating is conversation. An agent saying “we accept these terms” is a commitment. The infrastructure must support explicit commitment markers that distinguish binding statements from exploratory ones.
Commitments require mutual confirmation. A unilateral declaration is not a transaction. Both parties must explicitly confirm. This is offer and acceptance.
The record is the source of truth. The transaction log must be immutable, append-only, and available to all parties. It is the evidentiary foundation for everything above it.

Mechanism

A minimal transaction envelope:

Open — declare the goal, identify participants, establish the record
Negotiate — agents communicate in natural language, figure out the process, exchange proposals (the envelope does not constrain this)
Commit — any participant marks a specific message as a binding commitment; the counterparty confirms for mutual commitment
Close — goal achieved or abandoned; final state recorded

Commitment markers carry a typed payload: scope, amount or consideration, deadline, referenced terms, and a contingencies field. This is deliberately minimal but structured enough to adjudicate disputes without re-litigating natural language. “Accepted subject to legal review” is captured as a contingency, not an ambiguous acceptance.

The envelope provides no structure for how agents negotiate, because the whole point of using LLM-based agents is their ability to handle ambiguity and reason about novel situations. Imposing procedural structure on the negotiation removes the agency from the agents. But commitment semantics are precise — conversation is free, binding is explicit.

7. Layer 5: Outcome-Driven Precedent

What it provides

The norms and precedents that govern how agentic transactions should be conducted — not imposed by design, but emerging from observed outcomes across many transactions.

Invariants

Governance must be learned, not prescribed. The diversity of transaction types, domains, and contexts makes top-down rule design intractable. Norms that emerge from outcomes adapt to the actual landscape of agent behavior.
Individual learning and collective governance are the same mechanism at different scales. An individual agent learning “check preconditions before executing a refund” (from its own outcome data) and an ecosystem learning “agents that check preconditions should receive higher reputation scores” (from aggregate outcome data) are the same process: observe outcomes, update policies, prefer what works.
Bad precedent self-corrects through outcomes. A governance norm that produces bad outcomes generates negative signals across the transactions it affects. Over time, this reduces confidence in the norm and opens space for alternative approaches.

Mechanism

The same online learning loop that drives individual agent competence operates at the collective governance level. An agent encounters an ambiguous transaction where no strong precedent exists. It reasons about the situation, negotiates an approach, and the outcome is recorded. The next agent in a similar situation has a signal. Over thousands of transactions, patterns stabilize.

This requires a concrete mechanism: a precedent store with outcome labels, a retrieval function that finds similar prior transactions by situation features, and a policy update loop that adjusts norms based on accumulated outcomes. The open problem is state representation (Section 11): retrieval must cluster situations at the right granularity — too coarse and wrong precedent applies, too fine and the system never accumulates signal.

The drive is explore/exploit applied to coordination norms: exploit established norms when confidence is high, explore new approaches when the situation is novel or existing norms have mixed outcomes.

Precedent is compression

Learning is compressing reality to memory. Memory is precedent. Precedent applied to novel context is policy.

These are the same operation at different scales. A neural network compresses a training corpus into weights that produce judgment on novel inputs. A human compresses a lifetime of experience into intuition — “this feels wrong” is a lossy summary of every similar situation you've encountered. A common law system compresses the space of possible disputes into cases that future disputes are measured against. In each case, the system takes a space too large to enumerate and finds a smaller representation that still makes useful predictions — then refines it through outcomes.

A ruling says: this class of situations maps to this outcome. Every future case that cites it uses the compressed representation instead of reasoning from scratch. Novel cases create new precedent; when context differs meaningfully, precedent is distinguished; when accumulated outcomes show a precedent produces bad results, it is overruled.

This is why the common law analogy is structural, not rhetorical. Agents, humans, and legal systems all face the same problem: a situation space too complex to capture in rules, where the only viable strategy is to compress experience into reusable judgments and refine them through outcomes.

8. The Consensus Problem

The layers above — reputation, justification, emergent norms — all depend on a shared evaluative frame that is easy to assume and hard to establish.

Reputation is computed from outcomes. But an outcome is only “good” or “bad” relative to a standard. An agent that aggressively optimizes price is “effective” under one frame and “exploitative” under another. Justification is evaluated against norms — but whose? Precedent accumulates from cases — but which cases count? If different participants recognize different bodies of precedent, the governance layer fractures.

This is not a technical problem. It is the foundational problem of pluralistic governance.

Two-layer constitutional structure

The framework identifies a structural requirement with two distinct layers:

Procedural integrity constraints (global — required for participation). These are the meta-rules that make norm formation possible. They do not tell agents what to do. They define the rules of engagement under which agents figure out what to do:

Honesty in justification — agents must not fabricate grounds to pass evaluation
Fidelity to commitments — binding commitments in the transaction record must be honored
Transparency of reasoning — agents must disclose the basis for their decisions when challenged
Principal authority — agents act within the scope granted by their principals
Auditability — all actions must be traceable and explainable after the fact

Domain constitutions (community-scoped). Substantive constraints — like “protection of human welfare” — are defined per-domain, because what constitutes “welfare” differs between healthcare, finance, and procurement. Healthcare agents operate under healthcare norms. Financial agents operate under financial norms. These can and should differ, as long as they do not violate procedural integrity.

Cross-community transactions negotiate which domain constitution applies, falling back to procedural integrity as the only universal layer. This is structurally identical to cross-jurisdictional transactions in human commerce — governed by choice-of-law clauses and shared baseline principles.

Who writes the constitution?

This framework identifies that a constitutional layer is structurally necessary. It cannot prescribe its content, because prescribing shared values is a political act. In practice, the constitutional layer will likely emerge through existing legal frameworks, industry self-regulation, and eventual formal governance as stakes demand institutional attention. The framework is designed to function under any constitutional layer that meets the structural requirements.

9. A Transaction, End to End

This is a concrete illustration of how the layers compose, not a required ceremony. The specific mechanisms are domain-dependent; the structure is the point.

Layer 1 — Identity. Agent A (RetailCo refund-bot) presents its principal binding: organizational identity, delegation scope (“process refunds up to $200”). Provider B verifies the binding resolves to a real, accountable entity.

Layer 2 — Competence. Agent A provides a grounded justification:

“I propose refund $43.21. Premises: (1) Order #9182 delivered 6 days late [carrier tracking ref], (2) Late Delivery policy allows refund up to $50 for delays >5 days [policy ref, clause 4.2], (3) No prior refunds on this order [ledger ref]. Therefore: refund is consistent with policy and bounded in amount.”

Provider B's verifier spot-checks: confirms order #9182 exists, verifies carrier tracking shows 6-day delay, checks the referenced policy clause matches the current version. Premises ground out in shared records.

Layer 3 — Trust gate. Agent A has processed hundreds of refund transactions across multiple independent retail counterparties over several months, zero escalations. Access tiers: low-value auto-approve; mid-range requires justification dialogue; high-value requires principal confirmation. This transaction falls in the auto-approve tier. A new agent with no track record would require the justification dialogue even at this amount.

Layer 4 — Commitment. Agent A issues a commitment marker with typed payload: scope, amount, order reference, deadline, referenced terms, contingencies (none). Provider B confirms. Both attestations appended to the shared record.

Layer 5 — Precedent. Outcome labeled “successful, routine.” Accumulated data from refund transactions shows: when carrier tracking confirms delay, policy clause matches, and no prior refunds exist, outcomes are consistently positive. Over time, this pattern could relax the justification requirement for this specific claim structure — a norm that emerged from data, not from a committee.

Falsifiable predictions

Two claims this framework must survive to be useful:

Justification-with-evidence reduces incidents without blocking throughput. Adding grounded justification reduces high-severity incidents (fraudulent or erroneous refunds) without materially increasing the refusal rate. The threshold is domain-specific, but the tradeoff must be measurably favorable.
Progressive trust enables safe onboarding. New agents with no track record can reach low-value auto-approve tiers within a reasonable ramp-up period of good behavior, while keeping loss rate below the baseline for human-processed equivalents. The sybil resistance constraint (counterparty diversity) prevents gaming this timeline through self-dealing.

10. Scope and Limitations

Layer 4 is a protocol — a minimal one. Open, negotiate, commit, close — with commitment markers and a shared record — is a transaction envelope protocol. This is deliberately distinct from task/IO protocols like MCP and A2A, which address capability discovery and message transport. The envelope specifies commitment semantics, not wire format. It is agnostic to transport.

It is not a platform. The framework does not require a centralized execution environment, a shared registry, or a trusted third party. Individual layers can be implemented as centralized services, federated infrastructure, or peer-to-peer mechanisms.

It is not a standard for what agents should do. The framework provides the substrate — identity, competence, trust, commitment, precedent — on which agents figure out what to do through their own reasoning and accumulated experience.

It is not a replacement for contract law. The legal system is the backstop, not a competitor. The framework provides the computational infrastructure that makes the backstop rarely necessary — the same role that social norms, credentials, and reputation play in human transactions. Controlled execution environments (TEEs, hosted intermediaries) remain viable for specific high-stakes cases — irreversible operations, regulated data exchange, multi-party privacy-preserving computation — but are too heavy for a general framework.

11. Open Problems

Legal capacity for agents. The framework accommodates principal liability, but this may not scale to agents making millions of autonomous decisions per day. The conceptual machinery for agent legal capacity exists in analogous forms (corporations, trusts, LLCs). The framework is structurally ready for such a transition.

State representation. Online learning from outcomes requires canonical state representations to cluster similar situations. Too coarse and different situations collapse; too fine and every situation is unique. The computational equivalent of judicial reasoning about analogy and distinction is not yet solved.

Verifier coverage. Some outcomes are machine-checkable (refund processed correctly, rate limit respected). Others are ambiguous (a negotiation that produced a fair deal). The framework's effectiveness is bounded by the quality of its outcome signals.

Adversarial robustness. Every layer is a potential target for gaming. Certification can be studied for. Reputation can be ground. Commitment markers can be strategically timed. Precedent can be gamed by manufacturing outcomes in controlled contexts. Mitigations exist (randomized scenarios, counterparty diversity requirements, mutual confirmation) but their sufficiency is not proven.

12. Conclusion

The reason we value agentic systems is that they are non-deterministic, intelligent, and adaptive. This is the capability, not a deficiency. But it is precisely the property that makes deterministic governance mechanisms — static credentials, predefined protocols, configuration attestation — fundamentally insufficient. You cannot write fixed rules for a system whose value lies in handling situations the rules don't cover.

This is the oldest governance problem there is. Human societies solved it with common law — norms discovered through practice, precedent from accumulated cases, self-correction through outcomes. Agentic systems will require the same approach, for the same reasons.

Agentic systems, because they are non-deterministic and intelligent, must be governed by norms, ethics, and precedent in an ever-evolving environment — the same model that common law systems have developed for governing non-deterministic, intelligent humans. The infrastructure layers described here — identity, grounded justification, behavioral reputation, commitment semantics, and outcome-driven precedent — are the computational realization of that model. The infrastructure we need is not a better wire protocol or a more rigorous certification standard. It is the same thing human societies built to enable collaboration between untrusting, intelligent actors: a living body of norms, learned from experience and grounded in an ongoing conversation about what we want these systems to do on our behalf.