Determinism vs Autonomy
Learn Python Enterprise-Grade Stateful Multi-Agent AI Systems - Part 008
Determinism versus autonomy in enterprise-grade stateful multi-agent AI systems: autonomy budgets, authority boundaries, guardrails, policy gates, and production control.
Enterprise AI architecture is not about maximizing autonomy.
It is about placing autonomy exactly where uncertainty exists, and placing determinism everywhere else.
Many agentic systems fail for the same reason: they give an LLM decision authority that should have belonged to code, a workflow, a policy, a validator, or a human.
This part gives you a practical mental model for deciding:
- which parts of the system should be deterministic;
- which parts can be probabilistic;
- where an agent may choose;
- where an agent must only recommend;
- where human review is required;
- where policy must override model behavior;
- how to encode autonomy as an explicit budget.
1. Kaufman Framing
The sub-skill here is:
Given a system requirement, decide the correct autonomy level for each step, tool, agent, and state transition.
This is a high-leverage skill because bad autonomy placement creates:
- unpredictable behavior;
- expensive loops;
- security failures;
- compliance gaps;
- unclear accountability;
- untestable systems;
- incidents that cannot be reconstructed.
Target Performance
By the end of this part, you should be able to:
- describe the determinism-autonomy spectrum;
- classify actions by reversibility and risk;
- define autonomy budgets;
- design deterministic shells around probabilistic cores;
- separate recommendation authority from execution authority;
- implement policy gates before side effects;
- identify where autonomy is dangerous;
- explain why “more agentic” is often worse.
2. The Spectrum
Determinism and autonomy are not binary; they form a spectrum.
Each higher level in the table below adds flexibility and risk.
Spectrum Table
| Level | Description | Example | Typical Use |
|---|---|---|---|
| 0 | Pure deterministic code | if status == CLOSED | invariants, validation |
| 1 | Rules/config | policy threshold | business policy |
| 2 | Workflow | fixed case lifecycle | regulated process |
| 3 | LLM classifier | categorize complaint | low-risk judgment |
| 4 | LLM recommender | suggest risk rationale | analyst support |
| 5 | Tool-calling agent | fetch evidence | bounded investigation |
| 6 | Supervisor agent | coordinate specialists | complex analysis |
| 7 | Multi-agent exploration | generate hypotheses | research/simulation |
| 8 | Autonomous actor | acts directly | rare, high-control contexts only |
The core enterprise rule:
Move to a higher level only when the business benefit exceeds the control cost.
3. Deterministic Shell, Probabilistic Core
A strong enterprise pattern is to wrap a probabilistic core (the LLM) in a deterministic shell.
The LLM can reason, summarize, rank, classify, or propose. But the shell controls:
- input schema;
- allowed tools;
- context sources;
- state transitions;
- output schema;
- policy enforcement;
- side effects;
- audit events;
- retry behavior;
- escalation.
This design respects the strengths of LLMs without letting them become the entire system.
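Here is a minimal sketch of the shell around a single classification step. It assumes a hypothetical `call_model` callable that stands in for any LLM client and returns raw text. The category set is also an assumption. The model produces text, and plain code decides whether that text is accepted, records the audit event, and turns bad output into an escalation rather than a guess.

```python
import json
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed category set for this illustration.
ALLOWED_CATEGORIES = {"fraud", "billing", "other"}


@dataclass(frozen=True)
class Classification:
    category: str
    confidence: float


def parse_output(raw: str) -> Optional[Classification]:
    """Shell: accept only well-formed JSON with allowed values; anything else is None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    category = data.get("category")
    confidence = data.get("confidence")
    if not isinstance(category, str) or category not in ALLOWED_CATEGORIES:
        return None
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return None
    if not 0.0 <= confidence <= 1.0:
        return None
    return Classification(category, float(confidence))


def run_in_shell(
    case_id: str,
    text: str,
    call_model: Callable[[str], str],  # hypothetical stand-in for any LLM client
    audit_log: list[dict],
) -> Optional[Classification]:
    if not text.strip():  # shell: reject empty input before the model runs
        raise ValueError("empty input")
    raw = call_model(text)  # core: the only probabilistic step
    result = parse_output(raw)  # shell: schema and allowed-value check
    # shell: audit every run, including rejected output
    audit_log.append({"case_id": case_id, "accepted": result is not None})
    return result  # None means escalate, never guess
```

The model never sees the audit log, the category list, or the acceptance rule, so a persuasive but invalid answer cannot talk its way past the shell.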
4. Autonomy as a Budget
Autonomy should be budgeted like CPU, memory, latency, or money.
An autonomy budget limits what the agent can decide.
```python
from pydantic import BaseModel, Field


class AutonomyBudget(BaseModel):
    max_turns: int = Field(ge=1)
    max_tool_calls: int = Field(ge=0)
    max_tokens: int = Field(ge=1)
    max_cost_usd: float = Field(ge=0.0)
    max_wall_time_ms: int = Field(ge=1)
    can_call_external_tools: bool = False
    can_mutate_state: bool = False
    can_trigger_side_effects: bool = False
    requires_human_approval: bool = True
```
This makes autonomy explicit.
Bad:
The agent can solve the case.
Better:
The agent can inspect evidence, call read-only tools up to 8 times, produce a recommendation, and cannot mutate case status or notify an external party.
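A declared budget only constrains the agent if the runtime counts against it. Below is a minimal sketch of that counting. `BudgetTracker` is a hypothetical class, not part of any library, and an agent loop would call it before each turn and each tool call.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    pass


@dataclass
class BudgetTracker:
    max_turns: int
    max_tool_calls: int
    read_only: bool = True  # no write tools unless explicitly granted
    turns: int = 0
    tool_calls: int = 0

    def start_turn(self) -> None:
        if self.turns >= self.max_turns:
            raise BudgetExceeded("turn budget exhausted")
        self.turns += 1

    def record_tool_call(self, tool_name: str, writes: bool) -> None:
        # Check permission first, so a denied call does not consume budget.
        if writes and self.read_only:
            raise BudgetExceeded(f"write tool {tool_name!r} not permitted")
        if self.tool_calls >= self.max_tool_calls:
            raise BudgetExceeded("tool-call budget exhausted")
        self.tool_calls += 1
```

With `max_tool_calls=8` and the default `read_only=True`, this tracker enforces the "Better" statement above. The ninth evidence search raises. Any write tool raises immediately and does not consume budget.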
5. Authority Levels
Autonomy is not just about tool access. It is about authority.
| Authority Level | Agent May | Agent May Not |
|---|---|---|
| 0 — Observe | read context | propose, mutate, call tools |
| 1 — Analyze | summarize, classify, extract | commit decisions |
| 2 — Recommend | propose action | execute action |
| 3 — Prepare | draft side-effect payload | send or commit |
| 4 — Execute Reversible | perform reversible action | perform irreversible action |
| 5 — Execute Irreversible | perform final action | bypass policy/human controls |
Most enterprise AI systems should keep LLM agents between levels 1 and 3.
Level 4 can be allowed only with strict constraints.
Level 5 is exceptional and usually inappropriate for regulated decisions.
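Authority levels can be encoded as an ordered enum so that runtime checks stay plain comparisons. Here is a minimal sketch under some assumptions. The ceiling constant reflects the "levels 1 to 3" guidance above. The `architecture_approved` flag is a hypothetical stand-in for whatever review process an organization uses to grant level 4 or 5.

```python
from enum import IntEnum


class AuthorityLevel(IntEnum):
    OBSERVE = 0
    ANALYZE = 1
    RECOMMEND = 2
    PREPARE = 3
    EXECUTE_REVERSIBLE = 4
    EXECUTE_IRREVERSIBLE = 5


# Assumed default ceiling for LLM agents, following the guidance above.
DEFAULT_AGENT_CEILING = AuthorityLevel.PREPARE


class AuthorityError(Exception):
    pass


def grant(requested: AuthorityLevel, *, architecture_approved: bool = False) -> AuthorityLevel:
    """Grant a level; anything above the ceiling needs explicit review."""
    if requested > DEFAULT_AGENT_CEILING and not architecture_approved:
        raise AuthorityError(f"{requested.name} exceeds the default ceiling")
    return requested


def require(granted: AuthorityLevel, needed: AuthorityLevel) -> None:
    """Deterministic check the runtime performs before each operation."""
    if granted < needed:
        raise AuthorityError(f"needs {needed.name}, agent has {granted.name}")
```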
6. Reversibility Matrix
The more irreversible an action is, the less autonomy you should allow.
| Action | Reversible? | Suggested Autonomy |
|---|---|---|
| Summarize a document | Yes | High |
| Classify a topic | Usually | Medium |
| Suggest next best action | Yes | Medium |
| Draft customer email | Yes before send | Medium |
| Send customer email | Harder | Low + approval |
| Update case status | Depends | Low + policy gate |
| Freeze account | No/High impact | Human approval |
| File regulatory notice | No/High impact | Human approval |
| Delete data | Often no | Deterministic + approval |
| Trigger payment | Often no; financial risk | Strong deterministic control |
A practical rule:
The agent may draft high-impact actions. It should not independently commit them.
7. Decision Rights
For every step, define decision rights.
Decision Rights Table
| Decision | Owner |
|---|---|
| Which model to call | control plane / model router |
| Which tools are available | policy engine / tool registry |
| Which evidence is trusted | retrieval/evidence policy |
| Whether output schema is valid | validator |
| Whether case can advance | workflow rule |
| Whether side effect can occur | policy gate / human |
| Final regulated decision | accountable human or authorized service |
Agents can recommend. Systems decide.
8. Risk-Based Autonomy Policy
A risk policy can be represented as code.
```python
from enum import Enum

from pydantic import BaseModel


class RiskLevel(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


class ActionType(str, Enum):
    READ = "read"
    ANALYZE = "analyze"
    DRAFT = "draft"
    MUTATE_INTERNAL = "mutate_internal"
    NOTIFY_EXTERNAL = "notify_external"
    IRREVERSIBLE = "irreversible"


class AutonomyDecision(BaseModel):
    allowed: bool
    requires_human_approval: bool
    reason: str


def decide_autonomy(risk: RiskLevel, action: ActionType) -> AutonomyDecision:
    # Rules are evaluated top to bottom; the first match wins.
    # Every rule here allows the action, possibly with approval; a stricter
    # policy could return allowed=False (e.g. for CRITICAL + IRREVERSIBLE).
    if action in {ActionType.IRREVERSIBLE, ActionType.NOTIFY_EXTERNAL}:
        return AutonomyDecision(
            allowed=True,
            requires_human_approval=True,
            reason="External or irreversible action requires approval.",
        )
    if risk in {RiskLevel.HIGH, RiskLevel.CRITICAL} and action == ActionType.MUTATE_INTERNAL:
        return AutonomyDecision(
            allowed=True,
            requires_human_approval=True,
            reason="High-risk internal mutation requires approval.",
        )
    if risk == RiskLevel.LOW and action in {ActionType.READ, ActionType.ANALYZE, ActionType.DRAFT}:
        return AutonomyDecision(
            allowed=True,
            requires_human_approval=False,
            reason="Low-risk reversible action.",
        )
    # Fail closed: anything not explicitly matched above needs approval.
    return AutonomyDecision(
        allowed=True,
        requires_human_approval=True,
        reason="Default approval required.",
    )
```
The exact rules vary by business. The pattern remains: autonomy is not a prompt. It is a policy decision.
9. Where Determinism Belongs
Use deterministic logic for:
- identity and authorization;
- tenant isolation;
- tool permission;
- schema validation;
- budget enforcement;
- retry limits;
- deadline enforcement;
- state transition validity;
- idempotency;
- external side-effect commitment;
- audit logging;
- compliance retention;
- policy version selection.
Do not ask an LLM:
Is this user allowed to access this tenant's case?
That belongs to authorization logic.
Do not ask an LLM:
Should we ignore the configured cost limit?
That belongs to runtime enforcement.
Do not ask an LLM:
Should we record an audit event?
Always record audit events.
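The tenant question above can be answered in plain code. Here is a minimal sketch with hypothetical `User` and `Case` records, where the role names are assumptions. The check is a pure function of identity and data, so it gives the same answer every time and can be unit-tested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    tenant_id: str
    roles: frozenset


@dataclass(frozen=True)
class Case:
    case_id: str
    tenant_id: str


# Assumed roles that may read cases within their own tenant.
CASE_READER_ROLES = {"analyst", "supervisor"}


def can_access_case(user: User, case: Case) -> bool:
    # Tenant isolation first: no role overrides a tenant mismatch.
    if user.tenant_id != case.tenant_id:
        return False
    return bool(user.roles & CASE_READER_ROLES)
```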
10. Where Autonomy Belongs
Use LLM autonomy for:
- ambiguous natural language interpretation;
- summarization;
- evidence synthesis;
- hypothesis generation;
- document classification;
- missing-information detection;
- drafting;
- planning within bounded steps;
- choosing read-only tools;
- explaining trade-offs;
- identifying contradictions.
The safe formulation is:
Let the agent reason where judgment is useful. Let deterministic systems enforce where correctness is mandatory.
11. Autonomy Placement Pattern
A practical design approach is to ask these questions for every agent capability:
- Is the decision deterministic?
- Is ambiguity useful?
- Is the action reversible?
- Is the impact external?
- Can we validate the output?
- Can we replay the decision?
- Who is accountable if it is wrong?
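The questions above can be turned into a deterministic placement function that returns a ceiling on the authority levels from section 5. Here is a minimal sketch. The field names and the mapping from answers to levels are assumptions chosen for illustration, and each organization would set its own thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapabilityProfile:
    deterministic: bool     # Is the decision deterministic?
    ambiguity_useful: bool  # Is ambiguity useful?
    reversible: bool        # Is the action reversible?
    external_impact: bool   # Is the impact external?
    validatable: bool       # Can we validate the output?
    replayable: bool        # Can we replay the decision?
    accountable_owner: str  # Who is accountable if it is wrong? ("" if unknown)


def max_authority_level(p: CapabilityProfile) -> int:
    """Highest authority level (0-5) an agent may hold for this capability."""
    if p.deterministic or not p.ambiguity_useful:
        return 0  # write code instead; the agent may at most observe
    if not p.accountable_owner or not p.validatable or not p.replayable:
        return 1  # analysis only until the controls exist
    if p.external_impact or not p.reversible:
        return 3  # may prepare a draft; a gate or a human commits it
    return 4  # reversible, internal, validated: bounded execution
```

By construction this function never returns 5: irreversible execution stays a human or deterministic-service decision.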
12. Autonomy Contracts
An autonomy contract describes what an agent may do.
```python
from pydantic import BaseModel

# AutonomyBudget is the model defined in section 4.


class ToolPermission(BaseModel):
    tool_name: str
    mode: str  # read, draft, write, execute
    max_calls: int
    required_approval: bool


class AgentAutonomyContract(BaseModel):
    agent_name: str
    purpose: str
    authority_level: int  # 0-5, per the authority levels in section 5
    allowed_tools: list[ToolPermission]
    allowed_state_mutations: list[str]
    forbidden_actions: list[str]
    budget: AutonomyBudget
    escalation_conditions: list[str]
```
Example:
```python
risk_agent_contract = AgentAutonomyContract(
    agent_name="risk-agent",
    purpose="Assess case risk and explain evidence.",
    authority_level=2,
    allowed_tools=[
        ToolPermission(
            tool_name="case_evidence_search",
            mode="read",
            max_calls=5,
            required_approval=False,
        )
    ],
    allowed_state_mutations=[],
    forbidden_actions=[
        "change_case_status",
        "notify_external_party",
        "delete_evidence",
    ],
    budget=AutonomyBudget(
        max_turns=6,
        max_tool_calls=5,
        max_tokens=12000,
        max_cost_usd=2.0,
        max_wall_time_ms=60000,
        can_call_external_tools=False,
        can_mutate_state=False,
        can_trigger_side_effects=False,
        requires_human_approval=False,
    ),
    escalation_conditions=[
        "confidence_below_threshold",
        "missing_required_evidence",
        "conflicting_policy_interpretation",
    ],
)
```
This is the kind of structure enterprise systems need.
13. Guardrail Placement
Guardrails are not only output filters.
They exist at several layers.
Guardrail Types
| Guardrail | Purpose |
|---|---|
| Input guard | reject unsafe or irrelevant input |
| Context guard | prevent unauthorized data injection |
| Tool guard | restrict tool access |
| Budget guard | stop runaway loops |
| Output guard | enforce schema/safety |
| State guard | prevent invalid transition |
| Side-effect guard | block unauthorized external actions |
A serious system has multiple guardrails. A weak system has one prompt saying “be careful.”
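Guardrails at different layers can be written as independent checks, and any one of them can veto a step. Here is a minimal sketch covering four of the layers in the table: input, tool, budget, and output. The `Step` record, the guard functions, and their thresholds are hypothetical examples, not a complete safety design.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    user_input: str
    tool_name: Optional[str] = None
    tool_calls_so_far: int = 0
    output: Optional[dict] = None
    allowed_tools: set = field(default_factory=lambda: {"case_evidence_search"})


Guard = Callable[[Step], Optional[str]]  # returns a violation message, or None


def input_guard(s: Step) -> Optional[str]:
    return "input too long" if len(s.user_input) > 10_000 else None


def tool_guard(s: Step) -> Optional[str]:
    if s.tool_name is not None and s.tool_name not in s.allowed_tools:
        return f"tool {s.tool_name!r} not allowed"
    return None


def budget_guard(s: Step) -> Optional[str]:
    return "tool budget exhausted" if s.tool_calls_so_far >= 5 else None


def output_guard(s: Step) -> Optional[str]:
    if s.output is not None and "recommendation" not in s.output:
        return "output missing 'recommendation'"
    return None


GUARDS: list[Guard] = [input_guard, tool_guard, budget_guard, output_guard]


def check_all(step: Step) -> list[str]:
    """Run every guard; any returned message blocks the step."""
    messages = (guard(step) for guard in GUARDS)
    return [m for m in messages if m is not None]
```

Running every guard, rather than stopping at the first failure, gives reviewers the full list of violations for a blocked step.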
14. Deterministic Policy Gate
Before any side effect, insert a policy gate.
```python
from typing import Literal

from pydantic import BaseModel

# RiskLevel, ActionType, and decide_autonomy are defined in section 8.
POLICY_VERSION = "2026-06-29"


class ProposedAction(BaseModel):
    action_id: str
    action_type: ActionType
    case_id: str
    proposed_by: str
    payload: dict
    rationale: str
    evidence_refs: list[str]


class PolicyGateResult(BaseModel):
    decision: Literal["allow", "require_approval", "deny"]
    reason: str
    policy_version: str


def policy_gate(action: ProposedAction, risk: RiskLevel) -> PolicyGateResult:
    # Only the action type and risk level matter here; the model's rationale
    # is recorded for reviewers but never changes the outcome.
    autonomy = decide_autonomy(risk, action.action_type)
    if not autonomy.allowed:
        decision = "deny"
    elif autonomy.requires_human_approval:
        decision = "require_approval"
    else:
        decision = "allow"
    return PolicyGateResult(
        decision=decision,
        reason=autonomy.reason,
        policy_version=POLICY_VERSION,
    )
```
A policy gate should be outside the LLM. The model can supply rationale. The policy gate enforces authority.
15. Stop Conditions
Autonomy without stop conditions is operational debt.
Every agent should have explicit stop conditions.
```python
from pydantic import BaseModel


class StopCondition(BaseModel):
    name: str
    description: str
    severity: str  # "hard" ends the run; "soft" stops and escalates


DEFAULT_STOP_CONDITIONS = [
    StopCondition(
        name="max_turns_reached",
        description="Agent reached maximum allowed turns.",
        severity="hard",
    ),
    StopCondition(
        name="tool_budget_exhausted",
        description="Agent reached maximum allowed tool calls.",
        severity="hard",
    ),
    StopCondition(
        name="low_confidence",
        description="Agent cannot produce a confident result.",
        severity="soft",
    ),
    StopCondition(
        name="conflicting_evidence",
        description="Evidence supports multiple incompatible conclusions.",
        severity="soft",
    ),
    StopCondition(
        name="policy_boundary",
        description="Requested action exceeds agent authority.",
        severity="hard",
    ),
]
```
Stop conditions are part of correctness.
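Stop conditions only matter if the agent loop checks them on every turn. Here is a minimal sketch that assumes a hypothetical `agent_step` callable. It returns whether the agent is done, its confidence, and whether it found conflicting evidence. Soft conditions end the run with an escalation instead of an answer, and the turn limit is a hard stop.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TurnResult:
    done: bool
    confidence: float
    conflicting_evidence: bool = False


def run_agent_loop(
    agent_step: Callable[[int], TurnResult],  # hypothetical: one agent turn
    max_turns: int,
    min_confidence: float = 0.7,  # assumed threshold
) -> str:
    for turn in range(max_turns):
        result = agent_step(turn)
        if result.conflicting_evidence:
            return "escalate:conflicting_evidence"  # soft: hand to a human
        if result.done:
            if result.confidence < min_confidence:
                return "escalate:low_confidence"  # soft: hand to a human
            return "completed"
    return "stopped:max_turns_reached"  # hard: the loop cannot continue
```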
16. Non-Determinism Management
LLM systems are not fully deterministic, but you can reduce uncontrolled variation.
Control variables:
- model version;
- prompt version;
- tool version;
- retrieval index version;
- context assembly version;
- temperature and sampling settings;
- output schema;
- validator version;
- policy version;
- memory snapshot;
- seed if supported;
- runtime configuration.
For every important run, record a run manifest.
```python
from pydantic import BaseModel


class RunManifest(BaseModel):
    run_id: str
    model: str
    model_version: str | None = None  # requires Python 3.10+ union syntax
    prompt_version: str
    tool_versions: dict[str, str]
    retrieval_index_version: str | None = None
    policy_version: str
    autonomy_contract_version: str
    state_schema_version: str
    temperature: float
    created_at: str
```
You may not reproduce the exact same token sequence, but you can reproduce the conditions closely enough for forensic analysis.
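To compare runs quickly, the manifest's condition fields can be hashed into a fingerprint. Here is a minimal sketch using only the standard library. It excludes `run_id` and `created_at` because they differ on every run. It assumes the manifest is available as a plain dict, for example from `model_dump()` on the model above.

```python
import hashlib
import json

# Fields that are unique per run and say nothing about run conditions.
VOLATILE_FIELDS = {"run_id", "created_at"}


def conditions_fingerprint(manifest: dict) -> str:
    """SHA-256 over the run conditions, with keys sorted for a stable encoding."""
    stable = {k: v for k, v in manifest.items() if k not in VOLATILE_FIELDS}
    canonical = json.dumps(stable, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Equal fingerprints mean two runs were recorded under the same conditions. A mismatch means at least one condition changed, and a field-by-field diff of the two manifests shows which one.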
17. Autonomy and Evaluation
The more autonomy you allow, the stronger your evaluation harness must be.
| Autonomy Level | Evaluation Requirement |
|---|---|
| 0-1 | Unit tests |
| 2 | Workflow tests |
| 3 | Classifier accuracy + confusion matrix |
| 4 | Recommendation quality rubric |
| 5 | Tool-use simulation |
| 6 | Multi-agent scenario tests |
| 7 | Red-team / adversarial evaluation |
| 8 | Continuous monitoring + human audit |
Agent autonomy increases the test surface.
A deterministic function has an input-output contract.
An autonomous agent has:
- input-output behavior;
- tool behavior;
- state behavior;
- budget behavior;
- refusal behavior;
- escalation behavior;
- recovery behavior;
- policy behavior.
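For a level-3 classifier, the evaluation requirement in the table above comes down to counting. Here is a minimal sketch that computes a confusion matrix and accuracy from labeled examples. The labels are hypothetical, and a production harness would also report per-class precision and recall.

```python
from collections import Counter


def confusion_matrix(expected: list, predicted: list) -> Counter:
    """Count (expected, predicted) pairs; off-diagonal pairs are errors."""
    if len(expected) != len(predicted):
        raise ValueError("label lists must have equal length")
    return Counter(zip(expected, predicted))


def accuracy(matrix: Counter) -> float:
    total = sum(matrix.values())
    correct = sum(n for (e, p), n in matrix.items() if e == p)
    return correct / total if total else 0.0


expected = ["fraud", "billing", "fraud", "other"]
predicted = ["fraud", "fraud", "fraud", "other"]
matrix = confusion_matrix(expected, predicted)
print(accuracy(matrix))  # 0.75
```

The off-diagonal cell often matters more than the headline number. Here the single error sends a billing complaint down the fraud path.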
18. Autonomy Drift
Autonomy drift happens when a system becomes more autonomous over time without formal design approval.
Examples:
- a read-only tool becomes a write tool;
- a draft action becomes an auto-send action;
- a recommendation becomes an automatic state transition;
- an analyst approval step is removed because “the agent is usually right”;
- a prompt gains language telling the agent to decide.
Autonomy drift is dangerous because it often happens gradually.
Drift Controls
- version autonomy contracts;
- require architecture review for authority changes;
- log all side-effect capability changes;
- separate read/write tools;
- run regression evaluation after prompt updates;
- maintain a decision-rights matrix;
- review production traces.
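Versioned contracts make drift detectable by diffing. Here is a minimal sketch that assumes each contract version is reduced to a mapping from tool name to mode. It also assumes the modes listed in `ToolPermission` are ordered read < draft < write < execute. Any new tool or any mode escalation is flagged for architecture review.

```python
# Assumed ordering of the ToolPermission modes, least to most authority.
MODE_RANK = {"read": 0, "draft": 1, "write": 2, "execute": 3}


def authority_escalations(old: dict, new: dict) -> list[str]:
    """List tool changes that increase authority between two contract versions."""
    findings = []
    for tool, mode in sorted(new.items()):
        if tool not in old:
            findings.append(f"new tool: {tool} ({mode})")
        elif MODE_RANK[mode] > MODE_RANK[old[tool]]:
            findings.append(f"escalated: {tool} {old[tool]} -> {mode}")
    return findings
```

If this check runs in a release pipeline, a non-empty result can block the change until a reviewer signs off. That turns "require architecture review for authority changes" into an enforced rule.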
19. Human-in-the-Loop Is Not a Patch
Human review should be designed, not bolted on.
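Bad pattern: the agent finishes its work, and a human is asked to approve the raw output after the fact.
Better pattern: the workflow pauses at a designed approval step, before any side effect, and gives the reviewer a structured package to decide on.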
A human reviewer needs:
- proposed action;
- rationale;
- evidence references;
- confidence;
- policy basis;
- alternatives;
- known uncertainty;
- previous similar decisions;
- side-effect preview.
Do not ask humans to approve opaque agent output. Give them a decision package.
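A decision package can be a typed record that the system refuses to route to a reviewer until it is complete. Here is a minimal sketch whose field names follow the list above. The completeness rule is an assumption: no empty required text field, at least one evidence reference, and a confidence in [0, 1].

```python
from dataclasses import dataclass, field


@dataclass
class DecisionPackage:
    proposed_action: str
    rationale: str
    evidence_refs: list
    confidence: float
    policy_basis: str
    side_effect_preview: str
    alternatives: list = field(default_factory=list)
    known_uncertainty: str = ""
    similar_decisions: list = field(default_factory=list)


# Assumed minimum for a package to be reviewable.
REQUIRED_TEXT = ("proposed_action", "rationale", "policy_basis", "side_effect_preview")


def missing_for_review(pkg: DecisionPackage) -> list[str]:
    """Return what is missing; an empty list means the package can go to a human."""
    missing = [name for name in REQUIRED_TEXT if not getattr(pkg, name).strip()]
    if not pkg.evidence_refs:
        missing.append("evidence_refs")
    if not 0.0 <= pkg.confidence <= 1.0:
        missing.append("confidence")
    return missing
```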
20. Autonomy in Multi-Agent Systems
Multi-agent systems multiply autonomy.
If one agent has 5 possible actions and another has 5 possible actions, the combined execution space is not 10. It can become combinatorial because actions interact through state.
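A rough count makes the point. Assume two agents that each pick one of 5 actions per turn, in lockstep, with every action always available. This is a simplification, since real systems prune many paths. The joint choices per turn are then 5 × 5 = 25, and the number of joint sequences grows exponentially with the number of turns.

```python
import math


def joint_sequences(actions_per_agent: list[int], turns: int) -> int:
    """Distinct joint action sequences over `turns` lockstep turns."""
    return math.prod(actions_per_agent) ** turns


print(joint_sequences([5, 5], turns=1))  # 25
print(joint_sequences([5, 5], turns=3))  # 15625
```

Adding a third five-action agent raises the per-turn count to 125. That is why narrowing specialist action spaces pays off quickly.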
Multi-Agent Autonomy Rules
- Specialists should have narrower autonomy than the supervisor.
- Shared state mutation should be restricted.
- Agents should not grant tools to other agents.
- Peer-to-peer communication should be logged.
- Disagreement should become structured state.
- Final authority should be centralized or explicitly adjudicated.
- Side effects should be outside specialist agents.
21. Autonomy Smells
Watch for these design smells.
| Smell | Why It Is Dangerous |
|---|---|
| “The agent decides what to do next” | No bounded action space |
| “The agent has access to all tools” | No least privilege |
| “We trust the model” | Trust is not a control |
| “Human can check later” | Too late for irreversible effects |
| “It usually works” | No evaluation discipline |
| “We log the prompt” | Not enough for audit |
| “It can update memory freely” | Memory poisoning risk |
| “Agents talk to each other until done” | Loop/cost risk |
| “The prompt says not to do unsafe things” | Prompt is not enforcement |
22. Enterprise Autonomy Review Checklist
Before increasing autonomy, answer:
- What new decision right is being granted?
- Is the action reversible?
- What is the worst credible failure?
- Which policy applies?
- Which human role is accountable?
- What evidence must exist?
- What validator checks the output?
- What telemetry proves correct behavior?
- What budget limits execution?
- What stop condition prevents loops?
- How is rollback or compensation handled?
- How will this be evaluated before release?
- What production metric signals drift?
If you cannot answer these, do not increase autonomy.
23. Python Control Wrapper
A simple execution wrapper can enforce autonomy.
```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any


class AutonomyViolation(Exception):
    pass


async def run_with_autonomy_budget(
    *,
    contract: AgentAutonomyContract,
    run_fn: Callable[[], Awaitable[Any]],
) -> Any:
    budget = contract.budget
    # Refuse a contract that grants side effects without requiring approval.
    if budget.can_trigger_side_effects and not budget.requires_human_approval:
        raise AutonomyViolation("Agent cannot trigger side effects without approval.")
    try:
        # Enforce the wall-time budget by cancelling the run, not by measuring it afterwards.
        return await asyncio.wait_for(run_fn(), timeout=budget.max_wall_time_ms / 1000)
    except asyncio.TimeoutError:
        raise AutonomyViolation("Agent exceeded wall-time budget.") from None
```
A production version would also track:
- token usage;
- cost;
- tool calls;
- state mutations;
- model calls;
- policy decisions;
- output validation;
- trace spans.
24. Case Study: Enforcement Notice Drafting
Suppose an agent helps draft a regulatory enforcement notice.
Dangerous Design
The agent reads evidence and sends the notice directly.
Problems:
- no policy gate;
- no human approval;
- no evidence validation;
- no draft review;
- no state transition control;
- no audit package;
- unclear accountability.
Safer Design
In the safer design, the agent only drafts. The system decides whether the draft can proceed.
Authority Assignment
| Component | Authority |
|---|---|
| Evidence Validator | determine evidence completeness |
| Drafting Agent | draft text only |
| Output Validator | enforce structure |
| Policy Gate | determine approval requirement |
| Human Reviewer | approve/reject |
| Send Notice Service | execute approved side effect |
| Audit Logger | record all decisions |
This is enterprise-grade autonomy placement.
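The authority assignment can be wired as a pipeline in which only the final deterministic step touches the outside world. Here is a minimal sketch in which every component is a hypothetical stand-in. Plain functions act as the validators. A `reviewer_approves` callback plays the Human Reviewer, and a `sent` list stands in for the Send Notice Service. The output rule, that the draft must cite every required evidence id, is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class NoticeRun:
    audit: list = field(default_factory=list)  # Audit Logger: (component, decision) pairs
    sent: list = field(default_factory=list)   # stand-in for the Send Notice Service

    def log(self, component: str, decision: str) -> None:
        self.audit.append((component, decision))


def process_notice(
    run: NoticeRun,
    evidence: list,
    required_evidence: set,
    draft_fn: Callable[[list], str],           # Drafting Agent: returns text only
    reviewer_approves: Callable[[str], bool],  # Human Reviewer
) -> str:
    # Evidence Validator: deterministic completeness check.
    if not required_evidence <= set(evidence):
        run.log("evidence_validator", "incomplete")
        return "blocked:evidence"
    run.log("evidence_validator", "complete")

    draft = draft_fn(evidence)
    run.log("drafting_agent", "drafted")

    # Output Validator: assumed rule that the draft cites every required evidence id.
    if not all(ref in draft for ref in required_evidence):
        run.log("output_validator", "rejected")
        return "blocked:output"
    run.log("output_validator", "valid")

    # Policy Gate: an external, irreversible notice always requires approval.
    run.log("policy_gate", "require_approval")
    if not reviewer_approves(draft):
        run.log("human_reviewer", "rejected")
        return "rejected"
    run.log("human_reviewer", "approved")

    # Send Notice Service: the only component that performs the side effect.
    run.sent.append(draft)
    run.log("send_notice_service", "sent")
    return "sent"
```

Every early return leaves `sent` empty, so a failure anywhere upstream can only block the notice, never send it.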
25. What Top 1% Engineers Pay Attention To
Top engineers ask:
- Why is this step agentic?
- What happens if the model is wrong?
- What happens if the model is right for the wrong reason?
- What state can the agent mutate?
- What tools can it call?
- What irreversible action can it trigger?
- What authority does it actually have?
- What stops it?
- What validates it?
- What records it?
- What replays it?
- What human role owns the risk?
- What metric shows autonomy drift?
They are not anti-agent. They are anti-unbounded-authority.
26. Summary
In this part, we covered:
- determinism-autonomy spectrum;
- deterministic shell and probabilistic core;
- autonomy budgets;
- authority levels;
- reversibility matrix;
- decision rights;
- policy gates;
- stop conditions;
- non-determinism management;
- autonomy drift;
- human-in-the-loop design;
- multi-agent autonomy risks;
- enterprise autonomy review.
The key principle:
Use autonomy for judgment. Use determinism for authority.
The next part will go deeper into stateful runtime design: sessions, threads, checkpoints, hydration, resume, and long-running execution.