Part 019 — Human-in-the-Loop Control Points

Human-in-the-loop is not a button that says “approve.”

It is a durable control point in a stateful system, with context, evidence, decision rights, policy, audit, timeout, escalation, and resume semantics.

Many teams add human review late, after the system has already acted.

That is not human-in-the-loop. That is post-hoc inspection.

Enterprise-grade human-in-the-loop means the system is explicitly designed to pause before authority crosses a risk boundary.

This part explains how to design human control points for stateful multi-agent systems.

1. Kaufman Framing

Using Kaufman's method, we deconstruct this skill into:

identify where human control is needed;
classify control type: approval, review, override, escalation, exception handling;
produce decision packages humans can actually evaluate;
persist interrupts durably;
resume safely after decision;
record decisions immutably;
prevent approval bypass;
handle timeout and delegation;
measure human review quality;
avoid human rubber-stamping.

Target Performance

By the end of this part, you should be able to:

design human-in-the-loop points as runtime states;
distinguish human-in-the-loop, human-on-the-loop, and human-out-of-the-loop;
create typed approval and decision package models;
design interrupt/resume flows;
place human review before high-impact side effects;
implement audit-ready approval events;
handle reviewer roles and authorization;
model timeout, escalation, override, and rejection;
prevent human review from becoming a fake control.

2. Human Control Modes

Human involvement is not one thing.

Mode	Meaning	Example
Human-in-the-loop	human must approve before action continues	approve notice before send
Human-on-the-loop	system acts, human monitors/overrides	monitor low-risk auto-classification
Human-out-of-the-loop	no human involved	deterministic low-risk transformation
Human-over-the-loop	human sets policy/thresholds, reviews aggregate performance	governance board reviews metrics
Human-after-the-loop	post-hoc audit	monthly sample review

In regulated or high-impact systems, state precisely which of these modes applies to each action.

Control Mode by Risk

Risk	Suggested Control
low, reversible	automated + monitoring
medium, reviewable	human-on-the-loop or sampled review
high, externally visible	human-in-the-loop
critical, irreversible	human-led workflow with agent assistance
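
The table above can be encoded as a lookup so that the control mode is chosen by policy rather than by the agent. The following is a minimal sketch, not a prescribed implementation: the names `RiskLevel`, `ControlMode`, and `control_mode_for` are illustrative, and the mapping simply mirrors the table (the medium row picks human-on-the-loop over sampled review as an assumption).

```python
from enum import Enum


class RiskLevel(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


class ControlMode(str, Enum):
    AUTOMATED_WITH_MONITORING = "automated_with_monitoring"
    HUMAN_ON_THE_LOOP = "human_on_the_loop"
    HUMAN_IN_THE_LOOP = "human_in_the_loop"
    HUMAN_LED = "human_led"


# Mirrors the "Control Mode by Risk" table; adjust to local policy.
CONTROL_BY_RISK = {
    RiskLevel.LOW: ControlMode.AUTOMATED_WITH_MONITORING,
    RiskLevel.MEDIUM: ControlMode.HUMAN_ON_THE_LOOP,
    RiskLevel.HIGH: ControlMode.HUMAN_IN_THE_LOOP,
    RiskLevel.CRITICAL: ControlMode.HUMAN_LED,
}


def control_mode_for(risk: RiskLevel) -> ControlMode:
    """Return the suggested control mode; anything unmapped fails closed."""
    return CONTROL_BY_RISK.get(risk, ControlMode.HUMAN_LED)
```

Defaulting to the strictest mode is a design choice: a risk level the policy does not know about should not silently become automated.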

3. The Human Control Boundary

Humans should be inserted at authority transitions, not randomly.

Authority transitions include:

draft to send;
recommendation to decision;
internal note to official notice;
risk score proposal to domain field update;
evidence summary to legal/regulatory conclusion;
reversible action to irreversible action;
low-risk workflow to high-risk escalation.

A useful rule:

Put humans where authority changes, not where the model merely thinks.

4. Approval Is a Command

Approval should not be a chat message.

Bad:

User: approved

Better:

from enum import Enum
from pydantic import BaseModel, Field


class HumanDecisionType(str, Enum):
    APPROVE = "approve"
    REJECT = "reject"
    REVISE = "revise"
    ESCALATE = "escalate"
    REQUEST_MORE_EVIDENCE = "request_more_evidence"


class HumanDecisionCommand(BaseModel):
    command_id: str
    tenant_id: str
    decision_package_id: str
    reviewer_id: str
    decision: HumanDecisionType
    comment: str | None = None
    expected_package_version: int
    idempotency_key: str

Approval is an explicit command with:

reviewer identity;
package version;
decision type;
idempotency;
comment;
authorization;
audit trail.
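
The idempotency key lets the service treat a repeated submission as the same decision instead of a second one. The sketch below illustrates the idea with an in-memory store. `Decision` and `InMemoryDecisionStore` are hypothetical names, and a production system would more likely rely on a unique constraint in a durable database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    idempotency_key: str
    decision_package_id: str
    reviewer_id: str
    decision: str


class InMemoryDecisionStore:
    """Hypothetical store used only to illustrate idempotent recording."""

    def __init__(self) -> None:
        self._by_key: dict[str, Decision] = {}

    def record(self, decision: Decision) -> tuple[Decision, bool]:
        """Return (stored decision, created). A repeated key is a no-op."""
        existing = self._by_key.get(decision.idempotency_key)
        if existing is not None:
            # Same key with a different payload is a client error, not a retry.
            if existing != decision:
                raise ValueError("Idempotency key reused with a different payload.")
            return existing, False
        self._by_key[decision.idempotency_key] = decision
        return decision, True
```

A double click on "approve" then produces one recorded decision, while an attempt to reuse the key for a different decision is rejected.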

5. Decision Package

A reviewer needs a structured package.

class DecisionPackage(BaseModel):
    decision_package_id: str
    tenant_id: str
    run_id: str
    subject_type: str
    subject_id: str
    title: str
    proposed_action: str
    proposed_by: str
    rationale: str
    evidence_refs: list[str]
    risk_level: str
    confidence: float = Field(ge=0.0, le=1.0)
    known_uncertainties: list[str] = Field(default_factory=list)
    alternatives: list[str] = Field(default_factory=list)
    policy_basis: list[str] = Field(default_factory=list)
    side_effect_preview: dict = Field(default_factory=dict)
    version: int

What a Good Decision Package Contains

proposed action;
rationale;
evidence references;
risk classification;
confidence and uncertainty;
policy basis;
alternatives;
side-effect preview;
downstream impact;
prior similar decisions if allowed;
reviewer options;
deadline;
escalation path.

Humans should not approve opaque text.
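
One way to keep opaque packages away from reviewers is to check completeness before a review task is created. The following sketch assumes a reduced package shape (`PackageSummary`) and a hypothetical helper `package_gaps`. The required fields and the set of risk labels are assumptions to adapt to local policy.

```python
from dataclasses import dataclass, field


@dataclass
class PackageSummary:
    proposed_action: str
    rationale: str
    evidence_refs: list[str] = field(default_factory=list)
    policy_basis: list[str] = field(default_factory=list)
    risk_level: str = ""


def package_gaps(pkg: PackageSummary) -> list[str]:
    """List the fields that are missing before the package reaches a reviewer."""
    gaps = []
    if not pkg.proposed_action.strip():
        gaps.append("proposed_action")
    if not pkg.rationale.strip():
        gaps.append("rationale")
    if not pkg.evidence_refs:
        gaps.append("evidence_refs")
    if not pkg.policy_basis:
        gaps.append("policy_basis")
    if pkg.risk_level not in {"low", "medium", "high", "critical"}:
        gaps.append("risk_level")
    return gaps
```

An empty result means the package meets the minimum bar; a non-empty result can route the package back to the supervisor instead of to a human.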

6. Human Interrupt as Runtime State

Human review should pause execution durably.

class HumanInterruptStatus(str, Enum):
    PENDING = "pending"
    DECIDED = "decided"
    EXPIRED = "expired"
    CANCELLED = "cancelled"
    ESCALATED = "escalated"


class HumanInterrupt(BaseModel):
    interrupt_id: str
    run_id: str
    thread_id: str
    tenant_id: str
    reason: str
    required_role: str
    decision_package_id: str
    status: HumanInterruptStatus
    created_at: str
    expires_at: str | None = None

Interrupt Flow

The runtime is paused, not forgotten.
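
Treating the interrupt as runtime state implies that only certain status changes are legal. The sketch below shows one possible transition table. The enum is redeclared so the snippet runs on its own, and the specific rules (for example, that an escalated interrupt can still be decided) are assumptions, not something the statuses themselves dictate.

```python
from enum import Enum


class HumanInterruptStatus(str, Enum):
    PENDING = "pending"
    DECIDED = "decided"
    EXPIRED = "expired"
    CANCELLED = "cancelled"
    ESCALATED = "escalated"


S = HumanInterruptStatus

# Assumed rules: terminal states (decided, expired, cancelled) never move again.
ALLOWED_TRANSITIONS = {
    S.PENDING: {S.DECIDED, S.EXPIRED, S.CANCELLED, S.ESCALATED},
    S.ESCALATED: {S.DECIDED, S.EXPIRED, S.CANCELLED},
    S.DECIDED: set(),
    S.EXPIRED: set(),
    S.CANCELLED: set(),
}


def transition(current: HumanInterruptStatus, target: HumanInterruptStatus) -> HumanInterruptStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"Illegal interrupt transition: {current.value} -> {target.value}")
    return target
```

Enforcing the table at the persistence layer prevents, for instance, a late decision from reviving an interrupt that has already expired.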

7. Resume After Human Decision

Resume must be deterministic.

Resume Invariants

Resume starts from latest valid checkpoint.
Decision package version matches expected version.
Reviewer is authorized.
Decision is recorded before action.
Approved action matches package.
Side effects use idempotency.
Rejection/revision/escalation has explicit path.
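
Several of these invariants can be checked mechanically before the runtime resumes. The following is a rough sketch under the assumption that the caller has already gathered the relevant facts into a `ResumeRequest`. That type and `resume_violations` are illustrative names, and checkpoint validity and idempotent side effects are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResumeRequest:
    package_version: int
    expected_package_version: int
    reviewer_authorized: bool
    decision_recorded: bool
    approved_action: str
    package_action: str


def resume_violations(req: ResumeRequest) -> list[str]:
    """Return the invariants that fail; an empty list means resume may proceed."""
    violations = []
    if req.package_version != req.expected_package_version:
        violations.append("package version mismatch")
    if not req.reviewer_authorized:
        violations.append("reviewer not authorized")
    if not req.decision_recorded:
        violations.append("decision not recorded before action")
    if req.approved_action != req.package_action:
        violations.append("approved action differs from package")
    return violations
```

Returning every violation, rather than stopping at the first, gives the audit log a complete picture of why a resume was refused.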

8. Reviewer Authorization

Not every human can approve every action.

class ReviewerAuthorization(BaseModel):
    reviewer_id: str
    roles: list[str]
    tenant_id: str
    scopes: list[str]
    max_risk_level: str

Approval check:

def can_approve_notice(reviewer: ReviewerAuthorization, risk_level: str) -> bool:
    if "notice:approve" not in reviewer.scopes:
        return False

    if risk_level in {"high", "critical"} and "senior_reviewer" not in reviewer.roles:
        return False

    return True

Human identity and role checks belong outside the model.
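
The `max_risk_level` field suggests a ceiling on what a reviewer may approve, which the check above does not yet use. One possible way to apply it is sketched here. The ordering of labels in `RISK_ORDER` and the helper name `within_risk_ceiling` are assumptions.

```python
# Assumed ordering of risk labels, from least to most severe.
RISK_ORDER = ["low", "medium", "high", "critical"]


def within_risk_ceiling(max_risk_level: str, risk_level: str) -> bool:
    """True if risk_level does not exceed the reviewer's ceiling.

    Unknown labels fail closed, so a typo never grants authority.
    """
    if max_risk_level not in RISK_ORDER or risk_level not in RISK_ORDER:
        return False
    return RISK_ORDER.index(risk_level) <= RISK_ORDER.index(max_risk_level)
```

A caller could combine this with the scope and role checks, approving only when all three pass.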

9. Approval Event

Record approval as an immutable event.

class HumanDecisionEvent(BaseModel):
    event_id: str
    event_type: str = "human.decision_recorded"
    event_version: str = "1.0"
    tenant_id: str
    run_id: str
    interrupt_id: str
    decision_package_id: str
    reviewer_id: str
    decision: HumanDecisionType
    comment: str | None = None
    policy_version: str
    occurred_at: str

Audit should show:

who decided;
what they saw;
what version they saw;
when they decided;
what policy applied;
what action followed.

10. Approval Is Not Execution

Approval and execution are separate.

A human approving a notice does not mean the notice was sent.

The notification service emits notice.sent after successful execution.

This distinction prevents audit confusion.
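
Because approval and execution are separate events, the state of a notice can be derived from the event log instead of from a single flag. The sketch below assumes events are dictionaries with an `event_type` key and, for decisions, a `decision` key. The state names and the function `notice_state` are hypothetical.

```python
def notice_state(events: list[dict]) -> str:
    """Derive the notice state from recorded events."""
    approved = any(
        e["event_type"] == "human.decision_recorded" and e.get("decision") == "approve"
        for e in events
    )
    sent = any(e["event_type"] == "notice.sent" for e in events)

    if sent and not approved:
        return "sent_without_approval"  # an audit finding, not a normal state
    if sent:
        return "sent"
    if approved:
        return "approved_not_sent"
    return "pending"
```

The `approved_not_sent` state is exactly the case an auditor needs to see: a human approved the notice, but the notification service has not yet confirmed delivery.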

11. Override

Override allows a human to intentionally bypass or modify a system recommendation.

Overrides need stricter controls.

class HumanOverrideCommand(BaseModel):
    command_id: str
    tenant_id: str
    decision_package_id: str
    reviewer_id: str
    original_recommendation: str
    override_decision: str
    override_reason: str
    evidence_refs: list[str]
    expected_package_version: int

Override Rules

require reason;
require higher authority for high risk;
record original recommendation;
record override rationale;
optionally trigger a sample review;
never silently replace system output.

Overrides are valuable feedback for evaluation and calibration.

12. Rejection and Revision

Human decisions should support more than approve/reject.

Decision	Meaning
approve	action may proceed
reject	action must not proceed
revise	return to drafting/planning
request more evidence	route to evidence collection
escalate	send to higher authority
defer	wait until external condition
override	choose different action

A binary approve/reject workflow is often too weak.
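
Each decision in the table should lead to an explicit next step in the workflow. A minimal routing sketch follows. The step names on the right-hand side are hypothetical placeholders for whatever nodes or handlers the runtime actually defines.

```python
# Decision values follow the table above; step names are illustrative only.
NEXT_STEP = {
    "approve": "execute_action",
    "reject": "close_without_action",
    "revise": "return_to_drafting",
    "request_more_evidence": "collect_evidence",
    "escalate": "route_to_higher_authority",
    "defer": "wait_for_external_condition",
    "override": "execute_alternative_action",
}


def next_step(decision: str) -> str:
    """Map a human decision to the next workflow step; unknown decisions raise."""
    try:
        return NEXT_STEP[decision]
    except KeyError:
        raise ValueError(f"No route defined for decision: {decision!r}") from None
```

Raising on an unknown decision keeps a new decision type from falling through to a default path such as execution.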

13. Human Review Queue

A review queue is a system component.

class ReviewTask(BaseModel):
    review_task_id: str
    tenant_id: str
    interrupt_id: str
    decision_package_id: str
    required_role: str
    priority: str
    status: str
    assigned_to: str | None = None
    created_at: str
    due_at: str | None = None

Queue features:

role-based assignment;
priority;
SLA/deadline;
escalation;
workload balancing;
audit;
conflict-of-interest checks;
reviewer comments;
reassignment.

Do not implement human review only as a chat prompt.

14. Timeout and Escalation

Human review can expire.

Timeout Policy

class ReviewTimeoutPolicy(BaseModel):
    interrupt_type: str
    timeout_minutes: int
    on_timeout: str  # escalate, cancel, defer, auto_reject
    escalation_role: str | None = None

Be careful with auto-approval. It is usually unsafe for high-impact actions.
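
A timeout policy of this kind can be evaluated by a scheduled job that compares the interrupt's creation time with the current time. The sketch below is one possible shape: `timeout_outcome` is a hypothetical helper, timestamps are assumed to be timezone-aware `datetime` values, and the allowed actions are taken from the comment in the policy model, which deliberately excludes auto-approval.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

ALLOWED_ON_TIMEOUT = {"escalate", "cancel", "defer", "auto_reject"}


def timeout_outcome(
    created_at: datetime,
    timeout_minutes: int,
    on_timeout: str,
    now: datetime,
) -> Optional[str]:
    """Return the configured action once the deadline has passed, else None."""
    if on_timeout not in ALLOWED_ON_TIMEOUT:
        raise ValueError(f"Unsupported on_timeout action: {on_timeout!r}")
    deadline = created_at + timedelta(minutes=timeout_minutes)
    return on_timeout if now >= deadline else None
```

For example, a 48-hour review window corresponds to `timeout_minutes=2880`; before the deadline the function returns `None` and the interrupt stays pending.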

15. Separation of Duties

In regulated workflows, the person who requested or drafted an action may not be allowed to approve it.

def separation_of_duties_ok(proposer_id: str, reviewer_id: str) -> bool:
    return proposer_id != reviewer_id

This also applies to AI-assisted workflows:

agent drafts;
user requests;
reviewer approves;
service executes.

For sensitive actions, enforce reviewer independence.

16. Human Review UX

A good UI is part of correctness.

The reviewer should see:

proposed action;
generated output/draft;
evidence references;
policy basis;
uncertainty;
risk level;
side-effect preview;
system confidence;
dissent/conflicts;
previous reviewer comments;
allowed decisions;
consequence of approval.

Bad UI:

Approve this AI output? [Yes] [No]

Good UI:

Approve sending Notice Draft #456 to Entity ABC under Policy P-2026-04?
Evidence: doc_1, doc_7.
Known uncertainty: missing inspection record for May.
Consequence: external legal notice will be sent and case status will move to NOTIFIED.

Human review quality depends on decision context.

17. Rubber-Stamping Risk

Human-in-the-loop can become fake control.

Causes:

too many approvals;
poor decision package;
time pressure;
automation bias;
reviewer lacks domain expertise;
unclear accountability;
no feedback loop;
no sampling of approved decisions;
UI makes approval too easy.

Controls

risk-based routing;
decision package quality checks;
reviewer training;
randomized audit;
require comments for high-risk approvals;
track approval latency;
track override/rejection rates;
compare outcomes after approval;
monitor reviewer variance.
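
Two of the controls above, tracking approval latency and tracking approval rates, can be combined into a simple screening signal. The following sketch flags reviewers who approve nearly everything very quickly. The function name and all thresholds are assumptions for illustration, and a flag should trigger a sampled audit, not a conclusion about the reviewer.

```python
from collections import defaultdict
from statistics import median


def rubber_stamp_suspects(
    decisions: list[tuple[str, str, float]],
    min_count: int = 20,
    min_approval_rate: float = 0.98,
    max_median_seconds: float = 10.0,
) -> list[str]:
    """Flag reviewers with a very high approval rate and very low latency.

    Each decision is (reviewer_id, decision, seconds_to_decide).
    """
    by_reviewer = defaultdict(list)
    for reviewer_id, decision, seconds in decisions:
        by_reviewer[reviewer_id].append((decision, seconds))

    suspects = []
    for reviewer_id, rows in sorted(by_reviewer.items()):
        if len(rows) < min_count:
            continue  # too little data to judge
        approval_rate = sum(d == "approve" for d, _ in rows) / len(rows)
        median_seconds = median(s for _, s in rows)
        if approval_rate >= min_approval_rate and median_seconds <= max_median_seconds:
            suspects.append(reviewer_id)
    return suspects
```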

18. Human Feedback as Data

Human decisions are valuable evaluation data.

Events to collect:

approval;
rejection;
revision request;
override;
escalation;
evidence requested;
comment;
time to decision;
reviewer role;
downstream outcome.

Use this to improve:

routing;
prompts;
agent roles;
evaluation sets;
policy thresholds;
UI;
training examples.

But do not blindly train on all approvals. Human approval may also be wrong.

19. Review Quality Metrics

Metric	Signal
approval rate	may show quality or rubber-stamping
rejection rate	output quality issue or strict review
override rate	system recommendation mismatch
revision rate	drafting/planning quality
escalation rate	uncertainty/risk
time to decision	workload/friction
approval after high uncertainty	possible risk
reviewer disagreement	policy ambiguity
post-approval incident rate	review effectiveness
sampled audit defect rate	true quality signal

Metrics need interpretation: a high approval rate, for example, can indicate either good output or rubber-stamping.
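
The rate metrics in the table can be computed directly from recorded decisions. A small sketch, assuming decisions are available as a list of decision-type strings; `decision_rates` is an illustrative name.

```python
from collections import Counter


def decision_rates(decisions: list[str]) -> dict[str, float]:
    """Return the share of each decision type; empty input yields no rates."""
    total = len(decisions)
    if total == 0:
        return {}
    counts = Counter(decisions)
    return {decision: count / total for decision, count in counts.items()}
```

The numbers are only a starting point and should be read alongside sampled audits and post-approval incident data.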

20. Human-in-the-Loop in Multi-Agent Systems

In multi-agent systems, humans may review:

supervisor decision package;
disagreement artifact;
high-risk dissent;
notice draft;
policy conflict;
tool side-effect preview;
memory update;
final case transition.

Do not ask humans to review every agent message. Ask them to review meaningful control points.

21. Human Review for Memory

Memory writes may require human review when:

memory is sensitive;
memory affects future decisions;
memory is organization-wide;
memory is derived from ambiguous conversation;
memory could encode bias or stale facts.

Example memory approval:

class MemoryApprovalPackage(BaseModel):
    package_id: str
    proposed_memory: str
    subject_type: str
    subject_id: str
    source_refs: list[str]
    sensitivity: str
    retention: str
    proposed_by: str

Not all memory should be automatic.

22. Human Review for Tools

High-impact tools require approval.

Examples:

send external notice;
file regulatory action;
freeze account;
delete data;
execute payment;
update official status;
send customer communication.

The tool executor should enforce approval:

class ToolApprovalRequirement(BaseModel):
    tool_name: str
    effect_type: str
    min_reviewer_role: str
    approval_required: bool

If an agent asks to call a high-impact tool without approval, deny the call.
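
The enforcement point belongs in the tool executor, not in the agent's prompt. The sketch below shows one way such a gate might look. `ToolRequirement`, `Approval`, and `authorize_tool_call` are hypothetical names, and binding the approval to a package version is an assumption that follows the version checks used elsewhere in this part.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToolRequirement:
    tool_name: str
    approval_required: bool


@dataclass(frozen=True)
class Approval:
    tool_name: str
    decision: str
    package_version: int


def authorize_tool_call(
    requirement: ToolRequirement,
    approval: Optional[Approval],
    current_package_version: int,
) -> bool:
    """Allow the call only if no approval is needed or a matching one exists."""
    if not requirement.approval_required:
        return True
    if approval is None:
        return False  # deny by default for high-impact tools
    return (
        approval.tool_name == requirement.tool_name
        and approval.decision == "approve"
        and approval.package_version == current_package_version
    )
```

An approval for an older package version is treated as no approval at all, which closes the stale-approval path.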

23. Human Review for Policy Exceptions

Sometimes a human approves an exception.

Examples:

proceed despite missing non-critical evidence;
override default risk threshold;
reopen closed case;
send notice outside normal SLA.

Exceptions need strong audit.

class PolicyExceptionDecision(BaseModel):
    exception_id: str
    policy_id: str
    requested_exception: str
    approver_id: str
    justification: str
    expires_at: str | None = None

Policy exceptions should not become hidden precedent unless explicitly governed.
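
One way to keep an exception from turning into silent precedent is to honor it only while it is still in force. A minimal sketch, assuming `expires_at` is an ISO 8601 string with an explicit UTC offset and that a missing expiry means the exception stays valid until revoked. Both points are assumptions, and `exception_is_active` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional


def exception_is_active(expires_at: Optional[str], now: datetime) -> bool:
    """True while the exception has not expired."""
    if expires_at is None:
        return True  # assumption: no expiry means valid until revoked
    return now < datetime.fromisoformat(expires_at)
```

A stricter policy could require every exception to carry an expiry and treat `None` as invalid.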

24. Audit Trail

Human control points need an audit chain.

For every high-impact action, audit should answer:

what was proposed?
by whom/what?
based on what evidence?
what did the human see?
who approved?
under what authority?
what happened after approval?
was the action actually executed?
was it later overridden or reversed?

25. Python Review Flow Sketch

class ReviewService:
    async def create_interrupt(
        self,
        *,
        run_id: str,
        thread_id: str,
        tenant_id: str,
        decision_package: DecisionPackage,
        required_role: str,
    ) -> HumanInterrupt:
        interrupt = HumanInterrupt(
            interrupt_id=new_id("hint"),
            run_id=run_id,
            thread_id=thread_id,
            tenant_id=tenant_id,
            reason=f"Human review required for {decision_package.proposed_action}",
            required_role=required_role,
            decision_package_id=decision_package.decision_package_id,
            status=HumanInterruptStatus.PENDING,
            created_at=now_iso(),
        )

        await save_decision_package(decision_package)
        await save_interrupt(interrupt)
        await create_review_task(interrupt)

        return interrupt

    async def submit_decision(
        self,
        command: HumanDecisionCommand,
    ) -> HumanDecisionEvent:
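        # Helpers such as load_decision_package and reviewer_authorized are
        # assumed to exist elsewhere. Deduplication by command.idempotency_key
        # is omitted for brevity.
        package = await load_decision_package(command.decision_package_id)

        # Optimistic concurrency: reject a decision made against a stale package.
        if package.version != command.expected_package_version:
            raise ValueError("Decision package version mismatch.")

        # Authorization is enforced in code, outside the model.
        if not await reviewer_authorized(command.reviewer_id, package):
            raise PermissionError("Reviewer not authorized.")

        event = HumanDecisionEvent(
            event_id=new_id("evt"),
            tenant_id=command.tenant_id,
            run_id=package.run_id,
            interrupt_id=await get_interrupt_id(package.decision_package_id),
            decision_package_id=package.decision_package_id,
            reviewer_id=command.reviewer_id,
            decision=command.decision,
            comment=command.comment,
            policy_version="policy_2026_06",  # placeholder value for the sketch
            occurred_at=now_iso(),
        )

        # Record the decision durably before changing the interrupt state.
        await append_event(event)
        await mark_interrupt_decided(event.interrupt_id)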

        return event

This code is intentionally simplified, but the boundaries are visible.

26. Failure Modes

Failure	Description	Mitigation
approval lost	decision stored only in UI	durable event
wrong reviewer	no authorization check	role/scope validation
stale package approved	package changed after review	expected version
approval bypass	tool executes without approval	tool executor policy
duplicate approval	user submits twice	idempotency key
rubber stamping	humans approve blindly	metrics/audit/risk-based review
opaque package	reviewer lacks evidence	decision package requirements
timeout ignored	pending forever	timeout/escalation policy
approval != execution confusion	audit says approved but not sent	separate events
no rejection path	only approve available	richer decision types
conflict hidden	reviewer sees only final answer	include dissent/conflicts

27. Production Checklist

Before adding a human control point:

28. Practice Drill

Design human-in-the-loop for an enforcement notice workflow.

Requirements:

agent drafts notice;
risk agent flags high risk;
policy agent identifies applicable rule;
supervisor creates decision package;
human reviewer approves/rejects/revises;
notice can only be sent after approval;
approval expires after 48 hours;
reviewer cannot be original requester;
all decisions must be auditable.

Deliverables:

decision package schema;
interrupt schema;
review task schema;
approval command;
human decision event;
timeout policy;
reviewer authorization rule;
separation-of-duties rule;
resume flow;
failure mode tests.

29. What Top 1% Engineers Pay Attention To

Top engineers ask:

Why is a human needed here?
What exactly is the human approving?
What evidence does the human see?
What does approval authorize?
What does approval not authorize?
What happens if the package changes?
What happens if the reviewer is not qualified?
What happens if the reviewer disagrees?
What happens if the review times out?
What happens if approval is duplicated?
What event proves approval happened?
What event proves execution happened?
Is this reducing risk or creating control theater?

They treat human review as a system design problem.

30. Summary

In this part, we covered:

human control modes;
authority boundaries;
approval as command;
decision packages;
human interrupts;
resume after decision;
reviewer authorization;
approval events;
approval vs execution;
override;
rejection and revision;
review queues;
timeout and escalation;
separation of duties;
review UX;
rubber-stamping risk;
feedback as data;
review quality metrics;
human review for multi-agent systems, memory, tools, and policy exceptions;
audit trails;
Python review flow;
failure modes;
production checklist.

The key principle:

Human-in-the-loop is only meaningful when the human has the right context, authority, timing, and audit trail.

The next part begins memory architecture: short-term, long-term, episodic, semantic, and procedural memory.
