Permissioning and Policy Enforcement

Learn Python Enterprise-Grade Stateful Multi-Agent AI Systems - Part 027

Permissioning and policy enforcement for enterprise-grade stateful multi-agent AI systems: PDP/PEP, RBAC, ABAC, ReBAC, risk policy, tool policy, memory policy, decision logs, and policy-as-code.


Part 027 — Permissioning and Policy Enforcement

In enterprise agent systems, the question is not only “Can the model do this?”

The real question is: “Is this actor, agent, tenant, tool, data scope, risk level, and workflow state allowed to do this action right now?”

Permissioning and policy enforcement are the control layer that prevents agentic systems from turning reasoning into unauthorized action.

A prompt can say:

Do not access unauthorized data.

But a prompt is not an authorization system.

A prompt can say:

Do not send external notices without approval.

But a prompt is not a policy engine.

This part explains how to design permissioning and policy enforcement for enterprise-grade stateful multi-agent AI systems.


1. Kaufman Framing

Using Kaufman's framework, we deconstruct this skill into:

  1. identify protected resources and actions;
  2. define actors: user, agent, service, tenant, role;
  3. separate authentication from authorization;
  4. model permissions with RBAC, ABAC, ReBAC, and risk-based policies;
  5. place enforcement points before data/tool/state access;
  6. log policy decisions;
  7. version policies;
  8. test policies;
  9. support simulation/shadow mode;
  10. audit and explain decisions.

Target Performance

By the end of this part, you should be able to:

  • design policy enforcement around agent actions;
  • distinguish policy decision point and policy enforcement point;
  • model tool, resource, memory, graph, RAG, and workflow policies;
  • implement a policy request/decision contract;
  • use RBAC, ABAC, ReBAC, and risk-based policy together;
  • enforce policy outside prompts;
  • log decisions for audit;
  • design policy-as-code workflow;
  • prevent excessive agency;
  • test policy behavior before production rollout.

2. Core Model: PDP and PEP

A common architecture separates:

  • Policy Decision Point (PDP): decides allow/deny/require approval.
  • Policy Enforcement Point (PEP): enforces the decision at runtime.

The model can propose actions. The PEP enforces policy.

Example

Agent says:

{
  "tool": "send_notice",
  "case_id": "case_123"
}

The PEP checks:

  • tool contract;
  • agent role;
  • user identity;
  • tenant;
  • case risk;
  • approval state;
  • workflow state;
  • policy version;
  • idempotency;
  • side-effect type.

Only then can execution continue.


3. Authentication vs Authorization

Authentication answers:

Who are you?

Authorization answers:

What are you allowed to do?

In agent systems, there are several identities:

| Identity | Example |
| --- | --- |
| human user | analyst, reviewer, admin |
| agent role | risk-agent, drafting-agent |
| runtime service | agent-runtime-service |
| tool service | notification-service |
| tenant | organization/customer |
| workload identity | Kubernetes/service account |
| external app | MCP server, connector |

A policy decision often depends on a combination of human + agent + service identity.


4. Actor Model

from enum import Enum
from pydantic import BaseModel, Field


class ActorType(str, Enum):
    USER = "user"
    AGENT = "agent"
    SERVICE = "service"
    SYSTEM = "system"


class Actor(BaseModel):
    actor_type: ActorType
    actor_id: str
    roles: list[str] = Field(default_factory=list)
    scopes: list[str] = Field(default_factory=list)

Composite Actor Context

class AgentActionActorContext(BaseModel):
    user: Actor | None = None
    agent: Actor
    runtime_service: Actor
    tenant_id: str

A runtime action may be caused by a user but executed by an agent through a service.

Audit must preserve the chain.


5. Protected Actions

List actions explicitly.

| Action | Protected? | Why |
| --- | --- | --- |
| read case summary | yes | sensitive data |
| search evidence | yes | data access |
| fetch policy | maybe | internal policy |
| propose risk | yes | affects workflow |
| create draft | yes | artifact creation |
| update case status | yes | domain mutation |
| request approval | yes | workflow impact |
| send notice | yes | external side effect |
| write memory | yes | future behavior |
| traverse graph | yes | relationship disclosure |
| call external API | yes | data exfiltration/side effect |

If an action can affect data, state, cost, security, or people, protect it.


6. Policy Request Contract

class PolicyRequest(BaseModel):
    request_id: str
    tenant_id: str
    run_id: str | None = None
    thread_id: str | None = None
    actor_context: AgentActionActorContext
    action: str
    resource_type: str
    resource_id: str | None = None
    resource_attributes: dict = Field(default_factory=dict)
    environment: dict = Field(default_factory=dict)
    risk_context: dict = Field(default_factory=dict)

The policy request should include enough context to decide.

Examples:

  • action: tool.call
  • resource_type: tool
  • resource_id: send_approved_notice
  • risk_context: case_risk=high
  • environment: workflow_state=waiting_for_approval

7. Policy Decision Contract

class PolicyDecisionType(str, Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"
    REQUIRE_MORE_CONTEXT = "require_more_context"


class PolicyDecision(BaseModel):
    request_id: str
    decision: PolicyDecisionType
    reason: str
    policy_id: str
    policy_version: str
    obligations: list[str] = Field(default_factory=list)
    advice: list[str] = Field(default_factory=list)

Obligations vs Advice

| Type | Meaning |
| --- | --- |
| obligation | must be enforced if action proceeds |
| advice | informational recommendation |

Example obligations:

  • redact field customer_ssn;
  • require approval role senior_reviewer;
  • log audit event;
  • use idempotency key;
  • enforce max 3 tool calls;
  • mask external recipient.

8. RBAC

Role-Based Access Control grants permissions by role.

Example:

| Role | Permission |
| --- | --- |
| analyst | read case, create draft |
| senior_reviewer | approve high-risk notice |
| admin | manage configuration |
| risk-agent | propose risk assessment |
| drafting-agent | create draft artifact |
| notification-service | send approved notice |

RBAC is simple and useful.

But RBAC alone is insufficient for agent systems because context matters.

Example:

  • a senior reviewer may approve high-risk notices;
  • but not for a tenant they do not belong to;
  • not if they were the requester;
  • not if approval package version is stale;
  • not if required evidence is missing.
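The example above can be sketched as a role check followed by contextual checks. This is a minimal sketch, not a full authorization model: the permission strings, the `ApprovalAttempt` fields, and the function names are assumptions chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission map, loosely following the table above.
ROLE_PERMISSIONS = {
    "analyst": {"case:read", "draft:create"},
    "senior_reviewer": {"notice:approve_high_risk"},
}


@dataclass(frozen=True)
class ApprovalAttempt:
    reviewer_id: str
    reviewer_roles: tuple[str, ...]
    reviewer_tenant: str
    case_tenant: str
    requester_id: str
    package_version: int
    current_version: int
    evidence_complete: bool


def has_permission(roles: tuple[str, ...], permission: str) -> bool:
    """Pure RBAC: does any role carry the permission?"""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


def can_approve(attempt: ApprovalAttempt) -> tuple[bool, str]:
    """RBAC first, then the context that RBAC alone cannot express."""
    if not has_permission(attempt.reviewer_roles, "notice:approve_high_risk"):
        return False, "role lacks approval permission"
    if attempt.reviewer_tenant != attempt.case_tenant:
        return False, "reviewer does not belong to the case tenant"
    if attempt.reviewer_id == attempt.requester_id:
        return False, "requester cannot approve their own request"
    if attempt.package_version != attempt.current_version:
        return False, "approval package is stale"
    if not attempt.evidence_complete:
        return False, "required evidence is missing"
    return True, "approved"
```

The role check answers only the first question; every later check depends on attributes or relationships, which is where ABAC and ReBAC come in.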

9. ABAC

Attribute-Based Access Control uses attributes.

Attributes can include:

  • tenant;
  • department;
  • risk level;
  • case status;
  • data sensitivity;
  • time;
  • environment;
  • tool effect type;
  • document authority;
  • workflow state;
  • approval presence;
  • agent role.

Example:

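def can_read_case(actor: Actor, tenant_id: str, case_attrs: dict) -> bool:
    # The actor needs the base read scope.
    if "case:read" not in actor.scopes:
        return False

    # Tenant isolation: a missing tenant attribute fails closed instead of raising.
    if case_attrs.get("tenant_id") != tenant_id:
        return False

    # Restricted cases need an additional scope.
    if case_attrs.get("sensitivity") == "restricted" and "restricted:read" not in actor.scopes:
        return False

    return True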

ABAC is essential for enterprise AI because agent actions are contextual.


10. ReBAC

Relationship-Based Access Control uses relationships.

Examples:

  • user is assigned analyst for case;
  • reviewer belongs to tenant;
  • manager supervises analyst;
  • entity belongs to organization;
  • agent role is allowed for workflow node;
  • document belongs to case;
  • memory belongs to user/team.

ReBAC pairs naturally with knowledge graphs.

Policy:

User can read evidence document if user is assigned to case and document belongs to case.
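One way to picture this policy is as a lookup over relationship tuples. The sketch below keeps the tuples in an in-memory set; a real system would more likely query a graph or a relationship store. The identifiers and relation names are made up for the example.

```python
# Each relationship is a (subject, relation, object) tuple.
Relation = tuple[str, str, str]

# Hypothetical relationship data.
RELATIONS: set[Relation] = {
    ("user:alice", "assigned_to", "case:123"),
    ("doc:ev_9", "belongs_to", "case:123"),
    ("doc:ev_7", "belongs_to", "case:456"),
}


def related(subject: str, relation: str, obj: str) -> bool:
    return (subject, relation, obj) in RELATIONS


def can_read_evidence(user: str, doc: str) -> bool:
    """User may read the document if assigned to a case the document belongs to."""
    cases = {obj for (subj, rel, obj) in RELATIONS if subj == doc and rel == "belongs_to"}
    return any(related(user, "assigned_to", case) for case in cases)
```

A document with no `belongs_to` relationship is unreadable by anyone here, which matches a deny-by-default stance.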


11. Risk-Based Policy

Agent systems need risk-aware policy.

| Risk | Example Policy |
| --- | --- |
| low | allow automated draft |
| medium | allow recommendation, require review for mutation |
| high | require senior approval |
| critical | human-led workflow only |

def decide_by_risk(action: str, risk_level: str) -> PolicyDecisionType:
    if action == "external_notification":
        return PolicyDecisionType.REQUIRE_APPROVAL

    if risk_level in {"high", "critical"} and action == "domain_mutation":
        return PolicyDecisionType.REQUIRE_APPROVAL

    if risk_level == "critical" and action == "auto_decide":
        return PolicyDecisionType.DENY

    return PolicyDecisionType.ALLOW

Risk policy should be deterministic and auditable.


12. Tool Policy

Tool policy combines:

  • tool effect type;
  • agent grant;
  • user scopes;
  • tenant;
  • risk;
  • workflow state;
  • approval status;
  • idempotency.

Example Policy

| Tool | Policy |
| --- | --- |
| search_case_evidence | allowed if user/agent can read case evidence |
| create_notice_draft | allowed if case is under review |
| send_approved_notice | requires approval event and idempotency |
| delete_evidence | deny to agents |
| grant_tool_access | deny to agents |

13. Resource Policy

Resource policy controls data access.

Examples:

  • RAG document retrieval;
  • MCP resource read;
  • memory retrieval;
  • graph traversal;
  • artifact read;
  • conversation history access.

Resource policy should apply before data enters model context.

Never retrieve unauthorized data and rely on prompt instructions not to use it.
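A minimal sketch of that ordering follows: retrieved chunks are filtered by policy first, and only the survivors are assembled into model context. `RetrievedChunk`, `authorize_chunks`, and `build_context` are hypothetical names, and the two checks are deliberately simple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedChunk:
    doc_id: str
    tenant_id: str
    sensitivity: str
    text: str


def authorize_chunks(
    chunks: list[RetrievedChunk], tenant_id: str, scopes: set[str]
) -> list[RetrievedChunk]:
    """Drop every chunk the caller may not see before any prompt is built."""
    allowed = []
    for chunk in chunks:
        if chunk.tenant_id != tenant_id:
            continue  # tenant isolation
        if chunk.sensitivity == "restricted" and "restricted:read" not in scopes:
            continue  # missing scope for restricted material
        allowed.append(chunk)
    return allowed


def build_context(chunks: list[RetrievedChunk]) -> str:
    """Only already-authorized chunks should ever be passed in here."""
    return "\n".join(chunk.text for chunk in chunks)
```

In practice the filter is often pushed down into the retrieval query itself, so unauthorized documents are never fetched at all; the post-filter shown here is the simplest form of the same idea.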


14. Memory Policy

Memory policy controls:

  • read;
  • write;
  • update;
  • supersede;
  • forget;
  • scope promotion;
  • influence level.

Example:

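def can_agent_write_memory(memory_scope: str, sensitivity: str, agent_role: str) -> PolicyDecisionType:
    # Restricted content is never written to memory by an agent.
    if sensitivity == "restricted":
        return PolicyDecisionType.DENY

    # Broad scopes always go through approval. This check runs before the role
    # check, so any agent role can propose a broad-scope write for review.
    if memory_scope in {"tenant", "domain", "global"}:
        return PolicyDecisionType.REQUIRE_APPROVAL

    # Narrow-scope writes are limited to designated agent roles.
    if agent_role not in {"supervisor-agent", "memory-curator-agent"}:
        return PolicyDecisionType.DENY

    return PolicyDecisionType.ALLOW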

Memory affects future behavior, so memory policy is critical.


15. Workflow Policy

Workflow policy controls state transitions.

Example:

Policy:

  • drafting agent can create draft;
  • reviewer can approve;
  • notification service can send only approved draft;
  • agent cannot move directly from drafted to sent.

class TransitionPolicyRequest(BaseModel):
    from_state: str
    to_state: str
    actor_roles: list[str]
    approval_id: str | None = None
    risk_level: str

16. Policy Enforcement Points

Place PEPs at every sensitive boundary.

| Boundary | PEP |
| --- | --- |
| API request | API gateway/application |
| agent tool call | tool executor |
| resource retrieval | retrieval service |
| memory read/write | memory service |
| graph traversal | graph service |
| workflow transition | orchestrator |
| command handler | domain service |
| side effect | integration service |
| prompt/resource discovery | capability resolver |

Do not have only one policy check at request start. State changes over time.


17. Policy Decision Logging

Every important decision should be logged.

class PolicyDecisionLog(BaseModel):
    decision_log_id: str
    request_id: str
    tenant_id: str
    run_id: str | None
    action: str
    resource_type: str
    resource_id: str | None
    decision: PolicyDecisionType
    reason: str
    policy_id: str
    policy_version: str
    actor_summary: dict
    created_at: str

Decision logs support:

  • audit;
  • debugging;
  • incident response;
  • policy tuning;
  • explainability;
  • compliance.
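
As a sketch of how such a record might be produced, the function below turns a request and a decision into one JSON line suitable for an append-only log. It takes plain dictionaries rather than the Pydantic models to stay dependency-free; the dictionary keys and the `build_decision_log` name are assumptions for this example.

```python
import json
from datetime import datetime, timezone
from uuid import uuid4


def build_decision_log(request: dict, decision: dict) -> str:
    """Serialize one policy decision as a JSON line."""
    actor = request["actor_context"]
    record = {
        "decision_log_id": str(uuid4()),
        "request_id": request["request_id"],
        "tenant_id": request["tenant_id"],
        "run_id": request.get("run_id"),
        "action": request["action"],
        "resource_type": request["resource_type"],
        "resource_id": request.get("resource_id"),
        "decision": decision["decision"],
        "reason": decision["reason"],
        "policy_id": decision["policy_id"],
        "policy_version": decision["policy_version"],
        # Preserve the chain: who asked, which agent acted, via which service.
        "actor_summary": {
            "user_id": actor.get("user_id"),
            "agent_id": actor["agent_id"],
            "runtime_service_id": actor["runtime_service_id"],
        },
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(record, sort_keys=True)
```

Logging denials as well as allows matters here: a log that records only successful actions cannot answer what the agent tried to do.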

18. Policy Explainability

A policy decision should explain itself.

Bad:

Denied.

Better:

Denied because drafting-agent is not allowed to call send_approved_notice, and no approval event exists for draft_456 under policy notice-send-v3.

Policy explanation should be:

  • specific;
  • safe to show to the right audience;
  • linked to policy version;
  • not leaking sensitive internals to unauthorized users.

19. Policy Versioning

Policies evolve.

Version policies because decisions depend on them.

class PolicyMetadata(BaseModel):
    policy_id: str
    version: str
    owner_team: str
    effective_from: str
    effective_until: str | None = None
    description: str

Run manifest should record policy version.

If a run was approved under old policy, audit must know that.


20. Policy-as-Code

Policy-as-code means policies are written, tested, reviewed, versioned, and deployed like software.

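Workflow: write the policy, test it, review it, version it, deploy it, then monitor it and roll back if needed.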

Examples of policy-as-code engines include OPA/Rego-style models, custom rule engines, and domain-specific policy services.

The exact engine is less important than the discipline:

  • version;
  • test;
  • review;
  • deploy;
  • monitor;
  • rollback.

21. Example Rego-Like Policy Shape

This is illustrative, not production-ready Rego.

default allow = false

allow {
  input.action == "tool.call"
  input.resource.tool_name == "search_case_evidence"
  "case:evidence:read" in input.actor.user.scopes
  input.resource.case_tenant == input.tenant_id
}

require_approval {
  input.action == "tool.call"
  input.resource.effect == "external_notification"
}

A policy engine can reason over structured input.

Important:

Policy input quality determines decision quality.


22. Shadow Mode

Shadow mode evaluates policy without enforcing it.

Use shadow mode when:

  • changing high-impact policy;
  • introducing new approval requirement;
  • testing stricter memory policy;
  • migrating tool grants;
  • reducing false denials.

Shadow mode logs what would have happened.
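
A minimal sketch of that behavior: the active policy decides, the candidate policy is evaluated alongside it, and only disagreements are recorded. Policies are modeled as plain callables that return a decision string, which is a simplification made for this example.

```python
from typing import Callable

Policy = Callable[[dict], str]


def evaluate_with_shadow(
    request: dict, active: Policy, candidate: Policy, shadow_log: list[dict]
) -> str:
    """Enforce the active policy; record where the candidate would differ."""
    enforced = active(request)
    try:
        shadow = candidate(request)
    except Exception as exc:
        # A crashing candidate must never affect the enforced decision.
        shadow = f"error:{type(exc).__name__}"
    if shadow != enforced:
        shadow_log.append(
            {"request_id": request["request_id"], "enforced": enforced, "shadow": shadow}
        )
    return enforced
```

The share of requests that end up in `shadow_log` is one way to estimate how disruptive the candidate policy would be if enforced.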


23. Policy Simulation

Simulation tests policy against scenarios.

class PolicyScenario(BaseModel):
    scenario_id: str
    description: str
    request: PolicyRequest
    expected_decision: PolicyDecisionType

Examples:

  • risk agent tries to send notice → deny;
  • senior reviewer approves high-risk notice → allow/require command validation;
  • drafting agent creates draft → allow;
  • user from different tenant reads case → deny;
  • memory proposal with restricted data → deny;
  • critical case auto-close → deny.

Policy simulation should run in CI/CD.
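
A scenario runner for CI can be very small. The sketch below uses plain dictionaries and decision strings instead of the models above, and `example_policy` is a toy stand-in for a real PDP; all field names are assumptions.

```python
from typing import Callable


def example_policy(request: dict) -> str:
    """Toy policy used only to exercise the runner."""
    if request["actor_tenant"] != request["resource_tenant"]:
        return "deny"
    if request["action"] == "create_draft" and request["actor_role"] == "drafting-agent":
        return "allow"
    return "deny"


def run_scenarios(policy: Callable[[dict], str], scenarios: list[dict]) -> list[str]:
    """Return one message per scenario whose decision differs from the expectation."""
    failures = []
    for scenario in scenarios:
        actual = policy(scenario["request"])
        if actual != scenario["expected"]:
            failures.append(
                f"{scenario['scenario_id']}: expected {scenario['expected']}, got {actual}"
            )
    return failures


SCENARIOS = [
    {
        "scenario_id": "risk-agent-sends-notice",
        "request": {"action": "send_notice", "actor_role": "risk-agent",
                    "actor_tenant": "t1", "resource_tenant": "t1"},
        "expected": "deny",
    },
    {
        "scenario_id": "drafting-agent-creates-draft",
        "request": {"action": "create_draft", "actor_role": "drafting-agent",
                    "actor_tenant": "t1", "resource_tenant": "t1"},
        "expected": "allow",
    },
    {
        "scenario_id": "cross-tenant-read",
        "request": {"action": "read_case", "actor_role": "analyst",
                    "actor_tenant": "t2", "resource_tenant": "t1"},
        "expected": "deny",
    },
]
```

A CI job would fail the build whenever `run_scenarios` returns a non-empty list.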


24. Policy Testing

Test layers:

  • unit tests for rules;
  • scenario tests for workflows;
  • regression tests for previous incidents;
  • fuzz tests for missing attributes;
  • adversarial tests for prompt-injection-influenced tool calls;
  • shadow-mode analysis;
  • audit log completeness tests.

Test Example

def test_agent_cannot_send_notice_without_approval():
    request = build_policy_request(
        action="tool.call",
        resource_type="tool",
        resource_id="send_approved_notice",
        actor_roles=["drafting-agent"],
        risk_level="medium",
        approval_id=None,
    )

    decision = evaluate_policy(request)

    assert decision.decision in {
        PolicyDecisionType.DENY,
        PolicyDecisionType.REQUIRE_APPROVAL,
    }

25. Deny by Default

For agent actions, prefer deny by default.

def evaluate_policy(request: PolicyRequest) -> PolicyDecision:
    # default deny unless a rule allows or requires approval
    return PolicyDecision(
        request_id=request.request_id,
        decision=PolicyDecisionType.DENY,
        reason="No policy rule allowed this action.",
        policy_id="default",
        policy_version="1.0",
    )

Deny-by-default reduces excessive agency risk.
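
To show where allow rules would slot in around that default, here is a sketch of an ordered rule list with a deny fallback. The rule format, rule IDs, and request keys are assumptions; a rule that cannot read an attribute it needs is treated as not matching.

```python
from typing import Callable

# (rule_id, predicate, decision); the first matching rule wins.
Rule = tuple[str, Callable[[dict], bool], str]

RULES: list[Rule] = [
    ("notice-send", lambda r: r["effect"] == "external_notification", "require_approval"),
    (
        "draft-create",
        lambda r: r["action"] == "create_draft" and r["role"] == "drafting-agent",
        "allow",
    ),
]


def evaluate(request: dict) -> tuple[str, str]:
    """Return (decision, rule_id); fall through to deny when nothing matches."""
    for rule_id, predicate, decision in RULES:
        try:
            matched = predicate(request)
        except KeyError:
            matched = False  # a missing attribute never grants access
        if matched:
            return decision, rule_id
    return "deny", "default"
```

New capabilities then require adding an explicit rule, so forgetting to write a policy results in a denial and not an accidental grant.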


26. Policy and Prompt Interaction

Prompt instructions can mention policy, but cannot enforce policy.

Use prompt policy to guide behavior.

Use policy engine to enforce behavior.


27. Excessive Agency Controls

Excessive agency occurs when an LLM-enabled component has too much ability to act.

Controls:

  • least-privilege tools;
  • deny-by-default policy;
  • side-effect classification;
  • approval gates;
  • tool call budgets;
  • action allowlists;
  • tenant scopes;
  • idempotency;
  • decision logs;
  • human review for high-impact actions;
  • capability kill switch.

This is not just security. It is enterprise safety.


28. Policy Obligations

A policy can return obligations.

Example:

decision = PolicyDecision(
    request_id="req_1",
    decision=PolicyDecisionType.ALLOW,
    reason="Allowed with redaction.",
    policy_id="case-read",
    policy_version="2.1",
    obligations=["redact:personal_identifiers", "audit:resource_read"],
)

The PEP must enforce obligations before allowing action.

If obligations cannot be enforced, deny.
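
The sketch below illustrates that rule with a small handler registry. Every obligation is checked against the registry before any handler runs, so an unsupported obligation blocks the action outright. The handler names and the `PERSONAL_FIELDS` set are hypothetical.

```python
from typing import Callable

# Hypothetical set of fields treated as personal identifiers.
PERSONAL_FIELDS = {"customer_ssn", "email"}


def redact_personal_identifiers(payload: dict, audit_log: list) -> dict:
    return {k: "[REDACTED]" if k in PERSONAL_FIELDS else v for k, v in payload.items()}


def audit_resource_read(payload: dict, audit_log: list) -> dict:
    audit_log.append({"event": "resource_read", "fields": sorted(payload)})
    return payload


OBLIGATION_HANDLERS: dict[str, Callable[[dict, list], dict]] = {
    "redact:personal_identifiers": redact_personal_identifiers,
    "audit:resource_read": audit_resource_read,
}


def enforce_obligations(obligations: list[str], payload: dict, audit_log: list) -> dict:
    """Apply every obligation in order, or refuse if any cannot be enforced."""
    unsupported = [o for o in obligations if o not in OBLIGATION_HANDLERS]
    if unsupported:
        raise PermissionError(f"cannot enforce obligations: {unsupported}")
    for obligation in obligations:
        payload = OBLIGATION_HANDLERS[obligation](payload, audit_log)
    return payload
```

The PEP would return the transformed payload to the caller and never the original one.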


29. Policy and Human Approval

Policy often returns REQUIRE_APPROVAL.

Approval itself must be authorized.

A human approval event is input to a later policy decision.


30. Policy and Stateful Runtime

Policies can depend on runtime state.

Examples:

  • current workflow node;
  • checkpoint state;
  • pending approval;
  • retry count;
  • budget remaining;
  • tool call count;
  • previous denial;
  • human decision;
  • case version.

class RuntimePolicyContext(BaseModel):
    workflow_node: str
    tool_calls_used: int
    budget_remaining_usd: float
    pending_interrupt: bool
    approval_ids: list[str]

Policy enforcement must happen during execution, not just at task start.
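
As an illustration, the check below would run before each tool call and not once per task. It uses a dataclass in place of the model above and adds a `max_tool_calls` limit, which is an assumption of this sketch, as are the function name and reason strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeState:
    tool_calls_used: int
    max_tool_calls: int  # hypothetical per-run limit
    budget_remaining_usd: float
    pending_interrupt: bool


def check_before_tool_call(state: RuntimeState, estimated_cost_usd: float) -> tuple[str, str]:
    """Re-evaluate runtime limits immediately before every tool call."""
    if state.pending_interrupt:
        return "deny", "a human interrupt is pending"
    if state.tool_calls_used >= state.max_tool_calls:
        return "deny", "tool call budget exhausted"
    if estimated_cost_usd > state.budget_remaining_usd:
        return "deny", "insufficient budget"
    return "allow", "within runtime limits"
```

The same request can be allowed on the first call and denied on the fourth, which is why a single check at task start is not enough.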


31. Policy Drift

Policy drift occurs when system behavior changes because policies, prompts, tools, or context change independently.

Controls:

  • policy versioning;
  • run manifest;
  • policy simulation;
  • shadow mode;
  • gradual rollout;
  • incident regression tests;
  • policy change approval;
  • metrics.

Track:

  • deny rate;
  • approval rate;
  • override rate;
  • policy conflicts;
  • false deny/allow;
  • escalation volume.

32. Policy Observability

Metrics:

| Metric | Meaning |
| --- | --- |
| allow rate | policy permissiveness |
| deny rate | blocked actions |
| approval-required rate | human workload |
| policy latency | performance |
| obligation failure rate | enforcement issue |
| false deny | productivity issue |
| false allow | safety issue |
| shadow difference rate | rollout risk |
| top denied actions | agent/tool misuse |
| policy version distribution | rollout tracking |

Traces should include policy decision spans.


33. Policy Failure Modes

| Failure | Description | Mitigation |
| --- | --- | --- |
| prompt-only policy | model asked to self-enforce | PEP/PDP |
| missing context | policy cannot decide correctly | required attributes |
| stale policy | old rules applied | version/effective dates |
| fail open | policy outage allows action | fail closed for high-risk |
| overbroad grant | agent sees too many tools | least privilege |
| no decision log | audit gap | decision logging |
| approval bypass | tool ignores approval | PEP at tool executor |
| policy sprawl | inconsistent rules | registry and ownership |
| untested change | production break | simulation/shadow mode |
| no rollback | bad policy persists | versioned rollout |

34. Fail Open vs Fail Closed

When policy engine fails, what happens?

| Action Risk | Failure Mode |
| --- | --- |
| low-risk read | maybe fail open with audit/degraded |
| sensitive read | fail closed |
| internal mutation | fail closed |
| external side effect | fail closed |
| irreversible action | fail closed |
| emergency workflow | special break-glass policy |

Break-glass must be audited.
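
One way to encode the table is a wrapper around the PDP call that picks a fallback by risk class when the engine fails. The `risk_class` key, the `FAIL_OPEN_RISK_CLASSES` set, and the decision strings are assumptions for this sketch.

```python
from typing import Callable

# Hypothetical risk classes that may proceed, with an audit record, during a PDP outage.
FAIL_OPEN_RISK_CLASSES = {"low_risk_read"}


def decide_with_fallback(
    pdp: Callable[[dict], str], request: dict, audit_log: list[dict]
) -> str:
    """Ask the PDP; if it fails, fall back by risk class and record the outage."""
    try:
        return pdp(request)
    except Exception as exc:
        fail_open = request.get("risk_class") in FAIL_OPEN_RISK_CLASSES
        decision = "allow" if fail_open else "deny"
        audit_log.append(
            {
                "request_id": request.get("request_id"),
                "pdp_error": type(exc).__name__,
                "fallback": decision,
            }
        )
        return decision
```

A request with no `risk_class` at all falls into the deny branch, so unclassified actions fail closed.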


35. Break-Glass Access

Break-glass allows emergency override.

class BreakGlassCommand(BaseModel):
    command_id: str
    actor_id: str
    reason: str
    resource_type: str
    resource_id: str
    expires_at: str

Rules:

  • require strong authentication;
  • require reason;
  • time-limited;
  • high visibility;
  • post-incident review;
  • never silent.

Break-glass is not a normal workaround.
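
Two of these rules, the required reason and the time limit, are easy to check mechanically. The sketch below assumes `expires_at` is an ISO 8601 string with an explicit UTC offset and takes the command as a plain dictionary; the function name and reason strings are illustrative.

```python
from datetime import datetime, timezone


def break_glass_active(command: dict, now: datetime) -> tuple[bool, str]:
    """Check that a break-glass grant has a reason and has not expired."""
    if not command.get("reason", "").strip():
        return False, "reason is required"
    expires_at = datetime.fromisoformat(command["expires_at"])
    if expires_at.tzinfo is None:
        # Refuse ambiguous timestamps instead of guessing a timezone.
        return False, "expires_at must include a timezone"
    if now >= expires_at:
        return False, "grant has expired"
    return True, "active"


# Usage: the caller supplies a timezone-aware "now".
current_time = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
grant = {"reason": "production incident", "expires_at": "2025-01-01T13:00:00+00:00"}
status = break_glass_active(grant, current_time)
```

Strong authentication, visibility, and post-incident review sit outside this check and still need their own controls.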


36. Production Checklist

Before shipping policy enforcement:

  • protected actions/resources identified;
  • actors modeled;
  • PDP/PEP boundary defined;
  • RBAC/ABAC/ReBAC needs identified;
  • risk policy defined;
  • tool policy enforced outside prompt;
  • resource policy enforced before retrieval;
  • memory policy enforced;
  • workflow policy enforced;
  • policy decisions logged;
  • policy version recorded in run manifest;
  • obligations supported;
  • human approval integrated;
  • deny-by-default for agent actions;
  • tests and simulations exist;
  • shadow mode exists for high-impact changes;
  • policy observability exists;
  • fail-open/fail-closed behavior defined;
  • break-glass governed.

37. Practice Drill

Design policy enforcement for an agent platform that handles enforcement cases.

Actions:

  • read case;
  • search evidence;
  • propose risk update;
  • create notice draft;
  • request approval;
  • send approved notice;
  • write memory;
  • traverse entity graph.

Deliverables:

  1. actor model;
  2. resource/action inventory;
  3. RBAC roles;
  4. ABAC attributes;
  5. ReBAC relationships;
  6. policy request schema;
  7. policy decision schema;
  8. PEP placement diagram;
  9. tool policy rules;
  10. memory policy rules;
  11. workflow transition rules;
  12. policy simulation cases;
  13. decision log schema.

38. What Top 1% Engineers Pay Attention To

Top engineers ask:

  • Is this policy enforced or merely suggested?
  • Where is the PEP?
  • What attributes does the PDP need?
  • What happens if attributes are missing?
  • Is this actor a user, agent, service, or composite?
  • Does the agent have least privilege?
  • Does the policy depend on workflow state?
  • Does the tool enforce approval?
  • Are decisions logged with policy version?
  • Can we simulate policy before rollout?
  • Can we explain denials?
  • What happens if the policy engine is down?
  • Is there a break-glass path?
  • Is policy sprawl creating inconsistency?

They treat policy enforcement as part of runtime correctness.


39. Summary

In this part, we covered:

  • PDP/PEP model;
  • authentication vs authorization;
  • actor modeling;
  • protected actions;
  • policy request/decision contracts;
  • RBAC;
  • ABAC;
  • ReBAC;
  • risk-based policy;
  • tool/resource/memory/workflow policy;
  • enforcement points;
  • decision logging;
  • explainability;
  • policy versioning;
  • policy-as-code;
  • Rego-like rule shape;
  • shadow mode;
  • simulation/testing;
  • deny-by-default;
  • prompt-policy interaction;
  • excessive agency controls;
  • obligations;
  • human approval;
  • stateful runtime policy;
  • drift;
  • observability;
  • failure modes;
  • fail-open/fail-closed;
  • break-glass.

The key principle:

Agents may reason about policy, but systems must enforce policy.

The next part focuses on Side Effects and Transaction Boundaries.


References

  • Open Policy Agent documentation: policy evaluation, Rego policy language, structured data input, and policy decisions.
  • OWASP Top 10 for LLM Applications: excessive agency, prompt injection, sensitive information disclosure, insecure output handling.
  • NIST AI Risk Management Framework: govern, map, measure, manage functions for AI risk management.
  • Enterprise authorization patterns: RBAC, ABAC, ReBAC, PDP/PEP, policy-as-code, decision logging.