AI Governance and Risk Management
Learn Python Enterprise-Grade Stateful Multi-Agent AI Systems - Part 031
AI governance and risk management for enterprise-grade stateful multi-agent AI systems using NIST AI RMF-style governance, risk registers, control catalogs, accountability, evidence packs, and enterprise operating controls.
Part 031 — AI Governance and Risk Management
Governance is not paperwork after engineering.
Governance is the operating system that decides which AI capabilities may exist, how they are controlled, who owns risk, how evidence is collected, and when the system must stop.
For enterprise-grade stateful multi-agent AI systems, governance cannot be reduced to:
- add a policy page;
- add a model card;
- add human review.
A production multi-agent AI system has many moving parts:
- models;
- prompts;
- tools;
- MCP servers;
- RAG indexes;
- memory;
- workflows;
- policies;
- guardrails;
- human reviewers;
- audit trails;
- evaluation suites;
- deployment gates;
- incident response;
- data retention;
- risk ownership.
Governance connects these parts into an accountable operating model.
This part uses NIST AI RMF-style thinking as a practical organizing lens, but the goal is engineering execution: controls, evidence, responsibilities, risk tiers, and release gates.
1. Kaufman Framing
Using Kaufman's method, AI governance decomposes into:
- identify the AI system and its intended use;
- classify impact and risk;
- map stakeholders and affected parties;
- identify assets, harms, and controls;
- assign ownership and accountability;
- define a control catalog;
- measure system quality and risk;
- manage deployment and change;
- monitor production behavior;
- respond to incidents and continuously improve.
Target Performance
By the end of this part, you should be able to:
- create an AI system inventory entry;
- classify risk tier for a stateful multi-agent system;
- map NIST AI RMF-style Govern/Map/Measure/Manage activities into engineering controls;
- define a control catalog for agents, tools, memory, RAG, and human review;
- build a risk register;
- define accountability and RACI;
- create release gates;
- design an evidence pack for audit;
- connect evaluation results to deployment decisions;
- handle incident response and rollback.
2. Governance Is a Control Plane
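A helpful mental model: governance is the control plane, and the agents, tools, and workflows are the runtime that it configures and constrains.
Governance is not only committee approval. It should influence runtime behavior through registries, policy engines, gates, and observability.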
3. NIST AI RMF Lens
NIST AI RMF organizes risk management around four high-level functions:
- Govern
- Map
- Measure
- Manage
For an engineering team, translate them as:
| Function | Engineering Translation |
|---|---|
| Govern | ownership, policy, accountability, roles, operating model |
| Map | use case, context, assets, stakeholders, impacts, risks |
| Measure | evals, monitoring, red-teaming, metrics, evidence |
| Manage | controls, mitigations, release gates, incident response, rollback |
Do not treat these as sequential checklist items. They are continuous functions that run across the whole system lifecycle.
4. AI System Inventory
You cannot govern systems you cannot inventory.
```python
from enum import Enum

from pydantic import BaseModel


class AiSystemType(str, Enum):
    COPILOT = "copilot"
    AGENT_WORKFLOW = "agent_workflow"
    MULTI_AGENT_SYSTEM = "multi_agent_system"
    AUTONOMOUS_WORKER = "autonomous_worker"


class AiSystemInventoryEntry(BaseModel):
    system_id: str
    name: str
    system_type: AiSystemType
    owner_team: str
    business_owner: str
    intended_use: str
    prohibited_uses: list[str]
    user_groups: list[str]
    affected_parties: list[str]
    data_domains: list[str]
    tools_enabled: list[str]
    memory_enabled: bool
    external_side_effects: list[str]
    risk_tier: str  # one of the risk tier values: low, medium, high, critical
    production_status: str
```
Inventory entries should be reviewed when capabilities change.
5. Intended Use and Misuse
Every AI system needs intended-use boundaries.
Intended Use
Example:
Assist compliance analysts by summarizing case evidence, producing risk recommendations, drafting decision packages, and preparing notices for human approval.
Prohibited Use
Example:
The system must not independently approve enforcement action, send external legal notices without authorized approval, make final legal determinations, or update official case status without command-handler policy checks.
Intended use must map to actual runtime controls.
If the system says “must not send notices,” then the notification tool must enforce approval.
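To make the link between a prohibited use and a runtime control concrete, here is a minimal sketch of a notice tool that refuses to run without a matching approval. The `Approval` record, its fields, and the `send_notice` function are hypothetical names introduced for illustration; a real system would verify the approval against its own approval store rather than trusting an in-memory object.

```python
from dataclasses import dataclass
from typing import Optional


class ApprovalRequired(Exception):
    """Raised when a side-effect tool is called without a valid approval."""


@dataclass(frozen=True)
class Approval:
    # Hypothetical approval record; field names are assumptions for this sketch.
    approval_id: str
    case_id: str
    approved_by: str
    action: str


def send_notice(case_id: str, body: str, approval: Optional[Approval]) -> str:
    """Send an external notice only if an approval covers this case and action."""
    if approval is None:
        raise ApprovalRequired(f"notice for {case_id} has no approval")
    if approval.case_id != case_id or approval.action != "send_notice":
        raise ApprovalRequired(
            f"approval {approval.approval_id} does not cover this notice"
        )
    # A real implementation would call the notification service here.
    return f"sent:{case_id}:{approval.approval_id}"
```

The point of the design is that the check lives inside the tool, so no prompt, agent, or workflow path can reach the side effect without passing it.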
6. Risk Tiering
Risk tiering determines control strength.
```python
from enum import Enum

from pydantic import BaseModel


class RiskTier(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


class RiskTierAssessment(BaseModel):
    system_id: str
    risk_tier: RiskTier
    rationale: str
    factors: list[str]  # risk factors that drove the tier decision
    required_controls: list[str]
```
Risk Factors
| Factor | Higher Risk When |
|---|---|
| autonomy | system can act without review |
| side effects | external or irreversible actions |
| data sensitivity | personal, regulated, confidential |
| affected parties | users/customers/public impacted |
| domain | legal, financial, healthcare, safety |
| scale | many users/cases/actions |
| opacity | hard to explain/debug |
| dependency | humans rely heavily on output |
| memory | future behavior affected |
| integration | tools/MCP/external APIs |
7. Risk Tier Control Matrix
| Control | Low | Medium | High | Critical |
|---|---|---|---|---|
| system inventory | yes | yes | yes | yes |
| eval suite | basic | component + regression | full + adversarial | full + human review |
| tool policy | yes | yes | strict | strict |
| human approval | sampled | for high-impact | required | human-led |
| memory governance | basic | scoped | strict | strict/limited |
| RAG citation verification | recommended | required | required | required |
| audit trail | basic | detailed | detailed | forensic |
| release gate | lightweight | required | strict | executive/business approval |
| incident response | basic | required | required | required |
| kill switch | recommended | required | required | required |
Risk tier should change the engineering process.
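One way to let the tier change the process is to store the matrix as data and look up the required control strength at release time. The sketch below transcribes four rows of the table above; the `CONTROL_MATRIX` and `required_controls` names are assumptions for illustration, and the sketch uses only the standard library.

```python
from enum import Enum


class RiskTier(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


# Subset of the control matrix, keyed by control name, then by tier.
CONTROL_MATRIX: dict[str, dict[RiskTier, str]] = {
    "eval suite": {
        RiskTier.LOW: "basic",
        RiskTier.MEDIUM: "component + regression",
        RiskTier.HIGH: "full + adversarial",
        RiskTier.CRITICAL: "full + human review",
    },
    "human approval": {
        RiskTier.LOW: "sampled",
        RiskTier.MEDIUM: "for high-impact",
        RiskTier.HIGH: "required",
        RiskTier.CRITICAL: "human-led",
    },
    "audit trail": {
        RiskTier.LOW: "basic",
        RiskTier.MEDIUM: "detailed",
        RiskTier.HIGH: "detailed",
        RiskTier.CRITICAL: "forensic",
    },
    "kill switch": {
        RiskTier.LOW: "recommended",
        RiskTier.MEDIUM: "required",
        RiskTier.HIGH: "required",
        RiskTier.CRITICAL: "required",
    },
}


def required_controls(tier: RiskTier) -> dict[str, str]:
    """Return the required strength of every control for the given tier."""
    return {control: levels[tier] for control, levels in CONTROL_MATRIX.items()}
```

A release pipeline could call `required_controls(RiskTier.HIGH)` and compare the result with the controls actually evidenced for the release.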
8. Governance Roles
| Role | Responsibility |
|---|---|
| business owner | owns business outcome and residual risk |
| system owner | owns runtime/service delivery |
| model/platform owner | owns model/provider routing and platform controls |
| data owner | owns data access, quality, retention |
| security owner | owns threat model and security controls |
| policy owner | owns policy rules and approval criteria |
| evaluation owner | owns eval datasets/metrics/gates |
| human review owner | owns reviewer workflow and quality |
| incident owner | coordinates response and remediation |
Governance fails when ownership is implicit.
9. RACI Example
| Activity | Responsible | Accountable | Consulted | Informed |
|---|---|---|---|---|
| define intended use | product/system owner | business owner | legal/security/data | engineering |
| approve side-effect tool | platform/security | business owner | legal/policy | ops |
| create eval suite | evaluation owner | system owner | domain experts | governance |
| update policy rules | policy owner | business owner | engineering/security | reviewers |
| approve production release | system owner | business owner | security/eval/legal | stakeholders |
| incident response | incident owner | business owner | security/platform/legal | users/ops |
An AI system without accountability is not enterprise-ready.
10. Risk Register
```python
from enum import Enum

from pydantic import BaseModel


class RiskStatus(str, Enum):
    OPEN = "open"
    MITIGATED = "mitigated"
    ACCEPTED = "accepted"
    TRANSFERRED = "transferred"
    CLOSED = "closed"


class AiRiskRecord(BaseModel):
    # RiskTier is the enum defined in the Risk Tiering section.
    risk_id: str
    system_id: str
    title: str
    description: str
    impact: RiskTier
    likelihood: RiskTier
    owner: str
    controls: list[str]
    residual_risk: RiskTier
    status: RiskStatus
    review_date: str
```
Example Risks
| Risk | Control |
|---|---|
| hallucinated evidence in decision package | citation verifier + human review |
| notice sent without approval | tool policy + command handler approval check |
| stale policy used | effective-date retrieval + policy version |
| memory poisoning | memory write policy + approval |
| cross-tenant data leak | retrieval ACL + tenant isolation tests |
| duplicate side effect after retry | idempotency + reconciliation |
| reviewer rubber-stamping | review metrics + sampled audit |
| prompt injection in evidence doc | context isolation + tool gating |
11. Control Catalog
Controls should be explicit, tested, and owned.
```python
from enum import Enum

from pydantic import BaseModel


class ControlType(str, Enum):
    PREVENTIVE = "preventive"
    DETECTIVE = "detective"
    CORRECTIVE = "corrective"
    GOVERNANCE = "governance"


class AiControl(BaseModel):
    control_id: str
    name: str
    control_type: ControlType
    description: str
    owner: str
    evidence_artifacts: list[str]  # artifacts that show the control operated
    test_frequency: str
```
Control Examples
| Control | Type |
|---|---|
| tool authorization | preventive |
| human approval | preventive |
| citation verification | preventive/detective |
| eval regression gate | preventive |
| trace monitoring | detective |
| kill switch | corrective |
| memory deletion | corrective |
| risk register review | governance |
| incident postmortem | governance/corrective |
12. Control Mapping
Map controls to risks.
```python
from pydantic import BaseModel


class RiskControlMapping(BaseModel):
    risk_id: str
    control_ids: list[str]
    coverage_notes: str
```
This prevents vague statements like "We have guardrails."
Instead, the mapping supports precise statements:
Risk R-004 (memory poisoning) is mitigated by controls C-011 (memory write policy), C-012 (broad-scope memory approval), C-013 (memory audit event), and C-014 (memory expiry job).
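Once risks and controls are mapped by ID, gaps can be checked mechanically. The following is a minimal sketch, assuming a simplified `Mapping` dataclass and a hypothetical `find_coverage_gaps` helper; it reports risks with no mapped control and control IDs that are referenced but missing from the catalog.

```python
from dataclasses import dataclass, field


@dataclass
class Mapping:
    # Simplified stand-in for a risk-control mapping record.
    risk_id: str
    control_ids: list[str] = field(default_factory=list)


def find_coverage_gaps(
    risk_ids: list[str],
    mappings: list[Mapping],
    catalog_control_ids: set[str],
) -> tuple[list[str], list[str]]:
    """Return (risks without any control, control ids not in the catalog)."""
    mapped = {m.risk_id: m.control_ids for m in mappings}
    unmapped_risks = sorted(r for r in risk_ids if not mapped.get(r))
    unknown_controls = sorted(
        {c for m in mappings for c in m.control_ids if c not in catalog_control_ids}
    )
    return unmapped_risks, unknown_controls
```

Running a check like this in CI turns "every risk has a control" from a review-meeting claim into a test that can fail.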
13. Evidence Pack
An evidence pack proves governance controls exist and operated.
```python
from pydantic import BaseModel


class EvidencePack(BaseModel):
    system_id: str
    release_id: str
    inventory_ref: str
    risk_register_ref: str
    eval_report_refs: list[str]
    threat_model_ref: str
    policy_version: str
    tool_registry_snapshot: str
    prompt_registry_snapshot: str
    memory_policy_snapshot: str | None = None  # only when memory is enabled
    approval_workflow_ref: str | None = None  # only when approval is required
    incident_runbook_ref: str
```
Evidence packs are useful for:
- release review;
- audit;
- incident response;
- customer assurance;
- internal governance.
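An evidence pack is only useful if it is complete for the system it describes. As a rough sketch, the hypothetical `missing_evidence` helper below takes the pack as a plain dictionary and treats the two optional fields as required when memory or human approval is in use; that conditional rule is an assumption of this example, not something the schema itself enforces.

```python
def missing_evidence(
    pack: dict[str, object],
    memory_enabled: bool,
    approval_required: bool,
) -> list[str]:
    """Return the names of evidence fields that are absent or empty."""
    required = [
        "inventory_ref",
        "risk_register_ref",
        "eval_report_refs",
        "threat_model_ref",
        "policy_version",
        "tool_registry_snapshot",
        "prompt_registry_snapshot",
        "incident_runbook_ref",
    ]
    # Optional fields become mandatory when the capability is switched on.
    if memory_enabled:
        required.append("memory_policy_snapshot")
    if approval_required:
        required.append("approval_workflow_ref")
    return [name for name in required if not pack.get(name)]
```

An empty result means the pack can go to release review; anything else names exactly what is still missing.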
14. Release Gates
Release gates determine whether a change can ship.
Gate Types
| Gate | Checks |
|---|---|
| model change gate | eval pass, cost/latency, safety |
| prompt change gate | regression, guardrails, output schema |
| tool change gate | effect classification, auth, tests |
| RAG index gate | retrieval eval, freshness, ACL |
| memory policy gate | sensitivity, retention, write policy |
| workflow gate | state transitions, approval, idempotency |
| high-risk release gate | business/security/eval approval |
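A gate can be modeled as a named set of checks that must all pass. The sketch below is one possible shape, with `GateCheck` and `evaluate_gate` as illustrative names; it fails closed, so a gate with no recorded checks does not pass.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateCheck:
    name: str
    passed: bool
    detail: str = ""


def evaluate_gate(gate_name: str, checks: list[GateCheck]) -> tuple[bool, list[str]]:
    """Return (gate passed, list of failure messages)."""
    if not checks:
        # Fail closed: missing evidence is treated as a failed gate.
        return False, [f"{gate_name}: no checks recorded"]
    failures = [
        f"{gate_name}/{check.name}: {check.detail or 'failed'}"
        for check in checks
        if not check.passed
    ]
    return not failures, failures
```

For example, a tool change gate would pass only when its effect classification, auth, and test checks are all recorded as passed.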
15. Change Risk
Different changes need different scrutiny.
| Change | Risk |
|---|---|
| typo in low-risk prompt | low |
| model provider/model route change | medium/high |
| new read-only tool | medium |
| new side-effect tool | high |
| enabling long-term memory | high |
| changing approval policy | high |
| changing RAG index/chunking | medium/high |
| removing citation verifier | high |
| allowing autonomous external action | critical |
Governance should scale scrutiny with risk rather than apply the same process to every change.
16. Run Manifest
Every run should record governance-relevant versions.
```python
from pydantic import BaseModel


class GovernanceRunManifest(BaseModel):
    run_id: str
    system_id: str
    release_id: str
    model_versions: list[str]
    prompt_versions: list[str]
    role_versions: list[str]
    tool_versions: list[str]
    policy_versions: list[str]
    guardrail_versions: list[str]
    rag_index_versions: list[str]
    memory_policy_version: str | None = None
    eval_suite_version: str | None = None
```
If an incident occurs, this manifest is what lets you reconstruct exactly which versions were involved in a run.
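A common first step in an investigation is to compare the manifest of a known-good run with the manifest of the failing run. The sketch below assumes each manifest is available as a plain dictionary (for example, the output of `model_dump()` on a Pydantic v2 model); `diff_manifests` is a hypothetical helper name.

```python
def diff_manifests(
    baseline: dict[str, object],
    incident: dict[str, object],
) -> dict[str, tuple[object, object]]:
    """Return fields whose values differ, as {field: (baseline, incident)}."""
    ignored = {"run_id"}  # run ids always differ and carry no version signal
    keys = (baseline.keys() | incident.keys()) - ignored
    return {
        key: (baseline.get(key), incident.get(key))
        for key in sorted(keys)
        if baseline.get(key) != incident.get(key)
    }
```

The result narrows the search to the versions that actually changed between the two runs.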
17. Model and Prompt Governance
Prompts are behavior configuration.
Governance should cover:
- owner;
- version;
- intended use;
- prohibited use;
- output contract;
- evaluation suite;
- change approval;
- rollback path;
- deployment status.
```python
from pydantic import BaseModel


class PromptGovernanceRecord(BaseModel):
    prompt_id: str
    version: str
    owner_team: str
    intended_agent_roles: list[str]
    output_contract: str
    eval_suite_id: str
    approved_for_risk_tiers: list[str]
    deprecated: bool = False
```
Model route changes also need evaluation because behavior can shift even with the same prompt.
18. Tool Governance
Tool governance covers:
- effect classification;
- auth scopes;
- allowed agents;
- side-effect controls;
- idempotency;
- approval requirement;
- owner;
- version;
- kill switch;
- tests.
A side-effecting tool without governance is an unbounded authority leak.
19. Memory Governance
Memory governance covers:
- purpose;
- scope;
- source;
- sensitivity;
- retention;
- deletion;
- influence level;
- write approval;
- read authorization;
- audit.
Memory governance is especially important because memory changes future behavior.
20. RAG Governance
RAG governance covers:
- corpus authority;
- ingestion controls;
- ACL;
- index version;
- freshness;
- chunking strategy;
- retrieval eval;
- citation verification;
- prompt injection isolation;
- deletion propagation.
RAG is evidence infrastructure, so governance must treat it like a data product.
21. Human Review Governance
Human review governance covers:
- reviewer qualification;
- authorization;
- separation of duties;
- decision package quality;
- approval expiry;
- review SLA;
- rubber-stamping metrics;
- override tracking;
- sampled audit.
Human review is not automatically safe. It must be governed too.
22. Evaluation Governance
Evaluation governance covers:
- dataset ownership;
- dataset versioning;
- label quality;
- test coverage;
- metrics;
- thresholds;
- judge calibration;
- false positive/negative analysis;
- regression gates;
- production monitoring.
Evaluation is the measurement function of governance.
Part 032 goes deep into this.
23. Incident Governance
Incident response should be defined before incidents.
AI incident examples:
- unauthorized side effect;
- data leak;
- repeated hallucinated recommendation;
- memory poisoning;
- tool abuse;
- RAG index corruption;
- prompt injection success;
- evaluator/regression gate failure;
- human approval failure.
Incident Response Questions
- Which runs are affected?
- Which release/model/prompt/tool version?
- What side effects happened?
- What data was exposed?
- What memory was written?
- Which users/tenants affected?
- Which controls failed?
- What should be disabled?
- What remediation is required?
- What eval/test should prevent recurrence?
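The first two questions can often be answered directly from stored run manifests. Here is a minimal sketch, assuming manifests are stored as dictionaries with a `run_id` key; the `affected_runs` helper and its parameters are illustrative names.

```python
def affected_runs(
    manifests: list[dict[str, object]],
    field_name: str,
    bad_version: str,
) -> list[str]:
    """Return run ids whose manifest references the given version."""
    hits: list[str] = []
    for manifest in manifests:
        value = manifest.get(field_name)
        # Manifest fields are either a single version or a list of versions.
        versions = value if isinstance(value, list) else [value]
        if bad_version in versions:
            hits.append(str(manifest["run_id"]))
    return hits
```

For example, `affected_runs(manifests, "prompt_versions", "drafting-v7")` would list every run that used that prompt version, which bounds the remediation work.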
24. Kill Switch Governance
Kill switches need ownership and procedure.
Targets:
- system;
- agent role;
- tool;
- MCP server;
- model route;
- prompt version;
- RAG index;
- memory writes;
- side-effect command;
- tenant.
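```python
from pydantic import BaseModel


class GovernanceKillSwitch(BaseModel):
    target_type: str  # e.g. "tool", "model_route", "tenant"
    target_id: str
    activated_by: str
    reason: str
    activated_at: str
    expires_at: str | None = None  # None means active until removed
```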
Kill switches should be tested like disaster recovery.
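A kill switch only matters if the runtime consults it before acting. The sketch below shows one possible check, using a simplified `KillSwitch` dataclass with a `datetime` expiry instead of the string fields in the schema above; both the dataclass and `is_blocked` are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class KillSwitch:
    target_type: str
    target_id: str
    reason: str
    expires_at: Optional[datetime] = None  # None means no expiry


def is_blocked(
    switches: list[KillSwitch],
    target_type: str,
    target_id: str,
    now: datetime,
) -> bool:
    """Return True if an unexpired switch covers the given target."""
    for switch in switches:
        if switch.target_type != target_type or switch.target_id != target_id:
            continue
        if switch.expires_at is None or now < switch.expires_at:
            return True
    return False
```

A tool executor would call `is_blocked(...)` before every side-effecting call, and a drill can assert that activating a switch actually stops the target.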
25. Monitoring Governance
Monitor both quality and risk.
| Metric | Governance Meaning |
|---|---|
| policy denial rate | attempted unsafe/unauthorized actions |
| approval rate | human workload/control |
| override rate | model/system mismatch |
| hallucination/citation failure | evidence quality |
| retrieval miss rate | RAG failure |
| memory rejection rate | memory risk |
| guardrail trigger rate | risk signal |
| cost spike | runaway agent |
| latency spike | operational degradation |
| incident count | control effectiveness |
| human audit defect rate | true quality risk |
Governance must be connected to monitoring dashboards.
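One simple way to connect the two is to express governance expectations as thresholds over the dashboard metrics. The sketch below is a simplified illustration: the metric names and limits are made-up examples, `check_thresholds` is a hypothetical helper, and it only handles upper limits. A metric with no reported value is flagged separately, because a missing signal is itself a governance gap.

```python
def check_thresholds(
    metrics: dict[str, float],
    upper_limits: dict[str, float],
) -> tuple[list[str], list[str]]:
    """Return (metrics above their limit, metrics with no reported value)."""
    breaches = sorted(
        name
        for name, limit in upper_limits.items()
        if name in metrics and metrics[name] > limit
    )
    missing = sorted(name for name in upper_limits if name not in metrics)
    return breaches, missing
```

Breaches can open an incident or block a release, while missing metrics point at a monitoring gap that should be fixed before relying on the dashboard.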
26. Residual Risk Acceptance
Not every risk can be eliminated.
Residual risk must be accepted by the right owner.
```python
from pydantic import BaseModel


class ResidualRiskAcceptance(BaseModel):
    risk_id: str
    accepted_by: str
    role: str
    rationale: str
    accepted_until: str  # acceptance is time-boxed and must be renewed
    required_monitoring: list[str]
```
Engineers should not silently accept business/legal risk alone.
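Because an acceptance carries an `accepted_until` value, expired acceptances can be surfaced automatically. The sketch below assumes `accepted_until` holds an ISO 8601 date string such as `2025-06-30`, which the schema above does not specify; `expired_acceptances` is an illustrative helper name.

```python
from datetime import date


def expired_acceptances(
    acceptances: list[dict[str, str]],
    today: date,
) -> list[str]:
    """Return risk ids whose acceptance ended before the given date."""
    return [
        acceptance["risk_id"]
        for acceptance in acceptances
        if date.fromisoformat(acceptance["accepted_until"]) < today
    ]
```

A scheduled job could run this against the risk register and route each expired item back to the accepting owner for renewal or further mitigation.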
27. Enterprise Control Pattern
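Good governance creates a loop: risks are mapped to controls, controls produce evidence, evidence feeds release gates and monitoring, and monitoring results and incidents feed back into the risk register.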
28. Governance Anti-Patterns
Anti-Pattern 1 — Governance After Launch
Controls are added only after an incident.
Anti-Pattern 2 — Paper Policy Without Runtime Enforcement
The policy says human approval is required, but the tool can still execute without it.
Anti-Pattern 3 — No System Inventory
Nobody knows which agents exist.
Anti-Pattern 4 — No Risk Owner
Engineering makes implicit risk decisions.
Anti-Pattern 5 — Evaluation Theater
A few demo examples are treated as evidence.
Anti-Pattern 6 — Human Review Theater
Humans approve without evidence/context.
Anti-Pattern 7 — No Run Manifest
Incidents cannot be reconstructed.
Anti-Pattern 8 — No Kill Switch
Unsafe capability cannot be stopped quickly.
29. Production Checklist
Before production release:
- system inventory entry exists;
- intended/prohibited uses documented;
- risk tier assigned;
- business owner assigned;
- system owner assigned;
- data/security/policy/eval owners assigned;
- risk register exists;
- control catalog exists;
- threat model complete;
- evaluation report complete;
- tool registry snapshot reviewed;
- prompt/role registry snapshot reviewed;
- RAG index/version evaluated;
- memory policy reviewed if memory enabled;
- human review workflow tested if required;
- release gate passed;
- run manifest captured;
- monitoring dashboard ready;
- incident runbook ready;
- kill switches tested;
- residual risks accepted by owner.
30. Practice Drill
Create a governance pack for a multi-agent case management system.
System:
- evidence agent;
- risk agent;
- policy agent;
- drafting agent;
- supervisor;
- verifier;
- human reviewer;
- RAG;
- memory;
- external notice tool.
Deliverables:
- AI system inventory entry;
- intended/prohibited use;
- risk tier assessment;
- owner/RACI matrix;
- risk register with at least 10 risks;
- control catalog;
- risk-control mapping;
- release gate checklist;
- run manifest schema;
- evidence pack;
- incident response runbook;
- monitoring dashboard metrics.
31. What Top 1% Engineers Pay Attention To
Top engineers ask:
- Who owns this system?
- Who owns residual risk?
- What is the intended use?
- What is explicitly prohibited?
- What risk tier is this?
- What controls prove the risk is managed?
- Which controls are runtime-enforced?
- Which controls are merely procedural?
- What evidence shows controls worked?
- What changes require review?
- Can we reconstruct every high-impact run?
- Can we stop the system quickly?
- Are eval results tied to release decisions?
- Is human review meaningful or theater?
- Are memory/RAG/tool risks governed?
They make governance executable.
32. Summary
In this part, we covered:
- governance as control plane;
- NIST AI RMF-style Govern/Map/Measure/Manage mapping;
- AI system inventory;
- intended/prohibited use;
- risk tiering;
- control matrix;
- governance roles;
- RACI;
- risk register;
- control catalog;
- risk-control mapping;
- evidence packs;
- release gates;
- change risk;
- run manifest;
- model/prompt/tool/memory/RAG/human/eval governance;
- incident governance;
- kill switch governance;
- monitoring;
- residual risk acceptance;
- anti-patterns;
- production checklist.
The key principle:
Enterprise AI governance is not separate from architecture. It is architecture for accountability.
The next part begins the evaluation phase: Evaluation Engineering.
References
- NIST AI Risk Management Framework 1.0: Govern, Map, Measure, and Manage functions for AI risk management.
- NIST AI RMF Generative AI Profile: cross-sector profile and companion resource for generative AI risk management.
- NIST AI RMF Playbook: suggested actions aligned with AI RMF functions and subcategories.
- OWASP Top 10 for LLM Applications: security and safety risks relevant to LLM-enabled systems.