Final StretchOrdered learning track

AWS for AI/ML and Bedrock Production Platforms

Learn AWS Engineering Mastery - Part 034

Production AI/ML platform design on AWS using Amazon Bedrock, SageMaker AI, RAG, guardrails, model invocation logging, IAM, PrivateLink, evaluation, observability, cost controls, and governance.

[2026-07-01]25 min read4825 words

In This Lesson

AWS for AI/ML and Bedrock Production Platforms 1. Target Skill 2. Kaufman Skill Decomposition

PrevNext

Lesson 3435 lesson track30–35 Final Stretch

#aws#cloud#ai#ml+6 more

Learn AWS Engineering Mastery - Part 034

AWS for AI/ML and Bedrock Production Platforms

AI/ML on AWS is not only about calling a model.

A production AI platform must solve identity, data access, model access, inference reliability, prompt safety, retrieval quality, evaluation, observability, cost control, human oversight, audit evidence, and incident response.

The dangerous beginner assumption is:

AI platform = model endpoint + prompt.

The production-grade mental model is:

AI platform = governed data + controlled model access + safe inference boundary + measurable quality + observable runtime + accountable operations.

This part teaches how to design AI/ML and generative-AI platforms on AWS using Amazon Bedrock, SageMaker AI, Knowledge Bases, Guardrails, model invocation logging, PrivateLink, IAM, vector retrieval, evaluation, MLOps, and operational governance.

1. Target Skill

After this part, you should be able to:

distinguish classic ML workloads, generative AI workloads, RAG systems, agentic workflows, and AI-assisted business processes;
explain the production boundaries of Amazon Bedrock and SageMaker AI;
design a secure Bedrock invocation path with IAM, model access, logging, encryption, network controls, and guardrails;
design RAG using Knowledge Bases or custom retrieval while preserving data classification and access control;
reason about prompt injection, data leakage, retrieval poisoning, hallucination, model drift, and tool misuse;
define observability for AI workloads: latency, errors, tokens, cost, retrieval hit rate, guardrail interventions, quality metrics, and human feedback;
design cost controls for token-based inference, provisioned throughput, vector storage, pipelines, and endpoints;
design model governance using model registry, model cards, approval workflows, evaluation reports, and audit trails;
decide when to use Bedrock, SageMaker managed endpoints, custom containers, or external model providers;
create a golden path for regulated AI applications on AWS.

2. Kaufman Skill Decomposition

AI/ML platform engineering is too broad to learn as one topic. Decompose it into practical sub-skills.

First 20 Hours Focus

Timebox	Focus	Practice Output
2h	AI workload taxonomy	Classify 10 use cases: ML, GenAI, RAG, agent, workflow automation
3h	Bedrock invocation path	Draw secure invocation architecture
3h	RAG design	Build retrieval mental model: source → chunks → embeddings → retrieval → answer
3h	Safety model	Define threats and guardrails for one regulated assistant
3h	Observability	Define metrics/logs/traces/eval dashboard
2h	Cost model	Estimate cost drivers for tokens, vector storage, endpoints, pipelines
2h	Governance	Define approval workflow, model card, evidence, and human review
2h	Golden path	Create a production AI service template

The goal is not to become a data scientist in 20 hours. The goal is to become dangerous in the right direction: able to design safe and operable AI systems without magical thinking.

3. Core Mental Model

AI applications have two planes:

AI control plane: governs models, data sources, prompts, policies, evaluations, approvals, and evidence.
AI data plane: handles user requests, retrieval, model invocation, tool calls, responses, logging, and runtime safety.

A production AI platform must control both planes.

4. AI Workload Taxonomy

Before selecting services, classify the workload.

Workload Type	Example	Main Risk	Common AWS Building Blocks
Classic supervised ML	Fraud score, risk score	Training data quality, drift	SageMaker AI, Pipelines, Model Registry
Batch ML	Nightly prediction	Stale output, pipeline failure	SageMaker Processing/Batch Transform, Step Functions
Real-time ML inference	Eligibility decision	Latency, scaling, explainability	SageMaker endpoint, Lambda/ECS wrapper
Generative AI	Summarization, drafting	Hallucination, sensitive data	Amazon Bedrock, Guardrails, logging
RAG	Policy assistant over internal docs	Wrong retrieval, stale source, leakage	Bedrock Knowledge Bases, OpenSearch Serverless, S3
Agentic workflow	Assistant taking actions	Tool misuse, runaway loops	Bedrock Agents, Step Functions, IAM boundaries
AI-assisted human workflow	Case triage recommendation	Overreliance, auditability	Bedrock/SageMaker + human approval + evidence store

A regulated platform usually needs human accountability even when AI assists the workflow.

5. Amazon Bedrock Mental Model

Amazon Bedrock is a managed foundation model platform. It allows applications to invoke foundation models and use related capabilities such as Knowledge Bases, Agents, Guardrails, model customization, model invocation logging, and provisioned throughput depending on model and Region support.

The critical mental model:

Bedrock does not remove architecture responsibility.
It moves model hosting responsibility to AWS/model providers while leaving application, data, authorization, prompt, evaluation, and business accountability with you.

5.1 Bedrock Building Blocks

Capability	Use
Foundation models	Text, image, embeddings, reasoning, coding, multimodal use cases depending on model availability
InvokeModel / Converse APIs	Runtime model invocation
Knowledge Bases	Managed RAG over supported data sources/vector stores
Agents	Orchestrate model reasoning with actions/tools
Guardrails	Apply configurable safety/privacy filters to inputs and outputs
Model invocation logging	Capture invocation input/output metadata for monitoring/audit if enabled
Provisioned Throughput	Reserve model invocation capacity for supported models/use cases
Cross-Region inference profiles	Route inference to supported Regions for throughput/performance when residency allows
PrivateLink	Private VPC connectivity to Bedrock endpoints

5.2 Bedrock Is Not a Complete App Platform

Bedrock does not automatically solve:

application authentication;
business authorization;
tenant isolation;
source document authorization;
prompt lifecycle management;
evaluation design;
regulatory approval;
human review;
cost attribution;
incident response;
product UX;
workflow correctness.

Your architecture must handle those.

6. Secure Bedrock Invocation Architecture

6.1 Security Layers

Layer	Control
Workforce/application identity	IAM role, IAM Identity Center, Cognito, OIDC, JWT
Bedrock API permission	Least-privilege IAM actions and resource scope where supported
Model access	Explicitly managed model access and allowed model list
Network	VPC interface endpoint/PrivateLink where appropriate
Data	S3/KMS policies, vector store policies, source authorization
Prompt	Prompt template review, injection defenses, variable validation
Guardrails	Content filters, denied topics, sensitive information controls
Logging	Invocation logging with data protection decisions
Audit	Request metadata, source citations, human approval, version IDs

6.2 Least Privilege for Bedrock

Do not grant broad Bedrock permissions to every application.

A production design should define:

which role can invoke which model;
which role can manage Knowledge Bases;
which role can create or update Guardrails;
which role can access invocation logs;
which role can configure model access;
which role can create provisioned throughput;
which role can operate agents/actions.

Model selection is a policy decision, not just an SDK parameter.

7. Network Boundary with PrivateLink

For private workloads, use VPC interface endpoints where appropriate so applications can reach Bedrock APIs privately from a VPC without relying on public IPs, internet gateways, or NAT devices for that service path.

7.1 Private AI Service Pattern

7.2 Network Design Rules

Prefer private subnets for application runtime.
Use VPC endpoints for supported AWS service access when private routing and reduced NAT dependency are desired.
Do not assume PrivateLink solves data authorization; it solves network path control.
Combine endpoint policies, IAM, KMS, and source data policies.
Monitor endpoint errors, DNS resolution, security group rules, and route table behavior.

8. RAG Mental Model

Retrieval Augmented Generation is not “upload documents and ask questions.”

RAG is a data pipeline plus an inference pattern.

Source data → ingestion → parsing → chunking → embedding → indexing → retrieval → prompt assembly → model generation → citation/evidence → feedback/evaluation.

8.1 RAG Design Decisions

Decision	Why It Matters
Source selection	Determines truth boundary
Source authorization	Prevents data leakage
Chunk size	Affects recall, precision, and cost
Chunk overlap	Helps context continuity but increases index size
Embedding model	Affects semantic retrieval quality
Vector store	Affects latency, cost, filtering, operations
Metadata filters	Enforce tenant, classification, jurisdiction, version
Prompt template	Determines how context is used
Citation policy	Supports auditability and user trust
Freshness	Prevents stale answers
Evaluation set	Prevents silent quality degradation

8.2 Knowledge Bases vs Custom RAG

Use Knowledge Bases When	Use Custom RAG When
Managed ingestion/retrieval is enough	Need highly custom retrieval/ranking
Supported data sources fit	Complex authorization model required
Faster time-to-value matters	You need full control over chunking/indexing
Standard enterprise search assistant	Need domain-specific query planning
Managed integration is preferred	Need custom vector store or hybrid search design

Knowledge Bases can accelerate RAG, but source authorization and data governance remain your responsibility.

9. Source Authorization in RAG

The worst RAG failure in enterprise systems is not a bad answer. It is a correct answer from data the user should not see.

9.1 Authorization Invariant

Retrieval must be authorization-aware before generation.

Do not retrieve all relevant chunks and hope the model refuses to reveal unauthorized data.

9.2 Metadata Filter Pattern

Each chunk should carry metadata:

documentId: policy-2026-019
sourceSystem: enforcement-policy-repository
classification: confidential
jurisdiction: ID
tenantId: regulator-a
allowedRoles:
  - senior-investigator
  - legal-reviewer
version: "2026.04"
effectiveFrom: 2026-04-01
effectiveTo: null
retentionClass: regulatory-record

At query time:

authenticate user;
resolve authorization claims;
translate claims into retrieval filters;
retrieve only permitted chunks;
pass permitted chunks to model;
log source IDs and policy version;
return citations only for permitted sources.

9.3 Tenant-Aware Retrieval

For multi-tenant systems:

include tenantId in index metadata;
enforce tenant filter outside the model;
consider separate indexes for high-risk tenants;
isolate encryption keys for sensitive tenants if required;
log retrieval scope;
test cross-tenant leakage continuously.

10. Prompt and Context Engineering

Prompt engineering in production is not clever wording. It is controlled instruction design.

10.1 Prompt Template Contract

A prompt template should have:

name;
version;
owner;
purpose;
model compatibility;
approved variables;
input validation;
safety instructions;
context formatting;
output schema;
evaluation results;
approval status;
rollback version.

Example:

prompt:
  name: enforcement-case-summary
  version: 3.2.1
  owner: regulatory-ai-platform
  modelFamily: anthropic-claude-compatible
  purpose: summarize enforcement case evidence for investigator review
  inputVariables:
    - caseFacts
    - evidenceList
    - jurisdiction
  outputSchema:
    type: object
    required:
      - summary
      - openQuestions
      - citedEvidence
      - confidenceNotes
  safety:
    noLegalConclusion: true
    requireCitations: true
    flagMissingEvidence: true
  approval:
    status: approved
    approvedBy: ai-governance-board

10.2 Prompt Versioning

A prompt change is a production change.

Track:

who changed it;
why it changed;
evaluation result;
impacted use cases;
rollback version;
production release date.

11. Guardrails and Safety

Amazon Bedrock Guardrails provides configurable safeguards for generative AI applications, including controls that can evaluate user inputs and model responses depending on configuration.

11.1 Guardrail Layers

Layer	Example
Input validation	Reject unsupported file type or oversized prompt
Authorization	Ensure user can access requested data
Prompt guard	Detect prompt injection or prohibited requests
Model guardrail	Apply Bedrock Guardrails to input/output
Tool guard	Restrict what actions an agent can perform
Output validation	Enforce JSON schema or citation requirement
Human review	Require approval for high-impact outputs
Audit logging	Store request metadata and guardrail decisions

11.2 Safety Is Not One Control

Guardrails are important, but they are not a complete safety strategy.

A regulated AI platform also needs:

data minimization;
source authorization;
prompt versioning;
evaluation;
red-team testing;
human approval for consequential decisions;
output schema validation;
tool permission boundaries;
incident response;
monitoring for drift and abuse.

11.3 Prompt Injection Defense

Prompt injection is when untrusted input attempts to override instructions or manipulate model behavior.

Defense-in-depth:

separate system instructions from user content;
delimit retrieved content;
never put secrets in prompt context;
retrieve only authorized content;
treat documents as untrusted input;
validate tool calls outside the model;
require output schema validation;
log suspicious patterns;
test with adversarial prompt sets.

12. Agentic Workflows on AWS

Agents are powerful because they can choose actions. They are dangerous for the same reason.

12.1 Agent Risk Model

Risk	Example	Control
Unauthorized action	Agent closes case without approval	Tool IAM boundary + human approval
Wrong tool call	Agent updates wrong record	Deterministic validation before execution
Runaway loop	Agent repeatedly calls expensive tools	Step limit, timeout, budget control
Data exfiltration	Agent sends sensitive data to external endpoint	Network/IAM boundary, egress control
Misleading explanation	Agent invents rationale	Require source evidence and audit trail
Prompt injection	Retrieved doc instructs agent to ignore rules	Prompt isolation and tool validation

12.2 Agent Pattern with Deterministic Workflow

The model may propose. Deterministic code must validate. Humans should approve high-impact actions.

13. SageMaker AI and MLOps Mental Model

SageMaker AI is more relevant when you need custom ML lifecycle control: training, processing, model registry, pipelines, model cards, deployment endpoints, batch transform, and MLOps governance.

13.1 SageMaker Building Blocks

Capability	Use
Processing jobs	Data preprocessing, evaluation, feature generation
Training jobs	Train custom models
Pipelines	Orchestrate ML workflows
Model Registry	Catalog versions and approval status
Model Cards	Document model details and governance information
Endpoints	Real-time inference
Batch Transform	Batch inference
Feature Store	Managed feature storage if used

13.2 MLOps Pipeline

13.3 Model Registry Governance

A registered model version should include:

training data reference;
code version;
hyperparameters;
evaluation metrics;
bias/fairness notes if relevant;
intended use;
limitations;
approval status;
deployment environment;
rollback version;
owner;
expiry/review date.

13.4 Model Cards

Model cards document model details for governance and reporting. For regulated environments, model cards help create a durable record of model purpose, risk, evaluation, limitations, and approval.

14. Choosing Bedrock vs SageMaker vs Custom Hosting

Requirement	Prefer Bedrock	Prefer SageMaker AI	Prefer Custom Hosting
Use managed foundation models	Yes	Maybe	No
Need fast GenAI application launch	Yes	Maybe	Maybe
Need custom model training	Maybe	Yes	Maybe
Need full container/runtime control	No	Maybe	Yes
Need model registry and MLOps	Limited/adjacent	Yes	Custom build required
Need RAG with managed integration	Yes	Maybe	Maybe
Need strict model/provider selection	Yes, via allowed models	Yes	Yes
Need exotic inference optimization	Maybe	Maybe	Yes
Need minimal ops burden	Yes	Medium	No
Need full portability	Low	Medium	High

The correct answer may combine them:

Bedrock for foundation model inference.
SageMaker for custom ML models.
ECS/EKS/Lambda for application orchestration.
S3/OpenSearch/DynamoDB for data and retrieval.
Step Functions for deterministic workflow.

15. AI Observability

AI observability extends normal service observability.

15.1 Standard Service Signals

request count;
error rate;
latency;
saturation;
dependency failures;
deployment events;
trace correlation.

15.2 AI-Specific Signals

Signal	Why It Matters
Input tokens	Cost and context size
Output tokens	Cost and response behavior
Model ID/version	Reproducibility
Prompt template version	Change tracking
Guardrail intervention count	Safety signal
Blocked requests	Abuse or misclassification signal
Retrieval latency	RAG performance
Retrieved document IDs	Audit and debugging
Retrieval hit rate	Relevance quality
Citation coverage	Trust/evidence
Human override rate	Quality and risk indicator
User feedback	Online quality signal
Evaluation score trend	Regression detection

15.3 Logging Caution

Model invocation logging can capture model input and output. That is powerful for debugging and audit, but risky for privacy and compliance.

Before enabling detailed logs, decide:

whether prompts contain PII or confidential data;
whether responses contain sensitive generated content;
log retention;
encryption key;
access policy;
redaction/masking;
audit access;
regional storage requirement;
whether sampling is enough;
whether a separate evidence store is required.

16. Evaluation Strategy

Production AI needs evaluation before and after release.

16.1 Evaluation Types

Evaluation	Purpose
Unit prompt tests	Check prompt behavior for known cases
Golden dataset eval	Detect regression against curated examples
Retrieval eval	Measure whether correct source chunks are retrieved
Safety eval	Test prohibited content, injection, leakage
Human evaluation	Review usefulness and correctness
Online feedback	Capture user ratings, corrections, escalation
Business outcome eval	Measure effect on workflow quality/time/risk

16.2 Evaluation Dataset

For a regulated assistant, include:

normal cases;
edge cases;
incomplete evidence;
conflicting evidence;
outdated policy;
cross-jurisdiction question;
unauthorized data request;
prompt injection attempt;
sensitive personal data;
ambiguous user request;
high-impact decision request;
required refusal cases.

16.3 Release Gate

A prompt/model/retrieval change should require:

evaluation result;
safety test result;
cost estimate;
rollback plan;
owner approval;
deployment window;
monitoring plan.

17. Cost Engineering for AI

AI cost can scale unexpectedly because usage is easy to generate.

17.1 Cost Drivers

Cost Driver	Examples
Input tokens	Long context, large retrieved chunks
Output tokens	Verbose responses
Model choice	Larger models typically cost more
Invocation count	Chatty UX, retries, agents
Provisioned throughput	Reserved capacity billing
Cross-Region inference	Architecture and pricing implications
Vector store	Index size, replicas, queries
Ingestion pipeline	Embedding and parsing jobs
Logs	Full prompt/response logging volume
SageMaker endpoints	Always-on instances or serverless inference usage
Evaluation	Repeated offline eval across models/prompts

17.2 Cost Controls

limit max input length;
limit max output tokens;
choose model tier by task complexity;
cache deterministic or repeated responses where safe;
summarize conversation history;
retrieve fewer but better chunks;
set agent step limits;
apply per-user/per-tenant quotas;
monitor token usage;
allocate cost tags;
set budgets;
use provisioned throughput only when usage pattern justifies commitment;
avoid logging unnecessary full payloads;
periodically delete stale indexes and test environments.

17.3 Model Routing

Use task-based routing:

Task	Model Strategy
Simple classification	Smaller/cheaper model
Draft generation	Medium model
Complex reasoning	Stronger model
Embedding	Embedding-specific model
High-risk final decision	Human review, not model-only

Do not send every request to the most expensive model by default.

18. Capacity, Throughput, and Reliability

AI workloads introduce dependency uncertainty.

18.1 Failure Modes

Failure Mode	Symptom	Mitigation
Throttling	429/rate exceeded	Backoff, quota request, provisioned throughput, model routing
High latency	Slow response	Streaming, shorter context, smaller model, async workflow
Regional capacity issue	Increased errors	Cross-Region inference if residency allows, fallback model
Retrieval outage	RAG cannot answer	Degraded response, cached docs, incident route
Guardrail false positive	Valid answer blocked	Review policy, exception workflow, human fallback
Guardrail false negative	Unsafe content passes	Eval/red-team, layered controls
Prompt regression	Worse answer after prompt change	Versioning, eval gate, rollback
Tool failure	Agent cannot complete action	Retry policy, compensation, human handoff

18.2 Timeout Budget

Example synchronous AI API latency budget:

AuthN/AuthZ:           50 ms
Request validation:    20 ms
Retrieval:            300 ms
Prompt assembly:       20 ms
Guardrail input:      150 ms
Model inference:    2,500 ms
Guardrail output:     150 ms
Response formatting:   30 ms
Network overhead:     100 ms
--------------------------
Total target:       3,320 ms

If the model may exceed UX tolerance, use async workflow or streaming.

19. Human-in-the-Loop Design

Not all AI outputs should directly affect business state.

19.1 Human Review Triggers

Require human review when:

output affects legal/regulatory rights;
confidence/evaluation score is low;
retrieved evidence is insufficient;
user asks for a high-impact decision;
model detects conflicting sources;
action changes production data;
output contains sensitive classification;
system is in degraded mode;
policy requires approval.

19.2 Review Record

review:
  requestId: ai-req-2026-0710-00031
  reviewer: investigator-42
  modelId: bedrock-model-id
  promptVersion: enforcement-summary-3.2.1
  sourceDocuments:
    - policy-2026-019
    - case-note-3819
  modelOutputHash: sha256:...
  decision: approved-with-edits
  editsSummary: removed unsupported conclusion
  reviewedAt: 2026-07-01T10:42:00+07:00

For regulated systems, the human review record is often as important as the AI output.

20. AI Security Threat Model

Threat	Description	Control
Prompt injection	User/source text manipulates model	Prompt isolation, guardrails, eval, output validation
Data leakage	Model sees or returns unauthorized data	Authorization-aware retrieval, IAM, KMS, metadata filters
Retrieval poisoning	Bad source content enters index	source approval, ingestion validation, provenance
Tool abuse	Agent invokes unsafe action	IAM boundary, deterministic validation, human approval
Secret exposure	Secret placed in prompt/log	never inject secrets, log redaction, scanner
Model overreliance	User trusts unsupported answer	citations, confidence notes, human review
Cost abuse	User generates large token usage	quotas, throttling, budgets
Model/provider change	Behavior changes unexpectedly	eval, model pinning where possible, release gates
Log exposure	Sensitive prompts in logs	encryption, access control, retention, masking
Cross-tenant leakage	Tenant A retrieves Tenant B data	tenant metadata filters, separate indexes, tests

21. Governance Model

An AI platform needs governance without blocking all experimentation.

21.1 Risk Tiers

Tier	Example	Required Controls
Low	Draft marketing copy from public content	Basic logging, cost controls
Medium	Internal knowledge assistant	AuthZ-aware retrieval, guardrails, eval
High	Regulatory case summary	citations, human review, audit evidence, approved prompts
Critical	Automated enforcement decision	Usually avoid full automation; require deterministic rules and formal governance

21.2 AI Change Types

Change	Treat As
Prompt template update	Production change
Model switch	Production change with eval
Retrieval source addition	Data governance change
Guardrail policy update	Safety control change
Tool/action addition	Security and workflow change
Embedding model change	Retrieval quality change
Chunking strategy change	Retrieval quality and cost change

21.3 Evidence Artifacts

A governed AI release should produce:

architecture diagram;
model list;
prompt versions;
data source inventory;
risk assessment;
evaluation report;
red-team/safety test report;
guardrail configuration;
logging/retention decision;
human review policy;
rollback plan;
owner approval.

22. Golden Path: Regulated RAG Assistant

22.1 Use Case

An internal assistant helps investigators summarize enforcement case documents and relevant policy. It must not make final legal conclusions. It must cite sources. It must respect user authorization.

22.2 Architecture

22.3 Platform Contract

aiService:
  name: enforcement-rag-assistant
  owner: regulatory-ai-platform
  riskTier: high
  allowedModels:
    - approved-bedrock-model-family
  dataSources:
    - enforcement-policy-repository
    - case-document-store
  retrieval:
    authorizationAware: true
    requireCitations: true
    maxChunks: 8
  guardrails:
    input: enabled
    output: enabled
    piiHandling: mask
    deniedTopics:
      - final-legal-determination
  humanReview:
    requiredFor:
      - low-confidence
      - missing-citation
      - high-impact-recommendation
  observability:
    invocationLogging: restricted
    metrics:
      - latency
      - tokenUsage
      - retrievalHitRate
      - guardrailIntervention
      - humanOverrideRate
  cost:
    perUserQuota: true
    perTenantBudget: true

23. Production Readiness Checklist

Before releasing an AI workload:

24. Anti-Patterns

24.1 Model-First Architecture

The team starts by choosing a model and then tries to retrofit data, security, evaluation, and UX.

Fix:

Start with risk, data, user journey, correctness requirement, and operational constraints.
Then choose the model.

24.2 RAG Without Authorization

All documents are indexed together and the model is expected not to reveal sensitive content.

Fix:

enforce authorization at retrieval;
use metadata filters;
separate indexes where risk requires;
test leakage.

24.3 No Evaluation Set

Prompt changes are tested by “looks good to me.”

Fix:

maintain golden test cases;
run eval before release;
track regression.

24.4 Logging Everything Forever

The platform logs full prompts and responses indefinitely.

Fix:

classify data;
minimize logs;
mask where possible;
define retention;
restrict access;
encrypt logs.

24.5 Agent With Broad Permissions

An agent can perform many actions using an overpowered role.

Fix:

narrow tools;
use action-specific roles;
validate arguments;
require human approval for high-impact actions;
set time/step/budget limits.

24.6 AI as Decision Maker Without Accountability

The model makes consequential decisions without human review or deterministic rules.

Fix:

use AI as decision support;
require human approval;
log evidence;
make deterministic policy checks explicit.

25. Failure Modes

Failure Mode	Symptom	Root Cause	Prevention
Hallucinated answer	Unsupported claim	Weak prompt/eval/context	Require citations, eval, human review
Data leak	User sees unauthorized data	Retrieval not authorization-aware	Metadata filters, separate indexes, tests
Prompt injection success	Model follows malicious document	Untrusted source not isolated	Delimit context, guardrails, output validation
Silent quality regression	New prompt worse	No eval gate	Golden dataset and release gate
Cost spike	Token usage jumps	Long context or runaway agents	quotas, budgets, max tokens, step limits
Latency spike	Poor UX	large context/model/Region issue	model routing, streaming, async, capacity planning
Guardrail overblocking	Valid work blocked	Policy too broad	review metrics and tune policies
Guardrail underblocking	Unsafe output	Weak safety tests	red-team suite and layered controls
Stale retrieval	Old policy used	ingestion lag or version issue	freshness metadata, ingestion monitoring
Untraceable decision	Audit cannot reconstruct	Missing logs/evidence	prompt/model/source/version logging

26. Deliberate Practice

Practice 1: Secure Bedrock Invocation

Design a secure Bedrock invocation path for an internal application.

Required output:

IAM role design;
allowed models;
network path;
logging decision;
guardrail configuration;
cost controls;
failure modes.

Practice 2: RAG Authorization Design

Design RAG for confidential policy documents.

Required output:

document metadata schema;
authorization filter model;
chunking strategy;
retrieval evaluation plan;
leakage test cases.

Practice 3: AI Evaluation Gate

Create an evaluation gate for prompt/model changes.

Required output:

test categories;
golden dataset structure;
pass/fail thresholds;
rollback criteria;
approval workflow.

Practice 4: Regulated AI Runbook

Create a runbook for “assistant returns unsupported regulatory conclusion.”

Required output:

detection signals;
triage steps;
immediate mitigation;
user communication;
evidence collection;
rollback/update path;
post-incident review questions.

27. Self-Correction Questions

Use these questions to test your design:

Does the system retrieve only data the user is authorized to see?
Can you identify which model, prompt version, and sources produced an answer?
Can you roll back a prompt change?
Are guardrails tested against known failure cases?
Is there an evaluation set for each critical use case?
Are token usage and cost visible by service/user/tenant?
Are logs safe to store under the chosen retention policy?
Can the system degrade gracefully when Bedrock/retrieval is unavailable?
Are high-impact actions validated outside the model?
Is human review required where business/regulatory risk demands it?

28. Engineering Judgment Summary

A production AI platform on AWS is not a model wrapper.

It is an operational system where data governance, model access, prompt control, retrieval authorization, guardrails, evaluation, observability, cost control, and human accountability work together.

The winning mental model:

Do not trust the model as the boundary.
Build deterministic boundaries around the model.

Amazon Bedrock can reduce the burden of foundation model access and inference operations. SageMaker AI can support custom ML and MLOps lifecycle. But neither removes the need for engineering judgment.

For regulated systems, AI should usually begin as assistive, evidence-producing, and reviewable. Automation can increase only when the platform can prove correctness, safety, auditability, and accountability.

29. References

Amazon Bedrock User Guide: https://docs.aws.amazon.com/bedrock/latest/userguide/what-is-bedrock.html
Amazon Bedrock Inference: https://docs.aws.amazon.com/bedrock/latest/userguide/inference.html
Amazon Bedrock Guardrails: https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails.html
Amazon Bedrock Guardrails How It Works: https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-how.html
Amazon Bedrock Model Invocation Logging: https://docs.aws.amazon.com/bedrock/latest/userguide/model-invocation-logging.html
Amazon Bedrock Knowledge Bases: https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base.html
Amazon Bedrock PrivateLink/VPC Endpoints: https://docs.aws.amazon.com/bedrock/latest/userguide/vpc-interface-endpoints.html
Amazon Bedrock IAM: https://docs.aws.amazon.com/bedrock/latest/userguide/security-iam.html
Amazon Bedrock Provisioned Throughput: https://docs.aws.amazon.com/bedrock/latest/userguide/prov-throughput.html
Amazon Bedrock Cross-Region Inference: https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference.html
Amazon SageMaker AI Pipelines: https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines.html
Amazon SageMaker Model Registry: https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry.html
Amazon SageMaker Model Cards: https://docs.aws.amazon.com/sagemaker/latest/dg/model-cards.html
AWS Well-Architected Machine Learning Lens: https://docs.aws.amazon.com/wellarchitected/latest/machine-learning-lens/welcome.html

Lesson Recap

You just completed lesson 34 in final stretch. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Back To Series Next Lesson

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.

Previous Lesson

Lesson 33

Learn Aws Part 033 Platform Engineering Golden Paths And Internal Developer Platforms

Next Lesson

Lesson 35

Learn Aws Part 035 Capstone Regulated Enterprise Platform On Aws