Final StretchOrdered learning track

AWS for AI/ML and Bedrock Production Platforms

Learn AWS Engineering Mastery - Part 034

Production AI/ML platform design on AWS using Amazon Bedrock, SageMaker AI, RAG, guardrails, model invocation logging, IAM, PrivateLink, evaluation, observability, cost controls, and governance.

25 min read4825 words
PrevNext
Lesson 3435 lesson track3035 Final Stretch
#aws#cloud#ai#ml+6 more

Learn AWS Engineering Mastery - Part 034

AWS for AI/ML and Bedrock Production Platforms

AI/ML on AWS is not only about calling a model.

A production AI platform must solve identity, data access, model access, inference reliability, prompt safety, retrieval quality, evaluation, observability, cost control, human oversight, audit evidence, and incident response.

The dangerous beginner assumption is:

AI platform = model endpoint + prompt.

The production-grade mental model is:

AI platform = governed data + controlled model access + safe inference boundary + measurable quality + observable runtime + accountable operations.

This part teaches how to design AI/ML and generative-AI platforms on AWS using Amazon Bedrock, SageMaker AI, Knowledge Bases, Guardrails, model invocation logging, PrivateLink, IAM, vector retrieval, evaluation, MLOps, and operational governance.


1. Target Skill

After this part, you should be able to:

  • distinguish classic ML workloads, generative AI workloads, RAG systems, agentic workflows, and AI-assisted business processes;
  • explain the production boundaries of Amazon Bedrock and SageMaker AI;
  • design a secure Bedrock invocation path with IAM, model access, logging, encryption, network controls, and guardrails;
  • design RAG using Knowledge Bases or custom retrieval while preserving data classification and access control;
  • reason about prompt injection, data leakage, retrieval poisoning, hallucination, model drift, and tool misuse;
  • define observability for AI workloads: latency, errors, tokens, cost, retrieval hit rate, guardrail interventions, quality metrics, and human feedback;
  • design cost controls for token-based inference, provisioned throughput, vector storage, pipelines, and endpoints;
  • design model governance using model registry, model cards, approval workflows, evaluation reports, and audit trails;
  • decide when to use Bedrock, SageMaker managed endpoints, custom containers, or external model providers;
  • create a golden path for regulated AI applications on AWS.

2. Kaufman Skill Decomposition

AI/ML platform engineering is too broad to learn as one topic. Decompose it into practical sub-skills.

First 20 Hours Focus

TimeboxFocusPractice Output
2hAI workload taxonomyClassify 10 use cases: ML, GenAI, RAG, agent, workflow automation
3hBedrock invocation pathDraw secure invocation architecture
3hRAG designBuild retrieval mental model: source → chunks → embeddings → retrieval → answer
3hSafety modelDefine threats and guardrails for one regulated assistant
3hObservabilityDefine metrics/logs/traces/eval dashboard
2hCost modelEstimate cost drivers for tokens, vector storage, endpoints, pipelines
2hGovernanceDefine approval workflow, model card, evidence, and human review
2hGolden pathCreate a production AI service template

The goal is not to become a data scientist in 20 hours. The goal is to become dangerous in the right direction: able to design safe and operable AI systems without magical thinking.


3. Core Mental Model

AI applications have two planes:

AI control plane: governs models, data sources, prompts, policies, evaluations, approvals, and evidence.
AI data plane: handles user requests, retrieval, model invocation, tool calls, responses, logging, and runtime safety.

A production AI platform must control both planes.


4. AI Workload Taxonomy

Before selecting services, classify the workload.

Workload TypeExampleMain RiskCommon AWS Building Blocks
Classic supervised MLFraud score, risk scoreTraining data quality, driftSageMaker AI, Pipelines, Model Registry
Batch MLNightly predictionStale output, pipeline failureSageMaker Processing/Batch Transform, Step Functions
Real-time ML inferenceEligibility decisionLatency, scaling, explainabilitySageMaker endpoint, Lambda/ECS wrapper
Generative AISummarization, draftingHallucination, sensitive dataAmazon Bedrock, Guardrails, logging
RAGPolicy assistant over internal docsWrong retrieval, stale source, leakageBedrock Knowledge Bases, OpenSearch Serverless, S3
Agentic workflowAssistant taking actionsTool misuse, runaway loopsBedrock Agents, Step Functions, IAM boundaries
AI-assisted human workflowCase triage recommendationOverreliance, auditabilityBedrock/SageMaker + human approval + evidence store

A regulated platform usually needs human accountability even when AI assists the workflow.


5. Amazon Bedrock Mental Model

Amazon Bedrock is a managed foundation model platform. It allows applications to invoke foundation models and use related capabilities such as Knowledge Bases, Agents, Guardrails, model customization, model invocation logging, and provisioned throughput depending on model and Region support.

The critical mental model:

Bedrock does not remove architecture responsibility.
It moves model hosting responsibility to AWS/model providers while leaving application, data, authorization, prompt, evaluation, and business accountability with you.

5.1 Bedrock Building Blocks

CapabilityUse
Foundation modelsText, image, embeddings, reasoning, coding, multimodal use cases depending on model availability
InvokeModel / Converse APIsRuntime model invocation
Knowledge BasesManaged RAG over supported data sources/vector stores
AgentsOrchestrate model reasoning with actions/tools
GuardrailsApply configurable safety/privacy filters to inputs and outputs
Model invocation loggingCapture invocation input/output metadata for monitoring/audit if enabled
Provisioned ThroughputReserve model invocation capacity for supported models/use cases
Cross-Region inference profilesRoute inference to supported Regions for throughput/performance when residency allows
PrivateLinkPrivate VPC connectivity to Bedrock endpoints

5.2 Bedrock Is Not a Complete App Platform

Bedrock does not automatically solve:

  • application authentication;
  • business authorization;
  • tenant isolation;
  • source document authorization;
  • prompt lifecycle management;
  • evaluation design;
  • regulatory approval;
  • human review;
  • cost attribution;
  • incident response;
  • product UX;
  • workflow correctness.

Your architecture must handle those.


6. Secure Bedrock Invocation Architecture

6.1 Security Layers

LayerControl
Workforce/application identityIAM role, IAM Identity Center, Cognito, OIDC, JWT
Bedrock API permissionLeast-privilege IAM actions and resource scope where supported
Model accessExplicitly managed model access and allowed model list
NetworkVPC interface endpoint/PrivateLink where appropriate
DataS3/KMS policies, vector store policies, source authorization
PromptPrompt template review, injection defenses, variable validation
GuardrailsContent filters, denied topics, sensitive information controls
LoggingInvocation logging with data protection decisions
AuditRequest metadata, source citations, human approval, version IDs

6.2 Least Privilege for Bedrock

Do not grant broad Bedrock permissions to every application.

A production design should define:

  • which role can invoke which model;
  • which role can manage Knowledge Bases;
  • which role can create or update Guardrails;
  • which role can access invocation logs;
  • which role can configure model access;
  • which role can create provisioned throughput;
  • which role can operate agents/actions.

Model selection is a policy decision, not just an SDK parameter.


For private workloads, use VPC interface endpoints where appropriate so applications can reach Bedrock APIs privately from a VPC without relying on public IPs, internet gateways, or NAT devices for that service path.

7.1 Private AI Service Pattern

7.2 Network Design Rules

  • Prefer private subnets for application runtime.
  • Use VPC endpoints for supported AWS service access when private routing and reduced NAT dependency are desired.
  • Do not assume PrivateLink solves data authorization; it solves network path control.
  • Combine endpoint policies, IAM, KMS, and source data policies.
  • Monitor endpoint errors, DNS resolution, security group rules, and route table behavior.

8. RAG Mental Model

Retrieval Augmented Generation is not “upload documents and ask questions.”

RAG is a data pipeline plus an inference pattern.

Source data → ingestion → parsing → chunking → embedding → indexing → retrieval → prompt assembly → model generation → citation/evidence → feedback/evaluation.

8.1 RAG Design Decisions

DecisionWhy It Matters
Source selectionDetermines truth boundary
Source authorizationPrevents data leakage
Chunk sizeAffects recall, precision, and cost
Chunk overlapHelps context continuity but increases index size
Embedding modelAffects semantic retrieval quality
Vector storeAffects latency, cost, filtering, operations
Metadata filtersEnforce tenant, classification, jurisdiction, version
Prompt templateDetermines how context is used
Citation policySupports auditability and user trust
FreshnessPrevents stale answers
Evaluation setPrevents silent quality degradation

8.2 Knowledge Bases vs Custom RAG

Use Knowledge Bases WhenUse Custom RAG When
Managed ingestion/retrieval is enoughNeed highly custom retrieval/ranking
Supported data sources fitComplex authorization model required
Faster time-to-value mattersYou need full control over chunking/indexing
Standard enterprise search assistantNeed domain-specific query planning
Managed integration is preferredNeed custom vector store or hybrid search design

Knowledge Bases can accelerate RAG, but source authorization and data governance remain your responsibility.


9. Source Authorization in RAG

The worst RAG failure in enterprise systems is not a bad answer. It is a correct answer from data the user should not see.

9.1 Authorization Invariant

Retrieval must be authorization-aware before generation.

Do not retrieve all relevant chunks and hope the model refuses to reveal unauthorized data.

9.2 Metadata Filter Pattern

Each chunk should carry metadata:

documentId: policy-2026-019
sourceSystem: enforcement-policy-repository
classification: confidential
jurisdiction: ID
tenantId: regulator-a
allowedRoles:
  - senior-investigator
  - legal-reviewer
version: "2026.04"
effectiveFrom: 2026-04-01
effectiveTo: null
retentionClass: regulatory-record

At query time:

  1. authenticate user;
  2. resolve authorization claims;
  3. translate claims into retrieval filters;
  4. retrieve only permitted chunks;
  5. pass permitted chunks to model;
  6. log source IDs and policy version;
  7. return citations only for permitted sources.

9.3 Tenant-Aware Retrieval

For multi-tenant systems:

  • include tenantId in index metadata;
  • enforce tenant filter outside the model;
  • consider separate indexes for high-risk tenants;
  • isolate encryption keys for sensitive tenants if required;
  • log retrieval scope;
  • test cross-tenant leakage continuously.

10. Prompt and Context Engineering

Prompt engineering in production is not clever wording. It is controlled instruction design.

10.1 Prompt Template Contract

A prompt template should have:

  • name;
  • version;
  • owner;
  • purpose;
  • model compatibility;
  • approved variables;
  • input validation;
  • safety instructions;
  • context formatting;
  • output schema;
  • evaluation results;
  • approval status;
  • rollback version.

Example:

prompt:
  name: enforcement-case-summary
  version: 3.2.1
  owner: regulatory-ai-platform
  modelFamily: anthropic-claude-compatible
  purpose: summarize enforcement case evidence for investigator review
  inputVariables:
    - caseFacts
    - evidenceList
    - jurisdiction
  outputSchema:
    type: object
    required:
      - summary
      - openQuestions
      - citedEvidence
      - confidenceNotes
  safety:
    noLegalConclusion: true
    requireCitations: true
    flagMissingEvidence: true
  approval:
    status: approved
    approvedBy: ai-governance-board

10.2 Prompt Versioning

A prompt change is a production change.

Track:

  • who changed it;
  • why it changed;
  • evaluation result;
  • impacted use cases;
  • rollback version;
  • production release date.

11. Guardrails and Safety

Amazon Bedrock Guardrails provides configurable safeguards for generative AI applications, including controls that can evaluate user inputs and model responses depending on configuration.

11.1 Guardrail Layers

LayerExample
Input validationReject unsupported file type or oversized prompt
AuthorizationEnsure user can access requested data
Prompt guardDetect prompt injection or prohibited requests
Model guardrailApply Bedrock Guardrails to input/output
Tool guardRestrict what actions an agent can perform
Output validationEnforce JSON schema or citation requirement
Human reviewRequire approval for high-impact outputs
Audit loggingStore request metadata and guardrail decisions

11.2 Safety Is Not One Control

Guardrails are important, but they are not a complete safety strategy.

A regulated AI platform also needs:

  • data minimization;
  • source authorization;
  • prompt versioning;
  • evaluation;
  • red-team testing;
  • human approval for consequential decisions;
  • output schema validation;
  • tool permission boundaries;
  • incident response;
  • monitoring for drift and abuse.

11.3 Prompt Injection Defense

Prompt injection is when untrusted input attempts to override instructions or manipulate model behavior.

Defense-in-depth:

  • separate system instructions from user content;
  • delimit retrieved content;
  • never put secrets in prompt context;
  • retrieve only authorized content;
  • treat documents as untrusted input;
  • validate tool calls outside the model;
  • require output schema validation;
  • log suspicious patterns;
  • test with adversarial prompt sets.

12. Agentic Workflows on AWS

Agents are powerful because they can choose actions. They are dangerous for the same reason.

12.1 Agent Risk Model

RiskExampleControl
Unauthorized actionAgent closes case without approvalTool IAM boundary + human approval
Wrong tool callAgent updates wrong recordDeterministic validation before execution
Runaway loopAgent repeatedly calls expensive toolsStep limit, timeout, budget control
Data exfiltrationAgent sends sensitive data to external endpointNetwork/IAM boundary, egress control
Misleading explanationAgent invents rationaleRequire source evidence and audit trail
Prompt injectionRetrieved doc instructs agent to ignore rulesPrompt isolation and tool validation

12.2 Agent Pattern with Deterministic Workflow

The model may propose. Deterministic code must validate. Humans should approve high-impact actions.


13. SageMaker AI and MLOps Mental Model

SageMaker AI is more relevant when you need custom ML lifecycle control: training, processing, model registry, pipelines, model cards, deployment endpoints, batch transform, and MLOps governance.

13.1 SageMaker Building Blocks

CapabilityUse
Processing jobsData preprocessing, evaluation, feature generation
Training jobsTrain custom models
PipelinesOrchestrate ML workflows
Model RegistryCatalog versions and approval status
Model CardsDocument model details and governance information
EndpointsReal-time inference
Batch TransformBatch inference
Feature StoreManaged feature storage if used

13.2 MLOps Pipeline

13.3 Model Registry Governance

A registered model version should include:

  • training data reference;
  • code version;
  • hyperparameters;
  • evaluation metrics;
  • bias/fairness notes if relevant;
  • intended use;
  • limitations;
  • approval status;
  • deployment environment;
  • rollback version;
  • owner;
  • expiry/review date.

13.4 Model Cards

Model cards document model details for governance and reporting. For regulated environments, model cards help create a durable record of model purpose, risk, evaluation, limitations, and approval.


14. Choosing Bedrock vs SageMaker vs Custom Hosting

RequirementPrefer BedrockPrefer SageMaker AIPrefer Custom Hosting
Use managed foundation modelsYesMaybeNo
Need fast GenAI application launchYesMaybeMaybe
Need custom model trainingMaybeYesMaybe
Need full container/runtime controlNoMaybeYes
Need model registry and MLOpsLimited/adjacentYesCustom build required
Need RAG with managed integrationYesMaybeMaybe
Need strict model/provider selectionYes, via allowed modelsYesYes
Need exotic inference optimizationMaybeMaybeYes
Need minimal ops burdenYesMediumNo
Need full portabilityLowMediumHigh

The correct answer may combine them:

Bedrock for foundation model inference.
SageMaker for custom ML models.
ECS/EKS/Lambda for application orchestration.
S3/OpenSearch/DynamoDB for data and retrieval.
Step Functions for deterministic workflow.

15. AI Observability

AI observability extends normal service observability.

15.1 Standard Service Signals

  • request count;
  • error rate;
  • latency;
  • saturation;
  • dependency failures;
  • deployment events;
  • trace correlation.

15.2 AI-Specific Signals

SignalWhy It Matters
Input tokensCost and context size
Output tokensCost and response behavior
Model ID/versionReproducibility
Prompt template versionChange tracking
Guardrail intervention countSafety signal
Blocked requestsAbuse or misclassification signal
Retrieval latencyRAG performance
Retrieved document IDsAudit and debugging
Retrieval hit rateRelevance quality
Citation coverageTrust/evidence
Human override rateQuality and risk indicator
User feedbackOnline quality signal
Evaluation score trendRegression detection

15.3 Logging Caution

Model invocation logging can capture model input and output. That is powerful for debugging and audit, but risky for privacy and compliance.

Before enabling detailed logs, decide:

  • whether prompts contain PII or confidential data;
  • whether responses contain sensitive generated content;
  • log retention;
  • encryption key;
  • access policy;
  • redaction/masking;
  • audit access;
  • regional storage requirement;
  • whether sampling is enough;
  • whether a separate evidence store is required.

16. Evaluation Strategy

Production AI needs evaluation before and after release.

16.1 Evaluation Types

EvaluationPurpose
Unit prompt testsCheck prompt behavior for known cases
Golden dataset evalDetect regression against curated examples
Retrieval evalMeasure whether correct source chunks are retrieved
Safety evalTest prohibited content, injection, leakage
Human evaluationReview usefulness and correctness
Online feedbackCapture user ratings, corrections, escalation
Business outcome evalMeasure effect on workflow quality/time/risk

16.2 Evaluation Dataset

For a regulated assistant, include:

  • normal cases;
  • edge cases;
  • incomplete evidence;
  • conflicting evidence;
  • outdated policy;
  • cross-jurisdiction question;
  • unauthorized data request;
  • prompt injection attempt;
  • sensitive personal data;
  • ambiguous user request;
  • high-impact decision request;
  • required refusal cases.

16.3 Release Gate

A prompt/model/retrieval change should require:

  • evaluation result;
  • safety test result;
  • cost estimate;
  • rollback plan;
  • owner approval;
  • deployment window;
  • monitoring plan.

17. Cost Engineering for AI

AI cost can scale unexpectedly because usage is easy to generate.

17.1 Cost Drivers

Cost DriverExamples
Input tokensLong context, large retrieved chunks
Output tokensVerbose responses
Model choiceLarger models typically cost more
Invocation countChatty UX, retries, agents
Provisioned throughputReserved capacity billing
Cross-Region inferenceArchitecture and pricing implications
Vector storeIndex size, replicas, queries
Ingestion pipelineEmbedding and parsing jobs
LogsFull prompt/response logging volume
SageMaker endpointsAlways-on instances or serverless inference usage
EvaluationRepeated offline eval across models/prompts

17.2 Cost Controls

  • limit max input length;
  • limit max output tokens;
  • choose model tier by task complexity;
  • cache deterministic or repeated responses where safe;
  • summarize conversation history;
  • retrieve fewer but better chunks;
  • set agent step limits;
  • apply per-user/per-tenant quotas;
  • monitor token usage;
  • allocate cost tags;
  • set budgets;
  • use provisioned throughput only when usage pattern justifies commitment;
  • avoid logging unnecessary full payloads;
  • periodically delete stale indexes and test environments.

17.3 Model Routing

Use task-based routing:

TaskModel Strategy
Simple classificationSmaller/cheaper model
Draft generationMedium model
Complex reasoningStronger model
EmbeddingEmbedding-specific model
High-risk final decisionHuman review, not model-only

Do not send every request to the most expensive model by default.


18. Capacity, Throughput, and Reliability

AI workloads introduce dependency uncertainty.

18.1 Failure Modes

Failure ModeSymptomMitigation
Throttling429/rate exceededBackoff, quota request, provisioned throughput, model routing
High latencySlow responseStreaming, shorter context, smaller model, async workflow
Regional capacity issueIncreased errorsCross-Region inference if residency allows, fallback model
Retrieval outageRAG cannot answerDegraded response, cached docs, incident route
Guardrail false positiveValid answer blockedReview policy, exception workflow, human fallback
Guardrail false negativeUnsafe content passesEval/red-team, layered controls
Prompt regressionWorse answer after prompt changeVersioning, eval gate, rollback
Tool failureAgent cannot complete actionRetry policy, compensation, human handoff

18.2 Timeout Budget

Example synchronous AI API latency budget:

AuthN/AuthZ:           50 ms
Request validation:    20 ms
Retrieval:            300 ms
Prompt assembly:       20 ms
Guardrail input:      150 ms
Model inference:    2,500 ms
Guardrail output:     150 ms
Response formatting:   30 ms
Network overhead:     100 ms
--------------------------
Total target:       3,320 ms

If the model may exceed UX tolerance, use async workflow or streaming.


19. Human-in-the-Loop Design

Not all AI outputs should directly affect business state.

19.1 Human Review Triggers

Require human review when:

  • output affects legal/regulatory rights;
  • confidence/evaluation score is low;
  • retrieved evidence is insufficient;
  • user asks for a high-impact decision;
  • model detects conflicting sources;
  • action changes production data;
  • output contains sensitive classification;
  • system is in degraded mode;
  • policy requires approval.

19.2 Review Record

review:
  requestId: ai-req-2026-0710-00031
  reviewer: investigator-42
  modelId: bedrock-model-id
  promptVersion: enforcement-summary-3.2.1
  sourceDocuments:
    - policy-2026-019
    - case-note-3819
  modelOutputHash: sha256:...
  decision: approved-with-edits
  editsSummary: removed unsupported conclusion
  reviewedAt: 2026-07-01T10:42:00+07:00

For regulated systems, the human review record is often as important as the AI output.


20. AI Security Threat Model

ThreatDescriptionControl
Prompt injectionUser/source text manipulates modelPrompt isolation, guardrails, eval, output validation
Data leakageModel sees or returns unauthorized dataAuthorization-aware retrieval, IAM, KMS, metadata filters
Retrieval poisoningBad source content enters indexsource approval, ingestion validation, provenance
Tool abuseAgent invokes unsafe actionIAM boundary, deterministic validation, human approval
Secret exposureSecret placed in prompt/lognever inject secrets, log redaction, scanner
Model overrelianceUser trusts unsupported answercitations, confidence notes, human review
Cost abuseUser generates large token usagequotas, throttling, budgets
Model/provider changeBehavior changes unexpectedlyeval, model pinning where possible, release gates
Log exposureSensitive prompts in logsencryption, access control, retention, masking
Cross-tenant leakageTenant A retrieves Tenant B datatenant metadata filters, separate indexes, tests

21. Governance Model

An AI platform needs governance without blocking all experimentation.

21.1 Risk Tiers

TierExampleRequired Controls
LowDraft marketing copy from public contentBasic logging, cost controls
MediumInternal knowledge assistantAuthZ-aware retrieval, guardrails, eval
HighRegulatory case summarycitations, human review, audit evidence, approved prompts
CriticalAutomated enforcement decisionUsually avoid full automation; require deterministic rules and formal governance

21.2 AI Change Types

ChangeTreat As
Prompt template updateProduction change
Model switchProduction change with eval
Retrieval source additionData governance change
Guardrail policy updateSafety control change
Tool/action additionSecurity and workflow change
Embedding model changeRetrieval quality change
Chunking strategy changeRetrieval quality and cost change

21.3 Evidence Artifacts

A governed AI release should produce:

  • architecture diagram;
  • model list;
  • prompt versions;
  • data source inventory;
  • risk assessment;
  • evaluation report;
  • red-team/safety test report;
  • guardrail configuration;
  • logging/retention decision;
  • human review policy;
  • rollback plan;
  • owner approval.

22. Golden Path: Regulated RAG Assistant

22.1 Use Case

An internal assistant helps investigators summarize enforcement case documents and relevant policy. It must not make final legal conclusions. It must cite sources. It must respect user authorization.

22.2 Architecture

22.3 Platform Contract

aiService:
  name: enforcement-rag-assistant
  owner: regulatory-ai-platform
  riskTier: high
  allowedModels:
    - approved-bedrock-model-family
  dataSources:
    - enforcement-policy-repository
    - case-document-store
  retrieval:
    authorizationAware: true
    requireCitations: true
    maxChunks: 8
  guardrails:
    input: enabled
    output: enabled
    piiHandling: mask
    deniedTopics:
      - final-legal-determination
  humanReview:
    requiredFor:
      - low-confidence
      - missing-citation
      - high-impact-recommendation
  observability:
    invocationLogging: restricted
    metrics:
      - latency
      - tokenUsage
      - retrievalHitRate
      - guardrailIntervention
      - humanOverrideRate
  cost:
    perUserQuota: true
    perTenantBudget: true

23. Production Readiness Checklist

Before releasing an AI workload:

  • Use case is classified by risk tier.
  • Model/provider list is approved.
  • IAM permissions are least privilege.
  • Network path is defined.
  • Data sources are inventoried.
  • Source authorization is enforced before retrieval.
  • Prompt templates are versioned.
  • Guardrails are configured and tested.
  • Output schema validation exists where applicable.
  • Evaluation dataset exists.
  • Regression tests run in CI/CD.
  • Prompt injection tests exist.
  • Cost quotas and budgets exist.
  • Token usage metrics exist.
  • Invocation logging decision is documented.
  • Sensitive logs are encrypted and access-controlled.
  • Human review policy exists.
  • Rollback path exists.
  • Incident runbook exists.
  • Evidence artifacts are stored.
  • Users are informed of limitations.

24. Anti-Patterns

24.1 Model-First Architecture

The team starts by choosing a model and then tries to retrofit data, security, evaluation, and UX.

Fix:

Start with risk, data, user journey, correctness requirement, and operational constraints.
Then choose the model.

24.2 RAG Without Authorization

All documents are indexed together and the model is expected not to reveal sensitive content.

Fix:

  • enforce authorization at retrieval;
  • use metadata filters;
  • separate indexes where risk requires;
  • test leakage.

24.3 No Evaluation Set

Prompt changes are tested by “looks good to me.”

Fix:

  • maintain golden test cases;
  • run eval before release;
  • track regression.

24.4 Logging Everything Forever

The platform logs full prompts and responses indefinitely.

Fix:

  • classify data;
  • minimize logs;
  • mask where possible;
  • define retention;
  • restrict access;
  • encrypt logs.

24.5 Agent With Broad Permissions

An agent can perform many actions using an overpowered role.

Fix:

  • narrow tools;
  • use action-specific roles;
  • validate arguments;
  • require human approval for high-impact actions;
  • set time/step/budget limits.

24.6 AI as Decision Maker Without Accountability

The model makes consequential decisions without human review or deterministic rules.

Fix:

  • use AI as decision support;
  • require human approval;
  • log evidence;
  • make deterministic policy checks explicit.

25. Failure Modes

Failure ModeSymptomRoot CausePrevention
Hallucinated answerUnsupported claimWeak prompt/eval/contextRequire citations, eval, human review
Data leakUser sees unauthorized dataRetrieval not authorization-awareMetadata filters, separate indexes, tests
Prompt injection successModel follows malicious documentUntrusted source not isolatedDelimit context, guardrails, output validation
Silent quality regressionNew prompt worseNo eval gateGolden dataset and release gate
Cost spikeToken usage jumpsLong context or runaway agentsquotas, budgets, max tokens, step limits
Latency spikePoor UXlarge context/model/Region issuemodel routing, streaming, async, capacity planning
Guardrail overblockingValid work blockedPolicy too broadreview metrics and tune policies
Guardrail underblockingUnsafe outputWeak safety testsred-team suite and layered controls
Stale retrievalOld policy usedingestion lag or version issuefreshness metadata, ingestion monitoring
Untraceable decisionAudit cannot reconstructMissing logs/evidenceprompt/model/source/version logging

26. Deliberate Practice

Practice 1: Secure Bedrock Invocation

Design a secure Bedrock invocation path for an internal application.

Required output:

  • IAM role design;
  • allowed models;
  • network path;
  • logging decision;
  • guardrail configuration;
  • cost controls;
  • failure modes.

Practice 2: RAG Authorization Design

Design RAG for confidential policy documents.

Required output:

  • document metadata schema;
  • authorization filter model;
  • chunking strategy;
  • retrieval evaluation plan;
  • leakage test cases.

Practice 3: AI Evaluation Gate

Create an evaluation gate for prompt/model changes.

Required output:

  • test categories;
  • golden dataset structure;
  • pass/fail thresholds;
  • rollback criteria;
  • approval workflow.

Practice 4: Regulated AI Runbook

Create a runbook for “assistant returns unsupported regulatory conclusion.”

Required output:

  • detection signals;
  • triage steps;
  • immediate mitigation;
  • user communication;
  • evidence collection;
  • rollback/update path;
  • post-incident review questions.

27. Self-Correction Questions

Use these questions to test your design:

  1. Does the system retrieve only data the user is authorized to see?
  2. Can you identify which model, prompt version, and sources produced an answer?
  3. Can you roll back a prompt change?
  4. Are guardrails tested against known failure cases?
  5. Is there an evaluation set for each critical use case?
  6. Are token usage and cost visible by service/user/tenant?
  7. Are logs safe to store under the chosen retention policy?
  8. Can the system degrade gracefully when Bedrock/retrieval is unavailable?
  9. Are high-impact actions validated outside the model?
  10. Is human review required where business/regulatory risk demands it?

28. Engineering Judgment Summary

A production AI platform on AWS is not a model wrapper.

It is an operational system where data governance, model access, prompt control, retrieval authorization, guardrails, evaluation, observability, cost control, and human accountability work together.

The winning mental model:

Do not trust the model as the boundary.
Build deterministic boundaries around the model.

Amazon Bedrock can reduce the burden of foundation model access and inference operations. SageMaker AI can support custom ML and MLOps lifecycle. But neither removes the need for engineering judgment.

For regulated systems, AI should usually begin as assistive, evidence-producing, and reviewable. Automation can increase only when the platform can prove correctness, safety, auditability, and accountability.


29. References

Lesson Recap

You just completed lesson 34 in final stretch. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.