
Learn Aws Part 027 Compliance Auditability Config Cloudtrail And Policy As Code


title: Learn AWS Engineering Mastery - Part 027
description: AWS compliance, auditability, evidence engineering, CloudTrail, AWS Config, Audit Manager, policy-as-code, control mapping, exception governance, and regulatory defensibility.
series: learn-aws
seriesTitle: Learn AWS Engineering Mastery
order: 27
partTitle: Compliance, Auditability, Config, CloudTrail, and Policy-as-Code
tags:

  • aws
  • compliance
  • auditability
  • cloudtrail
  • config
  • audit-manager
  • policy-as-code
  • governance
  • evidence
  • series

date: 2026-07-01

Compliance, Auditability, Config, CloudTrail, and Policy-as-Code

Learning target: after this part, we can design AWS systems that are not only technically secure but can also be proven, audited, explained, and defended before auditors, regulators, security reviewers, and incident review boards.

The previous part covered security engineering: KMS, secrets, WAF, GuardDuty, Security Hub, and containment. This part covers the layer that often fails in large organizations:

How do we prove that controls actually operate, changes are recorded, configuration matches policy, exceptions are approved, evidence can be found, and the compliance narrative is more than manual screenshots taken just before an audit?

Compliance on AWS is not just a checklist. Compliance is a system of evidence.

A top-tier engineer must be able to answer five questions:

  1. Who did what? Who performed which action, when, from where, and with which identity?
  2. What changed? Which resource changed, from which state to which state?
  3. Was it allowed? Did the change comply with policy, controls, and the approval path?
  4. Was it remediated? If it was noncompliant, was there a correction, an exception, or a risk acceptance?
  5. Can we prove it later? Is the evidence complete, immutable, queryable, and explainable?

If the answers to these questions still depend on human memory, console screenshots, or manual spreadsheets, the system is not yet audit-ready.


1. Kaufman Skill Map

Kaufman-style skill deconstruction for AWS compliance:

| Sub-skill | Core question | Output you must be able to produce |
| --- | --- | --- |
| Evidence design | What evidence is needed to prove a control? | Evidence map per control |
| Audit logging | Which events must be recorded and retained? | CloudTrail org trail + data event strategy |
| Config compliance | Which resource state must be evaluated? | Config rules + conformance packs |
| Policy-as-code | Which policies can be prevented/detected automatically? | Guardrail repo + CI evaluation |
| Exception handling | How are legitimate violations processed? | Exception workflow + expiry |
| Audit response | How do we answer auditors without heavy manual work? | Evidence query + report package |
| Continuous assurance | How is compliance maintained every day? | Dashboard, alarm, remediation, review cadence |

The target skill is not "knowing CloudTrail and Config". The target skill is the ability to build an assurance loop.


2. Mental Model: Compliance Is Evidence of Control Operation

Compliance is not a static state. Compliance is the ability to show that controls operate over time.

Example:

| Control intent | Technical implementation | Evidence source |
| --- | --- | --- |
| All production S3 buckets are encrypted | S3 default encryption + SCP/IaC guardrail | AWS Config, CloudTrail, S3 config |
| Only specific roles can deploy to prod | IAM role + CI/CD approval + SCP | CloudTrail, pipeline logs, IAM policy |
| All production databases have backups | RDS backup policy + AWS Backup plan | AWS Backup, AWS Config, CloudTrail |
| No public databases | Security group rules + Config rule | AWS Config, VPC flow logs, Security Hub |
| All break-glass access is recorded | IAM role + CloudTrail + ticket reference | CloudTrail, IAM Identity Center, ticket system |

Note the pattern: policy intent does not automatically become evidence. We have to design the evidence path.

2.1 Compliance vs Security

Security asks:

Is the system protected?

Compliance asks:

Can we prove the system is protected according to the agreed controls?

A system can be relatively secure yet poor at compliance if there is no evidence. A system can also look compliant yet remain fragile if its controls are only a formality. A top-tier engineer avoids both.

2.2 Auditability vs Observability

Observability answers questions about runtime technical state:

  • where a request failed;
  • why latency increased;
  • which dependency timed out;
  • error rate per service.

Auditability answers questions about governance:

  • who changed a policy;
  • who opened a security group;
  • whether a change went through the pipeline or was made manually;
  • whether a change matches an approved change;
  • whether a policy violation was fixed.

The two complement each other, but they are not the same.


3. AWS Evidence Stack

The AWS evidence stack can be viewed as several layers:

| Layer | AWS capability | What it proves |
| --- | --- | --- |
| Activity | CloudTrail | API/action history |
| State | AWS Config | Resource configuration over time |
| Compliance evaluation | AWS Config Rules | Whether resource state satisfies rules |
| Packaged governance | Conformance Packs | Standardized controls across accounts/Regions |
| Security posture | Security Hub | Aggregated findings and standards |
| Audit workflow | Audit Manager | Control mapping, evidence collection, assessment |
| Preventive governance | SCP/IAM/IaC policy | What cannot be changed or deployed |
| Retention | S3 Object Lock / CloudTrail Lake | Long-term evidence preservation |

A mature organization does not pick only one. It combines them into an assurance pipeline.


4. CloudTrail: The API Activity Ledger

CloudTrail records actions taken by users, roles, AWS services, console, CLI, SDKs, and APIs. For auditability, CloudTrail is the first answer to: who did what?

4.1 CloudTrail Event Categories

| Category | Meaning | Typical usage |
| --- | --- | --- |
| Management events | Control plane actions like create bucket, attach policy, modify security group | Core audit trail |
| Data events | High-volume object/function-level access such as S3 object API or Lambda invoke | Sensitive data access audit |
| Insights events | Unusual API activity patterns | Detection of anomalous control-plane behavior |
| Network activity events | VPC endpoint API activity visibility where supported | Private path auditing |

Do not assume management events are enough. Regulated systems often need a data event strategy for sensitive buckets, key workloads, or high-risk resources.

4.2 Organization Trail

For enterprise AWS, prefer organization-level CloudTrail where possible.

Design principle:

Workload account owners should not be able to erase or weaken central audit evidence.

Common pattern:

  1. Organization trail is configured centrally.
  2. Logs are delivered to a dedicated log archive/security account.
  3. S3 bucket has restrictive bucket policy.
  4. Encryption is controlled centrally.
  5. Log integrity validation is enabled where applicable.
  6. Retention follows regulatory requirement.
  7. Critical events are routed to detection pipelines.

4.3 Management Events vs Data Events

Management events show control-plane operations. Data events show data-plane operations.

Example:

| Action | CloudTrail event type |
| --- | --- |
| Create S3 bucket | Management event |
| Change S3 bucket policy | Management event |
| Get object from sensitive bucket | Data event |
| Put object into evidence bucket | Data event |
| Invoke sensitive Lambda | Data event |

Trade-off:

| Choice | Benefit | Risk/cost |
| --- | --- | --- |
| Management events only | Lower volume/cost, broad control-plane audit | Weak data access evidence |
| Data events for all buckets | Stronger audit coverage | High volume, high cost, noisy evidence |
| Data events for classified buckets | Balanced | Requires data classification maturity |
| Data events on evidence buckets | Proves evidence access path | Must protect from excessive noise |

Do not enable high-volume data events blindly. Tie them to classification and control objectives.
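As a rough illustration of tying data events to classification, the sketch below builds CloudTrail advanced event selectors only for buckets whose classification is in an audited set. The bucket names, the classification labels, and the `data_event_selectors` helper are assumptions made for this example; the selector field names (`eventCategory`, `resources.type`, `resources.ARN`) follow the advanced event selector format, and the result would still have to be applied to a trail or event data store by your own tooling.

```python
from typing import Dict, List


def data_event_selectors(
    bucket_classification: Dict[str, str],
    audited_levels: frozenset = frozenset({"restricted", "confidential"}),
) -> List[dict]:
    """Build advanced event selectors that log S3 data events only for classified buckets."""
    prefixes = sorted(
        f"arn:aws:s3:::{bucket}/"
        for bucket, level in bucket_classification.items()
        if level in audited_levels
    )
    # Management events stay enabled regardless of data classification.
    selectors = [{
        "Name": "All management events",
        "FieldSelectors": [{"Field": "eventCategory", "Equals": ["Management"]}],
    }]
    if prefixes:
        selectors.append({
            "Name": "S3 data events for classified buckets",
            "FieldSelectors": [
                {"Field": "eventCategory", "Equals": ["Data"]},
                {"Field": "resources.type", "Equals": ["AWS::S3::Object"]},
                {"Field": "resources.ARN", "StartsWith": prefixes},
            ],
        })
    return selectors


# Hypothetical inventory: bucket name -> data classification label.
selectors = data_event_selectors({
    "case-files-prod": "restricted",
    "public-assets": "public",
    "audit-evidence": "confidential",
})
```

Driving the selector list from a classification inventory keeps data event cost proportional to the amount of classified data rather than to the number of buckets.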

4.4 CloudTrail Lake

CloudTrail Lake is useful when audit queries become frequent, cross-account, or historical.

Typical queries:

  • Who changed this IAM policy in the last 90 days?
  • Which principals called PutBucketPolicy on production buckets?
  • Was this security group rule opened manually or by pipeline role?
  • Which accounts had root user activity?
  • Which role assumed break-glass access during an incident window?

CloudTrail Lake shifts evidence from raw files to queryable event stores.
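One of the typical queries above (principals calling PutBucketPolicy) might be generated as in the sketch below. The event data store ID is a placeholder, the `put_bucket_policy_query` helper is hypothetical, and the column list is an assumption about what an auditor would want; verify field names against your event data store schema before relying on it.

```python
from datetime import datetime, timedelta, timezone


def put_bucket_policy_query(event_data_store_id: str, days: int, now: datetime) -> str:
    """Return a CloudTrail Lake SQL string listing PutBucketPolicy calls in the last `days` days."""
    since = (now - timedelta(days=days)).strftime("%Y-%m-%d %H:%M:%S")
    return (
        "SELECT eventTime, recipientAccountId, awsRegion, "
        "userIdentity.arn AS principal, sourceIPAddress, "
        "element_at(requestParameters, 'bucketName') AS bucket "
        f"FROM {event_data_store_id} "
        "WHERE eventSource = 's3.amazonaws.com' "
        "AND eventName = 'PutBucketPolicy' "
        f"AND eventTime > '{since}' "
        "ORDER BY eventTime DESC"
    )


# Placeholder ID; a real query uses the ID of your own event data store.
query = put_bucket_policy_query(
    "EXAMPLE-EVENT-DATA-STORE-ID", 90, datetime(2026, 7, 1, tzinfo=timezone.utc)
)
print(query)
```

Keeping such queries in version control, with a fixed time window parameter, makes the evidence reproducible when an auditor asks how a result was obtained.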

4.5 CloudTrail Failure Modes

| Failure mode | Consequence | Mitigation |
| --- | --- | --- |
| Trail only in one account | Missing org-wide activity | Organization trail |
| Trail only in one Region | Missing regional activity | Multi-Region trail |
| Logs stored in same workload account | Tampering risk | Central log archive account |
| No data event strategy | Sensitive object access invisible | Classify and enable selective data events |
| No retention policy | Evidence unavailable during audit | Retention by control requirement |
| No alerting on trail changes | Logging can be disabled silently | Config rule + EventBridge alert |
| Unrestricted log bucket access | Evidence confidentiality risk | Bucket policy, KMS, least privilege |
| No linkage to change tickets | Hard to prove authorization | Require deployment metadata/tags/ticket references |

4.6 CloudTrail Design Checklist

  • Is there an organization trail?
  • Is it multi-Region?
  • Are logs delivered to a separate security/log archive account?
  • Is the log bucket protected by bucket policy and encryption?
  • Is there alerting on StopLogging, DeleteTrail, and trail modification?
  • Are data events enabled for classified data stores?
  • Is retention aligned with regulatory and legal requirements?
  • Are audit queries tested before real audit season?
  • Can you answer who changed a critical resource within minutes?
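The checklist item on alerting for StopLogging, DeleteTrail, and trail modification could be backed by an EventBridge event pattern like the one sketched here. The pattern keys follow the documented shape of CloudTrail API call events; the chosen list of event names and the small `matches` helper (a local stand-in that only handles exact-value matching, not the full EventBridge pattern syntax) are assumptions for illustration.

```python
# Event pattern for API calls that weaken or remove a trail.
TRAIL_TAMPERING_PATTERN = {
    "source": ["aws.cloudtrail"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["cloudtrail.amazonaws.com"],
        "eventName": ["StopLogging", "DeleteTrail", "UpdateTrail", "PutEventSelectors"],
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """Local check for exact-value patterns only; useful for unit-testing the pattern."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            # Nested pattern: the event must carry a nested object that also matches.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


sample_event = {
    "source": "aws.cloudtrail",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventSource": "cloudtrail.amazonaws.com", "eventName": "StopLogging"},
}
print(matches(TRAIL_TAMPERING_PATTERN, sample_event))
```

Testing the pattern against recorded sample events is a cheap way to produce evidence that the detection actually fires.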

5. AWS Config: The Resource State Timeline

CloudTrail tells you what API calls happened. AWS Config tells you what resource configuration existed and how it changed over time.

Mental model:

AWS Config is central to answering:

  • What was the configuration of this resource at a point in time?
  • When did the resource become noncompliant?
  • Which resources violate this rule?
  • What relationships did this resource have?
  • Did remediation happen?

5.1 Configuration Recorder

The configuration recorder determines which resource types AWS Config records.

Design concern:

If the resource type is not recorded, you cannot evaluate or reconstruct it later through Config.

Common mistakes:

  • Config enabled only in some Regions.
  • Global resources omitted unintentionally.
  • Recorder scope too narrow.
  • Delivery channel misconfigured.
  • No aggregation across accounts.

5.2 Configuration Items

A configuration item is a point-in-time representation of resource state. For compliance, configuration items are stronger than screenshots because they are time-stamped, structured, queryable, and linked to resource identity.

Example audit question:

Was encryption enabled on all production RDS instances between January and March?

Manual answer: spreadsheet and screenshots.

AWS Config answer: query configuration items and rule evaluations by resource, account, Region, and time.

5.3 AWS Config Rules

Rules evaluate whether resources comply with expected configuration.

| Rule type | Use case |
| --- | --- |
| AWS managed rule | Common controls like encryption, public access, required tags |
| Custom rule | Organization-specific policy logic |
| Periodic rule | Time-based evaluation |
| Change-triggered rule | Evaluate when resource changes |

Rule design principles:

  1. Rule name should map to control intent.
  2. Rule should identify resource scope clearly.
  3. Rule result should be actionable.
  4. Rule should not create unmanageable noise.
  5. Rule should support exception logic where needed.
  6. Rule evaluation should feed dashboard and audit evidence.

5.4 Conformance Packs

A conformance pack packages Config rules and remediation actions as a deployable compliance unit.

Use conformance packs when you need:

  • standardized controls across accounts;
  • baseline governance per environment;
  • evidence by framework/control;
  • organization-wide rollout;
  • drift detection against a baseline.

Example pack categories:

| Pack | Typical controls |
| --- | --- |
| Security baseline | CloudTrail enabled, S3 public access blocked, root MFA, encryption |
| Data protection | KMS encryption, backup enabled, restricted public access |
| Network baseline | No unrestricted ingress, flow logs enabled, endpoint policies |
| Tagging baseline | Required cost/security/owner/environment tags |
| Regulated workload baseline | Logging, encryption, retention, access controls, backup |

5.5 Remediation

AWS Config can trigger remediation through Systems Manager Automation documents.

However, remediation is not always safe.

| Noncompliance | Auto-remediate? | Reasoning |
| --- | --- | --- |
| Missing required tag | Often yes | Low risk if standard tags known |
| S3 public access opened | Often yes for sensitive account | High security risk, quick correction valuable |
| Security group port 22 open to internet | Often yes or quarantine | Context-dependent, but high-risk |
| RDS backup retention too low | Maybe | Could affect cost/operations |
| Deleting unauthorized resource | Rarely automatic | High business risk |
| Changing IAM policy | Usually careful workflow | Could break production |

Remediation design must include:

  • blast radius;
  • rollback path;
  • notification;
  • exception handling;
  • owner routing;
  • evidence of remediation.
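The design inputs listed above can be turned into an explicit decision function so that the choice between automatic and manual remediation is itself reviewable. The sketch below is one possible encoding; the blast radius labels, mode names, and `RemediationPlan` type are assumptions for illustration, not an AWS feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemediationPlan:
    finding: str
    blast_radius: str   # assumed labels: "low", "medium", "high"
    has_rollback: bool
    owner: str          # empty string means no owner could be resolved


def remediation_mode(plan: RemediationPlan, has_active_exception: bool) -> str:
    """Pick a remediation mode from blast radius, rollback availability, and ownership."""
    if has_active_exception:
        return "skip-and-record-exception"
    if not plan.owner:
        return "route-to-triage"
    if plan.blast_radius == "low" and plan.has_rollback:
        return "auto-remediate-and-notify"
    if plan.blast_radius == "medium" and plan.has_rollback:
        return "remediate-after-approval"
    # High blast radius or no rollback path: never act automatically.
    return "ticket-to-owner"


tag_fix = RemediationPlan("missing-required-tag", "low", True, "team-payments")
iam_fix = RemediationPlan("iam-policy-too-broad", "high", False, "team-platform")
print(remediation_mode(tag_fix, False), remediation_mode(iam_fix, False))
```

Whatever mode is chosen, the returned value should be stored next to the finding so the remediation decision becomes part of the evidence.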

5.6 Config Failure Modes

| Failure mode | Consequence | Mitigation |
| --- | --- | --- |
| Config not enabled in all accounts/Regions | Blind spots | Organization rollout + periodic audit |
| Rules too generic | Low signal | Map rule to explicit control |
| Rules too noisy | Alert fatigue | Severity, owner mapping, exception workflow |
| No remediation strategy | Noncompliance accumulates | Auto/manual remediation playbooks |
| No exception expiry | Permanent policy bypass | Time-bound exception registry |
| No aggregation | Central team cannot see risk | Config aggregator/delegated admin |
| No cost awareness | Config itself becomes expensive/noisy | Scope resource recording intentionally |

6. CloudTrail vs AWS Config

CloudTrail and Config are often confused. Treat them as complementary evidence systems.

| Question | CloudTrail | AWS Config |
| --- | --- | --- |
| Who made API call? | Strong | Weak/indirect |
| What changed? | API-level change | Resource-state change |
| What was config at time T? | Harder | Strong |
| Is resource compliant? | Not primary | Strong |
| Can we reconstruct user activity? | Strong | Not primary |
| Can we evaluate baseline? | Indirect | Strong |
| Can we trigger detection/remediation? | Yes via events | Yes via rules |

Example:

Security group allowed 0.0.0.0/0 to port 5432.

CloudTrail answers:

  • which principal called AuthorizeSecurityGroupIngress;
  • from which source IP;
  • using which assumed role;
  • at what time.

AWS Config answers:

  • when the security group became noncompliant;
  • whether it is still noncompliant;
  • what rule detected it;
  • what related resources are affected.

A defensible audit answer uses both.


7. AWS Audit Manager: Control-to-Evidence Workflow

AWS Audit Manager helps organize control frameworks, assessments, and evidence collection.

Mental model:

Audit Manager should not be treated as magic compliance. It automates parts of evidence collection, but you still need:

  • correct scope;
  • correct AWS account mapping;
  • correct control interpretation;
  • manual evidence for non-AWS processes;
  • owner review;
  • exception documentation;
  • auditor-ready narrative.

7.1 Frameworks, Controls, and Assessments

| Concept | Meaning |
| --- | --- |
| Framework | A collection of controls, often aligned to a standard |
| Control | A requirement or safeguard to be evaluated |
| Assessment | Evaluation of scoped AWS usage against a framework |
| Evidence | Data collected to support control operation |
| Delegation | Assigning evidence/control review to owners |

For engineering teams, the important discipline is mapping controls to technical evidence.

Example:

| Control | Evidence |
| --- | --- |
| Audit logging is enabled | CloudTrail configuration, Config rule result, log delivery evidence |
| Access is reviewed | IAM Access Analyzer finding review, IAM Identity Center assignment export, ticket approval |
| Encryption at rest | Config rule result, KMS key policy, resource encryption setting |
| Backup retention | AWS Backup plan, RDS retention setting, restore test record |
| Change management | CI/CD approval logs, CloudTrail deployment role events, change ticket |

7.2 Automatic vs Manual Evidence

Not all controls are automatically provable from AWS.

| Evidence type | Example | Risk |
| --- | --- | --- |
| Automated AWS evidence | Config rule output, CloudTrail event | Strong but must be scoped correctly |
| Semi-automated evidence | Athena query output, generated report | Requires query validation |
| Manual evidence | Policy document, meeting approval, training record | Higher process risk |
| External evidence | Jira, ServiceNow, GitHub, HR system | Requires integration/chain of custody |

A mature program reduces manual evidence but does not pretend all evidence can be automated.

7.3 Evidence Finder and Queryability

Evidence that cannot be searched quickly becomes operational debt.

A useful evidence search model includes:

  • by account;
  • by Region;
  • by workload;
  • by control;
  • by resource;
  • by owner;
  • by compliance status;
  • by time window;
  • by exception ID.

Do not wait for audit season to test evidence retrieval. Test it like restore drills.


8. Policy-as-Code

Policy-as-code means control intent is expressed in versioned, testable, reviewable rules.

It can operate at several stages:

| Stage | Example control |
| --- | --- |
| Pre-commit | No plaintext secret in repo |
| CI static check | S3 bucket must not be public |
| IaC plan check | RDS must have backup retention >= required threshold |
| Deployment gate | Production changes require approval |
| AWS preventive | SCP denies disabling CloudTrail |
| Runtime detective | AWS Config detects drift |
| Remediation | SSM Automation corrects tag or blocks public access |

8.1 Policy Code vs Human Policy

Human policy:

Production databases must be encrypted.

Policy-as-code:

DENY resource.aws_db_instance WHERE
  environment == "prod" AND
  storage_encrypted != true

But good policy-as-code also needs:

  • naming standard;
  • account/environment context;
  • exception model;
  • severity;
  • remediation hint;
  • test cases;
  • ownership;
  • audit mapping.

8.2 Preventive vs Detective Guardrails

| Guardrail type | Example | Strength | Weakness |
| --- | --- | --- | --- |
| Preventive | SCP denies disabling CloudTrail | Stops bad action | Can block urgent work if too broad |
| Detective | Config rule flags noncompliance | Flexible | Bad state can exist for some time |
| Corrective | Automation fixes noncompliance | Fast response | Can break workloads if unsafe |
| Advisory | CI warning | Low friction | Easy to ignore |

Use preventive guardrails for actions that should almost never happen. Use detective/corrective controls where context matters.

8.3 Policy Repository Structure

Example governance repo:

aws-governance/
  controls/
    CT-LOG-001-cloudtrail-enabled.yaml
    DP-ENC-001-s3-encryption.yaml
    IAM-PRIV-001-no-wildcard-admin.yaml
  rules/
    config/
    cfn-guard/
    opa/
    terraform-policy/
  exceptions/
    approved/
    expired/
  mappings/
    iso27001.yaml
    soc2.yaml
    internal-regulatory.yaml
  tests/
    compliant/
    noncompliant/
  docs/
    control-catalog.md
    evidence-map.md

Control files should include:

id: DP-ENC-001
title: Production storage must use approved encryption
intent: Protect regulated data at rest
severity: high
scope:
  environments: [prod]
  resourceTypes: [s3, rds, dynamodb, ebs]
evidenceSources:
  - aws-config
  - cloudtrail
  - kms
exception:
  allowed: true
  maxDurationDays: 30
  approvalRole: security-risk-owner
remediation:
  mode: manual-or-automated-by-resource-type

This turns compliance from scattered tribal knowledge into an engineering artifact.
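Because control files are engineering artifacts, they can be linted in CI like any other code. The sketch below checks a control record, already parsed into a dictionary, for the fields shown in the example above. The required-field list and the two extra checks are assumptions about what a team might enforce, and the `validate_control` helper is hypothetical.

```python
REQUIRED_FIELDS = (
    "id", "title", "intent", "severity",
    "scope", "evidenceSources", "exception", "remediation",
)


def validate_control(control: dict) -> list:
    """Return a list of human-readable problems; an empty list means the record passes."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in control]
    # A control without an evidence source cannot be demonstrated to an auditor.
    if "evidenceSources" in control and not control["evidenceSources"]:
        errors.append("evidenceSources is empty")
    exception = control.get("exception") or {}
    # Allowed exceptions must be time-bound.
    if exception.get("allowed") and not exception.get("maxDurationDays"):
        errors.append("exception.allowed requires maxDurationDays")
    return errors


control = {
    "id": "DP-ENC-001",
    "title": "Production storage must use approved encryption",
    "intent": "Protect regulated data at rest",
    "severity": "high",
    "scope": {"environments": ["prod"], "resourceTypes": ["s3", "rds", "dynamodb", "ebs"]},
    "evidenceSources": ["aws-config", "cloudtrail", "kms"],
    "exception": {"allowed": True, "maxDurationDays": 30, "approvalRole": "security-risk-owner"},
    "remediation": {"mode": "manual-or-automated-by-resource-type"},
}
print(validate_control(control))
```

Running this kind of check on every pull request to the governance repo keeps the control catalog and the evidence map from drifting apart.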


9. Exception Governance

Exception handling is where many compliance systems become fiction.

A real system has exceptions because production has constraints. But an exception must not become an invisible permanent bypass.

9.1 Exception Record

Minimum fields:

| Field | Why it matters |
| --- | --- |
| Exception ID | Traceability |
| Control ID | What is being bypassed |
| Resource/account/Region | Scope |
| Owner | Accountability |
| Reason | Risk narrative |
| Compensating control | How risk is reduced |
| Approval | Who accepted risk |
| Expiry date | Prevents permanent bypass |
| Review cadence | Ensures reassessment |
| Evidence link | Audit trace |

9.2 Exception State Machine

A good exception workflow is strict enough for audit but practical enough for engineering.
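One possible shape for such a workflow is a small state machine with an explicit transition table. The state names and allowed transitions below are assumptions chosen for illustration, not a prescribed lifecycle; the point is that expiry is enforced by code and that renewal has to pass through approval again.

```python
from datetime import date
from enum import Enum


class State(Enum):
    REQUESTED = "requested"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXPIRED = "expired"
    CLOSED = "closed"


# Allowed transitions; anything not listed is rejected.
TRANSITIONS = {
    State.REQUESTED: {State.APPROVED, State.REJECTED},
    State.APPROVED: {State.EXPIRED, State.CLOSED},
    State.EXPIRED: {State.REQUESTED, State.CLOSED},  # renewal re-enters approval
    State.REJECTED: set(),
    State.CLOSED: set(),
}


def advance(current: State, target: State) -> State:
    """Move to `target` if the transition table allows it, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def effective_state(state: State, expiry: date, today: date) -> State:
    """An approved exception past its expiry date is treated as expired, whatever the record says."""
    if state is State.APPROVED and today > expiry:
        return State.EXPIRED
    return state


print(effective_state(State.APPROVED, date(2026, 7, 31), date(2026, 8, 1)).value)
```

Evaluating `effective_state` at read time means a forgotten cleanup job cannot silently extend an exception.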

9.3 Exception Anti-Patterns

| Anti-pattern | Why dangerous |
| --- | --- |
| “Temporary” exception without expiry | Becomes permanent bypass |
| Exception approved by resource owner only | Conflict of interest |
| Exception not linked to control ID | Hard to audit |
| Exception not machine-readable | Cannot integrate with policy-as-code |
| Exception suppresses all alerts | Hides unrelated risks |
| No compensating control | Risk is accepted blindly |

10. Control Mapping for Regulated Workloads

A control catalog bridges regulation and AWS implementation.

Example:

| Control ID | Intent | AWS implementation | Evidence |
| --- | --- | --- | --- |
| LOG-001 | Record administrative activity | Org CloudTrail multi-Region trail | CloudTrail trail config, S3 log objects |
| LOG-002 | Protect audit logs | Log archive account, KMS, bucket policy, Object Lock where required | S3 policy, KMS policy, CloudTrail delivery |
| CFG-001 | Monitor resource configuration | AWS Config recorder all required accounts/Regions | Config recorder status |
| CFG-002 | Detect public exposure | Config managed/custom rules | Rule evaluations |
| IAM-001 | Enforce least privilege | IAM roles, permission boundaries, SCP | IAM policy, Access Analyzer review |
| BCK-001 | Ensure recoverability | AWS Backup plan, RDS backup retention | Backup job reports, restore drill |
| CHG-001 | Govern production change | CI/CD approval, deployment role | Pipeline logs, CloudTrail AssumeRole |

10.1 Evidence Quality Criteria

Good evidence is:

  1. Relevant: proves the control, not adjacent activity.
  2. Complete: covers required accounts, Regions, resources, and time period.
  3. Reliable: generated by system of record, not manually edited.
  4. Time-bound: shows when control operated.
  5. Tamper-resistant: protected against unauthorized modification/deletion.
  6. Queryable: can be found and filtered.
  7. Explainable: auditor can understand mapping from requirement to evidence.

Bad evidence is a screenshot with no timestamp, no scope, no query, and no link to control.


11. Regulatory Defensibility

Regulatory defensibility means you can explain your system in terms of controls, risk, evidence, and operational behavior.

It is not enough to say:

We use AWS, therefore we are compliant.

The stronger answer is:

AWS provides compliant infrastructure and service capabilities. We are responsible for configuring workloads, access, logging, encryption, retention, monitoring, response, and evidence collection according to our regulatory obligations. This control catalog maps each obligation to AWS technical controls and evidence sources. Exceptions are time-bound and approved by risk owners.

11.1 Responsibility Boundary

For each control, document:

| Question | Example |
| --- | --- |
| What does AWS operate? | Physical datacenter, managed service infrastructure |
| What do we configure? | IAM, encryption, logging, network access, backup |
| What evidence comes from AWS? | CloudTrail, Config, Audit Manager, service configuration |
| What evidence comes from us? | Change approval, risk acceptance, runbook, incident report |
| What evidence comes from vendors? | SaaS logs, identity provider audit logs, ticketing system |

11.2 Evidence Chain

If any link is missing, the audit story becomes weak.


12. Designing a Continuous Assurance Platform

A serious AWS environment should not collect compliance evidence once per year. It should run continuous assurance.

12.1 Reference Architecture

12.2 Control Loop

  1. Define control catalog.
  2. Map controls to AWS services and evidence.
  3. Implement preventive controls where safe.
  4. Implement detective controls for configuration and activity.
  5. Route noncompliance to owner.
  6. Remediate or create exception.
  7. Retain evidence.
  8. Review metrics monthly.
  9. Test evidence retrieval quarterly.
  10. Re-map controls when architecture changes.

12.3 Compliance Metrics

Useful compliance metrics:

| Metric | Meaning |
| --- | --- |
| Control coverage | Percentage of controls mapped to evidence |
| Account coverage | Percentage of accounts with required logging/config |
| Region coverage | Percentage of used Regions covered |
| Noncompliance age | How long violations remain open |
| Exception count | Active exceptions by severity |
| Exception age | How long risk acceptance persists |
| Auto-remediation success rate | Whether corrective controls work |
| Evidence retrieval time | How fast audit package can be produced |
| Manual evidence ratio | Amount of evidence still manually collected |

If a control is “green” but evidence retrieval takes days, compliance maturity is low.
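Two of the metrics above, control coverage and noncompliance age, are simple enough to compute from exported records. The sketch below assumes hypothetical record shapes (a control has an `evidenceSources` list, a finding has `opened` and `closed` dates); in practice these would come from your control catalog and from Config or Security Hub exports.

```python
from datetime import date
from statistics import median


def compliance_metrics(controls: list, findings: list, today: date) -> dict:
    """Compute control coverage and the age of still-open findings in days."""
    mapped = sum(1 for control in controls if control.get("evidenceSources"))
    open_ages = [
        (today - finding["opened"]).days
        for finding in findings
        if finding.get("closed") is None
    ]
    return {
        "control_coverage_pct": round(100 * mapped / len(controls), 1) if controls else 0.0,
        "open_findings": len(open_ages),
        "median_noncompliance_age_days": median(open_ages) if open_ages else 0,
        "max_noncompliance_age_days": max(open_ages, default=0),
    }


metrics = compliance_metrics(
    controls=[
        {"id": "LOG-001", "evidenceSources": ["cloudtrail"]},
        {"id": "CFG-001", "evidenceSources": ["aws-config"]},
        {"id": "BCK-001", "evidenceSources": ["aws-backup"]},
        {"id": "CHG-001", "evidenceSources": []},
    ],
    findings=[
        {"opened": date(2026, 6, 1), "closed": None},
        {"opened": date(2026, 6, 21), "closed": None},
        {"opened": date(2026, 5, 1), "closed": date(2026, 5, 3)},
    ],
    today=date(2026, 7, 1),
)
print(metrics)
```

Tracking the maximum age alongside the median helps surface the single long-lived violation that an average would hide.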


13. Policy-as-Code in CI/CD and IaC

13.1 Pre-Deployment Guardrail Example

A production RDS instance should not deploy if backup retention is below policy.

rule prod_rds_backup_retention {
  when resource.type == "aws_db_instance" and resource.tags.Environment == "prod" {
    assert resource.backup_retention_period >= 7
  }
}
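The rule above is written in an illustrative policy notation. A rough Python equivalent, assuming the input is the JSON produced by `terraform show -json` for a saved plan, might look like the sketch below. The 7-day threshold comes from the rule above; the `Environment` tag convention and the helper name are assumptions.

```python
MIN_RETENTION_DAYS = 7


def rds_backup_violations(plan: dict, min_days: int = MIN_RETENTION_DAYS) -> list:
    """List planned prod RDS instances whose backup retention is below the threshold."""
    violations = []
    for change in plan.get("resource_changes", []):
        if change.get("type") != "aws_db_instance":
            continue
        # "after" is null for resources being destroyed, so fall back to an empty dict.
        after = (change.get("change") or {}).get("after") or {}
        tags = after.get("tags") or {}
        if tags.get("Environment") != "prod":
            continue
        retention = after.get("backup_retention_period") or 0
        if retention < min_days:
            violations.append(
                f"{change.get('address')}: backup_retention_period={retention} < {min_days}"
            )
    return violations


# Minimal hand-written plan fragment for illustration.
plan = {
    "resource_changes": [
        {"address": "aws_db_instance.cases", "type": "aws_db_instance",
         "change": {"after": {"backup_retention_period": 1, "tags": {"Environment": "prod"}}}},
        {"address": "aws_db_instance.ledger", "type": "aws_db_instance",
         "change": {"after": {"backup_retention_period": 14, "tags": {"Environment": "prod"}}}},
        {"address": "aws_db_instance.scratch", "type": "aws_db_instance",
         "change": {"after": {"backup_retention_period": 0, "tags": {"Environment": "dev"}}}},
    ]
}
print(rds_backup_violations(plan))
```

A CI job would fail the pipeline when the returned list is non-empty and attach the list to the build record as evidence of the gate operating.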

13.2 Runtime Guardrail Example

If a manual console change reduces retention:

  1. AWS Config records resource state change.
  2. Config rule evaluates noncompliance.
  3. EventBridge routes event to remediation workflow.
  4. Workflow checks exception registry.
  5. If no exception, creates ticket or remediates.
  6. Evidence is stored.
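Steps 4 to 6 of the flow above can be sketched as a single decision function that consults an exception registry and returns an evidence record. The record fields, the action names, and the idea of an `auto_fixable` set of control IDs are assumptions for illustration; the actual ticketing or remediation call is left to the caller.

```python
from datetime import date


def handle_noncompliance(finding: dict, exceptions: list, today: date,
                         auto_fixable: frozenset) -> dict:
    """Decide what to do with a noncompliant finding and return a record to store as evidence."""
    # An exception applies only if it matches control and resource and has not expired.
    active = next(
        (
            exc for exc in exceptions
            if exc["controlId"] == finding["controlId"]
            and exc["resourceId"] == finding["resourceId"]
            and exc["expires"] >= today
        ),
        None,
    )
    if active is not None:
        action = "record-exception"
    elif finding["controlId"] in auto_fixable:
        action = "remediate"
    else:
        action = "create-ticket"
    return {
        "controlId": finding["controlId"],
        "resourceId": finding["resourceId"],
        "action": action,
        "exceptionId": active["id"] if active else None,
        "decidedOn": today.isoformat(),
    }


registry = [{"id": "EXC-042", "controlId": "BCK-001", "resourceId": "db-legacy",
             "expires": date(2026, 7, 15)}]
decision = handle_noncompliance(
    {"controlId": "BCK-001", "resourceId": "db-legacy"},
    registry, date(2026, 7, 1), frozenset({"TAG-001"}),
)
print(decision)
```

Because the expiry comparison happens on every evaluation, the same finding automatically falls back to a ticket or remediation once the exception lapses.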

13.3 Why Both CI and Runtime Checks Are Needed

CI checks only catch code-driven changes. Runtime checks catch:

  • console changes;
  • emergency operations;
  • third-party automation;
  • service-created resources;
  • drift;
  • changes from old pipelines;
  • imported resources.

Runtime checks without CI checks allow bad changes into production and then detect them late. Use both.


14. Workload Evidence Map Template

For each production workload, maintain an evidence map.

workload: regulated-case-management
owner: platform-enforcement-team
environment: production
accounts:
  - prod-app
  - prod-data
  - prod-security
regions:
  - ap-southeast-1
  - ap-southeast-3
controls:
  - id: LOG-001
    title: Administrative activity logging
    implementation:
      - organization-cloudtrail
      - cloudtrail-lake
      - log-archive-s3
    evidence:
      - cloudtrail-trail-config
      - sample-event-query
      - s3-log-delivery-proof
    reviewCadence: monthly
  - id: CFG-001
    title: Resource configuration monitoring
    implementation:
      - aws-config-recorder
      - conformance-pack-regulated-baseline
    evidence:
      - recorder-status
      - rule-evaluation-report
    reviewCadence: weekly

This template prevents audit work from becoming a last-minute archaeology project.


15. Common Production Scenarios

15.1 Auditor asks: “Show all privileged access to production.”

Good response path:

  1. Identify privileged roles.
  2. Query CloudTrail AssumeRole events.
  3. Join with identity provider/ticket references if available.
  4. Filter by production accounts.
  5. Provide evidence with time window, principal, role, source, and approval link.

Weak response:

  • Export IAM users manually.
  • Screenshot role list.
  • Ask team leads whether anyone accessed production.
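The good response path in this scenario can be approximated offline over exported CloudTrail records, as in the sketch below. The event field names follow the CloudTrail record format for AssumeRole; the role ARNs, account IDs, and helper name are hypothetical, and joining with ticket references is left out because it depends on the ticketing system.

```python
def privileged_access_events(events: list, privileged_role_arns: set,
                             prod_account_ids: set) -> list:
    """Return AssumeRole events for privileged roles that live in production accounts."""
    rows = []
    for event in events:
        if event.get("eventName") != "AssumeRole":
            continue
        role_arn = (event.get("requestParameters") or {}).get("roleArn", "")
        if role_arn not in privileged_role_arns:
            continue
        # The account ID is the fifth colon-separated field of an IAM role ARN.
        account_id = role_arn.split(":")[4]
        if account_id not in prod_account_ids:
            continue
        rows.append({
            "time": event.get("eventTime"),
            "principal": (event.get("userIdentity") or {}).get("arn"),
            "role": role_arn,
            "account": account_id,
            "sourceIp": event.get("sourceIPAddress"),
        })
    return sorted(rows, key=lambda row: row["time"])


BREAK_GLASS = "arn:aws:iam::111111111111:role/BreakGlassAdmin"   # hypothetical role
events = [
    {"eventName": "AssumeRole", "eventTime": "2026-06-30T10:15:00Z",
     "userIdentity": {"arn": "arn:aws:sts::222222222222:assumed-role/Engineer/alice"},
     "sourceIPAddress": "203.0.113.10", "requestParameters": {"roleArn": BREAK_GLASS}},
    {"eventName": "AssumeRole", "eventTime": "2026-06-30T09:00:00Z",
     "userIdentity": {"arn": "arn:aws:sts::222222222222:assumed-role/Engineer/bob"},
     "sourceIPAddress": "203.0.113.11",
     "requestParameters": {"roleArn": "arn:aws:iam::333333333333:role/ReadOnly"}},
    {"eventName": "GetCallerIdentity", "eventTime": "2026-06-30T08:00:00Z"},
]
rows = privileged_access_events(events, {BREAK_GLASS}, {"111111111111"})
print(rows)
```

Each returned row already carries the time window, principal, role, and source fields that the evidence package needs; the approval link is the only part that must come from outside AWS.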

15.2 Security review asks: “Can someone disable logging?”

Good response path:

  1. SCP denies disabling CloudTrail except break-glass/security admin path.
  2. Config rule detects CloudTrail disabled.
  3. EventBridge alert fires on trail modification events.
  4. Log archive bucket cannot be modified by workload accounts.
  5. Evidence shows tests of detection.
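Step 1 of this response path might be expressed as a service control policy like the one generated below. The statement ID, the list of denied actions, and the exempted role path are assumptions for illustration; a real policy should be reviewed against the organization's break-glass design before it is attached to any OU.

```python
import json


def deny_cloudtrail_tampering_scp(exempt_role_arn_patterns: list) -> dict:
    """Build an SCP document that denies trail tampering except for exempted principals."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyCloudTrailTampering",
            "Effect": "Deny",
            "Action": [
                "cloudtrail:StopLogging",
                "cloudtrail:DeleteTrail",
                "cloudtrail:UpdateTrail",
                "cloudtrail:PutEventSelectors",
            ],
            "Resource": "*",
            # Principals matching these ARN patterns are not covered by the deny.
            "Condition": {
                "ArnNotLike": {"aws:PrincipalArn": exempt_role_arn_patterns}
            },
        }],
    }


# Hypothetical break-glass/security admin role path.
scp = deny_cloudtrail_tampering_scp(["arn:aws:iam::*:role/security/BreakGlassAdmin"])
print(json.dumps(scp, indent=2))
```

Generating the policy from code keeps the exemption list in version control, which gives reviewers a change history for who may bypass the guardrail.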

15.3 Incident review asks: “When did the bucket become public?”

Good response path:

  1. Config timeline shows when public policy/block setting changed.
  2. CloudTrail shows who made the change.
  3. Security Hub/Config finding shows detection time.
  4. Ticket/remediation record shows closure time.
  5. Exception registry shows whether it was approved.

16. Decision Matrix

16.1 Which Service for Which Evidence?

| Need | Primary service | Supporting service |
| --- | --- | --- |
| API activity audit | CloudTrail | CloudTrail Lake, EventBridge |
| Resource state over time | AWS Config | Config aggregator |
| Baseline compliance | Config Rules | Conformance Packs |
| Automated remediation | AWS Config | Systems Manager Automation |
| Audit framework mapping | Audit Manager | Security Hub, Config, CloudTrail |
| Security posture | Security Hub | GuardDuty, Inspector, Config |
| Long-term log retention | S3 | Object Lock, KMS |
| Queryable audit events | CloudTrail Lake | Athena/S3 depending on design |
| Preventive org-level guardrail | SCP | IAM permission boundary |
| Pre-deploy validation | IaC policy-as-code | CI/CD gates |

16.2 Prevent, Detect, or Correct?

| Control | Preferred mode | Reasoning |
| --- | --- | --- |
| Disable CloudTrail | Prevent + detect | Almost never legitimate |
| Public S3 bucket in data account | Prevent + detect + correct | High-risk exposure |
| Missing cost tag | Detect + correct | Low-risk automation |
| Nonstandard instance type | Detect/advisory | May have workload-specific reason |
| Root user activity | Detect + alert | Must be investigated |
| KMS key deletion schedule | Prevent/detect | Could destroy data access |
| Security group open to internet | Detect/correct depending on port/account | Context-dependent |
| Backup disabled | Detect + owner workflow | Remediation may affect workload behavior |

17. Failure Modeling

17.1 Evidence Blind Spot

Symptom: Auditor asks for historical state; team cannot reconstruct it.

Likely causes:

  • Config not enabled;
  • CloudTrail retention too short;
  • data events not enabled;
  • logs stored in workload account;
  • resource type not recorded;
  • no account/Region inventory.

Mitigation:

  • baseline evidence services in landing zone;
  • periodic evidence retrieval drills;
  • resource inventory reconciliation;
  • conformance pack for logging/config coverage.

17.2 False Compliance

Symptom: Dashboard green, but auditor rejects evidence.

Likely causes:

  • control poorly mapped;
  • rule checks only existence, not effectiveness;
  • evidence not time-bound;
  • scope excludes some accounts;
  • manual exception not documented.

Mitigation:

  • control-to-evidence review;
  • sample audit walkthrough;
  • control owner sign-off;
  • automated scope inventory.

17.3 Policy Noise

Symptom: Thousands of findings ignored.

Likely causes:

  • rules too broad;
  • missing owner tags;
  • no severity model;
  • no exception process;
  • no remediation path.

Mitigation:

  • classify severity;
  • route to owners;
  • suppress only with expiry;
  • automate low-risk fixes;
  • review noisy rules quarterly.

17.4 Remediation Breaks Production

Symptom: Auto-remediation changes resource and causes outage.

Likely causes:

  • remediation not risk-classified;
  • no safe rollback;
  • no staging test;
  • no workload owner notification;
  • one-size-fits-all remediation.

Mitigation:

  • separate advisory/detect/correct policies;
  • implement dry-run;
  • run remediation in nonprod first;
  • use approval for high-risk changes;
  • maintain remediation runbooks.

18. Internal Engineering Handbook Rules

Rule 1: Evidence must be designed with the system

Do not deploy regulated workloads and add audit later. Logging, Config, retention, control mapping, and evidence ownership are part of architecture.

Rule 2: Every production exception expires

If an exception cannot expire, it is not an exception; it is a policy change. Treat it as such.

Rule 3: A control without evidence is an aspiration

Write controls so they map to evidence sources. If you cannot identify evidence, the control is not operationalized.

Rule 4: A dashboard is not an audit package

Dashboards help operators. Audit packages need scope, time window, evidence source, explanation, and sign-off.

Rule 5: Compliance should be continuous

Annual audit preparation should mostly package existing evidence, not discover whether controls exist.

Rule 6: Manual evidence is expensive risk

Manual evidence may be necessary, but it should be minimized, versioned, linked, and reviewed.


19. Implementation Blueprint

Phase 1: Baseline Logging and State

  • Enable organization CloudTrail.
  • Centralize logs in log archive account.
  • Enable AWS Config in required accounts/Regions.
  • Create account/Region coverage dashboard.
  • Protect evidence buckets.
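The data behind an account/Region coverage dashboard is essentially a set difference. The sketch below assumes you have already collected, by whatever means, the set of (account, Region) pairs where each service is observed to be enabled; the account IDs and Regions are placeholders.

```python
from itertools import product


def required_pairs(accounts, regions):
    """Every in-scope account must be covered in every in-scope Region."""
    return set(product(accounts, regions))


def coverage_report(required, observed_by_service):
    """Return {service: sorted list of (account, region) pairs with no coverage}."""
    return {
        service: sorted(required - observed)
        for service, observed in observed_by_service.items()
    }


# Placeholder inventory, for illustration only.
REQUIRED = required_pairs(["111111111111", "222222222222"], ["eu-west-1", "us-east-1"])
OBSERVED = {
    "cloudtrail": set(REQUIRED),  # organization trail covers everything
    "config": {("111111111111", "eu-west-1"), ("111111111111", "us-east-1"),
               ("222222222222", "eu-west-1")},
}
GAPS = coverage_report(REQUIRED, OBSERVED)
```

Here `GAPS["config"]` lists the one pair where AWS Config is missing, which is the kind of gap that otherwise surfaces only during an audit.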

Phase 2: Control Catalog

  • Define control IDs.
  • Map each control to AWS implementation.
  • Map each control to evidence source.
  • Define owner and review cadence.
  • Define exception rules.
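A control catalog can start as plain structured data with a validator that rejects incomplete entries. The field names and the `LOG-01` entry below are hypothetical; the point is only that a control with no evidence source or owner fails the check.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Control:
    control_id: str
    description: str
    aws_implementation: list[str]  # services or rules that enforce the control
    evidence_sources: list[str]    # where proof can be retrieved later
    owner: str
    review_cadence_days: int
    exceptions_allowed: bool = False


def catalog_problems(catalog: list[Control]) -> list[str]:
    """Return human-readable problems; an empty list means the catalog is usable."""
    problems, seen = [], set()
    for c in catalog:
        if c.control_id in seen:
            problems.append(f"{c.control_id}: duplicate control ID")
        seen.add(c.control_id)
        if not c.aws_implementation:
            problems.append(f"{c.control_id}: no AWS implementation mapped")
        if not c.evidence_sources:
            problems.append(f"{c.control_id}: no evidence source mapped")
        if not c.owner:
            problems.append(f"{c.control_id}: no owner")
        if c.review_cadence_days <= 0:
            problems.append(f"{c.control_id}: no review cadence")
    return problems


# Illustrative entry only; IDs and wording are assumptions.
CATALOG = [
    Control(
        control_id="LOG-01",
        description="API activity is logged centrally for all accounts.",
        aws_implementation=["organization CloudTrail"],
        evidence_sources=["log archive bucket", "Config rule result"],
        owner="platform-team",
        review_cadence_days=90,
    ),
]
```

Running `catalog_problems` in CI keeps the catalog honest: a new control cannot be merged until its evidence source and owner are named.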

Phase 3: Config Rules and Conformance Packs

  • Start with high-risk controls:
    • CloudTrail enabled;
    • S3 public access blocked;
    • encryption enabled;
    • required tags;
    • no unrestricted ingress;
    • backup enabled.
  • Deploy conformance packs to pilot OU.
  • Tune noise.
  • Expand to production OUs.

Phase 4: Policy-as-Code Gates

  • Add IaC static checks.
  • Add CI policy checks.
  • Add deployment approval gates for production.
  • Require metadata: owner, environment, data classification, change ID.
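The metadata requirement can be enforced as a CI gate over resources parsed from an IaC plan or template. This sketch assumes the parsing has already happened and each resource is a dict with an `address` and a `tags` map; the tag key spellings and the allowed environment values are assumptions.

```python
# Assumed tag keys for: owner, environment, data classification, change ID.
REQUIRED_TAGS = ("owner", "environment", "data-classification", "change-id")
ALLOWED_ENVIRONMENTS = {"dev", "staging", "prod"}  # hypothetical value set


def check_resource(resource: dict) -> list:
    """Return a list of violations for one resource."""
    address = resource.get("address", "<unknown>")
    tags = resource.get("tags") or {}
    problems = [
        f"{address}: missing tag '{key}'"
        for key in REQUIRED_TAGS
        if not tags.get(key)
    ]
    env = tags.get("environment")
    if env and env not in ALLOWED_ENVIRONMENTS:
        problems.append(f"{address}: environment '{env}' is not allowed")
    return problems


def gate(resources: list) -> int:
    """Print violations and return a process exit code (0 = pass, 1 = fail)."""
    problems = [p for r in resources for p in check_resource(r)]
    for p in problems:
        print(p)
    return 1 if problems else 0
```

In a pipeline, the return value of `gate` would be passed to `sys.exit` so that a missing `change-id` blocks the deployment rather than producing a warning nobody reads.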

Phase 5: Evidence Automation

  • Configure Audit Manager assessments.
  • Build recurring evidence exports.
  • Create evidence retrieval runbooks.
  • Test audit response scenarios.

Phase 6: Continuous Assurance

  • Review compliance metrics monthly.
  • Review exception inventory.
  • Run evidence retrieval game days.
  • Add new controls when architecture changes.

20. Deliberate Practice

Practice 1: Reconstruct a Manual Change

Simulate a manual security group change in a sandbox account.

Deliverables:

  • CloudTrail event query showing who changed it.
  • AWS Config timeline showing when it became noncompliant.
  • Config rule result.
  • Remediation/ticket record.
  • Short audit narrative.
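For the first and last deliverables, it helps to reduce a CloudTrail record to the fields an auditor asks about. The record below is a made-up, heavily trimmed example using documentation-style placeholder values; real records carry many more fields, and the exact shape of `requestParameters` varies by API call, so treat the `groupId` lookup as an assumption to verify against your own events.

```python
# Made-up, trimmed record for illustration; not real audit data.
SAMPLE_RECORD = {
    "eventTime": "2026-07-01T09:15:42Z",
    "eventSource": "ec2.amazonaws.com",
    "eventName": "AuthorizeSecurityGroupIngress",
    "awsRegion": "eu-west-1",
    "sourceIPAddress": "203.0.113.10",
    "userIdentity": {
        "type": "AssumedRole",
        "arn": "arn:aws:sts::111122223333:assumed-role/ops-admin/alice",
        "accountId": "111122223333",
    },
    "requestParameters": {"groupId": "sg-0123456789abcdef0"},
}


def who_did_what(record: dict) -> dict:
    """Extract who, what, when, and where from one CloudTrail record."""
    identity = record.get("userIdentity") or {}
    params = record.get("requestParameters") or {}
    source = record.get("eventSource", "?")
    name = record.get("eventName", "?")
    return {
        "who": identity.get("arn") or identity.get("principalId", "unknown"),
        "identity_type": identity.get("type", "unknown"),
        "action": f"{source}:{name}",
        "when": record.get("eventTime"),
        "region": record.get("awsRegion"),
        "from_ip": record.get("sourceIPAddress"),
        "target": params.get("groupId"),
    }


def narrative(record: dict) -> str:
    """Render a one-sentence audit narrative from the extracted fields."""
    s = who_did_what(record)
    return (f"At {s['when']}, {s['who']} called {s['action']} on {s['target']} "
            f"in {s['region']} from {s['from_ip']}.")
```

The narrative still needs the Config timeline and the ticket reference added by hand, but starting from structured fields keeps the who/what/when part reproducible.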

Practice 2: Build a Mini Conformance Pack

Create a small pack with controls:

  • CloudTrail enabled.
  • S3 bucket public read prohibited.
  • Required tags exist.
  • EBS volumes encrypted.

Deliverables:

  • rule definitions;
  • deployment method;
  • evaluation output;
  • remediation decision table.
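As a starting point for the rule definitions, the four controls can be generated as a CloudFormation-style template of AWS managed Config rules. The managed rule identifiers below are written from memory of the AWS managed rules list and should be checked against the current documentation; the rule names, the `owner` tag key, and the choice to render JSON rather than YAML are assumptions, so confirm what your deployment tooling accepts.

```python
import json


def managed_rule(name, identifier, parameters=None, resource_types=None):
    """Build one AWS::Config::ConfigRule resource backed by an AWS managed rule."""
    props = {
        "ConfigRuleName": name,
        "Source": {"Owner": "AWS", "SourceIdentifier": identifier},
    }
    if parameters:
        props["InputParameters"] = parameters
    if resource_types:
        props["Scope"] = {"ComplianceResourceTypes": resource_types}
    return {"Type": "AWS::Config::ConfigRule", "Properties": props}


PACK = {
    "Resources": {
        "CloudTrailEnabled": managed_rule(
            "cloudtrail-enabled", "CLOUD_TRAIL_ENABLED"),
        "S3PublicReadProhibited": managed_rule(
            "s3-bucket-public-read-prohibited", "S3_BUCKET_PUBLIC_READ_PROHIBITED"),
        "RequiredTags": managed_rule(
            "required-tags", "REQUIRED_TAGS",
            parameters={"tag1Key": "owner"},          # assumed tag key
            resource_types=["AWS::EC2::Volume", "AWS::S3::Bucket"]),
        "EncryptedVolumes": managed_rule(
            "encrypted-volumes", "ENCRYPTED_VOLUMES"),
    }
}

# Serialized template body, ready to hand to whatever deploys the pack.
TEMPLATE_BODY = json.dumps(PACK, indent=2)
```

Generating the template from code keeps the rule list reviewable in version control, which also covers the "deployment method" deliverable: the pack changes only through a reviewed commit.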

Practice 3: Design an Exception Workflow

For a public S3 bucket used by a demo app, define:

  • exception record;
  • risk owner;
  • expiry;
  • compensating control;
  • detection behavior;
  • closure criteria.
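The six items above map naturally onto a machine-readable record. The sketch below is one possible shape, with a hypothetical 90-day maximum duration and a simple detection behavior: while the exception is active the finding is still detected but annotated, and once it expires the finding is escalated like any other.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

MAX_EXCEPTION_DAYS = 90  # hypothetical policy limit


@dataclass(frozen=True)
class ExceptionRecord:
    exception_id: str
    control_id: str
    resource_arn: str
    risk_owner: str
    approved_on: date
    expires_on: date
    compensating_control: str
    closure_criteria: str


def validate(rec: ExceptionRecord) -> list[str]:
    """Return reasons the record should be rejected; empty means acceptable."""
    problems = []
    if rec.expires_on <= rec.approved_on:
        problems.append("expiry must be after approval date")
    elif (rec.expires_on - rec.approved_on).days > MAX_EXCEPTION_DAYS:
        problems.append(f"duration exceeds {MAX_EXCEPTION_DAYS} days")
    for name in ("risk_owner", "compensating_control", "closure_criteria"):
        if not getattr(rec, name).strip():
            problems.append(f"{name} is required")
    return problems


def disposition(rec: ExceptionRecord, today: date) -> str:
    """Detection never stops: active exceptions annotate, expired ones escalate."""
    return "annotate" if today <= rec.expires_on else "escalate"


# Illustrative record for the demo bucket; all values are placeholders.
DEMO_EXCEPTION = ExceptionRecord(
    exception_id="EXC-001",
    control_id="S3-01",
    resource_arn="arn:aws:s3:::demo-app-public-assets",
    risk_owner="demo-team-lead",
    approved_on=date(2026, 7, 1),
    expires_on=date(2026, 8, 1),
    compensating_control="Bucket holds only non-sensitive static assets.",
    closure_criteria="Demo ended or bucket made private.",
)
```

Because the record never disables detection, the exception registry and the Config findings stay comparable: every annotated finding should match exactly one active record.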

Practice 4: Audit Package Drill

Pick one control and produce a package:

  • control description;
  • scope;
  • AWS implementation;
  • evidence source;
  • sample evidence;
  • review sign-off;
  • known exceptions.

Timebox: 90 minutes.

Goal: discover whether your evidence system is real.


21. Anti-Patterns

| Anti-pattern | Better approach |
| --- | --- |
| Enabling CloudTrail only after audit request | Organization trail as landing zone baseline |
| Treating Config findings as “security team problem” | Owner routing by account/workload/team |
| Screenshot-driven audit evidence | Queryable, retained, structured evidence |
| Permanent exception spreadsheet | Machine-readable exception registry with expiry |
| Ignoring disabled Regions | Explicit Region policy and coverage checks |
| No linkage between control and evidence | Control catalog with evidence mapping |
| Auto-remediating everything | Risk-based remediation modes |
| Only checking IaC | Runtime drift detection with Config |
| Only checking runtime | Pre-deployment policy gates |
| Compliance owned only by GRC | Shared ownership between platform, security, workload, and risk owners |

22. Self-Correction Checklist

Ask these questions before claiming a workload is audit-ready:

  • Can we list all in-scope AWS accounts and Regions?
  • Is CloudTrail enabled centrally and protected from workload owners?
  • Is AWS Config enabled for required resource types?
  • Can we show resource state at a point in time?
  • Can we connect CloudTrail activity to change approval?
  • Do Config rules map to named controls?
  • Are conformance packs deployed consistently?
  • Is there an exception workflow with expiry?
  • Are remediation actions tested and safe?
  • Can we produce evidence without manual archaeology?
  • Is evidence retained for the required period?
  • Can an auditor understand the control-to-evidence chain?

23. Engineering Judgment Summary

Compliance on AWS is not a pile of screenshots. It is an engineered feedback loop.

The strongest mental model:

  • CloudTrail proves activity.
  • AWS Config proves state.
  • Config Rules prove compliance evaluation.
  • Conformance Packs package baseline controls.
  • Audit Manager organizes control evidence.
  • Policy-as-code shifts controls earlier.
  • Exception governance keeps reality honest.
  • Retention and queryability make evidence usable when pressure arrives.

A top-tier AWS engineer designs workloads so that audit evidence is a natural by-product of operating the system correctly.

If a system cannot explain who changed what, what state existed, why it was allowed, how noncompliance was handled, and where evidence lives, then it is not truly production-ready for regulated environments.

