Learn Aws Part 031 Modernization Migration And Strangler Architecture

title: Learn AWS Engineering Mastery - Part 031
description: Modernization and migration on AWS through portfolio assessment, 7R strategies, migration waves, strangler architecture, data migration, cutover, and risk-controlled transformation.
series: learn-aws
seriesTitle: Learn AWS Engineering Mastery
order: 31
partTitle: Modernization, Migration, and Strangler Architecture
tags:

  • aws
  • cloud
  • architecture
  • migration
  • modernization
  • strangler-fig
  • enterprise-architecture

date: 2026-07-01

Learn AWS Engineering Mastery - Part 031

Modernization, Migration, and Strangler Architecture

Modernization is not the same as moving servers to AWS.

A migration can move a workload from one hosting environment to another while preserving most of its architecture. Modernization changes the workload so that it becomes easier to operate, safer to change, more observable, more resilient, more cost-transparent, and better aligned with business capability boundaries.

A senior AWS engineer does not treat migration as a one-time infrastructure project. They treat it as a controlled transformation program with measurable risk, staged learning, explicit rollback points, and clear ownership.

The core question is not:

How do we migrate this application to AWS?

The better question is:

Which business capability should change, which risk should be reduced, and which parts of the system should remain boring for now?

This part teaches how to think about migration and modernization as engineering change management.


1. Target Skill

After this part, you should be able to:

  • distinguish migration, modernization, transformation, refactoring, replatforming, and re-architecture;
  • select an appropriate migration strategy using the 7 Rs: retire, retain, rehost, relocate, repurchase, replatform, refactor/re-architect;
  • design a portfolio assessment that avoids arbitrary migration sequencing;
  • structure migration waves based on risk, dependency, business priority, and learning value;
  • apply the strangler fig pattern to incrementally replace legacy functionality;
  • choose safe cutover models: big-bang, phased, parallel run, shadow traffic, dark launch, and dual-write with reconciliation;
  • identify migration failure modes before execution;
  • plan data migration, CDC, backfill, validation, and rollback;
  • create modernization roadmaps that balance speed, risk, and engineering sustainability.

2. Kaufman Skill Decomposition

Migration and modernization are too broad to learn as one skill. Decompose them into sub-skills.

First 20 Hours Focus

For the first 20 hours of advanced practice, focus on these high-yield sub-skills:

Timebox | Focus | Practice Output
2h | Migration vocabulary and 7Rs | Classify 20 workloads into migration strategies
3h | Dependency mapping | Draw dependency graph for a legacy system
3h | Strangler design | Design incremental extraction for one capability
3h | Data migration risk | Create backfill + CDC + reconciliation plan
3h | Cutover plan | Write phased cutover and rollback runbook
3h | Operational readiness | Create go/no-go checklist
3h | Modernization governance | Build decision log and exception model

3. Core Mental Model

Migration is movement. Modernization is capability change. Transformation is organizational change.

Migration changes where the system runs.
Modernization changes how the system behaves and evolves.
Transformation changes how teams build, operate, and govern systems.

A cloud migration that keeps every legacy failure mode intact is often just a data-center relocation with a new bill.

A good modernization effort changes at least one of these properties:

  • deployment frequency;
  • lead time for change;
  • recovery time;
  • observability;
  • security posture;
  • cost transparency;
  • operational toil;
  • scaling behavior;
  • tenant isolation;
  • data governance;
  • release safety;
  • team ownership boundaries.

The Transformation Triangle

You cannot modernize only the code while ignoring the operating model. A new containerized service with no ownership, no SLO, no deployment safety, and no cost allocation is not modern. It is just packaged differently.


4. Migration vs Modernization vs Refactoring

Term | Meaning | Typical Scope | Risk
Migration | Move workload to AWS | Hosting/platform | Operational and cutover risk
Rehost | Move with minimal change | VM/server level | Lower transformation risk, may preserve legacy flaws
Replatform | Move with selected platform improvements | Runtime/data/service layer | Moderate risk
Refactor | Improve internal structure without changing external behavior | Code/module level | Code correctness risk
Re-architect | Change structure, boundaries, data model, or integration style | System level | High design and delivery risk
Modernization | Improve evolvability, operability, scalability, and governance | End-to-end capability | Program risk

Practical Example

Legacy application:

- Java monolith on VM
- Oracle database
- file-based batch integration
- manual deployments
- shared admin credential
- no structured logs
- weekly release window

Possible approaches:

Approach | Result
Rehost | Run VM on EC2, preserve most behavior
Replatform | Move database to RDS/Aurora where compatible, automate deployment
Refactor | Improve internal modules, remove hard-coded config
Re-architect | Extract case intake, workflow, audit, notification as separate capabilities
Modernize | Introduce IaC, CI/CD, observability, IAM roles, SLOs, runbooks, cost tags, resilience patterns

Top-tier engineers know when not to modernize. A stable, low-value, soon-to-retire app should not consume the same design energy as a strategic enforcement platform.


5. The 7 Rs Migration Strategies

AWS Prescriptive Guidance commonly describes seven migration strategies, known as the 7 Rs:

  1. Retire — decommission applications that are no longer needed.
  2. Retain — keep applications where they are for now.
  3. Rehost — move application infrastructure with minimal changes.
  4. Relocate — move workloads at the hypervisor/platform level where supported.
  5. Repurchase — replace with a different product, often SaaS.
  6. Replatform — make some optimizations without changing core architecture.
  7. Refactor or re-architect — redesign to use cloud-native capabilities or new architecture.

The key is not memorizing the list. The key is choosing the strategy that fits:

  • business criticality;
  • remaining lifespan;
  • regulatory constraints;
  • operational pain;
  • technical debt;
  • dependency complexity;
  • migration window;
  • team capability;
  • cost pressure;
  • resilience requirement;
  • security risk.

Strategy Decision Matrix

Condition | Likely Strategy | Reasoning
App unused or duplicated | Retire | Avoid migrating waste
App tied to unsupported hardware/regulatory location | Retain | Cloud move may be unsafe now
App stable, low change, urgent data-center exit | Rehost | Speed over transformation
VMware estate with compatible target | Relocate | Reduce migration friction
Commodity business capability | Repurchase | Buy instead of rebuild
App can benefit from managed DB/cache/storage with minimal code change | Replatform | Improve operations without full rewrite
App is strategic but blocked by architecture | Refactor/re-architect | Enable long-term agility and resilience
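
The matrix above can be turned into a first-pass classifier for a portfolio spreadsheet. The following is a minimal sketch, not an official AWS decision procedure: the `Trait` names, the `StrategyAdvisor` class, and especially the order in which rules are checked are assumptions made for illustration. A real assessment would weigh the full list of factors (criticality, lifespan, team capability, and so on) and record the rationale.

```java
import java.util.Set;

/** The seven migration strategies (7 Rs). */
enum MigrationStrategy { RETIRE, RETAIN, REHOST, RELOCATE, REPURCHASE, REPLATFORM, REFACTOR }

/** Hypothetical workload traits, mirroring the "Condition" column of the matrix. */
enum Trait {
    UNUSED_OR_DUPLICATED,
    TIED_TO_HARDWARE_OR_LOCATION,
    COMMODITY_CAPABILITY,
    STRATEGIC_BUT_BLOCKED,
    VMWARE_COMPATIBLE_TARGET,
    BENEFITS_FROM_MANAGED_SERVICES
}

final class StrategyAdvisor {
    private StrategyAdvisor() {}

    /**
     * Returns a first-pass recommendation for one workload.
     * The rule order is an assumption of this sketch: outcomes that avoid
     * migration work (retire, retain, repurchase) are checked first.
     */
    static MigrationStrategy recommend(Set<Trait> traits) {
        if (traits.contains(Trait.UNUSED_OR_DUPLICATED)) return MigrationStrategy.RETIRE;
        if (traits.contains(Trait.TIED_TO_HARDWARE_OR_LOCATION)) return MigrationStrategy.RETAIN;
        if (traits.contains(Trait.COMMODITY_CAPABILITY)) return MigrationStrategy.REPURCHASE;
        if (traits.contains(Trait.STRATEGIC_BUT_BLOCKED)) return MigrationStrategy.REFACTOR;
        if (traits.contains(Trait.VMWARE_COMPATIBLE_TARGET)) return MigrationStrategy.RELOCATE;
        if (traits.contains(Trait.BENEFITS_FROM_MANAGED_SERVICES)) return MigrationStrategy.REPLATFORM;
        // Default row of the matrix: stable app, speed over transformation.
        return MigrationStrategy.REHOST;
    }
}
```

Calling `StrategyAdvisor.recommend(EnumSet.of(Trait.UNUSED_OR_DUPLICATED, Trait.STRATEGIC_BUT_BLOCKED))` yields `RETIRE` under these rules, which reflects the point that unused workloads should not be migrated regardless of their other traits. The output is a starting recommendation to be challenged in review, not a decision.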

Bad Strategy Smells

Smell | Why It Is Dangerous
“Everything will be refactored” | Usually too slow, too risky, and too expensive
“Everything will be lift-and-shifted” | Preserves cost, security, and operability problems
“We will decide after migration” | Creates cloud-hosted technical debt
“The tool will migrate it” | Tooling does not decide architecture or ownership
“No app can be retired” | Indicates weak portfolio governance
“Modernization means Kubernetes” | Confuses platform choice with business capability improvement

6. Portfolio Assessment

Before designing migrations, you need a portfolio view.

A portfolio assessment answers:

What exists, who owns it, why does it matter, what does it depend on, how risky is it, and what should happen to it?

Minimum Portfolio Fields

Field | Why It Matters
Application name | Basic inventory
Business capability | Prevents purely technical grouping
Owner | Enables accountability
Criticality | Drives sequencing and DR
Users/consumers | Reveals impact
Data classification | Drives security and compliance
Runtime | Migration compatibility
Database | Data migration risk
Integration dependencies | Wave planning
Authentication model | Identity modernization
Deployment model | Release readiness
Observability level | Operational risk
Compliance obligations | Audit/evidence requirements
Current pain | Modernization value
Cost baseline | Business case
End-of-life dates | Urgency
Recommended 7R | Migration strategy
Rationale | Governance evidence

Portfolio Maturity Levels

Level | Behavior
0 | No reliable inventory
1 | Spreadsheet inventory, incomplete dependency data
2 | Inventory plus owners and criticality
3 | Dependency graph, cost baseline, data classification
4 | Migration waves, risk scoring, decision log
5 | Continuously maintained portfolio, integrated with CMDB/IaC/observability

Dependency Graph Example

A dependency graph prevents naive sequencing. You cannot migrate the reporting batch if it depends on a database, file server, and partner SFTP path that will not exist in the target environment.


7. Migration Wave Planning

A migration wave is a batch of applications or components migrated together.

Bad wave planning groups workloads by convenience:

Wave 1: all Linux servers
Wave 2: all Windows servers
Wave 3: all databases

Better wave planning groups workloads by dependency, risk, and learning value:

Wave 1: non-critical app with representative architecture
Wave 2: low-risk apps using shared network/auth baseline
Wave 3: medium-critical apps with database migration
Wave 4: strategic capability with strangler extraction
Wave 5: high-criticality workload after readiness improves

Wave Design Criteria

Criterion | Question
Dependency closure | Can this wave run independently after migration?
Learning value | What reusable pattern will this wave prove?
Blast radius | What happens if cutover fails?
Business timing | Are there blackout windows?
Data complexity | Is there CDC, backfill, or dual-write?
Security readiness | Are IAM, network, secrets, and audit ready?
Operational readiness | Are runbooks and monitoring ready?
Rollback feasibility | Can we safely revert?

Wave Anti-Pattern

Pick the easiest apps first, learn nothing, then discover the hard platform gaps on the most critical workload.

A better approach: choose early waves that are safe but representative.
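
The dependency-closure criterion can be checked mechanically once the portfolio holds dependency data. The sketch below assumes a simplified model: each application has one wave number and a flat list of runtime dependencies. The class and method names are hypothetical. It flags any application scheduled before something it depends on, and any dependency that appears in no wave at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class WavePlanCheck {
    private WavePlanCheck() {}

    /**
     * Returns one message per dependency-closure violation.
     *
     * @param waveOf    application name -> wave number (lower migrates earlier)
     * @param dependsOn application name -> applications it needs at runtime
     */
    static List<String> closureViolations(Map<String, Integer> waveOf,
                                          Map<String, List<String>> dependsOn) {
        List<String> violations = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : dependsOn.entrySet()) {
            String app = entry.getKey();
            Integer appWave = waveOf.get(app);
            if (appWave == null) {
                continue; // app is not scheduled (retained or retired), nothing to check
            }
            for (String dependency : entry.getValue()) {
                Integer dependencyWave = waveOf.get(dependency);
                if (dependencyWave == null) {
                    // Not necessarily wrong, but it needs an explicit connectivity decision.
                    violations.add(app + " depends on " + dependency + ", which is in no wave");
                } else if (dependencyWave > appWave) {
                    violations.add(app + " (wave " + appWave + ") migrates before "
                            + dependency + " (wave " + dependencyWave + ")");
                }
            }
        }
        return violations;
    }
}
```

An empty result does not prove a wave is safe; it only shows that the recorded dependencies are consistent with the schedule. Undiscovered dependencies remain the larger risk.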


8. Landing Zone Readiness Before Migration

Migration without a ready landing zone creates chaos.

Before serious workload migration, ensure these foundations exist:

Foundation | Required Capability
Account strategy | Workload/account/environment separation
Network | VPC, subnet, routing, DNS, endpoint, ingress/egress pattern
Identity | Federation, roles, least privilege, break-glass
Logging | CloudTrail, Config, VPC Flow Logs, application logs
Security | Guardrails, KMS, secrets, vulnerability detection
Deployment | IaC, CI/CD, artifact promotion
Observability | Metrics, logs, traces, alarms, dashboards
Operations | Runbooks, incident management, SSM access
Cost | Tags, budgets, cost allocation
Compliance | Evidence, controls, exception process

Foundation Dependency Diagram

A workload should not be the first place where you invent production access, logging, network egress, backup policy, or deployment promotion.


9. Modernization Candidate Selection

Not every app deserves deep modernization.

High-Value Modernization Candidates

Modernize aggressively when the workload:

  • is strategically important;
  • changes frequently;
  • blocks business agility;
  • has high operational toil;
  • has recurring incidents;
  • has scaling or latency bottlenecks;
  • has security/compliance weaknesses;
  • has unclear ownership boundaries;
  • has strong coupling across business capabilities;
  • is expensive because of inefficient architecture;
  • needs multi-tenant or self-service capability.

Low-Value Modernization Candidates

Avoid deep modernization when the workload:

  • is near retirement;
  • is stable and rarely changed;
  • has low business impact;
  • is replaceable by SaaS;
  • has no meaningful scaling or operational issue;
  • has unclear product sponsorship;
  • has limited team capacity.

Modernization Prioritization Matrix

Business Value | Technical Pain | Suggested Approach
High | High | Modernize incrementally
High | Low | Migrate safely, defer deep changes
Low | High | Retire, replace, or contain
Low | Low | Rehost/retain/retire depending on lifecycle

10. Strangler Fig Pattern

The strangler fig pattern incrementally replaces parts of a legacy system with new services while the old and new systems coexist.

It is useful when a big-bang rewrite is too risky.

Strangler Stages

Stage | Description
1. Identify seam | Find a capability boundary that can be intercepted
2. Introduce routing | Route selected traffic through a facade/proxy/API gateway
3. Extract capability | Build new implementation outside monolith
4. Sync or migrate data | Decide data ownership and transition model
5. Shift traffic | Route more traffic to new capability
6. Monitor and reconcile | Compare behavior and data correctness
7. Decommission old path | Remove legacy code/data/integration once safe
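
Stages 2 and 5 (introduce routing, shift traffic) come down to a routing decision made at the facade. The sketch below is an in-process illustration only. In practice this logic usually lives in a proxy, load balancer, or API gateway configuration. The `StranglerRouter` class, its path-prefix rules, and the hash-based percentage bucketing are assumptions of this example.

```java
import java.util.ArrayList;
import java.util.List;

final class StranglerRouter {
    enum Target { LEGACY, NEW_SERVICE }

    /** One extracted capability and the share of callers sent to its new implementation. */
    private static final class Rule {
        final String pathPrefix;
        final int percentToNew;

        Rule(String pathPrefix, int percentToNew) {
            this.pathPrefix = pathPrefix;
            this.percentToNew = percentToNew;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Registers a capability by path prefix; percentToNew must be between 0 and 100. */
    StranglerRouter route(String pathPrefix, int percentToNew) {
        if (percentToNew < 0 || percentToNew > 100) {
            throw new IllegalArgumentException("percentToNew must be 0..100");
        }
        rules.add(new Rule(pathPrefix, percentToNew));
        return this;
    }

    /**
     * Picks a target for one request. The sticky key (for example a tenant ID)
     * keeps a given caller on the same side while the percentage is unchanged.
     */
    Target resolve(String path, String stickyKey) {
        for (Rule rule : rules) {
            if (path.startsWith(rule.pathPrefix)) {
                int bucket = Math.floorMod(stickyKey.hashCode(), 100);
                return bucket < rule.percentToNew ? Target.NEW_SERVICE : Target.LEGACY;
            }
        }
        return Target.LEGACY; // anything not yet extracted stays on the monolith
    }
}
```

Raising `percentToNew` step by step corresponds to stage 5, and setting it back to 0 is the fast rollback path for that capability. Defaulting unmatched paths to the legacy system is a deliberate choice here: new routes must be opted in explicitly.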

Good Strangler Candidates

Candidate | Why It Works
Notification service | Often peripheral, event-driven, low data ownership risk
Reporting projection | Can be built from replicated data before becoming authoritative
Read-only search | Can use shadow indexing and compare results
Authentication facade | Enables identity modernization carefully
Case intake API | Often a clear edge boundary
Document generation | Often separable from core transaction workflow

Poor Early Strangler Candidates

Candidate | Why Risky
Core transaction commit | High correctness and data ownership risk
Shared authorization logic | Cross-cutting and high blast radius
Database schema core tables | Many hidden dependencies
Batch settlement | Often complex reconciliation and timing constraints
High-regulatory audit log | Must preserve evidence integrity

11. Finding Seams in Legacy Systems

A seam is a place where behavior can be changed without rewriting the entire system.

Common seams:

  • URL route;
  • API endpoint;
  • message queue;
  • database table ownership boundary;
  • file exchange;
  • batch job;
  • UI module;
  • feature flag;
  • authentication boundary;
  • reporting read model;
  • external integration adapter.

Seam Evaluation Matrix

Seam | Good When | Risk
API route | Requests can be routed by path/header/tenant | Contract compatibility
UI route | Frontend module can call new backend | User experience inconsistency
Queue | Consumers can be added or replaced | Ordering/retry semantics
Database table | Ownership can be isolated | Hidden coupling
File feed | Contract can remain stable | Batch timing and duplicate processing
Report | Read-only projection acceptable | Data freshness expectations
Auth | Token boundary can be introduced | Lockout/security risk

Example: Case Intake Extraction

The new service owns the intake workflow. A sync adapter updates the legacy system only where needed. Over time, downstream consumers move from the legacy database to published domain events or new APIs.


12. Anti-Corruption Layer

An anti-corruption layer protects new architecture from legacy model leakage.

Without it, a new service becomes a thin wrapper around old data structures, old error semantics, old permissions, and old coupling.

Responsibilities

Responsibility | Example
Model translation | Legacy CASE_STATUS='P' becomes PENDING_REVIEW
Protocol adaptation | SOAP/file/database call becomes REST/event interface
Error mapping | Legacy error code becomes domain-specific error
Identity mapping | Legacy user ID maps to federated subject/tenant
Data normalization | Inconsistent legacy fields become canonical DTO
Policy enforcement | New authorization policy wraps legacy permissiveness
Observability | Adds correlation ID and structured logging

Rule

Never let the legacy data model become the public contract of the new platform.
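
The model-translation responsibility can be illustrated with the `CASE_STATUS='P'` example from the table. This is a minimal sketch: only the `P` to `PENDING_REVIEW` mapping comes from the text, while the `C` code, the `CaseStatus` enum, and the translator class are hypothetical. The important property is that unknown legacy codes are rejected at the boundary instead of leaking into the new model.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Domain-side status used by the new platform's public contract. */
enum CaseStatus { PENDING_REVIEW, CLOSED }

final class LegacyCaseStatusTranslator {
    private static final Map<String, CaseStatus> BY_LEGACY_CODE = new HashMap<>();

    static {
        BY_LEGACY_CODE.put("P", CaseStatus.PENDING_REVIEW); // mapping from the table above
        BY_LEGACY_CODE.put("C", CaseStatus.CLOSED);         // hypothetical code, for illustration only
    }

    private LegacyCaseStatusTranslator() {}

    /** Translates a legacy CASE_STATUS code; unknown codes fail loudly instead of passing through. */
    static CaseStatus toDomain(String legacyCode) {
        CaseStatus status = legacyCode == null
                ? null
                : BY_LEGACY_CODE.get(legacyCode.trim().toUpperCase(Locale.ROOT));
        if (status == null) {
            throw new IllegalArgumentException("Unmapped legacy CASE_STATUS: " + legacyCode);
        }
        return status;
    }
}
```

Failing on unmapped codes is a design choice for this sketch. Some teams prefer to map unknown values to an explicit quarantine state so that processing continues while the gap is investigated.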

13. Branch by Abstraction

Branch by abstraction is a technique where you introduce an abstraction over old behavior, then gradually swap implementations behind that abstraction.

It is useful when you cannot safely maintain a long-lived source-control branch.

Example

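// Abstraction that callers depend on; both implementations sit behind it.
// Each public type would live in its own source file.
public interface CaseAssignmentEngine {
    AssignmentDecision assign(CaseContext context);
}

public final class LegacyCaseAssignmentEngine implements CaseAssignmentEngine {
    @Override
    public AssignmentDecision assign(CaseContext context) {
        // Existing behavior: delegate to the current monolith logic (elided here).
        throw new UnsupportedOperationException("legacy assignment logic elided");
    }
}

public final class NewRuleBasedAssignmentEngine implements CaseAssignmentEngine {
    @Override
    public AssignmentDecision assign(CaseContext context) {
        // New behavior: rule-based assignment (elided here).
        throw new UnsupportedOperationException("new assignment logic elided");
    }
}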

With feature flags, tenant routing, or traffic sampling, you can route selected calls to the new implementation.
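
One way to wire that routing is a third implementation of the same interface that chooses between the other two. The sketch below uses tenant routing and is self-contained, so it declares minimal stand-ins for `CaseContext` and `AssignmentDecision`. Their fields, and the `TenantRoutedAssignmentEngine` name, are assumptions for illustration. In a real system the tenant set would come from a feature-flag service, not a constructor argument.

```java
import java.util.Set;

// Minimal stand-ins so the sketch compiles on its own; real types would be richer.
final class CaseContext {
    final String tenantId;
    CaseContext(String tenantId) { this.tenantId = tenantId; }
}

final class AssignmentDecision {
    final String decidedBy;
    AssignmentDecision(String decidedBy) { this.decidedBy = decidedBy; }
}

interface CaseAssignmentEngine {
    AssignmentDecision assign(CaseContext context);
}

/** Sends allow-listed tenants to the new engine and everyone else to the legacy one. */
final class TenantRoutedAssignmentEngine implements CaseAssignmentEngine {
    private final CaseAssignmentEngine legacy;
    private final CaseAssignmentEngine replacement;
    private final Set<String> tenantsOnNewEngine;

    TenantRoutedAssignmentEngine(CaseAssignmentEngine legacy,
                                 CaseAssignmentEngine replacement,
                                 Set<String> tenantsOnNewEngine) {
        this.legacy = legacy;
        this.replacement = replacement;
        this.tenantsOnNewEngine = tenantsOnNewEngine;
    }

    @Override
    public AssignmentDecision assign(CaseContext context) {
        CaseAssignmentEngine engine =
                tenantsOnNewEngine.contains(context.tenantId) ? replacement : legacy;
        return engine.assign(context);
    }
}
```

Callers keep depending on `CaseAssignmentEngine` only, so moving a tenant between implementations is a configuration change rather than a code change. Once every tenant is on the new engine, the router and the legacy implementation can be deleted.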

AWS Mapping

Need | AWS Mechanism
Toggle implementation | AppConfig feature flags
Route traffic | ALB weighted target groups / API Gateway / CloudFront behavior
Compare behavior | CloudWatch Logs, metrics, traces
Release gradually | CodeDeploy canary/linear deployment
Roll back | Deployment rollback and feature flag disable

14. Data Migration Mental Model

Application migration is often limited by data migration.

Data has properties code does not:

  • volume;
  • history;
  • legal retention;
  • lineage;
  • referential integrity;
  • privacy constraints;
  • encryption requirements;
  • ownership ambiguity;
  • downstream consumers;
  • analytical use;
  • audit evidence;
  • long-lived mistakes.

Data Migration Options

Pattern | Description | Use When
Big-bang export/import | Stop writes, export, import, switch | Small data, tolerable downtime
Backfill + CDC | Load historical data, stream ongoing changes | Medium/large DB, limited downtime
Dual-write | App writes to old and new systems | Requires strong reconciliation discipline
Event rebuild | Rebuild new state from event history | Reliable event log exists
Read replica transition | Use replica then promote/switch | Supported engine/topology
Shadow read | New system reads and compares without serving | Validation before cutover

Backfill + CDC Flow
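
The essential property of a backfill plus CDC flow is that the target can accept snapshot rows and change events in any interleaving, including replays, and still converge on the source state. The in-memory sketch below illustrates one common way to get that property. It assumes every change carries a monotonically increasing log position and that the snapshot is tagged with the position at which it was taken. All class names are hypothetical, and real CDC tooling handles far more (transactions, schema changes, large objects).

```java
import java.util.HashMap;
import java.util.Map;

/** One captured change from the source database. */
final class ChangeEvent {
    final long position; // assumed: monotonically increasing source log position
    final String key;
    final String value;  // null means the row was deleted at the source

    ChangeEvent(long position, String key, String value) {
        this.position = position;
        this.key = key;
        this.value = value;
    }
}

/** Target-side table that applies backfill rows and CDC events idempotently. */
final class TargetTable {
    private final Map<String, String> rows = new HashMap<>();
    private final Map<String, Long> appliedPosition = new HashMap<>();

    /** Backfill: load a row as it looked in a snapshot taken at snapshotPosition. */
    void backfill(String key, String value, long snapshotPosition) {
        apply(new ChangeEvent(snapshotPosition, key, value));
    }

    /** CDC: apply a change unless this key already reflects that position or a later one. */
    boolean apply(ChangeEvent event) {
        Long seen = appliedPosition.get(event.key);
        if (seen != null && seen >= event.position) {
            return false; // duplicate delivery or stale replay, safe to ignore
        }
        if (event.value == null) {
            rows.remove(event.key);
        } else {
            rows.put(event.key, event.value);
        }
        appliedPosition.put(event.key, event.position); // kept for deletes too, as a tombstone
        return true;
    }

    String get(String key) {
        return rows.get(key);
    }
}
```

Because stale and duplicate events are ignored per key, the CDC stream can safely start from a position slightly before the snapshot, which is simpler than trying to align the two exactly.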

Data Validation Types

Validation | Purpose
Row count | Basic completeness
Checksums | Detect content mismatch
Referential checks | Detect broken relations
Business invariants | Validate domain correctness
Sample replay | Compare behavior using historical cases
Dual-read comparison | Compare old/new read output
Reconciliation report | Explain remaining differences

Dangerous Assumption

If the migration tool finishes successfully, the business data is correct.

Tool success means the process executed. It does not prove business correctness.
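
A small reconciliation routine shows why row counts alone are weak evidence. The sketch below compares two maps of primary key to row fingerprint (for example, a hash of the normalized row) and reports keys that are missing, unexpected, or different. How fingerprints are computed, and the `ReconciliationReport` class itself, are assumptions of this example.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

final class ReconciliationReport {
    final Set<String> missingInTarget = new TreeSet<>();
    final Set<String> unexpectedInTarget = new TreeSet<>();
    final Set<String> contentMismatch = new TreeSet<>();

    /** True only when every source key exists in the target with the same fingerprint. */
    boolean isClean() {
        return missingInTarget.isEmpty() && unexpectedInTarget.isEmpty() && contentMismatch.isEmpty();
    }

    /** Compares primary key -> row fingerprint maps taken from source and target. */
    static ReconciliationReport compare(Map<String, String> source, Map<String, String> target) {
        ReconciliationReport report = new ReconciliationReport();
        for (Map.Entry<String, String> row : source.entrySet()) {
            if (!target.containsKey(row.getKey())) {
                report.missingInTarget.add(row.getKey());
            } else if (!Objects.equals(row.getValue(), target.get(row.getKey()))) {
                report.contentMismatch.add(row.getKey());
            }
        }
        for (String key : target.keySet()) {
            if (!source.containsKey(key)) {
                report.unexpectedInTarget.add(key);
            }
        }
        return report;
    }
}
```

A source of `{a, b}` and a target of `{a, c}` have identical row counts, yet the report lists `b` as missing and `c` as unexpected. Even a clean report only covers completeness and content; business invariants still need their own checks.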


15. Dual-Write and Reconciliation

Dual-write is often necessary and often dangerous.

The problem:

Old system write succeeds, new system write fails.
New system write succeeds, old system write fails.
Both succeed but with different interpretation.
Retry creates duplicate.
Ordering changes final state.

Safer Alternative: Outbox Pattern
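
With an outbox, the application writes only to its own database: the business change and an event row are committed in one local transaction, and a separate relay delivers the event rows to the new system. The in-memory sketch below illustrates the shape of that flow. It is not production code: a synchronized block stands in for a database transaction, and all class names and the `caseId:status` payload format are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Stand-in for the legacy database: business state and outbox rows commit together. */
final class LegacyStore {
    final Map<String, String> caseStatus = new LinkedHashMap<>();
    final Map<String, String> outbox = new LinkedHashMap<>(); // eventId -> payload, pending only
    private long nextEventId = 1;

    /** The synchronized method stands in for one local database transaction. */
    synchronized void updateStatus(String caseId, String status) {
        caseStatus.put(caseId, status);
        outbox.put("evt-" + nextEventId++, caseId + ":" + status);
    }
}

/** Idempotent consumer on the new side: a redelivered event ID is ignored. */
final class NewSystemProjection {
    final Map<String, String> caseStatus = new LinkedHashMap<>();
    private final Set<String> seenEventIds = new HashSet<>();

    void handle(String eventId, String payload) {
        if (!seenEventIds.add(eventId)) {
            return; // already processed
        }
        String[] parts = payload.split(":", 2);
        caseStatus.put(parts[0], parts[1]);
    }
}

final class OutboxRelay {
    private OutboxRelay() {}

    /** Delivers pending events in insertion order; a row is removed only after delivery. */
    static int drain(LegacyStore store, NewSystemProjection consumer) {
        int delivered = 0;
        synchronized (store) {
            List<String> pendingIds = new ArrayList<>(store.outbox.keySet());
            for (String eventId : pendingIds) {
                consumer.handle(eventId, store.outbox.get(eventId));
                store.outbox.remove(eventId);
                delivered++;
            }
        }
        return delivered;
    }
}
```

If the relay crashes between delivery and removal, the event is delivered again on the next run, which is why the consumer deduplicates by event ID. The new system lags the old one slightly, but the "one write succeeded, the other failed" cases listed above cannot occur.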

Reconciliation Requirements

If dual-write exists, define:

  • source of truth per field;
  • idempotency key;
  • ordering rule;
  • retry policy;
  • compensation process;
  • reconciliation frequency;
  • mismatch severity classification;
  • operator runbook;
  • audit trail;
  • decommission condition.

16. Cutover Patterns

Cutover is the moment users or systems start relying on the migrated target.

Cutover Pattern Matrix

Pattern | Description | Strength | Risk
Big-bang | Switch all users at once | Simple coordination | High blast radius
Phased by tenant | Move selected tenants/groups | Lower blast radius | Compatibility complexity
Phased by capability | Route specific functions to new system | Aligns with strangler | Shared data complexity
Blue/green | Two environments, switch traffic | Fast rollback when stateless | Data rollback hard
Canary | Small traffic percentage first | Early detection | Requires good telemetry
Parallel run | Old and new run together | Strong validation | Expensive and complex
Shadow traffic | New system receives copied traffic | Safe behavior comparison | Must avoid side effects
Dark launch | Deploy but do not expose | Production validation | Does not prove user behavior

Cutover Runbook Skeleton

## Cutover Runbook

### Preconditions
- Target environment deployed from approved artifact.
- Data backfill complete.
- CDC lag below threshold.
- Reconciliation mismatch below threshold.
- Monitoring dashboard verified.
- Rollback path tested.
- Business owner approval recorded.
- Support team staffed.

### Steps
1. Freeze non-essential changes.
2. Confirm source and target health.
3. Reduce TTL if DNS switch is used.
4. Enable maintenance banner if needed.
5. Stop or drain source writes if required.
6. Apply final CDC catch-up.
7. Run final validation.
8. Switch traffic.
9. Monitor error rate, latency, queue depth, and business metrics.
10. Declare success or execute rollback.

### Rollback Trigger
- Error rate above threshold for N minutes.
- Data mismatch above severity threshold.
- Critical workflow unavailable.
- Security control failure.
- Business owner declares unacceptable impact.

### Post-Cutover
- Keep source read-only for agreed window.
- Archive evidence.
- Update CMDB/catalog.
- Remove temporary credentials.
- Schedule decommission tasks.
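
Rollback triggers work best when they are evaluated mechanically instead of debated during the cutover. The sketch below implements the first trigger in the skeleton, "error rate above threshold for N minutes", over per-minute samples. The `RollbackTrigger` class and its parameter names are hypothetical, and the threshold and window are values each team must choose for its own workload.

```java
final class RollbackTrigger {
    private RollbackTrigger() {}

    /**
     * Returns true when the most recent requiredMinutes samples all exceed the threshold.
     *
     * @param errorRatePerMinute one sample per minute, oldest first, each between 0.0 and 1.0
     * @param threshold          error rate that counts as unhealthy, for example 0.05
     * @param requiredMinutes    the N in "above threshold for N minutes"
     */
    static boolean errorRateBreached(double[] errorRatePerMinute, double threshold, int requiredMinutes) {
        if (requiredMinutes <= 0) {
            throw new IllegalArgumentException("requiredMinutes must be positive");
        }
        if (errorRatePerMinute.length < requiredMinutes) {
            return false; // not enough evidence yet
        }
        for (int i = errorRatePerMinute.length - requiredMinutes; i < errorRatePerMinute.length; i++) {
            if (errorRatePerMinute[i] <= threshold) {
                return false; // at least one healthy minute inside the window
            }
        }
        return true;
    }
}
```

Requiring consecutive breaches avoids rolling back on a single noisy minute, at the cost of reacting N minutes later. That trade-off should be agreed with the business owner before the cutover window, not during it.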

17. Rollback vs Roll-Forward

Rollback is not always possible.

Code rollback is easy compared to data rollback.

Change Type | Rollback Difficulty
Stateless code deploy | Usually easy
Config toggle | Easy if tracked
DNS traffic switch | Moderate due to caching/TTL
Database schema additive | Usually manageable
Database destructive change | Hard
Data migration with new writes | Very hard
External side effects | Often impossible

Rule

For every migration step, classify rollback as easy, hard, or impossible before execution.

If rollback is impossible, design roll-forward, compensation, and containment.


18. Modernization Architecture Patterns

18.1 Rehost + Stabilize

Use when speed is critical but you still need governance improvements.

Improvements:

  • replace SSH/RDP with SSM Session Manager;
  • add CloudWatch logs/metrics;
  • use IAM roles instead of static credentials;
  • attach backup policy;
  • tag resources;
  • place in governed account/VPC;
  • document runbook.

18.2 Replatform to Managed Services

Benefits:

  • reduce infrastructure toil;
  • improve backup/patching posture;
  • gain managed monitoring hooks;
  • standardize encryption and access;
  • simplify scaling in some cases.

18.3 Strangler to Services

Benefits:

  • incremental change;
  • lower rewrite risk;
  • capability ownership;
  • targeted modernization.

18.4 Event-Driven Extraction

Benefits:

  • decouple read models;
  • introduce new consumers safely;
  • build observability and audit projections;
  • reduce direct database dependency.

19. Modernization of Monoliths

A monolith is not automatically bad.

A monolith becomes problematic when:

  • many teams must coordinate for small changes;
  • deployments are risky and infrequent;
  • unrelated capabilities share state unpredictably;
  • scaling one function requires scaling everything;
  • ownership is unclear;
  • test cycles are too slow;
  • incident blast radius is too large;
  • data model changes affect unrelated workflows;
  • compliance evidence is hard to isolate.

Decomposition Options

Decomposition Axis | Good For | Risk
Business capability | Aligns teams and domain ownership | Requires domain clarity
Subdomain | Strong DDD fit | Requires modeling discipline
Transaction boundary | Reduces distributed transaction risk | May not align with product teams
User journey | Improves experience ownership | Can duplicate backend logic
Data ownership | Clarifies persistence boundary | Requires migration discipline
Operational profile | Isolate high-load components | Can fragment domain model

Decomposition Smell

A service exists because a class existed.

Microservices should not mirror legacy package structure. They should reflect durable capability boundaries.


20. Regulated Workload Modernization

For regulated systems, modernization must preserve evidence, accountability, and defensibility.

Extra Concerns

Concern | Requirement
Audit trail | Preserve who/what/when/why across old and new paths
Data lineage | Explain where migrated data came from
Legal retention | Do not lose or alter records under retention
Access control | Maintain least privilege and tenant/role boundaries
Evidence | Store cutover approvals, validation reports, logs
Reproducibility | Be able to explain migration logic later
Exception governance | Record deviations and approval rationale
Chain of custody | Protect exported/imported data
Privacy | Mask/tokenize data in non-prod migration tests

Enforcement Lifecycle Example

A regulatory enforcement platform may have:

  • complaint intake;
  • triage;
  • investigation;
  • evidence collection;
  • legal review;
  • enforcement decision;
  • appeal;
  • remediation tracking;
  • reporting;
  • audit trail.

Do not extract these randomly. Extract around workflow and evidence boundaries.

Good first extraction candidates may be:

  • notification;
  • document generation;
  • read-only reporting;
  • intake facade;
  • audit projection;
  • external partner adapter.

Riskier candidates:

  • enforcement decision engine;
  • legal hold model;
  • canonical evidence store;
  • identity/authorization model;
  • appeal state machine.

21. Migration Tooling Landscape

AWS provides many migration-related services and patterns, but tooling must serve architecture.

Need | Common AWS Capability
Portfolio discovery | Migration Hub, Application Discovery Service
Server migration | Application Migration Service
Database migration | Database Migration Service
Mainframe modernization | AWS Mainframe Modernization
Data transfer | DataSync, Transfer Family, Snow Family
VMware migration/relocation | VMware-related AWS migration options, where applicable
Modernization guidance | AWS Prescriptive Guidance
Landing zone | Organizations, Control Tower, IAM Identity Center
IaC deployment | CloudFormation, CDK, Terraform

Tooling Rule

Use tools to reduce mechanical migration effort, not to outsource architectural judgment.

22. Operational Readiness Review for Migration

Before go-live, answer these questions:

Availability

  • What is the expected availability target?
  • Which AZs are used?
  • What dependencies are single-AZ or single-instance?
  • What health checks prove service readiness?

Security

  • Which IAM roles exist and why?
  • Are static credentials eliminated?
  • Are secrets stored in Secrets Manager/Parameter Store?
  • Is encryption configured for data at rest and in transit?
  • Are audit logs enabled?

Observability

  • What dashboards exist?
  • Which alarms page humans?
  • Are business metrics monitored?
  • Are logs structured and searchable?
  • Are traces/correlation IDs available for request paths?

Operations

  • Who owns the workload?
  • What is the escalation path?
  • What are the runbooks?
  • How is production access granted and audited?
  • How are patches handled?

Data

  • Is backup configured?
  • Has restore been tested?
  • Is data migration validated?
  • Is there a reconciliation report?
  • Is retention preserved?

Cost

  • Are tags applied?
  • Are budgets configured?
  • Is there a baseline estimate?
  • Is rightsizing scheduled after stabilization?

23. Migration Failure Modes

Failure Mode | Cause | Prevention
Hidden dependency breaks | Incomplete dependency discovery | Network flow analysis, logs, stakeholder review
Cutover fails | Runbook untested | Rehearsal, rollback test, clear thresholds
Data mismatch | Weak validation | Checksums, business rules, reconciliation
Performance regression | Target sizing wrong | Load test, scaling policy, capacity model
Security regression | Lifted credentials/network assumptions | IAM redesign, secrets migration, least privilege
Observability gap | Logs/metrics not ready | Telemetry acceptance criteria
Cost surprise | Overprovisioning, data transfer, NAT, idle resources | Baseline, tags, budgets, rightsizing
Team confusion | Ownership unclear | RACI, runbooks, escalation
Legacy never decommissioned | No exit criteria | Decommission backlog and deadlines
Shadow system divergence | Parallel old/new without reconciliation | Source-of-truth policy and reconciliation

24. Decision Records

Every significant migration decision should leave evidence.

ADR Template

# ADR: <Decision Title>

## Status
Proposed | Accepted | Superseded

## Context
What is the system, constraint, business driver, and risk?

## Decision
What are we doing?

## Options Considered
- Option A
- Option B
- Option C

## Consequences
Positive and negative outcomes.

## Rollback or Exit Plan
How can this decision be reversed or retired?

## Evidence
Links to assessment, test, approval, cost estimate, runbook.

Migration Decision Examples

  • Why rehost instead of refactor now?
  • Why choose RDS instead of self-managed DB?
  • Why keep mainframe integration for Phase 1?
  • Why choose tenant-by-tenant cutover?
  • Why accept temporary dual-write?
  • Why use S3 object store for evidence archive?
  • Why defer active-active multi-region?

25. Modernization Roadmap Template

Phase 0: Foundation
- Landing zone
- Network
- IAM federation
- Logging/audit
- Cost tagging
- Deployment baseline

Phase 1: Stabilize
- Rehost/replatform low-risk workloads
- Add observability
- Remove static credentials
- Establish backup/restore
- Build runbooks

Phase 2: Decouple
- Introduce facade/routing layer
- Extract low-risk capabilities
- Add event bus/outbox
- Build read projections
- Reduce direct DB coupling

Phase 3: Own Capabilities
- Extract strategic domain services
- Assign team ownership
- Introduce SLOs
- Split data ownership
- Automate release safety

Phase 4: Optimize
- Rightsize cost
- Improve performance
- Decommission legacy paths
- Harden compliance evidence
- Mature platform self-service

26. Example: Modernizing a Case Management Monolith

Starting Point

- Monolithic case management application
- Shared relational database
- Manual deployment
- Batch reporting
- Ad hoc audit table
- SFTP partner feeds
- Direct database access by reporting tools
- On-prem identity integration

Target Direction

Phased Plan

Phase | Change | Risk Reduced
1 | Move app to governed AWS account/VPC | Hosting and governance foundation
2 | Add CloudWatch, CloudTrail, Config, SSM | Operability and auditability
3 | Introduce API facade | Routing seam for extraction
4 | Extract notification service | Low-risk capability decoupling
5 | Add outbox and event stream | Reduce direct database coupling
6 | Build reporting projection | Remove reporting pressure from legacy DB
7 | Extract case intake | Create first strategic domain service
8 | Decommission old intake path | Reduce legacy surface

27. Modernization Metrics

Use metrics that reveal real improvement.

Dimension | Metric
Delivery | Deployment frequency, lead time, rollback time
Reliability | Incident count, MTTR, error budget burn
Operations | Manual steps per release, toil hours, patch compliance
Security | Static credential count, critical findings, access exceptions
Cost | Unit cost per transaction/case/tenant, idle resource spend
Performance | p95/p99 latency, saturation, queue lag
Data | Reconciliation mismatch rate, CDC lag, restore success
Business | Case cycle time, complaint intake completion rate

Bad Metrics

Metric | Problem
Number of servers migrated | Says nothing about business value
Number of microservices created | May reward fragmentation
Percent cloud migrated | May include waste
Lines of code rewritten | Not a value metric
Kubernetes adoption | Platform choice, not modernization outcome

28. Common Anti-Patterns

Anti-Pattern 1: Big-Bang Rewrite

Symptoms:

  • multi-year rewrite;
  • old system frozen in theory but changing in practice;
  • feature parity target keeps moving;
  • no production feedback until late;
  • migration team detached from business users.

Better:

  • strangler extraction;
  • capability-by-capability replacement;
  • parallel run for high-risk functions;
  • production validation early.
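A parallel run can be sketched in a few lines. The version below is a minimal illustration, not a production harness: `legacy_handler` and `candidate_handler` are hypothetical callables, the legacy result is always the one returned to the caller, and disagreements or candidate failures are only recorded for review.

```python
def parallel_run(request, legacy_handler, candidate_handler, mismatches):
    """Serve from legacy, shadow-call the candidate, and record any disagreement."""
    legacy_result = legacy_handler(request)
    try:
        candidate_result = candidate_handler(request)
    except Exception as exc:
        # A failing candidate must never affect the user-facing response.
        mismatches.append(
            {"request": request, "legacy": legacy_result, "candidate_error": repr(exc)}
        )
    else:
        if candidate_result != legacy_result:
            mismatches.append(
                {"request": request, "legacy": legacy_result, "candidate": candidate_result}
            )
    return legacy_result


def legacy_fee(req):
    return req["amount"] * 2


def candidate_fee(req):
    # Deliberately wrong for large amounts, to show a recorded mismatch.
    return req["amount"] * 2 if req["amount"] < 100 else 0


found = []
parallel_run({"amount": 10}, legacy_fee, candidate_fee, found)
parallel_run({"amount": 100}, legacy_fee, candidate_fee, found)
print(len(found))
```

This only works safely when the candidate call has no side effects, or when its side effects are directed at an isolated store. The mismatch log is the production feedback that a big-bang rewrite never gets.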

Anti-Pattern 2: Lift-and-Shift Forever

Symptoms:

  • EC2 estate looks like old data center;
  • static credentials moved unchanged;
  • manual patching persists;
  • no IaC;
  • no cost allocation;
  • no observability improvement.

Better:

  • rehost only as a phase;
  • set stabilization backlog;
  • define modernization triggers.

Anti-Pattern 3: Microservices Without Ownership

Symptoms:

  • many services owned by same overloaded team;
  • shared database persists;
  • no independent deployment;
  • cross-service changes require coordination;
  • poor observability.

Better:

  • align services to teams and capabilities;
  • split data ownership intentionally;
  • enforce service contracts.

Anti-Pattern 4: Data Migration as Afterthought

Symptoms:

  • application plan exists, data plan vague;
  • rollback undefined;
  • validation limited to row counts;
  • downstream consumers ignored.

Better:

  • plan data first;
  • define source of truth;
  • validate business invariants;
  • rehearse cutover.
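To make "validate business invariants" concrete, the sketch below compares source and target by key and by content fingerprint, then checks one example invariant. The column names (`case_id`, `status`, `closed_at`) and the invariant itself are hypothetical. The example data has equal row counts on both sides, which is exactly the case a count-only check would pass.

```python
import hashlib


def row_fingerprint(row, columns):
    """Stable hash over the selected columns of one row."""
    canonical = "|".join(f"{c}={row[c]!r}" for c in columns)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(source_rows, target_rows, key, columns):
    """Compare two row sets by key and by content, not just by count."""
    src = {r[key]: row_fingerprint(r, columns) for r in source_rows}
    tgt = {r[key]: row_fingerprint(r, columns) for r in target_rows}
    return {
        "missing_in_target": sorted(src.keys() - tgt.keys()),
        "unexpected_in_target": sorted(tgt.keys() - src.keys()),
        "content_mismatch": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


def closed_without_timestamp(rows):
    """Example invariant (assumed): every closed case must have a closed_at value."""
    return sorted(r["case_id"] for r in rows if r["status"] == "closed" and r["closed_at"] is None)


source = [
    {"case_id": 1, "status": "open", "closed_at": None},
    {"case_id": 2, "status": "closed", "closed_at": "2024-01-05"},
]
target = [
    {"case_id": 1, "status": "open", "closed_at": None},
    {"case_id": 2, "status": "closed", "closed_at": None},  # timestamp lost in transit
]

report = reconcile(source, target, key="case_id", columns=["status", "closed_at"])
print(len(source) == len(target), report["content_mismatch"], closed_without_timestamp(target))
```

For large tables the same idea is usually applied per partition or per key range instead of per row, but the principle is unchanged: compare content and meaning, then counts.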

Anti-Pattern 5: Temporary Becomes Permanent

Symptoms:

  • temporary VPN never retired;
  • dual-write continues indefinitely;
  • legacy database read access remains;
  • exception roles persist;
  • migration accounts become production accounts.

Better:

  • every temporary state has a named owner and an expiry date;
  • track decommissioning as first-class work;
  • audit exceptions.
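One lightweight way to keep temporary states from becoming permanent is a small register that can be checked automatically. The sketch below assumes a hypothetical `TemporaryState` record; the names, owners, and dates are illustrative only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TemporaryState:
    name: str         # what exists only for the migration
    owner: str        # who is accountable for removing it
    expires_on: date  # when it must be gone or explicitly re-approved


def overdue(states, today):
    """Return temporary states past their expiry date, oldest first."""
    return sorted((s for s in states if s.expires_on < today), key=lambda s: s.expires_on)


register = [
    TemporaryState("site-to-site VPN", "network-team", date(2025, 3, 31)),
    TemporaryState("dual-write for case intake", "intake-team", date(2025, 1, 15)),
    TemporaryState("legacy DB read role", "reporting-team", date(2025, 12, 31)),
]

for state in overdue(register, today=date(2025, 6, 1)):
    print(state.name, state.owner, state.expires_on.isoformat())
```

Running a check like this on a schedule, and failing loudly when the list is non-empty, turns decommissioning into tracked work and stops it depending on memory.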

29. Deliberate Practice

Exercise 1: Classify a Portfolio

Create a table of 15 workloads and assign a 7R strategy to each. Include rationale.

Required columns:

Application | Owner | Criticality | Dependencies | Data Class | Pain | Recommended 7R | Rationale | Risk

Exercise 2: Design a Strangler Plan

Choose one monolith capability and design:

  • seam;
  • routing mechanism;
  • new service boundary;
  • data ownership;
  • event model;
  • rollback plan;
  • decommission criteria.

Exercise 3: Data Migration Plan

For a relational database migration, write:

  • source/target engines;
  • backfill plan;
  • CDC plan;
  • validation plan;
  • cutover threshold;
  • rollback/roll-forward plan.
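A cutover threshold is easier to enforce when it is written down as an explicit go/no-go check. The sketch below is one possible shape, assuming two signals (CDC lag in seconds and the reconciliation mismatch rate) and hypothetical threshold values. Real plans usually add more signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CutoverThresholds:
    max_cdc_lag_seconds: float
    max_mismatch_rate: float


def cutover_decision(cdc_lag_seconds, rows_compared, rows_mismatched, thresholds):
    """Return (go, reasons); go is True only when every check passes."""
    reasons = []
    if rows_compared == 0:
        # No validation evidence is treated as a blocker, not as a pass.
        reasons.append("no rows compared, validation evidence missing")
    else:
        rate = rows_mismatched / rows_compared
        if rate > thresholds.max_mismatch_rate:
            reasons.append(f"mismatch rate {rate:.4%} exceeds threshold")
    if cdc_lag_seconds > thresholds.max_cdc_lag_seconds:
        reasons.append(f"CDC lag {cdc_lag_seconds}s exceeds threshold")
    return (not reasons, reasons)


limits = CutoverThresholds(max_cdc_lag_seconds=5.0, max_mismatch_rate=0.001)
print(cutover_decision(2.0, 10_000, 3, limits))
print(cutover_decision(30.0, 10_000, 50, limits))
```

Agreeing on the thresholds before the cutover window matters more than the code. A decision made under pressure tends to move the threshold instead of delaying the cutover.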

Exercise 4: Cutover Runbook

Write a cutover runbook with:

  • preconditions;
  • exact steps;
  • owners;
  • timing;
  • monitoring;
  • rollback trigger;
  • post-cutover cleanup.
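The runbook elements above can also be rehearsed as code. The sketch below is an illustrative model, not a real orchestration tool: each step is a hypothetical `(name, action, undo)` tuple, and if a step fails or the rollback trigger fires, completed steps are undone in reverse order.

```python
def run_cutover(steps, rollback_trigger):
    """Run steps in order; on failure or trigger, undo completed steps in reverse.

    steps: list of (name, action, undo) tuples of zero-argument callables.
    rollback_trigger: zero-argument callable checked after each successful step.
    Returns (succeeded, log).
    """
    completed = []
    log = []
    for name, action, undo in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"failed: {name} ({exc})")
            break
        completed.append((name, undo))
        log.append(f"done: {name}")
        if rollback_trigger():
            log.append(f"rollback trigger fired after: {name}")
            break
    else:
        # The loop finished without a break, so every step succeeded.
        return True, log
    for name, undo in reversed(completed):
        undo()
        log.append(f"undone: {name}")
    return False, log


state = []
steps = [
    ("freeze legacy writes", lambda: state.append("frozen"), lambda: state.remove("frozen")),
    ("switch traffic", lambda: state.append("switched"), lambda: state.remove("switched")),
]
ok, log = run_cutover(steps, rollback_trigger=lambda: "switched" in state)
print(ok, state)
for line in log:
    print(line)
```

The sketch assumes every step has a safe undo. In practice some steps, such as accepting writes on the new database, have none, and the runbook must mark the point after which the only option is to roll forward.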

30. Self-Correction Checklist

You understand this part when you can answer:

  • What is the difference between migration and modernization?
  • Why is the 7R strategy not just a checklist?
  • Which workloads should not be modernized?
  • What is a migration wave and how should it be sequenced?
  • What must exist in a landing zone before migration?
  • What is a seam in a legacy system?
  • How does strangler fig reduce rewrite risk?
  • What is an anti-corruption layer and why does it matter?
  • Why is dual-write dangerous?
  • How do you validate migrated data beyond row counts?
  • When is rollback impossible?
  • What evidence should a regulated migration preserve?
  • Which metrics prove modernization success?

31. Engineering Judgment Summary

Migration and modernization are not about moving everything fast or rewriting everything perfectly.

The senior engineering posture is:

Move what should be moved.
Retire what should not exist.
Modernize where change creates durable value.
Use strangler patterns to reduce risk.
Treat data migration as a correctness problem.
Treat cutover as an incident waiting to happen.
Preserve evidence for every important decision.

The best modernization programs are boring in execution because the risk was made visible before production changed.

