Final StretchOrdered learning track

Expand-Contract Database Change

Learn Java Data Access Pattern In Action - Part 056

Expand-contract database change untuk Java production: backward-compatible schema evolution, dual write, dual read, backfill, read switch, feature flag, validation, cleanup, rollback, old/new app compatibility, dan zero-downtime rollout.

13 min read2532 words
PrevNext
Lesson 5660 lesson track51–60 Final Stretch
#java#data-access#database#migration+5 more

Part 056 — Expand-Contract Database Change

Expand-contract adalah pattern paling penting untuk zero-downtime database change.

Prinsipnya sederhana:

Jangan mengubah kontrak lama secara langsung.
Tambahkan kontrak baru lebih dulu.
Buat old dan new app bisa hidup bersama.
Pindahkan data dan traffic secara bertahap.
Hapus kontrak lama hanya setelah aman.

Tetapi implementasinya membutuhkan choreography:

  • migration;
  • app compatibility;
  • dual write;
  • dual read;
  • backfill;
  • parity check;
  • feature flag;
  • read switch;
  • constraint enforcement;
  • cleanup;
  • rollback/forward fix.

Part ini membahas expand-contract secara operasional untuk Java production.


1. Core Thesis

Expand-contract memungkinkan perubahan database tanpa mematahkan rolling deployment.

Core phases:

1. Expand
2. Dual write / compatibility write
3. Backfill
4. Dual read / fallback read
5. Validate parity
6. Switch read
7. Enforce constraints
8. Stop old write
9. Contract cleanup

Tidak semua perubahan butuh semua fase, tetapi perubahan besar biasanya butuh mayoritas.


2. Why Direct Change Fails

Bad direct rename:

alter table case_file rename column status to case_status;

Rolling deployment problem:

old app expects status
new app expects case_status

During rollout:

  • old app crashes after schema changed;
  • new app may start before migration;
  • jobs/scripts still use old column;
  • reports break;
  • rollback app impossible because schema changed.

Expand-contract avoids this by making old and new contracts coexist temporarily.


3. Compatibility Is the Goal

At each phase, ask:

Can old app run?
Can new app run?
Can both write safely?
Can both read safely?
Can rollback app still work?
Can data be recovered?

Migration is not complete until compatibility is preserved through deployment.


4. Example Scenario

Change:

case_file.status text

to:

case_file.case_status text not null

Also:

  • new enum code names;
  • new dashboard index;
  • Java code uses CaseStatus;
  • old app still reads/writes status.

Need safe migration.


5. Phase 1 — Expand Schema

Add new column without breaking old app.

alter table case_file
add column case_status text;

Do not drop/rename old column.

Do not add not-null yet if existing rows lack value.

Old app continues using status.

New app can start writing both.


6. Expand Rule

Expand changes should be additive and backward-compatible:

  • add nullable column;
  • add table;
  • add index;
  • add view v2;
  • add function v2;
  • add constraint not enforced yet if supported;
  • add enum/code while old app tolerant;
  • add trigger only if compatible and tested.

Avoid:

  • drop;
  • rename;
  • type narrowing;
  • required column without default/backfill;
  • incompatible view replacement.

7. Phase 2 — Dual Write

New app writes both old and new columns.

public void insertCase(CaseFile caseFile) {
    jdbc.update("""
        insert into case_file (
            id,
            tenant_id,
            status,
            case_status,
            created_at,
            updated_at
        ) values (?, ?, ?, ?, ?, ?)
        """,
        caseFile.id().value(),
        caseFile.tenantId().value(),
        oldStatusCode(caseFile.status()),
        newStatusCode(caseFile.status()),
        now,
        now
    );
}

For update:

update case_file
set status = :oldStatus,
    case_status = :newStatus,
    updated_at = :now,
    version = version + 1
where id = :id
  and version = :expectedVersion;

Both representations stay in sync for new writes.


8. Dual Write Ownership

Dual write must be centralized.

Good:

  • repository/DAO update method writes both;
  • all command paths use same method;
  • tests assert both columns updated.

Bad:

  • controller A writes both;
  • batch job writes only old;
  • admin script writes only new;
  • stored procedure writes old only.

Search and audit all write paths.


9. Hidden Write Paths

Find:

  • application repositories;
  • batch jobs;
  • admin tools;
  • stored procedures;
  • triggers;
  • ETL;
  • data repair scripts;
  • integration tests;
  • manual runbooks;
  • external service sharing DB.

Every writer must be compatible.


10. Phase 3 — Backfill Existing Data

Existing rows have:

status filled
case_status null

Backfill:

update case_file
set case_status =
    case status
        when 'OPEN' then 'UNDER_REVIEW'
        when 'DONE' then 'APPROVED'
        else status
    end
where case_status is null;

For large table, do chunked job.

Backfill must be idempotent:

where case_status is null

or source version check.


11. Chunked Backfill

Pseudo:

while (true) {
    int updated = jdbc.update("""
        update case_file
        set case_status = status
        where case_status is null
          and id > ?
        order by id
        limit ?
        """, lastId, chunkSize);

    if (updated == 0) {
        break;
    }

    saveProgress(lastId);
    sleep(throttle);
}

Actual SQL syntax varies by DB.

Better pattern may select IDs first, then update by IDs.


12. Backfill Progress Table

create table migration_job_progress (
    job_name text primary key,
    last_processed_id uuid,
    rows_processed bigint not null,
    updated_at timestamp not null
);

Use for:

  • resume after crash;
  • monitoring;
  • audit;
  • throttling;
  • kill switch.

For deterministic "update where null" you may not need cursor, but progress helps operations.


13. Phase 4 — Dual Read / Fallback Read

New app can read new column with fallback:

String statusCode = row.caseStatus() != null
        ? row.caseStatus()
        : convertOldStatus(row.status());

This allows new app to run before backfill complete.

Old app still reads status.

Do not switch to "new only" until backfill/parity verified.


14. Dual Read Caveat

Dual read can hide incomplete backfill forever.

Add metrics:

case_status_fallback_read.count
case_status_null.count
status_mismatch.count

Fallback should trend to zero.

Set cleanup deadline.


15. Phase 5 — Parity Check

Check old/new representation match.

select count(*)
from case_file
where case_status is null
   or case_status <> status;

If mapping not one-to-one, use mapping function/table.

For large table, run chunked validation.

Record:

  • mismatch count;
  • sample rows;
  • last checked time;
  • query version.

16. Parity Mismatch Handling

If mismatch found:

  • identify writer missing dual write;
  • stop/switch off feature if needed;
  • repair rows;
  • add tests;
  • monitor again.

Do not proceed to read switch until mismatch understood.


17. Phase 6 — Switch Reads

Feature flag:

if (featureFlags.readCaseStatusFromNewColumn()) {
    return CaseStatus.fromDbCode(row.caseStatus());
}

return CaseStatus.fromDbCode(row.status());

Rollout:

  1. enable for canary;
  2. monitor errors/mismatch;
  3. ramp traffic;
  4. enable globally.

Keep dual write active.


18. Read Switch Rollback

If new read path fails, disable flag and read old column again.

This is why old column still exists and is still updated.

Read switch is reversible by config/feature flag.


19. Phase 7 — Enforce Constraints

After backfill + read switch:

alter table case_file
alter column case_status set not null;

Add check constraint:

alter table case_file
add constraint ck_case_file_case_status
check (case_status in ('UNDER_REVIEW', 'APPROVED', 'REJECTED'));

For large DB, use DB-specific non-blocking validation if possible.

Precondition:

select count(*) from case_file where case_status is null;

must be zero.


20. Constraint Enforcement Timing

Do not enforce too early.

If old app or writer still does not write new column, not-null will break.

Ensure:

  • all app instances dual write;
  • all jobs/scripts updated;
  • backfill complete;
  • old app gone or compatible;
  • tests pass.

21. Phase 8 — Stop Old Writes

After all apps read new column and old versions gone, stop writing old column.

But old column still exists.

App writes only:

case_status

Keep old column for rollback window or compatibility.

Monitor old column no longer changes or remains consistent.


22. Stop Old Write Rollback

If you need to rollback to old app after stopping old writes, old app may read stale old column.

Therefore stop old write only after rollback-to-old-app no longer needed or after old column is maintained by trigger/compat layer.

Be conservative.


23. Phase 9 — Contract Cleanup

Later migration:

alter table case_file
drop column status;

Before drop:

  • old app versions gone;
  • code search no old column;
  • reports/scripts updated;
  • data parity archived if needed;
  • backup exists;
  • feature flags removed;
  • dashboards/metrics zero old usage;
  • migration scheduled.

Contract cleanup can be days/weeks later.


24. Contract Cleanup Is Separate Release

Do not combine expand and contract in one deployment.

Use separate release/window.

This reduces rollback risk.


25. Expand-Contract Timeline

Release A:
  migration add new column
  app dual write + fallback read

Between:
  backfill + parity check

Release B:
  feature flag read new column
  enforce constraint after validation

Release C:
  stop old write

Release D:
  drop old column

Can be compressed for small systems, but phases should be logically separated.


26. Column Rename Pattern

Rename is not rename in zero-downtime.

Use:

add new column
dual write
backfill
dual read
switch read
stop old write
drop old

Direct rename is only safe during full downtime or if no old app/consumer exists.


27. Column Type Change Pattern

Example amount_cents bigint to amount numeric(19,2).

Pattern:

  1. add amount_decimal;
  2. dual write converted value;
  3. backfill deterministically;
  4. read fallback;
  5. parity check with conversion;
  6. switch read;
  7. constraints;
  8. cleanup.

Be careful with precision/rounding.


28. Splitting Column

Example full_name to first_name, last_name.

Backfill may be lossy.

Need:

  • conversion rules;
  • manual review exceptions;
  • unknown markers;
  • new write path;
  • old read fallback maybe derive full_name from new fields;
  • business acceptance.

Not all migrations are purely mechanical.


29. Merging Columns

Example first_name, last_name to display_name.

Dual write can derive display name.

Backfill from old fields.

But old app may update first/last separately.

Need compatibility writer until old app gone.


30. Table Split

Move detail fields to case_detail.

Expand:

create table case_detail (
    case_id uuid primary key,
    summary text,
    risk_score int,
    updated_at timestamp not null
);

Dual write:

  • writes old columns and new table.

Backfill:

  • insert missing detail rows from old table.

Dual read:

  • read detail table with fallback to old columns.

Switch read.

Contract:

  • drop old columns later.

31. Table Merge

Create new table with desired shape.

Dual write both old tables and new merged table.

Backfill new table.

Read new with fallback.

Eventually stop old writes/drop old.

Merges often affect multiple queries and reports, so query catalog helps.


32. Read Model v2 Pattern

For complex dashboard read model:

  1. create case_dashboard_read_model_v2;
  2. projector writes v1 and v2;
  3. backfill/rebuild v2;
  4. compare v1/v2 outputs;
  5. switch readers to v2;
  6. stop v1 writes;
  7. drop v1 later.

This avoids destructive read model migration.


33. API Compatibility

Database expand-contract often requires API compatibility.

If API exposes old field:

{"status": "OPEN"}

new internal field should not force API change immediately.

Map internal new schema to stable API contract.

API changes need their own versioning/deprecation.


34. Event Compatibility

Outbox event payload changes:

  • add optional field first;
  • consumers tolerate missing/unknown fields;
  • version event if incompatible;
  • publish old/new fields during transition;
  • stop old field after consumers updated.

Database migration and event schema migration often coordinate.


35. Feature Flags

Feature flags can control:

  • dual write on/off;
  • read new column;
  • write new only;
  • use new index/query path;
  • use read model v2;
  • enable constraint-dependent behavior.

Flags should be temporary and removed in contract phase.

Track flag lifecycle.


36. Feature Flag Risk

Too many flags create state explosion.

For migration, define allowed combinations.

Example:

dual_write=true must be enabled before read_new=true.
write_old=false only after read_new=true and old app gone.

Guard invalid combinations.


37. Compatibility Triggers

Sometimes database trigger maintains old/new columns.

Example:

when status updated, update case_status
when case_status updated, update status

Pros:

  • protects legacy writers;
  • central DB compatibility.

Cons:

  • hidden logic;
  • performance;
  • recursion risk;
  • harder testing;
  • business mapping in DB;
  • migration cleanup.

Prefer application dual write when writers controlled. Use triggers when legacy/uncontrolled writers exist.


38. Dual Write Consistency

Dual write in same SQL statement/transaction is best.

Bad:

updateOldColumn();
updateNewColumn();

separate transactions can diverge.

Good:

update case_file
set status = :old,
    case_status = :new
where ...

or same DB transaction.


39. Dual Write Failure

If writing old succeeds and new fails in same transaction, transaction rolls back.

If different systems write separately, divergence possible.

Use parity checks and repair.


40. Backfill vs Live Writes Race

While backfill runs, live writes may update rows.

Backfill condition should avoid overwriting newer new column:

update case_file
set case_status = status
where case_status is null

If live dual write already set new column, backfill skips.

If mapping depends on updated_at/version, use more careful condition.


41. Source of Truth During Transition

Define source of truth per phase.

Example:

Phase dual write/backfill:
  old status is source for backfill.
  new writes maintain both.

After read switch:
  new case_status becomes source.

Ambiguity causes repair mistakes.


42. Data Parity Metrics

Metrics:

migration.case_status.null_remaining
migration.case_status.mismatch_count
migration.case_status.fallback_read_count
migration.case_status.backfill_rows_per_second
migration.case_status.backfill_lag

Run parity query periodically until cleanup.


43. Canary Read Switch

Canary:

  • one instance;
  • one tenant;
  • internal users;
  • small traffic percentage.

Monitor:

  • mapping errors;
  • mismatch;
  • latency;
  • query plan/index;
  • user error rates;
  • fallback count.

Then ramp.


44. New Index Rollout

If read switch uses new query:

  1. create index first;
  2. verify plan;
  3. deploy app with query behind flag;
  4. canary;
  5. ramp.

Do not switch query before index ready on large table.


45. Dropping Index

Contract may drop old index.

Before drop:

  • confirm no old query uses it;
  • slow query monitoring;
  • query catalog;
  • code search;
  • DB index usage stats;
  • rollback plan.

Dropping index can hurt old app/report.


46. Rollback Strategy by Phase

After expand

Rollback app safe. New column unused.

During dual write

Rollback app may stop writing new column. Backfill can fill later.

After read switch

Disable flag to old read if old column still maintained.

After stop old write

Rollback to old app unsafe unless compatibility trigger/dual write restored.

After drop old column

Rollback old app impossible.

Each phase should document rollback.


47. Forward Fix Strategy

If issue discovered after contract:

  • add compatibility column back if needed;
  • create new migration;
  • deploy fixed app;
  • restore from backup only if data lost and required.

Forward fix often safer than rollback.


48. Test Matrix

Automated tests should cover:

  • old schema + old app maybe through release testing;
  • expanded schema + old app;
  • expanded schema + new app;
  • partial backfill + new app;
  • dual write updates both;
  • fallback read works;
  • read new flag works;
  • parity mismatch detection;
  • constraint precondition fails if dirty;
  • cleanup migration after code no longer uses old.

49. Integration Test for Dual Write

@Test
void updateWritesOldAndNewStatus() {
    useCase.approve(command);

    Row row = jdbc.queryForObject("""
        select status, case_status
        from case_file
        where id = ?
        """, mapper, caseId);

    assertThat(row.status()).isEqualTo("APPROVED");
    assertThat(row.caseStatus()).isEqualTo("APPROVED");
}

50. Integration Test for Fallback Read

@Test
void readsOldStatusWhenNewColumnNotBackfilled() {
    fixture.caseFile(status("UNDER_REVIEW"), caseStatus(null));

    CaseFile caseFile = repository.load(caseId).orElseThrow();

    assertThat(caseFile.status()).isEqualTo(UNDER_REVIEW);
}

This allows partial backfill.


51. Integration Test for Read Switch

@Test
void readSwitchUsesNewStatus() {
    featureFlags.setReadCaseStatusFromNewColumn(true);

    fixture.caseFile(status("OLD_VALUE"), caseStatus("APPROVED"));

    CaseFile caseFile = repository.load(caseId).orElseThrow();

    assertThat(caseFile.status()).isEqualTo(APPROVED);
}

Use carefully because mismatch fixture is artificial.


52. Precondition Test

Before adding not-null/check, migration test should fail if dirty fixture exists.

This proves safety check works.

In CI, use migration-specific test or pre-deploy verification query.


53. Operational Runbook

For expand-contract rollout:

  1. Apply expand migration.
  2. Deploy dual-write app.
  3. Monitor dual write errors.
  4. Run backfill job.
  5. Monitor backfill progress.
  6. Run parity check.
  7. Create index/constraint if needed.
  8. Enable read switch canary.
  9. Ramp read switch.
  10. Enforce not-null/check.
  11. Stop old writes after rollback window.
  12. Contract cleanup later.

54. Checklist Before Read Switch

  • Backfill complete.
  • Fallback read count near zero.
  • Parity mismatch zero or understood.
  • New query/index plan verified.
  • Old and new code both compatible.
  • Feature flag rollback ready.
  • Metrics/alerts active.
  • Support team aware if user-visible.
  • Error mapping updated.

55. Checklist Before Contract

  • All app instances no longer use old schema.
  • Jobs/scripts updated.
  • Reports updated.
  • Code search clean.
  • Old read/write metrics zero.
  • Old indexes not used.
  • Backup available.
  • Contract migration reviewed.
  • Rollback to old app no longer required.
  • Feature flags removed or scheduled for removal.

56. Anti-Pattern: Expand and Contract in Same Deploy

Breaks rollback/rolling compatibility.


57. Anti-Pattern: Direct Rename

Use add-copy-switch-drop.


58. Anti-Pattern: Backfill Without Parity Check

You don't know if data is safe.


59. Anti-Pattern: Dual Write in Some Paths Only

Hidden writer will corrupt parity.


60. Anti-Pattern: Permanent Fallback Read

Migration never finishes and complexity remains.

Set cleanup deadline.


61. Anti-Pattern: Contract Before Old Jobs Updated

Batch/admin/report scripts break outside main app deploy.


62. Mini Lab

Design expand-contract for:

Move document metadata from case_document table JSON column metadata_json
to normalized columns:
- file_name
- mime_type
- size_bytes
- checksum

Requirements:

  • old app reads/writes metadata_json;
  • new app reads normalized columns;
  • table has 300M rows;
  • document upload still active;
  • reporting query needs index on mime_type;
  • rollback app possible for 2 weeks.

Tasks:

  1. Expand schema.
  2. Dual write strategy.
  3. Backfill job design.
  4. Fallback read mapping.
  5. Parity check.
  6. Index creation.
  7. Read switch.
  8. Constraint enforcement.
  9. Stop old write after rollback window.
  10. Contract cleanup.

63. Summary

Expand-contract is the core pattern for zero-downtime database evolution.

You must master:

  • expand additive schema;
  • dual write;
  • hidden write path discovery;
  • chunked idempotent backfill;
  • dual/fallback read;
  • parity check;
  • read switch with feature flag;
  • constraint enforcement;
  • stop old write;
  • contract cleanup;
  • column rename/type change/split/merge;
  • table split/merge;
  • read model v2;
  • event/API compatibility;
  • compatibility triggers;
  • live write race;
  • source-of-truth definition;
  • metrics;
  • canary rollout;
  • rollback by phase;
  • test matrix;
  • operational runbook.

Part berikutnya membahas Zero-Downtime Data Access Change: deploy order, rolling deployment, old/new app compatibility, migration lock, failure rollback, and how to choreograph schema/query/code changes safely.


64. References

Lesson Recap

You just completed lesson 56 in final stretch. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.