Learn Postgresql In Action Part 033 Schema Migration Zero Downtime
title: Learn PostgreSQL in Action - Part 033 description: Schema migration and zero-downtime change for PostgreSQL production systems, with lock-aware DDL, expand-contract, backfill, validation, and Java release coordination. series: learn-postgresql-in-action seriesTitle: Learn PostgreSQL in Action order: 33 partTitle: Schema Migration and Zero-Downtime Change tags:
- postgresql
- migration
- ddl
- zero-downtime
- java
- flyway
- liquibase
- series date: 2026-07-01
Part 033 — Schema Migration and Zero-Downtime Change
Zero-downtime schema migration is not a bag of clever SQL commands. It is the discipline of changing a live data contract while old application code, new application code, background jobs, replicas, caches, batch processes, and external consumers may all exist at the same time.
A top-tier PostgreSQL engineer thinks of migration as distributed systems change management.
The database schema is not merely structure. It is a runtime contract among:
- application binaries;
- migration tooling;
- connection pools;
- query planner assumptions;
- ORM mappings;
- batch/backfill workers;
- replication/CDC consumers;
- backup and restore procedures;
- operational dashboards;
- humans running incident response.
The goal is not “run DDL successfully”. The goal is:
Change the schema without breaking correctness, availability, observability, rollback ability, or future evolution.
1. Kaufman Skill Decomposition
Following Josh Kaufman’s skill acquisition model, we decompose zero-downtime migration into small sub-skills that can be practiced independently.
| Sub-skill | What to practice | Failure if ignored |
|---|---|---|
| Lock awareness | Predict lock level and lock duration before running DDL | Production writes block unexpectedly |
| Expand-contract thinking | Split incompatible changes into safe phases | Old/new app versions break each other |
| Backfill design | Move large data in controlled batches | WAL explosion, replica lag, vacuum pressure |
| Constraint validation | Add integrity without long blocking scans | Bad historical data or long exclusive lock |
| Dual compatibility | Make app code tolerate old and new schema | Rolling deploy fails halfway |
| Rollback design | Separate code rollback from data rollback | “Rollback” makes corruption worse |
| Migration governance | Review, test, measure, and audit changes | Unsafe DDL reaches production unnoticed |
| Runtime diagnostics | Watch locks, lag, temp files, bloat, error rates | Incident detected too late |
The minimum useful target is:
Given a schema change proposal, you can classify it as safe/unsafe, design an expand-contract rollout, identify required backfills, write lock-safe SQL, define rollback behavior, and specify what metrics to watch during deployment.
2. Mental Model: Schema Migration Is a State Machine
A schema migration is not one event. It is a multi-state transition.
The dangerous mistake is treating this as:
ALTER TABLE ...;
deploy app;
done.
That only works for tiny systems, controlled maintenance windows, or purely additive changes with no behavioral dependency.
In production, schema migration should be modeled as:
state before -> safe transition -> compatibility window -> verification -> cleanup
3. Migration Taxonomy
Not all migrations have the same risk profile.
3.1 Usually safe additive changes
These are usually safe when done carefully:
- create a new nullable column;
- create a new table;
- create a new schema;
- add a non-enforced object used later;
- create an index concurrently;
- add a
CHECK,FOREIGN KEY, orNOT NULLconstraint asNOT VALIDwhen supported; - create a view/function not yet used by old code.
“Usually safe” does not mean “free”. Even additive DDL can briefly take locks, invalidate plans, increase catalog churn, or trigger ORM assumptions.
3.2 Potentially dangerous changes
These require deliberate sequencing:
- rename a column used by application code;
- drop a column;
- change column type;
- add
NOT NULLto a large existing table; - add a foreign key to a high-write table;
- create a non-concurrent index on a large table;
- add a unique constraint over dirty historical data;
- rewrite a large table;
- change enum values used by application code;
- change partitioning structure;
- change primary key semantics;
- change tenant boundary or RLS policy;
- alter a function used by triggers or constraints.
3.3 Operationally dangerous changes
These may appear safe from a schema perspective but are dangerous operationally:
- backfill millions of rows in one transaction;
- create many indexes at once;
- run migration during peak write traffic;
- run heavy validation while replicas are lagging;
- ship a migration that assumes no old application instances exist;
- run DDL without
lock_timeout; - run migration through a connection pool with normal application traffic;
- execute migration in a transaction when the command requires autocommit;
- deploy without observing locks and replication lag.
4. PostgreSQL DDL Lock Reality
Every DDL statement has a locking profile. The critical variables are:
- Which lock is acquired?
- How long is the lock held?
- Does the command scan the table?
- Does it rewrite the table?
- Does it block reads, writes, or both?
- Does it wait for old transactions?
- Can it run inside a transaction block?
The production incident usually comes from confusing lock level with lock duration.
A command may take a strong lock only briefly. That can still be dangerous if it waits behind a long-running transaction; once queued, it can block later application traffic behind it.
4.1 Always use lock timeout for migrations
A migration should generally not wait indefinitely for a lock.
SET lock_timeout = '5s';
SET statement_timeout = '30min';
SET idle_in_transaction_session_timeout = '60s';
The intent is:
- fail fast if the migration cannot acquire a lock safely;
- avoid silent production traffic pileups;
- retry during a safer window;
- make lock waits observable.
Do not set lock_timeout so high that the migration becomes an outage.
5. Expand-Contract Pattern
The expand-contract pattern is the default mental model for zero-downtime schema change.
5.1 Example: split full_name into first_name and last_name
Unsafe version:
ALTER TABLE customer DROP COLUMN full_name;
ALTER TABLE customer ADD COLUMN first_name text NOT NULL;
ALTER TABLE customer ADD COLUMN last_name text NOT NULL;
This breaks every old binary that still reads or writes full_name.
Safe phased version:
-- Phase 1: expand
ALTER TABLE customer ADD COLUMN first_name text;
ALTER TABLE customer ADD COLUMN last_name text;
Deploy application version A:
- writes
full_nameand new columns; - reads from new columns when present;
- falls back to
full_namewhen missing.
Backfill:
UPDATE customer
SET first_name = split_part(full_name, ' ', 1),
last_name = nullif(regexp_replace(full_name, '^\S+\s*', ''), '')
WHERE first_name IS NULL
OR last_name IS NULL;
For real production, do not run this as one huge update. Use chunked backfill.
Then validate:
SELECT count(*)
FROM customer
WHERE first_name IS NULL OR last_name IS NULL;
Then enforce:
ALTER TABLE customer
ADD CONSTRAINT customer_first_name_nn CHECK (first_name IS NOT NULL) NOT VALID;
ALTER TABLE customer
ADD CONSTRAINT customer_last_name_nn CHECK (last_name IS NOT NULL) NOT VALID;
ALTER TABLE customer VALIDATE CONSTRAINT customer_first_name_nn;
ALTER TABLE customer VALIDATE CONSTRAINT customer_last_name_nn;
Then deploy application version B:
- reads new columns only;
- no longer writes
full_name.
Then cleanup:
ALTER TABLE customer DROP COLUMN full_name;
Cleanup is often the riskiest phase because it removes rollback compatibility. Delay it until the new contract has proven stable.
6. Add Column Safely
6.1 Add nullable column
ALTER TABLE enforcement_case
ADD COLUMN risk_score integer;
This is usually an expand-phase operation.
6.2 Add column with default
Modern PostgreSQL avoids rewriting the whole table for many constant defaults, but engineers should still review the exact expression.
Usually acceptable:
ALTER TABLE enforcement_case
ADD COLUMN source_system text DEFAULT 'legacy';
Potentially dangerous if volatile or backfill-heavy semantics are needed:
ALTER TABLE enforcement_case
ADD COLUMN generated_at timestamptz DEFAULT now();
Why? The semantic question is not just performance. It is whether old rows should receive a historical value, migration-time value, or null-then-backfilled value.
6.3 Add required column safely
Do not jump directly to:
ALTER TABLE enforcement_case
ADD COLUMN priority text NOT NULL;
Safer sequence:
ALTER TABLE enforcement_case
ADD COLUMN priority text;
Deploy app that writes priority for new rows.
Backfill old rows.
UPDATE enforcement_case
SET priority = 'normal'
WHERE priority IS NULL
AND id BETWEEN :start_id AND :end_id;
Validate no nulls remain.
SELECT count(*)
FROM enforcement_case
WHERE priority IS NULL;
Then enforce using a lock-aware approach.
In PostgreSQL 18, NOT NULL constraints can be added as not-valid constraints, enabling a staged validation model similar in spirit to other constraints.
ALTER TABLE enforcement_case
ADD CONSTRAINT enforcement_case_priority_nn NOT NULL priority NOT VALID;
ALTER TABLE enforcement_case
VALIDATE CONSTRAINT enforcement_case_priority_nn;
When in doubt, check the exact PostgreSQL version and syntax support in your environment. Managed databases may lag behind upstream versions.
7. Add Constraint Safely
7.1 Add check constraint with NOT VALID
ALTER TABLE enforcement_case
ADD CONSTRAINT enforcement_case_status_valid
CHECK (status IN ('draft', 'open', 'escalated', 'closed'))
NOT VALID;
This means PostgreSQL does not immediately scan all existing rows for validity. New or updated rows must satisfy the constraint.
Later:
ALTER TABLE enforcement_case
VALIDATE CONSTRAINT enforcement_case_status_valid;
This is a powerful pattern:
- prevent new bad data;
- clean historical bad data;
- validate the old data;
- rely on the constraint as part of the application invariant.
7.2 Add foreign key safely
Unsafe on a large busy table:
ALTER TABLE enforcement_action
ADD CONSTRAINT enforcement_action_case_fk
FOREIGN KEY (case_id) REFERENCES enforcement_case(id);
Safer:
ALTER TABLE enforcement_action
ADD CONSTRAINT enforcement_action_case_fk
FOREIGN KEY (case_id) REFERENCES enforcement_case(id)
NOT VALID;
Then validate later:
ALTER TABLE enforcement_action
VALIDATE CONSTRAINT enforcement_action_case_fk;
Before adding a foreign key, ensure the referencing side has a supporting index for common checks and deletes/updates on parent keys.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_enforcement_action_case_id
ON enforcement_action(case_id);
7.3 Unique constraint safely
A unique constraint over existing data must first prove data is clean.
SELECT external_reference, count(*)
FROM enforcement_case
GROUP BY external_reference
HAVING count(*) > 1;
Then create a unique index concurrently:
CREATE UNIQUE INDEX CONCURRENTLY ux_enforcement_case_external_reference
ON enforcement_case(external_reference)
WHERE external_reference IS NOT NULL;
Then attach it as a constraint when appropriate:
ALTER TABLE enforcement_case
ADD CONSTRAINT enforcement_case_external_reference_unique
UNIQUE USING INDEX ux_enforcement_case_external_reference;
The index-first approach separates the expensive build from the metadata-level constraint attachment.
8. Create Index Safely
For large production tables, prefer:
CREATE INDEX CONCURRENTLY idx_case_status_created_at
ON enforcement_case(status, created_at DESC);
CREATE INDEX CONCURRENTLY allows normal writes to continue, but it is not magic:
- it takes longer than a regular index build;
- it performs multiple phases;
- it cannot run inside a normal transaction block;
- if it fails, it can leave an invalid index that must be dropped;
- it still consumes CPU, I/O, WAL, and maintenance resources;
- concurrent builds on the same table should be serialized.
Always inspect:
SELECT
schemaname,
tablename,
indexname,
indexdef
FROM pg_indexes
WHERE tablename = 'enforcement_case';
Check invalid indexes:
SELECT
n.nspname AS schema_name,
c.relname AS index_name,
i.indisvalid,
i.indisready,
pg_get_indexdef(i.indexrelid) AS definition
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE NOT i.indisvalid OR NOT i.indisready;
Drop failed index safely:
DROP INDEX CONCURRENTLY IF EXISTS idx_case_status_created_at;
9. Rename Without Breaking Rolling Deploys
Column rename is not backward compatible.
Unsafe:
ALTER TABLE enforcement_case RENAME COLUMN officer_id TO assigned_officer_id;
Old code still querying officer_id fails immediately.
Safer sequence:
- Add new column.
- Dual write.
- Backfill.
- Switch reads.
- Stop writing old column.
- Drop old column later.
ALTER TABLE enforcement_case
ADD COLUMN assigned_officer_id bigint;
Backfill:
UPDATE enforcement_case
SET assigned_officer_id = officer_id
WHERE assigned_officer_id IS NULL
AND officer_id IS NOT NULL;
Add consistency check during migration window:
SELECT count(*)
FROM enforcement_case
WHERE officer_id IS DISTINCT FROM assigned_officer_id;
Renaming is acceptable only when you can guarantee no old code, SQL job, BI query, view, trigger, or CDC consumer refers to the old name.
In many organizations, that guarantee is weaker than people think.
10. Change Column Type Safely
Some type changes are metadata-only. Others rewrite the table.
Dangerous example:
ALTER TABLE enforcement_case
ALTER COLUMN external_reference TYPE uuid USING external_reference::uuid;
This can fail due to invalid historical data and may rewrite the table.
Safer expand-contract:
ALTER TABLE enforcement_case
ADD COLUMN external_reference_uuid uuid;
Deploy dual write:
- new writes populate both columns;
- parsing errors are rejected at application boundary.
Backfill in batches:
UPDATE enforcement_case
SET external_reference_uuid = external_reference::uuid
WHERE external_reference_uuid IS NULL
AND external_reference ~* '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
AND id BETWEEN :start_id AND :end_id;
Detect dirty data:
SELECT id, external_reference
FROM enforcement_case
WHERE external_reference IS NOT NULL
AND external_reference_uuid IS NULL;
Then switch reads and eventually drop the old column.
11. Backfill Engineering
Backfill is where many “zero downtime” migrations fail.
11.1 Bad backfill
UPDATE enforcement_case
SET normalized_subject_name = lower(subject_name)
WHERE normalized_subject_name IS NULL;
Possible consequences:
- huge transaction;
- huge WAL burst;
- replica lag;
- table and index bloat;
- lock contention;
- autovacuum pressure;
- long rollback if cancelled;
- application latency spike;
- checkpoint pressure.
11.2 Chunked backfill
Use deterministic chunks.
UPDATE enforcement_case
SET normalized_subject_name = lower(subject_name)
WHERE id >= :start_id
AND id < :end_id
AND normalized_subject_name IS NULL;
Loop with bounded batch size:
batch size: 1,000 to 10,000 rows initially
sleep: 50ms to 500ms between batches
commit: every batch
observe: WAL, locks, replica lag, app latency
adapt: reduce batch when lag grows
11.3 Backfill with keyset cursor
WITH batch AS (
SELECT id
FROM enforcement_case
WHERE normalized_subject_name IS NULL
ORDER BY id
LIMIT 1000
)
UPDATE enforcement_case c
SET normalized_subject_name = lower(c.subject_name)
FROM batch
WHERE c.id = batch.id
RETURNING c.id;
This avoids offset scanning.
11.4 Backfill progress table
CREATE TABLE migration_progress (
migration_name text PRIMARY KEY,
last_id bigint NOT NULL DEFAULT 0,
updated_at timestamptz NOT NULL DEFAULT now()
);
Backfill workers should be restartable.
A production backfill must tolerate:
- process restart;
- transaction failure;
- deadlock retry;
- application deploy rollback;
- partial completion;
- duplicate execution.
12. Dual Write and Dual Read
Dual write is a temporary compatibility mechanism, not an architecture destination.
12.1 Dual write pattern
@Transactional
public void assignOfficer(long caseId, long officerId) {
repository.updateOfficerColumns(caseId, officerId, officerId);
}
SQL:
UPDATE enforcement_case
SET officer_id = :officerId,
assigned_officer_id = :officerId,
updated_at = now()
WHERE id = :caseId;
12.2 Dual read pattern
SELECT
id,
COALESCE(assigned_officer_id, officer_id) AS effective_officer_id
FROM enforcement_case
WHERE id = :id;
12.3 Consistency check
SELECT count(*) AS mismatch_count
FROM enforcement_case
WHERE officer_id IS DISTINCT FROM assigned_officer_id;
12.4 When dual write is risky
Dual write is risky when:
- two columns have different semantics;
- transformation is lossy;
- writes happen from multiple services;
- triggers also mutate data;
- CDC consumers interpret both fields;
- rollback may write only one field again.
In that case, prefer a single source of truth plus derived field regeneration.
13. Trigger-Based Compatibility
Sometimes you can use triggers to keep old and new columns consistent.
CREATE OR REPLACE FUNCTION sync_officer_columns()
RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
IF NEW.assigned_officer_id IS NULL AND NEW.officer_id IS NOT NULL THEN
NEW.assigned_officer_id := NEW.officer_id;
END IF;
IF NEW.officer_id IS NULL AND NEW.assigned_officer_id IS NOT NULL THEN
NEW.officer_id := NEW.assigned_officer_id;
END IF;
RETURN NEW;
END;
$$;
CREATE TRIGGER trg_sync_officer_columns
BEFORE INSERT OR UPDATE ON enforcement_case
FOR EACH ROW
EXECUTE FUNCTION sync_officer_columns();
Use this sparingly.
Pros:
- centralizes compatibility in database;
- protects non-Java writers;
- helps during long migration windows.
Cons:
- hidden behavior;
- harder to reason about from application code;
- can add write overhead;
- can surprise CDC consumers;
- can create recursion or ordering issues when trigger graph grows.
A trigger-based migration should have an explicit removal plan.
14. Migration Tooling: Flyway and Liquibase
14.1 Flyway mental model
Flyway is migration-file driven. Common categories:
- versioned migrations: run once in version order;
- repeatable migrations: re-run when checksum changes;
- baseline migrations: establish starting point for existing databases.
Typical structure:
src/main/resources/db/migration/
V001__create_case_table.sql
V002__add_case_priority.sql
V003__add_case_priority_constraint_not_valid.sql
R__views.sql
Rules for serious systems:
- migrations are immutable after applied;
- do not edit old versioned migrations to “fix” production;
- create a new migration instead;
- avoid environment-specific DDL in the same script;
- do not mix massive backfill with application startup migration;
- separate DDL from long-running data migration when needed;
- run migration from a controlled deploy step, not from every app replica at once.
14.2 Liquibase mental model
Liquibase uses changelogs and changesets. It can express changes in XML/YAML/JSON/SQL and supports preconditions and rollback descriptions.
A changeset is not just SQL. It is governance metadata.
Use preconditions for state-aware deployments:
databaseChangeLog:
- changeSet:
id: 033-add-priority-column
author: platform-team
preConditions:
- onFail: MARK_RAN
- not:
columnExists:
tableName: enforcement_case
columnName: priority
changes:
- addColumn:
tableName: enforcement_case
columns:
- column:
name: priority
type: text
For PostgreSQL-heavy systems, SQL migrations are often clearer for lock-aware DDL.
The tool should not hide PostgreSQL behavior from engineers.
15. Application Startup Migrations: Be Careful
Running migrations automatically at application startup is convenient but dangerous in horizontally scaled services.
Possible failure modes:
- multiple pods attempt migration;
- app readiness blocks on migration;
- migration takes locks during traffic spike;
- app starts with new code before migration completes;
- rollback image encounters partially migrated schema;
- long backfill runs inside app lifecycle;
- failed migration causes crash loop.
Safer pattern:
For small systems, startup migration can be acceptable. For regulated or high-availability systems, prefer a controlled migration job.
16. Lock-Safe Deployment Checklist
Before migration:
SELECT pid, usename, application_name, state, wait_event_type, wait_event,
now() - xact_start AS xact_age,
now() - query_start AS query_age,
query
FROM pg_stat_activity
WHERE datname = current_database()
ORDER BY xact_start NULLS LAST;
Find blockers:
SELECT
blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
ON blocking.pid = ANY(pg_blocking_pids(blocked.pid));
Check invalid indexes:
SELECT c.relname AS index_name,
i.indisvalid,
i.indisready
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE NOT i.indisvalid OR NOT i.indisready;
Check replication lag:
SELECT
application_name,
state,
sync_state,
write_lag,
flush_lag,
replay_lag
FROM pg_stat_replication;
During migration watch:
- lock waits;
- blocked app sessions;
- WAL generation;
- replica lag;
- slow query rate;
- error rate by SQLSTATE;
- CPU/I/O saturation;
- connection pool wait time;
- dead tuples and autovacuum activity after backfill.
17. Rollback Strategy
Rollback is not always “run down migration”.
There are three rollback classes.
17.1 Code rollback only
Safe when schema is expanded and backward compatible.
Example:
- new nullable column added;
- old code ignores it;
- rollback app image works.
17.2 Forward fix
Common for database migrations.
If data has been transformed or contract changed, a forward fix may be safer than reverting.
Example:
- bad constraint was too strict;
- relax constraint with new migration;
- preserve data.
17.3 Data rollback
Most dangerous.
Needed when:
- destructive migration ran;
- data was overwritten;
- semantic transformation was wrong;
- external consumers received bad events.
Data rollback must account for:
- writes after migration;
- events already emitted;
- caches/search indexes;
- audit logs;
- external side effects;
- PITR implications.
A real migration plan should state:
Rollback before phase 1: no action.
Rollback after expand: code rollback safe.
Rollback after dual write: code rollback safe if old column maintained.
Rollback after backfill: code rollback safe; data remains extra.
Rollback after contract cleanup: code rollback unsafe; restore/forward fix required.
18. Zero-Downtime Migration Patterns
18.1 Add nullable column
DDL -> deploy writer -> deploy reader -> optional constraint -> cleanup none
18.2 Add required column
add nullable -> deploy writer -> backfill -> validate -> enforce not-null -> cleanup fallback code
18.3 Rename column
add new column -> dual write -> backfill -> switch reads -> stop old writes -> drop old column
18.4 Change type
add new typed column -> dual write conversion -> backfill -> validate conversion -> switch -> drop old
18.5 Add unique invariant
detect duplicates -> clean data -> create unique index concurrently -> attach constraint -> app relies on invariant
18.6 Add foreign key
create referencing index concurrently -> add FK not valid -> clean orphan rows -> validate -> monitor write impact
18.7 Split table
create new table -> dual write or trigger sync -> backfill -> validate counts/checksums -> switch reads -> stop old writes -> cleanup
18.8 Merge tables
create target contract -> dual write target -> backfill from sources -> validate semantic equivalence -> switch readers -> remove source dependency
19. Schema Migration and Java Release Coordination
19.1 App code must support compatibility windows
Bad Java DTO:
record CaseDto(Long id, String priority) {}
If old rows have null priority and code assumes non-null, migration fails at runtime.
Better transition DTO:
record CaseDto(Long id, String priority) {
String effectivePriority() {
return priority == null ? "normal" : priority;
}
}
Later, after backfill and constraint validation, remove fallback.
19.2 Avoid ORM auto-DDL in production
Disable Hibernate schema mutation in production.
spring.jpa.hibernate.ddl-auto=validate
Use explicit migrations.
Hibernate auto-DDL is not a zero-downtime migration tool. It does not encode release compatibility, lock budget, backfill strategy, or rollback semantics.
19.3 Versioned application behavior
During rolling deployment, both old and new app versions may run.
Schema must support all live writers and readers.
20. Practice Lab
Use the lab from Part 002.
20.1 Create baseline table
CREATE TABLE case_assignment (
id bigserial PRIMARY KEY,
case_id bigint NOT NULL,
officer_id bigint NOT NULL,
assigned_at timestamptz NOT NULL DEFAULT now(),
status text NOT NULL DEFAULT 'active'
);
INSERT INTO case_assignment(case_id, officer_id, assigned_at, status)
SELECT
gs,
(random() * 1000)::bigint,
now() - ((random() * 365)::int || ' days')::interval,
CASE WHEN random() < 0.8 THEN 'active' ELSE 'closed' END
FROM generate_series(1, 500000) gs;
20.2 Migration challenge
Goal: introduce assignment_source text NOT NULL DEFAULT 'manual' safely.
Phases:
- add nullable column;
- deploy writer logic;
- backfill in batches;
- add constraint not valid;
- validate;
- remove fallback.
20.3 Observe locks
In one session:
BEGIN;
SELECT * FROM case_assignment WHERE id = 1;
-- Keep transaction open.
In another session:
SET lock_timeout = '3s';
ALTER TABLE case_assignment ADD COLUMN migration_test text;
In a third session:
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE query ILIKE '%case_assignment%';
Understand what blocks, what waits, and what times out.
21. Production Review Template
Before approving any migration, answer:
1. What invariant or product capability requires this schema change?
2. Is the change additive, destructive, or behavioral?
3. Which old and new app versions must coexist?
4. Which SQL objects depend on the changed object?
5. What lock does each DDL statement take?
6. Does any statement scan or rewrite a large table?
7. Is CREATE INDEX CONCURRENTLY needed?
8. Is NOT VALID / VALIDATE CONSTRAINT applicable?
9. Is there a backfill? Is it chunked and restartable?
10. What is the WAL and replica-lag risk?
11. What dashboards/queries will be watched during rollout?
12. What happens if the migration fails halfway?
13. What is the rollback or forward-fix plan by phase?
14. When can cleanup happen safely?
15. Has this been tested on production-like volume?
22. Common Anti-Patterns
22.1 “One PR changes everything”
Schema, app behavior, backfill, and cleanup in one deployment is not zero downtime.
22.2 “Rollback migration generated automatically”
Automatic rollback scripts often reverse syntax, not business semantics.
22.3 “It worked in staging”
Staging usually lacks:
- production volume;
- long transactions;
- replica lag;
- BI/reporting queries;
- real connection pool pressure;
- dirty historical data.
22.4 “Just add the index”
An index build can saturate I/O, generate WAL, and make replicas lag.
22.5 “Drop old column immediately”
Dropping compatibility too soon converts an easy app rollback into a database restore problem.
22.6 “Backfill inside migration transaction”
Long data migration inside a schema migration transaction is a classic outage pattern.
23. Self-Correction Checklist
You understand this part when you can:
- explain why rename/drop/type-change are not rolling-deploy safe;
- design expand-contract phases for a required column;
- add a foreign key safely to a large table;
- create and recover from a failed concurrent index build;
- design a restartable backfill;
- define rollback behavior for each migration phase;
- explain why app startup migrations can be dangerous;
- review a migration PR for lock, WAL, replica, and Java compatibility risks.
24. Key Takeaways
Schema migration is one of the highest-leverage PostgreSQL skills because it combines database internals, application compatibility, operational safety, and human process.
The core rules are:
- Prefer additive changes first.
- Keep old and new app versions compatible.
- Use expand-contract for incompatible changes.
- Never run large backfills as one blind transaction.
- Use
CREATE INDEX CONCURRENTLYfor large online index builds. - Use
NOT VALIDand later validation where appropriate. - Always set lock and statement timeouts.
- Treat rollback as phase-specific, not a generic undo button.
- Delay destructive cleanup until the new contract is proven.
- Test migration behavior on production-like data volume.
References
- PostgreSQL 18 Documentation —
ALTER TABLE: https://www.postgresql.org/docs/current/sql-altertable.html - PostgreSQL 18 Documentation —
CREATE INDEX: https://www.postgresql.org/docs/current/sql-createindex.html - PostgreSQL 18 Documentation — Constraints: https://www.postgresql.org/docs/current/ddl-constraints.html
- Redgate Flyway Documentation — Migrations: https://documentation.red-gate.com/fd/migrations-271585107.html
- Redgate Flyway Documentation — Repeatable Migrations: https://documentation.red-gate.com/fd/repeatable-migrations-273973335.html
- Liquibase Documentation — Preconditions: https://docs.liquibase.com/secure/user-guide-5-1/what-are-preconditions
- Liquibase Documentation — Rollback: https://docs.liquibase.com/secure/user-guide-5-1-1/what-is-a-rollback
You just completed lesson 33 in final stretch. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.
Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.