Series MapLesson 30 / 34
Final StretchOrdered learning track

Learn Java Database Migrations Part 030 Observability Audit And Compliance

11 min read2183 words
PrevNext
Lesson 3034 lesson track2934 Final Stretch

title: Learn Java Database Migrations, Flyway, Liquibase - Part 030 description: Observability, audit trail, compliance, and regulatory defensibility for Java database migrations using Flyway, Liquibase, CI/CD evidence, production logs, approvals, and release records. series: learn-java-database-migrations seriesTitle: Learn Java Database Migrations, Flyway, Liquibase order: 30 partTitle: Observability, Audit Trail, Compliance, and Regulatory Defensibility slug: observability-audit-and-compliance tags:

  • java
  • database
  • migration
  • flyway
  • liquibase
  • observability
  • audit
  • compliance
  • governance
  • production-engineering date: 2026-06-28

Part 030 — Observability, Audit Trail, Compliance, and Regulatory Defensibility

Goal: setelah bagian ini, kamu bisa membuat database migration yang tidak hanya berhasil secara teknis, tetapi juga bisa dijelaskan, dibuktikan, diaudit, dan dipertanggungjawabkan setelah production change terjadi.

Dalam sistem production, terutama regulated systems, migration harus menjawab pertanyaan audit sederhana tetapi berat:

Siapa yang mengubah database?
Apa yang berubah?
Kapan berubah?
Di environment mana berubah?
Mengapa berubah?
Siapa yang menyetujui?
Artifact mana yang dijalankan?
Apakah artifact itu sama dengan yang direview?
Apakah migration berhasil?
Apa evidence-nya?
Apa dampaknya ke data dan aplikasi?
Bagaimana recovery plan-nya?

Jika jawaban hanya “pipeline hijau” atau “Flyway/Liquibase sukses”, evidence belum cukup.

Observability dan audit bukan dekorasi. Keduanya adalah bagian dari correctness model database migration.


1. Kaufman Deconstruction

Skill observability/audit migration bisa dipecah menjadi:

Sub-skillOutput Konkret
Change identitysetiap migration punya identity, owner, ticket, risk class
Execution tracestart/end/duration/status/operator/environment tercatat
Artifact integritychecksum, commit SHA, build ID, image digest, changelog version
Approval evidencePR, review, DBA approval, change request, risk signoff
Database evidenceschema history/changelog table, row counts, post-checks
Runtime observabilitymetrics, logs, traces, alerts, lock/wait monitoring
Compliance mappingevidence bisa menjawab kontrol internal/regulator
Incident linkagefailure migration terhubung ke incident/change record
Retention policyevidence disimpan sesuai kebutuhan audit

Kaufman-style target:

Dalam 20 jam latihan, kamu harus bisa membuat satu migration evidence bundle yang cukup untuk menjawab audit tanpa membuka Slack history atau bertanya ke engineer yang deploy.


2. Mental Model: Migration as Auditable State Transition

Migration adalah transition dari S_before ke S_after.

Setiap edge harus punya evidence.

TransitionEvidence
Intent → artifactticket, design note, migration file
Artifact → reviewedPR review, SQL preview, risk checklist
Reviewed → approvedapproval record, change request
Approved → executedbuild ID, commit SHA, runner identity, timestamp
Executed → observedFlyway/Liquibase history, logs, metrics
Observed → verifiedpost-check SQL, app health, invariant checks
Verified → audit recordevidence bundle, release notes, retention location

3. Observability vs Audit vs Compliance

Ketiganya sering dicampur, padahal berbeda.

ConcernPrimary QuestionExample
ObservabilityApa yang sedang/baru saja terjadi?migration duration, lock wait, failed changeset
AuditApa yang terjadi dan apa buktinya?PR, checksum, who/when/what
ComplianceApakah proses memenuhi kontrol yang diwajibkan?approval, segregation of duties, retention

Observability membantu operasi real-time.

Audit membantu rekonstruksi kejadian.

Compliance membantu membuktikan proses memenuhi aturan internal/regulatory.


4. Evidence Hierarchy

Tidak semua evidence punya kualitas sama.

Level 0: verbal claim
Level 1: log line
Level 2: CI job result
Level 3: immutable artifact + checksum
Level 4: database history table + release metadata
Level 5: independently retained evidence bundle

Untuk production-grade migration, target minimal adalah Level 4. Untuk regulated environment, targetkan Level 5.


5. Flyway Audit Surface

Flyway menyediakan audit surface utama melalui schema history table. Table ini mencatat migration yang diterapkan, checksum, status, timestamp, dan metadata eksekusi.

Typical fields dalam flyway_schema_history mencakup konsep seperti:

installed_rank
version
description
type
script
checksum
installed_by
installed_on
execution_time
success

Gunakan table ini sebagai ledger teknis, bukan satu-satunya audit record.

5.1 Useful Queries

SELECT installed_rank,
       version,
       description,
       type,
       script,
       checksum,
       installed_by,
       installed_on,
       execution_time,
       success
FROM flyway_schema_history
ORDER BY installed_rank;

Failed migration:

SELECT *
FROM flyway_schema_history
WHERE success = false
ORDER BY installed_rank DESC;

Last migration:

SELECT *
FROM flyway_schema_history
ORDER BY installed_rank DESC
LIMIT 1;

5.2 What Flyway History Does Not Prove Alone

Flyway history table does not prove:

  • PR was approved;
  • SQL was reviewed;
  • change request was authorized;
  • data invariant was verified;
  • application compatibility was tested;
  • production lock impact was acceptable;
  • operator used correct approval path;
  • generated SQL matched reviewed artifact if build was not immutable.

It is necessary evidence, not complete evidence.


6. Liquibase Audit Surface

Liquibase menyediakan DATABASECHANGELOG untuk mencatat changeset yang sudah dijalankan dan DATABASECHANGELOGLOCK untuk mencegah concurrent update pada database yang sama.

Important concepts:

id
author
filename/dateexecuted/orderexecuted/exectype/md5sum/description/comments/tag/liquibase/contexts/labels/deployment_id

6.1 Useful Queries

SELECT id,
       author,
       filename,
       dateexecuted,
       orderexecuted,
       exectype,
       md5sum,
       description,
       comments,
       tag,
       deployment_id
FROM databasechangelog
ORDER BY orderexecuted;

Current lock:

SELECT id,
       locked,
       lockgranted,
       lockedby
FROM databasechangeloglock;

Changes by deployment:

SELECT deployment_id,
       count(*) AS changeset_count,
       min(dateexecuted) AS first_change,
       max(dateexecuted) AS last_change
FROM databasechangelog
GROUP BY deployment_id
ORDER BY max(dateexecuted) DESC;

6.2 What Liquibase History Does Not Prove Alone

DATABASECHANGELOG does not by itself prove:

  • generated SQL was inspected;
  • rollback was reviewed;
  • labels/contexts were correctly selected;
  • manual SQL was not executed separately;
  • data migration completed if done outside Liquibase;
  • application compatibility was verified;
  • release approval existed.

7. Migration Logging Standard

A migration runner should emit structured logs.

Example JSON log:

{
  "event": "database_migration_started",
  "service": "case-management",
  "environment": "production",
  "database": "case_db",
  "schema": "public",
  "tool": "flyway",
  "toolVersion": "...",
  "commitSha": "8b9f...",
  "buildId": "github-actions-123456",
  "releaseVersion": "2026.06.28.1",
  "changeRequestId": "CHG-10452",
  "operator": "migration-runner",
  "riskClass": "high"
}

Completion log:

{
  "event": "database_migration_completed",
  "service": "case-management",
  "environment": "production",
  "status": "success",
  "durationMs": 43821,
  "migrationsExecuted": 3,
  "lastVersion": "20260628.1400",
  "postChecksPassed": true
}

Failure log:

{
  "event": "database_migration_failed",
  "service": "case-management",
  "environment": "production",
  "status": "failed",
  "failedMigration": "V20260628_1400__add_case_priority.sql",
  "errorClass": "lock_timeout",
  "durationMs": 5000,
  "recoveryMode": "roll_forward_or_retry_after_lock_clear",
  "incidentId": "INC-8821"
}

8. Metrics to Capture

Migration metrics should be emitted to the same monitoring stack as application metrics.

MetricTypeMeaning
db_migration_started_totalcountermigration runs started
db_migration_completed_totalcountermigration runs completed
db_migration_failed_totalcountermigration failures
db_migration_duration_secondshistogramexecution duration
db_migration_pending_countgaugepending migrations before run
db_migration_executed_countgauge/counterexecuted migrations in run
db_migration_lock_wait_secondshistogramlock acquisition/wait
db_migration_backfill_rows_totalcounterrows migrated by backfill
db_migration_backfill_lag_rowsgaugeremaining rows
db_migration_postcheck_failed_totalcounterfailed verification checks

Recommended labels:

service
environment
database
schema
tenant
tool
release_version
migration_version
risk_class

Be careful with high-cardinality labels. Tenant label may be acceptable for small tenant fleets but dangerous for massive SaaS fleets. In large fleets, aggregate by shard or tenant tier and emit per-tenant detail to logs/evidence store.


9. Tracing Migration Execution

For Java migration runner, create trace span around migration execution.

span: db.migration.run
attributes:
  db.system=postgresql
  db.name=case_db
  migration.tool=flyway
  migration.release_version=2026.06.28.1
  migration.change_request_id=CHG-10452
  migration.risk_class=high

Nested spans:

db.migration.validate
db.migration.migrate
db.migration.post_check
db.migration.backfill_batch
db.migration.evidence_upload

Do not put secrets or full SQL with sensitive literals into traces.


10. Post-Migration Verification

Migration should end with explicit post-checks.

Examples:

-- No failed Flyway migration
SELECT count(*) = 0 AS ok
FROM flyway_schema_history
WHERE success = false;
-- No null after backfill
SELECT count(*) AS null_priority_count
FROM case_file
WHERE priority IS NULL;
-- FK candidates are valid before validation
SELECT count(*) AS orphan_count
FROM case_file cf
LEFT JOIN customer c ON c.id = cf.customer_id
WHERE cf.customer_id IS NOT NULL
  AND c.id IS NULL;
-- Feature readiness marker
SELECT value
FROM system_setting
WHERE key = 'case_priority_backfill_completed';

Post-check rule:

A migration is not complete when the tool returns success. It is complete when expected state is verified.


11. Evidence Bundle

For each production migration, generate an evidence bundle.

Recommended structure:

migration-evidence/
  metadata.json
  change-request.md
  reviewed-files.txt
  git-diff.patch
  artifact-digest.txt
  flyway-info-before.json
  flyway-info-after.json
  liquibase-status-before.txt
  liquibase-history-after.csv
  generated-sql.sql
  rollback-or-rollforward-plan.md
  pre-check-results.json
  post-check-results.json
  migration-run.log.jsonl
  metrics-summary.json
  approval-records.json
  incident-link.txt

Example metadata.json:

{
  "service": "case-management",
  "environment": "production",
  "database": "case_db",
  "schema": "public",
  "releaseVersion": "2026.06.28.1",
  "commitSha": "8b9f...",
  "artifactDigest": "sha256:...",
  "tool": "flyway",
  "toolVersion": "...",
  "changeRequestId": "CHG-10452",
  "riskClass": "high",
  "approvedBy": ["db-reviewer", "service-owner"],
  "executedBy": "migration-runner",
  "startedAt": "2026-06-28T10:10:00Z",
  "completedAt": "2026-06-28T10:10:44Z",
  "status": "success"
}

Evidence bundle should be immutable after closure. Store it in artifact storage, release system, or audit evidence repository with retention policy.


12. Release Notes for Database Change

Each migration should have release-note quality documentation.

Template:

# Database Change: Add case priority

## Intent
Add `case_file.priority` to support priority-based triage.

## Change Type
Expand migration: additive nullable column + backfill worker.

## Compatibility
Old app can ignore the new column.
New app can read/write it after feature flag enabled.

## Risk
Medium: backfill touches 18M rows in batches.

## Pre-checks
- column does not exist
- severity values are within known mapping
- no long-running transaction on target table

## Execution
- apply DDL
- deploy app with dual-write
- run backfill
- verify no null priority

## Roll-forward
- fix mapping defects with new migration
- rerun backfill for failed rows

## Rollback
- disable feature flag
- app ignores column
- column drop deferred to contract release

## Verification
- null priority count = 0
- API smoke test passed
- dashboard healthy

13. Compliance Control Mapping

Migration process often supports controls like:

Control IntentMigration Evidence
Change authorizationchange request / ticket / approval
Peer reviewPR review record
Segregation of dutiesreviewer != author; runner identity controlled
Traceabilityticket ID in migration description/comment
Integritychecksum, artifact digest, immutable build
Environment promotionsame artifact across staging and prod
Least privilegemigration role documented
Operational monitoringlogs/metrics/traces
Post-implementation reviewpost-check results
Incident managementincident link for failed migration
Retentionevidence bundle stored with retention policy

A regulated migration process should not rely on tribal memory.


14. Segregation of Duties

For higher control environments, separate roles:

RoleResponsibilityShould Not Do
Authorwrites migrationself-approve high-risk DB change
Reviewerreviews logic/riskexecute unreviewed change
DBA/platformreviews lock/ops/securitysilently patch prod without record
Release managerapproves change windowmodify migration artifact
Migration runnerexecutes approved artifactuse ad hoc SQL outside artifact
Auditorinspects evidencerequire engineer memory as evidence

Automation helps enforce this:

PR author cannot be sole approver
production migration uses service account
manual SQL requires emergency change record
artifact digest must match approved build

15. Artifact Integrity

You need to prove that what ran is what was reviewed.

Capture:

git commit SHA
repository URL
branch/tag
build ID
container image digest
migration artifact checksum
Flyway/Liquibase tool version
generated SQL checksum
approval timestamp
execution timestamp

Bad evidence:

We ran the script from someone's laptop.

Better evidence:

Artifact sha256:abc was built from commit 8b9f, reviewed in PR #492,
approved in CHG-10452, executed by migration-runner in job #8931,
and target DB history table shows checksum X applied successfully at timestamp Y.

16. Manual Change Detection

Manual database changes are a major audit and reliability risk.

Controls:

  • production DDL only via migration role;
  • revoke broad DDL from app user;
  • database audit logging for DDL;
  • scheduled drift detection;
  • Flyway/Liquibase validation in pipeline;
  • compare schema snapshots;
  • alert on DDL outside deployment window;
  • require emergency migration artifact after hotfix.

Example PostgreSQL event trigger idea:

CREATE TABLE ddl_audit_log (
    id bigserial PRIMARY KEY,
    occurred_at timestamptz NOT NULL DEFAULT now(),
    username text NOT NULL,
    command_tag text NOT NULL,
    object_identity text
);

Implementation varies by database and organization, but the invariant is universal:

Production schema change must be attributable to an approved change path.


17. Observability for Backfill and Long-Running Migration

Backfill should have its own operational telemetry.

Progress table:

CREATE TABLE migration_job_progress (
    job_name varchar(128) PRIMARY KEY,
    status varchar(32) NOT NULL,
    last_processed_id varchar(128),
    rows_processed bigint NOT NULL DEFAULT 0,
    rows_failed bigint NOT NULL DEFAULT 0,
    started_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now(),
    completed_at timestamptz
);

Metrics:

backfill_rows_processed_total
backfill_rows_failed_total
backfill_remaining_rows
backfill_batch_duration_seconds
backfill_throttle_sleep_seconds
backfill_checkpoint_age_seconds

Logs per batch:

{
  "event": "migration_backfill_batch_completed",
  "job": "case_priority_backfill",
  "batchSize": 1000,
  "rowsProcessed": 1000,
  "lastProcessedId": "case-123456",
  "durationMs": 812,
  "remainingEstimate": 420000
}

Backfill completion evidence:

SELECT status, rows_processed, rows_failed, completed_at
FROM migration_job_progress
WHERE job_name = 'case_priority_backfill';

18. Alerts and Stop Conditions

Migration should have predefined stop conditions.

Examples:

ConditionAction
lock wait > thresholdabort migration
replication lag > thresholdpause backfill
error rate increasespause release
failed post-checkkeep feature flag off
row failure count > thresholdstop backfill
migration duration exceeds windowescalate to incident/change manager
app health degradedhalt next deployment phase

Do not invent stop conditions during incident. Define them before deployment.


19. Audit-Ready Flyway Runner Sketch

public final class AuditedFlywayRunner {

    private final Flyway flyway;
    private final MigrationEvidenceWriter evidence;
    private final MigrationMetrics metrics;

    public void run(MigrationMetadata metadata) {
        evidence.writeMetadata(metadata);
        metrics.started(metadata);

        long start = System.nanoTime();
        try {
            evidence.writePreInfo(flyway.info());
            flyway.validate();
            MigrateResult result = flyway.migrate();
            evidence.writeMigrateResult(result);
            runPostChecks();
            evidence.writePostInfo(flyway.info());
            metrics.completed(metadata, elapsedSeconds(start));
        } catch (Exception ex) {
            evidence.writeFailure(ex);
            metrics.failed(metadata, ex);
            throw ex;
        } finally {
            evidence.close();
        }
    }
}

Key design choices:

  • evidence written before and after execution;
  • validation happens before migration;
  • post-checks are first-class;
  • failure is recorded before rethrow;
  • metadata includes release identity.

20. Audit-Ready Liquibase Runner Sketch

public final class AuditedLiquibaseRunner {

    public void run(Connection connection, MigrationMetadata metadata) throws Exception {
        evidence.writeMetadata(metadata);
        evidence.writeGeneratedSql(generateUpdateSql(connection));

        Database database = DatabaseFactory.getInstance()
                .findCorrectDatabaseImplementation(new JdbcConnection(connection));

        Liquibase liquibase = new Liquibase(
                metadata.changelogFile(),
                new ClassLoaderResourceAccessor(),
                database);

        try {
            liquibase.validate();
            evidence.writeDatabaseChangeLogBefore(connection);
            liquibase.update(new Contexts(metadata.contexts()), new LabelExpression(metadata.labels()));
            runPostChecks(connection);
            evidence.writeDatabaseChangeLogAfter(connection);
        } catch (Exception ex) {
            evidence.writeFailure(ex);
            throw ex;
        }
    }
}

Careful: generated SQL should be tied to the same changelog artifact, context, labels, DB engine, and tool version as production execution.


21. Risk Classification

Not every migration needs the same ceremony.

Risk ClassExampleRequired Evidence
Lowadd nullable column to small tablePR, CI, history table, post-check
Mediumadd index, backfill small datagenerated SQL, timing estimate, post-check
Highlarge table DDL, FK validation, data rewriteDBA review, rehearsal, metrics, rollback/roll-forward plan
Criticaldestructive change, tenant-wide migration, regulated data rewritechange board, full evidence bundle, emergency plan

Risk should be explicit in migration metadata.

migration:
  id: DB-20260628-001
  riskClass: high
  destructive: false
  touchesSensitiveData: true
  requiresBackfill: true
  requiresDowntime: false
  rollbackClass: roll-forward-only

22. Naming and Traceability Convention

Migration names should help audit.

Flyway:

V20260628_1400__CHG_10452_add_case_priority_column.sql
V20260628_1430__CHG_10452_backfill_case_priority.sql
R__case_priority_reporting_view.sql

Liquibase:

- changeSet:
    id: CHG-10452-20260628-1400-add-case-priority
    author: platform-db
    comments: "Add case priority column for triage release 2026.06.28.1"

Rules:

MUST include change/ticket reference for production-impacting migration.
SHOULD include concise intent in filename/description.
MUST NOT include secrets, customer names, or sensitive data in filenames.

23. Retention and Access

Evidence is useful only if retained and discoverable.

Recommended retention design:

ArtifactRetention
migration sourcerepository lifetime
CI logsat least release/audit window
generated SQLrelease/audit window
evidence bundlecompliance retention period
database history tabledatabase lifetime / archived on decommission
production DDL audit logsecurity/audit retention
change approvalscompliance retention period

Access rule:

Auditors and incident responders should be able to retrieve evidence by:
- service name
- database name
- release version
- change request ID
- migration version
- execution date

24. Failure and Incident Linkage

A failed migration should create or attach to an incident/change event.

Failure record should include:

migration identity
failure timestamp
target environment
database/schema/tenant
error message
partial state
history table status
objects created/changed before failure
operator action
repair/roll-forward plan
customer impact
incident ID
final closure evidence

Avoid vague closure:

Fixed manually.

Use explicit closure:

Failure caused by duplicate email values blocking unique index creation.
Migration halted before index creation.
Duplicates identified by query X.
Remediation migration V20260628_1530 merged and applied.
Post-check duplicate count = 0.
Unique index created successfully.
Incident INC-8821 closed with evidence bundle link.

25. Common Anti-Patterns

Anti-pattern 1: History Table as the Only Audit Record

History table tells what tool applied. It does not prove authorization, review, compatibility, or post-checks.

Anti-pattern 2: No Artifact Integrity

If production runner pulls mutable branch HEAD, you cannot prove what was reviewed is what ran.

Anti-pattern 3: Manual Hotfix Without Follow-Up Migration

Emergency SQL may be necessary, but it must be reconciled into source-controlled migration history.

Anti-pattern 4: Logs Without Structured Metadata

Unstructured logs are hard to search by release, service, tenant, or change request.

Anti-pattern 5: Post-Checks Are Manual and Unrecorded

If someone ran SQL manually and pasted “looks good” in chat, evidence is weak.

Anti-pattern 6: Same User for App and Migration

App user should not normally own broad DDL rights. Migration runner should have controlled, auditable privileges.

Anti-pattern 7: Audit Trail Stored Only in Ephemeral CI Logs

CI logs expire. Evidence must be retained according to audit needs.


26. Internal Standard

A mature engineering organization should define:

MUST:
- Every production migration has owner, ticket, risk class, and artifact identity.
- Every production migration is traceable to reviewed source control changes.
- Migration runner emits structured logs.
- Tool history/changelog table is captured before/after for high-risk changes.
- Post-checks are executed and stored.
- Failures link to incident/change records.
- Manual hotfixes are reconciled into migration source.

SHOULD:
- Generate evidence bundle automatically.
- Emit migration metrics to observability platform.
- Store generated SQL for review.
- Use immutable build artifacts and image digests.
- Enforce reviewer/author separation for high-risk migrations.
- Run scheduled drift detection.

MAY:
- Require change advisory board for critical database migrations.
- Attach database audit log excerpts.
- Capture before/after schema snapshots.

27. Practical Exercise

For a migration that adds a new regulated case status and backfills historical case records:

Create an evidence bundle containing:

  1. migration intent;
  2. ticket/change request ID;
  3. risk classification;
  4. reviewed migration files;
  5. generated SQL;
  6. approval record;
  7. pre-check results;
  8. execution logs;
  9. Flyway/Liquibase history after execution;
  10. data invariant post-checks;
  11. rollback/roll-forward plan;
  12. dashboard/metric summary;
  13. final audit summary.

Then answer:

Could an auditor reconstruct what happened without asking the deployer?
Could an incident responder know exactly what state the database is in?
Could a future engineer understand whether a manual hotfix occurred?

If any answer is no, the evidence model is incomplete.


28. References

  • Flyway documentation — schema history table as audit trail and checksum/status tracking.
  • Flyway documentation — validate, migrate, and schema history behavior.
  • Liquibase documentation — DATABASECHANGELOG, DATABASECHANGELOGLOCK, update, validate, and update-sql.
  • Spring Boot documentation — database initialization and migration tool integration.
  • Testcontainers Java documentation — database containers for production-like integration testing.

29. Key Takeaways

  1. Database migration is an auditable state transition.
  2. Tool history tables are necessary evidence but not complete audit evidence.
  3. Observability answers what is happening; audit answers what happened; compliance proves the required process was followed.
  4. Evidence should connect intent, review, artifact, execution, verification, and recovery.
  5. Structured logs, metrics, post-checks, and immutable artifacts turn migration from tribal operation into defensible engineering process.
  6. Manual hotfixes must be reconciled into source-controlled migration history.

Next: Part 031 will cover security, privileges, secrets boundary, and dangerous command guardrails.

Lesson Recap

You just completed lesson 30 in final stretch. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.