Deepen PracticeOrdered learning track

Learn Postgresql In Action Part 022 Wal Checkpoint Durability

[]19 min read3693 words

In This Lesson

1. Problem yang Diselesaikan WAL 2. Mental Model: WAL sebagai Journal of Intent and Effects 3. Vocabulary Minimum

Lesson 2235 lesson track20–29 Deepen Practice

title: Learn PostgreSQL in Action - Part 022 description: Write-Ahead Logging, checkpoints, durability, crash recovery, commit semantics, WAL tuning, and production failure modelling. series: learn-postgresql-in-action seriesTitle: Learn PostgreSQL in Action order: 22 partTitle: WAL, Checkpoints, Durability, and Crash Recovery tags:

postgresql
database
wal
checkpoint
durability
recovery
performance
java
series date: 2026-07-01

Part 022 — WAL, Checkpoints, Durability, and Crash Recovery

Pada Part 021 kita membahas vacuum sebagai mekanisme menjaga storage tetap sehat dalam dunia MVCC. Sekarang kita masuk ke mekanisme yang membuat PostgreSQL bisa tetap benar setelah crash: Write-Ahead Log, atau WAL.

WAL adalah salah satu konsep paling penting untuk engineer yang ingin memahami PostgreSQL di level production.

Tanpa mental model WAL, banyak gejala akan terasa acak:

commit kadang lambat;
write throughput turun saat checkpoint;
pg_wal tiba-tiba penuh;
replica lag membesar;
backup/PITR tidak bisa dipercaya;
batch job menghasilkan WAL jauh lebih besar dari ukuran data logical;
synchronous_commit disetel sembarangan tanpa memahami konsekuensi durability;
storage cepat, tetapi latency tetap spike karena fsync/checkpoint behavior.

Top engineer melihat write path PostgreSQL sebagai pipeline:

SQL change -> shared buffers -> WAL record -> WAL flush -> commit acknowledgment -> later data page checkpoint -> crash recovery replay

Fokus part ini: memahami durability mechanics, bukan hanya parameter tuning.

1. Problem yang Diselesaikan WAL

Database harus menjawab pertanyaan ini:

Kalau server mati setelah transaksi commit tetapi sebelum semua data page ditulis ke disk, bagaimana database tahu perubahan mana yang harus dipertahankan?

Jawabannya: WAL.

Prinsip Write-Ahead Logging:

Catatan perubahan harus ditulis ke log yang durable sebelum data page terkait dianggap aman berubah.

Dengan WAL, PostgreSQL tidak perlu menulis semua data page ke disk saat commit. PostgreSQL cukup memastikan WAL record yang diperlukan sudah durable sesuai konfigurasi commit. Data page bisa ditulis nanti oleh background writer atau checkpointer.

Ini memberi trade-off penting:

commit bisa lebih cepat daripada flush semua data page;
crash recovery perlu replay WAL sejak checkpoint terakhir;
checkpoint frequency mempengaruhi recovery time dan I/O pressure;
WAL volume menjadi faktor utama replication, backup, dan storage capacity.

2. Mental Model: WAL sebagai Journal of Intent and Effects

WAL bukan sekadar log text. WAL adalah log binary internal yang mencatat perubahan database pada level yang cukup untuk redo crash recovery.

Ingat urutan utamanya:

WAL before data page

Bukan:

data page before WAL

Jika data page ditulis dulu lalu crash sebelum WAL durable, recovery bisa melihat page berubah tanpa log yang menjelaskan perubahan itu. Itu merusak consistency.

3. Vocabulary Minimum

Istilah	Arti Production
WAL	Write-Ahead Log, log perubahan untuk durability dan recovery.
LSN	Log Sequence Number, posisi byte/logical dalam WAL stream.
`pg_wal`	Direktori WAL segment di data directory.
WAL segment	File WAL, biasanya 16 MB default jika tidak diubah saat `initdb`.
Dirty page	Data page di memory yang berubah tetapi belum ditulis ke data file.
Checkpoint	Titik ketika PostgreSQL memastikan dirty pages sampai titik tertentu ditulis dan recovery bisa mulai dari posisi checkpoint.
Redo	Proses replay WAL saat crash recovery.
`fsync`	Memaksa data benar-benar durable di storage.
`full_page_writes`	Menulis full page image pada perubahan pertama setelah checkpoint untuk melindungi dari torn page.
WAL archiving	Menyalin WAL segment ke archive storage untuk PITR/backup.
Replication slot	Mekanisme mempertahankan WAL agar replica/subscriber bisa mengejar.
`synchronous_commit`	Setting kapan commit dianggap selesai relatif terhadap WAL flush/local/remote.

4. PostgreSQL Write Path

Mari pecah satu transaksi sederhana.

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

High-level path:

Yang sering mengejutkan engineer:

commit sukses tidak berarti semua data page sudah ditulis ke table file;
commit sukses berarti WAL yang dibutuhkan sudah cukup durable sesuai setting;
setelah crash, PostgreSQL membaca checkpoint dan replay WAL;
dirty page yang belum sempat ditulis bisa direkonstruksi dari WAL.

5. Commit Semantics dan `synchronous_commit`

synchronous_commit mengontrol kapan PostgreSQL menganggap commit cukup aman untuk dikonfirmasi ke client.

Nilai yang sering ditemui:

Mode	Mental model	Risiko
`on`	Tunggu WAL flush lokal; default yang aman untuk banyak OLTP.	Latency lebih tinggi daripada async.
`off`	Commit bisa dikonfirmasi sebelum WAL flush lokal.	Beberapa transaksi yang sudah “sukses” ke client bisa hilang saat crash.
`local`	Tunggu flush lokal, tidak menunggu synchronous standby.	Remote durability tidak dijamin saat pakai sync replication.
`remote_write`	Tunggu standby menulis WAL ke OS, belum tentu flush durable.	Standby crash bisa kehilangan.
`remote_apply`	Tunggu standby apply perubahan.	Latency tertinggi, consistency read-after-write di standby lebih kuat.

Prinsip:

synchronous_commit=off adalah latency/durability trade-off, bukan tuning gratis.

Aman untuk:

telemetry yang bisa hilang sedikit;
cache invalidation yang bisa direkonsiliasi;
idempotent event buffer dengan upstream source of truth.

Berbahaya untuk:

pembayaran;
ledger;
enforcement case state transition;
inventory reservation;
audit legal/regulatory;
outbox event yang menjadi source of truth.

Di Java, jangan set session-level durability tanpa boundary jelas:

SET LOCAL synchronous_commit = off;

Hanya masuk akal dalam transaksi yang loss-tolerant.

6. LSN: Coordinate System untuk WAL

LSN adalah posisi dalam WAL stream.

Useful functions:

SELECT pg_current_wal_lsn();

SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0');

Dalam replication/backup, LSN dipakai untuk:

mengetahui posisi WAL sekarang;
menghitung replication lag dalam byte;
menentukan restart point slot;
backup start/end position;
recovery target;
debugging WAL retention.

Contoh replication slot WAL retained:

SELECT
  slot_name,
  slot_type,
  active,
  restart_lsn,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
WHERE restart_lsn IS NOT NULL;

7. WAL Segment dan `pg_wal`

WAL disimpan di direktori pg_wal sebagai segment files.

Hal penting:

default WAL segment biasanya 16 MB;
ukuran segment bisa diubah saat initdb dengan --wal-segsize;
file WAL dapat direcycle;
WAL bisa membesar karena checkpoint, archiving failure, replication slot, high write workload, atau backup/recovery configuration;
jangan hapus file pg_wal manual kecuali mengikuti prosedur recovery resmi. Ini bukan log aplikasi biasa.

Cek WAL directory dari SQL tidak langsung membaca file system, tetapi kita bisa pantau statistik dan slots.

SELECT * FROM pg_stat_wal;

Cek archiver:

SELECT
  archived_count,
  last_archived_wal,
  last_archived_time,
  failed_count,
  last_failed_wal,
  last_failed_time,
  stats_reset
FROM pg_stat_archiver;

8. Checkpoint Mental Model

Checkpoint adalah titik koordinasi antara WAL dan data files.

Saat checkpoint:

PostgreSQL menentukan checkpoint LSN.
Dirty pages sampai titik tertentu ditulis ke data files.
WAL yang diperlukan diflush.
Checkpoint position disimpan.
Crash recovery berikutnya bisa mulai replay dari checkpoint tersebut.

Trade-off checkpoint:

Checkpoint terlalu sering	Checkpoint terlalu jarang
I/O write pressure tinggi.	Crash recovery lebih lama.
More full-page writes after each checkpoint.	WAL bisa membesar.
Latency spike lebih sering.	Lebih banyak WAL harus direplay.
Disk churn tinggi.	Recovery time objective bisa terancam.

Checkpoint bukan hanya durability event. Ia juga performance event.

9. Parameter Checkpoint Penting

Parameter	Arti
`checkpoint_timeout`	Maksimum waktu antar automatic checkpoint. Default umum 5 menit.
`max_wal_size`	Soft limit WAL growth sebelum checkpoint dipicu.
`min_wal_size`	WAL minimum yang dipertahankan/recycled untuk spike.
`checkpoint_completion_target`	Seberapa tersebar checkpoint write dilakukan sepanjang interval.
`checkpoint_warning`	Log warning jika checkpoint karena WAL penuh terjadi terlalu dekat.

Mental model:

checkpoint_timeout controls time pressure
max_wal_size controls WAL volume pressure
checkpoint_completion_target controls smoothing

Jika log berisi:

checkpoints are occurring too frequently

Biasanya max_wal_size terlalu kecil untuk workload, atau batch job menghasilkan WAL besar.

Jangan langsung menaikkan max_wal_size tanpa mengecek:

apakah batch job wajar;
apakah index terlalu banyak;
apakah full-page writes tinggi setelah checkpoint;
apakah archiving/replication slot menahan WAL;
apakah recovery time masih sesuai RTO.

10. Full Page Writes dan Torn Page

Storage menulis page dalam unit tertentu. PostgreSQL page biasanya 8 KB. Jika power loss terjadi saat sebagian page tertulis, page bisa menjadi torn page: setengah lama, setengah baru.

full_page_writes membantu melindungi hal ini.

Mental model:

Konsekuensi:

setelah checkpoint, perubahan pertama pada banyak page bisa menghasilkan WAL besar;
checkpoint terlalu sering bisa meningkatkan full-page-write overhead;
mematikan full_page_writes berbahaya kecuali storage stack benar-benar menjamin atomic page write dan organisasi menerima risiko.

Rule:

Jangan matikan full_page_writes sebagai tuning biasa.

11. WAL Volume: Kenapa Write Kecil Bisa Mahal

Sebuah logical update kecil bisa menghasilkan WAL besar karena:

heap tuple baru;
index entries;
visibility/fsm metadata;
full page image setelah checkpoint;
TOAST changes;
foreign key/index maintenance;
logical decoding metadata;
replication requirements.

Contoh buruk:

UPDATE orders
SET updated_at = now()
WHERE status = 'OPEN';

Jika menyentuh 10 juta row:

membuat 10 juta versi tuple baru;
menghasilkan dead tuple;
memperbarui index jika updated_at atau affected column terindex;
menghasilkan WAL besar;
membuat replica lag;
memicu autovacuum debt;
memperpanjang backup/archive volume.

Di Java, ini sering muncul dari:

bulk update tanpa batching;
migration data correction;
scheduled “touch all active rows”;
ORM update semua dirty entity;
event replay tanpa checkpoint aplikasi;
audit table dengan index berlebihan.

12. Measuring WAL Generation

Ukur WAL sebelum dan sesudah operasi.

SELECT pg_current_wal_lsn() AS before_lsn;

Jalankan workload.

SELECT pg_current_wal_lsn() AS after_lsn;

Hitung:

SELECT pg_size_pretty(pg_wal_lsn_diff('0/1A000000', '0/19000000'));

Lebih praktis dalam satu session:

SELECT pg_current_wal_lsn() \gset

-- run workload here

SELECT
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), :'pg_current_wal_lsn')) AS wal_generated;

Untuk cluster-level stats:

SELECT
  wal_records,
  wal_fpi,
  pg_size_pretty(wal_bytes) AS wal_bytes,
  wal_buffers_full,
  wal_write,
  wal_sync,
  wal_write_time,
  wal_sync_time,
  stats_reset
FROM pg_stat_wal;

Jika track_wal_io_timing aktif, timing WAL write/sync lebih informatif.

13. WAL Buffers

wal_buffers adalah memory untuk WAL data sebelum ditulis.

Biasanya default otomatis cukup untuk banyak workload. Tetapi high-write workload bisa menunjukkan pressure.

Signal:

SELECT wal_buffers_full
FROM pg_stat_wal;

Jika wal_buffers_full terus naik cepat, backend sering harus flush WAL karena buffer penuh.

Namun tuning wal_buffers bukan langkah pertama. Cek dulu:

transaksi terlalu besar;
batch job terlalu agresif;
checkpoint terlalu sering;
storage latency tinggi;
synchronous replication;
index write amplification.

14. Background Writer vs Checkpointer

PostgreSQL punya proses yang membantu menulis dirty pages.

Mental model sederhana:

Process	Peran
Backend process	Bisa menulis WAL dan kadang data page jika perlu.
WAL writer	Membantu flush WAL secara periodik.
Background writer	Menulis dirty shared buffers agar backend tidak terlalu sering menulis sendiri.
Checkpointer	Mengelola checkpoint dan memastikan dirty pages terkait checkpoint tertulis.

Jika background/checkpoint tidak mampu mengimbangi dirty page generation, backend client bisa ikut menulis page dan latency naik.

Observability:

SELECT * FROM pg_stat_bgwriter;

Pada versi PostgreSQL modern, statistik checkpointer juga dapat tersedia melalui view khusus seperti pg_stat_checkpointer.

Cek view yang tersedia:

SELECT relname
FROM pg_class
WHERE relname IN ('pg_stat_bgwriter', 'pg_stat_checkpointer', 'pg_stat_wal');

15. Crash Recovery Step-by-Step

Saat PostgreSQL restart setelah crash:

Yang penting:

PostgreSQL melakukan REDO dari WAL;
recovery time bergantung pada seberapa banyak WAL sejak checkpoint;
checkpoint terlalu jarang bisa memperpanjang restart;
checkpoint terlalu sering bisa memperburuk runtime latency;
RTO harus mempengaruhi checkpoint tuning.

Pertanyaan production:

Jika primary crash sekarang, berapa lama recovery sampai service siap menerima traffic?

Jangan hanya mengukur happy-path latency. Ukur crash recovery di staging dengan data volume realistis.

16. WAL dan Replication

Physical replication mengirim WAL stream dari primary ke standby.

Jika primary menghasilkan WAL lebih cepat daripada standby menerima/apply:

replication lag naik;
read replica makin stale;
failover RPO bisa memburuk;
replication slot bisa menahan WAL di primary;
storage primary bisa penuh.

Pantau:

SELECT
  application_name,
  state,
  sync_state,
  sent_lsn,
  write_lsn,
  flush_lsn,
  replay_lsn,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS replay_lag_bytes
FROM pg_stat_replication;

Jangan hanya monitor lag waktu. Monitor byte lag juga.

17. WAL dan Logical Decoding/CDC

Logical replication dan CDC membaca WAL untuk mengeluarkan perubahan logical.

Trade-off:

CDC memberi event stream yang kuat;
tetapi slot CDC yang macet menahan WAL;
schema change harus dikelola;
long transaction baru terlihat setelah commit, bisa menghasilkan burst besar;
large transaction bisa menekan memory/disk consumer.

Cek logical slot:

SELECT
  slot_name,
  plugin,
  slot_type,
  active,
  restart_lsn,
  confirmed_flush_lsn,
  catalog_xmin,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
WHERE slot_type = 'logical';

Rule:

CDC bukan gratis. Ia memindahkan sebagian coupling dari database query ke WAL retention, schema evolution, dan consumer liveness.

18. WAL Archiving dan PITR

WAL archiving menyimpan completed WAL segment ke archive storage.

Gunanya:

Point-in-Time Recovery;
backup continuous;
restore ke waktu sebelum human error;
warm standby tertentu;
forensic timeline.

Basic config conceptual:

archive_mode = on
archive_command = 'test ! -f /archive/%f && cp %p /archive/%f'

Production archive command harus:

idempotent;
return zero hanya jika archive berhasil;
tidak silently discard WAL;
punya monitoring failure;
punya retention policy;
diuji restore-nya.

Cek:

SELECT
  archived_count,
  failed_count,
  last_archived_wal,
  last_archived_time,
  last_failed_wal,
  last_failed_time
FROM pg_stat_archiver;

Jika archive gagal, WAL bisa menumpuk di primary.

19. Durability vs Latency: Application-Level Classification

Tidak semua write punya kebutuhan durability sama.

Data	Durability requirement	Setting attitude
Ledger/payment	Sangat tinggi	Jangan korbankan durability.
Regulatory audit	Sangat tinggi	Commit harus kuat, backup/PITR wajib.
Case state transition	Tinggi	Harus recoverable dan idempotent.
Outbox event	Tinggi jika source of truth	Jangan async commit sembarangan.
Metrics raw high-volume	Medium/low	Bisa batch/loss-tolerant.
Cache refresh marker	Low	Bisa async jika dapat direkonstruksi.
Search indexing queue	Medium	Bisa replay dari source table.

Java service sebaiknya tidak hanya punya satu repository abstraction yang menyembunyikan semua durability decision. Untuk domain critical, durability adalah bagian dari contract.

20. Unlogged Tables

Unlogged table mengurangi WAL untuk data table, tetapi trade-off besar:

data tidak crash-safe seperti logged table;
tidak direplikasi secara physical/logical dengan cara sama untuk data;
setelah crash, isi unlogged table bisa hilang/truncated;
cocok untuk temporary staging/cache yang bisa dibangun ulang.

Contoh:

CREATE UNLOGGED TABLE import_stage_orders (
  external_id text,
  payload jsonb,
  loaded_at timestamptz default now()
);

Cocok untuk:

staging import yang bisa diulang;
intermediate computation;
cache table rebuilt dari source;
dedupe scratch table.

Tidak cocok untuk:

order utama;
audit;
outbox;
job queue yang tidak bisa hilang;
regulatory evidence.

21. Temporary Tables dan WAL

Temporary table juga punya WAL behavior berbeda dan lifecycle session-local.

Use case:

batch processing intermediate result;
ETL transform dalam session;
complex report staging;
migration helper.

Namun di Java connection pool, temporary table perlu hati-hati karena connection dipakai ulang.

Risiko:

temp table tersisa dalam session pooled connection;
nama bentrok;
transaction boundary tidak membersihkan state;
memory/temp file pressure.

Gunakan:

CREATE TEMP TABLE tmp_ids (
  id bigint PRIMARY KEY
) ON COMMIT DROP;

ON COMMIT DROP membantu membersihkan state setelah transaksi.

22. Checkpoint Tuning Workflow

Jangan tuning checkpoint dari satu parameter.

Workflow:

22.1 If Checkpoints Too Frequent

Symptoms:

log warning checkpoints occurring too frequently;
latency spikes around checkpoint;
high full-page writes;
WAL volume high.

Possible actions:

increase max_wal_size;
increase checkpoint_timeout within RTO constraints;
tune checkpoint_completion_target;
reduce write amplification;
reschedule batch jobs;
improve storage.

22.2 If Crash Recovery Too Slow

Symptoms:

restart takes too long;
too much WAL replay;
RTO violated.

Possible actions:

reduce checkpoint interval;
reduce max_wal_size;
improve storage read throughput;
reduce huge transactions;
test failover rather than relying on crash recovery only.

Trade-off explicit:

More runtime checkpoint I/O  <-> less crash recovery WAL replay
Less runtime checkpoint I/O  <-> more crash recovery WAL replay

23. Large Transaction Failure Mode

Large transaction is dangerous because it concentrates cost.

Example:

BEGIN;
UPDATE account_event
SET processed = true
WHERE created_at < now() - interval '30 days';
COMMIT;

Risks:

massive WAL burst;
long lock duration;
replication lag spike;
CDC burst after commit;
vacuum debt;
rollback very expensive;
snapshot horizon held long;
connection occupied;
application timeout ambiguity.

Better:

WITH batch AS (
  SELECT id
  FROM account_event
  WHERE processed = false
    AND created_at < now() - interval '30 days'
  ORDER BY id
  LIMIT 10000
)
UPDATE account_event e
SET processed = true
FROM batch b
WHERE e.id = b.id;

Commit per batch from Java:

int total = 0;
while (true) {
    int updated = transactionTemplate.execute(status -> repository.markBatchProcessed(10_000));
    if (updated == 0) break;
    total += updated;
}

Benefits:

WAL distributed;
lock duration lower;
replication can catch up;
failure resumes naturally;
timeout easier to reason about.

24. WAL and Index Design

Every index has write cost.

For each insert/update/delete, PostgreSQL may need to log index changes.

Index portfolio affects WAL volume.

Example:

table: orders
indexes: 14
operation: update status on 1M rows

If many indexes include status, updated_at, or expression derived from changed columns:

WAL grows;
index bloat grows;
vacuum cleanup grows;
replication lag grows.

Index design question:

Does this index pay for itself in read latency more than it costs in WAL, vacuum, storage, and write latency?

This is why Part 015 emphasized index budget.

25. WAL and TOAST

Large values can be stored in TOAST tables. Updating large JSONB/text payloads can generate significant WAL.

Anti-pattern:

UPDATE customer_profile
SET profile_json = jsonb_set(profile_json, '{lastViewedAt}', to_jsonb(now()))
WHERE id = ?;

If profile_json large:

entire modified value may create large storage changes;
TOAST churn;
WAL volume high;
GIN index update if indexed;
vacuum debt.

Better design:

move high-churn field to normal column;
avoid GIN index on frequently rewritten large document unless needed;
split stable document from mutable metadata;
append event rather than rewrite large blob.

26. WAL and `COPY` / Bulk Load

Bulk load can produce huge WAL.

Strategies:

load into staging table;
create indexes after load if possible;
use unlogged staging if data can be reloaded;
run ANALYZE after load;
consider partition attach strategy;
avoid one enormous transaction unless atomicity requirement justifies it;
ensure replica/archive can handle WAL rate.

Example:

CREATE UNLOGGED TABLE order_import_stage (
  external_id text,
  payload jsonb
);

Load:

COPY order_import_stage (external_id, payload)
FROM '/data/orders.csv'
WITH (FORMAT csv, HEADER true);

Then validate and merge in controlled batches.

27. Observability Minimum

27.1 WAL Stats

SELECT
  wal_records,
  wal_fpi,
  pg_size_pretty(wal_bytes) AS wal_bytes,
  wal_buffers_full,
  wal_write,
  wal_sync,
  wal_write_time,
  wal_sync_time,
  stats_reset
FROM pg_stat_wal;

27.2 Archiver

SELECT
  archived_count,
  failed_count,
  last_archived_wal,
  last_archived_time,
  last_failed_wal,
  last_failed_time
FROM pg_stat_archiver;

27.3 Replication Lag

SELECT
  application_name,
  state,
  sync_state,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS replay_lag_bytes,
  write_lag,
  flush_lag,
  replay_lag
FROM pg_stat_replication;

27.4 Replication Slots

SELECT
  slot_name,
  slot_type,
  active,
  restart_lsn,
  confirmed_flush_lsn,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
WHERE restart_lsn IS NOT NULL;

27.5 Checkpoint/Background Writer

SELECT * FROM pg_stat_bgwriter;

If available:

SELECT * FROM pg_stat_checkpointer;

28. Incident Playbook: `pg_wal` Disk Full

Scenario:

Filesystem containing pg_wal almost full.
Application writes failing or close to failing.

Step 1 — Do Not Delete WAL Manually

Manual deletion can destroy recoverability and corrupt recovery assumptions.

Step 2 — Check Replication Slots

SELECT
  slot_name,
  slot_type,
  active,
  restart_lsn,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
WHERE restart_lsn IS NOT NULL
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;

Inactive slot with huge retained WAL is common.

Step 3 — Check Archiver Failure

SELECT
  failed_count,
  last_failed_wal,
  last_failed_time,
  last_archived_wal,
  last_archived_time
FROM pg_stat_archiver;

Step 4 — Check Write Spike

SELECT
  pg_size_pretty(wal_bytes) AS wal_bytes,
  wal_records,
  wal_fpi,
  stats_reset
FROM pg_stat_wal;

Step 5 — Emergency Options

Possible actions, depending on root cause:

fix archive destination/permissions/network;
restart/fix replica or CDC consumer;
drop obsolete replication slot only after confirming it is safe;
add disk temporarily;
stop high-WAL batch job;
throttle application writes;
failover only if planned and safe.

Dropping slot:

SELECT pg_drop_replication_slot('obsolete_slot_name');

But only if you accept that consumer cannot resume from that slot.

29. Incident Playbook: Commit Latency Spikes

Scenario:

p99 commit latency spikes every few minutes.

Check:

Are spikes aligned with checkpoints?
Is storage fsync latency high?
Is synchronous replication waiting?
Is WAL generated in bursts?
Are batch jobs running?
Is wal_buffers_full increasing?
Are checkpoints too frequent?

Queries:

SELECT * FROM pg_stat_wal;

SELECT * FROM pg_stat_replication;

SELECT * FROM pg_stat_bgwriter;

Also inspect logs for checkpoint messages.

Possible fixes:

smooth batch jobs;
increase max_wal_size;
adjust checkpoint completion;
improve storage latency;
revisit synchronous replication mode;
reduce index write amplification;
batch small commits when correctness allows.

30. Incident Playbook: Replica Lag After Batch Job

Scenario:

Nightly job finishes on primary.
Replica is 90 minutes behind.
Read traffic sees stale data.

Diagnosis:

SELECT
  application_name,
  state,
  sent_lsn,
  write_lsn,
  flush_lsn,
  replay_lsn,
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS replay_lag_bytes,
  write_lag,
  flush_lag,
  replay_lag
FROM pg_stat_replication;

Interpretation:

Lag point	Meaning
`sent_lsn` far behind current	Primary sender/network bottleneck.
`write_lsn` behind sent	Standby receiving/writing bottleneck.
`flush_lsn` behind write	Standby disk flush bottleneck.
`replay_lsn` behind flush	Standby apply/replay bottleneck.

Fix direction:

reduce WAL burst;
batch smaller;
avoid rewriting huge rows;
reduce indexes affected by batch;
improve replica hardware;
schedule job away from read-SLA window;
use logical/application-level incremental processing.

31. Java Integration: Timeout Hierarchy

Durability path interacts with application timeouts.

Set timeout hierarchy intentionally:

statement_timeout < transaction timeout < request timeout < load balancer timeout

Example:

SET LOCAL statement_timeout = '5s';
SET LOCAL lock_timeout = '500ms';

In Java/Spring:

@Transactional(timeout = 10)
public void performCriticalWrite(...) {
    // keep only DB invariant here
}

But remember: timeout after commit ambiguity is real.

If client times out while server is committing, application might not know if transaction committed. Solve with:

idempotency key;
unique request key;
outbox/inbox;
read-after-timeout reconciliation;
retry only when operation is idempotent.

32. Java Integration: Idempotency and WAL Reality

Network failure around commit boundary:

The transaction may have committed even if client did not receive response.

Correct pattern:

CREATE TABLE payment_request (
  idempotency_key text PRIMARY KEY,
  payment_id bigint NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);

Application retry inserts same idempotency key. Database uniqueness tells whether operation already happened.

This is not only an application pattern. It is a direct consequence of distributed systems plus WAL-backed commit semantics.

33. Hands-On Lab

33.1 Measure WAL for Insert

DROP TABLE IF EXISTS wal_lab;

CREATE TABLE wal_lab (
  id bigserial PRIMARY KEY,
  status text NOT NULL,
  payload text NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);

SELECT pg_current_wal_lsn() AS before_lsn \gset

INSERT INTO wal_lab (status, payload)
SELECT 'NEW', repeat(md5(i::text), 20)
FROM generate_series(1, 100000) AS s(i);

SELECT
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), :'before_lsn')) AS wal_generated;

33.2 Compare Update WAL

SELECT pg_current_wal_lsn() AS before_lsn \gset

UPDATE wal_lab
SET status = 'DONE'
WHERE id <= 50000;

SELECT
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), :'before_lsn')) AS wal_generated;

33.3 Add Index and Repeat

CREATE INDEX wal_lab_status_created_idx
ON wal_lab (status, created_at);

SELECT pg_current_wal_lsn() AS before_lsn \gset

UPDATE wal_lab
SET status = 'ARCHIVED'
WHERE id <= 50000;

SELECT
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), :'before_lsn')) AS wal_generated;

Observe how index changes affect WAL.

33.4 Observe Checkpoint

CHECKPOINT;

Then watch stats:

SELECT * FROM pg_stat_wal;
SELECT * FROM pg_stat_bgwriter;

Do not run aggressive checkpoint loops on shared production systems.

33.5 Test Unlogged Table

CREATE UNLOGGED TABLE wal_lab_unlogged AS
SELECT * FROM wal_lab WITH NO DATA;

SELECT pg_current_wal_lsn() AS before_lsn \gset

INSERT INTO wal_lab_unlogged
SELECT * FROM wal_lab;

SELECT
  pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), :'before_lsn')) AS wal_generated;

Compare with logged table behavior. Interpret carefully; metadata/index operations can still produce WAL.

34. Self-Correction Questions

Why does PostgreSQL write WAL before data pages?
Why can commit succeed before table files contain the changed data page?
What is an LSN?
How does checkpoint frequency affect runtime latency and crash recovery time?
Why can frequent checkpoints increase WAL volume through full-page writes?
What does synchronous_commit=off risk?
Why can a tiny logical update create large WAL?
How do indexes amplify WAL generation?
Why can replication slots fill pg_wal?
What is the difference between write lag, flush lag, and replay lag on a standby?
Why is manual deletion from pg_wal dangerous?
When are unlogged tables acceptable?
Why can a Java timeout after commit produce ambiguous outcome?
How does idempotency key solve commit ambiguity?
Why must backup strategy include WAL archiving or equivalent continuous WAL capture for PITR?

35. Production Readiness Checklist

36. Key Takeaways

WAL is the durability backbone of PostgreSQL.
Commit does not require every changed data page to be written immediately; WAL makes later recovery possible.
Checkpoints trade runtime I/O against crash recovery time.
Frequent checkpoints can increase I/O and full-page-write overhead.
WAL volume is affected by indexes, TOAST, full page writes, batch size, and update shape.
Replication, CDC, PITR, and crash recovery all depend on WAL.
synchronous_commit is a correctness/durability decision, not just a performance knob.
Java services must handle commit ambiguity with idempotency and reconciliation.
pg_wal is not an application log directory; never delete it manually as a “cleanup”.
Top PostgreSQL engineering requires connecting SQL write shape to WAL, replica lag, checkpoint pressure, backup cost, and recovery time.

References

PostgreSQL Documentation — WAL Internals: https://www.postgresql.org/docs/current/wal-internals.html
PostgreSQL Documentation — Write Ahead Log Configuration: https://www.postgresql.org/docs/current/runtime-config-wal.html
PostgreSQL Documentation — Continuous Archiving and PITR: https://www.postgresql.org/docs/current/continuous-archiving.html
PostgreSQL Documentation — Monitoring Statistics: https://www.postgresql.org/docs/current/monitoring-stats.html
PostgreSQL Documentation — Reliability and WAL: https://www.postgresql.org/docs/current/wal.html

Lesson Recap

You just completed lesson 22 in deepen practice. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Back To Series Next Lesson

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.

Previous Lesson

Lesson 21

Learn Postgresql In Action Part 021 Vacuum Autovacuum Freeze

Next Lesson

Lesson 23

Learn Postgresql In Action Part 023 Replication Streaming Hot Standby

Learn Postgresql In Action Part 022 Wal Checkpoint Durability

Part 022 — WAL, Checkpoints, Durability, and Crash Recovery

1. Problem yang Diselesaikan WAL

2. Mental Model: WAL sebagai Journal of Intent and Effects

3. Vocabulary Minimum

4. PostgreSQL Write Path

5. Commit Semantics dan synchronous_commit

6. LSN: Coordinate System untuk WAL

7. WAL Segment dan pg_wal

8. Checkpoint Mental Model

9. Parameter Checkpoint Penting

10. Full Page Writes dan Torn Page

11. WAL Volume: Kenapa Write Kecil Bisa Mahal

12. Measuring WAL Generation

13. WAL Buffers

14. Background Writer vs Checkpointer

15. Crash Recovery Step-by-Step

16. WAL dan Replication

17. WAL dan Logical Decoding/CDC

18. WAL Archiving dan PITR

19. Durability vs Latency: Application-Level Classification

20. Unlogged Tables

21. Temporary Tables dan WAL

22. Checkpoint Tuning Workflow

22.1 If Checkpoints Too Frequent

22.2 If Crash Recovery Too Slow

23. Large Transaction Failure Mode

24. WAL and Index Design

25. WAL and TOAST

26. WAL and COPY / Bulk Load

27. Observability Minimum

27.1 WAL Stats

27.2 Archiver

27.3 Replication Lag

27.4 Replication Slots

27.5 Checkpoint/Background Writer

28. Incident Playbook: pg_wal Disk Full

Step 1 — Do Not Delete WAL Manually

Step 2 — Check Replication Slots

Step 3 — Check Archiver Failure

Step 4 — Check Write Spike

Step 5 — Emergency Options

29. Incident Playbook: Commit Latency Spikes

30. Incident Playbook: Replica Lag After Batch Job

31. Java Integration: Timeout Hierarchy

32. Java Integration: Idempotency and WAL Reality

33. Hands-On Lab

33.1 Measure WAL for Insert

33.2 Compare Update WAL

33.3 Add Index and Repeat

33.4 Observe Checkpoint

33.5 Test Unlogged Table

34. Self-Correction Questions

35. Production Readiness Checklist

36. Key Takeaways

References

5. Commit Semantics dan `synchronous_commit`

7. WAL Segment dan `pg_wal`

26. WAL and `COPY` / Bulk Load

28. Incident Playbook: `pg_wal` Disk Full