Start HereOrdered learning track

Production-Like PostgreSQL Lab Setup

Learn PostgreSQL in Action - Part 002

Build a repeatable PostgreSQL 18 lab with Docker, psql, pgbench, pg_stat_statements, logging, metrics queries, and Java/Testcontainers readiness.

9 min read1798 words
PrevNext
Lesson 0235 lesson track0106 Start Here
#postgresql#database#docker#observability+3 more

Part 002 — Production-Like PostgreSQL Lab Setup

1. Tujuan Bagian Ini

Bagian ini membangun lab PostgreSQL yang repeatable. Tanpa lab yang bisa di-reset, semua materi berikutnya akan menjadi teori. Kita butuh environment yang bisa menghasilkan:

  • query cepat dan query lambat;
  • plan baik dan plan buruk;
  • lock wait dan deadlock;
  • table/index bloat;
  • autovacuum behavior;
  • WAL growth;
  • pool starvation;
  • migration lock;
  • Java integration behavior.

Target akhir part ini:

  1. PostgreSQL 18 berjalan lokal dengan konfigurasi observability dasar.
  2. psql bisa dipakai untuk inspeksi.
  3. pg_stat_statements aktif.
  4. Slow query logging aktif.
  5. pgbench tersedia untuk workload sederhana.
  6. Schema latihan siap untuk seri berikutnya.
  7. Ada folder structure yang bisa dipakai untuk eksperimen lanjutan.

2. Prinsip Lab

Lab ini mengikuti empat prinsip.

2.1 Repeatable

Semua harus bisa dihancurkan dan dibuat ulang.

make down
make up
make seed
make benchmark

Kalau hasil eksperimen tidak bisa diulang, kita tidak tahu apakah improvement datang dari perubahan kita atau kebetulan cache, data distribution, atau state lain.

2.2 Observable

Lab harus memperlihatkan gejala internal PostgreSQL:

  • active session;
  • wait event;
  • query statistics;
  • lock graph;
  • slow query log;
  • buffer usage;
  • temp file;
  • WAL growth;
  • table statistics;
  • index usage.

2.3 Small Enough to Run, Large Enough to Fail

Dataset kecil membuat semua query terlihat cepat. Dataset terlalu besar membuat eksperimen lambat. Kita mulai dengan dataset menengah yang bisa diatur.

Target awal:

  • puluhan ribu sampai beberapa juta row;
  • cukup untuk melihat plan difference;
  • cukup untuk melihat index effect;
  • cukup untuk membuat lock dan bloat;
  • masih nyaman dijalankan di laptop.

2.4 One Variable at a Time

Setiap eksperimen hanya mengubah satu hal:

  • index baru;
  • query rewrite;
  • statistics update;
  • pool size;
  • isolation level;
  • batch size;
  • partition strategy;
  • configuration parameter.

Jika terlalu banyak hal berubah, kita tidak sedang belajar. Kita sedang menebak.

3. Folder Structure

Buat struktur proyek:

learn-postgresql-in-action-lab/
├── Makefile
├── docker-compose.yml
├── postgres/
│   ├── init/
│   │   ├── 001_extensions.sql
│   │   ├── 002_schema.sql
│   │   ├── 003_seed_small.sql
│   │   └── 004_observability_views.sql
│   └── conf/
│       └── postgresql.conf
├── scripts/
│   ├── psql.sh
│   ├── reset.sh
│   ├── seed-large.sql
│   ├── locks-demo.sql
│   ├── deadlock-session-a.sql
│   ├── deadlock-session-b.sql
│   └── explain-template.sql
├── pgbench/
│   ├── case-search.sql
│   ├── case-update.sql
│   └── mixed-workload.sql
└── java-lab/
    └── README.md

Kenapa struktur ini penting?

  • postgres/init untuk state awal database.
  • postgres/conf untuk konfigurasi engine.
  • scripts untuk eksperimen manual.
  • pgbench untuk workload repeatable.
  • java-lab untuk integrasi Java pada part berikutnya.

4. Docker Compose PostgreSQL 18

Buat file docker-compose.yml:

services:
  postgres:
    image: postgres:18
    container_name: learn-postgresql-in-action
    ports:
      - "5432:5432"
    environment:
      POSTGRES_DB: learnpg
      POSTGRES_USER: learnpg
      POSTGRES_PASSWORD: learnpg
    command:
      - "postgres"
      - "-c"
      - "config_file=/etc/postgresql/postgresql.conf"
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./postgres/init:/docker-entrypoint-initdb.d
      - ./postgres/conf/postgresql.conf:/etc/postgresql/postgresql.conf:ro
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U learnpg -d learnpg"]
      interval: 5s
      timeout: 5s
      retries: 10

volumes:
  pgdata:

Catatan:

  • Image postgres:18 membuat lab mengikuti baseline seri.
  • Volume pgdata membuat state bertahan antar restart.
  • Script di /docker-entrypoint-initdb.d hanya berjalan saat database volume pertama kali dibuat.
  • Untuk reset total, hapus volume.

5. Konfigurasi PostgreSQL untuk Lab

Buat file postgres/conf/postgresql.conf:

# -------------------------------------------------------------------
# Basic connection
# -------------------------------------------------------------------
listen_addresses = '*'
port = 5432
max_connections = 100

# -------------------------------------------------------------------
# Memory baseline for local lab
# Do not copy blindly to production.
# -------------------------------------------------------------------
shared_buffers = '512MB'
effective_cache_size = '2GB'
work_mem = '16MB'
maintenance_work_mem = '256MB'

# -------------------------------------------------------------------
# WAL and checkpoint visibility
# -------------------------------------------------------------------
wal_level = replica
max_wal_size = '2GB'
min_wal_size = '256MB'
checkpoint_timeout = '10min'
checkpoint_completion_target = 0.9

# -------------------------------------------------------------------
# Logging for learning
# -------------------------------------------------------------------
logging_collector = on
log_destination = 'stderr'
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_min_duration_statement = 250
log_lock_waits = on
deadlock_timeout = '1s'
log_checkpoints = on
log_temp_files = 0
log_autovacuum_min_duration = 0

# -------------------------------------------------------------------
# Query identity and statement statistics
# -------------------------------------------------------------------
compute_query_id = on
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = all
pg_stat_statements.track_planning = on

# -------------------------------------------------------------------
# Autovacuum visible enough for experiments
# -------------------------------------------------------------------
autovacuum = on
track_io_timing = on
track_wal_io_timing = on
track_functions = pl

# -------------------------------------------------------------------
# Local lab convenience
# -------------------------------------------------------------------
timezone = 'UTC'
lc_messages = 'C'

Penting:

  • Ini konfigurasi lab, bukan rekomendasi production universal.
  • log_min_duration_statement = 250 membuat query lebih dari 250 ms masuk log.
  • log_temp_files = 0 mencatat semua temp file, berguna saat belajar sort/hash spill.
  • track_io_timing = on membantu diagnosis I/O, tetapi punya overhead kecil; tetap berguna untuk lab.
  • shared_preload_libraries = 'pg_stat_statements' wajib sebelum extension dapat bekerja penuh.

6. Makefile

Buat Makefile:

PG_CONTAINER=learn-postgresql-in-action
PG_URL=postgresql://learnpg:learnpg@localhost:5432/learnpg

.PHONY: up down reset logs psql seed bench stats locks

up:
	docker compose up -d

down:
	docker compose down

reset:
	docker compose down -v
	docker compose up -d

logs:
	docker logs -f $(PG_CONTAINER)

psql:
	psql $(PG_URL)

seed:
	psql $(PG_URL) -f scripts/seed-large.sql

bench:
	pgbench $(PG_URL) -f pgbench/mixed-workload.sql -c 8 -j 4 -T 60

stats:
	psql $(PG_URL) -c "select * from lab.top_statements limit 20;"

locks:
	psql $(PG_URL) -f scripts/locks-demo.sql

Jika psql dan pgbench tidak terinstall di host, jalankan dari container:

docker exec -it learn-postgresql-in-action psql -U learnpg -d learnpg

docker exec -it learn-postgresql-in-action pgbench -U learnpg -d learnpg -i -s 10

7. Extension Setup

Buat postgres/init/001_extensions.sql:

CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
CREATE EXTENSION IF NOT EXISTS pgcrypto;

Kita memakai:

  • pg_stat_statements untuk melihat aggregate query behavior;
  • pgcrypto untuk utility UUID/random dalam lab jika dibutuhkan.

PostgreSQL 18 juga menyediakan uuidv7(), tetapi seri ini tidak bergantung pada UUIDv7 di part awal. Nanti kita bahas pada type system dan key design.

8. Schema Latihan: Enforcement Case Management

Buat postgres/init/002_schema.sql:

CREATE SCHEMA IF NOT EXISTS app;
CREATE SCHEMA IF NOT EXISTS lab;

CREATE TYPE app.case_status AS ENUM (
    'DRAFT',
    'SUBMITTED',
    'UNDER_REVIEW',
    'INVESTIGATION',
    'ESCALATED',
    'DECISION_PENDING',
    'ENFORCEMENT_ACTION',
    'CLOSED',
    'REJECTED'
);

CREATE TABLE app.case_file (
    id                  bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    tenant_id           bigint NOT NULL,
    case_number         text NOT NULL,
    status              app.case_status NOT NULL DEFAULT 'DRAFT',
    priority            int NOT NULL DEFAULT 3 CHECK (priority BETWEEN 1 AND 5),
    subject_name        text NOT NULL,
    subject_identifier  text,
    assigned_team       text,
    assigned_user_id    bigint,
    opened_at           timestamptz NOT NULL DEFAULT now(),
    due_at              timestamptz,
    closed_at           timestamptz,
    version             bigint NOT NULL DEFAULT 0,
    metadata            jsonb NOT NULL DEFAULT '{}'::jsonb,
    created_at          timestamptz NOT NULL DEFAULT now(),
    updated_at          timestamptz NOT NULL DEFAULT now(),

    CONSTRAINT uq_case_file_tenant_case_number UNIQUE (tenant_id, case_number),
    CONSTRAINT chk_case_closed_at CHECK (
        (status IN ('CLOSED', 'REJECTED') AND closed_at IS NOT NULL)
        OR
        (status NOT IN ('CLOSED', 'REJECTED'))
    )
);

CREATE TABLE app.case_event (
    id             bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    tenant_id      bigint NOT NULL,
    case_file_id   bigint NOT NULL REFERENCES app.case_file(id),
    event_type     text NOT NULL,
    actor_user_id  bigint,
    payload        jsonb NOT NULL DEFAULT '{}'::jsonb,
    occurred_at    timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE app.case_task (
    id              bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    tenant_id       bigint NOT NULL,
    case_file_id    bigint NOT NULL REFERENCES app.case_file(id),
    task_type       text NOT NULL,
    status          text NOT NULL CHECK (status IN ('OPEN', 'IN_PROGRESS', 'DONE', 'CANCELLED')),
    assigned_user_id bigint,
    due_at          timestamptz,
    created_at      timestamptz NOT NULL DEFAULT now(),
    updated_at      timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE app.outbox_event (
    id              bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    aggregate_type  text NOT NULL,
    aggregate_id    bigint NOT NULL,
    event_type      text NOT NULL,
    payload         jsonb NOT NULL,
    status          text NOT NULL DEFAULT 'PENDING' CHECK (status IN ('PENDING', 'PUBLISHED', 'FAILED')),
    created_at      timestamptz NOT NULL DEFAULT now(),
    published_at    timestamptz
);

CREATE TABLE app.idempotency_key (
    tenant_id       bigint NOT NULL,
    key             text NOT NULL,
    request_hash    text NOT NULL,
    response_body   jsonb,
    status          text NOT NULL CHECK (status IN ('PROCESSING', 'COMPLETED', 'FAILED')),
    created_at      timestamptz NOT NULL DEFAULT now(),
    updated_at      timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (tenant_id, key)
);

-- Baseline indexes. We intentionally keep this minimal.
-- Later parts will add, compare, and remove indexes deliberately.
CREATE INDEX idx_case_file_tenant_status_created
    ON app.case_file (tenant_id, status, created_at DESC);

CREATE INDEX idx_case_file_tenant_due
    ON app.case_file (tenant_id, due_at)
    WHERE status NOT IN ('CLOSED', 'REJECTED');

CREATE INDEX idx_case_event_case_occurred
    ON app.case_event (case_file_id, occurred_at DESC);

CREATE INDEX idx_case_task_open_due
    ON app.case_task (tenant_id, due_at)
    WHERE status IN ('OPEN', 'IN_PROGRESS');

CREATE INDEX idx_outbox_pending_created
    ON app.outbox_event (created_at)
    WHERE status = 'PENDING';

8.1 Mengapa Schema Ini Dipilih

Schema ini cukup sederhana untuk dipahami, tetapi cukup kaya untuk latihan production:

  • case_file punya lifecycle status.
  • case_event memberi audit/event stream sederhana.
  • case_task memberi due date dan assignment workload.
  • outbox_event memberi pola integration/event publishing.
  • idempotency_key memberi concurrency dan retry pattern.
  • Partial indexes memungkinkan latihan index design.
  • JSONB memungkinkan hybrid modeling.
  • Constraints memberi invariant boundary.

8.2 Hal yang Sengaja Belum Sempurna

Schema ini bukan final design. Beberapa hal sengaja akan kita evaluasi nanti:

  • enum vs lookup table untuk status;
  • apakah case_status terlalu rigid;
  • apakah case_number text sudah tepat;
  • apakah metadata perlu generated columns;
  • apakah FK pada event table berdampak pada write throughput;
  • apakah updated_at perlu trigger;
  • apakah case_event perlu partitioning;
  • apakah index baseline terlalu banyak atau terlalu sedikit.

Ini penting: lab yang terlalu “sempurna” tidak mengajarkan diagnosis.

9. Seed Data Kecil

Buat postgres/init/003_seed_small.sql:

INSERT INTO app.case_file (
    tenant_id,
    case_number,
    status,
    priority,
    subject_name,
    subject_identifier,
    assigned_team,
    assigned_user_id,
    opened_at,
    due_at,
    metadata
)
SELECT
    (1 + (g % 5))::bigint AS tenant_id,
    'CASE-' || lpad(g::text, 8, '0') AS case_number,
    (ARRAY[
        'DRAFT',
        'SUBMITTED',
        'UNDER_REVIEW',
        'INVESTIGATION',
        'ESCALATED',
        'DECISION_PENDING'
    ])[1 + (g % 6)]::app.case_status AS status,
    1 + (g % 5) AS priority,
    'Subject ' || g AS subject_name,
    'SID-' || (100000 + g) AS subject_identifier,
    'TEAM-' || (1 + (g % 10)) AS assigned_team,
    1 + (g % 100) AS assigned_user_id,
    now() - ((g % 365) || ' days')::interval AS opened_at,
    now() + ((g % 60) || ' days')::interval AS due_at,
    jsonb_build_object(
        'riskScore', g % 100,
        'source', CASE WHEN g % 3 = 0 THEN 'portal' ELSE 'batch' END,
        'region', 'R' || (g % 7)
    ) AS metadata
FROM generate_series(1, 10000) AS g;

INSERT INTO app.case_event (
    tenant_id,
    case_file_id,
    event_type,
    actor_user_id,
    payload,
    occurred_at
)
SELECT
    cf.tenant_id,
    cf.id,
    CASE WHEN e.n = 1 THEN 'CASE_CREATED' ELSE 'STATUS_REVIEWED' END,
    cf.assigned_user_id,
    jsonb_build_object('status', cf.status, 'eventSequence', e.n),
    cf.opened_at + (e.n || ' hours')::interval
FROM app.case_file cf
JOIN LATERAL generate_series(1, 3) AS e(n) ON true;

INSERT INTO app.case_task (
    tenant_id,
    case_file_id,
    task_type,
    status,
    assigned_user_id,
    due_at
)
SELECT
    cf.tenant_id,
    cf.id,
    CASE WHEN cf.priority <= 2 THEN 'URGENT_REVIEW' ELSE 'STANDARD_REVIEW' END,
    CASE WHEN cf.id % 4 = 0 THEN 'DONE' ELSE 'OPEN' END,
    cf.assigned_user_id,
    cf.due_at
FROM app.case_file cf
WHERE cf.id % 2 = 0;

ANALYZE;

Seed kecil ini berjalan saat container pertama kali dibuat.

10. Observability Views

Buat postgres/init/004_observability_views.sql:

CREATE OR REPLACE VIEW lab.top_statements AS
SELECT
    queryid,
    calls,
    round(total_exec_time::numeric, 2) AS total_exec_ms,
    round(mean_exec_time::numeric, 2) AS mean_exec_ms,
    rows,
    shared_blks_hit,
    shared_blks_read,
    shared_blks_dirtied,
    shared_blks_written,
    temp_blks_read,
    temp_blks_written,
    left(regexp_replace(query, '\s+', ' ', 'g'), 160) AS query_sample
FROM pg_stat_statements
ORDER BY total_exec_time DESC;

CREATE OR REPLACE VIEW lab.active_sessions AS
SELECT
    pid,
    usename,
    application_name,
    state,
    wait_event_type,
    wait_event,
    now() - xact_start AS xact_age,
    now() - query_start AS query_age,
    left(regexp_replace(query, '\s+', ' ', 'g'), 160) AS query_sample
FROM pg_stat_activity
WHERE datname = current_database()
ORDER BY query_start NULLS LAST;

CREATE OR REPLACE VIEW lab.blocking_locks AS
SELECT
    blocked.pid AS blocked_pid,
    blocked_activity.usename AS blocked_user,
    blocking.pid AS blocking_pid,
    blocking_activity.usename AS blocking_user,
    blocked.locktype,
    blocked.relation::regclass AS relation,
    blocked.mode AS blocked_mode,
    blocking.mode AS blocking_mode,
    now() - blocked_activity.query_start AS blocked_duration,
    left(blocked_activity.query, 160) AS blocked_query,
    left(blocking_activity.query, 160) AS blocking_query
FROM pg_locks blocked
JOIN pg_stat_activity blocked_activity
    ON blocked_activity.pid = blocked.pid
JOIN pg_locks blocking
    ON blocking.locktype = blocked.locktype
   AND blocking.database IS NOT DISTINCT FROM blocked.database
   AND blocking.relation IS NOT DISTINCT FROM blocked.relation
   AND blocking.page IS NOT DISTINCT FROM blocked.page
   AND blocking.tuple IS NOT DISTINCT FROM blocked.tuple
   AND blocking.virtualxid IS NOT DISTINCT FROM blocked.virtualxid
   AND blocking.transactionid IS NOT DISTINCT FROM blocked.transactionid
   AND blocking.classid IS NOT DISTINCT FROM blocked.classid
   AND blocking.objid IS NOT DISTINCT FROM blocked.objid
   AND blocking.objsubid IS NOT DISTINCT FROM blocked.objsubid
   AND blocking.pid <> blocked.pid
JOIN pg_stat_activity blocking_activity
    ON blocking_activity.pid = blocking.pid
WHERE NOT blocked.granted
  AND blocking.granted;

CREATE OR REPLACE VIEW lab.table_health AS
SELECT
    schemaname,
    relname,
    n_live_tup,
    n_dead_tup,
    seq_scan,
    seq_tup_read,
    idx_scan,
    idx_tup_fetch,
    n_tup_ins,
    n_tup_upd,
    n_tup_del,
    n_tup_hot_upd,
    last_vacuum,
    last_autovacuum,
    last_analyze,
    last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC, n_live_tup DESC;

CREATE OR REPLACE VIEW lab.index_usage AS
SELECT
    schemaname,
    relname,
    indexrelname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;

Views ini bukan pengganti observability platform. Ini adalah learning console agar diagnosis cepat.

11. Menjalankan Lab

Dari folder root:

make up

Cek health:

docker ps

Masuk ke database:

make psql

Cek versi:

SELECT version();

Cek extension:

SELECT extname, extversion
FROM pg_extension
ORDER BY extname;

Cek data awal:

SELECT count(*) FROM app.case_file;
SELECT count(*) FROM app.case_event;
SELECT count(*) FROM app.case_task;

Cek observability view:

SELECT * FROM lab.active_sessions;
SELECT * FROM lab.table_health;
SELECT * FROM lab.index_usage;

12. psql sebagai Engineering Console

psql bukan sekadar CLI. Untuk PostgreSQL engineer, psql adalah diagnostic console.

Command penting:

\conninfo        -- connection info
\dn              -- schemas
\dt app.*        -- tables in schema
\d+ app.case_file
\di+ app.*       -- indexes with size
\x on            -- expanded display
\timing on       -- show query timing
\watch 1         -- repeat query every 1 second

Contoh monitoring aktif:

SELECT * FROM lab.active_sessions;
\watch 1

Contoh melihat locks:

SELECT * FROM lab.blocking_locks;
\watch 1

Contoh top statements:

SELECT calls, mean_exec_ms, shared_blks_read, query_sample
FROM lab.top_statements
LIMIT 10;

13. EXPLAIN Template

Buat scripts/explain-template.sql:

EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS)
SELECT
    id,
    tenant_id,
    case_number,
    status,
    priority,
    opened_at,
    due_at
FROM app.case_file
WHERE tenant_id = 1
  AND status = 'INVESTIGATION'
ORDER BY created_at DESC
LIMIT 20;

Jalankan:

psql postgresql://learnpg:learnpg@localhost:5432/learnpg -f scripts/explain-template.sql

Yang harus diamati:

  • node paling atas;
  • scan type;
  • estimated rows vs actual rows;
  • buffers hit/read;
  • sort ada atau tidak;
  • index yang dipakai;
  • execution time;
  • planning time;
  • settings non-default.

Jangan fokus pada execution time saja. Untuk lab kecil, waktu bisa misleading karena cache.

14. Seed Data Besar

Buat scripts/seed-large.sql:

INSERT INTO app.case_file (
    tenant_id,
    case_number,
    status,
    priority,
    subject_name,
    subject_identifier,
    assigned_team,
    assigned_user_id,
    opened_at,
    due_at,
    metadata
)
SELECT
    (1 + (g % 20))::bigint AS tenant_id,
    'CASE-LARGE-' || lpad(g::text, 10, '0') AS case_number,
    CASE
        WHEN g % 100 < 45 THEN 'CLOSED'
        WHEN g % 100 < 60 THEN 'UNDER_REVIEW'
        WHEN g % 100 < 75 THEN 'INVESTIGATION'
        WHEN g % 100 < 85 THEN 'SUBMITTED'
        WHEN g % 100 < 95 THEN 'ESCALATED'
        ELSE 'DECISION_PENDING'
    END::app.case_status AS status,
    1 + (g % 5) AS priority,
    'Large Subject ' || g AS subject_name,
    'LSID-' || (10000000 + g) AS subject_identifier,
    'TEAM-' || (1 + (g % 25)) AS assigned_team,
    1 + (g % 1000) AS assigned_user_id,
    now() - ((g % 1095) || ' days')::interval AS opened_at,
    now() + ((g % 120) || ' days')::interval AS due_at,
    jsonb_build_object(
        'riskScore', g % 100,
        'source', CASE WHEN g % 5 = 0 THEN 'api' WHEN g % 5 = 1 THEN 'portal' ELSE 'batch' END,
        'region', 'R' || (g % 12),
        'largeSeed', true
    ) AS metadata
FROM generate_series(1, 500000) AS g;

INSERT INTO app.case_event (
    tenant_id,
    case_file_id,
    event_type,
    actor_user_id,
    payload,
    occurred_at
)
SELECT
    cf.tenant_id,
    cf.id,
    CASE e.n
        WHEN 1 THEN 'CASE_CREATED'
        WHEN 2 THEN 'CASE_ASSIGNED'
        WHEN 3 THEN 'CASE_REVIEWED'
        ELSE 'CASE_UPDATED'
    END,
    cf.assigned_user_id,
    jsonb_build_object('status', cf.status, 'eventSequence', e.n),
    cf.opened_at + (e.n || ' hours')::interval
FROM app.case_file cf
JOIN LATERAL generate_series(1, 4) AS e(n) ON true
WHERE cf.case_number LIKE 'CASE-LARGE-%';

INSERT INTO app.case_task (
    tenant_id,
    case_file_id,
    task_type,
    status,
    assigned_user_id,
    due_at
)
SELECT
    cf.tenant_id,
    cf.id,
    CASE WHEN cf.priority <= 2 THEN 'URGENT_REVIEW' ELSE 'STANDARD_REVIEW' END,
    CASE
        WHEN cf.id % 10 = 0 THEN 'DONE'
        WHEN cf.id % 10 = 1 THEN 'IN_PROGRESS'
        ELSE 'OPEN'
    END,
    cf.assigned_user_id,
    cf.due_at
FROM app.case_file cf
WHERE cf.case_number LIKE 'CASE-LARGE-%'
  AND cf.id % 3 = 0;

ANALYZE;

Jalankan:

make seed

Estimasi data setelah seed:

  • case_file: sekitar 510.000 row;
  • case_event: sekitar 2 juta row tambahan;
  • case_task: sekitar 166.000 row tambahan.

Jumlah ini cukup untuk mulai melihat plan difference tanpa membuat laptop terlalu berat.

15. Workload dengan pgbench

PostgreSQL menyediakan pgbench sebagai tool sederhana untuk menjalankan benchmark berbasis transaksi berulang dan menghitung transaction rate. Dalam seri ini, pgbench dipakai bukan untuk klaim performa absolut, tetapi untuk membuat workload repeatable.

Buat pgbench/case-search.sql:

\set tenant_id random(1, 20)
\set priority random(1, 5)

SELECT id, case_number, status, priority, opened_at, due_at
FROM app.case_file
WHERE tenant_id = :tenant_id
  AND priority = :priority
  AND status IN ('UNDER_REVIEW', 'INVESTIGATION', 'ESCALATED')
ORDER BY created_at DESC
LIMIT 50;

Buat pgbench/case-update.sql:

\set case_id random(1, 500000)

BEGIN;

UPDATE app.case_file
SET
    version = version + 1,
    updated_at = now(),
    metadata = jsonb_set(metadata, '{lastTouchedBy}', to_jsonb('pgbench'::text), true)
WHERE id = :case_id;

INSERT INTO app.case_event (
    tenant_id,
    case_file_id,
    event_type,
    actor_user_id,
    payload
)
SELECT
    tenant_id,
    id,
    'CASE_TOUCHED',
    assigned_user_id,
    jsonb_build_object('source', 'pgbench')
FROM app.case_file
WHERE id = :case_id;

COMMIT;

Buat pgbench/mixed-workload.sql:

\set tenant_id random(1, 20)
\set priority random(1, 5)
\set case_id random(1, 500000)

BEGIN;

SELECT id, case_number, status, priority, opened_at, due_at
FROM app.case_file
WHERE tenant_id = :tenant_id
  AND priority = :priority
  AND status IN ('UNDER_REVIEW', 'INVESTIGATION', 'ESCALATED')
ORDER BY created_at DESC
LIMIT 20;

UPDATE app.case_file
SET
    version = version + 1,
    updated_at = now()
WHERE id = :case_id;

COMMIT;

Jalankan:

pgbench postgresql://learnpg:learnpg@localhost:5432/learnpg \
  -f pgbench/case-search.sql \
  -c 8 \
  -j 4 \
  -T 60

Parameter:

  • -c 8: 8 concurrent clients;
  • -j 4: 4 worker threads;
  • -T 60: 60 detik;
  • -f: script transaksi.

Setelah menjalankan workload:

SELECT calls, mean_exec_ms, shared_blks_hit, shared_blks_read, temp_blks_written, query_sample
FROM lab.top_statements
LIMIT 10;

16. Reset Statistik

Saat eksperimen, statistik lama bisa mengganggu interpretasi.

Reset pg_stat_statements:

SELECT pg_stat_statements_reset();

Reset stats database-wide butuh privilege lebih tinggi. Untuk lab, bisa masuk sebagai superuser container default jika diperlukan. Namun untuk sebagian besar part, reset pg_stat_statements sudah cukup.

Format eksperimen:

SELECT pg_stat_statements_reset();

-- run workload

SELECT * FROM lab.top_statements LIMIT 20;

17. Lock Demo

Buat scripts/locks-demo.sql:

BEGIN;

SELECT id, status
FROM app.case_file
WHERE id = 1
FOR UPDATE;

-- Keep this transaction open.
-- In another psql session, run:
-- UPDATE app.case_file SET priority = 5 WHERE id = 1;

Session A:

psql postgresql://learnpg:learnpg@localhost:5432/learnpg -f scripts/locks-demo.sql

Session B:

UPDATE app.case_file
SET priority = 5
WHERE id = 1;

Session C:

SELECT * FROM lab.blocking_locks;

Yang dipelajari:

  • row-level lock bisa membuat session lain menunggu;
  • blocker bisa ditemukan dari pg_locks dan pg_stat_activity;
  • lock wait bukan deadlock;
  • transaksi yang dibiarkan open adalah operational risk.

18. Deadlock Demo

Buat scripts/deadlock-session-a.sql:

BEGIN;

UPDATE app.case_file
SET priority = 2
WHERE id = 1;

SELECT pg_sleep(5);

UPDATE app.case_file
SET priority = 3
WHERE id = 2;

COMMIT;

Buat scripts/deadlock-session-b.sql:

BEGIN;

UPDATE app.case_file
SET priority = 4
WHERE id = 2;

SELECT pg_sleep(5);

UPDATE app.case_file
SET priority = 5
WHERE id = 1;

COMMIT;

Jalankan hampir bersamaan di dua terminal:

psql postgresql://learnpg:learnpg@localhost:5432/learnpg -f scripts/deadlock-session-a.sql
psql postgresql://learnpg:learnpg@localhost:5432/learnpg -f scripts/deadlock-session-b.sql

Salah satu transaksi akan gagal dengan deadlock detected.

Mental model:

Deadlock bukan sekadar “database error”. Deadlock adalah bukti bahwa dua transaksi mengambil resource dalam urutan berbeda.

Aplikasi Java harus memperlakukan deadlock sebagai retryable error hanya jika:

  • operasi idempotent;
  • side effect eksternal belum dikirim sebelum commit;
  • retry punya limit dan jitter;
  • telemetry mencatat retry count.

19. Bloat Demo Awal

Kita akan bahas bloat lebih dalam di part storage dan vacuum. Untuk sekarang, cukup lihat bahwa update membuat dead tuples.

Jalankan:

SELECT relname, n_live_tup, n_dead_tup
FROM lab.table_health
WHERE relname = 'case_file';

Update banyak row:

UPDATE app.case_file
SET metadata = jsonb_set(metadata, '{bloatDemo}', 'true'::jsonb, true)
WHERE tenant_id = 1;

Cek lagi:

SELECT relname, n_live_tup, n_dead_tup, n_tup_upd, n_tup_hot_upd
FROM lab.table_health
WHERE relname = 'case_file';

Lalu:

VACUUM (VERBOSE, ANALYZE) app.case_file;

Cek log PostgreSQL.

Tujuan bukan menghafal output vacuum. Tujuan awal adalah melihat bahwa update meninggalkan jejak storage-level.

20. WAL Growth Demo Awal

Cek current WAL LSN:

SELECT pg_current_wal_lsn();

Simpan LSN awal:

SELECT pg_current_wal_lsn() AS before_lsn \gset

Jalankan update:

UPDATE app.case_file
SET version = version + 1
WHERE tenant_id = 2;

Bandingkan WAL bytes:

SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), :'before_lsn')) AS wal_generated;

Pelajaran:

  • update menghasilkan WAL;
  • index maintenance juga menghasilkan WAL;
  • write-heavy design punya biaya replication, backup, storage, dan recovery.

21. Baseline Query Experiments

21.1 Query yang Didukung Index

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, case_number, status, created_at
FROM app.case_file
WHERE tenant_id = 1
  AND status = 'INVESTIGATION'
ORDER BY created_at DESC
LIMIT 20;

Kemungkinan besar memakai index idx_case_file_tenant_status_created.

21.2 Query yang Tidak Sepenuhnya Didukung Index

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, case_number, status, priority, created_at
FROM app.case_file
WHERE tenant_id = 1
  AND priority = 3
  AND status IN ('UNDER_REVIEW', 'INVESTIGATION', 'ESCALATED')
ORDER BY created_at DESC
LIMIT 50;

Pertanyaan:

  • Apakah index existing cukup?
  • Apakah priority harus masuk index?
  • Apakah partial index lebih tepat?
  • Apakah query ini read path penting atau hanya admin report?

Jangan langsung membuat index. Catat baseline dulu.

21.3 Query JSONB Tanpa Index

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, case_number, metadata
FROM app.case_file
WHERE metadata ->> 'region' = 'R3'
LIMIT 50;

Pertanyaan:

  • Apakah ini butuh generated column?
  • Apakah butuh expression index?
  • Apakah butuh GIN index?
  • Apakah field region seharusnya kolom relational biasa?

Kita akan jawab di part JSONB dan indexing.

22. Java Lab Placeholder

Buat java-lab/README.md:

# Java PostgreSQL Lab

This folder will be used in later parts for:

- pgJDBC experiments
- HikariCP pool sizing
- transaction timeout hierarchy
- Hibernate/JPA query behavior
- Flyway zero-downtime migration
- Testcontainers integration tests

Untuk sekarang, belum perlu membuat service Java. Kita akan membangun ketika sudah punya mental model query, lock, dan transaction.

Namun, dari awal kita sudah memilih lab structure yang kompatibel dengan Java integration.

23. Diagnostic Queries yang Wajib Dihafal Secara Fungsional

Bukan hafal karakter demi karakter, tetapi tahu kapan memakainya.

23.1 Active Sessions

SELECT
    pid,
    state,
    wait_event_type,
    wait_event,
    now() - xact_start AS xact_age,
    now() - query_start AS query_age,
    left(query, 120) AS query
FROM pg_stat_activity
WHERE datname = current_database()
ORDER BY query_start NULLS LAST;

23.2 Blocking Sessions

SELECT * FROM lab.blocking_locks;

23.3 Top Statements

SELECT
    calls,
    mean_exec_ms,
    total_exec_ms,
    shared_blks_hit,
    shared_blks_read,
    temp_blks_written,
    query_sample
FROM lab.top_statements
LIMIT 20;

23.4 Table Health

SELECT * FROM lab.table_health
WHERE schemaname = 'app';

23.5 Index Usage

SELECT * FROM lab.index_usage
WHERE schemaname = 'app';

23.6 Relation Size

SELECT
    n.nspname AS schema_name,
    c.relname AS relation_name,
    pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
    pg_size_pretty(pg_relation_size(c.oid)) AS heap_size,
    pg_size_pretty(pg_indexes_size(c.oid)) AS indexes_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'app'
  AND c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC;

24. Troubleshooting Setup

24.1 Port 5432 Sudah Dipakai

Ubah port mapping:

ports:
  - "55432:5432"

Lalu connection URL:

postgresql://learnpg:learnpg@localhost:55432/learnpg

24.2 Extension Tidak Aktif

Cek:

SHOW shared_preload_libraries;
SELECT * FROM pg_extension;

Jika pg_stat_statements tidak aktif:

  1. pastikan shared_preload_libraries = 'pg_stat_statements';
  2. restart container;
  3. jalankan CREATE EXTENSION IF NOT EXISTS pg_stat_statements;.

24.3 Init Script Tidak Berjalan

Script /docker-entrypoint-initdb.d hanya berjalan saat volume kosong.

Reset:

docker compose down -v
docker compose up -d

24.4 Query Terlalu Cepat untuk Dibedakan

Tambah data:

make seed

Atau jalankan query dengan predicate yang lebih buruk untuk latihan. Jangan membuat kesimpulan tuning dari dataset 10 row.

25. Safety Boundary: Lab vs Production

Jangan copy-paste konfigurasi lab ke production.

Contoh:

  • log_temp_files = 0 bisa menghasilkan log besar pada workload tertentu.
  • log_autovacuum_min_duration = 0 terlalu noisy untuk production kecuali sedang investigasi.
  • work_mem = 16MB mungkin terlalu besar atau terlalu kecil tergantung concurrency dan query shape.
  • max_connections = 100 tidak berarti aplikasi boleh membuat 100 koneksi aktif tanpa budget.
  • pg_stat_statements.track = all perlu dipahami overhead dan volume statistiknya.

Production tuning harus berbasis workload dan metrics.

26. Checklist Selesai Part 002

Kamu boleh lanjut ke Part 003 jika bisa:

  • menjalankan PostgreSQL 18 dengan Docker Compose;
  • masuk via psql;
  • melihat extension aktif;
  • menjalankan query pada schema app;
  • membaca lab.active_sessions;
  • membaca lab.top_statements setelah workload;
  • menjalankan EXPLAIN (ANALYZE, BUFFERS);
  • membuat lock wait sederhana dan menemukan blocker;
  • menjalankan pgbench script sederhana;
  • reset lab dengan docker compose down -v.

27. Latihan

Latihan 1 — Baseline Plan

Jalankan:

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, case_number, status
FROM app.case_file
WHERE tenant_id = 3
  AND status = 'UNDER_REVIEW'
ORDER BY created_at DESC
LIMIT 25;

Catat:

  • scan type;
  • index name;
  • buffers hit/read;
  • actual rows;
  • execution time.

Latihan 2 — JSONB Predicate

Jalankan:

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, case_number
FROM app.case_file
WHERE metadata ->> 'source' = 'api'
LIMIT 100;

Catat apakah query melakukan sequential scan.

Jangan optimasi dulu. Kita akan kembali ke query ini di part JSONB dan indexing.

Latihan 3 — Lock Visibility

Buat satu session menahan row lock. Buat session lain menunggu. Temukan blocker memakai lab.blocking_locks.

Latihan 4 — Workload Stats

Reset pg_stat_statements, jalankan pgbench 60 detik, lalu ambil top statements.

SELECT pg_stat_statements_reset();
pgbench postgresql://learnpg:learnpg@localhost:5432/learnpg \
  -f pgbench/mixed-workload.sql \
  -c 8 -j 4 -T 60
SELECT * FROM lab.top_statements LIMIT 10;

28. Ringkasan

Di part ini kita membangun fondasi praktik:

  • PostgreSQL 18 local lab;
  • konfigurasi observability;
  • schema enforcement case management;
  • seed data;
  • pg_stat_statements;
  • diagnostic views;
  • pgbench workload;
  • lock/deadlock demo;
  • EXPLAIN template;
  • resettable experiment workflow.

Mulai part berikutnya, kita akan membongkar arsitektur PostgreSQL: process model, memory, storage, WAL, dan background workers. Tujuannya agar setiap gejala di lab punya tempat dalam mental model engine.

29. Referensi

Lesson Recap

You just completed lesson 02 in start here. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.