Production-Like PostgreSQL Lab Setup
Learn PostgreSQL in Action - Part 002
Build a repeatable PostgreSQL 18 lab with Docker, psql, pgbench, pg_stat_statements, logging, metrics queries, and Java/Testcontainers readiness.
Part 002 — Production-Like PostgreSQL Lab Setup
1. Tujuan Bagian Ini
Bagian ini membangun lab PostgreSQL yang repeatable. Tanpa lab yang bisa di-reset, semua materi berikutnya akan menjadi teori. Kita butuh environment yang bisa menghasilkan:
- query cepat dan query lambat;
- plan baik dan plan buruk;
- lock wait dan deadlock;
- table/index bloat;
- autovacuum behavior;
- WAL growth;
- pool starvation;
- migration lock;
- Java integration behavior.
Target akhir part ini:
- PostgreSQL 18 berjalan lokal dengan konfigurasi observability dasar.
psqlbisa dipakai untuk inspeksi.pg_stat_statementsaktif.- Slow query logging aktif.
pgbenchtersedia untuk workload sederhana.- Schema latihan siap untuk seri berikutnya.
- Ada folder structure yang bisa dipakai untuk eksperimen lanjutan.
2. Prinsip Lab
Lab ini mengikuti empat prinsip.
2.1 Repeatable
Semua harus bisa dihancurkan dan dibuat ulang.
make down
make up
make seed
make benchmark
Kalau hasil eksperimen tidak bisa diulang, kita tidak tahu apakah improvement datang dari perubahan kita atau kebetulan cache, data distribution, atau state lain.
2.2 Observable
Lab harus memperlihatkan gejala internal PostgreSQL:
- active session;
- wait event;
- query statistics;
- lock graph;
- slow query log;
- buffer usage;
- temp file;
- WAL growth;
- table statistics;
- index usage.
2.3 Small Enough to Run, Large Enough to Fail
Dataset kecil membuat semua query terlihat cepat. Dataset terlalu besar membuat eksperimen lambat. Kita mulai dengan dataset menengah yang bisa diatur.
Target awal:
- puluhan ribu sampai beberapa juta row;
- cukup untuk melihat plan difference;
- cukup untuk melihat index effect;
- cukup untuk membuat lock dan bloat;
- masih nyaman dijalankan di laptop.
2.4 One Variable at a Time
Setiap eksperimen hanya mengubah satu hal:
- index baru;
- query rewrite;
- statistics update;
- pool size;
- isolation level;
- batch size;
- partition strategy;
- configuration parameter.
Jika terlalu banyak hal berubah, kita tidak sedang belajar. Kita sedang menebak.
3. Folder Structure
Buat struktur proyek:
learn-postgresql-in-action-lab/
├── Makefile
├── docker-compose.yml
├── postgres/
│ ├── init/
│ │ ├── 001_extensions.sql
│ │ ├── 002_schema.sql
│ │ ├── 003_seed_small.sql
│ │ └── 004_observability_views.sql
│ └── conf/
│ └── postgresql.conf
├── scripts/
│ ├── psql.sh
│ ├── reset.sh
│ ├── seed-large.sql
│ ├── locks-demo.sql
│ ├── deadlock-session-a.sql
│ ├── deadlock-session-b.sql
│ └── explain-template.sql
├── pgbench/
│ ├── case-search.sql
│ ├── case-update.sql
│ └── mixed-workload.sql
└── java-lab/
└── README.md
Kenapa struktur ini penting?
postgres/inituntuk state awal database.postgres/confuntuk konfigurasi engine.scriptsuntuk eksperimen manual.pgbenchuntuk workload repeatable.java-labuntuk integrasi Java pada part berikutnya.
4. Docker Compose PostgreSQL 18
Buat file docker-compose.yml:
services:
postgres:
image: postgres:18
container_name: learn-postgresql-in-action
ports:
- "5432:5432"
environment:
POSTGRES_DB: learnpg
POSTGRES_USER: learnpg
POSTGRES_PASSWORD: learnpg
command:
- "postgres"
- "-c"
- "config_file=/etc/postgresql/postgresql.conf"
volumes:
- pgdata:/var/lib/postgresql/data
- ./postgres/init:/docker-entrypoint-initdb.d
- ./postgres/conf/postgresql.conf:/etc/postgresql/postgresql.conf:ro
healthcheck:
test: ["CMD-SHELL", "pg_isready -U learnpg -d learnpg"]
interval: 5s
timeout: 5s
retries: 10
volumes:
pgdata:
Catatan:
- Image
postgres:18membuat lab mengikuti baseline seri. - Volume
pgdatamembuat state bertahan antar restart. - Script di
/docker-entrypoint-initdb.dhanya berjalan saat database volume pertama kali dibuat. - Untuk reset total, hapus volume.
5. Konfigurasi PostgreSQL untuk Lab
Buat file postgres/conf/postgresql.conf:
# -------------------------------------------------------------------
# Basic connection
# -------------------------------------------------------------------
listen_addresses = '*'
port = 5432
max_connections = 100
# -------------------------------------------------------------------
# Memory baseline for local lab
# Do not copy blindly to production.
# -------------------------------------------------------------------
shared_buffers = '512MB'
effective_cache_size = '2GB'
work_mem = '16MB'
maintenance_work_mem = '256MB'
# -------------------------------------------------------------------
# WAL and checkpoint visibility
# -------------------------------------------------------------------
wal_level = replica
max_wal_size = '2GB'
min_wal_size = '256MB'
checkpoint_timeout = '10min'
checkpoint_completion_target = 0.9
# -------------------------------------------------------------------
# Logging for learning
# -------------------------------------------------------------------
logging_collector = on
log_destination = 'stderr'
log_directory = 'log'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_min_duration_statement = 250
log_lock_waits = on
deadlock_timeout = '1s'
log_checkpoints = on
log_temp_files = 0
log_autovacuum_min_duration = 0
# -------------------------------------------------------------------
# Query identity and statement statistics
# -------------------------------------------------------------------
compute_query_id = on
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.max = 10000
pg_stat_statements.track = all
pg_stat_statements.track_planning = on
# -------------------------------------------------------------------
# Autovacuum visible enough for experiments
# -------------------------------------------------------------------
autovacuum = on
track_io_timing = on
track_wal_io_timing = on
track_functions = pl
# -------------------------------------------------------------------
# Local lab convenience
# -------------------------------------------------------------------
timezone = 'UTC'
lc_messages = 'C'
Penting:
- Ini konfigurasi lab, bukan rekomendasi production universal.
log_min_duration_statement = 250membuat query lebih dari 250 ms masuk log.log_temp_files = 0mencatat semua temp file, berguna saat belajar sort/hash spill.track_io_timing = onmembantu diagnosis I/O, tetapi punya overhead kecil; tetap berguna untuk lab.shared_preload_libraries = 'pg_stat_statements'wajib sebelum extension dapat bekerja penuh.
6. Makefile
Buat Makefile:
PG_CONTAINER=learn-postgresql-in-action
PG_URL=postgresql://learnpg:learnpg@localhost:5432/learnpg
.PHONY: up down reset logs psql seed bench stats locks
up:
docker compose up -d
down:
docker compose down
reset:
docker compose down -v
docker compose up -d
logs:
docker logs -f $(PG_CONTAINER)
psql:
psql $(PG_URL)
seed:
psql $(PG_URL) -f scripts/seed-large.sql
bench:
pgbench $(PG_URL) -f pgbench/mixed-workload.sql -c 8 -j 4 -T 60
stats:
psql $(PG_URL) -c "select * from lab.top_statements limit 20;"
locks:
psql $(PG_URL) -f scripts/locks-demo.sql
Jika psql dan pgbench tidak terinstall di host, jalankan dari container:
docker exec -it learn-postgresql-in-action psql -U learnpg -d learnpg
docker exec -it learn-postgresql-in-action pgbench -U learnpg -d learnpg -i -s 10
7. Extension Setup
Buat postgres/init/001_extensions.sql:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
CREATE EXTENSION IF NOT EXISTS pgcrypto;
Kita memakai:
pg_stat_statementsuntuk melihat aggregate query behavior;pgcryptountuk utility UUID/random dalam lab jika dibutuhkan.
PostgreSQL 18 juga menyediakan uuidv7(), tetapi seri ini tidak bergantung pada UUIDv7 di part awal. Nanti kita bahas pada type system dan key design.
8. Schema Latihan: Enforcement Case Management
Buat postgres/init/002_schema.sql:
CREATE SCHEMA IF NOT EXISTS app;
CREATE SCHEMA IF NOT EXISTS lab;
CREATE TYPE app.case_status AS ENUM (
'DRAFT',
'SUBMITTED',
'UNDER_REVIEW',
'INVESTIGATION',
'ESCALATED',
'DECISION_PENDING',
'ENFORCEMENT_ACTION',
'CLOSED',
'REJECTED'
);
CREATE TABLE app.case_file (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
tenant_id bigint NOT NULL,
case_number text NOT NULL,
status app.case_status NOT NULL DEFAULT 'DRAFT',
priority int NOT NULL DEFAULT 3 CHECK (priority BETWEEN 1 AND 5),
subject_name text NOT NULL,
subject_identifier text,
assigned_team text,
assigned_user_id bigint,
opened_at timestamptz NOT NULL DEFAULT now(),
due_at timestamptz,
closed_at timestamptz,
version bigint NOT NULL DEFAULT 0,
metadata jsonb NOT NULL DEFAULT '{}'::jsonb,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
CONSTRAINT uq_case_file_tenant_case_number UNIQUE (tenant_id, case_number),
CONSTRAINT chk_case_closed_at CHECK (
(status IN ('CLOSED', 'REJECTED') AND closed_at IS NOT NULL)
OR
(status NOT IN ('CLOSED', 'REJECTED'))
)
);
CREATE TABLE app.case_event (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
tenant_id bigint NOT NULL,
case_file_id bigint NOT NULL REFERENCES app.case_file(id),
event_type text NOT NULL,
actor_user_id bigint,
payload jsonb NOT NULL DEFAULT '{}'::jsonb,
occurred_at timestamptz NOT NULL DEFAULT now()
);
CREATE TABLE app.case_task (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
tenant_id bigint NOT NULL,
case_file_id bigint NOT NULL REFERENCES app.case_file(id),
task_type text NOT NULL,
status text NOT NULL CHECK (status IN ('OPEN', 'IN_PROGRESS', 'DONE', 'CANCELLED')),
assigned_user_id bigint,
due_at timestamptz,
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now()
);
CREATE TABLE app.outbox_event (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
aggregate_type text NOT NULL,
aggregate_id bigint NOT NULL,
event_type text NOT NULL,
payload jsonb NOT NULL,
status text NOT NULL DEFAULT 'PENDING' CHECK (status IN ('PENDING', 'PUBLISHED', 'FAILED')),
created_at timestamptz NOT NULL DEFAULT now(),
published_at timestamptz
);
CREATE TABLE app.idempotency_key (
tenant_id bigint NOT NULL,
key text NOT NULL,
request_hash text NOT NULL,
response_body jsonb,
status text NOT NULL CHECK (status IN ('PROCESSING', 'COMPLETED', 'FAILED')),
created_at timestamptz NOT NULL DEFAULT now(),
updated_at timestamptz NOT NULL DEFAULT now(),
PRIMARY KEY (tenant_id, key)
);
-- Baseline indexes. We intentionally keep this minimal.
-- Later parts will add, compare, and remove indexes deliberately.
CREATE INDEX idx_case_file_tenant_status_created
ON app.case_file (tenant_id, status, created_at DESC);
CREATE INDEX idx_case_file_tenant_due
ON app.case_file (tenant_id, due_at)
WHERE status NOT IN ('CLOSED', 'REJECTED');
CREATE INDEX idx_case_event_case_occurred
ON app.case_event (case_file_id, occurred_at DESC);
CREATE INDEX idx_case_task_open_due
ON app.case_task (tenant_id, due_at)
WHERE status IN ('OPEN', 'IN_PROGRESS');
CREATE INDEX idx_outbox_pending_created
ON app.outbox_event (created_at)
WHERE status = 'PENDING';
8.1 Mengapa Schema Ini Dipilih
Schema ini cukup sederhana untuk dipahami, tetapi cukup kaya untuk latihan production:
case_filepunya lifecycle status.case_eventmemberi audit/event stream sederhana.case_taskmemberi due date dan assignment workload.outbox_eventmemberi pola integration/event publishing.idempotency_keymemberi concurrency dan retry pattern.- Partial indexes memungkinkan latihan index design.
- JSONB memungkinkan hybrid modeling.
- Constraints memberi invariant boundary.
8.2 Hal yang Sengaja Belum Sempurna
Schema ini bukan final design. Beberapa hal sengaja akan kita evaluasi nanti:
- enum vs lookup table untuk status;
- apakah
case_statusterlalu rigid; - apakah
case_numbertext sudah tepat; - apakah
metadataperlu generated columns; - apakah FK pada event table berdampak pada write throughput;
- apakah
updated_atperlu trigger; - apakah
case_eventperlu partitioning; - apakah index baseline terlalu banyak atau terlalu sedikit.
Ini penting: lab yang terlalu “sempurna” tidak mengajarkan diagnosis.
9. Seed Data Kecil
Buat postgres/init/003_seed_small.sql:
INSERT INTO app.case_file (
tenant_id,
case_number,
status,
priority,
subject_name,
subject_identifier,
assigned_team,
assigned_user_id,
opened_at,
due_at,
metadata
)
SELECT
(1 + (g % 5))::bigint AS tenant_id,
'CASE-' || lpad(g::text, 8, '0') AS case_number,
(ARRAY[
'DRAFT',
'SUBMITTED',
'UNDER_REVIEW',
'INVESTIGATION',
'ESCALATED',
'DECISION_PENDING'
])[1 + (g % 6)]::app.case_status AS status,
1 + (g % 5) AS priority,
'Subject ' || g AS subject_name,
'SID-' || (100000 + g) AS subject_identifier,
'TEAM-' || (1 + (g % 10)) AS assigned_team,
1 + (g % 100) AS assigned_user_id,
now() - ((g % 365) || ' days')::interval AS opened_at,
now() + ((g % 60) || ' days')::interval AS due_at,
jsonb_build_object(
'riskScore', g % 100,
'source', CASE WHEN g % 3 = 0 THEN 'portal' ELSE 'batch' END,
'region', 'R' || (g % 7)
) AS metadata
FROM generate_series(1, 10000) AS g;
INSERT INTO app.case_event (
tenant_id,
case_file_id,
event_type,
actor_user_id,
payload,
occurred_at
)
SELECT
cf.tenant_id,
cf.id,
CASE WHEN e.n = 1 THEN 'CASE_CREATED' ELSE 'STATUS_REVIEWED' END,
cf.assigned_user_id,
jsonb_build_object('status', cf.status, 'eventSequence', e.n),
cf.opened_at + (e.n || ' hours')::interval
FROM app.case_file cf
JOIN LATERAL generate_series(1, 3) AS e(n) ON true;
INSERT INTO app.case_task (
tenant_id,
case_file_id,
task_type,
status,
assigned_user_id,
due_at
)
SELECT
cf.tenant_id,
cf.id,
CASE WHEN cf.priority <= 2 THEN 'URGENT_REVIEW' ELSE 'STANDARD_REVIEW' END,
CASE WHEN cf.id % 4 = 0 THEN 'DONE' ELSE 'OPEN' END,
cf.assigned_user_id,
cf.due_at
FROM app.case_file cf
WHERE cf.id % 2 = 0;
ANALYZE;
Seed kecil ini berjalan saat container pertama kali dibuat.
10. Observability Views
Buat postgres/init/004_observability_views.sql:
CREATE OR REPLACE VIEW lab.top_statements AS
SELECT
queryid,
calls,
round(total_exec_time::numeric, 2) AS total_exec_ms,
round(mean_exec_time::numeric, 2) AS mean_exec_ms,
rows,
shared_blks_hit,
shared_blks_read,
shared_blks_dirtied,
shared_blks_written,
temp_blks_read,
temp_blks_written,
left(regexp_replace(query, '\s+', ' ', 'g'), 160) AS query_sample
FROM pg_stat_statements
ORDER BY total_exec_time DESC;
CREATE OR REPLACE VIEW lab.active_sessions AS
SELECT
pid,
usename,
application_name,
state,
wait_event_type,
wait_event,
now() - xact_start AS xact_age,
now() - query_start AS query_age,
left(regexp_replace(query, '\s+', ' ', 'g'), 160) AS query_sample
FROM pg_stat_activity
WHERE datname = current_database()
ORDER BY query_start NULLS LAST;
CREATE OR REPLACE VIEW lab.blocking_locks AS
SELECT
blocked.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked.locktype,
blocked.relation::regclass AS relation,
blocked.mode AS blocked_mode,
blocking.mode AS blocking_mode,
now() - blocked_activity.query_start AS blocked_duration,
left(blocked_activity.query, 160) AS blocked_query,
left(blocking_activity.query, 160) AS blocking_query
FROM pg_locks blocked
JOIN pg_stat_activity blocked_activity
ON blocked_activity.pid = blocked.pid
JOIN pg_locks blocking
ON blocking.locktype = blocked.locktype
AND blocking.database IS NOT DISTINCT FROM blocked.database
AND blocking.relation IS NOT DISTINCT FROM blocked.relation
AND blocking.page IS NOT DISTINCT FROM blocked.page
AND blocking.tuple IS NOT DISTINCT FROM blocked.tuple
AND blocking.virtualxid IS NOT DISTINCT FROM blocked.virtualxid
AND blocking.transactionid IS NOT DISTINCT FROM blocked.transactionid
AND blocking.classid IS NOT DISTINCT FROM blocked.classid
AND blocking.objid IS NOT DISTINCT FROM blocked.objid
AND blocking.objsubid IS NOT DISTINCT FROM blocked.objsubid
AND blocking.pid <> blocked.pid
JOIN pg_stat_activity blocking_activity
ON blocking_activity.pid = blocking.pid
WHERE NOT blocked.granted
AND blocking.granted;
CREATE OR REPLACE VIEW lab.table_health AS
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup,
seq_scan,
seq_tup_read,
idx_scan,
idx_tup_fetch,
n_tup_ins,
n_tup_upd,
n_tup_del,
n_tup_hot_upd,
last_vacuum,
last_autovacuum,
last_analyze,
last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC, n_live_tup DESC;
CREATE OR REPLACE VIEW lab.index_usage AS
SELECT
schemaname,
relname,
indexrelname,
idx_scan,
idx_tup_read,
idx_tup_fetch,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;
Views ini bukan pengganti observability platform. Ini adalah learning console agar diagnosis cepat.
11. Menjalankan Lab
Dari folder root:
make up
Cek health:
docker ps
Masuk ke database:
make psql
Cek versi:
SELECT version();
Cek extension:
SELECT extname, extversion
FROM pg_extension
ORDER BY extname;
Cek data awal:
SELECT count(*) FROM app.case_file;
SELECT count(*) FROM app.case_event;
SELECT count(*) FROM app.case_task;
Cek observability view:
SELECT * FROM lab.active_sessions;
SELECT * FROM lab.table_health;
SELECT * FROM lab.index_usage;
12. psql sebagai Engineering Console
psql bukan sekadar CLI. Untuk PostgreSQL engineer, psql adalah diagnostic console.
Command penting:
\conninfo -- connection info
\dn -- schemas
\dt app.* -- tables in schema
\d+ app.case_file
\di+ app.* -- indexes with size
\x on -- expanded display
\timing on -- show query timing
\watch 1 -- repeat query every 1 second
Contoh monitoring aktif:
SELECT * FROM lab.active_sessions;
\watch 1
Contoh melihat locks:
SELECT * FROM lab.blocking_locks;
\watch 1
Contoh top statements:
SELECT calls, mean_exec_ms, shared_blks_read, query_sample
FROM lab.top_statements
LIMIT 10;
13. EXPLAIN Template
Buat scripts/explain-template.sql:
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, SETTINGS)
SELECT
id,
tenant_id,
case_number,
status,
priority,
opened_at,
due_at
FROM app.case_file
WHERE tenant_id = 1
AND status = 'INVESTIGATION'
ORDER BY created_at DESC
LIMIT 20;
Jalankan:
psql postgresql://learnpg:learnpg@localhost:5432/learnpg -f scripts/explain-template.sql
Yang harus diamati:
- node paling atas;
- scan type;
- estimated rows vs actual rows;
- buffers hit/read;
- sort ada atau tidak;
- index yang dipakai;
- execution time;
- planning time;
- settings non-default.
Jangan fokus pada execution time saja. Untuk lab kecil, waktu bisa misleading karena cache.
14. Seed Data Besar
Buat scripts/seed-large.sql:
INSERT INTO app.case_file (
tenant_id,
case_number,
status,
priority,
subject_name,
subject_identifier,
assigned_team,
assigned_user_id,
opened_at,
due_at,
metadata
)
SELECT
(1 + (g % 20))::bigint AS tenant_id,
'CASE-LARGE-' || lpad(g::text, 10, '0') AS case_number,
CASE
WHEN g % 100 < 45 THEN 'CLOSED'
WHEN g % 100 < 60 THEN 'UNDER_REVIEW'
WHEN g % 100 < 75 THEN 'INVESTIGATION'
WHEN g % 100 < 85 THEN 'SUBMITTED'
WHEN g % 100 < 95 THEN 'ESCALATED'
ELSE 'DECISION_PENDING'
END::app.case_status AS status,
1 + (g % 5) AS priority,
'Large Subject ' || g AS subject_name,
'LSID-' || (10000000 + g) AS subject_identifier,
'TEAM-' || (1 + (g % 25)) AS assigned_team,
1 + (g % 1000) AS assigned_user_id,
now() - ((g % 1095) || ' days')::interval AS opened_at,
now() + ((g % 120) || ' days')::interval AS due_at,
jsonb_build_object(
'riskScore', g % 100,
'source', CASE WHEN g % 5 = 0 THEN 'api' WHEN g % 5 = 1 THEN 'portal' ELSE 'batch' END,
'region', 'R' || (g % 12),
'largeSeed', true
) AS metadata
FROM generate_series(1, 500000) AS g;
INSERT INTO app.case_event (
tenant_id,
case_file_id,
event_type,
actor_user_id,
payload,
occurred_at
)
SELECT
cf.tenant_id,
cf.id,
CASE e.n
WHEN 1 THEN 'CASE_CREATED'
WHEN 2 THEN 'CASE_ASSIGNED'
WHEN 3 THEN 'CASE_REVIEWED'
ELSE 'CASE_UPDATED'
END,
cf.assigned_user_id,
jsonb_build_object('status', cf.status, 'eventSequence', e.n),
cf.opened_at + (e.n || ' hours')::interval
FROM app.case_file cf
JOIN LATERAL generate_series(1, 4) AS e(n) ON true
WHERE cf.case_number LIKE 'CASE-LARGE-%';
INSERT INTO app.case_task (
tenant_id,
case_file_id,
task_type,
status,
assigned_user_id,
due_at
)
SELECT
cf.tenant_id,
cf.id,
CASE WHEN cf.priority <= 2 THEN 'URGENT_REVIEW' ELSE 'STANDARD_REVIEW' END,
CASE
WHEN cf.id % 10 = 0 THEN 'DONE'
WHEN cf.id % 10 = 1 THEN 'IN_PROGRESS'
ELSE 'OPEN'
END,
cf.assigned_user_id,
cf.due_at
FROM app.case_file cf
WHERE cf.case_number LIKE 'CASE-LARGE-%'
AND cf.id % 3 = 0;
ANALYZE;
Jalankan:
make seed
Estimasi data setelah seed:
case_file: sekitar 510.000 row;case_event: sekitar 2 juta row tambahan;case_task: sekitar 166.000 row tambahan.
Jumlah ini cukup untuk mulai melihat plan difference tanpa membuat laptop terlalu berat.
15. Workload dengan pgbench
PostgreSQL menyediakan pgbench sebagai tool sederhana untuk menjalankan benchmark berbasis transaksi berulang dan menghitung transaction rate. Dalam seri ini, pgbench dipakai bukan untuk klaim performa absolut, tetapi untuk membuat workload repeatable.
Buat pgbench/case-search.sql:
\set tenant_id random(1, 20)
\set priority random(1, 5)
SELECT id, case_number, status, priority, opened_at, due_at
FROM app.case_file
WHERE tenant_id = :tenant_id
AND priority = :priority
AND status IN ('UNDER_REVIEW', 'INVESTIGATION', 'ESCALATED')
ORDER BY created_at DESC
LIMIT 50;
Buat pgbench/case-update.sql:
\set case_id random(1, 500000)
BEGIN;
UPDATE app.case_file
SET
version = version + 1,
updated_at = now(),
metadata = jsonb_set(metadata, '{lastTouchedBy}', to_jsonb('pgbench'::text), true)
WHERE id = :case_id;
INSERT INTO app.case_event (
tenant_id,
case_file_id,
event_type,
actor_user_id,
payload
)
SELECT
tenant_id,
id,
'CASE_TOUCHED',
assigned_user_id,
jsonb_build_object('source', 'pgbench')
FROM app.case_file
WHERE id = :case_id;
COMMIT;
Buat pgbench/mixed-workload.sql:
\set tenant_id random(1, 20)
\set priority random(1, 5)
\set case_id random(1, 500000)
BEGIN;
SELECT id, case_number, status, priority, opened_at, due_at
FROM app.case_file
WHERE tenant_id = :tenant_id
AND priority = :priority
AND status IN ('UNDER_REVIEW', 'INVESTIGATION', 'ESCALATED')
ORDER BY created_at DESC
LIMIT 20;
UPDATE app.case_file
SET
version = version + 1,
updated_at = now()
WHERE id = :case_id;
COMMIT;
Jalankan:
pgbench postgresql://learnpg:learnpg@localhost:5432/learnpg \
-f pgbench/case-search.sql \
-c 8 \
-j 4 \
-T 60
Parameter:
-c 8: 8 concurrent clients;-j 4: 4 worker threads;-T 60: 60 detik;-f: script transaksi.
Setelah menjalankan workload:
SELECT calls, mean_exec_ms, shared_blks_hit, shared_blks_read, temp_blks_written, query_sample
FROM lab.top_statements
LIMIT 10;
16. Reset Statistik
Saat eksperimen, statistik lama bisa mengganggu interpretasi.
Reset pg_stat_statements:
SELECT pg_stat_statements_reset();
Reset stats database-wide butuh privilege lebih tinggi. Untuk lab, bisa masuk sebagai superuser container default jika diperlukan. Namun untuk sebagian besar part, reset pg_stat_statements sudah cukup.
Format eksperimen:
SELECT pg_stat_statements_reset();
-- run workload
SELECT * FROM lab.top_statements LIMIT 20;
17. Lock Demo
Buat scripts/locks-demo.sql:
BEGIN;
SELECT id, status
FROM app.case_file
WHERE id = 1
FOR UPDATE;
-- Keep this transaction open.
-- In another psql session, run:
-- UPDATE app.case_file SET priority = 5 WHERE id = 1;
Session A:
psql postgresql://learnpg:learnpg@localhost:5432/learnpg -f scripts/locks-demo.sql
Session B:
UPDATE app.case_file
SET priority = 5
WHERE id = 1;
Session C:
SELECT * FROM lab.blocking_locks;
Yang dipelajari:
- row-level lock bisa membuat session lain menunggu;
- blocker bisa ditemukan dari
pg_locksdanpg_stat_activity; - lock wait bukan deadlock;
- transaksi yang dibiarkan open adalah operational risk.
18. Deadlock Demo
Buat scripts/deadlock-session-a.sql:
BEGIN;
UPDATE app.case_file
SET priority = 2
WHERE id = 1;
SELECT pg_sleep(5);
UPDATE app.case_file
SET priority = 3
WHERE id = 2;
COMMIT;
Buat scripts/deadlock-session-b.sql:
BEGIN;
UPDATE app.case_file
SET priority = 4
WHERE id = 2;
SELECT pg_sleep(5);
UPDATE app.case_file
SET priority = 5
WHERE id = 1;
COMMIT;
Jalankan hampir bersamaan di dua terminal:
psql postgresql://learnpg:learnpg@localhost:5432/learnpg -f scripts/deadlock-session-a.sql
psql postgresql://learnpg:learnpg@localhost:5432/learnpg -f scripts/deadlock-session-b.sql
Salah satu transaksi akan gagal dengan deadlock detected.
Mental model:
Deadlock bukan sekadar “database error”. Deadlock adalah bukti bahwa dua transaksi mengambil resource dalam urutan berbeda.
Aplikasi Java harus memperlakukan deadlock sebagai retryable error hanya jika:
- operasi idempotent;
- side effect eksternal belum dikirim sebelum commit;
- retry punya limit dan jitter;
- telemetry mencatat retry count.
19. Bloat Demo Awal
Kita akan bahas bloat lebih dalam di part storage dan vacuum. Untuk sekarang, cukup lihat bahwa update membuat dead tuples.
Jalankan:
SELECT relname, n_live_tup, n_dead_tup
FROM lab.table_health
WHERE relname = 'case_file';
Update banyak row:
UPDATE app.case_file
SET metadata = jsonb_set(metadata, '{bloatDemo}', 'true'::jsonb, true)
WHERE tenant_id = 1;
Cek lagi:
SELECT relname, n_live_tup, n_dead_tup, n_tup_upd, n_tup_hot_upd
FROM lab.table_health
WHERE relname = 'case_file';
Lalu:
VACUUM (VERBOSE, ANALYZE) app.case_file;
Cek log PostgreSQL.
Tujuan bukan menghafal output vacuum. Tujuan awal adalah melihat bahwa update meninggalkan jejak storage-level.
20. WAL Growth Demo Awal
Cek current WAL LSN:
SELECT pg_current_wal_lsn();
Simpan LSN awal:
SELECT pg_current_wal_lsn() AS before_lsn \gset
Jalankan update:
UPDATE app.case_file
SET version = version + 1
WHERE tenant_id = 2;
Bandingkan WAL bytes:
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), :'before_lsn')) AS wal_generated;
Pelajaran:
- update menghasilkan WAL;
- index maintenance juga menghasilkan WAL;
- write-heavy design punya biaya replication, backup, storage, dan recovery.
21. Baseline Query Experiments
21.1 Query yang Didukung Index
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, case_number, status, created_at
FROM app.case_file
WHERE tenant_id = 1
AND status = 'INVESTIGATION'
ORDER BY created_at DESC
LIMIT 20;
Kemungkinan besar memakai index idx_case_file_tenant_status_created.
21.2 Query yang Tidak Sepenuhnya Didukung Index
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, case_number, status, priority, created_at
FROM app.case_file
WHERE tenant_id = 1
AND priority = 3
AND status IN ('UNDER_REVIEW', 'INVESTIGATION', 'ESCALATED')
ORDER BY created_at DESC
LIMIT 50;
Pertanyaan:
- Apakah index existing cukup?
- Apakah
priorityharus masuk index? - Apakah partial index lebih tepat?
- Apakah query ini read path penting atau hanya admin report?
Jangan langsung membuat index. Catat baseline dulu.
21.3 Query JSONB Tanpa Index
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, case_number, metadata
FROM app.case_file
WHERE metadata ->> 'region' = 'R3'
LIMIT 50;
Pertanyaan:
- Apakah ini butuh generated column?
- Apakah butuh expression index?
- Apakah butuh GIN index?
- Apakah field
regionseharusnya kolom relational biasa?
Kita akan jawab di part JSONB dan indexing.
22. Java Lab Placeholder
Buat java-lab/README.md:
# Java PostgreSQL Lab
This folder will be used in later parts for:
- pgJDBC experiments
- HikariCP pool sizing
- transaction timeout hierarchy
- Hibernate/JPA query behavior
- Flyway zero-downtime migration
- Testcontainers integration tests
Untuk sekarang, belum perlu membuat service Java. Kita akan membangun ketika sudah punya mental model query, lock, dan transaction.
Namun, dari awal kita sudah memilih lab structure yang kompatibel dengan Java integration.
23. Diagnostic Queries yang Wajib Dihafal Secara Fungsional
Bukan hafal karakter demi karakter, tetapi tahu kapan memakainya.
23.1 Active Sessions
SELECT
pid,
state,
wait_event_type,
wait_event,
now() - xact_start AS xact_age,
now() - query_start AS query_age,
left(query, 120) AS query
FROM pg_stat_activity
WHERE datname = current_database()
ORDER BY query_start NULLS LAST;
23.2 Blocking Sessions
SELECT * FROM lab.blocking_locks;
23.3 Top Statements
SELECT
calls,
mean_exec_ms,
total_exec_ms,
shared_blks_hit,
shared_blks_read,
temp_blks_written,
query_sample
FROM lab.top_statements
LIMIT 20;
23.4 Table Health
SELECT * FROM lab.table_health
WHERE schemaname = 'app';
23.5 Index Usage
SELECT * FROM lab.index_usage
WHERE schemaname = 'app';
23.6 Relation Size
SELECT
n.nspname AS schema_name,
c.relname AS relation_name,
pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size,
pg_size_pretty(pg_relation_size(c.oid)) AS heap_size,
pg_size_pretty(pg_indexes_size(c.oid)) AS indexes_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'app'
AND c.relkind = 'r'
ORDER BY pg_total_relation_size(c.oid) DESC;
24. Troubleshooting Setup
24.1 Port 5432 Sudah Dipakai
Ubah port mapping:
ports:
- "55432:5432"
Lalu connection URL:
postgresql://learnpg:learnpg@localhost:55432/learnpg
24.2 Extension Tidak Aktif
Cek:
SHOW shared_preload_libraries;
SELECT * FROM pg_extension;
Jika pg_stat_statements tidak aktif:
- pastikan
shared_preload_libraries = 'pg_stat_statements'; - restart container;
- jalankan
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;.
24.3 Init Script Tidak Berjalan
Script /docker-entrypoint-initdb.d hanya berjalan saat volume kosong.
Reset:
docker compose down -v
docker compose up -d
24.4 Query Terlalu Cepat untuk Dibedakan
Tambah data:
make seed
Atau jalankan query dengan predicate yang lebih buruk untuk latihan. Jangan membuat kesimpulan tuning dari dataset 10 row.
25. Safety Boundary: Lab vs Production
Jangan copy-paste konfigurasi lab ke production.
Contoh:
log_temp_files = 0bisa menghasilkan log besar pada workload tertentu.log_autovacuum_min_duration = 0terlalu noisy untuk production kecuali sedang investigasi.work_mem = 16MBmungkin terlalu besar atau terlalu kecil tergantung concurrency dan query shape.max_connections = 100tidak berarti aplikasi boleh membuat 100 koneksi aktif tanpa budget.pg_stat_statements.track = allperlu dipahami overhead dan volume statistiknya.
Production tuning harus berbasis workload dan metrics.
26. Checklist Selesai Part 002
Kamu boleh lanjut ke Part 003 jika bisa:
- menjalankan PostgreSQL 18 dengan Docker Compose;
- masuk via
psql; - melihat extension aktif;
- menjalankan query pada schema
app; - membaca
lab.active_sessions; - membaca
lab.top_statementssetelah workload; - menjalankan
EXPLAIN (ANALYZE, BUFFERS); - membuat lock wait sederhana dan menemukan blocker;
- menjalankan
pgbenchscript sederhana; - reset lab dengan
docker compose down -v.
27. Latihan
Latihan 1 — Baseline Plan
Jalankan:
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, case_number, status
FROM app.case_file
WHERE tenant_id = 3
AND status = 'UNDER_REVIEW'
ORDER BY created_at DESC
LIMIT 25;
Catat:
- scan type;
- index name;
- buffers hit/read;
- actual rows;
- execution time.
Latihan 2 — JSONB Predicate
Jalankan:
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, case_number
FROM app.case_file
WHERE metadata ->> 'source' = 'api'
LIMIT 100;
Catat apakah query melakukan sequential scan.
Jangan optimasi dulu. Kita akan kembali ke query ini di part JSONB dan indexing.
Latihan 3 — Lock Visibility
Buat satu session menahan row lock. Buat session lain menunggu. Temukan blocker memakai lab.blocking_locks.
Latihan 4 — Workload Stats
Reset pg_stat_statements, jalankan pgbench 60 detik, lalu ambil top statements.
SELECT pg_stat_statements_reset();
pgbench postgresql://learnpg:learnpg@localhost:5432/learnpg \
-f pgbench/mixed-workload.sql \
-c 8 -j 4 -T 60
SELECT * FROM lab.top_statements LIMIT 10;
28. Ringkasan
Di part ini kita membangun fondasi praktik:
- PostgreSQL 18 local lab;
- konfigurasi observability;
- schema enforcement case management;
- seed data;
pg_stat_statements;- diagnostic views;
pgbenchworkload;- lock/deadlock demo;
- EXPLAIN template;
- resettable experiment workflow.
Mulai part berikutnya, kita akan membongkar arsitektur PostgreSQL: process model, memory, storage, WAL, dan background workers. Tujuannya agar setiap gejala di lab punya tempat dalam mental model engine.
29. Referensi
- PostgreSQL 18 Documentation: https://www.postgresql.org/docs/current/
- PostgreSQL 18 Release Notes: https://www.postgresql.org/docs/current/release-18.html
- PostgreSQL
pg_stat_statements: https://www.postgresql.org/docs/current/pgstatstatements.html - PostgreSQL
pgbench: https://www.postgresql.org/docs/current/pgbench.html - PostgreSQL Monitoring Statistics: https://www.postgresql.org/docs/current/monitoring-stats.html
- PostgreSQL Runtime Statistics Configuration: https://www.postgresql.org/docs/current/runtime-config-statistics.html
You just completed lesson 02 in start here. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.
Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.