Final StretchOrdered learning track

Benchmarking PL/pgSQL: Micro, Macro, and Contention Tests

Learn PL/pgSQL In Action - Part 034

Benchmarking PL/pgSQL with micro, macro, and contention tests: pgbench scripts, realistic workloads, parameter skew, correctness invariants, concurrency harnesses, and regression thresholds.

11 min read2124 words
PrevNext
Lesson 3440 lesson track3440 Final Stretch
#postgresql#plpgsql#benchmarking#pgbench+5 more

Part 034 — Benchmarking PL/pgSQL: Micro, Macro, and Contention Tests

Goal: learn how to benchmark PL/pgSQL routines without fooling yourself. We will design microbenchmarks, macrobenchmarks, contention tests, replay workloads, correctness checks, and regression thresholds using PostgreSQL-native tooling and production-grade assumptions.

A benchmark is a claim.

When someone says:

This function handles 10,000 calls per second.

The real questions are:

  • with how many clients?
  • with what data volume?
  • with what parameter distribution?
  • with hot cache or cold cache?
  • with triggers enabled?
  • with audit/outbox enabled?
  • with realistic constraints and indexes?
  • with realistic lock contention?
  • with what isolation level?
  • on what hardware and PostgreSQL settings?
  • with what correctness checks after the run?

Without those answers, the benchmark is mostly theater.

This part builds a benchmarking discipline for PL/pgSQL.


1. Benchmarking Mental Model

Benchmarking is not “run a loop and print time.”

It is controlled evidence about behavior under a declared workload.

A useful benchmark starts with a question:

Bad Benchmark QuestionBetter Benchmark Question
Is this function fast?How does claim_next_job() behave at 1, 8, 32, and 128 clients with 1M pending jobs?
Is dynamic SQL faster?Does dynamic SQL reduce p95 latency for skewed tenants enough to justify planning overhead?
Can the trigger handle load?What is the write amplification of audit trigger under 500 updates/sec?
Is this state transition scalable?Does transition latency degrade when 20 workers target the same case group?

Benchmarks must be anchored to product behavior.


2. Benchmark Types

Benchmark TypeTargetExampleRisk If Missing
Microbenchmarknarrow routine/statementstatic SQL vs dynamic SQL functionlocal optimization guesses
Macrobenchmarkend-to-end workflowsubmit → validate → transition → audit → outboxmisses integration cost
Contention benchmarkconcurrent conflict pathworkers claim same queuerace and lock issues hidden
Data-scale benchmarkincreasing table size100k, 10M, 100M rowsfalse confidence from tiny data
Skew benchmarkuneven distributiontenant 1 has 80% rowsgeneric-plan traps hidden
Regression benchmarkcompare versionsold vs new functionperformance drift unnoticed
Failure benchmarkretries/errorsserialization failure pathretry storms hidden

Do not run only microbenchmarks. PL/pgSQL often looks fine in isolation and fails when triggers, locks, audit, and data skew join the party.


3. Benchmark Invariants

Every benchmark should declare invariants.

Example for a queue claim function:

## Invariants
- No job is claimed by more than one worker.
- Claimed jobs move from READY to CLAIMED exactly once.
- Failed claims do not modify job state.
- Total claimed count equals distinct worker claims.
- No transaction remains open after the benchmark.

Performance without correctness is irrelevant.

After every run, verify the data:

-- No duplicate claims.
SELECT job_id, count(*)
FROM bench.job_claim_log
GROUP BY job_id
HAVING count(*) > 1;

-- Status/count reconciliation.
SELECT status, count(*)
FROM bench.job
GROUP BY status
ORDER BY status;

-- Claimed jobs have a log.
SELECT count(*)
FROM bench.job j
WHERE j.status = 'CLAIMED'
  AND NOT EXISTS (
    SELECT 1
    FROM bench.job_claim_log l
    WHERE l.job_id = j.job_id
  );

A benchmark that does not validate invariants can reward broken code.


4. Use pgbench for Database-Native Load

pgbench is PostgreSQL's built-in benchmarking client. It can run repeated SQL command sequences across multiple concurrent sessions and report transaction rate.

Basic custom-script run:

pgbench \
  --client=16 \
  --jobs=4 \
  --time=120 \
  --progress=10 \
  --file=bench/claim_job.sql \
  appdb

Useful options:

OptionMeaning
--client / -cconcurrent database sessions
--jobs / -jworker threads in pgbench client
--time / -Tduration-based run
--transactions / -tfixed transactions per client
--progress / -Pperiodic progress output
--file / -fcustom transaction script
--rate / -Rtarget rate instead of maximum pressure
--latency-limit / -Lcount transactions above latency limit
--report-latenciesreport latency statistics
--aggregate-intervalaggregate logs by interval

Maximum-throughput tests and rate-limited tests answer different questions.

  • Maximum throughput asks: “Where does it saturate?”
  • Rate-limited test asks: “Does it meet SLA at expected traffic?”

Production readiness needs both.


5. Benchmark Schema Design

Create dedicated benchmark schemas. Do not benchmark against ad-hoc tables with unknown history.

CREATE SCHEMA IF NOT EXISTS bench;

DROP TABLE IF EXISTS bench.job_claim_log;
DROP TABLE IF EXISTS bench.job;

CREATE TABLE bench.job (
  job_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  tenant_id bigint NOT NULL,
  status text NOT NULL CHECK (status IN ('READY', 'CLAIMED', 'DONE', 'FAILED')),
  priority integer NOT NULL DEFAULT 0,
  available_at timestamptz NOT NULL DEFAULT clock_timestamp(),
  claimed_by text,
  claimed_at timestamptz,
  payload jsonb NOT NULL DEFAULT '{}'::jsonb
);

CREATE INDEX bench_job_ready_idx
ON bench.job (priority DESC, available_at, job_id)
WHERE status = 'READY';

CREATE TABLE bench.job_claim_log (
  claim_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  job_id bigint NOT NULL REFERENCES bench.job(job_id),
  worker_id text NOT NULL,
  claimed_at timestamptz NOT NULL DEFAULT clock_timestamp(),
  UNIQUE (job_id)
);

Seed realistic data:

INSERT INTO bench.job (tenant_id, status, priority, available_at, payload)
SELECT
  CASE
    WHEN gs <= 800000 THEN 1
    WHEN gs <= 950000 THEN 2
    ELSE 3
  END AS tenant_id,
  'READY',
  (random() * 100)::integer,
  clock_timestamp() - ((random() * 86400)::integer || ' seconds')::interval,
  jsonb_build_object('source', 'benchmark', 'n', gs)
FROM generate_series(1, 1000000) AS gs;

ANALYZE bench.job;

This creates skew. Skew is not a bug in the benchmark; it is production realism.


6. Benchmark Routine: Queue Claim

A realistic PL/pgSQL claim function:

CREATE OR REPLACE FUNCTION bench.claim_next_job(p_worker_id text)
RETURNS bigint
LANGUAGE plpgsql
AS $$
DECLARE
  v_job_id bigint;
BEGIN
  SELECT j.job_id
  INTO v_job_id
  FROM bench.job j
  WHERE j.status = 'READY'
    AND j.available_at <= clock_timestamp()
  ORDER BY j.priority DESC, j.available_at, j.job_id
  FOR UPDATE SKIP LOCKED
  LIMIT 1;

  IF v_job_id IS NULL THEN
    RETURN NULL;
  END IF;

  UPDATE bench.job j
  SET status = 'CLAIMED',
      claimed_by = p_worker_id,
      claimed_at = clock_timestamp()
  WHERE j.job_id = v_job_id;

  INSERT INTO bench.job_claim_log (job_id, worker_id)
  VALUES (v_job_id, p_worker_id);

  RETURN v_job_id;
END;
$$;

This routine contains real benchmark concerns:

  • selective partial index;
  • row lock;
  • SKIP LOCKED;
  • update;
  • log insert;
  • unique correctness guard;
  • concurrency behavior.

7. pgbench Script for Claim Function

bench/claim_job.sql:

\set worker_id random(1, 1000000)
SELECT bench.claim_next_job('worker-' || :worker_id);

Run matrix:

for c in 1 4 8 16 32 64; do
  pgbench \
    --client=$c \
    --jobs=4 \
    --time=120 \
    --progress=10 \
    --file=bench/claim_job.sql \
    appdb \
    > results/claim_job_c${c}.txt
 done

After each run, reset or reseed the dataset. Otherwise later runs benchmark a different workload.

Reset example:

TRUNCATE bench.job_claim_log;
UPDATE bench.job
SET status = 'READY',
    claimed_by = NULL,
    claimed_at = NULL;
VACUUM ANALYZE bench.job;

For large tables, full reset may be expensive. In that case, use isolated benchmark database snapshots, containerized instances, or partitioned benchmark runs.


8. Read the Results Correctly

A simple pgbench output may show transactions per second and latency. Do not stop there.

Capture database-side metrics around the run.

Before:

SELECT pg_stat_reset();
SELECT pg_stat_statements_reset();

After:

SELECT
  calls,
  total_exec_time,
  mean_exec_time,
  max_exec_time,
  rows,
  shared_blks_hit,
  shared_blks_read,
  temp_blks_written,
  wal_records,
  wal_bytes,
  left(query, 160) AS query_sample
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

Function timing:

SELECT
  schemaname,
  funcname,
  calls,
  total_time,
  self_time,
  total_time / NULLIF(calls, 0) AS avg_total_ms,
  self_time / NULLIF(calls, 0) AS avg_self_ms
FROM pg_stat_user_functions
WHERE schemaname = 'bench'
ORDER BY total_time DESC;

Lock snapshot during run:

SELECT
  wait_event_type,
  wait_event,
  count(*)
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY wait_event_type, wait_event
ORDER BY count(*) DESC;

Interpret benchmark with at least four dimensions:

  1. throughput;
  2. latency distribution;
  3. database resource profile;
  4. correctness invariants.

9. Microbenchmark: Static SQL vs Dynamic SQL

Question:

Does dynamic SQL improve parameter-sensitive lookup enough to justify replanning cost?

Schema:

CREATE TABLE bench.case_file (
  case_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  tenant_id bigint NOT NULL,
  status text NOT NULL,
  created_at timestamptz NOT NULL DEFAULT clock_timestamp()
);

CREATE INDEX bench_case_file_tenant_status_idx
ON bench.case_file (tenant_id, status, created_at DESC);

Skewed data:

INSERT INTO bench.case_file (tenant_id, status, created_at)
SELECT
  CASE WHEN gs <= 900000 THEN 1 ELSE gs END,
  CASE WHEN random() < 0.9 THEN 'OPEN' ELSE 'CLOSED' END,
  clock_timestamp() - ((random() * 31536000)::bigint || ' seconds')::interval
FROM generate_series(1, 1000000) gs;

ANALYZE bench.case_file;

Static function:

CREATE OR REPLACE FUNCTION bench.count_open_cases_static(p_tenant_id bigint)
RETURNS bigint
LANGUAGE plpgsql
AS $$
DECLARE
  v_count bigint;
BEGIN
  SELECT count(*)
  INTO v_count
  FROM bench.case_file c
  WHERE c.tenant_id = p_tenant_id
    AND c.status = 'OPEN';

  RETURN v_count;
END;
$$;

Dynamic function:

CREATE OR REPLACE FUNCTION bench.count_open_cases_dynamic(p_tenant_id bigint)
RETURNS bigint
LANGUAGE plpgsql
AS $$
DECLARE
  v_count bigint;
BEGIN
  EXECUTE
    'SELECT count(*) FROM bench.case_file c WHERE c.tenant_id = $1 AND c.status = $2'
  INTO v_count
  USING p_tenant_id, 'OPEN';

  RETURN v_count;
END;
$$;

pgbench script for skew:

\set tenant random(1, 1000000)
SELECT bench.count_open_cases_static(
  CASE WHEN :tenant <= 900000 THEN 1 ELSE :tenant END
);

Run both versions under the same data, same clients, same duration, same warmup. Compare:

  • mean latency;
  • p95/p99 if captured;
  • planning time if tracked;
  • total execution time;
  • plan shape for tenant 1 vs rare tenant;
  • correctness count.

Do not choose a winner from one client and one dataset.


10. Macrobenchmark: Workflow Path

Microbenchmarks are good for isolated hypotheses. Macrobenchmarks test real workflow cost.

Example case transition path:

Benchmark script:

\set case_id random(1, 1000000)
SELECT app.transition_case(:case_id, 'ESCALATED', 'benchmark');

But this is not enough. If cases can be transitioned only once, random ids will increasingly hit invalid states. That changes benchmark meaning.

Better script pattern:

WITH candidate AS (
  SELECT case_id
  FROM app.case_file
  WHERE status = 'OPEN'
  ORDER BY random()
  LIMIT 1
)
SELECT app.transition_case(candidate.case_id, 'ESCALATED', 'benchmark')
FROM candidate;

This is realistic but expensive because of ORDER BY random(). For benchmark generation, pre-create a candidate table:

CREATE TABLE bench.transition_candidate AS
SELECT case_id, row_number() OVER () AS rn
FROM app.case_file
WHERE status = 'OPEN';

CREATE UNIQUE INDEX ON bench.transition_candidate (rn);
ANALYZE bench.transition_candidate;

pgbench:

\set rn random(1, 1000000)
SELECT app.transition_case(c.case_id, 'ESCALATED', 'benchmark')
FROM bench.transition_candidate c
WHERE c.rn = :rn;

Then verify:

SELECT status, count(*)
FROM app.case_file
GROUP BY status;

SELECT count(*) FROM app.audit_event WHERE reason_code = 'benchmark';
SELECT count(*) FROM app.outbox_event WHERE aggregate_type = 'case_file';

Macrobenchmarks must include side effects.


11. Contention Benchmark

Contention tests ask:

What happens when multiple sessions want the same logical resource?

Example: same case group has a serial invariant.

CREATE TABLE bench.case_group_counter (
  group_id bigint PRIMARY KEY,
  open_count bigint NOT NULL
);

CREATE TABLE bench.case_item (
  case_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  group_id bigint NOT NULL,
  status text NOT NULL
);

Function:

CREATE OR REPLACE FUNCTION bench.close_case_with_counter(p_case_id bigint)
RETURNS void
LANGUAGE plpgsql
AS $$
DECLARE
  v_group_id bigint;
BEGIN
  SELECT group_id
  INTO STRICT v_group_id
  FROM bench.case_item
  WHERE case_id = p_case_id
  FOR UPDATE;

  UPDATE bench.case_item
  SET status = 'CLOSED'
  WHERE case_id = p_case_id
    AND status = 'OPEN';

  IF FOUND THEN
    UPDATE bench.case_group_counter
    SET open_count = open_count - 1
    WHERE group_id = v_group_id;
  END IF;
END;
$$;

Contention script intentionally targets a small group set:

\set case_id random(1, 10000)
SELECT bench.close_case_with_counter(:case_id);

Measure:

  • throughput collapse as clients increase;
  • lock waits;
  • deadlocks;
  • invariant errors;
  • p99 latency;
  • counter correctness.

Correctness assertion:

SELECT g.group_id, g.open_count, actual.actual_open_count
FROM bench.case_group_counter g
JOIN (
  SELECT group_id, count(*) AS actual_open_count
  FROM bench.case_item
  WHERE status = 'OPEN'
  GROUP BY group_id
) actual USING (group_id)
WHERE g.open_count <> actual.actual_open_count;

If the benchmark finds counter drift, the function is incorrect regardless of TPS.


12. Warmup, Cache, and Repeatability

Benchmarks are sensitive to cache state.

Run TypeMeaning
Cold-ish rundata not mostly in shared buffers / OS cache
Warm runrepeated workload benefits from cache
Steady-state runsystem has reached stable throughput/latency
Saturation runclients exceed capacity and queueing dominates

For application SLOs, steady-state under expected traffic is often more useful than maximum TPS.

Recommendations:

  • run warmup separately;
  • discard first interval;
  • repeat runs;
  • report variance;
  • keep data volume constant;
  • isolate benchmark machine;
  • record PostgreSQL settings;
  • record schema/index version;
  • record git commit/migration version;
  • record hardware/container resource limits.

Example run record:

## Benchmark Run
- Date: 2026-07-03
- PostgreSQL: 18.x
- Dataset: 10M case_file rows, 100M audit rows
- Script: bench/transition_case.sql
- Clients: 1, 8, 32, 64
- Duration: 180s + 30s warmup
- Hardware: 8 vCPU, 32GB RAM, NVMe
- Config changes: shared_buffers=8GB, work_mem=64MB
- App version: git sha ...
- Migration version: ...

Without a run record, benchmark results are hard to reproduce and easy to misuse.


13. Rate-Limited Benchmarking

Maximum pressure can hide SLA behavior.

Rate-limited test:

pgbench \
  --client=32 \
  --jobs=8 \
  --time=300 \
  --rate=500 \
  --latency-limit=200 \
  --file=bench/transition_case.sql \
  appdb

This asks:

At 500 transactions/sec, how many calls exceed 200ms?

That is closer to production SLO than “what is the maximum TPS?”

Use rate-limited tests for:

  • expected traffic;
  • Black Friday / peak model;
  • batch windows;
  • migration cutover load;
  • retry storm simulation.

14. Benchmarking Failure Paths

Many systems benchmark only success. Production often suffers in failure paths.

Failure path examples:

  • duplicate idempotency key;
  • invalid state transition;
  • serialization failure retry;
  • lock timeout;
  • permission denied;
  • missing referenced entity;
  • audit insert conflict;
  • outbox unavailable due to constraint problem.

Example invalid transition script:

\set case_id random(1, 1000000)
SELECT app.transition_case(:case_id, 'IMPOSSIBLE_STATUS', 'benchmark_failure');

If this throws and aborts every transaction, pgbench will report failures. That may be desired. But for controlled failure-path latency, wrap with a function that catches expected domain SQLSTATE and records result.

CREATE OR REPLACE FUNCTION bench.try_invalid_transition(p_case_id bigint)
RETURNS text
LANGUAGE plpgsql
AS $$
BEGIN
  PERFORM app.transition_case(p_case_id, 'IMPOSSIBLE_STATUS', 'benchmark_failure');
  RETURN 'unexpected_success';
EXCEPTION
  WHEN SQLSTATE 'P2401' THEN
    RETURN 'expected_domain_failure';
END;
$$;

Then benchmark:

\set case_id random(1, 1000000)
SELECT bench.try_invalid_transition(:case_id);

Failure handling can be more expensive than success if it writes logs, captures diagnostics, or performs compensating work.


15. Benchmarking Retry Behavior

Retry logic can amplify load.

Suppose a caller retries serialization failures. Benchmark the total attempt cost, not just successful calls.

Database helper:

CREATE TABLE bench.retry_observation (
  observation_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  worker_id text NOT NULL,
  attempt_no integer NOT NULL,
  sqlstate text,
  success boolean NOT NULL,
  observed_at timestamptz NOT NULL DEFAULT clock_timestamp()
);

Application-level retries are better tested from the application harness, because PL/pgSQL functions normally run inside a transaction controlled by the caller. Still, database-side observations can show retry storms.

Metrics:

  • attempts per successful transition;
  • SQLSTATE distribution;
  • latency including retries;
  • lock wait during retries;
  • duplicate side effects;
  • idempotency correctness.

A workflow is not production-ready until retry behavior is measured.


16. Benchmarking Triggers

Trigger cost is often invisible in naive benchmarks.

Create variants:

VariantPurpose
trigger disabled in isolated testlower-bound mutation cost
audit onlyaudit overhead
audit + outboxfull side-effect overhead
full production triggersrealistic path

Do not disable triggers in production. For benchmark isolation, use dedicated schemas or databases.

Measure statement with trigger effects:

BEGIN;
EXPLAIN (ANALYZE, BUFFERS, WAL)
UPDATE app.case_file
SET status = 'ESCALATED'
WHERE status = 'OPEN'
  AND tenant_id = 42
LIMIT 0; -- invalid syntax: use controlled predicate instead
ROLLBACK;

PostgreSQL does not support UPDATE ... LIMIT directly. Use a CTE:

BEGIN;
WITH candidate AS (
  SELECT case_id
  FROM app.case_file
  WHERE status = 'OPEN'
    AND tenant_id = 42
  ORDER BY case_id
  LIMIT 1000
)
EXPLAIN (ANALYZE, BUFFERS, WAL)
UPDATE app.case_file c
SET status = 'ESCALATED'
FROM candidate x
WHERE x.case_id = c.case_id;
ROLLBACK;

This tests bounded trigger cost.


17. Benchmarking Large Result Functions

Set-returning PL/pgSQL functions can produce misleading benchmarks. They may materialize results and hide memory or temp-file behavior.

Benchmark questions:

  • How many rows are returned?
  • Does the caller fetch all rows?
  • Is ordering required?
  • Does the function use RETURN NEXT loop or RETURN QUERY?
  • Are temp files created?
  • Does latency measure first row or full result?

Example bad benchmark:

SELECT count(*) FROM app.get_large_report(42);

This measures full consumption, not client streaming behavior.

Better variants:

-- Full result cost.
SELECT count(*) FROM app.get_large_report(42);

-- Page-like access.
SELECT * FROM app.get_large_report_page(42, 1000, NULL);

-- Query plan for underlying statement.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ... underlying report query ...;

For large result APIs, benchmark pagination/keyset design separately.


18. Regression Thresholds

A benchmark becomes useful when it becomes a gate.

Example threshold file:

routine: app.transition_case
workload: bench/transition_case.sql
dataset: case-management-10m-v3
thresholds:
  mean_latency_ms_max: 25
  p95_latency_ms_max: 80
  p99_latency_ms_max: 200
  tps_min: 500
  error_rate_max: 0.001
  duplicate_side_effects_max: 0
  wal_bytes_per_tx_max: 5000
  lock_wait_ratio_max: 0.02

Use thresholds carefully:

  • too strict causes noisy CI;
  • too loose catches nothing;
  • hardware-dependent thresholds must run on stable benchmark infrastructure;
  • logic thresholds are more stable than raw TPS.

Good gate:

No duplicate claims. p95 latency must not regress by more than 20% against baseline on benchmark runner.

Bad gate:

Must always be faster than 10ms everywhere.


19. Benchmark Result Table

Store benchmark results as data.

CREATE TABLE bench.benchmark_run (
  run_id uuid PRIMARY KEY,
  benchmark_name text NOT NULL,
  git_sha text,
  migration_version text,
  postgres_version text NOT NULL DEFAULT version(),
  dataset_name text NOT NULL,
  clients integer NOT NULL,
  duration_seconds integer NOT NULL,
  started_at timestamptz NOT NULL DEFAULT clock_timestamp(),
  notes text
);

CREATE TABLE bench.benchmark_metric (
  run_id uuid NOT NULL REFERENCES bench.benchmark_run(run_id),
  metric_name text NOT NULL,
  metric_value numeric NOT NULL,
  unit text NOT NULL,
  PRIMARY KEY (run_id, metric_name)
);

This enables trend analysis:

SELECT
  r.started_at,
  r.git_sha,
  r.clients,
  m.metric_value AS p95_latency_ms
FROM bench.benchmark_run r
JOIN bench.benchmark_metric m USING (run_id)
WHERE r.benchmark_name = 'transition_case'
  AND m.metric_name = 'p95_latency_ms'
ORDER BY r.started_at;

Performance work becomes engineering when history is queryable.


20. Benchmark Anti-Patterns

Anti-PatternWhy It Misleads
tiny datasethides indexes, cache, partition, and stats issues
one client onlyhides lock/contention behavior
max TPS onlyignores SLA and tail latency
no correctness checkrewards broken code
no warmup policymixes startup effects with steady-state
random data onlyhides skew and hot keys
benchmark on developer laptop onlypoor production prediction
disabled triggershides real workflow cost
no version recordimpossible to reproduce
averages onlyhides p95/p99 pain
comparing different datasetsinvalid conclusion
no reset between runsworkload changes over time

A bad benchmark is worse than no benchmark because it creates confidence in the wrong thing.


21. Practical Benchmark Playbook

For any important PL/pgSQL routine:

  1. Define the product question.
  2. Define invariants.
  3. Create realistic data volume.
  4. Include skew and hot keys.
  5. Benchmark success path.
  6. Benchmark failure path.
  7. Benchmark concurrency path.
  8. Capture pg_stat_statements.
  9. Capture function stats.
  10. Capture lock/wait snapshots.
  11. Verify correctness after the run.
  12. Repeat across a client matrix.
  13. Store results.
  14. Add regression threshold.
  15. Document interpretation.

22. Example Final Benchmark Report

# Benchmark Report: app.transition_case

## Question
Can `app.transition_case()` sustain expected peak traffic of 500 transitions/sec with p95 < 80ms and no duplicate audit/outbox side effects?

## Workload
- Dataset: 10M cases, 50M audit rows, 10M outbox rows
- Distribution: tenant 1 owns 70% of open cases
- Clients: 1, 8, 16, 32, 64
- Duration: 5 minutes each, 30s warmup discarded
- Script: bench/transition_case.sql

## Correctness Invariants
- no invalid state transition accepted
- every successful transition has exactly one audit event
- every successful transition has exactly one outbox event
- no case has duplicate transition for same command id

## Results
| Clients | TPS | p95 ms | p99 ms | Error % | Notes |
|---:|---:|---:|---:|---:|---|
| 1 | ... | ... | ... | ... | baseline |
| 8 | ... | ... | ... | ... | stable |
| 16 | ... | ... | ... | ... | stable |
| 32 | ... | ... | ... | ... | lock wait begins |
| 64 | ... | ... | ... | ... | saturation |

## Diagnosis
Saturation begins around 32 clients due to row lock contention on hot tenant transition sequence.

## Decision
Accept for expected load with rate limit. For batch escalation, use partitioned worker by tenant group.

This is the level of evidence expected in serious engineering organizations.


23. What Top Engineers Do Differently

They do not benchmark to win an argument. They benchmark to reduce uncertainty.

A top-tier PL/pgSQL benchmark is:

  • explicit about workload;
  • honest about data distribution;
  • concurrent enough to expose locks;
  • strict about correctness;
  • repeatable;
  • connected to SLOs;
  • tracked over time;
  • interpreted with database evidence, not just client-side TPS.

The goal is not a pretty number.

The goal is this sentence:

Under this workload, with this data shape, this routine preserves these invariants and meets these latency/throughput bounds until this saturation point.

That sentence is engineering.


References

Lesson Recap

You just completed lesson 34 in final stretch. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.