Series/Learn Database Design and Architect

Start HereOrdered learning track

Workload-First Design

Learn Database Design and Architect - Part 004

Production-grade workload-first database design, covering OLTP, OLAP, HTAP, query shapes, write paths, access patterns, consistency, latency, throughput, contention, and architectural tradeoffs.

[2026-07-04]25 min read4901 words

In This Lesson

1. Tujuan Part Ini 2. The Core Mistake: Schema-First Without Workload 3. Workload-First Design Definition

PrevNext

Lesson 0484 lesson track01–15 Start Here

#database#architecture#workload#oltp+5 more

Part 004 — Workload-First Design

Database design yang buruk sering dimulai dari bentuk data.

Database design yang kuat dimulai dari workload.

Bukan berarti data model tidak penting. Data model sangat penting. Tetapi di sistem produksi, model yang “rapi” bisa tetap gagal jika tidak cocok dengan pola akses, volume, latency target, contention, growth, dan consistency requirement.

Part ini membahas workload-first design: cara mendesain database dari perilaku sistem nyata.

Mental model utamanya:

Database bukan hanya struktur tabel. Database adalah mesin yang menerima workload: reads, writes, transactions, scans, joins, locks, replication, backups, migrations, analytics, and failures.

1. Tujuan Part Ini

Setelah part ini, kamu harus mampu:

Mengklasifikasi workload sebelum memilih database atau schema.
Membedakan OLTP, OLAP, HTAP, event-heavy, search-heavy, graph-heavy, dan write-heavy workload.
Membaca query shape sebagai input desain fisik.
Menentukan kapan normalisasi bagus dan kapan read model/projection diperlukan.
Mengidentifikasi contention point sebelum production incident.
Menentukan index, partition, sharding, replication, dan caching dari workload, bukan dari tebakan.
Membuat workload profile sebagai bagian dari database design document.

2. The Core Mistake: Schema-First Without Workload

Contoh schema tampak benar:

CREATE TABLE customer (
    customer_id UUID PRIMARY KEY,
    name TEXT NOT NULL,
    email TEXT NOT NULL UNIQUE
);

CREATE TABLE orders (
    order_id UUID PRIMARY KEY,
    customer_id UUID NOT NULL REFERENCES customer(customer_id),
    status TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL
);

CREATE TABLE order_item (
    order_item_id UUID PRIMARY KEY,
    order_id UUID NOT NULL REFERENCES orders(order_id),
    product_id UUID NOT NULL,
    quantity INT NOT NULL,
    price_amount NUMERIC(19, 4) NOT NULL
);

Secara logical, ini masuk akal.

Tetapi workload bisa membuat desain ini gagal.

Pertanyaan yang belum dijawab:

Apakah order dibuat 10 per menit atau 50.000 per detik?
Apakah query utama adalah order detail by ID atau dashboard agregasi harian?
Apakah customer melihat order history dengan pagination?
Apakah admin filter order berdasarkan status dan date range?
Apakah reporting butuh join jutaan row setiap menit?
Apakah order item immutable?
Apakah status sering berubah?
Apakah ada hot customer dengan jutaan order?
Apakah order ID random menyebabkan index locality buruk?
Apakah created_at partition diperlukan?
Apakah read replica cukup atau butuh projection table?
Apakah write transaction menyentuh inventory, payment, fraud, ledger?

Tanpa workload, schema hanya setengah desain.

3. Workload-First Design Definition

Workload-first design adalah pendekatan yang mendesain database berdasarkan:

Access patterns: query dan command yang benar-benar dijalankan.
Frequency: seberapa sering operasi terjadi.
Latency target: berapa cepat operasi harus selesai.
Throughput target: berapa banyak operasi per detik/menit.
Consistency need: seberapa segar/benar data harus terlihat.
Transaction boundary: apa yang harus atomic.
Contention profile: row/key mana yang sering diperebutkan.
Data growth: bagaimana volume bertambah.
Retention: berapa lama data aktif/arsip.
Failure tolerance: apa yang boleh degrade.

Schema adalah hasil dari semua ini, bukan starting point tunggal.

4. Workload Taxonomy

Ada banyak jenis workload. Architect harus bisa mengklasifikasi dengan cepat.

Setiap jenis punya desain berbeda.

5. OLTP Workload

OLTP adalah workload transaksi operasional.

Ciri-ciri:

banyak read/write kecil,
latency rendah,
transaksi pendek,
consistency penting,
query by primary key atau selective index,
concurrency tinggi,
rollback harus aman,
data current state penting.

Contoh:

create order,
update case status,
approve application,
assign task,
capture payment,
change customer profile.

OLTP Design Priorities

Keep transactions short.
Keep indexes selective and purposeful.
Avoid long-running analytical scans on primary database.
Avoid hot counters.
Enforce critical invariants close to transaction boundary.
Design for idempotency and retry.
Control lock ordering.
Separate operational read model from heavy reporting when necessary.

OLTP Query Shape Example

SELECT *
FROM case_file
WHERE case_id = :case_id;

Ini simple primary key lookup.

SELECT case_id, case_number, priority, due_at
FROM case_file
WHERE assigned_to = :user_id
  AND status IN ('ASSIGNED', 'UNDER_REVIEW')
ORDER BY due_at ASC
LIMIT 50;

Ini inbox query. Butuh index yang mengikuti predicate dan ordering.

Contoh index:

CREATE INDEX idx_case_inbox
ON case_file (assigned_to, status, due_at, case_id);

Index ini tidak muncul dari normalisasi. Ia muncul dari workload.

6. OLAP Workload

OLAP adalah workload analitik.

Ciri-ciri:

scan data besar,
aggregation,
group by,
joins besar,
query lebih lama,
freshness bisa delayed,
columnar storage sering cocok,
dimensional model sering lebih cocok.

Contoh:

monthly enforcement report,
revenue by region,
SLA breach trends,
cohort analysis,
regulator dashboard,
historical compliance metrics.

OLAP Design Priorities

Define grain.
Separate fact and dimension.
Optimize scans and aggregations.
Use snapshots when report must be reproducible.
Avoid running large analytical queries on OLTP primary.
Decide freshness SLA.
Design lineage and reconciliation.

OLAP Query Example

SELECT
    date_trunc('month', closed_at) AS month,
    violation_type,
    count(*) AS closed_cases,
    avg(extract(epoch FROM closed_at - opened_at) / 86400) AS avg_days_to_close
FROM case_fact
WHERE closed_at >= date '2026-01-01'
  AND closed_at < date '2027-01-01'
GROUP BY 1, 2
ORDER BY 1, 2;

Ini bukan query yang ideal dijalankan terus-menerus di OLTP primary jika volume besar.

7. HTAP Workload

HTAP mencoba menggabungkan operational dan analytical workload.

Ini menggoda, tapi harus hati-hati.

Masalah utama:

transaksi kecil butuh latency rendah,
analitik butuh scan besar,
dua workload bersaing CPU, memory, IO,
index untuk OLTP tidak sama dengan layout untuk OLAP,
consistency/freshness expectation berbeda.

HTAP masuk akal jika:

analytical query tidak terlalu berat,
freshness sangat penting,
platform memang dirancang untuk hybrid,
workload isolation kuat,
tim punya operational maturity.

Jika tidak, lebih aman:

8. Event-Heavy Workload

Event-heavy workload berpusat pada append, ordering, replay, dan projection.

Ciri-ciri:

banyak insert append-only,
event immutable,
ordering penting,
consumers banyak,
deduplication perlu,
projection bisa stale,
replay/rebuild dibutuhkan.

Contoh:

audit events,
workflow events,
payment events,
activity timeline,
CDC outbox,
sensor ingestion.

Schema contoh:

CREATE TABLE domain_event (
    event_id UUID PRIMARY KEY,
    aggregate_type TEXT NOT NULL,
    aggregate_id UUID NOT NULL,
    event_type TEXT NOT NULL,
    event_version INT NOT NULL,
    occurred_at TIMESTAMPTZ NOT NULL,
    recorded_at TIMESTAMPTZ NOT NULL,
    sequence_no BIGINT NOT NULL,
    payload JSONB NOT NULL,
    metadata JSONB NOT NULL,
    UNIQUE (aggregate_type, aggregate_id, sequence_no)
);

Index dari workload:

CREATE INDEX idx_domain_event_aggregate
ON domain_event (aggregate_type, aggregate_id, sequence_no);

CREATE INDEX idx_domain_event_recorded_at
ON domain_event (recorded_at);

Pertanyaan workload:

query by aggregate?
query by time?
replay all events?
replay one aggregate?
consume by sequence?
partition by time atau aggregate?
event retention berapa lama?

9. Search-Heavy Workload

Search-heavy workload tidak cocok dipaksa menjadi relational filtering biasa.

Ciri-ciri:

full-text search,
fuzzy match,
ranking,
highlighting,
faceting,
relevance scoring,
language analyzer,
partial match,
large result set.

Contoh:

search case by free text,
search document evidence,
customer support search,
legal/regulatory document search,
product catalog search.

Pattern umum:

System of record tetap database utama. Search index adalah projection.

Pertanyaan penting:

Search result boleh stale berapa lama?
Apa yang terjadi jika search index tertinggal?
Apakah authorization filter diterapkan di search index atau setelah fetch dari DB?
Apakah deleted/hidden records cepat hilang dari index?
Apakah indexing idempotent?

Search-heavy design bukan hanya “tambahkan Elasticsearch/OpenSearch”. Boundary kebenarannya harus jelas.

10. Graph-Heavy Workload

Graph-heavy workload muncul ketika pertanyaan utama adalah relationship traversal.

Contoh:

beneficial ownership,
fraud ring,
dependency graph,
social graph,
regulatory relationship network,
case-to-entity-to-document-to-transaction link analysis.

Relational database bisa menyimpan graph, tetapi traversal kompleks bisa sulit.

Pertanyaan workload:

traversal depth berapa?
graph berubah sering atau jarang?
edge punya property?
perlu shortest path?
perlu community detection?
query real-time atau offline?
correctness requirement tinggi?

Relational edge table:

CREATE TABLE entity_relationship (
    relationship_id UUID PRIMARY KEY,
    from_entity_id UUID NOT NULL,
    to_entity_id UUID NOT NULL,
    relationship_type TEXT NOT NULL,
    valid_from TIMESTAMPTZ NOT NULL,
    valid_to TIMESTAMPTZ,
    source_document_id UUID,
    confidence_score NUMERIC(5, 4),
    created_at TIMESTAMPTZ NOT NULL
);

CREATE INDEX idx_relationship_from
ON entity_relationship (from_entity_id, relationship_type);

CREATE INDEX idx_relationship_to
ON entity_relationship (to_entity_id, relationship_type);

Ini cukup untuk depth rendah dan query sederhana.

Jika traversal mendalam dan dynamic, graph database atau graph projection bisa lebih cocok.

11. Time-Series Workload

Time-series workload berpusat pada waktu.

Ciri-ciri:

append-heavy,
query by time range,
aggregation by bucket,
retention/rollup,
high ingestion,
late-arriving data mungkin ada,
compression/partitioning penting.

Contoh:

metrics,
audit log,
sensor data,
price feed,
operational events,
SLA snapshots.

Schema sederhana:

CREATE TABLE service_metric_sample (
    service_name TEXT NOT NULL,
    metric_name TEXT NOT NULL,
    sampled_at TIMESTAMPTZ NOT NULL,
    value DOUBLE PRECISION NOT NULL,
    tags JSONB NOT NULL DEFAULT '{}'::jsonb,
    PRIMARY KEY (service_name, metric_name, sampled_at)
);

Workload questions:

Ingestion rate?
Query latest or historical range?
Retention raw data?
Rollup resolution?
Cardinality of tags?
Late samples allowed?
Need downsampling?

Time-series failure mode yang sering terjadi: high-cardinality label/tag meledakkan storage dan index.

12. Ledger / Append-Only Workload

Ledger workload menuntut auditability dan balance correctness.

Ciri-ciri:

append-only entries,
no silent update,
correction via reversal/adjustment,
idempotency wajib,
reconciliation penting,
sum/aggregate consistency penting,
ordering often important.

Contoh:

wallet,
accounting,
points balance,
inventory movement,
enforcement penalty ledger,
quota usage ledger.

Pattern:

CREATE TABLE ledger_entry (
    entry_id UUID PRIMARY KEY,
    ledger_id UUID NOT NULL,
    transaction_id UUID NOT NULL,
    entry_type TEXT NOT NULL,
    amount NUMERIC(19, 4) NOT NULL,
    currency TEXT NOT NULL,
    direction TEXT NOT NULL CHECK (direction IN ('DEBIT', 'CREDIT')),
    occurred_at TIMESTAMPTZ NOT NULL,
    recorded_at TIMESTAMPTZ NOT NULL,
    reversal_of_entry_id UUID,
    UNIQUE (ledger_id, transaction_id, entry_type)
);

Design principle:

Jangan update balance sebagai satu-satunya truth. Balance adalah derived state dari movements/entries, walaupun boleh disimpan sebagai cached current balance dengan reconciliation.

13. Read Workload vs Write Workload

Setiap table punya read/write profile.

Contoh:

Entity	Write Pattern	Read Pattern	Risk
case_file	moderate updates	frequent inbox/detail	lock contention on status updates
case_event	high append	replay/time query	storage growth
customer	low update	frequent lookup	PII/security
audit_log	high append	rare investigation	retention/index bloat
dashboard_summary	batch/async update	frequent dashboard read	stale data
SLA snapshot	periodic write	trend analysis	high volume

A good database design document always contains workload table like this.

14. Query Shape Is Architecture Input

Query shape means the structure of the query:

equality lookup,
range scan,
prefix search,
full-text search,
aggregate,
join-heavy,
top-N,
pagination,
existence check,
anti-join,
recursive traversal.

Each shape has different design implications.

14.1 Equality Lookup

SELECT * FROM customer WHERE customer_id = :id;

Design:

primary key,
direct lookup,
cacheable,
low latency.

14.2 Range Query

SELECT *
FROM case_file
WHERE opened_at >= :from
  AND opened_at < :to
ORDER BY opened_at;

Design:

index on opened_at,
possibly partition by time,
beware large range scans.

14.3 Composite Filter + Sort

SELECT *
FROM case_file
WHERE assigned_to = :user_id
  AND status = 'UNDER_REVIEW'
ORDER BY due_at
LIMIT 50;

Design:

CREATE INDEX idx_case_assignee_status_due
ON case_file (assigned_to, status, due_at);

14.4 Top-N Query

SELECT *
FROM enforcement_case
WHERE status = 'OPEN'
ORDER BY risk_score DESC
LIMIT 100;

Design:

index aligned with status + risk_score,
precomputed risk_score if expensive,
avoid sorting huge active set repeatedly.

14.5 Aggregation

SELECT status, count(*)
FROM case_file
GROUP BY status;

For small table, okay.

For huge active table with frequent dashboard access, consider summary table/projection.

14.6 Pagination

Bad pattern:

SELECT *
FROM case_file
ORDER BY opened_at DESC
OFFSET 500000
LIMIT 50;

Large offset gets expensive.

Better pattern:

SELECT *
FROM case_file
WHERE opened_at < :last_seen_opened_at
ORDER BY opened_at DESC
LIMIT 50;

This is keyset pagination.

15. Command Shape Is Also Architecture Input

Reads matter. Writes matter more for consistency.

Command shape asks:

what rows are touched?
what invariants checked?
what locks acquired?
what events emitted?
what indexes updated?
what downstream side effects triggered?

Example command: assignCase.

Input:
- case_id
- assignee
- assigned_by
- reason
- command_id

Checks:
- case exists
- case not closed
- assignee eligible
- no active primary assignment conflict

Writes:
- update case_file.current_assignee
- insert case_assignment_history
- insert case_event
- maybe update SLA

Emits:
- CaseAssigned event

Transaction design:

Command shape determines transaction boundary.

16. Frequency Matters

A query executed once per day can be inefficient.

A query executed 500 times per second cannot.

A write path executed once per minute can have more checks.

A write path executed 20.000 times per second must be minimal and carefully indexed.

Workload design requires rough numbers.

Example table:

Operation	Frequency	Latency Target	Consistency	Notes
Get case detail	200 rps	p95 < 80ms	strong enough from primary/read replica	user-facing
List my active cases	100 rps	p95 < 120ms	read-your-writes preferred	inbox
Assign case	5 rps	p95 < 300ms	strong	transaction-critical
Add evidence	20 rps	p95 < 500ms	strong metadata, blob async ok	document path
Dashboard counts	50 rps	p95 < 200ms	stale <= 5 min ok	projection acceptable
Monthly report	1/day	minutes ok	reproducible	analytical store

Without numbers, arguments become taste-based.

17. Latency Budget Thinking

Database is only part of request latency.

Example target:

API p95 target: 300ms

Budget:

Component	Budget
API gateway/auth	30ms
application logic	50ms
database query	100ms
downstream call	80ms
serialization/network	40ms

If one DB query consumes 250ms, the whole API cannot meet target.

Architectural implication:

avoid N+1 queries,
avoid unbounded scans,
avoid lock waits,
avoid synchronous reporting queries,
avoid remote cross-region writes in latency-sensitive path unless required.

18. Throughput and Write Amplification

Every write updates more than the row.

A write may update:

heap/table page,
primary key index,
secondary indexes,
WAL/log,
replication stream,
CDC/outbox,
triggers,
materialized counters,
foreign key checks,
backups/storage snapshots indirectly.

So this:

UPDATE case_file SET status = 'UNDER_REVIEW' WHERE case_id = :id;

is not “one simple write” if the table has 12 indexes, triggers, replication, and CDC.

Index design must account for write workload.

Rule:

Every index is a read optimization and a write cost.

19. Contention Analysis

Contention happens when many transactions compete for the same row, key range, index page, partition, or resource.

Common hotspots:

global counter row,
sequential queue head,
same tenant partition,
same account balance,
same inventory item,
same case updated by many actors,
monotonically increasing index page,
leader region in global database,
lock table for scheduler.

Example bad counter:

UPDATE sequence_counter
SET value = value + 1
WHERE name = 'case_number';

At high throughput, this row is a hotspot.

Alternatives:

database sequence,
hi/lo allocation,
partitioned counters,
time-prefixed IDs,
async number assignment,
per-tenant sequence if domain allows.

Contention should be identified before launch.

20. Workload and Index Design

Index design should answer actual query shapes.

Bad index strategy:

Add index to every foreign key and every column used somewhere.

Better strategy:

List critical queries.
Identify predicates.
Identify join keys.
Identify sort/group requirements.
Estimate selectivity.
Design composite indexes for high-value queries.
Remove redundant indexes.
Measure execution plans.

Example queries:

-- Query A: user inbox
WHERE assigned_to = ? AND status IN (?, ?) ORDER BY due_at LIMIT 50

-- Query B: supervisor queue
WHERE region = ? AND status = ? AND priority = ? ORDER BY risk_score DESC LIMIT 100

-- Query C: case lookup
WHERE case_number = ?

Possible indexes:

CREATE INDEX idx_case_user_inbox
ON case_file (assigned_to, status, due_at, case_id);

CREATE INDEX idx_case_supervisor_queue
ON case_file (region, status, priority, risk_score DESC, case_id);

CREATE UNIQUE INDEX uq_case_number
ON case_file (case_number);

Do not create one giant index hoping it solves all queries.

21. Workload and Partitioning

Partitioning solves specific problems:

large table maintenance,
time-based retention,
pruning scans,
reducing index size per partition,
archiving old data,
isolating hot/cold data.

Partitioning does not automatically make all queries faster.

It helps when query predicates align with partition key.

Example time partition:

case_event_2026_01
case_event_2026_02
case_event_2026_03

Good query:

WHERE recorded_at >= '2026-03-01'
  AND recorded_at < '2026-04-01'

Bad query for time partition only:

WHERE aggregate_id = :case_id

This may need to search many partitions unless there is additional design.

Partition key is workload decision.

22. Workload and Sharding

Sharding distributes data across nodes.

It is not just “partitioning at larger scale”.

Sharding impacts:

transactions,
joins,
uniqueness,
foreign keys,
resharding,
operational complexity,
backup/restore,
tenant movement,
query routing,
incident blast radius.

Shard key must be chosen from workload.

Bad shard key signs:

low cardinality,
highly skewed,
frequently updated,
not present in common queries,
causes cross-shard transactions,
creates hot shard.

Example SaaS:

tenant_id as shard key

Good if most queries are tenant-scoped.

Bad if one tenant is 70% of traffic.

Mitigation:

split large tenants,
virtual shards,
tenant tiering,
dedicated database for huge tenants,
tenant-aware capacity planning.

23. Workload and Consistency

Not every read needs the same consistency.

Classify reads:

Read Type	Consistency Need	Example
Command precondition	strong	close case only if decision approved
User confirmation after write	read-your-writes	show submitted request
Dashboard	bounded stale okay	counts refresh every 5 min
Audit investigation	exact/reproducible	decision history
Search	eventually consistent often okay	free-text search
Notification	at-least-once okay with dedupe	email/push

Overusing strong consistency everywhere hurts scalability.

Underusing strong consistency corrupts invariants.

Architect's job is classification.

24. Workload and Read Replicas

Read replicas help only certain workloads.

They can reduce load on primary, but introduce replication lag.

Good candidates:

read-only dashboards with stale tolerance,
heavy list pages not requiring latest write,
exports,
reporting snapshots,
internal tooling.

Bad candidates:

command precondition checks,
read immediately after write when user expects latest,
authorization-critical reads if lag can leak permissions,
fraud/risk checks requiring latest state.

Read replica decision must include stale-read risk.

Example:

User changes role from ADMIN to VIEWER.
Replica lags 10 seconds.
Authorization reads from replica.
User still appears ADMIN.

This is not just performance issue. This is security issue.

25. Workload and Caching

Cache is not a substitute for data model.

Cache helps when:

data is read frequently,
data changes less frequently,
stale data acceptable or controllable,
cache key is clear,
invalidation strategy exists,
source of truth remains clear.

Cache hurts when:

correctness depends on latest data,
invalidation is unclear,
cached object embeds authorization-sensitive data,
write frequency high,
cache stampede occurs,
cache becomes hidden database.

Cache design questions:

cache-aside or write-through?
TTL or explicit invalidation?
per-user or global key?
negative caching?
stampede prevention?
what is stale tolerance?
what happens during cache outage?

Never add cache before knowing query plan and workload.

26. Workload and Materialized Views / Projections

If read workload is expensive and can tolerate staleness, use projection.

Example operational dashboard:

CREATE TABLE case_dashboard_summary (
    region TEXT NOT NULL,
    status TEXT NOT NULL,
    priority TEXT NOT NULL,
    count_cases BIGINT NOT NULL,
    last_refreshed_at TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (region, status, priority)
);

Maintained by:

trigger,
synchronous transaction,
async event consumer,
scheduled batch,
materialized view refresh.

Tradeoff:

Strategy	Freshness	Write Cost	Failure Mode
Synchronous update	immediate	higher	command fails if summary fails
Trigger	immediate	hidden complexity	hard debugging
Async event	eventual	lower	projection lag/drift
Batch refresh	delayed	lower	stale dashboard
Materialized view	depends	medium/high	refresh cost/locking depending DB

Projection must have rebuild path.

27. Workload and Data Shape

Relational normalization optimizes correctness and reduces redundancy.

But some read workloads need pre-joined or nested shape.

Example detail page:

Case detail page needs:
- case header
- current assignment
- latest 10 events
- evidence count
- open tasks
- SLA status
- decision summary

Naive implementation:

1 query case
1 query assignment
1 query events
1 query evidence count
1 query tasks
1 query SLA
1 query decision

Maybe okay. Maybe not.

If page is hot and latency target low, consider:

joined query with proper indexes,
read model table,
materialized case summary,
API aggregation with batch fetch,
cache for stable components.

Do not denormalize blindly. Denormalize from measured workload.

28. Workload and N+1 Query Risk

N+1 pattern:

Fetch 50 cases
For each case, fetch assignee
For each case, fetch latest event
For each case, fetch SLA

Result: 151 queries.

At small scale, invisible.

At production scale, latency and database load explode.

Solutions:

batch queries,
joins,
precomputed read model,
data loader pattern,
API response redesign,
pagination limit.

Database architect should review application query behavior, not just schema.

29. Workload and Transaction Length

Long transactions are dangerous in OLTP.

They can hold locks, delay vacuum/cleanup, increase conflict probability, and create cascading waits.

Bad pattern:

BEGIN
  update case
  call external document service
  call notification service
  write audit
COMMIT

Never hold DB transaction while waiting for external service unless absolutely unavoidable.

Better:

BEGIN
  update case
  write audit
  write outbox event
COMMIT

async worker sends notification/document side effect

Transaction should cover state change and durable intent for side effects, not the side effects themselves.

30. Workload and Batch Jobs

Batch jobs are often the hidden killer of OLTP systems.

Examples:

nightly report,
backfill,
cleanup,
reconciliation,
export,
reindex,
data quality scan,
retention purge.

Batch workload questions:

How many rows scanned?
Does it lock rows?
Does it update indexed columns?
Does it run during peak hours?
Does it compete with OLTP queries?
Is it resumable?
Is it chunked?
Is it idempotent?
Can it be throttled?

Safe batch pattern:

Process in chunks
Commit per chunk
Track progress
Use stable ordering
Avoid full-table locks
Throttle under load
Make retry idempotent
Record job evidence

Example chunking:

SELECT case_id
FROM case_file
WHERE archived_at IS NULL
  AND closed_at < now() - interval '2 years'
ORDER BY case_id
LIMIT 1000;

Then process chunk, commit, repeat.

31. Workload and Multi-Tenancy

Multi-tenant workload is rarely uniform.

One tenant may dominate:

storage,
writes,
reads,
reporting,
search,
support tickets,
custom configuration.

Workload profile must include tenant skew.

Example:

Tenant Tier	Tenant Count	Data Volume	Traffic	Isolation Need
small	10,000	low	low	shared
medium	500	medium	medium	shared with limits
enterprise	20	high	high	dedicated shard/db possible
regulated	5	high	medium	stricter isolation

Design impact:

tenant_id in primary queries,
tenant-scoped indexes,
per-tenant rate limits,
partition/shard strategy,
backup/restore per tenant,
noisy neighbor mitigation,
data residency.

32. Workload and Data Growth

Data growth has multiple dimensions:

row count,
row width,
index size,
history size,
event volume,
attachment/blob metadata,
tenant skew,
retention length,
cardinality of labels/tags,
archived data.

Example projection:

case_event:
- 5 million events/day
- average payload 1 KB
- raw table growth ≈ 5 GB/day before index/WAL/replication overhead
- 365 days ≈ 1.8 TB raw payload alone

Actual storage can be much higher due to indexes, WAL, bloat, replication, backup, and compression behavior.

Architect must design lifecycle and retention early.

33. Workload and Backup/Restore

Backup is also workload.

Questions:

How much data?
How often backup?
How fast restore?
Point-in-time recovery needed?
Restore whole DB or tenant/entity subset?
Backup encrypted?
Backup tested?
Does backup affect primary performance?
Are large indexes included/rebuilt?

A database that can serve production traffic but cannot be restored within required RTO is not production-ready.

34. Workload and Migration

Schema migration is workload too.

A migration can:

scan table,
rewrite table,
lock table,
build index,
backfill column,
validate constraint,
update many rows,
cause replication lag,
break old app version.

Workload-first migration asks:

table size?
write rate during migration?
lock behavior?
rollback plan?
dual-version app compatibility?
backfill speed?
validation method?
deploy order?

This will be covered deeply in later parts, but introduce it now: database design includes future change workload.

35. Building a Workload Profile

A workload profile is a structured artifact.

Template:

## Workload Profile: Case Management Database

### Critical Commands
| Command | Frequency | Rows Touched | Transaction Need | Latency | Failure Risk |
|---|---:|---|---|---:|---|

### Critical Queries
| Query | Frequency | Predicate | Sort | Result Size | Freshness | Latency |
|---|---:|---|---|---:|---|---:|

### Batch Jobs
| Job | Schedule | Rows Scanned | Rows Written | Idempotent | Throttled |
|---|---|---:|---:|---|---|

### Analytical Workloads
| Report | Freshness | Grain | Volume | Store |
|---|---|---|---:|---|

### Growth Assumptions
| Data | Daily Growth | Retention | Partition Needed |
|---|---:|---|---|

### Consistency Classes
| Operation | Consistency | Staleness Allowed | Reason |
|---|---|---:|---|

### Contention Risks
| Resource | Why Hot | Mitigation |
|---|---|---|

This artifact prevents architecture by opinion.

36. Example Workload Profile: Regulatory Case System

Commands

Command	Frequency	Rows Touched	Transaction Need	Latency	Risk
Open case	2 rps	intake, case, transition, outbox	atomic	300ms	duplicate case
Assign case	10 rps	case, assignment history, outbox	atomic	200ms	double assignment
Add evidence metadata	30 rps	evidence, case_event, outbox	atomic metadata	500ms	orphan evidence
Close case	1 rps	case, decision, transition, outbox	atomic	500ms	close without decision
Reopen case	rare	case, transition, reason	atomic	500ms	unauthorized reopening

Queries

Query	Frequency	Predicate	Result	Freshness
Case detail	200 rps	case_id	1 case + related summary	latest preferred
My inbox	100 rps	assignee/status	50 rows	read-your-writes
Supervisor queue	50 rps	region/status/priority	100 rows	stale < 30s ok
Case timeline	40 rps	case_id/order by time	100 events	latest preferred
Dashboard counts	80 rps	region/status/priority	aggregate	stale < 5 min ok

Batch / Reporting

Job	Schedule	Store	Notes
SLA breach scan	every 5 min	OLTP/projection	chunked, index by due_at
Monthly regulator report	monthly	warehouse	reproducible snapshot
Archive closed cases	nightly	OLTP + archive store	retention-aware
Data quality scan	nightly	replica/warehouse	no OLTP impact

Design Consequences

case_file needs indexes for inbox and supervisor queue.
case_status_transition and case_event need query by case and by time.
Dashboard counts should be projection, not repeated full aggregation.
Monthly report should be generated from analytical snapshot, not mutable current state.
Close case transaction must enforce decision invariant.
Evidence metadata transaction should not wait for large file storage if avoidable.

37. How Workload Changes Data Model

Suppose normalized model:

case
case_assignment
case_sla
case_priority
case_decision
case_event

For case detail page, normalized is fine if joins are indexed and result is small.

For inbox page, repeatedly joining many tables may be wasteful.

Workload may justify storing selected current fields in case_file:

current_assignee
current_sla_due_at
current_priority
latest_decision_status

This is denormalization, but controlled.

Rules:

Identify source of truth for each derived field.
Update derived fields in transaction or projection.
Add repair/rebuild job.
Add consistency checks.
Document why denormalization exists.

38. Workload-First Decision Matrix

Workload Signal	Likely Design Move
Frequent primary key lookup	primary key, maybe cache
Frequent filtered list	composite index aligned with predicate/sort
Large aggregation dashboard	summary table/materialized projection
Heavy historical reporting	warehouse/lakehouse/analytical store
High append event stream	append table, partitioning, retention
Deep relationship traversal	graph model/projection
Full-text relevance search	search index projection
High write contention on counter	sequence/hi-lo/partitioned counter
Stale reads acceptable	read replica/projection/cache
Strong command precondition	primary/transactional read
Huge tenant skew	tenant tiering/dedicated shard
Long retention history	partition/archive strategy
Frequent schema evolution	expand-contract migration discipline

39. Workload Smells

39.1 “We’ll Add Indexes Later”

Index design should evolve, but critical query paths must be considered early.

39.2 “Reporting Can Query Production Tables”

Maybe at small scale. Dangerous at large scale.

39.3 “Cache Will Fix It”

Cache can reduce repeated reads. It does not fix bad consistency model, bad query shape, or missing indexes.

39.4 “One Database For Everything”

Possible at early stage. Risky when search, analytics, OLTP, and event replay all compete.

39.5 “All Reads Must Be Strongly Consistent”

Usually false and expensive.

39.6 “Eventual Consistency Is Fine Everywhere”

Also false. Command invariants often need strong consistency.

39.7 “The ORM Will Handle It”

ORM can generate queries. It does not understand workload unless you design for it.

39.8 “No Need To Estimate Volume Yet”

Rough estimates are better than none. Architecture without scale assumptions is guesswork.

40. Practical Procedure: Workload-First Database Design

Use this sequence.

Step 1 — List User Journeys

Example:

investigator opens inbox,
investigator reviews case,
supervisor assigns case,
approver closes case,
auditor reviews history,
regulator exports monthly report.

Step 2 — Extract Commands and Queries

For each journey, write commands and queries.

Journey: Supervisor assigns case
- Query supervisor queue
- Query case detail
- Command assign case
- Query updated queue
- Notify assignee

Step 3 — Classify Consistency

assign case = strong
supervisor queue = stale 30s acceptable
notification = at-least-once with dedupe

Step 4 — Estimate Frequency and Latency

Even approximate numbers improve design.

Step 5 — Identify Hot Paths

Hot path = high frequency + user-facing + correctness-sensitive.

Design hot paths first.

Step 6 — Design Logical Model

Normalize for truth and invariants.

Step 7 — Design Physical Access

Indexes, partitioning, projections, caches, replicas.

Step 8 — Design Failure and Repair

For each projection/cache/downstream store:

how stale?
how rebuild?
how detect drift?
how retry?

Step 9 — Validate With EXPLAIN and Load Test

Do not trust mental execution plan.

Step 10 — Document Assumptions

All workload assumptions must be visible.

41. Mini Case Study: Inbox Query

Requirement:

Investigator must see 50 most urgent active cases assigned to them, sorted by SLA due time.

Naive query:

SELECT *
FROM case_file
WHERE assigned_to = :user_id
  AND status != 'CLOSED'
ORDER BY due_at ASC
LIMIT 50;

Problems:

status != 'CLOSED' may be less index-friendly than explicit active statuses.
SELECT * may read wide rows unnecessarily.
No tie-breaker in order.
If due_at nullable, sort behavior unclear.

Better:

SELECT case_id, case_number, priority, status, due_at, title
FROM case_file
WHERE assigned_to = :user_id
  AND status IN ('ASSIGNED', 'UNDER_REVIEW', 'DECISION_PENDING')
  AND due_at IS NOT NULL
ORDER BY due_at ASC, case_id ASC
LIMIT 50;

Index:

CREATE INDEX idx_case_file_investigator_inbox
ON case_file (assigned_to, status, due_at, case_id);

If priority must dominate due date:

ORDER BY priority_rank DESC, due_at ASC, case_id ASC

Then maybe store priority_rank numeric or use expression index depending DB capability and portability requirements.

42. Mini Case Study: Dashboard Counts

Requirement:

Dashboard shows number of active cases by region, priority, and status. Refresh can be up to 5 minutes stale.

Bad design:

SELECT region, priority, status, count(*)
FROM case_file
WHERE status IN ('OPEN', 'ASSIGNED', 'UNDER_REVIEW', 'DECISION_PENDING')
GROUP BY region, priority, status;

Run every dashboard request.

At low volume, okay.

At high volume, bad.

Better:

CREATE TABLE case_dashboard_count (
    region TEXT NOT NULL,
    priority TEXT NOT NULL,
    status TEXT NOT NULL,
    case_count BIGINT NOT NULL,
    last_refreshed_at TIMESTAMPTZ NOT NULL,
    PRIMARY KEY (region, priority, status)
);

Refresh strategy:

scheduled every minute/five minutes,
or event-driven projection,
or materialized view depending DB.

Because stale <= 5 minutes is acceptable, we avoid burdening OLTP primary for every dashboard hit.

43. Mini Case Study: Audit Timeline

Requirement:

Auditor must see full timeline for a case in chronological order.

Query:

SELECT event_type, occurred_at, actor, payload
FROM case_event
WHERE case_id = :case_id
ORDER BY occurred_at ASC, event_id ASC;

Index:

CREATE INDEX idx_case_event_timeline
ON case_event (case_id, occurred_at, event_id);

If timeline can have thousands of events, paginate by (occurred_at, event_id).

If timeline includes multiple sources, consider unified timeline projection:

CREATE TABLE case_timeline_item (
    timeline_item_id UUID PRIMARY KEY,
    case_id UUID NOT NULL,
    item_type TEXT NOT NULL,
    occurred_at TIMESTAMPTZ NOT NULL,
    actor TEXT,
    title TEXT NOT NULL,
    payload JSONB NOT NULL,
    source_table TEXT NOT NULL,
    source_id UUID NOT NULL
);

This is projection. Source tables remain truth.

44. Mermaid: Workload-First Design Flow

45. Workload Review Checklist

Access Patterns

Apa 10 query paling penting?
Apa 10 command paling penting?
Query mana user-facing?
Query mana background/internal?
Query mana audit/regulatory?

Frequency and Latency

Berapa rps/tps sekarang?
Berapa target 12 bulan?
p50/p95/p99 target?
Apakah request synchronous?

Data Volume

Row count awal?
Growth harian/bulanan?
Average row size?
Index growth?
Retention?

Consistency

Strong consistency di mana wajib?
Read-your-writes di mana wajib?
Eventual consistency di mana acceptable?
Staleness maksimum berapa?

Transactions

Command mana atomic?
Row mana disentuh?
Lock risk apa?
External side effect apa?
Outbox needed?

Physical Design

Index apa untuk hot query?
Partition key apa dan kenapa?
Apakah sharding perlu sekarang atau nanti?
Read replica cukup?
Projection/cache/search needed?

Operational

Backup impact?
Batch job impact?
Migration impact?
Monitoring metrics?
Degraded mode?

46. Key Takeaways

Workload-first design berarti schema, index, partition, cache, replica, dan projection diturunkan dari operasi nyata.
OLTP dan OLAP punya prioritas berbeda; mencampurnya tanpa isolasi menciptakan risiko.
Query shape menentukan index dan physical access path.
Command shape menentukan transaction boundary dan consistency enforcement.
Frequency dan latency target mengubah keputusan desain.
Every index is both read optimization and write cost.
Contention harus dianalisis sebelum incident.
Read replica, cache, dan projection hanya aman jika consistency/freshness requirement jelas.
Batch, backup, migration, dan reporting juga bagian dari workload.
Database design document harus menyertakan workload profile, bukan hanya ERD.

47. What Comes Next

Part berikutnya adalah Requirements to Data Model.

Kita akan mengubah requirement bisnis menjadi model data secara sistematis:

dari user journey ke entity,
dari command ke transaction,
dari invariant ke constraint,
dari report ke grain,
dari workflow ke lifecycle,
dari failure mode ke repair path.

Prinsipnya:

Requirement yang tidak diterjemahkan menjadi data invariant akan bocor sebagai bug, ambiguity, atau manual process.

Lesson Recap

You just completed lesson 04 in start here. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Back To Series Next Lesson

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.

Previous Lesson

Lesson 03

Data Lifecycle and State Thinking

Next Lesson

Lesson 05

Requirements to Data Model