Series/Learn Java Data Access Pattern In Action

Final StretchOrdered learning track

Data Access Engineering Playbook

Learn Java Data Access Pattern In Action - Part 060

Final Data Access Engineering Playbook untuk Java production: decision matrix, failure model, review rubric, production readiness checklist, architecture patterns, tool selection, transaction, migration, testing, observability, dan interview-level synthesis.

[2026-07-05]15 min read2852 words

In This Lesson

1. Core Thesis 2. The Final Mental Model 3. Data Access Responsibility Map

Finish

Lesson 6060 lesson track51–60 Final Stretch

#java#data-access#playbook#production-readiness+5 more

Part 060 — Data Access Engineering Playbook

Data access engineering bukan sekadar memilih:
JDBC vs JPA vs jOOQ vs MyBatis vs R2DBC
Ini adalah kemampuan merancang kontrak antara aplikasi dan database agar:

benar di bawah concurrency;

cepat di bawah load;

aman di bawah migration;

bisa diobservasi saat incident;

bisa dites sebelum production;

bisa berevolusi tanpa downtime;

bisa dijelaskan ke reviewer, auditor, dan interviewer.

Bagian terakhir ini adalah playbook final untuk menyatukan seluruh seri.

1. Core Thesis

Top-tier data access engineer memiliki tiga kemampuan:

1. Correctness:
   transaksi, constraint, isolation, locking, idempotency.

2. Performance:
   query shape, index, batching, connection pressure, read model.

3. Evolvability:
   migration, compatibility, testing, observability, operational runbook.

Tools hanyalah implementasi dari prinsip ini.

2. The Final Mental Model

Every data access operation is a contract:

Input:
  command/query parameters, tenant, actor, expected version

Boundary:
  transaction, consistency, authorization, timeout

Database:
  schema, constraints, indexes, isolation, locks

Output:
  domain result, DTO, event, error, affected rows

Operational:
  metrics, logs, tests, migration compatibility, rollback story

If you cannot describe these, operation is not production-ready.

3. Data Access Responsibility Map

Component	Responsibility
DAO	Own SQL primitive and mapping
Repository	Own aggregate persistence contract
Query Service	Own read DTO/projection/search
Unit of Work	Track changes within transaction if ORM
Transaction Script	Orchestrate simple use case transaction
Outbox Repository	Durable integration event write
Inbox Repository	Idempotent message processing
Migration	Versioned schema evolution
Backfill Job	Safe data movement
Read Model Projector	Eventually consistent projection
Cache Layer	Accelerate reads with freshness contract

Do not put all responsibilities in one generic repository.

4. Tool Decision Matrix

Need	Good Fit
Primitive explicit SQL	JDBC
Object graph aggregate with Unit of Work	JPA/Hibernate
SQL-first type-safe query	jOOQ
Hand-written SQL with mapper XML	MyBatis
Non-blocking relational access	R2DBC
Simple Spring repository over JPA	Spring Data JPA
Bulk/report/read model SQL	jOOQ/MyBatis/JDBC
Stored procedure legacy	JDBC/MyBatis/jOOQ
Complex dashboard	DTO projection/read model/jOOQ
High-volume batch insert	JDBC batch/jOOQ batch
Reactive end-to-end high I/O	R2DBC
Imperative high concurrency Java 21	JDBC/JPA/jOOQ + virtual threads

Choose by workload, not fashion.

5. JDBC When

Use JDBC when:

need full control;
simple SQL;
library/framework code;
batch/streaming with minimal abstraction;
no ORM/DSL overhead desired;
team comfortable with manual mapping;
migration/tooling simple.

JDBC requires discipline:

try-with-resources;
prepared statements;
explicit transaction;
error classification;
mapping tests;
batch failure handling.

6. JPA/Hibernate When

Use JPA/Hibernate when:

aggregate lifecycle maps well to entities;
Unit of Work/dirty checking valuable;
moderate object graph;
domain command path benefits from managed entities;
team understands fetch/flush/cascade/lazy behavior.

Avoid/limit JPA for:

large dashboards;
complex reports;
bulk updates with audit;
huge streaming;
SQL-heavy analytics;
unbounded graph traversal.

Use projection/query service alongside JPA.

7. jOOQ When

Use jOOQ when:

SQL is important and must be type-safe;
database-first workflow;
generated schema compile safety desired;
dynamic SQL/CTE/window/upsert common;
read/query/reporting is central;
team wants explicit query shape;
ORM hidden behavior is unacceptable.

jOOQ requires:

codegen workflow;
mapping discipline;
transaction integration;
update count checks;
SQL review culture.

8. MyBatis When

Use MyBatis when:

team wants hand-written SQL;
XML SQL review is preferred;
legacy/stored procedure DB;
mapping result sets without ORM;
dynamic SQL tags are enough;
type-safe DSL/codegen not required.

MyBatis requires:

#{} vs ${} discipline;
resultMap tests;
TypeHandler tests;
update count checks;
dynamic SQL tests;
real DB integration.

9. R2DBC When

Use R2DBC when:

stack is reactive end-to-end;
high concurrency I/O-bound workload;
non-blocking drivers mature for your DB;
team strong in Reactor;
blocking dependencies minimal;
transaction complexity manageable.

Avoid R2DBC when:

DB is bottleneck;
team not reactive-ready;
JPA features needed;
virtual threads/JDBC sufficient;
driver support uncertain.

10. Virtual Threads When

Use virtual threads when:

blocking JDBC/JPA/jOOQ/MyBatis code is already mature;
thread scalability is concern;
Java runtime/framework supports it;
imperative simplicity is valuable;
reactive rewrite not justified.

Remember:

pool remains bottleneck
query remains bottleneck
locks remain bottleneck

Virtual threads reduce thread cost, not database cost.

11. Command vs Query Split

Command path:

intent-specific transaction
aggregate/source truth
constraints
idempotency
audit/outbox
optimistic/pessimistic concurrency
semantic errors

Query path:

DTO projection
read model
bounded result
sort whitelist
pagination
index
query count budget
tenant/security predicate

Do not force one repository method style for both.

12. Command Path Checklist

13. Query Path Checklist

14. Transaction Design Checklist

15. Consistency Pattern Matrix

Problem	Pattern
duplicate request	idempotency key
lost update	`@Version` / expected version / conditional update
duplicate unique business key	unique constraint
one active relationship	partial unique index / constraint
capacity reservation	atomic conditional update
high contention row	pessimistic lock / queue / sharding
cross-service consistency	outbox/inbox/saga
read dashboard performance	read model/projection
async duplicate event	inbox dedup + source version
backfill old event	idempotent upsert
no distributed transaction	local transaction + durable messages

Use database constraints as final guard whenever possible.

16. Error Mapping Matrix

DB/Framework Error	Application Meaning
unique constraint	duplicate business key/idempotent duplicate
FK violation	invalid reference / not found / invariant bug
check constraint	invalid state/data bug
not-null violation	mapper/app bug or invalid command
optimistic lock	conflict/stale version
lock timeout	busy/retry later
deadlock	retryable transaction failure
serialization failure	retryable transaction failure
query timeout	overloaded/slow query/fail fast
non-unique result	data invariant violation
no result	not found or empty depending contract

Do not leak raw SQL exceptions to API.

17. Performance Failure Model

Common causes:

N+1;
missing index;
slow count;
unbounded result;
over-fetching entity graph;
cartesian explosion;
connection pool wait;
long transaction;
lock contention;
batch not batching;
full graph merge;
cache churn;
read model lag/catch-up;
async retry storm.

Diagnosis order:

query count -> slow SQL -> plan -> pool wait -> locks -> transaction duration -> workload/queue

18. N+1 Prevention Rule

For list/read endpoint:

Use DTO projection or read model.

For command/detail with aggregate:

Use intent-specific fetch plan.

Test query count.

Do not fix N+1 by making everything EAGER.

19. Pagination Rule

Use offset for:

shallow UI pages;
small result sets;
admin convenience.

Use keyset/cursor for:

deep pagination;
infinite scroll;
high-scale feeds;
export traversal.

Use async export for huge data.

Always sort deterministically.

20. Batch/Bulk Rule

Use entity loop when:

domain method needed;
row count moderate;
audit/outbox per row manageable.

Use bulk SQL when:

technical update;
same transformation;
no per-row domain event required;
version/audit implications understood.

Use chunked job when:

large data;
need resume/throttle/audit.

21. Cache Rule

Cache only with freshness contract.

Good candidates:

immutable reference data;
stable DTO read;
read model result with TTL/source version.

Bad candidates:

command validation;
mutable aggregate state;
authorization without strict invalidation;
financial/inventory/capacity truth.

Cache must have:

key scope;
TTL/invalidation;
metrics;
manual eviction;
test for stale safety.

22. Migration Rule

For production schema change:

expand -> deploy compatible code -> backfill -> switch -> enforce -> contract later

Never:

edit applied migration;
rename/drop during rolling deploy;
backfill huge table in one transaction;
add constraint without checking dirty data;
assume migration tool equals zero downtime.

23. Testing Rule

Mock tests do not prove data access.

You need:

real DB integration;
migrations applied;
mapping tests;
constraint tests;
rollback tests;
query count tests;
concurrency tests for critical locks;
migration compatibility tests;
backfill job tests;
performance smoke.

24. Observability Rule

Every critical operation needs names.

Names:

CaseFileRepository.loadForApproval
CaseDashboardQuery.search
OutboxDao.claimNextBatch
CaseStatusBackfillJob.processChunk

Metrics:

duration;
error count;
rows returned/updated;
pool wait;
timeout;
retry;
conflict;
lock wait;
lag/progress for async/backfill.

If not observable, it is not production-ready.

25. Security Rule

Data access is security boundary.

Every query must consider:

tenant predicate;
authorization scope;
raw SQL injection;
cache key scope;
sensitive columns;
redaction;
admin bypass explicit;
audit.

SQL bug can be data breach.

26. Multi-Tenant Rule

Always design:

tenant ID in primary predicates;
tenant-aware indexes;
tenant-aware cache keys;
per-tenant rate limits for expensive work;
hot tenant metrics;
tenant isolation tests.

For global/admin queries, name them explicitly.

27. Read Model Rule

Use read model when:

dashboard joins too many tables;
query cost high and repeated;
list view needs denormalized fields;
search/filter shape differs from write schema;
read path can tolerate lag;
rebuild process possible.

Read model must have:

source version;
idempotent projector;
lag metric;
rebuild/backfill;
schema owner;
query indexes.

28. Outbox/Inbox Rule

Use outbox when DB state change must publish event.

Use inbox when consuming message must update DB exactly/effectively once.

Outbox requires:

event key unique;
same transaction as state;
publisher claim/lease;
idempotent publish/consumer;
mark published with ownership.

Inbox requires:

message ID unique;
payload hash conflict check;
ack after commit;
retry/dead-letter.

29. Review Rubric: Repository Method

Ask:

What is semantic contract?
Does method name reveal lock/fetch/update behavior?
Is transaction expected outside or inside?
What cardinality?
What errors?
What concurrency protection?
What rows/tables touched?
What tests prove it?

Bad:

save(entity)
findAll()
update(data)

for complex domain without precise semantics.

30. Review Rubric: SQL Query

Ask:

Query name and owner?
Explicit selected columns?
Tenant/security predicate?
Bound parameters?
Raw fragments whitelisted?
Result bounded?
Sort deterministic?
Index supports filter/order?
Count query needed?
Explain plan reviewed?
Mapping nullability handled?
Timeout?
Metrics?

31. Review Rubric: Transaction

Ask:

What is atomic?
What external effects happen?
What happens on rollback?
What happens on timeout?
What happens on duplicate retry?
What locks are acquired?
How long can it run?
What isolation anomalies possible?
Are constraints final guard?
Is retry safe?

32. Review Rubric: Migration

Ask:

Is change additive?
Does old app still work?
Does new app work with partial data?
Is backfill needed?
Is backfill chunked?
Are constraints/indexes safe?
Are applied migrations immutable?
Is rollback/forward fix defined?
Are hidden consumers known?
Is cleanup scheduled?

33. Review Rubric: Backfill/Repair

Ask:

What exact rows affected?
Is dry run available?
Is update idempotent?
Is progress durable?
Is progress saved atomically?
Can it pause/resume?
Is throttle/kill switch present?
Are errors isolated?
Is audit/snapshot needed?
What constraint prevents recurrence?

34. Production Readiness Checklist

A data access feature is production-ready if:

35. Interview-Level Explanation: JPA vs jOOQ

A strong answer:

I use JPA/Hibernate when I want Unit of Work and aggregate lifecycle management,
but I avoid using managed entities as read models. For query-heavy paths I prefer
DTO projections, jOOQ, or read models because query shape is explicit and easier
to tune. In production I care less about tool ideology and more about transaction
boundary, query count, indexes, migration compatibility, and error semantics.

This shows seniority.

36. Interview-Level Explanation: Optimistic vs Pessimistic Lock

Optimistic locking detects stale updates using a version column and is good when
contention is low or user edits can be retried/reloaded. Pessimistic locking
serializes a short critical section by locking rows, useful for high contention
or child-set invariants. I still use constraints as final guards. I don't blindly
retry optimistic conflicts for user commands because that can apply decisions to
state the user never saw.

37. Interview-Level Explanation: N+1

N+1 happens when one initial query is followed by one query per result row,
usually due to lazy loading in mappers or serializers. I prevent it by not
returning entities from read endpoints, using DTO projections/read models for
lists, explicit fetch plans for aggregate commands, and query count tests for
critical endpoints.

38. Interview-Level Explanation: Zero-Downtime Migration

I use expand-contract. First I add backward-compatible schema, deploy code that
can write/read both old and new representations, backfill in chunks, validate
parity, switch reads behind a feature flag, enforce constraints, and only later
drop old schema after all old app versions and jobs are gone. Rollback differs
by phase; early rollback is usually app/flag rollback, late rollback is often
forward fix.

39. Interview-Level Explanation: Idempotency

Retries happen because networks, timeouts, and async delivery are unreliable.
I use idempotency keys or message IDs stored with payload hash and outcome.
Duplicate same key returns same result or skips work; same key with different
payload is conflict. For projections I use source version so old events cannot
overwrite newer read model state.

40. Interview-Level Explanation: Outbox

Outbox solves the problem of updating DB state and publishing an event atomically.
The command transaction writes the state change and outbox row together. A worker
later claims and publishes outbox events with idempotent event keys. Consumers
must also be idempotent. This avoids distributed transaction while giving durable
eventual delivery.

41. Interview-Level Explanation: R2DBC vs Virtual Threads

R2DBC removes blocking threads and is valuable in a reactive end-to-end stack,
but it does not remove database bottlenecks. Virtual threads make blocking JDBC
scale better at the thread level while keeping imperative code. I compare them
with load tests and choose based on bottleneck, driver maturity, team expertise,
and transaction complexity. Pool, timeout, and backpressure remain necessary in
both.

42. Final Architecture Pattern: Hybrid Pragmatism

A strong production architecture often uses:

JPA/Hibernate:
  command aggregate persistence where Unit of Work helps.

jOOQ/MyBatis/JDBC:
  explicit read queries, reports, batch, outbox/inbox.

Migration tool:
  Flyway/Liquibase.

Read model:
  high-traffic dashboards.

Outbox/inbox:
  service integration.

Backfill jobs:
  safe schema/data evolution.

Testcontainers:
  real DB integration tests.

Virtual threads:
  scalable imperative execution if Java/runtime supports it.

This is not overengineering if each piece solves a real problem.

43. Common Bad Architecture

Controller returns JPA entity.
Repository has findAll.
Service mutates detached entities.
No @Version.
No idempotency.
No outbox.
Flyway migration drops column directly.
Dashboard maps entity list to DTO and triggers N+1.
No query count tests.
No real DB integration tests.
Cache used for command validation.
Backfill runs as one huge SQL.

This is incident factory.

44. Good Architecture Example

POST /cases/{id}/approve
  -> ApproveCaseCommand(commandId, caseId, expectedVersion, actor, reason)
  -> @Transactional use case
  -> idempotency start
  -> repository.loadForApproval(caseId)
  -> domain approve()
  -> repository.save with version check
  -> audit insert
  -> outbox insert
  -> idempotency complete
  -> response DTO

GET /cases/dashboard
  -> CaseDashboardQuery(validated filters, sort, cursor)
  -> jOOQ/MyBatis DTO projection or read model
  -> bounded query
  -> query count budget
  -> tenant/security predicate

Clear separation.

45. Data Access Design Document Template

# Data Access Design: <Feature>

## Use Cases
- Commands:
- Queries:

## Schema
- Tables:
- Constraints:
- Indexes:

## Transaction Model
- Boundary:
- Isolation:
- Locks:
- Idempotency:

## Query Model
- DTO/read model:
- Pagination:
- Sort/filter:
- Query plan:

## Migration Plan
- Expand:
- Backfill:
- Switch:
- Contract:

## Observability
- Metrics:
- Logs:
- Alerts:

## Tests
- Unit:
- Integration:
- Concurrency:
- Migration:
- Performance smoke:

## Rollback/Forward Fix

Use for risky changes.

46. Final Failure Checklist

Before launch, ask:

What if this command is retried?
What if two users click approve at same time?
What if DB update commits but HTTP times out?
What if outbox publish fails?
What if duplicate event is consumed?
What if migration partially completes?
What if old app version still runs?
What if backfill job crashes?
What if query returns 100x rows?
What if tenant filter is missing?
What if cache is stale?
What if read model lags?
What if DB pool is full?
What if lock waits?
What if generated SQL changes after library upgrade?

If you can answer, you're operating at senior/principal level.

47. Learning Path After This Series

To go deeper:

PostgreSQL internals and query planning.
Transaction isolation anomalies with real labs.
Advanced indexing: partial, covering, expression, GIN/GiST/BRIN.
Event-driven consistency: outbox/inbox/saga patterns.
High-scale read models and projection rebuild.
Data migration and online schema change at large scale.
Observability for database-heavy services.
Performance engineering and load testing.
Distributed systems failure modeling.
Database reliability engineering.

Data access mastery is lifelong.

48. Final Production Checklist

Use this as final gate.

Correctness

Constraints enforce invariants.
Transactions short and explicit.
Idempotency for retried commands/messages.
Optimistic/pessimistic strategy clear.
Error translation semantic.
Outbox/inbox where needed.

Performance

Evolvability

Operability

Security

49. Final Mental Shortcut

When reviewing any data access code, scan for:

Where is the transaction?
Where is the constraint?
Where is the index?
Where is the limit?
Where is the tenant predicate?
Where is the version/idempotency?
Where is the query count budget?
Where is the migration compatibility?
Where is the rollback story?
Where is the test?
Where is the metric?

If many are missing, the code is not production-ready.

50. Closing Summary

Across this 60-part series, we covered:

mental model and boundaries;
JDBC primitives;
transactions and consistency;
DAO/repository/query patterns;
JPA/Hibernate production patterns;
jOOQ/MyBatis SQL-first patterns;
reactive/R2DBC/async/virtual threads;
migration/backfill/testing/playbook.

The core lesson:

Data access is not a framework choice.
Data access is production correctness engineering.

Mastering it means understanding both worlds:

Java application model
+
database execution model

and designing the boundary deliberately.

51. References

Jakarta Persistence Specification: https://jakarta.ee/specifications/persistence/3.2/jakarta-persistence-spec-3.2
Hibernate ORM User Guide: https://docs.hibernate.org/stable/orm/userguide/html_single/
jOOQ Manual: https://www.jooq.org/doc/latest/manual/
MyBatis Documentation: https://mybatis.org/mybatis-3/
R2DBC Specification: https://r2dbc.io/spec/1.0.0.RELEASE/spec/html/
Spring Framework Transaction Management: https://docs.spring.io/spring-framework/reference/data-access/transaction.html
PostgreSQL Documentation: https://www.postgresql.org/docs/current/
Flyway Documentation: https://documentation.red-gate.com/fd
Liquibase Documentation: https://docs.liquibase.com/
Testcontainers Java: https://java.testcontainers.org/

Lesson Recap

You just completed lesson 60 in final stretch. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Back To Series

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.

Previous Lesson

Lesson 59

Testing Data Access Layer

END_OF_SERIES