Data Access Engineering Playbook
Learn Java Data Access Pattern In Action - Part 060
Final Data Access Engineering Playbook untuk Java production: decision matrix, failure model, review rubric, production readiness checklist, architecture patterns, tool selection, transaction, migration, testing, observability, dan interview-level synthesis.
Part 060 — Data Access Engineering Playbook
Data access engineering bukan sekadar memilih:
JDBC vs JPA vs jOOQ vs MyBatis vs R2DBCIni adalah kemampuan merancang kontrak antara aplikasi dan database agar:
- benar di bawah concurrency;
- cepat di bawah load;
- aman di bawah migration;
- bisa diobservasi saat incident;
- bisa dites sebelum production;
- bisa berevolusi tanpa downtime;
- bisa dijelaskan ke reviewer, auditor, dan interviewer.
Bagian terakhir ini adalah playbook final untuk menyatukan seluruh seri.
1. Core Thesis
Top-tier data access engineer memiliki tiga kemampuan:
1. Correctness:
transaksi, constraint, isolation, locking, idempotency.
2. Performance:
query shape, index, batching, connection pressure, read model.
3. Evolvability:
migration, compatibility, testing, observability, operational runbook.
Tools hanyalah implementasi dari prinsip ini.
2. The Final Mental Model
Every data access operation is a contract:
Input:
command/query parameters, tenant, actor, expected version
Boundary:
transaction, consistency, authorization, timeout
Database:
schema, constraints, indexes, isolation, locks
Output:
domain result, DTO, event, error, affected rows
Operational:
metrics, logs, tests, migration compatibility, rollback story
If you cannot describe these, operation is not production-ready.
3. Data Access Responsibility Map
| Component | Responsibility |
|---|---|
| DAO | Own SQL primitive and mapping |
| Repository | Own aggregate persistence contract |
| Query Service | Own read DTO/projection/search |
| Unit of Work | Track changes within transaction if ORM |
| Transaction Script | Orchestrate simple use case transaction |
| Outbox Repository | Durable integration event write |
| Inbox Repository | Idempotent message processing |
| Migration | Versioned schema evolution |
| Backfill Job | Safe data movement |
| Read Model Projector | Eventually consistent projection |
| Cache Layer | Accelerate reads with freshness contract |
Do not put all responsibilities in one generic repository.
4. Tool Decision Matrix
| Need | Good Fit |
|---|---|
| Primitive explicit SQL | JDBC |
| Object graph aggregate with Unit of Work | JPA/Hibernate |
| SQL-first type-safe query | jOOQ |
| Hand-written SQL with mapper XML | MyBatis |
| Non-blocking relational access | R2DBC |
| Simple Spring repository over JPA | Spring Data JPA |
| Bulk/report/read model SQL | jOOQ/MyBatis/JDBC |
| Stored procedure legacy | JDBC/MyBatis/jOOQ |
| Complex dashboard | DTO projection/read model/jOOQ |
| High-volume batch insert | JDBC batch/jOOQ batch |
| Reactive end-to-end high I/O | R2DBC |
| Imperative high concurrency Java 21 | JDBC/JPA/jOOQ + virtual threads |
Choose by workload, not fashion.
5. JDBC When
Use JDBC when:
- need full control;
- simple SQL;
- library/framework code;
- batch/streaming with minimal abstraction;
- no ORM/DSL overhead desired;
- team comfortable with manual mapping;
- migration/tooling simple.
JDBC requires discipline:
- try-with-resources;
- prepared statements;
- explicit transaction;
- error classification;
- mapping tests;
- batch failure handling.
6. JPA/Hibernate When
Use JPA/Hibernate when:
- aggregate lifecycle maps well to entities;
- Unit of Work/dirty checking valuable;
- moderate object graph;
- domain command path benefits from managed entities;
- team understands fetch/flush/cascade/lazy behavior.
Avoid/limit JPA for:
- large dashboards;
- complex reports;
- bulk updates with audit;
- huge streaming;
- SQL-heavy analytics;
- unbounded graph traversal.
Use projection/query service alongside JPA.
7. jOOQ When
Use jOOQ when:
- SQL is important and must be type-safe;
- database-first workflow;
- generated schema compile safety desired;
- dynamic SQL/CTE/window/upsert common;
- read/query/reporting is central;
- team wants explicit query shape;
- ORM hidden behavior is unacceptable.
jOOQ requires:
- codegen workflow;
- mapping discipline;
- transaction integration;
- update count checks;
- SQL review culture.
8. MyBatis When
Use MyBatis when:
- team wants hand-written SQL;
- XML SQL review is preferred;
- legacy/stored procedure DB;
- mapping result sets without ORM;
- dynamic SQL tags are enough;
- type-safe DSL/codegen not required.
MyBatis requires:
#{}vs${}discipline;- resultMap tests;
- TypeHandler tests;
- update count checks;
- dynamic SQL tests;
- real DB integration.
9. R2DBC When
Use R2DBC when:
- stack is reactive end-to-end;
- high concurrency I/O-bound workload;
- non-blocking drivers mature for your DB;
- team strong in Reactor;
- blocking dependencies minimal;
- transaction complexity manageable.
Avoid R2DBC when:
- DB is bottleneck;
- team not reactive-ready;
- JPA features needed;
- virtual threads/JDBC sufficient;
- driver support uncertain.
10. Virtual Threads When
Use virtual threads when:
- blocking JDBC/JPA/jOOQ/MyBatis code is already mature;
- thread scalability is concern;
- Java runtime/framework supports it;
- imperative simplicity is valuable;
- reactive rewrite not justified.
Remember:
pool remains bottleneck
query remains bottleneck
locks remain bottleneck
Virtual threads reduce thread cost, not database cost.
11. Command vs Query Split
Command path:
intent-specific transaction
aggregate/source truth
constraints
idempotency
audit/outbox
optimistic/pessimistic concurrency
semantic errors
Query path:
DTO projection
read model
bounded result
sort whitelist
pagination
index
query count budget
tenant/security predicate
Do not force one repository method style for both.
12. Command Path Checklist
- Command has ID/idempotency if retry possible.
- Tenant/actor/scope explicit.
- Expected version included if user edit.
- Source-of-truth table/aggregate loaded.
- Transaction boundary at use case.
- External calls not inside transaction.
- DB constraints enforce critical invariants.
- Optimistic/pessimistic strategy chosen.
- Affected rows checked.
- Audit written if required.
- Outbox written atomically.
- Errors mapped semantically.
- Retry only for retryable/idempotent failures.
- Tests cover commit, rollback, conflict, duplicate.
13. Query Path Checklist
- Query object validates filters/limit/sort.
- Tenant/security predicate enforced in SQL.
- Result bounded.
- DTO projection/read model preferred for lists.
- Entity graph not leaked.
- Sort deterministic with tiebreaker.
- Pagination strategy chosen: offset/keyset/slice.
- Count query reviewed or avoided.
- Index supports predicates/order.
- Query timeout defined.
- Query named and observable.
- Query count/N+1 tested.
- Mapping handles null/enums/time.
- Generated SQL/plan reviewed if critical.
14. Transaction Design Checklist
- Boundary at use case/command handler.
- Transaction is short.
- No external I/O inside transaction.
- Isolation level understood.
- Lock strategy explicit.
- Lock ordering consistent.
- Timeout/lock timeout configured.
- Retry wraps whole transaction.
- Idempotency supports retry.
- Bulk/native updates do not leave stale ORM context.
- Rollback tested.
15. Consistency Pattern Matrix
| Problem | Pattern |
|---|---|
| duplicate request | idempotency key |
| lost update | @Version / expected version / conditional update |
| duplicate unique business key | unique constraint |
| one active relationship | partial unique index / constraint |
| capacity reservation | atomic conditional update |
| high contention row | pessimistic lock / queue / sharding |
| cross-service consistency | outbox/inbox/saga |
| read dashboard performance | read model/projection |
| async duplicate event | inbox dedup + source version |
| backfill old event | idempotent upsert |
| no distributed transaction | local transaction + durable messages |
Use database constraints as final guard whenever possible.
16. Error Mapping Matrix
| DB/Framework Error | Application Meaning |
|---|---|
| unique constraint | duplicate business key/idempotent duplicate |
| FK violation | invalid reference / not found / invariant bug |
| check constraint | invalid state/data bug |
| not-null violation | mapper/app bug or invalid command |
| optimistic lock | conflict/stale version |
| lock timeout | busy/retry later |
| deadlock | retryable transaction failure |
| serialization failure | retryable transaction failure |
| query timeout | overloaded/slow query/fail fast |
| non-unique result | data invariant violation |
| no result | not found or empty depending contract |
Do not leak raw SQL exceptions to API.
17. Performance Failure Model
Common causes:
- N+1;
- missing index;
- slow count;
- unbounded result;
- over-fetching entity graph;
- cartesian explosion;
- connection pool wait;
- long transaction;
- lock contention;
- batch not batching;
- full graph merge;
- cache churn;
- read model lag/catch-up;
- async retry storm.
Diagnosis order:
query count -> slow SQL -> plan -> pool wait -> locks -> transaction duration -> workload/queue
18. N+1 Prevention Rule
For list/read endpoint:
Use DTO projection or read model.
For command/detail with aggregate:
Use intent-specific fetch plan.
Test query count.
Do not fix N+1 by making everything EAGER.
19. Pagination Rule
Use offset for:
- shallow UI pages;
- small result sets;
- admin convenience.
Use keyset/cursor for:
- deep pagination;
- infinite scroll;
- high-scale feeds;
- export traversal.
Use async export for huge data.
Always sort deterministically.
20. Batch/Bulk Rule
Use entity loop when:
- domain method needed;
- row count moderate;
- audit/outbox per row manageable.
Use bulk SQL when:
- technical update;
- same transformation;
- no per-row domain event required;
- version/audit implications understood.
Use chunked job when:
- large data;
- need resume/throttle/audit.
21. Cache Rule
Cache only with freshness contract.
Good candidates:
- immutable reference data;
- stable DTO read;
- read model result with TTL/source version.
Bad candidates:
- command validation;
- mutable aggregate state;
- authorization without strict invalidation;
- financial/inventory/capacity truth.
Cache must have:
- key scope;
- TTL/invalidation;
- metrics;
- manual eviction;
- test for stale safety.
22. Migration Rule
For production schema change:
expand -> deploy compatible code -> backfill -> switch -> enforce -> contract later
Never:
- edit applied migration;
- rename/drop during rolling deploy;
- backfill huge table in one transaction;
- add constraint without checking dirty data;
- assume migration tool equals zero downtime.
23. Testing Rule
Mock tests do not prove data access.
You need:
- real DB integration;
- migrations applied;
- mapping tests;
- constraint tests;
- rollback tests;
- query count tests;
- concurrency tests for critical locks;
- migration compatibility tests;
- backfill job tests;
- performance smoke.
24. Observability Rule
Every critical operation needs names.
Names:
CaseFileRepository.loadForApproval
CaseDashboardQuery.search
OutboxDao.claimNextBatch
CaseStatusBackfillJob.processChunk
Metrics:
- duration;
- error count;
- rows returned/updated;
- pool wait;
- timeout;
- retry;
- conflict;
- lock wait;
- lag/progress for async/backfill.
If not observable, it is not production-ready.
25. Security Rule
Data access is security boundary.
Every query must consider:
- tenant predicate;
- authorization scope;
- raw SQL injection;
- cache key scope;
- sensitive columns;
- redaction;
- admin bypass explicit;
- audit.
SQL bug can be data breach.
26. Multi-Tenant Rule
Always design:
- tenant ID in primary predicates;
- tenant-aware indexes;
- tenant-aware cache keys;
- per-tenant rate limits for expensive work;
- hot tenant metrics;
- tenant isolation tests.
For global/admin queries, name them explicitly.
27. Read Model Rule
Use read model when:
- dashboard joins too many tables;
- query cost high and repeated;
- list view needs denormalized fields;
- search/filter shape differs from write schema;
- read path can tolerate lag;
- rebuild process possible.
Read model must have:
- source version;
- idempotent projector;
- lag metric;
- rebuild/backfill;
- schema owner;
- query indexes.
28. Outbox/Inbox Rule
Use outbox when DB state change must publish event.
Use inbox when consuming message must update DB exactly/effectively once.
Outbox requires:
- event key unique;
- same transaction as state;
- publisher claim/lease;
- idempotent publish/consumer;
- mark published with ownership.
Inbox requires:
- message ID unique;
- payload hash conflict check;
- ack after commit;
- retry/dead-letter.
29. Review Rubric: Repository Method
Ask:
- What is semantic contract?
- Does method name reveal lock/fetch/update behavior?
- Is transaction expected outside or inside?
- What cardinality?
- What errors?
- What concurrency protection?
- What rows/tables touched?
- What tests prove it?
Bad:
save(entity)
findAll()
update(data)
for complex domain without precise semantics.
30. Review Rubric: SQL Query
Ask:
- Query name and owner?
- Explicit selected columns?
- Tenant/security predicate?
- Bound parameters?
- Raw fragments whitelisted?
- Result bounded?
- Sort deterministic?
- Index supports filter/order?
- Count query needed?
- Explain plan reviewed?
- Mapping nullability handled?
- Timeout?
- Metrics?
31. Review Rubric: Transaction
Ask:
- What is atomic?
- What external effects happen?
- What happens on rollback?
- What happens on timeout?
- What happens on duplicate retry?
- What locks are acquired?
- How long can it run?
- What isolation anomalies possible?
- Are constraints final guard?
- Is retry safe?
32. Review Rubric: Migration
Ask:
- Is change additive?
- Does old app still work?
- Does new app work with partial data?
- Is backfill needed?
- Is backfill chunked?
- Are constraints/indexes safe?
- Are applied migrations immutable?
- Is rollback/forward fix defined?
- Are hidden consumers known?
- Is cleanup scheduled?
33. Review Rubric: Backfill/Repair
Ask:
- What exact rows affected?
- Is dry run available?
- Is update idempotent?
- Is progress durable?
- Is progress saved atomically?
- Can it pause/resume?
- Is throttle/kill switch present?
- Are errors isolated?
- Is audit/snapshot needed?
- What constraint prevents recurrence?
34. Production Readiness Checklist
A data access feature is production-ready if:
- schema migration reviewed and safe;
- app compatibility planned;
- repository/query contract explicit;
- transaction/constraint/concurrency strategy defined;
- SQL/query plan/index reviewed;
- DTO/mapping tested;
- rollback/forward-fix story exists;
- metrics/logs/traces named;
- tests cover happy/error/rollback;
- load/pressure implications understood;
- runbook exists for risky migration/backfill;
- security/tenant predicates verified;
- feature flag/canary if risky.
35. Interview-Level Explanation: JPA vs jOOQ
A strong answer:
I use JPA/Hibernate when I want Unit of Work and aggregate lifecycle management,
but I avoid using managed entities as read models. For query-heavy paths I prefer
DTO projections, jOOQ, or read models because query shape is explicit and easier
to tune. In production I care less about tool ideology and more about transaction
boundary, query count, indexes, migration compatibility, and error semantics.
This shows seniority.
36. Interview-Level Explanation: Optimistic vs Pessimistic Lock
Optimistic locking detects stale updates using a version column and is good when
contention is low or user edits can be retried/reloaded. Pessimistic locking
serializes a short critical section by locking rows, useful for high contention
or child-set invariants. I still use constraints as final guards. I don't blindly
retry optimistic conflicts for user commands because that can apply decisions to
state the user never saw.
37. Interview-Level Explanation: N+1
N+1 happens when one initial query is followed by one query per result row,
usually due to lazy loading in mappers or serializers. I prevent it by not
returning entities from read endpoints, using DTO projections/read models for
lists, explicit fetch plans for aggregate commands, and query count tests for
critical endpoints.
38. Interview-Level Explanation: Zero-Downtime Migration
I use expand-contract. First I add backward-compatible schema, deploy code that
can write/read both old and new representations, backfill in chunks, validate
parity, switch reads behind a feature flag, enforce constraints, and only later
drop old schema after all old app versions and jobs are gone. Rollback differs
by phase; early rollback is usually app/flag rollback, late rollback is often
forward fix.
39. Interview-Level Explanation: Idempotency
Retries happen because networks, timeouts, and async delivery are unreliable.
I use idempotency keys or message IDs stored with payload hash and outcome.
Duplicate same key returns same result or skips work; same key with different
payload is conflict. For projections I use source version so old events cannot
overwrite newer read model state.
40. Interview-Level Explanation: Outbox
Outbox solves the problem of updating DB state and publishing an event atomically.
The command transaction writes the state change and outbox row together. A worker
later claims and publishes outbox events with idempotent event keys. Consumers
must also be idempotent. This avoids distributed transaction while giving durable
eventual delivery.
41. Interview-Level Explanation: R2DBC vs Virtual Threads
R2DBC removes blocking threads and is valuable in a reactive end-to-end stack,
but it does not remove database bottlenecks. Virtual threads make blocking JDBC
scale better at the thread level while keeping imperative code. I compare them
with load tests and choose based on bottleneck, driver maturity, team expertise,
and transaction complexity. Pool, timeout, and backpressure remain necessary in
both.
42. Final Architecture Pattern: Hybrid Pragmatism
A strong production architecture often uses:
JPA/Hibernate:
command aggregate persistence where Unit of Work helps.
jOOQ/MyBatis/JDBC:
explicit read queries, reports, batch, outbox/inbox.
Migration tool:
Flyway/Liquibase.
Read model:
high-traffic dashboards.
Outbox/inbox:
service integration.
Backfill jobs:
safe schema/data evolution.
Testcontainers:
real DB integration tests.
Virtual threads:
scalable imperative execution if Java/runtime supports it.
This is not overengineering if each piece solves a real problem.
43. Common Bad Architecture
Controller returns JPA entity.
Repository has findAll.
Service mutates detached entities.
No @Version.
No idempotency.
No outbox.
Flyway migration drops column directly.
Dashboard maps entity list to DTO and triggers N+1.
No query count tests.
No real DB integration tests.
Cache used for command validation.
Backfill runs as one huge SQL.
This is incident factory.
44. Good Architecture Example
POST /cases/{id}/approve
-> ApproveCaseCommand(commandId, caseId, expectedVersion, actor, reason)
-> @Transactional use case
-> idempotency start
-> repository.loadForApproval(caseId)
-> domain approve()
-> repository.save with version check
-> audit insert
-> outbox insert
-> idempotency complete
-> response DTO
GET /cases/dashboard
-> CaseDashboardQuery(validated filters, sort, cursor)
-> jOOQ/MyBatis DTO projection or read model
-> bounded query
-> query count budget
-> tenant/security predicate
Clear separation.
45. Data Access Design Document Template
# Data Access Design: <Feature>
## Use Cases
- Commands:
- Queries:
## Schema
- Tables:
- Constraints:
- Indexes:
## Transaction Model
- Boundary:
- Isolation:
- Locks:
- Idempotency:
## Query Model
- DTO/read model:
- Pagination:
- Sort/filter:
- Query plan:
## Migration Plan
- Expand:
- Backfill:
- Switch:
- Contract:
## Observability
- Metrics:
- Logs:
- Alerts:
## Tests
- Unit:
- Integration:
- Concurrency:
- Migration:
- Performance smoke:
## Rollback/Forward Fix
Use for risky changes.
46. Final Failure Checklist
Before launch, ask:
- What if this command is retried?
- What if two users click approve at same time?
- What if DB update commits but HTTP times out?
- What if outbox publish fails?
- What if duplicate event is consumed?
- What if migration partially completes?
- What if old app version still runs?
- What if backfill job crashes?
- What if query returns 100x rows?
- What if tenant filter is missing?
- What if cache is stale?
- What if read model lags?
- What if DB pool is full?
- What if lock waits?
- What if generated SQL changes after library upgrade?
If you can answer, you're operating at senior/principal level.
47. Learning Path After This Series
To go deeper:
- PostgreSQL internals and query planning.
- Transaction isolation anomalies with real labs.
- Advanced indexing: partial, covering, expression, GIN/GiST/BRIN.
- Event-driven consistency: outbox/inbox/saga patterns.
- High-scale read models and projection rebuild.
- Data migration and online schema change at large scale.
- Observability for database-heavy services.
- Performance engineering and load testing.
- Distributed systems failure modeling.
- Database reliability engineering.
Data access mastery is lifelong.
48. Final Production Checklist
Use this as final gate.
Correctness
- Constraints enforce invariants.
- Transactions short and explicit.
- Idempotency for retried commands/messages.
- Optimistic/pessimistic strategy clear.
- Error translation semantic.
- Outbox/inbox where needed.
Performance
- Query shape reviewed.
- Indexes support query.
- Results bounded.
- N+1 tested.
- Pool pressure monitored.
- Batch/backfill throttled.
Evolvability
- Migration expand-contract.
- Backfill resumable.
- Old/new app compatibility.
- Cleanup planned.
- Tests cover partial migration.
Operability
- Metrics/logs named.
- Alerts and dashboards.
- Runbook for migration/backfill.
- Kill switch/feature flag.
- Rollback/forward fix story.
Security
- Tenant predicate.
- Authorization predicate.
- SQL injection protected.
- Cache key scoped.
- Sensitive data not logged.
49. Final Mental Shortcut
When reviewing any data access code, scan for:
Where is the transaction?
Where is the constraint?
Where is the index?
Where is the limit?
Where is the tenant predicate?
Where is the version/idempotency?
Where is the query count budget?
Where is the migration compatibility?
Where is the rollback story?
Where is the test?
Where is the metric?
If many are missing, the code is not production-ready.
50. Closing Summary
Across this 60-part series, we covered:
- mental model and boundaries;
- JDBC primitives;
- transactions and consistency;
- DAO/repository/query patterns;
- JPA/Hibernate production patterns;
- jOOQ/MyBatis SQL-first patterns;
- reactive/R2DBC/async/virtual threads;
- migration/backfill/testing/playbook.
The core lesson:
Data access is not a framework choice.
Data access is production correctness engineering.
Mastering it means understanding both worlds:
Java application model
+
database execution model
and designing the boundary deliberately.
51. References
- Jakarta Persistence Specification: https://jakarta.ee/specifications/persistence/3.2/jakarta-persistence-spec-3.2
- Hibernate ORM User Guide: https://docs.hibernate.org/stable/orm/userguide/html_single/
- jOOQ Manual: https://www.jooq.org/doc/latest/manual/
- MyBatis Documentation: https://mybatis.org/mybatis-3/
- R2DBC Specification: https://r2dbc.io/spec/1.0.0.RELEASE/spec/html/
- Spring Framework Transaction Management: https://docs.spring.io/spring-framework/reference/data-access/transaction.html
- PostgreSQL Documentation: https://www.postgresql.org/docs/current/
- Flyway Documentation: https://documentation.red-gate.com/fd
- Liquibase Documentation: https://docs.liquibase.com/
- Testcontainers Java: https://java.testcontainers.org/
You just completed lesson 60 in final stretch. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.
Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.