Final StretchOrdered learning track

Data Access Engineering Playbook

Learn Java Data Access Pattern In Action - Part 060

Final Data Access Engineering Playbook untuk Java production: decision matrix, failure model, review rubric, production readiness checklist, architecture patterns, tool selection, transaction, migration, testing, observability, dan interview-level synthesis.

15 min read2852 words
Prev
Finish
Lesson 6060 lesson track51–60 Final Stretch
#java#data-access#playbook#production-readiness+5 more

Part 060 — Data Access Engineering Playbook

Data access engineering bukan sekadar memilih:

JDBC vs JPA vs jOOQ vs MyBatis vs R2DBC

Ini adalah kemampuan merancang kontrak antara aplikasi dan database agar:

  • benar di bawah concurrency;
  • cepat di bawah load;
  • aman di bawah migration;
  • bisa diobservasi saat incident;
  • bisa dites sebelum production;
  • bisa berevolusi tanpa downtime;
  • bisa dijelaskan ke reviewer, auditor, dan interviewer.

Bagian terakhir ini adalah playbook final untuk menyatukan seluruh seri.


1. Core Thesis

Top-tier data access engineer memiliki tiga kemampuan:

1. Correctness:
   transaksi, constraint, isolation, locking, idempotency.

2. Performance:
   query shape, index, batching, connection pressure, read model.

3. Evolvability:
   migration, compatibility, testing, observability, operational runbook.

Tools hanyalah implementasi dari prinsip ini.


2. The Final Mental Model

Every data access operation is a contract:

Input:
  command/query parameters, tenant, actor, expected version

Boundary:
  transaction, consistency, authorization, timeout

Database:
  schema, constraints, indexes, isolation, locks

Output:
  domain result, DTO, event, error, affected rows

Operational:
  metrics, logs, tests, migration compatibility, rollback story

If you cannot describe these, operation is not production-ready.


3. Data Access Responsibility Map

ComponentResponsibility
DAOOwn SQL primitive and mapping
RepositoryOwn aggregate persistence contract
Query ServiceOwn read DTO/projection/search
Unit of WorkTrack changes within transaction if ORM
Transaction ScriptOrchestrate simple use case transaction
Outbox RepositoryDurable integration event write
Inbox RepositoryIdempotent message processing
MigrationVersioned schema evolution
Backfill JobSafe data movement
Read Model ProjectorEventually consistent projection
Cache LayerAccelerate reads with freshness contract

Do not put all responsibilities in one generic repository.


4. Tool Decision Matrix

NeedGood Fit
Primitive explicit SQLJDBC
Object graph aggregate with Unit of WorkJPA/Hibernate
SQL-first type-safe queryjOOQ
Hand-written SQL with mapper XMLMyBatis
Non-blocking relational accessR2DBC
Simple Spring repository over JPASpring Data JPA
Bulk/report/read model SQLjOOQ/MyBatis/JDBC
Stored procedure legacyJDBC/MyBatis/jOOQ
Complex dashboardDTO projection/read model/jOOQ
High-volume batch insertJDBC batch/jOOQ batch
Reactive end-to-end high I/OR2DBC
Imperative high concurrency Java 21JDBC/JPA/jOOQ + virtual threads

Choose by workload, not fashion.


5. JDBC When

Use JDBC when:

  • need full control;
  • simple SQL;
  • library/framework code;
  • batch/streaming with minimal abstraction;
  • no ORM/DSL overhead desired;
  • team comfortable with manual mapping;
  • migration/tooling simple.

JDBC requires discipline:

  • try-with-resources;
  • prepared statements;
  • explicit transaction;
  • error classification;
  • mapping tests;
  • batch failure handling.

6. JPA/Hibernate When

Use JPA/Hibernate when:

  • aggregate lifecycle maps well to entities;
  • Unit of Work/dirty checking valuable;
  • moderate object graph;
  • domain command path benefits from managed entities;
  • team understands fetch/flush/cascade/lazy behavior.

Avoid/limit JPA for:

  • large dashboards;
  • complex reports;
  • bulk updates with audit;
  • huge streaming;
  • SQL-heavy analytics;
  • unbounded graph traversal.

Use projection/query service alongside JPA.


7. jOOQ When

Use jOOQ when:

  • SQL is important and must be type-safe;
  • database-first workflow;
  • generated schema compile safety desired;
  • dynamic SQL/CTE/window/upsert common;
  • read/query/reporting is central;
  • team wants explicit query shape;
  • ORM hidden behavior is unacceptable.

jOOQ requires:

  • codegen workflow;
  • mapping discipline;
  • transaction integration;
  • update count checks;
  • SQL review culture.

8. MyBatis When

Use MyBatis when:

  • team wants hand-written SQL;
  • XML SQL review is preferred;
  • legacy/stored procedure DB;
  • mapping result sets without ORM;
  • dynamic SQL tags are enough;
  • type-safe DSL/codegen not required.

MyBatis requires:

  • #{} vs ${} discipline;
  • resultMap tests;
  • TypeHandler tests;
  • update count checks;
  • dynamic SQL tests;
  • real DB integration.

9. R2DBC When

Use R2DBC when:

  • stack is reactive end-to-end;
  • high concurrency I/O-bound workload;
  • non-blocking drivers mature for your DB;
  • team strong in Reactor;
  • blocking dependencies minimal;
  • transaction complexity manageable.

Avoid R2DBC when:

  • DB is bottleneck;
  • team not reactive-ready;
  • JPA features needed;
  • virtual threads/JDBC sufficient;
  • driver support uncertain.

10. Virtual Threads When

Use virtual threads when:

  • blocking JDBC/JPA/jOOQ/MyBatis code is already mature;
  • thread scalability is concern;
  • Java runtime/framework supports it;
  • imperative simplicity is valuable;
  • reactive rewrite not justified.

Remember:

pool remains bottleneck
query remains bottleneck
locks remain bottleneck

Virtual threads reduce thread cost, not database cost.


11. Command vs Query Split

Command path:

intent-specific transaction
aggregate/source truth
constraints
idempotency
audit/outbox
optimistic/pessimistic concurrency
semantic errors

Query path:

DTO projection
read model
bounded result
sort whitelist
pagination
index
query count budget
tenant/security predicate

Do not force one repository method style for both.


12. Command Path Checklist

  • Command has ID/idempotency if retry possible.
  • Tenant/actor/scope explicit.
  • Expected version included if user edit.
  • Source-of-truth table/aggregate loaded.
  • Transaction boundary at use case.
  • External calls not inside transaction.
  • DB constraints enforce critical invariants.
  • Optimistic/pessimistic strategy chosen.
  • Affected rows checked.
  • Audit written if required.
  • Outbox written atomically.
  • Errors mapped semantically.
  • Retry only for retryable/idempotent failures.
  • Tests cover commit, rollback, conflict, duplicate.

13. Query Path Checklist

  • Query object validates filters/limit/sort.
  • Tenant/security predicate enforced in SQL.
  • Result bounded.
  • DTO projection/read model preferred for lists.
  • Entity graph not leaked.
  • Sort deterministic with tiebreaker.
  • Pagination strategy chosen: offset/keyset/slice.
  • Count query reviewed or avoided.
  • Index supports predicates/order.
  • Query timeout defined.
  • Query named and observable.
  • Query count/N+1 tested.
  • Mapping handles null/enums/time.
  • Generated SQL/plan reviewed if critical.

14. Transaction Design Checklist

  • Boundary at use case/command handler.
  • Transaction is short.
  • No external I/O inside transaction.
  • Isolation level understood.
  • Lock strategy explicit.
  • Lock ordering consistent.
  • Timeout/lock timeout configured.
  • Retry wraps whole transaction.
  • Idempotency supports retry.
  • Bulk/native updates do not leave stale ORM context.
  • Rollback tested.

15. Consistency Pattern Matrix

ProblemPattern
duplicate requestidempotency key
lost update@Version / expected version / conditional update
duplicate unique business keyunique constraint
one active relationshippartial unique index / constraint
capacity reservationatomic conditional update
high contention rowpessimistic lock / queue / sharding
cross-service consistencyoutbox/inbox/saga
read dashboard performanceread model/projection
async duplicate eventinbox dedup + source version
backfill old eventidempotent upsert
no distributed transactionlocal transaction + durable messages

Use database constraints as final guard whenever possible.


16. Error Mapping Matrix

DB/Framework ErrorApplication Meaning
unique constraintduplicate business key/idempotent duplicate
FK violationinvalid reference / not found / invariant bug
check constraintinvalid state/data bug
not-null violationmapper/app bug or invalid command
optimistic lockconflict/stale version
lock timeoutbusy/retry later
deadlockretryable transaction failure
serialization failureretryable transaction failure
query timeoutoverloaded/slow query/fail fast
non-unique resultdata invariant violation
no resultnot found or empty depending contract

Do not leak raw SQL exceptions to API.


17. Performance Failure Model

Common causes:

  • N+1;
  • missing index;
  • slow count;
  • unbounded result;
  • over-fetching entity graph;
  • cartesian explosion;
  • connection pool wait;
  • long transaction;
  • lock contention;
  • batch not batching;
  • full graph merge;
  • cache churn;
  • read model lag/catch-up;
  • async retry storm.

Diagnosis order:

query count -> slow SQL -> plan -> pool wait -> locks -> transaction duration -> workload/queue

18. N+1 Prevention Rule

For list/read endpoint:

Use DTO projection or read model.

For command/detail with aggregate:

Use intent-specific fetch plan.

Test query count.

Do not fix N+1 by making everything EAGER.


19. Pagination Rule

Use offset for:

  • shallow UI pages;
  • small result sets;
  • admin convenience.

Use keyset/cursor for:

  • deep pagination;
  • infinite scroll;
  • high-scale feeds;
  • export traversal.

Use async export for huge data.

Always sort deterministically.


20. Batch/Bulk Rule

Use entity loop when:

  • domain method needed;
  • row count moderate;
  • audit/outbox per row manageable.

Use bulk SQL when:

  • technical update;
  • same transformation;
  • no per-row domain event required;
  • version/audit implications understood.

Use chunked job when:

  • large data;
  • need resume/throttle/audit.

21. Cache Rule

Cache only with freshness contract.

Good candidates:

  • immutable reference data;
  • stable DTO read;
  • read model result with TTL/source version.

Bad candidates:

  • command validation;
  • mutable aggregate state;
  • authorization without strict invalidation;
  • financial/inventory/capacity truth.

Cache must have:

  • key scope;
  • TTL/invalidation;
  • metrics;
  • manual eviction;
  • test for stale safety.

22. Migration Rule

For production schema change:

expand -> deploy compatible code -> backfill -> switch -> enforce -> contract later

Never:

  • edit applied migration;
  • rename/drop during rolling deploy;
  • backfill huge table in one transaction;
  • add constraint without checking dirty data;
  • assume migration tool equals zero downtime.

23. Testing Rule

Mock tests do not prove data access.

You need:

  • real DB integration;
  • migrations applied;
  • mapping tests;
  • constraint tests;
  • rollback tests;
  • query count tests;
  • concurrency tests for critical locks;
  • migration compatibility tests;
  • backfill job tests;
  • performance smoke.

24. Observability Rule

Every critical operation needs names.

Names:

CaseFileRepository.loadForApproval
CaseDashboardQuery.search
OutboxDao.claimNextBatch
CaseStatusBackfillJob.processChunk

Metrics:

  • duration;
  • error count;
  • rows returned/updated;
  • pool wait;
  • timeout;
  • retry;
  • conflict;
  • lock wait;
  • lag/progress for async/backfill.

If not observable, it is not production-ready.


25. Security Rule

Data access is security boundary.

Every query must consider:

  • tenant predicate;
  • authorization scope;
  • raw SQL injection;
  • cache key scope;
  • sensitive columns;
  • redaction;
  • admin bypass explicit;
  • audit.

SQL bug can be data breach.


26. Multi-Tenant Rule

Always design:

  • tenant ID in primary predicates;
  • tenant-aware indexes;
  • tenant-aware cache keys;
  • per-tenant rate limits for expensive work;
  • hot tenant metrics;
  • tenant isolation tests.

For global/admin queries, name them explicitly.


27. Read Model Rule

Use read model when:

  • dashboard joins too many tables;
  • query cost high and repeated;
  • list view needs denormalized fields;
  • search/filter shape differs from write schema;
  • read path can tolerate lag;
  • rebuild process possible.

Read model must have:

  • source version;
  • idempotent projector;
  • lag metric;
  • rebuild/backfill;
  • schema owner;
  • query indexes.

28. Outbox/Inbox Rule

Use outbox when DB state change must publish event.

Use inbox when consuming message must update DB exactly/effectively once.

Outbox requires:

  • event key unique;
  • same transaction as state;
  • publisher claim/lease;
  • idempotent publish/consumer;
  • mark published with ownership.

Inbox requires:

  • message ID unique;
  • payload hash conflict check;
  • ack after commit;
  • retry/dead-letter.

29. Review Rubric: Repository Method

Ask:

  1. What is semantic contract?
  2. Does method name reveal lock/fetch/update behavior?
  3. Is transaction expected outside or inside?
  4. What cardinality?
  5. What errors?
  6. What concurrency protection?
  7. What rows/tables touched?
  8. What tests prove it?

Bad:

save(entity)
findAll()
update(data)

for complex domain without precise semantics.


30. Review Rubric: SQL Query

Ask:

  1. Query name and owner?
  2. Explicit selected columns?
  3. Tenant/security predicate?
  4. Bound parameters?
  5. Raw fragments whitelisted?
  6. Result bounded?
  7. Sort deterministic?
  8. Index supports filter/order?
  9. Count query needed?
  10. Explain plan reviewed?
  11. Mapping nullability handled?
  12. Timeout?
  13. Metrics?

31. Review Rubric: Transaction

Ask:

  1. What is atomic?
  2. What external effects happen?
  3. What happens on rollback?
  4. What happens on timeout?
  5. What happens on duplicate retry?
  6. What locks are acquired?
  7. How long can it run?
  8. What isolation anomalies possible?
  9. Are constraints final guard?
  10. Is retry safe?

32. Review Rubric: Migration

Ask:

  1. Is change additive?
  2. Does old app still work?
  3. Does new app work with partial data?
  4. Is backfill needed?
  5. Is backfill chunked?
  6. Are constraints/indexes safe?
  7. Are applied migrations immutable?
  8. Is rollback/forward fix defined?
  9. Are hidden consumers known?
  10. Is cleanup scheduled?

33. Review Rubric: Backfill/Repair

Ask:

  1. What exact rows affected?
  2. Is dry run available?
  3. Is update idempotent?
  4. Is progress durable?
  5. Is progress saved atomically?
  6. Can it pause/resume?
  7. Is throttle/kill switch present?
  8. Are errors isolated?
  9. Is audit/snapshot needed?
  10. What constraint prevents recurrence?

34. Production Readiness Checklist

A data access feature is production-ready if:

  • schema migration reviewed and safe;
  • app compatibility planned;
  • repository/query contract explicit;
  • transaction/constraint/concurrency strategy defined;
  • SQL/query plan/index reviewed;
  • DTO/mapping tested;
  • rollback/forward-fix story exists;
  • metrics/logs/traces named;
  • tests cover happy/error/rollback;
  • load/pressure implications understood;
  • runbook exists for risky migration/backfill;
  • security/tenant predicates verified;
  • feature flag/canary if risky.

35. Interview-Level Explanation: JPA vs jOOQ

A strong answer:

I use JPA/Hibernate when I want Unit of Work and aggregate lifecycle management,
but I avoid using managed entities as read models. For query-heavy paths I prefer
DTO projections, jOOQ, or read models because query shape is explicit and easier
to tune. In production I care less about tool ideology and more about transaction
boundary, query count, indexes, migration compatibility, and error semantics.

This shows seniority.


36. Interview-Level Explanation: Optimistic vs Pessimistic Lock

Optimistic locking detects stale updates using a version column and is good when
contention is low or user edits can be retried/reloaded. Pessimistic locking
serializes a short critical section by locking rows, useful for high contention
or child-set invariants. I still use constraints as final guards. I don't blindly
retry optimistic conflicts for user commands because that can apply decisions to
state the user never saw.

37. Interview-Level Explanation: N+1

N+1 happens when one initial query is followed by one query per result row,
usually due to lazy loading in mappers or serializers. I prevent it by not
returning entities from read endpoints, using DTO projections/read models for
lists, explicit fetch plans for aggregate commands, and query count tests for
critical endpoints.

38. Interview-Level Explanation: Zero-Downtime Migration

I use expand-contract. First I add backward-compatible schema, deploy code that
can write/read both old and new representations, backfill in chunks, validate
parity, switch reads behind a feature flag, enforce constraints, and only later
drop old schema after all old app versions and jobs are gone. Rollback differs
by phase; early rollback is usually app/flag rollback, late rollback is often
forward fix.

39. Interview-Level Explanation: Idempotency

Retries happen because networks, timeouts, and async delivery are unreliable.
I use idempotency keys or message IDs stored with payload hash and outcome.
Duplicate same key returns same result or skips work; same key with different
payload is conflict. For projections I use source version so old events cannot
overwrite newer read model state.

40. Interview-Level Explanation: Outbox

Outbox solves the problem of updating DB state and publishing an event atomically.
The command transaction writes the state change and outbox row together. A worker
later claims and publishes outbox events with idempotent event keys. Consumers
must also be idempotent. This avoids distributed transaction while giving durable
eventual delivery.

41. Interview-Level Explanation: R2DBC vs Virtual Threads

R2DBC removes blocking threads and is valuable in a reactive end-to-end stack,
but it does not remove database bottlenecks. Virtual threads make blocking JDBC
scale better at the thread level while keeping imperative code. I compare them
with load tests and choose based on bottleneck, driver maturity, team expertise,
and transaction complexity. Pool, timeout, and backpressure remain necessary in
both.

42. Final Architecture Pattern: Hybrid Pragmatism

A strong production architecture often uses:

JPA/Hibernate:
  command aggregate persistence where Unit of Work helps.

jOOQ/MyBatis/JDBC:
  explicit read queries, reports, batch, outbox/inbox.

Migration tool:
  Flyway/Liquibase.

Read model:
  high-traffic dashboards.

Outbox/inbox:
  service integration.

Backfill jobs:
  safe schema/data evolution.

Testcontainers:
  real DB integration tests.

Virtual threads:
  scalable imperative execution if Java/runtime supports it.

This is not overengineering if each piece solves a real problem.


43. Common Bad Architecture

Controller returns JPA entity.
Repository has findAll.
Service mutates detached entities.
No @Version.
No idempotency.
No outbox.
Flyway migration drops column directly.
Dashboard maps entity list to DTO and triggers N+1.
No query count tests.
No real DB integration tests.
Cache used for command validation.
Backfill runs as one huge SQL.

This is incident factory.


44. Good Architecture Example

POST /cases/{id}/approve
  -> ApproveCaseCommand(commandId, caseId, expectedVersion, actor, reason)
  -> @Transactional use case
  -> idempotency start
  -> repository.loadForApproval(caseId)
  -> domain approve()
  -> repository.save with version check
  -> audit insert
  -> outbox insert
  -> idempotency complete
  -> response DTO

GET /cases/dashboard
  -> CaseDashboardQuery(validated filters, sort, cursor)
  -> jOOQ/MyBatis DTO projection or read model
  -> bounded query
  -> query count budget
  -> tenant/security predicate

Clear separation.


45. Data Access Design Document Template

# Data Access Design: <Feature>

## Use Cases
- Commands:
- Queries:

## Schema
- Tables:
- Constraints:
- Indexes:

## Transaction Model
- Boundary:
- Isolation:
- Locks:
- Idempotency:

## Query Model
- DTO/read model:
- Pagination:
- Sort/filter:
- Query plan:

## Migration Plan
- Expand:
- Backfill:
- Switch:
- Contract:

## Observability
- Metrics:
- Logs:
- Alerts:

## Tests
- Unit:
- Integration:
- Concurrency:
- Migration:
- Performance smoke:

## Rollback/Forward Fix

Use for risky changes.


46. Final Failure Checklist

Before launch, ask:

  • What if this command is retried?
  • What if two users click approve at same time?
  • What if DB update commits but HTTP times out?
  • What if outbox publish fails?
  • What if duplicate event is consumed?
  • What if migration partially completes?
  • What if old app version still runs?
  • What if backfill job crashes?
  • What if query returns 100x rows?
  • What if tenant filter is missing?
  • What if cache is stale?
  • What if read model lags?
  • What if DB pool is full?
  • What if lock waits?
  • What if generated SQL changes after library upgrade?

If you can answer, you're operating at senior/principal level.


47. Learning Path After This Series

To go deeper:

  1. PostgreSQL internals and query planning.
  2. Transaction isolation anomalies with real labs.
  3. Advanced indexing: partial, covering, expression, GIN/GiST/BRIN.
  4. Event-driven consistency: outbox/inbox/saga patterns.
  5. High-scale read models and projection rebuild.
  6. Data migration and online schema change at large scale.
  7. Observability for database-heavy services.
  8. Performance engineering and load testing.
  9. Distributed systems failure modeling.
  10. Database reliability engineering.

Data access mastery is lifelong.


48. Final Production Checklist

Use this as final gate.

Correctness

  • Constraints enforce invariants.
  • Transactions short and explicit.
  • Idempotency for retried commands/messages.
  • Optimistic/pessimistic strategy clear.
  • Error translation semantic.
  • Outbox/inbox where needed.

Performance

  • Query shape reviewed.
  • Indexes support query.
  • Results bounded.
  • N+1 tested.
  • Pool pressure monitored.
  • Batch/backfill throttled.

Evolvability

  • Migration expand-contract.
  • Backfill resumable.
  • Old/new app compatibility.
  • Cleanup planned.
  • Tests cover partial migration.

Operability

  • Metrics/logs named.
  • Alerts and dashboards.
  • Runbook for migration/backfill.
  • Kill switch/feature flag.
  • Rollback/forward fix story.

Security

  • Tenant predicate.
  • Authorization predicate.
  • SQL injection protected.
  • Cache key scoped.
  • Sensitive data not logged.

49. Final Mental Shortcut

When reviewing any data access code, scan for:

Where is the transaction?
Where is the constraint?
Where is the index?
Where is the limit?
Where is the tenant predicate?
Where is the version/idempotency?
Where is the query count budget?
Where is the migration compatibility?
Where is the rollback story?
Where is the test?
Where is the metric?

If many are missing, the code is not production-ready.


50. Closing Summary

Across this 60-part series, we covered:

  • mental model and boundaries;
  • JDBC primitives;
  • transactions and consistency;
  • DAO/repository/query patterns;
  • JPA/Hibernate production patterns;
  • jOOQ/MyBatis SQL-first patterns;
  • reactive/R2DBC/async/virtual threads;
  • migration/backfill/testing/playbook.

The core lesson:

Data access is not a framework choice.
Data access is production correctness engineering.

Mastering it means understanding both worlds:

Java application model
+
database execution model

and designing the boundary deliberately.


51. References

Lesson Recap

You just completed lesson 60 in final stretch. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.