Deepen PracticeOrdered learning track

SQL Ownership and Reviewability

Learn Java Data Access Pattern In Action - Part 048

SQL ownership dan reviewability untuk Java data access production: query naming, explain plan, schema coupling, generated SQL visibility, migration safety, regulatory defensibility, query catalog, performance budgets, dan code review rubric.

15 min read2812 words
PrevNext
Lesson 4860 lesson track34–50 Deepen Practice
#java#data-access#sql#review+5 more

Part 048 — SQL Ownership and Reviewability

Dalam sistem production, SQL bukan detail kecil di balik ORM/mapper.

SQL adalah kontrak performa, correctness, consistency, security, audit, dan biaya.

Query yang tidak bisa dinamai, direview, dijelaskan, dan diobservasi adalah liability.

Top engineer memperlakukan SQL seperti kode production:

  • ada owner;
  • ada nama;
  • ada contract;
  • ada test;
  • ada explain plan;
  • ada metric;
  • ada migration compatibility;
  • ada rollback story;
  • ada review checklist.

Bagian ini menutup Phase 6 dengan budaya engineering untuk SQL ownership.


1. Core Thesis

SQL ownership berarti setiap query penting punya jawaban jelas:

Who owns this query?
What business use case does it serve?
What tables/indexes does it depend on?
What result cardinality is expected?
What is its latency budget?
How is it tested?
How is it observed?
How does it fail?
What happens when schema changes?

Reviewability berarti query bisa dipahami oleh manusia, bukan hanya "berhasil jalan".


2. Why SQL Ownership Matters

Tanpa ownership:

  • query lambat tidak ada yang merasa bertanggung jawab;
  • schema migration mematahkan aplikasi;
  • ORM generated SQL tidak pernah direview;
  • dashboard query membunuh OLTP DB;
  • tenant predicate hilang;
  • count query lebih mahal dari data query;
  • index dibuat tanpa memahami workload;
  • duplicate query muncul di banyak tempat;
  • audit/regulatory report tidak defensible;
  • incident debugging lambat.

SQL ownership mengubah data access dari "framework magic" menjadi engineering discipline.


3. Query as Production Artifact

Query critical harus diperlakukan seperti artifact:

name: CaseDashboardQuery.search
owner: Case Management Team
purpose: supervisor dashboard list
tables: case_file, officer
latency budget: p95 < 150ms at page size 50
cardinality: max 51 rows
consistency: source OLTP read committed
security: tenant + visible units predicate
indexes: ix_case_tenant_status_updated
tests: integration + query count + tenant isolation

Ini bukan bureaucracy. Ini operational clarity.


4. Query Naming

Every important query should have stable name.

Examples:

CaseDashboardQuery.search
CaseFileRepository.loadForApproval
OutboxDao.claimNextBatch
CaseExportQuery.readAfterCursor
OfficerWorkloadDao.reserveCapacity
ProjectionDao.applyCaseDashboardSnapshot

Use name in:

  • method name;
  • SQL comment;
  • metric tag;
  • logs/traces;
  • dashboard;
  • slow query analysis.

5. SQL Comment

Native/MyBatis/JDBC/jOOQ plain SQL:

/* query=CaseDashboardQuery.search */
select ...

Benefits:

  • DB slow query logs identify app query;
  • tracing easier;
  • incident debugging faster.

Do not put high-cardinality/sensitive values in comments.

Bad:

/* tenant=abc user=123 keyword=secret */

6. ORM Generated SQL Visibility

If using JPA/Hibernate, generated SQL must be inspected for critical path.

Questions:

  • does JPQL generate expected joins?
  • is pagination in DB?
  • is there cartesian explosion?
  • are associations fetched unexpectedly?
  • does count query use distinct?
  • does update include version predicate?
  • does lock generate for update?
  • are binds correct types?
  • are tenant predicates present?

ORM does not exempt query review.


7. Query Catalog

For high-scale systems, maintain lightweight query catalog.

Example:

## CaseDashboardQuery.search

Owner: Case Platform
Source: CaseDashboardQuery.java
Type: read projection
Tables: case_file, officer
Frequency: high
Latency target: p95 < 150ms
Page size: max 100
Consistency: read committed source DB
Indexes:
- ix_case_tenant_status_updated
- ix_officer_id
Notes:
- Use Slice, no total count by default.
- Sort whitelist only.

This can live in docs near code.


8. Explain Plan Culture

Critical query should have explain plan review.

Checklist:

  • index scan vs sequential scan;
  • rows estimated vs actual;
  • join order;
  • nested loop/hash/merge join;
  • sort operation;
  • sort spill;
  • filter selectivity;
  • tenant predicate;
  • limit pushdown;
  • count cost;
  • lock behavior;
  • partition pruning if applicable.

Do not approve performance-critical query only by eyeballing Java code.


9. Explain Plan Is Environment-Sensitive

Plan depends on:

  • database engine/version;
  • statistics;
  • data volume;
  • data distribution;
  • parameter values;
  • tenant size;
  • indexes;
  • configuration;
  • bind peeking/plan cache behavior.

A query fast in dev can fail in production data.

Use staging-like data for important query.


10. Data Distribution Matters

Example:

tenant A: 500 rows
tenant B: 50 million rows

Query with tenant predicate may perform differently for hot tenant.

Status distribution:

OPEN = 1%
ARCHIVED = 90%

Index selectivity matters.

Performance tests should include skew/hot tenants, not only tiny fixtures.


11. Query Plan Regression

Schema/index changes can regress query.

Examples:

  • dropped composite index;
  • changed column type;
  • new nullable column affects filter;
  • changed query sort;
  • ORM version changes SQL;
  • statistics stale;
  • partitioning changed.

Mitigation:

  • migration review includes query impact;
  • performance smoke test;
  • slow query monitoring;
  • query catalog mapping indexes to queries.

12. Schema Coupling

Every query is coupled to schema.

Coupling types:

  • column name/type;
  • constraint name;
  • index availability;
  • FK relationship;
  • enum/check code;
  • nullability;
  • generated value;
  • trigger behavior;
  • partition scheme;
  • view/read model shape.

Make coupling visible.

SQL-first tools like jOOQ catch some coupling at compile time. ORM/string SQL need tests and review.


13. Migration Compatibility

Rolling deployment requires old and new app versions work during migration.

Query change should consider:

old app + old schema
old app + expanded schema
new app + expanded schema
new app + contracted schema

Use expand-contract:

  1. add nullable/new column;
  2. write both/dual read;
  3. backfill;
  4. switch reads;
  5. enforce constraints;
  6. remove old column later.

Do not break query assumptions in same deploy.


14. Constraint Names Are API

Application error translation often depends on constraint names.

Migration:

alter table case_file
add constraint uq_case_file_tenant_case_number
unique (tenant_id, case_number);

App maps:

uq_case_file_tenant_case_number -> DuplicateCaseNumber

Random/generated names hurt maintainability.

Constraint names are part of data access contract.


15. Index Ownership

Index should have purpose.

Bad:

random index created because query was slow once

Good:

ix_case_file_tenant_status_updated
supports:
- CaseDashboardQuery.search(status filter, updated desc)
- CaseExportQuery.readOpenAfter maybe

Document:

  • query supported;
  • columns/order;
  • partial predicate if any;
  • write overhead;
  • removal impact.

Indexes speed reads but slow writes and consume storage.


16. Composite Index Design

Query:

where tenant_id = ?
  and status = ?
order by updated_at desc, id desc
limit ?

Possible index:

create index ix_case_file_tenant_status_updated
on case_file(tenant_id, status, updated_at desc, id desc);

Review:

  • equality predicates first;
  • range/order columns next;
  • tiebreaker included;
  • covering index if needed;
  • cardinality/selectivity;
  • partial index if status subset.

17. Partial Index

If query often searches active rows:

create index ix_case_active_tenant_updated
on case_file(tenant_id, updated_at desc, id desc)
where archived_at is null;

Great when predicate stable and selective.

But query must include matching predicate.

Review planner actually uses it.


18. Count Query Review

Data query:

select ...
from case_file
where tenant_id = ?
order by updated_at desc
limit 50;

Count:

select count(*)
from case_file
where tenant_id = ?;

For huge tenant, count may be expensive.

Alternatives:

  • Slice with limit+1;
  • approximate count;
  • precomputed count;
  • count only on first page;
  • async count.

Review count as separate query.


19. Query Cardinality Contract

Every query should define expected cardinality:

QueryCardinality
find by ID0 or 1
get by IDexactly 1 or error
search page0..limit+1
claim batch0..limit
update by versionexactly 1 or conflict
cleanup bulk0..N
insert1
upsert projection0 or 1 depending old version

Cardinality drives return type and error handling.


20. Security Review

Every tenant/user-scoped query must answer:

  • tenant predicate included?
  • authorization scope included?
  • is scope enforced in SQL before rows leave DB?
  • can cache key leak cross-tenant?
  • can raw sort/filter inject SQL?
  • are sensitive columns selected?
  • is redaction handled?
  • are admin cross-tenant queries explicitly named?

Security bug in query is data breach.


21. SQL Injection Review

Safe:

where status = ?

Unsafe:

where status = '${status}'

Dynamic fragments requiring raw SQL:

  • sort column;
  • direction;
  • table/column selection;
  • function name.

These must use whitelist enum.

Never pass request string directly into SQL.


22. Query Observability

Metrics:

data.query.duration{query}
data.query.error.count{query, error_type}
data.query.rows{returned, query}
data.query.timeout.count{query}
data.query.lock_wait.count{query}
data.query.slow.count{query}

Avoid high-cardinality tags:

  • tenant ID;
  • user ID;
  • case ID;
  • keyword.

Use logs/traces for sampled debugging with safe redaction.


23. Row Count Metric

Returned row count helps detect anomalies.

Example:

CaseDashboardQuery.search returns 51 rows max.

If suddenly returns 10,000, limit bug.

For update:

affected_rows

If update expected 1 but affected 0/2, correctness signal.


24. Query Timeout

Interactive query should have timeout/deadline.

Timeout should be part of contract:

CaseDashboardQuery.search timeout 1s.
CaseExportQuery job chunk timeout 30s.
OutboxClaim timeout 2s.

Timeout response differs:

  • UI query: retry later / refine filter;
  • background job: retry with backoff;
  • command: maybe fail/retry depending idempotency.

25. Lock Review

If query locks:

  • is lock visible in method name?
  • what rows are locked?
  • how long transaction lasts?
  • lock timeout?
  • deadlock risk?
  • lock order?
  • indexes support lock predicate?
  • external calls inside transaction?
  • retry strategy?

Locking SQL must be reviewed carefully.


26. Generated SQL Review in CI

Possible checks:

  • integration tests run and log generated SQL for critical paths;
  • query count tests;
  • migration test applies schema;
  • jOOQ generated classes compile;
  • MyBatis XML parsed;
  • JPA named queries validated;
  • repository/query tests hit real DB.

Do not rely only on unit tests with mocks.


27. Real Database Testing

Use the same engine family as production.

H2 or in-memory DB can miss:

  • SQL dialect differences;
  • locking behavior;
  • isolation semantics;
  • JSON functions;
  • generated keys;
  • case sensitivity;
  • timestamp behavior;
  • constraint names;
  • partial indexes;
  • query plans.

Use Testcontainers or equivalent real DB for data access integration.


28. Query Count Tests

Critical endpoint:

@Test
void dashboardQueryWithinSqlBudget() {
    fixture.createCases(50);

    sqlCounter.reset();

    dashboardUseCase.search(...);

    assertThat(sqlCounter.total()).isLessThanOrEqualTo(2);
}

Catches N+1 and accidental extra query.


29. Explain Plan Test?

Automated explain plan tests can be brittle.

But performance smoke tests can catch big regressions:

  • load representative data sample;
  • run critical query;
  • assert within generous threshold;
  • inspect plan manually for major changes.

For very critical systems, plan regression tooling may be justified.


30. Query Review Rubric

During code review, ask:

  1. What use case is this query for?
  2. What is expected cardinality?
  3. What is max result size?
  4. Is tenant/security scope applied?
  5. Are parameters bound?
  6. Are raw fragments whitelisted?
  7. Is sort deterministic?
  8. Is pagination offset/keyset appropriate?
  9. Does query need count?
  10. What indexes support it?
  11. Is generated SQL reviewed?
  12. What happens on zero/multiple rows?
  13. How is timeout handled?
  14. What metrics/logs identify it?
  15. What tests cover it?

31. SQL Style Guide

Adopt SQL style for readability:

  • explicit column lists, no select *;
  • table aliases clear;
  • predicates one per line;
  • joins explicit;
  • query comment/name for critical SQL;
  • deterministic order;
  • tenant predicate near top;
  • limit visible;
  • CTEs named by meaning;
  • avoid clever one-liners;
  • stable constraint/index names.

Readable SQL is operationally safer.


32. No select *

Reasons:

  • schema change affects result shape;
  • unnecessary IO;
  • mapping ambiguity;
  • sensitive field exposure;
  • index-only scan harder;
  • review unclear.

Use explicit columns.

Exception: internal selectFrom(table) for known simple row might be acceptable, but critical query should prefer explicit fields.


33. Query Drift

Same query logic duplicated in many places drifts.

Example:

active case predicate
status != CLOSED and deleted_at is null

appears in 10 SQL files.

Mitigation:

  • named query method;
  • SQL fragment;
  • view/read model;
  • reusable condition function;
  • test canonical behavior.

Avoid over-centralizing too much, but shared invariant predicates need ownership.


34. View as Query Contract

Database view can centralize complex read logic.

Pros:

  • one SQL definition;
  • shared across consumers;
  • DB-level permission possible.

Cons:

  • migration/versioning;
  • hidden performance cost;
  • old/new app compatibility;
  • view dependency complexity.

Treat view as schema artifact with tests and owner.


35. Regulatory Defensibility

For regulated/reporting systems, query must answer:

  • what data source?
  • what cutoff time?
  • what filters?
  • what version of schema/query?
  • who ran it?
  • what result count/hash?
  • can it be reproduced?
  • were rows snapshot or live?
  • what transformations were applied?

Ad hoc dashboard SQL is not enough for formal report.


36. Report Snapshot Pattern

For critical report:

create report_run
capture parameters
capture cutoff/snapshot
write report rows
store row count/hash
export from report rows

Query becomes reproducible.

Store:

  • query version;
  • migration/schema version if needed;
  • parameters;
  • generated file checksum;
  • actor;
  • created_at;
  • row count.

37. Query Versioning

If report logic changes, version it.

RegulatoryCaseReport.v1
RegulatoryCaseReport.v2

Old reports remain reproducible.

Do not silently change historical report semantics.


38. Schema Coupling in API

If API response maps directly to SQL table columns, schema change breaks clients.

Use API response DTO and mapping.

Data access query can evolve independently from API contract.


39. Read Model Ownership

Read model table is SQL artifact.

Define:

  • source of truth;
  • projector owner;
  • schema owner;
  • freshness SLO;
  • rebuild strategy;
  • indexes;
  • consumers;
  • deletion/archive policy.

Read model without owner becomes stale data swamp.


40. Slow Query Runbook

When slow query alert fires:

  1. Identify query name.
  2. Find owner.
  3. Check recent deploy/migration.
  4. Inspect current SQL and plan.
  5. Check data volume/statistics.
  6. Check index usage.
  7. Check locks/waits.
  8. Check query count/N+1.
  9. Check DB health.
  10. Mitigate: kill, add index, disable feature, reduce limit, switch read model, rollback.
  11. Add test/monitor to prevent recurrence.

41. Query Kill Safety

Some admin/report queries can run too long.

Have runbook:

  • how to identify query by comment/application name;
  • how to cancel safely;
  • what user sees;
  • whether job retries;
  • whether partial output deleted;
  • whether cursor/checkpoint advances.

Design long jobs for cancellation.


42. Query Feature Flags

For risky new query path:

  • deploy behind flag;
  • sample traffic;
  • compare metrics;
  • ramp gradually;
  • fallback to old query/read model;
  • monitor DB load.

Useful for major dashboard/report change.


43. Backward-Compatible Query Change

Example: new column priority_rank.

Step:

  1. add nullable column;
  2. backfill;
  3. app writes both old priority and rank;
  4. query reads rank if present else computes;
  5. enforce not null after complete;
  6. remove old compute later.

Query code must handle transition.


44. Query and Data Retention

Queries over old archived data can become slow.

Strategies:

  • archive partitions;
  • separate history table;
  • partial indexes active data;
  • reporting store;
  • retention policy;
  • query requires date range.

Do not let active dashboard scan 10 years of archive.


45. Query and Partitioning

If table partitioned, query must include partition key.

Example:

where tenant_id = ?
  and created_at >= ?

Without partition predicate, query scans many partitions.

Review generated SQL for partition pruning.


46. Query and Multi-Tenant Hotspot

Hot tenant may need:

  • tenant-specific partitioning;
  • index tuning;
  • rate limits;
  • read model;
  • async export;
  • per-tenant query budgets;
  • query plan analysis with tenant-specific parameter.

Multi-tenant performance is data distribution problem.


47. Query and Cost Budget

Every feature has data cost.

Ask:

Can this endpoint be called by every user every 5 seconds?
What DB cost per call?
What if tenant has 10 million rows?
What if filter absent?
What if keyword has low selectivity?

Set:

  • max page size;
  • required filters for expensive search;
  • async export for large result;
  • timeout.

48. Query and API Design

API should not expose arbitrary SQL flexibility unless intentionally building query product.

Bad:

GET /cases?filter=anyColumn:anyOperator:anyValue&sort=anyColumn

This creates unreviewable database workload.

Better:

  • specific filters;
  • specific sort whitelist;
  • max limit;
  • query object;
  • export job for large operations.

49. Ownership Across Tools

Whether query written in:

  • JPQL;
  • Criteria;
  • jOOQ;
  • MyBatis XML;
  • JDBC string;
  • stored procedure;
  • database view;

ownership rules are same.

Tool does not remove SQL responsibility.


50. Stored Procedure Reviewability

Stored procedure is code.

It needs:

  • version control;
  • migration;
  • tests;
  • owner;
  • explain plan;
  • error contract;
  • transaction semantics;
  • observability;
  • deployment compatibility.

Do not treat stored procedure as invisible database magic.


51. Query Contract Template

# Query Contract: CaseDashboardQuery.search

Owner:
Use case:
Consistency:
Tables/views:
Parameters:
Security scope:
Cardinality:
Pagination:
Sort:
Indexes:
Timeout:
Failure semantics:
Metrics:
Tests:
Migration notes:

Use this for critical queries, not every trivial lookup.


52. SQL Review Checklist

  • Query has owner/name.
  • Use case clear.
  • Explicit column list.
  • Tenant/security predicate present.
  • Parameters bound.
  • Raw fragments whitelisted.
  • Result bounded.
  • Sort deterministic.
  • Count query reviewed/avoided.
  • Index supports filter/order.
  • Explain plan reviewed for critical path.
  • Lock behavior understood.
  • Timeout/deadline defined.
  • Error/cardinality semantics clear.
  • Metrics/logs identify query.
  • Tests run on real DB.
  • Migration compatibility considered.
  • Report/audit requirements satisfied if applicable.

53. Anti-Pattern: Query Without Name

If slow query appears in DB log and nobody knows its source, ownership failed.


54. Anti-Pattern: select *

Leaky and fragile.


55. Anti-Pattern: ORM Generated SQL Never Reviewed

Generated SQL is still SQL.


56. Anti-Pattern: Index Without Owner

Indexes have cost and lifecycle.


57. Anti-Pattern: Dashboard Query Used as Regulatory Report

Reports need reproducibility and evidence.


58. Anti-Pattern: Arbitrary User-Defined Filters Against OLTP

This becomes uncontrolled analytics workload.


59. Mini Lab

Create query contract for:

Supervisor case dashboard search

Include:

  • owner;
  • query name;
  • table dependencies;
  • selected columns;
  • tenant/visibility predicate;
  • filters;
  • sort;
  • pagination;
  • count strategy;
  • indexes;
  • explain plan notes;
  • timeout;
  • row count metric;
  • tests;
  • migration risk;
  • when to move to read model.

Then review it using checklist above.


60. Summary

SQL ownership and reviewability are what make data access sustainable at production scale.

You must master:

  • query naming;
  • query catalog;
  • SQL comments;
  • generated SQL review;
  • explain plan;
  • data distribution awareness;
  • schema coupling;
  • migration compatibility;
  • constraint/index ownership;
  • query cardinality;
  • security and SQL injection review;
  • observability;
  • timeout/lock review;
  • real DB testing;
  • SQL style guide;
  • regulatory defensibility;
  • report snapshot/query versioning;
  • slow query runbook;
  • feature flag/ramp;
  • query contract template.

This closes Phase 6. Part berikutnya masuk Phase 7: Reactive, Async, dan Non-Blocking Data Access, dimulai dari R2DBC mental model.


61. References

Lesson Recap

You just completed lesson 48 in deepen practice. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.