
Async Boundaries and Database Pressure

Learn Java Data Access Pattern In Action - Part 051

Async boundaries and database pressure for production Java: queue, backpressure, rate limit, bulkhead, timeout, retry storm, pool exhaustion, worker concurrency, outbox/inbox pressure, load shedding, fairness, and observability.



Async is often misunderstood as a fix for bottlenecks.

In reality, async only moves work to another boundary:

request thread -> executor queue
HTTP request -> message queue
command -> outbox worker
sync call -> async job
blocking wait -> pending future

If the database is the bottleneck, async does not remove that bottleneck.

Async without backpressure can actually make database pressure harder to see, until the pool fills up, queues pile up, retry storms start, and latency collapses.

This part covers async boundaries and how to protect the database in production.


1. Core Thesis

An async boundary is a tool for managing timing, isolation, and resilience.

Async is not a tool for giving the database unlimited capacity.

Main rule:

Every async boundary needs:
- queue limit;
- concurrency limit;
- timeout;
- retry budget;
- backpressure/load shedding;
- observability;
- idempotency;
- failure handling;
- database capacity awareness.

Without these, an async boundary only becomes a place to hide overload.


2. Database Pressure Mental Model

A database has real, finite capacity:

  • CPU;
  • memory;
  • I/O;
  • locks;
  • connection count;
  • transaction throughput;
  • index maintenance;
  • replication lag;
  • checkpoint/vacuum/background work;
  • query planner/statistics;
  • row/page contention.

The application can create pressure through:

  • terlalu banyak concurrent query;
  • query lambat;
  • retry storm;
  • batch terlalu besar;
  • export/report;
  • unbounded async worker;
  • lock contention;
  • missing index;
  • connection leak;
  • long transaction;
  • message backlog catch-up.

Async does not change database capacity. It only changes the arrival pattern of the load.


3. Synchronous vs Async Pressure

Synchronous request:

client waits
app thread waits
DB connection held
failure visible to client

Async job:

client gets accepted
work queued
worker later hits DB
failure may be hidden/retried
backlog can accumulate

Async improves user-perceived latency for accepted work, but total DB work remains.

If arrival rate > processing capacity:

queue grows forever

unless bounded/load-shed.


4. Queue Is Not Capacity

A queue buffers the mismatch between arrival rate and processing rate.

It does not increase processing capacity.

If:

arrival = 1000 jobs/s
database can process = 300 jobs/s

then the backlog grows by:

700 jobs/s (about 2.5 million jobs per hour)

Eventually:

  • memory full;
  • queue storage grows;
  • jobs expire;
  • retry storm;
  • lag SLO violated;
  • user sees stale data;
  • recovery takes long.

A queue must have a size limit and a lag SLO.


5. Little's Law

Useful mental model:

L = λ * W

Where:

  • L: number of items in system;
  • λ: arrival rate;
  • W: average time in system.

If the average job time increases because the DB slows down, queue length grows even if the arrival rate stays the same.

This is why a slow DB query can suddenly create a huge async backlog.
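A worked example with illustrative numbers (assumed, not measured): arrivals stay at 100 jobs/s while a slow query raises the average time in system from 0.5 s to 5 s. Little's law describes a system in steady state, so treat this as a mental model rather than a forecast.

```latex
\begin{aligned}
L_{\text{before}} &= \lambda \, W_{\text{before}} = 100\ \tfrac{\text{jobs}}{\text{s}} \times 0.5\ \text{s} = 50\ \text{jobs} \\
L_{\text{after}}  &= \lambda \, W_{\text{after}}  = 100\ \tfrac{\text{jobs}}{\text{s}} \times 5\ \text{s} = 500\ \text{jobs}
\end{aligned}
```

The arrival rate did not change, yet ten times as many jobs are now in the system. If workers can only hold a fixed number of them in flight, the rest wait in the queue.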


6. Connection Pool as Bulkhead

Connection pool bounds concurrent DB usage.

Example:

maxPoolSize = 50

At most 50 connections are active at once.

But if 10,000 async tasks wait for a connection:

  • memory grows;
  • latency grows;
  • request queues grow;
  • timeouts cascade;
  • retries amplify.

The connection pool alone is not enough. You also need upstream concurrency limits.


7. Application Bulkhead

Use a semaphore or bulkhead per expensive DB path.

Imperative example:

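import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Minimal assumed definition: signals "DB path saturated"; callers map it to 503/429 or a deferred retry.
final class DatabaseBusyException extends RuntimeException {
    DatabaseBusyException() {
        super("database bulkhead full");
    }
}

public final class DbBulkhead {
    private final Semaphore semaphore;

    public DbBulkhead(int permits) {
        this.semaphore = new Semaphore(permits);
    }

    public <T> T execute(Callable<T> operation) throws Exception {
        // Wait at most 100 ms for a permit, then reject instead of queueing without bound.
        if (!semaphore.tryAcquire(100, TimeUnit.MILLISECONDS)) {
            throw new DatabaseBusyException();
        }

        try {
            return operation.call();
        } finally {
            // Only reached after a permit was acquired, so release is always balanced.
            semaphore.release();
        }
    }
}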

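Reactive (Project Reactor):

// At most concurrencyLimit process(item) publishers are subscribed at the same time
Flux.fromIterable(items).flatMap(item -> process(item), concurrencyLimit)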

Bulkhead protects database before connection pool queues explode.


8. Per-Operation Bulkhead

Not every DB operation has the same cost.

Examples:

Operation        | Suggested boundary
---------------- | ------------------------------
simple lookup    | normal pool
dashboard search | per-endpoint concurrency limit
export query     | very low concurrency
backfill job     | dedicated worker concurrency
outbox publisher | bounded batch/worker
admin report     | queue + rate limit
write command    | pool + timeout + idempotency

Do not let report/export consume all connections needed by user commands.
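A minimal sketch of per-operation bulkheads built from plain java.util.concurrent.Semaphore instances, one budget per workload class. The Workload enum, the WorkloadBulkheads class, and the permit counts are hypothetical; the point is that each workload rejects when its own permits run out instead of competing with everything else for the shared pool.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical workload classes; a real system may define more of them.
enum Workload { COMMAND, QUERY, EXPORT }

final class WorkloadBulkheads {
    private final Map<Workload, Semaphore> permits = new EnumMap<>(Workload.class);

    WorkloadBulkheads(int commandPermits, int queryPermits, int exportPermits) {
        permits.put(Workload.COMMAND, new Semaphore(commandPermits));
        permits.put(Workload.QUERY, new Semaphore(queryPermits));
        permits.put(Workload.EXPORT, new Semaphore(exportPermits));
    }

    // Runs the operation only if its own workload has a free permit, else rejects immediately,
    // so a saturated EXPORT budget never consumes COMMAND capacity.
    <T> T run(Workload workload, Supplier<T> operation) {
        Semaphore semaphore = permits.get(workload);
        if (!semaphore.tryAcquire()) {
            throw new IllegalStateException(workload + " bulkhead full");
        }
        try {
            return operation.get();
        } finally {
            semaphore.release();
        }
    }

    int available(Workload workload) {
        return permits.get(workload).availablePermits();
    }
}
```

Keeping the sum of the permits at or below the connection pool size (an assumed sizing policy) means the export budget cannot drain connections that user commands need.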


9. Priority and Fairness

Database pressure should consider priority.

High priority:

  • user write command;
  • outbox/inbox critical processing;
  • health-critical read;
  • small lookup.

Lower priority:

  • export;
  • report;
  • backfill;
  • admin analytics;
  • cache warmup.

Use:

  • separate worker pools;
  • separate DB users/pools if needed;
  • concurrency limits;
  • job priority queues;
  • time windows;
  • rate limits.

Fairness prevents one tenant/job from starving others.


10. Per-Tenant Rate Limit

A multi-tenant system can have a hot tenant.

If one tenant runs a massive export, all tenants suffer unless that tenant is isolated.

Options:

  • per-tenant request rate limit;
  • per-tenant async job concurrency;
  • per-tenant quota;
  • query cost budget;
  • separate read model/export pipeline;
  • queue partitioning;
  • hot tenant isolation.

The data access layer should carry the tenant ID for metrics and policy decisions, but avoid using it as a high-cardinality metric tag unless it is aggregated carefully.
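A sketch of per-tenant async job concurrency, assuming every job carries a tenant ID. TenantConcurrencyLimiter is a hypothetical name; it keeps one semaphore per tenant and tells the caller to defer or reject when that tenant is at its limit, so one hot tenant cannot occupy every worker.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical limiter: one semaphore per tenant caps that tenant's concurrent async jobs.
final class TenantConcurrencyLimiter {
    private final int perTenantLimit;
    private final ConcurrentHashMap<String, Semaphore> byTenant = new ConcurrentHashMap<>();

    TenantConcurrencyLimiter(int perTenantLimit) {
        this.perTenantLimit = perTenantLimit;
    }

    // Returns false when the tenant is already at its limit; the caller should defer or reject.
    boolean tryRun(String tenantId, Runnable job) {
        Semaphore permits = byTenant.computeIfAbsent(tenantId, id -> new Semaphore(perTenantLimit));
        if (!permits.tryAcquire()) {
            return false;
        }
        try {
            job.run();
            return true;
        } finally {
            permits.release();
        }
    }
}
```

For brevity the map never evicts idle tenants; a production version would bound or expire entries.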


11. Backpressure

Backpressure means upstream slows down or rejects work when downstream cannot keep up.

Forms:

  • HTTP 429/503;
  • queue full rejection;
  • worker stops polling messages;
  • bounded flatMap;
  • semaphore reject;
  • circuit breaker open;
  • async job admission control;
  • limit page/export size;
  • require narrower filters.

Without backpressure, overload becomes a memory or latency incident.
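The simplest backpressure in plain Java is a fixed-size executor with a bounded queue and a rejecting policy. This sketch uses only JDK classes (ThreadPoolExecutor, ArrayBlockingQueue, ThreadPoolExecutor.AbortPolicy); the BoundedWorkers wrapper and the sizes are illustrative. Once all workers are busy and the queue is full, execute throws RejectedExecutionException, which the caller can map to HTTP 429/503 or treat as a signal to stop polling.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class BoundedWorkers {
    private BoundedWorkers() {}

    // Fixed worker count plus a bounded queue: excess submissions are rejected, not buffered.
    static ThreadPoolExecutor create(int workers, int queueCapacity) {
        return new ThreadPoolExecutor(
                workers, workers,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy()); // throws RejectedExecutionException when full
    }
}
```

Executors.newFixedThreadPool is not a substitute here: it uses an unbounded LinkedBlockingQueue, which turns overload into memory growth.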


12. Load Shedding

Sometimes the correct answer is to reject.

Example:

Database busy.
Dashboard search rejected quickly with 503.

This is better than:

Request waits 30 seconds, times out, retries, worsens overload.

Load shedding should be:

  • fast;
  • explicit;
  • observable;
  • user/message aware;
  • safe for idempotent retry where appropriate.

13. Timeout Hierarchy

Timeouts must be layered coherently:

HTTP request timeout
  > application operation timeout
    > transaction timeout
      > query timeout
        > lock timeout
          > pool acquisition timeout

Each outer timeout should be greater than the inner budgets it contains, but not much greater.

Bad:

HTTP timeout 2s
DB query timeout 30s

Client gives up while DB keeps working.

Better:

pool wait 100ms
lock wait 200ms
query timeout 800ms
operation timeout 1s
HTTP timeout 1.5s

Adapt per operation.
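One way to keep the hierarchy coherent is to validate it when each operation's configuration is built. This hypothetical TimeoutBudget record checks only the ordering (each layer strictly larger than the one inside it), using the layers from the "Better" example; it does not check the "not much greater" part, does not include a transaction timeout because the example gives none, and enforces nothing at runtime.

```java
import java.time.Duration;

// Hypothetical per-operation timeout budget, ordered from innermost to outermost layer.
record TimeoutBudget(Duration poolWait, Duration lockWait, Duration queryTimeout,
                     Duration operationTimeout, Duration httpTimeout) {
    TimeoutBudget {
        Duration[] innerToOuter = {poolWait, lockWait, queryTimeout, operationTimeout, httpTimeout};
        for (int i = 1; i < innerToOuter.length; i++) {
            // An outer layer that is not longer than an inner one gives up while inner work continues.
            if (innerToOuter[i].compareTo(innerToOuter[i - 1]) <= 0) {
                throw new IllegalArgumentException(
                        "timeout at position " + i + " must exceed the one inside it");
            }
        }
    }
}
```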


14. Pool Acquisition Timeout

If no DB connection is available quickly, fail fast or shed load.

A long pool wait means the system is already overloaded.

Metrics:

db.pool.pending
db.pool.acquire.duration
db.pool.timeout.count

Do not let all requests wait indefinitely for connection.


15. Query Timeout

Query timeout prevents runaway SQL.

But query timeout behavior depends on the driver and the database.

Use a database-side statement timeout if possible.

An application-side timeout that does not cancel the statement on the database may leave the query running.

For critical DBs, verify cancellation semantics.


16. Lock Timeout

Lock wait can consume connection and transaction time.

For interactive operations, use lock timeout/fail-fast.

For background jobs, skip/retry later.

Do not let a worker wait minutes on a locked row unless explicitly intended.


17. Retry Storm

Retry storm occurs when failure causes many clients/workers to retry simultaneously.

Example:

DB slow -> timeouts -> retry immediately -> more DB load -> more timeouts

Mitigations:

  • exponential backoff;
  • jitter;
  • retry budget;
  • circuit breaker;
  • idempotency;
  • classify retryable errors;
  • cap concurrent retries;
  • do not retry overloaded DB blindly.

18. Retry Budget

Retries should have a budget:

max attempts
max elapsed time
max concurrent retries
error types
jitter/backoff
operation idempotency

Not all errors are retryable.

Retry:

  • deadlock;
  • serialization failure;
  • transient network;
  • lock timeout, possibly, for background workers.

Do not retry:

  • validation;
  • authorization;
  • duplicate business key;
  • user-facing optimistic-lock conflict;
  • bad SQL;
  • missing required data.
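A minimal sketch of a retry budget, assuming retries happen in-process around a single operation. RetryPolicy is a hypothetical class, not a library API. It caps attempts and elapsed time, retries only errors the caller classifies as retryable, and sleeps with exponential backoff plus full jitter (a random delay between zero and the capped exponential value) so failing clients do not retry in lockstep.

```java
import java.time.Duration;
import java.util.Random;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical policy: attempt cap, elapsed-time cap, error classification, backoff with full jitter.
final class RetryPolicy {
    private final int maxAttempts;
    private final Duration maxElapsed;
    private final Duration baseDelay;
    private final Duration maxDelay;
    private final Predicate<RuntimeException> isRetryable;
    private final Random random;

    RetryPolicy(int maxAttempts, Duration maxElapsed, Duration baseDelay, Duration maxDelay,
                Predicate<RuntimeException> isRetryable, Random random) {
        this.maxAttempts = maxAttempts;
        this.maxElapsed = maxElapsed;
        this.baseDelay = baseDelay;
        this.maxDelay = maxDelay;
        this.isRetryable = isRetryable;
        this.random = random;
    }

    // Delay after the given (1-based) failed attempt: uniform in [0, min(maxDelay, base * 2^(attempt-1))].
    long delayMillis(int attempt) {
        long exponential = baseDelay.toMillis() << Math.min(attempt - 1, 20);
        long cap = Math.min(maxDelay.toMillis(), exponential);
        return (long) (random.nextDouble() * (cap + 1));
    }

    <T> T call(Supplier<T> operation) {
        long startNanos = System.nanoTime();
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                long delay = delayMillis(attempt);
                long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000;
                boolean withinBudget = attempt < maxAttempts
                        && elapsedMillis + delay <= maxElapsed.toMillis();
                if (!isRetryable.test(e) || !withinBudget) {
                    throw e; // non-retryable, or budget exhausted: surface the last error
                }
                sleep(delay);
            }
        }
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during retry backoff", ie);
        }
    }
}
```

The "max concurrent retries" item is not modeled; one option is to wrap call in a shared Semaphore so only a few retries run at once. In the test, IllegalStateException stands in for a transient error such as a deadlock, and IllegalArgumentException for a validation error.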

19. Circuit Breaker

A circuit breaker can stop calls when the DB or a downstream dependency is failing.

But the database is often a core dependency. If the circuit opens, what is the fallback?

  • reject low-priority reads;
  • serve stale cache where safe;
  • degrade dashboard;
  • stop polling queue;
  • fail fast commands with explicit error;
  • keep health endpoint honest.

A circuit breaker is not a substitute for capacity planning.


20. Async Job Admission

When user requests export:

POST /exports

Do not always accept.

Check:

  • user quota;
  • tenant concurrent export limit;
  • global export queue length;
  • estimated cost;
  • required filters/date range;
  • DB health.

If the queue is full, return:

429 Too Many Requests

or application-specific rejection.

An accepted job should have a durable job ID and status.
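A sketch of the admission check for POST /exports. The record and enum names, the limits, and where ExportLoad comes from (for example, counts from the job table plus a DB health check) are assumptions; user quota and estimated cost are left out to keep it short.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

// Hypothetical decision values; the HTTP mapping is up to the caller.
enum Admission { ACCEPTED, REJECTED_TOO_MANY_REQUESTS, REJECTED_INVALID_REQUEST, REJECTED_DB_UNHEALTHY }

record ExportRequest(String tenantId, LocalDate from, LocalDate to) {}

// Current load snapshot, e.g. counts from the job table plus a DB health check.
record ExportLoad(int globalQueued, int tenantRunning, boolean dbHealthy) {}

final class ExportAdmission {
    private final int maxGlobalQueue;
    private final int maxRunningPerTenant;
    private final long maxRangeDays;

    ExportAdmission(int maxGlobalQueue, int maxRunningPerTenant, long maxRangeDays) {
        this.maxGlobalQueue = maxGlobalQueue;
        this.maxRunningPerTenant = maxRunningPerTenant;
        this.maxRangeDays = maxRangeDays;
    }

    Admission decide(ExportRequest request, ExportLoad load) {
        // Required filters: a present, well-formed, bounded date range.
        if (request.from() == null || request.to() == null || request.to().isBefore(request.from())) {
            return Admission.REJECTED_INVALID_REQUEST;
        }
        if (ChronoUnit.DAYS.between(request.from(), request.to()) > maxRangeDays) {
            return Admission.REJECTED_INVALID_REQUEST;
        }
        if (!load.dbHealthy()) {
            return Admission.REJECTED_DB_UNHEALTHY;
        }
        if (load.globalQueued() >= maxGlobalQueue || load.tenantRunning() >= maxRunningPerTenant) {
            return Admission.REJECTED_TOO_MANY_REQUESTS;
        }
        return Admission.ACCEPTED;
    }
}
```

REJECTED_TOO_MANY_REQUESTS maps naturally to 429 and REJECTED_DB_UNHEALTHY to 503; only ACCEPTED should lead to inserting the durable job row.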


21. Outbox Worker Pressure

Outbox publisher can overload DB if it polls too aggressively.

Controls:

  • batch size;
  • poll interval;
  • max concurrent publishers;
  • claim query timeout;
  • publish concurrency;
  • mark-published batch size;
  • backoff when no rows;
  • backoff on DB error;
  • lease duration;
  • dead-letter policy.

The outbox is an async boundary. Apply pressure controls to it.


22. Outbox Polling Anti-Pattern

Bad:

while (true) {
    List<Event> events = outbox.claim(1000);
    events.parallelStream().forEach(publisher::publish);
}

Problems:

  • huge batch;
  • unbounded publish concurrency;
  • DB pressure marking rows;
  • broker pressure;
  • no backoff;
  • no idempotency clarity.

Better:

claim small batch
publish with bounded concurrency
mark with ownership check
backoff when empty/error
emit metrics
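The better loop can be sketched as one poll cycle that claims a small batch, publishes with a fixed number of threads, marks only what was actually published, and returns a backoff delay. OutboxStore, OutboxPublisher, and the broker Consumer are hypothetical; a real store would implement the claim and the ownership-checked mark in SQL, and metrics are omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical storage port. A SQL version would claim rows with a lease
// and mark them only WHERE claimed_by = workerId (ownership check).
interface OutboxStore {
    List<Long> claim(String workerId, int limit);
    void markPublished(String workerId, List<Long> eventIds);
}

final class OutboxPublisher implements AutoCloseable {
    private final OutboxStore store;
    private final Consumer<Long> broker;        // hypothetical publish call; consumers must dedupe
    private final String workerId;
    private final int batchSize;
    private final long idleBackoffMs;
    private final long errorBackoffMs;
    private final ExecutorService publishPool;  // fixed size = bounded publish concurrency

    OutboxPublisher(OutboxStore store, Consumer<Long> broker, String workerId, int batchSize,
                    int publishConcurrency, long idleBackoffMs, long errorBackoffMs) {
        this.store = store;
        this.broker = broker;
        this.workerId = workerId;
        this.batchSize = batchSize;
        this.idleBackoffMs = idleBackoffMs;
        this.errorBackoffMs = errorBackoffMs;
        this.publishPool = Executors.newFixedThreadPool(publishConcurrency);
    }

    // One poll cycle. Returns how long the caller should sleep before the next cycle.
    long pollOnce() {
        List<Long> batch;
        try {
            batch = store.claim(workerId, batchSize);   // small batch, short claim transaction
        } catch (RuntimeException e) {
            return errorBackoffMs;                      // back off on DB error
        }
        if (batch.isEmpty()) {
            return idleBackoffMs;                       // back off when there is nothing to do
        }
        List<Future<Long>> inFlight = new ArrayList<>();
        for (Long id : batch) {
            inFlight.add(publishPool.submit(() -> { broker.accept(id); return id; }));
        }
        List<Long> published = new ArrayList<>();
        for (Future<Long> future : inFlight) {
            try {
                published.add(future.get());
            } catch (ExecutionException e) {
                // Publish failed: leave the row claimed; it becomes eligible again when the lease expires.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        if (!published.isEmpty()) {
            store.markPublished(workerId, published);
        }
        return published.size() == batch.size() ? 0L : errorBackoffMs;
    }

    @Override
    public void close() {
        publishPool.shutdown();
    }
}
```

A driver loop would call pollOnce() and sleep for the returned delay while a running flag is set. Publishing a batch in parallel gives up ordering; if events for the same aggregate must stay ordered, see Ordering vs Concurrency below.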

23. Inbox Consumer Pressure

A message consumer may ingest messages faster than the DB can process them.

Controls:

  • consumer concurrency;
  • partition ordering;
  • max in-flight messages;
  • ack after DB commit;
  • retry backoff;
  • dead-letter;
  • idempotency table;
  • DB bulkhead.

With a Kafka-like system, pausing partition consumption is a form of backpressure.


24. Ordering vs Concurrency

If events for same aggregate must be ordered, high concurrency can break correctness.

Use:

  • partition by aggregate ID;
  • group processing by aggregate;
  • version check idempotent projection;
  • per-aggregate lock/semaphore;
  • concatMap per group in reactive.

Database pressure control must not violate ordering semantics.
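A sketch of partitioning by aggregate ID inside one JVM: each aggregate ID hashes to one single-threaded lane, so events for the same aggregate run in submission order while different aggregates run in parallel. KeyedLanes is a hypothetical name; the lane count also caps DB concurrency from this consumer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical router: same aggregate ID -> same single-threaded lane -> FIFO order per aggregate.
final class KeyedLanes implements AutoCloseable {
    private final List<ExecutorService> lanes = new ArrayList<>();

    KeyedLanes(int laneCount) {
        for (int i = 0; i < laneCount; i++) {
            lanes.add(Executors.newSingleThreadExecutor());
        }
    }

    void submit(String aggregateId, Runnable handler) {
        int lane = Math.floorMod(aggregateId.hashCode(), lanes.size());
        lanes.get(lane).execute(handler);
    }

    @Override
    public void close() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        for (ExecutorService lane : lanes) {
            try {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

Each lane's queue is unbounded here, so pair it with a max in-flight limit; also note that a slow aggregate delays every other aggregate hashed to the same lane.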


25. Backfill Job Pressure

An uncontrolled backfill can destroy OLTP database performance.

Controls:

  • chunk size;
  • sleep/jitter between chunks;
  • max rows/sec;
  • low priority connection pool;
  • off-peak scheduling;
  • index-friendly cursor;
  • resume cursor;
  • transaction per chunk;
  • query timeout;
  • kill switch;
  • progress metrics.

Backfill should be polite.
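A sketch of a polite backfill loop covering several of the controls above: keyset cursor, fixed chunk size, pause between chunks, kill switch, and a cursor that advances only after a chunk succeeds. Backfill and its function-typed parameters are hypothetical stand-ins for the real chunk query (for example WHERE id > ? ORDER BY id LIMIT ?) and a per-chunk transaction.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BiFunction;
import java.util.function.Consumer;

// Hypothetical runner: keyset cursor, fixed chunks, pause between chunks, kill switch.
final class Backfill {
    private final BiFunction<Long, Integer, List<Long>> fetchIdsAfter; // (lastId, limit) -> next ids
    private final Consumer<List<Long>> processChunk;                  // one short transaction per chunk
    private final AtomicBoolean killSwitch;                           // flipped at runtime, no redeploy
    private final int chunkSize;
    private final long pauseMillis;
    private long cursor;                                              // last committed id; persist to resume

    Backfill(BiFunction<Long, Integer, List<Long>> fetchIdsAfter, Consumer<List<Long>> processChunk,
             AtomicBoolean killSwitch, int chunkSize, long pauseMillis, long startCursor) {
        this.fetchIdsAfter = fetchIdsAfter;
        this.processChunk = processChunk;
        this.killSwitch = killSwitch;
        this.chunkSize = chunkSize;
        this.pauseMillis = pauseMillis;
        this.cursor = startCursor;
    }

    // Runs until no rows remain or the kill switch is set; returns the cursor to persist.
    long run() {
        while (!killSwitch.get()) {
            List<Long> ids = fetchIdsAfter.apply(cursor, chunkSize);
            if (ids.isEmpty()) {
                break;
            }
            processChunk.accept(ids);          // if this throws, the cursor is not advanced
            cursor = ids.get(ids.size() - 1);  // advance only after the chunk succeeded
            try {
                Thread.sleep(pauseMillis);     // leave headroom for OLTP traffic
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return cursor;
    }
}
```

Throughput is roughly bounded by chunk size divided by (pause + chunk processing time); persisting the returned cursor lets the job resume after a crash or a kill-switch stop.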


26. Export/Report Pressure

A large export should not run as a synchronous API query.

Pattern:

request export
validate/admit job
write job row
worker processes chunks
writes file/object storage
user downloads later

Controls:

  • max date range;
  • required filters;
  • per-user/tenant quota;
  • snapshot/cutoff;
  • cursor checkpoint;
  • cancel support;
  • separate concurrency pool.

27. Queue Lag Metrics

For async systems, measure lag.

Metrics:

queue.depth
queue.oldest_age
queue.processing_rate
queue.arrival_rate
job.duration
job.failure.count
outbox.lag.seconds
projection.lag.seconds
backfill.progress.rows

Lag is user-visible freshness/correctness signal.


28. DB Pressure Metrics

Metrics:

db.pool.active
db.pool.idle
db.pool.pending
db.pool.acquire.duration
db.query.duration{query}
db.query.timeout.count{query}
db.transaction.duration{operation}
db.lock_wait.count
db.deadlock.count
db.rows.returned{query}
db.rows.updated{operation}

Correlate with async metrics.

Queue depth rising + DB query latency rising = downstream saturation.


29. Saturation Signals

Signs that the database is under pressure:

  • pool pending grows;
  • query p95/p99 grows;
  • lock waits;
  • deadlocks;
  • CPU high;
  • I/O high;
  • checkpoint/vacuum impact;
  • replication lag;
  • queue lag;
  • retry count grows;
  • timeout count grows.

When saturation happens, reduce load before adding more workers.


30. Worker Autoscaling Caveat

Autoscaling workers can worsen DB overload.

If queue grows because DB is slow, adding workers increases DB concurrency and pressure.

Autoscale based on:

  • queue depth;
  • DB pool wait;
  • DB CPU/IO;
  • query latency;
  • error rate;
  • downstream capacity.

Sometimes the correct autoscaling action is to scale down or pause.
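A sketch of an autoscaling decision that checks DB health before queue depth. The Signals record, the thresholds, and WorkerScaler are hypothetical; real thresholds should come from your own metrics and capacity tests.

```java
// Hypothetical inputs gathered from queue and DB metrics.
record Signals(long queueDepth, double poolPendingP95Ms, double dbCpuPercent, double queryP99Ms) {}

enum ScaleAction { SCALE_UP, HOLD, SCALE_DOWN }

final class WorkerScaler {
    private WorkerScaler() {}

    static ScaleAction decide(Signals s) {
        // Illustrative thresholds only.
        boolean dbSaturated = s.poolPendingP95Ms() > 50 || s.dbCpuPercent() > 80 || s.queryP99Ms() > 500;
        if (dbSaturated) {
            return ScaleAction.SCALE_DOWN;   // more workers would add DB concurrency and pressure
        }
        if (s.queueDepth() > 10_000) {
            return ScaleAction.SCALE_UP;     // backlog while the DB has headroom
        }
        return ScaleAction.HOLD;
    }
}
```

The important property is the order of the checks: a deep queue with a saturated database yields SCALE_DOWN, not SCALE_UP.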


31. Rate Limit at Source

For user-facing API:

  • limit expensive searches;
  • max page size;
  • require filters for export;
  • per-user export quota;
  • per-tenant write rate;
  • debounce auto-refresh;
  • cache or serve from a read model where stale data is safe.

Prevent bad workload before it hits DB.


32. Debounce and Coalesce

If the UI auto-refreshes a dashboard every 2 seconds for many users, the DB sees the same query over and over.

Options:

  • client debounce;
  • server response cache with short TTL if safe;
  • read model;
  • push updates instead of polling;
  • per-user rate limit;
  • request coalescing.

Do not let frontend behavior become accidental DB DDoS.


33. Request Coalescing

If many identical reads arrive:

same tenant, same dashboard, same filter

coalesce in application/cache if freshness allows.

But the cache key must include tenant, scope, and filter.

Do not coalesce security-sensitive queries without scope key.
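A sketch of in-process request coalescing: concurrent callers with an equal key share one in-flight load, and the entry is removed as soon as the load finishes (coalescing, not caching). The key record deliberately includes tenant, user scope, and filter. DashboardKey and RequestCoalescer are hypothetical names, and the loader runs on the first caller's thread for simplicity.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical key: tenant and user scope are part of equality, never just the filter.
record DashboardKey(String tenantId, String userScope, String filter) {}

final class RequestCoalescer<V> {
    private final ConcurrentHashMap<DashboardKey, CompletableFuture<V>> inFlight = new ConcurrentHashMap<>();

    // Concurrent callers with an equal key share one load; the entry is removed when the load ends.
    CompletableFuture<V> get(DashboardKey key, Supplier<V> loader) {
        CompletableFuture<V> created = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, created);
        if (existing != null) {
            return existing;                 // join the load already in progress
        }
        try {
            created.complete(loader.get());
        } catch (RuntimeException e) {
            created.completeExceptionally(e);
        } finally {
            inFlight.remove(key, created);   // nothing is cached after completion
        }
        return created;
    }
}
```

In the test, a nested call simulates a second request for the same key arriving while the first is still loading.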


34. Async Boundary and Idempotency

Async work can run more than once:

  • retry;
  • worker crash after commit before ack;
  • duplicate message;
  • timeout after unknown commit;
  • queue redelivery.

Every async write needs idempotency.

Patterns:

  • command ID;
  • unique event key;
  • inbox table;
  • outbox event ID;
  • projection source version;
  • idempotent upsert;
  • dedup table.

35. Async Boundary and Exactly-Once Illusion

Exactly-once end-to-end is usually an illusion.

Aim for:

at-least-once delivery + idempotent processing

or:

effectively-once observable outcome

Database uniqueness and idempotent updates are key.

Async boundary without idempotency causes duplicate side effects under failure.


36. Timeout and Unknown Outcome

If DB operation times out, outcome may be unknown:

client timed out
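transaction may have committed or rolled back, depending on which layer timed out

Use an idempotency key when retrying.

Do not blindly re-execute a non-idempotent create/update.

A command result table can store the outcome, so a retry with the same key returns it instead of executing again.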
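A minimal in-memory stand-in for a command result table keyed by a client-supplied command ID. In a real system the result row would be written in the same DB transaction as the business change, with a unique constraint on the command ID; the CommandResults class here is hypothetical and only shows the lookup-or-execute behavior.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for a command_result table with a unique command ID.
final class CommandResults {
    private final ConcurrentHashMap<String, String> resultsByCommandId = new ConcurrentHashMap<>();

    // First call with a command ID executes and stores the outcome; later calls return the stored outcome.
    String execute(String commandId, Supplier<String> command) {
        return resultsByCommandId.computeIfAbsent(commandId, id -> command.get());
    }
}
```

A caller that timed out and retries with the same command ID gets the stored outcome instead of triggering a second create.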


37. Bulkhead By Workload

Separate:

  • OLTP command pool;
  • read/query pool;
  • export/backfill pool;
  • worker pool;
  • outbox publisher concurrency.

This can be separate executors, semaphores, or even separate DB roles/pools.

Goal:

low-priority async work cannot starve critical user writes.

38. Separate Read Replica?

A read replica can offload reads, but:

  • replica lag;
  • stale reads;
  • read-your-writes issue;
  • replication pressure;
  • failover complexity;
  • query still needs indexes;
  • cache may store stale replica data.

Use it for reads, queries, and reports only where stale-read tolerance is defined.

Do not validate commands against a replica.


39. Async and Transaction Boundary

Do not keep transaction open across async wait.

Bad:

begin transaction
send message/HTTP request
wait
update DB
commit

Use split-phase:

  1. transaction records intent;
  2. async worker performs external operation;
  3. transaction records result;
  4. state machine ensures durable progress.

40. External Calls Inside Transaction

External calls inside DB transaction:

  • hold connection/locks while waiting;
  • external success + DB rollback inconsistency;
  • DB commit + external fail inconsistency;
  • slow downstream causes DB pressure.

Use outbox/saga/reservation pattern.


41. State Machine for Async Workflow

A long-running async operation should be modeled as a state machine.

Example:

REQUESTED
RESERVED
PROCESSING
COMPLETED
FAILED_RETRYABLE
FAILED_PERMANENT
CANCELLED

Each transition is a short DB transaction.

Workers pick eligible states with bounded concurrency.
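A sketch of the states above as a Java enum with an explicit transition table. The allowed transitions are an assumption for illustration; the comment shows the kind of guarded UPDATE that makes each transition a short compare-and-set transaction.

```java
import java.util.Map;
import java.util.Set;

// States from the list above; the transition table is an illustrative assumption.
enum JobState {
    REQUESTED, RESERVED, PROCESSING, COMPLETED, FAILED_RETRYABLE, FAILED_PERMANENT, CANCELLED;

    private static final Map<JobState, Set<JobState>> ALLOWED = Map.of(
            REQUESTED, Set.of(RESERVED, CANCELLED),
            RESERVED, Set.of(PROCESSING, CANCELLED),
            PROCESSING, Set.of(COMPLETED, FAILED_RETRYABLE, FAILED_PERMANENT, CANCELLED),
            FAILED_RETRYABLE, Set.of(PROCESSING, FAILED_PERMANENT, CANCELLED),
            COMPLETED, Set.of(),
            FAILED_PERMANENT, Set.of(),
            CANCELLED, Set.of());

    // Checked before a compare-and-set update such as:
    // UPDATE job SET status = :next WHERE id = :id AND status = :current
    boolean canMoveTo(JobState next) {
        return ALLOWED.get(this).contains(next);
    }
}
```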


42. Lease Pattern

Worker claim:

update job
set claimed_by = ?,   -- this worker's ID
    claimed_at = ?    -- now
where id = ?
  and status = 'READY'
  and (claimed_at is null or claimed_at < ?)   -- ? = now minus lease duration (lease expired)

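or claim a batch with SELECT ... FOR UPDATE SKIP LOCKED, where the database supports it.

The claim keeps two workers from processing the same job at the same time, and lease expiry keeps a crashed worker from holding a job forever.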

Job processing must still be idempotent because lease can expire.


43. Kill Switch

For dangerous async workload:

  • backfill;
  • export;
  • projection rebuild;
  • reindex;
  • bulk repair;

provide a kill switch or pause flag.

The worker checks it before each chunk.

Do not require redeploy to stop DB overload.


44. Adaptive Throttling

Worker can adjust rate based on DB health:

  • pool wait;
  • query latency;
  • timeout rate;
  • replication lag;
  • CPU.

If DB unhealthy, reduce concurrency or pause.

Keep logic simple and observable.
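A sketch of simple adaptive throttling using additive increase, multiplicative decrease (AIMD): raise the worker concurrency limit by one per healthy interval, and halve it when query latency or the timeout rate crosses a threshold. AimdConcurrency, the 1% timeout threshold, and the choice of signals are assumptions; pool wait, replication lag, or CPU could be added the same way.

```java
// Hypothetical AIMD controller for worker concurrency; thresholds are assumptions.
final class AimdConcurrency {
    private final int min;
    private final int max;
    private int limit;

    AimdConcurrency(int min, int max, int initial) {
        this.min = min;
        this.max = max;
        this.limit = Math.max(min, Math.min(max, initial));
    }

    // Call once per evaluation interval with observed DB health signals.
    synchronized int update(double queryP99Ms, double timeoutRate, double latencyTargetMs) {
        boolean unhealthy = queryP99Ms > latencyTargetMs || timeoutRate > 0.01;
        limit = unhealthy
                ? Math.max(min, limit / 2)    // multiplicative decrease: back off fast
                : Math.min(max, limit + 1);   // additive increase: probe slowly
        return limit;
    }

    synchronized int limit() {
        return limit;
    }
}
```

Exposing the current limit as a metric keeps the throttling observable.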


45. Testing Pressure Controls

Test:

  • queue full rejection;
  • pool acquisition timeout mapping;
  • worker concurrency limit;
  • retry backoff;
  • idempotent duplicate processing;
  • cursor not advanced on failure;
  • cancellation/pause flag;
  • low-priority job cannot consume all permits;
  • outbox publisher bounded concurrency;
  • stale lease reclaim.

Load test for real pressure behavior.


46. Chaos Scenario

Inject:

  • slow DB query;
  • lock wait;
  • deadlock;
  • connection acquisition timeout;
  • broker publish timeout;
  • worker crash after DB commit before ack;
  • duplicate message.

Observe:

  • retries bounded?
  • queue grows bounded?
  • DB protected?
  • idempotency works?
  • alerts fire?
  • recovery automatic?

47. Review Checklist

  • Async boundary has bounded queue.
  • Worker concurrency limited.
  • DB pool pressure monitored.
  • Operation has timeout hierarchy.
  • Retry budget with jitter/backoff.
  • Idempotency key/dedup exists.
  • Queue lag metric exists.
  • Low-priority work cannot starve commands.
  • Export/backfill chunked and resumable.
  • No external call inside DB transaction.
  • State machine for long-running workflow.
  • Lease/claim ownership checked.
  • Kill switch exists for dangerous jobs.
  • Autoscaling considers DB health.
  • Load shedding behavior defined.
  • Tests cover duplicate/retry/rollback.

48. Anti-Pattern: Async as Unlimited Buffer

A queue without a limit is a delayed outage.


49. Anti-Pattern: Add More Workers When DB Is Slow

Can worsen saturation.


50. Anti-Pattern: Retry Immediately Without Jitter

Retry storm.


51. Anti-Pattern: Export as Synchronous Query

Slow client/request holds DB resources.


52. Anti-Pattern: Message Consumer Ack Before DB Commit

If processing fails after the ack, the work is lost.


53. Anti-Pattern: External HTTP Call Inside Transaction

Holds DB resources and creates consistency problem.


54. Anti-Pattern: No Idempotency Across Async Boundary

Duplicate processing is inevitable under failure.


55. Mini Lab

Design async export system:

User requests case export.
Export can include up to 5 million rows.
System must not overload OLTP DB.
User can cancel job.
Job can resume after worker crash.
Result file stored in object storage.

Tasks:

  1. Define job states.
  2. Define admission control.
  3. Define chunk query.
  4. Define cursor/checkpoint.
  5. Define transaction boundary per chunk.
  6. Define worker concurrency.
  7. Define DB timeout.
  8. Define cancellation/kill switch.
  9. Define metrics/alerts.
  10. Define idempotency and retry behavior.

56. Summary

Async boundary is pressure management, not free capacity.

You must master:

  • database pressure model;
  • queue is not capacity;
  • connection pool as bulkhead;
  • application bulkhead;
  • backpressure;
  • load shedding;
  • timeout hierarchy;
  • retry budget;
  • retry storm;
  • per-tenant fairness;
  • outbox/inbox pressure;
  • backfill/export controls;
  • queue lag metrics;
  • DB saturation metrics;
  • autoscaling caveat;
  • idempotency;
  • unknown outcome;
  • state machine;
  • lease pattern;
  • kill switch;
  • adaptive throttling;
  • pressure tests.

The next part covers Virtual Threads and Data Access: how Java virtual threads change the blocking JDBC model, why the pool is still the bottleneck, and how to apply structured concurrency and timeout discipline.

