
Client Resilience

Learn Java Jakarta RESTful Web Services / JAX-RS - Part 022

Design resilience for Jakarta REST clients: timeout budgets, retries, circuit breakers, bulkheads, idempotency keys, failure classification, observability, and fallback boundaries.

2026-06-27


Part 022 — Client Resilience: Timeouts, Retries, Circuit Breakers, Idempotency Keys, and Failure Classification

Goal: design outbound REST calls so failures are bounded, classified, observable, and safe to recover from.

A REST client that works in local development is easy. A REST client that behaves correctly under partial failure is hard.

This part answers a different question from Part 021:

Part 021: how do we use Jakarta REST Client API correctly?
Part 022: how do we keep outbound HTTP integrations from damaging the whole system when the remote side is slow, broken, overloaded, inconsistent, or ambiguous?

The core principle:

Resilience is not “retry everything”. Resilience is controlled failure behavior.

In distributed systems, every outbound HTTP call can fail in multiple ways:

DNS failure;
connection failure;
TLS failure;
timeout;
connection pool exhaustion;
request serialization failure;
remote 4xx;
remote 5xx;
invalid response body;
response status/body mismatch;
slow response;
partial response;
rate limiting;
duplicate command execution;
success response lost after remote mutation;
local cancellation after remote side already processed the request.

A top-tier engineer models those cases explicitly instead of hiding them behind RuntimeException.

1. Kaufman Deconstruction

1.1 Target Performance Level

After this part, you should be able to:

assign time budgets to outbound calls;
distinguish timeout types;
classify remote failures into retryable and non-retryable categories;
use retries only when safe;
use idempotency keys for mutation retries;
prevent retry storms;
apply circuit breakers for repeated remote failures;
apply bulkheads to isolate remote dependency saturation;
design fallback behavior without lying to users;
expose meaningful outbound metrics/logs/traces;
test resilience behavior deterministically.

1.2 The Real Skill

The skill is not memorizing annotations. The skill is answering:

“What should this system do when the dependency behaves badly?”

That requires:

protocol knowledge;
domain semantics;
operation idempotency;
latency budgets;
user journey impact;
state consistency model;
observability design;
incident thinking.

2. Failure Taxonomy

Start with a taxonomy.

Not all failures are equal.

Failure	Usually retryable?	Why
invalid request DTO serialization	no	local bug or bad data
bad base URI	no	configuration bug
DNS temporary failure	maybe	infrastructure/transient
connect timeout	maybe	dependency or network issue
read timeout on GET	maybe	operation may be safe
read timeout on POST	dangerous	remote may have processed mutation
`400`	no	caller sent invalid request
`401`	maybe after token refresh	auth may be expired
`403`	no	authorization denial
`404`	context-dependent	may be valid absence
`409`	usually no automatic retry	business conflict
`412`	no blind retry	optimistic concurrency failure
`429`	yes, respecting `Retry-After`	rate limiting
`500`	maybe	remote transient or bug
`503`	maybe, respecting `Retry-After`	remote unavailable
invalid JSON on `200`	no blind retry	contract/runtime bug

Resilience policy starts with classification.

3. Timeout Budgeting

Timeouts are not an implementation detail. They are part of system design.

Without timeouts, your service can wait too long for a dependency. Under load, this can exhaust request threads, connection pools, event loops, memory, or user patience.

3.1 Timeout Types

Timeout	Meaning	Common failure mode if missing
connect timeout	maximum time to establish TCP connection	threads stuck connecting
TLS handshake timeout	maximum time for TLS handshake	secure connection stalls
read/socket timeout	maximum idle/read wait	dependency hangs after accept
request timeout	total allowed time for call	retries exceed user budget
pool acquire timeout	wait for available client connection	hidden queue buildup
DNS timeout	name resolution duration	startup/runtime stalls

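The Jakarta REST Client API standardizes only two of these knobs: ClientBuilder.connectTimeout(...) and ClientBuilder.readTimeout(...). The others in the table (TLS handshake, total request, pool acquire, DNS) are implementation-specific or have to be enforced by your own code. Treat timeout configuration as part of your adapter factory, not scattered call-site code.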

3.2 Budget from User Journey

Do not pick arbitrary values like “30 seconds”. Start from the user journey's SLO.

Example:

User action: Submit escalation
End-to-end target p95: 800 ms
Application processing budget: 150 ms
Database budget: 100 ms
Case registry outbound budget: 250 ms
Evidence service outbound budget: 150 ms
Buffer: 150 ms

Then configure outbound calls accordingly.

3.3 Deadline vs Timeout

A timeout is often local to one attempt. A deadline is the total budget for the operation.

If each retry gets a full timeout without a total deadline, retries can exceed the user request budget.

Bad:

attempt timeout = 2s
max retries = 3
worst-case latency ≈ 8s (4 attempts × 2s), plus backoff
inbound request target = 1s

Better:

total deadline = 800ms
attempt timeout = min(remaining budget, configured per-attempt cap)
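The "Better" rule can be sketched as a small value object. This is a minimal illustration, not a library API: the class name Deadline and its methods are hypothetical, and the clock is injected as a LongSupplier only so that tests can control time.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

/** Hypothetical helper: one fixed point in time that bounds a whole operation, retries included. */
final class Deadline {
    private final LongSupplier nanoClock; // injectable so tests can control time
    private final long expiresAtNanos;

    private Deadline(LongSupplier nanoClock, long expiresAtNanos) {
        this.nanoClock = nanoClock;
        this.expiresAtNanos = expiresAtNanos;
    }

    /** Starts a deadline on the JVM's monotonic clock. */
    static Deadline after(Duration budget) {
        return after(budget, System::nanoTime);
    }

    static Deadline after(Duration budget, LongSupplier nanoClock) {
        return new Deadline(nanoClock, nanoClock.getAsLong() + budget.toNanos());
    }

    /** Budget left for the whole operation, never negative. */
    Duration remaining() {
        long left = expiresAtNanos - nanoClock.getAsLong();
        return left > 0 ? Duration.ofNanos(left) : Duration.ZERO;
    }

    boolean expired() {
        return remaining().isZero();
    }

    /** attempt timeout = min(remaining budget, configured per-attempt cap). */
    Duration attemptTimeout(Duration perAttemptCap) {
        Duration left = remaining();
        return left.compareTo(perAttemptCap) < 0 ? left : perAttemptCap;
    }
}
```

With an 800 ms deadline and a 300 ms cap, the first attempt gets 300 ms; an attempt started 600 ms in gets only the remaining 200 ms. A caller would create one Deadline per inbound request and pass it down to every outbound attempt.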

4. Retry: Use as a Scalpel, Not a Hammer

Retry is useful when failures are transient and the operation is safe to repeat.

Retry is harmful when it:

repeats non-idempotent mutations;
amplifies remote overload;
hides real bugs;
increases tail latency;
creates duplicate case actions;
makes audit trails ambiguous;
violates rate limits;
causes retry storms.

4.1 Retry Decision Matrix

Operation type	Retry policy
safe read `GET`	often retryable for transport/5xx/429
idempotent `PUT` with deterministic body	retryable if safe under contract
`DELETE`	retryable if delete semantics are idempotent
mutation `POST` without idempotency key	avoid automatic retry after ambiguous failure
mutation `POST` with idempotency key	retryable if remote contract guarantees deduplication
workflow transition	retry only with idempotency and state-aware result mapping

4.2 Retryable Failures

Usually retryable:

connect timeout;
connection reset before request body sent;
temporary DNS/network failure;
503 Service Unavailable;
502 Bad Gateway;
504 Gateway Timeout;
429 Too Many Requests when respecting Retry-After;
read timeout for safe reads.

Usually not retryable:

400 Bad Request;
401 Unauthorized unless token refresh is performed;
403 Forbidden;
404 Not Found if absence is valid;
409 Conflict unless conflict resolution is implemented;
412 Precondition Failed;
serialization failure;
invalid response schema;
validation failure.

4.3 Retry with Backoff and Jitter

Bad retry:

retry immediately 3 times

Better:

attempt 1
wait 50ms + jitter
attempt 2
wait 100ms + jitter
attempt 3

Jitter matters because many clients retrying at the same fixed interval can synchronize and overload the recovering service.
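A minimal sketch of the "wait + jitter" schedule above, assuming exponential doubling of the base wait and a uniformly random jitter added on top. The class JitteredBackoff and its parameters are hypothetical names for illustration; other jitter strategies exist and may suit a given dependency better.

```java
import java.time.Duration;
import java.util.Random;

/** Hypothetical helper: exponential backoff plus bounded random jitter. */
final class JitteredBackoff {
    private final long initialMillis;
    private final long maxMillis;
    private final int maxJitterMillis;
    private final Random random;

    JitteredBackoff(Duration initial, Duration max, int maxJitterMillis, Random random) {
        this.initialMillis = initial.toMillis();
        this.maxMillis = max.toMillis();
        this.maxJitterMillis = maxJitterMillis;
        this.random = random;
    }

    /** Base wait before jitter: initial * 2^(retry - 1), capped at max. retry is 1-based. */
    long baseMillis(int retry) {
        if (retry < 1) {
            throw new IllegalArgumentException("retry must be >= 1");
        }
        int shift = Math.min(retry - 1, 62);
        // Compare against the shifted cap first so the left shift below cannot overflow.
        if (initialMillis > (maxMillis >> shift)) {
            return maxMillis;
        }
        return initialMillis << shift;
    }

    /** Wait before the given retry: base wait plus a random 0..maxJitterMillis. */
    Duration delay(int retry) {
        int jitter = random.nextInt(maxJitterMillis + 1);
        return Duration.ofMillis(baseMillis(retry) + jitter);
    }
}
```

With initial = 50 ms, max = 400 ms and up to 25 ms of jitter, the first retry waits 50 to 75 ms and the second 100 to 125 ms. Because each client draws its own random jitter, clients that failed at the same instant no longer retry at the same instant.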

4.4 Retry Budget

Bound retries with:

max attempts;
max elapsed time;
per-attempt timeout;
circuit breaker state;
rate limit awareness;
idempotency eligibility.

Pseudo-policy:

public record RetryPolicy(
    int maxAttempts,
    Duration initialBackoff,
    Duration maxBackoff,
    boolean jitterEnabled
) {}

But do not bury semantics inside RetryPolicy. Operation safety still matters.
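One way to keep operation safety visible is to make the caller state it on every call. The sketch below is an assumption-laden illustration: BoundedRetry, its safeToRepeat flag, and the injected sleeper are hypothetical, backoff is linear with no jitter to keep it short, and only unchecked exceptions are handled.

```java
import java.time.Duration;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical executor: retries only when the caller declares the operation safe to repeat. */
final class BoundedRetry {
    private final int maxAttempts;
    private final Duration backoff;
    private final Predicate<RuntimeException> retryable;
    private final Consumer<Duration> sleeper; // injectable so tests do not really sleep

    BoundedRetry(int maxAttempts, Duration backoff,
                 Predicate<RuntimeException> retryable, Consumer<Duration> sleeper) {
        this.maxAttempts = maxAttempts;
        this.backoff = backoff;
        this.retryable = retryable;
        this.sleeper = sleeper;
    }

    /** Default sleeper for production use. */
    static void threadSleep(Duration d) {
        try {
            Thread.sleep(d.toMillis());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during backoff", e);
        }
    }

    <T> T execute(Supplier<T> call, boolean safeToRepeat) {
        for (int attempt = 1; ; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                // All three conditions must hold: safe operation, budget left, transient failure.
                boolean mayRetry = safeToRepeat && attempt < maxAttempts && retryable.test(e);
                if (!mayRetry) {
                    throw e;
                }
                sleeper.accept(backoff.multipliedBy(attempt));
            }
        }
    }
}
```

A read would call execute(call, true); a POST without an idempotency key would call execute(call, false) and therefore run exactly once, whatever the policy numbers say.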

5. Idempotency Keys

Idempotency keys solve one of the hardest mutation problems:

The client timed out after sending a mutation. Did the remote service execute it?

For non-idempotent commands, a retry can create duplicates.

Example:

POST /cases/CASE-1/escalations

If the client times out after the remote service creates the escalation, retrying the same POST can create a second escalation unless the remote API deduplicates.

5.1 Idempotency Key Model

Client sends:

Idempotency-Key: 7b8f7f6e-6b60-4a7c-b7c5-1a4e39c6f999

Remote server stores:

idempotency_key -> operation result

When the same key is received again for the same operation, the server returns the same result or an equivalent duplicate-safe response.
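The server-side mapping can be sketched with an in-memory map. This is only an illustration of the deduplication idea, assuming a single process: InMemoryIdempotencyStore is a hypothetical name, and a real implementation would need durable storage, key expiry, and a check that a repeated key carries the same request payload.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/** Hypothetical server-side sketch: idempotency_key -> operation result, in memory only. */
final class InMemoryIdempotencyStore<R> {
    private final ConcurrentMap<String, R> results = new ConcurrentHashMap<>();

    /**
     * Runs the operation at most once per key. Later calls with the same key
     * return the stored result instead of executing the operation again.
     * If the operation throws, nothing is stored and the key can be retried.
     */
    R executeOnce(String idempotencyKey, Supplier<R> operation) {
        return results.computeIfAbsent(idempotencyKey, key -> operation.get());
    }
}
```

Two requests with the same key yield one escalation and the same response; a request with a different key is treated as a new operation.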

5.2 Client-Side Rule

Generate the idempotency key from command identity, not from each HTTP attempt.

Bad:

for (int attempt = 0; attempt < 3; attempt++) {
    String key = UUID.randomUUID().toString(); // wrong: new key per attempt
    send(command, key);
}

Better:

String key = command.idempotencyKey();
for (int attempt = 0; attempt < 3; attempt++) {
    send(command, key);
}

5.3 Domain-Derived Idempotency Key

For regulatory/case-management commands, the key can be derived from stable command identity:

public record EscalationCommand(
    String caseId,
    String requestedBy,
    String reasonCode,
    Instant requestedAt,
    String commandId
) {
    public String idempotencyKey() {
        return commandId;
    }
}

Do not derive idempotency key from mutable fields unless you want changes to create a new operation identity.

5.4 Idempotency Scope

Define scope clearly:

Scope	Example
per endpoint	key unique for `/escalations` only
per tenant	key unique within tenant
per actor	key unique per user/actor
global	key unique across service
per command type	key unique for escalation commands only

Ambiguous scope causes false duplicates or duplicate misses.

6. Circuit Breaker

A circuit breaker prevents repeated calls to a failing dependency.

States:

State	Meaning
closed	calls allowed; failures counted
open	calls fail fast; remote dependency protected
half-open	limited probe calls allowed
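The three states can be sketched as a small state machine. This is a deliberately simplified illustration with hypothetical names: it opens after a number of consecutive failures, counts every RuntimeException as a failure, and does not limit the number of concurrent probes in the half-open state. A production breaker would use a failure ratio over a window and count only failures classified as dependency problems.

```java
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical minimal breaker: closed -> open -> half-open -> closed. */
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    static final class CircuitOpenException extends RuntimeException {
        CircuitOpenException() {
            super("circuit open: failing fast");
        }
    }

    private final int failureThreshold;
    private final long openNanos;
    private final LongSupplier nanoClock; // injectable so tests can control time
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAtNanos;

    SimpleCircuitBreaker(int failureThreshold, Duration openDuration, LongSupplier nanoClock) {
        this.failureThreshold = failureThreshold;
        this.openNanos = openDuration.toNanos();
        this.nanoClock = nanoClock;
    }

    /** Current state; an open circuit becomes half-open once the open duration has elapsed. */
    synchronized State state() {
        if (state == State.OPEN && nanoClock.getAsLong() - openedAtNanos >= openNanos) {
            state = State.HALF_OPEN;
        }
        return state;
    }

    <T> T execute(Supplier<T> call) {
        if (state() == State.OPEN) {
            throw new CircuitOpenException(); // fail fast, remote is not called
        }
        try {
            T result = call.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            throw e;
        }
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        // A failed probe reopens immediately; otherwise open at the threshold.
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAtNanos = nanoClock.getAsLong();
        }
    }
}
```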

6.1 When Circuit Breaker Helps

Use circuit breaker when:

dependency has repeated failures;
calls are expensive;
failure consumes scarce resources;
failing fast serves users better than waiting for a timeout;
fallback or graceful degradation exists;
you need to protect your own service from thread/connection exhaustion.

6.2 When Circuit Breaker Hurts

A bad circuit breaker can:

block recovery too long;
fail fast when dependency is healthy again;
hide partial regional recovery;
affect all tenants because one tenant triggers failures;
interact badly with retries;
create confusing error behavior.

6.3 Circuit Breaker Dimensions

Do not always use one global circuit per remote service.

Possible dimensions:

remote service;
endpoint/operation;
tenant;
region;
auth mode;
criticality.

Example:

case-registry:GET:/cases/{id}
case-registry:POST:/cases/{id}/escalations
case-registry:GET:/reference-data

A failure in reference data should not necessarily open the breaker for escalation submission.

7. Bulkhead

A bulkhead limits how much concurrency one dependency or operation can consume.

Without bulkheads, a slow remote dependency can consume all request threads or all outbound connections.

7.1 Bulkhead Types

Bulkhead type	Meaning
semaphore bulkhead	limit concurrent calls
thread-pool bulkhead	isolate execution threads
connection pool limit	limit physical remote connections
queue limit	limit waiting tasks
rate limit	limit call frequency

7.2 Bulkhead Failure Is a Real Failure

If a bulkhead is full, the service should return a classified failure quickly.

throw new RemoteDependencySaturatedException("case-registry");

Do not let bulkhead queues grow unbounded. Queues can turn overload into latency and memory failure.
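A semaphore bulkhead with no queue is the simplest form of this rule. The sketch below is a hedged illustration: SemaphoreBulkhead is a hypothetical name, the exception mirrors the RemoteDependencySaturatedException used above, and the choice to reject immediately rather than wait briefly is an assumption.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

/** Hypothetical semaphore bulkhead: bounded concurrency, no waiting queue. */
final class SemaphoreBulkhead {
    static final class RemoteDependencySaturatedException extends RuntimeException {
        RemoteDependencySaturatedException(String dependency) {
            super("bulkhead full for " + dependency);
        }
    }

    private final String dependency;
    private final Semaphore permits;

    SemaphoreBulkhead(String dependency, int maxConcurrentCalls) {
        this.dependency = dependency;
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    <T> T execute(Supplier<T> call) {
        // tryAcquire() returns immediately, so overload becomes a fast, classified failure.
        if (!permits.tryAcquire()) {
            throw new RemoteDependencySaturatedException(dependency);
        }
        try {
            return call.get();
        } finally {
            permits.release();
        }
    }
}
```

Because the permit is released in a finally block, a failed or slow call cannot leak capacity permanently.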

8. Fallbacks

Fallback is not a synonym for “hide the error”.

Good fallback examples:

serve cached reference data;
return partial view with warning;
enqueue command for later processing if business allows async acceptance;
use stale read model for non-critical dashboard;
degrade optional enrichment.

Bad fallback examples:

return fake success for a failed mutation;
silently skip audit write;
mark escalation created when remote registry failed;
use stale authorization data for sensitive action;
suppress evidence upload failure.

8.1 Fallback Decision Table

Operation	Fallback allowed?	Why
fetch non-critical display label	yes	can degrade UI
fetch case before legal decision	maybe	depends on data freshness requirement
submit enforcement action	usually no fake success	legal/audit consequences
upload evidence	no silent fallback	evidentiary integrity
send notification	maybe async retry	notification may be eventually consistent

In regulated systems, fallback must preserve truthfulness.

A fallback may degrade experience; it must not falsify state.
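For read-side fallbacks such as cached reference data, one way to stay truthful is to make staleness part of the return type. This is a minimal sketch with hypothetical names (StaleAwareLookup, Result), assuming a single cached value and no explicit staleness limit; it is not suitable for mutations.

```java
import java.util.function.Supplier;

/** Hypothetical read-side fallback: serves the last good value, but says so. */
final class StaleAwareLookup<T> {
    /** The stale flag travels with the value so callers and UIs can show a warning. */
    record Result<V>(V value, boolean stale) {}

    private volatile T lastGood; // null until the first successful fetch

    Result<T> fetch(Supplier<T> remoteCall) {
        try {
            T fresh = remoteCall.get();
            lastGood = fresh;
            return new Result<>(fresh, false);
        } catch (RuntimeException e) {
            T cached = lastGood;
            if (cached == null) {
                throw e; // nothing truthful to serve, so the failure stays visible
            }
            return new Result<>(cached, true);
        }
    }
}
```

The degraded path never pretends the remote call succeeded: the caller receives either a value explicitly marked stale or the original failure.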

9. MicroProfile Fault Tolerance

In Jakarta/MicroProfile environments, MicroProfile Fault Tolerance provides standard annotations for resilience patterns such as:

@Timeout;
@Retry;
@CircuitBreaker;
@Bulkhead;
@Fallback;
@Asynchronous.

Example:

@ApplicationScoped
public class CaseRegistryGateway {

    @Retry(maxRetries = 2, delay = 100, jitter = 50)
    @Timeout(300)
    @CircuitBreaker(requestVolumeThreshold = 20, failureRatio = 0.5, delay = 1000)
    public CaseSnapshot getCase(String caseId) {
        return callCaseRegistry(caseId);
    }
}

This is useful, but annotations are not magic. You still need operation-specific semantics.

9.1 Annotation Risk

Bad:

@Retry(maxRetries = 3)
public EscalationResult submitEscalation(EscalationCommand command) {
    return remote.submit(command);
}

If the command is non-idempotent, this can duplicate an escalation.

Better:

@Retry(maxRetries = 2, retryOn = RetryableRemoteFailure.class)
public EscalationResult submitEscalation(EscalationCommand command) {
    requireIdempotencyKey(command);
    return remote.submit(command);
}

And ensure the remote API actually honors the idempotency key.

9.2 Classify Before Retry

Your adapter should throw meaningful exceptions:

sealed class RemoteCallFailure extends RuntimeException permits
    RetryableRemoteFailure,
    NonRetryableRemoteFailure,
    AmbiguousMutationFailure {}

Then resilience policy can decide:

@Retry(
    maxRetries = 2,
    retryOn = RetryableRemoteFailure.class,
    abortOn = {
        NonRetryableRemoteFailure.class,
        AmbiguousMutationFailure.class
    }
)
public CaseSnapshot findCase(String caseId) {
    return adapter.findCaseOrThrow(caseId);
}

9.3 Keep Resilience Outside Raw HTTP Mechanics

Do not implement ad-hoc retry loops inside every Jakarta REST method. Prefer a resilience layer around the adapter or use platform mechanisms consistently.

The adapter knows HTTP. The resilience wrapper knows retry/breaker/bulkhead policy. Sometimes they live in the same class, but keep the concerns mentally separate.

10. Failure Classification Design

Create a classification model that separates:

caller errors;
dependency failures;
security failures;
concurrency conflicts;
throttling;
ambiguous mutation state;
contract violations.

Example:

public sealed interface RemoteFailure permits
    RemoteFailure.CallerRejected,
    RemoteFailure.AuthenticationFailed,
    RemoteFailure.AuthorizationDenied,
    RemoteFailure.NotFound,
    RemoteFailure.Conflict,
    RemoteFailure.PreconditionFailed,
    RemoteFailure.RateLimited,
    RemoteFailure.DependencyUnavailable,
    RemoteFailure.Timeout,
    RemoteFailure.ContractViolation,
    RemoteFailure.AmbiguousMutation {

    record CallerRejected(String code, String message) implements RemoteFailure {}
    record AuthenticationFailed() implements RemoteFailure {}
    record AuthorizationDenied() implements RemoteFailure {}
    record NotFound(String resource) implements RemoteFailure {}
    record Conflict(String code) implements RemoteFailure {}
    record PreconditionFailed(String expectedVersion) implements RemoteFailure {}
    record RateLimited(Duration retryAfter) implements RemoteFailure {}
    record DependencyUnavailable(String service) implements RemoteFailure {}
    record Timeout(String phase) implements RemoteFailure {}
    record ContractViolation(String detail) implements RemoteFailure {}
    record AmbiguousMutation(String operationId) implements RemoteFailure {}
}

Then map to exceptions or domain results.

10.1 HTTP Status Mapping

Status	Common classification
400	caller rejected / validation mismatch
401	authentication failed
403	authorization denied
404	not found or valid absence
409	conflict / state transition rejected
412	precondition failed / stale version
422	semantic validation failed
429	rate limited
500	dependency failure
502	upstream gateway failure
503	dependency unavailable
504	upstream timeout

Do not blindly map all 4xx to “client error” at the application level. If your service calls another service, a downstream 400 may indicate your adapter sent an invalid contract. That is often your service bug.
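The table can be turned into a first-pass classifier. This is a sketch, not a complete policy: StatusClassifier and the FailureClass names are hypothetical, the defaults for unlisted codes are assumptions, and 404 in particular still needs an operation-specific decision by the caller.

```java
/** Hypothetical first-pass mapping from HTTP status to a failure class. */
final class StatusClassifier {
    enum FailureClass {
        CALLER_REJECTED, AUTHENTICATION_FAILED, AUTHORIZATION_DENIED, NOT_FOUND,
        CONFLICT, PRECONDITION_FAILED, SEMANTIC_VALIDATION_FAILED, RATE_LIMITED,
        DEPENDENCY_FAILURE, UPSTREAM_GATEWAY_FAILURE, DEPENDENCY_UNAVAILABLE,
        UPSTREAM_TIMEOUT, UNCLASSIFIED
    }

    static FailureClass classify(int status) {
        return switch (status) {
            case 400 -> FailureClass.CALLER_REJECTED;
            case 401 -> FailureClass.AUTHENTICATION_FAILED;
            case 403 -> FailureClass.AUTHORIZATION_DENIED;
            case 404 -> FailureClass.NOT_FOUND; // may be a valid absence for some operations
            case 409 -> FailureClass.CONFLICT;
            case 412 -> FailureClass.PRECONDITION_FAILED;
            case 422 -> FailureClass.SEMANTIC_VALIDATION_FAILED;
            case 429 -> FailureClass.RATE_LIMITED;
            case 500 -> FailureClass.DEPENDENCY_FAILURE;
            case 502 -> FailureClass.UPSTREAM_GATEWAY_FAILURE;
            case 503 -> FailureClass.DEPENDENCY_UNAVAILABLE;
            case 504 -> FailureClass.UPSTREAM_TIMEOUT;
            default -> {
                // Assumption: unlisted codes fall back to their status family.
                if (status >= 500 && status <= 599) yield FailureClass.DEPENDENCY_FAILURE;
                if (status >= 400 && status <= 499) yield FailureClass.CALLER_REJECTED;
                yield FailureClass.UNCLASSIFIED;
            }
        };
    }

    /** Only a candidate: the operation must still be safe to repeat. */
    static boolean retryCandidate(FailureClass failure) {
        return switch (failure) {
            case RATE_LIMITED, DEPENDENCY_FAILURE, UPSTREAM_GATEWAY_FAILURE,
                 DEPENDENCY_UNAVAILABLE, UPSTREAM_TIMEOUT -> true;
            default -> false;
        };
    }
}
```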

11. Ambiguous Mutation Failure

Ambiguous failure is the most dangerous category.

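Scenario:

the client sends a mutation POST;
the remote service processes it and commits the change;
the response is lost, or the client times out before reading it;
the client cannot tell whether the mutation happened.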

If the client retries without idempotency, it may create duplicate state.

11.1 Safe Handling

If idempotency is supported:

try {
    return submitWithRetry(command);
} catch (TimeoutException e) {
    return queryByIdempotencyKey(command.idempotencyKey())
        .orElseThrow(() -> new AmbiguousMutationFailure(
            command.commandId(), command.idempotencyKey(), e));
}

If idempotency is not supported:

do not blindly retry;
record ambiguous outcome;
surface operational follow-up;
reconcile using remote query if possible;
design the API better next iteration.

11.2 Regulatory Implication

For enforcement/case systems, ambiguous mutation must be auditable.

Record:

command ID;
idempotency key;
actor;
timestamp;
target endpoint;
request hash;
timeout/failure phase;
retry attempts;
reconciliation result.

This is not over-engineering. It is how you prove what the system attempted and what it knows.

12. Rate Limiting and `Retry-After`

When the remote API returns 429 Too Many Requests, the response may include Retry-After.

Client behavior:

parse Retry-After if present;
cap it by local deadline;
avoid retrying if user journey cannot wait;
record rate limit metrics;
consider per-tenant throttling upstream.

Example:

private Optional<Duration> retryAfter(Response response) {
    String value = response.getHeaderString("Retry-After");
    if (value == null || value.isBlank()) {
        return Optional.empty();
    }

    try {
        long seconds = Long.parseLong(value);
        return Optional.of(Duration.ofSeconds(seconds));
    } catch (NumberFormatException ignored) {
        return Optional.empty();
    }
}

In production, also support HTTP-date form if the remote API uses it.
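A sketch of a parser that accepts both forms, delta-seconds and HTTP-date. It takes the raw header value rather than a Response so it can be tested without a container. RetryAfterParser is a hypothetical name; the date branch assumes the preferred IMF-fixdate format and ignores the obsolete date formats, and a date in the past is treated as "retry now".

```java
import java.time.Clock;
import java.time.DateTimeException;
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Optional;

/** Hypothetical parser for Retry-After: delta-seconds or HTTP-date (IMF-fixdate). */
final class RetryAfterParser {
    static Optional<Duration> parse(String value, Clock clock) {
        if (value == null || value.isBlank()) {
            return Optional.empty();
        }
        String trimmed = value.trim();
        try {
            long seconds = Long.parseLong(trimmed);
            // Negative values are not meaningful here, so they are ignored.
            return seconds >= 0 ? Optional.of(Duration.ofSeconds(seconds)) : Optional.empty();
        } catch (NumberFormatException notDeltaSeconds) {
            // Not a number: fall through and try the date form.
        }
        try {
            Instant retryAt = ZonedDateTime
                .parse(trimmed, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant();
            Duration wait = Duration.between(clock.instant(), retryAt);
            return Optional.of(wait.isNegative() ? Duration.ZERO : wait);
        } catch (DateTimeException notHttpDate) {
            return Optional.empty();
        }
    }
}
```

In an adapter this would be called as RetryAfterParser.parse(response.getHeaderString("Retry-After"), Clock.systemUTC()). Note that the date form depends on reasonably synchronized clocks between client and server.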

12.1 Do Not Retry Past Deadline

Duration wait = retryAfter.orElse(defaultBackoff);
if (deadline.remaining().compareTo(wait.plus(perAttemptTimeout)) < 0) {
    throw new RemoteRateLimitedException(wait);
}

13. Token Refresh and `401`

A 401 might mean:

token expired;
token invalid;
wrong audience;
missing scope;
clock skew;
authentication server issue.

Do not retry indefinitely.

Typical strategy:

call remote API;
if 401, refresh token once;
retry once if the operation is safe or no mutation was sent;
if still 401, classify as auth failure.

For mutation POSTs, token refresh retry can still be ambiguous if the first request reached the server and the 401 was generated after some processing. Usually authentication happens before mutation, but design should not depend on wishful thinking for high-risk operations.

14. Connection Pool and Saturation

Timeouts and retries are not enough. The client also needs connection pool discipline.

Risks:

too few connections: unnecessary queuing;
too many connections: remote overload;
no pool acquire timeout: hidden wait;
per-request client creation: no effective reuse;
unclosed responses: pool exhaustion;
long streaming responses consuming all connections.

14.1 Pool Sizing Questions

Ask:

How many concurrent inbound requests can trigger this remote call?
How many calls per inbound request?
What is p95/p99 remote latency?
What is the remote service's allowed QPS/concurrency?
Are calls streaming or short-lived?
Are there separate pools per remote service?
Is there tenant isolation?

14.2 Basic Estimate

If expected QPS is 100 and p95 latency is 100 ms:

concurrency ≈ qps * latencySeconds
concurrency ≈ 100 * 0.1 = 10

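Then add margin, but do not exceed downstream capacity. The formula is Little's Law, which strictly uses mean latency; plugging in p95 gives a deliberately conservative figure. This is an estimate, not a substitute for load testing.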
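The estimate is easy to encode so that the inputs are explicit and reviewable. A minimal sketch: PoolSizing, the headroom factor, and the downstream limit parameter are hypothetical names for illustration, and the numbers in the usage note are the ones from the example above.

```java
import java.time.Duration;

/** Hypothetical sizing helper based on concurrency = qps * latency. */
final class PoolSizing {
    /** Estimated in-flight calls, rounded up. */
    static int estimateConcurrency(double qps, Duration latency) {
        return (int) Math.ceil(qps * latency.toNanos() / 1_000_000_000.0);
    }

    /** Adds headroom (for example 1.5 for 50% margin), then caps at what downstream allows. */
    static int poolSize(double qps, Duration latency, double headroom, int downstreamLimit) {
        int withMargin = (int) Math.ceil(estimateConcurrency(qps, latency) * headroom);
        return Math.min(withMargin, downstreamLimit);
    }
}
```

For 100 QPS at 100 ms the estimate is 10 concurrent calls. With 50% headroom that becomes 15, unless the downstream service only allows 12, in which case the cap wins.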

15. Resilience Policy by Operation Type

15.1 Read Operation

public Optional<CaseSnapshot> findCase(String caseId) {
    return retryPolicy.execute(() -> adapter.findCase(caseId));
}

Policy:

timeout: short;
retry: yes for transient failure;
circuit breaker: yes;
bulkhead: yes;
fallback: maybe cached/stale if allowed;
idempotency key: not needed for pure read.

15.2 Mutation Command

public EscalationSubmissionResult submitEscalation(EscalationCommand command) {
    requireIdempotencyKey(command);
    return mutationRetryPolicy.execute(() -> adapter.submitEscalation(command));
}

Policy:

timeout: bounded;
retry: only with idempotency;
circuit breaker: yes;
bulkhead: yes;
fallback: no fake success;
reconciliation: yes for ambiguous failure.

15.3 Notification Operation

public NotificationResult sendNotification(NotificationCommand command) {
    return notificationPolicy.execute(() -> adapter.send(command));
}

Policy:

timeout: short;
retry: maybe async/background queue;
circuit breaker: yes;
fallback: enqueue for later;
user response: usually should not block the critical transaction unless the notification is required.

15.4 Reference Data Lookup

Policy:

timeout: very short;
retry: maybe once;
fallback: cached data;
circuit breaker: yes;
stale tolerance: explicit.

16. Observability for Resilience

Resilience without observability is just hidden failure.

16.1 Metrics

Minimum metrics:

Metric	Labels
outbound request count	service, operation, method, status_family
outbound duration	service, operation, method
outbound failure count	service, operation, failure_type
retry count	service, operation, attempt
timeout count	service, operation, phase
circuit breaker state	service, operation
bulkhead rejection count	service, operation
rate limited count	service, operation
ambiguous mutation count	service, operation

Use route templates instead of raw URLs.

Good label:

operation=POST /cases/{caseId}/escalations

Bad label:

url=/cases/CASE-2026-000001/escalations

High-cardinality labels can damage metrics systems.

16.2 Logs

Log structured facts:

{
  "event": "remote_call_failed",
  "service": "case-registry",
  "operation": "POST /cases/{caseId}/escalations",
  "status": 503,
  "failureType": "DEPENDENCY_UNAVAILABLE",
  "attempt": 2,
  "durationMs": 240,
  "correlationId": "...",
  "remoteRequestId": "...",
  "idempotencyKeyHash": "..."
}

Do not log full request body, tokens, raw idempotency keys, or PII by default.
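The idempotencyKeyHash field in the log example can be produced with a one-way digest. This is a sketch under assumptions: LogRedaction is a hypothetical name, and truncating SHA-256 to 16 hex characters is an arbitrary choice that keeps log lines short while still allowing correlation. If keys are guessable, a keyed hash (HMAC) would be the safer option.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical redaction helper for structured logs. */
final class LogRedaction {
    /** Returns the first 8 bytes of SHA-256(rawKey) as 16 lowercase hex characters. */
    static String idempotencyKeyHash(String rawKey) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(rawKey.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(16);
            for (int i = 0; i < 8; i++) {
                hex.append(String.format("%02x", digest[i]));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

The same key always yields the same hash, so attempts belonging to one command can still be correlated across log lines without exposing the raw key.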

16.3 Tracing

For distributed tracing, propagate context to outbound calls through client filters.

Track:

remote service name;
HTTP method;
route template;
status;
exception type;
retry attempt;
timeout/circuit breaker events.

17. Testing Resilience

Resilience policies must be tested. They are too important to trust by inspection.

17.1 Test Cases

Scenario	Expected behavior
remote returns `200`	maps success
remote returns `404` for find	maps empty
remote returns `409` for command	maps domain rejection
remote returns `503` once then `201`	retry succeeds if command idempotent
remote times out on GET	retry according to policy
remote times out on POST without idempotency	no blind retry; ambiguous failure
remote returns invalid JSON on `200`	contract violation
remote returns `429 Retry-After`	waits or fails based on deadline
circuit open	fail fast
bulkhead full	dependency saturated failure

17.2 Fake Server

Use a fake server capable of:

delayed responses;
connection resets;
status sequences;
header assertions;
body assertions;
verifying retry count;
simulating invalid JSON.

Example pseudo-test:

@Test
void retryableGetRetries503Once() {
    fakeServer.stubGet("/cases/CASE-1")
        .thenRespond(503, "application/json", "{\"code\":\"UNAVAILABLE\"}")
        .thenRespond(200, "application/json", "{\"caseId\":\"CASE-1\"}");

    Optional<CaseSnapshot> result = gateway.findCase("CASE-1");

    assertThat(result).isPresent();
    fakeServer.verifyRequestCount("GET", "/cases/CASE-1", 2);
}

17.3 Ambiguous Mutation Test

@Test
void postTimeoutWithoutIdempotencyIsNotRetried() {
    fakeServer.stubPost("/cases/CASE-1/escalations")
        .thenDelayBeyondTimeout();

    assertThatThrownBy(() -> gateway.submitEscalation(commandWithoutIdempotency))
        .isInstanceOf(AmbiguousMutationFailure.class);

    fakeServer.verifyRequestCount("POST", "/cases/CASE-1/escalations", 1);
}

17.4 Idempotent Mutation Retry Test

@Test
void postTimeoutWithIdempotencyCanRetry() {
    fakeServer.stubPost("/cases/CASE-1/escalations")
        .withHeader("Idempotency-Key", "cmd-123")
        .thenDelayBeyondTimeout()
        .thenRespond(201, "application/json", "{\"escalationId\":\"ESC-1\"}");

    EscalationSubmissionResult result = gateway.submitEscalation(commandWithKey);

    assertThat(result).isInstanceOf(EscalationSubmissionResult.Accepted.class);
    fakeServer.verifyAllRequestsHadHeader("Idempotency-Key", "cmd-123");
}

18. Common Anti-Patterns

18.1 Retry Everything

@Retry(maxRetries = 3)
public Result callRemote(Command command) { ... }

Impact:

duplicate mutations;
amplified overload;
hidden bugs;
increased tail latency.

18.2 No Timeout

Impact:

request thread exhaustion;
pool saturation;
cascading failure;
bad user experience.

18.3 Timeout Longer Than User Journey

remote timeout: 30s
user request SLO: 1s

This is not resilience. It is denial of reality.

18.4 Circuit Breaker Without Classification

If all exceptions count the same, validation bugs can open the circuit. Auth failures can open the circuit. Bad requests can look like dependency outage.

Classify first.

18.5 Fallback That Lies

catch (Exception e) {
    return EscalationResult.success("TEMP");
}

This is catastrophic for regulated actions. It creates false state.

18.6 Logging Secrets on Failure

Failures are exactly when teams often log too much. Redaction must be designed before incidents.

18.7 One Global Bulkhead

A single bulkhead for all remote operations may allow low-priority calls to block critical ones.

Separate critical operations.

19. Resilience Architecture Patterns

19.1 Gateway + Resilience Decorator

Interface:

public interface CaseRegistryGateway {
    Optional<CaseSnapshot> findCase(String caseId);
    EscalationSubmissionResult submitEscalation(EscalationCommand command);
}

Adapter:

public final class JakartaRestCaseRegistryGateway implements CaseRegistryGateway {
    // pure HTTP adapter
}

Decorator:

public final class ResilientCaseRegistryGateway implements CaseRegistryGateway {

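    private final CaseRegistryGateway delegate;
    private final RetryPolicy readRetry;
    private final RetryPolicy mutationRetry; // separate policy for idempotent mutations
    private final CircuitBreaker breaker;
    private final Bulkhead bulkhead;

    public ResilientCaseRegistryGateway(
        CaseRegistryGateway delegate,
        RetryPolicy readRetry,
        RetryPolicy mutationRetry,
        CircuitBreaker breaker,
        Bulkhead bulkhead
    ) {
        this.delegate = delegate;
        this.readRetry = readRetry;
        this.mutationRetry = mutationRetry;
        this.breaker = breaker;
        this.bulkhead = bulkhead;
    }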

    @Override
    public Optional<CaseSnapshot> findCase(String caseId) {
        return bulkhead.execute(() ->
            breaker.execute(() ->
                readRetry.execute(() -> delegate.findCase(caseId))
            )
        );
    }

    @Override
    public EscalationSubmissionResult submitEscalation(EscalationCommand command) {
        requireIdempotencyKey(command);
        return bulkhead.execute(() ->
            breaker.execute(() ->
                mutationRetry.execute(() -> delegate.submitEscalation(command))
            )
        );
    }
}

The point is not this exact code. The point is separation of concerns.

19.2 Policy Registry

For larger systems, define policies per remote operation.

remoteClients:
  caseRegistry:
    operations:
      findCase:
        timeoutMs: 200
        maxAttempts: 2
        circuitBreaker: true
        bulkhead: caseRegistryReads
      submitEscalation:
        timeoutMs: 500
        maxAttempts: 2
        requiresIdempotencyKey: true
        circuitBreaker: true
        bulkhead: caseRegistryWrites

Make policies visible and reviewable.

20. Case-Management Example

Suppose our service has to create an escalation in a remote registry.

20.1 Requirements

command must not create duplicates;
actor identity must be propagated;
audit trail must record attempts;
409 means transition rejected;
412 means stale case version;
503 can be retried if idempotency key exists;
timeout after sending request is ambiguous;
response must be classified.

20.2 Client Method

public EscalationSubmissionResult submitEscalation(EscalationCommand command) {
    requireNonBlank(command.idempotencyKey());

    try (Response response = cases
        .path("{caseId}/escalations")
        .resolveTemplate("caseId", command.caseId())
        .request(MediaType.APPLICATION_JSON_TYPE)
        .header("Idempotency-Key", command.idempotencyKey())
        .header("X-Actor-Id", command.requestedBy())
        .post(Entity.json(CreateEscalationRequest.from(command)))) {

        return switch (response.getStatus()) {
            case 201 -> {
                EscalationCreated body = response.readEntity(EscalationCreated.class);
                yield EscalationSubmissionResult.accepted(body.escalationId(), response.getLocation());
            }
            case 200 -> {
                EscalationCreated body = response.readEntity(EscalationCreated.class);
                yield EscalationSubmissionResult.duplicate(body.escalationId());
            }
            case 409 -> {
                ErrorEnvelope error = response.readEntity(ErrorEnvelope.class);
                yield EscalationSubmissionResult.rejected(error.code());
            }
            case 412 -> throw new StaleCaseVersionException(command.caseId());
            case 429 -> throw RemoteRateLimitedException.from(response);
            case 503 -> throw RetryableRemoteFailure.from(response);
            default -> throw RemoteApiException.from(response);
        };
    } catch (ProcessingException e) {
        throw classifyProcessingException(command, e);
    }
}

20.3 Ambiguous Timeout Classification

private RuntimeException classifyProcessingException(
    EscalationCommand command,
    ProcessingException e
) {
    if (TimeoutClassifier.isReadTimeout(e)) {
        return new AmbiguousMutationFailure(
            command.commandId(),
            command.idempotencyKey(),
            e
        );
    }

    if (TimeoutClassifier.isConnectTimeout(e)) {
        return new RetryableRemoteFailure("connect-timeout", e);
    }

    return new RetryableRemoteFailure("transport-failure", e);
}

This distinction is important:

connect timeout may mean request never reached the server;
read timeout may mean request reached the server but response was not received.

Implementation details vary, so classification should be conservative.

21. Recovery and Reconciliation

For high-value mutations, design a reconciliation endpoint or query.

Example:

GET /operations/{idempotencyKey}

or:

GET /cases/{caseId}/escalations?commandId=cmd-123

Client recovery:

public EscalationSubmissionResult submitWithReconciliation(EscalationCommand command) {
    try {
        return resilientSubmit(command);
    } catch (AmbiguousMutationFailure ambiguous) {
        return lookupByCommandId(command.commandId())
            .map(existing -> EscalationSubmissionResult.duplicate(existing.escalationId()))
            .orElseThrow(() -> ambiguous);
    }
}

This is often better than increasing retries.

22. Resilience Checklist

For each outbound operation, answer:

23. Mental Model Summary

A resilient Jakarta REST client is not defined by one annotation or one library. It is defined by clear behavior under failure.

The key invariant:

A retry policy must know the semantics of the operation it retries.

24. Exercises

Exercise 1 — Classify Failures

Create a RemoteFailureClassifier that maps:

400;
401;
403;
404;
409;
412;
429;
500;
503;
invalid JSON;
timeout;
connection failure.

Acceptance criteria:

retryable and non-retryable failures are distinct;
404 is operation-specific;
429 preserves Retry-After;
invalid JSON is treated as contract violation;
read timeout on mutation is ambiguous.
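As a starting point for the 429 criterion, the sketch below parses a Retry-After value into a Duration so the classifier can carry it on the failure object. RFC 9110 allows either a number of seconds or an HTTP-date. RetryAfterParser is a hypothetical name, and passing the current time as a parameter is an assumption made to keep the helper testable.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical helper for Exercise 1: turns a Retry-After header value into a delay.
final class RetryAfterParser {
    private RetryAfterParser() {}

    static Optional<Duration> parse(String headerValue, Instant now) {
        if (headerValue == null) {
            return Optional.empty();
        }
        String value = headerValue.trim();
        if (value.isEmpty()) {
            return Optional.empty();
        }

        // Form 1: delay in seconds, for example "120".
        if (value.chars().allMatch(c -> c >= '0' && c <= '9')) {
            try {
                return Optional.of(Duration.ofSeconds(Long.parseLong(value)));
            } catch (NumberFormatException tooLarge) {
                return Optional.empty();
            }
        }

        // Form 2: HTTP-date, for example "Wed, 21 Oct 2015 07:28:00 GMT".
        try {
            Instant retryAt = ZonedDateTime
                .parse(value, DateTimeFormatter.RFC_1123_DATE_TIME)
                .toInstant();
            Duration delay = Duration.between(now, retryAt);
            // A date in the past means the client may retry immediately.
            return Optional.of(delay.isNegative() ? Duration.ZERO : delay);
        } catch (DateTimeParseException unparseable) {
            // Unparseable values are dropped rather than guessed at.
            return Optional.empty();
        }
    }
}
```

Returning Optional.empty() for a missing or malformed header lets the caller fall back to its own backoff, and the parsed delay still has to be checked against the remaining deadline before it is honored.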

Exercise 2 — Add Idempotent Mutation Retry

Implement retry for escalation submission.

Acceptance criteria:

no retry without idempotency key;
same key reused across attempts;
retries only retryable failures;
max attempts enforced;
total deadline enforced;
metrics include attempt count.

Exercise 3 — Add Circuit Breaker

Add a circuit breaker around CaseRegistryGateway.findCase.

Acceptance criteria:

repeated 503 opens the circuit;
400 does not open the circuit;
open circuit fails fast;
half-open success closes circuit;
metrics expose breaker state.

Exercise 4 — Design Fallback Matrix

For a case-management system, decide fallback policy for:

case detail lookup;
evidence download;
escalation submission;
notification sending;
reference data lookup;
audit event publishing.

For each, define:

allowed fallback;
forbidden fallback;
user-visible behavior;
audit requirement.

25. References

Jakarta RESTful Web Services 4.0 Specification: https://jakarta.ee/specifications/restful-ws/4.0/
Jakarta REST Client API package docs: https://jakarta.ee/specifications/restful-ws/4.0/apidocs/jakarta.ws.rs/jakarta/ws/rs/client/package-summary
MicroProfile Fault Tolerance 4.1: https://microprofile.io/specifications/fault-tolerance/4-1/
MicroProfile Fault Tolerance 4.1 Specification HTML: https://download.eclipse.org/microprofile/microprofile-fault-tolerance-4.1/microprofile-fault-tolerance-spec-4.1.html
RFC 9110 — HTTP Semantics: https://www.rfc-editor.org/rfc/rfc9110

