
Timeout Budgeting for HTTP Calls

Learn Java Microservices Communication - Part 013

Production-grade timeout budgeting for HTTP service-to-service calls in Java microservices, covering deadline propagation, per-attempt timeout, total budget, retries, cancellation, observability, and failure semantics.


Part 013 — Timeout Budgeting for HTTP Calls

A timeout is not a number.

A timeout is a contract about how long this caller is willing to let a dependency consume its remaining execution budget.

Most systems fail here because they treat timeout as a local client setting:

client.timeout=30s

That looks harmless. It is not.

A timeout that is too long turns dependency slowness into caller saturation. A timeout that is too short creates false failures. A timeout without retry budgeting creates retry storms. A timeout without cancellation creates useless background work. A timeout without observability turns every production incident into guessing.

In microservice communication, timeout design is part of the architecture.

The rule is simple:

Every outbound call must consume a bounded portion of a larger request budget.

This part explains how to design that budget.


1. The Real Problem: Slow Is More Dangerous Than Failed

A failed dependency releases resources quickly.

A slow dependency holds:

  • request threads;
  • virtual threads or event-loop continuations;
  • servlet container capacity;
  • connection pool slots;
  • database transactions;
  • memory buffers;
  • retry tokens;
  • user-facing request budget;
  • operator attention.

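Slow dependencies create cascading failure because callers keep waiting, and keep holding these resources, while new requests arrive. Often they are still waiting after the result stopped being useful.

The key phrase is "after the result stopped being useful".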

A timeout is not merely about avoiding infinite wait. It is about stopping work when the result can no longer influence a useful outcome.


2. Timeout vs Deadline

These two terms are often mixed. In production design, separate them.

| Concept | Meaning | Example |
|---|---|---|
| Timeout | Relative duration allowed for an operation | call user-service for at most 150ms |
| Deadline | Absolute point in time after which result is useless | complete before 2026-07-05T10:15:30.250Z |
| Budget | Remaining usable time derived from deadline | deadline - now - safety margin |

Timeouts are local.

Deadlines propagate.

A call chain should not independently invent new 30-second timeouts at every hop.

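Bad: every hop applies its own independent 30-second timeout, regardless of how much time its caller has left.

This system can still be doing work long after the gateway has already abandoned the client request.

Better: the edge sets one deadline for the whole request, and every hop derives its timeouts from what remains of it.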

Each hop computes:

remainingBudget = deadline - now - localSafetyMargin

Then each outbound call uses the smaller of:

remainingBudget
serviceSpecificMaxTimeout

This prevents a downstream call from consuming time the upstream caller no longer has.


3. The Timeout Stack

A single HTTP call crosses many phases. A useful timeout model knows which phase timed out.

Different timeouts protect different resources.

| Timeout | Protects | Failure meaning |
|---|---|---|
| Pool acquisition timeout | Caller-side concurrency and pool queue | No connection slot available fast enough |
| DNS timeout | Name resolution path | Cannot resolve target in time |
| Connect timeout | TCP establishment | Cannot establish network path quickly |
| TLS handshake timeout | Security session establishment | Cannot complete TLS negotiation quickly |
| Write timeout | Request body upload | Caller cannot send bytes fast enough |
| Response/header timeout | Upstream processing and first byte | Upstream did not start responding in time |
| Read/body timeout | Response streaming | Upstream started but did not finish fast enough |
| Total request timeout | End-to-end attempt | Whole attempt exceeded allowed duration |
| Idle timeout | Connection lifecycle | Existing connection idle too long |
| Deadline | Whole logical operation | Result no longer useful |

A production system does not always expose all of these as first-class knobs. But your design still needs the mental model.

If a client library only gives you timeout, ask:

  • Is it connect-only?
  • Is it total attempt timeout?
  • Does it include DNS?
  • Does it include TLS handshake?
  • Does it include body streaming?
  • Does it apply while waiting for a connection from the pool?
  • Does it cancel the underlying operation?
  • Does it release the connection safely?

Ambiguous timeout semantics are a production risk.
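
One way to make the phase visible is to classify the exception a client raises. The sketch below is a hypothetical helper, not part of any library: the `TimeoutPhaseClassifier` name, the `Phase` values, and the mapping itself are assumptions based on exception types that exist in the JDK. Other clients raise different types, so treat the mapping as a starting point to verify against your own client.

```java
import java.net.ConnectException;
import java.net.UnknownHostException;
import java.net.http.HttpConnectTimeoutException;
import java.net.http.HttpTimeoutException;
import java.nio.channels.UnresolvedAddressException;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: maps a client-side failure to the phase it most likely represents. */
final class TimeoutPhaseClassifier {

    enum Phase { DNS, CONNECT, RESPONSE, CALLER_DEADLINE, UNKNOWN }

    static Phase classify(Throwable error) {
        Phase result = Phase.UNKNOWN;
        // Walk the whole cause chain: async clients often wrap the real failure.
        for (Throwable t = error; t != null; t = t.getCause()) {
            // A resolution failure anywhere in the chain wins, because some clients
            // wrap it in a generic connect error.
            if (t instanceof UnknownHostException || t instanceof UnresolvedAddressException) {
                return Phase.DNS;
            }
            if (result != Phase.UNKNOWN) {
                continue; // keep the outermost match, but keep looking for DNS causes
            }
            // Subclass first: HttpConnectTimeoutException extends HttpTimeoutException.
            if (t instanceof HttpConnectTimeoutException || t instanceof ConnectException) {
                result = Phase.CONNECT;
            } else if (t instanceof HttpTimeoutException) {
                result = Phase.RESPONSE;        // request timer fired while awaiting the response
            } else if (t instanceof TimeoutException) {
                result = Phase.CALLER_DEADLINE; // caller-side timer, e.g. on a future
            }
        }
        return result;
    }
}
```

The resulting phase can feed a low-cardinality metric label, so a connect problem and a slow upstream no longer look identical on a dashboard.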


4. Timeout Is Not Retry

Timeout answers:

How long may one operation wait?

Retry answers:

Should we attempt again after a failure?

Budgeting answers:

How much total time may all attempts consume?

The bug appears when teams configure both independently.

requestTimeout: 2s
retries: 3

This may mean:

1 initial attempt + 3 retries = 4 attempts
4 * 2s = 8s before backoff, queueing, and execution overhead

If the upstream request budget is 2 seconds, this configuration is already impossible.

Correct model:

totalBudget = 2s
maxAttempts = 3
perAttemptTimeout = dynamic based on remainingBudget
backoff = bounded by remainingBudget

A retry that starts when the remaining budget cannot complete useful work is not resilience. It is load amplification.
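
The correct model above can be sketched as a retry loop that re-derives the per-attempt timeout from the remaining budget before every attempt. This is a minimal illustration, not a production retry library: `BudgetedRetry`, `BudgetExhaustedException`, and the `minUsefulAttempt` parameter are hypothetical names introduced here, and the sketch retries on any `RuntimeException`, where a real client would retry only failures it has classified as retryable.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.function.Function;

/** Hypothetical sketch: retries that share one total budget instead of multiplying timeouts. */
final class BudgetedRetry {

    /** Raised when attempts or budget run out; the cause is the last attempt's failure, if any. */
    static final class BudgetExhaustedException extends RuntimeException {
        BudgetExhaustedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static <T> T call(Instant deadline, int maxAttempts, Duration perAttemptMax,
                      Duration minUsefulAttempt, Duration backoff, Clock clock,
                      Function<Duration, T> attempt) {
        RuntimeException last = null;
        for (int n = 1; n <= maxAttempts; n++) {
            Duration remaining = Duration.between(clock.instant(), deadline);
            // Do not start an attempt that cannot complete useful work.
            if (remaining.compareTo(minUsefulAttempt) < 0) {
                break;
            }
            // Per-attempt timeout is dynamic: the smaller of the cap and what is left.
            Duration timeout = remaining.compareTo(perAttemptMax) < 0 ? remaining : perAttemptMax;
            try {
                return attempt.apply(timeout);
            } catch (RuntimeException e) {
                last = e;
            }
            // Back off only if the pause plus one more useful attempt still fits.
            Duration left = Duration.between(clock.instant(), deadline);
            if (n == maxAttempts || left.compareTo(backoff.plus(minUsefulAttempt)) < 0) {
                break;
            }
            try {
                Thread.sleep(backoff.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        throw new BudgetExhaustedException("attempts or budget exhausted", last);
    }
}
```

The caller passes the timeout it receives into the actual HTTP request, so no attempt can outlive the deadline. Jitter and a growing backoff are left out to keep the budget arithmetic visible.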


5. Timeout Outcome Is Ambiguous

A timeout means the caller stopped waiting.

It does not prove the callee did not execute the operation.

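Consider a POST /payments call that times out after the request was sent: the callee may have received and committed it even though no response reached the caller.

From the caller's perspective, the outcome is unknown.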

Maybe payment was created. Maybe not. Maybe it will be visible after replication delay. Maybe the connection failed after the callee committed but before the response arrived.

Therefore:

  • timeout on a query is often safe to retry;
  • timeout on a command is only safe to retry if the command is idempotent;
  • timeout on a non-idempotent command requires reconciliation;
  • timeout metrics must distinguish caller gave up from callee failed;
  • error messages must avoid claiming facts the caller does not know.

Bad error interpretation:

Payment failed.

Better:

Payment request outcome is unknown. Check by idempotency key or payment id.

For internal systems, encode this distinction in state machines.
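
A small state machine makes the distinction explicit. The sketch below is one possible shape, with hypothetical state and event names chosen for illustration. It assumes that a reconciliation lookup is authoritative, that is, it runs late enough that replication delay can no longer hide a committed payment.

```java
/** Hypothetical sketch: a timeout moves the call to UNKNOWN, never directly to FAILED. */
final class PaymentOutcome {

    enum State { PENDING, CONFIRMED, FAILED, UNKNOWN }

    enum Event {
        SUCCESS_RESPONSE,    // callee answered and accepted the payment
        REJECTION_RESPONSE,  // callee answered and rejected the payment
        TIMEOUT_AFTER_SEND,  // caller stopped waiting after the request was sent
        RECONCILED_FOUND,    // lookup by idempotency key found the payment
        RECONCILED_ABSENT    // authoritative lookup found nothing (assumption, see lead-in)
    }

    static State next(State current, Event event) {
        return switch (current) {
            case PENDING -> switch (event) {
                case SUCCESS_RESPONSE -> State.CONFIRMED;
                case REJECTION_RESPONSE -> State.FAILED;
                case TIMEOUT_AFTER_SEND -> State.UNKNOWN;
                default -> throw illegal(current, event);
            };
            case UNKNOWN -> switch (event) {
                // Only evidence resolves an unknown outcome: a response or a reconciliation result.
                case SUCCESS_RESPONSE, RECONCILED_FOUND -> State.CONFIRMED;
                case REJECTION_RESPONSE, RECONCILED_ABSENT -> State.FAILED;
                case TIMEOUT_AFTER_SEND -> State.UNKNOWN;
            };
            // Terminal states accept no further events.
            case CONFIRMED, FAILED -> throw illegal(current, event);
        };
    }

    private static IllegalStateException illegal(State state, Event event) {
        return new IllegalStateException("event " + event + " is not valid in state " + state);
    }
}
```

With this shape, code that wants to report "Payment failed" has to hold a FAILED state, which a timeout alone can never produce.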


6. Choosing Timeout Values

There is no universal timeout value.

Timeout values depend on:

  • user-facing SLO;
  • caller concurrency model;
  • downstream latency distribution;
  • retry policy;
  • payload size;
  • network distance;
  • cold-start behavior;
  • TLS/session setup cost;
  • deployment topology;
  • whether the operation is read, command, stream, or batch;
  • whether fallback exists;
  • whether the operation is on the critical path.

A practical starting model:

operationBudget = upstreamRemainingBudget - callerLocalWorkBudget - safetyMargin
perAttemptTimeout = min(servicePolicyMax, operationBudget / plannedAttemptCountAdjusted)

But this is only a starting point.

Percentile-based baseline

If the downstream service has reliable latency telemetry, choose timeouts using percentiles.

Example:

| Downstream observed latency | Value |
|---|---|
| p50 | 18ms |
| p90 | 42ms |
| p99 | 95ms |
| p99.9 | 180ms |

If you can tolerate about 0.1% false timeout under normal conditions, a baseline around p99.9 plus padding may be reasonable.

But be careful.

Percentiles measured inside the callee may exclude:

  • DNS;
  • connection pool wait;
  • TCP connect;
  • TLS handshake;
  • client-side queueing;
  • gateway/mesh hop;
  • serialization/deserialization;
  • response body transfer.

Caller-side latency is the source of truth for caller timeout design.

Padding rule

If p99 and p99.9 are close, small variance can create many false timeouts.

Example:

p99   = 92ms
p99.9 = 98ms

A timeout of 100ms may be fragile because a tiny regression creates many false positives. Add padding when the distribution is tight near the selected percentile.

If p99.9 is much larger than p99, the dependency has a tail-latency problem. A larger timeout may hide the symptom but damage caller capacity.
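
The percentile-plus-padding idea can be sketched as a small calculation over caller-side latency samples. The `PercentileTimeout` class, the nearest-rank percentile method, and the multiplicative `paddingFactor` are assumptions made for illustration; a real system would more likely read percentiles from its metrics backend than from raw samples.

```java
import java.time.Duration;
import java.util.Arrays;

/** Hypothetical sketch: derive a baseline timeout from caller-side latency samples. */
final class PercentileTimeout {

    /** Nearest-rank percentile (p in (0, 100]) over latency samples in milliseconds. */
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = samplesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    /** Baseline = percentile * paddingFactor, capped by the service policy maximum. */
    static Duration suggest(long[] samplesMs, double p, double paddingFactor, Duration policyMax) {
        long padded = Math.round(percentile(samplesMs, p) * paddingFactor);
        return Duration.ofMillis(Math.min(padded, policyMax.toMillis()));
    }
}
```

The cap matters as much as the percentile: when the tail is long, the policy maximum protects caller capacity instead of letting the timeout follow the dependency's worst behavior.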


7. Budget Allocation Across a Call Chain

Assume an inbound request has 1000ms budget.

The order service must:

  • validate request;
  • read order aggregate;
  • call inventory;
  • call pricing;
  • call payment risk;
  • persist decision;
  • emit an event.

Do not give every dependency 1000ms.

Design a budget map.

| Work item | Budget |
|---|---|
| Gateway overhead | 50ms |
| Order validation + local fetch | 100ms |
| Inventory call | 180ms |
| Pricing call | 150ms |
| Risk call | 250ms |
| Persist decision | 120ms |
| Response serialization | 50ms |
| Safety reserve | 100ms |
| Total | 1000ms |

Budget maps should be revisited with production telemetry.

The goal is not perfect prediction. The goal is to prevent uncontrolled time consumption.

Parallel calls change the budget model.

If inventory and pricing run in parallel, elapsed time is closer to the maximum of the two, not the sum. But resource consumption is still additive.

Parallelism reduces latency but increases load and failure coordination complexity.
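
The arithmetic behind the budget map can be checked in code. The sketch below reuses the numbers from the table in this section; the `BudgetMap` class and the item keys are hypothetical names. It shows that running inventory and pricing in parallel changes the elapsed-time estimate from the sum of the two to the larger of the two.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: the budget table from this section as checkable data. */
final class BudgetMap {

    static Map<String, Duration> orderServiceBudget() {
        Map<String, Duration> budget = new LinkedHashMap<>();
        budget.put("gateway-overhead", Duration.ofMillis(50));
        budget.put("validation-and-local-fetch", Duration.ofMillis(100));
        budget.put("inventory-call", Duration.ofMillis(180));
        budget.put("pricing-call", Duration.ofMillis(150));
        budget.put("risk-call", Duration.ofMillis(250));
        budget.put("persist-decision", Duration.ofMillis(120));
        budget.put("response-serialization", Duration.ofMillis(50));
        budget.put("safety-reserve", Duration.ofMillis(100));
        return budget;
    }

    /** Elapsed-time estimate when every item runs one after another. */
    static Duration sequentialTotal(Map<String, Duration> budget) {
        return budget.values().stream().reduce(Duration.ZERO, Duration::plus);
    }

    /** Elapsed-time estimate when items a and b overlap: they cost max(a, b), not a + b. */
    static Duration totalWithParallel(Map<String, Duration> budget, String a, String b) {
        Duration first = budget.get(a);
        Duration second = budget.get(b);
        Duration smaller = first.compareTo(second) < 0 ? first : second;
        return sequentialTotal(budget).minus(smaller);
    }
}
```

For the table above, the sequential total is 1000ms, and overlapping the inventory and pricing calls brings the estimate down to 850ms. The downstream services still receive both calls, so the load is unchanged.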


8. Deadline Propagation Header

HTTP has no universal standard deadline header equivalent to gRPC deadline semantics.

For internal HTTP APIs, many organizations define their own deadline header.

Example:

X-Request-Deadline: 2026-07-05T10:15:30.250Z

or:

X-Request-Timeout-Ms: 420

Prefer absolute deadline for multi-hop systems:

| Approach | Strength | Weakness |
|---|---|---|
| Relative timeout | Simple at one hop | Accumulates clock/processing ambiguity across hops |
| Absolute deadline | Clear global cutoff | Requires reasonably synchronized clocks |
| Remaining budget millis | Easy to consume | Can be accidentally reset or inflated |

A robust internal policy:

  • gateway creates deadline if absent;
  • services may reduce deadline, never extend it beyond trust boundary;
  • downstream calls compute remaining budget locally;
  • invalid deadline header is rejected or ignored according to trust policy;
  • deadline is logged and traced;
  • deadline is not treated as user-controlled input unless signed/validated;
  • internal services never blindly propagate external user-provided deadline headers.
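
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

// Absolute deadline propagated with the request.
public record RequestDeadline(Instant value) {
    // Time left before the deadline, minus a local safety margin; never negative.
    public Duration remaining(Clock clock, Duration safetyMargin) {
        Duration remaining = Duration.between(clock.instant(), value).minus(safetyMargin);
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }

    // True once the current instant is at or past the deadline.
    public boolean expired(Clock clock) {
        return !clock.instant().isBefore(value);
    }
}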
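
The policy bullets above can be sketched as a resolver that turns an incoming header into an effective deadline. This is an illustration under stated assumptions: `DeadlineResolver`, the `trustedCaller` flag, and the choice to ignore (rather than reject) a malformed header are all hypothetical, and the header name is the example name used in this section.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.format.DateTimeParseException;

/** Hypothetical sketch: resolve the effective deadline from an X-Request-Deadline header. */
final class DeadlineResolver {

    static Instant resolve(String headerValue, boolean trustedCaller, Instant now, Duration defaultBudget) {
        // The local maximum: this service never grants more than its own default budget.
        Instant localMax = now.plus(defaultBudget);
        // Absent header, or a caller outside the trust boundary: create the deadline here.
        if (headerValue == null || !trustedCaller) {
            return localMax;
        }
        try {
            Instant propagated = Instant.parse(headerValue.trim());
            // A propagated deadline may shorten the budget but never extend it.
            return propagated.isBefore(localMax) ? propagated : localMax;
        } catch (DateTimeParseException e) {
            // Assumption: a malformed header is ignored. Rejecting the request is the other valid policy.
            return localMax;
        }
    }
}
```

Because the result is always the earlier of the propagated deadline and the local maximum, a misbehaving upstream cannot inflate the budget of this hop.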

9. Java Implementation Model

This section gives a small implementation model. Later parts will go deeper into specific clients.

9.1 Deadline context

Keep deadline as an explicit object, not a magic thread-local everywhere.

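import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

// Explicit budget object passed to every outbound call.
public record CallBudget(
        Instant deadline,
        Duration safetyMargin,
        Clock clock
) {
    // Usable time left: deadline - now - safetyMargin, floored at zero.
    public Duration remaining() {
        Duration value = Duration.between(clock.instant(), deadline).minus(safetyMargin);
        return value.isNegative() ? Duration.ZERO : value;
    }

    // Timeout for one outbound call: the smaller of the remaining budget and the service cap.
    public Duration timeoutFor(Duration serviceMaximum) {
        Duration remaining = remaining();
        if (remaining.isZero()) {
            return Duration.ZERO;
        }
        return remaining.compareTo(serviceMaximum) < 0 ? remaining : serviceMaximum;
    }

    // Fail fast instead of starting an outbound call with no budget left.
    public void throwIfExpired() {
        if (remaining().isZero()) {
            throw new DeadlineExceededException("request deadline exceeded before outbound call");
        }
    }
}

// Signals that the request budget ran out on the caller side.
public final class DeadlineExceededException extends RuntimeException {
    public DeadlineExceededException(String message) {
        super(message);
    }
}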

9.2 JDK HttpClient example

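JDK HttpClient has a client-level connect timeout and a request-level timeout via HttpRequest.Builder#timeout. When the request timeout fires, send throws HttpTimeoutException. Check the documentation for your JDK version to see which phases the request timeout covers, in particular whether it includes body streaming.

import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Shared client: the connect timeout is fixed per client, the request timeout is set per call.
HttpClient client = HttpClient.newBuilder()
        .connectTimeout(Duration.ofMillis(200))
        .build();

public HttpResponse<String> getCustomer(String customerId, CallBudget budget)
        throws IOException, InterruptedException {

    // Derive the timeout from the remaining budget, capped at the service maximum.
    Duration timeout = budget.timeoutFor(Duration.ofMillis(300));
    if (timeout.isZero()) {
        throw new DeadlineExceededException("no budget left for customer-service call");
    }

    // customerId is assumed to be URL-safe here; validate or encode it in real code.
    HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://customer-service.internal/customers/" + customerId))
            .timeout(timeout)
            .header("X-Request-Deadline", budget.deadline().toString())
            .GET()
            .build();

    return client.send(request, HttpResponse.BodyHandlers.ofString());
}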

Do not interpret this as a complete production client. It is the core idea: derive outbound timeout from remaining budget.

9.3 CompletableFuture and cancellation

Async APIs need explicit cancellation handling.

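// client, request, timeout and scheduler (a ScheduledExecutorService) are assumed to be in scope.
CompletableFuture<HttpResponse<String>> future = client.sendAsync(
        request,
        HttpResponse.BodyHandlers.ofString()
);

// Cancel the exchange if it is still running when the timeout elapses.
ScheduledFuture<?> timer = scheduler.schedule(
        () -> future.cancel(true), timeout.toMillis(), TimeUnit.MILLISECONDS);

// Stop the timer once the call finishes so it does not linger in the scheduler.
future.whenComplete((response, error) -> timer.cancel(false));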

But cancellation behavior depends on client implementation and operation phase. Test it. Verify that resources are released and metrics are emitted.

9.4 Spring WebClient sketch

Mono<CustomerDto> result = webClient.get()
        .uri("/customers/{id}", customerId)
        .header("X-Request-Deadline", budget.deadline().toString())
        .retrieve()
        .bodyToMono(CustomerDto.class)
        .timeout(budget.timeoutFor(Duration.ofMillis(300)));

For reactive clients, remember that timeout() may not mean the same as connection timeout, pool acquisition timeout, or TLS handshake timeout. Configure the underlying connector as well.


10. Server-Side Timeout Awareness

Timeout design is not only client-side.

A server should also know when continuing work is wasteful.

Server-side behavior should include:

  • reading propagated deadline;
  • rejecting immediately if deadline already expired;
  • stopping expensive work if deadline expires;
  • propagating remaining deadline to downstream calls;
  • avoiding irreversible side effects after caller cancellation unless operation semantics require completion;
  • recording whether work was abandoned due to caller deadline.

Example servlet-style filter:

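// Assumes the jakarta.servlet API; older containers use the javax.servlet packages instead.
import jakarta.servlet.Filter;
import jakarta.servlet.FilterChain;
import jakarta.servlet.ServletException;
import jakarta.servlet.ServletRequest;
import jakarta.servlet.ServletResponse;
import jakarta.servlet.http.HttpServletRequest;
import jakarta.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.time.Clock;
import java.time.Instant;
import java.time.format.DateTimeParseException;

public final class DeadlineFilter implements Filter {
    private final Clock clock;

    public DeadlineFilter(Clock clock) {
        this.clock = clock;
    }

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {

        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        String rawDeadline = request.getHeader("X-Request-Deadline");
        if (rawDeadline != null) {
            Instant deadline;
            try {
                deadline = Instant.parse(rawDeadline);
            } catch (DateTimeParseException e) {
                // Malformed header is rejected here; ignoring it is the other option under the trust policy.
                response.sendError(HttpServletResponse.SC_BAD_REQUEST, "Invalid X-Request-Deadline header");
                return;
            }
            // Reject work whose deadline has already passed instead of processing it.
            if (!clock.instant().isBefore(deadline)) {
                response.setStatus(HttpServletResponse.SC_SERVICE_UNAVAILABLE);
                response.setContentType("application/problem+json");
                response.getWriter().write("""
                    {
                      "type":"https://errors.example.com/deadline-exceeded",
                      "title":"Deadline exceeded",
                      "status":503,
                      "detail":"Request deadline was already expired when received."
                    }
                    """);
                return;
            }
        }

        chain.doFilter(req, res);
    }
}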

Whether to return 503, 504, or a domain-specific error depends on where timeout happened:

| Location | Typical interpretation |
|---|---|
| Server received already-expired deadline | Caller/upstream budget exhausted |
| Gateway timed out waiting for upstream | 504 Gateway Timeout |
| Server intentionally sheds due to overload | 503 Service Unavailable |
| Client waited too long locally | Client-side timeout exception, not necessarily HTTP status |
| Server did not receive full request in time | 408 Request Timeout can apply |

Do not flatten all of these into 500.


11. Timeout and Idempotency

Timeouts force you to decide whether retry is safe.

| Operation | Timeout outcome | Retry safety |
|---|---|---|
| GET /customers/123 | unknown response, no intended mutation | usually safe if query is safe/idempotent |
| PUT /customers/123/address | update may or may not have happened | safe if full replacement and versioning are correct |
| POST /payments without idempotency key | payment may have been created | unsafe |
| POST /payments with idempotency key | outcome can be deduplicated/recovered | retryable by key |
| POST /emails/send | email may have been sent | unsafe without deduplication |
| DELETE /resource/123 | deletion may have happened | retry often safe if delete is idempotent |

Timeout budgeting and idempotency must be designed together.

A production command API should expose a recovery handle:

POST /payments
Idempotency-Key: 2f8e04e2-7f6f-4271-b52d-f6416bf9a421

Then caller can reconcile:

GET /payments/by-idempotency-key/2f8e04e2-7f6f-4271-b52d-f6416bf9a421

The point is not that every command must be retried. The point is that timeout must not leave the caller permanently blind.
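
The retry-safety table can be reduced to a small decision function. The sketch below is a hypothetical helper (`RetrySafety` is not from any library). It treats the methods that RFC 9110 defines as idempotent as retryable and requires an idempotency key for POST. It assumes the API actually honors those method semantics, which, as the table notes for PUT and DELETE, still has to be verified per endpoint.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: may a call be retried after a timeout with unknown outcome? */
final class RetrySafety {

    // Methods defined as idempotent by RFC 9110. Assumes the API honors those semantics.
    private static final Set<String> IDEMPOTENT_METHODS =
            Set.of("GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE");

    static boolean mayRetryAfterTimeout(String method, boolean hasIdempotencyKey) {
        String normalized = method.toUpperCase(Locale.ROOT);
        if (IDEMPOTENT_METHODS.contains(normalized)) {
            return true;
        }
        // Non-idempotent commands are retryable only when the callee can deduplicate them.
        return "POST".equals(normalized) && hasIdempotencyKey;
    }
}
```

When the function returns false, the caller should move to reconciliation, for example a lookup by idempotency key or business identifier, instead of sending the command again.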


12. Timeout Observability

Every timeout should answer:

  • which dependency?
  • which endpoint template?
  • which phase?
  • which attempt?
  • what was the configured timeout?
  • what was the remaining deadline?
  • was this before or after sending request bytes?
  • did caller cancel?
  • was retry attempted?
  • did retry succeed?
  • did the callee later complete?

Metrics should use low-cardinality labels.

Good metric labels:

http.client.request.duration{service="payment-service", method="POST", route="/payments", outcome="timeout"}
http.client.timeout.count{service="payment-service", phase="response_headers", attempt="1"}
http.client.retry.count{service="payment-service", reason="timeout"}
call_budget.remaining_ms{service="payment-service", route="/payments"}

Bad metric labels:

customerId="991882123"
url="/customers/991882123/orders/2026-07-05T10:33:19.123Z"
exceptionMessage="Read timed out after 312ms for tenant abc..."

Logs should include high-value diagnostic fields:

{
  "event": "outbound_http_timeout",
  "dependency": "payment-service",
  "method": "POST",
  "route": "/payments",
  "attempt": 1,
  "phase": "response_headers",
  "configuredTimeoutMs": 250,
  "remainingBudgetMs": 31,
  "idempotencyKeyPresent": true,
  "traceId": "7d1e..."
}

Tracing should show timeout as span status/error and include the dependency name and route template, not raw high-cardinality URL.
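
When the client framework does not supply a route template, a fallback normalizer can keep labels low-cardinality. The sketch below is a heuristic and an assumption, not a standard: `RouteTemplate`, the `{id}` placeholder, and the three patterns are hypothetical, and a template provided by the framework or the client definition should be preferred whenever it exists.

```java
import java.util.regex.Pattern;

/** Hypothetical fallback: collapse variable path segments so metric labels stay low-cardinality. */
final class RouteTemplate {

    private static final Pattern NUMERIC = Pattern.compile("\\d+");
    private static final Pattern UUID =
            Pattern.compile("[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}");
    private static final Pattern TIMESTAMP = Pattern.compile("\\d{4}-\\d{2}-\\d{2}T.*");

    static String of(String rawPath) {
        // Drop the query string: it never belongs in a metric label.
        int query = rawPath.indexOf('?');
        String path = query >= 0 ? rawPath.substring(0, query) : rawPath;

        String[] segments = path.split("/", -1);
        StringBuilder template = new StringBuilder();
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                template.append('/');
            }
            template.append(isVariable(segments[i]) ? "{id}" : segments[i]);
        }
        return template.toString();
    }

    private static boolean isVariable(String segment) {
        return NUMERIC.matcher(segment).matches()
                || UUID.matcher(segment).matches()
                || TIMESTAMP.matcher(segment).matches();
    }
}
```

Applied to the bad label above, `/customers/991882123/orders/2026-07-05T10:33:19.123Z` becomes `/customers/{id}/orders/{id}`, while `/payments` is left unchanged.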


13. Timeout Testing

Timeout behavior must be tested as behavior, not just configuration.

Test cases

| Test | Expected outcome |
|---|---|
| downstream never accepts connection | connect timeout fires |
| pool exhausted | acquisition timeout fires |
| downstream accepts but never responds | response timeout fires |
| downstream sends headers but stalls body | read/body timeout fires |
| deadline already expired before call | client fails fast without network call |
| retry would exceed remaining budget | retry is skipped |
| command timeout after send | state becomes UNKNOWN, not FAILED |
| cancellation occurs | resources are released |
| gateway timeout shorter than client timeout | client policy is corrected |

WireMock-style mental model

@Test
void shouldNotStartOutboundCallWhenDeadlineExpired() {
    CallBudget budget = new CallBudget(
            Instant.now().minusMillis(1),
            Duration.ofMillis(10),
            Clock.systemUTC()
    );

    assertThrows(DeadlineExceededException.class,
            () -> customerClient.getCustomer("c-123", budget));

    // Verify stub server received zero requests.
}

Failure injection

A mature team can inject:

  • delayed headers;
  • delayed body chunks;
  • connection resets;
  • half-open connections;
  • slow TLS handshake;
  • DNS failure;
  • pool exhaustion;
  • gateway upstream timeout;
  • callee completes after caller timeout.

This is how timeout policy becomes real, not decorative.
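
The "accepts but never responds" case can be reproduced without extra libraries. The sketch below is a minimal illustration with a hypothetical `StalledServerCheck` class: it opens a loopback `ServerSocket` that never reads or answers, sends a request with the JDK HttpClient, and measures how long the client waited before `HttpTimeoutException`. It assumes loopback networking is available in the test environment.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

/** Hypothetical sketch: prove the request timeout fires against a server that never responds. */
final class StalledServerCheck {

    /** Returns how long the client waited, in milliseconds, before it timed out. */
    static long elapsedUntilTimeoutMs(Duration requestTimeout) {
        // The OS completes the TCP handshake from the listen backlog; nothing ever answers.
        try (ServerSocket stalled = new ServerSocket(0, 50, InetAddress.getByName("127.0.0.1"))) {
            HttpClient client = HttpClient.newBuilder()
                    .version(HttpClient.Version.HTTP_1_1)
                    .connectTimeout(Duration.ofMillis(500))
                    .build();
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://127.0.0.1:" + stalled.getLocalPort() + "/customers/c-123"))
                    .timeout(requestTimeout)
                    .GET()
                    .build();

            long start = System.nanoTime();
            try {
                client.send(request, HttpResponse.BodyHandlers.ofString());
            } catch (HttpTimeoutException expected) {
                return Duration.ofNanos(System.nanoTime() - start).toMillis();
            }
            throw new AssertionError("expected a timeout but received a response");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the timeout", e);
        }
    }
}
```

A test can then assert that the elapsed time is close to the configured timeout and far below any default, which checks behavior rather than configuration.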


14. Common Timeout Anti-Patterns

Anti-pattern 1: One timeout for everything

httpTimeout: 30s

This hides whether you mean connect, pool acquisition, response, body, or total request.

Anti-pattern 2: Client timeout longer than upstream gateway timeout

Gateway timeout: 5s
Service client timeout: 30s

The service keeps working long after the gateway has abandoned the result.

Anti-pattern 3: Retry timeout multiplication

timeout=2s, retries=3, user budget=2s

A single timed-out attempt already consumes the whole user budget, so every retry after it violates that budget.

Anti-pattern 4: Timeout interpreted as business failure

Timeout is often unknown outcome, not domain rejection.

Anti-pattern 5: No timeout for background workers

Workers also need bounded calls. Otherwise a stuck dependency can freeze throughput and block partition/queue progress.

Anti-pattern 6: Infinite pool queue

A short HTTP timeout does not help if the request waits unbounded time before it even gets a connection.
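
A bounded wait for a slot can be sketched with a semaphore. This is an illustration of the idea, not a replacement for the pool settings of a real HTTP client: `BoundedAcquire` and `AcquireTimeoutException` are hypothetical names, and most clients expose an equivalent pool acquisition setting that should be configured first.

```java
import java.time.Duration;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical sketch: cap concurrent outbound calls and bound the wait for a slot. */
final class BoundedAcquire {

    static final class AcquireTimeoutException extends RuntimeException {
        AcquireTimeoutException(String message) {
            super(message);
        }
    }

    private final Semaphore permits;

    BoundedAcquire(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    <T> T call(Duration acquireTimeout, Supplier<T> outboundCall) {
        boolean acquired;
        try {
            // Wait for a slot, but never longer than the acquisition timeout.
            acquired = permits.tryAcquire(acquireTimeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new AcquireTimeoutException("interrupted while waiting for a slot");
        }
        if (!acquired) {
            throw new AcquireTimeoutException("no slot within " + acquireTimeout.toMillis() + "ms");
        }
        try {
            return outboundCall.get();
        } finally {
            permits.release(); // always return the slot, even when the call fails
        }
    }
}
```

The acquisition wait should be charged against the same request budget as the call itself, so the time spent queueing reduces the timeout passed to the outbound request.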

Anti-pattern 7: Timeout configured but not observed

If you cannot break down timeout by dependency, route, and phase, you cannot operate it.

Anti-pattern 8: Caller stops waiting but callee continues irreversible work unintentionally

This creates duplicate side effects, delayed writes, and reconciliation bugs.


15. Production Timeout Policy Template

Use this as a starting template.

outboundClients:
  payment-service:
    baseUrl: https://payment-service.internal
    connectTimeout: 100ms
    poolAcquireTimeout: 50ms
    responseTimeout: 250ms
    maxTotalAttemptTimeout: 300ms
    deadlinePropagation: true
    safetyMargin: 25ms
    retry:
      enabled: true
      maxAttempts: 2
      retryableMethods: [GET, PUT, DELETE]
      retryableStatusCodes: [408, 429, 502, 503, 504]
      requireIdempotencyKeyForPost: true
      backoff:
        initial: 25ms
        max: 100ms
        jitter: full
    observability:
      dependencyName: payment-service
      routeTemplating: required
      logTimeoutEvents: true
      emitRemainingBudgetMetric: true

Do not copy numbers blindly. Copy the structure.


16. Review Checklist

Before approving an HTTP client integration, ask:

  • What is the caller's total request budget?
  • What is the downstream service maximum allowed timeout?
  • Is timeout per attempt or total across attempts?
  • Is retry bounded by the same deadline?
  • Is there a pool acquisition timeout?
  • Is connect timeout separate from response timeout?
  • Is deadline propagated downstream?
  • Can downstream reject already-expired work?
  • Does timeout cancel work or only stop waiting?
  • If the call is a command, what happens on unknown outcome?
  • Is idempotency key required where needed?
  • Are timeout events observable by dependency and route?
  • Are timeout metrics low-cardinality?
  • Is gateway/load-balancer timeout aligned with service client timeout?
  • Is the timeout tested with injected slow dependency behavior?

17. The Top 1% Mental Model

Most engineers ask:

What timeout should I set?

A stronger engineer asks:

What is the caller's remaining budget, what work can still produce a useful result, what resources are at risk while waiting, and what state should the system enter if the outcome is unknown?

That question changes the design.

Timeouts are not a defensive afterthought. They are part of the communication protocol between services.

A good timeout policy protects:

  • user experience;
  • caller capacity;
  • callee recovery;
  • retry behavior;
  • correctness under unknown outcome;
  • operator diagnosis;
  • system-wide stability.

The invariant is:

No service may let a dependency consume unbounded or useless time.

Once you enforce that invariant, HTTP communication becomes dramatically safer.

