Service Discovery and Client-Side Behavior

Learn Java Microservices Design and Architect - Part 063

Service discovery and client-side behavior for Java microservices: DNS, Kubernetes Services, registry resolution, client load balancing, stale endpoints, connection pools, retries, readiness, and operational failure modes.

2026-07-05 · 15 min read · 2862 words

Part 063 — Service Discovery and Client-Side Behavior

1. Core idea

Service discovery is not "how one service gets another service's URL".

That is the shallow explanation.

In a real microservices system, service discovery is the runtime mechanism that answers this question:

Given a logical dependency, which concrete instance should I call right now, under current topology, health, latency, load, readiness, security, and policy constraints?

That means service discovery is not only a registry concern. It includes:

naming
endpoint publication
endpoint health
readiness semantics
DNS behavior
endpoint caching
connection pooling
client-side load balancing
timeout policy
retry policy
stale endpoint handling
graceful shutdown behavior
traffic shifting
observability
security identity

A weak service discovery design creates failures that look random:

one pod is removed but clients still call it
DNS resolves, but connection reuse still hits a draining instance
load balancer sends traffic to a pod that is alive but not ready
clients retry against the same bad endpoint
rolling deployment causes short bursts of 503
service mesh and application clients both retry
Java DNS cache hides topology changes
HTTP connection pools outlive endpoint health
the registry says "healthy" but the dependency is overloaded

The important rule:

Discovery returns candidates. Client behavior determines whether those candidates are used safely.

A top-tier engineer does not stop at "use Kubernetes Service" or "use Eureka". They ask: what happens during rollout, overload, DNS cache expiry, endpoint removal, node failure, network partition, certificate rotation, and partial dependency failure?

2. The discovery pipeline

A runtime call is a pipeline, not a single lookup.

Each stage has a distinct failure mode.

| Stage | Common failure | Architectural control |
| --- | --- | --- |
| Naming | Ambiguous service identity | Stable service names and ownership catalog |
| Resolution | Stale DNS / registry state | TTL discipline, readiness, re-resolution policy |
| Candidate list | Unready or draining endpoint | Readiness gates, endpoint removal, graceful shutdown |
| Selection | Uneven load | Load-balancing policy, connection management |
| Connection | Reused connection to bad instance | Max connection lifetime, idle eviction, channel health |
| Call execution | Slow dependency consumes threads | Deadlines, timeout, concurrency limits |
| Outcome handling | Retry storm | Retry budget, idempotency, backoff, jitter |
| Telemetry | No visibility into selected endpoint | Low-cardinality endpoint metrics and tracing attributes |

Discovery is useful only if the client is disciplined.

3. Logical dependency vs physical endpoint

A service should not hardcode physical instances.

Bad:

caseProfile:
  baseUrl: http://10.12.4.71:8080

Better:

caseProfile:
  serviceName: case-profile-service

But the logical name alone is not enough.

A production-grade dependency contract should define:

dependencies:
  case-profile-service:
    protocol: http
    purpose: "Resolve subject profile summary for case intake and review screens"
    criticality: required-for-write
    discovery: kubernetes-dns
    connectTimeout: 250ms
    responseTimeout: 1200ms
    maxConcurrency: 80
    retry:
      enabled: true
      maxAttempts: 2
      retryOn:
        - connect-timeout
        - connection-reset-before-write
        - 503
      backoff: exponential-jitter
    idempotencyRequired: true
    fallback:
      mode: fail-closed
    owner: party-domain-team

The dependency contract makes runtime behavior explicit.

Without it, every service invents its own network behavior.

That becomes chaos.
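One way to keep the contract from drifting is to load it into a typed object that rejects incomplete entries at startup. The sketch below is illustrative only: `DependencyContract`, `RetryContract`, and the validation rules (including "enabled retries require idempotency") are assumptions for this example, not part of any framework.

```java
import java.time.Duration;
import java.util.Set;

/** Hypothetical retry section of the contract. */
record RetryContract(boolean enabled, int maxAttempts, Set<String> retryOn) {}

/** Hypothetical typed view of one dependency contract entry. */
record DependencyContract(
        String serviceName,
        String criticality,
        Duration connectTimeout,
        Duration responseTimeout,
        int maxConcurrency,
        RetryContract retry,
        boolean idempotencyRequired,
        String fallbackMode,
        String owner) {

    // Compact constructor: fail fast when runtime behavior is left implicit.
    DependencyContract {
        if (serviceName == null || serviceName.isBlank()) {
            throw new IllegalArgumentException("serviceName is required");
        }
        if (isNotPositive(connectTimeout) || isNotPositive(responseTimeout)) {
            throw new IllegalArgumentException("timeouts must be positive");
        }
        if (maxConcurrency < 1) {
            throw new IllegalArgumentException("maxConcurrency must be at least 1");
        }
        if (retry.enabled() && retry.maxAttempts() < 2) {
            throw new IllegalArgumentException("enabled retry needs maxAttempts >= 2");
        }
        // Assumed house rule for this sketch: retries are only allowed on idempotent calls.
        if (retry.enabled() && !idempotencyRequired) {
            throw new IllegalArgumentException("retry requires idempotency");
        }
        if (owner == null || owner.isBlank()) {
            throw new IllegalArgumentException("owner is required");
        }
    }

    private static boolean isNotPositive(Duration d) {
        return d == null || d.isZero() || d.isNegative();
    }
}
```

A contract with a zero timeout or a missing owner then fails when the service starts, rather than surfacing as an unbounded call during an incident.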

4. Kubernetes DNS-based discovery

In Kubernetes, a common discovery model is:

client pod -> DNS name -> Kubernetes Service -> selected backend Pod endpoints

Example in-cluster URL:

http://case-profile-service.case-management.svc.cluster.local:8080

Usually the short name is enough inside the same namespace:

http://case-profile-service:8080

Important mental model:

Kubernetes Service discovery gives you a stable service address. It does not automatically make your application-level behavior correct.

Kubernetes can remove unready pods from Service endpoints, but your Java process may still have:

old DNS cache entries
old HTTP keep-alive connections
old HTTP/2 channels
queued requests
retries that target the same dependency
long-running calls during shutdown

So you still need client-side discipline.

5. ClusterIP Service vs headless Service

Two common patterns:

ClusterIP Service:
  client resolves stable service name
  traffic goes through service virtual IP / platform load balancing

Headless Service:
  client resolves individual pod endpoints
  client or library decides which endpoint to call

ClusterIP Service

Good default for most Java microservices.

Advantages:

simple service name
stable virtual address
endpoint changes hidden from application
platform manages routing to ready endpoints
less application code

Risks:

client may not know which backend instance was selected
connection pooling may reduce actual balancing fairness
platform load balancing does not know business criticality
retries may still amplify load

Headless Service

Useful when clients need individual endpoints.

Common examples:

stateful systems
peer-aware clients
databases/queues with special topology
gRPC client-side balancing in some setups
custom load-aware routing

Risks:

more client complexity
stale endpoint list risk
endpoint selection responsibility moves to application/client library
more failure modes during scaling (up or down) and rollout

Default rule:

Use the simplest platform-managed discovery model unless the client has a real reason to understand individual instances.

Do not choose headless discovery because it feels more "microservice-native".

6. Client-side load balancing vs server-side load balancing

There are two broad models.

Server-side / platform load balancing

The client calls a stable address. A platform component selects the instance.

Examples:

Kubernetes Service
cloud load balancer
ingress controller
API gateway
service mesh proxy

Client-side load balancing

The client obtains a list of instances and chooses one.

Examples:

Spring Cloud LoadBalancer
gRPC name resolver and load-balancing policy
client library for stateful backend
custom resolver over service registry

Decision model

| Question | Prefer platform LB | Prefer client-side LB |
| --- | --- | --- |
| Do clients need per-instance awareness? | No | Yes |
| Is simple operational model more important? | Yes | No |
| Is endpoint topology special/stateful? | No | Yes |
| Do you need weighted, locality-aware, or load-aware policy at app level? | Sometimes | Often |
| Can teams safely maintain client behavior? | Not required | Required |
| Is service mesh already standard? | Often | Sometimes |

Client-side load balancing is not automatically better.

It moves correctness into the client.

7. Service registry is not service discovery by itself

A registry stores or exposes service instance data.

Discovery is the end-to-end behavior using that data.

A registry may provide:

service name
instance host/port
metadata
health status
zone/region
version
weight
tags

But the caller still needs to decide:

which endpoint to select
whether the endpoint is ready enough for this operation
how long to wait
whether to retry
whether to avoid an instance after failure
whether to prefer same-zone traffic
how to react to stale registry data

A minimal abstraction:

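import java.net.URI;
import java.util.List;
import java.util.Map;

// Resolution only: turns a logical service name into candidate endpoints.
public interface ServiceEndpointResolver {
    List<ServiceEndpoint> resolve(ServiceName serviceName);
}

// One candidate instance, plus the metadata a selector may use.
public record ServiceEndpoint(
        String serviceName,
        URI baseUri,
        String zone,
        String version,
        Map<String, String> metadata
) {}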

A selector is separate:

public interface ServiceEndpointSelector {
    ServiceEndpoint select(List<ServiceEndpoint> endpoints, RequestContext context);
}

A call policy is also separate:

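import java.time.Duration;

// Per-dependency call limits, independent of how an endpoint was chosen.
public record DependencyCallPolicy(
        Duration connectTimeout,
        Duration responseTimeout,
        int maxAttempts,
        int maxConcurrency,
        boolean retryOnlyIfIdempotent
) {}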

Why separate them?

Because endpoint resolution, endpoint selection, and call execution are different responsibilities.

Mixing them produces clients that are hard to test and impossible to reason about during incidents.
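To illustrate why the split pays off, here is a minimal sketch in which a selector is exercised without any real registry. The types are simplified stand-ins for the interfaces above (a plain `String` service name, no `RequestContext`), and `StaticResolver` and `RoundRobinSelector` are hypothetical names introduced for this example.

```java
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

record Endpoint(String serviceName, URI baseUri, String zone) {}

interface EndpointResolver {
    List<Endpoint> resolve(String serviceName);
}

interface EndpointSelector {
    Endpoint select(List<Endpoint> candidates);
}

/** Resolver backed by a fixed table; stands in for DNS or a registry client. */
final class StaticResolver implements EndpointResolver {
    private final Map<String, List<Endpoint>> table;

    StaticResolver(Map<String, List<Endpoint>> table) {
        this.table = Map.copyOf(table);
    }

    @Override
    public List<Endpoint> resolve(String serviceName) {
        return table.getOrDefault(serviceName, List.of());
    }
}

/** Cycles through whatever candidate list it is given on each call. */
final class RoundRobinSelector implements EndpointSelector {
    private final AtomicInteger counter = new AtomicInteger();

    @Override
    public Endpoint select(List<Endpoint> candidates) {
        if (candidates.isEmpty()) {
            throw new IllegalStateException("no candidate endpoints");
        }
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), candidates.size());
        return candidates.get(index);
    }
}
```

Because the selector only sees a list, it can be unit-tested with hand-built candidates, and the resolver can be swapped (DNS, registry, static config) without touching selection logic.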

8. Naming discipline

Service names become runtime contracts.

Bad names:

misc-service
common-service
case-api
service-a
party-v2
new-case-service

Better names:

case-command-service
case-query-service
party-profile-service
evidence-metadata-service
regulatory-decision-service
notification-dispatch-service

A good runtime service name should communicate:

business capability
ownership boundary
expected usage
a stable identity, not a temporary implementation detail

Naming smell:

| Smell | Why it hurts |
| --- | --- |
| `common-service` | Usually hides low-cohesion shared logic |
| `core-service` | Becomes god service |
| `*-v2` | Encodes migration accident into identity |
| `api-service` | Says transport, not capability |
| `data-service` | Often becomes cross-domain database wrapper |
| `integration-service` | Too vague unless scoped to an external system |

A service name appears in:

DNS
metrics
traces
logs
ACL policies
dashboards
runbooks
incident timelines
service catalog
deployment manifests

Treat it as architecture, not a label.
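Some of the naming smells above are mechanical enough to lint. The following is a rough sketch, assuming a hypothetical `ServiceNameLint` helper and a pattern list derived from the smell table; a real catalog check would likely need team-specific exceptions.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical lint that flags service names matching the smells listed above. */
final class ServiceNameLint {
    private static final List<Pattern> SMELLS = List.of(
            // Generic buckets that name no business capability.
            Pattern.compile("^(common|core|api|data|integration|misc)-service$"),
            // Version suffixes such as -v2 baked into the identity.
            Pattern.compile("-v\\d+$"),
            // Temporary migration prefixes.
            Pattern.compile("^new-"));

    private ServiceNameLint() {}

    static boolean isSuspicious(String serviceName) {
        return SMELLS.stream().anyMatch(p -> p.matcher(serviceName).find());
    }
}
```

A check like this could run against the service catalog in CI, flagging names for human review rather than rejecting them outright.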

9. Readiness is part of discovery

A service instance should not receive traffic just because the process is alive.

A useful readiness check answers:

Can this instance safely accept normal traffic right now?

It does not mean:

Is every dependency reachable right now?

Readiness should consider:

application startup complete
configuration validated
essential local resources initialized
migration compatibility verified if relevant
HTTP server listening
thread/concurrency pool not saturated beyond admission threshold
graceful shutdown not in progress

Readiness should usually avoid deep dependency checks, because they can cause synchronized outages across every instance.

Example bad readiness:

Service A readiness requires Service B, C, D, database, queue, search, cache all healthy.

Why bad?

one optional dependency outage removes healthy pods from traffic
readiness checks can become dependency load generators
cascading readiness failure may remove too much capacity
Kubernetes may stop routing to all pods even though degraded service is possible

Better:

Readiness = this instance can process requests according to its advertised mode.
Dependency health = exposed separately as diagnostic health/detail metric.

A degraded-ready instance is possible if the contract supports degraded behavior.
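A readiness decision built only from local signals might look like the sketch below. `ReadinessGate` and its admission threshold are assumptions for illustration; in a Spring Boot service the result would typically be exposed through whatever readiness endpoint the platform probes.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical readiness gate that uses local state only, with no downstream calls. */
final class ReadinessGate {
    private final AtomicBoolean startupComplete = new AtomicBoolean(false);
    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);
    private final AtomicInteger inFlight = new AtomicInteger();
    private final int admissionThreshold;

    ReadinessGate(int admissionThreshold) {
        this.admissionThreshold = admissionThreshold;
    }

    void markStartupComplete() { startupComplete.set(true); }

    void markShuttingDown() { shuttingDown.set(true); }

    void requestStarted() { inFlight.incrementAndGet(); }

    void requestFinished() { inFlight.decrementAndGet(); }

    /** Ready means: started, not draining, and below the saturation threshold. */
    boolean isReady() {
        return startupComplete.get()
                && !shuttingDown.get()
                && inFlight.get() < admissionThreshold;
    }
}
```

Dependency health is deliberately absent here; it would be reported through a separate diagnostic endpoint or metric so that one downstream outage does not pull every instance out of rotation.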

10. Stale endpoint problem

Endpoint state changes faster than many clients realize.

Events:

pod becomes unready
pod is terminating
pod is rescheduled
node fails
service scales down
deployment rolls out
service mesh sidecar restarts
DNS answer changes
certificate rotates

But clients may retain:

DNS cache
service registry cache
TCP connection
HTTP keep-alive connection
HTTP/2 multiplexed channel
gRPC channel
pooled DB connection
unresolved async work

So a service can keep calling an endpoint that should no longer receive traffic.

Mitigations:

readiness goes false before shutdown work begins
termination grace period is long enough
server stops accepting new requests during drain
client response timeout is bounded
connection max lifetime is bounded
idle connections are evicted
retry targets can change endpoint if safe
caller uses idempotency key for retry-safe commands
dashboards expose error spikes related to endpoint removal

The platform and the application must cooperate.
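On the client side, one possible mitigation is to avoid an endpoint for a short cooldown after it fails, so a stale entry is not picked again immediately. The sketch below assumes a hypothetical `EndpointQuarantine` class, endpoints identified by plain strings, and an injected time source for testability.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical cooldown list: endpoints that just failed are skipped for a while. */
final class EndpointQuarantine {
    private final Map<String, Instant> quarantinedUntil = new ConcurrentHashMap<>();
    private final Duration cooldown;
    private final Supplier<Instant> now;

    EndpointQuarantine(Duration cooldown, Supplier<Instant> now) {
        this.cooldown = cooldown;
        this.now = now;
    }

    void reportFailure(String endpoint) {
        quarantinedUntil.put(endpoint, now.get().plus(cooldown));
    }

    /** Returns candidates that are not currently quarantined. */
    List<String> usable(List<String> candidates) {
        Instant t = now.get();
        List<String> healthy = candidates.stream()
                .filter(e -> {
                    Instant until = quarantinedUntil.get(e);
                    return until == null || !t.isBefore(until);
                })
                .toList();
        // If every candidate is quarantined, fall back to the full list
        // instead of failing outright on possibly stale local knowledge.
        return healthy.isEmpty() ? candidates : healthy;
    }
}
```

The fallback to the full list is a design choice: local failure memory is itself a cache and can be stale, so it should narrow the choice, not veto the call.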

11. Java DNS caching discipline

Java applications can cache DNS results.

That is normally useful.

But in dynamic infrastructure, unbounded or overly long DNS caching can hide topology changes.

Architectural rule:

Decide DNS cache TTL intentionally. Do not let it be an accidental JVM/runtime default.

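Example settings. These are Java security properties, so set them in the JVM's java.security file or call Security.setProperty before the first hostname lookup; passing them as -D system properties has no effect:

networkaddress.cache.ttl=30
networkaddress.cache.negative.ttl=5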

This does not mean every service must use 30 seconds.

The right value depends on:

discovery mechanism
DNS TTL
service mesh/proxy behavior
rollout frequency
connection pool lifetime
expected failover time
operational tolerance for stale endpoints

DNS TTL alone does not solve stale connections.

You also need connection lifecycle policy.
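Since `networkaddress.cache.ttl` and `networkaddress.cache.negative.ttl` are Java security properties, one way to set them programmatically is through `java.security.Security`. The sketch below assumes a hypothetical `DnsCachePolicy` helper; the values are the examples from this section, not recommendations.

```java
import java.security.Security;

/** Hypothetical startup helper that makes the JVM DNS cache TTL an explicit decision. */
final class DnsCachePolicy {
    private DnsCachePolicy() {}

    /**
     * Call early in main(), before any hostname is resolved. The JVM generally
     * reads these values once, when address caching is first initialized.
     */
    static void apply(int positiveTtlSeconds, int negativeTtlSeconds) {
        Security.setProperty("networkaddress.cache.ttl",
                Integer.toString(positiveTtlSeconds));
        Security.setProperty("networkaddress.cache.negative.ttl",
                Integer.toString(negativeTtlSeconds));
    }
}
```

Calling `DnsCachePolicy.apply(30, 5)` as the first statement of `main` would cache successful lookups for 30 seconds and failed lookups for 5 seconds. Note that HTTP clients with their own resolvers (for example Netty-based ones) may not consult the JVM cache at all, so the client library's resolver settings need the same scrutiny.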

12. Connection pooling is part of load balancing

HTTP connection pools are necessary.

Without pooling, services waste time and CPU repeatedly opening TCP/TLS connections.

But connection pools influence load distribution.

Common issue:

Client resolves service address.
Client opens a small number of persistent connections.
Requests reuse those connections.
Traffic distribution follows existing connections more than endpoint list.

This is especially visible with:

HTTP/2 multiplexing
long-lived gRPC channels
low number of client replicas
high request volume per replica
uneven pod rollout timing
connection reuse during scale-up

Useful controls:

max idle time
max connection lifetime
max connections per host
pending acquire timeout
idle eviction
connection health check
channel reconnect policy
per-dependency pool sizing

Example Reactor Netty style configuration:

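import io.netty.channel.ChannelOption;
import java.time.Duration;
import org.springframework.http.client.reactive.ReactorClientHttpConnector;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.netty.http.client.HttpClient;
import reactor.netty.resources.ConnectionProvider;

// One pool per dependency, so sizing and lifetime follow that dependency's contract.
ConnectionProvider provider = ConnectionProvider.builder("case-profile-pool")
        .maxConnections(80)
        .pendingAcquireTimeout(Duration.ofMillis(200))  // fail fast when the pool is exhausted
        .maxIdleTime(Duration.ofSeconds(30))            // drop connections that sit unused
        .maxLifeTime(Duration.ofMinutes(5))             // rotate connections so new endpoints get traffic
        .evictInBackground(Duration.ofSeconds(30))      // apply the limits even when the pool is quiet
        .build();

HttpClient httpClient = HttpClient.create(provider)
        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 250)
        .responseTimeout(Duration.ofMillis(1200));

WebClient client = WebClient.builder()
        .clientConnector(new ReactorClientHttpConnector(httpClient))
        .baseUrl("http://case-profile-service")
        .build();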

Numbers here are examples, not universal defaults.

The decision belongs to the dependency contract.

13. Client-side behavior policy

Every dependency needs a policy.

Not every dependency should be treated the same.

Example:

| Dependency | Criticality | Retry | Fallback | Timeout | Notes |
| --- | --- | --- | --- | --- | --- |
| `regulatory-decision-service` | write-critical | limited | fail-closed | short | Duplicate decision is unacceptable |
| `party-profile-service` | read-critical | yes | degraded summary | medium | Profile data can be stale for some screens |
| `notification-dispatch-service` | async side-effect | via outbox | queue | async | Do not block primary workflow |
| `audit-event-service` | compliance-critical | durable outbox | fail-safe buffer | async | Losing audit event is unacceptable |
| `search-index-service` | eventually consistent | yes | skip/rebuild | async | Projection can be reconciled |

A dependency call policy should include:

clientPolicy:
  timeout:
    connect: 250ms
    response: 1200ms
  concurrency:
    maxInFlight: 80
    pendingAcquireTimeout: 200ms
  retry:
    maxAttempts: 2
    backoff: 100ms..400ms with jitter
    onlyIdempotent: true
  circuitBreaker:
    enabled: true
    failureRateThreshold: 50
    minimumCalls: 50
  fallback:
    mode: degraded-read-only
  telemetry:
    dependencyName: case-profile-service
    recordStatusFamily: true
    recordTimeouts: true

The key is not the YAML.

The key is explicitness.
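The `maxInFlight` and `pendingAcquireTimeout` settings can be enforced with a plain semaphore. This is a minimal sketch, assuming a hypothetical `DependencyBulkhead` class; resilience libraries offer more complete equivalents.

```java
import java.time.Duration;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical per-dependency concurrency limit based on a semaphore. */
final class DependencyBulkhead {
    private final Semaphore permits;
    private final Duration pendingAcquireTimeout;

    DependencyBulkhead(int maxInFlight, Duration pendingAcquireTimeout) {
        this.permits = new Semaphore(maxInFlight);
        this.pendingAcquireTimeout = pendingAcquireTimeout;
    }

    /** Runs the call if a permit arrives in time; otherwise rejects quickly. */
    <T> T call(Supplier<T> remoteCall) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(
                    pendingAcquireTimeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a permit", e);
        }
        if (!acquired) {
            throw new IllegalStateException("dependency saturated: no permit available");
        }
        try {
            return remoteCall.get();
        } finally {
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Rejecting after a short wait keeps a slow dependency from absorbing every caller thread, which is the failure the "Call execution" stage of the pipeline table describes.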

14. Retry selection and endpoint choice

A retry should not blindly repeat the same failed path.

If failure happened before the request was accepted, trying another endpoint may be safe.

If failure happened after the command may have been processed, retry needs idempotency.

Endpoint selection on retry should consider:

was the selected endpoint recently failing?
is the failure endpoint-local or service-wide?
is the call idempotent?
is there remaining deadline budget?
will retry violate concurrency/rate limit?
is downstream overloaded?

Bad retry:

Three attempts, no backoff, same endpoint, no idempotency, no deadline.

Better retry:

At most one retry, only for safe failure classes, with jitter, within original deadline, using idempotency for commands, and with metrics.
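The "better retry" above can be sketched in plain Java. The two exception types and the `BoundedRetry` helper are hypothetical names for this example: `NotSentException` models a failure before the request was written, and `OutcomeUnknownException` models a failure after the callee may have processed it.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Failure before the request was written: another attempt cannot duplicate work. */
final class NotSentException extends RuntimeException {
    NotSentException(String message) { super(message); }
}

/** Failure after the request may have been processed: retry needs idempotency. */
final class OutcomeUnknownException extends RuntimeException {
    OutcomeUnknownException(String message) { super(message); }
}

final class BoundedRetry {
    private BoundedRetry() {}

    static <T> T call(Supplier<T> attempt, boolean idempotent, int maxAttempts,
                      Duration deadline, Duration maxBackoff) {
        long deadlineNanos = System.nanoTime() + deadline.toNanos();
        for (int n = 1; ; n++) {
            try {
                return attempt.get();
            } catch (NotSentException | OutcomeUnknownException e) {
                boolean safe = e instanceof NotSentException || idempotent;
                // Full jitter: a random pause between zero and maxBackoff.
                long backoffNanos = ThreadLocalRandom.current()
                        .nextLong(maxBackoff.toNanos() + 1);
                // Only retry if the pause still fits inside the original deadline.
                boolean budgetLeft = deadlineNanos - System.nanoTime() > backoffNanos;
                if (!safe || n >= maxAttempts || !budgetLeft) {
                    throw e;
                }
                sleep(backoffNanos);
            }
        }
    }

    private static void sleep(long nanos) {
        try {
            TimeUnit.NANOSECONDS.sleep(nanos);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while backing off", ie);
        }
    }
}
```

The supplier is where endpoint reselection would happen: each invocation can ask the selector for a fresh candidate instead of reusing the endpoint that just failed. Retry metrics are omitted here for brevity.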

15. Discovery and graceful shutdown

Graceful shutdown requires coordination between server and discovery.

Common mistake:

Process receives SIGTERM and immediately exits.

Better:

readiness false first
drain window begins
server refuses new long-running work
in-flight requests finish within deadline
async consumers stop polling
outbox publisher stops safely
app exits before platform force-kills it

If clients keep long-lived connections, make sure they handle:

GOAWAY frames for HTTP/2 where relevant
connection close
reset
retryable failure classification
endpoint reselection
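The server side of that flow can be sketched as a small coordinator: readiness flips first, new work is refused, and the process waits a bounded time for in-flight requests. `DrainCoordinator` is a hypothetical name, and the polling loop is a simplification.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical drain coordinator: readiness false first, then a bounded drain window. */
final class DrainCoordinator {
    private final AtomicBoolean draining = new AtomicBoolean(false);
    private final AtomicInteger inFlight = new AtomicInteger();

    /** Value to report from the readiness probe. */
    boolean isReady() {
        return !draining.get();
    }

    /** Admits a request unless draining has begun. */
    boolean tryBegin() {
        // Count first, then check, so shutdown cannot miss a request that slipped in.
        inFlight.incrementAndGet();
        if (draining.get()) {
            inFlight.decrementAndGet();
            return false;
        }
        return true;
    }

    void end() {
        inFlight.decrementAndGet();
    }

    /** Returns true if all in-flight requests finished within the drain window. */
    boolean shutdown(Duration drainWindow) {
        draining.set(true);
        long deadlineNanos = System.nanoTime() + drainWindow.toNanos();
        while (inFlight.get() > 0 && deadlineNanos - System.nanoTime() > 0) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return inFlight.get() == 0;
    }
}
```

One way to wire this in is `Runtime.getRuntime().addShutdownHook(new Thread(() -> drain.shutdown(Duration.ofSeconds(20))))`, with the drain window kept shorter than the platform's termination grace period so the application exits before it is force-killed.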

16. Service mesh changes discovery, but does not remove client responsibility

A service mesh may provide:

mTLS
service identity
routing
traffic split
retries
timeouts
circuit breaking
metrics
tracing
policy enforcement

But the application still owns:

business idempotency
command semantics
compensation
fallback correctness
data consistency
user-visible error shape
deadline meaning
criticality of dependencies
audit trail
domain failure response

Dangerous assumption:

The mesh handles resilience, so application clients can be naive.

No.

The mesh can enforce network policy, but it does not know whether approveCase() is safe to retry.

The application must encode business semantics.
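Layered retries are one place where mesh and application policy interact badly, and the arithmetic is simple enough to check. The sketch below uses a hypothetical `RetryAmplification` helper and assumes each layer independently retries every failure from the layer beneath it.

```java
/** Hypothetical helper: worst-case calls reaching a dependency when retries stack. */
final class RetryAmplification {
    private RetryAmplification() {}

    /**
     * Each argument is the maximum number of attempts (first try plus retries)
     * configured at one layer, for example application client, then mesh sidecar.
     */
    static long worstCaseCalls(int... attemptsPerLayer) {
        long total = 1;
        for (int attempts : attemptsPerLayer) {
            if (attempts < 1) {
                throw new IllegalArgumentException("each layer makes at least one attempt");
            }
            total = Math.multiplyExact(total, attempts);
        }
        return total;
    }
}
```

With three attempts in the application client and three in the mesh, a single user request can reach a struggling dependency up to nine times, which is why retry ownership should sit in one layer per call path.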

17. Java client adapter design

Do not scatter raw WebClient, RestClient, HttpClient, or gRPC stubs across application code.

Use an adapter behind a port.

public interface PartyProfilePort {
    PartyProfileSnapshot getProfile(PartyId partyId, RequestContext context);
}

Adapter:

import org.springframework.http.HttpStatusCode;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

final class HttpPartyProfileAdapter implements PartyProfilePort {
    private final WebClient webClient;
    private final DependencyCallPolicy policy;

    HttpPartyProfileAdapter(WebClient webClient, DependencyCallPolicy policy) {
        this.webClient = webClient;
        this.policy = policy;
    }

    @Override
    public PartyProfileSnapshot getProfile(PartyId partyId, RequestContext context) {
        return webClient.get()
                .uri("/internal/parties/{partyId}/profile-summary", partyId.value())
                // Propagate the correlation ID and remaining deadline to the callee.
                .header("X-Correlation-Id", context.correlationId())
                .header("X-Deadline-Ms", Long.toString(context.remainingMillis()))
                .retrieve()
                // Translate HTTP status codes into domain-level errors here, not in callers.
                .onStatus(status -> status.value() == 404,
                        response -> Mono.error(new PartyProfileNotFound(partyId)))
                .onStatus(HttpStatusCode::is5xxServerError,
                        response -> Mono.error(new DependencyUnavailable("party-profile-service")))
                .bodyToMono(PartyProfileSnapshotResponse.class)
                // Bound the whole exchange by the per-dependency response timeout.
                .timeout(policy.responseTimeout())
                .map(PartyProfileSnapshotMapper::toDomain)
                // Blocking bridge for a synchronous port; do not call from an event-loop thread.
                .block();
    }
}

This adapter is responsible for:

protocol details
path construction
header propagation
timeout execution
error translation
response mapping
dependency metrics
trace attributes
fallback behavior when allowed

The application service should not know HTTP status codes from another service.
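The adapter above reads `correlationId()` and `remainingMillis()` from a `RequestContext` that the lesson does not define. One possible shape, offered as an assumption rather than the intended design, carries an absolute deadline and derives the remaining budget from a clock:

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

/** Hypothetical request context carrying a correlation ID and an absolute deadline. */
record RequestContext(String correlationId, Instant deadline, Clock clock) {

    /** Starts a context whose deadline is "now plus budget" on the given clock. */
    static RequestContext startingNow(String correlationId, Duration budget, Clock clock) {
        return new RequestContext(correlationId, clock.instant().plus(budget), clock);
    }

    /** Remaining budget in milliseconds, never negative. */
    long remainingMillis() {
        long remaining = Duration.between(clock.instant(), deadline).toMillis();
        return Math.max(0, remaining);
    }

    boolean expired() {
        return remainingMillis() == 0;
    }
}
```

Storing an absolute deadline rather than a fixed timeout means every downstream hop sees a shrinking budget, so a call that starts late cannot outlive the request that triggered it.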

18. Client-side metrics

Dependency metrics should answer:

Which dependency is slow, failing, saturated, retrying, or returning degraded responses?

Useful labels:

caller service
dependency service
operation name
status family
failure class
retry outcome
fallback outcome
timeout type

Be careful with high-cardinality labels.

Avoid:

full URL
party ID
case ID
user ID
raw exception message
pod IP in high-volume metric labels

Metric examples:

dependency_client_requests_total{dependency="party-profile-service",operation="getProfile",outcome="success"}
dependency_client_duration_seconds_bucket{dependency="party-profile-service",operation="getProfile",le="0.5"}
dependency_client_retries_total{dependency="party-profile-service",operation="getProfile",reason="connect_timeout"}
dependency_client_inflight{dependency="party-profile-service",operation="getProfile"}
dependency_client_fallbacks_total{dependency="party-profile-service",operation="getProfile",mode="degraded"}

Tracing attributes:

peer.service = party-profile-service
rpc.system = http
http.request.method = GET
url.template = /internal/parties/{partyId}/profile-summary
dependency.criticality = read-critical
retry.attempt = 0
fallback.mode = none

The point is to make runtime behavior diagnosable.
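Keeping labels low-cardinality usually means collapsing raw values before they reach the metric. The sketch below uses a hypothetical in-memory `DependencyMetrics` class to show the idea; a real service would hand the same label values to its metrics library.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical in-memory counter keyed by low-cardinality labels only. */
final class DependencyMetrics {
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    /** Collapses a status code into a family such as "2xx" or "5xx". */
    static String statusFamily(int status) {
        if (status < 100 || status > 599) {
            return "unknown";
        }
        return (status / 100) + "xx";
    }

    /** Records one call; IDs, full URLs, and exception text never become labels. */
    void record(String dependency, String operation, int status) {
        String key = dependency + "|" + operation + "|" + statusFamily(status);
        counters.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    long count(String dependency, String operation, String family) {
        LongAdder adder = counters.get(dependency + "|" + operation + "|" + family);
        return adder == null ? 0 : adder.sum();
    }
}
```

The number of distinct series is then bounded by dependencies times operations times a handful of status families, regardless of traffic volume.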

19. Common anti-patterns

19.1 Hardcoded endpoint

private static final String BASE_URL = "http://10.0.14.22:8080";

This breaks topology evolution.

19.2 Global HTTP client policy

Every dependency uses the same timeout, retry, pool, fallback, and concurrency limit.

This ignores dependency criticality.

19.3 Deep readiness dependency chain

Service is unready if any downstream dependency is down.

This can turn one dependency failure into system-wide capacity loss.

19.4 No connection lifetime

Long-lived connections never rotate, causing uneven load and stale endpoint risk.

19.5 Retry without endpoint awareness

Retrying the same failed endpoint without backoff or idempotency is load amplification.

19.6 Registry as source of truth for business health

Registry health is not enough. Business operation health must be measured separately.

19.7 Discovery hidden in random utility code

If every team writes its own mini client, behavior diverges and incidents become hard to debug.

20. Architecture review questions

Ask these before approving service-to-service discovery design:

What is the logical dependency name?
Who owns the called service?
How is the dependency resolved at runtime?
Is discovery DNS-based, registry-based, mesh-based, or static config?
Does the client call a stable service address or individual endpoints?
What is the timeout/deadline policy?
What is the connection pool policy?
What is the DNS/registry cache policy?
What happens when a pod becomes unready?
What happens during rolling deployment?
What happens when the client has stale connections?
What happens when the dependency is overloaded?
Is retry allowed for this operation?
Is the retried operation idempotent at the business level?
Can retry select a different endpoint?
Is fallback allowed?
Is the dependency required, optional, or degraded-capable?
Are dependency metrics emitted?
Are trace attributes meaningful?
Is behavior documented in the service catalog?

21. Mini case study: Case Intake calls Party Profile

Scenario:

Case Intake Service needs party profile summary while creating a regulatory case.

Naive design:

POST /cases
  -> call Party Profile synchronously
  -> if Party Profile slow, request waits
  -> retry three times
  -> no idempotency key
  -> if timeout, user retries submit
  -> duplicate case risk

Better design:

POST /cases
  -> require idempotency key
  -> create case as Draft/IntakePendingProfile
  -> call Party Profile with short timeout and deadline
  -> if profile unavailable, store profileResolutionPending task
  -> publish CaseProfileResolutionRequested event
  -> let async worker resolve profile later

Discovery policy:

dependencies:
  party-profile-service:
    discovery: kubernetes-dns
    endpoint: http://party-profile-service.case-management.svc.cluster.local
    timeout:
      connect: 200ms
      response: 800ms
    retry:
      maxAttempts: 2
      onlyIdempotent: true
      backoff: jitter
    fallback:
      mode: pending-resolution
    pool:
      maxConnections: 60
      maxIdleTime: 30s
      maxLifeTime: 5m

Business result:

case creation remains controlled
profile dependency does not block entire intake process
duplicate case risk is reduced
operational dependency is visible
unresolved profile becomes explicit workflow state
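The "require idempotency key" step is what removes the duplicate-case risk when a user resubmits after a timeout. A minimal sketch follows, assuming hypothetical `CaseIntake` and `DraftCase` types and an in-memory map standing in for a durable store with a unique constraint on the key.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical draft case created at intake. */
record DraftCase(String caseId, String status) {}

/** Hypothetical intake handler that deduplicates submissions by idempotency key. */
final class CaseIntake {
    // In-memory stand-in; production code would need durable, shared storage.
    private final Map<String, DraftCase> byIdempotencyKey = new ConcurrentHashMap<>();

    DraftCase createCase(String idempotencyKey) {
        if (idempotencyKey == null || idempotencyKey.isBlank()) {
            throw new IllegalArgumentException("idempotency key is required");
        }
        // A repeated key returns the case created by the first submission.
        return byIdempotencyKey.computeIfAbsent(idempotencyKey,
                key -> new DraftCase(UUID.randomUUID().toString(), "IntakePendingProfile"));
    }
}
```

A resubmitted `POST /cases` with the same key then maps to the original draft, and the profile lookup can proceed or be deferred without creating a second case.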

22. Fitness functions

You can automate parts of discovery discipline.

Examples:

No service may call raw IP address in production config.

Every HTTP client dependency must define connect timeout and response timeout.

Every synchronous dependency must have dependency metrics.

Every command endpoint called remotely must document idempotency behavior.

Readiness endpoint must not call more than approved local/essential dependencies.

Every service catalog dependency must declare criticality and fallback mode.

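ArchUnit-style example:

import static com.tngtech.archunit.lang.syntax.ArchRuleDefinition.noClasses;

import com.tngtech.archunit.junit.AnalyzeClasses;
import com.tngtech.archunit.junit.ArchTest;
import com.tngtech.archunit.lang.ArchRule;

@AnalyzeClasses(packages = "com.example.caseintake")
class ClientArchitectureRulesTest {

    // Application-layer classes must not reference WebClient directly.
    @ArchTest
    static final ArchRule application_does_not_depend_on_webclient =
            noClasses()
                    .that().resideInAPackage("..application..")
                    .should().dependOnClassesThat().haveSimpleName("WebClient");
}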

This forces HTTP details into infrastructure adapters.
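The first rule in the list ("no raw IP address in production config") can also be checked mechanically. The sketch below assumes configuration has already been flattened into key/value pairs and uses a hypothetical `RawIpConfigCheck` class; it only detects IPv4 literals in URL values.

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical fitness function: flags config values that are URLs with an IPv4 host. */
final class RawIpConfigCheck {
    private static final Pattern IPV4_URL = Pattern.compile(
            "^[a-z][a-z0-9+.-]*://\\d{1,3}(\\.\\d{1,3}){3}(:\\d+)?(/.*)?$",
            Pattern.CASE_INSENSITIVE);

    private RawIpConfigCheck() {}

    /** Returns the offending config keys in sorted order; empty means the check passes. */
    static List<String> violations(Map<String, String> config) {
        return config.entrySet().stream()
                .filter(e -> IPV4_URL.matcher(e.getValue()).matches())
                .map(Map.Entry::getKey)
                .sorted()
                .toList();
    }
}
```

Run as a build step or a unit test over production configuration, a non-empty result would fail the pipeline and name the keys to fix.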

23. Practice exercise

Design the dependency policy for this service:

Enforcement Case Service depends on:
- Party Profile Service
- Evidence Metadata Service
- Decision Rule Service
- Notification Dispatch Service
- Audit Event Service

For each dependency, define:

purpose
criticality
discovery mechanism
timeout
retry policy
idempotency requirement
fallback mode
connection pool limit
observability metrics
shutdown behavior

Then answer:

Which dependencies are allowed inside the synchronous user request?
Which dependencies should become async/outbox-driven?
Which dependency failure should block the business operation?
Which dependency failure should create pending workflow state?
Which metrics would show discovery or stale endpoint issues?

24. Summary

Service discovery is not just name lookup.

It is the runtime collaboration between:

platform topology
service identity
readiness state
endpoint selection
connection lifecycle
Java client behavior
retry semantics
graceful shutdown
telemetry

The main lesson:

A service name gets you to the dependency. Client behavior determines whether calling it is safe.

Strong microservices architecture treats every dependency as an explicit runtime contract.

Weak architecture hides dependency behavior inside framework defaults.

Framework defaults can start a system.

They cannot safely operate a complex distributed system by accident.
