
Status Code Design for Service-to-Service APIs

Learn Java Microservices Communication - Part 011

Production-grade guide to HTTP status code design for Java service-to-service APIs: success, client failure, server failure, retriability, Problem Details, and Java implementation patterns.

2026-07-05 · 20 min read · 3872 words


Part 011 — Status Code Design for Service-to-Service APIs

A status code is not decoration. In service-to-service communication, it is the first control signal returned by the callee.

Before the caller parses the body, before it logs the error, before it decides whether to retry, before an SRE looks at dashboards, the status code already says something:

did the server understand the request?
was the requested operation accepted, completed, rejected, or deferred?
is the failure probably caused by the caller or the callee?
is retry likely safe?
is this error part of normal business flow or an infrastructure incident?
should the caller compensate, stop, back off, or page someone?

In weak systems, status codes are treated as a web API convention. In strong systems, status codes are part of the distributed control plane.

This part builds a practical status code design model for Java microservices.

1. The Mental Model

HTTP has a large status code registry, but most internal services only need a small, disciplined subset.

The purpose of a status code is not to encode every domain detail. The purpose is to classify the protocol-level outcome.

Domain detail belongs in the response body. Operational classification belongs in the status code.

A good status code answers one question first:

What class of decision should the caller make next?

Not:

What exact exception was thrown inside the callee?

That distinction matters.

2. Status Codes Are Not Exception Names

A common Java microservice failure mode is mapping exceptions directly to status codes:

IllegalArgumentException  -> 400
EntityNotFoundException   -> 404
OptimisticLockException   -> 500
IOException               -> 500
RuntimeException          -> 500

This is better than returning 200 OK for everything, but it is still too mechanical.

The caller does not care whether the server threw IllegalStateException, SQLException, TimeoutException, or WebClientResponseException as an implementation detail. The caller cares about actionability.

Better mapping starts from outcome categories:

Outcome	Question	Status family
Completed	Did the operation finish successfully?	`2xx`
Accepted but not complete	Did the service accept responsibility but defer completion?	`202`
Invalid request	Did the caller send malformed or semantically invalid input?	`4xx`
Authorization/authentication failure	Is the caller not allowed or not authenticated?	`401` / `403`
Missing target	Does the requested resource not exist in this API's visible model?	`404`
Conflict	Is the request valid but conflicts with current state?	`409` / `412`
Overload or dependency failure	Is the service unable to process now?	`503` / sometimes `502` / `504`
Bug or unknown server failure	Did the server fail unexpectedly?	`500`

Think in decisions, not exception names.

3. The Small Internal Status Code Set

For service-to-service APIs, start with this conservative set.

Code	Name	Primary meaning in internal APIs
`200`	OK	Query or command response completed and returns representation/result.
`201`	Created	A new resource was created and can be identified.
`202`	Accepted	Request accepted for asynchronous processing; outcome is not final yet.
`204`	No Content	Operation completed successfully; no response body is needed.
`400`	Bad Request	Malformed syntax, invalid JSON, invalid query parameter shape, unreadable body.
`401`	Unauthorized	Caller is unauthenticated or credentials are missing/invalid.
`403`	Forbidden	Caller is authenticated but not allowed to perform the action.
`404`	Not Found	Target resource is not visible/found in this API boundary.
`405`	Method Not Allowed	Resource exists but method is unsupported.
`409`	Conflict	Valid request conflicts with current resource/process state.
`410`	Gone	Resource used to exist but is intentionally no longer available.
`412`	Precondition Failed	Conditional request failed, usually `If-Match` / ETag concurrency control.
`413`	Content Too Large	Payload exceeds service limit.
`415`	Unsupported Media Type	Content type is unsupported.
`422`	Unprocessable Content	Syntactically valid request but semantically invalid domain input.
`429`	Too Many Requests	Caller is being rate-limited or quota-limited.
`500`	Internal Server Error	Unexpected server failure; do not expose internals.
`502`	Bad Gateway	This component is acting as a gateway/proxy and received an invalid response from upstream.
`503`	Service Unavailable	Service is overloaded, unavailable, draining, or dependency path is unavailable.
`504`	Gateway Timeout	This component is acting as a gateway/proxy and the upstream timed out.

Do not treat this list as a law. Treat it as a default vocabulary.

The goal is not to use more codes. The goal is to make fewer codes mean more.

4. Never Return `200 OK` for Failed Commands

This is one of the most damaging patterns in internal APIs:

{
  "success": false,
  "errorCode": "INSUFFICIENT_BALANCE"
}

with status:

HTTP/1.1 200 OK

Why it hurts:

client libraries classify the response as success;
metrics show success even though the operation failed;
retries may not trigger when they should;
SLO dashboards undercount failure;
gateways and service meshes cannot reason about traffic;
incident triage requires parsing custom payloads;
humans cannot scan logs quickly.

A response may have domain-level denial, but the HTTP layer still needs honest classification.

For example:

Scenario	Better status
Submitted command violates business rule	`422` or `409`
Payment already captured	`409` or `200` depending on idempotency semantics
Case already escalated	`409`
Resource exists and duplicate create is harmless under same idempotency key	`200` / `201` with idempotent result
Duplicate command with different body under same idempotency key	`409`
Validation failed	`422`

A 2xx response means the HTTP-level operation succeeded. It does not have to mean the business world is happy, but it must mean the request was processed according to the API contract.
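To make the metrics point concrete, here is a minimal sketch of how infrastructure that only sees the status line would count outcomes. `StatusClass` and `StatusClassMetrics` are hypothetical names invented for this illustration, not part of any real library.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical outcome buckets, as a generic proxy or dashboard might see them. */
enum StatusClass { SUCCESS, CLIENT_ERROR, SERVER_ERROR, OTHER }

/** Illustrative counter that looks only at the status code, never at the body. */
final class StatusClassMetrics {
    private final Map<StatusClass, Integer> counts = new EnumMap<>(StatusClass.class);

    static StatusClass classify(int status) {
        if (status >= 200 && status <= 299) return StatusClass.SUCCESS;
        if (status >= 400 && status <= 499) return StatusClass.CLIENT_ERROR;
        if (status >= 500 && status <= 599) return StatusClass.SERVER_ERROR;
        return StatusClass.OTHER;
    }

    /** Counts one response under its status class. */
    void observe(int status) {
        counts.merge(classify(status), 1, Integer::sum);
    }

    int count(StatusClass statusClass) {
        return counts.getOrDefault(statusClass, 0);
    }
}
```

A rejected command returned as `200` with `"success": false` lands in the `SUCCESS` bucket here. The same rejection returned as `422` lands in `CLIENT_ERROR` without anyone parsing the payload.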

5. Success Codes: Design the Happy Path Precisely

Most internal APIs overuse 200 OK.

That is not catastrophic, but precise success codes make APIs easier to reason about.

5.1 `200 OK`

Use 200 when:

a query returns a representation;
a command returns a result body;
an idempotent create/update returns the existing/current representation;
a command completed synchronously and the caller needs details.

Example:

HTTP/1.1 200 OK
Content-Type: application/json
ETag: "case-v14"

{
  "caseId": "CASE-1001",
  "state": "UNDER_REVIEW"
}

5.2 `201 Created`

Use 201 when the request creates a new resource.

Include Location when the created resource has a canonical URI.

HTTP/1.1 201 Created
Location: /cases/CASE-1001
Content-Type: application/json

{
  "caseId": "CASE-1001",
  "state": "DRAFT"
}

Do not use 201 for every command that creates side effects. Use it when the API contract exposes a newly created resource.

5.3 `202 Accepted`

Use 202 when the service accepts responsibility but completion is deferred.

This is common for:

long-running workflows;
asynchronous command handling;
batch ingestion;
outbox-backed processing;
workflow engine start commands;
expensive export generation;
external integration submissions.

A correct 202 response must answer:

what was accepted?
how can the caller observe progress?
what is the correlation/operation id?
what states can the operation reach?
when should the caller poll or expect callback/event?

Bad 202:

HTTP/1.1 202 Accepted

{}

Better 202:

HTTP/1.1 202 Accepted
Location: /operations/OP-8891
Retry-After: 5
Content-Type: application/json

{
  "operationId": "OP-8891",
  "status": "ACCEPTED",
  "submittedAt": "2026-07-05T02:11:43Z",
  "statusUrl": "/operations/OP-8891"
}

202 is not a shortcut to avoid reliability design. It creates an obligation to expose operation state.
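The obligation to expose operation state can be sketched as a small registry behind the status URL. This is an illustration under assumptions: the state names, the `OP-<n>` id format, and the in-memory map are all hypothetical, and a real service would persist operations durably.

```java
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical lifecycle for an accepted operation; the state names are assumptions. */
enum OperationState { ACCEPTED, RUNNING, SUCCEEDED, FAILED }

/** Snapshot returned in the 202 body and later from the status URL. */
record OperationStatus(String operationId, OperationState state, Instant submittedAt) {
    String statusUrl() {
        return "/operations/" + operationId;
    }
}

/** In-memory registry for illustration only. */
final class OperationRegistry {
    private final Map<String, OperationStatus> operations = new ConcurrentHashMap<>();
    private final AtomicLong sequence = new AtomicLong();

    /** Registers a new operation; the endpoint would answer 202 with statusUrl() as Location. */
    OperationStatus accept(Instant now) {
        String id = "OP-" + sequence.incrementAndGet();
        OperationStatus status = new OperationStatus(id, OperationState.ACCEPTED, now);
        operations.put(id, status);
        return status;
    }

    /** Moves an existing operation to a new state; unknown ids are rejected. */
    OperationStatus transition(String operationId, OperationState next) {
        OperationStatus updated = operations.computeIfPresent(operationId,
                (id, current) -> new OperationStatus(id, next, current.submittedAt()));
        if (updated == null) {
            throw new IllegalArgumentException("Unknown operation " + operationId);
        }
        return updated;
    }

    /** Backs GET on the status URL; null means the status endpoint should answer 404. */
    OperationStatus find(String operationId) {
        return operations.get(operationId);
    }
}
```

The point of the sketch is the shape: accepting a request creates a durable, addressable record, and every later state change is visible through the same identifier.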

5.4 `204 No Content`

Use 204 when the command completed and there is no useful body.

Good examples:

delete completed;
status flag updated;
association removed;
command completed but caller already has all needed state.

Example:

HTTP/1.1 204 No Content

Do not return a JSON body with 204. If you need a body, use 200.

6. Redirect Codes in Internal APIs

Redirects are often useful on the public web, but they should be rare in service-to-service APIs.

Why?

they hide topology changes from clients;
they complicate tracing;
some client libraries change methods on redirects unless configured carefully;
authorization and header propagation can become dangerous;
retries and redirect loops become harder to analyze.

For internal APIs, prefer:

stable service discovery;
gateway routing;
explicit version migration;
canonical identifiers;
API deprecation headers;
client configuration updates.

Use 301, 302, 307, or 308 only if your client stack, gateway, and security policy are explicitly designed for it.

7. `400 Bad Request` vs `422 Unprocessable Content`

This distinction is important.

Use 400 when the service cannot reliably interpret the request as a valid protocol/API message.

Examples:

invalid JSON syntax;
wrong query parameter type;
missing required primitive parameter;
invalid enum string format;
unreadable body;
invalid date format;
unsupported parameter shape.

Use 422 when the request is syntactically valid but semantically invalid for the domain.

Examples:

effectiveDate is before allowed policy date;
requested transition is invalid for the submitted case state;
amount exceeds configured business limit;
field combination violates domain rule;
submitted identifier is well-formed but incompatible with the operation.

This distinction helps client teams. 400 usually means the client generated the API message incorrectly. 422 usually means the client sent a valid message that the domain rejected.
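One way to keep the two codes apart is to validate in two phases. The sketch below assumes a single `effectiveDate` field sent as an ISO-8601 date and a hypothetical `earliestAllowed` policy date; `EffectiveDateValidator` is an illustrative name, and `200` merely stands in for "continue processing".

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

/** Illustrative two-phase check: phase 1 failures map to 400, phase 2 failures to 422. */
final class EffectiveDateValidator {
    private EffectiveDateValidator() {}

    /**
     * @param rawEffectiveDate the field exactly as received, expected as yyyy-MM-dd
     * @param earliestAllowed  hypothetical policy date supplied by the domain layer
     * @return the HTTP status this input should lead to
     */
    static int statusFor(String rawEffectiveDate, LocalDate earliestAllowed) {
        // Phase 1: can the message be interpreted at all?
        if (rawEffectiveDate == null) {
            return 400; // required field is missing
        }
        LocalDate effectiveDate;
        try {
            effectiveDate = LocalDate.parse(rawEffectiveDate);
        } catch (DateTimeParseException e) {
            return 400; // malformed: the caller built the message incorrectly
        }
        // Phase 2: the message is well-formed; does the domain accept it?
        if (effectiveDate.isBefore(earliestAllowed)) {
            return 422; // valid message, rejected by a domain rule
        }
        return 200;
    }
}
```

With an earliest allowed date of 2026-01-01, `"05/07/2026"` yields `400` because it cannot be parsed, while `"2025-12-31"` yields `422` because it parses but violates the policy.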

8. `409 Conflict` vs `422 Unprocessable Content`

Both represent valid requests that cannot be completed. The difference is the role of current server state.

Use 409 when the request conflicts with current resource/process state.

Examples:

case is already closed;
document has already been approved;
duplicate unique external reference exists;
transition conflicts with current workflow state;
idempotency key was reused with different payload;
order cannot be cancelled because fulfillment already started.

Use 422 when the request is semantically invalid regardless of race/current state.

Examples:

invalid field combination;
unsupported business category;
invalid date range;
invalid command shape after schema validation.

Decision rule:

If the exact same request might become valid after the resource state changes,
prefer 409.

If the exact same request is invalid as submitted regardless of server state,
prefer 422.

In regulatory/workflow systems, 409 is especially useful because process state is often the real constraint.

Example:

HTTP/1.1 409 Conflict
Content-Type: application/problem+json

{
  "type": "https://errors.example.internal/case-state-conflict",
  "title": "Case state conflict",
  "status": 409,
  "detail": "Case CASE-1001 cannot be escalated from CLOSED state.",
  "instance": "/cases/CASE-1001/commands/escalate/REQ-8821",
  "errorCode": "CASE_STATE_CONFLICT",
  "currentState": "CLOSED",
  "allowedStates": ["UNDER_REVIEW", "INVESTIGATION"]
}
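The decision rule can be expressed as an ordering of checks: first the checks that ignore server state, then the checks that depend on it. The sketch below is illustrative; `WorkflowState`, `EscalateCommand`, and the 1 to 5 priority range are assumptions made for the example.

```java
import java.util.Set;

/** Hypothetical workflow states, mirroring the ones used in this lesson's examples. */
enum WorkflowState { DRAFT, UNDER_REVIEW, INVESTIGATION, CLOSED }

/** Hypothetical command payload; the 1..5 priority range is an assumed domain rule. */
record EscalateCommand(String reason, int priority) {}

final class EscalationStatusRule {
    private static final Set<WorkflowState> ESCALATABLE =
            Set.of(WorkflowState.UNDER_REVIEW, WorkflowState.INVESTIGATION);

    private EscalationStatusRule() {}

    static int statusFor(EscalateCommand command, WorkflowState currentState) {
        // Invalid as submitted, whatever state the case is in: 422.
        boolean reasonMissing = command.reason() == null || command.reason().isBlank();
        if (reasonMissing || command.priority() < 1 || command.priority() > 5) {
            return 422;
        }
        // Valid command, but the current state forbids it; the same command
        // could succeed against a case in another state: 409.
        if (!ESCALATABLE.contains(currentState)) {
            return 409;
        }
        return 200;
    }
}
```

Running the state-independent checks first also means a malformed command never reports a misleading state conflict.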

9. `404 Not Found` Is a Boundary Statement

404 does not always mean the row is absent from the database.

It means:

The requested resource is not visible at this API boundary.

Possible reasons:

it truly does not exist;
it exists but belongs to another tenant;
it exists but caller has no visibility;
it exists in another bounded context;
it existed but has been deleted and the API does not expose tombstones;
it is not yet materialized in a read model.

Do not leak internal existence details unless your security and domain model allow it.

For internal service-to-service APIs, be consistent:

Scenario	Recommended code
Resource absent and safe to reveal absence	`404`
Resource hidden due to permission and caller should know it lacks permission	`403`
Resource hidden due to tenancy/security and existence should not be disclosed	`404`
Resource permanently removed and clients need to distinguish from unknown	`410`
Read model not caught up yet after accepted async command	`202` operation status, or `404` with documented eventual consistency semantics

A weak 404 policy creates debugging pain. A strong 404 policy documents visibility boundaries.
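A documented visibility policy can be small enough to fit in one function. The following is a sketch of the table above, assuming the API layer already knows four facts about the lookup; `LookupFacts` and `VisibilityPolicy` are hypothetical names, and the eventual-consistency row is left out for brevity.

```java
/** Facts the API layer knows about a lookup; all names here are illustrative. */
record LookupFacts(boolean exists,
                   boolean permanentlyRemoved,
                   boolean callerMayAccess,
                   boolean mayDiscloseExistence) {}

final class VisibilityPolicy {
    private VisibilityPolicy() {}

    static int statusFor(LookupFacts facts) {
        if (facts.permanentlyRemoved()) {
            // 410 only when clients are allowed to tell "removed" apart from "unknown".
            return facts.mayDiscloseExistence() ? 410 : 404;
        }
        if (!facts.exists()) {
            return 404;
        }
        if (!facts.callerMayAccess()) {
            // 403 tells the caller it lacks permission; 404 keeps existence hidden.
            return facts.mayDiscloseExistence() ? 403 : 404;
        }
        return 200;
    }
}
```

Keeping this decision in one place makes the boundary reviewable: a change from `403` to `404` for cross-tenant access becomes a one-line policy change rather than a hunt through controllers.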

10. `401` vs `403`

Keep the distinction simple:

401 Unauthorized: authentication failed or is missing.
403 Forbidden: authentication succeeded, but authorization failed.

Despite the name, 401 Unauthorized is about authentication challenge/credentials.

For this communication series, we will not go deep into authentication/authorization models because those belong to separate materials. Here, the important point is that status codes should not blur caller identity failures.

For internal APIs:

Scenario	Code
Missing service token	`401`
Expired service token	`401`
Invalid token signature	`401`
Valid service identity but scope missing	`403`
Valid service identity but tenant access denied	`403` or `404`, depending on visibility policy
mTLS client cert missing at gateway	gateway-level `401` / `403` depending on policy

Do not return 500 when the authentication middleware rejects a request. Reserve 500 for cases where the middleware itself failed unexpectedly.

11. `412 Precondition Failed` and Concurrency Control

409 Conflict is broad. 412 Precondition Failed is precise.

Use 412 when the client supplied a conditional request precondition and the precondition failed.

Classic example: optimistic concurrency with ETag and If-Match.

Read:

GET /cases/CASE-1001 HTTP/1.1

Response:

HTTP/1.1 200 OK
ETag: "case-v14"

{
  "caseId": "CASE-1001",
  "state": "UNDER_REVIEW"
}

Update:

PUT /cases/CASE-1001 HTTP/1.1
If-Match: "case-v14"
Content-Type: application/json

{
  "priority": "HIGH"
}

If another update already moved the version to case-v15:

HTTP/1.1 412 Precondition Failed
Content-Type: application/problem+json

{
  "type": "https://errors.example.internal/precondition-failed",
  "title": "Precondition failed",
  "status": 412,
  "detail": "The supplied version case-v14 is no longer current.",
  "errorCode": "VERSION_MISMATCH",
  "currentETag": "case-v15"
}

Use 409 for general state conflicts. Use 412 for explicit conditional-request failures.
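The server side of the exchange above can be sketched with an in-memory store. `VersionedCase` and `CaseStore` are hypothetical, the `case-v<N>` ETag format follows the example, and the sketch assumes the API contract requires `If-Match` on updates (a missing header is treated as a malformed request here).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical versioned record; the ETag is derived from the version number. */
record VersionedCase(String caseId, String priority, long version) {
    String etag() {
        return "\"case-v" + version + "\"";
    }
}

/** In-memory store for illustration only. */
final class CaseStore {
    private final Map<String, VersionedCase> cases = new ConcurrentHashMap<>();

    void put(VersionedCase value) {
        cases.put(value.caseId(), value);
    }

    VersionedCase get(String caseId) {
        return cases.get(caseId);
    }

    /** Returns the HTTP status the update should produce. */
    int updatePriority(String caseId, String ifMatch, String newPriority) {
        VersionedCase current = cases.get(caseId);
        if (current == null) {
            return 404;
        }
        if (ifMatch == null) {
            return 400; // assumption: this API requires If-Match on updates
        }
        if (!current.etag().equals(ifMatch)) {
            return 412; // caller's version is stale
        }
        VersionedCase next = new VersionedCase(caseId, newPriority, current.version() + 1);
        // replace() succeeds only if nobody changed the record since it was read above.
        return cases.replace(caseId, current, next) ? 200 : 412;
    }
}
```

The second check inside `replace` matters: comparing the ETag and writing must be atomic, otherwise two concurrent writers holding the same version could both succeed.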

12. Rate Limiting: `429 Too Many Requests`

Use 429 when the caller exceeded quota, rate, or concurrency limits.

429 should usually include Retry-After when the server can provide useful guidance.

HTTP/1.1 429 Too Many Requests
Retry-After: 10
Content-Type: application/problem+json

{
  "type": "https://errors.example.internal/rate-limit-exceeded",
  "title": "Rate limit exceeded",
  "status": 429,
  "detail": "Caller payment-service exceeded 100 requests per second for case-service.",
  "errorCode": "RATE_LIMIT_EXCEEDED",
  "retryable": true,
  "retryAfterSeconds": 10
}

Important distinction:

Code	Meaning
`429`	This caller is limited. The service may still be healthy.
`503`	The service or dependency path is unavailable/overloaded more generally.

If every caller receives failures due to overload, 503 is usually more accurate than 429.

If one noisy caller is throttled to protect the system, 429 is better.
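On the caller side, respecting `Retry-After` means parsing it correctly: the header value can be either a number of seconds or an HTTP-date. The following is a minimal sketch using only the JDK; `RetryAfterParser` is a hypothetical helper, and returning an empty result for unparseable values is a design assumption so the caller can fall back to its own backoff.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

final class RetryAfterParser {
    private RetryAfterParser() {}

    /**
     * Parses a Retry-After value given as delay seconds ("10") or as an HTTP-date.
     * Returns empty when the header is absent or cannot be parsed.
     */
    static Optional<Duration> parse(String headerValue, Instant now) {
        if (headerValue == null || headerValue.isBlank()) {
            return Optional.empty();
        }
        String value = headerValue.trim();
        if (value.chars().allMatch(Character::isDigit)) {
            try {
                return Optional.of(Duration.ofSeconds(Long.parseLong(value)));
            } catch (NumberFormatException e) {
                return Optional.empty(); // digits only, but too large for a long
            }
        }
        try {
            Instant retryAt = ZonedDateTime
                    .parse(value, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
            Duration wait = Duration.between(now, retryAt);
            // A date in the past means the caller may retry immediately.
            return Optional.of(wait.isNegative() ? Duration.ZERO : wait);
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }
}
```

Passing `now` explicitly keeps the helper deterministic in tests. The parsed delay should still be capped by the caller's own deadline and retry budget.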

13. Server Failure Codes

13.1 `500 Internal Server Error`

Use 500 for unexpected server failure.

Examples:

uncaught bug;
invariant violation;
serialization bug;
null pointer in server code;
unexpected database error not classified more precisely;
impossible state reached.

Do not expose internal stack traces or SQL messages in the response.

Bad:

{
  "error": "NullPointerException at CaseService.java:119"
}

Better:

HTTP/1.1 500 Internal Server Error
Content-Type: application/problem+json

{
  "type": "https://errors.example.internal/internal-error",
  "title": "Internal server error",
  "status": 500,
  "detail": "The service failed while processing the request.",
  "errorCode": "INTERNAL_ERROR",
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736"
}

The trace id lets operators find the internal cause without leaking details to callers.

13.2 `503 Service Unavailable`

Use 503 when the service cannot process requests now but may recover.

Examples:

service is overloaded;
service is draining during deployment;
dependency is unavailable and operation cannot proceed;
circuit breaker is open;
database pool exhausted;
broker unavailable for a command that must publish before returning;
maintenance window.

Include Retry-After when meaningful.

HTTP/1.1 503 Service Unavailable
Retry-After: 30
Content-Type: application/problem+json

{
  "type": "https://errors.example.internal/service-unavailable",
  "title": "Service unavailable",
  "status": 503,
  "detail": "Case service is temporarily unable to process escalation commands.",
  "errorCode": "SERVICE_UNAVAILABLE",
  "retryable": true
}

Do not blindly convert every downstream failure to 500. If your service is healthy but a dependency path is unavailable, 503 often gives the caller a better control signal.

13.3 `502 Bad Gateway` and `504 Gateway Timeout`

Use 502 or 504 when the component is acting as a gateway/proxy/intermediary.

If case-api directly owns a business operation and its dependency fails, 503 may be more accurate.

If case-gateway forwards to case-service and receives an invalid response, 502 is accurate.

If case-gateway waits for case-service and times out, 504 is accurate.

Avoid using 502 and 504 from normal application controllers unless the controller is explicitly acting as an intermediary.

14. Retriability Is Not a Status Code Alone

Status code influences retry, but it is not sufficient.

Retry safety also depends on:

HTTP method semantics;
idempotency key;
operation side effects;
whether the request reached the server;
whether the failure happened before or after commit;
whether the response body marks the error as retryable;
caller's remaining deadline;
global retry budget;
service overload state.

Basic matrix:

Code	Usually retry?	Notes
`400`	No	Client generated invalid request.
`401`	No immediate retry	Refresh credentials only if supported.
`403`	No	Permission/config issue.
`404`	Usually no	Except documented eventual-consistency read-after-write cases.
`409`	Usually no automatic retry	Requires state refresh or domain decision.
`412`	No blind retry	Re-read latest version, then retry if still desired.
`422`	No	Fix request/domain input.
`429`	Yes, with backoff and budget	Respect `Retry-After` if present.
`500`	Maybe	Only if operation is idempotent or protected by idempotency key.
`502`	Maybe	Usually transient at gateway/proxy layer.
`503`	Yes, with backoff and budget	Watch for overload amplification.
`504`	Maybe	Unknown outcome; retry only if safe/idempotent.

The dangerous case is a 504 or a client-side timeout after a command. The caller does not know whether the callee committed.

That is why idempotency design matters.
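To show how several of these inputs combine, here is a sketch of a retry decision that follows the matrix above. It is illustrative only: `RetryContext`, `RetryDecision`, and the 100 ms base delay are assumptions, `attempt` counts attempts already made, and jitter and a global retry budget are deliberately left out.

```java
import java.time.Duration;

/** Inputs a caller typically has when a call fails; field names are illustrative. */
record RetryContext(int status, boolean idempotent, int attempt, int maxAttempts,
                    Duration remainingDeadline, Duration retryAfter) {}

/** Whether to retry and how long to wait first. */
record RetryDecision(boolean retry, Duration delay) {
    static RetryDecision stop() {
        return new RetryDecision(false, Duration.ZERO);
    }
}

final class RetryPolicySketch {
    private static final Duration BASE_DELAY = Duration.ofMillis(100); // assumed value

    private RetryPolicySketch() {}

    static RetryDecision decide(RetryContext ctx) {
        boolean candidate = switch (ctx.status()) {
            case 429, 503 -> true;                 // explicit "not now" from the server
            case 500, 502, 504 -> ctx.idempotent(); // outcome unknown: only if safe to repeat
            default -> false;                      // 4xx and others: fix the request instead
        };
        if (!candidate || ctx.attempt() >= ctx.maxAttempts()) {
            return RetryDecision.stop();
        }
        // Exponential backoff: 100 ms, 200 ms, 400 ms, ... capped at 2^10 times the base.
        int exponent = Math.max(0, Math.min(ctx.attempt() - 1, 10));
        Duration delay = BASE_DELAY.multipliedBy(1L << exponent);
        // Never retry sooner than the server asked.
        if (ctx.retryAfter() != null && ctx.retryAfter().compareTo(delay) > 0) {
            delay = ctx.retryAfter();
        }
        // Waiting past the caller's deadline is pointless.
        if (delay.compareTo(ctx.remainingDeadline()) >= 0) {
            return RetryDecision.stop();
        }
        return new RetryDecision(true, delay);
    }
}
```

Note that the status code only nominates a response as a retry candidate. Idempotency, attempt count, `Retry-After`, and the remaining deadline each get a veto.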

15. Problem Details as the Error Body

Use application/problem+json for machine-readable error bodies.

Minimum structure:

{
  "type": "https://errors.example.internal/validation-failed",
  "title": "Validation failed",
  "status": 422,
  "detail": "The request contains invalid domain fields.",
  "instance": "/cases/CASE-1001/commands/escalate/REQ-7781"
}

Useful internal extensions:

{
  "errorCode": "CASE_STATE_CONFLICT",
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
  "correlationId": "CORR-20260705-00091",
  "retryable": false,
  "severity": "WARN",
  "violatedInvariant": "CASE_CAN_ONLY_ESCALATE_FROM_ACTIVE_STATE",
  "fieldErrors": [
    {
      "field": "effectiveDate",
      "code": "MUST_NOT_BE_IN_PAST",
      "message": "effectiveDate must not be before today."
    }
  ]
}

Do not force clients to parse detail. detail is for humans. Use a stable errorCode for programmatic handling.

15.1 Stable Error Code Design

Error codes should be:

stable across deployments;
documented;
not tied to Java exception class names;
scoped enough to be useful;
not so granular that clients encode server internals;
observable in metrics and logs.

Good:

CASE_STATE_CONFLICT
VERSION_MISMATCH
VALIDATION_FAILED
RATE_LIMIT_EXCEEDED
DEPENDENCY_UNAVAILABLE

Bad:

NullPointerException
CaseServiceImplError119
SQL_STATE_23505_USER_TABLE
SomethingWentWrong

Error code is a contract. Exception type is implementation.
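One way to keep codes stable and documented is a single catalog that owns each code together with its status and retry hint. The sketch below uses the codes listed as good examples above; the enum itself and the `fromWire` helper are hypothetical.

```java
import java.util.Optional;

/** Illustrative catalog: each stable code is declared once with its status and retry hint. */
enum ErrorCode {
    CASE_STATE_CONFLICT(409, false),
    VERSION_MISMATCH(412, false),
    VALIDATION_FAILED(422, false),
    RATE_LIMIT_EXCEEDED(429, true),
    DEPENDENCY_UNAVAILABLE(503, true);

    private final int status;
    private final boolean retryable;

    ErrorCode(int status, boolean retryable) {
        this.status = status;
        this.retryable = retryable;
    }

    int status() {
        return status;
    }

    boolean retryable() {
        return retryable;
    }

    /**
     * Looks up a code received over the wire. Unknown codes yield empty instead of
     * throwing, so a client keeps working when the server adds a new code.
     */
    static Optional<ErrorCode> fromWire(String value) {
        if (value == null) {
            return Optional.empty();
        }
        for (ErrorCode code : values()) {
            if (code.name().equals(value)) {
                return Optional.of(code);
            }
        }
        return Optional.empty();
    }
}
```

A client that receives an unknown code can still fall back to the status code alone, which is why the status code and the error code should never disagree.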

16. Java Implementation: Domain Error Classification

Start with an explicit error category model.

public enum ApiErrorCategory {
    BAD_REQUEST,
    UNAUTHENTICATED,
    FORBIDDEN,
    NOT_FOUND,
    CONFLICT,
    PRECONDITION_FAILED,
    VALIDATION_FAILED,
    RATE_LIMITED,
    INTERNAL_ERROR,
    SERVICE_UNAVAILABLE
}

Then define an application exception that carries stable semantics.

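import java.util.Map;

// Application exception that carries stable API semantics (category, code, retry hint).
public class ApiException extends RuntimeException {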
    private final ApiErrorCategory category;
    private final String errorCode;
    private final boolean retryable;
    private final Map<String, Object> attributes;

    public ApiException(
            ApiErrorCategory category,
            String errorCode,
            String message,
            boolean retryable,
            Map<String, Object> attributes
    ) {
        super(message);
        this.category = category;
        this.errorCode = errorCode;
        this.retryable = retryable;
        this.attributes = Map.copyOf(attributes);
    }

    public ApiErrorCategory category() {
        return category;
    }

    public String errorCode() {
        return errorCode;
    }

    public boolean retryable() {
        return retryable;
    }

    public Map<String, Object> attributes() {
        return attributes;
    }
}

Map category to status code centrally.

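import org.springframework.http.HttpStatus;
import org.springframework.http.HttpStatusCode;

// Single place where an error category is translated to an HTTP status.
public final class ApiStatusMapper {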
    private ApiStatusMapper() {}

    public static HttpStatusCode toStatus(ApiErrorCategory category) {
        return switch (category) {
            case BAD_REQUEST -> HttpStatus.BAD_REQUEST;
            case UNAUTHENTICATED -> HttpStatus.UNAUTHORIZED;
            case FORBIDDEN -> HttpStatus.FORBIDDEN;
            case NOT_FOUND -> HttpStatus.NOT_FOUND;
            case CONFLICT -> HttpStatus.CONFLICT;
            case PRECONDITION_FAILED -> HttpStatus.PRECONDITION_FAILED;
            case VALIDATION_FAILED -> HttpStatus.UNPROCESSABLE_ENTITY;
            case RATE_LIMITED -> HttpStatus.TOO_MANY_REQUESTS;
            case SERVICE_UNAVAILABLE -> HttpStatus.SERVICE_UNAVAILABLE;
            case INTERNAL_ERROR -> HttpStatus.INTERNAL_SERVER_ERROR;
        };
    }
}

This avoids scattering response decisions across controllers.

17. Spring Boot `ProblemDetail` Handler

Spring Framework 6 and later include ProblemDetail, which maps well to RFC 9457-style error bodies.

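import jakarta.servlet.http.HttpServletRequest;
import java.net.URI;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;
import org.springframework.http.HttpStatus;
import org.springframework.http.HttpStatusCode;
import org.springframework.http.MediaType;
import org.springframework.http.ProblemDetail;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.MethodArgumentNotValidException;
import org.springframework.web.bind.annotation.ExceptionHandler;
import org.springframework.web.bind.annotation.RestControllerAdvice;

@RestControllerAdvice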
public class ApiExceptionHandler {

    @ExceptionHandler(ApiException.class)
    public ResponseEntity<ProblemDetail> handleApiException(
            ApiException exception,
            HttpServletRequest request
    ) {
        HttpStatusCode status = ApiStatusMapper.toStatus(exception.category());

        ProblemDetail problem = ProblemDetail.forStatusAndDetail(
                status,
                exception.getMessage()
        );

        problem.setTitle(titleFor(exception.category()));
        problem.setType(URI.create("https://errors.example.internal/" +
                exception.errorCode().toLowerCase(Locale.ROOT).replace('_', '-')));
        problem.setInstance(URI.create(request.getRequestURI()));
        problem.setProperty("errorCode", exception.errorCode());
        problem.setProperty("retryable", exception.retryable());

        exception.attributes().forEach(problem::setProperty);

        return ResponseEntity
                .status(status)
                .contentType(MediaType.APPLICATION_PROBLEM_JSON)
                .body(problem);
    }

    @ExceptionHandler(MethodArgumentNotValidException.class)
    public ResponseEntity<ProblemDetail> handleValidation(
            MethodArgumentNotValidException exception,
            HttpServletRequest request
    ) {
        ProblemDetail problem = ProblemDetail.forStatusAndDetail(
                HttpStatus.UNPROCESSABLE_ENTITY,
                "The request contains invalid domain fields."
        );
        problem.setTitle("Validation failed");
        problem.setType(URI.create("https://errors.example.internal/validation-failed"));
        problem.setInstance(URI.create(request.getRequestURI()));
        problem.setProperty("errorCode", "VALIDATION_FAILED");
        problem.setProperty("retryable", false);

        List<Map<String, Object>> fieldErrors = exception.getBindingResult()
                .getFieldErrors()
                .stream()
                .map(error -> Map.<String, Object>of(
                        "field", error.getField(),
                        "code", Objects.requireNonNullElse(error.getCode(), "INVALID"),
                        "message", Objects.requireNonNullElse(error.getDefaultMessage(), "Invalid value")
                ))
                .toList();

        problem.setProperty("fieldErrors", fieldErrors);

        return ResponseEntity
                .status(HttpStatus.UNPROCESSABLE_ENTITY)
                .contentType(MediaType.APPLICATION_PROBLEM_JSON)
                .body(problem);
    }

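    // Catch-all for anything not classified above. Because this advice handles Exception,
    // it can also intercept Spring MVC's own exceptions (for example
    // HttpMessageNotReadableException for malformed JSON); give those dedicated
    // handlers so they stay on 400 instead of falling through to 500.
    @ExceptionHandler(Exception.class)
    public ResponseEntity<ProblemDetail> handleUnexpected(
            Exception exception,
            HttpServletRequest request
    ) {
        ProblemDetail problem = ProblemDetail.forStatusAndDetail(
                HttpStatus.INTERNAL_SERVER_ERROR,
                "The service failed while processing the request."
        );
        problem.setTitle("Internal server error");
        problem.setType(URI.create("https://errors.example.internal/internal-error"));
        problem.setInstance(URI.create(request.getRequestURI()));
        problem.setProperty("errorCode", "INTERNAL_ERROR");
        // The outcome of an unexpected failure is unknown, so do not advertise a blanket
        // retry; callers decide based on idempotency (see the retry matrix in section 14).
        problem.setProperty("retryable", false);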

        return ResponseEntity
                .status(HttpStatus.INTERNAL_SERVER_ERROR)
                .contentType(MediaType.APPLICATION_PROBLEM_JSON)
                .body(problem);
    }

    private static String titleFor(ApiErrorCategory category) {
        return switch (category) {
            case BAD_REQUEST -> "Bad request";
            case UNAUTHENTICATED -> "Unauthenticated";
            case FORBIDDEN -> "Forbidden";
            case NOT_FOUND -> "Not found";
            case CONFLICT -> "Conflict";
            case PRECONDITION_FAILED -> "Precondition failed";
            case VALIDATION_FAILED -> "Validation failed";
            case RATE_LIMITED -> "Rate limit exceeded";
            case SERVICE_UNAVAILABLE -> "Service unavailable";
            case INTERNAL_ERROR -> "Internal server error";
        };
    }
}

Key idea: the handler is the only place where exceptions become HTTP.

18. Controller Example: State Conflict

Do not throw generic exceptions from domain workflows.

@PostMapping("/cases/{caseId}/commands/escalate")
public ResponseEntity<EscalateCaseResponse> escalate(
        @PathVariable String caseId,
        @RequestBody @Valid EscalateCaseRequest request,
        @RequestHeader(name = "Idempotency-Key", required = false) String idempotencyKey
) {
    EscalateCaseResult result = caseCommandService.escalate(caseId, request, idempotencyKey);

    return ResponseEntity.ok(new EscalateCaseResponse(
            result.caseId(),
            result.newState(),
            result.commandId()
    ));
}

Domain service:

public EscalateCaseResult escalate(
        String caseId,
        EscalateCaseRequest request,
        String idempotencyKey
) {
    CaseRecord record = caseRepository.findById(caseId)
            .orElseThrow(() -> new ApiException(
                    ApiErrorCategory.NOT_FOUND,
                    "CASE_NOT_FOUND",
                    "Case " + caseId + " was not found.",
                    false,
                    Map.of("caseId", caseId)
            ));

    if (!record.state().canEscalate()) {
        throw new ApiException(
                ApiErrorCategory.CONFLICT,
                "CASE_STATE_CONFLICT",
                "Case " + caseId + " cannot be escalated from " + record.state() + " state.",
                false,
                Map.of(
                        "caseId", caseId,
                        "currentState", record.state().name(),
                        "allowedStates", List.of("UNDER_REVIEW", "INVESTIGATION")
                )
        );
    }

    // perform command under transaction / idempotency policy
    return performEscalation(record, request, idempotencyKey);
}

This produces a useful 409, not an accidental 500.

19. Client-Side Status Classification

Callers need a small status classifier.

public enum RemoteOutcomeKind {
    SUCCESS,
    CLIENT_ERROR,
    CONFLICT,
    RATE_LIMITED,
    UNAVAILABLE,
    SERVER_ERROR,
    UNKNOWN
}

public final class HttpOutcomeClassifier {
    public RemoteOutcomeKind classify(int status) {
        if (status >= 200 && status <= 299) {
            return RemoteOutcomeKind.SUCCESS;
        }
        return switch (status) {
            case 409, 412 -> RemoteOutcomeKind.CONFLICT;
            case 429 -> RemoteOutcomeKind.RATE_LIMITED;
            case 502, 503, 504 -> RemoteOutcomeKind.UNAVAILABLE;
            default -> {
                if (status >= 400 && status <= 499) {
                    yield RemoteOutcomeKind.CLIENT_ERROR;
                }
                if (status >= 500 && status <= 599) {
                    yield RemoteOutcomeKind.SERVER_ERROR;
                }
                yield RemoteOutcomeKind.UNKNOWN;
            }
        };
    }

    public boolean isPotentiallyRetryable(int status, boolean idempotentOperation) {
        if (!idempotentOperation) {
            return false;
        }
        return status == 408 || status == 429 || status == 500 ||
               status == 502 || status == 503 || status == 504;
    }
}

This is intentionally conservative. Retry policy will be covered deeply later, but status code classification starts here.

20. Status Codes and SLOs

Your SLO/error budget policy must classify HTTP status codes intentionally.

Naive rule:

2xx = success
everything else = failure

This can be acceptable for a first dashboard, but high-quality systems need more nuance.

Examples:

Response	Should it burn server SLO?	Why
`400` due to malformed caller request	Usually no	Caller bug, not server availability issue.
`401` invalid token	Usually no	Auth/caller issue, unless caused by auth outage.
`403` permission denied	Usually no	Expected policy enforcement.
`404` unknown resource	Usually no	Often expected read behavior.
`409` business conflict	Usually no	Expected domain/process conflict.
`422` validation failure	Usually no	Expected domain validation.
`429` caller quota exceeded	Maybe	If due to protective caller throttling, no; if global overload, maybe.
`500`	Yes	Server failure.
`503`	Yes	Availability failure/overload.
`504` from gateway	Yes	Dependency/path timeout.

Do not let expected business rejections look like infrastructure failure.

Also do not hide infrastructure failure as business rejection.
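The classification in the table can be turned into a small availability calculation. This is a sketch under assumptions: `AvailabilitySli` is a hypothetical helper, the input is a plain map of status code to request count, and whether `429` burns budget is passed in as an explicit policy flag because the table leaves it as "maybe".

```java
import java.util.Map;

final class AvailabilitySli {
    private AvailabilitySli() {}

    /** True when a response with this status should count against the server's error budget. */
    static boolean burnsServerSlo(int status, boolean count429) {
        if (status == 429) {
            return count429; // policy choice: protective throttling vs. global overload
        }
        return status >= 500 && status <= 599;
    }

    /** Fraction of requests that did not burn budget; 1.0 when there was no traffic. */
    static double availability(Map<Integer, Long> countsByStatus, boolean count429) {
        long total = 0;
        long bad = 0;
        for (Map.Entry<Integer, Long> entry : countsByStatus.entrySet()) {
            total += entry.getValue();
            if (burnsServerSlo(entry.getKey(), count429)) {
                bad += entry.getValue();
            }
        }
        return total == 0 ? 1.0 : (double) (total - bad) / total;
    }
}
```

For 900 `200`s, 50 `409`s, 40 `422`s, 5 `500`s, and 5 `503`s, this yields 0.99, whereas the naive "2xx only" rule would report 0.90 and blame the server for expected business rejections.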

21. Observability Rules

Every non-2xx response should be observable along at least these dimensions:

Dimension	Example
route template	`/cases/{caseId}/commands/escalate`
method	`POST`
status	`409`
error code	`CASE_STATE_CONFLICT`
retryable	`false`
caller service	`workflow-service`
callee service	`case-service`
trace id	`4bf92f3577b34da6a3ce929d0e0e4736`
latency	histogram, read at `p50`, `p95`, `p99`

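Avoid high-cardinality metric labels such as raw caseId, full URL, exception message, user id, or trace id. Those values belong in logs and traces, where the trace id links them back to the request.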

Good metric labels:

http.server.requests{
  service="case-service",
  method="POST",
  route="/cases/{caseId}/commands/escalate",
  status="409",
  error_code="CASE_STATE_CONFLICT"
}

Bad metric labels:

http.server.requests{
  url="/cases/CASE-1001/commands/escalate",
  exception_message="Case CASE-1001 cannot be escalated from CLOSED state"
}

The first supports aggregation. The second explodes cardinality.
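To make the cardinality argument concrete, here is a minimal in-memory sketch of a counter keyed only by bounded labels. `ErrorMetrics` and its methods are hypothetical and stand in for whatever metrics library the service actually uses; the label names mirror the "good" example above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory counter that only accepts bounded-cardinality labels. */
final class ErrorMetrics {

    private final Map<String, Long> counters = new HashMap<>();

    /** Builds a series key from the route template, never from the raw URL. */
    static String seriesKey(String service, String method, String routeTemplate,
                            int status, String errorCode) {
        // TreeMap keeps label order stable, so equal label sets produce equal keys.
        Map<String, String> labels = new TreeMap<>();
        labels.put("service", service);
        labels.put("method", method);
        labels.put("route", routeTemplate);
        labels.put("status", Integer.toString(status));
        labels.put("error_code", errorCode);
        return "http.server.requests" + labels;
    }

    void record(String service, String method, String routeTemplate,
                int status, String errorCode) {
        counters.merge(seriesKey(service, method, routeTemplate, status, errorCode), 1L, Long::sum);
    }

    long count(String seriesKey) {
        return counters.getOrDefault(seriesKey, 0L);
    }

    int seriesCount() {
        return counters.size();
    }
}
```

Recording two rejected escalations for different cases with the route template `/cases/{caseId}/commands/escalate` increments one series twice. Passing the raw paths instead would create one series per case id, which is the growth the bad example illustrates.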

22. Status Code Decision Matrix

Use this during API design reviews.

Situation	Status	Body
Query found resource	`200`	representation
Query empty collection	`200`	empty list/page
Query missing resource	`404`	Problem Details
Command completed with result	`200`	result
Command created resource	`201`	representation or creation result
Command completed with no result	`204`	none
Command accepted async	`202`	operation status/link
Malformed JSON	`400`	Problem Details
Invalid domain fields	`422`	Problem Details + field errors
Invalid state transition	`409`	Problem Details + current state
Optimistic concurrency failed	`412`	Problem Details + current version if safe
Caller exceeded quota	`429`	Problem Details + Retry-After
Service overloaded	`503`	Problem Details + Retry-After if useful
Unexpected bug	`500`	Problem Details without internals
Gateway upstream timeout	`504`	Problem Details

Put this matrix in your internal API handbook. The value is not the table itself; the value is consistency across teams.
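One way to keep the matrix consistent in code as well as in the handbook is to encode it once as a shared enum. The sketch below is illustrative only: `ApiSituation` and its constant names are hypothetical, while the status values are copied from the table above.

```java
/** Hypothetical encoding of the decision matrix; status values come from the table. */
enum ApiSituation {
    QUERY_FOUND_RESOURCE(200),
    QUERY_EMPTY_COLLECTION(200),
    QUERY_MISSING_RESOURCE(404),
    COMMAND_COMPLETED_WITH_RESULT(200),
    COMMAND_CREATED_RESOURCE(201),
    COMMAND_COMPLETED_NO_RESULT(204),
    COMMAND_ACCEPTED_ASYNC(202),
    MALFORMED_JSON(400),
    INVALID_DOMAIN_FIELDS(422),
    INVALID_STATE_TRANSITION(409),
    OPTIMISTIC_CONCURRENCY_FAILED(412),
    CALLER_EXCEEDED_QUOTA(429),
    SERVICE_OVERLOADED(503),
    UNEXPECTED_BUG(500),
    GATEWAY_UPSTREAM_TIMEOUT(504);

    private final int status;

    ApiSituation(int status) {
        this.status = status;
    }

    int status() {
        return status;
    }

    /** In the matrix, every 4xx and 5xx row carries a Problem Details body. */
    boolean expectsProblemDetails() {
        return status >= 400;
    }
}
```

A handler that returns `ApiSituation.INVALID_STATE_TRANSITION.status()` cannot drift from the handbook, and a design review only has to check which situation was chosen, not which number was typed.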

23. Common Anti-Patterns

Anti-pattern 1: `200 OK` with error payload

Covered in section 4. The transport reports success while the body reports failure, so every layer that keys on the status code draws the wrong conclusion.

Anti-pattern 2: Everything is `400`

If every caller-side failure is 400, clients cannot distinguish malformed syntax, validation, conflict, concurrency, and quota.

Anti-pattern 3: Everything is `500`

This creates false incidents and hides actionable client errors.

Anti-pattern 4: Business conflicts as `500`

A case being closed is not a server crash. It is domain state.

Anti-pattern 5: Infrastructure failure as `409`

A database outage is not a business conflict. Do not hide system failures as domain errors.

Anti-pattern 6: Retrying all `5xx`

A 500 after a non-idempotent command may already have committed side effects. Retry can duplicate work.

Anti-pattern 7: Returning stack traces

Never expose stack traces in service-to-service response bodies. Use trace ids.

Anti-pattern 8: Error codes that change with implementation

JpaOptimisticLockingFailureException is not a stable external error code. It names an implementation detail, and it changes as soon as the persistence layer does.
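A minimal sketch of the alternative: translate internal exception types into stable codes at the service boundary. Everything here is hypothetical except `CASE_STATE_CONFLICT`, which this lesson already uses. The nested exception classes stand in for framework and domain exceptions, and `CASE_VERSION_CONFLICT` and `INTERNAL_ERROR` are assumed code names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical boundary translator from internal exception types to stable error codes. */
final class StableErrorCodes {

    // Stand-ins for a persistence-layer exception and a domain exception.
    static class OptimisticLockFailure extends RuntimeException {}
    static class CaseClosedException extends RuntimeException {}

    private static final Map<Class<? extends Throwable>, String> CODES = new LinkedHashMap<>();

    static {
        CODES.put(OptimisticLockFailure.class, "CASE_VERSION_CONFLICT");
        CODES.put(CaseClosedException.class, "CASE_STATE_CONFLICT");
    }

    /** Returns the documented code, or a generic one when the exception is unmapped. */
    static String codeFor(Throwable error) {
        for (Map.Entry<Class<? extends Throwable>, String> entry : CODES.entrySet()) {
            if (entry.getKey().isInstance(error)) {
                return entry.getValue();
            }
        }
        // Unknown exceptions never leak their class name to callers.
        return "INTERNAL_ERROR";
    }
}
```

Swapping the persistence library then changes one map entry on the server side, while the code that callers match on stays the same.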

24. Testing Status Code Contracts

Status code behavior should be tested as an API contract, not as incidental controller behavior.

Example tests:

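// Imports these fragments rely on (JUnit 5 and Spring MockMvc).
import org.junit.jupiter.api.Test;
import org.springframework.http.MediaType;

import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.post;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.content;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.jsonPath;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

// The methods below live in a test class with an injected MockMvc field named mockMvc.

// State conflict: the fixture is assumed to hold CASE-1001 in CLOSED state.
@Test
void escalateClosedCaseReturns409() throws Exception {
    mockMvc.perform(post("/cases/CASE-1001/commands/escalate")
            .contentType(MediaType.APPLICATION_JSON)
            .content("""
                {"reason":"priority-risk"}
                """))
        .andExpect(status().isConflict())
        .andExpect(content().contentType(MediaType.APPLICATION_PROBLEM_JSON))
        .andExpect(jsonPath("$.errorCode").value("CASE_STATE_CONFLICT"))
        .andExpect(jsonPath("$.retryable").value(false));
}

// Syntax failure: the body is not parseable JSON, so the request never reaches the domain.
@Test
void malformedJsonReturns400() throws Exception {
    mockMvc.perform(post("/cases/CASE-1001/commands/escalate")
            .contentType(MediaType.APPLICATION_JSON)
            .content("{"))
        .andExpect(status().isBadRequest())
        .andExpect(jsonPath("$.errorCode").value("BAD_REQUEST"));
}

// Semantic failure: the JSON parses, but a field value is rejected by domain validation.
@Test
void invalidDomainFieldReturns422() throws Exception {
    mockMvc.perform(post("/cases/CASE-1001/commands/escalate")
            .contentType(MediaType.APPLICATION_JSON)
            .content("""
                {"effectiveDate":"1900-01-01"}
                """))
        .andExpect(status().isUnprocessableEntity())
        .andExpect(jsonPath("$.errorCode").value("VALIDATION_FAILED"));
}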

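These tests encode operational semantics: if a refactoring changes a status or an error code, the build fails before a caller's retry logic does.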

25. Production Review Checklist

Before approving a service-to-service API, ask:

Does every endpoint have documented success statuses?
Are async commands represented with 202 and operation state?
Are validation failures separated from malformed requests?
Are state conflicts separated from validation failures?
Are concurrency conflicts represented precisely?
Are server failures reported without leaking stack traces?
Are expected business rejections excluded from availability SLO burn?
Are rate limits represented with 429 and useful retry guidance?
Are overload/unavailable paths represented with 503?
Are gateway/intermediary errors distinguished from application errors?
Are error codes stable and documented?
Does every error include a trace/correlation mechanism?
Can clients classify retry safety without parsing human text?

If any answer is no, the API is not yet production-grade.

26. Final Mental Model

Status code design is not about memorizing codes.

It is about making distributed decisions cheap.

A strong status code design gives every layer a clean signal.

The best status code strategy is boring, small, consistent, and brutally honest.

If the caller is wrong, say so.

If the domain rejected the command, say so.

If the service is overloaded, say so.

If the server crashed, say so without leaking internals.

That honesty is what makes large microservice systems debuggable.

References

RFC 9110 — HTTP Semantics
RFC 9457 — Problem Details for HTTP APIs
RFC 6585 — Additional HTTP Status Codes, including 429 Too Many Requests
OpenTelemetry Semantic Conventions for HTTP spans
Google SRE materials on overload, error budgets, and cascading failure
