
Observability in Jersey: Logs, Metrics, Tracing, Diagnostics

Learn Java Eclipse Jersey & GlassFish - Part 016

Observability in Jersey: structured logs, correlation IDs, metrics, tracing, Jersey monitoring, request lifecycle events, diagnostics, and production debugging on GlassFish.


Part 016 — Observability in Jersey: Logs, Metrics, Tracing, Diagnostics

Goal: build the ability to read a Jersey/GlassFish system from the outside and the inside: request logs, structured application logs, correlation IDs, metrics, tracing, Jersey event listeners, monitoring statistics, JMX, thread dumps, and a diagnostic decision tree.

Observability is not “adding logs”. Observability is the ability to answer operational questions without redeploying and without guessing:

  • Which endpoint is slow?
  • Is the slowness in matching, a provider, the resource method, the database, an external call, or serialization?
  • Are 500s rising because of an exception mapper, a provider, a resource method, or a downstream dependency?
  • Does a 415 appear because the client sent the wrong Content-Type or because a provider is missing?
  • Are 401/403s rising because of invalid tokens, a missing role, a tenant mismatch, or an identity provider outage?
  • Is the GlassFish thread pool exhausted because of slow clients, blocking DB calls, a deadlock, or outbound timeouts?
  • Did the new deployment change provider selection or the route registry?

Core mental model:

Logs explain individual events. Metrics explain aggregate behavior. Traces explain causal paths. Diagnostics explain runtime state. You need all four.


1. Kaufman Deconstruction

Break the Jersey observability skill into the following sub-skills:

Sub-skill | Question you must be able to answer
Request identity | How is a single request tracked from the gateway to the resource method?
Logging | Which logs are mandatory, where, and with which fields?
Metrics | Which counters/timers/gauges prove health and degradation?
Tracing | How do you understand the request path across services and internal Jersey stages?
Jersey monitoring | How do you use event listeners, monitoring statistics, and JMX?
GlassFish diagnostics | When do you read the server log, access log, thread dump, heap dump, or JMX?
Failure classification | How do you distinguish client errors, auth errors, provider errors, pool exhaustion, and timeouts?
Production safety | How do you keep observability from leaking tokens/PII and from becoming too expensive?

Practical targets:

  1. Can build a correlation ID filter.
  2. Can design a structured log contract for Jersey requests.
  3. Can add a request timer with outcome classification.
  4. Can use a Jersey event listener for lifecycle diagnostics.
  5. Can explain when to enable Jersey tracing/monitoring statistics.
  6. Can turn an HTTP symptom into a runtime hypothesis.

2. Four Pillars for Jersey Runtime

Each pillar answers a different kind of question:

Pillar | Good at | Bad at
Logs | Explaining individual events and decisions | Aggregate trend without processing
Metrics | Alerting, SLO, rate, latency, saturation | Per-request causality
Traces | End-to-end causal path | High-cardinality global aggregation
Diagnostics | Thread/heap/classloader/runtime state | Continuous business health

For Jersey/GlassFish production, a minimum baseline is:

  • access log at edge or server;
  • application structured log with correlation ID;
  • request count by method/path/status/outcome;
  • latency histogram/timer by route;
  • outbound client metrics;
  • auth failure counters;
  • exception mapper counters;
  • thread pool/JDBC pool gauges;
  • ability to capture thread dump;
  • targeted tracing for diagnostic windows.

3. Request Lifecycle Observability Map

Jersey request processing has multiple observable stages.

You want to know:

  • Did request reach GlassFish?
  • Did it reach Jersey?
  • Which resource method matched?
  • Which auth decision happened?
  • Did body reading fail?
  • Did resource method throw?
  • Did exception mapper handle it?
  • Did body writing fail?
  • What was final status and latency?

This is why one “request completed” log is not enough.


4. Correlation ID Filter

Every request should have a stable correlation ID.

Rules:

  • Accept a trusted inbound correlation header if present and valid.
  • Generate one if missing.
  • Return it in response header.
  • Put it into logging context.
  • Propagate it to outbound Jersey Client calls.
  • Never use correlation ID as authentication.

Example:

@Provider
@Priority(Priorities.AUTHENTICATION - 100)
public final class CorrelationIdFilter implements ContainerRequestFilter, ContainerResponseFilter {

    public static final String HEADER = "X-Correlation-Id";
    public static final String PROPERTY = "correlationId";

    @Override
    public void filter(ContainerRequestContext requestContext) {
        String inbound = requestContext.getHeaderString(HEADER);
        String correlationId = isValid(inbound) ? inbound : generate();

        requestContext.setProperty(PROPERTY, correlationId);
        org.slf4j.MDC.put(PROPERTY, correlationId);
    }

    @Override
    public void filter(ContainerRequestContext requestContext,
                       ContainerResponseContext responseContext) {
        Object correlationId = requestContext.getProperty(PROPERTY);
        if (correlationId != null) {
            responseContext.getHeaders().putSingle(HEADER, correlationId.toString());
        }
        org.slf4j.MDC.remove(PROPERTY);
    }

    private static boolean isValid(String value) {
        return value != null && value.length() <= 128 && value.matches("[A-Za-z0-9._:-]+");
    }

    private static String generate() {
        return java.util.UUID.randomUUID().toString();
    }
}

Important nuance:

ContainerResponseFilter should clean MDC, but async flows and exception paths can complicate cleanup. For heavy production systems, prefer a try/finally wrapper or a request lifecycle listener that guarantees cleanup at request finish.
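
The try/finally idea can be sketched as a try-with-resources scope. This is a minimal illustration, not a drop-in component: the class name `CorrelationScope` is hypothetical, and a plain `ThreadLocal` stands in for the logging context so the sketch has no dependencies. With SLF4J, `MDC.put` and `MDC.remove` would go in the same two places.

```java
// Hypothetical helper: a ThreadLocal stands in for the logging MDC (assumption for illustration).
final class CorrelationScope implements AutoCloseable {

    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    private final String previous;

    private CorrelationScope(String previous) {
        this.previous = previous;
    }

    /** Binds the ID to the current thread and remembers what was bound before. */
    static CorrelationScope open(String correlationId) {
        String previous = CURRENT.get();
        CURRENT.set(correlationId);
        return new CorrelationScope(previous);
    }

    /** Returns the ID bound to the current thread, or null if none. */
    static String current() {
        return CURRENT.get();
    }

    /** Restores the previous value; runs even when the guarded block throws. */
    @Override
    public void close() {
        if (previous == null) {
            CURRENT.remove();
        } else {
            CURRENT.set(previous);
        }
    }
}
```

Wrapping work in `try (CorrelationScope scope = CorrelationScope.open(id)) { ... }` clears the value on both normal and exceptional exit. The scope is bound to one thread, so async continuations on another thread would need to open their own scope.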


5. Structured Request Log Contract

A useful request-completed log should be machine-readable.

Recommended fields:

Field | Example
event | http_request_completed
correlation_id | 7c7b...
trace_id | OpenTelemetry trace ID if available
method | POST
route_template | /tenants/{tenantId}/cases/{caseId}/approve
path | optional, avoid high-cardinality full path if sensitive
status | 403
outcome | authz_denied
duration_ms | 42
principal_hash | hash/pseudonymized subject if needed
tenant_id | if safe and policy allows
error_code | TENANT_MISMATCH
exception_type | ForbiddenException
request_size_bytes | optional
response_size_bytes | optional

Do not log:

  • bearer tokens;
  • cookies;
  • passwords;
  • raw PII payload;
  • full request body by default;
  • authorization headers;
  • large response bodies.
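
When headers do need to appear in a diagnostic log, one option is to mask the sensitive ones first. The sketch below is a minimal example under stated assumptions: `HeaderRedactor` is a hypothetical helper name, and the header set is a starting point rather than a complete list.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: masks sensitive header values before they reach a log line.
final class HeaderRedactor {

    // Lower-cased because HTTP header names are case-insensitive.
    private static final Set<String> SENSITIVE =
            Set.of("authorization", "cookie", "set-cookie", "x-api-key");

    private HeaderRedactor() {
    }

    /** Returns a copy of the headers with sensitive values replaced by a marker. */
    static Map<String, String> redact(Map<String, String> headers) {
        Map<String, String> safe = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : headers.entrySet()) {
            boolean sensitive = SENSITIVE.contains(e.getKey().toLowerCase(Locale.ROOT));
            safe.put(e.getKey(), sensitive ? "[REDACTED]" : e.getValue());
        }
        return safe;
    }
}
```

The input map is never modified, so the original headers stay available to the rest of the pipeline.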

Example log emission:

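import jakarta.annotation.Priority;
import jakarta.ws.rs.Path;
import jakarta.ws.rs.Priorities;
import jakarta.ws.rs.container.ContainerRequestContext;
import jakarta.ws.rs.container.ContainerRequestFilter;
import jakarta.ws.rs.container.ContainerResponseContext;
import jakarta.ws.rs.container.ContainerResponseFilter;
import jakarta.ws.rs.container.ResourceInfo;
import jakarta.ws.rs.core.Context;
import jakarta.ws.rs.ext.Provider;
import java.util.concurrent.TimeUnit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Provider
@Priority(Priorities.USER)
public final class RequestCompletionLoggingFilter implements ContainerRequestFilter, ContainerResponseFilter {

    // Assumes SLF4J, matching the MDC usage in the correlation ID filter.
    private static final Logger log = LoggerFactory.getLogger(RequestCompletionLoggingFilter.class);
    private static final String START_NANOS = "startNanos";

    @Context
    ResourceInfo resourceInfo;

    @Override
    public void filter(ContainerRequestContext requestContext) {
        requestContext.setProperty(START_NANOS, System.nanoTime());
    }

    @Override
    public void filter(ContainerRequestContext requestContext,
                       ContainerResponseContext responseContext) {
        // Post-matching request filters do not run for unmatched requests (for example 404),
        // but response filters still do, so the start time may be absent. Report -1 in that case.
        Object start = requestContext.getProperty(START_NANOS);
        long durationMs = start instanceof Long
                ? TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - (Long) start)
                : -1;

        String route = routeTemplate(resourceInfo);
        int status = responseContext.getStatus();

        log.info("event=http_request_completed method={} route={} status={} duration_ms={} outcome={}",
                requestContext.getMethod(),
                route,
                status,
                durationMs,
                classify(status));
    }

    // Joins class-level and method-level @Path values; sub-resource locators are not handled.
    private static String routeTemplate(ResourceInfo info) {
        if (info == null || info.getResourceMethod() == null) {
            return "unknown";
        }
        Path classPath = info.getResourceClass().getAnnotation(Path.class);
        Path methodPath = info.getResourceMethod().getAnnotation(Path.class);
        String a = classPath == null ? "" : classPath.value();
        String b = methodPath == null ? "" : methodPath.value();
        return ("/" + a + "/" + b).replaceAll("//+", "/");
    }

    // Maps the final status to a coarse, low-cardinality outcome label.
    private static String classify(int status) {
        if (status >= 500) return "server_error";
        if (status == 401) return "authn_failed";
        if (status == 403) return "authz_denied";
        if (status >= 400) return "client_error";
        return "success";
    }
}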

In real code, use a structured logging encoder (for example, a JSON encoder) instead of a key=value message string, so that each field is indexed separately.
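
To show what "machine-readable" means for the fields above, here is a dependency-free sketch that renders one JSON object per log line. `JsonLogLine` is a hypothetical name, and the sketch assumes finite numeric values. A production system would normally rely on the logging framework's JSON encoder rather than hand-rolled serialization.

```java
import java.util.Map;

// Hypothetical helper: renders log fields as one JSON object per line.
final class JsonLogLine {

    private JsonLogLine() {
    }

    /** Serializes fields in the map's iteration order; numbers and booleans stay unquoted. */
    static String render(Map<String, Object> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":");
            Object v = e.getValue();
            if (v instanceof Number || v instanceof Boolean) {
                sb.append(v);
            } else {
                sb.append('"').append(escape(String.valueOf(v))).append('"');
            }
        }
        return sb.append('}').toString();
    }

    /** Escapes quotes, backslashes, and control characters so the line stays valid JSON. */
    private static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

Passing a `LinkedHashMap` keeps the field order stable, which makes log lines easier to compare by eye.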


6. Metrics: What to Measure

Metrics should represent rate, latency, errors, and saturation.

6.1 HTTP Server Metrics

Metric | Type | Labels
http.server.requests.total | counter | method, route, status, outcome
http.server.request.duration | histogram/timer | method, route, status class
http.server.request.bytes | histogram | route, method
http.server.response.bytes | histogram | route, status class

Avoid high-cardinality labels:

  • user ID;
  • full path with IDs;
  • raw exception message;
  • token subject;
  • search query text.

Use route template, not concrete path.

Bad:

route=/cases/CASE-2026-000123

Good:

route=/cases/{caseId}
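
The effect of labeling by route template and status class can be sketched with a small in-memory counter. `RequestCounters` is a hypothetical class for illustration only; a real service would record these through its metrics library.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical in-memory counter; stands in for a real metrics registry.
final class RequestCounters {

    private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();

    /** Counts one request under method, route template, and status class (for example 4xx). */
    void record(String method, String routeTemplate, int status) {
        counters.computeIfAbsent(key(method, routeTemplate, status), k -> new LongAdder()).increment();
    }

    /** Returns the count for the series that the given status falls into. */
    long count(String method, String routeTemplate, int status) {
        LongAdder adder = counters.get(key(method, routeTemplate, status));
        return adder == null ? 0 : adder.sum();
    }

    /** Number of distinct label combinations, a rough proxy for series cardinality. */
    int seriesCount() {
        return counters.size();
    }

    private static String key(String method, String routeTemplate, int status) {
        return method + " " + routeTemplate + " " + (status / 100) + "xx";
    }
}
```

Recording a thousand different case IDs under `/cases/{caseId}` still produces one series per status class, whereas using the concrete path would produce a thousand.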

6.2 Security Metrics

Metric | Why it matters
authn.failures.total{reason} | Detect expired/invalid token spikes
authz.denials.total{permission} | Detect policy/config regressions
tenant.denials.total | Detect probing or bad client routing
public.endpoint.requests.total | Detect abuse on unauthenticated endpoints

6.3 Jersey Runtime Metrics

Metric | Why it matters
provider read/write duration | Serialization/deserialization bottleneck
exception mapper counts | Error taxonomy health
request filter duration | Expensive auth/audit/filter chain
resource method duration | Business endpoint latency
async timeout count | Saturation or downstream slowness
SSE open connections | Long-lived connection load

6.4 GlassFish / Infrastructure Metrics

Metric | Why it matters
HTTP thread pool active/queued | Thread starvation
JDBC pool active/available/wait time | DB bottleneck
JVM heap/non-heap | Memory pressure
GC pause | Latency spikes
CPU | Saturation
file descriptors | Connection/resource leak

7. Jersey Event Listener for Request Diagnostics

Jersey provides server-side monitoring extension points such as ApplicationEventListener and RequestEventListener. They let you observe lifecycle events without polluting every resource method.

Example:

@Provider
public final class RequestDiagnosticsListener implements ApplicationEventListener {

    @Override
    public void onEvent(ApplicationEvent event) {
        if (event.getType() == ApplicationEvent.Type.INITIALIZATION_FINISHED) {
            log.info("event=jersey_initialized resources={} providers={}",
                    event.getResourceConfig().getClasses().size(),
                    event.getProviders().size());
        }
    }

    @Override
    public RequestEventListener onRequest(RequestEvent requestEvent) {
        long start = System.nanoTime();
        return event -> {
            switch (event.getType()) {
                case RESOURCE_METHOD_START -> log.debug("event=resource_method_start method={}",
                        event.getUriInfo().getMatchedResourceMethod());
                case EXCEPTION_MAPPING_FINISHED -> log.debug("event=exception_mapping_finished");
                case FINISHED -> log.debug("event=request_finished duration_ms={}",
                        java.util.concurrent.TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
                default -> {
                    // Keep noise controlled.
                }
            }
        };
    }
}

Use event listeners for:

  • startup route/provider visibility;
  • request stage diagnostics;
  • measuring runtime phases;
  • exceptional event tracking;
  • diagnostic builds or feature-flagged observation.

Do not use event listeners for business logic.


8. Jersey Monitoring Statistics and JMX

Jersey has optional monitoring statistics. It can expose data such as request execution statistics, resource method statistics, response counts, and exception mapper execution counts. It can also expose statistics through JMX MBeans.

Enable statistics:

public class ApiApplication extends ResourceConfig {
    public ApiApplication() {
        property("jersey.config.server.monitoring.statistics.enabled", true);
    }
}

Enable MBeans:

public class ApiApplication extends ResourceConfig {
    public ApiApplication() {
        property("jersey.config.server.monitoring.statistics.mbeans.enabled", true);
    }
}

Use carefully:

  • statistics collection has overhead;
  • JMX exposure may reveal internal application structure;
  • enable only when operationally justified;
  • protect JMX access;
  • do not expose it publicly;
  • prefer controlled environment profiles.

A safe profile strategy:

Environment | Statistics | MBeans | Tracing
local/dev | on | optional | on demand/all
staging | on demand | protected | on demand
production default | off or minimal | protected/off | off/on demand only
incident window | temporarily on | protected | on demand

9. Jersey Tracing Support

Jersey tracing can show internal request processing stages, including matching, filters, interceptors, message body readers/writers, invocation, response filters, and exception handling.

Conceptual use:

public class ApiApplication extends ResourceConfig {
    public ApiApplication() {
        property("jersey.config.server.tracing.type", "ON_DEMAND");
        property("jersey.config.server.tracing.threshold", "SUMMARY");
    }
}

Then a diagnostic request can opt in:

curl -i \
  -H 'X-Jersey-Tracing-Accept: true' \
  -H 'X-Jersey-Tracing-Threshold: SUMMARY' \
  http://localhost:8080/api/cases/123

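Do not set tracing to ALL in production by default. Jersey returns tracing output to the caller in X-Jersey-Tracing-nnn response headers, so ALL sends internal processing detail to every client, and verbose thresholds can expose sensitive detail depending on mode/configuration.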

Use tracing when investigating:

  • 404 route mismatch;
  • 406/415 negotiation failures;
  • provider selection problems;
  • filter priority/order issues;
  • exception mapper behavior;
  • unexpectedly slow Jersey pipeline stage.

10. OpenTelemetry Boundary Model

Jersey tracing is Jersey-specific diagnostics. Distributed tracing usually means OpenTelemetry or vendor-compatible tracing.

Trace spans should represent causal work.

Practical rules:

  • name server span using route template, not full path;
  • attach status/error attributes safely;
  • do not attach raw tokens or PII;
  • propagate trace context to outbound Jersey Client;
  • sample intelligently;
  • keep high-cardinality values out of span names.

Outbound Jersey Client filter for correlation/trace propagation:

@Provider
public final class OutboundCorrelationFilter implements ClientRequestFilter {

    @Override
    public void filter(ClientRequestContext requestContext) {
        String correlationId = org.slf4j.MDC.get("correlationId");
        if (correlationId != null) {
            requestContext.getHeaders().putSingle("X-Correlation-Id", correlationId);
        }
    }
}

Register with client:

Client client = ClientBuilder.newBuilder()
        .register(OutboundCorrelationFilter.class)
        .build();

In production, prefer OpenTelemetry instrumentation when available, and use Jersey filters/listeners to fill framework-specific gaps.


11. Exception Observability

From Part 009, exception mapping is part of the contract. Observability must distinguish:

  • expected domain denial;
  • validation failure;
  • not found;
  • downstream timeout;
  • database outage;
  • serialization failure;
  • unknown bug.

Exception mapper should emit metrics and logs without leaking sensitive content.

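import jakarta.ws.rs.WebApplicationException;
import jakarta.ws.rs.core.MediaType;
import jakarta.ws.rs.core.Response;
import jakarta.ws.rs.ext.ExceptionMapper;
import jakarta.ws.rs.ext.Provider;
import java.util.UUID;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Provider
public final class UnhandledExceptionMapper implements ExceptionMapper<Throwable> {

    // Assumes SLF4J; "metrics" and ErrorBody stand for the application's own registry and error DTO.
    private static final Logger log = LoggerFactory.getLogger(UnhandledExceptionMapper.class);

    @Override
    public Response toResponse(Throwable exception) {
        // A Throwable mapper also receives WebApplicationException (404, 405, 415, ...).
        // Pass its response through so client errors are not rewritten as 500.
        if (exception instanceof WebApplicationException) {
            return ((WebApplicationException) exception).getResponse();
        }

        String errorId = UUID.randomUUID().toString();

        log.error("event=unhandled_exception error_id={} exception_type={}",
                errorId,
                exception.getClass().getName(),
                exception);

        metrics.counter("http.server.exceptions", "type", exception.getClass().getSimpleName())
                .increment();

        return Response.status(Response.Status.INTERNAL_SERVER_ERROR)
                .type(MediaType.APPLICATION_JSON_TYPE)
                .entity(new ErrorBody("INTERNAL_ERROR", "Unexpected error", errorId))
                .build();
    }
}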

Classify exception families:

Family | Client status | Log level | Alert?
Validation | 400 | INFO/DEBUG | Usually no
Unauthenticated | 401 | INFO | Spike alert
Forbidden | 403 | INFO/WARN | Spike alert
Not found | 404 | DEBUG/INFO | Abuse detection only
Conflict | 409 | INFO | Business trend
Downstream timeout | 504/503 | WARN/ERROR | Yes
DB unavailable | 503/500 | ERROR | Yes
Serialization bug | 500 | ERROR | Yes
Unknown exception | 500 | ERROR | Yes
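
A family table like this can be encoded once and shared by mappers, logs, and metrics. The sketch below covers only three families and uses JDK exception types as stand-ins; the enum name `ErrorFamily` and the exception-to-family mapping are assumptions for illustration, and a real application would map its own domain exceptions.

```java
import java.net.SocketTimeoutException;
import java.sql.SQLNonTransientConnectionException;
import java.sql.SQLTransientConnectionException;
import java.util.concurrent.TimeoutException;

// Hypothetical taxonomy; the exception-to-family mapping is an assumption for illustration.
enum ErrorFamily {
    DOWNSTREAM_TIMEOUT(504, true),
    DB_UNAVAILABLE(503, true),
    UNKNOWN(500, true);

    final int status;
    final boolean alert;

    ErrorFamily(int status, boolean alert) {
        this.status = status;
        this.alert = alert;
    }

    /** Walks the cause chain and returns the first family that matches. */
    static ErrorFamily of(Throwable error) {
        for (Throwable t = error; t != null; t = t.getCause()) {
            if (t instanceof SocketTimeoutException || t instanceof TimeoutException) {
                return DOWNSTREAM_TIMEOUT;
            }
            if (t instanceof SQLTransientConnectionException
                    || t instanceof SQLNonTransientConnectionException) {
                return DB_UNAVAILABLE;
            }
        }
        return UNKNOWN;
    }
}
```

Walking the cause chain matters because frameworks often wrap the root failure in a generic runtime exception.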

12. Access Log vs Application Log

Access logs and application logs are not substitutes.

Log | Produced by | Contains | Good for
Access log | GlassFish/proxy/LB | method, path, status, bytes, duration | traffic, status trends, basic latency
Application log | Jersey/app code | route, caller, decision, error code, domain state | debugging and causality
Audit log | business/security subsystem | who did what to what, decision, policy | compliance and investigation

Example event split:

access log:
POST /api/tenants/T1/cases/C1/approve 403 21ms

application log:
event=http_request_completed route=/tenants/{tenantId}/cases/{caseId}/approve status=403 outcome=authz_denied permission=case.approve correlation_id=...

audit log:
event=authorization_decision caller=alice action=case.approve tenant=T1 resource=C1 decision=DENY reason=SOD_VIOLATION policy=approval:v7

Do not turn application log into audit log by accident. Audit has stronger retention, integrity, and access-control requirements.


13. GlassFish Diagnostics

When symptom points below Jersey, use GlassFish/JVM diagnostics.

13.1 Server Log

Look for:

  • deployment errors;
  • classloading conflict;
  • CDI/HK2 injection failure;
  • resource pool errors;
  • thread pool warnings;
  • SSL/TLS listener problems;
  • application initialization failures.

13.2 Thread Dump

Use when:

  • latency spikes;
  • requests hang;
  • CPU high;
  • no response but process alive;
  • suspected deadlock;
  • pool exhaustion.

Thread dump questions:

  • Are HTTP threads blocked on DB pool?
  • Are threads waiting on external HTTP client without timeout?
  • Are many threads writing to slow clients?
  • Is there lock contention in singleton resource/filter/provider?
  • Are async executor threads exhausted?

13.3 Heap Dump

Use when:

  • memory leak suspected;
  • OutOfMemoryError;
  • increasing old-gen after full GC;
  • unbounded buffering;
  • leaked Response/streams;
  • SSE clients retained after disconnect.

13.4 JMX

Use for:

  • pool metrics;
  • thread state;
  • Jersey monitoring MBeans if enabled;
  • JVM memory/GC;
  • operational introspection.

Protect JMX. Treat it as privileged runtime control plane.


14. Diagnostic Decision Tree

Do not start with random code inspection. Start with symptom classification.
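
Symptom classification can be sketched as a lookup from HTTP status to the first hypotheses worth checking. The class name `SymptomTriage` is hypothetical, and the hints are condensed from this lesson's tracing and playbook sections rather than being an exhaustive tree.

```java
// Hypothetical triage table; hints are condensed from this lesson, not an exhaustive decision tree.
final class SymptomTriage {

    private SymptomTriage() {
    }

    /** Maps an HTTP status to the first hypotheses worth checking. */
    static String firstChecks(int status) {
        switch (status) {
            case 401:
                return "token validity, issuer/audience config, identity provider reachability";
            case 403:
                return "missing role or permission, tenant mismatch";
            case 404:
                return "route mismatch, context root or application path";
            case 406:
                return "client Accept header, @Produces, MessageBodyWriter";
            case 415:
                return "client Content-Type, @Consumes, MessageBodyReader";
            default:
                if (status >= 500) {
                    return "exception mapper logs, provider or serialization failure, downstream dependency";
                }
                return "no specific entry; start from the request-completed log";
        }
    }
}
```

Keeping the mapping in one place makes it easy to turn into a runbook page or an on-call chat command later.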


15. Latency Decomposition

For slow endpoint, break time into layers:

total latency
  = network ingress
  + servlet/container queue
  + Jersey filters
  + resource matching
  + request body read/deserialization
  + resource method
      + domain policy
      + database
      + outbound calls
  + response filters
  + serialization
  + network egress

Timer points:

Segment | Instrument with
whole request | server filter/timer
auth filter | filter-specific timer
body read/write | interceptor/provider metrics if needed
resource method | Jersey event listener or AOP/interceptor
DB | datasource/persistence metrics
outbound HTTP | Jersey Client filter + connector metrics
async wait | async response timeout/wait metrics

If you only measure resource method time, you miss serialization and filters. If you only measure access log time, you cannot localize cause.
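
The decomposition above can be made concrete with a small per-request accumulator that also reports how much time no timer covered. `LatencyBreakdown` is a hypothetical class and the segment names are free-form labels. It is not thread-safe and assumes one instance per request.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-request breakdown; not thread-safe, one instance per request is assumed.
final class LatencyBreakdown {

    private final Map<String, Long> segmentMillis = new LinkedHashMap<>();

    /** Adds time to a segment; repeated calls for the same segment accumulate. */
    void add(String segment, long millis) {
        segmentMillis.merge(segment, millis, Long::sum);
    }

    /** Sum of all measured segments. */
    long measured() {
        long sum = 0;
        for (long v : segmentMillis.values()) {
            sum += v;
        }
        return sum;
    }

    /** Time not covered by any segment timer, never negative. */
    long unaccounted(long totalMillis) {
        return Math.max(0, totalMillis - measured());
    }
}
```

A large unaccounted share is itself a signal: it usually points at a layer that has no timer yet, such as container queueing or serialization.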


16. Provider and Serialization Diagnostics

Serialization can dominate latency and memory.

Symptoms:

  • 2xx slow but DB fast;
  • high CPU on response-heavy endpoints;
  • large heap allocation;
  • MessageBodyWriter not found;
  • 500 after resource method returns successfully;
  • response filter sees OK, but client gets 500 due to write failure.

Diagnostics:

  • enable Jersey tracing on demand;
  • log selected media type and route;
  • check registered providers at startup;
  • validate JSON provider dependency;
  • measure payload size;
  • inspect object graph for lazy-loaded entities;
  • avoid returning JPA entities directly;
  • avoid buffering large response into memory.

Startup provider registry log:

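import jakarta.ws.rs.ext.Provider;
import org.glassfish.jersey.server.monitoring.ApplicationEvent;
import org.glassfish.jersey.server.monitoring.ApplicationEventListener;
import org.glassfish.jersey.server.monitoring.RequestEvent;
import org.glassfish.jersey.server.monitoring.RequestEventListener;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Provider
public final class StartupInventoryListener implements ApplicationEventListener {

    // Assumes SLF4J as the logging API.
    private static final Logger log = LoggerFactory.getLogger(StartupInventoryListener.class);

    @Override
    public void onEvent(ApplicationEvent event) {
        if (event.getType() == ApplicationEvent.Type.INITIALIZATION_FINISHED) {
            // getProviders() returns provider classes, so log getName() directly;
            // calling getClass() on each element would print java.lang.Class every time.
            event.getProviders().forEach(provider ->
                    log.info("event=jersey_provider_registered provider={}", provider.getName())
            );
        }
    }

    @Override
    public RequestEventListener onRequest(RequestEvent requestEvent) {
        // No per-request listening is needed for a startup inventory.
        return null;
    }
}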

Use this carefully; provider inventory can be noisy.


17. Filter and Interceptor Diagnostics

Filter order bugs are subtle.

Instrumentation fields:

  • filter class;
  • priority;
  • phase: pre-match, request, response, reader, writer;
  • duration;
  • abort decision;
  • reason code;
  • route if available.

Example targeted timing wrapper:

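import jakarta.ws.rs.container.ContainerRequestContext;
import jakarta.ws.rs.container.ContainerRequestFilter;
import java.io.IOException;
import java.util.concurrent.TimeUnit;

public final class TimedAuthFilter implements ContainerRequestFilter {

    private final ContainerRequestFilter delegate;

    // The final field needs a constructor; the wrapped filter is supplied at registration time.
    public TimedAuthFilter(ContainerRequestFilter delegate) {
        this.delegate = delegate;
    }

    @Override
    public void filter(ContainerRequestContext requestContext) throws IOException {
        long start = System.nanoTime();
        try {
            delegate.filter(requestContext);
        } finally {
            // Recorded in finally so aborts and exceptions are timed too.
            // "metrics" stands for the application's own metrics registry.
            long durationMicros = TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - start);
            metrics.timer("jersey.filter.duration", "filter", delegate.getClass().getSimpleName())
                    .record(durationMicros, TimeUnit.MICROSECONDS);
        }
    }
}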

Use on high-risk filters:

  • authentication;
  • authorization;
  • audit;
  • request body logging/sanitization;
  • compression;
  • encryption/decryption;
  • tenancy resolution;
  • external policy lookup.

18. Outbound Jersey Client Observability

Server endpoints often look slow because outbound calls are slow.

Client filter:

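import jakarta.ws.rs.client.ClientRequestContext;
import jakarta.ws.rs.client.ClientRequestFilter;
import jakarta.ws.rs.client.ClientResponseContext;
import jakarta.ws.rs.client.ClientResponseFilter;
import java.util.concurrent.TimeUnit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public final class ClientTimingFilter implements ClientRequestFilter, ClientResponseFilter {

    // Assumes SLF4J as the logging API.
    private static final Logger log = LoggerFactory.getLogger(ClientTimingFilter.class);
    private static final String START = "startNanos";

    @Override
    public void filter(ClientRequestContext requestContext) {
        requestContext.setProperty(START, System.nanoTime());
    }

    // Runs only when a response arrives; connect failures and timeouts surface as
    // exceptions at the call site and must be counted there.
    @Override
    public void filter(ClientRequestContext requestContext,
                       ClientResponseContext responseContext) {
        Object start = requestContext.getProperty(START);
        long durationMs = start instanceof Long
                ? TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - (Long) start)
                : -1;

        log.info("event=outbound_http_completed method={} host={} status={} duration_ms={}",
                requestContext.getMethod(),
                requestContext.getUri().getHost(),
                responseContext.getStatus(),
                durationMs);
    }
}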

Failure metrics:

Metric | Meaning
outbound.requests.total{target,status} | Dependency behavior
outbound.duration{target} | Dependency latency
outbound.timeouts.total{target,type} | Timeout pressure
outbound.connection.errors.total{target} | Network/TLS/DNS/connect issues
outbound.retries.total{target} | Retry amplification risk

Never label metrics by full URL with IDs/query strings.
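
One simple way to get a safe `target` label is to keep only the host of the outbound URI. The helper below is a minimal sketch; `TargetLabel` is a hypothetical name, and mapping hosts to logical dependency names would be a reasonable refinement.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical helper: derives a low-cardinality "target" label from an outbound URI.
final class TargetLabel {

    private TargetLabel() {
    }

    /** Keeps only the host; path segments, IDs, and query strings never reach the label. */
    static String of(URI uri) {
        String host = uri.getHost();
        return host == null ? "unknown" : host.toLowerCase(Locale.ROOT);
    }
}
```

Relative URIs have no host, so they fall back to a fixed `unknown` label instead of leaking the path.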


19. Health, Readiness, and Liveness

Do not build one /health endpoint that checks everything.

Endpoint | Should check | Should not check
liveness | process/event loop not dead | database, external services
readiness | can serve traffic | expensive deep dependency scan every request
startup | migration/config/bootstrap complete | transient downstream health if not required

Example:

@Path("/health")
public class HealthResource {

    @GET
    @Path("/live")
    public Response live() {
        return Response.ok(Map.of("status", "UP")).build();
    }

    @GET
    @Path("/ready")
    public Response ready() {
        Readiness readiness = readinessService.check();
        return Response.status(readiness.up() ? 200 : 503)
                .entity(readiness)
                .build();
    }
}

Readiness should be quick, bounded by timeout, and cacheable briefly if checks are expensive.
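
A bounded, briefly cached readiness check could look like the sketch below. `CachedReadiness` is a hypothetical wrapper, and the injected clock exists only to make the caching behavior testable. Note that a probe which times out keeps running on the common pool; the sketch only stops waiting for it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical wrapper: bounds a readiness probe by a timeout and caches the result briefly.
final class CachedReadiness {

    private final BooleanSupplier probe;
    private final LongSupplier clockMillis;
    private final long ttlMillis;
    private final long timeoutMillis;

    private boolean hasResult;
    private boolean lastResult;
    private long lastCheckedAt;

    CachedReadiness(BooleanSupplier probe, LongSupplier clockMillis, long ttlMillis, long timeoutMillis) {
        this.probe = probe;
        this.clockMillis = clockMillis;
        this.ttlMillis = ttlMillis;
        this.timeoutMillis = timeoutMillis;
    }

    /** Returns the cached result while it is fresh; otherwise runs the probe with a deadline. */
    synchronized boolean isReady() {
        long now = clockMillis.getAsLong();
        if (hasResult && now - lastCheckedAt < ttlMillis) {
            return lastResult;
        }
        lastResult = runProbe();
        lastCheckedAt = now;
        hasResult = true;
        return lastResult;
    }

    /** Timeout, interruption, or probe failure all count as "not ready". */
    private boolean runProbe() {
        try {
            return CompletableFuture.supplyAsync(probe::getAsBoolean)
                    .get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException | TimeoutException e) {
            return false;
        }
    }
}
```

With a TTL of a few seconds, frequent orchestrator polls hit the cache and the dependency is probed at most once per TTL window.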

Do not include secrets, internal URLs, or detailed exception stack traces in public health responses.


20. Alerting Strategy

Metrics without alerts are dashboards. Alerts should map to user impact or imminent failure.

Good alerts:

Alert | Signal
API availability below SLO | 5xx/total ratio over window
Latency SLO burn | p95/p99 duration over threshold
Auth failure spike | 401 reason spike after deploy or attack
Authorization denial spike | 403 permission-specific anomaly
JDBC pool exhaustion | active near max + wait time
HTTP thread starvation | queue growing + active maxed
Downstream timeout spike | outbound timeout counter
Async timeout spike | suspended responses timing out
OOM risk | heap high after full GC

Bad alerts:

  • every single 500 as page;
  • CPU > 80% without user impact;
  • log string matching without structure;
  • alerts on expected 404s;
  • dashboards nobody owns.

21. Observability and Privacy

Observability can become data leakage.

Rules:

  • Redact Authorization, Cookie, Set-Cookie.
  • Do not log raw request/response bodies by default.
  • Hash or pseudonymize principal where possible.
  • Avoid storing sensitive path/query values in labels.
  • Separate audit logs from operational logs.
  • Apply retention and access control.
  • Make debug/tracing modes time-limited.
  • Review Jersey tracing headers before enabling outside dev.

Sensitive fields checklist:

Authorization
Cookie
Set-Cookie
X-Api-Key
password
access_token
refresh_token
id_token
ssn
nik
email
phone
address
bank_account
case narrative
medical/legal notes

For regulatory workflows, assume case narrative and attachments are confidential.
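
Pseudonymizing the principal can be sketched with a keyed hash, so the same caller maps to the same opaque value without the identity appearing in logs. `PrincipalPseudonymizer` is a hypothetical helper; the choice of HMAC-SHA256 and the 16-character truncation are assumptions, and the secret must be managed like any other credential.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper: keyed hash so the same principal always maps to the same opaque value.
final class PrincipalPseudonymizer {

    private final SecretKeySpec key;

    PrincipalPseudonymizer(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    /** Returns the first 16 hex characters of HMAC-SHA256(secret, principal). */
    String pseudonym(String principal) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            byte[] digest = mac.doFinal(principal.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                hex.append(String.format("%02x", digest[i]));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }
}
```

A keyed hash is preferred over a plain hash here because low-entropy identifiers such as usernames or emails can otherwise be recovered by guessing.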


22. Startup Observability

A production service should explain its runtime shape at startup.

Recommended startup logs:

  • application name/version/git commit;
  • Java version;
  • Jakarta/Jersey/GlassFish baseline if available;
  • active environment/profile;
  • context root/application path;
  • registered resources count;
  • provider count;
  • important feature flags;
  • configured outbound dependencies names, not secrets;
  • config validation result.

Example:

public final class StartupLogFeature implements Feature {

    @Override
    public boolean configure(FeatureContext context) {
        log.info("event=api_starting app={} version={} java={}",
                "case-api",
                BuildInfo.version(),
                Runtime.version());
        return true;
    }
}

Startup observability helps diagnose:

  • wrong artifact deployed;
  • wrong profile;
  • missing provider;
  • classpath conflict;
  • disabled security filter;
  • accidental mock dependency in production.

23. Failure Playbooks

23.1 415 Unsupported Media Type

Check:

  1. Request has correct Content-Type.
  2. Resource method has matching @Consumes.
  3. MessageBodyReader exists for entity type + media type.
  4. Provider dependency is packaged correctly.
  5. No version conflict in classpath.
  6. Jersey tracing shows MBR selection/skip reason.

23.2 406 Not Acceptable

Check:

  1. Client Accept header.
  2. Resource method @Produces.
  3. MessageBodyWriter for response type + media type.
  4. Error response media type if exception path.
  5. Provider registration/order.

23.3 Random 500 After Successful Domain Operation

Hypothesis:

  • response serialization failed;
  • lazy entity loaded outside transaction;
  • response filter threw;
  • MessageBodyWriter failed;
  • client disconnected mid-write.

Check:

  • exception mapper logs;
  • Jersey tracing around MBW;
  • application log before/after resource return;
  • stack trace root cause;
  • payload size.

23.4 Slow API with Low CPU

Hypothesis:

  • waiting on DB pool;
  • waiting on outbound HTTP without timeout;
  • thread pool queue;
  • slow client streaming;
  • lock contention.

Check:

  • thread dump;
  • JDBC pool metrics;
  • outbound timeout metrics;
  • HTTP thread pool active/queued;
  • p95 by route;
  • trace spans.

23.5 High 401 After Deployment

Hypothesis:

  • issuer/audience config changed;
  • clock skew;
  • JWKS fetch failure;
  • gateway forwarding changed;
  • token type mismatch;
  • environment pointing to wrong identity provider.

Check:

  • auth failure reason metric;
  • deployment diff;
  • config startup log;
  • JWKS cache logs;
  • sample sanitized token claims in non-production only.

24. Minimal Observability Implementation Plan

For a new Jersey/GlassFish API, implement in this order:

  1. Correlation ID filter.
  2. Structured request-completed log.
  3. Error ID in exception mapper.
  4. Metrics for request count/latency/status.
  5. Auth failure and authz denial metrics.
  6. Outbound Jersey Client timing filter.
  7. Health readiness/liveness endpoints.
  8. Startup inventory log.
  9. On-demand Jersey tracing for non-production and incident windows.
  10. Thread dump and heap dump runbook.

Do not start with fancy dashboards before the signal contract is stable.


25. Common Anti-Patterns

Anti-pattern | Why it fails | Better pattern
Logging raw request body globally | PII/secrets leak, huge overhead | Targeted sanitized debug logging
Metrics labeled by user ID | Cardinality explosion | Route/status/outcome labels
Full path as metric label | Cardinality explosion | Route template
One /health checks everything | Causes false restarts/outages | Separate live/ready/startup
No correlation ID | Hard to connect logs | Generate/propagate request ID
Only access logs | No application causality | Access + app + audit logs
Tracing ALL in production | Overhead and data exposure | On-demand/time-boxed tracing
Catch-all 500 without error ID | Impossible support workflow | Error ID + internal log
No outbound metrics | Blame wrong layer | Client timing + timeout metrics
No thread dump runbook | Blind during incident | Predefined diagnostic commands/process

26. Practice Lab

Build a small observability package:

com.example.api.observability
  CorrelationIdFilter
  RequestCompletionLoggingFilter
  JerseyDiagnosticsListener
  OutboundCorrelationFilter
  OutboundTimingFilter
  HealthResource
  ErrorIdExceptionMapper

Exercise:

  1. Call a successful endpoint and verify correlation ID in response/log.
  2. Trigger validation failure and verify outcome=client_error.
  3. Trigger unauthorized and verify outcome=authn_failed.
  4. Trigger forbidden and verify audit decision.
  5. Trigger MessageBodyWriter error and verify error ID.
  6. Enable Jersey tracing on demand and inspect provider/matching stages.
  7. Simulate slow downstream and verify outbound timing metric.
  8. Take a thread dump during slow request and identify blocking point.

The goal is not to build a monitoring platform. The goal is to make runtime behavior legible.


27. Key Takeaways

  • Observability is a design constraint, not an afterthought.
  • Jersey gives useful runtime hooks: filters, interceptors, event listeners, monitoring statistics, MBeans, and tracing.
  • Logs, metrics, traces, and diagnostics answer different questions; none replaces the others.
  • Use route templates and outcome classification to avoid cardinality explosion.
  • Treat security, tenant, and audit signals as first-class observability dimensions.
  • Turn on expensive diagnostics intentionally and protect access to internal runtime data.
  • The best production engineers can move from HTTP symptom to runtime hypothesis quickly and prove or disprove it with signals.
