Part 026 — Performance Engineering: Reflection, Serialization, Allocation, Threading, Virtual Threads, and Benchmarking

1. Learning Objective

This part covers Jakarta REST performance systemically. The goal is not to memorize micro-optimization tricks but to build a cost model, so that we can answer questions such as:

Is the request slow because of routing, serialization, the database, the network, or a downstream call?
Does async help, or does it only move the bottleneck?
Will virtual threads improve throughput?
Is JSON serialization a hotspot?
Does tail latency come from a connection pool, GC, lock contention, or a retry storm?
How do we test a REST endpoint correctly?

End goal:

Be able to view a Jakarta REST endpoint as a resource-consumption pipeline: CPU, memory allocation, threads, IO, connections, queues, locks, downstream dependencies, and serialization boundaries.

2. Kaufman Deconstruction

We break REST performance down into the following sub-skills.

Sub-skill	Core question	Practical output
Cost model	What does a single request cost?	Can find the bottleneck without guessing
Measurement	Which metrics are valid?	Can distinguish latency, throughput, and saturation
Serialization	What does JSON/body mapping cost?	DTOs and providers are not wasteful
Allocation	Does the endpoint create too many objects?	GC pressure stays under control
Threading	Is the request blocking or non-blocking?	Thread pools do not run out
Connection pools	Which pool is the bottleneck?	DB/HTTP clients are not starved
Virtual threads	When do they help, and when not?	No overclaiming of runtime features
Benchmarking	What does a correct load test look like?	Results can be trusted
Tail latency	Why is p99 bad when the average looks good?	System is stable in production

3. Performance Mental Model

A single Jakarta REST request passes through several stages.

Potential cost at each stage:

Stage	Cost type
HTTP parse	CPU, allocation
Resource matching	CPU, route table lookup
Filters	CPU, IO if badly designed
Parameter conversion	CPU, validation, error handling
Body deserialization	CPU, allocation, reflection/codegen
Resource/service	Business logic, locks, DB, external calls
DTO mapping	CPU, allocation
Serialization	CPU, allocation, buffering
Response write	network IO, client speed

Most REST performance problems are not caused by @GET or @Path. They are caused by expensive work hidden behind a clean endpoint.

4. Measure Before Optimizing

Do not start with tuning.

Start with four questions:

What is the endpoint SLO?
What is the traffic shape?
Where is time spent?
Which resource saturates first?

Example SLO:

GET /cases/{caseId}
  p50 < 50 ms
  p95 < 200 ms
  p99 < 500 ms
  error rate < 0.1%

Traffic shape:

Peak: 300 RPS
Payload: median 4 KB, p99 80 KB
Dependencies: DB + document metadata service
Concurrency: 500 active requests

Without SLO and traffic shape, “fast” has no meaning.

5. Latency vs Throughput vs Saturation

Definitions:

Term	Meaning
Latency	Time for one request to complete
Throughput	Requests per second completed
Concurrency	Requests in flight
Saturation	Resource utilization near capacity
Tail latency	High percentile latency, e.g. p95/p99
Queueing	Waiting before actual work starts

Little’s Law is useful:

concurrency ≈ throughput × latency

If endpoint handles 200 RPS with 500 ms average latency:

concurrency ≈ 200 × 0.5 = 100 in-flight requests

If latency jumps to 2 seconds under downstream slowness:

concurrency ≈ 200 × 2 = 400 in-flight requests

That can exhaust request threads, DB pool, or HTTP client pool.
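The arithmetic above can be sketched as a small helper. The class and method names and the 200-thread pool size are illustrative assumptions, not part of any Jakarta API.

```java
// Illustrative sketch: Little's Law as an in-flight request estimate.
final class LittlesLaw {
    private LittlesLaw() {}

    /** concurrency ~= throughput (requests/second) x latency (seconds). */
    static double inFlight(double requestsPerSecond, double latencySeconds) {
        return requestsPerSecond * latencySeconds;
    }

    /** True when the estimated in-flight count exceeds the given pool size. */
    static boolean exceedsPool(double requestsPerSecond, double latencySeconds, int poolSize) {
        return inFlight(requestsPerSecond, latencySeconds) > poolSize;
    }

    public static void main(String[] args) {
        System.out.println(inFlight(200, 0.5));         // 100.0 (normal latency)
        System.out.println(inFlight(200, 2.0));         // 400.0 (downstream slowness)
        System.out.println(exceedsPool(200, 2.0, 200)); // true  (hypothetical 200-thread pool)
    }
}
```

The estimate is only as good as its inputs: it assumes a steady state, so use measured throughput and latency from the same time window.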

6. Average Latency Is Misleading

Average hides pain.

Example:

Percentile	Latency
p50	40 ms
p90	120 ms
p95	300 ms
p99	2,500 ms

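Average may look acceptable, but p99 indicates some users hit severe delay. In distributed systems, tail latency compounds. If one user action calls 5 backend APIs and each call independently exceeds its own p99 1% of the time, roughly 5% of user actions hit at least one slow call, so the user-facing tail is much worse than any single endpoint suggests.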
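A minimal sketch of the compounding effect, assuming the backend calls are independent (real calls often share causes of slowness, so treat this as a rough model). The class and method names are illustrative.

```java
// Illustrative sketch: chance that a fan-out request hits at least one slow call.
final class TailCompounding {
    private TailCompounding() {}

    /**
     * Probability that at least one of `calls` independent calls is slow,
     * where each call is slow with probability `tailProbability`.
     */
    static double atLeastOneSlow(int calls, double tailProbability) {
        return 1.0 - Math.pow(1.0 - tailProbability, calls);
    }

    public static void main(String[] args) {
        // 1% tail probability corresponds to "slower than p99".
        System.out.println("5 calls:  " + atLeastOneSlow(5, 0.01));
        System.out.println("20 calls: " + atLeastOneSlow(20, 0.01));
    }
}
```

With a 1% tail, 5 calls give a little under 5% and 20 calls give about 18%, which is why fan-out endpoints need tighter per-call percentiles than their own SLO.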

Always capture:

p50,
p90,
p95,
p99,
max only for debugging, not SLO,
error rate,
saturation indicators.
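To make the difference between the average and the percentiles concrete, here is a small sketch that uses the nearest-rank percentile definition. The sample data is made up, and production systems normally rely on a metrics library with histograms rather than sorting raw samples.

```java
import java.util.Arrays;

// Illustrative sketch: nearest-rank percentiles over raw latency samples.
final class Percentiles {
    private Percentiles() {}

    /** Nearest-rank percentile: the smallest sample with at least p% of samples at or below it. */
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    static double mean(long[] latenciesMs) {
        return Arrays.stream(latenciesMs).average().orElseThrow();
    }

    public static void main(String[] args) {
        // 98 fast requests and 2 very slow ones (made-up data).
        long[] samples = new long[100];
        Arrays.fill(samples, 40);
        samples[98] = 2500;
        samples[99] = 2500;
        System.out.println("mean=" + mean(samples));
        System.out.println("p50=" + percentile(samples, 50)); // p50=40
        System.out.println("p99=" + percentile(samples, 99)); // p99=2500
    }
}
```

The mean lands under 100 ms here even though two requests took 2.5 seconds, which is the effect the table above describes.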

7. Jakarta REST Runtime Overhead

Jakarta REST runtime overhead usually includes:

route matching,
annotation metadata lookup,
provider selection,
parameter injection,
filters/interceptors,
exception mapper resolution,
entity provider invocation.

For most business APIs, this overhead is smaller than DB/external IO and serialization. But it can matter for:

extremely high RPS small payload endpoints,
gateway-like services,
health/metrics endpoints under heavy scraping,
event ingestion APIs,
native-image/build-time optimized runtimes,
environments with strict cold start.

Do not optimize route matching if 95% of latency is database query time.

8. Serialization Cost

JSON serialization/deserialization is often a major CPU/allocation cost.

Cost drivers:

payload size,
object graph depth,
reflection/introspection,
date/time formatting,
enum conversion,
polymorphism,
null handling,
unknown field handling,
custom serializers/adapters,
records vs mutable classes,
buffering strategy.

Example problematic DTO:

public class CaseDetailResponse {
    public String caseId;
    public List<EvidenceResponse> evidence;
    public List<AuditEventResponse> fullAuditTrail;
    public Map<String, Object> dynamicAttributes;
    public Object rawWorkflowContext;
}

Problems:

large nested collections,
dynamic map defeats type discipline,
raw workflow context may be huge,
audit trail may not belong in case summary,
serialization cost unpredictable.

Better:

public record CaseDetailResponse(
        String caseId,
        String status,
        String assignedTeam,
        OffsetDateTime updatedAt,
        List<LinkResponse> links
) {}

Then expose heavy subresources separately:

GET /cases/{caseId}/evidence
GET /cases/{caseId}/audit-events
GET /cases/{caseId}/decisions

Performance and contract design are connected.

9. Avoid Entity Exposure

Returning persistence entities directly is bad for API contract and performance.

Bad:

@GET
@Path("/{id}")
public CaseEntity getCase(@PathParam("id") String id) {
    return repository.find(id);
}

Performance risks:

lazy-loading during serialization,
N+1 queries hidden in JSON writer,
circular references,
huge object graph,
accidental fields serialized,
transaction/session boundary leak.

Better:

@GET
@Path("/{id}")
public CaseResponse getCase(@PathParam("id") String id) {
    CaseView view = caseQueryService.getCaseView(id);
    return CaseResponse.from(view);
}

DTO is not just design purity. It is performance control.

10. Allocation and GC Pressure

Every request allocates objects:

request context,
parameter values,
DTOs,
JSON parser/writer objects,
collections,
log strings,
exceptions,
optional wrappers,
stream buffers.

Allocation is not always bad in modern JVMs, but excessive allocation increases GC pressure and tail latency.

Symptoms:

p99 spikes during GC,
high allocation rate per request,
CPU high even when DB is idle,
memory pressure under burst traffic,
large temporary byte arrays/strings.

Common causes:

converting body to String unnecessarily,
reading entire upload into memory,
logging full request/response body,
building large intermediate maps,
stream().map(...).collect(...) chains over huge result sets without bounds,
serializing large nested object graphs,
exception-driven control flow.

11. Streaming vs Buffering

Buffering is simpler but can be expensive.

Examples:

byte[] file = service.loadFile(id);
return Response.ok(file).build();

This loads whole file into memory.

Better for large payload:

@GET
@Path("/{id}/content")
public Response download(@PathParam("id") String id) {
    StreamingOutput stream = output -> {
        documentService.copyContentTo(id, output);
    };

    return Response.ok(stream)
            .type("application/pdf")
            .header("Content-Disposition", "attachment; filename=\"document.pdf\"")
            .build();
}

But streaming has its own constraints:

error after partial response is hard to report as JSON,
client speed affects write duration,
output stream errors must be logged,
resource cleanup must be reliable,
timeout behavior must be tested.

Use streaming for large payloads; use small DTOs for normal JSON.
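The `copyContentTo` call in the example above is not shown in the text. One plausible shape for it is a chunked copy through a single fixed-size buffer, sketched below with plain `java.io` streams. The class name, buffer size, and error handling are assumptions. `InputStream.transferTo` does the same job on Java 9 and later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Illustrative sketch: copy a stream in fixed-size chunks instead of loading it whole.
final class ChunkedCopy {
    private ChunkedCopy() {}

    /** Copies `in` to `out` through one reusable buffer and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out, int bufferSize) {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        try {
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
            out.flush();
        } catch (IOException e) {
            // After the first bytes are written the status line is already sent,
            // so the caller can only log the failure and abort the stream.
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] source = new byte[100_000]; // stand-in for a large document
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(source), sink, 8192);
        System.out.println(copied); // 100000
    }
}
```

The in-memory streams in `main` exist only for the demonstration. In a real endpoint the source would be a file or object-store stream, so per-request memory stays near the buffer size regardless of document size.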

12. Threading Model

Classic Jakarta REST request handling is often thread-per-request from a container-managed pool.

Simplified:

1 request = 1 container request thread until response completes

If resource method blocks on DB or external HTTP call, thread is occupied.

This model is simple and works well with bounded pools, but can suffer when:

downstream latency increases,
too many concurrent blocking calls,
request timeout too high,
retries multiply load,
DB pool is small and request threads pile up waiting.

12.1 Thread Pool Exhaustion Scenario

Request threads: 200
DB pool: 30
External API becomes slow: 5 seconds
Incoming traffic: 100 RPS

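By Little's Law, 100 RPS × 5 s ≈ 500 requests in flight, which is more than the 200 request threads available. Requests pile up, occupy threads, wait for DB/external calls, and eventually even cheap endpoints may fail because no request thread is free.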

Mitigations:

strict timeouts,
bulkheads,
circuit breakers,
async/job resource pattern,
queue limits,
separate executor for long-running work,
admission control,
virtual threads where supported and appropriate.
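As one example of the mitigations listed above, a bulkhead can be sketched with a plain `java.util.concurrent.Semaphore`. This is a minimal illustration with assumed names. Libraries such as MicroProfile Fault Tolerance offer managed equivalents, and the permit count must come from measurement.

```java
import java.util.Optional;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Illustrative sketch: cap concurrent calls to one dependency and shed the excess.
final class Bulkhead {
    private final Semaphore permits;

    Bulkhead(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    /**
     * Runs the call if a permit is free. Otherwise returns empty immediately,
     * so the caller can answer 503 instead of queueing behind a slow dependency.
     * Note: in this sketch a null result is indistinguishable from a rejection.
     */
    <T> Optional<T> tryCall(Supplier<T> call) {
        if (!permits.tryAcquire()) {
            return Optional.empty();
        }
        try {
            return Optional.ofNullable(call.get());
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        Bulkhead riskService = new Bulkhead(1); // tiny limit to make the effect visible
        // The outer call holds the only permit, so the nested call is shed.
        Optional<String> outer = riskService.tryCall(
                () -> riskService.tryCall(() -> "inner").orElse("inner was rejected"));
        System.out.println(outer.orElse("outer was rejected")); // inner was rejected
    }
}
```

The same limit applies whether callers are platform threads or virtual threads, because it bounds access to the dependency rather than the number of threads.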

13. Async Jakarta REST Does Not Magically Make Work Faster

Async resource pattern releases request thread while work continues elsewhere.

@GET
@Path("/{id}/expensive")
public void expensive(
        @PathParam("id") String id,
        @Suspended AsyncResponse response) {

    executor.submit(() -> {
        try {
            CaseResponse result = service.compute(id);
            response.resume(result);
        } catch (Exception e) {
            response.resume(e);
        }
    });
}

This can improve request thread utilization, but total system capacity still depends on:

executor size,
downstream pool size,
CPU,
memory,
queue length,
timeout,
cancellation handling.

If you move blocking work from request thread pool to unbounded executor, you may create a worse failure mode.

Correct principle:

Async is useful when it controls resource ownership and prevents request thread starvation. It is not a substitute for capacity planning.
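The `executor` in the resource method above is left unspecified. A minimal sketch of a bounded one, using only `java.util.concurrent`, could look like this. The sizes are placeholders, and inside a Jakarta EE runtime a container-managed executor would normally take its place.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Illustrative sketch: an executor that rejects work instead of queueing without limit.
final class BoundedExecutors {
    private BoundedExecutors() {}

    /** Fixed worker count plus a bounded queue; extra work is rejected, not buffered forever. */
    static ThreadPoolExecutor bounded(int workers, int queueCapacity) {
        return new ThreadPoolExecutor(
                workers, workers,
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
    }

    /** Returns true if the task was accepted, false if the executor is saturated. */
    static boolean trySubmit(ThreadPoolExecutor executor, Runnable task) {
        try {
            executor.execute(task);
            return true;
        } catch (RejectedExecutionException saturated) {
            return false; // caller can resume the AsyncResponse with a 503
        }
    }

    public static void main(String[] args) {
        ThreadPoolExecutor executor = bounded(1, 1);
        CountDownLatch release = new CountDownLatch(1);
        Runnable slow = () -> {
            try {
                release.await(); // simulates a blocked downstream call
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println(trySubmit(executor, slow)); // true: runs on the single worker
        System.out.println(trySubmit(executor, slow)); // true: waits in the queue
        System.out.println(trySubmit(executor, slow)); // false: rejected
        release.countDown();
        executor.shutdown();
    }
}
```

Rejecting early turns an invisible, growing backlog into an explicit overload signal that clients and load balancers can react to.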

14. `CompletionStage` Resource Methods

Since JAX-RS 2.1, a resource method may return a CompletionStage; the runtime suspends the request and resumes it when the stage completes.

Example:

@GET
@Path("/{id}")
public CompletionStage<CaseResponse> getCase(@PathParam("id") String id) {
    return caseQueryService.getCaseAsync(id)
            .thenApply(CaseResponse::from);
}

This is clean when the underlying work is genuinely async/non-blocking or managed by an appropriate executor.

Bad:

return CompletableFuture.supplyAsync(() -> blockingRepository.find(id));

without a managed bounded executor.

Problems:

uses common ForkJoinPool by default,
blocking work can starve unrelated tasks,
context propagation unclear,
timeout/cancellation often missing.

Use container-managed executor facilities where possible.
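Outside a container, the same idea can be sketched with `CompletableFuture` and an explicitly supplied bounded executor plus a timeout. The helper name, pool sizes, and timeouts below are assumptions for illustration. `completeOnTimeout` requires Java 9 or later.

```java
import java.time.Duration;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Illustrative sketch: blocking work on an explicit executor, with a deadline.
final class BoundedAsync {
    private BoundedAsync() {}

    /**
     * Runs a blocking supplier on the given executor (never the common pool)
     * and completes with the fallback if the timeout elapses first.
     * The timeout does not interrupt the supplier; it keeps its worker until it returns.
     */
    static <T> CompletableFuture<T> callWithTimeout(
            Supplier<T> blockingCall, Executor executor, Duration timeout, T fallback) {
        return CompletableFuture
                .supplyAsync(blockingCall, executor)
                .completeOnTimeout(fallback, timeout.toMillis(), TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        // Hypothetical sizing: 4 workers, at most 16 queued tasks.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(16));
        try {
            String fast = callWithTimeout(
                    () -> "case-42", pool, Duration.ofSeconds(1), "fallback").join();
            String slow = callWithTimeout(() -> {
                sleepQuietly(500); // simulated slow repository call
                return "too late";
            }, pool, Duration.ofMillis(50), "fallback").join();
            System.out.println(fast + " / " + slow);
        } finally {
            pool.shutdownNow();
        }
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because the timeout only completes the future, the underlying call also needs its own timeout (JDBC query timeout, HTTP client timeout) so that workers are actually freed.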

15. Virtual Threads

Virtual threads can improve scalability for blocking IO-heavy workloads by making blocking cheaper at the thread abstraction level.

But virtual threads are not magic.

They help when:

code is mostly blocking IO,
thread-per-request model is simple,
bottleneck is platform thread scarcity,
dependencies can support higher concurrency,
container/runtime supports virtual thread configuration safely.

They do not help when:

CPU is saturated,
database pool is bottleneck,
external API rate limit is bottleneck,
locks serialize work,
blocking inside synchronized pins the carrier thread (a known limitation on JDK 21 to 23),
payload serialization dominates CPU,
you allow unlimited concurrency without backpressure.

15.1 Virtual Thread Invariant

Virtual threads reduce cost of waiting threads; they do not increase capacity of downstream systems.

If DB pool has 30 connections, 10,000 virtual threads waiting for DB do not create 10,000 DB connections. They create 9,970 queued waiters unless you add bulkheads/admission control.

15.2 Jakarta EE Context

Jakarta EE 11 adds awareness of virtual threads when the platform runs on a JDK that provides them, and Jakarta Concurrency 3.1 lets managed concurrency resources, such as managed executors and thread factories, request virtual threads. In practice, implementation support and configuration vary by runtime. Treat virtual threads as a runtime feature to verify through load testing, not as a switch that is guaranteed to help.

16. Connection Pools Are Often the Real Bottleneck

REST services usually depend on pools:

database pool,
outbound HTTP client pool,
thread pool,
executor queue,
cache connection pool,
message broker connection/channel pool.

Example bottleneck:

Request threads: 200
DB pool: 20
Endpoint requires DB query taking 100 ms
Theoretical DB-limited throughput ≈ 20 / 0.1 = 200 RPS

If query latency becomes 500 ms:

Throughput ≈ 20 / 0.5 = 40 RPS

Adding more request threads will not fix it. It may make tail latency worse.
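The pool arithmetic above, plus the backlog it implies, can be written as a small helper. The 100 RPS arrival rate used in the last line is an assumed figure for illustration, and the names are hypothetical.

```java
// Illustrative sketch: throughput ceiling imposed by a connection pool.
final class PoolThroughput {
    private PoolThroughput() {}

    /** Upper bound on requests/second when each request holds one connection for holdMillis. */
    static double maxRps(int poolSize, long holdMillis) {
        return poolSize * 1000.0 / holdMillis;
    }

    /** Requests/second that cannot be served and start queueing (0 if the pool keeps up). */
    static double backlogGrowthPerSecond(double arrivalRps, int poolSize, long holdMillis) {
        return Math.max(0.0, arrivalRps - maxRps(poolSize, holdMillis));
    }

    public static void main(String[] args) {
        System.out.println(maxRps(20, 100));                      // 200.0
        System.out.println(maxRps(20, 500));                      // 40.0
        System.out.println(backlogGrowthPerSecond(100, 20, 500)); // 60.0
    }
}
```

A backlog that grows by 60 requests every second is what extra request threads end up holding, which is why adding threads raises tail latency instead of throughput.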

17. Timeout Budgeting

Each request should have a time budget.

Example:

Total SLO p95: 300 ms
- REST runtime + filters: 10 ms
- DB query: 100 ms
- outbound risk service: 120 ms
- serialization: 20 ms
- buffer: 50 ms

Timeouts should respect budget:

Risk service timeout: 150 ms
DB query timeout: 120 ms
Overall request timeout: 300-400 ms

Bad:

Overall SLO: 300 ms
Outbound HTTP timeout: 30 seconds

That creates thread pile-up and failure amplification.
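One way to enforce such a budget is to carry a deadline through the request and derive each downstream timeout from what is left. The sketch below assumes a hypothetical `RequestDeadline` class and uses an injectable clock so the behavior can be shown deterministically.

```java
import java.time.Duration;
import java.util.function.LongSupplier;

// Illustrative sketch: derive per-call timeouts from one overall request budget.
final class RequestDeadline {
    private final LongSupplier nanoClock;
    private final long deadlineNanos;

    RequestDeadline(Duration totalBudget, LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
        this.deadlineNanos = nanoClock.getAsLong() + totalBudget.toNanos();
    }

    static RequestDeadline startingNow(Duration totalBudget) {
        return new RequestDeadline(totalBudget, System::nanoTime);
    }

    /** Time left in the overall budget, never negative. */
    Duration remaining() {
        long left = deadlineNanos - nanoClock.getAsLong();
        return Duration.ofNanos(Math.max(0L, left));
    }

    /** Timeout for one downstream call: its configured cap or the time left, whichever is smaller. */
    Duration timeoutFor(Duration configuredCap) {
        Duration left = remaining();
        return left.compareTo(configuredCap) < 0 ? left : configuredCap;
    }

    public static void main(String[] args) {
        long[] fakeNow = {0L}; // controllable clock for the demonstration
        RequestDeadline deadline = new RequestDeadline(Duration.ofMillis(300), () -> fakeNow[0]);

        System.out.println(deadline.timeoutFor(Duration.ofMillis(120)).toMillis()); // 120
        fakeNow[0] = Duration.ofMillis(200).toNanos(); // pretend 200 ms were already spent
        System.out.println(deadline.timeoutFor(Duration.ofMillis(150)).toMillis()); // 100
    }
}
```

A call whose remaining budget is zero can be skipped outright, which avoids spending downstream capacity on a response the client will no longer wait for.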

18. Retries and Performance Collapse

Retries can multiply load.

If traffic is 100 RPS and every request fails and is retried 3 times, each request produces 4 attempts:

Effective traffic = 100 × (1 + 3) = 400 attempts/sec

During downstream degradation, retry storm can destroy both caller and callee.

Retry only when:

operation is idempotent or protected by idempotency key,
failure is likely transient,
timeout is short,
backoff/jitter exists,
retry budget is bounded,
circuit breaker prevents storm.

Do not retry large POST mutation blindly.
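The conditions above (bounded budget, backoff, jitter) can be sketched as a small policy object. This uses the "full jitter" variant of exponential backoff, which is one common choice and not something the text prescribes. All names and numbers are illustrative.

```java
import java.util.Random;

// Illustrative sketch: bounded retries with exponential backoff and full jitter.
final class RetryPolicy {
    private final int maxRetries;
    private final long baseDelayMs;
    private final long maxDelayMs;
    private final Random random;

    RetryPolicy(int maxRetries, long baseDelayMs, long maxDelayMs, Random random) {
        this.maxRetries = maxRetries;
        this.baseDelayMs = baseDelayMs;
        this.maxDelayMs = maxDelayMs;
        this.random = random;
    }

    /** Worst case: every request fails and uses all of its retries. */
    static long worstCaseAttemptsPerSecond(long requestsPerSecond, int retriesPerRequest) {
        return requestsPerSecond * (1L + retriesPerRequest);
    }

    /** retriesSoFar counts retries already made, not the first attempt. */
    boolean mayRetry(int retriesSoFar) {
        return retriesSoFar < maxRetries;
    }

    /** Upper bound for the delay before retry number retryIndex (0-based). */
    long ceilingMs(int retryIndex) {
        long exponential = baseDelayMs << Math.min(retryIndex, 20); // capped shift keeps small bases from overflowing
        return Math.min(maxDelayMs, exponential);
    }

    /** Full jitter: a uniform random delay in [0, ceilingMs(retryIndex)]. */
    long delayMs(int retryIndex) {
        return (long) (random.nextDouble() * (ceilingMs(retryIndex) + 1));
    }

    public static void main(String[] args) {
        System.out.println(worstCaseAttemptsPerSecond(100, 3)); // 400
        RetryPolicy policy = new RetryPolicy(3, 50, 1_000, new Random());
        for (int retry = 0; policy.mayRetry(retry); retry++) {
            System.out.println("retry " + retry + ": wait up to " + policy.ceilingMs(retry)
                    + " ms, chosen " + policy.delayMs(retry) + " ms");
        }
    }
}
```

Jitter matters because callers that fail together otherwise retry together, producing synchronized bursts against a dependency that is already struggling.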

19. Caching

Caching can improve performance, but only when correctness is preserved.

Options:

HTTP caching with ETag, Last-Modified, Cache-Control,
application cache,
query result cache,
CDN/proxy cache for public/static resources,
client-side cache.

For case-management systems, many resources are sensitive and user-specific. Choose Cache-Control directives such as private, no-store, and no-cache deliberately, and keep such responses out of shared caches.

ETag for read resource:

@GET
@Path("/{id}")
public Response getCase(@PathParam("id") String id, @Context Request request) {
    CaseView view = service.getCaseView(id);
    EntityTag etag = new EntityTag(view.versionHash());

    Response.ResponseBuilder precondition = request.evaluatePreconditions(etag);
    if (precondition != null) {
        return precondition.build();
    }

    return Response.ok(CaseResponse.from(view))
            .tag(etag)
            .build();
}

This can avoid serializing and transferring unchanged representation.

20. Pagination and Response Size

Large responses hurt:

DB time,
memory,
serialization CPU,
network transfer,
client rendering,
p99 latency.

Never ship unbounded collection endpoints.

Bad:

GET /cases

with unlimited results.

Better:

GET /cases?limit=50&cursor=eyJvZmZzZXQiOjEwMDB9

Set:

default limit,
max limit,
stable sort,
cursor/keyset pagination for large datasets,
response metadata,
clear filtering grammar.

Performance begins at contract design.
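A minimal sketch of two of the settings above, limit clamping and opaque cursor handling. The default of 50 matches the example URL; the maximum of 200 and all names are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative sketch: limit clamping and opaque cursor encoding.
final class Paging {
    static final int DEFAULT_LIMIT = 50; // matches the example URL above
    static final int MAX_LIMIT = 200;    // assumed ceiling

    private Paging() {}

    /** Applies the default when the client sends no limit and clamps to [1, MAX_LIMIT]. */
    static int effectiveLimit(Integer requested) {
        if (requested == null) return DEFAULT_LIMIT;
        return Math.max(1, Math.min(requested, MAX_LIMIT));
    }

    /** Encodes cursor state as URL-safe Base64 without padding. */
    static String encodeCursor(String state) {
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(state.getBytes(StandardCharsets.UTF_8));
    }

    /** Decodes a cursor; throws IllegalArgumentException on malformed input (map it to 400). */
    static String decodeCursor(String cursor) {
        return new String(Base64.getUrlDecoder().decode(cursor), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(effectiveLimit(null));                  // 50
        System.out.println(effectiveLimit(5000));                  // 200
        System.out.println(decodeCursor("eyJvZmZzZXQiOjEwMDB9")); // {"offset":1000}
    }
}
```

The sample cursor from the URL above decodes to an offset. A keyset cursor would instead carry the last seen sort key and id, so the database can seek directly rather than skip rows.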

21. Logging Cost

Logging can become a hidden bottleneck.

Expensive patterns:

log.info("request body={}", hugeBody);
log.info("response={}", objectMapper.writeValueAsString(response));

Risks:

CPU cost,
allocation,
blocking appender,
disk pressure,
sensitive data leakage,
p99 latency spikes.

Better:

log.info("case request completed caseId={} status={} durationMs={} correlationId={}",
        caseId, status, durationMs, correlationId);

Log structured metadata, not full payload by default.

22. Filters and Interceptors Performance

Filters run for many or all requests. A slow global filter damages every endpoint.

Avoid in global filters:

blocking DB calls,
external HTTP calls,
full body buffering,
expensive JSON parsing,
synchronous audit writes,
high-cardinality metric labels,
complex authorization if endpoint-specific logic is needed elsewhere.

Good global filters:

correlation id,
cheap auth context extraction,
security headers,
timing metrics,
access log metadata,
request size guard.

If a filter must do expensive work, bind it narrowly with name binding or resource-specific registration.

23. Exception Cost

Exceptions are expensive if used as normal control flow.

Bad:

try {
    UUID id = UUID.fromString(input);
} catch (IllegalArgumentException e) {
    // expected for many invalid requests
}

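This is acceptable occasionally, but not as a hot-path parser for high-volume invalid traffic: each thrown exception allocates an object and captures a stack trace. Where it matters, run a cheap format check first and only parse input that passes it.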
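A cheap pre-check for the UUID case might look like the sketch below. It accepts only the canonical 36-character form, which is stricter than `UUID.fromString` itself, so treat that as a design assumption. The class name is hypothetical.

```java
import java.util.Optional;
import java.util.UUID;

// Illustrative sketch: reject malformed ids without throwing on the hot path.
final class UuidParam {
    private UuidParam() {}

    /** Structural check: 36 chars, hyphens at 8/13/18/23, ASCII hex digits elsewhere. */
    static boolean looksLikeUuid(String s) {
        if (s == null || s.length() != 36) return false;
        for (int i = 0; i < 36; i++) {
            char c = s.charAt(i);
            if (i == 8 || i == 13 || i == 18 || i == 23) {
                if (c != '-') return false;
            } else {
                boolean hex = (c >= '0' && c <= '9')
                        || (c >= 'a' && c <= 'f')
                        || (c >= 'A' && c <= 'F');
                if (!hex) return false;
            }
        }
        return true;
    }

    /** Parses only input that passed the structural check. */
    static Optional<UUID> parse(String s) {
        return looksLikeUuid(s) ? Optional.of(UUID.fromString(s)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("123e4567-e89b-12d3-a456-426614174000").isPresent()); // true
        System.out.println(parse("not-a-uuid").isPresent());                           // false
    }
}
```

An empty result maps naturally to a 400 response without a stack trace in the log.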

Also avoid logging full stack traces for expected client errors:

400 validation error,
404 not found,
409 conflict,
412 precondition failed.

Stack traces are useful for server bugs, not for every bad user input.

24. CPU-Bound vs IO-Bound Endpoints

Performance strategy depends on workload.

24.1 IO-Bound Endpoint

Example:

GET /cases/{id}
- DB query 80 ms
- JSON serialization 5 ms

Focus:

DB query/index,
pool sizing,
timeout,
caching,
concurrency control.

Virtual threads may help if request threads are bottleneck, but DB pool remains limit.

24.2 CPU-Bound Endpoint

Example:

POST /documents/{id}/analysis
- CPU classification 800 ms
- no external IO

Focus:

algorithm optimization,
separate worker pool,
async job pattern,
limit concurrency,
avoid blocking request thread,
possibly offload to specialized service.

Virtual threads do not create more CPU.

24.3 Serialization-Bound Endpoint

Example:

GET /cases/{id}/audit-events?limit=5000
- DB 50 ms
- JSON serialization 900 ms

Focus:

reduce response size,
pagination,
streaming JSON if appropriate,
simpler DTO,
faster JSON provider/config,
compression trade-off.

25. Benchmarking REST Endpoints

Benchmarking must resemble production.

Include:

realistic payload size,
realistic auth headers/session,
realistic database volume,
realistic downstream latency,
keep-alive behavior,
warmup period,
ramp-up period,
fixed test duration,
error-rate check,
p95/p99 latency,
server resource metrics.

Tools can include wrk, k6, Gatling, JMeter, Vegeta, or custom harness. The tool matters less than test design.

25.1 Bad Benchmark

Single endpoint
Single user
No auth
Tiny payload
In-memory fake DB
10 second test
Only average latency reported

This result is not production evidence.

25.2 Better Benchmark

30 minute test
5 minute warmup
real DB dataset
mixed endpoint workload
realistic payload distribution
p50/p95/p99 reported
server CPU/memory/GC/thread/pool metrics captured
failure rate included

26. Load Test Workload Model

Example workload:

Endpoint	Weight
`GET /cases/{id}`	50%
`GET /cases?status=...`	20%
`POST /cases/{id}/notes`	10%
`POST /cases/{id}/transitions`	5%
`GET /cases/{id}/audit-events`	10%
`GET /cases/{id}/events` SSE	5% connection mix

Mixed workload catches resource interactions that single-endpoint tests miss.

For SSE, model active connections separately from request/response RPS.
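In a custom harness, the weighted mix above can be implemented with cumulative weights. The sketch below is a hypothetical helper, not the API of any load-testing tool; tools such as k6 or Gatling have their own ways to express scenario weights.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Illustrative sketch: pick endpoints according to a weighted workload mix.
final class WorkloadMix {
    private final String[] endpoints;
    private final double[] cumulative; // running share of total weight, ending at 1.0

    WorkloadMix(Map<String, Integer> weights) {
        int total = weights.values().stream().mapToInt(Integer::intValue).sum();
        endpoints = new String[weights.size()];
        cumulative = new double[weights.size()];
        int i = 0;
        int running = 0;
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            running += e.getValue();
            endpoints[i] = e.getKey();
            cumulative[i] = (double) running / total;
            i++;
        }
    }

    /** Maps a uniform draw u in [0, 1) to an endpoint according to the weights. */
    String pick(double u) {
        for (int i = 0; i < cumulative.length; i++) {
            if (u < cumulative[i]) return endpoints[i];
        }
        return endpoints[endpoints.length - 1];
    }

    /** The mix from the table above; insertion order is preserved. */
    static WorkloadMix caseManagementMix() {
        Map<String, Integer> w = new LinkedHashMap<>();
        w.put("GET /cases/{id}", 50);
        w.put("GET /cases?status=...", 20);
        w.put("POST /cases/{id}/notes", 10);
        w.put("POST /cases/{id}/transitions", 5);
        w.put("GET /cases/{id}/audit-events", 10);
        w.put("GET /cases/{id}/events", 5);
        return new WorkloadMix(w);
    }

    public static void main(String[] args) {
        WorkloadMix mix = caseManagementMix();
        Random random = new Random();
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int n = 0; n < 10_000; n++) {
            counts.merge(mix.pick(random.nextDouble()), 1, Integer::sum);
        }
        counts.forEach((endpoint, count) -> System.out.println(endpoint + " -> " + count));
    }
}
```

The SSE entry is included only to keep the weights complete. As noted above, long-lived connections are better modeled as a separate pool of open streams than as part of the request rate.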

27. Profiling

Use profilers when measurement shows CPU or allocation bottleneck.

Look for:

JSON serialization hotspots,
DTO mapping overhead,
regex validation cost,
logging formatting,
lock contention,
excessive allocation,
date/time formatter creation,
reflection/config introspection repeated per request.

Do not guess based on code aesthetics. Measure.

28. Warmup and Cold Start

JVM and Jakarta REST runtimes may have warmup cost:

class loading,
annotation scanning,
provider discovery,
JIT compilation,
JSON mapper initialization,
database pool initialization,
connection TLS warmup,
cache warmup.

Benchmark and readiness probes should account for this.

Production implication:

readiness should not turn green before critical providers/pools are ready,
first user request should not pay all initialization cost,
rolling deploy should avoid sending full traffic to cold instance immediately.

29. Native Image Considerations

Some Jakarta REST implementations support native-image-oriented deployments through frameworks/runtimes. Native images can improve startup and memory footprint, but may affect:

reflection configuration,
dynamic provider discovery,
JSON serialization behavior,
resource scanning,
runtime proxies,
monitoring/profiling assumptions,
peak throughput after warmup compared with JVM JIT.

Do not assume native image is always faster. It optimizes some dimensions, especially startup and footprint, but workload-specific testing remains required.

30. Performance-Aware API Design

Design decisions that improve performance:

Bounded collection endpoints.
Explicit field selection only if governance exists.
Separate heavy subresources.
Cursor/keyset pagination for large sets.
Conditional GET with ETag where safe.
Async job resource for long-running commands.
Streaming for large downloads.
Small DTOs for hot-path endpoints.
Avoid dynamic Map<String, Object> for stable contracts.
Avoid embedding audit trails into primary resource by default.

API contract is your first performance control surface.

31. Performance Failure Patterns

31.1 Retry Storm

Downstream slows. Caller retries. Traffic multiplies. Everything collapses.

Fix:

bounded retries,
jitter,
circuit breaker,
timeout budget,
idempotency key,
load shedding.

31.2 Pool Starvation

Threads wait for DB pool. Request queue grows. Latency explodes.

Fix:

right-size pool,
limit concurrency,
optimize queries,
add timeout,
separate pools if needed.

31.3 Large Response Explosion

One endpoint returns huge nested graph. Serialization dominates.

Fix:

pagination,
subresources,
projection DTO,
response size limit.

31.4 Slow Client Write

Client receives slowly. Server keeps response resource occupied.

Fix:

streaming timeout,
write timeout if runtime supports,
bounded queues,
CDN/object storage for large file delivery.

31.5 Global Filter Bottleneck

Every request performs expensive work in filter.

Fix:

move logic to endpoint-specific layer,
cache safe metadata,
name-bind filter,
remove body buffering.

32. Production Metrics Checklist

Collect at least:

HTTP Metrics

request count by route/method/status,
latency by route/method,
response size,
request size,
error rate,
active requests.

JVM Metrics

CPU,
heap usage,
allocation rate,
GC pause,
thread count,
blocked/waiting threads.

Pool Metrics

DB active/idle/pending,
HTTP client pool active/pending,
executor queue depth,
circuit breaker state,
retry count,
timeout count.

Domain Metrics

case transition rate,
validation failure rate,
conflict/precondition failure rate,
long-running job queue depth,
SSE active connections.

Avoid high-cardinality labels such as raw user id, case id, or request id in metrics.

33. Tuning Order

Use this order:

Define SLO and workload.
Measure current behavior.
Identify bottleneck.
Fix contract/design issue first.
Fix query/downstream issue.
Fix serialization/payload issue.
Fix pool/thread/timeout issue.
Tune runtime/JVM only after application bottleneck is understood.
Validate with load test.
Add regression guard.

Do not start by tweaking JVM flags.

34. Example Performance Review

Endpoint:

GET /cases/{caseId}/timeline

Observed:

p50: 120 ms
p95: 1.8 s
p99: 6.0 s
response p99 size: 9 MB
DB queries/request: 301

Likely issues:

unbounded timeline,
N+1 query,
huge JSON serialization,
no pagination,
client probably does not need all data.

Fix plan:

Add limit and cursor.
Replace entity graph serialization with projection query.
Add separate detail endpoint for individual timeline items.
Add ETag for stable pages if safe.
Add load test for 50, 100, 500 item pages.

This is better than increasing heap or request threads.

35. Case-Management Performance Blueprint

For regulated case-management APIs:

API type	Performance design
Case summary	Projection DTO, small payload, cache/ETag if allowed
Case search	Indexed filters, pagination, no arbitrary unbounded query
Evidence download	Streaming/object storage, authorization before stream
Audit trail	Append-only query, cursor pagination, no embedded full audit in case response
State transition	Small command DTO, idempotency/precondition, async for long work
Notification stream	SSE as hint, canonical state through GET
Reporting	Async export job, not synchronous huge REST response

This aligns performance with correctness and defensibility.

36. Checklist

Before optimizing:

37. Practice Tasks

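Add timing metrics to all Jakarta REST resources using a request/response filter pair (record the start time in the request filter, compute the duration in the response filter).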
Measure p95/p99 for GET /cases/{id} with realistic payloads.
Create a deliberately unbounded collection endpoint, load test it, then fix with pagination.
Compare returning entity graph vs projection DTO.
Implement ETag on a read endpoint and measure unchanged response behavior.
Simulate downstream slowness and observe request thread/pool saturation.
Add timeout and circuit breaker, then retest.
Test virtual-thread-enabled executor/runtime if available and compare under IO-bound load.
Profile serialization-heavy endpoint.
Add SSE connections to load test and observe active connection/resource behavior.

38. Key Takeaways

Jakarta REST performance is mostly about pipeline cost, not annotation syntax.
Measure before optimizing; p95/p99 matter more than average.
Serialization, payload size, DB access, downstream calls, and pool saturation dominate many REST workloads.
Async and virtual threads can improve resource utilization, but they do not remove CPU, DB, or external system limits.
Contract design is performance design: bounded collections, projection DTOs, subresources, ETags, and async job resources matter.
Global filters/interceptors must stay cheap.
Load tests must reflect realistic traffic, payload, auth, dependencies, and long-lived streams.

References

Jakarta RESTful Web Services 4.0 Specification: https://jakarta.ee/specifications/restful-ws/4.0/jakarta-restful-ws-spec-4.0
Jakarta EE Platform 11 Specification: https://jakarta.ee/specifications/platform/11/
Jakarta EE 11 Release: https://jakarta.ee/release/11/
Jakarta Concurrency 3.1: https://jakarta.ee/specifications/concurrency/3.1/
Jakarta REST Client API package: https://jakarta.ee/specifications/restful-ws/4.0/apidocs/jakarta.ws.rs/jakarta/ws/rs/client/package-summary
