Bulk and Batch Endpoints Without Breaking Reliability
Learn Java Microservices Communication - Part 033
Bulk and batch endpoint design for Java microservices: bounded payloads, item-level outcome modeling, atomicity choices, async job handoff, concurrency limits, and operational safety.
A single endpoint call is easy to reason about.
POST /payments
A bulk endpoint is not just the same endpoint repeated inside one JSON array.
POST /payments:batchCreate
The second endpoint changes almost everything:
- the request can partially succeed,
- one client call can create many side effects,
- retrying the request can duplicate many side effects,
- one slow item can hold the whole response hostage,
- one bad item can poison an otherwise valid batch,
- authorization becomes per-item,
- observability needs item-level visibility without exploding metric cardinality,
- validation errors become collections,
- transaction boundaries become explicit design choices,
- backpressure must happen before the database or downstream services melt.
This part is about designing bulk and batch endpoints as communication contracts, not convenience wrappers.
The rule:
A bulk endpoint is a controlled fan-in/fan-out boundary. It must make size, identity, failure, atomicity, retry, and observability explicit.
1. Batch vs Bulk: Use the Words Precisely
Engineers often use "batch" and "bulk" interchangeably. That is risky because the two have different failure models.
| Term | Typical Meaning | Example | Communication Risk |
|---|---|---|---|
| Batch read | Fetch many known resources in one request | POST /orders:batchGet | Missing IDs, authorization filtering, response ordering |
| Bulk command | Apply one command to many items | POST /invoices:bulkApprove | Partial success, duplicate side effects, per-item failure |
| Batch create | Create many resources | POST /settlements:batchCreate | Idempotency, validation, uniqueness, transaction boundaries |
| Batch update | Update many resources with independent patches | PATCH /cases:batchUpdate | Version conflicts, partial update, stale preconditions |
| Bulk transition | Move many workflow entities to a new state | POST /cases:bulkEscalate | Domain invariant breakage, authorization, audit correctness |
| Async import | Submit a large file or dataset for later processing | POST /imports | Job lifecycle, replay, progress, cancellation |
A useful distinction:
- Batch usually means "many independent operations in one request".
- Bulk usually means "one logical operation over many targets".
Example:
POST /orders:batchGet
means: "get these independent orders".
POST /cases:bulkAssign
means: "assign this set of cases as one user-intended operation".
That distinction matters because a bulk transition often needs one audit event, one approval context, one reason code, and one user intent, even if item outcomes differ.
2. Why Bulk Endpoints Exist
Bulk endpoints are not automatically better than repeated single-item calls.
They exist for specific reasons:
- Reduce round trips when client-to-service latency dominates.
- Preserve user intent when one UI action applies to many entities.
- Centralize authorization and invariant checks instead of scattering them across client loops.
- Control server-side concurrency instead of letting each client fan out wildly.
- Provide consistent item-level outcome reporting.
- Move large work into an async job when synchronous request/response is the wrong communication shape.
Bad reason:
"The single endpoint is slow, so let us add a batch endpoint."
That usually hides a deeper issue: missing index, poor API shape, N+1 downstream calls, unbounded payloads, or bad client behavior.
Bulk is a communication pattern, not a performance bandage.
3. The Core Design Question
Before designing a bulk endpoint, answer this:
Is this request one operation with many targets, or many operations packaged together?
The answer decides response shape, audit model, idempotency, transaction strategy, and retry semantics.
A bulk action like "approve 300 cases" is one user intent.
A batch create request containing 300 unrelated records is many operations packaged together.
This difference affects audit:
- one bulk action may create one parent audit event plus item details,
- batch create may create independent audit entries for each resource.
It also affects idempotency:
- one bulk action may have one idempotency key for the whole operation,
- batch create may additionally need per-item client request IDs.
4. First-Class Constraints
A production bulk endpoint must publish constraints as part of the contract.
Example:
maxItems: 100
maxPayloadBytes: 1048576
maxItemPayloadBytes: 8192
atomicity: per-item
ordering: response matches request order
idempotency: required
authorization: evaluated per item
maxSyncDuration: 2s
largeRequests: use async job endpoint
If these constraints are not explicit, clients infer them by accident.
That creates production failure modes:
- one client sends 10 items and works,
- another sends 10,000 items and times out,
- retry duplicates work,
- service owner adds hidden limits,
- clients interpret a generic 500 as retryable,
- the operations team cannot distinguish valid load from abusive load.
The contract should make the envelope visible.
5. Hard Limits Are Not Optional
Every bulk endpoint needs limits.
At minimum:
| Limit | Why It Exists |
|---|---|
| Maximum item count | Protects memory, DB, downstream fan-out, response size |
| Maximum request body size | Prevents payload amplification |
| Maximum item size | Prevents one giant item hiding in a valid batch |
| Maximum response body size | Prevents response serialization blow-up |
| Maximum processing time | Prevents HTTP thread/event-loop starvation |
| Maximum concurrency per request | Prevents one request from becoming an internal DDoS |
| Maximum outstanding bulk operations per caller | Protects service-level capacity |
Bad:
for (ApproveCaseRequest item : request.items()) {
approveCase(item.caseId(), request.reason());
}
Better:
if (request.items().size() > properties.maxItems()) {
throw new BulkRequestTooLargeException(properties.maxItems());
}
payloadSizeGuard.verify(request);
callerQuotaGuard.verify(caller, "cases.bulkApprove");
The limit should be enforced at the edge of the application service, before any side effect.
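A minimal sketch of such a guard, assuming items arrive as already-serialized strings. BulkLimits and its field names are hypothetical and only mirror the contract example from section 4; BulkRequestTooLargeException is given an illustrative body here.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical limits holder; not part of any real library.
record BulkLimits(int maxItems, int maxItemPayloadBytes) {

    // Rejects the envelope before any side effect happens.
    void verify(List<String> serializedItems) {
        if (serializedItems.isEmpty()) {
            throw new IllegalArgumentException("items must not be empty");
        }
        if (serializedItems.size() > maxItems) {
            throw new BulkRequestTooLargeException(maxItems);
        }
        for (String item : serializedItems) {
            int bytes = item.getBytes(StandardCharsets.UTF_8).length;
            if (bytes > maxItemPayloadBytes) {
                throw new IllegalArgumentException(
                        "item exceeds " + maxItemPayloadBytes + " bytes");
            }
        }
    }
}

// Illustrative exception that carries the published limit back to the caller.
class BulkRequestTooLargeException extends RuntimeException {
    private final int maxItems;

    BulkRequestTooLargeException(int maxItems) {
        super("Bulk request exceeds maximum of " + maxItems + " items");
        this.maxItems = maxItems;
    }

    int maxItems() {
        return maxItems;
    }
}
```

Carrying the limit inside the exception lets the error mapper put it into the Problem Details body, so clients learn the envelope instead of guessing it.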
6. The Request Shape
A bulk request should not be a naked array.
Bad:
[
{ "caseId": "CASE-1" },
{ "caseId": "CASE-2" }
]
Better:
{
"requestId": "bulk-2026-07-05-001",
"reason": "Evidence threshold met",
"items": [
{
"clientItemId": "line-001",
"caseId": "CASE-1",
"expectedVersion": 7
},
{
"clientItemId": "line-002",
"caseId": "CASE-2",
"expectedVersion": 3
}
]
}
Important fields:
| Field | Purpose |
|---|---|
| requestId | Client-visible operation identity; useful for logs/audit but not always a substitute for Idempotency-Key |
| clientItemId | Stable item identity inside the request; needed for error mapping |
| Target ID | Resource being operated on |
| Expected version / ETag | Prevents stale writes |
| Shared command fields | Reason, actor context, effective date, note, policy reference |
| Per-item command fields | Item-specific override where allowed |
A request without per-item identity is painful to debug.
If the 17th item fails, the client should not have to infer which business object failed from array position alone.
7. The Response Shape
A bulk endpoint needs a response that tells the truth.
Bad:
{
"success": false
}
Better:
{
"operationId": "BULK-APPROVE-20260705-00042",
"status": "PARTIAL_SUCCESS",
"summary": {
"requested": 3,
"succeeded": 2,
"failed": 1,
"skipped": 0
},
"results": [
{
"clientItemId": "line-001",
"caseId": "CASE-1",
"status": "SUCCEEDED",
"resourceVersion": 8
},
{
"clientItemId": "line-002",
"caseId": "CASE-2",
"status": "FAILED",
"error": {
"code": "CASE_VERSION_CONFLICT",
"message": "Case CASE-2 is at version 4, expected version 3.",
"retryable": false
}
},
{
"clientItemId": "line-003",
"caseId": "CASE-3",
"status": "SUCCEEDED",
"resourceVersion": 12
}
]
}
The response should answer:
- Was the overall request accepted?
- Which items were processed?
- Which items succeeded?
- Which failed?
- Why did they fail?
- Are failures retryable?
- Did any item have unknown outcome?
- Can the client safely retry the whole request?
- What operation ID should be used for support/audit?
8. Overall HTTP Status vs Item Status
A common mistake is to put all semantics into item results and always return 200 OK.
Another mistake is to return 500 when one item fails validation.
Use the HTTP status for the envelope outcome.
Use item status for per-item outcome.
| Scenario | HTTP Status | Body |
|---|---|---|
| Request envelope invalid JSON | 400 | Problem Details |
| Caller not authenticated | 401 | Problem Details |
| Caller cannot use bulk operation at all | 403 | Problem Details |
| Too many items | 413 or 422 | Problem Details with limit |
| Rate limit exceeded | 429 | Problem Details, maybe Retry-After |
| Service overloaded before processing | 503 | Problem Details |
| Valid request, all items succeed | 200 | Bulk result |
| Valid request, some item-level failures | 200 | Bulk result with PARTIAL_SUCCESS |
| Valid request accepted for async processing | 202 | Job resource representation |
Why not always use 207 Multi-Status?
207 exists in WebDAV and can express multiple independent statuses, but many internal API ecosystems do not standardize on it. For ordinary JSON HTTP APIs, a 200 with an explicit bulk result body is usually easier for clients, gateways, generated SDKs, and observability tools. Use 207 only if your organization has a clear convention and client support.
The important invariant is not the exact code. The invariant is:
HTTP status describes whether the bulk envelope was accepted and processed as a request. Item result describes business outcome per target.
9. Atomic vs Per-Item Processing
There are two major modes.
9.1 All-or-Nothing
Either every item succeeds, or none is committed.
Useful when:
- the operation is truly one business transaction,
- partial success would violate a domain invariant,
- item count is small,
- all items live in the same transactional boundary,
- processing can complete within the HTTP budget.
Example:
{
"atomicity": "ALL_OR_NOTHING",
"items": [ ... ]
}
Risks:
- larger lock footprint,
- longer transaction time,
- greater deadlock chance,
- poor scalability,
- timeout can still leave the client uncertain unless idempotency is implemented.
9.2 Per-Item Outcome
Each item commits independently.
Useful when:
- items are independent,
- partial completion is acceptable,
- retry can target failed items,
- large fan-out is expected,
- some items may fail due to domain state.
Risks:
- client must handle partial success,
- audit must be explicit,
- retry must avoid duplicating successful items,
- summary status must be accurate.
Default recommendation:
Prefer per-item outcome for bulk APIs unless the domain explicitly requires all-or-nothing.
Do not choose all-or-nothing because it feels simpler. It is usually simpler only for the first demo.
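With per-item outcomes, a client can narrow a retry to the items that failed and are flagged retryable. The sketch below is one way to do that; Outcome and RetrySelection are hypothetical names, and status is kept as a plain string for brevity.

```java
import java.util.List;

// Hypothetical client-side view of one item result.
record Outcome(String clientItemId, String status, boolean retryable) {}

final class RetrySelection {
    // Only FAILED items flagged retryable are candidates.
    // SUCCEEDED items are never resent, which avoids duplicate side effects.
    static List<String> retryCandidates(List<Outcome> outcomes) {
        return outcomes.stream()
                .filter(o -> o.status().equals("FAILED") && o.retryable())
                .map(Outcome::clientItemId)
                .toList();
    }
}
```

This only helps when the response actually arrived. If the call timed out, the client has no item results and needs idempotency to retry safely.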
10. Synchronous vs Asynchronous Bulk
Some bulk requests should not be synchronous HTTP requests.
Use synchronous processing when:
- item count is small and bounded,
- p95 can stay below your HTTP deadline,
- failure result can be returned immediately,
- no heavy downstream fan-out is needed,
- result payload is bounded.
Use async job processing when:
- item count can be large,
- work can take seconds/minutes,
- retries/replays need operational control,
- progress reporting matters,
- cancellation matters,
- result file/report is large,
- downstream calls may be throttled.
Async shape:
POST /case-bulk-approval-jobs
Idempotency-Key: "9b41ff28-4e6a-4a78-b01c-26d5919f4455"
Content-Type: application/json
{
"reason": "Evidence threshold met",
"items": [
{ "clientItemId": "line-001", "caseId": "CASE-1", "expectedVersion": 7 }
]
}
Response:
HTTP/1.1 202 Accepted
Location: /case-bulk-approval-jobs/JOB-123
{
"jobId": "JOB-123",
"status": "ACCEPTED",
"links": {
"self": "/case-bulk-approval-jobs/JOB-123",
"results": "/case-bulk-approval-jobs/JOB-123/results"
}
}
Job polling:
GET /case-bulk-approval-jobs/JOB-123
{
"jobId": "JOB-123",
"status": "RUNNING",
"summary": {
"requested": 1000,
"processed": 724,
"succeeded": 700,
"failed": 24
}
}
The async job is often the more honest API.
If the operation is not naturally short, do not pretend it is short by increasing HTTP timeouts.
11. Bulk Endpoint Decision Matrix
| Question | If Yes | If No |
|---|---|---|
| Can a single bad item invalidate the whole business operation? | Consider all-or-nothing | Prefer per-item outcome |
| Can processing exceed a few seconds? | Use async job | Synchronous may be acceptable |
| Is result payload large? | Store result and provide link | Return inline result |
| Is each item independently retryable? | Include item IDs and failure codes | Use operation-level retry only |
| Can clients retry safely? | Require idempotency | Do not allow automatic retry |
| Does each item require authorization? | Model per-item forbidden result | Operation-level auth may be enough |
| Does item order matter? | Publish ordering semantics | Treat order as not meaningful |
12. Per-Item Authorization
Bulk endpoints are dangerous if authorization is checked only once.
Bad:
authorization.requirePermission(actor, "CASE_APPROVE");
for (Item item : request.items()) {
caseService.approve(item.caseId(), actor, request.reason());
}
This checks whether the actor can approve cases in general. It does not check whether they can approve each specific case.
Better:
authorization.requireOperation(actor, "CASE_BULK_APPROVE");
for (BulkApproveItem item : request.items()) {
if (!authorization.canApproveCase(actor, item.caseId())) {
results.add(BulkItemResult.failed(
item.clientItemId(),
item.caseId(),
"FORBIDDEN_FOR_CASE",
false
));
continue;
}
results.add(processOne(item));
}
In regulatory or enforcement systems, this matters because each case may have:
- jurisdiction constraints,
- confidentiality flags,
- assigned team ownership,
- conflict-of-interest rules,
- escalation state,
- legal hold,
- case category restrictions.
A bulk endpoint must not become a privilege escalation shortcut.
13. Validation Strategy
Bulk validation has two layers.
13.1 Envelope Validation
Reject the whole request when the envelope is invalid:
- missing required shared field,
- invalid JSON,
- too many items,
- duplicate clientItemId,
- duplicate target ID when not allowed,
- unsupported atomicity mode,
- request body too large.
13.2 Item Validation
Return an item-level failure when an individual item is invalid:
- invalid target ID,
- missing item-specific field,
- stale version,
- illegal state transition,
- item not found,
- forbidden item,
- business rule violation.
Example:
{
"status": "PARTIAL_SUCCESS",
"summary": {
"requested": 4,
"succeeded": 2,
"failed": 2
},
"results": [
{
"clientItemId": "line-003",
"caseId": "CASE-3",
"status": "FAILED",
"error": {
"code": "CASE_NOT_FOUND",
"retryable": false
}
},
{
"clientItemId": "line-004",
"caseId": "CASE-4",
"status": "FAILED",
"error": {
"code": "CASE_LOCKED_BY_LEGAL_HOLD",
"retryable": false
}
}
]
}
Avoid mixing envelope errors and item errors randomly. It makes client behavior inconsistent.
14. Duplicate Items
Decide explicitly whether duplicate target IDs are allowed.
Example request:
{
"items": [
{ "clientItemId": "line-001", "caseId": "CASE-1" },
{ "clientItemId": "line-002", "caseId": "CASE-1" }
]
}
Options:
| Policy | Behavior |
|---|---|
| Reject envelope | 422 because target appears twice |
| Process first, mark duplicate | First succeeds/fails, second gets DUPLICATE_TARGET |
| Allow duplicates | Dangerous unless operation is naturally idempotent |
Default recommendation:
Reject duplicate target IDs at envelope validation unless there is a clear domain reason to allow them.
Duplicate targets often indicate a client bug.
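A small sketch of the duplicate check at envelope validation. Item and DuplicateTargetCheck are illustrative names, not part of the earlier DTOs.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical minimal item shape for this check.
record Item(String clientItemId, String caseId) {}

final class DuplicateTargetCheck {
    // Returns the target IDs that appear more than once, in first-seen order.
    static Set<String> duplicateTargets(List<Item> items) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (Item item : items) {
            // Set.add returns false when the ID was already present.
            if (!seen.add(item.caseId())) {
                duplicates.add(item.caseId());
            }
        }
        return duplicates;
    }
}
```

Returning the offending IDs, rather than a boolean, lets the 422 response name exactly which targets were repeated.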
15. Ordering Semantics
A bulk endpoint must define response ordering.
Common choices:
- Response results preserve request item order.
- Response results are unordered and must be matched by clientItemId.
- Response groups by status.
Best default:
Preserve request order and require clientItemId for stable matching.
Why both?
- Order preservation helps humans and simple clients.
- clientItemId protects clients if order changes later, results are paginated, or async results are retrieved from storage.
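On the client side, matching by clientItemId can be as small as the sketch below. ItemResult and ResultIndex are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical client-side view of one item result.
record ItemResult(String clientItemId, String status) {}

final class ResultIndex {
    // Index results by clientItemId so matching does not depend on array position.
    // Collectors.toMap throws IllegalStateException on a duplicate key, which
    // surfaces a contract violation instead of silently dropping a result.
    static Map<String, ItemResult> byClientItemId(List<ItemResult> results) {
        return results.stream()
                .collect(Collectors.toMap(ItemResult::clientItemId, Function.identity()));
    }
}
```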
16. Retry Semantics
Bulk endpoints and retries are a dangerous combination.
Imagine this:
- Client sends 100 payment captures.
- Server processes 80.
- Server times out before returning response.
- Client retries the same request.
- The first 80 may be captured twice unless the API is idempotent.
For any bulk command with side effects, choose one:
- require operation-level idempotency key,
- require per-item idempotency key,
- make the operation naturally idempotent by target state and expected version,
- forbid automatic retry.
Do not rely on HTTP method alone. POST is not idempotent by default. PUT and DELETE are defined as idempotent at HTTP method level, but application side effects like audit, notification, billing, and workflow transition can still duplicate unless designed carefully.
17. Idempotency Model for Bulk
A robust bulk command often uses two layers:
- Operation idempotency: same request returns same operation outcome.
- Item idempotency: each item has stable identity and can be retried safely.
Example:
POST /cases:bulkApprove
Idempotency-Key: "bulk-approve-01HN4Q2R6D9"
{
"requestId": "ui-selection-20260705-01",
"reason": "Evidence threshold met",
"items": [
{
"clientItemId": "line-001",
"caseId": "CASE-1",
"expectedVersion": 7
}
]
}
The server can store:
- idempotency key,
- request fingerprint,
- operation ID,
- final response summary,
- item results,
- expiry time.
Part 034 covers this in depth.
For now, the bulk rule is simple:
If the client may retry a bulk command, the server must be able to recognize the retry.
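One way to recognize a retry is to store the key together with a request fingerprint. The sketch below is an in-memory illustration only: InMemoryIdempotencyStore is a hypothetical name, the fingerprint is assumed to be something like a hash of the canonical request body, and a real service would persist entries, expire them, and handle requests that are still in flight.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory store; not durable and not aware of in-flight requests.
final class InMemoryIdempotencyStore {
    record Entry(String fingerprint, String storedResponse) {}

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    // Returns the stored response for a recognized retry, or empty for a new key.
    // Throws when the key is reused with a different payload fingerprint.
    Optional<String> lookup(String key, String fingerprint) {
        Entry existing = entries.get(key);
        if (existing == null) {
            return Optional.empty();
        }
        if (!existing.fingerprint().equals(fingerprint)) {
            throw new IllegalStateException("Idempotency key reused with a different payload");
        }
        return Optional.of(existing.storedResponse());
    }

    // Records the final response once; a later call for the same key is ignored.
    void complete(String key, String fingerprint, String response) {
        entries.putIfAbsent(key, new Entry(fingerprint, response));
    }
}
```

The fingerprint check is what separates "same request sent again" from "same key, different payload". The second case should be rejected, not replayed.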
18. Transaction Strategy
18.1 One Transaction for Everything
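@Transactional
public BulkApproveResponse approve(BulkApproveRequest request) {
    List<BulkItemResult> results = new ArrayList<>();
    for (BulkApproveItem item : request.items()) {
        // An unchecked exception thrown here rolls back every item processed so far.
        results.add(approveOne(item));
    }
    return BulkApproveResponse.from(results);
}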
This is tempting, but it scales poorly.
Problems:
- long transaction,
- lock contention,
- large rollback cost,
- deadlocks,
- timeout ambiguity,
- memory pressure in persistence context,
- all-or-nothing even when not required.
Use only for small, truly atomic operations.
18.2 Transaction per Item
public BulkApproveResponse approve(BulkApproveRequest request) {
List<BulkItemResult> results = new ArrayList<>();
for (BulkApproveItem item : request.items()) {
results.add(transactionTemplate.execute(status -> approveOne(item)));
}
return BulkApproveResponse.from(results);
}
This is better for independent item outcomes.
But it needs:
- idempotency,
- per-item error mapping,
- careful retry policy,
- audit model,
- metrics.
18.3 Async Work Queue
For large workloads, persist the accepted request as a job, enqueue its items, and let background workers process them.
This decouples HTTP request duration from work duration.
It also lets the server apply controlled concurrency.
19. Avoid Internal Fan-Out Explosions
A bulk endpoint can accidentally multiply load.
Example:
- client sends 100 items,
- service calls 5 downstream services per item,
- each downstream call is attempted up to 3 times,
- one user request can trigger 100 × 5 × 3 = 1,500 downstream attempts.
Bulk endpoint implementation must have internal concurrency limits.
Bad:
request.items().parallelStream()
.map(this::processOne)
.toList();
Better:
private final ExecutorService bulkExecutor = Executors.newFixedThreadPool(8);
public List<BulkItemResult> process(BulkApproveRequest request) {
List<CompletableFuture<BulkItemResult>> futures = request.items().stream()
.map(item -> CompletableFuture.supplyAsync(() -> processOne(item), bulkExecutor))
.toList();
return futures.stream()
.map(CompletableFuture::join)
.toList();
}
Even better: use bounded queues, semaphores, and operation-level limits, not only a fixed thread pool.
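A minimal sketch of a per-request semaphore on top of a shared executor. BoundedFanOut and maxInFlight are illustrative names. The submitting thread blocks when the request has used all its permits, which is the backpressure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

final class BoundedFanOut {
    // Runs work for each item on the executor, with at most maxInFlight
    // items of this request executing at the same time.
    static <I, R> List<R> process(List<I> items, int maxInFlight,
                                  ExecutorService executor, Function<I, R> work) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<R>> futures = new ArrayList<>();
        for (I item : items) {
            permits.acquireUninterruptibly(); // blocks the submitter when the limit is reached
            futures.add(CompletableFuture.supplyAsync(() -> {
                try {
                    return work.apply(item);
                } finally {
                    permits.release(); // always free the slot, even on failure
                }
            }, executor));
        }
        // Futures are joined in submission order, so results keep request order.
        return futures.stream().map(CompletableFuture::join).toList();
    }
}
```

The executor caps concurrency for the whole service. The semaphore caps it for one request, so a single large batch cannot occupy every worker thread.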
20. Database Access Pattern
Bulk endpoints often expose inefficient data access.
Bad:
for (BulkApproveItem item : request.items()) {
CaseRecord c = caseRepository.findById(item.caseId()).orElseThrow();
c.approve();
caseRepository.save(c);
}
This creates N queries plus N updates.
Better pattern:
- Extract target IDs.
- Fetch all relevant rows in one bounded query.
- Build map by ID.
- Process items against map.
- Apply updates in controlled batches.
Example:
List<CaseId> caseIds = request.items().stream()
.map(BulkApproveItem::caseId)
.distinct()
.toList();
Map<CaseId, CaseRecord> casesById = caseRepository.findAllByIdsForUpdate(caseIds).stream()
.collect(Collectors.toMap(CaseRecord::id, Function.identity()));
List<BulkItemResult> results = new ArrayList<>();
for (BulkApproveItem item : request.items()) {
CaseRecord record = casesById.get(item.caseId());
if (record == null) {
results.add(BulkItemResult.notFound(item));
continue;
}
results.add(approveLoadedCase(item, record));
}
caseRepository.saveAll(changedCases(casesById.values()));
But be careful: findAllByIdsForUpdate may lock many rows. Use this only when the item count is small and bounded.
For larger workloads, an async worker with chunking is safer.
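Chunking itself is simple. A sketch, with Chunks as a hypothetical helper name:

```java
import java.util.ArrayList;
import java.util.List;

final class Chunks {
    // Splits items into consecutive chunks of at most chunkSize, preserving order.
    static <T> List<List<T>> of(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Copy the view so each chunk is independent of the source list.
            chunks.add(List.copyOf(items.subList(start, end)));
        }
        return chunks;
    }
}
```

Each chunk can then be loaded, locked, and saved in its own short transaction, which keeps the lock footprint proportional to the chunk size, not the job size.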
21. Java DTO Example
public record BulkApproveCasesRequest(
String requestId,
String reason,
List<BulkApproveCaseItem> items
) {}
public record BulkApproveCaseItem(
String clientItemId,
String caseId,
long expectedVersion
) {}
public record BulkApproveCasesResponse(
String operationId,
BulkOperationStatus status,
BulkSummary summary,
List<BulkApproveCaseResult> results
) {}
public enum BulkOperationStatus {
SUCCEEDED,
PARTIAL_SUCCESS,
FAILED
}
public record BulkSummary(
int requested,
int succeeded,
int failed,
int skipped
) {}
public record BulkApproveCaseResult(
String clientItemId,
String caseId,
BulkItemStatus status,
Long resourceVersion,
BulkItemError error
) {}
public enum BulkItemStatus {
SUCCEEDED,
FAILED,
SKIPPED,
UNKNOWN
}
public record BulkItemError(
String code,
String message,
boolean retryable
) {}
Keep DTOs separate from domain objects.
A bulk response is a communication artifact. It should not expose your internal aggregate structure.
22. Spring Controller Example
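import java.security.Principal;

import jakarta.validation.Valid; // javax.validation.Valid on Spring Boot 2.x
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestHeader;
import org.springframework.web.bind.annotation.RestController;

@RestController
final class CaseBulkController {
    private final BulkApproveCasesUseCase useCase;

    CaseBulkController(BulkApproveCasesUseCase useCase) {
        this.useCase = useCase;
    }

    // The full path is declared here: combining a class-level "/cases" mapping
    // with ":bulkApprove" would resolve to "/cases/:bulkApprove".
    @PostMapping("/cases:bulkApprove")
    ResponseEntity<BulkApproveCasesResponse> bulkApprove(
            @RequestHeader("Idempotency-Key") String idempotencyKey,
            @Valid @RequestBody BulkApproveCasesRequest request,
            Principal principal
    ) {
        BulkApproveCasesResponse response = useCase.approve(
                principal.getName(),
                idempotencyKey,
                request
        );
        return ResponseEntity.ok(response);
    }
}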
This controller does not contain bulk processing logic.
Its job:
- parse request,
- enforce required headers,
- pass caller identity,
- map response.
The use case owns policy.
23. Use Case Skeleton
final class BulkApproveCasesUseCase {
private final BulkEndpointPolicy policy;
private final AuthorizationService authorization;
private final CaseRepository caseRepository;
private final TransactionTemplate tx;
private final BulkAuditService audit;
BulkApproveCasesResponse approve(
String actorId,
String idempotencyKey,
BulkApproveCasesRequest request
) {
policy.validateEnvelope(request);
authorization.requireOperation(actorId, "CASE_BULK_APPROVE");
String operationId = audit.startBulkOperation(actorId, request.requestId(), request.reason());
List<BulkApproveCaseResult> results = new ArrayList<>();
for (BulkApproveCaseItem item : request.items()) {
BulkApproveCaseResult result = tx.execute(status -> processOne(actorId, operationId, item, request.reason()));
results.add(result);
}
BulkSummary summary = BulkSummaryCalculator.calculate(results);
BulkOperationStatus status = BulkOperationStatusCalculator.calculate(summary);
audit.finishBulkOperation(operationId, summary, status);
return new BulkApproveCasesResponse(operationId, status, summary, results);
}
private BulkApproveCaseResult processOne(
String actorId,
String operationId,
BulkApproveCaseItem item,
String reason
) {
if (!authorization.canApproveCase(actorId, item.caseId())) {
return BulkApproveCaseResultFactory.failed(item, "FORBIDDEN_FOR_CASE", false);
}
Optional<CaseRecord> found = caseRepository.findByIdForUpdate(item.caseId());
if (found.isEmpty()) {
return BulkApproveCaseResultFactory.failed(item, "CASE_NOT_FOUND", false);
}
CaseRecord record = found.get();
if (record.version() != item.expectedVersion()) {
return BulkApproveCaseResultFactory.failed(item, "CASE_VERSION_CONFLICT", false);
}
record.approve(reason, actorId, operationId);
caseRepository.save(record);
return BulkApproveCaseResultFactory.succeeded(item, record.version());
}
}
This example intentionally uses transaction per item.
For all-or-nothing semantics, you would use one transaction around the entire operation and return envelope-level failure if any item cannot be processed.
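The skeleton calls BulkSummaryCalculator and BulkOperationStatusCalculator without showing them. One possible shape is sketched below. It takes a list of item statuses instead of full result objects to stay self-contained. Counting UNKNOWN as failed, and treating "nothing succeeded" as FAILED, are assumptions of this sketch and not rules stated above.

```java
import java.util.List;

enum BulkItemStatus { SUCCEEDED, FAILED, SKIPPED, UNKNOWN }

enum BulkOperationStatus { SUCCEEDED, PARTIAL_SUCCESS, FAILED }

record BulkSummary(int requested, int succeeded, int failed, int skipped) {}

final class BulkSummaryCalculator {
    static BulkSummary calculate(List<BulkItemStatus> statuses) {
        int succeeded = 0;
        int failed = 0;
        int skipped = 0;
        for (BulkItemStatus status : statuses) {
            switch (status) {
                case SUCCEEDED -> succeeded++;
                case SKIPPED -> skipped++;
                // Assumption: UNKNOWN is counted as failed so it is never reported as success.
                case FAILED, UNKNOWN -> failed++;
            }
        }
        return new BulkSummary(statuses.size(), succeeded, failed, skipped);
    }
}

final class BulkOperationStatusCalculator {
    static BulkOperationStatus calculate(BulkSummary summary) {
        if (summary.succeeded() == summary.requested()) {
            return BulkOperationStatus.SUCCEEDED;
        }
        if (summary.succeeded() == 0) {
            return BulkOperationStatus.FAILED;
        }
        return BulkOperationStatus.PARTIAL_SUCCESS;
    }
}
```

Deriving the status from the summary, and the summary from the item results, keeps the three from drifting apart.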
24. Async Job DTOs
For large work:
public record CreateBulkApprovalJobRequest(
String requestId,
String reason,
List<BulkApproveCaseItem> items
) {}
public record BulkApprovalJobResponse(
String jobId,
BulkJobStatus status,
BulkSummary summary,
Map<String, String> links
) {}
public enum BulkJobStatus {
ACCEPTED,
RUNNING,
COMPLETED,
COMPLETED_WITH_FAILURES,
FAILED,
CANCELLED
}
Job resource endpoints:
POST /case-bulk-approval-jobs
GET /case-bulk-approval-jobs/{jobId}
GET /case-bulk-approval-jobs/{jobId}/results
POST /case-bulk-approval-jobs/{jobId}:cancel
Do not make the client keep a long HTTP connection open for work that should be a job.
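A job resource also needs a defined lifecycle. The sketch below adds a transition guard to the status enum. The allowed transitions are an assumption chosen for illustration: cancellation is permitted only before a job reaches a terminal state.

```java
enum BulkJobStatus {
    ACCEPTED, RUNNING, COMPLETED, COMPLETED_WITH_FAILURES, FAILED, CANCELLED;

    // Returns true when moving from this status to next is a legal lifecycle step.
    boolean canMoveTo(BulkJobStatus next) {
        return switch (this) {
            case ACCEPTED -> next == RUNNING || next == CANCELLED;
            case RUNNING -> next == COMPLETED || next == COMPLETED_WITH_FAILURES
                    || next == FAILED || next == CANCELLED;
            // Terminal states never change again.
            case COMPLETED, COMPLETED_WITH_FAILURES, FAILED, CANCELLED -> false;
        };
    }
}
```

With this guard, a cancel request against a finished job can be answered with a clear conflict and cannot rewrite history.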
25. Backpressure and Load Shedding
Bulk endpoints need explicit rejection rules.
Examples:
- reject when too many active bulk jobs exist,
- reject when caller exceeds concurrent bulk limit,
- reject when downstream dependency is unhealthy,
- reject when queue depth is too high,
- degrade to async job instead of synchronous processing,
- reduce per-job worker concurrency.
Example response:
HTTP/1.1 429 Too Many Requests
Retry-After: 30
Content-Type: application/problem+json
{
"type": "https://errors.example.com/bulk-concurrency-limit",
"title": "Bulk operation limit exceeded",
"status": 429,
"detail": "Caller has 3 active bulk approval jobs; maximum allowed is 3.",
"code": "BULK_CONCURRENCY_LIMIT"
}
Rejecting early is better than accepting work that will time out later.
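The per-caller concurrency rule can be sketched with an atomic counter per caller. CallerBulkLimiter is a hypothetical name, and the state is in-memory, so it covers a single service instance only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class CallerBulkLimiter {
    private final int maxActivePerCaller;
    private final Map<String, Integer> active = new ConcurrentHashMap<>();

    CallerBulkLimiter(int maxActivePerCaller) {
        this.maxActivePerCaller = maxActivePerCaller;
    }

    // Reserves a slot and returns true, or returns false when the caller is at its limit.
    boolean tryStart(String callerId) {
        boolean[] admitted = {false};
        // compute runs atomically per key, so check-and-increment cannot race.
        active.compute(callerId, (id, count) -> {
            int current = count == null ? 0 : count;
            if (current >= maxActivePerCaller) {
                return count;
            }
            admitted[0] = true;
            return current + 1;
        });
        return admitted[0];
    }

    // Must be called when the bulk operation finishes, including on failure.
    void finish(String callerId) {
        active.computeIfPresent(callerId, (id, count) -> count <= 1 ? null : count - 1);
    }
}
```

A false result from tryStart maps to the 429 response shown above. The call to finish belongs in a finally block so a failed operation does not hold its slot forever.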
26. Observability Model
Bulk endpoints need both operation-level and item-level visibility.
Operation-level metrics:
- bulk.requests.total,
- bulk.items.requested,
- bulk.items.succeeded,
- bulk.items.failed,
- bulk.duration,
- bulk.active.jobs,
- bulk.queue.depth,
- bulk.rejected.total.
Recommended dimensions:
- operation name,
- caller service,
- outcome,
- failure class,
- async/sync mode.
Avoid dimensions:
- case ID,
- item ID,
- user ID,
- raw error message.
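One way to keep the failure class dimension bounded is to map item error codes onto a small fixed label set before recording a metric. The mapping below is a hypothetical example built from the error codes used in this part; the label names are assumptions.

```java
import java.util.Map;

final class FailureClasses {
    // Hypothetical mapping from item error codes to a small, fixed label set.
    private static final Map<String, String> CLASS_BY_CODE = Map.of(
            "CASE_VERSION_CONFLICT", "conflict",
            "CASE_NOT_FOUND", "not_found",
            "FORBIDDEN_FOR_CASE", "forbidden",
            "CASE_LOCKED_BY_LEGAL_HOLD", "business_rule"
    );

    // Unknown codes collapse into "other" so new codes cannot grow the label set.
    static String labelFor(String errorCode) {
        return CLASS_BY_CODE.getOrDefault(errorCode, "other");
    }
}
```

The full error code still belongs in the structured log for the item. Only the bounded label goes on the metric.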
For traces:
- one span for bulk request,
- events or structured logs for item failures,
- do not create one span per item when item count can be large unless sampling is controlled.
For logs:
{
"event": "bulk_item_failed",
"operationId": "BULK-123",
"operation": "cases.bulkApprove",
"clientItemId": "line-002",
"targetType": "case",
"failureCode": "CASE_VERSION_CONFLICT",
"retryable": false
}
Do not log full payloads by default.
27. Security and Abuse Cases
Bulk endpoints increase blast radius.
Threats:
- a caller processes too many sensitive resources at once,
- bulk endpoint bypasses item-level authorization,
- huge payload causes memory pressure,
- repeated bulk retry amplifies downstream load,
- request body contains PII and gets logged,
- one user action creates massive audit or notification spam,
- async result endpoint leaks item details to unauthorized caller.
Controls:
- item-level authorization,
- max item count,
- caller quota,
- idempotency,
- audit parent operation,
- result access control,
- request body redaction,
- suspicious usage alerts.
In enforcement or regulatory case management, bulk endpoints should often require stronger permission than single-item endpoints because the operational impact is larger.
28. Contract Example in OpenAPI Style
paths:
  /cases:bulkApprove:
    post:
      operationId: bulkApproveCases
      summary: Approve multiple cases with per-item outcomes.
      parameters:
        - name: Idempotency-Key
          in: header
          required: true
          schema:
            type: string
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/BulkApproveCasesRequest'
      responses:
        '200':
          description: Bulk request processed with item-level outcomes.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/BulkApproveCasesResponse'
        '400':
          description: Invalid request envelope.
        '413':
          description: Request payload or item count too large.
        '429':
          description: Caller exceeded bulk operation limit.
        '503':
          description: Service overloaded before processing.
A generated client can call this endpoint, but generated code will not understand the operational semantics unless you document them and wrap them behind a domain client.
29. Failure Modes
| Failure Mode | Symptom | Prevention |
|---|---|---|
| Unbounded item count | Memory/CPU spike | Max item count, request size limit |
| Duplicate side effects | Retried bulk creates duplicate records | Idempotency key, per-item dedup |
| Partial success hidden | Client thinks all failed or all succeeded | Explicit item result model |
| Overlong transaction | Lock contention, deadlock | Per-item transaction or async worker |
| Internal fan-out storm | Downstream overload | Concurrency limits, retry budget |
| Authorization bypass | User modifies unauthorized resources | Per-item authorization |
| Poor error taxonomy | Client retries non-retryable errors | Error code + retryable flag |
| High-cardinality telemetry | Metrics backend cost spike | Controlled labels/dimensions |
| Response too large | Serialization timeout | Async result pagination/file |
| Job result leak | Unauthorized user reads result | Result-level access control |
30. Testing Strategy
Test more than the happy path.
Required tests:
- Empty item list rejected.
- Max item count enforced.
- Duplicate clientItemId rejected.
- Duplicate target ID policy enforced.
- All succeed.
- One validation failure returns partial success.
- One authorization failure returns item-level forbidden.
- One stale version returns conflict item result.
- Envelope auth failure returns 403.
- Retry with same idempotency key returns same result.
- Retry with same key but different payload is rejected.
- Large request switches to async or is rejected.
- Internal downstream timeout produces retryable/unknown item result as designed.
- Metrics summary matches item results.
- Audit parent operation exists even for partial success.
Example assertion:
assertThat(response.status()).isEqualTo(BulkOperationStatus.PARTIAL_SUCCESS);
assertThat(response.summary().requested()).isEqualTo(3);
assertThat(response.summary().succeeded()).isEqualTo(2);
assertThat(response.summary().failed()).isEqualTo(1);
assertThat(response.results())
.extracting(BulkApproveCaseResult::clientItemId)
.containsExactly("line-001", "line-002", "line-003");
31. Practical Design Checklist
Before approving a bulk endpoint design, answer these:
- What is the maximum item count?
- What is the maximum request size?
- Is the operation all-or-nothing or per-item?
- Is the operation synchronous or async?
- What is the expected p95 duration?
- What happens when one item fails?
- What happens when the response times out after partial processing?
- Is Idempotency-Key required?
- Is there per-item client identity?
- Are duplicate target IDs allowed?
- Is response order guaranteed?
- Is authorization per operation, per item, or both?
- Are results paginated for async jobs?
- What metrics exist?
- What logs exist?
- What dashboard shows partial failures?
- What is the operator runbook?
- How are old job results expired?
If the design cannot answer these, it is not production-ready.
32. Reference Implementation Shape
Recommended package shape:
case-service/
  src/main/java/com/example/caseapp/
    api/http/bulk/
      CaseBulkController.java
      BulkApproveCasesRequest.java
      BulkApproveCasesResponse.java
      BulkApproveCaseItem.java
      BulkApproveCaseResult.java
      BulkItemError.java
    application/bulk/
      BulkApproveCasesUseCase.java
      BulkEndpointPolicy.java
      BulkSummaryCalculator.java
      BulkOperationStatusCalculator.java
    domain/casework/
      CaseRecord.java
      CaseState.java
      CaseTransitionPolicy.java
    infrastructure/persistence/
      CaseRepository.java
    infrastructure/audit/
      BulkAuditService.java
    infrastructure/idempotency/
      IdempotencyService.java
Keep bulk-specific DTOs at the API/application boundary.
Do not infect core domain aggregates with transport-specific fields like clientItemId, retryable, httpStatus, or operationId unless they are genuine domain concepts.
33. Mental Model
A good bulk endpoint is not this:
for each item: call the single endpoint
It is this:
validate operation envelope
claim idempotency key
apply authorization policy
establish operation identity
process bounded items under controlled concurrency
record item-level outcome
publish/audit durable result
return truthful summary
The difference is production readiness.
34. Key Takeaways
- Bulk endpoints amplify both productivity and failure.
- Never accept unbounded item counts.
- Separate envelope outcome from item outcome.
- Prefer per-item result unless the domain requires all-or-nothing.
- Use async job resources for large or long-running work.
- Require idempotency for retryable side-effecting bulk commands.
- Check authorization per item when resources have different access constraints.
- Control internal fan-out; never let one request become an unbounded downstream storm.
- Design observability at operation level and item failure level.
- Treat bulk API design as an operational contract, not just a JSON array.
References
- RFC 9110 — HTTP Semantics: https://www.rfc-editor.org/rfc/rfc9110.html
- RFC 9457 — Problem Details for HTTP APIs: https://www.rfc-editor.org/rfc/rfc9457.html
- Google AIP general index for standard, custom, batch, and long-running methods: https://google.aip.dev/general
- Google AIP-231 — Batch Get: https://google.aip.dev/231
- Google AIP-233 — Batch Create: https://google.aip.dev/233
- Google AIP-151 — Long-running operations: https://google.aip.dev/151