Error Events, Failures, and Incidents

Learn Java BPMN with Camunda 8 Zeebe - Part 007

Error events, job failures, retries, incidents, and production-grade exception modeling in Camunda 8 Zeebe.

2026-06-28 · 18 min read · 3492 words

Part 007 — Error Events, Failures, and Incidents

Goal: build a production mental model for what should be retried, what should be modeled, what should become an incident, and what should be escalated to a human/operator.

This part is intentionally not a generic BPMN error primer. We assume you already understand the basic service task/job worker model from the previous part. Here we focus on the failure semantics that distinguish senior Camunda 8 engineers from people who merely know how to attach a boundary event.

In Camunda 8, failure handling is not just a modeling concern. It is a distributed-systems contract among:

the BPMN model,
the Zeebe broker,
the job worker,
external systems,
Operate/operator intervention,
domain stakeholders,
audit/compliance expectations.

The critical mistake is treating all failures as one thing.

They are not.

A payment rejection, missing evidence, malformed variable, database outage, unexpected NullPointerException, unhandled BPMN error, failed FEEL expression, expired token, and unsupported regulatory decision path are all “errors” in casual language. In Zeebe, they require different reactions.

1. Kaufman Skill Deconstruction

Following Josh Kaufman's method, we break this skill into sub-skills that can be practiced independently.

1.1 Target Performance

After this part, you should be able to look at any failed process instance and answer quickly:

Is this a business reaction that belongs in BPMN?
Is this a technical reaction that belongs in retry/backoff/incident handling?
Is this a model defect that must not be patched operationally forever?
Is this a data contract defect between worker and process?
Is this a side-effect consistency problem requiring idempotency or compensation?
Is this an operator intervention point that must be documented and audited?

1.2 Sub-Skills

Sub-skill	You know it when you can...
Error taxonomy	Classify a failure by the reaction it needs, without endlessly debating whether it is “business” or “technical”.
BPMN error modeling	Use error boundary events and error event subprocesses only when a modeled process reaction is required.
Retry design	Choose retry count/backoff based on failure type and external dependency behavior.
Incident design	Design incidents as actionable operational stops, not as hidden business workflow states.
Worker exception mapping	Map Java exceptions into complete/fail/throw-error decisions consistently.
Operate runbook	Provide enough context for operators to resolve incidents safely.
Audit defensibility	Explain why a process took a failure path, retried, stopped, or escalated.

1.3 Practice Loop

For each service task in a real process model, ask:

What can go wrong?
Can it safely retry?
Would retrying duplicate a side effect?
Does the business need to see a different path?
If it stops, who can fix it?
What information do they need?
How do we prove what happened later?

This is the actual skill. The notation is secondary.

2. The Core Mental Model: Error Handling Is Reaction Design

A weak model asks:

“Is this a business error or technical error?”

A stronger model asks:

“What reaction should the system take, and at what layer should that reaction live?”

Camunda's own guidance is aligned with this distinction: what matters is not the source of the problem, but whether the reaction is modeled in the business process or handled generically through retries/incidents.

2.1 Reaction Categories

Reaction	Layer	Typical use
Complete job	Worker → Zeebe	Work succeeded; process may continue.
Fail job with retries	Worker/engine	Temporary failure; retry later.
Fail job with zero retries	Worker/engine/operator	Processing cannot continue automatically; create incident.
Throw BPMN error	BPMN model	Known alternative process path.
Escalate to human task	BPMN model	Human decision required.
Compensation	BPMN/domain	Previous successful side effect must be counteracted.
Cancel/terminate	BPMN/model	Process should be stopped by design.
External alert	Observability/platform	Ops must investigate system health.

The advanced skill is choosing correctly under pressure.

2.2 A Practical Decision Tree

In the decision order, note the position of the BPMN error: it is not the default reaction. It is used only when the process has a meaningful modeled reaction.
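One way to read that decision order is as a small function. The sketch below is an assumption reconstructed from the reaction table in 2.1, not Camunda API: the `Attempt` record, its field names, and the order of the checks are hypothetical and only illustrate that the BPMN error branch is conditional while the incident is the fallback.

```java
/** Reactions a worker can trigger directly, taken from the table in section 2.1. */
enum Reaction { COMPLETE_JOB, THROW_BPMN_ERROR, FAIL_WITH_RETRY, FAIL_TO_INCIDENT }

/** Hypothetical summary of what a worker knows after one attempt. */
record Attempt(
        boolean workSucceeded,
        boolean knownOutcomeWithModeledReaction, // the model has a catch event for this outcome
        boolean transientFailure,                // the dependency may recover on its own
        boolean safeToRepeat,                    // repeating cannot duplicate a side effect
        int retriesAfterThisFailure) {}          // already decremented, never negative

final class ReactionDecision {
    private ReactionDecision() {}

    static Reaction decide(Attempt a) {
        if (a.workSucceeded()) {
            return Reaction.COMPLETE_JOB;
        }
        if (a.knownOutcomeWithModeledReaction()) {
            return Reaction.THROW_BPMN_ERROR;
        }
        if (a.transientFailure() && a.safeToRepeat() && a.retriesAfterThisFailure() > 0) {
            return Reaction.FAIL_WITH_RETRY;
        }
        // Fallback: stop and ask for intervention instead of inventing a business path.
        return Reaction.FAIL_TO_INCIDENT;
    }
}
```

Note that a transient failure that is not safe to repeat falls through to the incident branch in this sketch. A team may prefer a different fallback, but the choice should be deliberate.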

3. Job Outcomes: Complete, Fail, Throw Error

When a worker handles a job, there are three conceptual outcomes.

3.1 Complete Job

Use CompleteJob when:

the external work succeeded,
the side effect is complete or intentionally accepted,
the result variables are safe to merge into the process,
continuing the process is correct.

Example:

client.newCompleteCommand(job.getKey())
    .variables(Map.of(
        "payment", Map.of(
            "status", "CAPTURED",
            "providerReference", providerReference
        )
    ))
    .send()
    .join();

Risk: completing too early after initiating asynchronous work. If the process continues before the external side effect is truly durable, you have modeled a false state.

3.2 Fail Job

Use FailJob when the worker cannot complete the work and the process should not follow a modeled BPMN alternative path yet.

Examples:

database unavailable,
remote service timeout,
rate limit,
temporary credential refresh failure,
optimistic lock conflict in downstream service,
malformed payload that can be corrected operationally.

Typical shape:

int remainingRetries = Math.max(job.getRetries() - 1, 0);

client.newFailCommand(job.getKey())
    .retries(remainingRetries)
    .errorMessage("Payment provider timeout while capturing authorization")
    .retryBackoff(Duration.ofMinutes(2))
    .send()
    .join();

If retries remain, Zeebe makes the job available again once the backoff has elapsed. If retries reach zero, Zeebe creates an incident.

3.3 Throw BPMN Error

Use ThrowError when the worker is not merely saying “I failed.” It is saying:

“The work reached a known domain outcome, and the BPMN model must take the corresponding path.”

Examples:

applicant not eligible,
payment rejected due to insufficient funds,
evidence package incomplete,
regulatory case requires supervisor review,
account closed,
duplicate claim detected,
external decision returned “manual review required”.

Typical shape:

client.newThrowErrorCommand(job.getKey())
    .errorCode("PAYMENT_REJECTED")
    .errorMessage("Payment provider rejected the capture request")
    .variables(Map.of(
        "payment", Map.of(
            "status", "REJECTED",
            "reason", "INSUFFICIENT_FUNDS"
        )
    ))
    .send()
    .join();

Do not throw a BPMN error for a random Java exception unless the process genuinely has a modeled reaction for it.

4. BPMN Error Event Semantics

A BPMN error is a modeled deviation from the default path.

It has two sides:

Throw side — an error is raised.
Catch side — a boundary event or event subprocess catches it.

4.1 Error Definition

A BPMN error definition has an errorCode. The code is the matching key used by catch events.

Example:

<bpmn:error id="Error_PaymentRejected" errorCode="PAYMENT_REJECTED" />

Use stable, domain-oriented codes. Avoid Java class names or infrastructure terms.

Bad:

NullPointerException
HttpClientException
ProviderReturned400

Better:

PAYMENT_REJECTED
CUSTOMER_NOT_ELIGIBLE
EVIDENCE_INCOMPLETE
CASE_REQUIRES_MANUAL_REVIEW

4.2 Boundary Error Event

A boundary error event is appropriate when a specific activity may result in an alternate path.

In BPMN terms:

the token enters the service task,
a job is created,
the worker throws a BPMN error,
the service task is terminated,
the boundary error path is activated.

4.3 Error Event Subprocess

Use an error event subprocess when many tasks inside a scope share one error reaction.

This avoids attaching the same boundary event to every task.

4.4 Propagation

Error propagation starts where the error is thrown:

Check catchers in the current scope.
If none match, propagate to parent scope.
If called by a parent process via call activity, the parent may catch it.
If no catcher exists, an incident is created for the unhandled error.

4.5 Catch-All Events

A catch-all error event can catch any error code when no specific code matches.

Use it carefully.

It is useful for:

common cleanup,
safe fallback,
conversion into manual review,
scoped repair path.

It is dangerous when it hides missing explicit handling.

Rule:

A catch-all event must produce a visible, reviewable process state. It must not silently swallow important domain outcomes.
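The propagation steps in 4.4 and the catch-all rule in 4.5 can be simulated with a toy scope tree. This is a minimal sketch, not engine code: `Scope`, `ErrorPropagation`, and the target path names are hypothetical. It assumes that within one scope a specific code wins over a catch-all, and that a catch-all in a nearer scope wins over a specific catcher further out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Simplified scope: catch events keyed by error code, plus an optional catch-all. */
final class Scope {
    final String name;
    final Scope parent;                                   // null for the root process
    final Map<String, String> catchers = new HashMap<>(); // errorCode -> target path
    String catchAllTarget;                                // null when no catch-all exists

    Scope(String name, Scope parent) {
        this.name = name;
        this.parent = parent;
    }

    Scope catching(String errorCode, String target) {
        catchers.put(errorCode, target);
        return this;
    }

    Scope catchingAll(String target) {
        this.catchAllTarget = target;
        return this;
    }
}

final class ErrorPropagation {
    private ErrorPropagation() {}

    /** Returns the path taken, or empty when nothing catches the error (an incident). */
    static Optional<String> resolve(Scope thrownIn, String errorCode) {
        for (Scope scope = thrownIn; scope != null; scope = scope.parent) {
            String specific = scope.catchers.get(errorCode);
            if (specific != null) {
                return Optional.of(specific);
            }
            if (scope.catchAllTarget != null) {
                return Optional.of(scope.catchAllTarget);
            }
        }
        return Optional.empty();
    }
}
```

Walking a model like this by hand during review makes it visible when a catch-all in an inner scope would intercept a code that an outer scope was meant to handle explicitly.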

5. Business Reaction vs Technical Reaction

The common debate “business vs technical error” is often misleading. A technical-looking event can have a business reaction, and a business-looking event can be better handled technically.

5.1 Examples

Situation	Better reaction	Why
Payment provider timeout	Fail job with retry/backoff	No business decision yet; provider may recover.
Payment provider says card declined	Throw BPMN error	Known domain outcome; customer needs alternate path.
Scoring service unavailable but policy says approve low-risk applications manually	Throw BPMN error or route to user task	The business defined a reaction.
FEEL condition returns non-boolean	Incident	Model/data defect; no domain path.
Evidence missing from applicant	BPMN error or gateway path	Known case state requiring repair.
Worker cannot parse required variable	Incident	Contract defect requiring fix.
Duplicate external callback	Ignore/idempotent complete, or correlate by message ID	Not a BPMN failure if duplicate is expected.

5.2 Better Terminology

Use these terms in design reviews:

Modeled reaction: visible in BPMN.
Generic technical reaction: retry/backoff/incident.
Operational reaction: operator fixes data/config/dependency.
Compensating reaction: undo or repair previous committed side effect.
Engineering reaction: code/model/schema must be changed.

This vocabulary prevents endless debates and anchors the design in responsibility boundaries.

6. Incident Semantics

An incident is a stop in process execution requiring intervention or correction.

Incidents are not bugs by definition, but they are always a signal that automatic execution cannot proceed safely.

6.1 Common Incident Causes

Incident trigger	Likely root cause
Job failed with no retries left	External system failure, worker bug, bad variables, downstream outage.
Condition does not return boolean	FEEL/model defect or invalid variable type.
Timer expression invalid	Model/data defect.
Decision cannot be evaluated	DMN input missing, invalid expression, version issue.
BPMN error thrown but not caught	Model incomplete or worker threw wrong error code.

6.2 Incident Lifecycle

The runbook matters as much as the model. If operators do not know what to do, you have created a stuck process graveyard.

6.3 Incident Is Not a Business State

Anti-pattern:

“If a case needs manual review, let the worker fail with zero retries so an operator sees it in Operate.”

This is wrong.

Manual review is a business state. Model it as a user task, escalation, or review subprocess.

Use incident when something is abnormal for the runtime contract:

data violates expected shape,
dependency cannot be reached after retries,
model expression is invalid,
worker cannot safely determine next state,
process cannot continue without correction.

6.4 Incident Context Requirements

When raising an incident intentionally by failing a job with zero retries, include diagnostic context.

Minimum useful context:

{
  "incidentContext": {
    "worker": "payment-capture-worker",
    "operation": "capturePayment",
    "externalSystem": "payment-provider-x",
    "externalReference": "auth-98231",
    "failureCategory": "DEPENDENCY_UNAVAILABLE",
    "safeToRetry": true,
    "lastAttemptAt": "2026-06-28T08:15:30Z",
    "operatorHint": "Verify provider outage status, then increase retries to 3 and resolve incident."
  }
}

Do not put secrets, tokens, full request payloads, or regulated personal data in process variables just for debugging. Prefer external logs/traces with correlation IDs.
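The context above can be produced from a typed record so that every worker emits the same shape. This is a sketch under assumptions: `IncidentContext` and `toVariables` are hypothetical names, and the `correlationId` field is an addition that follows the advice to point at external logs and traces instead of embedding payloads.

```java
import java.time.Instant;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical typed form of the incidentContext variable shown above. */
record IncidentContext(
        String worker,
        String operation,
        String externalSystem,
        String externalReference,
        String failureCategory,
        boolean safeToRetry,
        Instant lastAttemptAt,
        String operatorHint,
        String correlationId) {   // assumption: ID that links to external logs/traces

    /** Builds the variables map to pass along with a fail-job command. */
    Map<String, Object> toVariables() {
        Map<String, Object> context = new LinkedHashMap<>();
        context.put("worker", worker);
        context.put("operation", operation);
        context.put("externalSystem", externalSystem);
        context.put("externalReference", externalReference);
        context.put("failureCategory", failureCategory);
        context.put("safeToRetry", safeToRetry);
        context.put("lastAttemptAt", lastAttemptAt.toString()); // ISO-8601 instant
        context.put("operatorHint", operatorHint);
        context.put("correlationId", correlationId);
        return Map.of("incidentContext", Collections.unmodifiableMap(context));
    }
}
```

Because the record has a fixed set of fields, there is no place to drop a raw request body or token by accident.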

7. Retry Design

Retry design is not “set retries to 3 everywhere.”

Retries encode assumptions about:

side-effect safety,
dependency recovery time,
user-visible latency,
duplicate execution risk,
ordering assumptions,
cost of repeated calls,
incident noise tolerance.

7.1 Retry Categories

Failure type	Retry?	Backoff	Notes
Network timeout before known side effect	Yes, if idempotent	Exponential or bounded	Use idempotency key.
HTTP 429/rate limit	Yes	Respect provider reset/backoff	Avoid synchronized retry storms.
HTTP 503	Yes	Increasing	Treat as dependency recovery.
HTTP 400 validation	Usually no	None	Map to BPMN error or incident depending on source.
Authorization failure	Sometimes	Short	If token refresh possible; otherwise incident.
Deserialization error	No	None	Contract defect; incident.
Duplicate request	No failure	N/A	Handle idempotently.
Business rejection	No technical retry	N/A	Throw BPMN error or complete with status.

7.2 Retry Budget

A retry budget defines how much automatic recovery you allow before human/operator intervention.

Example:

payment-capture:
  maxAttempts: 5
  backoff: 30s, 2m, 5m, 15m
  incidentAfter: 5th failure
  safeRetryRequirement: idempotency key = processInstanceKey + activityId + operationName

Retry budget should be explicit for high-risk steps.
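A budget like the one above can be kept as data that a worker consults after each failure. The sketch below is illustrative only: `RetryBudget` and its methods are hypothetical names, and it assumes one backoff entry per retry (so four entries for five attempts).

```java
import java.time.Duration;
import java.util.List;
import java.util.Optional;

/** Hypothetical retry budget: total attempts plus the wait before each retry. */
record RetryBudget(int maxAttempts, List<Duration> backoffs) {

    RetryBudget {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        if (backoffs.size() != maxAttempts - 1) {
            throw new IllegalArgumentException("need exactly one backoff per retry");
        }
        backoffs = List.copyOf(backoffs);
    }

    /** failureNumber is 1-based. Empty means the budget is spent: raise an incident. */
    Optional<Duration> backoffAfterFailure(int failureNumber) {
        if (failureNumber < 1) {
            throw new IllegalArgumentException("failureNumber must be at least 1");
        }
        return failureNumber < maxAttempts
                ? Optional.of(backoffs.get(failureNumber - 1))
                : Optional.empty();
    }

    /** Value to send as the job's remaining retries after the given failure. */
    int remainingRetriesAfterFailure(int failureNumber) {
        return Math.max(maxAttempts - failureNumber, 0);
    }
}
```

With the payment-capture numbers, the first failure waits 30 seconds, the fourth waits 15 minutes, and the fifth returns empty, which matches `incidentAfter: 5th failure`.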

7.3 Backoff Strategy

Immediate retries are often harmful:

they amplify outages,
they hit rate limits,
they consume worker capacity,
they produce incident storms,
they duplicate side effects when idempotency is weak.

Use backoff when dependency recovery is time-based.

Duration backoffFor(int remainingRetries) {
    return switch (remainingRetries) {
        case 4 -> Duration.ofSeconds(30);
        case 3 -> Duration.ofMinutes(2);
        case 2 -> Duration.ofMinutes(5);
        case 1 -> Duration.ofMinutes(15);
        default -> Duration.ZERO;
    };
}
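The fixed schedule above is easy to audit, but many workers failing at the same moment will also retry at the same moment. A common mitigation is capped exponential backoff with jitter. The following is a sketch of that general technique, not something the Zeebe client provides: `JitteredBackoff`, its parameters, and the choice to randomize only the upper half of the delay are all assumptions.

```java
import java.time.Duration;
import java.util.Random;

final class JitteredBackoff {
    private JitteredBackoff() {}

    /**
     * attempt is 1-based. The ceiling doubles per attempt, starting at base and
     * never exceeding cap. The result lies in [ceiling / 2, ceiling).
     */
    static Duration next(int attempt, Duration base, Duration cap, Random random) {
        if (attempt < 1) {
            throw new IllegalArgumentException("attempt must be at least 1");
        }
        long capMs = cap.toMillis();
        long ceiling = Math.min(capMs, base.toMillis());
        for (int i = 1; i < attempt && ceiling < capMs; i++) {
            ceiling = Math.min(capMs, ceiling * 2);
        }
        long half = ceiling / 2;
        // Keep a guaranteed minimum wait and spread the rest randomly.
        long jitter = (long) (random.nextDouble() * (ceiling - half));
        return Duration.ofMillis(half + jitter);
    }
}
```

The returned value can be passed to `retryBackoff(...)` in place of a fixed duration. Injecting the `Random` keeps the function testable with a seeded generator.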

7.4 Retry and Job Timeout Are Different

A job activation timeout is not a retry.

If a worker activates a job but neither completes nor fails it before the activation timeout, Zeebe makes the job available again, and another worker may pick it up. A timeout does not decrement the remaining retry count.

This creates a key distributed-systems hazard:

Two workers may perform the same external side effect if the first worker is slow, partitioned, or stuck.

Therefore every side-effecting worker must be idempotent.
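A minimal sketch of what that idempotency can look like, assuming the key is built from the process instance key, the element ID, and an operation name. `IdempotencyKey` and `IdempotentExecutor` are hypothetical helpers, and the in-memory map only stands in for a durable store or a provider-side idempotency feature.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Key that stays the same across retries and re-activations of one task. */
record IdempotencyKey(String value) {
    static IdempotencyKey of(long processInstanceKey, String elementId, String operation) {
        return new IdempotencyKey(processInstanceKey + ":" + elementId + ":" + operation);
    }
}

/** In-memory stand-in for a durable deduplication store. */
final class IdempotentExecutor<R> {
    private final Map<IdempotencyKey, R> results = new ConcurrentHashMap<>();

    /** Runs the side effect at most once per key and returns the recorded result. */
    R executeOnce(IdempotencyKey key, Supplier<R> sideEffect) {
        return results.computeIfAbsent(key, ignored -> sideEffect.get());
    }
}
```

If a slow worker and its replacement both reach `executeOnce` with the same key, only one call runs the side effect and both observe the same result. Tasks inside loops or multi-instance activities would need an extra discriminator in the key, which this sketch leaves out.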

8. Java Worker Exception Mapping

A production worker should not let exception mapping be accidental.

Do not scatter try/catch decisions randomly. Create a policy.

8.1 Exception Taxonomy

sealed interface WorkerOutcome permits
    WorkerOutcome.Success,
    WorkerOutcome.RetryableFailure,
    WorkerOutcome.BusinessError,
    WorkerOutcome.NonRetryableIncident {

    record Success(Map<String, Object> variables) implements WorkerOutcome {}

    record RetryableFailure(
        String message,
        int remainingRetries,
        Duration backoff,
        Map<String, Object> diagnosticVariables
    ) implements WorkerOutcome {}

    record BusinessError(
        String errorCode,
        String message,
        Map<String, Object> variables
    ) implements WorkerOutcome {}

    record NonRetryableIncident(
        String message,
        Map<String, Object> diagnosticVariables
    ) implements WorkerOutcome {}
}

This is not mandatory API style. It is an architectural pattern: separate domain classification from Zeebe command emission.

8.2 Mapping Example

WorkerOutcome classify(Throwable error, ActivatedJob job) {
    return switch (error) {
        case PaymentRejectedException e ->
            new WorkerOutcome.BusinessError(
                "PAYMENT_REJECTED",
                e.getMessage(),
                Map.of("payment", Map.of("status", "REJECTED", "reason", e.reasonCode()))
            );

        case RateLimitedException e ->
            new WorkerOutcome.RetryableFailure(
                "Payment provider rate limited request",
                Math.max(job.getRetries() - 1, 0),
                e.retryAfter().orElse(Duration.ofMinutes(2)),
                Map.of("lastFailure", Map.of("category", "RATE_LIMITED"))
            );

        case MalformedProcessVariableException e ->
            new WorkerOutcome.NonRetryableIncident(
                "Invalid process variable contract: " + e.getMessage(),
                Map.of("lastFailure", Map.of("category", "INVALID_VARIABLE_CONTRACT"))
            );

        default ->
            new WorkerOutcome.RetryableFailure(
                "Unexpected worker failure: " + error.getClass().getSimpleName(),
                Math.max(job.getRetries() - 1, 0),
                Duration.ofSeconds(30),
                Map.of("lastFailure", Map.of("category", "UNEXPECTED"))
            );
    };
}

8.3 Emitting the Outcome

void emitOutcome(CamundaClient client, ActivatedJob job, WorkerOutcome outcome) {
    switch (outcome) {
        case WorkerOutcome.Success success ->
            client.newCompleteCommand(job.getKey())
                .variables(success.variables())
                .send()
                .join();

        case WorkerOutcome.BusinessError businessError ->
            client.newThrowErrorCommand(job.getKey())
                .errorCode(businessError.errorCode())
                .errorMessage(businessError.message())
                .variables(businessError.variables())
                .send()
                .join();

        case WorkerOutcome.RetryableFailure failure ->
            client.newFailCommand(job.getKey())
                .retries(failure.remainingRetries())
                .errorMessage(failure.message())
                .retryBackoff(failure.backoff())
                .variables(failure.diagnosticVariables())
                .send()
                .join();

        case WorkerOutcome.NonRetryableIncident incident ->
            client.newFailCommand(job.getKey())
                .retries(0)
                .errorMessage(incident.message())
                .variables(incident.diagnosticVariables())
                .send()
                .join();
    }
}

The important idea is not the exact class structure. The important idea is that failure classification is centralized, reviewable, and testable.

9. Modeling Patterns

9.1 Known Business Rejection

Use when an external system or domain rule returns a definitive outcome.

Worker behavior:

External decision = NOT_ELIGIBLE
=> throw BPMN error NOT_ELIGIBLE
=> boundary event routes to rejection notification

Why not fail job?

Because retrying will not change a deterministic decision unless inputs change. The process has a valid path.

9.2 Dependency Outage

Worker behavior:

HTTP 503 / timeout
=> fail job retries-- with backoff
=> incident only after budget exhausted

Why not BPMN error?

Because there is no business outcome yet. The infrastructure failed to produce one.

Exception: if policy says “after bureau unavailable for 2 hours, route to manual risk review,” then model that as an explicit business reaction. Do not pretend it is purely technical.
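That exception can be made explicit in the worker's policy. The sketch below is hypothetical: `OutagePolicy` and `OutageReaction` are invented names, and it assumes the worker can read the time of the first failure, for example from a process variable it wrote earlier.

```java
import java.time.Duration;
import java.time.Instant;

sealed interface OutageReaction
        permits OutageReaction.RetryLater, OutageReaction.RouteToManualReview {

    /** Technical reaction: fail the job and try again after the backoff. */
    record RetryLater(Duration backoff) implements OutageReaction {}

    /** Business reaction: throw a BPMN error that the model catches. */
    record RouteToManualReview(String errorCode) implements OutageReaction {}
}

/** Hypothetical policy: keep retrying until the outage exceeds a business limit. */
record OutagePolicy(Duration maxOutage, Duration backoff, String errorCode) {

    OutageReaction react(Instant firstFailureAt, Instant now) {
        Duration outage = Duration.between(firstFailureAt, now);
        return outage.compareTo(maxOutage) >= 0
                ? new OutageReaction.RouteToManualReview(errorCode)
                : new OutageReaction.RetryLater(backoff);
    }
}
```

With a two-hour limit, a failure ten minutes into the outage yields `RetryLater`, while a failure at the two-hour mark yields `RouteToManualReview`, which the worker would translate into a throw-error command.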

9.3 Repairable Data Defect

Use when:

process variable is missing,
value type is wrong,
document reference is invalid,
configuration is incomplete.

Do not create a BPMN path for every malformed payload unless the business process actually includes payload repair.

9.4 Manual Review Is Not Incident

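Wrong: the worker fails the job with zero retries so that an operator notices the case in Operate and makes the review decision there.

Right: the worker throws a BPMN error (or completes with a status), and the model routes the case to a user task or review subprocess.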

A human business decision is not a runtime failure.

9.5 Scoped Error Event Subprocess

Use when any step in a subprocess may invalidate the whole scope.

This is useful for regulatory lifecycle modeling where an external event invalidates ongoing work.

10. Error Code Taxonomy

Error codes are API contracts between workers and BPMN.

Treat them as versioned domain terms.

10.1 Naming Guidelines

Good error codes:

PAYMENT_REJECTED
CUSTOMER_NOT_ELIGIBLE
CASE_WITHDRAWN
EVIDENCE_INCOMPLETE
SUPERVISOR_APPROVAL_REQUIRED
DUPLICATE_APPLICATION

Bad error codes:

ERROR_1
BAD_REQUEST
HTTP_400
JAVA_EXCEPTION
SERVICE_FAILED
UNKNOWN_ERROR

10.2 Error Code Categories

Category	Example	Typical catcher
Domain rejection	`NOT_ELIGIBLE`	Boundary event to rejection path.
Repair required	`EVIDENCE_INCOMPLETE`	User task or repair subprocess.
Alternative fulfillment	`PAYMENT_REJECTED`	Alternate payment path.
Governance path	`SUPERVISOR_APPROVAL_REQUIRED`	Approval subprocess.
Lifecycle transition	`CASE_WITHDRAWN`	Close/terminate branch.
Cross-process signal	`RELATED_CASE_BLOCKED`	Escalation or waiting path.

10.3 Versioning Error Codes

Avoid changing the meaning of an existing error code.

If behavior changes materially:

introduce a new error code,
keep old catcher for existing model versions,
document worker compatibility,
test both old and new process versions.

Example:

EVIDENCE_INCOMPLETE       // old: route to submitter correction
EVIDENCE_REQUIRES_REVIEW  // new: route to internal analyst first

11. Variable Mapping for Error Paths

When a worker throws a BPMN error with variables, those variables should be intentionally mapped and scoped.

11.1 What to Include

Good error payload:

{
  "eligibility": {
    "status": "NOT_ELIGIBLE",
    "reasonCode": "MINIMUM_AGE_NOT_MET",
    "evaluatedAt": "2026-06-28T09:00:00Z"
  }
}

Bad error payload:

{
  "fullHttpResponse": "...",
  "stackTrace": "...",
  "rawApplicantPayload": "...",
  "accessToken": "..."
}

11.2 Scope Discipline

Error path variables should answer:

What outcome was reached?
Why does the process take this path?
What data is needed by the next modeled activity?
What correlation ID points to deeper diagnostics?

They should not become a debugging dump.
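One way to enforce this is an allowlist applied just before the worker sends variables with a BPMN error. This is a sketch under assumptions: `ErrorPayloadFilter` is a hypothetical helper and filters top-level keys only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Keeps only the top-level variables that the error path is allowed to carry. */
final class ErrorPayloadFilter {
    private final Set<String> allowedKeys;

    ErrorPayloadFilter(Set<String> allowedKeys) {
        this.allowedKeys = Set.copyOf(allowedKeys);
    }

    Map<String, Object> filter(Map<String, Object> candidate) {
        Map<String, Object> kept = new LinkedHashMap<>();
        candidate.forEach((key, value) -> {
            if (allowedKeys.contains(key)) {
                kept.put(key, value);
            }
        });
        return kept;
    }
}
```

An allowlist fails safe: a new debugging field added by a developer is dropped until someone decides it belongs in the process state.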

12. Incidents and Compliance

For regulated workflows, incidents are not just operational interruptions. They may become audit evidence.

12.1 Audit Questions

A regulator, internal auditor, or incident review board may ask:

Why did the process stop?
Which system made the decision?
Was this a business rejection or technical failure?
Who changed variables?
Who resolved the incident?
Did the process retry automatically?
Was the retry safe?
Was any external side effect duplicated?
Did the model have an explicit path for this scenario?

Your design should make these questions answerable.

12.2 Operator Actions Must Be Constrained

If an operator can resolve incidents by changing arbitrary variables, you need governance.

Recommended controls:

runbooks per incident category,
approved variable patch schemas,
incident reason codes,
privileged roles for retry updates,
immutable external audit log for high-risk corrections,
correlation between Operate action and internal ticket/change request,
post-incident review for repeated incidents.

12.3 Incident Rate Is a Process Quality Metric

High incident rate usually means one of:

weak input validation,
unstable dependency,
poor retry/backoff,
non-idempotent worker,
too much data in process variables,
model expression fragility,
business scenario not modeled,
bad deployment/version compatibility.

Do not normalize incident noise.

13. Anti-Patterns

13.1 BPMN Error for Every Java Exception

Symptom:

catch (Exception e) {
    client.newThrowErrorCommand(job.getKey())
        .errorCode("SYSTEM_ERROR")
        .send();
}

Why it is bad:

hides technical failures in business model,
bypasses retry/backoff,
creates misleading process state,
makes operators think the process handled the situation intentionally.

Correct approach:

classify known domain outcomes as BPMN errors,
fail retryable technical problems,
create incidents for unrecoverable runtime/data defects.

13.2 Incident as Manual Review Queue

Manual review should be a user task or case work item, not an incident.

13.3 Infinite Retry Mindset

High retry counts can hide systemic defects and create cost explosions.

Retries must have:

a maximum,
backoff,
idempotency,
diagnostics,
eventual operator path.

13.4 Catch-All Error That Swallows Everything

A catch-all boundary event that routes to “continue anyway” can destroy auditability.

Use catch-all only with explicit review, repair, or safe fallback.

13.5 Error Code Coupled to Provider Terms

Provider-specific error codes should be translated into domain codes.

Bad:

STRIPE_CARD_DECLINED
BUREAU_SCORE_503

Better:

PAYMENT_REJECTED
RISK_SCORE_UNAVAILABLE

Provider detail can be stored separately as diagnostic metadata.
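A small translation layer keeps provider vocabulary out of the model. The sketch below is illustrative: `ProviderErrorTranslator` and `DomainError` are hypothetical names, and the mapping table simply reuses the pairs listed above.

```java
import java.util.Map;
import java.util.Optional;

/** Domain error code plus provider detail kept as diagnostic metadata. */
record DomainError(String errorCode, Map<String, Object> diagnostics) {}

final class ProviderErrorTranslator {
    // Illustrative mapping only; a real table would be owned by the domain team.
    private static final Map<String, String> DOMAIN_CODES = Map.of(
            "STRIPE_CARD_DECLINED", "PAYMENT_REJECTED",
            "BUREAU_SCORE_503", "RISK_SCORE_UNAVAILABLE");

    private ProviderErrorTranslator() {}

    /** Empty means no domain outcome is known: treat it as a technical failure. */
    static Optional<DomainError> translate(String provider, String providerCode) {
        String domainCode = DOMAIN_CODES.get(providerCode);
        if (domainCode == null) {
            return Optional.empty();
        }
        return Optional.of(new DomainError(
                domainCode,
                Map.of("provider", provider, "providerCode", providerCode)));
    }
}
```

Returning empty for unknown provider codes avoids inventing a BPMN error code for something the model has no reaction to.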

13.6 Failing Job After Side Effect Without Idempotency

If the worker successfully performed an external side effect but fails before completing the job, Zeebe may retry the job. Without idempotency, duplicate side effects are possible.

Rule:

Every side-effecting worker must use an idempotency key derived from stable process/job/domain identifiers.

14. Production Review Checklist

For every service task, review this checklist.

14.1 Worker Outcome Checklist

14.2 BPMN Error Checklist

Is every thrown errorCode caught where expected?
Is every catch path a meaningful process reaction?
Are catch-all events justified?
Are error codes domain-oriented?
Are error variables mapped intentionally?
Is error propagation across subprocess/call activity understood?
Are old process versions still compatible with worker error codes?

14.3 Incident Checklist

Does the incident message explain the failure well enough for an operator to act?
Is there a runbook?
Can the incident be resolved safely?
Are variable patches governed?
Are repeated incidents tracked as engineering defects?
Is incident resolution auditable?

15. Lab: Build a Failure-Aware Worker

15.1 Scenario

You are implementing assess-case-risk for a regulatory enforcement process.

Inputs:

{
  "caseId": "CASE-2026-0001",
  "entityId": "ENT-9001",
  "evidencePackageId": "EVP-222",
  "riskAssessment": null
}

External system responses:

External response	Desired reaction
`LOW`, `MEDIUM`, `HIGH`	Complete job with risk assessment.
`EVIDENCE_INCOMPLETE`	Throw BPMN error to evidence repair path.
`MANUAL_REVIEW_REQUIRED`	Throw BPMN error to analyst review path.
HTTP 503	Fail job with retries/backoff.
Missing `caseId`	Fail with zero retries; incident.
Unexpected JSON shape	Fail with zero retries; incident.

15.2 BPMN Shape

15.3 Worker Classification

WorkerOutcome assessRisk(ActivatedJob job) {
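    // parseAndValidate is assumed to throw (for example MalformedProcessVariableException)
    // when required input such as caseId is missing.
    CaseRiskRequest request = parseAndValidate(job.getVariablesAsMap());

    // riskClient.assess is assumed to throw on HTTP 503, timeout, or an unparseable
    // response. Those exceptions are left to the classify(...) policy from section 8.2.
    RiskResponse response = riskClient.assess(
        request,
        IdempotencyKey.of(job.getProcessInstanceKey(), job.getElementId(), request.caseId())
    );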

    return switch (response.status()) {
        case LOW, MEDIUM, HIGH ->
            new WorkerOutcome.Success(Map.of(
                "riskAssessment", Map.of(
                    "level", response.status().name(),
                    "score", response.score(),
                    "assessedAt", response.assessedAt().toString()
                )
            ));

        case EVIDENCE_INCOMPLETE ->
            new WorkerOutcome.BusinessError(
                "EVIDENCE_INCOMPLETE",
                "Risk engine requires additional evidence",
                Map.of("evidence", Map.of("status", "INCOMPLETE"))
            );

        case MANUAL_REVIEW_REQUIRED ->
            new WorkerOutcome.BusinessError(
                "MANUAL_REVIEW_REQUIRED",
                "Risk engine requires manual analyst review",
                Map.of("riskAssessment", Map.of("status", "MANUAL_REVIEW_REQUIRED"))
            );
    };
}

15.4 Tests You Must Write

successful low-risk completion,
successful high-risk completion,
evidence incomplete BPMN error,
manual review BPMN error,
HTTP 503 decrements retries and applies backoff,
malformed variables create incident,
idempotency key is stable across retries,
duplicate worker execution does not duplicate downstream side effect.

16. Mental Compression

When in doubt, remember this:

Complete job: the work succeeded.
Fail job with retries: the work may succeed later.
Fail job with zero retries: the runtime cannot continue safely without intervention.
Throw BPMN error: the process has a known modeled reaction.
Incident: abnormal stop, not business-as-usual human work.
Compensation: previous successful side effect must be repaired or counteracted.

This distinction is the backbone of production-grade Camunda 8 modeling.

17. Source References

Camunda 8 Docs — Error events: https://docs.camunda.io/docs/components/modeler/bpmn/error-events/
Camunda 8 Docs — Incidents: https://docs.camunda.io/docs/components/concepts/incidents/
Camunda 8 Docs — Job workers: https://docs.camunda.io/docs/components/concepts/job-workers/
Camunda 8 Docs — Dealing with problems and exceptions: https://docs.camunda.io/docs/components/best-practices/development/dealing-with-problems-and-exceptions/
Camunda 8 Docs — Orchestration Cluster REST API: fail job / throw job error / resolve incident.
