
Long-Running Processes, Saga, Compensation, and Consistency

Learn Java BPMN with Camunda BPM Platform 7 - Part 026

Long-running processes, saga design, compensation, cancellation, timeouts, consistency, and recovery in Camunda 7: how to model business transactions without confusing BPMN transaction subprocesses with ACID transactions.


Part 026 — Long-Running Processes, Saga, Compensation, and Consistency

Target skill: design long-running workflows that remain safe under partial success, timeouts, cancellation, duplicate commands, remote side effects, and human correction, without equating a BPMN transaction subprocess with a database transaction.

In real business systems, a process can run for minutes, hours, days, months, or even years. Examples:

  • enforcement investigation,
  • license application,
  • order fulfillment,
  • loan approval,
  • dispute handling,
  • claims processing,
  • regulatory remediation,
  • cross-agency review.

In workflows like these, you almost never have an atomic end-to-end ACID transaction. What you have instead is durable orchestration, observable state, retries, compensation, timeouts, manual intervention, and an audit trail.


1. Kaufman Deconstruction

The saga/compensation skill needs to be broken down into sub-skills:

| Sub-skill | Key question | Practical output |
| --- | --- | --- |
| Long-running transaction thinking | What has already committed outside Camunda? | Partial success map |
| Compensation modeling | How do we reverse effects that have already happened? | Compensation command/handler |
| Cancellation modeling | Who may cancel, and when? | Cancel path invariant |
| Timeout design | What is the business deadline and what does it trigger? | Timer + escalation + cleanup |
| Idempotency | What happens on retry? | Idempotent command/compensation |
| Consistency model | Which invariants are eventual, which immediate? | Domain consistency contract |
| Human recovery | When must the machine stop and ask a human? | Manual repair task |
| Audit defensibility | How do we prove a decision? | Append-only event/audit trail |
| Testing failure paths | How is partial success tested? | Scenario matrix |

Kaufman-style compression:

A saga is not a diagram pattern. It is a consistency strategy for coordinating committed side effects across boundaries that cannot share one database transaction.


2. The Big Misconception: BPMN Transaction Is Not ACID

Camunda supports the BPMN transaction subprocess. The name often misleads engineers.

A BPMN transaction subprocess is not a way to create a long database transaction. It is a logical grouping of business activities with outcomes such as success, cancel, or hazard. The Camunda docs explicitly distinguish a BPMN transaction from a technical ACID transaction: a BPMN transaction can run for a long time, usually spans many ACID transactions, gives up isolation, and cannot use a traditional rollback once some side effects have committed.

Mental model:

ACID rollback undoes uncommitted work.
Saga compensation performs new committed work to counteract old committed work.

That is the fundamental difference.


3. Partial Success Map

Before drawing any compensation, build a partial success map.

Example: order fulfillment

| Step | Side effect | Reversible? | Compensation | Deadline |
| --- | --- | --- | --- | --- |
| Reserve inventory | Stock reserved | Yes | Release reservation | 30 min |
| Authorize payment | Funds authorized | Yes/partially | Void authorization | 7 days |
| Create shipment | Label created | Yes, before pickup | Cancel shipment | Before pickup |
| Capture payment | Funds captured | Not fully | Refund | Business-specific |
| Send email | Customer notified | No | Send correction notice | ASAP |

Example: enforcement case

| Step | Side effect | Reversible? | Compensation |
| --- | --- | --- | --- |
| Request evidence from agency | External request sent | Not fully | Send withdrawal/correction |
| Freeze account | Account restricted | Yes, with audit | Release restriction |
| Publish notice | Public/legal notice created | Not cleanly | Publish correction/amendment |
| Assign investigator | Work allocation changed | Yes | Reassign/revoke task |
| Escalate to committee | Governance step opened | Yes/partially | Withdraw agenda item |

Insight:

Not every side effect has a perfect inverse operation. Sometimes compensation is a correction, an apology, an amendment, a release, a refund, or human adjudication.
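A partial success map can also be kept as data, so that the compensation list for a given set of completed steps is derived rather than remembered. The following is a minimal sketch under stated assumptions: the type names (`Reversibility`, `SideEffect`, `PartialSuccessMap`) and the step names are hypothetical and not part of Camunda.

```java
import java.util.ArrayList;
import java.util.List;

/** How far a committed side effect can be undone. */
enum Reversibility { FULL, PARTIAL, NONE }

/** One row of a partial success map. Step and compensation names are illustrative. */
record SideEffect(String step, Reversibility reversibility, String compensation) {}

final class PartialSuccessMap {
    private final List<SideEffect> rows;

    PartialSuccessMap(List<SideEffect> rows) {
        this.rows = List.copyOf(rows);
    }

    /**
     * Returns the corrective actions for the steps that completed,
     * newest first, so later effects are counteracted before earlier ones.
     */
    List<String> compensationsFor(List<String> completedSteps) {
        List<String> actions = new ArrayList<>();
        for (int i = completedSteps.size() - 1; i >= 0; i--) {
            String step = completedSteps.get(i);
            for (SideEffect row : rows) {
                if (row.step().equals(step)) {
                    actions.add(row.compensation());
                }
            }
        }
        return actions;
    }

    /** Completed steps that cannot be fully undone and therefore deserve review. */
    List<String> needsReview(List<String> completedSteps) {
        return rows.stream()
            .filter(r -> completedSteps.contains(r.step()))
            .filter(r -> r.reversibility() != Reversibility.FULL)
            .map(SideEffect::step)
            .toList();
    }
}
```

The newest-first ordering is only a default here; a real domain may require a different compensation order.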


4. Saga Building Blocks

A saga in Camunda is usually built from the following primitives:

| Building block | BPMN/Java implementation | Purpose |
| --- | --- | --- |
| Forward command | Service task / external task / outbox | Perform the side effect |
| Wait for result | Message catch event / receive task | Wait for confirmation |
| Timeout | Timer boundary/intermediate event | Business time limit |
| Retry | Failed job retry / external task retry | Technical resilience |
| Business failure | BPMN error / result event | Expected negative outcome |
| Compensation | Compensation boundary event / event subprocess / service task | Counteraction |
| Cancellation | Interrupting boundary event / event subprocess / cancel event | Stop active path |
| Manual repair | User task | Human decision/repair |
| Audit projection | History + domain audit table | Defensibility |

A saga is not a single BPMN element. It is a combination of primitives governed by clear domain invariants.


5. Retry vs Compensation vs Escalation

Do not mix up these three things.

| Mechanism | Used for | Example | Misused when |
| --- | --- | --- | --- |
| Retry | Technical/transient failure | HTTP timeout, DB deadlock | A business rejection is retried 100 times |
| Compensation | Undo/counteract committed business side effect | Refund, release inventory | The remote call never succeeded |
| Escalation/manual repair | Ambiguous/unsafe situation | Unknown payment status | Every error is handed to a human without any retry |

Rule:

Retry repeats the same intent.
Compensation creates a new opposite/corrective intent.
Escalation asks for judgment when automation cannot know the safe action.
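One way to keep the three mechanisms apart is to route every failure through a single decision function. The sketch below is illustrative only: `FailureKind`, `NextAction`, and `FailureRouting` are hypothetical names, and the rule that a business rejection with no earlier committed effects simply ends the path is an assumption.

```java
/** Coarse classification of what went wrong. */
enum FailureKind { TRANSIENT, BUSINESS_REJECTION, UNKNOWN_OUTCOME }

/** What the process should do next. */
enum NextAction { RETRY, COMPENSATE, END_WITHOUT_COMPENSATION, ESCALATE }

final class FailureRouting {
    /**
     * Retry repeats the same intent, compensation issues a corrective intent,
     * and escalation hands the case to a human when the safe action is unknown.
     */
    static NextAction decide(FailureKind kind, boolean earlierEffectsCommitted) {
        return switch (kind) {
            case TRANSIENT -> NextAction.RETRY;
            case BUSINESS_REJECTION -> earlierEffectsCommitted
                ? NextAction.COMPENSATE
                : NextAction.END_WITHOUT_COMPENSATION; // nothing committed, nothing to undo
            case UNKNOWN_OUTCOME -> NextAction.ESCALATE;
        };
    }
}
```

For example, `FailureRouting.decide(FailureKind.BUSINESS_REJECTION, true)` yields `COMPENSATE`, while the same rejection with no committed effects ends the path without compensation.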

6. Idempotency Is Mandatory

Forward actions and compensation actions must both be idempotent.

Example command keys:

| Action | Idempotency key |
| --- | --- |
| Reserve inventory | orderId + itemId + reservationAttempt |
| Release reservation | reservationId + releaseReason |
| Authorize payment | orderId + paymentAttempt |
| Void authorization | authorizationId + voidAttempt |
| Refund payment | captureId + refundReason + amount |
| Send evidence request | caseId + agencyId + requestVersion |
| Withdraw request | requestId + withdrawalVersion |

Bad compensation:

public void refund(String paymentId) {
    paymentClient.refund(paymentId); // no idempotency key
}

Better:

public void refund(RefundCommand command) {
    paymentClient.refund(
        command.captureId(),
        command.amount(),
        command.reason(),
        command.idempotencyKey()
    );
}

If compensation retries after a network timeout, it must not refund twice.
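To show what the receiving side of an idempotency key might look like, here is a minimal in-memory sketch. `FakePaymentProvider` is a hypothetical stand-in, not a real payment client, and the `RefundCommand` record is assumed to have the four accessors used in the snippet above.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Assumed shape of the refund command; field names mirror the accessors used earlier. */
record RefundCommand(String captureId, BigDecimal amount, String reason, String idempotencyKey) {}

/** In-memory stand-in for a provider that honors idempotency keys. */
final class FakePaymentProvider {
    private final Map<String, String> refundIdsByKey = new ConcurrentHashMap<>();
    private final AtomicInteger executed = new AtomicInteger();

    /**
     * Executes the refund at most once per idempotency key.
     * A repeated call with the same key returns the original refund id.
     */
    String refund(RefundCommand command) {
        return refundIdsByKey.computeIfAbsent(
            command.idempotencyKey(),
            key -> "refund-" + executed.incrementAndGet());
    }

    /** Number of refunds that actually moved money in this sketch. */
    int executedCount() {
        return executed.get();
    }
}
```

Calling `refund` twice with the same command returns the same refund id and leaves `executedCount()` at 1, which is the behavior a retried compensation job relies on.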


7. Pattern: Reservation + Confirmation

This pattern avoids hard-to-reverse side effects.

Use for:

  • inventory,
  • appointment slots,
  • temporary account restrictions,
  • capacity allocation,
  • committee agenda slot,
  • limited quota license application.

Invariant:

Reserved resources must have expiration or release path.
No reservation should rely only on the workflow reaching the release task.

External systems should also enforce TTL, because process incident or engine downtime must not lock resources forever.
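A reservation that expires on its own could be sketched as follows. This is an in-memory illustration, not a real inventory API: `ReservationStore` and its method names are hypothetical, and the clock is passed in explicitly so the behavior is easy to test.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a resource owner that enforces its own hold expiry. */
final class ReservationStore {
    private final Map<String, Instant> expiryByReservationId = new HashMap<>();

    /** Places a hold that lapses at now + ttl unless it is confirmed first. */
    void reserve(String reservationId, Instant now, Duration ttl) {
        expiryByReservationId.put(reservationId, now.plus(ttl));
    }

    /** A hold counts only while its expiry lies in the future. */
    boolean isHeld(String reservationId, Instant now) {
        Instant expiry = expiryByReservationId.get(reservationId);
        return expiry != null && now.isBefore(expiry);
    }

    /** Confirmation succeeds only for a live hold; either way the hold is consumed. */
    boolean confirm(String reservationId, Instant now) {
        boolean live = isHeld(reservationId, now);
        expiryByReservationId.remove(reservationId);
        return live;
    }

    /** Explicit release from the workflow; safe to call more than once. */
    void release(String reservationId) {
        expiryByReservationId.remove(reservationId);
    }
}
```

The point of the design is that the hold stops counting after its TTL even if the workflow never reaches `release`, so a stuck process instance cannot lock the resource forever.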


8. Pattern: Send Command and Compensate If Later Step Fails

This is a simple, explicit compensation flow: each forward step is followed by a visible path that counteracts it if a later step fails. It is often clearer than BPMN compensation events for engineering teams.

Pros:

  • readable,
  • easy to test,
  • explicit sequence,
  • easier observability,
  • fewer BPMN semantic surprises.

Cons:

  • can get verbose,
  • hard to reuse for many branches,
  • model may become tangled if every step has compensation wiring.

Use explicit compensation flow when:

  • compensation order is business-specific,
  • team is not expert in BPMN compensation semantics,
  • audit readability matters more than compact diagram,
  • there are few side effects.
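Outside of BPMN, the same explicit flow can be sketched in a few lines of plain Java, which is sometimes useful for reasoning about ordering before modeling it. `SagaStep` and `ExplicitSaga` are hypothetical names; the sketch assumes compensations themselves do not throw, which a real implementation would have to handle through retry or manual repair.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** A forward action paired with its corrective action. Names are illustrative. */
record SagaStep(String name, Runnable forward, Runnable compensate) {}

final class ExplicitSaga {
    /**
     * Runs steps in order. If a step throws, every step that already
     * completed is compensated, newest first. Returns a log of what happened.
     */
    static List<String> run(List<SagaStep> steps) {
        List<String> log = new ArrayList<>();
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.forward().run();
                completed.push(step); // newest completed step sits on top
                log.add("done:" + step.name());
            } catch (RuntimeException e) {
                log.add("failed:" + step.name());
                while (!completed.isEmpty()) {
                    SagaStep done = completed.pop();
                    done.compensate().run();
                    log.add("compensated:" + done.name());
                }
                return log;
            }
        }
        return log;
    }
}
```

Note that the failed step itself is not compensated here, on the assumption that its side effect never committed. If the outcome of the failed step is unknown, that assumption does not hold and the case belongs in reconciliation or manual repair.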

9. BPMN Compensation Events

Camunda supports compensation events.

Key semantics:

  • compensation handler is associated with an activity/subprocess,
  • handler runs only for activities that completed successfully,
  • compensation can be thrown for a specific activity or current scope,
  • scope compensation includes concurrent branches,
  • compensation is triggered hierarchically,
  • default compensation order is reverse order of completion,
  • compensation boundary event becomes active after attached activity completes successfully,
  • compensation is not a magic rollback.

BPMN-ish XML idea:

<serviceTask id="bookHotel" name="Book Hotel" camunda:delegateExpression="${bookHotelDelegate}" />

<boundaryEvent id="compensateBookHotel" attachedToRef="bookHotel">
  <compensateEventDefinition />
</boundaryEvent>

<association sourceRef="compensateBookHotel" targetRef="cancelHotelBooking" />

<serviceTask id="cancelHotelBooking"
             name="Cancel Hotel Booking"
             isForCompensation="true"
             camunda:delegateExpression="${cancelHotelBookingDelegate}" />

When compensation events are useful

  • many activities each have clear inverse handler,
  • compensation should follow BPMN completion order,
  • modeler audience understands compensation notation,
  • handlers are idempotent and observable,
  • compensation subscription semantics are tested.

When explicit compensation flow is better

  • compensation order differs from reverse completion order,
  • compensation requires business decisions,
  • partial compensation requires conditional logic,
  • call activities/subprocess boundaries complicate propagation,
  • operations team needs very obvious diagrams.

10. Compensation Variable Snapshot

Compensation has subtle variable semantics. For an embedded subprocess, the compensation handler sees the subprocess's local variables as a snapshot taken when the subprocess completed. Variables in higher scopes are not part of that snapshot: the handler sees them in whatever state they are in when compensation is thrown. That can surprise teams.

Practical rule:

Persist compensation input explicitly before completing the forward activity.
Do not rely on mutable process variables still having the right values later.

Example:

execution.setVariable("hotelBookingCompensation", Map.of(
    "bookingId", bookingId,
    "provider", provider,
    "cancelBy", cancelBy.toString(),
    "idempotencyKey", "cancel-hotel-" + bookingId
));

Better yet, persist compensation command in application DB/outbox:

create table saga_compensation_action (
  id uuid primary key,
  saga_id varchar(150) not null,
  action_type varchar(100) not null,
  action_key varchar(150) not null,
  payload_json jsonb not null,
  status varchar(30) not null,
  created_at timestamp not null,
  executed_at timestamp null,
  unique(action_key)
);

11. Transaction Subprocess: Use Carefully

A BPMN transaction subprocess can have success, cancel, and hazard outcomes.

Important semantics:

  • cancel end event can only be used with transaction subprocess,
  • cancel boundary event catches cancellation,
  • cancel boundary interrupts active executions in transaction scope,
  • compensation runs synchronously before leaving cancel boundary path,
  • only one cancel boundary event is allowed for a transaction subprocess,
  • if transaction ends by hazard/error not handled in scope, compensation may not run.

Practical warning

Do not use transaction subprocess because it “sounds correct”. Use it only when the team understands:

  • cancel end event semantics,
  • compensation subscription activation,
  • optimistic locking consequences,
  • variable snapshot behavior,
  • limitations around call activity propagation,
  • operational recovery path if compensation fails.

For many teams, explicit saga flow is easier to support.


12. Cancellation Model

Cancellation is not always compensation.

Types:

| Type | Meaning | Example |
| --- | --- | --- |
| User cancellation | Actor withdraws request | Applicant withdraws license application |
| Business cancellation | Domain condition invalidates process | Payment expired |
| System cancellation | Platform/operator stops workflow | Duplicate process started |
| Legal/regulatory cancellation | Authority invalidates path | Jurisdiction removed |
| Timeout cancellation | Deadline passed | Agency did not respond |

Cancellation design must answer:

  • what active work must stop?
  • what side effects must be compensated?
  • what side effects must remain as audit record?
  • who is notified?
  • is cancellation reversible?
  • can new process be started later?
  • what happens to late events?

13. Pattern: Interrupting Event Subprocess for Cancellation

A process-wide cancellation signal or message often fits an interrupting event subprocess.

Use this when:

  • cancellation can occur across many states,
  • it should interrupt current scope,
  • cleanup path is shared,
  • cancellation is domain command with authorization.

Avoid this when:

  • cancellation rules differ heavily by state,
  • some states cannot be cancelled,
  • cancellation requires local state-specific compensation,
  • event subprocess would become a giant hidden control flow.

In those cases, explicit state-specific boundary events or gateways may be clearer.


14. Timeout Is a Business Event

Timer is not just technical scheduling. It represents a business fact:

The process has waited long enough that the domain must move to another state.

Examples:

  • payment authorization expired,
  • agency did not respond within legal deadline,
  • reviewer missed SLA,
  • customer did not provide document,
  • reservation hold expired.

Timer handling should not simply “go to failure”. It should model the domain's response to the missed deadline, such as notification, cleanup, escalation, or compensation.

Timer checklist:

  • Is the duration based on calendar days, business days, or legal deadline?
  • What timezone matters?
  • What happens if event arrives after timer fired?
  • Is there a grace period?
  • Is timeout reversible?
  • Does timeout need notification?
  • Does timeout require compensation?
  • Does timer volume create job executor load?
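The question of what happens when an event arrives after the timer fired can be made concrete with a small first-writer-wins sketch. This models the domain-side decision only and makes no claim about how the Camunda engine resolves the race internally; `WaitState`, `WaitOutcome`, and `EventResult` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Final result of a wait state. */
enum WaitOutcome { WAITING, REPLIED, TIMED_OUT }

/** Whether an incoming event was allowed to settle the wait state. */
enum EventResult { ACCEPTED, STALE }

final class WaitState {
    private final AtomicReference<WaitOutcome> outcome =
        new AtomicReference<>(WaitOutcome.WAITING);

    /** Called when the awaited reply arrives. */
    EventResult onReply() {
        return settle(WaitOutcome.REPLIED);
    }

    /** Called when the business deadline passes. */
    EventResult onTimeout() {
        return settle(WaitOutcome.TIMED_OUT);
    }

    /** Only the first event moves the state; anything later is reported as stale. */
    private EventResult settle(WaitOutcome target) {
        return outcome.compareAndSet(WaitOutcome.WAITING, target)
            ? EventResult.ACCEPTED
            : EventResult.STALE;
    }

    WaitOutcome outcome() {
        return outcome.get();
    }
}
```

A stale result is not an error to be dropped. In the spirit of the checklist, the late event can be recorded as audit evidence or routed to a grace-period rule.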

15. Human Repair as First-Class Path

There are states automation cannot safely resolve:

  • external system says payment succeeded but no capture id,
  • compensation API returns unknown status,
  • duplicate case records exist,
  • legal notice already published with wrong data,
  • investigator changed decision after escalation,
  • timeout fired but event arrived one second later,
  • conflicting agencies provide inconsistent evidence.

Do not hide these in logs. Model them.

A good manual repair task includes:

  • business key,
  • current process state,
  • failed action,
  • external system reference,
  • retry history,
  • suggested options,
  • risk notes,
  • audit requirement,
  • link to Cockpit/business UI.

16. Saga State vs Process State

Do not rely only on BPMN token position to know domain state.

Maintain domain state explicitly:

ORDER_CREATED
INVENTORY_RESERVED
PAYMENT_AUTHORIZED
SHIPMENT_CREATED
PAYMENT_CAPTURED
FULFILLED
CANCELLING
COMPENSATING
CANCELLED
MANUAL_REPAIR_REQUIRED

Why?

  • UI/read models need domain state,
  • event adapter needs state-aware correlation,
  • audit needs business terms,
  • migration may move tokens but domain state remains meaningful,
  • operations team should not infer business state from activity id only.

Camunda process state and domain state should be related but not identical.
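An explicit domain state is easiest to keep honest when transitions are validated in one place. The sketch below reuses the state names listed above; the transition table itself is an assumption made for illustration, and `OrderState` and `OrderStateMachine` are hypothetical names.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

enum OrderState {
    ORDER_CREATED, INVENTORY_RESERVED, PAYMENT_AUTHORIZED, SHIPMENT_CREATED,
    PAYMENT_CAPTURED, FULFILLED, CANCELLING, COMPENSATING, CANCELLED,
    MANUAL_REPAIR_REQUIRED
}

final class OrderStateMachine {
    // Assumed transition table; FULFILLED and CANCELLED are terminal.
    private static final Map<OrderState, Set<OrderState>> ALLOWED = Map.of(
        OrderState.ORDER_CREATED, EnumSet.of(OrderState.INVENTORY_RESERVED, OrderState.CANCELLING),
        OrderState.INVENTORY_RESERVED, EnumSet.of(OrderState.PAYMENT_AUTHORIZED, OrderState.CANCELLING),
        OrderState.PAYMENT_AUTHORIZED, EnumSet.of(OrderState.SHIPMENT_CREATED, OrderState.CANCELLING),
        OrderState.SHIPMENT_CREATED, EnumSet.of(OrderState.PAYMENT_CAPTURED, OrderState.CANCELLING),
        OrderState.PAYMENT_CAPTURED, EnumSet.of(OrderState.FULFILLED, OrderState.CANCELLING),
        OrderState.CANCELLING, EnumSet.of(OrderState.COMPENSATING),
        OrderState.COMPENSATING, EnumSet.of(OrderState.CANCELLED, OrderState.MANUAL_REPAIR_REQUIRED),
        OrderState.MANUAL_REPAIR_REQUIRED, EnumSet.of(OrderState.COMPENSATING, OrderState.CANCELLED));

    static boolean canTransition(OrderState from, OrderState to) {
        return ALLOWED.getOrDefault(from, Set.of()).contains(to);
    }

    /** Returns the new state, or throws if the domain does not allow the move. */
    static OrderState transition(OrderState from, OrderState to) {
        if (!canTransition(from, to)) {
            throw new IllegalStateException("Illegal transition " + from + " -> " + to);
        }
        return to;
    }
}
```

A delegate or event adapter could call `transition` before updating the read model, so that a late event against a cancelled order is rejected by the domain rather than by token position.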


17. Designing Compensation Commands

A compensation command is not just “call the inverse API”. It is a domain command with a contract.

Example:

public record ReleaseInventoryReservationCommand(
    String reservationId,
    String orderId,
    String reason,
    String requestedBy,
    String idempotencyKey,
    Instant requestedAt
) {}

Command contract:

| Field | Why |
| --- | --- |
| reservationId | Target exact side effect |
| orderId | Audit/business context |
| reason | Legal/support trace |
| requestedBy | Actor/system accountability |
| idempotencyKey | Retry safety |
| requestedAt | Temporal audit |

Compensation handler should classify outcomes:

| Outcome | Meaning | Process behavior |
| --- | --- | --- |
| Success | Compensation completed | Continue |
| Already done | Idempotent success | Continue |
| Retryable failure | Timeout/temporary error | Retry |
| Business impossible | Cannot release/refund | Manual repair |
| Unknown | No reliable status | Query/reconcile/manual repair |
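The outcome classification can be expressed as a small mapping so that every handler reacts the same way. This is a sketch, not a Camunda API: the enum and class names are hypothetical, and the rule that a retryable failure turns into manual repair once a retry budget is exhausted is an added assumption.

```java
/** Outcomes a compensation handler may report. */
enum CompensationOutcome { SUCCESS, ALREADY_DONE, RETRYABLE_FAILURE, BUSINESS_IMPOSSIBLE, UNKNOWN }

/** How the process should react to an outcome. */
enum ProcessBehavior { CONTINUE, RETRY, MANUAL_REPAIR, RECONCILE_THEN_MANUAL_REPAIR }

final class CompensationOutcomes {
    /**
     * Maps an outcome to process behavior.
     * attemptsSoFar counts attempts already made, including the one that just failed.
     */
    static ProcessBehavior behaviorFor(CompensationOutcome outcome, int attemptsSoFar, int maxAttempts) {
        return switch (outcome) {
            case SUCCESS, ALREADY_DONE -> ProcessBehavior.CONTINUE;
            case RETRYABLE_FAILURE -> attemptsSoFar < maxAttempts
                ? ProcessBehavior.RETRY
                : ProcessBehavior.MANUAL_REPAIR; // assumed: exhausted retries go to a human
            case BUSINESS_IMPOSSIBLE -> ProcessBehavior.MANUAL_REPAIR;
            case UNKNOWN -> ProcessBehavior.RECONCILE_THEN_MANUAL_REPAIR;
        };
    }
}
```

Treating `ALREADY_DONE` as success is what makes a duplicate compensation command harmless.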

18. Outbox + Saga

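For side effects, the outbox pattern works for both forward and compensating commands: the process records the intended command durably, and a separate publisher delivers it to the external system.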

This gives you:

  • durable intent,
  • retryable publish,
  • audit trail,
  • separation of Camunda transaction from external side effect,
  • idempotency at service boundary.
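The mechanics can be illustrated with an in-memory outbox. In a real system the pending entries would live in a database table written in the same local transaction as the state change; that part is not shown. `OutboxEntry`, `Outbox`, and the sender predicate are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** A durable intent to send one command. The idempotency key travels with it. */
record OutboxEntry(String idempotencyKey, String commandType, String payload) {}

final class Outbox {
    private final List<OutboxEntry> pending = new ArrayList<>();
    private final List<OutboxEntry> published = new ArrayList<>();

    /** Records the intent; works the same for forward and compensating commands. */
    void enqueue(OutboxEntry entry) {
        pending.add(entry);
    }

    /**
     * Tries to deliver each pending entry. The sender returns true on
     * acknowledged delivery; anything else stays pending for the next run.
     * Returns the number of entries published in this run.
     */
    int publishPending(Predicate<OutboxEntry> sender) {
        int count = 0;
        Iterator<OutboxEntry> it = pending.iterator();
        while (it.hasNext()) {
            OutboxEntry entry = it.next();
            if (sender.test(entry)) {
                it.remove();
                published.add(entry);
                count++;
            }
        }
        return count;
    }

    int pendingCount() {
        return pending.size();
    }

    int publishedCount() {
        return published.size();
    }
}
```

Because a delivery may succeed remotely while the acknowledgement is lost, the same entry can be sent more than once. That is why the receiving service still needs to deduplicate on the idempotency key.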

19. Case Study: Regulatory Enforcement Lifecycle

Imagine an enforcement workflow:

  1. open case,
  2. request data from entities,
  3. freeze suspicious account,
  4. collect evidence,
  5. review by investigator,
  6. escalate to committee,
  7. issue decision,
  8. publish notice,
  9. monitor remediation.

Some actions are reversible, some are not.

Important distinctions:

  • withdrawing evidence request may not erase the fact it was sent,
  • releasing account restriction requires audit reason,
  • public notice may need amendment rather than deletion,
  • legal deadlines may override normal retry strategy,
  • manual decision path must be explicit.

Regulatory saga invariant:

Every irreversible or externally visible action must have an audit-visible correction or escalation path, even if it has no true technical rollback.

20. Testing Saga Failure Paths

Do not only test happy path.

| Scenario | Expected behavior |
| --- | --- |
| Forward step fails before external commit | Retry or incident, no compensation |
| Forward step succeeds then later step fails | Compensation command emitted |
| Compensation API timeout | Retry with same idempotency key |
| Compensation returns already compensated | Treat as success |
| Compensation returns impossible | Manual repair task |
| Cancel event while parallel branches active | Active work interrupted safely |
| Timer fires before reply event | Timeout path wins; late event marked stale |
| Reply event and timer race | One path wins; other handled idempotently |
| Duplicate compensation command | No duplicate external effect |
| Process incident during compensation | Recovery resumes safely |
| Operator manually repairs | Audit records decision |
| New process version deployed mid-saga | Running instance behavior still valid |

Example test naming style:

@Test
void paymentCaptureFailureAfterInventoryReservationReleasesInventory() {}

@Test
void duplicateReleaseReservationCommandIsIdempotentSuccess() {}

@Test
void latePaymentAuthorizedAfterTimeoutIsMarkedStale() {}

@Test
void compensationUnknownStatusCreatesManualRepairTask() {}

21. Operational Runbook for Saga Incidents

When a saga incident occurs, the operator needs a structured set of questions:

  1. What is the business key?
  2. Which forward side effects have definitely succeeded?
  3. Which side effects are unknown?
  4. Which compensation actions have been attempted?
  5. Which actions are safe to retry?
  6. Which external systems need reconciliation?
  7. Has customer/entity/regulator been notified?
  8. Is legal/audit approval required before repair?
  9. Should process continue, compensate, or terminate?
  10. What evidence must be attached to the case?

A runbook should map incident types:

| Incident | Operator action |
| --- | --- |
| Failed job before side effect | Retry after fix |
| Failed job after side effect unknown | Check external reference before retry |
| Compensation failed retryable | Retry job/command |
| Compensation impossible | Create/complete manual repair path |
| Duplicate saga instance | Suspend duplicate and reconcile side effects |
| Late event after cancellation | Attach as audit evidence, do not continue old wait state |

22. Anti-Patterns

| Anti-pattern | Why it fails | Better design |
| --- | --- | --- |
| Treat BPMN transaction as DB transaction | Long-running work cannot hold ACID rollback | Saga + compensation |
| Compensation without idempotency | Retry can double-refund/release | Idempotency key per compensation command |
| Only happy-path BPMN | Partial success invisible | Explicit failure/compensation paths |
| Swallow exception after side effect | Engine thinks success but state unknown | Record outcome and create incident/manual repair |
| Timer directly terminates without cleanup | Resources left reserved | Timeout cleanup/compensation |
| Every failure becomes BPMN error | Technical failures bypass retry/incident semantics | Distinguish retry vs business failure |
| Compensation handler depends on mutable variables | Wrong data used later | Persist compensation input |
| Direct remote side effect in same transaction | Rollback cannot undo external commit | Outbox/external task/idempotency |
| No manual repair path | Unsafe automation or stuck incidents | First-class repair workflow |
| Signal used to cancel one saga | Broadcast risk | Message/event subprocess with correlation |
| Domain state inferred only from token | Poor UI/audit/support | Business state projection |
| Compensation hidden in delegate code | Model lies about behavior | Model compensation visibly |

23. Design Checklist

Before approving long-running process design:

  • What side effects can commit outside Camunda?
  • Which side effects are reversible, partially reversible, or irreversible?
  • What is the compensation command for each reversible side effect?
  • Is each forward and compensation command idempotent?
  • Is there an outbox/inbox around external communication?
  • What is the timeout for each reservation/wait state?
  • What happens to late events after timeout/cancel?
  • Is cancellation authorized and state-aware?
  • Is human repair modeled for unknown/impossible states?
  • Are business state and BPMN token state both observable?
  • Are compensation inputs persisted immutably?
  • Do tests cover partial success and duplicate retry?
  • Does operations have a runbook?
  • Does audit show who/what/when/why for compensation?

24. Mental Compression

Keep these distinctions sharp:

Rollback != compensation.
Retry != compensation.
Cancellation != failure.
Timeout != technical error.
Incident != business rejection.
BPMN transaction != ACID transaction.
Process state != domain state.
Automation != judgment.

Saga design is mostly about humility. Distributed business processes fail in ways no single database can hide. Camunda 7 gives you durable state, BPMN control flow, jobs, timers, incidents, and compensation primitives. Your job is to add domain invariants, idempotent boundaries, observable side effects, and human recovery where automation cannot be trusted.

A top-tier engineer does not ask “how do I model rollback in BPMN?”. They ask:

“Which committed effects can become externally visible, what consistency promise do we owe the business, and what exact corrective action is safe, idempotent, observable, and auditable?”
