
Orchestration vs Choreography

Learn Java BPMN with Camunda 8 Zeebe - Part 021

Deep dive into orchestration versus choreography in Camunda 8 Zeebe, including decision heuristics, event-driven boundaries, BPMN modeling patterns, Java worker integration, ownership, observability, and anti-patterns.


Part 021 — Orchestration vs Choreography

Orchestration and choreography are not enemies.

They are two coordination styles. A top-tier engineer does not ask, "Should we use BPMN or events?" The better question is:

Which parts of the business lifecycle need an explicit controlling state machine, and which parts should remain autonomous reactions to events?

Camunda 8 Zeebe gives us a durable orchestration runtime. Event streaming gives us a durable notification and propagation substrate. Enterprise systems usually need both.

This part builds the decision model.


1. Kaufman Deconstruction

The subskill is choosing the correct coordination model for distributed business processes.

Break it down into smaller skills:

| Subskill | Question You Must Answer |
| --- | --- |
| Lifecycle ownership | Who owns the end-to-end state? |
| Failure ownership | Who is responsible when step 4 fails after steps 1–3 succeeded? |
| Visibility need | Does a human/operator need to see the whole lifecycle? |
| Coupling control | Is central coordination helpful or harmful here? |
| Data authority | Which service owns truth, and which system only observes? |
| Event semantics | Is an event a command, fact, notification, or correlation signal? |
| Recovery semantics | Should recovery be modeled, retried, compensated, or ignored? |
| Governance | Is the lifecycle auditable, regulated, and change-controlled? |

The goal is not ideological purity. The goal is a design that remains understandable under failure.


2. The Core Distinction

Orchestration

In orchestration, one process explicitly controls the sequence of work.

The process says:

  • what happens first;
  • what happens next;
  • what waits;
  • what times out;
  • what retries;
  • what escalates;
  • what compensates;
  • what completes the lifecycle.

This is useful when the lifecycle itself has business meaning.

Choreography

In choreography, services react to events without a central controller owning every step.

The event says:

  • something happened;
  • consumers may react;
  • producer does not know every downstream action;
  • downstream services remain autonomous.

This is useful when reactions are independent and do not require one end-to-end state machine.


3. The Wrong Framing

A common weak premise is:

Microservices should never be orchestrated because orchestration creates coupling.

That is incomplete.

Orchestration creates explicit coupling. Choreography often creates implicit coupling.

Implicit coupling can be worse:

  • nobody owns the full lifecycle;
  • business users cannot explain current state;
  • failures are scattered across logs;
  • duplicate events create duplicate actions;
  • downstream behavior changes without upstream visibility;
  • audit reconstruction requires joining many service logs;
  • recovery requires tribal knowledge.

The real trade-off is not coupling vs no coupling. It is visible coupling vs hidden coupling.


4. The Decision Heuristic

Use orchestration when most answers are "yes":

| Question | If Yes, Prefer |
| --- | --- |
| Is there a legally/business-defined lifecycle? | Orchestration |
| Must humans see status across multiple steps? | Orchestration |
| Is there a clear owner of the end-to-end outcome? | Orchestration |
| Are timeouts/escalations part of the business rule? | Orchestration |
| Do failures require modeled recovery? | Orchestration |
| Are steps sequentially dependent? | Orchestration |
| Do you need versioned process definitions? | Orchestration |
| Is auditability/regulatory defensibility critical? | Orchestration |

Use choreography when most answers are "yes":

| Question | If Yes, Prefer |
| --- | --- |
| Are consumers optional or independently deployable? | Choreography |
| Is the producer only announcing a fact? | Choreography |
| Can consumers fail without blocking the producer lifecycle? | Choreography |
| Are reactions many-to-many and evolving? | Choreography |
| Is central sequencing unnecessary? | Choreography |
| Is the event useful beyond one process? | Choreography |
| Should services own their local process independently? | Choreography |
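
The two checklists can be turned into a rough scoring aid. The sketch below is illustrative only: `CoordinationHeuristic`, `Answer`, and the strict-majority rule are assumptions introduced here, not part of any Camunda API, and a real decision would weigh the questions rather than just count them.

```java
import java.util.List;

/** Hypothetical helper that counts "yes" answers from the two checklists. */
final class CoordinationHeuristic {

    enum Style { ORCHESTRATION, CHOREOGRAPHY, UNCLEAR }

    /** One checklist answer; the question text is kept only for readability. */
    record Answer(String question, boolean yes) {}

    /**
     * Recommends a style only when exactly one checklist has a strict majority of "yes".
     * If both or neither do, the result is UNCLEAR, which hints that the lifecycle
     * should be split and each part evaluated separately.
     */
    static Style recommend(List<Answer> orchestrationAnswers, List<Answer> choreographyAnswers) {
        boolean orchestration = mostlyYes(orchestrationAnswers);
        boolean choreography = mostlyYes(choreographyAnswers);
        if (orchestration && !choreography) return Style.ORCHESTRATION;
        if (choreography && !orchestration) return Style.CHOREOGRAPHY;
        return Style.UNCLEAR;
    }

    private static boolean mostlyYes(List<Answer> answers) {
        long yes = answers.stream().filter(Answer::yes).count();
        return yes * 2 > answers.size(); // strict majority
    }
}
```

An UNCLEAR result is common in practice and usually points toward a hybrid design in which the core lifecycle is orchestrated and the edges are choreographed.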

5. Business-Level Synchrony vs Technical Communication Style

Do not confuse business synchrony with technical synchrony.

A business step can be logically synchronous while the technical implementation is asynchronous.

Example:

"Assess risk before assigning investigator."

Business-level: synchronous dependency. Assignment must wait for risk.

Technical-level: risk engine may receive a message and respond later.

The BPMN model may represent this as:

  • one service task that waits until the worker completes;
  • send task + receive task;
  • message throw + message catch;
  • service task that delegates asynchronous implementation details to a worker.

Camunda's best-practice documentation generally recommends service tasks as the default for synchronous request/response, while send tasks are useful for sending asynchronous messages and receive tasks for incoming asynchronous messages.

The model should expose business-relevant waiting, not every protocol detail.
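
The risk-before-assignment example can be illustrated outside BPMN with plain Java futures. This is a minimal sketch under stated assumptions: `RiskThenAssign`, the placeholder scoring formula, and the threshold of 50 are all invented for illustration. The point is only that the dependency is business-synchronous while the call itself never blocks.

```java
import java.util.concurrent.CompletableFuture;

final class RiskThenAssign {

    /** Hypothetical async risk engine: the reply arrives later, possibly on another thread. */
    static CompletableFuture<Integer> requestRiskScore(String caseId) {
        return CompletableFuture.supplyAsync(() -> caseId.length() * 10); // placeholder scoring
    }

    /** Assignment depends on the score (business-synchronous) even though the call is non-blocking. */
    static CompletableFuture<String> assessThenAssign(String caseId) {
        return requestRiskScore(caseId)
            .thenApply(score -> score >= 50 ? "senior-investigator" : "standard-investigator");
    }
}
```

In a process model, the same dependency is expressed by the sequence flow; whether the wait appears as a service task or as a send/receive pair depends on whether the waiting itself matters to the business.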


6. Three Integration Styles

Style A — synchronous request/response

Use when the process needs a result before continuing.

Examples:

  • validate tax ID;
  • reserve appointment slot;
  • calculate risk score;
  • fetch customer profile snapshot;
  • submit decision to authoritative case service.

Model with service task.

Style B — asynchronous request/response

Use when the process sends a request and waits for a later response.

Examples:

  • third-party background check;
  • external agency response;
  • payment settlement;
  • document verification by OCR service;
  • external registry lookup.

Model with service task if the technical async detail should be hidden, or send/receive/message events if the waiting itself is business-visible.

Style C — event notification

Use when the process publishes a fact and does not own every reaction.

Examples:

  • case opened;
  • evidence uploaded;
  • decision issued;
  • appeal period started;
  • enforcement action closed.

Model with a send task or service task that publishes an event.


7. Orchestration Boundary

An orchestration process should own a business lifecycle, not a random technical call chain.

Good orchestration boundary:

Bad orchestration boundary:

The second model may be valid only if those calls are real business milestones. Otherwise, it is an integration script disguised as BPMN.


8. Choreography Boundary

A choreographed event should be a stable fact, not an imperative disguised as an event.

Good events:

CaseOpened
EvidenceSubmitted
RiskScoreCalculated
DecisionApproved
AppealReceived
EnforcementActionClosed

Suspicious events:

SendEmailNow
CreateTaskForInvestigator
CallRiskService
UpdateDatabase
RetryPayment

Those names are commands. Commands can be fine, but do not pretend they are facts.

A fact says: "this happened."

A command says: "do this."

A correlation message says: "this response belongs to that waiting instance."

Mixing these categories creates brittle architecture.
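
A naming check can catch some commands that are published as events. The following is a rough heuristic sketch, not a complete rule: `EventNameLint` and its prefix list are assumptions, and it will both miss some commands and flag some legitimate names.

```java
import java.util.Set;

final class EventNameLint {

    // Assumed, non-exhaustive list of imperative prefixes; tune it to your naming convention.
    private static final Set<String> IMPERATIVE_PREFIXES =
        Set.of("Send", "Create", "Call", "Update", "Retry", "Delete", "Notify");

    /** Returns true when a name starts with an imperative verb and therefore reads like a command. */
    static boolean looksLikeCommand(String eventName) {
        return IMPERATIVE_PREFIXES.stream().anyMatch(prefix -> startsWithWord(eventName, prefix));
    }

    // The prefix must end at a word boundary: "SendEmailNow" matches "Send", "SenderVerified" does not.
    private static boolean startsWithWord(String name, String prefix) {
        if (!name.startsWith(prefix)) return false;
        return name.length() == prefix.length()
            || Character.isUpperCase(name.charAt(prefix.length()));
    }
}
```

A check like this fits best as a review aid or schema-registry lint; a flagged name is a prompt to ask whether the message is a command, not proof that it is one.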


9. Event Taxonomy

| Type | Meaning | Producer Knows Consumers? | Example |
| --- | --- | --- | --- |
| Domain event | A business fact happened | No | CaseOpened |
| Integration event | A fact published for external systems | Usually no | CaseReadyForAssignment |
| Command message | A requested action | Yes | CalculateRiskScore |
| Reply message | Response to a request | Yes | RiskScoreCalculatedForRequest |
| Correlation message | Resume waiting process instance | Yes | ExternalAssessmentCompleted |
| Audit event | Immutable trail of action | No | DecisionNoticeSent |

In Camunda 8, a message used for process correlation needs a clear message name and correlation key. A Kafka event may contain the payload, but the worker or event router must decide whether it should publish/correlate a Camunda message.


10. Hybrid Pattern: Orchestrated Core, Choreographed Edges

This is often the best enterprise pattern.

The process owns the core lifecycle. Events distribute facts to autonomous consumers.

This avoids two extremes:

  • central process controls every downstream side effect;
  • event soup hides the lifecycle.

11. Hybrid Pattern: Event-Started Orchestration

A process can start from an event.

Use this when the authoritative source of creation is outside Camunda, but the resulting lifecycle should be orchestrated.

Key design point:

  • external service owns entity creation;
  • Camunda owns lifecycle progression;
  • process instance key is not the domain ID;
  • correlation key should be stable, usually the domain ID.

12. Hybrid Pattern: Orchestration Publishes Facts

A process can publish events after meaningful milestones.

This pattern works well when event publication must be reliable and auditable.

The worker should not directly publish to Kafka and complete the job without thinking through the failure window. If publishing succeeds but job completion fails, the job may be retried and publish a duplicate. If job completion succeeds but publishing fails, downstream systems miss the event.

Use idempotent event IDs and an outbox where required.
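
One way to close the failure window is to derive the event ID deterministically from the job context and record the fact in an outbox before completing the job. The sketch below uses an in-memory map as a stand-in; in a real system the outbox would be a database table written in the same transaction as the domain change. `FactOutbox`, `Fact`, and `eventIdFor` are hypothetical names, and the ID scheme assumes the element runs at most once per process instance (use an element instance key otherwise).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FactOutbox {

    record Fact(String eventId, String type, String caseId) {}

    // Insertion-ordered so a relay can publish facts in the order they were recorded.
    private final Map<String, Fact> pending = new LinkedHashMap<>();

    /** Deterministic ID: a retried job produces the same event ID as the first attempt. */
    static String eventIdFor(long processInstanceKey, String elementId) {
        return processInstanceKey + ":" + elementId;
    }

    /** Records the fact once; returns false if this event ID was already recorded. */
    synchronized boolean record(Fact fact) {
        return pending.putIfAbsent(fact.eventId(), fact) == null;
    }

    /** Snapshot for a relay that publishes pending facts to the broker. */
    synchronized List<Fact> pendingFacts() {
        return new ArrayList<>(pending.values());
    }
}
```

With this shape, a retried job records nothing new, and consumers can still deduplicate on `eventId` if the relay itself publishes more than once.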


13. Pattern: Process as Policy, Services as Authority

The process should often own policy sequencing, not the authoritative domain data.

Example:

The process asks services to act. It does not replace them.

Good invariant:

Camunda variables are orchestration context. Domain services remain system of record.


14. Pattern: Event Router as Anti-Corruption Layer

Do not let every service know Camunda internals.

Instead, use an event router.

The router maps external event shape to Camunda correlation semantics.

Responsibilities:

  • validate event schema;
  • deduplicate by event ID;
  • derive correlation key;
  • choose message name;
  • publish message or start process;
  • track routing result;
  • avoid leaking process instance keys to external producers.

Example Java shape:

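// Assumes a Camunda 8 Java client version whose entry point is io.camunda.client.CamundaClient.
// Older client versions expose the same commands through ZeebeClient.
import io.camunda.client.CamundaClient;
import java.util.Map;
import java.util.Objects;

// CaseEvent and RoutedEventRepository are application types that are not shown here.
public final class CaseEventRouter {

    private final CamundaClient camundaClient;
    private final RoutedEventRepository routedEvents;

    public CaseEventRouter(CamundaClient camundaClient, RoutedEventRepository routedEvents) {
        this.camundaClient = camundaClient;
        this.routedEvents = routedEvents;
    }

    public void route(CaseEvent event) {
        // Deduplicate by event ID so a redelivered event is not routed twice.
        if (routedEvents.alreadyHandled(event.eventId())) {
            return;
        }

        switch (event.type()) {
            case "CASE_OPENED" -> startCaseLifecycle(event);
            case "EVIDENCE_SUBMITTED" -> correlateEvidenceSubmitted(event);
            case "APPEAL_RECEIVED" -> correlateAppealReceived(event);
            default -> {
                // Unknown types are recorded, not silently dropped.
                routedEvents.markIgnored(event.eventId(), event.type());
                return;
            }
        }

        // Reached only if the Camunda command succeeded. A crash before this line means
        // the event is routed again on redelivery, so the commands must tolerate duplicates.
        routedEvents.markHandled(event.eventId());
    }

    private void startCaseLifecycle(CaseEvent event) {
        // Note: creating an instance is not deduplicated by the engine in this sketch;
        // the routedEvents check above is the only guard against a second instance.
        camundaClient
            .newCreateInstanceCommand()
            .bpmnProcessId("regulatory-case-lifecycle")
            .latestVersion()
            .variables(Map.of(
                "caseId", event.caseId(),
                "sourceEventId", event.eventId()
            ))
            .send()
            .join();
    }

    private void correlateEvidenceSubmitted(CaseEvent event) {
        camundaClient
            .newPublishMessageCommand()
            .messageName("EvidenceSubmitted")
            .correlationKey(event.caseId())   // stable domain ID, not the process instance key
            .messageId(event.eventId())       // lets the broker reject a duplicate while the message is buffered
            .variables(Map.of(
                "caseId", event.caseId(),
                "evidenceId", required(event, "evidenceId")
            ))
            .send()
            .join();
    }

    private void correlateAppealReceived(CaseEvent event) {
        camundaClient
            .newPublishMessageCommand()
            .messageName("AppealReceived")
            .correlationKey(event.caseId())
            .messageId(event.eventId())
            .variables(Map.of(
                "caseId", event.caseId(),
                "appealId", required(event, "appealId")
            ))
            .send()
            .join();
    }

    // Map.of rejects null values, so fail with a clear message when a payload field is missing.
    private static Object required(CaseEvent event, String field) {
        return Objects.requireNonNull(event.payload().get(field), "missing payload field: " + field);
    }
}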

This code is intentionally simple. In production, separate routing, idempotency, schema validation, error handling, and client adapter concerns.


15. Pattern: Command Worker for Service Autonomy

A Camunda worker should not contain deep domain logic.

It should call the owning service.

The domain service owns:

  • validation;
  • transaction;
  • persistence;
  • domain invariants;
  • authorization;
  • audit record;
  • idempotency if the operation mutates state.

The worker owns:

  • job contract;
  • mapping variables to request;
  • retry/error mapping;
  • telemetry;
  • idempotency key propagation;
  • process output mapping.
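
The split of responsibilities can be sketched without any client library. Everything in the block below is hypothetical (`AssignInvestigatorWorker`, `AssignmentService`, and the `Outcome` types stand in for a real job handler and a real service client); it only shows the worker mapping variables to a request, passing an idempotency key through, and translating results into engine-facing outcomes.

```java
import java.util.Map;
import java.util.Objects;

/** Sketch of a thin worker; every type here is a stand-in for real client and service APIs. */
final class AssignInvestigatorWorker {

    /** Request sent to the owning domain service, carrying the idempotency key. */
    record AssignRequest(String caseId, String idempotencyKey) {}

    /** Domain result: an investigator ID on success, or a business rejection code. */
    record AssignResult(String investigatorId, String rejectionCode) {}

    /** Port to the domain service, which owns validation, persistence, and invariants. */
    interface AssignmentService {
        AssignResult assign(AssignRequest request) throws Exception;
    }

    /** What the worker reports back to the engine. */
    interface Outcome {}
    record Completed(Map<String, Object> variables) implements Outcome {}
    record BusinessError(String errorCode) implements Outcome {}
    record Retry(String reason) implements Outcome {}

    private final AssignmentService service;

    AssignInvestigatorWorker(AssignmentService service) {
        this.service = service;
    }

    /**
     * @param idempotencyKey assumed to be stable across retries of the same job
     * @param variables      process variables from the activated job
     */
    Outcome handle(String idempotencyKey, Map<String, Object> variables) {
        // Map variables to a request; the worker holds no domain rules of its own.
        String caseId = (String) Objects.requireNonNull(variables.get("caseId"), "caseId is required");
        try {
            AssignResult result = service.assign(new AssignRequest(caseId, idempotencyKey));
            if (result.rejectionCode() != null) {
                // Business outcome: let the process model choose the alternate path.
                return new BusinessError(result.rejectionCode());
            }
            // Output mapping: return a reference, not the whole aggregate.
            return new Completed(Map.of("investigatorId", result.investigatorId()));
        } catch (Exception technicalFailure) {
            // Technical failure: report it for retry/backoff instead of modeling it in BPMN.
            return new Retry(technicalFailure.getClass().getSimpleName());
        }
    }
}
```

In a real worker, the three outcomes would map to completing the job, throwing a BPMN error, and failing the job with remaining retries.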

16. Pattern: Process-Local Decision, Event-Global Fact

A process may use DMN internally, then publish the result as a fact.

DMN is local policy. Event is global fact.

Do not publish every intermediate variable. Publish facts that external consumers can rely on.


17. Pattern: Escalation as Orchestration, Notification as Choreography

Escalation often belongs in the process.

Notification usually belongs at the edge.

The process owns the escalation state. The notification service owns email/SMS/channel delivery.

This keeps the process defensible without over-modeling every notification path.


18. Pattern: Process State Projection

Some consumers need read models, not orchestration control.

A process status projection can answer:

  • current milestone;
  • assigned group;
  • pending wait state;
  • due date;
  • escalation state;
  • high-level outcome.

Do not make every consumer call Operate APIs for business integration. Build stable read models when needed.
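
A status projection can be as small as a map keyed by the domain ID and updated from published facts. The sketch below is an in-memory illustration; `CaseStatusProjection` and the two event records are assumed names, and a production read model would be persisted and rebuilt from the event log.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class CaseStatusProjection {

    /** Read model exposed to consumers; pendingWait is empty when nothing is pending. */
    record Status(String caseId, String milestone, String pendingWait, boolean escalated) {}

    // Hypothetical facts published by the process at meaningful milestones.
    record MilestoneReached(String caseId, String milestone, String pendingWait) {}
    record CaseEscalated(String caseId) {}

    private final Map<String, Status> byCaseId = new ConcurrentHashMap<>();

    /** Creates the row on the first milestone, then updates it while keeping the escalation flag. */
    void on(MilestoneReached e) {
        byCaseId.merge(
            e.caseId(),
            new Status(e.caseId(), e.milestone(), e.pendingWait(), false),
            (old, fresh) -> new Status(old.caseId(), fresh.milestone(), fresh.pendingWait(), old.escalated()));
    }

    /** Marks an existing case as escalated; in this sketch the flag is never cleared. */
    void on(CaseEscalated e) {
        byCaseId.computeIfPresent(
            e.caseId(), (id, s) -> new Status(id, s.milestone(), s.pendingWait(), true));
    }

    Optional<Status> find(String caseId) {
        return Optional.ofNullable(byCaseId.get(caseId));
    }
}
```

Consumers query the projection by case ID and never need a process instance key or access to engine APIs.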


19. Orchestration Smells

Smell 1 — BPMN as integration spaghetti

Symptoms:

  • dozens of tiny service tasks;
  • no business labels;
  • each task maps to one HTTP endpoint;
  • business stakeholders cannot read it;
  • every service deployment requires BPMN change.

Fix:

  • raise abstraction;
  • group technical steps behind service-owned APIs;
  • model business milestones;
  • keep implementation detail in workers/services.

Smell 2 — Process owns all domain state

Symptoms:

  • process variables contain complete aggregate records;
  • workers read/write huge JSON blobs;
  • services are thin CRUD wrappers;
  • incidents expose sensitive data;
  • state repair is hard.

Fix:

  • keep domain state in domain services;
  • keep process variables minimal;
  • store references and decisions;
  • use data contracts.

Smell 3 — Central process controls optional reactions

Symptoms:

  • BPMN has branches for analytics, email, archive, dashboard, BI, ML, cache refresh;
  • failure of optional consumer blocks core lifecycle;
  • process changes whenever a new consumer appears.

Fix:

  • publish domain/integration events;
  • let optional consumers subscribe;
  • model only required business effects.

Smell 4 — One process to rule them all

Symptoms:

  • one giant BPMN diagram for a whole organization;
  • many unrelated actors and lifecycles;
  • multi-entity state tangled together;
  • versioning becomes impossible.

Fix:

  • split by lifecycle ownership;
  • use call activities carefully;
  • use events for cross-lifecycle propagation;
  • define process APIs.

20. Choreography Smells

Smell 1 — Event soup

Symptoms:

  • hundreds of events;
  • inconsistent names;
  • unclear source of truth;
  • no lifecycle owner;
  • consumers rely on event order that is not guaranteed globally.

Fix:

  • define event taxonomy;
  • publish only stable facts;
  • document event ownership;
  • use process orchestration where lifecycle matters.

Smell 2 — Hidden process in consumers

Symptoms:

  • each service stores partial workflow state;
  • retries and timers implemented independently;
  • escalation logic duplicated;
  • no single operator view.

Fix:

  • move lifecycle policy into BPMN;
  • keep services authoritative for local data;
  • use events for notifications and projections.

Smell 3 — Commands masquerading as events

Symptoms:

  • event names are imperative;
  • consumers must act exactly once;
  • producer expects a specific consumer to react;
  • missing consumer is a business failure.

Fix:

  • model as command/reply or service task;
  • make dependency explicit;
  • correlate reply to process instance.

Smell 4 — No recovery owner

Symptoms:

  • failure requires manual database changes;
  • no process instance shows where the case is blocked;
  • compensating actions are ad hoc;
  • teams disagree about who should fix the issue.

Fix:

  • define failure ownership;
  • model recovery in BPMN where business-visible;
  • use incidents for technical intervention;
  • use compensation or forward recovery for completed side effects.

21. Regulatory Systems Lens

In regulatory enforcement systems, orchestration is usually justified for the core case lifecycle.

Why:

  • process stage has legal meaning;
  • deadlines matter;
  • human decision checkpoints must be auditable;
  • evidence intake must be traceable;
  • escalation must be defensible;
  • outcome must be explainable;
  • recovery must not depend on hidden logs;
  • process changes often require governance.

But not everything belongs in BPMN.

Good split:

| Concern | Preferred Ownership |
| --- | --- |
| Case lifecycle | Camunda process |
| Case aggregate state | Case service |
| Evidence binary/document metadata | Document service |
| Risk scoring model | Risk service / DMN where policy-level |
| Notifications | Notification service + events |
| Audit log | Audit service / event store |
| Analytics | Projection/warehouse |
| SLA escalation state | Camunda process |
| User task assignment policy | Camunda + IAM/task app |
| Long-term record retention | Archive service |

22. Designing an Orchestration Contract

A process should expose a contract to the rest of the system.

Minimum process contract:

process:
  bpmnProcessId: regulatory-case-lifecycle
  versioning: versionTag + deployment governance
  startsBy:
    - command: StartCaseLifecycle
      requiredVariables:
        - caseId
        - caseType
        - openedAt
  messages:
    - name: EvidenceSubmitted
      correlationKey: caseId
      variables:
        - evidenceId
        - submittedAt
    - name: AppealReceived
      correlationKey: caseId
      variables:
        - appealId
        - receivedAt
  publishedFacts:
    - CaseAssessmentStarted
    - DecisionIssued
    - EnforcementActionClosed
  workerContracts:
    - classify-case
    - assign-investigator
    - request-evidence
    - issue-decision

This makes orchestration a platform contract, not a hidden implementation detail.
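
A contract is most useful when something enforces it. As a small illustration, the check below validates the start variables listed under `requiredVariables` before a start command is sent. `StartContract` is a hypothetical helper; in practice the list would be loaded from the contract file rather than hard-coded.

```java
import java.util.List;
import java.util.Map;

final class StartContract {

    // Mirrors requiredVariables for StartCaseLifecycle in the contract; hard-coded for the sketch.
    static final List<String> REQUIRED = List.of("caseId", "caseType", "openedAt");

    /** Returns the names of required variables that are absent or null, in contract order. */
    static List<String> missing(Map<String, Object> variables) {
        return REQUIRED.stream()
            .filter(name -> variables.get(name) == null)
            .toList();
    }
}
```

A router or API facade could reject a start request whenever `missing(...)` is non-empty, so contract violations surface at the boundary instead of as incidents inside the process.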


23. Choosing BPMN Elements for Integration

| Situation | Recommended Modeling |
| --- | --- |
| Need response before continuing | Service task |
| Send event/fact to broker | Send task or service task with clear label |
| Wait for external response | Message catch event / receive task / service task, depending on the abstraction level |
| Wait for external event with business meaning | Message intermediate catch event |
| Start lifecycle from external event | Message start event or start command from router |
| Optional downstream consumers | Publish event; do not model each consumer |
| Human intervention required | User task |
| Business rule decision | DMN business rule task |
| Technical failure retry | Job failure/retry/backoff |
| Business alternate path | BPMN error/escalation/gateway |

24. Operational Visibility Model

Orchestration gives visibility, but only for what it owns.

A good visibility architecture combines:

Operate shows process execution and incidents. It is not a replacement for domain audit, log aggregation, tracing, or analytics.

Design visibility per concern:

| Concern | Visibility Source |
| --- | --- |
| Where is process blocked? | Operate / process projection |
| Why did worker fail? | worker logs/traces |
| What business action occurred? | domain audit log |
| Who approved? | task/user audit + domain record |
| Which event was emitted? | event outbox/broker metadata |
| Which SLA breached? | process variable/projection + metrics |

25. Modeling Example: Case Assignment

Bad purely choreographed design

This can work, but there are hidden questions:

  • What if risk score never arrives?
  • Who owns assignment timeout?
  • What if assignment fails?
  • Can a supervisor manually override?
  • Where does a caseworker see current stage?
  • How is reassignment audited?

Better orchestrated core

Now the process owns the business lifecycle and failure path.


26. Modeling Example: Notifications

Bad over-orchestrated design

This blocks the business process on optional delivery channels.

Better edge choreography

If a legally required notice must be sent before the decision is effective, keep that notice as part of orchestration. Optional channels should be choreographed.


27. Failure Ownership Matrix

| Failure | Orchestration Responsibility | Service Responsibility |
| --- | --- | --- |
| Risk service unavailable | retry/backoff/incident/escalation | expose stable error semantics |
| Duplicate job activation | use idempotency key | deduplicate operation |
| Event publish duplicate | use message/event ID | consumer idempotency |
| Notification delivery failure | only if legally required | channel retry/dead-letter |
| Evidence missing | wait/escalate/request again | document service owns metadata |
| Assignment conflict | model alternate path | assignment service enforces invariant |
| Process model bug | incident/runbook/version fix | n/a |
| Domain validation error | BPMN business path or modeled rejection | domain service returns meaningful result |
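
For the retry/backoff entry, a worker typically computes a delay when it reports a technical failure. The helper below is a generic capped exponential backoff sketch; `RetryBackoff` and its parameters are assumptions for illustration, not a Camunda API, and the resulting duration would be passed to whatever retry mechanism the worker uses.

```java
import java.time.Duration;

final class RetryBackoff {

    /**
     * Capped exponential backoff: base * 2^attempt, never more than cap.
     * Attempt 0 is the first retry.
     */
    static Duration delayFor(int attempt, Duration base, Duration cap) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        long factor = 1L << Math.min(attempt, 20); // bound the shift to avoid overflow
        Duration delay = base.multipliedBy(factor);
        return delay.compareTo(cap) > 0 ? cap : delay;
    }
}
```

Once retries are exhausted, the failure should surface as an incident or a modeled escalation rather than retrying indefinitely.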

28. Versioning Implications

Orchestration versioning is explicit.

A process definition version determines behavior for future instances. Existing instances may continue under old versions unless migrated. That is useful for governance.

Choreography versioning is distributed.

Event schema changes affect all consumers. Behavior changes may be harder to see.

Use this decision:

  • if policy change must be approved, deployed, visible, and possibly applied only to new cases, orchestration versioning is valuable;
  • if new consumers can react to old facts without changing producer behavior, event schema versioning is enough.

29. Performance Considerations

Do not orchestrate high-volume low-value technical events.

Bad fit:

  • every click;
  • every telemetry sample;
  • every internal cache update;
  • every row-level CRUD change;
  • every low-value notification attempt.

Good fit:

  • long-running business case;
  • multi-step approval;
  • regulated decision lifecycle;
  • cross-service business transaction;
  • human-in-the-loop workflow;
  • external wait state;
  • compensation/recovery process.

Zeebe is scalable, but not every event deserves a process instance.


30. Team Ownership

Orchestration introduces a process owner role.

Possible ownership models:

| Model | Works When | Risk |
| --- | --- | --- |
| Central workflow team owns all BPMN | strong governance required | bottleneck |
| Domain team owns its process | domains are mature | inconsistent modeling |
| Platform team owns runtime/golden path | many teams build processes | requires standards |
| Process council reviews production BPMN | regulated environments | review overhead |

For serious systems, use platform engineering:

  • BPMN standards;
  • worker contract templates;
  • naming conventions;
  • test requirements;
  • incident runbooks;
  • process review checklist;
  • versioning rules.

31. Practical Checklist

Before choosing orchestration, ask:

  • What is the lifecycle name?
  • Who owns the lifecycle outcome?
  • What are the legally/business-relevant states?
  • Which waits are business-visible?
  • Which failures require human visibility?
  • Which steps mutate external systems?
  • Which actions must be idempotent?
  • Which outputs must be published as facts?
  • What data is needed in process variables?
  • What stays in domain services?
  • What is the versioning policy?
  • What is the operational runbook?

Before choosing choreography, ask:

  • Is the event a stable fact?
  • Who owns the event schema?
  • Can consumers fail independently?
  • Does producer need a reply?
  • Is global ordering required?
  • How are duplicates handled?
  • How is event replay handled?
  • How are downstream failures observed?
  • Is there a hidden lifecycle emerging across services?

32. Mental Model Summary

Orchestration is best for explicit lifecycle control.

Choreography is best for autonomous reaction to facts.

The best enterprise design usually looks like this:

This gives you:

  • explicit lifecycle;
  • autonomous services;
  • visible recovery;
  • stable events;
  • clear contracts;
  • auditability;
  • scalability;
  • governance without centralizing everything.

33. Top 1% Engineering Rubric

You understand orchestration vs choreography when you can:

  • explain lifecycle ownership without mentioning tooling;
  • identify hidden processes in event-driven systems;
  • distinguish domain event, command, reply, and correlation message;
  • decide when BPMN should expose a wait state;
  • keep domain state out of process variables;
  • design idempotent publication and correlation;
  • define failure ownership per step;
  • prevent BPMN from becoming integration spaghetti;
  • prevent events from becoming ungoverned soup;
  • design hybrid architecture with clear contracts.

35. What Comes Next

Part 022 moves from coordination style to business transaction design.

We will cover Saga and compensation patterns: how to design long-running business transactions where ACID rollback is impossible, side effects may already have happened, and regulatory auditability matters.
