Service Collaboration Patterns

Learn Java Microservices Design and Architect - Part 028

Designing collaboration patterns between Java microservices: request-response, asynchronous messaging, command queue, publish-subscribe, orchestration, choreography, API composition, BFF, hybrid collaboration, and a decision framework.

Part 028 — Service Collaboration Patterns

There is no single best communication pattern. There are only patterns that fit a particular latency, consistency, ownership, failure model, and user journey.

Microservices do not live alone. They collaborate to deliver a business outcome.

The problem is that many designs jump straight to a technology:

“Use REST.”
“Use Kafka.”
“Use gRPC.”
“Use a workflow engine.”

That is the wrong order of thinking. The first question is not about technology. The first question is:

What form of collaboration does this business flow need?

This part builds a decision model for choosing collaboration patterns between services.


1. Collaboration Is a Business Design Problem

A single business flow can touch many services.

Example: submitting an enforcement case.

If every interaction is drawn as a plain arrow, we have not designed anything yet. We have only drawn dependencies.

For each arrow, ask:

  1. does the caller need the answer now?
  2. must the action succeed as a single business unit?
  3. does the receiver have its own ownership?
  4. should a receiver failure fail the caller?
  5. is the user waiting for the result?
  6. may the data be stale?
  7. can the request be repeated?
  8. does ordering matter?
  9. does the flow need audit/reconstruction?
  10. is there a human step, timer, SLA, or escalation?

The answers to these questions determine the collaboration pattern.


2. Pattern Map

The main patterns:

| Pattern | Core idea | Best for |
|---|---|---|
| Request-response | Caller waits for the response | Immediate query/decision |
| Async command | Sender asks the receiver to perform an action later | Targeted background task |
| Publish-subscribe event | Producer publishes a fact, many consumers react | Side effects, projections |
| API composition | Aggregator combines multiple APIs | Read experience |
| BFF | Backend dedicated to a client journey | UI-specific aggregation |
| Orchestration | Coordinator drives the steps of the flow | Long-running business process |
| Choreography | Services react to each other's events | Simple decentralized process |
| Hybrid | Mix of patterns chosen per boundary | Most real systems |

No large enterprise system uses only one pattern.


3. Pattern 1: Request-Response

Request-response means the caller sends a request and waits for the response.

Example transports:

  • REST/HTTP;
  • gRPC;
  • GraphQL internal query;
  • synchronous message RPC, although it is rarely recommended.

When It Fits

Use request-response when:

  • the caller needs the answer in order to continue;
  • the user is waiting for the response;
  • the data must be fresh;
  • the receiver is the authority for the decision;
  • the failure can be shown to the user directly;
  • the timeout can be kept short and explicit.

Examples:

  • Case Service asks Risk Service to calculate risk before submit;
  • the UI fetches case detail;
  • Payment Service requests a fraud decision;
  • Policy Service evaluates eligibility;
  • the API Gateway fetches a compact profile for the response.

Failure Model

Request-response introduces a runtime dependency.

If Risk Service is slow, Case Service becomes slow too. If Risk Service is down, Case Service has to choose:

  • fail closed;
  • fail open;
  • degrade;
  • use cached decision;
  • return pending/accepted;
  • reject operation.

Do not make a remote call without a failure policy.

public SubmitCaseResult submit(SubmitCaseCommand command) {
    RiskDecision risk = riskClient.calculateRisk(
        new RiskRequest(command.caseId()),
        Deadline.after(Duration.ofMillis(300))
    );

    if (risk.isTooHighForAutoSubmit()) {
        return SubmitCaseResult.requiresManualReview(risk.reason());
    }

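    // Continue with the local transaction, then report the outcome.
    // SubmitCaseResult.submitted(...) is a hypothetical factory, shown so the method returns on every path.
    return SubmitCaseResult.submitted(command.caseId());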
}
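
A minimal, self-contained sketch of an explicit failure policy for the risk call, assuming a fail-closed policy that routes the case to manual review when Risk Service is slow or down. `RiskClient`, `RiskDecision`, `SubmitOutcome`, and `RiskFailurePolicyDemo` are hypothetical names for illustration; only `CompletableFuture` and its `orTimeout`/`exceptionally` methods are real JDK API.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical types for illustration; not part of any real library.
record RiskDecision(boolean tooHighForAutoSubmit, String reason) {}

enum SubmitOutcome { SUBMITTED, MANUAL_REVIEW }

interface RiskClient {
    RiskDecision calculateRisk(String caseId);
}

class RiskFailurePolicyDemo {

    // Fail-closed policy (an assumption): a timeout or error is treated like a high-risk result.
    static SubmitOutcome submit(RiskClient client, String caseId, Duration timeout) {
        RiskDecision decision = CompletableFuture
            .supplyAsync(() -> client.calculateRisk(caseId))
            .orTimeout(timeout.toMillis(), TimeUnit.MILLISECONDS)
            .exceptionally(ex -> new RiskDecision(true, "risk service unavailable"))
            .join();
        return decision.tooHighForAutoSubmit() ? SubmitOutcome.MANUAL_REVIEW : SubmitOutcome.SUBMITTED;
    }

    public static void main(String[] args) {
        RiskClient healthy = id -> new RiskDecision(false, "low risk");
        RiskClient failing = id -> { throw new IllegalStateException("connection refused"); };
        RiskClient slow = id -> {
            try { Thread.sleep(2_000); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            return new RiskDecision(false, "low risk");
        };
        Duration timeout = Duration.ofMillis(300);
        System.out.println(submit(healthy, "CASE-1", timeout));
        System.out.println(submit(failing, "CASE-1", timeout));
        System.out.println(submit(slow, "CASE-1", timeout));
    }
}
```

The point of the sketch is that the fallback is a named business decision (manual review), not an accident of exception handling. A fail-open or cached-decision policy would change only the `exceptionally` branch.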

Smells

Request-response smell:

  • service calls 5+ dependencies in one user request;
  • no timeout;
  • default HTTP client config;
  • retry on all failures;
  • no circuit breaker;
  • remote call inside database transaction;
  • no fallback policy;
  • response assembled from many downstream calls without budget.

Rule

Use request-response only when the caller genuinely needs the answer now.


4. Pattern 2: Asynchronous Command

Async command means the sender asks the receiver to perform an action but does not wait for an immediate result.

This message is not an event. It is a command.

Bad naming:

SendCaseSubmittedNotificationEvent

Better:

SendCaseSubmittedNotificationCommand

When It Fits

Use an async command when:

  • there is a clear target receiver;
  • the sender wants the receiver to perform an action;
  • the action can happen later;
  • the sender does not need an immediate response;
  • retry/queueing is desired;
  • the action has its own lifecycle.

Examples:

  • send email;
  • generate PDF;
  • run expensive validation;
  • export report;
  • submit batch reconciliation;
  • request document scanning.

Command Message Shape

{
  "commandId": "cmd-001",
  "commandType": "SendCaseSubmittedNotification",
  "targetService": "notification-service",
  "requestedBy": "case-service",
  "requestedAt": "2026-07-05T10:15:30Z",
  "correlationId": "corr-001",
  "idempotencyKey": "case-submitted:CASE-2026-00041",
  "payload": {
    "caseId": "CASE-2026-00041",
    "template": "CASE_SUBMITTED"
  }
}
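
On the Java side, the same shape could be modeled as an immutable envelope. This is a sketch under assumptions: `CommandEnvelope` and its factory method are hypothetical names, and the payload is simplified to a string map. It shows one design choice worth making explicit: the idempotency key is derived from business identity, while the command ID is unique per message.

```java
import java.time.Instant;
import java.util.Map;
import java.util.UUID;

// Hypothetical envelope mirroring the JSON shape above; field names follow the example.
record CommandEnvelope(
    String commandId,
    String commandType,
    String targetService,
    String requestedBy,
    Instant requestedAt,
    String correlationId,
    String idempotencyKey,
    Map<String, String> payload
) {
    // The idempotency key comes from the business identity (the case), not from the
    // random commandId, so a re-sent command for the same case maps to the same key.
    static CommandEnvelope caseSubmittedNotification(String caseId, String correlationId, Instant now) {
        return new CommandEnvelope(
            "cmd-" + UUID.randomUUID(),
            "SendCaseSubmittedNotification",
            "notification-service",
            "case-service",
            now,
            correlationId,
            "case-submitted:" + caseId,
            Map.of("caseId", caseId, "template", "CASE_SUBMITTED")
        );
    }
}
```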

Failure Model

The sender must know whether the command was accepted into a durable queue. The receiver owns execution.

Important states:
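
A minimal sketch of a command status model, assuming a lifecycle of accepted, processing, retry-scheduled, completed, and dead-lettered. The state names and allowed transitions are illustrative assumptions, not a prescribed set; the useful property is that "accepted into the queue" and "completed" are distinct states.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical lifecycle; state names and transitions are assumptions for illustration.
enum CommandStatus {
    ACCEPTED, PROCESSING, RETRY_SCHEDULED, COMPLETED, DEAD_LETTERED;

    private static final Map<CommandStatus, Set<CommandStatus>> ALLOWED = Map.of(
        ACCEPTED, Set.of(PROCESSING),
        PROCESSING, Set.of(COMPLETED, RETRY_SCHEDULED, DEAD_LETTERED),
        RETRY_SCHEDULED, Set.of(PROCESSING, DEAD_LETTERED),
        COMPLETED, Set.of(),
        DEAD_LETTERED, Set.of()
    );

    // True when moving from this status to the next one is a legal transition.
    boolean canMoveTo(CommandStatus next) {
        return ALLOWED.get(this).contains(next);
    }
}
```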

Java Handler

@Transactional
public void handle(SendCaseSubmittedNotificationCommand command) {
    if (processedCommands.exists(command.commandId())) {
        return;
    }

    NotificationRequest request = NotificationRequest.caseSubmitted(
        command.idempotencyKey(),
        command.payload().caseId(),
        command.payload().template()
    );

    notificationRepository.saveIfAbsent(request);
    processedCommands.markProcessed(command.commandId(), clock.instant());
}

Smells

  • command broadcast to many services;
  • command named as event;
  • no idempotency key;
  • sender assumes success after enqueue;
  • no command status model;
  • poison command blocks queue;
  • receiver performs non-idempotent side effect directly.

5. Pattern 3: Publish-Subscribe Event

Publish-subscribe means the producer publishes a fact and multiple consumers react independently.

When It Fits

Use publish-subscribe when:

  • the producer does not need to know its consumers;
  • side effects must not block the core transaction;
  • many consumers are interested in the same fact;
  • consumers can accept eventual consistency;
  • the event has a clear domain meaning;
  • replay/projection is useful.

Example

CaseSubmitted published by Case Service.

Consumers:
- SLA Service creates review deadline.
- Search Service indexes case.
- Audit Service records immutable audit event.
- Notification Service creates notification request.
- Reporting Service updates daily count projection.

Failure Model

  • producer success does not mean every consumer succeeded;
  • a consumer failure must stay local;
  • broker delivery may be duplicated;
  • event order is not global;
  • replay must be safe;
  • the DLQ must be monitored.
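
Duplicate delivery and replay are only safe when the consumer is idempotent. A minimal in-memory sketch, assuming each event carries a unique `eventId`; `CaseSubmittedEvent` and `DailyCountProjection` are hypothetical names, and in production the processed-ID set would normally live in the consumer's own database and be updated in the same transaction as the projection.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical event shape for illustration.
record CaseSubmittedEvent(String eventId, String caseId, String reportDate) {}

class DailyCountProjection {
    private final Set<String> processedEventIds = new HashSet<>();
    private final Map<String, Integer> countByDate = new HashMap<>();

    void on(CaseSubmittedEvent event) {
        // Set.add returns false for an already-seen ID, so duplicates and replays are no-ops.
        if (!processedEventIds.add(event.eventId())) {
            return;
        }
        countByDate.merge(event.reportDate(), 1, Integer::sum);
    }

    int countFor(String date) {
        return countByDate.getOrDefault(date, 0);
    }
}
```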

Smells

  • event is actually command;
  • no event owner;
  • no schema compatibility;
  • no consumer idempotency;
  • no dead-letter handling;
  • event payload is entire database row;
  • consumer does many synchronous callbacks.

6. Pattern 4: API Composition

API composition means a single layer reads from several services and assembles the response.

When It Fits

Use API composition when:

  • the UI needs a combined view;
  • the underlying ownership stays separate;
  • the view is not the source of truth;
  • the latency budget is still reasonable;
  • a partial response is acceptable or explicitly defined.

Failure Model

API composition is prone to fan-out latency.

If four dependencies each have a p95 of 200ms, the aggregator's p95 is not automatically 200ms. Tail latency gets worse because the aggregator waits for the slowest dependency.
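
A rough worked number makes this concrete. Assuming the four calls run in parallel and their latencies are independent (a simplification, since real dependencies are often correlated), the chance that all four finish within 200ms is 0.95^4, roughly 0.81. So 200ms is closer to the aggregator's p81 than its p95. `FanOutLatencyDemo` is a hypothetical name for this sketch.

```java
class FanOutLatencyDemo {

    // Probability that all n independent calls finish within a threshold,
    // given that each one finishes within it with probability p.
    static double allWithin(double p, int n) {
        return Math.pow(p, n);
    }

    public static void main(String[] args) {
        double probability = allWithin(0.95, 4);
        System.out.println("P(all four within 200ms) is about " + probability);
    }
}
```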

Design rules:

  • parallelize calls when it is safe;
  • set per-dependency timeout;
  • set total deadline;
  • support partial response when the business allows it;
  • avoid deep dependency chains;
  • cache reference data;
  • observe downstream latency separately.

Java Sketch

public CaseWorkbenchView getWorkbench(CaseId caseId) {
    Deadline deadline = Deadline.after(Duration.ofMillis(800));

    CompletableFuture<CaseSummary> caseFuture = async.get(() ->
        caseClient.getSummary(caseId, deadline.slice(Duration.ofMillis(250)))
    );

    CompletableFuture<RiskSummary> riskFuture = async.get(() ->
        riskClient.getSummary(caseId, deadline.slice(Duration.ofMillis(250)))
    );

    CompletableFuture<EvidenceSummary> evidenceFuture = async.get(() ->
        evidenceClient.getSummary(caseId, deadline.slice(Duration.ofMillis(250)))
    );

    return CaseWorkbenchView.compose(
        caseFuture.join(),
        riskFuture.exceptionally(ex -> RiskSummary.unavailable()).join(),
        evidenceFuture.exceptionally(ex -> EvidenceSummary.unavailable()).join()
    );
}

Smells

  • aggregator contains a business rule that the owning service should own;
  • aggregator writes to multiple services;
  • aggregator becomes god service;
  • no partial failure model;
  • no latency budget;
  • nested aggregators call nested aggregators.

7. Pattern 5: Backend-for-Frontend

BFF is API composition shaped around a specific client experience.

When It Fits

Use a BFF when:

  • clients have different needs;
  • mobile needs a small payload;
  • web needs a richer view;
  • client release cadences differ;
  • the UI journey needs specific composition;
  • security/token/session behavior differs.

Boundary Rule

A BFF may know about the user experience. A BFF must not become the owner of a business invariant.

Good BFF responsibility:

  • view shaping;
  • client-specific aggregation;
  • UI workflow convenience;
  • presentation-level permission filtering;
  • payload optimization.

Bad BFF responsibility:

  • deciding case can be approved;
  • calculating regulatory deadline;
  • owning enforcement state machine;
  • writing to multiple domain services without process owner;
  • storing authoritative domain data.
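
A small sketch of what "view shaping" can look like, assuming a single upstream `CaseDetail` and two client-specific views. All type names here are hypothetical. The mapper only selects and reshapes fields; it never decides or changes business state.

```java
import java.util.List;

// Hypothetical upstream DTO and client-specific views for illustration.
record CaseDetail(String caseId, String status, String officerName, List<String> evidenceIds, String internalNotes) {}
record MobileCaseCard(String caseId, String status) {}
record WebCaseView(String caseId, String status, String officerName, int evidenceCount) {}

class OfficerBffMapper {

    // Mobile gets a small payload: identifier and status only.
    static MobileCaseCard toMobile(CaseDetail detail) {
        return new MobileCaseCard(detail.caseId(), detail.status());
    }

    // Web gets a richer view, but still no internal notes and no derived business decisions.
    static WebCaseView toWeb(CaseDetail detail) {
        return new WebCaseView(detail.caseId(), detail.status(), detail.officerName(), detail.evidenceIds().size());
    }
}
```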

8. Pattern 6: Orchestration

Orchestration means a coordinator decides the steps of the process.

When It Fits

Use orchestration when:

  • the process is long-running;
  • there is complex branching logic;
  • there is a timer/SLA;
  • there is a human task;
  • central visibility is needed;
  • compensation is needed;
  • audit/reconstruction is needed;
  • debugging the choreography is too hard;
  • the process owner is clear.

Process State

Java-Like Orchestrator Sketch

public class CaseSubmissionProcess {

    public void start(SubmitCaseCommand command) {
        processStore.create(command.processId(), Step.CASE_SUBMISSION_STARTED);
        commandBus.send(new SubmitCaseInternalCommand(command.caseId(), command.officerId()));
    }

    public void on(CaseSubmitted event) {
        processStore.move(event.processId(), Step.RISK_REQUESTED);
        commandBus.send(new CalculateRiskCommand(event.caseId()));
    }

    public void on(RiskCalculated event) {
        if (event.isHighRisk()) {
            processStore.move(event.processId(), Step.MANUAL_REVIEW_REQUIRED);
            commandBus.send(new RequestManualReviewCommand(event.caseId()));
            return;
        }

        processStore.move(event.processId(), Step.EVIDENCE_VALIDATION_REQUESTED);
        commandBus.send(new ValidateEvidenceCommand(event.caseId()));
    }
}

This sketch is simplified. In production, a workflow engine or durable process manager usually handles state, timers, retries, and history better than ad-hoc code.
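
To make the process state explicit, the steps used in the sketch above can be modeled as a small state machine. This is an in-memory illustration only: `ProcessStep` reuses the step names from the orchestrator sketch, while `InMemoryProcessStore` and its transition table are assumptions. A real store would be durable and keep history.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

enum ProcessStep {
    CASE_SUBMISSION_STARTED, RISK_REQUESTED, MANUAL_REVIEW_REQUIRED, EVIDENCE_VALIDATION_REQUESTED
}

// Hypothetical in-memory store; the allowed transitions are an assumption for illustration.
class InMemoryProcessStore {
    private static final Map<ProcessStep, List<ProcessStep>> NEXT = Map.of(
        ProcessStep.CASE_SUBMISSION_STARTED, List.of(ProcessStep.RISK_REQUESTED),
        ProcessStep.RISK_REQUESTED,
            List.of(ProcessStep.MANUAL_REVIEW_REQUIRED, ProcessStep.EVIDENCE_VALIDATION_REQUESTED),
        ProcessStep.MANUAL_REVIEW_REQUIRED, List.of(),
        ProcessStep.EVIDENCE_VALIDATION_REQUESTED, List.of()
    );

    private final Map<String, ProcessStep> current = new HashMap<>();

    void create(String processId, ProcessStep initial) {
        if (current.putIfAbsent(processId, initial) != null) {
            throw new IllegalStateException("process already exists: " + processId);
        }
    }

    // Returns false when the process is already at the target step (duplicate event),
    // and throws when the transition is not allowed from the current step.
    boolean move(String processId, ProcessStep target) {
        ProcessStep now = current.get(processId);
        if (now == null) {
            throw new IllegalStateException("unknown process: " + processId);
        }
        if (now == target) {
            return false;
        }
        if (!NEXT.get(now).contains(target)) {
            throw new IllegalStateException(now + " -> " + target + " is not allowed");
        }
        current.put(processId, target);
        return true;
    }

    ProcessStep stepOf(String processId) {
        return current.get(processId);
    }
}
```

Treating a repeated transition as a no-op keeps the orchestrator safe when the same event is delivered twice.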

Failure Model

Orchestrator failure modes:

  • orchestrator becomes bottleneck;
  • orchestrator knows too much domain detail;
  • orchestrator creates tight command coupling;
  • process versioning is hard;
  • compensation logic incomplete;
  • participant idempotency missing.

Rule

Orchestration is good when process visibility and control are more valuable than maximum decentralization.


9. Pattern 7: Choreography

Choreography means there is no central coordinator. Services react to each other's events.

When It Fits

Use choreography when:

  • the flow is simple;
  • each participant has high autonomy;
  • there is no central process owner;
  • side effects are loosely coupled;
  • eventual consistency is acceptable;
  • a local failure does not have to fail the main flow.

Example

CaseSubmitted ->
  SLA Service creates deadline.
  Search Service updates index.
  Audit Service records fact.
  Notification Service sends message.

This is good choreography because these reactions are side effects of a fact.
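
The independence of these reactions can be sketched with a tiny in-memory bus. This is a stand-in for a real broker, used only to show the shape: `InMemoryEventBus` is a hypothetical class, and a real consumer would retry or dead-letter instead of just counting failures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for a broker; not a production event bus.
class InMemoryEventBus<E> {
    private final List<Consumer<E>> subscribers = new ArrayList<>();

    void subscribe(Consumer<E> subscriber) {
        subscribers.add(subscriber);
    }

    // Each subscriber is invoked independently; a failure in one does not stop the others.
    // Returns the number of subscribers that failed so the caller can surface it.
    int publish(E event) {
        int failures = 0;
        for (Consumer<E> subscriber : subscribers) {
            try {
                subscriber.accept(event);
            } catch (RuntimeException ex) {
                failures++; // a real consumer would retry or dead-letter here
            }
        }
        return failures;
    }
}
```

If Search Service fails in this sketch, SLA and Audit still react, which is exactly the property that makes side-effect choreography safe.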

When It Fails

Choreography degrades when:

  • the flow has many branches;
  • ordering between services matters;
  • compensation is required;
  • the business asks “at which stage is this case stuck?”;
  • debugging means reading 12 topics;
  • failure handling is scattered;
  • service A publishes an event so that B publishes an event so that C publishes an event so that D performs the core action.

Choreography Smell

If an event chain like the last item is really one business transaction, you probably need orchestration or a visible process manager.

Rule

Use choreography for independent reactions. Be careful using it for complex business processes.


10. Orchestration vs Choreography Decision Matrix

| Question | Prefer choreography | Prefer orchestration |
|---|---|---|
| Flow is simple side effects? | Yes | No need |
| Many branching decisions? | No | Yes |
| Human task/timer/SLA? | Maybe | Yes |
| Need central process visibility? | No | Yes |
| Need compensation? | Maybe | Yes |
| Teams highly autonomous? | Yes | Maybe |
| Process owner exists? | Maybe | Yes |
| Debugging distributed flow hard? | No | Yes |
| Failure handling local? | Yes | No |
| Regulatory audit requires reconstruction? | Maybe | Yes |

In regulatory systems, orchestration is often justified for lifecycle processes because auditability, SLA, state visibility, and human decisions matter. Choreography remains excellent for secondary reactions like indexing, notifications, audit ingestion, and reporting projections.


11. Hybrid Collaboration: The Real Default

A realistic architecture combines patterns.

In a typical hybrid flow:

  • UI uses BFF for experience-specific view;
  • command API handles core write;
  • Case Service calls Risk synchronously because decision needed now;
  • Case Service publishes event after commit;
  • Audit/Search/SLA react asynchronously;
  • escalation lifecycle uses workflow orchestration;
  • notification uses async command.

This is not messy. This is precise.


12. Collaboration Pattern by Use Case

| Use case | Recommended pattern | Why |
|---|---|---|
| User opens case dashboard | BFF/API composition | Client-specific read view |
| Submit case | Command API + local transaction | User needs acceptance/rejection |
| Validate risk before submit | Request-response | Immediate decision required |
| Update search index | Pub-sub event | Eventually consistent projection |
| Send notification | Async command or event reaction | Side effect can retry separately |
| Generate monthly regulatory report | Async command/job + projection | Long-running workload |
| Escalate overdue case | Workflow orchestration | Timer, SLA, audit, state visibility |
| Audit case submitted | Pub-sub event | Independent immutable fact recording |
| Fetch party authoritative details | Request-response | Source authority needed |
| Maintain party summary in case view | Event-carried state projection | Avoid runtime coupling |

13. Collaboration and Consistency

Your collaboration pattern defines consistency behavior.

| Pattern | Consistency tendency |
|---|---|
| Request-response inside command | Can enforce immediate decision, but distributed availability risk |
| Local transaction + event | Strong local consistency, eventual cross-service consistency |
| Async command | Accepted-now, completed-later consistency |
| API composition | Fresh per dependency, partial failure possible |
| Projection query | Fast read, stale by design |
| Workflow orchestration | Process consistency via durable state |
| Choreography | Emergent consistency through reactions |

Never say “eventual consistency” without specifying:

  • eventual for whom?
  • how long can the data stay stale?
  • what does the user see during the delay?
  • what happens on failure?
  • who reconciles?
  • which invariant is allowed to be temporarily violated?

14. Collaboration and Ownership

A service collaboration design is wrong if ownership is unclear.

For every interaction, define:

| Ownership question | Example answer |
|---|---|
| Who owns the business decision? | Risk Service owns risk classification |
| Who owns process state? | Escalation Workflow owns escalation lifecycle |
| Who owns user-facing read view? | Officer BFF owns view composition |
| Who owns authoritative case state? | Case Service |
| Who owns notification delivery? | Notification Service |
| Who owns event semantic? | Producing domain service |
| Who owns retry policy? | Caller for sync, consumer for async |
| Who owns compensation? | Process owner/orchestrator |

Without this, production incidents become argument-driven debugging.


15. Collaboration and Latency Budget

For synchronous collaboration, latency budget is an architecture constraint.

Example:

User submit case target p95 <= 900ms

Budget:
- API gateway/auth overhead: 80ms
- Case Service validation: 120ms
- Risk decision call: 250ms
- DB transaction: 150ms
- Outbox append: 30ms
- response serialization/network: 70ms
- safety buffer: 200ms

If you add three more synchronous dependencies, the budget breaks.

Rule:

Every synchronous dependency must spend latency budget and justify why the caller must wait.
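
The arithmetic behind the example budget can be checked in a few lines. The sketch treats the budget as a simple sum, as the example does. The budget lines are copied from the example above; the cost of 100ms per extra dependency is an assumption chosen only to show the effect, and `LatencyBudgetDemo` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LatencyBudgetDemo {

    // Budget lines copied from the example above, in milliseconds.
    static Map<String, Integer> submitCaseBudget() {
        Map<String, Integer> budget = new LinkedHashMap<>();
        budget.put("API gateway/auth overhead", 80);
        budget.put("Case Service validation", 120);
        budget.put("Risk decision call", 250);
        budget.put("DB transaction", 150);
        budget.put("Outbox append", 30);
        budget.put("response serialization/network", 70);
        budget.put("safety buffer", 200);
        return budget;
    }

    static int totalMillis(Map<String, Integer> budget) {
        return budget.values().stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        int targetP95 = 900;
        Map<String, Integer> budget = submitCaseBudget();
        System.out.println("total = " + totalMillis(budget) + "ms"); // total = 900ms

        // Assumption: each extra synchronous dependency costs 100ms.
        for (int i = 1; i <= 3; i++) {
            budget.put("extra dependency " + i, 100);
        }
        System.out.println("over budget by " + (totalMillis(budget) - targetP95) + "ms"); // over budget by 300ms
    }
}
```

The original budget already spends the full 900ms, including the safety buffer, so any new synchronous call has to take its time from an existing line.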


16. Collaboration and Failure Policy

For every dependency, define failure policy.

| Dependency | Failure policy examples |
|---|---|
| Risk decision | fail closed, manual review, cached decision |
| Notification | enqueue retry, do not block submit |
| Audit | critical audit may block or use durable outbox |
| Search index | async retry, stale search accepted |
| Party profile | partial view or unavailable section |
| Workflow engine | command accepted only if process started |
| Reporting | async projection, reconcile later |

Architecture document should explicitly say:

If Notification Service is unavailable, SubmitCase still succeeds. Notification Service consumes CaseSubmitted later from broker. Notification delay SLO is 5 minutes.

Or:

If Audit Outbox append fails, SubmitCase fails because regulatory audit evidence is mandatory for formal submission.

That is architecture. Not arrows.
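
Statements like these can also be encoded so that the policy is visible in code and not buried in scattered try/catch blocks. A minimal sketch, assuming two policies only; `OnFailure`, `DependencyStep`, and `PolicyDrivenSubmit` are hypothetical names, and a real implementation would perform the mandatory audit append inside the same local transaction as the case update.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical policy model for illustration.
enum OnFailure { FAIL_OPERATION, CONTINUE_AND_RETRY_LATER }

record DependencyStep(String dependency, OnFailure policy, Runnable action) {}

class PolicyDrivenSubmit {
    // Dependencies whose work was skipped and must be retried asynchronously.
    final List<String> retryLater = new ArrayList<>();

    // Returns true when the operation as a whole succeeds under the declared policies.
    boolean run(List<DependencyStep> steps) {
        for (DependencyStep step : steps) {
            try {
                step.action().run();
            } catch (RuntimeException ex) {
                if (step.policy() == OnFailure.FAIL_OPERATION) {
                    return false; // mandatory dependency: the whole operation fails
                }
                retryLater.add(step.dependency()); // optional dependency: record and move on
            }
        }
        return true;
    }
}
```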


17. Collaboration Pattern Selection Algorithm

Use this decision flow.
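
One way to sketch such a decision flow in code, using the rules of thumb from this part. The order of the checks and the fields of `Interaction` are assumptions for illustration; a real review would walk through the full question list from section 1 and not just five booleans.

```java
enum CollaborationPattern {
    REQUEST_RESPONSE, ASYNC_COMMAND, PUBLISH_SUBSCRIBE_EVENT, API_COMPOSITION_OR_BFF, ORCHESTRATION
}

// Hypothetical description of one interaction in a business flow.
record Interaction(
    boolean callerNeedsAnswerNow,
    boolean readSpansServices,
    boolean longRunningProcess,        // timers, human steps, compensation, visibility
    boolean specificReceiver,
    boolean domainFactForManyConsumers
) {}

class PatternSelector {

    // The check order is an assumption: process-level concerns first, then reads,
    // then immediate decisions, then the two asynchronous shapes.
    static CollaborationPattern select(Interaction i) {
        if (i.longRunningProcess()) {
            return CollaborationPattern.ORCHESTRATION;
        }
        if (i.readSpansServices()) {
            return CollaborationPattern.API_COMPOSITION_OR_BFF;
        }
        if (i.callerNeedsAnswerNow()) {
            return CollaborationPattern.REQUEST_RESPONSE;
        }
        if (i.specificReceiver()) {
            return CollaborationPattern.ASYNC_COMMAND;
        }
        if (i.domainFactForManyConsumers()) {
            return CollaborationPattern.PUBLISH_SUBSCRIBE_EVENT;
        }
        throw new IllegalArgumentException("interaction not classified; revisit the design questions");
    }
}
```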


18. Java Implementation Boundaries

18.1 Synchronous Client Port

public interface RiskDecisionPort {
    RiskDecision calculateRisk(CaseId caseId, Deadline deadline);
}

Adapter:

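@Component
class HttpRiskDecisionAdapter implements RiskDecisionPort {
    private final WebClient webClient;

    // Constructor injection so the final field is initialized.
    HttpRiskDecisionAdapter(WebClient webClient) {
        this.webClient = webClient;
    }

    @Override
    public RiskDecision calculateRisk(CaseId caseId, Deadline deadline) {
        return webClient.post()
            .uri("/risk-decisions")
            .bodyValue(new RiskDecisionRequest(caseId.value()))
            .retrieve()
            .bodyToMono(RiskDecisionResponse.class)
            // Bound the call by whatever is left of the caller's deadline.
            .timeout(deadline.remaining())
            .map(RiskDecisionMapper::toDomain)
            // Blocks the calling thread: the port is synchronous by design.
            .block();
    }
}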

Application service depends on the port, not WebClient.


18.2 Event Publisher Port

public interface DomainEventPublisher {
    void publishLater(IntegrationEvent event);
}

Outbox implementation:

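@Component
class OutboxDomainEventPublisher implements DomainEventPublisher {
    private final OutboxRepository outboxRepository;

    // Constructor injection so the final field is initialized.
    OutboxDomainEventPublisher(OutboxRepository outboxRepository) {
        this.outboxRepository = outboxRepository;
    }

    @Override
    public void publishLater(IntegrationEvent event) {
        // Only appends to the outbox; nothing is sent to the broker here.
        outboxRepository.append(OutboxMessage.from(event));
    }
}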

The name publishLater is intentional. It keeps developers from assuming that the broker publish happens immediately.
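
The "later" part is usually done by a separate relay that reads the outbox and publishes to the broker. A minimal in-memory sketch of that relay step, assuming string messages and a broker represented by a `Consumer`; `InMemoryOutbox` is a hypothetical class, and a real outbox is a table written in the same database transaction as the business change.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory outbox for illustration; not durable.
class InMemoryOutbox {
    private final List<String> pending = new ArrayList<>();

    void append(String message) {
        pending.add(message);
    }

    // Relay step: publish pending messages in order and remove each one only after
    // the broker call returns. A crash between publish and remove means the message
    // is sent again, so delivery is at-least-once and consumers must be idempotent.
    int relay(Consumer<String> broker) {
        int published = 0;
        while (!pending.isEmpty()) {
            String next = pending.get(0);
            try {
                broker.accept(next);
            } catch (RuntimeException ex) {
                break; // keep the message and stop; the next relay run retries it
            }
            pending.remove(0);
            published++;
        }
        return published;
    }

    int pendingCount() {
        return pending.size();
    }
}
```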


18.3 Async Command Sender Port

public interface NotificationCommandPort {
    void requestCaseSubmittedNotification(CaseId caseId, CorrelationId correlationId);
}

Implementation writes to command outbox/queue:

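@Component
class QueueNotificationCommandAdapter implements NotificationCommandPort {
    private final CommandOutboxRepository outbox;

    // Constructor injection so the final field is initialized.
    QueueNotificationCommandAdapter(CommandOutboxRepository outbox) {
        this.outbox = outbox;
    }

    @Override
    public void requestCaseSubmittedNotification(CaseId caseId, CorrelationId correlationId) {
        // Records the command durably; delivery to the receiver happens asynchronously.
        outbox.append(SendCaseSubmittedNotificationCommand.create(caseId, correlationId));
    }
}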

18.4 Workflow Port

public interface EscalationWorkflowPort {
    ProcessId startEscalation(CaseId caseId, EscalationReason reason);
}

Domain service should not know if implementation uses Temporal, Camunda, custom process manager, or queue-based orchestration.


19. Collaboration Smell Catalog

19.1 Chatty Service Mesh

Symptoms:

  • service A calls B for every field;
  • many small synchronous calls;
  • latency dominated by network;
  • N+1 remote calls;
  • difficult caching.

Fix:

  • coarser API;
  • query projection;
  • API composition with batch endpoints;
  • reconsider service boundary.

19.2 God Orchestrator

Symptoms:

  • orchestrator knows every domain rule;
  • services become CRUD participants;
  • all changes require orchestrator release;
  • autonomy disappears.

Fix:

  • move domain decisions back to owning services;
  • orchestrator coordinates process, not business internals;
  • use published commands/events;
  • split process by lifecycle ownership.

19.3 Event Pinball

Symptoms:

  • A event triggers B event triggers C event triggers D event;
  • no one owns full business outcome;
  • debugging requires reading broker history manually;
  • compensation impossible.

Fix:

  • introduce process manager;
  • define business process owner;
  • collapse unnecessary events;
  • distinguish side effects from core flow.

19.4 Shared Client Library Coupling

Symptoms:

  • all services import same generated/internal library;
  • library includes domain model, API DTOs, persistence objects;
  • one library release forces many services to update.

Fix:

  • generate clients from stable contracts only;
  • avoid shared domain model;
  • version client artifacts;
  • use consumer compatibility checks.

19.5 Business Rule in Gateway/BFF

Symptoms:

  • BFF decides if case can be approved;
  • gateway validates regulatory policy;
  • UI-specific service owns core status transition.

Fix:

  • move invariant to domain owner;
  • BFF only shapes experience;
  • use command API for business transition.

20. Collaboration Design Template

Use this template before implementing cross-service flow.

# Collaboration Design: <Flow Name>

## Business Outcome
What user/system outcome is being achieved?

## Participants
- Service A: owner of ...
- Service B: owner of ...

## Interaction List
| Step | Pattern | Caller/Producer | Receiver/Consumer | Sync/Async | Failure policy |
|---|---|---|---|---|---|

## Consistency Model
What is strong locally? What is eventual? What is user-visible?

## Latency Budget
Total budget and per synchronous dependency budget.

## Idempotency
Command keys, event IDs, duplicate behavior.

## Ordering
What must be ordered? By what key?

## Observability
Trace, correlation ID, business process ID, metrics.

## Compensation/Reconciliation
What happens when downstream step fails permanently?

## Security and Data Exposure
What data crosses service boundary? Any PII/sensitive data?

## Rollout Plan
Compatibility, feature flag, migration, rollback.

21. Worked Example: Submit Case Flow

21.1 Naive Design

Problems:

  • too many synchronous dependencies;
  • notification/search/audit latency blocks user;
  • unclear failure policy;
  • remote calls likely inside transaction;
  • partial success impossible to reason about;
  • Case Service becomes process god.

21.2 Better Design

Design decisions:

| Concern | Decision |
|---|---|
| Risk | synchronous because submit decision needs it |
| Case state | local transaction in Case Service |
| Audit | via outbox event; if outbox append fails, submit fails |
| Search | async projection |
| Notification | async event reaction or command |
| SLA | async event reaction creates deadline |
| User response | accepted/submitted after local commit |
| Consistency | case authoritative immediately, projections eventual |

22. Architecture Review Checklist

Pattern Fit

  • Is each interaction pattern explicitly chosen?
  • Is the reason documented?
  • Is there any synchronous call that does not need immediate answer?
  • Is there any async interaction that actually needs immediate decision?

Runtime Dependency

  • What happens when receiver is down?
  • What happens when receiver is slow?
  • Is timeout defined?
  • Is retry safe?
  • Is circuit breaker/load shedding needed?

Consistency

  • What is strong?
  • What is eventual?
  • What is stale?
  • How does user know pending state?
  • Is reconciliation needed?

Ownership

  • Who owns the process?
  • Who owns each decision?
  • Who owns compensation?
  • Who owns event/command schema?
  • Who owns operational alert?

Observability

  • Is there a correlation ID across sync and async boundaries?
  • Can we reconstruct business flow?
  • Are command/event lag and failure visible?
  • Are partial failures visible to support team?

Evolution

  • Can receiver change independently?
  • Can producer add fields without breaking consumers?
  • Can new consumer join without producer change?
  • Can process version migrate safely?

23. Final Mental Model

Service collaboration is not a technology choice. It is a decision about time, ownership, consistency, and failure.

Use this simple model:

  • request-response when caller needs answer now;
  • async command when a specific receiver should do something later;
  • pub-sub event when a domain fact happened and many consumers may care;
  • API composition/BFF when a read experience spans services;
  • orchestration when a long-running business process needs control, timers, compensation, and visibility;
  • choreography when independent services react to facts without central process control;
  • hybrid when the business flow has multiple kinds of interactions.

The top-level skill is not memorizing pattern names. It is reading a business flow and asking:

Who owns the decision?
Who must know now?
Who can react later?
What fails if this dependency is down?
What consistency does the user actually need?
What evidence do we need to debug or defend the outcome?

Once you can answer those, the technology choice becomes much easier.

In the next part, we move into API composition and aggregation in more detail: fan-out, partial responses, BFF, aggregator services, latency budget, and failure isolation.
