Fanout, Scatter-Gather, Aggregator, and Saga Messaging Patterns
Learn Java RabbitMQ, RabbitMQ Streams, Patterns, and Deployment In Action - Part 027
Fanout, scatter-gather, aggregator, and saga messaging patterns using Java RabbitMQ with production-grade topology, correlation, timeout, idempotency, observability, and failure modelling.
Part 027 — Fanout, Scatter-Gather, Aggregator, and Saga Messaging Patterns
At this point in the series, we already have the core blocks: AMQP topology, producer confirms, consumer acknowledgements, idempotent consumers, retry/DLQ, Streams, batching, windowing, and pipeline design.
This part combines those blocks into higher-level coordination patterns.
These patterns are not just “RabbitMQ examples”. They are ways to structure distributed work when one service cannot, and should not, own the entire business process.
The important mental shift:
RabbitMQ does not make distributed coordination simple. It makes coordination explicit.
That means we must model:
- who owns the decision,
- who owns the side effect,
- who owns timeout,
- who owns partial failure,
- who owns compensation,
- and who is allowed to retry.
A top-tier engineer does not merely wire messages together. They define the invariants that keep the workflow correct when messages are duplicated, delayed, reordered, replayed, rejected, or partially processed.
1. Kaufman Skill Slice
In Kaufman's terms, this part is a subskill integration point.
Earlier parts deconstructed RabbitMQ into smaller skills:
- publish safely,
- consume safely,
- route predictably,
- retry intentionally,
- deduplicate,
- preserve causality,
- replay from streams,
- batch and window safely.
Now we combine them into distributed workflow patterns.
The target skill is:
Given a business process that spans multiple services, choose whether it should be fanout notification, scatter-gather, aggregator, orchestration saga, choreography saga, or a simpler command pipeline.
The practice goal is not to memorize pattern names. It is to identify the coordination semantics.
Ask these questions first:
- Is the publisher asking for work, announcing a fact, or collecting answers?
- Does the initiator need a final decision?
- Is there a timeout budget?
- Can the process complete with partial answers?
- Are side effects reversible?
- Is the process long-running?
- Does one service own the process state?
- What happens if a participant succeeds but its response is lost?
These questions determine the pattern more than the RabbitMQ API does.
2. Pattern Taxonomy
There are four patterns in this part.
| Pattern | Intent | Initiator waits? | State owner | Typical topology |
|---|---|---|---|---|
| Fanout | Notify many independent subscribers | No | Each subscriber | Fanout/topic exchange + queue per subscriber |
| Scatter-gather | Ask many responders, collect answers | Yes, bounded | Requester or aggregator | Request exchange + reply/response queue |
| Aggregator | Build one result from many messages | Usually yes | Aggregator service | Response/event queues + correlation state store |
| Saga | Coordinate long-running multi-step transaction | Eventually | Orchestrator or participants | Commands + events + state machine |
A common mistake is to implement all of them as “pub/sub”. That hides very different correctness requirements.
Fanout is notification. Scatter-gather is bounded query. Aggregator is stateful collection. Saga is long-running process control.
3. Fanout Pattern
3.1 Definition
Fanout publishes one message to many independent subscribers.
The publisher does not know who subscribes. The subscribers do not coordinate with each other. Each subscriber owns its own queue, retry policy, DLQ, and processing logic.
The core invariant:
A fanout publisher announces a fact. It does not command all subscribers to succeed.
Example events:
order.createdquote.approvedcase.escalatedpayment.captureddocument.indexedcustomer.profile.updated
Each subscriber may react differently.
For example, order.created may be consumed by:
- billing,
- fulfilment,
- fraud screening,
- notification,
- search indexing,
- analytics,
- audit logging.
The order service should not know these subscribers exist.
3.2 Fanout Topology
There are two main variants.
Variant A — Fanout exchange
Use a fanout exchange when all subscribers receive all events from that exchange.
This is simple and explicit.
The drawback is that each subscriber receives everything, even events it does not need.
Variant B — Topic exchange
Use a topic exchange when subscribers want filtered categories.
Topic exchange scales better as event categories grow.
The trade-off is governance. Routing keys become a public API.
3.3 Fanout Design Rules
Use fanout when:
- publisher is announcing a fact,
- subscribers are independent,
- publisher does not need subscriber result,
- subscriber failure must not block publisher,
- each subscriber can handle duplicate events,
- late subscriber replay is not required, or replay is handled by RabbitMQ Streams/outbox.
Do not use fanout when:
- publisher needs a decision,
- subscribers must all succeed atomically,
- process requires a single final result,
- subscriber failure must roll back publisher state,
- event order across subscribers is a hard requirement.
The most important fanout rule:
The event is not a remote procedure call disguised as a broadcast.
3.4 Fanout Publisher Example
A fanout publisher should have no subscriber knowledge.
public final class DomainEventPublisher {
private final Channel channel;
private final String exchange;
public DomainEventPublisher(Channel channel, String exchange) {
this.channel = channel;
this.exchange = exchange;
}
public void publish(DomainEvent event) throws IOException {
AMQP.BasicProperties properties = new AMQP.BasicProperties.Builder()
.messageId(event.eventId())
.correlationId(event.correlationId())
.type(event.eventType())
.contentType("application/json")
.deliveryMode(2)
.timestamp(Date.from(event.occurredAt()))
.headers(Map.of(
"schema", event.schemaName(),
"schemaVersion", event.schemaVersion(),
"producer", event.producer(),
"causationId", event.causationId()
))
.build();
channel.basicPublish(
exchange,
event.routingKey(),
true,
properties,
event.payloadBytes()
);
}
}
Notice the mandatory=true flag. It protects against silent unroutable messages, but it does not guarantee all intended subscribers exist. Subscriber queue declarations must be governed separately.
3.5 Subscriber Queue Ownership
A subscriber owns:
- queue name,
- binding pattern,
- prefetch,
- consumer concurrency,
- retry policy,
- DLQ,
- deduplication,
- projection table,
- alert thresholds.
The publisher owns:
- event contract,
- exchange contract,
- routing key taxonomy,
- event versioning,
- outbox relay,
- publish observability.
This separation is not optional. Without it, fanout becomes a distributed monolith.
4. Scatter-Gather Pattern
4.1 Definition
Scatter-gather sends a request to multiple responders and collects their responses within a bounded time.
Examples:
- ask multiple pricing engines for a quote,
- ask multiple inventory regions for availability,
- ask several compliance services for risk signals,
- ask multiple recommendation models for candidates,
- ask several fulfilment providers for shipping options.
The initiator needs a result, but the result may be:
- all responses,
- first successful response,
- quorum result,
- best ranked response,
- partial result after timeout.
The core invariant:
Scatter-gather must have a completion rule and a timeout rule before implementation starts.
Without that, the process leaks memory, blocks users, or waits forever.
4.2 Scatter-Gather Topology
One request is published to multiple responder queues. Each responder replies to a response queue or exchange with the same correlationId.
The requester can collect responses itself, but in production systems an aggregator service is usually cleaner.
4.3 Completion Rules
Scatter-gather is defined by its completion rule.
Common rules:
| Rule | Meaning | Example |
|---|---|---|
| All required | Complete when all required responders replied | Price = base + discount + tax |
| Quorum | Complete when N of M responders agree | Risk voting |
| First successful | Complete on first valid response | Provider lookup |
| Best before timeout | Collect until deadline, choose best | Shipping quote |
| Partial allowed | Return incomplete result with warnings | Analytics enrichment |
| Mandatory subset | Required responders must reply; optional responders may be missing | Regulatory decision + optional enrichments |
Every scatter-gather workflow needs:
- correlation id,
- expected responders,
- deadline,
- duplicate reply policy,
- late reply policy,
- partial result policy,
- failure classification.
4.4 Request Envelope
A request message should include a workflow deadline, not just a timeout hidden in the caller.
{
"messageId": "msg-8d2f",
"correlationId": "quote-2026-000918",
"causationId": "api-request-2931",
"type": "quote.price.requested.v1",
"deadlineAt": "2026-07-02T09:15:30Z",
"expectedResponders": ["base-price", "discount", "tax"],
"payload": {
"quoteId": "Q-918",
"customerId": "C-1001",
"items": [
{ "sku": "SKU-1", "quantity": 4 }
]
}
}
The deadline is part of the business contract. Responders can drop stale requests instead of wasting capacity.
4.5 Response Envelope
Each response must identify the responder and the original request.
{
"messageId": "msg-reply-991",
"correlationId": "quote-2026-000918",
"causationId": "msg-8d2f",
"type": "quote.price.component.calculated.v1",
"responder": "discount",
"status": "SUCCESS",
"completedAt": "2026-07-02T09:15:03Z",
"payload": {
"component": "discount",
"amount": "120.00",
"currency": "USD"
}
}
Do not infer responder identity from queue name. Put it in the message.
4.6 Aggregator State Model
A scatter-gather initiator or aggregator must store state.
public enum AggregateStatus {
OPEN,
COMPLETED,
TIMED_OUT,
FAILED,
CANCELLED
}
public record AggregationState(
String correlationId,
Instant deadlineAt,
Set<String> expectedResponders,
Map<String, ResponseEnvelope> receivedResponses,
AggregateStatus status,
int version
) {
public boolean hasRequiredResponses() {
return receivedResponses.keySet().containsAll(expectedResponders);
}
public boolean isExpired(Instant now) {
return !deadlineAt.isAfter(now);
}
}
In production, this state should be in durable storage, not only in memory.
Why?
Because the aggregator can crash after receiving some replies and before completing the workflow.
4.7 Duplicate Response Handling
Duplicate responses are normal.
They can happen because:
- responder processed request but crashed before ack,
- reply was published but confirm was ambiguous,
- aggregator processed response but crashed before ack,
- network interruption caused redelivery,
- replay was triggered from stream or DLQ.
The aggregator must deduplicate by:
(correlationId, responder, responseSemanticKey)
Usually correlationId + responder is enough if each responder sends one answer.
If responder can send multiple components, use a finer key:
(correlationId, responder, componentType, componentId)
Never rely on RabbitMQ delivery tag for business deduplication. Delivery tag is channel-scoped transport metadata.
5. Aggregator Pattern
5.1 Definition
An aggregator consumes related messages and produces one consolidated output.
It can be used with scatter-gather, fan-in pipelines, event projections, and saga decisions.
The aggregator is stateful. That is the point.
The core invariant:
Aggregator correctness is determined by its correlation key, completion rule, timeout rule, and idempotent state transition.
5.2 Aggregator Topology
The aggregator queue is usually a command-like queue: one message must be processed with deterministic state mutation.
Use manual ack.
Ack only after:
- response is persisted or ignored as duplicate,
- aggregate state transition is committed,
- any completion event is safely published or recorded in an outbox.
5.3 Aggregator State Transition
Treat aggregation as a state machine.
The transition must be idempotent.
For a duplicate response:
OPEN + duplicate response => OPEN, no semantic change
COMPLETED + late response => COMPLETED, ignore or emit late-response metric
TIMED_OUT + late response => TIMED_OUT, store audit marker if required
FAILED + response => FAILED, ignore unless manual repair is allowed
5.4 Java Aggregator Handler Skeleton
public final class AggregatorConsumer implements DeliverCallback {
private final AggregationRepository repository;
private final OutboxRepository outbox;
private final ObjectMapper objectMapper;
@Override
public void handle(String consumerTag, Delivery delivery) throws IOException {
Channel channel = getChannel();
long tag = delivery.getEnvelope().getDeliveryTag();
ResponseEnvelope response = objectMapper.readValue(
delivery.getBody(),
ResponseEnvelope.class
);
try {
repository.inTransaction(tx -> {
AggregationState state = tx.lockByCorrelationId(response.correlationId());
if (state == null) {
tx.recordOrphanResponse(response);
return;
}
AggregationState next = state.apply(response, Instant.now());
tx.save(next);
if (state.status() == AggregateStatus.OPEN
&& next.status() == AggregateStatus.COMPLETED) {
tx.insertOutbox(AggregateCompletedEvent.from(next));
}
});
channel.basicAck(tag, false);
} catch (TransientDependencyException ex) {
channel.basicNack(tag, false, true);
} catch (InvalidResponseException ex) {
repository.recordInvalidResponse(response, ex.getMessage());
channel.basicAck(tag, false);
} catch (Exception ex) {
channel.basicNack(tag, false, false);
}
}
}
The important property is not the exact code. It is the ack boundary.
Acknowledge only after durable state mutation.
5.5 Timeout Handling
Timeouts should not depend on in-memory timers only.
A production aggregator should have one of these:
- periodic sweeper job,
- delayed timeout message,
- database query on deadline,
- stream-based timer workflow,
- scheduler service.
The simplest reliable design is a sweeper:
public final class AggregationTimeoutSweeper {
private final AggregationRepository repository;
private final Clock clock;
public void runOnce() {
Instant now = clock.instant();
List<AggregationState> expired = repository.findOpenExpired(now, 500);
for (AggregationState state : expired) {
repository.inTransaction(tx -> {
AggregationState locked = tx.lockByCorrelationId(state.correlationId());
if (locked == null || locked.status() != AggregateStatus.OPEN) {
return;
}
AggregationState timedOut = locked.timeout(now);
tx.save(timedOut);
tx.insertOutbox(AggregateTimedOutEvent.from(timedOut));
});
}
}
}
A delayed message can also work, but the database remains the source of truth.
6. Saga Pattern
6.1 Definition
A saga coordinates a long-running business transaction across multiple services using local transactions and compensating actions.
It is used when a single ACID transaction is impossible or undesirable.
Examples:
- order placement across payment, inventory, shipment,
- quote approval across discount, compliance, legal, customer acceptance,
- case enforcement workflow across evidence, review, sanction, notification,
- subscription activation across billing, identity, provisioning, entitlement.
The core invariant:
A saga is not distributed rollback. It is explicit forward recovery and compensation.
6.2 Saga Variants
There are two common variants.
Orchestration Saga
One service owns the process state and sends commands to participants.
Use orchestration when:
- process is complex,
- central visibility is required,
- timeout logic is important,
- compensation logic must be explicit,
- workflow state must be auditable.
Choreography Saga
Participants react to events and publish next events.
Use choreography when:
- process is simple,
- participants are naturally decoupled,
- no central decision is needed,
- event contracts are stable,
- cyclic dependency risk is low.
Do not use choreography just because it looks more decoupled. Complex choreography often hides coupling in event chains.
6.3 Saga State Machine
For serious systems, model the saga explicitly.
Each transition must be:
- valid from current state,
- idempotent,
- persisted durably,
- correlated to saga id,
- traceable,
- recoverable after crash.
6.4 Saga Message Types
A saga uses commands and events differently.
Commands request action:
payment.reserve.command.v1
inventory.reserve.command.v1
shipment.create.command.v1
payment.release.command.v1
Events report facts:
payment.reserved.event.v1
payment.rejected.event.v1
inventory.reserved.event.v1
inventory.rejected.event.v1
shipment.created.event.v1
shipment.rejected.event.v1
The orchestrator sends commands. Participants publish events.
Do not publish PaymentReserved unless payment reservation actually happened.
6.5 Saga Topology
This topology separates command routing from event reporting.
That distinction is critical for security and governance:
- not every service should be allowed to send commands,
- many services may be allowed to subscribe to events,
- event producers should not command other services indirectly.
6.6 Saga Orchestrator Skeleton
public final class OrderSagaOrchestrator {
private final SagaRepository repository;
private final OutboxRepository outbox;
private final Clock clock;
public void onEvent(SagaEvent event) {
repository.inTransaction(tx -> {
OrderSaga saga = tx.lockSaga(event.sagaId());
if (saga == null) {
tx.recordOrphanSagaEvent(event);
return;
}
if (saga.hasProcessed(event.messageId())) {
return;
}
SagaDecision decision = saga.apply(event, clock.instant());
tx.save(decision.nextSaga());
tx.markProcessed(event.messageId());
for (SagaCommand command : decision.commandsToSend()) {
tx.insertOutbox(command.toOutboxMessage());
}
});
}
}
The orchestrator does not publish directly inside an unsafe transaction boundary. It writes commands to an outbox. A relay publishes them with confirms.
This prevents the classic bug:
state updated, process crashes before command published
or:
command published, process crashes before state updated
6.7 Participant Handler Skeleton
A participant handles one command as a local transaction.
public final class PaymentReserveHandler {
private final PaymentRepository payments;
private final OutboxRepository outbox;
public void handle(ReservePaymentCommand command) {
payments.inTransaction(tx -> {
if (tx.hasProcessed(command.messageId())) {
return;
}
PaymentReservationResult result = tx.reserve(
command.orderId(),
command.amount(),
command.currency(),
command.idempotencyKey()
);
tx.markProcessed(command.messageId());
if (result.accepted()) {
tx.insertOutbox(PaymentReservedEvent.from(command, result));
} else {
tx.insertOutbox(PaymentRejectedEvent.from(command, result.reason()));
}
});
}
}
The participant must not assume the command is delivered only once.
7. Fanout vs Scatter-Gather vs Saga
The patterns can look similar because they all use messages. The distinction is semantic.
| Question | Fanout | Scatter-gather | Aggregator | Saga |
|---|---|---|---|---|
| Is publisher waiting? | No | Yes | Often | Eventually |
| Is there a final result? | No | Yes | Yes | Yes |
| Is process long-running? | Usually no | Usually short | Short/medium | Yes |
| Is state required? | Subscriber-specific | Requester state | Aggregator state | Saga state |
| Are compensations needed? | No | Rare | Rare | Yes |
| Failure model | Subscriber isolation | Timeout/partial result | Missing/late/duplicate parts | Forward recovery/compensation |
| Typical RabbitMQ primitive | Exchange + queues | Request/reply + queues | Queue + DB state | Commands/events + DB state |
Use this as the first design review checkpoint.
8. Ack, Retry, and DLQ Policy per Pattern
8.1 Fanout Subscriber
Ack after local side effect is durable.
If subscriber updates a projection:
consume event -> upsert projection -> mark event processed -> ack
Retry policy:
- transient dependency: retry with backoff,
- invalid event contract: DLQ,
- duplicate event: ack,
- stale event: ack with metric,
- unknown schema version: DLQ or compatibility fallback.
8.2 Scatter-Gather Responder
Ack request after response has been durably published or recorded in outbox.
consume request -> local calculation -> insert response outbox -> ack request -> relay response
Avoid:
consume request -> publish response without confirm -> ack request
That can lose the response.
8.3 Aggregator
Ack response after aggregator state mutation is committed.
Retry policy:
- database unavailable: requeue/retry,
- duplicate response: ack,
- unknown correlation: store orphan + ack,
- invalid responder: DLQ/security alert,
- late response: ack + metric.
8.4 Saga Orchestrator
Ack participant event after saga transition is committed.
Retry policy:
- invalid transition due duplicate: ack,
- invalid transition due out-of-order but possible: park/retry with delay,
- impossible transition: audit + ack or DLQ,
- state store unavailable: retry,
- outbox unavailable: retry.
9. Correlation and Causation
These patterns collapse without correlation discipline.
Recommended identifiers:
| Field | Purpose |
|---|---|
messageId | Unique identity of this message |
correlationId | End-to-end workflow/request identity |
causationId | Message that caused this message |
sagaId | Long-running saga instance |
aggregateId | Business aggregate/entity identity |
idempotencyKey | Business duplicate prevention |
traceId | Observability trace |
Example chain:
API request
messageId = api-001
correlationId = quote-123
price.requested
messageId = msg-010
correlationId = quote-123
causationId = api-001
base-price.reply
messageId = msg-011
correlationId = quote-123
causationId = msg-010
aggregate.completed
messageId = msg-020
correlationId = quote-123
causationId = msg-011 or aggregate-state-transition-id
correlationId tells what workflow the message belongs to. causationId tells why it exists.
10. Handling Partial Failure
Partial failure is the normal case.
10.1 Fanout Partial Failure
One subscriber fails. Others continue.
Correct response:
- subscriber retries or parks its own message,
- publisher is not rolled back,
- audit/monitoring detects subscriber lag or DLQ growth.
10.2 Scatter-Gather Partial Failure
One responder fails or times out.
Possible policies:
- fail whole request,
- return partial result,
- use cached value,
- use fallback responder,
- extend deadline once,
- require manual review.
The policy must be explicit in the aggregate state.
10.3 Saga Partial Failure
One completed local transaction cannot be undone automatically.
Correct response:
- send compensating command,
- wait for compensation event,
- retry compensation safely,
- escalate if compensation fails,
- keep saga state auditable.
Compensation can fail too. A top-tier design has an answer for that.
11. Common Anti-Patterns
11.1 Fanout With Hidden Coupling
A publisher emits an event but assumes a specific subscriber will update something before the user refreshes the page.
That is not fanout. That is synchronous dependency hidden behind messaging.
Fix:
- use command/RPC if immediate answer is needed,
- or model eventual consistency explicitly.
11.2 Scatter-Gather Without Deadline
The initiator waits forever for all responders.
Fix:
- deadline in request envelope,
- durable aggregate state,
- sweeper or timeout message,
- partial-result policy.
11.3 Aggregator Without Deduplication
Duplicate replies inflate totals or complete aggregate incorrectly.
Fix:
- unique constraint on
(correlation_id, responder, component_key), - idempotent transition,
- duplicate metric.
11.4 Saga Without State Machine
Services publish events and hope the workflow completes.
Fix:
- define states,
- define commands,
- define events,
- define compensation,
- define timeout transitions,
- define manual repair state.
11.5 Choreography With Cycles
Event A triggers B, B triggers C, C triggers A.
Fix:
- introduce orchestrator,
- or define causation guards,
- or split event types into user facts vs process facts.
12. Observability Model
Pattern observability must show workflow health, not only queue health.
12.1 Fanout Metrics
- publish rate per event type,
- unroutable messages,
- subscriber queue depth,
- subscriber lag,
- DLQ count per subscriber,
- duplicate rate,
- schema rejection rate,
- projection staleness.
12.2 Scatter-Gather Metrics
- requests opened,
- requests completed,
- requests timed out,
- partial completions,
- responder latency,
- missing responder count,
- late response count,
- duplicate response count,
- aggregate state age.
12.3 Saga Metrics
- saga started/completed/failed/compensated,
- time in state,
- transition failure count,
- compensation count,
- compensation failure count,
- stuck saga count,
- orphan events,
- invalid transition count,
- outbox relay lag.
12.4 Trace Requirements
Every message should include trace/correlation metadata.
Log at state transitions, not every low-level operation.
Good saga log:
{
"event": "saga.transitioned",
"sagaId": "order-saga-991",
"correlationId": "order-991",
"fromState": "PAYMENT_RESERVED",
"toState": "INVENTORY_RESERVED",
"causationId": "inventory-reserved-msg-882",
"durationMs": 1840
}
Poor saga log:
received message
13. Security and Governance
These patterns create strong coupling through message contracts. Govern them deliberately.
13.1 Permissions
Suggested permission model:
| Actor | Can publish | Can consume |
|---|---|---|
| Domain owner | Its own event exchange | Its own command queue |
| Saga orchestrator | Command exchange | Saga event queue |
| Subscriber | Rarely events; mostly none | Its subscriber queue |
| Aggregator | Result exchange | Response queue |
| External adapter | Specific ingress exchange | Adapter command queue |
Do not grant every service publish permission to every command exchange.
13.2 Event Contract Governance
For fanout and choreography:
- version events,
- document ownership,
- define compatibility policy,
- define retention/replay policy,
- define deprecation date,
- validate schema in CI.
13.3 Saga Governance
For sagas:
- document compensation semantics,
- document manual repair path,
- document timeout budget,
- document state machine,
- document owner for stuck instances,
- document audit evidence.
14. Testing Strategy
14.1 Fanout Tests
Test:
- publisher event schema,
- routing key binding,
- subscriber idempotency,
- DLQ on invalid event,
- one subscriber failure does not block others,
- topology drift detection.
14.2 Scatter-Gather Tests
Test:
- all responders successful,
- one responder late,
- one responder duplicate,
- one responder invalid,
- partial allowed,
- required responder missing,
- aggregator crash after persisting response before ack,
- aggregator crash after completion before result publish.
14.3 Saga Tests
Test:
- happy path,
- rejection at each step,
- timeout at each step,
- duplicate event at each state,
- out-of-order event,
- compensation success,
- compensation failure,
- orchestrator restart,
- outbox relay restart,
- manual repair transition.
15. Design Review Checklist
Before implementing any of these patterns, answer this checklist.
15.1 Pattern Choice
- Is this command, event, request/reply, aggregate, or saga?
- Who owns the final decision?
- Does the initiator wait?
- Is partial completion acceptable?
- Is compensation needed?
- Is central state required?
15.2 Message Contract
- What is the message type?
- What is the schema version?
- What is the routing key?
- What is the correlation id?
- What is the causation id?
- What is the idempotency key?
- What is the deadline?
15.3 Failure Semantics
- What if message is duplicated?
- What if message is late?
- What if response is lost?
- What if participant succeeds but event is not published?
- What if aggregator crashes?
- What if compensation fails?
- What if DLQ grows?
15.4 Operations
- What metrics detect stuck workflow?
- What alert fires first?
- What is the repair action?
- Can an operator replay safely?
- Is the audit trail sufficient?
- Can we explain every state transition?
16. Practice Drill
Build a quote approval workflow:
- API receives quote approval request.
- Quote service starts a saga.
- Saga sends commands to:
- discount validation,
- compliance review,
- customer credit check.
- Discount and credit are scatter-gather responders.
- Compliance may be slow and can require manual review.
- Aggregator waits for required checks.
- If all pass, saga publishes
quote.approved. - If one required check fails, saga publishes
quote.rejected. - If compliance times out, saga moves to
WAITING_MANUAL_REVIEW. - All events are fanout to audit and analytics subscribers.
Required implementation properties:
- durable saga state,
- outbox relay,
- publisher confirms,
- manual ack,
- idempotent participants,
- duplicate reply handling,
- timeout sweeper,
- DLQ and parking lot,
- metrics and runbook.
17. Summary
Fanout, scatter-gather, aggregator, and saga are not interchangeable.
Fanout announces facts to independent subscribers.
Scatter-gather asks multiple responders and collects answers inside a bounded deadline.
Aggregator owns stateful collection and completion.
Saga coordinates long-running business transactions through local transactions, commands, events, timeouts, and compensations.
The implementation details vary, but the invariants do not:
- every workflow needs correlation,
- every state transition must be idempotent,
- every timeout must be explicit,
- every side effect must have an owner,
- every compensation must be observable,
- every retry must have a limit,
- every DLQ must have a runbook.
That is the difference between “using RabbitMQ” and designing a production-grade messaging platform.
References
- RabbitMQ Documentation — AMQP 0-9-1 Concepts: https://www.rabbitmq.com/tutorials/amqp-concepts
- RabbitMQ Documentation — Consumer Acknowledgements and Publisher Confirms: https://www.rabbitmq.com/docs/confirms
- RabbitMQ Documentation — Dead Letter Exchanges: https://www.rabbitmq.com/docs/dlx
- RabbitMQ Documentation — Reliability Guide: https://www.rabbitmq.com/docs/reliability
- RabbitMQ Documentation — Java Client API Guide: https://www.rabbitmq.com/client-libraries/java-api-guide
You just completed lesson 27 in deepen practice. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.
Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.