Capstone Production-Grade Blueprint
Learn Java RabbitMQ, RabbitMQ Streams, Patterns, and Deployment In Action - Part 035
Capstone blueprint for building a production-grade Java + RabbitMQ messaging platform: topology, outbox, retry, DLQ, streams, observability, deployment, load test, failure test, governance, and a final skill rubric.
Part 035 — Capstone: Production-Grade Java RabbitMQ Platform Blueprint
In this final part we combine every concept into a single blueprint that can serve as an internal engineering handbook for building a production-grade Java + RabbitMQ messaging platform.
The goal is not a send() and receive() demo. The goal is a system that stays correct when:
- the broker restarts;
- the network drops in the middle of a publish;
- the producer does not know whether a message arrived;
- a consumer crashes after the database commit but before the ack;
- the DLQ suddenly spikes;
- a queue grows faster than its workers can drain it;
- stream retention is about to delete events that have not been processed yet;
- duplicate delivery happens thousands of times;
- a single tenant creates a hot partition;
- observability must answer "is this message lost, slow, or being retried?";
- another team adds a new subscriber without breaking the existing topology;
- an auditor or regulator asks for a trail of the system's decisions.
This is the capstone. We therefore close the series with a reference architecture, topology, Java implementation blueprint, deployment checklist, runbooks, load tests, failure tests, ADRs, and a final rubric.
1. Capstone Goal
We will design a platform called the Regulated Order Messaging Platform.
This domain is chosen deliberately because it is complex enough to force correct use of RabbitMQ:
- there are commands that workers must process;
- there are domain events that many subscribers must receive;
- there are streams that need replay for audit and analytics;
- there is retry and a DLQ;
- there is idempotency;
- there is topology governance;
- there is cross-service traceability;
- there are operational runbooks.
We will not build a full application. We build an architecture blueprint and a core implementation that can be carried over to real Java microservices.
1.1 What Good Looks Like
The blueprint is considered correct if it satisfies the following invariants:
- No silent message loss. Producers use publisher confirms, unroutable messages are handled, and topology is declared deterministically.
- Consumers are safe against duplicates. Every handler that performs a side effect has an idempotency boundary.
- Ack only after state is safe. A consumer does not ack before committing state or before the side effect is considered safe.
- Retry is not infinite. Retry has a budget, delay, classification, DLQ, parking lot, and owner.
- A queue is not a place to store unbounded backlog. There is a capacity model, alerting, and a load shedding strategy.
- Streams are used for replay and historical fan-out. Queues are for work dispatch; streams are for log/replay/high fan-out.
- Topology is a contract. Exchanges, queues, bindings, policies, DLX, streams, permissions, and retention are managed as code.
- Observability answers operational questions. Not only CPU/memory, but also confirm latency, queue growth, redelivery rate, DLQ rate, consumer lag, stream offset lag, and business correlation.
- Deployment is failure-aware. Quorum queues, stream replication, anti-affinity, persistent volumes, rolling upgrades, backup/restore, and DR are part of the design.
- Every trade-off is documented. Important decisions have an ADR.
2. Kaufman Closure: From Practice to Operational Fluency
Josh Kaufman emphasizes that skill acquisition can be accelerated by breaking a skill into sub-skills, learning enough to self-correct, removing barriers to practice, and practicing deliberately.
In this series we have already walked through that decomposition:
| Capability | Series Part | Practical Output |
|---|---|---|
| Broker mental model | 002-003 | Can explain exchange, queue, binding, routing, log, stream |
| Java client correctness | 004-007 | Can build a safe producer/consumer lifecycle |
| Communication pattern | 008-011 | Can choose a command/event/RPC/routing topology |
| Reliability | 012-018 | Can build at-least-once + idempotency + failure modelling |
| Streams | 019-023 | Can use RabbitMQ Streams, offsets, replay, super streams, batching |
| Pipeline/data pattern | 024-027 | Can design pipelines, windowing, batching, sagas |
| Performance | 028-030 | Can do capacity planning, benchmarking, and JVM tuning |
| Operations | 031-034 | Can deploy, observe, secure, govern |
| Capstone | 035 | Can combine everything into a platform blueprint |
This capstone is the largest self-correction exercise. Do not just read it. Use this part to audit RabbitMQ designs you have built before.
The most important self-correction question:
If a service crashes at the worst possible point, will the message be lost, duplicated, or stuck? Do the system and the operations team know the answer?
3. Reference Architecture Overview
We will use four main messaging paths:
- Command Queue Path for work commands that must be processed by one or several workers.
- Domain Event Path for event notifications to many subscribers.
- Retry/DLQ/Parking Lot Path for failure recovery.
- Stream Path for audit, replay, analytics, and high fan-out historical consumers.
3.1 Architectural Principle
Use RabbitMQ as a coordination and delivery boundary, not as a way to hide domain inconsistency.
This means:
- RabbitMQ carries signals and data across boundaries.
- Each service's database remains the source of truth for the state that service owns.
- Message handlers must be safe to invoke again.
- The broker does not replace transactions, constraints, idempotency, or state machines.
- A stream does not automatically make event sourcing correct; a stream only provides a log that can be re-read.
4. Workload Classification
Before creating any exchange, queue, or stream, classify the message.
| Workload | Example | Primary Primitive | Why |
|---|---|---|---|
| Command | ReserveCreditCommand | Quorum queue | Needs single logical processing, retry, DLQ, work dispatch |
| Domain event | OrderSubmittedEvent | Topic exchange + subscriber queue | Many subscribers, queue ownership per subscriber |
| Audit event | OrderStateChanged | Stream | Needs retention, replay, forensic query |
| Analytics feed | OrderMetricsEvent | Super stream | High throughput, partitioned by tenant/account/order |
| Request/reply internal | CheckRiskScore | Direct Reply-To or short-lived reply queue | Only for low-latency bounded RPC |
| Long workflow | OrderSagaStepCompleted | Event + state machine | Do not chain synchronous broker RPC |
| Delayed retry | failed command/event | Retry exchange + delay queues | Controlled retry budget |
| Poison record | non-retryable failure | DLQ + parking lot | Human/operational ownership |
4.1 Decision Rule
Use a simple rule: pick the primitive from the workload classification above.
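One way to make the rule executable is to encode the workload table as a small selector. This is only a sketch: the `Trait` names, the `PrimitiveSelector` class, and the order in which traits are checked are assumptions made for illustration, not part of the original blueprint.

```java
import java.util.Set;

/** Primitives named in the workload classification table. */
enum Primitive { QUORUM_QUEUE, TOPIC_EXCHANGE_WITH_SUBSCRIBER_QUEUES, STREAM, SUPER_STREAM, DIRECT_REPLY_TO }

/** Hypothetical workload traits; the names are illustrative assumptions. */
enum Trait { BOUNDED_RPC, PARTITIONED_HIGH_THROUGHPUT, NEEDS_REPLAY, MANY_SUBSCRIBERS }

final class PrimitiveSelector {
    private PrimitiveSelector() {}

    /** Maps workload traits to a primitive; falls back to a quorum queue (work dispatch). */
    static Primitive choose(Set<Trait> traits) {
        if (traits.contains(Trait.BOUNDED_RPC)) {
            return Primitive.DIRECT_REPLY_TO;                      // low-latency bounded RPC
        }
        if (traits.contains(Trait.PARTITIONED_HIGH_THROUGHPUT)) {
            return Primitive.SUPER_STREAM;                         // analytics feed
        }
        if (traits.contains(Trait.NEEDS_REPLAY)) {
            return Primitive.STREAM;                               // audit and replay
        }
        if (traits.contains(Trait.MANY_SUBSCRIBERS)) {
            return Primitive.TOPIC_EXCHANGE_WITH_SUBSCRIBER_QUEUES; // domain events
        }
        return Primitive.QUORUM_QUEUE;                             // commands
    }
}
```

For example, `PrimitiveSelector.choose(Set.of(Trait.NEEDS_REPLAY))` selects a stream, while an empty trait set falls back to a quorum queue.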
5. Topology Blueprint
Topology must be treated as a public contract. Do not let each service create queues and exchanges ad hoc without ownership.
5.1 Naming Convention
Use the format:
<domain>.<intent>.<kind>[.<qualifier>]
Examples:
| Resource | Name | Owner |
|---|---|---|
| Command exchange | orders.command.x | Order platform team |
| Event exchange | orders.event.x | Order platform team |
| Retry exchange | orders.retry.x | Platform messaging team |
| DLX | orders.dlx | Platform messaging team |
| Command queue | risk.reserve-credit.command.q | Risk team |
| Subscriber queue | billing.orders.event.q | Billing team |
| DLQ | orders.dead-letter.q | Platform messaging team |
| Parking lot | orders.parking-lot.q | Operations + domain owner |
| Audit stream | orders.audit.stream | Audit/data platform team |
| Super stream | orders.analytics.super-stream | Analytics platform team |
5.2 Exchange Blueprint
orders.command.x
  type: topic
  purpose: route command messages by command type and bounded context
orders.event.x
  type: topic
  purpose: publish domain events to queue-per-subscriber
orders.retry.x
  type: direct/topic
  purpose: delayed retry routing back to original work queue
orders.dlx
  type: topic
  purpose: dead-letter non-retryable or exhausted messages
orders.unroutable.x
  type: fanout/direct
  purpose: collect mandatory-return equivalent topology errors when alternate exchange is used
5.3 Routing Key Taxonomy
Command routing keys:
cmd.orders.reserve-credit.v1
cmd.orders.release-credit.v1
cmd.orders.allocate-inventory.v1
cmd.orders.cancel-order.v1
Event routing keys:
evt.orders.order-submitted.v1
evt.orders.order-approved.v1
evt.orders.order-rejected.v1
evt.orders.order-cancelled.v1
evt.orders.order-state-changed.v1
Analytics stream routing keys:
analytics.orders.<tenantId>
analytics.orders.<region>.<tenantId>
5.4 Binding Blueprint
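Bindings on a topic exchange select routing keys by pattern, so it helps to check a pattern against the routing key taxonomy before it ships. The sketch below reimplements topic matching (`*` matches exactly one word, `#` matches zero or more words) in plain Java for tests and reviews. `TopicPattern` is a hypothetical helper, not a RabbitMQ client class, and the broker remains the authority on actual routing.

```java
/** Hypothetical helper that mimics topic-exchange pattern matching for tests and reviews. */
final class TopicPattern {
    private TopicPattern() {}

    static boolean matches(String pattern, String routingKey) {
        return match(pattern.split("\\."), 0, routingKey.split("\\."), 0);
    }

    private static boolean match(String[] p, int i, String[] k, int j) {
        if (i == p.length) {
            return j == k.length;              // pattern consumed: key must be consumed too
        }
        if (p[i].equals("#")) {
            // '#' may absorb zero or more words; try every possible split point.
            for (int next = j; next <= k.length; next++) {
                if (match(p, i + 1, k, next)) {
                    return true;
                }
            }
            return false;
        }
        if (j == k.length) {
            return false;                      // pattern still expects a word
        }
        if (p[i].equals("*") || p[i].equals(k[j])) {
            return match(p, i + 1, k, j + 1);  // '*' or literal word matches one word
        }
        return false;
    }
}
```

With this helper, a binding such as `evt.orders.#` can be asserted to accept `evt.orders.order-submitted.v1` and to reject every `cmd.orders.*` key.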
5.5 Queue Type Rule
| Use Case | Queue Type |
|---|---|
| Production durable command queue | Quorum queue |
| Durable subscriber queue | Quorum queue unless workload explicitly needs classic-specific behavior |
| Temporary callback | Exclusive auto-delete classic queue or Direct Reply-To |
| High-throughput replay | Stream, not queue |
| Partitioned replay | Super stream |
A quorum queue is not a silver bullet. It provides a replication/durability model better suited to production data safety, but it still requires publisher confirms, consumer acks, idempotency, and capacity control.
6. Message Contract Blueprint
Every message must have a consistent envelope.
6.1 Envelope
{
  "messageId": "01JZ1...",
  "messageType": "OrderSubmitted",
  "messageVersion": 1,
  "occurredAt": "2026-07-02T10:15:30Z",
  "producedAt": "2026-07-02T10:15:31Z",
  "producer": "order-service",
  "tenantId": "tenant-123",
  "correlationId": "corr-abc",
  "causationId": "cmd-xyz",
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
  "schema": "orders.order-submitted.v1",
  "payload": {
    "orderId": "ord-1001",
    "customerId": "cust-771",
    "totalAmount": "125000.00",
    "currency": "IDR"
  }
}
6.2 AMQP Properties Mapping
| Envelope Field | AMQP Property/Header |
|---|---|
| messageId | messageId |
| correlationId | correlationId |
| messageType | type |
| producedAt | timestamp |
| tenantId | header tenant-id |
| schema | header schema |
| traceId | header traceparent or tracing propagation header |
| causationId | header causation-id |
6.3 Contract Invariant
- messageId is immutable.
- messageType is immutable.
- messageVersion is monotonic within a schema family.
- correlationId is carried across the request/workflow.
- causationId points to the message that caused the new message.
- Payload must not depend on a Java class name.
- Consumers must not require a new field from old events without a compatibility plan.
- PII classification must be explicit.
- Payload must carry a business key sufficient for idempotency.
7. Outbox Blueprint
A producer that publishes an event or command as a result of a database change must use an outbox.
Problems it solves:
- DB commit succeeds, publish fails.
- Publish succeeds, DB rolls back.
- Service crashes after the DB commit but before the publish.
- A network timeout leaves the publish outcome ambiguous.
7.1 Outbox Table
CREATE TABLE message_outbox (
id UUID PRIMARY KEY,
aggregate_type VARCHAR(100) NOT NULL,
aggregate_id VARCHAR(100) NOT NULL,
message_type VARCHAR(150) NOT NULL,
routing_key VARCHAR(200) NOT NULL,
exchange_name VARCHAR(200) NOT NULL,
payload_json TEXT NOT NULL,
headers_json TEXT NOT NULL,
status VARCHAR(30) NOT NULL,
attempt_count INTEGER NOT NULL DEFAULT 0,
next_attempt_at TIMESTAMP NOT NULL,
created_at TIMESTAMP NOT NULL,
published_at TIMESTAMP NULL,
last_error TEXT NULL
);
CREATE INDEX idx_outbox_pending
ON message_outbox (status, next_attempt_at, created_at);
7.2 Outbox State Machine
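A minimal sketch of the outbox row lifecycle, expressed as an enum with an explicit transition table. Only `PUBLISHED` is named in the text; `PENDING`, `IN_FLIGHT`, and `FAILED`, as well as the allowed transitions, are assumptions inferred from the repository operations used by the relay (`claimPending`, `markPublished`, `markFailed`, `releaseForRetry`).

```java
import java.util.Map;
import java.util.Set;

/** Outbox row status with guarded transitions; state names other than PUBLISHED are assumed. */
enum OutboxStatus {
    PENDING, IN_FLIGHT, PUBLISHED, FAILED;

    private static final Map<OutboxStatus, Set<OutboxStatus>> ALLOWED = Map.of(
        PENDING,   Set.of(IN_FLIGHT),                  // claimPending
        IN_FLIGHT, Set.of(PUBLISHED, PENDING, FAILED), // markPublished / releaseForRetry / markFailed
        PUBLISHED, Set.of(),                           // terminal: kept for audit retention
        FAILED,    Set.of(PENDING)                     // assumed manual re-queue after a topology fix
    );

    boolean canTransitionTo(OutboxStatus next) {
        return ALLOWED.get(this).contains(next);
    }

    /** Returns the next status or throws if the transition is not part of the state machine. */
    OutboxStatus transitionTo(OutboxStatus next) {
        if (!canTransitionTo(next)) {
            throw new IllegalStateException(this + " -> " + next + " is not allowed");
        }
        return next;
    }
}
```

Guarding transitions in one place makes it harder for a relay bug to mark a row as published without passing through the claimed state, which is where the publisher confirm is awaited.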
7.3 Java Outbox Relay Skeleton
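import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.List;

public final class RabbitOutboxRelay implements AutoCloseable {
    private final OutboxRepository outboxRepository;
    private final ConfirmingRabbitPublisher publisher;
    private final Clock clock;

    public RabbitOutboxRelay(OutboxRepository outboxRepository,
                             ConfirmingRabbitPublisher publisher,
                             Clock clock) {
        this.outboxRepository = outboxRepository;
        this.publisher = publisher;
        this.clock = clock;
    }

    public void runBatch(int batchSize) {
        // Claiming rows keeps concurrent relay instances from publishing the same row.
        List<OutboxMessage> batch = outboxRepository.claimPending(batchSize, clock.instant());
        for (OutboxMessage msg : batch) {
            try {
                PublishResult result = publisher.publishWithConfirm(
                    msg.exchangeName(),
                    msg.routingKey(),
                    msg.payload(),
                    msg.properties(),
                    Duration.ofSeconds(5)
                );
                // isConfirmed()/isUnroutable() are assumed accessors on PublishResult; they avoid
                // clashing with the static factories confirmed()/notConfirmed() used by the publisher.
                if (result.isConfirmed()) {
                    outboxRepository.markPublished(msg.id(), clock.instant());
                } else if (result.isUnroutable()) {
                    // A routing error is a topology problem; retrying will not fix it.
                    outboxRepository.markFailed(msg.id(), "UNROUTABLE");
                } else {
                    outboxRepository.releaseForRetry(msg.id(), nextBackoff(msg.attemptCount()));
                }
            } catch (Exception e) {
                // Outcome unknown: retry later with the same messageId and let consumers deduplicate.
                outboxRepository.releaseForRetry(msg.id(), nextBackoff(msg.attemptCount()));
                if (e instanceof InterruptedException) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                }
            }
        }
    }

    // Exponential backoff: 2^attempt seconds, with the exponent capped at 8 (256 s).
    private Instant nextBackoff(int attemptCount) {
        long seconds = Math.min(300, (long) Math.pow(2, Math.min(attemptCount, 8)));
        return clock.instant().plusSeconds(seconds);
    }

    @Override
    public void close() throws Exception {
        publisher.close();
    }
}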
7.4 Outbox Relay Invariants
- The relay marks a row PUBLISHED only after the publisher confirm.
- The relay does not delete rows too early; audit retention is preserved.
- messageId is stable across retries.
- Publish retries may produce duplicates, so consumers must be idempotent.
- mandatory=true or an alternate exchange must be used to detect routing errors.
- Backoff is used so the broker is not hammered again during an outage.
8. Producer Blueprint
A production-grade producer is built in layers; the subsections below cover the confirming publisher, its metrics, and its health checks.
8.1 Confirming Publisher Skeleton
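import com.rabbitmq.client.AMQP;
import com.rabbitmq.client.Channel;
import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

public final class ConfirmingRabbitPublisher implements AutoCloseable {
    private final Channel channel;
    // Set by the return listener when the broker hands back an unroutable mandatory message.
    private final AtomicBoolean returned = new AtomicBoolean(false);

    public ConfirmingRabbitPublisher(Channel channel) throws IOException {
        this.channel = channel;
        this.channel.confirmSelect();
        this.channel.addReturnListener(returnMessage -> returned.set(true));
    }

    // Synchronized: one outstanding publish at a time, so the returned flag is unambiguous.
    public synchronized PublishResult publishWithConfirm(
            String exchange,
            String routingKey,
            byte[] body,
            AMQP.BasicProperties properties,
            Duration timeout
    ) throws IOException, InterruptedException {
        returned.set(false);
        // mandatory=true: the broker sends basic.return before the confirm if no queue matches.
        channel.basicPublish(exchange, routingKey, true, properties, body);
        try {
            if (!channel.waitForConfirms(timeout.toMillis())) {
                return PublishResult.notConfirmed(); // broker nacked the message
            }
        } catch (TimeoutException e) {
            return PublishResult.notConfirmed(); // outcome unknown; the caller retries
        }
        // PublishResult.unroutable() is an assumed factory alongside confirmed()/notConfirmed().
        return returned.get() ? PublishResult.unroutable() : PublishResult.confirmed();
    }

    @Override
    public void close() throws Exception {
        channel.close();
    }
}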
In high-throughput systems, use async confirms with a bounded in-flight map rather than a synchronous confirm per message.
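The bounded in-flight map can be sketched without any broker dependency. The class below is a hypothetical helper (`InFlightConfirms` is not part of the RabbitMQ client): the publisher would call `tryRegister` with the value of `channel.getNextPublishSeqNo()` just before `basicPublish`, and call `settle` from the ack and nack callbacks registered through `channel.addConfirmListener(...)`. The window size and the choice to reject rather than block when the window is full are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.Semaphore;

/** Tracks unconfirmed publishes by sequence number with an upper bound on the window. */
final class InFlightConfirms {
    private final ConcurrentNavigableMap<Long, String> pending = new ConcurrentSkipListMap<>();
    private final Semaphore permits;

    InFlightConfirms(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Returns false when the window is full; the caller should apply backpressure. */
    boolean tryRegister(long seqNo, String messageId) {
        if (!permits.tryAcquire()) {
            return false;
        }
        pending.put(seqNo, messageId);
        return true;
    }

    /** Called from ack/nack callbacks; 'multiple' settles every sequence number up to seqNo. */
    List<String> settle(long seqNo, boolean multiple) {
        List<String> settled = new ArrayList<>();
        if (multiple) {
            for (Long key : pending.headMap(seqNo, true).keySet()) {
                String id = pending.remove(key);
                if (id != null) {
                    settled.add(id);
                }
            }
        } else {
            String id = pending.remove(seqNo);
            if (id != null) {
                settled.add(id);
            }
        }
        permits.release(settled.size()); // free one slot per settled message
        return settled;
    }

    int inFlight() {
        return pending.size();
    }
}
```

The returned message ids let the caller mark outbox rows as published on ack or release them for retry on nack, and `inFlight()` can feed the in-flight confirm count metric.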
8.2 Producer Metrics
The producer must expose:
- publish attempt count;
- publish confirmed count;
- publish nacked count;
- publish returned/unroutable count;
- publish timeout count;
- confirm latency p50/p95/p99;
- in-flight confirm count;
- outbox pending count;
- outbox age max;
- relay batch duration;
- retry attempt count.
8.3 Producer Health Check
A producer is considered unhealthy if:
- the connection is down;
- the channel is stuck in a recovery loop;
- confirm latency exceeds its threshold;
- the oldest pending outbox row is older than the SLA;
- returned messages > 0 for a production topology;
- a broker-blocked connection lasts too long;
- the relay cannot claim or mark rows.
9. Consumer Blueprint
A consumer must combine manual ack, idempotency, a business transaction, retry classification, and observability.
9.1 Consumer State Machine
9.2 Inbox/Dedup Table
CREATE TABLE processed_message (
consumer_name VARCHAR(150) NOT NULL,
message_id VARCHAR(150) NOT NULL,
aggregate_id VARCHAR(150) NULL,
processed_at TIMESTAMP NOT NULL,
result_hash VARCHAR(128) NULL,
PRIMARY KEY (consumer_name, message_id)
);
9.3 Consumer Handler Skeleton
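import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Delivery;
import java.io.IOException;

public final class IdempotentRabbitConsumer {
    private static final String CONSUMER = "billing-orders-consumer";
    private final InboxRepository inbox;
    private final DomainHandler handler;
    private final RetryClassifier retryClassifier;

    public IdempotentRabbitConsumer(InboxRepository inbox, DomainHandler handler, RetryClassifier retryClassifier) {
        this.inbox = inbox;
        this.handler = handler;
        this.retryClassifier = retryClassifier;
    }

    public void handle(Channel channel, Delivery delivery) throws IOException {
        long tag = delivery.getEnvelope().getDeliveryTag();
        // decode(...) is the service's own deserializer (not shown here).
        MessageEnvelope envelope = decode(delivery);
        try {
            // tryStart must roll back together with the handler transaction on failure;
            // otherwise a retried message looks like a duplicate and is acked unprocessed.
            boolean firstTime = inbox.tryStart(CONSUMER, envelope.messageId());
            if (!firstTime) {
                channel.basicAck(tag, false); // duplicate: already processed
                return;
            }
            handler.handleInTransaction(envelope);
            inbox.markProcessed(CONSUMER, envelope.messageId());
            channel.basicAck(tag, false); // ack only after state is committed
        } catch (Exception e) {
            FailureDecision decision = retryClassifier.classify(e, delivery, envelope);
            switch (decision.action()) {
                // Immediate requeue can cause a redelivery storm; prefer delayed retry where possible.
                case RETRY_REQUEUE -> channel.basicNack(tag, false, true);
                case DEAD_LETTER -> channel.basicNack(tag, false, false);
                case ACK_AFTER_COMPENSATION -> channel.basicAck(tag, false);
            }
        }
    }
}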
9.4 Consumer Invariants
- Auto ack is forbidden for durable workloads.
- Ack only after state is safe.
- Duplicates must be safe.
- Retry must be bounded.
- Handlers must not sleep for long on the consumer thread without a bounded executor strategy.
- Prefetch must reflect the work budget.
- Consumers must shut down gracefully: stop accepting deliveries, finish in-flight work, ack/nack safely, close the channel.
10. Retry, DLQ, and Parking Lot Blueprint
10.1 Retry Topology
10.2 Retry Header Contract
Use explicit headers:
x-original-exchange
x-original-routing-key
x-retry-count
x-first-failure-at
x-last-failure-at
x-last-error-code
x-last-error-message
x-failure-class
x-parking-reason
Do not rely on x-death alone for domain logic: its format and accumulation behavior suit operational metadata better than a business decision contract.
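A minimal sketch of how a consumer might maintain these headers when it republishes a failed message to the retry exchange. `RetryHeaders` is a hypothetical helper; storing timestamps as ISO-8601 strings and the retry count as an integer are assumptions, and only a subset of the header contract is shown.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Builds the explicit retry headers for the next attempt without mutating the input map. */
final class RetryHeaders {
    private RetryHeaders() {}

    static Map<String, Object> onFailure(Map<String, Object> headers,
                                         String originalExchange,
                                         String originalRoutingKey,
                                         String errorCode,
                                         Instant now) {
        Map<String, Object> next = new HashMap<>(headers);
        // Origin and first-failure fields are written once and then preserved.
        next.putIfAbsent("x-original-exchange", originalExchange);
        next.putIfAbsent("x-original-routing-key", originalRoutingKey);
        next.putIfAbsent("x-first-failure-at", now.toString());
        // Last-failure fields are overwritten on every attempt.
        next.put("x-last-failure-at", now.toString());
        next.put("x-last-error-code", errorCode);
        next.put("x-retry-count", retryCount(headers) + 1);
        return next;
    }

    /** Reads x-retry-count, treating a missing or non-numeric value as zero. */
    static int retryCount(Map<String, Object> headers) {
        Object value = headers.get("x-retry-count");
        return (value instanceof Number) ? ((Number) value).intValue() : 0;
    }
}
```

Because the count and timestamps live in headers the application controls, the retry classifier can enforce a budget and a maximum age without parsing x-death.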
10.3 Failure Classification
| Failure | Retry? | Action |
|---|---|---|
| DB deadlock | Yes | delayed retry |
| HTTP 503 downstream | Yes | delayed retry + circuit breaker |
| validation error | No | DLQ |
| unknown schema version | No | DLQ + contract alert |
| missing required entity but eventually consistent | Yes | delayed retry with max age |
| authorization/permission error | Usually no | DLQ/security alert |
| poison message causing repeated exception | No after budget | parking lot |
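The classification table can be turned into a small decision function. This is a sketch under assumptions: the `FailureKind` and `RetryAction` names and the retry budget parameter are illustrative, and the "max age" rule for eventually consistent entities is left out for brevity.

```java
/** Failure categories taken from the classification table; the enum names are assumptions. */
enum FailureKind {
    DB_DEADLOCK, DOWNSTREAM_503, MISSING_ENTITY,
    VALIDATION_ERROR, UNKNOWN_SCHEMA_VERSION, AUTHORIZATION_ERROR,
    UNCLASSIFIED
}

enum RetryAction { DELAYED_RETRY, DEAD_LETTER, PARKING_LOT }

final class FailureClassifier {
    private final int maxRetries;

    FailureClassifier(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    /** Non-retryable failures go straight to the DLQ; everything else retries within the budget. */
    RetryAction classify(FailureKind kind, int retryCount) {
        switch (kind) {
            case VALIDATION_ERROR:
            case UNKNOWN_SCHEMA_VERSION:
            case AUTHORIZATION_ERROR:
                return RetryAction.DEAD_LETTER;
            default:
                // Repeated failure beyond the budget is treated as a poison message.
                return retryCount < maxRetries ? RetryAction.DELAYED_RETRY : RetryAction.PARKING_LOT;
        }
    }
}
```

With a budget of 3, a DB deadlock is retried three times with delay and then parked, while a validation error is dead-lettered on the first failure.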
10.4 DLQ Ownership
A DLQ without an owner is a message graveyard.
Every DLQ must have:
- owner team;
- severity policy;
- max acceptable age;
- replay procedure;
- discard procedure;
- PII handling;
- audit trail;
- dashboard;
- alert threshold.
11. Stream and Super Stream Blueprint
Use streams for audit and replay, not as a replacement for every queue.
11.1 Audit Stream
orders.audit.stream
  type: stream
  retention: 30-180 days depending on compliance requirements
  routing source: outbox relay / event publisher
  ordering key: aggregateId/orderId
  consumer examples:
    - audit projector
    - compliance export
    - forensic replay tool
    - materialized view rebuild
11.2 Analytics Super Stream
orders.analytics.super-stream
  partitions: 12 or 24 initially
  routing key: tenantId or accountId
  use case: high-throughput analytics processing
  consumers:
    - metrics aggregator
    - anomaly detection
    - BI exporter
11.3 Stream Replay Safety
A replay-safe consumer:
- does not resend email without a side-effect ledger;
- does not double charge;
- does not write a projection without an idempotent upsert;
- can start from an old offset;
- has a checkpoint that is consistent with its output state;
- can distinguish normal live processing from replay mode;
- has a throttle so replay does not overwhelm downstream systems.
11.4 Stream Offset Decision
| Consumer Type | Offset Store |
|---|---|
| Stateless analytics | server-side offset may be enough |
| Projection with DB output | external DB checkpoint in same transaction as projection |
| Audit export | external checkpoint + export ledger |
| Experiment/debug consumer | manual offset or named temporary consumer |
| Multi-threaded partition processing | per-partition checkpoint with gap tracking |
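For the multi-threaded case, "per-partition checkpoint with gap tracking" means the stored offset may only advance to the highest offset below which everything has been processed. A minimal sketch follows; `ContiguousCheckpoint` is a hypothetical helper, and it assumes the offsets handed to this consumer are consecutive integers (if they are not, for example because of filtering, track a local delivery sequence instead).

```java
import java.util.TreeSet;

/** Tracks out-of-order completions and exposes the highest safely committable offset. */
final class ContiguousCheckpoint {
    private long committed;                                 // all offsets <= committed are done
    private final TreeSet<Long> doneAhead = new TreeSet<>(); // completed offsets beyond a gap

    /** @param lastCommittedOffset offset restored from the checkpoint store, or -1 if none */
    ContiguousCheckpoint(long lastCommittedOffset) {
        this.committed = lastCommittedOffset;
    }

    /** Records a completed offset and returns the offset that is now safe to store. */
    synchronized long markDone(long offset) {
        if (offset > committed) {
            doneAhead.add(offset);
        }
        // Advance while the next expected offset has already completed.
        while (!doneAhead.isEmpty() && doneAhead.first() == committed + 1) {
            committed = doneAhead.pollFirst();
        }
        return committed;
    }

    synchronized int gapBacklog() {
        return doneAhead.size();
    }
}
```

If a worker finishes offset 1 before offset 0, the checkpoint stays at -1 until offset 0 completes and then jumps to 1, so a crash in between replays offset 0 instead of skipping it. `gapBacklog()` is a candidate metric for stuck partitions.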
12. Deployment Blueprint
12.1 Cluster Model
At a minimum, a production RabbitMQ cluster must support:
- persistent volumes;
- anti-affinity between nodes;
- realistic resource requests/limits;
- monitoring;
- TLS/secrets;
- backup/restore plan;
- upgrade plan;
- queue/stream leader awareness;
- capacity margin.
12.2 Kubernetes High-Level Model
12.3 RabbitMQCluster Example
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: orders-rabbitmq
  namespace: rabbitmq
spec:
  replicas: 3
  persistence:
    storageClassName: fast-ssd
    storage: 200Gi
  resources:
    requests:
      cpu: "2"
      memory: 8Gi
    limits:
      cpu: "4"
      memory: 16Gi
  rabbitmq:
    additionalPlugins:
      - rabbitmq_prometheus
      - rabbitmq_stream
      - rabbitmq_management
Adjust the manifest fields to the operator version in use. Do not copy-paste production YAML without validating it against the documentation for that operator version.
12.4 Topology-as-Code Example
apiVersion: rabbitmq.com/v1beta1
kind: Exchange
metadata:
  name: orders-event-exchange
spec:
  name: orders.event.x
  type: topic
  durable: true
  rabbitmqClusterReference:
    name: orders-rabbitmq
---
apiVersion: rabbitmq.com/v1beta1
kind: Queue
metadata:
  name: billing-orders-event-queue
spec:
  name: billing.orders.event.q
  durable: true
  type: quorum
  rabbitmqClusterReference:
    name: orders-rabbitmq
---
apiVersion: rabbitmq.com/v1beta1
kind: Binding
metadata:
  name: billing-orders-event-binding
spec:
  source: orders.event.x
  destination: billing.orders.event.q
  destinationType: queue
  routingKey: evt.orders.#
  rabbitmqClusterReference:
    name: orders-rabbitmq
12.5 Deployment Invariants
- Persistent workloads must not run without durable storage.
- Broker nodes must be spread across different failure domains if the platform allows it.
- Upgrades are tested in staging with a synthetic workload.
- Backup/restore is tested, not just configured.
- Topology changes go through PR/review.
- Credential rotation has a runbook.
- Apps have readiness/liveness probes that are aware of connection/channel state.
13. Observability Blueprint
13.1 Dashboard Layout
Build a layered dashboard:
- Cluster Overview
  - node up/down;
  - memory/disk alarms;
  - connection count;
  - channel count;
  - file descriptors;
  - network throughput;
  - disk I/O.
- Queue Health
  - ready messages;
  - unacked messages;
  - ingress/egress rate;
  - redelivery rate;
  - consumer count;
  - oldest message age;
  - dead-letter rate.
- Stream Health
  - stream size;
  - retention risk;
  - append rate;
  - read rate;
  - consumer offset lag;
  - segment/chunk behavior;
  - partition hot spot.
- Producer Health
  - publish rate;
  - confirm latency;
  - returned messages;
  - outbox pending;
  - outbox oldest age;
  - in-flight confirms;
  - publish errors.
- Consumer Health
  - processing latency;
  - ack/nack rate;
  - retry count;
  - DLQ count;
  - handler exception rate;
  - duplicate detected;
  - inbox insert conflicts;
  - downstream latency.
- Business Flow
  - order submitted count;
  - reserve credit success/failure;
  - fulfillment pending age;
  - saga timeout;
  - state transition lag;
  - SLA breach count.
13.2 Log Contract
Every producer log entry:
{
"event": "rabbitmq.publish.confirmed",
"messageId": "msg-123",
"exchange": "orders.event.x",
"routingKey": "evt.orders.order-submitted.v1",
"correlationId": "corr-123",
"confirmLatencyMs": 18,
"outboxId": "outbox-123"
}
Every consumer log entry:
{
"event": "rabbitmq.consume.processed",
"messageId": "msg-123",
"queue": "billing.orders.event.q",
"consumer": "billing-orders-consumer",
"correlationId": "corr-123",
"attempt": 2,
"redelivered": true,
"processingLatencyMs": 74,
"decision": "ACK"
}
13.3 Trace Contract
A trace must connect:
- HTTP/API request;
- DB transaction;
- outbox row;
- RabbitMQ publish span;
- broker route metadata;
- consumer receive span;
- handler DB transaction;
- downstream calls;
- emitted follow-up message.
If the trace stops at the producer and does not reappear at the consumer, propagation is not yet correct.
14. Load Test Blueprint
14.1 Load Test Questions
Do not start with "how many msg/sec?". Start with these questions:
- What is the sustained throughput for the normal workload?
- What is the p95/p99 end-to-end latency under normal load?
- What is the backlog drain rate after a 30-minute outage?
- What is the confirm latency while a quorum leader moves?
- What is the DLQ rate when 1% of messages are poison?
- What is the maximum safe prefetch before worker memory grows out of control?
- How large is stream lag when the analytics consumer is down for 2 hours?
- How long does recovery take after a broker node restart?
- Does outbox age stay within the SLA?
- Is the business SLA still met during a retry storm?
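The backlog drain question can be estimated before running any test, which gives the load test a hypothesis to confirm or refute. A minimal sketch of the arithmetic, assuming constant ingress and egress rates (the rates in the usage note are made-up illustration values, not measurements):

```java
/** Back-of-the-envelope backlog model: constant rates, no retries, no redelivery. */
final class BacklogModel {
    private BacklogModel() {}

    /** Messages accumulated while consumers are down for the whole outage. */
    static long backlogAfterOutage(long ingressPerSec, long outageSeconds) {
        return ingressPerSec * outageSeconds;
    }

    /** Seconds needed to drain the backlog while new messages keep arriving. */
    static double drainSeconds(long backlog, long ingressPerSec, long egressPerSec) {
        long surplus = egressPerSec - ingressPerSec;
        if (surplus <= 0) {
            return Double.POSITIVE_INFINITY; // consumers never catch up
        }
        return (double) backlog / surplus;
    }
}
```

For example, at 500 msg/s ingress a 30-minute outage (1800 s) accumulates 900,000 messages; with consumers draining at 800 msg/s the surplus is 300 msg/s, so catching up takes 3000 s, or 50 minutes. If egress only equals ingress, the backlog never drains.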
14.2 Test Matrix
| Dimension | Values |
|---|---|
| message size | 1KB, 10KB, 100KB |
| queue type | quorum queue, stream |
| producer count | 1, 5, 20 |
| consumer count | 1, 5, 20 |
| prefetch | 1, 10, 50, 200 |
| confirm mode | sync, batch, async |
| durability | persistent, transient for controlled comparison only |
| failure | none, broker restart, consumer crash, network cut |
| retry | none, 1%, 10% retryable failure |
| poison | 0%, 0.1%, 1% non-retryable |
14.3 PerfTest Examples
# Baseline quorum queue throughput
# (manual ack is PerfTest's default; --autoack is a valueless flag, so it is omitted here)
rabbitmq-perf-test \
--uri amqp://user:pass@rabbitmq:5672/%2f \
--queue perf.orders.command.q \
--queue-args x-queue-type=quorum \
--producers 4 \
--consumers 8 \
--pmessages 1000000 \
--size 1024 \
--confirm 100
# Prefetch sensitivity
rabbitmq-perf-test \
--uri amqp://user:pass@rabbitmq:5672/%2f \
--queue perf.prefetch.q \
--queue-args x-queue-type=quorum \
--producers 2 \
--consumers 10 \
--qos 50 \
--size 4096 \
--confirm 100
14.4 Stream PerfTest Examples
# Stream append/read throughput
stream-perf-test \
--uris rabbitmq-stream://user:pass@rabbitmq:5552 \
--streams perf.orders.audit.stream \
--producers 4 \
--consumers 4 \
--size 1024 \
--rate 50000
# Super stream partition test
stream-perf-test \
--uris rabbitmq-stream://user:pass@rabbitmq:5552 \
--streams perf.orders.analytics.super-stream \
--super-streams \
--producers 8 \
--consumers 8 \
--size 1024
Adjust the commands to the tool version in use. These examples show the shape of the experiment; they do not replace the CLI documentation.
14.5 Benchmark Report Template
# RabbitMQ Benchmark Report
## Workload
- Message type:
- Payload size:
- Producer count:
- Consumer count:
- Queue/stream type:
- Durability:
- Confirm mode:
- Prefetch:
## Environment
- RabbitMQ version:
- Node count:
- CPU/memory:
- Storage:
- Network:
- JVM version:
- Client library version:
## Result
- Publish rate:
- Consume rate:
- End-to-end latency p50/p95/p99:
- Confirm latency p50/p95/p99:
- Queue depth max:
- Redelivery rate:
- Error rate:
- CPU/memory/disk/network:
## Interpretation
- Bottleneck:
- Safe capacity:
- Saturation signal:
- Recommended limit:
- Follow-up experiment:
15. Failure Test Blueprint
15.1 Failure Matrix
| Failure | Expected Behavior | Proof |
|---|---|---|
| Producer crash before DB commit | no outbox row, no publish | DB and broker show no message |
| Producer crash after DB commit before publish | outbox pending, relay publishes later | outbox age then published |
| Network timeout during publish | possible duplicate, no loss | same messageId, consumer dedup |
| Broker node restart | client recovers, confirm may timeout | reconnect metrics, no silent loss |
| Consumer crash before commit | message redelivered | redelivery metric |
| Consumer crash after commit before ack | duplicate delivered, dedup absorbs | inbox conflict, ack duplicate |
| Handler permanent validation error | DLQ | DLQ message with reason |
| Retryable downstream 503 | delayed retry | retry headers and eventual success |
| DLQ replay | controlled reprocessing | audit trail |
| Stream consumer down beyond retention | alert before data loss | retention risk alert |
| Hot partition | lag isolated to partition | partition lag dashboard |
| Disk alarm | publishing blocked/slowed | disk alarm metric + app backpressure |
15.2 Chaos Drill Checklist
Run the drills in staging:
- kill consumer process during processing;
- kill producer during outbox relay;
- restart broker leader node;
- inject network latency;
- inject message schema error;
- inject poison message;
- pause downstream DB/API;
- fill queue faster than consumer;
- stop stream consumer until lag grows;
- replay DLQ message;
- rotate RabbitMQ password;
- rotate TLS certificate;
- perform rolling upgrade.
A drill is not complete until all observability and runbooks are proven to work.
16. Runbook Catalog
16.1 Queue Growth Runbook
Symptoms:
- ready messages keep rising;
- oldest message age is rising;
- consumer count is normal but egress is low;
- downstream latency is high.
Diagnosis:
- Check ingress vs egress.
- Check consumer errors.
- Check downstream dependencies.
- Check prefetch and unacked count.
- Check broker resources: memory/disk/network.
- Check whether a poison message is blocking an ordered queue.
Actions:
- scale consumers if the bottleneck is handler CPU;
- reduce ingress or shed load if downstream is saturated;
- move the poison message to the DLQ if it is blocking;
- tune prefetch if unacked is too large;
- add a partition/queue split if the workload needs isolation;
- escalate if oldest age exceeds the SLA.
16.2 DLQ Spike Runbook
Symptoms:
- DLQ rate is increasing;
- the same error type repeats;
- retries are exhausted.
Diagnosis:
- Group DLQ messages by error code/message type/schema version.
- Check the latest producer/consumer deploys.
- Check for contract changes.
- Check for downstream outages.
- Check sample payloads.
Actions:
- roll back the bad producer if the schema is broken;
- hotfix the consumer if it is a parsing/compatibility bug;
- replay after the fix;
- park messages that need a manual decision;
- discard only with approval and an audit record.
16.3 Redelivery Storm Runbook
Symptoms:
- redelivery rate is high;
- the same message id appears repeatedly;
- consumer CPU is high but progress is low.
Diagnosis:
- Check the retry classifier.
- Check whether basicNack(requeue=true) is used without a delay.
- Check the exception root cause.
- Check for poison messages.
- Check prefetch.
Actions:
- stop the consumer temporarily if the storm is taking the system down;
- change immediate requeue to delayed retry;
- dead-letter the poison message;
- patch the handler;
- replay in a controlled way.
16.4 Publisher Confirm Latency Runbook
Symptoms:
- confirm latency p99 is high;
- outbox pending age is rising;
- publish throughput is dropping.
Diagnosis:
- Check broker disk I/O.
- Check quorum queue leader health.
- Check replication latency.
- Check the network.
- Check message size.
- Check the publisher in-flight window.
Actions:
- reduce publish rate temporarily;
- scale/redistribute workload;
- split hot queue;
- tune batching/in-flight confirms;
- fix storage bottleneck;
- escalate broker cluster issue.
16.5 Stream Retention Risk Runbook
Symptoms:
- stream consumer lag is large;
- the oldest unread offset is approaching the retention boundary;
- the replay consumer is slow.
Diagnosis:
- Check append rate vs read rate.
- Check how long the consumer has been down.
- Check the retention policy.
- Check for a partition hot spot.
- Check for a downstream sink bottleneck.
Actions:
- scale consumers if it is safe to do so;
- pause noncritical producers if necessary;
- increase retention if storage allows;
- run catch-up mode;
- snapshot/rebuild if data already expired;
- document data loss if retention already passed.
17. Security and Governance Blueprint
17.1 Vhost Model
Use virtual hosts for isolation:
/orders-prod
/orders-staging
/orders-dev
/shared-observability
Do not mix dev/test/prod in the same vhost.
17.2 Permission Model
| Actor | Configure | Write | Read |
|---|---|---|---|
| topology operator | approved resources | none or limited | none |
| order-service producer | none | orders.command.x, orders.event.x, streams | none |
| billing consumer | none | retry/DLX if republish strategy | billing.orders.event.q |
| ops replay tool | none/limited | original exchanges | DLQ/parking lot |
| monitoring | none | none | metrics/API only |
17.3 Governance Checklist
- Every exchange has owner.
- Every queue has owner.
- Every DLQ has owner and runbook.
- Every routing key has schema contract.
- Every message type has compatibility policy.
- Every stream has retention policy.
- Every replay tool has authorization/audit trail.
- Every topology change uses PR.
- Every secret has rotation plan.
- Every tenant isolation rule is tested.
18. Architecture Decision Records
18.1 ADR: Queue vs Stream
# ADR: Use Quorum Queue for ReserveCreditCommand
## Status
Accepted
## Context
ReserveCreditCommand must be processed by exactly one logical worker. It does not require historical replay by many consumers. It requires retry, DLQ, and at-least-once processing.
## Decision
Use a durable quorum queue bound to `orders.command.x` with routing key `cmd.orders.reserve-credit.v1`.
## Consequences
- Stronger HA/durability profile than non-replicated queue.
- Consumer must be idempotent.
- Publish path must use publisher confirms.
- Retry/DLQ topology required.
- Queue growth must be monitored.
18.2 ADR: Stream for Audit
# ADR: Use RabbitMQ Stream for Order Audit Log
## Status
Accepted
## Context
Audit consumers need non-destructive reads, replay, and independent progress. Traditional queues remove messages after consumption and are not suitable as an audit log.
## Decision
Publish order audit events to `orders.audit.stream` with retention set by compliance requirement.
## Consequences
- Consumers track offsets.
- Replay tooling must be safe.
- Retention risk must be monitored.
- Stream is not source of truth; service DB remains source of truth for current state.
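The difference from a queue is easiest to see in a toy model: reading does not remove entries, and each consumer owns its own offset. The class below is an in-memory illustration of that idea only. It is not the RabbitMQ Stream client API, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy append-only log with per-consumer offsets; not a real stream client. */
final class AuditLogSketch {
    private final List<String> entries = new ArrayList<>();
    private final Map<String, Integer> nextOffset = new HashMap<>();

    void append(String event) {
        entries.add(event);
    }

    /** Returns everything from the consumer's stored offset and advances it; the log is untouched. */
    List<String> poll(String consumer) {
        int from = nextOffset.getOrDefault(consumer, 0);
        List<String> batch = List.copyOf(entries.subList(from, entries.size()));
        nextOffset.put(consumer, entries.size());
        return batch;
    }

    /** Rewinds one consumer for replay without affecting any other consumer. */
    void rewind(String consumer, int offset) {
        if (offset < 0 || offset > entries.size()) {
            throw new IllegalArgumentException("offset out of range: " + offset);
        }
        nextOffset.put(consumer, offset);
    }

    public static void main(String[] args) {
        AuditLogSketch log = new AuditLogSketch();
        log.append("OrderSubmitted");
        log.append("CreditReserved");
        System.out.println("audit reads: " + log.poll("audit"));
        System.out.println("analytics reads: " + log.poll("analytics"));
        log.rewind("audit", 0);
        System.out.println("audit replays: " + log.poll("audit"));
    }
}
```

In a real stream, retention removes old entries, so a rewind can fail for offsets that have already expired. That is the retention risk listed in the consequences.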
18.3 ADR: Transactional Outbox
# ADR: Use Transactional Outbox for Domain Event Publishing
## Status
Accepted
## Context
Order state changes and event publication must not diverge silently. Direct publish inside transaction cannot atomically commit with RabbitMQ.
## Decision
Write message to `message_outbox` in the same database transaction as domain state change. A relay publishes with publisher confirms.
## Consequences
- Event publication becomes eventually consistent.
- Duplicate publish is possible, so consumers must be idempotent.
- Outbox age is an SLO metric.
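The relay half of this decision can be sketched without a broker. In the sketch below, `ConfirmingPublisher` is a hypothetical port that stands in for "publish and wait for the broker confirm", and the list stands in for the `message_outbox` table. A row is marked published only after a positive confirm, which is exactly why duplicates are possible and lost events are not.

```java
import java.util.List;

final class OutboxRelaySketch {

    /** In-memory stand-in for one row of the message_outbox table. */
    static final class OutboxRow {
        final String messageId;
        final String payload;
        private boolean published;

        OutboxRow(String messageId, String payload) {
            this.messageId = messageId;
            this.payload = payload;
        }

        boolean isPublished() {
            return published;
        }
    }

    /** Hypothetical port: a real implementation publishes and waits for the confirm. */
    interface ConfirmingPublisher {
        boolean publishAndAwaitConfirm(String messageId, String payload);
    }

    /**
     * Publishes pending rows in insertion order and stops at the first row that is not
     * confirmed, so later rows never overtake it. Returns the number of rows confirmed.
     */
    static int relay(List<OutboxRow> outbox, ConfirmingPublisher publisher) {
        int confirmed = 0;
        for (OutboxRow row : outbox) {
            if (row.published) {
                continue;
            }
            if (!publisher.publishAndAwaitConfirm(row.messageId, row.payload)) {
                break; // leave the row pending; the next relay run retries with the same id
            }
            // A crash between the confirm and this line re-sends the row later: a duplicate.
            row.published = true;
            confirmed++;
        }
        return confirmed;
    }

    public static void main(String[] args) {
        List<OutboxRow> outbox = List.of(new OutboxRow("m1", "{}"), new OutboxRow("m2", "{}"));
        System.out.println("confirmed: " + relay(outbox, (id, payload) -> true));
    }
}
```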
19. Implementation Package Structure
Example Java service structure:
order-service/
  src/main/java/com/example/orders/
    api/
    domain/
      Order.java
      OrderState.java
      OrderService.java
    messaging/
      contract/
        MessageEnvelope.java
        MessageHeaders.java
        OrderSubmittedEvent.java
        ReserveCreditCommand.java
      outbox/
        OutboxMessage.java
        OutboxRepository.java
        RabbitOutboxRelay.java
      rabbit/
        RabbitConnectionManager.java
        ConfirmingRabbitPublisher.java
        RabbitTopologyVerifier.java
      consumer/
        IdempotentRabbitConsumer.java
        RetryClassifier.java
        BillingEventHandler.java
      stream/
        AuditStreamPublisher.java
        AuditStreamConsumer.java
      observability/
        MessagingMetrics.java
        MessagingTrace.java
    config/
      RabbitConfig.java
  src/test/java/
    messaging/
      OutboxRelayTest.java
      IdempotentConsumerTest.java
      RetryClassifierTest.java
      RabbitIntegrationTest.java
19.1 Library Boundary
For large organizations, extract a reusable internal library:
platform-rabbitmq-core
- envelope model
- header conventions
- publisher wrapper
- consumer wrapper
- retry classifier interfaces
- metrics conventions
- trace propagation
- topology verifier
platform-rabbitmq-test
- test fixtures
- fake deliveries
- testcontainers support
- assertion helpers
platform-rabbitmq-spring
- Spring Boot auto-configuration
- RabbitTemplate wrappers
- listener advice
- health indicators
Do not put domain logic in the shared library. The shared library only provides primitives and guardrails.
20. Production Readiness Checklist
20.1 Design Checklist
- Message type classified: command/event/stream/RPC.
- Queue/stream primitive selected with reason.
- Routing key documented.
- Owner team defined.
- Idempotency key defined.
- Retry policy defined.
- DLQ owner/runbook defined.
- Message contract versioned.
- PII classification done.
- Ordering requirement reviewed.
- Backpressure behavior defined.
- Observability plan defined.
20.2 Producer Checklist
- Transactional outbox used where DB state triggers message.
- Publisher confirms enabled.
- `mandatory=true` or alternate exchange used.
- Returned/unroutable messages handled.
- In-flight confirm bounded.
- Confirm timeout handled.
- Publish retry has backoff.
- Outbox age metric exposed.
- Message id stable across retries.
- Correlation/causation propagated.
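The "in-flight confirm bounded" item is the one most often skipped, so here is a minimal sketch of it. It assumes the caller obtains a publish sequence number before each publish and receives confirm callbacks carrying a sequence number and a `multiple` flag, which is how the RabbitMQ Java client reports confirms. The class and method names are hypothetical.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.Semaphore;

/** Tracks unconfirmed publishes and refuses new ones once the window is full. */
final class InFlightConfirms {
    private final Semaphore permits;
    private final ConcurrentNavigableMap<Long, String> pending = new ConcurrentSkipListMap<>();

    InFlightConfirms(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Call before publishing; false means the window is full and the caller should back off. */
    boolean tryRegister(long seqNo, String messageId) {
        if (!permits.tryAcquire()) {
            return false;
        }
        pending.put(seqNo, messageId);
        return true;
    }

    /** Call from the confirm callback; multiple=true settles every sequence number <= seqNo. */
    int settle(long seqNo, boolean multiple) {
        int released = 0;
        if (multiple) {
            for (Long key : pending.headMap(seqNo, true).keySet()) {
                if (pending.remove(key) != null) {
                    released++;
                }
            }
        } else if (pending.remove(seqNo) != null) {
            released = 1;
        }
        permits.release(released);
        return released;
    }

    int inFlight() {
        return pending.size();
    }

    public static void main(String[] args) {
        InFlightConfirms window = new InFlightConfirms(2);
        System.out.println(window.tryRegister(1, "m1"));
        System.out.println(window.tryRegister(2, "m2"));
        System.out.println(window.tryRegister(3, "m3")); // window is full here
        System.out.println(window.settle(2, true));
    }
}
```

Entries that stay in `pending` longer than the confirm timeout are the ones the publish retry should pick up, reusing the same message id.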
20.3 Consumer Checklist
- Manual ack used.
- Ack only after safe commit.
- Idempotency implemented.
- Duplicate tested.
- Retry classifier implemented.
- Poison message DLQ path tested.
- Prefetch tuned.
- Graceful shutdown tested.
- Handler latency measured.
- Redelivery metric exposed.
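Three of these items (ack only after safe commit, idempotency, retry classification) interact, so it helps to see them in one place. The sketch below is broker-free: `Outcome` tells the caller whether to ack, schedule a retry, or dead-letter, and an in-memory set stands in for an inbox table. All names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

final class IdempotentConsumerSketch {

    enum Outcome { ACK, RETRY, DEAD_LETTER }

    /** Hypothetical marker for failures that can never succeed on retry. */
    static final class NonRetryableException extends RuntimeException {
        NonRetryableException(String message) {
            super(message);
        }
    }

    private final Set<String> processedIds = new HashSet<>(); // stands in for an inbox table
    private final Consumer<String> handler;

    IdempotentConsumerSketch(Consumer<String> handler) {
        this.handler = handler;
    }

    /** Decides what to tell the broker; the caller acks only when ACK is returned. */
    Outcome onDelivery(String messageId, String payload) {
        if (processedIds.contains(messageId)) {
            return Outcome.ACK; // duplicate: the effect is already committed, just ack
        }
        try {
            handler.accept(payload);
            // In a real service this insert shares the handler's database transaction.
            processedIds.add(messageId);
            return Outcome.ACK;
        } catch (NonRetryableException e) {
            return Outcome.DEAD_LETTER;
        } catch (RuntimeException e) {
            return Outcome.RETRY;
        }
    }

    public static void main(String[] args) {
        IdempotentConsumerSketch consumer =
                new IdempotentConsumerSketch(p -> System.out.println("handled " + p));
        System.out.println(consumer.onDelivery("m1", "order-1"));
        System.out.println(consumer.onDelivery("m1", "order-1")); // redelivery, handler skipped
    }
}
```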
20.4 Broker Checklist
- Queue type selected intentionally.
- Durable topology.
- DLX configured by policy where possible.
- TTL/max length reviewed.
- Quorum/stream replication reviewed.
- Disk capacity planned.
- Memory/disk alarms monitored.
- Prometheus/management enabled securely.
- Backup/restore tested.
- Upgrade tested.
20.5 Stream Checklist
- Retention configured.
- Offset tracking strategy defined.
- Replay safety reviewed.
- Consumer lag monitored.
- Hot partition monitored.
- Super stream partition key defined.
- Partition count decision documented.
- Catch-up strategy tested.
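Hot partition monitoring needs a number to alert on. One simple candidate is the ratio between the busiest partition and the mean load. The sketch below computes it for a list of partition keys. The `hashCode`-based routing is an assumption made for illustration; a real super stream client may use a different hash function, so the partition numbers will differ while the skew calculation stays the same.

```java
import java.util.List;

final class PartitionSkew {

    /** Assumption: hashCode-based routing, used here only to illustrate key-to-partition mapping. */
    static int partitionFor(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Counts how many messages each partition would receive for the given keys. */
    static long[] countPerPartition(List<String> keys, int partitions) {
        long[] counts = new long[partitions];
        for (String key : keys) {
            counts[partitionFor(key, partitions)]++;
        }
        return counts;
    }

    /** Busiest partition divided by mean load; 1.0 means perfectly even. */
    static double skew(long[] counts) {
        long max = 0;
        long total = 0;
        for (long c : counts) {
            max = Math.max(max, c);
            total += c;
        }
        return total == 0 ? 1.0 : max / ((double) total / counts.length);
    }

    public static void main(String[] args) {
        List<String> keys = List.of("tenant-a", "tenant-a", "tenant-a", "tenant-b");
        System.out.println("skew = " + skew(countPerPartition(keys, 4)));
    }
}
```

A single tenant that produces most of the traffic pushes the ratio toward the partition count, which is the signal to revisit the partition key rather than add partitions.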
20.6 Security Checklist
- TLS enabled where required.
- Credentials stored in secret manager.
- Per-service users.
- Least privilege permissions.
- Vhost isolation.
- Management UI access restricted.
- Replay tool access controlled.
- PII encrypted or excluded according to policy.
- Audit logs retained.
21. Final Skill Rubric
Use this rubric to assess whether you have reached a high level of fluency.
21.1 Level 1 — API User
You can:
- publish messages;
- consume messages;
- create basic queues/exchanges;
- use `RabbitTemplate` or the Java client.
Not yet enough for production ownership.
21.2 Level 2 — Reliable App Developer
You can:
- use manual ack;
- use publisher confirms;
- understand prefetch;
- build retry/DLQ;
- write an idempotent consumer;
- avoid auto-ack for durable workloads.
Enough to build features with senior review.
21.3 Level 3 — Production Engineer
You can:
- design a multi-service topology;
- build outbox/inbox;
- perform failure modelling;
- build observability and runbooks;
- run load tests;
- handle DLQ/replay;
- choose between queue and stream;
- operate a RabbitMQ cluster safely.
Enough to own a production messaging service.
21.4 Level 4 — Platform/Architecture Engineer
You can:
- build a messaging platform standard for many teams;
- build topology-as-code governance;
- design multi-tenant isolation;
- build an organization-wide capacity model;
- build a reusable Java messaging library;
- define messaging SLOs/SLAs;
- lead incident reviews;
- make defensible architecture decisions.
This level approaches top-tier engineering practice.
21.5 Level 5 — Principal-Level Fluency
You can:
- explain the RabbitMQ vs Kafka/Pulsar/SQS/NATS trade-offs without fanboy bias;
- distinguish a correctness problem from a tooling problem;
- reject a wrong broker design even when it looks “simple”;
- connect messaging to domain state machines, compliance, audit, and operational risk;
- produce designs that humans can still operate;
- formalize invariants, failure matrix, runbook, and feedback loop.
The goal of this series is to bring you to at least a solid Level 3 and to give you a structure for pursuing Levels 4-5.
22. Final Practice Plan
Use this capstone version of 20-hour deliberate practice:
Hour 1-3 — Topology Design
- Build the exchange/queue/stream topology for the order domain.
- Write the routing keys.
- Write down the queue owner and DLQ owner.
- Draw a Mermaid topology diagram.
Hour 4-6 — Producer Reliability
- Implement outbox table.
- Implement outbox relay.
- Enable publisher confirms.
- Simulate a broker outage.
- Prove that no outbox message is lost.
Hour 7-9 — Consumer Correctness
- Implement manual ack.
- Implement inbox/dedup.
- Crash the consumer after the DB commit but before the ack.
- Prove that duplicates are safe.
Hour 10-12 — Retry and DLQ
- Implement delayed retry.
- Inject retryable exception.
- Inject non-retryable exception.
- Replay the DLQ in a controlled manner.
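For the delayed retry exercise, it helps to pin down the retry budget before touching any topology. The sketch below computes an exponential delay with a cap and reports when the budget is spent, which is the point where the message should move to the DLQ. The class name and parameters are hypothetical; how the delay is realized (TTL retry queues, a delay plugin, or a scheduler) is a separate decision.

```java
import java.time.Duration;
import java.util.Optional;

/** Exponential backoff with a cap and a maximum number of attempts. */
final class RetryBudget {
    private final int maxAttempts;
    private final Duration baseDelay;
    private final Duration maxDelay;

    RetryBudget(int maxAttempts, Duration baseDelay, Duration maxDelay) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.baseDelay = baseDelay;
        this.maxDelay = maxDelay;
    }

    /**
     * Delay before the next attempt, or empty when the budget is spent and the message
     * should be dead-lettered. failedAttempts counts attempts that have already failed.
     */
    Optional<Duration> nextDelay(int failedAttempts) {
        if (failedAttempts < 1) {
            throw new IllegalArgumentException("failedAttempts must be >= 1");
        }
        if (failedAttempts >= maxAttempts) {
            return Optional.empty();
        }
        long factor = 1L << Math.min(failedAttempts - 1, 20); // 1, 2, 4, 8, ...
        Duration delay = baseDelay.multipliedBy(factor);
        return Optional.of(delay.compareTo(maxDelay) > 0 ? maxDelay : delay);
    }

    public static void main(String[] args) {
        RetryBudget budget = new RetryBudget(5, Duration.ofSeconds(1), Duration.ofSeconds(5));
        for (int failed = 1; failed <= 5; failed++) {
            System.out.println(failed + " failed -> " + budget.nextDelay(failed));
        }
    }
}
```

Non-retryable exceptions bypass this budget entirely and go straight to the DLQ; only failures classified as retryable consume attempts.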
Hour 13-15 — Streams
- Publish audit events to a stream.
- Build a consumer with an external checkpoint.
- Replay from an old offset.
- Simulate retention risk.
Hour 16-18 — Observability and Load
- Add producer/consumer metrics.
- Build a minimal dashboard.
- Run a benchmark.
- Determine safe throughput and alert thresholds.
Hour 19-20 — Failure Review
- Fill in the failure matrix.
- Write the ADRs.
- Write the runbook.
- Present the design review as you would an internal architecture review.
23. Common Capstone Mistakes
- Using RabbitMQ as a database.
- Assuming a publisher confirm means the consumer has processed the message.
- Assuming an ack means the business transaction is guaranteed to be correct.
- Using auto-ack for important workloads.
- Building infinite retry with immediate requeue.
- Ignoring returned/unroutable messages.
- Putting many subscribers on the same queue when they need isolation.
- Using an event for a command that needs a clear owner.
- Using RPC for long-running workflows.
- Assuming global FIFO exists.
- Ignoring duplicate delivery.
- Treating the DLQ as the solution when it is actually an alarm.
- Having no replay policy.
- Having no retention risk alert for streams.
- Not measuring confirm latency.
- Not measuring oldest message age.
- Having no owner for the topology.
- Having no schema governance.
- Not running failure tests.
- Being unable to answer “what happens if it crashes at this point?”
24. Series Closure
The series ends with Part 035.
You now have a complete blueprint for:
- understanding RabbitMQ as a broker, queue system, and stream platform;
- using the Java client and Spring AMQP without relying on magic abstractions;
- building reliable producers and consumers;
- designing command/event/RPC/routing topology;
- building retry/DLQ/parking lot;
- handling idempotency, ordering, deduplication, and causality;
- using RabbitMQ Streams and Super Streams;
- designing pipelines, windowing, batching, fanout, aggregators, and sagas;
- doing performance modelling, benchmarking, and JVM tuning;
- deploying a RabbitMQ cluster/Kubernetes Operator;
- building observability, security, governance, compliance, and runbooks;
- bringing it all together into a production-grade platform blueprint.
This final part is not the end of learning. It is the starting point for real practice: pick one service, audit its topology, write the failure matrix, then prove it with chaos/load tests.
If a RabbitMQ design cannot explain message lifecycle, ack timing, duplicate handling, retry budget, DLQ ownership, stream retention, and operational runbook, that design is not yet production-grade.
25. Official References
Use the following official references when implementing the blueprint:
- RabbitMQ AMQP 0-9-1 Concepts: exchange, queue, binding, routing.
- RabbitMQ Java Client API Guide.
- RabbitMQ Consumer Acknowledgements and Publisher Confirms.
- RabbitMQ Reliability Guide.
- RabbitMQ Dead Letter Exchanges.
- RabbitMQ Quorum Queues.
- RabbitMQ Streams and Super Streams documentation.
- RabbitMQ Stream Java Client documentation.
- RabbitMQ Kubernetes Cluster Operator documentation.
- RabbitMQ Monitoring and Prometheus documentation.
- RabbitMQ Production Checklist.