Final StretchOrdered learning track

Capstone Production-Grade Blueprint

Learn Java RabbitMQ, RabbitMQ Streams, Patterns, and Deployment In Action - Part 035

Capstone blueprint untuk membangun platform messaging Java + RabbitMQ production-grade: topology, outbox, retry, DLQ, streams, observability, deployment, load test, failure test, governance, dan final skill rubric.

24 min read4780 words
Prev
Finish
Lesson 3535 lesson track3035 Final Stretch
#java#rabbitmq#amqp#rabbitmq-streams+7 more

Part 035 — Capstone: Production-Grade Java RabbitMQ Platform Blueprint

Pada bagian terakhir ini kita menyatukan semua konsep menjadi satu blueprint yang bisa dipakai sebagai internal engineering handbook untuk membangun platform messaging Java + RabbitMQ yang production-grade.

Targetnya bukan membuat demo send() dan receive(). Targetnya adalah membangun sistem yang tetap benar ketika:

  • broker restart;
  • network putus di tengah publish;
  • producer tidak tahu apakah message sudah sampai;
  • consumer crash setelah commit database tetapi sebelum ack;
  • DLQ tiba-tiba melonjak;
  • queue tumbuh lebih cepat daripada worker;
  • stream retention hampir menghapus event yang belum diproses;
  • duplicate delivery terjadi ribuan kali;
  • satu tenant membuat hot partition;
  • observability harus menjawab “message ini hilang, lambat, atau sedang retry?”;
  • tim lain menambah subscriber baru tanpa merusak topology lama;
  • auditor/regulator meminta jejak keputusan sistem.

Ini adalah bagian capstone. Karena itu, kita akan menutup seri dengan reference architecture, topology, Java implementation blueprint, deployment checklist, runbook, load test, failure test, ADR, dan rubric akhir.


1. Capstone Goal

Kita akan mendesain platform bernama Regulated Order Messaging Platform.

Domain ini sengaja dipilih karena cukup kompleks untuk memaksa kita memakai RabbitMQ dengan benar:

  • ada command yang harus diproses worker;
  • ada domain event yang harus diterima banyak subscriber;
  • ada stream yang perlu replay untuk audit dan analytics;
  • ada retry dan DLQ;
  • ada idempotency;
  • ada topology governance;
  • ada traceability lintas service;
  • ada operational runbook.

Kita tidak akan membangun full application. Kita membangun blueprint arsitektur dan implementasi inti yang bisa diturunkan ke Java microservices nyata.

1.1 What Good Looks Like

Blueprint ini dianggap benar jika memenuhi invariant berikut:

  1. Tidak ada silent message loss. Producer memakai publisher confirms, unroutable message ditangani, dan topology dideklarasikan secara deterministik.
  2. Consumer aman terhadap duplicate. Semua handler yang melakukan side effect punya idempotency boundary.
  3. Ack hanya setelah state aman. Consumer tidak ack sebelum commit state atau sebelum side effect dianggap safe.
  4. Retry tidak infinite. Retry punya budget, delay, classification, DLQ, parking lot, dan owner.
  5. Queue bukan tempat menyimpan backlog tanpa batas. Ada capacity model, alert, dan load shedding strategy.
  6. Stream dipakai untuk replay dan fan-out historis. Queue dipakai untuk work dispatch; stream dipakai untuk log/replay/high fan-out.
  7. Topology adalah contract. Exchange, queue, binding, policy, DLX, stream, permission, dan retention dikelola sebagai code.
  8. Observability menjawab pertanyaan operasional. Bukan hanya CPU/memory, tetapi confirm latency, queue growth, redelivery rate, DLQ rate, consumer lag, stream offset lag, dan business correlation.
  9. Deployment sadar failure. Quorum queue, stream replication, anti-affinity, persistent volume, rolling upgrade, backup/restore, dan DR masuk ke design.
  10. Setiap trade-off terdokumentasi. Keputusan penting punya ADR.

2. Kaufman Closure: From Practice to Operational Fluency

Josh Kaufman menekankan bahwa skill bisa dipercepat dengan memecah skill menjadi sub-skill, belajar cukup untuk self-correct, menghilangkan barrier latihan, dan melakukan deliberate practice.

Di seri ini, decomposition-nya sudah kita jalani:

CapabilitySeri PartOutput Praktis
Broker mental model002-003Bisa menjelaskan exchange, queue, binding, routing, log, stream
Java client correctness004-007Bisa membuat producer/consumer lifecycle yang aman
Communication pattern008-011Bisa memilih command/event/RPC/routing topology
Reliability012-018Bisa membangun at-least-once + idempotency + failure modelling
Streams019-023Bisa memakai RabbitMQ Streams, offsets, replay, super streams, batching
Pipeline/data pattern024-027Bisa mendesain pipeline, windowing, batching, saga
Performance028-030Bisa capacity planning, benchmark, dan JVM tuning
Operations031-034Bisa deploy, observe, secure, govern
Capstone035Bisa menyatukan semuanya menjadi platform blueprint

Capstone ini adalah latihan self-correction terbesar. Jangan hanya membaca. Gunakan bagian ini untuk mengaudit desain RabbitMQ yang pernah kamu buat.

Pertanyaan self-correction paling penting:

Jika service crash pada titik terburuk, apakah message akan hilang, terduplikasi, atau tertahan? Apakah sistem dan tim operasional tahu jawabannya?


3. Reference Architecture Overview

Kita akan memakai empat jalur messaging utama:

  1. Command Queue Path untuk perintah kerja yang harus diproses satu atau beberapa worker.
  2. Domain Event Path untuk notifikasi event ke banyak subscriber.
  3. Retry/DLQ/Parking Lot Path untuk failure recovery.
  4. Stream Path untuk audit, replay, analytics, dan high fan-out historical consumers.

3.1 Architectural Principle

Gunakan RabbitMQ untuk coordination and delivery boundary, bukan untuk menyembunyikan domain inconsistency.

Artinya:

  • RabbitMQ mengirim signal dan data antar boundary.
  • Database service tetap source of truth untuk state milik service itu.
  • Message handler harus bisa dipanggil ulang.
  • Broker tidak menggantikan transaction, constraint, idempotency, atau state machine.
  • Stream tidak otomatis membuat event sourcing benar; stream hanya menyediakan log yang bisa dibaca ulang.

4. Workload Classification

Sebelum membuat exchange/queue/stream, klasifikasikan message.

WorkloadExamplePrimitive UtamaKenapa
CommandReserveCreditCommandQuorum queueButuh single logical processing, retry, DLQ, work dispatch
Domain eventOrderSubmittedEventTopic exchange + subscriber queueBanyak subscriber, queue ownership per subscriber
Audit eventOrderStateChangedStreamButuh retention, replay, forensic query
Analytics feedOrderMetricsEventSuper streamHigh throughput, partitioned by tenant/account/order
Request/reply internalCheckRiskScoreDirect Reply-To or short-lived reply queueHanya untuk low-latency bounded RPC
Long workflowOrderSagaStepCompletedEvent + state machineJangan chain synchronous broker RPC
Delayed retryfailed command/eventRetry exchange + delay queuesControlled retry budget
Poison recordnon-retryable failureDLQ + parking lotHuman/operational ownership

4.1 Decision Rule

Gunakan rule sederhana ini:


5. Topology Blueprint

Topology harus diperlakukan sebagai public contract. Jangan biarkan setiap service membuat queue/exchange secara ad hoc tanpa ownership.

5.1 Naming Convention

Gunakan format:

<domain>.<intent>.<kind>[.<qualifier>]

Contoh:

ResourceNameOwner
Command exchangeorders.command.xOrder platform team
Event exchangeorders.event.xOrder platform team
Retry exchangeorders.retry.xPlatform messaging team
DLXorders.dlxPlatform messaging team
Command queuerisk.reserve-credit.command.qRisk team
Subscriber queuebilling.orders.event.qBilling team
DLQorders.dead-letter.qPlatform messaging team
Parking lotorders.parking-lot.qOperations + domain owner
Audit streamorders.audit.streamAudit/data platform team
Super streamorders.analytics.super-streamAnalytics platform team

5.2 Exchange Blueprint

orders.command.x
  type: topic
  purpose: route command messages by command type and bounded context

orders.event.x
  type: topic
  purpose: publish domain events to queue-per-subscriber

orders.retry.x
  type: direct/topic
  purpose: delayed retry routing back to original work queue

orders.dlx
  type: topic
  purpose: dead-letter non-retryable or exhausted messages

orders.unroutable.x
  type: fanout/direct
  purpose: collect mandatory-return equivalent topology errors when alternate exchange is used

5.3 Routing Key Taxonomy

Command routing keys:

cmd.orders.reserve-credit.v1
cmd.orders.release-credit.v1
cmd.orders.allocate-inventory.v1
cmd.orders.cancel-order.v1

Event routing keys:

evt.orders.order-submitted.v1
evt.orders.order-approved.v1
evt.orders.order-rejected.v1
evt.orders.order-cancelled.v1
evt.orders.order-state-changed.v1

Analytics stream routing keys:

analytics.orders.<tenantId>
analytics.orders.<region>.<tenantId>

5.4 Binding Blueprint

5.5 Queue Type Rule

Use CaseQueue Type
Production durable command queueQuorum queue
Durable subscriber queueQuorum queue unless workload explicitly needs classic-specific behavior
Temporary callbackExclusive auto-delete classic queue or Direct Reply-To
High-throughput replayStream, not queue
Partitioned replaySuper stream

Quorum queue bukan silver bullet. Ia memberi replication/durability model yang lebih cocok untuk production data safety, tetapi tetap membutuhkan publisher confirm, consumer ack, idempotency, dan capacity control.


6. Message Contract Blueprint

Semua message harus punya envelope yang konsisten.

6.1 Envelope

{
  "messageId": "01JZ1...",
  "messageType": "OrderSubmitted",
  "messageVersion": 1,
  "occurredAt": "2026-07-02T10:15:30Z",
  "producedAt": "2026-07-02T10:15:31Z",
  "producer": "order-service",
  "tenantId": "tenant-123",
  "correlationId": "corr-abc",
  "causationId": "cmd-xyz",
  "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
  "schema": "orders.order-submitted.v1",
  "payload": {
    "orderId": "ord-1001",
    "customerId": "cust-771",
    "totalAmount": "125000.00",
    "currency": "IDR"
  }
}

6.2 AMQP Properties Mapping

Envelope FieldAMQP Property/Header
messageIdmessageId
correlationIdcorrelationId
messageTypetype
producedAttimestamp
tenantIdheader tenant-id
schemaheader schema
traceIdheader traceparent or tracing propagation header
causationIdheader causation-id

6.3 Contract Invariant

  • messageId immutable.
  • messageType immutable.
  • messageVersion monotonic untuk schema family.
  • correlationId dibawa lintas request/workflow.
  • causationId menunjuk message yang menyebabkan message baru.
  • Payload tidak boleh bergantung pada Java class name.
  • Consumer tidak boleh require field baru dari event lama tanpa compatibility plan.
  • PII classification harus jelas.
  • Payload harus punya business key yang cukup untuk idempotency.

7. Outbox Blueprint

Producer yang publish event/command sebagai akibat perubahan database harus memakai outbox.

Masalah yang diselesaikan:

  • DB commit berhasil, publish gagal.
  • Publish berhasil, DB rollback.
  • Service crash setelah DB commit sebelum publish.
  • Network timeout membuat publish ambiguity.

7.1 Outbox Table

CREATE TABLE message_outbox (
    id                UUID PRIMARY KEY,
    aggregate_type    VARCHAR(100) NOT NULL,
    aggregate_id      VARCHAR(100) NOT NULL,
    message_type      VARCHAR(150) NOT NULL,
    routing_key       VARCHAR(200) NOT NULL,
    exchange_name     VARCHAR(200) NOT NULL,
    payload_json      TEXT NOT NULL,
    headers_json      TEXT NOT NULL,
    status            VARCHAR(30) NOT NULL,
    attempt_count     INTEGER NOT NULL DEFAULT 0,
    next_attempt_at   TIMESTAMP NOT NULL,
    created_at        TIMESTAMP NOT NULL,
    published_at      TIMESTAMP NULL,
    last_error        TEXT NULL
);

CREATE INDEX idx_outbox_pending
ON message_outbox (status, next_attempt_at, created_at);

7.2 Outbox State Machine

7.3 Java Outbox Relay Skeleton

public final class RabbitOutboxRelay implements AutoCloseable {
    private final OutboxRepository outboxRepository;
    private final ConfirmingRabbitPublisher publisher;
    private final Clock clock;

    public void runBatch(int batchSize) {
        List<OutboxMessage> batch = outboxRepository.claimPending(batchSize, clock.instant());

        for (OutboxMessage msg : batch) {
            try {
                PublishResult result = publisher.publishWithConfirm(
                    msg.exchangeName(),
                    msg.routingKey(),
                    msg.payload(),
                    msg.properties(),
                    Duration.ofSeconds(5)
                );

                if (result.confirmed()) {
                    outboxRepository.markPublished(msg.id(), clock.instant());
                } else if (result.unroutable()) {
                    outboxRepository.markFailed(msg.id(), "UNROUTABLE");
                } else {
                    outboxRepository.releaseForRetry(msg.id(), nextBackoff(msg.attemptCount()));
                }
            } catch (Exception e) {
                outboxRepository.releaseForRetry(msg.id(), nextBackoff(msg.attemptCount()));
            }
        }
    }

    private Instant nextBackoff(int attemptCount) {
        long seconds = Math.min(300, (long) Math.pow(2, Math.min(attemptCount, 8)));
        return clock.instant().plusSeconds(seconds);
    }

    @Override
    public void close() {
        publisher.close();
    }
}

7.4 Outbox Relay Invariants

  • Relay hanya menandai PUBLISHED setelah publisher confirm.
  • Relay tidak menghapus row terlalu cepat; audit retention tetap ada.
  • messageId stabil antar retry.
  • Publish retry boleh menghasilkan duplicate, maka consumer wajib idempotent.
  • mandatory=true atau alternate exchange harus dipakai untuk mendeteksi routing error.
  • Backoff dipakai agar broker tidak diserang ulang saat outage.

8. Producer Blueprint

Producer production-grade punya beberapa lapisan:

8.1 Confirming Publisher Skeleton

public final class ConfirmingRabbitPublisher implements AutoCloseable {
    private final Channel channel;

    public ConfirmingRabbitPublisher(Channel channel) throws IOException {
        this.channel = channel;
        this.channel.confirmSelect();
    }

    public PublishResult publishWithConfirm(
        String exchange,
        String routingKey,
        byte[] body,
        AMQP.BasicProperties properties,
        Duration timeout
    ) throws IOException, InterruptedException {

        channel.basicPublish(exchange, routingKey, true, properties, body);

        boolean confirmed = channel.waitForConfirms(timeout.toMillis());
        if (!confirmed) {
            return PublishResult.notConfirmed();
        }

        return PublishResult.confirmed();
    }

    @Override
    public void close() throws Exception {
        channel.close();
    }
}

Di sistem high-throughput, gunakan async confirms dan bounded in-flight map, bukan synchronous confirm per message.

8.2 Producer Metrics

Wajib expose:

  • publish attempt count;
  • publish confirmed count;
  • publish nacked count;
  • publish returned/unroutable count;
  • publish timeout count;
  • confirm latency p50/p95/p99;
  • in-flight confirm count;
  • outbox pending count;
  • outbox age max;
  • relay batch duration;
  • retry attempt count.

8.3 Producer Health Check

Producer dianggap tidak sehat jika:

  • connection down;
  • channel recovery loop;
  • confirm latency melewati threshold;
  • outbox oldest pending age melewati SLA;
  • returned message > 0 untuk topology production;
  • broker blocked connection berlangsung terlalu lama;
  • relay tidak bisa claim/mark row.

9. Consumer Blueprint

Consumer harus menggabungkan manual ack, idempotency, business transaction, retry classification, dan observability.

9.1 Consumer State Machine

9.2 Inbox/Dedup Table

CREATE TABLE processed_message (
    consumer_name     VARCHAR(150) NOT NULL,
    message_id        VARCHAR(150) NOT NULL,
    aggregate_id      VARCHAR(150) NULL,
    processed_at      TIMESTAMP NOT NULL,
    result_hash       VARCHAR(128) NULL,
    PRIMARY KEY (consumer_name, message_id)
);

9.3 Consumer Handler Skeleton

public final class IdempotentRabbitConsumer {
    private final InboxRepository inbox;
    private final DomainHandler handler;
    private final RetryClassifier retryClassifier;

    public void handle(Channel channel, Delivery delivery) throws IOException {
        long tag = delivery.getEnvelope().getDeliveryTag();
        MessageEnvelope envelope = decode(delivery);

        try {
            boolean firstTime = inbox.tryStart("billing-orders-consumer", envelope.messageId());
            if (!firstTime) {
                channel.basicAck(tag, false);
                return;
            }

            handler.handleInTransaction(envelope);
            inbox.markProcessed("billing-orders-consumer", envelope.messageId());
            channel.basicAck(tag, false);

        } catch (Exception e) {
            FailureDecision decision = retryClassifier.classify(e, delivery, envelope);

            switch (decision.action()) {
                case RETRY_REQUEUE -> channel.basicNack(tag, false, true);
                case DEAD_LETTER -> channel.basicNack(tag, false, false);
                case ACK_AFTER_COMPENSATION -> channel.basicAck(tag, false);
            }
        }
    }
}

9.4 Consumer Invariants

  • Auto ack dilarang untuk workload durable.
  • Ack hanya setelah state aman.
  • Duplicate harus safe.
  • Retry harus terbatas.
  • Handler tidak boleh sleep panjang di consumer thread tanpa bounded executor strategy.
  • Prefetch harus mencerminkan work budget.
  • Consumer harus graceful shutdown: stop accepting delivery, finish in-flight, ack/nack safely, close channel.

10. Retry, DLQ, and Parking Lot Blueprint

10.1 Retry Topology

10.2 Retry Header Contract

Gunakan header eksplisit:

x-original-exchange
x-original-routing-key
x-retry-count
x-first-failure-at
x-last-failure-at
x-last-error-code
x-last-error-message
x-failure-class
x-parking-reason

Jangan hanya bergantung pada x-death untuk domain logic, karena format dan accumulation-nya lebih cocok untuk operational metadata daripada business decision contract.

10.3 Failure Classification

FailureRetry?Action
DB deadlockYesdelayed retry
HTTP 503 downstreamYesdelayed retry + circuit breaker
validation errorNoDLQ
unknown schema versionNoDLQ + contract alert
missing required entity but eventually consistentYesdelayed retry with max age
authorization/permission errorUsually noDLQ/security alert
poison message causing repeated exceptionNo after budgetparking lot

10.4 DLQ Ownership

DLQ tanpa owner adalah kuburan message.

Setiap DLQ harus punya:

  • owner team;
  • severity policy;
  • max acceptable age;
  • replay procedure;
  • discard procedure;
  • PII handling;
  • audit trail;
  • dashboard;
  • alert threshold.

11. Stream and Super Stream Blueprint

Gunakan stream untuk audit dan replay, bukan sebagai pengganti semua queue.

11.1 Audit Stream

orders.audit.stream
  type: stream
  retention: 30-180 days depending compliance
  routing source: outbox relay / event publisher
  ordering key: aggregateId/orderId
  consumer examples:
    - audit projector
    - compliance export
    - forensic replay tool
    - materialized view rebuild

11.2 Analytics Super Stream

orders.analytics.super-stream
  partitions: 12 or 24 initially
  routing key: tenantId or accountId
  use case: high-throughput analytics processing
  consumers:
    - metrics aggregator
    - anomaly detection
    - BI exporter

11.3 Stream Replay Safety

A replay-safe consumer:

  • tidak mengirim email ulang tanpa side-effect ledger;
  • tidak melakukan double charge;
  • tidak menulis projection tanpa idempotent upsert;
  • bisa memulai dari offset lama;
  • punya checkpoint yang konsisten dengan state output;
  • dapat membedakan normal live processing dan replay mode;
  • punya throttle agar replay tidak menghancurkan downstream.

11.4 Stream Offset Decision

Consumer TypeOffset Store
Stateless analyticsserver-side offset may be enough
Projection with DB outputexternal DB checkpoint in same transaction as projection
Audit exportexternal checkpoint + export ledger
Experiment/debug consumermanual offset or named temporary consumer
Multi-threaded partition processingper-partition checkpoint with gap tracking

12. Deployment Blueprint

12.1 Cluster Model

Production RabbitMQ cluster minimal harus mendukung:

  • persistent volumes;
  • anti-affinity antar node;
  • resource request/limit yang realistis;
  • monitoring;
  • TLS/secrets;
  • backup/restore plan;
  • upgrade plan;
  • queue/stream leader awareness;
  • capacity margin.

12.2 Kubernetes High-Level Model

12.3 RabbitMQCluster Example

apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: orders-rabbitmq
  namespace: rabbitmq
spec:
  replicas: 3
  persistence:
    storageClassName: fast-ssd
    storage: 200Gi
  resources:
    requests:
      cpu: "2"
      memory: 8Gi
    limits:
      cpu: "4"
      memory: 16Gi
  rabbitmq:
    additionalPlugins:
      - rabbitmq_prometheus
      - rabbitmq_stream
      - rabbitmq_management

Sesuaikan field manifest dengan versi operator yang dipakai. Jangan copy-paste YAML production tanpa validasi terhadap dokumentasi versi operator.

12.4 Topology-as-Code Example

apiVersion: rabbitmq.com/v1beta1
kind: Exchange
metadata:
  name: orders-event-exchange
spec:
  name: orders.event.x
  type: topic
  durable: true
  rabbitmqClusterReference:
    name: orders-rabbitmq
---
apiVersion: rabbitmq.com/v1beta1
kind: Queue
metadata:
  name: billing-orders-event-queue
spec:
  name: billing.orders.event.q
  durable: true
  type: quorum
  rabbitmqClusterReference:
    name: orders-rabbitmq
---
apiVersion: rabbitmq.com/v1beta1
kind: Binding
metadata:
  name: billing-orders-event-binding
spec:
  source: orders.event.x
  destination: billing.orders.event.q
  destinationType: queue
  routingKey: evt.orders.#
  rabbitmqClusterReference:
    name: orders-rabbitmq

12.5 Deployment Invariants

  • Persistent workload tidak boleh berjalan tanpa durable storage.
  • Broker node harus tersebar di failure domains berbeda jika platform memungkinkan.
  • Upgrade diuji di staging dengan synthetic workload.
  • Backup/restore diuji, bukan hanya dikonfigurasi.
  • Topology changes lewat PR/review.
  • Credential rotation punya runbook.
  • Apps punya readiness/liveness yang sadar connection/channel state.

13. Observability Blueprint

13.1 Dashboard Layout

Buat dashboard berlapis:

  1. Cluster Overview

    • node up/down;
    • memory/disk alarms;
    • connection count;
    • channel count;
    • file descriptors;
    • network throughput;
    • disk I/O.
  2. Queue Health

    • ready messages;
    • unacked messages;
    • ingress/egress rate;
    • redelivery rate;
    • consumer count;
    • oldest message age;
    • dead-letter rate.
  3. Stream Health

    • stream size;
    • retention risk;
    • append rate;
    • read rate;
    • consumer offset lag;
    • segment/chunk behavior;
    • partition hot spot.
  4. Producer Health

    • publish rate;
    • confirm latency;
    • returned messages;
    • outbox pending;
    • outbox oldest age;
    • in-flight confirms;
    • publish errors.
  5. Consumer Health

    • processing latency;
    • ack/nack rate;
    • retry count;
    • DLQ count;
    • handler exception rate;
    • duplicate detected;
    • inbox insert conflicts;
    • downstream latency.
  6. Business Flow

    • order submitted count;
    • reserve credit success/failure;
    • fulfillment pending age;
    • saga timeout;
    • state transition lag;
    • SLA breach count.

13.2 Log Contract

Setiap producer log:

{
  "event": "rabbitmq.publish.confirmed",
  "messageId": "msg-123",
  "exchange": "orders.event.x",
  "routingKey": "evt.orders.order-submitted.v1",
  "correlationId": "corr-123",
  "confirmLatencyMs": 18,
  "outboxId": "outbox-123"
}

Setiap consumer log:

{
  "event": "rabbitmq.consume.processed",
  "messageId": "msg-123",
  "queue": "billing.orders.event.q",
  "consumer": "billing-orders-consumer",
  "correlationId": "corr-123",
  "attempt": 2,
  "redelivered": true,
  "processingLatencyMs": 74,
  "decision": "ACK"
}

13.3 Trace Contract

Trace harus menghubungkan:

  • HTTP/API request;
  • DB transaction;
  • outbox row;
  • RabbitMQ publish span;
  • broker route metadata;
  • consumer receive span;
  • handler DB transaction;
  • downstream calls;
  • emitted follow-up message.

Kalau trace berhenti di producer dan tidak muncul di consumer, propagation belum benar.


14. Load Test Blueprint

14.1 Load Test Questions

Jangan mulai dari “berapa msg/sec?”. Mulai dari pertanyaan:

  1. Berapa throughput sustained untuk workload normal?
  2. Berapa p95/p99 end-to-end latency pada load normal?
  3. Berapa backlog drain rate setelah outage 30 menit?
  4. Berapa confirm latency saat quorum leader pindah?
  5. Berapa DLQ rate saat 1% message poison?
  6. Berapa max safe prefetch sebelum memory worker naik tidak terkendali?
  7. Berapa stream lag saat analytics consumer down 2 jam?
  8. Berapa lama recovery setelah broker node restart?
  9. Apakah outbox age tetap dalam SLA?
  10. Apakah business SLA tetap terpenuhi saat retry storm?

14.2 Test Matrix

DimensionValues
message size1KB, 10KB, 100KB
queue typequorum queue, stream
producer count1, 5, 20
consumer count1, 5, 20
prefetch1, 10, 50, 200
confirm modesync, batch, async
durabilitypersistent, transient for controlled comparison only
failurenone, broker restart, consumer crash, network cut
retrynone, 1%, 10% retryable failure
poison0%, 0.1%, 1% non-retryable

14.3 PerfTest Examples

# Baseline quorum queue throughput
rabbitmq-perf-test \
  --uri amqp://user:pass@rabbitmq:5672/%2f \
  --queue perf.orders.command.q \
  --queue-args x-queue-type=quorum \
  --producers 4 \
  --consumers 8 \
  --pmessages 1000000 \
  --size 1024 \
  --confirm 100 \
  --autoack false
# Prefetch sensitivity
rabbitmq-perf-test \
  --uri amqp://user:pass@rabbitmq:5672/%2f \
  --queue perf.prefetch.q \
  --queue-args x-queue-type=quorum \
  --producers 2 \
  --consumers 10 \
  --qos 50 \
  --size 4096 \
  --confirm 100

14.4 Stream PerfTest Examples

# Stream append/read throughput
stream-perf-test \
  --uris rabbitmq-stream://user:pass@rabbitmq:5552 \
  --stream perf.orders.audit.stream \
  --producers 4 \
  --consumers 4 \
  --size 1024 \
  --rate 50000
# Super stream partition test
stream-perf-test \
  --uris rabbitmq-stream://user:pass@rabbitmq:5552 \
  --stream perf.orders.analytics.super-stream \
  --super-stream \
  --producers 8 \
  --consumers 8 \
  --size 1024

Sesuaikan command dengan versi tool. Tujuan contoh ini adalah memberi bentuk eksperimen, bukan menggantikan dokumentasi CLI.

14.5 Benchmark Report Template

# RabbitMQ Benchmark Report

## Workload
- Message type:
- Payload size:
- Producer count:
- Consumer count:
- Queue/stream type:
- Durability:
- Confirm mode:
- Prefetch:

## Environment
- RabbitMQ version:
- Node count:
- CPU/memory:
- Storage:
- Network:
- JVM version:
- Client library version:

## Result
- Publish rate:
- Consume rate:
- End-to-end latency p50/p95/p99:
- Confirm latency p50/p95/p99:
- Queue depth max:
- Redelivery rate:
- Error rate:
- CPU/memory/disk/network:

## Interpretation
- Bottleneck:
- Safe capacity:
- Saturation signal:
- Recommended limit:
- Follow-up experiment:

15. Failure Test Blueprint

15.1 Failure Matrix

FailureExpected BehaviorProof
Producer crash before DB commitno outbox row, no publishDB and broker show no message
Producer crash after DB commit before publishoutbox pending, relay publishes lateroutbox age then published
Network timeout during publishpossible duplicate, no losssame messageId, consumer dedup
Broker node restartclient recovers, confirm may timeoutreconnect metrics, no silent loss
Consumer crash before commitmessage redeliveredredelivery metric
Consumer crash after commit before ackduplicate delivered, dedup absorbsinbox conflict, ack duplicate
Handler permanent validation errorDLQDLQ message with reason
Retryable downstream 503delayed retryretry headers and eventual success
DLQ replaycontrolled reprocessingaudit trail
Stream consumer down beyond retentionalert before data lossretention risk alert
Hot partitionlag isolated to partitionpartition lag dashboard
Disk alarmpublishing blocked/sloweddisk alarm metric + app backpressure

15.2 Chaos Drill Checklist

Run drill di staging:

  • kill consumer process during processing;
  • kill producer during outbox relay;
  • restart broker leader node;
  • inject network latency;
  • inject message schema error;
  • inject poison message;
  • pause downstream DB/API;
  • fill queue faster than consumer;
  • stop stream consumer until lag grows;
  • replay DLQ message;
  • rotate RabbitMQ password;
  • rotate TLS certificate;
  • perform rolling upgrade.

Drill tidak selesai sampai semua observability dan runbook terbukti bekerja.


16. Runbook Catalog

16.1 Queue Growth Runbook

Symptoms:

  • ready messages naik terus;
  • oldest message age naik;
  • consumer count normal tetapi egress rendah;
  • downstream latency tinggi.

Diagnosis:

  1. Cek ingress vs egress.
  2. Cek consumer errors.
  3. Cek downstream dependency.
  4. Cek prefetch dan unacked count.
  5. Cek resource broker: memory/disk/network.
  6. Cek apakah ada poison message blocking ordered queue.

Actions:

  • scale consumer jika bottleneck CPU handler;
  • reduce ingress atau shed load jika downstream saturated;
  • move poison message ke DLQ jika blocking;
  • tune prefetch jika unacked terlalu besar;
  • add partition/queue split jika workload perlu isolation;
  • escalate jika oldest age melewati SLA.

16.2 DLQ Spike Runbook

Symptoms:

  • DLQ rate meningkat;
  • error type sama berulang;
  • retry exhausted.

Diagnosis:

  1. Kelompokkan DLQ by error code/message type/schema version.
  2. Cek deploy terbaru producer/consumer.
  3. Cek contract change.
  4. Cek downstream outage.
  5. Cek sample payload.

Actions:

  • rollback bad producer jika schema rusak;
  • hotfix consumer jika parsing/compatibility bug;
  • replay setelah fix;
  • park message jika butuh manual decision;
  • discard hanya dengan approval dan audit.

16.3 Redelivery Storm Runbook

Symptoms:

  • redelivery rate tinggi;
  • same message id muncul berulang;
  • consumer CPU tinggi tetapi progress rendah.

Diagnosis:

  1. Cek retry classifier.
  2. Cek apakah basicNack(requeue=true) dipakai tanpa delay.
  3. Cek exception root cause.
  4. Cek poison message.
  5. Cek prefetch.

Actions:

  • stop consumer sementara jika storm menghancurkan system;
  • ubah requeue immediate menjadi delayed retry;
  • DLQ poison message;
  • patch handler;
  • replay controlled.

16.4 Publisher Confirm Latency Runbook

Symptoms:

  • confirm latency p99 tinggi;
  • outbox pending age naik;
  • publish throughput turun.

Diagnosis:

  1. Cek broker disk I/O.
  2. Cek quorum queue leader health.
  3. Cek replication latency.
  4. Cek network.
  5. Cek message size.
  6. Cek publisher in-flight window.

Actions:

  • reduce publish rate temporarily;
  • scale/redistribute workload;
  • split hot queue;
  • tune batching/in-flight confirms;
  • fix storage bottleneck;
  • escalate broker cluster issue.

16.5 Stream Retention Risk Runbook

Symptoms:

  • stream consumer lag besar;
  • oldest unread offset mendekati retention boundary;
  • replay consumer lambat.

Diagnosis:

  1. Cek append rate vs read rate.
  2. Cek consumer down duration.
  3. Cek retention policy.
  4. Cek partition hot spot.
  5. Cek downstream sink bottleneck.

Actions:

  • scale consumers jika safe;
  • pause noncritical producers if necessary;
  • increase retention if storage allows;
  • run catch-up mode;
  • snapshot/rebuild if data already expired;
  • document data loss if retention already passed.

17. Security and Governance Blueprint

17.1 Vhost Model

Gunakan virtual host untuk isolation:

/orders-prod
/orders-staging
/orders-dev
/shared-observability

Jangan campur dev/test/prod dalam vhost yang sama.

17.2 Permission Model

ActorConfigureWriteRead
topology operatorapproved resourcesnone or limitednone
order-service producernoneorders.command.x, orders.event.x, streamsnone
billing consumernoneretry/DLX if republish strategybilling.orders.event.q
ops replay toolnone/limitedoriginal exchangesDLQ/parking lot
monitoringnonenonemetrics/API only

17.3 Governance Checklist

  • Every exchange has owner.
  • Every queue has owner.
  • Every DLQ has owner and runbook.
  • Every routing key has schema contract.
  • Every message type has compatibility policy.
  • Every stream has retention policy.
  • Every replay tool has authorization/audit trail.
  • Every topology change uses PR.
  • Every secret has rotation plan.
  • Every tenant isolation rule is tested.

18. Architecture Decision Records

18.1 ADR: Queue vs Stream

# ADR: Use Quorum Queue for ReserveCreditCommand

## Status
Accepted

## Context
ReserveCreditCommand must be processed by exactly one logical worker. It does not require historical replay by many consumers. It requires retry, DLQ, and at-least-once processing.

## Decision
Use a durable quorum queue bound to `orders.command.x` with routing key `cmd.orders.reserve-credit.v1`.

## Consequences
- Stronger HA/durability profile than non-replicated queue.
- Consumer must be idempotent.
- Publish path must use publisher confirms.
- Retry/DLQ topology required.
- Queue growth must be monitored.

18.2 ADR: Stream for Audit

# ADR: Use RabbitMQ Stream for Order Audit Log

## Status
Accepted

## Context
Audit consumers need non-destructive reads, replay, and independent progress. Traditional queues remove messages after consumption and are not suitable as an audit log.

## Decision
Publish order audit events to `orders.audit.stream` with retention set by compliance requirement.

## Consequences
- Consumers track offsets.
- Replay tooling must be safe.
- Retention risk must be monitored.
- Stream is not source of truth; service DB remains source of truth for current state.

18.3 ADR: Transactional Outbox

# ADR: Use Transactional Outbox for Domain Event Publishing

## Status
Accepted

## Context
Order state changes and event publication must not diverge silently. Direct publish inside transaction cannot atomically commit with RabbitMQ.

## Decision
Write message to `message_outbox` in the same database transaction as domain state change. A relay publishes with publisher confirms.

## Consequences
- Event publication becomes eventually consistent.
- Duplicate publish is possible, so consumers must be idempotent.
- Outbox age is an SLO metric.

19. Implementation Package Structure

Contoh struktur Java service:

order-service/
  src/main/java/com/example/orders/
    api/
    domain/
      Order.java
      OrderState.java
      OrderService.java
    messaging/
      contract/
        MessageEnvelope.java
        MessageHeaders.java
        OrderSubmittedEvent.java
        ReserveCreditCommand.java
      outbox/
        OutboxMessage.java
        OutboxRepository.java
        RabbitOutboxRelay.java
      rabbit/
        RabbitConnectionManager.java
        ConfirmingRabbitPublisher.java
        RabbitTopologyVerifier.java
      consumer/
        IdempotentRabbitConsumer.java
        RetryClassifier.java
        BillingEventHandler.java
      stream/
        AuditStreamPublisher.java
        AuditStreamConsumer.java
      observability/
        MessagingMetrics.java
        MessagingTrace.java
      config/
        RabbitConfig.java
  src/test/java/
    messaging/
      OutboxRelayTest.java
      IdempotentConsumerTest.java
      RetryClassifierTest.java
      RabbitIntegrationTest.java

19.1 Library Boundary

Untuk organisasi besar, ekstrak reusable internal library:

platform-rabbitmq-core
  - envelope model
  - header conventions
  - publisher wrapper
  - consumer wrapper
  - retry classifier interfaces
  - metrics conventions
  - trace propagation
  - topology verifier

platform-rabbitmq-test
  - test fixtures
  - fake deliveries
  - testcontainers support
  - assertion helpers

platform-rabbitmq-spring
  - Spring Boot auto-configuration
  - RabbitTemplate wrappers
  - listener advice
  - health indicators

Jangan masukkan domain logic ke shared library. Shared library hanya menyediakan primitives dan guardrails.


20. Production Readiness Checklist

20.1 Design Checklist

  • Message type classified: command/event/stream/RPC.
  • Queue/stream primitive selected with reason.
  • Routing key documented.
  • Owner team defined.
  • Idempotency key defined.
  • Retry policy defined.
  • DLQ owner/runbook defined.
  • Message contract versioned.
  • PII classification done.
  • Ordering requirement reviewed.
  • Backpressure behavior defined.
  • Observability plan defined.

20.2 Producer Checklist

  • Transactional outbox used where DB state triggers message.
  • Publisher confirms enabled.
  • mandatory=true or alternate exchange used.
  • Returned/unroutable messages handled.
  • In-flight confirm bounded.
  • Confirm timeout handled.
  • Publish retry has backoff.
  • Outbox age metric exposed.
  • Message id stable across retries.
  • Correlation/causation propagated.

20.3 Consumer Checklist

  • Manual ack used.
  • Ack only after safe commit.
  • Idempotency implemented.
  • Duplicate tested.
  • Retry classifier implemented.
  • Poison message DLQ path tested.
  • Prefetch tuned.
  • Graceful shutdown tested.
  • Handler latency measured.
  • Redelivery metric exposed.

20.4 Broker Checklist

  • Queue type selected intentionally.
  • Durable topology.
  • DLX configured by policy where possible.
  • TTL/max length reviewed.
  • Quorum/stream replication reviewed.
  • Disk capacity planned.
  • Memory/disk alarms monitored.
  • Prometheus/management enabled securely.
  • Backup/restore tested.
  • Upgrade tested.

20.5 Stream Checklist

  • Retention configured.
  • Offset tracking strategy defined.
  • Replay safety reviewed.
  • Consumer lag monitored.
  • Hot partition monitored.
  • Super stream partition key defined.
  • Partition count decision documented.
  • Catch-up strategy tested.

20.6 Security Checklist

  • TLS enabled where required.
  • Credentials stored in secret manager.
  • Per-service users.
  • Least privilege permissions.
  • Vhost isolation.
  • Management UI access restricted.
  • Replay tool access controlled.
  • PII encrypted or excluded according to policy.
  • Audit logs retained.

21. Final Skill Rubric

Gunakan rubric ini untuk menilai apakah kamu sudah memiliki fluency level tinggi.

21.1 Level 1 — API User

Kamu bisa:

  • publish message;
  • consume message;
  • membuat queue/exchange basic;
  • memakai RabbitTemplate atau Java client.

Belum cukup untuk production ownership.

21.2 Level 2 — Reliable App Developer

Kamu bisa:

  • memakai manual ack;
  • memakai publisher confirms;
  • memahami prefetch;
  • membuat retry/DLQ;
  • menulis idempotent consumer;
  • menghindari auto-ack untuk workload durable.

Cukup untuk membangun feature dengan review senior.

21.3 Level 3 — Production Engineer

Kamu bisa:

  • mendesain topology multi-service;
  • membuat outbox/inbox;
  • melakukan failure modelling;
  • membuat observability dan runbook;
  • melakukan load test;
  • menangani DLQ/replay;
  • memilih queue vs stream;
  • mengoperasikan RabbitMQ cluster dengan aman.

Cukup untuk menjadi owner service messaging production.

21.4 Level 4 — Platform/Architecture Engineer

Kamu bisa:

  • membuat messaging platform standard untuk banyak tim;
  • membuat topology-as-code governance;
  • mendesain multi-tenant isolation;
  • membuat capacity model organisasi;
  • membuat reusable Java messaging library;
  • menentukan SLO/SLA messaging;
  • memimpin incident review;
  • membuat architecture decision yang defensible.

Ini level yang mendekati top-tier engineering practice.

21.5 Level 5 — Principal-Level Fluency

Kamu bisa:

  • menjelaskan trade-off RabbitMQ vs Kafka/Pulsar/SQS/NATS tanpa fanboy bias;
  • membedakan correctness problem dari tooling problem;
  • menolak desain broker yang salah walau terlihat “simple”;
  • menghubungkan messaging dengan domain state machine, compliance, audit, dan operational risk;
  • membuat desain yang tetap bisa dioperasikan oleh manusia;
  • memformalkan invariants, failure matrix, runbook, dan feedback loop.

Target seri ini adalah membawa kamu minimal ke Level 3 kuat, dan memberi struktur untuk mengejar Level 4-5.


22. Final Practice Plan

Gunakan 20-hour deliberate practice versi capstone:

Hour 1-3 — Topology Design

  • Buat exchange/queue/stream topology untuk domain order.
  • Tulis routing keys.
  • Tulis queue owner dan DLQ owner.
  • Buat Mermaid topology diagram.

Hour 4-6 — Producer Reliability

  • Implement outbox table.
  • Implement outbox relay.
  • Enable publisher confirms.
  • Simulasikan broker down.
  • Buktikan outbox tidak hilang.

Hour 7-9 — Consumer Correctness

  • Implement manual ack.
  • Implement inbox/dedup.
  • Crash consumer setelah DB commit sebelum ack.
  • Buktikan duplicate aman.

Hour 10-12 — Retry and DLQ

  • Implement delayed retry.
  • Inject retryable exception.
  • Inject non-retryable exception.
  • Replay DLQ secara controlled.

Hour 13-15 — Streams

  • Publish audit events ke stream.
  • Buat consumer dengan external checkpoint.
  • Replay dari offset lama.
  • Simulasikan retention risk.

Hour 16-18 — Observability and Load

  • Tambahkan metrics producer/consumer.
  • Buat dashboard minimal.
  • Jalankan benchmark.
  • Tentukan safe throughput dan alert threshold.

Hour 19-20 — Failure Review

  • Isi failure matrix.
  • Tulis ADR.
  • Tulis runbook.
  • Presentasikan design review seperti internal architecture review.

23. Common Capstone Mistakes

  1. Menggunakan RabbitMQ sebagai database.
  2. Menganggap publisher confirm berarti consumer sudah memproses message.
  3. Menganggap ack berarti business transaction pasti benar.
  4. Menggunakan auto ack untuk workload penting.
  5. Membuat retry infinite dengan immediate requeue.
  6. Mengabaikan returned/unroutable messages.
  7. Menaruh banyak subscriber pada queue yang sama padahal subscriber butuh isolation.
  8. Menggunakan event untuk command yang membutuhkan owner jelas.
  9. Menggunakan RPC untuk long-running workflow.
  10. Menganggap FIFO global ada.
  11. Mengabaikan duplicate delivery.
  12. Menganggap DLQ adalah solusi, padahal DLQ adalah alarm.
  13. Tidak punya replay policy.
  14. Tidak punya retention risk alert untuk stream.
  15. Tidak mengukur confirm latency.
  16. Tidak mengukur oldest message age.
  17. Tidak punya owner untuk topology.
  18. Tidak punya governance untuk schema.
  19. Tidak melakukan failure test.
  20. Tidak bisa menjawab “apa yang terjadi kalau crash di titik ini?”

24. Series Closure

Seri ini selesai di Part 035.

Kamu sekarang memiliki blueprint lengkap untuk:

  • memahami RabbitMQ sebagai broker, queue system, dan stream platform;
  • menggunakan Java client dan Spring AMQP tanpa bergantung pada magic abstraction;
  • membangun producer dan consumer yang reliable;
  • mendesain command/event/RPC/routing topology;
  • membangun retry/DLQ/parking lot;
  • menangani idempotency, ordering, deduplication, dan causality;
  • menggunakan RabbitMQ Streams dan Super Streams;
  • mendesain pipeline, windowing, batching, fanout, aggregator, saga;
  • melakukan performance modelling, benchmarking, dan JVM tuning;
  • deploy RabbitMQ cluster/Kubernetes Operator;
  • membuat observability, security, governance, compliance, dan runbook;
  • menyatukan semuanya menjadi production-grade platform blueprint.

Bagian terakhir ini bukan akhir pembelajaran. Ini adalah titik mulai untuk latihan nyata: ambil satu service, audit topology-nya, tulis failure matrix, lalu buktikan dengan chaos/load test.

Jika sebuah desain RabbitMQ tidak bisa menjelaskan message lifecycle, ack timing, duplicate handling, retry budget, DLQ ownership, stream retention, dan operational runbook, desain itu belum production-grade.


25. Official References

Gunakan referensi resmi berikut saat mengimplementasikan blueprint:

  • RabbitMQ AMQP 0-9-1 Concepts: exchange, queue, binding, routing.
  • RabbitMQ Java Client API Guide.
  • RabbitMQ Consumer Acknowledgements and Publisher Confirms.
  • RabbitMQ Reliability Guide.
  • RabbitMQ Dead Letter Exchanges.
  • RabbitMQ Quorum Queues.
  • RabbitMQ Streams and Super Streams documentation.
  • RabbitMQ Stream Java Client documentation.
  • RabbitMQ Kubernetes Cluster Operator documentation.
  • RabbitMQ Monitoring and Prometheus documentation.
  • RabbitMQ Production Checklist.
Lesson Recap

You just completed lesson 35 in final stretch. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.