
Learn Java Messaging Event Streaming Part 010 Rabbitmq Queue Types And Tradeoffs


title: Learn Java Messaging and Event Streaming - Part 010
description: RabbitMQ queue types and operational trade-offs: classic queues, quorum queues, lazy behavior, priority queues, queue length limits, overflow, replication, durability, and failure modelling.
series: learn-java-messaging-event-streaming
seriesTitle: Learn Java Messaging and Event Streaming
order: 10
partTitle: RabbitMQ Queue Types and Trade-Offs
tags:

  • java
  • messaging
  • rabbitmq
  • queue
  • quorum-queue
  • classic-queue
  • priority-queue
  • backpressure
  • reliability
  • operations

date: 2026-06-28

Part 010 — RabbitMQ Queue Types and Trade-Offs

The goal of this part is to understand that a "queue" in RabbitMQ is not a single thing. The queue type determines durability, replication, latency, throughput, memory pressure, failover behavior, and operational limits. Choosing the wrong queue type can become the root of an incident: lost messages, a full broker disk, an overloaded leader, a requeue storm, or a closed upgrade path.

The previous part covered exchanges, bindings, routing keys, and queues as the materialization of a subscription. Now we move on to the design of the queue itself.

Questions a senior engineer must be able to answer:

  • May this queue lose messages when a node dies?
  • Does this queue need to be replicated and highly available?
  • Does the workload need strict ordering or throughput?
  • Can the backlog grow large?
  • Are the messages large or small?
  • Do messages need priority?
  • What happens when the queue grows too long?
  • What is the recovery behavior after a consumer has been down for 2 hours?
  • What happens when the queue leader moves to another node?
  • What happens when the disk is almost full?

This part covers:

  • classic queues,
  • quorum queues,
  • lazy behavior and disk-backed queues,
  • priority queues,
  • TTL and queue length limits,
  • overflow strategies,
  • queue replication and failure modes,
  • a decision matrix for real workloads.

1. Queue Type Is a Reliability Decision

The queue type is not just a declaration parameter.

Example declaration:

Map<String, Object> args = new HashMap<>();
args.put("x-queue-type", "quorum");

channel.queueDeclare(
    "case-escalation.commands.q",
    true,   // durable
    false,  // exclusive
    false,  // autoDelete
    args
);

A single x-queue-type line can change:

  • how messages are stored,
  • whether the queue is replicated,
  • how the leader is elected,
  • what the write latency is,
  • which features are available,
  • how the queue recovers after a node failure,
  • how the queue must be monitored.

Mental model:

Do not choose a queue type from a tutorial. Choose it from the failure requirement.


2. Classic Queues

The classic queue is RabbitMQ's traditional queue. It fits many general workloads, especially non-replicated queues, temporary queues, transient workloads, and use cases that do not need consensus replication.

General characteristics:

  • simple,
  • familiar,
  • low overhead compared with a replicated consensus queue,
  • suitable for local or non-critical queues,
  • can be durable or non-durable,
  • can be exclusive or auto-delete,
  • supports many RabbitMQ queue features.

Example:

channel.queueDeclare(
    "case-notification.domain-events.q",
    true,   // durable
    false,  // exclusive
    false,  // autoDelete
    Map.of("x-queue-type", "classic")
);

If you do not specify a type, the default is determined by the broker or by a policy. For production, it is better to be explicit through a policy or definitions so that nothing depends on a default the developer does not know.

2.1 When Classic Queues Fit

Classic queues fit:

  • temporary reply queues,
  • transient work queues,
  • low-criticality background jobs,
  • queues that can be rebuilt,
  • queues where losing a few messages is acceptable,
  • local dev/test,
  • workloads that need features not available on quorum queues.

Examples:

report-generation.preview.q
user-session-cleanup.q
temporary-request-reply.q
noncritical-telemetry-forwarder.q

2.2 When Classic Queues Are Risky

Classic queues are risky when:

  • messages must not be lost,
  • the queue must stay available during a node failure,
  • the queue becomes a critical command path,
  • the backlog is large and recovery must be predictable,
  • operators assume a durable queue is the same as a replicated queue.

A durable classic queue means the queue definition and persistent messages survive a restart of the same node. That is not the same as replicated high availability across nodes.

2.3 Classic Mirroring Legacy Warning

In modern RabbitMQ, mirrored classic queues are a legacy path: deprecated or removed depending on the version (removed as of RabbitMQ 4.0). For replicated durable queues, RabbitMQ recommends quorum queues or streams, not mirrored classic queues.

Engineering implications:

  • Do not start a new design with mirrored classic queues.
  • If a legacy system still uses mirrored classic queues, write a migration plan.
  • Do not assume old RabbitMQ HA tutorials still apply to RabbitMQ 4.x.

3. Quorum Queues

A quorum queue is a replicated durable queue based on the Raft consensus algorithm. It is designed for better data safety and availability than legacy mirrored classic queues.

General characteristics:

  • replicated queue,
  • leader/follower model,
  • a write must be replicated to a quorum (majority) of members,
  • suitable for durable critical workloads,
  • more predictable failover than the old classic mirroring,
  • feature set is not identical to classic queues.

Messages are published to the leader, and the state is then replicated to followers. If the leader fails, an eligible follower can become the new leader.
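The replication rule above comes down to majority arithmetic. The following is a minimal sketch, assuming the usual Raft-style majority rule; the class and method names are illustrative and not part of any RabbitMQ API.

```java
/** Majority arithmetic for a replicated queue (illustrative helper, not a RabbitMQ API). */
final class QuorumMath {
    private QuorumMath() {}

    /** Smallest number of members that forms a majority of {@code members}. */
    static int majority(int members) {
        if (members < 1) {
            throw new IllegalArgumentException("members must be >= 1");
        }
        return members / 2 + 1;
    }

    /** How many members can be unavailable while a majority still exists. */
    static int toleratedFailures(int members) {
        return members - majority(members);
    }
}
```

Under this rule a two-member queue tolerates no failures, which is why odd member counts such as 3 or 5 are the usual choice.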

3.1 When Quorum Queues Fit

Quorum queues fit:

  • business-critical commands,
  • durable domain event subscription queues,
  • regulatory workflow steps that must not be lost,
  • financial/compliance notification pipelines,
  • workloads that need a replicated queue and predictable failover.

Examples:

case-escalation.commands.q
case-audit.domain-events.q
enforcement-decision.commands.q
regulatory-deadline-monitor.q

3.2 Quorum Queue Trade-Offs

A quorum queue is not "always better". It carries costs:

| Aspect | Consequence |
|---|---|
| Write latency | Higher because of replication/consensus |
| Throughput | Can be lower than a non-replicated classic queue for some workloads |
| Disk | Replicated data increases storage consumption |
| Leader placement | A hot leader can overload a particular node |
| Feature compatibility | Not every classic queue feature is available or behaves the same |
| Operational complexity | Requires understanding quorum, members, leaders, and failover |

Use a quorum queue when the reliability requirement calls for it, not because it sounds enterprise.

3.3 Quorum Queues and Poison Messages

Quorum queues track a delivery count and support poison message handling through a delivery limit. This is very useful for preventing infinite redelivery.

Model: each redelivery increments the delivery count; once the count exceeds the delivery limit, the message is dead-lettered if a DLX is configured, otherwise dropped.

Principles:

  • Do not rely on infinite requeue.
  • Set a delivery limit for critical workloads.
  • Route messages that exceed the limit to a DLX/DLQ.
  • Write a replay/quarantine runbook.
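The principles above can also be mirrored on the consumer side as an explicit decision. This is a rough sketch, assuming the consumer can read the redelivery counter from an x-delivery-count message header (verify this on your broker version and queue type); the class, enum, and the retryable flag are hypothetical names, and the broker-side delivery limit remains the real safety net.

```java
import java.util.Map;

/** Consumer-side decision for a failed message (hypothetical helper, not a RabbitMQ API). */
final class PoisonMessagePolicy {
    enum Action { REQUEUE, DEAD_LETTER }

    private final int deliveryLimit;

    PoisonMessagePolicy(int deliveryLimit) {
        if (deliveryLimit < 0) {
            throw new IllegalArgumentException("deliveryLimit must be >= 0");
        }
        this.deliveryLimit = deliveryLimit;
    }

    /** Reads the redelivery counter from the headers; a missing header counts as 0. */
    static long deliveryCount(Map<String, Object> headers) {
        Object value = headers == null ? null : headers.get("x-delivery-count");
        return value instanceof Number ? ((Number) value).longValue() : 0L;
    }

    /** Non-retryable failures and exhausted budgets go to the DLX; everything else is requeued. */
    Action onFailure(Map<String, Object> headers, boolean retryable) {
        if (!retryable) {
            return Action.DEAD_LETTER;
        }
        return deliveryCount(headers) >= deliveryLimit ? Action.DEAD_LETTER : Action.REQUEUE;
    }
}
```

In a real consumer, REQUEUE would typically map to basicNack with requeue=true and DEAD_LETTER to basicNack or basicReject with requeue=false, so that the queue's DLX receives the message.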

3.4 Quorum Queues and Ordering

A quorum queue is still a queue, but consumer concurrency can make processing complete out of order.

If you need strict per-entity ordering:

  • use the single active consumer pattern where it fits,
  • or partition queues by key,
  • or use a stream/log model if replay and per-partition ordering are a better match.

Queue ordering is not only broker order. End-to-end ordering covers:

publish order -> enqueue order -> delivery order -> processing order -> side-effect commit order

If consumer concurrency > 1, processing order can differ from delivery order.
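Partitioning by key, as mentioned above, can be sketched as a stable mapping from an entity ID to one of N queues, so that all messages for one case land in the same queue. This is a minimal sketch; the queue name pattern and the class name are hypothetical, and each partition queue is assumed to have a single consumer or a single active consumer.

```java
/** Maps an entity key to a fixed partition queue (illustrative; naming scheme is assumed). */
final class CasePartitioner {
    private final String queuePrefix;
    private final int partitions;

    CasePartitioner(String queuePrefix, int partitions) {
        if (partitions < 1) {
            throw new IllegalArgumentException("partitions must be >= 1");
        }
        this.queuePrefix = queuePrefix;
        this.partitions = partitions;
    }

    /** floorMod keeps the result in [0, partitions) even for negative hash codes. */
    int partitionOf(String caseId) {
        return Math.floorMod(caseId.hashCode(), partitions);
    }

    /** Same caseId always yields the same queue name. */
    String queueFor(String caseId) {
        return queuePrefix + "." + partitionOf(caseId) + ".q";
    }
}
```

String.hashCode is specified by the Java language, so the mapping is stable across JVMs; changing the partition count, however, remaps keys and needs a migration plan.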


4. Lazy Queues and Disk-Backed Behavior

Historically, RabbitMQ had lazy queues: queues that tried to move messages to disk as early as possible to keep the memory footprint low under a large backlog. In modern versions, queue storage behavior has changed; some lazy-mode semantics have become less relevant or have shifted depending on the version and queue implementation.

Because this series targets long-term engineering, the safer mental model is:

Do not design for a large backlog on the assumption that "a lazy queue will save memory". Design backlog, disk, flow control, TTL, retention, and consumer capacity explicitly.

4.1 The Problem Lazy Behavior Tries to Solve

If the producer is much faster than the consumer, queue depth rises and messages pile up. The broker has to hold those messages in memory or on disk. If too many messages sit in memory, broker memory pressure increases.

Lazy/disk-backed behavior tries to reduce memory pressure by keeping the backlog on disk.

But disk is not magic:

  • disk throughput is limited,
  • disk latency is higher,
  • paging and reloading messages slows delivery,
  • a full disk can stop the broker,
  • recovery can take a long time.

4.2 When a Large Backlog Is Valid

A large backlog is sometimes valid:

  • downstream consumer maintenance,
  • batch processing window,
  • regulatory archive delay,
  • disaster recovery catch-up,
  • temporary spike.

But a large backlog must have a budget:

max backlog messages
max backlog bytes
max acceptable catch-up time
max disk usage
max consumer recovery rate
max message age

Without these numbers, a large backlog is just an incident that has not been named yet.
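The budget items above follow from simple rate arithmetic. Here is a minimal sketch, assuming constant publish and consume rates and counting only message body bytes (broker and replication overhead are not modelled); the class and method names are illustrative.

```java
/** Back-of-the-envelope backlog arithmetic (illustrative; ignores broker overhead). */
final class BacklogBudget {
    private BacklogBudget() {}

    /** Messages accumulated while publishing outpaces consuming for the given duration. */
    static long backlogMessages(long publishPerSec, long consumePerSec, long durationSeconds) {
        long netRate = Math.max(0L, publishPerSec - consumePerSec);
        return Math.multiplyExact(netRate, durationSeconds);
    }

    /** Lower bound on stored bytes: message count times average body size. */
    static long backlogBytes(long messages, long avgMessageBytes) {
        return Math.multiplyExact(messages, avgMessageBytes);
    }

    /**
     * Seconds needed to drain the backlog while publishing continues.
     * Returns -1 when the consumer is not faster than the producer (never catches up).
     */
    static long catchUpSeconds(long backlogMessages, long publishPerSec, long consumePerSec) {
        long drainRate = consumePerSec - publishPerSec;
        if (drainRate <= 0) {
            return -1L;
        }
        return (backlogMessages + drainRate - 1) / drainRate; // round up
    }
}
```

For example, a consumer that is down for 2 hours at 500 msg/s leaves 3,600,000 messages; at an assumed 2 KiB per message that is about 6.9 GiB of bodies, and a consumer recovering at 800 msg/s needs 12,000 seconds (over 3 hours) to catch up while publishing continues.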

4.3 Do Not Use a Queue as a Database

A RabbitMQ queue is not a database for storing old historical events. If the use case needs long retention and repeated replay, consider Kafka or RabbitMQ Streams.

A queue fits a work/subscription backlog. A stream/log fits retained ordered history.


5. Priority Queues

A priority queue allows messages with a higher priority to be delivered first.

Example declaration:

Map<String, Object> args = new HashMap<>();
args.put("x-max-priority", 10);  // highest priority a publisher may set

channel.queueDeclare(
    "case-review.priority.q",
    true,   // durable
    false,  // exclusive
    false,  // autoDelete
    args
);

Publish:

AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
    .deliveryMode(2)   // 2 = persistent
    .priority(8)
    .messageId(commandId)
    .build();

channel.basicPublish("case.commands", "case.review", props, body);

5.1 Priority Queues Change Fairness

Priority looks attractive, but it changes fairness and the latency distribution.

If high-priority traffic keeps arriving, low-priority messages can starve.

5.2 Do Not Use Too Many Priority Levels

Too many priority levels increase overhead and make behavior hard to predict.

Prefer:

0 = normal
5 = urgent
9 = critical

Over:

1..255 with unclear semantics

5.3 Alternatives to a Priority Queue

Sometimes separate queues are the better choice:

case-review-critical.q
case-review-normal.q

The consumer then decides how to poll or weight the queues.

Advantages of separate queues:

  • clearer observability,
  • capacity can be separated,
  • starvation is easier to control,
  • DLQ/retry can differ per queue.

Disadvantages:

  • more complex consumer logic,
  • more topology.

Use a priority queue when priority really is a property of messages within the same queue. Use separate queues when priority is a class of service that needs its own capacity and SLO.
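The weighting idea for separate queues can be sketched as follows. This is a minimal illustration, assuming two in-memory deques stand in for case-review-critical.q and case-review-normal.q; the class name and the weight parameter are hypothetical, and a real consumer would pull from the broker instead.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Optional;

/** Serves up to N critical items per normal item so that normal work cannot starve. */
final class WeightedPoller<T> {
    private final Deque<T> critical = new ArrayDeque<>();
    private final Deque<T> normal = new ArrayDeque<>();
    private final int criticalWeight;
    private int criticalServedInRow = 0;

    WeightedPoller(int criticalWeight) {
        if (criticalWeight < 1) {
            throw new IllegalArgumentException("criticalWeight must be >= 1");
        }
        this.criticalWeight = criticalWeight;
    }

    void offerCritical(T message) { critical.addLast(message); }

    void offerNormal(T message) { normal.addLast(message); }

    /** Returns the next message to process, or empty when both queues are empty. */
    Optional<T> next() {
        // Normal gets a turn once critical has used its share and normal has work waiting.
        boolean normalsTurn = criticalServedInRow >= criticalWeight && !normal.isEmpty();
        if (!normalsTurn && !critical.isEmpty()) {
            criticalServedInRow++;
            return Optional.of(critical.pollFirst());
        }
        criticalServedInRow = 0;
        return Optional.ofNullable(normal.pollFirst());
    }
}
```

With a weight of 2, critical messages c1, c2, c3 and normal messages n1, n2 are served as c1, c2, n1, c3, n2: critical work goes first, but normal work still makes progress.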


6. TTL: Message TTL and Queue TTL

TTL determines the lifetime of a message or a queue.

6.1 Message TTL

Message TTL limits how long a message may stay in a queue.

Use cases:

  • notifications that are stale after 1 hour,
  • temporary workflow reminders,
  • retry delay topology,
  • request-reply response timeout.

Risks:

  • expired messages are either dropped or dead-lettered, depending on whether a DLX is configured,
  • a TTL that is too short causes a silent business gap,
  • TTL does not replace SLA monitoring.

6.2 Queue TTL

Queue TTL (expires) deletes a queue that has not been used for a given period.

Suitable for:

  • temporary queues,
  • dynamic reply queues,
  • short-lived workers.

Not suitable for:

  • business-critical durable queues,
  • audit queues,
  • long-lived subscriptions.

7. Queue Length Limit and Overflow

A queue length limit caps the number of messages or the total bytes in a queue.

It is a safety guard, not complete capacity planning.

Example:

Map<String, Object> args = new HashMap<>();
args.put("x-max-length", 100_000);
args.put("x-overflow", "reject-publish");

channel.queueDeclare(
    "case-notification.domain-events.q",
    true,   // durable
    false,  // exclusive
    false,  // autoDelete
    args
);

7.1 Default Overflow: Drop Head / Dead-Letter Oldest

In general, when a queue reaches its limit and the default overflow behavior (drop-head) applies, the oldest messages are dropped, or dead-lettered if a DLX is configured.

Suitable for:

  • telemetry,
  • cache invalidation,
  • non-critical signals,
  • workloads that care more about the newest data.

Dangerous for:

  • commands,
  • audit events,
  • legal/regulatory workflows,
  • payment/enforcement decisions.

7.2 Reject Publish

With reject-publish, new messages are rejected when the queue is full. If publisher confirms are enabled, the publisher receives a nack.

Suitable when the producer must feel backpressure.

Trade-offs:

  • upstream must have retry/backoff,
  • producer availability can be affected,
  • better than a silent drop for business-critical workloads.
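The retry/backoff requirement on the publisher can be sketched as a capped exponential delay that is applied after each nack. This is a minimal sketch; the class name and parameters are illustrative, and production code would usually add random jitter and an overall attempt budget.

```java
/** Capped exponential backoff for republishing after a nack (illustrative helper). */
final class PublishBackoff {
    private PublishBackoff() {}

    /**
     * Delay before the given attempt: base, 2*base, 4*base, ... capped at capMillis.
     * Attempts are numbered from 1.
     */
    static long delayMillis(int attempt, long baseMillis, long capMillis) {
        if (attempt < 1 || baseMillis < 1 || capMillis < baseMillis) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        long delay = baseMillis;
        // Stop doubling as soon as the cap is reached, which also avoids overflow.
        for (int i = 1; i < attempt && delay < capMillis; i++) {
            delay *= 2;
        }
        return Math.min(delay, capMillis);
    }
}
```

A publisher would sleep (or schedule) for delayMillis(attempt, 200, 5000) before republishing a nacked message, and escalate once its attempt budget is exhausted rather than retrying forever.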

7.3 Reject Publish DLX

Some queue types and versions support an overflow variant that dead-letters the rejected message. Feature support differs between queue types, so do not assume every queue type supports the same modes.

Principle:

A queue limit policy must be tested on the queue type and RabbitMQ version that are actually in use.

7.4 Queue Limit as a Circuit Breaker

A queue limit can act as a circuit breaker:

  • if downstream dies, the backlog does not grow without bound,
  • the producer is forced to back off,
  • operators receive an alert,
  • the system avoids a total disk-full outage.

But in a regulated system, rejecting a published command or event must be treated as an incident or a controlled degradation, not as hidden normal behavior.


8. Durability Matrix

RabbitMQ durability is a combination of several layers.

| Exchange durable | Queue durable | Message persistent | Publisher confirm | Survives node restart/failure? | Message safety expectation |
|---|---|---|---|---|---|
| No | Any | Any | Any | No | Topology/message can be lost |
| Yes | No | Any | Any | No | Queue/message can be lost |
| Yes | Yes | No | Any | Partial | Transient messages can be lost |
| Yes | Yes | Yes | No | Better | Producer does not know whether the broker took responsibility |
| Yes | Yes | Yes | Yes | Good on same node | Still not cross-node HA if non-replicated |
| Yes | Quorum | Yes | Yes | Better HA | Replicated; quorum and disk still need monitoring |

Durability is not binary. For a business-critical flow, the minimum baseline is:

  • durable exchange,
  • durable queue,
  • persistent messages,
  • publisher confirms,
  • manual consumer ack,
  • idempotent consumer,
  • DLQ/retry,
  • broker storage monitoring,
  • tested recovery.

9. Queue Type Decision Matrix

| Requirement | Recommended starting point | Reason |
|---|---|---|
| Critical command queue | Quorum queue | Replicated durability and safer failover |
| Audit event subscription | Quorum queue or stream | Queue if subscription backlog; stream if replay history |
| Temporary reply queue | Classic exclusive auto-delete | Short-lived and not critical after connection ends |
| Non-critical background job | Classic durable or transient | Lower overhead |
| High-throughput retained history | RabbitMQ Streams/Kafka | Queue is not long-term log |
| Priority handling | Priority classic queue or separate queues | Depends on fairness/observability needs |
| Very large backlog | Re-evaluate architecture | Queue can buffer, but not replace capacity planning |
| Legacy mirrored classic queue | Migrate | Modern RabbitMQ recommends quorum/streams instead |

10. Queue Declaration Arguments: Use Carefully

Queue behavior can be controlled by arguments or policies. Prefer policy for operational attributes that may need changes without app redeploy.

Examples:

Map<String, Object> args = new HashMap<>();
args.put("x-queue-type", "quorum");
args.put("x-delivery-limit", 5);
args.put("x-dead-letter-exchange", "regulatory.platform.dlx");
args.put("x-dead-letter-routing-key", "case-escalation.commands.dlq");

channel.queueDeclare(
    "case-escalation.commands.q",
    true,   // durable
    false,  // exclusive
    false,  // autoDelete
    args
);

Caution:

  • Some arguments cannot be changed after queue creation.
  • Redeclaring an existing queue with different arguments fails with a PRECONDITION_FAILED channel error.
  • App-declared arguments can conflict with operator policies.
  • Version compatibility matters.

For production, document:

queue: case-escalation.commands.q
owner: case-service
queueType: quorum
durable: true
messagePersistenceRequired: true
dlx: regulatory.platform.dlx
retryPolicy: 5 attempts, delayed retry, then quarantine
maxLength: 100000
overflow: reject-publish
expectedThroughput: 500 msg/s
maxBacklogBytes: 20GiB
recoveryTimeObjective: 30m

11. Replication and Leader Placement

A quorum queue has a leader. If all the hot queue leaders sit on one node, that node can become overloaded.

Problems:

  • NodeA has high CPU/disk/network usage.
  • A NodeA failover triggers many leader elections.
  • Cluster throughput is unbalanced.

Mitigations:

  • distribute queue leaders,
  • monitor the per-node queue leader count,
  • separate hot workloads,
  • avoid too many tiny critical queues if the overhead is significant,
  • capacity-test failover.
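Monitoring the per-node leader count can be sketched as a small aggregation. This is a minimal sketch, assuming you already have a queue-to-leader-node mapping from your own monitoring source; the input shape, the class name, and the skew threshold are assumptions, not a RabbitMQ API.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

/** Detects leader skew from a queue -> leader node mapping (illustrative helper). */
final class LeaderBalance {
    private LeaderBalance() {}

    /** Counts how many queue leaders each node hosts. */
    static Map<String, Integer> leadersPerNode(Map<String, String> leaderByQueue) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String node : leaderByQueue.values()) {
            counts.merge(node, 1, Integer::sum);
        }
        return counts;
    }

    /** True when the busiest node hosts more than maxShare (0..1) of all leaders. */
    static boolean isSkewed(Map<String, String> leaderByQueue, double maxShare) {
        if (leaderByQueue.isEmpty()) {
            return false;
        }
        int busiest = Collections.max(leadersPerNode(leaderByQueue).values());
        return (double) busiest / leaderByQueue.size() > maxShare;
    }
}
```

A count alone treats all queues as equal; for hot workloads it is worth weighting each queue by its publish rate before judging the balance.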

12. Consumer Concurrency and Queue Type

The queue type does not remove the consumer concurrency problem.

If a queue has 10 consumers, then:

  • throughput goes up,
  • processing order can change,
  • duplicate redelivery is still possible,
  • downstream can be overloaded,
  • unacked messages need monitoring.

For strict ordering per case:

  • one queue per partition key class,
  • a consistent hash exchange plugin where it fits,
  • single active consumer if the workload suits it,
  • or move to a stream/log partitioning model.

Do not claim "a RabbitMQ queue preserves ordering" without mentioning consumer concurrency and ack behavior.


13. Queue as Backpressure Surface

Queue depth is a pressure signal.

But queue depth alone is not enough.

Monitor:

| Metric | Meaning |
|---|---|
| Ready messages | Not yet delivered to a consumer |
| Unacked messages | Delivered but not yet acknowledged |
| Publish rate | Inbound rate |
| Deliver/get rate | Rate leaving the broker |
| Ack rate | Rate at which consumers finish |
| Redeliver rate | Failure/retry pressure |
| Consumer count | Active capacity |
| Message age | Real backlog SLA |
| Disk free | Survival budget |
| Memory watermark | Broker pressure |

Important distinction:

queue depth high + ack rate high = catch-up mungkin sehat
queue depth high + ack rate zero = consumer mati/stuck
unacked high + CPU low = consumer blocked downstream
redelivery high = poison/retry storm
publish rate > ack rate sustained = capacity deficit
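The distinctions above can be turned into a first-pass triage function. This is a rough sketch, assuming point-in-time metric samples and hypothetical thresholds; it cannot tell whether a condition is sustained, and it leaves out the unacked/CPU rule, so treat the result as a hint for a human, not a verdict.

```java
/** First-pass triage from queue metrics (illustrative; thresholds are assumptions). */
final class QueueHealth {
    enum Diagnosis { HEALTHY, CATCHING_UP, CONSUMER_STUCK, RETRY_STORM, CAPACITY_DEFICIT }

    private final long deepQueueThreshold;
    private final double redeliverThreshold;

    QueueHealth(long deepQueueThreshold, double redeliverThreshold) {
        this.deepQueueThreshold = deepQueueThreshold;
        this.redeliverThreshold = redeliverThreshold;
    }

    /** Rates are messages per second; ready is the current count of ready messages. */
    Diagnosis diagnose(long ready, double publishRate, double ackRate, double redeliverRate) {
        boolean deep = ready > deepQueueThreshold;
        if (redeliverRate > redeliverThreshold) {
            return Diagnosis.RETRY_STORM;       // redelivery high
        }
        if (deep && ackRate == 0.0) {
            return Diagnosis.CONSUMER_STUCK;    // depth high, nothing completing
        }
        if (publishRate > ackRate) {
            return Diagnosis.CAPACITY_DEFICIT;  // inflow exceeds completion
        }
        if (deep) {
            return Diagnosis.CATCHING_UP;       // depth high but draining
        }
        return Diagnosis.HEALTHY;
    }
}
```

The order of the checks is a design choice: a retry storm is reported first because it can inflate the other rates and hide the real cause.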

14. Disk Full Failure Mode

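A queue backlog eventually becomes a storage problem.

Failure chain: consumers stall, the backlog grows, disk usage climbs, the broker's disk alarm blocks publishers, and in the worst case a full disk stops the broker.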

Mitigation design:

  • per-queue max length/bytes,
  • publisher backpressure handling,
  • DLQ/quarantine capacity,
  • disk alert well before alarm,
  • consumer autoscaling or manual runbook,
  • reject-publish for critical backpressure rather than silent drop,
  • replay/reprocess tool.

For regulated systems, disk-full prevention is part of defensibility. “The queue filled up” is not a root cause; it is a symptom of missing capacity/failure control.


15. DLQ Strategy Depends on Queue Type

Dead-lettering is routing. The queue type controls the behavior of the original queue, but the DLQ itself also needs a queue type decision.

Questions:

  • Should the DLQ be a quorum queue?
  • How long should the DLQ keep messages?
  • Can the DLQ grow without bound?
  • Who owns replay?
  • Are DLQ messages PII-sensitive?
  • What schema is used for DLQ metadata?

Pattern: a DLQ should not be a trash bin. It is a quarantine system.

DLQ payload should preserve:

  • original body,
  • original properties,
  • routing key,
  • exchange,
  • failure reason if available,
  • first failure time,
  • last failure time,
  • consumer/app version,
  • correlation ID,
  • trace ID.

16. Retry Queue Topology and Queue Type

A common RabbitMQ retry design uses TTL + DLX: a failed message is routed to a retry queue that has a message TTL and no consumers; when the TTL expires, the retry queue dead-letters the message back to the main queue.

Queue type choices:

  • Main critical queue: quorum.
  • Retry queues: depends on criticality and volume.
  • Final DLQ: often quorum if messages are critical evidence.

Trade-off:

  • Quorum retry queues improve safety but increase replication cost.
  • Classic retry queues reduce overhead but may lose delayed messages during a failure if they are not durable, persistent, and available enough.

For critical enforcement actions, losing a retry message may mean missing a legal deadline. That changes the queue type decision.
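The TTL + DLX retry design can be sketched as a set of retry queue definitions, one per delay step. This is a minimal sketch: the x-message-ttl, x-dead-letter-exchange, and x-dead-letter-routing-key argument names are standard RabbitMQ queue arguments, while the queue naming scheme, class name, and example exchange/routing key are hypothetical. It only builds the argument maps; declaring the queues and choosing their queue type is left to the caller.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Builds arguments for a ladder of retry queues that dead-letter back to the main route. */
final class RetryLadder {
    private RetryLadder() {}

    /**
     * Returns queue name -> declaration arguments, in attempt order.
     * Each retry queue holds a message for its delay, then dead-letters it
     * to targetExchange with targetRoutingKey.
     */
    static Map<String, Map<String, Object>> build(
            String baseName, String targetExchange, String targetRoutingKey, List<Duration> delays) {
        Map<String, Map<String, Object>> queues = new LinkedHashMap<>();
        int attempt = 1;
        for (Duration delay : delays) {
            Map<String, Object> args = new LinkedHashMap<>();
            args.put("x-message-ttl", delay.toMillis());            // delay in milliseconds
            args.put("x-dead-letter-exchange", targetExchange);     // where expired messages go
            args.put("x-dead-letter-routing-key", targetRoutingKey);
            queues.put(baseName + ".retry-" + attempt + ".q", args); // hypothetical naming scheme
            attempt++;
        }
        return queues;
    }
}
```

Each resulting entry could be passed as the arguments of a queueDeclare call; the consumer then picks the retry queue that matches the attempt number and sends the message to the final DLQ once the ladder is exhausted.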


17. Version and Feature Compatibility

RabbitMQ evolves. Queue features differ by version and queue type. For example, mirrored classic queues were deprecated and then removed in modern major versions; quorum queues do not support every classic queue feature; lazy mode behavior depends on queue implementation/version.

Engineering rule:

Treat RabbitMQ queue behavior as versioned infrastructure contract. Validate against the exact broker version and enabled feature flags/policies used in production.

Do not rely on:

  • old blog posts without version,
  • tutorials using default queue type,
  • StackOverflow snippets,
  • local Docker default behavior,
  • assumptions from other brokers.

Maintain a platform compatibility matrix:

RabbitMQ version: 4.x
Default queue type: quorum/classic? controlled by policy
Allowed queue types: quorum, classic, stream
Mirrored classic queues: not allowed
Lazy mode: not allowed / version-specific
Priority queues: allowed only for approved workloads
Max queue length policy: mandatory for non-temporary queues
DLX policy: mandatory for critical queues

18. Case Study: Case Escalation Command Queue

Requirement:

  • The case.escalate command must not be lost.
  • Duplicate delivery is acceptable only if processing is idempotent.
  • The SLA deadline is legally important.
  • The consumer calls a database and a notification service.
  • If a message is invalid, quarantine it.
  • If downstream is unavailable, retry with a delay.

Design:

exchange: regulatory.case.commands
type: direct
routingKey: case.escalate
queue: case-escalation.commands.q
queueType: quorum
durable: true
messagePersistence: required
publisherConfirms: required
consumerAck: manual
prefetch: 10
deliveryLimit: 5
dlx: regulatory.platform.dlx
retry: exponential delayed retry
finalDlq: case-escalation.commands.dlq
idempotencyKey: commandId

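Flow:

publisher (with confirms) -> regulatory.case.commands -> case-escalation.commands.q -> consumer (manual ack after the database commit)
on failure: delayed retry; once the delivery limit of 5 is exceeded -> regulatory.platform.dlx -> case-escalation.commands.dlq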

Why quorum?

  • Command is business-critical.
  • Queue backlog is part of legal process state until consumed.
  • Node failure should not erase command.

Why manual ack?

  • Ack must happen after database transaction safe point.

Why idempotency?

  • Redelivery can happen after consumer crash.

Why DLQ?

  • Invalid command should not block queue forever.
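The idempotency requirement in this case study can be sketched as a guard keyed by commandId. This is a minimal sketch using an in-memory set as a stand-in; in a real consumer the processed-command record would have to be written in the same database transaction as the side effect, and the class name is hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Runs a side effect at most once per commandId (in-memory stand-in for a durable store). */
final class IdempotentCommandHandler {
    private final Set<String> processed = ConcurrentHashMap.newKeySet();

    /**
     * Returns true if the side effect ran, false if the command was a duplicate.
     * In both cases the caller can ack the message afterwards.
     */
    boolean handle(String commandId, Runnable sideEffect) {
        if (!processed.add(commandId)) {
            return false; // redelivered duplicate: skip the side effect
        }
        try {
            sideEffect.run();
            return true;
        } catch (RuntimeException e) {
            processed.remove(commandId); // failed attempt must remain retryable
            throw e;
        }
    }
}
```

The consumer acks only after handle returns, so a crash between the side effect and the ack leads to a redelivery that the guard turns into a no-op.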

19. Case Study: Non-Critical Notification Queue

Requirement:

  • Send notification after case created/escalated.
  • Missing one notification is bad but not legally state-changing.
  • User can see status in portal anyway.
  • High volume spike possible.
  • Old notification after 24h is useless.

Possible design:

exchange: regulatory.domain.events
queue: case-notification.domain-events.q
queueType: classic or quorum depending on business SLO
durable: true
messageTtl: 24h
maxLength: 500000
overflow: reject-publish or drop-head depending business decision
dlx: case-notification.domain-events.dlq
retry: delayed retry 5m/30m/2h

Decision discussion:

  • If notification is compliance-required, use quorum and strict DLQ.
  • If notification is convenience, classic may be acceptable.
  • If old messages are worthless, TTL is valid.
  • If the queue is full, drop-head may be acceptable only if the business explicitly accepts losing the oldest notifications.

Do not let an engineer make this choice alone. This is a product/compliance decision encoded as infrastructure.


20. Anti-Patterns

20.1 Durable Classic Queue Assumed as HA

Durable means survives broker restart on same node. It does not automatically mean replicated across nodes.

Better: choose quorum queue or stream if HA/replication is required.

20.2 Quorum Queue for Everything

Quorum queue adds replication cost. Using it for every temporary/non-critical queue can waste disk and reduce throughput.

Better: classify workload criticality.

20.3 Priority Queue as SLA Strategy

Priority does not create capacity. It only changes service order. If system is overloaded, lower priority may starve.

Better: separate capacity pools or enforce admission control.

20.4 Unlimited Queue as Safety

Unlimited queue hides downstream failure until disk/memory incident.

Better: bounded queue, backpressure, alert, and explicit degradation.

20.5 DLQ Without Owner

DLQ with no owner is delayed data loss.

Better: define ownership, alert, replay tool, retention, and audit process.

20.6 Retry Queue Without Attempt Budget

Infinite delayed retry creates unbounded repeated failure.

Better: retry budget, delivery count, final DLQ.

20.7 Version-Blind Queue Arguments

Copying x-queue-mode=lazy or mirrored classic settings from old examples can break on modern RabbitMQ.

Better: validate against exact production RabbitMQ version.


21. Operational Runbook per Queue

Every important queue should have a runbook.

Template:

# Queue Runbook: case-escalation.commands.q

## Owner
Case Platform Team

## Purpose
Durable command queue for legally significant case escalation.

## Queue Type
Quorum queue, 3 members.

## Expected Rates
- publish: 50 msg/s normal, 500 msg/s peak
- consume: 100 msg/s normal capacity

## SLO
- p95 processing latency < 30s
- max backlog age < 5m

## Alerts
- ready > 10,000 for 5m
- unacked > 1,000 for 5m
- redelivery rate > 5 msg/s
- DLQ rate > 0 for critical commands
- disk free < 30%

## Failure Actions
1. Check consumer deployment health.
2. Check downstream DB/notification dependencies.
3. Inspect sample message from queue/DLQ.
4. Pause producer if reject/nack storm occurs.
5. Scale consumers only if downstream can handle it.
6. Replay DLQ after fix with idempotency check.

## Do Not
- Purge queue without incident commander approval.
- Requeue DLQ blindly.
- Increase max length without disk capacity review.

22. Review Checklist

Queue Type

  • Queue type explicitly chosen.
  • Choice justified by reliability requirement.
  • Feature compatibility checked for RabbitMQ production version.
  • Mirrored classic queues avoided for new design.
  • Quorum queue used for critical replicated workloads.
  • Classic queue used intentionally for lower-risk/transient workloads.

Durability

  • Durable queue if message must survive restart.
  • Persistent messages if message must survive restart.
  • Durable exchange used.
  • Publisher confirms enabled for critical publishers.
  • Manual ack used for critical consumers.

Capacity

  • Expected publish/consume rate known.
  • Backlog budget known.
  • Queue length/byte limit considered.
  • Overflow behavior explicit.
  • Disk capacity model exists.
  • Message TTL is business-approved if used.

Failure Handling

  • DLX configured.
  • DLQ owner defined.
  • Retry budget defined.
  • Poison message handling defined.
  • Idempotency key defined.
  • Replay process exists.

Observability

  • Ready/unacked monitored.
  • Redelivery monitored.
  • Consumer count monitored.
  • Message age monitored.
  • Disk/memory alarm monitored.
  • Queue leader distribution monitored for quorum workloads.

23. Guided Exercises

Exercise 1: Queue Classification

Classify the following queues:

case-audit.domain-events.q
case-notification.domain-events.q
case-escalation.commands.q
report-preview-render.q
temporary-reply-abc123.q
enforcement-decision.commands.q

For each one, decide:

  • queue type,
  • durable or not,
  • persistent messages or not,
  • DLQ or not,
  • TTL or not,
  • max length/overflow,
  • the failure requirement that justifies the choice.

Exercise 2: Disk Full Simulation

Scenario:

The notification consumer is down for 6 hours.
The producer keeps publishing 2,000 msg/s.
Average message size is 4 KB.

Calculate:

  • total message backlog,
  • minimum total body bytes,
  • whether the disk budget is sufficient,
  • catch-up time if the consumer can handle 5,000 msg/s after recovery,
  • which alerts must fire before the disk alarm.

Exercise 3: Priority Decision

Scenario:

Case review has normal, urgent, and critical levels.
Critical must not wait behind normal.
Normal must still complete within 24 hours.

Compare:

  • a single priority queue,
  • three separate queues,
  • a weighted consumer,
  • admission control.

Choose a design and explain the trade-offs.


24. Summary

Queue type is a reliability and operability decision.

Classic queues fit many general, temporary, and non-critical workloads. Quorum queues fit critical replicated queues and replace legacy mirrored classic designs. Priority queues help with service ordering but can cause starvation. Lazy/disk-backed behavior is not a substitute for capacity planning. TTL and length limits are safety controls that must be tied to business meaning. Overflow behavior must be chosen deliberately: dropping the oldest message, dead-lettering, and rejecting the publish each have different consequences.

Key principles:

Queue type follows failure requirement.
Durability is multi-layer.
Replication has cost.
Backlog needs budget.
DLQ needs owner.
Retry needs limit.
Monitoring needs message age, not only queue depth.

In a regulatory/case-management system, a queue is not just a buffer. A queue can be the temporary holder of a legal process, an escalation command, an audit event, or a notification obligation. For that reason, every critical queue must have an owner, an SLO, a failure policy, a replay strategy, and a runbook.

The next part covers RabbitMQ consumer design in more depth: prefetch, ack, nack, requeue, backpressure, competing consumers, and slow-consumer failure modes.

