Series: Learn Java RabbitMQ, RabbitMQ Streams, Patterns, and Deployment In Action

Final Stretch, ordered learning track

Observability and Operations: Metrics, Logs, Traces, Alerts, Runbooks

Learn Java RabbitMQ, RabbitMQ Streams, Patterns, and Deployment In Action - Part 033

Production-grade observability for Java RabbitMQ systems: broker metrics, queue metrics, stream lag, producer/consumer telemetry, tracing, alerts, dashboards, and incident runbooks.

2026-07-02 · 20 min read · 3815 words


Lesson 33 of the 35-lesson track (lessons 30–35: Final Stretch)



At this point in the series, we already know how to design RabbitMQ producers, consumers, queues, exchanges, retries, streams, super streams, batching, and deployment topology. The remaining production question is different:

When the system is wrong, slow, overloaded, unsafe, or silently degrading, how do we know fast enough to protect users and data?

Observability is not a dashboard collection. It is the engineering discipline of making a distributed messaging system explain itself under failure.

For RabbitMQ, observability must cover four layers:

Application layer — Java producers, consumers, workers, thread pools, retries, outbox relays, idempotency stores.
Protocol layer — connection churn, channels, publisher confirms, consumer acknowledgements, prefetch, redelivery, unroutable messages.
Broker layer — node health, memory, disk, file descriptors, Erlang process pressure, alarms, queue leaders, stream replicas.
Business layer — order stuck, quote not generated, invoice delayed, regulatory case not escalated, SLA breach.

A senior engineer does not stop at messages_ready and messages_unacknowledged. Those are useful, but they are symptoms. Production observability asks:

Which business flow is stuck?
Which queue or stream partition is accumulating lag?
Which consumer group is slow?
Which producer is publishing faster than the system can safely absorb?
Which retry policy is hiding a poison message?
Which queue leader or stream replica is unhealthy?
Which release introduced confirm latency, redelivery, or DLQ growth?
Which incident response should be executed now?

This part builds that operating model.

1. Kaufman Framing: Learn Enough to Self-Correct in Production

Josh Kaufman's learning model emphasizes rapid feedback: break the skill down, learn enough to notice mistakes, remove friction, and practice deliberately. In RabbitMQ operations, the equivalent is:

Kaufman Principle	RabbitMQ Observability Translation
Deconstruct the skill	Separate producer, broker, queue, stream, consumer, storage, network, and business symptoms.
Learn enough to self-correct	Know which metrics disprove your hypothesis.
Remove practice barriers	Make dashboards, logs, traces, and runbooks available before incidents.
Practice deliberately	Run failure drills: consumer crash, broker restart, disk alarm, DLQ spike, duplicate storm.

The core skill is not memorizing metric names. The core skill is causal diagnosis.

You should be able to answer:

Is this a producer overload, broker resource issue, topology issue, consumer bottleneck, downstream dependency failure, schema failure, retry storm, or stream retention risk?

2. Observability Mental Model

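A RabbitMQ system is a chain of custody for messages: producer → exchange → queue → consumer → downstream database. Acknowledgements flow back from the consumer to the queue, and flow-control pressure flows back from the broker to the producer.

Each edge in that chain raises a different observability question.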

Edge	Question	Signal
Producer → Exchange	Are publishes accepted safely?	confirm latency, nack count, returned messages, publish error rate
Exchange → Queue	Are messages routed as expected?	unroutable returns, alternate exchange volume, binding drift
Queue → Consumer	Can consumers keep up?	ready, unacked, redelivery, delivery rate, consumer count, prefetch
Consumer → DB	Is downstream blocking processing?	handler latency, DB latency, retry count, transaction errors
Consumer → Queue	Are acks safe and timely?	ack latency, unacked age, redelivery rate, duplicate count
Broker → Producer	Is broker applying pressure?	flow control, blocked connection, memory alarm, disk alarm

The most useful dashboards follow the message lifecycle, not the organizational chart.

3. Golden Signals for Messaging Systems

For HTTP services, golden signals are often latency, traffic, errors, and saturation. For RabbitMQ systems, use a messaging-specific version:

Signal	Meaning	Examples
Ingress	How fast messages enter the system	publish rate, confirmed publish rate, returned message rate
Backlog	How much work is waiting	queue depth, stream lag, oldest message age
Processing	How fast work is completed	delivery rate, ack rate, handler success rate
Failure	How much work fails or repeats	nack rate, redelivery rate, DLQ rate, retry attempt count
Safety	Whether data can be trusted	confirm latency, ack-after-commit, offset commit lag, dedup hit rate
Saturation	Whether resources are near limits	memory watermark, disk free, file descriptors, connection/channel count
Freshness	Whether users/business are waiting too long	end-to-end age, SLA age, stream retention headroom

For production, oldest message age is often more meaningful than queue length. A queue with 100,000 tiny low-priority messages might be fine. A queue with 12 messages older than a regulatory SLA might be an incident.

4. Metric Taxonomy

4.1 Broker Node Metrics

Node metrics describe whether RabbitMQ can keep operating safely.

Track:

Node up/down.
Cluster membership.
Memory used.
Memory high watermark state.
Disk free.
Disk free alarm state.
File descriptor usage.
Socket descriptor usage.
Erlang process usage.
Connection count.
Channel count.
Queue count.
Stream count.
Network partitions.
Inter-node communication health.
GC/runtime pressure if exposed.

What they tell you:

Metric	Risk
Memory near watermark	Broker may block publishers.
Disk free below threshold	Broker may block publishers or become unsafe.
Connection churn	Bad client lifecycle, unstable network, load balancer issue.
Channel growth	Channel leak or per-message channel anti-pattern.
File descriptor pressure	Too many connections/sockets/queues or OS limit too low.
Queue leader concentration	Hot node, uneven workload, leader placement issue.

4.2 Queue Metrics

Queue metrics describe backlog and consumer progress.

Track per critical queue:

messages_ready.
messages_unacknowledged.
publish rate.
deliver/get rate.
ack rate.
redeliver rate.
consumer count.
consumer capacity / utilisation when available.
oldest message age.
queue length limit state.
DLX/dead-letter volume.
quorum queue leader and replica health.
delivery limit events for quorum queues.

Interpretation:

Pattern	Likely Meaning
Ready grows, unacked low	Not enough consumers, consumers stopped, routing spike, or prefetch too low.
Unacked grows, ready low	Consumers received work but are stuck or slow.
Ready and unacked both grow	Arrival rate exceeds total processing capacity.
Redelivery rate grows	Consumer crash, nack loop, poison message, timeout, or duplicate storm.
Consumer count drops	Deployment, connection issue, container crash, credential issue.
Ack rate below publish rate	Backlog will grow unless publish rate falls.
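The interpretation table can be encoded as a first-pass triage helper that narrows the hypothesis space before you open handler or dependency dashboards. This is a minimal sketch. `QueueTrend` is a hypothetical snapshot of how a queue changed over one observation window. The baseline redelivery rate and previous consumer count are assumed to come from your own metrics history. "Unacked low" is approximated as "unacked not growing".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical snapshot of one queue over an observation window (rates in msg/s). */
record QueueTrend(long readyDelta, long unackedDelta, double publishRate, double ackRate,
                  double redeliverRate, double baselineRedeliverRate,
                  int consumers, int previousConsumers) {}

final class QueuePatternTriage {
    /** Returns the "likely meaning" of every table row that matches the trend. */
    static List<String> likelyMeanings(QueueTrend t) {
        List<String> hints = new ArrayList<>();
        boolean readyGrows = t.readyDelta() > 0;
        boolean unackedGrows = t.unackedDelta() > 0;

        if (readyGrows && unackedGrows) {
            hints.add("arrival rate exceeds total processing capacity");
        } else if (readyGrows) {
            // Approximates "unacked low" as "unacked not growing".
            hints.add("too few consumers, consumers stopped, routing spike, or prefetch too low");
        } else if (unackedGrows) {
            hints.add("consumers received work but are stuck or slow");
        }
        if (t.redeliverRate() > t.baselineRedeliverRate()) {
            hints.add("crash, nack loop, poison message, timeout, or duplicate storm");
        }
        if (t.consumers() < t.previousConsumers()) {
            hints.add("deployment, connection issue, container crash, or credential issue");
        }
        if (t.ackRate() < t.publishRate()) {
            hints.add("backlog will grow unless publish rate falls");
        }
        return hints;
    }
}
```

The helper returns every matching row rather than a single answer. Real incidents often match more than one pattern at the same time.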

4.3 Stream Metrics

Stream systems are not consumed destructively. Queue depth alone is the wrong mental model.

Track:

Append rate.
Consumer read rate.
Consumer group lag.
Offset commit rate.
Oldest retained segment timestamp.
Retention headroom.
Segment/chunk storage usage.
Stream leader and replica health.
Super stream partition distribution.
Hot partition publish/read rate.
Consumer restart count.
Replay read rate.

Critical question:

Will a consumer lose the ability to replay before it catches up?

For streams, lag + retention headroom matters more than backlog size.

4.4 Producer Metrics

A Java producer must expose:

publish attempts.
publish success.
publish failures.
publisher confirms received.
publisher nacks.
confirm latency histogram.
in-flight confirms.
returned messages.
publish retries.
outbox relay lag.
local producer buffer size.
blocked connection duration.
topology declaration failures.
serialization errors.

Producer health is bad when:

confirm latency rises.
in-flight confirms saturate.
returned messages appear unexpectedly.
outbox lag grows.
blocked connection callback fires.
publish retry rate grows.
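Confirm latency and in-flight confirms can be measured on the client by remembering when each publish sequence number was sent. The sketch below uses only the JDK and assumes it is wired to the RabbitMQ Java client as shown in the comments. `ConfirmTracker` and its methods are hypothetical names. It handles the `multiple` flag, because a single broker confirm may settle every outstanding sequence number up to and including the one reported.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.LongConsumer;

// Assumed wiring with the RabbitMQ Java client 5.x (not executed here):
//   channel.confirmSelect();
//   channel.addConfirmListener(
//       (tag, multiple) -> tracker.onConfirm(tag, multiple, true, System.nanoTime()),
//       (tag, multiple) -> tracker.onConfirm(tag, multiple, false, System.nanoTime()));
//   tracker.onPublish(channel.getNextPublishSeqNo(), System.nanoTime());
//   channel.basicPublish(exchange, routingKey, props, body);
final class ConfirmTracker {
    // Sequence number -> nanoTime at publish. A sorted map makes "multiple" confirms a range.
    private final ConcurrentNavigableMap<Long, Long> outstanding = new ConcurrentSkipListMap<>();
    private final LongAdder nacked = new LongAdder();
    private final LongConsumer latencySink; // e.g. records into a confirm-latency histogram

    ConfirmTracker(LongConsumer latencySink) {
        this.latencySink = latencySink;
    }

    void onPublish(long seqNo, long nowNanos) {
        outstanding.put(seqNo, nowNanos);
    }

    /** Call from both the ack and the nack callback. */
    void onConfirm(long seqNo, boolean multiple, boolean ack, long nowNanos) {
        int settledCount = 0;
        if (multiple) {
            // Every outstanding sequence number <= seqNo is settled by this confirm.
            ConcurrentNavigableMap<Long, Long> settled = outstanding.headMap(seqNo, true);
            for (Long publishedAt : settled.values()) {
                latencySink.accept(nowNanos - publishedAt);
                settledCount++;
            }
            settled.clear();
        } else {
            Long publishedAt = outstanding.remove(seqNo);
            if (publishedAt != null) {
                latencySink.accept(nowNanos - publishedAt);
                settledCount = 1;
            }
        }
        if (!ack) {
            nacked.add(settledCount); // nacked messages must be republished or failed loudly
        }
    }

    /** Gauge candidate: publishes still waiting for a confirm. */
    int inFlight() {
        return outstanding.size();
    }

    long nackedCount() {
        return nacked.sum();
    }
}
```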

4.5 Consumer Metrics

A Java consumer must expose:

deliveries received.
handler success count.
handler failure count.
ack count.
nack/reject count.
redelivery count.
handler latency histogram.
ack latency.
consumer active/inactive gauge.
executor queue depth.
executor active threads.
downstream DB/API latency.
idempotency dedup hits.
poison message count.
DLQ publish count.
graceful shutdown drain time.

Consumer health is bad when:

handler p95/p99 latency increases.
unacked messages age.
executor queue grows.
redelivery rises.
ack rate falls behind delivery rate.
dedup hits spike after a release.

5. Minimal Metric Naming Model for Java Services

Use stable names. Avoid metric labels with unbounded cardinality such as messageId, userId, orderId, or raw routing key containing tenant-specific high-cardinality values.

Good labels:

service
environment
exchange
queue
stream
consumer_group
message_type
result
exception_class
retryable
operation

Dangerous labels:

message_id
correlation_id
tenant_id when tenant count is large
full exception message
dynamic routing key with identifiers
payload field values
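When a routing key is useful as a label but may embed identifiers, normalize it before using it as a tag. The sketch below is hedged. The rule treats purely numeric, UUID-shaped, or ULID-shaped dot-separated segments as identifiers and replaces them with `{id}`. That rule is an illustrative assumption, not a RabbitMQ convention, and should be tuned to your own routing-key scheme. `MetricLabels` is a hypothetical helper.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class MetricLabels {
    // Assumption: numeric, UUID-shaped, or ULID-shaped (Crockford base32, 26 chars) segments are ids.
    private static final Pattern IDENTIFIER = Pattern.compile(
        "\\d+"
        + "|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
        + "|[0-9A-HJKMNP-TV-Z]{26}");

    /** Collapses identifier-like segments so the label has bounded cardinality. */
    static String normalizeRoutingKey(String routingKey) {
        return Arrays.stream(routingKey.split("\\.", -1))
            .map(segment -> IDENTIFIER.matcher(segment).matches() ? "{id}" : segment)
            .collect(Collectors.joining("."));
    }
}
```

With this rule, `order.12345.created` becomes `order.{id}.created`, while a versioned key such as `quote.generate.requested.v1` is left unchanged.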

Example Micrometer-style metric set:

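import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tags;
import io.micrometer.core.instrument.Timer;

import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

public final class RabbitConsumerMetrics {
    private final Counter deliveries;
    private final Counter successes;
    private final Counter failures;
    private final Counter acks;
    private final Counter nacks;
    private final Counter redeliveries;
    private final Timer handlerTimer;
    // Micrometer holds gauge state weakly; this field keeps the AtomicInteger alive.
    private final AtomicInteger executorQueueDepth;

    public RabbitConsumerMetrics(MeterRegistry registry, String queue, String messageType) {
        // Bounded-cardinality tags only: never messageId, orderId, or per-tenant values.
        Tags tags = Tags.of(
            "queue", queue,
            "message_type", messageType
        );

        this.deliveries = Counter.builder("rabbit.consumer.deliveries")
            .tags(tags)
            .register(registry);

        // One meter name, split by the "result" tag (both variants use identical tag keys).
        this.successes = Counter.builder("rabbit.consumer.handler.completed")
            .tags(tags.and("result", "success"))
            .register(registry);

        this.failures = Counter.builder("rabbit.consumer.handler.completed")
            .tags(tags.and("result", "failure"))
            .register(registry);

        this.acks = Counter.builder("rabbit.consumer.acks")
            .tags(tags)
            .register(registry);

        this.nacks = Counter.builder("rabbit.consumer.nacks")
            .tags(tags)
            .register(registry);

        this.redeliveries = Counter.builder("rabbit.consumer.redeliveries")
            .tags(tags)
            .register(registry);

        this.handlerTimer = Timer.builder("rabbit.consumer.handler.duration")
            .publishPercentileHistogram()
            .tags(tags)
            .register(registry);

        this.executorQueueDepth = registry.gauge(
            "rabbit.consumer.executor.queue.depth",
            tags,
            new AtomicInteger(0)
        );
    }

    /** Call once per delivery with the envelope's redelivered flag. */
    public void onDelivery(boolean redelivered) {
        deliveries.increment();
        if (redelivered) {
            redeliveries.increment();
        }
    }

    public void onAck() {
        acks.increment();
    }

    public void onNack() {
        nacks.increment();
    }

    /** Times the handler and counts its outcome; the handler's exception is rethrown. */
    public <T> T timeHandler(Callable<T> handler) throws Exception {
        try {
            T result = handlerTimer.recordCallable(handler);
            successes.increment();
            return result;
        } catch (Exception e) {
            failures.increment();
            throw e;
        }
    }

    /** Update from the consumer's executor, e.g. the size of its work queue. */
    public void setExecutorQueueDepth(int depth) {
        executorQueueDepth.set(depth);
    }
}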

The metric names are intentionally domain-specific. Broker metrics tell you broker state. Application metrics tell you why the broker state is changing.

6. Logs: Event-Level Forensics Without Payload Leakage

Logs are not metrics. Metrics are aggregate signals. Logs are forensic records.

A production RabbitMQ Java service should log message lifecycle transitions at key boundaries:

message received
idempotency decision
business validation failure
downstream call failure
retry classification
DLQ/parking-lot decision
ack/nack/reject decision
publish confirmed/nacked/returned
stream offset committed
graceful shutdown start/end

Do not log full payloads by default. Payloads often contain PII, secrets, commercial data, or regulatory-sensitive content.

Use structured logs:

{
  "event": "rabbit_message_processed",
  "service": "quote-worker",
  "queue": "quote.command.generate.v1.qq",
  "messageType": "quote.generate.requested.v1",
  "messageId": "01JZ...",
  "correlationId": "case-88301",
  "causationId": "cmd-9912",
  "deliveryTag": 88112,
  "redelivered": false,
  "attempt": 1,
  "durationMs": 184,
  "result": "success"
}

A useful log line should answer:

What message type?
Which correlation chain?
Which queue/stream?
Which attempt?
Which decision?
Which safe identifier can be used to reconstruct the incident?

A dangerous log line leaks:

full payload
credentials
PII
tokens
internal certificate material
unbounded headers
raw exception stack with secrets in URLs
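One way to make payload leakage structurally hard is to build log events from an allowlist of field names. Anything else is dropped before it reaches the logger. The sketch below uses only the JDK, and its field list mirrors the JSON example above. `SafeLogEvent` is a hypothetical class. The hand-rolled JSON rendering is for illustration only; a real service would use its logging library's structured-argument support.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class SafeLogEvent {
    // Only these keys may appear in a message-lifecycle log line.
    private static final Set<String> ALLOWED = Set.of(
        "event", "service", "queue", "messageType", "messageId", "correlationId",
        "causationId", "deliveryTag", "redelivered", "attempt", "durationMs", "result");

    private final Map<String, Object> fields = new LinkedHashMap<>();

    SafeLogEvent put(String key, Object value) {
        if (ALLOWED.contains(key)) { // payload, headers, tokens, etc. are silently dropped
            fields.put(key, value);
        }
        return this;
    }

    /** Renders a flat JSON object; string escaping is minimal in this sketch. */
    String toJson() {
        return fields.entrySet().stream()
            .map(e -> "\"" + e.getKey() + "\":" + render(e.getValue()))
            .collect(Collectors.joining(",", "{", "}"));
    }

    private static String render(Object value) {
        if (value instanceof Number || value instanceof Boolean) {
            return value.toString();
        }
        String s = String.valueOf(value).replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + s + "\"";
    }
}
```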

7. Tracing: Causality Across Producer, Broker, Consumer, and Side Effects

RabbitMQ is asynchronous. Without trace propagation, causality disappears.

Trace propagation needs two concepts:

Technical trace context — W3C traceparent, tracestate, OpenTelemetry baggage if used.
Business correlation context — correlationId, causationId, messageId, workflow id, aggregate id.

Do not confuse them.

Field	Purpose
`traceparent`	Distributed tracing propagation.
`messageId`	Unique message identity.
`correlationId`	End-to-end business/request correlation.
`causationId`	What caused this message.
`aggregateId`	Business entity ordering/partitioning key.
`workflowId`	Long-running orchestration correlation.
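The table implies a propagation rule for a consumer that publishes a follow-up message:

- keep `correlationId` and `workflowId`;
- set `causationId` to the consumed message's `messageId`;
- mint a fresh `messageId`.

The sketch below uses plain maps as a stand-in for AMQP headers. The header names follow the table, and `Lineage` is a hypothetical helper. Trace context is intentionally left out. The assumption is that a tracing library (for example, an OpenTelemetry propagator) injects the current consumer span's context at publish time, rather than the incoming `traceparent` being copied by hand.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

final class Lineage {
    /** Business-correlation headers for a message published while handling {@code consumed}. */
    static Map<String, String> childHeaders(Map<String, String> consumed) {
        Map<String, String> child = new HashMap<>();
        child.put("messageId", UUID.randomUUID().toString());      // new identity
        child.put("correlationId", consumed.get("correlationId")); // same business chain
        child.put("causationId", consumed.get("messageId"));       // what caused this message
        if (consumed.containsKey("workflowId")) {
            child.put("workflowId", consumed.get("workflowId"));   // same orchestration
        }
        // traceparent/tracestate are deliberately not copied: the tracing library
        // injects the context of the current (consumer) span when publishing.
        return child;
    }
}
```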

7.1 Trace Shape

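A typical trace for one message has a producer publish span, a consumer processing span, and child spans for downstream side effects such as database writes, API calls, and follow-up publishes. Each of these spans should share the same trace context or link to the span of the consumed message. For asynchronous boundaries, span links are often more accurate than parent-child semantics, because the consumer may run long after the producer span has ended.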

8. Dashboards: What to Build First

8.1 Executive Flow Dashboard

Purpose: answer, “Are customer/business flows healthy?”

Panels:

End-to-end business latency per flow.
Number of messages older than SLA.
DLQ count by flow.
Retry rate by flow.
Outbox lag by producer service.
Consumer lag by critical queue/stream.
Error budget burn.

This dashboard is for engineering leads, incident commanders, and product stakeholders.

8.2 Broker Health Dashboard

Purpose: answer, “Can RabbitMQ operate safely?”

Panels:

Node up/down.
Memory used vs watermark.
Disk free vs threshold.
File descriptors.
Connection count and churn.
Channel count.
Queue count.
Stream count.
Network partitions.
Cluster alarms.
Queue leader distribution.

8.3 Queue Health Dashboard

Purpose: answer, “Which queues are building risk?”

Panels:

Ready messages per queue.
Unacked messages per queue.
Oldest message age.
Publish/deliver/ack rates.
Redelivery rate.
Consumer count.
DLQ rate.
Queue length limit events.
Quorum queue replica status.

8.4 Stream Health Dashboard

Purpose: answer, “Can consumers catch up before retention removes data?”

Panels:

Append rate.
Read rate.
Consumer lag.
Offset commit delay.
Retention headroom.
Partition skew.
Hot partitions.
Stream replica health.
Replay jobs running.

8.5 Java Service Dashboard

Purpose: answer, “Is the application causing or absorbing the problem?”

Panels:

Publish attempts/success/failure.
Confirm latency.
In-flight confirms.
Returned messages.
Handler duration p50/p95/p99.
Handler failures.
Ack/nack/reject rate.
Executor queue depth.
Thread pool saturation.
JVM heap, GC pause, allocation rate.
DB/API dependency latency.

9. Alert Design: Symptoms First, Causes Second

Alert fatigue comes from alerting on every metric. A good alert is actionable and tied to user/data risk.

9.1 Page-Worthy Alerts

Page someone when:

critical flow message age breaches SLA;
critical DLQ grows above threshold;
broker disk alarm blocks publishing;
broker memory alarm persists;
critical consumer count becomes zero;
quorum queue loses safe replication margin;
stream consumer lag approaches retention risk;
outbox relay lag exceeds durability/SLA threshold;
publisher confirm latency crosses safety threshold;
duplicate/redelivery storm threatens downstream correctness.

9.2 Ticket-Worthy Alerts

Create a ticket when:

connection churn is higher than baseline;
channel count trends upward;
queue depth grows but age is still safe;
non-critical DLQ has low-volume errors;
retry rate increased after release;
partition skew grows but is below incident threshold;
dashboard has missing metrics.

9.3 Bad Alerts

Avoid alerts like:

any queue depth greater than 0;
any redelivery greater than 0;
any connection closed;
any consumer restart;
CPU greater than 70% for 1 minute;
heap usage greater than 80% without GC/latency impact.

Messaging systems are bursty. Alert on sustained risk, not normal dynamics.
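Prometheus expresses "sustained" with the `for:` clause: an alert fires only after its expression has been true continuously for that duration. The same idea helps in application-side health checks. Below is a minimal, single-threaded sketch; `SustainedCondition` is a hypothetical name. It ignores short bursts and resets whenever the condition clears.

```java
import java.time.Duration;
import java.time.Instant;

final class SustainedCondition {
    private final Duration holdFor;
    private Instant trueSince; // null while the condition is false

    SustainedCondition(Duration holdFor) {
        this.holdFor = holdFor;
    }

    /** Feed one evaluation; returns true only once the condition has held for the full window. */
    boolean evaluate(boolean conditionNow, Instant now) {
        if (!conditionNow) {
            trueSince = null; // any recovery resets the window, as when a burst ends
            return false;
        }
        if (trueSince == null) {
            trueSince = now;
        }
        return Duration.between(trueSince, now).compareTo(holdFor) >= 0;
    }
}
```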

10. Alert Rule Examples

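Use these as conceptual rules, not copy-paste defaults. Names prefixed `rabbitmq_` follow the naming of RabbitMQ's Prometheus plugin; verify each one against the metrics your broker version actually exposes. Per-queue labels such as `queue` generally require per-object (or detailed) metrics to be enabled. Metrics such as `rabbit_queue_oldest_message_age_seconds` and `stream_retention_headroom_seconds` are not built in; they assume your application or a custom exporter publishes them.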

10.1 Critical Consumer Down

alert: RabbitCriticalConsumerMissing
expr: rabbitmq_queue_consumers{queue=~"quote\\.command\\..*"} == 0
for: 2m
labels:
  severity: page
annotations:
  summary: "Critical RabbitMQ queue has no consumers"
  runbook: "rabbitmq-runbook-consumer-missing"

10.2 Queue SLA Age Breach

alert: RabbitQueueOldestMessageAgeSlaBreach
expr: rabbit_queue_oldest_message_age_seconds{critical="true"} > 300
for: 5m
labels:
  severity: page
annotations:
  summary: "Oldest message age exceeds SLA"
  runbook: "rabbitmq-runbook-queue-age"

10.3 DLQ Spike

alert: RabbitDlqSpike
expr: increase(rabbitmq_queue_messages_published_total{queue=~".*\\.dlq"}[10m]) > 100
for: 5m
labels:
  severity: page
annotations:
  summary: "DLQ volume spiked"
  runbook: "rabbitmq-runbook-dlq-spike"

10.4 Stream Retention Risk

alert: RabbitStreamConsumerRetentionRisk
expr: stream_retention_headroom_seconds{critical="true"} < 3600
for: 10m
labels:
  severity: page
annotations:
  summary: "Stream consumer is close to falling behind retention"
  runbook: "rabbitmq-runbook-stream-retention-risk"

11. Runbook: Queue Growth

Symptom

messages_ready grows.
Oldest message age grows.
Ack rate is lower than publish rate.

Immediate Triage

Identify affected queue and business flow.
Check consumer count.
Compare publish rate vs ack rate.
Check handler p95/p99 latency.
Check downstream dependency latency.
Check broker memory/disk alarms.
Check recent deployments.
Check DLQ/retry rate.

Decision Tree
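The triage steps above can be condensed into an ordered decision function. This is a sketch under stated assumptions:

- `GrowthSignals` is a hypothetical snapshot;
- the "2x baseline" latency thresholds are illustrative;
- the order (broker safety, then consumers, then failures after a release, then latency, then capacity) is one reasonable reading of the triage list, not a fixed rule.

```java
/** Hypothetical snapshot of the triage inputs (rates in msg/s, latencies in ms). */
record GrowthSignals(boolean brokerAlarm, int consumers, double publishRate, double ackRate,
                     double handlerP99Ms, double baselineHandlerP99Ms,
                     double dependencyP99Ms, double baselineDependencyP99Ms,
                     boolean recentDeploy, double dlqRate) {}

final class QueueGrowthTriage {
    static String nextStep(GrowthSignals s) {
        if (s.brokerAlarm()) {
            return "broker alarm: protect the broker first, throttle producers";
        }
        if (s.consumers() == 0) {
            return "no consumers: restore them (deployment, credentials, connectivity)";
        }
        if (s.recentDeploy() && s.dlqRate() > 0) {
            return "failures after deployment: consider rolling back";
        }
        boolean handlerSlow = s.handlerP99Ms() > 2 * s.baselineHandlerP99Ms();
        boolean dependencySlow = s.dependencyP99Ms() > 2 * s.baselineDependencyP99Ms();
        if (handlerSlow && dependencySlow) {
            return "downstream dependency slow: do not add consumers, reduce load";
        }
        if (handlerSlow) {
            return "handler regression: profile the consumer and check the last release";
        }
        if (s.ackRate() < s.publishRate()) {
            return "capacity shortfall: scale consumers if ordering allows, or throttle producers";
        }
        return "backlog draining: keep watching oldest message age";
    }
}
```

The dependency check deliberately comes before the capacity check. When the database is the bottleneck, adding consumers usually increases pressure on it instead of draining the queue.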

Safe Actions

Scale consumers horizontally if handler is CPU-bound and queue ordering does not prohibit it.
Increase prefetch only if consumers have idle capacity.
Reduce producer rate if backlog threatens SLA or broker safety.
Shed non-critical workload.
Move poison messages to parking lot.
Roll back recent release if handler failures started after deployment.

Unsafe Actions

Blindly increasing prefetch.
Blindly increasing worker threads when DB is saturated.
Purging queues without business approval.
Replaying DLQ without idempotency.
Restarting all broker nodes at once.

12. Runbook: DLQ Spike

Symptom

DLQ publish rate increases.
Retry queue depth grows.
Parking lot receives messages.

Immediate Triage

Group DLQ messages by messageType, exception class, producer, schema version.
Determine if failures are deterministic or transient.
Check if all failures are from a new deployment.
Check schema/contract change.
Check downstream dependency outage.
Check retry attempt count distribution.
Estimate business impact.
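Grouping is the fastest way to tell one deterministic bug apart from many transient failures. The sketch assumes dead-lettered messages have already been read into a hypothetical `DeadLetter` record. The fields might come from RabbitMQ's `x-death` header plus your own headers for exception class, producer, and schema version.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical triage view of one dead-lettered message; the record itself is the group key. */
record DeadLetter(String messageType, String exceptionClass, String producer, String schemaVersion) {}

final class DlqTriage {
    /** Counts DLQ messages per (messageType, exceptionClass, producer, schemaVersion), largest first. */
    static List<Map.Entry<DeadLetter, Long>> topGroups(List<DeadLetter> letters) {
        Map<DeadLetter, Long> counts = letters.stream()
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        return counts.entrySet().stream()
            .sorted(Map.Entry.<DeadLetter, Long>comparingByValue(Comparator.reverseOrder()))
            .collect(Collectors.toList());
    }
}
```

A single dominant group with one exception class and one schema version usually points to a deterministic failure. A spread across many groups is more consistent with a transient dependency problem.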

Classification

Failure Type	Action
Transient DB/API timeout	Retry with backoff if budget remains.
Schema incompatibility	Stop replay, fix producer/consumer contract.
Business validation failure	Park and route to manual/business workflow.
Authorization failure	Fix credentials/permissions; do not blindly replay.
Poison message bug	Patch consumer, then replay idempotently.
Duplicate side effect	Stop consumer; inspect idempotency ledger.

Replay Rule

Never replay a DLQ just because “the service is fixed” unless you can answer:

Is the handler idempotent?
Are external side effects protected?
Will replay preserve ordering assumptions?
Is the message still semantically valid?
Is the target version compatible with the message schema?
Is the retry budget reset intentionally?
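Replay tooling can enforce those questions as a hard gate instead of leaving them to memory. This is a minimal sketch; `ReplayRequest` and `ReplayGate` are hypothetical names. Each boolean stands for evidence that an operator or an automated check must supply.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical answers to the replay questions, supplied by checks or an operator. */
record ReplayRequest(boolean handlerIdempotent, boolean sideEffectsProtected,
                     boolean orderingPreserved, boolean messagesStillValid,
                     boolean schemaCompatible, boolean retryBudgetResetIntentional) {}

final class ReplayGate {
    /** Returns the unanswered questions; replay may proceed only when the list is empty. */
    static List<String> blockers(ReplayRequest r) {
        List<String> missing = new ArrayList<>();
        if (!r.handlerIdempotent()) missing.add("handler is not proven idempotent");
        if (!r.sideEffectsProtected()) missing.add("external side effects are not protected");
        if (!r.orderingPreserved()) missing.add("replay may break ordering assumptions");
        if (!r.messagesStillValid()) missing.add("messages may no longer be semantically valid");
        if (!r.schemaCompatible()) missing.add("target version may not accept the message schema");
        if (!r.retryBudgetResetIntentional()) missing.add("retry budget reset is not intentional");
        return missing;
    }
}
```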

13. Runbook: Publisher Confirm Latency Spike

Symptom

Producer confirm p95/p99 rises.
In-flight confirm count saturates.
Outbox relay lag grows.
Application publish latency increases.

Immediate Triage

Check broker disk and memory alarms.
Check queue type: quorum/stream/classic.
Check replication health.
Check disk I/O saturation.
Check network latency producer → broker.
Check message size increase.
Check publish fanout increase.
Check broker leader node concentration.

Likely Causes

Cause	Evidence
Disk saturation	disk latency high, confirms slow, broker safe but slow
Quorum replication lag	leader/follower health degraded
Large payload release	message size histogram shifted
Fanout expansion	one publish routes to many queues
Broker flow control	connection blocked events
Network issue	connection churn, heartbeat timeouts

Safe Actions

Throttle producers.
Reduce non-critical publish volume.
Scale broker resources only if capacity model supports it.
Move large payloads to object storage with reference messages.
Rebalance leaders if supported by operational process.
Roll back topology/payload changes.
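A simple way to throttle a producer is to cap in-flight (unconfirmed) publishes. Once the cap is reached, new publishes wait or fail fast, so a slow broker pushes back on the application instead of growing client-side buffers. The sketch below uses `java.util.concurrent.Semaphore`. The cap and the method names are assumptions. In a real producer, `release` would be called from the confirm callbacks once per settled message, including every message settled by a `multiple` confirm.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

final class InFlightLimiter {
    private final Semaphore permits;

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Call before publishing; false means back off, shed load, or fail the request. */
    boolean tryAcquire(long timeout, TimeUnit unit) {
        try {
            return permits.tryAcquire(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status for the caller
            return false;
        }
    }

    /** Call when confirms (ack or nack) settle {@code count} messages. */
    void release(int count) {
        permits.release(count);
    }

    int available() {
        return permits.availablePermits();
    }
}
```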

14. Runbook: Redelivery Storm

Symptom

Redelivery rate spikes.
Same messageId appears repeatedly.
Consumer CPU high but business progress low.
DLQ may or may not grow.

Root Causes

Consumer crashes before ack.
Handler throws deterministic exception and requeues.
Nack/requeue loop.
Process timeout kills worker.
Downstream service fails and all messages retry immediately.
Ack is never called because control flow exits early.

Safe Containment

Stop requeue loop by switching deterministic failures to DLQ/parking lot.
Reduce consumer concurrency if downstream is overloaded.
Disable or throttle affected consumer group.
Patch retry classifier.
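Where appropriate, set a delivery limit on quorum queues (the `x-delivery-limit` queue argument). A message that keeps being redelivered is then dead-lettered, or dropped if no dead-letter exchange is configured, instead of looping forever.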
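Containment comes down to one rule in the consumer: deterministic failures must leave the queue instead of being requeued. The sketch below is a classifier under stated assumptions. The exception-to-category mapping and `maxAttempts` are assumptions, and `FailureClassifier` and `AckDecision` are hypothetical names. The comments show the RabbitMQ Java client call each decision would typically map to.

```java
import java.net.ConnectException;
import java.util.Set;
import java.util.concurrent.TimeoutException;

enum AckDecision {
    ACK,         // channel.basicAck(tag, false)
    RETRY_LATER, // republish to a delayed retry queue, then ack the original
    DEAD_LETTER  // channel.basicNack(tag, false, false): routed to the DLX if one is configured
}

final class FailureClassifier {
    // Assumption: these exception types indicate a transient downstream problem.
    private static final Set<Class<? extends Exception>> TRANSIENT =
        Set.of(TimeoutException.class, ConnectException.class);

    static AckDecision decide(Exception failure, int attempt, int maxAttempts) {
        if (failure == null) {
            return AckDecision.ACK;
        }
        boolean transientFailure = TRANSIENT.stream().anyMatch(t -> t.isInstance(failure));
        if (transientFailure && attempt < maxAttempts) {
            return AckDecision.RETRY_LATER;
        }
        // Deterministic failures and exhausted retries never return to the same queue:
        // requeueing them is what turns one bad message into a redelivery storm.
        return AckDecision.DEAD_LETTER;
    }
}
```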

Critical Invariant

A redelivery storm is not a throughput problem. It is a progress problem.

Increasing consumers can make it worse.

15. Runbook: Stream Consumer Lag Near Retention

Symptom

Consumer offset is far behind stream tail.
Retention headroom is shrinking.
Replay consumer cannot catch up.

Immediate Triage

Identify stream and consumer group.
Determine lag in messages and time.
Determine retention policy.
Estimate catch-up rate.
Compare catch-up rate with append rate.
Check partition skew.
Check consumer failures.
Check offset commit behavior.

Formula

net_catchup_rate = consumer_read_rate - append_rate
catchup_time = lag_messages / net_catchup_rate

If net_catchup_rate <= 0, the consumer will never catch up without intervention.
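The formula translates directly into code. This is a minimal sketch with rates in messages per second; `StreamCatchUp` is a hypothetical class. The result is empty when `net_catchup_rate <= 0`, which is the "never catches up" case.

```java
import java.util.OptionalDouble;

final class StreamCatchUp {
    /** Seconds needed to close the lag, or empty if the consumer cannot catch up. */
    static OptionalDouble catchUpSeconds(long lagMessages, double readRate, double appendRate) {
        double netCatchUpRate = readRate - appendRate; // net_catchup_rate
        if (netCatchUpRate <= 0) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(lagMessages / netCatchUpRate); // catchup_time
    }
}
```

Consider a lag of 1,000,000 messages, a read rate of 5,000 msg/s, and an append rate of 3,000 msg/s. The net catch-up rate is 2,000 msg/s, so catching up takes 500 seconds. At equal read and append rates the method returns empty. Compare the result with retention headroom before deciding which safe action to take.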

Safe Actions

Add consumers/partitions if architecture supports it.
Temporarily increase retention if storage allows.
Reduce append rate from non-critical producers.
Optimize consumer batch size.
Skip/rebuild non-critical derived projections from snapshot.
Start a new consumer group from a newer offset only with explicit business approval.

16. Operational Maturity Levels

Level 1 — Reactive

Management UI checked manually.
Logs only after incident.
Queue depth alerts only.
No runbooks.
DLQ replay manual and risky.

Level 2 — Basic Production

Prometheus scraping broker metrics.
Grafana dashboards for nodes and queues.
Basic DLQ and consumer alerts.
Application metrics for handler success/failure.
Some runbooks exist.

Level 3 — Reliable Production

Business SLA dashboards.
Producer confirm metrics.
Consumer ack/redelivery metrics.
Stream lag and retention headroom.
Idempotency and duplicate metrics.
Structured logs with correlation.
Tested runbooks.

Level 4 — Top-Tier Operational System

Automated symptom-to-runbook routing.
Chaos drills.
Release correlation on dashboards.
Contract drift detection.
Topology drift detection.
Business impact estimation.
Safe replay tooling.
Post-incident learning loop.

17. Safe Replay Tooling Requirements

A production RabbitMQ platform eventually needs controlled replay tooling.

Minimum features:

Select by queue, DLQ, stream, time range, message type, schema version, correlation id.
Preview before replay.
Redact sensitive payload fields.
Validate schema compatibility.
Enforce idempotency requirement.
Limit replay rate.
Preserve original metadata where appropriate.
Add replay metadata: replay id, operator, reason, timestamp.
Write audit record.
Stop replay when failure rate exceeds threshold.

Replay is not just an operational action. It is a data mutation workflow.

For regulated systems, replay must be reviewable and explainable.
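Two of the features listed above, limiting replay rate and stopping when the failure rate exceeds a threshold, are small enough to sketch. `ReplayGuard` below is hypothetical and uses only the JDK. The minimum sample size before the failure ratio is trusted is an assumption. A real tool would also write the audit record and replay metadata.

```java
final class ReplayGuard {
    private final long intervalNanos;     // minimum spacing between replayed messages
    private final double maxFailureRatio;
    private final int minSample;          // assumption: ignore the ratio until this many outcomes
    private long nextAllowedNanos;
    private int attempted;
    private int failed;
    private boolean stopped;

    ReplayGuard(double maxPerSecond, double maxFailureRatio, int minSample, long startNanos) {
        if (maxPerSecond <= 0) {
            throw new IllegalArgumentException("maxPerSecond must be > 0");
        }
        this.intervalNanos = (long) (1_000_000_000L / maxPerSecond);
        this.maxFailureRatio = maxFailureRatio;
        this.minSample = minSample;
        this.nextAllowedNanos = startNanos;
    }

    /** True if the next message may be replayed at {@code nowNanos}; advances the schedule if so. */
    boolean tryStart(long nowNanos) {
        if (stopped || nowNanos < nextAllowedNanos) {
            return false;
        }
        nextAllowedNanos = nowNanos + intervalNanos;
        return true;
    }

    /** Records one replayed message's outcome and trips the stop once the ratio is exceeded. */
    void recordOutcome(boolean success) {
        attempted++;
        if (!success) {
            failed++;
        }
        if (attempted >= minSample && (double) failed / attempted > maxFailureRatio) {
            stopped = true; // stays stopped; an operator must review before resuming
        }
    }

    boolean stopped() {
        return stopped;
    }
}
```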

18. Incident Postmortem Template

Use this structure after RabbitMQ incidents.

# Incident: <title>

## Summary
What happened in business terms?

## Timeline
- Detection time
- First alert
- First human acknowledgement
- Mitigation start
- Mitigation complete
- Full recovery

## Affected Flows
- Message types
- Queues/streams
- Producers
- Consumers
- Business entities

## Technical Symptoms
- Queue depth
- Oldest message age
- DLQ count
- Redelivery rate
- Confirm latency
- Broker alarms
- Consumer errors

## Root Cause
What actually caused the system to stop making safe progress?

## Trigger
What changed recently?

## What Worked
Which signals/runbooks/actions helped?

## What Failed
Which signals were missing or misleading?

## Data Safety Assessment
- Message loss?
- Duplicate side effects?
- Replayed messages?
- Manual intervention?

## Preventive Actions
- Code
- Topology
- Metrics
- Alerts
- Runbooks
- Tests
- Capacity

The most important section is Data Safety Assessment. A messaging outage is not only about downtime; it is about whether work was lost, duplicated, delayed, or corrupted.

19. Practice Drill

Build a small Java RabbitMQ lab with:

one producer using publisher confirms;
one command queue;
one retry queue;
one DLQ;
one consumer with manual ack;
one stream with offset tracking;
Micrometer metrics;
structured logs;
dashboard panels.

Then run these drills:

Stop consumer for 10 minutes.
Break DB dependency.
Publish invalid schema messages.
Force consumer crash after DB commit before ack.
Increase message payload size by 10x.
Reduce broker disk free space in a safe test environment.
Create a redelivery loop.
Let stream consumer fall behind.
Replay DLQ safely.
Roll back a bad consumer release.

For each drill, write:

expected signal;
observed signal;
diagnosis path;
safe action;
missing instrumentation.

This is deliberate practice for production judgment.

20. Production Checklist

Before declaring a RabbitMQ system production-ready, verify:


Previous lesson: Lesson 32, Kubernetes Deployment - RabbitMQ Cluster Operator, Topology, Storage, and Upgrades

Next lesson: Lesson 34, Security, Multi-Tenancy, Governance, and Compliance