RabbitMQ Streams Mental Model: Append-Only Log Inside RabbitMQ
Learn Java RabbitMQ, RabbitMQ Streams, Patterns, and Deployment In Action - Part 019
Advanced mental model for RabbitMQ Streams as persistent replicated append-only logs, covering queue-vs-stream semantics, offsets, retention, replay, replication, stream-powered queues, operational constraints, and decision frameworks for Java systems.
RabbitMQ Streams are not just "large queues".
A stream is a persistent, replicated, append-only log inside RabbitMQ. Producers append records. Consumers read records from positions. Consumption does not remove data. Data disappears only because of retention. That single shift changes the whole design model: queue consumers compete for work; stream consumers independently observe history.
This part builds the mental model required before writing serious Java stream producers and consumers.
We will not repeat AMQP queue fundamentals from earlier parts. The goal here is to know when a stream is the right primitive, what guarantees it gives, what guarantees it does not give, and how to model stream-based systems without accidentally building a fragile pseudo-Kafka or a misused queue.
1. Kaufman Deconstruction
To master RabbitMQ Streams, decompose the skill into ten capabilities:
- Primitive recognition — know when the problem needs a queue, a stream, a database table, or a true stream processor.
- Storage semantics — understand append-only storage, segments, retention, replication, and disk-oriented behavior.
- Consumption semantics — understand non-destructive reads, offsets, replay, attach positions, and consumer progress.
- Fan-out semantics — understand why many independent consumers can read the same records without queue-per-subscriber duplication.
- Failure semantics — understand what happens when producers, consumers, brokers, leaders, replicas, and networks fail.
- Ordering semantics — reason about order per stream, per partition, per producer, per consumer, and under replay.
- Operational economics — reason about disk, retention, replica count, segment size, throughput, and backlog.
- Java client fit — understand why the dedicated stream protocol and Java Stream Client exist.
- Use-case boundaries — know which use cases streams are good for and which ones still need classic/quorum queues.
- Self-correction — test assumptions using replay drills, offset drills, retention drills, and failover drills.
The shortcut is this:
Use a queue when the business intent is "do this work once". Use a stream when the business intent is "record this fact and allow multiple readers to observe/replay it".
2. The Core Mental Model
A traditional RabbitMQ queue behaves like a work buffer.
A RabbitMQ stream behaves like a durable log.
The queue model says:
- a message is pending work;
- a consumer takes responsibility for work;
- successful acknowledgement removes the work item from the queue;
- backlog should normally converge toward zero.
The stream model says:
- a message is an immutable record;
- each consumer chooses where to read;
- processing does not remove the record;
- backlog is a valid stored history up to the retention boundary.
This is a domain-level difference, not just a transport-level difference.
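The contrast can be sketched with two toy in-memory structures. This is only an illustration of the semantics, not RabbitMQ client code; the class names `WorkQueue` and `AppendLog` are made up for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy in-memory contrast between queue and log semantics (not RabbitMQ client code). */
final class QueueVsLogSketch {

    /** Work buffer: taking an item removes it for every other consumer. */
    static final class WorkQueue {
        private final Deque<String> pending = new ArrayDeque<>();

        void publish(String message) { pending.addLast(message); }

        /** Returns the next work item and removes it, or null when the queue is empty. */
        String takeAndAck() { return pending.pollFirst(); }

        int depth() { return pending.size(); }
    }

    /** Append-only log: reading by offset never removes a record. */
    static final class AppendLog {
        private final List<String> records = new ArrayList<>();

        /** Appends a record and returns its offset. */
        long append(String record) {
            records.add(record);
            return records.size() - 1;
        }

        String read(long offset) { return records.get((int) offset); }

        int size() { return records.size(); }
    }

    public static void main(String[] args) {
        WorkQueue queue = new WorkQueue();
        queue.publish("resize-image-1");
        queue.takeAndAck();
        System.out.println("queue depth after ack: " + queue.depth()); // 0

        AppendLog log = new AppendLog();
        long offset = log.append("OrderSubmitted O-1001");
        log.read(offset); // reader A
        log.read(offset); // reader B observes the same record
        System.out.println("log size after two reads: " + log.size()); // 1
    }
}
```

The queue converges toward empty as work completes. The log keeps its record no matter how many readers have seen it.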
3. Queue vs Stream: Production Decision Matrix
| Dimension | Queue | Stream |
|---|---|---|
| Primary intent | distribute work | store and broadcast records |
| Consumption | destructive after ack | non-destructive |
| Consumer progress | broker delivery/ack state | offset per consumer/group/app |
| Replay | not natural after ack | natural until retention removes data |
| Many subscribers | usually queue-per-subscriber | many consumers can read same log |
| Backlog expectation | should drain | may intentionally retain history |
| Poison handling | nack/requeue/DLX patterns | application-level skip/park/reprocess |
| Ordering | per queue, weakened by concurrency/requeue | per stream/partition append order |
| Scaling writes | one stream until hot; then super streams | partition with super streams |
| Scaling reads | competing consumers divide work | many readers observe same data |
| Retention | TTL/queue length semantics for queues | age/size-based stream retention |
| Best examples | commands, jobs, task execution | audit log, event replay, analytics feed |
The mistake is choosing a stream because it sounds more modern. The right question is:
Should successful consumption delete the message from the system?
If yes, use a queue. If no, consider a stream.
4. What RabbitMQ Streams Are For
RabbitMQ Streams complement queues. They do not replace queues.
Typical fit:
- Large fan-out
  - Many applications need the same records.
  - Queue-per-subscriber replication becomes inefficient.
  - A stream lets each subscriber consume independently.
- Replay and time travel
  - A projection must be rebuilt.
  - A bug fix requires reprocessing the last 7 days.
  - A new downstream system needs historical records.
- High-throughput ingestion
  - The workload is append-heavy.
  - The messages are usually processed in sequence.
  - Consumer state can track offset.
- Large backlog retention
  - Millions or billions of records are intentionally retained.
  - Backlog is not necessarily a symptom.
  - Disk and retention become first-class design decisions.
- Audit and forensics
  - The system needs evidence of what was emitted.
  - Consumers can reconstruct past decisions.
  - Regulatory traceability matters.
Poor fit:
- Short-lived request/reply
  - A queue or direct reply-to is usually simpler.
- One-off work distribution
  - Work queue semantics are more natural.
- Per-message DLQ-driven retry
  - Streams do not behave like normal queues with DLX-based poison flows.
- Full stream processing engine
  - RabbitMQ Streams provide log/replay primitives, not a complete Kafka Streams/Flink/Spark model.
- Infinite event store
  - Retention removes old data. If the system needs permanent source-of-truth history, use an event store/database plus streams as distribution.
5. The Append-Only Log Model
A stream is append-only.
Producers do not update existing records. Consumers do not delete records. The broker appends new records and manages retention.
This produces several invariants:
| Invariant | Consequence |
|---|---|
| Records are immutable | corrections are new records, not edits |
| Position matters | consumers track offset/progress |
| Reads are repeatable within retention | replay is possible |
| Retention bounds history | replay is only possible for retained data |
| Data is disk-oriented | disk throughput and segment management matter |
| Replication matters | availability and safety depend on replica quorum |
A stream is a time-indexed record of facts, not a mutable state container.
6. Offset Is Consumer Progress, Not Message Ownership
In a queue, broker delivery state is about message ownership:
- delivered;
- unacknowledged;
- requeued;
- acknowledged;
- removed.
In a stream, offset is about consumer progress:
- "I have processed up to offset 1000";
- "start from offset 1001";
- "replay from first";
- "attach from timestamp";
- "start at next new message".
Different consumers can point to different offsets in the same stream. That is the source of stream fan-out power.
6.1 Offset Choices
A consumer can generally start from:
| Offset specification | Meaning | Typical use |
|---|---|---|
| first | first available retained message | projection rebuild, test replay |
| last | last chunk | tail inspection, near-latest bootstrap |
| next | only future messages | live notification consumer |
| specific offset | exact numeric position | resume/reprocess after known point |
| timestamp | approximate time-based attach | replay from incident time |
The critical word is "available". The first specification means the first message still retained, not necessarily offset 0 forever.
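To make "available" concrete, here is a toy model of how a requested start position resolves against retained data. It is not the client API (the Stream Java Client expresses these choices through `OffsetSpecification`). The clamping of out-of-range offsets and the timestamp rule are assumptions of this sketch, based on how the RabbitMQ documentation describes attach behavior, and the model ignores chunk granularity.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of attach-position resolution against retained data (not the client API). */
final class OffsetResolutionSketch {
    // offset -> timestamp in ms, for records that are still retained
    private final TreeMap<Long, Long> retained = new TreeMap<>();
    private long nextOffset = 0;

    void append(long timestampMs) { retained.put(nextOffset++, timestampMs); }

    /** Simulates retention removing every record below the given offset. */
    void truncateBefore(long offset) { retained.headMap(offset).clear(); }

    /** First retained offset, or the next offset to be written if nothing is retained. */
    long first() { return retained.isEmpty() ? nextOffset : retained.firstKey(); }

    /** Position of the next record that will be appended. */
    long next() { return nextOffset; }

    /** Assumption: an offset outside the retained range is clamped to the start or end. */
    long fromOffset(long requested) {
        return Math.min(Math.max(requested, first()), nextOffset);
    }

    /** Assumption: attach at the first retained record at or after the timestamp. */
    long fromTimestamp(long timestampMs) {
        for (Map.Entry<Long, Long> entry : retained.entrySet()) {
            if (entry.getValue() >= timestampMs) {
                return entry.getKey();
            }
        }
        return nextOffset;
    }

    public static void main(String[] args) {
        OffsetResolutionSketch log = new OffsetResolutionSketch();
        for (int i = 0; i < 10; i++) {
            log.append(1_000L + i);                    // offsets 0..9
        }
        log.truncateBefore(4);                         // retention removed offsets 0..3
        System.out.println(log.first());               // 4
        System.out.println(log.fromOffset(2));         // 4: the requested offset is gone
        System.out.println(log.fromTimestamp(1_006L)); // 6
        System.out.println(log.next());                // 10
    }
}
```

Note what `fromOffset(2)` returns after truncation: the consumer silently resumes at 4, and offsets 2 and 3 are never seen.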
7. Retention Is the Stream Equivalent of Data Lifecycle
Queues are usually tuned to drain. Streams are tuned to retain.
Retention answers:
- how much history can be replayed;
- how much disk is required;
- how long new consumers can be bootstrapped;
- how far back incident investigation can go;
- how old the records can be that replay and duplicate handling must tolerate.
Common retention policies:
| Policy | Meaning | Example |
|---|---|---|
| age-based | retain data for time window | 7 days, 30 days, 90 days |
| size-based | retain up to byte limit | 500 GB |
| combined | whichever boundary removes old data | 30 days or 1 TB |
Retention is not free. It creates operational obligations:
- disk capacity planning;
- segment size planning;
- replica count planning;
- monitoring for retention truncation;
- consumer lag alerting relative to retention window.
7.1 The Most Important Retention Invariant
A consumer is safe only if its lag remains comfortably inside the retention window.
A stream consumer that falls behind retention can no longer resume from its stored offset because the data has been removed.
So a production stream consumer needs two lag measures:
| Lag measure | Meaning |
|---|---|
| offset lag | how many records behind the end |
| retention lag | how close the consumer is to the oldest retained offset/time |
Offset lag tells you throughput pressure. Retention lag tells you data-loss risk for that consumer.
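A minimal sketch of the two measures, assuming the monitoring code already knows the stream's end offset, its first retained offset, and the consumer's last processed offset. How those values are obtained is out of scope here, and the method names are illustrative.

```java
import java.time.Duration;
import java.time.Instant;

/** Computes the two lag measures from values the monitoring code is assumed to have. */
final class LagMeasures {

    /** How many records the consumer is behind the end of the stream. */
    static long offsetLag(long endOffset, long consumerOffset) {
        return Math.max(0, endOffset - consumerOffset);
    }

    /**
     * How many retained records sit between the oldest retained offset and the consumer.
     * A negative value means the consumer's position has already been removed.
     */
    static long retentionHeadroomRecords(long firstRetainedOffset, long consumerOffset) {
        return consumerOffset - firstRetainedOffset;
    }

    /**
     * Time left before age-based retention reaches the oldest unprocessed record.
     * Approximate: retention works on whole segments, not individual records.
     */
    static Duration retentionHeadroom(Instant oldestUnprocessed, Instant now, Duration maxAge) {
        return maxAge.minus(Duration.between(oldestUnprocessed, now));
    }

    public static void main(String[] args) {
        System.out.println(offsetLag(1_000_000, 999_000));              // 1000
        System.out.println(retentionHeadroomRecords(400_000, 999_000)); // 599000

        Instant now = Instant.parse("2026-07-08T00:00:00Z");
        Instant oldestUnprocessed = Instant.parse("2026-07-02T00:00:00Z");
        Duration headroom = retentionHeadroom(oldestUnprocessed, now, Duration.ofDays(7));
        System.out.println(headroom.toHours() + " hours of headroom");  // 24 hours of headroom
    }
}
```

A consumer can have a small offset lag on a slow stream and still be close to the retention boundary, which is why both numbers belong on the dashboard.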
8. Segments and Chunks: Why Streams Behave Differently
Streams are disk-oriented. Internally they are stored in segments and transported in chunks.
You do not need to micromanage this at application level, but you must understand the effect:
- retention is evaluated at segment boundaries;
- timestamp-based attach can land at a chunk boundary;
- large messages reduce throughput;
- fast disks matter;
- more replicas increase replication work;
- batching improves throughput but changes latency and duplicate semantics.
This is why stream performance work is not the same as classic queue performance work. Streams are optimized for large append/read workloads, but they still obey physics: disk I/O, replication, message size, and batching dominate.
9. Stream Declaration and Topology
A RabbitMQ stream can be declared as a queue with queue type stream.
import java.util.HashMap;
import java.util.Map;

// Stream-specific settings are passed as optional queue arguments.
Map<String, Object> arguments = new HashMap<>();
arguments.put("x-queue-type", "stream");
arguments.put("x-max-age", "7D"); // age-based retention: 7 days
arguments.put("x-max-length-bytes", 500_000_000_000L); // 500 GB
arguments.put("x-stream-max-segment-size-bytes", 500_000_000L); // 500 MB per segment
channel.queueDeclare(
    "order-events.stream",
    true,  // durable
    false, // not exclusive
    false, // not auto-delete
    arguments
);
For dedicated stream protocol clients, stream creation is often handled through the Stream Java Client:
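import com.rabbitmq.stream.ByteCapacity;
import com.rabbitmq.stream.Environment;
import java.time.Duration;

Environment environment = Environment.builder()
    .uri("rabbitmq-stream://user:password@rabbitmq-0:5552/%2f")
    .build();

// Create the stream with both age-based and size-based retention.
environment.streamCreator()
    .stream("order-events.stream")
    .maxAge(Duration.ofDays(7))
    .maxLengthBytes(ByteCapacity.GB(500))
    .create();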
The exact builder methods depend on client version, but the design responsibility does not change:
- name the stream as a domain contract;
- set retention deliberately;
- set replication deliberately;
- monitor lag and storage;
- document ownership.
9.1 Stream Naming Convention
Use a stream name that communicates domain and semantics.
Good:
order.events.v1.stream
quote.lifecycle.v1.stream
billing.invoice-events.v1.stream
regulatory.case-events.v1.stream
Weak:
events
stream1
rabbit-stream
service-a-output
A stream is a public record. Name it like a public contract.
10. Stream as AMQP Queue vs Stream Protocol
RabbitMQ Streams can be used through AMQP 0-9-1 as stream-backed queues, or through the dedicated RabbitMQ Stream protocol.
| Access mode | When useful | Trade-off |
|---|---|---|
| AMQP 0-9-1 stream queue | interoperability, migration, existing clients | not all stream-specific features/performance |
| Stream protocol client | high throughput, offset tracking, stream-native features | requires stream client and port/protocol planning |
Use AMQP access when:
- you need compatibility with existing AMQP consumers;
- you are gradually migrating;
- you only need basic stream behavior.
Use the Stream Java Client when:
- throughput matters;
- replay/offset tracking is central;
- you need stream-native producer/consumer APIs;
- you want client support for deduplication and stream-specific flow control.
11. Fan-Out Without Queue Explosion
In a queue-based pub/sub topology, each durable subscriber typically needs its own queue.
That is valid, but each queue has its own storage and delivery lifecycle. For large persistent fan-out, this can become expensive.
With a stream:
One retained log supports many independent readers.
This changes team boundaries:
| Queue pub/sub | Stream pub/sub |
|---|---|
| subscriber owns queue | producer/platform owns stream |
| each subscriber stores its copy | stream stores shared history |
| late subscriber needs future events only or manual backfill | late subscriber can replay retained history |
| DLQ per subscriber is natural | per-subscriber poison handling must be app-level |
The last point matters. Streams make fan-out cheaper, but they push some per-consumer failure handling into application design.
12. Replay Is a Feature and a Risk
Replay enables powerful operations:
- rebuild a read model;
- recompute metrics;
- backfill a new downstream system;
- run forensic analysis;
- validate a new consumer against historical data;
- recover from a consumer bug.
But replay can also trigger dangerous side effects:
- duplicate emails;
- repeated payment calls;
- repeated external API requests;
- re-opened workflow tasks;
- duplicate audit inserts;
- stale state overriding newer state.
Therefore every stream consumer must classify its handler:
| Handler type | Replay safe? | Requirement |
|---|---|---|
| pure projection | usually yes | idempotent upsert/versioning |
| analytics aggregation | yes if dedup/window safe | deterministic aggregation or correction logic |
| external notification | usually no | side-effect ledger or replay disabled |
| command trigger | risky | causal guard and idempotency key |
| audit copy | yes | append idempotency or unique event id |
Stream replay is not a free lunch. It is a controlled operational mode.
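For the "pure projection" row, replay safety usually comes from a versioned upsert. The sketch below is an in-memory stand-in for a projection table. It assumes every event carries an aggregate ID and a monotonically increasing aggregate version, and the class name is illustrative.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a projection table guarded by aggregate version. */
final class VersionedProjection {

    private static final class Row {
        final long version;
        final String status;

        Row(long version, String status) {
            this.version = version;
            this.status = status;
        }
    }

    private final Map<String, Row> rows = new HashMap<>();

    /**
     * Applies the event only if it is newer than what is stored.
     * Returns false for duplicates and for stale events delivered during replay.
     */
    boolean apply(String aggregateId, long aggregateVersion, String status) {
        Row current = rows.get(aggregateId);
        if (current != null && current.version >= aggregateVersion) {
            return false;
        }
        rows.put(aggregateId, new Row(aggregateVersion, status));
        return true;
    }

    /** Current status for the aggregate, or null if no event has been applied. */
    String statusOf(String aggregateId) {
        Row row = rows.get(aggregateId);
        return row == null ? null : row.status;
    }

    public static void main(String[] args) {
        VersionedProjection projection = new VersionedProjection();
        System.out.println(projection.apply("O-1001", 7, "FINAL")); // true
        System.out.println(projection.apply("O-1001", 7, "FINAL")); // false: duplicate
        System.out.println(projection.apply("O-1001", 6, "DRAFT")); // false: stale
        System.out.println(projection.statusOf("O-1001"));          // FINAL
    }
}
```

The version guard is what prevents "stale state overriding newer state" when an old record is re-driven.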
13. Stream Consumer Correctness Model
A stream consumer has three independent states:
- read position — what offset has been delivered to the process;
- business commit state — what effect has been safely persisted;
- stored offset — what offset future process instances will resume from.
Correctness depends on their relationship.
Safe invariant:
Store offset only after the business effect for that offset is durable.
If offset is stored before the business effect, a crash can skip data. If business effect is committed before offset storage, a crash can cause duplicate processing. Duplicate processing is usually safer than skipping data, as long as handlers are idempotent.
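The two orderings can be compared with a small simulation. This is a toy loop, not consumer code: it processes offsets 0 through `lastOffset`, crashes once while handling `crashAt`, and restarts from the stored offset plus one.

```java
import java.util.ArrayList;
import java.util.List;

/** Simulates one crash under both orderings of "store offset" and "apply effect". */
final class OffsetOrderingSketch {

    /** Returns the offsets whose business effect was applied, in order of application. */
    static List<Long> run(boolean storeOffsetFirst, long lastOffset, long crashAt) {
        List<Long> effects = new ArrayList<>();
        long stored = -1;        // last stored offset; -1 means nothing stored yet
        boolean crashed = false; // the simulated process crashes only once
        long position = 0;
        while (position <= lastOffset) {
            boolean crashNow = !crashed && position == crashAt;
            if (storeOffsetFirst) {
                stored = position;           // offset stored before the effect
                if (crashNow) {
                    crashed = true;
                    position = stored + 1;   // restart resumes after the stored offset
                    continue;
                }
                effects.add(position);
            } else {
                effects.add(position);       // effect is durable before the offset
                if (crashNow) {
                    crashed = true;
                    position = stored + 1;   // restart resumes after the stored offset
                    continue;
                }
                stored = position;
            }
            position++;
        }
        return effects;
    }

    public static void main(String[] args) {
        System.out.println(run(true, 3, 2));  // [0, 1, 3]       offset 2 is skipped
        System.out.println(run(false, 3, 2)); // [0, 1, 2, 2, 3] offset 2 is duplicated
    }
}
```

The first run loses offset 2 permanently. The second applies it twice, which an idempotent handler absorbs.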
14. Consumer Offset Storage Options
| Offset storage | Fit | Risk |
|---|---|---|
| server-side offset tracking | simple consumers, stream-native apps | not atomic with business DB |
| relational DB table | projection/inbox with DB transaction | more app responsibility |
| distributed KV store | high-throughput checkpointing | atomicity with business state is hard |
| embedded consumer state | local tools/tests | unsafe for production failover |
14.1 Server-Side Offset Tracking
Server-side tracking is convenient. The broker stores the consumer's progress using the consumer name.
Good for:
- analytics readers;
- monitoring consumers;
- consumers whose handler is idempotent;
- consumers that can tolerate replay of last N records after crash.
Be careful when:
- the consumer writes to a database;
- offset and database commit must be atomic;
- multiple instances accidentally use the same consumer name without intended coordination.
14.2 External Offset Store
External offset storage is often better when the consumer's state is in a database.
Example table:
CREATE TABLE stream_consumer_checkpoint (
stream_name varchar(255) NOT NULL,
consumer_name varchar(255) NOT NULL,
partition_name varchar(255) NOT NULL DEFAULT '',
last_offset bigint NOT NULL,
updated_at timestamp NOT NULL DEFAULT current_timestamp,
PRIMARY KEY (stream_name, consumer_name, partition_name)
);
The consumer can update projection state and checkpoint in one transaction.
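A minimal JDBC sketch of that single transaction follows. The `order_projection` table and its columns are hypothetical, and the upsert syntax assumes PostgreSQL (`ON CONFLICT`); other databases need their own upsert form.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Writes the projection row and the checkpoint row in one database transaction. */
final class TransactionalCheckpoint {

    // Hypothetical projection table; PostgreSQL upsert syntax is an assumption.
    static final String UPSERT_PROJECTION =
        "INSERT INTO order_projection (order_id, version, status) VALUES (?, ?, ?) "
        + "ON CONFLICT (order_id) DO UPDATE "
        + "SET version = EXCLUDED.version, status = EXCLUDED.status "
        + "WHERE order_projection.version < EXCLUDED.version";

    // Uses the checkpoint table defined above, with the default empty partition name.
    static final String UPSERT_CHECKPOINT =
        "INSERT INTO stream_consumer_checkpoint "
        + "(stream_name, consumer_name, partition_name, last_offset) VALUES (?, ?, '', ?) "
        + "ON CONFLICT (stream_name, consumer_name, partition_name) DO UPDATE "
        + "SET last_offset = EXCLUDED.last_offset, updated_at = current_timestamp";

    static void applyAndCheckpoint(Connection conn, String stream, String consumer,
                                   long offset, String orderId, long version,
                                   String status) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement projection = conn.prepareStatement(UPSERT_PROJECTION);
             PreparedStatement checkpoint = conn.prepareStatement(UPSERT_CHECKPOINT)) {
            projection.setString(1, orderId);
            projection.setLong(2, version);
            projection.setString(3, status);
            projection.executeUpdate();

            checkpoint.setString(1, stream);
            checkpoint.setString(2, consumer);
            checkpoint.setLong(3, offset);
            checkpoint.executeUpdate();

            conn.commit(); // both rows become visible together, or neither does
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
    }
}
```

On restart, the consumer would read `last_offset` from the checkpoint table and attach at `last_offset + 1`. Because the effect and the checkpoint commit together, a crash leaves neither a skip nor a half-applied record in the database.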
15. Stream Ordering Model
A stream preserves append order within that stream.
That does not mean your business always sees correct order.
You still need to account for:
- multiple producers appending interleaved records;
- retries and duplicate publish attempts;
- consumers processing asynchronously;
- projections committing out of order;
- retention removing old offsets;
- super stream partitions when used later.
For one stream, one consumer, synchronous handler, and monotonic offset commit, order is simple.
For a consumer that dispatches to a worker pool, order becomes a design choice.
If order matters, do not process offsets concurrently for the same ordering key unless you have explicit gap/stale handling.
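One common way to keep per-key order while still using several threads is to route each ordering key to a fixed single-threaded lane. The sketch below uses only `java.util.concurrent`. The class name and the use of `String.hashCode` for lane selection are illustrative choices, not part of any RabbitMQ API.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Routes handlers to single-threaded lanes so that one key is never processed concurrently. */
final class KeyedDispatcher {
    private final ExecutorService[] lanes;

    KeyedDispatcher(int laneCount) {
        lanes = new ExecutorService[laneCount];
        for (int i = 0; i < laneCount; i++) {
            lanes[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** The same ordering key always maps to the same lane. */
    int laneFor(String orderingKey) {
        return Math.floorMod(orderingKey.hashCode(), lanes.length);
    }

    /** Handlers for one key run sequentially, in the order they were dispatched. */
    void dispatch(String orderingKey, Runnable handler) {
        lanes[laneFor(orderingKey)].execute(handler);
    }

    /** Stops accepting work and waits briefly for queued handlers to finish. */
    void shutdownAndWait() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

This keeps order per key but not across keys, so offset storage needs care: only store an offset once every lower offset has also completed, otherwise a crash can skip records that were still in flight on another lane.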
16. Stream Is Not a Full Stream Processing Framework
RabbitMQ Streams provide storage, replay, fan-out, and high-throughput transport.
They do not automatically provide:
- durable window state;
- exactly-once stateful processing;
- repartitioning operators;
- joins;
- watermarks;
- event-time processing;
- global aggregation correctness;
- schema registry;
- state store changelog management.
You can build application-level processing on top of streams, but you own the state model.
Decision rule:
| Need | Better fit |
|---|---|
| Durable log and replay inside RabbitMQ ecosystem | RabbitMQ Streams |
| Work distribution with retry/DLQ | quorum queue/classic queue |
| Stateful event-time processing with joins/windows | Kafka Streams/Flink/Spark-like tooling |
| Permanent event source of truth | event store/database + outbox |
| High fan-out retained event feed | RabbitMQ Streams |
17. Stream + Queue Hybrid Architecture
The strongest RabbitMQ designs often combine queues and streams.
Example:
Stream role:
- durable event feed;
- replayable history;
- fan-out to projections and analytics.
Queue role:
- work dispatch;
- retry/DLQ;
- command handling;
- bounded worker concurrency.
Do not force all messaging into one primitive. Use each primitive according to its semantics.
18. Stream-Powered Event-Carried State Transfer
A stream is especially useful for event-carried state transfer.
Instead of publishing only:
{
"eventType": "OrderUpdated",
"orderId": "O-1001"
}
publish enough state for consumers to build local projections:
{
"eventId": "01JZ8K9P6S9K1W0YVZK8A1QJ4M",
"eventType": "OrderPriced",
"schemaVersion": 1,
"occurredAt": "2026-07-01T10:15:30Z",
"aggregateId": "O-1001",
"aggregateVersion": 7,
"payload": {
"orderId": "O-1001",
"customerId": "C-991",
"currency": "IDR",
"totalAmount": 1725000,
"pricingStatus": "FINAL"
}
}
Why it matters:
- consumers do not need synchronous calls back to producer service;
- replay can rebuild downstream projections;
- historical events remain explainable;
- consumer coupling is lower.
But this requires schema discipline and privacy discipline. Retained logs can amplify PII mistakes.
19. Retained Logs and Compliance
Streams are attractive for auditability, but auditability is not the same as compliance.
Consider:
| Concern | Design implication |
|---|---|
| PII | avoid or minimize sensitive payloads |
| retention law | retention must match policy/legal basis |
| deletion request | immutable retained logs complicate erase workflows |
| encryption | use TLS in transit; consider application-level encryption for sensitive fields |
| access control | vhost/user permissions must separate producers, consumers, operators |
| forensic trail | event IDs, correlation IDs, producer identity, schema version |
| evidence quality | timestamps, causation, ordering key, source commit reference |
For regulated systems, the stream should record facts, but the source-of-truth evidence model must be intentional.
20. Failure Model for Streams
Streams change failure semantics.
20.1 Producer Failure
Possible outcomes:
| Failure point | Broker state | Producer state | Risk |
|---|---|---|---|
| before append | no record | publish failed | missing event unless retried |
| after append before confirm | record exists | producer uncertain | duplicate on retry |
| after confirm | record exists | producer safe | normal |
Mitigation:
- publisher confirms;
- producer-side deduplication where appropriate;
- stable producer identity and publishing IDs;
- outbox relay for database-to-stream atomicity.
20.2 Consumer Failure
Possible outcomes:
| Failure point | Business effect | Stored offset | Result |
|---|---|---|---|
| before effect | no effect | old offset | replay safe |
| after effect before offset | effect committed | old offset | duplicate processing |
| after offset before effect | no effect | new offset | data skipped |
| after both | effect committed | new offset | normal |
Design bias:
Prefer duplicates over skips.
20.3 Broker/Replica Failure
Streams are replicated. Availability and safety depend on replica state and quorum. Operationally, this means:
- do not treat streams as single-node files;
- monitor replica health;
- use odd cluster sizes (for example 3 or 5 nodes) where quorum systems are involved;
- do not decommission nodes without stream replica planning;
- understand leader placement and client connection locality.
21. Backpressure in Stream Systems
Stream systems move pressure differently from queue systems.
In a queue:
- pressure often appears as queue depth and unacked messages.
In a stream:
- pressure appears as consumer lag, publish confirm latency, disk usage, and retention risk.
Key stream pressure signals:
| Signal | Meaning |
|---|---|
| publish confirm latency | broker/disk/replication pressure |
| consumer offset lag | consumer cannot keep up |
| oldest retained offset advancing | retention is trimming history |
| disk usage | storage pressure |
| replica sync status | safety/availability pressure |
| processing latency | downstream bottleneck |
A queue backlog says "work is not done". A stream lag says "this reader is behind".
That difference matters operationally.
22. Stream Consumer Lag and SLA
Every stream consumer should have an explicit lag SLO.
Example:
| Consumer | Purpose | Lag SLO | Retention |
|---|---|---|---|
| order-read-model | customer-facing projection | < 5 seconds | 7 days |
| fraud-analytics | near-real-time analytics | < 2 minutes | 30 days |
| audit-exporter | regulatory archive | < 1 hour | 90 days |
| backfill-tool | manual replay | no live SLA | 90 days |
Alert levels:
| Level | Condition | Action |
|---|---|---|
| warning | lag exceeds SLO | inspect consumer/downstream |
| critical | lag approaches retention boundary | scale/fix urgently |
| page | consumer offset is no longer retained | data recovery/backfill required |
A stream without lag monitoring is unsafe.
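The alert table can be turned into a small classifier. This sketch treats lag as a time value (the age of the oldest unprocessed record) and assumes a hypothetical `criticalFraction` parameter that defines what "approaches the retention boundary" means, for example 0.8 for 80% of the retention window.

```java
import java.time.Duration;

/** Maps a consumer's time lag to the alert levels from the table above. */
final class LagAlerting {

    enum Level { OK, WARNING, CRITICAL, PAGE }

    /**
     * @param lag              age of the oldest record the consumer has not processed
     * @param lagSlo           the consumer's agreed lag objective
     * @param retention        the stream's age-based retention window
     * @param criticalFraction hypothetical threshold, as a fraction of the retention window
     */
    static Level classify(Duration lag, Duration lagSlo, Duration retention,
                          double criticalFraction) {
        if (lag.compareTo(retention) >= 0) {
            return Level.PAGE; // the consumer's position is no longer retained
        }
        Duration criticalThreshold =
            Duration.ofMillis((long) (retention.toMillis() * criticalFraction));
        if (lag.compareTo(criticalThreshold) >= 0) {
            return Level.CRITICAL;
        }
        if (lag.compareTo(lagSlo) > 0) {
            return Level.WARNING;
        }
        return Level.OK;
    }

    public static void main(String[] args) {
        Duration slo = Duration.ofSeconds(5);
        Duration retention = Duration.ofDays(7);
        System.out.println(classify(Duration.ofSeconds(3), slo, retention, 0.8)); // OK
        System.out.println(classify(Duration.ofMinutes(1), slo, retention, 0.8)); // WARNING
        System.out.println(classify(Duration.ofDays(6), slo, retention, 0.8));    // CRITICAL
        System.out.println(classify(Duration.ofDays(8), slo, retention, 0.8));    // PAGE
    }
}
```

The example values match the order-read-model row: a 5-second SLO against a 7-day retention window.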
23. Stream Payload Design
Good stream payloads are:
- immutable;
- versioned;
- replay-safe;
- causally traceable;
- keyed for partitioning later;
- privacy-reviewed;
- self-describing enough for downstream consumers.
Recommended envelope:
{
"eventId": "01JZ8M8RZ9N9B5SHGBAJ7H7W2Q",
"eventType": "OrderSubmitted",
"schemaVersion": 1,
"producer": "order-service",
"stream": "order.events.v1.stream",
"occurredAt": "2026-07-01T12:30:00Z",
"publishedAt": "2026-07-01T12:30:01Z",
"correlationId": "quote-req-77",
"causationId": "01JZ8M7P2G6Z9T3V1D76Z7RE4E",
"aggregateId": "O-1001",
"aggregateVersion": 12,
"partitionKey": "O-1001",
"payload": {}
}
Keep the domain envelope stable even if wire encoding changes.
24. Streams and Deduplication
Producer confirms protect against silent loss, but they do not eliminate ambiguity after network failure. A producer may append a record and fail before receiving confirmation. If it retries, duplicates can appear unless deduplication is used.
At a mental-model level, deduplication needs:
- stable producer identity;
- monotonic publishing identity/sequence;
- careful single-writer discipline for that producer identity;
- no unsafe multi-threaded publishing with shared producer name.
Even when broker-side deduplication is used, business-level idempotency remains important.
Why?
- different producers can publish semantically duplicate events;
- consumers can replay;
- retention/backfill can re-drive old records;
- external side effects can still duplicate.
Broker deduplication helps the publish path. Idempotency protects the business effect path.
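The publish-path rule can be modelled in a few lines. This is a toy in-memory model of the idea (a named producer plus a monotonically increasing publishing ID), not the broker's implementation and not the client API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of publish-path deduplication keyed by producer name and publishing ID. */
final class PublishDedupSketch {
    private final Map<String, Long> highestIdByProducer = new HashMap<>();
    private final List<String> log = new ArrayList<>();

    /**
     * Appends the record unless this producer name has already published
     * this publishing ID or a higher one. Returns true if the record was appended.
     */
    boolean publish(String producerName, long publishingId, String record) {
        Long highest = highestIdByProducer.get(producerName);
        if (highest != null && publishingId <= highest) {
            return false; // retry of an already appended record
        }
        highestIdByProducer.put(producerName, publishingId);
        log.add(record);
        return true;
    }

    int size() { return log.size(); }

    public static void main(String[] args) {
        PublishDedupSketch stream = new PublishDedupSketch();
        stream.publish("order-relay", 1, "OrderSubmitted O-1001");
        stream.publish("order-relay", 2, "OrderPriced O-1001");
        // Confirm was lost, so the producer retries with the same publishing ID.
        System.out.println(stream.publish("order-relay", 2, "OrderPriced O-1001")); // false
        // A different producer sends the same business event: not filtered.
        System.out.println(stream.publish("pricing-service", 1, "OrderPriced O-1001")); // true
        System.out.println(stream.size()); // 3
    }
}
```

The last publish shows the gap: a semantically duplicate event from another producer passes the filter, so consumers still need business-level idempotency.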
25. Streams and Ordering Keys
For a single stream, append order is one ordered lane.
This is simple but can become a bottleneck. Super Streams, introduced later in this series, partition a logical stream into multiple streams. That enables scaling, but ordering is then per partition, not globally across all partitions.
Prepare for that now by putting a partition key in every event envelope.
Good keys:
| Domain | Key |
|---|---|
| order lifecycle | orderId |
| quote lifecycle | quoteId |
| account events | accountId |
| regulatory case | caseId |
| device telemetry | deviceId |
Weak keys:
| Key | Problem |
|---|---|
| random UUID | destroys locality |
| event type | creates hot partitions |
| timestamp bucket only | weak causality |
| customer segment | likely skewed |
Even before using Super Streams, design events as if they may later be partitioned.
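The difference between a good and a weak key shows up as soon as keys are mapped to partitions. In this sketch `String.hashCode` is a stand-in for whatever hash the real routing uses, and the partition count of 8 is arbitrary.

```java
import java.util.Arrays;

/** Toy partition-routing model. String.hashCode stands in for the real routing hash. */
final class PartitionKeySkew {

    /** The same key always lands on the same partition. */
    static int partitionFor(String key, int partitions) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    /** Counts how many records land on each partition for the given keys. */
    static int[] distribution(String[] keys, int partitions) {
        int[] counts = new int[partitions];
        for (String key : keys) {
            counts[partitionFor(key, partitions)]++;
        }
        return counts;
    }

    /** Number of partitions that received at least one record. */
    static int nonEmpty(int[] counts) {
        int used = 0;
        for (int count : counts) {
            if (count > 0) {
                used++;
            }
        }
        return used;
    }

    public static void main(String[] args) {
        int partitions = 8;
        String[] eventTypes = {"OrderSubmitted", "OrderPriced", "OrderCancelled"};
        String[] byOrderId = new String[1_000];
        String[] byEventType = new String[1_000];
        for (int i = 0; i < 1_000; i++) {
            byOrderId[i] = "O-" + i;
            byEventType[i] = eventTypes[i % eventTypes.length];
        }
        // Entity keys spread across partitions; event-type keys use at most three.
        System.out.println(Arrays.toString(distribution(byOrderId, partitions)));
        System.out.println(Arrays.toString(distribution(byEventType, partitions)));
    }
}
```

With three event types, at most three partitions ever receive data, regardless of how many partitions exist. That is the hot-partition problem in miniature.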
26. Common Stream Anti-Patterns
26.1 Using Streams as a Job Queue
Bad sign:
- every record should be processed by exactly one worker;
- retry/DLQ is central;
- consumption should remove work;
- replay is dangerous.
Use a queue.
26.2 Using Streams as the Only Source of Truth
Bad sign:
- retention can delete data required for legal/business history;
- log compaction is expected, which RabbitMQ Streams do not provide;
- queries need random access by entity.
Use database/event store as source of truth and stream as distribution log.
26.3 Consumer With Unsafe Side Effects
Bad sign:
- replay sends emails again;
- replay charges cards again;
- replay creates duplicate tickets.
Use a side-effect ledger and idempotency key.
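A side-effect ledger can be sketched as a set of idempotency keys that is checked before the effect runs. The version below is in-memory only. A production ledger would have to be durable (for example a database table with a unique constraint on the key), which this sketch deliberately leaves out.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** In-memory side-effect ledger: each idempotency key triggers its effect at most once. */
final class SideEffectLedger {
    private final Set<String> performed = ConcurrentHashMap.newKeySet();

    /**
     * Runs the effect only the first time the key is seen.
     * Returns false when the key was already recorded (replay or duplicate delivery).
     */
    boolean performOnce(String idempotencyKey, Runnable effect) {
        if (!performed.add(idempotencyKey)) {
            return false;
        }
        try {
            effect.run();
            return true;
        } catch (RuntimeException e) {
            performed.remove(idempotencyKey); // allow a later retry of a failed effect
            throw e;
        }
    }

    public static void main(String[] args) {
        SideEffectLedger ledger = new SideEffectLedger();
        String key = "welcome-email:01JZ8M8RZ9N9B5SHGBAJ7H7W2Q";
        System.out.println(ledger.performOnce(key, () -> System.out.println("sending email")));
        // Replay delivers the same event again; the effect is suppressed.
        System.out.println(ledger.performOnce(key, () -> System.out.println("sending email")));
    }
}
```

The key combines the effect type with the event ID, so one event can drive several distinct side effects without them blocking each other.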
26.4 No Retention Budget
Bad sign:
- retention is omitted;
- disk is assumed infinite;
- consumer lag is not monitored;
- no plan for lag exceeding retention.
Treat retention as part of API contract.
26.5 One Giant Stream for Everything
Bad sign:
- all domains publish to events.stream;
- consumers filter most events away;
- schema governance collapses;
- retention needs conflict.
Use domain-oriented streams.
27. Design Review Checklist
Before approving a RabbitMQ Stream design, ask:
- What is the stream's domain contract?
- Who owns the stream?
- Why is a stream better than a quorum queue here?
- What is the retention policy?
- What is the expected write throughput?
- What is the expected read fan-out?
- What is the message size distribution?
- What is the replica factor / cluster placement?
- What happens if a consumer is down for 2 hours? 2 days? 2 weeks?
- What happens if the retention boundary overtakes a consumer?
- How does a consumer store offset safely?
- Which consumers are replay-safe?
- Which consumers must never replay side effects?
- What is the event schema versioning strategy?
- What PII is in the stream?
- What is the partition key if this later becomes a super stream?
- What dashboards and alerts exist?
- What is the emergency procedure for stuck consumers?
If these questions have no answers, the design is not production-ready.
28. Practice: Build the Mental Model in 90 Minutes
Drill 1 — Queue vs Stream Classification
Classify each workload:
| Workload | Queue or stream? | Why? |
|---|---|---|
| send password reset email | queue | exactly-one worker style, replay dangerous |
| order lifecycle event feed | stream | many consumers and replay value |
| image resizing job | queue | work distribution |
| regulatory case state audit | stream + archive DB | retained/replayable facts |
| customer profile projection rebuild | stream | rebuild from retained events |
| payment capture command | queue | command with strict side-effect handling |
Drill 2 — Retention Budget
For a stream with:
- 2 KB average message size;
- 5,000 messages/sec;
- 7-day retention;
- 3 replicas.
Approximate raw data:
2 KB * 5,000/sec = 10 MB/sec
10 MB/sec * 86,400 sec/day = 864 GB/day
864 GB/day * 7 = 6,048 GB
6,048 GB * 3 replicas = 18,144 GB raw replicated storage
The exact storage overhead depends on broker internals, metadata, segmenting, filesystem, and compression/batching choices, but the first-order estimate is enough to catch fantasy designs.
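The same first-order estimate as a reusable calculation, using decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes) as the drill does. It ignores all storage overhead, so treat the result as a lower bound.

```java
/** First-order retention budget: raw payload bytes only, no storage overhead. */
final class RetentionBudget {
    private static final long SECONDS_PER_DAY = 86_400L;
    private static final long BYTES_PER_GB = 1_000_000_000L;

    /** Total bytes held across all replicas for the full retention window. */
    static long replicatedBytes(long avgMessageBytes, long messagesPerSecond,
                                int retentionDays, int replicas) {
        long bytesPerDay = avgMessageBytes * messagesPerSecond * SECONDS_PER_DAY;
        return bytesPerDay * retentionDays * replicas;
    }

    static long toGigabytes(long bytes) {
        return bytes / BYTES_PER_GB;
    }

    public static void main(String[] args) {
        long total = replicatedBytes(2_000, 5_000, 7, 3);
        System.out.println(toGigabytes(total) + " GB"); // 18144 GB
    }
}
```

Changing one input at a time (for example 30-day retention instead of 7) is a quick way to see which design decision dominates the disk bill.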
Drill 3 — Offset Correctness
Given this flow:
read offset 50
store offset 50
write projection row
crash before DB commit
restart at 51
Problem: offset advanced before business effect. Offset 50 is skipped.
Correct flow:
read offset 50
write projection row idempotently
commit DB
store offset 50
crash possible
restart may replay 50 if offset was not stored
Duplicate is safer than skip.
29. Reference Architecture: Stream-Based Read Model Rebuild
Key decisions:
- producer writes business DB and outbox atomically;
- relay publishes to stream with confirms;
- stream retains enough data for rebuild window;
- projection is idempotent by eventId and aggregateVersion;
- offset is stored after projection commit;
- backfill tool reads from first or timestamp;
- audit export has its own offset and failure handling;
- side effects are not triggered by replay consumers.
30. Self-Correction Rubric
You understand RabbitMQ Streams when you can answer these without guessing:
| Question | Strong answer should include |
|---|---|
| Why not use a queue? | non-destructive fan-out/replay/retention requirement |
| What does offset mean? | consumer progress, not message ownership |
| What deletes stream records? | retention, not ack |
| What happens after replay? | same records can drive idempotent handlers again |
| What happens if offset is stored before DB commit? | skip risk |
| What happens if DB commit happens before offset store? | duplicate risk |
| What is the retention SLO? | data window + lag boundary |
| How do consumers avoid unsafe side effects? | idempotency/side-effect ledger/replay guard |
| What is the partition key? | stable entity key for future super stream scaling |
| How is stream health observed? | lag, disk, confirms, replica health, retention risk |
31. Minimal Vocabulary
| Term | Meaning |
|---|---|
| stream | persistent replicated append-only log |
| offset | position in a stream |
| retained data | data still available for replay |
| retention | age/size policy that removes old data |
| segment | disk file unit used by stream storage |
| chunk | batch/storage/transport unit of messages |
| non-destructive consumer | consumer that does not remove data by reading |
| server-side offset tracking | broker stores consumer progress |
| external offset tracking | application stores progress elsewhere |
| super stream | partitioned logical stream made of multiple streams |
| replay | consuming historical records again |
| live tail | consuming only new records from next |
32. Closing Model
A RabbitMQ stream is a durable memory of facts within a retention boundary.
It is best used when the system needs:
- multiple independent readers;
- replayable history;
- high-throughput append;
- large retained backlog;
- event distribution inside a RabbitMQ-centered architecture.
It is not a universal replacement for queues. It is not automatically a stream processing platform. It is not infinite storage. It is not a substitute for idempotency.
The top-tier engineer's move is to model the primitive honestly:
Queue = distribute work and remove after completion.
Stream = record facts and let consumers read/replay independently.
Everything else flows from that distinction.
References
- RabbitMQ Documentation — Streams and Super Streams: https://www.rabbitmq.com/docs/streams
- RabbitMQ Documentation — Stream Plugin: https://www.rabbitmq.com/docs/stream
- RabbitMQ Blog — RabbitMQ Streams Overview: https://www.rabbitmq.com/blog/2021/07/13/rabbitmq-streams-overview
- RabbitMQ Stream Java Client Reference: https://rabbitmq.github.io/rabbitmq-stream-java-client/stable/htmlsingle/
- RabbitMQ Java Stream Tutorial — Offset Tracking: https://www.rabbitmq.com/tutorials/tutorial-two-java-stream