Build CoreOrdered learning track

RabbitMQ Streams Mental Model: Append-Only Log Inside RabbitMQ

Learn Java RabbitMQ, RabbitMQ Streams, Patterns, and Deployment In Action - Part 019

Advanced mental model for RabbitMQ Streams as persistent replicated append-only logs, covering queue-vs-stream semantics, offsets, retention, replay, replication, stream-powered queues, operational constraints, and decision frameworks for Java systems.

24 min read4747 words
PrevNext
Lesson 1935 lesson track0719 Build Core
#java#rabbitmq#rabbitmq-streams#streaming+3 more

Part 019 — RabbitMQ Streams Mental Model: Append-Only Log Inside RabbitMQ

RabbitMQ Streams are not just "large queues".

A stream is a persistent, replicated, append-only log inside RabbitMQ. Producers append records. Consumers read records from positions. Consumption does not remove data. Data disappears only because of retention. That single shift changes the whole design model: queue consumers compete for work; stream consumers independently observe history.

This part builds the mental model required before writing serious Java stream producers and consumers.

We will not repeat AMQP queue fundamentals from earlier parts. The goal here is to know when a stream is the right primitive, what guarantees it gives, what guarantees it does not give, and how to model stream-based systems without accidentally building a fragile pseudo-Kafka or a misused queue.


1. Kaufman Deconstruction

To master RabbitMQ Streams, decompose the skill into ten capabilities:

  1. Primitive recognition — know when the problem needs a queue, a stream, a database table, or a true stream processor.
  2. Storage semantics — understand append-only storage, segments, retention, replication, and disk-oriented behavior.
  3. Consumption semantics — understand non-destructive reads, offsets, replay, attach positions, and consumer progress.
  4. Fan-out semantics — understand why many independent consumers can read the same records without queue-per-subscriber duplication.
  5. Failure semantics — understand what happens when producers, consumers, brokers, leaders, replicas, and networks fail.
  6. Ordering semantics — reason about order per stream, per partition, per producer, per consumer, and under replay.
  7. Operational economics — reason about disk, retention, replica count, segment size, throughput, and backlog.
  8. Java client fit — understand why the dedicated stream protocol and Java Stream Client exist.
  9. Use-case boundaries — know which use cases streams are good for and which ones still need classic/quorum queues.
  10. Self-correction — test assumptions using replay drills, offset drills, retention drills, and failover drills.

The shortcut is this:

Use a queue when the business intent is "do this work once". Use a stream when the business intent is "record this fact and allow multiple readers to observe/replay it".


2. The Core Mental Model

A traditional RabbitMQ queue behaves like a work buffer.

A RabbitMQ stream behaves like a durable log.

The queue model says:

  • a message is pending work;
  • a consumer takes responsibility for work;
  • successful acknowledgement removes the work item from the queue;
  • backlog should normally converge toward zero.

The stream model says:

  • a message is an immutable record;
  • each consumer chooses where to read;
  • processing does not remove the record;
  • backlog is a valid stored history up to the retention boundary.

This is a domain-level difference, not just a transport-level difference.


3. Queue vs Stream: Production Decision Matrix

DimensionQueueStream
Primary intentdistribute workstore and broadcast records
Consumptiondestructive after acknon-destructive
Consumer progressbroker delivery/ack stateoffset per consumer/group/app
Replaynot natural after acknatural until retention removes data
Many subscribersusually queue-per-subscribermany consumers can read same log
Backlog expectationshould drainmay intentionally retain history
Poison handlingnack/requeue/DLX patternsapplication-level skip/park/reprocess
Orderingper queue, weakened by concurrency/requeueper stream/partition append order
Scaling writesone stream until hot; then super streamspartition with super streams
Scaling readscompeting consumers divide workmany readers observe same data
RetentionTTL/queue length semantics for queuesage/size-based stream retention
Best examplescommands, jobs, task executionaudit log, event replay, analytics feed

The mistake is choosing a stream because it sounds more modern. The right question is:

Should successful consumption delete the message from the system?

If yes, use a queue. If no, consider a stream.


4. What RabbitMQ Streams Are For

RabbitMQ Streams complement queues. They do not replace queues.

Typical fit:

  1. Large fan-out

    • Many applications need the same records.
    • Queue-per-subscriber replication becomes inefficient.
    • A stream lets each subscriber consume independently.
  2. Replay and time travel

    • A projection must be rebuilt.
    • A bug fix requires reprocessing the last 7 days.
    • A new downstream system needs historical records.
  3. High-throughput ingestion

    • The workload is append-heavy.
    • The messages are usually processed in sequence.
    • Consumer state can track offset.
  4. Large backlog retention

    • Millions or billions of records are intentionally retained.
    • Backlog is not necessarily a symptom.
    • Disk and retention become first-class design decisions.
  5. Audit and forensics

    • The system needs evidence of what was emitted.
    • Consumers can reconstruct past decisions.
    • Regulatory traceability matters.

Poor fit:

  1. Short-lived request/reply

    • A queue or direct reply-to is usually simpler.
  2. One-off work distribution

    • Work queue semantics are more natural.
  3. Per-message DLQ-driven retry

    • Streams do not behave like normal queues with DLX-based poison flows.
  4. Full stream processing engine

    • RabbitMQ Streams provide log/replay primitives, not a complete Kafka Streams/Flink/Spark model.
  5. Infinite event store

    • Retention removes old data. If the system needs permanent source-of-truth history, use an event store/database plus streams as distribution.

5. The Append-Only Log Model

A stream is append-only.

Producers do not update existing records. Consumers do not delete records. The broker appends new records and manages retention.

This produces several invariants:

InvariantConsequence
Records are immutablecorrections are new records, not edits
Position mattersconsumers track offset/progress
Reads are repeatable within retentionreplay is possible
Retention bounds historyreplay is only possible for retained data
Data is disk-orienteddisk throughput and segment management matter
Replication mattersavailability and safety depend on replica quorum

A stream is a time-indexed record of facts, not a mutable state container.


6. Offset Is Consumer Progress, Not Message Ownership

In a queue, broker delivery state is about message ownership:

  • delivered;
  • unacknowledged;
  • requeued;
  • acknowledged;
  • removed.

In a stream, offset is about consumer progress:

  • "I have processed up to offset 1000";
  • "start from offset 1001";
  • "replay from first";
  • "attach from timestamp";
  • "start at next new message".

Different consumers can point to different offsets in the same stream. That is the source of stream fan-out power.

6.1 Offset Choices

A consumer can generally start from:

Offset specificationMeaningTypical use
firstfirst available retained messageprojection rebuild, test replay
lastlast chunktail inspection, near-latest bootstrap
nextonly future messageslive notification consumer
specific offsetexact numeric positionresume/reprocess after known point
timestampapproximate time-based attachreplay from incident time

The critical word is available. first means first message still retained, not necessarily offset 0 forever.


7. Retention Is the Stream Equivalent of Data Lifecycle

Queues are usually tuned to drain. Streams are tuned to retain.

Retention answers:

  • how much history can be replayed;
  • how much disk is required;
  • how long new consumers can be bootstrapped;
  • how far back incident investigation can go;
  • how much duplicate/replay processing must tolerate old records.

Common retention policies:

PolicyMeaningExample
age-basedretain data for time window7 days, 30 days, 90 days
size-basedretain up to byte limit500 GB
combinedwhichever boundary removes old data30 days or 1 TB

Retention is not free. It creates operational obligations:

  1. disk capacity planning;
  2. segment size planning;
  3. replica count planning;
  4. monitoring for retention truncation;
  5. consumer lag alerting relative to retention window.

7.1 The Most Important Retention Invariant

A consumer is safe only if its lag remains comfortably inside the retention window.

A stream consumer that falls behind retention can no longer resume from its stored offset because the data has been removed.

So a production stream consumer needs two lag measures:

Lag measureMeaning
offset laghow many records behind the end
retention laghow close the consumer is to the oldest retained offset/time

Offset lag tells you throughput pressure. Retention lag tells you data-loss risk for that consumer.


8. Segments and Chunks: Why Streams Behave Differently

Streams are disk-oriented. Internally they are stored in segments and transported in chunks.

You do not need to micromanage this at application level, but you must understand the effect:

  • retention is evaluated at segment boundaries;
  • timestamp-based attach can land at a chunk boundary;
  • large messages reduce throughput;
  • fast disks matter;
  • more replicas increase replication work;
  • batching improves throughput but changes latency and duplicate semantics.

This is why stream performance work is not the same as classic queue performance work. Streams are optimized for large append/read workloads, but they still obey physics: disk I/O, replication, message size, and batching dominate.


9. Stream Declaration and Topology

A RabbitMQ stream can be declared as a queue with queue type stream.

Map<String, Object> arguments = new HashMap<>();
arguments.put("x-queue-type", "stream");
arguments.put("x-max-age", "7D");
arguments.put("x-max-length-bytes", 500_000_000_000L); // 500 GB
arguments.put("x-stream-max-segment-size-bytes", 500_000_000L);

channel.queueDeclare(
    "order-events.stream",
    true,   // durable
    false,  // not exclusive
    false,  // not auto-delete
    arguments
);

For dedicated stream protocol clients, stream creation is often handled through the Stream Java Client:

Environment environment = Environment.builder()
    .uri("rabbitmq-stream://user:password@rabbitmq-0:5552/%2f")
    .build();

environment.streamCreator()
    .stream("order-events.stream")
    .maxAge(Duration.ofDays(7))
    .maxLengthBytes(ByteCapacity.GB(500))
    .create();

The exact builder methods depend on client version, but the design responsibility does not change:

  • name the stream as a domain contract;
  • set retention deliberately;
  • set replication deliberately;
  • monitor lag and storage;
  • document ownership.

9.1 Stream Naming Convention

Use a stream name that communicates domain and semantics.

Good:

order.events.v1.stream
quote.lifecycle.v1.stream
billing.invoice-events.v1.stream
regulatory.case-events.v1.stream

Weak:

events
stream1
rabbit-stream
service-a-output

A stream is a public record. Name it like a public contract.


10. Stream as AMQP Queue vs Stream Protocol

RabbitMQ Streams can be used through AMQP 0-9-1 as stream-backed queues, or through the dedicated RabbitMQ Stream protocol.

Access modeWhen usefulTrade-off
AMQP 0-9-1 stream queueinteroperability, migration, existing clientsnot all stream-specific features/performance
Stream protocol clienthigh throughput, offset tracking, stream-native featuresrequires stream client and port/protocol planning

Use AMQP access when:

  • you need compatibility with existing AMQP consumers;
  • you are gradually migrating;
  • you only need basic stream behavior.

Use the Stream Java Client when:

  • throughput matters;
  • replay/offset tracking is central;
  • you need stream-native producer/consumer APIs;
  • you want client support for deduplication and stream-specific flow control.

11. Fan-Out Without Queue Explosion

In a queue-based pub/sub topology, each durable subscriber typically needs its own queue.

That is valid, but each queue has its own storage and delivery lifecycle. For large persistent fan-out, this can become expensive.

With a stream:

One retained log supports many independent readers.

This changes team boundaries:

Queue pub/subStream pub/sub
subscriber owns queueproducer/platform owns stream
each subscriber stores its copystream stores shared history
late subscriber needs future events only or manual backfilllate subscriber can replay retained history
DLQ per subscriber is naturalper-subscriber poison handling must be app-level

The last point matters. Streams make fan-out cheaper, but they push some per-consumer failure handling into application design.


12. Replay Is a Feature and a Risk

Replay enables powerful operations:

  • rebuild a read model;
  • recompute metrics;
  • backfill a new downstream system;
  • run forensic analysis;
  • validate a new consumer against historical data;
  • recover from a consumer bug.

But replay can also trigger dangerous side effects:

  • duplicate emails;
  • repeated payment calls;
  • repeated external API requests;
  • re-opened workflow tasks;
  • duplicate audit inserts;
  • stale state overriding newer state.

Therefore every stream consumer must classify its handler:

Handler typeReplay safe?Requirement
pure projectionusually yesidempotent upsert/versioning
analytics aggregationyes if dedup/window safedeterministic aggregation or correction logic
external notificationusually noside-effect ledger or replay disabled
command triggerriskycausal guard and idempotency key
audit copyyesappend idempotency or unique event id

Stream replay is not a free lunch. It is a controlled operational mode.


13. Stream Consumer Correctness Model

A stream consumer has three independent states:

  1. read position — what offset has been delivered to the process;
  2. business commit state — what effect has been safely persisted;
  3. stored offset — what offset future process instances will resume from.

Correctness depends on their relationship.

Safe invariant:

Store offset only after the business effect for that offset is durable.

If offset is stored before the business effect, a crash can skip data. If business effect is committed before offset storage, a crash can cause duplicate processing. Duplicate processing is usually safer than skipping data, as long as handlers are idempotent.


14. Consumer Offset Storage Options

Offset storageFitRisk
server-side offset trackingsimple consumers, stream-native appsnot atomic with business DB
relational DB tableprojection/inbox with DB transactionmore app responsibility
distributed KV storehigh-throughput checkpointingatomicity with business state is hard
embedded consumer statelocal tools/testsunsafe for production failover

14.1 Server-Side Offset Tracking

Server-side tracking is convenient. The broker stores the consumer's progress using the consumer name.

Good for:

  • analytics readers;
  • monitoring consumers;
  • consumers whose handler is idempotent;
  • consumers that can tolerate replay of last N records after crash.

Be careful when:

  • the consumer writes to a database;
  • offset and database commit must be atomic;
  • multiple instances accidentally use the same consumer name without intended coordination.

14.2 External Offset Store

External offset storage is often better when the consumer's state is in a database.

Example table:

CREATE TABLE stream_consumer_checkpoint (
    stream_name        varchar(255) NOT NULL,
    consumer_name      varchar(255) NOT NULL,
    partition_name     varchar(255) NOT NULL DEFAULT '',
    last_offset        bigint NOT NULL,
    updated_at         timestamp NOT NULL DEFAULT current_timestamp,
    PRIMARY KEY (stream_name, consumer_name, partition_name)
);

The consumer can update projection state and checkpoint in one transaction.


15. Stream Ordering Model

A stream preserves append order within that stream.

That does not mean your business always sees correct order.

You still need to account for:

  • multiple producers appending interleaved records;
  • retries and duplicate publish attempts;
  • consumers processing asynchronously;
  • projections committing out of order;
  • retention removing old offsets;
  • super stream partitions when used later.

For one stream, one consumer, synchronous handler, and monotonic offset commit, order is simple.

For a consumer that dispatches to a worker pool, order becomes a design choice.

If order matters, do not process offsets concurrently for the same ordering key unless you have explicit gap/stale handling.


16. Stream Is Not a Full Stream Processing Framework

RabbitMQ Streams provide storage, replay, fan-out, and high-throughput transport.

They do not automatically provide:

  • durable window state;
  • exactly-once stateful processing;
  • repartitioning operators;
  • joins;
  • watermarks;
  • event-time processing;
  • global aggregation correctness;
  • schema registry;
  • state store changelog management.

You can build application-level processing on top of streams, but you own the state model.

Decision rule:

NeedBetter fit
Durable log and replay inside RabbitMQ ecosystemRabbitMQ Streams
Work distribution with retry/DLQquorum queue/classic queue
Stateful event-time processing with joins/windowsKafka Streams/Flink/Spark-like tooling
Permanent event source of truthevent store/database + outbox
High fan-out retained event feedRabbitMQ Streams

17. Stream + Queue Hybrid Architecture

The strongest RabbitMQ designs often combine queues and streams.

Example:

Stream role:

  • durable event feed;
  • replayable history;
  • fan-out to projections and analytics.

Queue role:

  • work dispatch;
  • retry/DLQ;
  • command handling;
  • bounded worker concurrency.

Do not force all messaging into one primitive. Use each primitive according to its semantics.


18. Stream-Powered Event-Carried State Transfer

A stream is especially useful for event-carried state transfer.

Instead of publishing only:

{
  "eventType": "OrderUpdated",
  "orderId": "O-1001"
}

publish enough state for consumers to build local projections:

{
  "eventId": "01JZ8K9P6S9K1W0YVZK8A1QJ4M",
  "eventType": "OrderPriced",
  "schemaVersion": 1,
  "occurredAt": "2026-07-01T10:15:30Z",
  "aggregateId": "O-1001",
  "aggregateVersion": 7,
  "payload": {
    "orderId": "O-1001",
    "customerId": "C-991",
    "currency": "IDR",
    "totalAmount": 1725000,
    "pricingStatus": "FINAL"
  }
}

Why it matters:

  • consumers do not need synchronous calls back to producer service;
  • replay can rebuild downstream projections;
  • historical events remain explainable;
  • consumer coupling is lower.

But this requires schema discipline and privacy discipline. Retained logs can amplify PII mistakes.


19. Retained Logs and Compliance

Streams are attractive for auditability, but auditability is not the same as compliance.

Consider:

ConcernDesign implication
PIIavoid or minimize sensitive payloads
retention lawretention must match policy/legal basis
deletion requestimmutable retained logs complicate erase workflows
encryptionuse TLS in transit; consider application-level encryption for sensitive fields
access controlvhost/user permissions must separate producers, consumers, operators
forensic trailevent IDs, correlation IDs, producer identity, schema version
evidence qualitytimestamps, causation, ordering key, source commit reference

For regulated systems, the stream should record facts, but the source-of-truth evidence model must be intentional.


20. Failure Model for Streams

Streams change failure semantics.

20.1 Producer Failure

Possible outcomes:

Failure pointBroker stateProducer stateRisk
before appendno recordpublish failedmissing event unless retried
after append before confirmrecord existsproducer uncertainduplicate on retry
after confirmrecord existsproducer safenormal

Mitigation:

  • publisher confirms;
  • producer-side deduplication where appropriate;
  • stable producer identity and publishing IDs;
  • outbox relay for database-to-stream atomicity.

20.2 Consumer Failure

Possible outcomes:

Failure pointBusiness effectStored offsetResult
before effectno effectold offsetreplay safe
after effect before offseteffect committedold offsetduplicate processing
after offset before effectno effectnew offsetdata skipped
after botheffect committednew offsetnormal

Design bias:

Prefer duplicates over skips.

20.3 Broker/Replica Failure

Streams are replicated. Availability and safety depend on replica state and quorum. Operationally, this means:

  • do not treat streams as single-node files;
  • monitor replica health;
  • use uneven cluster sizes where quorum systems are involved;
  • do not decommission nodes without stream replica planning;
  • understand leader placement and client connection locality.

21. Backpressure in Stream Systems

Stream systems move pressure differently from queue systems.

In a queue:

  • pressure often appears as queue depth and unacked messages.

In a stream:

  • pressure appears as consumer lag, publish confirm latency, disk usage, and retention risk.

Key stream pressure signals:

SignalMeaning
publish confirm latencybroker/disk/replication pressure
consumer offset lagconsumer cannot keep up
oldest retained offset advancingretention is trimming history
disk usagestorage pressure
replica sync statussafety/availability pressure
processing latencydownstream bottleneck

A queue backlog says "work is not done". A stream lag says "this reader is behind".

That difference matters operationally.


22. Stream Consumer Lag and SLA

Every stream consumer should have an explicit lag SLA.

Example:

ConsumerPurposeLag SLORetention
order-read-modelcustomer-facing projection< 5 seconds7 days
fraud-analyticsnear-real-time analytics< 2 minutes30 days
audit-exporterregulatory archive< 1 hour90 days
backfill-toolmanual replayno live SLA90 days

Alert levels:

LevelConditionAction
warninglag exceeds SLOinspect consumer/downstream
criticallag approaches retention boundaryscale/fix urgently
pageconsumer offset is no longer retaineddata recovery/backfill required

A stream without lag monitoring is unsafe.


23. Stream Payload Design

Good stream payloads are:

  • immutable;
  • versioned;
  • replay-safe;
  • causally traceable;
  • keyed for partitioning later;
  • privacy-reviewed;
  • self-describing enough for downstream consumers.

Recommended envelope:

{
  "eventId": "01JZ8M8RZ9N9B5SHGBAJ7H7W2Q",
  "eventType": "OrderSubmitted",
  "schemaVersion": 1,
  "producer": "order-service",
  "stream": "order.events.v1.stream",
  "occurredAt": "2026-07-01T12:30:00Z",
  "publishedAt": "2026-07-01T12:30:01Z",
  "correlationId": "quote-req-77",
  "causationId": "01JZ8M7P2G6Z9T3V1D76Z7RE4E",
  "aggregateId": "O-1001",
  "aggregateVersion": 12,
  "partitionKey": "O-1001",
  "payload": {}
}

Keep the domain envelope stable even if wire encoding changes.


24. Streams and Deduplication

Producer confirms protect against silent loss, but they do not eliminate ambiguity after network failure. A producer may append a record and fail before receiving confirmation. If it retries, duplicates can appear unless deduplication is used.

At a mental-model level, deduplication needs:

  • stable producer identity;
  • monotonic publishing identity/sequence;
  • careful single-writer discipline for that producer identity;
  • no unsafe multi-threaded publishing with shared producer name.

Even when broker-side deduplication is used, business-level idempotency remains important.

Why?

  • different producers can publish semantically duplicate events;
  • consumers can replay;
  • retention/backfill can re-drive old records;
  • external side effects can still duplicate.

Broker deduplication helps the publish path. Idempotency protects the business effect path.


25. Streams and Ordering Keys

For a single stream, append order is one ordered lane.

This is simple but can become a bottleneck. Super Streams, introduced later in this series, partition a logical stream into multiple streams. That enables scaling, but ordering is then per partition, not globally across all partitions.

Prepare for that now by putting a partition key in every event envelope.

Good keys:

DomainKey
order lifecycleorderId
quote lifecyclequoteId
account eventsaccountId
regulatory casecaseId
device telemetrydeviceId

Weak keys:

KeyProblem
random UUIDdestroys locality
event typecreates hot partitions
timestamp bucket onlyweak causality
customer segmentlikely skewed

Even before using Super Streams, design events as if they may later be partitioned.


26. Common Stream Anti-Patterns

26.1 Using Streams as a Job Queue

Bad sign:

  • every record should be processed by exactly one worker;
  • retry/DLQ is central;
  • consumption should remove work;
  • replay is dangerous.

Use a queue.

26.2 Using Streams as the Only Source of Truth

Bad sign:

  • retention can delete data required for legal/business history;
  • compaction is expected but not provided like an event store;
  • queries need random access by entity.

Use database/event store as source of truth and stream as distribution log.

26.3 Consumer With Unsafe Side Effects

Bad sign:

  • replay sends emails again;
  • replay charges cards again;
  • replay creates duplicate tickets.

Use a side-effect ledger and idempotency key.

26.4 No Retention Budget

Bad sign:

  • retention is omitted;
  • disk is assumed infinite;
  • consumer lag is not monitored;
  • no plan for lag exceeding retention.

Treat retention as part of API contract.

26.5 One Giant Stream for Everything

Bad sign:

  • all domains publish to events.stream;
  • consumers filter most events away;
  • schema governance collapses;
  • retention needs conflict.

Use domain-oriented streams.


27. Design Review Checklist

Before approving a RabbitMQ Stream design, ask:

  1. What is the stream's domain contract?
  2. Who owns the stream?
  3. Why is a stream better than a quorum queue here?
  4. What is the retention policy?
  5. What is the expected write throughput?
  6. What is the expected read fan-out?
  7. What is the message size distribution?
  8. What is the replica factor / cluster placement?
  9. What happens if a consumer is down for 2 hours? 2 days? 2 weeks?
  10. What happens if the retention boundary overtakes a consumer?
  11. How does a consumer store offset safely?
  12. Which consumers are replay-safe?
  13. Which consumers must never replay side effects?
  14. What is the event schema versioning strategy?
  15. What PII is in the stream?
  16. What is the partition key if this later becomes a super stream?
  17. What dashboards and alerts exist?
  18. What is the emergency procedure for stuck consumers?

If these questions have no answers, the design is not production-ready.


28. Practice: Build the Mental Model in 90 Minutes

Drill 1 — Queue vs Stream Classification

Classify each workload:

WorkloadQueue or stream?Why?
send password reset emailqueueexactly-one worker style, replay dangerous
order lifecycle event feedstreammany consumers and replay value
image resizing jobqueuework distribution
regulatory case state auditstream + archive DBretained/replayable facts
customer profile projection rebuildstreamrebuild from retained events
payment capture commandqueuecommand with strict side-effect handling

Drill 2 — Retention Budget

For a stream with:

  • 2 KB average message size;
  • 5,000 messages/sec;
  • 7-day retention;
  • 3 replicas.

Approximate raw data:

2 KB * 5,000/sec = 10 MB/sec
10 MB/sec * 86,400 sec/day = 864 GB/day
864 GB/day * 7 = 6,048 GB
6,048 GB * 3 replicas = 18,144 GB raw replicated storage

The exact storage overhead depends on broker internals, metadata, segmenting, filesystem, and compression/batching choices, but the first-order estimate is enough to catch fantasy designs.

Drill 3 — Offset Correctness

Given this flow:

read offset 50
store offset 50
write projection row
crash before DB commit
restart at 51

Problem: offset advanced before business effect. Offset 50 is skipped.

Correct flow:

read offset 50
write projection row idempotently
commit DB
store offset 50
crash possible
restart may replay 50 if offset was not stored

Duplicate is safer than skip.


29. Reference Architecture: Stream-Based Read Model Rebuild

Key decisions:

  • producer writes business DB and outbox atomically;
  • relay publishes to stream with confirms;
  • stream retains enough data for rebuild window;
  • projection is idempotent by eventId and aggregateVersion;
  • offset is stored after projection commit;
  • backfill tool reads from first or timestamp;
  • audit export has its own offset and failure handling;
  • side effects are not triggered by replay consumers.

30. Self-Correction Rubric

You understand RabbitMQ Streams when you can answer these without guessing:

QuestionStrong answer should include
Why not use a queue?non-destructive fan-out/replay/retention requirement
What does offset mean?consumer progress, not message ownership
What deletes stream records?retention, not ack
What happens after replay?same records can drive idempotent handlers again
What happens if offset is stored before DB commit?skip risk
What happens if DB commit happens before offset store?duplicate risk
What is the retention SLO?data window + lag boundary
How do consumers avoid unsafe side effects?idempotency/side-effect ledger/replay guard
What is the partition key?stable entity key for future super stream scaling
How is stream health observed?lag, disk, confirms, replica health, retention risk

31. Minimal Vocabulary

TermMeaning
streampersistent replicated append-only log
offsetposition in a stream
retained datadata still available for replay
retentionage/size policy that removes old data
segmentdisk file unit used by stream storage
chunkbatch/storage/transport unit of messages
non-destructive consumerconsumer that does not remove data by reading
server-side offset trackingbroker stores consumer progress
external offset trackingapplication stores progress elsewhere
super streampartitioned logical stream made of multiple streams
replayconsuming historical records again
live tailconsuming only new records from next

32. Closing Model

A RabbitMQ stream is a durable memory of facts within a retention boundary.

It is best used when the system needs:

  • multiple independent readers;
  • replayable history;
  • high-throughput append;
  • large retained backlog;
  • event distribution inside a RabbitMQ-centered architecture.

It is not a universal replacement for queues. It is not automatically a stream processing platform. It is not infinite storage. It is not a substitute for idempotency.

The top-tier engineer's move is to model the primitive honestly:

Queue  = distribute work and remove after completion.
Stream = record facts and let consumers read/replay independently.

Everything else flows from that distinction.


References

Lesson Recap

You just completed lesson 19 in build core. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.