Streams Deep Dive: Event Log, Consumer Group, Pending Entries, Replay

Learn Java Redis In Action - Part 007

Production-grade Redis Streams for Java systems: append-only event logs, consumer groups, pending entries, replay, trimming, dead-letter handling, and operational boundaries.

2026-07-02 | 12 min read | 2277 words

Part 007 — Streams Deep Dive: Event Log, Consumer Group, Pending Entries, Replay

Part 006 covered Lists, Sets, and Sorted Sets. Those structures are powerful, but when the requirement becomes:

append event
consume independently
track acknowledgement
recover failed consumers
replay recent history
inspect pending work

then Redis Streams become the more natural Redis primitive.

A Redis Stream is not just a queue. It is an append-only log-like data structure with message IDs, field-value entries, range reads, consumer groups, a pending entries list, explicit acknowledgement, and trimming.

The production question is not:

Can Redis Streams replace Kafka?

The better question is:

For this bounded workflow, do I need lightweight durable-ish stream processing inside Redis,
or do I need a full event streaming platform with long retention, partition governance,
large-scale replay, schema registry, ecosystem tooling, and independent storage scaling?

This part teaches Redis Streams as a practical Java engineering tool.

1. Kaufman Skill Decomposition

Target skill:

Given a workflow that needs event-like delivery, choose Redis Streams intentionally, design the stream key model, write/read/ack/retry/dead-letter lifecycle, and operate it without unbounded memory, hidden duplicate processing, or unrecoverable pending messages.

Sub-skills:

Sub-skill	What you must be able to do
Stream modeling	Choose stream key, message ID, fields, version, tenant boundary, retention
Producer design	Append with `XADD`, enforce max length, publish stable event envelopes
Consumer design	Use `XREAD` or `XREADGROUP` correctly
Consumer group design	Model competing consumers, independent groups, and ownership
Ack lifecycle	Understand when messages enter and leave the Pending Entries List
Replay	Re-read history by ID and support deterministic recovery
Retry	Claim idle pending messages with `XAUTOCLAIM` or `XCLAIM`
Poison handling	Move toxic messages to dead-letter streams without blocking healthy flow
Backpressure	Limit batch size, blocking wait, pending size, and handler concurrency
Java integration	Implement safe producer/consumer loops with Lettuce/Spring Data Redis style APIs
Operations	Monitor length, lag, pending entries, idle consumers, memory, and trimming

Redis Streams are easy to start and easy to misuse. A senior engineer treats them as a workflow state machine, not merely a command API.

2. The Mental Model

A stream contains ordered entries. Each entry has:

stream key
entry id
a set of field-value pairs

Example:

XADD order-events * \
  eventType OrderSubmitted \
  orderId ord-1001 \
  customerId cust-77 \
  schemaVersion 1 \
  occurredAt 2026-07-02T10:15:00Z

Redis returns an ID such as:

1751441700000-0

The ID has two parts:

milliseconds-sequence

Conceptually:

1751441700000-0
|             |
|             sequence for same millisecond
milliseconds timestamp

Do not depend on stream IDs as business timestamps unless you intentionally accept server-time behavior. For business events, include occurredAt as a field.
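
To make the two-part ID concrete, here is a minimal Java sketch that splits an ID into its millisecond and sequence parts and orders IDs the way the stream does. The `StreamId` record is a hypothetical helper, not part of any Redis client, and it assumes both parts fit in a signed `long`.

```java
import java.time.Instant;

/** Hypothetical helper: parsed form of a stream entry ID such as "1751441700000-0". */
record StreamId(long millis, long sequence) implements Comparable<StreamId> {

    static StreamId parse(String raw) {
        int dash = raw.indexOf('-');
        if (dash <= 0 || dash == raw.length() - 1) {
            throw new IllegalArgumentException("not a stream ID: " + raw);
        }
        return new StreamId(
                Long.parseLong(raw.substring(0, dash)),
                Long.parseLong(raw.substring(dash + 1)));
    }

    /** Redis server time at append, which is not the business event time. */
    Instant serverTime() {
        return Instant.ofEpochMilli(millis);
    }

    /** Orders by milliseconds first, then by sequence within the same millisecond. */
    @Override
    public int compareTo(StreamId other) {
        int byMillis = Long.compare(millis, other.millis);
        return byMillis != 0 ? byMillis : Long.compare(sequence, other.sequence);
    }

    @Override
    public String toString() {
        return millis + "-" + sequence;
    }
}
```

`serverTime()` is useful for rough age checks in dashboards; handlers should still read `occurredAt` from the entry fields.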

3. Stream vs List vs Pub/Sub vs Kafka

Redis gives several messaging-ish options. They are not interchangeable.

Need	Better Redis primitive	Why
Fire-and-forget live notification	Pub/Sub	Lowest ceremony, no persistence, no replay
Simple local queue with minimal reliability	List	`LPUSH`/`BRPOP` style queue
Delayed execution	Sorted Set	Score as scheduled time
Durable-ish recent event workflow	Stream	Append log, consumer group, ack, replay
Long-retention distributed event backbone	Kafka/Pulsar/etc.	Partition governance, retention, replay, ecosystem

A useful heuristic:

If missing the message is acceptable -> Pub/Sub may be enough.
If every item must be handled at least once -> Stream or external broker.
If replay/history is a product/platform requirement -> usually Kafka-like system.
If the workflow is bounded and Redis is already operationally accepted -> Stream may be a good fit.

Redis Streams are often excellent for:

small-to-medium internal event fanout
background job pipelines
cache invalidation with replay
webhook delivery buffers
short-retention audit-ish operational events
real-time ingestion before batch flush
notification dispatch pipelines
retryable side effects
local stream bridge between services

They are usually not enough for:

multi-day or multi-month canonical event log
enterprise event mesh
cross-team data contracts with schema registry
unbounded analytics ingestion
large replay by many independent historical consumers
strict ordering across arbitrary entities
exactly-once transactional processing

4. Basic Commands

Append

XADD mystream * field1 value1 field2 value2

Read from ID

XREAD COUNT 10 STREAMS mystream 0

Read new entries only:

XREAD BLOCK 5000 COUNT 10 STREAMS mystream $

$ means:

Only entries added after this command starts waiting.

That is useful for live tailing, but dangerous for recovery because old messages are skipped.

Range query

XRANGE mystream - + COUNT 10

Reverse range:

XREVRANGE mystream + - COUNT 10

Trim

XTRIM mystream MAXLEN ~ 100000

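Approximate trimming with ~ is usually preferred for performance: Redis may keep slightly more than the requested number of entries so that it only has to evict whole internal nodes.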

Delete entry

XDEL mystream 1751441700000-0

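XDEL removes the entry from the stream, but it does not remove references to that ID from a consumer group's Pending Entries List, so deleting an entry does not automatically mean your consumer group state is clean. For normal processing, use acknowledgement.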

5. Stream Entry Design

A stream entry is a field-value map. The shape matters.

A weak event:

XADD order-events * orderId 123 status PAID

A production-oriented event:

XADD order-events MAXLEN ~ 1000000 * \
  eventId evt-01JZ8RE0A5QA4T8MZQWJR7MZ6F \
  eventType OrderPaymentCaptured \
  schemaVersion 3 \
  aggregateType Order \
  aggregateId ord-1001 \
  tenantId tenant-a \
  producer billing-service \
  occurredAt 2026-07-02T10:15:00.123Z \
  traceId 4bf92f3577b34da6a3ce929d0e0e4736 \
  payload '{"paymentId":"pay-777","amount":"149000","currency":"IDR"}'

Recommended envelope fields:

Field	Purpose
`eventId`	Idempotency and correlation independent of Redis stream ID
`eventType`	Routing and handler selection
`schemaVersion`	Compatibility during evolution
`aggregateType`	Domain grouping
`aggregateId`	Entity identity
`tenantId`	Multi-tenant boundary and auditability
`producer`	Ownership and debugging
`occurredAt`	Domain/event time
`traceId`	Distributed tracing
`payload`	Structured body, commonly JSON

Do not make handlers depend on a fragile positional payload. Streams are field-value oriented; use that to make messages inspectable.
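
A consumer can check the envelope before dispatching to a handler. The sketch below is one possible shape: `EnvelopeValidator` is a hypothetical class, and treating `traceId` as optional is an assumption rather than something the envelope table mandates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical validator for the envelope fields recommended above. */
final class EnvelopeValidator {

    // Assumption: traceId is optional, every other envelope field is required.
    static final List<String> REQUIRED_FIELDS = List.of(
            "eventId", "eventType", "schemaVersion", "aggregateType", "aggregateId",
            "tenantId", "producer", "occurredAt", "payload");

    /** Returns the required fields that are absent or blank in a stream entry. */
    static List<String> missingFields(Map<String, String> entry) {
        List<String> missing = new ArrayList<>();
        for (String field : REQUIRED_FIELDS) {
            String value = entry.get(field);
            if (value == null || value.isBlank()) {
                missing.add(field);
            }
        }
        return missing;
    }
}
```

An entry with a non-empty result is a validation failure, which later sections route to the dead-letter stream rather than retrying.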

6. Key Design for Streams

A stream key is a routing and scaling decision.

Possible models:

order-events
order-events:{tenant-a}
order-events:{tenant-a}:payments
service:billing:outbox
notification-dispatch:{region-apac}

Single Global Stream

stream: order-events

Pros:

simple
single ordering scope
one place to inspect
simple consumer group setup

Cons:

hot key risk
all tenants/workflows share backlog
harder isolation
limited horizontal distribution in Redis Cluster

Per-Tenant Stream

stream: order-events:{tenantId}

Pros:

tenant isolation
clearer deletion/retention
smaller per-key streams

Cons:

more keys
consumer orchestration becomes harder
must discover active tenant streams

Per-Workflow Stream

stream: billing:webhook-delivery
stream: fulfillment:shipment-events
stream: notification:email-dispatch

Pros:

bounded operational responsibility
clear handler ownership
different retention per workflow

Cons:

cross-workflow ordering not guaranteed
requires contract discipline

Cluster Hash Tags

Redis Cluster places keys by hash slot. If a Lua script or transaction needs multiple keys, use hash tags carefully:

stream:order:{tenant-a}
dlq:order:{tenant-a}
retry:index:{tenant-a}

The part inside {} controls slot placement. Do not overuse hash tags to force everything into one slot; that can create a hot shard.
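
To check in a test that related keys really land in the same slot, the slot calculation can be reproduced in plain Java. This sketch follows the Redis Cluster rules (CRC16, XMODEM variant, modulo 16384, hashing only the first non-empty `{...}` section); `ClusterSlots` is a hypothetical helper, and real clients ship their own equivalent.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that mirrors how Redis Cluster maps a key to a hash slot. */
final class ClusterSlots {

    /** Returns the hash tag content if the key has a non-empty {...} section, else the whole key. */
    static String hashTagOrKey(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                return key.substring(open + 1, close);
            }
        }
        return key;
    }

    /** CRC16 with polynomial 0x1021 and initial value 0 (the XMODEM variant). */
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = (crc & 0x8000) != 0 ? (crc << 1) ^ 0x1021 : crc << 1;
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    static int slot(String key) {
        return crc16(hashTagOrKey(key).getBytes(StandardCharsets.UTF_8)) % 16384;
    }
}
```

With this, `slot("stream:order:{tenant-a}")` and `slot("dlq:order:{tenant-a}")` are equal, which is what a multi-key script or transaction needs.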

7. Consumer Groups

Without consumer groups, each reader tracks its own last-read ID. With consumer groups, Redis tracks delivery state for a group.

Create group:

XGROUP CREATE order-events billing-workers $ MKSTREAM

Meaning:

stream: order-events
group: billing-workers
start at: $
create stream if missing: yes

Use 0 instead of $ if the group should process existing history.

XGROUP CREATE order-events billing-workers 0 MKSTREAM

Read as a consumer in a group:

XREADGROUP GROUP billing-workers worker-1 COUNT 10 BLOCK 5000 STREAMS order-events >

The > means:

Give me messages never delivered to any consumer in this group.

When a message is delivered through XREADGROUP, it enters the group Pending Entries List. It remains pending until acknowledged:

XACK order-events billing-workers 1751441700000-0

Important invariant of the consumer group model:

Within one consumer group, a new message is assigned to one consumer at a time.
Across different consumer groups, each group receives its own independent view.

This means a stream can support fanout by using multiple groups:

order-events
  group billing-projection
  group search-indexer
  group notification-dispatcher
  group fraud-signal-ingestion

Each group has independent delivery and acknowledgement state.

8. Pending Entries List

The Pending Entries List, often called PEL, is the state that makes consumer groups useful.

A message becomes pending when delivered:

XREADGROUP -> pending

A message leaves pending when acknowledged:

XACK -> no longer pending

A message stays pending when the worker crashes after receiving it but before XACK.

Inspect pending summary:

XPENDING order-events billing-workers

Detailed pending entries:

XPENDING order-events billing-workers - + 10

Typical fields include:

message id
consumer name
idle time
delivery count

This is operationally critical. A growing PEL usually means one of these:

handler is failing
handler is too slow
workers crashed
ack is missing
poison messages block progress
consumer concurrency too high
external dependency is degraded

A stream with a large PEL is not healthy just because producers can still append. Backlog and pending are different problems.

Backlog: messages not yet delivered to the group.
Pending: messages delivered but not acknowledged.
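
The difference between backlog and pending is easier to see in a toy model. The class below is an in-memory illustration only: it is not a Redis client, `GroupStateModel` is a hypothetical name, and it models one stream with one consumer group.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of one stream and one consumer group. Not a Redis client. */
final class GroupStateModel {

    private final List<String> entries = new ArrayList<>();            // the stream, in append order
    private final Map<String, String> pending = new LinkedHashMap<>(); // entry -> owning consumer (the PEL)
    private int nextUndelivered = 0;                                   // stands in for the group's last-delivered ID

    /** Models XADD. */
    void append(String entry) {
        entries.add(entry);
    }

    /** Models XREADGROUP ... >: delivers never-delivered entries and records them as pending. */
    List<String> readNew(String consumer, int count) {
        List<String> batch = new ArrayList<>();
        while (batch.size() < count && nextUndelivered < entries.size()) {
            String entry = entries.get(nextUndelivered++);
            pending.put(entry, consumer);
            batch.add(entry);
        }
        return batch;
    }

    /** Models XACK: returns true only if the entry was pending. */
    boolean ack(String entry) {
        return pending.remove(entry) != null;
    }

    /** Entries not yet delivered to the group. */
    int backlog() {
        return entries.size() - nextUndelivered;
    }

    /** Entries delivered but not acknowledged. */
    int pendingCount() {
        return pending.size();
    }
}
```

After three appends and one `readNew("worker-1", 2)`, the model reports a backlog of 1 and a pending count of 2; only `ack` reduces the pending count.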

9. Claiming Failed Work

If a consumer dies, its pending messages do not automatically become new messages. Another consumer must claim them after they have been idle long enough.

Redis 6.2 and later support XAUTOCLAIM:

XAUTOCLAIM order-events billing-workers worker-2 60000 0-0 COUNT 100

Meaning:

stream: order-events
group: billing-workers
claiming consumer: worker-2
minimum idle time: 60 seconds
start scanning at: 0-0
batch: 100

High-level retry loop:

1. Read new messages with XREADGROUP ... >
2. Process messages.
3. XACK successful messages.
4. Periodically XAUTOCLAIM old pending messages.
5. Retry claimed messages.
6. If delivery count exceeds threshold, move to DLQ and XACK original.

Do not immediately claim messages with tiny idle times. That creates duplicate processing while the original worker may still be legitimately working.

Claiming threshold should consider:

p99 handler time
external dependency p99
GC pause budget
network timeout
shutdown grace period
maximum acceptable duplicate delay

10. Processing Semantics

Redis Streams with consumer groups are normally used for at-least-once processing.

At-least-once means:

A message may be delivered again if the worker fails before acknowledgement.

Therefore handlers must be idempotent.

Bad handler:

on PaymentCaptured event:
  send receipt email
  XACK

If the worker sends the email and crashes before XACK, another worker may send another email.

Better handler:

on PaymentCaptured event:
  claim idempotency key receipt:{eventId}
  if already processed -> XACK
  send receipt email
  persist receipt delivery record
  XACK

Example idempotency marker:

SET receipt-sent:evt-123 1 NX EX 2592000

If the side effect is external and non-idempotent, Redis Streams alone do not solve correctness. You need domain-level idempotency, transactional outbox, fencing, unique constraints, or downstream idempotent APIs.
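
The "better handler" above can be sketched without Redis by letting an in-memory set stand in for `SET ... NX`. Everything here is illustrative: `IdempotentReceiptHandler` is a hypothetical class, the `Consumer<String>` stands in for the email sender, and a real marker would live in Redis or the database with an expiry.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical handler showing marker-then-side-effect idempotency. */
final class IdempotentReceiptHandler {

    // Stand-in for `SET receipt-sent:<eventId> 1 NX EX ...`; Set.add returns false if already present.
    private final Set<String> markers = ConcurrentHashMap.newKeySet();
    private final Consumer<String> sendReceipt;

    IdempotentReceiptHandler(Consumer<String> sendReceipt) {
        this.sendReceipt = sendReceipt;
    }

    /** Returns true when the caller may XACK the message, false when it must stay pending. */
    boolean handle(String eventId) {
        if (!markers.add(eventId)) {
            return true; // duplicate delivery: side effect already ran, safe to ack
        }
        try {
            sendReceipt.accept(eventId);
            return true;
        } catch (RuntimeException e) {
            markers.remove(eventId); // release the marker so a later retry can run
            return false;            // do not ack
        }
    }
}
```

This sketch still has the gap discussed later under the idempotent consumer pattern: a process crash between taking the marker and sending the email leaves the marker set without the side effect.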

11. Stream Handler State Machine

A production consumer should be modeled explicitly.

Key transitions:

Transition	Redis operation	Notes
Produce	`XADD`	Include envelope and bounded retention
Receive	`XREADGROUP`	Message enters PEL
Success	`XACK`	Message leaves PEL
Crash	none	Message stays pending
Recovery	`XAUTOCLAIM`	Ownership moves to new consumer
Poison	`XADD` to DLQ + `XACK` original	Avoid infinite retry loop

This state machine prevents the common misconception that a consumer group is automatically self-healing. It is not. You must run the recovery path.

12. Java Producer Design

A Java producer should publish stable event envelopes. The stream should not receive arbitrary domain object serialization without versioning.

Example domain event record:

public record RedisStreamEvent(
        String eventId,
        String eventType,
        int schemaVersion,
        String aggregateType,
        String aggregateId,
        String tenantId,
        String producer,
        Instant occurredAt,
        String traceId,
        String payloadJson
) {}

Repository-style publisher:

public interface RedisStreamPublisher {
    String publish(String streamKey, RedisStreamEvent event);
}

Pseudo-implementation shape:

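import io.lettuce.core.XAddArgs;
import io.lettuce.core.api.sync.RedisCommands;

import java.util.LinkedHashMap;
import java.util.Map;

public final class OrderEventPublisher implements RedisStreamPublisher {

    private final RedisCommands<String, String> redis;
    private final long maxLength;

    public OrderEventPublisher(RedisCommands<String, String> redis, long maxLength) {
        this.redis = redis;
        this.maxLength = maxLength;
    }

    @Override
    public String publish(String streamKey, RedisStreamEvent event) {
        // LinkedHashMap keeps field order stable, which makes entries easier to inspect.
        Map<String, String> body = new LinkedHashMap<>();
        body.put("eventId", event.eventId());
        body.put("eventType", event.eventType());
        body.put("schemaVersion", Integer.toString(event.schemaVersion()));
        body.put("aggregateType", event.aggregateType());
        body.put("aggregateId", event.aggregateId());
        body.put("tenantId", event.tenantId());
        body.put("producer", event.producer());
        body.put("occurredAt", event.occurredAt().toString());
        body.put("traceId", event.traceId());
        body.put("payload", event.payloadJson());

        // Actual Lettuce APIs vary by version and command interface.
        // The important design is: XADD with bounded retention and explicit envelope.
        // Equivalent command: XADD <streamKey> MAXLEN ~ <maxLength> * field value ...
        XAddArgs args = XAddArgs.Builder.maxlen(maxLength).approximateTrimming();
        return redis.xadd(streamKey, args, body);
    }
}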

For production, wrap this with:

command timeout
metric: publish latency
metric: publish failures
metric: payload size
metric: stream key
trace attributes
bounded stream length policy

Avoid letting application code call XADD directly from random places. Use a repository/gateway so stream contracts are centralized.

13. Java Consumer Loop Design

A naive consumer loop:

while (true) {
    var messages = readGroup();
    for (var message : messages) {
        handle(message);
        ack(message);
    }
}

A production consumer loop needs more structure:

initialize group if missing
start live read loop
start pending-claim loop
limit batch size
limit handler concurrency
use bounded executor
apply shutdown flag
stop reading on shutdown
finish or abandon in-flight work
ack only after successful handling
DLQ poison messages
emit metrics

Pseudo-code:

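// StreamConsumerClient, EventHandler, DeadLetterPublisher and StreamMessage are
// this lesson's illustrative abstractions, not types from a Redis client library.
public final class StreamWorker implements Runnable {

    private final AtomicBoolean running = new AtomicBoolean(true);
    private final StreamConsumerClient client;
    private final EventHandler handler;
    private final DeadLetterPublisher deadLetters;
    private final String consumerName;

    public StreamWorker(StreamConsumerClient client, EventHandler handler,
                        DeadLetterPublisher deadLetters, String consumerName) {
        this.client = client;
        this.handler = handler;
        this.deadLetters = deadLetters;
        this.consumerName = consumerName;
    }

    @Override
    public void run() {
        // Wraps XGROUP CREATE ... MKSTREAM and tolerates an already existing group.
        client.ensureGroupExists("billing-workers");

        while (running.get()) {
            // XREADGROUP GROUP billing-workers <consumer> COUNT 50 BLOCK 5000 STREAMS <key> >
            List<StreamMessage> batch = client.readNewMessages(
                    "billing-workers",
                    consumerName,
                    Duration.ofSeconds(5),
                    50
            );

            for (StreamMessage message : batch) {
                processOne(message);
            }
        }
    }

    private void processOne(StreamMessage message) {
        try {
            handler.handle(message);
            client.ack(message);
        } catch (ValidationException e) {
            // Retrying cannot fix a malformed message: park it and ack the original.
            deadLetters.publish(message, e, "validation_failed");
            client.ack(message);
        } catch (RetryableException e) {
            // Do not ack. Let the pending recovery loop claim later.
            client.recordFailureMetric(message, e);
        } catch (Exception e) {
            // Decide whether unknown exceptions are retryable or DLQ-worthy.
            // Here the message is left pending so the delivery-count threshold decides.
            client.recordFailureMetric(message, e);
        }
    }

    // Stops reading new batches; the current batch finishes first.
    public void stop() {
        running.set(false);
    }
}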

The important rule:

Ack is a business decision, not a finally block.

Do not do this:

try {
    handler.handle(message);
} finally {
    ack(message); // dangerous
}

That turns failures into message loss.

14. Pending Recovery Loop

Run pending recovery separately from live consumption.

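public final class PendingRecoveryWorker implements Runnable {

    private final StreamConsumerClient client;
    private final EventHandler handler;
    private final DeadLetterPublisher deadLetters;
    private final String consumerName;
    private final Duration minIdleTime = Duration.ofMinutes(2);
    private final int maxDeliveryCount = 5;

    public PendingRecoveryWorker(StreamConsumerClient client, EventHandler handler,
                                 DeadLetterPublisher deadLetters, String consumerName) {
        this.client = client;
        this.handler = handler;
        this.deadLetters = deadLetters;
        this.consumerName = consumerName;
    }

    @Override
    public void run() {
        String cursor = "0-0";

        while (!Thread.currentThread().isInterrupted()) {
            // XAUTOCLAIM <key> billing-workers <consumer> <minIdleMs> <cursor> COUNT 100
            AutoClaimResult result = client.autoClaim(
                    "billing-workers",
                    consumerName,
                    minIdleTime,
                    cursor,
                    100
            );

            cursor = result.nextCursor();

            for (StreamMessage message : result.messages()) {
                // The XAUTOCLAIM reply carries IDs and fields only. The client
                // abstraction is assumed to look up the delivery count via XPENDING.
                if (message.deliveryCount() > maxDeliveryCount) {
                    deadLetters.publish(message, null, "max_delivery_exceeded");
                    client.ack(message);
                    continue;
                }

                try {
                    handler.handle(message);
                    client.ack(message);
                } catch (RetryableException e) {
                    // Leave pending again.
                    client.recordRetryFailure(message, e);
                } catch (Exception e) {
                    deadLetters.publish(message, e, "unexpected_error");
                    client.ack(message);
                }
            }

            // XAUTOCLAIM returns 0-0 once a full pass over the PEL is complete.
            // Pause before the next pass (the one-second value is an assumed default).
            if ("0-0".equals(cursor)) {
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }
}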

Recovery should be observable:

claimed count
claimed age
delivery count distribution
DLQ count
handler success after claim
handler failure after claim
oldest pending age

If pending recovery is not monitored, your system may look healthy while silently accumulating unprocessed messages.

15. Dead-Letter Stream Design

A dead-letter queue is not a trash bin. It is a diagnostic and remediation contract.

Recommended DLQ key:

dlq:billing:order-events

DLQ entry fields:

XADD dlq:billing:order-events MAXLEN ~ 100000 * \
  originalStream order-events \
  originalId 1751441700000-0 \
  group billing-workers \
  consumer worker-2 \
  eventId evt-123 \
  eventType OrderPaymentCaptured \
  reason max_delivery_exceeded \
  errorClass java.net.SocketTimeoutException \
  errorMessage 'payment provider timeout' \
  failedAt 2026-07-02T10:30:00Z \
  payload '{...}'

DLQ rules:

include original stream and ID
include reason category
include delivery count
include trace ID if available
include compact error information
preserve original payload
bound DLQ length
alert on rate, not just existence
provide replay/remediation tooling

Never build a DLQ without an owner. A DLQ nobody inspects is just delayed data loss.
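
One way to keep DLQ entries consistent with the rules above is to build the field map in a single place. The sketch below is a hypothetical helper: `DeadLetterEntries`, the 500-character cap on the error message, and the `deliveryCount` field name are assumptions for illustration.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical builder for the field map of a dead-letter stream entry. */
final class DeadLetterEntries {

    static final int MAX_ERROR_MESSAGE_LENGTH = 500; // assumed cap to keep entries compact

    static Map<String, String> build(String originalStream, String originalId,
                                     String group, String consumer, long deliveryCount,
                                     String reason, Throwable error,
                                     Map<String, String> originalFields, Instant failedAt) {
        Map<String, String> dlq = new LinkedHashMap<>();
        dlq.put("originalStream", originalStream);
        dlq.put("originalId", originalId);
        dlq.put("group", group);
        dlq.put("consumer", consumer);
        dlq.put("deliveryCount", Long.toString(deliveryCount));
        dlq.put("reason", reason);
        if (error != null) {
            dlq.put("errorClass", error.getClass().getName());
            String message = String.valueOf(error.getMessage());
            dlq.put("errorMessage", message.length() > MAX_ERROR_MESSAGE_LENGTH
                    ? message.substring(0, MAX_ERROR_MESSAGE_LENGTH)
                    : message);
        }
        dlq.put("failedAt", failedAt.toString());
        // Preserve the original entry (eventId, eventType, traceId, payload, ...)
        // without overwriting the DLQ metadata set above.
        originalFields.forEach(dlq::putIfAbsent);
        return dlq;
    }
}
```

The resulting map can be passed to the same publisher abstraction used for normal events, with the DLQ key and its own `MAXLEN` bound.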

16. Trimming and Retention

Streams live in memory. Unbounded streams are outages waiting to happen.

Common retention strategies:

Strategy	Command shape	Use case
Max length approximate	`XADD key MAXLEN ~ N * ...`	Most common bounded stream
Explicit trim	`XTRIM key MAXLEN ~ N`	Scheduled cleanup
Min ID trim	`XTRIM key MINID ~ id`	Time-like retention by stream ID
Domain archive	Copy to database/object storage	Long-term audit/history

Example:

XADD order-events MAXLEN ~ 1000000 * eventType OrderSubmitted orderId ord-1

Do not pick MAXLEN randomly. Estimate:

messages per second
average serialized entry size
required replay window
number of streams
available Redis memory
replication overhead
AOF/RDB persistence impact

Example capacity sketch:

2,000 events/sec
average entry 700 bytes logical payload
retain 2 hours
2,000 * 7,200 = 14,400,000 events
14,400,000 * 700 bytes ~= 10 GB logical payload before overhead

This is already large for Redis. The correct answer may be Kafka, database partitioning, object storage, or shorter Redis retention.
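
The same arithmetic can be kept as a small utility so the estimate is re-run whenever the inputs change. `StreamCapacity` is a hypothetical helper, and like the sketch above it counts logical payload only, not Redis per-entry overhead or replication.

```java
import java.time.Duration;

/** Hypothetical helper for back-of-the-envelope stream sizing. */
final class StreamCapacity {

    /** Number of entries that must be retained to cover the replay window. */
    static long retainedEvents(long eventsPerSecond, Duration retention) {
        return Math.multiplyExact(eventsPerSecond, retention.toSeconds());
    }

    /** Logical payload bytes for the retained entries, before any overhead. */
    static long logicalBytes(long retainedEvents, long averageEntryBytes) {
        return Math.multiplyExact(retainedEvents, averageEntryBytes);
    }
}
```

For the numbers above, `retainedEvents(2_000, Duration.ofHours(2))` is 14,400,000, and `logicalBytes(14_400_000, 700)` is 10,080,000,000 bytes, which matches the roughly 10 GB figure.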

17. Ordering Model

Redis Streams preserve entry order within a stream key. But processing order can be affected by:

multiple consumers in a group
handler latency variation
retry and claim behavior
DLQ movement
multiple stream keys
Redis Cluster sharding

If strict per-order processing is required, model the key carefully.

Options:

One Stream per Aggregate

stream:order:{orderId}

Pros:

strong per-aggregate ordering
small local history

Cons:

many stream keys
hard worker orchestration
hard global scan

One Stream per Workflow, Handler Enforces Per-Aggregate Lock

stream:order-events
lock:order:{orderId}

Pros:

simple stream topology
centralized processing

Cons:

lock complexity
duplicate/timeout concerns
throughput impacted by hot aggregates

One Stream per Shard

stream:order-events:{shard-00}
stream:order-events:{shard-01}
...

Pros:

bounded number of streams
parallelism by shard
stable routing by aggregateId hash

Cons:

requires shard router
resharding complexity

For most Java service workflows, one stream per shard, routed by a hash of aggregateId, is a pragmatic middle ground.
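
A shard router for the last option can be very small. In this sketch `ShardRouter` is a hypothetical class and the key format follows the `stream:order-events:{shard-NN}` examples above. It relies on `String.hashCode()`, whose algorithm is specified by the Java language and therefore stable across JVMs; changing the shard count remaps aggregates, which is the resharding cost listed above.

```java
import java.util.Locale;

/** Hypothetical router that maps an aggregate ID to one of a fixed number of shard streams. */
final class ShardRouter {

    private final String streamPrefix;
    private final int shardCount;

    ShardRouter(String streamPrefix, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        this.streamPrefix = streamPrefix;
        this.shardCount = shardCount;
    }

    /** floorMod keeps the result non-negative even for negative hash codes. */
    int shardOf(String aggregateId) {
        return Math.floorMod(aggregateId.hashCode(), shardCount);
    }

    /** All events for one aggregate go to the same stream, so their order is preserved. */
    String streamKeyFor(String aggregateId) {
        return String.format(Locale.ROOT, "%s:{shard-%02d}", streamPrefix, shardOf(aggregateId));
    }
}
```

Each shard stream then gets its own consumer (or a consumer group with a single active consumer) if strict per-aggregate processing order is required.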

18. Backpressure

A stream can absorb events faster than consumers can process them. That is not automatically good. Backlog is deferred work. Deferred work has memory, latency, and business consequences.

Backpressure controls:

producer-side rate limits
bounded stream length
consumer COUNT size
consumer BLOCK duration
handler concurrency cap
connection pool limits
external dependency circuit breaker
retry delay
DLQ threshold
alert on lag and pending age

Consumer loop anti-pattern:

while (true) {
    List<Message> messages = readCount(10_000);
    messages.parallelStream().forEach(handler::handle);
}

Problems:

unbounded parallelism
external dependency overload
ack order confusion
huge memory spike in JVM
GC pressure
timeouts cause pending explosion

Better shape:

COUNT 50
bounded executor size 8-32 depending on handler profile
bulkhead per downstream dependency
retryable failures left pending or scheduled carefully
handler timeout shorter than claim idle threshold
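
The bounded shape can be sketched with a fixed-size thread pool: a small batch goes in, at most `maxConcurrency` handlers run at once, and only messages whose handler completed are returned for acknowledgement. `BoundedBatchProcessor` is a hypothetical class, and creating a pool per batch is a simplification to keep the example self-contained; a real worker would reuse one pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Hypothetical batch processor with a hard cap on concurrent handlers. */
final class BoundedBatchProcessor {

    /** Returns the messages whose handler completed normally; only these should be acked. */
    static List<String> process(List<String> batch, int maxConcurrency, Consumer<String> handler) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrency);
        List<String> succeeded = new ArrayList<>();
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String message : batch) {
                tasks.add(() -> {
                    handler.accept(message);
                    return message;
                });
            }
            // invokeAll blocks until every task has finished, so the caller
            // cannot read the next batch while this one is still in flight.
            for (Future<String> result : pool.invokeAll(tasks)) {
                try {
                    succeeded.add(result.get());
                } catch (ExecutionException handlerFailure) {
                    // Handler threw: do not ack, the message stays pending.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // unfinished messages stay pending
        } finally {
            pool.shutdownNow();
        }
        return succeeded;
    }
}
```

Because the read loop waits for the batch, slow handlers reduce the read rate instead of piling work up inside the JVM.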

19. Multi-Group Fanout

Streams support independent consumer groups. This is useful when different parts of the system need the same event.

Each group has its own:

last delivered ID
pending entries
consumers
ack state
lag

This is powerful, but it also creates governance obligations:

who owns each group?
what is each group's lag SLO?
what happens if one group is dead for 3 days?
can trimming delete data before a slow group consumes it?
who approves new groups?

Redis does not magically enforce organizational discipline. You must define stream ownership.

20. Streams and the Transactional Outbox Pattern

A common mistake:

write database row
XADD event to Redis

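If the database write succeeds and XADD fails, the event is missing. If XADD succeeds and the database transaction rolls back, the event describes a change that never happened.

For critical domain events, use an outbox: write the domain change and an outbox row in the same database transaction, then let a separate relay read unpublished rows, XADD them, and mark them as published.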

Redis Streams can be the delivery substrate after the outbox. They should not replace the atomicity of your primary database transaction.

For non-critical ephemeral workflows, direct XADD may be acceptable. Make that decision explicit.
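
The relay side of an outbox can be sketched in memory. Here the list of rows stands in for an outbox table and the `Function` stands in for the XADD call; `OutboxRelay` and `OutboxRow` are hypothetical names. Stopping at the first failure is a design assumption that preserves publish order.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical relay that publishes outbox rows to a stream after the database commit. */
final class OutboxRelay {

    /** One row written in the same database transaction as the domain change. */
    static final class OutboxRow {
        final String eventId;
        final String payload;
        String publishedStreamId; // null until XADD succeeded

        OutboxRow(String eventId, String payload) {
            this.eventId = eventId;
            this.payload = payload;
        }
    }

    /**
     * Publishes unpublished rows in order and returns how many were published.
     * Stops at the first failure so later rows are not published ahead of it.
     */
    static int drain(List<OutboxRow> rows, Function<OutboxRow, String> xadd) {
        int published = 0;
        for (OutboxRow row : rows) {
            if (row.publishedStreamId != null) {
                continue; // already published on an earlier poll
            }
            try {
                row.publishedStreamId = xadd.apply(row);
                published++;
            } catch (RuntimeException e) {
                break; // leave the row unpublished; the next poll retries it
            }
        }
        return published;
    }
}
```

If the relay crashes after XADD but before recording the stream ID, the row is published twice on restart, so consumers still need `eventId`-based idempotency.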

21. Idempotent Consumer Pattern

Use the event ID as the idempotency key.

processed:{group}:{eventId}

Basic pattern:

SET processed:billing-workers:evt-123 1 NX EX 2592000

If SET returns OK, process. If it returns nil (null in most Java clients), the event was already processed.

But be careful with ordering:

SET idempotency marker before side effect:
  crash after marker before side effect -> side effect skipped forever

SET idempotency marker after side effect:
  crash after side effect before marker -> side effect may duplicate

Therefore the correct design depends on side-effect semantics.

Better for database writes:

use unique constraint on event_id in the target table
process in database transaction
ack after commit

Better for external APIs:

use provider's idempotency key when available
persist delivery attempt record
ack only after confirmed outcome

Redis idempotency markers are useful, but they are not a substitute for domain-level correctness.

22. Java Error Taxonomy

Classify handler failures.

Failure type	Example	Action
Validation failure	missing required field	DLQ + ack
Schema unsupported	unknown `schemaVersion`	DLQ or park until deploy
Retryable dependency	timeout, 503	leave pending or retry later
Permanent domain failure	invalid state transition	DLQ + ack after audit
Duplicate	idempotency marker exists	ack
Poison bug	handler always crashes	DLQ after threshold
Infrastructure failure	Redis/client/network	do not falsely ack

A handler that catches all exceptions and returns success is a data-loss system. A handler that retries forever without DLQ is a backlog-amplification system.
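
The taxonomy can be encoded as one classification function so every handler applies the same policy. The mapping below is only an illustration: `FailureClassifier` and `DuplicateEventException` are hypothetical, and using `IllegalArgumentException` for validation failures and `IllegalStateException` for permanent domain failures is an assumption about how a handler signals them.

```java
/** Hypothetical classifier that maps a handler failure to an ack decision. */
final class FailureClassifier {

    enum Action { ACK, DLQ_AND_ACK, LEAVE_PENDING }

    /** Hypothetical signal that the idempotency check found an already processed event. */
    static final class DuplicateEventException extends RuntimeException {
        DuplicateEventException(String eventId) {
            super("already processed: " + eventId);
        }
    }

    static Action classify(Throwable failure, long deliveryCount, int maxDeliveries) {
        if (failure instanceof DuplicateEventException) {
            return Action.ACK; // duplicate: nothing left to do
        }
        if (failure instanceof IllegalArgumentException
                || failure instanceof IllegalStateException) {
            return Action.DLQ_AND_ACK; // validation or permanent domain failure
        }
        if (deliveryCount >= maxDeliveries) {
            return Action.DLQ_AND_ACK; // poison threshold reached
        }
        return Action.LEAVE_PENDING; // timeouts, 503s, unknown errors: retry later
    }
}
```

Infrastructure failures while talking to Redis itself never reach this function in a meaningful way: if the ack cannot be sent, the message simply stays pending.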

23. Schema Evolution

Streams can contain old and new messages at the same time. Consumers must handle compatible versions.

Envelope:

{
  "eventType": "OrderSubmitted",
  "schemaVersion": 2,
  "payload": {
    "orderId": "ord-1001",
    "customerId": "cust-77",
    "totalAmount": "149000",
    "currency": "IDR"
  }
}

Evolution rules:

add optional fields first
never rename without compatibility layer
never change meaning under same schemaVersion
make consumers tolerant of unknown fields
keep old handlers until old messages are drained or expired
include version in metrics and DLQ

For internal systems, JSON may be enough. For larger platforms, consider Avro/Protobuf + schema registry, but remember Redis stores bytes/strings; schema governance is outside Redis.
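
Keeping old handlers alongside new ones can be as simple as a dispatch table keyed by `schemaVersion`. In this sketch `VersionedHandlers` is a hypothetical class, handlers are assumed to return a non-null result, and an empty `Optional` means "unsupported version", which the caller would route to the DLQ or park until the next deploy.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical dispatcher that selects a handler by the entry's schemaVersion field. */
final class VersionedHandlers {

    private final Map<Integer, Function<Map<String, String>, String>> byVersion;

    VersionedHandlers(Map<Integer, Function<Map<String, String>, String>> byVersion) {
        this.byVersion = Map.copyOf(byVersion);
    }

    /** Returns the handler result, or empty if the version is missing, malformed, or unknown. */
    Optional<String> handle(Map<String, String> entry) {
        int version;
        try {
            version = Integer.parseInt(entry.getOrDefault("schemaVersion", ""));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
        Function<Map<String, String>, String> handler = byVersion.get(version);
        return handler == null ? Optional.empty() : Optional.of(handler.apply(entry));
    }
}
```

Removing a version from the table is then an explicit decision that can wait until entries with that version have been drained or trimmed.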

24. Observability

Minimum stream metrics:

producer publish count
producer publish latency
producer publish failures
stream length
consumer group lag
pending count
oldest pending age
pending delivery count distribution
handler success count
handler failure count by type
ack latency
claim count
DLQ count
DLQ reason distribution
message payload size
consumer heartbeat/last seen

Useful Redis commands:

XLEN order-events
XINFO STREAM order-events
XINFO GROUPS order-events
XINFO CONSUMERS order-events billing-workers
XPENDING order-events billing-workers

Dashboards should separate:

input rate
processing rate
backlog
pending
failure rate
retry rate
DLQ rate
memory usage

A single “Redis is up” metric is useless for Streams reliability.

25. Operational Runbook

Symptom: Stream length keeps growing

Likely causes:

consumer stopped
consumer too slow
new consumer group created at wrong offset
external dependency slow
producer spike

Actions:

check XINFO GROUPS
check consumer process health
compare producer vs consumer rate
inspect oldest message age
scale consumers if safe
apply upstream throttling if needed

Symptom: Pending count keeps growing

Likely causes:

handlers fail before ack
ack code path missing
worker crashes
timeouts too short
poison messages

Actions:

check XPENDING
inspect delivery count
check handler exceptions
run XAUTOCLAIM recovery
DLQ poison messages
fix ack semantics

Symptom: Duplicate processing

Likely causes:

worker crash after side effect before ack
idle claim threshold too low
handler timeout longer than claim threshold
manual replay

Actions:

validate idempotency design
increase min idle claim threshold
align handler timeout and lease settings
add downstream idempotency keys

Symptom: Messages disappear

Possible causes:

stream trimming too aggressive
consumer group started at $
wrong stream key
handler acked on failure
manual XDEL
Redis data loss due to persistence/replication config

Actions:

check XTRIM/MAXLEN policy
check group creation ID
check deployment logs
check persistence and failover events
review handler finally blocks

26. Practice: Build a Reliable Webhook Dispatch Stream

Scenario:

Your Java service must send webhooks to merchants when orders change.
Webhooks can fail temporarily.
You need retry, DLQ, and observability.

Design:

stream: webhook:order-events
group: webhook-dispatchers
dlq: webhook:order-events:dlq
idempotency: webhook-delivery:{merchantId}:{eventId}

Producer entry:

XADD webhook:order-events MAXLEN ~ 500000 * \
  eventId evt-123 \
  eventType OrderStatusChanged \
  schemaVersion 1 \
  merchantId m-77 \
  orderId ord-1001 \
  occurredAt 2026-07-02T10:00:00Z \
  payload '{"status":"PAID"}'

Consumer behavior:

read new messages
validate event
load merchant webhook endpoint
send webhook with eventId as idempotency key
ack on HTTP 2xx or known duplicate
leave pending on timeout/5xx
DLQ on 4xx config errors or max delivery exceeded

Metrics:

webhook delivery latency
HTTP status distribution
pending count
oldest pending age
DLQ count by merchant

Acceptance tests:

worker crash after read -> message is later claimed
worker crash after send before ack -> duplicate is suppressed by idempotency
merchant endpoint returns 500 -> message remains retryable
merchant endpoint returns 400 -> message goes to DLQ
stream reaches max length -> old entries are trimmed within retention expectation
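
The consumer behavior for this exercise reduces to one decision per delivery attempt. The sketch below is a possible starting point: `WebhookOutcome` is a hypothetical class, a status of 0 or less is assumed to mean "no response" (timeout or connection failure), and treating 408 and 429 as retryable rather than as configuration errors is an assumption on top of the rules above.

```java
/** Hypothetical decision function for one webhook delivery attempt. */
final class WebhookOutcome {

    enum Action { ACK, LEAVE_PENDING, DLQ_AND_ACK }

    /**
     * @param httpStatus    response status, or a value <= 0 when no response was received
     * @param deliveryCount how many times the message has been delivered so far
     * @param maxDeliveries threshold after which retryable failures go to the DLQ
     */
    static Action decide(int httpStatus, long deliveryCount, int maxDeliveries) {
        if (httpStatus >= 200 && httpStatus < 300) {
            return Action.ACK;
        }
        boolean retryable = httpStatus <= 0          // timeout or connection failure
                || httpStatus >= 500                 // merchant endpoint is failing
                || httpStatus == 408
                || httpStatus == 429;                // assumed retryable
        if (!retryable) {
            return Action.DLQ_AND_ACK;               // 4xx configuration errors
        }
        return deliveryCount >= maxDeliveries ? Action.DLQ_AND_ACK : Action.LEAVE_PENDING;
    }
}
```

The acceptance tests for 500 and 400 responses map directly onto `LEAVE_PENDING` and `DLQ_AND_ACK`, which makes this function a natural first unit test target.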

27. Common Anti-Patterns

Anti-pattern 1: Using Streams as Infinite Event Store

Redis is memory-first. If you need long-term canonical history, use another storage layer.

Anti-pattern 2: No Ack Discipline

Acking in finally loses failed messages. Never do it.

Anti-pattern 3: No Pending Recovery

Consumer groups do not automatically retry dead consumers. You need claiming logic.

Anti-pattern 4: No DLQ

A poison message can create endless retries and hide real throughput.

Anti-pattern 5: Unbounded Message Size

Do not put giant documents in stream entries. Store large payloads elsewhere and put references in the stream.

Anti-pattern 6: Creating Consumer Names Randomly Forever

If every deploy creates new consumer names and never cleans old ones, XINFO CONSUMERS becomes noisy. Use stable instance identity and clean up dead consumers deliberately with XGROUP DELCONSUMER, after their pending entries have been claimed or acknowledged.

Anti-pattern 7: Starting Group at `$` by Accident

$ means future messages only. If existing backlog matters, start at 0 or a known ID.

28. Decision Checklist

Use Redis Streams when most of these are true:

retention window is short/bounded
Redis memory budget can handle backlog
consumer count is moderate
workflow benefits from ack/retry
Java service owns both producer/consumer contract
replay need is operational, not historical analytics

Prefer Kafka/RabbitMQ/database/outbox when:

retention is long
many independent teams consume events
schema governance is critical
throughput is very high
backlog can be huge
you need broker-level delivery policies
routing/exchange semantics are central
strict durability is required

29. Mental Model Summary

Redis Streams provide:

append-only entries
entry IDs
range reads
blocking reads
consumer groups
pending tracking
acknowledgement
claiming failed work
bounded trimming

They do not automatically provide:

exactly-once processing
infinite retention
schema governance
transactional integration with your database
automatic poison-message handling
auto-retry without design
strong ordering under parallel consumers

The core invariant:

A message is not done when read.
A message is done when the business side effect is safely completed and the message is acknowledged.

That one sentence prevents most Redis Stream mistakes.

30. References

Redis Docs — Streams: https://redis.io/docs/latest/develop/data-types/streams/
Redis Command — XADD: https://redis.io/docs/latest/commands/xadd/
Redis Command — XREADGROUP: https://redis.io/docs/latest/commands/xreadgroup/
Redis Command — XPENDING: https://redis.io/docs/latest/commands/xpending/
Redis Command — XAUTOCLAIM: https://redis.io/docs/latest/commands/xautoclaim/
Redis Command — XACK: https://redis.io/docs/latest/commands/xack/
