Deepen PracticeOrdered learning track

Learn Java Messaging Event Streaming Part 029 Pipelining Batching Throughput

[]23 min read4523 words

In This Lesson

1. What We Are Solving 2. The Core Mental Model 3. Pipelining vs Batching

Lesson 2935 lesson track20–29 Deepen Practice

title: Learn Java Messaging and Event Streaming - Part 029 description: Throughput engineering across JMS/Jakarta Messaging, Kafka, RabbitMQ queues, RabbitMQ Streams, Kafka Streams, and ksqlDB using pipelining, batching, compression, buffering, concurrency, and realistic benchmark discipline. series: learn-java-messaging-event-streaming seriesTitle: Learn Java Messaging and Event Streaming order: 29 partTitle: Pipelining and Batching: Throughput Engineering Across Messaging Systems tags:

java
messaging
event-streaming
kafka
rabbitmq
rabbitmq-streams
jms
jakarta-messaging
batching
pipelining
performance
throughput date: 2026-06-28

Part 029 — Pipelining and Batching: Throughput Engineering Across Messaging Systems

1. What We Are Solving

This part is about a simple production question that becomes complex quickly:

How do we move more messages per second without destroying latency, reliability, ordering, memory, or operability?

The shallow answer is usually:

Increase batch size.

That answer is incomplete.

Throughput in messaging systems is not controlled by one setting. It is the result of several queues, buffers, flush boundaries, network round trips, disk writes, replication, serialization cost, compression, acknowledgement strategy, consumer parallelism, downstream capacity, and failure recovery behavior.

A top-tier engineer does not tune batch.size or prefetch in isolation. They identify the bottleneck, choose the batching layer intentionally, protect tail latency, and define what happens when the pipeline becomes full.

In this part, we will build the mental model that applies across:

JMS / Jakarta Messaging
Kafka producer and consumer
Kafka Streams and ksqlDB
RabbitMQ queues
RabbitMQ Streams and Superstreams
application-level side effects such as database writes, HTTP calls, search indexing, and audit logging

2. The Core Mental Model

A messaging pipeline is a sequence of stages.

Each stage has:

input rate
output rate
service time
queue/buffer capacity
batch boundary
flush trigger
failure mode
observability signal

Throughput is limited by the slowest stable stage.

If the producer can emit 100,000 events/sec but the consumer-side database can safely process 8,000 writes/sec, the system is an 8,000/sec system unless you buffer, shed, partition, aggregate, or redesign the side effect.

The key invariant:

Batching increases efficiency by amortizing fixed cost, but it also increases latency, memory exposure, duplicate/replay surface, and blast radius per failure.

3. Pipelining vs Batching

These terms are often mixed together. They are related but different.

3.1 Batching

Batching groups multiple units of work into one operation.

Examples:

sending 500 Kafka records in one produce request
committing one Kafka offset after processing 1,000 records
acknowledging multiple RabbitMQ deliveries with one basicAck(..., multiple=true)
committing one JMS transacted session after 100 messages
writing 1,000 rows using JDBC batch insert
compressing a group of records together

Batching reduces per-message overhead.

But it creates waiting time.

A record may wait in a buffer until:

batch is full
linger timeout expires
transaction commits
poll loop returns
consumer worker accumulates enough items
downstream writer flushes

3.2 Pipelining

Pipelining allows multiple operations to be in flight at the same time.

Examples:

Kafka producer sends request N+1 before response N completes
RabbitMQ publisher has multiple outstanding confirm-sequence numbers
consumer fetches more records while workers process previous records
stream processor overlaps deserialize, process, write, and commit phases
application keeps several HTTP side-effect calls in flight with bounded concurrency

Pipelining reduces idle time.

But it increases the number of records that are not yet durably completed.

3.3 The Difference

Mechanism	Improves	Main Cost	Common Failure
Batching	efficiency per operation	added latency and memory	larger duplicate/retry batch
Pipelining	resource utilization	more in-flight uncertainty	out-of-order completion
Concurrency	parallelism	coordination and contention	hot keys, race conditions
Compression	bandwidth/storage	CPU	latency spike under CPU pressure
Buffering	burst absorption	delayed failure visibility	memory exhaustion

High-throughput systems usually need all five, but with explicit bounds.

4. Throughput Equation for Practical Engineers

You do not need a perfect mathematical model. You need a useful one.

A rough upper bound:

throughput ≈ effective_parallelism × batch_size / batch_cycle_time

Where:

effective_parallelism = partitions, consumers, workers, connections, streams, shards, or database writers that can truly operate independently
batch_size = number of records per durable operation
batch_cycle_time = time to fill, send, persist, replicate, process, and acknowledge one batch

A more operational version:

stable_throughput = min(
  producer_capacity,
  broker_ingest_capacity,
  broker_replication_capacity,
  broker_storage_capacity,
  consumer_fetch_capacity,
  business_processing_capacity,
  downstream_side_effect_capacity,
  ack_or_commit_capacity
)

Throughput tuning starts by finding the minimum.

If you do not know the bottleneck, do not tune blindly.

5. Batch Boundaries Across the Pipeline

A record can be batched multiple times before it is “done”.

A common mistake is stacking batch boundaries without realizing it.

Example:

producer has linger.ms=50
broker replication adds 20–80 ms under load
consumer fetch waits for larger fetches
consumer processes 1,000 records before commit
database writer flushes every 2,000 rows
alerting only sees lag after several minutes

This may produce great benchmark throughput and terrible operational latency.

The tuning question is not “how large can the batch be?”

It is:

Which layer should batch, how large, for what workload, with what latency SLO, and what happens when that layer fails?

6. Kafka Producer Batching

Kafka producer batching is one of the most important throughput levers in Java event streaming.

The producer does not usually send each record immediately as an isolated network request. It groups records by topic-partition in an internal accumulator, then sends produce requests to brokers.

Simplified flow:

6.1 Key Producer Settings

Setting	What It Controls	Engineering Meaning
`batch.size`	upper bound for records accumulated per partition batch	larger batch can improve throughput if enough records hit same partition
`linger.ms`	wait time before sending a not-yet-full batch	adds latency to improve batch formation
`compression.type`	compression codec	reduces network/storage; increases CPU
`buffer.memory`	total producer buffer memory	bounds unsent data; protects JVM only if handled correctly
`max.block.ms`	how long `send()` can block waiting for metadata/buffer	backpressure boundary to application thread
`acks`	broker acknowledgement requirement	durability vs latency trade-off
`retries`	resend on retriable error	needed for reliability; interacts with ordering/idempotence
`delivery.timeout.ms`	upper bound for successful send completion	prevents infinite in-flight attempts
`max.in.flight.requests.per.connection`	pipeline depth per connection	affects throughput and ordering under retry

The producer setting most often misunderstood is batch.size.

It does not mean every request will contain that many bytes. It is a per-partition batch upper bound. If traffic is spread thinly across many partitions, increasing it may do little unless linger.ms also allows time for batches to fill.

6.2 The `linger.ms` Trade-Off

linger.ms is a deliberate small delay.

Low value:

lower latency at low traffic
smaller batches
more requests/sec
more CPU/network overhead

Higher value:

larger batches
better compression
fewer network requests
better broker efficiency
added waiting latency

For high-volume systems, a modest linger often improves both throughput and p99 latency because it reduces broker/network pressure. For low-latency command systems, a high linger can violate responsiveness.

6.3 Compression

Compression works best when messages in a batch are similar.

For event streams with repeated field names, similar schemas, and similar values, compression can reduce network and disk pressure significantly.

But compression is not free.

It consumes CPU on producer and broker/consumer side. Under CPU pressure, compression can turn a bandwidth bottleneck into a CPU bottleneck.

The engineering rule:

Enable compression only with real workload measurement: payload shape, broker CPU, producer CPU, consumer CPU, p95/p99 latency, and failure recovery time.

6.4 Idempotence and Pipelining

Kafka’s idempotent producer protects against duplicate writes caused by producer retries within Kafka’s broker protocol boundaries.

But high throughput often pushes engineers to increase in-flight requests.

More in-flight requests improve throughput, but under retries they can interact with ordering and failure behavior. Modern Kafka defaults and idempotence make many cases safer, but the architectural rule remains:

Never tune producer pipeline depth without verifying ordering requirements, retry behavior, and idempotence configuration.

6.5 Producer Callback Discipline

Bad callback pattern:

producer.send(record);

This can hide failures if the application never observes the returned future or callback.

Better pattern:

producer.send(record, (metadata, exception) -> {
    if (exception != null) {
        // classify: retriable? fatal? serialization? authorization? timeout?
        failureCounter.increment();
        errorHandler.handle(record, exception);
        return;
    }
    successCounter.increment();
    latencyRecorder.record(metadata.timestamp());
});

For very high throughput, callback work must be lightweight. Do not perform blocking database or network calls inside the producer callback.

6.6 Application-Level Kafka Producer Batch

Sometimes the application itself creates a batch before sending records.

Example:

List<CaseEvent> events = outboxRepository.fetchReadyBatch(500);

for (CaseEvent event : events) {
    ProducerRecord<String, CaseEvent> record = new ProducerRecord<>(
        "case-events.v1",
        event.caseId(),
        event
    );
    producer.send(record, callbackFor(event));
}

producer.flush();
outboxRepository.markAttempted(events);

Be careful with flush().

It can be useful at explicit boundaries, such as shutdown, tests, or outbox batch loops.

It is harmful if called after every record because it prevents producer batching and pipelining.

7. Kafka Consumer Batching

Kafka consumers fetch records in batches.

The application sees batches through poll().

while (running.get()) {
    ConsumerRecords<String, CaseEvent> records = consumer.poll(Duration.ofMillis(500));

    for (ConsumerRecord<String, CaseEvent> record : records) {
        process(record);
    }

    consumer.commitSync();
}

This simple loop hides several batch decisions.

7.1 Consumer Batch Settings

Setting	What It Controls	Engineering Meaning
`max.poll.records`	maximum records returned per `poll()`	bounds application processing batch
`fetch.min.bytes`	broker waits for at least this much data before responding	improves fetch efficiency at cost of wait
`fetch.max.wait.ms`	max wait for `fetch.min.bytes`	latency bound for fetch batching
`fetch.max.bytes`	max data per fetch request	memory and throughput boundary
`max.partition.fetch.bytes`	max data returned per partition	protects memory; must support largest batch
`max.poll.interval.ms`	max time between polls for group membership	prevents slow processing from looking dead
`session.timeout.ms`	group failure detection window	affects rebalance sensitivity

High throughput consumers usually tune both fetch size and processing architecture.

But the main risk is poll-loop starvation.

If you fetch a large batch and process it synchronously for longer than max.poll.interval.ms, the consumer can be considered failed and the group can rebalance.

7.2 Manual Commit Batch

Manual commit after every record is safe but inefficient.

Manual commit after too many records improves throughput but increases duplicate replay on crash.

A common strategy:

process records partition-aware
track highest safely processed offset per partition
commit offsets periodically or after a safe batch
commit only records whose side effects are durable

For simple sequential processing:

while (running.get()) {
    ConsumerRecords<String, CaseEvent> records = consumer.poll(Duration.ofMillis(250));
    if (records.isEmpty()) {
        continue;
    }

    for (ConsumerRecord<String, CaseEvent> record : records) {
        processor.process(record);
    }

    consumer.commitSync();
}

For concurrent processing, this gets harder. You must avoid committing past unfinished records in the same partition.

7.3 Partition-Aware Worker Batching

A safer concurrent model is partition-affine processing.

This preserves per-partition order while allowing parallelism across partitions.

7.4 Consumer Batch and Database Batch

Consumer batching becomes much more valuable when downstream side effects support batching.

Example: instead of 1 DB transaction per event, write 500 state transitions in one transaction.

But batch DB writes change failure semantics.

If one row violates a constraint, does the whole batch fail?

Options:

Strategy	Benefit	Risk
all-or-nothing DB batch	simple atomicity	one bad event blocks batch
split-on-failure	isolates poison record	more code and retry complexity
per-record fallback	stable recovery	slower under bad data
quarantine invalid record	protects pipeline	requires governance and replay

8. RabbitMQ Queue Throughput

RabbitMQ queue throughput is shaped by:

exchange routing cost
queue type
durability
publisher confirm mode
consumer acknowledgement mode
prefetch
message size
persistence
replication/quorum behavior
consumer side-effect speed

8.1 Publisher Confirms as Pipeline

Publisher confirms are not just reliability. They are also a pipeline control mechanism.

Low-throughput but simple:

channel.basicPublish(exchange, routingKey, props, body);
channel.waitForConfirmsOrDie(5_000);

This waits after every publish.

Better throughput uses multiple outstanding publishes and then waits for confirms in batches:

channel.confirmSelect();

int batchSize = 500;
int inBatch = 0;

for (OutboundMessage msg : messages) {
    channel.basicPublish(msg.exchange(), msg.routingKey(), msg.properties(), msg.body());
    inBatch++;

    if (inBatch == batchSize) {
        channel.waitForConfirmsOrDie(10_000);
        inBatch = 0;
    }
}

if (inBatch > 0) {
    channel.waitForConfirmsOrDie(10_000);
}

This improves throughput but increases the number of messages whose final publish status is unknown during failure.

The production version tracks publish sequence numbers and handles async confirms, nacks, and timeouts explicitly.

8.2 Consumer Prefetch as Batch Window

RabbitMQ prefetch limits unacknowledged deliveries per consumer.

Prefetch is not just “performance”. It is a backpressure and fairness control.

Small prefetch:

lower memory per consumer
fairer distribution
lower duplicate exposure
potentially lower throughput

Large prefetch:

better throughput for slow round-trip ack workloads
more messages buffered on consumer
higher duplicate/requeue burst on crash
worse fairness if processing time varies

For CPU-bound short tasks, prefetch can often be close to worker concurrency.

For I/O-bound tasks, it may be larger, but only if memory and duplicate exposure are acceptable.

8.3 RabbitMQ Ack Batching

AMQP acknowledgements can acknowledge multiple deliveries with the multiple flag.

Conceptually:

long lastDeliveryTag = envelope.getDeliveryTag();
channel.basicAck(lastDeliveryTag, true);

This can reduce ack overhead.

But it is only safe if all prior delivery tags on that channel are safely processed.

Do not use multiple=true with unordered concurrent processing unless you track contiguous completion correctly.

8.4 RabbitMQ Batch Anti-Pattern: Giant Consumer Transaction

Bad pattern:

prefetch 10,000
process all messages
ack all at the end
database commit at the end

This appears efficient until:

one poison message blocks the whole batch
consumer crashes and requeues 10,000 messages
retry storm starts
queue head-of-line blocking grows
memory spikes
operator cannot see which item is bad

Better pattern:

bounded prefetch
bounded batch size
per-message idempotency
split-on-failure logic
DLQ/quarantine after retry budget
ack only after durable side effect

9. RabbitMQ Streams Throughput

RabbitMQ Streams uses a log-like model and has throughput tools closer to Kafka than classic queues.

Important concepts:

stream append
offset-based consumption
retention
producer confirmation
server-side offset tracking
superstream partitioning
sub-entry batching
compression
deduplication based on producer identity and publishing ID

9.1 Sub-Entry Batching

Sub-entry batching groups messages inside publishing frames to improve throughput.

It can reduce protocol and storage overhead.

But it has explicit costs:

increased latency
more memory pressure
more CPU if compressed
duplicate semantics can become more complex

Use it when:

throughput matters more than very low latency
payloads are small and numerous
message similarity makes compression effective
consumers can tolerate the duplicate/replay implications

9.2 Stream Offset Commit Batch

Broker-provided offset tracking can persist offsets in the stream itself.

Commit too often:

more overhead
lower throughput

Commit too rarely:

more replay after crash
larger duplicate window

A practical rule:

Offset tracking frequency is a business duplicate-window decision, not only a performance setting.

For enforcement/case-management workloads, replaying 5 seconds of duplicate notifications may be acceptable; replaying 30 minutes of irreversible external actions probably is not.

10. JMS / Jakarta Messaging Batching

JMS does not expose Kafka-style producer accumulator settings as part of the portable API. Batching is usually expressed through:

transacted sessions
provider-specific tuning
asynchronous send behavior
message listener concurrency
acknowledgement mode
container transaction boundaries
application-level grouping

10.1 Transacted Session Batch

A transacted JMS session groups sends and receives into one transaction boundary.

Conceptual producer batch:

try (JMSContext ctx = connectionFactory.createContext(JMSContext.SESSION_TRANSACTED)) {
    JMSProducer producer = ctx.createProducer();

    for (CaseCommand command : commands) {
        producer.send(destination, serialize(command));
    }

    ctx.commit();
} catch (JMSRuntimeException ex) {
    // transaction rolled back or commit uncertain depending on failure point
    throw ex;
}

This reduces commit overhead but creates an uncertainty window.

If commit status is unknown after failure, the application must reconcile using idempotency keys, outbox state, or provider-specific transaction recovery.

10.2 Consumer Batch with Transacted Session

try (JMSContext ctx = connectionFactory.createContext(JMSContext.SESSION_TRANSACTED)) {
    JMSConsumer consumer = ctx.createConsumer(queue);

    for (int i = 0; i < 100; i++) {
        Message message = consumer.receive(500);
        if (message == null) {
            break;
        }
        process(message);
    }

    ctx.commit();
}

This is efficient only if:

all messages can be processed safely in the same transaction
processing time is bounded
poison message isolation is handled
duplicate replay after rollback is acceptable
transaction timeout is configured correctly

10.3 MDB Concurrency

In Jakarta EE, MDB concurrency is controlled by container/provider configuration. You tune throughput through pool size, destination configuration, transaction behavior, and downstream capacity.

The danger is hidden concurrency.

A handler may look single-threaded in code, while the container runs many instances in parallel.

If the message key requires per-case ordering, unconstrained MDB concurrency can violate the business invariant unless the provider supports message grouping or you partition destinations by entity.

11. Kafka Streams and ksqlDB Batching

Kafka Streams and ksqlDB batch indirectly through Kafka consumer/producer behavior, internal buffers, state-store caching, commit intervals, and topology design.

Important throughput dimensions:

number of input partitions
number of stream threads
repartition topics
state store write amplification
changelog topic traffic
cache size
commit interval
window retention
RocksDB I/O behavior
output topic partition count

11.1 Stateful Processing Batch Cost

Stateful stream processing has more hidden work than stateless mapping.

A single input event may cause:

deserialize input
read state store
update state store
write changelog record
write output record
maybe write repartition record
maybe update window store
maybe emit suppression result later

Batching helps, but stateful correctness constraints limit how aggressively you can batch.

11.2 Repartitioning as Throughput Tax

Every repartition introduces an internal topic and another produce/consume hop.

Repartitioning may be necessary for correctness, especially joins and group-by operations. But it is never free.

Decision rule:

Repartition for correctness deliberately; do not accidentally repartition because keys are unclear.

12. Downstream Side-Effect Batching

The broker is often not the bottleneck.

The bottleneck is usually:

relational database writes
Elasticsearch/OpenSearch indexing
remote HTTP APIs
object storage writes
notification providers
audit/event persistence
encryption/decryption
schema validation

12.1 Database Batch Example

Bad:

for (CaseEvent event : events) {
    repository.insertProjection(event); // one transaction each
}

Better:

repository.insertProjectionBatch(events);

But the correctness requirement must be explicit:

Is the batch atomic?
Can partial success occur?
Are events idempotent?
Is ordering required inside the batch?
What is the retry unit?
What is the quarantine unit?

12.2 HTTP Side Effects

HTTP calls rarely support large atomic batches.

For HTTP side effects, pipelining is often more useful than batching.

Use bounded concurrency:

Semaphore limit = new Semaphore(32);

for (NotificationCommand command : commands) {
    limit.acquire();
    executor.submit(() -> {
        try {
            notificationClient.send(command);
        } finally {
            limit.release();
        }
    });
}

But this must be integrated with offset/ack discipline.

Never ack/commit the message before the side effect is durably completed or recorded as pending in an outbox.

13. Latency-Throughput Curve

Batching has a typical curve.

The exact curve varies by workload.

But the shape matters:

throughput gains usually flatten
latency keeps rising
memory exposure rises
duplicate/retry blast radius rises

The best batch size is rarely the largest possible batch.

It is the smallest batch that reaches the required throughput while preserving latency and recovery SLOs.

14. What to Measure

Before tuning, define baseline measurements.

14.1 Producer Metrics

Measure:

records/sec
bytes/sec
request rate
batch size average
batch split rate
buffer available bytes
time blocked on buffer
request latency
record queue time
retries
errors by category
compression ratio

14.2 Broker Metrics

Measure:

ingress bytes/sec
egress bytes/sec
disk write latency
disk utilization
replication lag
queue depth
stream lag
memory usage
flow-control activation
partition leadership distribution
request handler saturation

14.3 Consumer Metrics

Measure:

records/sec consumed
processing latency
end-to-end latency
lag by partition
commit latency
rebalance count
poll idle ratio
worker queue depth
unacked deliveries
redelivery count

14.4 Downstream Metrics

Measure:

DB transaction latency
batch write latency
lock wait time
deadlock count
HTTP response latency
error rate
connection pool saturation
retry count
idempotency conflict count

The anti-pattern:

Measuring broker throughput only and declaring the system ready.

The system is ready only when producer, broker, consumer, and side-effect path are stable together.

15. Benchmark Discipline

A benchmark that cannot predict production behavior is worse than no benchmark because it creates false confidence.

15.1 Benchmark Dimensions

Use realistic values for:

payload size distribution
key distribution
partition count
producer count
consumer group count
replication factor
durability settings
ack/confirm settings
compression
schema serialization
TLS/SASL overhead
downstream side effects
retry/DLQ behavior
deployment topology
disk type
network latency
multi-AZ replication if applicable

15.2 Benchmark Scenarios

Minimum scenarios:

Scenario	Purpose
steady state normal load	baseline
peak load	capacity
burst load	buffer and backpressure behavior
slow consumer	lag/queue growth
downstream degradation	containment
broker restart	durability and recovery
consumer crash	duplicate window
poison message	isolation
schema error	contract failure
replay backlog	catch-up capacity

15.3 Benchmark Output

A useful benchmark report includes:

throughput
p50/p95/p99/p999 latency
end-to-end latency
CPU, memory, disk, network
broker metrics
consumer lag/queue depth
retry/DLQ counts
duplicate count
message loss count
recovery time
exact configuration
test data shape
failure assumptions

16. Common Anti-Patterns

16.1 “Just Increase Batch Size”

Increasing batch size can hide the real bottleneck.

If the consumer cannot process fast enough, larger producer batches only make lag grow faster.

16.2 `flush()` After Every Kafka Send

This destroys batching.

It turns Kafka into a synchronous request system and often creates unnecessary latency.

16.3 Infinite Producer Buffering

If the application keeps accepting writes while the broker is unavailable, memory eventually becomes the failure boundary.

Bound the buffer.

Define what fails fast.

16.4 Huge RabbitMQ Prefetch

Large prefetch can make dashboards look healthy while consumers secretly hold thousands of unacked messages.

When a consumer dies, all those messages return at once.

16.5 Batch Commit Before Side Effect Is Safe

If you commit offset or ack message before side effect is durable, you have created at-most-once behavior for the business action.

16.6 Benchmark Without Failure

A throughput benchmark that excludes broker restart, consumer crash, downstream slowdown, and poison messages does not represent production.

17. Decision Heuristics

17.1 When to Increase Batch Size

Increase batch size when:

per-request overhead dominates
latency SLO has room
memory exposure is acceptable
duplicate window is acceptable
downstream can process batches safely
observability can explain batch delay

Do not increase batch size when:

p99 latency is already near limit
failures are hard to isolate
poison messages occur often
ordering constraints are unclear
consumer lag is already growing

17.2 When to Increase Pipeline Depth

Increase pipeline depth when:

resources are idle waiting for round trips
ordering does not depend on completion order, or ordering is protected by partition/key discipline
in-flight messages are bounded
retry behavior is understood

Do not increase pipeline depth when:

side effects are non-idempotent
callbacks are slow
failures cannot be attributed to a specific record
resource limits are unknown

17.3 When to Add Partitions/Consumers

Add partitions/consumers when:

processing is parallelizable by key
consumer group has fewer active consumers than useful partitions
broker and downstream capacity can handle more parallelism
hot partitions are not the main bottleneck

Do not add consumers when:

one hot key dominates
downstream database is saturated
consumer processing is serialized by global lock
ordering requires single-threaded business handling

18. Case-Management Example

Imagine a regulatory case-management platform with events:

CaseOpened
EvidenceSubmitted
RiskScoreChanged
EscalationTriggered
NoticeIssued
CaseClosed

18.1 Bad Throughput Design

one topic for all events
random keys
large producer batches
consumer commits before updating projection
projection writes one row per transaction
no idempotency
no poison event isolation
no end-to-end latency metric

This may pass a synthetic producer benchmark but fail during audit replay.

18.2 Better Throughput Design

key by caseId for per-case ordering
producer uses moderate batching and compression
consumer processes partition-affine batches
projection writes batched idempotent upserts
offset commits after DB transaction
poison event goes to quarantine with full metadata
replay runbook includes throughput target and duplicate handling
latency is measured from event occurrence time, not consume time

19. Tuning Playbook

Use this sequence.

Step 1 — Define the SLO

Write down:

target records/sec
p95/p99 end-to-end latency
maximum duplicate replay window
recovery time objective
data-loss tolerance
max backlog catch-up time

Step 2 — Baseline Without Tuning

Run normal configuration with realistic workload.

Capture all metrics.

Step 3 — Identify Bottleneck

Find which stage saturates first:

producer CPU?
producer buffer?
broker CPU?
broker disk?
replication?
consumer CPU?
downstream DB?
network?

Step 4 — Tune One Layer

Change one dimension:

batch size
linger/wait
compression
prefetch
fetch size
worker concurrency
DB batch size

Measure again.

Step 5 — Add Failure Scenarios

Repeat with:

broker restart
consumer crash
downstream slowdown
poison message
replay backlog

Step 6 — Document the Envelope

Record:

safe operating range
known saturation point
failure behavior
recommended alerts
rollback settings
tuning rationale

20. Engineering Checklist

Before increasing throughput in production, confirm:

21. Summary

Batching and pipelining are throughput tools, not correctness tools.

They can make a good design efficient.

They can also make a weak design fail faster and with larger blast radius.

The advanced mental model is:

batching amortizes fixed cost
pipelining hides round-trip latency
concurrency increases parallel service capacity
compression trades CPU for bandwidth/storage
buffering absorbs bursts but delays failure visibility
every in-flight record must have a known recovery story

For a top-tier engineer, the question is not:

What is the best batch.size?

The better question is:

Where is the fixed cost, what is the bottleneck, what is the latency budget, what duplicate window can the business tolerate, and how does the system fail when the batch does not complete?

In the next part, we will move from throughput to stability: backpressure and flow control.

References

Apache Kafka Documentation — Producer API, Consumer API, design, producer/consumer configuration.
RabbitMQ Documentation — consumer acknowledgements, publisher confirms, consumer prefetch, flow control, streams, stream Java client.
Jakarta Messaging / JMS API Documentation — sessions, transactions, producers, consumers, acknowledgement.
Confluent ksqlDB Documentation — operations, query processing, monitoring, and deployment concepts.

Lesson Recap

You just completed lesson 29 in deepen practice. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Back To Series Next Lesson

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.

Previous Lesson

Lesson 28

Learn Java Messaging Event Streaming Part 028 Technology Decision Matrix

Next Lesson

Lesson 30

Learn Java Messaging Event Streaming Part 030 Backpressure Flow Control Stability

Learn Java Messaging Event Streaming Part 029 Pipelining Batching Throughput

Part 029 — Pipelining and Batching: Throughput Engineering Across Messaging Systems

1. What We Are Solving

2. The Core Mental Model

3. Pipelining vs Batching

3.1 Batching

3.2 Pipelining

3.3 The Difference

4. Throughput Equation for Practical Engineers

5. Batch Boundaries Across the Pipeline

6. Kafka Producer Batching

6.1 Key Producer Settings

6.2 The linger.ms Trade-Off

6.3 Compression

6.4 Idempotence and Pipelining

6.5 Producer Callback Discipline

6.6 Application-Level Kafka Producer Batch

7. Kafka Consumer Batching

7.1 Consumer Batch Settings

7.2 Manual Commit Batch

7.3 Partition-Aware Worker Batching

7.4 Consumer Batch and Database Batch

8. RabbitMQ Queue Throughput

8.1 Publisher Confirms as Pipeline

8.2 Consumer Prefetch as Batch Window

8.3 RabbitMQ Ack Batching

8.4 RabbitMQ Batch Anti-Pattern: Giant Consumer Transaction

9. RabbitMQ Streams Throughput

9.1 Sub-Entry Batching

9.2 Stream Offset Commit Batch

10. JMS / Jakarta Messaging Batching

10.1 Transacted Session Batch

10.2 Consumer Batch with Transacted Session

10.3 MDB Concurrency

11. Kafka Streams and ksqlDB Batching

11.1 Stateful Processing Batch Cost

11.2 Repartitioning as Throughput Tax

12. Downstream Side-Effect Batching

12.1 Database Batch Example

12.2 HTTP Side Effects

13. Latency-Throughput Curve

14. What to Measure

14.1 Producer Metrics

14.2 Broker Metrics

14.3 Consumer Metrics

14.4 Downstream Metrics

15. Benchmark Discipline

15.1 Benchmark Dimensions

15.2 Benchmark Scenarios

15.3 Benchmark Output

16. Common Anti-Patterns

16.1 “Just Increase Batch Size”

16.2 flush() After Every Kafka Send

16.3 Infinite Producer Buffering

16.4 Huge RabbitMQ Prefetch

16.5 Batch Commit Before Side Effect Is Safe

16.6 Benchmark Without Failure

17. Decision Heuristics

17.1 When to Increase Batch Size

17.2 When to Increase Pipeline Depth

17.3 When to Add Partitions/Consumers

18. Case-Management Example

18.1 Bad Throughput Design

18.2 Better Throughput Design

19. Tuning Playbook

Step 1 — Define the SLO

Step 2 — Baseline Without Tuning

Step 3 — Identify Bottleneck

Step 4 — Tune One Layer

Step 5 — Add Failure Scenarios

Step 6 — Document the Envelope

20. Engineering Checklist

21. Summary

References

6.2 The `linger.ms` Trade-Off

16.2 `flush()` After Every Kafka Send