Series/Learn Java Microservices Communication

Deepen PracticeOrdered learning track

Message Delivery Semantics and Consumer Correctness

Learn Java Microservices Communication - Part 064

Message delivery semantics for Java microservices: at-most-once, at-least-once, effectively-once, exactly-once scope, acknowledgements, offsets, ordering, duplicates, idempotent consumers, retry, dead-letter topics, poison messages, and production correctness policy.

[2026-07-05]13 min read2570 words

In This Lesson

1. The Basic Semantics 2. At-Most-Once 3. At-Least-Once

PrevNext

Lesson 6496 lesson track53–79 Deepen Practice

#java#microservices#communication#messaging+5 more

Part 064 — Message Delivery Semantics and Consumer Correctness

Message delivery semantics are often misunderstood.

People say:

Kafka gives exactly once.

or:

Our queue is reliable.

or:

The consumer will only get one copy.

These statements are usually incomplete.

Delivery semantics are not only a broker feature.

They are the combined behavior of:

producer,
broker,
storage,
consumer,
offset/ack commit,
retry,
database transaction,
external side effects,
idempotency,
replay,
failure recovery.

A top-tier engineer asks:

Exactly once what, where, and under which failure boundary?

That question prevents many production bugs.

1. The Basic Semantics

Common delivery guarantees:

Semantics	Meaning
At-most-once	Message may be lost, but not redelivered
At-least-once	Message should not be lost, but may be redelivered
Exactly-once	Message/effect is observed once in a defined scope
Effectively-once	Duplicates may occur, but idempotency/dedup makes final effect once

Apache Kafka's design documentation describes the classic delivery semantics as at-most-once, at-least-once, and exactly-once.

The practical default for most microservice consumers:

assume at-least-once delivery
design idempotent consumers

Even if your broker supports stronger guarantees, your database writes and external side effects still need correctness.

2. At-Most-Once

At-most-once means:

message may be lost
message is not redelivered

Typical pattern:

commit offset/ack before processing
process message

Failure:

Result:

message lost from consumer perspective

Use at-most-once only when loss is acceptable.

Examples:

low-value telemetry,
approximate metrics,
non-critical cache refresh,
sampled analytics,
optional notifications where loss is acceptable.

Do not use at-most-once for financial, workflow, audit, compliance, or state transition messages.

3. At-Least-Once

At-least-once means:

message is not considered done until after processing
message may be delivered again after failure

Typical pattern:

process message
commit offset/ack after successful processing

Failure:

Result:

database write may happen twice unless idempotent

At-least-once is common and practical.

But it requires duplicate-safe consumers.

4. Exactly-Once: Scope Matters

Exactly-once is not magic.

It must define scope.

Examples:

Claim	Scope question
broker exactly once	producer-to-broker? broker-to-consumer?
stream processing exactly once	within Kafka topics and transactional state store?
database exactly once	database transaction and unique keys?
email exactly once	external email provider idempotency?
workflow exactly once	command dedup plus durable state?

A broker may guarantee that records produced by a transactional producer are committed once to topics.

That does not automatically mean:

the consumer's database update and external HTTP call happen exactly once

If your consumer sends an email and crashes before committing offset, broker may redeliver. Email may be sent twice unless email sending is idempotent.

Therefore, design final effects, not only broker guarantees.

5. Effectively-Once

Effectively-once is the practical goal.

It means:

delivery may duplicate,
processing may retry,
but final business effect is applied once

Techniques:

processed message table,
unique constraints,
idempotency keys,
upserts,
compare-and-set version,
event sequence checks,
external provider idempotency,
transactional write + offset where supported,
outbox/inbox patterns.

Example:

CREATE TABLE processed_message (
    consumer_name text NOT NULL,
    message_id text NOT NULL,
    processed_at timestamptz NOT NULL,
    PRIMARY KEY (consumer_name, message_id)
);

Consumer:

if (!processedMessages.tryClaim("notification-consumer", event.id())) {
    return;
}

sendNotification(event);

processedMessages.markCompleted("notification-consumer", event.id());

But be careful: if sendNotification succeeds and markCompleted fails, duplicate can still happen unless notification is also idempotent.

6. Acknowledgement and Offset Commit

The critical question:

When do you tell the broker the message is done?

Options:

Commit/ack timing	Risk
before processing	loss
after processing	duplicate processing
after DB transaction	duplicate if crash before commit offset
in same transaction as output records	stronger for broker-to-broker pipelines
after external side effect	duplicate external side effect if crash before ack

There is no free lunch.

Choose based on business risk.

Most systems choose:

process idempotently
then commit/ack

7. Consumer Offset Is Consumer State

In Kafka-like systems, offset is not just broker internal state.

It represents consumer progress.

If consumer commits offset 100:

consumer says it no longer needs records <= 100

If processing of record 100 did not actually succeed, data is lost.

Offset commit must reflect durable processing progress.

For batch processing:

for (Record record : records) {
    process(record);
}
commitOffsetsAfterAllProcessed();

But if record 7 fails in a batch, what about records 8-10?

You need policy:

stop batch and retry from 7,
process independently and commit per record where supported,
send failed record to DLQ and continue,
use partition-aware ordered processing.

Ordering requirements drive commit strategy.

8. Ordering and Delivery

Ordering and delivery semantics interact.

If events for one aggregate must be ordered:

CaseCreated -> CaseEscalated -> CaseClosed

then consumer cannot freely skip CaseEscalated and process CaseClosed.

Options:

block partition until issue resolved,
park poison message and preserve per-key order separately,
use sequence numbers and gap detection,
route to per-key retry topic,
design projection to tolerate out-of-order events.

Ordering requirement should be explicit.

"Kafka preserves order" usually means:

within a partition

not globally.

9. Duplicates

Duplicates can come from:

producer retry after uncertain publish outcome,
broker redelivery,
consumer crash after side effect before ack,
manual replay,
DLQ reprocessing,
outbox relay retry,
network timeout,
partition rebalance,
transactional ambiguity,
human backfill.

Consumer must assume duplicate unless proven otherwise.

Duplicate-safe strategies:

Effect	Strategy
projection update	upsert by aggregate ID/version
email send	notification ID/idempotency key
external API call	provider idempotency key
audit row	unique event ID
workflow transition	compare current state/version
search index	deterministic document ID
cache update	overwrite by key/version

10. Idempotent Consumer Pattern

Basic pattern:

public final class IdempotentMessageHandler<T> {
    private final ProcessedMessageRepository processed;
    private final MessageHandler<T> delegate;
    private final String consumerName;

    public void handle(MessageEnvelope<T> envelope) {
        boolean claimed = processed.tryClaim(consumerName, envelope.messageId());

        if (!claimed) {
            metrics.duplicate(envelope);
            return;
        }

        try {
            delegate.handle(envelope.payload(), envelope.context());
            processed.markCompleted(consumerName, envelope.messageId());
        } catch (RuntimeException ex) {
            processed.markFailed(consumerName, envelope.messageId(), ex);
            throw ex;
        }
    }
}

This works when:

tryClaim is atomic,
message ID is stable,
duplicate after completion can be ignored,
failed message can be retried according to policy.

But it does not automatically make external side effects safe.

11. Processed Message Table States

Use states, not just "exists."

CREATE TABLE processed_message (
    consumer_name text NOT NULL,
    message_id text NOT NULL,
    status text NOT NULL,
    first_seen_at timestamptz NOT NULL,
    updated_at timestamptz NOT NULL,
    error_code text,
    PRIMARY KEY (consumer_name, message_id)
);

States:

State	Meaning
`IN_PROGRESS`	consumer claimed message
`COMPLETED`	processing completed
`FAILED_RETRYABLE`	failed but retry allowed
`FAILED_TERMINAL`	failed permanently
`PARKED`	moved to manual/remediation path

This helps handle crash recovery and stuck IN_PROGRESS.

12. Claim Algorithm

Need carefully define stale IN_PROGRESS.

If a consumer crashes after claim, another delivery may find IN_PROGRESS.

Policy:

wait/retry later,
reclaim after timeout,
inspect processing status,
mark abandoned,
use lock expiry.

Do not let IN_PROGRESS block forever.

13. Transaction Boundary

Best case:

process message and record processed_message in same database transaction

Example projection update:

transactionTemplate.executeWithoutResult(tx -> {
    if (!processed.tryInsert(consumer, event.id())) {
        return;
    }

    projectionRepository.upsert(event.caseId(), event.version(), event.payload());
    processed.markCompleted(consumer, event.id());
});

Then commit offset after DB transaction succeeds.

Crash after DB commit but before offset commit:

message redelivered
processed table says completed
consumer skips

This gives effectively-once effect for database projection.

14. External Side Effects

External side effects are harder.

Example:

sendEmail(event);
markProcessed(event.id());
ack();

Crash after sendEmail before markProcessed:

redelivery -> email may send again

Solutions:

use external provider idempotency key,
store notification intent in DB and send via outbox,
use unique notification ID,
make provider call deduplicated,
accept duplicate if business allows,
require manual reconciliation.

For critical side effects, prefer outbox-style durable intent.

15. Retry Semantics

Message retry must distinguish:

transient failure,
deterministic poison message,
dependency outage,
rate limit,
business precondition,
schema incompatibility,
authorization failure.

Bad:

retry forever

Bad:

DLQ immediately on first transient DB timeout

Better:

bounded retry with backoff
then DLQ/parking for terminal or exhausted messages

Retry policy:

retry:
  maxAttempts: 5
  initialDelayMs: 1000
  maxDelayMs: 60000
  jitter: true
  retryable:
    - DATABASE_TIMEOUT
    - DEPENDENCY_UNAVAILABLE
    - RATE_LIMITED
  nonRetryable:
    - SCHEMA_INVALID
    - AUTHORIZATION_FAILED
    - UNKNOWN_EVENT_TYPE

16. Retry Topics vs Blocking Partition

If processing a message fails, you can:

Block and retry in place

Pros:

preserves ordering,
simple.

Cons:

one poison message blocks partition,
lag grows.

Move to retry topic with delay

Pros:

main flow continues,
backoff easier,
poison isolated.

Cons:

ordering may be broken,
complexity,
duplicates/reordering.

DLQ/parking lot

Pros:

prevents infinite blocking,
manual remediation.

Cons:

message not processed automatically,
business process may be incomplete.

Ordering requirements decide.

17. Dead-Letter Queue / Topic

DLQ stores messages that could not be processed.

DLQ message should include:

original topic/channel,
original partition/offset if available,
original key,
original headers,
payload or payload reference,
consumer name,
failure reason,
exception class,
attempt count,
first failure time,
last failure time,
schema version,
correlation ID.

DLQ is not a trash can.

It is an operational workflow.

A DLQ with no owner, alert, or replay tool is message loss with extra steps.

18. Parking Lot

A parking lot is similar to DLQ but often implies manual or controlled remediation.

Use parking lot for:

business exceptions needing human review,
unknown schema requiring deployment,
poison messages,
data correction,
dependency-specific repeated failure,
compliance-sensitive messages.

Required operations:

inspect,
classify,
fix payload or state,
replay,
skip with audit,
delete with approval.

Parking lot must have runbook.

19. Poison Message Detection

Poison message signs:

same message fails repeatedly,
same error reason,
non-retryable validation error,
deserialization failure,
unknown event type,
unsupported enum,
invariant violation,
consumer version incompatible.

Detection metric:

message.failures.total{message_id_hash,event_type,reason}

But avoid raw message ID high-cardinality in metrics.

Use logs/traces for exact IDs.

Policy:

after N attempts or non-retryable error -> DLQ/park

Do not block entire consumer indefinitely.

20. Deserialization Failures

If consumer cannot deserialize message, application handler may never run.

You still need handling.

Common causes:

incompatible schema,
wrong serializer,
corrupt payload,
missing schema ID,
unsupported version,
invalid JSON/Avro/Protobuf.

Policy:

route to deserialization DLQ if supported,
log safe metadata,
alert producer and topic owner,
do not commit offset silently without record,
do not retry forever if deterministic.

Schema compatibility gates prevent many of these failures.

But runtime handling is still needed.

21. Ordering and Idempotency

If events have version:

{
  "caseId": "CASE-100",
  "version": 42,
  "eventType": "CaseEscalated"
}

Consumer can apply:

if (event.version() <= projection.currentVersion(caseId)) {
    return; // duplicate or old event
}

if (event.version() != projection.currentVersion(caseId) + 1) {
    throw new SequenceGapException();
}

projection.apply(event);

This protects against:

duplicates,
old events,
out-of-order events,
gaps.

But it may block on missing event.

Need gap policy:

retry later,
fetch missing state,
rebuild projection,
park key,
alert.

22. Rebalance and Duplicate Processing

Consumer group rebalancing can cause duplicates.

Scenario:

Consumer A reads message.
Consumer A processes but has not committed offset.
Rebalance assigns partition to Consumer B.
Consumer B reads same message.
Duplicate processing happens.

This is normal in at-least-once systems.

Design for it.

Do not assume "one consumer instance" means no duplicate.

23. Batch Processing

Batch consumption improves throughput.

But it complicates error handling.

Example batch:

records 1,2,3,4,5
record 3 fails

Options:

fail whole batch and retry all,
process records independently and commit safe offsets,
send record 3 to DLQ and continue,
stop at record 3 to preserve ordering.

If order matters within partition, you cannot simply skip record 3 and commit record 5 without policy.

Batching should not hide per-record correctness.

24. Consumer Transactions

Some platforms support transactions.

Kafka has producer idempotence and transactions that can support exactly-once semantics in Kafka-to-Kafka pipelines when used correctly.

But if the consumer writes to an external database or sends HTTP calls, the transaction boundary may not include those systems.

Therefore:

broker transaction != global distributed transaction

Use transactions where they fit.

Still design idempotent external side effects.

25. Outbox and Inbox

Outbox protects producer side:

business state + outgoing message recorded atomically

Inbox protects consumer side:

incoming message processing recorded idempotently

Together:

This is a robust pattern for many Java microservices.

Outbox prevents missing events.

Inbox prevents duplicate effects.

26. Delivery Semantics Matrix

Producer	Broker	Consumer	Final effect
non-idempotent	at-least-once	non-idempotent	duplicates possible
idempotent producer	at-least-once	non-idempotent	consumer duplicates possible
outbox producer	durable publish	idempotent consumer	effectively-once DB effect
transactional Kafka pipeline	EOS configured	Kafka output only	exactly-once within Kafka scope
outbox + external email no idempotency	durable event	non-idempotent side effect	duplicate email possible
outbox + notification ID	durable event	provider idempotency	effectively-once notification

Always specify final effect scope.

27. Java Consumer Skeleton

public final class CaseEscalatedConsumer {
    private final ProcessedMessageRepository processedMessages;
    private final CaseEscalatedHandler handler;
    private final DeadLetterPublisher deadLetterPublisher;

    public void onMessage(MessageEnvelope<CaseEscalatedEvent> envelope) {
        try {
            boolean claimed = processedMessages.tryClaim(
                "case-escalated-consumer",
                envelope.messageId()
            );

            if (!claimed) {
                metrics.duplicate(envelope);
                return;
            }

            handler.handle(envelope.payload(), envelope.context());

            processedMessages.markCompleted(
                "case-escalated-consumer",
                envelope.messageId()
            );

            envelope.ack();
        } catch (NonRetryableMessageException ex) {
            deadLetterPublisher.publish(envelope, ex);
            envelope.ack();
        } catch (RetryableMessageException ex) {
            envelope.nackWithBackoff(ex);
        }
    }
}

Broker-specific implementation differs.

Correctness shape remains similar.

28. Message Envelope

public record MessageEnvelope<T>(
    String messageId,
    String key,
    T payload,
    MessageContext context,
    Acknowledgement acknowledgement
) {
    public void ack() {
        acknowledgement.ack();
    }

    public void nackWithBackoff(Throwable cause) {
        acknowledgement.nack(cause);
    }
}

Keep application handler focused on business processing.

Keep broker ack mechanics in infrastructure adapter.

29. Observability

Metrics:

consumer.messages.received.total{topic,consumer_group,event_type}
consumer.messages.processed.total{topic,consumer_group,event_type,outcome}
consumer.processing.duration{topic,consumer_group,event_type}
consumer.duplicates.total{topic,consumer_group,event_type}
consumer.retries.total{topic,consumer_group,event_type,reason}
consumer.dlq.total{topic,consumer_group,event_type,reason}
consumer.lag{topic,consumer_group,partition}
consumer.offset.commits.total{topic,consumer_group,status}
consumer.deserialization.failures.total{topic,reason}
consumer.poison.detected.total{topic,event_type}

Important outcomes:

success,
duplicate_skipped,
retryable_failure,
terminal_failure,
dlq,
parked,
deserialization_failure,
schema_incompatible.

30. Alerts

Useful alerts:

Alert	Meaning
consumer lag rising	consumer cannot keep up
DLQ rate > 0 for critical topic	messages not processed
poison message detected	partition may block
duplicate rate spike	producer/retry/rebalance issue
deserialization failure	schema compatibility issue
processing p99 high	dependency/DB slowness
retry exhausted	business process stuck
outbox pending growing	producer relay failing
offset commit failures	progress not durable
partition idle unexpectedly	producer stopped or routing issue

Message systems fail silently without lag/DLQ alerts.

31. Testing Delivery Semantics

Minimum tests:

Scenario	Expected
message processed successfully	ack/commit after side effect
crash before ack	redelivery handled idempotently
duplicate message	skipped or idempotently applied
retryable dependency failure	retry/backoff
non-retryable validation failure	DLQ/park
deserialization failure	error topic/DLQ
poison message	bounded retry then DLQ
out-of-order event	gap/old event policy applied
batch partial failure	correct commit behavior
replay	no duplicate side effects

Test duplicate explicitly.

Do not rely on broker behavior in unit tests.

32. Duplicate Test

@Test
void skipsAlreadyProcessedEvent() {
    CaseEscalatedEvent event = event("evt-123", "CASE-100");

    consumer.onMessage(envelope(event));
    consumer.onMessage(envelope(event));

    assertThat(notificationRepository.sentCountForEvent("evt-123")).isEqualTo(1);
}

If this test is hard to write, the consumer design is probably too coupled to broker mechanics.

33. Crash Window Test

Simulate:

business effect succeeds
ack fails/crash before ack
message redelivered

@Test
void redeliveryAfterCommitDoesNotDuplicateProjection() {
    MessageEnvelope<CaseEscalatedEvent> envelope = envelope("evt-123");

    consumer.processBusinessEffectButDoNotAck(envelope);

    consumer.onMessage(envelope("evt-123"));

    assertThat(projection.version("CASE-100")).isEqualTo(1);
}

This is the core at-least-once correctness test.

34. Production Policy Template

deliverySemantics:
  topics:
    case-events:
      defaultGuarantee: at-least-once
      key: caseId
      ordering: per-key
      retention: 7d
      replayAllowed: true

      consumers:
        search-indexer:
          idempotency:
            strategy: aggregate-version-upsert
            duplicateHandling: skip
          offsetCommit:
            timing: after-db-transaction
          retry:
            maxAttempts: 5
            backoff: exponential-jitter
          dlq:
            enabled: true
            topic: case-events.search-indexer.dlq
            owner: search-team
          poison:
            maxAttemptsBeforePark: 5
          lagSlo:
            maxLagSeconds: 60

        notification-sender:
          idempotency:
            strategy: notification-id
            externalProviderIdempotency: true
          replay:
            allowed: false-for-side-effect
          dlq:
            enabled: true
            manualReviewRequired: true

Policy must be per consumer, not only per topic.

Different consumers have different side effects.

35. Common Anti-Patterns

35.1 Assuming no duplicates

Duplicates are normal.

35.2 Committing before processing critical messages

Message loss.

35.3 Retrying poison forever

Partition blocked, lag grows.

35.4 DLQ with no owner

Operational black hole.

35.5 Global ordering requirement

Scalability killer.

35.6 Side effects without idempotency

Duplicate emails/payments/API calls.

35.7 Ignoring deserialization failures

Consumer never reaches handler.

35.8 Treating exactly-once as broker checkbox

Final business effect still duplicates.

35.9 No lag alert

Stale projections discovered by users.

35.10 Replay unsafe consumers

Backfill triggers real side effects.

36. Decision Model

The broker delivers messages.

Your consumer delivers correctness.

37. Design Checklist

Before shipping a consumer:

What delivery semantics are assumed?
Can message be duplicated?
What is message ID?
Is consumer idempotent?
What side effects happen?
Are external side effects deduplicated?
When is offset/ack committed?
What happens if crash occurs before ack?
What happens if crash occurs after side effect?
Is ordering required?
What is partition key?
What happens on poison message?
Is retry bounded?
Is DLQ owned?
Is replay safe?
Are schema/deserialization failures handled?
Is lag monitored?
Are duplicate tests written?
Is crash-window behavior tested?
Is runbook ready?

38. The Real Lesson

Messaging reliability is not the broker's job alone.

The broker can store and redeliver messages.

It cannot know whether your email was sent twice, your projection applied twice, your payment duplicated, or your workflow transitioned incorrectly.

Delivery semantics become business correctness only when consumer design is correct.

That usually means:

at-least-once delivery
+ idempotent consumer
+ correct offset/ack timing
+ bounded retry
+ DLQ/parking
+ replay policy
+ lag observability

That is the production baseline.

References

Apache Kafka Design — Message Delivery Semantics: https://kafka.apache.org/0100/design/design/#semantics
Apache Kafka Documentation: https://kafka.apache.org/documentation/
Confluent Kafka Design — Delivery Semantics: https://docs.confluent.io/kafka/design/delivery-semantics.html
Enterprise Integration Patterns — Idempotent Receiver: https://www.enterpriseintegrationpatterns.com/patterns/messaging/IdempotentReceiver.html
Enterprise Integration Patterns — Dead Letter Channel: https://www.enterpriseintegrationpatterns.com/patterns/messaging/DeadLetterChannel.html
Enterprise Integration Patterns — Message Channel: https://www.enterpriseintegrationpatterns.com/patterns/messaging/MessageChannel.html

Lesson Recap

You just completed lesson 64 in deepen practice. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Back To Series Next Lesson

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.

Previous Lesson

Lesson 63

Event, Message, and Stream Communication Mental Model

Next Lesson

Lesson 65

Ordering, Partitioning, Keys, and Consumer Groups