Ordering, Consistency, and Idempotency
Learn Java Kafka in Action - Part 026
Ordering, consistency, and idempotency for production-grade Java Kafka systems: partition ordering, causal ordering, per-aggregate sequencing, duplicate suppression, idempotent side effects, consistency boundaries, projection correctness, and regulatory-grade workflow invariants.
Part 026 — Ordering, Consistency, and Idempotency
Part 025 covered batching, backpressure, and flow control.
Now we move to the correctness triangle of event-driven systems:
- Ordering — In what sequence should facts be interpreted?
- Consistency — What state is allowed to be visible at each point in time?
- Idempotency — What happens if the same event or command is processed more than once?
The central idea:
Kafka gives strong ordering within a partition, but business correctness requires explicit design for key choice, sequence, duplicate handling, state transitions, and consistency boundaries.
This is where many Kafka systems fail quietly. They pass load tests, but break under replay, retry, rebalance, cross-partition workflows, or late-arriving events.
1. Kaufman Skill Decomposition
The target skill is designing Kafka systems that remain correct under duplicates, retries, replay, partial failure, and distributed timing ambiguity.
| Subskill | Production Meaning |
|---|---|
| Partition ordering | Understand Kafka’s actual ordering boundary. |
| Domain ordering | Model the order business logic needs, not only broker order. |
| Causal reasoning | Track why one event happened after another. |
| Per-aggregate sequencing | Use version/sequence to protect entity state. |
| Idempotent consumer | Make repeated processing safe. |
| Deduplication | Choose event ID, command ID, aggregate version, or hash-based dedup strategy. |
| Projection consistency | Build read models that converge correctly. |
| External side effects | Prevent duplicate API calls, payments, notifications, or workflow transitions. |
| Replay safety | Ensure historical replay does not corrupt current state. |
| Auditability | Explain why state changed and whether duplicates/out-of-order events were ignored. |
1.1 Practice Goal
By the end of this part, you should be able to:
- Explain why Kafka does not provide global ordering across all partitions.
- Choose a key that preserves the business ordering boundary.
- Design an idempotent consumer using database constraints.
- Detect stale, duplicate, and out-of-order events.
- Build a projection that tolerates replay.
- Separate Kafka exactly-once semantics from external side-effect idempotency.
2. Ordering: The First Misunderstanding
Kafka ordering is often summarized badly as:
Kafka preserves order.
A production-grade version is:
Kafka preserves append order within a single topic-partition. It does not provide one global order across all partitions in a topic, nor across multiple topics.
There is no meaningful total order between:
partition-0 offset-100
partition-1 offset-35
partition-2 offset-901
Unless your application creates one.
3. Five Types of Order
Advanced engineers distinguish different meanings of “order.”
| Order Type | Meaning | Example |
|---|---|---|
| Broker order | Order records were appended to a partition. | Offset 11 after offset 10. |
| Producer send order | Order a producer attempted to send records. | Producer called send(A) then send(B). |
| Event time order | Order events occurred in the real world/domain. | Payment happened before shipment. |
| Processing order | Order consumer logic handled records. | Worker processed B before A. |
| Visibility order | Order users/systems observed state changes. | UI saw SHIPPED before PAID. |
Kafka gives you broker order per partition. It does not automatically give event-time order, processing order, visibility order, or cross-service causal order.
3.1 Domain Example
Suppose an order lifecycle emits:
OrderCreated(orderId=O-1, version=1)
OrderPaid(orderId=O-1, version=2)
OrderShipped(orderId=O-1, version=3)
If all records use key orderId=O-1, Kafka will place them on the same partition and preserve their relative broker order.
If keys are inconsistent:
OrderCreated key = customerId
OrderPaid key = paymentId
OrderShipped key = orderId
Those events may go to different partitions. Your consumer can see them in any relative order.
4. The Ordering Boundary Is a Business Decision
A Kafka key is not only a routing detail. It defines the strongest practical ordering boundary.
| Business Requirement | Likely Key |
|---|---|
| Order lifecycle must be sequential | orderId |
| Account balance updates must be sequential | accountId |
| Case workflow transition must be sequential | caseId |
| Device telemetry only needs per-device order | deviceId |
| Tenant isolation matters more than entity ordering | composite tenantId:entityId or tenant-aware partitioning |
| Global audit ordering required | separate single-partition audit topic or external sequencer |
The key question:
Which entity becomes incorrect if its events are processed out of order?
That entity usually belongs in the Kafka key.
5. Regulatory Workflow Example
Consider an enforcement case lifecycle:
CaseOpened -> EvidenceSubmitted -> NoticeIssued -> ResponseReceived -> PenaltyAssessed -> AppealFiled -> CaseClosed
For regulatory defensibility, transitions must be explainable and monotonic.
Bad key strategy:
EvidenceSubmitted key = documentId
NoticeIssued key = officerId
PenaltyAssessed key = penaltyId
AppealFiled key = appealId
The case projection may observe:
PenaltyAssessed before NoticeIssued
CaseClosed before AppealFiled
Correcter key strategy:
key = caseId
Then the case lifecycle has a single partition ordering boundary.
But this has a trade-off: one very active case can become a hot key.
Production design requires choosing between:
- strict per-case ordering,
- throughput via sharding,
- workflow-level sequencing,
- or external state machine enforcement.
6. When One Partition Is Not Enough
A single key can become a throughput bottleneck.
Examples:
- one tenant produces 60% of traffic,
- one account has millions of transactions,
- one regulatory mega-case has thousands of documents,
- one device sends high-frequency telemetry.
Options:
| Strategy | Benefit | Cost |
|---|---|---|
| Keep strict key | Strong ordering | Hot partition risk |
| Shard key | Higher throughput | Loses strict per-entity order unless re-sequenced |
| Split by sub-entity | Balanced if domain allows | Requires domain decomposition |
| Use command sequencer | Preserves logical order | Adds service and latency |
| Use state machine guard | Allows out-of-order arrival | More complex consumer logic |
6.1 Sharded Key Example
key = accountId + ':' + shardNumber
This spreads load but destroys natural per-account ordering unless each update is independent or reassembled by sequence.
Use only when the domain allows it.
7. Consistency Boundaries
Kafka systems are distributed. You must define where consistency is strong and where it is eventual.
7.1 Local Transaction Boundary
A service can usually guarantee consistency inside its local database transaction:
Inside the DB transaction:
- state update and outbox insert are atomic,
- constraints can enforce invariant,
- version can be incremented,
- idempotency key can be stored.
Outside that boundary:
- Kafka publication is asynchronous,
- downstream projections are eventually consistent,
- consumers can lag,
- retries can duplicate.
7.2 Eventual Consistency Is Not a Bug
Eventual consistency is acceptable when:
- the business tolerates delay,
- stale reads are bounded and observable,
- users get appropriate status,
- reconciliation exists,
- state transitions remain monotonic,
- audit trail explains the sequence.
It is not acceptable when:
- immediate decisioning is required,
- legal/regulatory deadlines depend on visibility,
- duplicate external side effects are harmful,
- stale reads trigger irreversible actions.
8. Idempotency: The Core Invariant
Idempotency means processing the same input more than once has the same durable effect as processing it once.
apply(event)
apply(event)
apply(event)
final_state == apply(event) once
This does not mean the code executes only once. It means the durable business effect occurs only once.
8.1 Duplicate Sources
Kafka consumers must expect duplicates from:
- producer retry after uncertain acknowledgement,
- consumer crash after side effect before offset commit,
- rebalance during processing,
- manual replay,
- DLQ replay,
- retry topic reprocessing,
- CDC snapshot + streaming overlap,
- upstream bug,
- cross-region replication/recovery,
- operator backfill.
Duplicates are normal. Design for them.
9. Four Idempotency Keys
Different use cases need different keys.
| Key Type | Use When | Example |
|---|---|---|
eventId | Each event has a stable unique identity. | evt-01H... |
commandId | User/API command must execute once. | cmd-submit-payment-123 |
aggregateId + version | Entity state transition is sequential. | caseId=C-1, version=7 |
| business natural key | Operation has natural uniqueness. | invoice number, payment reference |
9.1 eventId
Good for duplicate suppression at event level.
Weakness: if upstream emits two different event IDs for the same business action, eventId dedup does not help.
9.2 commandId
Good for external commands:
SubmitPayment(commandId=CMD-123, paymentId=P-9)
If command is retried, the handler recognizes CMD-123.
9.3 aggregateId + version
Best for ordered state transitions.
CaseStatusChanged(caseId=C-1, version=5, status=NOTICE_ISSUED)
The consumer can reject:
- duplicate version,
- stale version,
- future version if gap not allowed.
9.4 Natural Business Key
Example:
payment_provider_reference = 'PAY-ABC-999'
If the natural key is truly unique and stable, use a database unique constraint.
10. Database-Backed Idempotent Consumer
The most reliable dedup mechanism is usually a durable store with a uniqueness constraint.
10.1 Processed Event Table
CREATE TABLE processed_kafka_event (
consumer_name TEXT NOT NULL,
topic_name TEXT NOT NULL,
partition_number INTEGER NOT NULL,
event_id TEXT NOT NULL,
processed_at TIMESTAMPTZ NOT NULL DEFAULT now(),
PRIMARY KEY (consumer_name, event_id)
);
10.2 Idempotent Processing Transaction
public final class IdempotentCaseEventConsumer {
private final CaseProjectionRepository projectionRepository;
private final ProcessedEventRepository processedEventRepository;
public void handle(ConsumerRecord<String, CaseEvent> record) {
CaseEvent event = record.value();
try {
projectionRepository.inTransaction(() -> {
boolean firstTime = processedEventRepository.tryInsert(
"case-projection-consumer",
record.topic(),
record.partition(),
event.eventId()
);
if (!firstTime) {
return; // duplicate event, no durable effect
}
projectionRepository.apply(event);
});
} catch (Exception e) {
throw new EventProcessingException(event.eventId(), e);
}
}
}
tryInsert should rely on a unique constraint, not only an application-side check.
Application-side check is race-prone:
if (!exists(eventId)) insert(eventId)
Two concurrent workers can both pass exists.
Use:
INSERT INTO processed_kafka_event (...)
VALUES (...)
ON CONFLICT DO NOTHING;
11. Versioned Projection Pattern
For entity state, eventId is not enough. You also need version monotonicity.
11.1 Projection Table
CREATE TABLE case_projection (
case_id TEXT PRIMARY KEY,
status TEXT NOT NULL,
version BIGINT NOT NULL,
last_event_id TEXT NOT NULL,
updated_at TIMESTAMPTZ NOT NULL
);
11.2 Apply Only Next Version
UPDATE case_projection
SET status = :newStatus,
version = :eventVersion,
last_event_id = :eventId,
updated_at = now()
WHERE case_id = :caseId
AND version = :eventVersion - 1;
If affected rows = 1, transition applied.
If affected rows = 0, investigate:
| Condition | Meaning | Action |
|---|---|---|
| current version == event version | duplicate | ignore if same event or equivalent state |
| current version > event version | stale event | ignore or audit |
| current version < event version - 1 | gap | hold, retry later, or quarantine |
| no row exists and event version == 1 | create projection | insert |
| no row exists and event version > 1 | missing history | retry/quarantine |
11.3 Java Version Guard
public ApplyResult apply(CaseStatusChanged event) {
int updated = jdbc.update("""
UPDATE case_projection
SET status = ?, version = ?, last_event_id = ?, updated_at = now()
WHERE case_id = ? AND version = ?
""",
event.status().name(),
event.version(),
event.eventId(),
event.caseId(),
event.version() - 1
);
if (updated == 1) {
return ApplyResult.APPLIED;
}
CaseProjection current = findById(event.caseId());
if (current == null && event.version() == 1) {
insertInitial(event);
return ApplyResult.APPLIED;
}
if (current != null && current.version() >= event.version()) {
return ApplyResult.DUPLICATE_OR_STALE;
}
return ApplyResult.GAP_DETECTED;
}
This pattern turns ordering ambiguity into explicit state.
12. Monotonic State Machine Guard
Some domains should reject invalid transitions even if versions appear sequential.
Example case lifecycle:
Consumer guard:
public boolean canTransition(CaseStatus current, CaseStatus next) {
return switch (current) {
case OPENED -> next == CaseStatus.EVIDENCE_SUBMITTED;
case EVIDENCE_SUBMITTED -> next == CaseStatus.NOTICE_ISSUED;
case NOTICE_ISSUED -> next == CaseStatus.RESPONSE_RECEIVED;
case RESPONSE_RECEIVED -> next == CaseStatus.PENALTY_ASSESSED;
case PENALTY_ASSESSED -> next == CaseStatus.APPEALED || next == CaseStatus.CLOSED;
case APPEALED -> next == CaseStatus.CLOSED;
case CLOSED -> false;
};
}
This is not merely defensive programming. It is regulatory defensibility:
- invalid transitions are detected,
- ignored transitions are explainable,
- duplicate events do not rewrite history,
- stale events cannot roll back status.
13. Idempotent External Side Effects
Kafka exactly-once processing does not automatically make external side effects exactly-once.
External side effects include:
- charging a card,
- sending email/SMS,
- creating a ticket,
- calling a government registry,
- updating a third-party CRM,
- invoking a workflow engine,
- sending a webhook.
If a consumer crashes after the side effect but before committing the offset, Kafka will redeliver the record. Your code may call the external system again.
13.1 External Idempotency Key
Use an idempotency key at the external API boundary.
PaymentRequest request = new PaymentRequest(
event.paymentId(),
event.amount(),
event.currency()
);
paymentGateway.charge(
event.commandId(), // idempotency key
request
);
The external system should guarantee:
same idempotency key + same payload => same result
same idempotency key + different payload => conflict
If the external system does not support idempotency, build your own outbox/inbox around that side effect and accept operational risk.
14. Inbox Pattern
The inbox pattern stores incoming messages before processing them.
Use when:
- side effect is expensive,
- processing needs retries independent of Kafka polling,
- external API has rate limits,
- you need operator visibility into pending work,
- you need strong dedup around external side effects.
14.1 Inbox Table
CREATE TABLE message_inbox (
message_id TEXT PRIMARY KEY,
source_topic TEXT NOT NULL,
source_partition INTEGER NOT NULL,
source_offset BIGINT NOT NULL,
payload JSONB NOT NULL,
status TEXT NOT NULL,
attempt_count INTEGER NOT NULL DEFAULT 0,
next_attempt_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
Consumer responsibility:
- Insert inbox row idempotently.
- Commit Kafka offset after inbox insert.
- Let controlled workers process inbox rows.
This shifts retry/flow control from Kafka consumer loop to a durable work table.
15. Ledger Pattern
For financial or audit-grade systems, avoid overwriting state blindly.
Instead of:
UPDATE account SET balance = balance + :amount;
Use a ledger:
CREATE TABLE account_ledger_entry (
ledger_entry_id TEXT PRIMARY KEY,
account_id TEXT NOT NULL,
event_id TEXT NOT NULL UNIQUE,
amount_delta NUMERIC NOT NULL,
entry_type TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
The balance is derived from ledger entries or maintained with strict transactional updates.
Benefits:
- duplicate events rejected by
event_idunique constraint, - audit trail preserved,
- corrections can be additive instead of destructive,
- replay can rebuild balance.
16. Handling Out-of-Order Events
When an event arrives out of order, choose one of four strategies.
| Strategy | Use When | Risk |
|---|---|---|
| Ignore stale events | Projection already moved forward. | May hide upstream disorder if not audited. |
| Hold future events | Missing prior events may arrive soon. | Requires buffer and timeout. |
| Requery source of truth | Projection can repair from authoritative store. | Adds load and coupling. |
| Quarantine | Event violates invariant. | Requires operational process. |
16.1 Gap Buffer
This is useful for domains where short-term disorder is expected but strict sequence is required.
17. Replay Safety
Replay is one of Kafka’s superpowers. It is also one of the fastest ways to corrupt a poorly designed system.
A replay-safe consumer must answer:
- Will duplicate events be ignored?
- Will external side effects be suppressed?
- Will stale events roll back current state?
- Will projections be rebuilt into a separate target or overwrite live state?
- Is replay bounded by topic, partition, offset, timestamp, or event ID?
- Are schema versions still readable?
- Are correction events handled correctly?
17.1 Safe Projection Rebuild
Safer approach:
old projection table: case_projection
new projection table: case_projection_rebuild_20260701
Steps:
- Create new projection table.
- Replay from earliest offset into new table.
- Validate counts/checksums/business invariants.
- Switch read traffic.
- Keep old table for rollback window.
Avoid replaying destructive side effects directly into live systems.
18. Consistency Pattern Matrix
| Pattern | Ordering Strategy | Idempotency Strategy | Best For |
|---|---|---|---|
| Idempotent projection | key by aggregate ID | event ID + version guard | read models |
| Inbox worker | key by work entity | inbox primary key | external side effects |
| Ledger | key by account/case | unique ledger entry/event ID | financial/audit trail |
| State machine guard | key by workflow ID | version + transition validation | lifecycle systems |
| Compacted state topic | key by entity ID | latest value/tombstone | cache/read-side state |
| Command handler | key by aggregate ID | command ID | user/API commands |
| Reconciliation job | batch key/range | natural key/upsert | repair/backfill |
19. Kafka Streams Considerations
Kafka Streams can provide exactly-once processing for Kafka read-process-write boundaries, but it does not remove domain ordering requirements.
Watch for:
- repartition changing physical partition placement,
- joins requiring co-partitioning,
- windowed joins accepting out-of-order data within grace period,
- state store restoration replaying changelog data,
- tombstones in KTables,
- external side effects inside processors.
If your topology writes only to Kafka and state stores, Kafka Streams exactly-once can be powerful.
If your topology calls external APIs, you still need external idempotency.
20. Testing Ordering and Idempotency
Unit tests are not enough. You need scenario tests.
20.1 Required Test Cases
| Test | Expected Behavior |
|---|---|
| Same event delivered twice | second delivery ignored |
| Event version 3 before version 2 | held/quarantined/rejected explicitly |
| Stale event after current version 5 | ignored and audited |
| Crash after DB write before offset commit | redelivery does not duplicate state |
| Crash after external API call before offset commit | idempotency key suppresses duplicate side effect |
| DLQ replay | no duplicate durable effect |
| Full topic replay | projection rebuild deterministic |
| Invalid state transition | rejected/quarantined with reason |
20.2 Example Test Shape
@Test
void duplicateEventShouldNotChangeProjectionTwice() {
CaseStatusChanged event = new CaseStatusChanged(
"evt-1",
"case-1",
1L,
CaseStatus.OPENED
);
consumer.handle(record(event));
consumer.handle(record(event));
CaseProjection projection = repository.findByCaseId("case-1");
assertThat(projection.version()).isEqualTo(1L);
assertThat(processedEventRepository.countByEventId("evt-1")).isEqualTo(1);
}
21. Observability
Ordering and idempotency need explicit metrics.
Track:
| Metric | Meaning |
|---|---|
| duplicate event count | upstream retry/replay or bug rate |
| stale event count | out-of-order or replay volume |
| version gap count | missing event or disorder |
| invalid transition count | domain invariant violation |
| idempotency conflict count | same key with different payload |
| external duplicate suppression count | side-effect replay avoided |
| quarantine count | records requiring operator review |
| projection rebuild checksum mismatch | non-deterministic projection logic |
A mature Kafka platform treats ignored duplicates as important signals, not silent success.
22. Anti-Patterns
22.1 “Exactly-Once Means No Duplicates Anywhere”
Exactly-once processing has a boundary. External systems, databases, APIs, and humans require separate idempotency design.
22.2 “Use Timestamp to Decide Latest State”
Timestamps can be skewed, delayed, or corrected. Use version/sequence when correctness matters.
22.3 “Key by Tenant Only”
This may preserve tenant grouping but destroy entity-level ordering if multiple entities under the same tenant interleave heavily.
22.4 “Dedup in Memory”
In-memory dedup disappears on restart and does not protect multiple instances.
22.5 “Replay Directly Into Side Effects”
Historical replay can resend emails, duplicate payments, or re-trigger workflows.
22.6 “Ignore Out-of-Order Events Without Metrics”
Silent ignore hides upstream defects.
22.7 “Use Global Ordering as a Requirement Without Challenging It”
Global ordering is expensive and often unnecessary. Usually the real requirement is per-entity, per-account, per-case, or per-workflow ordering.
23. Production Checklist
Ordering
- What is the required ordering boundary?
- Is the Kafka key aligned with that boundary?
- Are all events for the same aggregate keyed consistently?
- Are repartition operations understood?
- Are cross-topic ordering assumptions avoided?
- Are hot keys identified and accepted or mitigated?
Consistency
- Where is strong consistency required?
- Where is eventual consistency acceptable?
- What is the source of truth?
- Can projections be rebuilt deterministically?
- Are stale reads acceptable in the user journey?
- Is reconciliation available?
Idempotency
- What key suppresses duplicates?
- Is dedup durable and shared across instances?
- Are database unique constraints used?
- Are external side effects idempotent?
- Are command idempotency and event dedup separated?
- Are DLQ and replay safe?
Auditability
- Can we explain why an event was ignored?
- Can we explain why a transition was rejected?
- Can we reconstruct state from event history?
- Are correction events modeled explicitly?
- Are operator actions logged?
24. Lab: Build an Idempotent Case Projection
24.1 Scenario
You consume case.lifecycle.events.v1 and build case_projection for a regulatory case-management UI.
Events contain:
{
"eventId": "evt-123",
"caseId": "case-456",
"version": 7,
"eventType": "NOTICE_ISSUED",
"occurredAt": "2026-07-01T10:00:00Z"
}
24.2 Tasks
- Use
caseIdas Kafka key. - Create
processed_kafka_eventtable with uniqueeventIdper consumer. - Create
case_projectionwithcaseId,status,version, andlastEventId. - Apply only
version = currentVersion + 1. - Ignore duplicate/stale events with metrics.
- Quarantine future events with version gaps.
- Reject invalid lifecycle transitions.
- Write tests for duplicate, stale, gap, invalid transition, and replay.
- Add dashboard panels for duplicate count, stale count, gap count, and invalid transition count.
24.3 Expected Learning
You should learn that correctness is not a single Kafka configuration. It is a set of domain and storage invariants:
same event -> one durable effect
same aggregate -> monotonic version
invalid transition -> rejected
missing prior event -> gap detected
replay -> deterministic result
25. Architecture Review Questions
Use these questions in a design review:
- What is the exact business ordering requirement?
- What Kafka key implements that requirement?
- What happens if events arrive out of order?
- What happens if the same event is processed twice?
- What happens if the consumer crashes after DB write but before offset commit?
- What happens if the consumer crashes after external API call but before offset commit?
- Can we replay the topic without side effects?
- Can we rebuild projections from scratch?
- Are stale events ignored, held, or quarantined?
- Are invalid transitions visible to operators?
- Does the design require impossible global ordering?
- Can auditors understand why final state was reached?
26. Summary
Kafka gives you an ordered log per partition. Business correctness requires much more:
- choose the key that matches the domain ordering boundary,
- use durable idempotency keys,
- protect state with version guards,
- reject invalid transitions,
- design external side effects with idempotency,
- make replay deterministic,
- measure duplicates, stale events, and gaps.
The core invariant:
Every consumer must be safe under duplicate delivery, replay, crash recovery, and out-of-order arrival according to the domain’s declared consistency model.
In Part 027, we move from consumer-local correctness to multi-service transaction boundaries: local transactions, outbox, Kafka transactions, saga, compensation, workflow boundary, and regulatory auditability across services.
27. References
- Apache Kafka Documentation — topic partitions, ordering, producer/consumer behavior, delivery guarantees, and event streaming model.
- Confluent Documentation — Kafka partition key, delivery semantics, Schema Registry, Kafka Streams, and client behavior.
- Enterprise integration and distributed systems practice — idempotent receiver, inbox/outbox, saga, ledger, and projection rebuild patterns.
You just completed lesson 26 in deepen practice. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.
Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.