Kafka Event Contracts: Topic, Key, Partitioning, Ordering, Retention, and Replay Semantics
Learn Java API Contract Engineering, Event Contract Engineering & Schema Governance - Part 016
Kafka event contract engineering for Java systems: topic design, message keys, partitioning, ordering, retention, compaction, replay, consumer groups, poison messages, and DLQ semantics.
Learning Objectives
Many teams assume a Kafka contract is just the payload schema. That is a serious mistake.
A Kafka event contract covers:
- topic;
- event type;
- message key;
- partitioning;
- ordering guarantee;
- delivery guarantee;
- retention;
- compaction;
- tombstone semantics;
- replay policy;
- consumer group expectations;
- offset management;
- poison message handling;
- DLQ/quarantine semantics;
- schema registry interaction;
- producer transaction/outbox behavior;
- idempotency and deduplication;
- operational SLO.
The schema is only one part.
After this part, you should be able to:
- design a Kafka topic as a contract boundary;
- choose a message key based on ordering, partitioning, compaction, and load distribution;
- explain ordering correctly: per partition, not global;
- design event replay and side-effect safety;
- distinguish delete retention from compaction;
- understand tombstone semantics for compacted topics;
- design a DLQ/quarantine contract;
- write Java producer/consumer contract tests for Kafka events;
- avoid anti-patterns that cause data loss, consumer breakage, and event stream chaos.
1. Kafka Contract Surface
Kafka contract is not:
topic + Avro schema
Kafka contract is:
who publishes what message to which topic,
with what key,
under what ordering/delivery/retention/replay assumptions,
using what schema,
with what failure and compatibility policy.
If the contract mentions only the schema, consumers will invent their own assumptions for everything else.
2. Topic as Contract Boundary
Topic names are API surface.
Bad:
service-a-output
test-topic
customer
events
new-topic-1
Better:
customer-events
case-events
payment-events
customer-snapshots
case-commands
case-events-dlq
2.1 Topic Naming Principles
- reflect domain or message purpose;
- avoid implementation component names;
- avoid temporary names;
- avoid consumer-specific names unless topic is consumer-owned;
- make environment separate from logical name if possible;
- distinguish events, commands, snapshots, DLQ;
- document owner and authority.
Example convention:
<domain>-events
<domain>-commands
<domain>-snapshots
<domain>-events-dlq
or:
acme.<domain>.<stream-kind>
2.2 Topic Ownership
Topic must have an owner.
| Topic | Owner | Authority |
|---|---|---|
| case-events | case-management-platform | case-service |
| customer-events | customer-platform | customer-service |
| payment-events | payment-platform | payment-service |
| case-events-dlq | case-platform / platform | consumers/platform |
No owner = no governance.
2.3 Topic Lifecycle
Retiring a topic requires:
- producer stopped;
- consumer inventory migrated;
- retention expired or archived;
- replay requirements satisfied;
- schema references archived;
- access controls cleaned;
- docs updated.
3. Topic Design Patterns
3.1 Event Type per Topic
case-approved-events
case-submitted-events
case-closed-events
Pros:
- simple schema per topic;
- easy consumer filtering;
- granular access control;
- easier DLQ per event type.
Cons:
- topic explosion;
- cross-event ordering per aggregate difficult;
- operational overhead;
- consumers need many subscriptions.
Use when:
- event types have very different consumers/security;
- event volume high and isolated;
- per-event retention differs;
- access control differs significantly.
3.2 Domain Event Stream
case-events
Contains:
CaseSubmitted
CaseAssigned
CaseApproved
CaseClosed
CaseReopened
Pros:
- cohesive domain stream;
- per-aggregate ordering possible if keyed by caseId;
- fewer topics;
- good for event-sourced/projection consumers.
Cons:
- consumers must filter eventType;
- multiple schemas per topic;
- access control coarser;
- topic schema governance more complex.
Use when:
- consumers need lifecycle stream;
- ordering across event types per aggregate matters;
- event types share owner/security/retention.
3.3 Snapshot Topic
customer-snapshots
Usually compacted by key.
Contains latest state snapshot per entity.
Pros:
- consumers can bootstrap projection;
- compaction retains latest per key;
- good for read-model replication.
Cons:
- not audit history;
- old changes lost by compaction;
- tombstone semantics needed;
- consumers may confuse snapshot with event history.
3.4 Command Topic
case-commands
Contains commands, not events.
Must define:
- command authority;
- authorization context;
- idempotency key;
- command expiration;
- result/rejection path;
- retry policy.
Do not mix commands and events casually.
4. Message Key Is Contract
Kafka message key affects partitioning. Partitioning affects ordering and load.
If key is caseId, events for same case usually go to same partition, preserving order for that case.
key = caseId
Contract:
topic: case-events
messageKey: metadata.aggregateId
orderingGuarantee: per-case
4.1 Bad Key Choice
Random key:
key = eventId
Pros:
- good distribution.
Cons:
- events for same aggregate go to different partitions;
- per-aggregate ordering lost;
- projection consumers must handle disorder more heavily.
Null key:
key = null
Pros:
- producer easy.
Cons:
- partitioning is round-robin or sticky, depending on producer version and configuration;
- no key-based ordering;
- compaction is not usable, because compaction works per key;
- consumers cannot make any key-based assumptions.
4.2 Key Decision Matrix
| Requirement | Good key |
|---|---|
| preserve order per case | caseId |
| preserve order per customer | customerId |
| compact latest snapshot per account | accountId |
| distribute high-volume independent events | random or eventId |
| tenant-local ordering | tenantId + ":" + aggregateId |
| consumer partition by region | region + ":" + aggregateId |
4.3 Key Stability
Changing key is dangerous.
Impacts:
- ordering guarantee changes;
- partition distribution changes;
- stateful stream tasks rebalance;
- compaction semantics change;
- consumer local state stores may break;
- replay behavior changes;
- monitoring baselines change.
Key changes require contract review.
5. Partitioning and Ordering
Kafka ordering is within a partition, not global across all partitions.
If topic has multiple partitions:
Partition 0: Case A v1, Case A v2
Partition 1: Case B v1, Case B v2
There is no global order between Case A and Case B.
5.1 Per-Aggregate Ordering
If key = aggregateId, and partitioner stable, events for same aggregate go to same partition.
Consumer can assume:
CaseSubmitted v1 -> CaseAssigned v2 -> CaseApproved v3
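for the same case, provided the producer sends in order and is not misconfigured (for example, retries with several in-flight requests and idempotence disabled can reorder records).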
5.2 Ordering Contract Should State
ordering:
  scope: per-aggregate
  key: metadata.aggregateId
  sequenceField: metadata.aggregateVersion
  duplicates: possible
  globalOrdering: false
5.3 Add Sequence Anyway
Even with Kafka partition ordering, include aggregateVersion/sequence.
Why:
- detect missing events;
- detect duplicates;
- handle replay;
- survive migration/repartition;
- help projection correctness;
- debug ordering issues.
Consumer logic:
void apply(EventEnvelope<CaseEventPayload> event) {
    long incoming = event.metadata().aggregateVersion();
    long current = projection.version(event.metadata().aggregateId());
    if (incoming <= current) {
        return; // duplicate or old
    }
    if (incoming != current + 1) {
        // a version was skipped: do not apply out of sequence
        quarantine(event, "SEQUENCE_GAP");
        return;
    }
    projection.apply(event);
}
6. Partition Count Is Contract-ish
Kafka partition count is operational config, but consumers can be affected.
Increasing partitions:
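- changes the key-to-partition mapping for new records when the partitioner hashes the key modulo the partition count, so new events for a key can land on a different partition than its earlier events;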
- can affect ordering if producers/consumers rely on partition assignment assumptions;
- affects throughput;
- affects consumer group parallelism;
- affects state store distribution;
- affects replay duration.
Do not let consumers assume fixed partition numbers unless documented.
Contract should say:
partitioning:
  partitionCountIsStableContract: false
  orderingScope: per-key
  consumersMustNotDependOnPartitionNumber: true
If using Kafka Streams/state stores, partition count changes require migration planning.
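To make the remapping risk concrete, here is a minimal sketch of hash-based key-to-partition mapping. It assumes a stand-in hash (`Arrays.hashCode`), not Kafka's actual murmur2-based partitioner, and the class name `KeyPartitioningSketch` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch only: stand-in hash, NOT Kafka's murmur2-based default partitioner.
final class KeyPartitioningSketch {

    // Maps a key to a partition: non-negative hash modulo partition count.
    static int partitionFor(String key, int partitionCount) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        int positiveHash = Arrays.hashCode(bytes) & 0x7fffffff;
        return positiveHash % partitionCount;
    }

    // Counts how many of the keys "case-0".."case-(keyCount-1)" map to a
    // different partition after the partition count changes.
    static int movedKeys(int keyCount, int oldPartitions, int newPartitions) {
        int moved = 0;
        for (int i = 0; i < keyCount; i++) {
            String key = "case-" + i;
            if (partitionFor(key, oldPartitions) != partitionFor(key, newPartitions)) {
                moved++;
            }
        }
        return moved;
    }
}
```

With a fixed partition count, a key always maps to the same partition. `movedKeys(1000, 6, 8)` returns a non-zero count: for those keys, new events would land on a different partition than their earlier events, which is why per-key ordering needs a migration plan when the count changes.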
7. Delivery Guarantees
Common real-world statement:
At-least-once delivery, duplicates possible.
Contract should state:
delivery:
  guarantee: at-least-once
  duplicateDelivery: possible
  producerEventIdStableAcrossRetries: true
  consumerIdempotencyRequired: true
7.1 At-Most-Once
Messages may be lost. Rarely acceptable for critical domain events.
7.2 At-Least-Once
Messages not lost if systems configured correctly, but duplicates possible.
Most common.
7.3 Exactly-Once
Kafka has transactional/exactly-once features for some scenarios, but end-to-end exactly-once across arbitrary external side effects is hard.
Contract should not promise exactly-once unless carefully scoped:
delivery:
  guarantee: exactly-once-within-kafka-transactional-topology
  externalSideEffectsExactlyOnce: false
Avoid marketing language.
8. Producer Contract
Producer promises:
- event emitted after durable state change;
- event ID stable;
- correct topic;
- correct key;
- correct schema;
- correct event type;
- correct aggregate version;
- no forbidden data;
- ordering per key if promised;
- retry publish safely.
8.1 Outbox Pattern
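The outbox pattern helps keep state and event aligned: the state change and the event record are written in the same database transaction, and a separate relay publishes the stored events to Kafka afterwards.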
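A minimal sketch of an outbox relay, using in-memory stand-ins for the outbox table and the Kafka producer. The names `OutboxRelaySketch`, `OutboxRow`, and `Publisher` are hypothetical; a real implementation would use a database table and a transactional write.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: in-memory stand-ins for an outbox table and a Kafka producer.
final class OutboxRelaySketch {

    // One outbox row; eventId is assigned once, when the state change is saved.
    static final class OutboxRow {
        final String eventId;
        final String key;
        final String payload;
        boolean published;

        OutboxRow(String eventId, String key, String payload) {
            this.eventId = eventId;
            this.key = key;
            this.payload = payload;
        }
    }

    // Hypothetical publisher abstraction; a real one would wrap a Kafka producer.
    interface Publisher {
        void publish(String topic, String key, String eventId, String payload);
    }

    private final List<OutboxRow> outbox = new ArrayList<>();

    // In a real system this insert shares a DB transaction with the state change.
    void saveStateAndEvent(String eventId, String key, String payload) {
        outbox.add(new OutboxRow(eventId, key, payload));
    }

    // Publishes every unpublished row, then marks it as published.
    // If marking fails after a successful publish, the next run publishes the
    // same row again with the same eventId, so consumers see a duplicate.
    int relay(Publisher publisher) {
        int sent = 0;
        for (OutboxRow row : outbox) {
            if (!row.published) {
                publisher.publish("case-events", row.key, row.eventId, row.payload);
                row.published = true;
                sent++;
            }
        }
        return sent;
    }
}
```

The design point is that the event ID is created with the state change, not at publish time, so a repeated publish carries the same ID and consumers can deduplicate on it.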
8.2 Producer Java Example
public void publish(EventEnvelope<CaseApprovedPayload> event) {
    String topic = "case-events";
    // key = aggregateId, so all events of one case share a partition
    String key = event.metadata().aggregateId();

    ProducerRecord<String, EventEnvelope<CaseApprovedPayload>> record =
            new ProducerRecord<>(topic, key, event);

    record.headers().add(
            "event-type",
            event.metadata().eventType().getBytes(StandardCharsets.UTF_8)
    );
    record.headers().add(
            "correlation-id",
            event.metadata().correlationId().getBytes(StandardCharsets.UTF_8)
    );

    kafkaTemplate.send(record);
}
The topic/key/header choice should be contract-driven, not incidental.
9. Consumer Contract
Consumer must handle:
- duplicates;
- out-of-order events if not guaranteed;
- unknown event types;
- unknown optional fields;
- schema evolution;
- transient processing failure;
- poison messages;
- replay;
- rebalancing;
- offset commit semantics.
9.1 Consumer Handler
@KafkaListener(topics = "case-events", groupId = "case-projection")
public void consume(ConsumerRecord<String, EventEnvelope<CaseApprovedPayload>> record) {
    EventEnvelope<CaseApprovedPayload> event = record.value();
    // multi-type topic: skip event types this handler does not own
    if (!"CaseApproved".equals(event.metadata().eventType())) {
        return;
    }
    handler.handle(event);
}
9.2 Offset Commit
Offset commit strategy affects failure behavior.
| Strategy | Risk |
|---|---|
| commit before processing | message loss on crash |
| commit after processing | duplicate processing on crash |
| transactional consume-process-produce | stronger but complex |
| manual commit with idempotency | common practical pattern |
Contract may not dictate offset strategy, but consumer guide should.
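The first two rows of the table can be illustrated with a small simulation. This is a sketch under simplifying assumptions: a single partition modeled as a list, exactly one crash that happens between the commit step and the processing step, and a restart that resumes from the committed offset. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: simulates one crash to compare commit-before vs commit-after.
final class OffsetCommitSketch {

    enum Strategy { COMMIT_BEFORE_PROCESSING, COMMIT_AFTER_PROCESSING }

    // Returns the records whose processing completed, in completion order.
    // The consumer crashes once, at offset crashAt, between its two steps,
    // and then restarts from the last committed offset.
    static List<String> run(List<String> log, Strategy strategy, int crashAt) {
        List<String> processed = new ArrayList<>();
        int committed = 0;       // next offset to read after a restart
        int position = 0;        // current read position
        boolean crashed = false;

        while (position < log.size()) {
            boolean crashNow = !crashed && position == crashAt;
            if (strategy == Strategy.COMMIT_BEFORE_PROCESSING) {
                committed = position + 1;              // step 1: commit
                if (crashNow) {                        // crash before processing
                    crashed = true;
                    position = committed;
                    continue;
                }
                processed.add(log.get(position));      // step 2: process
            } else {
                processed.add(log.get(position));      // step 1: process
                if (crashNow) {                        // crash before commit
                    crashed = true;
                    position = committed;
                    continue;
                }
                committed = position + 1;              // step 2: commit
            }
            position++;
        }
        return processed;
    }
}
```

For a log of `a, b, c` with a crash at offset 1, commit-before yields `a, c` (record `b` is lost) and commit-after yields `a, b, b, c` (record `b` is processed twice). The second outcome is the one an idempotent consumer can absorb.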
10. Retention Policy
Retention answers: how long messages remain available.
Kafka topic config can use delete policy based on time/size, compaction, or both depending configuration.
10.1 Delete Retention
Messages are deleted after retention time/size.
Example:
retention:
  policy: delete
  duration: P90D
  purpose: allow replay within 90 days
Contract implications:
- new consumer can bootstrap only within retention window;
- replay after retention requires archive;
- audit must not rely only on topic if retention short;
- data deletion must align with legal policy.
10.2 Size-Based Retention
Retention may be size-limited. This makes replay window variable.
Contract should avoid promising “90 days” if size can delete earlier.
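A rough way to estimate the replay window when both limits apply is to take the shorter of the time limit and the time it takes a partition to fill its size limit. The sketch below assumes a constant ingest rate per partition and ignores segment granularity; the class and parameter names are hypothetical.

```java
import java.time.Duration;

// Sketch only: assumes a constant per-partition ingest rate and ignores
// segment boundaries, so the result is an estimate, not a guarantee.
final class ReplayWindowSketch {

    // Effective replay window: the shorter of the time-based limit and the
    // time needed to write retentionBytes at bytesPerSecond.
    static Duration effectiveWindow(Duration retentionTime, long retentionBytes, long bytesPerSecond) {
        if (retentionBytes < 0 || bytesPerSecond <= 0) {
            return retentionTime; // no size limit configured, or no traffic
        }
        Duration sizeWindow = Duration.ofSeconds(retentionBytes / bytesPerSecond);
        return sizeWindow.compareTo(retentionTime) < 0 ? sizeWindow : retentionTime;
    }
}
```

For example, with a 90-day time limit, a 50 GiB size limit, and 10 KiB/s of traffic on the partition, `effectiveWindow` returns 5,242,880 seconds, roughly 60 days. A contract that promised 90 days would be wrong for that topic.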
10.3 Retention as Governance
For regulated systems:
- topic retention;
- data lake/archive retention;
- DLQ retention;
- schema retention;
- outbox retention;
- audit log retention.
All must align.
11. Compaction
Compaction retains latest record per key, not full history.
Compacted topic is suitable for latest state snapshots.
Example:
key=cus_123 value=status=PENDING
key=cus_123 value=status=ACTIVE
key=cus_123 value=status=SUSPENDED
After compaction, topic may retain latest:
key=cus_123 value=status=SUSPENDED
11.1 Contract Implications
Compacted topic is not reliable audit history.
Use for:
- latest entity snapshot;
- lookup table replication;
- materialized view bootstrap;
- configuration/rules latest state.
Avoid for:
- audit events;
- lifecycle history;
- compliance event stream;
- workflows needing every transition.
11.2 Tombstones
In compacted topics, a record with a null value acts as a tombstone (delete marker) for its key.
Example:
key=cus_123 value=null
Contract must define tombstone semantics:
tombstone:
  supported: true
  meaning: customer snapshot deleted or no longer available
  consumerAction: delete local projection for key
Consumers must handle tombstones.
11.3 Compaction Does Not Mean Single Record Immediately
Compaction is asynchronous. Consumers may see multiple records for same key before compaction.
Contract should say:
compaction:
  policy: compact
  consumersMustHandleMultipleRecordsPerKey: true
  latestRecordByOffsetWins: true
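The rule "latest record by offset wins, tombstone deletes" can be sketched as a consumer-side materialization that produces the same state whether or not compaction has already run. This is a simplified model with hypothetical names: the `compact` method here keeps tombstones, whereas a real broker eventually removes them too.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: models a compacted topic as a list of key/value records.
final class CompactionSketch {

    // A record on the topic; a null value is a tombstone.
    record Rec(String key, String value) {}

    // Consumer-side view: later offsets win, tombstones delete the key.
    static Map<String, String> materialize(List<Rec> log) {
        Map<String, String> state = new LinkedHashMap<>();
        for (Rec r : log) {
            if (r.value() == null) {
                state.remove(r.key());
            } else {
                state.put(r.key(), r.value());
            }
        }
        return state;
    }

    // Simplified compaction: keep only the last record per key.
    // Tombstones are kept here; real compaction drops them after a delay.
    static List<Rec> compact(List<Rec> log) {
        Map<String, Rec> latest = new LinkedHashMap<>();
        for (Rec r : log) {
            latest.remove(r.key()); // re-insert so order follows the last offset
            latest.put(r.key(), r);
        }
        return new ArrayList<>(latest.values());
    }
}
```

A consumer written like `materialize` does not care how far compaction has progressed: reading the full log and reading the compacted log give the same final state.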
12. Delete + Compact
Kafka can support delete and compact policies together depending topic config.
Meaning:
- compaction keeps latest per key;
- delete retention can remove old segments based on time/size.
Contract must document final behavior:
retention:
  policy: delete,compact
  deleteRetention: P30D
  compactedLatestPerKey: true
This matters for bootstrap and recovery.
13. Replay Semantics
Replay means consumers can re-read old messages.
Kinds:
- consumer resets offset;
- producer republishes from outbox/archive;
- replay topic;
- data lake backfill;
- compacted snapshot bootstrap.
13.1 Replay Contract
replay:
  supported: true
  source: kafka-retention
  retentionWindow: P90D
  sideEffectConsumersAllowed: false
  deduplicationKey: metadata.eventId
13.2 Replay-Safe Consumer
Projection consumer:
if (alreadyApplied(event.metadata().eventId())) {
    return; // already applied: replay or duplicate delivery
}
applyProjection(event);
Email/SMS/payment consumer:
- must not blindly replay;
- must dedup by eventId or business key;
- may need separate live-only topic or replay marker;
- may be excluded from replay.
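For a side-effect consumer, the rules above might look like the following sketch. It assumes a hypothetical `replay` flag (for example derived from a header or from a separate replay topic) and an in-memory set standing in for a durable deduplication store.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch only: the replay flag and the in-memory dedup set are assumptions;
// a real consumer would need a durable store for handled event IDs.
final class SideEffectGuardSketch {

    private final Set<String> handledEventIds = new HashSet<>();
    private int emailsSent = 0;

    // Returns true only if the side effect actually ran.
    boolean onCustomerRegistered(String eventId, boolean replay) {
        if (replay) {
            return false; // live-only consumer: opts out of replayed events
        }
        if (!handledEventIds.add(eventId)) {
            return false; // duplicate delivery of an already handled event
        }
        emailsSent++;     // stand-in for sending the email
        return true;
    }

    int emailsSent() {
        return emailsSent;
    }
}
```

Delivering `evt_1` twice and then a replayed `evt_2` results in exactly one email. Deduplicating on a business key instead of `eventId` follows the same shape.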
13.3 Replay Can Break Semantics
Old event interpreted with new code may yield different projection if business rules changed.
Mitigations:
- event contains facts, not instructions;
- versioned handlers;
- schema evolution support;
- archived reference data;
- deterministic projection logic;
- migration scripts.
14. Consumer Groups
Consumer group controls load-sharing.
In a group, each partition is consumed by one member at a time. Different groups each receive their own copy of stream.
Contract guidance:
consumerGroups:
  expected:
    - case-projection
    - case-notification
  note: Each logical application should use its own groupId.
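How a group shares partitions can be sketched with a simple round-robin assignment. This is an illustration only, not one of Kafka's actual assignor implementations, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: simplified round-robin assignment within one consumer group.
final class GroupAssignmentSketch {

    // Each partition goes to exactly one member; members may get none.
    static Map<String, List<Integer>> assign(List<String> members, int partitionCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        if (members.isEmpty()) {
            return assignment;
        }
        for (int partition = 0; partition < partitionCount; partition++) {
            String owner = members.get(partition % members.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }
}
```

With three partitions and two members, one member gets partitions 0 and 2 and the other gets partition 1. With four members, the fourth gets nothing, which is why partition count caps the useful parallelism of a group. Two unrelated applications sharing a group ID would split the partitions between them in the same way, each seeing only part of the stream.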
14.1 Group ID as Consumer Identity
Bad:
group.id=test
group.id=consumer
Better:
case-projection-service
fraud-monitoring-service
customer-search-indexer
14.2 Consumer Group Contract Risks
- two different apps accidentally share group ID and split messages;
- one app changes group ID and reprocesses all history;
- offset reset policy causes unexpected replay/skip;
- partition count limits parallelism;
- slow consumer causes lag.
Consumer onboarding docs should include group ID guidance.
15. Offset Reset Policy
If consumer has no committed offset:
- earliest: read from beginning available;
- latest: read only new messages;
- none: fail if no offset.
Contract should guide new consumers:
consumerGuidance:
  autoOffsetReset: earliest for projection rebuild, latest for notification-only consumers
Wrong default can cause:
- missed historical events;
- accidental massive replay;
- duplicate side effects;
- consumer overload.
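The decision a consumer makes at startup can be sketched as a small function. It models only the "no committed offset" case described here; names such as `OffsetResetSketch`, `logStart`, and `logEnd` are hypothetical.

```java
import java.util.OptionalLong;

// Sketch only: models the start position for one partition.
final class OffsetResetSketch {

    enum ResetPolicy { EARLIEST, LATEST, NONE }

    // A committed offset always wins; the reset policy applies only without one.
    static long startOffset(OptionalLong committed, ResetPolicy policy, long logStart, long logEnd) {
        if (committed.isPresent()) {
            return committed.getAsLong();
        }
        switch (policy) {
            case EARLIEST:
                return logStart; // oldest record still retained
            case LATEST:
                return logEnd;   // only records written from now on
            default:
                throw new IllegalStateException("no committed offset and reset policy is none");
        }
    }
}
```

A brand-new group ID has no committed offsets, so with `EARLIEST` it starts at the oldest retained record and reprocesses everything still on the topic, while with `LATEST` it silently skips all existing history.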
16. Poison Messages
Poison message = message consumer cannot process successfully.
Causes:
- schema incompatible;
- unknown event type;
- invalid domain state;
- bug in consumer;
- missing reference data;
- deserialization failure;
- authorization/classification violation.
16.1 Strategies
| Strategy | When |
|---|---|
| fail and retry forever | rarely acceptable |
| skip and commit | low-value notification only |
| DLQ | common |
| quarantine | regulated/high-risk |
| pause partition | strict ordering required |
| fetch current state and repair | projection consumers |
| manual remediation | critical workflows |
16.2 Contract Should Define
poisonMessage:
  strategy: quarantine
  dlqTopic: case-events-dlq
  maxAttempts: 3
  preservesOriginalEnvelope: true
  includesFailureMetadata: true
  redriveSupported: true
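A contract like this could translate into a handler wrapper along the following lines. It is a sketch with hypothetical names: the DLQ is an in-memory list, retries are immediate with no backoff, and the exception class name stands in for a proper failure code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch only: in-memory DLQ, immediate retries, no backoff.
final class PoisonMessageSketch {

    // Failure metadata plus the original record coordinates and content.
    record DlqRecord(String originalTopic, int originalPartition, long originalOffset,
                     String key, String value, String failureCode, int attempt) {}

    final List<DlqRecord> dlq = new ArrayList<>();

    // Runs the handler up to maxAttempts times. On the final failure the
    // original record is written to the DLQ. Returns true if processing succeeded.
    boolean process(String topic, int partition, long offset, String key, String value,
                    Consumer<String> handler, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(value);
                return true;
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    // exception class name used as a stand-in failure code
                    dlq.add(new DlqRecord(topic, partition, offset, key, value,
                            e.getClass().getSimpleName(), attempt));
                }
            }
        }
        return false;
    }
}
```

The caller decides what a `false` result means for the offset: committing it skips the event for this consumer, which the contract must explicitly allow.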
17. DLQ Contract
DLQ is not a trash can. It is a contract.
DLQ message should include:
- original topic;
- original partition;
- original offset;
- original key;
- original headers;
- original event envelope/payload;
- consumer name/group;
- failure time;
- failure code;
- failure message/class sanitized;
- attempt count;
- correlation ID;
- schema ID if available.
Example:
{
  "failure": {
    "consumer": "case-projection-service",
    "groupId": "case-projection-service",
    "failedAt": "2026-06-29T05:00:00Z",
    "failureCode": "SCHEMA_VALIDATION_FAILED",
    "attempt": 3,
    "originalTopic": "case-events",
    "originalPartition": 4,
    "originalOffset": 998172
  },
  "original": {
    "key": "case_123",
    "headers": {
      "event-type": "CaseApproved",
      "correlation-id": "corr_123"
    },
    "value": {
      "metadata": {
        "eventId": "evt_123",
        "eventType": "CaseApproved"
      },
      "payload": {}
    }
  }
}
DLQ access may need same or stronger security than source topic because it contains failed sensitive payloads.
18. Unknown Event Types
For multi-type topic, consumers must handle unknown event types.
Bad:
throw new IllegalArgumentException("Unknown event type");
This can poison the partition when producer adds new event type.
Better:
if (!supportedEventTypes.contains(event.metadata().eventType())) {
    metrics.increment("event.ignored", "eventType", event.metadata().eventType());
    return;
}
Contract:
multiTypeTopic:
  consumersMustIgnoreUnknownEventTypes: true
Unless consumer is meant to validate all events as governance monitor.
19. Schema Evolution in Kafka
Schema evolution depends on serializer and registry. But Kafka contract must document:
- schema format;
- registry;
- subject naming;
- compatibility mode;
- reader/writer behavior;
- unknown fields;
- enum evolution;
- nullability/defaults.
Topic contract example:
schema:
  format: avro
  registry: apicurio
  subjectNamingStrategy: record-name
  compatibility: backward-transitive
This series will deep dive Avro, Protobuf, and JSON Schema in later parts.
20. Retrying Consumer Processing
Consumer retries differ from producer retries.
20.1 Retry in Same Partition
If processing fails, the consumer does not commit the offset and re-reads the same record (by seeking back to it, or after a restart or rebalance), so the message is retried. This preserves order but can block the partition.
20.2 Retry Topic
Move failed message to retry topic with delay.
Pros:
- avoids blocking main partition;
- controlled backoff.
Cons:
- ordering may break;
- more topics;
- complex redrive.
20.3 DLQ After Attempts
After max attempts, send the message to DLQ/quarantine and commit the original offset.
Risk: projection may miss event.
Contract must define whether this is acceptable.
For strict state projections, skipping event can corrupt projection unless later snapshot repair exists.
21. Event Ordering vs Retry Topics
Retry topic can violate ordering.
Example:
CaseApproved v5 fails, moved to retry.
CaseClosed v6 succeeds.
Then CaseApproved v5 retry succeeds later.
Projection corrupts unless version logic handles it.
If ordering matters, options:
- block partition until fixed;
- quarantine and stop projection;
- use version checks;
- fetch current state;
- route all events for aggregate to same retry lane;
- design commutative handlers.
Document strategy.
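One of the options, routing all events for an aggregate to the same retry lane, can be sketched as follows. The in-memory set is an assumption for illustration; in practice this state would need to be durable and shared across consumer instances, and all names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch only: tracks which aggregates currently have events in the retry lane.
final class RetryLaneSketch {

    enum Route { MAIN, RETRY }

    private final Set<String> aggregatesInRetry = new HashSet<>();

    // Decides where an event goes. processingFailed says whether handling
    // on the main path failed for this event.
    Route route(String aggregateId, boolean processingFailed) {
        if (aggregatesInRetry.contains(aggregateId)) {
            return Route.RETRY; // stay behind the earlier failed event
        }
        if (processingFailed) {
            aggregatesInRetry.add(aggregateId);
            return Route.RETRY;
        }
        return Route.MAIN;
    }

    // Called once the retry lane has no pending events for the aggregate.
    void retryDrained(String aggregateId) {
        aggregatesInRetry.remove(aggregateId);
    }
}
```

In the example above, once CaseApproved v5 fails and goes to retry, CaseClosed v6 for the same case is also sent to the retry lane instead of being applied first, while events for other cases continue on the main path.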
22. Kafka Topic Configuration as Contract
Important topic configs:
| Config concept | Contract relevance |
|---|---|
| partitions | parallelism and ordering scope |
| replication factor | availability/durability |
| cleanup.policy | delete/compact behavior |
| retention.ms/bytes | replay window |
| min.insync.replicas | durability with producer acks |
| max.message.bytes | payload size limit |
| compression | performance/compat |
| segment settings | retention behavior |
| delete.retention.ms | tombstone retention |
| compaction lag | snapshot correctness timing |
Not every config belongs in developer docs, but contract-critical ones do.
23. Security and Access Control
Kafka event contract should document:
- who can produce;
- who can consume;
- data classification;
- PII presence;
- encryption expectations;
- tenant isolation;
- schema registry permissions;
- DLQ access;
- replay access;
- audit logging.
Example:
security:
  producers:
    - case-service
  consumers:
    approvalRequired: true
  dataClassification: confidential
  containsPii: false
  tenantScoped: true
A topic with sensitive data should not be discoverable/consumable casually just because it exists.
24. Observability Contract
Metrics:
- producer publish count;
- producer publish failure;
- event type count;
- schema validation failure;
- consumer lag;
- consumer processing failure;
- DLQ count;
- retry count;
- replay count;
- end-to-end latency: occurredAt to processedAt.
Key labels:
topic
eventType
consumerGroup
status
failureCode
Avoid high cardinality:
eventId
aggregateId
customerId
correlationId
Logs should include eventId/correlationId, but metrics should avoid unbounded IDs.
25. Java Producer Contract Tests
25.1 Verify Topic, Key, Event Type
@Test
void approvingCasePublishesCaseApprovedWithCaseIdKey() {
    caseService.approve(caseId, approveCommand);

    ProducerRecord<String, EventEnvelope<CaseApprovedPayload>> record =
            testKafka.singleProducedRecord("case-events");

    assertThat(record.key()).isEqualTo(caseId.value());
    assertThat(record.value().metadata().eventType()).isEqualTo("CaseApproved");
    assertThat(record.value().metadata().aggregateId()).isEqualTo(caseId.value());
    assertThat(record.value().metadata().aggregateVersion()).isEqualTo(17L);
}
25.2 Verify Schema
@Test
void producedCaseApprovedEventMatchesSchema() {
    EventEnvelope<CaseApprovedPayload> event = produceCaseApproved();
    schemaValidator.validate("case.CaseApproved", event);
}
25.3 Verify Outbox Event ID Stability
@Test
void outboxPublishRetryUsesSameEventId() {
    OutboxEvent outboxEvent = createOutboxEvent();

    publisher.publish(outboxEvent);
    publisher.publish(outboxEvent);

    assertThat(testKafka.records("case-events"))
            .extracting(record -> record.value().metadata().eventId())
            .containsOnly(outboxEvent.eventId());
}
26. Java Consumer Contract Tests
26.1 Duplicate Handling
@Test
void duplicateEventShouldNotApplyProjectionTwice() {
    EventEnvelope<CaseApprovedPayload> event = caseApprovedEvent("evt_123", 17);

    consumer.handle(event);
    consumer.handle(event);

    CaseProjection projection = repository.get(event.payload().caseId());
    assertThat(projection.appliedEventCount("evt_123")).isEqualTo(1);
}
26.2 Out-of-Order Handling
@Test
void eventWithSequenceGapShouldBeQuarantined() {
    repository.saveProjection("case_123", 15);
    EventEnvelope<CaseApprovedPayload> event = caseApprovedEvent("evt_17", 17);

    consumer.handle(event);

    assertThat(quarantineRepository.exists("evt_17")).isTrue();
}
26.3 Unknown Event Type
@Test
void unknownEventTypeShouldBeIgnoredOnMultiTypeTopic() {
    EventEnvelope<Object> event = unknownEvent("NewFutureEvent");

    consumer.handle(event);

    assertThat(dlq.count()).isZero();
}
27. Topic Contract Template
topic: case-events
ownerTeam: case-management-platform
authority: case-service
purpose: Case lifecycle domain event stream.
messageTypes:
  - CaseSubmitted
  - CaseAssigned
  - CaseApproved
  - CaseClosed
key:
  field: metadata.aggregateId
  meaning: caseId
ordering:
  scope: per-case
  globalOrdering: false
  sequenceField: metadata.aggregateVersion
delivery:
  guarantee: at-least-once
  duplicates: possible
retention:
  policy: delete
  duration: P90D
replay:
  supported: true
  source: kafka-retention
  sideEffectConsumers: must-deduplicate-or-opt-out
schema:
  format: avro
  registry: apicurio
  compatibility: backward-transitive
dlq:
  topic: case-events-dlq
  preservesOriginalEnvelope: true
security:
  dataClassification: confidential
  pii: false
consumerGuidance:
  unknownEventTypes: ignore
  idempotencyKey: metadata.eventId
  offsetReset:
    projectionConsumers: earliest
    notificationConsumers: latest
28. Kafka Contract Review Checklist
28.1 Topic
- Is topic name stable and meaningful?
- Is owner defined?
- Is authority defined?
- Is topic lifecycle known?
- Are event types listed?
28.2 Key and Ordering
- Is message key documented?
- Does key match ordering requirement?
- Is aggregateVersion/sequence present?
- Is global ordering avoided unless actually provided?
- Is key change impact analyzed?
28.3 Retention and Replay
- Is retention policy documented?
- Is replay window known?
- Is compaction used?
- Are tombstones defined?
- Is archive needed beyond retention?
- Are side-effect consumers protected?
28.4 Consumer Handling
- Are duplicates possible?
- Must unknown event types be ignored?
- Is offset reset guidance documented?
- Is DLQ/quarantine contract clear?
- Are poison messages handled?
28.5 Schema and Compatibility
- Is schema format known?
- Is registry known?
- Is subject naming strategy known?
- Is compatibility mode declared?
- Are event examples valid?
28.6 Security and Governance
- Is classification defined?
- Is PII marker correct?
- Are ACL expectations clear?
- Is DLQ access controlled?
- Is consumer onboarding process defined?
29. Common Anti-Patterns
29.1 Null Key for Ordered Events
No per-aggregate ordering.
29.2 Event Type per Topic Explosion
Hundreds of topics with no lifecycle/governance.
29.3 One Giant Topic for Everything
All events, all domains, all security levels in one stream.
29.4 Assuming Global Order
Kafka does not provide global order across partitions.
29.5 Compacted Topic Used as Audit Log
Compaction removes old values eventually.
29.6 Consumer Crashes on Unknown Event Type
Adding event type breaks old consumers.
29.7 DLQ Without Original Context
Cannot redrive/debug.
29.8 Replay Causes External Side Effects
Duplicate emails, payments, notifications.
29.9 Producer Publishes Before Commit
Consumers see facts that roll back.
29.10 Changing Key Without Review
Ordering and partitioning contract breaks.
29.11 Offset Reset Accident
Consumer changes group ID and replays millions unexpectedly.
29.12 Schema Registry Compatibility Only
Schema check passes but topic key/order/retention changed.
30. Practice Lab
Lab 1 — Topic Design
Design topics for:
- customer lifecycle events;
- customer latest snapshot;
- case commands;
- payment authorization events;
- failed case event processing.
For each, specify key, retention, compaction, owner, and consumer guidance.
Lab 2 — Key Selection
Choose key for:
- CaseApproved;
- CustomerSnapshotUpdated;
- PaymentCaptured;
- FraudSignalObserved;
- PolicyRuleActivated.
Explain ordering and load trade-offs.
Lab 3 — Replay Policy
Consumer sends email on CustomerRegistered. Topic has 90-day retention. Design safe replay strategy.
Lab 4 — DLQ Contract
Design DLQ payload for failed CaseApproved consumer processing.
Lab 5 — Compaction Semantics
Topic customer-snapshots is compacted. Design tombstone handling and consumer bootstrap behavior.
Lab 6 — Ordering Failure
Events arrive:
CaseApproved version=5
CaseAssigned version=4
Design handling for projection consumer.
31. Senior Engineer Heuristics
- Kafka contract is much bigger than schema.
- Topic name is public API surface.
- Key choice is ordering, partitioning, and compaction policy.
- Kafka gives order within partition, not global order.
- Use aggregateVersion even when key ordering exists.
- At-least-once means duplicates are normal, not exceptional.
- Compacted topic is latest state, not history.
- Tombstone is a contract.
- Replay is dangerous for side-effect consumers.
- DLQ is a contract, not garbage storage.
- Unknown event types must not break multi-type topic consumers.
- Changing key is often a breaking change.
- Retention defines replay window.
- Consumer group ID is operational identity; choose it deliberately.
- Schema compatibility does not protect ordering, retention, or replay semantics.
32. Summary
Kafka event contract engineering requires thinking beyond payload schema. Topic, key, partitioning, ordering, retention, compaction, replay, DLQ, consumer groups, offset behavior, and security are all part of the real contract.
Main takeaways:
- topic is a contract boundary with owner and lifecycle;
- message key determines partitioning and often ordering;
- ordering is per partition, commonly per aggregate if keyed correctly;
- aggregateVersion helps detect gaps, duplicates, and out-of-order processing;
- delivery is usually at-least-once, so consumers must be idempotent;
- retention determines replay window;
- compaction retains latest per key and requires tombstone semantics;
- replay must distinguish projection consumers from side-effect consumers;
- DLQ/quarantine must preserve original event context;
- Kafka configuration changes can be contract changes;
- schema registry is necessary but not sufficient for Kafka governance.
The next part covers Avro contract engineering: schema resolution, defaults, union types, logical types, namespace strategy, and backward/forward/full compatibility.