Event Design and Versioning

Learn Java Kafka in Action - Part 013

Event design and versioning for Java Kafka systems: event envelope, business facts, causality, timestamps, identifiers, semantic evolution, regulatory-grade auditability, and safe topic/event lifecycle design.

2026-07-01 | 21 min read | 4038 words

Part 013 — Event Design and Versioning

Part 012 focused on schema contracts. This part moves one layer higher.

A schema can be syntactically compatible and still be a bad event. A Kafka event is not just a JSON/Avro/Protobuf payload moving through a topic. In a serious distributed system, an event becomes a long-lived record of a business fact, a coordination signal, an integration contract, a replay source, and sometimes an audit artifact.

The core question in this part:

How do we design Kafka events that remain correct, understandable, evolvable, and useful after the original producer team has changed, downstream consumers have multiplied, and the system has been replayed many times?

Kafka gives us durable ordered logs. It does not automatically give us good event semantics.

1. Kaufman Skill Decomposition

The skill is event modeling for distributed systems.

We deconstruct it into subskills:

Subskill	Production Meaning
Fact modeling	Distinguish facts from commands, requests, snapshots, and notifications.
Naming	Name events after business state changes, not implementation details.
Envelope design	Carry metadata required for tracing, causality, replay, audit, and routing.
Identity design	Use stable event IDs, aggregate IDs, correlation IDs, causation IDs, and idempotency keys.
Time semantics	Separate event time, processing time, ingestion time, and valid/business time.
Versioning	Evolve schema and meaning without breaking consumers.
Topic placement	Decide whether event type, aggregate type, bounded context, or lifecycle stage defines the topic.
Consumer empathy	Design events so consumers can act without hidden producer knowledge.
Replay safety	Make events deterministic enough to rebuild projections.
Governance	Use review rules, compatibility checks, and deprecation policy.

1.1 Target Capability

After this part, we want to be able to look at an event and answer:

What business fact happened?
Who owns the meaning of this fact?
What entity or aggregate does it belong to?
Is ordering needed, and along what key?
Can the event be replayed safely?
Can future consumers understand it without reading the producer code?
Can the event evolve without semantic traps?
Can the event support audit, debugging, and incident reconstruction?

That is the difference between producing messages and designing event contracts.

2. Mental Model: Event as Durable Business Evidence

A Kafka record has:

topic;
partition;
offset;
key;
value;
headers;
timestamp.

An event has more meaning:

something happened;
at a business-relevant time;
to a business entity or process;
under a causal chain;
observed and emitted by a responsible service;
with enough data for intended consumers;
under an explicit compatibility and retention policy.

A good event is not merely serializable. It is interpretable.

2.1 The Production Rule

A Kafka event should survive these changes:

the producer implementation is rewritten;
the consumer team changes;
a new downstream use case appears;
data is replayed months later;
schema evolves;
a bug investigation needs causality;
a regulator asks why a state transition happened.

If the event is only understandable with current code and tribal knowledge, it is not a production-grade event.

3. Event, Command, Message, Notification, and Snapshot

Many Kafka systems fail because teams call every record an event.

3.1 Event

An event is a statement of fact.

Examples:

OrderPlaced
QuoteApproved
PaymentCaptured
CustomerAddressChanged
FraudReviewEscalated
CaseAssigned
EntitlementRevoked

Good event names use past tense because the change already happened.

Good:    OrderPlaced
Weak:    PlaceOrder
Bad:     OrderMessage
Bad:     OrderData
Bad:     OrderUpdated

OrderUpdated is often too vague. It hides what changed. If different consumers care about different changes, vague update events force consumers to diff payloads or infer intent.

3.2 Command

A command expresses intent.

Examples:

PlaceOrder
ApproveQuote
CapturePayment
AssignCase

Commands can be sent through Kafka, but they have different semantics:

there is usually an intended handler;
failure matters to the sender;
duplicate handling must be explicit;
ordering and retry can change side effects;
authorization is usually important.

Kafka can carry commands, but Kafka topics are not a replacement for every synchronous workflow. A command topic should not pretend to be a fact log.

3.3 Notification

A notification says: “something may have changed; check the source.”

Example:

{
  "eventType": "CustomerChanged",
  "customerId": "cus-123"
}

This is lighter than an event-carried state transfer. It reduces payload duplication but increases coupling to a source API.

3.4 Snapshot

A snapshot carries current state at a point in time.

Examples:

CustomerSnapshotPublished
InventoryPositionSnapshot
AccountBalanceSnapshot

Snapshots are useful for bootstrapping and compacted topics, but they are not the same as business events. A snapshot tells what is true now. An event tells what changed.

3.5 Event-Carried State Transfer

This event includes enough changed state for consumers to update themselves without calling the producer.

Example:

{
  "eventType": "CustomerAddressChanged",
  "customerId": "cus-123",
  "newAddress": {
    "country": "ID",
    "city": "Jakarta",
    "postalCode": "12940"
  }
}

This reduces runtime coupling but creates a stronger data contract.
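
As a minimal sketch of what this means on the consumer side, the read model below is updated from the event contents alone. The `Address`, `CustomerAddressChanged`, and `CustomerAddressProjection` types are hypothetical and simply mirror the JSON example; deserialization and Kafka polling are left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical payload types mirroring the JSON example above.
record Address(String country, String city, String postalCode) {}

record CustomerAddressChanged(String customerId, Address newAddress) {}

// Consumer-side read model that is updated purely from event contents.
final class CustomerAddressProjection {
    private final Map<String, Address> addressByCustomer = new HashMap<>();

    // No call back to the producer: the event already carries the new state.
    void apply(CustomerAddressChanged event) {
        addressByCustomer.put(event.customerId(), event.newAddress());
    }

    // Returns null when no address event has been seen for this customer.
    Address currentAddress(String customerId) {
        return addressByCustomer.get(customerId);
    }
}
```

With a notification-style event, `apply` would instead have to call the source API, which is exactly the runtime coupling that state transfer avoids.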

4. Event Naming Rules

Event names should be stable, precise, and domain-owned.

4.1 Use Business Language

Prefer:

QuoteApproved
OrderSubmitted
InvoiceIssued
CaseEscalated
PolicySuspended

Avoid:

QuoteRowInserted
OrderDTOUpdated
InvoiceControllerCompleted
CaseStatusChangedToE2
PolicyTableChanged

Implementation-oriented names leak internals and become obsolete.

4.2 Prefer Specific Facts Over Generic Update Events

Generic events are tempting:

CustomerUpdated
OrderUpdated
CaseUpdated
ProductChanged

They look flexible but push complexity to consumers.

Specific events are better when the meaning differs:

CustomerEmailChanged
CustomerKycVerified
CustomerRiskTierChanged
OrderDeliveryAddressChanged
OrderCancelled
OrderFulfillmentStarted

4.3 When Generic Events Are Acceptable

Generic change events can be valid when:

the topic is a CDC stream;
consumers are data replication systems;
payload includes explicit changed fields;
consumers do not need business intent;
event name is intentionally technical.

Example:

CustomerRecordChanged

This is acceptable for CDC. It is weaker for domain integration.

5. Event Envelope Design

A production event usually needs an envelope.

The envelope should contain cross-cutting metadata. The payload should contain domain data.

5.1 Recommended Envelope Fields

Field	Meaning	Required?
`eventId`	Unique identity of this event occurrence.	Yes
`eventType`	Stable name such as `OrderPlaced`.	Yes
`eventVersion`	Semantic/schema version of event type.	Yes
`source`	Service or bounded context that emitted the event.	Yes
`aggregateType`	Entity/process type such as `Order`, `Quote`, `Case`.	Usually
`aggregateId`	Business identity used for ordering and correlation.	Usually
`correlationId`	End-to-end business request/process correlation.	Yes
`causationId`	Event/command/message that caused this event.	Recommended
`idempotencyKey`	Key for deduplication or safe side effects.	Recommended
`occurredAt`	When the business fact happened.	Yes
`publishedAt`	When the event was published to Kafka.	Recommended
`tenantId`	Tenant boundary in multi-tenant systems.	If multi-tenant
`traceId`	Distributed tracing identity.	Recommended
`data`	Domain payload.	Yes

5.2 Why `eventId` Matters

Without eventId, consumers struggle to deduplicate.

eventId should identify the occurrence, not merely the aggregate.

orderId = ord-123

OrderPlaced.eventId            = evt-001
OrderPaymentCaptured.eventId   = evt-002
OrderShipmentRequested.eventId = evt-003

If the producer retries the same send, it should ideally preserve the same eventId. If the business action is re-executed, it may create a new event. That distinction is important.
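
One way to get this behavior, sketched under the assumption of an outbox-style design, is to assign the eventId once when the fact is recorded and to reuse the stored entry on every send attempt. `OutboxEntry` and `RetryingPublisher` are hypothetical names, and the list stands in for what a real producer would send.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical outbox entry: the eventId is assigned once, when the fact is recorded.
record OutboxEntry(String eventId, String eventType, String aggregateId) {
    static OutboxEntry create(String eventType, String aggregateId) {
        return new OutboxEntry("evt-" + UUID.randomUUID(), eventType, aggregateId);
    }
}

// Stand-in for a publisher; a real one would hand the entry to a Kafka producer.
final class RetryingPublisher {
    final List<String> sentEventIds = new ArrayList<>();

    // Every attempt re-sends the same stored entry, so the eventId never changes.
    void publishWithRetries(OutboxEntry entry, int attempts) {
        for (int i = 0; i < attempts; i++) {
            sentEventIds.add(entry.eventId());
        }
    }
}
```

Re-executing the business action would call `OutboxEntry.create` again and therefore produce a different eventId, which matches the distinction drawn above.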

5.3 Why `correlationId` Matters

correlationId connects all records in one business journey.

Example:

HTTP request: POST /orders
correlationId: corr-789

OrderPlaced.correlationId = corr-789
PaymentAuthorized.correlationId = corr-789
InventoryReserved.correlationId = corr-789
FulfillmentStarted.correlationId = corr-789

This supports:

debugging;
tracing;
audit;
customer support;
incident reconstruction;
compliance reporting.

5.4 Why `causationId` Matters

causationId points to the immediate cause.

correlationId groups the journey. causationId explains the direct chain.
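
The propagation rule can be sketched as follows: a follow-up event copies the correlationId unchanged and sets its causationId to the eventId of the event that triggered it. `EventMeta` is a hypothetical, reduced version of the envelope, used here only to show the rule.

```java
import java.util.UUID;

// Minimal metadata subset; field names follow the envelope table above.
record EventMeta(String eventId, String eventType, String correlationId, String causationId) {

    // First event of a journey: caused by a command or request, not by another event.
    static EventMeta startOfJourney(String eventType, String correlationId, String commandId) {
        return new EventMeta(newId(), eventType, correlationId, commandId);
    }

    // A follow-up event keeps the journey's correlationId and points at its direct cause.
    EventMeta causes(String nextEventType) {
        return new EventMeta(newId(), nextEventType, correlationId, eventId);
    }

    private static String newId() {
        return "evt-" + UUID.randomUUID();
    }
}
```

For example, `placed.causes("PaymentAuthorized")` yields an event in the same journey whose causationId equals `placed.eventId()`.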

5.5 Why `occurredAt` and `publishedAt` Are Different

occurredAt is business time.

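publishedAt is publication time: when the event was published to Kafka. It can be stamped by the producer or taken from the broker's append time; pick one definition and document it.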

They differ when:

outbox publishing is delayed;
producer retries;
CDC connector lags;
offline device data arrives later;
batch job emits historical events;
replay republishes old events.

Never use Kafka ingestion time as a universal substitute for domain time.
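
Keeping both timestamps also makes the gap between them measurable. The sketch below computes that gap and flags events that were published long after they occurred; `PublishLag` and its tolerance parameter are illustrative assumptions, not part of any Kafka API.

```java
import java.time.Duration;
import java.time.Instant;

final class PublishLag {
    // How long the business fact waited before it was published.
    static Duration between(Instant occurredAt, Instant publishedAt) {
        return Duration.between(occurredAt, publishedAt);
    }

    // Hypothetical check: true when the event was published later than the tolerance allows,
    // as happens with delayed outbox publishing, CDC lag, or replayed history.
    static boolean isLate(Instant occurredAt, Instant publishedAt, Duration tolerance) {
        return between(occurredAt, publishedAt).compareTo(tolerance) > 0;
    }
}
```

A consumer that windows or bills by time would use occurredAt for the business decision and treat a large lag only as an operational signal.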

6. CloudEvents and Kafka Event Metadata

CloudEvents is a specification for describing event data in a common way across services, platforms, and systems. It defines standard attributes such as event ID, source, type, subject, time, content type, and data schema.

Kafka systems do not have to use CloudEvents, but the specification is useful as a vocabulary.

A CloudEvents-like envelope can reduce custom metadata drift across teams.

Example JSON shape:

{
  "specversion": "1.0",
  "id": "evt-01J2RF7Z6PTFZ7A5CXQ2J7G8HG",
  "source": "urn:service:order-service",
  "type": "com.example.order.OrderPlaced.v1",
  "subject": "order/ord-123",
  "time": "2026-07-01T09:31:02Z",
  "datacontenttype": "application/json",
  "dataschema": "https://schema.example.com/order-placed/v1",
  "correlationid": "corr-abc",
  "causationid": "cmd-xyz",
  "data": {
    "orderId": "ord-123",
    "customerId": "cus-456",
    "totalAmount": "250000.00",
    "currency": "IDR"
  }
}

6.1 Practical Rule

Use CloudEvents when you need cross-platform interoperability or organization-wide event consistency.

Use a custom envelope when your organization has deeper domain-specific needs, but do not invent names randomly. Align with common concepts where possible.
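
If a CloudEvents-like envelope is adopted, a small guard at the producer boundary can reject events that lack the context attributes CloudEvents 1.0 marks as required (`specversion`, `id`, `source`, `type`). The sketch below assumes attributes are available as a flat string map; `CloudEventAttributes` is a hypothetical helper, not part of the CloudEvents SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class CloudEventAttributes {
    // Context attributes that CloudEvents 1.0 marks as required.
    static final List<String> REQUIRED = List.of("specversion", "id", "source", "type");

    // Returns the names of required attributes that are absent or blank.
    static List<String> missingRequired(Map<String, String> attributes) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = attributes.get(name);
            if (value == null || value.isBlank()) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

Extension attributes such as `correlationid` and `causationid` from the example are optional from the specification's point of view, so an organization that depends on them would add them to its own required list.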

7. Kafka Key Design and Event Identity

Part 008 covered partitioning. Here we focus on semantic identity.

The Kafka record key should usually represent the ordering boundary.

For aggregate events:

key = aggregateId

Examples:

Event	Recommended Key	Reason
`OrderPlaced`	`orderId`	Order lifecycle must be ordered per order.
`OrderCancelled`	`orderId`	Must follow prior order events.
`CustomerEmailChanged`	`customerId`	Customer profile changes must be ordered per customer.
`InventoryReserved`	`skuId + warehouseId`	Inventory consistency is usually per item-location.
`CaseEscalated`	`caseId`	Case lifecycle ordering.

7.1 Key Is Not Always Event ID

eventId is unique per event occurrence.

key is usually stable per entity or ordering boundary.

record key    = ord-123
value.eventId = evt-999

If key = eventId, every event may scatter across partitions, destroying per-aggregate ordering.
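
The effect is easy to see with a toy partitioner. The sketch below uses `String.hashCode` as a stand-in hash; Kafka's default partitioner hashes the serialized key with a different function, so the partition numbers will not match a real cluster, but the property is the same: equal keys always land on the same partition.

```java
final class KeyToPartition {
    // Stand-in for the real partitioner: equal keys always map to the same partition.
    static int partitionFor(String key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }
}
```

Keying every event of `ord-123` by the order ID pins the whole lifecycle to one partition, while keying by eventId gives each event its own hash and lets the lifecycle spread across partitions.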

7.2 Key Should Match Consumer Invariants

Ask:

What must not be processed out of order?

That answer often gives the key.

8. Event Payload Design

A payload should be sufficient, minimal, and stable.

8.1 Sufficient

A consumer should not need hidden producer state for normal operation.

Weak:

{
  "orderId": "ord-123",
  "status": "APPROVED"
}

Better:

{
  "orderId": "ord-123",
  "approval": {
    "approvedAt": "2026-07-01T09:31:02Z",
    "approvedBy": "user-17",
    "approvalPolicy": "AUTO_LOW_RISK_V3",
    "riskScore": 21
  }
}

The better event explains why the state changed.

8.2 Minimal

Do not dump the full database row into every domain event unless the event is explicitly a snapshot or CDC record.

Bad domain event:

{
  "orderId": "ord-123",
  "customerId": "cus-456",
  "internalLockVersion": 14,
  "dbShard": "s3",
  "lastModifiedByBatchJob": false,
  "hibernateProxyClass": "...",
  "allFieldsFromOrderTable": "..."
}

Too much data expands contract surface and makes future changes expensive.

8.3 Stable

Avoid fields that expose implementation details:

database table names;
internal enum codes without domain meaning;
framework class names;
temporary workflow stage names;
internal retry attempt counters;
storage shard identifiers;
UI-specific labels.

8.4 Payload Should Not Require Diffing

If the business event is CustomerEmailChanged, include old and new values when useful:

{
  "customerId": "cus-123",
  "oldEmail": "old@example.com",
  "newEmail": "new@example.com",
  "changedBy": "user-7",
  "reason": "CUSTOMER_REQUEST"
}

Do not force consumers to reconstruct prior state just to understand what happened.

9. Event Versioning Strategy

There are two types of versioning:

Schema versioning — shape-level compatibility.
Semantic versioning — meaning-level compatibility.

Part 012 covered schema. Here we focus on meaning.

9.1 Semantic Versioning Is Harder Than Schema Versioning

This can be schema-compatible but semantically breaking:

field: status
old meaning: approval status
new meaning: fulfillment status

The field name and type stay the same. Consumers still deserialize. But their logic becomes wrong.

9.2 Compatible Event Changes

Usually safe:

add optional field with default;
add metadata field;
add enum value only if consumers tolerate unknown values;
add nested optional object;
document stronger constraints without invalidating existing data.
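
Tolerating unknown enum values is a consumer-side decision. A minimal sketch, assuming the enum arrives as a string on the wire: unknown values map to an explicit `UNKNOWN` constant instead of failing deserialization. `RiskTier` and its constants are hypothetical.

```java
enum RiskTier {
    LOW, MEDIUM, HIGH, UNKNOWN;

    // Maps wire values this consumer version does not know to UNKNOWN instead of throwing.
    static RiskTier fromWire(String wireValue) {
        if (wireValue == null) {
            return UNKNOWN;
        }
        try {
            return RiskTier.valueOf(wireValue);
        } catch (IllegalArgumentException e) {
            return UNKNOWN;
        }
    }
}
```

The consumer still has to decide what `UNKNOWN` means for its logic (skip, park, or alert); the point is that a producer adding `CRITICAL` later does not crash it.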

9.3 Dangerous Event Changes

Usually unsafe:

rename event type;
change event meaning;
change key semantics;
change timestamp meaning;
remove field still used by consumers;
turn optional field into required field;
change unit or currency semantics;
reuse enum value with new meaning;
collapse multiple precise events into one vague event;
split one event into multiple required events without migration.

9.4 Version in Event Type or Schema?

Common options:

Event type includes version:
com.example.order.OrderPlaced.v1
com.example.order.OrderPlaced.v2

Schema Registry handles version:
subject = orders-value
schema version = 7

Both can work, but they solve different problems.

Strategy	Good For	Risk
Schema Registry version only	Compatible structural evolution.	Semantic breaking changes can hide.
Event type version	Semantic breaking changes.	Too many event types if overused.
Topic version	Major migration or new lifecycle.	Operational complexity and duplicated streams.

9.5 Recommended Rule

Use schema evolution for compatible changes.

Use new event type or topic for semantic breaking changes.

Do not hide semantic breaking changes behind schema compatibility.
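
On the consumer side, this rule pairs naturally with explicit opt-in per versioned event type. The sketch below is one possible shape, with a hypothetical `VersionedDispatcher` that takes the payload as a raw string: a consumer that only registered `OrderPlaced.v1` does not silently process `OrderPlaced.v2`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

final class VersionedDispatcher {
    private final Map<String, Consumer<String>> handlers = new HashMap<>();

    // Registers a handler for one exact versioned type, e.g. "com.example.order.OrderPlaced.v1".
    void register(String eventType, Consumer<String> handler) {
        handlers.put(eventType, handler);
    }

    // Returns false for types this consumer has not opted into, so the caller can park or alert.
    boolean dispatch(String eventType, String payload) {
        Consumer<String> handler = handlers.get(eventType);
        if (handler == null) {
            return false;
        }
        handler.accept(payload);
        return true;
    }
}
```

What to do with an undispatched event (ignore, dead-letter, alert) is a policy choice that the sketch deliberately leaves to the caller.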

10. Event Lifecycle Model

Events need lifecycle management.

10.1 Proposed

Before publishing:

identify owner;
define event type;
define topic;
define key;
define envelope;
define payload;
define consumers if known;
define retention;
define compatibility mode;
define examples;
define PII classification.

10.2 Reviewed

Architecture review should check:

Is this event a fact?
Is the name precise?
Is the key correct?
Does payload expose internals?
Is replay safe?
Is schema compatibility defined?
Are time semantics explicit?
Is ownership clear?

10.3 Published

Once published, assume unknown consumers can appear. Even internal topics become organizational contracts quickly.

10.4 Deprecated

Deprecation requires:

replacement event;
migration guide;
consumer inventory;
deadline;
monitoring;
rollback plan.

10.5 Retired

Retirement can happen only when:

no active producers emit old event;
no active consumers require old event;
historical replay requirement is understood;
retention/audit obligations allow removal.

11. Topic Design and Event Design

Topic design shapes consumer coupling.

11.1 Topic Per Aggregate Type

Example:

order.events
customer.events
payment.events

Good when:

lifecycle events share ordering boundary;
consumers commonly need all changes for an aggregate;
event volume is manageable;
one bounded context owns the stream.

Risk:

topic becomes a mixed bag;
many consumers filter most records;
schema subject strategy becomes more complex.

11.2 Topic Per Event Type

Example:

order.placed
order.cancelled
payment.captured

Good when:

event types have very different consumers;
volume differs greatly;
retention differs;
security differs;
schema simplicity matters.

Risk:

topic explosion;
harder lifecycle reconstruction;
consumers subscribe to many topics.

11.3 Topic Per Bounded Context

Example:

sales.events
billing.events
fulfillment.events

Good when:

team ownership maps to bounded contexts;
consumers care about domain context;
governance is organized by domain.

Risk:

too broad if context is large;
consumers may receive unrelated events.

11.4 Topic Per Tenant Is Usually Dangerous

Avoid topic-per-tenant unless there are strong isolation, regulatory, or throughput reasons.

Topic-per-tenant can create:

operational explosion;
ACL sprawl;
consumer subscription complexity;
metadata overhead;
uneven partition distribution.

Often better:

key = tenantId + aggregateId
payload.tenantId = tenantId
ACL = service-level or prefixed topic-level

But strict regulated isolation may justify physical separation.
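
When tenant and aggregate IDs are combined into one record key, plain concatenation is ambiguous (`ab` + `c` and `a` + `bc` produce the same key). A minimal sketch of an unambiguous composite key follows; the `:` separator and the `TenantAggregateKey` type are assumptions chosen for illustration.

```java
// Hypothetical composite key: tenantId and aggregateId joined by a reserved separator.
record TenantAggregateKey(String tenantId, String aggregateId) {
    private static final char SEPARATOR = ':';

    TenantAggregateKey {
        // The separator is reserved in tenantId so the key can be split back unambiguously.
        if (tenantId.isEmpty() || tenantId.indexOf(SEPARATOR) >= 0) {
            throw new IllegalArgumentException("tenantId must be non-empty and must not contain ':'");
        }
    }

    String asRecordKey() {
        return tenantId + SEPARATOR + aggregateId;
    }

    // Splits on the first separator only, so aggregateId may itself contain ':'.
    static TenantAggregateKey parse(String recordKey) {
        int idx = recordKey.indexOf(SEPARATOR);
        if (idx < 0) {
            throw new IllegalArgumentException("not a tenant-scoped key: " + recordKey);
        }
        return new TenantAggregateKey(recordKey.substring(0, idx), recordKey.substring(idx + 1));
    }
}
```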

12. Event Design for Regulatory and Audit Systems

For enforcement lifecycle, case management, compliance, and regulatory systems, event design needs stronger guarantees.

12.1 Audit-Grade Event Requirements

An audit-grade event should answer:

what changed;
when it happened;
who or what caused it;
under which rule/policy/version;
which prior event or command caused it;
what data was known at decision time;
whether the event was automatic or manual;
whether the event is reversible;
which user/service had authority;
whether any correction event superseded it.

12.2 Example: Case Escalation Event

{
  "eventId": "evt-01J2S0...",
  "eventType": "CaseEscalated",
  "eventVersion": "1.0",
  "source": "case-management-service",
  "aggregateType": "Case",
  "aggregateId": "case-2026-00091",
  "correlationId": "corr-921",
  "causationId": "evt-previous",
  "occurredAt": "2026-07-01T10:20:00Z",
  "publishedAt": "2026-07-01T10:20:02Z",
  "data": {
    "caseId": "case-2026-00091",
    "fromLevel": "L1_REVIEW",
    "toLevel": "L2_ENFORCEMENT_REVIEW",
    "reasonCode": "SLA_BREACH",
    "ruleId": "case-escalation-policy",
    "ruleVersion": "2026.07.0",
    "actorType": "SYSTEM",
    "actorId": "sla-monitor-service",
    "decisionInputs": {
      "ageHours": 73,
      "slaThresholdHours": 72,
      "priority": "HIGH"
    }
  }
}

The event is not just status = ESCALATED. It preserves the decision context.

12.3 Corrections Instead of Mutation

Kafka logs are append-oriented. If a prior event was wrong, prefer a correction event.

Examples:

CaseEscalationCorrected
InvoiceIssuedCorrected
PaymentAllocationReversed
CustomerKycDecisionRevised

Do not pretend the old event never happened if auditability matters.
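
A minimal sketch of that idea: the trail below only appends, and a correction is simply a later event that references the one it supersedes. `CaseLevelEvent`, its `correctsEventId` field, and `CaseAuditTrail` are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical audit entry; correctsEventId is null for ordinary events.
record CaseLevelEvent(String eventId, String eventType, String level, String correctsEventId) {}

final class CaseAuditTrail {
    private final List<CaseLevelEvent> history = new ArrayList<>();

    // Append-only: nothing is ever updated or removed.
    void append(CaseLevelEvent event) {
        history.add(event);
    }

    // Current state comes from the latest event, which may be a correction.
    String currentLevel() {
        return history.isEmpty() ? null : history.get(history.size() - 1).level();
    }

    // True when a later event declares that it corrects the given event.
    boolean isSuperseded(String eventId) {
        return history.stream().anyMatch(e -> eventId.equals(e.correctsEventId()));
    }

    List<CaseLevelEvent> history() {
        return List.copyOf(history);
    }
}
```

The read model shows the corrected level, while the original escalation stays visible in `history()` for anyone reconstructing what the system believed at the time.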

13. Java Event Envelope Example

Below is a simple Java shape. In production, this might be generated from Avro/Protobuf or represented as a generic envelope plus typed payload.

import java.time.Instant;
import java.util.Map;
import java.util.Objects;

public record EventEnvelope<T>(
        String eventId,
        String eventType,
        String eventVersion,
        String source,
        String aggregateType,
        String aggregateId,
        String correlationId,
        String causationId,
        String idempotencyKey,
        Instant occurredAt,
        Instant publishedAt,
        String tenantId,
        String traceId,
        Map<String, String> attributes,
        T data
) {
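    public EventEnvelope {
        // Fields marked "Yes" in the envelope table must be present.
        Objects.requireNonNull(eventId, "eventId");
        Objects.requireNonNull(eventType, "eventType");
        Objects.requireNonNull(eventVersion, "eventVersion");
        Objects.requireNonNull(source, "source");
        Objects.requireNonNull(correlationId, "correlationId");
        Objects.requireNonNull(occurredAt, "occurredAt");
        Objects.requireNonNull(data, "data");
        // Defensive copy keeps the envelope immutable; null means no extra attributes.
        attributes = attributes == null ? Map.of() : Map.copyOf(attributes);
    }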
}

Payload example:

import java.math.BigDecimal;
import java.time.Instant;

public record OrderPlaced(
        String orderId,
        String customerId,
        BigDecimal totalAmount,
        String currency,
        Instant placedAt
) {}

Producer sketch:

EventEnvelope<OrderPlaced> event = new EventEnvelope<>(
        eventId,
        "OrderPlaced",
        "1.0",
        "order-service",
        "Order",
        orderId,
        correlationId,
        commandId,
        commandId,
        placedAt,
        Instant.now(),
        tenantId,
        traceId,
        Map.of("channel", "api"),
        new OrderPlaced(orderId, customerId, totalAmount, "IDR", placedAt)
);

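// Requires org.apache.kafka.clients.producer.ProducerRecord and an already configured
// KafkaProducer<String, EventEnvelope<OrderPlaced>> named producer.
ProducerRecord<String, EventEnvelope<OrderPlaced>> record =
        new ProducerRecord<>("order.events", orderId, event);

// send() is asynchronous; handle failures in the callback instead of ignoring them.
producer.send(record, (metadata, exception) -> {
    if (exception != null) {
        // Report the eventId so a later re-send can reuse the same event.
        System.err.println("Publish failed for eventId=" + event.eventId() + ": " + exception);
    }
});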

Key point:

Kafka key = orderId
Envelope eventId = unique event occurrence
Envelope idempotencyKey = stable command/business action identity

14. Header vs Payload Metadata

Kafka headers are useful but not always visible in downstream tooling.

14.1 Good Header Uses

Headers are good for:

trace propagation;
content type;
schema metadata;
lightweight routing hints;
producer library metadata;
compression/encryption markers;
framework integration.

14.2 Good Payload Metadata Uses

Payload envelope is better for:

event ID;
event type;
business time;
correlation ID;
causation ID;
aggregate ID;
tenant ID;
audit attributes.

Why? Because payload is usually persisted, indexed, replayed, transformed, and inspected more consistently than headers.

14.3 Practical Rule

If losing the metadata would break audit, replay, or business interpretation, put it in the envelope payload, not only in Kafka headers.

15. Time Semantics

Time is one of the most common sources of subtle bugs.

15.1 Types of Time

Time	Meaning	Example
Event time	When the business fact happened.	Customer placed order at 10:01.
Publish time	When producer emitted to Kafka.	Producer sent at 10:02.
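Record timestamp	Timestamp stored on the Kafka record: set by the producer (CreateTime) or by the broker (LogAppendTime), depending on topic configuration.	Broker appended at 10:02:01.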
Processing time	When consumer processed it.	Projection updated at 10:03.
Valid time	When fact is valid in business domain.	Policy effective from Aug 1.
Transaction time	When system recorded the fact.	DB committed at 10:01:59.

15.2 Example

A regulatory policy change is approved today but effective next month.

{
  "eventType": "PolicyRuleApproved",
  "occurredAt": "2026-07-01T10:00:00Z",
  "publishedAt": "2026-07-01T10:00:02Z",
  "data": {
    "ruleId": "late-filing-penalty-v4",
    "approvedAt": "2026-07-01T10:00:00Z",
    "effectiveFrom": "2026-08-01T00:00:00Z"
  }
}

Consumers must not confuse approval time with effective time.
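
In code, the distinction comes down to which field a consumer compares against. The sketch below models the payload from the example as a hypothetical `PolicyRuleApproved` record and answers "does this rule apply at a given business time?" using `effectiveFrom`, never `approvedAt`.

```java
import java.time.Instant;

// Hypothetical payload type mirroring the JSON example above.
record PolicyRuleApproved(String ruleId, Instant approvedAt, Instant effectiveFrom) {

    // The rule applies from effectiveFrom (valid time), not from approvedAt (event time).
    boolean isEffectiveAt(Instant businessTime) {
        return !businessTime.isBefore(effectiveFrom);
    }
}
```

A filing dated mid-July would therefore not be subject to this rule, even though the approval event had already been published by then.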

16. Event Ordering and Causality

Kafka orders records per partition. It does not globally order a distributed business process.

16.1 Per-Aggregate Ordering

For order lifecycle:

key = orderId

Expected sequence:

OrderPlaced -> PaymentAuthorized -> OrderConfirmed -> FulfillmentStarted

But payment may be owned by another service. Its topic may use paymentId or orderId depending on design. Cross-topic ordering is not guaranteed.

16.2 Causality Is Not the Same as Offset Order

Offset order tells:

record A appeared before record B in one partition

Causality tells:

record B happened because of record A

Use causationId for causality, not offset arithmetic.
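
Given stored events, a causal chain can be rebuilt by following causationId links backwards, regardless of which topic or partition each record lived in. The sketch below assumes the links have already been loaded into a map from eventId to causationId; `CausalChain` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CausalChain {
    // causationById maps an event or command ID to the ID that caused it (absent for a root).
    // Returns the chain from the root cause down to the given event.
    static List<String> trace(String eventId, Map<String, String> causationById) {
        List<String> chain = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String current = eventId;
        // The seen set guards against accidental cycles in bad data.
        while (current != null && seen.add(current)) {
            chain.add(current);
            current = causationById.get(current);
        }
        Collections.reverse(chain);
        return chain;
    }
}
```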

16.3 State Machine Consumers

For lifecycle consumers, enforce allowed transitions.

Consumer rule:

Never trust event arrival order alone. Validate transition invariants.
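
A minimal sketch of such a check for the order lifecycle shown earlier: the consumer keeps the current state per order and rejects any event that is not an allowed transition from it. The state names and the `OrderLifecycle` class are hypothetical.

```java
import java.util.Map;

final class OrderLifecycle {
    enum State { NEW, PLACED, PAYMENT_AUTHORIZED, CONFIRMED, FULFILLMENT_STARTED }

    // Allowed transitions: current state -> event type -> next state.
    private static final Map<State, Map<String, State>> TRANSITIONS = Map.of(
            State.NEW, Map.of("OrderPlaced", State.PLACED),
            State.PLACED, Map.of("PaymentAuthorized", State.PAYMENT_AUTHORIZED),
            State.PAYMENT_AUTHORIZED, Map.of("OrderConfirmed", State.CONFIRMED),
            State.CONFIRMED, Map.of("FulfillmentStarted", State.FULFILLMENT_STARTED));

    // Returns the next state, or throws when the event is not valid in the current state.
    static State apply(State current, String eventType) {
        Map<String, State> allowed = TRANSITIONS.getOrDefault(current, Map.of());
        State next = allowed.get(eventType);
        if (next == null) {
            throw new IllegalStateException(eventType + " is not allowed in state " + current);
        }
        return next;
    }
}
```

Whether a rejected event is parked for later, retried, or sent to a dead-letter topic depends on the consumer; the sketch only makes the violation visible instead of applying it silently.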

17. Idempotency and Event Design

Idempotency is easier when the event carries the right identity.

17.1 Consumer Dedup Table

CREATE TABLE processed_event (
    consumer_name VARCHAR(120) NOT NULL,
    event_id VARCHAR(120) NOT NULL,
    processed_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (consumer_name, event_id)
);

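Consumer flow:

1. Begin transaction.
2. Insert (consumer_name, event_id), for example with ON CONFLICT DO NOTHING so a duplicate does not abort the transaction.
3. If no row was inserted, the event was already processed: skip step 4.
4. Apply side effect/projection in the same transaction.
5. Commit transaction.
6. Commit Kafka offset.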
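
The core of that flow can be sketched without a database. In the example below an in-memory set stands in for the processed_event table and a counter stands in for the side effect; `DedupingProjection` is a hypothetical name, and a real consumer would do both steps inside one database transaction.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class DedupingProjection {
    private final String consumerName;
    // In-memory stand-in for the processed_event table, keyed by (consumer_name, event_id).
    private final Set<String> processedEvents = ConcurrentHashMap.newKeySet();
    private int appliedCount = 0;

    DedupingProjection(String consumerName) {
        this.consumerName = consumerName;
    }

    // Returns true when the event was applied, false when it was a duplicate and skipped.
    boolean handle(String eventId) {
        boolean firstTime = processedEvents.add(consumerName + "|" + eventId);
        if (!firstTime) {
            return false;
        }
        appliedCount++; // stand-in for the real side effect or projection update
        return true;
    }

    int appliedCount() {
        return appliedCount;
    }
}
```

Because the offset is committed only after the transaction, a crash between the two steps leads to redelivery, and the dedup check is what turns that redelivery into a no-op.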

17.2 Idempotency Key vs Event ID

eventId identifies the event occurrence.

idempotencyKey identifies the business action that should not be applied twice.

Example:

Command: Approve quote commandId=cmd-123
Event: QuoteApproved eventId=evt-456 idempotencyKey=cmd-123

If the producer accidentally emits two different event IDs for the same command, consumers can still deduplicate by idempotency key when appropriate.

18. Event Design Patterns

18.1 Domain Event

Represents a business fact.

OrderPlaced
InvoiceIssued
CaseEscalated

Use for service integration and business process propagation.

18.2 Integration Event

Externalized event intended for other bounded contexts.

It may be derived from internal domain events and cleaned for public consumption.

This protects internal model evolution.

18.3 CDC Event

Represents database row-level change.

Good for:

replication;
search indexing;
analytics;
migration;
outbox publication.

Less ideal as a domain event because it may not express business intent.

18.4 State Snapshot Event

Represents full current state.

Good for compacted topics and cache warming.

18.5 Correction Event

Represents a correction to earlier facts.

Essential when audit matters.

18.6 Tombstone Event

In Kafka compacted topics, a record with key and null value can act as a tombstone for compaction. This is a storage/log semantics pattern and must be designed carefully because consumers need to understand deletion semantics.
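
A consumer that materializes a compacted topic has to treat a null value as a delete. The sketch below shows that rule with string keys and values; `CompactedView` is a hypothetical name, and record polling and deserialization are omitted.

```java
import java.util.HashMap;
import java.util.Map;

final class CompactedView {
    private final Map<String, String> latestByKey = new HashMap<>();

    // A null value is a tombstone: the key is removed from the view, not stored as null.
    void apply(String key, String value) {
        if (value == null) {
            latestByKey.remove(key);
        } else {
            latestByKey.put(key, value);
        }
    }

    boolean contains(String key) {
        return latestByKey.containsKey(key);
    }

    String get(String key) {
        return latestByKey.get(key);
    }
}
```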

19. Event Design Anti-Patterns

19.1 The DTO Dump

Producer serializes API response or database entity directly.

Problem:

exposes internals;
breaks consumers when API changes;
includes irrelevant fields;
couples event model to persistence or UI.

19.2 The Vague Update

EntityUpdated

Problem:

consumers infer intent;
audit is weak;
replay logic becomes conditional and fragile.

19.3 The Command Disguised as Event

SendEmailEvent

This is likely a command: SendEmail.

Problem:

unclear ownership;
unclear success/failure semantics;
retries may cause duplicate side effects.

19.4 The Global Event

business.events

Everything goes into one mega-topic.

Problem:

schema chaos;
retention conflict;
security conflict;
consumer filtering overhead;
ownership ambiguity.

19.5 The Meaning Mutation

Same event name, same field names, changed meaning.

Problem:

compatibility tooling cannot detect it;
consumers silently become wrong.

19.6 The Hidden Join Requirement

Event contains only IDs, but every consumer must call producer APIs to understand it.

Problem:

runtime coupling;
API fan-out;
inconsistent reads;
replay becomes slow or impossible.

20. Design Review Checklist

Before approving a new event, ask:

20.1 Semantics

Is this a fact, command, notification, or snapshot?
Is the event name past tense and business meaningful?
Is the event precise enough?
Does it represent a stable concept?

20.2 Ownership

Which bounded context owns the event?
Who approves schema changes?
Who handles deprecation?
Who supports consumers during incidents?

20.3 Kafka Placement

Which topic?
Which key?
How many partitions?
Is ordering boundary explicit?
Is retention appropriate?
Is compaction appropriate?

20.4 Contract

Is schema compatibility mode defined?
Are examples included?
Are enum evolution rules documented?
Are nullable fields intentional?
Are units and currency explicit?

20.5 Operations

Can event be replayed?
Can consumer deduplicate?
Can we trace causality?
Can we diagnose producer bugs?
Can we migrate to v2 safely?

20.6 Compliance

Does it contain PII?
Does it contain secrets?
Does retention violate policy?
Does it need encryption or tokenization?
Does it preserve decision context?

21. ADR Template for Event Design

# ADR: Publish <EventName>

## Status
Proposed | Accepted | Deprecated | Retired

## Context
What business state change needs to be communicated?

## Event Classification
Domain event | Integration event | CDC event | Snapshot | Command | Notification

## Event Name
<EventName>

## Owner
<service/team/bounded context>

## Topic
<topic-name>

## Key
<key expression>

## Ordering Requirement
What must be ordered and why?

## Envelope
Required metadata fields.

## Payload
Fields, meaning, units, examples.

## Schema Strategy
Format, subject, compatibility mode.

## Versioning Strategy
Compatible evolution rules and breaking-change plan.

## Consumers
Known consumers and expected use.

## Retention and Replay
Retention period, compaction, replay safety.

## Security and Privacy
PII, secrets, tenant boundary, ACL.

## Audit Requirement
Decision context and causality fields.

## Alternatives Considered
API call, command topic, CDC, batch file, database read.

22. Worked Example: Bad to Better Event

22.1 Initial Event

{
  "eventType": "OrderUpdated",
  "orderId": "ord-123",
  "status": "APPROVED"
}

Problems:

vague name;
unclear status dimension;
no event ID;
no causality;
no business time;
no reason;
weak audit value;
hard to version semantically.

22.2 Better Event

{
  "eventId": "evt-01J2S4TRZ6T8H0JX0FZP9M7E8R",
  "eventType": "OrderApproved",
  "eventVersion": "1.0",
  "source": "order-service",
  "aggregateType": "Order",
  "aggregateId": "ord-123",
  "correlationId": "corr-abc",
  "causationId": "cmd-approve-789",
  "idempotencyKey": "cmd-approve-789",
  "occurredAt": "2026-07-01T10:10:00Z",
  "publishedAt": "2026-07-01T10:10:01Z",
  "tenantId": "tenant-a",
  "data": {
    "orderId": "ord-123",
    "approvalType": "AUTO",
    "approvedBy": "risk-policy-engine",
    "policyId": "order-approval-policy",
    "policyVersion": "2026.07.0",
    "riskScore": 18,
    "currency": "IDR",
    "totalAmount": "250000.00"
  }
}

This event is longer, but it is more useful and safer.

22.3 When Shorter Is Better

Not every event needs everything. A high-volume telemetry event may intentionally use a compact schema.

The rule is not “always verbose”. The rule is:

Include enough information for the event's operational, business, audit, and replay purpose.

23. Practice Lab

23.1 Lab: Redesign Vague Events

Given these events:

CustomerUpdated
PaymentChanged
CaseStatusUpdated
QuoteModified
OrderProcessed

For each, produce:

more precise event names;
aggregate key;
envelope fields;
required payload fields;
versioning concern;
replay concern.

23.2 Lab: Design Event for Case Escalation

Design a CaseEscalated event for a regulatory case system.

Requirements:

support audit;
preserve rule version;
support manual and automatic escalation;
allow replay into a read model;
support deduplication;
avoid PII leakage where not needed.

Deliverables:

event name;
topic;
key;
envelope;
payload;
schema compatibility mode;
sample JSON;
ADR.

23.3 Lab: Breaking Change Detection

Review this proposed change:

OrderApproved.v1:
  approvedBy = user ID or service ID

OrderApproved.v2:
  approvedBy = display name

Question:

Is this schema-compatible?
Is it semantically compatible?
How should it be changed safely?

Expected answer:

It may be schema-compatible if the field remains string.
It is semantically breaking because identity semantics changed.
Keep approvedBy as the actor ID and add a new optional field such as approvedByDisplayName, or publish a new event version if the meaning of approvedBy must change.

24. Production Readiness Rubric

Level	Event Design Capability
L1	Can produce and consume typed events.
L2	Can distinguish event, command, snapshot, and notification.
L3	Can design envelope, key, time, and versioning strategy.
L4	Can review event contracts for compatibility, replay, audit, privacy, and ownership.
L5	Can govern an organization-wide event model across teams and lifecycles.

Senior Kafka engineers operate at L4/L5. They ask not only “does it serialize?” but also “will this event still be correct under replay, scale, audit, and evolution?”

25. Key Takeaways

A Kafka event is a durable statement of fact, not merely a message payload.
Good event names are precise, past tense, and domain-owned.
Event schema compatibility is not enough; semantic compatibility matters more.
Use stable identifiers: eventId, aggregateId, correlationId, causationId, and idempotency keys.
Keep the Kafka key aligned with the ordering boundary, for example the aggregate ID.
Separate occurredAt, publishedAt, processing time, and effective/valid time.
Do not mutate event meaning under the same name.
In audit-grade systems, preserve decision context, actor, rule version, and causality.
Treat event lifecycle as a governed product: proposed, reviewed, published, deprecated, retired.
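
The lifecycle in the last takeaway can be encoded so that tooling rejects skipped steps. This is a minimal sketch assuming a strictly forward, one-step-at-a-time progression, which is an assumption of the example rather than something the lesson mandates; the EventLifecycle enum and canMoveTo method are hypothetical names.

```java
// Lifecycle states named in the takeaway above.
enum EventLifecycle {
    PROPOSED, REVIEWED, PUBLISHED, DEPRECATED, RETIRED;

    // Assumption for this sketch: states only advance one step at a time.
    boolean canMoveTo(EventLifecycle next) {
        return next.ordinal() == this.ordinal() + 1;
    }
}

final class EventLifecycleDemo {
    public static void main(String[] args) {
        System.out.println(EventLifecycle.PUBLISHED.canMoveTo(EventLifecycle.DEPRECATED)); // true
        System.out.println(EventLifecycle.PUBLISHED.canMoveTo(EventLifecycle.RETIRED));    // false
    }
}
```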

26. References

Apache Kafka Documentation — https://kafka.apache.org/documentation/
Apache Kafka Design — https://kafka.apache.org/42/design/design/
Confluent Schema Evolution and Compatibility — https://docs.confluent.io/platform/current/schema-registry/fundamentals/schema-evolution.html
CloudEvents Specification — https://github.com/cloudevents/spec
CloudEvents Project — https://cloudevents.io/

