Build CoreOrdered learning track

Learn Java Messaging Event Streaming Part 016 Kafka Core Architecture

[]17 min read3310 words

In This Lesson

Tujuan Bagian Ini 1. Mental Model Utama 2. Vocabulary Dasar

Lesson 1635 lesson track07–19 Build Core

title: Learn Java Messaging and Event Streaming - Part 016 description: Apache Kafka core architecture: broker, topic, partition, record batch, offset, log segment, retention, replica, ISR, leader/follower, controller, producer path, and consumer path. series: learn-java-messaging-event-streaming seriesTitle: Learn Java Messaging and Event Streaming order: 16 partTitle: Kafka Core Architecture tags:

java
kafka
apache-kafka
broker
topic
partition
offset
replication
distributed-log
event-streaming date: 2026-06-28

Part 016 — Kafka Core Architecture: Broker, Topic, Partition, Segment, Offset, Replica

Tujuan Bagian Ini

Bagian ini membuka blok Kafka. Kita tidak mulai dari Producer API atau Consumer API dulu. Kita mulai dari arsitektur inti, karena hampir semua tuning Kafka adalah konsekuensi dari model ini:

Kafka is a distributed, replicated, partitioned, append-only log.

Setelah bagian ini, kamu harus bisa:

Menjelaskan Kafka sebagai distributed log, bukan sekadar message broker.
Memahami topic, partition, offset, segment, replica, leader, follower, ISR, controller, dan consumer group.
Menjelaskan path producer dari record sampai durable/committed message.
Menjelaskan path consumer dari fetch request sampai offset commit.
Membaca failure mode Kafka dari struktur log dan replication-nya.
Mendesain partition count, key, retention, dan replication factor sebagai keputusan arsitektur.
Menghindari anti-pattern: topic-per-entity, partition explosion, global ordering illusion, dan retention tanpa disk budget.

1. Mental Model Utama

Kafka bukan queue tradisional yang menghapus message saat consumer selesai. Kafka menyimpan record di log partition, lalu consumer menyimpan posisi baca dalam bentuk offset.

Invariants:

A topic is split into partitions.
A partition is an ordered append-only log.
An offset is a position inside one partition.
Ordering is guaranteed within a partition, not across partitions.
Consumers track their own progress.
Kafka retention is independent of whether a consumer has read the record.

Kalau kamu mengingat hanya satu hal:

Kafka scales by partitioning the log.

Bukan dengan membuat satu queue lebih cepat, tetapi dengan membagi satu topic menjadi banyak ordered sub-logs.

2. Vocabulary Dasar

Istilah	Arti	Design Consequence
Broker	server Kafka yang menyimpan partition dan melayani client	kapasitas cluster berasal dari gabungan broker
Topic	logical stream/category record	unit contract dan retention
Partition	ordered log shard dari topic	unit parallelism, ordering, dan replication
Record	key/value/header/timestamp yang dipublish	unit data aplikasi
Offset	posisi record di partition	consumer progress dan replay position
Segment	file log fisik untuk bagian partition	retention dan cleanup bekerja per segment
Leader	replica partition yang menerima write	producer menulis ke leader
Follower	replica yang menyalin leader	durability/failover
ISR	in-sync replicas	basis committed record dan durability
Controller	node/role yang mengelola metadata, broker liveness, partition leadership	control plane cluster
Consumer Group	kumpulan consumer yang membagi partition assignment	scale-out consumption
Group Coordinator	broker yang mengelola group membership dan offset commit	rebalance dan offset tracking

3. Topic Bukan Queue

Queue klasik:

message delivered -> acknowledged -> removed or hidden from others

Kafka topic:

record appended -> retained by time/size/compaction -> many consumer groups read independently

Konsekuensi:

consumer group A dan B bisa membaca data yang sama tanpa saling mengganggu;
replay adalah fitur natural;
message yang sudah dibaca tidak otomatis hilang;
disk planning wajib;
retention menjadi bagian dari contract.

Anti-pattern:

Treating Kafka topic like a task queue where message lifetime follows worker ack.

Kafka bisa dipakai untuk work distribution, tetapi mental modelnya tetap log + offset, bukan deletion-on-consume queue.

4. Partition: Unit Ordering, Parallelism, dan Storage

Partition adalah unit paling penting di Kafka.

Topic case-events
  partition 0: offset 0, 1, 2, 3, ...
  partition 1: offset 0, 1, 2, 3, ...
  partition 2: offset 0, 1, 2, 3, ...

Setiap partition punya offset sequence sendiri.

4.1 Ordering Hanya per Partition

Jika record dengan key sama selalu masuk partition yang sama, order record untuk key itu bisa dipertahankan.

Contoh:

key = caseId

Maka event untuk CASE-123 akan masuk partition yang sama, sehingga urutan lifecycle case bisa dijaga.

CASE-123 OPENED
CASE-123 ASSIGNED
CASE-123 ESCALATED
CASE-123 CLOSED

Tapi order antar case berbeda tidak dijamin secara global.

CASE-123 offset 10 in partition 0
CASE-999 offset 8 in partition 1

Tidak ada meaningful global order antar partition hanya dari offset.

4.2 Partition Count Adalah Keputusan Arsitektur

Partition count menentukan:

parallelism producer write;
parallelism consumer group;
jumlah file/log handle;
replication workload;
rebalance cost;
controller metadata size;
recovery time;
maximum number of active consumers in one group for that topic.

Rule:

More partitions increase parallelism and operational overhead.
Fewer partitions simplify operations but cap throughput and consumer scaling.

Jangan membuat 1,000 partition karena “biar scalable”. Jangan membuat 1 partition karena “ordering gampang” jika throughput dan replay SLA tidak akan cukup.

5. Offset: Posisi, Bukan ID Bisnis

Offset adalah posisi record dalam satu partition. Offset bukan event ID, bukan transaction ID, bukan business sequence global.

(topic=case-events, partition=2, offset=99182)

Ini cukup untuk menemukan record secara teknis, tetapi tidak cukup sebagai identity bisnis.

Gunakan event envelope:

{
  "eventId": "evt-2026-000042",
  "caseId": "CASE-123",
  "eventType": "CASE_ESCALATED",
  "eventVersion": 7,
  "occurredAt": "2026-06-28T04:15:00Z"
}

Lalu Kafka coordinate:

topic=case-events
partition=4
offset=120998

Keduanya punya fungsi berbeda:

Value	Dipakai Untuk
eventId	idempotency, audit, trace
caseId	partition key, aggregate affinity
eventVersion	aggregate ordering/business concurrency
topic/partition/offset	replay, debugging, consumer checkpoint

Anti-pattern:

Menggunakan offset sebagai business event id.

Offset bisa berubah jika event di-republish ke topic lain, cluster lain, atau hasil compacted/reprocessed pipeline.

6. Record Anatomy

Record Kafka biasanya punya:

key
value
headers
timestamp
topic
partition
offset

6.1 Key

Key menentukan partition jika partitioner menggunakan hash key.

Contoh:

ProducerRecord<String, byte[]> record = new ProducerRecord<>(
    "case-events",
    caseId,        // key
    payloadBytes   // value
);

Key sebaiknya dipilih berdasarkan ordering dan locality requirement.

Common key choices:

Key	Cocok Untuk	Risiko
caseId	lifecycle per case	hot case jika satu case sangat aktif
customerId	customer-level timeline	high-cardinality, possible hot customer
region	regional ordering	hot region, low parallelism
random/null	max distribution	no per-entity ordering
composite key	custom locality	migration dan compatibility lebih sulit

6.2 Value

Value adalah payload. Gunakan schema discipline. Ini akan dibahas khusus di Part 022.

6.3 Headers

Headers berguna untuk metadata ringan:

correlation-id
causation-id
traceparent
schema-id
event-type
producer-service

Jangan taruh payload besar di headers. Jangan taruh PII di headers kecuali governance mengizinkan.

6.4 Timestamp

Timestamp dapat merepresentasikan create time atau log append time tergantung konfigurasi. Jangan asal pakai timestamp Kafka sebagai business occurred time tanpa contract.

7. Log Segment: Bentuk Fisik dari Partition

Partition log tidak disimpan sebagai satu file tak terbatas. Ia dipecah menjadi segment files.

case-events-0/
  00000000000000000000.log
  00000000000000000000.index
  00000000000000000000.timeindex
  00000000000000100000.log
  00000000000000100000.index
  00000000000000100000.timeindex

Segment penting karena:

retention delete bekerja per segment;
compaction bekerja terhadap log data;
file rolling memengaruhi cleanup timing;
index membantu lookup offset;
disk footprint dihitung dari semua segment replica.

7.1 Retention Tidak Presisi Per Record

Jika retention time 7 hari, bukan berarti setiap record dihapus tepat pada umur 7 hari. Karena cleanup bekerja per segment, record bisa bertahan lebih lama tergantung segment boundary dan cleanup schedule.

Rule:

Retention is a storage policy, not a precise per-record deletion timer.

Untuk compliance deletion/PII, jangan mengandalkan topic retention saja tanpa governance desain yang benar.

8. Broker dan Cluster

Broker menyimpan partition replicas dan melayani request producer/consumer/admin.

Cluster capacity bukan hanya jumlah broker. Capacity dipengaruhi:

leader distribution;
replica distribution;
disk throughput;
network throughput;
CPU untuk compression/TLS/requests;
page cache;
partition count;
request batching;
consumer lag/replay pattern.

Broker yang punya terlalu banyak partition leader bisa menjadi hotspot.

9. Controller dan Metadata

Kafka butuh control plane yang tahu:

broker mana yang hidup;
topic dan partition apa yang ada;
replica assignment;
partition leader;
ISR;
cluster metadata changes.

Di Kafka modern, metadata dikelola dengan KRaft mode, bukan bergantung pada ZooKeeper untuk cluster baru. Untuk engineer aplikasi, detail internal controller tidak selalu muncul di code, tetapi sangat penting saat incident.

Gejala control-plane problem:

metadata request lambat;
leader election sering;
topic creation/deletion lambat;
partition reassignment stuck;
broker flapping;
consumer rebalance ikut terganggu.

Mental model:

Data plane moves records.
Control plane decides where partitions live and who leads them.

10. Replication: Leader, Follower, ISR

Setiap partition punya replicas. Satu replica menjadi leader. Producer menulis ke leader. Follower menyalin log dari leader.

Untuk replication factor 3:

1 leader + 2 followers = 3 replicas

10.1 ISR

ISR berarti in-sync replicas: replica yang dianggap cukup up-to-date terhadap leader.

Committed record berkaitan dengan ISR dan producer acknowledgment.

Important invariant:

Kafka durability is about replicated log state, not simply “broker accepted write”.

Jika producer memakai durability setting lemah, producer bisa mendapat latency rendah tetapi risiko loss lebih besar saat leader failure.

10.2 acks dan min.insync.replicas

Producer setting acks menentukan kapan producer menganggap write sukses:

acks	Makna Sederhana	Risiko
0	producer tidak menunggu broker ack	loss tidak terdeteksi
1	leader ack setelah menerima write	loss mungkin jika leader gagal sebelum replica catch up
all / -1	tunggu ISR sesuai aturan	latency lebih tinggi, durability lebih kuat

Topic/broker setting min.insync.replicas menentukan minimal ISR yang diperlukan untuk menerima write dengan acks=all.

Common production baseline untuk critical event:

replication.factor = 3
min.insync.replicas = 2
producer acks = all
producer enable.idempotence = true

Detail producer reliability akan dibahas di Part 017 dan Part 020.

11. Producer Write Path

Producer write path simplified:

Important details:

producer tidak mengirim ke “topic” secara abstrak; producer mengirim ke leader broker untuk partition tertentu;
metadata digunakan untuk mengetahui leader partition;
key menentukan partition;
producer melakukan batching;
compression bisa terjadi pada batch;
broker append ke log;
replication menentukan committed visibility/durability.

11.1 Metadata Matters

Producer perlu metadata cluster:

topic -> partitions -> leader broker

Jika metadata stale:

producer mendapat NotLeaderOrFollower/leader-related error;
refresh metadata;
retry;
latency spike.

Ini normal saat leader election atau broker rolling restart.

12. Consumer Read Path

Consumer read path simplified:

Important details:

consumer pulls data;
fetch request menyertakan offset;
consumer bisa rewind/seek;
group coordinator mengelola membership dan offset commit;
offset commit biasanya menyimpan “next offset to read”, bukan last processed offset.

12.1 Pull Model

Kafka consumer pull model membuat consumer bisa:

mengontrol rate;
membaca batch besar;
tertinggal dan catch up;
replay dari offset lama;
menghindari broker push yang membanjiri consumer lambat.

Tapi pull model juga berarti:

poll loop harus sehat;
processing tidak boleh blocking terlalu lama;
commit strategy harus benar;
lag harus dimonitor.

13. Consumer Group dan Partition Assignment

Dalam satu consumer group, satu partition hanya diproses oleh satu consumer aktif pada satu waktu.

Jika ada 4 partitions dan 2 consumers, masing-masing bisa mendapat 2 partitions. Jika ada 4 partitions dan 6 consumers, 2 consumers idle.

Max active consumers for a topic in one group <= partition count.

Consumer group memungkinkan scale-out tetapi tidak mengubah ordering per partition.

13.1 Banyak Consumer Group

Consumer group berbeda membaca secara independen.

Ini menguntungkan untuk:

audit projector;
search indexer;
notification service;
analytics pipeline;
compliance archive.

Tapi setiap group menambah read load dan operational surface.

14. Offset Commit: Checkpoint Consumer

Offset commit adalah checkpoint consumer group. Kafka menyimpan offset commit di internal compacted topic __consumer_offsets melalui group coordinator.

Important:

Offset commit says: this group can resume from here.
It does not prove business side effect is correct.

Commit terlalu awal:

consume -> commit -> process -> crash during process

Risiko: data dianggap selesai padahal side effect belum terjadi.

Commit terlalu akhir:

consume -> process -> crash before commit

Risiko: message diproses ulang. Ini aman jika idempotent.

Untuk at-least-once processing, pattern umum:

poll -> process idempotently -> commit offset after success

15. Retention dan Replay

Kafka menyimpan data berdasarkan retention policy, bukan berdasarkan consumer acknowledgement.

Retention modes:

Mode	Makna
time-based	hapus segment yang lebih tua dari waktu tertentu
size-based	hapus saat size melebihi limit
compaction	simpan latest value per key secara eventual
delete + compact	kombinasi behavior

Replay hanya mungkin selama data masih ada.

Replay SLA <= retention window

Jika regulatory audit butuh replay 1 tahun, tetapi retention 7 hari, Kafka topic bukan audit archive penuh. Kamu butuh archive/storage lain atau retention yang sesuai budget.

15.1 Retention Cost Formula

Estimasi kasar:

daily_ingress_bytes = avg_record_size * records_per_day
retained_bytes = daily_ingress_bytes * retention_days
cluster_storage_bytes = retained_bytes * replication_factor * overhead_factor

Contoh:

avg record = 2 KB
records/day = 100 million
ingress/day = 200 GB
retention = 14 days
replication factor = 3
overhead factor = 1.2

storage ≈ 200 GB * 14 * 3 * 1.2 = 10.08 TB

Jangan set retention berdasarkan keinginan replay tanpa menghitung disk.

16. Log Compaction: Latest Value per Key, Bukan Full History

Log compaction mempertahankan setidaknya latest value per key secara eventual. Ini cocok untuk changelog/state topics, bukan audit timeline yang butuh semua event.

Contoh compacted topic:

key=CASE-1 value=status=OPENED
key=CASE-1 value=status=ASSIGNED
key=CASE-1 value=status=CLOSED

Setelah compaction, older values bisa hilang dan latest tetap ada.

Cocok untuk:

lookup table replication;
latest profile state;
Kafka Streams changelog;
materialized state recovery.

Tidak cocok sebagai satu-satunya audit log jika semua transition harus disimpan.

Rule:

Use delete retention for event history.
Use compaction for latest state by key.
Use both only if you understand the cleanup semantics.

17. Kafka Batching dan Page Cache

Kafka performance sangat bergantung pada batching dan sequential I/O.

Producer batching:

many records -> one batch -> one request -> append sequentially

Consumer fetch:

one request -> many records -> large sequential read

Broker storage:

append to log segment -> OS page cache -> disk

Implication:

tiny unbatched messages waste overhead;
compression works better per batch;
caught-up consumers often read from page cache;
historical replay can become disk-heavy;
TLS can change zero-copy behavior and CPU cost;
benchmark must include realistic batching/compression.

18. Topic Design

Topic adalah contract boundary. Jangan buat topic hanya karena ada satu class/event kecil. Jangan gabungkan semua domain karena “biar gampang”.

Pertanyaan desain:

Siapa owner topic?
Apa semantic contract record di topic?
Apa key dan ordering requirement?
Berapa retention?
Apakah compacted atau delete retention?
Siapa consumers?
Apakah schema evolution policy jelas?
Apakah PII/governance jelas?
Berapa throughput dan growth estimate?
Bagaimana replay dilakukan?

18.1 Contoh Topic Boundary

Regulatory platform:

case.lifecycle.events
case.assignment.events
enforcement.decision.events
notification.commands
case.search-index.changelog
reference.regulated-entity.compacted

Penjelasan:

case.lifecycle.events: audit/event history, retention panjang, key caseId.
notification.commands: command-like work distribution, retention pendek, key mungkin notificationId.
reference.regulated-entity.compacted: latest reference data, compacted, key entityId.

19. Partition Key Design

Partition key adalah salah satu keputusan paling sulit.

19.1 Key by Aggregate ID

key = caseId

Kelebihan:

order per case;
projector mudah;
event version validasi sederhana.

Risiko:

hot case;
jika satu case punya event sangat banyak, satu partition jadi hotspot;
cross-case ordering tidak ada.

19.2 Key by Tenant

key = tenantId

Kelebihan:

tenant affinity;
tenant-level processing mudah.

Risiko:

tenant besar menjadi hotspot;
parallelism per tenant rendah;
ordering terlalu coarse.

19.3 Key by Region

key = region

Biasanya buruk untuk high-throughput karena cardinality rendah. Semua APAC masuk partition yang sama jika jumlah region kecil dan key hash menghasilkan distribusi terbatas.

19.4 Key by Composite

key = tenantId + ':' + caseId

Bisa menjaga case ordering dan distribusi lebih baik, tetapi migration dan compatibility perlu disiplin.

Rule:

Choose the key that preserves the smallest ordering scope required by correctness.

Jangan menjaga ordering lebih luas dari kebutuhan. Global ordering mahal dan sering tidak perlu.

20. Partition Count Sizing

Mulai dari pertanyaan:

How much parallelism do we need now and in the foreseeable future?

Faktor:

target MB/sec write;
target MB/sec read per consumer group;
number of consumer groups;
maximum consumers per group;
per-partition throughput expected;
broker count;
replication factor;
recovery/rebalance tolerance;
future growth.

Simplified heuristic:

partitions >= max(required_producer_parallelism, required_consumer_parallelism)

Tetapi jangan berhenti di situ. Terlalu banyak partition:

memperbanyak metadata;
memperbanyak file handles;
memperbanyak leader election work;
memperlama recovery;
memperbesar rebalance cost;
memperumit monitoring.

20.1 Menambah Partition Mengubah Key Mapping

Jika partition count dinaikkan, hash key ke partition bisa berubah untuk sebagian key. Ini bisa mengganggu ordering jika producer lama dan baru tidak dikontrol.

Contoh:

caseId=CASE-123 sebelumnya partition 2
setelah partition count naik, CASE-123 bisa ke partition 7

Event untuk case yang sama bisa tersebar di partition berbeda setelah perubahan partition count.

Mitigasi:

rencanakan partition count cukup untuk horizon tertentu;
gunakan topic baru untuk repartition besar;
gunakan custom partitioning dengan hati-hati;
lakukan migration window;
pastikan consumer bisa handle ordering transition jika diperlukan.

21. Producer Path Failure Modes

Failure	Gejala	Penyebab Umum	Mitigasi
metadata stale	retry, latency spike	leader election	retry dengan timeout benar
buffer exhausted	send blocked/error	broker lambat, producer terlalu cepat	backpressure, buffer tuning
serialization error	record gagal sebelum send	schema/payload invalid	validate sebelum publish
not enough replicas	produce error	ISR < min.insync.replicas	broker health, RF/minISR design
timeout	unknown outcome	network/broker overload	idempotent producer, retry discipline
record too large	rejection	payload melebihi config	payload contract, external blob storage

Unknown outcome adalah masalah utama distributed publish:

Producer timeout does not always mean broker did not append.

Karena itu producer idempotence dan idempotent consumer penting.

22. Consumer Path Failure Modes

Failure	Gejala	Penyebab Umum	Mitigasi
lag grows	consumer tertinggal	handler lambat, traffic naik	scale partitions/consumers, optimize handler
rebalance storm	processing interrupted	poll loop lambat, unstable instances	tune poll, static membership, cooperative rebalance
duplicate processing	side effect berulang	crash before commit	idempotency/inbox
message skipped	business data hilang	commit before process	commit after success
out-of-range offset	consumer cannot read old offset	retention expired	reset policy, larger retention, archive
hot partition	one consumer overloaded	bad key distribution	key redesign, repartition topic

23. Kafka Tidak Menghapus Kebutuhan Database Transaction Design

Kafka menyelesaikan log transport dan retention. Ia tidak otomatis menyelesaikan:

atomicity DB update + publish;
idempotent side effects;
external API exactly-once;
business invariant validation;
duplicate semantic events;
authorization;
PII governance.

Untuk publish dari database transaction, outbox/CDC masih relevan. Untuk consume ke database, inbox/idempotency masih relevan.

Kita akan bahas detail di Part 031.

24. Kafka vs RabbitMQ Streams Mental Diff

Keduanya punya model log/stream, tetapi ecosystem dan operational assumptions berbeda.

Concern	Kafka	RabbitMQ Streams
Core identity	distributed event streaming platform	stream feature inside RabbitMQ broker ecosystem
Partitioning	native topic partition model	stream + superstream
Consumer group	core concept	single active consumer / stream consumer patterns
Ecosystem	Kafka Streams, Connect, ksqlDB, Schema Registry ecosystem	RabbitMQ routing + stream protocol + AMQP ecosystem
Routing	topic/partition centric	exchange/queue/stream routing model also available
Operational fit	large-scale event platform	broker + stream hybrid use cases

Jangan pilih hanya karena “dua-duanya stream”. Pilih berdasarkan platform fit, team operation, routing needs, replay volume, ecosystem, and governance.

25. Java Touchpoint: AdminClient untuk Melihat Topology

Contoh membaca topic description:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.DescribeTopicsResult;
import org.apache.kafka.clients.admin.TopicDescription;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public final class TopicInspector {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            DescribeTopicsResult result = admin.describeTopics(List.of("case-events"));
            Map<String, TopicDescription> descriptions = result.allTopicNames().get();

            TopicDescription topic = descriptions.get("case-events");
            System.out.println("Topic: " + topic.name());
            topic.partitions().forEach(partition -> {
                System.out.printf(
                    "partition=%d leader=%s replicas=%s isr=%s%n",
                    partition.partition(),
                    partition.leader(),
                    partition.replicas(),
                    partition.isr()
                );
            });
        }
    }
}

Ini bukan production observability lengkap, tetapi melatih mental model:

topic -> partitions -> leader -> replicas -> ISR

26. Diagram End-to-End Kafka Core

Interpretasi:

Producer menulis ke partition leader.
Follower replicate dari leader.
Consumer group projector membagi partition antar instance.
Consumer group audit bisa membaca semua partition independen dari projector.
Offset masing-masing group berbeda.

27. Anti-Pattern Kafka Core

27.1 Global Ordering Illusion

Salah:

Semua event dalam topic Kafka punya order global.

Benar:

Order hanya per partition.

Jika butuh order per case, key by caseId. Jika butuh order global seluruh organisasi, Kafka multi-partition bukan jawabannya tanpa sequencer global, dan sequencer global menjadi bottleneck.

27.2 Topic per Entity Instance

Salah:

case-CASE-1
case-CASE-2
case-CASE-3
...

Ini menghancurkan metadata dan operasi. Gunakan topic per event category/domain, key per entity.

27.3 Partition per Tenant Tanpa Hitung Tenant Growth

Jika tenant tumbuh ribuan, partition-per-tenant bisa meledak. Jika tenant besar, satu partition bisa hotspot.

27.4 Retention Panjang Tanpa Disk Budget

Salah:

Set retention 365 hari saja, nanti disk ditambah.

Retention adalah cost commitment. Hitung ingress * retention * replication.

27.5 Commit Before Process

Salah:

poll -> commit -> process

Ini bisa skip message jika crash setelah commit.

27.6 Menganggap Kafka Exactly-Once Menyelesaikan Semua Side Effect

Kafka transactions membantu consume-process-produce di Kafka boundary. Ia tidak membuat email, HTTP call, atau database eksternal magically exactly-once.

28. Operational Questions yang Harus Bisa Dijawab

Untuk setiap production topic, engineer senior harus bisa menjawab:

[ ] Berapa partition count dan kenapa?
[ ] Apa partition key dan kenapa?
[ ] Apa replication factor?
[ ] Apa min.insync.replicas?
[ ] Producer pakai acks apa?
[ ] Berapa retention time/size?
[ ] Apakah compacted, delete, atau keduanya?
[ ] Berapa estimated ingress MB/day?
[ ] Berapa estimated storage dengan replication?
[ ] Consumer group apa saja yang membaca?
[ ] Consumer mana yang paling lambat?
[ ] Apa replay SLA?
[ ] Apa schema compatibility policy?
[ ] Apa PII/data governance policy?
[ ] Apa recovery plan jika partition hot?

Jika jawaban tidak ada, topic itu belum siap production-critical workload.

29. Mini Case Study: `case-events`

Requirement

Regulatory case platform butuh event stream untuk lifecycle case:

create case;
assign officer;
request evidence;
receive evidence;
escalate;
enforcement decision;
close case.

Correctness:

Events for the same case must be processed in order.
Different cases do not require global ordering.

Design

topic: case.lifecycle.events
key: caseId
partitions: 24
replication_factor: 3
min_insync_replicas: 2
retention: 90 days
cleanup_policy: delete
producer_acks: all
producer_idempotence: true
schema: versioned envelope + payload

Why:

caseId preserves order per case;
24 partitions allow consumer scale-out;
RF=3 and minISR=2 provide durability baseline;
90-day retention supports operational replay;
long-term audit archive should still exist outside Kafka if legal retention exceeds 90 days.

Consumer Groups

case-projector
case-search-indexer
sla-monitor
notification-orchestrator
audit-archive-writer

Each group stores offsets independently.

Known Risk

Hot case:

A large investigation case generates millions of events.
All go to same partition because key=caseId.

Mitigation options:

accept if rare and bounded;
special-case high-volume case event type;
split sub-stream by caseId + subProcessId if business ordering permits;
build hot-key detection metrics.

30. Latihan 30 Menit

Ambil satu candidate topic dari sistemmu. Isi ini:

topic: ""
owning_service: ""
record_semantics: "event|command|state-change|changelog"
key: ""
why_this_key: ""
ordering_scope: ""
partitions_initial: 0
partitions_growth_plan: ""
replication_factor: 0
min_insync_replicas: 0
retention_time: ""
retention_size: ""
cleanup_policy: "delete|compact|delete,compact"
estimated_records_per_day: 0
estimated_avg_record_size: ""
estimated_storage: ""
consumer_groups:
  - name: ""
    purpose: ""
    replay_sla: ""
pii_fields: []
schema_compatibility: "backward|forward|full|none"
known_hot_key_risks: []

Kemudian jawab:

Apa satu keputusan yang paling mungkin salah?
Apa failure mode-nya?
Apa metric yang akan mendeteksinya paling awal?

31. Ringkasan

Kafka core architecture sederhana tetapi konsekuensinya dalam.

Invariants:

Topic is logical; partition is physical ordering unit.
Offset is per partition, not global business identity.
Producer writes to partition leader.
Followers replicate leader log.
ISR determines which replicas are sufficiently caught up.
Consumer group parallelism is bounded by partition count.
Retention is independent of consumption.
Replay exists only inside retention window.
Partition key is a correctness decision, not only a distribution decision.

Setelah memahami bagian ini, Part 017 tentang producer internals akan jauh lebih masuk akal. Producer configs seperti acks, linger.ms, batch.size, compression.type, enable.idempotence, dan delivery.timeout.ms bukan tuning random; semuanya adalah knob yang memengaruhi path dari record aplikasi ke replicated partition log.

Referensi

Apache Kafka 4.3 documentation — Design, Implementation Log, Distribution, Operations, and Configuration.
Apache Kafka documentation — producer, consumer, admin, and streams APIs.
Apache Kafka design notes on log persistence, batching, pull-based consumer model, offsets, replication, ISR, and committed messages.

Lesson Recap

You just completed lesson 16 in build core. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Back To Series Next Lesson

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.

Previous Lesson

Lesson 15

Learn Java Messaging Event Streaming Part 015 Rabbitmq Stream Filtering Dedup Perftest

Next Lesson

Lesson 17

Learn Java Messaging Event Streaming Part 017 Kafka Producer Internals Batching

Learn Java Messaging Event Streaming Part 016 Kafka Core Architecture

Part 016 — Kafka Core Architecture: Broker, Topic, Partition, Segment, Offset, Replica

Tujuan Bagian Ini

1. Mental Model Utama

2. Vocabulary Dasar

3. Topic Bukan Queue

4. Partition: Unit Ordering, Parallelism, dan Storage

4.1 Ordering Hanya per Partition

4.2 Partition Count Adalah Keputusan Arsitektur

5. Offset: Posisi, Bukan ID Bisnis

6. Record Anatomy

6.1 Key

6.2 Value

6.3 Headers

6.4 Timestamp

7. Log Segment: Bentuk Fisik dari Partition

7.1 Retention Tidak Presisi Per Record

8. Broker dan Cluster

9. Controller dan Metadata

10. Replication: Leader, Follower, ISR

10.1 ISR

10.2 acks dan min.insync.replicas

11. Producer Write Path

11.1 Metadata Matters

12. Consumer Read Path

12.1 Pull Model

13. Consumer Group dan Partition Assignment

13.1 Banyak Consumer Group

14. Offset Commit: Checkpoint Consumer

15. Retention dan Replay

15.1 Retention Cost Formula

16. Log Compaction: Latest Value per Key, Bukan Full History

17. Kafka Batching dan Page Cache

18. Topic Design

18.1 Contoh Topic Boundary

19. Partition Key Design

19.1 Key by Aggregate ID

19.2 Key by Tenant

19.3 Key by Region

19.4 Key by Composite

20. Partition Count Sizing

20.1 Menambah Partition Mengubah Key Mapping

21. Producer Path Failure Modes

22. Consumer Path Failure Modes

23. Kafka Tidak Menghapus Kebutuhan Database Transaction Design

24. Kafka vs RabbitMQ Streams Mental Diff

25. Java Touchpoint: AdminClient untuk Melihat Topology

26. Diagram End-to-End Kafka Core

27. Anti-Pattern Kafka Core

27.1 Global Ordering Illusion

27.2 Topic per Entity Instance

27.3 Partition per Tenant Tanpa Hitung Tenant Growth

27.4 Retention Panjang Tanpa Disk Budget

27.5 Commit Before Process

27.6 Menganggap Kafka Exactly-Once Menyelesaikan Semua Side Effect

28. Operational Questions yang Harus Bisa Dijawab

29. Mini Case Study: case-events

Requirement

Design

Consumer Groups

Known Risk

30. Latihan 30 Menit

31. Ringkasan

Referensi

29. Mini Case Study: `case-events`