Debezium CDC in Java Systems
Learn Java Data Pipeline Pattern - Part 021
Debezium CDC in Java systems: connector topology, Kafka Connect runtime, envelopes, offsets, schema history, snapshots, heartbeats, transaction metadata, sink integration, and production failure handling.
Part 021 — Debezium CDC in Java Systems
Part 020 built the CDC mental model. This part turns it into an implementation model with Debezium in Java-based systems.
Debezium is commonly described as a CDC platform. In practical Java architecture, think of it as:
A log-reader plus envelope producer that converts database commit history into Kafka records, usually running inside Kafka Connect, with source offsets, schema metadata, snapshot behavior, and connector-specific operational constraints.
That wording matters.
Debezium is not just a library you call from a Java method. In production, it is usually an infrastructure component sitting beside your Java services. Your Java services write to operational databases; Debezium observes committed changes; Kafka stores the change stream; Java consumers, stream processors, and sinks consume those changes.
The value of Debezium is not merely "near real-time events". Its real value is that it lets downstream systems reason from database commit history instead of fragile polling loops or unsafe dual writes.
But it also introduces a new boundary: the database log becomes part of your distributed system contract.
1. What Problem Debezium Solves
Debezium primarily solves the problem:
How can downstream systems observe committed database changes without changing every write path and without polling source tables repeatedly?
It is strongest when you need:
- replication into Kafka,
- cache invalidation,
- search index update,
- materialized views,
- lake ingestion,
- audit trail construction,
- migration from monolith database to event-driven architecture,
- integration with legacy systems that cannot easily emit domain events.
It is weaker when you need:
- rich business intent,
- user intent semantics,
- cross-aggregate domain facts,
- carefully curated domain event names,
- event payloads independent of database schema,
- domain-level versioning controlled by application code.
For those cases, the outbox pattern in Part 022 is usually a better fit.
Debezium sees committed row changes. It does not automatically know why those changes happened.
2. Debezium Is Usually Kafka Connect, Not Application Code
Most production Debezium deployments run as Kafka Connect source connectors.
The distinction is important:
| Layer | Responsibility |
|---|---|
| Java application | Owns business transaction and writes to database |
| Database | Commits state and writes transaction log |
| Debezium connector | Reads database change source and emits change events |
| Kafka Connect runtime | Runs connector tasks, stores offsets/config/status, handles connector lifecycle |
| Kafka | Stores emitted change events durably |
| Java consumers/processors | Interpret change events and update downstream systems |
A common mistake is to put too much logic in the connector layer.
The connector should usually be boring:
- read from source log,
- emit stable envelope,
- route topic if necessary,
- preserve source metadata,
- expose metrics,
- fail loudly when it cannot maintain correctness.
Business transformation belongs downstream, not inside the CDC connector.
3. Connector Topology
A Debezium topology has several moving parts.
The operational health of the pipeline depends on all of these:
- database permissions,
- replication slot or equivalent log reader state,
- log retention,
- Kafka Connect offset storage,
- Kafka topic availability,
- schema history persistence,
- connector task liveness,
- consumer lag downstream.
If any one is mismanaged, the CDC pipeline can silently fall behind or loudly fail. Loud failure is preferable.
4. The Debezium Envelope Mental Model
Debezium emits records with a structured envelope. The exact structure depends on connector and configuration, but the conceptual model is stable.
A CDC event usually contains:
before: row state before change, when available,after: row state after change, when available,op: operation code,source: database/table/log position metadata,ts_msor equivalent timestamp metadata,- transaction metadata if enabled/configured,
- schema information when using schemaful serialization.
A simplified update event:
{
"before": {
"case_id": "C-1001",
"status": "UNDER_REVIEW",
"risk_score": 81
},
"after": {
"case_id": "C-1001",
"status": "ESCALATED",
"risk_score": 93
},
"source": {
"connector": "postgresql",
"db": "regulatory",
"schema": "public",
"table": "case_file",
"lsn": 24023128,
"txId": 721901
},
"op": "u",
"ts_ms": 1783148400000
}
The operation code matters:
| Operation | Meaning |
|---|---|
c | create / insert |
u | update |
d | delete |
r | snapshot read |
Do not treat all events as "upserts" unless your downstream model explicitly allows it.
A delete is not the same as an update with a null payload. A snapshot read is not the same as a new business creation. An update may carry both before and after state, or only after state depending on database configuration and connector support.
5. Snapshot Events Are Not Business Events
During initial snapshot, Debezium emits existing rows. These are often marked differently from streaming updates.
That means:
op = r
should generally mean:
This row existed at snapshot time.
It should not mean:
This entity was created now.
This distinction matters for downstream side effects.
Bad consumer:
if (event.after() != null) {
emailService.sendCaseCreatedEmail(event.after().caseId());
}
Better consumer:
if (event.op() == Operation.CREATE && !event.isSnapshot()) {
domainProjection.applyNewlyCreatedCase(event.after());
}
if (event.op() == Operation.READ && event.isSnapshot()) {
domainProjection.loadExistingCase(event.after());
}
In most pipelines, snapshot events should update projections, indexes, or lake tables. They should not trigger irreversible external side effects.
6. Topic Naming and Capture Scope
A default Debezium topic model often maps source tables into topics.
Conceptually:
<serverName>.<schema>.<table>
or similar depending on connector and configuration.
Example:
regulatory.public.case_file
regulatory.public.case_assignment
regulatory.public.enforcement_action
This default is useful for replication-style CDC but not always ideal for product-facing event streams.
Design questions:
| Question | Why It Matters |
|---|---|
| One topic per table or routed topics? | Affects consumer coupling and topic count |
| Include schema name? | Affects multi-schema databases and migrations |
| Capture all columns or selected columns? | Affects privacy, payload size, evolution |
| Capture all tables or explicit allowlist? | Affects blast radius |
| Compact topic or retain history? | Affects rebuild strategy |
| Who owns topic compatibility? | Affects governance |
Rule of thumb:
- Use table topics for internal replication/materialization.
- Use outbox-routed topics for domain events.
- Avoid exposing raw table CDC topics as public domain APIs unless consumers explicitly accept database coupling.
7. Source Offset Is a Correctness Boundary
CDC correctness depends on durable source offsets.
A source offset answers:
Up to which source-log position has the connector already emitted events?
For PostgreSQL, this may involve LSN-like position metadata. For MySQL, binlog file and position are involved. For other systems, the representation differs.
The important invariant is tool-independent:
A connector restart must resume from a source position that prevents both silent loss and unbounded duplication beyond the expected delivery model.
Kafka Connect stores connector offsets in Kafka internal topics. The source database may also maintain server-side state, such as replication slots or equivalent retention mechanisms.
Failure cases:
| Failure | Consequence |
|---|---|
| Offset topic lost | Connector may need resnapshot or manual recovery |
| Replication slot removed | Source may discard required log history |
| Log retention too short | Connector cannot catch up after downtime |
| Connector config changed incorrectly | Topic or envelope semantics may change |
| Schema history lost | Connector may not decode future log records correctly |
Offsets are not just operational metadata. They are part of the pipeline's recovery protocol.
8. Schema History Is Not Optional Complexity
CDC tools must understand table structure over time.
Imagine this timeline:
T1: table case_file(case_id, status)
T2: insert case_id=C-1, status=OPEN
T3: alter table add column risk_score
T4: update C-1 set risk_score=91
When reading log entries, the connector must understand which schema was valid at each point.
Schema history helps decode change records correctly across DDL changes.
A practical rule:
Treat schema history storage as critical state, not disposable cache.
Do not casually delete connector schema history topics, files, or equivalent state. Losing them can force a rebuild or resnapshot.
9. Heartbeats and Lag Detection
CDC systems can appear idle for two very different reasons:
- there are no source changes,
- the connector is broken or stuck.
Heartbeats help distinguish these situations.
A heartbeat is a small signal emitted periodically so operators can know the connector is alive and advancing.
Useful metrics:
- last source event timestamp,
- last connector emit timestamp,
- heartbeat age,
- source log position age,
- replication slot retained bytes,
- Kafka Connect task status,
- Kafka producer error rate,
- downstream consumer lag.
A mature runbook does not merely ask:
Is the connector running?
It asks:
Is the connector still able to observe new committed changes within the freshness SLO and retain enough source log to recover?
10. Transaction Metadata and Ordering
CDC is often consumed at row-event granularity. But databases commit transactions, not isolated rows.
A single transaction might update several tables:
BEGIN;
UPDATE case_file SET status = 'ESCALATED' WHERE case_id = 'C-1001';
INSERT INTO case_assignment(...);
INSERT INTO case_timeline(...);
COMMIT;
Downstream consumers may see multiple CDC events.
Important questions:
- Are events from one transaction identifiable?
- Can consumers preserve commit order?
- Is table-level topic partitioning hiding cross-table transaction ordering?
- Does the downstream projection require all rows from a transaction before applying?
- Is partial visibility acceptable?
For many projections, row-level idempotent upserts are enough.
For regulatory or financial workflows, transaction grouping can matter. A downstream audit model may need to know that status change, assignment change, and timeline entry were one committed source transaction.
A robust event envelope should preserve source transaction metadata when available.
11. Debezium and Java Consumer Design
A Java consumer should not deserialize Debezium events directly into business objects without acknowledging the envelope.
Bad model:
record CaseFile(String caseId, String status, int riskScore) {}
This loses:
- operation type,
- before/after distinction,
- source table,
- source offset,
- snapshot flag,
- schema version,
- transaction metadata,
- event time,
- delete semantics.
Better model:
public enum CdcOperation {
CREATE,
UPDATE,
DELETE,
SNAPSHOT_READ
}
public record SourcePosition(
String connector,
String database,
String schema,
String table,
String partition,
String offset
) {}
public record CdcEnvelope<T>(
CdcOperation operation,
T before,
T after,
SourcePosition sourcePosition,
Instant sourceTimestamp,
boolean snapshot,
String transactionId,
String schemaVersion
) {
public Optional<T> currentValue() {
return Optional.ofNullable(after);
}
}
Then downstream logic can be explicit:
public ProjectionCommand toProjectionCommand(CdcEnvelope<CaseFileRow> event) {
return switch (event.operation()) {
case CREATE, UPDATE, SNAPSHOT_READ ->
ProjectionCommand.upsert(
event.after().caseId(),
event.after(),
event.sourcePosition()
);
case DELETE ->
ProjectionCommand.delete(
event.before().caseId(),
event.sourcePosition()
);
};
}
This model is a little more verbose. It prevents expensive ambiguity later.
12. The Delete Problem
Deletes are one of the most common CDC bugs.
A delete event may contain:
beforerow data,- no
afterdata, - tombstone record for compacted Kafka topics,
- key-only deletion semantics,
- connector-specific behavior.
Downstream choices:
| Sink Type | Delete Handling |
|---|---|
| Search index | Delete document by key |
| Cache | Evict key |
| Projection table | Mark deleted or physical delete |
| Lake table | Write delete event or merge delete |
| Audit log | Append delete fact |
| Analytics dimension | Soft-close record validity |
A dangerous default is to ignore events with null after.
if (event.after() == null) {
return; // bug: silently ignores deletes
}
Better:
if (event.operation() == CdcOperation.DELETE) {
projection.delete(event.key(), event.sourcePosition());
return;
}
If your source table allows hard deletes, delete semantics must be a first-class design concern.
13. Snapshot Mode Decisions
Snapshot mode determines how the connector initializes downstream state.
Common conceptual options include:
| Mode | Meaning | Use Case |
|---|---|---|
| Initial snapshot then stream | Read existing data, then stream changes | Build new projection/lake from current state |
| No initial snapshot | Only stream future changes | Topic already initialized, or only future events matter |
| Schema-only initialization | Capture schema state without rows | Existing downstream already loaded |
| Incremental snapshot | Snapshot selected data while streaming | Large tables, online re-snapshot, targeted repair |
| Ad hoc snapshot | Trigger snapshot for selected table/data | Repair or backfill |
The key decision is not "which flag should I set?"
The real decision is:
What downstream completeness claim are we making after the connector starts?
If you start without snapshot, downstream cannot claim to contain all current rows unless another mechanism loaded them.
If you run an initial snapshot, downstream must distinguish snapshot reads from newly created business facts.
If you run incremental snapshot, downstream must tolerate interleaving snapshot and streaming events.
14. Configuration as Architecture
Connector configuration is not boilerplate. It encodes architecture decisions.
Representative categories:
| Config Category | Architectural Meaning |
|---|---|
| Database connection | Which source truth is observed |
| Table include/exclude list | Capture scope and blast radius |
| Column include/exclude/masking | Privacy and payload contract |
| Snapshot mode | Initialization semantics |
| Topic naming | Consumer coupling and ownership |
| Serialization | Schema governance and compatibility |
| Tombstone behavior | Delete semantics and compaction |
| Heartbeat | Liveness and lag observability |
| Signal table/topic | Operational control plane |
| Transaction metadata | Cross-row/cross-table correlation |
Treat connector config as code:
- peer reviewed,
- version controlled,
- environment-specific values separated,
- promoted through environments,
- validated before deployment,
- owned by a clear team,
- associated with a rollback strategy.
15. Minimal PostgreSQL-Oriented Topology Example
The exact values vary by environment. This example is intentionally schematic.
name=regulatory-postgres-cdc
connector.class=io.debezium.connector.postgresql.PostgresConnector
plugin.name=pgoutput
database.hostname=postgres.internal
database.port=5432
database.user=debezium
database.password=${secrets:debezium_password}
database.dbname=regulatory
topic.prefix=regulatory
schema.include.list=public
table.include.list=public.case_file,public.case_assignment,public.enforcement_action
slot.name=regulatory_cdc_slot
publication.name=regulatory_cdc_publication
snapshot.mode=initial
heartbeat.interval.ms=10000
tombstones.on.delete=true
provide.transaction.metadata=true
schema.history.internal.kafka.bootstrap.servers=kafka:9092
schema.history.internal.kafka.topic=debezium.schemahistory.regulatory
Production concerns hidden behind this tiny file:
- Does the
debeziumDB user have only necessary privileges? - Is WAL retention large enough for worst-case connector downtime?
- Is the replication slot monitored?
- Are included tables stable and intentionally owned?
- Are sensitive columns excluded or masked?
- Is schema history topic replicated and retained safely?
- Does downstream know snapshot events may arrive?
- Are transaction metadata topics/records handled?
- Is the connector config deployed through controlled release?
16. Topic Design for Debezium Table CDC
Assume source tables:
case_file
case_assignment
case_note
case_party
case_decision
Naive topic design:
regulatory.public.case_file
regulatory.public.case_assignment
regulatory.public.case_note
regulatory.public.case_party
regulatory.public.case_decision
This is acceptable for internal materialization. It is not necessarily acceptable as a public event API.
Consumer coupling risk:
Every table refactor now becomes a downstream compatibility problem.
Better internal classification:
| Topic Kind | Audience | Stability Promise |
|---|---|---|
| Raw CDC table topic | Internal platform/pipeline | Database-coupled, low semantic promise |
| Curated change topic | Data consumers | Stable schema, cleaned semantics |
| Domain event topic | Business consumers | Domain-level versioned contract |
| Projection topic | Specific downstream system | Purpose-built contract |
Do not pretend raw CDC is a domain event stream.
17. CDC-to-Projection Pattern
A common Java use case:
Maintain a read model from operational database changes.
Flow:
Consumer algorithm:
for each CDC event:
parse envelope
validate source table and schema version
map row change to projection command
apply idempotent sink operation
record source position applied
commit Kafka offset only after sink success
Idempotent projection table example:
create table case_search_projection (
case_id text primary key,
status text not null,
risk_score int,
assigned_team text,
deleted boolean not null default false,
source_table text not null,
source_position text not null,
source_timestamp timestamptz,
updated_at timestamptz not null default now()
);
Apply update:
insert into case_search_projection (
case_id,
status,
risk_score,
assigned_team,
deleted,
source_table,
source_position,
source_timestamp
)
values (?, ?, ?, ?, false, ?, ?, ?)
on conflict (case_id) do update set
status = excluded.status,
risk_score = excluded.risk_score,
assigned_team = excluded.assigned_team,
deleted = false,
source_table = excluded.source_table,
source_position = excluded.source_position,
source_timestamp = excluded.source_timestamp,
updated_at = now();
Apply delete:
update case_search_projection
set deleted = true,
source_position = ?,
source_timestamp = ?,
updated_at = now()
where case_id = ?;
This is replay-safe if later duplicate events produce the same final projection.
For strict ordering, store and compare source position per key where possible.
18. CDC-to-Lake Pattern
For analytics, a CDC event stream often lands into object storage or table formats.
A simplified flow:
Bronze/raw should preserve:
- original key,
- operation,
- before/after if needed,
- source position,
- source timestamp,
- ingestion timestamp,
- connector identity,
- schema version,
- transaction metadata,
- raw payload or lossless representation.
Do not prematurely flatten away operation and source metadata. You will need them for:
- replay,
- correction,
- audit,
- schema migration,
- reconciliation,
- debugging source anomalies.
The normalized layer can apply business rules. The raw layer should behave like evidence.
19. CDC-to-Search Pattern
Search indexing is a practical CDC use case.
Pattern:
CDC event -> projection command -> search document update/delete
Challenges:
- search sink may be eventually consistent,
- partial update may fail,
- delete must be handled,
- index mapping may lag schema evolution,
- multiple tables may contribute to one document,
- enrichment data may be stale,
- reindex requires replay/backfill.
For one-table-to-one-document mapping, direct CDC is simple.
For multi-table documents, CDC becomes trickier:
case_file + case_assignment + case_party + case_action -> case_search_document
You need a materialization strategy:
| Strategy | Description | Trade-Off |
|---|---|---|
| On-change query | On any CDC event, query source DB to rebuild document | Simple, source load, race risk |
| Local state store | Maintain state for contributing tables | More complex, replayable |
| Database projection | Source DB maintains projection table, CDC captures it | Moves complexity upstream |
| Stream processor | Kafka Streams/Flink joins table topics | Scalable, stateful complexity |
For high-value systems, avoid ad hoc source DB queries in every consumer. They couple consumer correctness to source availability and isolation behavior.
20. CDC and Schema Registry
Debezium can work with schema-aware serialization such as Avro or Protobuf depending on converter configuration.
Schema registry is valuable because CDC payloads evolve when tables evolve.
But beware:
Database schema compatibility and event schema compatibility are not always the same thing.
A database migration might be safe for the application but breaking for a consumer.
Example:
alter table case_file rename column status to lifecycle_status;
The database app may deploy code simultaneously. CDC consumers may not.
Consumer-safe approach:
- prefer additive changes,
- deploy consumers before producers when needed,
- avoid rename-as-in-place-change for public streams,
- use curated topics to decouple raw table changes,
- test schema compatibility in CI,
- include schema version in envelope or registry metadata,
- maintain deprecation windows.
Raw CDC topics should be treated as database-coupled contracts. Curated topics should obey stricter compatibility promises.
21. Security and Sensitive Columns
CDC can accidentally become a data exfiltration pipeline.
Why?
Because it captures changes near the database truth.
Sensitive columns may include:
- national identifiers,
- addresses,
- phone numbers,
- free-text notes,
- investigation details,
- enforcement evidence,
- payment data,
- credentials or tokens accidentally stored in tables,
- internal staff comments.
Design controls:
| Control | Purpose |
|---|---|
| Table allowlist | Avoid accidental broad capture |
| Column exclude list | Remove sensitive fields |
| Masking SMT or downstream masking | Reduce exposure |
| Topic ACLs | Restrict consumers |
| Encryption in transit/at rest | Protect storage path |
| Data classification metadata | Help downstream policy enforcement |
| Retention policy | Limit long-term exposure |
| Audit logs | Know who consumed what |
Do not start with capture everything in regulated environments.
Start with minimal capture scope and expand intentionally.
22. Operational Runbook: Connector Stops
When a connector stops, do not immediately restart blindly.
Ask:
- Why did it stop?
- Is source log retention still sufficient?
- Is the replication slot or equivalent state intact?
- Did schema change break decoding?
- Did Kafka reject writes?
- Did offset storage fail?
- Did connector task crash due to poison schema/data?
- Are downstream consumers lagging separately?
- Is a resnapshot required?
- Would restart duplicate events downstream?
Basic decision tree:
The worst outcome is not downtime. The worst outcome is falsely believing downstream data is complete after a log gap.
23. Operational Runbook: Downstream Is Wrong
If a downstream projection is wrong, debug from evidence.
Checklist:
- What is the source row now?
- What is the source transaction history around the issue?
- Did Debezium emit the expected event?
- Did Kafka contain the event?
- Did the consumer read it?
- Did the consumer reject it?
- Did the sink apply it?
- Was the event later overwritten by an older event?
- Was a delete ignored?
- Was a snapshot event misclassified?
- Was schema evolution mishandled?
- Was backfill interleaved with live stream incorrectly?
A good CDC pipeline stores enough metadata in the sink to answer:
Which source event last modified this downstream row?
Without that, every data incident becomes archaeology.
24. Java Integration Testing with Debezium
For serious systems, test CDC behavior with real infrastructure.
Recommended test layers:
| Test Level | Purpose |
|---|---|
| Envelope parser unit test | Parse operation/source/delete/snapshot correctly |
| Mapper unit test | Convert CDC event to sink command |
| Idempotency test | Duplicate events do not corrupt sink |
| Delete test | Deletes propagate correctly |
| Schema evolution test | Add nullable column, add required column with default, rename simulation |
| Embedded/local integration | DB + Debezium + Kafka + consumer |
| Recovery test | Kill connector, restart, verify no loss |
| Snapshot test | Initial snapshot does not trigger business side effects |
| Backfill test | Replay old topic into empty sink |
Do not only test the happy path:
insert row -> event appears
Also test:
insert -> update -> delete -> duplicate -> restart -> replay -> schema change
25. Example Java CDC Consumer Skeleton
This is intentionally framework-light. It shows boundaries.
public final class CdcProjectionConsumer {
private final KafkaConsumer<String, byte[]> consumer;
private final DebeziumEventParser parser;
private final ProjectionMapper mapper;
private final ProjectionSink sink;
private final DeadLetterSink deadLetters;
public void run() {
consumer.subscribe(List.of("regulatory.public.case_file"));
while (!Thread.currentThread().isInterrupted()) {
ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(500));
for (ConsumerRecord<String, byte[]> record : records) {
try {
CdcEnvelope<CaseFileRow> event = parser.parse(record);
ProjectionCommand command = mapper.map(event);
sink.apply(command);
} catch (NonRetryableDataException e) {
deadLetters.write(record, e);
} catch (RetryableInfrastructureException e) {
throw e;
}
}
consumer.commitSync();
}
}
}
Important details omitted from this simple skeleton but required in production:
- partition-aware pause/resume,
- batch processing,
- retry budget,
- DLQ envelope,
- metrics,
- graceful shutdown,
- rebalance listener,
- transaction or idempotent sink boundary,
- offset commit after sink success,
- poison record handling,
- schema compatibility validation.
The skeleton is not the architecture. It is the smallest visible loop.
26. Debezium Anti-Patterns
Anti-Pattern 1: Treating Raw CDC as Domain Events
Raw CDC says a row changed. It does not necessarily say what business event happened.
Anti-Pattern 2: Ignoring Snapshot Operation
Snapshot rows are existing state, not new domain creations.
Anti-Pattern 3: Ignoring Deletes
Null after does not mean irrelevant. It may mean deletion.
Anti-Pattern 4: Capture Everything by Default
This creates privacy, cost, and blast-radius problems.
Anti-Pattern 5: No Log Retention Monitoring
If the connector falls behind beyond source log retention, completeness may be broken.
Anti-Pattern 6: Deleting Schema History
Schema history is part of the connector's ability to decode changes.
Anti-Pattern 7: Business Logic in Connector Configuration
Use connector transforms carefully. Complex business mapping belongs downstream.
Anti-Pattern 8: No Replay Plan
If consumers are not idempotent, Debezium's replay power becomes dangerous.
Anti-Pattern 9: One Consumer Does Everything
A monolithic CDC consumer that updates search, sends email, writes lake, and calls APIs becomes untestable and unreplayable.
Anti-Pattern 10: No Ownership Boundary
Someone must own connector config, topic compatibility, schema changes, lag, and incident response.
27. Production Checklist
Before deploying Debezium CDC to production, verify:
Source Database
- Capture scope is explicit.
- Database user has least required privileges.
- Replication/log retention is configured for worst-case downtime.
- Replication slot or equivalent state is monitored.
- Source load from snapshot is acceptable.
- DDL process accounts for CDC consumers.
Kafka Connect
- Connect cluster is highly available enough for freshness SLO.
- Offset/config/status topics are replicated and retained.
- Schema history topic is protected.
- Connector config is version controlled.
- Restart and rollback process is documented.
Kafka Topics
- Topic naming is intentional.
- Retention/compaction policy matches use case.
- ACLs protect sensitive data.
- Serialization and schema strategy are defined.
- DLQ strategy is available for downstream consumers.
Consumers
- Operation type is handled explicitly.
- Snapshot events are handled safely.
- Deletes are handled.
- Sink is idempotent.
- Offset commit happens after sink success.
- Replay/backfill is tested.
- Schema evolution is tested.
Observability
- Connector task status is monitored.
- Source lag is monitored.
- Heartbeat age is monitored.
- Kafka produce errors are monitored.
- Downstream consumer lag is monitored.
- Data reconciliation exists.
- Runbooks exist for log gap, schema failure, and bad data.
28. How This Connects to the Next Part
Debezium table CDC is powerful, but it captures database mutations. Sometimes that is exactly what you need. Sometimes it is dangerously low-level.
If the downstream system needs to know:
CaseEscalated,EnforcementActionApproved,SlaBreached,InvestigationAssigned,DecisionPublished,
then raw table changes are a poor public contract.
The next part covers the outbox pattern: using the application transaction itself to write domain event records atomically, then letting Debezium publish them reliably.
That gives us the best of both worlds:
- application owns domain semantics,
- database transaction owns atomicity,
- Debezium owns reliable log-based publishing,
- Kafka owns durable distribution.
References
- Debezium Documentation — Features: https://debezium.io/documentation/reference/stable/features.html
- Debezium Documentation — Source Connectors: https://debezium.io/documentation/reference/stable/connectors/index.html
- Debezium Documentation — PostgreSQL Connector: https://debezium.io/documentation/reference/stable/connectors/postgresql.html
- Debezium Documentation — MySQL Connector: https://debezium.io/documentation/reference/stable/connectors/mysql.html
- Debezium Documentation — SQL Server Connector: https://debezium.io/documentation/reference/stable/connectors/sqlserver.html
- Kafka Connect Documentation: https://kafka.apache.org/documentation/#connect
You just completed lesson 21 in build core. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.
Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.