
Learn Java Data Pipeline Patterns

// A mental model of the data pipeline as a distributed correctness system, not just an ETL template. Covers system boundaries, invariants, record lifecycle, semantic contracts, and failure-first thinking for production-grade Java pipelines.

84 Lessons | 1375 Min Total | 04 Phases

This overview is designed to help you choose the right entry point quickly. Follow the full track from lesson one, continue from your last checkpoint, or jump straight into a phase that matches what you need right now.

Topics: airflow, apache-beam, apache-iceberg, apache-spark, api, and 198 more

Curriculum Map

Navigate by phase, then choose the lesson that matches your current depth.

01

Pipeline Mental Model

20 min

A mental model of the data pipeline as a distributed correctness system, not just an ETL template. Covers system boundaries, invariants, record lifecycle, semantic contracts, and failure-first thinking for production-grade Java pipelines.

02

Dataflow vs Control-Flow

17 min

Dataflow vs. control-flow in production-grade Java pipelines. Covers DAGs, stream graphs, workflows, jobs, tasks, operators, dependency semantics, orchestration, choreography, and failure boundaries.

03

Pipeline Invariants

19 min

The core invariants of a production-grade data pipeline: completeness, ordering, freshness, idempotency, replayability, determinism, bounded side effects, and auditability. Covers how to reason about them, Java design, failure modes, and a review checklist.

04

Batch, Streaming, CDC, and Request-Driven Pipeline Taxonomy

22 min

A taxonomy of production-grade pipelines: batch, micro-batch, streaming, CDC, request-driven, reverse ETL, materialized views, file pipelines, and hybrid architectures. Focuses on usage, implementation, invariants, trade-offs, and a decision framework for Java systems.

05

Source-Transform-Sink Contract

19 min

Source-transform-sink as a production contract, not just an ETL template. Covers responsibility boundaries, contract design, Java abstractions, failure semantics, metadata, lifecycle, and a review checklist.

06

Pipeline Failure Model

21 min

The failure model of a production-grade data pipeline: duplicates, loss, reordering, poison data, partial commits, split brain, stale metadata, late data, and operator-induced failures. Covers the taxonomy, mitigations, Java modeling, and a review checklist.

07

Delivery Semantics Reality

19 min

Delivery semantics in real production systems: at-most-once, at-least-once, effectively-once, exactly-once, and the Java implementation patterns that make those terms operational instead of marketing labels.

08

Pipeline Decision Framework

20 min

A production decision framework for choosing between custom Java services, Kafka Streams, Flink, Beam, Spark, Airflow, Temporal, and hybrid data pipeline architectures.

09

Java Pipeline Core Abstractions

14 min

Core abstractions for building production-grade Java data pipelines from first principles: Source, Record, Envelope, Processor, Sink, Checkpoint, Runner, and Lifecycle.

10

Record Envelope Design

15 min

Deep dive into production-grade record envelope design for Java data pipelines: identity, payload, metadata, event time, schema, trace context, source position, causality, and replay safety.

11

Pipeline Type System

14 min

Deep dive into type-safe pipeline design in Java: generics, sealed interfaces, records, value objects, phantom types, domain events, result modeling, and compile-time boundary protection.

12

Local Pipeline Runner

14 min

Build a local Java pipeline runner from first principles: pull loop, push loop, lifecycle, bounded queue, worker model, commit order, error lanes, graceful shutdown, and deterministic test harness.

13

Backpressure From First Principles

15 min

Backpressure from first principles for Java data pipelines: rate mismatch, bounded memory, queue pressure, slow sinks, adaptive throttling, pause/resume, batching, and operational signals.
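A minimal sketch of the bounded-queue idea, assuming a single in-process producer and consumer: a fixed-capacity `java.util.concurrent.ArrayBlockingQueue` sits between a fast source and a slow sink, and `offer` with a timeout returns `false` instead of letting memory grow. The `BoundedStage` class and its method names are hypothetical, chosen only for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stage buffer between a fast source and a slow sink.
class BoundedStage<T> {
    private final BlockingQueue<T> buffer;

    BoundedStage(int capacity) {
        // A fixed capacity bounds memory no matter how fast the source produces.
        this.buffer = new ArrayBlockingQueue<>(capacity);
    }

    // Returns false if the queue stays full for the whole timeout.
    // The caller treats that as a backpressure signal (pause, throttle, retry later).
    boolean tryPublish(T record, long timeoutMillis) throws InterruptedException {
        return buffer.offer(record, timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Consumer side: blocks until a record is available.
    T take() throws InterruptedException {
        return buffer.take();
    }

    // Current queue depth, a useful operational pressure signal.
    int depth() {
        return buffer.size();
    }
}
```

With a capacity of 2, a third `tryPublish` fails until the consumer calls `take()`; the caller then decides whether that `false` means pause, throttle, or shed load.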

14

Checkpoint Interface Design

13 min

Design a production-grade checkpoint interface for Java data pipelines: offsets, cursors, watermarks, snapshots, recovery tokens, commit ordering, compare-and-swap, partitioned progress, and recovery algorithms.
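To make the compare-and-swap idea concrete, here is a hedged sketch of a per-partition checkpoint store. It assumes offsets are `long` positions that only move forward and keeps them in memory, whereas a production store would persist them. `CheckpointStore` and its methods are hypothetical names; `ConcurrentHashMap.replace(key, oldValue, newValue)` supplies the atomic swap.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory checkpoint store keyed by partition.
class CheckpointStore {
    private final Map<String, Long> offsets = new ConcurrentHashMap<>();

    // Last committed offset, or -1 if the partition was never committed.
    long read(String partition) {
        return offsets.getOrDefault(partition, -1L);
    }

    // Compare-and-swap commit: succeeds only if the stored offset still equals
    // `expected` and `next` moves progress forward. A stale writer (for example
    // a zombie worker after a rebalance) gets false instead of overwriting progress.
    boolean commit(String partition, long expected, long next) {
        if (next <= expected) {
            return false; // progress must be monotonic
        }
        if (expected == -1L) {
            // First commit for this partition: only one writer can win.
            return offsets.putIfAbsent(partition, next) == null;
        }
        // Atomic: replaces only if the current value equals `expected`.
        return offsets.replace(partition, expected, next);
    }
}
```

A worker that lost its partition and still holds an old `expected` offset gets `false` from `commit`, which is its signal to stop rather than overwrite newer progress.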

15

Idempotent Sink from Scratch

14 min

Build idempotent sinks from scratch in Java using natural keys, dedupe keys, versioning, compare-and-swap, transactional boundaries, and replay-safe write protocols.
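One common shape for an idempotent sink is a conditional upsert that only moves the stored version forward. The sketch assumes every record carries a natural key and a monotonically increasing version (for example a source offset); `VersionedSink` is a hypothetical in-memory stand-in for a real table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical versioned sink: replays and late duplicates become no-ops.
class VersionedSink {
    record Row(String value, long version) {}

    private final Map<String, Row> table = new HashMap<>();

    // Returns true if the write changed the table, false if it was skipped.
    synchronized boolean upsert(String key, String value, long version) {
        Row current = table.get(key);
        if (current != null && current.version() >= version) {
            return false; // duplicate or stale write: safe to ignore
        }
        table.put(key, new Row(value, version));
        return true;
    }

    synchronized String get(String key) {
        Row row = table.get(key);
        return row == null ? null : row.value();
    }
}
```

In a relational sink the same guard typically becomes a conditional write such as `UPDATE ... SET ... WHERE key = ? AND version < ?`, so replays and late duplicates affect zero rows.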

16

Retry, DLQ, and Poison Records

15 min

Design production-grade retry, dead-letter queue, quarantine, poison record isolation, and non-blocking error lanes for Java data pipelines.
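A simplified sketch of a non-blocking error lane, assuming failures can be classified when they are thrown: transient errors are retried up to a fixed attempt budget, while poison records skip retries and go straight to a dead-letter list. `RetryingHandler`, `PoisonRecordException`, and `DeadLetter` are hypothetical names, and backoff with jitter is omitted to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical marker for failures that retrying cannot fix (e.g. a malformed payload).
class PoisonRecordException extends RuntimeException {
    PoisonRecordException(String message) { super(message); }
}

// Hypothetical handler: bounded retries for transient errors, DLQ for the rest.
class RetryingHandler<T> {
    record DeadLetter<R>(R payload, String reason, int attempts) {}

    private final Consumer<T> processor;
    private final int maxAttempts;
    private final List<DeadLetter<T>> deadLetters = new ArrayList<>();

    RetryingHandler(Consumer<T> processor, int maxAttempts) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.processor = processor;
        this.maxAttempts = maxAttempts;
    }

    // Returns true if processed, false if the record was dead-lettered.
    boolean handle(T record) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                processor.accept(record);
                return true;
            } catch (PoisonRecordException e) {
                // Same input will fail the same way: do not burn the retry budget.
                deadLetters.add(new DeadLetter<>(record, "poison: " + e.getMessage(), attempt));
                return false;
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    deadLetters.add(new DeadLetter<>(record, "retries exhausted: " + e.getMessage(), attempt));
                    return false;
                }
                // A real lane would back off (with jitter) before the next attempt.
            }
        }
        return false; // unreachable while maxAttempts >= 1
    }

    List<DeadLetter<T>> deadLetters() { return deadLetters; }
}
```

The key design choice is that a poison record costs one attempt and one dead-letter entry, so it never blocks healthy records behind it.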

17

File Ingestion Patterns

16 min

Design production-grade file ingestion pipelines in Java using landing zones, manifests, atomic handoff, partial-file detection, idempotent imports, and defensible operational controls.

18

API Ingestion Patterns

13 min

Design robust API ingestion pipelines in Java with cursor pagination, rate-limit control, retry budgets, incremental sync, deletion handling, and freshness SLOs.

19

Database Ingestion Patterns

16 min

Database ingestion patterns for production-grade Java data pipelines: full load, incremental load, high-watermark, snapshot isolation, chunking, consistency, delete handling, and operational safety.
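As an illustration of a high-watermark incremental load, the sketch below assumes each row has a unique `id` and an `updatedAt` timestamp, and uses the composite `(updatedAt, id)` as the watermark so rows sharing a timestamp are not skipped at chunk boundaries. The in-memory list stands in for a database table; all names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical incremental extractor over an in-memory "table".
class HighWatermarkExtractor {
    record Row(long id, long updatedAt, String payload) {}
    record Watermark(long updatedAt, long id) {}

    static final Comparator<Row> ORDER =
            Comparator.comparingLong(Row::updatedAt).thenComparingLong(Row::id);

    // Similar in spirit to, on databases with row-value comparison:
    //   SELECT ... WHERE (updated_at, id) > (?, ?) ORDER BY updated_at, id LIMIT ?
    static List<Row> nextChunk(List<Row> table, Watermark wm, int limit) {
        return table.stream()
                .filter(r -> r.updatedAt() > wm.updatedAt()
                        || (r.updatedAt() == wm.updatedAt() && r.id() > wm.id()))
                .sorted(ORDER)
                .limit(limit)
                .toList();
    }

    // Advance to the last row of the chunk; an empty chunk keeps the watermark.
    static Watermark advance(Watermark wm, List<Row> chunk) {
        if (chunk.isEmpty()) return wm;
        Row last = chunk.get(chunk.size() - 1);
        return new Watermark(last.updatedAt(), last.id());
    }
}
```

Timestamp-based watermarks can still miss rows whose transaction commits after a later `updatedAt` has already been read; that gap is one of the motivations for CDC.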

20

CDC Ingestion Mental Model

18 min

Change Data Capture ingestion mental model for Java data pipelines: transaction logs, snapshots, WAL/binlog/redo, ordering, transaction boundaries, offsets, deletes, schema changes, and CDC failure modes.

21

Debezium CDC in Java Systems

20 min

Debezium CDC in Java systems: connector topology, Kafka Connect runtime, envelopes, offsets, schema history, snapshots, heartbeats, transaction metadata, sink integration, and production failure handling.

22

Outbox Pattern for Pipelines

17 min

Transactional outbox pattern for Java data pipelines: dual-write failure, event table design, aggregate ordering, Debezium outbox routing, relay alternatives, idempotent consumers, cleanup, replay, and production operations.

23

Inbox Dedupe and Consumer State

15 min

Inbox pattern, dedupe tables, consumer state, offset management, replay-safe command handling, and transactional event consumption for Java data pipelines.

24

Schema-on-Read vs Schema-on-Write

15 min

Schema-on-read versus schema-on-write for Java data pipelines: ingestion validation, raw zones, canonical models, compatibility, drift handling, and governance trade-offs.

25

Data Contracts for Pipelines

15 min

Pipeline data contracts as explicit producer promises, consumer assumptions, runtime enforcement points, and operational governance boundaries in Java data pipeline systems.

26

Schema Evolution Rules

14 min

Schema evolution rules for Java data pipelines, covering backward compatibility, forward compatibility, full compatibility, transitive modes, rollout sequencing, and safe schema changes across Avro, Protobuf, JSON Schema, Kafka, batch, streaming, and lakehouse systems.

27

Avro, Protobuf, JSON Schema in Pipelines

15 min

Choosing and applying Avro, Protobuf, and JSON Schema in production-grade Java pipelines, with a focus on schema evolution, registries, encoding, debugging, compatibility, and operational limits.

28

Canonical Event Modeling

13 min

Designing canonical events for production-grade Java data pipelines: facts, state changes, commands, snapshots, corrections, identity, temporal semantics, versioning, auditability, and replay safety.

29

Event Time and Business Time

17 min

Event time, processing time, ingestion time, source commit time, and business effective time as explicit correctness contracts in Java data pipelines.

30

Data Quality Contracts

16 min

Data quality contracts for Java data pipelines: nullability, range, uniqueness, referential validity, drift, enforcement policy, quarantine, metrics, and runtime validation.

31

Versioned Transformations

17 min

Versioned transformations for Java data pipelines: reproducibility, semantic change, manifest design, dual-running, migration, replay, state migration, sunset, and safe rollout.

32

Contract Testing for Pipelines

16 min

Contract testing for Java data pipelines: producer contracts, consumer assumptions, schema compatibility, semantic examples, golden datasets, replay tests, backfill tests, CDC/outbox tests, and CI gates.

33

Kafka as Pipeline Log

20 min

Kafka as a pipeline log: topic, partition, offset, replay, retention, compaction, consumer group, ordering boundary, and Java implementation mental model.

34

Topic Design for Data Pipelines

18 min

Kafka topic design for Java data pipelines: domain boundary, partition key, compaction, retention, topic taxonomy, tenancy, lifecycle, security, DLQ, backfill, and production review.

35

Producer Patterns Java

16 min

Java Kafka producer patterns for production-grade data pipelines: batching, compression, idempotence, transactions, headers, partitioning, backpressure, observability, and safe publishing boundaries.

36

Consumer Patterns Java

13 min

Java Kafka consumer patterns for production-grade data pipelines: poll loop, offset commit, rebalance, pause/resume, partition concurrency, idempotent processing, retry, DLQ, observability, and replay safety.

37

Kafka Streams Topology Design

17 min

Kafka Streams topology design for production-grade Java data pipelines: KStream, KTable, GlobalKTable, processor topology, repartitioning, state stores, changelog topics, task model, scaling, failure boundaries, and operational review.

38

Stream Table Join Patterns

16 min

Stream-table join patterns in Java data pipelines using Kafka Streams: KStream-KTable, KStream-GlobalKTable, KTable-KTable, temporal semantics, enrichment correctness, repartitioning, late data, table freshness, and operational design.

39

Log Compaction and Materialized Views

17 min

Log compaction and materialized view patterns in Kafka-centric Java data pipelines, including latest-state topics, tombstones, rebuilds, CDC projections, state restore, and operational failure modes.

40

Kafka Exactly-Once Boundaries

15 min

Kafka exactly-once semantics boundaries for Java data pipelines, including idempotent producers, transactions, consume-transform-produce loops, Kafka Streams guarantees, external side effects, and effectively-once design.

41

Stateful Stream Processing Model

16 min

A production mental model for stateful stream processing: operator state, keyed state, timers, snapshots, watermarks, recovery, and correctness boundaries.

42

Flink Java DataStream Foundation

12 min

Flink Java DataStream foundations for building production-grade stateful pipelines: sources, operators, keyBy, managed state, timers, sinks, parallelism, checkpoints, and deployment boundaries.

43

Flink Checkpointing, Savepoints, and Stateful Recovery

19 min

Flink checkpointing, savepoints, restart strategy, state backend, recovery semantics, upgrade workflow, and production-grade operational discipline for Java streaming pipelines.

44

Watermarks, Late Events, and Event-Time Correctness

16 min

Event-time correctness in Flink with watermarks, late events, allowed lateness, side outputs, temporal disorder, replay behavior, and production-grade Java implementation patterns.
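To show the core late-event decision, here is a simplified, hypothetical watermark tracker with bounded out-of-orderness: the watermark trails the highest event time seen by a fixed delay, and anything older than the watermark is flagged as late. It is not Flink's API; Flink's built-in watermark strategies differ in details such as exact boundary handling.

```java
// Hypothetical event-time watermark tracker with bounded out-of-orderness.
final class WatermarkTracker {
    private final long maxDisorderMillis;
    private long maxEventTime = Long.MIN_VALUE;

    WatermarkTracker(long maxDisorderMillis) {
        this.maxDisorderMillis = maxDisorderMillis;
    }

    // Events older than this timestamp are no longer expected.
    long currentWatermark() {
        if (maxEventTime == Long.MIN_VALUE) {
            return Long.MIN_VALUE; // nothing seen yet, so nothing can be late
        }
        return maxEventTime - maxDisorderMillis;
    }

    // Returns true if the event is behind the watermark. A pipeline would route
    // such events to a side output instead of silently dropping them.
    boolean observe(long eventTimeMillis) {
        boolean late = eventTimeMillis < currentWatermark();
        maxEventTime = Math.max(maxEventTime, eventTimeMillis);
        return late;
    }
}
```

Out-of-order events within the disorder bound are accepted; only events that fall behind the watermark are treated as late.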

45

Windowing Patterns

16 min

Windowing patterns for production Java data pipelines: tumbling, sliding, session, global, custom windows, triggers, lateness, state cost, and correctness boundaries.
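Tumbling windows are the simplest case to sketch: each event time maps to exactly one window `[start, start + size)`. The helper below is hypothetical and only assigns windows and counts events; triggers, allowed lateness, and state cleanup, which a real engine manages, are left out.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical tumbling-window helper based on event time.
final class TumblingWindows {
    private TumblingWindows() {}

    // floorMod keeps the assignment correct for negative timestamps too.
    static long windowStart(long eventTimeMillis, long sizeMillis) {
        return eventTimeMillis - Math.floorMod(eventTimeMillis, sizeMillis);
    }

    // Counts events per window start, sorted by window for readability.
    static Map<Long, Integer> countPerWindow(List<Long> eventTimes, long sizeMillis) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long t : eventTimes) {
            counts.merge(windowStart(t, sizeMillis), 1, Integer::sum);
        }
        return counts;
    }
}
```

Because windows never overlap, the per-window state is one counter per key, which is why tumbling windows are usually the cheapest option in state cost.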