Control Plane vs Data Plane

Learn Java Data Pipeline Pattern - Part 064

Control plane vs data plane architecture for internal Java data pipeline platforms: pipeline definitions, run state, scheduling, policy enforcement, lineage, execution workers, asset registry, self-service, governance, and production operating model.

2026-07-04 · 16 min read · 3145 words

#java #data-pipeline #platform-engineering #control-plane

Part 064 — Control Plane vs Data Plane

A pipeline platform fails when every pipeline owns its own operational brain.

One job implements retries this way. Another stores checkpoints differently. A third has no run manifest. A fourth writes lineage only when it succeeds. A fifth sends Slack alerts from transform code. A sixth has manual backfill scripts known by one engineer.

This is not a platform.

It is a pile of jobs.

The architectural move is to separate:

Control plane = decides, records, governs, schedules, observes.
Data plane = reads, transforms, writes, and moves data.

This separation is not cosmetic.

It determines whether your organization can scale from ten pipelines to hundreds without losing correctness, ownership, auditability, and operational control.

1. The Mental Model

Think of a pipeline platform as two cooperating systems.

The control plane does not process every record.

The data plane does not decide global policy.

That is the boundary.

2. Why the Separation Exists

Without separation, business logic, operational policy, and execution mechanics become tangled.

Example bad Java job:

public static void main(String[] args) {
    readConfig();
    checkIfHoliday();
    queryDatabase();
    transformRows();
    validateOutput();
    writeParquet();
    updateRunStatus();
    sendSlackAlert();
    updateLineage();
    triggerDownstreamJobs();
}

This job knows too much.

It owns:

scheduling policy
business calendar
source integration
transformation
validation
output publishing
run state
notification
lineage
downstream triggering

That makes every job a platform clone.

A better split:

Control plane:
  - create run
  - enforce policy
  - pass run context
  - monitor state
  - publish lineage/asset state
  - trigger downstream

Data plane:
  - read assigned input
  - transform deterministically
  - write output through declared boundary
  - report progress and evidence

The job becomes smaller and safer.

3. Control Plane Responsibilities

The control plane owns metadata and decisions.

Core responsibilities:

Responsibility	Meaning
Pipeline registry	What pipelines exist and who owns them
Version registry	Which code/config/schema version is active
Run creation	When a run exists and why
Scheduling/triggering	Cron, asset events, manual, SLA, webhook
Policy enforcement	Access, data classification, approval, quality gate
Run state	Planned/running/succeeded/failed/cancelled/superseded
Asset state	Fresh/stale/invalid/partial/degraded
Lineage	Inputs, outputs, run, code, schema, transform version
Backfill control	What historical range is allowed and isolated
Retry/rerun control	What can be retried safely
Observability	Metrics, alerts, logs correlation, dashboard
Governance	Ownership, retention, sensitive-data rules, audit evidence
Self-service	Developer-facing API/UI/scaffold

The control plane is the operational memory of the platform.

If the control plane does not know it happened, it is not operationally real.

4. Data Plane Responsibilities

The data plane owns execution.

Core responsibilities:

Responsibility	Meaning
Source read	Pull or receive input records/assets
Transform	Apply deterministic or declared stateful logic
Sink write	Write output/effects using boundary contracts
Local resource management	CPU, memory, parallelism, batching, backpressure
Worker lifecycle	startup, graceful shutdown, heartbeat
Checkpointing	engine-specific progress and recoverability
Runtime metrics	records, bytes, lag, error, latency
Evidence emission	run facts, quality facts, output facts
Failure classification	retryable/non-retryable/unknown outcome

The data plane must be allowed to move fast.

But it should not be allowed to invent platform semantics.

5. What Belongs Where

A useful rule:

If the decision affects multiple pipelines or consumers, it belongs in the control plane.
If the decision is about processing records inside one run, it belongs in the data plane.

Examples:

Concern	Plane
Transforming `CaseUpdated` into `CaseSnapshot`	Data plane
Deciding whether `CaseSnapshot` is publishable	Control plane + quality policy
Reading Kafka partition offsets	Data plane
Recording which offsets were processed by a run	Control plane metadata
Flink keyed state	Data plane
Asset freshness status	Control plane
Spark executor sizing	Data plane runtime/config
Maximum backfill range allowed for PII asset	Control plane policy
DLQ item serialization	Data plane
DLQ replay approval	Control plane
Retry transient API timeout	Data plane/boundary policy
Retry failed regulatory report publication	Control plane decision

6. Pipeline Definition Model

A pipeline definition should be declarative enough for the control plane to reason about it.

Example:

pipeline: case-daily-snapshot
owner: enforcement-data-platform
runtime: java-batch
entrypoint: com.acme.pipeline.case.snapshot.CaseDailySnapshotJob
version: 9.4.1
inputs:
  - asset: silver.case
    freshness_required: PT2H
    mode: read_snapshot
  - asset: silver.case_assignment
    freshness_required: PT2H
    mode: read_snapshot
outputs:
  - asset: gold.case_daily_snapshot
    mode: replace_partition
    partition_key: business_date
contracts:
  output_schema: case_daily_snapshot@3.2.0
  quality_contract: case_daily_snapshot_quality@2.1.0
schedule:
  type: cron
  expression: "0 2 * * *"
  timezone: Asia/Singapore
policies:
  data_classification: restricted
  requires_quality_gate: true
  max_backfill_days_without_approval: 31
  publish_requires_reconciliation: true
runtime_config:
  memory: 4Gi
  max_parallelism: 8
  timeout: PT2H

This definition is not only for documentation.

The control plane can use it to:

create runs
check dependencies
enforce access
validate schema compatibility
determine rerun scope
create lineage edges
publish asset state
display ownership
constrain backfill

7. Run Model

A run is a first-class entity.

Do not rely only on logs.

public record PipelineRun(
    String runId,
    String pipelineId,
    String pipelineVersion,
    RunReason reason,
    RunStatus status,
    Instant plannedAt,
    Optional<Instant> startedAt,
    Optional<Instant> finishedAt,
    Map<String, String> parameters,
    List<InputAssetRef> inputs,
    List<OutputAssetRef> expectedOutputs,
    Optional<String> parentRunId,
    Optional<String> approvalId
) {}

Run status should be explicit.

Do not collapse Succeeded and Published.

A data plane job may finish successfully while the quality gate or publish gate fails.
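
One way to make run states explicit is an enum that also carries its allowed transitions. The following is a minimal sketch, not a definitive model: the state names are collected from those mentioned across this lesson (including READY and PUBLISHED), and the transition table itself is an assumption for illustration.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical run states; the transition table is an illustrative assumption. */
enum RunStatus {
    PLANNED, READY, RUNNING, SUCCEEDED, PUBLISHED, FAILED, CANCELLED, SUPERSEDED;

    /** States that may legally follow this one. */
    Set<RunStatus> next() {
        return switch (this) {
            case PLANNED -> EnumSet.of(READY, CANCELLED);
            case READY -> EnumSet.of(RUNNING, CANCELLED);
            case RUNNING -> EnumSet.of(SUCCEEDED, FAILED, CANCELLED);
            // SUCCEEDED only means the job finished; publication is a separate step.
            // A run blocked at the quality or publish gate simply stays SUCCEEDED.
            case SUCCEEDED -> EnumSet.of(PUBLISHED);
            case PUBLISHED -> EnumSet.of(SUPERSEDED);
            // Terminal states have no successors.
            case FAILED, CANCELLED, SUPERSEDED -> EnumSet.noneOf(RunStatus.class);
        };
    }

    boolean canTransitionTo(RunStatus target) {
        return next().contains(target);
    }
}
```

Keeping SUCCEEDED and PUBLISHED as separate states lets the control plane show a run whose job finished but whose output was never released.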

8. Run Context Passed to Data Plane

Every job should receive a run context.

{
  "runId": "run-20260704-000183",
  "pipelineId": "case-daily-snapshot",
  "pipelineVersion": "9.4.1",
  "attempt": 1,
  "reason": "SCHEDULED",
  "dataIntervalStart": "2026-07-03T00:00:00+08:00",
  "dataIntervalEnd": "2026-07-04T00:00:00+08:00",
  "inputAssets": [
    {"asset": "silver.case", "version": "snapshot-9012"}
  ],
  "outputAssets": [
    {"asset": "gold.case_daily_snapshot", "partition": "business_date=2026-07-03"}
  ],
  "policy": {
    "qualityGate": "blocking",
    "piiLoggingAllowed": false
  }
}

The job should not infer this from wall-clock time.

Use the control plane's assigned data interval and input asset versions.

That makes reruns deterministic.
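
As a rough illustration, a job can derive its target partition purely from the assigned interval. The `DataIntervals` helper below is hypothetical and assumes the interval start arrives as an ISO-8601 offset timestamp, as in the run context above.

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;

/** Hypothetical helper: derives the logical date from the assigned data interval. */
final class DataIntervals {
    private DataIntervals() {}

    /** The business date is the local date of the interval start, in the interval's own offset. */
    static LocalDate businessDate(String dataIntervalStart) {
        return OffsetDateTime.parse(dataIntervalStart).toLocalDate();
    }

    /** Partition spec in the business_date=YYYY-MM-DD form used in this lesson. */
    static String partitionFor(String dataIntervalStart) {
        return "business_date=" + businessDate(dataIntervalStart);
    }
}
```

Because nothing here reads the system clock, a rerun next week with the same run context targets the same partition, whereas a job that computes `LocalDate.now().minusDays(1)` would not.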

9. Asset Registry

The asset registry is the control plane's model of outputs.

An asset is not just a table name.

public record DataAsset(
    String assetId,
    String name,
    AssetKind kind,
    String owner,
    DataClassification classification,
    SchemaRef schema,
    QualityContractRef qualityContract,
    RetentionPolicy retentionPolicy,
    List<AssetDependency> dependencies,
    AssetFreshnessPolicy freshnessPolicy
) {}

Asset state:

public enum AssetState {
    UNKNOWN,
    MATERIALIZING,
    FRESH,
    STALE,
    INVALID,
    PARTIAL,
    DEGRADED,
    DEPRECATED
}

The registry answers:

Who owns this asset?
What schema does it promise?
Is it fresh enough?
Which run produced the current version?
Which inputs created it?
What downstream assets depend on it?
Is it allowed to contain PII?
Is it publishable?
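
The "is it fresh enough?" question can be answered from recorded materializations rather than from job success. This is a minimal sketch under stated assumptions: `FreshnessCheck` and its three-value result are hypothetical, and the maximum age is assumed to be an ISO-8601 duration such as the `PT2H` used in the pipeline definition.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

/** Hypothetical freshness evaluation based on the latest recorded materialization. */
final class FreshnessCheck {
    enum Freshness { UNKNOWN, FRESH, STALE }

    private FreshnessCheck() {}

    /**
     * @param lastMaterializedAt time of the latest recorded materialization, if any
     * @param maxAge             allowed age, for example Duration.parse("PT2H")
     * @param now                evaluation time, passed in so the check is testable
     */
    static Freshness evaluate(Optional<Instant> lastMaterializedAt, Duration maxAge, Instant now) {
        if (lastMaterializedAt.isEmpty()) {
            return Freshness.UNKNOWN; // the control plane has no evidence at all
        }
        Duration age = Duration.between(lastMaterializedAt.get(), now);
        return age.compareTo(maxAge) <= 0 ? Freshness.FRESH : Freshness.STALE;
    }
}
```

Passing `now` explicitly keeps the evaluation deterministic in tests and in audits that replay past decisions.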

10. Data Plane Execution Modes

The platform should support multiple data plane modes.

Mode	Good For	Control Plane Role
Java batch worker	bounded jobs, custom integration	create run, dispatch command, capture output
Java streaming service	continuous Kafka consumers	deploy config, monitor lag, manage version rollout
Kafka Streams app	Kafka-native stream/table processing	register topology, monitor state, manage reset/replay
Flink job	stateful event-time stream processing	submit job, manage savepoint, observe checkpoint/state
Spark job	large batch/micro-batch processing	submit job, track application, validate outputs
Airflow task	workflow-level task execution	orchestrate tasks, pass run context
Temporal worker	durable workflow and side-effect control	start workflow, observe workflow history

A mature platform does not force one runtime for everything.

It standardizes the contracts around all runtimes.

11. Control Plane API

A pipeline control plane usually needs APIs like:

POST /pipelines
GET  /pipelines/{pipelineId}
POST /pipelines/{pipelineId}/runs
GET  /runs/{runId}
POST /runs/{runId}/cancel
POST /runs/{runId}/retry
POST /runs/{runId}/approve
GET  /assets/{assetId}
GET  /assets/{assetId}/lineage
POST /assets/{assetId}/materializations
POST /quality-results
POST /lineage-events

Internal Java model:

public interface PipelineControlPlane {
    PipelineRun createRun(CreateRunCommand command);
    void markRunStarted(String runId, RunStartedEvidence evidence);
    void reportProgress(String runId, RunProgress progress);
    void publishQualityResult(String runId, QualityResult result);
    void publishMaterialization(String runId, AssetMaterialization materialization);
    void markRunFinished(String runId, RunOutcome outcome);
}

The API should be boring and stable.

The platform should not require every pipeline to invent a new metadata protocol.

12. Worker Contract

Every data plane worker should implement a consistent contract.

Input:

run context
pipeline version
input asset references
output target references
policy bundle
secrets references
resource limits

Output:

status
metrics
materialization evidence
quality evidence
lineage evidence
error classification
checkpoint/effect references

Worker result example:

{
  "runId": "run-20260704-000183",
  "status": "SUCCEEDED",
  "outputs": [
    {
      "asset": "gold.case_daily_snapshot",
      "partition": "business_date=2026-07-03",
      "recordCount": 188392,
      "schemaVersion": "case_daily_snapshot@3.2.0",
      "storageRef": "iceberg://warehouse/gold/case_daily_snapshot@snapshot-9182"
    }
  ],
  "quality": {
    "status": "PASSED",
    "checksPassed": 41,
    "checksFailed": 0
  },
  "metrics": {
    "durationMs": 482000,
    "recordsRead": 190104,
    "recordsWritten": 188392
  }
}

This result is more valuable than a log line saying "job complete".

13. Policy Engine

The control plane needs policy enforcement.

Policy examples:

policy: restricted-asset-publish
rules:
  - output.classification in [restricted, confidential] requires quality_gate == passed
  - output.classification == restricted requires lineage_complete == true
  - backfill.days > 31 requires approval
  - pii_logging_allowed must be false
  - source.owner must approve schema breaking change
  - consumer_count > 0 requires deprecation_notice before deletion

Policy belongs in the control plane because it affects more than one job.

The data plane can enforce local policy checks, but the canonical decision should be centralized.
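
A centralized decision can start as plain code before any rule engine is introduced. The sketch below hard-codes three of the rules listed above; the `PublishPolicy` class, the `PublishRequest` record, and the message strings are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical evaluation of a few publish rules for classified assets. */
final class PublishPolicy {
    /** Facts the control plane is assumed to know about a publish request. */
    record PublishRequest(String classification, boolean qualityGatePassed,
                          boolean lineageComplete, boolean piiLoggingAllowed) {}

    private PublishPolicy() {}

    /** Returns the violated rules; an empty list means the publish is allowed. */
    static List<String> violations(PublishRequest r) {
        List<String> out = new ArrayList<>();
        boolean sensitive = r.classification().equals("restricted")
                || r.classification().equals("confidential");
        if (sensitive && !r.qualityGatePassed()) {
            out.add("quality_gate must be passed");
        }
        if (r.classification().equals("restricted") && !r.lineageComplete()) {
            out.add("lineage must be complete");
        }
        if (r.piiLoggingAllowed()) {
            out.add("pii_logging_allowed must be false");
        }
        return out;
    }
}
```

Returning every violation, rather than failing on the first, gives the control plane a complete record of why a publish was blocked.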

14. Quality Gate as Control Plane Decision

Quality checks often run in the data plane.

But the publish decision should be visible in the control plane.

This allows consistent behavior:

fail publish on critical quality breach
allow degraded publish for low-severity warning
require approval for override
record why output was blocked or allowed
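
The behaviors above can be sketched as a small decision function. The severity levels, the decision names, and the choice to treat an approved override as a degraded publish are assumptions made for this example, not rules stated by the platform.

```java
/** Hypothetical publish decision derived from the worst quality breach of a run. */
final class QualityGate {
    enum Severity { NONE, WARNING, CRITICAL }
    enum Decision { PUBLISH, PUBLISH_DEGRADED, BLOCK }

    /** Decision plus the reason to record, so the outcome stays auditable. */
    record GateResult(Decision decision, String reason) {}

    private QualityGate() {}

    static GateResult decide(Severity worstBreach, boolean overrideApproved) {
        return switch (worstBreach) {
            case NONE -> new GateResult(Decision.PUBLISH, "all checks passed");
            case WARNING -> new GateResult(Decision.PUBLISH_DEGRADED, "low-severity warning");
            // A critical breach blocks publication unless an override was approved.
            case CRITICAL -> overrideApproved
                    ? new GateResult(Decision.PUBLISH_DEGRADED, "critical breach, override approved")
                    : new GateResult(Decision.BLOCK, "critical breach, no approved override");
        };
    }
}
```

The data plane still runs the checks; only the mapping from check results to a publish decision lives centrally.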

15. Lineage and Evidence

Lineage should be emitted by the platform, not manually reconstructed from code.

Minimum lineage event:

public record LineageEvent(
    String runId,
    String jobName,
    String jobVersion,
    List<DatasetRef> inputs,
    List<DatasetRef> outputs,
    Instant eventTime,
    Map<String, String> facets
) {}

Useful facets:

schema version
transform version
source offsets
source snapshots
row counts
data quality summary
code git SHA
container image digest
environment
owner
classification

For regulated systems, lineage is evidence.

It answers:

Why did this report say what it said on that day?

16. Scheduling in the Control Plane

The scheduler should create runs.

It should not directly mutate business data.

Run creation reasons:

public enum RunReason {
    SCHEDULED,
    ASSET_TRIGGERED,
    MANUAL,
    BACKFILL,
    RETRY,
    REPROCESSING,
    CORRECTION,
    DEPENDENCY_REFRESH,
    SLA_RECOVERY
}

This matters because different reasons imply different policies.

A manual backfill over two years of restricted data may require approval.

A retry of a transient API timeout may not.

A correction run may need to supersede prior output.

A scheduled run may be blocked by upstream freshness.

17. Backfill Control Plane

Backfill is where platform governance becomes necessary.

A backfill request should include:

public record BackfillRequest(
    String pipelineId,
    LocalDate startDate,
    LocalDate endDate,
    String requestedBy,
    String reason,
    BackfillMode mode,
    boolean isolateOutputs,
    Optional<String> approvalId
) {}

Backfill policy questions:

Is the date range allowed?
Are inputs still retained?
Are schemas compatible historically?
Will output overwrite current data?
Should output go to isolated namespace?
Are downstream consumers auto-triggered?
Is approval required?
What cost limit applies?

The data plane executes chunks.

The control plane owns the backfill campaign.
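
A rough sketch of that division of labor: the control plane checks the approval rule and splits the campaign into chunks, and each chunk becomes a run for the data plane. `BackfillPlanner` is a hypothetical name, and counting the date range inclusively is an assumption about how the day limit is measured.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical planner for a backfill campaign. */
final class BackfillPlanner {
    /** One unit of work handed to the data plane; endDate is inclusive. */
    record Chunk(LocalDate startDate, LocalDate endDate) {}

    private BackfillPlanner() {}

    /** Number of days covered by an inclusive date range. */
    static long days(LocalDate start, LocalDate end) {
        return ChronoUnit.DAYS.between(start, end) + 1;
    }

    /** Ranges within the limit need no approval; longer ranges need an approval id. */
    static boolean approvalSatisfied(LocalDate start, LocalDate end,
                                     int maxDaysWithoutApproval, Optional<String> approvalId) {
        return days(start, end) <= maxDaysWithoutApproval || approvalId.isPresent();
    }

    /** Splits the range into consecutive chunks of at most chunkDays days. */
    static List<Chunk> chunks(LocalDate start, LocalDate end, int chunkDays) {
        if (chunkDays < 1 || end.isBefore(start)) {
            throw new IllegalArgumentException("invalid range or chunk size");
        }
        List<Chunk> out = new ArrayList<>();
        for (LocalDate s = start; !s.isAfter(end); s = s.plusDays(chunkDays)) {
            LocalDate e = s.plusDays(chunkDays - 1);
            out.add(new Chunk(s, e.isAfter(end) ? end : e)); // clamp the last chunk
        }
        return out;
    }
}
```

Tracking the chunks as children of one campaign lets operators pause, resume, or cancel the whole backfill rather than chasing individual runs.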

18. Retry and Rerun Control

Retry is not always safe.

The control plane should know the difference between:

Action	Meaning
Retry attempt	Same run, same inputs, same output target, transient failure
Rerun	New run for same logical interval/output
Reprocess	Recompute with possibly new code or contract
Backfill	Historical range run
Correction	Apply correction semantics and supersede prior output
Replay	Re-read event log from offsets/time

Each action has different safety rules.

Example:

Retry Java task after DB timeout: maybe safe.
Retry side-effect notification: safe only if idempotency/effect ledger exists.
Rerun replace-partition output: safe if staged publish is atomic.
Replay Kafka topic into search index: safe if index write is version-aware.
Reprocess regulatory report: requires supersession record.

The control plane should encode this.
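
One possible encoding, shown only as a sketch: each action maps to the boundary property that makes it safe, following the examples above. The `Boundary` record and its flags are hypothetical; a real platform would likely derive them from the pipeline definition and boundary contracts.

```java
/** Hypothetical safety rules per control-plane action. */
final class ActionSafety {
    enum Action { RETRY_ATTEMPT, RERUN, REPLAY, REPROCESS }

    /** Facts about the output boundary that the control plane is assumed to know. */
    record Boundary(boolean idempotentEffects, boolean atomicStagedPublish,
                    boolean versionAwareWrites, boolean supersessionRecorded) {}

    private ActionSafety() {}

    static boolean isSafe(Action action, Boundary b) {
        return switch (action) {
            case RETRY_ATTEMPT -> b.idempotentEffects();   // effect ledger or idempotency key
            case RERUN -> b.atomicStagedPublish();         // staged, atomic partition replace
            case REPLAY -> b.versionAwareWrites();         // sink ignores older versions
            case REPROCESS -> b.supersessionRecorded();    // prior output explicitly superseded
        };
    }
}
```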

19. Continuous Streaming Jobs

Batch jobs map naturally to runs.

Streaming jobs are different.

A streaming job may run for weeks.

Control plane concepts:

deployment version
job instance
checkpoint/savepoint
input lag
watermark lag
error rate
state size
current topology version
restart history
upgrade history

A streaming job can still emit logical materialization events:

case_breach_alert_stream processed up to event_time=2026-07-04T10:00:00Z
silver.case_current updated through source_lsn=18499200

Do not force streaming into fake daily run semantics.

Use:

job lifecycle state + periodic progress/materialization state

20. Versioning Across Planes

A pipeline output depends on many versions:

pipeline definition version
code version
transform version
schema version
quality contract version
input asset version
runtime image version
config version
reference data version
platform policy version

The control plane should record them.

Example materialization:

{
  "asset": "gold.case_daily_snapshot",
  "assetVersion": "snapshot-9182",
  "runId": "run-20260704-000183",
  "codeVersion": "git:1a2b3c4",
  "imageDigest": "sha256:...",
  "schemaVersion": "case_daily_snapshot@3.2.0",
  "qualityContractVersion": "2.1.0",
  "transformVersion": "9.4.1",
  "inputVersions": [
    {"asset": "silver.case", "version": "snapshot-9012"}
  ]
}

This enables reproducibility.

21. Platform Database Model

A minimal control plane can start with tables like:

CREATE TABLE pipeline_definition (
  pipeline_id       text PRIMARY KEY,
  name              text NOT NULL,
  owner             text NOT NULL,
  runtime           text NOT NULL,
  active_version    text NOT NULL,
  definition_json   jsonb NOT NULL,
  created_at        timestamptz NOT NULL,
  updated_at        timestamptz NOT NULL
);

CREATE TABLE pipeline_run (
  run_id            text PRIMARY KEY,
  pipeline_id       text NOT NULL,
  pipeline_version  text NOT NULL,
  reason            text NOT NULL,
  status            text NOT NULL,
  data_interval_start timestamptz,
  data_interval_end   timestamptz,
  parameters_json   jsonb NOT NULL,
  created_at        timestamptz NOT NULL,
  started_at        timestamptz,
  finished_at       timestamptz,
  parent_run_id     text,
  approval_id       text
);

CREATE TABLE asset_materialization (
  materialization_id text PRIMARY KEY,
  asset_id           text NOT NULL,
  asset_version      text NOT NULL,
  run_id             text NOT NULL,
  status             text NOT NULL,
  storage_ref        text,
  schema_version     text,
  record_count       bigint,
  data_interval_start timestamptz,
  data_interval_end   timestamptz,
  materialized_at    timestamptz NOT NULL,
  metadata_json      jsonb NOT NULL
);

Add more only when needed.

The important point is not the exact schema.

The important point is that control-plane facts are persisted as first-class state.

22. Control Plane State Transitions

Use compare-and-swap style updates for run state.

Bad:

UPDATE pipeline_run SET status = 'SUCCEEDED' WHERE run_id = ?;

Better:

UPDATE pipeline_run
SET status = 'SUCCEEDED', finished_at = now()
WHERE run_id = ?
  AND status = 'RUNNING';

If no row is updated, the run is no longer RUNNING: your worker is stale, or the run was cancelled or already finished.

This prevents accidental transitions like:

CANCELLED -> SUCCEEDED
FAILED -> RUNNING
PUBLISHED -> FAILED

Model the state machine explicitly.
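
The same compare-and-swap idea can be shown in memory. This sketch uses `ConcurrentMap.replace(key, oldValue, newValue)`, which swaps the value only when the current value equals the expected one; `InMemoryRunStore` is a hypothetical stand-in for the `pipeline_run` table, useful mainly in tests.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory run store with guarded status transitions. */
final class InMemoryRunStore {
    private final ConcurrentMap<String, String> statusByRunId = new ConcurrentHashMap<>();

    /** Registers a run; an existing run keeps its current status. */
    void create(String runId, String initialStatus) {
        statusByRunId.putIfAbsent(runId, initialStatus);
    }

    /**
     * Moves the run from expected to next only if it is still in expected.
     * Returns false when the run is missing or another actor changed it first.
     */
    boolean transition(String runId, String expected, String next) {
        return statusByRunId.replace(runId, expected, next);
    }

    String status(String runId) {
        return statusByRunId.get(runId);
    }
}
```

A `false` return plays the role of "no row updated" in the SQL version: the caller should stop and treat itself as stale instead of forcing the write.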

23. Dispatch Model

The control plane can dispatch work in several ways:

Dispatch	Good For	Trade-Off
HTTP call to worker	simple services	timeout/availability coupling
Queue command	async workers	need command dedupe/lease
Kubernetes job	batch isolation	startup overhead
Airflow DAG/task	workflow ecosystem	Airflow state must be reconciled
Spark submit	large batch	external app tracking needed
Flink job submit	streaming/stateful	savepoint/upgrade lifecycle
Temporal workflow	durable process	deterministic workflow constraints

A dispatch command should be idempotent.

public record DispatchCommand(
    String commandId,
    String runId,
    String pipelineId,
    String runtime,
    String workerRef,
    RunContext runContext
) {}

If the control plane retries dispatch after timeout, it must not start duplicate unsafe runs.
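
A minimal way to get that property is to deduplicate on `commandId` before starting anything. The sketch below keeps the dedupe table in memory and uses a counter as a stand-in for launching a worker; both are assumptions for illustration, and a real platform would persist the command ids.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical dispatcher that starts work at most once per command id. */
final class IdempotentDispatcher {
    private final ConcurrentMap<String, String> runIdByCommandId = new ConcurrentHashMap<>();
    private final AtomicInteger started = new AtomicInteger();

    /** Returns true if work was started, false if this command was already seen. */
    boolean dispatch(String commandId, String runId) {
        if (runIdByCommandId.putIfAbsent(commandId, runId) != null) {
            return false; // duplicate delivery: do not start a second worker
        }
        started.incrementAndGet(); // stand-in for actually launching the worker
        return true;
    }

    int startedCount() {
        return started.get();
    }
}
```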

24. Heartbeats and Leases

For long-running workers, use heartbeat/lease.

public record WorkerHeartbeat(
    String runId,
    String workerId,
    Instant heartbeatAt,
    RunProgress progress,
    Map<String, String> runtimeStats
) {}

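Lease model (this assumes pipeline_run also has leased_by and lease_expires_at columns, which the minimal schema above does not include):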

UPDATE pipeline_run
SET leased_by = ?, lease_expires_at = now() + interval '2 minutes'
WHERE run_id = ?
  AND status IN ('READY', 'RUNNING')
  AND (lease_expires_at IS NULL OR lease_expires_at < now());

Leases prevent two workers from owning the same run.

But leases alone do not protect external side effects.

Use fencing tokens for write boundaries where needed.

25. Fencing Tokens

A fencing token prevents stale workers from committing after losing ownership.

Example:

public record RunLease(
    String runId,
    String workerId,
    long fencingToken,
    Instant expiresAt
) {}

External publish operation includes token:

UPDATE asset_publish_pointer
SET asset_version = ?, fencing_token = ?
WHERE asset_id = ?
  AND fencing_token < ?;

If a stale worker tries to publish with an old token, the update fails.

This matters in distributed systems because cancellation and restart are not instantaneous.
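
The token check can be sketched in memory as follows. `PublishPointer` is a hypothetical stand-in for the `asset_publish_pointer` row, and the sketch assumes tokens are issued as strictly increasing positive numbers each time a lease is granted.

```java
/** Hypothetical publish pointer that rejects writes carrying a stale fencing token. */
final class PublishPointer {
    private String assetVersion;   // null until the first accepted publish
    private long fencingToken;     // highest token accepted so far; starts at 0

    /** Accepts the publish only if the caller's token is newer than the last accepted one. */
    synchronized boolean publish(String newVersion, long token) {
        if (token <= fencingToken) {
            return false; // the caller lost ownership before committing
        }
        assetVersion = newVersion;
        fencingToken = token;
        return true;
    }

    synchronized String currentVersion() {
        return assetVersion;
    }
}
```

The check happens at the write boundary itself, so it still holds when a cancelled worker has not yet noticed that it was cancelled.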

26. Self-Service Platform API

The platform should make the right path easy.

A developer should be able to define:

source
transform
sink
contract
schedule
quality gate
ownership
data classification

without hand-writing all operational glue.

Self-service does not mean no governance.

It means governance is embedded.

Example developer flow:

pipeline init case-daily-snapshot --runtime java-batch
pipeline add-input silver.case
pipeline add-output gold.case_daily_snapshot --mode replace-partition
pipeline add-quality-contract case_daily_snapshot_quality.yaml
pipeline deploy --env staging
pipeline run --date 2026-07-03
pipeline promote --env prod

Behind the scenes, the control plane creates:

pipeline definition
CI checks
schema compatibility gate
deployment metadata
run configuration
lineage hooks
standard alerts
standard dashboards

27. Control Plane vs Frameworks

Do not confuse a framework with the control plane.

Airflow can be part of a control plane, but Airflow alone may not be your full platform model.

Flink has job management and checkpointing, but Flink is primarily data plane for stream processing.

Kafka has consumer group state, but Kafka is not an asset registry.

Temporal has durable workflow history, but Temporal does not automatically know data contracts or lineage.

Spark has application tracking, but Spark does not decide whether an asset is publishable.

The platform may compose them.

The control plane is the system that makes cross-runtime decisions.

28. Platform Evolution Stages

Most organizations do not build the final platform first.

A practical evolution:

Stage 1 — Standard Library

Provide Java libraries for:

run context
metrics
logging
retry policy
boundary adapters
quality result emission
lineage emission

Stage 2 — Run Registry

Persist:

pipeline definitions
run state
materialization facts
quality results
basic lineage

Stage 3 — Controlled Scheduling

Move from ad hoc cron to:

run creation
dependency checks
backfill tracking
retry/rerun semantics

Stage 4 — Policy Enforcement

Add:

data classification
approval workflow
schema gate
quality gate
publish gate

Stage 5 — Self-Service Platform

Add:

API/UI
templates
scaffolding
ownership model
cost controls
consumer registry
impact analysis

Do not start with a huge platform rewrite.

Start by making metadata and run semantics explicit.

29. Regulatory Enforcement Case Study

Consider a regulatory enforcement data platform.

Data sources:

case management database
assignment history
breach events
decision records
document metadata
external regulator submissions

Outputs:

case current state
SLA breach alert stream
daily enforcement report
investigator workload dashboard
audit evidence bundle
external submission status projection

Control plane responsibilities:

enforce restricted data policy
prevent report publish if quality checks fail
track lineage from operational case changes to report rows
require approval for correction backfills over closed reporting periods
record which run superseded prior report output
manage asset freshness for dashboards
block downstream gold assets when silver canonical case is invalid

Data plane responsibilities:

consume CDC events
run Flink breach detection
run Spark daily snapshot
write Iceberg tables
update search projection
emit quality metrics
emit lineage events

This is where the control-plane/data-plane separation pays off.

The organization can prove:

what ran
why it ran
which data it used
which version of code produced output
which quality gates passed
who approved exceptional actions
which downstream assets were affected

That is regulatory defensibility.

30. Anti-Patterns

Anti-Pattern 1: Every Pipeline Has Its Own Control Plane

Each job stores status differently, retries differently, and alerts differently.

This does not scale.

Anti-Pattern 2: Control Plane Processes Records

A central service tries to transform every record for every pipeline.

This becomes a bottleneck and a coupling point.

Keep high-volume processing in the data plane.

Anti-Pattern 3: Data Plane Decides Governance

A Spark job decides to ignore a quality failure and publish anyway.

That decision must be visible and governed.

Anti-Pattern 4: Run ID Not Propagated Everywhere

Without run id propagation, debugging becomes archaeology.

Every log, metric, lineage event, materialization, and quality result should carry run context where applicable.

Anti-Pattern 5: Wall-Clock Driven Jobs

A job computes “yesterday” internally.

Reruns become ambiguous.

The control plane should pass explicit data intervals.

Anti-Pattern 6: One Runtime to Rule Them All

Forcing every pipeline into one engine creates unnatural designs.

Use standard contracts across runtimes instead.

Anti-Pattern 7: Platform Metadata as Afterthought

Lineage and quality are emitted only on success or only from some jobs.

Then impact analysis is incomplete exactly when needed most.

Anti-Pattern 8: No Publish State

Job success is treated as asset freshness.

But a job can succeed while the publish fails.

Model materialization and publication explicitly.

31. Implementation Blueprint

A minimal internal platform can be implemented incrementally.

31.1 Java Library

Provide:

pipeline-runtime-core
pipeline-boundary-core
pipeline-quality-core
pipeline-lineage-client
pipeline-controlplane-client
pipeline-testkit

31.2 Control Plane Service

Build a Java service with:

pipeline definition API
run API
asset API
materialization API
quality result API
lineage sink integration
policy evaluation
backfill campaign API

31.3 Worker SDK

Every job uses:

public final class PipelineJobMain {
    public static void main(String[] args) {
        // The run context comes from the control plane, never from wall-clock time.
        RunContext context = RunContextLoader.fromArgs(args);
        PipelineControlClient control = PipelineControlClient.fromEnv();

        control.markStarted(context.runId());

        try {
            JobResult result = new CaseDailySnapshotJob().run(context);
            // Report evidence before the outcome so the control plane can gate publication.
            control.publishQualityResult(context.runId(), result.quality());
            control.publishMaterialization(context.runId(), result.materialization());
            control.markSucceeded(context.runId(), result.outcome());
        } catch (PipelineException ex) {
            // Classified failure: report it, then rethrow so the process fails visibly.
            control.markFailed(context.runId(), FailureReport.from(ex));
            throw ex;
        }
    }
}

31.4 Metadata First

Before building a UI, make sure the platform records:

run state
input/output assets
materialization
quality result
lineage event
failure classification

A small truthful platform is better than a large dashboard over unreliable metadata.

32. Production Checklist

Before calling something a pipeline platform, verify:

Control Plane

Pipeline definitions are versioned.
Runs are first-class records.
Run state transitions are guarded.
Data intervals are explicit.
Backfills are tracked as campaigns.
Asset materializations are recorded.
Asset freshness is computed from materializations.
Quality gate decisions are visible.
Policy decisions are auditable.
Ownership is explicit.
Sensitive-data classification is explicit.
Lineage is emitted consistently.

Data Plane

Jobs receive run context.
Jobs do not infer logical dates from wall-clock time.
Boundary adapters classify failures.
Sink writes are idempotent or reconciled.
Metrics include run id and boundary names.
Logs avoid sensitive payloads.
Workers support graceful shutdown.
Streaming jobs expose lag/watermark/state metrics.
Batch jobs produce materialization evidence.

Cross-Plane

Dispatch is idempotent.
Heartbeats exist for long-running work.
Fencing prevents stale publish.
Retry/rerun/reprocess/backfill are distinct.
Publish state is separate from job success.
Manual overrides are recorded.
Downstream impact can be queried.
Operators have runbooks.

33. Mental Model Summary

The control plane is the platform's memory and governance brain.

The data plane is the execution muscle.

Do not put all memory in logs. Do not put governance in transform code. Do not put record processing in the control service. Do not let every runtime invent its own semantics.

The invariant:

The data plane may vary by engine.
The control-plane contract must remain stable.

That is how a Java data pipeline platform grows without becoming operationally incoherent.

34. What Comes Next

Part 065 begins the reliability and operational excellence phase.

We will define pipeline SLOs in practical terms:

freshness
completeness
accuracy
availability
cost
recovery time
replayability
auditability

This is where pipeline engineering moves from implementation patterns to operating promises.
