Build CoreOrdered learning track

Contract Testing for Pipelines

Learn Java Data Pipeline Pattern - Part 032

Contract testing for Java data pipelines: producer contracts, consumer assumptions, schema compatibility, semantic examples, golden datasets, replay tests, backfill tests, CDC/outbox tests, and CI gates.

[2026-07-04]16 min read3191 words

In This Lesson

1. Why Pipeline Contract Testing Is Different 2. The Contract Testing Surface 3. Schema Compatibility Is Necessary but Not Sufficient

PrevNext

Lesson 3284 lesson track16–45 Build Core

#java#data-pipeline#contract-testing#schema-evolution+4 more

Part 032 — Contract Testing for Pipelines

A pipeline contract is useless if it is only written down.

It must be executable.

Contract testing answers:

Can this producer still publish what consumers expect?
Can this consumer still understand what producers publish?
Can this transformation still preserve its promised semantics?
Can this pipeline still replay, backfill, reject, quarantine, and publish safely?

This part focuses on contract testing for Java data pipelines.

Not generic unit testing. Not generic API testing. Not generic data quality dashboards.

The goal is to test the boundary promises that keep distributed data systems from silently corrupting downstream state.

1. Why Pipeline Contract Testing Is Different

A request/response service usually fails quickly when a contract breaks.

A data pipeline may keep running while producing wrong data for days.

Examples:

Producer removes a field but consumers silently default it.
CDC source changes a column type and downstream parser accepts it as string.
Transform changes risk score meaning without schema change.
Late-event policy changes and historical aggregates drift.
Backfill uses a different reference data snapshot than streaming.
Sink dedupe key changes and replay creates duplicates.
DLQ classification changes and poison records block the main lane.

The pipeline did not crash.

That is the problem.

Contract tests must catch semantic breakage before runtime.

2. The Contract Testing Surface

A production pipeline has multiple contract surfaces:

Each surface needs different tests.

Surface	What can break
Producer contract	Field removal, incompatible type, changed meaning, invalid examples
Consumer assumption	Consumer depends on undocumented fields or values
Transform contract	Wrong mapping, changed semantics, non-determinism
Quality contract	Invalid data accepted or valid data rejected
State contract	Migration failure, incompatible state, reset behavior
Backfill contract	Historical data unreadable, duplicate output, wrong reference data
Privacy contract	PII leakage, masking regression, retention violation
Operational contract	Retry, DLQ, checkpoint, idempotency behavior broken

A serious pipeline test suite covers more than schema compatibility.

3. Schema Compatibility Is Necessary but Not Sufficient

Schema compatibility checks answer:

Can old/new readers and writers parse each other?

They do not answer:

Does the value still mean the same thing?

Example compatible but semantically breaking change:

{
  "field": "riskScore",
  "type": "int",
  "description": "0-100 score"
}

Later:

{
  "field": "riskScore",
  "type": "int",
  "description": "0-1000 score"
}

Schema may still be compatible. Consumers may still parse it. But dashboards, thresholds, alerts, and reports may be wrong.

So contract testing has layers:

parse compatibility
+ semantic examples
+ consumer assumptions
+ quality rules
+ transformation golden outputs
+ replay/backfill behavior
+ operational failure behavior

4. Contract Test Pyramid for Pipelines

The lower layers are cheap and fast.

The upper layers are slower but closer to reality.

Do not skip lower layers because you have integration tests. Integration tests are too late and too expensive to explain every failure.

Do not rely only on lower layers because they cannot detect runtime semantics.

5. Static Contract Linting

Before compatibility, lint the contract itself.

Rules:

Every field has a description.
Every nullable field has a reason.
Every enum has unknown/future handling policy.
Every timestamp declares time semantics.
Every identifier declares uniqueness scope.
Every sensitive field declares classification.
Every output field declares owner.
Every breaking change has migration notes.
Every transform version has a manifest.
Every contract has examples.

Example lint failure:

Field caseStatus is enum but no UNKNOWN value or future-value policy is declared.

This is not bureaucracy. It prevents schema files from becoming false confidence.

6. Schema Compatibility Tests

For Kafka/Avro/Protobuf/JSON Schema pipelines, schema registry compatibility should run in CI before registration.

Test dimensions:

Test	Purpose
new producer vs old consumer	Can existing consumers keep reading?
old producer vs new consumer	Can new consumer read historical data?
transitive compatibility	Can new version handle all relevant older versions?
reserved field rule	Prevent field number/name reuse where applicable
default value check	Ensure added fields have safe defaults where required
enum evolution check	Ensure unknown value strategy exists

A Java build can call a schema registry compatibility endpoint, use a local compatibility library, or run a contract test harness against checked-in schema versions.

The important rule:

A schema must not be promoted only because it compiles.

7. Producer Contract Tests

A producer contract test proves that the producer emits valid examples.

It should not require full downstream infrastructure.

Example producer test shape:

class CaseEventProducerContractTest {

    @Test
    void producedCaseEscalatedEventMustMatchContract() {
        CaseEscalatedEvent event = new CaseEscalatedEvent(
                "CASE-123",
                "evt-001",
                Instant.parse("2026-07-04T10:00:00Z"),
                "HIGH",
                "OFFICER-9"
        );

        byte[] encoded = caseEventSerializer.serialize(event);
        ContractValidationResult result = contractValidator.validate(
                "enforcement.case-event",
                "2.4.0",
                encoded
        );

        assertTrue(result.valid(), result::message);
    }
}

This test should run in the producer repository.

It proves:

serializer works
schema is valid
required fields are present
example data satisfies semantic constraints
headers/envelope metadata are included

Producer contract tests should include negative examples too.

8. Producer Example Catalog

Each producer should publish examples.

Example layout:

contracts/
  enforcement.case-event/
    v2.4.0/
      schema.avsc
      examples/
        case-opened.valid.json
        case-escalated.valid.json
        case-closed.valid.json
        case-reopened.valid.json
        missing-case-id.invalid.json
        unknown-status.invalid.json
      semantics.md

Examples are not decorative.

They become test fixtures for consumers and transformations.

A contract without examples is under-specified.

9. Consumer Assumption Tests

Consumer-driven testing is powerful because consumers often depend on assumptions not captured in the provider schema.

For data pipelines, consumer assumptions may include:

field always present
enum subset only
timestamp monotonic per key
field range
no duplicate event ID
no out-of-order status transition
event type emitted after another event type
ID format
partition key stability
no PII in output

Example assumption file:

consumer: enforcement.sla-breach-detector
consumes: enforcement.case-event
version: 2.4.0
assumptions:
  - id: case-id-present
    rule: "caseId must be non-empty"
  - id: occurred-at-required
    rule: "occurredAt must be present and parseable as instant"
  - id: supported-statuses
    rule: "status must be one of OPEN, ASSIGNED, ESCALATED, SUSPENDED, CLOSED"
  - id: per-case-ordering
    rule: "events for same case must be ordered by source transaction position"

The producer pipeline should run these consumer assumptions before changing the contract.

This is similar in spirit to consumer-driven contract testing, but adapted to data/event semantics.

10. Consumer Test Harness

A consumer test harness runs producer examples against consumer assumptions.

public interface ConsumerAssumption<T> {
    String id();
    AssumptionResult check(T event);
}

public record AssumptionResult(boolean passed, String message) {
    public static AssumptionResult pass() {
        return new AssumptionResult(true, "ok");
    }

    public static AssumptionResult fail(String message) {
        return new AssumptionResult(false, message);
    }
}

Example:

public final class CaseIdPresentAssumption implements ConsumerAssumption<CaseEvent> {
    @Override
    public String id() {
        return "case-id-present";
    }

    @Override
    public AssumptionResult check(CaseEvent event) {
        if (event.caseId() == null || event.caseId().isBlank()) {
            return AssumptionResult.fail("caseId is required");
        }
        return AssumptionResult.pass();
    }
}

Run it against all valid producer examples.

If the producer adds a new event shape that violates an assumption, the break is visible before release.

11. Golden Dataset Transform Tests

A golden dataset test proves that a transformation version produces expected output for known input.

Input:

{"eventId":"evt-1","caseId":"CASE-1","status":"OPEN","occurredAt":"2026-07-04T09:00:00Z"}
{"eventId":"evt-2","caseId":"CASE-1","status":"ESCALATED","occurredAt":"2026-07-04T10:00:00Z"}

Expected output:

{"caseId":"CASE-1","riskBand":"HIGH","reason":"ESCALATED_CASE"}

Test:

@Test
void caseRiskTransformV3MatchesGoldenDataset() throws Exception {
    GoldenDataset dataset = GoldenDataset.load("enforcement.case-risk-score/v3.2.0");

    VersionedTransform<CaseEvent, OutputCommand> transform = registry.get(
            new TransformationId("enforcement.case-risk-score"),
            new TransformationVersion(3, 2, 0)
    );

    List<OutputCommand> actual = new ArrayList<>();
    for (CaseEvent input : dataset.inputs(CaseEvent.class)) {
        actual.add(transform.apply(dataset.context(), input).command());
    }

    GoldenComparator.assertMatches(dataset.expectedOutputs(), actual, dataset.comparisonRules());
}

This is the core test for transformation semantics.

12. Golden Dataset Design Rules

A good golden dataset is small but sharp.

It should include:

happy path
missing optional field
missing required field
unknown enum value
late event
duplicate event
out-of-order event
deleted entity
correction event
boundary values
reference data miss
PII field
multi-tenant data
backfill mode example
replay example

Avoid giant golden datasets that nobody can understand.

Use large datasets for regression and shadow diff. Use small golden datasets for semantic clarity.

13. Negative Contract Tests

A contract test suite must include invalid data.

Example:

missing required caseId -> reject
unknown status with strict policy -> quarantine
future occurredAt beyond tolerance -> quarantine
negative penalty amount -> reject
PII in public output -> fail build
duplicate eventId -> dedupe, not duplicate output

Negative tests prove the pipeline fails correctly.

For production systems, correct rejection is as important as correct acceptance.

14. Quality Contract Tests

Quality rules should be executable in CI against examples and in runtime against actual data.

Example Java shape:

public interface DataQualityRule<T> {
    String id();
    Severity severity();
    QualityCheckResult check(T record);
}

public enum Severity {
    BLOCK,
    QUARANTINE,
    WARN
}

Test:

@Test
void closedCaseMustHaveClosedAt() {
    CaseProjection projection = new CaseProjection(
            "CASE-1",
            "CLOSED",
            Instant.parse("2026-07-04T10:00:00Z"),
            null
    );

    QualityCheckResult result = new ClosedCaseMustHaveClosedAt().check(projection);

    assertFalse(result.passed());
    assertEquals(Severity.QUARANTINE, result.severity());
}

The rule is not just a test. It should also be used in runtime validation.

That keeps CI and production behavior aligned.

15. Temporal Contract Tests

Time breaks pipelines often.

Temporal contract tests should cover:

event time vs processing time
watermark behavior
allowed lateness
late record policy
business effective time
daylight saving boundary
timezone conversion
ordering per key
historical correction
bitemporal validity

Example test cases:

Case	Expected behavior
event arrives 5 minutes late	accepted into window
event arrives 5 days late	side output/quarantine/correction depending policy
correction event effective last month	restate affected projection
source commit time after event time	preserve both timestamps
same key events out of order	reorder if supported, otherwise quarantine/compensate

A pipeline that cannot explain time semantics cannot be trusted for reporting or alerting.

16. Replay Contract Tests

A replay test proves that reprocessing input does not corrupt output.

Test properties:

Replaying same records does not duplicate output.
Replaying from checkpoint produces same final state.
Replaying with same transform version produces equivalent output.
Replaying with same reference data version is deterministic.
DLQ/quarantine behavior is stable.

Pseudo-test:

@Test
void replayIsIdempotent() {
    PipelineHarness harness = PipelineHarness.inMemory();
    List<Envelope<CaseEvent>> input = Fixtures.caseLifecycle();

    harness.run(input);
    Snapshot first = harness.outputSnapshot();

    harness.run(input);
    Snapshot second = harness.outputSnapshot();

    assertEquals(first, second);
}

This test catches weak sink idempotency and bad checkpoint assumptions.

17. Backfill Contract Tests

A backfill test proves historical processing is safe.

It should check:

old schema versions can be decoded
historical reference data exists
output mode is correct
overwrite policy is explicit
duplicates are not produced
existing production data is not silently overwritten
downstream consumers are not surprised

Example backfill manifest:

backfillTest:
  inputRange:
    from: "2025-01-01T00:00:00Z"
    to: "2025-01-31T23:59:59Z"
  transformVersion: "3.2.0"
  referenceData:
    legal-calendar: "2025"
    sector-risk-multiplier: "2025-01"
  outputMode: versioned-append
  overwriteAllowed: false

Backfill without contract testing is a production incident waiting for a calendar slot.

18. CDC Contract Tests

CDC pipelines need special tests because source events are not business events by default.

Test cases:

insert event
update event
delete event
tombstone event
snapshot row
schema change
transaction with multiple row changes
out-of-order delivery simulation
duplicate source offset
connector restart

For Debezium-style envelopes, validate:

operation code handling
before/after semantics
source position extraction
transaction metadata if used
delete/tombstone policy
snapshot flag handling
schema history compatibility

A CDC consumer that only tests insert/update will fail when deletes arrive.

19. Outbox Contract Tests

Outbox contract tests prove that application transactions publish the intended event.

Test shape:

@Test
void closingCaseWritesOutboxEventInSameTransaction() {
    caseService.closeCase("CASE-123", "completed investigation");

    CaseEntity caseEntity = caseRepository.findById("CASE-123").orElseThrow();
    OutboxEvent event = outboxRepository.findByAggregateId("CASE-123").single();

    assertEquals("CLOSED", caseEntity.status());
    assertEquals("CaseClosed", event.eventType());
    assertEquals("CASE-123", event.aggregateId());
    assertNotNull(event.eventId());
    assertNotNull(event.occurredAt());
}

Important assertions:

business state and outbox event commit atomically
event ID is stable
aggregate ID is correct
event type is versioned
payload matches canonical event contract
event contains enough metadata
rollback prevents both state and outbox row

Outbox tests catch dual-write bugs before CDC sees anything.

20. Sink Contract Tests

A sink contract test proves output behavior under duplicate, retry, partial failure, and unknown outcome.

Test dimensions:

Scenario	Expected behavior
same output command twice	one final effect
timeout after successful write	retry does not duplicate
partial batch failure	successful records not duplicated, failed records retried/quarantined
stale version update	rejected or ignored according to policy
out-of-order update	version/CAS rule protects projection
delete command	tombstone/delete semantics correct

Example:

@Test
void sinkUpsertIsIdempotentByCommandKey() {
    UpsertProjection command = Fixtures.riskScoreUpsert("CASE-1", 80, "cmd-1");

    sink.write(command);
    sink.write(command);

    assertEquals(1, sink.countRows("CASE-1"));
    assertEquals(80, sink.findRiskScore("CASE-1"));
}

This is where delivery semantics become concrete.

21. Privacy Contract Tests

Privacy regression can happen through transformation changes.

Tests should prove:

PII fields are not included in public outputs
masked fields remain masked
tokens are not reversible outside secure boundary
sensitive headers are not propagated to lower-trust sinks
DLQ does not leak raw sensitive payload into broad-access storage
logs do not print sensitive payloads

Example rule:

@Test
void publicRiskOutputMustNotContainOfficerName() {
    RiskScoreOutput output = transform.apply(context, Fixtures.caseWithOfficerName()).command().payload();

    assertFalse(output.toString().contains("Jane Officer"));
    assertFalse(output.fields().containsKey("officerName"));
}

DLQ privacy is often forgotten.

A poison record may contain the most sensitive data because it failed validation and bypassed normal publishing.

22. Operational Contract Tests

Operational behavior is part of the contract.

Test:

retryable error is retried
non-retryable error is quarantined
poison record does not block next record forever
checkpoint commits only after sink success
graceful shutdown preserves progress
backpressure pauses source
metrics are emitted
DLQ record contains enough diagnostic metadata

Example:

@Test
void checkpointIsNotAdvancedWhenSinkFails() {
    FakeSource source = FakeSource.withOneRecord("offset-10", Fixtures.validEvent());
    FailingSink sink = new FailingSink();
    InMemoryCheckpointStore checkpoints = new InMemoryCheckpointStore();

    PipelineRunner runner = new PipelineRunner(source, transform, sink, checkpoints);

    assertThrows(SinkException.class, runner::runOnce);

    assertTrue(checkpoints.current().isEmpty());
}

This prevents silent data loss.

23. Contract Test Matrix

A useful CI matrix:

Change type	Required tests
Producer schema change	lint, schema compatibility, producer examples, consumer assumptions
Producer semantic change	above + semantic examples + consumer review
Transform code change	unit, golden dataset, quality, replay, diff vs previous
Transform version change	manifest validation, golden dataset, shadow diff
Sink logic change	idempotency, retry, partial failure, unknown outcome
State schema change	migration, rollback, replay, state compatibility
Backfill job change	historical decode, reference data, output mode, idempotency
Privacy rule change	PII scan, masking, DLQ/log checks

This matrix should be automated as much as possible.

Manual review should focus on semantic judgment, not checklist memory.

24. CI Architecture

For large organizations, not every repository owns every test.

A platform can provide:

contract registry
compatibility checker
test harness library
golden dataset runner
consumer assumption registry
schema linter
CI plugin
report publisher

Teams provide:

domain examples
semantic rules
ownership decisions
migration notes

25. Consumer Assumption Registry

A consumer assumption registry records who depends on what.

Minimal model:

CREATE TABLE consumer_assumption (
    consumer_id text NOT NULL,
    producer_contract text NOT NULL,
    producer_contract_version_range text NOT NULL,
    assumption_id text NOT NULL,
    assumption_json jsonb NOT NULL,
    severity text NOT NULL,
    owner text NOT NULL,
    created_at timestamptz NOT NULL,
    PRIMARY KEY (consumer_id, producer_contract, assumption_id)
);

This enables impact analysis:

Producer wants to change caseStatus enum.
Which consumers assume only OPEN, ASSIGNED, ESCALATED, CLOSED?

Without this registry, impact analysis becomes Slack archaeology.

26. Contract Test Reports

Test reports should be readable by humans.

A good report includes:

changed contract
affected consumers
compatibility result
semantic example result
golden dataset delta
quality rule result
privacy result
replay result
required approvals

Example:

Contract: enforcement.case-event 2.4.0 -> 2.5.0
Schema compatibility: PASS
Consumer assumptions: FAIL
  - enforcement.sla-breach-detector: unsupported status REOPENED
Golden transform tests: PASS
Privacy tests: PASS
Decision: BLOCKED until consumer supports REOPENED or producer gates rollout

This is more useful than a red CI job with a stack trace.

27. Versioned Golden Dataset Storage

Golden datasets should be versioned with the transformation or contract.

Recommended pattern:

contracts/
  producer/
    enforcement.case-event/
      v2.4.0/
        schema.avsc
        examples/
  transforms/
    enforcement.case-risk-score/
      v3.2.0/
        manifest.yaml
        input.jsonl
        expected-output.jsonl
        expected-quarantine.jsonl
        compare.yaml
  consumers/
    enforcement.sla-breach-detector/
      assumptions.yaml

Keep small semantic datasets in Git.

Store large representative datasets in object storage with immutable version references.

Do not store sensitive production payloads in Git.

28. Production Shadow Contract Tests

CI cannot cover all real data.

For high-risk changes, run shadow tests in production-like or production-shadow mode.

Shadow contract test:

same input -> old transform + new transform -> compare outputs -> publish diff metrics

Compare:

count by key
count by status
field-level delta
aggregate totals
outliers
invalid output count
quarantine rate
processing latency
state size
sink write pattern

Promotion rule:

Every material delta must be classified before cutover.

Do not require zero delta for semantic migrations. Require understood delta.

29. Handling Nondeterminism in Tests

Nondeterminism should be isolated.

Bad test:

assertEquals(expectedProducedAt, output.producedAt());

If producedAt is runtime time, this test is brittle.

Better:

TransformContext context = contextWithFixedClock("2026-07-04T10:00:00Z");

Or comparison rule:

fields:
  producedAt:
    mode: ignore
  score:
    mode: exact

Do not ignore fields casually.

Every ignored field reduces test strength.

30. Contract Testing for Stateful Pipelines

Stateful pipelines need sequence tests, not only record tests.

Example case lifecycle:

CaseOpened
CaseAssigned
CaseEscalated
CaseSuspended
CaseResumed
CaseClosed

The expected output may depend on the sequence.

Test harness:

@Test
void suspendedCaseDoesNotBreachSlaUntilResumed() {
    PipelineHarness harness = PipelineHarness.withTransform(new SlaBreachTransformV2());

    harness.accept(Fixtures.caseOpened("CASE-1", "2026-07-01T09:00:00Z"));
    harness.accept(Fixtures.caseSuspended("CASE-1", "2026-07-02T09:00:00Z"));
    harness.advanceEventTime("2026-07-10T09:00:00Z");

    assertFalse(harness.outputs().containsAlert("CASE-1", "SLA_BREACH"));

    harness.accept(Fixtures.caseResumed("CASE-1", "2026-07-11T09:00:00Z"));
    harness.advanceEventTime("2026-07-20T09:00:00Z");

    assertTrue(harness.outputs().containsAlert("CASE-1", "SLA_BREACH"));
}

Stateful contracts are about histories, not isolated events.

31. Contract Testing Event Ordering

Ordering assumptions must be tested.

Example:

Input order:
1. CaseClosed at 10:00
2. CaseOpened at 09:00 arrives late

Expected behavior must be explicit:

reorder by event time?
reject impossible transition?
publish correction?
quarantine late event?
rebuild state from source position?

A test should assert the policy.

Ambiguous ordering behavior is a correctness risk.

32. Contract Testing Deletes and Tombstones

Delete behavior is often under-tested.

Test all delete forms:

Source	Delete representation
CDC	delete event with `before`, null `after`
Kafka compacted topic	tombstone value
API sync	missing item plus deletion endpoint
File load	explicit delete marker row
Lake table	equality delete / position delete depending format

Expected behavior:

projection deleted?
marked inactive?
historical fact retained?
downstream tombstone emitted?
PII removed?
audit trail preserved?

There is no universal answer. There must be a tested answer.

33. Contract Testing Corrections

A correction is not the same as an update.

Correction says:

A previously published fact was wrong or incomplete.

Test correction behavior:

original event creates output
correction event arrives
affected output is restated or compensated
audit link connects correction to original
downstream output includes correction metadata

Example expected output:

{
  "caseId": "CASE-1",
  "riskScore": 70,
  "correctionOfEventId": "evt-100",
  "correctionReason": "SOURCE_DATA_ERROR"
}

Corrections are essential for regulated and analytical pipelines.

34. Contract Testing Multi-Tenancy

Multi-tenant pipelines need tenant isolation tests.

Test:

tenant A input cannot produce tenant B output
tenant-specific config is applied correctly
tenant-specific reference data is versioned
tenant quotas do not affect correctness
tenant canary routing is visible
tenant deletion/retention policy is honored

Example:

@Test
void tenantDataDoesNotCrossOutputBoundary() {
    harness.accept(Fixtures.caseEvent("tenant-a", "CASE-1"));
    harness.accept(Fixtures.caseEvent("tenant-b", "CASE-2"));

    assertTrue(harness.outputForTenant("tenant-a").containsCase("CASE-1"));
    assertFalse(harness.outputForTenant("tenant-a").containsCase("CASE-2"));
}

Isolation is a contract.

35. Test Data Governance

Do not casually copy production data into contract tests.

Rules:

Prefer synthetic data for Git-based fixtures.
Use realistic edge cases, not real sensitive values.
Mask or tokenize production-derived datasets.
Store large datasets in controlled object storage.
Version dataset references immutably.
Track dataset owner and retention.
Do not put secrets or PII into CI logs.

Test data is part of the pipeline governance surface.

36. Build Tool Integration

In Java, contract tests should run as part of normal build lifecycle.

Example Maven naming:

src/test/java/...               unit and contract tests
src/test/resources/contracts/... small fixtures
src/integration-test/java/...    broker/db/object-store integration tests

Example categories:

@Tag("contract")
@Tag("schema")
@Tag("golden")
@Tag("replay")
@Tag("privacy")

CI can run:

mvn test -Dgroups=contract
mvn verify -Dgroups=integration

The exact build system is less important than making contract tests unavoidable before release.

37. What to Block vs Warn

Not every contract violation should block a build.

Suggested policy:

Violation	CI behavior
Schema incompatible with active consumer	block
Missing field description	warn or block depending maturity
Semantic breaking change without migration note	block
Golden dataset mismatch	block unless approved expected delta
Privacy leak	block
Replay idempotency failure	block
New optional field without example	warn or block
Shadow diff outside threshold	block promotion, not necessarily build

Be explicit.

A flaky or arbitrary gate will be bypassed.

A precise gate becomes trusted infrastructure.

38. Common Anti-Patterns

38.1 Only Testing the Happy Path

A pipeline mostly fails at edges: late, duplicate, deleted, malformed, reordered, corrected, and replayed records.

38.2 Schema Compatibility as the Only Gate

Schema compatibility does not prove semantic compatibility.

38.3 Consumer Assumptions in Slack

If consumer assumptions are not executable, they will be forgotten.

38.4 Golden Dataset Too Large to Understand

Huge fixtures become unreadable. Use small semantic fixtures plus larger regression datasets.

38.5 Mocking Away the Contract

Mocks can accidentally encode what the test wants rather than what the producer emits.

Use real serializers, real schemas, real examples, and real validation paths.

38.6 No Negative Examples

A pipeline must prove it rejects invalid data correctly.

38.7 No Replay Test

If replay is not tested, idempotency is a belief.

38.8 DLQ Without Contract

DLQ records need schema, privacy rules, retention rules, and replay policy.

39. Production Checklist

Before accepting a pipeline contract change:

Is schema compatibility checked?
Are producer examples updated?
Are invalid examples included?
Are consumer assumptions executable?
Are transformation golden datasets updated?
Are output deltas explained?
Are data quality rules tested?
Are temporal edge cases covered?
Are replay and backfill behaviors tested?
Are sink idempotency tests passing?
Are CDC delete/tombstone cases covered?
Are privacy checks passing?
Are DLQ/quarantine contracts tested?
Is state migration tested if state changed?
Are contract test reports understandable?
Is there an approval path for expected breaking changes?

A good contract test suite does not eliminate production incidents.

It eliminates entire classes of preventable incidents.

40. Mental Model Summary

A contract test is not a test of implementation detail.

It is a test of a boundary promise.

For Java data pipelines, the important boundary promises are:

Producer emits valid facts.
Consumer assumptions remain true or are explicitly renegotiated.
Transform semantics are stable and versioned.
Quality rules are executable.
Sink effects are idempotent.
Replay and backfill are safe.
Privacy constraints hold even during failure.
Operational behavior preserves correctness under retry, DLQ, and checkpoint failure.

A top-tier pipeline team treats these promises as code.

Not as a document.

Not as tribal memory.

Not as dashboards discovered after damage.

As executable contracts that run before every meaningful change.

41. References

Confluent Schema Registry — schema evolution and compatibility types: https://docs.confluent.io/platform/current/schema-registry/fundamentals/schema-evolution.html
Pact documentation — consumer-driven contract testing concepts: https://docs.pact.io/
Great Expectations / GX Core — open source data quality testing, validation, and documentation: https://greatexpectations.io/
Apache Beam Programming Guide — pipeline testing and programming model concepts: https://beam.apache.org/documentation/programming-guide/

Lesson Recap

You just completed lesson 32 in build core. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Back To Series Next Lesson

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.

Previous Lesson

Lesson 31

Versioned Transformations

Next Lesson

Lesson 33

Kafka as Pipeline Log