Canonical Model vs Transport Model vs Storage Model

Learn Java Data Contract Engineering in Action - Part 026

Canonical model vs transport model vs storage model: production design patterns for separating API DTOs, event schemas, XML contracts, Avro/Protobuf models, database rows, domain aggregates, and Java mapping boundaries.

[2026-07-03] 14 min read, 2614 words

Part 026 — Canonical Model vs Transport Model vs Storage Model

One of the fastest ways to damage a system is to use one model everywhere.

It starts innocently:

We already have a Case class. Let us reuse it for API, database, events, and workflow.

Then the class grows.

It gets JSON annotations.

Then JPA annotations.

Then Avro defaults.

Then Protobuf compatibility hacks.

Then UI-only fields.

Then database-only fields.

Then workflow-only fields.

Then a field that is required by one API but optional in another.

Then a field that must never be exposed externally but is accidentally serialized.

Now the model is not a model.

It is a junk drawer.

This part explains how to separate:

canonical model
transport model
storage model
domain model
event model
integration model
generated contract model
read model
command model

The point is not architectural ceremony.

The point is to keep invariants clear.

1. The Core Problem

Different boundaries need different truths.

A database row cares about persistence.

An API response cares about consumer representation.

A domain aggregate cares about behavior and invariants.

An event cares about historical fact.

A workflow variable cares about process execution.

A generated OpenAPI DTO cares about wire compatibility.

An Avro schema cares about reader/writer schema resolution.

A Protobuf message cares about field numbers and binary compatibility.

An XSD contract cares about XML namespace, element structure, and validation.

Trying to force all of those into one Java class creates hidden coupling.

A production-grade system instead makes model boundaries explicit.

The mapping cost is real.

The coupling cost of not mapping is larger.

2. Definitions That Actually Help

2.1 Domain Model

The domain model represents business concepts and invariants.

It should answer:

What is true in the business?
What operations are allowed?
What state transitions are valid?
What must never happen?

Example:

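import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public final class CaseFile {
    private final CaseId id;
    private CaseStatus status;
    private final ApplicantId applicantId;
    private final List<EvidenceItem> evidenceItems;

    // Final fields need a constructor; the factories that call it are omitted here.
    private CaseFile(CaseId id, CaseStatus status, ApplicantId applicantId, List<EvidenceItem> evidenceItems) {
        this.id = Objects.requireNonNull(id, "id");
        this.status = Objects.requireNonNull(status, "status");
        this.applicantId = Objects.requireNonNull(applicantId, "applicantId");
        this.evidenceItems = new ArrayList<>(evidenceItems); // defensive copy
    }

    public void close(Decision decision, OfficerId officerId, Instant now) {
        if (status != CaseStatus.UNDER_REVIEW) {
            throw new CaseStateConflict("Only cases under review can be closed");
        }
        if (evidenceItems.isEmpty()) {
            throw new CaseRuleViolation("Case cannot be closed without evidence");
        }
        this.status = CaseStatus.CLOSED;
        // record domain event internally (decision, officerId, and now belong in it)
    }
}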

Notice what is absent:

no JSON annotation
no JPA annotation
no Avro annotation
no OpenAPI annotation
no database column name
no transport-specific null hacks

The domain model should not know how it is serialized.

2.2 Transport Model

The transport model is the shape exposed at a communication boundary.

Examples:

OpenAPI request DTO
OpenAPI response DTO
XSD/JAXB generated class
Protobuf message
GraphQL input/output type
JSON Schema-validated payload

It should answer:

What can cross this boundary?
What is required on the wire?
What can clients ignore?
What is compatibility-safe?

Example OpenAPI DTO generated from contract:

public class CreateCaseRequestDto {
    private String applicantId;
    private String caseType;
    private String priority;
    private String externalReference;

    // generated getters/setters omitted
}

This object is not your domain aggregate.

It is a boundary object.

2.3 Storage Model

The storage model represents persistence shape.

It should answer:

How is data stored, indexed, joined, partitioned, queried, and migrated?

Example:

public record CaseRow(
    UUID id,
    String caseNumber,
    String applicantId,
    String caseTypeCode,   // read by the storage-to-domain mapper in section 11.4
    String statusCode,
    String priorityCode,
    Instant createdAt,
    Instant updatedAt,
    long version
) {}

Storage model concerns:

primary key
foreign key
version column
index-friendly shape
normalized vs denormalized columns
enum code storage
audit columns
soft-delete marker
migration compatibility

The storage model should not leak directly into external contracts.

2.4 Event Model

An event model represents a historical fact that other systems may consume.

It should answer:

What happened?
When did it happen?
Who/what caused it?
What minimum data is needed by consumers?
How can this fact be replayed later?

Example:

{
  "eventId": "01J1ZB9NN7C0ZTY4Y4XHYJMAQJ",
  "eventType": "CaseCreated",
  "occurredAt": "2026-07-03T08:15:30Z",
  "caseId": "CASE-2026-000123",
  "applicantId": "APP-123",
  "caseType": "BENEFIT_REVIEW"
}

An event is not a database row.

An event is not always the whole aggregate.

An event is a contract with time.

2.5 Canonical Model

“Canonical model” is the most abused term in integration architecture.

It can mean two very different things.

Useful Meaning

A canonical model is a stable shared vocabulary for a bounded integration context.

Example:

Within the enforcement platform, CaseId, PartyId, EvidenceId, DecisionId, and Money are represented consistently across contracts.

This is useful.

Dangerous Meaning

A canonical model is one enterprise-wide object model that every system must use.

Example:

Every system must send and store the canonical EnterpriseCase object with 350 fields.

This usually fails.

Why?

Because every boundary has different semantics, lifecycle, ownership, and volatility.

A giant canonical object becomes the shared database of integration.

3. Model Types in a Production System

A serious Java system may have many model types.

Model Type	Purpose	Owner	Should Be Generated?
Domain aggregate	Enforce business invariants	Domain/application team	No
Command object	Internal use case input	Application layer	Usually no
Query/read model	Efficient read response	Application/read side	Maybe
OpenAPI request DTO	HTTP boundary input	API contract	Often yes
OpenAPI response DTO	HTTP boundary output	API contract	Often yes
JSON Schema payload	Flexible JSON validation boundary	Contract/platform	Maybe
Avro event	Stream contract	Event producer with consumer governance	Yes
Protobuf message	RPC/binary contract	Service contract	Yes
XSD/JAXB class	XML integration boundary	XML contract	Yes
Persistence row/entity	Database storage	Service owner	Sometimes
Analytics model	BI/lake consumption	Data platform	Maybe
Workflow variable DTO	BPM/workflow execution	Process owner	Maybe

The top 1% skill is not memorizing these names.

The skill is knowing which invariants belong where.

4. Why One Model Fails

4.1 Different Optionality

OpenAPI create request:

applicantId required
caseType required
priority optional

OpenAPI response:

caseId required
status required
createdAt required
assignedOfficer optional

Database row:

id required
version required
created_by required
created_at required
updated_at required

Domain aggregate:

status cannot be null
caseNumber must exist after creation
assignedOfficer may be absent depending on status

Avro event:

new fields require defaults for compatibility

Protobuf message:

field presence and default values behave differently depending on syntax/edition

There is no single optionality rule that fits all boundaries.

4.2 Different Versioning Rules

OpenAPI compatibility says:

Do not remove response fields if consumers may depend on them.

Avro compatibility says:

Reader/writer schema resolution decides whether old/new consumers can read old/new data.

Protobuf compatibility says:

Field numbers are the durable identity. Do not reuse removed field numbers.

Database migration says:

Expand, backfill, dual-read/write, contract.

A single Java class cannot encode all versioning semantics safely.

4.3 Different Security Requirements

Storage model may contain:

internal notes
risk score
fraud signal
supervisor comments
PII fields

External response should expose only:

caseId
status
public timeline
allowed actions

If you serialize storage/domain objects directly, data leakage is one annotation mistake away.
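
The whitelist idea can be sketched as follows. This is a minimal illustration, not the lesson's actual code: `StoredCase`, `PublicCaseView`, and `PublicCaseViewMapper` are hypothetical names, and the field set is assumed from the two lists above. The external type has no slot for sensitive data, so a leak requires a deliberate code change rather than a forgotten annotation.

```java
import java.util.List;

// Hypothetical internal shape: holds fields that must never leave the service.
record StoredCase(
        String caseId,
        String status,
        String internalNotes,
        int riskScore,
        String applicantNationalId) {}

// Hypothetical external shape: only fields the consumer is allowed to see.
record PublicCaseView(String caseId, String status, List<String> allowedActions) {}

final class PublicCaseViewMapper {
    private PublicCaseViewMapper() {}

    // Copies fields one by one. Anything not named here cannot be exposed.
    static PublicCaseView toPublicView(StoredCase stored, List<String> allowedActions) {
        return new PublicCaseView(
                stored.caseId(),
                stored.status(),
                List.copyOf(allowedActions));
    }
}
```

Serializing `PublicCaseView` is safe by construction. Serializing `StoredCase` would depend on every sensitive field being annotated correctly forever.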

4.4 Different Performance Requirements

Domain aggregate may load full state.

List API response needs only summary fields.

Analytics pipeline wants denormalized facts.

Workflow engine wants minimal variables.

Database storage wants normalized/indexed shape.

One model either underfetches, overfetches, or leaks.

4.5 Different Ownership

A database table is owned by a service.

A public API is owned by the provider and its consumers.

An event is owned by the producer but constrained by its consumers.

An analytics model may be owned by the data platform.

A workflow variable may be owned by the process team.

One class cannot represent multiple ownership boundaries without turning every change into a negotiation.

5. The Correct Rule: Models Are Boundary-Specific

Use this invariant:

A model should be optimized for the boundary it belongs to.

Then connect models through explicit mapping.

This looks like more code.

It is also more control.

6. Canonical Model: Use Carefully

The phrase “canonical model” is attractive because it promises consistency.

But consistency has levels.

6.1 Good Canonicalization: Value Types

Canonicalize stable primitives and value objects.

Examples:

Concept	Canonical Decision
Timestamp	UTC instant string in ISO-8601 format at API boundary
Money	amount as decimal string + ISO 4217 currency
Identifier	stable opaque ID, not database sequence exposed blindly
Country	ISO country code if business supports it
Language	BCP 47 language tag if needed
Decimal precision	explicitly bounded by domain
Case status	controlled vocabulary with unknown-value strategy

Example Java value object:

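import java.math.BigDecimal;
import java.util.Currency;
import java.util.Objects;

public record Money(BigDecimal amount, Currency currency) {
    public Money {
        Objects.requireNonNull(amount, "amount");
        Objects.requireNonNull(currency, "currency");
        // Scale 2 is a simplification: some ISO 4217 currencies use 0 or 3
        // minor-unit digits (see Currency.getDefaultFractionDigits()).
        if (amount.scale() > 2) {
            throw new IllegalArgumentException("Money scale must be <= 2");
        }
    }
}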

Canonical value types reduce ambiguity.
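
The table row "controlled vocabulary with unknown-value strategy" can be sketched the same way. The names `ExternalCaseStatus` and `ParsedStatus` are hypothetical, and the policy shown (map to `UNKNOWN`, keep the raw value) is one possible strategy, not the only one.

```java
import java.util.Locale;

// Hypothetical controlled vocabulary for statuses received from another system.
enum ExternalCaseStatus {
    NEW, UNDER_REVIEW, CLOSED, UNKNOWN
}

// Keeps the raw wire value next to the parsed one so nothing is lost.
record ParsedStatus(ExternalCaseStatus status, String rawValue) {

    static ParsedStatus fromWire(String raw) {
        if (raw == null || raw.isBlank()) {
            throw new IllegalArgumentException("status is required");
        }
        String normalized = raw.trim().toUpperCase(Locale.ROOT);
        for (ExternalCaseStatus candidate : ExternalCaseStatus.values()) {
            // UNKNOWN is a fallback only; it is never matched as a real value.
            if (candidate != ExternalCaseStatus.UNKNOWN && candidate.name().equals(normalized)) {
                return new ParsedStatus(candidate, raw);
            }
        }
        return new ParsedStatus(ExternalCaseStatus.UNKNOWN, raw);
    }
}
```

Keeping `rawValue` means a later release that learns the new status can reprocess old records without asking the sender again.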

6.2 Good Canonicalization: Shared Vocabulary

A bounded context can define shared language:

Case
Party
Evidence
Decision
Violation
EnforcementAction
Appeal

But each boundary still gets its own representation.

Case domain aggregate != Case API response != CaseCreated event != case table row

6.3 Bad Canonicalization: One Giant Enterprise Object

Bad:

{
  "enterpriseCase": {
    "caseId": "...",
    "legacyCaseId": "...",
    "caseType": "...",
    "appealInfo": {},
    "paymentInfo": {},
    "workflowInfo": {},
    "analyticsInfo": {},
    "uiInfo": {},
    "migrationInfo": {},
    "deprecatedField1": "...",
    "deprecatedField2": "..."
  }
}

This object becomes:

too large to understand
too entrenched to improve
too generic to validate strongly
too sensitive to expose safely
too coupled to change independently

Do not confuse shared vocabulary with shared payload.

7. Transport Model Design

A transport model should be designed for the consumer use case.

7.1 Request Models Should Match Commands, Not Tables

Bad request:

{
  "id": null,
  "status": "NEW",
  "version": 0,
  "createdAt": null,
  "updatedAt": null,
  "createdBy": null,
  "applicantId": "APP-123",
  "caseType": "BENEFIT_REVIEW"
}

This exposes storage lifecycle fields.

Better:

{
  "applicantId": "APP-123",
  "caseType": "BENEFIT_REVIEW",
  "externalReference": "PORTAL-REQ-987"
}

The server owns:

ID generation
initial status
audit fields
version
timestamps
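
A minimal sketch of that ownership split, assuming hypothetical names (`CreateCaseInput`, `NewCase`, `NewCaseFactory`) and an initial status of `"NEW"`. The client-facing type has no lifecycle fields at all; the server fills them from a clock, an ID generator, and the authenticated actor.

```java
import java.time.Clock;
import java.time.Instant;
import java.util.function.Supplier;

// What the client is allowed to send (mirrors the "Better" request above).
record CreateCaseInput(String applicantId, String caseType, String externalReference) {}

// What the server creates; lifecycle fields are filled in by the server only.
record NewCase(
        String caseId,
        String applicantId,
        String caseType,
        String externalReference,
        String status,
        long version,
        Instant createdAt,
        Instant updatedAt,
        String createdBy) {}

final class NewCaseFactory {
    private final Clock clock;
    private final Supplier<String> idGenerator;

    NewCaseFactory(Clock clock, Supplier<String> idGenerator) {
        this.clock = clock;
        this.idGenerator = idGenerator;
    }

    // authenticatedActor comes from the security context, never from the body.
    NewCase create(CreateCaseInput input, String authenticatedActor) {
        Instant now = clock.instant();
        return new NewCase(
                idGenerator.get(),
                input.applicantId(),
                input.caseType(),
                input.externalReference(),
                "NEW",
                0L,
                now,
                now,
                authenticatedActor);
    }
}
```

Injecting `Clock` and the ID supplier keeps the factory deterministic in tests.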

7.2 Response Models Should Match Consumer Decisions

A list endpoint should not return the full aggregate.

Bad:

GET /cases

Returns 200 fields per case.

Better:

{
  "items": [
    {
      "caseId": "CASE-2026-000123",
      "caseType": "BENEFIT_REVIEW",
      "status": "UNDER_REVIEW",
      "createdAt": "2026-07-03T08:15:30Z",
      "availableActions": ["SUBMIT_EVIDENCE"]
    }
  ],
  "nextPageToken": "..."
}

This model serves the list use case.

Detailed data belongs to:

GET /cases/{caseId}

7.3 Transport Models Need Stability

Once external consumers depend on a field, it becomes expensive to remove.

Therefore:

expose fewer fields
name fields carefully
avoid internal implementation terms
define enum evolution strategy
avoid returning fields “just in case”
avoid exposing database IDs unless intentionally stable

8. Storage Model Design

Storage models are optimized for persistence and queries.

8.1 Database Row Is Not the Domain

A database row may be an implementation detail.

Example:

CREATE TABLE regulatory_case (
    id UUID PRIMARY KEY,
    case_number TEXT NOT NULL UNIQUE,
    applicant_id TEXT NOT NULL,
    case_type_code TEXT NOT NULL,
    status_code TEXT NOT NULL,
    priority_code TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    updated_at TIMESTAMPTZ NOT NULL,
    version BIGINT NOT NULL,
    deleted_at TIMESTAMPTZ NULL
);

This schema includes persistence concerns:

primary key
unique constraint
code columns
timestamps
optimistic lock version
soft delete

The API should not have to mirror this.

8.2 Storage Model Can Be More Normalized Than Domain

Domain object:

CaseFile contains applicant snapshot and evidence items.

Storage may split:

regulatory_case
case_party
case_evidence
case_status_history
case_audit_log

The domain aggregate can be reconstructed from multiple rows.

8.3 Storage Model Can Be More Denormalized Than Domain

For read performance, you may store:

case_search_projection
case_dashboard_projection
case_timeline_projection

These are read models, not domain truth.

9. Event Model Design

Events should not blindly serialize domain aggregates.

9.1 Event as Fact

Good event name:

CaseCreated
EvidenceSubmitted
CaseAssigned
DecisionIssued
CaseClosed

Weak event name:

CaseUpdated

CaseUpdated hides meaning.

Consumers cannot know what changed without diffing payloads.
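
One way to express named facts in Java is a sealed hierarchy with one type per occurrence. This is a sketch under assumptions: the payload fields and the `NotificationConsumer` class are hypothetical, and it requires Java 17 or later.

```java
import java.time.Instant;

// Hypothetical event hierarchy: each business occurrence is its own type.
sealed interface CaseEvent permits CaseAssigned, EvidenceSubmitted, CaseClosed {
    String caseId();
    Instant occurredAt();
}

record CaseAssigned(String caseId, Instant occurredAt, String officerId) implements CaseEvent {}
record EvidenceSubmitted(String caseId, Instant occurredAt, String evidenceId) implements CaseEvent {}
record CaseClosed(String caseId, Instant occurredAt, String decisionId) implements CaseEvent {}

final class NotificationConsumer {
    private NotificationConsumer() {}

    // The consumer reacts to the event type; no payload diffing is needed.
    static String describe(CaseEvent event) {
        if (event instanceof CaseAssigned assigned) {
            return "Case " + assigned.caseId() + " assigned to " + assigned.officerId();
        }
        if (event instanceof EvidenceSubmitted submitted) {
            return "Evidence " + submitted.evidenceId() + " added to " + submitted.caseId();
        }
        if (event instanceof CaseClosed closed) {
            return "Case " + closed.caseId() + " closed by decision " + closed.decisionId();
        }
        throw new IllegalStateException("Unhandled event type: " + event.getClass().getName());
    }
}
```

With a single `CaseUpdated` type, the same consumer would have to compare old and new payloads to reach the same conclusions.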

9.2 Event Payload Should Be Sufficient, Not Maximal

Bad:

Publish entire Case aggregate on every change.

This increases coupling and data leakage.

Better:

{
  "eventId": "01J1Z...",
  "eventType": "EvidenceSubmitted",
  "occurredAt": "2026-07-03T08:45:00Z",
  "caseId": "CASE-2026-000123",
  "evidenceId": "EVD-456",
  "submittedBy": "PARTY-789",
  "channel": "PORTAL"
}

9.3 Event Schema Is a Long-Term Contract

Events may be replayed years later.

Therefore event schemas need:

stable event type
stable field names
compatibility discipline
default values for Avro additions
reserved field numbers for Protobuf removals
PII classification
retention policy
replay semantics
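
The effect of "default values for additions" on replay can be imitated in plain Java. This sketch does not use the Avro API; it only shows what a reader-side default achieves. `CaseCreatedV2`, the map-based input, and the default `"UNSPECIFIED"` are assumptions made for illustration.

```java
import java.util.Map;

// Hypothetical reader for a CaseCreated payload already decoded into a map.
// sourceChannel was added in a later schema version, so old events lack it.
record CaseCreatedV2(String eventId, String caseId, String sourceChannel) {

    static final String DEFAULT_SOURCE_CHANNEL = "UNSPECIFIED";

    static CaseCreatedV2 read(Map<String, String> fields) {
        return new CaseCreatedV2(
                required(fields, "eventId"),
                required(fields, "caseId"),
                // Default applied when the writer did not know the field yet.
                fields.getOrDefault("sourceChannel", DEFAULT_SOURCE_CHANNEL));
    }

    private static String required(Map<String, String> fields, String name) {
        String value = fields.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Missing required event field: " + name);
        }
        return value;
    }
}
```

An event written years ago still reads cleanly, while fields that were always required stay required.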

10. Command Model vs Request DTO

A request DTO is external.

A command object is internal.

They may look similar, but they are not the same.

Example request DTO:

public class CreateCaseRequestDto {
    private String applicantId;
    private String caseType;
    private String priority;          // optional on the wire
    private String externalReference; // optional on the wire

    // generated getters/setters omitted
}

Internal command:

public record CreateCaseCommand(
    ApplicantId applicantId,
    CaseType caseType,
    Priority priority,
    Optional<ExternalReference> externalReference,
    OfficerId requestedBy,
    TenantId tenantId,
    Instant receivedAt
) {}

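The command includes trusted context that should not come from the client body. The record above carries the first three items; the others are added the same way when a use case needs them: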

authenticated actor
tenant
received time
authorization scope
request correlation ID

Never trust the client to send fields the server must derive.

11. Mapping as an Architectural Boundary

Mapping is not boilerplate.

Mapping is where you enforce boundary translation.

11.1 API Request to Command

public final class CaseApiMapper {
    public CreateCaseCommand toCommand(
            CreateCaseRequestDto dto,
            RequestContext context
    ) {
        return new CreateCaseCommand(
            ApplicantId.parse(dto.getApplicantId()),
            CaseType.parse(dto.getCaseType()),
            dto.getPriority() == null
                ? Priority.NORMAL
                : Priority.parse(dto.getPriority()),
            Optional.ofNullable(dto.getExternalReference()).map(ExternalReference::new),
            context.actorId(),
            context.tenantId(),
            context.receivedAt()
        );
    }
}

Boundary rules belong here:

string to value object
default assignment
trusted context injection
request field normalization
rejection of unsupported combinations
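
The first and last rules can be sketched with a parsing value object. The ID format `APP-` plus digits is an assumption taken from the sample payloads, and `BoundaryValidationException` is a hypothetical error type, not part of any library.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical structured error: carries the field name so the HTTP layer
// can report which request field was rejected.
final class BoundaryValidationException extends RuntimeException {
    private final String field;

    BoundaryValidationException(String field, String message) {
        super(field + ": " + message);
        this.field = field;
    }

    String field() {
        return field;
    }
}

// Value object with an assumed ID format of "APP-" followed by digits.
record ApplicantId(String value) {
    private static final Pattern FORMAT = Pattern.compile("APP-\\d+");

    static ApplicantId parse(String raw) {
        if (raw == null || raw.isBlank()) {
            throw new BoundaryValidationException("applicantId", "is required");
        }
        String normalized = raw.trim().toUpperCase(Locale.ROOT);
        if (!FORMAT.matcher(normalized).matches()) {
            throw new BoundaryValidationException("applicantId", "must look like APP-123");
        }
        return new ApplicantId(normalized);
    }
}
```

Once `parse` succeeds, nothing deeper in the system has to re-check the string.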

11.2 Domain to Response DTO

public CaseResponseDto toResponse(CaseFile caseFile, ActionPolicy actions) {
    CaseResponseDto dto = new CaseResponseDto();
    dto.setCaseId(caseFile.id().value());
    dto.setStatus(caseFile.status().externalCode());
    dto.setCaseType(caseFile.caseType().externalCode());
    dto.setCreatedAt(caseFile.createdAt().toString());
    dto.setAvailableActions(actions.availableActionsFor(caseFile));
    return dto;
}

Do not expose every domain field.

Expose the representation the consumer needs.

11.3 Domain to Event

public CaseCreatedEvent toEvent(CaseFile caseFile, DomainEventMetadata metadata) {
    return CaseCreatedEvent.newBuilder()
        .setEventId(metadata.eventId().value())
        .setOccurredAt(metadata.occurredAt())
        .setCaseId(caseFile.id().value())
        .setApplicantId(caseFile.applicantId().value())
        .setCaseType(caseFile.caseType().externalCode())
        .build();
}

Event mapping should be deliberate.

Do not publish internal object graphs accidentally.

11.4 Storage Row to Domain

public CaseFile toDomain(CaseRow row, List<EvidenceRow> evidenceRows) {
    return CaseFile.rehydrate(
        new CaseId(row.caseNumber()),
        new ApplicantId(row.applicantId()),
        CaseType.fromCode(row.caseTypeCode()),
        CaseStatus.fromCode(row.statusCode()),
        evidenceRows.stream().map(this::toEvidenceItem).toList(),
        row.version()
    );
}

Rehydration should rebuild domain invariants.

If the database contains invalid data, fail loudly or route to repair workflow.
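
"Fail loudly" can be as small as a strict code lookup. In this sketch the one-letter codes are invented; the lesson does not say what `status_code` actually stores.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical status enum with assumed codes for the status_code column.
enum CaseStatus {
    NEW("N"), UNDER_REVIEW("R"), CLOSED("C");

    private static final Map<String, CaseStatus> BY_CODE =
            Arrays.stream(values()).collect(Collectors.toMap(CaseStatus::code, Function.identity()));

    private final String code;

    CaseStatus(String code) {
        this.code = code;
    }

    String code() {
        return code;
    }

    // Unknown stored codes signal corrupt data or a missed migration,
    // so this throws instead of guessing a status.
    static CaseStatus fromCode(String code) {
        CaseStatus status = BY_CODE.get(code);
        if (status == null) {
            throw new IllegalStateException("Unknown persisted status code: " + code);
        }
        return status;
    }
}
```

This is deliberately stricter than the policy for external input: data the service wrote itself should never contain a value it does not recognize.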

12. Anti-Corruption Layer

An anti-corruption layer protects your domain from external models.

It translates foreign language into local language.

Example:

Legacy XML says: <ComplaintCategory>99</ComplaintCategory>
Local domain says: CaseType.OTHER_REGULATORY_COMPLAINT

Do not spread legacy codes throughout your domain.

Centralize translation.
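
A centralized translator might look like the sketch below. Only the code `99` comes from the example above; the code `10` and its mapping are invented to make the table non-trivial.

```java
import java.util.Map;
import java.util.Optional;

enum CaseType {
    BENEFIT_REVIEW, OTHER_REGULATORY_COMPLAINT
}

// Single place where legacy category codes become local case types.
final class LegacyCategoryTranslator {
    // Only "99" comes from the lesson; "10" is an invented example code.
    private static final Map<String, CaseType> KNOWN = Map.of(
            "99", CaseType.OTHER_REGULATORY_COMPLAINT,
            "10", CaseType.BENEFIT_REVIEW);

    private LegacyCategoryTranslator() {}

    // Empty result means "cannot translate"; the caller decides whether to quarantine.
    static Optional<CaseType> translate(String legacyCode) {
        if (legacyCode == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(KNOWN.get(legacyCode.trim()));
    }
}
```

The domain only ever sees `CaseType`. The string `"99"` stops at this class.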

12.1 ACL Responsibilities

translate names
translate codes
normalize dates
validate missing legacy fields
map old status values to current workflow states
preserve raw payload for audit if needed
quarantine untranslatable records
emit structured errors

12.2 ACL Should Not Become a Dumping Ground

Bad ACL:

10,000-line mapper with every integration rule in one class.

Better ACL structure:

legacy-case-adapter/
  xsd-generated/
  mapper/
    LegacyCaseMapper.java
    LegacyPartyMapper.java
    LegacyEvidenceMapper.java
  code/
    LegacyStatusTranslator.java
    LegacyCategoryTranslator.java
  validation/
    LegacyCaseSemanticValidator.java
  quarantine/
    LegacyPayloadQuarantineService.java

13. Package Structure in Java

A clean Java module layout makes boundaries visible.

case-service/
  src/main/java/com/example/caseapp/
    domain/
      CaseFile.java
      CaseStatus.java
      CaseType.java
      EvidenceItem.java
      value/
        CaseId.java
        ApplicantId.java
    application/
      command/
        CreateCaseCommand.java
        CloseCaseCommand.java
      service/
        CaseApplicationService.java
      port/
        CaseRepository.java
        CaseEventPublisher.java
    adapter/
      http/
        generated/          # OpenAPI generated DTO/interfaces
        mapper/
          CaseApiMapper.java
        resource/
          CaseResource.java
        error/
          ProblemMapper.java
      persistence/
        row/
          CaseRow.java
          EvidenceRow.java
        mapper/
          CasePersistenceMapper.java
        repository/
          JdbcCaseRepository.java
      event/
        avro/               # Avro generated classes
        mapper/
          CaseEventMapper.java
        publisher/
          KafkaCaseEventPublisher.java
      legacyxml/
        generated/          # JAXB generated classes
        mapper/
          LegacyCaseMapper.java

Dependencies should point inward: adapters depend on the application layer, and the application layer depends on the domain.

Domain must not depend on adapters.

14. Generated Models: Keep Them at the Edge

Generated models are useful.

They are also dangerous if allowed into the core.

Generated code may change because:

generator version changed
schema changed
naming option changed
validation option changed
runtime dependency changed
nullable handling changed
enum representation changed

Therefore:

Generated models live at the boundary.
Domain models live in the core.
Mapping connects them.

14.1 Bad Dependency

Domain service accepts CreateCaseRequestDto.

Now your domain depends on OpenAPI.

14.2 Good Dependency

HTTP adapter accepts CreateCaseRequestDto.
HTTP mapper converts it to CreateCaseCommand.
Application service accepts CreateCaseCommand.

This keeps OpenAPI changes away from domain code.

15. Canonical Type Library

A good compromise is a canonical type library, not a canonical object model.

Example module:

contract-types/
  Money.schema.json
  Money.avsc
  money.proto
  common-openapi.yaml
  java/
    Money.java
    ExternalReference.java
    CorrelationId.java

Use it for stable low-level concepts:

money
timestamp
correlation ID
tenant ID
pagination metadata
problem details
audit metadata

Do not use it for giant domain aggregates.

15.1 Contract Type Example: Money

OpenAPI:

Money:
  type: object
  required:
    - amount
    - currency
  properties:
    amount:
      type: string
      pattern: "^-?\\d+\\.\\d{2}$"
      example: "123.45"
    currency:
      type: string
      minLength: 3
      maxLength: 3
      example: "USD"

Avro:

{
  "type": "record",
  "name": "Money",
  "namespace": "com.example.contract.common",
  "fields": [
    { "name": "amount", "type": { "type": "bytes", "logicalType": "decimal", "precision": 18, "scale": 2 } },
    { "name": "currency", "type": "string" }
  ]
}

Protobuf:

message Money {
  string amount = 1;
  string currency = 2;
}

The representation differs by format, but the semantic contract is shared.
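
On the Java side, the per-format conversions can sit next to the value object. The helper class `MoneyWireFormats` is hypothetical. The byte conversion follows the Avro decimal logical type, which stores the unscaled value as big-endian two's-complement bytes; it uses only the JDK, not the Avro library.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Currency;

final class MoneyWireFormats {
    private static final int SCALE = 2; // matches "scale": 2 in the Avro schema above

    private MoneyWireFormats() {}

    // OpenAPI / Protobuf: decimal string with exactly two fraction digits.
    static String toDecimalString(BigDecimal amount) {
        return amount.setScale(SCALE).toPlainString(); // throws if rounding would be needed
    }

    static BigDecimal fromDecimalString(String wire) {
        if (!wire.matches("-?\\d+\\.\\d{2}")) {
            throw new IllegalArgumentException("Amount must match -?digits.dd: " + wire);
        }
        return new BigDecimal(wire);
    }

    // Avro decimal: big-endian two's-complement bytes of the unscaled value.
    static byte[] toAvroDecimalBytes(BigDecimal amount) {
        return amount.setScale(SCALE).unscaledValue().toByteArray();
    }

    static BigDecimal fromAvroDecimalBytes(byte[] bytes) {
        return new BigDecimal(new BigInteger(bytes), SCALE);
    }

    static Currency currency(String code) {
        return Currency.getInstance(code); // rejects codes unknown to the JDK
    }
}
```

Neither path goes through `double`, so the amount survives a round trip exactly.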

16. Case Study: Regulatory Case Management

Suppose the platform supports:

external case intake API
internal case workflow
Kafka events
PostgreSQL storage
legacy XML import
reporting lake

The same business concept appears in several models.

16.1 OpenAPI Create Request

{
  "applicantId": "APP-123",
  "caseType": "BENEFIT_REVIEW",
  "externalReference": "PORTAL-REQ-987"
}

Purpose:

Consumer asks the platform to create a case.

16.2 Domain Aggregate

CaseFile
- id
- applicantId
- caseType
- status
- evidenceItems
- assignedOfficer
- decision
- state transition methods

Purpose:

Enforce lifecycle invariants.

16.3 Storage Rows

regulatory_case
case_evidence
case_assignment
case_decision
case_audit_log

Purpose:

Persist and query data efficiently.

16.4 Avro Event

CaseCreated
- eventId
- occurredAt
- caseId
- applicantId
- caseType
- sourceChannel

Purpose:

Notify downstream systems of a historical fact.

16.5 Legacy XML Model

<LegacyComplaint>
  <ComplaintNo>LC-7788</ComplaintNo>
  <Category>99</Category>
  <ReceivedDate>03/07/2026</ReceivedDate>
</LegacyComplaint>

Purpose:

Import old-system data into modern domain language.

16.6 Reporting Model

case_daily_snapshot
- snapshot_date
- case_id
- status
- age_days
- region
- assigned_team

Purpose:

Support dashboard and regulatory reporting.

Trying to make all of these one class is a category error.

17. Mapping Failure Modes

Mapping creates a place to catch failure.

Common failure modes:

Failure	Example	Handling
Unknown code	Legacy category `99X`	Quarantine or map to `UNKNOWN` with warning
Invalid date	`31/02/2026`	Reject payload with structured validation error
Precision loss	BigDecimal to double	Never use double for money
Missing required field	no applicant ID	Reject at boundary
Unsupported enum	new external status	Preserve raw value, map to unknown, alert
Timezone ambiguity	local date-time without zone	Require explicit zone or map by source policy
ID collision	external reference not unique	Scope by source system/tenant
PII leakage	internal notes in response	Response mapper must whitelist fields

Mapping is not just transformation.

Mapping is controlled semantic translation.
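
The "Invalid date" row is a good example of why the mapper needs an explicit policy. The sketch below assumes the legacy source writes day/month/year; `LegacyDateParser` is a hypothetical name. With the default lenient-ish resolver, `java.time` would quietly adjust `31/02/2026` to the end of February, so the formatter is switched to strict resolution.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

final class LegacyDateParser {
    // Assumed source policy: the legacy system writes day/month/year.
    // "uuuu" (not "yyyy") is needed for STRICT resolution without an era field.
    private static final DateTimeFormatter LEGACY_DATE =
            DateTimeFormatter.ofPattern("dd/MM/uuuu").withResolverStyle(ResolverStyle.STRICT);

    private LegacyDateParser() {}

    static LocalDate parseReceivedDate(String raw) {
        try {
            return LocalDate.parse(raw, LEGACY_DATE);
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("ReceivedDate is not a valid dd/MM/yyyy date: " + raw, e);
        }
    }
}
```

The same input `03/07/2026` would mean 7 March under a month-first policy, which is why the format belongs to the source-specific mapper and not to a shared utility.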

18. Testing Model Boundaries

Test each mapper as a contract boundary.

18.1 API Mapper Tests

[ ] Required request fields become value objects.
[ ] Defaults are assigned consistently.
[ ] Client cannot override server-owned fields.
[ ] Unknown enum values are handled according to policy.
[ ] Invalid values produce structured errors.

18.2 Persistence Mapper Tests

[ ] Row rehydrates valid aggregate.
[ ] Invalid persisted status is detected.
[ ] Version is preserved.
[ ] Soft-deleted rows are filtered by repository policy.
[ ] Decimal/time values round-trip safely.

18.3 Event Mapper Tests

[ ] Event contains stable identifiers.
[ ] Event does not expose internal-only fields.
[ ] Event time is source-of-truth occurrence time.
[ ] Event schema defaults are respected.
[ ] Generated event passes schema validation.

18.4 Legacy Mapper Tests

[ ] Known legacy codes translate correctly.
[ ] Unknown codes are quarantined or mapped safely.
[ ] Date formats are parsed by source-specific rule.
[ ] Raw payload reference is preserved for audit.
[ ] Mapping errors are observable.

19. Model Boundary Decision Framework

When introducing a new model, ask:

1. Who owns this model?
2. Who consumes it?
3. Is it internal or external?
4. What compatibility rules apply?
5. Is it generated or handwritten?
6. What invariant does it enforce?
7. What invariant does it intentionally not enforce?
8. Can it contain sensitive data?
9. How long must it remain readable?
10. What happens if a field is removed, renamed, or retyped?

If two models have different answers, they should probably be separate.

20. Practical Heuristics

Heuristic 1: Generated Models Do Not Enter the Domain

Generated DTOs stay in adapters.

Heuristic 2: Database Rows Do Not Leave the Service

Never return database entities directly from API controllers.

Heuristic 3: Events Are Facts, Not CRUD Snapshots by Default

Name events by business occurrence.

Heuristic 4: Canonicalize Value Types, Not Whole Enterprise Objects

Shared Money is good.

Shared EnterpriseCaseWithEverything is dangerous.

Heuristic 5: Mapping Is Where You Pay for Decoupling

Do not remove mapping just because it feels repetitive.

Heuristic 6: Whitelist External Responses

Never serialize internal objects and hope annotations hide sensitive fields.

Heuristic 7: Keep Storage Migration Independent from API Evolution

Database schema can change without forcing API contract change.

API contract can evolve without forcing immediate database shape change.

Heuristic 8: Preserve Raw External Payloads When Audit Matters

For regulatory-grade imports, keep raw payload reference/hash so mapping decisions are defensible.
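
A payload hash can be computed with the JDK alone. `RawPayloadReference` and its fields are hypothetical, and SHA-256 is an assumed choice of digest. `HexFormat` requires Java 17 or later.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Hypothetical audit reference stored next to the mapped record.
record RawPayloadReference(String sourceSystem, String sha256Hex, int sizeBytes) {

    static RawPayloadReference of(String sourceSystem, byte[] rawPayload) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            String hex = HexFormat.of().formatHex(digest.digest(rawPayload));
            return new RawPayloadReference(sourceSystem, hex, rawPayload.length);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }
}
```

Hash the bytes exactly as received, before any parsing or re-encoding, so the reference still matches the archived original.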

21. Production Checklist

Model Inventory
[ ] Domain models are handwritten and annotation-light.
[ ] Generated contract models stay at boundaries.
[ ] Storage models are not exposed externally.
[ ] Event models are explicit and versioned.
[ ] Legacy models are isolated behind ACL.

Mapping
[ ] Request-to-command mapping injects trusted context.
[ ] Domain-to-response mapping whitelists fields.
[ ] Domain-to-event mapping avoids internal leakage.
[ ] Storage-to-domain mapping revalidates invariants.
[ ] Mapping failures are observable and testable.

Canonicalization
[ ] Shared value types are standardized.
[ ] Giant enterprise canonical object is avoided.
[ ] Controlled vocabularies have ownership.
[ ] Unknown-value strategy is defined.

Security
[ ] PII fields are classified.
[ ] Sensitive internal fields cannot accidentally serialize.
[ ] Audit data is separated from public response data.
[ ] Raw legacy payload retention follows policy.

Evolution
[ ] API model can evolve independently from DB schema.
[ ] Event schema compatibility is checked.
[ ] Protobuf field numbers are stable.
[ ] Avro defaults and aliases are reviewed.
[ ] Mapping tests cover old and new contract versions.

22. Exercise

Take a domain concept:

EnforcementAction

Design separate models for:

CreateEnforcementActionRequest OpenAPI DTO
EnforcementAction domain aggregate/entity
enforcement_action PostgreSQL row
EnforcementActionCreated Avro event
EnforcementActionMessage Protobuf RPC message
legacy XSD import model
dashboard read model

For each model, write:

Purpose:
Owner:
Consumers:
Required fields:
Optional fields:
Forbidden fields:
Versioning rules:
Security classification:
Mapping rules:
Failure modes:

Then answer:

Which fields are shared by all models?
Which fields exist only in storage?
Which fields exist only in API response?
Which fields exist only in event history?
Which fields must never cross the external boundary?

This exercise forces the central insight:

A concept can be shared without sharing one physical model.

23. Final Mental Model

Do not ask:

Can we reuse the same Java class?

Ask:

Do these boundaries have the same owner, lifecycle, compatibility rule, security policy, and invariant set?

If the answer is no, separate the models.

Use mapping deliberately.

A top-tier data contract engineer is not someone who avoids DTOs.

A top-tier data contract engineer knows which representation belongs to which boundary, which invariants live there, and how to evolve each one without corrupting the others.

That is the essence of contract engineering across XSD, JSON Schema, Avro, Protobuf, OpenAPI, Java, and storage.

References

OpenAPI Specification 3.2.0 — https://spec.openapis.org/oas/v3.2.0.html
Apache Avro 1.12.0 Specification — https://avro.apache.org/docs/1.12.0/specification/
Protocol Buffers Language Guide — https://protobuf.dev/programming-guides/proto3/
Protocol Buffers Editions Overview — https://protobuf.dev/editions/overview/
JSON Schema Draft 2020-12 — https://json-schema.org/draft/2020-12
W3C XML Schema 1.1 — https://www.w3.org/TR/xmlschema11-1/
