Schema Registry Architecture: Subject Naming, Artifact Identity, Compatibility Rules, and Access Control
Learn Java API Contract Engineering, Event Contract Engineering & Schema Governance - Part 022
Schema registry architecture for Java event platforms: subject naming, artifact identity, compatibility rules, references, access control, environment promotion, metadata, and registry governance.
Learning Objectives
A schema registry is one of the most important control points in an event-driven architecture. Yet many organizations treat it only as a place to store Avro schema IDs. That is far too narrow.
A mature schema registry is:
a governance system for storing, identifying, validating, evolving, distributing, securing, and auditing schema/API artifacts.
This part covers schema registry architecture from the perspective of a Java enterprise event platform.
After this part, you should be able to:
- explain the role of the schema registry in the event contract lifecycle;
- distinguish subject, artifact, group, version, global ID, content ID, and schema reference;
- choose a subject/artifact naming strategy;
- design compatibility rules at global, group, and artifact level;
- manage Avro, Protobuf, JSON Schema, OpenAPI, and AsyncAPI as artifacts;
- design promotion across environments;
- manage access control and ownership;
- integrate the registry with Java producers and consumers;
- build CI gates for the registry;
- avoid registry anti-patterns such as subject chaos, compatibility NONE, and environment drift.
1. What Schema Registry Actually Does
Schema registry responsibilities:
- stores schema/artifact content;
- assigns identifiers;
- versions artifacts;
- validates syntax/content;
- enforces compatibility rules;
- resolves references;
- serves schemas to serializers/deserializers;
- supports discovery;
- records metadata;
- controls access;
- supports promotion and audit;
- integrates with CI/CD.
A registry is not just a runtime dependency. It is governance infrastructure.
2. Registry Concepts
Different products use different terms for the same ideas. Learn the concepts, not one vendor's vocabulary.
| Concept | Meaning |
|---|---|
| subject/artifact | logical identity under which versions are stored |
| group | namespace/collection of artifacts |
| version | version of artifact under identity |
| global ID | registry-wide unique identifier for artifact version/content |
| content ID | identifier for identical content |
| schema ID | commonly used runtime ID embedded in payload/header |
| compatibility rule | policy for whether new version is allowed |
| validity rule | schema syntax/content validation |
| references | schema depends on another schema |
| metadata | owner, lifecycle, labels, description |
| registry API | management and runtime access interface |
Example:
group: case
artifactId: com.acme.case.events.CaseApproved
version: 7
globalId: 18492
compatibility: BACKWARD_TRANSITIVE
format: AVRO
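The distinction between global ID and content ID is easier to see in code. This minimal in-memory sketch assumes a registry that gives every registered version a fresh global ID but reuses a single content ID for byte-identical content. `InMemoryRegistry` and its methods are hypothetical and are not any product's API. Real registries usually canonicalize content before hashing. Many also return the existing version, rather than creating a new one, when identical content is re-registered under the same subject.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory registry illustrating global IDs vs content IDs.
final class InMemoryRegistry {

    record RegisteredVersion(String artifactId, int version, long globalId, long contentId) {}

    private final Map<String, Long> contentIdsByHash = new HashMap<>();
    private final Map<String, List<RegisteredVersion>> versionsByArtifact = new HashMap<>();
    private long nextGlobalId = 1;
    private long nextContentId = 1;

    RegisteredVersion register(String artifactId, String content) {
        // Identical content anywhere in the registry shares one content ID.
        long contentId = contentIdsByHash.computeIfAbsent(sha256(content), h -> nextContentId++);
        List<RegisteredVersion> versions =
                versionsByArtifact.computeIfAbsent(artifactId, a -> new ArrayList<>());
        // Every registered version gets a fresh, registry-wide global ID.
        RegisteredVersion registered =
                new RegisteredVersion(artifactId, versions.size() + 1, nextGlobalId++, contentId);
        versions.add(registered);
        return registered;
    }

    static String sha256(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(content.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

If you register the same schema text under two artifacts, you get one content ID and two global IDs.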
3. Runtime Flow with Schema ID
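Typical Kafka schema registry flow:
- the producer serializer resolves the writer schema to a schema ID (by lookup, or by registration if auto-registration is enabled);
- the serializer writes the schema ID into the message, as a payload prefix or a header, together with the encoded data;
- the consumer deserializer reads the schema ID and fetches the writer schema from the registry, usually caching it;
- the deserializer decodes the payload with the writer schema, resolving it against its own reader schema where the format supports that.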
Key point:
The schema ID in a message must be enough for the consumer's deserializer to find the writer schema.
If consumers cannot resolve old schema IDs during replay, the event history becomes unreadable.
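Here is one concrete way the ID gets embedded. Confluent's Avro serializer prefixes each payload with a magic byte `0`, followed by the 4-byte big-endian schema ID. Other registries and formats frame messages differently. Confluent's Protobuf framing, for instance, adds message indexes, and some registries put the ID in a header instead. Treat this as a sketch of that single convention. `WireFormat` is a hypothetical helper, not a library class.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper for Confluent-style Avro framing: [magic 0x0][4-byte schema ID][payload].
final class WireFormat {

    private static final byte MAGIC_BYTE = 0x0;
    private static final int HEADER_LENGTH = 1 + Integer.BYTES;

    record Framed(int schemaId, byte[] payload) {}

    static byte[] frame(int schemaId, byte[] payload) {
        return ByteBuffer.allocate(HEADER_LENGTH + payload.length)
                .put(MAGIC_BYTE)
                .putInt(schemaId) // ByteBuffer is big-endian by default
                .put(payload)
                .array();
    }

    static Framed unframe(byte[] message) {
        if (message.length < HEADER_LENGTH || message[0] != MAGIC_BYTE) {
            throw new IllegalArgumentException("Not a schema-registry framed message");
        }
        ByteBuffer buffer = ByteBuffer.wrap(message);
        buffer.get(); // skip the magic byte
        int schemaId = buffer.getInt();
        // The consumer must resolve this ID in the registry to obtain the writer schema.
        return new Framed(schemaId, Arrays.copyOfRange(message, HEADER_LENGTH, message.length));
    }
}
```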
4. Registry as Design-Time and Runtime System
4.1 Design-Time
Used by:
- contract reviews;
- CI compatibility checks;
- schema linting;
- artifact metadata;
- promotion workflow;
- catalog generation;
- governance approvals.
4.2 Runtime
Used by:
- producer serializers;
- consumer deserializers;
- generic processors;
- DLQ tooling;
- replay tools;
- data lake ingestion;
- schema-aware gateways.
Design-time registry access can be stricter than runtime read access.
5. Artifact Types
A mature registry may store more than Avro:
| Artifact type | Use |
|---|---|
| Avro schema | Kafka/event payload |
| Protobuf schema | Kafka/gRPC/message payload |
| JSON Schema | JSON events, APIs, configs |
| OpenAPI | HTTP API contract |
| AsyncAPI | message-driven API contract |
| GraphQL schema | graph API contract |
| XML Schema | legacy/batch contracts |
| Kafka Connect schema | connector payload |
| WSDL | SOAP legacy systems |
Even if your runtime serializer only uses Avro/Protobuf, storing OpenAPI/AsyncAPI documents in the registry or catalog can unify governance.
6. Subject / Artifact Naming Strategy
Naming strategy is one of the most important registry architecture decisions.
Bad names:
test
value
case-events-value
schema1
customer
new-schema
Good names:
com.acme.case.events.CaseApproved
com.acme.customer.events.CustomerRegistered
com.acme.common.Money
case.CaseApproved
customer.CustomerRegistered
6.1 Topic-Name Strategy
Subject tied to topic:
case-events-value
case-events-key
Pros:
- simple;
- common default in some ecosystems;
- good for one value schema per topic;
- topic-level compatibility.
Cons:
- poor for multi-event topic;
- unrelated event types share compatibility scope;
- record reuse across topics harder;
- subject identity coupled to topic name;
- topic migration changes schema identity.
Use when:
- one schema per topic;
- simple event streams;
- topic is true compatibility boundary.
6.2 Record-Name Strategy
Subject tied to record/message full name:
com.acme.case.events.CaseApproved
Pros:
- good for multi-event topics;
- schema identity follows event type;
- record reuse possible;
- compatibility per event type.
Cons:
- same record on different topics shares compatibility boundary;
- topic-specific constraints not captured;
- governance must track topic separately.
Use when:
- domain topic carries multiple event types;
- event type is compatibility boundary;
- schema reused across topics intentionally.
6.3 Topic-Record Strategy
Subject tied to topic and record:
case-events-com.acme.case.events.CaseApproved
Pros:
- supports multi-event topic;
- separates same record used in different topics;
- compatibility boundary includes topic context.
Cons:
- longer names;
- duplicate schema versions across topics;
- more subjects.
Use when:
- same record may evolve differently per topic;
- topic context matters;
- governance wants strong isolation.
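The three strategies differ only in how they derive the subject string. This sketch follows the naming behavior commonly associated with Confluent's `TopicNameStrategy`, `RecordNameStrategy`, and `TopicRecordNameStrategy`. The `SubjectNames` class itself is hypothetical, and you should confirm the exact rules against your serializer's documentation.

```java
// Hypothetical helper showing how each strategy derives the subject name.
final class SubjectNames {

    enum Strategy { TOPIC_NAME, RECORD_NAME, TOPIC_RECORD }

    static String subject(Strategy strategy, String topic, String recordFullName, boolean isKey) {
        return switch (strategy) {
            // One subject per topic side: the compatibility scope is the topic.
            case TOPIC_NAME -> topic + (isKey ? "-key" : "-value");
            // One subject per record type, shared by every topic that carries it.
            case RECORD_NAME -> recordFullName;
            // One subject per (topic, record) pair: the same record is isolated per topic.
            case TOPIC_RECORD -> topic + "-" + recordFullName;
        };
    }
}
```

With three event types on `case-events`, `TOPIC_NAME` yields a single `case-events-value` subject for all of them. That is why it is weak for multi-event topics.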
6.4 Artifact Group Strategy
If registry supports groups/namespaces:
group: case
artifactId: com.acme.case.events.CaseApproved
or:
group: acme.case.events
artifactId: CaseApproved
This improves organization and access control.
7. Key Schema vs Value Schema
A Kafka record can have both a key schema and a value schema.
The key schema matters when the key is structured.
Example key:
{
"tenantId": "tenant_123",
"caseId": "case_123"
}
If the key is structured and serialized with a schema, govern that schema too.
Subject examples:
case-events-key
case-events-value
or:
com.acme.case.keys.CaseEventKey
com.acme.case.events.CaseApproved
Key schema changes can break partitioning, ordering, and compaction.
Registry compatibility checks on the value schema do not protect the key contract unless the key schema is governed as well.
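Key changes are dangerous because Kafka's default partitioner and log compaction both operate on the serialized key bytes, not on the logical key. This minimal sketch assumes a JSON-encoded key, and `KeyBytes` is hypothetical. It shows that merely reordering fields produces different bytes. The same logical case would then count as a different compaction key and could hash to a different partition.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class KeyBytes {

    // Hypothetical JSON encodings of the same logical key with different field order.
    static byte[] tenantFirst(String tenantId, String caseId) {
        return ("{\"tenantId\":\"" + tenantId + "\",\"caseId\":\"" + caseId + "\"}")
                .getBytes(StandardCharsets.UTF_8);
    }

    static byte[] caseFirst(String tenantId, String caseId) {
        return ("{\"caseId\":\"" + caseId + "\",\"tenantId\":\"" + tenantId + "\"}")
                .getBytes(StandardCharsets.UTF_8);
    }

    // Kafka compares keys byte-for-byte, so logically equal keys can still differ.
    static boolean sameKafkaKey(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }
}
```

Registry-framed keys behave the same way. If the key is serialized with a schema-ID prefix, switching to a new key schema version changes the embedded ID, and therefore the key bytes, even when the key values are identical.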
8. Compatibility Rule Levels
Registry products often support rules at different scopes.
Possible levels:
- global default;
- group/namespace;
- artifact/subject;
- version-specific metadata;
- environment-specific policy.
Precedence depends on the product. Architecturally, a more specific rule should override a general one only with governance approval.
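Here is a minimal sketch of one possible precedence model: an artifact rule overrides a group rule, and a group rule overrides the global default. It assumes rules are held in plain maps. Real precedence is product-specific, and `CompatibilityResolver` is hypothetical.

```java
import java.util.Map;
import java.util.Optional;

final class CompatibilityResolver {

    private final String globalDefault;
    private final Map<String, String> groupRules;
    private final Map<String, String> artifactRules;

    CompatibilityResolver(String globalDefault, Map<String, String> groupRules,
                          Map<String, String> artifactRules) {
        this.globalDefault = globalDefault;
        this.groupRules = groupRules;
        this.artifactRules = artifactRules;
    }

    // The most specific rule wins: artifact, then group, then the global default.
    String effectiveRule(String groupId, String artifactId) {
        return Optional.ofNullable(artifactRules.get(artifactId))
                .or(() -> Optional.ofNullable(groupRules.get(groupId)))
                .orElse(globalDefault);
    }
}
```

An override here is just a map entry. The governance requirement therefore has to live wherever `artifactRules` is maintained, typically a reviewed change to a policy file.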
8.1 Global Rule
Example:
globalCompatibility: BACKWARD
Good baseline. Dangerous if too permissive or disabled.
8.2 Group Rule
Example:
group: case
compatibility: BACKWARD_TRANSITIVE
Good for domain-level consistency.
8.3 Artifact Rule
Example:
artifact: com.acme.case.events.CaseApproved
compatibility: FULL_TRANSITIVE
For a high-risk event.
8.4 Exception Rule
Exceptions must be auditable.
artifact: com.acme.case.events.LegacyCaseUpdated
compatibility: NONE
reason: Legacy migration artifact. Producer and consumers locked to migration window.
expiresAt: 2026-12-31
approvedBy: architecture-review-board
No permanent silent NONE.
9. Compatibility Modes
Common modes:
| Mode | Meaning |
|---|---|
| NONE | no compatibility check |
| BACKWARD | new schema can read data written with the latest previous version |
| BACKWARD_TRANSITIVE | new schema can read data written with all previous versions |
| FORWARD | latest previous schema can read data written with the new schema |
| FORWARD_TRANSITIVE | all previous schemas can read data written with the new schema |
| FULL | backward + forward against the latest previous version |
| FULL_TRANSITIVE | backward + forward against all previous versions |
Choose based on event lifecycle.
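The table comes down to two questions: which previous versions to check, and in which direction. Here is a minimal sketch, assuming previous versions are ordered oldest first. `CompatibilityPlan` is hypothetical.

```java
import java.util.List;

final class CompatibilityPlan {

    enum Mode {
        NONE, BACKWARD, BACKWARD_TRANSITIVE, FORWARD, FORWARD_TRANSITIVE, FULL, FULL_TRANSITIVE
    }

    // Previous versions the candidate must be checked against (oldest first).
    static <T> List<T> versionsToCheck(Mode mode, List<T> previous) {
        if (mode == Mode.NONE || previous.isEmpty()) {
            return List.of();
        }
        boolean transitive = mode.name().endsWith("_TRANSITIVE");
        // Non-transitive modes compare only against the latest registered version.
        return transitive ? List.copyOf(previous) : List.of(previous.get(previous.size() - 1));
    }

    // BACKWARD direction: the new schema reads data written with old versions.
    static boolean checksNewReadsOld(Mode mode) {
        return mode.name().startsWith("BACKWARD") || mode.name().startsWith("FULL");
    }

    // FORWARD direction: old versions read data written with the new schema.
    static boolean checksOldReadsNew(Mode mode) {
        return mode.name().startsWith("FORWARD") || mode.name().startsWith("FULL");
    }
}
```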
9.1 Recommended Defaults
| Artifact class | Suggested mode |
|---|---|
| stable event stream | BACKWARD_TRANSITIVE or FULL_TRANSITIVE |
| internal short-lived event | BACKWARD |
| command message | BACKWARD or FULL, depending on old consumers |
| compacted snapshot | BACKWARD_TRANSITIVE often useful |
| public/partner schema | FULL_TRANSITIVE or strict custom |
| experimental schema | BACKWARD with expiry or isolated group |
| legacy migration | exception with time-bound NONE only if unavoidable |
Do not blindly use one mode for everything. But never leave critical streams as NONE.
10. Format-Specific Compatibility
Compatibility is format-specific.
10.1 Avro
Important factors:
- reader/writer schema resolution;
- defaults;
- unions;
- aliases;
- enum symbols/defaults;
- logical types;
- namespace/name.
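One Avro rule fits in a few lines. Under BACKWARD compatibility, any field in the reader (new) schema that is absent from the writer (old) schema must declare a default, or decoding old data fails. This sketch checks only that rule, using a hypothetical `Field` model. Full checks, such as Avro's `SchemaCompatibility` class or the registry's own checker, also cover type promotion, unions, enums, and aliases.

```java
import java.util.ArrayList;
import java.util.List;

final class AvroDefaultRule {

    // Hypothetical simplified field model: name plus whether a default is declared.
    record Field(String name, boolean hasDefault) {}

    // BACKWARD check (new reader, old writer): fields the reader adds need defaults.
    static List<String> backwardViolations(List<Field> writerFields, List<Field> readerFields) {
        List<String> violations = new ArrayList<>();
        for (Field field : readerFields) {
            boolean inWriter = writerFields.stream().anyMatch(w -> w.name().equals(field.name()));
            if (!inWriter && !field.hasDefault()) {
                violations.add("Added field '" + field.name() + "' has no default");
            }
        }
        // Fields removed by the reader are simply skipped when decoding, so they pass here.
        return violations;
    }
}
```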
10.2 Protobuf
Important factors:
- field number reuse;
- reserved numbers/names;
- wire type compatibility;
- enum number reuse;
- oneof changes;
- service method changes, if the registry checks services in .proto files;
- package changes.
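Field-number reuse is the classic Protobuf break, because the wire format identifies fields only by number. This minimal sketch assumes each message is modeled as a map from field number to declared type, plus the set of reserved numbers. `ProtoFieldCheck` is hypothetical and ignores reserved names, oneofs, enums, and nested messages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simplified check: fields are modeled as field number -> declared type.
final class ProtoFieldCheck {

    static List<String> violations(Map<Integer, String> oldFields, Map<Integer, String> newFields,
                                   Set<Integer> newReserved) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<Integer, String> entry : oldFields.entrySet()) {
            int number = entry.getKey();
            String newType = newFields.get(number);
            if (newType == null && !newReserved.contains(number)) {
                // Removed without "reserved": a later edit could silently reuse the number.
                problems.add("Field " + number + " removed but not reserved");
            } else if (newType != null && !newType.equals(entry.getValue())) {
                // Same number, different type: treat as breaking (few type pairs are wire-compatible).
                problems.add("Field " + number + " changed type: " + entry.getValue() + " -> " + newType);
            }
        }
        for (int number : newFields.keySet()) {
            if (newReserved.contains(number)) {
                problems.add("Field " + number + " uses a reserved number");
            }
        }
        return problems;
    }
}
```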
10.3 JSON Schema
Important factors:
- type changes;
- required properties;
- additionalProperties;
- enum changes;
- constraint tightening;
- composition changes;
- nullability.
A passing registry compatibility check means "compatible under this registry's algorithm." It does not guarantee semantic compatibility.
11. Schema References
Schemas often reuse common types.
Avro example:
{
"type": "record",
"name": "CaseApproved",
"namespace": "com.acme.case.events",
"fields": [
{
"name": "metadata",
"type": "com.acme.common.EventMetadata"
}
]
}
Protobuf import:
import "acme/events/v1/event_metadata.proto";
JSON Schema $ref:
{
"$ref": "https://schemas.acme.com/common/EventMetadata.schema.json"
}
11.1 Registry Reference Requirements
Registry should support:
- registering referenced schemas;
- resolving references during serialization/deserialization;
- versioning references;
- checking compatibility with references;
- preventing deletion of referenced artifact;
- showing dependency graph.
11.2 Common Schema Blast Radius
Changing common schema like EventMetadata or Money affects many artifacts.
Governance:
- stricter review;
- compatibility tests across dependents;
- semantic versioning;
- deprecation path;
- dependency graph;
- consumer communication.
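The dependency graph that impact analysis needs is a reverse-reference traversal. This minimal sketch assumes references are available as a map from each artifact to the artifacts it references. `BlastRadius` and the short artifact names in the usage are illustrative.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class BlastRadius {

    // Returns every artifact that directly or transitively references the changed one.
    static Set<String> dependents(String changed, Map<String, List<String>> referencesByArtifact) {
        // Invert "A references B" into "B is referenced by A".
        Map<String, List<String>> referencedBy = new HashMap<>();
        referencesByArtifact.forEach((artifact, refs) ->
                refs.forEach(ref -> referencedBy.computeIfAbsent(ref, r -> new ArrayList<>()).add(artifact)));

        Set<String> impacted = new TreeSet<>();
        Deque<String> queue = new ArrayDeque<>(List.of(changed));
        while (!queue.isEmpty()) {
            for (String dependent : referencedBy.getOrDefault(queue.poll(), List.of())) {
                if (impacted.add(dependent)) {
                    queue.add(dependent); // visit each dependent once, even with shared references
                }
            }
        }
        return impacted;
    }
}
```

Changing a low-level type such as `TraceContext` inside `EventMetadata` reaches every event that embeds `EventMetadata`.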
12. Environment Strategy
Environments:
dev
test
staging
prod
Bad strategy:
Each environment has unrelated schema IDs and manual schema edits.
Better:
Schemas are promoted as immutable artifacts from lower to higher environments.
12.1 Promotion Principles
- schema content immutable once approved;
- same artifact identity across environments;
- version metadata preserved;
- compatibility checked before promotion;
- prod registry not edited manually;
- rollback strategy defined;
- old schemas retained for replay;
- environment-specific IDs handled carefully.
12.2 Same IDs Across Environments?
Some registries generate IDs per environment. Do not require numeric global IDs to match unless the platform guarantees it.
Instead rely on:
- artifactId/subject;
- version;
- content hash;
- metadata;
- promotion records.
Runtime messages in prod use prod registry IDs.
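Numeric IDs may differ between environments, so a promotion check can compare artifact identity, version, and content hash instead. This minimal sketch assumes each environment's registry contents have been exported to a map keyed by `artifactId:version`. `PromotionCheck` is hypothetical. Real tooling should hash a canonical form of the schema rather than raw text.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.List;
import java.util.Map;

final class PromotionCheck {

    // Compares schemas by "artifactId:version" and content hash; numeric IDs are ignored.
    static List<String> drift(Map<String, String> source, Map<String, String> target) {
        List<String> findings = new ArrayList<>();
        source.forEach((key, content) -> {
            String promoted = target.get(key);
            if (promoted == null) {
                findings.add(key + " missing in target");
            } else if (!hash(content).equals(hash(promoted))) {
                findings.add(key + " content differs (drift)");
            }
        });
        return findings;
    }

    static String hash(String content) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Before a promotion, "missing" entries are simply the versions about to be promoted. A "content differs" entry should block the promotion, because the same version number would then mean different schemas in two environments.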
13. Registry Access Control
Access levels:
| Role | Permission |
|---|---|
| platform admin | manage registry config |
| schema owner | create/update artifacts in owned group |
| producer runtime | register/lookup schema if allowed |
| consumer runtime | read schemas by ID |
| auditor | read metadata/history |
| catalog service | read all approved metadata |
| CI service | validate/register/publish under controlled identity |
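The role table can be encoded directly, which makes it testable. This minimal sketch assumes group-scoped ownership for schema owners and lookup-only producers, following the recommendation in 13.1. The role and permission names are hypothetical and would normally map onto your registry's own RBAC or ACL model.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

final class RegistryAccess {

    enum Permission { MANAGE_CONFIG, WRITE_ARTIFACT, REGISTER_VERSION, READ_SCHEMA, READ_METADATA }

    enum Role {
        PLATFORM_ADMIN, SCHEMA_OWNER, PRODUCER_RUNTIME, CONSUMER_RUNTIME, AUDITOR, CATALOG_SERVICE, CI_SERVICE
    }

    private static final Map<Role, Set<Permission>> GRANTS = Map.of(
            Role.PLATFORM_ADMIN, EnumSet.allOf(Permission.class),
            Role.SCHEMA_OWNER, EnumSet.of(Permission.WRITE_ARTIFACT, Permission.READ_SCHEMA, Permission.READ_METADATA),
            // Lookup only: production producers do not register schemas.
            Role.PRODUCER_RUNTIME, EnumSet.of(Permission.READ_SCHEMA),
            Role.CONSUMER_RUNTIME, EnumSet.of(Permission.READ_SCHEMA),
            Role.AUDITOR, EnumSet.of(Permission.READ_METADATA),
            Role.CATALOG_SERVICE, EnumSet.of(Permission.READ_METADATA),
            Role.CI_SERVICE, EnumSet.of(Permission.REGISTER_VERSION, Permission.READ_SCHEMA, Permission.READ_METADATA));

    // Owners may write only within their own group; other grants are not group-scoped here.
    static boolean allowed(Role role, Permission permission, String callerGroup, String artifactGroup) {
        boolean granted = GRANTS.getOrDefault(role, Set.of()).contains(permission);
        if (granted && role == Role.SCHEMA_OWNER && permission == Permission.WRITE_ARTIFACT) {
            return artifactGroup.equals(callerGroup);
        }
        return granted;
    }
}
```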
13.1 Producer Auto-Registration
Producer auto-registration can be convenient but risky.
Pros:
- developer speed;
- automatic schema registration;
- fewer deployment steps.
Cons:
- runtime can publish unreviewed schema;
- prod registry changes outside CI;
- accidental schema drift;
- compatibility exceptions harder to control.
Recommended for production:
autoRegisterSchemas: false
useLatestVersion: false
schemaRegistration: CI/CD only
runtimeProducer: lookup existing schema ID
This avoids runtime surprises.
13.2 Consumer Access
Consumers generally need read access.
But data classification may restrict schema discovery too, because schema names/fields can reveal sensitive domains.
14. Registry Metadata
Artifact metadata should include:
- owner team;
- domain;
- lifecycle;
- data classification;
- PII flag;
- compatibility mode;
- subject naming strategy;
- topic/channel references;
- AsyncAPI/OpenAPI references;
- introduced date;
- deprecation status;
- replacement artifact;
- documentation URL;
- approval record;
- tags/labels.
Example:
artifactId: com.acme.case.events.CaseApproved
groupId: case
type: AVRO
ownerTeam: case-management-platform
lifecycle: stable
dataClassification: confidential
pii: false
compatibility: BACKWARD_TRANSITIVE
topics:
- case-events
messageKey: metadata.aggregateId
asyncApi: case-events.asyncapi.yaml#/components/messages/CaseApproved
introducedAt: 2026-06-29
Registry without metadata is only half useful.
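A CI gate can reject artifacts with incomplete metadata. This minimal sketch assumes metadata arrives as a flat string map like the example above. The required keys are an illustrative subset of the fields listed in this section, and the lifecycle value `retired` is an assumption. `MetadataGate` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MetadataGate {

    static final List<String> REQUIRED = List.of(
            "ownerTeam", "lifecycle", "dataClassification", "pii", "compatibility");
    // Assumed lifecycle vocabulary; "retired" is illustrative.
    static final Set<String> LIFECYCLES = Set.of("experimental", "stable", "deprecated", "retired");

    static List<String> problems(Map<String, String> metadata) {
        List<String> problems = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = metadata.get(key);
            if (value == null || value.isBlank()) {
                problems.add("missing required metadata: " + key);
            }
        }
        String lifecycle = metadata.get("lifecycle");
        if (lifecycle != null && !lifecycle.isBlank() && !LIFECYCLES.contains(lifecycle)) {
            problems.add("unknown lifecycle: " + lifecycle);
        }
        return problems;
    }
}
```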
15. Registry and Event Catalog
Schema registry stores artifacts. Event catalog explains integration.
Registry answers:
What schema versions exist?
Are they compatible?
What is schema ID?
Catalog answers:
What does this event mean?
Who publishes it?
Which topic?
Who consumes it?
What are ordering/replay semantics?
Is it deprecated?
They should be linked, not confused.
The schema registry can be the source of schema artifacts. The catalog is the broader contract discovery layer.
16. Subject Naming Decision Matrix
| Scenario | Recommended |
|---|---|
| one schema per topic | topic-name strategy acceptable |
| multiple event types per topic | record-name or topic-record |
| schema reused across topics | record-name if same compatibility boundary |
| same record evolves differently per topic | topic-record |
| high governance by domain | group + artifact identity |
| compacted snapshot | subject tied to snapshot record |
| command topic | command message subject |
| key schema | separate key subject/artifact |
Document the selected strategy once, platform-wide.
17. Registry Integration in Java
17.1 Producer Configuration
Conceptual properties:
schema.registry.url=https://schema-registry.acme.internal
auto.register.schemas=false
use.latest.version=false
Serializer-specific configs depend on registry and serializer.
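In Java, these settings are usually passed as producer properties. This minimal sketch assumes Confluent's Avro serializer. Class names are given as strings, so the snippet compiles without that dependency. `auto.register.schemas` and `use.latest.version` are Confluent serializer settings, and other registries' serializers use different keys. `ProducerProps` is hypothetical.

```java
import java.util.Properties;

final class ProducerProps {

    static Properties registryAwareProducer(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "https://schema-registry.acme.internal");
        // Never register schemas from the runtime; CI/CD owns registration.
        props.put("auto.register.schemas", "false");
        // Serialize with the schema compiled into the producer, not whatever is latest.
        props.put("use.latest.version", "false");
        return props;
    }
}
```

You would pass the resulting `Properties` to `new KafkaProducer<>(props)`. With both flags off, the serializer is expected to look up an already registered schema that matches the one compiled into the producer, and to fail if none exists.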
17.2 Consumer Configuration
schema.registry.url=https://schema-registry.acme.internal
specific.avro.reader=true
For Protobuf/JSON Schema serializers, configuration differs.
17.3 Java Boundary
Producer should not hand-register arbitrary schema at runtime in prod.
Preferred:
- schema registered in CI;
- producer artifact built against schema version;
- runtime serializer looks up exact schema;
- startup check verifies required schemas exist;
- metrics/logs expose schemaRef/schemaId.
18. Startup Schema Verification
Producer startup can verify required schemas exist:
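import jakarta.annotation.PostConstruct;
import org.springframework.stereotype.Component;

// SchemaRegistryClient is assumed to be an application-level wrapper around your
// registry client; exists(artifactId, version) is not a method of any specific vendor API.
@Component
public class SchemaStartupVerifier {

    private final SchemaRegistryClient registryClient;

    // Constructor injection initializes the final field (without it, this does not compile).
    public SchemaStartupVerifier(SchemaRegistryClient registryClient) {
        this.registryClient = registryClient;
    }

    // Runs once after injection; an exception here aborts application startup.
    @PostConstruct
    void verifySchemas() {
        verify("com.acme.case.events.CaseApproved", "7");
        verify("com.acme.case.events.CaseSubmitted", "3");
    }

    private void verify(String artifactId, String version) {
        if (!registryClient.exists(artifactId, version)) {
            throw new IllegalStateException(
                "Required schema missing: " + artifactId + ":" + version);
        }
    }
}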
This fails fast before producer emits invalid/unregistered events.
19. CI/CD Registry Workflow
19.1 Required CI Checks
- parse schema;
- enforce naming convention;
- enforce required metadata;
- compatibility check;
- reference resolution;
- generated code compile;
- golden sample validation;
- AsyncAPI link validation;
- Kafka key contract check;
- artifact metadata validation.
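Naming enforcement is typically a regular-expression check in CI. This minimal sketch assumes the `com.acme.<domain>...<Name>` convention used in the good-name examples of section 6. It also assumes the rule that an artifact's domain segment must match its group. Both the pattern and `NamingGate` are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamingGate {

    // com.acme.<domain>[.<subpackage>...].<PascalCaseName>, e.g. com.acme.case.events.CaseApproved
    private static final Pattern NAME =
            Pattern.compile("^com\\.acme\\.([a-z][a-z0-9]*)(\\.[a-z][a-z0-9]*)*\\.[A-Z][A-Za-z0-9]*$");

    static List<String> check(String groupId, String artifactId) {
        List<String> problems = new ArrayList<>();
        Matcher matcher = NAME.matcher(artifactId);
        if (!matcher.matches()) {
            problems.add("artifactId does not follow naming convention: " + artifactId);
        } else if (!matcher.group(1).equals(groupId)) {
            problems.add("domain '" + matcher.group(1) + "' does not match group '" + groupId + "'");
        }
        return problems;
    }
}
```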
20. Registry Rule Policy
Example:
registryPolicy:
global:
validity: enabled
compatibility: BACKWARD
groups:
case:
compatibility: BACKWARD_TRANSITIVE
artifactTypes:
- AVRO
- JSON
- ASYNCAPI
customer:
compatibility: FULL_TRANSITIVE
artifacts:
com.acme.case.events.CaseApproved:
compatibility: BACKWARD_TRANSITIVE
ownerTeam: case-management-platform
lifecycle: stable
exceptions:
- artifact: com.acme.legacy.LegacyCustomerEvent
compatibility: NONE
expiresAt: 2026-12-31
approvedBy: architecture-review-board
reason: migration-only stream
Policy should be machine-validated.
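One machine check follows directly from this policy. Any artifact whose compatibility is NONE must be covered by an exception that has an approver, a reason, and an expiry date that has not passed. This minimal sketch uses a hypothetical `ExceptionRecord` type mirroring the YAML above and omits YAML parsing.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class PolicyValidator {

    record ExceptionRecord(String artifact, LocalDate expiresAt, String approvedBy, String reason) {}

    static List<String> validate(Map<String, String> compatibilityByArtifact,
                                 List<ExceptionRecord> exceptions, LocalDate today) {
        List<String> errors = new ArrayList<>();
        compatibilityByArtifact.forEach((artifact, mode) -> {
            if (!"NONE".equals(mode)) {
                return; // only NONE requires an exception
            }
            boolean covered = exceptions.stream().anyMatch(e ->
                    e.artifact().equals(artifact)
                            && e.expiresAt() != null && !today.isAfter(e.expiresAt())
                            && e.approvedBy() != null && !e.approvedBy().isBlank()
                            && e.reason() != null && !e.reason().isBlank());
            if (!covered) {
                errors.add(artifact + ": NONE without a valid, unexpired exception");
            }
        });
        return errors;
    }
}
```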
21. Compatibility Exceptions
Exception process:
- requester explains change;
- impact analysis generated;
- consumers identified;
- migration plan attached;
- exception expiry date set;
- approver recorded;
- telemetry monitored;
- rollback plan defined.
No permanent exception without review.
Exception record:
exceptionId: schema-exc-2026-06-29-001
artifact: com.acme.case.events.LegacyCaseUpdated
requestedCompatibility: NONE
reason: one-time migration from legacy schema
approvedBy: platform-architecture
expiresAt: 2026-09-30
consumerImpact: known migration consumers only
rollback: restore previous schema and stop migration producer
22. Registry Auditability
Registry should answer:
- who created artifact?
- who changed compatibility mode?
- who registered version?
- what changed?
- when was it promoted to prod?
- which CI build registered it?
- which approval was attached?
- which consumers were impacted?
- which schemas reference it?
- is it deprecated?
Audit is not optional in regulated domains.
23. Deleting Schemas
Avoid hard-deleting schemas used by historical messages.
If old messages carry a schema ID, consumers need that schema to replay them.
Preferred:
- mark deprecated;
- disable new registration;
- prevent new producer usage;
- keep schema readable;
- archive metadata;
- hide from default catalog view if needed.
Hard delete only if:
- schema never reached prod;
- no messages exist;
- legal/security requires removal;
- migration plan exists.
Schema deletion can make old Kafka data unreadable.
24. Registry Backup and DR
Schema registry is critical infrastructure.
Backup requirements:
- artifact content;
- versions;
- global IDs/content IDs;
- compatibility rules;
- metadata;
- access control config;
- audit log;
- references.
DR questions:
- can consumers deserialize old messages after registry restore?
- are schema IDs preserved?
- can producers continue if registry temporarily unavailable?
- are schemas cached?
- is cache eviction safe?
- what is startup behavior when registry unavailable?
Runtime serializers often cache schemas, but a cache is not a DR strategy.
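Caching by schema ID is safe because, within one registry, a registered ID always maps to the same content. The cache cannot help a freshly started consumer, though, or any ID it has never seen. This minimal sketch uses a hypothetical `fetcher` function standing in for the real registry call.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

final class SchemaCache {

    private final Map<Integer, String> schemasById = new ConcurrentHashMap<>();
    private final IntFunction<String> fetcher; // hypothetical registry lookup by ID

    SchemaCache(IntFunction<String> fetcher) {
        this.fetcher = fetcher;
    }

    // IDs are immutable within a registry, so cached entries never need invalidation.
    String schemaFor(int schemaId) {
        // A cache miss during a registry outage still fails: this is not a DR mechanism.
        return schemasById.computeIfAbsent(schemaId, id -> fetcher.apply(id));
    }
}
```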
25. Multi-Region and Multi-Cluster Registry
Questions:
- one global registry or per-region registry?
- are artifact IDs consistent?
- are numeric schema IDs replicated?
- how are schema promotions coordinated?
- can region A consume region B events?
- what happens during failover?
- do data residency rules restrict schema/content?
- are compatibility rules identical?
Possible models:
| Model | Pros | Cons |
|---|---|---|
| central registry | single governance point | latency/availability dependency |
| per-region registry with promotion | regional resilience | ID drift, promotion complexity |
| registry replication | consistent runtime | operational complexity |
| build-time embedded schema | less runtime dependency | less dynamic/generic |
Document the chosen model explicitly.
26. Registry and Data Classification
Schema itself can reveal sensitive information:
national_id
fraud_score
blacklist_reason
medical_condition
Registry access should consider metadata sensitivity.
For restricted schemas:
- limited read access;
- masked catalog fields if needed;
- approval for consumer onboarding;
- audit reads;
- classification metadata mandatory;
- DLQ access aligned.
Data classification belongs in both schema metadata and event catalog.
27. Schema Registry Anti-Patterns
27.1 Compatibility NONE Everywhere
Fast today, broken tomorrow.
27.2 Runtime Auto-Register in Prod
Unreviewed schema changes enter production.
27.3 Subject Naming Chaos
Every team invents naming. Discovery and compatibility fail.
27.4 Topic-Name Strategy for Multi-Event Topic Without Plan
Unrelated event types share one compatibility subject.
27.5 Deleting Old Schemas
Replay breaks.
27.6 Registry as Only Documentation
Schema says shape, not event meaning.
27.7 No Ownership Metadata
Nobody approves changes.
27.8 Common Schema Changed Without Impact Analysis
Mass breakage.
27.9 Environment Drift
Dev/staging/prod schemas differ silently.
27.10 Ignoring Key Schema
Value schema governed, key contract breaks.
27.11 Manual Prod Edits
Audit and reproducibility lost.
27.12 Assuming Registry Compatibility Equals Business Safety
Semantic break passes structural check.
28. Registry Architecture Blueprint
Key architecture principles:
- contract repository is source for changes;
- registry is authoritative runtime artifact store;
- CI gates all new versions;
- prod registry is not manually edited;
- catalog links schema to business semantics;
- access control enforces ownership;
- audit log preserves governance evidence;
- old schemas remain readable.
29. Review Checklist
29.1 Identity
- Is artifact/subject name stable?
- Is naming strategy correct for topic/event model?
- Is group/namespace correct?
- Is key schema governed if structured?
- Is schemaRef included in event envelope?
29.2 Compatibility
- Is compatibility mode appropriate?
- Is transitive compatibility required?
- Are exceptions time-bound?
- Does registry check pass?
- Are semantic changes reviewed separately?
29.3 References
- Are referenced schemas registered?
- Is common schema blast radius known?
- Are reference versions pinned or intentionally latest?
- Is dependency graph visible?
29.4 Environments
- Is schema promoted via CI?
- Are dev/staging/prod aligned?
- Are numeric IDs handled correctly?
- Is manual prod edit blocked?
29.5 Access and Audit
- Are producer/consumer permissions correct?
- Is owner metadata present?
- Are changes auditable?
- Is schema deletion restricted?
- Is data classification present?
29.6 Runtime
- Is auto-registration disabled in prod if policy requires?
- Do producers verify schema availability?
- Can consumers resolve old schemas during replay?
- Are registry outages handled?
30. Practice Lab
Lab 1 — Naming Strategy
Topic case-events contains:
- CaseSubmitted;
- CaseApproved;
- CaseClosed.
Choose a subject/artifact naming strategy and justify it.
Lab 2 — Registry Policy
Design registry policy for:
- stable case events;
- experimental fraud events;
- public partner schemas;
- legacy migration schemas.
Include compatibility modes and exception process.
Lab 3 — Environment Promotion
Design promotion flow from PR to prod registry. Include validation, compatibility, code generation, approval, and rollback.
Lab 4 — Common Schema Change
EventMetadata needs a new required field, jurisdiction. Design a safe rollout.
Lab 5 — Access Control
Define roles and permissions for:
- platform admin;
- domain schema owner;
- producer runtime;
- consumer runtime;
- auditor;
- catalog service;
- CI service.
Lab 6 — Registry Incident
A schema was hard-deleted and consumers cannot replay old events. Design recovery and prevention plan.
31. Senior Engineer Heuristics
- Schema registry is governance infrastructure, not only serializer support.
- Subject naming strategy determines compatibility boundary.
- Topic-name strategy is simple but weak for multi-event topics.
- Record-name strategy fits event-type compatibility.
- Topic-record strategy fits topic-specific isolation.
- Compatibility NONE must be rare, approved, and temporary.
- Prod schema registration should be controlled by CI/CD, not accidental runtime.
- Old schemas are needed for replay. Do not delete casually.
- Common schemas have huge blast radius.
- Schema registry does not document event semantics; catalog/AsyncAPI must.
- Registry compatibility is structural, not business semantic proof.
- Key schema and value schema both matter.
- Environment drift breaks reproducibility.
- Access control must consider schema sensitivity.
- A schema without owner is unmanaged public infrastructure.
32. Summary
Schema registry architecture is central to schema governance. It defines how artifacts are named, versioned, validated, checked for compatibility, referenced, promoted, accessed, audited, and consumed at runtime.
Main takeaways:
- registry stores and governs schemas/artifacts;
- subject/artifact naming defines compatibility boundaries;
- topic-name, record-name, and topic-record strategies have different trade-offs;
- compatibility rules should exist at global/group/artifact levels;
- Avro, Protobuf, JSON Schema, OpenAPI, and AsyncAPI can all be governed as artifacts;
- references and common schemas need dependency-aware governance;
- environment promotion must be controlled and auditable;
- prod auto-registration should be treated carefully;
- old schemas must remain available for replay;
- registry must be linked to catalog and semantic documentation.
The next part covers compatibility governance in more depth: backward, forward, full, transitive, semantic compatibility, decision records, exceptions, risk-based approval, and consumer impact scoring.