title: Build From Scratch Recommendations System - Part 058
description: "Designing a production-grade model registry and model lifecycle for a recommendation system: artifact metadata, model bundle, dataset/feature lineage, evaluation gates, approval, shadow, canary, deployment, rollback, monitoring, retraining, retirement, and governance."
series: learn-build-from-scratch-recommendations-system
seriesTitle: "Build From Scratch: Enterprise Recommendations System"
order: 58
partTitle: Model Registry and Model Lifecycle
tags:
- recommendation-system
- recsys
- model-registry
- mlops
- model-lifecycle
- deployment
- series
date: 2026-07-02
Part 058 — Model Registry and Model Lifecycle
A production-grade recommendation system needs more than a bare model file such as:
ranker.pkl
model.bin
model.onnx
We need to know:
- which dataset the model was trained on,
- which feature set it uses,
- which label version,
- which code version,
- which metrics it achieved,
- which calibration artifact it needs,
- which utility policy applies,
- who owns it,
- its approval status,
- whether it is in shadow, canary, or production,
- how to roll it back,
- which model it replaces,
- when it should be retrained,
- whether it is still in use.
A model registry and a model lifecycle turn the model into an artifact that is governed, reproducible, deployable, and auditable.
This part covers the design of a production-grade model registry and lifecycle for a recommendation system: metadata, model bundle, lineage, evaluation gates, approval, deployment stages, shadow/canary, rollback, monitoring, retraining, retirement, and governance.
1. Mental Model: Model Is a Versioned Decision Artifact
A model is more than its weights.
Serving a model requires a bundle:
model artifact
feature set
feature schema
normalization stats
categorical vocab
calibration
utility policy
runtime
metadata
If any one of these is mismatched, predictions can be wrong.
The model registry is the source of truth for model artifacts and lifecycle state.
2. Model Registry Responsibilities
Registry stores:
- model metadata,
- artifact URI,
- model version,
- feature set version,
- training dataset version,
- label version,
- code/container version,
- hyperparameters,
- evaluation metrics,
- calibration artifact,
- serving runtime,
- approval status,
- deployment status,
- owner,
- lineage,
- rollback relationship,
- retirement state.
The registry does not need to store raw model bytes; it can store artifact references instead.
3. Model Versioning
Use immutable model versions.
Example:
home_ranker_20260702_001
Never overwrite a model version.
Bad:
latest_model.bin
Good:
model_name: home_ranker
model_version: home_ranker_20260702_001
artifact_uri: s3://models/home_ranker/20260702_001/model.onnx
A serving route can point to the latest production version, but each artifact version stays immutable.
4. Model Name vs Model Version vs Route
Distinguish:
model_name: home_ranker
model_version: home_ranker_20260702_001
route: home_feed_ranker_production
Route points to version.
route: home_feed_ranker_production
active_model_version: home_ranker_20260702_001
Rollback changes route pointer, not model file.
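The pointer-swap idea can be sketched as follows. This is a minimal illustration that assumes an in-memory map as the route store; `RouteTable`, `promote`, `rollback`, and `activeVersion` are hypothetical names, not part of any specific registry product.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory route table: routes point at immutable model versions. */
final class RouteTable {
    // routeName -> currently active model version
    private final Map<String, String> active = new HashMap<>();
    // routeName -> version that was active before the last promotion
    private final Map<String, String> previous = new HashMap<>();

    /** Points the route at a new version and remembers the old one for rollback. */
    synchronized void promote(String route, String newVersion) {
        String old = active.put(route, newVersion);
        if (old != null) {
            previous.put(route, old);
        }
    }

    /** Moves the pointer back; the model artifact itself is never touched. */
    synchronized String rollback(String route) {
        String prev = previous.remove(route);
        if (prev == null) {
            throw new IllegalStateException("No previous version for route " + route);
        }
        active.put(route, prev);
        return prev;
    }

    synchronized String activeVersion(String route) {
        return active.get(route);
    }
}
```

A real registry would persist the pointer and record who changed it, but the shape stays the same: promotion and rollback are both pointer writes.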
5. Model Bundle
Model bundle includes everything needed to serve.
model_bundle:
  model_name: home_ranker
  model_version: home_ranker_20260702_001
  artifact_uri: ...
  runtime: gbdt_runtime_v3
  feature_set_version: home_features_v18
  feature_schema_version: home_feature_schema_v18
  calibration_version: home_click_purchase_hide_cal_v5
  utility_policy_version: home_utility_v7
  categorical_vocab_versions:
    category_id: category_vocab_v6
  normalization_stats_version: dense_norm_v4
  training_dataset_version: home_ranker_ds_20260701_001
Deploy bundle atomically.
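One way to read "atomically" inside a single serving process: request threads should never see the new model paired with the old calibration. The sketch below assumes a trimmed-down, immutable bundle held behind one reference; `ServingBundle` and `ActiveBundleHolder` are hypothetical names, and loading the actual artifacts is out of scope.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical, trimmed-down bundle: every serving dependency travels together. */
record ServingBundle(
        String modelVersion,
        String featureSetVersion,
        String calibrationVersion,
        String utilityPolicyVersion) {}

/** Holds the bundle used by request threads; a swap is a single reference write. */
final class ActiveBundleHolder {
    private final AtomicReference<ServingBundle> current;

    ActiveBundleHolder(ServingBundle initial) {
        this.current = new AtomicReference<>(initial);
    }

    /** Each request reads one consistent snapshot and uses it end to end. */
    ServingBundle snapshot() {
        return current.get();
    }

    /** Replaces all parts at once and returns the bundle that was replaced. */
    ServingBundle deploy(ServingBundle next) {
        return current.getAndSet(next);
    }
}
```

Because the bundle is immutable and swapped as a whole, a request either sees the old bundle or the new one, never a mix.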
6. Artifact Metadata
Metadata example:
model_name: home_ranker
model_version: home_ranker_20260702_001
model_type: gbdt_pointwise
objective: multi_task_utility_components
owner: recsys-ranking
created_at: 2026-07-02T03:00:00Z
training_job_id: train_abc123
code_version: git_sha_...
container_image: recsys-train:20260702
random_seed: 42
status: candidate
This enables reproducibility.
7. Lineage Metadata
Lineage:
training_dataset_version: home_ranker_ds_20260701_001
feature_set_version: home_features_v18
label_versions:
  click_30m: v3
  purchase_7d: v2
  hide_7d: v1
negative_sampling_policy: exposed_no_click_v4
source_data:
  clean_events: 20260701
  decision_logs: 20260701
Lineage allows root cause analysis.
8. Hyperparameters
Store hyperparameters.
For GBDT:
num_trees: 800
learning_rate: 0.05
max_depth: 8
min_data_in_leaf: 100
subsample: 0.8
For deep model:
embedding_dim: 128
layers: [512, 256, 128]
dropout: 0.1
batch_size: 4096
optimizer: Adam
learning_rate: 0.0005
Hyperparameters are part of reproducibility.
9. Evaluation Metrics
Registry should store metrics.
Examples:
offline_metrics:
  ndcg_at_20: 0.412
  auc_click: 0.781
  logloss_click: 0.221
  calibration_ece_click: 0.012
guardrail_metrics:
  predicted_hide_rate: 0.019
  cold_item_ndcg_at_20: 0.301
  new_user_ndcg_at_20: 0.287
latency_estimate_ms:
  p95: 14
Metrics should include segment metrics, not only global.
10. Segment Metrics
Store metrics by:
- surface,
- category,
- region,
- user tenure,
- item age,
- candidate source,
- tenant,
- privacy mode.
Example:
segments:
  new_user:
    ndcg_at_20: 0.29
  warm_user:
    ndcg_at_20: 0.44
  new_item:
    coverage: 0.72
Global metrics can hide regressions.
11. Evaluation Gates
Model promotion gate:
gates:
  ndcg_at_20:
    min_relative_improvement: 0.005
  calibration_ece_click:
    max: 0.02
  latency_p95_ms:
    max: 20
  new_user_ndcg:
    max_relative_regression: 0.01
  hide_rate_prediction:
    max_relative_increase: 0.02
  feature_compatibility:
    required: pass
If any gate fails, the model remains a candidate.
Automate gates.
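As an illustration of automated gates, the sketch below hard-codes four of the thresholds from the example config. It assumes metrics arrive as flat name-to-value maps with positive values and that every metric is present (a missing key would fail with a `NullPointerException`). `PromotionGates` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical gate runner covering three of the gate kinds listed above. */
final class PromotionGates {

    /** Candidate must beat the champion by at least minRelative (0.005 = 0.5%). */
    static boolean minRelativeImprovement(double candidate, double champion, double minRelative) {
        return candidate >= champion * (1.0 + minRelative);
    }

    /** Candidate may fall at most maxRelative below the champion. */
    static boolean maxRelativeRegression(double candidate, double champion, double maxRelative) {
        return candidate >= champion * (1.0 - maxRelative);
    }

    /** Absolute upper bound, for example ECE or p95 latency. */
    static boolean atMost(double value, double max) {
        return value <= max;
    }

    /** Returns the names of failed gates; an empty list means all checked gates passed. */
    static List<String> failedGates(Map<String, Double> candidate, Map<String, Double> champion) {
        List<String> failed = new ArrayList<>();
        if (!minRelativeImprovement(candidate.get("ndcg_at_20"), champion.get("ndcg_at_20"), 0.005)) {
            failed.add("ndcg_at_20");
        }
        if (!atMost(candidate.get("calibration_ece_click"), 0.02)) {
            failed.add("calibration_ece_click");
        }
        if (!atMost(candidate.get("latency_p95_ms"), 20.0)) {
            failed.add("latency_p95_ms");
        }
        if (!maxRelativeRegression(candidate.get("new_user_ndcg"), champion.get("new_user_ndcg"), 0.01)) {
            failed.add("new_user_ndcg");
        }
        return failed;
    }
}
```

In practice the thresholds would be read from the gate config per route rather than hard-coded, so different surfaces can use different bars.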
12. Compatibility Gates
Before serving, validate:
feature set exists online
feature types match
all required features available
vocab versions present
normalization stats present
calibration compatible
runtime supports artifact
model input dimension matches
privacy class allowed
Compatibility failure should block deployment.
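The first three checks (feature exists online, types match, all required features available) can be sketched as a simple schema comparison. This assumes both the model's expected schema and the online schema are available as name-to-type maps; `FeatureCompatibilityGate` and the type strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical check: every feature the model expects must exist online with the same type. */
final class FeatureCompatibilityGate {

    /**
     * @param required feature name -> type expected by the model, e.g. "float"
     * @param online   feature name -> type currently served by the online feature store
     * @return human-readable problems; an empty list means compatible
     */
    static List<String> check(Map<String, String> required, Map<String, String> online) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, String> e : required.entrySet()) {
            String onlineType = online.get(e.getKey());
            if (onlineType == null) {
                problems.add("missing feature: " + e.getKey());
            } else if (!onlineType.equals(e.getValue())) {
                problems.add("type mismatch for " + e.getKey()
                        + ": model expects " + e.getValue() + ", online has " + onlineType);
            }
        }
        return problems;
    }
}
```

A deployment pipeline would block promotion whenever the returned list is non-empty and attach the messages to the rejected request.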
13. Model Status Lifecycle
Statuses:
created
candidate
validated
rejected
shadow
canary
production
rolled_back
deprecated
archived
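State machine: a model moves between these statuses only through defined transitions (for example candidate to validated, or production to rolled_back). Every transition should be audited.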
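One way to make the allowed transitions explicit is a small transition table. The edges below are an assumption made for illustration, since the exact state machine is not spelled out here; `LifecycleState` and `LifecycleTransitions` are hypothetical names, and the edges should be adjusted to match your own process.

```java
import java.util.Map;
import java.util.Set;

/** Lifecycle states from the status list above. */
enum LifecycleState {
    CREATED, CANDIDATE, VALIDATED, REJECTED, SHADOW, CANARY,
    PRODUCTION, ROLLED_BACK, DEPRECATED, ARCHIVED
}

/** Assumed transition table; not a definitive lifecycle. */
final class LifecycleTransitions {
    private static final Map<LifecycleState, Set<LifecycleState>> ALLOWED = Map.of(
        LifecycleState.CREATED, Set.of(LifecycleState.CANDIDATE),
        LifecycleState.CANDIDATE, Set.of(LifecycleState.VALIDATED, LifecycleState.REJECTED),
        LifecycleState.VALIDATED, Set.of(LifecycleState.SHADOW, LifecycleState.REJECTED),
        LifecycleState.SHADOW, Set.of(LifecycleState.CANARY, LifecycleState.REJECTED),
        LifecycleState.CANARY, Set.of(LifecycleState.PRODUCTION, LifecycleState.ROLLED_BACK),
        LifecycleState.PRODUCTION, Set.of(LifecycleState.ROLLED_BACK, LifecycleState.DEPRECATED),
        LifecycleState.ROLLED_BACK, Set.of(LifecycleState.DEPRECATED),
        LifecycleState.REJECTED, Set.of(LifecycleState.ARCHIVED),
        LifecycleState.DEPRECATED, Set.of(LifecycleState.ARCHIVED),
        LifecycleState.ARCHIVED, Set.<LifecycleState>of()
    );

    static boolean canTransition(LifecycleState from, LifecycleState to) {
        return ALLOWED.get(from).contains(to);
    }

    /** Returns the new state, or throws so that illegal jumps never reach the registry. */
    static LifecycleState transition(LifecycleState from, LifecycleState to) {
        if (!canTransition(from, to)) {
            throw new IllegalStateException("Illegal transition " + from + " -> " + to);
        }
        return to;
    }
}
```

The caller of `transition` is the natural place to write the audit record (who, when, from, to, reason).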
14. Approval Workflow
Approval may require:
- ML owner,
- product owner,
- data quality,
- safety/policy,
- platform/serving,
- enterprise tenant owner.
Approval metadata:
approved_by:
  - user: ml_lead
    time: 2026-07-02T04:00:00Z
  - user: product_owner
    time: 2026-07-02T04:10:00Z
approval_notes: "Passes offline gates; shadow 24h planned."
Critical systems need governance.
15. Shadow Deployment
Shadow model scores live traffic but does not affect users.
Registry stores:
deployment_stage: shadow
traffic_sample: 5%
started_at: ...
comparison_baseline: home_ranker_20260701_001
Shadow metrics:
- latency,
- error rate,
- feature missing,
- score distribution,
- top-K overlap,
- segment score drift.
Shadow catches serving bugs.
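Of the shadow metrics above, top-K overlap is easy to compute per request. The sketch below assumes both models return ranked lists of item IDs for the same request; `ShadowComparison` and `topKOverlap` are hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for comparing a shadow model against the baseline. */
final class ShadowComparison {

    /** Fraction of the baseline's top-k items that also appear in the shadow model's top-k. */
    static double topKOverlap(List<String> baselineRanking, List<String> shadowRanking, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        Set<String> baselineTop =
                new HashSet<>(baselineRanking.subList(0, Math.min(k, baselineRanking.size())));
        if (baselineTop.isEmpty()) {
            return 0.0;
        }
        long shared = shadowRanking.stream()
                .limit(k)
                .filter(baselineTop::contains)
                .distinct()
                .count();
        return (double) shared / baselineTop.size();
    }
}
```

A very low overlap is not automatically bad, since a better model should change some rankings, but a sudden drop to near zero often points at a feature or vocabulary mismatch rather than a modeling improvement.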
16. Canary Deployment
A canary exposes a small share of real traffic to the new model.
Example:
stage: canary
traffic: 1%
assignment_unit: user
guardrails:
  hide_rate: no_increase
  latency_p95: < 200ms
rollback_on_alert: true
Canary should use experiment assignment/logging.
Registry tracks canary status and metrics.
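With `assignment_unit: user`, the same user should consistently land in or out of the canary. A common way to get this is deterministic hashing of the user ID. The sketch below assumes CRC32 as a cheap, non-cryptographic hash and a per-canary salt; both choices, and the `CanaryAssignment` name, are illustrative assumptions rather than a prescribed method.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical sticky assignment: a user is either always in the canary or never. */
final class CanaryAssignment {

    /**
     * @param userId      assignment unit
     * @param salt        per-canary salt so different rollouts pick different users
     * @param basisPoints canary share in basis points; 100 means 1%
     */
    static boolean inCanary(String userId, String salt, int basisPoints) {
        CRC32 crc = new CRC32();
        crc.update((salt + ":" + userId).getBytes(StandardCharsets.UTF_8));
        long bucket = crc.getValue() % 10_000; // getValue() is never negative
        return bucket < basisPoints;
    }
}
```

Changing the salt for each rollout avoids always exposing the same users to every canary. The assignment should also be logged with the decision so canary metrics can be attributed correctly.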
17. Production Promotion
Production promotion updates route pointer.
route: home_feed_ranker_production
old_model: home_ranker_20260701_001
new_model: home_ranker_20260702_001
activated_at: 2026-07-02T08:00:00Z
Keep old model available for rollback.
Do not delete old model immediately.
18. Rollback
Rollback:
route pointer back to previous model bundle
Must include:
- model artifact,
- feature set,
- calibration,
- vocab,
- normalization,
- utility policy.
If rollback only changes model but leaves incompatible feature schema, rollback fails.
Rollback uses bundle.
19. Rollback Triggers
Triggers:
- latency spike,
- error rate,
- feature missing,
- score distribution anomaly,
- online metric regression,
- guardrail violation,
- policy incident,
- model server crash,
- calibration drift severe.
Rollback should be fast and safe.
It can be manual or automated, depending on operational maturity.
20. Model Runtime Metadata
Registry should know runtime.
Examples:
gbdt_runtime_v3
onnx_runtime_v1
tensorflow_serving
torchserve
custom_java_runtime
Runtime compatibility:
- model format,
- Java client,
- CPU/GPU,
- memory,
- batch size,
- latency.
Serving cannot load arbitrary model artifact.
21. Model Artifact Validation
Before registry accepts artifact:
- checksum,
- format valid,
- loads in the target runtime,
- sample inference passes,
- input/output schema correct,
- no malware/untrusted binary,
- artifact size within limit.
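The checksum and size checks from the list above can be sketched with the JDK alone. This assumes the training job publishes a SHA-256 digest alongside the artifact and that the artifact has been downloaded to a local path; `ArtifactChecks` and `accept` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical pre-registration checks for a downloaded artifact. */
final class ArtifactChecks {

    /** Streams the file through SHA-256 and returns the lowercase hex digest. */
    static String sha256Hex(Path artifact) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(artifact)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Accepts the artifact only if it is within the size limit and matches the expected digest. */
    static boolean accept(Path artifact, String expectedSha256, long maxBytes)
            throws IOException, NoSuchAlgorithmException {
        return Files.size(artifact) <= maxBytes
                && sha256Hex(artifact).equalsIgnoreCase(expectedSha256);
    }
}
```

Format validation, runtime loading, and sample inference depend on the serving runtime and are not covered by this sketch.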
For enterprise, artifact provenance matters.
22. Model Card / Documentation
Model registry should include model card-like info:
purpose
intended surfaces
training data period
features used
objectives
known limitations
segments evaluated
privacy considerations
safety considerations
owner
rollback plan
This is useful for governance and onboarding.
23. Model Dependency Graph
Track dependencies:
model -> feature set
model -> calibration artifact
model -> utility policy
model -> vocab
model -> normalization stats
model -> dataset
model -> code version
Dependency graph helps:
- feature deprecation,
- incident blast radius,
- reproducibility,
- compliance.
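For feature deprecation and blast-radius questions, the graph is usually queried in reverse: given a dependency, which models need it? A minimal sketch, assuming dependencies are identified by strings such as `feature_set:home_features_v18`; `DependencyIndex` and its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical reverse index over the model dependency graph. */
final class DependencyIndex {
    // dependency id (e.g. "feature_set:home_features_v18") -> model versions that need it
    private final Map<String, Set<String>> dependents = new HashMap<>();

    /** Records that a model version depends on the given artifact. */
    void addEdge(String modelVersion, String dependencyId) {
        dependents.computeIfAbsent(dependencyId, k -> new TreeSet<>()).add(modelVersion);
    }

    /** All model versions that would be affected if this dependency changed or disappeared. */
    Set<String> modelsUsing(String dependencyId) {
        return dependents.getOrDefault(dependencyId, Set.of());
    }

    /** A dependency is safe to deprecate only when no registered model still needs it. */
    boolean safeToDeprecate(String dependencyId) {
        return modelsUsing(dependencyId).isEmpty();
    }
}
```

A fuller version would also filter by lifecycle state, since an archived model that references a feature set should not block its deprecation in the same way a production model does.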
24. Model Registry API
Core operations:
registerModel
getModelVersion
listModels
updateStatus
promoteModel
rollbackRoute
getActiveModelForRoute
getModelLineage
getModelsUsingFeature
Serving path mostly calls:
getActiveModelForRoute
Control plane calls promotion/status operations.
25. Serving Integration
Ranking service startup:
- Resolve active model route.
- Fetch model bundle metadata.
- Download/load artifact.
- Load calibration/vocab/norm stats.
- Validate feature schema.
- Run warmup inference.
- Mark ready.
Runtime refresh:
- periodically poll route pointer,
- or receive deployment event,
- load new model side-by-side,
- switch after health check.
26. Model Warmup
Warmup:
load artifact
initialize runtime
run sample batch
check latency
validate outputs finite
check feature schema
Do not send first live request to cold model.
Deep models may need GPU warmup.
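The "run sample batch, check latency, validate outputs finite" steps can be sketched as a single readiness check. This assumes the loaded model is exposed as a function from a feature batch to one score per row; `ModelWarmup` and `warmUp` are hypothetical names, and a real warmup would typically run several batches before measuring.

```java
import java.util.function.Function;

/** Hypothetical readiness check run before a model receives live traffic. */
final class ModelWarmup {

    /**
     * Runs one sample batch through the scorer.
     * Returns true only if there is one finite score per row and the call stays within budget.
     */
    static boolean warmUp(Function<double[][], double[]> scorer, double[][] sampleBatch, long maxMillis) {
        long start = System.nanoTime();
        double[] scores = scorer.apply(sampleBatch);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;

        if (scores == null || scores.length != sampleBatch.length) {
            return false; // output shape does not match the input batch
        }
        for (double score : scores) {
            if (!Double.isFinite(score)) {
                return false; // NaN or infinity usually signals a broken artifact or feature mismatch
            }
        }
        return elapsedMillis <= maxMillis;
    }
}
```

The service would mark itself ready only after this returns true, which keeps the first live request away from a cold or broken model.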
27. Side-by-Side Loading
For safe switch:
old model remains loaded
new model loaded and warmed
traffic switch
old model kept for rollback window
Memory planning must account for side-by-side.
28. Model Monitoring
Monitor online:
prediction distribution
score distribution
feature missing
latency
error rate
calibration proxy
top item/category distribution
source contribution
fallback rate
business metrics
guardrails
Break all of these down by model version.
Registry should link production metrics back to model version.
29. Model Drift
Drift types:
- feature drift,
- prediction drift,
- label drift,
- calibration drift,
- candidate distribution drift,
- segment drift.
When drift exceeds threshold:
- alert,
- recalibrate,
- retrain,
- rollback if severe,
- investigate data pipeline.
Drift monitoring is a lifecycle trigger.
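No specific drift metric is prescribed here. One widely used option for feature and prediction drift is the Population Stability Index (PSI), which compares a reference histogram against a current one over the same bins. The sketch below is an illustration under that assumption; `DriftMetrics` is a hypothetical name, and alert thresholds have to be chosen per feature or score.

```java
/** Hypothetical drift helper using the Population Stability Index. */
final class DriftMetrics {

    /**
     * PSI = sum over bins of (actual_i - expected_i) * ln(actual_i / expected_i),
     * where both histograms are first normalized to proportions.
     */
    static double psi(double[] expectedCounts, double[] actualCounts) {
        if (expectedCounts.length != actualCounts.length) {
            throw new IllegalArgumentException("Histograms must use the same bins");
        }
        double expectedTotal = 0.0;
        double actualTotal = 0.0;
        for (int i = 0; i < expectedCounts.length; i++) {
            expectedTotal += expectedCounts[i];
            actualTotal += actualCounts[i];
        }
        if (expectedTotal <= 0.0 || actualTotal <= 0.0) {
            throw new IllegalArgumentException("Histograms must not be empty");
        }
        double epsilon = 1e-6; // floor for empty bins so the logarithm stays defined
        double psi = 0.0;
        for (int i = 0; i < expectedCounts.length; i++) {
            double expected = Math.max(expectedCounts[i] / expectedTotal, epsilon);
            double actual = Math.max(actualCounts[i] / actualTotal, epsilon);
            psi += (actual - expected) * Math.log(actual / expected);
        }
        return psi;
    }
}
```

Identical distributions give a PSI of zero, and the value grows as the current distribution moves away from the reference. Computing it per model version and per segment fits the monitoring breakdown described earlier.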
30. Retraining Triggers
Retrain when:
scheduled cadence
data drift
model quality regression
new features available
new candidate source launched
catalog distribution changed
seasonality
policy/objective changed
new surface/region/tenant
calibration drift
Not every retrain should auto-promote.
Retrained model must pass gates.
31. Scheduled Retraining
Cadence examples:
ranking model: daily/weekly
retrieval model: daily/weekly
content embeddings: on content change/daily
calibration: daily/weekly
cold-start priors: daily
Choose based on data velocity and operational cost.
32. Continuous Training Risk
Continuous training without gates can ship bad models.
Risks:
- an event bug trains a bad model,
- label delay leaves labels incomplete,
- feature drift,
- data poisoning,
- seasonal anomaly,
- training instability.
Automated retraining should still require validation gates and canary.
33. Champion-Challenger
Champion:
current production model
Challenger:
candidate new model
Compare:
- offline metrics,
- shadow metrics,
- canary metrics,
- business metrics.
Only promote challenger if it beats champion within guardrails.
34. Multiple Models per Surface
One surface may use multiple models:
retrieval two-tower
pre-ranker
ranker
hide risk model
calibration model
reranker policy
Registry should handle different artifact types.
Do not assume only one model.
35. Multi-Tenant Models
Options:
- global model,
- tenant-specific calibration,
- tenant-specific model,
- tenant-specific route.
Registry route can include tenant:
route: tenant_123_case_action_ranker
Tenant-specific routes need governance and a fallback.
Small tenants may not have enough data for their own model.
36. Privacy and Compliance in Registry
Registry metadata should include:
- training data privacy class,
- personal data usage,
- sensitive features,
- tenant data scope,
- retention,
- deletion compliance,
- approval status.
For regulated domains, model lineage and data usage matter.
37. Model Retirement
Retire model when:
- no route uses it,
- rollback window expired,
- compliance retention satisfied,
- artifacts archived,
- metadata retained.
Do not delete metadata. Archive it.
Model may be needed for audit/replay.
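The retirement conditions above can be expressed as a single predicate. This sketch assumes the registry can supply the facts as plain values; `RetirementFacts`, `RetirementPolicy`, and the field names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical snapshot of the facts needed to decide on retirement. */
record RetirementFacts(
        int routesUsingModel,
        Instant lastDeactivatedAt,
        Duration rollbackWindow,
        boolean retentionSatisfied,
        boolean artifactsArchived) {}

final class RetirementPolicy {

    /** True only when every retirement condition holds at the given time. */
    static boolean canRetire(RetirementFacts facts, Instant now) {
        Instant rollbackWindowEnd = facts.lastDeactivatedAt().plus(facts.rollbackWindow());
        boolean rollbackWindowExpired = !now.isBefore(rollbackWindowEnd);
        return facts.routesUsingModel() == 0
                && rollbackWindowExpired
                && facts.retentionSatisfied()
                && facts.artifactsArchived();
    }
}
```

Retiring a model in this sense changes its status and storage tier; the metadata row itself stays in the registry.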
38. Artifact Retention
Keep:
production models
recent rollback candidates
models used in experiments
models needed for audit
Archive:
failed candidates
old experiment models
deprecated artifacts
Balance storage cost and audit/replay requirements.
39. Model Incident Response
Incident:
bad recommendations after model deploy
Steps:
- identify active model route/version,
- inspect deployment time,
- compare previous model,
- check feature/policy versions,
- rollback if needed,
- preserve logs,
- root cause,
- update gates/tests.
The registry makes the first four steps fast.
40. Model Registry Anti-Patterns
40.1 Artifact Without Metadata
Cannot reproduce.
40.2 Mutable Latest Model
Rollback impossible.
40.3 Feature Set Not Tracked
Serving mismatch.
40.4 No Approval Status
Unreviewed model ships.
40.5 No Shadow/Canary State
Deployment process invisible.
40.6 Model and Calibration Stored Separately and Untracked
Wrong calibration used.
40.7 No Segment Metrics
Global improvement hides harm.
40.8 No Rollback Bundle
Rollback incomplete.
40.9 No Dependency Graph
Feature deletion breaks model.
40.10 No Retirement Process
Registry becomes graveyard.
41. Implementation Sketch: Model Metadata
import java.time.Instant;
import java.util.Map;

public record ModelMetadata(
    String modelName,
    String modelVersion,
    String modelType,
    String artifactUri,
    String runtime,
    String owner,
    Instant createdAt,
    String trainingDatasetVersion,
    String featureSetVersion,
    String calibrationVersion,
    String utilityPolicyVersion,
    String codeVersion,
    ModelStatus status,
    Map<String, Double> offlineMetrics,
    Map<String, String> dependencies
) {}
This is registry core.
42. Implementation Sketch: Model Status
public enum ModelStatus {
    CREATED,
    CANDIDATE,
    VALIDATED,
    REJECTED,
    SHADOW,
    CANARY,
    PRODUCTION,
    ROLLED_BACK,
    DEPRECATED,
    ARCHIVED
}
Transitions should be controlled.
43. Implementation Sketch: Model Route
import java.time.Instant;
import java.util.Optional;

public record ModelRoute(
    String routeName,
    String activeModelVersion,
    Optional<String> previousModelVersion,
    String surface,
    Optional<String> tenantId,
    Instant activatedAt,
    String activatedBy
) {}
Serving resolves model by route.
44. Implementation Sketch: Registry Interface
import java.util.List;

public interface ModelRegistry {
    ModelMetadata register(ModelRegistrationRequest request);
    ModelMetadata getModel(String modelVersion);
    ModelRoute getRoute(String routeName);
    ModelRoute promote(String routeName, String modelVersion, PromotionRequest request);
    ModelRoute rollback(String routeName, RollbackRequest request);
    List<ModelMetadata> findModelsUsingFeature(String featureName, String featureVersion);
}
Control plane uses this interface; serving mainly reads route/model.
45. Implementation Sketch: Evaluation Gate
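import java.util.Map;

public interface EvaluationGate {
    GateResult evaluate(ModelMetadata candidate, ModelMetadata champion);
}

public record GateResult(
    boolean passed,
    String gateName,
    String message,
    Map<String, Double> metrics
) {
    // Factory for a failed result with no attached metrics; used in the example below.
    public static GateResult fail(String gateName, String message) {
        return new GateResult(false, gateName, message, Map.of());
    }
}
Example:
// Inside an EvaluationGate.evaluate implementation:
// require NDCG@20 to be at least 0.5% above the champion's value.
if (candidate.offlineMetrics().get("ndcg_at_20")
        < champion.offlineMetrics().get("ndcg_at_20") * 1.005) {
    return GateResult.fail("ndcg_improvement", "NDCG improvement too small");
}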
Gates should be configurable.
46. Minimal Production Model Registry Plan
Start with:
registry:
  immutable_model_versions: true
  artifact_uri: required
  model_status_lifecycle: true
  route_pointer: true
metadata:
  training_dataset_version: required
  feature_set_version: required
  label_versions: required
  code_version: required
  metrics: required
deployment:
  shadow_status: true
  canary_status: true
  production_status: true
  rollback_previous: true
validation:
  feature_compatibility_gate: true
  offline_metric_gate: true
  latency_gate: true
monitoring:
  model_version_metrics: true
  drift_alerts: true
governance:
  owner: required
  approval: required_for_production
This is enough to prevent most model lifecycle chaos.
47. Checklist: Model Registry and Lifecycle Readiness
[ ] Model versions are immutable.
[ ] Model route points to active version.
[ ] Model bundle includes feature/calibration/vocab/norm/runtime.
[ ] Training dataset version is recorded.
[ ] Feature set version is recorded.
[ ] Label and negative sampling versions are recorded.
[ ] Code/container version is recorded.
[ ] Offline and segment metrics are stored.
[ ] Evaluation gates are automated.
[ ] Compatibility gates run before deploy.
[ ] Approval workflow exists.
[ ] Shadow stage exists.
[ ] Canary stage exists.
[ ] Production promotion updates route atomically.
[ ] Rollback uses previous compatible bundle.
[ ] Online monitoring is by model version.
[ ] Drift triggers retrain/recalibration/investigation.
[ ] Model dependency graph exists.
[ ] Retirement/archive process exists.
48. Conclusion
A model registry and lifecycle management let recommendation models be operated as production artifacts that are safe, reproducible, and governed.
Key principles:
- Model is a versioned decision artifact, not just weights.
- Serving requires model bundle, not single file.
- Feature set, dataset, labels, calibration, vocab, and runtime must be tracked.
- Model versions should be immutable.
- Routes point to active production versions.
- Evaluation and compatibility gates block bad models.
- Shadow and canary reduce deployment risk.
- Rollback must restore compatible bundle.
- Monitoring and drift are lifecycle triggers.
- Retirement keeps registry clean while preserving auditability.
In Part 059, we will cover Training Orchestration and Reproducibility: how training jobs, dataset specs, feature snapshots, random seeds, environments, metrics, artifacts, and reruns are managed so that models can be reproduced and trusted.