Learn Build From Scratch Recommendations System Part 079 End To End Capstone Architecture Review

title: Build From Scratch Recommendations System - Part 079
description: End-to-end capstone architecture review for an enterprise recommendation system: requirements, domain boundaries, service architecture, APIs, event/data flow, candidate/ranking/reranking, feature/model lifecycle, observability, governance, security, failure modes, and architecture review checklist.
series: learn-build-from-scratch-recommendations-system
seriesTitle: Build From Scratch: Enterprise Recommendations System
order: 79
partTitle: End-to-End Capstone Architecture Review
tags:
  - recommendation-system
  - recsys
  - capstone
  - architecture-review
  - system-design
  - enterprise
  - series
date: 2026-07-02

Part 079 — End-to-End Capstone Architecture Review

We have covered RecSys from many angles:

  • domain model,
  • events,
  • data foundation,
  • implicit feedback,
  • baselines,
  • candidate generation,
  • retrieval,
  • embeddings,
  • ANN index,
  • ranking,
  • reranking,
  • exploration,
  • causal thinking,
  • LLM augmentation,
  • serving path,
  • feature store,
  • profile store,
  • MLOps,
  • batch scoring,
  • caching,
  • fault tolerance,
  • evaluation,
  • experimentation,
  • observability,
  • privacy,
  • safety,
  • security,
  • multi-tenant config,
  • cost,
  • operating model,
  • domain implementation tracks.

This part brings all of it together into a capstone architecture review.

The goal is not just to draw a pretty diagram.

The goal is to practice how an architect or staff/principal engineer thinks when judging whether a recommendation system is ready to be built, operated, and evolved to an enterprise-grade standard.


1. Capstone Scenario

We will review the architecture of the following platform:

Enterprise Recommendation Platform

It serves three domains:

  1. E-commerce recommendations.
  2. Content feed recommendations.
  3. B2B/internal workflow recommendations.

The platform must support:

  • multiple surfaces,
  • multiple tenants,
  • personalized and non-personalized modes,
  • candidate generation from many sources,
  • ranking/reranking,
  • safety and policy,
  • experimentation,
  • observability,
  • batch and online serving,
  • model lifecycle,
  • privacy and security,
  • production operations.

2. Architecture Review Mindset

An architecture review does not ask:

Is the technology cool?

Instead, it asks:

Does this system solve the problem correctly?
Are the contracts clear?
Is the data loop healthy?
Are failure modes safe?
Can it be debugged?
Can it scale?
Can it be monitored?
Is governance sufficient?
Can the team operate it?

Good architecture is operable architecture.


3. Non-Goals

This capstone is not:

  • a deep re-run of all the earlier material,
  • a full codebase implementation,
  • a recommendation of any specific vendor/tool,
  • a claim that every component must be built from scratch,
  • a design that every company must adopt.

It is an architecture review template.

Adapt it to your domain and your organization's maturity.


4. Top-Level Requirements

Functional requirements:

serve recommendations for multiple surfaces
support personalized/contextual/non-personalized modes
support multiple candidate sources
rank and rerank candidates
enforce eligibility, safety, policy, privacy, permission
log decisions/impressions/actions
support offline training/evaluation
support online experiments
support batch/precomputed recommendations
support debug and replay
support tenant-specific configuration

Non-functional requirements:

low latency
high availability
safe degradation
observability
security
privacy
auditability
cost control
model lifecycle governance
operational ownership

5. Quality Attribute Targets

Example SLOs:

home_feed:
  availability: 99.9%
  p95_latency_ms: 200
  p99_latency_ms: 500
  empty_slate_rate: <0.1%
  decision_log_success: >99.9%
  critical_policy_violation: 0

enterprise_next_action:
  availability: 99.95%
  p95_latency_ms: 300
  permission_violation: 0
  audit_log_success: >99.99%
  safe_empty_allowed: true

Quality targets differ by surface.
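SLO targets like these are most useful when a release gate or an alert can check them mechanically. The sketch below is a minimal Python version that encodes the home_feed targets. The metric names come from the example above. The `check_slo` helper and the idea of feeding it a dict of observed values are hypothetical; in practice those values would come from your monitoring system.

```python
import operator
from typing import Callable, Dict, List, Tuple

Target = Tuple[Callable[[float, float], bool], float]

# Hypothetical encoding of the home_feed targets above as (comparison, threshold).
# The comparison must hold for the SLO to be met; strict "<"/">" mirror "<0.1%"/">99.9%".
HOME_FEED_SLO: Dict[str, Target] = {
    "availability": (operator.ge, 0.999),
    "p95_latency_ms": (operator.le, 200),
    "p99_latency_ms": (operator.le, 500),
    "empty_slate_rate": (operator.lt, 0.001),
    "decision_log_success": (operator.gt, 0.999),
    "critical_policy_violation": (operator.eq, 0),
}


def check_slo(targets: Dict[str, Target], observed: Dict[str, float]) -> List[str]:
    """Return the names of targets that are breached or have no observed value."""
    breaches = []
    for name, (holds, threshold) in targets.items():
        value = observed.get(name)
        if value is None or not holds(value, threshold):
            breaches.append(name)  # a missing metric counts as a breach
    return breaches


observed = {
    "availability": 0.9995, "p95_latency_ms": 180, "p99_latency_ms": 620,
    "empty_slate_rate": 0.0004, "decision_log_success": 0.9999,
    "critical_policy_violation": 0,
}
print(check_slo(HOME_FEED_SLO, observed))  # ['p99_latency_ms']
```

A missing metric is treated as a breach on purpose: a silent telemetry gap should not count as a passing SLO.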


6. Domain Boundaries

Core bounded contexts:

Recommendation Serving
Candidate Generation
Eligibility and Policy
Ranking
Slate Construction
Feature/Profile
Event and Feedback
Training and Evaluation
Model/Index Registry
Experimentation
Configuration
Observability and Debugging
Governance/Security/Privacy

Boundaries prevent “one giant recommendation service” chaos.


7. Reference Architecture Diagram

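This is the full loop: data -> candidates -> ranking -> slate -> exposure -> feedback -> learning -> deployment -> monitoring -> governance.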


8. Online Serving Request Flow

Key architecture point:

The Rec API orchestrates the request deadline, fallback, tracing, and decision integrity.
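The deadline-plus-fallback behavior can be sketched with `asyncio.wait_for`. This is a minimal illustration, not a prescribed design. `model_rank` is a hypothetical stand-in for a remote ranker, and the latency is simulated with `asyncio.sleep`. The point is that a timeout produces a degraded but valid slate and an explicit `fallback` flag, so the decision log records what actually happened.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List


@dataclass
class Decision:
    items: List[str]
    fallback: bool
    trace: List[str] = field(default_factory=list)


async def model_rank(candidates: List[str], latency_s: float) -> List[str]:
    # Hypothetical stand-in for a remote ranker call; latency_s simulates its latency.
    await asyncio.sleep(latency_s)
    return sorted(candidates, reverse=True)


def heuristic_rank(candidates: List[str]) -> List[str]:
    # Dependency-free fallback: keep the candidate layer's order.
    return list(candidates)


async def recommend(candidates: List[str], ranker_latency_s: float,
                    ranker_budget_s: float) -> Decision:
    trace = ["candidates_fetched"]
    try:
        items = await asyncio.wait_for(model_rank(candidates, ranker_latency_s),
                                       timeout=ranker_budget_s)
        trace.append("model_ranked")
        return Decision(items, fallback=False, trace=trace)
    except asyncio.TimeoutError:
        # Budget exceeded: degrade, but flag it so the decision log stays honest.
        trace.append("ranker_timeout")
        return Decision(heuristic_rank(candidates), fallback=True, trace=trace)


print(asyncio.run(recommend(["a", "c", "b"], ranker_latency_s=0.5, ranker_budget_s=0.05)))
```

In a real service, the ranker budget would be derived from the remaining request deadline rather than passed in as a constant.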

9. Request Contract Review

A request must carry:

request_id
subject
surface
context
privacy mode
tenant if applicable
limit
debug flag

Review questions:

Can the subject be spoofed?
Is a tenant required?
Is the privacy mode resolved server-side?
Is the surface valid?
Can the caller request debug output?
Are context dimensions complete enough for eligibility checks?

A weak request contract creates downstream chaos.
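Several of these review questions can be answered directly in the request builder. The sketch below is a minimal version under stated assumptions:

- The surface registry, the tenant-scoped surface set and the limit cap of 100 are hypothetical.
- The authenticated subject and the consent-derived privacy mode are passed in by the server.
- Matching fields in the client body are ignored, so the subject and privacy mode cannot be spoofed.

```python
from dataclasses import dataclass
from typing import Dict, Optional

VALID_SURFACES = {"home_feed", "pdp", "enterprise_next_action"}  # hypothetical registry
TENANT_SCOPED_SURFACES = {"enterprise_next_action"}               # hypothetical
MAX_LIMIT = 100                                                   # hypothetical cap


@dataclass(frozen=True)
class RecRequest:
    request_id: str
    subject_id: str          # from the authenticated principal, never from the body
    surface: str
    context: Dict[str, str]
    privacy_mode: str        # resolved server-side from the consent store
    tenant_id: Optional[str]
    limit: int
    debug: bool


def build_request(raw: dict, auth_subject: str, consent_mode: str,
                  debug_allowed: bool) -> RecRequest:
    surface = raw.get("surface")
    if surface not in VALID_SURFACES:
        raise ValueError(f"unknown surface: {surface!r}")
    tenant = raw.get("tenant_id")
    if surface in TENANT_SCOPED_SURFACES and not tenant:
        raise ValueError("tenant_id is required for this surface")
    # Client-supplied subject/privacy fields are deliberately ignored (anti-spoofing).
    limit = max(1, min(int(raw.get("limit", 20)), MAX_LIMIT))
    return RecRequest(
        request_id=raw["request_id"],
        subject_id=auth_subject,
        surface=surface,
        context=dict(raw.get("context", {})),
        privacy_mode=consent_mode,
        tenant_id=tenant,
        limit=limit,
        debug=bool(raw.get("debug")) and debug_allowed,  # caller needs debug permission
    )


req = build_request(
    {"request_id": "r1", "surface": "home_feed", "subject_id": "someone_else",
     "limit": 500, "debug": True},
    auth_subject="u42", consent_mode="non_personalized", debug_allowed=False,
)
print(req.subject_id, req.limit, req.debug)  # u42 100 False
```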


10. Response Contract Review

Response must include:

request_id
slate_id
items
positions
tracking tokens
reason codes
metadata

Internal metadata should include:

model version
policy version
fallback flag
experiment variants

External clients should not receive sensitive raw scores unless intended.
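One way to keep raw scores and internal metadata away from clients is to give the response type two explicit projections. One is client-facing; the other goes into the decision log. This Python sketch is illustrative only. The field names follow the lists above, and the example values such as `ranker-v3` are made up.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SlateItem:
    item_id: str
    position: int
    tracking_token: str
    reason_code: str
    raw_score: float  # internal only: never sent to external clients


@dataclass
class RecResponse:
    request_id: str
    slate_id: str
    items: List[SlateItem]
    metadata: Dict[str, Any] = field(default_factory=dict)  # client-safe metadata
    internal: Dict[str, Any] = field(default_factory=dict)  # versions, fallback, variants

    def to_external(self) -> Dict[str, Any]:
        """Client-facing payload: no raw scores and no internal metadata."""
        return {
            "request_id": self.request_id,
            "slate_id": self.slate_id,
            "items": [
                {"item_id": i.item_id, "position": i.position,
                 "tracking_token": i.tracking_token, "reason_code": i.reason_code}
                for i in self.items
            ],
            "metadata": dict(self.metadata),
        }

    def to_decision_log(self) -> Dict[str, Any]:
        """Internal record: the external view plus versions, flags, and scores."""
        record = self.to_external()
        record["internal"] = dict(self.internal)
        record["scores"] = {i.item_id: i.raw_score for i in self.items}
        return record


resp = RecResponse(
    "r1", "s1", [SlateItem("i1", 0, "tok-1", "similar_to_recent", 0.87)],
    metadata={"surface": "home_feed"},
    internal={"model_version": "ranker-v3", "policy_version": "policy-v7",
              "fallback": False, "experiment_variants": ["exp12:treatment"]},
)
print(resp.to_external())
```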


11. Candidate Generation Review

Candidate layer should answer:

Which sources exist?
What is each source’s responsibility?
What are source quotas?
What is source latency?
What is source recall?
What source produced each item?
How are candidates deduped?
How are stale/invalid candidates filtered?

A good candidate layer has provenance and diagnostics.


12. Candidate Portfolio Review

Example portfolio:

home_feed:
  personalized_two_tower:
    quota: 800
    criticality: preferred
  recent_item_similar:
    quota: 300
    criticality: preferred
  trending_safe:
    quota: 200
    criticality: fallback
  editorial_safe:
    quota: 50
    criticality: fallback

Review:

  • enough recall?
  • enough diversity?
  • cold-start path?
  • anonymous/no-consent path?
  • fallback path?
  • safety filter?
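Quotas, dedup, provenance and the fallback path can all be made visible in the merge step. The Python sketch below makes these assumptions:

- Quotas and criticality are copied from the example portfolio.
- A source that failed or timed out is represented as `None`.
- The "degraded" rule is an illustrative choice: no preferred source contributed anything.

```python
from typing import Dict, List, Optional, Tuple

# Quotas and criticality copied from the example portfolio above.
PORTFOLIO = {
    "personalized_two_tower": {"quota": 800, "criticality": "preferred"},
    "recent_item_similar": {"quota": 300, "criticality": "preferred"},
    "trending_safe": {"quota": 200, "criticality": "fallback"},
    "editorial_safe": {"quota": 50, "criticality": "fallback"},
}


def merge_candidates(
    results: Dict[str, Optional[List[str]]],
) -> Tuple[Dict[str, List[str]], Dict[str, Optional[int]], bool]:
    """Merge per-source results; None (or a missing key) means the source failed.

    Returns (item -> producing sources, per-source kept count, degraded flag).
    """
    provenance: Dict[str, List[str]] = {}  # insertion-ordered by first appearance
    contribution: Dict[str, Optional[int]] = {}
    for source, cfg in PORTFOLIO.items():
        items = results.get(source)
        if items is None:
            contribution[source] = None  # failure, distinct from "returned nothing"
            continue
        kept = items[: cfg["quota"]]  # enforce the source quota
        contribution[source] = len(kept)
        for item_id in kept:
            sources = provenance.setdefault(item_id, [])  # dedupe across sources
            if source not in sources:
                sources.append(source)  # keep every producing source as provenance
    degraded = not any(
        contribution[s] for s, cfg in PORTFOLIO.items() if cfg["criticality"] == "preferred"
    )
    return provenance, contribution, degraded
```

The per-source `contribution` counts are the raw material for a source contribution funnel. Keeping `None` distinct from `0` separates "source down" from "source had nothing relevant".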

13. Retrieval/Embedding Review

Review:

embedding family defined?
query/item embedding compatible?
index versioned?
ANN recall benchmark exists?
delta index exists for freshness?
deletion/tombstone path exists?
multi-tenant filter safe?
index publish atomic?
rollback possible?

Vector retrieval is powerful but dangerous without versioning.
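Several items in this checklist (versioning, compatibility, recall benchmark, atomic publish, rollback) come together in an alias-style registry. Readers resolve the "live" version, and publish is a single guarded pointer swap. The `IndexRegistry` class, its method names and the recall gate are hypothetical. It is an in-process sketch of the pattern, not the API of any vector database.

```python
import threading
from typing import Dict, List, Optional


class IndexRegistry:
    """Hypothetical alias-based registry: readers resolve 'live' to one version."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._versions: Dict[str, dict] = {}
        self._history: List[str] = []  # previously live versions, newest last
        self._live: Optional[str] = None

    def register(self, version: str, embedding_family: str, recall_at_k: float) -> None:
        with self._lock:
            self._versions[version] = {"family": embedding_family, "recall_at_k": recall_at_k}

    def publish(self, version: str, query_family: str, min_recall: float) -> None:
        with self._lock:
            meta = self._versions[version]
            if meta["family"] != query_family:
                raise ValueError("index embedding family does not match query encoder")
            if meta["recall_at_k"] < min_recall:
                raise ValueError("ANN recall benchmark below publish gate")
            if self._live is not None:
                self._history.append(self._live)
            self._live = version  # single pointer swap: readers see old or new, never a mix

    def rollback(self) -> str:
        with self._lock:
            if not self._history:
                raise RuntimeError("no previous version to roll back to")
            self._live = self._history.pop()
            return self._live

    def live(self) -> Optional[str]:
        with self._lock:
            return self._live


registry = IndexRegistry()
registry.register("idx-v1", "tt-emb-v1", recall_at_k=0.95)
registry.publish("idx-v1", query_family="tt-emb-v1", min_recall=0.9)
print(registry.live())  # idx-v1
```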


14. Eligibility and Policy Review

Eligibility should enforce:

item/action active
recommendable on surface
region/locale/age constraints
tenant/permission
user suppression
frequency cap
inventory/availability
safety policy
business campaign validity

Review:

Which checks are hard?
Which checks are soft?
Which fail closed?
Which can use stale cache?
Are reason codes logged?
Is final validation present?
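The hard/soft and fail-open/fail-closed distinctions can be encoded per check, with a reason code emitted for every outcome. The Python sketch below is illustrative. The `Check` type, the example checks and the reason-code format are assumptions, and "soft" is interpreted here as "demote, do not drop".

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Check:
    name: str
    hard: bool         # hard: failure drops the item; soft: failure only demotes it
    fail_closed: bool  # if the check itself errors: drop the item (True) or pass it (False)
    fn: Callable[[dict], bool]


def evaluate(item: dict, checks: List[Check]) -> Tuple[bool, List[str]]:
    """Return (eligible, reason_codes) for one candidate."""
    reasons: List[str] = []
    for check in checks:
        try:
            ok = check.fn(item)
        except Exception:
            if check.fail_closed:
                return False, reasons + [f"{check.name}:error_fail_closed"]
            reasons.append(f"{check.name}:error_fail_open")
            continue
        if not ok:
            if check.hard:
                return False, reasons + [f"{check.name}:rejected"]
            reasons.append(f"{check.name}:demoted")
    return True, reasons


CHECKS = [
    Check("active", hard=True, fail_closed=True, fn=lambda i: i["active"]),
    Check("safety", hard=True, fail_closed=True, fn=lambda i: i["safety_score"] < 0.5),
    Check("freq_cap", hard=False, fail_closed=False, fn=lambda i: i["shown_today"] < 3),
]
print(evaluate({"active": True, "safety_score": 0.1, "shown_today": 5}, CHECKS))
```

In this sketch, a missing safety signal (here a `KeyError`) rejects the item because the safety check fails closed. A broken frequency-cap lookup only records a reason code.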

15. Feature Store Review

Review:

feature definitions versioned?
feature ownership exists?
freshness SLA defined?
offline/online parity validated?
privacy metadata present?
missing/default policy defined?
feature cost known?
feature monitoring exists?

Features are production APIs, not ad hoc columns.
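Treating features as APIs means each read goes through a definition that carries version, owner, freshness SLA, default policy and privacy metadata. The Python sketch below shows one possible shape; `FeatureDef` and `read_feature` are hypothetical names. The returned status string is what a feature-monitoring dashboard would count (missing/stale/blocked rates).

```python
import time
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass(frozen=True)
class FeatureDef:
    name: str
    version: int
    owner: str
    freshness_sla_s: float  # values older than this are treated as stale
    default: Any            # explicit missing/default policy
    pii: bool               # privacy metadata used to gate access


def read_feature(fdef: FeatureDef, stored: Optional[Tuple[Any, float]],
                 personalized: bool, now: Optional[float] = None) -> Tuple[Any, str]:
    """Return (value, status); status is 'ok', 'missing', 'stale' or 'blocked'."""
    now = time.time() if now is None else now
    if fdef.pii and not personalized:
        return fdef.default, "blocked"  # privacy mode forbids this feature
    if stored is None:
        return fdef.default, "missing"
    value, written_at = stored
    if now - written_at > fdef.freshness_sla_s:
        return fdef.default, "stale"
    return value, "ok"


ctr = FeatureDef("user_ctr_7d", version=2, owner="ranking-team",
                 freshness_sla_s=3600, default=0.0, pii=True)
print(read_feature(ctr, (0.12, 1000.0), personalized=True, now=2000.0))  # (0.12, 'ok')
```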


16. Profile Store Review

Review:

long-term vs session state separated?
suppression state fresh?
identity merge/split handled?
consent changes handled?
profile reset/delete handled?
hot keys handled?
profile debug view safe?

Bad profile state causes bad personalization and privacy risk.


17. Ranking Review

Review:

ranking objective explicit?
model type appropriate?
features available online?
calibration needed?
score composition governed?
latency within budget?
fallback ranker exists?
model version logged?
segment evaluation passed?

The ranker should be replaceable behind a stable contract.


18. Reranking/Slate Review

Review:

diversity constraints?
frequency/fatigue?
business rules?
sponsored caps/disclosure?
fairness/exposure?
exploration slots?
relevance floor?
layout constraints?
final hard checks?

Users experience a slate, not independent item scores.
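A simple way to see slate-level constraints is a greedy pass over the ranker's output. It keeps a hard denylist check that is never relaxed, plus a per-category cap as the diversity constraint. This is a minimal sketch of one common heuristic, not the only reranking approach. The item fields and cap values are illustrative.

```python
from typing import Dict, List, Set


def build_slate(ranked: List[dict], size: int, max_per_category: int,
                denylist: Set[str]) -> List[dict]:
    """Greedy slate construction over a candidate list sorted best-first."""
    slate: List[dict] = []
    per_category: Dict[str, int] = {}
    for cand in ranked:
        if cand["item_id"] in denylist:
            continue  # final hard check: never traded off against relevance
        cat = cand["category"]
        if per_category.get(cat, 0) >= max_per_category:
            continue  # diversity constraint
        slate.append(cand)
        per_category[cat] = per_category.get(cat, 0) + 1
        if len(slate) == size:
            break
    return slate


ranked = [
    {"item_id": "a", "category": "shoes"}, {"item_id": "b", "category": "shoes"},
    {"item_id": "c", "category": "shoes"}, {"item_id": "d", "category": "bags"},
    {"item_id": "e", "category": "hats"},
]
print([c["item_id"] for c in build_slate(ranked, size=3, max_per_category=2, denylist={"d"})])
# ['a', 'b', 'e']
```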


19. Event and Feedback Review

Events must include:

decision logs
impressions
viewability if needed
actions/clicks
conversions
negative feedback
deletion/suppression
catalog/policy changes

Review:

Are events idempotent?
Are schemas versioned?
Are event_time and ingestion_time tracked?
Can actions join to impressions?
Are tracking tokens reliable?
Is event lag monitored?

No event loop, no learning loop.
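Idempotency and the action-to-impression join can be sketched in a few lines. The sketch assumes an at-least-once transport (so duplicates are possible), a unique `event_id` per event and a `tracking_token` shared by an impression and its click. These field names are assumptions. The join rate it returns is the kind of signal a join-rate monitor would alert on.

```python
from typing import Dict, Iterable, List, Set


def ingest(events: Iterable[dict]) -> List[dict]:
    """Drop duplicate deliveries by event_id (at-least-once transport assumed)."""
    seen: Set[str] = set()
    unique: List[dict] = []
    for e in events:
        if e["event_id"] in seen:
            continue
        seen.add(e["event_id"])
        unique.append(e)
    return unique


def join_actions(events: List[dict]) -> Dict[str, float]:
    """Join clicks to impressions by tracking_token and report the join rate."""
    impressions = {e["tracking_token"] for e in events if e["type"] == "impression"}
    clicks = [e for e in events if e["type"] == "click"]
    joined = [c for c in clicks if c["tracking_token"] in impressions]
    return {
        "actions": len(clicks),
        "joined": len(joined),
        "join_rate": len(joined) / len(clicks) if clicks else 1.0,
    }


events = [
    {"event_id": "e1", "type": "impression", "tracking_token": "t1"},
    {"event_id": "e1", "type": "impression", "tracking_token": "t1"},  # redelivery
    {"event_id": "e2", "type": "click", "tracking_token": "t1"},
    {"event_id": "e3", "type": "click", "tracking_token": "t9"},        # orphan click
]
print(join_actions(ingest(events)))
```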


20. Training Dataset Review

Review:

dataset spec versioned?
temporal split?
point-in-time joins?
label windows mature?
negative sampling policy?
privacy filters?
tenant scope?
data quality gates?
lineage?

Training data quality is model quality.
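Two items from this list are easy to get wrong in code: point-in-time joins and label-window maturity. In the Python sketch below, each training row uses the latest feature value written strictly before the impression. Impressions whose label window has not fully elapsed at snapshot time are skipped. The row and history shapes are hypothetical; the feature history must be sorted by timestamp.

```python
import bisect
from typing import Dict, List, Optional, Tuple

History = List[Tuple[float, float]]  # (written_at, value), sorted by written_at


def point_in_time_value(history: History, label_ts: float) -> Optional[float]:
    """Latest feature value written strictly before label_ts, or None."""
    timestamps = [ts for ts, _ in history]
    idx = bisect.bisect_left(timestamps, label_ts) - 1  # last index with ts < label_ts
    return history[idx][1] if idx >= 0 else None


def build_rows(impressions: List[dict], feature_history: Dict[str, History],
               label_window_s: float, snapshot_ts: float) -> List[dict]:
    rows = []
    for imp in impressions:
        if imp["ts"] + label_window_s > snapshot_ts:
            continue  # label window not mature: a click may still arrive
        hist = feature_history.get(imp["user_id"], [])
        rows.append({
            "user_id": imp["user_id"],
            "feature": point_in_time_value(hist, imp["ts"]),  # no future leakage
            "label": imp["clicked"],
        })
    return rows


history = {"u1": [(100.0, 0.1), (200.0, 0.2), (300.0, 0.3)]}
impressions = [{"user_id": "u1", "ts": 250.0, "clicked": 1},
               {"user_id": "u1", "ts": 950.0, "clicked": 0}]
print(build_rows(impressions, history, label_window_s=100.0, snapshot_ts=1000.0))
# [{'user_id': 'u1', 'feature': 0.2, 'label': 1}]
```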


21. Model Lifecycle Review

Review:

model registry exists?
model bundle includes feature set/calibration/runtime?
offline metrics stored?
segment metrics stored?
shadow/canary/production lifecycle?
rollback bundle?
owner?
approval?
monitoring?

No registry means model chaos.


22. Batch/Precomputed Review

Review:

which surfaces use batch?
subject selection consent-aware?
list TTL?
final online eligibility?
store more than visible count?
lineage?
batch validation?
rollback?
fallback?

Precomputed lists are stale by design; compensate safely.


23. Experimentation Review

Review:

experiment registry?
deterministic assignment?
randomization unit?
exposure logging?
treatment-applied logging?
cache variant isolation?
primary metric?
guardrails?
SRM check?
segment analysis?
rollback?

Experiments are production deployments.
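Deterministic assignment and the SRM check are both small enough to sketch. Assignment hashes the experiment name together with the randomization unit, so the same unit always lands in the same variant and different experiments are independently salted. The SRM check is a chi-square goodness-of-fit test with 1 degree of freedom. The function names are hypothetical. Teams typically alert on a strict threshold such as p < 0.001; treat that value as a convention, not a rule.

```python
import hashlib
import math
from typing import Sequence


def assign(unit_id: str, experiment: str,
           variants: Sequence[str] = ("control", "treatment")) -> str:
    """Deterministic, salted assignment: same unit + experiment -> same variant."""
    digest = hashlib.sha256(f"{experiment}:{unit_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # 10,000 equal-width buckets
    return variants[bucket * len(variants) // 10_000]


def srm_p_value(n_a: int, n_b: int, expected_share_a: float = 0.5) -> float:
    """Chi-square goodness-of-fit test (1 dof) for sample ratio mismatch."""
    total = n_a + n_b
    exp_a = total * expected_share_a
    exp_b = total - exp_a
    chi2 = (n_a - exp_a) ** 2 / exp_a + (n_b - exp_b) ** 2 / exp_b
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


print(assign("user-42", "exp_rank_v2"))
print(srm_p_value(5200, 4800))  # far below 0.001: a 52/48 split on 10k units is an SRM
```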


24. Offline Evaluation Review

Review:

retrieval recall?
ranking NDCG/MRR/AUC/logloss?
calibration?
slate diversity/novelty?
negative feedback?
segments?
cold-start?
counterfactual limitations documented?

Offline evaluation should screen, not prove.
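The two most common screening metrics, retrieval recall@k and ranking NDCG@k, can be sketched directly. This uses the exponential-gain form of DCG, `(2^rel - 1) / log2(rank + 1)`. For simplicity, the ideal ordering is computed from the same judged list, which is a common simplification rather than the only definition.

```python
import math
from typing import List, Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant items that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def dcg(relevances: List[float]) -> float:
    # Gain (2^rel - 1) discounted by log2(rank + 1), with ranks starting at 1.
    return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg_at_k(ranked_relevances: List[float], k: int) -> float:
    """DCG of the model's order divided by DCG of the ideal order, both cut at k."""
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True)[:k])
    return dcg(ranked_relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0


print(recall_at_k(["a", "b", "c"], {"a", "c", "z"}, k=2))
print(ndcg_at_k([0, 1], k=2))  # relevant item at rank 2 instead of rank 1
```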


25. Observability Review

Dashboards should cover:

online serving
candidate generation
feature freshness/missing
ranking score distribution
slate quality
feedback events
experiments
model drift
embedding/index health
business metrics
tenant health

Review:

Can we debug a bad recommendation?
Can we detect silent quality regression?
Can we attribute a change to a model, config, or source?

26. Debug/Replay Review

Debug should answer:

why this item?
why this position?
why not filtered?
which source?
which features?
which model?
which rules?
which experiment?
which fallback?

Replay should support:

  • request context,
  • candidate set,
  • feature snapshot,
  • model/policy version,
  • random seed.

Both debug and replay must be access-controlled.


27. Privacy Review

Review:

privacy modes defined?
consent checked before personalization?
non-personalized path real?
feature privacy metadata?
training privacy filters?
deletion workflow?
retention?
debug redaction?
LLM context minimization?

Privacy is architecture, not UI.


28. Safety Review

Review:

policy taxonomy?
recommendability by surface?
candidate generation filters?
final validation?
tombstone/denylist?
trust signals?
abuse-resistant trending?
safety guardrails in experiments?
kill switches?
incident runbook?

Safety must be multi-layered.


29. Security Review

Review:

service auth?
subject authorization?
tenant isolation?
feature access control?
debug RBAC/audit?
model artifact integrity?
vector index access?
admin approval?
secret management?
LLM least privilege?

RecSys is a data aggregator and security boundary.


30. Multi-Tenant Review

Review:

effective config resolver?
tenant ID propagated?
tenant-specific config versioned?
tenant-specific model/rules/features/fallback?
global mandatory safety non-overridable?
tenant dashboards?
tenant onboarding/offboarding?
config audit?

Enterprise support needs explainable tenant config.
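An effective-config resolver can be sketched as layered overrides followed by a final re-application of mandatory global safety keys. It also records which layer each value came from, which is what makes tenant config explainable to support staff. The layer names and keys are hypothetical. Here a tenant attempt to override a mandatory key is silently neutralized; rejecting it at config-review time is an equally valid choice.

```python
from typing import Any, Dict

GLOBAL_DEFAULTS: Dict[str, Any] = {
    "ranker_route": "gbdt_default",
    "max_per_category": 3,
    "fallback": "trending_safe",
}
# Mandatory safety settings that tenant or surface layers cannot override.
GLOBAL_MANDATORY: Dict[str, Any] = {"safety_filter": "strict", "final_eligibility_check": True}


def resolve_config(tenant: Dict[str, Any], surface: Dict[str, Any]) -> Dict[str, Any]:
    """Layered resolution: global < tenant < surface, then mandatory keys reapplied."""
    effective: Dict[str, Any] = {}
    sources: Dict[str, str] = {}
    for layer_name, layer in (("global", GLOBAL_DEFAULTS), ("tenant", tenant),
                              ("surface", surface)):
        for key, value in layer.items():
            effective[key] = value
            sources[key] = layer_name
    for key, value in GLOBAL_MANDATORY.items():
        effective[key] = value  # non-overridable, regardless of lower layers
        sources[key] = "global_mandatory"
    effective["_sources"] = sources  # explains where every value came from
    return effective


cfg = resolve_config({"ranker_route": "gbdt_tenant_a", "safety_filter": "off"},
                     {"max_per_category": 2})
print(cfg["safety_filter"], cfg["_sources"]["ranker_route"])  # strict tenant
```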


31. Cost/Capacity Review

Review:

QPS by surface?
candidate scores/sec?
feature values/sec?
vector queries/sec?
model inference cost?
cache hit/stale metrics?
batch scoring volume?
LLM cost?
tenant cost?
degradation modes?

If cost is not designed in, it becomes a surprise.


32. Fault Tolerance Review

Review:

dependency timeouts?
circuit breakers?
bulkheads?
fallback hierarchy?
fail-open vs fail-closed matrix?
safe empty?
event logging degradation?
model fallback?
cache outage plan?
runbooks?

Good RecSys fails safely.


33. Governance Review

Review:

artifact owners?
model review?
experiment review?
feature review?
rule/config review?
tenant config review?
on-call?
incident process?
objective governance?
cost ownership?
deprecation process?

Governance makes architecture sustainable.


34. Architecture Decision Records

Important ADRs:

ADR-001: modular monolith vs microservices
ADR-002: recommendation unit by domain
ADR-003: event schema standard
ADR-004: candidate source contract
ADR-005: ranking model deployment strategy
ADR-006: feature store architecture
ADR-007: privacy mode enforcement
ADR-008: tenant isolation strategy
ADR-009: experiment assignment unit
ADR-010: fallback/degradation policy

An ADR captures the reasoning and trade-offs behind a decision.


35. Example Service Boundary Decision

Decision:

Start with modular monolith for Rec API, candidate, ranking, reranking.
Split model runtime and event ingestion into separate services.

Reason:

  • reduce distributed complexity,
  • keep hot path simple,
  • preserve module boundaries,
  • independent event ingestion scaling.

Review risk:

  • monolith growth,
  • team coupling,
  • deployment blast radius.

Mitigation:

  • strict modules,
  • interface boundaries,
  • extraction plan.

36. Example Data Flow Decision

Decision:

Kafka for decision/impression/action events.
Raw -> clean -> curated data layers.
Training dataset builder uses curated events and feature snapshots.

Reason:

  • asynchronous feedback loop,
  • idempotent processing,
  • replay/backfill support,
  • training lineage.

Review risk:

  • event lag,
  • schema drift,
  • tracking token bugs.

Mitigation:

  • schema registry,
  • event quality dashboards,
  • join-rate monitoring.

37. Example Candidate Decision

Decision:

Candidate generation portfolio: two-tower, item-to-item, content, trending, editorial.

Reason:

  • complementary recall,
  • cold-start support,
  • fallback source.

Review risk:

  • source overlap,
  • latency,
  • invalid candidates,
  • source score inconsistency.

Mitigation:

  • source contribution funnel,
  • quotas,
  • dedup,
  • source normalization,
  • fallback.

38. Example Ranking Decision

Decision:

Use a GBDT ranker first; move to a deep model once the data/feature platform matures.

Reason:

  • strong tabular baseline,
  • debuggable,
  • low latency,
  • easier feature-importance analysis.

Review risk:

  • sequence/multimodal signals underused.

Mitigation:

  • add embeddings as features,
  • train deep challenger in shadow,
  • compare offline/online.

39. Example Privacy Decision

Decision:

Support three privacy modes: personalized, contextual_only, non_personalized.

Reason:

  • consent-aware serving,
  • regulatory/product flexibility,
  • graceful fallback.

Review risk:

  • duplicated pipelines,
  • model quality lower in non-personalized mode.

Mitigation:

  • shared contextual sources,
  • mode-specific ranker route,
  • privacy regression tests.

40. Example Tenant Decision

Decision:

Shared infrastructure with logical tenant isolation for most tenants.
Dedicated vector index for high-risk/large tenants.

Reason:

  • cost efficiency,
  • security flexibility.

Review risk:

  • cross-tenant leakage in shared mode.

Mitigation:

  • tenant-aware keys,
  • permission final check,
  • security tests,
  • tenant dashboards.

41. Architecture Smell: Model-Centric Design

Smell:

The architecture deck starts with a neural model and ignores events, logging, safety, and fallback.

Problem:

  • cannot learn reliably,
  • cannot debug,
  • cannot operate.

Fix:

  • start with decision loop,
  • then model.

42. Architecture Smell: No Final Eligibility

Smell:

Candidate sources are assumed clean.

Problem:

  • stale index/cache/precomputed lists leak invalid items.

Fix:

  • final eligibility/tombstone check mandatory.

43. Architecture Smell: No Version Tags

Smell:

Metrics show a CTR drop but have no model/policy/source version dimension.

Problem:

  • root-cause analysis is slow.

Fix:

  • version tags everywhere.

44. Architecture Smell: One Global Objective

Smell:

All surfaces use the same ranking objective.

Problem:

  • home, PDP, cart, push, enterprise actions have different goals.

Fix:

  • surface-specific objective/config/model route.

45. Architecture Smell: Cache Without Privacy/Tenant Dimensions

Smell:

cache key = surface:user

Problem:

  • privacy/tenant/experiment contamination.

Fix:

  • typed cache keys include tenant/privacy/context/version.
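A typed cache key makes it hard to forget a dimension, because every field is required at construction time. The Python sketch below assumes that the tenant, privacy mode, context bucket, experiment variants, and model/policy versions are all known when the key is built; the field names are hypothetical. JSON-encoding the parts before hashing avoids delimiter collisions, and sorting the variants makes the key independent of their order.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RecCacheKey:
    tenant_id: str
    surface: str
    subject_id: str
    privacy_mode: str
    context_bucket: str                  # e.g. a coarse locale/device bucket
    experiment_variants: Tuple[str, ...]
    model_version: str
    policy_version: str

    def render(self) -> str:
        parts = [self.tenant_id, self.surface, self.subject_id, self.privacy_mode,
                 self.context_bucket, sorted(self.experiment_variants),
                 self.model_version, self.policy_version]
        # JSON encoding avoids delimiter collisions; hashing keeps the key short.
        digest = hashlib.sha256(json.dumps(parts).encode("utf-8")).hexdigest()
        return f"rec:{self.tenant_id}:{self.surface}:{digest}"


base = RecCacheKey("t1", "home_feed", "u1", "personalized", "id-mobile",
                   ("exp1:treatment",), "ranker-v3", "policy-v7")
print(base.render())
```

The readable `tenant:surface` prefix is kept in the clear so that per-tenant invalidation and debugging stay practical.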

46. Architecture Smell: Offline Metrics Only

Smell:

A model is launched because offline NDCG improved.

Problem:

  • online causal evidence is missing.

Fix:

  • shadow/canary/A-B with guardrails.

47. Architecture Smell: No Operating Owner

Smell:

Every artifact exists, but no one owns it.

Problem:

  • stale features, broken dashboards, config drift.

Fix:

  • ownership registry and review cadence.

48. End-to-End Review Checklist

[ ] Requirements and non-goals are explicit.
[ ] Surfaces and objectives are defined.
[ ] Recommendation units are defined.
[ ] Request/response contracts are versioned.
[ ] Event contracts are versioned.
[ ] Candidate source contract exists.
[ ] Candidate provenance is logged.
[ ] Eligibility/policy/final validation exist.
[ ] Feature registry/store design exists.
[ ] Profile/session/suppression design exists.
[ ] Ranking contract and model route exist.
[ ] Reranking/slate policy exists.
[ ] Tracking tokens and decision logs exist.
[ ] Offline training dataset builder exists.
[ ] Model/index registries exist.
[ ] Evaluation gates exist.
[ ] Experiment infrastructure exists.
[ ] Observability dashboards/alerts exist.
[ ] Debug/replay capability exists.
[ ] Privacy modes and consent enforcement exist.
[ ] Safety/policy enforcement exists.
[ ] Security/tenant isolation exists.
[ ] Cost/capacity model exists.
[ ] Fault tolerance/degradation plan exists.
[ ] Operating ownership/governance exists.

49. Capstone Architecture Review Questions

Use these in an architecture review meeting:

What happens if the ranker times out?
What happens if consent is unknown?
What happens if an item is deleted after the index is built?
How do we explain why an item was shown?
How do we know model v2 is better?
How do we roll back model/index/config?
How do we prevent tenant data leakage?
How do we serve users in no-personalization mode?
How do we know feature X is stale?
How do we attribute a click to a recommendation?
How do we detect candidate source degradation?
How do we keep cost bounded?
How do we handle a bad-recommendation report?
How do we prevent sponsored items from overriding safety?
How do we onboard a new tenant?

If the architecture cannot answer these questions, it is not ready.


50. Minimal Capstone Implementation Roadmap

Phase 1 — Production Skeleton

API
candidate sources
eligibility
heuristic ranker
slate policy
events
decision logs
dashboard
fallback
privacy/safety basics

Phase 2 — Learning Loop

clean events
profile updates
training dataset builder
offline evaluation
GBDT ranker
model registry
A/B testing

Phase 3 — Retrieval and MLOps

two-tower retrieval
embeddings
ANN index versioning
feature store
batch scoring
model lifecycle
drift monitoring

Phase 4 — Enterprise and Governance

multi-tenant config
security hardening
privacy deletion
safety classifiers
cost attribution
operating model

Phase 5 — Advanced Optimization

bandits
causal/OPE
deep/multimodal models
LLM augmentation
fairness/exposure optimization

51. Final Architecture Summary

A production-grade recommendation system is not one algorithm.

It is a closed-loop decision platform:

data -> candidates -> ranking -> slate -> exposure -> feedback -> learning -> deployment -> monitoring -> governance

Core architecture principle:

Every decision must be explainable, observable, versioned, safe, and improvable.

If a system cannot say why it recommended something, cannot learn from feedback, cannot roll back, cannot respect consent, cannot enforce policy, and cannot be monitored, it is not production-grade no matter how good the model seems offline.


52. Conclusion

The capstone architecture review ties the whole series together into a practical review framework.

Key principles:

  1. Architecture review evaluates operability, not just design elegance.
  2. RecSys is a closed-loop decision platform.
  3. Contracts, events, versioning, and decision logs are foundational.
  4. Candidate/ranking/reranking must be separated but integrated.
  5. Privacy, safety, security, and tenant isolation are architecture concerns.
  6. Offline evaluation and online experiments serve different roles.
  7. Observability/debug/replay determine incident velocity.
  8. Cost/capacity/fault tolerance must be designed early.
  9. Governance and ownership are part of system architecture.
  10. Every recommendation should be traceable from request to feedback and back to learning.

In Part 080, we will close the series with Production Readiness Checklist and Series Close: a complete checklist for assessing readiness before launch, scale, and long-term operation.
