title: Build From Scratch Recommendations System - Part 044
description: Designing diversity, novelty, and serendipity in a production-grade recommendation system: diversity taxonomy, intra-list similarity, long-tail exposure, user novelty, calibrated serendipity, metrics, trade-offs, reranking, and guardrails.
series: learn-build-from-scratch-recommendations-system
seriesTitle: Build From Scratch: Enterprise Recommendations System
order: 44
partTitle: Diversity, Novelty, and Serendipity
tags:
- recommendation-system
- recsys
- diversity
- novelty
- serendipity
- reranking
- series
date: 2026-07-02
Part 044 — Diversity, Novelty, and Serendipity
A recommendation system that only chases the highest relevance score often becomes narrow.
A user clicks a camera, and every recommendation becomes a camera.
A user watches one Kafka video, and the whole slate fills with Kafka.
A user buys one baby product as a gift, and their profile changes permanently.
Top creators dominate the feed.
Popular items get more popular.
New products never get a chance.
Enterprise users see the same policy article again and again, even though the context has changed.
Diversity, novelty, and serendipity are mechanisms for keeping a recommendation system healthy, exploratory, and not boring.
But bad variation also damages relevance. If diversity makes users see random items, the system loses trust.
This part covers how to design diversity, novelty, and serendipity in a production-grade way: definitions, metrics, reranking, trade-offs, segmenting, long-tail, user controls, exploration, and failure modes.
1. Mental Model: Relevance Is Necessary but Not Sufficient
Relevance answers:
Does this item fit?
Diversity answers:
Is this slate too uniform?
Novelty answers:
Is this item new or less familiar to the user?
Serendipity answers:
Is this item unexpected but still useful?
A good recommendation is not:
most similar items only
It is:
relevant enough + diverse enough + fresh enough + useful enough
2. Definitions
Diversity
Variation within a slate or across exposure.
Examples:
different categories
different creators
different topics
different price ranges
different item types
different sources
Novelty
Unfamiliarity.
Examples:
item user has not seen
new creator
long-tail item
new topic adjacent to interest
new product in category
Serendipity
Unexpected usefulness.
Example:
user likes Java backend content; system recommends a distributed tracing course
It is not random. It is adjacent surprise.
3. Why Diversity Matters
Diversity helps:
- reduce boredom,
- prevent filter bubbles,
- improve discovery,
- support long-tail ecosystem,
- reduce overdependence on popularity,
- learn more about user,
- improve long-term retention,
- avoid repeated exposure,
- support marketplace fairness.
But too much diversity can reduce short-term CTR.
Diversity is a trade-off, not a universal good.
4. Why Novelty Matters
Novelty helps users discover new value.
Types:
new to user
new to platform
long-tail
new creator/seller
new topic
new format
fresh content
Novelty can improve:
- exploration,
- long-term engagement,
- marketplace growth,
- cold-start item learning,
- user satisfaction.
But novelty without relevance feels random.
5. Why Serendipity Matters
Serendipity is discovery beyond obvious similarity.
Obvious recommendation:
camera -> another camera
Serendipitous recommendation:
camera -> travel photography guide
camera -> lightweight tripod
camera -> photo editing course
Still relevant, but not identical.
Serendipity expands user intent intelligently.
6. Diversity Dimensions
Diversity can be measured along different dimensions.
category
topic
creator
seller
brand
price
format
modality
source
popularity bucket
freshness
geography
difficulty
workflow action type
document owner
Choose dimensions based on product.
For e-commerce PDP, diversity between substitutes/complements matters.
For content feed, topic/creator diversity matters.
For enterprise, action/document type diversity may matter.
7. Intra-List Similarity
Common diversity metric:
average pairwise similarity among items in slate
If the items are very similar, diversity is low.
ILS = average(sim(item_i, item_j)) over all pairs i < j
diversity = 1 - ILS
Similarity can be based on:
- category,
- embedding,
- creator,
- topic,
- graph community.
Embedding similarity gives semantic diversity. Category gives controllable diversity.
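The ILS formula above can be sketched as follows. This is a minimal illustration, assuming the similarity function returns values in [0, 1]; the class name `IntraListSimilarity` and the category-match similarity used in `main` are hypothetical choices, not part of any library.

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

final class IntraListSimilarity {
    /** Average similarity over all unordered pairs (i < j) in the slate. */
    static <T> double ils(List<T> slate, ToDoubleBiFunction<T, T> sim) {
        int n = slate.size();
        if (n < 2) {
            return 0.0; // fewer than two items means there are no pairs
        }
        double sum = 0.0;
        int pairs = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                sum += sim.applyAsDouble(slate.get(i), slate.get(j));
                pairs++;
            }
        }
        return sum / pairs;
    }

    /** diversity = 1 - ILS, as in the text. */
    static <T> double diversity(List<T> slate, ToDoubleBiFunction<T, T> sim) {
        return 1.0 - ils(slate, sim);
    }

    public static void main(String[] args) {
        // Illustrative similarity: 1.0 for the same category label, 0.0 otherwise.
        List<String> slate = List.of("camera", "camera", "tripod", "bag");
        System.out.println(diversity(slate, (a, b) -> a.equals(b) ? 1.0 : 0.0));
    }
}
```

With four items there are six pairs, and only the camera/camera pair is similar, so ILS is 1/6 and diversity is 5/6. Swapping in an embedding-based similarity changes only the lambda passed to `diversity`.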
8. Category Entropy
Measure category spread.
entropy = -sum(p(category) * log(p(category)))
If all items share the same category, entropy is low.
If items are spread across categories, entropy is high.
But category entropy should not be maximized blindly.
If user explicitly searches “laptop”, too much category diversity is bad.
Context matters.
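As a small sketch of the entropy formula above, the following computes Shannon entropy (natural log) over the category labels of one slate. The class name `CategoryEntropy` is hypothetical, and representing a slate as a list of category strings is an assumption made for brevity.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CategoryEntropy {
    /** entropy = -sum(p(category) * log(p(category))) over categories present in the slate. */
    static double entropy(List<String> categories) {
        if (categories.isEmpty()) {
            return 0.0;
        }
        Map<String, Integer> counts = new HashMap<>();
        for (String c : categories) {
            counts.merge(c, 1, Integer::sum);
        }
        double h = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / categories.size();
            h -= p * Math.log(p);
        }
        return h;
    }

    public static void main(String[] args) {
        // One category only: entropy is 0.
        System.out.println(entropy(List.of("laptop", "laptop", "laptop", "laptop")));
        // Four distinct categories: entropy is ln(4), about 1.386.
        System.out.println(entropy(List.of("laptop", "mouse", "bag", "monitor")));
    }
}
```

The maximum for a slate with k distinct categories is ln(k), so dividing by ln(k) gives a value in [0, 1] that is easier to compare across slate sizes.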
9. Coverage
Coverage measures how much catalog/ecosystem receives exposure.
Types:
catalog coverage
creator coverage
seller coverage
category coverage
long-tail coverage
new-item coverage
Example:
unique recommended items / eligible items
High coverage indicates the system is not recommending only the same few items.
Coverage must be quality-aware. Exposing low-quality items just for coverage is bad.
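The ratio above (unique recommended items / eligible items) can be sketched like this. The class name `CatalogCoverage` and the use of string item IDs are assumptions for illustration; the eligible set is where a quality gate would be applied before measuring.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CatalogCoverage {
    /** Fraction of eligible items that appear in at least one served slate. */
    static double coverage(List<List<String>> slates, Set<String> eligible) {
        if (eligible.isEmpty()) {
            return 0.0;
        }
        Set<String> exposed = new HashSet<>();
        for (List<String> slate : slates) {
            for (String itemId : slate) {
                if (eligible.contains(itemId)) {
                    exposed.add(itemId); // count each eligible item once
                }
            }
        }
        return (double) exposed.size() / eligible.size();
    }

    public static void main(String[] args) {
        List<List<String>> slates = List.of(List.of("a", "b"), List.of("a", "c"));
        System.out.println(coverage(slates, Set.of("a", "b", "c", "d"))); // 3 of 4 eligible items
    }
}
```

The same function works for creator or seller coverage if the slates are mapped to creator or seller IDs first.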
10. Long-Tail Exposure
Popularity bias concentrates exposure.
Long-tail metrics:
exposure share by popularity bucket
click/conversion by popularity bucket
new item exposure
creator exposure distribution
Gini coefficient of exposure
The goal is not equal exposure for all items. The goal is a healthy exposure opportunity, conditioned on quality and relevance.
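For the Gini coefficient of exposure listed above, a minimal sketch follows. It assumes non-negative per-item (or per-creator) exposure counts and uses the standard sorted-values form of the Gini formula; the class name `ExposureGini` is hypothetical.

```java
import java.util.Arrays;

final class ExposureGini {
    /**
     * Gini coefficient of an exposure distribution.
     * 0 means perfectly equal exposure; values near 1 mean exposure is concentrated on few items.
     */
    static double gini(double[] exposures) {
        int n = exposures.length;
        if (n == 0) {
            return 0.0;
        }
        double[] sorted = exposures.clone();
        Arrays.sort(sorted); // ascending
        double total = 0.0;
        double weighted = 0.0;
        for (int i = 0; i < n; i++) {
            total += sorted[i];
            weighted += (i + 1) * sorted[i]; // rank-weighted sum, ranks start at 1
        }
        if (total == 0.0) {
            return 0.0; // nothing was exposed, so there is no concentration to measure
        }
        return (2.0 * weighted) / (n * total) - (n + 1.0) / n;
    }

    public static void main(String[] args) {
        System.out.println(gini(new double[] {1, 1, 1, 1})); // equal exposure
        System.out.println(gini(new double[] {0, 0, 0, 4})); // one item takes everything
    }
}
```

For n items the maximum is (n - 1) / n, so the second call yields 0.75 with four items. A lower Gini is not automatically better; it should be read next to click and conversion by popularity bucket.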
11. Personal Novelty
Personal novelty:
candidate not previously seen/consumed by user
Features:
user_has_seen_item
time_since_last_impression
user_has_seen_creator
user_has_seen_category
topic_novelty_to_user
Novelty can be per item, creator, topic, format.
A new item in familiar category may be moderately novel.
A new category far from preference may be too novel.
12. Distance from User Profile
Novelty can be measured by distance.
profile_similarity = sim(user_profile, item)
novelty = 1 - profile_similarity
But very low similarity means the item is irrelevant.
Serendipity lies in the middle:
not too similar, not too far
Think:
adjacent possible
13. Serendipity as Relevant Surprise
Serendipity requires both:
unexpectedness + usefulness
A random irrelevant item is surprising but not serendipitous.
Approximate:
serendipity_score =
relevance_score * unexpectedness_score
where unexpectedness is bounded.
Example:
serendipity = relevance * min(max_unexpectedness, profile_distance)
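Guardrails are needed: cap unexpectedness, as above, and apply a relevance floor so a surprising but irrelevant item cannot win on surprise alone.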
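A hedged sketch of the bounded serendipity score above. It assumes relevance and profile similarity are both in [0, 1]; the relevance floor parameter is an added guardrail chosen for illustration, and the class name `SerendipityScore` is hypothetical.

```java
final class SerendipityScore {
    /**
     * serendipity = relevance * min(maxUnexpectedness, profileDistance),
     * where profileDistance = 1 - profileSimilarity.
     * Items below the relevance floor score 0 (assumed guardrail, not from a library).
     */
    static double score(double relevance, double profileSimilarity,
                        double maxUnexpectedness, double relevanceFloor) {
        if (relevance < relevanceFloor) {
            return 0.0;
        }
        double profileDistance = Math.max(0.0, 1.0 - profileSimilarity);
        double unexpectedness = Math.min(maxUnexpectedness, profileDistance);
        return relevance * unexpectedness;
    }

    public static void main(String[] args) {
        // Adjacent item: relevant and moderately far from the profile.
        System.out.println(score(0.8, 0.5, 0.6, 0.3));
        // Far item: the distance of 0.9 is capped at 0.6, so distance alone cannot inflate the score.
        System.out.println(score(0.8, 0.1, 0.6, 0.3));
        // Random item: surprising but below the relevance floor.
        System.out.println(score(0.1, 0.1, 0.6, 0.3));
    }
}
```

The cap means that beyond a certain distance, only relevance can raise the score, which keeps the "adjacent possible" ahead of arbitrary far-away items.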
14. Diversity-Relevance Trade-Off
Diversity often trades off with immediate relevance.
If the diversity penalty is too strong:
- CTR drops,
- user sees irrelevant items,
- trust declines.
If it is too weak:
- repetitive slate,
- filter bubble,
- long-tail starvation.
Tune by:
- offline simulation,
- A/B tests,
- segment analysis,
- long-term metrics,
- user feedback.
15. Context-Dependent Diversity
The need for diversity varies by surface.
Search
If the query is specific, relevance dominates.
Home Feed
Diversity is important.
Product Detail
Mix alternatives and complements.
Cart
Diversity is less important than compatibility.
Email Digest
Avoid repetition; each item needs high confidence.
Enterprise Case
Diversity means covering relevant action/document types, not random topics.
Slate policy should be surface-specific.
16. User-Dependent Diversity
Users differ.
Some prefer focused recommendations.
Some like discovery.
Signals:
user exploration affinity
clicks on novel items
hide rate for diverse items
category breadth
session behavior
explicit controls
Diversity policy can be personalized.
Example:
if user often engages with new categories:
    increase novelty
else:
    keep closer to profile
Be careful not to reinforce narrowness too much: a user who is never shown novel items cannot signal interest in them.
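One possible way to turn the rule above into a number is to map a per-user exploration affinity onto the MMR lambda. This is only a sketch: the affinity signal is assumed to be precomputed in [0, 1] from the signals listed earlier, the linear mapping is an arbitrary choice, and `PersonalizedDiversity` is a hypothetical name.

```java
final class PersonalizedDiversity {
    /**
     * Maps exploration affinity in [0, 1] to a lambda between minLambda and maxLambda.
     * High affinity gives a lower lambda (more diversity); low affinity stays closer to relevance.
     * Keeping maxLambda below 1.0 leaves some diversity even for very focused users.
     */
    static double lambdaFor(double explorationAffinity, double minLambda, double maxLambda) {
        double a = Math.max(0.0, Math.min(1.0, explorationAffinity)); // clamp noisy inputs
        return maxLambda - a * (maxLambda - minLambda);
    }

    public static void main(String[] args) {
        System.out.println(lambdaFor(0.0, 0.6, 0.9)); // focused user
        System.out.println(lambdaFor(0.5, 0.6, 0.9)); // mixed behavior
        System.out.println(lambdaFor(1.0, 0.6, 0.9)); // explorer
    }
}
```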
17. Diversity in Candidate Generation vs Reranking
Diversity can be introduced at:
Candidate Generation
Include multiple sources and categories.
Ranking
Model learns novelty/diversity features.
Reranking
Enforce final slate constraints.
Reranking is usually best for slate-level diversity.
But candidate generation must provide diverse candidates first. A reranker cannot select diversity that is not in the pool.
18. Maximal Marginal Relevance
MMR is a classic reranking idea.
At each step, select the item that best balances relevance against similarity to the items already in the slate.
score(item) =
    lambda * relevance(item)
    - (1 - lambda) * max_similarity(item, selected_items)
If lambda high: relevance dominates.
If lambda low: diversity dominates.
MMR is simple and effective.
19. MMR Example
Candidates:
A score 0.95 camera
B score 0.93 camera
C score 0.88 tripod
D score 0.85 travel bag
Top-K by score:
A, B, C
MMR may choose:
A, C, D
if B is too similar to A.
This creates a more varied slate.
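The example above can be checked with a small runnable sketch. The similarity values are assumptions made for illustration (0.9 between items of the same type, 0.3 across types), as is lambda = 0.7; `MmrWorkedExample` and its nested `Candidate` class are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class MmrWorkedExample {
    static final class Candidate {
        final String id;
        final String type;
        final double score;

        Candidate(String id, String type, double score) {
            this.id = id;
            this.type = type;
            this.score = score;
        }
    }

    // Assumed similarity: near-duplicates within a type, weak relation across types.
    static double sim(Candidate a, Candidate b) {
        return a.type.equals(b.type) ? 0.9 : 0.3;
    }

    /** Greedy MMR over the four candidates from the text; returns the selected IDs in order. */
    static List<String> demo(double lambda) {
        List<Candidate> remaining = new ArrayList<>(List.of(
                new Candidate("A", "camera", 0.95),
                new Candidate("B", "camera", 0.93),
                new Candidate("C", "tripod", 0.88),
                new Candidate("D", "travel bag", 0.85)));
        List<Candidate> selected = new ArrayList<>();
        while (selected.size() < 3 && !remaining.isEmpty()) {
            Candidate best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (Candidate c : remaining) {
                double maxSim = 0.0; // stays 0 while the slate is empty
                for (Candidate s : selected) {
                    maxSim = Math.max(maxSim, sim(c, s));
                }
                double mmr = lambda * c.score - (1.0 - lambda) * maxSim;
                if (mmr > bestScore) {
                    bestScore = mmr;
                    best = c;
                }
            }
            selected.add(best);
            remaining.remove(best);
        }
        List<String> ids = new ArrayList<>();
        for (Candidate c : selected) {
            ids.add(c.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(demo(1.0)); // prints [A, B, C]: pure relevance, same as top-K
        System.out.println(demo(0.7)); // prints [A, C, D]: B is penalized for being close to A
    }
}
```

With lambda = 0.7, after A is picked the second-round scores are roughly B = 0.651 - 0.27 = 0.381, C = 0.616 - 0.09 = 0.526, and D = 0.595 - 0.09 = 0.505, so C wins and D follows.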
20. MMR Pseudocode
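// Greedy MMR: pick one item per iteration until the slate is full.
while (slate.size() < finalSize && !remaining.isEmpty()) {
    best = null;
    bestScore = -infinity;
    for (candidate in remaining) {
        relevance = candidate.rankScore();
        similarityToSlate = maxSimilarity(candidate, slate);  // 0 when the slate is empty
        mmrScore = lambda * relevance - (1 - lambda) * similarityToSlate;
        if (mmrScore > bestScore && constraintsPass(candidate)) {
            best = candidate;
            bestScore = mmrScore;
        }
    }
    if (best == null) break;  // no remaining candidate passes the constraints
    slate.add(best);
    remaining.remove(best);   // otherwise the same item would be picked again
}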
Similarity function is key.
21. Similarity Function Choices
Similarity can use:
same category
same creator
same brand
embedding cosine
topic overlap
graph community
same source
same dedup group
price bucket similarity
workflow action type
Use multiple similarity dimensions.
Example:
sim = 0.4*embedding_sim
+ 0.3*same_category
+ 0.2*same_creator
+ 0.1*same_source
Weights should be surface-specific.
22. Determinantal Point Processes
DPP is another diversity method.
Concept:
select subset with high quality and dissimilarity
DPP can produce diverse sets elegantly, but can be complex/expensive.
Use cases:
- small slate,
- high value diversity,
- manageable candidate pool.
Most production systems start with greedy/MMR/constraints.
23. Diversity Constraints
Hard-ish constraints:
max_same_creator
max_same_category
max_same_brand
max_same_source
max_same_topic
These are simpler than similarity scoring.
Example:
max_same_creator: 2
max_same_category: 5
max_consecutive_same_category: 2
Constraints are easy to explain and debug.
But too rigid constraints can hurt relevance.
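A minimal sketch of cap-based selection, assuming candidates are simply walked in descending score order and skipped when a cap would be exceeded. `CappedSelector` and its `Item` class are hypothetical names; only the creator and category caps from the example are modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CappedSelector {
    static final class Item {
        final String id;
        final String creator;
        final String category;
        final double score;

        Item(String id, String creator, String category, double score) {
            this.id = id;
            this.creator = creator;
            this.category = category;
            this.score = score;
        }
    }

    /** Returns up to k item IDs; the slate may be shorter than k if the caps exhaust the pool. */
    static List<String> select(List<Item> candidates, int k,
                               int maxSameCreator, int maxSameCategory) {
        List<Item> sorted = new ArrayList<>(candidates);
        sorted.sort((a, b) -> Double.compare(b.score, a.score)); // highest score first
        Map<String, Integer> creatorCount = new HashMap<>();
        Map<String, Integer> categoryCount = new HashMap<>();
        List<String> slate = new ArrayList<>();
        for (Item item : sorted) {
            if (slate.size() == k) {
                break;
            }
            if (creatorCount.getOrDefault(item.creator, 0) >= maxSameCreator) {
                continue; // creator cap reached
            }
            if (categoryCount.getOrDefault(item.category, 0) >= maxSameCategory) {
                continue; // category cap reached
            }
            slate.add(item.id);
            creatorCount.merge(item.creator, 1, Integer::sum);
            categoryCount.merge(item.category, 1, Integer::sum);
        }
        return slate;
    }
}
```

Because the skip is unconditional, a third item from the same creator is dropped no matter how large its score lead is. That rigidity is what makes caps easy to debug and also what can hurt relevance.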
24. Soft Diversity Penalties
Penalty-based diversity:
adjusted_score = base_score - penalty_for_repetition
Example:
penalty = category_count_in_slate * 0.03
This lets a highly relevant item still win if the score gap is large.
Soft penalties are flexible and usually safer than hard max for non-safety objectives.
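The repetition penalty above can be sketched as a greedy loop in which each candidate's score is reduced by a fixed amount per item of the same category already in the slate. The names `SoftPenaltyReranker` and `Item` are hypothetical, and the per-repeat penalty is a parameter rather than a recommended value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SoftPenaltyReranker {
    static final class Item {
        final String id;
        final String category;
        final double score;

        Item(String id, String category, double score) {
            this.id = id;
            this.category = category;
            this.score = score;
        }
    }

    /** adjusted_score = base_score - penaltyPerRepeat * (same-category items already selected). */
    static List<String> rerank(List<Item> candidates, int k, double penaltyPerRepeat) {
        List<Item> remaining = new ArrayList<>(candidates);
        Map<String, Integer> categoryCount = new HashMap<>();
        List<String> slate = new ArrayList<>();
        while (slate.size() < k && !remaining.isEmpty()) {
            Item best = null;
            double bestAdjusted = Double.NEGATIVE_INFINITY;
            for (Item c : remaining) {
                double adjusted = c.score - penaltyPerRepeat * categoryCount.getOrDefault(c.category, 0);
                if (adjusted > bestAdjusted) {
                    bestAdjusted = adjusted;
                    best = c;
                }
            }
            if (best == null) {
                break; // only possible if every score is NaN or negative infinity
            }
            slate.add(best.id);
            remaining.remove(best);
            categoryCount.merge(best.category, 1, Integer::sum);
        }
        return slate;
    }
}
```

Using the candidates from the MMR example (A 0.95 camera, B 0.93 camera, C 0.88 tripod, D 0.85 travel bag), a penalty of 0.03 leaves B at 0.90, still ahead of C, so the order stays A, B, C. A penalty of 0.06 drops B to 0.87, so C moves ahead and the order becomes A, C, B.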
25. Novelty Features
Features:
not_seen_item
time_since_last_seen_item
not_seen_creator_30d
not_seen_topic_30d
item_popularity_bucket
item_age
is_new_item
long_tail_score
profile_distance
source_exploration
Use these in ranking/reranking.
Novelty is user-relative and catalog-relative.
26. Novelty Constraints
Examples:
min_new_items_if_available: 1
max_repeated_items_seen_7d: 3
min_long_tail_slots_if_quality: 1
Use relevance and quality floors.
Do not recommend bad long-tail items just to satisfy novelty.
27. Serendipity Candidate Sources
Serendipity often needs special sources:
adjacent category graph
users similar but slightly broader
topic expansion
content graph neighbors
expert/editorial bridge
bundle/complement graph
knowledge graph path
Example:
user likes Kafka
adjacent topics: stream processing, event sourcing, schema registry
Candidate generation should produce adjacent candidates; reranker controls exposure.
28. Adjacent Expansion
Graph/taxonomy can define adjacency.
camera -> travel photography -> lightweight backpack
Java Kafka -> event-driven architecture -> observability
AML case -> suspicious transfer -> beneficial ownership article
Expansion should be:
near enough to be useful
far enough to be novel
Avoid large semantic jumps.
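One way to keep expansion "near enough" is a breadth-first walk with a hop limit over an adjacency graph. The sketch below assumes the graph is a plain map from topic to neighbor topics; `AdjacentExpansion` is a hypothetical name, and the hop distance it returns could feed an unexpectedness feature.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AdjacentExpansion {
    /**
     * Returns every topic reachable from the seed within maxHops, mapped to its hop distance.
     * The seed itself is excluded because it is not novel.
     */
    static Map<String, Integer> expand(Map<String, List<String>> graph, String seed, int maxHops) {
        Map<String, Integer> distance = new LinkedHashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        distance.put(seed, 0);
        queue.add(seed);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            int d = distance.get(current);
            if (d == maxHops) {
                continue; // do not expand beyond the hop limit
            }
            for (String next : graph.getOrDefault(current, List.of())) {
                if (!distance.containsKey(next)) {
                    distance.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        distance.remove(seed);
        return distance;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
                "camera", List.of("travel photography"),
                "travel photography", List.of("lightweight backpack"),
                "lightweight backpack", List.of("hiking boots"));
        // With maxHops = 2, "hiking boots" (3 hops away) is left out.
        System.out.println(expand(graph, "camera", 2));
    }
}
```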
29. Exploration vs Serendipity
Exploration:
system tests uncertain candidates to learn
Serendipity:
user receives unexpectedly useful candidate
They overlap but differ.
Exploration can create serendipity, but can also be random.
Serendipity should still be relevance-aware.
30. Diversity and Cold-Start
Diversity helps new items get exposure.
But cold-start needs:
- quality gate,
- content relevance,
- exploration cap,
- feedback monitoring.
Reranking can reserve small share for new/long-tail items.
new_item_novelty:
  max_slots: 2
  min_quality: 0.8
  min_relevance: 0.01
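A sketch of how a reranker might apply such a policy: filter new items by the quality and relevance floors, then hand at most `maxSlots` of them to the slate. The class and field names are hypothetical, and ordering eligible items by relevance is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class NewItemSlots {
    static final class Candidate {
        final String id;
        final boolean isNew;
        final double quality;
        final double relevance;

        Candidate(String id, boolean isNew, double quality, double relevance) {
            this.id = id;
            this.isNew = isNew;
            this.quality = quality;
            this.relevance = relevance;
        }
    }

    /** Picks up to maxSlots new items that pass both floors, most relevant first. */
    static List<String> pick(List<Candidate> candidates, int maxSlots,
                             double minQuality, double minRelevance) {
        List<Candidate> eligible = new ArrayList<>();
        for (Candidate c : candidates) {
            if (c.isNew && c.quality >= minQuality && c.relevance >= minRelevance) {
                eligible.add(c);
            }
        }
        eligible.sort((a, b) -> Double.compare(b.relevance, a.relevance));
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < Math.min(maxSlots, eligible.size()); i++) {
            ids.add(eligible.get(i).id);
        }
        return ids; // may be empty: no slot is filled if nothing passes the floors
    }
}
```

Returning fewer than `maxSlots` items is intentional: the cap is an upper bound on exposure, not a quota that must be met with weak candidates.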
31. Diversity and Fairness
Diversity can support fairness/ecosystem health.
Examples:
- creator exposure distribution,
- seller exposure opportunity,
- category representation,
- avoid rich-get-richer loops.
But fairness policies must be explicit and domain-governed.
Do not hide fairness constraints inside vague diversity.
Part 046 will go deeper into fairness/exposure/marketplace health.
32. Diversity and Personalization
Too much personalization can reduce diversity.
Too much diversity can reduce personalization.
Balance through:
- candidate source mix,
- reranking penalties,
- user novelty preference,
- session intent confidence,
- exploration quotas.
Example:
if current query is specific:
    reduce diversity
if home feed, idle browsing:
    increase diversity
33. Diversity and Negative Feedback
If user hides a topic/creator, diversity system should not reintroduce it just to diversify.
User controls override diversity.
Example:
user blocked creator X
Hard suppress creator X.
Do not add it as “novel”.
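A simple way to guarantee this is to apply user suppressions as a hard filter before any diversity or novelty logic runs. The sketch below assumes blocked creators and hidden topics are available as sets; `UserSuppressionFilter` and its `Candidate` class are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class UserSuppressionFilter {
    static final class Candidate {
        final String id;
        final String creatorId;
        final String topic;

        Candidate(String id, String creatorId, String topic) {
            this.id = id;
            this.creatorId = creatorId;
            this.topic = topic;
        }
    }

    /** Removes candidates the user has explicitly suppressed; run this before reranking. */
    static List<Candidate> apply(List<Candidate> candidates,
                                 Set<String> blockedCreators, Set<String> hiddenTopics) {
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : candidates) {
            if (blockedCreators.contains(c.creatorId) || hiddenTopics.contains(c.topic)) {
                continue; // hard suppress: no diversity or novelty boost can bring it back
            }
            kept.add(c);
        }
        return kept;
    }
}
```

Filtering the pool, rather than penalizing scores, means a later novelty boost has nothing to act on for suppressed items.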
34. Diversity in Enterprise
Enterprise diversity is not entertainment diversity.
Examples:
- include required action and supporting article,
- cover multiple risk indicators,
- avoid duplicate policy versions,
- include evidence checklist and escalation guidance,
- show documents from correct jurisdiction.
Diversity dimension:
action type
risk topic
document type
policy section
workflow stage
Goal: task coverage and decision support.
35. Measuring Serendipity
Serendipity is hard.
Approximate metrics:
novel item engaged
adjacent category engagement
unexpected-but-clicked rate
new topic follow-up
long-term retention after novel exposure
user saves/shares novel item
Survey/user feedback can help:
"Was this useful?"
"Show me more like this"
Avoid optimizing surprise alone.
36. User Feedback for Diversity
Controls:
show more like this
show less like this
not interested
hide creator
explore more
focus recommendations
These help personalize diversity.
If user consistently rejects novel items, reduce novelty.
If user engages with adjacent topics, expand.
37. Diversity A/B Test Metrics
Primary:
- CTR/CVR/watch/satisfaction,
- retention/session depth,
- negative feedback.
Diversity-specific:
category entropy
intra-list similarity
creator concentration
new item exposure
long-tail exposure
repeat exposure
serendipitous engagement
Guardrails:
- report/hide,
- conversion,
- latency,
- relevance proxy.
Test long enough to capture long-term value.
38. Offline Diversity Simulation
Given scored candidates, simulate reranking.
Compare policies:
topK baseline
MMR lambda 0.8
MMR lambda 0.6
category cap
creator cap
novelty boost
Measure:
- predicted utility loss,
- diversity gain,
- source/category mix,
- novelty exposure,
- constraint violations.
Pick candidates for online test.
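For the "predicted utility loss" measurement, one rough definition is the relative drop in summed ranker score between the top-K baseline slate and the policy slate. This definition is an assumption for illustration, as is the class name `OfflineSlateComparison`; a real simulation would likely weight scores by position.

```java
final class OfflineSlateComparison {
    private static double sum(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    /**
     * Relative utility loss of a policy slate versus the top-K baseline,
     * using the sum of ranker scores as a proxy for utility.
     */
    static double utilityLoss(double[] baselineScores, double[] policyScores) {
        double baseline = sum(baselineScores);
        if (baseline == 0.0) {
            return 0.0; // nothing to lose when the baseline has no predicted utility
        }
        return (baseline - sum(policyScores)) / baseline;
    }

    public static void main(String[] args) {
        double[] topK = {0.95, 0.93, 0.88}; // A, B, C from the MMR example
        double[] mmr = {0.95, 0.88, 0.85};  // A, C, D
        System.out.println(utilityLoss(topK, mmr)); // roughly 0.029, i.e. about 2.9% predicted loss
    }
}
```

Pairing this number with the diversity gain of the same policy gives a simple two-axis comparison for choosing which policies go to an online test.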
39. Tuning Lambda / Penalties
MMR lambda:
lambda=1.0 -> pure relevance
lambda=0.0 -> pure diversity
Typical starting:
0.7 to 0.9
But depends on score scale and similarity.
If scores are not calibrated, the interpretation of lambda is unstable.
Normalize scores or tune empirically.
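As one option for normalization, the sketch below applies per-request min-max scaling so relevance lands in [0, 1], the same range as a bounded similarity. This is an assumption about how to normalize, not the only choice; min-max is sensitive to outliers, and `ScoreNormalizer` is a hypothetical name.

```java
final class ScoreNormalizer {
    /** Scales scores to [0, 1] within one request. If all scores are equal, returns 0.5 for each. */
    static double[] minMax(double[] scores) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        double range = max - min;
        double[] normalized = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            // With no spread, relevance is tied, so any constant works; 0.5 is a neutral choice.
            normalized[i] = (range == 0.0) ? 0.5 : (scores[i] - min) / range;
        }
        return normalized;
    }

    public static void main(String[] args) {
        double[] out = minMax(new double[] {2.0, 4.0, 6.0});
        System.out.println(out[0] + " " + out[1] + " " + out[2]); // prints 0.0 0.5 1.0
    }
}
```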
40. Diversity by Position
Top positions can remain high relevance; lower positions can explore.
Example:
positions 1-3: high confidence
positions 4-10: moderate diversity
positions 11+: more exploration/novelty
This reduces risk.
Surface-specific:
- email first item should be very high confidence,
- feed can mix,
- checkout top items should be compatible.
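Position-dependent diversity can be sketched as a lambda schedule that the greedy reranker consults when filling each slot. The three bands follow the example above; the lambda values themselves are placeholders chosen for illustration, and `PositionLambda` is a hypothetical name.

```java
final class PositionLambda {
    /**
     * Returns the MMR lambda to use when filling a 1-based slate position.
     * Values are illustrative placeholders and would be tuned per surface.
     */
    static double lambdaAt(int position) {
        if (position < 1) {
            throw new IllegalArgumentException("position is 1-based");
        }
        if (position <= 3) {
            return 0.95; // top slots: high confidence, almost pure relevance
        }
        if (position <= 10) {
            return 0.80; // middle slots: moderate diversity
        }
        return 0.65;     // lower slots: more exploration and novelty
    }

    public static void main(String[] args) {
        for (int position : new int[] {1, 4, 11}) {
            System.out.println(position + " -> " + lambdaAt(position));
        }
    }
}
```

In a greedy loop, the position being filled is `selected.size() + 1`, so the schedule can replace a fixed lambda without other changes.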
41. Diversity Without Candidate Pool
A reranker cannot diversify if all candidates are the same.
Candidate generation must produce:
- multiple sources,
- content/graph/long-tail candidates,
- exploration candidates,
- adjacent-topic candidates.
If diversity is low in the candidate pool, fix retrieval too.
Metric:
candidate_pool_diversity
final_slate_diversity
Track both.
42. Failure Modes
42.1 Randomness Disguised as Serendipity
Unexpected but useless.
42.2 Diversity Too Strong
Relevance drops.
42.3 Diversity Too Weak
Monotony/filter bubble.
42.4 Popularity Bias Persists
Long-tail gets no exposure.
42.5 Bad Similarity Function
Items are scored as diverse, but the user sees them as the same.
42.6 User Controls Ignored
Hidden topics reintroduced.
42.7 Offline Diversity Improves, Online Trust Drops
Bad trade-off.
42.8 Candidate Pool Too Narrow
Reranker cannot help.
42.9 Novelty Overexposes Low-Quality New Items
Trust/safety issue.
42.10 Diversity Metric Optimized Blindly
Metric improves but product worsens.
43. Implementation Sketch: MMR Reranker
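import java.util.ArrayList;
import java.util.List;

// Greedy MMR reranker. ScoredCandidate and SimilarityService are assumed to be defined elsewhere.
public final class MmrReranker {
    private final SimilarityService similarityService;
    private final double lambda;

    public MmrReranker(SimilarityService similarityService, double lambda) {
        if (lambda < 0.0 || lambda > 1.0) {
            throw new IllegalArgumentException("lambda must be in [0, 1]");
        }
        this.similarityService = similarityService;
        this.lambda = lambda;
    }

    public List<ScoredCandidate> rerank(List<ScoredCandidate> candidates, int k) {
        List<ScoredCandidate> selected = new ArrayList<>();
        List<ScoredCandidate> remaining = new ArrayList<>(candidates);
        while (selected.size() < k && !remaining.isEmpty()) {
            ScoredCandidate best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (ScoredCandidate c : remaining) {
                // Penalize by similarity to the closest already-selected item (0 for the first pick).
                double maxSimilarity = selected.stream()
                        .mapToDouble(s -> similarityService.similarity(c, s))
                        .max()
                        .orElse(0.0);
                double mmr = lambda * c.rankScore() - (1.0 - lambda) * maxSimilarity;
                if (mmr > bestScore) {
                    bestScore = mmr;
                    best = c;
                }
            }
            if (best == null) {
                break; // every remaining score was NaN or -Infinity; stop instead of adding null
            }
            selected.add(best);
            remaining.remove(best);
        }
        return selected;
    }
}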
Production version adds hard constraints, diagnostics, and score component logging.
44. Implementation Sketch: Diversity Similarity
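public final class DiversitySimilarityService {
    // Weighted blend of categorical matches and embedding similarity; weights sum to 1.0.
    public double similarity(ItemFeatures a, ItemFeatures b) {
        double sim = 0.0;
        if (a.categoryId().equals(b.categoryId())) {
            sim += 0.35;
        }
        if (a.creatorId().equals(b.creatorId())) {
            sim += 0.25;
        }
        // Cosine can be negative; clamp at 0 so dissimilar embeddings do not cancel the other terms.
        sim += 0.30 * Math.max(0.0, cosine(a.contentEmbedding(), b.contentEmbedding()));
        if (a.sourcePrimary().equals(b.sourcePrimary())) {
            sim += 0.10;
        }
        return Math.min(sim, 1.0);
    }

    // Assumes contentEmbedding() returns equal-length double[] vectors.
    private static double cosine(double[] x, double[] y) {
        double dot = 0.0, nx = 0.0, ny = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            nx += x[i] * x[i];
            ny += y[i] * y[i];
        }
        return (nx == 0.0 || ny == 0.0) ? 0.0 : dot / Math.sqrt(nx * ny);
    }
}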
Weights must be tuned by domain/surface.
45. Minimal Production Diversity Plan
Start with:
candidate_generation:
  require_multi_source_pool: true
reranking:
  algorithm: greedy_mmr
  lambda: 0.85
  constraints:
    max_same_creator: 2
    max_same_category: 6
    exact_dedup: true
novelty:
  seen_recently_penalty: true
  max_seen_count_7d: 5
exploration:
  max_new_item_slots: 1
  quality_floor: 0.8
metrics:
  - intra_list_similarity
  - category_entropy
  - creator_concentration
  - new_item_exposure
  - repeat_exposure
  - CTR/CVR/hide/report
Then personalize diversity and add serendipity sources.
46. Diversity, Novelty, and Serendipity Readiness Checklist
[ ] Diversity dimensions are defined per surface.
[ ] Candidate pool diversity is measured.
[ ] Final slate diversity is measured.
[ ] Dedup and repetition controls exist.
[ ] Similarity function is domain-appropriate.
[ ] MMR or diversity-aware greedy reranker exists.
[ ] Diversity penalties/constraints are bounded.
[ ] Relevance floor exists for novelty/exploration.
[ ] User controls override diversity.
[ ] New/long-tail exposure is monitored.
[ ] Diversity metrics are evaluated with product metrics.
[ ] Offline simulation exists.
[ ] A/B testing includes guardrails.
[ ] Enterprise diversity means task coverage, not randomness.
[ ] Serendipity is measured as useful novelty, not surprise alone.
47. Conclusion
Diversity, novelty, and serendipity keep a recommendation system from becoming too narrow, monotonous, and biased toward popularity.
Key principles:
- Relevance is necessary but not sufficient.
- Diversity is slate-level variation.
- Novelty is user/catalog unfamiliarity.
- Serendipity is unexpected usefulness, not randomness.
- Diversity dimensions must be domain/surface-specific.
- MMR and greedy penalties are strong practical baselines.
- Candidate pool must contain diverse options before reranking can help.
- Novelty/exploration need relevance and quality floors.
- User controls override diversity goals.
- Evaluate diversity with product and guardrail metrics, not alone.
In Part 045, we will cover Frequency Capping, Fatigue, and Repetition Control: how to manage exposure over time so users do not keep seeing the same item/creator/topic.