Learn Build From Scratch Recommendations System Part 025 Multi Source Candidate Generation

title: Build From Scratch Recommendations System - Part 025
description: "Designing production-grade multi-source candidate generation: source portfolio, blending, quotas, dedup, source normalization, fallback, exploration, source contribution, candidate recall, and operability."
series: learn-build-from-scratch-recommendations-system
seriesTitle: "Build From Scratch: Enterprise Recommendations System"
order: 25
partTitle: Multi-Source Candidate Generation
tags:
  - recommendation-system
  - recsys
  - candidate-generation
  - retrieval
  - multi-source
  - system-design
  - series
date: 2026-07-02

Part 025 — Multi-Source Candidate Generation

Candidate generation is rarely sufficient when it relies on a single source.

Every source has blind spots.

Popularity is strong for cold-start users, but biased toward popular items.
Content-based is strong for cold-start items, but often returns items that are too similar.
Item-to-item is strong on the PDP, but needs a seed item.
Matrix factorization is strong for collaborative retrieval, but weak for new items.
Graph is strong for multi-hop relationships, but can be expensive and noisy.
Two-tower is strong for scalable personalized retrieval, but needs training data and embedding infrastructure.
Editorial is strong for safety and launches, but does not scale to every user.
Rules are strong for enterprise constraints, but do not capture hidden preferences.
Exploration matters for future data, but is risky if left uncontrolled.

That is why production recommendation systems usually use multi-source candidate generation.

This part covers how to design a portfolio of candidate sources that is balanced, robust, observable, and extensible.

1. Mental Model: Candidate Portfolio

Candidate generation is a portfolio, not a single model.

Each source contributes something different.

Goal candidate portfolio:

high recall
+ source diversity
+ cold-start coverage
+ domain constraint safety
+ low latency
+ observable contribution

2. Why One Source Is Not Enough

2.1 Popularity Only

Good:

stable,
fast,
works for anonymous users.

Bad:

not personalized,
popularity bias,
long-tail poor,
stale if not refreshed.

2.2 Content-Based Only

Good:

cold item,
explainable,
metadata-aware.

Bad:

over-specialized,
misses behavioral complement,
metadata quality dependent.

2.3 Collaborative Only

Good:

hidden preference,
behavior-driven.

Bad:

cold-start weak,
popularity bias,
sparse for low activity users/items.

2.4 Two-Tower Only

Good:

scalable,
personalized,
embedding retrieval.

Bad:

training/serving complexity,
data bias,
model drift,
hard to explain alone.

2.5 Rules Only

Good:

safe,
deterministic,
enterprise-friendly.

Bad:

limited personalization,
maintenance burden,
misses latent relation.

A robust system blends these.

3. Candidate Portfolio by Surface

Different surfaces need different portfolios.

3.1 Home Feed

Goal: discovery.

Sources:

two_tower personalized
matrix_factorization
content_based_session
trending_region
segment_popularity
editorial
fresh_item_exploration
graph_topic

3.2 Product Detail Page

Goal: related to seed item.

Sources:

content_similar
co_view_alternatives
co_buy_complements
compatible_accessories
same_category_popularity
recently_viewed_related

3.3 Cart

Goal: cross-sell and attach.

Sources:

frequently_bought_together
compatibility_graph
category_complements
low_return_addons
promotion_eligible_items

3.4 Search

Goal: explicit intent satisfaction.

Sources:

lexical_search
semantic_search
query_category_popularity
query_rewrite_retrieval
zero_result_relaxation

3.5 Enterprise Case Workflow

Goal: correct, authorized, useful next support.

Sources:

valid_actions_by_state_machine
policy_required_actions
knowledge_articles_by_case_topic
similar_cases_authorized
historical_success_actions
expert_curated_rules

Source portfolio must be surface-specific.

4. Candidate Policy as Control Plane

Define candidate policy in config.

surface: home_feed
policy_version: home-candidates-v6
sources:
  - name: two_tower
    enabled: true
    quota: 800
    timeout_ms: 40
    required: false
  - name: matrix_factorization
    enabled: true
    quota: 300
    timeout_ms: 20
    required: false
  - name: content_based_session
    enabled: true
    quota: 300
    timeout_ms: 30
    required: false
  - name: trending_region
    enabled: true
    quota: 150
    timeout_ms: 10
    required: false
  - name: editorial_safe
    enabled: true
    quota: 50
    timeout_ms: 5
    required: false
  - name: new_item_exploration
    enabled: true
    quota: 50
    timeout_ms: 10
    required: false
minimum_pool_size: 700
fallback_sources:
  - segment_popularity
  - global_popularity
merge:
  keep_multi_source_provenance: true
  exact_dedup: true
  dedup_group_policy: one_per_group_before_ranker

Candidate policy is product logic.

It should be versioned, reviewed, monitored, and experimentable.

5. Source Quota Strategy

Quota controls how many candidates each source can contribute.

Reasons:

prevent domination,
control latency,
ensure source diversity,
bound feature/ranking cost,
allow exploration.

Quota examples:

two_tower: 800
item_cf: 300
content_based: 300
trending: 100
editorial: 50
exploration: 50

Quota does not mean final slate quota. Ranker can choose fewer/more from each source after scoring.

But candidate pool should provide enough variety.

6. Static vs Dynamic Quotas

Static quota:

always two_tower 800, content 300

Simple.

Dynamic quota adapts by context:

if user is new:
  reduce two_tower, increase popularity/content/editorial

if seed item exists:
  increase item-to-item/content-similar

if user has strong session intent:
  increase session/content/query source

if feature store degraded:
  increase baseline fallback

if no consent:
  disable behavioral sources

Example:

quota_rules:
  - condition: user_history_count < 3
    overrides:
      two_tower: 200
      segment_popularity: 500
      editorial_safe: 100
      content_based_contextual: 300
  - condition: privacy_mode == non_personalized
    disable:
      - two_tower
      - matrix_factorization
      - user_graph

Dynamic quotas make candidate generation smarter but require careful observability.
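One way to read the quota rules above is as ordered overrides applied on top of static defaults. The following is a minimal sketch under that assumption; `RequestContext`, `QuotaRule`, and `DynamicQuotaResolver` are hypothetical names introduced for illustration, not types from this series.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical request context; the field names are illustrative only.
record RequestContext(int userHistoryCount, String privacyMode) {}

// One rule: when the condition matches, apply quota overrides and disable sources.
record QuotaRule(Predicate<RequestContext> condition,
                 Map<String, Integer> overrides,
                 Set<String> disabled) {}

final class DynamicQuotaResolver {
    private DynamicQuotaResolver() {}

    // Starts from the static quotas, applies matching rules in order,
    // then removes disabled sources last so a disable always wins.
    static Map<String, Integer> resolve(Map<String, Integer> staticQuotas,
                                        List<QuotaRule> rules,
                                        RequestContext ctx) {
        Map<String, Integer> quotas = new LinkedHashMap<>(staticQuotas);
        Set<String> disabled = new HashSet<>();
        for (QuotaRule rule : rules) {
            if (rule.condition().test(ctx)) {
                quotas.putAll(rule.overrides());
                disabled.addAll(rule.disabled());
            }
        }
        quotas.keySet().removeAll(disabled);
        return quotas;
    }
}
```

Applying disables after all overrides is a deliberate choice in this sketch: a later cold-start override cannot re-enable a source that a privacy rule turned off.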

7. Source Applicability Matrix

Document source applicability per surface/context.

Source	Home	PDP	Cart	Search	Enterprise Case
popularity	yes	fallback	fallback	yes	limited
trending	yes	limited	no	limited	no
content-based	yes	yes	limited	yes	yes
item-to-item	yes if history	yes	yes	limited	yes
MF	yes	maybe	no	no	limited
graph	yes	yes	yes	yes	yes
two-tower	yes	maybe	maybe	yes	possible
editorial	yes	yes	yes	yes	yes
valid actions	no	no	no	no	yes

Applicability should be encoded, not tribal knowledge.

8. Parallel Execution

Sources should run in parallel under timeout.

A slow source should not block the other sources unless it is required.

Use:

per-source timeouts,
cancellation,
partial results,
fallback if pool too small,
circuit breaker for unhealthy sources.
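The first three items in this list can be sketched with `CompletableFuture`. This is an illustration only: `SourceCall`, `SourceOutcome`, and `ParallelSourceRunner` are hypothetical names, and the status strings are assumptions rather than a contract defined in this series.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Outcome of one source call; status is "ok", "timeout", or "error".
record SourceOutcome(String source, String status, List<String> itemIds) {}

// Hypothetical source descriptor: a name, a per-source timeout, and a blocking fetch.
record SourceCall(String name, long timeoutMs, Supplier<List<String>> fetch) {}

final class ParallelSourceRunner {
    private ParallelSourceRunner() {}

    static List<SourceOutcome> runAll(List<SourceCall> calls, ExecutorService pool) {
        // Start every call first so the sources run concurrently.
        List<CompletableFuture<SourceOutcome>> futures = calls.stream()
            .map(call -> CompletableFuture
                .supplyAsync(() -> new SourceOutcome(call.name(), "ok", call.fetch().get()), pool)
                // A source failure becomes an "error" outcome instead of failing the request.
                .exceptionally(ex -> new SourceOutcome(call.name(), "error", List.of()))
                // If the source is still running at its deadline, report an empty "timeout".
                .completeOnTimeout(new SourceOutcome(call.name(), "timeout", List.of()),
                                   call.timeoutMs(), TimeUnit.MILLISECONDS))
            .toList();
        // Each join returns once its source finishes or its own timeout fires.
        return futures.stream().map(CompletableFuture::join).toList();
    }
}
```

Note that `completeOnTimeout` only completes the future; it does not interrupt the underlying task. Real cancellation needs cooperation from the source client, and a circuit breaker would sit in front of `fetch`.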

9. Required vs Optional Sources

Some sources are optional:

trending,
MF,
content-based,
editorial.

Some may be required:

enterprise valid action source,
policy eligibility source,
authorization-aware retrieval,
safety source.

If required source fails:

fail closed or return safe fallback

Example:

valid_action_source:
  required: true
  failure_mode: fail_closed

For consumer home feed:

two_tower:
  required: false
  failure_mode: fallback_to_popularity

Requiredness is domain/surface-specific.
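A small sketch of how an orchestrator might decide whether a set of source failures must abort the request. The `FailureMode` values mirror the config examples above; `SourcePolicy` and `RequiredSourceGuard` are hypothetical names used only for illustration.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Hypothetical failure modes mirroring the config values above.
enum FailureMode { FAIL_CLOSED, FALLBACK_TO_POPULARITY }

record SourcePolicy(boolean required, FailureMode failureMode) {}

final class RequiredSourceGuard {
    private RequiredSourceGuard() {}

    // Returns the failed sources that must abort the request, sorted for stable logging.
    static List<String> blockingFailures(Map<String, SourcePolicy> policies,
                                         Collection<String> failedSources) {
        return failedSources.stream()
            .filter(name -> {
                SourcePolicy policy = policies.get(name);
                return policy != null
                    && policy.required()
                    && policy.failureMode() == FailureMode.FAIL_CLOSED;
            })
            .sorted()
            .toList();
    }
}
```

If the returned list is non-empty, the request fails closed; otherwise the orchestrator continues with partial results and fallback.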

10. Candidate Merge

When multiple sources return the same item, merge them into one candidate and keep every source entry.

Before merge:

Candidate(source=two_tower,item=X,score=0.8)
Candidate(source=trending,item=X,score=0.5)

After merge:

{
  "item_id": "X",
  "sources": [
    {"source": "two_tower", "score": 0.8, "rank": 12},
    {"source": "trending", "score": 0.5, "rank": 4}
  ],
  "source_count": 2
}

Multi-source presence is a valuable signal.

A candidate that appears in several independent sources often deserves extra attention.

But do not simply add raw scores with incompatible semantics.

11. Source Score Normalization

Scores have different units.

Possible normalization:

Rank-Based

normalized_score = 1 / log(1 + source_rank)

Simple and robust.

Percentile Within Source

score_percentile = percentile(source_score among returned candidates)

Z-score Within Source

(score - mean) / std

Works if distribution stable.

Learned by Ranker

Pass raw source score + source ID to ranker. Let ranker learn.

Source Calibration

Calibrate source score to probability/lift if possible.

At the aggregation stage, it is usually better to avoid ordering candidates across sources by score and to let the ranker handle the comparison.
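The first three normalizations can be sketched as small pure functions. This is a minimal illustration, assuming 1-based ranks, a natural logarithm for the rank formula, and a population standard deviation; `SourceScoreNormalizer` is a hypothetical helper name.

```java
import java.util.Arrays;

final class SourceScoreNormalizer {
    private SourceScoreNormalizer() {}

    // Rank-based score from the text: 1 / ln(1 + rank), with rank starting at 1.
    static double rankBased(int rank) {
        if (rank < 1) {
            throw new IllegalArgumentException("rank must be >= 1");
        }
        return 1.0 / Math.log(1.0 + rank);
    }

    // Fraction of this source's returned scores that are <= the given score.
    static double percentile(double score, double[] sourceScores) {
        if (sourceScores.length == 0) {
            throw new IllegalArgumentException("sourceScores must not be empty");
        }
        long atOrBelow = Arrays.stream(sourceScores).filter(s -> s <= score).count();
        return (double) atOrBelow / sourceScores.length;
    }

    // Z-score within one source's returned scores; returns 0 when all scores are equal.
    static double zScore(double score, double[] sourceScores) {
        double mean = Arrays.stream(sourceScores).average().orElse(0.0);
        double variance = Arrays.stream(sourceScores)
            .map(s -> (s - mean) * (s - mean))
            .average()
            .orElse(0.0);
        double std = Math.sqrt(variance);
        return std == 0.0 ? 0.0 : (score - mean) / std;
    }
}
```

With a natural logarithm, rank 1 maps to roughly 1.44 rather than 1.0. Only the ordering within a source matters here, so the values are best passed to the ranker as features rather than compared across sources.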

12. Dedup Across Sources

Dedup exact item ID first.

Then optionally dedup by group, depending on the stage.

Example:

product variants all returned from content source
same product returned from MF and trending
same article canonical URL
same video reupload
same enterprise document version

Dedup policy options:

Early Dedup

Reduce ranker cost.

Late Dedup

Let ranker see variants and choose.

Group Candidate

Represent product family as candidate, resolve offer/variant later.

For e-commerce:

candidate at product-family level for home feed
offer/SKU resolution later for checkout

Dedup policy must be surface-specific.

13. Eligibility Filtering Stage

Multiple filters:

Source-level filter

Cheap/coarse.

item active
surface allowed
tenant boundary if required

Aggregator-level filter

Common filters after merge.

item_type allowed
region available
dedup group

Final serving filter

Strict request-specific.

user suppression
policy
stock
permission
consent

Do not rely on source-level filtering only.

Track filter reasons.

14. Candidate Pool Quality

A candidate pool can be large but poor.

Quality indicators:

candidate_recall@K
source diversity
item type coverage
category diversity
cold item coverage
long-tail coverage
filter rate
duplicate rate
source overlap
final slate contribution

A pool of 5000 candidates all from same category/source may have poor quality.

Candidate generation should optimize portfolio quality, not count.

15. Minimum Pool Size and Fallback

If candidate pool after filtering is too small, fallback.

Example:

minimum_pool_size: 500
fallback:
  - segment_popularity
  - category_popularity
  - global_popularity
  - editorial_safe

Fallback can be additive or replacement.

if pool size < 500:
    add segment popularity until pool reaches 500

Always log fallback.

If fallback is used frequently, candidate sources are unhealthy or overly narrow.
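The additive variant can be sketched as follows. The sketch assumes fallback items are already eligibility-filtered and arrive in priority order; `FallbackSource`, `FallbackResult`, and `AdditiveFallback` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical fallback source: a name plus item IDs in priority order.
record FallbackSource(String name, List<String> itemIds) {}

// The filled pool plus the fallback sources that actually contributed, for logging.
record FallbackResult(List<String> pool, List<String> fallbacksUsed) {}

final class AdditiveFallback {
    private AdditiveFallback() {}

    // Adds unseen fallback items, source by source, until the pool reaches the minimum size.
    static FallbackResult fill(List<String> eligiblePool,
                               int minimumPoolSize,
                               List<FallbackSource> fallbacks) {
        Set<String> pool = new LinkedHashSet<>(eligiblePool);
        List<String> used = new ArrayList<>();
        for (FallbackSource fallback : fallbacks) {
            if (pool.size() >= minimumPoolSize) {
                break;
            }
            boolean contributed = false;
            for (String itemId : fallback.itemIds()) {
                if (pool.size() >= minimumPoolSize) {
                    break;
                }
                // Set.add returns false for items already in the pool.
                contributed |= pool.add(itemId);
            }
            if (contributed) {
                used.add(fallback.name());
            }
        }
        return new FallbackResult(List.copyOf(pool), List.copyOf(used));
    }
}
```

Returning `fallbacksUsed` alongside the pool makes it straightforward to log every fallback and to alert when the rate climbs.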

16. Source Contribution Funnel

Track candidate lifecycle.

Metrics per source:

generated_count
merged_count
eligible_count
ranked_top_count
final_slate_count
click_count
conversion_count
hide_count
report_count

This tells whether a source is useful or wasteful.
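Raw counts are easier to compare across sources once they are turned into stage-to-stage rates. The sketch below uses a subset of the counters listed above; the rate names and the `SourceFunnel` helper are illustrative assumptions, not metrics defined by this series.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Per-source counters for one time window (a subset of the metrics listed above).
record FunnelCounts(long generated, long eligible, long finalSlate, long clicks) {}

final class SourceFunnel {
    private SourceFunnel() {}

    // Safe ratio: an empty stage yields 0 instead of NaN.
    static double rate(long numerator, long denominator) {
        return denominator == 0 ? 0.0 : (double) numerator / denominator;
    }

    // Conversion between consecutive funnel stages, in funnel order.
    static Map<String, Double> stageRates(FunnelCounts c) {
        Map<String, Double> rates = new LinkedHashMap<>();
        rates.put("eligible_rate", rate(c.eligible(), c.generated()));
        rates.put("slate_rate", rate(c.finalSlate(), c.eligible()));
        rates.put("click_rate", rate(c.clicks(), c.finalSlate()));
        return rates;
    }
}
```

A source with a low `eligible_rate` is mostly generating items that get filtered; one with a high `eligible_rate` but near-zero `slate_rate` may be redundant or poorly understood by the ranker.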

17. Source Interaction Effects

A source may be useful even if few of its items reach the final slate.

Example:

exploration source rarely final, but discovers long-tail.
editorial source has low CTR but protects safety/freshness.
content source improves cold-start.
trending source spikes during events.

Do not remove source based only on final clicks. Look at strategic role.

Metrics should be interpreted by source purpose.

18. Candidate Recall Analysis

Evaluate per source:

source_recall@K
portfolio_recall@K
marginal_recall_gain

Marginal recall:

Recall(all sources) - Recall(all sources except source X)

This shows source unique value.

A source with low individual recall but high marginal recall for cold items may be valuable.

Measure by segment:

new users,
new items,
low-activity users,
long-tail,
high-value categories,
enterprise roles.
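The marginal recall formula above can be computed directly from per-source candidate sets. A minimal sketch, assuming each source's set is already truncated to its top K and that the relevant set comes from held-out interactions; `RecallAnalysis` is a hypothetical helper name.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class RecallAnalysis {
    private RecallAnalysis() {}

    // Recall = |relevant ∩ retrieved| / |relevant|.
    static double recall(Set<String> relevant, Set<String> retrieved) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        long hits = relevant.stream().filter(retrieved::contains).count();
        return (double) hits / relevant.size();
    }

    // Union of all sources' candidates, optionally leaving one source out (null = keep all).
    static Set<String> union(Map<String, Set<String>> bySource, String excludedSource) {
        Set<String> all = new HashSet<>();
        bySource.forEach((source, items) -> {
            if (!source.equals(excludedSource)) {
                all.addAll(items);
            }
        });
        return all;
    }

    // Recall(all sources) - Recall(all sources except the given one).
    static double marginalRecallGain(Set<String> relevant,
                                     Map<String, Set<String>> bySource,
                                     String source) {
        return recall(relevant, union(bySource, null))
             - recall(relevant, union(bySource, source));
    }
}
```

Running the same computation per segment (new users, new items, long-tail) only requires restricting the relevant set to that segment.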

19. Source Overlap

Sources can overlap heavily.

Overlap matrix:

	two_tower	MF	content	trending
two_tower	1.0	0.45	0.20	0.10
MF	0.45	1.0	0.15	0.08
content	0.20	0.15	1.0	0.05
trending	0.10	0.08	0.05	1.0

High overlap may mean redundancy. But overlap with independent sources can signal confidence.

Track:

source_pair_overlap_rate
average_source_count_per_candidate
unique_candidates_by_source
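The text does not fix a definition of overlap. The sketch below assumes Jaccard similarity (intersection over union), which is symmetric and equals 1.0 on the diagonal like the matrix above, and adds a helper for the candidates unique to one source. `SourceOverlap` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class SourceOverlap {
    private SourceOverlap() {}

    // Jaccard overlap between two sources' candidate sets: |A ∩ B| / |A ∪ B|.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return (double) intersection.size() / unionSize;
    }

    // Candidates returned by the given source and by no other source.
    static Set<String> uniqueTo(String source, Map<String, Set<String>> bySource) {
        Set<String> unique = new HashSet<>(bySource.getOrDefault(source, Set.of()));
        bySource.forEach((other, items) -> {
            if (!other.equals(source)) {
                unique.removeAll(items);
            }
        });
        return unique;
    }
}
```

Other definitions, such as intersection over the smaller set, are equally reasonable; whichever is chosen should be documented next to the dashboard.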

20. Candidate Portfolio for Cold Start

New User

Use:

contextual popularity,
editorial,
trending,
content from onboarding/query/session,
region/category popularity,
exploration.

Reduce:

user MF,
long-term CF,
user graph.

New Item

Use:

content-based,
editorial/new arrival,
exploration,
category/creator prior,
graph metadata edges.

Collaborative sources will underrepresent new items.

Candidate policy should explicitly handle cold-start.

21. Candidate Portfolio for Privacy Modes

Privacy mode affects source availability.

Personalized Mode

All allowed sources.

Contextual Mode

No behavioral user history, but the current context, seed item, or query may be used, depending on consent.

Non-Personalized Mode

Only global/contextual/editorial sources.

Example:

privacy_mode_sources:
  personalized:
    - two_tower
    - mf
    - item_cf
    - graph_user
    - content_session
    - popularity
  contextual:
    - seed_content
    - query_retrieval
    - region_popularity
    - editorial
  non_personalized:
    - global_popularity
    - editorial
    - trending_public

Candidate policy must enforce this, not rely on source teams remembering.
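Central enforcement can be as simple as an allowlist applied before any source is called. A minimal sketch with deny-by-default behavior; `PrivacyMode` and `PrivacySourceGate` are hypothetical names, and the allowlist contents would come from the config above.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

enum PrivacyMode { PERSONALIZED, CONTEXTUAL, NON_PERSONALIZED }

final class PrivacySourceGate {
    private final Map<PrivacyMode, Set<String>> allowedByMode;

    PrivacySourceGate(Map<PrivacyMode, Set<String>> allowedByMode) {
        this.allowedByMode = Map.copyOf(allowedByMode);
    }

    // Keeps only allowlisted sources for the mode. A mode with no entry
    // allows nothing, so a missing config fails safe.
    List<String> filter(PrivacyMode mode, List<String> requestedSources) {
        Set<String> allowed = allowedByMode.getOrDefault(mode, Set.of());
        return requestedSources.stream()
            .filter(allowed::contains)
            .toList();
    }
}
```

Running the gate inside the orchestrator, rather than inside each source, means a newly added source stays disabled in restricted modes until someone explicitly allowlists it.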

22. Candidate Portfolio for Enterprise

Enterprise source portfolio should prioritize correctness.

Example case recommendation:

sources:
  valid_actions_by_state:
    required: true
    quota: 50
  policy_required_actions:
    required: true
    quota: 20
  knowledge_articles_by_case_topic:
    required: false
    quota: 100
  similar_cases_authorized:
    required: false
    quota: 100
  historical_success_actions:
    required: false
    quota: 50
filters:
  - tenant_boundary
  - actor_permission
  - case_state_validity
  - jurisdiction
  - policy_version_valid

Candidate source that returns unauthorized item is a serious incident.

23. Exploration as a Source

Exploration source generates candidates not fully optimized for current predicted score.

Goals:

collect data for new items,
reduce bias,
test uncertain candidates,
improve long-tail coverage,
estimate propensities.

Candidate must include:

{
  "source": "exploration_new_items",
  "exploration_policy": "new-item-v2",
  "propensity": 0.015,
  "exploration_reason": "cold_start_item"
}

Ranker/reranker may reserve slots or apply controlled randomization.

Exploration should be capped and monitored.

24. Sponsored / Business Candidates

If system has sponsored/promoted/business-rule candidates, treat them as candidate source with explicit provenance.

{
  "source": "sponsored_campaign",
  "campaign_id": "camp_123",
  "bid": 1.25,
  "eligibility_status": "coarse_filtered",
  "disclosure_required": true
}

Do not hide business source as organic recommendation.

Ranking/reranking must enforce:

disclosure,
policy,
relevance floor,
user experience guardrails,
campaign budget,
frequency cap.

25. Source Governance

Each candidate source needs owner.

Source registry:

source: matrix_factorization
owner_team: recsys-ml
purpose: collaborative personalized retrieval
allowed_surfaces:
  - home_feed
  - email_digest
privacy:
  requires_personalization_consent: true
dependencies:
  - user_vector_store
  - item_vector_index
freshness:
  model_retrain: daily
  index_refresh: daily

Candidate source without owner becomes operational risk.

26. Candidate Source Lifecycle

Lifecycle:

Shadow

Source runs but does not affect final ranking. Log candidates.

Limited Traffic

Small experiment.

Production

Enabled by candidate policy.

Deprecated

No new traffic; kept for fallback/model dependency if needed.

This prevents risky source launches.

27. Shadow Evaluation

Before enabling source:

run in shadow,
measure recall contribution,
filter rate,
latency,
overlap,
candidate quality,
safety violations,
cold-start coverage.

Shadow source can reveal:

returns too many unavailable items
latency too high
overlaps 95% with existing source
great recall for cold items

Do not send to ranker before basic quality is known.

28. Candidate Source A/B Testing

Experiment knobs:

enable source,
change quota,
change model version,
change index,
change merge policy,
change exploration rate.

Metrics:

final product metrics,
source contribution,
latency,
candidate recall,
filter rate,
diversity,
guardrails.

If source adds candidates but ranker never selects them, maybe ranker needs retraining or source quality is poor.

Candidate source experiments can be subtle because ranker mediates impact.

29. Ranker Training and Candidate Distribution

Ranker trained on historical candidate distribution may not handle new source well.

If new source introduces different candidate distribution:

score calibration may be off,
features missing,
ranker may under-score,
offline metrics may not reflect online.

Mitigations:

shadow log source candidates,
include source features,
retrain ranker with source candidates,
use exploration/interleaving,
source-specific calibration,
gradual rollout.

Candidate generation and ranking must co-evolve.

30. Candidate Source Features

For each source, pass features.

Examples:

two_tower_score
two_tower_rank
mf_dot_product
item_cf_similarity
i2i_lift
content_similarity
graph_ppr
popularity_score
trending_score
editorial_priority
exploration_propensity

Ranker can learn source reliability by context.

Keep feature contracts stable.

31. Source-Aware Reranking

Even after ranking, reranker may enforce source mix.

Example:

final_slate_constraints:
  max_sponsored: 2
  min_exploration_if_eligible: 1
  max_same_source_if_low_diversity: 8
  must_include_policy_required_actions: true

Use carefully. Hard source quotas can reduce relevance. But some business/safety/exploration goals require them.

Separate:

source quota at candidate stage,
final slate source constraints,
ranking objective.
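A constraint such as `max_sponsored` can be applied as a greedy pass over the ranked list. This is a sketch of one possible approach, not the reranker described in this series; `RankedItem` and `SlateConstraints` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical ranked item: the list passed in is assumed sorted by score, best first.
record RankedItem(String itemId, String source, double score) {}

final class SlateConstraints {
    private SlateConstraints() {}

    // Walks the ranked list in order and skips items from the capped source
    // once the cap is reached, until the slate is full.
    static List<RankedItem> capSource(List<RankedItem> ranked,
                                      String cappedSource,
                                      int maxFromSource,
                                      int slateSize) {
        List<RankedItem> slate = new ArrayList<>();
        int used = 0;
        for (RankedItem item : ranked) {
            if (slate.size() == slateSize) {
                break;
            }
            if (item.source().equals(cappedSource)) {
                if (used == maxFromSource) {
                    continue;
                }
                used++;
            }
            slate.add(item);
        }
        return slate;
    }
}
```

The greedy pass keeps the ranker's order for everything it admits, which limits the relevance cost of the constraint. Minimum constraints such as `min_exploration_if_eligible` need a different mechanism, for example reserved slots.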

32. Handling Empty Sources

Empty source can mean:

not applicable,
user cold-start,
dependency failure,
source index stale,
all candidates filtered,
source bug,
surface unsupported.

Status should distinguish.

{
  "source": "item_cf",
  "status": "empty",
  "empty_reason": "no_user_history"
}

or:

{
  "source": "item_cf",
  "status": "empty",
  "empty_reason": "all_candidates_filtered_by_policy"
}

This improves debugging.

33. Candidate Pool Debugging Questions

When a recommendation is bad, ask:

Did good item appear in any candidate source?
Which sources returned it?
Was it filtered before ranking?
Was it ranked low?
Was it removed by reranker?
Was it ineligible?
Was source disabled by privacy/experiment?
Did source timeout?
Was candidate pool too small?

This is why candidate provenance and debug trace matter.

34. Multi-Source Failure Modes

34.1 All Sources Return Same Popular Items

Low diversity, long-tail poor.

34.2 New Source Never Selected

Ranker distribution mismatch or source low quality.

34.3 Source Timeout Causes Empty Pool

No fallback or too much reliance on one source.

34.4 Source Returns Invalid Items

Stale index/list or weak eligibility.

34.5 Candidate Pool Too Large

Feature/ranking latency explosion.

34.6 Exploration Overexposes Bad Items

Guardrails insufficient.

34.7 Privacy Mode Bug

Personalized source used when disabled.

34.8 Enterprise Authorization Leak

Unauthorized candidates generated.

34.9 No Source Attribution

Cannot understand production behavior.

34.10 Candidate Policy Drift

Config changes without evaluation.

35. Implementation Sketch

Candidate orchestrator:

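import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch only: collaborator types and the callAsync/waitWithinBudget helpers are not shown.
public final class CandidateGenerationOrchestrator {
    private final CandidatePolicyRegistry policyRegistry;
    private final Map<String, CandidateSource> sources;
    private final CandidateAggregator aggregator;
    private final EligibilityService eligibilityService;
    private final FallbackCandidateService fallbackService;

    public CandidateGenerationOrchestrator(
            CandidatePolicyRegistry policyRegistry, Map<String, CandidateSource> sources,
            CandidateAggregator aggregator, EligibilityService eligibilityService,
            FallbackCandidateService fallbackService) {
        this.policyRegistry = policyRegistry;
        this.sources = Map.copyOf(sources);
        this.aggregator = aggregator;
        this.eligibilityService = eligibilityService;
        this.fallbackService = fallbackService;
    }

    public CandidatePool generate(RecommendationRequest request) {
        CandidatePolicy policy = policyRegistry.resolve(request.surface(), request.context());

        // Start every enabled, registered, applicable source in parallel.
        List<CandidateSourceCall> calls = policy.sources().stream()
            .filter(sourceConfig -> sourceConfig.enabled())
            .filter(sourceConfig -> {
                // Guard against a policy that names a source missing from the registry.
                CandidateSource source = sources.get(sourceConfig.name());
                return source != null && source.isApplicable(request);
            })
            .map(sourceConfig -> callAsync(sourceConfig, request))
            .toList();

        List<CandidateSourceResult> results = new ArrayList<>(waitWithinBudget(calls, policy.totalTimeout()));
        CandidatePool eligiblePool = mergeAndFilter(results, policy, request);

        if (eligiblePool.size() < policy.minimumPoolSize()) {
            // Assumes the fallback service returns per-source results, so fallback candidates
            // pass through the same merge and eligibility path as primary candidates.
            results.addAll(fallbackService.generate(request, policy.fallbackSources()));
            eligiblePool = mergeAndFilter(results, policy, request);
        }

        return eligiblePool.withDiagnostics(results);
    }

    private CandidatePool mergeAndFilter(
            List<CandidateSourceResult> results, CandidatePolicy policy, RecommendationRequest request) {
        CandidatePool pool = aggregator.merge(results, policy.mergePolicy());
        return eligibilityService.filter(pool, request.context());
    }
}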

Important details:

async calls,
per-source timeout,
diagnostics,
fallback,
eligibility,
provenance preservation.

36. Candidate Aggregator Sketch

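import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public final class CandidateAggregator {
    public CandidatePool merge(List<CandidateSourceResult> results, MergePolicy policy) {
        // LinkedHashMap keeps insertion order, so truncation below is deterministic
        // (candidates from earlier results are kept first).
        Map<String, AggregatedCandidate> byKey = new LinkedHashMap<>();

        for (CandidateSourceResult result : results) {
            // Skip failed or timed-out sources; their candidate lists are not usable.
            if (!result.status().canUseCandidates()) {
                continue;
            }

            for (Candidate candidate : result.candidates()) {
                // Exact dedup keys on item ID; otherwise use the source-provided candidate key.
                String key = policy.exactDedup()
                    ? candidate.itemId()
                    : candidate.candidateKey();

                // Assumes AggregatedCandidate.from(...) creates an entry with no sources attached,
                // so addSource records each source exactly once.
                byKey.computeIfAbsent(key, ignored -> AggregatedCandidate.from(candidate))
                     .addSource(candidate);
            }
        }

        return new CandidatePool(
            byKey.values().stream()
                .limit(policy.maxMergedCandidates())
                .toList()
        );
    }
}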

Production needs better top-K handling, quotas, dedup groups, and memory safety.

37. Data Model for Aggregated Candidate

{
  "item_id": "item_123",
  "item_type": "product",
  "dedup_group_id": "family_123",
  "sources": [
    {
      "source": "two_tower",
      "version": "v5",
      "rank": 12,
      "score": 0.82,
      "score_type": "inner_product"
    },
    {
      "source": "content_based",
      "version": "v3",
      "rank": 8,
      "score": 0.77,
      "score_type": "cosine_similarity"
    }
  ],
  "aggregated_features": {
    "source_count": 2,
    "best_source_rank": 8,
    "has_two_tower": true,
    "has_content_based": true
  },
  "provenance": {
    "reason_codes": ["embedding_match", "same_topic"]
  }
}

This becomes input to feature fetch/ranker.

38. Minimal Production Multi-Source Plan

For first robust system:

home_feed:
  sources:
    segment_popularity: 300
    content_based_user_session: 300
    item_cf_from_history: 300
    matrix_factorization: 500
    trending_region: 100
    editorial_safe: 50
  fallback:
    - segment_popularity
    - global_popularity

product_detail:
  sources:
    content_similar: 300
    co_view: 300
    co_buy: 200
    category_popularity: 100
  fallback:
    - same_category_popularity
    - global_popularity

enterprise_case:
  sources:
    valid_actions: required
    policy_required: required
    knowledge_by_topic: 100
    similar_cases_authorized: 100
  fallback:
    - policy_required

Then add:

two-tower,
graph PPR,
exploration,
sponsored/business sources if applicable.

39. Multi-Source Candidate Generation Checklist

[ ] Candidate policy exists per surface.
[ ] Source applicability is defined.
[ ] Source quotas are defined.
[ ] Per-source timeout exists.
[ ] Sources run in parallel where possible.
[ ] Required vs optional source behavior is explicit.
[ ] Candidate merge preserves multi-source provenance.
[ ] Score semantics are not mixed blindly.
[ ] Exact item dedup exists.
[ ] Dedup group strategy exists.
[ ] Final eligibility filtering exists.
[ ] Minimum pool size and fallback exist.
[ ] Source contribution funnel is monitored.
[ ] Candidate recall is measured per source and portfolio.
[ ] Source overlap is monitored.
[ ] Cold-start policy changes source mix.
[ ] Privacy mode disables behavioral sources.
[ ] Enterprise authorization is enforced.
[ ] Exploration source logs propensity.
[ ] Source lifecycle includes shadow mode.
[ ] Ranker retraining considers new source distribution.

40. Conclusion

Multi-source candidate generation is how a production recommendation system avoids blind spots.

Key principles:

Candidate generation is a portfolio.
Source mix must be surface-specific.
Source quotas control recall, cost, and diversity.
Scores are source-native; do not compare raw scores blindly.
Multi-source provenance is a feature, not noise.
Candidate recall and source contribution must be measured.
Cold-start, privacy, and enterprise modes require different source policies.
Exploration is a source with propensity and guardrails.
Candidate source lifecycle needs shadow, experiment, production, deprecation.
Ranker and candidate distribution must evolve together.

In Part 026, we move on to modern retrieval: the Two-Tower Retrieval Model, the foundation for scalable personalized retrieval with a user/query tower, an item tower, an embedding index, negative sampling, and ANN serving.
