Build CoreOrdered learning track

Learn Build From Scratch Recommendations System Part 023 Graph Based Recommendation

[]12 min read2352 words

In This Lesson

1. Mental Model: Recommendation as Graph Traversal 2. Why Graph-Based Recommendation Matters 3. Graph Types in Recommendation

PrevNext

Lesson 2380 lesson track16–44 Build Core

title: Build From Scratch Recommendations System - Part 023 description: Membangun graph-based recommendation production-grade: user-item graph, item graph, entity graph, random walk, Personalized PageRank, graph embeddings, community signals, constraints, scalability, dan serving architecture. series: learn-build-from-scratch-recommendations-system seriesTitle: Build From Scratch: Enterprise Recommendations System order: 23 partTitle: Graph-Based Recommendation tags:

recommendation-system
recsys
graph
graph-recommendation
personalized-pagerank
embeddings
series date: 2026-07-02

Part 023 — Graph-Based Recommendation

Recommendation system sering lebih mudah dipahami sebagai graph.

User berinteraksi dengan item. Item punya kategori. Item dibuat creator. Produk dijual seller. Artikel punya topic. Case punya entities. Knowledge article berlaku untuk policy. User berada di organization. Role punya permission. Item mirip karena co-view. Produk terhubung karena co-buy. Video terhubung karena watch-next. Case terhubung karena shared risk indicators.

Jika kita memodelkan semua hubungan itu sebagai graph, recommendation menjadi masalah:

Dari node awal, node lain mana yang paling relevan, reachable, valid, dan berguna?

Graph-based recommendation memberi kita cara berpikir yang sangat kuat karena ia bisa menggabungkan collaborative signal, content signal, taxonomy, entity relationship, policy constraint, dan domain knowledge dalam satu struktur.

Part ini membahas graph-based recommendation production-grade: user-item graph, item graph, heterogeneous graph, random walk, Personalized PageRank, graph embeddings, community signals, enterprise graph, scalability, serving, dan failure modes.

1. Mental Model: Recommendation as Graph Traversal

Bayangkan graph:

User -> clicked -> Item
Item -> belongs_to -> Category
Item -> created_by -> Creator
Item -> co_bought_with -> Item
User -> follows -> Creator
Case -> has_risk_indicator -> Entity
KnowledgeArticle -> applies_to -> PolicyTopic
Action -> valid_after -> Action

Recommendation bisa dimodelkan sebagai traversal:

start from user/session/context node
walk through meaningful edges
collect candidate item/action/document nodes
score by path strength and constraints

Diagram:

Graph gives multiple paths:

User -> Item -> Similar Item
User -> Category -> Popular Item
User -> Creator -> New Item
User -> Case -> Policy Topic -> Knowledge Article

2. Why Graph-Based Recommendation Matters

Graph recommendation is useful when:

entities have rich relationships,
behavior alone is sparse,
content and collaborative signals must combine,
explanation via paths matters,
domain constraints are relational,
enterprise permission/policy graph exists,
item taxonomy is important,
multi-hop reasoning helps,
cold-start can use metadata graph.

Examples:

E-commerce

user -> purchased camera
camera -> compatible_with memory_card
camera -> co_bought_with camera_bag
camera -> belongs_to mirrorless_category
mirrorless_category -> popular tripod

Content

user -> watched video
video -> topic distributed_tracing
topic -> related_topic observability
observability -> article/post/course

Enterprise Case

case -> risk_indicator suspicious_transfer
risk_indicator -> policy_topic AML_escalation
policy_topic -> knowledge_article
case_state -> valid_next_action
actor_role -> permission -> action

Graph makes these paths explicit.

3. Graph Types in Recommendation

3.1 User-Item Bipartite Graph

Nodes:

users,
items.

Edges:

view,
click,
purchase,
watch,
like,
hide.

Good for collaborative recommendation.

3.2 Item Graph

Nodes:

items.

Edges:

co-view,
co-buy,
similar content,
same category,
sequence next,
complement,
alternative.

Good for item-to-item recommendation and candidate generation.

3.3 Knowledge Graph / Entity Graph

Nodes:

user,
item,
category,
brand,
creator,
seller,
topic,
tenant,
role,
policy,
case,
action,
document.

Edges are typed.

Good for hybrid recommendation and enterprise.

3.4 Session Graph

Nodes:

items/actions in a session.

Edges:

sequence,
time proximity,
transition.

Good for next-item/session recommendation.

4. Heterogeneous Graph

Production recommendation graph is usually heterogeneous.

Different node types:

User
Item
Category
Creator
Seller
Brand
Topic
Query
Session
Case
Action
KnowledgeArticle
Tenant
Role
Policy

Different edge types:

clicked
purchased
viewed
belongs_to
created_by
follows
co_bought
similar_to
valid_for
requires_permission
applies_to
next_action

Example schema:

nodes:
  - user
  - item
  - category
  - creator
  - topic
  - case
  - knowledge_article
  - action
edges:
  - user_clicked_item
  - user_purchased_item
  - item_belongs_to_category
  - item_created_by_creator
  - item_has_topic
  - case_has_topic
  - article_applies_to_topic
  - action_valid_for_case_state

Typed graph matters because clicked edge is not the same as belongs_to.

5. Edge Semantics

Every edge needs meaning.

Edge fields:

{
  "source_node": "user:u123",
  "target_node": "item:i456",
  "edge_type": "clicked",
  "weight": 1.0,
  "event_time": "2026-07-02T10:00:00Z",
  "valid_from": "2026-07-02T10:00:00Z",
  "valid_until": null,
  "source_event_id": "evt_001",
  "confidence": 0.8
}

Edge semantics include:

direction,
weight,
timestamp,
confidence,
validity,
source,
privacy/policy scope.

Do not turn all edges into untyped connections. You lose meaning.

6. Edge Weighting

Edge weight can reflect interaction strength.

Example:

impression = 0.1
click = 1.0
long_dwell = 2.0
add_to_cart = 3.0
purchase = 5.0
hide = negative/suppression
report = policy signal

For graph traversal, edge weights affect probability.

Example:

user -> purchased item

should carry stronger signal than:

user -> clicked item

But edge type should remain explicit. Weight is not replacement for semantic type.

7. Time and Recency in Graphs

Graph changes over time.

Edges should decay or expire.

edge_weight(t) = original_weight * exp(-lambda * age)

Examples:

session click edge expires quickly,
purchase edge lasts longer,
case state edge valid until state changes,
policy edge valid until policy version replaced,
stock/availability edge can change immediately.

Graph recommendation must avoid stale edges.

For production, store:

created_at
valid_from
valid_until
last_updated_at
ttl

8. Graph Construction Pipeline

Graph is built from curated data, not raw events.

Input quality still matters:

dedup,
bot filtering,
policy state,
catalog version,
identity resolution,
tenant isolation.

9. Graph Storage

Storage options conceptually:

9.1 Edge Tables

Relational/lake format:

source_type, source_id, edge_type, target_type, target_id, weight, timestamp

Good for batch graph algorithms.

9.2 Key-Value Adjacency Lists

node_id -> list of outgoing edges

Good for online traversal.

9.3 Graph Database

Useful for complex relationship queries, but not always necessary for high-throughput recommendation.

9.4 Vector Store for Graph Embeddings

If graph algorithms produce embeddings.

Production often uses multiple:

edge tables for batch
adjacency store for serving
vector index for retrieval

10. Simple Graph Traversal Recommendation

Given user:

Get recently interacted items.
Traverse item -> related items.
Traverse item -> category -> popular items.
Traverse creator/topic edges.
Aggregate candidates.
Filter and rank.

Example scoring by path:

User --purchase--> Camera --co_buy--> MemoryCard
score = weight(purchase) * weight(co_buy)

For multi-hop:

User -> Item -> Category -> Item

Score decays by path length.

path_score = product(edge_weights) * path_decay^length

Shorter, stronger paths usually better.

11. Path-Based Scoring

Path example:

user:u1 --clicked--> item:A --has_topic--> topic:T --has_item--> item:B

Score:

score(B) += user_click_weight * topic_edge_weight * topic_item_weight * decay(3 hops)

For each candidate, sum over paths:

score(candidate) = sum(path_scores from seed nodes to candidate)

This is explainable:

Recommended because it shares topic T with item A you clicked.

But enumerating many paths can be expensive. Limit:

max path length,
edge type whitelist,
top edges per node,
candidate cap,
time window.

12. Personalized PageRank

Personalized PageRank (PPR) is a graph algorithm for personalized relevance.

Intuition:

start random walk from user/context seed nodes,
at each step, either continue walking or restart at seed,
nodes visited often are relevant.

Formula idea:

rank = restart_probability * seed_distribution
     + (1 - restart_probability) * transition_matrix * rank

PPR naturally handles multi-hop relationships while keeping personalization anchored.

Use cases:

user-item graph,
heterogeneous graph,
related documents,
enterprise knowledge graph.

Pros:

explainable-ish via graph,
combines multi-hop signal,
can use typed edges/weights,
works with sparse data.

Cons:

expensive at scale,
graph freshness,
topic drift if graph dense,
hard to tune edge weights,
needs filtering.

13. Random Walk with Restart

Algorithm mental model:

current_node = seed
repeat many times:
  with probability restart:
      current_node = seed
  else:
      current_node = sample_neighbor(current_node)
  count visits

Candidates are item/action/document nodes with high visit count.

In production, exact random walks online can be expensive. Options:

precompute PPR for active users/items,
approximate online for small subgraph,
use graph embeddings,
use PPR as batch feature/candidate source.

14. Typed Edge Random Walks

In heterogeneous graph, not all paths are valid.

Example e-commerce recommendation path:

user -> purchased_item -> co_bought_item

valid.

But:

user -> item -> seller -> all_seller_items

might overpromote seller.

Use metapaths.

Metapath examples:

User-Item-Item
User-Item-Category-Item
User-Item-Creator-Item
Case-Topic-Article
Case-State-Action
User-Role-Permission-Action

Define allowed metapaths per surface.

surface: product_detail_related
allowed_metapaths:
  - item->co_view->item
  - item->co_buy->item
  - item->category->item
  - item->brand->item
blocked_metapaths:
  - item->seller->item

This keeps graph traversal controlled.

15. Graph Embeddings

Graph embeddings learn vector representation of nodes from graph structure.

Examples:

item embedding from co-interaction graph,
user embedding from user-item graph,
entity embedding from knowledge graph,
case embedding from case-entity graph.

Methods conceptually:

random walk sequences + skip-gram,
matrix factorization of graph proximity,
graph neural networks,
knowledge graph embedding.

In production, graph embeddings are useful as candidate retrieval features:

ANN.search(user_graph_embedding)

But embedding versioning and training cutoff matter.

16. Random Walk Embedding Mental Model

Generate node sequences:

user1 -> itemA -> categoryC -> itemB -> user2 -> itemD

Train embeddings so nodes appearing in similar walks are close.

This captures graph neighborhood similarity.

Use typed walks to avoid meaningless paths.

For recommendation:

query node embedding,
retrieve nearest item/action nodes,
filter eligibility,
rank.

Graph embeddings bridge graph reasoning and vector retrieval.

17. Graph Neural Networks

Graph neural networks aggregate neighbor information.

Concept:

node_representation =
  function(node_features, neighbor_representations, edge_types)

Useful when:

graph has rich node features,
multi-hop structure matters,
node types/edge types are meaningful,
you need inductive generalization for new nodes with features.

Challenges:

complexity,
training cost,
serving cost,
stale graph,
explainability,
operational maturity.

For this series stage, understand graph GNNs as advanced extension. Build simpler graph traversal/PPR/embedding first.

18. Graph-Based Candidate Generation

Graph can produce candidates:

From User Node

user -> interacted items -> related items

From Session Node

session -> recent items -> next items

From Item Seed

seed item -> co_buy/co_view/similar graph

From Context/Case Node

case -> topics/entities/state -> articles/actions

Candidate output should include provenance:

{
  "item_id": "item_123",
  "source": "graph_ppr",
  "score": 0.73,
  "paths": [
    "user_clicked:item_A -> topic:distributed_tracing -> item_123"
  ],
  "graph_version": "graph-20260702"
}

Path provenance is powerful for debugging.

19. Graph-Based Features

Graph is not only candidate source. It can produce features for ranker.

Examples:

user_item_graph_distance
personalized_pagerank_score
common_neighbor_count
user_topic_graph_affinity
item_community_id
creator_centrality
seller_trust_graph_score
case_article_path_score
action_success_graph_score

Ranker can use these with other features.

Important: graph features must be point-in-time safe.

20. Community Detection

Graphs reveal communities.

Examples:

item clusters,
user interest groups,
creator communities,
case clusters,
knowledge topic clusters.

Community features:

item_community_id
user_top_community
user_item_same_community
community_popularity
community_diversity

Use cases:

candidate generation,
diversity,
exploration,
segmentation,
cold-start.

Beware of filter bubbles. Community-based recommendation can over-narrow exposure.

21. Knowledge Graph Recommendation

Knowledge graph includes domain facts.

Example e-commerce:

Product -> compatible_with -> Accessory
Product -> requires -> BatteryType
Brand -> has_series -> ProductLine

Example enterprise:

CaseType -> governed_by -> Policy
Policy -> requires -> EvidenceType
EvidenceType -> described_by -> KnowledgeArticle
Action -> valid_for -> CaseState
Role -> allowed_to_execute -> Action

Recommendation can use rule-like graph traversal:

case -> policy -> required_evidence -> knowledge_article

This is highly explainable and safe if graph is curated.

Knowledge graph edges often represent truth/constraint, not behavior. Treat them differently from click edges.

22. Enterprise Graph Example

Serving must enforce:

actor has permission
case state allows action
jurisdiction matches policy
tenant boundary holds
article valid now

Graph makes these constraints explicit.

23. Graph Constraints and Safety

Graph traversal can accidentally reach unauthorized nodes.

Example:

case -> similar_case -> confidential_document

If actor cannot access similar case, candidate should not appear.

Apply constraints:

traversal-level edge filtering,
node visibility filtering,
final eligibility filtering,
path redaction in logs,
tenant isolation,
purpose/consent checks.

For high-sensitivity graph, do authorization-aware traversal, not only final filtering.

24. Graph Versioning

Version graph artifacts:

edge_schema_version
edge_weight_policy
graph_snapshot_version
node_feature_version
embedding_model_version
ppr_algorithm_version
metapath_policy_version

Log:

{
  "graph_version": "graph-snapshot-20260702",
  "metapath_policy": "home-graph-metapath-v3",
  "edge_weight_policy": "edge-weight-v4"
}

If graph changes, recommendation behavior changes.

25. Temporal Graph Snapshots

For training/evaluation:

use graph as-of prediction_time

Do not use future edges.

Example leakage:

user buys item B after impression,
graph includes edge user->B,
training example before purchase uses graph path to B.

Solution:

graph snapshots by time,
edge valid_from/valid_to,
temporal joins,
graph embedding training cutoff.

Graph leakage can be subtle.

26. Scalability

Graphs can be huge.

Challenges:

billions of edges,
high-degree nodes,
popular items/categories,
online traversal latency,
memory,
frequent updates,
access control.

Mitigations:

cap neighbor lists,
filter edge types,
downweight high-degree nodes,
precompute top candidates,
use approximate PPR,
use graph embeddings,
partition by tenant/region/domain,
cache adjacency lists,
use batch graph processing.

High-degree nodes like global category “Electronics” can make traversal noisy. Penalize or limit.

27. High-Degree Node Problem

Nodes like:

category: all
topic: technology
creator: mega_creator
item: viral_item

connect to many nodes.

They can dominate traversal.

Mitigations:

degree normalization,
edge weight normalization,
stopword-like node removal,
max neighbors per node,
metapath restrictions,
popularity penalty,
use specific category levels.

Graph equivalent of “the” in text: too common to be useful.

28. Serving Patterns

28.1 Precomputed Graph Candidates

Batch compute graph candidates per user/item/context.

Pros:

low latency,
stable,
easy fallback.

Cons:

stale,
storage heavy.

28.2 Online Traversal

Compute candidates at request time.

Pros:

dynamic,
context-aware.

Cons:

latency,
complexity,
graph store dependency.

28.3 Graph Embedding Retrieval

Use vector index.

Pros:

scalable retrieval,
fast.

Cons:

less path explainability,
stale embeddings.

Hybrid is common:

precomputed graph candidates
+ online filters
+ graph score as rank feature
+ graph embedding retrieval for broader recall

29. Graph Candidate Contract

Graph candidate should include:

{
  "candidate_id": "item_123",
  "candidate_type": "item",
  "source": "graph_ppr",
  "score": 0.731,
  "reason_paths": [
    {
      "path_type": "user_item_topic_item",
      "path": ["user:u1", "item:A", "topic:T", "item:123"],
      "path_score": 0.42
    }
  ],
  "graph_metadata": {
    "graph_version": "graph-20260702",
    "edge_weight_policy": "edge-weight-v3",
    "metapath_policy": "home-v2"
  }
}

For privacy, reason paths may be internal-only. Do not expose sensitive nodes.

30. Evaluation

Offline:

Recall@K,
NDCG@K,
HitRate@K,
path relevance,
coverage,
diversity,
cold-start improvement,
graph source contribution,
community coverage,
long-tail exposure.

Online:

CTR,
conversion,
watch completion,
case success,
hide/report,
session continuation,
diversity,
user trust,
latency.

For enterprise:

permission violation count must be zero,
invalid action recommendations must be zero,
policy/jurisdiction mismatch must be zero.

Graph systems need safety evaluation, not just relevance.

31. Explainability via Paths

Graph recommendations can be explained by path.

Examples:

Because you read articles about distributed tracing.
Because this product is compatible with your camera.
Because this article applies to the risk indicator in your case.
Because users who completed this course also took this next module.

Internal explanation:

user -> item_A -> topic_observability -> item_B

User-facing explanation should be semantic, concise, and policy-safe.

Do not expose:

other user identities,
confidential case nodes,
sensitive inferred attributes,
hidden policy signals.

32. Graph and Diversity

Graph can help diversity.

If candidates belong to graph communities, reranker can enforce:

max items per community
cover multiple related topics
avoid same creator cluster

Graph can also hurt diversity by staying inside one cluster.

Use graph community as both feature and constraint.

33. Graph and Cold Start

Cold item can connect through metadata graph:

new item -> category/topic/creator -> users/items

Even with no interactions, graph can recommend it via content edges.

Cold user can connect through:

onboarding topics,
session items,
query,
role,
case context,
region.

Graph is strong for cold-start if metadata/entity edges are rich.

34. Graph and Policy

Graph can encode policy:

role -> can_access -> action
item -> restricted_in -> region
article -> valid_for -> jurisdiction
case_state -> allows -> action

But do not mix policy score with preference score.

Use policy graph for eligibility, then preference graph for relevance.

Hard constraint:

if no valid permission path:
    exclude candidate

35. Operational Failure Modes

35.1 Stale Edges

Old interactions dominate.

35.2 High-Degree Dominance

Generic nodes recommend everything.

35.3 Unauthorized Path Leakage

Traversal reaches restricted nodes.

35.4 Graph Explosion

Online traversal too slow.

35.5 Edge Semantics Mixed

Click, category, permission, and co-buy all treated same.

35.6 Future Edge Leakage

Graph embedding trained with future data.

35.7 Poor Metadata

Cold-start graph edges wrong.

35.8 No Path Provenance

Recommendation cannot be debugged.

35.9 Over-Clustering

Recommendations too narrow.

35.10 Tenant Leakage

Enterprise graph crosses tenant boundary.

36. Implementation Sketch: Path-Based Candidate Generator

Conceptual Java-like structure:

public interface GraphStore {
    List<Edge> outgoing(NodeId node, EdgeFilter filter, int limit);
}

public record GraphSeed(NodeId nodeId, double weight) {}

public record GraphCandidate(
    String candidateId,
    String candidateType,
    double score,
    List<String> reasonPaths
) {}

Traversal:

public final class PathBasedGraphCandidateGenerator {
    private final GraphStore graphStore;
    private final MetapathPolicy metapathPolicy;

    public List<GraphCandidate> generate(List<GraphSeed> seeds, GraphRequestContext context) {
        Map<String, CandidateAccumulator> candidates = new HashMap<>();

        for (GraphSeed seed : seeds) {
            traverse(seed.nodeId(), seed.weight(), List.of(seed.nodeId().toString()), 0, candidates, context);
        }

        return candidates.values().stream()
            .map(CandidateAccumulator::toGraphCandidate)
            .sorted(Comparator.comparing(GraphCandidate::score).reversed())
            .limit(context.limit())
            .toList();
    }

    private void traverse(
        NodeId node,
        double pathScore,
        List<String> path,
        int depth,
        Map<String, CandidateAccumulator> candidates,
        GraphRequestContext context
    ) {
        if (depth >= context.maxDepth()) {
            return;
        }

        EdgeFilter filter = metapathPolicy.edgeFilterFor(path, context);
        for (Edge edge : graphStore.outgoing(node, filter, context.maxNeighborsPerNode())) {
            double nextScore = pathScore * edge.weight() * context.pathDecay();

            NodeId target = edge.target();
            List<String> nextPath = append(path, target.toString());

            if (context.isCandidateType(target.type())) {
                candidates.computeIfAbsent(target.id(), CandidateAccumulator::new)
                    .add(nextScore, nextPath);
            }

            traverse(target, nextScore, nextPath, depth + 1, candidates, context);
        }
    }
}

Production version needs cycle control, privacy filtering, quotas, and latency budgets.

37. Minimal Production Graph Plan

Start with three graph candidate sources:

37.1 User-Item-Item Graph

user recent positive items -> item-to-item edges -> candidates

Edges:

co_view,
co_buy,
content_similar.

37.2 User-Item-Topic-Item Graph

user positive items -> topics/categories -> items

Good for cold-start and diversity.

37.3 Enterprise Case-Topic-Article/Action Graph

case -> risk/topic/state -> valid articles/actions

For enterprise domain.

Implement:

edge table,
graph version,
metapath policy,
offline candidate generation,
online eligibility filter,
path provenance,
graph metrics.

Do not start with full GNN unless graph/data/ops foundation is mature.

38. Checklist Graph Recommendation Readiness

[ ] Node types are defined.
[ ] Edge types are defined.
[ ] Edge semantics are documented.
[ ] Edge weights are versioned.
[ ] Edge timestamps and validity are stored.
[ ] Graph construction uses clean curated data.
[ ] Bot/internal/test edges are excluded or flagged.
[ ] High-degree node handling exists.
[ ] Metapath policy exists per surface.
[ ] Traversal respects tenant/permission/policy boundaries.
[ ] Graph snapshot/version is logged.
[ ] Temporal graph leakage is controlled.
[ ] Candidate provenance/path is stored.
[ ] Online eligibility/suppression filters run.
[ ] Graph candidate latency is bounded.
[ ] Coverage/diversity metrics are monitored.
[ ] Unauthorized recommendation count is monitored.
[ ] Cold-start impact is measured.
[ ] Fallback exists if graph source empty.

39. Kesimpulan

Graph-based recommendation memberi cara berpikir yang luas dan kuat: recommendation sebagai traversal dan scoring di atas hubungan antar entity.

Prinsip utama:

Recommendation graph harus typed; edge semantics penting.
Graph bisa menggabungkan collaborative, content, taxonomy, policy, dan domain knowledge.
Path-based scoring memberi explainability.
Personalized PageRank/random walk memberi multi-hop relevance.
Graph embeddings memberi retrieval scalable.
High-degree nodes dan stale edges harus dikontrol.
Temporal graph leakage harus dicegah.
Authorization-aware traversal penting untuk enterprise.
Graph is evidence, not a replacement for policy.
Start simple with metapaths and precomputed candidates before jumping to GNN.

Part ini menutup Module 3: baseline dan recommender klasik sebelum deep learning.

Di Part 024, kita masuk Module 4: Candidate Generation / Retrieval Layer, dimulai dari Candidate Generation Contract — batas formal yang membuat banyak source retrieval bisa bekerja rapi dengan ranker dan serving system.

Lesson Recap

You just completed lesson 23 in build core. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Back To Series Next Lesson

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.

Previous Lesson

Lesson 22

Learn Build From Scratch Recommendations System Part 022 Matrix Factorization From Scratch

Next Lesson

Lesson 24

Learn Build From Scratch Recommendations System Part 024 Candidate Generation Contract