title: Learn AI Code Documentation & Agent Memory Platform - Part 015
description: Hybrid search and ranking for combining exact lookup, lexical search, vector search, graph expansion, metadata filtering, trust, freshness, and task intent in code intelligence retrieval.
series: learn-ai-code-documentation-agent-memory
seriesTitle: Learn AI Code Documentation & Agent Memory Platform
order: 15
partTitle: Hybrid Search and Ranking
tags:
  - ai
  - retrieval
  - hybrid-search
  - ranking
  - vector-search
  - lexical-search
  - code-intelligence
  - agent-context
date: 2026-07-02

Part 015 — Hybrid Search and Ranking

1. Goals of This Part

Part 014 covered embeddings and vector indexing. Now we build a retrieval layer that is actually usable: hybrid search and ranking.

In a code intelligence platform, no single search strategy is enough.

Vector search is good for semantic recall but weak on exact identifiers. Lexical search is good for keywords and identifiers but weak on intent. Graph traversal is good for dependency and impact questions but is a poor starting point for discovery when the question is vague. Metadata filtering enforces scope and permissions. Trust and freshness signals keep results that merely look relevant from misleading the reader.

Targets for this part:

understand why retrieval must be hybrid,
distinguish exact lookup, lexical, vector, graph, metadata, and memory retrieval,
design query understanding and query routing,
build a ranking model based on task intent,
combine results from multiple retrievers,
apply permission filtering and source boundaries,
apply ranking penalties to stale, generated, vendor, and low-confidence evidence,
produce explainable retrieval results,
build an evaluation for hybrid retrieval,
prepare the output for the context assembly engine in Part 016.

2. Why Hybrid Search Is Required

Repository queries come in very different shapes.

Query	Best Search
`OrderValidator.validate`	exact symbol lookup
`POST /orders`	route/API index
`src/main/java/.../OrderService.java`	file path lookup
`order.validation.max-items`	config key index
"how does order validation work?"	vector + lexical + graph
"what calls this method?"	graph query
"which tests cover invalid orders?"	test graph + lexical
"why did we centralize validation?"	ADR/doc search + vector
"what changed after commit X?"	metadata + graph diff
"agent needs context to edit pricing rule"	exact target + graph + tests + memory

If every query is sent to the vector DB, the results look smart but are often wrong. If every query is sent to lexical search, the system does not understand intent.

Hybrid retrieval combines the strengths of several strategies.

3. Retrieval Stack

Retrieval is not a single call. It is a pipeline: query understanding, routing, retrieval, candidate merge, permission and scope filtering, ranking, and a structured output.

4. Retrieval Inputs

4.1 Query Request

retrievalRequest:
  tenantId: acme
  principal:
    userId: user_123
    teams:
      - team-order-platform
  scope:
    repositoryId: order-service
    branch: main
    commitSha: 6f41ab2
  query:
    text: "how does order validation work?"
    taskType: module_explanation
  constraints:
    maxCandidates: 50
    includeMemory: true
    includeDocs: true
    includeTests: true

4.2 Required Context

Retrieval needs:

tenant/principal,
repository scope,
snapshot/commit,
task type,
query text,
target symbol/path if available,
source boundary policy,
permission policy,
freshness policy,
ranking profile.

Without these, retrieval cannot be safe or relevant.

5. Query Understanding

Query understanding decides how to route the query.

5.1 Query Features

Extract:

exact symbol-like tokens,
file paths,
endpoint paths,
config keys,
event topics,
table names,
error messages,
natural language intent,
task type,
target module,
repository names,
branch/commit hints.

5.2 Query Classification

queryUnderstanding:
  raw: "what tests cover OrderValidator.validate?"
  detected:
    intent: find_tests
    symbolCandidates:
      - OrderValidator.validate
    wantsTests: true
    queryMode:
      - exact_symbol
      - graph_tests
      - lexical

5.3 Intent Types

Intent	Example
`find_symbol`	"where is OrderValidator?"
`explain_module`	"how does order validation work?"
`find_callers`	"what calls validate?"
`find_tests`	"tests for corporate order validation"
`generate_docs`	"generate module docs"
`code_change_context`	"context to modify validation rule"
`api_explanation`	"what handles POST /orders?"
`data_flow`	"where is orders table written?"
`architecture_decision`	"why use RuleRegistry?"
`troubleshooting`	"why validation failures spike?"

5.4 Exact Pattern Detection

Examples:

com.acme.order.OrderValidator.validate
src/main/java/com/acme/order/OrderValidator.java
POST /orders
order.created
order.validation.max-items
orders.status

Exact patterns should route to exact indexes before semantic search.
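
The routing rule above can be sketched with a few regular expressions. This is a minimal illustration under stated assumptions, not the platform's detector: the class name `ExactPatternDetector`, the four patterns, and the `dotted_key` label are hypothetical. Config keys, event topics, and `table.column` names share the same dotted shape, so the sketch reports them as one family and leaves the final decision to the exact indexes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ExactPatternDetector {
    // Illustrative patterns only; real codebases need their own rules.
    private static final Pattern ROUTE =
        Pattern.compile("^(GET|POST|PUT|PATCH|DELETE)\\s+/\\S*$");
    private static final Pattern FILE_PATH =
        Pattern.compile("^[\\w.-]+(/[\\w.-]+)+\\.\\w+$");
    private static final Pattern SYMBOL =
        Pattern.compile("^([a-z_]\\w*\\.)*[A-Z]\\w*(\\.[a-z_]\\w*)?$");
    // Config keys, event topics, and table.column names all share this shape.
    private static final Pattern DOTTED_KEY =
        Pattern.compile("^[a-z][a-z0-9-]*(\\.[a-z][a-z0-9-]*)+$");

    /** Returns the exact-index families worth trying for this query. */
    static List<String> detect(String query) {
        String q = query.trim();
        List<String> kinds = new ArrayList<>();
        if (ROUTE.matcher(q).matches()) kinds.add("route");
        if (FILE_PATH.matcher(q).matches()) kinds.add("file_path");
        if (SYMBOL.matcher(q).matches()) kinds.add("symbol");
        if (DOTTED_KEY.matcher(q).matches()) kinds.add("dotted_key");
        return kinds;
    }
}
```

An empty result means no exact pattern was found, so the query falls through to lexical, vector, and graph retrieval.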

6. Retriever Types

6.1 Exact Lookup

Exact lookup is deterministic.

Use for:

symbol,
file path,
route,
config key,
event topic,
table,
document path,
memory ID.

Example:

exactLookup:
  type: symbol
  value: OrderValidator.validate

6.2 Lexical Search

Lexical search is strong for identifiers and literal strings.

Use for:

class/function names,
error messages,
config keys,
table names,
comments,
specific terms.

6.3 Vector Search

Vector search is strong for semantic recall.

Use for:

conceptual queries,
vague questions,
onboarding,
behavior discovery,
doc discovery,
memory discovery.

6.4 Graph Retrieval

Graph retrieval is strong for relation queries.

Use for:

callers,
callees,
tests,
route flow,
dependency impact,
docs linked to symbol,
memory grounded in symbol.

6.5 Document Retrieval

Document retrieval uses document metadata, type, freshness, and review state.

Use for:

ADR,
README,
runbook,
module docs,
generated docs,
stale docs report.

6.6 Memory Retrieval

Memory retrieval is scoped and governed.

Use for:

conventions,
decisions,
known pitfalls,
test strategy,
previous evaluation lessons.

Memory should be returned as derived guidance, not primary evidence.

7. Retrieval Routing

7.1 Routing Matrix

Intent	Exact	Lexical	Vector	Graph	Docs	Memory
find symbol	high	high	low	medium	low	low
explain module	medium	medium	high	high	high	medium
find callers	high	low	low	high	low	low
find tests	medium	high	medium	high	low	medium
generate docs	medium	medium	high	high	high	medium
code change context	high	medium	medium	high	medium	high
API explanation	high	medium	medium	high	high	medium
architecture decision	low	medium	high	medium	high	high
troubleshooting	medium	high	high	high	high	medium

7.2 Example Routing

Query:

what handles POST /orders?

Routing:

routes:
  - api_route_index
  - graph_handler_query
  - lexical_search
  - vector_search_low_priority

Query:

why are validation rules centralized?

Routing:

routes:
  - document_search_adr
  - vector_search_docs
  - memory_search_decisions
  - graph_search_related_symbols
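
One way to turn the routing matrix into code is a lookup table from intent to per-retriever priority. The sketch below encodes four rows of the matrix as integers (3 = high, 2 = medium, 1 = low). The class name `RetrievalRouter`, the numeric encoding, and the fallback for unknown intents are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RetrievalRouter {
    // Column order follows the routing matrix.
    private static final String[] RETRIEVERS =
        {"exact", "lexical", "vector", "graph", "docs", "memory"};
    // 3 = high, 2 = medium, 1 = low (assumed numeric encoding).
    private static final Map<String, int[]> MATRIX = new LinkedHashMap<>();
    static {
        MATRIX.put("find_symbol",           new int[] {3, 3, 1, 2, 1, 1});
        MATRIX.put("explain_module",        new int[] {2, 2, 3, 3, 3, 2});
        MATRIX.put("find_callers",          new int[] {3, 1, 1, 3, 1, 1});
        MATRIX.put("architecture_decision", new int[] {1, 2, 3, 2, 3, 3});
    }

    /** Returns retrievers at or above minPriority, highest priority first. */
    static List<String> route(String intent, int minPriority) {
        int[] row = MATRIX.get(intent);
        List<String> selected = new ArrayList<>();
        if (row == null) {
            // Assumed fallback for unknown intents: the two broad retrievers.
            selected.add("lexical");
            selected.add("vector");
            return selected;
        }
        List<Integer> columns = new ArrayList<>();
        for (int i = 0; i < row.length; i++) {
            if (row[i] >= minPriority) columns.add(i);
        }
        // List.sort is stable, so equal priorities keep matrix column order.
        columns.sort(Comparator.comparingInt((Integer i) -> -row[i]));
        for (int i : columns) selected.add(RETRIEVERS[i]);
        return selected;
    }
}
```

For example, `route("find_callers", 2)` selects only the exact and graph retrievers, which matches the intuition that caller queries need a seed symbol and a traversal rather than semantic search.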

8. Candidate Model

All retrievers should output a common candidate model.

candidate:
  candidateId: cand_01J...
  sourceRetriever: vector_search
  artifactType: chunk
  artifactId: chunk_01J...
  repositoryId: order-service
  snapshotId: snap_6f41ab2
  path: src/main/java/com/acme/order/validation/OrderValidator.java
  title: OrderValidator.validate
  score:
    raw: 0.83
    normalized: 0.76
  evidenceRole: primary_evidence
  metadata:
    chunkType: method_chunk
    language: java
    staleRisk: low
    confidence: 0.94

8.1 Candidate Types

Candidate Type	Example
chunk	method/doc/config chunk
symbol	`OrderValidator.validate`
graph_node	API operation
graph_edge	`CALLS` edge
document	`docs/order-validation.md`
memory	`mem_rule_registry`
graph_path	request flow
file	source file

Candidates may be transformed into context items later.

9. Candidate Merge

Multiple retrievers may return the same thing.

9.1 Duplicate Sources

Example:

exact lookup returns OrderValidator.validate,
vector search returns method chunk,
graph retrieval returns same symbol as target.

Merge by logical identity.

mergedCandidate:
  artifactId: chunk_order_validator_validate
  matchedBy:
    - exact_symbol
    - vector_search
    - graph_target
  scores:
    exact: 1.0
    vector: 0.82
    graph: 0.95

9.2 Merge Keys

Use:

logicalChunkId,
logicalSymbolId,
documentId + sectionId,
graph node logical ID,
memoryId,
file path + commit.

9.3 Merge Benefit

Merged candidates get stronger ranking because multiple retrieval modes agree.
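
Merging by logical identity can be sketched as a grouping step keyed on the merge key. The types below (`Hit`, `Merged`) and the `sym:` style of ID used in the usage note are hypothetical stand-ins for the candidate model, and the sketch assumes every retriever has already resolved its hit to a logical ID.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CandidateMergeSketch {
    /** One retriever hit; logicalId is the merge key, assumed to be precomputed. */
    static final class Hit {
        final String logicalId;
        final String retriever;
        final double score;
        Hit(String logicalId, String retriever, double score) {
            this.logicalId = logicalId;
            this.retriever = retriever;
            this.score = score;
        }
    }

    /** All hits that refer to the same logical artifact. */
    static final class Merged {
        final String logicalId;
        final Map<String, Double> scores = new LinkedHashMap<>();
        Merged(String logicalId) { this.logicalId = logicalId; }
        List<String> matchedBy() { return new ArrayList<>(scores.keySet()); }
    }

    static List<Merged> merge(List<Hit> hits) {
        Map<String, Merged> byId = new LinkedHashMap<>();
        for (Hit h : hits) {
            Merged m = byId.computeIfAbsent(h.logicalId, Merged::new);
            // If one retriever reports the artifact twice, keep its best score.
            m.scores.merge(h.retriever, h.score, Double::max);
        }
        return new ArrayList<>(byId.values());
    }
}
```

Feeding in hits for `sym:OrderValidator.validate` from the exact, vector, and graph retrievers yields one merged candidate whose `matchedBy()` lists all three, which the ranker can then reward as agreement between retrieval modes.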

10. Permission and Scope Filtering

Security filter is not optional.

10.1 Filter Stages

Apply filtering:

before retrieval if possible,
after retrieval before ranking,
before context assembly,
before output.

10.2 Required Checks

tenant match,
repository access,
snapshot access,
document visibility,
memory scope,
sensitivity,
blocked content,
derived visibility.

10.3 Never Return Unauthorized Metadata

Even a path or title can leak information.

Bad:

{
  "path": "fraud-service/src/main/java/HighRiskPaymentDetector.java"
}

This response is a leak if the user cannot access fraud-service.

10.4 Filter Result

Store exclusion counts, not secret details.

filterReport:
  candidatesBefore: 80
  excluded:
    permissionDenied: 12
    blockedSensitive: 1
    staleHighRisk: 3
  candidatesAfter: 64
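
A filter that produces this kind of report might look like the sketch below. It checks only tenant, repository access, and a sensitivity flag. The class `PermissionFilterSketch`, its fields, and the reason names reused from the report are illustrative assumptions, and a real filter would cover the full list of required checks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class PermissionFilterSketch {
    static final class Candidate {
        final String id;
        final String tenantId;
        final String repositoryId;
        final boolean sensitive;
        Candidate(String id, String tenantId, String repositoryId, boolean sensitive) {
            this.id = id;
            this.tenantId = tenantId;
            this.repositoryId = repositoryId;
            this.sensitive = sensitive;
        }
    }

    static final class Report {
        final List<Candidate> allowed = new ArrayList<>();
        final Map<String, Integer> excluded = new TreeMap<>();
    }

    static Report filter(List<Candidate> candidates, String tenantId, Set<String> readableRepos) {
        Report report = new Report();
        for (Candidate c : candidates) {
            String reason = null;
            if (!tenantId.equals(c.tenantId) || !readableRepos.contains(c.repositoryId)) {
                reason = "permissionDenied";
            } else if (c.sensitive) {
                reason = "blockedSensitive";
            }
            if (reason == null) {
                report.allowed.add(c);
            } else {
                // Record only the reason and a count, never the path or title.
                report.excluded.merge(reason, 1, Integer::sum);
            }
        }
        return report;
    }
}
```

The report keeps counts per reason, so a debug trace can show that candidates were dropped without revealing what they were.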

11. Ranking Features

Ranking combines many signals.

11.1 Feature Categories

Category	Features
Query match	exact, lexical, vector similarity
Structural	graph proximity, parent/child relation
Source quality	source kind, generated/vendor, confidence
Freshness	stale risk, commit match, last changed
Trust	review state, evidence coverage, conflict
Task intent	tests for code change, ADR for decision
Scope	same module/repo, branch/commit
Security	allowed, sensitivity
Cost	token size, chunk length
Diversity	avoid duplicate chunks

11.2 Example Feature Record

features:
  exactSymbolMatch: 1.0
  lexicalScore: 0.62
  vectorSimilarity: 0.79
  graphProximity: 0.90
  sourceKindBoost: 0.20
  freshnessPenalty: 0.00
  generatedPenalty: 0.00
  memoryDerivedPenalty: 0.00
  tokenCostPenalty: 0.04

12. Ranking Formula

Start simple and explainable.

finalScore =
    exactMatchScore * 0.25
  + lexicalScore * 0.20
  + vectorScore * 0.20
  + graphScore * 0.15
  + taskIntentScore * 0.10
  + trustScore * 0.05
  + freshnessScore * 0.05
  - penalties

12.1 Penalties

penalties =
    stalePenalty
  + generatedPenalty
  + vendorPenalty
  + lowConfidencePenalty
  + duplicatePenalty
  + tokenCostPenalty
  + wrongScopePenalty

12.2 Task-Specific Weight

For exact symbol lookup:

exactMatchWeight high
vectorWeight low

For conceptual explanation:

vectorWeight high
docs/source/trust high

For code change context:

target/exact/graph/tests high
docs medium
memory medium
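
The formula and the idea of task-specific weights can be combined in a small scoring function that takes a weight profile as input. In this sketch `DEFAULT_WEIGHTS` copies the weights from the formula above, while `SYMBOL_LOOKUP_WEIGHTS` and the clamp at zero are assumptions added for illustration.

```java
final class RankingFormulaSketch {
    // Signal order: exact, lexical, vector, graph, taskIntent, trust, freshness.
    static final double[] DEFAULT_WEIGHTS = {0.25, 0.20, 0.20, 0.15, 0.10, 0.05, 0.05};
    // Hypothetical profile for exact symbol lookup: exact high, vector low.
    static final double[] SYMBOL_LOOKUP_WEIGHTS = {0.45, 0.20, 0.05, 0.15, 0.05, 0.05, 0.05};

    /** Weighted sum of signals minus penalties, clamped at zero (the clamp is an assumption). */
    static double finalScore(double[] weights, double[] signals, double penalties) {
        if (weights.length != signals.length) {
            throw new IllegalArgumentException("weights and signals must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * signals[i];
        }
        return Math.max(0.0, sum - penalties);
    }
}
```

With signals `{1.0, 0.62, 0.79, 0.90, 1.0, 1.0, 1.0}` and a total penalty of 0.04, the default profile gives about 0.827 and the symbol-lookup profile about 0.8585. The first four signals and the penalty come from the example feature record; the task intent, trust, and freshness values of 1.0 are assumed. Keeping the weights in a plain array per profile makes each profile easy to version and compare during evaluation.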

13. Ranking Profiles

13.1 Module Explanation Profile

Prefer:

module source chunks,
class overview chunks,
docs,
graph path chunks,
ADR,
tests as supporting evidence.

Penalize:

unrelated private helpers,
generated code,
stale docs.

13.2 Code Change Profile

Prefer:

target symbol,
parent class,
direct callers/callees,
related tests,
config/schema,
known pitfalls memory.

Penalize:

broad docs,
large files,
unrelated modules.

13.3 API Documentation Profile

Prefer:

API operation chunks,
OpenAPI contract,
route handler,
request/response schema,
tests,
service flow graph.

13.4 Architecture Decision Profile

Prefer:

ADR,
architecture docs,
graph dependencies,
decision memory,
reviewed docs.

13.5 Troubleshooting Profile

Prefer:

runbooks,
config,
operational docs,
error messages,
code path,
recent incidents if integrated.

14. Graph Proximity Scoring

Graph helps rank related artifacts.

14.1 Distance

Distance	Example	Score
0	target symbol	1.0
1	direct caller/callee/test/doc	0.8
2	caller of caller / dependency	0.5
3+	distant	low

14.2 Edge Type Weight

Edge Type	Code Change	Docs
TESTS	high	medium
CALLS	high	high
HANDLED_BY	medium	high
READS_CONFIG	medium	medium
DOCUMENTED_BY	medium	high
GROUNDED_IN	medium	medium
IMPORTS	low/medium	low
MENTIONS	low/medium	medium

14.3 Confidence

Graph edge confidence should multiply proximity.

graphScore = proximityScore * edgeConfidence * edgeTypeWeight
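
As a worked illustration of this formula, the sketch below uses the proximity values from the distance table. The numeric value for "low" proximity (0.2) and the mapping of high/medium/low edge weights to 1.0/0.6/0.3 are assumptions, since the tables give only qualitative levels.

```java
final class GraphScoreSketch {
    /** Proximity by hop distance, following the distance table. */
    static double proximity(int distance) {
        if (distance < 0) {
            throw new IllegalArgumentException("distance must be >= 0");
        }
        switch (distance) {
            case 0: return 1.0;
            case 1: return 0.8;
            case 2: return 0.5;
            default: return 0.2; // the table only says "low"; 0.2 is an assumed value
        }
    }

    /** Maps the qualitative edge-type levels to numbers; the values are assumptions. */
    static double levelWeight(String level) {
        switch (level) {
            case "high": return 1.0;
            case "medium": return 0.6;
            case "low": return 0.3;
            default: throw new IllegalArgumentException("unknown level: " + level);
        }
    }

    static double graphScore(int distance, double edgeConfidence, double edgeTypeWeight) {
        return proximity(distance) * edgeConfidence * edgeTypeWeight;
    }
}
```

Under these assumptions, a direct caller reached through a `CALLS` edge with confidence 0.9 in the code change profile scores 0.8 × 0.9 × 1.0 = 0.72.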

15. Freshness and Trust Ranking

15.1 Freshness

Stale Risk	Score Effect
low	no penalty
medium	small penalty
high	large penalty
critical	exclude unless explicitly requested
unknown	small/medium penalty
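
The freshness table mixes two behaviors: a score penalty for most risk levels and outright exclusion for critical ones. A sketch of that policy follows. The penalty magnitudes are assumed values chosen only to respect the ordering in the table, and `FreshnessPolicySketch` is a hypothetical name.

```java
import java.util.OptionalDouble;

final class FreshnessPolicySketch {
    enum StaleRisk { LOW, MEDIUM, HIGH, CRITICAL, UNKNOWN }

    /**
     * Returns the penalty to subtract from the score,
     * or empty when the candidate should be excluded.
     */
    static OptionalDouble penalty(StaleRisk risk, boolean staleExplicitlyRequested) {
        switch (risk) {
            case LOW:
                return OptionalDouble.of(0.0);
            case MEDIUM:
                return OptionalDouble.of(0.05); // assumed "small" penalty
            case UNKNOWN:
                return OptionalDouble.of(0.08); // assumed "small/medium" penalty
            case HIGH:
                return OptionalDouble.of(0.30); // assumed "large" penalty
            case CRITICAL:
                // Excluded unless the caller explicitly asked for stale material.
                return staleExplicitlyRequested
                    ? OptionalDouble.of(0.30)
                    : OptionalDouble.empty();
            default:
                throw new IllegalArgumentException("unhandled risk: " + risk);
        }
    }
}
```

Returning an empty value for exclusion keeps "drop this candidate" distinct from "rank it very low", so the exclusion can be counted in the filter report instead of silently sinking in the ranking.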

15.2 Trust

Trust includes:

source strength,
review state,
conflict state,
generation provenance,
evidence coverage.

15.3 Example

A stale README may have high vector similarity. A current source method with lower semantic similarity may be better.

Ranking should prefer current evidence for factual code questions.

16. Source Boundary Ranking

Source kind matters.

16.1 Default Source Priority

For implementation truth:

current source code
> tests/contracts
> reviewed docs/ADR
> generated reviewed docs
> memory
> unreviewed generated docs
> stale docs

For decision rationale:

reviewed ADR
> architecture docs
> decision memory
> source implementation
> generated docs

For operational troubleshooting:

runbook
> config/infra
> code path
> tests
> README

16.2 Generated and Vendor Penalty

Generated code:

useful if original contract absent,
usually not primary evidence.

Vendor code:

usually exclude.

17. Diversity and Redundancy

Top results should not be 10 chunks from the same file if the task needs broad understanding.

17.1 Diversity Rules

Limit:

max chunks per file,
max chunks per symbol,
max docs per doc type,
max memory records,
max graph paths.

17.2 Diversity Example

For module docs, prefer:

one class overview,
main methods,
related tests,
ADR,
config,
graph path.

Not:

8 helper methods from same class.

17.3 Redundancy Handling

If class chunk and method chunks overlap, choose based on task.

explanation: class overview + key methods,
code change: target method + related tests.
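
The "max chunks per file" rule can be sketched as a single pass over the ranked list that keeps a counter per file. The class and method names are hypothetical, and the sketch assumes results arrive already sorted by score.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DiversityCapSketch {
    /**
     * rankedPaths holds the file path of each result in rank order.
     * Returns the indexes of the results to keep.
     */
    static List<Integer> keepIndexes(List<String> rankedPaths, int maxPerFile) {
        Map<String, Integer> seen = new HashMap<>();
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < rankedPaths.size(); i++) {
            // merge returns the updated count for this path.
            int count = seen.merge(rankedPaths.get(i), 1, Integer::sum);
            if (count <= maxPerFile) {
                kept.add(i);
            }
        }
        return kept;
    }
}
```

Because the pass runs in rank order, the chunks that survive for each file are always its highest-scoring ones. The same counter pattern applies to the other limits (per symbol, per doc type, per memory record) by changing the key.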

18. Explainable Ranking

Every result should explain why it was returned.

18.1 Result Explanation

result:
  title: OrderValidator.validate
  finalScore: 0.91
  reasons:
    - "Exact match to target symbol"
    - "Primary source chunk"
    - "Directly linked to requested module"
    - "Current snapshot"
  penalties: []

18.2 Stale Result Explanation

result:
  title: docs/legacy-rule-engine.md
  finalScore: 0.32
  reasons:
    - "Semantic match to validation rules"
  penalties:
    - "High stale risk"
    - "Mentions missing symbol OrderRuleEngine"

18.3 Why Explanation Matters

It helps:

debugging retrieval,
user trust,
eval,
tuning,
audit.

19. Retrieval Output Contract

Hybrid retrieval should output structured results.

retrievalResult:
  retrievalRunId: ret_01J...
  queryUnderstanding:
    intent: explain_module
    detectedSymbols:
      - OrderValidator
  scope:
    repositoryId: order-service
    commitSha: 6f41ab2
  results:
    - rank: 1
      artifactType: chunk
      chunkId: chunk_order_validator_validate
      title: OrderValidator.validate
      finalScore: 0.91
      evidenceRole: primary_evidence
      reasons:
        - exact symbol/module match
        - source primary evidence
    - rank: 2
      artifactType: chunk
      chunkId: chunk_order_validator_test
      title: OrderValidatorTest
      finalScore: 0.84
      evidenceRole: supporting_evidence
  exclusions:
    - reason: stale_high_risk
      count: 2
    - reason: permission_denied
      count: 5
  versions:
    rankerVersion: hybrid-ranker-v1
    retrieverVersion: retrieval-orchestrator-v1

20. Hybrid Retrieval Implementation

20.1 Interfaces

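import java.util.List;

// A retriever declares which requests it can serve and returns candidates in the common model.
public interface Retriever {
    boolean supports(RetrievalRequest request);

    List<RetrievalCandidate> retrieve(RetrievalRequest request);
}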

20.2 Orchestrator

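import java.util.List;

public final class HybridRetrievalOrchestrator {
    private final List<Retriever> retrievers;
    private final CandidateMerger merger;
    private final PermissionFilter permissionFilter;
    private final Reranker reranker;

    public HybridRetrievalOrchestrator(List<Retriever> retrievers, CandidateMerger merger,
            PermissionFilter permissionFilter, Reranker reranker) {
        this.retrievers = List.copyOf(retrievers);
        this.merger = merger;
        this.permissionFilter = permissionFilter;
        this.reranker = reranker;
    }

    // understand(...) is the query-understanding step from section 5; its body is omitted here.
    public RetrievalResult retrieve(RetrievalRequest request) {
        QueryUnderstanding understanding = understand(request);
        // Enrich the request once so supports() and retrieve() see the same routing hints.
        RetrievalRequest routed = request.withUnderstanding(understanding);

        List<RetrievalCandidate> candidates = retrievers.stream()
            .filter(r -> r.supports(routed))
            .flatMap(r -> r.retrieve(routed).stream())
            .toList();

        // Merge duplicates, drop unauthorized candidates, then rank what remains.
        List<RetrievalCandidate> merged = merger.merge(candidates);
        List<RetrievalCandidate> allowed = permissionFilter.filter(merged, request.principal());
        List<RankedResult> ranked = reranker.rank(allowed, request, understanding);

        return RetrievalResult.of(understanding, ranked);
    }
}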

20.3 Reranker

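import java.util.List;

// Ranks permission-filtered candidates using the request and the detected intent.
public interface Reranker {
    List<RankedResult> rank(
        List<RetrievalCandidate> candidates,
        RetrievalRequest request,
        QueryUnderstanding understanding
    );
}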

21. Lexical Index Design

21.1 Fields

Index:

title,
content,
symbol name,
qualified name,
path,
comments,
doc headings,
endpoint path,
config key,
event topic,
table name.

21.2 Analyzer

For code:

preserve identifiers,
split camelCase,
split snake_case,
split package names,
preserve exact phrase.

Example:

OrderValidator.validate
Order Validator validate
order validator validate
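
The analyzer rules can be approximated with one regular expression that splits on separators and on camelCase boundaries while keeping the original identifier as its own token. This is a simplified sketch and not a search-engine analyzer: the class name is hypothetical, and acronym runs such as `HTTPServer` are deliberately left unsplit.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class CodeTokenizerSketch {
    /** Emits the exact identifier first, then its lowercased parts, without duplicates. */
    static List<String> tokens(String identifier) {
        Set<String> out = new LinkedHashSet<>();
        out.add(identifier); // preserve the exact form for exact and phrase matches
        // Split on '.', '_', '/', '-' and on lower-to-upper camelCase boundaries.
        for (String part : identifier.split("[._/-]+|(?<=[a-z0-9])(?=[A-Z])")) {
            if (!part.isEmpty()) {
                out.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return new ArrayList<>(out);
    }
}
```

For `OrderValidator.validate` this produces the exact identifier followed by `order`, `validator`, and `validate`, so both the query `OrderValidator.validate` and the query `order validator` can match the same indexed field.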

21.3 Field Boost

Field	Boost
exact symbol name	high
qualified name	high
path	high
title/heading	medium/high
content	medium
comments	medium
generated content	low

22. Exact Index Design

Exact indexes should exist for:

symbol name,
qualified symbol,
file path,
route method/path,
config key,
event topic,
table name,
document path,
memory ID.

22.1 Exact Query Output

Exact lookup returns high-confidence candidates, but they must still be filtered by permission and snapshot.

22.2 Ambiguity

If a symbol name is ambiguous:

query: OrderService
candidates:
  - com.acme.order.OrderService
  - com.acme.billing.OrderService
status: ambiguous

Use scope/repo/path to disambiguate.
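
Disambiguation by scope can be sketched as a filter over the qualified names in the exact index. The sketch below narrows by an optional package prefix and reports a status. The `not_found` and `resolved` statuses and the class name are assumptions that extend the `ambiguous` status shown above.

```java
import java.util.ArrayList;
import java.util.List;

final class SymbolDisambiguationSketch {
    /** Returns qualified names whose simple name matches, narrowed by an optional prefix. */
    static List<String> resolve(List<String> indexed, String simpleName, String scopePrefix) {
        List<String> matches = new ArrayList<>();
        for (String qualified : indexed) {
            String simple = qualified.substring(qualified.lastIndexOf('.') + 1);
            boolean inScope = scopePrefix == null || qualified.startsWith(scopePrefix);
            if (simple.equals(simpleName) && inScope) {
                matches.add(qualified);
            }
        }
        return matches;
    }

    static String status(List<String> matches) {
        if (matches.isEmpty()) {
            return "not_found";
        }
        return matches.size() == 1 ? "resolved" : "ambiguous";
    }
}
```

When the status stays `ambiguous` after scoping, returning all candidates with that status lets the caller ask the user or the agent to pick one, which is safer than guessing.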

23. Memory Ranking

Memory should not dominate source evidence.

23.1 Memory Ranking Rules

Boost:

exact scope match,
active state,
approved review,
high confidence,
graph proximity,
task type match.

Penalize/exclude:

stale,
conflicted,
candidate-only,
broad scope mismatch,
low evidence strength.

23.2 Memory Output

Memory should be labeled:

result:
  artifactType: memory
  sourceRole: derived_guidance
  statement: "Validation rules are registered through RuleRegistry."
  evidence:
    - RuleRegistry.java

Do not present memory as source code fact without evidence.

24. Ranking Evaluation

24.1 Golden Queries

Create a query set:

- query: "where are validation rules registered?"
  intent: find_symbol
  expected:
    - RuleRegistry.java
    - OrderValidator.java

- query: "why centralize validation rules?"
  intent: architecture_decision
  expected:
    - docs/adr/012-validation-rules.md

- query: "what tests cover OrderValidator.validate?"
  intent: find_tests
  expected:
    - OrderValidatorTest.shouldRejectInvalidOrder

24.2 Metrics

Metric	Meaning
recall@k	expected relevant items found
precision@k	top results relevant
MRR	first relevant rank
nDCG	ranked relevance
stale@k	stale results in top k
sourceCoverage	includes source/test/docs as required
permissionViolations	must be zero
contextSuccess	downstream task success

24.3 Evaluate by Intent

Do not average everything blindly.

A ranker good at conceptual docs may be bad at exact symbol navigation.
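
Two of the metrics, recall@k and the reciprocal rank behind MRR, are simple enough to sketch directly. The class name is hypothetical, and results and expectations are compared as plain strings such as file names, which is a simplification of matching on logical IDs.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RetrievalMetricsSketch {
    /** Fraction of expected items that appear in the top k results. */
    static double recallAtK(List<String> ranked, Set<String> expected, int k) {
        if (expected.isEmpty()) {
            return 0.0;
        }
        int limit = Math.max(0, Math.min(k, ranked.size()));
        Set<String> topK = new HashSet<>(ranked.subList(0, limit));
        int found = 0;
        for (String item : expected) {
            if (topK.contains(item)) {
                found++;
            }
        }
        return (double) found / expected.size();
    }

    /** 1 / rank of the first relevant result, or 0 when none is found. */
    static double reciprocalRank(List<String> ranked, Set<String> expected) {
        for (int i = 0; i < ranked.size(); i++) {
            if (expected.contains(ranked.get(i))) {
                return 1.0 / (i + 1);
            }
        }
        return 0.0;
    }
}
```

Averaging `reciprocalRank` over the golden queries of one intent gives that intent's MRR. Computing it per intent, rather than once over the whole set, keeps a weak exact-symbol ranker from being hidden by strong conceptual results.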

25. Retrieval Debugging

25.1 Debug Trace

For each query, store:

query understanding,
retrievers called,
raw candidates,
merge decisions,
filters,
ranking features,
final results,
exclusions.

25.2 Example

debug:
  retrievers:
    exact_symbol:
      candidates: 1
    vector:
      candidates: 40
    lexical:
      candidates: 22
    graph:
      candidates: 8
  merge:
    before: 71
    after: 48
  filter:
    permissionDenied: 4
    staleExcluded: 2
  ranker:
    version: hybrid-ranker-v1

25.3 Why Debugging Matters

Bad retrieval causes bad docs and bad agent behavior. Without a trace, prompt tuning becomes guesswork.

26. Common Mistakes

26.1 Vector Search Only

Fails exact symbol, endpoint, and path queries.

26.2 Lexical Search Only

Fails semantic discovery and vague queries.

26.3 Graph Only

Graph needs seed nodes; it is weak for broad discovery.

26.4 No Permission Filter

Security incident waiting to happen.

26.5 Ranking Without Task Intent

Different tasks need different evidence.

26.6 No Freshness Penalty

Stale docs can rank highly because they are semantically similar.

26.7 Memory Treated as Source Truth

Memory should be derived guidance.

26.8 No Explainability

Ranker becomes impossible to tune and audit.

27. Practical Exercise

Build hybrid retrieval for one repository.

27.1 Input

Use indexed artifacts:

symbols
chunks
documents
graph edges
memory records
vector records
lexical index

27.2 Queries

Test:

OrderValidator.validate
what handles POST /orders?
where are validation rules registered?
why are validation rules centralized?
what tests cover invalid orders?
what config controls max order items?

27.3 Output

Produce:

retrieval-results.json
retrieval-debug-trace.json
retrieval-eval-report.yaml

27.4 Acceptance Criteria

exact symbol query returns symbol first,
endpoint query returns route handler and contract,
conceptual query includes source + docs + tests,
stale docs penalized,
permission filter applied,
memory labeled as derived,
ranking reasons included,
eval metrics reported per intent.

28. Summary

Hybrid search and ranking is where retrieval becomes useful for real engineering tasks.

Key points:

no single retrieval method is enough,
query understanding should route to exact, lexical, vector, graph, docs, and memory retrieval,
all retrievers should output a common candidate model,
candidates must be merged by logical identity,
permission and scope filtering are mandatory,
ranking should account for exact match, lexical, vector, graph, task intent, trust, and freshness,
stale/generated/vendor/low-confidence artifacts need penalties,
memory must be ranked as derived guidance,
result explanations are essential for debugging and trust,
retrieval evaluation must be task-specific.

The next part covers the Context Assembly Engine: how to turn retrieval results into a context pack that is compact, safe, cited, ordered, token-aware, and useful for documentation generation and AI agents.
