
Learn Ai Code Documentation Agent Memory Part 002 Product Vision And Problem Boundary


title: Learn AI Code Documentation & Agent Memory Platform - Part 002
description: Product vision and problem boundary for an AI code documentation, repository intelligence, and agent context/memory platform.
series: learn-ai-code-documentation-agent-memory
seriesTitle: Learn AI Code Documentation & Agent Memory Platform
order: 2
partTitle: Product Vision and Problem Boundary
tags:
- ai
- product-architecture
- code-intelligence
- documentation
- agent-memory
- repository-analysis
- software-architecture
date: 2026-07-02

Part 002 — Product Vision and Problem Boundary

1. Purpose of This Part

Part 001 mapped the skills.

This part locks down the product boundary.

This matters because a project like this sprawls easily. Within a week, the scope can drift from "generate docs from a repo" to:

chatbot,
code search,
service catalog,
AI coding agent,
docs portal,
observability platform,
knowledge graph,
enterprise governance system,
compliance archive,
CI automation,
IDE extension.

All of these may be relevant. But if everything is built at once, the product will fail.

This part answers:

What product are we actually building?
Who are its users?
What problem does it solve?
What does it deliberately not solve?
What is a sensible MVP?
How do we distinguish human documentation, AI context, and agent memory?
Which product invariants must never be violated?

2. Product Definition

The product we are building is:

A repository intelligence platform that turns one or many repositories into human documentation and structured context/memory for AI agents, with provenance, permissions, freshness, and quality evaluation.

The definition is deliberately long because every word matters.

Phrase	Meaning
Platform	Not a one-off script. It has a lifecycle, API, storage, jobs, auth, and evals.
Repository intelligence	More than reading files. The system understands structure, symbols, dependencies, and docs.
One or many repositories	Must be able to start single-repo, then evolve to multi-repo.
Human documentation	Output that engineers can read.
Context/memory for AI agents	Output that is compact, task-aware, and machine-consumable.
Provenance	Every important claim has evidence.
Permission	Access follows the source repository.
Freshness	Knowledge can go stale, and staleness must be detected.
Quality evaluation	The system must be testable, not merely look smart.

3. Problem Statement

The right problem statement:

Software teams cannot reliably keep human documentation and AI agent context synchronized with fast-changing codebases across repositories, because code knowledge is fragmented, unstated, stale, and difficult to retrieve with provenance.

Let's break it down.

3.1 Code Knowledge Fragmented

Knowledge is scattered across:

source code,
tests,
config,
README,
ADR,
API specs,
DB migrations,
CI files,
deployment manifests,
tickets,
incident notes,
chat,
tribal knowledge.

The repository matters, but it is not the only knowledge source. For this series, however, the repository is the starting point because it is closest to the technical source of truth.

3.2 Code Knowledge Often Unstated

Many important things are never written down explicitly.

Examples:

which module is the entry point,
the validation flow,
dependencies between services,
error-handling conventions,
implicit ownership,
entity lifecycle,
retry behavior,
idempotency assumptions,
transactional boundaries.

The system must be able to infer some of this, but it must still attach a confidence level and evidence to every inference.

3.3 Documentation Becomes Stale

Docs go stale because:

code changes,
docs are not updated along with it,
ownership changes,
APIs change,
dependencies change,
runtime behavior changes,
docs are duplicated and contradict each other.

For that reason, generated docs must carry freshness metadata.

3.4 AI Agents Need Better Context

When an AI agent performs poorly, the cause is often not a weak model but poor context.

Poor context is usually:

too large,
too small,
irrelevant,
stale,
missing its sources,
mixed with the wrong repo,
in violation of permissions,
mismatched to the task.

This product must be a context provider, not just a document generator.

4. User Persona

4.1 Backend Engineer

Needs:

understand a module quickly,
find request/event flows,
know the impact of a change,
update docs without reading the whole repo,
give context to an AI coding assistant.

Example question:

Explain how order validation works and which classes I should inspect before changing validation rules.

4.2 Tech Lead / Staff Engineer

Needs:

dependency map,
ownership map,
cross-repo impact,
architecture consistency,
stale docs report,
onboarding path.

Example question:

Which services depend on the legacy pricing contract, and which docs need to be updated if we remove it?

4.3 Platform Engineer

Needs:

scalable indexing,
permission model,
integration with Git provider,
API/tool layer,
observability,
cost control.

Example question:

How do we index 800 repositories incrementally without leaking private repository data across teams?

4.4 AI Agent

Needs:

compact task context,
safe tools,
exact symbol references,
write boundary,
memory,
evidence.

Example request:

{
  "task": "modify order validation rule",
  "repo": "order-service",
  "branch": "feature/new-rule",
  "neededContext": [
    "relevant symbols",
    "tests",
    "architecture constraints",
    "recent memory"
  ]
}

4.5 Engineering Manager

Needs:

documentation coverage,
onboarding quality,
risk visibility,
ownership gap,
process metrics.

Example question:

Which critical services have stale runbooks or no architecture docs?

5. Job To Be Done

We can write the jobs to be done (JTBD) as follows:

5.1 Human Documentation JTBD

When engineers need to understand an unfamiliar area of code, they want documentation that is accurate, concise, and verifiable, so they can make changes without reading the entire repository.

5.2 Agent Context JTBD

When an AI agent is asked to perform an engineering task, it needs context that is relevant, compact, and permission-safe, so it can act more accurately and avoid repeated exploration.

5.3 Maintenance JTBD

When code changes, the team wants to know which knowledge has gone stale, so that documentation and memory stay in sync with the source.

5.4 Governance JTBD

When the system produces knowledge derived from private repositories, the organization needs to ensure that access, audit, and provenance still comply with its rules.

6. Three Distinct Product Outputs

Do not mix these three outputs.

6.1 Human Documentation

Characteristics:

readable,
narrative,
durable,
reviewable,
suited to onboarding,
organized with headings and structure,
publishable to a docs portal or repo.

Example:

# Order Validation Module

The order validation module verifies order eligibility before quote conversion...

6.2 Agent Context Pack

Characteristics:

task-specific,
compact,
evidence-dense,
not required to read well,
carries source paths,
arranged to fit a token budget.

Example:

task: update-validation-rule
context:
  - symbol: OrderValidator.validate
    path: src/main/java/.../OrderValidator.java
    reason: primary validation entry point
  - symbol: RuleRegistry
    path: src/main/java/.../RuleRegistry.java
    reason: rule registration source
constraints:
  - update tests in OrderValidatorTest
  - preserve idempotency behavior
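One way to make "arranged to fit a token budget" concrete is a greedy selection over ranked candidates. The sketch below is illustrative only: `ContextBudgetSketch`, `ContextItem`, and `pack` are hypothetical names, and it assumes that a token estimate and a relevance score have already been computed for each candidate by an earlier step.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: greedy selection of context items under a token budget.
final class ContextBudgetSketch {

    // A candidate item; tokenEstimate and relevance are assumed to be computed upstream.
    record ContextItem(String symbol, int tokenEstimate, double relevance) {}

    // Takes the most relevant items first and skips any item that would overflow the budget.
    static List<ContextItem> pack(List<ContextItem> candidates, int maxTokens) {
        Comparator<ContextItem> byRelevance = Comparator.comparingDouble(ContextItem::relevance);
        List<ContextItem> sorted = new ArrayList<>(candidates);
        sorted.sort(byRelevance.reversed());

        List<ContextItem> selected = new ArrayList<>();
        int used = 0;
        for (ContextItem item : sorted) {
            if (used + item.tokenEstimate() <= maxTokens) {
                selected.add(item);
                used += item.tokenEstimate();
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Scores and token counts are made-up illustration values.
        List<ContextItem> candidates = List.of(
            new ContextItem("OrderValidator.validate", 3000, 0.95),
            new ContextItem("RuleRegistry", 2500, 0.80),
            new ContextItem("OrderValidatorTest", 4000, 0.40));
        for (ContextItem item : pack(candidates, 6000)) {
            System.out.println(item.symbol() + " " + item.tokenEstimate());
        }
    }
}
```

Greedy packing is a simple baseline, not necessarily optimal; a real assembler would probably also reserve budget for constraints and memory.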

6.3 Agent Memory

Characteristics:

reusable,
scoped,
versioned,
expires,
conflict-aware,
permission-aware.

Example:

memory:
  type: repo_convention
  scope: order-service
  statement: "Validation rules are registered through RuleRegistry, not instantiated directly in controllers."
  evidence:
    - path: src/main/java/.../RuleRegistry.java
      commit: 6f41ab2
  expiresWhen:
    - symbolChanged: RuleRegistry

6.4 Comparison

Dimension	Human Docs	Agent Context	Agent Memory
Audience	Engineer	Agent run	Future agent runs
Format	Markdown/MDX	YAML/JSON/structured text	Structured records
Lifecycle	Days/months	Minutes/hours	Days/months
Style	Narrative	Dense	Atomic
Evidence	Required	Required	Required
Token optimized	Not a priority	Yes	Yes
Review	Human review	Runtime validation	Governance/revision

7. MVP Scope

The MVP must be small but architecturally sound.

7.1 MVP Goal

Given one repository at one commit, generate evidence-based module documentation and agent context pack for a selected module.

That is enough.

Do not start with a multi-repo enterprise platform.

7.2 MVP Input

Minimal input:

repository:
  name: order-service
  url: git@github.com:acme/order-service.git
  branch: main
  commit: 6f41ab2
target:
  type: module
  path: src/main/java/com/acme/order/validation
output:
  - human_doc
  - agent_context_pack

7.3 MVP Output

Minimal output:

generated module documentation,
context pack,
evidence list,
unsupported claim report,
run trace.

7.4 MVP Non-Goals

The MVP does not do:

automatic PR merge,
full multi-repo indexing,
runtime tracing,
IDE plugin,
complex graph database,
admin dashboard,
fine-tuning model,
autonomous code modification.

7.5 MVP Architecture

8. Product Boundary

8.1 In Scope

Capability	MVP	Later
Single repo scan	Yes	Yes
File classification	Yes	Yes
Language detection	Basic	Advanced
Symbol extraction	One/two languages	Multi-language
Module documentation	Yes	Yes
Agent context pack	Yes	Yes
Evidence metadata	Yes	Yes
Unsupported claim check	Basic	Advanced
Multi-repo graph	No	Yes
MCP server	No	Yes
Memory store	Basic candidate	Yes
RBAC	Basic local	Enterprise
UI portal	No	Optional
IDE extension	No	Optional

8.2 Initially Out of Scope

Out of Scope	Why
Autonomous code writing	Too large; it needs a sandbox, tests, approval, and a patch strategy.
Fine-tuning an LLM	Retrieval and context matter more for the MVP.
Full compiler-level static analysis	Expensive and language-specific. Start with a useful approximation.
Full enterprise service catalog	Later, once the repository model is stable.
Runtime observability ingestion	Important, but not part of the initial foundation.
Auto-deleting or modifying memory without a policy	Risky. Governance must be in place first.

9. Single-Repo vs Multi-Repo

9.1 Single-Repo

Single-repo suits the MVP.

Advantages:

easier to understand,
simple permissions,
cheap indexing,
smaller graph,
easier quality gates.

Disadvantages:

no view of cross-service impact,
external dependencies are only partially visible,
cross-platform ownership is not visible.

9.2 Multi-Repo

Multi-repo is required for a real platform.

Advantages:

cross-service impact analysis,
service dependency docs,
platform-level onboarding,
duplicate logic detection,
cross-team ownership map.

Disadvantages:

complex permissions,
expensive indexing,
harder graph identity,
difficult version alignment,
more frequent duplicate or conflicting knowledge,
larger security blast radius.

9.3 Evolution Strategy

Do not design an MVP that is permanently single-repo-only.

Include these fields in the design from the start:

repositoryId: string
snapshotId: string
commitSha: string
tenantId: string
visibility: string
sourceSystem: git

That way, the system can start single-repo without having to be torn apart when it moves to multi-repo.

10. Human Docs vs AI Context: Design Tension

A common mistake is to use the same document for both humans and AI agents.

That is not always appropriate.

10.1 Human Docs Need Narrative

Human docs need:

background,
conceptual explanation,
diagrams,
examples,
trade-offs,
links,
onboarding flow.

10.2 Agent Context Needs Precision

Agent context needs:

exact files,
relevant symbols,
constraints,
tests,
known pitfalls,
allowed tools,
compact evidence.

10.3 Solution

Use a single evidence base, but generate two projections from it: human documentation and an agent context pack.

The evidence base is the same. The output formats differ.
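As a rough illustration of the two-projection idea, the following sketch renders one list of evidence records in two ways. The names `ProjectionSketch`, `Evidence`, `toHumanDoc`, and `toAgentContext` are assumptions made for this example, and the rendering is string concatenation rather than real LLM generation.

```java
import java.util.List;

// Hypothetical sketch: one evidence base, two output projections.
final class ProjectionSketch {

    // Shared evidence record; field names are illustrative.
    record Evidence(String symbol, String path, String reason) {}

    // Human projection: a Markdown page with a heading and one sentence per evidence item.
    static String toHumanDoc(String title, List<Evidence> evidence) {
        StringBuilder sb = new StringBuilder("# " + title + "\n\n");
        for (Evidence e : evidence) {
            sb.append(e.symbol()).append(" is the ").append(e.reason())
              .append(" (see ").append(e.path()).append(").\n");
        }
        return sb.toString();
    }

    // Agent projection: dense YAML-style text built from the same records.
    static String toAgentContext(String task, List<Evidence> evidence) {
        StringBuilder sb = new StringBuilder("task: " + task + "\ncontext:\n");
        for (Evidence e : evidence) {
            sb.append("  - symbol: ").append(e.symbol()).append('\n')
              .append("    path: ").append(e.path()).append('\n')
              .append("    reason: ").append(e.reason()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Evidence> base = List.of(
            new Evidence("OrderValidator.validate",
                "src/main/java/.../OrderValidator.java", "primary validation entry point"),
            new Evidence("RuleRegistry",
                "src/main/java/.../RuleRegistry.java", "rule registration source"));
        System.out.println(toHumanDoc("Order Validation Module", base));
        System.out.println(toAgentContext("update-validation-rule", base));
    }
}
```

Both outputs cite the same paths, so a change to a source file can flag both projections at once.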

11. Agent Memory Boundary

Memory is the easiest area to design badly.

11.1 Memory Is Not a Cache

A cache:

speeds up access,
can be dropped at any time,
needs no semantic meaning.

Memory:

stores knowledge,
influences agent behavior,
must have a scope,
must have governance.

11.2 Memory Is Not Documentation

Documentation is usually narrative.

Memory must be atomic and actionable.

Bad:

memory: "Order service is complicated and has many validation rules."

Better:

memory:
  statement: "Order validation rules are registered in RuleRegistry."
  scope: repository:order-service
  evidence:
    - path: src/main/java/com/acme/order/validation/RuleRegistry.java
  confidence: 0.84

11.3 Memory Is Not Ground Truth

The source remains the ground truth.

Memory is derived knowledge. If the source changes, memory must be revalidated.

Invariant:

No memory should outlive the source evidence that invalidates it.
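A minimal sketch of how this invariant might be enforced, assuming each memory record lists the symbols whose change should expire it (mirroring the `expiresWhen.symbolChanged` field in the earlier example). `MemoryInvalidationSketch`, `State`, and `revalidate` are hypothetical names, not part of any defined API.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: demote memory records when a symbol they depend on has changed.
final class MemoryInvalidationSketch {

    enum State { ACTIVE, NEEDS_REVALIDATION }

    record MemoryRecord(String statement, String scope, Set<String> expiresWhenSymbolChanged, State state) {
        MemoryRecord withState(State next) {
            return new MemoryRecord(statement, scope, expiresWhenSymbolChanged, next);
        }
    }

    // Returns a new list in which every record touched by a changed symbol is flagged.
    static List<MemoryRecord> revalidate(List<MemoryRecord> records, Set<String> changedSymbols) {
        return records.stream()
            .map(r -> r.expiresWhenSymbolChanged().stream().anyMatch(changedSymbols::contains)
                ? r.withState(State.NEEDS_REVALIDATION)
                : r)
            .toList();
    }

    public static void main(String[] args) {
        MemoryRecord record = new MemoryRecord(
            "Validation rules are registered through RuleRegistry, not instantiated directly in controllers.",
            "order-service", Set.of("RuleRegistry"), State.ACTIVE);
        List<MemoryRecord> result = revalidate(List.of(record), Set.of("RuleRegistry"));
        System.out.println(result.get(0).state());
    }
}
```

The record is flagged rather than deleted, which keeps the decision about removal inside a governed workflow.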

12. Core Product Invariants

An invariant is a rule that must hold under all conditions.

12.1 Evidence Invariant

Every important generated claim must be traceable to source evidence or marked as uncertain.

Implications:

output must carry a citation map,
context packs must store source spans,
the quality gate must be able to find unsupported claims.

12.2 Permission Invariant

Derived knowledge must not be more visible than its source.

Implications:

the index follows the repo ACL,
memory follows the source ACL,
docs follow the source classification,
search must filter by permission before or during retrieval.
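The last implication can be sketched as a filter applied before candidates are ranked. This is only an illustration: `PermissionFilterSketch`, `Chunk`, and `filterReadable` are hypothetical names, `repo_pricing_service` is an invented repository ID, and the set of readable repository IDs is assumed to come from the Git provider's ACL.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: drop chunks from unreadable repositories before ranking.
final class PermissionFilterSketch {

    // A retrieval unit tagged with the repository it was derived from.
    record Chunk(String repositoryId, String path) {}

    // Keeps only chunks whose source repository the caller is allowed to read.
    static List<Chunk> filterReadable(List<Chunk> candidates, Set<String> readableRepositoryIds) {
        return candidates.stream()
            .filter(c -> readableRepositoryIds.contains(c.repositoryId()))
            .toList();
    }

    public static void main(String[] args) {
        List<Chunk> candidates = List.of(
            new Chunk("repo_order_service", "src/main/java/.../OrderValidator.java"),
            new Chunk("repo_pricing_service", "README.md")); // invented second repository
        // The caller can read only the order service.
        List<Chunk> visible = filterReadable(candidates, Set.of("repo_order_service"));
        visible.forEach(c -> System.out.println(c.repositoryId() + " " + c.path()));
    }
}
```

Filtering before ranking means that text from unauthorized repositories never reaches the prompt, even as a low-ranked candidate.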

12.3 Freshness Invariant

Generated knowledge must know which source version it represents.

Implications:

every doc carries a commit SHA,
every memory record carries an evidence version,
a file change must be able to invalidate chunks, docs, and memory.
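A small sketch of how a file change could flag generated docs as stale, assuming each doc stores the set of file paths it cited as evidence. `StaleDocSketch`, `GeneratedDoc`, `StaleFinding`, and `findStale` are hypothetical names, and the changed paths would in practice come from a diff between two commits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: report docs whose cited evidence files have changed.
final class StaleDocSketch {

    record GeneratedDoc(String docId, String commitSha, Set<String> evidencePaths) {}

    // One finding per stale doc, listing the files that caused the staleness.
    record StaleFinding(String docId, List<String> changedEvidence) {}

    static List<StaleFinding> findStale(List<GeneratedDoc> docs, Set<String> changedPaths) {
        List<StaleFinding> findings = new ArrayList<>();
        for (GeneratedDoc doc : docs) {
            List<String> hits = doc.evidencePaths().stream()
                .filter(changedPaths::contains)
                .sorted()
                .toList();
            if (!hits.isEmpty()) {
                findings.add(new StaleFinding(doc.docId(), hits));
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        GeneratedDoc doc = new GeneratedDoc("doc_order_validation", "6f41ab2",
            Set.of("src/main/java/.../OrderValidator.java", "src/main/java/.../RuleRegistry.java"));
        Set<String> changed = Set.of("src/main/java/.../RuleRegistry.java");
        for (StaleFinding f : findStale(List.of(doc), changed)) {
            System.out.println(f.docId() + " is stale because of " + f.changedEvidence());
        }
    }
}
```

Path-level matching is coarse; tracking symbols or source spans would reduce false positives at the cost of more bookkeeping.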

12.4 Reproducibility Invariant

A generated output should be reproducible from its source snapshot and generation metadata.

Implications:

store the model, prompt, and template versions,
store the retrieved evidence IDs,
store the commit,
store the output metadata.

12.5 Safe Write Invariant

The system may propose changes, but official writes require explicit workflow and approval.

Implications:

read-only by default,
generated docs start as drafts,
changes go through a PR, not a direct push,
memory is a candidate before it becomes active.

13. Initial Product Data Model

We need a data model before we start coding.

13.1 Core Entities

13.2 Entity Definition

Entity	Description
Tenant	Organization/workspace boundary.
Repository	Git repository identity.
Snapshot	Specific commit/branch scan result.
File	File metadata at snapshot.
Symbol	Extracted code entity.
Document	Existing source docs.
Chunk	Search/retrieval unit.
EvidenceRef	Reference to source span.
GeneratedDoc	AI-produced documentation.
MemoryRecord	Persistent knowledge candidate/active record.
Run	Execution trace for indexing/doc generation.

13.3 Why `Snapshot` Matters

Without a snapshot, we cannot tell which version the docs describe.

Bad schema:

docs(repository_id, content)

Better schema:

docs(repository_id, snapshot_id, commit_sha, content, generated_at)

Best schema eventually:

generated_docs(
    id,
    tenant_id,
    repository_id,
    snapshot_id,
    commit_sha,
    doc_type,
    target_entity_id,
    generator_version,
    context_pack_id,
    content,
    quality_score,
    review_state,
    created_at
)

14. Product Workflows

14.1 Generate Module Documentation

14.2 Generate Agent Context Pack

14.3 Update After Commit

15. Key User Stories

15.1 Engineer: Understand Module

Given a repository has been indexed
And I select a module path
When I request module documentation
Then the system generates a document explaining purpose, components, flow, dependencies, and known uncertainties
And every major claim includes source evidence

15.2 Agent: Get Task Context

Given an AI agent needs to modify a validation rule
When it requests context for the task
Then the system returns relevant symbols, tests, docs, constraints, and memory
And the context does not include unauthorized repositories

15.3 Tech Lead: Detect Stale Docs

Given source code changed after docs were generated
When I request stale documentation report
Then the system lists docs whose evidence changed
And explains which files or symbols caused staleness

15.4 Platform Engineer: Audit Output

Given a generated document exists
When I inspect its generation run
Then I can see repository commit, retrieved chunks, model/template version, quality checks, and reviewer state

16. Quality Bar

16.1 Functional Quality

The system should:

scan repository consistently,
extract useful symbols,
retrieve relevant evidence,
generate docs with citations,
build agent context packs,
detect unsupported claims,
preserve source version metadata.

16.2 Non-Functional Quality

The system should be:

reproducible,
incremental,
permission-aware,
observable,
cost-aware,
reviewable,
extensible.

16.3 Product Quality Metrics

Metric	Meaning
Evidence coverage	Percentage of generated claims supported by evidence.
Unsupported claim count	Claims without evidence.
Retrieval precision@k	Relevance of top retrieved chunks.
Stale doc count	Docs whose source evidence changed.
Context token efficiency	Useful evidence per token.
Memory invalidation accuracy	Whether stale memory is detected.
Permission violation count	Must be zero.
Human acceptance rate	Docs accepted without major rewrite.
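The first two metrics in the table can be computed directly once each generated claim carries its evidence references. The sketch below assumes a hypothetical `Claim` record and treats a claim as supported when it has at least one evidence ID. Returning 1.0 for a document with no claims is an arbitrary choice made for this example.

```java
import java.util.List;

// Hypothetical sketch: evidence coverage and unsupported claims for one generated doc.
final class QualityMetricsSketch {

    // A generated claim and the IDs of the evidence that supports it (may be empty).
    record Claim(String text, List<String> evidenceIds) {}

    // Fraction of claims with at least one evidence reference.
    static double evidenceCoverage(List<Claim> claims) {
        if (claims.isEmpty()) {
            return 1.0; // assumption: nothing claimed means nothing unsupported
        }
        long supported = claims.stream().filter(c -> !c.evidenceIds().isEmpty()).count();
        return (double) supported / claims.size();
    }

    // Claims that cite no evidence at all.
    static List<Claim> unsupportedClaims(List<Claim> claims) {
        return claims.stream().filter(c -> c.evidenceIds().isEmpty()).toList();
    }

    public static void main(String[] args) {
        List<Claim> claims = List.of(
            new Claim("Rules are registered through RuleRegistry.", List.of("ev_1")),
            new Claim("OrderValidator.validate is the entry point.", List.of("ev_2")),
            new Claim("Validation rules are loaded dynamically from database.", List.of()));
        System.out.println("coverage = " + evidenceCoverage(claims));
        System.out.println("unsupported = " + unsupportedClaims(claims).size());
    }
}
```

Counting references only checks that a citation exists; whether the cited span actually supports the claim is a separate verification step.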

17. Architecture Boundary

17.1 What Belongs in Core

Core platform:

repository scanner,
file classifier,
parser/symbol extractor,
metadata store,
search/retrieval,
context assembler,
doc generator,
memory manager,
quality gate,
API.

17.2 What Belongs in Integrations

Integrations:

GitHub/GitLab/Bitbucket connector,
Slack/Teams notification,
docs portal publishing,
IDE extension,
CI/CD hook,
MCP server,
issue tracker sync.

17.3 Why This Boundary Matters

Core should not depend on GitHub-specific assumptions.

Bad:

PullRequestDocumentationGenerator

Better:

DocumentationGenerationService
  input: RepositorySnapshot, TargetScope, DocType
  output: GeneratedDocDraft

The GitHub PR integration then becomes an adapter around this service.
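To illustrate the boundary, here is a sketch in which a provider-specific adapter translates a pull request event into the provider-neutral service call. All types are minimal stand-ins invented for this example; `PullRequestEvent` and `PullRequestDocAdapter` in particular are hypothetical and do not mirror any real GitHub API.

```java
// Hypothetical sketch: core service is provider-neutral; the PR adapter lives at the edge.
final class AdapterBoundarySketch {

    // Minimal stand-ins for the core domain types named in the text.
    record RepositorySnapshot(String repositoryId, String commitSha) {}
    record TargetScope(String path) {}
    enum DocType { MODULE_DOCUMENTATION }
    record GeneratedDocDraft(String commitSha, String targetPath, String content) {}

    // Core contract: knows nothing about pull requests or any Git provider.
    interface DocumentationGenerationService {
        GeneratedDocDraft generate(RepositorySnapshot snapshot, TargetScope target, DocType docType);
    }

    // Invented event shape for illustration only.
    record PullRequestEvent(String repositoryId, String headSha, String changedModulePath) {}

    // Adapter: maps the provider event onto core inputs, then delegates.
    static final class PullRequestDocAdapter {
        private final DocumentationGenerationService core;

        PullRequestDocAdapter(DocumentationGenerationService core) {
            this.core = core;
        }

        GeneratedDocDraft onPullRequest(PullRequestEvent event) {
            RepositorySnapshot snapshot = new RepositorySnapshot(event.repositoryId(), event.headSha());
            TargetScope target = new TargetScope(event.changedModulePath());
            return core.generate(snapshot, target, DocType.MODULE_DOCUMENTATION);
        }
    }

    public static void main(String[] args) {
        // Stub core service that only echoes its inputs.
        DocumentationGenerationService stub = (snapshot, target, docType) ->
            new GeneratedDocDraft(snapshot.commitSha(), target.path(), "draft for " + target.path());
        PullRequestDocAdapter adapter = new PullRequestDocAdapter(stub);
        GeneratedDocDraft draft = adapter.onPullRequest(new PullRequestEvent(
            "repo_order_service", "6f41ab2", "src/main/java/com/acme/order/validation"));
        System.out.println(draft.content());
    }
}
```

Swapping GitHub for another provider would then only require a different adapter, leaving the core service untouched.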

18. Initial API Surface

We have not designed the full OpenAPI spec yet, but we can settle the shape of the API.

18.1 Repository API

POST /repositories
GET /repositories/{repositoryId}
POST /repositories/{repositoryId}/sync
GET /repositories/{repositoryId}/snapshots

18.2 Search API

POST /search
POST /symbols/search
GET /symbols/{symbolId}
GET /symbols/{symbolId}/neighbors

18.3 Documentation API

POST /documentation/generate
GET /documentation/{docId}
GET /documentation/{docId}/evidence
POST /documentation/{docId}/review

18.4 Context API

POST /context/assemble
GET /context-packs/{contextPackId}

18.5 Memory API

POST /memory/candidates
GET /memory/search
POST /memory/{memoryId}/approve
POST /memory/{memoryId}/invalidate

18.6 Run API

GET /runs/{runId}
GET /runs/{runId}/trace
GET /runs/{runId}/quality-report

19. Request/Response Examples

19.1 Generate Documentation Request

{
  "repositoryId": "repo_order_service",
  "snapshot": {
    "branch": "main",
    "commitSha": "6f41ab2"
  },
  "target": {
    "type": "module",
    "path": "src/main/java/com/acme/order/validation"
  },
  "docType": "module_documentation",
  "audience": ["backend_engineer", "ai_agent"],
  "options": {
    "includeMermaid": true,
    "requireEvidence": true,
    "maxTokens": 12000
  }
}

19.2 Generate Documentation Response

{
  "docId": "doc_01J...",
  "runId": "run_01J...",
  "status": "draft",
  "quality": {
    "evidenceCoverage": 0.87,
    "unsupportedClaimCount": 1,
    "staleRisk": "low"
  },
  "outputs": {
    "markdownPath": "generated/order-validation.md",
    "evidencePath": "generated/order-validation.evidence.json"
  }
}

19.3 Assemble Agent Context Request

{
  "repositoryId": "repo_order_service",
  "branch": "main",
  "task": {
    "type": "code_change",
    "description": "Add a validation rule for corporate orders"
  },
  "target": {
    "symbol": "com.acme.order.validation.OrderValidator"
  },
  "budget": {
    "maxTokens": 8000
  },
  "include": {
    "tests": true,
    "memory": true,
    "docs": true,
    "graphNeighbors": true
  }
}

19.4 Assemble Agent Context Response

{
  "contextPackId": "ctx_01J...",
  "tokenEstimate": 7420,
  "evidence": [
    {
      "kind": "symbol",
      "path": "src/main/java/com/acme/order/validation/OrderValidator.java",
      "lines": [12, 144],
      "reason": "Primary validation entry point"
    },
    {
      "kind": "test",
      "path": "src/test/java/com/acme/order/validation/OrderValidatorTest.java",
      "lines": [20, 188],
      "reason": "Relevant test coverage"
    }
  ],
  "memory": [
    {
      "memoryId": "mem_order_validation_rule_registry",
      "statement": "Rules are registered through RuleRegistry."
    }
  ],
  "warnings": [
    "No ADR was found for corporate order validation."
  ]
}

20. Deployment Boundary

The MVP can run locally.

Production needs service boundaries.

20.1 Local MVP

CLI + local repo + local metadata DB + generated files

Advantages:

fast,
easy to debug,
cheap,
good for learning.

Disadvantages:

not multi-user,
simple permissions,
no worker scaling,
no strong audit trail.

20.2 Team Deployment

API service + worker + shared database + object storage + search index

Advantages:

usable by a team,
shared index,
can support a review workflow.

Disadvantages:

auth and permissions become serious concerns,
cost needs to be controlled,
job retries must be correct.

20.3 Enterprise Deployment

Multi-tenant API + distributed workers + graph/vector/search stores + audit + policy engine

Advantages:

multi-team,
multi-repo,
governance,
observability.

Disadvantages:

complex,
needs a platform team,
more failure modes.

21. Technology-Agnostic First

This series will discuss specific technologies, but the product must not depend on a single vendor.

21.1 Required Abstractions

Area	Abstraction
Git provider	`RepositoryProvider`
Parser	`LanguageParser`
Embedding	`EmbeddingProvider`
LLM	`GenerationProvider`
Vector store	`VectorIndex`
Search	`LexicalIndex`
Graph	`GraphRepository`
AuthZ	`PermissionEvaluator`
Memory	`MemoryStore`

21.2 Example Interfaces

public interface RepositoryProvider {
    RepositorySnapshot fetchSnapshot(RepositoryRef ref, SnapshotSelector selector);
}

public interface LanguageParser {
    boolean supports(Language language);
    ParseResult parse(SourceFile file);
}

public interface ContextAssembler {
    ContextPack assemble(ContextRequest request);
}

public interface DocumentationGenerator {
    GeneratedDocument generate(DocumentationRequest request, ContextPack context);
}

This design lets the system start simple and later swap implementations without tearing apart the domain model.
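As one illustration of swapping implementations behind `LanguageParser`, the sketch below selects the first parser that supports a file's language and otherwise falls back to a lexical parser. The `Language`, `SourceFile`, and `ParseResult` types are minimal stand-ins invented here, and `ToyJavaParser` is a regex toy, not a real Java parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: parser selection with a lexical fallback.
final class ParserRegistrySketch {

    enum Language { JAVA, UNKNOWN }
    record SourceFile(String path, Language language, String content) {}
    record ParseResult(String parserName, List<String> symbols) {}

    interface LanguageParser {
        boolean supports(Language language);
        ParseResult parse(SourceFile file);
    }

    // Toy parser: finds "class Name" declarations with a regex.
    static final class ToyJavaParser implements LanguageParser {
        private static final Pattern CLASS_DECL = Pattern.compile("\\bclass\\s+(\\w+)");

        public boolean supports(Language language) {
            return language == Language.JAVA;
        }

        public ParseResult parse(SourceFile file) {
            List<String> symbols = new ArrayList<>();
            Matcher m = CLASS_DECL.matcher(file.content());
            while (m.find()) {
                symbols.add(m.group(1));
            }
            return new ParseResult("toy-java", symbols);
        }
    }

    // Fallback: accepts any file and extracts no symbols, so the file stays text-searchable only.
    static final class LexicalFallback implements LanguageParser {
        public boolean supports(Language language) {
            return true;
        }

        public ParseResult parse(SourceFile file) {
            return new ParseResult("lexical", List.of());
        }
    }

    // Uses the first supporting parser, otherwise the fallback.
    static ParseResult parseWithFallback(List<LanguageParser> parsers, LanguageParser fallback, SourceFile file) {
        for (LanguageParser parser : parsers) {
            if (parser.supports(file.language())) {
                return parser.parse(file);
            }
        }
        return fallback.parse(file);
    }

    public static void main(String[] args) {
        List<LanguageParser> parsers = List.of(new ToyJavaParser());
        SourceFile java = new SourceFile("RuleRegistry.java", Language.JAVA, "public class RuleRegistry {}");
        SourceFile other = new SourceFile("deploy.cfg", Language.UNKNOWN, "replicas=3");
        System.out.println(parseWithFallback(parsers, new LexicalFallback(), java));
        System.out.println(parseWithFallback(parsers, new LexicalFallback(), other));
    }
}
```

Adding a language later means registering another `LanguageParser`; callers of `parseWithFallback` do not change.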

22. Risk Register

22.1 Product Risks

Risk	Mitigation
Output is not trusted	Evidence, citations, review workflow.
Docs are too verbose	Doc taxonomy and audience-specific templates.
Agent context is too large	Token budget and context ranking.
Scope creep	MVP boundary and anti-goals.
Users do not want to review	Keep diffs small and the quality report clear.

22.2 Technical Risks

Risk	Mitigation
Parsers fail across many languages	Parser plugins and a lexical fallback.
Indexing is expensive	Incremental scan, batching, cache.
Poor retrieval	Hybrid search, graph expansion, eval.
Stale memory	Invalidation policy.
Inconsistent multi-repo identity	Stable IDs and canonical naming.

22.3 Security Risks

Risk	Mitigation
Secret leakage	Secret scanning before indexing/context.
Prompt injection from repo content	Treat repo content as untrusted data.
Permission leak	Source-derived ACL.
Dangerous tools	Least-privilege and read-only default.
Memory contamination	Review and provenance.

23. The Product Contract

We can write the product contract like this:

The platform will not claim to understand code unless it can point to evidence.
The platform will not expose derived knowledge to users who cannot access the source.
The platform will not treat generated documentation as official without review.
The platform will not keep memory active after its source evidence is invalidated.
The platform will not optimize for model cleverness over retrieval correctness.

These are not slogans. They are design constraints.

24. Recommended MVP Milestones

Milestone 1 — Repository Snapshot

Output:

repository metadata,
file list,
file classification,
fingerprint.

Milestone 2 — Symbol Index

Output:

extracted symbols,
line ranges,
stable IDs,
basic symbol search.

Milestone 3 — Evidence Retrieval

Output:

retrieve relevant files/symbols/docs,
simple ranking,
evidence list.

Milestone 4 — Context Pack

Output:

structured context pack,
token budget,
citation map.

Milestone 5 — Documentation Draft

Output:

module docs,
source evidence section,
uncertainty section.

Milestone 6 — Quality Report

Output:

unsupported claims,
stale risk,
missing docs,
reviewer checklist.

Milestone 7 — Memory Candidate

Output:

proposed memory records,
evidence,
expiry policy,
approval state.

25. Example End-to-End MVP Scenario

25.1 Input

Repository: order-service
Target: src/main/java/com/acme/order/validation
Doc Type: Module Documentation
Audience: Backend Engineer

25.2 System Action

Resolve commit.
Classify files under target path.
Parse source files.
Extract symbols.
Find tests.
Find ADR/README references.
Retrieve evidence.
Assemble context.
Generate docs.
Verify claims.
Produce output.

25.3 Output Files

generated/
  order-validation.module-doc.md
  order-validation.agent-context.yaml
  order-validation.evidence.json
  order-validation.quality-report.yaml

25.4 Quality Report Example

docId: doc_order_validation
repository: order-service
commit: 6f41ab2
quality:
  evidenceCoverage: 0.88
  unsupportedClaims:
    - claim: "Validation rules are loaded dynamically from database."
      reason: "No evidence found in retrieved context."
  missingEvidence:
    - "Retry behavior"
  staleRisk: low
review:
  required: true
  suggestedReviewer: team-order-platform

26. Decision Records for the Product

From the start, record important decisions as ADRs.

ADR 001 — Evidence Required for Generated Claims

# ADR 001 — Evidence Required for Generated Claims

## Status

Accepted

## Context

Generated documentation can sound correct while being unsupported by code evidence.

## Decision

Every major generated claim must be linked to source evidence or marked as uncertain.

## Consequences

- Context assembly must preserve source spans.
- Documentation output must include evidence.
- Verification pipeline must check unsupported claims.

ADR 002 — Default Read-Only Agent Tools

# ADR 002 — Default Read-Only Agent Tools

## Status

Accepted

## Context

Agent tools that can write to repositories, memory, or external systems increase risk.

## Decision

All agent tools are read-only by default. Write operations must produce proposals that require approval.

## Consequences

- Safer MVP.
- More explicit review process.
- Slightly slower automation.

27. What We Will Build Across the Series

This series will move from the MVP toward a full platform.

28. What We Will Not Optimize Yet

To stay focused, do not optimize these too early:

Do Not Optimize Yet	Reason
Model selection	Retrieval and evidence are more fundamental.
Polished UI	The core pipeline must be correct first.
Perfect multi-language support	Start with one or two languages.
Choice of graph database	The graph model matters more than the initial storage.
Agent autonomy	Context quality comes first.
Complex permissions	Start with a simple source-derived model.
Massive scale	Design for incremental processing, but implement in stages.

29. Common Misframing

29.1 "This is just RAG for code"

Not quite.

RAG covers the retrieval part. This product also includes:

parsing,
symbol identity,
graph,
documentation lifecycle,
memory lifecycle,
permission,
quality gates,
audit.

29.2 "A vector DB will solve everything"

No.

Vector search is weak at exact identifiers, versioning, permissions, and structural relations. We need hybrid retrieval.
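One common way to combine a lexical ranking with a vector ranking is reciprocal rank fusion (RRF), where each document scores the sum of 1 / (k + rank) across the lists. The text does not prescribe a fusion method, so RRF is an assumption of this sketch, as are the `HybridRetrievalSketch` and `fuse` names. The constant k = 60 is a conventional default.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: merge lexical and vector rankings with reciprocal rank fusion.
final class HybridRetrievalSketch {

    // Returns chunk IDs ordered by fused score, highest first.
    static List<String> fuse(List<String> lexicalRanking, List<String> vectorRanking, int k) {
        Map<String, Double> scores = new LinkedHashMap<>();
        addScores(scores, lexicalRanking, k);
        addScores(scores, vectorRanking, k);

        List<String> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return ids;
    }

    // Adds 1 / (k + rank) for each entry, with rank starting at 1.
    private static void addScores(Map<String, Double> scores, List<String> ranking, int k) {
        for (int i = 0; i < ranking.size(); i++) {
            scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
        }
    }

    public static void main(String[] args) {
        // Lexical search finds the exact identifier first; vector search prefers related prose.
        List<String> lexical = List.of("OrderValidator.java", "RuleRegistry.java", "README.md");
        List<String> vector = List.of("RuleRegistry.java", "docs/validation.md", "OrderValidator.java");
        System.out.println(fuse(lexical, vector, 60));
    }
}
```

Items that appear in both lists rise to the top, so exact identifier matches are not lost when embeddings rank them low. Permission and version filters would still need to be applied separately.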

29.3 "Docs can be generated directly from all files"

They can, but the results are often poor.

A better approach:

pick a target,
retrieve evidence,
assemble context,
generate a draft,
verify claims.

29.4 "Memory is the same as storing summaries"

No.

Memory must be atomic, scoped, evidence-based, and able to expire.

29.5 "The agent may read every repo so that it gets smarter"

No.

The agent must follow the permissions of the user and the task. Context that the user is not allowed to see must not be given to an agent acting on the user's behalf.

30. Exit Criteria for This Part

We are ready to move on once the following are clear:

the product is not an ordinary repo chatbot,
the MVP is a single-repo, evidence-based doc/context generator,
human docs, agent context, and memory are distinct outputs,
source evidence is the center of trust,
permissions must be inherited from the source,
freshness must be explicit,
write operations must go through approval,
multi-repo is an evolution, not the starting point.

31. Summary

A good product boundary makes engineering decisions easier.

For this series, the final product is:

A repository intelligence platform that produces human documentation and AI agent context/memory grounded in source evidence.

The right MVP:

Single-repo, commit-aware, module-level documentation and an agent context pack, with evidence, a quality report, and memory candidates.

What must not be sacrificed:

evidence,
permission,
freshness,
reproducibility,
safe write boundary.

The next part builds the System Mental Model: how to view the repository as an evidence database, code as a graph, documentation as a projection, and memory as derived knowledge with a lifecycle.
