Learn Ai Docs Km Cli Part 025 Source Grounded Generation
title: "Build From Scratch: Mintlify-like AI-driven Documentation Generator CLI - Part 025"
description: Design source-grounded generation so every generated documentation claim can be traced back to code, contracts, tests, configuration, or approved human notes.
series: learn-ai-docs-km-cli
seriesTitle: "Build From Scratch: Mintlify-like AI-driven Documentation Generator CLI with Code2Prompt and Open-source Knowledge Management"
order: 25
partTitle: Source-grounded Generation
tags:
- ai-docs
- documentation
- source-grounding
- hallucination
- provenance
- verification
- cli
- mdx
date: 2026-07-04
Part 025 — Source-grounded Generation
In the previous part, we built the documentation verifier core.
Now we need to make generation itself safer.
The core rule of this part is simple:
If the system cannot point to a source, the system should not present the statement as a fact.
That rule sounds obvious. In practice, most AI documentation tools violate it constantly.
They generate confident docs from incomplete context, invent missing behavior, smooth over contradictions, create fake examples, and produce architecture narratives that sound plausible but are not actually supported by the repository.
A Mintlify-like AI docs CLI must not behave like that.
It must behave more like a compiler:
- collect source material,
- build a grounded context,
- ask for constrained output,
- extract claims,
- map claims to evidence,
- reject unsupported claims,
- preserve provenance in artifacts,
- make uncertainty visible.
This part designs that pipeline.
We are not trying to make a perfect truth machine. We are building a system with explicit limits.
The goal is not:
“The AI is always correct.”
The goal is:
“The system can explain what each generated claim is based on, detect when evidence is missing, and force review before unsupported content enters the docs.”
That is a realistic production target.
1. Why source-grounding is non-negotiable
Developer documentation has a special property: most of its truth lives close to the repository.
A generated page may describe:
- exported functions,
- API endpoints,
- configuration keys,
- environment variables,
- CLI commands,
- database migrations,
- event schemas,
- authentication behavior,
- deployment topology,
- error responses,
- examples,
- operational procedures.
These facts are not stable essays. They are coupled to code.
When code changes, docs can become wrong.
When generated docs are wrong, the damage is practical:
- users call nonexistent endpoints,
- developers copy invalid examples,
- operators run obsolete commands,
- customers misunderstand behavior,
- support teams debug using stale assumptions,
- AI agents retrieve bad documentation and amplify the error.
This is why generated docs need source-grounding.
Source-grounding means:
The generator may only produce factual claims that are either directly supported by repository evidence, supported by approved human-authored notes, or explicitly marked as assumption / unknown / recommendation.
This is not only an LLM prompt trick. It is a system design constraint.
2. Mental model: documentation as claims over evidence
A documentation page is not just prose.
It is a collection of claims.
Example:
The `/v1/invoices` endpoint creates an invoice and returns `201 Created` with the created invoice object.
This sentence contains several claims:
- there is an endpoint `/v1/invoices`,
- it creates an invoice,
- successful creation returns status `201`,
- the response contains an invoice object.
Each claim may have different evidence:
| Claim | Possible evidence |
|---|---|
| Endpoint exists | OpenAPI path, router definition, controller annotation |
| It creates invoice | handler name, OpenAPI operation summary, test behavior |
| Returns 201 | OpenAPI response, integration test assertion |
| Response object shape | schema definition, test fixture, serializer |
The system should reason at this level.
A page is therefore:
Page = sections + claims + examples + links + source references + verification report
The generator writes prose.
The verifier extracts claims.
The provenance layer maps claims to evidence.
The review layer shows unsupported claims to humans.
3. Grounding levels
Not all claims are grounded equally.
We need a vocabulary.
Level 0 — Unsupported
The claim has no known source.
Example:
This SDK is highly scalable and battle-tested in production.
Unless this exists in approved marketing copy, benchmark notes, or production evidence, it is unsupported.
Default action: reject or mark as human-authored recommendation.
Level 1 — Weakly inferred
The claim is inferred from names, directory structure, or conventions.
Example:
The `billing` module likely handles payment and invoice workflows.
Evidence:
- directory name `src/billing`,
- classes named `InvoiceService`, `PaymentGatewayClient`.
Default action: allowed only if phrased as an inference or converted into a more precise source-backed statement.
Level 2 — Source-backed
The claim is directly supported by source files.
Example:
The CLI exposes a `scan` command.
Evidence:
- command registry includes `scan`,
- tests invoke `aidocs scan`.
Default action: allowed.
Level 3 — Contract-backed
The claim is supported by a formal contract or schema.
Example:
`GET /users/{id}` returns a `User` response on status `200`.
Evidence:
- OpenAPI spec,
- GraphQL schema,
- JSON Schema,
- Protobuf definition.
Default action: allowed, but still check implementation drift when possible.
Level 4 — Behavior-backed
The claim is supported by an executable test, fixture, or verified example.
Example:
When the token is missing, the API returns `401 Unauthorized`.
Evidence:
- integration test assertion,
- contract test,
- executable example.
Default action: strongest evidence for behavior.
Level 5 — Human-approved
The claim was approved by an owner.
Example:
This feature is intended for enterprise audit workflows.
Evidence:
- architecture decision record,
- owner-approved note,
- manually reviewed docs block.
Default action: allowed, but must preserve ownership/provenance.
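The six levels and their default actions can be written down as a lookup table so that later stages share one vocabulary. The sketch below is a minimal illustration; the names `GroundingLevel`, `DefaultAction`, `LEVELS`, and `defaultActionFor` are assumptions made for this example, and the action labels paraphrase the defaults listed above.

```typescript
// Hypothetical names; only the six levels and their defaults come from the text.
type GroundingLevel = 0 | 1 | 2 | 3 | 4 | 5;

type DefaultAction =
  | 'reject'            // Level 0: reject, or hand over to a human as a recommendation
  | 'allow_if_hedged'   // Level 1: must be phrased as an inference
  | 'allow'             // Levels 2 and 4
  | 'allow_check_drift' // Level 3: allowed, but compare with the implementation
  | 'allow_keep_owner'; // Level 5: allowed, ownership must be preserved

interface LevelInfo {
  name: string;
  action: DefaultAction;
}

const LEVELS: Record<GroundingLevel, LevelInfo> = {
  0: { name: 'unsupported', action: 'reject' },
  1: { name: 'weakly_inferred', action: 'allow_if_hedged' },
  2: { name: 'source_backed', action: 'allow' },
  3: { name: 'contract_backed', action: 'allow_check_drift' },
  4: { name: 'behavior_backed', action: 'allow' },
  5: { name: 'human_approved', action: 'allow_keep_owner' },
};

function defaultActionFor(level: GroundingLevel): DefaultAction {
  return LEVELS[level].action;
}
```

Keeping the table in one place means the generator, verifier, and review UI cannot drift apart on what a level means.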
4. Source authority model
A source-grounded generator needs to know which sources are more authoritative.
For example, if the OpenAPI file says one thing but tests say another, which one wins?
There is no universal answer. The CLI must define a source authority model.
A practical default:
| Source type | Authority for |
|---|---|
| OpenAPI / GraphQL / Protobuf / JSON Schema | public contract shape |
| Tests | observed behavior and examples |
| Source code | implementation details, commands, symbols |
| Config files | configuration names, defaults, deployment hints |
| Migrations | database schema evolution |
| Existing docs | intent, explanation, terminology |
| ADRs / human notes | design rationale, ownership, constraints |
| README | entry-level project usage and conventions |
The point is not to declare one source always superior.
The point is to classify claims by topic.
Example:
- For HTTP response schema, OpenAPI is usually authoritative.
- For actual observed behavior, integration tests may be stronger.
- For design rationale, ADRs are stronger than code names.
- For command syntax, CLI parser and command tests are stronger than README prose.
This belongs in configuration:
sourceAuthority:
  api_contract:
    - openapi
    - source_route
    - integration_test
    - existing_docs
  behavior:
    - integration_test
    - unit_test
    - source_code
    - existing_docs
  rationale:
    - adr
    - human_note
    - existing_docs
    - source_code
The generator should not hide conflicts. It should surface them.
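One way to apply this ordering is to rank the evidence for a topic and return both the preferred source and anything that disagrees with it. This is a sketch under assumptions: `resolveByAuthority`, `Evidence`, and `Resolution` are hypothetical names, and the table simply mirrors the configuration above.

```typescript
type ClaimTopic = 'api_contract' | 'behavior' | 'rationale';

// Mirrors the sourceAuthority config: earlier entries rank higher.
const sourceAuthority: Record<ClaimTopic, string[]> = {
  api_contract: ['openapi', 'source_route', 'integration_test', 'existing_docs'],
  behavior: ['integration_test', 'unit_test', 'source_code', 'existing_docs'],
  rationale: ['adr', 'human_note', 'existing_docs', 'source_code'],
};

interface Evidence {
  sourceType: string;
  value: string;
}

interface Resolution {
  preferred: Evidence | null;
  conflicting: Evidence[]; // ranked evidence whose value differs from the preferred one
}

function resolveByAuthority(topic: ClaimTopic, evidence: Evidence[]): Resolution {
  const order = sourceAuthority[topic];
  // Ignore source types the topic does not list, then sort by rank.
  const ranked = evidence
    .filter((e) => order.includes(e.sourceType))
    .sort((a, b) => order.indexOf(a.sourceType) - order.indexOf(b.sourceType));
  const preferred = ranked[0] ?? null;
  const conflicting = preferred ? ranked.filter((e) => e.value !== preferred.value) : [];
  return { preferred, conflicting };
}
```

Returning `conflicting` alongside `preferred` keeps the disagreement visible instead of letting the ranking silently pick a winner.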
5. Grounded generation pipeline
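Here is the pipeline we want: build a page-specific evidence pack, generate constrained output from it, extract claims from the draft, match each claim to evidence, and reject or repair whatever stays unsupported.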
Notice the key design choice:
Source-grounding happens before and after generation.
Before generation, the prompt only contains selected evidence.
After generation, claims are checked against source references.
This dual strategy matters because LLMs can still ignore or misread context.
6. Evidence pack
The evidence pack is the subset of context provided for a specific page.
It is narrower than the full prompt bundle.
A prompt bundle may include task instructions, output schema, style rules, and context units.
An evidence pack is the factual base for claims.
Example:
{
"schemaVersion": "evidence-pack.v1",
"pageId": "api.invoices.create",
"topic": "Create invoice API endpoint",
"sources": [
{
"sourceRef": "src:openapi.yaml#/paths/~1v1~1invoices/post",
"type": "openapi_operation",
"authority": "contract",
"supports": ["endpoint", "method", "request_schema", "response_schema"]
},
{
"sourceRef": "src:tests/invoices/create-invoice.test.ts#L12-L58",
"type": "integration_test",
"authority": "behavior",
"supports": ["happy_path", "status_201", "example_request"]
},
{
"sourceRef": "src:src/routes/invoices.ts#L20-L43",
"type": "source_code",
"authority": "implementation",
"supports": ["handler", "route_binding"]
}
],
"knownFacts": [
{
"factId": "fact.endpoint.create_invoice",
"text": "POST /v1/invoices exists in the OpenAPI contract.",
"sourceRefs": ["src:openapi.yaml#/paths/~1v1~1invoices/post"],
"groundingLevel": 3
},
{
"factId": "fact.create_invoice.returns_201",
"text": "The integration test expects a 201 response when an invoice is created successfully.",
"sourceRefs": ["src:tests/invoices/create-invoice.test.ts#L31-L35"],
"groundingLevel": 4
}
],
"unknowns": [
"No source describes rate limiting for this endpoint.",
"No test covers idempotency behavior."
],
"forbiddenClaims": [
"Do not claim rate limits exist.",
"Do not claim idempotency support unless a source is added."
]
}
The evidence pack has two jobs:
- give the model the facts it may use,
- give the verifier a checklist of what output is allowed to claim.
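Because the verifier treats the pack as a checklist, it helps to confirm the pack is internally consistent before it reaches the model. The sketch below assumes hypothetical names (`EvidencePack`, `checkEvidencePack`, `fileOf`) and a deliberately loose rule: a fact's reference is accepted when its file part matches a listed source, since facts may cite a narrower line range than the source entry.

```typescript
interface EvidenceSource {
  sourceRef: string;
  type: string;
  authority: string;
  supports: string[];
}

interface KnownFact {
  factId: string;
  text: string;
  sourceRefs: string[];
  groundingLevel: number;
}

interface EvidencePack {
  schemaVersion: string;
  pageId: string;
  sources: EvidenceSource[];
  knownFacts: KnownFact[];
  unknowns: string[];
  forbiddenClaims: string[];
}

// Drop the fragment so "file#L31-L35" matches a source listed as "file#L12-L58".
function fileOf(ref: string): string {
  const hash = ref.indexOf('#');
  return hash === -1 ? ref : ref.slice(0, hash);
}

// Returns human-readable problems; an empty array means the pack is consistent.
function checkEvidencePack(pack: EvidencePack): string[] {
  const problems: string[] = [];
  const knownFiles = new Set(pack.sources.map((s) => fileOf(s.sourceRef)));
  for (const fact of pack.knownFacts) {
    if (fact.sourceRefs.length === 0) {
      problems.push(`${fact.factId}: no sourceRefs`);
    }
    for (const ref of fact.sourceRefs) {
      if (!knownFiles.has(fileOf(ref))) {
        problems.push(`${fact.factId}: ${ref} is not listed in sources`);
      }
    }
  }
  return problems;
}
```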
7. Claim ledger
The claim ledger is the post-generation artifact.
It records every extracted factual claim and its support.
Example:
{
"schemaVersion": "claim-ledger.v1",
"pageId": "api.invoices.create",
"generatedFile": "docs/api/invoices/create.mdx",
"claims": [
{
"claimId": "claim.001",
"text": "Use POST /v1/invoices to create an invoice.",
"claimType": "api_operation",
"groundingStatus": "supported",
"groundingLevel": 3,
"sourceRefs": ["src:openapi.yaml#/paths/~1v1~1invoices/post"],
"confidence": 0.96
},
{
"claimId": "claim.002",
"text": "The endpoint is idempotent when an Idempotency-Key header is supplied.",
"claimType": "behavior",
"groundingStatus": "unsupported",
"groundingLevel": 0,
"sourceRefs": [],
"confidence": 0.18,
"action": "remove_or_request_source"
}
]
}
The claim ledger becomes useful for:
- review UI,
- CI reports,
- drift detection,
- auditability,
- incremental regeneration,
- human approval,
- knowledge graph sync.
This is the artifact that turns “AI wrote it” into “AI proposed it and the system checked it.”
8. Claim types
A claim extractor should classify claims.
Useful claim types:
export type ClaimType =
| 'api_operation'
| 'api_schema'
| 'http_status'
| 'auth_requirement'
| 'config_key'
| 'cli_command'
| 'code_symbol'
| 'module_responsibility'
| 'architecture_relation'
| 'runtime_behavior'
| 'error_condition'
| 'database_schema'
| 'event_contract'
| 'example_behavior'
| 'installation_step'
| 'version_requirement'
| 'performance_claim'
| 'security_claim'
| 'recommendation'
| 'rationale'
| 'unknown_statement';
Different claim types require different evidence.
| Claim type | Minimum acceptable grounding |
|---|---|
| API operation | OpenAPI, router source, contract test |
| HTTP status | OpenAPI response or integration test |
| Config key | config schema, env parser, config docs |
| CLI command | command registry or command parser test |
| Architecture relation | import graph, deployment config, ADR, owner note |
| Runtime behavior | integration test, source path, runbook evidence |
| Performance claim | benchmark or production metric note |
| Security claim | auth code, policy config, security docs, tests |
| Recommendation | must be labeled as recommendation |
The stricter the claim, the stronger the evidence should be.
Security and performance claims should never be casually generated.
9. Source references
A source reference should be stable, precise, and readable.
Bad source reference:
invoice code
Better:
src:src/routes/invoices.ts#L20-L43
Better when available:
symbol:typescript:src/routes/invoices.ts#createInvoiceHandler
Best when tied to artifact hash:
{
"ref": "symbol:typescript:src/routes/invoices.ts#createInvoiceHandler",
"fileHash": "sha256:9a2c...",
"lineRange": [20, 43],
"commit": "abc1234"
}
A production-grade reference should support:
- file path,
- line range,
- symbol ID,
- source artifact hash,
- optional commit SHA,
- optional contract JSON pointer,
- optional test case ID.
For OpenAPI:
src:openapi.yaml#/paths/~1v1~1invoices/post/responses/201
For JSON Schema:
src:schemas/invoice.schema.json#/$defs/Invoice/properties/status
For GraphQL:
graphql:schema.graphql#type.Query.field.invoice
For Logseq/OpenNote human notes:
note:logseq/pages/Billing Architecture.md#block-64f2
The reference format matters because doc drift detection will later depend on it.
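The contract references above use JSON Pointer fragments, where `/` inside a path segment is escaped as `~1`. A small sketch of building and parsing such references follows; the helper names (`escapePointerSegment`, `openApiRef`, `parseLineRef`) are hypothetical, and only the `src:<path>#L<start>-L<end>` and pointer shapes shown in this section are handled.

```typescript
// RFC 6901 escaping: "~" becomes "~0" first, then "/" becomes "~1".
function escapePointerSegment(segment: string): string {
  return segment.replace(/~/g, '~0').replace(/\//g, '~1');
}

// Builds "src:<file>#/<escaped>/<segments>" from raw (unescaped) segments.
function openApiRef(file: string, segments: string[]): string {
  return `src:${file}#/${segments.map(escapePointerSegment).join('/')}`;
}

interface LineRef {
  file: string;
  start: number;
  end: number;
}

// Parses "src:<path>#L<start>-L<end>"; returns null for any other shape.
function parseLineRef(ref: string): LineRef | null {
  const m = /^src:([^#]+)#L(\d+)-L(\d+)$/.exec(ref);
  if (m === null) return null;
  const file = m[1];
  const start = Number(m[2]);
  const end = Number(m[3]);
  if (file === undefined || start > end) return null;
  return { file, start, end };
}
```

For example, `openApiRef('openapi.yaml', ['paths', '/v1/invoices', 'post', 'responses', '201'])` produces the OpenAPI reference shown above.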
10. Generated prose should expose provenance selectively
Do not pollute every public docs paragraph with internal source references.
Public docs should be readable.
But the artifact should preserve provenance.
Recommended pattern:
````mdx
<!-- aidocs:section id="create-invoice" sources="src:openapi.yaml#/paths/~1v1~1invoices/post,src:tests/invoices/create.test.ts#L12-L58" -->
## Create an invoice
Use `POST /v1/invoices` to create an invoice.
```bash
curl -X POST https://api.example.com/v1/invoices \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -d '{"customerId":"cus_123","amount":25000}'
```
<!-- /aidocs:section -->
````
This gives you both:
- clean docs for users,
- source mapping for tools.
For internal docs, you may also render visible source notes:
```mdx
> Source: `openapi.yaml`, `tests/invoices/create.test.ts`
```
For external public docs, hidden metadata is usually better.
11. Prompt design for source-grounded generation
The prompt must not say:
Write documentation for this code.
That is too open.
It should say something closer to:
You are generating documentation from a bounded evidence pack.
Use only the facts in the evidence pack.
Do not infer unsupported behavior.
When evidence is missing, write an "Unknowns" note in the generation report, not in the public page.
Every factual section must map to at least one sourceRef.
Do not create examples unless they are derived from supplied examples or contracts.
Return MDX plus a claim ledger.
The important instruction is not tone.
The important instruction is the output contract.
Example output contract:
{
"mdx": "string",
"claimLedger": [
{
"claimText": "string",
"claimType": "api_operation | config_key | runtime_behavior | ...",
"sourceRefs": ["string"],
"groundingLevel": 0,
"uncertainty": "string | null"
}
],
"unknowns": ["string"],
"removedClaims": ["string"],
"questionsForReviewer": ["string"]
}
This forces the model to participate in auditability.
The verifier must still check the output independently.
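Since the model's response is untrusted input, the output contract is worth checking at runtime before anything else reads it. The following is a minimal sketch with hand-written type guards; `GenerationOutput`, `LedgerEntry`, and `parseGenerationOutput` are hypothetical names derived from the contract above, and a real project might use a schema library instead.

```typescript
interface LedgerEntry {
  claimText: string;
  claimType: string;
  sourceRefs: string[];
  groundingLevel: number;
  uncertainty: string | null;
}

interface GenerationOutput {
  mdx: string;
  claimLedger: LedgerEntry[];
  unknowns: string[];
  removedClaims: string[];
  questionsForReviewer: string[];
}

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === 'string');
}

function isLedgerEntry(v: unknown): v is LedgerEntry {
  if (typeof v !== 'object' || v === null) return false;
  const e = v as Record<string, unknown>;
  const level = e.groundingLevel;
  const uncertainty = e.uncertainty;
  return (
    typeof e.claimText === 'string' &&
    typeof e.claimType === 'string' &&
    isStringArray(e.sourceRefs) &&
    typeof level === 'number' && Number.isInteger(level) && level >= 0 && level <= 5 &&
    (uncertainty === null || typeof uncertainty === 'string')
  );
}

// Returns the parsed output, or null when the response breaks the contract.
function parseGenerationOutput(raw: string): GenerationOutput | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof value !== 'object' || value === null) return null;
  const o = value as Record<string, unknown>;
  const ledger = o.claimLedger;
  const ok =
    typeof o.mdx === 'string' &&
    Array.isArray(ledger) && ledger.every(isLedgerEntry) &&
    isStringArray(o.unknowns) &&
    isStringArray(o.removedClaims) &&
    isStringArray(o.questionsForReviewer);
  return ok ? (value as GenerationOutput) : null;
}
```

A `null` result is a contract failure, not a grounding failure; it should trigger a retry or escalation before any claim is inspected.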
12. The “unknowns” channel
A source-grounded system needs a place for missing knowledge.
Otherwise the model fills gaps.
The unknowns channel is separate from the generated page.
Example:
{
"unknowns": [
{
"topic": "rate_limiting",
"reason": "No rate limit config, OpenAPI extension, or docs source was found.",
"suggestedAction": "Ask API owner or add source note."
},
{
"topic": "idempotency",
"reason": "Request header parser supports Idempotency-Key but no test or docs confirm semantics.",
"suggestedAction": "Add integration test or ADR."
}
]
}
Unknowns are not failures by themselves.
Unknowns are healthy.
A system that admits unknowns is safer than a system that invents answers.
13. Handling contradictions
Contradictions are normal in real repositories.
Example:
- OpenAPI says `401`,
- integration test expects `403`,
- README says “unauthorized requests fail”.
The generator must not silently choose one.
It should emit a conflict report:
{
"conflicts": [
{
"conflictId": "conflict.auth.invoice.create.status",
"claimTopic": "missing token response status",
"sources": [
{
"sourceRef": "src:openapi.yaml#/paths/~1v1~1invoices/post/responses/401",
"value": "401"
},
{
"sourceRef": "src:tests/auth/invoice-auth.test.ts#L44-L49",
"value": "403"
}
],
"severity": "major",
"recommendedAction": "Do not generate exact status claim until owner resolves contract drift."
}
]
}
The page can still be generated with a weaker statement:
Requests without valid authorization fail.
But the exact status code should not be claimed until the conflict is resolved.
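Detecting this kind of conflict can start very simply: group evidence values by claim topic and report any topic with more than one distinct value. The sketch below assumes hypothetical names (`EvidenceValue`, `Conflict`, `detectConflicts`) and compares values as plain strings; severity and recommended actions are left out.

```typescript
interface EvidenceValue {
  claimTopic: string;
  sourceRef: string;
  value: string;
}

interface Conflict {
  claimTopic: string;
  sources: { sourceRef: string; value: string }[];
}

function detectConflicts(evidence: EvidenceValue[]): Conflict[] {
  // Group evidence by the topic it speaks about.
  const byTopic = new Map<string, EvidenceValue[]>();
  for (const e of evidence) {
    const group = byTopic.get(e.claimTopic) ?? [];
    group.push(e);
    byTopic.set(e.claimTopic, group);
  }
  // A topic conflicts when its sources report more than one distinct value.
  const conflicts: Conflict[] = [];
  byTopic.forEach((group, claimTopic) => {
    const distinct = new Set(group.map((e) => e.value));
    if (distinct.size > 1) {
      conflicts.push({
        claimTopic,
        sources: group.map(({ sourceRef, value }) => ({ sourceRef, value })),
      });
    }
  });
  return conflicts;
}
```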
14. Grounding policy by documentation type
Different page types need different strictness.
API reference
Strict.
Allowed claims should come from:
- OpenAPI / GraphQL / Protobuf,
- route source,
- schema files,
- integration tests,
- approved human notes.
Do not generate behavior that is not in contract or tests.
Tutorial
Moderately strict.
Tutorials may include narrative, but commands and examples must be verified.
Allowed sources:
- real examples,
- integration tests,
- quickstart scripts,
- README,
- package metadata.
How-to guide
Strict for steps and commands.
Every command should be linked to:
- CLI command registry,
- package scripts,
- Docker Compose service,
- Makefile target,
- verified shell example.
Architecture explanation
Careful.
Architecture docs often require synthesis.
Generated claims must distinguish:
- observed structure,
- inferred responsibility,
- owner-approved rationale,
- recommended interpretation.
Troubleshooting / runbook
Very strict.
Commands must be safe, scoped, and sourced.
Do not invent operational remediation steps.
Concept page
Moderately strict.
Conceptual writing may explain relationships, but repository-specific facts still need sources.
15. Source-grounded Mermaid diagrams
Diagrams are claims too.
A diagram with the edges `API --> BillingService --> PaymentGateway` contains at least two claims:
- API depends on BillingService,
- BillingService depends on PaymentGateway.
Each edge needs evidence.
Better diagram metadata:
````mdx
<!-- aidocs:diagram id="billing-flow" sources="symbol:api#InvoiceController,symbol:billing#BillingService,symbol:payments#PaymentGatewayClient" -->
```mermaid
flowchart LR
  API[Invoice API] --> Billing[BillingService]
  Billing --> Payments[PaymentGatewayClient]
```
<!-- /aidocs:diagram -->
````
The verifier should parse diagram nodes/edges and compare them against relation graph evidence.
Diagrams should not become fiction with arrows.
---
## 16. Source-grounded examples
Examples are dangerous because they are copied.
Rules:
1. Prefer examples mined from tests.
2. Prefer examples derived from formal contracts.
3. Never invent auth tokens, IDs, hostnames, or config names without placeholders.
4. Mark placeholders clearly.
5. Validate code fences where possible.
6. Link examples to source evidence.
Example metadata:
````mdx
<!-- aidocs:example id="create-invoice-curl" source="example:http:tests/invoices/create.test.ts#createInvoiceHappyPath" verified="true" -->
```bash
curl -X POST "$API_BASE_URL/v1/invoices" \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"customerId":"cus_123","amount":25000}'
```
<!-- /aidocs:example -->
````
The example is allowed because it came from a test episode or contract.
---
## 17. Grounding and human-authored sections
Not every section should be AI-owned.
A production docs system must preserve human-authored knowledge.
Example:
```mdx
<!-- aidocs:manual id="product-positioning" owner="docs-team" -->
## When to use this product
Use this product when your team needs auditable invoice workflows across multiple approval stages.
<!-- /aidocs:manual -->
```
Manual sections can contain business context that code cannot prove.
But they should still have ownership.
Recommended metadata:
manualSections:
  - id: product-positioning
    owner: docs-team
    lastReviewed: 2026-07-01
    reviewCadence: quarterly
This keeps human knowledge accountable.
18. Confidence score is not enough
Do not rely on model confidence.
Confidence is useful only when combined with evidence.
Bad:
{
"claim": "This endpoint is idempotent.",
"confidence": 0.91
}
Good:
{
"claim": "This endpoint is idempotent.",
"groundingStatus": "unsupported",
"confidence": 0.91,
"action": "reject",
"reason": "High model confidence without evidence is not acceptable."
}
The invariant:
Evidence beats confidence.
19. Implementation model
A minimal implementation can use these modules:
src/
  grounding/
    EvidencePackBuilder.ts
    ClaimExtractor.ts
    ClaimMatcher.ts
    ConflictDetector.ts
    GroundingPolicy.ts
    ClaimLedgerWriter.ts
  generation/
    GroundedPageGenerator.ts
  verifier/
    SourceGroundingVerifier.ts
EvidencePackBuilder
Input:
- page spec,
- repo map,
- symbols,
- contracts,
- examples,
- existing docs,
- human notes.
Output:
- `evidence-pack.v1.json`.
ClaimExtractor
Input:
- generated MDX.
Output:
- candidate claims.
Can be implemented in phases:
- rule-based extraction for code spans, endpoints, config names,
- LLM structured extraction for prose claims,
- hybrid extraction with verifier checks.
ClaimMatcher
Input:
- candidate claims,
- evidence pack,
- source index.
Output:
- claim ledger.
ConflictDetector
Input:
- competing evidence values.
Output:
- conflict report.
SourceGroundingVerifier
Input:
- generated MDX,
- claim ledger,
- page spec,
- evidence pack.
Output:
- pass/fail report.
20. Claim extraction strategy
You do not need perfect NLP on day one.
Start with high-signal patterns.
Pattern 1 — API endpoints
Regex:
\b(GET|POST|PUT|PATCH|DELETE)\s+(/[^\s`]+)
Candidate claim:
{
"claimType": "api_operation",
"method": "POST",
"path": "/v1/invoices"
}
Pattern 2 — HTTP status codes
Regex:
\b(200|201|204|400|401|403|404|409|422|429|500)\b
Then inspect the surrounding sentence.
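The first two patterns can be turned into a small rule-based extractor. This is a sketch, not a complete claim extractor: `extractEndpointClaims` and `extractStatusClaims` are hypothetical names, trailing punctuation stripping is an added convenience, and the sentence splitter is intentionally naive.

```typescript
interface EndpointClaim {
  claimType: 'api_operation';
  method: string;
  path: string;
}

interface StatusClaim {
  claimType: 'http_status';
  status: number;
  sentence: string; // kept so a later step can inspect the context
}

function extractEndpointClaims(text: string): EndpointClaim[] {
  const pattern = /\b(GET|POST|PUT|PATCH|DELETE)\s+(\/[^\s`]+)/g;
  const claims: EndpointClaim[] = [];
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    // Drop sentence punctuation that the path pattern may have swallowed.
    const path = (m[2] ?? '').replace(/[.,;:)]+$/, '');
    claims.push({ claimType: 'api_operation', method: m[1] ?? '', path });
  }
  return claims;
}

function extractStatusClaims(text: string): StatusClaim[] {
  const claims: StatusClaim[] = [];
  // Naive sentence split: break after ".", "!" or "?" followed by whitespace.
  for (const sentence of text.split(/(?<=[.!?])\s+/)) {
    const pattern = /\b(200|201|204|400|401|403|404|409|422|429|500)\b/g;
    let m: RegExpExecArray | null;
    while ((m = pattern.exec(sentence)) !== null) {
      claims.push({ claimType: 'http_status', status: Number(m[1]), sentence });
    }
  }
  return claims;
}
```

Status matches are only candidates; a number such as `404` in an unrelated sentence still needs the nearby-sentence check before it counts as a claim.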
Pattern 3 — Config keys
Patterns:
`[A-Z][A-Z0-9_]+`
`[a-z][a-zA-Z0-9_.-]+`
Then match against config schema, env parser, or docs config.
Pattern 4 — CLI commands
Patterns:
```bash
<command>
```
Then parse command name and subcommand.
### Pattern 5 — Architecture edges
Extract Mermaid edges:
```txt
A --> B
A -.-> B
A -- label --> B
```
Then match nodes/edges against relation graph.
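A line-based parser is enough to get started with the three edge forms listed above. The sketch below uses hypothetical names (`parseMermaidEdges`, `findUnsupportedEdges`) and assumes diagram node IDs are the same IDs used in the relation graph; chained edges on one line (`A --> B --> C`) are not handled.

```typescript
interface Edge {
  from: string;
  to: string;
}

// Matches "A --> B", "A -.-> B" and "A -- label --> B", with optional [labels].
const EDGE = /^\s*([A-Za-z0-9_]+)(?:\[[^\]]*\])?\s*(?:-->|-\.->|--\s+[^-]+?\s+-->)\s*([A-Za-z0-9_]+)/;

function parseMermaidEdges(source: string): Edge[] {
  const edges: Edge[] = [];
  for (const line of source.split('\n')) {
    const m = EDGE.exec(line);
    if (m !== null && m[1] !== undefined && m[2] !== undefined) {
      edges.push({ from: m[1], to: m[2] });
    }
  }
  return edges;
}

// Returns diagram edges that have no matching edge in the relation graph.
function findUnsupportedEdges(diagram: Edge[], known: Edge[]): Edge[] {
  const knownKeys = new Set(known.map((e) => `${e.from}->${e.to}`));
  return diagram.filter((e) => !knownKeys.has(`${e.from}->${e.to}`));
}
```

Each unsupported edge can then be written to the claim ledger as an `architecture_relation` claim with no source references.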
Pattern 6 — Strong adjectives
Flag unsupported marketing-like claims:
scalable
secure
production-ready
enterprise-grade
high-performance
battle-tested
fault-tolerant
zero-downtime
These are not always false, but they require strong evidence.
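Flagging these terms needs nothing more than a word-boundary search. A minimal sketch, with `STRONG_TERMS` and `flagStrongTerms` as hypothetical names and the term list copied from above:

```typescript
const STRONG_TERMS = [
  'scalable',
  'secure',
  'production-ready',
  'enterprise-grade',
  'high-performance',
  'battle-tested',
  'fault-tolerant',
  'zero-downtime',
];

// Returns the flagged terms found in the text, in list order.
function flagStrongTerms(text: string): string[] {
  const lower = text.toLowerCase();
  // Word boundaries keep "secure" from matching inside "insecure".
  return STRONG_TERMS.filter((term) => new RegExp(`\\b${term}\\b`).test(lower));
}
```

A flagged term does not reject the sentence by itself; it routes the claim to the stricter `performance_claim` or `security_claim` checks.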
21. Matching claims to sources
Matching can be exact or semantic.
Exact matching
Good for:
- endpoint method/path,
- config key,
- CLI command,
- schema field,
- status code,
- file path,
- symbol name.
Structural matching
Good for:
- method + path + status,
- function + exported module,
- config key + default value,
- migration table + column.
Semantic matching
Good for:
- module responsibility,
- architecture explanation,
- design rationale,
- troubleshooting cause.
Use semantic matching carefully.
For high-risk claims, semantic similarity is not enough.
Example:
Claim: The service retries payment gateway calls.
A semantic match to a file named PaymentGatewayClient is not enough.
You need evidence like:
- retry policy config,
- retry library usage,
- test asserting retry,
- runbook describing retry behavior.
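Exact matching is the easiest of the three to implement and should carry most API claims. The sketch below matches an endpoint claim against contract evidence by a normalized `METHOD path` key; `operationKey` and `supportingRefs` are hypothetical names, and the only normalization assumed is upper-casing the method and trimming trailing slashes.

```typescript
interface OperationClaim {
  method: string;
  path: string;
}

interface OperationEvidence {
  method: string;
  path: string;
  sourceRef: string;
}

// Normalizes to "METHOD /path" so "post /v1/invoices/" equals "POST /v1/invoices".
function operationKey(method: string, path: string): string {
  const trimmed = path.replace(/\/+$/, '');
  return `${method.toUpperCase()} ${trimmed === '' ? '/' : trimmed}`;
}

// Returns the sourceRefs that support the claim; an empty array means unsupported.
function supportingRefs(claim: OperationClaim, evidence: OperationEvidence[]): string[] {
  const key = operationKey(claim.method, claim.path);
  return evidence
    .filter((e) => operationKey(e.method, e.path) === key)
    .map((e) => e.sourceRef);
}
```

An empty result maps directly to `groundingStatus: "unsupported"` in the claim ledger, with no semantic fallback for this claim type.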
22. Grounding policy as code
Do not hardcode all rules in prose.
Represent grounding policy as data.
groundingPolicy:
  default:
    unsupported: fail
    weaklyInferred: warn
  claimTypes:
    api_operation:
      minLevel: contract_backed
      allowedSources:
        - openapi
        - route_source
        - contract_test
    http_status:
      minLevel: contract_backed
      allowedSources:
        - openapi_response
        - integration_test
    runtime_behavior:
      minLevel: behavior_backed
      allowedSources:
        - integration_test
        - source_code
        - runbook
    performance_claim:
      minLevel: human_approved
      allowedSources:
        - benchmark
        - production_metric_note
        - approved_docs
    security_claim:
      minLevel: source_backed
      allowedSources:
        - auth_source
        - security_config
        - security_test
        - approved_security_note
This enables:
- stricter enterprise profiles,
- OSS-friendly defaults,
- project-specific exceptions,
- CI enforcement.
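Once the policy is data, evaluating a claim is a short function. The sketch below is one possible reading of the policy, stated as an assumption: a claim passes when its grounding level is at least `minLevel` (treating the six levels as an ordered scale) and at least one of its source types is in `allowedSources`. All names (`evaluateClaim`, `CheckedClaim`, `LEVEL_RANK`) are hypothetical.

```typescript
const LEVEL_RANK = {
  unsupported: 0,
  weakly_inferred: 1,
  source_backed: 2,
  contract_backed: 3,
  behavior_backed: 4,
  human_approved: 5,
} as const;

type LevelName = keyof typeof LEVEL_RANK;
type Verdict = 'pass' | 'warn' | 'fail';

interface ClaimRule {
  minLevel: LevelName;
  allowedSources: string[];
}

interface GroundingPolicy {
  default: { unsupported: Verdict; weaklyInferred: Verdict };
  claimTypes: Record<string, ClaimRule>;
}

interface CheckedClaim {
  claimType: string;
  level: LevelName;
  sourceTypes: string[];
}

function evaluateClaim(policy: GroundingPolicy, claim: CheckedClaim): Verdict {
  // Unsupported claims never reach the per-type rules.
  if (claim.level === 'unsupported') return policy.default.unsupported;
  const rule = policy.claimTypes[claim.claimType];
  if (rule) {
    const levelOk = LEVEL_RANK[claim.level] >= LEVEL_RANK[rule.minLevel];
    const sourceOk = claim.sourceTypes.some((s) => rule.allowedSources.includes(s));
    return levelOk && sourceOk ? 'pass' : 'fail';
  }
  // No specific rule: fall back to the defaults.
  return claim.level === 'weakly_inferred' ? policy.default.weaklyInferred : 'pass';
}
```

Model confidence is deliberately absent from `CheckedClaim`: the verdict depends only on evidence.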
23. Generated MDX block metadata
Every generated block should carry machine-readable metadata.
Example:
<!-- aidocs:block
id="auth-requirements"
type="generated"
sourceRefs="src:openapi.yaml#/components/securitySchemes/BearerAuth,src:src/middleware/auth.ts#L10-L48"
grounding="contract_backed"
lastGenerated="2026-07-04T00:00:00Z"
-->
## Authentication
Requests require a bearer token using the `Authorization` header.
<!-- /aidocs:block -->
Why block metadata matters:
- drift detector knows what source a paragraph depends on,
- human editor can preserve manual blocks,
- verifier can target only changed sections,
- review UI can show evidence per section,
- regeneration can be surgical.
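Tools that consume this metadata need to read it back out of the MDX. A minimal sketch of that step follows; `BlockMeta` and `parseBlockMeta` are hypothetical names, and the parser assumes attributes are written as `key="value"` inside an opening `aidocs:block` comment, as in the example above.

```typescript
interface BlockMeta {
  id: string;
  type: string;
  sourceRefs: string[];
  grounding: string;
}

function parseBlockMeta(mdx: string): BlockMeta[] {
  const blocks: BlockMeta[] = [];
  // Opening comments only: the closing "<!-- /aidocs:block -->" does not match.
  const comment = /<!--\s*aidocs:block\b([\s\S]*?)-->/g;
  let m: RegExpExecArray | null;
  while ((m = comment.exec(mdx)) !== null) {
    const attrs: Record<string, string> = {};
    const attr = /([A-Za-z]+)="([^"]*)"/g;
    const body = m[1] ?? '';
    let a: RegExpExecArray | null;
    while ((a = attr.exec(body)) !== null) {
      if (a[1] !== undefined) attrs[a[1]] = a[2] ?? '';
    }
    blocks.push({
      id: attrs['id'] ?? '',
      type: attrs['type'] ?? '',
      // sourceRefs is a comma-separated list inside one attribute.
      sourceRefs: (attrs['sourceRefs'] ?? '').split(',').filter((s) => s.length > 0),
      grounding: attrs['grounding'] ?? '',
    });
  }
  return blocks;
}
```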
24. Source-grounded generation with repair loop
Generation will fail sometimes.
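The repair loop runs generate, verify, repair, and verify again until the draft passes or the attempt limit is reached.
The repair prompt should not ask the model to “try harder”.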
It should provide concrete violations:
The previous draft contains unsupported claims:
1. "The endpoint is idempotent." No sourceRef supports idempotency.
2. "The API retries failed payments." No retry evidence found.
Revise the MDX by removing unsupported claims or rewriting them as unknowns in the generation report.
Do not add new factual claims.
The system should limit repair attempts.
After two failed repairs, escalate.
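The bounded loop can be sketched with the verifier and the repair step passed in as functions. The names here (`runRepairLoop`, `RepairDeps`, `Violation`) are hypothetical, and the functions are synchronous only to keep the example short; a real repair step would call a model asynchronously.

```typescript
interface Violation {
  claimText: string;
  reason: string;
}

interface RepairDeps {
  verify: (mdx: string) => Violation[];
  repair: (mdx: string, violations: Violation[]) => string;
}

type RepairResult =
  | { status: 'passed'; mdx: string; attempts: number }
  | { status: 'escalated'; mdx: string; attempts: number; violations: Violation[] };

function runRepairLoop(draft: string, deps: RepairDeps, maxRepairs = 2): RepairResult {
  let mdx = draft;
  let violations = deps.verify(mdx);
  let attempts = 0;
  // Each pass hands the concrete violations to the repair step, then re-verifies.
  while (violations.length > 0 && attempts < maxRepairs) {
    mdx = deps.repair(mdx, violations);
    attempts += 1;
    violations = deps.verify(mdx);
  }
  return violations.length === 0
    ? { status: 'passed', mdx, attempts }
    : { status: 'escalated', mdx, attempts, violations };
}
```

The escalated result carries the remaining violations so a human reviewer sees exactly what could not be repaired.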
25. Grounding report
The generated report should be easy to inspect.
Example CLI output:
$ aidocs generate docs/api/invoices/create.mdx --grounded
Generated: docs/api/invoices/create.mdx
Grounding: failed
Claims:
supported: 18
weak: 3
unsupported: 2
conflicts: 1
Unsupported:
- "The endpoint is idempotent when Idempotency-Key is supplied."
reason: no source found
action: remove or add source
Conflict:
- Missing auth status: OpenAPI says 401, test says 403
action: avoid exact status claim
Next:
aidocs review docs/api/invoices/create.mdx
aidocs repair docs/api/invoices/create.mdx
The report should be blunt.
Do not hide unsupported content behind green checkmarks.
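The counts in such a report can be derived directly from the claim ledger. The sketch below assumes hypothetical names (`summarizeGrounding`, `renderSummary`) and one specific rule that the example output suggests but does not state outright: grounding fails when any claim is unsupported or any conflict is open.

```typescript
type LedgerStatus = 'supported' | 'weak' | 'unsupported';

interface ReportClaim {
  text: string;
  groundingStatus: LedgerStatus;
}

interface GroundingSummary {
  supported: number;
  weak: number;
  unsupported: number;
  conflicts: number;
  grounding: 'passed' | 'failed';
}

function summarizeGrounding(claims: ReportClaim[], conflicts: number): GroundingSummary {
  const counts = { supported: 0, weak: 0, unsupported: 0 };
  for (const claim of claims) counts[claim.groundingStatus] += 1;
  // Assumed rule: any unsupported claim or open conflict fails the page.
  const failed = counts.unsupported > 0 || conflicts > 0;
  return { ...counts, conflicts, grounding: failed ? 'failed' : 'passed' };
}

function renderSummary(file: string, s: GroundingSummary): string {
  return [
    `Generated: ${file}`,
    `Grounding: ${s.grounding}`,
    'Claims:',
    `  supported: ${s.supported}`,
    `  weak: ${s.weak}`,
    `  unsupported: ${s.unsupported}`,
    `  conflicts: ${s.conflicts}`,
  ].join('\n');
}
```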
26. Knowledge graph integration
Source-grounded generation should feed the knowledge graph.
Example concept node:
{
"nodeId": "api.POST./v1/invoices",
"type": "ApiOperation",
"label": "Create invoice",
"sourceRefs": ["src:openapi.yaml#/paths/~1v1~1invoices/post"],
"documentedBy": ["docs/api/invoices/create.mdx#create-invoice"],
"claims": ["claim.001", "claim.003"],
"confidence": 0.96
}
This enables:
- backlink from Logseq note to docs page,
- OpenNote semantic search result with provenance,
- “what docs depend on this endpoint?” queries,
- drift analysis.
Generated knowledge should not be a disconnected note dump.
It should be linked to claims and sources.
27. CLI commands
Recommended commands:
aidocs evidence build docs/api/invoices/create.mdx
Build the evidence pack for a target page.
aidocs claims extract docs/api/invoices/create.mdx
Extract claims from an MDX file.
aidocs claims verify docs/api/invoices/create.mdx
Map claims to sources and emit a claim ledger.
aidocs generate docs/api/invoices/create.mdx --grounded
Generate with evidence pack and grounding policy.
aidocs review --unsupported
Show unsupported claims across generated docs.
aidocs explain-claim claim.001
Explain why a claim passed or failed.
28. Testing source-grounded generation
Test the system with fixtures.
Fixture 1 — supported endpoint
Input:
- OpenAPI endpoint exists,
- test confirms status,
- generated page claims endpoint/status.
Expected:
- claim supported.
Fixture 2 — hallucinated endpoint
Input:
- generated page mentions nonexistent endpoint.
Expected:
- claim unsupported.
Fixture 3 — status conflict
Input:
- OpenAPI says `401`,
- test says `403`.
Expected:
- conflict detected.
Fixture 4 — fake architecture edge
Input:
- Mermaid diagram says `API --> Redis`,
- no dependency or config evidence.
Expected:
- edge unsupported.
Fixture 5 — manual section
Input:
- human-owned paragraph has owner metadata.
Expected:
- not overwritten,
- review date checked.
Fixture 6 — unsupported performance claim
Input:
- generated page says “high-performance”.
Expected:
- rejected unless benchmark evidence exists.
29. Failure modes
Failure mode: evidence overload
Too much source context makes the model less reliable.
Fix:
- evidence pack must be page-specific,
- context packing must rank and compress.
Failure mode: source laundering
The model cites a source that does not actually support the claim.
Fix:
- verifier checks citation-content alignment.
Failure mode: weak inference presented as fact
The model infers module purpose from names and writes it as certain.
Fix:
- weakly inferred claims must be phrased cautiously or reviewed.
Failure mode: stale human notes
Approved notes can become stale.
Fix:
- review cadence,
- source dependency,
- drift detection.
Failure mode: exactness where only generality is supported
Evidence supports “auth fails”, but not “auth fails with 401”.
Fix:
- claim granularity matching.
Failure mode: diagram hallucination
Architecture diagram contains invented edges.
Fix:
- parse diagrams as claims.
30. Design invariant checklist
Before accepting generated docs, check:
- Does every factual generated block have source references?
- Does every API claim map to contract/source/test evidence?
- Does every example map to a real test, fixture, or contract?
- Are unsupported claims removed or marked for review?
- Are conflicts reported instead of hidden?
- Are human-owned sections preserved?
- Are risky claims like security/performance treated strictly?
- Does the claim ledger exist?
- Can a reviewer inspect why a claim passed?
- Can drift detection later find dependent pages?
If yes, the system is becoming trustworthy.
31. References
- Code2Prompt repository: https://github.com/mufeedvh/code2prompt
- OpenAI, “Why language models hallucinate”: https://openai.com/index/why-language-models-hallucinate/
- Diátaxis documentation framework: https://diataxis.fr/
- OpenAPI Specification: https://spec.openapis.org/oas/latest.html
- Mermaid documentation: https://mermaid.js.org/
- Google SRE Workbook, on-call/playbooks: https://sre.google/workbook/on-call/
32. What we have now
We now have a source-grounding layer.
The system can:
- build an evidence pack,
- constrain generation to evidence,
- extract claims from MDX,
- map claims to source references,
- detect unsupported claims,
- report conflicts,
- preserve provenance,
- feed the knowledge graph.
The next part builds on this directly.
If every generated claim has source references, then every generated claim can also become stale when its sources change.
That brings us to documentation drift detection.