Learn Ai Driven Documentation Part 021 Code To Docs Implementation
title: Learn AI-Driven Documentation and Technical Writing Implementation and Usage - Part 021
description: "Deep implementation guide for code-to-docs systems: static analysis, AST extraction, symbol graphs, semantic summaries, examples, ownership, validation, and AI-assisted documentation generation from source code."
series: learn-ai-driven-documentation
seriesTitle: Learn AI-Driven Documentation and Technical Writing Implementation and Usage
order: 21
partTitle: Code-to-Docs Implementation
tags:
  - ai
  - documentation
  - technical-writing
  - code-to-docs
  - static-analysis
  - ast
  - symbol-graph
  - docs-as-code
  - series
date: 2026-06-30
Part 021 — Code-to-Docs Implementation
1. Why This Part Exists
Code-to-docs is one of the most attractive AI documentation use cases, but also one of the easiest to implement badly.
The naive version says:
Read source code and ask an LLM to explain it.
That can help during local exploration, but it is not an engineering-grade documentation system. It does not know ownership, public API boundaries, runtime behavior, version compatibility, examples, operational constraints, or whether the generated text is still true after the next commit.
A production-grade code-to-docs system must answer harder questions:
- Which source files are allowed to become documentation sources?
- Which code symbols are part of a stable contract and which are internal implementation details?
- Which claims can be derived mechanically and which require human verification?
- How do we prevent generated docs from becoming stale faster than manually written docs?
- How do we attach evidence to AI-generated explanations?
- How do we keep code documentation useful without turning it into noisy line-by-line paraphrase?
- How do we review generated documentation in a PR workflow?
- How do we support multiple languages and frameworks without hardcoding one parser per team forever?
This part focuses on implementation.
We are not re-teaching Java internals, API design, domain modeling, persistence, or event governance. Those belong in separate deep series. Here, code is treated as a documentation source and a machine-readable evidence stream.
The core principle:
Code-to-docs should extract structure deterministically, generate narrative cautiously, validate claims automatically where possible, and require human ownership for published truth.
2. Kaufman Framing
Kaufman's method tells us to deconstruct the skill into sub-skills, learn enough to self-correct, remove practice barriers, and spend focused time producing real outputs.
For code-to-docs, the sub-skills are not "write a prompt". They are:
| Sub-skill | What You Must Be Able to Do |
|---|---|
| Boundary detection | Identify public API, internal code, generated code, test fixtures, and examples. |
| Source extraction | Parse source into symbols, relationships, signatures, comments, annotations, and metadata. |
| Semantic classification | Classify symbols by role: entrypoint, DTO, service, repository, config, handler, adapter, command, policy. |
| Evidence modeling | Connect each generated claim to files, line ranges, symbols, tests, specs, commits, or owners. |
| Narrative generation | Produce useful docs without paraphrasing code mechanically. |
| Example generation | Derive or propose usage examples that compile, run, or pass validation. |
| Drift detection | Detect when code changes invalidate docs. |
| Review integration | Route generated docs through owners, CI checks, and editorial gates. |
| Safety control | Prevent leakage of secrets, internal-only behavior, exploit details, and misleading guarantees. |
2.1 Target Performance Level
After this part, you should be able to design and implement a code-to-docs pipeline that can:
- Parse a repository and build a language-neutral symbol inventory.
- Generate module-level and symbol-level documentation drafts.
- Attach evidence to generated documentation.
- Distinguish extractive facts from inferred explanations.
- Validate code snippets and examples in CI.
- Detect stale docs when code changes.
- Route risky documentation changes to human owners.
- Avoid publishing AI-generated claims without verification.
2.2 Practice Output
The concrete practice output for this part is a small but realistic pipeline:
repository source
-> source scanner
-> parser / language adapter
-> symbol graph
-> documentation intent classifier
-> context packet builder
-> LLM draft generator
-> claim verifier
-> docs PR creator
-> human review
-> docs site publish
This is the smallest architecture that has enough structure to be useful beyond a demo.
3. Code-to-Docs Is Not Code Explanation
A code explanation is usually local:
What does this function do?
Documentation is operational:
What does a reader need to know to correctly use, change, extend, operate, or govern this code?
The difference matters.
3.1 Bad Code-to-Docs Output
Bad output often looks like this:
The `PaymentProcessor` class processes payments. It has a method called `processPayment` that takes a request and returns a response. It checks the request, calls the repository, and returns the result.
This is low value because it repeats obvious syntax. It does not answer:
- when to use it,
- what invariants it enforces,
- what errors can occur,
- what dependencies it has,
- what state transitions it triggers,
- what observability signals it emits,
- how it should be tested,
- what not to call directly,
- and what contract must remain stable.
3.2 Good Code-to-Docs Output
Good output is reader-task oriented:
`PaymentProcessor` is the application-layer coordinator for payment submission.
Use it from command handlers, not directly from HTTP controllers, because it assumes the request has already passed transport-level authentication and request-shape validation.
It enforces these business invariants:
- an order can only be charged when its state is `PAYMENT_PENDING`,
- duplicate idempotency keys return the existing payment attempt,
- provider failures are stored as retryable attempts unless the provider marks the failure as permanent.
Changing this component requires reviewing:
- `PaymentAttemptRepository`, because persistence state is mutated during processing,
- `PaymentProviderClient`, because external provider errors are normalized here,
- `payment-submitted` event consumers, because successful processing emits downstream events.
This output is better because it explains boundaries, invariants, dependencies, and change impact.
3.3 The Code-to-Docs Value Ladder
| Level | Output | Value | Risk |
|---|---|---|---|
| 1 | Symbol list | Low but reliable | Low |
| 2 | Signature reference | Useful for navigation | Low |
| 3 | Extracted comments | Useful if comments are good | Medium |
| 4 | Structural summary | Useful for onboarding | Medium |
| 5 | Behavioral explanation | High value | High |
| 6 | Change impact analysis | Very high value | High |
| 7 | Operational guidance | Very high value | Very high |
| 8 | Contract guarantees | Highest value | Highest risk |
AI can help at every level, but the validation requirement increases as value increases.
4. Mental Model: Code as Evidence, Not as Complete Truth
Source code is an important source of truth, but it is not the whole truth.
Code tells us:
- signatures,
- control flow,
- data flow,
- annotations,
- imports,
- call relationships,
- error handling branches,
- constants,
- default values,
- comments,
- test assertions,
- dependency references.
Code often does not fully tell us:
- product intent,
- business rationale,
- historical decision context,
- operational constraints,
- external service behavior,
- regulatory meaning,
- compatibility promises,
- what customers rely on,
- what must not change.
Therefore, code-to-docs should separate claim types.
4.1 Claim Taxonomy
| Claim Type | Example | Can Be Mechanically Verified? | Required Review |
|---|---|---|---|
| Structural | OrderService has submitOrder() | Yes | Low |
| Signature | submitOrder accepts SubmitOrderCommand | Yes | Low |
| Dependency | OrderService calls PaymentClient | Usually | Medium |
| Behavior | duplicate command returns existing order | Sometimes via tests | Medium/High |
| Invariant | order cannot be paid twice | Sometimes | High |
| Operational | retry after 5xx is safe | Rarely from code alone | High |
| Compliance | audit trail satisfies retention policy | No | Very High |
| Product | customers see state as "submitted" | No | Product/Domain review |
A code-to-docs system should store this taxonomy in metadata.
Example frontmatter:
aiGenerated: true
sourceType: code-to-docs
claimRisk: medium
evidence:
  - type: source
    path: services/order/src/main/java/com/acme/order/OrderService.java
    symbols:
      - com.acme.order.OrderService#submitOrder
  - type: test
    path: services/order/src/test/java/com/acme/order/OrderServiceTest.java
reviewRequired:
  - technical-owner
  - domain-owner
This prevents generated documentation from pretending every statement has the same confidence.
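One way to make the taxonomy enforceable is a lookup from claim type to a minimum review level. The sketch below is a minimal illustration, not an existing tool: the `ClaimType`, `ReviewLevel`, and `claimsNeedingElevatedReview` names are assumptions, and where the table says "Medium/High" the sketch rounds up to the stricter level.

```typescript
// Claim types mirror the rows of the claim taxonomy table.
type ClaimType =
  | "structural"
  | "signature"
  | "dependency"
  | "behavior"
  | "invariant"
  | "operational"
  | "compliance"
  | "product";

type ReviewLevel = "low" | "medium" | "high" | "very-high" | "product-domain";

// Assumption: "Medium/High" in the table is rounded up to "high".
const REQUIRED_REVIEW: Record<ClaimType, ReviewLevel> = {
  structural: "low",
  signature: "low",
  dependency: "medium",
  behavior: "high",
  invariant: "high",
  operational: "high",
  compliance: "very-high",
  product: "product-domain",
};

type Claim = { text: string; type: ClaimType };

// Returns every claim whose type requires more than low-level review.
function claimsNeedingElevatedReview(claims: Claim[]): Claim[] {
  return claims.filter((claim) => REQUIRED_REVIEW[claim.type] !== "low");
}
```

A pipeline could run this over a draft's claim list and block publishing until each returned claim has a named reviewer.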
5. Architecture Overview
A robust code-to-docs implementation has several stages.
Each stage should have a clear contract.
| Stage | Responsibility | Should Not Do |
|---|---|---|
| Source scanner | Find candidate files and classify them | Interpret behavior deeply |
| Language adapter | Parse language-specific syntax | Generate final prose |
| Symbol inventory | Store symbols and signatures | Infer business intent |
| Relationship graph | Store calls, imports, inheritance, config links | Pretend dynamic behavior is fully known |
| Intent classifier | Decide what doc type is needed | Publish docs |
| Context builder | Assemble bounded evidence | Include unlimited repo context |
| LLM generator | Draft narrative | Invent unverified guarantees |
| Verifier | Check claims against evidence | Approve product claims alone |
| Formatter | Produce MDX/Markdown | Change technical meaning |
| PR creator | Route review | Bypass owners |
6. Source Scanning
The first implementation mistake is to feed the whole repository to AI. That creates noise, cost, leakage risk, and weak traceability.
Instead, scan and classify files before generation.
6.1 File Categories
| Category | Examples | Use in Docs? |
|---|---|---|
| Public source | exported modules, public classes, controllers, SDK APIs | Yes |
| Internal source | private helpers, package-private implementation | Usually summarized only at module level |
| Tests | unit, integration, contract, golden tests | Yes, as behavior evidence |
| Config | route config, feature flags, deployment manifests | Yes, with caution |
| Generated code | protobuf output, OpenAPI generated clients | Usually no; source spec is better |
| Vendor code | copied dependencies | No |
| Build scripts | Gradle, Maven, npm, Make, CI | Yes for setup/build docs |
| Secrets/config samples | .env.example, Helm values | Yes, with redaction |
| Fixtures | test data | Only as examples if sanitized |
6.2 Ignore Rules
Define explicit ignore rules:
codeToDocs:
  ignore:
    - "**/target/**"
    - "**/build/**"
    - "**/node_modules/**"
    - "**/vendor/**"
    - "**/*.generated.*"
    - "**/generated/**"
    - "**/.terraform/**"
    - "**/secrets/**"
  includeEvidenceFromTests: true
  includePrivateSymbols: false
  maxFileSizeKb: 256
Do not rely only on .gitignore. Documentation scanning has different safety concerns.
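As a rough sketch of how a scanner might apply these rules, the code below converts a small subset of glob syntax into regular expressions and combines it with the size limit. It is an assumption-laden illustration: `ScanConfig`, `globToRegExp`, and `shouldScan` are hypothetical names, and only `**/`, `**`, and `*` are handled. A real implementation would more likely use an established glob library.

```typescript
// Hypothetical shape mirroring part of the ignore-rule config above.
type ScanConfig = { ignore: string[]; maxFileSizeKb: number };

// Convert a small glob subset ("**/", "**", "*") into an anchored RegExp.
function globToRegExp(glob: string): RegExp {
  let out = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith("**/", i)) {
      out += "(?:.*/)?"; // zero or more leading directories
      i += 3;
    } else if (glob.startsWith("**", i)) {
      out += ".*"; // anything, including "/"
      i += 2;
    } else if (glob.charAt(i) === "*") {
      out += "[^/]*"; // anything within a single path segment
      i += 1;
    } else {
      // Escape regex metacharacters in literal characters.
      out += glob.charAt(i).replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp("^" + out + "$");
}

// A file is scanned only if it is small enough and matches no ignore rule.
function shouldScan(path: string, sizeKb: number, config: ScanConfig): boolean {
  if (sizeKb > config.maxFileSizeKb) return false;
  return !config.ignore.some((pattern) => globToRegExp(pattern).test(path));
}
```

Paths are assumed to be repository-relative with forward slashes.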
6.3 Source Classification Heuristics
A first version can use path and extension heuristics:
| Signal | Likely Meaning |
|---|---|
| `/api/`, `/controllers/`, `/routes/` | external entrypoints |
| `/domain/`, `/model/`, `/aggregate/` | business model |
| `/infra/`, `/adapter/`, `/client/` | integration boundary |
| `/config/` | runtime configuration |
| `/test/`, `/spec/`, `/__tests__/` | behavior evidence |
| `*Controller`, `*Handler`, `*Resource` | request entrypoint |
| `*Service`, `*UseCase`, `*CommandHandler` | application operation |
| `*Repository`, `*Dao` | persistence boundary |
| `*Client`, `*Gateway`, `*Adapter` | external dependency |
| `*Policy`, `*Rule`, `*Validator` | business rule/invariant |
These heuristics should be treated as hints, not truth.
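The table above could be encoded as two rule lists, one for path segments and one for file-name suffixes. This is a minimal sketch: `Hint`, `PATH_RULES`, `NAME_RULES`, and `classify` are hypothetical names, and the function returns every matching hint instead of picking a winner, since the signals are only hints.

```typescript
type Hint = { meaning: string; source: "path" | "name" };

// Path-segment rules copied from the heuristics table.
const PATH_RULES: Array<[RegExp, string]> = [
  [/\/(api|controllers|routes)\//, "external entrypoints"],
  [/\/(domain|model|aggregate)\//, "business model"],
  [/\/(infra|adapter|client)\//, "integration boundary"],
  [/\/config\//, "runtime configuration"],
  [/\/(test|spec|__tests__)\//, "behavior evidence"],
];

// File-name suffix rules copied from the heuristics table.
const NAME_RULES: Array<[RegExp, string]> = [
  [/(Controller|Handler|Resource)$/, "request entrypoint"],
  [/(Service|UseCase|CommandHandler)$/, "application operation"],
  [/(Repository|Dao)$/, "persistence boundary"],
  [/(Client|Gateway|Adapter)$/, "external dependency"],
  [/(Policy|Rule|Validator)$/, "business rule/invariant"],
];

// Returns all hints that match a repository-relative path.
function classify(path: string): Hint[] {
  const hints: Hint[] = [];
  const normalized = "/" + path; // lets a leading directory match "/dir/"
  PATH_RULES.forEach(([pattern, meaning]) => {
    if (pattern.test(normalized)) hints.push({ meaning, source: "path" });
  });
  const segments = path.split("/");
  const fileName = segments[segments.length - 1] ?? "";
  const baseName = fileName.replace(/\.[^.]+$/, ""); // drop the extension
  NAME_RULES.forEach(([pattern, meaning]) => {
    if (pattern.test(baseName)) hints.push({ meaning, source: "name" });
  });
  return hints;
}
```

Note that a name such as `SubmitOrderCommandHandler` matches two name rules. Keeping both hints and letting a later stage resolve the conflict is one way to respect the "hints, not truth" rule.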
7. Parsing Strategy
There are four common approaches.
| Approach | Example | Pros | Cons |
|---|---|---|---|
| Regex | scan for class, function, annotations | Simple | Fragile |
| Language compiler API | Java compiler API, TypeScript compiler API, Roslyn | Accurate | Language-specific |
| Universal parser | Tree-sitter | Multi-language, fast | Semantics limited |
| Framework reflection | runtime route metadata, Spring beans, NestJS modules | High-level | Requires build/runtime context |
A production system often combines them.
7.1 Recommended Layering
Use Tree-sitter or a similar parser for broad, fast coverage. Use compiler APIs for language-specific precision where it matters. Use framework introspection when documentation depends on runtime wiring.
7.2 Symbol Model
Create a language-neutral symbol model.
type SymbolKind =
| "module"
| "package"
| "class"
| "interface"
| "record"
| "enum"
| "function"
| "method"
| "field"
| "route"
| "event-handler"
| "configuration"
| "test-case";
type SymbolNode = {
id: string;
language: "java" | "typescript" | "go" | "csharp" | "python" | "unknown";
kind: SymbolKind;
name: string;
qualifiedName?: string;
path: string;
startLine: number;
endLine: number;
visibility?: "public" | "protected" | "package" | "private" | "exported" | "internal";
signature?: string;
annotations?: string[];
docComment?: string;
owner?: string;
tags?: string[];
};
Do not make the model too language-specific. Keep extension fields for language details.
// Extends the SymbolNode defined above with an open-ended extension field,
// so language-specific details stay out of the shared model.
type ExtensibleSymbolNode = SymbolNode & {
languageDetails?: Record<string, unknown>;
};
7.3 Relationship Model
Docs become useful when symbols are connected.
type EdgeKind =
| "declares"
| "imports"
| "calls"
| "implements"
| "extends"
| "uses-type"
| "throws"
| "emits-event"
| "handles-event"
| "reads-config"
| "writes-state"
| "tested-by"
| "documented-by"
| "owned-by";
type SymbolEdge = {
from: string;
to: string;
kind: EdgeKind;
confidence: "exact" | "inferred" | "heuristic";
evidence: EvidenceRef[];
};
type EvidenceRef = {
path: string;
startLine?: number;
endLine?: number;
commit?: string;
};
The confidence field is important. A static call edge might be exact in a simple function call, inferred in a dependency-injected service, and heuristic in reflection-heavy code.
8. Extract Before You Generate
Do not ask the LLM to discover everything from raw files. First, extract deterministic facts.
8.1 Extracted Facts
| Fact | Extraction Source |
|---|---|
| Symbol names | parser/compiler |
| Visibility | parser/compiler |
| Signatures | parser/compiler |
| Parameters | parser/compiler |
| Return types | parser/compiler |
| Throws/errors | annotations, code, compiler |
| Annotations/decorators | parser/compiler |
| Routes | framework annotations/config |
| Event topics | annotations/config/constants/specs |
| Config keys | constants/config binding |
| Existing comments | doc comments |
| Tests | test names/assertions |
| Owners | CODEOWNERS, service catalog |
| Links to docs | comments/frontmatter/search |
8.2 Generated Explanations
Only after extraction should AI generate narrative.
| Narrative | Required Input |
|---|---|
| Module overview | symbol graph + package structure + README + owners |
| Component responsibility | class/function signature + dependencies + tests |
| Usage guide | public entrypoints + examples + tests |
| Change impact | relationship graph + owners + downstream references |
| Troubleshooting | errors + logs + tests + runbooks |
| Migration note | diff + deprecation metadata + examples |
This separation reduces hallucination and makes the output reviewable.
9. Documentation Intent Classifier
Not every symbol deserves a standalone document.
A code-to-docs system should classify documentation intent.
9.1 Intent Categories
| Intent | Trigger | Output |
|---|---|---|
| Reference | public API, SDK class, exported function | symbol reference page |
| Explanation | complex module, high fan-in, non-obvious policy | conceptual doc |
| How-to | common task, setup, extension point | procedural guide |
| Tutorial | onboarding path | guided exercise |
| Runbook | alerts, operational procedures | operational doc |
| ADR candidate | code shows major design decision | decision draft |
| Deprecated API note | deprecation annotation/tag | migration doc |
9.2 Priority Score
A practical prioritization model:
priority = publicBoundaryWeight
+ changeFrequencyWeight
+ supportTicketWeight
+ incidentWeight
+ onboardingWeight
+ fanInWeight
- existingDocQualityWeight
High-priority generated docs should be reviewed first.
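The formula translates directly into a weighted sum. In this sketch the signal values are assumed to be normalized to the range 0 to 1, and the numbers in `WEIGHTS` are placeholders chosen for illustration, not recommended values.

```typescript
// One field per term in the priority formula; values assumed to be 0..1.
type Signals = {
  publicBoundary: number;
  changeFrequency: number;
  supportTickets: number;
  incidents: number;
  onboarding: number;
  fanIn: number;
  existingDocQuality: number;
};

// Hypothetical weights; tune them against your own backlog.
const WEIGHTS: Signals = {
  publicBoundary: 3,
  changeFrequency: 2,
  supportTickets: 2,
  incidents: 3,
  onboarding: 1,
  fanIn: 2,
  existingDocQuality: 4,
};

// Good existing docs lower the score, so they are subtracted.
function priority(s: Signals, w: Signals = WEIGHTS): number {
  return (
    w.publicBoundary * s.publicBoundary +
    w.changeFrequency * s.changeFrequency +
    w.supportTickets * s.supportTickets +
    w.incidents * s.incidents +
    w.onboarding * s.onboarding +
    w.fanIn * s.fanIn -
    w.existingDocQuality * s.existingDocQuality
  );
}
```

Sorting candidate symbols by this score in descending order gives a review queue.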
10. Context Packet for Code-to-Docs
The LLM should receive a bounded context packet, not an arbitrary folder dump.
10.1 Context Packet Shape
docIntent: component-explanation
targetAudience:
  - backend engineer
  - tech lead
readerTask: "Understand how to safely modify payment submission behavior."
sourceOfTruthPriority:
  - source-code
  - tests
  - openapi-spec
  - adr
  - existing-docs
symbol:
  id: "java:com.acme.payment.PaymentProcessor"
  path: "services/payment/src/main/java/com/acme/payment/PaymentProcessor.java"
  lines: "24-188"
relatedSymbols:
  - "PaymentAttemptRepository"
  - "PaymentProviderClient"
  - "PaymentSubmittedEvent"
knownTests:
  - "PaymentProcessorTest.shouldReturnExistingAttemptForSameIdempotencyKey"
evidence:
  - path: "services/payment/src/main/java/com/acme/payment/PaymentProcessor.java"
    lines: "56-87"
    claim: "idempotency key is checked before provider call"
constraints:
  - "Do not claim provider retry behavior unless supported by tests or runbook."
  - "Separate verified facts from inferred explanation."
outputFormat: mdx
10.2 Context Rules
The context packet should include:
- target reader,
- documentation intent,
- source hierarchy,
- selected symbols,
- related symbols,
- evidence snippets,
- existing docs to preserve,
- style constraints,
- forbidden claims,
- required verification output.
It should exclude:
- secrets,
- production credentials,
- personal data,
- irrelevant files,
- large logs,
- vendor dependencies,
- stale generated docs unless explicitly marked,
- previous AI output unless reviewed and promoted.
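The inclusion and exclusion rules above can be sketched as a small selection function that ranks snippets by source priority and stops adding them when a size budget is reached. Everything here is an assumption for illustration: the `Snippet` shape, the `reviewed` flag for existing docs, and the character-based budget are not part of any existing tool.

```typescript
type SourceKind = "source-code" | "tests" | "openapi-spec" | "adr" | "existing-docs";

type Snippet = { kind: SourceKind; path: string; text: string; reviewed: boolean };

// Same order as sourceOfTruthPriority in the context packet example.
const SOURCE_PRIORITY: SourceKind[] = [
  "source-code",
  "tests",
  "openapi-spec",
  "adr",
  "existing-docs",
];

// Picks the highest-priority snippets that fit within maxChars.
function selectEvidence(snippets: Snippet[], maxChars: number): Snippet[] {
  const ranked = snippets
    // Existing docs are only admitted once a human has reviewed them.
    .filter((s) => s.kind !== "existing-docs" || s.reviewed)
    .sort((a, b) => SOURCE_PRIORITY.indexOf(a.kind) - SOURCE_PRIORITY.indexOf(b.kind));
  const selected: Snippet[] = [];
  let used = 0;
  for (const snippet of ranked) {
    if (used + snippet.text.length > maxChars) continue; // skip what does not fit
    selected.push(snippet);
    used += snippet.text.length;
  }
  return selected;
}
```

Secret and personal-data filtering is deliberately left out of this sketch; it belongs in a separate step that runs before selection.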
11. Prompt Contract for Code-to-Docs
A useful prompt is a contract, not a wish.
You are generating a documentation draft from source evidence.
Reader task:
- Help a backend engineer understand how to safely modify payment submission behavior.
Rules:
- Use only the provided evidence.
- Do not claim runtime behavior unless evidence supports it.
- Separate verified facts from inferred explanations.
- Attach evidence references for important claims.
- Mark missing information explicitly.
- Do not document private helper methods unless they affect public behavior.
- Do not expose secrets, internal tokens, private URLs, or customer data.
Output:
- MDX document.
- Sections: Overview, When to Use, Boundaries, Key Flow, Invariants, Dependencies, Failure Modes, Change Checklist, Evidence Table, Gaps.
The prompt should force the model to admit uncertainty.
11.1 Required Evidence Table
Every generated doc should include a hidden or visible evidence table during review.
| Claim | Evidence | Confidence | Review Needed |
|---|---|---|---|
| `PaymentProcessor` checks idempotency before provider call | PaymentProcessor.java:56-87 | High | Technical owner |
| Provider timeout is retryable | PaymentProviderClientTest.java:44-61 | Medium | SRE owner |
| Duplicate submission is safe | source + tests | Medium | Domain owner |
For public docs, the final evidence table may be hidden or transformed, but the PR review should retain it.
12. Output Patterns
12.1 Module Overview
A module overview should answer:
- What does this module own?
- What does it not own?
- What are its public boundaries?
- What state does it read/write?
- What external systems does it call?
- What events does it emit or consume?
- How do engineers safely change it?
Template:
# Payment Module
## Responsibility
## Non-Responsibilities
## Public Entry Points
## Main Flow
## Domain Invariants
## External Dependencies
## Events
## Configuration
## Operational Notes
## Change Checklist
## Evidence
12.2 Symbol Reference
A symbol reference should be terse.
## `PaymentProcessor.submit()`
Coordinates payment submission for an order that is ready for payment.
### Signature
```java
PaymentResult submit(SubmitPaymentCommand command)
```
### Inputs
| Field | Meaning | Required | Notes |
|---|---|---|---|
| `orderId` | Order to charge | Yes | Must refer to existing order |
| `idempotencyKey` | Duplicate submission key | Yes | Reused keys return existing attempt |
### Behavior
- Loads the order.
- Checks payment state.
- Checks idempotency key.
- Calls provider client.
- Persists payment attempt.
- Emits payment event on success.
### Failure Modes
| Failure | Result | Retry? |
|---|---|---|
| invalid state | domain error | No |
| provider timeout | retryable attempt | Yes |
| permanent provider decline | failed attempt | No |
12.3 Change Impact Summary
## Change Impact
## Change Impact
Changing `PaymentProcessor.submit()` may affect:
- HTTP endpoint: `POST /payments`
- Event: `payment-submitted`
- Tables: `payment_attempts`
- Consumers: settlement service, notification service
- Tests: `PaymentProcessorTest`, `PaymentContractTest`
Before merging, verify:
- idempotency behavior,
- provider failure mapping,
- event payload compatibility,
- rollback behavior,
- dashboard and alert expectations.
This is much more useful than a generated method-by-method catalog.
13. Example Implementation Blueprint
13.1 Repository Layout
/tools/code-to-docs/
  src/
    scanner/
    adapters/
      java/
      typescript/
      go/
      csharp/
    graph/
    context/
    generation/
    verification/
    output/
  config/
    code-to-docs.yml
  prompts/
    module-overview.md
    symbol-reference.md
    change-impact.md
  tests/
/docs/
  engineering/
  generated/
  services/
13.2 Pipeline Command
code-to-docs generate \
--repo . \
--config tools/code-to-docs/config/code-to-docs.yml \
--target services/payment \
--intent module-overview \
--out docs/services/payment/overview.mdx \
--create-pr
13.3 Configuration
project:
  name: payments-platform
  defaultAudience:
    - backend-engineer
    - tech-lead
sourceScanning:
  include:
    - "services/**/src/**"
    - "services/**/test/**"
  exclude:
    - "**/target/**"
    - "**/build/**"
    - "**/generated/**"
    - "**/*.generated.*"
languages:
  java:
    enabled: true
    parser: compiler-api
    includeAnnotations:
      - RestController
      - Controller
      - Service
      - Repository
      - Deprecated
      - Transactional
  typescript:
    enabled: true
    parser: typescript-compiler
  go:
    enabled: true
    parser: tree-sitter
output:
  format: mdx
  includeEvidenceTable: true
  generatedNotice: true
  maxDocLengthWords: 1800
review:
  requireCodeOwner: true
  requireDomainOwnerForInvariants: true
  requireSecurityReviewForAuthOrSecretClaims: true
validation:
  runMarkdownLint: true
  runVale: true
  runSnippetTests: true
  failOnUnsupportedClaim: true
14. Language Adapter Design
A language adapter converts language-specific syntax into the unified symbol model.
interface LanguageAdapter {
language: string;
supports(path: string): boolean;
parse(file: SourceFile): ParsedFile;
extractSymbols(parsed: ParsedFile): SymbolNode[];
extractEdges(parsed: ParsedFile, symbols: SymbolNode[]): SymbolEdge[];
extractDocComments(parsed: ParsedFile, symbols: SymbolNode[]): DocComment[];
extractTests?(parsed: ParsedFile): TestEvidence[];
}
14.1 Java Adapter
For Java, useful signals include:
- package declaration,
- class/interface/record/enum declarations,
- method signatures,
- annotations,
- visibility modifiers,
- Javadoc comments,
- thrown exceptions,
- dependency injection constructor fields,
- test annotations,
- framework annotations.
Example extraction result:
{
"id": "java:com.acme.payment.PaymentProcessor#submit",
"language": "java",
"kind": "method",
"name": "submit",
"qualifiedName": "com.acme.payment.PaymentProcessor.submit",
"path": "services/payment/src/main/java/com/acme/payment/PaymentProcessor.java",
"startLine": 42,
"endLine": 98,
"visibility": "public",
"signature": "PaymentResult submit(SubmitPaymentCommand command)",
"annotations": ["Transactional"],
"tags": ["application-service", "state-mutating"]
}
Javadoc is useful, but should not be blindly trusted. It may be stale. Treat it as one evidence source.
14.2 TypeScript Adapter
For TypeScript:
- exported functions/classes/interfaces,
- route decorators,
- type aliases,
- generics,
- JSDoc/TSDoc comments,
- imports/exports,
- test cases,
- schema validators.
14.3 Go Adapter
For Go:
- exported symbols by capitalization,
- package comments,
- interface definitions,
- function signatures,
- struct tags,
- HTTP handlers,
- test functions,
- error variables.
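To make the adapter idea concrete, here is a deliberately tiny Go adapter that only finds top-level functions and derives visibility from capitalization. It is a toy under stated assumptions: it uses a regular expression, which section 7 already lists as fragile, it ignores methods with receivers, it checks only ASCII capitals, and its `LanguageAdapter` shape is a simplified stand-in for the fuller interface in section 14. A real adapter would more likely build on Tree-sitter or Go's own parser.

```typescript
// Simplified stand-ins for the symbol model and adapter contract.
type SymbolNode = {
  id: string;
  language: string;
  kind: "function";
  name: string;
  path: string;
  startLine: number;
  endLine: number;
  visibility: "exported" | "internal";
};

interface LanguageAdapter {
  language: string;
  supports(path: string): boolean;
  extractSymbols(path: string, source: string): SymbolNode[];
}

const toyGoAdapter: LanguageAdapter = {
  language: "go",
  supports: (path) => path.endsWith(".go"),
  extractSymbols(path, source) {
    const symbols: SymbolNode[] = [];
    source.split("\n").forEach((line, index) => {
      // Matches "func Name(" at the start of a line; receivers are skipped.
      const match = /^func\s+([A-Za-z_]\w*)\s*\(/.exec(line);
      if (!match) return;
      const name = match[1] ?? "";
      const first = name.charAt(0);
      symbols.push({
        id: `go:${path}#${name}`,
        language: "go",
        kind: "function",
        name,
        path,
        startLine: index + 1,
        endLine: index + 1, // only the declaration line is known here
        visibility: first >= "A" && first <= "Z" ? "exported" : "internal",
      });
    });
    return symbols;
  },
};
```

The adapter emits structure only. Deciding whether `Submit` deserves a page, and what that page says, stays with later stages.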
14.4 C# Adapter
For C#:
- namespaces,
- public/internal types,
- attributes,
- XML doc comments,
- controller actions,
- dependency injection registrations,
- test attributes.
14.5 Adapter Rule
A language adapter should extract structure, not write prose.
The generation stage should be language-aware but not language-bound.
15. Symbol Graph Design
A symbol graph allows the system to answer documentation questions that are impossible with isolated files.
15.1 Useful Queries
| Query | Documentation Use |
|---|---|
| What public entrypoints call this component? | Change impact |
| What tests cover this behavior? | Evidence |
| What external systems are called? | Operational docs |
| What config keys are read? | Setup docs |
| What symbols have no docs? | Coverage |
| What docs mention removed symbols? | Drift detection |
| What high-fan-in symbols are undocumented? | Prioritization |
15.2 Graph Example
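As a minimal sketch, the payment relationships described in section 3.2 can be written as a small edge list and queried with plain array operations. The `PaymentController` node is hypothetical and is added only so the graph has an incoming call edge. The confidence values are illustrative guesses, not measured results.

```typescript
type EdgeKind = "calls" | "writes-state" | "emits-event" | "tested-by";

type Edge = {
  from: string;
  to: string;
  kind: EdgeKind;
  confidence: "exact" | "inferred" | "heuristic";
};

const edges: Edge[] = [
  // Hypothetical entrypoint, not taken from the text.
  { from: "PaymentController", to: "PaymentProcessor", kind: "calls", confidence: "inferred" },
  { from: "PaymentProcessor", to: "PaymentProviderClient", kind: "calls", confidence: "inferred" },
  { from: "PaymentProcessor", to: "PaymentAttemptRepository", kind: "writes-state", confidence: "inferred" },
  { from: "PaymentProcessor", to: "payment-submitted", kind: "emits-event", confidence: "heuristic" },
  { from: "PaymentProcessor", to: "PaymentProcessorTest", kind: "tested-by", confidence: "heuristic" },
];

// "What calls this component?" supports change impact analysis.
function callers(symbol: string): string[] {
  return edges.filter((e) => e.kind === "calls" && e.to === symbol).map((e) => e.from);
}

// "What does this component reach through a given edge kind?"
function targets(symbol: string, kind: EdgeKind): string[] {
  return edges.filter((e) => e.kind === kind && e.from === symbol).map((e) => e.to);
}
```

With this data, `callers("PaymentProcessor")` answers the change-impact query and `targets("PaymentProcessor", "tested-by")` answers the evidence query from the table in 15.1.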
15.3 Storage Options
| Option | Good For | Trade-off |
|---|---|---|
| JSON files | simple builds, CI artifacts | limited query power |
| SQLite | local indexing, medium repos | graph traversal manual |
| Graph DB | rich relationships | operational overhead |
| Search index | retrieval and ranking | weak relationship semantics |
| Hybrid | serious internal platform | more implementation complexity |
For a first implementation, SQLite plus JSON artifacts is often enough. Move to graph storage when relationship queries become central.
16. AI Generation Strategy
16.1 Do Not Generate Everything
Do not generate docs for every symbol. That creates documentation spam.
Generate only when one or more conditions are true:
- symbol is a public or exported boundary,
- symbol is high fan-in,
- symbol is high-risk,
- symbol is frequently changed,
- symbol is frequently searched,
- symbol is frequently involved in incidents,
- symbol has poor existing documentation,
- symbol is part of onboarding path,
- symbol appears in support tickets.
16.2 Use Templates
Prompting should be template-driven.
| Doc Type | Template Sections |
|---|---|
| Module overview | responsibility, boundaries, flows, dependencies, change checklist |
| Component explanation | purpose, collaborators, invariants, failure modes |
| Public API reference | signature, params, return, errors, examples |
| Extension guide | extension point, steps, tests, constraints |
| Change impact | affected entrypoints, tests, docs, owners |
| Deprecation note | old behavior, replacement, migration steps, timeline |
16.3 Require Missing Information Section
Every generated draft should include:
## Missing Information
- No test evidence found for retry behavior.
- No ADR found for idempotency strategy.
- No runbook found for provider outage handling.
This is one of the simplest ways to prevent false confidence.
17. Example Generation
Examples are where code-to-docs becomes dangerous.
An example that does not compile or does not match production behavior is worse than no example.
17.1 Example Source Priority
| Priority | Source | Trust |
|---|---|---|
| 1 | Existing tests | Highest |
| 2 | Existing examples | High if tested |
| 3 | Contract tests | High |
| 4 | README snippets | Medium |
| 5 | AI-generated examples | Low until tested |
17.2 Example Lifecycle
17.3 Snippet Testing
For Java:
./gradlew test --tests '*DocumentationSnippetTest'
For TypeScript:
npm run test:docs-snippets
For shell examples:
find docs -name '*.sh' -exec shellcheck {} +
For HTTP examples:
newman run docs/examples/payments.postman_collection.json
The exact tool does not matter as much as the invariant:
Published examples must be executable or explicitly marked as illustrative.
18. Drift Detection
Code-to-docs is only useful if it can detect staleness.
18.1 Drift Types
| Drift Type | Example | Detection |
|---|---|---|
| Symbol removed | docs mention deleted method | symbol index check |
| Signature changed | docs show old parameter | signature diff |
| Behavior changed | tests changed but docs not updated | test-to-doc mapping |
| Example broken | snippet no longer compiles | snippet tests |
| Config changed | docs mention old key | config key index |
| Ownership changed | stale team owner | CODEOWNERS/service catalog diff |
| Public contract changed | OpenAPI differs from docs | spec diff |
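The first row of the table, a symbol index check, can be sketched as a scan for backticked call references in a doc that no longer exist in the symbol index. This is a heuristic illustration only: it assumes docs write references in the form `Type.method()` inside backticks, and `referencedSymbols` and `removedSymbolReferences` are hypothetical helper names.

```typescript
// Collects `Type.method()` style references from a doc body, without duplicates.
function referencedSymbols(docText: string): string[] {
  const pattern = /`([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\(\)`/g;
  const found: string[] = [];
  let match: RegExpExecArray | null = pattern.exec(docText);
  while (match !== null) {
    const name = match[1];
    if (name !== undefined && found.indexOf(name) === -1) found.push(name);
    match = pattern.exec(docText);
  }
  return found;
}

// Returns references that the current symbol index no longer contains.
function removedSymbolReferences(docText: string, symbolIndex: string[]): string[] {
  return referencedSymbols(docText).filter((name) => symbolIndex.indexOf(name) === -1);
}
```

Because the match is purely textual, a hit is a prompt for review, not proof that the doc is wrong.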
18.2 Stale Claim Metadata
Each generated claim can store a dependency fingerprint.
claims:
  - id: claim-001
    text: "Duplicate idempotency keys return the existing payment attempt."
    evidence:
      - path: PaymentProcessor.java
        lines: 56-87
        hash: "sha256:abc123"
      - path: PaymentProcessorTest.java
        lines: 44-62
        hash: "sha256:def456"
    reviewOwner: payments-team
When any evidence hash changes, mark the claim as stale.
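A minimal sketch of that rule follows. How the current hash of a line range is computed is left to the caller through the `HashLookup` function, which is an assumption of this sketch, as are the type names. Evidence that can no longer be found is treated as changed.

```typescript
type EvidenceFingerprint = { path: string; lines: string; hash: string };

type TrackedClaim = { id: string; text: string; evidence: EvidenceFingerprint[] };

// Returns the current hash for a path and line range, or undefined if missing.
type HashLookup = (path: string, lines: string) => string | undefined;

// A claim is stale if any of its evidence hashes differs or has disappeared.
function findStaleClaims(claims: TrackedClaim[], currentHash: HashLookup): TrackedClaim[] {
  return claims.filter((claim) =>
    claim.evidence.some((e) => currentHash(e.path, e.lines) !== e.hash)
  );
}
```

Hashing line ranges is simple but brittle when unrelated edits shift line numbers. Hashing by symbol identity is one possible refinement.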
18.3 PR Bot Behavior
On a code PR, the bot can comment:
Documentation impact detected.
Changed symbols:
- `PaymentProcessor.submit()`
Affected docs:
- `docs/services/payment/overview.mdx`
- `docs/services/payment/change-checklist.mdx`
Stale claims:
- duplicate idempotency behavior
- provider timeout retry behavior
Action required:
- Update docs or add `docs-impact: none` with owner approval.
This is high leverage because it keeps documentation updates close to code changes.
19. Review Workflow
Generated documentation should enter the same PR system as hand-written docs.
19.1 Review Checklist
Technical reviewer checks:
- Is the source boundary correct?
- Are public/private details separated?
- Are behavior claims supported by code/tests/specs?
- Are missing gaps clearly marked?
- Are examples correct?
- Are failure modes accurate?
- Is the change impact checklist realistic?
Editorial reviewer checks:
- Is the doc organized by reader task?
- Is the title specific?
- Are paragraphs short?
- Are instructions imperative and direct?
- Is terminology consistent?
- Does it avoid overclaiming?
- Does it link to reference docs instead of duplicating them?
Security reviewer checks:
- Are secrets removed?
- Are internal-only endpoints hidden?
- Are exploit steps avoided?
- Is auth behavior described safely?
- Are logs/screenshots sanitized?
20. Security Controls
Code-to-docs systems can leak sensitive information because they process source repositories.
20.1 Sensitive Inputs
| Sensitive Input | Risk |
|---|---|
| secrets in code | credential leakage |
| private endpoints | exposure of internal attack surface |
| feature flags | leaking unreleased features |
| customer-specific rules | confidentiality breach |
| incident workarounds | exploit guidance |
| auth logic details | bypass hints |
| infrastructure manifests | environment exposure |
20.2 Safety Filters
Implement filters before LLM context assembly:
20.3 Policy Example
security:
  blockPatterns:
    - "AKIA[0-9A-Z]{16}"
    - "-----BEGIN PRIVATE KEY-----"
  restrictedPaths:
    - "infra/secrets/**"
    - "ops/break-glass/**"
  requireSecurityReviewFor:
    - authentication
    - authorization
    - encryption
    - token
    - secret
    - admin
    - bypass
  publicDocsDenyTags:
    - internal-only
    - customer-specific
    - exploit-sensitive
The system should fail closed for public docs.
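One way to read "fail closed" in code: a file is admitted to the LLM context only when every check passes, and a broken rule blocks the file instead of being skipped. The sketch below assumes a simplified policy shape and treats each `restrictedPaths` entry as a directory prefix by dropping a trailing `**`. `checkCandidate` and the other names are hypothetical.

```typescript
type SecurityPolicy = { blockPatterns: string[]; restrictedPaths: string[] };

type Candidate = { path: string; text: string };

type Verdict = { allowed: boolean; reason?: string };

// Compiles a pattern, returning null instead of throwing on invalid input.
function compile(source: string): RegExp | null {
  try {
    return new RegExp(source);
  } catch (_err) {
    return null;
  }
}

function checkCandidate(candidate: Candidate, policy: SecurityPolicy): Verdict {
  for (const restricted of policy.restrictedPaths) {
    // Simplification: "dir/**" is treated as the plain prefix "dir/".
    const prefix = restricted.replace(/\*\*$/, "");
    if (candidate.path.startsWith(prefix)) {
      return { allowed: false, reason: "restricted path: " + restricted };
    }
  }
  for (const source of policy.blockPatterns) {
    const pattern = compile(source);
    if (pattern === null) {
      // Fail closed: an unreadable rule blocks the file.
      return { allowed: false, reason: "invalid block pattern: " + source };
    }
    if (pattern.test(candidate.text)) {
      return { allowed: false, reason: "matched block pattern" };
    }
  }
  return { allowed: true };
}
```

Pattern matching of this kind catches only known secret formats, so it complements the restricted-path and review rules without replacing them.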
21. Anti-Patterns
21.1 Line-by-Line Paraphrase
Bad:
The method creates a variable called `result`. Then it checks if result is null.
Better:
The method treats missing provider results as retryable failures and persists a failed attempt before returning.
21.2 Treating Comments as Truth
Existing comments may be stale. Use them, but verify against code and tests.
21.3 Documenting Private Helpers Publicly
Private helper docs create noise and expose unnecessary internals. Summarize internals only when they matter for safe modification.
21.4 Generating Examples Without Running Them
AI-generated examples must be tested or marked illustrative.
21.5 No Ownership
Generated docs without owners become abandoned artifacts.
21.6 No Claim Risk Model
A generated class summary and a generated compliance guarantee do not carry the same risk. Without a risk model, both pass through the same review path.
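One possible shape for a claim risk model is a keyword-based tier function, sketched below. The tier names, the `claim_risk` function, and the list of absolute words are assumptions made for illustration; the high-risk terms reuse the `requireSecurityReviewFor` list from the policy example plus "compliance".

```python
import re

# Terms taken from the policy example, plus "compliance" (an assumption of this sketch).
HIGH_RISK_TERMS = {
    "authentication", "authorization", "encryption", "token",
    "secret", "admin", "bypass", "compliance",
}
# Absolute wording tends to signal a behavioral guarantee (hypothetical list).
ABSOLUTE_TERMS = {"always", "never", "guarantees", "guaranteed"}


def claim_risk(claim: str) -> str:
    """Return 'high', 'medium', or 'low' for one generated claim."""
    words = set(re.findall(r"[a-z]+", claim.lower()))
    if words & HIGH_RISK_TERMS:
        return "high"
    if words & ABSOLUTE_TERMS:
        return "medium"
    return "low"
```

Keyword matching is crude and will miss paraphrases, so a tier like this is better treated as a routing hint for review than as a verdict.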
21.7 Recursive AI Contamination
Do not use unreviewed generated docs as primary evidence for future generated docs.
22. Quality Metrics
Track whether the system improves documentation quality.
| Metric | Meaning |
|---|---|
| documented public symbols | coverage of stable API boundaries |
| stale claim count | drift debt |
| docs PR acceptance rate | quality of generated drafts |
| manual edit distance | how much humans must fix |
| unsupported claim rate | hallucination pressure |
| snippet pass rate | example reliability |
| review latency | workflow cost |
| search success rate | usefulness to readers |
| incident-linked doc gaps | operational risk |
| onboarding task success | practical learning value |
22.1 Manual Edit Distance
A useful metric is how much the reviewer changed the generated text before merge.
High edit distance usually points to one or more of these causes:
- context is insufficient,
- prompt is weak,
- template is wrong,
- source extraction is noisy,
- the AI is asked to infer too much.
Low edit distance does not automatically mean high quality. Reviewers may rubber-stamp. Pair it with defect metrics.
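A rough way to compute this metric is a word-level similarity ratio from Python's standard `difflib` module, as sketched below. This is a proxy and not a true Levenshtein distance, and the `edit_distance_ratio` name is hypothetical.

```python
import difflib


def edit_distance_ratio(generated: str, merged: str) -> float:
    """Return 0.0 when the reviewer changed nothing and 1.0 when no words survived."""
    matcher = difflib.SequenceMatcher(
        a=generated.split(), b=merged.split(), autojunk=False
    )
    return 1.0 - matcher.ratio()


# One of four words was replaced by the reviewer.
print(edit_distance_ratio("a b c d", "a b x d"))  # 0.25
```

Comparing words instead of characters keeps small typo fixes from counting as much as rewritten sentences.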
23. Minimal Viable Implementation
Start small.
23.1 MVP Scope
Implement:
- scan one repository,
- parse one primary language,
- extract public symbols,
- extract doc comments,
- link tests by naming convention,
- generate module overview docs,
- include evidence table,
- run Markdown/prose linting,
- create PR for human review.
Do not start with:
- full graph database,
- all languages,
- automatic public publishing,
- autonomous edits to code,
- organization-wide rollout,
- compliance claims,
- runtime behavior inference.
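The "link tests by naming convention" step from the MVP scope can be sketched with simple filename matching. The sketch assumes Python-style conventions (`test_<module>.py` and `<module>_test.py`); other ecosystems use different conventions, and the `link_tests` name is hypothetical.

```python
from pathlib import PurePosixPath


def link_tests(source_files: list[str], test_files: list[str]) -> dict[str, list[str]]:
    """Map each source file to test files whose names follow the convention."""
    tests_by_target: dict[str, list[str]] = {}
    for test_path in test_files:
        stem = PurePosixPath(test_path).stem
        if stem.startswith("test_"):
            target = stem[len("test_"):]
        elif stem.endswith("_test"):
            target = stem[: -len("_test")]
        else:
            continue  # not a test file under the assumed convention
        tests_by_target.setdefault(target, []).append(test_path)
    return {
        src: sorted(tests_by_target.get(PurePosixPath(src).stem, []))
        for src in source_files
    }
```

Source files that map to an empty list are useful output too: they mark places where the generated doc has no test evidence to cite.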
23.2 MVP Success Criteria
The MVP succeeds if:
- reviewers accept at least 50% of generated structure,
- generated docs identify missing information accurately,
- no unsupported high-risk claims are published,
- examples are either tested or excluded,
- docs PRs are easy to review,
- engineers voluntarily use the output for onboarding or change impact.
24. Advanced Implementation: Diff-Aware Generation
A mature system should generate docs based on code diffs, not only full repo scans.
24.1 Changed Symbol Detection
A change may affect docs when:
- public signature changes,
- parameter is added/removed,
- error behavior changes,
- config key changes,
- event emission changes,
- annotation changes,
- route mapping changes,
- test expectation changes,
- deprecation metadata changes.
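Signature-level changes from the list above can be detected by comparing two symbol inventories. The sketch assumes each inventory is a mapping from symbol name to a signature string, which is a simplification; behavioral changes such as error handling or event emission need other evidence, for example test diffs. The `diff_symbols` name is hypothetical.

```python
def diff_symbols(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Compare two name -> signature inventories and report what changed."""
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(
            name
            for name in before.keys() & after.keys()
            if before[name] != after[name]
        ),
    }
```

Only the symbols reported here need their doc sections revisited, which keeps the generation scope tied to the diff.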
24.2 Docs Patch Strategy
Prefer minimal diffs.
Bad:
Regenerate the whole page on every code change.
Better:
Update only affected sections and evidence metadata.
This keeps review cost low.
25. Advanced Implementation: Behavior Evidence from Tests
Tests are often better documentation evidence than implementation code because they encode expected behavior.
25.1 Test Signal Extraction
Extract:
- test class/function names,
- arrange/act/assert structure,
- expected errors,
- boundary values,
- fixtures,
- mocked external systems,
- contract expectations,
- snapshot outputs.
25.2 Test Naming Quality
If test names are poor, AI documentation will suffer.
Bad:
@Test
void test1() {}
Better:
@Test
void submit_returnsExistingAttempt_whenIdempotencyKeyWasAlreadyUsed() {}
Good test names become high-quality documentation seeds.
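To illustrate why naming matters, here is a small sketch that turns a test name into a readable phrase by splitting on underscores and camelCase boundaries. The `describe_test_name` function is a hypothetical helper, and it assumes the `subject_outcome_condition` naming pattern used in the example above.

```python
import re


def describe_test_name(name: str) -> str:
    """Turn a test identifier into a lowercase, space-separated phrase."""
    words = []
    for part in name.split("_"):
        # Insert a space at each lowercase-to-uppercase boundary (camelCase).
        spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", part)
        words.append(spaced.lower())
    return " ".join(word for word in words if word)


print(describe_test_name("submit_returnsExistingAttempt_whenIdempotencyKeyWasAlreadyUsed"))
# submit returns existing attempt when idempotency key was already used
print(describe_test_name("test1"))
# test1
```

The second call shows the problem with poor names: there is nothing to extract, so the generator has no seed to work from.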
25.3 Behavior Summary from Tests
AI can summarize test evidence:
Observed behavior from tests:
- duplicate idempotency keys return the existing attempt,
- invalid order state produces a domain validation error,
- provider timeout is stored as retryable failure,
- provider permanent decline is stored as non-retryable failure.
The system must label this as test-observed behavior. Tests rarely cover every code path, so the summary is evidence of what was tested, not a complete description of behavior.
26. Integration with Existing Docs
Generated docs should not overwrite human-authored architecture docs.
26.1 Merge Modes
| Mode | Meaning | Use Case |
|---|---|---|
| generated file | full file owned by generator | reference docs |
| generated section | marked region inside human doc | symbol inventory |
| suggested patch | bot proposes changes | narrative docs |
| review comment | no file change | PR assistance |
| evidence artifact | machine-readable metadata only | audit/validation |
26.2 Generated Section Markers
<!-- ai-docs:start source="code-to-docs" symbol="PaymentProcessor" -->
Generated reference content here.
<!-- ai-docs:end -->
Only update inside markers. Never overwrite surrounding human narrative automatically.
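A marker-bounded update can be sketched with a regular expression that rewrites only the text between the start and end comments. The marker format follows the example above; the `replace_generated_section` function is hypothetical, and the sketch assumes exactly one generated section per symbol.

```python
import re


def replace_generated_section(doc: str, symbol: str, new_body: str) -> str:
    """Replace the generated region for one symbol, leaving all other text untouched."""
    pattern = re.compile(
        r'(<!-- ai-docs:start source="code-to-docs" symbol="'
        + re.escape(symbol)
        + r'" -->\n)'
        r".*?"
        r"(<!-- ai-docs:end -->)",
        re.DOTALL,
    )
    updated, count = pattern.subn(
        lambda m: m.group(1) + new_body.rstrip("\n") + "\n" + m.group(2), doc
    )
    if count != 1:
        # Refuse to guess when markers are missing or duplicated.
        raise ValueError(f"expected exactly one section for {symbol!r}, found {count}")
    return updated
```

Raising when the markers are missing or duplicated, instead of appending a new section, keeps the generator from touching human-authored text by accident.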
27. Failure Modeling
27.1 Failure Modes
| Failure | Cause | Mitigation |
|---|---|---|
| Hallucinated behavior | insufficient evidence | evidence table + unsupported claim gate |
| Stale docs | code changes not tracked | claim fingerprints + PR impact bot |
| Noisy docs | too many symbols documented | intent classifier + priority scoring |
| Leaked internals | public/private boundary missing | security classifier + fail-closed policy |
| Reviewer fatigue | huge generated diffs | minimal patch generation |
| Wrong ownership | stale CODEOWNERS | service catalog sync |
| Broken examples | generated snippets not tested | snippet CI |
| Recursive contamination | AI output used as evidence | source hierarchy policy |
| Framework blind spots | static parser misses runtime wiring | framework adapters + human review |
27.2 Debugging Checklist
When output quality is poor, inspect in this order:
- Was the doc intent correct?
- Was the target audience correct?
- Did the context packet include the right symbols?
- Were tests included as evidence?
- Did the prompt separate facts from inference?
- Did the model receive stale generated docs as source?
- Were private/internal symbols overrepresented?
- Was the output template appropriate?
- Did validation catch unsupported claims?
- Did reviewers have enough evidence to correct it?
28. Practical 20-Hour Drill
Hour 1–2: Pick a Small Service
Choose one service with:
- 5–20 public entrypoints,
- meaningful tests,
- existing but imperfect docs,
- clear owner.
Hour 3–5: Build Symbol Inventory
Extract:
- files,
- public classes/functions,
- signatures,
- comments,
- test names,
- owners.
Output JSON.
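For a service written in Python, a first inventory can be sketched with the standard `ast` module. The sketch covers only files, public top-level classes and functions, signatures, and docstrings; test names and owners would come from other sources. The `build_inventory` name and the record fields are assumptions, and other languages would need their own parser.

```python
import ast
import json


def build_inventory(source: str, filename: str) -> list[dict]:
    """List public top-level classes and functions found in one source file."""
    tree = ast.parse(source, filename=filename)
    inventory = []
    for node in tree.body:  # top-level definitions only
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        if node.name.startswith("_"):
            continue  # treat a leading underscore as private
        is_class = isinstance(node, ast.ClassDef)
        inventory.append({
            "file": filename,
            "kind": "class" if is_class else "function",
            "name": node.name,
            "signature": node.name if is_class else f"{node.name}({ast.unparse(node.args)})",
            "line": node.lineno,
            "doc": ast.get_docstring(node),
        })
    return inventory


example = 'def submit(order, key=None):\n    """Submit a payment."""\n'
print(json.dumps(build_inventory(example, "payments.py"), indent=2))
```

Treating a leading underscore as private is only a convention; a stricter inventory could also respect `__all__` where a module defines it.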
Hour 6–8: Generate Module Overview
Build one context packet and generate one module overview.
Include evidence table and missing information.
Hour 9–11: Add Validation
Add:
- markdown lint,
- prose lint,
- link check,
- snippet test if examples exist.
Hour 12–14: Add Drift Detection
Track hashes for evidence snippets.
Mark docs stale when evidence changes.
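Hash tracking for evidence snippets can be sketched with SHA-256 fingerprints from the standard `hashlib` module. The sketch assumes each claim has an identifier and exactly one evidence snippet; the `fingerprint` and `stale_claims` names are hypothetical.

```python
import hashlib


def fingerprint(snippet: str) -> str:
    """Hash a snippet after trimming whitespace noise that should not count as drift."""
    normalized = "\n".join(line.rstrip() for line in snippet.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def stale_claims(recorded: dict[str, str], current_snippets: dict[str, str]) -> list[str]:
    """Return claim ids whose evidence changed or disappeared since the hash was recorded."""
    stale = []
    for claim_id, old_hash in recorded.items():
        snippet = current_snippets.get(claim_id)
        if snippet is None or fingerprint(snippet) != old_hash:
            stale.append(claim_id)
    return sorted(stale)
```

Normalizing trailing whitespace before hashing avoids flagging docs as stale after a formatting-only commit, although a reformat that changes line breaks would still trigger a flag.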
Hour 15–17: Add PR Workflow
Generate a docs branch or patch.
Ask the code owner to review.
Measure edit distance.
Hour 18–20: Improve One Weak Point
Pick the biggest weakness:
- bad context,
- weak prompt,
- missing tests,
- noisy output,
- bad template,
- broken example.
Improve only that.
This is Kaufman's loop: practice, feedback, correction.
29. Final Mental Model
Code-to-docs is not about making AI explain code faster.
It is about building a documentation supply chain from source evidence to reviewed knowledge.
The system should preserve these invariants:
- Extract deterministic facts before generating narrative.
- Keep public documentation separate from internal implementation details.
- Attach evidence to generated claims.
- Mark inference as inference.
- Test examples before publishing.
- Detect drift when source evidence changes.
- Route high-risk docs to human owners.
- Avoid using unreviewed AI output as source truth.
- Optimize for reader tasks, not symbol count.
- Treat documentation as part of the engineering system.
If these invariants hold, AI becomes a multiplier for documentation quality.
If they do not, AI becomes a high-throughput stale-doc generator.
30. References
- OpenAPI Initiative — OpenAPI Specifications: https://www.openapis.org/
- OpenAPI Specification v3.2.0: https://spec.openapis.org/oas/v3.2.0.html
- Oracle JavaDoc Documentation Comment Specification for the Standard Doclet, JDK 25: https://docs.oracle.com/en/java/javase/25/docs/specs/javadoc/doc-comment-spec.html
- Tree-sitter Introduction: https://tree-sitter.github.io/tree-sitter/
- Write the Docs — Docs as Code: https://www.writethedocs.org/guide/docs-as-code/
31. What Comes Next
Part 022 applies the same system thinking to API documentation with OpenAPI.
The key shift is:
Code-to-docs starts from implementation evidence. OpenAPI documentation starts from an explicit HTTP contract.
That changes the source-of-truth model, validation strategy, governance workflow, and AI usage pattern.