Learn AI-Driven Documentation Part 021: Code-to-Docs Implementation


title: Learn AI-Driven Documentation and Technical Writing Implementation and Usage - Part 021
description: "Deep implementation guide for code-to-docs systems: static analysis, AST extraction, symbol graphs, semantic summaries, examples, ownership, validation, and AI-assisted documentation generation from source code."
series: learn-ai-driven-documentation
seriesTitle: Learn AI-Driven Documentation and Technical Writing Implementation and Usage
order: 21
partTitle: Code-to-Docs Implementation
tags:
  - ai
  - documentation
  - technical-writing
  - code-to-docs
  - static-analysis
  - ast
  - symbol-graph
  - docs-as-code
  - series
date: 2026-06-30

Part 021 — Code-to-Docs Implementation

1. Why This Part Exists

Code-to-docs is one of the most attractive AI documentation use cases, but also one of the easiest to implement badly.

The naive version says:

Read source code and ask an LLM to explain it.

That can help during local exploration, but it is not an engineering-grade documentation system. It does not know ownership, public API boundaries, runtime behavior, version compatibility, examples, operational constraints, or whether the generated text is still true after the next commit.

A production-grade code-to-docs system must answer harder questions:

Which source files are allowed to become documentation sources?
Which code symbols are part of a stable contract and which are internal implementation details?
Which claims can be derived mechanically and which require human verification?
How do we prevent generated docs from becoming stale faster than manually written docs?
How do we attach evidence to AI-generated explanations?
How do we keep code documentation useful without turning it into noisy line-by-line paraphrase?
How do we review generated documentation in a PR workflow?
How do we support multiple languages and frameworks without hardcoding one parser per team forever?

This part focuses on implementation.

We are not re-teaching Java internals, API design, domain modeling, persistence, or event governance. Those belong in separate deep series. Here, code is treated as a documentation source and a machine-readable evidence stream.

The core principle:

Code-to-docs should extract structure deterministically, generate narrative cautiously, validate claims automatically where possible, and require human ownership for published truth.

2. Kaufman Framing

Kaufman's method tells us to deconstruct the skill into sub-skills, learn enough to self-correct, remove practice barriers, and spend focused time producing real outputs.

For code-to-docs, the sub-skills are not "write a prompt". They are:

Sub-skill	What You Must Be Able to Do
Boundary detection	Identify public API, internal code, generated code, test fixtures, and examples.
Source extraction	Parse source into symbols, relationships, signatures, comments, annotations, and metadata.
Semantic classification	Classify symbols by role: entrypoint, DTO, service, repository, config, handler, adapter, command, policy.
Evidence modeling	Connect each generated claim to files, line ranges, symbols, tests, specs, commits, or owners.
Narrative generation	Produce useful docs without paraphrasing code mechanically.
Example generation	Derive or propose usage examples that compile, run, or pass validation.
Drift detection	Detect when code changes invalidate docs.
Review integration	Route generated docs through owners, CI checks, and editorial gates.
Safety control	Prevent leakage of secrets, internal-only behavior, exploit details, and misleading guarantees.

2.1 Target Performance Level

After this part, you should be able to design and implement a code-to-docs pipeline that can:

Parse a repository and build a language-neutral symbol inventory.
Generate module-level and symbol-level documentation drafts.
Attach evidence to generated documentation.
Distinguish extractive facts from inferred explanations.
Validate code snippets and examples in CI.
Detect stale docs when code changes.
Route risky documentation changes to human owners.
Avoid publishing AI-generated claims without verification.

2.2 Practice Output

The concrete practice output for this part is a small but realistic pipeline:

repository source
  -> source scanner
  -> parser / language adapter
  -> symbol graph
  -> documentation intent classifier
  -> context packet builder
  -> LLM draft generator
  -> claim verifier
  -> docs PR creator
  -> human review
  -> docs site publish

This is the smallest architecture that has enough structure to be useful beyond a demo.

3. Code-to-Docs Is Not Code Explanation

A code explanation is usually local:

What does this function do?

Documentation is operational:

What does a reader need to know to correctly use, change, extend, operate, or govern this code?

The difference matters.

3.1 Bad Code-to-Docs Output

Bad output often looks like this:

The `PaymentProcessor` class processes payments. It has a method called `processPayment` that takes a request and returns a response. It checks the request, calls the repository, and returns the result.

This is low value because it repeats obvious syntax. It does not answer:

when to use it,
what invariants it enforces,
what errors can occur,
what dependencies it has,
what state transitions it triggers,
what observability signals it emits,
how it should be tested,
what not to call directly,
and what contract must remain stable.

3.2 Good Code-to-Docs Output

Good output is reader-task oriented:

`PaymentProcessor` is the application-layer coordinator for payment submission.
Use it from command handlers, not directly from HTTP controllers, because it assumes the request has already passed transport-level authentication and request-shape validation.

It enforces these business invariants:

- an order can only be charged when its state is `PAYMENT_PENDING`,
- duplicate idempotency keys return the existing payment attempt,
- provider failures are stored as retryable attempts unless the provider marks the failure as permanent.

Changing this component requires reviewing:

- `PaymentAttemptRepository`, because persistence state is mutated during processing,
- `PaymentProviderClient`, because external provider errors are normalized here,
- `payment-submitted` event consumers, because successful processing emits downstream events.

This output is better because it explains boundaries, invariants, dependencies, and change impact.

3.3 The Code-to-Docs Value Ladder

Level	Output	Value	Risk
1	Symbol list	Low but reliable	Low
2	Signature reference	Useful for navigation	Low
3	Extracted comments	Useful if comments are good	Medium
4	Structural summary	Useful for onboarding	Medium
5	Behavioral explanation	High value	High
6	Change impact analysis	Very high value	High
7	Operational guidance	Very high value	Very high
8	Contract guarantees	Highest value	Highest risk

AI can help at every level, but the validation requirement increases as value increases.

4. Mental Model: Code as Evidence, Not as Complete Truth

Source code is an important source of truth, but it is not the whole truth.

Code tells us:

signatures,
control flow,
data flow,
annotations,
imports,
call relationships,
error handling branches,
constants,
default values,
comments,
test assertions,
dependency references.

Code often does not fully tell us:

product intent,
business rationale,
historical decision context,
operational constraints,
external service behavior,
regulatory meaning,
compatibility promises,
what customers rely on,
what must not change.

Therefore, code-to-docs should separate claim types.

4.1 Claim Taxonomy

Claim Type	Example	Can Be Mechanically Verified?	Required Review
Structural	`OrderService` has `submitOrder()`	Yes	Low
Signature	`submitOrder` accepts `SubmitOrderCommand`	Yes	Low
Dependency	`OrderService` calls `PaymentClient`	Usually	Medium
Behavior	duplicate command returns existing order	Sometimes via tests	Medium/High
Invariant	order cannot be paid twice	Sometimes	High
Operational	retry after 5xx is safe	Rarely from code alone	High
Compliance	audit trail satisfies retention policy	No	Very High
Product	customers see state as "submitted"	No	Product/Domain review

A code-to-docs system should store this taxonomy in metadata.

Example frontmatter:

aiGenerated: true
sourceType: code-to-docs
claimRisk: medium
evidence:
  - type: source
    path: services/order/src/main/java/com/acme/order/OrderService.java
    symbols:
      - com.acme.order.OrderService#submitOrder
  - type: test
    path: services/order/src/test/java/com/acme/order/OrderServiceTest.java
reviewRequired:
  - technical-owner
  - domain-owner

This prevents generated documentation from pretending every statement has the same confidence.
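One way to make the taxonomy actionable is to encode it as data and derive a document's review requirement from the claim types it contains. The following is a minimal sketch, not a prescribed design: the claim type names mirror the table, while `REVIEW_POLICY`, `reviewFor`, the level assigned to product claims, and the reviewer role strings are assumptions made for illustration.

```typescript
// Claim types mirror the taxonomy table; everything else here is illustrative.
type ClaimType =
  | "structural" | "signature" | "dependency" | "behavior"
  | "invariant" | "operational" | "compliance" | "product";

type ReviewLevel = "low" | "medium" | "high" | "very-high";

type ReviewPolicy = { level: ReviewLevel; reviewers: string[] };

// Hypothetical encoding of the "Required Review" column.
// "Medium/High" for behavior claims is rounded up to "high" to stay conservative,
// and product claims are treated as "high" (an assumption, the table names no level).
const REVIEW_POLICY: Record<ClaimType, ReviewPolicy> = {
  structural: { level: "low", reviewers: ["technical-owner"] },
  signature: { level: "low", reviewers: ["technical-owner"] },
  dependency: { level: "medium", reviewers: ["technical-owner"] },
  behavior: { level: "high", reviewers: ["technical-owner"] },
  invariant: { level: "high", reviewers: ["technical-owner", "domain-owner"] },
  operational: { level: "high", reviewers: ["sre-owner"] },
  compliance: { level: "very-high", reviewers: ["compliance-owner"] },
  product: { level: "high", reviewers: ["product-owner", "domain-owner"] },
};

const LEVEL_ORDER: ReviewLevel[] = ["low", "medium", "high", "very-high"];

// A document inherits the strictest review level among its claims
// and the union of all reviewers those claims require.
function reviewFor(claimTypes: ClaimType[]): ReviewPolicy {
  let level: ReviewLevel = "low";
  const reviewers: string[] = [];
  for (const claimType of claimTypes) {
    const policy = REVIEW_POLICY[claimType];
    if (LEVEL_ORDER.indexOf(policy.level) > LEVEL_ORDER.indexOf(level)) {
      level = policy.level;
    }
    for (const reviewer of policy.reviewers) {
      if (reviewers.indexOf(reviewer) === -1) reviewers.push(reviewer);
    }
  }
  return { level, reviewers };
}
```

With this shape, a draft that mixes a structural claim and an invariant claim is routed as a high-risk document to both the technical and the domain owner, which matches the `reviewRequired` list in the frontmatter example.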

5. Architecture Overview

A robust code-to-docs implementation has several stages.

Each stage should have a clear contract.

Stage	Responsibility	Should Not Do
Source scanner	Find candidate files and classify them	Interpret behavior deeply
Language adapter	Parse language-specific syntax	Generate final prose
Symbol inventory	Store symbols and signatures	Infer business intent
Relationship graph	Store calls, imports, inheritance, config links	Pretend dynamic behavior is fully known
Intent classifier	Decide what doc type is needed	Publish docs
Context builder	Assemble bounded evidence	Include unlimited repo context
LLM generator	Draft narrative	Invent unverified guarantees
Verifier	Check claims against evidence	Approve product claims alone
Formatter	Produce MDX/Markdown	Change technical meaning
PR creator	Route review	Bypass owners

6. Source Scanning

The first implementation mistake is to feed the whole repository to AI. That creates noise, cost, leakage risk, and weak traceability.

Instead, scan and classify files before generation.

6.1 File Categories

Category	Examples	Use in Docs?
Public source	exported modules, public classes, controllers, SDK APIs	Yes
Internal source	private helpers, package-private implementation	Usually summarized only at module level
Tests	unit, integration, contract, golden tests	Yes, as behavior evidence
Config	route config, feature flags, deployment manifests	Yes, with caution
Generated code	protobuf output, OpenAPI generated clients	Usually no; source spec is better
Vendor code	copied dependencies	No
Build scripts	Gradle, Maven, npm, Make, CI	Yes for setup/build docs
Secrets/config samples	`.env.example`, Helm values	Yes, with redaction
Fixtures	test data	Only as examples if sanitized

6.2 Ignore Rules

Define explicit ignore rules:

codeToDocs:
  ignore:
    - "**/target/**"
    - "**/build/**"
    - "**/node_modules/**"
    - "**/vendor/**"
    - "**/*.generated.*"
    - "**/generated/**"
    - "**/.terraform/**"
    - "**/secrets/**"
  includeEvidenceFromTests: true
  includePrivateSymbols: false
  maxFileSizeKb: 256

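Do not rely only on `.gitignore`. It describes what is excluded from version control, not what is safe to send to a model: committed fixtures, sample configs, and vendored code still need explicit documentation-scanning rules.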
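The ignore list only helps if the scanner applies it before any file content is read. The sketch below shows one way to evaluate such globs; `globToRegExp` and `isIgnored` are hypothetical helpers, and the matcher deliberately supports only `**` and `*`, which is all the patterns above use. A real scanner would more likely rely on an established glob library.

```typescript
// Convert a glob such as "**/target/**" into an anchored regular expression.
// Supported in this sketch: "**/" (zero or more directories), "**" (anything),
// and "*" (any characters within one path segment). All else is literal.
function globToRegExp(glob: string): RegExp {
  let source = "";
  let i = 0;
  while (i < glob.length) {
    const ch = glob.charAt(i);
    if (ch === "*" && glob.charAt(i + 1) === "*") {
      if (glob.charAt(i + 2) === "/") {
        source += "(?:.*/)?";
        i += 3;
      } else {
        source += ".*";
        i += 2;
      }
    } else if (ch === "*") {
      source += "[^/]*";
      i += 1;
    } else {
      // Escape regular-expression metacharacters so "." stays a literal dot.
      source += ch.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp("^" + source + "$");
}

// The ignore globs from the configuration above.
const IGNORE_GLOBS = [
  "**/target/**",
  "**/build/**",
  "**/node_modules/**",
  "**/vendor/**",
  "**/*.generated.*",
  "**/generated/**",
  "**/.terraform/**",
  "**/secrets/**",
];

const IGNORE_MATCHERS = IGNORE_GLOBS.map((glob) => globToRegExp(glob));

// Paths are assumed to be repository-relative and to use "/" separators.
function isIgnored(path: string): boolean {
  return IGNORE_MATCHERS.some((matcher) => matcher.test(path));
}
```

Compiling the globs once and testing every candidate path keeps the decision deterministic and easy to log, which helps when a reviewer asks why a file was or was not used as evidence.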

6.3 Source Classification Heuristics

A first version can use path and extension heuristics:

Signal	Likely Meaning
`/api/`, `/controllers/`, `/routes/`	external entrypoints
`/domain/`, `/model/`, `/aggregate/`	business model
`/infra/`, `/adapter/`, `/client/`	integration boundary
`/config/`	runtime configuration
`/test/`, `/spec/`, `/__tests__/`	behavior evidence
`*Controller`, `*Handler`, `*Resource`	request entrypoint
`*Service`, `*UseCase`, `*CommandHandler`	application operation
`*Repository`, `*Dao`	persistence boundary
`*Client`, `*Gateway`, `*Adapter`	external dependency
`*Policy`, `*Rule`, `*Validator`	business rule/invariant

These heuristics should be treated as hints, not truth.
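The table above can be sketched as a small classifier that returns every matching hint instead of a single verdict. `classifyPath`, `DIRECTORY_HINTS`, and `NAME_HINTS` are hypothetical names, and treating the name signals as file-name suffixes is an assumption.

```typescript
// Directory signals: a path segment suggests the role of everything below it.
const DIRECTORY_HINTS: Array<[RegExp, string]> = [
  [/\/(api|controllers|routes)\//, "external entrypoints"],
  [/\/(domain|model|aggregate)\//, "business model"],
  [/\/(infra|adapter|client)\//, "integration boundary"],
  [/\/config\//, "runtime configuration"],
  [/\/(test|spec|__tests__)\//, "behavior evidence"],
];

// Name signals: a type-name suffix suggests the role of a single file.
const NAME_HINTS: Array<[RegExp, string]> = [
  [/(Controller|Handler|Resource)$/, "request entrypoint"],
  [/(Service|UseCase|CommandHandler)$/, "application operation"],
  [/(Repository|Dao)$/, "persistence boundary"],
  [/(Client|Gateway|Adapter)$/, "external dependency"],
  [/(Policy|Rule|Validator)$/, "business rule/invariant"],
];

// Returns all hints that apply. Several may match, and none is authoritative.
function classifyPath(path: string): string[] {
  const normalized = "/" + path.replace(/\\/g, "/");
  const fileName = normalized.substring(normalized.lastIndexOf("/") + 1);
  const baseName = fileName.replace(/\.[^.]+$/, ""); // drop the extension
  const hints: string[] = [];
  for (const [pattern, meaning] of DIRECTORY_HINTS) {
    if (pattern.test(normalized)) hints.push(meaning);
  }
  for (const [pattern, meaning] of NAME_HINTS) {
    if (pattern.test(baseName)) hints.push(meaning);
  }
  return hints;
}
```

Returning a list keeps ambiguity visible: a file named `PaymentCommandHandler` matches both the request-entrypoint and the application-operation signal, and a later stage (parser output, annotations, or a human) can resolve it.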

7. Parsing Strategy

There are four common approaches.

Approach	Example	Pros	Cons
Regex	scan for `class`, `function`, annotations	Simple	Fragile
Language compiler API	Java compiler API, TypeScript compiler API, Roslyn	Accurate	Language-specific
Universal parser	Tree-sitter	Multi-language, fast	Semantics limited
Framework reflection	runtime route metadata, Spring beans, NestJS modules	High-level	Requires build/runtime context

A production system often combines them.

7.1 Recommended Layering

Use Tree-sitter or a similar parser for broad, fast coverage. Use compiler APIs for language-specific precision where it matters. Use framework introspection when documentation depends on runtime wiring.

7.2 Symbol Model

Create a language-neutral symbol model.

type SymbolKind =
  | "module"
  | "package"
  | "class"
  | "interface"
  | "record"
  | "enum"
  | "function"
  | "method"
  | "field"
  | "route"
  | "event-handler"
  | "configuration"
  | "test-case";

type SymbolNode = {
  id: string;
  language: "java" | "typescript" | "go" | "csharp" | "python" | "unknown";
  kind: SymbolKind;
  name: string;
  qualifiedName?: string;
  path: string;
  startLine: number;
  endLine: number;
  visibility?: "public" | "protected" | "package" | "private" | "exported" | "internal";
  signature?: string;
  annotations?: string[];
  docComment?: string;
  owner?: string;
  tags?: string[];
};

Do not make the model too language-specific. Keep extension fields for language details.

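// Same SymbolNode as above (other fields omitted here), plus an opaque
// extension field that adapters fill with language-specific details.
type SymbolNode = {
  id: string;
  kind: SymbolKind;
  path: string;
  startLine: number;
  endLine: number;
  languageDetails?: Record<string, unknown>;
};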

7.3 Relationship Model

Docs become useful when symbols are connected.

type EdgeKind =
  | "declares"
  | "imports"
  | "calls"
  | "implements"
  | "extends"
  | "uses-type"
  | "throws"
  | "emits-event"
  | "handles-event"
  | "reads-config"
  | "writes-state"
  | "tested-by"
  | "documented-by"
  | "owned-by";

type SymbolEdge = {
  from: string;
  to: string;
  kind: EdgeKind;
  confidence: "exact" | "inferred" | "heuristic";
  evidence: EvidenceRef[];
};

type EvidenceRef = {
  path: string;
  startLine?: number;
  endLine?: number;
  commit?: string;
};

The confidence field is important. A static call edge might be exact in a simple function call, inferred in a dependency-injected service, and heuristic in reflection-heavy code.

8. Extract Before You Generate

Do not ask the LLM to discover everything from raw files. First, extract deterministic facts.

8.1 Extracted Facts

Fact	Extraction Source
Symbol names	parser/compiler
Visibility	parser/compiler
Signatures	parser/compiler
Parameters	parser/compiler
Return types	parser/compiler
Throws/errors	annotations, code, compiler
Annotations/decorators	parser/compiler
Routes	framework annotations/config
Event topics	annotations/config/constants/specs
Config keys	constants/config binding
Existing comments	doc comments
Tests	test names/assertions
Owners	CODEOWNERS, service catalog
Links to docs	comments/frontmatter/search

8.2 Generated Explanations

Only after extraction should AI generate narrative.

Narrative	Required Input
Module overview	symbol graph + package structure + README + owners
Component responsibility	class/function signature + dependencies + tests
Usage guide	public entrypoints + examples + tests
Change impact	relationship graph + owners + downstream references
Troubleshooting	errors + logs + tests + runbooks
Migration note	diff + deprecation metadata + examples

This separation reduces hallucination and makes the output reviewable.

9. Documentation Intent Classifier

Not every symbol deserves a standalone document.

A code-to-docs system should classify documentation intent.

9.1 Intent Categories

Intent	Trigger	Output
Reference	public API, SDK class, exported function	symbol reference page
Explanation	complex module, high fan-in, non-obvious policy	conceptual doc
How-to	common task, setup, extension point	procedural guide
Tutorial	onboarding path	guided exercise
Runbook	alerts, operational procedures	operational doc
ADR candidate	code shows major design decision	decision draft
Deprecated API note	deprecation annotation/tag	migration doc

9.2 Priority Score

A practical prioritization model:

priority = publicBoundaryWeight
         + changeFrequencyWeight
         + supportTicketWeight
         + incidentWeight
         + onboardingWeight
         + fanInWeight
         - existingDocQualityWeight

High-priority generated docs should be reviewed first.
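A minimal sketch of the priority formula, assuming each signal is normalized to the range 0 to 1 before weighting. The weight values, the caps, and the field names in `PrioritySignals` are invented for illustration and would need tuning against real review outcomes.

```typescript
type PrioritySignals = {
  isPublicBoundary: boolean;
  commitsLast90Days: number;
  supportTickets: number;
  incidents: number;
  onOnboardingPath: boolean;
  fanIn: number;
  existingDocQuality: number; // 0 = no usable docs, 1 = accurate and complete
};

// Hypothetical weights; the subtraction term mirrors existingDocQualityWeight.
const WEIGHTS = {
  publicBoundary: 5,
  changeFrequency: 2,
  supportTicket: 3,
  incident: 4,
  onboarding: 2,
  fanIn: 2,
  existingDocQuality: 6,
};

// Clamp a raw count to [0, cap] and scale it to [0, 1] so that one
// noisy signal cannot dominate the score.
function saturate(count: number, cap: number): number {
  return Math.min(Math.max(count, 0), cap) / cap;
}

function priorityScore(s: PrioritySignals): number {
  return (
    WEIGHTS.publicBoundary * (s.isPublicBoundary ? 1 : 0) +
    WEIGHTS.changeFrequency * saturate(s.commitsLast90Days, 20) +
    WEIGHTS.supportTicket * saturate(s.supportTickets, 10) +
    WEIGHTS.incident * saturate(s.incidents, 5) +
    WEIGHTS.onboarding * (s.onOnboardingPath ? 1 : 0) +
    WEIGHTS.fanIn * saturate(s.fanIn, 50) -
    WEIGHTS.existingDocQuality * saturate(s.existingDocQuality, 1)
  );
}
```

Because good existing documentation subtracts from the score, a well-documented public API can rank below an undocumented internal component that keeps appearing in incidents, which is usually the intended ordering.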

10. Context Packet for Code-to-Docs

The LLM should receive a bounded context packet, not an arbitrary folder dump.

10.1 Context Packet Shape

docIntent: component-explanation
targetAudience:
  - backend engineer
  - tech lead
readerTask: "Understand how to safely modify payment submission behavior."
sourceOfTruthPriority:
  - source-code
  - tests
  - openapi-spec
  - adr
  - existing-docs
symbol:
  id: "java:com.acme.payment.PaymentProcessor"
  path: "services/payment/src/main/java/com/acme/payment/PaymentProcessor.java"
  lines: "24-188"
relatedSymbols:
  - "PaymentAttemptRepository"
  - "PaymentProviderClient"
  - "PaymentSubmittedEvent"
knownTests:
  - "PaymentProcessorTest.shouldReturnExistingAttemptForSameIdempotencyKey"
evidence:
  - path: "services/payment/src/main/java/com/acme/payment/PaymentProcessor.java"
    lines: "56-87"
    claim: "idempotency key is checked before provider call"
constraints:
  - "Do not claim provider retry behavior unless supported by tests or runbook."
  - "Separate verified facts from inferred explanation."
outputFormat: mdx

10.2 Context Rules

The context packet should include:

target reader,
documentation intent,
source hierarchy,
selected symbols,
related symbols,
evidence snippets,
existing docs to preserve,
style constraints,
forbidden claims,
required verification output.

It should exclude:

secrets,
production credentials,
personal data,
irrelevant files,
large logs,
vendor dependencies,
stale generated docs unless explicitly marked,
previous AI output unless reviewed and promoted.

11. Prompt Contract for Code-to-Docs

A useful prompt is a contract, not a wish.

You are generating a documentation draft from source evidence.

Reader task:
- Help a backend engineer understand how to safely modify payment submission behavior.

Rules:
- Use only the provided evidence.
- Do not claim runtime behavior unless evidence supports it.
- Separate verified facts from inferred explanations.
- Attach evidence references for important claims.
- Mark missing information explicitly.
- Do not document private helper methods unless they affect public behavior.
- Do not expose secrets, internal tokens, private URLs, or customer data.

Output:
- MDX document.
- Sections: Overview, When to Use, Boundaries, Key Flow, Invariants, Dependencies, Failure Modes, Change Checklist, Evidence Table, Gaps.

The prompt should force the model to admit uncertainty.

11.1 Required Evidence Table

Every generated doc should include a hidden or visible evidence table during review.

Claim	Evidence	Confidence	Review Needed
`PaymentProcessor` checks idempotency before provider call	`PaymentProcessor.java:56-87`	High	Technical owner
Provider timeout is retryable	`PaymentProviderClientTest.java:44-61`	Medium	SRE owner
Duplicate submission is safe	source + tests	Medium	Domain owner

For public docs, the final evidence table may be hidden or transformed, but the PR review should retain it.
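If claims are kept as structured data, the evidence table can be rendered from them and unsupported claims can be caught before review. The sketch below assumes a hypothetical `EvidenceRow` shape and renders a Markdown pipe table; neither the shape nor the `MISSING` marker comes from an existing tool.

```typescript
type EvidenceRow = {
  claim: string;
  evidence: string[]; // for example "PaymentProcessor.java:56-87"
  confidence: "High" | "Medium" | "Low";
  reviewNeeded: string;
};

// Keep cell text from breaking the table layout.
function escapeCell(text: string): string {
  return text.replace(/\|/g, "\\|").replace(/\r?\n/g, " ");
}

function renderEvidenceTable(rows: EvidenceRow[]): string {
  const lines = [
    "| Claim | Evidence | Confidence | Review Needed |",
    "| --- | --- | --- | --- |",
  ];
  for (const row of rows) {
    // An empty evidence list is shown loudly instead of being hidden.
    const evidence = row.evidence.length > 0 ? row.evidence.join(", ") : "MISSING";
    const cells = [
      escapeCell(row.claim),
      escapeCell(evidence),
      row.confidence,
      escapeCell(row.reviewNeeded),
    ];
    lines.push("| " + cells.join(" | ") + " |");
  }
  return lines.join("\n");
}

// Claims without any evidence; a CI step could fail the docs PR if this is non-empty.
function unsupportedClaims(rows: EvidenceRow[]): string[] {
  return rows.filter((row) => row.evidence.length === 0).map((row) => row.claim);
}
```

Rendering from data rather than asking the model to write the table keeps the review artifact consistent across documents and makes the table easy to strip or transform for public output.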

12. Output Patterns

12.1 Module Overview

A module overview should answer:

What does this module own?
What does it not own?
What are its public boundaries?
What state does it read/write?
What external systems does it call?
What events does it emit or consume?
How do engineers safely change it?

Template:

# Payment Module

## Responsibility

## Non-Responsibilities

## Public Entry Points

## Main Flow

## Domain Invariants

## External Dependencies

## Events

## Configuration

## Operational Notes

## Change Checklist

## Evidence

12.2 Symbol Reference

A symbol reference should be terse.

## `PaymentProcessor.submit()`

Coordinates payment submission for an order that is ready for payment.

### Signature

```java
PaymentResult submit(SubmitPaymentCommand command)
```

Inputs

Field	Meaning	Required	Notes
`orderId`	Order to charge	Yes	Must refer to existing order
`idempotencyKey`	Duplicate submission key	Yes	Reused keys return existing attempt

Behavior

Loads the order.
Checks payment state.
Checks idempotency key.
Calls provider client.
Persists payment attempt.
Emits payment event on success.

Failure Modes

Failure	Result	Retry?
invalid state	domain error	No
provider timeout	retryable attempt	Yes
permanent provider decline	failed attempt	No


12.3 Change Impact Summary

```md
## Change Impact

Changing `PaymentProcessor.submit()` may affect:

- HTTP endpoint: `POST /payments`
- Event: `payment-submitted`
- Tables: `payment_attempts`
- Consumers: settlement service, notification service
- Tests: `PaymentProcessorTest`, `PaymentContractTest`

Before merging, verify:

- idempotency behavior,
- provider failure mapping,
- event payload compatibility,
- rollback behavior,
- dashboard and alert expectations.
```

This is much more useful than a generated method-by-method catalog.

13. Example Implementation Blueprint

13.1 Repository Layout

/tools/code-to-docs/
  src/
    scanner/
    adapters/
      java/
      typescript/
      go/
      csharp/
    graph/
    context/
    generation/
    verification/
    output/
  config/
    code-to-docs.yml
    prompts/
      module-overview.md
      symbol-reference.md
      change-impact.md
  tests/
/docs/
  engineering/
  generated/
  services/

13.2 Pipeline Command

code-to-docs generate \
  --repo . \
  --config tools/code-to-docs/config/code-to-docs.yml \
  --target services/payment \
  --intent module-overview \
  --out docs/services/payment/overview.mdx \
  --create-pr

13.3 Configuration

project:
  name: payments-platform
  defaultAudience:
    - backend-engineer
    - tech-lead

sourceScanning:
  include:
    - "services/**/src/**"
    - "services/**/test/**"
  exclude:
    - "**/target/**"
    - "**/build/**"
    - "**/generated/**"
    - "**/*.generated.*"

languages:
  java:
    enabled: true
    parser: compiler-api
    includeAnnotations:
      - RestController
      - Controller
      - Service
      - Repository
      - Deprecated
      - Transactional
  typescript:
    enabled: true
    parser: typescript-compiler
  go:
    enabled: true
    parser: tree-sitter

output:
  format: mdx
  includeEvidenceTable: true
  generatedNotice: true
  maxDocLengthWords: 1800

review:
  requireCodeOwner: true
  requireDomainOwnerForInvariants: true
  requireSecurityReviewForAuthOrSecretClaims: true

validation:
  runMarkdownLint: true
  runVale: true
  runSnippetTests: true
  failOnUnsupportedClaim: true

14. Language Adapter Design

A language adapter converts language-specific syntax into the unified symbol model.

interface LanguageAdapter {
  language: string;
  supports(path: string): boolean;
  parse(file: SourceFile): ParsedFile;
  extractSymbols(parsed: ParsedFile): SymbolNode[];
  extractEdges(parsed: ParsedFile, symbols: SymbolNode[]): SymbolEdge[];
  extractDocComments(parsed: ParsedFile, symbols: SymbolNode[]): DocComment[];
  extractTests?(parsed: ParsedFile): TestEvidence[];
}

14.1 Java Adapter

For Java, useful signals include:

package declaration,
class/interface/record/enum declarations,
method signatures,
annotations,
visibility modifiers,
Javadoc comments,
thrown exceptions,
constructor-injected dependencies,
test annotations,
framework annotations.

Example extraction result:

{
  "id": "java:com.acme.payment.PaymentProcessor#submit",
  "language": "java",
  "kind": "method",
  "name": "submit",
  "qualifiedName": "com.acme.payment.PaymentProcessor.submit",
  "path": "services/payment/src/main/java/com/acme/payment/PaymentProcessor.java",
  "startLine": 42,
  "endLine": 98,
  "visibility": "public",
  "signature": "PaymentResult submit(SubmitPaymentCommand command)",
  "annotations": ["Transactional"],
  "tags": ["application-service", "state-mutating"]
}

Javadoc is useful, but should not be blindly trusted. It may be stale. Treat it as one evidence source.

14.2 TypeScript Adapter

For TypeScript, useful signals include:

exported functions/classes/interfaces,
route decorators,
type aliases,
generics,
JSDoc/TSDoc comments,
imports/exports,
test cases,
schema validators.

14.3 Go Adapter

For Go, useful signals include:

exported symbols by capitalization,
package comments,
interface definitions,
function signatures,
struct tags,
HTTP handlers,
test functions,
error variables.
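The first Go signal is cheap to derive: Go exports an identifier when its name starts with an upper-case letter. The sketch below maps that rule onto the `visibility` values of the symbol model. The line-based `extractGoFuncs` is only a stand-in to keep the example short, since a real adapter would read declarations from a parser such as Tree-sitter, and the ASCII-only check is a simplification of the language rule, which covers all Unicode upper-case letters.

```typescript
type GoVisibility = "exported" | "internal";

// ASCII-only approximation of Go's export rule (upper-case first letter).
function goVisibility(name: string): GoVisibility {
  return /^[A-Z]/.test(name) ? "exported" : "internal";
}

type GoFunc = { name: string; line: number; visibility: GoVisibility };

// Hypothetical stand-in for a parser: finds top-level "func" declarations,
// including methods with a receiver such as "func (p *Processor) validate()".
function extractGoFuncs(source: string): GoFunc[] {
  const declaration = /^func\s+(?:\([^)]*\)\s*)?([A-Za-z_][A-Za-z0-9_]*)/;
  const result: GoFunc[] = [];
  source.split("\n").forEach((text, index) => {
    const match = declaration.exec(text);
    const name = match ? match[1] : undefined;
    if (name) {
      result.push({ name, line: index + 1, visibility: goVisibility(name) });
    }
  });
  return result;
}
```

With `includePrivateSymbols: false`, only the entries marked `exported` would become candidates for symbol-level documentation, while internal ones would at most contribute to a module-level summary.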

14.4 C# Adapter

For C#, useful signals include:

namespaces,
public/internal types,
attributes,
XML doc comments,
controller actions,
dependency injection registrations,
test attributes.

14.5 Adapter Rule

A language adapter should extract structure, not write prose.

The generation stage should be language-aware but not language-bound.

15. Symbol Graph Design

A symbol graph allows the system to answer documentation questions that are impossible with isolated files.

15.1 Useful Queries

Query	Documentation Use
What public entrypoints call this component?	Change impact
What tests cover this behavior?	Evidence
What external systems are called?	Operational docs
What config keys are read?	Setup docs
What symbols have no docs?	Coverage
What docs mention removed symbols?	Drift detection
What high-fan-in symbols are undocumented?	Prioritization

15.2 Graph Example
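As a small illustration, the sketch below holds a handful of edges around `PaymentProcessor.submit` in memory and answers two of the queries from 15.1. The edge types are a reduced version of the relationship model in 7.3; `PaymentController.submit`, `SubmitPaymentHandler.handle`, and `PaymentProviderClient.charge` are hypothetical symbol names, as are the helper functions.

```typescript
type GraphEdgeKind = "calls" | "tested-by" | "emits-event";

type GraphEdge = {
  from: string;
  to: string;
  kind: GraphEdgeKind;
  confidence: "exact" | "inferred" | "heuristic";
};

const EDGES: GraphEdge[] = [
  { from: "PaymentController.submit", to: "SubmitPaymentHandler.handle", kind: "calls", confidence: "exact" },
  { from: "SubmitPaymentHandler.handle", to: "PaymentProcessor.submit", kind: "calls", confidence: "inferred" },
  { from: "PaymentProcessor.submit", to: "PaymentProviderClient.charge", kind: "calls", confidence: "inferred" },
  { from: "PaymentProcessor.submit", to: "payment-submitted", kind: "emits-event", confidence: "heuristic" },
  { from: "PaymentProcessor.submit", to: "PaymentProcessorTest", kind: "tested-by", confidence: "exact" },
];

// "What calls this component?": breadth-first walk over reversed "calls" edges.
function transitiveCallers(target: string, graph: GraphEdge[]): string[] {
  const seen: string[] = [];
  const queue: string[] = [target];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const edge of graph) {
      if (edge.kind === "calls" && edge.to === current && seen.indexOf(edge.from) === -1) {
        seen.push(edge.from);
        queue.push(edge.from);
      }
    }
  }
  return seen;
}

// "What tests cover this behavior?": direct "tested-by" edges only.
function testsFor(symbol: string, graph: GraphEdge[]): string[] {
  return graph
    .filter((edge) => edge.kind === "tested-by" && edge.from === symbol)
    .map((edge) => edge.to);
}
```

A linear scan per step is fine at this size. Once such traversals run over a whole repository on every PR, an adjacency index or one of the storage options below becomes worthwhile.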

15.3 Storage Options

Option	Good For	Trade-off
JSON files	simple builds, CI artifacts	limited query power
SQLite	local indexing, medium repos	graph traversal manual
Graph DB	rich relationships	operational overhead
Search index	retrieval and ranking	weak relationship semantics
Hybrid	serious internal platform	more implementation complexity

For a first implementation, SQLite plus JSON artifacts is often enough. Move to graph storage when relationship queries become central.

16. AI Generation Strategy

16.1 Do Not Generate Everything

Do not generate docs for every symbol. That creates documentation spam.

Generate only when one or more conditions are true:

symbol is a public or exported boundary,
symbol is high fan-in,
symbol is high-risk,
symbol is frequently changed,
symbol is frequently searched,
symbol is frequently involved in incidents,
symbol has poor existing documentation,
symbol is part of onboarding path,
symbol appears in support tickets.

16.2 Use Templates

Prompting should be template-driven.

Doc Type	Template Sections
Module overview	responsibility, boundaries, flows, dependencies, change checklist
Component explanation	purpose, collaborators, invariants, failure modes
Public API reference	signature, params, return, errors, examples
Extension guide	extension point, steps, tests, constraints
Change impact	affected entrypoints, tests, docs, owners
Deprecation note	old behavior, replacement, migration steps, timeline

16.3 Require Missing Information Section

Every generated draft should include:

## Missing Information

- No test evidence found for retry behavior.
- No ADR found for idempotency strategy.
- No runbook found for provider outage handling.

This is one of the simplest ways to prevent false confidence.

17. Example Generation

Examples are where code-to-docs becomes dangerous.

An example that does not compile or does not match production behavior is worse than no example.

17.1 Example Source Priority

Priority	Source	Trust
1	Existing tests	Highest
2	Existing examples	High if tested
3	Contract tests	High
4	README snippets	Medium
5	AI-generated examples	Low until tested

17.2 Example Lifecycle

17.3 Snippet Testing

For Java:

./gradlew test --tests '*DocumentationSnippetTest'

For TypeScript:

npm run test:docs-snippets

For shell examples (static linting only; ShellCheck does not run the script):

find docs -name '*.sh' -exec shellcheck {} +

For HTTP examples:

newman run docs/examples/payments.postman_collection.json

The exact tool does not matter as much as the invariant:

Published examples must be executable or explicitly marked as illustrative.
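To enforce that invariant, a snippet test first has to find the examples. The sketch below collects fenced code blocks from a Markdown or MDX file so a runner can compile or execute them per language. `extractSnippets` is a hypothetical helper, and skipping fences whose info string contains the word `illustrative` is an assumed convention, not an existing standard.

```typescript
type Snippet = { language: string; body: string[]; startLine: number };

// Collect fenced code blocks. A fence opens with three backticks plus an
// optional language and closes with three backticks on a line of their own.
function extractSnippets(markdown: string): Snippet[] {
  const opening = /^`{3}\s*([\w+-]*)\s*(.*)$/;
  const closing = /^`{3}\s*$/;
  const snippets: Snippet[] = [];
  let current: Snippet | null = null;
  let illustrative = false;
  let lineNumber = 0;
  for (const line of markdown.split(/\r?\n/)) {
    lineNumber += 1;
    if (current === null) {
      const match = opening.exec(line);
      if (match) {
        // startLine points at the first line of code, for error reporting.
        current = { language: match[1] || "text", body: [], startLine: lineNumber + 1 };
        illustrative = /\billustrative\b/.test(match[2] || "");
      }
    } else if (closing.test(line)) {
      // Blocks explicitly marked illustrative are excluded from testing.
      if (!illustrative) snippets.push(current);
      current = null;
    } else {
      current.body.push(line);
    }
  }
  return snippets;
}
```

Each returned snippet can then be written to a temporary file and handed to the language's own toolchain, with `startLine` used to point failures back at the documentation source.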

18. Drift Detection

Code-to-docs is only useful if it can detect staleness.

18.1 Drift Types

Drift Type	Example	Detection
Symbol removed	docs mention deleted method	symbol index check
Signature changed	docs show old parameter	signature diff
Behavior changed	tests changed but docs not updated	test-to-doc mapping
Example broken	snippet no longer compiles	snippet tests
Config changed	docs mention old key	config key index
Ownership changed	stale team owner	CODEOWNERS/service catalog diff
Public contract changed	OpenAPI differs from docs	spec diff

18.2 Stale Claim Metadata

Each generated claim can store a dependency fingerprint.

claims:
  - id: claim-001
    text: "Duplicate idempotency keys return the existing payment attempt."
    evidence:
      - path: PaymentProcessor.java
        lines: 56-87
        hash: "sha256:abc123"
      - path: PaymentProcessorTest.java
        lines: 44-62
        hash: "sha256:def456"
    reviewOwner: payments-team

When any evidence hash changes, mark the claim as stale.
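A minimal sketch of that check, assuming the claim metadata has been loaded into objects and that the caller has already computed a hash for each evidence range at the commit under review (for example SHA-256 over the referenced lines). `findStaleClaims` and the `path#lines` key format are hypothetical.

```typescript
type EvidenceFingerprint = { path: string; lines: string; hash: string };

type Claim = {
  id: string;
  text: string;
  evidence: EvidenceFingerprint[];
  reviewOwner: string;
};

// currentHashes maps "path#lines" to the hash of that range in the new commit.
// A missing entry means the evidence could not be found, which also counts as stale.
function findStaleClaims(claims: Claim[], currentHashes: Record<string, string>): Claim[] {
  return claims.filter((claim) =>
    claim.evidence.some((ev) => currentHashes[ev.path + "#" + ev.lines] !== ev.hash)
  );
}
```

Fixed line ranges are brittle: an unrelated edit higher in the file shifts them and can change the hash. Keying evidence by symbol ID and hashing the symbol body is a common refinement once the symbol graph is available.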

18.3 PR Bot Behavior

On a code PR, the bot can comment:

Documentation impact detected.

Changed symbols:
- `PaymentProcessor.submit()`

Affected docs:
- `docs/services/payment/overview.mdx`
- `docs/services/payment/change-checklist.mdx`

Stale claims:
- duplicate idempotency behavior
- provider timeout retry behavior

Action required:
- Update docs or add `docs-impact: none` with owner approval.

This is high leverage because it keeps documentation updates close to code changes.

19. Review Workflow

Generated documentation should enter the same PR system as hand-written docs.

19.1 Review Checklist

Technical reviewer checks:

Is the source boundary correct?
Are public/private details separated?
Are behavior claims supported by code/tests/specs?
Are gaps clearly marked?
Are examples correct?
Are failure modes accurate?
Is the change impact checklist realistic?

Editorial reviewer checks:

Is the doc organized by reader task?
Is the title specific?
Are paragraphs short?
Are instructions imperative and direct?
Is terminology consistent?
Does it avoid overclaiming?
Does it link to reference docs instead of duplicating them?

Security reviewer checks:

Are secrets removed?
Are internal-only endpoints hidden?
Are exploit steps avoided?
Is auth behavior described safely?
Are logs/screenshots sanitized?

20. Security Controls

Code-to-docs systems can leak sensitive information because they process source repositories.

20.1 Sensitive Inputs

Sensitive Input	Risk
secrets in code	credential leakage
private endpoints	exposure of internal attack surface
feature flags	leaking unreleased features
customer-specific rules	confidentiality breach
incident workarounds	exploit guidance
auth logic details	bypass hints
infrastructure manifests	environment exposure

20.2 Safety Filters

Implement filters before LLM context assembly, so that restricted paths and secret-like content are dropped before any text reaches the model.

20.3 Policy Example

security:
  blockPatterns:
    - "AKIA[0-9A-Z]{16}"
    - "-----BEGIN PRIVATE KEY-----"
  restrictedPaths:
    - "infra/secrets/**"
    - "ops/break-glass/**"
  requireSecurityReviewFor:
    - authentication
    - authorization
    - encryption
    - token
    - secret
    - admin
    - bypass
  publicDocsDenyTags:
    - internal-only
    - customer-specific
    - exploit-sensitive

The system should fail closed for public docs.
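A sketch of how the first two policy keys might be enforced before a file is added to a context packet. `checkBeforeContext` is a hypothetical function, the `restrictedPaths` globs are simplified to directory prefixes, and paths are assumed to be repository-relative.

```typescript
type ScanResult = { allowed: boolean; reasons: string[] };

// Patterns and paths copied from the policy example above.
const BLOCK_PATTERNS: RegExp[] = [
  /AKIA[0-9A-Z]{16}/,
  /-----BEGIN PRIVATE KEY-----/,
];
const RESTRICTED_PREFIXES = ["infra/secrets/", "ops/break-glass/"];

// Returns allowed=false as soon as any rule matches. Callers should treat a
// thrown error the same way, so that failures never let content through.
function checkBeforeContext(path: string, content: string): ScanResult {
  const reasons: string[] = [];
  for (const prefix of RESTRICTED_PREFIXES) {
    if (path.indexOf(prefix) === 0) reasons.push("restricted path: " + prefix);
  }
  for (const pattern of BLOCK_PATTERNS) {
    // Report the pattern, never the matched text, to avoid echoing a secret into logs.
    if (pattern.test(content)) reasons.push("blocked pattern: " + pattern.source);
  }
  return { allowed: reasons.length === 0, reasons };
}
```

Two literal patterns catch only a narrow class of secrets. A dedicated secret scanner is a better first line of defense, with a check like this acting as a final gate in the pipeline itself.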

21. Anti-Patterns

21.1 Line-by-Line Paraphrase

Bad:

The method creates a variable called `result`. Then it checks if result is null.

Better:

The method treats missing provider results as retryable failures and persists a failed attempt before returning.

21.2 Treating Comments as Truth

Existing comments may be stale. Use them, but verify against code and tests.

21.3 Documenting Private Helpers Publicly

Private helper docs create noise and expose unnecessary internals. Summarize internals only when they matter for safe modification.

21.4 Generating Examples Without Running Them

AI-generated examples must be tested or marked illustrative.

21.5 No Ownership

Generated docs without owners become abandoned artifacts.

21.6 No Claim Risk Model

A generated class summary and a generated compliance guarantee do not carry the same risk. Classify claims by risk and route high-risk claims to human owners.
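A very small risk classifier might look like the sketch below. The term list reuses the `requireSecurityReviewFor` keywords from the policy example and adds `compliance`; the function name and the two-level scale are assumptions, and plain keyword matching will miss plurals and paraphrases.

```python
import re

# Terms taken from the policy example, plus "compliance" (an added assumption).
HIGH_RISK_TERMS = {
    "authentication", "authorization", "encryption",
    "token", "secret", "admin", "bypass", "compliance",
}


def claim_risk(claim: str) -> str:
    """Classify a generated claim as 'high' or 'low' risk by keyword match."""
    words = set(re.findall(r"[a-z]+", claim.lower()))
    return "high" if words & HIGH_RISK_TERMS else "low"
```

For example, `claim_risk("This service meets compliance requirements.")` returns `"high"`, while a plain class summary with none of the listed terms returns `"low"`.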

21.7 Recursive AI Contamination

Do not use unreviewed generated docs as primary evidence for future generated docs.

22. Quality Metrics

Track whether the system improves documentation quality.

Metric	Meaning
documented public symbols	coverage of stable API boundaries
stale claim count	drift debt
docs PR acceptance rate	quality of generated drafts
manual edit distance	how much humans must fix
unsupported claim rate	hallucination pressure
snippet pass rate	example reliability
review latency	workflow cost
search success rate	usefulness to readers
incident-linked doc gaps	operational risk
onboarding task success	practical learning value

22.1 Manual Edit Distance

A useful metric is how much the reviewer changed the generated text before merge.

High edit distance means:

context is insufficient,
prompt is weak,
template is wrong,
source extraction is noisy,
or AI is asked to infer too much.

Low edit distance does not automatically mean high quality, because reviewers may rubber-stamp. Pair it with defect metrics such as unsupported claim rate and stale claim count.
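As a rough sketch, the metric can be approximated with the standard library's `difflib`. This is a normalized word-level dissimilarity rather than a true Levenshtein distance, and the function name is hypothetical.

```python
import difflib


def edit_distance_ratio(generated: str, merged: str) -> float:
    """Return 0.0 if the reviewer changed nothing, up to 1.0 if no words survived."""
    matcher = difflib.SequenceMatcher(
        a=generated.split(), b=merged.split(), autojunk=False
    )
    return round(1.0 - matcher.ratio(), 3)
```

Comparing at word level keeps whitespace-only reformatting from counting as a reviewer correction.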

23. Minimal Viable Implementation

Start small.

23.1 MVP Scope

Implement:

scan one repository,
parse one primary language,
extract public symbols,
extract doc comments,
link tests by naming convention,
generate module overview docs,
include evidence table,
run Markdown/prose linting,
create PR for human review.

Do not start with:

full graph database,
all languages,
automatic public publishing,
autonomous edits to code,
organization-wide rollout,
compliance claims,
runtime behavior inference.
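The extraction steps in the MVP scope (public symbols, doc comments, tests linked by naming convention) can be sketched with Python's built-in `ast` module. This assumes the primary language is Python and that a leading underscore marks a private symbol; the function names, the record fields, and the `test_<symbol>` convention are illustrative choices, not something the pipeline prescribes.

```python
import ast
from typing import Dict, List


def extract_public_symbols(source: str, module: str) -> List[Dict[str, object]]:
    """Collect top-level public functions and classes with their docstrings."""
    symbols = []
    for node in ast.parse(source).body:  # top-level definitions only
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        if node.name.startswith("_"):  # assumed convention for private symbols
            continue
        is_class = isinstance(node, ast.ClassDef)
        symbols.append({
            "module": module,
            "name": node.name,
            "kind": "class" if is_class else "function",
            "line": node.lineno,
            "params": [] if is_class else [a.arg for a in node.args.args],
            "doc": ast.get_docstring(node),
        })
    return symbols


def link_tests(symbols: List[Dict[str, object]], test_names: List[str]) -> Dict[str, List[str]]:
    """Link tests to symbols by the assumed prefix convention test_<symbol>."""
    links = {}
    for symbol in symbols:
        prefix = "test_" + str(symbol["name"]).lower()
        links[str(symbol["name"])] = [t for t in test_names if t.lower().startswith(prefix)]
    return links
```

Symbols with an empty `doc` field or no linked tests are good candidates for the "missing information" part of the generated overview.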

23.2 MVP Success Criteria

The MVP succeeds if:

reviewers accept at least 50% of generated structure,
generated docs identify missing information accurately,
no unsupported high-risk claims are published,
examples are either tested or excluded,
docs PRs are easy to review,
engineers voluntarily use the output for onboarding or change impact.

24. Advanced Implementation: Diff-Aware Generation

A mature system should generate docs based on code diffs, not only full repo scans.

24.1 Changed Symbol Detection

A change may affect docs when:

public signature changes,
parameter is added/removed,
error behavior changes,
config key changes,
event emission changes,
annotation changes,
route mapping changes,
test expectation changes,
deprecation metadata changes.
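The signature-related triggers in this list can be detected by comparing two symbol inventories, one per commit. The sketch below assumes each inventory maps a symbol name to a signature string; the function name and change labels are hypothetical.

```python
from typing import Dict, List, Tuple


def diff_inventories(before: Dict[str, str], after: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (symbol, change) pairs for symbols whose public surface changed."""
    changes = []
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old is None:
            changes.append((name, "added"))
        elif new is None:
            changes.append((name, "removed"))
        elif old != new:
            changes.append((name, "signature_changed"))
    return changes
```

Only the symbols returned here need their doc sections and evidence metadata revisited. Behavioral triggers such as changed error handling or test expectations need other signals than signatures.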

24.2 Docs Patch Strategy

Prefer minimal diffs.

Bad:

Regenerate the whole page on every code change.

Better:

Update only affected sections and evidence metadata.

This keeps review cost low.

25. Advanced Implementation: Behavior Evidence from Tests

Tests are often better documentation evidence than implementation code because they encode expected behavior.

25.1 Test Signal Extraction

Extract:

test class/function names,
arrange/act/assert structure,
expected errors,
boundary values,
fixtures,
mocked external systems,
contract expectations,
snapshot outputs.

25.2 Test Naming Quality

If test names are poor, AI documentation will suffer.

Bad:

@Test
void test1() {}

Better:

@Test
void submit_returnsExistingAttempt_whenIdempotencyKeyWasAlreadyUsed() {}

Good test names become high-quality documentation seeds.

25.3 Behavior Summary from Tests

AI can summarize test evidence:

Observed behavior from tests:

- duplicate idempotency keys return the existing attempt,
- invalid order state produces a domain validation error,
- provider timeout is stored as retryable failure,
- provider permanent decline is stored as non-retryable failure.

But the system must label this as test-observed behavior, not necessarily complete behavior.
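Descriptive test names can be turned into draft behavior statements deterministically, before any model is involved. The sketch below assumes names mix underscores and camelCase as in the example above; both function names are hypothetical.

```python
import re
from typing import List


def sentence_from_test_name(name: str) -> str:
    """Split a test name on underscores and camelCase into lowercase words."""
    words = []
    for part in name.split("_"):
        # Matches "returns", "Existing", or an acronym run such as "HTTP".
        words.extend(
            w.lower() for w in re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", part)
        )
    return " ".join(words)


def observed_behavior(test_names: List[str]) -> str:
    """Render a summary that is explicitly labeled as test-observed."""
    lines = ["Observed behavior from tests (may be incomplete):"]
    lines += [f"- {sentence_from_test_name(n)}" for n in test_names]
    return "\n".join(lines)
```

A name like `test1` passes through unchanged, which makes weak test names visible in the draft.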

26. Integration with Existing Docs

Generated docs should not overwrite human-authored architecture docs.

26.1 Merge Modes

Mode	Meaning	Use Case
generated file	full file owned by generator	reference docs
generated section	marked region inside human doc	symbol inventory
suggested patch	bot proposes changes	narrative docs
review comment	no file change	PR assistance
evidence artifact	machine-readable metadata only	audit/validation

26.2 Generated Section Markers

<!-- ai-docs:start source="code-to-docs" symbol="PaymentProcessor" -->
Generated reference content here.
<!-- ai-docs:end -->

Only update inside markers. Never overwrite surrounding human narrative automatically.
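A marker-scoped update can be sketched with a regular expression that captures the start and end markers and replaces only what sits between them. The function name is hypothetical, and the sketch assumes one region per symbol with the marker syntax shown above.

```python
import re


def replace_generated_section(doc: str, symbol: str, new_body: str) -> str:
    """Replace the generated region for one symbol, leaving other text untouched."""
    pattern = re.compile(
        r'(<!-- ai-docs:start [^>]*symbol="' + re.escape(symbol) + r'"[^>]*-->\n)'
        r".*?"
        r"(<!-- ai-docs:end -->)",
        re.DOTALL,
    )
    if not pattern.search(doc):
        # Refuse to guess where generated content belongs.
        raise ValueError(f"no generated section found for symbol {symbol!r}")
    return pattern.sub(
        lambda m: m.group(1) + new_body.rstrip("\n") + "\n" + m.group(2),
        doc,
        count=1,
    )
```

Raising when the markers are missing is deliberate: the generator never creates a new region inside human-authored narrative on its own.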

27. Failure Modeling

27.1 Failure Modes

Failure	Cause	Mitigation
Hallucinated behavior	insufficient evidence	evidence table + unsupported claim gate
Stale docs	code changes not tracked	claim fingerprints + PR impact bot
Noisy docs	too many symbols documented	intent classifier + priority scoring
Leaked internals	public/private boundary missing	security classifier + fail-closed policy
Reviewer fatigue	huge generated diffs	minimal patch generation
Wrong ownership	stale CODEOWNERS	service catalog sync
Broken examples	generated snippets not tested	snippet CI
Recursive contamination	AI output used as evidence	source hierarchy policy
Framework blind spots	static parser misses runtime wiring	framework adapters + human review
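The "unsupported claim gate" named in the first row can be as simple as a check that runs before a docs PR is opened. The sketch below is one possible shape: the `Claim` fields and the rule that high-risk claims always need evidence, while low-risk claims may pass when labeled as inference, are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Claim:
    text: str
    risk: str  # "low" or "high"
    evidence: List[str] = field(default_factory=list)  # e.g. file or test references
    inferred: bool = False  # True when the doc labels the claim as inference


def unsupported_claims(claims: List[Claim]) -> List[Claim]:
    """Return claims that must block publication."""
    blocked = []
    for claim in claims:
        if claim.evidence:
            continue
        # Without evidence, only low-risk claims labeled as inference may pass.
        if claim.risk == "high" or not claim.inferred:
            blocked.append(claim)
    return blocked
```

An empty result means the gate passes; anything else is listed in the PR so the reviewer can see exactly which claims lack evidence.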

27.2 Debugging Checklist

When output quality is poor, inspect in this order:

Was the doc intent correct?
Was the target audience correct?
Did the context packet include the right symbols?
Were tests included as evidence?
Did the prompt separate facts from inference?
Did the model receive stale generated docs as source?
Were private/internal symbols overrepresented?
Was the output template appropriate?
Did validation catch unsupported claims?
Did reviewers have enough evidence to correct it?

28. Practical 20-Hour Drill

Hour 1–2: Pick a Small Service

Choose one service with:

5–20 public entrypoints,
meaningful tests,
existing but imperfect docs,
clear owner.

Hour 3–5: Build Symbol Inventory

Extract:

files,
public classes/functions,
signatures,
comments,
test names,
owners.

Output JSON.

Hour 6–8: Generate Module Overview

Build one context packet and generate one module overview.

Include evidence table and missing information.

Hour 9–11: Add Validation

Add:

markdown lint,
prose lint,
link check,
snippet test if examples exist.

Hour 12–14: Add Drift Detection

Track hashes for evidence snippets.

Mark docs stale when evidence changes.
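A minimal version of this step hashes each evidence snippet and compares it with the hash recorded when the doc was generated. The function names and the claim-id keyed dictionaries are assumptions. Trailing whitespace is normalized so that such changes do not trigger false staleness.

```python
import hashlib
from typing import Dict, List


def fingerprint(snippet: str) -> str:
    """Hash a snippet after trimming trailing whitespace on each line."""
    normalized = "\n".join(line.rstrip() for line in snippet.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def stale_claims(recorded: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return ids of claims whose evidence changed or disappeared.

    recorded maps claim id -> fingerprint stored at generation time;
    current maps claim id -> the evidence snippet as it exists now.
    """
    stale = []
    for claim_id, old_hash in recorded.items():
        snippet = current.get(claim_id)
        if snippet is None or fingerprint(snippet) != old_hash:
            stale.append(claim_id)
    return sorted(stale)
```

Treating removed evidence as stale, rather than ignoring it, keeps a claim from silently outliving the code it was derived from.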

Hour 15–17: Add PR Workflow

Generate a docs branch or patch.

Ask the code owner to review.

Measure edit distance.

Hour 18–20: Improve One Weak Point

Pick the biggest weakness:

bad context,
weak prompt,
missing tests,
noisy output,
bad template,
broken example.

Improve only that.

This is Kaufman's loop: practice, feedback, correction.

29. Final Mental Model

Code-to-docs is not about making AI explain code faster.

It is about building a documentation supply chain from source evidence to reviewed knowledge.

The system should preserve these invariants:

Extract deterministic facts before generating narrative.
Keep public documentation separate from internal implementation details.
Attach evidence to generated claims.
Mark inference as inference.
Test examples before publishing.
Detect drift when source evidence changes.
Route high-risk docs to human owners.
Avoid using unreviewed AI output as source truth.
Optimize for reader tasks, not symbol count.
Treat documentation as part of the engineering system.

If these invariants hold, AI becomes a multiplier for documentation quality.

If they do not, AI becomes a high-throughput stale-doc generator.

30. References

OpenAPI Initiative — OpenAPI Specifications: https://www.openapis.org/
OpenAPI Specification v3.2.0: https://spec.openapis.org/oas/v3.2.0.html
Oracle JavaDoc Documentation Comment Specification for the Standard Doclet, JDK 25: https://docs.oracle.com/en/java/javase/25/docs/specs/javadoc/doc-comment-spec.html
Tree-sitter Introduction: https://tree-sitter.github.io/tree-sitter/
Write the Docs — Docs as Code: https://www.writethedocs.org/guide/docs-as-code/

31. What Comes Next

Part 022 applies the same system thinking to API documentation with OpenAPI.

The key shift is:

Code-to-docs starts from implementation evidence. OpenAPI documentation starts from an explicit HTTP contract.

That changes the source-of-truth model, validation strategy, governance workflow, and AI usage pattern.
