Learn Ai Driven Documentation Part 021 Code To Docs Implementation


title: Learn AI-Driven Documentation and Technical Writing Implementation and Usage - Part 021
description: "Deep implementation guide for code-to-docs systems: static analysis, AST extraction, symbol graphs, semantic summaries, examples, ownership, validation, and AI-assisted documentation generation from source code."
series: learn-ai-driven-documentation
seriesTitle: Learn AI-Driven Documentation and Technical Writing Implementation and Usage
order: 21
partTitle: Code-to-Docs Implementation
tags:
  - ai
  - documentation
  - technical-writing
  - code-to-docs
  - static-analysis
  - ast
  - symbol-graph
  - docs-as-code
  - series
date: 2026-06-30

Part 021 — Code-to-Docs Implementation

1. Why This Part Exists

Code-to-docs is one of the most attractive AI documentation use cases, but also one of the easiest to implement badly.

The naive version says:

Read source code and ask an LLM to explain it.

That can help during local exploration, but it is not an engineering-grade documentation system. It does not know ownership, public API boundaries, runtime behavior, version compatibility, examples, operational constraints, or whether the generated text is still true after the next commit.

A production-grade code-to-docs system must answer harder questions:

  1. Which source files are allowed to become documentation sources?
  2. Which code symbols are part of a stable contract and which are internal implementation details?
  3. Which claims can be derived mechanically and which require human verification?
  4. How do we prevent generated docs from becoming stale faster than manually written docs?
  5. How do we attach evidence to AI-generated explanations?
  6. How do we keep code documentation useful without turning it into noisy line-by-line paraphrase?
  7. How do we review generated documentation in a PR workflow?
  8. How do we support multiple languages and frameworks without hardcoding one parser per team forever?

This part focuses on implementation.

We are not re-teaching Java internals, API design, domain modeling, persistence, or event governance. Those belong in separate deep series. Here, code is treated as a documentation source and a machine-readable evidence stream.

The core principle:

Code-to-docs should extract structure deterministically, generate narrative cautiously, validate claims automatically where possible, and require human ownership for published truth.


2. Kaufman Framing

Kaufman's method tells us to deconstruct the skill into sub-skills, learn enough to self-correct, remove practice barriers, and spend focused time producing real outputs.

For code-to-docs, the sub-skills are not "write a prompt". They are:

| Sub-skill | What You Must Be Able to Do |
| --- | --- |
| Boundary detection | Identify public API, internal code, generated code, test fixtures, and examples. |
| Source extraction | Parse source into symbols, relationships, signatures, comments, annotations, and metadata. |
| Semantic classification | Classify symbols by role: entrypoint, DTO, service, repository, config, handler, adapter, command, policy. |
| Evidence modeling | Connect each generated claim to files, line ranges, symbols, tests, specs, commits, or owners. |
| Narrative generation | Produce useful docs without paraphrasing code mechanically. |
| Example generation | Derive or propose usage examples that compile, run, or pass validation. |
| Drift detection | Detect when code changes invalidate docs. |
| Review integration | Route generated docs through owners, CI checks, and editorial gates. |
| Safety control | Prevent leakage of secrets, internal-only behavior, exploit details, and misleading guarantees. |

2.1 Target Performance Level

After this part, you should be able to design and implement a code-to-docs pipeline that can:

  1. Parse a repository and build a language-neutral symbol inventory.
  2. Generate module-level and symbol-level documentation drafts.
  3. Attach evidence to generated documentation.
  4. Distinguish extractive facts from inferred explanations.
  5. Validate code snippets and examples in CI.
  6. Detect stale docs when code changes.
  7. Route risky documentation changes to human owners.
  8. Avoid publishing AI-generated claims without verification.

2.2 Practice Output

The concrete practice output for this part is a small but realistic pipeline:

repository source
  -> source scanner
  -> parser / language adapter
  -> symbol graph
  -> documentation intent classifier
  -> context packet builder
  -> LLM draft generator
  -> claim verifier
  -> docs PR creator
  -> human review
  -> docs site publish

This is the smallest architecture that has enough structure to be useful beyond a demo.


3. Code-to-Docs Is Not Code Explanation

A code explanation is usually local:

What does this function do?

Documentation is operational:

What does a reader need to know to correctly use, change, extend, operate, or govern this code?

The difference matters.

3.1 Bad Code-to-Docs Output

Bad output often looks like this:

The `PaymentProcessor` class processes payments. It has a method called `processPayment` that takes a request and returns a response. It checks the request, calls the repository, and returns the result.

This is low value because it repeats obvious syntax. It does not answer:

  • when to use it,
  • what invariants it enforces,
  • what errors can occur,
  • what dependencies it has,
  • what state transitions it triggers,
  • what observability signals it emits,
  • how it should be tested,
  • what not to call directly,
  • and what contract must remain stable.

3.2 Good Code-to-Docs Output

Good output is reader-task oriented:

`PaymentProcessor` is the application-layer coordinator for payment submission.
Use it from command handlers, not directly from HTTP controllers, because it assumes the request has already passed transport-level authentication and request-shape validation.

It enforces these business invariants:

- an order can only be charged when its state is `PAYMENT_PENDING`,
- duplicate idempotency keys return the existing payment attempt,
- provider failures are stored as retryable attempts unless the provider marks the failure as permanent.

Changing this component requires reviewing:

- `PaymentAttemptRepository`, because persistence state is mutated during processing,
- `PaymentProviderClient`, because external provider errors are normalized here,
- `payment-submitted` event consumers, because successful processing emits downstream events.

This output is better because it explains boundaries, invariants, dependencies, and change impact.

3.3 The Code-to-Docs Value Ladder

| Level | Output | Value | Risk |
| --- | --- | --- | --- |
| 1 | Symbol list | Low but reliable | Low |
| 2 | Signature reference | Useful for navigation | Low |
| 3 | Extracted comments | Useful if comments are good | Medium |
| 4 | Structural summary | Useful for onboarding | Medium |
| 5 | Behavioral explanation | High value | High |
| 6 | Change impact analysis | Very high value | High |
| 7 | Operational guidance | Very high value | Very high |
| 8 | Contract guarantees | Highest value | Highest risk |

AI can help at every level, but the validation requirement increases as value increases.


4. Mental Model: Code as Evidence, Not as Complete Truth

Source code is an important source of truth, but it is not the whole truth.

Code tells us:

  • signatures,
  • control flow,
  • data flow,
  • annotations,
  • imports,
  • call relationships,
  • error handling branches,
  • constants,
  • default values,
  • comments,
  • test assertions,
  • dependency references.

Code often does not fully tell us:

  • product intent,
  • business rationale,
  • historical decision context,
  • operational constraints,
  • external service behavior,
  • regulatory meaning,
  • compatibility promises,
  • what customers rely on,
  • what must not change.

Therefore, code-to-docs should separate claim types.

4.1 Claim Taxonomy

| Claim Type | Example | Can Be Mechanically Verified? | Required Review |
| --- | --- | --- | --- |
| Structural | `OrderService` has `submitOrder()` | Yes | Low |
| Signature | `submitOrder` accepts `SubmitOrderCommand` | Yes | Low |
| Dependency | `OrderService` calls `PaymentClient` | Usually | Medium |
| Behavior | duplicate command returns existing order | Sometimes via tests | Medium/High |
| Invariant | order cannot be paid twice | Sometimes | High |
| Operational | retry after 5xx is safe | Rarely from code alone | High |
| Compliance | audit trail satisfies retention policy | No | Very High |
| Product | customers see state as "submitted" | No | Product/Domain review |

A code-to-docs system should store this taxonomy in metadata.

Example frontmatter:

aiGenerated: true
sourceType: code-to-docs
claimRisk: medium
evidence:
  - type: source
    path: services/order/src/main/java/com/acme/order/OrderService.java
    symbols:
      - com.acme.order.OrderService#submitOrder
  - type: test
    path: services/order/src/test/java/com/acme/order/OrderServiceTest.java
reviewRequired:
  - technical-owner
  - domain-owner

This prevents generated documentation from pretending every statement has the same confidence.
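
One way to derive the `claimRisk` field is to take the highest review level among the claim types a draft contains. The sketch below assumes a mapping that mirrors the "Required Review" column; the level names, the rounding of "Medium/High" up to high, and the treatment of product claims as high are illustrative assumptions, not part of any standard.

```typescript
// Claim types follow the taxonomy table; everything else here is hypothetical.
type ClaimType =
  | "structural" | "signature" | "dependency" | "behavior"
  | "invariant" | "operational" | "compliance" | "product";

type ReviewLevel = "low" | "medium" | "high" | "very-high";

// Levels ordered from least to most demanding review.
const levelOrder: ReviewLevel[] = ["low", "medium", "high", "very-high"];

const reviewLevelByClaim: Record<ClaimType, ReviewLevel> = {
  structural: "low",
  signature: "low",
  dependency: "medium",
  behavior: "high", // the table says Medium/High; this sketch rounds up
  invariant: "high",
  operational: "high",
  compliance: "very-high",
  product: "high", // assumption: product/domain review is treated as high
};

// A document is as risky as its riskiest claim; no claims means "low".
function documentClaimRisk(claims: ClaimType[]): ReviewLevel {
  let highest = 0;
  for (const claim of claims) {
    highest = Math.max(highest, levelOrder.indexOf(reviewLevelByClaim[claim]));
  }
  return levelOrder[highest];
}
```

Taking the maximum rather than an average keeps a single compliance claim from being diluted by many harmless structural claims.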


5. Architecture Overview

A robust code-to-docs implementation has several stages.

Each stage should have a clear contract.

| Stage | Responsibility | Should Not Do |
| --- | --- | --- |
| Source scanner | Find candidate files and classify them | Interpret behavior deeply |
| Language adapter | Parse language-specific syntax | Generate final prose |
| Symbol inventory | Store symbols and signatures | Infer business intent |
| Relationship graph | Store calls, imports, inheritance, config links | Pretend dynamic behavior is fully known |
| Intent classifier | Decide what doc type is needed | Publish docs |
| Context builder | Assemble bounded evidence | Include unlimited repo context |
| LLM generator | Draft narrative | Invent unverified guarantees |
| Verifier | Check claims against evidence | Approve product claims alone |
| Formatter | Produce MDX/Markdown | Change technical meaning |
| PR creator | Route review | Bypass owners |

6. Source Scanning

The first implementation mistake is to feed the whole repository to AI. That creates noise, cost, leakage risk, and weak traceability.

Instead, scan and classify files before generation.

6.1 File Categories

| Category | Examples | Use in Docs? |
| --- | --- | --- |
| Public source | exported modules, public classes, controllers, SDK APIs | Yes |
| Internal source | private helpers, package-private implementation | Usually summarized only at module level |
| Tests | unit, integration, contract, golden tests | Yes, as behavior evidence |
| Config | route config, feature flags, deployment manifests | Yes, with caution |
| Generated code | protobuf output, OpenAPI generated clients | Usually no; source spec is better |
| Vendor code | copied dependencies | No |
| Build scripts | Gradle, Maven, npm, Make, CI | Yes for setup/build docs |
| Secrets/config samples | `.env.example`, Helm values | Yes, with redaction |
| Fixtures | test data | Only as examples if sanitized |

6.2 Ignore Rules

Define explicit ignore rules:

codeToDocs:
  ignore:
    - "**/target/**"
    - "**/build/**"
    - "**/node_modules/**"
    - "**/vendor/**"
    - "**/*.generated.*"
    - "**/generated/**"
    - "**/.terraform/**"
    - "**/secrets/**"
  includeEvidenceFromTests: true
  includePrivateSymbols: false
  maxFileSizeKb: 256

Do not rely only on .gitignore. Documentation scanning has different safety concerns.
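
As a rough illustration of how a scanner might apply the ignore rules and the size limit, the sketch below converts a small glob subset into regular expressions. The `globToRegExp` and `shouldScan` helpers are hypothetical, and only `*`, `**`, and `**/` are supported; a real scanner would more likely use an established glob library.

```typescript
// Convert a minimal glob dialect to a RegExp. Supported: "**/", "**", "*".
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith("**/", i)) {
      pattern += "(?:.*/)?"; // zero or more leading directories
      i += 3;
    } else if (glob.startsWith("**", i)) {
      pattern += ".*"; // anything, including "/"
      i += 2;
    } else if (glob[i] === "*") {
      pattern += "[^/]*"; // anything inside a single path segment
      i += 1;
    } else {
      // Escape characters that are special in regular expressions.
      pattern += glob[i].replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp(`^${pattern}$`);
}

// A subset of the ignore list shown above.
const ignoreMatchers = [
  "**/target/**",
  "**/node_modules/**",
  "**/*.generated.*",
  "**/secrets/**",
].map(globToRegExp);

const maxFileSizeKb = 256;

// Paths are assumed to be repository-relative and to use "/" separators.
function shouldScan(path: string, sizeKb: number): boolean {
  if (sizeKb > maxFileSizeKb) return false;
  return !ignoreMatchers.some((matcher) => matcher.test(path));
}
```

For example, `shouldScan("services/payment/target/classes/A.class", 1)` is rejected by the first rule, while a 12 KB file under `services/payment/src/main/java/` passes.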

6.3 Source Classification Heuristics

A first version can use path and extension heuristics:

| Signal | Likely Meaning |
| --- | --- |
| `/api/`, `/controllers/`, `/routes/` | external entrypoints |
| `/domain/`, `/model/`, `/aggregate/` | business model |
| `/infra/`, `/adapter/`, `/client/` | integration boundary |
| `/config/` | runtime configuration |
| `/test/`, `/spec/`, `/__tests__/` | behavior evidence |
| `*Controller`, `*Handler`, `*Resource` | request entrypoint |
| `*Service`, `*UseCase`, `*CommandHandler` | application operation |
| `*Repository`, `*Dao` | persistence boundary |
| `*Client`, `*Gateway`, `*Adapter` | external dependency |
| `*Policy`, `*Rule`, `*Validator` | business rule/invariant |

These heuristics should be treated as hints, not truth.
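
A minimal sketch of these heuristics, assuming repository-relative paths with `/` separators. The `classify` function and the `Hint` shape are hypothetical; the result is a list of hints with their origin, so later stages can treat them as weak evidence.

```typescript
type Hint = { role: string; source: "path" | "name" };

const pathHints: Array<[RegExp, string]> = [
  [/\/(api|controllers|routes)\//, "external entrypoint"],
  [/\/(domain|model|aggregate)\//, "business model"],
  [/\/(infra|adapter|client)\//, "integration boundary"],
  [/\/config\//, "runtime configuration"],
  [/\/(test|spec|__tests__)\//, "behavior evidence"],
];

// More specific suffixes come first: "CommandHandler" also ends in "Handler".
const nameHints: Array<[RegExp, string]> = [
  [/(Service|UseCase|CommandHandler)$/, "application operation"],
  [/(Controller|Handler|Resource)$/, "request entrypoint"],
  [/(Repository|Dao)$/, "persistence boundary"],
  [/(Client|Gateway|Adapter)$/, "external dependency"],
  [/(Policy|Rule|Validator)$/, "business rule/invariant"],
];

function classify(path: string): Hint[] {
  const hints: Hint[] = [];
  for (const [pattern, role] of pathHints) {
    if (pattern.test(path)) hints.push({ role, source: "path" });
  }
  // Base file name without directories or extension, e.g. "PaymentController".
  const base = (path.split("/").pop() ?? "").replace(/\.[^.]+$/, "");
  const named = nameHints.find(([pattern]) => pattern.test(base));
  if (named) hints.push({ role: named[1], source: "name" });
  return hints;
}
```

A file can receive several hints, and they may disagree. Keeping the `source` of each hint lets a reviewer or a later classifier decide which signal to trust.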


7. Parsing Strategy

There are four common approaches.

| Approach | Example | Pros | Cons |
| --- | --- | --- | --- |
| Regex | scan for class, function, annotations | Simple | Fragile |
| Language compiler API | Java compiler API, TypeScript compiler API, Roslyn | Accurate | Language-specific |
| Universal parser | Tree-sitter | Multi-language, fast | Semantics limited |
| Framework reflection | runtime route metadata, Spring beans, NestJS modules | High-level | Requires build/runtime context |

A production system often combines them.

Use Tree-sitter or a similar parser for broad, fast coverage. Use compiler APIs for language-specific precision where it matters. Use framework introspection when documentation depends on runtime wiring.

7.2 Symbol Model

Create a language-neutral symbol model.

type SymbolKind =
  | "module"
  | "package"
  | "class"
  | "interface"
  | "record"
  | "enum"
  | "function"
  | "method"
  | "field"
  | "route"
  | "event-handler"
  | "configuration"
  | "test-case";

type SymbolNode = {
  id: string;
  language: "java" | "typescript" | "go" | "csharp" | "python" | "unknown";
  kind: SymbolKind;
  name: string;
  qualifiedName?: string;
  path: string;
  startLine: number;
  endLine: number;
  visibility?: "public" | "protected" | "package" | "private" | "exported" | "internal";
  signature?: string;
  annotations?: string[];
  docComment?: string;
  owner?: string;
  tags?: string[];
};

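Do not make the model too language-specific. Put language details in an extension field instead, shown here in an abridged view of the same type:

type SymbolNode = {
  id: string;
  kind: SymbolKind;
  path: string;
  startLine: number;
  endLine: number;
  // ...remaining core fields omitted for brevity
  languageDetails?: Record<string, unknown>; // adapter-specific extras
};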

7.3 Relationship Model

Docs become useful when symbols are connected.

type EdgeKind =
  | "declares"
  | "imports"
  | "calls"
  | "implements"
  | "extends"
  | "uses-type"
  | "throws"
  | "emits-event"
  | "handles-event"
  | "reads-config"
  | "writes-state"
  | "tested-by"
  | "documented-by"
  | "owned-by";

type SymbolEdge = {
  from: string;
  to: string;
  kind: EdgeKind;
  confidence: "exact" | "inferred" | "heuristic";
  evidence: EvidenceRef[];
};

type EvidenceRef = {
  path: string;
  startLine?: number;
  endLine?: number;
  commit?: string;
};

The confidence field is important. A static call edge might be exact in a simple function call, inferred in a dependency-injected service, and heuristic in reflection-heavy code.


8. Extract Before You Generate

Do not ask the LLM to discover everything from raw files. First, extract deterministic facts.

8.1 Extracted Facts

| Fact | Extraction Source |
| --- | --- |
| Symbol names | parser/compiler |
| Visibility | parser/compiler |
| Signatures | parser/compiler |
| Parameters | parser/compiler |
| Return types | parser/compiler |
| Throws/errors | annotations, code, compiler |
| Annotations/decorators | parser/compiler |
| Routes | framework annotations/config |
| Event topics | annotations/config/constants/specs |
| Config keys | constants/config binding |
| Existing comments | doc comments |
| Tests | test names/assertions |
| Owners | CODEOWNERS, service catalog |
| Links to docs | comments/frontmatter/search |

8.2 Generated Explanations

Only after extraction should AI generate narrative.

| Narrative | Required Input |
| --- | --- |
| Module overview | symbol graph + package structure + README + owners |
| Component responsibility | class/function signature + dependencies + tests |
| Usage guide | public entrypoints + examples + tests |
| Change impact | relationship graph + owners + downstream references |
| Troubleshooting | errors + logs + tests + runbooks |
| Migration note | diff + deprecation metadata + examples |

This separation reduces hallucination and makes the output reviewable.


9. Documentation Intent Classifier

Not every symbol deserves a standalone document.

A code-to-docs system should classify documentation intent.

9.1 Intent Categories

| Intent | Trigger | Output |
| --- | --- | --- |
| Reference | public API, SDK class, exported function | symbol reference page |
| Explanation | complex module, high fan-in, non-obvious policy | conceptual doc |
| How-to | common task, setup, extension point | procedural guide |
| Tutorial | onboarding path | guided exercise |
| Runbook | alerts, operational procedures | operational doc |
| ADR candidate | code shows major design decision | decision draft |
| Deprecated API note | deprecation annotation/tag | migration doc |

9.2 Priority Score

A practical prioritization model:

priority = publicBoundaryWeight
         + changeFrequencyWeight
         + supportTicketWeight
         + incidentWeight
         + onboardingWeight
         + fanInWeight
         - existingDocQualityWeight

High-priority generated docs should be reviewed first.
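
The formula above can be sketched as a weighted sum. In this illustration every signal is assumed to be normalized to the range 0 to 1 by an upstream job, and the weights are hypothetical placeholders to be tuned against a real review backlog.

```typescript
// Each signal is assumed to be normalized to 0..1 before scoring.
type PrioritySignals = {
  publicBoundary: number;
  changeFrequency: number;
  supportTickets: number;
  incidents: number;
  onboarding: number;
  fanIn: number;
  existingDocQuality: number;
};

// Hypothetical weights; they are not derived from any measured data.
const weights: PrioritySignals = {
  publicBoundary: 3,
  changeFrequency: 2,
  supportTickets: 2,
  incidents: 3,
  onboarding: 1,
  fanIn: 2,
  existingDocQuality: 4,
};

// Good existing documentation lowers the score, so it is subtracted.
function priority(s: PrioritySignals): number {
  return (
    weights.publicBoundary * s.publicBoundary +
    weights.changeFrequency * s.changeFrequency +
    weights.supportTickets * s.supportTickets +
    weights.incidents * s.incidents +
    weights.onboarding * s.onboarding +
    weights.fanIn * s.fanIn -
    weights.existingDocQuality * s.existingDocQuality
  );
}
```

With these weights, a symbol that maxes out every signal and has no documentation scores 13, while the same symbol with excellent documentation scores 9 and moves down the queue.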


10. Context Packet for Code-to-Docs

The LLM should receive a bounded context packet, not an arbitrary folder dump.

10.1 Context Packet Shape

docIntent: component-explanation
targetAudience:
  - backend engineer
  - tech lead
readerTask: "Understand how to safely modify payment submission behavior."
sourceOfTruthPriority:
  - source-code
  - tests
  - openapi-spec
  - adr
  - existing-docs
symbol:
  id: "java:com.acme.payment.PaymentProcessor"
  path: "services/payment/src/main/java/com/acme/payment/PaymentProcessor.java"
  lines: "24-188"
relatedSymbols:
  - "PaymentAttemptRepository"
  - "PaymentProviderClient"
  - "PaymentSubmittedEvent"
knownTests:
  - "PaymentProcessorTest.shouldReturnExistingAttemptForSameIdempotencyKey"
evidence:
  - path: "services/payment/src/main/java/com/acme/payment/PaymentProcessor.java"
    lines: "56-87"
    claim: "idempotency key is checked before provider call"
constraints:
  - "Do not claim provider retry behavior unless supported by tests or runbook."
  - "Separate verified facts from inferred explanation."
outputFormat: mdx

10.2 Context Rules

The context packet should include:

  1. target reader,
  2. documentation intent,
  3. source hierarchy,
  4. selected symbols,
  5. related symbols,
  6. evidence snippets,
  7. existing docs to preserve,
  8. style constraints,
  9. forbidden claims,
  10. required verification output.

It should exclude:

  • secrets,
  • production credentials,
  • personal data,
  • irrelevant files,
  • large logs,
  • vendor dependencies,
  • stale generated docs unless explicitly marked,
  • previous AI output unless reviewed and promoted.

11. Prompt Contract for Code-to-Docs

A useful prompt is a contract, not a wish.

You are generating a documentation draft from source evidence.

Reader task:
- Help a backend engineer understand how to safely modify payment submission behavior.

Rules:
- Use only the provided evidence.
- Do not claim runtime behavior unless evidence supports it.
- Separate verified facts from inferred explanations.
- Attach evidence references for important claims.
- Mark missing information explicitly.
- Do not document private helper methods unless they affect public behavior.
- Do not expose secrets, internal tokens, private URLs, or customer data.

Output:
- MDX document.
- Sections: Overview, When to Use, Boundaries, Key Flow, Invariants, Dependencies, Failure Modes, Change Checklist, Evidence Table, Gaps.

The prompt should force the model to admit uncertainty.

11.1 Required Evidence Table

Every generated doc should include a hidden or visible evidence table during review.

| Claim | Evidence | Confidence | Review Needed |
| --- | --- | --- | --- |
| `PaymentProcessor` checks idempotency before provider call | `PaymentProcessor.java:56-87` | High | Technical owner |
| Provider timeout is retryable | `PaymentProviderClientTest.java:44-61` | Medium | SRE owner |
| Duplicate submission is safe | source + tests | Medium | Domain owner |

For public docs, the final evidence table may be hidden or transformed, but the PR review should retain it.
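
A small sketch of how a pipeline might render the review-time evidence table and flag claims that arrive without evidence. The `ClaimRow` shape and both helper names are hypothetical.

```typescript
type ClaimRow = {
  claim: string;
  evidence: string; // e.g. "PaymentProcessor.java:56-87"; empty if none found
  confidence: "High" | "Medium" | "Low";
  reviewNeeded: string;
};

// Render rows as a Markdown table for the PR description or a review comment.
function renderEvidenceTable(rows: ClaimRow[]): string {
  // Escape pipes so cell text cannot break the table layout.
  const escapeCell = (cell: string) => cell.replace(/\|/g, "\\|");
  const lines = [
    "| Claim | Evidence | Confidence | Review Needed |",
    "| --- | --- | --- | --- |",
    ...rows.map(
      (row) =>
        `| ${escapeCell(row.claim)} | ${escapeCell(row.evidence)} | ${row.confidence} | ${escapeCell(row.reviewNeeded)} |`,
    ),
  ];
  return lines.join("\n");
}

// Claims with no evidence reference at all; a CI gate could fail on these.
function unsupportedClaims(rows: ClaimRow[]): string[] {
  return rows.filter((row) => row.evidence.trim() === "").map((row) => row.claim);
}
```

Keeping the table as structured data until the last step means the same rows can feed the PR comment, the hidden metadata, and any CI check.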


12. Output Patterns

12.1 Module Overview

A module overview should answer:

  • What does this module own?
  • What does it not own?
  • What are its public boundaries?
  • What state does it read/write?
  • What external systems does it call?
  • What events does it emit or consume?
  • How do engineers safely change it?

Template:

# Payment Module

## Responsibility

## Non-Responsibilities

## Public Entry Points

## Main Flow

## Domain Invariants

## External Dependencies

## Events

## Configuration

## Operational Notes

## Change Checklist

## Evidence

12.2 Symbol Reference

A symbol reference should be terse.

## `PaymentProcessor.submit()`

Coordinates payment submission for an order that is ready for payment.

### Signature

```java
PaymentResult submit(SubmitPaymentCommand command)
```

### Inputs

| Field | Meaning | Required | Notes |
| --- | --- | --- | --- |
| `orderId` | Order to charge | Yes | Must refer to existing order |
| `idempotencyKey` | Duplicate submission key | Yes | Reused keys return existing attempt |

### Behavior

1. Loads the order.
2. Checks payment state.
3. Checks idempotency key.
4. Calls provider client.
5. Persists payment attempt.
6. Emits payment event on success.

### Failure Modes

| Failure | Result | Retry? |
| --- | --- | --- |
| invalid state | domain error | No |
| provider timeout | retryable attempt | Yes |
| permanent provider decline | failed attempt | No |

12.3 Change Impact Summary

## Change Impact

Changing `PaymentProcessor.submit()` may affect:

- HTTP endpoint: `POST /payments`
- Event: `payment-submitted`
- Tables: `payment_attempts`
- Consumers: settlement service, notification service
- Tests: `PaymentProcessorTest`, `PaymentContractTest`

Before merging, verify:

- idempotency behavior,
- provider failure mapping,
- event payload compatibility,
- rollback behavior,
- dashboard and alert expectations.

This is much more useful than a generated method-by-method catalog.


13. Example Implementation Blueprint

13.1 Repository Layout

/tools/code-to-docs/
  src/
    scanner/
    adapters/
      java/
      typescript/
      go/
      csharp/
    graph/
    context/
    generation/
    verification/
    output/
  config/
    code-to-docs.yml
    prompts/
      module-overview.md
      symbol-reference.md
      change-impact.md
  tests/
/docs/
  engineering/
  generated/
  services/

13.2 Pipeline Command

code-to-docs generate \
  --repo . \
  --config tools/code-to-docs/config/code-to-docs.yml \
  --target services/payment \
  --intent module-overview \
  --out docs/services/payment/overview.mdx \
  --create-pr

13.3 Configuration

project:
  name: payments-platform
  defaultAudience:
    - backend-engineer
    - tech-lead

sourceScanning:
  include:
    - "services/**/src/**"
    - "services/**/test/**"
  exclude:
    - "**/target/**"
    - "**/build/**"
    - "**/generated/**"
    - "**/*.generated.*"

languages:
  java:
    enabled: true
    parser: compiler-api
    includeAnnotations:
      - RestController
      - Controller
      - Service
      - Repository
      - Deprecated
      - Transactional
  typescript:
    enabled: true
    parser: typescript-compiler
  go:
    enabled: true
    parser: tree-sitter

output:
  format: mdx
  includeEvidenceTable: true
  generatedNotice: true
  maxDocLengthWords: 1800

review:
  requireCodeOwner: true
  requireDomainOwnerForInvariants: true
  requireSecurityReviewForAuthOrSecretClaims: true

validation:
  runMarkdownLint: true
  runVale: true
  runSnippetTests: true
  failOnUnsupportedClaim: true

14. Language Adapter Design

A language adapter converts language-specific syntax into the unified symbol model.

interface LanguageAdapter {
  language: string;
  supports(path: string): boolean;
  parse(file: SourceFile): ParsedFile;
  extractSymbols(parsed: ParsedFile): SymbolNode[];
  extractEdges(parsed: ParsedFile, symbols: SymbolNode[]): SymbolEdge[];
  extractDocComments(parsed: ParsedFile, symbols: SymbolNode[]): DocComment[];
  extractTests?(parsed: ParsedFile): TestEvidence[];
}

14.1 Java Adapter

For Java, useful signals include:

  • package declaration,
  • class/interface/record/enum declarations,
  • method signatures,
  • annotations,
  • visibility modifiers,
  • Javadoc comments,
  • thrown exceptions,
  • dependency injection constructor fields,
  • test annotations,
  • framework annotations.

Example extraction result:

{
  "id": "java:com.acme.payment.PaymentProcessor#submit",
  "language": "java",
  "kind": "method",
  "name": "submit",
  "qualifiedName": "com.acme.payment.PaymentProcessor.submit",
  "path": "services/payment/src/main/java/com/acme/payment/PaymentProcessor.java",
  "startLine": 42,
  "endLine": 98,
  "visibility": "public",
  "signature": "PaymentResult submit(SubmitPaymentCommand command)",
  "annotations": ["Transactional"],
  "tags": ["application-service", "state-mutating"]
}

Javadoc is useful, but should not be blindly trusted. It may be stale. Treat it as one evidence source.

14.2 TypeScript Adapter

For TypeScript:

  • exported functions/classes/interfaces,
  • route decorators,
  • type aliases,
  • generics,
  • JSDoc/TSDoc comments,
  • imports/exports,
  • test cases,
  • schema validators.

14.3 Go Adapter

For Go:

  • exported symbols by capitalization,
  • package comments,
  • interface definitions,
  • function signatures,
  • struct tags,
  • HTTP handlers,
  • test functions,
  • error variables.

14.4 C# Adapter

For C#:

  • namespaces,
  • public/internal types,
  • attributes,
  • XML doc comments,
  • controller actions,
  • dependency injection registrations,
  • test attributes.

14.5 Adapter Rule

A language adapter should extract structure, not write prose.

The generation stage should be language-aware but not language-bound.


15. Symbol Graph Design

A symbol graph allows the system to answer documentation questions that are impossible with isolated files.

15.1 Useful Queries

| Query | Documentation Use |
| --- | --- |
| What public entrypoints call this component? | Change impact |
| What tests cover this behavior? | Evidence |
| What external systems are called? | Operational docs |
| What config keys are read? | Setup docs |
| What symbols have no docs? | Coverage |
| What docs mention removed symbols? | Drift detection |
| What high-fan-in symbols are undocumented? | Prioritization |

15.2 Graph Example
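
As a small illustration, the sketch below builds an in-memory edge list around `PaymentProcessor.submit` and answers two of the queries from 15.1. The symbol names `PaymentController.create`, `SubmitPaymentHandler.handle`, and `PaymentProviderClient.charge` are hypothetical, and a real graph would also carry the confidence and evidence fields from the relationship model.

```typescript
type Edge = {
  from: string;
  to: string;
  kind: "calls" | "tested-by" | "emits-event";
};

// Hypothetical edges for one payment flow.
const edges: Edge[] = [
  { from: "PaymentController.create", to: "SubmitPaymentHandler.handle", kind: "calls" },
  { from: "SubmitPaymentHandler.handle", to: "PaymentProcessor.submit", kind: "calls" },
  { from: "PaymentProcessor.submit", to: "PaymentProviderClient.charge", kind: "calls" },
  { from: "PaymentProcessor.submit", to: "PaymentProcessorTest", kind: "tested-by" },
  { from: "PaymentProcessor.submit", to: "payment-submitted", kind: "emits-event" },
];

// Symbols the scanner marked as public entrypoints (assumed input).
const entrypoints = new Set(["PaymentController.create"]);

// Walk "calls" edges backwards (breadth-first) to find entrypoints
// that can reach the target symbol.
function entrypointsReaching(target: string): string[] {
  const seen = new Set<string>([target]);
  const queue = [target];
  const found: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const edge of edges) {
      if (edge.kind !== "calls" || edge.to !== current || seen.has(edge.from)) continue;
      seen.add(edge.from);
      if (entrypoints.has(edge.from)) found.push(edge.from);
      queue.push(edge.from);
    }
  }
  return found;
}

// Direct "tested-by" edges give the evidence list for a symbol.
function testsFor(symbol: string): string[] {
  return edges
    .filter((edge) => edge.kind === "tested-by" && edge.from === symbol)
    .map((edge) => edge.to);
}
```

Here `entrypointsReaching("PaymentProviderClient.charge")` returns the controller method, which is the kind of answer a change impact summary needs.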

15.3 Storage Options

| Option | Good For | Trade-off |
| --- | --- | --- |
| JSON files | simple builds, CI artifacts | limited query power |
| SQLite | local indexing, medium repos | graph traversal manual |
| Graph DB | rich relationships | operational overhead |
| Search index | retrieval and ranking | weak relationship semantics |
| Hybrid | serious internal platform | more implementation complexity |

For a first implementation, SQLite plus JSON artifacts is often enough. Move to graph storage when relationship queries become central.


16. AI Generation Strategy

16.1 Do Not Generate Everything

Do not generate docs for every symbol. That creates documentation spam.

Generate only when one or more conditions are true:

  • symbol is a public or exported boundary,
  • symbol is high fan-in,
  • symbol is high-risk,
  • symbol is frequently changed,
  • symbol is frequently searched,
  • symbol is frequently involved in incidents,
  • symbol has poor existing documentation,
  • symbol is part of onboarding path,
  • symbol appears in support tickets.

16.2 Use Templates

Prompting should be template-driven.

| Doc Type | Template Sections |
| --- | --- |
| Module overview | responsibility, boundaries, flows, dependencies, change checklist |
| Component explanation | purpose, collaborators, invariants, failure modes |
| Public API reference | signature, params, return, errors, examples |
| Extension guide | extension point, steps, tests, constraints |
| Change impact | affected entrypoints, tests, docs, owners |
| Deprecation note | old behavior, replacement, migration steps, timeline |

16.3 Require Missing Information Section

Every generated draft should include:

## Missing Information

- No test evidence found for retry behavior.
- No ADR found for idempotency strategy.
- No runbook found for provider outage handling.

This is one of the simplest ways to prevent false confidence.
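
The section does not have to rely on the model alone. A sketch, assuming the pipeline already records which evidence lookups succeeded: the `EvidenceCheck` shape and `renderMissingInformation` are hypothetical names.

```typescript
type EvidenceCheck = {
  topic: string; // e.g. "retry behavior"
  evidenceKind: "test evidence" | "ADR" | "runbook";
  found: boolean;
};

// Build the section from lookup results instead of trusting the draft text.
function renderMissingInformation(checks: EvidenceCheck[]): string {
  const gaps = checks.filter((check) => !check.found);
  const lines = ["## Missing Information", ""];
  if (gaps.length === 0) {
    lines.push("- No gaps detected by automated checks.");
  }
  for (const gap of gaps) {
    lines.push(`- No ${gap.evidenceKind} found for ${gap.topic}.`);
  }
  return lines.join("\n");
}
```

Generating the list deterministically means a missing test or runbook shows up in review even when the model's own draft sounds confident.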


17. Example Generation

Examples are where code-to-docs becomes dangerous.

An example that does not compile or does not match production behavior is worse than no example.

17.1 Example Source Priority

| Priority | Source | Trust |
| --- | --- | --- |
| 1 | Existing tests | Highest |
| 2 | Existing examples | High if tested |
| 3 | Contract tests | High |
| 4 | README snippets | Medium |
| 5 | AI-generated examples | Low until tested |

17.2 Example Lifecycle

17.3 Snippet Testing

For Java:

./gradlew test --tests '*DocumentationSnippetTest'

For TypeScript:

npm run test:docs-snippets

For shell examples:

shellcheck docs/**/*.sh

For HTTP examples:

newman run docs/examples/payments.postman_collection.json

The exact tool does not matter as much as the invariant:

Published examples must be executable or explicitly marked as illustrative.


18. Drift Detection

Code-to-docs is only useful if it can detect staleness.

18.1 Drift Types

| Drift Type | Example | Detection |
| --- | --- | --- |
| Symbol removed | docs mention deleted method | symbol index check |
| Signature changed | docs show old parameter | signature diff |
| Behavior changed | tests changed but docs not updated | test-to-doc mapping |
| Example broken | snippet no longer compiles | snippet tests |
| Config changed | docs mention old key | config key index |
| Ownership changed | stale team owner | CODEOWNERS/service catalog diff |
| Public contract changed | OpenAPI differs from docs | spec diff |

18.2 Stale Claim Metadata

Each generated claim can store a dependency fingerprint.

claims:
  - id: claim-001
    text: "Duplicate idempotency keys return the existing payment attempt."
    evidence:
      - path: PaymentProcessor.java
        lines: 56-87
        hash: "sha256:abc123"
      - path: PaymentProcessorTest.java
        lines: 44-62
        hash: "sha256:def456"
    reviewOwner: payments-team

When any evidence hash changes, mark the claim as stale.
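
A minimal sketch of that check. To stay dependency-free it uses a small FNV-1a-style string hash as a stand-in for the SHA-256 fingerprints shown above; a real pipeline should use a cryptographic hash. The `Claim` and `EvidenceRef` shapes and all helper names are hypothetical.

```typescript
type EvidenceRef = { path: string; startLine: number; endLine: number; hash: string };
type Claim = { id: string; text: string; evidence: EvidenceRef[] };

// Stand-in fingerprint (FNV-1a over UTF-16 code units), NOT sha256.
function fingerprint(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return "fnv1a:" + hash.toString(16).padStart(8, "0");
}

// Extract an inclusive, 1-indexed line range from a file's content.
function sliceLines(source: string, startLine: number, endLine: number): string {
  return source.split("\n").slice(startLine - 1, endLine).join("\n");
}

// Return ids of claims whose evidence no longer matches its stored hash.
function staleClaims(claims: Claim[], files: Map<string, string>): string[] {
  return claims
    .filter((claim) =>
      claim.evidence.some((ref) => {
        const source = files.get(ref.path);
        // A missing file counts as stale: the evidence no longer exists.
        if (source === undefined) return true;
        return fingerprint(sliceLines(source, ref.startLine, ref.endLine)) !== ref.hash;
      }),
    )
    .map((claim) => claim.id);
}
```

Hashing by line range is simple but sensitive to unrelated edits that shift lines. Hashing the text of the referenced symbol, located through the symbol index, is one way to reduce false positives.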

18.3 PR Bot Behavior

On a code PR, the bot can comment:

Documentation impact detected.

Changed symbols:
- `PaymentProcessor.submit()`

Affected docs:
- `docs/services/payment/overview.mdx`
- `docs/services/payment/change-checklist.mdx`

Stale claims:
- duplicate idempotency behavior
- provider timeout retry behavior

Action required:
- Update docs or add `docs-impact: none` with owner approval.

This is high leverage because it keeps documentation updates close to code changes.


19. Review Workflow

Generated documentation should enter the same PR system as hand-written docs.

19.1 Review Checklist

Technical reviewer checks:

  • Is the source boundary correct?
  • Are public/private details separated?
  • Are behavior claims supported by code/tests/specs?
  • Are missing gaps clearly marked?
  • Are examples correct?
  • Are failure modes accurate?
  • Is the change impact checklist realistic?

Editorial reviewer checks:

  • Is the doc organized by reader task?
  • Is the title specific?
  • Are paragraphs short?
  • Are instructions imperative and direct?
  • Is terminology consistent?
  • Does it avoid overclaiming?
  • Does it link to reference docs instead of duplicating them?

Security reviewer checks:

  • Are secrets removed?
  • Are internal-only endpoints hidden?
  • Are exploit steps avoided?
  • Is auth behavior described safely?
  • Are logs/screenshots sanitized?

20. Security Controls

Code-to-docs systems can leak sensitive information because they process source repositories.

20.1 Sensitive Inputs

| Sensitive Input | Risk |
| --- | --- |
| secrets in code | credential leakage |
| private endpoints | exposure of internal attack surface |
| feature flags | leaking unreleased features |
| customer-specific rules | confidentiality breach |
| incident workarounds | exploit guidance |
| auth logic details | bypass hints |
| infrastructure manifests | environment exposure |

20.2 Safety Filters

Implement filters before LLM context assembly, so that blocked content never reaches the model.

20.3 Policy Example

security:
  blockPatterns:
    - "AKIA[0-9A-Z]{16}"
    - "-----BEGIN PRIVATE KEY-----"
  restrictedPaths:
    - "infra/secrets/**"
    - "ops/break-glass/**"
  requireSecurityReviewFor:
    - authentication
    - authorization
    - encryption
    - token
    - secret
    - admin
    - bypass
  publicDocsDenyTags:
    - internal-only
    - customer-specific
    - exploit-sensitive

The system should fail closed for public docs.
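
A sketch of what failing closed can look like, using the two block patterns from the policy and treating the restricted paths as simple prefixes (a simplification of the glob rules). The `Snippet` shape and both function names are hypothetical.

```typescript
type Snippet = { path: string; text: string };
type Blocked = { path: string; reason: string };

// Patterns and paths taken from the policy example; prefixes replace globs here.
const blockPatterns: RegExp[] = [/AKIA[0-9A-Z]{16}/, /-----BEGIN PRIVATE KEY-----/];
const restrictedPrefixes = ["infra/secrets/", "ops/break-glass/"];

// Split candidate snippets into allowed and blocked before prompt assembly.
function filterSnippets(snippets: Snippet[]): { allowed: Snippet[]; blocked: Blocked[] } {
  const allowed: Snippet[] = [];
  const blocked: Blocked[] = [];
  for (const snippet of snippets) {
    if (restrictedPrefixes.some((prefix) => snippet.path.startsWith(prefix))) {
      blocked.push({ path: snippet.path, reason: "restricted path" });
    } else if (blockPatterns.some((pattern) => pattern.test(snippet.text))) {
      blocked.push({ path: snippet.path, reason: "secret-like pattern" });
    } else {
      allowed.push(snippet);
    }
  }
  return { allowed, blocked };
}

// Fail closed: a public doc build gets no context at all if anything was blocked.
function contextForPublicDocs(snippets: Snippet[]): Snippet[] {
  const { allowed, blocked } = filterSnippets(snippets);
  return blocked.length === 0 ? allowed : [];
}
```

Returning nothing when any snippet is blocked is deliberately strict. Internal doc builds might instead proceed with the allowed subset and report the blocked paths to a security reviewer.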


21. Anti-Patterns

21.1 Line-by-Line Paraphrase

Bad:

The method creates a variable called `result`. Then it checks if result is null.

Better:

The method treats missing provider results as retryable failures and persists a failed attempt before returning.

21.2 Treating Comments as Truth

Existing comments may be stale. Use them, but verify against code and tests.

21.3 Documenting Private Helpers Publicly

Private helper docs create noise and expose unnecessary internals. Summarize internals only when they matter for safe modification.

21.4 Generating Examples Without Running Them

AI-generated examples must be tested or marked illustrative.

21.5 No Ownership

Generated docs without owners become abandoned artifacts.

21.6 No Claim Risk Model

A generated class summary and a generated compliance guarantee do not carry the same risk.
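One rough way to start a claim risk model is keyword-based routing. The sketch below assumes the term list from `requireSecurityReviewFor` in the policy example, plus two illustrative additions (`compliance`, `guarantee`) that are assumptions of this example. `claim_risk` is a hypothetical helper, and a keyword match is only a coarse first pass, not a classifier.

```python
import re

# Terms from the policy example, plus "compliance" and "guarantee"
# (the last two are assumptions added for illustration).
HIGH_RISK_TERMS = (
    "authentication", "authorization", "encryption", "token",
    "secret", "admin", "bypass", "compliance", "guarantee",
)


def claim_risk(claim: str) -> str:
    """Return "high" if any word in the claim starts with a risky term."""
    words = re.findall(r"[a-z]+", claim.lower())
    for word in words:
        if any(word.startswith(term) for term in HIGH_RISK_TERMS):
            return "high"
    return "low"
```

High-risk claims would then be routed to a human owner, while low-risk claims can go through the normal docs review.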

21.7 Recursive AI Contamination

Do not use unreviewed generated docs as primary evidence for future generated docs.


22. Quality Metrics

Track whether the system improves documentation quality.

| Metric | Meaning |
| --- | --- |
| documented public symbols | coverage of stable API boundaries |
| stale claim count | drift debt |
| docs PR acceptance rate | quality of generated drafts |
| manual edit distance | how much humans must fix |
| unsupported claim rate | hallucination pressure |
| snippet pass rate | example reliability |
| review latency | workflow cost |
| search success rate | usefulness to readers |
| incident-linked doc gaps | operational risk |
| onboarding task success | practical learning value |

22.1 Manual Edit Distance

A useful metric is how much the reviewer changed the generated text before merge.

High edit distance means:

  • context is insufficient,
  • prompt is weak,
  • template is wrong,
  • source extraction is noisy,
  • or AI is asked to infer too much.

Low edit distance does not automatically mean high quality. Reviewers may rubber-stamp. Pair it with defect metrics.
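As a rough illustration, the metric can be approximated with the standard library. This sketch uses `difflib.SequenceMatcher` over word tokens, which is a similarity-based proxy rather than a true Levenshtein distance. The function name `edit_ratio` is hypothetical.

```python
from difflib import SequenceMatcher


def edit_ratio(generated: str, merged: str) -> float:
    """Approximate share of the draft that the reviewer changed.

    0.0 means the draft was merged unchanged; 1.0 means nothing survived.
    Text is compared as word tokens, so reflowed whitespace does not count.
    """
    a, b = generated.split(), merged.split()
    if not a and not b:
        return 0.0
    return 1.0 - SequenceMatcher(None, a, b, autojunk=False).ratio()
```

Tracking this value per PR, alongside defect reports filed against merged docs, helps distinguish good drafts from rubber-stamped ones.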


23. Minimal Viable Implementation

Start small.

23.1 MVP Scope

Implement:

  1. scan one repository,
  2. parse one primary language,
  3. extract public symbols,
  4. extract doc comments,
  5. link tests by naming convention,
  6. generate module overview docs,
  7. include evidence table,
  8. run Markdown/prose linting,
  9. create PR for human review.

Do not start with:

  • full graph database,
  • all languages,
  • automatic public publishing,
  • autonomous edits to code,
  • organization-wide rollout,
  • compliance claims,
  • runtime behavior inference.
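Step 5 of the MVP scope, linking tests by naming convention, can be sketched in a few lines. The conventions below (`FooTest`, `FooTests`, `test_foo`, `foo_test`) are assumptions chosen for illustration, and `link_tests` is a hypothetical helper. A real repository may need different rules.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def link_tests(source_files: List[str], test_files: List[str]) -> Dict[str, List[str]]:
    """Map each source file to the test files that match it by name."""

    def candidates(stem: str) -> List[str]:
        # Assumed conventions: Foo -> FooTest / FooTests / test_Foo / Foo_test
        return [f"{stem}Test", f"{stem}Tests", f"test_{stem}", f"{stem}_test"]

    # Index test files by file name without extension.
    tests_by_stem: Dict[str, List[str]] = {}
    for test in test_files:
        tests_by_stem.setdefault(PurePosixPath(test).stem, []).append(test)

    links: Dict[str, List[str]] = {}
    for src in source_files:
        stem = PurePosixPath(src).stem
        matched: List[str] = []
        for name in candidates(stem):
            matched.extend(tests_by_stem.get(name, []))
        links[src] = matched  # an empty list means no linked test evidence
    return links
```

Source files that map to an empty list are useful output too: they are candidates for the "missing information" part of the generated overview.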

23.2 MVP Success Criteria

The MVP succeeds if:

  • reviewers accept at least 50% of generated structure,
  • generated docs identify missing information accurately,
  • no unsupported high-risk claims are published,
  • examples are either tested or excluded,
  • docs PRs are easy to review,
  • engineers voluntarily use the output for onboarding or change impact.

24. Advanced Implementation: Diff-Aware Generation

A mature system should generate docs based on code diffs, not only full repo scans.

24.1 Changed Symbol Detection

A change may affect docs when:

  • public signature changes,
  • parameter is added/removed,
  • error behavior changes,
  • config key changes,
  • event emission changes,
  • annotation changes,
  • route mapping changes,
  • test expectation changes,
  • deprecation metadata changes.
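A minimal sketch of the first two triggers, assuming the extractor already produces a `{symbol_name: signature}` snapshot for the base and head commits. The function `diff_symbols` and the snapshot shape are hypothetical. The other triggers in the list (error behavior, config keys, events, routes) would need their own extractors.

```python
from typing import Dict, List


def diff_symbols(before: Dict[str, str], after: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two snapshots of public symbol signatures."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    changed = sorted(
        name
        for name in before.keys() & after.keys()
        if before[name] != after[name]
    )
    return {"added": added, "removed": removed, "changed": changed}
```

Any symbol that appears in the result is a candidate for a docs update, and the doc sections that cite it can be queued for regeneration or review.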

24.2 Docs Patch Strategy

Prefer minimal diffs.

Bad:

Regenerate the whole page on every code change.

Better:

Update only affected sections and evidence metadata.

This keeps review cost low.


25. Advanced Implementation: Behavior Evidence from Tests

Tests are often better documentation evidence than implementation code because they encode expected behavior.

25.1 Test Signal Extraction

Extract:

  • test class/function names,
  • arrange/act/assert structure,
  • expected errors,
  • boundary values,
  • fixtures,
  • mocked external systems,
  • contract expectations,
  • snapshot outputs.

25.2 Test Naming Quality

If test names are poor, AI documentation will suffer.

Bad:

@Test
void test1() {}

Better:

@Test
void submit_returnsExistingAttempt_whenIdempotencyKeyWasAlreadyUsed() {}

Good test names become high-quality documentation seeds.

25.3 Behavior Summary from Tests

AI can summarize test evidence:

Observed behavior from tests:

- duplicate idempotency keys return the existing attempt,
- invalid order state produces a domain validation error,
- provider timeout is stored as retryable failure,
- provider permanent decline is stored as non-retryable failure.

But the system must label this as test-observed behavior, not necessarily complete behavior.
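Before any LLM is involved, well-formed test names can be turned into behavior statements mechanically. The sketch below assumes the `method_expectedResult_condition` naming pattern shown in 25.2. `describe_test` is a hypothetical helper, and names that do not follow the pattern yield no statement.

```python
import re


def describe_test(name: str) -> str:
    """Turn method_expectedResult_condition into a labeled sentence."""
    parts = name.split("_")
    if len(parts) < 2:
        return ""  # e.g. "test1" carries no usable signal
    subject, rest = parts[0], parts[1:]
    words = []
    for part in rest:
        # Insert a space at each lower/digit -> upper camelCase boundary.
        spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", part)
        words.extend(spaced.lower().split())
    return f"{subject}: {' '.join(words)} (test-observed)"
```

The explicit `(test-observed)` suffix keeps the provenance label attached to the statement wherever it is reused.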


26. Integration with Existing Docs

Generated docs should not overwrite human-authored architecture docs.

26.1 Merge Modes

| Mode | Meaning | Use Case |
| --- | --- | --- |
| generated file | full file owned by generator | reference docs |
| generated section | marked region inside human doc | symbol inventory |
| suggested patch | bot proposes changes | narrative docs |
| review comment | no file change | PR assistance |
| evidence artifact | machine-readable metadata only | audit/validation |

26.2 Generated Section Markers

<!-- ai-docs:start source="code-to-docs" symbol="PaymentProcessor" -->
Generated reference content here.
<!-- ai-docs:end -->

Only update inside markers. Never overwrite surrounding human narrative automatically.
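The marker rule can be enforced in code. This is a minimal sketch, assuming markers in exactly the form shown above. `replace_generated_section` is a hypothetical function. It refuses to write when the marked region is missing or duplicated, so human narrative is never touched by accident.

```python
import re


def replace_generated_section(doc: str, symbol: str, new_body: str) -> str:
    """Replace only the text between the ai-docs markers for one symbol."""
    pattern = re.compile(
        r'(<!-- ai-docs:start [^>]*symbol="' + re.escape(symbol) + r'"[^>]*-->\n)'
        r".*?"
        r"(<!-- ai-docs:end -->)",
        re.DOTALL,
    )
    # A function replacement avoids backslash surprises in new_body.
    updated, count = pattern.subn(
        lambda m: m.group(1) + new_body.rstrip("\n") + "\n" + m.group(2),
        doc,
    )
    if count != 1:
        raise ValueError(
            f"expected exactly one marked section for {symbol!r}, found {count}"
        )
    return updated
```

Because everything outside the two markers is copied through unchanged, the resulting diff in a PR is limited to the generated region.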


27. Failure Modeling

27.1 Failure Modes

| Failure | Cause | Mitigation |
| --- | --- | --- |
| Hallucinated behavior | insufficient evidence | evidence table + unsupported claim gate |
| Stale docs | code changes not tracked | claim fingerprints + PR impact bot |
| Noisy docs | too many symbols documented | intent classifier + priority scoring |
| Leaked internals | public/private boundary missing | security classifier + fail-closed policy |
| Reviewer fatigue | huge generated diffs | minimal patch generation |
| Wrong ownership | stale CODEOWNERS | service catalog sync |
| Broken examples | generated snippets not tested | snippet CI |
| Recursive contamination | AI output used as evidence | source hierarchy policy |
| Framework blind spots | static parser misses runtime wiring | framework adapters + human review |

27.2 Debugging Checklist

When output quality is poor, inspect in this order:

  1. Was the doc intent correct?
  2. Was the target audience correct?
  3. Did the context packet include the right symbols?
  4. Were tests included as evidence?
  5. Did the prompt separate facts from inference?
  6. Did the model receive stale generated docs as source?
  7. Were private/internal symbols overrepresented?
  8. Was the output template appropriate?
  9. Did validation catch unsupported claims?
  10. Did reviewers have enough evidence to correct it?

28. Practical 20-Hour Drill

Hour 1–2: Pick a Small Service

Choose one service with:

  • 5–20 public entrypoints,
  • meaningful tests,
  • existing but imperfect docs,
  • clear owner.

Hour 3–5: Build Symbol Inventory

Extract:

  • files,
  • public classes/functions,
  • signatures,
  • comments,
  • test names,
  • owners.

Output JSON.
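For a first pass at the inventory, a sketch using Python's built-in `ast` module is shown here. It assumes the chosen service is written in Python and treats a leading underscore as the marker for internal symbols. For another language, a parser for that language would take its place. `symbol_inventory` is a hypothetical name, and test names and owners would be merged in from other sources.

```python
import ast
import json
from typing import Any, Dict, List


def symbol_inventory(source: str, filename: str) -> List[Dict[str, Any]]:
    """List top-level public functions and classes in one Python module."""
    tree = ast.parse(source, filename=filename)
    symbols: List[Dict[str, Any]] = []
    for node in tree.body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        if node.name.startswith("_"):
            continue  # assumed convention: leading underscore means internal
        entry: Dict[str, Any] = {
            "file": filename,
            "name": node.name,
            "kind": "class" if isinstance(node, ast.ClassDef) else "function",
            "line": node.lineno,
            "doc": ast.get_docstring(node),
        }
        if entry["kind"] == "function":
            # Positional parameter names only; enough for a first inventory.
            entry["params"] = [arg.arg for arg in node.args.args]
        symbols.append(entry)
    return symbols


if __name__ == "__main__":
    example = 'def submit(order, key):\n    """Submit an order."""\n'
    print(json.dumps(symbol_inventory(example, "payments.py"), indent=2))
```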

Hour 6–8: Generate Module Overview

Build one context packet and generate one module overview.

Include evidence table and missing information.

Hour 9–11: Add Validation

Add:

  • markdown lint,
  • prose lint,
  • link check,
  • snippet test if examples exist.

Hour 12–14: Add Drift Detection

Track hashes for evidence snippets.

Mark docs stale when evidence changes.
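A minimal sketch of hash-based drift detection, assuming each generated claim stores the fingerprint of the evidence snippet it was derived from. The functions `fingerprint` and `stale_claims` and the claim-id scheme are hypothetical.

```python
import hashlib
from typing import Dict, List


def fingerprint(snippet: str) -> str:
    """Hash an evidence snippet, ignoring trailing whitespace on each line."""
    normalized = "\n".join(line.rstrip() for line in snippet.strip().splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def stale_claims(recorded: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return ids of claims whose evidence changed or disappeared.

    recorded: {claim_id: fingerprint stored when the doc was generated}
    current:  {claim_id: evidence text extracted from the current code}
    """
    stale = []
    for claim_id, old_hash in recorded.items():
        snippet = current.get(claim_id)
        if snippet is None or fingerprint(snippet) != old_hash:
            stale.append(claim_id)
    return sorted(stale)
```

Normalizing whitespace before hashing avoids flagging docs as stale after a pure formatting change. Any semantic change to the snippet still produces a new hash.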

Hour 15–17: Add PR Workflow

Generate a docs branch or patch.

Ask the code owner to review.

Measure edit distance.

Hour 18–20: Improve One Weak Point

Pick the biggest weakness:

  • bad context,
  • weak prompt,
  • missing tests,
  • noisy output,
  • bad template,
  • broken example.

Improve only that.

This is Kaufman's loop: practice, feedback, correction.


29. Final Mental Model

Code-to-docs is not about making AI explain code faster.

It is about building a documentation supply chain from source evidence to reviewed knowledge.

The system should preserve these invariants:

  1. Extract deterministic facts before generating narrative.
  2. Keep public documentation separate from internal implementation details.
  3. Attach evidence to generated claims.
  4. Mark inference as inference.
  5. Test examples before publishing.
  6. Detect drift when source evidence changes.
  7. Route high-risk docs to human owners.
  8. Avoid using unreviewed AI output as source truth.
  9. Optimize for reader tasks, not symbol count.
  10. Treat documentation as part of the engineering system.

If these invariants hold, AI becomes a multiplier for documentation quality.

If they do not, AI becomes a high-throughput stale-doc generator.


30. References


31. What Comes Next

Part 022 applies the same system thinking to API documentation with OpenAPI.

The key shift is:

Code-to-docs starts from implementation evidence. OpenAPI documentation starts from an explicit HTTP contract.

That changes the source-of-truth model, validation strategy, governance workflow, and AI usage pattern.
