
Learn Mintlify Like Ai Docs Cli Part 033 Provenance Citations And Traceability


title: "Build From Scratch: Mintlify-like AI-driven Documentation Generator CLI - Part 033"
description: "Designing provenance, citations, and traceability for an AI-driven documentation generator: source refs, evidence refs, claim mapping, generated block metadata, citations UI, trace store, review audit, stale detection, and trust model."
series: learn-mintlify-like-ai-docs-cli
seriesTitle: "Build From Scratch: Mintlify-like AI-driven Documentation Generator CLI"
order: 33
partTitle: Provenance, Citations, and Traceability
tags:
  - documentation
  - ai
  - cli
  - provenance
  - citations
  - traceability
  - developer-tools
date: 2026-07-03

Part 033 — Provenance, Citations, and Traceability

In Parts 031 and 032, we built writer and reviewer agents grounded in evidence.

Now we design the layer that makes the whole pipeline trustworthy:

provenance, citations, and traceability

Without provenance, AI-generated docs are just text that looks convincing.

With provenance, every important claim can answer:

which file does it come from?
which lines?
which OpenAPI pointer?
which config schema field?
which test?
which command artifact?
when was it last verified?
what is the source hash?
has the source changed since the docs were generated?
who or what produced this block?
is this block safe to auto-update?

Provenance is what separates "AI wrote docs" from an "AI-assisted documentation compiler".

1. Mental model: provenance is a supply chain for knowledge

In a software build, we care about artifact lineage:

source -> compile -> bundle -> deploy

In a docs generator, we need the same lineage for knowledge:

source fact -> evidence item -> generated claim -> content block -> MDX page -> static HTML/search/llms.txt

Traceability means we can move in both directions:

forward: a source changes → which docs are affected?
backward: a docs claim → which source supports it?

2. Why provenance is non-negotiable

An AI docs generator without provenance will fail in enterprise and production settings.

Problems without provenance:

Problem	Consequence
Claim cannot be checked	Reviewer has to trust the model
Source changes	Stale docs go undetected
AI hallucination	Hard to prove or isolate
Manual edits mixed with generated content	Updates can overwrite human work
API docs generated from an old spec	Users copy incorrect requests
Code sample has unknown origin	Sample is hard to verify
Audit/security review is difficult	No lineage
Search/llms export is not traceable	Agents use facts without sources

Provenance is not a nice-to-have feature. It is the trust foundation.

3. Provenance vocabulary

We use the following terms.

Term	Meaning
Source artifact	The original file or source object: code, OpenAPI, schema, docs, test
Source ref	Precise pointer to a part of the source
Evidence item	Curated context sent to the AI/generator
Claim	A factual statement in the docs
Block provenance	Source refs that support a content block
Page provenance	Combined provenance of all blocks on a page
Trace	Metadata about the generation/review/build process
Citation	User-facing reference to source/evidence
Stale marker	Indication that a source hash has changed since the docs were generated

4. Source artifact identity

From Parts 018 and 022:

export type SourceArtifact = {
  id: ArtifactId;
  path: string;
  kind: SourceArtifactKind;
  language?: LanguageId;
  hash: string;
  sizeBytes: number;
  generated: boolean;
  vendored: boolean;
  sensitive: SensitivityLevel;
};

Artifact identity is initially path-based:

artifact:<sha256(normalized-project-relative-path)>

Hash is content-based:

sha256(file bytes)

Traceability needs both:

path ID for stable references,
content hash for stale detection.
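The two identities above can be sketched as follows. This is a minimal sketch, assuming Node's built-in crypto module; the helper names normalizeProjectPath, artifactIdForPath, and contentHash are hypothetical and not part of the earlier parts.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: normalize a project-relative path so the same file
// yields the same ID regardless of OS path separators or a leading "./".
function normalizeProjectPath(path: string): string {
  return path.replace(/\\/g, "/").replace(/^\.\//, "").replace(/\/+/g, "/");
}

// Path-based identity: stable as long as the file stays at the same path.
function artifactIdForPath(path: string): string {
  const digest = createHash("sha256").update(normalizeProjectPath(path)).digest("hex");
  return "artifact:" + digest;
}

// Content-based hash: changes whenever the file bytes change.
function contentHash(bytes: Uint8Array): string {
  return "sha256:" + createHash("sha256").update(bytes).digest("hex");
}

const idA = artifactIdForPath("./src/commands/build.ts");
const idB = artifactIdForPath("src\\commands\\build.ts");
const before = contentHash(new TextEncoder().encode("export const a = 1;"));
const after = contentHash(new TextEncoder().encode("export const a = 2;"));

console.log(idA === idB);      // true: same file, same ID
console.log(before === after); // false: content changed, hash changed
```

The ID survives edits to the file, while the hash survives nothing, which is exactly why stale detection needs both.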

5. SourceRef model

SourceRef points to a precise source location.

export type SourceRef = {
  artifactId: ArtifactId;
  path: string;
  kind: SourceRefKind;
  range?: SourceRange;
  selector?: string;
  hash: string;
  label?: string;
};

export type SourceRefKind =
  | "file"
  | "lineRange"
  | "symbol"
  | "openapiOperation"
  | "openapiSchema"
  | "jsonPointer"
  | "configField"
  | "cliCommand"
  | "test"
  | "example"
  | "generatedArtifact";

export type SourceRange = {
  startLine: number;
  startColumn?: number;
  endLine: number;
  endColumn?: number;
};

Examples:

Code symbol

{
  "artifactId": "artifact:src-commands-build",
  "path": "src/commands/build.ts",
  "kind": "symbol",
  "selector": "src/commands/build.ts#buildCommand",
  "range": { "startLine": 12, "endLine": 48 },
  "hash": "sha256:abc..."
}

OpenAPI operation

{
  "artifactId": "artifact:openapi-public",
  "path": "openapi/public.yaml",
  "kind": "openapiOperation",
  "selector": "#/paths/~1users/post",
  "hash": "sha256:def..."
}

Config field

{
  "artifactId": "artifact:config-schema",
  "path": "src/config/schema.ts",
  "kind": "configField",
  "selector": "build.outputDir",
  "range": { "startLine": 32, "endLine": 39 },
  "hash": "sha256:ghi..."
}

6. Selector design

A selector should be stable and readable enough for a human to debug.

Selector examples:

Source	Selector
OpenAPI operation	`#/paths/~1users/post`
JSON Schema field	`#/properties/build/properties/outputDir`
TypeScript symbol	`src/build.ts#buildSite`
Java symbol	`com.acme.UserResource.createUser`
CLI command	`cli:docforge build`
Config field	`config:build.outputDir`
Test	`test:build command fails on invalid MDX`
MDX heading	`docs/quickstart.mdx#install`

Selectors do not replace line ranges. Use both if possible.
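The OpenAPI selectors above are JSON Pointers, where "/" inside a path segment is escaped as "~1" and "~" as "~0" (RFC 6901). A minimal sketch of building one, with the hypothetical helper names escapePointerToken and openapiOperationSelector:

```typescript
// Escape one JSON Pointer reference token (RFC 6901).
// "~" must be escaped first, otherwise the "~1" produced for "/" would be re-escaped.
function escapePointerToken(token: string): string {
  return token.replace(/~/g, "~0").replace(/\//g, "~1");
}

// Hypothetical helper: selector for an OpenAPI operation under "paths".
function openapiOperationSelector(path: string, method: string): string {
  return "#/paths/" + escapePointerToken(path) + "/" + method.toLowerCase();
}

console.log(openapiOperationSelector("/users", "POST"));       // "#/paths/~1users/post"
console.log(openapiOperationSelector("/users/search", "get")); // "#/paths/~1users~1search/get"
```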

7. EvidenceItem model

Evidence is what the AI or generator receives.

export type EvidenceItem = {
  id: EvidenceId;
  kind: EvidenceKind;
  title: string;
  content: string;
  sourceRefs: SourceRef[];
  confidence: Confidence;
  sensitivity: SensitivityLevel;
  freshness: EvidenceFreshness;
  metadata?: Record<string, unknown>;
};

export type EvidenceKind =
  | "openapiOperation"
  | "openapiSchema"
  | "cliCommand"
  | "configField"
  | "codeSymbol"
  | "test"
  | "example"
  | "existingDoc"
  | "diagnostic"
  | "searchChunk"
  | "manualNote";

export type EvidenceFreshness = {
  sourceHash: string;
  indexedAt: string;
  stale: boolean;
};

Evidence IDs are stable within a job:

ev_cli_build
ev_config_build_output_dir
ev_openapi_create_user

An ID could include a hash for global uniqueness, but short readable IDs keep prompts legible.

8. Evidence pack provenance

An evidence pack is a set of evidence items plus selection trace.

export type EvidencePack = {
  id: string;
  objective: string;
  items: EvidenceItem[];
  retrievalTrace: RetrievalTrace;
  createdAt: string;
};

export type RetrievalTrace = {
  query: string;
  seeds: RetrievalSeed[];
  stages: RetrievalStageTrace[];
  filtersApplied: string[];
  tokenBudget: number;
};

export type RetrievalStageTrace = {
  stage: "exact" | "keyword" | "semantic" | "graph" | "rerank" | "compression";
  inputCount: number;
  outputCount: number;
  notes?: string[];
};

The trace answers one question:

Why was this evidence selected?

This is useful when the writer hallucinates because retrieval was poor.

9. Claim model

A claim is a factual assertion in docs.

export type Claim = {
  id: ClaimId;
  blockId: string;
  text: string;
  evidenceIds: EvidenceId[];
  sourceRefs: SourceRef[];
  supportStatus: ClaimSupportStatus;
  confidence: Confidence;
};

export type ClaimSupportStatus =
  | "supported"
  | "partiallySupported"
  | "unsupported"
  | "contradicted"
  | "notChecked";

Claims can be extracted from draft blocks.

Stored claim mapping helps:

review,
trace UI,
stale detection,
coverage,
fact-check eval.

10. Block provenance

Every content block should know its source refs.

export type BlockProvenance = {
  blockId: string;
  generatedBy: GenerationSource;
  evidenceIds: EvidenceId[];
  sourceRefs: SourceRef[];
  claims: Claim[];
  sourceHashAggregate: string;
  lastVerifiedAt: string;
  verificationStatus: VerificationStatus;
};

export type GenerationSource =
  | { type: "human" }
  | { type: "deterministic"; generator: string; version: string }
  | { type: "ai"; jobId: string; promptContractVersion: string; model: string }
  | { type: "hybrid"; sources: GenerationSource[] };

export type VerificationStatus =
  | "verified"
  | "needsReview"
  | "stale"
  | "unverified"
  | "failed";

This allows block-level update, not just page-level.

11. Page provenance

export type PageProvenance = {
  pageId: PageId;
  route: RoutePath;
  sourcePath: string;
  generated: boolean;
  owner: "human" | "generated" | "hybrid";
  sourceRefs: SourceRef[];
  blockProvenance: BlockProvenance[];
  generatedAt?: string;
  lastVerifiedAt?: string;
  sourceHashAggregate: string;
  verificationStatus: VerificationStatus;
};

Source hash aggregate:

export function aggregateSourceHashes(sourceRefs: SourceRef[]): string {
  const hashes = sourceRefs
    .map((ref) => `${ref.path}:${ref.selector ?? ""}:${ref.hash}`)
    .sort()
    .join("\n");

  return sha256(hashes);
}

If the aggregate changes, at least one supporting source ref changed, so the page may be stale.
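To see the two properties the aggregate relies on, here is a small self-contained sketch. It assumes a sha256 helper built on Node's crypto module and a reduced ref type (RefForHash, a hypothetical name) that carries only the fields the aggregate reads.

```typescript
import { createHash } from "node:crypto";

// Reduced SourceRef for this sketch: only the fields that feed the aggregate.
type RefForHash = { path: string; selector?: string; hash: string };

const sha256 = (s: string): string =>
  "sha256:" + createHash("sha256").update(s).digest("hex");

// Same logic as aggregateSourceHashes: sort first so ref order never matters.
function aggregate(refs: RefForHash[]): string {
  return sha256(
    refs.map((r) => `${r.path}:${r.selector ?? ""}:${r.hash}`).sort().join("\n")
  );
}

const buildRef: RefForHash = {
  path: "src/commands/build.ts",
  selector: "src/commands/build.ts#buildCommand",
  hash: "sha256:abc",
};
const schemaRef: RefForHash = {
  path: "src/config/schema.ts",
  selector: "build.outputDir",
  hash: "sha256:ghi",
};

const stored = aggregate([buildRef, schemaRef]);
const reordered = aggregate([schemaRef, buildRef]);
const afterEdit = aggregate([buildRef, { ...schemaRef, hash: "sha256:changed" }]);

console.log(stored === reordered); // true: ref order does not matter
console.log(stored === afterEdit); // false: one source hash changed
```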

12. Generated block metadata

When writing MDX, embed managed region metadata.

Example comment markers:

{/* docforge:begin block id="build-options" owner="generated" hash="sha256:abc" */}
## Build options

...
{/* docforge:end block id="build-options" */}

But raw comment metadata can become noisy.

Alternative sidecar file:

docs/reference/cli-build.mdx
docs/reference/cli-build.mdx.docforge.json

Sidecar:

{
  "pageId": "reference-cli-build",
  "blocks": [
    {
      "id": "build-options",
      "owner": "generated",
      "contentHash": "sha256:...",
      "sourceHashAggregate": "sha256:...",
      "evidenceIds": ["ev_cli_build"]
    }
  ]
}

Recommended: use sidecar for rich metadata, minimal inline markers for managed regions.

13. Inline markers vs sidecar metadata

Approach	Pros	Cons
Inline markers	survive file movement, visible	noisy in docs source
Sidecar	clean MDX, rich metadata	can drift from source
Hybrid	best practical choice	more implementation

Hybrid:

inline markers identify managed regions,
sidecar stores provenance details.

Inline:

{/* docforge:begin id="build-options" */}
...
{/* docforge:end id="build-options" */}

Sidecar stores hash/evidence/claims.

14. Managed region model

export type ManagedRegion = {
  id: string;
  owner: "generated" | "human" | "hybrid";
  startOffset?: number;
  endOffset?: number;
  startLine?: number;
  endLine?: number;
  contentHash: string;
  sourceHashAggregate: string;
  updatePolicy: "auto" | "reviewRequired" | "manualOnly";
};

During update:

parse MDX,
locate managed regions,
verify content hash,
update only if owner/policy allows,
if human edited generated region, switch to review.
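The update steps above can be sketched as two small functions. This is a minimal sketch under stated assumptions: markers use the minimal inline form with exact spacing and sit on their own lines, and the names findManagedRegions, decideUpdate, and UpdateDecision are hypothetical. A real implementation would walk the MDX AST instead of using a regular expression.

```typescript
import { createHash } from "node:crypto";

type UpdatePolicy = "auto" | "reviewRequired" | "manualOnly";
type FoundRegion = { id: string; content: string };
type UpdateDecision = "apply" | "review" | "skip";

const sha256 = (s: string): string =>
  "sha256:" + createHash("sha256").update(s).digest("hex");

// Locate regions delimited by:
//   {/* docforge:begin id="x" */} ... {/* docforge:end id="x" */}
function findManagedRegions(mdx: string): FoundRegion[] {
  const pattern =
    /\{\/\* docforge:begin id="([^"]+)" \*\/\}\n([\s\S]*?)\n\{\/\* docforge:end id="\1" \*\/\}/g;
  const found: FoundRegion[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(mdx)) !== null) {
    found.push({ id: match[1], content: match[2] });
  }
  return found;
}

// Decide what the updater may do with one region.
function decideUpdate(
  currentContent: string,
  recordedContentHash: string,
  policy: UpdatePolicy
): UpdateDecision {
  if (policy === "manualOnly") return "skip";
  // Content no longer matches what the generator wrote: a human edited it.
  if (sha256(currentContent) !== recordedContentHash) return "review";
  return policy === "auto" ? "apply" : "review";
}

const mdx = [
  "# Build",
  '{/* docforge:begin id="build-options" */}',
  "## Build options",
  "",
  "Placeholder generated text.",
  '{/* docforge:end id="build-options" */}',
].join("\n");

const regions = findManagedRegions(mdx);
const recorded = sha256(regions[0].content);
console.log(regions[0].id);                                            // "build-options"
console.log(decideUpdate(regions[0].content, recorded, "auto"));       // "apply"
console.log(decideUpdate(regions[0].content + "!", recorded, "auto")); // "review"
```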

15. Human edit detection

If the content hash of a generated region differs from the hash recorded at the last generation, the region was edited outside the generator, most likely by a human.

export function detectHumanEditedRegion(
  currentContent: string,
  region: ManagedRegion
): boolean {
  return sha256(currentContent) !== region.contentHash;
}

Policy:

Region	If edited
generated auto	mark conflict/review
hybrid	preserve human subregions
human	never overwrite
manualOnly	never overwrite

Diagnostic:

warning provenance.region.humanEdited
Generated region "build-options" was modified manually. Automatic update requires review.

16. Citation model

Citations are user-facing references.

export type Citation = {
  id: string;
  label: string;
  sourceRef: SourceRef;
  displayMode: "hidden" | "inline" | "footnote" | "debug" | "sourceLink";
};

Not every docs site should show source code citations to public users.

Modes:

Mode	Behavior
hidden	provenance stored but not displayed
debug	visible in local/dev/review mode
footnote	citations shown at bottom
inline	cite icon next to claims/sections
sourceLink	link to GitHub/source if allowed

For public external docs, hidden or sourceLink may be best. For internal engineering docs, footnote/debug can be powerful.

17. Citation visibility policy

export type CitationPolicy = {
  mode: "hidden" | "debug" | "footnote" | "inline" | "sourceLink";
  exposeSourcePaths: boolean;
  exposeLineNumbers: boolean;
  exposePrivateSources: boolean;
  sourceBaseUrl?: string;
};

Public docs:

{
  "citations": {
    "mode": "hidden",
    "exposeSourcePaths": false
  }
}

Internal docs:

{
  "citations": {
    "mode": "debug",
    "exposeSourcePaths": true,
    "exposeLineNumbers": true,
    "sourceBaseUrl": "https://github.com/acme/project/blob/main"
  }
}

Never expose private/internal paths in public docs unless configured.
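A minimal sketch of how a renderer might apply this policy when producing citation text. The reduced types, the isPrivate flag (standing in for a sensitivity label), the kindLabel field, and the citationText function are all hypothetical names for illustration.

```typescript
// Reduced policy for this sketch; `mode` is left as a plain string on purpose.
type PolicyForText = {
  mode: string;
  exposeSourcePaths: boolean;
  exposeLineNumbers: boolean;
  exposePrivateSources: boolean;
};

// Reduced source ref for this sketch.
type CitableRef = {
  path: string;
  kindLabel: string; // e.g. "OpenAPI spec", "config schema"
  range?: { startLine: number; endLine: number };
  isPrivate: boolean;
};

function citationText(ref: CitableRef, policy: PolicyForText): string | undefined {
  if (policy.mode === "hidden") return undefined;
  if (ref.isPrivate && !policy.exposePrivateSources) return undefined;
  // Paths are off: fall back to a coarse label that reveals no repo layout.
  if (!policy.exposeSourcePaths) return `Generated from ${ref.kindLabel}`;
  const lines =
    policy.exposeLineNumbers && ref.range
      ? `:${ref.range.startLine}-${ref.range.endLine}`
      : "";
  return `Source: ${ref.path}${lines}`;
}

const buildRef: CitableRef = {
  path: "src/commands/build.ts",
  kindLabel: "CLI command metadata",
  range: { startLine: 12, endLine: 48 },
  isPrivate: false,
};

const publicPolicy: PolicyForText = {
  mode: "footnote",
  exposeSourcePaths: false,
  exposeLineNumbers: false,
  exposePrivateSources: false,
};
const internalPolicy: PolicyForText = {
  mode: "debug",
  exposeSourcePaths: true,
  exposeLineNumbers: true,
  exposePrivateSources: false,
};

console.log(citationText(buildRef, publicPolicy));   // "Generated from CLI command metadata"
console.log(citationText(buildRef, internalPolicy)); // "Source: src/commands/build.ts:12-48"
```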

18. Source links

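If a repo base URL is configured:

export function sourceUrlForRef(ref: SourceRef, policy: CitationPolicy): string | undefined {
  if (!policy.sourceBaseUrl) return undefined;
  if (!policy.exposeSourcePaths) return undefined;

  // Encode each path segment separately: encodeURI leaves "#" and "?" unescaped,
  // which would cut the path short or start a query string.
  const encodedPath = ref.path.split("/").map(encodeURIComponent).join("/");
  let url = `${policy.sourceBaseUrl.replace(/\/+$/, "")}/${encodedPath}`;

  if (policy.exposeLineNumbers && ref.range) {
    // GitHub-style line anchor, e.g. #L12-L48.
    url += `#L${ref.range.startLine}-L${ref.range.endLine}`;
  }

  return url;
}

Do not generate source URLs for sensitive evidence. sourceUrlForRef only checks the citation policy, so callers must filter out sensitive refs before calling it.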

19. Citation rendering

Inline debug citation:

<SourceCitation citationId="src-build-command" />

Footnote:

<SourceFootnotes refs={...} />

The final MDX should not embed large provenance JSON. It can reference the sidecar manifest instead.

Component contract:

export type SourceCitationProps = {
  citationId: string;
};

The renderer resolves the citation from page provenance.

20. Trace store

Provenance is about content. Trace is about process.

Trace types:

export type TraceRecord =
  | RetrievalTraceRecord
  | PlannerTraceRecord
  | WriterTraceRecord
  | ReviewTraceRecord
  | BuildTraceRecord
  | PatchTraceRecord;

Common:

export type BaseTraceRecord = {
  id: string;
  type: string;
  jobId: string;
  createdAt: string;
  toolVersion: string;
  inputHash: string;
  outputHash: string;
  diagnostics: Diagnostic[];
};

Store traces in the knowledge store or under .docforge/traces.

21. Generation trace

export type GenerationTrace = {
  jobId: string;
  pageId: string;
  plannerTraceId?: string;
  retrievalTraceId: string;
  writerTraceId?: string;
  reviewTraceId?: string;
  modelCalls: ModelCallTrace[];
  finalVerdict: "applied" | "reviewRequired" | "failed";
};

Model call trace:

export type ModelCallTrace = {
  id: string;
  provider: string;
  model: string;
  promptContractId: string;
  promptContractVersion: string;
  inputTokenEstimate?: number;
  outputTokenEstimate?: number;
  costEstimate?: number;
  promptHash: string;
  outputHash: string;
  storedPrompt?: boolean;
  storedOutput?: boolean;
};

Do not store full prompts if the privacy policy disallows it. Store only their hashes; that is what the storedPrompt and storedOutput flags record.

22. Provenance in knowledge store

Tables from Part 022 can be extended.

block_provenance

CREATE TABLE block_provenance (
  id TEXT PRIMARY KEY,
  page_id TEXT NOT NULL,
  block_id TEXT NOT NULL,
  owner TEXT NOT NULL,
  generation_source_json TEXT NOT NULL,
  evidence_ids_json TEXT NOT NULL,
  source_hash_aggregate TEXT NOT NULL,
  content_hash TEXT NOT NULL,
  verification_status TEXT NOT NULL,
  last_verified_at TEXT,
  metadata_json TEXT
);

CREATE INDEX idx_block_provenance_page ON block_provenance(page_id);
CREATE INDEX idx_block_provenance_block ON block_provenance(page_id, block_id);
CREATE INDEX idx_block_provenance_status ON block_provenance(verification_status);

claim_mappings

CREATE TABLE claim_mappings (
  id TEXT PRIMARY KEY,
  page_id TEXT NOT NULL,
  block_id TEXT NOT NULL,
  claim_text TEXT NOT NULL,
  support_status TEXT NOT NULL,
  confidence TEXT NOT NULL,
  evidence_ids_json TEXT NOT NULL,
  source_refs_json TEXT NOT NULL,
  last_checked_at TEXT
);

CREATE INDEX idx_claim_mappings_page ON claim_mappings(page_id);
CREATE INDEX idx_claim_mappings_block ON claim_mappings(page_id, block_id);
CREATE INDEX idx_claim_mappings_status ON claim_mappings(support_status);

23. Provenance sidecar schema

For portable docs source:

export type PageProvenanceSidecar = {
  schemaVersion: "page-provenance/v1";
  pageId: PageId;
  route: RoutePath;
  sourcePath: string;
  owner: "human" | "generated" | "hybrid";
  contentHash: string;
  sourceHashAggregate: string;
  blocks: BlockProvenanceSidecar[];
};

export type BlockProvenanceSidecar = {
  blockId: string;
  contentHash: string;
  owner: "human" | "generated" | "hybrid";
  updatePolicy: "auto" | "reviewRequired" | "manualOnly";
  evidenceIds: EvidenceId[];
  sourceRefs: SourceRef[];
  claims: Array<{
    claimId: string;
    textHash: string;
    supportStatus: ClaimSupportStatus;
  }>;
};

Do not store long claim text in the sidecar if it is not needed there; store a hash in the sidecar and the full text in the knowledge store.

24. Freshness checking

To check if a block is stale:

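// Result shape for this check. Section 37 generalizes it into StaleStatus with multiple reasons.
export type FreshnessStatus =
  | { status: "fresh" }
  | { status: "stale"; reason: "sourceMissing" | "sourceHashChanged"; sourceRef: SourceRef };

export function checkBlockFreshness(
  block: BlockProvenance,
  currentSourceHashes: Map<ArtifactId, string>
): FreshnessStatus {
  // Returns on the first stale ref; later refs are not inspected.
  for (const ref of block.sourceRefs) {
    const currentHash = currentSourceHashes.get(ref.artifactId);

    if (!currentHash) {
      return {
        status: "stale",
        reason: "sourceMissing",
        sourceRef: ref,
      };
    }

    if (currentHash !== ref.hash) {
      return {
        status: "stale",
        reason: "sourceHashChanged",
        sourceRef: ref,
      };
    }
  }

  return { status: "fresh" };
}

This check is conservative: a changed file hash does not always mean the referenced symbol changed. Later we can compare symbol-level hashes.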

25. Symbol-level hash

File hash can be too broad. Better:

export type SymbolSnapshot = {
  symbolId: SymbolId;
  signatureHash: string;
  bodyHash?: string;
  docCommentHash?: string;
  range: SourceRange;
};

For docs claims about signature/options, signature hash matters more than body hash.

Config field snapshot:

export type ConfigFieldSnapshot = {
  fieldId: string;
  typeHash: string;
  defaultHash: string;
  descriptionHash?: string;
};

OpenAPI operation snapshot:

export type OperationSnapshot = {
  operationKey: OperationKey;
  operationHash: string;
  requestHash: string;
  responseHash: string;
  parameterHash: string;
  securityHash: string;
};

Block provenance can reference semantic snapshot hash.

26. Source hash granularity

Source type	Good hash granularity
File	content hash
Symbol	signature/doc comment/body hash
CLI command	command/options hash
Config field	type/default/description hash
OpenAPI operation	normalized operation hash
Schema	normalized schema hash
Example	code hash
Test	test body/name hash

Use semantic hashes for precise stale detection.
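As an illustration of semantic granularity, the sketch below compares two snapshots of the same symbol and reports which facet changed. The names SymbolSnapshotLite, diffSymbolSnapshot, and signatureDocsAreStale are hypothetical, and the rule that signature-focused docs ignore body changes is an assumption drawn from the remark about signature hashes in section 25.

```typescript
// Reduced snapshot for this sketch (no source range).
type SymbolSnapshotLite = {
  symbolId: string;
  signatureHash: string;
  bodyHash?: string;
  docCommentHash?: string;
};

type SymbolChange = "signature" | "body" | "docComment";

// Report which facets differ between the stored and the current snapshot.
function diffSymbolSnapshot(
  previous: SymbolSnapshotLite,
  current: SymbolSnapshotLite
): SymbolChange[] {
  const changes: SymbolChange[] = [];
  if (previous.signatureHash !== current.signatureHash) changes.push("signature");
  if (previous.bodyHash !== current.bodyHash) changes.push("body");
  if (previous.docCommentHash !== current.docCommentHash) changes.push("docComment");
  return changes;
}

// Assumed rule: docs that describe only the signature/options go stale
// on signature or doc comment changes, not on body-only refactors.
function signatureDocsAreStale(changes: SymbolChange[]): boolean {
  return changes.some((c) => c === "signature" || c === "docComment");
}

const stored: SymbolSnapshotLite = {
  symbolId: "src/build.ts#buildSite",
  signatureHash: "sha256:sig1",
  bodyHash: "sha256:body1",
};
const refactored: SymbolSnapshotLite = { ...stored, bodyHash: "sha256:body2" };
const newSignature: SymbolSnapshotLite = { ...stored, signatureHash: "sha256:sig2" };

console.log(signatureDocsAreStale(diffSymbolSnapshot(stored, refactored)));   // false
console.log(signatureDocsAreStale(diffSymbolSnapshot(stored, newSignature))); // true
```

A file-level hash would have flagged both cases; the semantic comparison flags only the one that can invalidate the docs.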

27. Normalized operation hash

export function hashNormalizedOperation(operation: NormalizedOperation): string {
  return sha256(stableJson({
    operationId: operation.operationId,
    method: operation.method,
    path: operation.path,
    summary: operation.summary,
    description: operation.description,
    parameters: operation.parameters,
    requestBody: operation.requestBody,
    responses: operation.responses,
    security: operation.security,
    deprecated: operation.deprecated,
  }));
}

Source location is deliberately left out of the hash: an operation that only moved within the spec, with the same contract, keeps the same hash.

28. Stable JSON

Hashing requires stable serialization.

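export function stableJson(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(stableJson).join(",")}]`;
  }

  if (value && typeof value === "object") {
    const entries = Object.entries(value as Record<string, unknown>)
      .filter(([, v]) => v !== undefined)
      // Compare by code unit, not localeCompare: locale-aware ordering can differ
      // between machines and would make the hash unstable.
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));

    return `{${entries.map(([k, v]) => `${JSON.stringify(k)}:${stableJson(v)}`).join(",")}}`;
  }

  // JSON.stringify(undefined) returns undefined, not a string; emit null,
  // as JSON.stringify itself does for undefined array slots.
  return JSON.stringify(value) ?? "null";
}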

29. Traceability queries

Useful CLI queries:

docforge trace page /reference/cli-build
docforge trace claim --page /reference/cli-build --block build-options
docforge trace source src/commands/build.ts
docforge trace stale

Page trace

Page: /reference/cli-build
Source: docs/reference/cli-build.mdx
Owner: hybrid
Status: verified

Blocks:
- build-command-overview
  generated by AI writer job_123
  evidence: ev_cli_build
  source: src/commands/build.ts:12-48
  status: verified

- build-options
  generated by deterministic cliReference v1.0.0
  source: cli:docforge build
  status: stale

Source trace

Source: src/commands/build.ts

Documents:
- /reference/cli-build
  blocks: build-command-overview, build-options
- /guides/build-docs
  blocks: run-build

30. Provenance UI in review mode

In local dev/review mode, show source links.

UI pattern:

[Source: src/commands/build.ts:12-48]

or icon.

Click opens:

local editor link,
GitHub source link,
source excerpt,
evidence item.

This helps reviewers quickly verify.

31. Evidence excerpt rendering

Do not show entire source file.

Show excerpt:

export type EvidenceExcerpt = {
  evidenceId: EvidenceId;
  title: string;
  excerpt: string;
  sourceRefs: SourceRef[];
};

Keep the excerpt length bounded.

If the source sensitivity is internal and the page is public, do not render the excerpt.

32. Public citations for generated docs

Some public docs may prefer a coarse citation instead of code line links, for example:

Generated from OpenAPI spec.

Policy:

API docs can show "Generated from OpenAPI",
config reference can show "Generated from config schema",
CLI reference can show "Generated from CLI command metadata",
avoid exposing repo paths.

Example public footer:

<GeneratedFrom label="OpenAPI" />

Internal mode can show exact file/pointer.

33. Provenance for deterministic generators

Deterministic generators should produce provenance too.

Example config reference generator:

export function generateConfigFieldRow(field: ConfigFieldArtifact): DraftTableRow {
  return {
    field: supportedText(`\`${field.path}\``, [field.evidenceId]),
    type: supportedText(`\`${field.schemaType}\``, [field.evidenceId]),
    default: supportedText(renderDefault(field.defaultValue), [field.evidenceId]),
    description: supportedText(field.description ?? "", [field.evidenceId]),
  };
}

Even without AI in the loop, every generated cell carries provenance.

34. Provenance for code samples

Code samples derive from:

operation,
request example/schema,
auth scheme,
SDK mapping.

export type CodeSampleProvenance = {
  operationRef: SourceRef;
  requestExampleRefs: SourceRef[];
  schemaRefs: SourceRef[];
  sdkMappingRef?: SourceRef;
  generator: {
    id: string;
    version: string;
  };
};

If a code sample becomes stale because the operation's request body changed, regenerate the sample.

35. Provenance for search chunks

Search chunks should know page/block source.

export type SearchChunkProvenance = {
  chunkId: string;
  pageId: PageId;
  blockIds: string[];
  sourceRefs: SourceRef[];
};

This enables:

search result "generated from OpenAPI",
debug bad search result,
remove stale chunks,
answer agent queries with citations.

36. Provenance for `llms.txt`

llms.txt is an export. It should carry trace metadata internally, though that metadata does not need to be user-facing.

export type LlmsExportRecord = {
  sourcePageId: PageId;
  sourceBlockIds: string[];
  sourceHashAggregate: string;
  exportedAt: string;
};

If the source page is stale, the llms.txt entry derived from it is stale too.

37. Stale status model

export type StaleStatus =
  | { status: "fresh" }
  | { status: "stale"; reasons: StaleReason[] }
  | { status: "unknown"; reason: string };

export type StaleReason =
  | { type: "sourceHashChanged"; sourceRef: SourceRef; previousHash: string; currentHash: string }
  | { type: "sourceMissing"; sourceRef: SourceRef }
  | { type: "evidenceMissing"; evidenceId: EvidenceId }
  | { type: "generatorVersionChanged"; previous: string; current: string }
  | { type: "promptContractChanged"; previous: string; current: string }
  | { type: "reviewExpired"; lastVerifiedAt: string };

Docs can be stale because source changed or generator/prompt changed.

38. Verification expiry

Some docs should be re-reviewed periodically.

export type VerificationPolicy = {
  maxAgeDays?: number;
  requireReviewAfterGeneratorChange: boolean;
  requireReviewAfterPromptChange: boolean;
};

Example:

API reference: reverify on spec hash change.
Security docs: reverify after 30 days or source change.
Quickstart: reverify after command/config changes.
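The age-based part of this policy can be sketched as a single check. This is a minimal sketch; isReviewExpired is a hypothetical name, and treating an unparseable timestamp as expired is an assumption chosen to fail safe.

```typescript
type VerificationPolicy = {
  maxAgeDays?: number;
  requireReviewAfterGeneratorChange: boolean;
  requireReviewAfterPromptChange: boolean;
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True when the last verification is older than the policy allows.
function isReviewExpired(
  lastVerifiedAt: string,
  policy: VerificationPolicy,
  now: Date
): boolean {
  if (policy.maxAgeDays === undefined) return false; // no age limit configured
  const verifiedMs = Date.parse(lastVerifiedAt);
  if (isNaN(verifiedMs)) return true; // unparseable timestamp: treat as expired
  return now.getTime() - verifiedMs > policy.maxAgeDays * MS_PER_DAY;
}

const securityDocsPolicy: VerificationPolicy = {
  maxAgeDays: 30,
  requireReviewAfterGeneratorChange: true,
  requireReviewAfterPromptChange: true,
};

// 32 days between verification and "now": past the 30-day limit.
console.log(
  isReviewExpired("2026-06-01T00:00:00Z", securityDocsPolicy, new Date("2026-07-03T00:00:00Z"))
); // true

// 19 days: still within the limit.
console.log(
  isReviewExpired("2026-06-01T00:00:00Z", securityDocsPolicy, new Date("2026-06-20T00:00:00Z"))
); // false
```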

39. Provenance report

Command:

docforge provenance report

Output:

Provenance report:

Pages: 128
Verified: 117
Stale: 8
Unverified: 3

Top stale reasons:
- OpenAPI operation changed: 4
- CLI command options changed: 2
- Config field defaults changed: 2

Pages missing provenance:
- /guides/legacy-deployment
- /concepts/architecture-old

This tells the team where the trust gaps are.

40. Missing provenance diagnostics

warning provenance.page.missing
Page /guides/legacy-deployment has no provenance metadata.

warning provenance.block.unverified
Generated block "advanced-options" has no verification record.

error provenance.generated.noSource
Generated block "api-request-body" has no source refs.

A generated formal reference block without source refs is an error, not a warning.

41. Provenance during import of existing docs

Existing docs may not have provenance.

Import options:

mark as human/unverified,
infer links to semantic artifacts,
ask AI/reviewer to map claims to evidence,
gradually add provenance.

Do not pretend imported docs are verified.

owner: "human"
verificationStatus: "unverified"

Later, docforge verify can attempt the mapping.

42. Claim-to-source backfill

For existing docs, we can run claim mapping.

Pipeline:

This is expensive and should be optional.

43. Provenance and manual notes

Sometimes a human adds a manual source note.

Example frontmatter:

docforge:
  sources:
    - type: manualNote
      label: "Engineering decision in ADR-004"
      path: "docs/adr/004-openapi-first.mdx"

The manual note becomes evidence with human provenance.

44. Trust levels

Not all provenance is equal.

export type TrustLevel =
  | "formalContract"
  | "code"
  | "test"
  | "officialExample"
  | "existingDoc"
  | "manualNote"
  | "aiInferred";

Ranking:

formal contract,
code,
tests,
official examples,
existing docs,
manual notes,
AI inferred.

The reviewer uses the trust level when weighing evidence.

45. Provenance conflict detection

Evidence may conflict.

Example:

config schema says default true,
README says default false.

Conflict model:

export type EvidenceConflict = {
  id: string;
  claimKey: string;
  evidenceA: EvidenceId;
  evidenceB: EvidenceId;
  description: string;
  severity: "warning" | "error";
};

Resolution:

prefer higher trust level,
emit diagnostic,
require human review if conflict affects docs.
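The first resolution rule, preferring the higher trust level, can be sketched with the ranking from section 44. RankedEvidence and preferHigherTrust are hypothetical names, and classifying the config schema as formalContract is an assumption for the example.

```typescript
type TrustLevel =
  | "formalContract"
  | "code"
  | "test"
  | "officialExample"
  | "existingDoc"
  | "manualNote"
  | "aiInferred";

// Highest trust first, in the ranking order given in section 44.
const TRUST_ORDER: TrustLevel[] = [
  "formalContract",
  "code",
  "test",
  "officialExample",
  "existingDoc",
  "manualNote",
  "aiInferred",
];

type RankedEvidence = { evidenceId: string; trust: TrustLevel; value: string };

// Pick the evidence with the higher trust level.
// `tie` is true when both sides rank equally, so trust alone cannot decide.
function preferHigherTrust(
  a: RankedEvidence,
  b: RankedEvidence
): { preferred: RankedEvidence; tie: boolean } {
  const rankA = TRUST_ORDER.indexOf(a.trust);
  const rankB = TRUST_ORDER.indexOf(b.trust);
  return { preferred: rankA <= rankB ? a : b, tie: rankA === rankB };
}

const fromSchema: RankedEvidence = {
  evidenceId: "ev_config_search_enabled",
  trust: "formalContract", // assumed classification for a config schema
  value: "true",
};
const fromReadme: RankedEvidence = {
  evidenceId: "ev_readme_search",
  trust: "existingDoc",
  value: "false",
};

const resolution = preferHigherTrust(fromReadme, fromSchema);
console.log(resolution.preferred.evidenceId); // "ev_config_search_enabled"
console.log(resolution.tie);                  // false
```

Trust ranking only picks a default. The diagnostic and the human review step still apply whenever the conflict affects published docs.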

46. Conflict diagnostic

warning provenance.evidence.conflict
Configuration field search.enabled has conflicting defaults.

- Schema: true
- Existing docs: false

Preferred source: schema
Action: update existing docs or verify intended default.

This is one of the most valuable outputs of the system.

47. Provenance and route/page ownership

Page ownership:

export type PageOwnership = {
  owner: "human" | "generated" | "hybrid";
  updatePolicy: "auto" | "reviewRequired" | "manualOnly";
  protectedRegions: string[];
};

Generated API reference:

owner: generated
updatePolicy: auto

Human guide:

owner: human
updatePolicy: manualOnly

Hybrid CLI guide:

owner: hybrid
updatePolicy: reviewRequired

Ownership affects diff-aware updates.

48. Security of provenance data

Provenance can leak:

internal file paths,
symbol names,
comments,
private APIs,
generated prompts,
source excerpts.

Policies:

Do not deploy .docforge store.
Do not include sidecars in public build unless configured.
Redact sensitive source refs from public citations.
Do not store full prompts by default.
Do not send provenance of secret files to AI.
Use sensitivity labels.

Public static output should include only safe provenance metadata.

49. Provenance build artifact policy

Build output should copy:

HTML,
JS/CSS assets,
search index,
llms.txt,
sitemap,
public provenance if configured.

It should not copy:

knowledge store,
trace files,
full prompts,
private evidence,
source excerpts,
local absolute paths.

Add build check:

error build.output.privateProvenanceLeak
Public build output contains internal provenance file .docforge/index/docforge.sqlite.
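A build check like this can be sketched as a scan over the output file list. This is a minimal sketch; the deny-list patterns and the findPrivateProvenanceLeaks name are assumptions derived from the "should not copy" list above, not an existing docforge API.

```typescript
// Paths that must never reach public output (assumed patterns).
const ALWAYS_PRIVATE: RegExp[] = [
  /(^|\/)\.docforge(\/|$)/, // knowledge store, traces, stored prompts
  /\.sqlite$/,              // any stray database file
];

// Provenance sidecars are private unless the build is configured to publish them.
const SIDECAR = /\.docforge\.json$/;

function findPrivateProvenanceLeaks(
  outputPaths: string[],
  allowSidecars = false
): string[] {
  return outputPaths
    .map((p) => p.replace(/\\/g, "/")) // normalize Windows separators
    .filter(
      (p) =>
        ALWAYS_PRIVATE.some((re) => re.test(p)) ||
        (!allowSidecars && SIDECAR.test(p))
    );
}

const outputFiles = [
  "index.html",
  "assets/app.js",
  "llms.txt",
  ".docforge/index/docforge.sqlite",
  "reference/cli-build.mdx.docforge.json",
];

const leaks = findPrivateProvenanceLeaks(outputFiles);
console.log(leaks);
// [".docforge/index/docforge.sqlite", "reference/cli-build.mdx.docforge.json"]
```

Each returned path would map to one build.output.privateProvenanceLeak error.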

50. Provenance tests

50.1 SourceRef mapping

it("maps OpenAPI operation to source ref", () => {
  const ref = sourceRefForOperation(operation);

  expect(ref.selector).toBe("#/paths/~1users/post");
  expect(ref.kind).toBe("openapiOperation");
});

50.2 Block provenance

it("creates block provenance from evidence IDs", () => {
  const provenance = blockProvenanceFromDraftBlock(block, evidenceMap);

  expect(provenance.sourceRefs).toHaveLength(1);
  expect(provenance.evidenceIds).toEqual(["ev_cli_build"]);
});

50.3 Stale detection

it("marks block stale when source hash changes", () => {
  const status = checkBlockFreshness(block, new Map([
    [artifactId, "sha256:new"],
  ]));

  expect(status.status).toBe("stale");
});

50.4 Public citation policy

it("does not expose source path when policy disables paths", () => {
  const citation = renderCitation(ref, { exposeSourcePaths: false, mode: "footnote" });

  expect(citation).not.toContain("src/commands/build.ts");
});

51. Provenance CLI commands

docforge provenance report
docforge trace page /quickstart
docforge trace source src/commands/build.ts
docforge trace claim --page /quickstart --block install
docforge stale
docforge verify --page /quickstart

docforge stale output:

Stale documentation:

/reference/cli-build
  - block build-options
    reason: CLI command options changed
    source: src/commands/build.ts

/api-reference/users/create-user
  - block api-operation
    reason: OpenAPI operation changed
    source: openapi/public.yaml#/paths/~1users/post

52. Integration with reviewer

Reviewer updates claim support.

export function applyFactCheckReportToProvenance(
  pageProvenance: PageProvenance,
  report: FactCheckReport
): PageProvenance {
  // update claim support status
  return pageProvenance;
}

If reviewer says unsupported:

block verificationStatus = failed,
page verificationStatus = needsReview or failed,
auto-apply blocked.

53. Integration with diff-aware updates

Part 034 builds on provenance.

When source changes:

source hash changed -> find sourceRefs -> find blocks -> mark stale -> generate targeted patch

Without provenance, an update has to rewrite far more than what changed.

With provenance, an update touches only the impacted blocks.
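The "find sourceRefs -> find blocks" step is a reverse index from source path to blocks. A minimal sketch, with the hypothetical names BlockRefs, buildSourceIndex, and impactedBlocks, using the pages from the source trace in section 29 as sample data:

```typescript
// Reduced block record for this sketch: where the block lives and which sources it cites.
type BlockRefs = { pageRoute: string; blockId: string; sourcePaths: string[] };

// Build the forward-traceability index: source path -> blocks that depend on it.
function buildSourceIndex(blocks: BlockRefs[]): Map<string, BlockRefs[]> {
  const index = new Map<string, BlockRefs[]>();
  for (const block of blocks) {
    for (const path of block.sourcePaths) {
      const list = index.get(path) ?? [];
      list.push(block);
      index.set(path, list);
    }
  }
  return index;
}

// Given changed source paths, return the "route#blockId" keys to mark stale.
function impactedBlocks(
  index: Map<string, BlockRefs[]>,
  changedPaths: string[]
): string[] {
  const hit = new Set<string>();
  for (const path of changedPaths) {
    for (const block of index.get(path) ?? []) {
      hit.add(`${block.pageRoute}#${block.blockId}`);
    }
  }
  return Array.from(hit).sort();
}

const sourceIndex = buildSourceIndex([
  { pageRoute: "/reference/cli-build", blockId: "build-command-overview", sourcePaths: ["src/commands/build.ts"] },
  { pageRoute: "/reference/cli-build", blockId: "build-options", sourcePaths: ["src/commands/build.ts"] },
  { pageRoute: "/guides/build-docs", blockId: "run-build", sourcePaths: ["src/commands/build.ts"] },
  { pageRoute: "/api-reference/users/create-user", blockId: "api-operation", sourcePaths: ["openapi/public.yaml"] },
]);

console.log(impactedBlocks(sourceIndex, ["src/commands/build.ts"]));
// ["/guides/build-docs#run-build",
//  "/reference/cli-build#build-command-overview",
//  "/reference/cli-build#build-options"]
```

The API reference block is untouched because none of its source paths changed, which is the targeted-update behavior described above.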

54. Minimal implementation milestone

First version:

define SourceRef,
define EvidenceItem,
map evidence to source refs,
attach block provenance to Content IR,
create page provenance sidecar,
detect stale by source hash,
add docforge trace page,
add docforge stale,
hide citations by default,
validate generated blocks have source refs.

Second version:

claim-level mapping,
citation UI,
source links,
provenance report,
semantic hash granularity,
conflict detection,
provenance backfill for existing docs,
review trace integration,
public/private citation policies,
GitHub PR source links.

55. Failure modes

Failure	Cause	Prevention
Docs claim cannot be verified	no claim/source mapping	block provenance and evidence IDs
Stale docs not detected	only page text stored	source refs and hashes
Manual edits overwritten	no region ownership	managed regions and content hash
Public build leaks paths	citations exposed by default	citation visibility policy
AI cites fake source	no evidence ID validation	evidence ID validator
OpenAPI change rewrites all docs	no block-level provenance	block source refs
Conflicting evidence hidden	no conflict detection	trust levels and conflict diagnostics
Review not auditable	no traces	generation/review trace store
Sidecar drifts from MDX	no content hash	content hash validation
Search/llms stale	no export provenance	export records and source hash aggregate

56. Key takeaways

Provenance is the trust infrastructure of AI-driven documentation.

Strong provenance design:

tracks source refs precisely,
maps evidence to claims,
stores block/page provenance,
separates inline markers from sidecar metadata,
detects stale content by hashes,
supports citations without leaking private data,
records generation/review traces,
protects human edits,
powers targeted updates,
and makes AI-generated docs auditable.

Next, we build on this to implement diff-aware documentation updates.
