Learn Mintlify-like AI Docs CLI, Part 023: OpenAPI Ingestion and Validation


title: Build From Scratch: Mintlify-like AI-driven Documentation Generator CLI - Part 023
description: Building an OpenAPI ingestion and validation pipeline for a documentation generator: local/remote specs, parsing YAML/JSON, bundling refs, normalization, semantic validation, style rules, security checks, provenance, diagnostics, cache, and code/spec consistency.
series: learn-mintlify-like-ai-docs-cli
seriesTitle: Build From Scratch: Mintlify-like AI-driven Documentation Generator CLI
order: 23
partTitle: OpenAPI Ingestion and Validation
tags:

  • documentation
  • ai
  • cli
  • openapi
  • api-reference
  • validation
  • developer-tools

date: 2026-07-03

Part 023 — OpenAPI Ingestion and Validation

We already have a code index and a knowledge store.

Now we move on to one of the most important source artifacts for developer documentation: OpenAPI.

OpenAPI is the formal contract for an HTTP API. In a production-grade documentation generator, OpenAPI must not be treated as "just another YAML file". It must be treated as a high-authority source artifact.

If an OpenAPI spec is available, the API reference must take its formal facts from it:

  • operation method,
  • path,
  • operation ID,
  • tags,
  • summary,
  • description,
  • parameters,
  • request body,
  • response status,
  • response schema,
  • authentication/security,
  • examples,
  • servers,
  • deprecation,
  • schema definitions.

AI may help explain, group, or write guides. But AI must not invent endpoint details when OpenAPI already provides them.


1. Mental model: OpenAPI adalah formal API contract

In our docs system, sources carry different levels of authority.

OpenAPI is the strongest source for the public contract. Code route discovery is strong implementation evidence. Tests are strong behavior evidence. Existing docs are strong for explanation, but they can be stale.

Rule:

The formal API reference must be generated from normalized OpenAPI, not from AI prose.


2. Goals

The OpenAPI ingestion pipeline must:

  1. find the spec files,
  2. read YAML/JSON,
  3. support multiple specs,
  4. validate the OpenAPI structure,
  5. resolve $ref,
  6. normalize version and shape,
  7. produce an operation registry,
  8. produce a schema registry,
  9. record provenance,
  10. emit actionable diagnostics,
  11. detect conflicts with code discovery,
  12. produce stable input for API page generation.

3. Input sources

Specs can come from several locations:

{
  "openapi": {
    "specs": [
      {
        "id": "public",
        "path": "openapi/public.yaml"
      },
      {
        "id": "admin",
        "path": "openapi/admin.json"
      }
    ]
  }
}

Potential source types:

| Source | Example | Default policy |
| --- | --- | --- |
| Local file | openapi.yaml | Allowed |
| Local glob | openapi/*.yaml | Allowed if configured |
| URL | https://example.com/openapi.json | Disabled in deterministic build unless configured |
| Generated command | npm run openapi:generate | Explicit only |
| Inline config | embedded spec object | Not recommended |
| Package artifact | generated OpenAPI in build output | Allowed if path safe |

Prefer local committed specs for deterministic docs.


4. OpenAPI config model

export type OpenApiConfig = {
  specs: OpenApiSpecConfig[];
  validation: OpenApiValidationConfig;
  generation: OpenApiGenerationConfig;
};

export type OpenApiSpecConfig = {
  id: string;
  path?: string;
  url?: string;
  title?: string;
  visibility?: "public" | "internal" | "admin";
  baseRoute?: string;
  includeTags?: string[];
  excludeTags?: string[];
  includeOperations?: string[];
  excludeOperations?: string[];
};

export type OpenApiValidationConfig = {
  strict: boolean;
  failOnWarnings: boolean;
  allowRemoteRefs: boolean;
  requireOperationId: boolean;
  requireSummary: boolean;
  requireTags: boolean;
};

export type OpenApiGenerationConfig = {
  groupBy: "tag" | "path" | "operation" | "resource";
  routePrefix: string;
};

Example:

{
  "openapi": {
    "specs": [
      {
        "id": "public",
        "path": "openapi/public.yaml",
        "baseRoute": "/api-reference"
      }
    ],
    "validation": {
      "strict": true,
      "allowRemoteRefs": false,
      "requireOperationId": true,
      "requireSummary": true,
      "requireTags": true
    },
    "generation": {
      "groupBy": "tag",
      "routePrefix": "/api-reference"
    }
  }
}

5. Discovery of specs

If config does not explicitly declare specs, we can infer common paths.

Common candidates:

openapi.yaml
openapi.yml
openapi.json
swagger.yaml
swagger.yml
swagger.json
api/openapi.yaml
docs/openapi.yaml
spec/openapi.yaml

But inferred discovery should be conservative.

export async function discoverOpenApiSpecs(projectRoot: string): Promise<DiscoveredOpenApiSpec[]> {
  const candidates = [
    "openapi.yaml",
    "openapi.yml",
    "openapi.json",
    "swagger.yaml",
    "swagger.yml",
    "swagger.json",
    "api/openapi.yaml",
    "docs/openapi.yaml",
    "spec/openapi.yaml",
  ];

  const found: DiscoveredOpenApiSpec[] = [];

  for (const relative of candidates) {
    const absolute = path.join(projectRoot, relative);
    if (await exists(absolute)) {
      found.push({
        id: inferSpecId(relative),
        path: relative,
        confidence: "medium",
      });
    }
  }

  return found;
}

If multiple candidates are found and no specs are configured, emit a diagnostic and require explicit config.
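A minimal sketch of that decision, assuming the `DiscoveredOpenApiSpec` shape implied by the discovery function above. The `DiscoveryDecision` type and the diagnostic code `openapi.discovery.ambiguous` are hypothetical names used only for illustration.

```typescript
// Assumed shape, mirroring the objects pushed by discoverOpenApiSpecs.
export type DiscoveredOpenApiSpec = {
  id: string;
  path: string;
  confidence: "low" | "medium" | "high";
};

// Hypothetical result type for the "conservative discovery" rule.
export type DiscoveryDecision =
  | { kind: "none" }
  | { kind: "use"; spec: DiscoveredOpenApiSpec }
  | { kind: "ambiguous"; code: string; message: string; candidates: string[] };

// Accept an inferred spec only when exactly one candidate exists.
export function decideDiscoveredSpec(found: DiscoveredOpenApiSpec[]): DiscoveryDecision {
  if (found.length === 0) return { kind: "none" };

  const only = found[0];
  if (found.length === 1 && only !== undefined) return { kind: "use", spec: only };

  const candidates = found.map((spec) => spec.path);
  return {
    kind: "ambiguous",
    code: "openapi.discovery.ambiguous", // hypothetical diagnostic code
    message: `Found ${found.length} OpenAPI candidates (${candidates.join(", ")}). Declare openapi.specs explicitly.`,
    candidates,
  };
}
```

The caller can turn the `ambiguous` branch into an error diagnostic and stop, rather than guessing which file is the public contract.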


6. Spec identity

Every spec needs a stable ID.

export type OpenApiSpecId = string & { readonly brand: unique symbol };

Rules:

  • config id preferred,
  • fallback from filename,
  • lowercase kebab-case,
  • unique across project.

Diagnostic:

error openapi.spec.duplicateId
Multiple OpenAPI specs use id "public".
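The rules above can be sketched as two small helpers. `inferSpecId` is referenced by the discovery code but not shown in this lesson, so the body below is an assumption; `findDuplicateSpecIds` and the `"spec"` fallback value are likewise illustrative.

```typescript
// Derive a lowercase kebab-case id from a file path such as "openapi/Public_API.yaml".
export function inferSpecId(relativePath: string): string {
  const fileName = relativePath.split("/").pop() ?? relativePath;
  const baseName = fileName.replace(/\.(ya?ml|json)$/i, "");
  const kebab = baseName
    .replace(/([a-z0-9])([A-Z])/g, "$1-$2") // split camelCase boundaries
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse any other run of characters into one dash
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
  return kebab.length > 0 ? kebab : "spec"; // assumed fallback for empty names
}

// Return every id that is declared more than once, in first-seen order.
export function findDuplicateSpecIds(ids: string[]): string[] {
  const counts = new Map<string, number>();
  for (const id of ids) counts.set(id, (counts.get(id) ?? 0) + 1);
  return ids.filter((id, index) => (counts.get(id) ?? 0) > 1 && ids.indexOf(id) === index);
}
```

Each id returned by `findDuplicateSpecIds` would map to one `openapi.spec.duplicateId` error.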

Spec ID is used for:

  • routes,
  • cache,
  • operation IDs,
  • provenance,
  • API registry,
  • graph nodes.

7. Parsing YAML/JSON

OpenAPI can be YAML or JSON.

Parser result:

export type OpenApiParseResult = {
  specId: OpenApiSpecId;
  raw: unknown;
  source: OpenApiSource;
  diagnostics: Diagnostic[];
};

export type OpenApiSource =
  | { type: "file"; path: string; hash: string }
  | { type: "url"; url: string; fetchedAt: string; hash: string };

Pseudo:

export async function parseOpenApiSpec(
  spec: OpenApiSpecConfig,
  ctx: OpenApiIngestionContext
): Promise<OpenApiParseResult> {
  const source = await loadOpenApiSource(spec, ctx);
  const text = source.text;

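  try {
    // Pick the parser from whichever location is configured (path or url).
    // YAML is the fallback because most JSON documents also parse as YAML.
    const location = (spec.path ?? spec.url ?? "").toLowerCase();
    const raw: unknown = location.endsWith(".json")
      ? JSON.parse(text)
      : parseYaml(text);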

    return {
      specId: spec.id as OpenApiSpecId,
      raw,
      source: source.metadata,
      diagnostics: [],
    };
  } catch (error) {
    return {
      specId: spec.id as OpenApiSpecId,
      raw: undefined,
      source: source.metadata,
      diagnostics: [normalizeOpenApiParseError(spec, error)],
    };
  }
}

YAML parser errors must include line/column if possible.
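When a parser reports only a character offset, the line and column can be derived from the source text. The helper below is a sketch; `offsetToPosition` and `TextPosition` are hypothetical names, and how the offset is obtained depends on the parser in use.

```typescript
export type TextPosition = { line: number; column: number };

// Convert a zero-based character offset into a one-based line/column pair.
export function offsetToPosition(text: string, offset: number): TextPosition {
  // Clamp so that out-of-range offsets still produce a usable location.
  const clamped = Math.max(0, Math.min(offset, text.length));
  let line = 1;
  let lineStart = 0;
  for (let i = 0; i < clamped; i++) {
    if (text.charCodeAt(i) === 10 /* "\n" */) {
      line += 1;
      lineStart = i + 1;
    }
  }
  return { line, column: clamped - lineStart + 1 };
}
```

The resulting position can be attached to the diagnostic produced by `normalizeOpenApiParseError`.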


8. Remote specs policy

Remote specs reduce determinism.

If allowed:

{
  "openapi": {
    "specs": [
      {
        "id": "public",
        "url": "https://api.example.com/openapi.json"
      }
    ],
    "validation": {
      "allowRemoteRefs": false
    }
  }
}

Policies:

| Context | Default |
| --- | --- |
| dev | allow if explicitly configured |
| build | allow only if explicitly configured |
| check | allow only if explicitly configured |
| CI deterministic mode | prefer disallow |
| offline mode | disallow |

Cache remote response by URL and ETag/hash if implemented.

Diagnostic:

error openapi.remote.disabled
Remote OpenAPI specs are disabled in this build.

Hint:
Use a local committed spec file or enable remote specs explicitly.
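The policy table above could be encoded as a single predicate. This is a sketch under assumptions: the `BuildContext` names and the `allowRemoteInCi` opt-in are hypothetical and not part of the config model shown earlier.

```typescript
export type BuildContext = "dev" | "build" | "check" | "ci-deterministic" | "offline";

export type RemotePolicyInput = {
  context: BuildContext;
  explicitlyConfigured: boolean; // the spec was declared with a url in config
  allowRemoteInCi?: boolean; // hypothetical extra opt-in for deterministic CI
};

export function isRemoteSpecAllowed(input: RemotePolicyInput): boolean {
  switch (input.context) {
    case "offline":
      return false; // never fetch when offline
    case "ci-deterministic":
      // "Prefer disallow": require a second, explicit opt-in.
      return input.explicitlyConfigured && input.allowRemoteInCi === true;
    case "dev":
    case "build":
    case "check":
      return input.explicitlyConfigured;
  }
}
```

When the predicate returns false, the loader emits `openapi.remote.disabled` instead of attempting the request.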

9. Basic OpenAPI shape validation

Before full validation, check basic shape.

export function validateOpenApiRoot(raw: unknown, source: OpenApiSource): Diagnostic[] {
  const diagnostics: Diagnostic[] = [];

  if (!isObject(raw)) {
    diagnostics.push({
      code: "openapi.root.notObject",
      severity: "error",
      category: "openapi",
      message: "OpenAPI document root must be an object.",
      location: sourceLocation(source),
    });
    return diagnostics;
  }

  if (!("openapi" in raw) && !("swagger" in raw)) {
    diagnostics.push({
      code: "openapi.root.missingVersion",
      severity: "error",
      category: "openapi",
      message: "OpenAPI document is missing an openapi version field.",
      location: sourceLocation(source),
    });
  }

  if (!("paths" in raw)) {
    diagnostics.push({
      code: "openapi.root.missingPaths",
      severity: "error",
      category: "openapi",
      message: "OpenAPI document is missing required paths object.",
      location: sourceLocation(source),
    });
  }

  return diagnostics;
}

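Swagger 2.0 support can be a future migration step. For this project, normalize OpenAPI 3.x first. Note that OpenAPI 3.1 makes paths optional (a 3.1 document may contain only components or webhooks), so the missing-paths check above should be limited to 3.0 documents or downgraded for 3.1.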


10. Validation levels

Validation has layers.

| Layer | Example | Severity |
| --- | --- | --- |
| Syntax parse | invalid YAML | error |
| Spec structure | missing paths | error |
| Reference resolution | $ref target missing | error |
| Semantic API quality | missing operationId | warning/error by config |
| Style quality | summary too vague | warning |
| Security | remote $ref not allowed | error |
| Docs generation readiness | operation cannot produce route | error/warning |

This separation matters because not every weak spec should block dev mode.
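One way to express the layering in code, as a sketch: configurable rules pick their severity from config, and the command decides whether to fail based on `failOnWarnings`. The `LeveledDiagnostic` type and both function names are hypothetical.

```typescript
export type Severity = "error" | "warning" | "info";
export type LeveledDiagnostic = { code: string; severity: Severity };

// Semantic rules take their severity from config, for example requireOperationId.
export function configurableSeverity(required: boolean): Severity {
  return required ? "error" : "warning";
}

// Errors always fail; warnings fail only when the config asks for it.
export function shouldFail(
  diagnostics: LeveledDiagnostic[],
  options: { failOnWarnings: boolean }
): boolean {
  return diagnostics.some(
    (d) => d.severity === "error" || (options.failOnWarnings && d.severity === "warning")
  );
}
```

With this split, a dev server can keep running on warnings while a strict CI check sets `failOnWarnings` to true.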


11. Reference resolution

OpenAPI uses $ref.

Examples:

schema:
  $ref: "#/components/schemas/User"

We need a resolver, governed by an explicit policy.

export type RefResolutionPolicy = {
  allowRemoteRefs: boolean;
  maxDepth: number;
  preserveRefMetadata: boolean;
};

export type ResolvedRef<T = unknown> = {
  value: T;
  ref: string;
  source: ProvenanceRef;
};

Resolver:

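export function resolveJsonPointer(root: unknown, pointer: string): unknown {
  // "#" alone refers to the whole document.
  if (pointer === "#") return root;

  if (!pointer.startsWith("#/")) {
    throw new Error(`Only local JSON pointers are supported: ${pointer}`);
  }

  // Decode URI fragment escapes first, then JSON Pointer escapes
  // ("~1" before "~0", as RFC 6901 requires).
  const parts = pointer
    .slice(2)
    .split("/")
    .map((part) => decodeURIComponent(part).replace(/~1/g, "/").replace(/~0/g, "~"));

  let current: unknown = root;

  for (const part of parts) {
    if (typeof current !== "object" || current === null) {
      throw new Error(`Cannot resolve ${pointer}`);
    }

    // Own-property check: a missing target fails loudly instead of returning
    // undefined, and inherited keys such as "constructor" never resolve.
    if (!Object.prototype.hasOwnProperty.call(current, part)) {
      throw new Error(`Cannot resolve ${pointer}: "${part}" not found`);
    }

    current = (current as Record<string, unknown>)[part];
  }

  return current;
}

The resolver also needs cycle detection.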


12. Circular refs

Schemas often have circular refs.

Example:

User:
  type: object
  properties:
    manager:
      $ref: "#/components/schemas/User"

The resolver must not expand such schemas infinitely.

Track the refs currently being resolved as a graph:

export type RefResolverState = {
  stack: string[];
  seen: Set<string>;
};

If ref repeats:

  • preserve $ref,
  • mark circular,
  • do not inline infinitely.

if (state.stack.includes(ref)) {
  return {
    type: "circularRef",
    ref,
  };
}

For rendering, schema viewer can show circular reference by name.
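A small sketch of cycle detection over `components.schemas`, assuming only local refs of the form `#/components/schemas/<Name>`. The function names are hypothetical. It reports the ref that closes each cycle and never expands a schema that is already on the stack.

```typescript
type JsonObject = { [key: string]: unknown };

function isJsonObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Collect every "$ref" string found anywhere inside a node.
function collectSchemaRefs(node: unknown, out: string[] = []): string[] {
  if (Array.isArray(node)) {
    for (const item of node) collectSchemaRefs(item, out);
  } else if (isJsonObject(node)) {
    const ref = node["$ref"];
    if (typeof ref === "string") out.push(ref);
    for (const key of Object.keys(node)) collectSchemaRefs(node[key], out);
  }
  return out;
}

// Depth-first walk over the ref graph; a ref already on the stack is circular.
export function findCircularSchemaRefs(schemas: JsonObject): string[] {
  const prefix = "#/components/schemas/";
  const circular = new Set<string>();
  const done = new Set<string>(); // refs that were fully explored

  const visit = (ref: string, stack: string[]): void => {
    if (stack.includes(ref)) {
      circular.add(ref); // preserve the ref and mark it instead of inlining
      return;
    }
    if (done.has(ref) || !ref.startsWith(prefix)) return;
    const name = ref.slice(prefix.length);
    const nextStack = [...stack, ref];
    for (const child of collectSchemaRefs(schemas[name])) visit(child, nextStack);
    done.add(ref);
  };

  for (const name of Object.keys(schemas)) visit(prefix + name, []);
  return Array.from(circular).sort();
}
```

Each returned ref can be surfaced as an `openapi.schema.circularRef` diagnostic and rendered by name in the schema viewer.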


13. Bundling vs dereferencing

Two strategies:

| Strategy | Meaning | Pros | Cons |
| --- | --- | --- | --- |
| Bundle | External refs become internal refs | preserves structure | still needs ref handling |
| Dereference | Replace refs with actual objects | easier traversal | cycles problematic, loses names |
| Hybrid | Normalize registry and keep refs | best for docs | more design work |

Recommended: hybrid registry.

Create registries (a first sketch; section 30 gives the shape the generator actually consumes):

export type OpenApiRegistry = {
  specs: Map<OpenApiSpecId, NormalizedOpenApiDocument>;
  operations: Map<OperationKey, NormalizedOperation>;
  schemas: Map<SchemaKey, NormalizedSchema>;
  securitySchemes: Map<string, NormalizedSecurityScheme>;
};

Do not blindly inline everything.


14. Normalized document model

export type NormalizedOpenApiDocument = {
  id: OpenApiSpecId;
  title: string;
  version?: string;
  description?: string;
  servers: NormalizedServer[];
  operations: NormalizedOperation[];
  schemas: NormalizedSchema[];
  securitySchemes: NormalizedSecurityScheme[];
  tags: NormalizedTag[];
  source: OpenApiSource;
  diagnostics: Diagnostic[];
};

Operation:

export type NormalizedOperation = {
  key: OperationKey;
  specId: OpenApiSpecId;
  operationId: string;
  method: HttpMethod;
  path: string;
  summary?: string;
  description?: string;
  tags: string[];
  deprecated: boolean;
  parameters: NormalizedParameter[];
  requestBody?: NormalizedRequestBody;
  responses: NormalizedResponse[];
  security: NormalizedSecurityRequirement[];
  servers: NormalizedServer[];
  examples: NormalizedExample[];
  source: ProvenanceRef;
};

Method:

export type HttpMethod =
  | "GET"
  | "POST"
  | "PUT"
  | "PATCH"
  | "DELETE"
  | "HEAD"
  | "OPTIONS"
  | "TRACE";

15. Operation key

Each operation needs a stable key.

export type OperationKey = string & { readonly brand: unique symbol };

export function operationKey(specId: string, method: string, path: string): OperationKey {
  return `${specId}:${method.toUpperCase()} ${normalizeApiPath(path)}` as OperationKey;
}

Do not rely only on operationId because specs can have missing/duplicate operation IDs.

But for generated page route, operationId is preferred if valid.


16. Operation ID validation

Operation ID is extremely useful for:

  • route generation,
  • code sample generation,
  • SDK mapping,
  • stable anchors,
  • links,
  • API components.

Rules:

  1. operationId should exist,
  2. operationId should be unique within spec,
  3. operationId should be slug-safe or mappable,
  4. operationId should not change casually.

Diagnostic:

{
  code: "openapi.operation.missingOperationId",
  severity: config.requireOperationId ? "error" : "warning",
  category: "openapi",
  message: `Operation ${method} ${path} is missing operationId.`,
  location: operationLocation,
  hint: "Add a stable operationId such as createUser or listUsers.",
}

Duplicate:

error openapi.operation.duplicateOperationId
operationId "createUser" is used by multiple operations.
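Duplicate detection can be sketched as a grouping pass. `OperationRef` is a hypothetical, reduced shape; in the real pipeline the input would be the operations extracted from one spec.

```typescript
export type OperationRef = { method: string; path: string; operationId?: string };

// Group operations by operationId and keep only ids claimed by more than one operation.
export function findDuplicateOperationIds(
  operations: OperationRef[]
): Map<string, OperationRef[]> {
  const byId = new Map<string, OperationRef[]>();
  for (const operation of operations) {
    if (operation.operationId === undefined) continue; // missing ids are a separate rule
    const group = byId.get(operation.operationId) ?? [];
    group.push(operation);
    byId.set(operation.operationId, group);
  }

  const duplicates = new Map<string, OperationRef[]>();
  byId.forEach((group, id) => {
    if (group.length > 1) duplicates.set(id, group);
  });
  return duplicates;
}
```

Keeping the whole group lets the diagnostic list every method and path that shares the id.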

17. Summary and description validation

API reference without summary is weak.

Rules:

  • summary should exist,
  • summary should be concise,
  • description can be longer,
  • summary should not repeat method/path only,
  • avoid "TODO".

Diagnostic:

warning openapi.operation.missingSummary
POST /users is missing summary.

If requireSummary is true, this becomes an error.


18. Tags validation

Tags drive navigation grouping.

Rules:

  • operation should have at least one tag,
  • tag should be declared in root tags if strict,
  • tag names should be stable,
  • avoid too many tags per operation.

Diagnostic:

warning openapi.operation.missingTags
Operation POST /users has no tags, so API navigation grouping may be poor.

When tags are missing, fall back to grouping by path prefix.


19. Parameter normalization

OpenAPI parameters can be defined at path or operation level.

Normalize combined parameters:

export type NormalizedParameter = {
  name: string;
  in: "path" | "query" | "header" | "cookie";
  required: boolean;
  description?: string;
  deprecated: boolean;
  schema?: NormalizedSchemaRef;
  examples: NormalizedExample[];
  source: ProvenanceRef;
};

Merge path-level + operation-level.

If operation-level parameter overrides same name + in, use operation-level.

Validation:

  • path params must be required,
  • every {id} in path should have path parameter,
  • no extra path parameter not in path template,
  • parameter should have schema,
  • parameter description recommended.
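The merge rule can be sketched with a map keyed by location and name. `RawParameter` is a hypothetical, reduced shape used only for this example.

```typescript
export type RawParameter = {
  name: string;
  in: "path" | "query" | "header" | "cookie";
  required?: boolean;
  description?: string;
};

// A parameter is identified by its location plus its name.
const parameterIdentity = (p: RawParameter): string => `${p.in}:${p.name}`;

export function mergeParameters(
  pathLevel: RawParameter[],
  operationLevel: RawParameter[]
): RawParameter[] {
  const merged = new Map<string, RawParameter>();
  for (const p of pathLevel) merged.set(parameterIdentity(p), p);
  // Operation-level entries replace path-level entries with the same identity.
  for (const p of operationLevel) merged.set(parameterIdentity(p), p);
  return Array.from(merged.values());
}
```

A query parameter and a path parameter that share a name stay separate, because the location is part of the identity.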

20. Path parameter validation

Path:

/users/{id}

Expected parameter:

parameters:
  - name: id
    in: path
    required: true

Check:

export function extractPathTemplateParams(path: string): string[] {
  // Braces are escaped so the pattern is also valid under the regex "u" flag.
  return [...path.matchAll(/\{([^}]+)\}/g)].map((m) => m[1]!);
}

Diagnostics:

error openapi.path.missingPathParameter
Path /users/{id} uses parameter {id}, but operation does not define path parameter "id".

warning openapi.path.unusedPathParameter
Operation defines path parameter "userId", but it is not present in path /users/{id}.
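Both diagnostics can come from one comparison between the template and the declared path parameters. This is a sketch; `checkPathParameters` and `PathParamDiagnostic` are hypothetical names, and the declared names are assumed to be the already-merged parameters with `in: path`.

```typescript
export type PathParamDiagnostic = {
  code: "openapi.path.missingPathParameter" | "openapi.path.unusedPathParameter";
  severity: "error" | "warning";
  message: string;
};

export function checkPathParameters(
  pathTemplate: string,
  declaredPathParams: string[]
): PathParamDiagnostic[] {
  // Collect every {name} placeholder used in the template.
  const used: string[] = [];
  const pattern = /\{([^}]+)\}/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(pathTemplate)) !== null) {
    const name = match[1];
    if (name !== undefined) used.push(name);
  }

  const diagnostics: PathParamDiagnostic[] = [];
  for (const name of used) {
    if (!declaredPathParams.includes(name)) {
      diagnostics.push({
        code: "openapi.path.missingPathParameter",
        severity: "error",
        message: `Path ${pathTemplate} uses parameter {${name}}, but operation does not define path parameter "${name}".`,
      });
    }
  }
  for (const name of declaredPathParams) {
    if (!used.includes(name)) {
      diagnostics.push({
        code: "openapi.path.unusedPathParameter",
        severity: "warning",
        message: `Operation defines path parameter "${name}", but it is not present in path ${pathTemplate}.`,
      });
    }
  }
  return diagnostics;
}
```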

21. Request body normalization

export type NormalizedRequestBody = {
  required: boolean;
  description?: string;
  content: NormalizedMediaType[];
  source: ProvenanceRef;
};

export type NormalizedMediaType = {
  mediaType: string;
  schema?: NormalizedSchemaRef;
  examples: NormalizedExample[];
};

Common media types:

  • application/json,
  • application/x-www-form-urlencoded,
  • multipart/form-data,
  • text/plain.

Validation:

  • write operations often should have request body, but not always,
  • request body content should have schema,
  • examples recommended for public API.

22. Response normalization

export type NormalizedResponse = {
  status: string;
  description: string;
  content: NormalizedMediaType[];
  headers: NormalizedHeader[];
  source: ProvenanceRef;
};

Rules:

  • responses object required,
  • each response should have description,
  • success response recommended,
  • error responses recommended,
  • response schemas recommended for JSON APIs.

Diagnostic:

warning openapi.response.missingSuccessResponse
POST /users has no 2xx response.

warning openapi.response.missingErrorResponses
POST /users has no 4xx error response.
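A sketch of the coverage check over the keys of a `responses` object. Response keys can be exact codes such as `201`, ranges such as `2XX`, or `default`. Treating `default` as error coverage is an assumption made here, not a rule from the spec, and the function name is hypothetical.

```typescript
export type ResponseDiagnosticCode =
  | "openapi.response.missingSuccessResponse"
  | "openapi.response.missingErrorResponses";

// True when the key is an exact code or a range in the given class, e.g. "201" or "2XX".
const matchesClass = (status: string, digit: "2" | "4"): boolean =>
  new RegExp(`^${digit}(\\d\\d|XX)$`, "i").test(status);

export function checkResponseCoverage(statusKeys: string[]): ResponseDiagnosticCode[] {
  const codes: ResponseDiagnosticCode[] = [];

  if (!statusKeys.some((s) => matchesClass(s, "2"))) {
    codes.push("openapi.response.missingSuccessResponse");
  }

  // Assumption: a "default" response is treated as covering error cases.
  const hasError = statusKeys.some((s) => matchesClass(s, "4") || s === "default");
  if (!hasError) codes.push("openapi.response.missingErrorResponses");

  return codes;
}
```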

23. Schema normalization

Schemas are complex. Start with a normalized model.

export type NormalizedSchema =
  | ObjectSchema
  | ArraySchema
  | PrimitiveSchema
  | EnumSchema
  | RefSchema
  | OneOfSchema
  | AnyOfSchema
  | AllOfSchema
  | UnknownSchema;

export type ObjectSchema = {
  kind: "object";
  name?: string;
  description?: string;
  required: string[];
  properties: NormalizedSchemaProperty[];
  additionalProperties?: boolean | NormalizedSchemaRef;
  source: ProvenanceRef;
};

export type NormalizedSchemaProperty = {
  name: string;
  required: boolean;
  deprecated: boolean;
  description?: string;
  schema: NormalizedSchemaRef;
};

Schema ref:

export type NormalizedSchemaRef = {
  ref?: string;
  name?: string;
  schema?: NormalizedSchema;
};

Keep schema viewer capable of lazy resolving.


24. OpenAPI 3.0 vs 3.1

OpenAPI 3.1 aligns more closely with JSON Schema than 3.0. But generation pipeline should normalize into internal schema model.

Potential differences:

  • nullable handling in 3.0,
  • JSON Schema dialect in 3.1,
  • type arrays in 3.1,
  • examples behavior,
  • schema keywords.

Strategy:

export type OpenApiVersion = "3.0" | "3.1" | "unknown";

Normalize:

  • 3.0 nullable: true → include nullability metadata,
  • 3.1 type: ["string", "null"] → nullable,
  • preserve original schema for advanced viewer.

Do not erase version-specific details. Store raw pointer/provenance.
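The two nullability notations can be folded into one internal shape. This is a sketch with hypothetical names (`NullabilityInfo`, `normalizeNullability`); it reads only `type` and `nullable` and leaves the original schema untouched.

```typescript
export type NullabilityInfo = { types: string[]; nullable: boolean };

type RawSchema = { type?: unknown; nullable?: unknown };

export function normalizeNullability(schema: RawSchema): NullabilityInfo {
  // 3.1 allows an array of type names; 3.0 uses a single string.
  const rawTypes: unknown[] = Array.isArray(schema.type)
    ? schema.type
    : schema.type === undefined
      ? []
      : [schema.type];
  const names = rawTypes.filter((t): t is string => typeof t === "string");

  return {
    types: names.filter((t) => t !== "null"),
    // 3.1 style: "null" in the type array. 3.0 style: nullable: true.
    nullable: names.includes("null") || schema.nullable === true,
  };
}
```

Both `{ type: "string", nullable: true }` and `{ type: ["string", "null"] }` normalize to the same result, so the renderer does not need to know which version it came from.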


25. Security schemes

Normalize security schemes.

export type NormalizedSecurityScheme =
  | {
      type: "http";
      scheme: string;
      bearerFormat?: string;
      description?: string;
    }
  | {
      type: "apiKey";
      name: string;
      in: "query" | "header" | "cookie";
      description?: string;
    }
  | {
      type: "oauth2";
      flows: unknown;
      description?: string;
    }
  | {
      type: "openIdConnect";
      openIdConnectUrl: string;
      description?: string;
    };

Operation security:

export type NormalizedSecurityRequirement = {
  schemeName: string;
  scopes: string[];
};

Docs should show auth requirements per operation.

Validation:

  • referenced security scheme exists,
  • operations with empty security are intentionally public, not accidentally unauthenticated,
  • auth scheme description recommended.

26. Servers

Servers can be declared at root, path, or operation level.

Normalize effective servers per operation.

export type NormalizedServer = {
  url: string;
  description?: string;
  variables: Record<string, {
    default: string;
    enum?: string[];
    description?: string;
  }>;
};

Docs can show:

  • base URL,
  • environment options,
  • server variables.

For generated API playground, server info matters.
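For a playground or a displayed base URL, server variables have to be substituted into the URL template. The sketch below assumes the `NormalizedServer` shape above; `expandServerUrl` and its `overrides` argument are hypothetical.

```typescript
export type ServerVariable = { default: string; enum?: string[]; description?: string };
export type ServerTemplate = { url: string; variables: Record<string, ServerVariable> };

// Replace each {name} in the URL with an override or the variable's default.
export function expandServerUrl(
  server: ServerTemplate,
  overrides: Record<string, string> = {}
): string {
  return server.url.replace(/\{([^}]+)\}/g, (placeholder: string, name: string) => {
    const variable = server.variables[name];
    if (variable === undefined) return placeholder; // leave unknown placeholders visible

    const value = overrides[name] ?? variable.default;
    if (variable.enum !== undefined && !variable.enum.includes(value)) {
      throw new Error(`Value "${value}" is not allowed for server variable "${name}".`);
    }
    return value;
  });
}
```

Leaving unknown placeholders in the output makes an undeclared variable easy to spot and report.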


27. Examples

OpenAPI examples appear in:

  • parameter example,
  • parameter examples,
  • request body media type example,
  • request body media type examples,
  • response media type example,
  • response media type examples,
  • schema example.

Normalize:

export type NormalizedExample = {
  name?: string;
  summary?: string;
  description?: string;
  value?: unknown;
  externalValue?: string;
  source: ProvenanceRef;
};

Do not fetch external examples unless configured.

If externalValue points to a remote URL, validation should emit a note.


28. Provenance for OpenAPI elements

Every normalized object should know where it came from.

OpenAPI provenance can use JSON pointer.

export type OpenApiProvenanceRef = {
  artifactId: ArtifactId;
  path: string;
  selector: string;
  hash: string;
  kind: "openapiOperation" | "openapiSchema" | "openapiParameter" | "openapiResponse";
};

Operation pointer:

openapi/public.yaml#/paths/~1users/post

Schema pointer:

openapi/public.yaml#/components/schemas/User

This is essential for:

  • diagnostics,
  • stale detection,
  • citations,
  • traceability,
  • PR comments.
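Selectors like the ones above can be built from path segments by escaping each token per RFC 6901. The helper names below are hypothetical.

```typescript
// Escape one reference token: "~" becomes "~0" first, then "/" becomes "~1".
export function escapePointerToken(token: string): string {
  return token.replace(/~/g, "~0").replace(/\//g, "~1");
}

// Join segments into a fragment-style selector such as "#/components/schemas/User".
export function toSelector(segments: string[]): string {
  return "#" + segments.map((segment) => "/" + escapePointerToken(segment)).join("");
}

// Selector for an operation, given its path template and HTTP method.
export function operationSelector(path: string, method: string): string {
  return toSelector(["paths", path, method.toLowerCase()]);
}
```

The order of the two replacements matters: escaping `/` first would turn the `~` it introduces into `~0` on the second pass.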

29. Diagnostics location for YAML

A JSON pointer is precise, but it does not give a line or column.

For better diagnostics:

  1. parse YAML with CST if possible,
  2. map JSON pointer to line/column,
  3. fallback to path + selector.

Diagnostic location:

{
  path: "openapi/public.yaml",
  selector: "#/paths/~1users/post/operationId"
}

Even without line, selector is actionable.


30. OpenAPI registry

After ingestion:

export type OpenApiRegistry = {
  documents: Map<OpenApiSpecId, NormalizedOpenApiDocument>;
  operationsByKey: Map<OperationKey, NormalizedOperation>;
  operationsById: Map<string, NormalizedOperation[]>;
  schemasByKey: Map<string, NormalizedSchema>;
  tags: Map<string, NormalizedTag>;
};

Build:

export function buildOpenApiRegistry(
  documents: NormalizedOpenApiDocument[]
): OpenApiRegistry {
  const registry = createEmptyOpenApiRegistry();

  for (const document of documents) {
    registry.documents.set(document.id, document);

    for (const operation of document.operations) {
      registry.operationsByKey.set(operation.key, operation);

      const byId = registry.operationsById.get(operation.operationId) ?? [];
      byId.push(operation);
      registry.operationsById.set(operation.operationId, byId);
    }

    for (const schema of document.schemas) {
      registry.schemasByKey.set(schemaKey(document.id, schema), schema);
    }
  }

  return registry;
}

API reference generator consumes registry, not raw OpenAPI object.


31. Store integration

Store OpenAPI as semantic artifacts.

Operation semantic artifact:

{
  type: "apiEndpoint",
  id: "openapi:public:createUser",
  sourceKind: "openapi",
  key: "POST /users",
  confidence: "high",
  payload: {
    specId: "public",
    operationId: "createUser",
    method: "POST",
    path: "/users",
    tags: ["Users"],
    deprecated: false
  }
}

A schema artifact might look like:

{
  type: "apiSchema",
  id: "openapi:public:schema:User",
  key: "User",
  payload: {
    specId: "public",
    name: "User"
  }
}

Graph edges:

openapi operation --usesSchema--> User
docPage --documents--> openapi operation
code route --matchesContract--> openapi operation

32. Code/spec consistency

From framework discovery, we may have code endpoint artifacts.

Match by:

  • method,
  • normalized path template,
  • operationId, if the code carries annotations,
  • visibility,
  • base path mapping.

Normalize path parameters:

/users/{id}

and:

/users/:id

both map to:

/users/{id}

Function:

export function normalizeEndpointPath(path: string): string {
  return path
    .replace(/:([A-Za-z_][A-Za-z0-9_]*)/g, "{$1}")
    .replace(/\/+/g, "/")
    .replace(/\/$/, "") || "/";
}

Consistency diagnostics:

| Case | Diagnostic |
| --- | --- |
| code route not in OpenAPI | warning |
| OpenAPI operation has no code handler | warning |
| method/path duplicated | warning/error |
| code path param mismatch | warning |
| internal route excluded | info/no diagnostic |
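The first two rows can be sketched as a set comparison over normalized method and path. `Endpoint` and `compareCodeAndSpec` are hypothetical names, and the path normalization repeats the rule described above so the example is self-contained.

```typescript
export type Endpoint = { method: string; path: string };

// ":id" becomes "{id}", repeated slashes collapse, trailing slash is dropped.
function normalizePath(path: string): string {
  const normalized = path
    .replace(/:([A-Za-z_][A-Za-z0-9_]*)/g, "{$1}")
    .replace(/\/+/g, "/")
    .replace(/\/$/, "");
  return normalized === "" ? "/" : normalized;
}

const endpointIdentity = (e: Endpoint): string =>
  `${e.method.toUpperCase()} ${normalizePath(e.path)}`;

export function compareCodeAndSpec(
  codeRoutes: Endpoint[],
  specOperations: Endpoint[]
): { undocumentedRoutes: string[]; unimplementedOperations: string[] } {
  const inCode = new Set(codeRoutes.map(endpointIdentity));
  const inSpec = new Set(specOperations.map(endpointIdentity));
  return {
    // Present in code but absent from OpenAPI.
    undocumentedRoutes: Array.from(inCode).filter((key) => !inSpec.has(key)),
    // Present in OpenAPI but no matching code handler.
    unimplementedOperations: Array.from(inSpec).filter((key) => !inCode.has(key)),
  };
}
```

Parameter names are compared literally in this sketch, so `/users/:id` and `/users/{userId}` show up in both lists; a fuller version would match on shape first and report that case as a path param mismatch.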

33. OpenAPI quality rules

Suggested rules:

| Rule | Code | Default |
| --- | --- | --- |
| Missing operationId | openapi.operation.missingOperationId | warning/error |
| Duplicate operationId | openapi.operation.duplicateOperationId | error |
| Missing summary | openapi.operation.missingSummary | warning |
| Missing tags | openapi.operation.missingTags | warning |
| Missing path parameter definition | openapi.path.missingPathParameter | error |
| Unused path parameter | openapi.path.unusedPathParameter | warning |
| Missing 2xx response | openapi.response.missingSuccessResponse | warning |
| Missing response description | openapi.response.missingDescription | error/warning |
| Missing request schema | openapi.request.missingSchema | warning |
| Unknown security scheme | openapi.security.unknownScheme | error |
| External ref disallowed | openapi.ref.remoteNotAllowed | error |
| Circular schema ref | openapi.schema.circularRef | info/warning |
| Empty schema object | openapi.schema.empty | warning |

34. Style rules

Style rules improve generated docs.

Examples:

  • summary should start with verb?
  • tag names should be Title Case?
  • operationId should be camelCase?
  • schemas should have descriptions?
  • error responses should use consistent error model?

These should be configurable, not hard-coded.

{
  "openapi": {
    "style": {
      "operationIdCase": "camelCase",
      "requireSchemaDescriptions": true,
      "requireErrorResponses": true
    }
  }
}

Avoid making subjective style rules fatal by default.


35. Security checks

OpenAPI ingestion can expose risks.

Checks:

  1. remote refs disallowed by default,
  2. external examples not fetched by default,
  3. server URLs should not include secrets,
  4. examples should not contain real tokens,
  5. API keys in examples should be redacted,
  6. internal/admin specs should not be published unless configured.

Secret-like example diagnostic:

error openapi.example.secretLike
OpenAPI example appears to contain a secret-like value.

Server URL diagnostic:

warning openapi.server.suspiciousCredential
Server URL appears to contain credentials.
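A sketch of how the two diagnostics above might be triggered. The patterns are illustrative heuristics chosen for this example, not an exhaustive or authoritative list, and both function names are hypothetical.

```typescript
// Illustrative heuristics only; real projects would tune and extend this list.
const SECRET_PATTERNS: RegExp[] = [
  /\bBearer\s+[A-Za-z0-9\-._~+\/]{20,}/, // long bearer token
  /\beyJ[A-Za-z0-9_-]{10,}\.[A-Za-z0-9_-]{10,}\./, // JWT-shaped value
  /\bAKIA[0-9A-Z]{16}\b/, // AWS access key id shape
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/, // PEM private key header
];

// Recursively scan an example value (string, array, or object) for secret-like strings.
export function looksSecretLike(value: unknown): boolean {
  if (typeof value === "string") return SECRET_PATTERNS.some((p) => p.test(value));
  if (Array.isArray(value)) return value.some((item) => looksSecretLike(item));
  if (typeof value === "object" && value !== null) {
    return Object.values(value).some((item) => looksSecretLike(item));
  }
  return false;
}

// Detects "scheme://user:password@host" style credentials in a server URL.
export function hasUrlCredentials(url: string): boolean {
  return /^[a-z][a-z0-9+.-]*:\/\/[^/?#@]*:[^/?#@]*@/i.test(url);
}
```

Placeholders such as `Bearer <token>` do not match, which keeps well-redacted examples free of false positives.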

36. Ingestion cache

Cache by:

  • spec source hash,
  • tool version,
  • parser/normalizer version,
  • validation config hash.

export type OpenApiCacheKey = {
  specId: string;
  sourceHash: string;
  toolVersion: string;
  normalizerVersion: string;
  validationConfigHash: string;
};

Cache value:

export type OpenApiCacheEntry = {
  key: OpenApiCacheKey;
  document: NormalizedOpenApiDocument;
  diagnostics: Diagnostic[];
};

Cache helps large specs.


37. Command surface

Commands:

docforge openapi check
docforge openapi inspect
docforge openapi list-operations
docforge openapi list-schemas
docforge openapi diff

Examples:

docforge openapi check openapi/public.yaml

Output:

OpenAPI check completed.

Spec: public
Operations: 128
Schemas: 42
Errors: 0
Warnings: 11

List operations:

POST /users      createUser      Users
GET  /users/{id} getUser         Users

38. OpenAPI diff

Useful for docs impact.

Diff:

export type OpenApiDiff = {
  addedOperations: NormalizedOperation[];
  removedOperations: NormalizedOperation[];
  changedOperations: Array<{
    before: NormalizedOperation;
    after: NormalizedOperation;
    changes: OpenApiOperationChange[];
  }>;
  addedSchemas: NormalizedSchema[];
  removedSchemas: NormalizedSchema[];
  changedSchemas: SchemaChange[];
};

Operation changes:

  • summary changed,
  • request schema changed,
  • response schema changed,
  • parameter added/removed,
  • security changed,
  • deprecated changed.

Docs impact:

operation changed -> API page stale
schema changed -> all operations using schema may be stale
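A reduced sketch of the operation part of the diff, keyed by operation key. `OperationSnapshot` is a hypothetical, trimmed-down stand-in for `NormalizedOperation` that compares only two fields.

```typescript
export type OperationSnapshot = { key: string; summary?: string; deprecated: boolean };

export type OperationDiff = {
  added: string[];
  removed: string[];
  changed: Array<{ key: string; changes: string[] }>;
};

export function diffOperations(
  before: OperationSnapshot[],
  after: OperationSnapshot[]
): OperationDiff {
  const beforeByKey = new Map(before.map((op) => [op.key, op] as const));
  const afterByKey = new Map(after.map((op) => [op.key, op] as const));
  const diff: OperationDiff = { added: [], removed: [], changed: [] };

  for (const op of after) {
    const previous = beforeByKey.get(op.key);
    if (previous === undefined) {
      diff.added.push(op.key);
      continue;
    }
    // Compare field by field so the report names what changed.
    const changes: string[] = [];
    if (previous.summary !== op.summary) changes.push("summary");
    if (previous.deprecated !== op.deprecated) changes.push("deprecated");
    if (changes.length > 0) diff.changed.push({ key: op.key, changes });
  }

  for (const op of before) {
    if (!afterByKey.has(op.key)) diff.removed.push(op.key);
  }
  return diff;
}
```

Every key in `changed` marks the corresponding API page as stale.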

39. Testing ingestion

Fixtures:

fixtures/openapi/
  valid-3-0.yaml
  valid-3-1.yaml
  missing-operation-id.yaml
  duplicate-operation-id.yaml
  missing-path-param.yaml
  circular-schema.yaml
  remote-ref.yaml
  examples-secret.yaml

Test:

it("reports missing operationId", async () => {
  const result = await ingestOpenApiFixture("missing-operation-id.yaml", {
    requireOperationId: true,
  });

  expect(result.diagnostics).toContainEqual(
    expect.objectContaining({
      code: "openapi.operation.missingOperationId",
      severity: "error",
    })
  );
});

40. Golden normalized output tests

For valid fixtures, assert normalized operations.

it("normalizes operations", async () => {
  const result = await ingestOpenApiFixture("valid-3-0.yaml");

  // Build the branded key with operationKey() instead of passing a raw string.
  expect(result.registry.operationsByKey.get(operationKey("public", "POST", "/users"))).toMatchObject({
    method: "POST",
    path: "/users",
    operationId: "createUser",
    tags: ["Users"],
  });
});

Golden tests catch accidental normalizer changes.


41. Failure modes

| Failure | Cause | Prevention |
| --- | --- | --- |
| API docs hallucinate fields | AI writes from prose, not spec | API reference generated from normalized OpenAPI |
| Build fails on circular schema | naive dereference | hybrid registry and cycle detection |
| Broken nav grouping | missing tags not diagnosed | tag validation/fallback grouping |
| Wrong endpoint route | operationId/path unstable | operation key and route lock |
| Duplicate operation pages | duplicate operationId | duplicate validation |
| Path params missing | no semantic path validation | path parameter check |
| Secrets published in examples | no example scanning | secret-like redaction diagnostics |
| Remote refs break CI | remote fetch default | disallow remote by default |
| Code/spec drift hidden | no consistency check | match OpenAPI operations with code routes |
| Huge spec slow | no cache | source hash cache |
| Bad YAML error unclear | raw parser exception | normalized diagnostics with path/selector |

42. Key takeaways

OpenAPI ingestion transforms formal API specs into normalized, validated, provenance-rich API facts.

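The pipeline is: discover specs → load and parse YAML/JSON → validate root shape → resolve refs → normalize → run semantic, style, and security rules → build the registry → store semantic artifacts → cross-check with code discovery → feed API page generation.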

Design principles:

  1. treat OpenAPI as high-authority source,
  2. parse and validate before generation,
  3. normalize into internal model,
  4. preserve provenance via JSON pointer,
  5. handle refs and cycles safely,
  6. keep remote access explicit,
  7. run quality/style/security checks,
  8. store operations as semantic artifacts,
  9. cross-check with code discovery,
  10. and never let AI invent formal API details.

Next, we use this normalized registry to generate API reference pages.
