
Designing the Plan Pipeline

Learn State-of-the-Art GitOps/IaC Pipeline - Part 012

Designing the IaC plan pipeline: diff classification, affected units, speculative plans, saved plans, plan JSON, policy gates, cost/risk summaries, approval binding, and evidence artifacts.

2026-07-03, 22 min read, 4308 words


Part 012 — Designing the Plan Pipeline

The plan pipeline is where infrastructure changes become visible before they become real.

That sentence sounds simple.

It is not.

A weak plan pipeline only runs `terraform plan` or `tofu plan` and posts a wall of text into a pull request.

A production-grade plan pipeline answers a richer set of questions:

What changed?
Which state boundaries are affected?
Which dependencies are affected?
Which credentials were used?
Which policy profile applies?
Is the plan speculative or apply-bound?
Which resources will be created, updated, replaced, or destroyed?
Is the change consistent with environment, ownership, and approval rules?
What evidence was produced?
What must be rechecked before apply?

A plan pipeline is not a preview command.

It is a risk classification and evidence-generation system.

This part designs that system.

1. The Plan Pipeline Contract

The plan pipeline must produce a reviewable, auditable answer to this question:

If this change were applied to the target system, what would likely happen, and is that acceptable under our policy?

The word "likely" matters.

A speculative plan is not a guarantee.

OpenTofu and Terraform both describe plan as creating an execution plan that previews proposed changes, and both distinguish speculative plans from saved plans intended for later apply. A speculative plan is useful for review, but the target system may change between review and apply.

Therefore, a production plan pipeline must be honest:

PR plans are evidence for review.
Apply-time plans are evidence for execution.
A stale PR plan must not be treated as eternal truth.

The plan pipeline should produce artifacts that are useful but not overclaim what they prove.

2. Plan Pipeline Invariants

A serious plan pipeline follows these invariants.

Invariant 1: Every plan has a target

A plan without a clear state boundary is noise.

The pipeline must identify:

environment;
account/subscription/project;
region;
root module/unit;
backend address;
state key/workspace;
runner identity;
policy profile.

Invariant 2: Every plan is tied to a source revision

The plan must be linked to:

repository;
commit SHA;
pull request number;
module version or source reference;
lock files;
rendered/generated config checksum.

Invariant 3: Every plan has an execution identity

Reviewers must know whether the plan was generated with:

read-only credentials;
plan-only role;
production apply-capable role;
service-team role;
platform role;
security role.

A plan generated with insufficient permissions may be incomplete.

A plan generated with overly broad permissions is a security risk.

Invariant 4: Every plan is machine-readable

Human text is not enough.

The pipeline should produce machine-readable plan output so policies, summaries, and risk classification can operate reliably.

For Terraform/OpenTofu, that commonly means saving a plan with `-out` and converting it to JSON with `tofu show -json` (or `terraform show -json`), with careful handling of sensitive values.

Invariant 5: Every plan is policy-evaluated

A plan without policy is only information.

Policy turns it into a decision.

Invariant 6: Every plan result is durable evidence

A PR comment is not enough.

Evidence should be stored as artifacts with identity, timestamp, checksum, and retention.

3. Why `plan` Is Not Just `plan`

There are multiple plan types.

Plan Type	Purpose	Apply-Bound?	Risk
Local developer plan	Fast feedback before PR	No	May use wrong credentials/context.
CI speculative plan	PR review evidence	No	Can go stale before merge.
Apply-time plan	Final pre-apply check	Usually yes or immediately followed by apply	Must be policy checked.
Saved plan artifact	Apply exactly planned actions	Yes	Must be tightly bound and protected.
Drift plan	Detect remote changes	No, unless remediation flow	Can confuse intentional vs accidental drift.
Refresh-only plan	Update/inspect state vs remote	Not normal change	Can hide or reveal drift depending on workflow.
Destroy plan	Deletion preview	Maybe	Requires special governance.
Replace-target plan	Force replacement	Maybe	High risk; easy to misuse.

A state-of-the-art pipeline treats these as different events.

Do not let all of them use the same CI job with different flags and no policy distinction.

4. High-Level Architecture

The key is not the CI tool.

GitHub Actions, GitLab CI, Buildkite, Jenkins, CircleCI, Azure DevOps, or a managed IaC runner can all implement this.

The key is the contract.

5. Phase 1 — Trigger Discipline

A plan pipeline should run when the desired state may have changed.

Common triggers:

pull request opened;
pull request synchronized;
pull request reopened;
label added for special plan scope;
manual dispatch by authorized user;
scheduled drift scan;
policy repository update;
module version update;
upstream dependency output change.

But not all triggers should have equal privilege.

Fork PRs

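Fork PRs are dangerous for IaC planning because a plan is not a passive read: it runs provider plugins and evaluates configuration supplied by the PR (for example, an external data source that executes a program), and it may require access to state or credentials.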

For public repositories, never give untrusted fork PRs production credentials.

Safer patterns:

run static validation only;
run module unit tests without cloud credentials;
require maintainer approval before credentialed plan;
use isolated ephemeral accounts;
use read-only state credentials;
prevent secrets exposure in logs.

Label-triggered elevated plans

A useful pattern:

normal PR update       -> static checks + low-privilege speculative plan
label: plan-prod       -> production speculative plan with read-only/plan role
label: full-graph-plan -> expanded scope, restricted to platform maintainers

Labels should not be magic strings anyone can apply.

They are governance events.

6. Phase 2 — Change Classification

Before planning, classify the change.

A pipeline that blindly plans changed folders misses risk.

Classification table

Change	Pipeline Response
`infra-live/prod/us-east-1/platform/eks`	Plan that unit; maybe dependents.
`infra-modules/vpc` with local source usage	Plan all live units using that module.
Module version bump in one live unit	Plan that unit and direct dependents if outputs may change.
`root.hcl` or shared provider config	Plan all inheriting units or block for manual scope.
Backend config	Block normal plan; require state migration workflow.
Policy rule	Run policy tests and evaluate representative affected units.
CI workflow	Require platform approval; plan pipeline behavior changed.
README	No plan, unless docs generate config.

The classifier is part of your safety system.

Treat it like production code.
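The classification table can be sketched as a small, ordered rule function. This is a minimal illustration, assuming the repository layout used in this lesson. The category names, the `backend.hcl` file name, the GitHub Actions workflow path, and the fail-closed `unknown` category are hypothetical choices, not a standard.

```python
from pathlib import PurePosixPath

# Category -> pipeline response, paraphrasing the classification table.
# The "unknown" entry is an assumption: unrecognized paths fail closed.
RESPONSES = {
    "docs": "no plan",
    "backend-config": "block; require state migration workflow",
    "shared-config": "plan all inheriting units or block for manual scope",
    "live-unit": "plan that unit; maybe dependents",
    "module": "plan all live units using that module",
    "policy": "run policy tests on representative units",
    "ci-workflow": "require platform approval",
    "unknown": "block for manual scope",
}


def classify(path: str) -> str:
    """Map one changed file to a change category (first matching rule wins)."""
    p = PurePosixPath(path)
    top = p.parts[0] if p.parts else ""
    if p.suffix == ".md":
        return "docs"
    if p.name == "backend.hcl":  # hypothetical backend config file name
        return "backend-config"
    if p.name in {"root.hcl", "region.hcl"}:
        return "shared-config"
    if top == "infra-live":
        return "live-unit"
    if top == "infra-modules":
        return "module"
    if top == "policy":
        return "policy"
    if p.parts[:2] == (".github", "workflows"):  # assumes GitHub Actions
        return "ci-workflow"
    return "unknown"


def classify_change(paths):
    """Return the sorted set of categories touched by a diff."""
    return sorted({classify(p) for p in paths})
```

Because the rules are plain code, they can be unit-tested like any other production logic, which is the point of treating the classifier as part of the safety system.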

7. Phase 3 — Affected Unit Resolution

Affected unit resolution maps a diff to real state boundaries.

Input:

changed files
repository metadata
module source graph
dependency graph
unit metadata
policy scopes

Output:

unit set to plan
plan scope
risk tier
required identity
required policy profile
required reviewers

A good resolver does not only answer "what folder changed?"

It answers:

Which real state boundaries could produce a different plan because of this change?

Example: service runtime change

changed:
  infra-live/prod/us-east-1/apps/order-api/terragrunt.hcl

affected:
  prod/us-east-1/apps/order-api

plan scope:
  single unit

Example: shared region config change

changed:
  infra-live/prod/us-east-1/region.hcl

affected:
  every unit under prod/us-east-1 inheriting region.hcl

plan scope:
  regional layer or full region depending on policy

Example: module implementation change

changed:
  infra-modules/rds-postgres/**

affected:
  all live units sourcing that module by local path

plan scope:
  all consumers, maybe grouped by environment risk

Example: versioned module release

changed:
  infra-live/stage/us-east-1/data/orders-db/terragrunt.hcl
  source ref: rds-postgres v1.4.2 -> v1.5.0

affected:
  only orders-db stage unit unless outputs affect dependents

Versioned modules make affected analysis easier.

Unversioned local modules make it harder.
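The examples above can be sketched as a resolver over a unit-to-module map. This is a simplified illustration under stated assumptions: `UNIT_MODULES` is hypothetical sample data, every unit below a shared config file is assumed to inherit it, and dependents are not followed.

```python
from pathlib import PurePosixPath

# Hypothetical registry: unit path -> local module name, or None when the
# unit pins a versioned module (so module source edits do not affect it).
UNIT_MODULES = {
    "prod/us-east-1/apps/order-api": None,
    "prod/us-east-1/data/orders-db": "rds-postgres",
    "stage/us-east-1/data/orders-db": "rds-postgres",
}


def resolve_affected_units(changed_files, unit_modules):
    """Map changed files to the state boundaries that could plan differently."""
    affected = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if len(parts) >= 2 and parts[0] == "infra-modules":
            # Module implementation change: every local-path consumer.
            module = parts[1]
            affected |= {u for u, m in unit_modules.items() if m == module}
        elif len(parts) >= 2 and parts[0] == "infra-live":
            directory = "/".join(parts[1:-1])
            for unit in unit_modules:
                # File lives in the unit directory or below it.
                in_unit = directory == unit or directory.startswith(unit + "/")
                # File lives in an ancestor directory: shared config.
                shared = directory == "" or unit.startswith(directory + "/")
                if in_unit or shared:
                    affected.add(unit)
    return sorted(affected)
```

A real resolver would also consult the dependency graph to add downstream units whose inputs may change.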

8. Phase 4 — Execution Context Resolution

Before running plan, resolve the context.

A unit context should include:

unit: prod/us-east-1/platform/eks
environment: prod
region: us-east-1
account: prod-platform
stateKey: prod/us-east-1/platform/eks.tfstate
engine: opentofu
engineVersion: 1.x
moduleSource: git::ssh://example/infra-modules.git//eks?ref=v2.3.1
runnerIdentity: oidc:ci-prod-platform-plan
policyProfile: prod-foundation
riskTier: high
allowDestroy: false
owners:
  - platform-foundation
  - security

This context drives:

credentials;
backend access;
policy rules;
plan command flags;
PR summary grouping;
approval requirements;
artifact naming;
retention rules.

If your pipeline cannot produce this context, it is not ready for production-grade automation.

9. Phase 5 — Workspace and Dependency Preparation

For each affected unit:

Check out exact PR commit.
Install pinned engine version.
Install or verify wrapper version if using Terragrunt.
Restore plugin cache carefully.
Initialize backend.
Verify backend target.
Verify provider lock file.
Resolve dependencies.
Validate configuration.
Generate plan.

Backend target verification

This is critical.

Before plan, print and store:

backend type;
backend key/path;
workspace name if used;
target account/project/subscription;
target region;
assumed identity ARN/principal;
commit SHA.

Many real incidents come from running a correct plan against the wrong state or account.

The plan pipeline should make the target impossible to miss.
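One way to make the target check explicit is to compare the resolved context with what the runner actually observes before planning. The sketch below is illustrative only: the field names follow the unit context shown earlier, and how `observed` is collected (for example from the cloud provider's identity API and the initialized backend) is left out as an assumption.

```python
# Fields that must match before a plan is allowed to run (assumed set).
TARGET_FIELDS = ("account", "region", "stateKey", "runnerIdentity")


def verify_target(expected, observed, fields=TARGET_FIELDS):
    """Return a list of human-readable mismatches; empty means verified."""
    mismatches = []
    for field in fields:
        want = expected.get(field)
        got = observed.get(field)
        if want != got:
            mismatches.append(f"{field}: expected {want!r}, observed {got!r}")
    return mismatches


expected = {
    "account": "prod-platform",
    "region": "us-east-1",
    "stateKey": "prod/us-east-1/platform/eks.tfstate",
    "runnerIdentity": "oidc:ci-prod-platform-plan",
}
# Simulated observation with the wrong account.
observed = dict(expected, account="stage-platform")

for line in verify_target(expected, observed):
    print("TARGET MISMATCH", line)
```

A non-empty result should stop the job before `plan` runs, and the mismatch list belongs in the stored evidence.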

10. Phase 6 — Validation Before Plan

Do not run plan as the first check.

Run cheaper checks first.

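Typical sequence:

format -> static lint -> contract validation -> graph validation -> policy tests -> init/validate -> plan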

Why this matters

Format catches noise early.
Static lint catches obvious mistakes without cloud credentials.
Contract validation catches missing metadata.
Graph validation catches cycles before hitting providers.
Policy tests catch rule syntax and expectation changes.
Init/validate catches provider/module resolution issues.
Plan catches real target diff.

A mature pipeline fails early and clearly.

11. Phase 7 — Plan Generation

The plan command should be boring.

The context around it should be strong.

Conceptual command:

tofu plan \
  -input=false \
  -lock=true \
  -out=plan.bin

Then:

tofu show -json plan.bin > plan.json
tofu show plan.bin > plan.txt

For Terraform, the shape is similar.

The exact flags vary by workflow, but the discipline stays the same:

non-interactive;
explicit backend;
explicit variables;
lock enabled unless there is a documented exception;
saved plan if you need JSON from exact planned actions;
sensitive output handling;
artifact checksum.

Speculative PR plan vs apply-bound plan

PR plan:

generated for review;
may not be applied;
can become stale;
should be policy checked;
should not grant unnecessary write privileges.

Apply-bound plan:

generated after merge or approval;
uses final target context;
is policy checked again;
may be saved and applied immediately;
must be tightly bound to commit, inputs, and identity.

Do not skip apply-time planning simply because PR had a green plan.

12. Saved Plan Artifact Design

If you store plan artifacts, treat them as sensitive.

A plan file can contain information about infrastructure topology and may include sensitive values depending on configuration and provider behavior.

A saved plan artifact should have metadata:

{
  "unit": "prod/us-east-1/platform/eks",
  "commit": "abc123",
  "pullRequest": 481,
  "engine": "opentofu",
  "engineVersion": "1.x",
  "stateKey": "prod/us-east-1/platform/eks.tfstate",
  "runnerIdentity": "ci-prod-platform-plan",
  "policyProfile": "prod-foundation",
  "planSha256": "...",
  "createdAt": "2026-07-03T10:15:00Z",
  "expiresAt": "2026-07-03T12:15:00Z"
}

Controls:

short retention for high-risk plans;
encrypted artifact storage;
access limited to platform reviewers/runners;
checksum verification before apply;
no plan reuse across commits;
no plan reuse across changed variables;
no plan reuse after dependency apply;
no plan reuse after drift detection changes target assumptions.

A saved plan can improve determinism.

It can also become a dangerous stale artifact.

Use it deliberately.
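Some of the reuse controls above can be checked mechanically before apply. The following sketch assumes the metadata fields from the example artifact (`commit`, `planSha256`, `expiresAt`); the function names are hypothetical, and it covers only the commit, checksum, and expiry checks.

```python
import hashlib
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 of the raw plan artifact bytes."""
    return hashlib.sha256(data).hexdigest()


def plan_reuse_errors(metadata, plan_bytes, head_commit, now):
    """Return reasons why a saved plan must not be applied; empty means OK."""
    errors = []
    if metadata["commit"] != head_commit:
        errors.append("commit changed since plan was created")
    if sha256_hex(plan_bytes) != metadata["planSha256"]:
        errors.append("plan checksum mismatch")
    # Older Python versions do not parse a trailing "Z", so normalize it.
    expires = datetime.fromisoformat(metadata["expiresAt"].replace("Z", "+00:00"))
    if now >= expires:
        errors.append("plan artifact expired")
    return errors


plan_bytes = b"stand-in for plan.bin contents"
metadata = {
    "commit": "abc123",
    "planSha256": sha256_hex(plan_bytes),
    "expiresAt": "2026-07-03T12:15:00Z",
}
now = datetime(2026, 7, 3, 11, 0, tzinfo=timezone.utc)
print(plan_reuse_errors(metadata, plan_bytes, "abc123", now))
```

Changed variables, dependency applies, and drift need additional inputs that this sketch does not model.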

13. Plan Normalization

Raw plan output is too noisy for reviewers.

Normalize it.

Useful normalized fields:

{
  "unit": "prod/us-east-1/data/orders-db",
  "summary": {
    "create": 1,
    "update": 2,
    "replace": 0,
    "delete": 0
  },
  "resources": [
    {
      "address": "module.db.aws_db_parameter_group.this",
      "type": "aws_db_parameter_group",
      "action": "update",
      "risk": "medium"
    }
  ],
  "warnings": [],
  "sensitiveChanges": false,
  "policyDecision": "allow_with_review"
}

The PR summary should group by:

environment;
risk tier;
unit;
action type;
policy decision;
required reviewers.

Do not force humans to read raw provider diff for routine review.

Do expose raw diff for deep inspection.
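As a rough sketch of normalization, the plan JSON exposes a `resource_changes` list in which each entry carries `change.actions`, and a replacement appears as a delete/create pair in either order. The function names and the reduced output shape here are illustrative assumptions; risk, warnings, and policy fields are omitted.

```python
def action_class(actions):
    """Collapse a plan JSON actions list into one review-facing class."""
    actions = list(actions)
    if sorted(actions) == ["create", "delete"]:
        return "replace"  # covers both ordering variants
    return actions[0] if actions else "no-op"


def summarize(plan: dict) -> dict:
    """Count create/update/replace/delete and list the changed resources."""
    summary = {"create": 0, "update": 0, "replace": 0, "delete": 0}
    resources = []
    for rc in plan.get("resource_changes", []):
        cls = action_class(rc["change"]["actions"])
        if cls in summary:  # skip no-op and read entries
            summary[cls] += 1
            resources.append(
                {"address": rc["address"], "type": rc["type"], "action": cls}
            )
    return {"summary": summary, "resources": resources}


# Hand-written sample in the shape of `tofu show -json` output.
SAMPLE_PLAN = {
    "resource_changes": [
        {"address": "module.db.aws_db_parameter_group.this",
         "type": "aws_db_parameter_group",
         "change": {"actions": ["update"]}},
        {"address": "module.db.aws_db_instance.this",
         "type": "aws_db_instance",
         "change": {"actions": ["delete", "create"]}},
        {"address": "data.aws_region.current",
         "type": "aws_region",
         "change": {"actions": ["read"]}},
    ]
}
print(summarize(SAMPLE_PLAN)["summary"])
```

Working from the JSON rather than the text output keeps the summary stable across engine versions that change human-readable formatting.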

14. Resource Action Semantics

Reviewers should understand action classes.

Action	Meaning	Review Concern
Create	New object will be created	Cost, exposure, ownership, tags.
Update in-place	Existing object modified	Runtime impact, downtime, policy drift.
Replace	Destroy then create, or create then destroy (`create_before_destroy`)	Data loss, outage, identity changes.
Delete	Object removed	Availability, data loss, dependencies.
No-op	No changes	May still matter if policy changed.
Read	Data source read	Credential scope, external dependency.

Replacement is especially important.

A plan that says "1 to replace" can be much more dangerous than "20 to update".

Risk should be semantic, not numeric.

15. Risk Classification

A production plan pipeline should classify risk.

Example rules:

if environment == prod and action includes delete:
  risk = critical

if resource type is database and action includes replace:
  risk = critical

if IAM policy grants wildcard actions:
  risk = high

if public network exposure changes:
  risk = high

if only tags change on non-prod resource:
  risk = low

Risk inputs:

action type;
resource type;
environment;
data classification;
owner;
state boundary;
operation mode;
policy result;
previous incidents;
rollback difficulty.

Risk classification should be explainable.

A reviewer should see not only "high risk" but why.
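The example rules can be sketched as a function that returns both a risk level and the reasons behind it. This is an illustration, not a complete rule set: the `DATABASE_TYPES` set, the boolean flags (`iamWildcard`, `publicExposureChanged`, `tagsOnly`), and the `medium` default are assumptions introduced for the example.

```python
RISK_ORDER = ["low", "medium", "high", "critical"]
DATABASE_TYPES = {"aws_db_instance", "aws_rds_cluster"}  # assumed subset


def classify_risk(change):
    """Return (risk, reasons) for one resource change.

    `change` holds environment, resourceType, the raw plan `actions` list,
    and optional boolean flags computed by earlier analysis steps.
    """
    env = change.get("environment")
    actions = change.get("actions", [])
    findings = []  # (risk, explanation) pairs
    if env == "prod" and "delete" in actions:
        findings.append(("critical", "delete in production"))
    if (change.get("resourceType") in DATABASE_TYPES
            and "delete" in actions and "create" in actions):
        findings.append(("critical", "database replacement"))
    if change.get("iamWildcard"):
        findings.append(("high", "IAM policy grants wildcard actions"))
    if change.get("publicExposureChanged"):
        findings.append(("high", "public network exposure changes"))
    if not findings:
        if change.get("tagsOnly") and env != "prod":
            findings.append(("low", "tag-only change on non-prod resource"))
        else:
            findings.append(("medium", "no specific rule matched (assumed default)"))
    risk = max((r for r, _ in findings), key=RISK_ORDER.index)
    return risk, [why for _, why in findings]
```

Returning the reasons alongside the level is what lets the PR summary show why a change was rated high rather than only that it was.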

16. Policy Evaluation

Policy gates should evaluate both configuration and plan.

Configuration policy

Checks source/config before provider interaction.

Examples:

required tags exist;
deletion protection enabled for production databases;
module source must be pinned;
backend must be remote;
provider version constraints exist;
public ingress requires exception label.

Plan policy

Checks proposed changes.

Examples:

block production database replacement;
block IAM wildcard expansion;
block public S3/object storage;
require approval for network route changes;
require encryption for data resources;
require cost approval above threshold;
block destroy unless destroy workflow.

Metadata policy

Checks unit context.

Examples:

production unit must have owner;
risk tier must match layer;
allowed runner identity must match environment;
CODEOWNERS must include owner group;
service team cannot apply platform foundation unit.

Policy should return structured decisions:

{
  "decision": "deny",
  "rule": "prod_database_replace_blocked",
  "message": "Production database replacement is not allowed through normal apply workflow.",
  "resource": "module.db.aws_db_instance.this",
  "requiredPath": "manual-migration-workflow"
}

A policy engine should not merely fail the build.

It should tell the engineer the correct path.
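Structured decisions are easy to aggregate. The sketch below assumes three decision values (`allow`, `allow_with_review`, `deny`) and a strictest-wins ordering; both are illustrative choices, as is refusing to proceed when no decision was recorded at all.

```python
# Assumed ordering: a higher number is stricter.
SEVERITY = {"allow": 0, "allow_with_review": 1, "deny": 2}


def overall_decision(decisions):
    """Return the strictest decision across all evaluated rules."""
    if not decisions:
        # An unevaluated plan is not an allowed plan.
        raise ValueError("no policy decisions recorded")
    worst = max(decisions, key=lambda d: SEVERITY[d["decision"]])
    return worst["decision"]


def required_paths(decisions):
    """Collect the workflows that denied changes must follow instead."""
    return sorted(
        {d["requiredPath"] for d in decisions
         if d["decision"] == "deny" and d.get("requiredPath")}
    )


DECISIONS = [
    {"decision": "allow", "rule": "required_tags_present"},
    {"decision": "deny",
     "rule": "prod_database_replace_blocked",
     "resource": "module.db.aws_db_instance.this",
     "requiredPath": "manual-migration-workflow"},
]
print(overall_decision(DECISIONS), required_paths(DECISIONS))
```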

17. Cost Estimation

Cost estimation is not a perfect control.

But it is useful review context.

A plan pipeline can estimate:

monthly cost delta;
new high-cost resources;
instance size changes;
storage growth;
data transfer risk;
NAT gateway count;
managed database class changes;
logging/retention cost impact.

Cost should be treated as a policy input, not only a comment.

Example:

Cost Delta	Policy
< $100/month	Informational.
$100–$1,000/month	Owner approval.
$1,000–$10,000/month	Platform/finance approval.
> $10,000/month	Change review required.

Do not overtrust cost estimation.

Use it to catch obvious surprises.
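Treating cost as a policy input can be as simple as mapping the estimated delta to an approval tier. This sketch follows the example table; the tier names are hypothetical, and because the table leaves the boundaries ambiguous, it assumes each lower bound is inclusive and that only deltas above $10,000 reach the top tier.

```python
def cost_policy(monthly_delta_usd: float) -> str:
    """Map an estimated monthly cost delta (USD) to an approval tier."""
    if monthly_delta_usd < 100:
        return "informational"  # also covers cost reductions
    if monthly_delta_usd < 1_000:
        return "owner-approval"
    if monthly_delta_usd <= 10_000:
        return "platform-finance-approval"
    return "change-review"


for delta in (-50, 250, 4_000, 25_000):
    print(delta, cost_policy(delta))
```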

18. Security of the Plan Pipeline

The plan pipeline is a privileged surface.

Even if it does not apply changes, it may:

read remote state;
access cloud APIs;
fetch private modules;
expose topology;
run provider/plugin code;
access variables;
produce artifacts containing sensitive values.

Security controls:

Use short-lived identity

Prefer OIDC-based workload federation over long-lived static credentials.

Separate plan and apply roles

Plan role should not automatically be able to mutate production.

Some providers need read permissions that are broad. Keep write permissions separate when possible.

Lock down PR contexts

Do not run credentialed plans for untrusted forks without explicit approval and sandboxing.

Sanitize logs

Do not print environment variables, backend secrets, or provider credentials.

Protect plan artifacts

Treat plan binary and JSON as sensitive unless proven otherwise.

Pin tools and providers

Unpinned tools make reproducibility and evidence weak.

19. Concurrency and Queuing

Plan operations can often run concurrently.

But not always.

Concurrency risks:

state backend throttling;
provider API rate limits;
dependency outputs changing mid-run;
lock contention;
noisy PR comments;
cost explosion in CI minutes;
partial evidence failure.

A useful concurrency model:

Unit Relationship	Plan Concurrency
Independent dev units	Parallel.
Independent prod leaf units	Parallel with limit.
Shared foundation units	Serialized or low parallelism.
Upstream + downstream affected units	Ordered or staged.
Same state key	Never parallel.

Use a queue key such as:

iac-plan:<environment>:<account>:<region>:<state-key>

And for broader locks:

iac-plan-prod-foundation:<account>:<region>

Do not let every PR stampede production state refresh at once.
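A minimal sketch of the queue key idea: plans that share a key are serialized, while distinct keys may run in parallel up to whatever limit the CI system enforces. The helper names and the sample unit records are assumptions for illustration.

```python
from collections import defaultdict


def queue_key(environment, account, region, state_key):
    """Build the per-state queue key in the format shown above."""
    return f"iac-plan:{environment}:{account}:{region}:{state_key}"


def group_by_queue(units):
    """Group unit contexts by queue key; each group must run serially."""
    queues = defaultdict(list)
    for u in units:
        key = queue_key(u["environment"], u["account"], u["region"], u["stateKey"])
        queues[key].append(u["unit"])
    return dict(queues)


UNITS = [
    {"unit": "prod/us-east-1/platform/eks", "environment": "prod",
     "account": "prod-platform", "region": "us-east-1",
     "stateKey": "prod/us-east-1/platform/eks.tfstate"},
    {"unit": "prod/us-east-1/data/orders-db", "environment": "prod",
     "account": "prod-data", "region": "us-east-1",
     "stateKey": "prod/us-east-1/data/orders-db.tfstate"},
]
for key, members in group_by_queue(UNITS).items():
    print(key, members)
```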

20. PR Comment Design

A good PR comment is an interface.

It should not be a log dump.

Example structure:

## IaC Plan Summary

Commit: abc123
Plan type: speculative PR plan
Scope: 3 units
Environment: prod/us-east-1
Overall decision: requires review

### Unit: prod/us-east-1/data/orders-db
Risk: high
Policy: requires database owner approval
Actions: 0 create, 2 update, 0 replace, 0 delete
Notable changes:
- DB parameter group update: `max_connections`
- Backup retention: 14 -> 30 days

### Unit: prod/us-east-1/apps/order-api
Risk: low
Policy: allow
Actions: 1 create, 1 update, 0 replace, 0 delete

Artifacts:
- plan.json
- plan.txt
- policy-result.json
- resolved-context.json

A reviewer should quickly answer:

What changed?
Where?
How risky?
Which policy fired?
What approvals are required?
Where is the raw evidence?

21. Evidence Artifact Model

The plan pipeline should store evidence.

Minimum artifacts per unit:

Artifact	Purpose
`resolved-context.json`	Target, identity, policy profile, owner, state key.
`plan.bin` or equivalent	Saved plan if used.
`plan.json`	Machine-readable plan.
`plan.txt`	Human-readable raw plan.
`plan-summary.json`	Normalized summary.
`policy-result.json`	Structured policy decisions.
`tool-versions.json`	Engine, provider, wrapper versions.
`checksums.txt`	Integrity binding.
`dependency-graph.json`	Unit dependencies and plan scope.

Artifact retention should vary by risk:

Risk	Suggested Retention
Dev low-risk	Short.
Stage medium-risk	Medium.
Prod high-risk	Long enough for audit/compliance.
Regulated systems	Aligned to regulatory evidence policy.

Evidence is not bureaucracy.

It is how you debug incidents and prove control later.

22. Approval Binding

Approval should bind to the thing being approved.

Weak approval:

Someone clicked approve on the PR.

Strong approval:

Authorized owner approved the plan summary for commit abc123, affecting units X/Y/Z, with policy decisions A/B/C, before apply pipeline executed.

Approval binding should consider:

commit SHA;
affected unit set;
plan summary checksum;
policy result checksum;
approving identity;
CODEOWNERS/team membership at approval time;
approval expiration;
whether new commits invalidate approval.

A plan pipeline does not usually enforce final apply eligibility alone.

But it must produce the data needed for apply eligibility.
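One possible way to produce that data is a single digest over the things being approved, so that any new commit, changed unit set, or changed plan or policy result invalidates the approval. The function names and payload field names below are hypothetical; identity and expiry checks are left to the apply pipeline.

```python
import hashlib
import json


def approval_digest(commit, units, plan_summary_sha256, policy_result_sha256):
    """Digest binding an approval to commit, unit set, and evidence checksums."""
    payload = json.dumps(
        {
            "commit": commit,
            "units": sorted(units),  # unit order must not matter
            "planSummarySha256": plan_summary_sha256,
            "policyResultSha256": policy_result_sha256,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def approval_still_binds(approved_digest, current_digest):
    """True only if nothing that was approved has changed since approval."""
    return approved_digest == current_digest


approved = approval_digest("abc123", ["unit-x", "unit-y"], "aa11", "bb22")
after_push = approval_digest("def456", ["unit-x", "unit-y"], "aa11", "bb22")
print(approval_still_binds(approved, after_push))
```

The canonical JSON encoding (sorted keys, fixed separators) keeps the digest reproducible between the plan pipeline and the apply pipeline.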

23. Handling No-Change Plans

A no-change plan is still useful.

It proves:

config is syntactically valid;
backend can be initialized;
credentials can read target;
current state matches desired config for that unit at that time;
policies did not reject metadata/config.

But a no-change plan can also hide issues:

wrong target selected;
policy did not evaluate relevant rules;
provider ignored unmanaged drift;
remote APIs returned incomplete data;
generated config did not include expected change.

So the PR summary should say:

No changes detected for prod/us-east-1/platform/eks.
Target verified: account=prod-platform, region=us-east-1, stateKey=...

No-change without target verification is not reassuring.

24. Handling Plan Failures

Plan failures are not all equal.

Failure Type	Meaning	Response
Syntax error	Broken config	Fail PR; fix code.
Provider auth error	Bad credentials or permission	Route to platform/identity owner.
Backend init error	State backend issue	Block; investigate state access.
Lock error	Another operation active	Retry or queue; do not force unlock casually.
Dependency output missing	Contract break or unapplied upstream	Block; fix dependency.
Provider API error	Remote API issue or rate limit	Retry with backoff or report transient failure.
Policy deny	Unsafe proposed change	Follow required path.
Tool version mismatch	Reproducibility problem	Fix version pinning.

A mature pipeline classifies failures so users know whether to fix code, wait, ask platform, or use a special workflow.

25. Drift-Aware Planning

Normal PR planning assumes the remote state and real infrastructure are in acceptable sync.

But drift changes the interpretation.

Scenarios:

Scenario A: PR introduces change, no drift

Easy. Review proposed change.

Scenario B: Drift exists unrelated to PR

The plan may show changes not authored by the PR.

The pipeline should flag drift separately.

Scenario C: Drift conflicts with PR

The PR may accidentally overwrite emergency/manual changes.

Require review.

Scenario D: Drift is intentional but not codified

The correct fix is to codify or explicitly waive it.

A plan pipeline should avoid mixing authored changes and drift remediation invisibly.

26. Targeted Plans and Their Risks

Terraform/OpenTofu support targeting specific resources with the `-target` option, intended for exceptional cases.

Targeted plans can be useful during recovery.

They are dangerous as normal workflow.

Why?

Because targeting can produce a partial view of dependencies and may hide changes that a full plan would reveal.

Production rule:

Targeted plans are break-glass or migration tools, not routine review tools.

Require:

explicit reason;
elevated approval;
evidence annotation;
follow-up full plan;
time-bound exception.

27. Destroy Plans

Destroy plans are a separate class.

Never treat destroy as just another plan action.

Controls:

separate workflow;
explicit resource scope;
production deletion protection;
data backup verification;
dependency impact analysis;
owner approval;
platform/security approval for shared resources;
retention/legal hold checks;
dry-run evidence;
post-destroy verification.

The PR comment should scream when deletes are present.

Not visually, but semantically.

Example:

Decision: DENY normal workflow
Reason: production delete detected for durable data resource
Required path: database-decommission workflow

28. Plan Pipeline for Multi-Unit Orchestration

When multiple units are planned, produce both unit-level and graph-level evidence.

Graph summary should include:

planned units;
skipped units;
failed units;
dependency order;
external dependencies;
changed upstream outputs;
downstream units requiring re-plan;
graph cycles if detected.

For large graphs, summarize aggressively and link artifacts.
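Dependency order and cycle detection for the graph summary can be sketched with Python's standard `graphlib` module. The unit names and the shape of `DEPENDENCIES` (unit mapped to the upstream units it depends on) are assumptions for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical graph: unit -> set of upstream units it depends on.
DEPENDENCIES = {
    "network/vpc": set(),
    "platform/eks": {"network/vpc"},
    "data/orders-db": {"network/vpc"},
    "apps/order-api": {"platform/eks", "data/orders-db"},
}


def plan_order(dependencies):
    """Return (order, cycle): upstream-first order, or the detected cycle."""
    try:
        order = list(TopologicalSorter(dependencies).static_order())
        return order, None
    except CycleError as exc:
        # The second element of args holds the nodes forming the cycle.
        return None, exc.args[1]


order, cycle = plan_order(DEPENDENCIES)
print("order:", order, "cycle:", cycle)
```

When a cycle is found, the graph summary should report it and skip planning rather than guess an order.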

29. Implementation Blueprint

A simple production-grade plan pipeline can be implemented as stages.

stages:
  - classify
  - resolve-context
  - static-checks
  - graph-checks
  - plan
  - normalize
  - policy
  - summarize
  - publish-evidence

Stage: classify

Inputs:

base SHA;
head SHA;
changed files.

Outputs:

change categories;
initial affected units;
required plan scope.

Stage: resolve-context

Inputs:

affected units;
unit metadata;
environment registry.

Outputs:

execution matrix;
credentials mapping;
policy profile.

Stage: static-checks

Run:

format;
lint;
module contract checks;
backend config checks;
version pin checks.

Stage: graph-checks

Run:

dependency graph build;
cycle detection;
forbidden dependency checks;
owner boundary checks.

Stage: plan

Run per unit:

init;
validate;
plan;
show JSON/text;
checksum artifacts.

Stage: normalize

Convert raw output to:

action counts;
resource list;
high-risk changes;
sensitive change markers;
replacement/delete highlights.

Stage: policy

Evaluate:

config policy;
metadata policy;
plan policy;
exception policy.

Stage: summarize

Generate:

PR comment;
commit status/check summary;
required reviewers.

Stage: publish-evidence

Store:

plan artifacts;
policy results;
context;
tool versions;
logs;
checksums.

30. Minimal Plan Summary Schema

A useful internal schema:

{
  "schemaVersion": "iac.plan.summary.v1",
  "repository": "infra-live",
  "pullRequest": 481,
  "commit": "abc123",
  "planType": "speculative-pr",
  "createdAt": "2026-07-03T10:15:00Z",
  "overallDecision": "requires_review",
  "units": [
    {
      "unit": "prod/us-east-1/data/orders-db",
      "owner": "data-platform",
      "environment": "prod",
      "account": "prod-data",
      "region": "us-east-1",
      "stateKey": "prod/us-east-1/data/orders-db.tfstate",
      "riskTier": "high",
      "actions": {
        "create": 0,
        "update": 2,
        "replace": 0,
        "delete": 0
      },
      "policy": {
        "decision": "allow_with_review",
        "requiredReviewers": ["data-platform"]
      },
      "artifacts": {
        "planJson": "artifact://...",
        "planText": "artifact://...",
        "policyResult": "artifact://..."
      }
    }
  ]
}

Schemas matter.

Without schemas, every downstream automation scrapes logs.

Log scraping is not a platform architecture.

31. Common Mistakes

Mistake 1: Planning too little

Only the changed folder is planned, but a shared module change affects many units.

Mistake 2: Planning too much

Every PR plans the world, causing slow feedback and reviewer fatigue.

Mistake 3: Treating PR plan as apply truth

The remote state changed after the PR plan was generated.

Apply must re-check.

Mistake 4: Hiding target context

The reviewer sees the resource diff but not the account, region, or state key.

Mistake 5: No machine-readable output

Policies rely on brittle text parsing.

Mistake 6: No failure classification

Users cannot tell whether a failure comes from their code or from a platform issue.

Mistake 7: Plan artifacts leak secrets

Logs/artifacts are readable by too many people.

Mistake 8: No approval binding

Approval is not tied to plan, commit, or affected units.

Mistake 9: Targeted plan as default

Partial plans become normal and hide real changes.

Mistake 10: Policy after summary only

Policy should influence decision, not merely annotate after the fact.

32. Review Checklist

Before calling a plan pipeline production-grade, verify:

Target and context

Does every plan show environment, account, region, state key, and runner identity?
Are tool and provider versions pinned?
Are module versions pinned for production?
Is backend initialization verified?

Scope

Does the pipeline classify change type?
Does it resolve affected units beyond changed folders?
Does it handle shared config and module changes?
Does it separate normal, graph-wide, drift, destroy, and targeted plans?

Security

Are fork PRs safe?
Are credentials short-lived?
Are plan and apply roles separated?
Are artifacts protected?
Are logs sanitized?

Policy

Are config, metadata, and plan policies evaluated?
Are policy decisions structured?
Are deny messages actionable?
Are exceptions auditable?

Evidence

Are plan JSON/text artifacts stored?
Are resolved contexts stored?
Are checksums stored?
Are summaries schema-based?
Is approval binding possible?

Usability

Is the PR comment readable?
Are high-risk changes highlighted?
Are no-change plans target-verified?
Are failures classified?
Is reviewer fatigue controlled?

33. Mental Model Summary

The plan pipeline is not a command runner.

It is the first serious control point in the GitOps/IaC lifecycle.

It transforms a proposed source change into:

affected state boundaries;
target execution context;
proposed infrastructure changes;
risk classification;
policy decision;
human-readable review summary;
machine-readable evidence.

The core invariants are:

Plan the right units, not merely changed folders.
Always expose target context.
Treat speculative plans as review evidence, not final apply truth.
Convert plans into structured data.
Evaluate policy before approval.
Protect credentials and artifacts.
Bind evidence to commit, context, and identity.
Re-check before apply.

A weak plan pipeline says:

Here is a diff. Good luck.

A strong plan pipeline says:

Here is what would change, where, why it matters, what policy says, who must approve, and what evidence proves this review happened.

That is the level required for serious infrastructure engineering.

34. Practice Work

Design a plan pipeline for this repository shape:

infra-live/
  prod/us-east-1/network/vpc
  prod/us-east-1/platform/eks
  prod/us-east-1/data/orders-db
  prod/us-east-1/apps/order-api
infra-modules/
  vpc
  eks
  rds-postgres
  service-runtime
policy/
  terraform/
  metadata/
  cost/

Your design must specify:

Change classifier rules.
Affected unit resolver rules.
Unit context schema.
Plan command sequence.
JSON summary schema.
Policy evaluation stages.
PR comment format.
Artifact retention policy.
Fork PR security behavior.
Apply-time re-plan rule.

Then test it against these changes:

service runtime input update;
VPC module change;
production database replacement;
policy rule update;
backend key change;
documentation-only change.

For each case, decide:

which units are planned;
which policies run;
whether the PR can proceed;
what approvals are required;
what evidence is stored.

References

OpenTofu plan command documentation: execution plan behavior, speculative plans, saved plans, and automation guidance.
OpenTofu apply command documentation: automatic plan mode, saved plan mode, and -auto-approve warning.
Terraform plan command documentation: saved plan files, speculative plans, and automation-oriented two-step workflows.
OpenTofu state locking documentation: locking behavior for state write operations.
Terragrunt run and run queue documentation: multi-unit execution, affected filtering, dependency graph ordering, and graph mode behavior.
