
title: Learn AI Coding Agent From Scratch - Part 004
description: A taxonomy of AI coding agents (CLI agent, IDE agent, cloud agent, background agent, fleet agent, PR reviewer, codemod bot, and hybrid migration agent) and the architectural consequences of each.
series: learn-ai-coding-agent
seriesTitle: Learn AI Coding Agent From Scratch
order: 4
partTitle: Agent Taxonomy CLI IDE Cloud Background Fleet
tags:

  • ai-coding-agent
  • taxonomy
  • cli-agent
  • ide-agent
  • cloud-agent
  • background-agent
  • fleet-agent

date: 2026-07-03

Part 004 — Agent Taxonomy: CLI, IDE, Cloud, Background, Fleet

Many discussions of AI coding agents go wrong from the start because every kind of agent gets lumped together.

IDE autocomplete, a chatbot that answers code questions, a CLI agent that edits local files, a cloud agent that opens PRs, a background agent triggered from Slack, and a fleet agent that migrates thousands of repositories are different systems. They all use an LLM, but they differ in:

  • execution location;
  • context sources;
  • permission model;
  • artifacts produced;
  • feedback loop;
  • risk profile;
  • observability;
  • governance;
  • human interaction pattern.

This part builds a clean taxonomy so that we do not design the wrong system.

We are building a Honk-like background/fleet coding agent, but to build it correctly we need to know where it ends and the other agent types begin.



1. Why does taxonomy matter?

Without a taxonomy, it is easy to draw the wrong conclusions.

An example mistake:

"My agent is safe because it only runs in the developer's terminal."

That may hold for a local CLI agent, but it is not enough for a background fleet agent. When an agent runs automatically across 500 repositories, the problem is no longer just terminal permissions. The problem is blast radius, campaign rollout, PR noise, repository locks, reviewer load, and audit.

Another example:

"My agent is production-grade because it can run tests."

Running tests matters, but a cloud/background agent also needs a sandbox, policy, credential boundaries, PR orchestration, a run lifecycle, cancellation, quotas, and traces.

A taxonomy helps us ask:

  1. Where does the agent run?
  2. Who grants permission?
  3. What is the agent allowed to do?
  4. What is its official output?
  5. What evidence shows the output is correct?
  6. Who bears the risk?
  7. How is the agent stopped?
  8. How are the results audited?
  9. How does the system scale?

2. Agent classification dimensions

We classify coding agents by design dimensions, not by product name.

| Dimension | Design question |
| --- | --- |
| Interaction mode | Is the agent interactive, background, or scheduled? |
| Execution location | Local, IDE, cloud sandbox, CI, Kubernetes worker? |
| Scope | One file, one repo, multi-repo, fleet? |
| Autonomy | Suggestion-only, edit-with-approval, autonomous PR, campaign? |
| Permission | Read-only, write file, run command, network, secret, push branch? |
| Context source | Open files, repo search, issue, docs, build logs, platform metadata? |
| Output artifact | Suggestion, patch, commit, PR, review comment, migration report? |
| Feedback loop | Human chat, compiler, test, CI, reviewer, fleet metrics? |
| Risk model | Local mistake, repo breakage, supply chain, blast radius, governance? |
| Observability | Chat history, local log, trace, replay, audit trail? |

Products that look similar can differ completely along these dimensions.
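
As a minimal sketch, a few of these dimensions can be captured in a small data structure so that two agents are compared dimension by dimension rather than by product name. The class name `AgentProfile`, the field names, and the two sample profiles are illustrative assumptions, not part of any real tool.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    FILE = 1
    REPO = 2
    MULTI_REPO = 3
    FLEET = 4


@dataclass(frozen=True)
class AgentProfile:
    """One agent described along a subset of the design dimensions."""
    name: str
    interaction_mode: str      # "interactive" | "background" | "scheduled"
    execution_location: str    # e.g. "local", "ide", "cloud_sandbox"
    scope: Scope
    permissions: frozenset     # e.g. {"read", "write_file", "run_command"}
    output_artifact: str       # e.g. "suggestion", "patch", "pull_request"


def differing_dimensions(a: AgentProfile, b: AgentProfile) -> list:
    """Return the names of the dimensions on which two profiles differ."""
    fields = ("interaction_mode", "execution_location", "scope",
              "permissions", "output_artifact")
    return [f for f in fields if getattr(a, f) != getattr(b, f)]


# Hypothetical sample profiles, for illustration only.
cli = AgentProfile("cli", "interactive", "local", Scope.REPO,
                   frozenset({"read", "write_file", "run_command"}), "patch")
cloud = AgentProfile("cloud", "background", "cloud_sandbox", Scope.REPO,
                     frozenset({"read", "write_file", "run_command", "push_branch"}),
                     "pull_request")

print(differing_dimensions(cli, cloud))
```

Both sample agents work on a single repository, yet they differ on four of the five dimensions modeled here, which is why they need different architectures.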


3. Type 1: Chat Coding Assistant

Definition

A chat coding assistant answers questions and generates code in a chat. It has no direct access to the repository, or only the context the user pastes in.

Key characteristics

| Aspect | Characteristic |
| --- | --- |
| Execution | Does not run code |
| Context | Supplied manually by the user |
| Output | Text/snippet |
| Verification | Manual, by the user |
| Risk | Wrong advice, outdated context |
| Strength | Fast for explanation, brainstorming, small reviews |

Good fit for

  • asking about concepts;
  • explaining errors;
  • drafting a function;
  • comparing design options;
  • reviewing a small snippet;
  • writing simple test cases.

Not a good fit for

  • multi-file migrations;
  • large refactors;
  • real dependency upgrades;
  • automated PRs;
  • codebases that need build/test;
  • sensitive changes.

Mental model

A chat assistant is an advisor, not an actor.

The main problem: it has no ground truth beyond what the user provides. That is why it often sounds convincing while being wrong about the context.


4. Type 2: IDE Completion / Inline Assistant

Definition

An IDE assistant works inside the editor: autocomplete, inline edit, code action, quick fix, test generation, or refactor suggestion.

Key characteristics

| Aspect | Characteristic |
| --- | --- |
| Execution | Usually does not run a full workflow |
| Context | Open files, project index, cursor location |
| Output | Inline suggestion or local edit |
| Verification | Developer + IDE checks |
| Risk | Bad local edit, narrow context |
| Strength | Low latency, fits the daily coding flow |

Good fit for

  • completing boilerplate;
  • writing simple mappings;
  • creating helper functions;
  • fixing syntax;
  • writing local unit tests;
  • renames or small refactors under developer control.

Not a good fit for

  • background tasks;
  • changes across many repos;
  • automated PRs;
  • campaign migrations;
  • long-running verification.

Mental model

An IDE assistant is a pairing accelerator. The developer remains the execution controller.


5. Type 3: CLI Coding Agent

Definition

A CLI coding agent runs in a local terminal or dev container. It can read the repository, search files, edit files, run commands, and iterate.

Claude Code is a modern example, officially described as an agentic coding tool that reads the codebase, edits files, runs commands, and integrates with development tools.

Key characteristics

| Aspect | Characteristic |
| --- | --- |
| Execution | Local/dev container |
| Context | Local repository + command output |
| Output | Local diff, optional commit ops |
| Verification | Local commands |
| Permission | Developer approval or configured mode |
| Risk | Dangerous shell commands, local secrets, repo damage |
| Strength | Very powerful for everyday developer tasks |

Good fit for

  • local bug fixes;
  • test repair;
  • single-repo refactors;
  • small dependency upgrades;
  • codebase exploration;
  • preparing a manual PR.

Not a good fit for

  • unattended execution at large scale;
  • multi-tenant platforms;
  • fleet migration without a control plane;
  • tasks that need enterprise audit;
  • running untrusted repos without a sandbox.

Design implication

A CLI agent needs:

  • permission prompt;
  • command approval;
  • .agent / AGENTS.md / project instruction;
  • context compaction;
  • local session log;
  • git diff discipline;
  • safe defaults for destructive commands.

But a CLI agent does not automatically have:

  • queue;
  • central audit;
  • campaign control;
  • PR orchestration;
  • repository targeting;
  • multi-user policy.

Mental model

A CLI agent is a local autonomous pair programmer. It is powerful because it sits close to the dev environment, but its boundary is the machine and the developer running it.
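
The command approval and safe-default ideas for a CLI agent can be sketched as a small classifier that runs before any shell command is executed. This is a hedged illustration: the allow and deny tables and the `classify_command` function are hypothetical, and a real agent would load its policy from project configuration.

```python
import shlex
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"
    DENY = "deny"


# Hypothetical policy tables, for illustration only.
SAFE_PROGRAMS = {"ls", "cat", "grep"}
SAFE_GIT_SUBCOMMANDS = {"status", "diff", "log", "show"}
DESTRUCTIVE_PROGRAMS = {"rm", "dd", "mkfs", "shutdown"}
SHELL_METACHARACTERS = ";&|<>`$"


def classify_command(command: str) -> Decision:
    """Classify a shell command before the agent is allowed to run it."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return Decision.DENY  # unbalanced quotes: refuse rather than guess
    if not argv:
        return Decision.DENY
    program = argv[0]
    if program in DESTRUCTIVE_PROGRAMS:
        return Decision.DENY
    if any(ch in command for ch in SHELL_METACHARACTERS):
        return Decision.ASK  # chaining or redirection hides what really runs
    if program == "git":
        sub = argv[1] if len(argv) > 1 else ""
        return Decision.ALLOW if sub in SAFE_GIT_SUBCOMMANDS else Decision.ASK
    if program in SAFE_PROGRAMS:
        return Decision.ALLOW
    return Decision.ASK  # safe default: unknown commands need developer approval


print(classify_command("git status"))    # Decision.ALLOW
print(classify_command("rm -rf build"))  # Decision.DENY
```

The design choice worth noting is the fall-through: anything the policy does not recognize ends in ASK, so the developer stays the final authority.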


6. Type 4: Cloud Coding Agent

Definition

A cloud coding agent receives a task, prepares the repository in a cloud sandbox, runs the agent in a separate environment, and then produces a patch or PR.

OpenAI Codex Cloud and the GitHub Copilot cloud agent belong to this class: tasks can run in the background, the repository is prepared in a cloud environment, and the result can become a branch/PR.

Key characteristics

| Aspect | Characteristic |
| --- | --- |
| Execution | Cloud sandbox |
| Context | Repo checkout + task + optional issue/PR context |
| Output | Patch, branch, PR |
| Verification | Sandbox commands + CI |
| Permission | Connected git account/app installation |
| Risk | Sandbox escape, credential scope, PR spam, supply chain |
| Strength | Can run async without tying up the developer's machine |

Good fit for

  • tasks that can be described as an issue;
  • isolated bug fixes;
  • small to medium features;
  • test generation;
  • PR review fixes;
  • parallel background work.

Not a good fit for

  • changes across many repos without a campaign layer;
  • high-risk changes without governance;
  • tasks that need access to production secrets;
  • tasks with highly ambiguous requirements;
  • architectural refactors that need product or organizational decisions.

Design implication

A cloud agent needs:

  • sandbox allocator;
  • repository provider;
  • credential scoping;
  • network policy;
  • execution timeout;
  • run state;
  • PR integration;
  • secure logging;
  • cancellation;
  • user-visible progress.

Mental model

A cloud agent is a remote worker that produces code artifacts. It must be treated as an untrusted worker that is granted minimum permissions.


7. Type 5: Background Coding Agent

Definition

A background coding agent runs as an internal service that accepts tasks from other systems: Slack, tickets, migration campaigns, scheduled jobs, a dependency dashboard, a developer portal, or a platform engineering workflow.

Honk belongs to this class. What matters is not its UI but its nature: the agent runs behind the scenes, makes changes, verifies them, and then brings the result into the PR workflow.

Key characteristics

| Aspect | Characteristic |
| --- | --- |
| Execution | Managed worker/sandbox |
| Context | Task contract + repo + platform metadata |
| Output | PR, report, status update |
| Verification | Custom verifier + CI |
| Permission | Platform policy, not just user prompt |
| Risk | Automation wrong at org scale |
| Strength | Great for maintenance work and async productivity |

Good fit for

  • dependency migration;
  • build config migration;
  • API deprecation cleanup;
  • repetitive PR generation;
  • test repair campaign;
  • small bug from ticket;
  • policy-driven code modernization.

Not a good fit for

  • product discovery;
  • ambiguous architecture decision;
  • changes requiring deep domain negotiation;
  • high-risk production behavior change without review;
  • no-test legacy system unless additional guard exists.

Design implication

A background agent needs every element of a cloud agent, plus:

  • task contract;
  • policy engine;
  • central run queue;
  • observability dashboard;
  • audit trail;
  • team/repo ownership;
  • run cancellation;
  • verifier registry;
  • judge registry;
  • PR conventions;
  • notification integration.

Mental model

A background agent is a software maintenance worker. It is not a “replacement developer”. It handles the class of work that can be bounded, verified, and reviewed.


8. Type 6: Fleet Coding Agent

Definition

A fleet coding agent is a background agent that targets many repositories or many services at once.

This is the class closest to a large-scale internal platform.

Key characteristics

| Aspect | Characteristic |
| --- | --- |
| Execution | Many workers, many repositories |
| Context | Repo metadata, ownership, dependency inventory |
| Output | Many PRs, campaign report |
| Verification | Per-repo verifier + aggregate metrics |
| Permission | Org-level governance |
| Risk | Blast radius, reviewer overload, systemic bad patch |
| Strength | Massive maintenance leverage |

Good fit for

  • framework upgrade across services;
  • dependency vulnerability remediation;
  • config standardization;
  • deprecation removal;
  • code ownership metadata migration;
  • CI template migration;
  • organization-wide API adoption.

Not a good fit for

  • tasks not yet proven on a handful of repos;
  • migrations without a verifier;
  • migrations without a rollback/stop strategy;
  • large semantic changes that differ per domain;
  • changes that require manual release coordination across many teams.

Design implication

A fleet agent needs additional capabilities:

  • repository inventory;
  • targeting query;
  • dry-run mode;
  • canary batch;
  • prompt/verifier iteration;
  • per-team rate limit;
  • PR grouping;
  • duplicate detection;
  • campaign dashboard;
  • success/failure taxonomy;
  • automatic halt condition.

Mental model

A fleet agent is a migration platform. The LLM helps adapt to edge cases, while campaign governance keeps mistakes from spreading.


9. Type 7: PR Reviewer Agent

Definition

A PR reviewer agent does not primarily make changes. It reads the diff, runs analysis, leaves comments, or suggests patches.

Key characteristics

| Aspect | Characteristic |
| --- | --- |
| Execution | PR context, optional sandbox |
| Context | Diff + surrounding files + CI logs |
| Output | Review comments, suggestions |
| Verification | Static checks, optional test |
| Permission | Usually comment-only or suggestion-only |
| Risk | Noise, false positives, bad advice |
| Strength | Scales review support |

Good fit for

  • catching obvious bugs;
  • style/policy feedback;
  • test gap detection;
  • migration compliance;
  • summarizing large diff;
  • checking generated PRs.

Not a good fit for

  • acting as final authority;
  • replacing domain reviewer;
  • reviewing unclear business semantics alone;
  • blocking PR without deterministic policy.

Design implication

Reviewer agent should optimize for:

  • precision over recall;
  • actionable comments;
  • low noise;
  • citation to file/line;
  • severity classification;
  • deterministic checks for hard blocks;
  • explainable rationale.
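
A minimal sketch of these priorities, assuming the reviewer agent attaches a confidence score and a severity to each comment. The `ReviewComment` fields, the thresholds, and the `select_comments` function are hypothetical; the point is that low-confidence findings are dropped and only deterministic checks may hard-block.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewComment:
    path: str
    line: int
    severity: str        # "info" | "warning" | "blocker"
    confidence: float    # 0.0 to 1.0, as estimated by the reviewer agent
    deterministic: bool  # True if produced by a rule or check, not by the LLM
    message: str


def select_comments(comments, min_confidence=0.8, max_comments=5):
    """Keep high-confidence comments and let only deterministic checks block."""
    kept = []
    for c in comments:
        if c.confidence < min_confidence and not c.deterministic:
            continue  # precision over recall: drop uncertain LLM findings
        severity = c.severity
        if severity == "blocker" and not c.deterministic:
            severity = "warning"  # LLM opinions may warn but never hard-block
        kept.append(ReviewComment(c.path, c.line, severity, c.confidence,
                                  c.deterministic, c.message))
    order = {"blocker": 0, "warning": 1, "info": 2}
    kept.sort(key=lambda c: (order[c.severity], -c.confidence))
    return kept[:max_comments]  # cap volume to keep noise low


# Hypothetical sample findings.
findings = [
    ReviewComment("a.py", 10, "blocker", 0.9, False, "possible null dereference"),
    ReviewComment("b.py", 3, "blocker", 1.0, True, "secret committed"),
    ReviewComment("c.py", 7, "info", 0.4, False, "maybe rename this variable"),
]
for c in select_comments(findings):
    print(c.path, c.line, c.severity, c.message)
```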

Mental model

A PR reviewer agent is a review amplifier, not a merge authority.


10. Type 8: Deterministic Codemod Bot

Definition

A codemod bot performs deterministic transformations: AST rewrites, controlled regex, formatters, generators, or migration scripts.

Key characteristics

| Aspect | Characteristic |
| --- | --- |
| Execution | Script/rule-based |
| Context | AST/text/build metadata |
| Output | Deterministic patch |
| Verification | Test + rule checks |
| Permission | Programmatic |
| Risk | Rule bug affects many files |
| Strength | Fast, repeatable, cheap, predictable |

Good fit for

  • import rename;
  • annotation migration;
  • clear-cut method signature transforms;
  • package rename;
  • config key rename;
  • generated code update;
  • formatting.

Not a good fit for

  • tasks with a lot of semantic judgement;
  • ambiguous bug fixes;
  • unknown API usage patterns;
  • migrations that need domain-specific adaptation.

Design implication

Prefer a codemod bot whenever the transformation is clear. Do not use an LLM when an AST transform is enough.

Top engineers choose codemods for deterministic work and agents for adaptive work.

Mental model

A codemod bot is a compiler-like transformer. It is less flexible but more predictable.
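
To make "deterministic" concrete, here is a small sketch of an import-rename codemod. It uses a controlled regular expression over Java-style import lines; the function name and the sample package names are made up for illustration, and a production codemod would more likely work on a parsed AST.

```python
import re


def rename_import_package(source: str, old_pkg: str, new_pkg: str) -> str:
    """Rewrite Java-style import lines from old_pkg to new_pkg."""
    pattern = re.compile(
        # Match only at the start of an import statement, and only when the
        # package name is followed by a dot, so "com.old" never hits "com.older".
        r"^(\s*import\s+(?:static\s+)?)" + re.escape(old_pkg) + r"(?=\.)",
        flags=re.MULTILINE,
    )
    return pattern.sub(lambda m: m.group(1) + new_pkg, source)


# Hypothetical input, for illustration only.
before = (
    "import com.old.api.Client;\n"
    "import static com.old.api.Util.helper;\n"
    "import com.older.Thing;\n"
    "// uses com.old.api.Client in a comment\n"
    "class A {}\n"
)
after = rename_import_package(before, "com.old.api", "com.new.api")
print(after)
```

The same input always yields the same output, and applying the transform twice changes nothing further. Those two properties are what make a codemod cheap to verify at scale.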


11. Type 9: Hybrid Migration Agent

Definition

A hybrid migration agent combines a deterministic codemod with an LLM agent.

A common pattern:

  1. a deterministic codemod makes the safe 80% of the changes;
  2. build/test surfaces the edge cases;
  3. the LLM agent fixes the cases the codemod did not handle;
  4. a verifier confirms the result;
  5. a judge assesses the scope.

Key characteristics

| Aspect | Characteristic |
| --- | --- |
| Execution | Script + agent loop |
| Context | Rule output + verifier error + targeted files |
| Output | Patch/PR |
| Verification | Strongly required |
| Permission | Constrained to migration scope |
| Risk | Agent overcorrects codemod result |
| Strength | Best of deterministic and adaptive approaches |

Good fit for

  • framework migration;
  • dependency upgrade with breaking changes;
  • API deprecation cleanup;
  • test repair after codemod;
  • monorepo migration;
  • config migration with edge cases.

Not a good fit for

  • migration without clear invariant;
  • tasks where no verifier can detect correctness;
  • changes that need product decisions.

Design implication

A hybrid agent needs:

  • clear codemod output;
  • verifier feedback;
  • strict diff boundary;
  • agent repair scope;
  • before/after invariant;
  • fallback to manual review when repair fails.

Mental model

A hybrid migration agent is a codemod with adaptive repair. This is often safer than a pure autonomous agent.
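
The codemod-then-repair pattern can be sketched as a short control loop. Everything here is a stand-in: `hybrid_migrate` and the three callables are hypothetical, the repair function is where an LLM agent would sit, and the judge step is left out for brevity.

```python
from typing import Callable, List, Tuple

Verifier = Callable[[str], List[str]]        # returns a list of error messages
Repairer = Callable[[str, List[str]], str]   # returns a repaired source


def hybrid_migrate(source: str,
                   codemod: Callable[[str], str],
                   verify: Verifier,
                   repair: Repairer,
                   max_repairs: int = 2) -> Tuple[str, str]:
    """Run the codemod first, then bounded repair driven by verifier errors."""
    candidate = codemod(source)
    errors = verify(candidate)
    attempts = 0
    while errors and attempts < max_repairs:
        candidate = repair(candidate, errors)  # an LLM agent would sit here
        errors = verify(candidate)
        attempts += 1
    if errors:
        return "needs_manual_review", candidate  # fallback when repair fails
    return "verified", candidate


# Toy stand-ins so the sketch runs without a model or a build system.
toy_codemod = lambda s: s.replace("oldCall(", "newCall(")
toy_verify = lambda s: ["legacyFlag was removed"] if "legacyFlag" in s else []
toy_repair = lambda s, errs: s.replace(", legacyFlag", "")

print(hybrid_migrate("oldCall(x, legacyFlag)", toy_codemod, toy_verify, toy_repair))
```

The loop is bounded on purpose: when repair does not converge, the result is handed to a human instead of letting the agent keep rewriting the codemod output.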


12. Comparative matrix

| Agent type | Runs where | Writes code? | Runs commands? | Creates PR? | Best for | Main risk |
| --- | --- | --- | --- | --- | --- | --- |
| Chat assistant | Chat | No | No | No | Explanation, snippets | Wrong context |
| IDE assistant | IDE | With user accept | Rare/limited | No | Inline productivity | Local bad edit |
| CLI agent | Local terminal | Yes | Yes | Optional | Single-repo work | Local secret/shell risk |
| Cloud agent | Cloud sandbox | Yes | Yes | Yes | Async task PR | Credential/sandbox risk |
| Background agent | Managed service | Yes | Yes | Yes | Maintenance automation | Bad autonomous PRs |
| Fleet agent | Managed fleet | Yes | Yes | Many PRs | Org-wide migrations | Blast radius |
| PR reviewer | PR workflow | Usually no | Optional | No | Review assist | Noisy comments |
| Codemod bot | Script runner | Yes | Optional | Optional | Deterministic migration | Rule bug |
| Hybrid migration | Script + agent | Yes | Yes | Yes | Adaptive migration | Overrepair |

13. Taxonomy by autonomy level

Autonomy matters more than product name.

L0 — Suggestion only

Agent suggests code. Human applies.

Risk low, leverage limited.

L1 — User-accepted edit

Agent proposes edit inside IDE. Human accepts.

Good for local productivity.

L2 — Agent edits files

Agent writes to workspace. Need diff review.

L3 — Agent runs commands

Now risk increases sharply. Shell command can be dangerous. Need permissions, timeout, redaction.

L4 — Agent creates PR

Now output affects team workflow. Need PR convention, verifier, reviewer expectation.

L5 — Background multi-task agent

Now agent works without continuous human supervision. Need state machine, queue, audit.

L6 — Fleet campaign agent

Now one bad pattern can affect many repositories. Need rollout control, canary, halt, metrics, governance.

Our eventual target is L6, but we start by building an L2/L3 vertical slice.
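
One way to make the levels operational is to encode them and check which controls a platform still lacks before it moves up. The control names are taken from the level descriptions above; treating them as cumulative (each level inherits the controls of the levels below it) is an assumption of this sketch, as are the identifier names.

```python
from enum import IntEnum


class Autonomy(IntEnum):
    L0_SUGGESTION = 0
    L1_USER_ACCEPTED_EDIT = 1
    L2_EDITS_FILES = 2
    L3_RUNS_COMMANDS = 3
    L4_CREATES_PR = 4
    L5_BACKGROUND = 5
    L6_FLEET = 6


# Controls introduced at each level, following the level descriptions.
CONTROLS_ADDED = {
    Autonomy.L2_EDITS_FILES: {"diff_review"},
    Autonomy.L3_RUNS_COMMANDS: {"permissions", "timeout", "redaction"},
    Autonomy.L4_CREATES_PR: {"pr_convention", "verifier", "reviewer_expectation"},
    Autonomy.L5_BACKGROUND: {"state_machine", "queue", "audit"},
    Autonomy.L6_FLEET: {"rollout_control", "canary", "halt", "metrics", "governance"},
}


def required_controls(level: Autonomy) -> set:
    """Accumulate every control needed at or below the given level."""
    needed = set()
    for lvl, controls in CONTROLS_ADDED.items():
        if lvl <= level:
            needed |= controls
    return needed


def missing_controls(level: Autonomy, implemented: set) -> set:
    """Return the controls still absent for the requested autonomy level."""
    return required_controls(level) - implemented


have = {"diff_review", "permissions", "timeout", "redaction", "verifier"}
print(sorted(missing_controls(Autonomy.L4_CREATES_PR, have)))
```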


14. Taxonomy by artifact

Another useful classification: what does the agent produce?

| Artifact | Required rigor |
| --- | --- |
| Explanation | Cite reasoning, no execution needed |
| Snippet | Syntax check may be enough |
| Local diff | Git diff and local tests |
| Commit | Commit message, author policy |
| Pull request | PR body, verification evidence, reviewers |
| Review comment | Precision, line reference, severity |
| Campaign report | Aggregate metrics, failure taxonomy |
| Migration plan | Target selection, rollout, rollback |

A PR-producing agent has much higher responsibility than a snippet-producing assistant.


15. Taxonomy by execution trust

Execution location changes the risk model.

Local execution

Pros:

  • close to developer;
  • has existing environment;
  • fast iteration;
  • easy to inspect diff.

Cons:

  • secrets may exist locally;
  • command can damage workspace;
  • hard to centrally audit;
  • environment may not be reproducible.

Cloud sandbox

Pros:

  • isolated;
  • reproducible;
  • parallelizable;
  • suitable for async tasks.

Cons:

  • credential management hard;
  • network policy needed;
  • environment parity problem;
  • cost and quota management.

CI runner

Pros:

  • already tied to repo;
  • good for verification;
  • ephemeral;
  • familiar permission model.

Cons:

  • not designed for long agent loops;
  • expensive if abused;
  • hard for interactive context;
  • supply-chain risk still exists.

16. Taxonomy by context source

Context determines quality.

| Context source | Used by | Strength | Risk |
| --- | --- | --- | --- |
| User prompt | all | intent | ambiguity |
| Open file | IDE | local precision | narrow scope |
| Repo search | CLI/cloud/background | real code | too much noise |
| Build logs | CLI/cloud/background | concrete failure | huge/noisy logs |
| Issue/PR discussion | cloud/background | requirement context | stale or contradictory |
| Ownership metadata | background/fleet | reviewer routing | stale org data |
| Dependency inventory | fleet | targeting | incomplete data |
| Docs/ADR | advanced agents | design intent | outdated docs |
| MCP tools/resources | advanced agents | structured integration | tool trust boundary |

The more autonomous the agent, the more curated its context must be.

Chat assistant can ask user for clarification. Fleet agent cannot ask 500 teams for every ambiguity. It needs task contract, metadata, and stop conditions.


17. Taxonomy by verifier strength

Agent autonomy should not exceed verifier strength.

Guideline:

| Verifier level | Safe autonomy level |
| --- | --- |
| No verification | explanation/snippet only |
| Format/syntax | small local edits |
| Compile | simple refactor |
| Unit tests | PR for bounded change |
| Integration tests | moderate behavior change |
| Policy + semantic checks | background automation |
| Rollout signals | fleet/platform change |

If verifier is weak, autonomy must be low.

This is one of the most important rules in the whole series:

Do not increase autonomy without increasing verification.
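
The rule can be expressed as a small lookup that caps autonomy by the verifiers actually in place. The level names mirror the guideline table; the conservative choice that a stronger verifier only counts when every weaker one is also present is an assumption of this sketch, not something the guideline states.

```python
# Verifier levels from weakest to strongest.
VERIFIER_LEVELS = [
    "none", "format_syntax", "compile", "unit_tests",
    "integration_tests", "policy_semantic", "rollout_signals",
]

SAFE_AUTONOMY = {
    "none": "explanation/snippet only",
    "format_syntax": "small local edits",
    "compile": "simple refactor",
    "unit_tests": "PR for bounded change",
    "integration_tests": "moderate behavior change",
    "policy_semantic": "background automation",
    "rollout_signals": "fleet/platform change",
}


def max_safe_autonomy(available: set) -> str:
    """Return the autonomy allowed by the strongest unbroken chain of verifiers."""
    reached = "none"
    for level in VERIFIER_LEVELS[1:]:
        if level not in available:
            break  # a gap in the chain caps autonomy at the last level reached
        reached = level
    return SAFE_AUTONOMY[reached]


print(max_safe_autonomy({"format_syntax", "compile", "unit_tests"}))
```

With this check in front of the task scheduler, a request for background automation on a repository that only compiles is downgraded before any run starts.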


18. Taxonomy by failure mode

Each agent class fails differently.

| Agent type | Typical failure |
| --- | --- |
| Chat assistant | plausible but wrong answer |
| IDE assistant | bad inline completion accepted too quickly |
| CLI agent | destructive command, over-editing, local env mismatch |
| Cloud agent | sandbox missing dependency, wrong branch, PR noise |
| Background agent | weak task contract, hidden failure, bad PR artifact |
| Fleet agent | repeated wrong pattern across many repos |
| Reviewer agent | noisy false positive comments |
| Codemod bot | deterministic bug applied everywhere |
| Hybrid migration | agent repairs symptoms instead of root cause |

Failure modeling is not pessimism. It is architecture.


19. Where Honk-like fits

Our target system is not just a CLI agent and not just a cloud agent.

It is closer to:

Background Agent + Fleet Agent + Hybrid Migration Agent + PR Orchestrator

Meaning:

  • task can be triggered asynchronously;
  • agent runs in managed sandbox;
  • platform controls policy;
  • output is PR/report;
  • verifier is mandatory;
  • judge decides accept/repair/reject;
  • system can scale from one repo to many;
  • deterministic codemod can be combined with agent repair;
  • human review remains part of trust chain.


That means we should not optimize only for chat UX or local editing speed. We optimize for:

  • repeatable task execution;
  • safe automation;
  • evidence-based PR;
  • scalable rollout;
  • auditability;
  • low blast radius.

20. Choosing the right agent for a task

Use this decision table.

| Task | Best mechanism | Why |
| --- | --- | --- |
| “Explain this stack trace” | Chat assistant / CLI read-only | No code change needed |
| “Implement this helper function” | IDE / CLI agent | Local context enough |
| “Fix failing unit test in this repo” | CLI/cloud agent | Needs command feedback |
| “Upgrade this one dependency” | Cloud/background agent | Bounded PR workflow |
| “Replace deprecated API across 200 repos” | Fleet hybrid migration | Needs campaign control |
| “Rename import package across codebase” | Codemod bot | Deterministic transform |
| “Migrate API with varied call patterns” | Hybrid agent | Codemod + adaptive repair |
| “Review this PR for risky changes” | PR reviewer agent | Review artifact exists |
| “Refactor core domain architecture” | Human-led with agent assist | Ambiguity too high |
| “Change production behavior with weak tests” | Human-led, add tests first | Verifier too weak |

A strong engineer does not ask “can an LLM do this?” first. They ask:

  1. Is the change bounded?
  2. Is the desired outcome observable?
  3. Is verification strong enough?
  4. Is blast radius controlled?
  5. Is human review placed at the right point?

21. Anti-pattern: one agent to rule them all

Do not build a generic agent that can do anything across every repository with broad permissions.

That path leads to:

  • unpredictable behavior;
  • impossible debugging;
  • high token cost;
  • poor verifier fit;
  • broad security exposure;
  • reviewer distrust;
  • many abandoned PRs.

Better pattern:

Many narrow task modes + strong contracts + specific verifiers + controlled rollout

Examples:

  • dependency-upgrade-agent;
  • api-migration-agent;
  • test-repair-agent;
  • config-modernization-agent;
  • pr-review-agent;
  • build-fix-agent.

They can share the same platform, but each task mode should have its own:

  • prompt contract;
  • allowed paths;
  • tool permissions;
  • verifier pipeline;
  • judge criteria;
  • PR template;
  • risk level.
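
As a hedged sketch, a task mode can be a small immutable record, and the allowed-paths rule can be enforced by checking the agent's changed files against it. The `TaskMode` fields follow the list above only loosely, and the sample mode's glob patterns and permission names are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class TaskMode:
    name: str
    allowed_paths: tuple        # glob patterns the agent may modify
    tool_permissions: frozenset
    verifier_pipeline: tuple
    risk_level: str


def out_of_scope_files(mode: TaskMode, changed_files) -> list:
    """Return changed files that fall outside the mode's allowed paths."""
    return [f for f in changed_files
            if not any(fnmatchcase(f, pat) for pat in mode.allowed_paths)]


# Hypothetical contract for one narrow task mode.
DEPENDENCY_UPGRADE = TaskMode(
    name="dependency-upgrade-agent",
    allowed_paths=("pom.xml", "*/pom.xml", "build.gradle"),
    tool_permissions=frozenset({"read", "write_file", "run_build"}),
    verifier_pipeline=("compile", "unit_tests"),
    risk_level="low",
)

changed = ["pom.xml", "services/api/pom.xml", "src/main/java/App.java"]
print(out_of_scope_files(DEPENDENCY_UPGRADE, changed))
```

A non-empty result is a cheap, deterministic signal that the agent left its lane, so the run can be rejected before any reviewer sees the diff.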

22. Anti-pattern: treating fleet work as repeated single-repo work

A fleet campaign is not just many single-repo tasks.

Fleet work introduces new concerns:

| Concern | Why it appears at fleet scale |
| --- | --- |
| Target selection | Need to know which repos are affected |
| Batching | Avoid huge blast radius |
| Ownership | PRs need correct reviewers |
| Rate limit | Git provider/CI/model quotas |
| Pattern drift | Edge cases differ per repo |
| Metrics | Need aggregate success/failure view |
| Halt condition | Stop if failure pattern emerges |
| Reviewer load | Too many PRs creates org friction |
| Duplicate work | Teams may already be migrating manually |
Duplicate workTeams may already be migrating manually

So the correct mental model is:

single-repo agent = execution unit
fleet agent = campaign control system

The single-repo agent is a worker. The fleet system is the manager.
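
The worker/manager split can be sketched as a campaign loop that starts with a canary batch and halts when the failure rate in any batch crosses a threshold. The function name, the default sizes, and the 20% threshold are illustrative assumptions; `run_one` stands in for a complete single-repo agent run.

```python
from typing import Callable, Dict, List


def run_campaign(repos: List[str],
                 run_one: Callable[[str], bool],
                 canary_size: int = 3,
                 batch_size: int = 10,
                 max_failure_rate: float = 0.2) -> Dict[str, object]:
    """Roll out in batches, canary first, halting on a high failure rate."""
    results: Dict[str, bool] = {}
    start, size = 0, canary_size
    while start < len(repos):
        batch = repos[start:start + size]
        for repo in batch:
            results[repo] = run_one(repo)  # the single-repo worker
        failures = sum(1 for r in batch if not results[r])
        if failures / len(batch) > max_failure_rate:
            # Halt condition: stop before the bad pattern spreads further.
            return {"status": "halted", "results": results,
                    "not_attempted": repos[start + len(batch):]}
        start += len(batch)
        size = batch_size  # after the canary, move to regular batches
    return {"status": "completed", "results": results, "not_attempted": []}


# Toy worker that fails on exactly one repository.
repo_names = [f"repo-{i}" for i in range(8)]
outcome = run_campaign(repo_names, lambda r: r != "repo-1")
print(outcome["status"], len(outcome["not_attempted"]))
```

In the toy run the canary batch already exceeds the threshold, so five of the eight repositories are never touched. That containment is the point of running a canary first.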


23. Anti-pattern: using LLM where compiler already knows the answer

If compiler, type checker, AST, schema validator, or formatter can solve it deterministically, use them.

Examples:

  • format code → formatter;
  • sort imports → IDE/compiler tool;
  • rename symbol → language server/refactoring tool;
  • update OpenAPI generated client → generator;
  • validate JSON/YAML → schema validator;
  • find old dependency → dependency parser;
  • detect old import → grep/AST.

LLM should focus on adaptation and reasoning, not replace deterministic tools.

The best architecture runs deterministic tools first and reserves the LLM for the cases those tools cannot resolve.

This pattern is essential for production-grade migration.


24. Agent class vs architecture requirements

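| Requirement | Chat | IDE | CLI | Cloud | Background | Fleet |
| --- | --- | --- | --- | --- | --- | --- |
| Tool registry | Low | Medium | High | High | High | High |
| Sandbox | None | Low | Medium | High | High | High |
| Central policy | None | Low | Medium | High | High | Very High |
| Queue | None | None | Low | Medium | High | Very High |
| Verifier | Low | Medium | High | High | Very High | Very High |
| Judge | Low | Low | Medium | High | High | Very High |
| PR orchestration | None | Low | Medium | High | High | Very High |
| Observability | Low | Low | Medium | High | Very High | Very High |
| Audit | Low | Low | Medium | High | Very High | Very High |
| Rollout control | None | None | None | Low | Medium | Very High |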

This table tells us why Honk-like architecture is heavier. It needs more machinery because it takes more responsibility.


25. The practical build order

Given the taxonomy, we should not start with fleet campaign. We build layers.

Phase 1 — Local single-repo agent

Goal:

  • read/search/edit/run command;
  • produce diff;
  • run verifier;
  • generate report.

Phase 2 — Sandboxed agent

Goal:

  • isolate workspace;
  • control command;
  • enforce path policy;
  • capture logs.

Phase 3 — Verifier-driven repair

Goal:

  • run build/test;
  • summarize errors;
  • feed back to agent;
  • stop after bounded iterations.

Phase 4 — PR artifact

Goal:

  • generate branch/commit/PR body;
  • include verification evidence;
  • no auto-merge.

Phase 5 — Background orchestrator

Goal:

  • task API;
  • queue;
  • run state;
  • worker;
  • cancellation;
  • audit.

Phase 6 — Policy and governance

Goal:

  • allowed paths;
  • permissions;
  • sandbox profiles;
  • budgets;
  • team rules.

Phase 7 — Fleet campaign

Goal:

  • target selection;
  • batching;
  • rollout metrics;
  • halt condition;
  • many PRs safely.

26. Design rule: autonomy must match evidence

This rule deserves repetition because it prevents bad platforms.

| If you have... | You may allow... |
| --- | --- |
| no verifier | explanation only |
| syntax verifier | small generated snippet |
| compile verifier | constrained code edit |
| test verifier | PR proposal |
| policy + test verifier | background PR creation |
| fleet metrics + canary | multi-repo rollout |

Do not build a fleet agent on top of weak tests and vibes.


27. Design rule: permission must match execution location

Permission that is acceptable locally may be unacceptable in cloud.

Example:

Local developer runs: mvn test

Usually acceptable.

But a background agent that runs mvn test across arbitrary untrusted repositories on shared infrastructure needs:

  • container isolation;
  • network control;
  • CPU/memory limit;
  • artifact redaction;
  • dependency cache policy;
  • no broad secret exposure.

The same command has different risk depending on where it runs.
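
To illustrate, here is a sketch of how a background worker might wrap the same command in a container with isolation and resource limits. It only builds the `docker run` argument list and executes nothing. The image name is a placeholder, the limit values are arbitrary, and with networking disabled the build would need its dependencies to be cached inside the image or workspace already.

```python
from typing import List


def sandboxed_argv(workspace: str, command: List[str],
                   image: str = "example/build-image:latest",  # hypothetical image
                   cpus: str = "2", memory: str = "2g",
                   allow_network: bool = False) -> List[str]:
    """Build a `docker run` argument list with isolation and resource limits."""
    argv = ["docker", "run", "--rm",
            "--cpus", cpus,            # CPU limit
            "--memory", memory,        # memory limit
            "--pids-limit", "512",     # cap on the number of processes
            "--cap-drop", "ALL",       # drop Linux capabilities
            "-v", f"{workspace}:/workspace",
            "-w", "/workspace"]
    if not allow_network:
        argv += ["--network", "none"]  # network control: no outbound access
    # No -e/--env flags on purpose: host secrets are never forwarded.
    return argv + [image] + command


print(" ".join(sandboxed_argv("/tmp/run-1", ["mvn", "test"])))
```

Locally the developer types `mvn test`; in the platform the same two words are only the tail of a much longer, policy-controlled invocation.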


28. Design rule: artifact determines workflow

If output is a chat answer, no PR process needed.

If output is a PR, then you need:

  • branch naming;
  • commit message;
  • PR body;
  • labels;
  • reviewers;
  • CI;
  • run link;
  • verification evidence;
  • review response workflow.

If output is 300 PRs, you need campaign governance.

Artifact drives architecture.
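
A small sketch of two items from the PR checklist, branch naming and a PR body that carries verification evidence. The `agent/<mode>/<run-id>` naming scheme, the body layout, and the sample run URL are assumptions for illustration, not an established convention.

```python
import re


def branch_name(task_mode: str, run_id: str) -> str:
    """Derive a predictable branch name from the task mode and run id."""
    slug = re.sub(r"[^a-z0-9]+", "-", task_mode.lower()).strip("-")
    return f"agent/{slug}/{run_id}"


def build_pr_body(task: str, run_url: str, checks: dict) -> str:
    """Render a PR description that includes the run link and check results."""
    lines = [f"Task: {task}", f"Run: {run_url}", "", "Verification:"]
    for name, passed in checks.items():
        lines.append(f"  [{'x' if passed else ' '}] {name}")
    verdict = "all checks passed" if all(checks.values()) else "some checks failed"
    lines += ["", f"Summary: {verdict}",
              "Merge: requires human review (no auto-merge)"]
    return "\n".join(lines)


print(branch_name("Dependency Upgrade", "run-42"))
print(build_pr_body("Upgrade library X",
                    "https://example.invalid/runs/42",  # placeholder URL
                    {"unit tests": True, "integration tests": False}))
```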


29. The taxonomy we will use in this series

For this series, every component will be designed with the following target class:

targetAgentClass:
  interactionMode: background
  executionLocation: sandboxed_worker
  scope: single_repo_first_then_fleet
  autonomy: create_reviewable_pr_not_auto_merge
  permissions:
    readRepository: true
    writeWorkspace: true
    runCommands: restricted
    network: restricted
    secrets: minimal_ephemeral
    pushBranch: controlled
  context:
    taskContract: required
    repositoryMap: required
    verifierFeedback: required
    platformMetadata: optional_then_required
  output:
    - diff
    - verificationReport
    - judgeVerdict
    - pullRequest
  governance:
    policyEngine: required
    auditTrail: required
    humanReview: required

This is the backbone for the rest of the course.


30. Summary

The taxonomy gives us a precise target.

We are not building:

  • a pure chatbot;
  • a simple autocomplete;
  • a toy script that writes files;
  • an uncontrolled terminal agent;
  • a blind PR bot;
  • a fleet campaign without rollout control.

We are building:

A sandboxed, verifier-driven, policy-constrained, background coding agent platform that starts with one repository and can evolve into fleet-wide code change automation.

The key conclusion:

Agent type determines architecture.
Architecture determines safety.
Safety determines whether humans trust the agent.
Trust determines whether the system survives production use.

31. What comes next in Part 005

Part 005 moves into the problem domain: code change automation.

We will dissect:

  • why automated code changes are hard;
  • which kinds of changes are safe and which are dangerous;
  • why “compile pass” is not enough;
  • how developer trust gets broken;
  • how a PR agent can turn into a noise generator;
  • how to pick realistic first use cases;
  • how to build a system that delivers leverage without damaging the codebase.