Learn Ai Coding Agent Part 009 End To End Reference Flow

title: Learn AI Coding Agent From Scratch - Part 009
description: "End-to-end reference flow for a Honk-like AI coding agent: from task intake, normalization, policy gate, repository preparation, planning, agent loop, patching, verification, judge, PR creation, and human review through to learning feedback."
series: learn-ai-coding-agent
seriesTitle: Learn AI Coding Agent From Scratch
order: 9
partTitle: End-to-End Reference Flow
tags:
- ai-coding-agent
- architecture
- workflow
- orchestration
- verifier
- pull-request
- state-machine
date: 2026-07-03

Part 009 — End-to-End Reference Flow

We already have the system boundaries, taxonomy, requirements, invariants, and threat model. Now we move on to the complete flow.

The core question of this part:

When a user says “migrate all services from library A to library B”, what actually happens inside a Honk-like AI coding agent before a pull request finally appears?

A poor answer usually sounds like this:

Send a prompt to the LLM, let the LLM edit files, run the tests, then open a PR.

That answer is too short and dangerous. A production-grade system must not be built as “an LLM with a shell”. What we are building is a controlled code-change pipeline:

Task request
  -> normalized task contract
  -> policy decision
  -> repository snapshot
  -> sandbox run
  -> planning
  -> tool-mediated edits
  -> verification feedback
  -> judge decision
  -> branch + commit + PR
  -> human review
  -> telemetry + learning artifact

Relevant factual references:

Spotify Engineering describes Honk as a background coding agent for large-scale software maintenance and PR workflows; they also emphasize feedback loops such as verifiers, judges, and MCP-based integration.
https://engineering.atspotify.com/2025/11/spotifys-background-coding-agent-part-1
https://engineering.atspotify.com/2025/12/feedback-loops-background-coding-agents-part-3
OpenAI Codex cloud describes a coding agent that can work in the background, in parallel, and run tasks in its own cloud environment.
https://developers.openai.com/codex/cloud
The MCP specification separates tools, resources, and prompts as mechanisms for integrating models with external systems.
https://modelcontextprotocol.io/specification/2025-06-18
The Claude Code documentation positions the agent as a tool that works with commands, permissions, memory, MCP, and repository context.
https://docs.anthropic.com/en/docs/claude-code/overview

This part does not yet cover full implementation details. We are building a reference flow that will later be translated into a domain model, API, database schema, worker, sandbox, tool runtime, verifier, and PR orchestrator.

1. Core principle: the flow must be deterministic outside the LLM

The LLM may help with reasoning and editing, but the system around it must be as deterministic as possible.

That means:

who may create a task is determined by policy;
which repos may be touched is determined by the target selector;
which branch gets created is determined by the branch strategy;
which commands may run is determined by the sandbox policy;
which changes may leave the sandbox is determined by the diff policy;
which results may become a PR is determined by the verifier and judge;
when human approval is required is determined by the risk policy.

The LLM must not be the sole source of truth for:

permissions;
success criteria;
safety;
scope;
readiness;
mergeability;
production risk.

The mental model:

LLM = heuristic worker inside a controlled change system.
System = source of authority.

In engineering terms:

Agent runtime produces proposals.
Control plane decides lifecycle.
Verifier produces evidence.
Judge evaluates evidence.
Human reviewer retains authority for high-risk changes.

2. The complete end-to-end flow

[Diagram: the complete end-to-end flow]

The key to this diagram is not the number of steps. The key is that every phase has an input, an output, an authority, and failure semantics.

If a phase has no failure semantics, the agent ends up in a gray area:

It is unclear whether to retry, stop, escalate, or open a draft PR.

This is why many agents feel “smart” yet cannot be trusted.

3. Stage 1 — Task submitted

A task is a change request. Do not start from a prompt. Start from a task contract.

Example of a poor input:

Please upgrade the auth library.

Example of an input that can be normalized:

taskType: dependency_upgrade
repository: payments-service
baseBranch: main
objective: Upgrade com.example:auth-client from 2.x to 3.x
scope:
  include:
    - pom.xml
    - src/main/java/**
    - src/test/java/**
  exclude:
    - database/migrations/**
    - deployment/**
constraints:
  - Keep public REST API backward compatible
  - Do not modify generated sources
  - Preserve existing authentication behavior
successCriteria:
  - mvn -q test passes
  - no forbidden path modified
  - PR summary explains migration decisions
riskLevel: medium

The system may accept natural language, but it must immediately convert it into a stricter form.

Minimal task contract:

Field	Purpose
`taskType`	Selects the strategy and default verifiers
`repository`	Target of the change
`baseBranch`	Starting snapshot
`objective`	Verifiable goal of the change
`scope.include`	Areas that may be touched
`scope.exclude`	Areas that must not be touched
`constraints`	Limits on agent behavior
`successCriteria`	Evidence that must be met
`riskLevel`	Determines the autonomy level
`requestedBy`	Audit and permission

Invariant:

No run without a normalized task contract.

Not because natural language is always bad, but because it is too loose to serve as an audit artifact.
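The field table above can be sketched as an immutable record that refuses to exist in an invalid state. This is only an illustration under stated assumptions: the class name `TaskContract`, the snake_case field names, and the set of allowed risk levels are hypothetical choices, not part of the series' actual code.

```python
from dataclasses import dataclass

# Assumed vocabulary; the lesson's policy matrix uses Low/Medium/High/Critical.
RISK_LEVELS = ("low", "medium", "high", "critical")


@dataclass(frozen=True)
class TaskContract:
    """Hypothetical in-memory form of the normalized task contract."""
    task_type: str
    repository: str
    base_branch: str
    objective: str
    include: tuple[str, ...]           # scope.include
    success_criteria: tuple[str, ...]
    risk_level: str
    requested_by: str
    exclude: tuple[str, ...] = ()      # scope.exclude
    constraints: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Reject contracts that cannot serve as an audit artifact.
        if self.risk_level not in RISK_LEVELS:
            raise ValueError(f"unknown risk level: {self.risk_level}")
        if not self.include:
            raise ValueError("scope.include must list at least one path")
        if not self.success_criteria:
            raise ValueError("at least one success criterion is required")
```

Because the dataclass is frozen, a contract cannot be mutated after validation, which keeps the stored artifact and the object used by the run in agreement.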

4. Stage 2 — Task normalization

Task normalization converts the user's request into an operational format.

The normalizer's responsibilities:

classify the task type;
extract the target repo and branch;
determine the change scope;
infer the default verifiers;
identify risks;
ask for clarification only if the task cannot be made safe;
produce a task contract that can be persisted.

Example normalizer output:

{
  "taskType": "api_migration",
  "autonomyMode": "supervised_pr",
  "riskLevel": "medium",
  "targetRepositories": ["billing-service"],
  "baseBranch": "main",
  "allowedPaths": ["src/**", "pom.xml"],
  "blockedPaths": ["infra/**", "secrets/**", ".github/workflows/**"],
  "verifiers": ["format", "compile", "unit_test"],
  "requiresHumanApprovalBeforePR": false,
  "requiresHumanApprovalBeforeMerge": true
}

Note: this output is not a prompt. It is input to a control-plane decision.

The LLM can help with normalization, but the result must be validated deterministically:

the repo must exist;
the branch must exist;
the requester must have permission;
blocked paths must not unintentionally overlap with allowed paths;
the task type must be known;
the verifiers must be available;
the autonomy mode must be valid.

If the normalizer fails, the system must not proceed to the sandbox.
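A minimal sketch of that deterministic validation step, assuming the normalizer output is a plain dictionary shaped like the JSON example above. The lookup sets (`KNOWN_TASK_TYPES`, `AUTONOMY_MODES`, `AVAILABLE_VERIFIERS`) and the `branches_by_repo` registry are hypothetical stand-ins for real catalog lookups; the requester permission check is left out, and path overlap is only detected for identical patterns.

```python
# Hypothetical registries; a real system would query its own catalogs.
KNOWN_TASK_TYPES = {"dependency_upgrade", "api_migration"}
AUTONOMY_MODES = {"autonomous_pr", "supervised_pr", "draft_only", "analysis_only"}
AVAILABLE_VERIFIERS = {"format", "compile", "unit_test"}


def validate_normalized(task: dict, branches_by_repo: dict) -> list:
    """Return a list of validation errors; an empty list means the task may proceed."""
    errors = []
    if task.get("taskType") not in KNOWN_TASK_TYPES:
        errors.append(f"unknown taskType: {task.get('taskType')}")
    if task.get("autonomyMode") not in AUTONOMY_MODES:
        errors.append(f"invalid autonomyMode: {task.get('autonomyMode')}")
    for repo in task.get("targetRepositories", []):
        branches = branches_by_repo.get(repo)
        if branches is None:
            errors.append(f"unknown repository: {repo}")
        elif task.get("baseBranch") not in branches:
            errors.append(f"unknown branch {task.get('baseBranch')} in {repo}")
    for name in sorted(set(task.get("verifiers", [])) - AVAILABLE_VERIFIERS):
        errors.append(f"unavailable verifier: {name}")
    # Simplification: only exact pattern collisions are detected here.
    overlap = set(task.get("allowedPaths", [])) & set(task.get("blockedPaths", []))
    for pattern in sorted(overlap):
        errors.append(f"path both allowed and blocked: {pattern}")
    return errors
```

Returning every error at once, rather than failing on the first, gives the requester one complete rejection message instead of a series of round trips.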

5. Stage 3 — Policy and risk gate

The policy gate answers:

Is this task allowed to run?
With which autonomy mode?
In which environment?
With which tool permissions?
At which points is approval required?

Example policy matrix:

Risk	Example	Allowed mode	PR allowed?	Human gate
Low	format-only, docs typo	autonomous_pr	yes	after PR
Medium	dependency upgrade minor	supervised_pr	yes	before merge
High	auth flow change	draft_only	draft only	before PR and merge
Critical	secret handling, crypto, payment semantics	analysis_only or blocked	no	explicit architecture review

The policy gate must not accept an LLM answer such as:

This looks safe.

It needs evidence:

task type;
repo sensitivity;
path sensitivity;
dependency criticality;
service ownership;
blast radius;
compliance tag;
production tier;
historical failure data;
user role.

Example decision record:

policyDecision:
  decision: allowed
  autonomyMode: supervised_pr
  reason:
    - task type dependency_upgrade is permitted
    - repository is non-critical tier 2
    - blocked paths exclude deployment and secrets
    - requester is repository maintainer
  requiredGates:
    - verifier_pass
    - diff_judge_pass
    - human_review_before_merge

Invariant:

Agent runtime receives capabilities, not trust.

So the agent is not given full access because “the task looks simple”. It is given restricted capabilities.
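As a rough illustration, the policy matrix can be reduced to a lookup plus a few hard checks. The function name `decide` and its two boolean inputs are hypothetical simplifications of the evidence list above; a real gate would consume many more signals (repo sensitivity, production tier, compliance tags, and so on).

```python
# Risk level -> autonomy mode, following the policy matrix in this section.
MODE_BY_RISK = {
    "low": "autonomous_pr",
    "medium": "supervised_pr",
    "high": "draft_only",
    "critical": "analysis_only",
}


def decide(risk_level: str, requester_can_write: bool, touches_blocked_path: bool) -> dict:
    """Produce a decision record from evidence, never from an LLM opinion."""
    reasons = []
    if not requester_can_write:
        reasons.append("requester lacks write permission")
    if touches_blocked_path:
        reasons.append("scope overlaps a blocked path")
    if risk_level not in MODE_BY_RISK:
        reasons.append(f"unknown risk level: {risk_level}")
    if reasons:
        return {"decision": "blocked", "autonomyMode": None, "reason": reasons}
    mode = MODE_BY_RISK[risk_level]
    return {
        "decision": "allowed",
        "autonomyMode": mode,
        "reason": [f"risk level {risk_level} maps to {mode}"],
    }
```

The returned dictionary mirrors the `policyDecision` record: every outcome carries its reasons, so the decision can be stored and audited.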

6. Stage 4 — Target repository resolution

Once policy allows the task, the system must make sure the target is correct.

For a single repo:

repository = github.com/org/payments-service
baseBranch = main
commit = 4f6a...c29

For a fleet change:

selector:
  organization: platform
  repoQuery:
    language: java
    hasFile: pom.xml
    containsDependency: com.example:auth-client
    excludes:
      - archived repos
      - deprecated services
      - critical payment gateway
batching:
  maxReposPerBatch: 20
  stopOnFailureRateAbove: 0.25

Target resolution produces an immutable snapshot:

target:
  repo: payments-service
  baseBranch: main
  baseCommit: 4f6a921...
  resolvedAt: 2026-07-03T13:00:00+07:00

Why does the base commit matter?

Because the main branch moves. If a run fails and is replayed tomorrow, we must know which snapshot the agent worked from.

Invariant:

Every run must bind to an immutable base commit.
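For fleet changes, the `batching` settings in the selector above (`maxReposPerBatch`, `stopOnFailureRateAbove`) imply a loop roughly like the following sketch. It assumes the failure rate is cumulative and is checked at batch boundaries; the source does not specify either detail, and `run_one` is a hypothetical callback that returns `True` on success.

```python
from typing import Callable, Sequence


def run_in_batches(
    repos: Sequence[str],
    run_one: Callable[[str], bool],
    max_per_batch: int = 20,
    stop_above: float = 0.25,
) -> dict:
    """Process repos batch by batch and halt once too many runs have failed."""
    processed = 0
    failed = 0
    for start in range(0, len(repos), max_per_batch):
        for repo in repos[start:start + max_per_batch]:
            processed += 1
            if not run_one(repo):
                failed += 1
        # Assumed semantics: cumulative failure rate, evaluated after each batch.
        if failed / processed > stop_above:
            return {"processed": processed, "failed": failed, "stopped": True}
    return {"processed": processed, "failed": failed, "stopped": False}
```

Stopping at a batch boundary limits the blast radius: a bad migration recipe affects at most one batch before the rollout halts.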

7. Stage 5 — Create run record

The run record is created before execution.

Minimal fields:

run:
  id: run_01J...
  taskId: task_01J...
  repository: payments-service
  baseCommit: 4f6a921...
  state: preparing
  autonomyMode: supervised_pr
  policyDecisionId: pol_01J...
  createdAt: 2026-07-03T13:01:00+07:00

The run record is the primary ledger. Every step, tool call, artifact, verifier result, patch, and piece of PR metadata attaches to this run.

Why must it be created before the worker starts?

Because the worker can die. Without a run record, we do not know:

which task is being processed;
which sandbox was created;
how much was spent on tokens;
which commands have already run;
which artifacts were lost;
whether the run is safe to retry.

Rule:

No execution without durable run identity.

8. Stage 6 — Prepare sandbox

The sandbox is not an ops detail. It is a trust boundary.

The sandbox provides:

filesystem workspace;
repository checkout;
controlled dependency cache;
network policy;
CPU/memory/time limit;
secret policy;
command policy;
artifact export path;
log capture;
cleanup strategy.

Example sandbox policy:

sandbox:
  image: ghcr.io/company/agent-java-runner:17
  cpu: 4
  memory: 8Gi
  timeoutMinutes: 45
  network:
    mode: restricted
    allowedHosts:
      - repo.maven.apache.org
      - internal-artifact-proxy.company.test
  secrets:
    mount: none
  filesystem:
    writable:
      - /workspace
    readonly:
      - /tools
  commands:
    allow:
      - git
      - mvn
      - java
      - rg
      - sed
      - awk
    deny:
      - curl
      - wget
      - ssh
      - nc
      - rm -rf /

In production, this policy must not live only in the prompt. It must be enforced at runtime.

Wrong:

Prompt: please do not access the network.

Right:

Network namespace blocks unknown egress.

An LLM instruction is a soft control. The sandbox is a hard control.
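One layer of that runtime enforcement is checking each command in the tool runtime before it is executed. The sketch below is an assumption about how the `commands.allow` list could be applied; it is an application-level check only and does not replace OS-level isolation such as network namespaces or filesystem mounts. The metacharacter filter is deliberately crude and would also reject legitimate arguments containing `$`.

```python
import shlex

# Mirrors the commands.allow list from the sandbox policy example.
ALLOWED_BINARIES = {"git", "mvn", "java", "rg", "sed", "awk"}
# Rejecting these keeps an allowed binary from chaining into a denied one.
SHELL_METACHARACTERS = set(";|&`$<>\n")


def is_command_allowed(command_line: str) -> bool:
    """Allow a command only if it is a single invocation of an allowlisted binary."""
    if any(ch in SHELL_METACHARACTERS for ch in command_line):
        return False
    try:
        argv = shlex.split(command_line)
    except ValueError:  # e.g. unbalanced quotes
        return False
    return bool(argv) and argv[0] in ALLOWED_BINARIES
```

A runtime built this way would then execute the parsed argument list directly rather than passing the string to a shell, so the check and the execution see the same tokens.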

9. Stage 7 — Repository checkout

The worker checks out the base commit.

Flow:

clone repo
checkout base commit
create worktree or branch
validate clean status
load repository instructions
compute initial repository map

Example initial branch:

agent/task-01j9-auth-client-upgrade

Before the agent reads any file, the system needs to run a baseline scan:

file count;
language distribution;
build files;
test layout;
generated directories;
protected paths;
repo instruction files such as AGENTS.md, CLAUDE.md, or an internal policy file;
dependency manifests;
known verifier commands.

Output of this stage:

repositorySnapshot:
  baseCommit: 4f6a921
  branch: agent/task-01j9-auth-client-upgrade
  buildSystem: maven
  languages:
    java: 0.82
    xml: 0.12
    yaml: 0.06
  protectedPaths:
    - .github/workflows/**
    - infra/**
    - secrets/**
  detectedCommands:
    compile: mvn -q -DskipTests compile
    test: mvn -q test

10. Stage 8 — Context collection

The agent must not read the entire repository blindly.

Context collection aims to give the agent enough context to get started without filling the context window with noise.

The initial context usually includes:

task contract;
policy constraints;
repository summary;
relevant files;
build files;
dependency manifests;
error logs if this is a repair task;
ownership/rules;
examples from the repo;
prior migration guide;
verifier commands.

Example context packet:

contextPacket:
  objective: Upgrade auth-client 2.x to 3.x
  relevantFiles:
    - pom.xml
    - src/main/java/com/acme/payment/security/AuthClientFactory.java
    - src/test/java/com/acme/payment/security/AuthClientFactoryTest.java
  repositoryRules:
    - Do not edit generated files
    - Use constructor injection
    - Prefer AssertJ in tests
  verifierCommands:
    - mvn -q -DskipTests compile
    - mvn -q test

A common mistake:

Put every important file into the initial prompt.

Better:

Start with a small map, provide search/read tools, and let the agent request additional files as needed.

This makes the agent more token-efficient, more traceable, and easier to control.

11. Stage 9 — Planning

The planning layer turns the task into work steps.

Example plan:

Plan:
1. Inspect current dependency version and usages of AuthClient.
2. Read migration notes for auth-client 3.x if available.
3. Update pom.xml dependency version.
4. Compile to discover breaking changes.
5. Update affected call sites.
6. Add or update tests for changed behavior.
7. Run unit tests.
8. Produce PR summary with changed files and verification evidence.

The plan must be treated as an artifact, not just private reasoning.

Why?

Because the plan is used for:

review trace;
debugging failed runs;
judge evaluation;
cost estimation;
stop condition;
learning dataset;
human approval.

However, the plan must not be trusted too much. The agent may change the plan when the evidence changes, but every plan change must be recorded.

Rule:

Plan is mutable, but plan changes are auditable.

12. Stage 10 — Agentic edit loop

This is the part that is most often misunderstood.

The agentic loop is not “the LLM writes all the code in one shot”. It is a loop:

observe -> decide -> tool call -> inspect result -> update plan -> continue

[Diagram: the agentic edit loop]

Tool calls must be typed and logged.

Example tool call event:

{
  "runId": "run_01J",
  "step": 17,
  "tool": "file.patch",
  "input": {
    "path": "src/main/java/com/acme/payment/security/AuthClientFactory.java",
    "patchType": "unified_diff"
  },
  "result": {
    "status": "applied",
    "linesAdded": 12,
    "linesRemoved": 8
  }
}

The loop must have limits:

max steps;
max tool calls;
max command duration;
max modified files;
max diff size;
max verifier retries;
max cost;
blocked paths.

Without limits, the agent can keep fixing new errors that it introduced itself.

A healthy stop condition:

Stop when objective is satisfied and verifier evidence is sufficient.

A poor stop condition:

Stop when model says done.
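The limits and the stop condition can be combined into one small decision function. This is a sketch under assumptions: `LoopBudget`, `stop_reason`, and the three tracked limits are hypothetical names, and only a subset of the limits listed above is modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopBudget:
    """Tracks a subset of the loop limits (steps, modified files, cost)."""
    max_steps: int
    max_modified_files: int
    max_cost_usd: float
    steps: int = 0
    cost_usd: float = 0.0

    def record_step(self, cost_usd: float) -> None:
        self.steps += 1
        self.cost_usd += cost_usd

    def exhausted(self, modified_files: int) -> Optional[str]:
        if self.steps >= self.max_steps:
            return "max_steps_reached"
        if modified_files > self.max_modified_files:
            return "max_modified_files_exceeded"
        if self.cost_usd >= self.max_cost_usd:
            return "max_cost_reached"
        return None


def stop_reason(budget: LoopBudget, modified_files: int,
                model_says_done: bool, verifier_passed: bool) -> Optional[str]:
    """Return why the loop should stop, or None to continue."""
    # The model's claim only counts when verifier evidence backs it up.
    if model_says_done and verifier_passed:
        return "objective_verified"
    return budget.exhausted(modified_files)
```

Note that `model_says_done` alone never ends the loop: without verifier evidence the loop continues until a budget limit forces a stop.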

13. Stage 11 — Patch candidate

A patch candidate is the state of the workspace after the agent finishes editing and before the final verifier and PR.

The patch candidate must be exported as artifacts:

unified diff;
changed file list;
added/deleted line counts;
touched modules;
dependency changes;
generated files;
lockfile changes;
test changes;
command history;
explanation draft.

Example summary:

patchCandidate:
  changedFiles:
    - pom.xml
    - src/main/java/com/acme/payment/security/AuthClientFactory.java
    - src/test/java/com/acme/payment/security/AuthClientFactoryTest.java
  stats:
    filesChanged: 3
    linesAdded: 34
    linesRemoved: 19
  categories:
    - dependency_manifest
    - production_code
    - test_code
  forbiddenPathTouched: false

A patch candidate is not necessarily safe. It is only a proposal.

Invariant:

Patch is not accepted until policy, verifier, and judge pass.
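The `forbiddenPathTouched` flag in the summary can be derived deterministically from the changed file list. A minimal sketch, assuming glob-style patterns like those in the earlier examples; note that Python's `fnmatch` lets `*` match across `/`, so `src/**` here simply means "anything under `src/`". The function name and the violation labels are hypothetical.

```python
from fnmatch import fnmatchcase


def diff_policy_violations(changed_files, allowed, blocked):
    """Return (path, kind) pairs for files the patch was not permitted to touch."""
    violations = []
    for path in changed_files:
        if any(fnmatchcase(path, pattern) for pattern in blocked):
            violations.append((path, "blocked"))
        elif not any(fnmatchcase(path, pattern) for pattern in allowed):
            violations.append((path, "out_of_scope"))
    return violations
```

An empty result corresponds to `forbiddenPathTouched: false`; anything else should fail the diff policy check before any verifier command runs.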

14. Stage 12 — Verification loop

The verifier turns a subjective question into evidence.

The subjective question:

Is this change correct?

It is broken down into checks:

Is the formatting valid?
Does it compile?
Do the unit tests pass?
Are forbidden paths untouched?
Is the dependency lock consistent?
Is the secret scan clean?
Is the public API unchanged?

Example verifier chain:

verifiers:
  - name: diff_policy
    command: internal
    required: true
  - name: format
    command: mvn -q spotless:check
    required: false
  - name: compile
    command: mvn -q -DskipTests compile
    required: true
  - name: unit_test
    command: mvn -q test
    required: true
  - name: secret_scan
    command: internal
    required: true

Verification result:

verification:
  status: failed
  failedChecks:
    - name: compile
      summary: Constructor AuthClientConfig(String) no longer exists
      repairHint: AuthClientConfig now requires builder API
  repairable: true

If the failure is repairable, the verifier output goes back into the agent loop.

Key design point:

Verifier feedback must be short enough for LLM, complete enough for repair, and linked to raw logs for audit.

Do not send 20,000 lines of raw Maven log to the LLM. Summarize the relevant errors, include the file/line/symbol, and store the full log as an artifact.
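A very small version of that summarization step might look like the sketch below. It assumes error lines carry Maven's `[ERROR]` prefix and keeps only unique ones up to a cap; the function name, the output keys, and the `artifact_ref` parameter are hypothetical, and a real summarizer would also parse file, line, and symbol into structured fields.

```python
def summarize_log(raw_log: str, artifact_ref: str, max_lines: int = 20) -> dict:
    """Condense a build log into deduplicated error lines plus a pointer to the raw log."""
    errors = []
    for line in raw_log.splitlines():
        line = line.strip()
        # Keep each distinct error once; repeated lines add tokens but no information.
        if line.startswith("[ERROR]") and line not in errors:
            errors.append(line)
    return {
        "errors": errors[:max_lines],
        "truncated": len(errors) > max_lines,
        "rawLog": artifact_ref,  # audit link to the full, unmodified log
    }
```

The `rawLog` reference is what keeps the short feedback auditable: the LLM sees the summary, while reviewers can still open the complete log.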

15. Stage 13 — Diff review judge

The verifier proves that commands pass. The judge evaluates whether the change makes sense for the task.

The verifier can be green while the diff is wrong.

Examples:

the agent deletes a failing test;
the agent loosens an assertion too much;
the agent changes the public API without being asked;
the agent modifies infra files;
the agent adds an unmaintainable workaround;
the agent changes business behavior to make a test pass.

The judge examines:

Dimension	Question
Scope	Does the diff touch only the allowed areas?
Intent alignment	Does the change match the objective?
Minimality	Is the change free of unnecessary edits?
Test integrity	Are tests strengthened rather than weakened?
Safety	Are there secrets, unsafe commands, or suspicious code?
Maintainability	Is the solution readable and idiomatic?
Evidence	Do the verifiers sufficiently prove the change?

The judge can be deterministic, LLM-based, or hybrid.

Example hybrid judge:

judge:
  deterministicChecks:
    - forbidden_path_check
    - deleted_test_check
    - public_api_diff_check
    - secret_scan
  llmChecks:
    - intent_alignment
    - maintainability
    - pr_summary_quality
  finalVerdict: accept

Important:

LLM judge is not a security boundary.

Use an LLM judge for semantic review assistance, not as a replacement for hard policy.
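One way to encode that separation is to let deterministic checks veto outright, while LLM checks can only downgrade the result to human review. The sketch below is an assumption, not the series' actual judge: the numeric scores, the `0.7` threshold, and the `needs_human_review` verdict are hypothetical.

```python
def final_verdict(deterministic: dict, llm_scores: dict, threshold: float = 0.7) -> dict:
    """Combine hard checks (name -> passed) with advisory LLM scores (name -> 0..1)."""
    failed = sorted(name for name, passed in deterministic.items() if not passed)
    if failed:
        # Hard policy: no LLM score can override a failed deterministic check.
        return {"verdict": "reject", "reasons": failed}
    weak = sorted(name for name, score in llm_scores.items() if score < threshold)
    if weak:
        # Advisory only: low semantic scores escalate instead of rejecting.
        return {"verdict": "needs_human_review", "reasons": weak}
    return {"verdict": "accept", "reasons": []}
```

With this shape, a compromised or mistaken LLM judge can at worst cause an unnecessary escalation; it cannot turn a forbidden-path violation into an accept.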

16. Stage 14 — Commit and PR creation

If the verifier and judge pass, the system may create a commit and PR according to the autonomy mode.

The commit message must be deterministic and traceable:

chore(auth): upgrade auth-client to 3.x

Generated by AI Coding Agent run run_01J...
Task: task_01J...
Verification:
- mvn -q -DskipTests compile: passed
- mvn -q test: passed

Minimal PR body:

## Summary
Upgrades `com.example:auth-client` from 2.x to 3.x and migrates affected call sites to the new builder-based configuration API.

## Changed files
- `pom.xml`
- `AuthClientFactory.java`
- `AuthClientFactoryTest.java`

## Verification
- `mvn -q -DskipTests compile` passed
- `mvn -q test` passed
- forbidden path check passed

## Agent metadata
- Task: `task_01J...`
- Run: `run_01J...`
- Base commit: `4f6a921...`
- Autonomy mode: `supervised_pr`

## Reviewer notes
Please verify that the new auth-client timeout defaults match production expectations.

A PR is not just code output. A PR is a communication artifact.

A good PR helps the reviewer answer:

what changed;
why it changed;
what evidence has been collected;
what risks remain;
which areas need human judgment.

17. Stage 15 — Outer CI and review

The inner verifier runs in the agent's sandbox. The outer CI runs on the PR platform.

The two serve different purposes:

Layer	Purpose
Inner verifier	Gives the agent fast feedback while editing
Outer CI	Official repository/platform validation before merge
Human review	Validates intent, maintainability, and domain semantics

Do not assume the inner verifier replaces CI.

The inner verifier may run in a different environment. It may not have all the secrets, integration dependencies, or deployment checks.

Flow after the PR is created:

PR created
  -> CI starts
  -> reviewer notified
  -> agent monitors CI if allowed
  -> if CI fails and policy allows, agent creates follow-up commit
  -> if review comments actionable, agent may address comments
  -> merge remains human-controlled unless policy explicitly allows auto-merge

For this series, our default is:

Agent may create PR.
Agent may update PR after verifier/CI feedback.
Human owns merge decision.

18. Stage 16 — Telemetry and learning artifacts

Every run produces learning data.

Not to “train the model” automatically, but to improve the platform:

task classification accuracy;
verifier quality;
prompt contracts;
repo instruction quality;
use case risk score;
policy thresholds;
cost prediction;
failure pattern library;
evaluation dataset.

Run artifacts that need to be stored:

artifacts:
  - normalized_task.yaml
  - policy_decision.yaml
  - repository_snapshot.yaml
  - plan.md
  - tool_calls.jsonl
  - patch.diff
  - verifier_results.json
  - raw_logs.tar.gz
  - judge_report.md
  - pr_metadata.json
  - final_run_summary.md

Learning record:

learningRecord:
  taskType: dependency_upgrade
  outcome: pr_created
  verifierRetries: 2
  humanReviewOutcome: changes_requested
  reviewThemes:
    - missing edge case test
    - PR summary needs rollout note
  futureImprovements:
    - add verifier for timeout default assertion
    - update prompt contract for auth-client migration

This is the difference between a toy agent and a platform:

Toy agent executes.
Production platform learns from execution.

19. State transition view

The end-to-end flow needs a state machine, not just procedural code.

State matters because:

retries must be safe;
workers can crash;
users need status;
audits need history;
SLAs need measurement;
the scheduler needs to know which runs are stuck.

Example state semantics:

State	Meaning	Retry safe?
`submitted`	Task received, not yet processed	yes
`normalizing`	Task is being turned into a contract	yes, idempotent
`blocked`	Policy rejected the task	no automatic retry
`queued`	Waiting for a worker	yes
`preparing_sandbox`	Worker is creating the environment	yes with cleanup
`running`	Agent loop is active	maybe, needs checkpoint
`verifying`	Verifiers are running	yes
`repairing`	Agent is fixing a failure	bounded retry
`judging`	Diff is being evaluated	yes
`approval_required`	Waiting for a human decision	no automatic progress
`creating_pr`	Branch/commit/PR is being created	needs idempotency key
`completed`	Run finished with an outcome	no
`failed`	Run failed terminally	manual or policy-based rerun
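The states in the table can be enforced with an explicit transition map. The state names come from the table, but the edges below are an assumption made for illustration; the source lists states, not allowed transitions.

```python
# Assumed edges between the states from the table; terminal states map to an empty set.
TRANSITIONS = {
    "submitted": {"normalizing"},
    "normalizing": {"queued", "blocked", "failed"},
    "queued": {"preparing_sandbox"},
    "preparing_sandbox": {"running", "failed"},
    "running": {"verifying", "failed"},
    "verifying": {"judging", "repairing", "failed"},
    "repairing": {"verifying", "failed"},
    "judging": {"creating_pr", "approval_required", "repairing", "failed"},
    "approval_required": {"creating_pr", "failed"},
    "creating_pr": {"completed", "failed"},
    "blocked": set(),
    "completed": set(),
    "failed": set(),
}


def transition(current: str, target: str) -> str:
    """Return the new state, or raise if the move is not in the transition map."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```

Routing every state change through one function like this gives the scheduler and the audit log a single place where illegal moves, such as resuming a completed run, are rejected.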

20. Error handling model

Every failure must be assigned a category.

Failure	Example	Action
User input invalid	repo not found	return validation error
Policy denied	task touches secrets	block
Sandbox setup failed	image unavailable	retry infra
Tool failure	command timeout	retry or summarize
Compile failure	API breaking change	repair loop
Test failure	assertion fails	repair loop if in scope
Diff policy violation	protected file modified	revert or fail
Judge rejection	overbroad change	repair or fail
PR creation failed	branch exists	idempotent retry
Cost exceeded	token budget exhausted	stop with partial artifact

Do not treat every failure as “try again”.

Blind retries are dangerous because:

costs go up;
state gets messier;
the diff grows;
reviewers struggle to understand the change;
the agent may fix errors in destructive ways.

Rule:

Retry only when failure class is retryable and retry budget remains.
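The rule translates into a lookup from failure class to action, guarded by a budget. This sketch covers only some rows of the table; the snake_case class names, the `fail` fallback for unknown classes, and the single shared budget are hypothetical simplifications.

```python
# Failure class -> default action, following part of the table above.
ACTION_BY_FAILURE = {
    "user_input_invalid": "return_validation_error",
    "policy_denied": "block",
    "sandbox_setup_failed": "retry",
    "compile_failure": "repair",
    "test_failure": "repair",
    "pr_creation_failed": "retry",
    "cost_exceeded": "stop_with_partial_artifact",
}
# Only these actions consume the retry budget.
BUDGETED_ACTIONS = {"retry", "repair"}


def next_action(failure_class: str, attempts_used: int, retry_budget: int) -> str:
    """Pick the action for a failure; unknown classes and spent budgets fail closed."""
    action = ACTION_BY_FAILURE.get(failure_class, "fail")
    if action in BUDGETED_ACTIONS and attempts_used >= retry_budget:
        return "fail"
    return action
```

Failing closed on unknown classes is the important design choice: a failure nobody has categorized yet should stop the run rather than be retried blindly.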

21. Idempotency model

Background agents are highly prone to duplication:

the user clicks submit twice;
the queue redelivers a message;
the worker crashes after the commit but before updating the DB;
PR creation times out even though the PR was already created;
the scheduler restarts.

That is why every side effect needs an idempotency key.

Example:

idempotency:
  taskSubmitKey: requesterId + repository + objectiveHash
  runKey: taskId + baseCommit + attemptNumber
  branchName: agent/task-01j9-auth-client-upgrade
  commitTrailer: Agent-Run: run_01J...
  prSearchKey: Agent-Task: task_01J...

PR creation must be able to do:

create if not exists
else attach existing PR to run

Not:

always create new PR

For a fleet platform, idempotency determines whether the system creates 10 tidy PRs or 10,000 duplicate PRs.
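The create-if-not-exists rule can be sketched against an in-memory stand-in for the git provider. `FakeGitProvider` and `ensure_pr` are hypothetical; a real integration would search the provider's API by head branch or by the `Agent-Task` key, and would still need protection against two workers racing between the lookup and the create.

```python
class FakeGitProvider:
    """In-memory stand-in for a git hosting API, keyed by (repo, head branch)."""

    def __init__(self) -> None:
        self.prs = {}

    def find_pr(self, repo: str, branch: str):
        return self.prs.get((repo, branch))

    def create_pr(self, repo: str, branch: str, title: str) -> int:
        number = len(self.prs) + 1
        self.prs[(repo, branch)] = number
        return number


def ensure_pr(provider: FakeGitProvider, repo: str, branch: str, title: str):
    """Return (pr_number, created); reuse an existing PR instead of opening a duplicate."""
    existing = provider.find_pr(repo, branch)
    if existing is not None:
        return existing, False
    return provider.create_pr(repo, branch, title), True
```

Calling `ensure_pr` again after a timeout or a worker crash returns the same PR number with `created` set to `False`, which is the "attach existing PR to run" branch of the rule.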

22. Human touchpoints

Human-in-the-loop does not mean “ask a human every 5 minutes”.

Healthy human touchpoints:

before a high-risk task runs;
when policy cannot determine the risk;
when the agent needs a domain decision;
before a PR on a critical repo;
when the judge finds ambiguity;
when CI fails because of an external dependency;
when a review comment requires a product decision;
before merge.

Example escalation message:

Agent cannot safely continue.

Reason:
`AuthClientConfig.timeout` default changed between v2 and v3. Existing tests do not specify expected timeout behavior.

Options:
1. Preserve old timeout explicitly in migrated config.
2. Accept new v3 default.
3. Stop migration for this repository.

Recommended: option 1, because task constraint says preserve authentication behavior.

A good agent does not always keep going. Sometimes a good agent stops with the right question.

23. Minimal implementation for the first prototype

Although this series aims for production grade, the first prototype does not need every feature.

Minimal viable flow:

1. Accept task contract from CLI/API.
2. Clone repository into local sandbox directory.
3. Run agent loop with file read/search/patch and shell command tools.
4. Run verifier command.
5. Produce diff and run summary.
6. Stop before PR creation.

Then level up:

Level 1: Local agent creates patch only.
Level 2: Agent creates branch and local commit.
Level 3: Agent opens draft PR.
Level 4: Agent reacts to verifier and CI feedback.
Level 5: Agent runs fleet-wide with batching and governance.

Do not build Level 5 right away.

The reasoning is simple:

If a single-repo patch-only run is not safe and explainable,
a fleet-wide PR platform will only amplify the problem.

24. Reference sequence diagram

25. Checklist: end-to-end flow readiness

Before implementing, make sure the answers to these questions are clear:

What does the task contract look like?
Who determines the risk level?
Which autonomy modes are available?
What is the run state machine?
What is the sandbox boundary?
Which tools are available to the agent?
Which commands are allowed to run?
How is the repository snapshot made immutable?
How is the initial context selected?
What is the agent's stop condition?
What is the minimum set of verifiers?
How are raw logs stored and summarized?
What is the difference between the verifier and the judge?
What is the idempotency key for PR creation?
When is human approval required?
Which artifacts must be stored?
How can a failed run be replayed?

If you cannot answer these yet, do not start by coding the LLM loop. Start with the flow and the contracts.

26. What we will build next

The next part covers the project skeleton and repository layout.

We will create a structure that separates:

API/control plane;
domain model;
orchestrator;
worker;
sandbox;
agent runtime;
tool runtime;
verifier;
judge;
git provider integration;
persistence;
telemetry.

The goal is not to create many folders to look enterprise. The goal is to make every boundary clear:

Policy is not tool runtime.
Verifier is not judge.
Sandbox is not orchestrator.
Agent plan is not system authority.
PR creation is not patch generation.

Summary

The end-to-end flow of a Honk-like AI coding agent is a controlled code-change pipeline, not merely a long prompt.

The minimal flow:

Task -> Normalize -> Policy -> Resolve Repo -> Create Run -> Sandbox -> Context -> Plan -> Agent Loop -> Patch -> Verify -> Judge -> PR -> Review -> Telemetry

The core invariant:

Every autonomous code change must be scoped, reproducible, verifiable, reviewable, and auditable.

If a system can only edit files but cannot explain its scope, evidence, policy decisions, and failure paths, it does not yet deserve to be called a production-grade coding agent.
