
Learn AI Coding Agent Part 013: State Machine for Agent Runs


title: Learn AI Coding Agent From Scratch - Part 013
description: Agent run state machine for a Honk-like AI coding agent: task, run, attempt, and step lifecycles, terminal states, retry, cancellation, timeout, lease, idempotency, and transition invariants.
series: learn-ai-coding-agent
seriesTitle: Learn AI Coding Agent From Scratch
order: 13
partTitle: State Machine for Agent Runs
tags:
  - ai-coding-agent
  - state-machine
  - orchestration
  - workflow
  - retry
  - idempotency
  - sandbox
  - verifier
  - pr-orchestration
date: 2026-07-03

Part 013 — State Machine for Agent Runs

Part 012 gave us a domain model. Now we need to define the legal ways that model is allowed to change.

An AI coding agent is not a simple function:

input prompt -> output patch

That model is too naive. A real system has queues, workers, sandboxes, timeouts, retries, cancellation, verifiers, judges, human approval, PR creation, and an audit trail. Without a state machine, all of that turns into a pile of booleans that contradict each other:

isRunning = true
isFinished = false
hasPatch = true
isVerified = false
isPrCreated = true
isCancelled = true

A status like that cannot be explained or audited.

A state machine makes the agent lifecycle explicit:

  • which states can occur;
  • which transitions are legal;
  • who may trigger each transition;
  • which events must be recorded;
  • which resources must be released;
  • which retries are safe;
  • when a run counts as terminal;
  • when a human must make the decision.

For a Honk-like background coding agent, the state machine is not an implementation detail. It is an operational safety contract.


1. Goals of the state machine

The agent run state machine solves five main problems.

First, it keeps the lifecycle from becoming blurry. A worker must not create a PR if the task has not passed the policy gate. The verifier must not run if there is no patch. The judge must not issue a verdict if no verification report exists yet.

Second, it makes retries controllable. Retrying a failed clone is different from retrying a failed test. Retrying after an LLM timeout is different from retrying after a policy violation.

Third, it makes cancellation safe. Cancelling a task is not the same as killing a container abruptly without recording its artifacts. The system must know whether it is safe to stop right now, whether cleanup is needed, and whether a partial patch should be kept.

Fourth, it gives observability structure. Instead of reading raw logs, we can read transitions:

QUEUED -> PREPARING -> RUNNING -> VERIFYING -> JUDGING -> PR_CREATED

Fifth, it provides a foundation for governance. The platform team can set policies such as:

Only low-risk tasks can transition from JUDGED_PASS to PR_CREATING automatically.
High-risk tasks must transition to HUMAN_APPROVAL_REQUIRED.

2. State machine design principles

A state machine for an AI coding agent should follow these principles.

2.1 States must represent system conditions, not the agent's mood

Do not create states like:

THINKING
CONFUSED
TRYING_HARDER

A state must be observable from the outside and carry operational consequences.

Better:

PLANNING
EDITING
VERIFYING
WAITING_FOR_APPROVAL

2.2 Every state must have a clear owner

It must be clear who controls each state:

| State group | Primary owner |
|---|---|
| Intake | API/control plane |
| Queue | scheduler |
| Execution | worker/execution plane |
| Verification | verifier runtime |
| Judge | judge service |
| Approval | human/platform policy |
| PR | git provider integration |
| Terminal | control plane |

If ownership is unclear, two processes can try to change the same state.

2.3 Terminal states must be final

A terminal state must never lead back to an active state. If a task needs to run again, create a new Run instead of reviving the old one.

Examples of terminal states:

SUCCEEDED
FAILED
CANCELLED
REJECTED
EXPIRED

2.4 Transitions must be validated server-side

Clients, workers, and the LLM must not be free to write the status directly.

Do not expose an API like:

PATCH /runs/{id}
{ "status": "PR_CREATED" }

A safer design:

POST /runs/{id}/events
{ "type": "verification_passed", "verificationReportId": "..." }

The control plane decides the next state.

2.5 State transitions must be atomic

A state transition usually has to happen together with updates to important metadata.

Example:

RUNNING -> VERIFYING

must be stored together with:

  • patch id;
  • diff summary;
  • artifact location;
  • worker id;
  • timestamp;
  • attempt number.

If the state changes without the artifact, the system loses its evidence.
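One way to enforce this in code is to make the evidence and the status change a single immutable value. In the minimal sketch below, the hypothetical record PatchEvidence cannot be constructed if any field from the list above is missing. A repository that persists StatusChange in one transaction can therefore never write the new status on its own. The record names, field names, and types are illustrative assumptions, not a fixed schema.

```java
import java.time.Instant;
import java.util.Objects;

// Hypothetical evidence that must accompany RUNNING -> VERIFYING.
// Field names mirror the list above; they are an assumption, not a fixed schema.
record PatchEvidence(
        String patchId,
        String diffSummary,
        String artifactLocation,
        String workerId,
        Instant recordedAt,
        int attemptNo) {

    // Compact constructor: evidence with missing fields cannot be created.
    PatchEvidence {
        requireText(patchId, "patchId");
        requireText(diffSummary, "diffSummary");
        requireText(artifactLocation, "artifactLocation");
        requireText(workerId, "workerId");
        Objects.requireNonNull(recordedAt, "recordedAt");
        if (attemptNo < 1) {
            throw new IllegalArgumentException("attemptNo must be >= 1");
        }
    }

    private static void requireText(String value, String field) {
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException(field + " is required");
        }
    }
}

// The status change and its evidence form one value, so a repository can
// persist both in a single transaction or neither.
record StatusChange(String runId, String fromStatus, String toStatus, PatchEvidence evidence) {
    StatusChange {
        Objects.requireNonNull(runId, "runId");
        Objects.requireNonNull(fromStatus, "fromStatus");
        Objects.requireNonNull(toStatus, "toStatus");
        Objects.requireNonNull(evidence, "a status change without evidence is rejected");
    }
}
```

Rejecting bad input in the constructor moves the check from "remember to validate" to "cannot be represented".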


3. Two lifecycles: Task and Run

Before building the state machine, separate two lifecycles:

Task = a work request from a user or the platform
Run  = one concrete execution that tries to complete the Task

A single Task can have many Runs.

Example:

Task: Upgrade service A from library X 1.2 to 1.5
Run #1: failed because repository checkout failed
Run #2: failed because tests revealed breaking API
Run #3: succeeded and created PR

Do not mix task state with run state.

A Task answers:

Is this request still active, rejected, finished, or waiting for approval?

A Run answers:

What is the status of this one specific execution attempt?

4. Task state machine

The task state machine is relatively small.

4.1 Task states

| State | Meaning | Terminal? |
|---|---|---|
| DRAFT | Task created but not yet submitted | No |
| SUBMITTED | Task accepted by the API | No |
| VALIDATING | Contract and policy are being checked | No |
| READY | Task is eligible to get a run | No |
| RUNNING | A run is active or scheduled | No |
| WAITING_FOR_APPROVAL | Needs a human decision | No |
| COMPLETED | Task finished with a valid result | Yes |
| FAILED | Task failed after retries/attempts were exhausted | Yes |
| REJECTED | Task rejected before execution | Yes |
| CANCELLED | Task cancelled | Yes |

4.2 Why doesn't Task have a PR_CREATED state?

A PR is an output of a Run, and it does not always define when a Task is done.

Some tasks are done when:

  • an analysis report has been produced;
  • a patch has been produced but not opened as a PR;
  • a PR has been created;
  • a PR has been merged;
  • a migration batch has finished across many repositories.

The completion criterion must come from the task contract:

completionMode: pr_created

or:

completionMode: patch_ready_for_review

or:

completionMode: analysis_report_created
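The control plane could check a completion mode against what a run actually produced, as in the sketch below. It assumes a hypothetical RunOutputs record in which null means the artifact was not produced. The enum constants map one-to-one to the three contract values above.

```java
import java.util.Locale;

// Hypothetical summary of a run's outputs; null means "not produced".
record RunOutputs(String patchArtifactId, String pullRequestUrl, String analysisReportId) {}

enum CompletionMode {
    PR_CREATED, PATCH_READY_FOR_REVIEW, ANALYSIS_REPORT_CREATED;

    // Parses contract values such as "pr_created".
    static CompletionMode fromContract(String value) {
        return valueOf(value.trim().toUpperCase(Locale.ROOT));
    }

    // The task is complete only when the output required by its mode exists.
    boolean isSatisfiedBy(RunOutputs out) {
        return switch (this) {
            case PR_CREATED -> out.pullRequestUrl() != null;
            case PATCH_READY_FOR_REVIEW -> out.patchArtifactId() != null;
            case ANALYSIS_REPORT_CREATED -> out.analysisReportId() != null;
        };
    }
}
```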

5. Run state machine

The run state machine is richer. It is the lifecycle of one real execution attempt.

This is not the only possible design, but it is robust enough for a production platform.


6. Run state catalog

6.1 CREATED

The run record has been created but has not entered the queue yet.

Use this state so that creating the run and writing its initial event can happen atomically.

Invariant:

run.task_id exists
run.attempt_no assigned
run.status = CREATED
no worker lease exists

Legal transitions:

CREATED -> QUEUED
CREATED -> CANCELLED

6.2 QUEUED

The run is ready to be picked up by a worker.

Invariant:

run has queue record
run has no active worker lease
run not terminal

Legal transitions:

QUEUED -> LEASED
QUEUED -> CANCELLED
QUEUED -> EXPIRED

Do not execute the work inside the API request. The API only submits and enqueues.

6.3 LEASED

A worker has claimed the run but has not started full execution yet.

This state matters for distributed workers. Without a lease, two workers can execute the same run.

Invariant:

lease.owner_worker_id is not null
lease.expires_at > now
run.status = LEASED

Legal transitions:

LEASED -> PREPARING
LEASED -> CANCELLED
LEASED -> EXPIRED

6.4 PREPARING

The worker prepares the sandbox, repository clone, dependency cache, environment, and policy runtime.

Invariant:

sandbox allocation exists
repository checkout is in progress or done
no code edit has happened yet

Legal transitions:

PREPARING -> CONTEXT_BUILDING
PREPARING -> FAILED
PREPARING -> CANCELLED

Failures in this state are usually not the LLM's fault. Examples:

  • the repository cannot be cloned;
  • the target branch does not exist;
  • the Git provider token is invalid;
  • the sandbox image fails to build;
  • the disk quota is exhausted;
  • the package registry is unreachable.

6.5 CONTEXT_BUILDING

The system builds a repository map, reads instruction files, prepares a search index, and selects the initial context.

Invariant:

repository exists in sandbox
agent has not modified files yet
context artifacts may be produced

Legal transitions:

CONTEXT_BUILDING -> PLANNING
CONTEXT_BUILDING -> FAILED
CONTEXT_BUILDING -> CANCELLED

6.6 PLANNING

The agent drafts a change plan.

Invariant:

task contract loaded
policy loaded
repo context available
plan artifact must be produced before leaving PLANNING successfully

Legal transitions:

PLANNING -> EXECUTING
PLANNING -> FAILED
PLANNING -> CANCELLED

The plan does not have to be perfect, but at minimum it must answer:

  • which files are likely to be touched;
  • which verifiers will run;
  • which risks were detected;
  • which stop conditions apply;
  • which changes are not allowed.

6.7 EXECUTING

The agent is reading files, running searches, editing code, running lightweight commands, or fixing errors.

Invariant:

active sandbox exists
tool calls are recorded
file modifications are allowed only inside workspace boundary

Legal transitions:

EXECUTING -> EXECUTING
EXECUTING -> PATCH_READY
EXECUTING -> FAILED
EXECUTING -> CANCELLED

The self-transition exists because the agentic loop repeats:

observe -> act -> observe -> act

Each iteration must still be recorded as a Step. The Run status should not change on every tool call.

6.8 PATCH_READY

The agent has produced a diff that is a candidate for verification.

Invariant:

patch artifact exists
changed files list exists
no PR exists yet

Legal transitions:

PATCH_READY -> VERIFYING
PATCH_READY -> CANCELLED
PATCH_READY -> FAILED

This state separates "the agent has finished editing" from "the change is correct".

6.9 VERIFYING

The verifier runs formatting, lint, compilation, tests, static analysis, or domain-specific verifiers.

Invariant:

patch exists
verification command set determined
verification report will be produced

Legal transitions:

VERIFYING -> EXECUTING
VERIFYING -> JUDGING
VERIFYING -> FAILED
VERIFYING -> CANCELLED

If verification fails in a way that can be fixed, return to EXECUTING with the feedback.

Example:

VERIFYING -> EXECUTING because maven compile failed with missing import

If verification fails in a non-retryable way, move to FAILED.

Example:

task asks to upgrade library but target version does not exist

6.10 JUDGING

The judge assesses whether the patch actually satisfies the task contract, does not overreach, and has sufficient evidence.

The judge can be deterministic, LLM-based, or hybrid.

Invariant:

verification report exists
patch exists
judge report must be created before transition out

Legal transitions:

JUDGING -> EXECUTING
JUDGING -> WAITING_FOR_APPROVAL
JUDGING -> PR_CREATING
JUDGING -> FAILED
JUDGING -> CANCELLED

6.11 WAITING_FOR_APPROVAL

The system pauses and waits for a human decision.

Invariant:

patch exists or report exists
approval request exists
no further code mutation while waiting

Legal transitions:

WAITING_FOR_APPROVAL -> PR_CREATING
WAITING_FOR_APPROVAL -> EXECUTING
WAITING_FOR_APPROVAL -> CANCELLED
WAITING_FOR_APPROVAL -> FAILED

Sometimes a reviewer does more than approve or reject and adds an instruction:

Looks good, but do not update generated files.

In that case the run returns to EXECUTING with the revised instruction.

6.12 PR_CREATING

The system creates the branch, commit, and pull request.

Invariant:

patch exists
commit message generated and approved by policy
git provider credentials scoped
no additional code changes should happen except git metadata operations

Legal transitions:

PR_CREATING -> PR_CREATED
PR_CREATING -> FAILED
PR_CREATING -> CANCELLED

6.13 PR_CREATED

The pull request has been created.

Invariant:

pull_request_url exists
commit_sha exists
branch_name exists
pr body contains evidence summary

Legal transitions:

PR_CREATED -> SUCCEEDED
PR_CREATED -> FAILED

In a simpler design, PR_CREATED could be terminal. Making it an intermediate state is more flexible, because the completion mode can then be checked before the run is declared successful.

6.14 SUCCEEDED

The run finished successfully.

Invariant:

terminal = true
no active lease
no active sandbox required
final verdict exists

6.15 FAILED

The run failed.

Invariant:

terminal = true
failure reason exists
failure category exists
retryability classified

6.16 CANCELLED

The run was cancelled by a user, a policy, an admin, or its parent task.

Invariant:

terminal = true
cancel reason exists
cleanup attempted

6.17 EXPIRED

The run died because of a queue, lease, or heartbeat timeout.

Invariant:

terminal = true
expiration reason exists

EXPIRED is different from FAILED. Expiry often means the system lost certainty about what happened, not that the work was proven to fail semantically.


7. Transition table

The table below is a condensed version of the legal transitions.

| From | Event | To | Actor |
|---|---|---|---|
| CREATED | run_enqueued | QUEUED | scheduler |
| QUEUED | lease_acquired | LEASED | worker |
| LEASED | worker_started | PREPARING | worker |
| PREPARING | sandbox_ready | CONTEXT_BUILDING | worker |
| CONTEXT_BUILDING | context_ready | PLANNING | worker |
| PLANNING | plan_created | EXECUTING | agent runtime |
| EXECUTING | patch_created | PATCH_READY | agent runtime |
| PATCH_READY | verification_started | VERIFYING | verifier |
| VERIFYING | verification_failed_retryable | EXECUTING | verifier/orchestrator |
| VERIFYING | verification_passed | JUDGING | verifier |
| JUDGING | judge_requires_fix | EXECUTING | judge/orchestrator |
| JUDGING | human_approval_required | WAITING_FOR_APPROVAL | policy |
| JUDGING | auto_pr_allowed | PR_CREATING | policy |
| WAITING_FOR_APPROVAL | approved | PR_CREATING | human |
| PR_CREATING | pr_created | PR_CREATED | git integration |
| PR_CREATED | completion_condition_met | SUCCEEDED | control plane |
| any active state | cancel_requested | CANCELLED | user/admin/policy |
| any active state | non_retryable_failure | FAILED | owning service |
| queue/lease state | timeout | EXPIRED | scheduler |

8. Event-driven transitions, not direct status writes

Bad design:

UPDATE runs SET status = 'SUCCEEDED' WHERE id = ?;

Why is this bad?

Because it records nothing about:

  • who made the change;
  • which event occurred;
  • which evidence supports it;
  • whether the transition is legal;
  • what its side effects are;
  • whether the event has already been processed.

Better design:

append event -> validate transition -> update aggregate -> emit side effects

Example event:

{
  "eventId": "evt_01J...",
  "runId": "run_01J...",
  "type": "verification_passed",
  "actor": "verifier-service",
  "occurredAt": "2026-07-03T10:20:30Z",
  "payload": {
    "verificationReportId": "vr_01J...",
    "commands": ["mvn -q test"],
    "durationMs": 84231
  }
}

The control plane decides:

current state = VERIFYING
incoming event = verification_passed
next state = JUDGING

9. Transition validator

Implement the state machine as a small library in the domain layer.

Java sketch:

public enum RunStatus {
    CREATED,
    QUEUED,
    LEASED,
    PREPARING,
    CONTEXT_BUILDING,
    PLANNING,
    EXECUTING,
    PATCH_READY,
    VERIFYING,
    JUDGING,
    WAITING_FOR_APPROVAL,
    PR_CREATING,
    PR_CREATED,
    SUCCEEDED,
    FAILED,
    CANCELLED,
    EXPIRED;

    public boolean isTerminal() {
        return switch (this) {
            case SUCCEEDED, FAILED, CANCELLED, EXPIRED -> true;
            default -> false;
        };
    }
}

Event enum:

public enum RunEventType {
    RUN_ENQUEUED,
    LEASE_ACQUIRED,
    WORKER_STARTED,
    SANDBOX_READY,
    CONTEXT_READY,
    PLAN_CREATED,
    PATCH_CREATED,
    VERIFICATION_STARTED,
    VERIFICATION_FAILED_RETRYABLE,
    VERIFICATION_PASSED,
    JUDGE_REQUIRES_FIX,
    HUMAN_APPROVAL_REQUIRED,
    AUTO_PR_ALLOWED,
    APPROVED,
    PR_CREATED,
    COMPLETION_CONDITION_MET,
    CANCEL_REQUESTED,
    NON_RETRYABLE_FAILURE,
    TIMEOUT
}

Transition function:

public final class RunStateMachine {
    public RunStatus next(RunStatus current, RunEventType event) {
        if (current.isTerminal()) {
            throw new IllegalTransitionException(current, event);
        }

        return switch (current) {
            case CREATED -> switch (event) {
                case RUN_ENQUEUED -> RunStatus.QUEUED;
                case CANCEL_REQUESTED -> RunStatus.CANCELLED;
                default -> illegal(current, event);
            };
            case QUEUED -> switch (event) {
                case LEASE_ACQUIRED -> RunStatus.LEASED;
                case CANCEL_REQUESTED -> RunStatus.CANCELLED;
                case TIMEOUT -> RunStatus.EXPIRED;
                default -> illegal(current, event);
            };
            case LEASED -> switch (event) {
                case WORKER_STARTED -> RunStatus.PREPARING;
                case CANCEL_REQUESTED -> RunStatus.CANCELLED;
                case TIMEOUT -> RunStatus.EXPIRED;
                default -> illegal(current, event);
            };
            case PREPARING -> switch (event) {
                case SANDBOX_READY -> RunStatus.CONTEXT_BUILDING;
                case NON_RETRYABLE_FAILURE -> RunStatus.FAILED;
                case CANCEL_REQUESTED -> RunStatus.CANCELLED;
                default -> illegal(current, event);
            };
            case CONTEXT_BUILDING -> switch (event) {
                case CONTEXT_READY -> RunStatus.PLANNING;
                case NON_RETRYABLE_FAILURE -> RunStatus.FAILED;
                case CANCEL_REQUESTED -> RunStatus.CANCELLED;
                default -> illegal(current, event);
            };
            case PLANNING -> switch (event) {
                case PLAN_CREATED -> RunStatus.EXECUTING;
                case NON_RETRYABLE_FAILURE -> RunStatus.FAILED;
                case CANCEL_REQUESTED -> RunStatus.CANCELLED;
                default -> illegal(current, event);
            };
            case EXECUTING -> switch (event) {
                case PATCH_CREATED -> RunStatus.PATCH_READY;
                case NON_RETRYABLE_FAILURE -> RunStatus.FAILED;
                case CANCEL_REQUESTED -> RunStatus.CANCELLED;
                default -> illegal(current, event);
            };
            case PATCH_READY -> switch (event) {
                case VERIFICATION_STARTED -> RunStatus.VERIFYING;
                case NON_RETRYABLE_FAILURE -> RunStatus.FAILED;
                case CANCEL_REQUESTED -> RunStatus.CANCELLED;
                default -> illegal(current, event);
            };
            case VERIFYING -> switch (event) {
                case VERIFICATION_FAILED_RETRYABLE -> RunStatus.EXECUTING;
                case VERIFICATION_PASSED -> RunStatus.JUDGING;
                case NON_RETRYABLE_FAILURE -> RunStatus.FAILED;
                case CANCEL_REQUESTED -> RunStatus.CANCELLED;
                default -> illegal(current, event);
            };
            case JUDGING -> switch (event) {
                case JUDGE_REQUIRES_FIX -> RunStatus.EXECUTING;
                case HUMAN_APPROVAL_REQUIRED -> RunStatus.WAITING_FOR_APPROVAL;
                case AUTO_PR_ALLOWED -> RunStatus.PR_CREATING;
                case NON_RETRYABLE_FAILURE -> RunStatus.FAILED;
                case CANCEL_REQUESTED -> RunStatus.CANCELLED;
                default -> illegal(current, event);
            };
            case WAITING_FOR_APPROVAL -> switch (event) {
                case APPROVED -> RunStatus.PR_CREATING;
                case CANCEL_REQUESTED -> RunStatus.CANCELLED;
                case NON_RETRYABLE_FAILURE -> RunStatus.FAILED;
                default -> illegal(current, event);
            };
            case PR_CREATING -> switch (event) {
                case PR_CREATED -> RunStatus.PR_CREATED;
                case NON_RETRYABLE_FAILURE -> RunStatus.FAILED;
                case CANCEL_REQUESTED -> RunStatus.CANCELLED;
                default -> illegal(current, event);
            };
            case PR_CREATED -> switch (event) {
                case COMPLETION_CONDITION_MET -> RunStatus.SUCCEEDED;
                case NON_RETRYABLE_FAILURE -> RunStatus.FAILED;
                default -> illegal(current, event);
            };
            case SUCCEEDED, FAILED, CANCELLED, EXPIRED -> illegal(current, event);
        };
    }

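    private RunStatus illegal(RunStatus current, RunEventType event) {
        throw new IllegalTransitionException(current, event);
    }
}

// Unchecked exception for any event that is not legal in the current state.
final class IllegalTransitionException extends RuntimeException {
    IllegalTransitionException(RunStatus current, RunEventType event) {
        super("Illegal transition: event " + event + " is not allowed in state " + current);
    }
}

Important: this is a domain-level validator, not a full orchestration engine. Side effects such as enqueueing, cleanup, PR creation, and event emission remain the job of the service layer. The switch also omits the WAITING_FOR_APPROVAL -> EXECUTING edge from section 6.11 (a reviewer asking for changes). Supporting that edge would need an additional event type, such as a revision request.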


10. State machine with guards

Deciding a transition from the current state and the event alone is not enough. We also need guards.

Example:

JUDGING + AUTO_PR_ALLOWED -> PR_CREATING

This transition is legal only if:

  • the task risk level is low or medium;
  • the verifier passed;
  • judge confidence meets the threshold;
  • changed files do not touch forbidden paths;
  • no secret was detected;
  • there is no generated-file policy violation;
  • the budget has not been exceeded;
  • the user/task policy allows an automatic PR.

Guard representation:

public enum RiskLevel { LOW, MEDIUM, HIGH, CRITICAL }

public record TransitionContext(
    RiskLevel riskLevel,
    boolean verifierPassed,
    boolean judgePassed,
    boolean secretScanPassed,
    boolean changedForbiddenPath,
    boolean autoPrAllowedByTask,
    boolean budgetExceeded
) {}

Transition rule:

public boolean canAutoCreatePr(TransitionContext ctx) {
    return ctx.verifierPassed()
        && ctx.judgePassed()
        && ctx.secretScanPassed()
        && !ctx.changedForbiddenPath()
        && ctx.autoPrAllowedByTask()
        && !ctx.budgetExceeded()
        && ctx.riskLevel() != RiskLevel.HIGH
        && ctx.riskLevel() != RiskLevel.CRITICAL;
}
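The same rule can report why it failed instead of returning a single boolean. That helps the event log explain why a run went to WAITING_FOR_APPROVAL rather than PR_CREATING. The sketch below assumes a hypothetical AutoPrContext with two changes relative to the record in this section. First, judgePassed becomes a numeric judge confidence with a configurable threshold. Second, a generated-file flag is added. Both fields come from the bullet list above. The violation codes are illustrative strings.

```java
import java.util.ArrayList;
import java.util.List;

enum RiskLevel { LOW, MEDIUM, HIGH, CRITICAL }

// Hypothetical extension of TransitionContext: numeric judge confidence
// and a generated-file policy flag, both taken from the guard list.
record AutoPrContext(
        RiskLevel riskLevel,
        boolean verifierPassed,
        double judgeConfidence,
        boolean secretScanPassed,
        boolean changedForbiddenPath,
        boolean generatedFilePolicyViolated,
        boolean autoPrAllowedByTask,
        boolean budgetExceeded) {}

final class AutoPrGuard {
    private final double minJudgeConfidence;

    AutoPrGuard(double minJudgeConfidence) {
        this.minJudgeConfidence = minJudgeConfidence;
    }

    // Collects every failed condition instead of stopping at the first one.
    List<String> violations(AutoPrContext ctx) {
        List<String> failed = new ArrayList<>();
        if (ctx.riskLevel() == RiskLevel.HIGH || ctx.riskLevel() == RiskLevel.CRITICAL) failed.add("risk_too_high");
        if (!ctx.verifierPassed()) failed.add("verifier_not_passed");
        if (ctx.judgeConfidence() < minJudgeConfidence) failed.add("judge_confidence_below_threshold");
        if (!ctx.secretScanPassed()) failed.add("secret_detected");
        if (ctx.changedForbiddenPath()) failed.add("forbidden_path_changed");
        if (ctx.generatedFilePolicyViolated()) failed.add("generated_file_policy_violation");
        if (!ctx.autoPrAllowedByTask()) failed.add("auto_pr_not_allowed_by_task");
        if (ctx.budgetExceeded()) failed.add("budget_exceeded");
        return failed;
    }

    boolean canAutoCreatePr(AutoPrContext ctx) {
        return violations(ctx).isEmpty();
    }
}
```

A non-empty violations list can then go into the payload of the human_approval_required event.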

A state machine without guards is too permissive.


11. Step lifecycle

The Run state describes the coarse lifecycle. A Step describes a granular action.

A Step can have its own states.

A Step does not replace the Run state. A Step is a trace.

Example steps:

Step 001: read repository instructions
Step 002: search usages of deprecated API
Step 003: edit src/main/java/.../Foo.java
Step 004: run mvn -q -DskipTests compile
Step 005: fix compile error
Step 006: run mvn -q test

The Run state can stay EXECUTING throughout Steps 001–005.
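The sketch below shows that separation using hypothetical Step and RunTrace types. The step outcome values are also assumptions; TIMED_OUT matches the timeout table later in this part. Steps accumulate as a trace, and a failed compile step does not move the Run out of EXECUTING.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed step outcomes; TIMED_OUT matches the timeout table in this part.
enum StepStatus { SUCCEEDED, FAILED, TIMED_OUT }

record Step(int seq, String description, StepStatus status) {}

// Hypothetical run aggregate: steps are appended as a trace, while the run
// status changes only through explicit run-level transitions.
final class RunTrace {
    private String status = "EXECUTING";
    private final List<Step> steps = new ArrayList<>();

    // Recording a step never touches the run status.
    Step recordStep(String description, StepStatus outcome) {
        Step step = new Step(steps.size() + 1, description, outcome);
        steps.add(step);
        return step;
    }

    // Only run-level events (for example patch_created) move the run.
    void transitionTo(String next) {
        this.status = next;
    }

    String status() {
        return status;
    }

    List<Step> steps() {
        return List.copyOf(steps);
    }
}
```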


12. Attempt lifecycle

An Attempt is different from an internal retry.

Use an Attempt for a major try that is worth showing to a human.

Example:

Run 1 Attempt 1:
  agent tries direct edit
  verifier fails
  agent repairs
  verifier fails again
  attempt exhausted

Run 1 Attempt 2:
  agent starts from clean checkout
  uses different strategy
  succeeds

An Attempt has these states:

STARTED
SUCCEEDED
FAILED
ABORTED

Attempt invariants:

attempt belongs to one run
attempt has monotonic attempt_no
attempt has bounded max_steps
attempt has bounded max_cost
attempt has bounded max_duration

Do not allow unbounded retries. An AI agent without an attempt budget becomes a cost sink.
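The bounded-budget invariants above can be checked with a small guard. The sketch below uses hypothetical AttemptBudget and AttemptUsage records, with cost measured in cents as an assumed unit. It refuses to create an unbounded budget and reports the first limit that was exhausted. nextAttemptNo keeps attempt_no monotonic and fails loudly on integer overflow.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical per-attempt limits; every bound must be set and positive.
record AttemptBudget(int maxSteps, long maxCostCents, Duration maxDuration) {
    AttemptBudget {
        if (maxSteps <= 0 || maxCostCents <= 0 || maxDuration == null
                || maxDuration.isZero() || maxDuration.isNegative()) {
            throw new IllegalArgumentException("an attempt budget must be bounded");
        }
    }
}

// What the attempt has consumed so far.
record AttemptUsage(int steps, long costCents, Duration elapsed) {}

final class AttemptGuard {
    private AttemptGuard() {}

    // Returns the first exhausted limit, or empty if the attempt may continue.
    static Optional<String> exhaustedLimit(AttemptBudget budget, AttemptUsage usage) {
        if (usage.steps() >= budget.maxSteps()) return Optional.of("max_steps");
        if (usage.costCents() >= budget.maxCostCents()) return Optional.of("max_cost");
        if (usage.elapsed().compareTo(budget.maxDuration()) >= 0) return Optional.of("max_duration");
        return Optional.empty();
    }

    // attempt_no must be monotonic within a run.
    static int nextAttemptNo(int lastAttemptNo) {
        return Math.addExact(lastAttemptNo, 1);
    }
}
```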


13. Lease and heartbeat

With distributed workers, the LEASED state and heartbeats are mandatory.

Without a lease:

Worker A picks run_123
Worker B also picks run_123
Both edit same repo
Both create different PRs

Minimal lease record:

CREATE TABLE run_leases (
    run_id TEXT PRIMARY KEY,
    worker_id TEXT NOT NULL,
    lease_token TEXT NOT NULL,
    acquired_at TIMESTAMPTZ NOT NULL,
    expires_at TIMESTAMPTZ NOT NULL,
    heartbeat_at TIMESTAMPTZ NOT NULL
);

The worker must keep renewing the lease:

every 15s: heartbeat(run_id, lease_token)

The control plane accepts events only from the worker that holds a valid lease token.

if event.lease_token != active_lease.lease_token:
    reject event

This prevents a zombie worker from submitting results after it has lost its lease.


14. Idempotency

A distributed system will deliver duplicate events. An API client may retry. A worker may time out after it has successfully sent an event but before it receives the response.

Transition events must therefore be idempotent.

Use an event_id or idempotency_key.

CREATE TABLE run_events (
    event_id TEXT PRIMARY KEY,
    run_id TEXT NOT NULL,
    event_type TEXT NOT NULL,
    from_status TEXT NOT NULL,
    to_status TEXT NOT NULL,
    actor_type TEXT NOT NULL,
    actor_id TEXT NOT NULL,
    idempotency_key TEXT,
    payload JSONB NOT NULL,
    occurred_at TIMESTAMPTZ NOT NULL,
    recorded_at TIMESTAMPTZ NOT NULL,
    UNIQUE (run_id, idempotency_key)
);

If an event with the same idempotency key arrives again, the response must be the same, and no new transition may be created.

POST /runs/run_123/events
Idempotency-Key: verifier-run_123-report_456
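The replay rule can be sketched in memory as follows, reusing the Idempotency-Key value from the request above. The first event with a key performs the transition. A replay gets the stored outcome back and does not cause a second transition. In production this role is played by the UNIQUE (run_id, idempotency_key) constraint inside the same database transaction. IdempotentTransitions is a hypothetical single-process stand-in.

```java
import java.util.HashMap;
import java.util.Map;

record TransitionOutcome(String runId, String fromStatus, String toStatus) {}

// In-memory stand-in for UNIQUE (run_id, idempotency_key).
final class IdempotentTransitions {
    private record Key(String runId, String idempotencyKey) {}

    private final Map<Key, TransitionOutcome> outcomes = new HashMap<>();
    private int applied = 0;

    synchronized TransitionOutcome apply(String runId, String idempotencyKey,
                                         String fromStatus, String toStatus) {
        Key key = new Key(runId, idempotencyKey);
        TransitionOutcome previous = outcomes.get(key);
        if (previous != null) {
            return previous; // replay: same response, no second transition
        }
        TransitionOutcome outcome = new TransitionOutcome(runId, fromStatus, toStatus);
        outcomes.put(key, outcome);
        applied++;
        return outcome;
    }

    synchronized int appliedCount() {
        return applied;
    }
}
```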

15. Optimistic concurrency

A state transition must check the aggregate version.

Version column:

ALTER TABLE runs ADD COLUMN version BIGINT NOT NULL DEFAULT 0;

Update:

UPDATE runs
SET status = :next_status,
    version = version + 1,
    updated_at = now()
WHERE id = :run_id
  AND status = :expected_status
  AND version = :expected_version;

If the number of affected rows is 0, a concurrent update happened.

Never overwrite silently.
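To see the compare-and-set contract without a database, the sketch below mimics the conditional UPDATE with a hypothetical in-memory table. It returns 1 when both status and version match. It returns 0 when another writer got there first, and in that case the caller must reload and re-validate instead of overwriting. ConcurrentHashMap.computeIfPresent runs the update atomically for a given key.

```java
import java.util.concurrent.ConcurrentHashMap;

// One row of the runs table, reduced to the columns the UPDATE touches.
record RunRow(String id, String status, long version) {}

// In-memory stand-in for the conditional UPDATE above.
final class InMemoryRunTable {
    private final ConcurrentHashMap<String, RunRow> rows = new ConcurrentHashMap<>();

    void insert(RunRow row) {
        rows.put(row.id(), row);
    }

    // Same contract as the SQL: 1 affected row if status and version matched,
    // 0 otherwise. computeIfPresent is atomic per key.
    int compareAndSetStatus(String runId, String expectedStatus, long expectedVersion, String nextStatus) {
        int[] affected = {0};
        rows.computeIfPresent(runId, (id, current) -> {
            if (current.status().equals(expectedStatus) && current.version() == expectedVersion) {
                affected[0] = 1;
                return new RunRow(id, nextStatus, current.version() + 1);
            }
            return current;
        });
        return affected[0];
    }

    RunRow get(String runId) {
        return rows.get(runId);
    }
}
```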


16. Failure classification

A FAILED state alone is not enough. We also need a failure category.

| Category | Meaning | Retry? |
|---|---|---|
| INFRASTRUCTURE | Worker, sandbox, network, disk, or provider down | Often |
| REPOSITORY_ACCESS | Clone/auth/branch error | Sometimes |
| POLICY_DENIED | Blocked by policy | No |
| PLANNING_FAILED | Agent could not produce a valid plan | Sometimes |
| TOOL_FAILED | Tool call failed | Depends |
| VERIFICATION_FAILED | Build/test/lint failed | Sometimes |
| JUDGE_REJECTED | Patch does not satisfy the task | Sometimes |
| BUDGET_EXCEEDED | Cost/step/time limit exhausted | Maybe, with approval |
| PR_CREATION_FAILED | Git provider error | Often |
| USER_CANCELLED | User cancelled | No |
| UNKNOWN | Unclassified | Manual review |

Failure detail:

{
  "category": "VERIFICATION_FAILED",
  "retryable": true,
  "reason": "maven_compile_error",
  "summary": "Compilation failed because FooService still imports removed package",
  "evidenceArtifactId": "artifact_01J..."
}

The state machine decides whether a run is terminal. The failure classification decides the next action.
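The sketch below turns a classified failure into a next action. It assumes a hypothetical NextAction enum and a simplification of the table. Categories marked "No" never retry, and UNKNOWN goes to a human. BUDGET_EXCEEDED needs approval. For all other categories, the classifier's retryable flag stands in for "Often", "Sometimes", and "Depends". FailureDetail mirrors the JSON example above.

```java
enum FailureCategory {
    INFRASTRUCTURE, REPOSITORY_ACCESS, POLICY_DENIED, PLANNING_FAILED, TOOL_FAILED,
    VERIFICATION_FAILED, JUDGE_REJECTED, BUDGET_EXCEEDED, PR_CREATION_FAILED,
    USER_CANCELLED, UNKNOWN
}

// Hypothetical set of follow-up actions.
enum NextAction { RETRY, REQUIRE_APPROVAL, MANUAL_REVIEW, STOP }

// Mirrors the failure detail JSON above.
record FailureDetail(FailureCategory category, boolean retryable, String reason,
                     String summary, String evidenceArtifactId) {}

final class FailurePolicy {
    private FailurePolicy() {}

    static NextAction nextAction(FailureDetail failure) {
        return switch (failure.category()) {
            // "No" in the table: never retry, even if the flag says otherwise.
            case POLICY_DENIED, USER_CANCELLED -> NextAction.STOP;
            case UNKNOWN -> NextAction.MANUAL_REVIEW;
            case BUDGET_EXCEEDED -> NextAction.REQUIRE_APPROVAL;
            // Remaining categories: trust the classifier's retryable flag.
            default -> failure.retryable() ? NextAction.RETRY : NextAction.STOP;
        };
    }
}
```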


17. Retry policy

A retry must not be treated as a simple while-not-success loop.

A retry must take into account:

  • the state where the failure happened;
  • the failure category;
  • the number of attempts;
  • the cost already spent;
  • the task risk;
  • whether the repository changed since the run started;
  • whether the failure is deterministic;
  • whether the instructions need to change.

Example retry matrix:

| Failed state | Failure category | Retry action |
|---|---|---|
| PREPARING | INFRASTRUCTURE | Retry same run or new run after backoff |
| PREPARING | REPOSITORY_ACCESS | Fail unless credential refreshed |
| PLANNING | PLANNING_FAILED | Retry with smaller context once |
| EXECUTING | BUDGET_EXCEEDED | Stop or require approval |
| VERIFYING | VERIFICATION_FAILED | Return to EXECUTING if budget remains |
| JUDGING | JUDGE_REJECTED | Return to EXECUTING with judge feedback |
| PR_CREATING | INFRASTRUCTURE | Retry PR creation idempotently |

Backoff:

retry_delay = min(base * 2^attempt, max_delay)

For an agent loop, though, a retry is more than a delay. Sometimes the strategy itself has to change.
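The backoff formula itself takes only a few lines. The Java sketch below doubles the delay step by step and stops at the cap, so very large attempt numbers cannot overflow. It assumes attempt counts from 0; base and max_delay are values to tune. Production schedulers often add random jitter on top, which is left out here to keep the result deterministic.

```java
import java.time.Duration;

final class Backoff {
    private Backoff() {}

    // retry_delay = min(base * 2^attempt, max_delay), computed without overflow.
    static Duration delay(Duration base, int attempt, Duration maxDelay) {
        if (attempt < 0) {
            throw new IllegalArgumentException("attempt must be >= 0");
        }
        Duration delay = base;
        for (int i = 0; i < attempt; i++) {
            delay = delay.multipliedBy(2);
            if (delay.compareTo(maxDelay) >= 0) {
                return maxDelay; // capped: further doubling is pointless
            }
        }
        return delay.compareTo(maxDelay) < 0 ? delay : maxDelay;
    }
}
```

With base = 1 s and max_delay = 60 s, attempts 0, 3, and 10 wait 1 s, 8 s, and 60 s respectively.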


18. Cancellation model

Cancellation must be possible from many states, but its effect differs by state.

| Current state | Cancellation behavior |
|---|---|
| QUEUED | Remove from queue |
| LEASED | Mark cancel requested; worker should stop before starting work |
| PREPARING | Stop sandbox setup; clean up partial resources |
| EXECUTING | Stop after the current safe point; preserve logs and diff |
| VERIFYING | Stop the command if safe; preserve the partial report |
| WAITING_FOR_APPROVAL | Close the approval request |
| PR_CREATING | Harder; may need reconciliation if the PR was already created |
| PR_CREATED | Usually do not cancel the run; create a follow-up action if the PR should be closed |

Cancellation should be cooperative first, forceful second.

cancel_requested -> worker receives signal -> current tool call stops at boundary -> cleanup -> CANCELLED

For shell commands, use process group termination and timeout escalation.
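The two stages can be sketched in Java as follows. CancellationToken and CooperativeExecutor are hypothetical names. The executor checks for cancellation only between tool calls, which act as the safe points, and it always runs cleanup. For the forceful stage, Java's Process API has no direct process-group signal. terminate() approximates one by snapshotting the process's descendants, asking everything to stop, and forcing termination after a grace period. On POSIX systems an alternative is to start the command in its own session (for example via setsid) and signal the whole group.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Set by the control plane (on cancel_requested), read by the worker.
final class CancellationToken {
    private final AtomicBoolean requested = new AtomicBoolean(false);

    void request() { requested.set(true); }

    boolean isRequested() { return requested.get(); }
}

enum ExecutionOutcome { COMPLETED, CANCELLED }

final class CooperativeExecutor {
    private CooperativeExecutor() {}

    // Checks for cancellation only between tool calls (the safe points) and
    // always runs cleanup, whether the loop finished, was cancelled, or threw.
    static ExecutionOutcome run(List<Runnable> toolCalls, CancellationToken token, Runnable cleanup) {
        try {
            for (Runnable call : toolCalls) {
                if (token.isRequested()) {
                    return ExecutionOutcome.CANCELLED;
                }
                call.run();
            }
            return ExecutionOutcome.COMPLETED;
        } finally {
            cleanup.run();
        }
    }

    // Forceful stage: polite stop for the process tree, forced after a grace period.
    static void terminate(Process process, Duration grace) throws InterruptedException {
        List<ProcessHandle> children = process.descendants().toList(); // snapshot before parents exit
        children.forEach(ProcessHandle::destroy);
        process.destroy();
        if (!process.waitFor(grace.toMillis(), TimeUnit.MILLISECONDS)) {
            children.stream().filter(ProcessHandle::isAlive).forEach(ProcessHandle::destroyForcibly);
            process.destroyForcibly();
        }
    }
}
```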


19. Timeout model

There are several distinct timeouts.

| Timeout | Scope | Result |
|---|---|---|
| Queue timeout | Run waits too long for a worker | EXPIRED |
| Lease timeout | Worker stops sending heartbeats | EXPIRED or retry |
| Step timeout | Tool call takes too long | Step TIMED_OUT; run may continue or fail |
| Command timeout | Shell command takes too long | Step fails, maybe retry |
| Attempt timeout | One attempt takes too long | Attempt fails |
| Run timeout | The whole run takes too long | FAILED or EXPIRED |
| Approval timeout | Human does not respond | Cancel, fail, or escalate |

Do not use a single global timeout for everything. It makes diagnosis much harder.


20. Sandbox lifecycle and state machine

The sandbox lifecycle is closely tied to the Run state.

Practical rule:

Run terminal does not mean artifacts are gone.
Run terminal means mutation is no longer allowed.

Artifacts must be saved before the sandbox is destroyed.


21. PR lifecycle is not the same as Run lifecycle

A Run can reach SUCCEEDED once the PR is created. The PR itself, however, has its own lifecycle:

OPEN
UPDATED
REVIEW_REQUESTED
CHANGES_REQUESTED
APPROVED
MERGED
CLOSED

In this series, the agent platform is responsible up to the point where the PR is created and the evidence is available. Auto-updating a PR after review is covered in the PR orchestration part.

Do not keep a Run in RUNNING until the PR is merged. That makes the run lifecycle too long and prone to hanging.

Create separate entities:

PullRequestRecord
ReviewFeedbackRun
FollowUpTask

22. Basic database schema

The runs table:

CREATE TABLE runs (
    id TEXT PRIMARY KEY,
    task_id TEXT NOT NULL,
    attempt_no INTEGER NOT NULL,
    status TEXT NOT NULL,
    status_reason TEXT,
    worker_id TEXT,
    lease_token TEXT,
    started_at TIMESTAMPTZ,
    completed_at TIMESTAMPTZ,
    expires_at TIMESTAMPTZ,
    version BIGINT NOT NULL DEFAULT 0,
    created_at TIMESTAMPTZ NOT NULL,
    updated_at TIMESTAMPTZ NOT NULL,
    CONSTRAINT runs_status_check CHECK (status IN (
        'CREATED',
        'QUEUED',
        'LEASED',
        'PREPARING',
        'CONTEXT_BUILDING',
        'PLANNING',
        'EXECUTING',
        'PATCH_READY',
        'VERIFYING',
        'JUDGING',
        'WAITING_FOR_APPROVAL',
        'PR_CREATING',
        'PR_CREATED',
        'SUCCEEDED',
        'FAILED',
        'CANCELLED',
        'EXPIRED'
    ))
);

Indexes:

CREATE INDEX idx_runs_task_id ON runs(task_id);
CREATE INDEX idx_runs_status ON runs(status);
CREATE INDEX idx_runs_active ON runs(status, updated_at)
WHERE status NOT IN ('SUCCEEDED', 'FAILED', 'CANCELLED', 'EXPIRED');

The events table (same columns as the section 14 definition, plus a foreign key to runs):

CREATE TABLE run_events (
    event_id TEXT PRIMARY KEY,
    run_id TEXT NOT NULL REFERENCES runs(id),
    event_type TEXT NOT NULL,
    from_status TEXT NOT NULL,
    to_status TEXT NOT NULL,
    actor_type TEXT NOT NULL,
    actor_id TEXT NOT NULL,
    idempotency_key TEXT,
    payload JSONB NOT NULL,
    occurred_at TIMESTAMPTZ NOT NULL,
    recorded_at TIMESTAMPTZ NOT NULL,
    UNIQUE (run_id, idempotency_key)
);

23. Transition service

The service layer performs a small piece of orchestration:

receive event
load run with version
check idempotency
validate actor/lease
validate transition
validate guard
write event
update run
commit transaction
emit domain event
trigger side effect

Pseudo-code:

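public TransitionResult applyEvent(RunEvent event) {
    return transaction.execute(() -> {
        // Lock the run row so concurrent events for the same run are serialized.
        Run run = runRepository.getForUpdate(event.runId());

        // Replayed event: return the original result instead of transitioning again.
        IdempotencyHit hit = eventRepository.findByIdempotencyKey(
            event.runId(),
            event.idempotencyKey()
        );
        if (hit.exists()) {
            return hit.previousResult();
        }

        // Reject events from workers that no longer hold the lease.
        leaseValidator.validate(event, run);

        RunStatus previous = run.status();
        RunStatus next = stateMachine.next(previous, event.type());
        transitionGuard.validate(run, event, next);

        RunEventRecord record = RunEventRecord.from(run, event, next);
        eventRepository.insert(record);

        run.transitionTo(next, event.reason());
        runRepository.save(run);

        // Same transaction as the state change: the outbox row commits or rolls back with it.
        outboxRepository.insert(DomainEvent.from(record));

        // Report the real from -> to pair; run.status() would already return next here.
        return TransitionResult.of(run.id(), previous, next);
    });
}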

Use a transactional outbox so that external events are not lost after the database commit.
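On the other side of outboxRepository.insert sits a relay that publishes pending rows. The sketch below uses a hypothetical in-memory InMemoryOutbox. It publishes the oldest entry first and deletes an entry only after the publish call has returned. A failed publish leaves the entry in place for the next relay pass. A crash between publish and delete causes a re-send, so delivery is at-least-once and consumers should deduplicate by entry id.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

record OutboxEntry(long id, String eventType, String payloadJson) {}

// In-memory stand-in for an outbox table. In a real system insert() runs in
// the same DB transaction as the run_events and runs writes.
final class InMemoryOutbox {
    private final Deque<OutboxEntry> pending = new ArrayDeque<>();
    private long nextId = 1;

    synchronized void insert(String eventType, String payloadJson) {
        pending.addLast(new OutboxEntry(nextId++, eventType, payloadJson));
    }

    // Relay loop: publish oldest first, remove only after publish returned.
    // If publish throws, the entry stays pending for the next relay pass.
    synchronized int relay(Consumer<OutboxEntry> publisher) {
        int published = 0;
        while (!pending.isEmpty()) {
            OutboxEntry head = pending.peekFirst();
            publisher.accept(head);
            pending.removeFirst();
            published++;
        }
        return published;
    }

    synchronized int pendingCount() {
        return pending.size();
    }
}
```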


24. Domain events from transitions

Every significant transition produces a domain event.

Examples:

RunQueued
RunLeased
RunPreparing
RunContextReady
RunPlanCreated
RunPatchReady
RunVerificationStarted
RunVerificationPassed
RunJudgePassed
RunApprovalRequired
RunPrCreating
RunPrCreated
RunSucceeded
RunFailed
RunCancelled
RunExpired

Domain events are consumed by:

  • notification;
  • metrics;
  • audit;
  • UI timeline;
  • artifact processor;
  • scheduler;
  • PR orchestrator;
  • cleanup worker.

25. UI timeline from the state machine

The state machine translates directly into a user-facing timeline.

Example:

10:00 Task accepted
10:01 Worker acquired run
10:01 Sandbox prepared
10:02 Repository context built
10:03 Plan created
10:04 Edited 6 files
10:06 Compile failed; agent repaired import
10:08 Tests passed
10:09 Judge approved patch
10:10 Pull request created

This is more useful than 20,000 lines of raw log.
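Rendering the event log into such a timeline can be sketched in a few lines. This minimal sketch assumes a hypothetical `TimelineEvent` record with a time, an event type, and an optional free-text detail. The labels are illustrative. A detail such as "Edited 6 files" replaces the generic label. An unknown event type falls back to its raw name rather than being dropped, so the timeline never silently hides a transition.

```java
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class TimelineRenderer {
    // Hypothetical event-log entry; detail may be null.
    record TimelineEvent(LocalTime at, String type, String detail) {}

    private static final DateTimeFormatter HH_MM = DateTimeFormatter.ofPattern("HH:mm");

    // Illustrative user-facing labels for a few domain event types.
    private static final Map<String, String> LABELS = Map.of(
        "RunQueued", "Task accepted",
        "RunLeased", "Worker acquired run",
        "RunContextReady", "Repository context built",
        "RunPlanCreated", "Plan created",
        "RunVerificationPassed", "Tests passed",
        "RunJudgePassed", "Judge approved patch",
        "RunPrCreated", "Pull request created");

    // Events are rendered in log order; a non-blank detail wins over the generic label.
    static List<String> render(List<TimelineEvent> events) {
        return events.stream()
            .map(e -> {
                String text = (e.detail() != null && !e.detail().isBlank())
                    ? e.detail()
                    : LABELS.getOrDefault(e.type(), e.type());
                return e.at().format(HH_MM) + " " + text;
            })
            .collect(Collectors.toList());
    }
}
```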


26. Mermaid sequence for a successful run


27. Mermaid sequence for the verifier repair loop

The core of the Honk-like design is this feedback loop. The agent is not trusted just because it wrote a patch. The patch must be tested, and the feedback is returned to the agent in a structured form.
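The loop can be sketched as a bounded retry. This minimal sketch assumes hypothetical `Agent` and `Verifier` interfaces and a caller-chosen repair budget. The verifier returns structured findings rather than raw logs, and those findings become the input of the next execution round. When the budget is exhausted, the run fails instead of looping forever.

```java
import java.util.List;

final class RepairLoop {
    // Hypothetical structured verifier output: which check failed and why.
    record Finding(String check, String message) {}
    record Report(boolean passed, List<Finding> findings) {}

    interface Agent { String execute(List<Finding> feedback); } // hypothetical; returns a patch id
    interface Verifier { Report verify(String patch); }         // hypothetical

    enum Outcome { JUDGING, FAILED }

    // One initial execution plus at most maxRepairs repair rounds.
    static Outcome run(Agent agent, Verifier verifier, int maxRepairs) {
        List<Finding> feedback = List.of();
        for (int round = 0; round <= maxRepairs; round++) {
            String patch = agent.execute(feedback); // EXECUTING -> PATCH_READY
            Report report = verifier.verify(patch); // PATCH_READY -> VERIFYING
            if (report.passed()) {
                return Outcome.JUDGING;             // VERIFYING -> JUDGING
            }
            feedback = report.findings();           // VERIFYING -> EXECUTING with structured feedback
        }
        return Outcome.FAILED;                      // repair budget exhausted
    }
}
```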


28. State machine anti-patterns

28.1 Boolean soup

is_running
is_done
is_failed
is_waiting
is_pr_created
is_cancelled

This breaks down quickly. Nothing prevents is_running and is_cancelled from both being true.

Use a status enum plus an event log.
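The replacement for the flags can be sketched as one status field with derived predicates. Because every predicate is computed from the same field, the answers can never contradict each other. The grouping into running, waiting, and terminal states is an assumption based on the statuses used in this part. is_pr_created is different: it is a historical fact, so this hypothetical sketch answers it from the status history (the event log) rather than from the current status.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

final class StatusInsteadOfBooleans {
    enum RunStatus {
        CREATED, QUEUED, LEASED, PREPARING, CONTEXT_BUILDING, PLANNING, EXECUTING,
        PATCH_READY, VERIFYING, JUDGING, WAITING_FOR_APPROVAL, PR_CREATING, PR_CREATED,
        SUCCEEDED, FAILED, CANCELLED, EXPIRED
    }

    private static final Set<RunStatus> TERMINAL =
        EnumSet.of(RunStatus.SUCCEEDED, RunStatus.FAILED, RunStatus.CANCELLED, RunStatus.EXPIRED);

    // Replaces is_done / is_failed / is_cancelled with one question about a single field.
    static boolean isTerminal(RunStatus s) { return TERMINAL.contains(s); }

    // Replaces is_waiting.
    static boolean isWaitingForHuman(RunStatus s) { return s == RunStatus.WAITING_FOR_APPROVAL; }

    // Replaces is_running; derived, so it is never true together with a terminal state.
    static boolean isRunning(RunStatus s) {
        return !isTerminal(s) && !isWaitingForHuman(s)
            && s != RunStatus.CREATED && s != RunStatus.QUEUED;
    }

    // A SUCCEEDED run may or may not have a PR, so this reads the status history, not the status.
    static boolean prWasCreated(List<RunStatus> statusHistory) {
        return statusHistory.contains(RunStatus.PR_CREATED);
    }
}
```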

28.2 Terminal states that can be reopened

FAILED -> RUNNING

Don't. Create a new run.
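"Create a new run" can be sketched as follows. The failed run is left untouched, so its audit trail is never rewritten. The new run links back to it and counts its position in the retry chain, which gives a natural retry budget. The `Run` record, its `retryOf`/`retryIndex` fields, and the rule that only FAILED or EXPIRED runs are retried are all assumptions for illustration.

```java
import java.util.Optional;
import java.util.UUID;

final class RetryAsNewRun {
    enum RunStatus { QUEUED, EXECUTING, SUCCEEDED, FAILED, CANCELLED, EXPIRED } // subset for the sketch

    // Hypothetical run record: retryOf points at the run this one retries.
    record Run(String id, String taskId, RunStatus status, Optional<String> retryOf, int retryIndex) {}

    // Never reopens the old run; returns a fresh QUEUED run linked to it.
    // Assumption: CANCELLED runs are not retried automatically.
    static Run retry(Run previous, int maxRetries) {
        if (previous.status() != RunStatus.FAILED && previous.status() != RunStatus.EXPIRED) {
            throw new IllegalStateException("only FAILED or EXPIRED runs are retried, got " + previous.status());
        }
        if (previous.retryIndex() >= maxRetries) {
            throw new IllegalStateException("retry budget exhausted for task " + previous.taskId());
        }
        return new Run(UUID.randomUUID().toString(), previous.taskId(), RunStatus.QUEUED,
                       Optional.of(previous.id()), previous.retryIndex() + 1);
    }
}
```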

28.3 States that are too granular

Don't turn every tool call into a Run status.

Bad:

READING_FILE
WRITING_FILE
RUNNING_GREP
RUNNING_MAVEN

Those are Steps, not Run states.
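Keeping tool calls at the Step level can be sketched like this. While the Run stays in EXECUTING, each tool call is appended as a Step record and the Run status does not change. The Run status answers "where is this run in its lifecycle". The Steps answer "what did the agent do". `StepKind`, `Step`, and `recordStep` are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class StepsUnderRun {
    enum RunStatus { EXECUTING, PATCH_READY }                 // subset for the sketch
    enum StepKind { READ_FILE, WRITE_FILE, GREP, RUN_MAVEN } // tool calls live here, not in RunStatus

    // Hypothetical step record; seq gives a stable order within the run.
    record Step(int seq, StepKind kind, String target) {}

    static final class Run {
        RunStatus status = RunStatus.EXECUTING;
        final List<Step> steps = new ArrayList<>();

        // Recording a tool call appends a Step and never touches the Run status.
        void recordStep(StepKind kind, String target) {
            steps.add(new Step(steps.size() + 1, kind, target));
        }
    }
}
```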

28.4 States that are too abstract

Bad:

IN_PROGRESS
DONE
ERROR

Too coarse for diagnosis and policy. ERROR alone does not tell you whether verification failed, the lease expired, or PR creation failed.

28.5 Letting the LLM decide the state

The LLM may produce output such as:

I think the patch is ready.

But it is the system that decides the transition:

EXECUTING -> PATCH_READY
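This split can be sketched with hypothetical inputs. The model's message is passed in but not used for the decision; a real system would only store it for the audit trail. The move to PATCH_READY is derived from facts the system checks itself. In this sketch those facts are the number of changed files in the sandbox and whether the diff applies. The exact checks are an assumption. The point is that the model's claim is never the guard.

```java
import java.util.Optional;

final class SystemDecidesTransition {
    enum RunStatus { EXECUTING, PATCH_READY } // subset for the sketch

    // Hypothetical facts the worker collects from the sandbox itself.
    record SandboxObservation(int changedFiles, boolean diffApplies) {}

    // llmMessage is deliberately not consulted for the decision.
    static Optional<RunStatus> nextAfterExecution(String llmMessage, SandboxObservation obs) {
        if (obs.changedFiles() > 0 && obs.diffApplies()) {
            return Optional.of(RunStatus.PATCH_READY);
        }
        // No verifiable patch: the run stays in EXECUTING, whatever the model claims.
        return Optional.empty();
    }
}
```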

28.6 No lease

Without a lease, distributed workers can pick up and execute the same run more than once.
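A lease check can be sketched with the fields listed in Step 4 of the implementation plan (worker_id, lease_token, heartbeat_at, expires_at). They are held on a hypothetical `Lease` record, and the current time is passed in so expiry is testable. Every acquisition issues a fresh token. A zombie worker that still holds an old token is therefore rejected once another worker has taken over, even if it uses the same worker id. The TTL value and the class shape are assumptions.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;
import java.util.UUID;

final class LeaseSketch {
    // Hypothetical lease record mirroring worker_id, lease_token, heartbeat_at, expires_at.
    record Lease(String workerId, String leaseToken, Instant heartbeatAt, Instant expiresAt) {}

    private final Duration ttl;
    private Lease current; // null while the run is not leased

    LeaseSketch(Duration ttl) { this.ttl = ttl; }

    // Acquire only if there is no lease or the previous one has expired; always issue a new token.
    synchronized Optional<Lease> acquire(String workerId, Instant now) {
        if (current != null && now.isBefore(current.expiresAt())) {
            return Optional.empty();
        }
        current = new Lease(workerId, UUID.randomUUID().toString(), now, now.plus(ttl));
        return Optional.of(current);
    }

    // A heartbeat extends the lease, but only for the current, unexpired token.
    synchronized boolean heartbeat(String leaseToken, Instant now) {
        if (!isValid(leaseToken, now)) {
            return false;
        }
        current = new Lease(current.workerId(), leaseToken, now, now.plus(ttl));
        return true;
    }

    // Checked before any transition requested by a worker is applied.
    synchronized boolean isValid(String leaseToken, Instant now) {
        return current != null
            && current.leaseToken().equals(leaseToken)
            && now.isBefore(current.expiresAt());
    }
}
```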

28.7 No idempotency

Without idempotency, a retried API call can produce duplicate events, duplicate PRs, or corrupted state.
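Idempotency at the PR boundary can be sketched with a hypothetical `PrClient` and an idempotency key chosen by the caller, for example derived from the run id. A retried request with the same key returns the PR number recorded the first time instead of opening a second PR. In production the key-to-result map would be a table with a unique constraint, like the UNIQUE (run_id, idempotency_key) constraint on run_events.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class IdempotentPrCreation {
    interface PrClient { int openPullRequest(String branch); } // hypothetical; returns the PR number

    private final ConcurrentMap<String, Integer> results = new ConcurrentHashMap<>();
    private final PrClient client;

    IdempotentPrCreation(PrClient client) { this.client = client; }

    // ConcurrentHashMap.computeIfAbsent applies the function at most once per key;
    // if the call throws, nothing is recorded and a later retry may try again.
    int createPr(String idempotencyKey, String branch) {
        return results.computeIfAbsent(idempotencyKey, k -> client.openPullRequest(branch));
    }
}
```

This only deduplicates within one process. A crash between opening the PR and recording the key can still produce a duplicate, so a real implementation would persist the key and could also check for an existing PR on the branch first.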


29. Exercise: state design for an analysis-only task

Not every task produces a PR.

Task:

Analyze all usages of deprecated API X and produce migration report.

Completion mode:

completionMode: analysis_report_created

Run flow:

CREATED
QUEUED
LEASED
PREPARING
CONTEXT_BUILDING
PLANNING
EXECUTING
JUDGING
SUCCEEDED

Not needed:

PATCH_READY
VERIFYING
PR_CREATING
PR_CREATED

Lesson: a state machine can have optional paths. Don't force every task through a PR.
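An optional path can be sketched as a branch on the task's completion mode. `ANALYSIS_REPORT_CREATED` mirrors the completionMode above. `PULL_REQUEST_CREATED` is an assumed name for the code-change path. EXECUTING skips PATCH_READY when no patch is expected. After the judge passes, the same JUDGING state leads straight to SUCCEEDED for analysis tasks and to PR_CREATING for code-change tasks.

```java
final class CompletionModePaths {
    enum RunStatus { EXECUTING, PATCH_READY, JUDGING, PR_CREATING, SUCCEEDED } // subset for the sketch
    enum CompletionMode { ANALYSIS_REPORT_CREATED, PULL_REQUEST_CREATED }       // second name is hypothetical

    // Analysis tasks produce a report, not a patch, so they skip PATCH_READY and VERIFYING.
    static RunStatus afterExecuting(CompletionMode mode) {
        return mode == CompletionMode.ANALYSIS_REPORT_CREATED ? RunStatus.JUDGING : RunStatus.PATCH_READY;
    }

    // Exhaustive switch: adding a new mode forces this decision to be revisited.
    static RunStatus afterJudgePassed(CompletionMode mode) {
        return switch (mode) {
            case ANALYSIS_REPORT_CREATED -> RunStatus.SUCCEEDED;
            case PULL_REQUEST_CREATED -> RunStatus.PR_CREATING;
        };
    }
}
```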


30. Exercise: state design for a high-risk schema migration

Task:

Migrate payment schema and update service code.

Flow:

CREATED
QUEUED
LEASED
PREPARING
CONTEXT_BUILDING
PLANNING
WAITING_FOR_APPROVAL
EXECUTING
PATCH_READY
VERIFYING
JUDGING
WAITING_FOR_APPROVAL
PR_CREATING
PR_CREATED
SUCCEEDED

There are two approvals:

  1. plan approval before execution;
  2. patch approval before the PR.

For a high-risk task, approval is not an interruption. It is a control point.
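WAITING_FOR_APPROVAL appears twice in this flow, so the run has to remember which gate it is waiting at. Otherwise an approval cannot tell whether to resume into EXECUTING or PR_CREATING. This minimal sketch stores a hypothetical `ApprovalGate` field on the run. Ending the run as FAILED on rejection is also an assumption made for illustration.

```java
final class ApprovalGates {
    enum RunStatus { PLANNING, WAITING_FOR_APPROVAL, EXECUTING, JUDGING, PR_CREATING, FAILED } // subset
    enum ApprovalGate { PLAN, PATCH } // hypothetical

    static final class Run {
        RunStatus status;
        ApprovalGate gate; // set only while WAITING_FOR_APPROVAL

        Run(RunStatus status) { this.status = status; }

        // The gate is derived from where the run came from.
        void requestApproval() {
            gate = switch (status) {
                case PLANNING -> ApprovalGate.PLAN;
                case JUDGING -> ApprovalGate.PATCH;
                default -> throw new IllegalStateException("no approval gate after " + status);
            };
            status = RunStatus.WAITING_FOR_APPROVAL;
        }

        // Approval resumes at the step the gate protects; rejection ends the run (assumption).
        void decide(boolean approved) {
            if (status != RunStatus.WAITING_FOR_APPROVAL) {
                throw new IllegalStateException("not waiting for approval");
            }
            if (!approved) {
                status = RunStatus.FAILED;
            } else {
                status = (gate == ApprovalGate.PLAN) ? RunStatus.EXECUTING : RunStatus.PR_CREATING;
            }
            gate = null;
        }
    }
}
```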


31. Minimal implementation plan

For the first implementation, don't jump straight to a complex workflow engine.

Build it incrementally:

Step 1: Enum and transition validator

Implement RunStatus, RunEventType, and RunStateMachine.
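The Step 1 validator can be sketched as an `EnumMap` from each status to its allowed next statuses. The edges come from the flows in this part, including the verifier and judge repair loops, the approval gates, and the analysis-only path. Making FAILED, CANCELLED, and EXPIRED reachable from every non-terminal state is an assumption. RunEventType is left out for brevity, so this sketch validates status pairs rather than events. Terminal states have no outgoing edges, so FAILED -> anything is rejected by construction.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

final class RunStateMachine {
    enum RunStatus {
        CREATED, QUEUED, LEASED, PREPARING, CONTEXT_BUILDING, PLANNING, EXECUTING,
        PATCH_READY, VERIFYING, JUDGING, WAITING_FOR_APPROVAL, PR_CREATING, PR_CREATED,
        SUCCEEDED, FAILED, CANCELLED, EXPIRED
    }

    private static final Set<RunStatus> TERMINAL =
        EnumSet.of(RunStatus.SUCCEEDED, RunStatus.FAILED, RunStatus.CANCELLED, RunStatus.EXPIRED);
    private static final Map<RunStatus, Set<RunStatus>> ALLOWED = new EnumMap<>(RunStatus.class);

    static {
        allow(RunStatus.CREATED, RunStatus.QUEUED);
        allow(RunStatus.QUEUED, RunStatus.LEASED);
        allow(RunStatus.LEASED, RunStatus.PREPARING);
        allow(RunStatus.PREPARING, RunStatus.CONTEXT_BUILDING);
        allow(RunStatus.CONTEXT_BUILDING, RunStatus.PLANNING);
        allow(RunStatus.PLANNING, RunStatus.EXECUTING, RunStatus.WAITING_FOR_APPROVAL);
        allow(RunStatus.EXECUTING, RunStatus.PATCH_READY, RunStatus.JUDGING); // JUDGING: analysis-only path
        allow(RunStatus.PATCH_READY, RunStatus.VERIFYING);
        allow(RunStatus.VERIFYING, RunStatus.EXECUTING, RunStatus.JUDGING);   // repair loop or pass
        allow(RunStatus.JUDGING, RunStatus.EXECUTING, RunStatus.WAITING_FOR_APPROVAL,
              RunStatus.PR_CREATING, RunStatus.SUCCEEDED);
        allow(RunStatus.WAITING_FOR_APPROVAL, RunStatus.EXECUTING, RunStatus.PR_CREATING);
        allow(RunStatus.PR_CREATING, RunStatus.PR_CREATED);
        allow(RunStatus.PR_CREATED, RunStatus.SUCCEEDED);
        // Assumption: any non-terminal state may fail, be cancelled, or expire.
        for (RunStatus s : RunStatus.values()) {
            if (!TERMINAL.contains(s)) {
                allow(s, RunStatus.FAILED, RunStatus.CANCELLED, RunStatus.EXPIRED);
            }
        }
    }

    private static void allow(RunStatus from, RunStatus... targets) {
        Set<RunStatus> set = ALLOWED.computeIfAbsent(from, k -> EnumSet.noneOf(RunStatus.class));
        for (RunStatus t : targets) {
            set.add(t);
        }
    }

    static boolean canTransition(RunStatus from, RunStatus to) {
        return ALLOWED.getOrDefault(from, Set.of()).contains(to);
    }

    // Server-side guard: an illegal transition is rejected, never silently applied.
    static RunStatus validate(RunStatus from, RunStatus to) {
        if (!canTransition(from, to)) {
            throw new IllegalStateException("illegal transition " + from + " -> " + to);
        }
        return to;
    }

    static boolean isTerminal(RunStatus s) { return TERMINAL.contains(s); }
}
```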

Step 2: DB tables runs and run_events

Make sure every transition is recorded.

Step 3: Internal transition API endpoint

Workers do not update the status directly. They send events.

Step 4: Simple lease

Add worker_id, lease_token, heartbeat_at, and expires_at.

Step 5: Worker loop

The worker picks up a queued run, advances its state, and runs a dummy execution.

Step 6: Verifier loop

Add the path PATCH_READY -> VERIFYING -> EXECUTING/JUDGING.

Step 7: PR orchestration

Add the path JUDGING -> PR_CREATING -> PR_CREATED -> SUCCEEDED.

Step 8: UI timeline

Render the event log as a timeline.


32. State machine design checklist

Before moving on to API design, make sure these questions have clear answers:

  • What is the difference between Task, Run, Attempt, and Step?
  • What are the terminal states of a Run?
  • Are the terminal states truly final?
  • Who is allowed to trigger each transition?
  • Are transitions validated server-side?
  • Does the worker hold a lease?
  • Are zombie workers rejected?
  • Are events idempotent?
  • Are failures classified?
  • Do retries have a budget?
  • Is cancellation cooperative?
  • Can a verifier failure send the run back to execution?
  • Can the judge request a fix?
  • Can a high-risk task stop at an approval gate?
  • Is the PR lifecycle separate from the Run lifecycle?
  • Can the UI explain what happened without anyone reading raw logs?

33. Summary

The state machine is the backbone of the agent platform.

For a Honk-like AI coding agent, we must not build a system that can only say:

running / done / failed

We need a lifecycle that models what actually happens:

CREATED
QUEUED
LEASED
PREPARING
CONTEXT_BUILDING
PLANNING
EXECUTING
PATCH_READY
VERIFYING
JUDGING
WAITING_FOR_APPROVAL
PR_CREATING
PR_CREATED
SUCCEEDED
FAILED
CANCELLED
EXPIRED

A good state machine provides:

  • control;
  • auditability;
  • retryability;
  • cancellation safety;
  • a verifier feedback loop;
  • a human approval gate;
  • safe PR orchestration;
  • a foundation for the API and the database.

The next part builds an OpenAPI-first API on top of this state machine. That API becomes the official contract between the UI, CLI, scheduler, worker, verifier, judge, and integration layer.


References

  • Spotify Engineering — Background coding agents and feedback loops for predictable results: https://engineering.atspotify.com/2025/12/feedback-loops-background-coding-agents-part-3
  • OpenAI Codex sandboxing concept: https://developers.openai.com/codex/concepts/sandboxing
  • OpenAI Codex cloud tasks and pull request workflow: https://developers.openai.com/codex/cloud
  • Model Context Protocol specification: https://modelcontextprotocol.io/specification/2025-06-18