Learn AI Coding Agent Part 010: Project Skeleton and Repository Layout

title: Learn AI Coding Agent From Scratch - Part 010
description: Project skeleton and repository layout for building a Honk-like AI coding agent step by step: monorepo structure, module boundaries, control plane, worker, sandbox, agent runtime, tool runtime, verifier, judge, git integration, persistence, telemetry, and local development workflow.
series: learn-ai-coding-agent
seriesTitle: Learn AI Coding Agent From Scratch
order: 10
partTitle: Project Skeleton and Repository Layout
tags:
  • ai-coding-agent
  • project-structure
  • monorepo
  • architecture
  • worker
  • sandbox
  • local-development
date: 2026-07-03

Part 010 — Project Skeleton and Repository Layout

The previous part walked through the end-to-end flow. Now we turn that flow into a project structure we can build.

The core question of this part:

How do we organize the repository for a Honk-like AI coding agent so that it does not become one big script that is hard to secure, test, and evolve?

We will not write the full implementation yet. We will lay out a skeleton concrete enough that the following parts can go straight into the domain model, API, database schema, queue, worker, sandbox, tool runtime, agent loop, verifier, judge, and PR orchestration.

The goal is not pretty folders. The goal is responsibility boundaries that hold up under complexity.

A production-grade AI coding agent has many boundaries:

  • control plane vs execution plane;
  • API request vs durable task;
  • orchestrator vs worker;
  • sandbox host vs sandbox guest;
  • LLM runtime vs tool runtime;
  • tool permission vs tool implementation;
  • verifier vs judge;
  • patch generation vs PR creation;
  • raw logs vs summarized feedback;
  • policy decision vs model suggestion.

If everything is mixed into one service, the agent will quickly feel like it "works", but it becomes hard to control once something fails.


1. Design goal skeleton

This skeleton must support five stages of evolution:

Stage 1: local patch-only agent
Stage 2: local branch + commit agent
Stage 3: background worker + durable run
Stage 4: PR-creating supervised agent
Stage 5: fleet-wide governed code-change platform

So the repo structure must be modular enough from the start, but not overly abstract.

The principle:

Start modular, not distributed.

This means we can start with one repository and a few local processes. Do not split into 12 microservices before the boundaries have proven themselves.

A good skeleton must make it possible to:

  • run the agent from the CLI;
  • run the API server;
  • run the worker;
  • run a local sandbox;
  • swap the LLM provider;
  • swap the git provider;
  • add verifiers;
  • add tools;
  • store run traces;
  • replay a run from artifacts;
  • test domain logic without calling an LLM.

2. Technology choice for this series

Because this series focuses on systems rather than framework hype, the technology choices should be pragmatic.

We will design against a conceptual stack like this:

language:
  primary: TypeScript or Java
  examples: mostly TypeScript-style pseudocode and Java-case verifiers
runtime:
  api: HTTP/OpenAPI
  worker: queue consumer
  persistence: PostgreSQL
  queue: Redis Streams or PostgreSQL-backed queue for prototype, Kafka/NATS/SQS for production option
sandbox:
  local: Docker/Podman process isolation
  production: container runtime / Kubernetes job / Firecracker-like boundary depending on risk
llm:
  provider abstraction: OpenAI, Anthropic, local-compatible endpoint
integration:
  git: GitHub first, generic provider abstraction later
  mcp: custom MCP servers for verifier and repository context later

Why not lock in a single language?

Because AI coding agent architecture is not a language problem. The repository skeleton still has to be concrete, though. In this series we use a structure that translates easily to Java, Node.js, or Go.

For the implementation that is quickest to follow, TypeScript is a good fit because:

  • the CLI ecosystem is strong;
  • JSON Schema and tool calling are straightforward;
  • HTTP/OpenAPI is ergonomic;
  • it suits orchestration;
  • command and file tool wrappers are easy to write.

For the verifier parts and the Java case study, we still use a Maven/Java repository as the target the agent modifies.


3. Top-level monorepo layout

Initial skeleton:

learn-ai-coding-agent/
  README.md
  LICENSE
  package.json
  pnpm-workspace.yaml
  tsconfig.base.json
  .env.example
  .gitignore
  AGENTS.md

  apps/
    api/
    worker/
    cli/
    web/

  packages/
    domain/
    contracts/
    persistence/
    policy/
    orchestrator/
    sandbox/
    agent-runtime/
    tool-runtime/
    tools/
    verifier/
    judge/
    git-provider/
    telemetry/
    config/
    testkit/

  mcp-servers/
    verifier-server/
    repo-context-server/

  examples/
    target-repos/
      java-maven-service/
      node-service/
    tasks/
    policies/
    prompts/

  infra/
    docker/
    compose/
    migrations/
    k8s/

  docs/
    architecture/
    adr/
    runbooks/

  scripts/
    dev/
    ci/
    sandbox/

Dependency rule:

Domain has no infrastructure dependency.
Policy depends on domain, not worker.
Agent runtime depends on tool runtime, not concrete shell internals.
Verifier depends on sandbox execution, not PR creation.
Git provider depends on domain contracts, not LLM runtime.

4. Why monorepo first

Prematurely decomposing the system into distributed services slows down learning.

For a coding agent, the hardest part is not HTTP routing. The hardest parts are:

  • state machine;
  • sandbox safety;
  • tool semantics;
  • context selection;
  • verifier feedback;
  • traceability;
  • idempotency;
  • policy boundaries;
  • failure recovery.

A monorepo lets us see every boundary without added network complexity.

But a monorepo does not mean any code may import from anywhere.

We use package boundaries to simulate service boundaries.

Rule:

A package boundary is a future service boundary candidate.

If a package does not have a clear API, do not turn it into a microservice.


5. apps/api

Path:

apps/api/
  src/
    main.ts
    routes/
      tasks.routes.ts
      runs.routes.ts
      artifacts.routes.ts
      health.routes.ts
    middleware/
      auth.middleware.ts
      request-id.middleware.ts
      error.middleware.ts
    openapi/
      openapi.yaml
  test/

API responsibilities:

  • accept tasks;
  • read run status;
  • read artifacts;
  • cancel runs;
  • receive webhooks from the git provider;
  • expose a health endpoint.

The API must not run the agent directly.

Wrong:

POST /tasks clones the repo and calls the LLM directly.

Right:

POST /tasks validates and persists task, then enqueues run.

API boundary:

API owns request validation.
Orchestrator owns workflow transition.
Worker owns execution.
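
As a rough illustration of that boundary, the sketch below shows a handler body for POST /tasks that only validates, persists, and enqueues. The in-memory store and queue, the `submitTask` name, the `TaskInput` shape, and the 202/400 status codes are assumptions made for this example, not the series' actual API.

```typescript
// Hypothetical shapes; the real task contract is defined in a later part.
type TaskInput = { repo: string; instruction: string };
type QueuedRun = { runId: string; taskId: string };

class InMemoryTaskStore {
  readonly tasks = new Map<string, TaskInput>();
  save(taskId: string, task: TaskInput): void {
    this.tasks.set(taskId, task);
  }
}

class InMemoryQueue {
  readonly items: QueuedRun[] = [];
  enqueue(run: QueuedRun): void {
    this.items.push(run);
  }
}

let nextId = 0; // sequential ids keep the sketch deterministic

type SubmitResult =
  | { status: 202; taskId: string; runId: string }
  | { status: 400; error: string };

// Validate, persist, enqueue. No clone, no sandbox, no LLM call happens here.
function submitTask(
  body: Partial<TaskInput>,
  store: InMemoryTaskStore,
  queue: InMemoryQueue,
): SubmitResult {
  if (!body.repo || !body.instruction) {
    return { status: 400, error: "repo and instruction are required" };
  }
  nextId += 1;
  const taskId = `task_${nextId}`;
  const runId = `run_${nextId}`;
  store.save(taskId, { repo: body.repo, instruction: body.instruction });
  queue.enqueue({ runId, taskId });
  return { status: 202, taskId, runId };
}
```

The handler returns as soon as the run is queued; everything slow or risky happens later in the worker.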

Initial endpoints:

POST /tasks
GET /tasks/{taskId}
POST /tasks/{taskId}/runs
GET /runs/{runId}
POST /runs/{runId}/cancel
GET /runs/{runId}/artifacts
GET /runs/{runId}/logs

6. apps/worker

Path:

apps/worker/
  src/
    main.ts
    worker.ts
    handlers/
      run-task.handler.ts
      verifier.handler.ts
      pr.handler.ts
    lifecycle/
      graceful-shutdown.ts
      heartbeat.ts
  test/

The worker is the execution coordinator.

The worker is responsible for:

  • taking a run from the queue;
  • creating the sandbox;
  • checking out the repository;
  • invoking the agent runtime;
  • running verifiers;
  • invoking the judge;
  • creating the branch/commit/PR if allowed;
  • storing artifacts;
  • updating state.

The worker must be idempotent.

That is, if a worker crashes and another worker picks up the run, the system must not create a duplicate PR or lose artifacts.

The worker must not keep important state only in memory.

Minimal heartbeat:

worker:
  id: worker-7
  runId: run_01J
  state: running
  heartbeatAt: 2026-07-03T13:45:00+07:00

If the heartbeat goes missing:

orchestrator marks run as stale -> cleanup sandbox -> retry if retryable
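
One way to detect a missing heartbeat is to compare each run's last heartbeat against a timeout. The sketch below assumes a hypothetical `WorkerHeartbeat` record and `findStaleRuns` helper; the 60-second timeout in the usage note is an arbitrary example value.

```typescript
interface WorkerHeartbeat {
  workerId: string;
  runId: string;
  heartbeatAt: string; // ISO 8601 timestamp, as in the heartbeat record above
}

// Returns the run ids whose last heartbeat is older than timeoutMs at time `now`.
function findStaleRuns(
  heartbeats: WorkerHeartbeat[],
  now: Date,
  timeoutMs: number,
): string[] {
  return heartbeats
    .filter((hb) => now.getTime() - Date.parse(hb.heartbeatAt) > timeoutMs)
    .map((hb) => hb.runId);
}
```

For example, `findStaleRuns(rows, new Date(), 60_000)` would list the runs that the orchestrator should mark stale, clean up, and possibly retry.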

7. apps/cli

Path:

apps/cli/
  src/
    main.ts
    commands/
      run-local.ts
      submit-task.ts
      inspect-run.ts
      replay-run.ts
      verify.ts
      diff.ts

The CLI matters for learning and debugging.

Initial commands:

agent run-local \
  --repo ./examples/target-repos/java-maven-service \
  --task ./examples/tasks/auth-client-upgrade.yaml

agent submit-task ./examples/tasks/auth-client-upgrade.yaml

agent inspect-run run_01J

agent replay-run run_01J --from-artifacts

The CLI lets us build the agent before the production API/worker matures.

Rule:

The CLI should call the same orchestration use cases as API/worker.

Do not keep separate logic for the CLI and the worker. Otherwise local and background behavior will diverge.


8. apps/web

Path:

apps/web/
  src/
    pages/
      tasks/
      runs/
      artifacts/
    components/
      RunTimeline.tsx
      DiffViewer.tsx
      ToolCallList.tsx
      VerifierPanel.tsx
      JudgeReport.tsx

The web UI is not an early priority, but it matters for production.

Developers need to see:

  • run timeline;
  • current state;
  • changed files;
  • tool calls;
  • raw and summarized logs;
  • verifier results;
  • judge verdict;
  • PR link;
  • cost;
  • retry history.

For a prototype, the CLI is enough. For a platform, a UI increases trust.

Trust does not come only from the resulting PR. Trust comes from being able to answer:

What did the agent do?
Why did it do that?
What evidence does it have?
Where did the agent fail?
What can I audit?

9. packages/domain

Path:

packages/domain/
  src/
    ids.ts
    task.ts
    run.ts
    state-machine.ts
    artifact.ts
    patch.ts
    verifier.ts
    judge.ts
    policy.ts
    errors.ts
  test/

The domain is the core vocabulary of the system.

It defines:

  • Task;
  • TaskContract;
  • Run;
  • RunState;
  • RunStep;
  • ToolCall;
  • Artifact;
  • PatchCandidate;
  • VerifierResult;
  • JudgeVerdict;
  • PolicyDecision.

The domain must not import PostgreSQL, Docker, the GitHub SDK, or an LLM SDK.

The domain must be fast to test:

pnpm test packages/domain

A compact domain model example:

export type RunState =
  | 'submitted'
  | 'normalizing'
  | 'policy_checking'
  | 'queued'
  | 'preparing_sandbox'
  | 'running'
  | 'verifying'
  | 'repairing'
  | 'judging'
  | 'approval_required'
  | 'creating_pr'
  | 'completed'
  | 'failed'
  | 'blocked';

State transitions must be tested in the domain package, not scattered across the worker.
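
A transition table is one simple way to make those rules testable in isolation. The sketch below repeats the `RunState` union so it is self-contained; the specific allowed transitions, and the `canTransition` and `transition` helpers, are illustrative assumptions rather than the final state machine of the series.

```typescript
type RunState =
  | 'submitted' | 'normalizing' | 'policy_checking' | 'queued'
  | 'preparing_sandbox' | 'running' | 'verifying' | 'repairing'
  | 'judging' | 'approval_required' | 'creating_pr'
  | 'completed' | 'failed' | 'blocked';

// Assumed transition table; terminal states have no outgoing edges.
const allowedTransitions: Record<RunState, RunState[]> = {
  submitted: ['normalizing'],
  normalizing: ['policy_checking', 'failed'],
  policy_checking: ['queued', 'approval_required', 'blocked'],
  queued: ['preparing_sandbox'],
  preparing_sandbox: ['running', 'failed'],
  running: ['verifying', 'failed'],
  verifying: ['repairing', 'judging', 'failed'],
  repairing: ['verifying', 'failed'],
  judging: ['approval_required', 'creating_pr', 'completed', 'failed'],
  approval_required: ['queued', 'creating_pr', 'blocked'],
  creating_pr: ['completed', 'failed'],
  completed: [],
  failed: [],
  blocked: [],
};

function canTransition(from: RunState, to: RunState): boolean {
  return allowedTransitions[from].indexOf(to) !== -1;
}

// Throws instead of silently accepting an illegal move.
function transition(from: RunState, to: RunState): RunState {
  if (!canTransition(from, to)) {
    throw new Error(`illegal transition: ${from} -> ${to}`);
  }
  return to;
}
```

Because the table is plain data, a domain test can enumerate every pair of states without touching a database, a queue, or an LLM.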


10. packages/contracts

Path:

packages/contracts/
  openapi/
    agent-api.yaml
  schemas/
    task-contract.schema.json
    policy.schema.json
    run.schema.json
    artifact.schema.json
  src/
    generated/

Contracts holds the public and internal APIs and schemas.

Why keep it separate from the domain?

Because the internal domain model is not always the same as the API representation.

For example:

  • the internal RunState can include transient states;
  • the public API may expose only state groups;
  • internal policy reasons may contain sensitive data;
  • an external artifact URL may be a signed URL.

Rule:

Contracts define what crosses boundaries.
Domain defines what the system means internally.

For an agent platform, schemas matter because:

  • task input must be valid;
  • tool input must be valid;
  • verifier output must be valid;
  • artifact metadata must be valid;
  • run replay needs a stable format.

11. packages/persistence

Path:

packages/persistence/
  src/
    db.ts
    repositories/
      task.repository.ts
      run.repository.ts
      artifact.repository.ts
      tool-call.repository.ts
      policy.repository.ts
    transactions.ts
    migrations.ts
  test/

Persistence stores durable state.

Tables we will create later:

  • tasks;
  • runs;
  • run_steps;
  • tool_calls;
  • artifacts;
  • verifier_results;
  • judge_reports;
  • policy_decisions;
  • pr_records;
  • worker_heartbeats.

Persistence must not know the details of LLM prompts. It stores artifacts and events.

Repository methods should be use-case based:

createTask(task)
createRun(run)
transitionRunState(runId, expectedState, nextState)
appendToolCall(runId, toolCall)
storeArtifact(runId, artifact)
attachPullRequest(runId, prRecord)

Note expectedState.

State transitions must be optimistic and safe:

UPDATE runs
SET state = 'running'
WHERE id = ? AND state = 'queued'

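This prevents two workers from executing the same run: the UPDATE only matches while the run is still queued, so a second worker's attempt affects zero rows and that worker can back off.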
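
The same compare-and-set idea can be exercised without a database. The sketch below is an in-memory stand-in, not the real repository: `InMemoryRunStore` is a hypothetical class, and its boolean return value plays the role of the SQL statement's affected-row count.

```typescript
class InMemoryRunStore {
  private readonly states = new Map<string, string>();

  create(runId: string, state: string): void {
    this.states.set(runId, state);
  }

  // Mirrors "UPDATE ... WHERE id = ? AND state = ?":
  // returns true only when the run was in expectedState and got updated.
  transitionRunState(runId: string, expectedState: string, nextState: string): boolean {
    if (this.states.get(runId) !== expectedState) {
      return false;
    }
    this.states.set(runId, nextState);
    return true;
  }
}

const runs = new InMemoryRunStore();
runs.create('run_01J', 'queued');

// Two workers race to claim the same run; only the first claim succeeds.
const workerAClaimed = runs.transitionRunState('run_01J', 'queued', 'running');
const workerBClaimed = runs.transitionRunState('run_01J', 'queued', 'running');
```

A worker that gets `false` back should treat the run as already claimed and move on rather than retry the transition.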


12. packages/policy

Path:

packages/policy/
  src/
    policy-engine.ts
    rules/
      task-type.rules.ts
      repo-sensitivity.rules.ts
      path-scope.rules.ts
      autonomy.rules.ts
      user-permission.rules.ts
      cost.rules.ts
    risk-score.ts
  test/

The policy package answers:

  • whether the task may run;
  • which autonomy mode applies;
  • which tools are allowed;
  • which paths are allowed;
  • which verifiers are required;
  • which approval gates are needed.

Policy output must be explicit:

export interface PolicyDecision {
  decision: 'allowed' | 'blocked' | 'requires_approval';
  autonomyMode: 'analysis_only' | 'draft_only' | 'supervised_pr' | 'autonomous_pr';
  allowedTools: string[];
  blockedPaths: string[];
  requiredVerifiers: string[];
  reasons: string[];
}

Do not keep policy only in the prompt.

The prompt may say:

Do not modify deployment files.

But the policy runtime must enforce:

Reject patch if deployment files changed.
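
A runtime check of this kind can be a few lines of code. The sketch below assumes a hypothetical `findBlockedChanges` helper that treats each blocked path as a directory or exact file prefix; real policies may need globs or ownership data instead.

```typescript
// Returns the changed files that fall under any blocked path.
// A blocked path matches itself and anything beneath it as a directory.
function findBlockedChanges(changedFiles: string[], blockedPaths: string[]): string[] {
  return changedFiles.filter((file) =>
    blockedPaths.some((blocked) => {
      const dirPrefix = blocked.endsWith('/') ? blocked : `${blocked}/`;
      return file === blocked || file.startsWith(dirPrefix);
    }),
  );
}

const violations = findBlockedChanges(
  ['src/App.java', 'deploy/k8s/api.yaml', 'deployment-notes.md'],
  ['deploy'],
);
// A non-empty result means the patch must be rejected, whatever the prompt said.
```

Matching on `deploy/` rather than the bare string `deploy` keeps an unrelated file such as `deployment-notes.md` from being flagged.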

13. packages/orchestrator

Path:

packages/orchestrator/
  src/
    submit-task.usecase.ts
    start-run.usecase.ts
    execute-run.usecase.ts
    verify-run.usecase.ts
    judge-run.usecase.ts
    create-pr.usecase.ts
    cancel-run.usecase.ts
    retry-run.usecase.ts
    ports/
      task-store.ts
      queue.ts
      sandbox.ts
      agent-runtime.ts
      verifier.ts
      judge.ts
      git-provider.ts
  test/

The orchestrator is the application layer.

It connects the domain, policy, persistence, queue, worker, sandbox, agent runtime, verifier, judge, and git provider.

The orchestrator must not contain Docker, GitHub SDK, or OpenAI SDK implementations directly.

Use ports/interfaces:

export interface SandboxPort {
  prepare(input: PrepareSandboxInput): Promise<SandboxHandle>;
  cleanup(handle: SandboxHandle): Promise<void>;
}

export interface AgentRuntimePort {
  run(input: AgentRunInput): Promise<AgentRunOutput>;
}

Why?

Because later we want to:

  • test the orchestrator with a fake sandbox;
  • replay a run without calling the LLM;
  • swap providers;
  • run local mode without the API;
  • run the production worker with a real sandbox.

14. packages/sandbox

Path:

packages/sandbox/
  src/
    sandbox-manager.ts
    local-sandbox.ts
    docker-sandbox.ts
    workspace.ts
    command-runner.ts
    network-policy.ts
    filesystem-policy.ts
    cleanup.ts
  test/

The sandbox package manages the execution boundary.

Responsibilities:

  • creating the workspace;
  • cloning/checking out the repo;
  • running commands;
  • enforcing timeouts;
  • capturing stdout/stderr;
  • restricting paths;
  • cleaning up resources;
  • exporting artifacts.

The sandbox does not make agent decisions. It only executes the capabilities it is given.

Example interface:

export interface CommandResult {
  exitCode: number;
  stdout: string;
  stderr: string;
  durationMs: number;
  timedOut: boolean;
}

export interface Sandbox {
  readFile(path: string): Promise<string>;
  writeFile(path: string, content: string): Promise<void>;
  applyPatch(patch: string): Promise<PatchApplyResult>;
  runCommand(command: string[], options: CommandOptions): Promise<CommandResult>;
  diff(): Promise<string>;
  cleanup(): Promise<void>;
}

Security note:

Do not expose raw host shell to agent runtime.

The agent's shell.run tool must go through the sandbox, not an unrestricted child_process.exec from the API process.
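
Path restriction is one concrete piece of that boundary. The sketch below is a minimal, POSIX-style check written as a hypothetical `resolveInWorkspace` helper; it normalizes `.` and `..` segments by hand and does not handle symlinks, which a real sandbox would also need to consider.

```typescript
// Resolves a relative path against the workspace root and
// rejects absolute paths and any path that climbs out of the root.
function resolveInWorkspace(workspaceRoot: string, relativePath: string): string {
  if (relativePath.startsWith('/')) {
    throw new Error(`absolute path not allowed: ${relativePath}`);
  }
  const segments: string[] = [];
  for (const part of relativePath.split('/')) {
    if (part === '' || part === '.') continue;
    if (part === '..') {
      if (segments.length === 0) {
        throw new Error(`path escapes workspace: ${relativePath}`);
      }
      segments.pop();
    } else {
      segments.push(part);
    }
  }
  const root = workspaceRoot.replace(/\/+$/, ''); // drop trailing slashes
  return [root, ...segments].join('/');
}
```

With this in place, `readFile` and `writeFile` on the sandbox can resolve every path first and refuse anything that throws.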


15. packages/agent-runtime

Path:

packages/agent-runtime/
  src/
    agent-session.ts
    agent-loop.ts
    llm/
      llm-provider.ts
      openai-provider.ts
      anthropic-provider.ts
      local-provider.ts
    memory/
      session-memory.ts
      compaction.ts
    planning/
      planner.ts
      plan-artifact.ts
    prompts/
      system.prompt.md
      repair.prompt.md
      summarize-log.prompt.md
    stop-condition.ts
  test/

The agent runtime is the model interaction layer.

It is responsible for:

  • building the message sequence;
  • calling the LLM provider;
  • receiving tool call requests;
  • sending tool results;
  • managing the context window;
  • producing the plan artifact;
  • stopping the loop when the objective is met;
  • returning the patch candidate.

The agent runtime must not read files directly. It must go through the tool runtime.

Rule:

LLM talks to tools.
Tools talk to sandbox.
Sandbox talks to filesystem/commands.


16. packages/tool-runtime

Path:

packages/tool-runtime/
  src/
    tool-registry.ts
    tool-schema.ts
    tool-dispatcher.ts
    tool-permission.ts
    tool-result.ts
    tool-timeout.ts
    tool-audit.ts
  test/

The tool runtime is the gate between the model and side effects.

Responsibilities:

  • register tool;
  • validate input schema;
  • check permission;
  • dispatch tool;
  • enforce timeout;
  • sanitize output;
  • log tool call;
  • classify error.

The tool runtime must answer:

Can the agent call this tool with this input in this run state?

Example tool definition:

export interface ToolDefinition<I, O> {
  name: string;
  description: string;
  inputSchema: JsonSchema;
  risk: 'read' | 'write' | 'execute' | 'network';
  run(input: I, context: ToolContext): Promise<O>;
}

Do not make a tool too broad.

Bad:

tool: execute_anything(command: string)

Better:

file.read(path)
file.search(pattern)
file.applyPatch(diff)
shell.runAllowed(command, args)
git.diff()
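
To make the gate idea concrete, here is a small dispatcher sketch that refuses unknown tools and tools whose risk class is not allowed for the run. The `ToolDispatcher` class, the synchronous `run` signature, and the error codes are simplifications assumed for this example; schema validation, timeouts, and audit logging are left out.

```typescript
type Risk = 'read' | 'write' | 'execute' | 'network';

interface Tool {
  name: string;
  risk: Risk;
  run(input: Record<string, unknown>): unknown;
}

type DispatchResult =
  | { ok: true; output: unknown }
  | { ok: false; error: 'unknown_tool' | 'permission_denied' };

class ToolDispatcher {
  private readonly tools = new Map<string, Tool>();
  private readonly allowedRisks: Risk[];

  constructor(allowedRisks: Risk[]) {
    this.allowedRisks = allowedRisks;
  }

  register(tool: Tool): void {
    this.tools.set(tool.name, tool);
  }

  // The permission check runs before the tool body, so a denied call has no side effect.
  dispatch(name: string, input: Record<string, unknown>): DispatchResult {
    const tool = this.tools.get(name);
    if (!tool) return { ok: false, error: 'unknown_tool' };
    if (this.allowedRisks.indexOf(tool.risk) === -1) {
      return { ok: false, error: 'permission_denied' };
    }
    return { ok: true, output: tool.run(input) };
  }
}
```

An analysis-only run could construct the dispatcher with `['read']`, so a registered `shell.runAllowed` tool exists but can never execute in that run.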

17. packages/tools

Path:

packages/tools/
  src/
    file/
      read-file.tool.ts
      write-file.tool.ts
      apply-patch.tool.ts
      search.tool.ts
      list-files.tool.ts
    shell/
      run-command.tool.ts
    git/
      diff.tool.ts
      status.tool.ts
    context/
      repo-map.tool.ts
      symbol-search.tool.ts

Tools are the concrete capabilities.

Every tool must have:

  • an input schema;
  • an output schema;
  • a permission classification;
  • a timeout;
  • an output size limit;
  • an audit log;
  • tests.

Example of output size control:

if (stdout.length > maxOutputBytes) {
  return {
    truncated: true,
    summary: summarize(stdout),
    artifactRef: storeRawOutput(stdout)
  };
}

Why does this matter?

Because a command like mvn test can produce very long logs. If raw output goes straight into the context window, the agent wastes tokens and loses focus.
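
A self-contained variant of that idea keeps the head and tail of the output and drops the middle, since build logs often put the failure summary at the end. The `limitOutput` helper and `LimitedOutput` shape are hypothetical, and the limit is counted in string characters rather than bytes to keep the sketch simple; storing the raw output as an artifact is not shown.

```typescript
interface LimitedOutput {
  truncated: boolean;
  text: string;
  omittedChars: number;
}

// Keeps roughly maxChars characters: the first half and the last half of the output.
function limitOutput(stdout: string, maxChars: number): LimitedOutput {
  if (stdout.length <= maxChars) {
    return { truncated: false, text: stdout, omittedChars: 0 };
  }
  const half = Math.floor(maxChars / 2);
  const head = stdout.slice(0, half);
  const tail = stdout.slice(stdout.length - half);
  const omittedChars = stdout.length - 2 * half;
  return {
    truncated: true,
    text: `${head}\n...[${omittedChars} chars omitted]...\n${tail}`,
    omittedChars,
  };
}
```

The marker line tells the model that content is missing, so it can ask for the raw log artifact instead of guessing.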


18. packages/verifier

Path:

packages/verifier/
  src/
    verifier-runner.ts
    checks/
      diff-policy.check.ts
      format.check.ts
      compile.check.ts
      unit-test.check.ts
      secret-scan.check.ts
      public-api.check.ts
    log-summary/
      maven-log-parser.ts
      generic-log-summarizer.ts
    verifier-result.ts
  test/

The verifier establishes the evidence.

The verifier is not the judge. The verifier answers checks that are relatively objective.

For example:

mvn test exit code 0 -> pass
protected path modified -> fail
secret pattern found -> fail
compile error found -> fail with repair hint

Verifier results must be structured:

export interface VerifierResult {
  name: string;
  status: 'passed' | 'failed' | 'skipped' | 'error';
  required: boolean;
  summary: string;
  repairable: boolean;
  rawLogArtifactId?: string;
  evidence: VerifierEvidence[];
}
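
As a sketch of how a check might turn a raw command result into that structure, the helper below maps exit code and timeout to a status. It uses a trimmed-down copy of the result shape (no `evidence` or `rawLogArtifactId`), and the `toVerifierResult` name and the choice to mark timeouts as non-repairable errors are assumptions for illustration.

```typescript
interface CommandResult {
  exitCode: number;
  stdout: string;
  stderr: string;
  durationMs: number;
  timedOut: boolean;
}

// Trimmed-down version of VerifierResult for this sketch.
interface BasicVerifierResult {
  name: string;
  status: 'passed' | 'failed' | 'skipped' | 'error';
  required: boolean;
  summary: string;
  repairable: boolean;
}

function toVerifierResult(
  name: string,
  required: boolean,
  result: CommandResult,
): BasicVerifierResult {
  if (result.timedOut) {
    // A timeout says nothing about the patch itself, so it is an error, not a failure.
    return { name, status: 'error', required, summary: `timed out after ${result.durationMs} ms`, repairable: false };
  }
  if (result.exitCode === 0) {
    return { name, status: 'passed', required, summary: 'exit code 0', repairable: false };
  }
  return { name, status: 'failed', required, summary: `exit code ${result.exitCode}`, repairable: true };
}
```

Keeping 'error' distinct from 'failed' lets the orchestrator retry infrastructure problems without sending the agent into a repair loop.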

Later we will build a verifier that can also be used as an MCP server.


19. packages/judge

Path:

packages/judge/
  src/
    judge-runner.ts
    deterministic/
      forbidden-path.judge.ts
      deleted-tests.judge.ts
      diff-size.judge.ts
    llm/
      diff-review-judge.ts
      prompt.ts
    verdict.ts
  test/

The judge evaluates the diff against the intent.

Judge output:

export interface JudgeVerdict {
  verdict: 'accept' | 'reject' | 'escalate';
  confidence: 'low' | 'medium' | 'high';
  reasons: string[];
  requiredActions: string[];
}

The judge can perform:

  • deterministic checks;
  • LLM review;
  • hybrid aggregation.

But hard policy stays in policy/verifier.

Rule:

Judge may recommend. Policy gates enforce.
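
Hybrid aggregation could look like the sketch below, which combines several verdicts conservatively. The rule used here (any reject wins, then any escalate or low-confidence verdict escalates, and an empty list escalates) is an assumption chosen for illustration, as is the `aggregateVerdicts` name.

```typescript
type Verdict = 'accept' | 'reject' | 'escalate';

interface JudgeVerdict {
  verdict: Verdict;
  confidence: 'low' | 'medium' | 'high';
  reasons: string[];
  requiredActions: string[];
}

// Conservative aggregation: the most cautious verdict wins.
function aggregateVerdicts(verdicts: JudgeVerdict[]): Verdict {
  if (verdicts.length === 0) return 'escalate'; // no opinion is not an approval
  if (verdicts.some((v) => v.verdict === 'reject')) return 'reject';
  if (verdicts.some((v) => v.verdict === 'escalate' || v.confidence === 'low')) {
    return 'escalate';
  }
  return 'accept';
}
```

The result is still only a recommendation; whether an 'accept' may lead to a PR remains a policy decision.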

20. packages/git-provider

Path:

packages/git-provider/
  src/
    git-provider.ts
    github-provider.ts
    local-git-provider.ts
    branch-strategy.ts
    commit-message.ts
    pr-body.ts
    reviewers.ts
  test/

The git provider handles:

  • clone URL resolution;
  • branch creation;
  • commit;
  • push;
  • PR creation;
  • PR update;
  • labels;
  • reviewers;
  • CI status;
  • review comments.

Interface:

export interface GitProvider {
  resolveRepository(input: ResolveRepoInput): Promise<ResolvedRepository>;
  createBranch(input: CreateBranchInput): Promise<BranchRef>;
  commitChanges(input: CommitChangesInput): Promise<CommitRef>;
  createPullRequest(input: CreatePullRequestInput): Promise<PullRequestRef>;
  getPullRequestChecks(pr: PullRequestRef): Promise<CheckRun[]>;
}

Idempotency matters in this package.

Branch naming must be deterministic:

agent/{taskId-short}-{slug}

The PR body must include run metadata so the run can be looked up again.
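
A deterministic branch name can be derived purely from the task id and a title, as in the sketch below. The `branchName` and `slugify` helpers, the choice of the last 8 alphanumeric characters as the short id, and the 40-character slug limit are assumptions for this example.

```typescript
// Lowercases, replaces runs of non-alphanumerics with '-', and trims the result.
function slugify(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .slice(0, 40)
    .replace(/^-+|-+$/g, '');
}

// Same taskId and title always produce the same branch name,
// so a retried run finds the existing branch instead of creating a new one.
function branchName(taskId: string, title: string): string {
  const shortId = taskId.replace(/[^a-zA-Z0-9]/g, '').slice(-8).toLowerCase();
  return `agent/${shortId}-${slugify(title)}`;
}
```

Because nothing random or time-based goes into the name, a worker that crashes after pushing can be replaced by another worker that computes the same branch and detects the existing work.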


21. packages/telemetry

Path:

packages/telemetry/
  src/
    logger.ts
    tracing.ts
    metrics.ts
    cost-meter.ts
    event-bus.ts
    audit-log.ts
  test/

Telemetry for an agent is not just request latency.

What to measure:

  • task count;
  • run duration;
  • time per state;
  • tool call count;
  • verifier retry count;
  • token usage;
  • cost per run;
  • changed files;
  • PR created;
  • PR merged;
  • review rejection rate;
  • failure class;
  • sandbox errors;
  • policy blocks.

Example trace event:

{
  "event": "tool_call.completed",
  "runId": "run_01J",
  "tool": "shell.run",
  "durationMs": 8421,
  "exitCode": 1,
  "outputBytes": 18742,
  "truncated": true
}

For production, telemetry is how we find out whether the agent is actually helping or just generating activity.


22. packages/config

Path:

packages/config/
  src/
    env.ts
    app-config.ts
    provider-config.ts
    policy-config.ts
    sandbox-config.ts

Config must be explicit.

Example .env.example:

DATABASE_URL=postgres://agent:agent@localhost:5432/agent
REDIS_URL=redis://localhost:6379
LLM_PROVIDER=openai
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
GITHUB_APP_ID=
GITHUB_PRIVATE_KEY_PATH=
SANDBOX_MODE=docker
ARTIFACT_DIR=.agent-artifacts

Do not read environment variables ad hoc across many packages.

Rule:

Only config package reads environment variables.
Other packages receive typed config.

This makes testing and replay easier.
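
One possible shape for that rule is a single loader that turns a plain key/value map into a typed object. In the sketch below, `AppConfig`, `loadConfig`, and `oneOf` are hypothetical names; the defaults mirror the sample .env values, and 'local' as a second sandbox mode is an assumption.

```typescript
interface AppConfig {
  databaseUrl: string;
  llmProvider: 'openai' | 'anthropic' | 'local';
  sandboxMode: 'docker' | 'local';
  artifactDir: string;
}

// Returns the matching option, or throws with a readable message.
function oneOf<T extends string>(name: string, value: string, options: readonly T[]): T {
  const match = options.find((option) => option === value);
  if (match === undefined) {
    throw new Error(`${name} must be one of: ${options.join(', ')}`);
  }
  return match;
}

// Takes the environment as an argument so tests can pass a plain object.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const databaseUrl = env.DATABASE_URL;
  if (!databaseUrl) {
    throw new Error('DATABASE_URL is required');
  }
  return {
    databaseUrl,
    llmProvider: oneOf('LLM_PROVIDER', env.LLM_PROVIDER ?? 'openai', ['openai', 'anthropic', 'local'] as const),
    sandboxMode: oneOf('SANDBOX_MODE', env.SANDBOX_MODE ?? 'docker', ['docker', 'local'] as const),
    artifactDir: env.ARTIFACT_DIR ?? '.agent-artifacts',
  };
}
```

In an app entry point this would be called once with the process environment, and every other package would receive the resulting `AppConfig` (or a slice of it) as a constructor argument.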


23. packages/testkit

Path:

packages/testkit/
  src/
    fake-llm-provider.ts
    fake-sandbox.ts
    fake-git-provider.ts
    fixture-repos.ts
    run-builder.ts
    artifact-fixtures.ts

Testkit matters because calling a real LLM in unit tests is an anti-pattern.

We need a fake provider:

const fakeLlm = new FakeLlmProvider([
  { toolCall: { name: 'file.read', input: { path: 'pom.xml' } } },
  { toolCall: { name: 'file.applyPatch', input: { diff: '...' } } },
  { final: 'Patch complete' }
]);

With this we can test:

  • state transition;
  • tool dispatch;
  • verifier loop;
  • retry semantics;
  • PR idempotency;
  • artifact storage;
  • cancellation.

A production-grade agent must be tested extensively without a model.

Model behavior changes. System invariants must not depend on a particular model.
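
A scripted fake provider of the kind described in this section can be very small. The sketch below is one possible shape: the `LlmStep` type, the synchronous `next()` method, and the `runLoop` driver are assumptions for illustration, and tool calls are only recorded rather than executed.

```typescript
type LlmStep =
  | { toolCall: { name: string; input: Record<string, unknown> } }
  | { final: string };

// Replays a fixed script of model responses, one per call to next().
class FakeLlmProvider {
  private cursor = 0;
  private readonly script: LlmStep[];

  constructor(script: LlmStep[]) {
    this.script = script;
  }

  next(): LlmStep {
    if (this.cursor >= this.script.length) {
      throw new Error('fake LLM script exhausted');
    }
    const step = this.script[this.cursor];
    this.cursor += 1;
    return step;
  }
}

// Minimal loop: record tool calls until the model returns a final answer.
function runLoop(llm: FakeLlmProvider, maxSteps: number): { toolCalls: string[]; final: string } {
  const toolCalls: string[] = [];
  for (let i = 0; i < maxSteps; i += 1) {
    const step = llm.next();
    if ('final' in step) {
      return { toolCalls, final: step.final };
    }
    toolCalls.push(step.toolCall.name);
  }
  throw new Error('max steps exceeded');
}
```

Because the script is fixed, a test can assert the exact order of tool calls and also check that the step limit stops a script that never finishes.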


24. mcp-servers

Path:

mcp-servers/
  verifier-server/
    src/
      main.ts
      tools/
        run-maven-test.ts
        run-lint.ts
        summarize-log.ts
  repo-context-server/
    src/
      main.ts
      resources/
        repository-map.ts
        ownership.ts
        architecture-docs.ts
      tools/
        search-symbol.ts
        find-call-sites.ts

MCP servers are not required for the first prototype. However, this series will build them because a modern Honk-like architecture often needs standardized tool/context integration.

MCP is useful for separating:

  • agent runtime;
  • domain-specific tools;
  • repository context resources;
  • verifier capabilities.

But do not let MCP become an excuse to weaken security.

Rule:

MCP server is another trust boundary, not a magic safe abstraction.

MCP tools still need permissions, logging, timeouts, and output control.


25. examples

Path:

examples/
  target-repos/
    java-maven-service/
      pom.xml
      src/main/java/...
      src/test/java/...
    node-service/
      package.json
      src/...
  tasks/
    auth-client-upgrade.yaml
    rename-api-method.yaml
    add-regression-test.yaml
  policies/
    local-dev-policy.yaml
    supervised-pr-policy.yaml
  prompts/
    migration-prompt.md
    repair-prompt.md

Examples are the learning harness.

Do not settle for a single happy path. Minimum fixtures:

| Fixture | Purpose |
| --- | --- |
| Java Maven compile pass | baseline |
| Java dependency breaking change | verifier repair loop |
| Forbidden path modification | policy/diff failure |
| Test deletion attempt | judge failure |
| Long Maven log | log summarization |
| Multi-module repo | repository map |
| Generated source directory | path guard |

With these fixtures, we can test agent behavior repeatably.


26. infra

Path:

infra/
  docker/
    Dockerfile.api
    Dockerfile.worker
    Dockerfile.sandbox-java17
  compose/
    docker-compose.yml
  migrations/
    001_initial_schema.sql
  k8s/
    api-deployment.yaml
    worker-deployment.yaml
    sandbox-job.yaml

For local development, compose is enough:

services:
  postgres:
    image: postgres:16
  redis:
    image: redis:7
  api:
    build:
      context: ../..
      dockerfile: infra/docker/Dockerfile.api
  worker:
    build:
      context: ../..
      dockerfile: infra/docker/Dockerfile.worker

The sandbox image is separate from the worker image.

Why?

Because the worker is a control process and the sandbox is an execution process.

The worker does not need Maven, Gradle, Node, Go, Python, and every other build tool. A sandbox image specific to the target repo can carry those.

Rule:

Keep orchestration runtime smaller than execution runtime.

27. docs

Path:

docs/
  architecture/
    001-system-context.md
    002-control-plane-execution-plane.md
    003-sandbox-boundary.md
    004-agent-loop.md
  adr/
    0001-use-monorepo.md
    0002-start-with-local-sandbox.md
    0003-use-postgres-for-run-ledger.md
  runbooks/
    failed-run.md
    stuck-worker.md
    pr-duplication.md
    sandbox-cleanup.md

Docs are not decoration. For an agent platform, docs are a governance artifact.

A minimal first ADR:

# ADR 0001: Use modular monorepo

## Context
We need to build a coding agent platform with multiple boundaries but avoid premature distributed complexity.

## Decision
Use a modular monorepo with package boundaries representing future service boundaries.

## Consequences
- Easier local development.
- Clear import rules required.
- Future extraction remains possible.

Runbooks matter because agent failures are often unintuitive.

Example runbook:

Run stuck in verifying for > 30 minutes:
1. Check worker heartbeat.
2. Check sandbox command timeout.
3. Fetch raw verifier log artifact.
4. If worker dead, mark run stale and cleanup sandbox.
5. Retry only if verifier command is idempotent.

28. scripts

Path:

scripts/
  dev/
    bootstrap.sh
    reset-db.sh
    seed-examples.sh
  ci/
    test.sh
    lint.sh
    typecheck.sh
  sandbox/
    build-java17-image.sh
    cleanup-workspaces.sh

Scripts must be repeatable.

A new developer must be able to run:

./scripts/dev/bootstrap.sh
./scripts/dev/seed-examples.sh
pnpm test
pnpm dev

Do not bury essential setup in long documentation without a script.

The agent platform itself will run many commands. If the development setup is not reproducible, the verifiers will be fragile too.


29. Initial package dependency rules

Import rules:

packages/domain          -> no internal package dependency
packages/contracts       -> domain allowed only for shared enums/types if generated carefully
packages/policy          -> domain
packages/persistence     -> domain
packages/sandbox         -> domain optional, config
packages/tool-runtime    -> domain, sandbox port types
packages/tools           -> tool-runtime, sandbox
packages/agent-runtime   -> domain, tool-runtime
packages/verifier        -> domain, sandbox
packages/judge           -> domain, llm provider abstraction if needed
packages/git-provider    -> domain
packages/orchestrator    -> all ports, domain, policy
apps/*                   -> orchestrator and infrastructure adapters

Forbidden:

packages/domain imports packages/persistence
packages/policy imports packages/agent-runtime
packages/verifier imports packages/git-provider
packages/tools imports apps/worker
apps/api imports concrete LLM SDK directly

If the dependency graph starts to become circular, that is a sign the boundaries are wrong.
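
Rules like these are easiest to keep when a script checks them. The sketch below encodes a simplified subset of the table above as data and flags any import edge that is not allowed; the `findViolations` helper and the edge format are hypothetical, and extracting the edges from real source files is not shown.

```typescript
// Allowed internal dependencies per package (simplified subset of the rules above).
const allowedDeps = new Map<string, string[]>([
  ['domain', []],
  ['policy', ['domain']],
  ['persistence', ['domain']],
  ['sandbox', ['domain', 'config']],
  ['tool-runtime', ['domain', 'sandbox']],
  ['tools', ['tool-runtime', 'sandbox']],
  ['agent-runtime', ['domain', 'tool-runtime']],
  ['verifier', ['domain', 'sandbox']],
  ['git-provider', ['domain']],
]);

interface ImportEdge {
  from: string; // importing package
  to: string;   // imported package
}

// Returns "from -> to" for every edge that breaks the rules.
// Packages without an entry in the table are not checked.
function findViolations(edges: ImportEdge[]): string[] {
  const violations: string[] = [];
  for (const edge of edges) {
    const deps = allowedDeps.get(edge.from);
    if (deps === undefined) continue;
    if (deps.indexOf(edge.to) === -1) {
      violations.push(`${edge.from} -> ${edge.to}`);
    }
  }
  return violations;
}
```

Run in CI, a check like this turns a boundary rule from a convention into something a pull request can fail on.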


30. Local development flow

The first flow we will support is a local run from the CLI.

Command:

agent run-local \
  --repo examples/target-repos/java-maven-service \
  --task examples/tasks/auth-client-upgrade.yaml \
  --policy examples/policies/local-dev-policy.yaml \
  --out .agent-runs/run-local-001

Output:

.agent-runs/run-local-001/
  normalized-task.yaml
  policy-decision.yaml
  repository-snapshot.yaml
  plan.md
  tool-calls.jsonl
  patch.diff
  verifier-results.json
  judge-report.md
  summary.md
  logs/
    compile.log
    test.log

This is enough to build and debug before the production API/worker exists.


31. Build order

Do not start the implementation with the UI or the GitHub integration.

A sensible order:

1. domain types and state machine
2. task contract schema
3. local sandbox abstraction
4. file/search/patch tools
5. tool runtime
6. fake LLM agent loop
7. real LLM provider abstraction
8. verifier runner
9. local CLI run
10. persistence ledger
11. worker queue
12. PR creation
13. judge
14. MCP servers
15. fleet mode

Why the fake LLM first?

Because we need to prove the system mechanics:

  • tool dispatch;
  • state transitions;
  • artifact capture;
  • verifier loop;
  • error handling.

If we go straight to a real LLM, system bugs and model variance get mixed together.

Rule:

Make the system deterministic before adding probabilistic behavior.

32. Common structural mistakes

Mistake 1: One giant agent service

agent-service/
  index.ts

Problems:

  • policy gets mixed into the prompt;
  • shell commands are hard to control;
  • retries are unclear;
  • artifacts get lost;
  • testing is hard;
  • PRs get created twice.

Mistake 2: LLM SDK everywhere

If many packages import the OpenAI/Anthropic SDK directly, the provider abstraction has failed.

LLM SDK usage should be concentrated in:

packages/agent-runtime/src/llm

Mistake 3: Sandbox treated as helper

The sandbox is not a helper function. The sandbox is a boundary.

Mistake 4: Verifier logic inside prompt

Bad:

Please run tests and make sure they pass.

Right:

System invokes verifier and stores structured result.

Mistake 5: PR creation before evidence

The agent may create a PR only after the evidence is sufficient for that autonomy mode.


33. Minimal repository files to create first

To actually start coding, the minimum files are:

package.json
pnpm-workspace.yaml
tsconfig.base.json
.env.example
AGENTS.md

packages/domain/src/run.ts
packages/domain/src/task.ts
packages/domain/src/state-machine.ts
packages/domain/src/errors.ts

packages/sandbox/src/local-sandbox.ts
packages/tool-runtime/src/tool-registry.ts
packages/tools/src/file/read-file.tool.ts
packages/tools/src/file/search.tool.ts
packages/tools/src/file/apply-patch.tool.ts
packages/tools/src/shell/run-command.tool.ts

packages/agent-runtime/src/agent-loop.ts
packages/verifier/src/verifier-runner.ts
apps/cli/src/commands/run-local.ts
examples/tasks/auth-client-upgrade.yaml

Do not create every folder empty up front. Build the skeleton incrementally, with tests.


34. Definition of Done for Part 010

After working through this part, you should be able to explain:

  • why an agent platform should not start as one big script;
  • the difference between apps and packages;
  • why the domain must be free of infrastructure;
  • why the orchestrator uses ports;
  • why the sandbox must be an explicit boundary;
  • why the tool runtime is distinct from tools;
  • why the verifier and judge are separate;
  • why the git provider must not live inside the agent runtime;
  • why a local CLI matters before a cloud worker;
  • how this skeleton grows into a fleet-wide platform.

Summary

The project skeleton for a Honk-like AI coding agent should mirror the system flow:

API/CLI -> Orchestrator -> Policy -> Worker -> Sandbox -> Agent Runtime -> Tool Runtime -> Tools -> Verifier -> Judge -> Git Provider -> Telemetry

The structure we use:

apps/       = executable surfaces
packages/   = bounded modules
mcp-servers/ = tool/context extension boundary
examples/   = learning and evaluation fixtures
infra/      = local/prod runtime support
docs/       = architecture/governance/runbooks
scripts/    = reproducible operations

Core principle:

Do not let LLM convenience erase system boundaries.

A good agent is not the one with the most freedom. A good agent is one that can make useful changes within clear boundaries, with evidence that can be verified and a trail that can be audited.
