
title: Learn Advanced Agentic AI Engineering & Autonomous Software Engineering - Part 020
description: "Coding agent execution loop for autonomous software engineering: issue intake, environment preparation, reproduction, localization, patch planning, edit loop, verification, review packet, and feedback-driven iteration."
series: learn-agentic-ai-engineering
seriesTitle: Learn Advanced Agentic AI Engineering & Autonomous Software Engineering
order: 20
partTitle: Coding Agent Execution Loop
tags:
  - agentic-ai
  - autonomous-software-engineering
  - coding-agent
  - execution-loop
  - verification
  - series
date: 2026-06-29

Part 020 — Coding Agent Execution Loop

Goal of this part: be able to design a coding agent execution loop that is safe, measurable, and reviewable, from issue intake, environment preparation, reproduction, localization, patch planning, edit loop, and verification through to the PR evidence packet. The focus is not "the agent writes code" but the agent carrying out a sound engineering process.

An autonomous coding agent should be treated like a very fast but non-deterministic junior engineer.

It can read, search, run tools, create patches, and explain. But it can also:

misunderstand the requirement,
pick the wrong file,
make a patch that is too large,
skip important tests,
assume a command succeeded when it did not,
cover up a failure,
change unrelated behavior,
produce a PR that looks tidy but is systemically wrong.

That is why a coding agent needs an explicit execution loop.

A coding agent is not a code generator.
A coding agent is a controlled engineering executor that changes code through evidence, tests, and reviewable state transitions.

1. Kaufman Framing

1.1 Target performance

After this part, we want to be able to:

design the coding agent loop from issue to PR,
define the important states and transitions,
write a policy for when the agent may patch and when it must stop,
distinguish reproduction, localization, patching, and verification,
manage minimal patches, rollback, and diff review,
build an evidence packet for the human reviewer,
evaluate the agent on task outcome and process quality,
avoid autonomous coding anti-patterns such as "patch before reproduce" and "tests ignored".

Practical target:

Given the request "build an agent that automatically fixes bugs from GitHub issues", we can produce a production-grade state machine, tool contract, verification hierarchy, stop conditions, failure handling, and PR output format.

1.2 Deconstruct the skill

Coding agent execution consists of these subskills:

Task intake: understand the issue, acceptance criteria, and constraints.
Risk classification: determine the autonomy level and approval gates.
Environment preparation: checkout, dependencies, build/test readiness, sandbox.
Repository understanding: use the repo map and context packet.
Reproduction: make the failure observable.
Localization: find the candidate root cause.
Patch planning: choose the minimal change and the test plan.
Edit execution: apply the diff in a scoped way.
Verification: run tests, static checks, targeted regression.
Self-review: inspect the diff, risks, unintended changes.
Human review packet: PR body, evidence, limitations.
Feedback iteration: respond to CI, review, and failures.

1.3 Learn enough to self-correct

Minimum knowledge needed to self-correct:

a patch without reproduction is speculation,
a patch without tests is an unverified claim,
a large diff increases the risk surface,
a passing targeted test is not always enough,
the agent must record commands, outputs, and decisions,
every mutating action must be replayable or auditable,
a coding agent must have stop conditions.

1.4 Remove friction

Common friction:

test command unknown,
dependency install fails,
slow tests,
flaky tests,
environment differs from CI,
unclear issue,
incomplete repo map,
overly broad permissions,
no review packet.

Solutions:

repo understanding packet,
command manifest,
sandboxed execution,
targeted test selection,
retry policy,
approval gate,
PR evidence template.

2. Core Mental Model: Evidence-Driven Coding Loop

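The ideal coding agent loop runs through these steps in order, returning to localization or editing when verification fails:

intake → risk classification → environment preparation → repository understanding → reproduction → localization → patch planning → edit → verification → feedback diagnosis → self-review → PR evidence packet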

The loop is evidence-driven because every major step needs proof:

Step	Evidence
Intake	parsed requirement, acceptance criteria, unknowns
Risk classification	risk category, autonomy level, required gates
Repo understanding	candidate files/tests with reasons
Reproduction	failing command/output or baseline rationale
Localization	root-cause hypothesis with code evidence
Patch	minimal diff linked to hypothesis
Verification	command outputs and test results
Review	diff summary, risk, limitation

3. Execution State Machine

A coding agent should not be a free-form loop. It should be a state machine.

3.1 Terminal states

Not every task should end with a patch.

Valid terminal states:

State	Meaning
`PATCH_READY`	patch verified and review packet created
`NEEDS_HUMAN_CLARIFICATION`	requirement ambiguous or risky
`CANNOT_REPRODUCE`	no reliable reproduction after allowed attempts
`ENVIRONMENT_BLOCKED`	build/test setup impossible under policy
`OUT_OF_SCOPE`	task violates permission/scope
`UNSAFE_TO_CONTINUE`	security/risk threshold exceeded

A good agent stops honestly.

A bad agent invents success.
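
One way to make the loop a state machine rather than a free-form loop is to encode the allowed transitions explicitly. The sketch below is illustrative only: the terminal states come from the table above, while the non-terminal state names and the transition table are assumptions derived from the step names in this lesson.

```python
from enum import Enum


class State(Enum):
    # Non-terminal states (hypothetical names, one per step of the loop).
    INTAKE = "INTAKE"
    RISK_CLASSIFICATION = "RISK_CLASSIFICATION"
    PREPARE_ENVIRONMENT = "PREPARE_ENVIRONMENT"
    REPO_UNDERSTANDING = "REPO_UNDERSTANDING"
    REPRODUCE = "REPRODUCE"
    LOCALIZE = "LOCALIZE"
    PLAN_PATCH = "PLAN_PATCH"
    EDIT = "EDIT"
    VERIFY = "VERIFY"
    DIAGNOSE = "DIAGNOSE"
    SELF_REVIEW = "SELF_REVIEW"
    # Terminal states from the table above.
    PATCH_READY = "PATCH_READY"
    NEEDS_HUMAN_CLARIFICATION = "NEEDS_HUMAN_CLARIFICATION"
    CANNOT_REPRODUCE = "CANNOT_REPRODUCE"
    ENVIRONMENT_BLOCKED = "ENVIRONMENT_BLOCKED"
    OUT_OF_SCOPE = "OUT_OF_SCOPE"
    UNSAFE_TO_CONTINUE = "UNSAFE_TO_CONTINUE"


TERMINAL = {
    State.PATCH_READY,
    State.NEEDS_HUMAN_CLARIFICATION,
    State.CANNOT_REPRODUCE,
    State.ENVIRONMENT_BLOCKED,
    State.OUT_OF_SCOPE,
    State.UNSAFE_TO_CONTINUE,
}

# Assumed transition table; a real agent would tune this to its own policy.
TRANSITIONS = {
    State.INTAKE: {State.RISK_CLASSIFICATION, State.NEEDS_HUMAN_CLARIFICATION, State.OUT_OF_SCOPE},
    State.RISK_CLASSIFICATION: {State.PREPARE_ENVIRONMENT, State.UNSAFE_TO_CONTINUE},
    State.PREPARE_ENVIRONMENT: {State.REPO_UNDERSTANDING, State.ENVIRONMENT_BLOCKED},
    State.REPO_UNDERSTANDING: {State.REPRODUCE, State.NEEDS_HUMAN_CLARIFICATION},
    State.REPRODUCE: {State.LOCALIZE, State.CANNOT_REPRODUCE},
    State.LOCALIZE: {State.PLAN_PATCH, State.REPRODUCE, State.NEEDS_HUMAN_CLARIFICATION},
    State.PLAN_PATCH: {State.EDIT, State.NEEDS_HUMAN_CLARIFICATION, State.UNSAFE_TO_CONTINUE},
    State.EDIT: {State.VERIFY},
    State.VERIFY: {State.SELF_REVIEW, State.DIAGNOSE},
    State.DIAGNOSE: {State.LOCALIZE, State.EDIT, State.ENVIRONMENT_BLOCKED, State.UNSAFE_TO_CONTINUE},
    State.SELF_REVIEW: {State.PATCH_READY, State.EDIT},
}


def advance(current: State, target: State) -> State:
    """Return the target state, or raise if the transition is not allowed."""
    if current in TERMINAL:
        raise ValueError(f"{current.name} is terminal; the run is over")
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

With this table, jumping from `INTAKE` straight to `EDIT` raises an error, which is the "patch before reproduce" anti-pattern made mechanically impossible.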

4. Step 1 — Issue Intake

4.1 Intake fields

task_intake:
  source: github_issue
  title: "Payment reversal duplicates ledger entry after retry timeout"
  description: "..."
  task_type: bugfix
  expected_behavior: "reversal should be idempotent"
  observed_behavior: "duplicate ledger entries"
  artifacts:
    - logs
    - stack_trace
    - reproduction_steps
  constraints:
    - do_not_change_public_api
    - preserve backward compatibility
  unknowns:
    - exact retry timing not specified

4.2 Intake parser should extract

task type,
affected component,
expected behavior,
observed behavior,
reproduction clues,
constraints,
non-goals,
urgency,
risk signals,
missing information.

4.3 Acceptance criteria

For bug fixing:

acceptance_criteria:
  - failing behavior is reproduced or convincingly simulated
  - regression test demonstrates failure before fix
  - patch makes regression test pass
  - existing relevant tests pass
  - diff does not broaden unrelated behavior

For feature implementation:

acceptance_criteria:
  - analogous existing pattern followed
  - new behavior covered by tests
  - API/contract updates included if needed
  - backward compatibility considered
  - docs updated if user-facing

5. Step 2 — Risk Classification

Risk changes autonomy.

5.1 Risk tiers

Tier	Example	Agent autonomy
R0 trivial	typo, docs, comments	can patch with light review
R1 low	isolated test, small internal helper	patch with tests
R2 medium	business logic, non-critical API	patch with targeted+broad tests
R3 high	auth, billing, data migration, security	approval before patch/merge
R4 critical	production secrets, destructive infra, legal/compliance	human-led, agent assists only

5.2 Risk classifier output

risk_classification:
  tier: R3
  categories:
    - money-movement
    - idempotency-critical
  autonomy:
    may_read: true
    may_run_tests: true
    may_patch: after_approval
    may_open_pr: true
    may_merge: false
  required_reviews:
    - domain_owner
    - backend_reviewer

5.3 Risk signals

auth/authorization,
money movement,
tax/billing/ledger,
personally identifiable information,
encryption/crypto,
database migration,
public API,
infrastructure/deployment,
concurrency primitives,
data deletion,
compliance audit.
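
As a rough illustration of how risk signals could drive autonomy, the sketch below maps detected signals to a tier and derives the autonomy flags from it. The signal names, the signal-to-tier mapping, the default tiers, and the rule that the agent never merges are all assumptions made for this example, not a prescribed policy.

```python
# Hypothetical signal-to-tier mapping, loosely following the tier table (R0..R4).
SIGNAL_TIERS = {
    "docs": 0,
    "public-api": 2,
    "concurrency": 2,
    "auth": 3,
    "money-movement": 3,
    "billing": 3,
    "pii": 3,
    "crypto": 3,
    "database-migration": 3,
    "data-deletion": 3,
    "production-secrets": 4,
    "destructive-infra": 4,
    "compliance": 4,
}

# Assumption: a signal the classifier does not know is treated as medium risk.
UNKNOWN_SIGNAL_TIER = 2


def classify(signals):
    """Return a risk classification dict; the highest-tier signal wins."""
    # Assumption: a task with no signals at all defaults to R1 (low).
    tier = max((SIGNAL_TIERS.get(s, UNKNOWN_SIGNAL_TIER) for s in signals), default=1)
    if tier >= 4:
        may_patch = "no"              # human-led, agent assists only
    elif tier == 3:
        may_patch = "after_approval"  # approval before patch
    else:
        may_patch = "yes"
    return {
        "tier": f"R{tier}",
        "categories": sorted(signals),
        "autonomy": {
            "may_read": True,
            "may_run_tests": True,
            "may_patch": may_patch,
            "may_open_pr": tier <= 3,
            "may_merge": False,  # assumption: merging always stays with humans
        },
    }
```

For the signals `["money-movement", "idempotency-critical"]` this yields tier `R3` with `may_patch: after_approval`, which lines up with the classifier output shown earlier.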

6. Step 3 — Prepare Environment

6.1 Environment goals

Agent needs:

clean checkout,
known commit,
sandbox,
no production secrets,
build/test commands,
dependency cache if allowed,
resource limits,
command logging.

6.2 Environment manifest

environment:
  repository_commit: abc123
  branch: agent/task-123
  sandbox:
    network: restricted
    secrets: unavailable
    filesystem: workspace-scoped
  commands_discovered:
    - ./gradlew test
    - ./gradlew :payment:test
  baseline_status:
    clean_git_status: true
    build_command_known: true
    test_command_known: true

6.3 Baseline check

Before patch:

git status --short
./gradlew :payment:test --dry-run
./gradlew :payment:test --tests PaymentReversalServiceTest

If baseline already fails, record it.

baseline:
  command: ./gradlew :payment:test
  status: failed
  interpretation: "baseline has unrelated failing tests"
  action: "continue only with targeted test and report limitation"

Do not hide baseline failures.
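
A baseline check like the one above can be captured by a small runner that records every command and its outcome instead of discarding failures. This is a minimal sketch; the record fields (`command`, `status`, `output_tail`) are illustrative names, and a real agent would run this inside its sandbox.

```python
import subprocess
import sys


def run_baseline(commands, timeout=600):
    """Run each command (a list of argv tokens) and record what happened."""
    records = []
    for argv in commands:
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
            status = "passed" if proc.returncode == 0 else "failed"
            # Keep only the tail of the output so the record stays small.
            tail = (proc.stdout + proc.stderr)[-2000:]
        except subprocess.TimeoutExpired:
            status, tail = "timeout", ""
        except OSError as exc:
            # The command could not be started at all (missing binary, permissions).
            status, tail = "not_runnable", str(exc)
        records.append({"command": " ".join(argv), "status": status, "output_tail": tail})
    return records


if __name__ == "__main__":
    # Stand-in commands so the demo runs anywhere; a real run would pass
    # something like ["./gradlew", ":payment:test"] instead.
    demo = [
        [sys.executable, "-c", "print('ok')"],
        [sys.executable, "-c", "import sys; sys.exit(1)"],
    ]
    for record in run_baseline(demo):
        print(record["command"], "->", record["status"])
```

A failed baseline is kept in the records rather than raised as an exception, so it can be reported as a limitation later.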

7. Step 4 — Repository Understanding

This step builds on the repository understanding work from Part 019.

Required before patch:

repo_understanding_required:
  build_manifest: true
  candidate_files: true
  candidate_tests: true
  risk_notes: true
  unknowns: true

The coding agent should consume a context packet:

context_packet:
  likely_patch_targets:
    - PaymentReversalService.java
    - IdempotencyStore.java
  likely_tests:
    - PaymentReversalServiceTest
  invariants:
    - reversal must be idempotent by transaction id
  risks:
    - money movement
    - ledger duplication

If repo understanding confidence is low, the agent should not jump to a patch. It should gather more evidence.

8. Step 5 — Reproduce or Establish Baseline

8.1 Reproduction-first rule

For bugfix tasks:

Prefer reproduction before patch.

Why:

proves bug exists,
prevents fixing wrong thing,
gives regression test,
produces objective verification.

8.2 Forms of reproduction

Form	Strength
Existing failing test	strongest
New regression test that fails before patch	strong
Minimal script/fixture	medium-strong
Manual command/log reproduction	medium
Static reasoning only	weak, acceptable only when execution impossible

8.3 Reproduction packet

reproduction:
  status: reproduced
  command: ./gradlew :payment:test --tests PaymentReversalServiceTest.duplicateLedgerOnRetry
  failure_summary: "expected one ledger entry but found two"
  evidence:
    - test_output_excerpt
    - fixture_used
  confidence: high

If the bug cannot be reproduced:

reproduction:
  status: not_reproduced
  attempts:
    - command: ./gradlew :payment:test
      result: pass
    - command: ./gradlew :payment:test --tests PaymentReversalServiceTest
      result: pass
  possible_reasons:
    - missing timing condition
    - integration dependency unavailable
    - issue environment-specific
  recommended_action: ask_for_logs_or_reproduction_details

8.4 When patch without reproduction is acceptable

Patching without reproduction is rarely acceptable, and only when:

the issue is an obvious static defect,
the compile error is clear,
a typo/config mismatch is exact,
the test environment is unavailable but code evidence is strong,
a production incident requires emergency mitigation under human control.

Even then, mark the confidence as lower.

9. Step 6 — Localize Root Cause

Localization connects symptom to code.

9.1 Root-cause hypothesis

root_cause_hypothesis:
  statement: "Payment reversal idempotency key uses timeout attempt id instead of original transaction id."
  evidence:
    - "PaymentReversalService constructs key from retryContext.getAttemptId()"
    - "LedgerEntryService deduplicates by idempotency key"
    - "test fails when retry attempt id changes"
  confidence: 0.82
  missing_evidence:
    - "no integration test with real queue retry"

9.2 Localization tools

stack trace mapping,
log message search,
symbol references,
call graph,
data flow,
test failure diff,
git history,
recent commit inspection,
config/feature flag inspection.

9.3 Avoid single-cause bias

A bug can involve:

code,
config,
migration,
dependency version,
race condition,
test fixture,
environment,
third-party API.

The agent should keep competing hypotheses while evidence is incomplete.

hypotheses:
  - statement: "wrong idempotency key"
    confidence: 0.82
  - statement: "ledger unique constraint missing"
    confidence: 0.44
  - statement: "retry handler duplicates event"
    confidence: 0.51
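
One simple way to keep competing hypotheses alive is to retain every rival whose confidence is within some margin of the leader. The sketch below assumes a hypothetical `margin` parameter; the value 0.35 is arbitrary and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    statement: str
    confidence: float


def select_hypotheses(hypotheses, margin=0.35):
    """Return the leading hypothesis plus every rival within `margin` of it."""
    ranked = sorted(hypotheses, key=lambda h: h.confidence, reverse=True)
    if not ranked:
        return []
    top = ranked[0]
    return [h for h in ranked if top.confidence - h.confidence <= margin]


candidates = [
    Hypothesis("wrong idempotency key", 0.82),
    Hypothesis("ledger unique constraint missing", 0.44),
    Hypothesis("retry handler duplicates event", 0.51),
]
active = select_hypotheses(candidates)
# With margin=0.35 the agent keeps "wrong idempotency key" and
# "retry handler duplicates event", and sets the 0.44 hypothesis aside.
```

A failed verification can then lower the leader's confidence and bring a rival back into play, rather than forcing another patch on the same assumption.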

10. Step 7 — Patch Planning

The patch plan should be explicit before any editing starts.

10.1 Patch plan template

patch_plan:
  objective: "make payment reversal idempotent across retry attempts"
  intended_files:
    - PaymentReversalService.java
    - PaymentReversalServiceTest.java
  strategy:
    - add failing regression test for retry with changed attempt id
    - derive idempotency key from original transaction id
    - keep public API unchanged
  verification:
    targeted:
      - ./gradlew :payment:test --tests PaymentReversalServiceTest
    broader:
      - ./gradlew :payment:test
  risks:
    - ledger behavior change
    - backward compatibility with existing keys
  rollback:
    - revert service and test changes

10.2 Patch planning rules

prefer the smallest change that preserves unrelated behavior,
add/adjust tests before or with code,
avoid unrelated cleanup,
avoid broad rewrites,
avoid changing public contracts unless required,
document migration/compatibility if needed,
plan verification before patch.

10.3 Approval before patch

A high-risk plan should be approved before any code mutation.

approval_request:
  risk_tier: R3
  reason: "money movement and ledger idempotency"
  proposed_change: "use original transaction id for reversal idempotency key"
  files: [...]
  tests: [...]

11. Step 8 — Apply Minimal Diff

11.1 Edit discipline

Agent should:

apply focused changes,
keep style consistent,
preserve formatting conventions,
avoid drive-by refactors,
keep generated files untouched unless appropriate,
update tests/docs/config when required,
keep commits/diffs reviewable.

11.2 Diff size control

Bad:

Refactor whole service while fixing one bug.

Better:

Change one idempotency key derivation method and add one regression test.

11.3 Edit packet

edit_packet:
  changed_files:
    - path: PaymentReversalService.java
      reason: "fix idempotency key derivation"
    - path: PaymentReversalServiceTest.java
      reason: "add retry regression test"
  intentional_non_changes:
    - "No public API change"
    - "No database schema change"
  generated_files_touched: false
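
To keep the diff inside the plan, the changed files can be compared against the `intended_files` of the patch plan. This is a minimal sketch; it assumes the list of changed paths is collected elsewhere (for example from `git diff --name-only`), and the result keys are illustrative names.

```python
def check_scope(intended_files, changed_files):
    """Compare files actually changed with the files the patch plan allowed."""
    intended = set(intended_files)
    changed = set(changed_files)
    return {
        # Files edited although the plan never mentioned them.
        "unplanned": sorted(changed - intended),
        # Planned files that ended up untouched (may be fine, but worth noting).
        "untouched": sorted(intended - changed),
        "within_plan": changed <= intended,
    }


plan = ["PaymentReversalService.java", "PaymentReversalServiceTest.java"]
report = check_scope(plan, ["PaymentReversalService.java", "LedgerEntryService.java"])
# report["within_plan"] is False and report["unplanned"] lists
# LedgerEntryService.java, so the agent should stop or re-plan.
```

An out-of-plan edit is not necessarily wrong, but it should trigger a revised patch plan rather than pass silently.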

12. Step 9 — Verification Hierarchy

Verification should be layered.

12.1 Verification layers

Layer	Example	Purpose
Format/lint	formatter, lint	mechanical correctness
Compile/typecheck	`javac`, `tsc`, `go test` compile	syntax/type correctness
Targeted unit test	nearest test	direct behavior
Regression test	new failing-then-passing test	bug prevention
Integration test	module/service interaction	boundary behavior
Full module suite	module-wide	local regression
CI	full project	broader confidence
Human review	expert judgment	maintainability/risk

12.2 Verification packet

verification:
  commands:
    - command: ./gradlew :payment:test --tests PaymentReversalServiceTest
      status: passed
      duration: 12s
    - command: ./gradlew :payment:test
      status: passed
      duration: 2m10s
  not_run:
    - command: ./gradlew check
      reason: "exceeds time budget in sandbox"
  interpretation: "targeted and module tests pass; full CI still required"
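
The layered order can be enforced with a runner that goes from cheapest to most expensive check, stops at the first failure, and records what was skipped. The sketch below is illustrative: `run` is a caller-supplied function that executes a command and returns `True` on success, so the example stays independent of any particular build tool.

```python
def run_layers(layers, run):
    """Run (name, command) layers in order; skip the rest after a failure."""
    results, not_run = [], []
    failed = False
    for name, command in layers:
        if failed:
            not_run.append({"layer": name, "command": command,
                            "reason": "earlier layer failed"})
            continue
        ok = run(command)
        results.append({"layer": name, "command": command,
                        "status": "passed" if ok else "failed"})
        failed = not ok
    return {"commands": results, "not_run": not_run}


layers = [
    ("targeted", "./gradlew :payment:test --tests PaymentReversalServiceTest"),
    ("module", "./gradlew :payment:test"),
    ("full", "./gradlew check"),
]

# Fake runner for demonstration: pretend only the module suite fails.
packet = run_layers(layers, run=lambda cmd: cmd != "./gradlew :payment:test")
# packet["commands"] holds the targeted (passed) and module (failed) results;
# packet["not_run"] records that the full check was skipped and why.
```

Recording skipped layers explicitly keeps the verification packet honest about what was not run.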

12.3 Failing verification

If verification fails:

verification_failure:
  command: ./gradlew :payment:test --tests PaymentReversalServiceTest
  failure: "expected one ledger entry but found two"
  classification: patch_incomplete
  next_action: relocalize_root_cause

Do not patch blindly in a loop.

Each failure should update the hypothesis.

13. Step 10 — Diagnose Feedback

A coding agent must interpret feedback, not just retry.

13.1 Feedback types

Feedback	Meaning
compiler error	code does not build
test assertion failure	behavior mismatch
snapshot diff	output changed
lint/format error	style/mechanical issue
flaky failure	nondeterministic or infra issue
timeout	performance/deadlock/slow test
CI-only failure	environment difference or hidden dependency
reviewer comment	human semantic feedback

13.2 Feedback diagnosis packet

feedback_diagnosis:
  source: test_output
  type: assertion_failure
  failed_test: PaymentReversalServiceTest.shouldDeduplicateRetry
  likely_cause: "idempotency key still includes attempt id"
  evidence:
    - "actual keys differ between retry attempts"
  next_step: "inspect key builder"

13.3 Retry policy

The agent should limit retries.

retry_policy:
  max_patch_iterations: 4
  max_same_failure_retries: 2
  stop_if:
    - no_new_information
    - risk_tier_increases
    - patch_scope_expands_beyond_plan
    - tests_fail_for_unrelated_reasons
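
A retry policy like this could be checked after every failed iteration. The sketch below is one possible reading of the policy: it treats a run of identical failure signatures as "no new information", and it assumes the caller passes in whether the risk tier rose or the patch scope grew. The function and parameter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    max_patch_iterations: int = 4
    max_same_failure_retries: int = 2


def should_stop(policy, failure_history, risk_tier_increased=False, scope_expanded=False):
    """Return a stop reason, or None if another patch iteration is allowed.

    failure_history holds one failure signature per failed iteration, oldest first.
    """
    if risk_tier_increased:
        return "risk_tier_increases"
    if scope_expanded:
        return "patch_scope_expands_beyond_plan"
    if len(failure_history) >= policy.max_patch_iterations:
        return "max_patch_iterations"
    # Count how many times in a row the most recent failure has repeated.
    streak = 0
    for signature in reversed(failure_history):
        if signature != failure_history[-1]:
            break
        streak += 1
    # The first occurrence is not a retry, so allow max_same_failure_retries repeats.
    if streak > policy.max_same_failure_retries:
        return "no_new_information"
    return None
```

For example, three identical failures in a row return `no_new_information`, while a single failure returns `None` and lets the loop continue.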

14. Step 11 — Self-Review

Before opening a PR, the agent reviews its own diff.

14.1 Self-review checklist

Does diff match original task?
Are there unrelated changes?
Are generated/vendor files touched?
Are public contracts changed?
Are tests added/updated?
Are edge cases considered?
Is error handling consistent?
Is logging/metric behavior acceptable?
Does patch preserve invariants?
Are limitations documented?

14.2 Diff risk scoring

diff_risk_score:
  files_changed: 2
  lines_added: 42
  lines_removed: 5
  public_api_changed: false
  database_changed: false
  auth_changed: false
  money_movement_changed: true
  generated_files_changed: false
  overall: medium_high

14.3 Self-review output

self_review:
  summary: "Patch changes reversal idempotency key derivation and adds retry regression test."
  unrelated_changes: false
  public_api_change: false
  generated_files_touched: false
  remaining_risks:
    - "integration behavior with real queue retry should be validated in CI"

15. Step 12 — PR Evidence Packet

The output of a coding agent should not be just a diff. It should be a review artifact.

15.1 PR body template

## Summary
- Fixed payment reversal idempotency across retry attempts.
- Added regression test for retry timeout scenario.

## Root Cause
The reversal idempotency key was derived from retry attempt id, so retries generated different ledger deduplication keys.

## Changes
- Use original transaction id as reversal idempotency key source.
- Added `shouldDeduplicateReversalAfterRetryTimeout` regression test.

## Verification
- `./gradlew :payment:test --tests PaymentReversalServiceTest` ✅
- `./gradlew :payment:test` ✅

## Risk
- Touches money movement behavior.
- No public API or database schema changes.
- Full CI still required before merge.

## Reviewer Notes
Please pay special attention to compatibility with existing ledger idempotency records.

15.2 Evidence fields

pr_evidence_packet:
  task: issue-123
  root_cause: "..."
  changed_files: [...]
  tests_run: [...]
  tests_not_run: [...]
  risks: [...]
  limitations: [...]
  reviewer_focus: [...]

15.3 Never overclaim

Bad:

This fully fixes all retry issues.

Better:

This fixes the reproduced duplicate ledger entry scenario for reversal retry timeout. Full CI and integration environment validation are still recommended.

16. Autonomous SWE Loop Variants

16.1 Bugfix loop

intake → reproduce → localize → regression test → patch → targeted test → broader test → PR

16.2 Feature loop

intake → find analogous feature → design small change → implement → test → docs/contracts → PR

16.3 Refactor loop

intake → map references → ensure baseline tests → transform → run impacted tests → verify behavior → PR

16.4 Migration loop

intake → inventory usages → plan phases → codemod/sample patch → verify → staged PRs

16.5 Code review fix loop

read review → classify comments → patch actionable comments → verify → reply with evidence

17. Tool Contract for Coding Agent

17.1 Tool categories

Category	Examples	Risk
Read-only	list files, read file, search	low
Analysis	parse AST, dependency graph	low
Execution	run test/build	medium
Mutation	edit file, apply patch	medium/high
External	open PR, comment, push	high
Destructive	delete branch, deploy, migrate DB	critical

17.2 Mutating tool guard

before_edit_required:
  - task_intake_complete
  - risk_classification_complete
  - repo_context_packet_exists
  - patch_plan_exists
  - approval_if_required
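
The guard can be enforced as a precondition check that the patch tool calls before any edit. This is a minimal sketch; `run_state` is a hypothetical dict of flags maintained by the agent runtime, and the `approval_required` / `approval_granted` keys are assumptions for this example.

```python
BEFORE_EDIT_REQUIRED = [
    "task_intake_complete",
    "risk_classification_complete",
    "repo_context_packet_exists",
    "patch_plan_exists",
]


def missing_preconditions(run_state):
    """Return the guard conditions that are not yet satisfied (empty = may edit)."""
    missing = [key for key in BEFORE_EDIT_REQUIRED if not run_state.get(key)]
    # Approval only counts as missing when the risk tier actually demands it.
    if run_state.get("approval_required") and not run_state.get("approval_granted"):
        missing.append("approval_if_required")
    return missing


state = {
    "task_intake_complete": True,
    "risk_classification_complete": True,
    "repo_context_packet_exists": True,
    "patch_plan_exists": False,
    "approval_required": True,
    "approval_granted": False,
}
blockers = missing_preconditions(state)
# blockers == ["patch_plan_exists", "approval_if_required"], so the edit is refused.
```

Returning the list of blockers, instead of a bare boolean, lets the agent report exactly why a mutation was refused.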

17.3 Command execution guard

command_policy:
  allowed:
    - git status
    - git diff
    - ./gradlew test
    - npm test
  requires_approval:
    - npm install
    - pip install
    - docker compose up
  forbidden:
    - rm -rf /
    - deploy
    - production migration
    - printenv
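
A command policy of this shape could be enforced with prefix matching on the tokenized command. The sketch below makes several assumptions that go beyond the policy above: commands containing shell metacharacters are rejected outright, unlisted commands fall back to `requires_approval`, and "production migration" is left out because it names a category of action rather than a literal command prefix.

```python
import shlex

ALLOWED = [["git", "status"], ["git", "diff"], ["./gradlew", "test"], ["npm", "test"]]
REQUIRES_APPROVAL = [["npm", "install"], ["pip", "install"], ["docker", "compose", "up"]]
FORBIDDEN = [["rm", "-rf", "/"], ["deploy"], ["printenv"]]

# Characters that could chain or redirect commands when passed to a shell.
SHELL_METACHARACTERS = set(";&|<>$`\n")


def classify_command(command):
    """Classify a command string as allowed, requires_approval, or forbidden."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return "forbidden"  # assumption: no chaining, piping, or substitution
    try:
        tokens = shlex.split(command)
    except ValueError:
        return "forbidden"  # unbalanced quotes and similar parse errors

    def matches(prefixes):
        return any(tokens[:len(prefix)] == prefix for prefix in prefixes)

    if matches(FORBIDDEN):
        return "forbidden"
    if matches(REQUIRES_APPROVAL):
        return "requires_approval"
    if matches(ALLOWED):
        return "allowed"
    return "requires_approval"  # assumption: unknown commands need a human decision
```

Note that prefix matching is strict: `./gradlew :payment:test` is not covered by the `./gradlew test` entry and would need its own line in the policy.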

18. Handling Common Difficult Cases

18.1 Ambiguous issue

If issue says:

Login sometimes broken.

The agent should not invent specifics.

It should produce:

needs_clarification:
  missing:
    - user role
    - environment
    - reproduction steps
    - error message
  possible_next_actions:
    - inspect recent auth failures if logs provided
    - search login flow tests
    - identify likely entrypoints

18.2 No tests

If no tests exist:

add characterization test if possible,
use compile/static checks,
create small reproduction script,
mark confidence lower,
recommend human review.

18.3 Flaky tests

The agent should distinguish:

patch-caused failure,
existing flaky failure,
environment failure.

Record repeated runs:

flaky_test_observation:
  test: PaymentEventConsumerIT
  runs: [pass, fail, pass]
  likely_flaky: true
  action: "do not claim patch failure; report limitation"
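
Repeated runs before and after the patch give a crude way to separate the three cases. The sketch below is an assumption-laden heuristic: each run is recorded as `"pass"`, `"fail"`, or `"error"`, where `"error"` is a hypothetical marker meaning the test could not execute at all (for example, a service container failed to start).

```python
def classify_runs(baseline_runs, patched_runs):
    """Classify repeated runs of one test, before and after the patch."""
    if "error" in baseline_runs + patched_runs:
        return "environment_failure"
    # Mixed results on the unpatched code mean the test was already unstable.
    if "pass" in baseline_runs and "fail" in baseline_runs:
        return "existing_flaky_failure"
    baseline_stable = bool(baseline_runs) and all(r == "pass" for r in baseline_runs)
    patched_broken = bool(patched_runs) and all(r == "fail" for r in patched_runs)
    if baseline_stable and patched_broken:
        return "patch_caused_failure"
    if "fail" not in patched_runs:
        return "no_failure"
    return "inconclusive"
```

For instance, `classify_runs(["pass", "fail", "pass"], ["fail"])` returns `"existing_flaky_failure"`, so the agent should report a limitation instead of blaming the patch.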

18.4 Huge repo

Use:

repo map,
affected project calculation,
top-k localization,
context budget,
targeted tests,
staged PR.

18.5 Long-running tests

Use hierarchy:

compile/typecheck,
targeted unit tests,
impacted module tests,
CI for full validation.

Report what was not run.

18.6 CI-only failure

CI failure may involve:

OS differences,
dependency cache,
credentials,
service container,
hidden env var,
race condition,
timeouts.

The agent should read the CI logs, classify the failure, and produce a follow-up patch only if the evidence supports it.

19. Autonomous Coding Agent Memory

A coding agent should retain the process artifacts of each run.

19.1 Run memory

run_memory:
  task_id: issue-123
  decisions:
    - "classified risk as R3 due to money movement"
    - "selected PaymentReversalService based on stack trace and symbol graph"
  commands:
    - command: ./gradlew :payment:test --tests PaymentReversalServiceTest
      result: failed_before_patch
    - command: ./gradlew :payment:test --tests PaymentReversalServiceTest
      result: passed_after_patch
  files_read:
    - PaymentReversalService.java
    - LedgerEntryService.java
  files_changed:
    - PaymentReversalService.java
    - PaymentReversalServiceTest.java

19.2 Do not use memory as authority

Memory is a candidate for evidence, not truth.

If memory says a test exists, verify it in the current commit.

20. Evaluation of Coding Agent Loop

20.1 Outcome metrics

Metric	Meaning
issue resolved	final patch passes hidden/eval tests
test pass rate	targeted/broader tests pass
regression coverage	new/updated test captures bug
patch minimality	diff is scoped
review acceptance	human reviewer accepts patch
CI success	pipeline passes

20.2 Process metrics

Metric	Meaning
reproduction rate	agent reproduced bug before patch
localization accuracy	agent selected correct files
iteration count	number of patch loops
command success	valid build/test commands
evidence quality	PR packet grounded and complete
stop honesty	agent stops instead of faking success
autonomy violations	unsafe actions attempted

20.3 Trajectory eval

Do not evaluate only the final diff. Evaluate the trajectory:

Did the agent read the relevant files?
Did the agent run appropriate commands?
Did the agent ignore failures?
Did the agent change its plan when evidence contradicted the hypothesis?
Did the agent stop when stuck?

A plausible final answer produced by a bad process is not production-ready.
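
Some of these trajectory questions can be answered mechanically from a run trace. The sketch below assumes a hypothetical trace format: an ordered list of event dicts with a `type` of `"reproduction"`, `"edit"`, `"command"`, or `"diagnosis"`, and an optional `status`. It counts an "ignored failure" whenever a failed command is followed by another edit with no diagnosis in between.

```python
def process_metrics(trace):
    """Derive simple process metrics from an ordered list of trace events."""
    edit_positions = [i for i, e in enumerate(trace) if e["type"] == "edit"]
    first_edit = edit_positions[0] if edit_positions else len(trace)

    reproduced_before_patch = any(
        e["type"] == "reproduction" and e.get("status") == "reproduced"
        for e in trace[:first_edit]
    )

    ignored_failures = 0
    pending_failure = False
    # Only look at events from the first edit on; failures before that are
    # expected (they are the reproduction itself).
    for event in trace[first_edit:]:
        if event["type"] == "command" and event.get("status") == "failed":
            pending_failure = True
        elif event["type"] == "diagnosis":
            pending_failure = False
        elif event["type"] == "edit":
            if pending_failure:
                ignored_failures += 1  # patched again without diagnosing
            pending_failure = False

    return {
        "reproduced_before_patch": reproduced_before_patch,
        "iteration_count": len(edit_positions),
        "ignored_failures": ignored_failures,
    }


trace = [
    {"type": "reproduction", "status": "reproduced"},
    {"type": "edit"},
    {"type": "command", "status": "failed"},
    {"type": "edit"},                      # blind retry, no diagnosis
    {"type": "command", "status": "failed"},
    {"type": "diagnosis"},
    {"type": "edit"},
    {"type": "command", "status": "passed"},
]
metrics = process_metrics(trace)
```

Here the run reproduced the bug first and needed three patch iterations, one of which was a blind retry; a final passing test alone would not reveal that.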

21. Reference Architecture

21.1 Required platform services

Service	Role
Policy engine	controls actions and approvals
Repo map service	repository understanding
Sandbox runner	safe command execution
Patch service	controlled file mutation
Verification service	test/static check execution
Trace store	audit and replay
PR service	draft PR/comment generation
Eval harness	regression evaluation

22. Internal Engineering Handbook Rules

Rule 1 — No silent success

The agent must never claim success without verification evidence.

Rule 2 — Reproduce before patch when possible

For bugfixes, reproduction is the strongest guard against wrong fixes.

Rule 3 — Patch plan before mutation

Every edit should map to a hypothesis and a verification step.

Rule 4 — Minimal diff by default

Avoid unrelated refactors, formatting churn, and broad rewrites.

Rule 5 — Test failures are information

Failed tests should update hypothesis, not trigger blind retries.

Rule 6 — Risk gates autonomy

Higher-risk code requires approval and stronger evidence.

Rule 7 — Review packet is part of deliverable

A diff without explanation, tests, risks, and limitations is incomplete.

Rule 8 — Stop honestly

CANNOT_REPRODUCE, ENVIRONMENT_BLOCKED, and NEEDS_HUMAN_CLARIFICATION are valid outputs.

23. Practice Lab

Lab 1 — Build a coding loop state machine

For a sample repo, define:

states,
transitions,
allowed tools per state,
terminal states,
approval gates.

Lab 2 — Reproduction packet

Take a known bug. Produce:

failing command,
failure summary,
evidence,
regression test plan.

Lab 3 — Patch plan

Before editing, write:

root-cause hypothesis,
files to change,
tests to add/run,
risks,
rollback.

Lab 4 — Verification hierarchy

For a repo module, identify:

fastest compile/check command,
nearest unit test,
module suite,
full CI command.

Lab 5 — PR evidence packet

Given a final diff, produce:

summary,
root cause,
changes,
verification,
risks,
reviewer notes.

24. Self-Assessment

You understand this part if you can answer:

Why is a coding agent not equivalent to a code generator?
Why should bugfix agents reproduce before patching?
What are the valid terminal states besides PATCH_READY?
How does the risk tier affect autonomy?
What should be inside a patch plan?
What is the difference between targeted verification and broader verification?
How should the agent react to failing tests?
Why is the PR evidence packet part of the deliverable?
How do you evaluate process quality, not only the final diff?
When should the agent stop instead of continuing?

25. Key Takeaways

A production-grade coding agent follows an execution loop:

parse task,
classify risk,
prepare sandbox,
understand repository,
reproduce or establish baseline,
localize root cause,
plan patch,
apply minimal diff,
verify,
diagnose failures,
self-review,
produce PR evidence.

The main invariant:

Every code change must be traceable to a task, hypothesis, evidence, and verification result.

The next part will go deeper into autonomous debugging and repair: failure reproduction, log/stack-trace reasoning, test minimization, root-cause graph, and patch validation.

