
Learn AI Development Driven Implementation and Usage, Part 016: AI for Testing Strategy


title: Learn AI Development Driven Implementation and Usage - Part 016
description: AI for testing strategy: designing tests as a risk model, and using AI for scenario discovery, oracle design, test gap analysis, and an effective verification strategy.
series: learn-ai-development-driven-implementation-usage
seriesTitle: Learn AI Development Driven Implementation and Usage
order: 16
partTitle: AI for Testing Strategy
tags:

  • ai
  • software-engineering
  • testing
  • test-strategy
  • quality-engineering
  • verification
  • ci-cd
  • series

date: 2026-06-30

Part 016 — AI for Testing Strategy

A testing strategy is not a coverage target. A testing strategy is a risk model.

Coverage answers: “which lines have been executed by a test?”

A good testing strategy answers:

  1. which behavior must be trustworthy;
  2. which risks are most expensive if they slip through;
  3. which type of test catches each risk most cheaply;
  4. which tests give fast feedback;
  5. which tests give high confidence before release;
  6. how AI helps find gaps without creating test theater.

AI is very useful for testing, but it also often produces tests that look plentiful yet carry little signal. This part teaches how to use AI to design an effective test strategy, not merely to generate test files.


1. Kaufman Framing: The Skill Being Learned

We are not learning every testing technique at once. We are learning the sub-skills that most accelerate implementation work with AI.

1.1 Target Performance

After this part, you should be able to:

  1. map the risks of a change into a test strategy;
  2. choose the right test level: unit, component, integration, contract, end-to-end, property-based, golden master, or manual exploratory;
  3. use AI to find scenarios, edge cases, invariants, and oracles;
  4. distinguish coverage, confidence, and correctness;
  5. write test strategy prompts that prevent empty tests;
  6. evaluate the quality of AI-written tests;
  7. integrate the test strategy with CI, PR review, and the refactoring workflow.

1.2 Deconstruct the Skill

Sub-skill | Function | Failure if weak
Risk mapping | Connects a change to its risks | Many tests, none catching the important bugs
Test level selection | Chooses unit/integration/contract/e2e | Slow feedback or low confidence
Oracle design | Determines the expected result | Tests only check implementation details
Scenario discovery | Finds edge cases | Bugs slip through at boundary conditions
Test data design | Creates meaningful fixtures | Brittle, hard-to-read tests
Flakiness control | Preserves determinism | CI is not trusted
AI prompt control | Directs test generation | AI writes test theater
Test review | Judges the value of a test | Bad tests enter the codebase

1.3 Learn Enough to Self-Correct

Use these five questions:

  1. Which bug does this test most want to prevent?
  2. Does the test verify behavior or an implementation detail?
  3. Is a failure of this test easy to diagnose?
  4. Is the test deterministic?
  5. Is this the cheapest test level for the confidence required?

If you cannot name the bug a test prevents, the test is probably noise.


2. Testing as a Risk Model

A testing strategy starts from the change and its risks, not from the framework.

2.1 Risk Classes

Risk Class | Example | Test Strategy
Pure logic risk | calculation, eligibility, classification | Unit/property-based tests
State transition risk | workflow, approval, lifecycle | Table-driven tests, state machine tests
API contract risk | request/response compatibility | Contract tests, schema tests
Integration risk | DB, queue, external service | Integration tests with controlled dependency
Data migration risk | schema/backfill | Migration tests, shadow read, reconciliation
Authorization risk | permission, tenant boundary | Negative tests, access matrix tests
Time risk | timezone, expiration, SLA | Clock-injected tests, boundary tests
Concurrency risk | race, idempotency, locking | Stress tests, deterministic concurrency tests
Operational risk | logging, metrics, alerting | Observability assertions/smoke tests
Regression risk | old behavior accidentally changes | Golden master/characterization tests

3. Test Pyramid, Test Trophy, and Enterprise Reality

The test pyramid offers a heuristic: many unit tests, fewer integration tests, and fewer still end-to-end tests. Google once suggested a baseline of 70% unit, 20% integration, and 10% end-to-end as a first guess, not a universal law.

3.1 Why the Pyramid Often Breaks

The pyramid breaks when:

  • business logic can only be tested through the UI/API;
  • domain logic is mixed with database/external calls;
  • dependency injection is poor;
  • test data is too expensive to create;
  • there is no contract testing;
  • the team chases coverage instead of risk confidence;
  • AI writes high-level tests because user journeys are easier to imitate.

3.2 Principles for Choosing a Test Level

Choose the lowest test level that can still prove the important behavior.

Question | If Yes | Level That Tends to Fit
Is the logic pure and deterministic? | Yes | Unit/property-based
Does the behavior need a DB transaction? | Yes | Integration/component
Is the risk in schema compatibility? | Yes | Contract/schema
Is the risk in the wiring between services? | Yes | Integration/limited e2e
Is the risk in the full user journey? | Yes | E2E smoke
Is the existing behavior undocumented? | Yes | Characterization/golden master

4. AI Roles in Testing Strategy

AI can help at many points, but each role must be clear.

Role | What the AI does | What a human must validate
Risk analyst | Identifies risk classes from the diff/requirement | Whether the risks fit the domain
Scenario generator | Builds a positive/negative/boundary matrix | Whether the scenarios are relevant and complete
Oracle assistant | Proposes expected outcomes | Whether the expected outcome is correct
Test data designer | Creates minimal fixtures | Whether the data represents the domain
Test implementer | Writes test code | Whether the tests are meaningful and deterministic
Gap reviewer | Reads existing tests and looks for missing risks | Whether each gap is real rather than noise
Flakiness investigator | Analyzes flaky behavior | Whether the hypothesis is supported by evidence

AI must not be the final authority on expected behavior. Expected behavior is a domain contract.


5. Test Oracle Design

A test oracle is the mechanism that decides whether an output is correct.

Without a strong oracle, a test merely runs the code.

5.1 Oracle Types

Oracle Type | Example | Strength | Risk
Exact value | total == 12500 | Very clear | Can be brittle when the format does not matter
Property | total >= 0, order stable | General and strong | Requires a correct invariant
State transition | Pending -> Approved | Fits workflows | Must cover invalid transitions
Side effect | audit row/event emitted | Enterprise-relevant | Needs the right fixture/spy
Contract schema | response matches schema | Compatibility | Does not prove full semantics
Snapshot/golden master | output matches the baseline | Strong regression protection | Can lock in old bugs
Metamorphic | a change in input implies a known relation between outputs | Good for complex domains | Hard to design
Differential | compare old vs new implementation | Good for migrations | The old implementation may be wrong

5.2 Prompt: Oracle Discovery

Given this requirement and code path, propose test oracles.

Do not write tests yet.
Return:
1. observable behaviors that should be verified;
2. strongest oracle for each behavior;
3. weaker but cheaper oracle alternative;
4. data required;
5. risks not covered by each oracle;
6. whether the oracle depends on implementation details.

5.3 Oracle Smell

A test oracle is weak if:

  • it only asserts “not null”;
  • it only verifies that a method was called, without checking any important output;
  • the snapshot is too large and unexplained;
  • the expected value is produced by the function under test;
  • the test follows internal mock interactions;
  • the test still passes when the main bug is introduced.
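
To make the smells concrete, here is a minimal sketch that contrasts a weak oracle with a stronger one. The `Invoice` type, the `build_invoice` function, and the flat-percentage tax rule are hypothetical, invented only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    subtotal: int  # amounts in the smallest currency unit
    tax: int
    total: int


def build_invoice(subtotal: int, tax_rate_percent: int) -> Invoice:
    """Hypothetical function under test: integer tax, rounded down."""
    tax = subtotal * tax_rate_percent // 100
    return Invoice(subtotal=subtotal, tax=tax, total=subtotal + tax)


def test_weak_oracle() -> None:
    # Smell: passes for almost any implementation, including a broken one.
    assert build_invoice(10_000, 11) is not None


def test_strong_oracle() -> None:
    invoice = build_invoice(10_000, 11)
    # Exact-value oracle: expected numbers are written by hand,
    # not recomputed with the function under test.
    assert invoice.tax == 1_100
    assert invoice.total == 11_100
    # Property oracle: the total always decomposes into its parts.
    assert invoice.total == invoice.subtotal + invoice.tax
```

If the tax calculation were changed to return 0, `test_weak_oracle` would still pass while `test_strong_oracle` would fail, which is the difference the smell list is pointing at.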

6. Scenario Discovery with AI

AI is very good at producing a scenario matrix when it is given the domain invariants and risk classes.

6.1 Scenario Matrix Template

Create a scenario matrix for this behavior.

Behavior:
<describe behavior>

Domain invariants:
<list invariants>

Known states:
<states>

Inputs:
<input fields>

Risk classes to cover:
- positive path
- negative path
- boundary values
- invalid state transitions
- authorization failures
- idempotency
- concurrency/time edge cases
- external dependency failure

Return a table:
Scenario | Given | When | Then | Test level | Oracle | Priority | Notes

6.2 Example: Case Escalation

Scenario | Given | When | Then | Test Level | Oracle | Priority
Eligible escalation | Case open, SLA breached | Escalate | Status Escalated, audit written, event emitted | Component | State + side effect | High
Not breached | Case open, SLA not breached | Escalate | Domain error, no state change | Unit/component | Error + no side effect | High
Already closed | Case closed | Escalate | Invalid transition | Unit | State transition oracle | High
Unauthorized actor | User lacks permission | Escalate | Permission error, no audit/event | Component | Error + side effect absence | High
Duplicate request | Same idempotency key | Escalate twice | Single transition/event | Integration | Idempotency oracle | High
External queue down | Event publish fails | Escalate | Transaction behavior as specified | Integration | Transaction oracle | Medium

AI can generate the matrix. The engineer must make sure the matrix matches business policy.
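
A scenario matrix like the one above maps naturally onto a table-driven test. The sketch below covers only the first four rows. The `Case` type, the `escalate` function, and its rule order are hypothetical stand-ins for the real domain code.

```python
from dataclasses import dataclass, field


class DomainError(Exception):
    """Raised when a business rule rejects the command (hypothetical)."""


@dataclass
class Case:
    status: str  # "Open", "Escalated", or "Closed"
    sla_breached: bool
    audit_log: list = field(default_factory=list)


def escalate(case: Case, actor_can_escalate: bool) -> None:
    """Hypothetical rule set mirroring the first four matrix rows."""
    if not actor_can_escalate:
        raise PermissionError("actor lacks escalation permission")
    if case.status != "Open":
        raise DomainError("invalid transition")
    if not case.sla_breached:
        raise DomainError("SLA not breached")
    case.status = "Escalated"
    case.audit_log.append("escalated")


# Row layout: name, status, sla_breached, actor_can_escalate,
# expected error type (or None), expected final status.
SCENARIOS = [
    ("eligible escalation", "Open", True, True, None, "Escalated"),
    ("not breached", "Open", False, True, DomainError, "Open"),
    ("already closed", "Closed", True, True, DomainError, "Closed"),
    ("unauthorized actor", "Open", True, False, PermissionError, "Open"),
]


def run_scenarios() -> list:
    """Run every row and return the names of rows whose oracle failed."""
    failures = []
    for name, status, breached, allowed, expected_error, expected_status in SCENARIOS:
        case = Case(status=status, sla_breached=breached)
        raised = None
        try:
            escalate(case, allowed)
        except (DomainError, PermissionError) as exc:
            raised = type(exc)
        # Oracle: error type, resulting state, and presence or absence of the audit side effect.
        expected_audit = ["escalated"] if expected_error is None else []
        if (raised, case.status, case.audit_log) != (expected_error, expected_status, expected_audit):
            failures.append(name)
    return failures
```

Each row carries its own expected error, state, and side effect, so a failing row name points directly at the broken rule.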


7. Testing State Machines

A state machine must be tested as a transition system, not just as method calls.

7.1 Transition Coverage

Minimum coverage:

  1. all valid transitions;
  2. invalid transitions from each state;
  3. guard failure;
  4. permission failure;
  5. terminal state behavior;
  6. idempotency behavior;
  7. side effects per transition;
  8. concurrency boundaries, if relevant.
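
Items 1 and 2 of the list can be checked exhaustively by walking every (state, event) pair. The sketch below assumes a hypothetical three-state approval machine. The expected transition table is written separately from the implementation so the oracle does not depend on the code under test.

```python
from itertools import product

STATES = ["Pending", "Approved", "Rejected"]  # hypothetical states
EVENTS = ["approve", "reject"]                # hypothetical events


class InvalidTransition(Exception):
    pass


def apply_event(state: str, event: str) -> str:
    """Hypothetical state machine under test."""
    if state == "Pending" and event == "approve":
        return "Approved"
    if state == "Pending" and event == "reject":
        return "Rejected"
    raise InvalidTransition(f"{event} not allowed from {state}")


# Reviewed specification: any pair not listed here must be rejected.
EXPECTED_TRANSITIONS = {
    ("Pending", "approve"): "Approved",
    ("Pending", "reject"): "Rejected",
}


def check_transition_coverage() -> dict:
    """Exercise every (state, event) pair and count what was verified."""
    counts = {"valid": 0, "invalid": 0}
    for state, event in product(STATES, EVENTS):
        expected = EXPECTED_TRANSITIONS.get((state, event))
        if expected is not None:
            assert apply_event(state, event) == expected
            counts["valid"] += 1
            continue
        try:
            apply_event(state, event)
        except InvalidTransition:
            counts["invalid"] += 1
        else:
            raise AssertionError(f"{event} from {state} should be rejected")
    return counts
```

Guards, permissions, and side effects per transition would need their own assertions; this sketch covers only the valid/invalid transition grid.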

7.2 Mermaid Model

7.3 Prompt: Generate Transition Test Plan

Generate a state machine test plan from this transition table.

Return:
1. valid transition tests;
2. invalid transition tests;
3. guard failure tests;
4. permission tests;
5. side effect assertions;
6. idempotency tests;
7. concurrency tests if needed;
8. suggested level for each test.

Do not generate code until the test plan is reviewed.

8. Property-Based Testing with AI

Property-based testing is useful when the input space is large and the invariants are clear.

8.1 Good Fit For

  • calculation;
  • parser/serializer;
  • normalization;
  • sorting;
  • allocation;
  • validation;
  • date/time rules;
  • idempotency;
  • state transition invariant.

8.2 Example Properties

Domain | Property
Money allocation | Sum of allocated amounts equals original amount
Sorting | Output contains same elements and is ordered
Normalization | Applying normalization twice gives same result
Authorization | User without permission never reaches protected transition
Idempotency | Same command key produces at most one side effect
Date range | End date must not precede start date
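
The money allocation property can be sketched as follows. To stay dependency-free, the sketch uses the standard library `random` module with a fixed seed as a stand-in for a property-based testing library, so it has no shrinking. The `allocate` function and its remainder rule are hypothetical.

```python
import random


def allocate(total: int, weights: list[int]) -> list[int]:
    """Hypothetical allocation: split `total` (minor units) in proportion
    to `weights`, handing leftover units to the earliest recipients."""
    weight_sum = sum(weights)
    shares = [total * w // weight_sum for w in weights]
    remainder = total - sum(shares)  # always smaller than len(weights)
    for i in range(remainder):
        shares[i] += 1
    return shares


def check_allocation_properties(runs: int = 500, seed: int = 42) -> int:
    """Check the invariants on generated inputs and return the number of runs."""
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    for _ in range(runs):
        total = rng.randint(0, 1_000_000)
        weights = [rng.randint(1, 10) for _ in range(rng.randint(1, 8))]
        shares = allocate(total, weights)
        # Property 1: nothing is created or lost.
        assert sum(shares) == total
        # Property 2: nobody receives a negative amount.
        assert all(share >= 0 for share in shares)
        # Property 3: one share per recipient.
        assert len(shares) == len(weights)
    return runs
```

A dedicated property-based library would add input shrinking and smarter generators; the invariants themselves would stay the same.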

8.3 Prompt: Property Discovery

Identify property-based tests for this function/module.

Return:
1. candidate invariants;
2. generated input domains;
3. shrinking considerations;
4. invalid input classes;
5. examples of bugs each property would catch;
6. cases where property-based testing is not appropriate.

8.4 Warning

AI can propose incorrect properties because it does not understand the domain. Every property must be validated against domain knowledge.


9. Contract Testing for APIs and Events

In distributed systems, many bugs are not internal logic bugs but contract mismatches.

9.1 API Contract Tests

Verify:

  • required/optional field;
  • type;
  • enum value;
  • status code;
  • error format;
  • backward compatibility;
  • pagination/sorting;
  • authentication/authorization semantics.

9.2 Event Contract Tests

Verify:

  • event name/type;
  • schema version;
  • required field;
  • semantic field meaning;
  • idempotency/correlation key;
  • ordering assumption;
  • compatibility with older consumers.
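
A few of these checks (required fields, types, tolerance for additive changes) can be sketched with a small hand-rolled validator. The `CaseEscalated` event, its field names, and the rule that unknown fields are tolerated are all assumptions for illustration; a real project would more likely use a schema or contract-testing tool.

```python
# Hypothetical v1 contract for a "CaseEscalated" event.
REQUIRED_FIELDS = {
    "event_type": str,
    "schema_version": int,
    "case_id": str,
    "correlation_id": str,
}
OPTIONAL_FIELDS = {"reason_code": str}


def contract_violations(payload: dict) -> list[str]:
    """Return human-readable violations; an empty list means the payload conforms."""
    violations = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in payload:
            violations.append(f"missing required field: {name}")
        elif not isinstance(payload[name], expected_type):
            violations.append(f"wrong type for {name}")
    for name, expected_type in OPTIONAL_FIELDS.items():
        if name in payload and not isinstance(payload[name], expected_type):
            violations.append(f"wrong type for {name}")
    # Unknown extra fields are deliberately tolerated, so additive
    # producer changes do not break this consumer-side check.
    return violations


def test_producer_payload_matches_contract() -> None:
    payload = {
        "event_type": "CaseEscalated",
        "schema_version": 1,
        "case_id": "C-1001",
        "correlation_id": "req-42",
    }
    assert contract_violations(payload) == []
```

A schema check like this covers structure only. Semantic field meaning and ordering assumptions still need scenario-level tests.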

9.3 Prompt: Contract Risk from Diff

Review this diff for API/event contract risk.

Return:
1. changed request/response/event fields;
2. required vs optional changes;
3. enum/status/error changes;
4. consumer compatibility risks;
5. contract tests needed;
6. migration/versioning recommendation.

10. Integration Testing Without Pain

Integration tests often become slow and flaky when boundaries are not controlled.

10.1 Good Integration Test

A good integration test:

  • exercises a real boundary that carries risk;
  • is deterministic;
  • uses isolated data;
  • does not depend on test order;
  • does not depend on real time without control;
  • uses a test container/local dependency when needed;
  • has a clear failure message;
  • does not test every small scenario that a unit test could catch.

10.2 Boundaries Worth an Integration Test

Boundary | Why
ORM mapping/query | Bugs often occur in mapping, lazy loading, and transactions
Message publish/consume | Serialization, routing, retry, idempotency
External service adapter | Request mapping, error handling, timeout
Auth middleware | Permission enforcement
Migration | Data transformation correctness
Cache | Expiration, invalidation, consistency

10.3 Prompt: Integration Test Minimality

Propose the minimal integration tests required for this change.

Do not duplicate unit-testable logic.
Return:
1. integration risks;
2. required real dependencies;
3. dependencies that can be faked;
4. test data setup;
5. cleanup strategy;
6. flakiness risks;
7. why each test cannot be a unit test.

11. Flakiness Control

Flaky tests reduce trust in CI. AI-generated tests are often flaky when the prompt gives no constraints.

11.1 Flakiness Sources

Source | Example | Fix
Time | calling now() directly | Inject clock
Randomness | random data without a seed | Fixed seed / explicit data
Ordering | assuming DB order without a sort | Explicit sort
Concurrency | sleep-based wait | Deterministic synchronization
External dependency | real network call | Stub/test container/contract fake
Shared state | global/static mutation | Isolated fixture
Async | immediate assert after publish | Await condition with timeout
Environment | path/timezone/locale | Pin environment
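
The "Inject clock" fix from the first row can be sketched as below. The `is_expired` function and its rule (expired once the TTL has fully elapsed) are hypothetical; the point is that the test controls time instead of reading the real clock.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable


def is_expired(
    issued_at: datetime,
    ttl: timedelta,
    now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
) -> bool:
    """Hypothetical rule: a token expires once `ttl` has fully elapsed."""
    return now() - issued_at >= ttl


def test_expiry_boundary() -> None:
    issued = datetime(2026, 1, 1, 12, 0, tzinfo=timezone.utc)
    ttl = timedelta(minutes=30)
    just_before = issued + ttl - timedelta(seconds=1)
    exactly_at = issued + ttl
    # The injected clock makes both sides of the boundary reproducible.
    assert is_expired(issued, ttl, now=lambda: just_before) is False
    assert is_expired(issued, ttl, now=lambda: exactly_at) is True
```

Production callers keep the default clock, while tests pass a fixed one, so no sleep or real-time dependency is needed.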

11.2 Prompt: Flaky Test Review

Review these tests for flakiness risk.

Check for:
- real time usage;
- sleeps;
- random data;
- unordered collections;
- shared mutable state;
- external calls;
- async race;
- environment assumptions.

Return a risk table and deterministic rewrite suggestions.

12. AI Test Generation Workflow

Do not start from “write unit tests”. Start from the strategy.

12.1 Step 1 — Ask for Risk Map

Analyze this requirement/diff and produce a testing risk map.
Do not write test code yet.

12.2 Step 2 — Ask for Test Plan

Convert the risk map into a test plan.
Include test level, oracle, data, priority, and reason.

12.3 Step 3 — Generate Tests Incrementally

Implement only the high-priority unit/component tests from the approved test plan.
Do not add broad snapshots.
Do not modify production code unless a testability seam is explicitly required.

12.4 Step 4 — Review Tests

Review these tests for meaningful assertions, determinism, and implementation-detail coupling.
Suggest improvements but do not change code automatically.

13. Test Quality Rubric

Score each generated test from 0–2.

Dimension | 0 | 1 | 2
Behavior relevance | Unclear | Partially related | Directly covers important behavior
Oracle strength | Weak assert | Medium | Strong semantic oracle
Determinism | Flaky risk | Minor risk | Deterministic
Diagnosis | Failure is confusing | Reasonably clear | Failure points straight to the bug
Scope | Too broad | Medium | Minimal and focused
Independence | Shared state | Partially isolated | Fully isolated
Maintainability | Hard to read | Adequate | Clear and domain-oriented
Risk coverage | Covers no important risk | Covers one risk | Covers high-priority risks

Interpretation:

Score | Meaning | Action
0–6 | Poor | Rewrite or delete
7–11 | Acceptable with review | Improve oracle/determinism
12–16 | Strong | Keep
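
Eight dimensions scored 0 to 2 give a maximum of 16, which is where the bands above come from. A small sketch of the arithmetic, with hypothetical dimension keys chosen to mirror the rubric rows:

```python
# Hypothetical keys, one per rubric row.
DIMENSIONS = (
    "behavior_relevance",
    "oracle_strength",
    "determinism",
    "diagnosis",
    "scope",
    "independence",
    "maintainability",
    "risk_coverage",
)


def interpret(scores: dict[str, int]) -> tuple[int, str]:
    """Sum the per-dimension scores and map the total to a rubric band."""
    if set(scores) != set(DIMENSIONS):
        raise ValueError("score every dimension exactly once")
    if any(value not in (0, 1, 2) for value in scores.values()):
        raise ValueError("each score must be 0, 1, or 2")
    total = sum(scores.values())
    if total <= 6:
        return total, "Poor"
    if total <= 11:
        return total, "Acceptable with review"
    return total, "Strong"
```

For example, a test that scores 2 on every dimension except a 0 for determinism totals 14 and lands in the "Strong" band, which is a reminder to read the individual scores and not only the total.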

14. Testing Strategy by Change Type

14.1 Pure Function Change

Use:

  • unit tests;
  • table-driven tests;
  • boundary values;
  • property-based tests if invariant exists.

Avoid:

  • e2e tests for every branch;
  • mock-heavy tests.

14.2 Workflow/State Change

Use:

  • transition table tests;
  • invalid transition tests;
  • permission matrix tests;
  • side effect assertion;
  • idempotency test.

14.3 API Change

Use:

  • contract tests;
  • backward compatibility tests;
  • error shape tests;
  • generated client compatibility if applicable.

14.4 Database Migration

Use:

  • migration up/down test;
  • fixture before/after test;
  • reconciliation query;
  • performance sample if table large;
  • rollback simulation.
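
The fixture before/after test and the reconciliation query can be sketched together with an in-memory SQLite database. The table names, the text-to-cents migration, and the fixture rows are hypothetical, and the sketch does not cover down-migration or rollback.

```python
import sqlite3
from decimal import Decimal

# Hypothetical legacy fixture: amounts stored as decimal text.
LEGACY_ROWS = [(1, "12.50"), (2, "0.10"), (3, "7.99")]


def migrate(conn: sqlite3.Connection) -> None:
    """Hypothetical migration: copy amounts into integer cents."""
    conn.execute(
        "CREATE TABLE payments (id INTEGER PRIMARY KEY, amount_cents INTEGER NOT NULL)"
    )
    conn.execute(
        "INSERT INTO payments (id, amount_cents) "
        "SELECT id, CAST(ROUND(CAST(amount_text AS REAL) * 100) AS INTEGER) "
        "FROM payments_legacy"
    )


def reconcile(conn: sqlite3.Connection) -> dict:
    """Compare row counts and totals between the old and new tables."""
    (legacy_count,) = conn.execute("SELECT COUNT(*) FROM payments_legacy").fetchone()
    new_count, new_sum = conn.execute(
        "SELECT COUNT(*), SUM(amount_cents) FROM payments"
    ).fetchone()
    # Expected total is computed independently of the migration SQL.
    expected_sum = sum(int(Decimal(text) * 100) for _, text in LEGACY_ROWS)
    return {
        "counts_match": legacy_count == new_count,
        "sums_match": new_sum == expected_sum,
    }


def run_migration_check() -> dict:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payments_legacy (id INTEGER PRIMARY KEY, amount_text TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO payments_legacy VALUES (?, ?)", LEGACY_ROWS)
    migrate(conn)
    result = reconcile(conn)
    conn.close()
    return result
```

Counts and sums catch dropped rows and systematic conversion errors; per-row comparison would be needed to catch errors that cancel out in the total.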

14.5 Refactoring

Use:

  • characterization tests;
  • existing regression suite;
  • targeted invariants;
  • static analysis;
  • diff review.

14.6 Bug Fix

Use:

  • failing regression test first;
  • minimal reproduction;
  • adjacent scenario tests;
  • negative test for false positive.

14.7 Security Fix

Use:

  • negative tests;
  • abuse-case tests;
  • permission boundary tests;
  • input validation tests;
  • security scan/static analysis.

15. Avoiding AI Test Theater

AI test theater happens when tests look plentiful but add no confidence.

15.1 Smells

  • tests assert only non-null;
  • tests mirror implementation logic;
  • mocks verify every private collaborator call;
  • large snapshots with no semantic assertion;
  • test names that do not state the behavior;
  • random test data with no meaning;
  • generated tests pass even when production logic is obviously broken;
  • tests added only to raise coverage.

15.2 Counter-Prompt

Evaluate these tests for test theater.
For each test, answer:
1. what bug would this catch?
2. would it fail if the main business rule is broken?
3. does it assert behavior or implementation detail?
4. is the fixture minimal and meaningful?
5. should this test be deleted, rewritten, or kept?

16. Mutation Thinking

Mutation testing tools intentionally change code to see whether tests fail. Even if you do not run a mutation testing tool, think mutationally.

Ask:

  • if > becomes >=, will test fail?
  • if permission check is removed, will test fail?
  • if event is not emitted, will test fail?
  • if wrong state is saved, will test fail?
  • if timezone changes, will test fail?
  • if duplicate command is processed twice, will test fail?

Prompt:

Perform mutation-thinking review of this test suite.
List plausible code mutations and whether existing tests would catch them.
Prioritize mutations related to business risk, not syntax trivia.
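
The first question in the list (`>` becoming `>=`) can be worked through by hand. In this sketch the SLA rule, the hand-written mutant, and the two small suites are all hypothetical; no mutation testing tool is involved.

```python
from typing import Callable

Rule = Callable[[int, int], bool]


def is_sla_breached(elapsed_hours: int, limit_hours: int) -> bool:
    """Hypothetical rule: breached only when elapsed time exceeds the limit."""
    return elapsed_hours > limit_hours


def is_sla_breached_mutant(elapsed_hours: int, limit_hours: int) -> bool:
    """Hand-written mutant: `>` replaced by `>=`."""
    return elapsed_hours >= limit_hours


def weak_suite(rule: Rule) -> bool:
    # Only far-from-boundary inputs: cannot tell `>` from `>=`.
    return rule(100, 24) is True and rule(1, 24) is False


def boundary_suite(rule: Rule) -> bool:
    # Adds the exact boundary, which distinguishes the two operators.
    return weak_suite(rule) and rule(24, 24) is False and rule(25, 24) is True
```

Both suites pass on the original rule. The mutant survives `weak_suite` but is killed by `boundary_suite`, because only the latter checks the case where elapsed time equals the limit.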

17. Test Data Design

Test data should communicate domain meaning.

17.1 Bad Test Data

user1, user2, itemA, itemB, status=1, amount=100

17.2 Better Test Data

caseOwnerWithoutApprovalRole
seniorReviewerWithEscalationPermission
caseSubmittedBeforeSlaDeadline
caseSubmittedAfterSlaDeadline
penaltyAmountAtUpperBoundary

17.3 Fixture Principles

  • create only what the test needs;
  • name data by role in scenario;
  • avoid magical defaults;
  • centralize builders only when they reduce noise;
  • do not hide important field values in builders;
  • avoid production-like giant fixture unless doing golden master.

Prompt:

Improve this test data for readability and domain intent.
Do not change tested behavior.
Make fixture names reveal scenario roles and boundary values.
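
As a rough illustration of these principles, the sketch below names fixtures by their role in the scenario and keeps the boundary value visible. The `SubmittedCase` type, the `met_sla` rule, and the deadline value are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone

# Hypothetical deadline used as the boundary value.
SLA_DEADLINE = datetime(2026, 3, 1, 17, 0, tzinfo=timezone.utc)


@dataclass(frozen=True)
class SubmittedCase:
    submitted_at: datetime
    owner_roles: tuple[str, ...]


def met_sla(case: SubmittedCase) -> bool:
    """Hypothetical rule: submitting exactly at the deadline still counts."""
    return case.submitted_at <= SLA_DEADLINE


# Fixtures are named by their role in the scenario, and the boundary
# value stays visible here instead of being hidden inside a builder.
case_submitted_at_sla_deadline = SubmittedCase(
    submitted_at=SLA_DEADLINE, owner_roles=("case_owner",)
)
case_submitted_before_sla_deadline = replace(
    case_submitted_at_sla_deadline, submitted_at=SLA_DEADLINE - timedelta(minutes=1)
)
case_submitted_after_sla_deadline = replace(
    case_submitted_at_sla_deadline, submitted_at=SLA_DEADLINE + timedelta(minutes=1)
)


def test_sla_boundary() -> None:
    assert met_sla(case_submitted_before_sla_deadline)
    assert met_sla(case_submitted_at_sla_deadline)
    assert not met_sla(case_submitted_after_sla_deadline)
```

Compared with `user1` or `amount=100`, each fixture name states which side of the boundary it sits on, so the test reads as a statement of the rule.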

18. CI Strategy

A testing strategy must be connected to the feedback loop.

Stage | Tests | Purpose
Local pre-commit | fast unit, lint, typecheck | immediate feedback
PR fast lane | unit + component + contract | review confidence
PR extended | integration + selected e2e | release confidence
Nightly | full e2e, mutation subset, performance sample | deeper risk scan
Pre-release | migration, smoke, rollback, compatibility | deployment safety
Post-release | synthetic checks, canary, observability | production verification

AI can help generate commands and triage CI failures, but the strategy must be owned by the team.


19. Testing in Regulated or High-Stakes Workflows

For enforcement, compliance, finance, healthcare, telecom, or audit-heavy workflows, tests must prove more than “the response is correct”.

Also verify:

  • decision trace;
  • audit event;
  • actor identity;
  • timestamp source;
  • rule version;
  • evidence reference;
  • reason code;
  • notification requirement;
  • retention behavior;
  • appeal/reopen behavior;
  • escalation timer;
  • immutable log.

AI prompts must treat auditability as first-class behavior.

Generate a test strategy for this regulated workflow.
Include audit evidence, reason codes, actor identity, rule version, state transition, and downstream event assertions.
Do not treat HTTP response as the only observable behavior.
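
A test that treats audit evidence as observable behavior might look like the sketch below. The `AuditRecord` fields, the `review_penalty` function, the rule version string, and the limit are hypothetical; real systems would also assert persistence and immutability of the record.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

RULE_VERSION = "penalty-rules-v3"  # hypothetical identifier
SENIOR_REVIEW_LIMIT = 5_000_000    # hypothetical threshold in minor units


@dataclass(frozen=True)
class AuditRecord:
    actor_id: str
    rule_version: str
    reason_code: str
    from_state: str
    to_state: str
    recorded_at: datetime


def review_penalty(actor_id: str, amount: int, now: datetime) -> tuple[str, AuditRecord]:
    """Hypothetical decision: amounts above the limit need senior review."""
    if amount > SENIOR_REVIEW_LIMIT:
        to_state, reason = "PendingSeniorReview", "AMOUNT_ABOVE_LIMIT"
    else:
        to_state, reason = "Approved", "WITHIN_LIMIT"
    record = AuditRecord(actor_id, RULE_VERSION, reason, "Submitted", to_state, now)
    return to_state, record


def test_decision_leaves_audit_evidence() -> None:
    fixed_now = datetime(2026, 6, 30, 9, 0, tzinfo=timezone.utc)
    state, audit = review_penalty("reviewer-17", 5_000_001, fixed_now)
    # The returned state alone is not enough: assert the full evidence trail.
    assert state == "PendingSeniorReview"
    assert audit == AuditRecord(
        actor_id="reviewer-17",
        rule_version="penalty-rules-v3",
        reason_code="AMOUNT_ABOVE_LIMIT",
        from_state="Submitted",
        to_state="PendingSeniorReview",
        recorded_at=fixed_now,
    )
```

The timestamp comes from an injected value, so the assertion on `recorded_at` also verifies the timestamp source instead of ignoring it.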

20. 20-Hour Practice Plan

Hour 1–2: Risk Map

Take one small feature. Build a risk map with AI. Review and correct it manually.

Deliverable:

Risk map grouped by logic, state, contract, integration, security, and operational risk.

Hour 3–4: Test Level Selection

Turn the risk map into a test plan. Choose the cheapest test level for each risk.

Deliverable:

Test plan with level, oracle, data, priority.

Hour 5–6: Oracle Design

Take five scenarios. Ask AI to propose oracles, then strengthen the assertions.

Deliverable:

Strong oracle table.

Hour 7–9: Unit/Component Tests

Generate and review fast tests for pure/domain logic.

Deliverable:

Focused tests with meaningful names and deterministic data.

Hour 10–12: State/Workflow Tests

Build a transition table and tests for valid/invalid transitions.

Deliverable:

Transition coverage test suite.

Hour 13–14: Contract Tests

Review one API/event diff. Write contract tests.

Deliverable:

Schema/compatibility tests.

Hour 15–16: Flakiness Review

Ask AI to review flakiness risks in the test suite.

Deliverable:

Flakiness risk table + fixes.

Hour 17–18: Mutation Thinking

Ask AI to produce a mutation-thinking review.

Deliverable:

List of plausible mutations and tests that catch them.

Hour 19–20: CI Gate Design

Design test stages for local, PR, nightly, pre-release, and post-release.

Deliverable:

CI test gate proposal.

21. Part Summary

Testing with AI must start from strategy, not generation.

Core principles:

Risk first.
Oracle before code.
Smallest useful test level.
Deterministic by design.
Behavior over implementation detail.
Confidence over coverage.
Human owns expected behavior.

AI helps broaden scenario awareness, find edge cases, speed up test code, and review test quality. But AI can also produce test theater that raises coverage without raising confidence.

A strong testing strategy makes Part 017 more effective, because test generation and repair are only safe once the strategy, oracles, and risk model are correct.



Next Part

Part 017 covers AI for Test Generation and Repair: how to turn a test strategy into test code, repair flaky/failing tests, evaluate assertion quality, and avoid generated-test debt.
