
Learn Java Payment Systems Part 063 Readiness Review And Certification


title: Build From Scratch: Large Production Grade Java Payment Systems - Part 063
description: Production readiness review, certification evidence, launch gates, and go/no-go decision model for a Java payment platform.
series: learn-java-payment-systems
seriesTitle: Build From Scratch: Large Production Grade Java Payment Systems
order: 63
partTitle: Readiness Review and Certification
tags:

  • java
  • payments
  • payment-systems
  • fintech
  • pci-dss
  • readiness-review
  • certification
  • sre
  • compliance
  • enterprise-architecture

date: 2026-07-02

Part 063 — Readiness Review and Certification

A payment system that has been fully built is not automatically cleared to go live.

In an ordinary system, readiness often means:

the service deploys, the endpoints respond, the tests are green, and a dashboard exists.

In a payment system, readiness is a much higher bar:

the system is allowed to touch real money, create real financial obligations, face audits, survive inconsistent providers, and still explain every rupiah, dollar, or cent that moves.

This part covers the readiness review as the last launch gate before production. It is not a decorative checklist. It is a mechanism for answering these questions:

  1. Is the system financially correct?
  2. Is the system secure with respect to data and credentials?
  3. Can the system be operated when it hits an unknown state?
  4. Does the system have enough evidence for audits, disputes, reconciliation, incidents, and regulators?
  5. Does the team know what to do when money gets stuck?

We are not building a demo payment gateway. We are building a platform that has to be accountable.


1. Mental Model: Readiness Is Not a Single Checklist

Payment platform readiness has several parallel dimensions.

The most common mistake is treating readiness as a single status:

READY = true

Readiness should instead be treated as a multi-domain decision:

READY_FOR_SANDBOX_CERTIFICATION
READY_FOR_LIMITED_INTERNAL_PRODUCTION
READY_FOR_PILOT_MERCHANT
READY_FOR_PUBLIC_MERCHANT
READY_FOR_HIGH_VALUE_PAYMENT
READY_FOR_AUTOMATED_SETTLEMENT
READY_FOR_AUTOMATED_PAYOUT

A system can be ready to accept small payments in a pilot but not ready for automatic payout. It can be ready for card authorization but not for partial refunds. It can be ready at the API level but not for dispute operations.
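
One way to keep readiness from collapsing into a single boolean is to model it as a set of granted targets. The sketch below is only an illustration: the enum values come from the list above, while `PlatformReadiness` and its methods are hypothetical names introduced here.

```java
import java.util.EnumSet;
import java.util.Set;

// Readiness targets copied from the list above.
enum ReadinessTarget {
    READY_FOR_SANDBOX_CERTIFICATION,
    READY_FOR_LIMITED_INTERNAL_PRODUCTION,
    READY_FOR_PILOT_MERCHANT,
    READY_FOR_PUBLIC_MERCHANT,
    READY_FOR_HIGH_VALUE_PAYMENT,
    READY_FOR_AUTOMATED_SETTLEMENT,
    READY_FOR_AUTOMATED_PAYOUT
}

// Hypothetical holder: readiness is a set of granted targets, never one flag.
final class PlatformReadiness {
    private final EnumSet<ReadinessTarget> granted = EnumSet.noneOf(ReadinessTarget.class);

    void grant(ReadinessTarget target) {
        granted.add(target);
    }

    // Each target is checked on its own; granting one implies nothing about the others.
    boolean isReadyFor(ReadinessTarget target) {
        return granted.contains(target);
    }

    Set<ReadinessTarget> granted() {
        return Set.copyOf(granted);
    }
}
```

With this shape, granting `READY_FOR_PILOT_MERCHANT` says nothing about `READY_FOR_AUTOMATED_PAYOUT`, which is the point of the section.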


2. Readiness State Machine

Use an explicit state machine for each platform capability.

Every transition must be backed by evidence.

Example:

| Transition | Required evidence |
| --- | --- |
| BUILD_COMPLETE -> INTERNAL_TESTING | test matrix, schema migration proof, local simulator pass |
| INTERNAL_TESTING -> SECURITY_REVIEW | threat model, data classification, secret inventory |
| SECURITY_REVIEW -> PROVIDER_CERTIFICATION | PCI scope memo, logging redaction proof, key rotation test |
| PROVIDER_CERTIFICATION -> OPERATIONAL_REHEARSAL | provider UAT result, webhook replay proof, settlement report sample |
| OPERATIONAL_REHEARSAL -> LIMITED_PRODUCTION | runbook drill, incident drill, reconciliation drill, support script |
| LIMITED_PRODUCTION -> CONTROLLED_RAMP | pilot metrics, zero unresolved critical break, payout validation |
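
The evidence requirement can be enforced in code rather than by convention. The following is a minimal sketch, assuming evidence is identified by simple string keys; `Stage`, `TransitionGate`, and the key names are hypothetical, and only the first three transitions from the table are modelled to keep it short.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

enum Stage {
    BUILD_COMPLETE, INTERNAL_TESTING, SECURITY_REVIEW, PROVIDER_CERTIFICATION,
    OPERATIONAL_REHEARSAL, LIMITED_PRODUCTION, CONTROLLED_RAMP
}

final class TransitionGate {
    // Evidence needed to leave a stage; entries mirror three rows of the table.
    private static final Map<Stage, Set<String>> REQUIRED = new EnumMap<>(Stage.class);
    static {
        REQUIRED.put(Stage.BUILD_COMPLETE,
                Set.of("test-matrix", "schema-migration-proof", "local-simulator-pass"));
        REQUIRED.put(Stage.INTERNAL_TESTING,
                Set.of("threat-model", "data-classification", "secret-inventory"));
        REQUIRED.put(Stage.SECURITY_REVIEW,
                Set.of("pci-scope-memo", "logging-redaction-proof", "key-rotation-test"));
    }

    /** Returns the evidence still missing before leaving {@code from}; empty means the move is allowed. */
    static Set<String> missingEvidence(Stage from, Set<String> provided) {
        Set<String> required = REQUIRED.get(from);
        if (required == null) {
            // Fail closed: a transition that is not modelled is never implicitly allowed.
            throw new IllegalArgumentException("no transition modelled from " + from);
        }
        Set<String> missing = new TreeSet<>(required);
        missing.removeAll(provided);
        return missing;
    }
}
```

Failing closed on unmodelled transitions is a deliberate choice here: an empty requirement set would otherwise read as "approved".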

3. Readiness Object

Do not keep readiness only in a manually maintained spreadsheet. Represent it as an object that can be audited.

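import java.time.Instant;
import java.util.List;

// EvidenceRef, RiskAcceptance, and LaunchCondition are assumed to be
// domain types defined elsewhere in the codebase.
public record ReadinessDecision(
    String capability,                  // e.g. "refund.partial"
    ReadinessStage stage,               // stage this decision applies to
    Decision decision,
    String decidedBy,                   // accountable approver
    Instant decidedAt,
    List<EvidenceRef> evidence,         // what the decision was based on
    List<RiskAcceptance> acceptedRisks, // risks knowingly accepted
    List<LaunchCondition> conditions    // obligations attached to the approval
) {}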

public enum ReadinessStage {
    DESIGN_REVIEW,
    BUILD_COMPLETE,
    INTERNAL_TESTING,
    SECURITY_REVIEW,
    PROVIDER_CERTIFICATION,
    OPERATIONAL_REHEARSAL,
    LIMITED_PRODUCTION,
    CONTROLLED_RAMP,
    FULL_PRODUCTION
}

public enum Decision {
    APPROVED,
    APPROVED_WITH_CONDITIONS,
    REJECTED,
    DEFERRED
}

Why does this matter?

Because when an incident happens three months after launch, the question is not only:

where is the bug?

It is also:

who approved this capability to go live, based on what evidence, and with which risks already accepted?

Readiness is part of the audit trail.
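
To make a decision record trustworthy as audit evidence, it helps to reject incomplete decisions at construction time. The sketch below uses a simplified, hypothetical variant of the record (`AuditedDecision`, with plain strings instead of `EvidenceRef` and `LaunchCondition`); the validation rules are assumptions chosen for illustration, not rules stated by the series.

```java
import java.time.Instant;
import java.util.List;

enum Verdict { APPROVED, APPROVED_WITH_CONDITIONS, REJECTED, DEFERRED }

// Simplified stand-in for ReadinessDecision; evidence and conditions are plain strings here.
record AuditedDecision(
        String capability,
        Verdict verdict,
        String decidedBy,
        Instant decidedAt,
        List<String> evidence,
        List<String> conditions) {

    AuditedDecision {
        if (capability == null || capability.isBlank()) {
            throw new IllegalArgumentException("capability is required");
        }
        if (decidedBy == null || decidedBy.isBlank()) {
            throw new IllegalArgumentException("an accountable approver is required");
        }
        // Defensive immutable copies so the record cannot change after the decision.
        evidence = List.copyOf(evidence);
        conditions = List.copyOf(conditions);

        boolean approving = verdict == Verdict.APPROVED
                || verdict == Verdict.APPROVED_WITH_CONDITIONS;
        if (approving && evidence.isEmpty()) {
            throw new IllegalArgumentException("an approval must reference evidence");
        }
        if (verdict == Verdict.APPROVED_WITH_CONDITIONS && conditions.isEmpty()) {
            throw new IllegalArgumentException("conditional approval must list its conditions");
        }
    }
}
```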


4. Capability-Based Readiness

A payment platform must not launch as one monolithic unit of capability.

Split readiness per capability:

card.payment.authorization
card.payment.capture
authentication.3ds.challenge
authentication.3ds.frictionless
refund.full
refund.partial
refund.async
bank_transfer.virtual_account.collection
wallet.topup
wallet.spend
merchant.settlement.batch
merchant.payout.automatic
dispute.chargeback.intake
backoffice.manual_adjustment
backoffice.webhook_replay
backoffice.break_glass

Each capability carries a different risk.

Example:

| Capability | Primary risk | Launch strategy |
| --- | --- | --- |
| card authorization | double charge, unknown state | pilot merchant, low amount limit |
| automatic capture | capture after auth expiry | strict auth window validation |
| partial refund | over-refund | ledger refundable balance invariant |
| automatic payout | sending money to wrong beneficiary | maker-checker initially, then automate |
| manual adjustment | internal fraud/operator error | high-risk approval policy |
| webhook replay | duplicate state transition | idempotent inbox + permission gate |
| dispute representment | missed deadline/legal loss | deadline engine + ops SLA |

5. Financial Correctness Review

This is the most important review.

A system that is fast but wrong in its ledger is a failed system.

5.1 Ledger Invariants

Minimum invariants:

For every posted journal:
  sum(debit) == sum(credit)
  currency is single or explicitly FX-balanced
  journal is immutable after posting
  reversal is separate journal
  source_operation_id is idempotent

SQL checks that must exist:

-- journal cannot be posted twice by same source operation
CREATE UNIQUE INDEX uq_ledger_journal_source_operation
ON ledger_journal(source_type, source_id, posting_rule_code)
WHERE status = 'POSTED';

-- entry amount cannot be zero
ALTER TABLE ledger_entry
ADD CONSTRAINT chk_ledger_entry_non_zero
CHECK (amount_minor <> 0);

-- account currency must match posting currency unless account is multi-currency enabled
-- enforced by posting service and periodic integrity job.

Mandatory integrity job:

SELECT journal_id, currency, SUM(amount_minor) AS total
FROM ledger_entry
GROUP BY journal_id, currency
HAVING SUM(amount_minor) <> 0;

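This query must always return no rows for single-currency journals. It assumes amount_minor is stored signed (for example debits positive and credits negative), so a balanced journal sums to zero per currency.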
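
The same check can be run in application code, for example inside a test harness. This is a minimal in-memory sketch under the same signed-amount assumption; `LedgerEntry` and `LedgerIntegrity` are hypothetical names, not types from the series.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Signed amount in minor units: debit > 0, credit < 0 (assumption of this sketch).
record LedgerEntry(String journalId, String currency, long amountMinor) {}

final class LedgerIntegrity {
    /** Returns "journalId/currency" -> non-zero total for every unbalanced group. */
    static Map<String, Long> unbalanced(List<LedgerEntry> entries) {
        Map<String, Long> totals = new TreeMap<>();
        for (LedgerEntry e : entries) {
            // Same grouping as the SQL: GROUP BY journal_id, currency.
            totals.merge(e.journalId() + "/" + e.currency(), e.amountMinor(), Long::sum);
        }
        // Same filter as the SQL: HAVING SUM(amount_minor) <> 0.
        totals.values().removeIf(total -> total == 0L);
        return totals;
    }
}
```

An empty map is the in-memory equivalent of the query returning no rows.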

5.2 Payment Invariants

Checklist:

| Invariant | Evidence |
| --- | --- |
| no double charge | idempotency test + provider operation uniqueness |
| no over-capture | state machine test + DB constraint/service invariant |
| no over-refund | refundable balance projection + concurrency test |
| unknown is not failed | timeout scenario + recovery workflow |
| provider success posts ledger once | webhook duplicate test |
| capture and cancel cannot both win | concurrency test |
| settlement cannot include unreconciled blocked transaction | settlement eligibility test |
| payout cannot exceed available balance | reservation test |
| manual adjustment must be balanced | maker-checker + ledger rule test |
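
As an illustration of the "no over-refund" row, the sketch below reserves refund amounts atomically against the captured amount. It is an in-memory stand-in only: in a real platform this invariant would more likely be enforced in the database (row lock or conditional update). `RefundableBalance` is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicLong;

final class RefundableBalance {
    private final AtomicLong remainingMinor;

    RefundableBalance(long capturedMinor) {
        if (capturedMinor < 0) {
            throw new IllegalArgumentException("captured amount cannot be negative");
        }
        this.remainingMinor = new AtomicLong(capturedMinor);
    }

    /** Atomically reserves a refund; returns false if it would exceed what is left. */
    boolean tryReserve(long amountMinor) {
        if (amountMinor <= 0) {
            throw new IllegalArgumentException("refund amount must be positive");
        }
        while (true) {
            long current = remainingMinor.get();
            if (amountMinor > current) {
                return false; // would over-refund
            }
            // compareAndSet retries if another thread changed the balance in between.
            if (remainingMinor.compareAndSet(current, current - amountMinor)) {
                return true;
            }
        }
    }

    long remainingMinor() {
        return remainingMinor.get();
    }
}
```

A concurrency test can then fire many refunds in parallel and assert that the number of successful reservations never exceeds what the captured amount allows.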

5.3 Property-Based Financial Tests

Example property:

@Property
void totalPlatformMoneyIsConserved(@ForAll("paymentScenarios") PaymentScenario scenario) {
    var result = testHarness.run(scenario);

    assertThat(result.ledger().unbalancedJournals()).isEmpty();
    assertThat(result.ledger().negativeLiabilityWithoutPolicy()).isEmpty();
    assertThat(result.payment().doubleCharges()).isEmpty();
    assertThat(result.payment().overRefunds()).isEmpty();
}

Property tests suit payment systems better than example-only tests, because payment bugs often emerge from combinations of events:

confirm timeout
+ webhook success delayed
+ client retry
+ duplicate webhook
+ partial refund
+ settlement file late
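
To show the idea without a property-testing library, the sketch below shuffles a small event sequence with `java.util.Random` and checks one invariant for each ordering. It is a plain-Java approximation of a property test, not the harness used above; `ProviderEvent`, `PaymentUnderTest`, `EventOrderProperty`, and the event type names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

record ProviderEvent(String eventId, String type) {}

// Hypothetical idempotent handler: dedupes by event id and guards the state transition.
final class PaymentUnderTest {
    private final Set<String> seenEventIds = new HashSet<>();
    private String state = "UNKNOWN";
    private int ledgerPostings = 0;

    void apply(ProviderEvent event) {
        if (!seenEventIds.add(event.eventId())) {
            return; // duplicate delivery of the same event
        }
        if (event.type().equals("PAYMENT_SUCCEEDED") && !state.equals("CAPTURED")) {
            state = "CAPTURED";
            ledgerPostings++; // post to the ledger exactly once
        }
        // CONFIRM_TIMEOUT and CLIENT_RETRY carry no proof of outcome, so state is unchanged.
    }

    String state() { return state; }
    int ledgerPostings() { return ledgerPostings; }
}

final class EventOrderProperty {
    /** Invariant: whatever the delivery order, success is posted to the ledger once. */
    static boolean holdsFor(long seed) {
        List<ProviderEvent> events = new ArrayList<>(List.of(
                new ProviderEvent("evt-1", "CONFIRM_TIMEOUT"),
                new ProviderEvent("evt-2", "CLIENT_RETRY"),
                new ProviderEvent("evt-3", "PAYMENT_SUCCEEDED"),
                new ProviderEvent("evt-3", "PAYMENT_SUCCEEDED"),    // exact duplicate webhook
                new ProviderEvent("evt-4", "PAYMENT_SUCCEEDED"))); // re-sent under a new id
        Collections.shuffle(events, new Random(seed));

        PaymentUnderTest payment = new PaymentUnderTest();
        events.forEach(payment::apply);
        return payment.ledgerPostings() == 1 && payment.state().equals("CAPTURED");
    }
}
```

Running `holdsFor` over many seeds covers orderings that a handful of hand-written examples would likely miss.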

6. Provider Certification Review

Provider certification is usually more than "the API key works".

What must be proven:

  1. The API request format is correct.
  2. The idempotency key is accepted and used according to the provider contract.
  3. Timeouts and unknown outcomes are handled.
  4. Webhook signatures are verified.
  5. Duplicate webhooks are safe.
  6. Out-of-order webhooks are safe.
  7. Refund, cancel, and reversal semantics are correct.
  8. The settlement report can be parsed.
  9. Reconciliation can match transactions.
  10. Provider error codes are normalized.
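
For the webhook signature item, a common scheme is an HMAC-SHA256 over the raw request body with a shared secret. The sketch below assumes that scheme and a hex-encoded signature; real providers differ in header names, encoding, and what exactly is signed, so treat `WebhookSignature` as a hypothetical illustration rather than any provider's contract.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class WebhookSignature {
    static byte[] hmacSha256(byte[] secret, byte[] payload) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(payload);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable or key invalid", e);
        }
    }

    /** Verifies a hex signature against the raw body; malformed input counts as invalid. */
    static boolean isValid(byte[] secret, byte[] rawBody, String signatureHex) {
        byte[] provided;
        try {
            provided = fromHex(signatureHex);
        } catch (IllegalArgumentException e) {
            return false;
        }
        // MessageDigest.isEqual compares in constant time, avoiding timing leaks.
        return MessageDigest.isEqual(hmacSha256(secret, rawBody), provided);
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    static byte[] fromHex(String hex) {
        if (hex == null || hex.length() % 2 != 0) {
            throw new IllegalArgumentException("malformed hex");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("malformed hex");
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }
}
```

Verification should run on the raw bytes as received, before any JSON parsing or re-serialization, because a re-encoded body will generally not reproduce the same signature.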

6.1 Certification Matrix

| Scenario | Expected platform behavior |
| --- | --- |
| authorization approved | payment becomes authorized/captured according to flow |
| authorization declined hard | no retry, customer gets actionable decline |
| authorization timeout but later success | state remains unknown until evidence arrives |
| duplicate confirm request | same logical result returned |
| duplicate webhook | no duplicate ledger posting |
| webhook invalid signature | stored as rejected evidence, not applied |
| capture after auth expired | rejected before provider call if known |
| refund greater than captured | rejected by platform invariant |
| provider sends unknown code | normalized to PROVIDER_UNMAPPED, triggers review |
| settlement report amount mismatch | reconciliation break created |
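
The "provider sends unknown code" row can be sketched as a lookup with a fail-safe default. The raw codes in the usage note are illustrative placeholders, not any real provider's code table; `NormalizedCode` and `ProviderCodeNormalizer` are hypothetical names, and only `PROVIDER_UNMAPPED` comes from the matrix above.

```java
import java.util.Map;

enum NormalizedCode { APPROVED, DECLINED_HARD, DECLINED_SOFT, PROVIDER_UNMAPPED }

final class ProviderCodeNormalizer {
    private final Map<String, NormalizedCode> mapping;

    ProviderCodeNormalizer(Map<String, NormalizedCode> mapping) {
        this.mapping = Map.copyOf(mapping); // immutable snapshot of the mapping table
    }

    /** Unknown or missing raw codes never fall through as success or failure. */
    NormalizedCode normalize(String rawCode) {
        if (rawCode == null) {
            return NormalizedCode.PROVIDER_UNMAPPED;
        }
        return mapping.getOrDefault(rawCode, NormalizedCode.PROVIDER_UNMAPPED);
    }

    /** Unmapped codes should open a review item instead of being silently ignored. */
    boolean needsReview(String rawCode) {
        return normalize(rawCode) == NormalizedCode.PROVIDER_UNMAPPED;
    }
}
```

For example, a normalizer built with `Map.of("00", NormalizedCode.APPROVED, "05", NormalizedCode.DECLINED_HARD)` returns `PROVIDER_UNMAPPED` for a code such as `"Z9"` that the mapping does not contain.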

6.2 Provider Operation Evidence

Every external operation must have a record.

CREATE TABLE provider_operation_log (
    operation_id UUID PRIMARY KEY,
    provider_code TEXT NOT NULL,
    operation_type TEXT NOT NULL,
    aggregate_type TEXT NOT NULL,
    aggregate_id UUID NOT NULL,
    idempotency_key TEXT NOT NULL,
    request_fingerprint TEXT NOT NULL,
    provider_reference TEXT,
    normalized_status TEXT NOT NULL,
    raw_status TEXT,
    raw_error_code TEXT,
    request_payload_ref TEXT NOT NULL,
    response_payload_ref TEXT,
    started_at TIMESTAMPTZ NOT NULL,
    completed_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE(provider_code, operation_type, idempotency_key)
);

Do not log only to stdout. The operation log is evidence.
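
The `request_fingerprint` column is what lets the platform tell a harmless replay from a conflicting reuse of the same idempotency key. The sketch below is one possible approach, assuming the fingerprint is a SHA-256 over a canonical string of the request's business fields; the choice of fields and the `IdempotencyGuard` class are assumptions, and the in-memory map stands in for the unique constraint on the table.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class IdempotencyGuard {
    enum Outcome { FIRST_SEEN, REPLAY, CONFLICT }

    // In-memory stand-in for UNIQUE(provider_code, operation_type, idempotency_key).
    private final Map<String, String> fingerprintByKey = new ConcurrentHashMap<>();

    /** Hashes the business-relevant fields so payloads can be compared without storing them. */
    static String fingerprint(String operationType, String aggregateId, long amountMinor, String currency) {
        String canonical = String.join("|",
                operationType, aggregateId, Long.toString(amountMinor), currency);
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(canonical.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    /** Same key + same fingerprint is a replay; same key + different fingerprint is a conflict. */
    Outcome register(String idempotencyKey, String fingerprint) {
        String existing = fingerprintByKey.putIfAbsent(idempotencyKey, fingerprint);
        if (existing == null) {
            return Outcome.FIRST_SEEN;
        }
        return existing.equals(fingerprint) ? Outcome.REPLAY : Outcome.CONFLICT;
    }
}
```

A `CONFLICT` outcome is worth recording as evidence in its own right, since it usually points to a client bug or a key collision.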


7. Security and PCI Scope Review

PCI readiness must answer:

  1. Does the platform store, process, or transmit cardholder data?
  2. Which services fall inside the Cardholder Data Environment (CDE)?
  3. Which data may enter logs, traces, events, databases, the data lake, and support tools?
  4. Has PAN, CVC, or other sensitive authentication data (SAD) ever left the permitted boundary?
  5. Is the token vault segmented?
  6. Does access to the CDE use least privilege and MFA?
  7. Can audit evidence be retrieved without exposing sensitive data?

7.1 CDE Data Flow Diagram

Ideally, the platform never sees PAN or CVC. If it must, the scope changes drastically.

7.2 Security Evidence

Minimum evidence pack:

| Area | Evidence |
| --- | --- |
| data classification | list of sensitive fields per event/table/API |
| log redaction | automated test proving PAN/CVC never leaks |
| secret management | secret inventory + rotation proof |
| key management | key owner, purpose, rotation, access policy |
| vulnerability management | SAST/DAST/dependency scan result |
| access control | role matrix + privileged access review |
| network segmentation | diagram + firewall/security group rule |
| incident response | tabletop result + runbook |
| secure SDLC | change approval + code review evidence |
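
The log redaction row can be backed by a small redactor plus a test that feeds it card-like input. The sketch below masks unbroken runs of 13 to 19 digits that pass the Luhn check; it is a simplified illustration (`PanRedactor` is a hypothetical name) and does not handle PANs written with spaces or dashes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PanRedactor {
    // Candidate PANs: unbroken runs of 13 to 19 digits.
    private static final Pattern CANDIDATE = Pattern.compile("\\b\\d{13,19}\\b");

    /** Luhn checksum: double every second digit from the right and sum the digits. */
    static boolean passesLuhn(String digits) {
        int sum = 0;
        boolean doubleIt = false;
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9;
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    /** Masks Luhn-valid candidates; other long numbers (order ids, etc.) are left intact. */
    static String redact(String line) {
        Matcher m = CANDIDATE.matcher(line);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String digits = m.group();
            String replacement = passesLuhn(digits) ? "[REDACTED-PAN]" : digits;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The Luhn filter reduces false positives on ordinary long identifiers, at the cost of missing malformed card numbers; whether that trade-off is acceptable is a scope decision for the security review.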

8. Reconciliation and Settlement Readiness

A payment capability can go live only if there is a way to prove the money is correct.

8.1 Reconciliation Readiness Checklist

| Capability | Required before live? | Reason |
| --- | --- | --- |
| provider report ingestion | yes | compare provider truth |
| bank statement ingestion | yes for settlement/payout | confirm cash movement |
| matching engine | yes | detect break |
| manual break workflow | yes | unmatched records will happen |
| correction posting | yes | repair must be controlled |
| dashboard | yes | ops must see unreconciled exposure |
| SLA alert | yes | breaks cannot rot silently |
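
The core of a matching engine can be illustrated with a reference-based comparison between internal records and the provider report. This is a deliberately small sketch: it assumes one record per reference and matches on amount only; `InternalRecord`, `ProviderRecord`, `ReconBreak`, `ReconMatcher`, and the reason codes are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

record InternalRecord(String reference, long amountMinor) {}
record ProviderRecord(String reference, long amountMinor) {}
record ReconBreak(String reference, String reason) {}

final class ReconMatcher {
    /** Compares both sides by reference and returns every discrepancy as a break. */
    static List<ReconBreak> findBreaks(List<InternalRecord> internal, List<ProviderRecord> provider) {
        Map<String, Long> providerByRef = new HashMap<>();
        for (ProviderRecord p : provider) {
            providerByRef.put(p.reference(), p.amountMinor());
        }

        List<ReconBreak> breaks = new ArrayList<>();
        for (InternalRecord i : internal) {
            // remove() so that whatever is left afterwards exists only at the provider.
            Long providerAmount = providerByRef.remove(i.reference());
            if (providerAmount == null) {
                breaks.add(new ReconBreak(i.reference(), "MISSING_AT_PROVIDER"));
            } else if (providerAmount != i.amountMinor()) {
                breaks.add(new ReconBreak(i.reference(), "AMOUNT_MISMATCH"));
            }
        }
        // Sorted for deterministic output.
        for (String ref : new TreeSet<>(providerByRef.keySet())) {
            breaks.add(new ReconBreak(ref, "MISSING_INTERNALLY"));
        }
        return breaks;
    }
}
```

Each returned break would then feed the manual break workflow and the SLA alert listed in the table.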

8.2 Settlement Readiness Checklist

| Capability | Required before automatic payout? |
| --- | --- |
| settlement batch generation | yes |
| cutoff calendar | yes |
| fee/reserve/netting rule | yes |
| negative balance policy | yes |
| payout eligibility gate | yes |
| payout approval / risk hold | yes |
| merchant statement | yes |
| failed payout handling | yes |
| reconciliation gate | yes |

Payout without reconciliation is a fast way to create financial loss that is hard to find.


9. Operational Readiness Review

Operational readiness questions:

  1. If a webhook is lost, who finds out?
  2. If the provider times out but the money was actually debited, what does the system do?
  3. If a merchant complains that a settlement is short, what does support look at?
  4. If an operator makes a wrong manual adjustment, how is it reversed?
  5. If 5,000 reconciliation breaks appear, how are they triaged?
  6. If a payout goes to the wrong beneficiary, what is the emergency procedure?

9.1 Runbook Minimum

Mandatory runbooks:

RUNBOOK-001 unknown payment state
RUNBOOK-002 duplicate charge complaint
RUNBOOK-003 missing webhook
RUNBOOK-004 provider outage
RUNBOOK-005 settlement mismatch
RUNBOOK-006 payout failed
RUNBOOK-007 payout sent wrong beneficiary
RUNBOOK-008 reconciliation break spike
RUNBOOK-009 manual adjustment correction
RUNBOOK-010 card data leakage suspicion
RUNBOOK-011 secret/key compromise
RUNBOOK-012 database migration rollback/compensation

9.2 Drill

A runbook that has never been rehearsed is fictional documentation.

Minimum drills before going live:

| Drill | Expected proof |
| --- | --- |
| provider timeout + later webhook success | no double charge, correct state recovery |
| webhook invalid signature flood | rejected safely, alert works |
| duplicate settlement file | dedupe works |
| reconciliation mismatch | break created, assigned, resolved |
| emergency provider disable | routing stops using provider |
| backoffice manual adjustment | maker-checker + ledger balanced |
| secret rotation | no downtime or controlled downtime |
| payment incident tabletop | role clarity + timeline |

10. SRE Readiness

SRE readiness is more than CPU and memory dashboards.

Payment-specific SLOs:

| SLO | Example |
| --- | --- |
| authorization availability | 99.9% successful platform handling excluding issuer decline |
| webhook processing latency | p95 provider event applied within 60s |
| unknown state resolution | 99% resolved within 15 minutes |
| reconciliation freshness | provider report reconciled within T+1 cutoff |
| payout execution | eligible payout processed before cutoff |
| ledger integrity | zero unbalanced posted journal |
| duplicate ledger posting | zero |
| critical break age | no critical break older than SLA |

Critical alert examples:

ledger_unbalanced_journal_count > 0
payment_duplicate_charge_detected > 0
webhook_signature_invalid_rate spikes
unknown_payment_state_age_p95 > threshold
settlement_batch_generation_failed
payout_instruction_stuck_after_cutoff
reconciliation_critical_break_count > threshold
provider_success_rate_drop_by_route

11. Compliance Readiness

Compliance readiness depends on jurisdiction and business model, but a payment platform generally needs evidence for:

  1. PCI DSS scope, if cards are involved.
  2. AML/CFT and sanctions screening, if it performs regulated money movement.
  3. KYC/KYB and beneficial ownership, if it onboards merchants.
  4. Data protection and privacy.
  5. Audit log retention.
  6. Incident response.
  7. Outsourcing and third-party provider risk.
  8. Business continuity and disaster recovery.
  9. Operational risk controls.
  10. Consumer and merchant dispute handling.

Treat compliance as control design, not as a document separate from the system.

Example mapping:

| Compliance need | System control |
| --- | --- |
| sanctions screening | compliance decision before payout |
| beneficial ownership | KYB entity model + document evidence |
| auditability | immutable audit event + evidence ref |
| least privilege | role-based backoffice action policy |
| incident response | incident timeline + runbook drill |
| data minimization | field-level data classification |
| PCI logging | redaction test + access log |
| operational resilience | DR drill + RTO/RPO evidence |

12. Readiness Scoring Model

Use scoring for communication, but do not let the score replace judgement.

public record ReadinessScore(
    String domain,
    int score,
    Severity highestOpenRisk,
    List<String> blockers,
    List<String> acceptedRisks
) {}

Example domain scoring:

| Domain | Score | Launch meaning |
| --- | --- | --- |
| Financial correctness | 100 | must be near perfect |
| Security boundary | 95 | no critical open finding |
| Provider certification | 90 | all required scenario pass |
| Reconciliation | 90 | breaks manageable |
| Settlement | 85 | okay for limited pilot if payout manual |
| Operational runbook | 80 | acceptable with limited traffic |
| Observability | 85 | enough to detect critical failure |
| Performance | 75 | okay for pilot, not mass launch |

Hard blockers must never be outvoted by the average score.

If ledger integrity blocker exists:
  launch = NO

If CDE leakage risk exists:
  launch = NO

If unknown outcome has no recovery path:
  launch = NO

If payout has no reconciliation gate:
  automatic payout = NO
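
The rule that blockers override the average can be encoded directly, so a high score cannot mask an open blocker. This is a minimal sketch: `DomainScore` is a simplified, hypothetical variant of `ReadinessScore`, and the minimum-average threshold is an assumption added for illustration.

```java
import java.util.List;

// Simplified stand-in for ReadinessScore: only the fields this rule needs.
record DomainScore(String domain, int score, List<String> blockers) {}

final class LaunchGate {
    /** Any open blocker means no launch, regardless of how high the scores are. */
    static boolean canLaunch(List<DomainScore> scores, int minimumAverage) {
        if (scores.isEmpty()) {
            return false; // no evidence is not the same as ready
        }
        boolean anyBlocker = scores.stream().anyMatch(s -> !s.blockers().isEmpty());
        if (anyBlocker) {
            return false;
        }
        double average = scores.stream().mapToInt(DomainScore::score).average().orElse(0);
        return average >= minimumAverage;
    }
}
```

The blocker check runs before the average is even computed, which mirrors the rules above: the score is for communication, the blockers decide.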

13. Go / No-Go Meeting

An effective agenda:

  1. Scope of the capabilities being launched.
  2. Initial traffic and risk limits.
  3. Review hard blockers.
  4. Review accepted risks.
  5. Review evidence pack.
  6. Review rollback/freeze plan.
  7. Review support/ops coverage.
  8. Review provider contacts/escalation.
  9. Review monitoring dashboard.
  10. Explicit decision.

The decision must be written down:

# Launch Decision

Capability: card.payment.authorization_capture
Launch stage: LIMITED_PRODUCTION
Merchant scope: 5 pilot merchants
Amount limit: IDR 1,000,000 per transaction
Daily volume limit: 500 transactions
Automatic payout: disabled
Manual settlement review: enabled
Decision: APPROVED_WITH_CONDITIONS
Approver: Head of Engineering, Risk Lead, Finance Ops Lead, Security Lead
Conditions:
- Reconciliation must be reviewed daily at 09:00 Asia/Jakarta.
- Unknown payments older than 30 minutes must page L2.
- Any duplicate charge complaint freezes ramp.
Accepted risks:
- Provider settlement report arrives T+1 only.
- Payout remains manual during pilot.

14. Limited Production Ramp

Do not go straight to full traffic.

Ramp gate example:

| Gate | Condition |
| --- | --- |
| duplicate charge | zero |
| unbalanced ledger | zero |
| unknown older than SLA | zero critical |
| reconciliation break | below threshold and explained |
| provider error rate | within baseline |
| support complaint | below threshold |
| settlement validation | passed |
15. Certification Is Not The End

Provider certification, PCI assessment, internal readiness, and security review only prove the state of the system at a point in time.

Production readiness has to be maintained continuously:

every deploy
+ every provider change
+ every new payment method
+ every new country/currency
+ every new backoffice action
+ every new risk policy
+ every database migration
+ every settlement rule change
= readiness impact assessment

Never assume a payment system is safe just because it has been certified.

Certification is a snapshot. A payment platform is a living organism.


16. Final Readiness Checklist

Before the Part 064 capstone, the system must have at least:

  • Payment lifecycle state machine.
  • Idempotency store.
  • Provider operation log.
  • Webhook inbox.
  • Transactional outbox.
  • Double-entry ledger.
  • Ledger integrity job.
  • Reconciliation ingestion.
  • Matching engine.
  • Settlement batch.
  • Payout control.
  • Backoffice controlled action.
  • Audit trail.
  • Risk decision store.
  • Compliance decision where needed.
  • PCI scope diagram where relevant.
  • Secret/key inventory.
  • Observability dashboard.
  • Incident runbooks.
  • Provider simulator.
  • Load test result.
  • Property-based invariant tests.
  • Go/no-go evidence pack.

If any of the following is true, do not go to full production:

No ledger integrity proof
No idempotency proof
No unknown-state recovery
No webhook replay/dedupe
No reconciliation workflow
No settlement explainability
No payout control
No audit trail
No incident runbook
No security boundary
