Exception Queue, Repair Workbench, and Case-Oriented Operations
Learn Java Core Banking System - Part 025
Exception queue, repair workbench, manual intervention, SLA, escalation, evidence, ownership, and case-oriented operations for defensible core banking systems.
Part 025 — Exception Queue, Repair Workbench, and Case-Oriented Operations
A mature core banking system does not pretend that every process will always be straight-through. It treats exceptions as a first-class domain: classified, owned, bound to an SLA, repairable, backed by evidence, and unable to damage ledger truth.
In an ordinary application, errors are often seen as something to eliminate. In core banking, some exceptions are indeed defects, but others are the normal result of operational reality:
- a payment rail sends an acknowledgement late;
- an incoming file contains data that cannot be mapped;
- an account is blocked but an incoming payment arrives;
- a posting request passes channel validation but fails a business rule in the core;
- a GL handoff completes but the downstream acknowledgement is lost;
- an EOD job stops in the middle of a batch;
- a settlement amount does not match the external statement;
- a checker rejects a manual transaction because the reason is insufficient;
- a fraud/sanctions engine returns a REVIEW decision;
- migration results differ from the expected opening balance.
A top-tier system does not only ask:
How do we avoid errors?
It asks:
When errors happen, how do we classify them, preserve truth, assign ownership, repair safely, prove what happened, and prevent recurrence?
1. Where This Part Fits in the Series
The previous part covered maker-checker and the audit trail. Now we combine both into an operational workbench.
The exception queue is the bridge between the machine and human operations. The repair workbench is where operators fix problems without direct database access or ad-hoc scripts.
2. Mental Model: An Exception Is Not an Error Log
Do not equate the exception queue with application_error_log.
| Aspect | Error Log | Exception Queue |
|---|---|---|
| Purpose | Technical debugging | Operational resolution |
| Primary users | Engineers/SRE | Operations, back office, risk, support |
| Unit of work | Log line, stack trace | Case/item with a lifecycle |
| Ownership | Not always present | Mandatory owner/queue/team |
| SLA | Rarely explicit | Must be explicit |
| Audit evidence | Usually weak | Must be strong |
| Repair action | Manual/script | Controlled action |
| Business impact | Implicit | Explicit |
| Closure | Informal | Reasoned closure |
An error log answers:
What did the code fail to do?
An exception case answers:
What business item is blocked, what is its financial/regulatory impact, who owns it, what decision was made, and how was it resolved?
3. Exception Types in Core Banking
Exceptions must be classified by source and impact, not only by stack trace.
3.1 Validation Exception
Occurs before any financial effect is created.
Examples:
- account not found;
- currency mismatch;
- product not eligible;
- amount exceeds the limit;
- the business date is already past cutoff;
- duplicate business key with a different payload.
Handling:
- reject deterministically;
- do not create a journal;
- store rejection evidence;
- show a reason that the channel and ops can understand;
- do not auto-retry unless the root cause is transient.
3.2 Processing Exception
Occurs while the system is executing a decision that has already been accepted.
Examples:
- a database timeout after the journal insert but before the response is sent;
- the balance snapshot fails to update;
- an outbox event fails to publish;
- an EOD batch stops after some accounts have completed;
- an external acknowledgement is not received.
Handling:
- determine the outcome: success, failed, or unknown;
- use idempotency for safe retry;
- use reconciliation for uncertain outcomes;
- never create a second posting through a panic retry.
3.3 Integration Exception
Occurs at the boundary with another system.
Examples:
- an ISO 20022 message fails to parse;
- a payment rail returns an unrecognized status;
- the downstream GL rejects a batch;
- the sanctions engine times out;
- a partner API sends an empty mandatory field;
- the card processor sends a reversal after the hold has expired.
Handling:
- isolate the canonical model from external messages;
- store raw evidence securely;
- route mapping errors to the repair queue;
- use an explicit acknowledgement protocol;
- never infer settlement from an ambiguous message alone.
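This is a minimal sketch of the first and last handling rules, assuming a rail that reports short status codes. CanonicalPaymentStatus and RailStatusMapper are hypothetical names. The codes ACSC, RJCT and PDNG are borrowed from the ISO 20022 payment status vocabulary purely for illustration; a real mapping table comes from the specific rail's message specification. The key design choice is that an unmapped code yields an empty result rather than a guessed status.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Canonical internal status: the core never consumes raw rail codes directly.
enum CanonicalPaymentStatus { ACCEPTED, REJECTED, PENDING }

final class RailStatusMapper {
    // Illustrative codes only; the real table must come from the rail's specification.
    private static final Map<String, CanonicalPaymentStatus> KNOWN = Map.of(
            "ACSC", CanonicalPaymentStatus.ACCEPTED,
            "RJCT", CanonicalPaymentStatus.REJECTED,
            "PDNG", CanonicalPaymentStatus.PENDING);

    /**
     * Returns empty for anything unmapped. An empty result should open an INTEGRATION
     * exception in the repair queue; it must never be read as "settled" or "failed".
     */
    static Optional<CanonicalPaymentStatus> map(String rawStatus) {
        if (rawStatus == null || rawStatus.isBlank()) {
            return Optional.empty();
        }
        return Optional.ofNullable(KNOWN.get(rawStatus.trim().toUpperCase(Locale.ROOT)));
    }
}
```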
3.4 Reconciliation Exception
Occurs when two sources of truth disagree.
Examples:
- the internal ledger differs from the settlement statement;
- a GL control account does not match the subledger;
- an external bank statement contains a transaction that cannot be found internally;
- an expected fee posting does not appear;
- the EOD control total does not balance.
Handling:
- create a recon break;
- assign an owner;
- track aging;
- determine the financial impact;
- investigate;
- repair through an official posting, reversal, or adjustment.
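The "track aging" and "determine the financial impact" steps can be sketched as a small record. ReconBreak is a hypothetical type. Aging here counts calendar days between business dates; a real system may count business days against a holiday calendar instead.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** One unmatched item between the internal ledger and an external source. */
record ReconBreak(String id, String currency, BigDecimal internalAmount,
                  BigDecimal externalAmount, LocalDate openedOn) {

    /** Signed impact: positive means the internal ledger shows more than the external source. */
    BigDecimal difference() {
        return internalAmount.subtract(externalAmount);
    }

    /** Age in calendar days between the opening business date and the current one. */
    long ageDays(LocalDate businessDate) {
        return ChronoUnit.DAYS.between(openedOn, businessDate);
    }

    /** Coarse bucket for dashboards and escalation rules. */
    String agingBucket(LocalDate businessDate) {
        long age = ageDays(businessDate);
        if (age <= 0) return "T+0";
        if (age == 1) return "T+1";
        return "T+2_OR_OLDER";
    }
}
```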
3.5 Control Exception
Occurs when a control rule is not satisfied.
Examples:
- maker and checker are the same person;
- the approval limit is insufficient;
- the override reason is empty;
- a privileged user attempts an action without a ticket;
- emergency access has not been closed;
- the EOD precheck fails.
Handling:
- block the action;
- require additional approval;
- escalate;
- store security/control evidence;
- never bypass the control through a database patch.
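A control check can collect every violated rule instead of stopping at the first one, so the stored evidence shows the full picture. This sketch uses hypothetical names (ApprovalRequest, ControlGate) and covers three of the examples above. It is not a complete control framework.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical approval request carrying the fields the checks below need. */
record ApprovalRequest(String makerId, String checkerId, String overrideReason,
                       BigDecimal amount, BigDecimal checkerApprovalLimit) {}

final class ControlGate {
    /** Returns every violated control; a non-empty result blocks the action and opens a CONTROL case. */
    static List<String> violations(ApprovalRequest r) {
        List<String> v = new ArrayList<>();
        if (r.makerId().equals(r.checkerId())) {
            v.add("MAKER_EQUALS_CHECKER");
        }
        if (r.overrideReason() == null || r.overrideReason().isBlank()) {
            v.add("OVERRIDE_REASON_MISSING");
        }
        if (r.amount().compareTo(r.checkerApprovalLimit()) > 0) {
            v.add("APPROVAL_LIMIT_INSUFFICIENT");
        }
        return List.copyOf(v);
    }
}
```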
4. Exception Case as an Aggregate
In the domain model, an exception case must have its own identity and lifecycle.
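```java
import java.time.Instant;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Shown together for reading; in a real codebase each public type lives in its own file.
// Actor and ExceptionCaseEvent (with its assigned/readyForRepair factories) are domain
// types assumed to exist elsewhere.
public record ExceptionCaseId(String value) {}

public enum ExceptionType {
    VALIDATION,
    PROCESSING,
    INTEGRATION,
    RECONCILIATION,
    CONTROL,
    SECURITY,
    MIGRATION,
    UNKNOWN_OUTCOME
}

public enum ExceptionSeverity {
    LOW,
    MEDIUM,
    HIGH,
    CRITICAL
}

public enum ExceptionStatus {
    OPEN,
    TRIAGED,
    ASSIGNED,
    INVESTIGATING,
    WAITING_FOR_EXTERNAL,
    WAITING_FOR_APPROVAL,
    READY_FOR_REPAIR,
    REPAIR_IN_PROGRESS,
    REPAIRED,
    CLOSED_NO_ACTION,
    CLOSED_DUPLICATE,
    ESCALATED,
    CANCELLED
}

public final class ExceptionCase {
    private final ExceptionCaseId id;
    private final ExceptionType type;
    private ExceptionSeverity severity;
    private ExceptionStatus status;
    private final String sourceSystem;
    private final String businessKey;
    private final String correlationId;
    private final String causationId;
    private final LocalDate businessDate;
    private final Instant detectedAt;
    private String ownerTeam;
    private String ownerUserId;
    private String currentReason;
    private int attemptCount;
    private Instant slaDueAt;
    // Append-only audit timeline: every state change adds one event.
    private final List<ExceptionCaseEvent> events = new ArrayList<>();

    public ExceptionCase(ExceptionCaseId id, ExceptionType type, ExceptionSeverity severity,
                         String sourceSystem, String businessKey, String correlationId,
                         String causationId, LocalDate businessDate, Instant detectedAt) {
        this.id = Objects.requireNonNull(id);
        this.type = Objects.requireNonNull(type);
        this.severity = Objects.requireNonNull(severity);
        this.sourceSystem = Objects.requireNonNull(sourceSystem);
        this.businessKey = businessKey;
        this.correlationId = correlationId;
        this.causationId = causationId;
        this.businessDate = Objects.requireNonNull(businessDate);
        this.detectedAt = Objects.requireNonNull(detectedAt);
        this.status = ExceptionStatus.OPEN; // every case starts OPEN
    }

    // Assign or reassign ownership; valid only after triage, on reassignment, or after escalation.
    public void assignTo(String team, String userId, Actor actor, Instant now) {
        requireStatus(ExceptionStatus.TRIAGED, ExceptionStatus.ASSIGNED, ExceptionStatus.ESCALATED);
        this.ownerTeam = Objects.requireNonNull(team);
        this.ownerUserId = userId; // may be null: team-level ownership only
        this.status = ExceptionStatus.ASSIGNED;
        events.add(ExceptionCaseEvent.assigned(id, team, userId, actor, now));
    }

    // A case becomes repairable only after investigation, and never without a reason.
    public void markReadyForRepair(String reason, Actor actor, Instant now) {
        requireStatus(ExceptionStatus.INVESTIGATING, ExceptionStatus.WAITING_FOR_EXTERNAL);
        if (reason == null || reason.isBlank()) {
            throw new IllegalArgumentException("Repair reason is required");
        }
        this.currentReason = reason;
        this.status = ExceptionStatus.READY_FOR_REPAIR;
        events.add(ExceptionCaseEvent.readyForRepair(id, reason, actor, now));
    }

    // Guard: reject any transition whose current status is not explicitly allowed.
    private void requireStatus(ExceptionStatus... allowed) {
        if (Arrays.stream(allowed).noneMatch(s -> s == status)) {
            throw new IllegalStateException("Invalid transition from " + status);
        }
    }
}
```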
This aggregate does not replace a workflow engine. It is a domain object that protects the case's invariants. A workflow engine, BPMN, a queue, or a task manager may orchestrate the work, but the domain rules stay explicit.
5. Exception Case State Machine
Without a state machine, exceptions degrade into a list of tickets that nobody can control.
Every state transition must produce an audit event. Never update a status silently.
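The original state diagram is not reproduced here, so what follows is one possible transition table, offered as an assumption. It repeats the ExceptionStatus constants from section 4 so that it compiles on its own. It keeps the transitions that section's assignTo and markReadyForRepair already allow. Every accepted transition is paired with an audit record; StatusTransition is a hypothetical type.

```java
import java.util.EnumSet;
import java.util.Set;

enum ExceptionStatus {
    OPEN, TRIAGED, ASSIGNED, INVESTIGATING, WAITING_FOR_EXTERNAL, WAITING_FOR_APPROVAL,
    READY_FOR_REPAIR, REPAIR_IN_PROGRESS, REPAIRED, CLOSED_NO_ACTION, CLOSED_DUPLICATE,
    ESCALATED, CANCELLED;

    /** Allowed next statuses (an assumed table); terminal statuses return an empty set. */
    Set<ExceptionStatus> next() {
        return switch (this) {
            case OPEN -> EnumSet.of(TRIAGED, CLOSED_DUPLICATE, CANCELLED);
            case TRIAGED -> EnumSet.of(ASSIGNED, ESCALATED, CLOSED_DUPLICATE);
            case ASSIGNED -> EnumSet.of(ASSIGNED, INVESTIGATING, ESCALATED);
            case INVESTIGATING -> EnumSet.of(WAITING_FOR_EXTERNAL, READY_FOR_REPAIR, CLOSED_NO_ACTION, ESCALATED);
            case WAITING_FOR_EXTERNAL -> EnumSet.of(INVESTIGATING, READY_FOR_REPAIR);
            case READY_FOR_REPAIR -> EnumSet.of(WAITING_FOR_APPROVAL, REPAIR_IN_PROGRESS);
            case WAITING_FOR_APPROVAL -> EnumSet.of(REPAIR_IN_PROGRESS, READY_FOR_REPAIR);
            case REPAIR_IN_PROGRESS -> EnumSet.of(REPAIRED, INVESTIGATING);
            case ESCALATED -> EnumSet.of(ASSIGNED);
            case REPAIRED, CLOSED_NO_ACTION, CLOSED_DUPLICATE, CANCELLED -> EnumSet.noneOf(ExceptionStatus.class);
        };
    }

    boolean canMoveTo(ExceptionStatus target) {
        return next().contains(target);
    }

    boolean isTerminal() {
        return next().isEmpty();
    }
}

/** Audit record produced for every accepted transition; invalid ones never get a record. */
record StatusTransition(String caseId, ExceptionStatus from, ExceptionStatus to,
                        String actorId, String reason) {
    static StatusTransition of(String caseId, ExceptionStatus from, ExceptionStatus to,
                               String actorId, String reason) {
        if (!from.canMoveTo(to)) {
            throw new IllegalStateException("Invalid transition " + from + " -> " + to);
        }
        return new StatusTransition(caseId, from, to, actorId, reason);
    }
}
```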
6. Case Identity and Deduplication
The same exception can surface many times through retries, batch reruns, or event replay. Without deduplication, operations will see hundreds of cases for the same problem.
Use layered identity:
| Identity | Purpose |
|---|---|
| exception_case_id | primary identity of the case |
| business_key | domain identity: payment id, transaction id, batch id |
| source_system | origin of the exception |
| exception_fingerprint | deduplication by root cause |
| correlation_id | end-to-end trace |
| causation_id | the event/action that directly caused it |
| occurrence_id | one individual occurrence |
Example fingerprint:
```text
source=PAYMENT_GATEWAY
businessKey=PAY-20260628-001
exceptionType=INTEGRATION
errorCode=UNKNOWN_RAIL_STATUS
normalizedDetailHash=bd32...91
```
Rules:
- same fingerprint + case still open = add an occurrence;
- same fingerprint + case closed = reopen it or create a linked case;
- different fingerprint = create a new case;
- same business key but a different error = do not merge automatically;
- a duplicate must be provable, not just an operator's hunch.
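The fingerprint and the first three rules can be sketched in memory. The sketch assumes occurrences are already normalized, meaning timestamps and volatile identifiers have been stripped from the detail. It uses SHA-256 over a pipe-joined string. Occurrence and CaseDeduplicator are hypothetical names. In production the open-case map would be the exception_case table with its unique index on open fingerprints.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.Map;

/** One raw occurrence, already normalized (timestamps and volatile ids removed). */
record Occurrence(String source, String businessKey, String exceptionType,
                  String errorCode, String normalizedDetail) {}

final class CaseDeduplicator {
    enum Outcome { NEW_CASE, ADDED_OCCURRENCE }

    // fingerprint -> occurrence count of the currently open case (in-memory stand-in)
    private final Map<String, Integer> openCases = new HashMap<>();

    static String fingerprint(Occurrence o) {
        String material = String.join("|", o.source(), o.businessKey(), o.exceptionType(),
                o.errorCode(), sha256(String.valueOf(o.normalizedDetail())));
        return sha256(material);
    }

    /** Same fingerprint + open case = add an occurrence; otherwise open a new case. */
    Outcome ingest(Occurrence o) {
        int count = openCases.merge(fingerprint(o), 1, Integer::sum);
        return count == 1 ? Outcome.NEW_CASE : Outcome.ADDED_OCCURRENCE;
    }

    /** Closing frees the fingerprint, so a later occurrence opens a new (linkable) case. */
    void close(Occurrence o) {
        openCases.remove(fingerprint(o));
    }

    int occurrenceCount(Occurrence o) {
        return openCases.getOrDefault(fingerprint(o), 0);
    }

    private static String sha256(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

The business key is part of the fingerprint but not the whole of it. That is why the same payment with a different error code opens a separate case instead of being merged.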
7. Severity, Priority, and SLA
Many teams conflate severity and priority. In banking, the difference matters.
| Dimension | Meaning | Example |
|---|---|---|
| Severity | objective impact | customer money at risk, GL imbalance |
| Priority | order of work | high-value customer, regulator deadline |
| SLA | target resolution time | 2 hours, same day, T+1 |
| Escalation | path upward if unresolved | ops lead, risk, incident commander |
Example rule:
```java
import java.time.Duration;

// Methods of a classification service; ExceptionContext is an application type
// exposing the impact flags used below.

// Severity reflects objective impact; checks run from most to least severe.
public ExceptionSeverity classifySeverity(ExceptionContext ctx) {
    if (ctx.moneyMovementUncertain()) return ExceptionSeverity.CRITICAL;
    if (ctx.affectsLedgerBalance()) return ExceptionSeverity.HIGH;
    if (ctx.affectsCustomerVisibility()) return ExceptionSeverity.MEDIUM;
    return ExceptionSeverity.LOW;
}

// Baseline SLA per severity. `type` is not used yet; a fuller policy would also vary by type.
public Duration calculateSla(ExceptionType type, ExceptionSeverity severity) {
    return switch (severity) {
        case CRITICAL -> Duration.ofHours(1);
        case HIGH -> Duration.ofHours(4);
        case MEDIUM -> Duration.ofDays(1);
        case LOW -> Duration.ofDays(3);
    };
}
```
However, an SLA is not just a number. It must take into account:
- business date;
- cutoff;
- EOD dependency;
- settlement window;
- regulatory reporting deadline;
- customer impact;
- financial amount;
- affected product;
- repeat occurrence;
- fraud/security signal.
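Here is one way to let a hard deadline override the baseline SLA. The sketch assumes that the cutoff, settlement window, or reporting deadline has already been resolved to a single Instant. SlaCalculator is a hypothetical helper. The other factors in the list (amount, repeat occurrence, fraud signal) would more plausibly adjust severity or the baseline than the deadline.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

final class SlaCalculator {
    /**
     * Baseline SLA from severity, pulled earlier when a hard deadline (cutoff, settlement
     * window, EOD start, reporting deadline) falls before it. A deadline that has already
     * passed is ignored here; such a case should be escalated rather than given an SLA.
     */
    static Instant dueAt(Instant detectedAt, Duration baseline, Optional<Instant> hardDeadline) {
        Instant baselineDue = detectedAt.plus(baseline);
        return hardDeadline
                .filter(deadline -> deadline.isAfter(detectedAt))
                .filter(deadline -> deadline.isBefore(baselineDue))
                .orElse(baselineDue);
    }
}
```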
8. Repair Workbench: Not an Admin CRUD Screen
A repair workbench is not a screen for editing records.
It is a controlled interface for creating corrective actions that are valid in domain terms.
The workbench must provide:
- a read-only evidence panel;
- an event timeline;
- related ledger entries;
- related payments/messages;
- related recon breaks;
- decision history;
- safe repair actions;
- simulation output;
- an approval path;
- a closure reason.
It must not allow:
- arbitrary SQL edits;
- free-form amount mutation;
- silent status updates;
- changing a ledger journal in place;
- bypassing maker-checker;
- repair without a reason;
- repair without an idempotency key;
- closing a high-risk case without verification.
9. Repair Intent
An operator must not be able to choose "edit database". They must choose an intent that is valid in domain terms.
| Intent | Meaning | Output |
|---|---|---|
| RETRY_PROCESSING | idempotent retry of processing | command replay |
| REPAIR_REFERENCE_DATA | fix mapping/reference data | config change + reprocess |
| REVERSE_TRANSACTION | cancel the financial effect | reversal journal |
| ADJUST_BALANCE | correct through an official adjustment | adjustment journal |
| RECLASSIFY_GL | change mapping/accounting classification | GL reclass posting |
| RELEASE_HOLD | release a hold/lien/block | hold release event |
| MARK_DUPLICATE | duplicate case | linked closure |
| WAIT_EXTERNAL | external evidence needed | pending state |
| ESCALATE_RISK | risk/compliance decision required | escalation event |
| CLOSE_NO_ACTION | valid after investigation | closure evidence |
Example command:
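```java
// Every permitted subtype must be declared in the same module (the same package if unnamed).
// RetryProcessing, ReverseTransaction, ReleaseHold and CloseNoAction are records declared
// like CreateAdjustment; Actor, AccountId, Money and AdjustmentReason are domain types.
public sealed interface RepairCommand permits
        RetryProcessing,
        ReverseTransaction,
        CreateAdjustment,
        ReleaseHold,
        CloseNoAction {
    ExceptionCaseId caseId();
    String repairReason();    // mandatory human justification
    String idempotencyKey();  // makes the repair safe to resubmit
    Actor actor();
}

public record CreateAdjustment(
        ExceptionCaseId caseId,
        AccountId accountId,
        Money amount,
        AdjustmentReason reason,   // structured business reason (enum), not free text
        String repairReason,
        String idempotencyKey,
        Actor actor
) implements RepairCommand {}
```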
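Section 8 forbids a repair without a reason or without an idempotency key. Both rules can be enforced when the command is constructed, so an invalid command never reaches the handler. This simplified AdjustmentCommand is hypothetical. It swaps the domain types for String and BigDecimal so the sketch compiles on its own, and treating the amount as a positive magnitude is an assumption.

```java
import java.math.BigDecimal;
import java.util.Objects;

/** Simplified stand-in for CreateAdjustment; plain types replace the domain types. */
record AdjustmentCommand(String caseId, String accountId, BigDecimal amount,
                         String repairReason, String idempotencyKey, String actorId) {
    AdjustmentCommand {
        Objects.requireNonNull(caseId, "caseId");
        Objects.requireNonNull(accountId, "accountId");
        Objects.requireNonNull(actorId, "actorId");
        // Assumption: amount is a positive magnitude; debit/credit direction lives in the journal.
        if (amount == null || amount.signum() <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        if (repairReason == null || repairReason.isBlank()) {
            throw new IllegalArgumentException("repairReason is required");
        }
        if (idempotencyKey == null || idempotencyKey.isBlank()) {
            throw new IllegalArgumentException("idempotencyKey is required");
        }
    }
}
```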
10. Simulation Before Repair
Before a repair is executed, the system must be able to simulate its impact.
For a financial repair, the simulation should answer at least:
- which journal will be created;
- which debit/credit accounts are affected;
- balance before/after;
- GL account mapping;
- statement impact;
- customer-visible impact;
- tax/fee/interest side effects;
- business date/posting date/value date;
- approval requirement;
- reconciliation impact.
Example simulation output:
```json
{
  "caseId": "EXC-20260628-00091",
  "repairIntent": "CREATE_ADJUSTMENT",
  "requiresApproval": true,
  "approvalReason": "Amount exceeds operator repair limit",
  "journalPreview": {
    "balanced": true,
    "lines": [
      { "account": "CUSTOMER_DEPOSIT_123", "direction": "CREDIT", "amount": "100000.00", "currency": "IDR" },
      { "account": "SUSPENSE_REPAIR", "direction": "DEBIT", "amount": "100000.00", "currency": "IDR" }
    ]
  },
  "balanceImpact": {
    "ledgerBalanceAfter": "2500000.00",
    "availableBalanceAfter": "2500000.00"
  }
}
```
A simulation is not an absolute guarantee, because state can change before execution. Execution must therefore revalidate.
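Revalidation can be made concrete by pinning the simulation to the account version it read. The following sketch uses hypothetical types (AccountState, SimulationResult, RepairSimulator) and assumes that every posting increments an account version number:

```java
import java.math.BigDecimal;

/** Snapshot of an account as read by the simulator; version increments on every posting. */
record AccountState(String accountId, long version, BigDecimal ledgerBalance) {}

/** Preview shown to maker and checker, pinned to the account version it was computed from. */
record SimulationResult(String accountId, long basedOnVersion, BigDecimal expectedBalanceAfter) {}

final class RepairSimulator {
    static SimulationResult simulateCredit(AccountState state, BigDecimal amount) {
        return new SimulationResult(state.accountId(), state.version(), state.ledgerBalance().add(amount));
    }

    /** Execution-time revalidation: refuse to post if the account moved since the preview. */
    static void revalidate(SimulationResult sim, AccountState current) {
        if (!current.accountId().equals(sim.accountId()) || current.version() != sim.basedOnVersion()) {
            throw new IllegalStateException("Account changed since simulation; re-simulate before executing");
        }
    }
}
```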
11. Controlled Retry and Unknown Outcome
The most dangerous retry is one made after the outcome is unknown.
Examples:
- a request times out after the database commit;
- the payment rail sends no response;
- a batch job crashes after some items succeeded;
- the GL handoff was received but the acknowledgement was lost.
Do not immediately retry the command on the assumption that it failed. Classify the case as UNKNOWN_OUTCOME.
An unknown outcome must be resolved through evidence, not assumption.
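The following is a deliberately conservative sketch of evidence-based resolution. It assumes two evidence sources, the internal ledger and the external party. Requiring both sources to agree before acting is a design choice made for illustration, not a universal policy. Evidence, Resolution and UnknownOutcomeResolver are hypothetical names.

```java
/** What one evidence source can prove about the original attempt. */
enum Evidence { EXECUTED, NOT_EXECUTED, UNAVAILABLE }

enum Resolution { MARK_COMPLETED, SAFE_TO_RETRY, KEEP_OPEN_AWAIT_EVIDENCE }

final class UnknownOutcomeResolver {
    /**
     * Act only when the internal ledger and the external party agree.
     * Missing or conflicting evidence keeps the case open; it is never read as "failed".
     */
    static Resolution resolve(Evidence internalLedger, Evidence externalParty) {
        if (internalLedger == Evidence.EXECUTED && externalParty == Evidence.EXECUTED) {
            return Resolution.MARK_COMPLETED;
        }
        if (internalLedger == Evidence.NOT_EXECUTED && externalParty == Evidence.NOT_EXECUTED) {
            // The retry must reuse the original idempotency key, never a fresh one.
            return Resolution.SAFE_TO_RETRY;
        }
        return Resolution.KEEP_OPEN_AWAIT_EVIDENCE;
    }
}
```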
12. Case-Oriented Operations and Human Workflow
In a bank, many processes are more than a technical retry. They involve human decisions.
Examples:
- whether a payment may be released after sanctions review;
- whether a fee waiver is justified;
- whether a settlement break may be written to suspense;
- whether a backdated adjustment may be made;
- whether EOD may continue despite warnings;
- whether a customer complaint is valid;
- whether a migration defect must block cutover.
Case-oriented operations means every work item has:
- identity;
- classification;
- owner;
- SLA;
- evidence;
- decision;
- action;
- verification;
- closure.
This matches regulatory/enforcement-style thinking: not just “task done”, but “decision defensible”.
13. Minimal Data Model
A simple relational schema example (the partial unique index uses PostgreSQL-style WHERE syntax):
```sql
CREATE TABLE exception_case (
    id VARCHAR(64) PRIMARY KEY,
    type VARCHAR(40) NOT NULL,
    severity VARCHAR(20) NOT NULL,
    status VARCHAR(40) NOT NULL,
    source_system VARCHAR(80) NOT NULL,
    business_key VARCHAR(160),
    exception_fingerprint VARCHAR(128) NOT NULL,
    correlation_id VARCHAR(128),
    causation_id VARCHAR(128),
    business_date DATE NOT NULL,
    detected_at TIMESTAMP NOT NULL,
    owner_team VARCHAR(80),
    owner_user_id VARCHAR(80),
    sla_due_at TIMESTAMP,
    current_reason VARCHAR(1000),
    occurrence_count BIGINT NOT NULL DEFAULT 1,
    version BIGINT NOT NULL,
    closed_at TIMESTAMP,
    closure_reason VARCHAR(1000)
);

CREATE UNIQUE INDEX ux_exception_open_fingerprint
    ON exception_case(exception_fingerprint)
    WHERE status NOT IN ('REPAIRED', 'CLOSED_NO_ACTION', 'CLOSED_DUPLICATE', 'CANCELLED');

CREATE TABLE exception_occurrence (
    id VARCHAR(64) PRIMARY KEY,
    case_id VARCHAR(64) NOT NULL REFERENCES exception_case(id),
    occurred_at TIMESTAMP NOT NULL,
    error_code VARCHAR(80),
    normalized_message VARCHAR(1000),
    raw_evidence_ref VARCHAR(256),
    trace_id VARCHAR(128),
    span_id VARCHAR(128)
);

CREATE TABLE exception_case_event (
    id VARCHAR(64) PRIMARY KEY,
    case_id VARCHAR(64) NOT NULL REFERENCES exception_case(id),
    event_type VARCHAR(80) NOT NULL,
    actor_id VARCHAR(80) NOT NULL,
    actor_role VARCHAR(80),
    occurred_at TIMESTAMP NOT NULL,
    reason VARCHAR(1000),
    payload_hash VARCHAR(128),
    previous_status VARCHAR(40),
    next_status VARCHAR(40)
);
```
Important notes:
- do not store raw sensitive payloads carelessly;
- raw_evidence_ref must point to an access-controlled evidence store;
- version is used for optimistic locking;
- open-fingerprint uniqueness prevents case explosion;
- the case event table is the audit timeline.
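The version column supports a compare-and-set update. The sketch below mimics that statement in memory to show what a caller should expect when two operators act on the same case. CaseRow and InMemoryCaseTable are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Minimal row shape: only the columns the version check needs. */
record CaseRow(String id, String status, long version) {}

final class InMemoryCaseTable {
    private final ConcurrentHashMap<String, CaseRow> rows = new ConcurrentHashMap<>();

    void insert(CaseRow row) {
        if (rows.putIfAbsent(row.id(), row) != null) {
            throw new IllegalStateException("Duplicate case id " + row.id());
        }
    }

    /**
     * Mirrors: UPDATE exception_case SET status = ?, version = version + 1
     *          WHERE id = ? AND version = ?
     * Returns false when another writer got there first; the caller must reload and re-evaluate.
     */
    boolean updateStatus(String id, long expectedVersion, String newStatus) {
        CaseRow current = rows.get(id);
        if (current == null || current.version() != expectedVersion) {
            return false;
        }
        // Atomic: succeeds only if the row is still exactly the one we read.
        return rows.replace(id, current, new CaseRow(id, newStatus, expectedVersion + 1));
    }

    CaseRow get(String id) {
        return rows.get(id);
    }
}
```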
14. Exception Queue Query Model
Operations needs different queries than engineers do.
Important filters include:
- by status;
- by severity;
- by SLA breach;
- by owner team;
- by business date;
- by source system;
- by product;
- by amount band;
- by customer segment;
- by dependency on EOD;
- by reconciliation break age;
- by repeat count.
Example read model:
```sql
CREATE VIEW exception_case_worklist AS
SELECT
    c.id,
    c.type,
    c.severity,
    c.status,
    c.source_system,
    c.business_key,
    c.business_date,
    c.owner_team,
    c.owner_user_id,
    c.detected_at,
    c.sla_due_at,
    CASE WHEN c.sla_due_at < CURRENT_TIMESTAMP THEN true ELSE false END AS sla_breached,
    c.occurrence_count,
    c.current_reason
FROM exception_case c
WHERE c.status NOT IN ('REPAIRED', 'CLOSED_NO_ACTION', 'CLOSED_DUPLICATE', 'CANCELLED');
```
For high case volumes, the query model may be projected into a search index. The source of truth for the case lifecycle, however, remains transactional.
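A worklist order consistent with the view's sla_breached rule can be expressed as a Java Comparator. The ordering itself is an assumed policy: breached first, then severity, then due time. WorklistItem and Severity are hypothetical types.

```java
import java.time.Instant;
import java.util.Comparator;

enum Severity { LOW, MEDIUM, HIGH, CRITICAL }

record WorklistItem(String id, Severity severity, Instant slaDueAt) {
    /** Same rule as the view: a case without an SLA is never "breached". */
    boolean slaBreached(Instant now) {
        return slaDueAt != null && slaDueAt.isBefore(now);
    }
}

final class WorklistOrdering {
    /** Breached first, then most severe, then earliest due; cases without an SLA go last. */
    static Comparator<WorklistItem> at(Instant now) {
        return Comparator
                .comparing((WorklistItem i) -> !i.slaBreached(now)) // false (breached) sorts first
                .thenComparing(WorklistItem::severity, Comparator.<Severity>reverseOrder())
                .thenComparing(WorklistItem::slaDueAt, Comparator.nullsLast(Comparator.<Instant>naturalOrder()));
    }
}
```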
15. Evidence Pack per Case
Every exception case must have an evidence pack.
Minimum evidence:
| Evidence | Example |
|---|---|
| Trigger evidence | message inbound, command request, job id |
| Business context | account, product, amount, currency, business date |
| Technical context | trace id, service version, error code |
| Decision context | rule result, validation result, approval matrix |
| Financial context | journal id, posting batch, balance impact |
| External context | rail status, bank statement, GL acknowledgement |
| Human context | assignee, notes, approval, closure reason |
| Verification | recon result, post-repair check, control total |
The evidence pack must answer:
Can another qualified person reconstruct the case without asking the original operator?
If the answer is no, the evidence is still weak.
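An evidence pack can be checked mechanically before closure. This sketch maps the table rows to a hypothetical EvidenceKind enum. Which kinds are required for which case is an assumed policy, shown only to illustrate the set-difference check.

```java
import java.util.EnumSet;
import java.util.Set;

enum EvidenceKind { TRIGGER, BUSINESS, TECHNICAL, DECISION, FINANCIAL, EXTERNAL, HUMAN, VERIFICATION }

final class EvidencePackCheck {
    /**
     * Hypothetical policy: high-severity cases need every kind, others a core subset.
     * External evidence is required only when the outcome is determined outside the bank.
     */
    static Set<EvidenceKind> required(boolean highSeverity, boolean externallyDetermined) {
        EnumSet<EvidenceKind> r = highSeverity
                ? EnumSet.allOf(EvidenceKind.class)
                : EnumSet.of(EvidenceKind.TRIGGER, EvidenceKind.BUSINESS, EvidenceKind.TECHNICAL, EvidenceKind.HUMAN);
        if (externallyDetermined) {
            r.add(EvidenceKind.EXTERNAL);
        } else {
            r.remove(EvidenceKind.EXTERNAL);
        }
        return r;
    }

    /** Required but not attached; closure should stay blocked while this is non-empty. */
    static Set<EvidenceKind> missing(Set<EvidenceKind> required, Set<EvidenceKind> attached) {
        EnumSet<EvidenceKind> m = EnumSet.noneOf(EvidenceKind.class);
        m.addAll(required);
        m.removeAll(attached);
        return m;
    }
}
```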
16. Notes vs Structured Findings
Operator notes are useful, but they are not enough.
Bad:
Already checked, all good.
Better:
Finding type: EXTERNAL_ACK_RECEIVED
Source: Clearing portal
Reference: CLR-ACK-20260628-8891
Outcome: Payment accepted at 2026-06-28T10:21:11+07:00
Impact: Safe to mark outgoing payment as accepted; no duplicate debit required.
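Use a structured finding:
```java
import java.time.Instant;

// FindingType (e.g. EXTERNAL_ACK_RECEIVED), ExceptionCaseId and Actor are domain types.
public record CaseFinding(
        String id,
        ExceptionCaseId caseId,
        FindingType type,           // machine-checkable category
        String source,              // where the evidence came from, e.g. the clearing portal
        String externalReference,   // the counterpart's reference, e.g. the clearing ACK id
        String conclusion,
        String evidenceRef,         // pointer into the access-controlled evidence store
        Actor recordedBy,
        Instant recordedAt
) {}
```
Notes may exist, but a decision must never rest on free text alone.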
17. Maker-Checker for Repair
Not every repair needs a checker. A repair with financial impact, however, almost always needs additional control.
Example rules:
| Repair | Checker? | Reason |
|---|---|---|
| idempotent retry after no-commit is verified | Not always | low risk if the evidence is strong |
| close duplicate case | Sometimes | depends on severity |
| release small expired hold | Sometimes | policy-dependent |
| manual debit/credit adjustment | Yes | financial impact |
| backdated posting | Yes | period/reporting impact |
| GL reclassification | Yes | accounting impact |
| EOD override | Yes | operational risk |
| sanctions/fraud release | Yes | compliance risk |
The checker must see:
- the original case;
- the evidence;
- the repair simulation;
- the maker's reason;
- the policy that grants authority;
- the expected financial impact;
- duplicate/retry risk.
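The table above can be encoded as a policy function. RepairKind and CheckerPolicy are hypothetical names. The rules used for the "Sometimes" and "Not always" rows are stand-ins; a real bank would read them from versioned policy configuration.

```java
enum RepairKind {
    IDEMPOTENT_RETRY, CLOSE_DUPLICATE, RELEASE_EXPIRED_HOLD, MANUAL_ADJUSTMENT,
    BACKDATED_POSTING, GL_RECLASS, EOD_OVERRIDE, SANCTIONS_RELEASE
}

final class CheckerPolicy {
    static boolean requiresChecker(RepairKind kind, boolean highSeverity, boolean evidenceStrong) {
        return switch (kind) {
            // financial, accounting, operational, or compliance impact: always four eyes
            case MANUAL_ADJUSTMENT, BACKDATED_POSTING, GL_RECLASS, EOD_OVERRIDE, SANCTIONS_RELEASE -> true;
            // "Sometimes": stand-in rule keyed on severity
            case CLOSE_DUPLICATE, RELEASE_EXPIRED_HOLD -> highSeverity;
            // "Not always": only when the no-commit evidence is weak
            case IDEMPOTENT_RETRY -> !evidenceStrong;
        };
    }
}
```

A switch expression over the enum has a practical advantage here. If a new repair kind is added without a rule, the compiler reports the switch as non-exhaustive instead of silently defaulting to "no checker".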
18. Integration with Reconciliation
Many exceptions can only be closed after reconciliation.
Examples:
- external transfer timeout;
- settlement mismatch;
- uncertain ATM dispense;
- unknown GL handoff outcome;
- an incoming camt statement contains an unmatched item.
Closure rule:
If financial outcome is externally determined, do not close until external evidence or reconciliation result exists.
Relationship model:
A case may trigger reconciliation. A recon break may trigger a case.
19. Integration with Incident Management
Not every exception is an incident, but some exceptions must be raised to incidents.
Escalate to an incident when:
- many cases share the same fingerprint within a short time;
- there is a ledger imbalance;
- customer money movement is uncertain;
- EOD is blocked;
- a payment rail outage has wide impact;
- there is a privileged access anomaly;
- data corruption is suspected;
- a regulatory reporting deadline is at risk.
The exception workbench must be able to link to the incident/ticket system. Do not, however, move evidence only into the ticketing tool. Core evidence must stay in a controlled, queryable system.
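The first escalation trigger, many cases with the same fingerprint in a short time, can be detected with a sliding window. This sketch assumes occurrences arrive in time order. FingerprintBurstDetector, the threshold and the window are hypothetical and would be tuned per source.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class FingerprintBurstDetector {
    private final int threshold;
    private final Duration window;
    // fingerprint -> timestamps inside the current window, oldest first
    private final Map<String, Deque<Instant>> recent = new HashMap<>();

    FingerprintBurstDetector(int threshold, Duration window) {
        this.threshold = threshold;
        this.window = window;
    }

    /** Records one occurrence; returns true when the fingerprint reaches the incident threshold. */
    boolean record(String fingerprint, Instant at) {
        Deque<Instant> q = recent.computeIfAbsent(fingerprint, k -> new ArrayDeque<>());
        q.addLast(at);
        Instant cutoff = at.minus(window);
        while (!q.isEmpty() && q.peekFirst().isBefore(cutoff)) {
            q.pollFirst(); // drop occurrences that fell out of the window
        }
        return q.size() >= threshold;
    }
}
```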
20. Auto-Repair: When Is It Allowed?
Auto-repair is allowed only if all of the following hold:
- the outcome is deterministic;
- there is no financial ambiguity;
- the repair is idempotent;
- the impact is small or non-financial;
- policy allows it;
- the evidence is sufficient;
- monitoring is active;
- it can be rolled back through an official correction/reversal;
- control totals are verified;
- an audit event is recorded.
Examples of safe auto-repair:
- reprocessing a failed event projection after the source ledger is confirmed;
- retrying a GL notification after the outbox got stuck;
- re-reading an external acknowledgement after a timeout;
- closing a low-risk duplicate case with an exact fingerprint match.
Examples of dangerous auto-repair:
- creating a manual debit/credit;
- changing a value date;
- releasing a sanctions hold;
- writing to suspense without approval;
- bypassing the insufficient-funds rule;
- changing a balance snapshot without a journal.
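21. Repair Idempotency
Repairs must be idempotent too.
Example:
```java
import java.time.Instant;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// RepairIdempotencyRepository, PostingService, ExceptionCaseRepository, RepairResult,
// PostingResult, requireReadyForRepair() and markRepaired(...) are application types and
// methods assumed to exist; only the Spring annotations are library APIs.
@Service
public class RepairCommandHandler {
    private final RepairIdempotencyRepository idempotencyRepository;
    private final PostingService postingService;
    private final ExceptionCaseRepository caseRepository;

    public RepairCommandHandler(RepairIdempotencyRepository idempotencyRepository,
                                PostingService postingService,
                                ExceptionCaseRepository caseRepository) {
        this.idempotencyRepository = idempotencyRepository;
        this.postingService = postingService;
        this.caseRepository = caseRepository;
    }

    @Transactional
    public RepairResult handle(CreateAdjustment command) {
        // A replayed command returns the stored result instead of posting again.
        return idempotencyRepository.find(command.idempotencyKey())
                .map(RepairResult::fromExisting)
                .orElseGet(() -> executeNew(command));
    }

    private RepairResult executeNew(CreateAdjustment command) {
        // Row lock serializes concurrent repairs on the same case; a concurrent duplicate
        // that waited on the lock then fails the status check instead of posting twice.
        ExceptionCase c = caseRepository.getForUpdate(command.caseId());
        c.requireReadyForRepair();

        // The posting service receives the same key, so a retry cannot create a second journal.
        PostingResult result = postingService.postAdjustment(
                command.accountId(),
                command.amount(),
                command.reason(),
                command.idempotencyKey()
        );

        c.markRepaired("Adjustment posted: " + result.journalId(), command.actor(), Instant.now());
        caseRepository.save(c);
        idempotencyRepository.save(command.idempotencyKey(), result);
        return RepairResult.completed(result.journalId());
    }
}
```
Note that the idempotency record and the case update must share a clearly defined transaction boundary, so that they commit or roll back together.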
22. Do Not Use the Exception Queue as a Dumping Ground
Common anti-patterns:
22.1 Every Error Goes into the Queue
Consequences:
- operations is overwhelmed;
- critical cases are buried in noise;
- engineers stop fixing root causes;
- SLAs become meaningless.
Solutions:
- a classification gate;
- deduplication;
- auto-close for known benign cases;
- technical alerts stay in observability, not in the ops queue.
22.2 Repair by SQL
Consequences:
- the audit trail is lost;
- the ledger goes out of balance;
- downstream systems are never informed;
- reconciliation gets worse;
- defects recur.
Solutions:
- controlled repair commands;
- maker-checker;
- simulation;
- journal-based correction;
- privileged scripts only in emergencies, under strict governance.
22.3 Closing Without Verification
Consequences:
- the case looks done but the money is still wrong;
- a GL mismatch appears the next day;
- a customer dispute surfaces later.
Solutions:
- closure criteria per type;
- a verification checklist;
- required evidence;
- post-repair control totals.
22.4 Treating Human Notes as Truth
Consequences:
- not queryable;
- not comparable;
- not machine-checkable;
- hard to audit.
Solutions:
- structured findings;
- enum reasons;
- evidence references;
- mandatory fields for high-risk cases.
23. Metrics for Exception Operations
Metrics must separate volume, risk, and effectiveness.
| Metric | Meaning |
|---|---|
| open case count by severity | current risk exposure |
| SLA breach count | operational backlog |
| mean time to triage | classification speed |
| mean time to repair | resolution speed |
| repeat fingerprint count | root cause not yet fixed |
| auto-repair success rate | automation quality |
| reopened case count | poor closure quality |
| manual financial adjustment count | operational friction/risk |
| suspense aging | financial uncertainty |
| EOD-blocking exception count | operational readiness |
A poor metric:
```text
Total errors today: 10,000
```
A useful metric:
```text
Critical unknown-outcome payment cases: 4
Total amount at risk: IDR 2.4B
Oldest case age: 3h 12m
SLA breaches: 1
Settlement window impacted: BI-FAST T+0 batch 14
```
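The count, amount-at-risk and oldest-age figures above can be computed from open cases. The sketch assumes a single currency and computes the oldest age over the same critical unknown-outcome subset. OpenCase and RiskSnapshot are hypothetical types.

```java
import java.math.BigDecimal;
import java.time.Duration;
import java.time.Instant;
import java.util.List;

/** Open case as seen by the metrics job; amounts assumed to be in one currency. */
record OpenCase(String id, boolean critical, boolean unknownOutcome, BigDecimal amount, Instant detectedAt) {}

record RiskSnapshot(long criticalUnknownOutcomeCount, BigDecimal amountAtRisk, Duration oldestAge) {

    static RiskSnapshot of(List<OpenCase> open, Instant now) {
        List<OpenCase> hot = open.stream()
                .filter(c -> c.critical() && c.unknownOutcome())
                .toList();
        BigDecimal atRisk = hot.stream()
                .map(OpenCase::amount)
                .reduce(BigDecimal.ZERO, BigDecimal::add);
        Duration oldest = hot.stream()
                .map(c -> Duration.between(c.detectedAt(), now))
                .max(Duration::compareTo)
                .orElse(Duration.ZERO);
        return new RiskSnapshot(hot.size(), atRisk, oldest);
    }
}
```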
24. Dashboard Design
An operations dashboard must answer:
- what needs to be worked on now;
- what threatens EOD;
- what has financial/regulatory impact;
- what needs escalation;
- what keeps recurring;
- whether repairs succeeded;
- whether the backlog is improving or getting worse.
Minimal layout:
```text
Critical / High Cases
- Unknown payment outcome: 4 cases, IDR 2.4B
- GL batch rejected: 1 batch, affects EOD
- Suspense break > T+1: 3 cases

SLA
- Breached: 1
- Due next 1h: 7
- Due today: 39

Root Cause Clusters
- UNKNOWN_RAIL_STATUS: 22 occurrences
- GL_MAPPING_MISSING: 8 occurrences
- CUSTOMER_STATUS_STALE: 5 occurrences
```
25. Testing the Exception Queue
Tests must cover more than the happy path. The examples below use JUnit-style assertions (assertThrows, assertEquals); helpers such as validationException(), actor(), occurrence(...) and the service, repository, and handler fields are test fixtures.
25.1 State Machine Test
```java
@Test
void cannotRepairCaseBeforeInvestigation() {
    ExceptionCase c = ExceptionCase.open(validationException());
    assertThrows(IllegalStateException.class, () ->
            c.markReadyForRepair("fix", actor(), Instant.now())
    );
}
```
25.2 Deduplication Test
```java
@Test
void repeatedOccurrenceWithSameFingerprintUpdatesExistingOpenCase() {
    ExceptionOccurrence first = occurrence("PAY-1", "UNKNOWN_RAIL_STATUS");
    ExceptionOccurrence second = occurrence("PAY-1", "UNKNOWN_RAIL_STATUS");
    ExceptionCase c1 = service.ingest(first);
    ExceptionCase c2 = service.ingest(second);
    assertEquals(c1.id(), c2.id());
    assertEquals(2, repository.get(c1.id()).occurrenceCount());
}
```
25.3 Repair Idempotency Test
```java
@Test
void duplicateRepairCommandDoesNotCreateTwoAdjustments() {
    CreateAdjustment command = validAdjustment("repair-key-001");
    RepairResult first = handler.handle(command);
    RepairResult second = handler.handle(command);
    assertEquals(first.journalId(), second.journalId());
    assertEquals(1, journalRepository.countByIdempotencyKey("repair-key-001"));
}
```
25.4 Closure Evidence Test
```java
@Test
void highSeverityCaseCannotCloseWithoutVerificationEvidence() {
    ExceptionCase c = highSeverityCaseReadyToClose();
    assertThrows(MissingEvidenceException.class, () ->
            closureService.close(c.id(), ClosureRequest.noEvidence())
    );
}
```
26. Security and Access Control
The repair workbench is a high-risk surface.
Minimum controls:
- role-based access;
- attribute-based restrictions: branch, product, amount, case type;
- maker-checker separation;
- privileged action logging;
- session re-authentication for high-risk repairs;
- no direct database access for operators;
- PII masking;
- least-privilege evidence access;
- export control;
- break-glass governance.
Example policy:
```text
User can create adjustment repair if:
- user.role contains OPERATIONS_REPAIR_MAKER
- case.status == READY_FOR_REPAIR
- case.severity != CRITICAL
- amount <= user.repairLimit
- user.branch in account.allowedBranches
- user.id != originalMakerId
```
The authorization decision must be stored as evidence, including at least the policy version and the evaluated attributes.
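This sketch evaluates the example policy while keeping the evidence the text asks for: the policy version and every evaluated condition. RepairAuthzRequest, AuthzDecision, AdjustmentRepairPolicy and the version label are hypothetical. A production system would more likely delegate this evaluation to a policy engine.

```java
import java.math.BigDecimal;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Attributes gathered for one authorization check (hypothetical shape). */
record RepairAuthzRequest(String userId, Set<String> userRoles, String userBranch,
                          BigDecimal userRepairLimit, String caseStatus, String caseSeverity,
                          BigDecimal amount, Set<String> accountAllowedBranches,
                          String originalMakerId) {}

/** Stored with the case as evidence: outcome, policy version, and every evaluated condition. */
record AuthzDecision(boolean permitted, String policyVersion, Map<String, Boolean> evaluated) {}

final class AdjustmentRepairPolicy {
    static final String VERSION = "adjustment-repair-v1"; // hypothetical version label

    static AuthzDecision evaluate(RepairAuthzRequest r) {
        Map<String, Boolean> e = new LinkedHashMap<>(); // keeps policy order in the evidence
        e.put("hasMakerRole", r.userRoles().contains("OPERATIONS_REPAIR_MAKER"));
        e.put("caseReadyForRepair", "READY_FOR_REPAIR".equals(r.caseStatus()));
        e.put("caseNotCritical", !"CRITICAL".equals(r.caseSeverity()));
        e.put("amountWithinLimit", r.amount().compareTo(r.userRepairLimit()) <= 0);
        e.put("branchAllowed", r.accountAllowedBranches().contains(r.userBranch()));
        e.put("notOriginalMaker", !r.userId().equals(r.originalMakerId()));
        boolean permitted = e.values().stream().allMatch(Boolean::booleanValue);
        return new AuthzDecision(permitted, VERSION, Collections.unmodifiableMap(e));
    }
}
```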
27. Relationship with Data Governance
Exception cases often expose data problems:
- missing product mapping;
- stale customer status;
- invalid reference data;
- duplicated party identity;
- currency precision mismatch;
- holiday calendar error;
- wrong GL account mapping;
- incomplete external message mapping.
Do not just fix the item. Record the root cause.
A top-tier system connects operational exceptions to an improvement loop.
28. Mini Project: Build an Exception Repair Slice
Build a small vertical slice:
- ingest exception occurrence;
- dedupe by fingerprint;
- create case;
- triage case;
- assign owner;
- attach evidence;
- simulate repair;
- require checker for financial adjustment;
- execute idempotent repair;
- close with verification.
A domain that is sufficient for the exercise:
Payment timeout after debit request.
System must determine whether debit happened.
If debit happened, mark payment completed.
If debit did not happen, retry safely.
If ambiguous, keep case open and require external evidence.
Acceptance criteria:
- no duplicate posting under retry;
- case timeline complete;
- high-risk repair requires checker;
- closure impossible without verification;
- dashboard can show open critical cases;
- evidence links payment, journal, and external acknowledgement.
29. Self-Correction Checklist
Use this checklist when reviewing a design:
| Question | Red Flag |
|---|---|
| Does an exception have a lifecycle? | only an error log table |
| Is the owner clear? | case open without an assignee |
| Is the repair domain-valid? | operator edits arbitrary fields |
| Does a financial repair create a journal? | balance changed directly |
| Is retry idempotent? | retry can double-debit |
| Is an unknown outcome treated specially? | timeout treated as failure |
| Does closure have evidence? | closed because it was “already checked” |
| Is a repeated root cause visible? | no fingerprint |
| Is the SLA risk-based? | every case has the same SLA |
| Does a high-risk repair require a checker? | maker can execute alone |
30. Summary
The exception queue and repair workbench are part of the core banking control plane.
Key principles:
- an exception is an operational case, not just a log entry;
- every case must have an identity, status, owner, SLA, evidence, and closure reason;
- a repair must be a domain-valid command, not a database edit;
- financial repairs must go through an official posting, reversal, or adjustment;
- unknown outcomes must be resolved with idempotency + reconciliation + evidence;
- high-risk repairs require maker-checker;
- deduplication prevents operational noise;
- structured findings are stronger than free-text notes;
- metrics must show risk exposure, not just error counts;
- the exception system must connect daily operations to root-cause improvement.
In the next part we move up to risk data aggregation and regulatory reporting readiness: how core banking produces numbers that are complete, accurate, timely, traceable, and defensible.