Capstone, Production Readiness, and Top 1% Review

Learn Java Microservices CPQ OMS Platform - Part 035

The final capstone: building, assessing, testing, and operating a production-ready Java microservices CPQ/OMS platform with OpenAPI First, Schema First, JAX-RS/Jersey, PostgreSQL, MyBatis, Camunda 7, Kafka, Redis, and advanced engineering practices.

[2026-07-02] 34 min read, 6757 words

Part 035 — Capstone, Production Readiness, and Top 1% Review

This part closes the series. Its goal is not to introduce new technology but to unify every engineering decision from Part 001 through Part 034 into one coherent way of thinking.

At this point we have built the mental model, API contracts, schemas, persistence layer, catalog, configuration, pricing, quote lifecycle, approval, order lifecycle, Camunda orchestration, Kafka event backbone, outbox/inbox, Redis runtime patterns, security, testing, observability, resilience, performance, deployment, runbooks, and auditability.

Now the question changes.

It is no longer:

“How do we build a CPQ/OMS service?”

But:

“How do we prove that this platform is correct, secure, operable, evolvable, and fit for real business transactions?”

That is the difference between an engineer who can make a system run and an engineer who can make a system trusted.

1. Kaufman Skill Closure

In Josh Kaufman's framework, the final stage of learning is not merely collecting information. It is the ability to self-correct through real practice.

For this series, self-correction means we can answer the following questions with technical evidence:

Do the quote and order lifecycles have clear invariants?
Does every command have defined idempotency behavior?
Can every state transition be audited?
Can every event be replayed without corrupting data?
Does every failure mode have a recovery path?
Does every API change have a compatibility rule?
Can every schema change be migrated safely?
Does every important decision have evidence?
Does every service have clear data ownership?
Can operators understand and repair the system during an incident?

If the answer is only “it should work”, the platform is not mature.

If the answer is “here is the query, here is the event, here is the runbook, here is the test, here is the dashboard, here is the rollback path”, the platform is starting to reach production-engineering grade.

2. Capstone Scenario

The capstone uses one complete journey:

A sales rep creates a quote for an enterprise customer, selects a product bundle, runs configuration validation, calculates the price, and submits the quote for approval because of a high discount. The quote is approved, the customer accepts it, and an order is created. The order is orchestrated with Camunda 7, fulfillment lines are sent to downstream services, events are published to Kafka, the audit trail is stored, and operators can handle partial failure.

This journey is rich enough because it touches almost every boundary:

HTTP API
OpenAPI contract
JSON Schema
PostgreSQL transaction
MyBatis mapper
Redis cache
pricing calculation
quote state machine
approval policy
order capture
order normalization
Camunda BPMN
Kafka event
outbox/inbox
observability
security
failure recovery
audit evidence

Flow diagram:

3. Target Repository at the End of the Series

By the end of the series, the repository structure does not have to be identical to this example, but it should at least have the same boundaries.

learn-java-microservices-cpq-oms-platform/
├── pom.xml
├── platform-bom/
├── platform-parent/
├── contracts/
│   ├── openapi/
│   │   ├── catalog-api.yaml
│   │   ├── configuration-api.yaml
│   │   ├── pricing-api.yaml
│   │   ├── quote-api.yaml
│   │   ├── approval-api.yaml
│   │   └── order-api.yaml
│   ├── json-schema/
│   │   ├── common/
│   │   ├── commands/
│   │   ├── events/
│   │   └── snapshots/
│   └── asyncapi/
│       └── cpq-oms-events.yaml
├── services/
│   ├── catalog-service/
│   ├── configuration-service/
│   ├── pricing-service/
│   ├── quote-service/
│   ├── approval-service/
│   └── order-service/
├── workflow/
│   └── camunda7/
│       ├── bpmn/
│       ├── delegates/
│       └── process-tests/
├── adapters/
│   ├── fulfillment-adapter/
│   ├── notification-adapter/
│   └── document-adapter/
├── local/
│   ├── docker-compose.yml
│   ├── init-postgres/
│   ├── init-kafka/
│   ├── init-redis/
│   └── mock-services/
├── deployment/
│   ├── kubernetes/
│   ├── helm/
│   └── environments/
├── operations/
│   ├── dashboards/
│   ├── alerts/
│   ├── runbooks/
│   └── failure-drills/
└── docs/
    ├── architecture-decisions/
    ├── threat-model/
    ├── data-classification/
    ├── audit-model/
    └── production-readiness-review/

The important boundaries:

contracts/ is the source of truth for public API and event contracts.
services/* holds business capabilities, not shared domain spaghetti.
workflow/camunda7/ is separated so that Camunda does not leak into the whole domain.
operations/ is treated as part of the product, not as after-the-fact documentation.
docs/architecture-decisions/ stores reasoning, trade-offs, and irreversible decisions.

4. End-to-End Artifact Map

A single business capability must produce many artifacts that stay consistent with each other.

Example for the AcceptQuote command:

Top-tier engineering does not treat these artifacts as separate things. It treats them as one chain of evidence.

If a requirement changes, we know which artifacts are affected.

If a bug occurs, we know which evidence to inspect.

If an auditor asks, we know the relevant state, actor, policy, snapshot, event, and command.

5. The Final Architecture Review

The final architecture review must assess the platform from several angles, not just the service diagram.

5.1 Business Capability Review

Key questions:

Does every service own a clear capability?
Do service boundaries follow business change rather than mere noun decomposition?
Do Quote, Approval, Order, and Fulfillment avoid stealing each other's responsibilities?
Is the pricing snapshot strong enough to settle a commercial dispute?
Is the product catalog snapshot strong enough for old quotes?
Can the order lifecycle explain partial fulfillment?
Do cancellation, amendment, expiry, and rejection have clear semantics?

Red flags:

A status string with no transition guard.
Quote prices recalculated from the live catalog.
Approval stores only approved=true.
Order lines have no lifecycle of their own.
All services read the same database.
Events used as hidden commands without clear ownership.

5.2 Contract Review

Contract review assesses whether APIs and events can evolve safely.

Checklist:

Every public endpoint has an OpenAPI spec.
Every request/response has an explicit schema.
Error responses use a standard problem model.
Command endpoints have defined idempotency behavior.
Pagination, filtering, and sorting are consistent.
The API versioning rule is documented.
The event envelope is stable.
Event schemas have a compatibility rule.
Breaking changes go through an explicit migration plan.
Contract tests run in CI.

Example red flag:

# Bad: ambiguous event
type: QuoteUpdated
payload:
  quoteId: "..."
  status: "APPROVED"

An event like this is weak because consumers cannot tell what changed, why it changed, who the actor was, or whether it is allowed to trigger side effects.

A stronger version:

{
  "eventId": "01J...",
  "eventType": "QuoteApproved",
  "eventVersion": 1,
  "occurredAt": "2026-07-02T10:15:30Z",
  "tenantId": "tenant-001",
  "aggregateType": "Quote",
  "aggregateId": "quote-001",
  "aggregateVersion": 7,
  "causationId": "cmd-accept-001",
  "correlationId": "corr-001",
  "actor": {
    "actorType": "USER",
    "actorId": "user-123"
  },
  "payload": {
    "approvalId": "approval-001",
    "policyVersion": "approval-policy-2026.07",
    "approvedDiscountPercent": "18.50"
  }
}
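
On the producer side, one way to keep this envelope stable is to model it as an immutable type that rejects incomplete events at construction time. The sketch below is hypothetical: `EventEnvelope`, `Actor`, the validation rules, and the `partitionKey()` helper are assumptions for illustration; only the field names come from the JSON example.

```java
import java.time.Instant;
import java.util.Map;
import java.util.Objects;

// Hypothetical types: only the field names come from the JSON example.
record Actor(String actorType, String actorId) {
    Actor {
        Objects.requireNonNull(actorType, "actorType");
        Objects.requireNonNull(actorId, "actorId");
    }
}

record EventEnvelope(
        String eventId, String eventType, int eventVersion, Instant occurredAt,
        String tenantId, String aggregateType, String aggregateId, long aggregateVersion,
        String causationId, String correlationId, Actor actor, Map<String, String> payload) {

    EventEnvelope {
        Objects.requireNonNull(eventId, "eventId");
        Objects.requireNonNull(eventType, "eventType");
        Objects.requireNonNull(occurredAt, "occurredAt");
        Objects.requireNonNull(tenantId, "tenantId");
        Objects.requireNonNull(aggregateType, "aggregateType");
        Objects.requireNonNull(aggregateId, "aggregateId");
        Objects.requireNonNull(causationId, "causationId");
        Objects.requireNonNull(correlationId, "correlationId");
        Objects.requireNonNull(actor, "actor");
        if (eventVersion < 1) {
            throw new IllegalArgumentException("eventVersion must be >= 1");
        }
        payload = Map.copyOf(payload); // defensive, immutable copy
    }

    // Assumed partition key: keeps all events of one aggregate on one partition.
    String partitionKey() {
        return tenantId + ":" + aggregateType + ":" + aggregateId;
    }
}
```

Because every envelope field is mandatory in this sketch, a producer cannot publish an event without a tenant, actor, or correlation ID. Whether `causationId` should be optional for system-generated events is a design choice this sketch does not settle.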

5.3 Data Architecture Review

Data architecture review must start from invariants.

Questions:

Which invariants are enforced in domain code?
Which invariants are enforced by database constraints?
Which invariants are enforced by state transitions?
Which invariants are enforced by asynchronous reconciliation?
Are there important invariants that live only in comments or documentation?

Example invariant:

A quote cannot be accepted unless:
- quote.state = APPROVED
- quote.expires_at > now
- quote.pricing_snapshot_id is not null
- quote.configuration_snapshot_id is not null
- no existing order_capture exists for the same quote acceptance idempotency key

Some of these invariants must live in domain logic. Others must live in the database.

Example PostgreSQL guardrail:

create unique index uq_order_capture_quote_acceptance
on order_capture (tenant_id, quote_id, acceptance_idempotency_key);

Without this constraint, domain code alone is still vulnerable to duplicate orders during retries, timeouts, or race conditions.
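
The domain-side half of the invariant can be sketched as a pure guard that reports every violated rule. This is an illustration under assumptions: the `Quote` record, the `QuoteState` values, and `AcceptQuoteGuard` are hypothetical names, and the fifth rule (no duplicate order capture) is deliberately left to the unique index because only the database can enforce it under concurrency.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical state set; the real quote state machine may differ.
enum QuoteState { DRAFT, SUBMITTED, APPROVED, REJECTED, EXPIRED, ACCEPTED }

// Hypothetical read model carrying only what the guard needs.
record Quote(String quoteId, QuoteState state, Instant expiresAt,
             String pricingSnapshotId, String configurationSnapshotId) {}

final class AcceptQuoteGuard {
    private AcceptQuoteGuard() {}

    // Returns all violated rules; an empty list means acceptance may proceed.
    static List<String> violations(Quote quote, Instant now) {
        List<String> found = new ArrayList<>();
        if (quote.state() != QuoteState.APPROVED) {
            found.add("quote.state must be APPROVED");
        }
        if (!quote.expiresAt().isAfter(now)) {
            found.add("quote.expires_at must be after now");
        }
        if (quote.pricingSnapshotId() == null) {
            found.add("quote.pricing_snapshot_id must not be null");
        }
        if (quote.configurationSnapshotId() == null) {
            found.add("quote.configuration_snapshot_id must not be null");
        }
        return found;
    }
}
```

Passing `now` in as a parameter keeps the guard deterministic and easy to test around the expiry boundary.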

5.4 Workflow Review

Camunda 7 should be treated as an orchestration engine, not as a business database.

Questions:

Does order state remain owned by the Order Service?
Does the process instance only orchestrate steps?
Do process variables avoid storing the full aggregate?
Is every service task idempotent?
Is retry behavior different for transient errors and business errors?
Do incidents have a runbook?
Is process versioning safe for old instances?
Is there a migration seam in case we move off Camunda 7 one day?

Red flags:

Camunda variables become the source of truth for orders.
The BPMN contains too many business rules.
Delegates write directly to other services' databases.
Camunda retries call external systems without idempotency.
Incidents are resolved by deleting the process instance without an audit trail.

5.5 Event Architecture Review

Kafka is not a place to dump every change.

Questions:

Do events represent business facts that have already happened?
Are commands and events distinguished?
Is topic ownership clear?
Does the partition key preserve the required ordering?
Are consumers idempotent?
Is replay safe?
Does the dead-letter topic (DLT) have an owner and a runbook?
Is PII in events classified?
Is a schema evolution policy in force?

Red flag:

topic: cpq.events
key: random UUID
event type: AnyDomainEvent
payload: arbitrary JSON

This is not an event backbone. It is a distributed log dumping ground.

6. Production Readiness Review

A production readiness review must be done before the system is used for real transactions.

Use a simple scoring scale:

Score	Meaning
0	Absent
1	Exists informally
2	Exists and is documented
3	Exists, is tested, and is used in CI/operations
4	Exists, is tested, is monitored, and has been validated with a failure drill

The minimum target for production is 3 for critical areas. Areas such as order capture, pricing snapshot, approval audit, outbox/inbox, and incident recovery should ideally reach 4.
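
The scoring rule can be turned into a simple gate that lists the critical areas still below the minimum. This is only a sketch: `ReadinessGate`, the area names, and the idea of feeding scores in as a map are assumptions, not part of the series' tooling.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ReadinessGate {
    // Minimum score for critical areas, taken from the scale above.
    static final int MIN_CRITICAL_SCORE = 3;

    private ReadinessGate() {}

    // Returns the critical areas that block a production release, sorted by name.
    static List<String> blockers(Map<String, Integer> criticalAreaScores) {
        return criticalAreaScores.entrySet().stream()
                .filter(entry -> entry.getValue() < MIN_CRITICAL_SCORE)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

For example, scores of 4 for order capture, 2 for outbox/inbox, and 3 for approval audit would report only outbox/inbox as a blocker.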

7. Production Readiness Checklist

7.1 API Readiness

Every public endpoint has an OpenAPI spec.
The OpenAPI spec is linted in CI.
Generated code is not modified by hand.
Error responses are consistent.
An idempotency key is used for risky commands.
Pagination and filtering are consistent.
Authentication and authorization are documented in the contract.
Deprecated endpoints have a sunset plan.
API compatibility is tested.

7.2 Schema Readiness

Every important payload has a JSON Schema.
Money does not use floating-point.
Timestamps use a consistent format and timezone policy.
Enum evolution has a rule.
Event schemas have a compatibility gate.
Snapshot schemas are immutable.
PII fields are classified.
Schema changes have a migration plan.
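
To illustrate the money rule in the list above, here is a minimal sketch using `java.math.BigDecimal`. The `MoneyMath` class, the two-decimal scale, and `RoundingMode.HALF_EVEN` are assumptions; the actual rounding policy belongs in the pricing ADR.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class MoneyMath {
    private static final BigDecimal HUNDRED = new BigDecimal("100");

    private MoneyMath() {}

    // Applies a percentage discount using exact decimal arithmetic.
    // With double, 0.1 + 0.2 evaluates to 0.30000000000000004, which is
    // why money values are parsed from strings and never from doubles.
    static BigDecimal applyDiscount(BigDecimal listPrice, BigDecimal discountPercent) {
        if (discountPercent.signum() < 0 || discountPercent.compareTo(HUNDRED) > 0) {
            throw new IllegalArgumentException("discountPercent must be between 0 and 100");
        }
        // movePointLeft(2) divides by 100 without any rounding.
        BigDecimal factor = BigDecimal.ONE.subtract(discountPercent.movePointLeft(2));
        // Assumed policy: round once, at the end, to 2 decimal places.
        return listPrice.multiply(factor).setScale(2, RoundingMode.HALF_EVEN);
    }
}
```

Calling `MoneyMath.applyDiscount(new BigDecimal("1000.00"), new BigDecimal("18.50"))` yields `815.00`. The same reasoning is why the event example carries `approvedDiscountPercent` as a string rather than a JSON number.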

7.3 Database Readiness

7.4 MyBatis Readiness

Mappers contain no complex business decisions.
Hot-path SQL has tests.
Result maps are explicit.
Type handlers for money/JSON/time are clearly defined.
N+1 queries are prevented.
SQL timing is observed.
SQL errors are classified.
Mapper tests use a real PostgreSQL instance, not only mocks.

7.5 Catalog Readiness

Product/offer lifecycle is clear.
Published catalog snapshots are immutable.
Quotes store a sufficient catalog reference/snapshot.
Catalog publish is atomic.
Catalog cache invalidation is safe.
Incompatible catalog changes have a migration policy.
Catalog events have a stable schema.

7.6 Configuration Readiness

Configuration sessions have an expiry.
Validation errors are explainable.
Compatibility rules are versioned.
Finalized configuration snapshots are immutable.
Configuration does not depend on the live catalog once finalized.
Rule evaluation is deterministic.
Performance for large bundles is tested.

7.7 Pricing Readiness

Price books are versioned.
Pricing calculation is deterministic.
Money precision is correct.
Discount stacking rules are explicit.
Pricing snapshots are immutable.
Reprice rules are clear.
Approval signals are derived from pricing evidence.
Golden master tests are available.
The pricing audit can explain the final price.

7.8 Quote Readiness

The quote state machine is explicit.
Transition guards are tested.
Quote versioning exists.
Quote acceptance is idempotent.
The quote expiry job is safe.
Quote documents reflect the snapshot, not live state.
The quote audit is complete.
Duplicate acceptance is prevented.

7.9 Approval Readiness

Approval policies are versioned.
Approval requirements are explainable.
Delegation and escalation are documented.
Manual overrides have evidence.
Approval tasks have an SLA.
Approval decisions are immutable.
Approval cannot be performed by an unauthorized actor.
Stuck approvals have a runbook.

7.10 Order Readiness

Order capture is idempotent.
Orders are normalized from the quote snapshot.
The order line dependency graph is valid.
Root state and line state are consistent.
Cancellation semantics are clear.
Partial failure has a representation.
Manual repair commands are available.
A reconciliation job is available.
The order audit is complete.

7.11 Camunda 7 Readiness

BPMN processes are versioned.
Business keys are consistent.
Process variables are minimal and schema-bound.
Delegates are idempotent.
Async continuations are used at the right boundaries.
Incident handling has a runbook.
Job executor tuning is tested.
Old process instances stay compatible when a new deployment lands.
The exit/migration seam is documented.

7.12 Kafka Readiness

7.13 Redis Readiness

Redis is only an optimization, not a critical source of truth.
TTL policy is clear.
Cache keys are tenant-scoped.
Stampede prevention exists for hot keys.
Rate limiter fail-mode is defined.
Distributed locks use fencing when side effects are critical.
Redis degradation is tested.
Memory/eviction policy is monitored.

7.14 Security Readiness

Authentication boundaries are clear.
Authorization has negative tests.
Tenant isolation is applied at every layer.
Object-level authorization exists.
Privileged actions are audited.
Secrets do not appear in logs.
PII masking is applied.
Service-to-service authorization exists.
Break-glass access has approval and audit.

7.15 Observability Readiness

Correlation IDs flow end to end.
Structured logs are consistent.
Metrics exist for API, DB, Kafka, Camunda, and Redis.
Business metrics are available.
Traces cross API/process/event boundaries.
A dashboard exists for the quote-to-order journey.
Alerts are action-oriented.
Runbooks are linked from alerts.

7.16 Resilience Readiness

Timeout budgets are defined.
Retry policies are classified.
Circuit breakers protect risky dependencies.
Bulkheads protect critical resources.
A backpressure strategy exists.
Fallbacks do not break invariants.
Retry storms are prevented.
Chaos/failure drills are performed.

7.17 Deployment Readiness

Build artifacts are immutable.
Config is externalized.
Secrets are managed.
Health/readiness probes are correct.
DB migration choreography is safe.
Kafka topic provisioning is controlled.
Rollback/roll-forward strategy is clear.
Canary or progressive rollout is available for risky changes.
A deployment audit exists.

7.18 Operations Readiness

A stuck-order runbook exists.
A stuck-approval runbook exists.
A Kafka lag runbook exists.
A Camunda incident runbook exists.
A stuck-outbox runbook exists.
A Redis degradation runbook exists.
The manual repair API has strict authorization.
A reconciliation report runs periodically.
A post-incident review template exists.

7.19 Compliance Readiness

Audit events are append-only.
Pricing explainability is available.
Approval evidence is complete.
Quote/order lineage is traceable.
Retention policy is clear.
Legal hold is supported.
PII minimization is applied.
Compliance export is available.
Audit tamper-evidence has been considered.

8. Final Failure Review

A mature CPQ/OMS system must be judged by its failures.

The following failure scenarios must each have an answer.

8.1 Duplicate Accept Quote

Scenario:

The user clicks accept quote.
The request succeeds on the server.
A network timeout occurs before the client receives the response.
The client retries.
The system must not create two orders.

Expected controls:

Idempotency-Key on the command API.
Unique constraint on order capture.
Domain guard on quote state.
Acceptance record.
Audit event.
Response replay from the idempotency table.

Pseudo-flow:
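
A minimal in-memory sketch of this flow, under stated assumptions: `AcceptQuoteHandler` and `AcceptanceKey` are hypothetical names, and a `ConcurrentHashMap` stands in for the idempotency table and the unique constraint on `(tenant_id, quote_id, acceptance_idempotency_key)`. In the real service the lookup, the order insert, and the stored response would share one PostgreSQL transaction.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical key mirroring (tenant_id, quote_id, acceptance_idempotency_key).
record AcceptanceKey(String tenantId, String quoteId, String idempotencyKey) {}

final class AcceptQuoteHandler {
    // Stands in for the idempotency table plus the unique constraint.
    private final Map<AcceptanceKey, String> orderIdByKey = new ConcurrentHashMap<>();
    private final AtomicInteger orderSequence = new AtomicInteger();

    // The first call captures the order; a retry with the same key
    // replays the stored result instead of creating a second order.
    String accept(String tenantId, String quoteId, String idempotencyKey) {
        AcceptanceKey key = new AcceptanceKey(tenantId, quoteId, idempotencyKey);
        return orderIdByKey.computeIfAbsent(key, ignored -> captureOrder());
    }

    // Placeholder for order capture; returns a generated order ID.
    private String captureOrder() {
        return "order-" + orderSequence.incrementAndGet();
    }

    int ordersCreated() {
        return orderSequence.get();
    }
}
```

`ConcurrentHashMap.computeIfAbsent` runs the mapping function at most once per key, which is the in-memory analogue of letting the unique index decide the winner between two concurrent retries.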

8.2 Pricing Rule Changed After Quote

Scenario:

The quote is created on day 1.
The price book changes on day 2.
The customer accepts the quote on day 3.
The order must use the price in the quote, not the live price.

Expected controls:

Pricing snapshot immutable.
Quote binds to pricing snapshot ID.
Order capture copies commercial snapshot.
Audit stores price book version.
Pricing recalculation happens only through an explicit reprice command.

8.3 Approval Policy Changed During Approval

Scenario:

The quote is submitted under approval policy version A.
The policy changes to version B.
An approver approves the quote while it is still in flight.

Expected controls:

Approval request stores policy version.
Decision uses policy version captured at submission time.
Re-evaluation happens only via an explicit command.
Audit records policy version and decision actor.

8.4 Kafka Event Published but Consumer Fails

Scenario:

OrderCaptured event published.
Fulfillment consumer receives it.
Consumer writes partial data.
Consumer crashes before committing offset.

Expected controls:

Inbox table.
Idempotent handler.
External side effect idempotency key.
Consumer offset committed only after durable processing.
Duplicate event safe.
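
The inbox control can be sketched as follows. This is an illustration, not the series' implementation: `InboxConsumer` is a hypothetical class and a `HashSet` stands in for the inbox table. In a real consumer, the inbox insert and the business write would commit in the same database transaction, and the Kafka offset would be committed only afterwards.

```java
import java.util.HashSet;
import java.util.Set;

final class InboxConsumer {
    // Stands in for the inbox table keyed by eventId.
    private final Set<String> processedEventIds = new HashSet<>();

    // Returns true if the event was processed, false if it was a duplicate.
    // If the side effect throws, the event ID is not recorded, so a
    // redelivery of the same event is processed again.
    synchronized boolean handle(String eventId, Runnable sideEffect) {
        if (processedEventIds.contains(eventId)) {
            return false; // duplicate delivery: safe no-op
        }
        sideEffect.run();
        processedEventIds.add(eventId);
        return true;
    }
}
```

This covers duplicate deliveries after a crash before the offset commit. It does not make an external call idempotent by itself; that still needs the external side effect idempotency key listed above.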

8.5 Camunda Delegate Fails After External Call

Scenario:

Camunda delegate calls fulfillment provider.
Provider accepts request.
Delegate crashes before marking BPMN task complete.
Retry calls provider again.

Expected controls:

External command has idempotency key.
Delegate records command attempt.
Fulfillment adapter checks previous result.
Retry returns existing fulfillment reference.
BPMN progresses safely.

8.6 Redis Unavailable

Scenario:

Redis cluster unavailable.
Pricing cache, idempotency cache, and rate limiter affected.

Expected controls:

Source of truth stays PostgreSQL.
Cache miss falls back to database for critical path.
Rate limiter fail-mode decided per endpoint.
Circuit breaker prevents thread exhaustion.
Alert and runbook available.

8.7 PostgreSQL Lock Contention

Scenario:

Bulk quote expiry job locks many rows.
User quote submit slows down.
API latency breaches SLO.

Expected controls:

Batch processing with small chunks.
FOR UPDATE SKIP LOCKED where appropriate.
Proper index on expiry query.
Job runs with timeout.
Observability on lock waits.
Kill/retry runbook.

8.8 Bad Deployment With Broken Event Consumer

Scenario:

New consumer version deployed.
It misinterprets event schema.
DLT grows.

Expected controls:

Consumer contract tests.
Canary deployment.
Lag and DLT alert.
Rollback path.
Replay from offset after fix.
Consumer tolerance to unknown fields.

9. Final State Machine Review

State machines are the spine of the platform.

9.1 Quote State Machine

Review questions:

Can a rejected quote be accepted?
Can an expired quote be accepted?
Can an accepted quote be cancelled?
Does approval happen before acceptance?
Is quote expiry deterministic?
Is each transition audited?
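
A transition guard that answers the first few questions in code could look like the sketch below. The state names and the allowed transitions are assumptions chosen for illustration, not the series' actual quote state diagram; `QuoteLifecycle` is a hypothetical enum.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical quote states with an explicit transition table.
enum QuoteLifecycle {
    DRAFT, SUBMITTED, APPROVED, REJECTED, EXPIRED, ACCEPTED, CANCELLED;

    // Assumed transition table: terminal states allow nothing.
    Set<QuoteLifecycle> allowedNext() {
        switch (this) {
            case DRAFT:
                return EnumSet.of(SUBMITTED, CANCELLED, EXPIRED);
            case SUBMITTED:
                return EnumSet.of(APPROVED, REJECTED, CANCELLED, EXPIRED);
            case APPROVED:
                return EnumSet.of(ACCEPTED, CANCELLED, EXPIRED);
            default:
                return EnumSet.noneOf(QuoteLifecycle.class);
        }
    }

    // Rejects any transition that is not in the table.
    QuoteLifecycle transitionTo(QuoteLifecycle target) {
        if (!allowedNext().contains(target)) {
            throw new IllegalStateException(this + " -> " + target + " is not allowed");
        }
        return target;
    }
}
```

Under this assumed table, a rejected or expired quote cannot be accepted, acceptance is reachable only from `APPROVED`, and an accepted quote cannot be cancelled. Each of those answers is a business decision that should be confirmed, not inferred from the code.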

9.2 Order Root State Machine

Review questions:

Can root order be completed while a mandatory line is failed?
Can cancellation skip compensation?
Can repair resume without audit?
Can order state be derived from line states?
Is manual override represented as a transition?

9.3 Order Line State Machine

Review questions:

Can dependent line start before parent completes?
Can failed line be retried safely?
Is downstream request idempotent?
Is line state observable by operators?
Does line failure affect root state correctly?

10. Final Data Consistency Review

A useful consistency review is not “is everything eventually consistent?” That phrase is too vague.

Use this table instead.

Boundary	Consistency Need	Mechanism
Quote acceptance creates order	Strong per quote	DB transaction + unique constraint
Quote accepted event publication	Atomic with quote/order write	Transactional outbox
Fulfillment consumer processing	Effectively once	Inbox + idempotent side effect
Catalog update to quote draft	Eventual	Catalog published event + cache invalidation
Price book update to existing quote	No automatic mutation	Immutable pricing snapshot
Approval decision to quote state	Strong or controlled eventual	Command transition + event reconciliation
Order root state from line states	Controlled derived consistency	State transition service + reconciliation
Camunda process to order DB	Eventually consistent with repair	Process message + order command + runbook
Redis cache to PostgreSQL	Eventually consistent	TTL + invalidation + source-of-truth fallback

Top-tier design names the consistency model per boundary. It does not claim one universal consistency mode.

11. Final Security Review

Security review must test real abuse paths.

11.1 Tenant Escape

Question:

Can a user from tenant A access quote, order, approval, or event data from tenant B?

Required controls:

tenant ID from trusted token/edge context
no tenant ID accepted blindly from request body
tenant predicate in all queries
object-level authorization
audit event contains tenant ID
Kafka event keys and payloads include tenant classification
Redis keys include tenant namespace
negative tests for cross-tenant access
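
The Redis namespace control from the list above can be sketched as a small key builder. The key layout `tenant:<tenantId>:<entity>:<id>` and the `TenantKeys` class are assumptions for illustration; the point is that the tenant segment is always present and always comes from the trusted context.

```java
final class TenantKeys {
    private TenantKeys() {}

    // trustedTenantId must come from the token/edge context,
    // never from the request body.
    static String cacheKey(String trustedTenantId, String entity, String id) {
        return "tenant:" + segment(trustedTenantId) + ":" + segment(entity) + ":" + segment(id);
    }

    // Rejects segments that could make two different keys collide.
    private static String segment(String value) {
        if (value == null || value.isEmpty() || value.contains(":")) {
            throw new IllegalArgumentException("invalid key segment: " + value);
        }
        return value;
    }
}
```

For example, `TenantKeys.cacheKey("tenant-001", "quote", "quote-001")` returns `tenant:tenant-001:quote:quote-001`, while a tenant ID containing `:` is rejected instead of silently producing an ambiguous key.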

11.2 Unauthorized Approval

Question:

Can a sales user approve their own high-discount quote?

Required controls:

approval policy separates requester and approver
authorization checks actor attributes
conflict-of-interest guard
approval decision stores actor, policy, signal, timestamp
manual override requires stronger permission
audit evidence immutable

11.3 Privileged Repair Abuse

Question:

Can an operator “fix” an order in a way that hides commercial or fulfillment reality?

Required controls:

repair command is explicit
repair reason required
before/after state recorded
dual approval for high-risk repair
repair does not delete original event
operator action visible in audit trail
repair API heavily authorized and rate-limited

11.4 PII Leakage

Question:

Does PII leak into logs, traces, Kafka events, Redis keys, or Camunda variables?

Required controls:

data classification
field-level masking
no raw customer sensitive data in Redis keys
no full quote payload in Camunda variables
event payload minimization
log redaction tests
trace attribute allowlist

12. Final Observability Review

A mature CPQ/OMS platform answers business and technical questions quickly.

12.1 Business Questions

How many quotes are stuck in approval?
Which approval policy causes most escalations?
How many accepted quotes failed order capture?
Which product bundle causes most invalid configurations?
Which pricing rule causes most manual approvals?
How many orders are partially fulfilled?
Which downstream fulfillment system causes the most delay?
What is quote-to-order conversion latency?
How many orders required manual repair this week?

12.2 Technical Questions

Which API endpoint breaches latency SLO?
Which SQL query causes lock wait?
Which Kafka consumer group is lagging?
Which Camunda job definition creates most incidents?
Which Redis key pattern is hot?
Which service has retry storms?
Which deployment introduced error spike?
Which tenant is causing abnormal load?

12.3 Evidence Chain Query

For a single orderId, operators should be able to reconstruct:

orderId
 -> source quoteId
 -> quote version
 -> pricing snapshot
 -> configuration snapshot
 -> approval request
 -> approval decision
 -> order capture command
 -> order normalized lines
 -> Camunda process instance
 -> fulfillment commands
 -> Kafka events
 -> audit events
 -> manual repair actions

If this chain requires five people and manual database archaeology, observability is not good enough.

13. Final Performance Review

Performance engineering should be tied to workload.

Example workload target:

Flow	Target
Catalog browse p95	< 150 ms
Configuration validation p95	< 500 ms
Pricing calculation p95	< 700 ms
Quote create p95	< 300 ms
Quote submit p95	< 500 ms
Quote accept/order capture p95	< 800 ms
Order orchestration step p95	< 2 s
Event publication lag p95	< 5 s
Approval task creation p95	< 3 s

These are example targets, not universal truth. The important point is that every target must have a measurement path.

13.1 Capacity Formula

Use Little’s Law as a basic sanity check:

concurrency = throughput × latency

If order capture receives 100 requests per second and p95 latency is 800 ms (Little's Law strictly uses mean latency, so plugging in p95 gives a conservative upper estimate):

concurrency ≈ 100 × 0.8 = 80 in-flight requests

Then ask:

Can the DB connection pool support this?
Can Camunda process starts support this?
Can outbox table handle this insert rate?
Can Kafka publisher drain fast enough?
Can downstream fulfillment absorb the resulting commands?
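
The same sanity check can be scripted so it runs against each hot path. This is a sketch under assumptions: `CapacityCheck` is a hypothetical helper, the pool size of 50 is an invented example value, and the check assumes one database connection held per in-flight request.

```java
final class CapacityCheck {
    private CapacityCheck() {}

    // Little's Law: in-flight = throughput x latency, rounded up to a whole request.
    // Integer arithmetic in milliseconds avoids floating-point rounding.
    static long inFlight(long requestsPerSecond, long latencyMillis) {
        return (requestsPerSecond * latencyMillis + 999) / 1000;
    }

    // True when the pool can hold one connection per in-flight request (assumption).
    static boolean fitsPool(long inFlightRequests, int poolSize) {
        return inFlightRequests <= poolSize;
    }

    public static void main(String[] args) {
        long inFlight = inFlight(100, 800);
        System.out.println("in-flight requests: " + inFlight);              // 80
        System.out.println("fits a pool of 50: " + fitsPool(inFlight, 50)); // false
    }
}
```

With the numbers from the text, 80 in-flight requests would not fit a 50-connection pool if each request held a connection for its whole duration, which is exactly the kind of mismatch this check is meant to surface early.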

13.2 Hot Path Review

Hot paths:

catalog published view lookup
configuration validation
price calculation
quote submit
quote acceptance
order state transition
outbox polling
Kafka consumer processing
Camunda job execution

For each hot path, document:

expected throughput
p50/p95/p99 latency
database queries
indexes
lock behavior
cache behavior
timeout budget
retry behavior
fallback behavior
dashboard link
runbook link

14. Final Deployment Review

Deployment review should prove that change can be introduced safely.

14.1 Deployment Categories

Change Type	Risk	Strategy
Add optional API field	Low	Normal rollout
Remove API field	High	Deprecate, migrate, remove later
Add nullable DB column	Low	Expand migration
Add not-null column	Medium/High	Expand, backfill, enforce
Change pricing rule	High	Versioned rule + golden master
Change approval policy	High	Versioned policy + simulation
Change BPMN process	High	Versioning + instance compatibility
Change event schema	Medium/High	Compatibility check + consumer test
Change Kafka partition key	Very high	Usually new topic
Change tenant isolation logic	Critical	Security review + negative tests

14.2 Rollback Reality

Rollback is not always possible.

For many distributed systems, the safer path is roll-forward. Especially when:

database migration has already changed data
event schema has been published
BPMN process instances have started under new version
external side effects have happened
cache invalidation has propagated partially

Therefore every deployment plan needs:

pre-deploy validation
migration plan
rollout strategy
monitoring window
abort condition
rollback or roll-forward path
data repair plan
communication plan

15. Final Compliance Review

Compliance is not only “store logs”.

For CPQ/OMS, defensibility means the platform can explain commercial decisions and operational state changes.

15.1 Quote Evidence

A quote must answer:

Who created it?
What catalog version was used?
What configuration was selected?
Why was configuration valid?
What price book was used?
What discounts were applied?
Why was approval required or not required?
Who approved it?
When did the customer accept it?
What exact terms were accepted?

15.2 Order Evidence

An order must answer:

Which accepted quote created it?
Was order capture idempotent?
How were quote lines normalized into order lines?
Which line depended on which other line?
Which fulfillment step failed?
Was it retried?
Was it manually repaired?
Who performed repair?
Was the customer impacted?
What final state did each line reach?

15.3 Audit Properties

Audit evidence should be:

append-only
timestamped
actor-attributed
tenant-scoped
correlation-linked
policy-versioned
snapshot-backed
tamper-evident where required
exportable for investigation
retained according to policy

16. Top 1% Review: What Separates Strong Engineers

This section is intentionally direct.

A good engineer can implement endpoints.

A stronger engineer can implement services.

A senior engineer can model boundaries.

A top-tier engineer can explain failure, evolution, and accountability.

16.1 They Think in Invariants

Instead of asking:

“What tables do we need?”

They ask:

“What must never become false?”

Examples:

one quote acceptance must not create two orders
accepted quote price must not drift
approval decision must be traceable to policy version
order root state must not contradict line states
retry must not duplicate external side effects
tenant data must never cross boundary

Tables, APIs, Kafka topics, and BPMN flows are consequences of these invariants.

16.2 They Separate State, Process, and Event

State answers:

What is true now?

Process answers:

What should happen next?

Event answers:

What happened?

Audit answers:

Why, by whom, and under what rule did it happen?

Confusing these leads to weak systems.

Common mistakes:

using Kafka as source of command truth without ownership
using Camunda variables as business database
using audit log as operational state
using Redis cache as durable state
using API response as evidence

16.3 They Design for Retry Before Writing Code

Every command should answer:

What if request is repeated?
What if response is lost?
What if transaction commits but event publish fails?
What if event is consumed twice?
What if external call succeeds but local handler fails?
What if operator retries manually?

If there is no answer, the system is not production-ready.

16.4 They Treat Manual Repair as a First-Class Capability

Manual repair is not failure of engineering. Hidden repair is failure of engineering.

A mature system provides:

explicit repair commands
strict authorization
required reason
before/after state
audit event
reconciliation validation
dashboards
runbooks

16.5 They Prefer Explicit Trade-Offs

Bad architecture hides trade-offs.

Good architecture states them.

Example:

Decision:
Use PostgreSQL as source of truth for Quote and Order state.

Consequence:
Kafka events are derived facts, not authoritative state.

Trade-off:
Consumers may lag, but command-side invariants stay strong.

Mitigation:
Outbox, inbox, reconciliation, lag alert, replay runbook.

16.6 They Avoid Framework-Centric Architecture

The platform is not “a Camunda system” or “a Kafka system” or “a Redis system”.

It is a CPQ/OMS platform with:

business state
process orchestration
event propagation
persistence
caching
observability
security
compliance

Frameworks are implementation choices. The domain and invariants are the architecture.

17. Final Architecture Decision Records

At minimum, the capstone should include ADRs for these decisions:

ADR-001: Service boundaries and bounded contexts
ADR-002: OpenAPI-first contract governance
ADR-003: Schema-first payload and event model
ADR-004: PostgreSQL ownership per service
ADR-005: MyBatis over ORM for explicit SQL control
ADR-006: Quote snapshot strategy
ADR-007: Pricing snapshot and money precision
ADR-008: Approval policy versioning
ADR-009: Order state machine design
ADR-010: Camunda 7 orchestration boundary
ADR-011: Kafka topic and event taxonomy
ADR-012: Transactional outbox/inbox
ADR-013: Redis runtime usage limits
ADR-014: Tenant isolation strategy
ADR-015: Audit event model
ADR-016: Deployment and migration strategy
ADR-017: Manual repair and reconciliation model
ADR-018: Camunda 7 migration seam

Example ADR skeleton:

# ADR-010: Camunda 7 Orchestration Boundary

## Status
Accepted

## Context
Order fulfillment is long-running, may involve timers, retries, human intervention, and external systems.

## Decision
Use Camunda 7 to orchestrate process progression, but keep authoritative order state in Order Service PostgreSQL tables.

## Consequences
- BPMN process variables must remain minimal.
- Java delegates must call idempotent service commands.
- Order state transitions remain guarded by Order Service.
- Camunda incidents require runbooks and reconciliation.
- Future migration away from Camunda 7 remains possible.

## Alternatives Considered
- Pure choreography with Kafka
- Custom workflow engine
- Camunda as source of truth

## Validation
- Process tests
- Incident drills
- State reconciliation jobs
- Manual repair flow

18. Final Implementation Lab

The final lab is not one small exercise. It is a staged capstone.

Stage 1 — Contracts

Build or verify:

OpenAPI for quote acceptance
OpenAPI for order lookup
JSON Schema for quote snapshot
JSON Schema for order captured event
JSON Schema for audit event
AsyncAPI for public events

Acceptance criteria:

contracts linted
generated DTOs build
contract tests pass
breaking change detected in CI

Stage 2 — Persistence

Build or verify:

quote table
quote version table
pricing snapshot table
approval request table
order table
order line table
outbox table
inbox table
audit event table
idempotency table

Acceptance criteria:

migrations run from empty database
constraints enforce critical invariants
hot path indexes exist
mapper tests pass
duplicate quote acceptance fails safely (a constraint rejects it and no partial write remains)

Stage 3 — Quote-to-Order Flow

Build or verify:

create quote
submit quote
approve quote
accept quote
capture order
publish outbox event
start Camunda process

Acceptance criteria:

full flow works locally
duplicate accept returns same result
expired quote cannot be accepted
rejected quote cannot be accepted
audit chain complete
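
Three of these criteria (duplicate accept, expired quote, rejected quote) can be expressed as one guarded command. The sketch below is an in-memory approximation under stated assumptions: `QuoteAcceptanceService` and `QuoteStatus` are hypothetical names, and the `orderIdByQuote` map plays the role that a unique constraint on the order's quote reference would play in PostgreSQL.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

enum QuoteStatus { DRAFT, SUBMITTED, APPROVED, REJECTED, ACCEPTED }

// Hypothetical service; maps stand in for the quote and order tables.
final class QuoteAcceptanceService {
    private final Map<String, QuoteStatus> statusByQuote = new HashMap<>();
    private final Map<String, Instant> validUntilByQuote = new HashMap<>();
    private final Map<String, String> orderIdByQuote = new HashMap<>();
    private int orderSequence = 0;

    void register(String quoteId, QuoteStatus status, Instant validUntil) {
        statusByQuote.put(quoteId, status);
        validUntilByQuote.put(quoteId, validUntil);
    }

    // synchronized stands in for row locking inside one transaction.
    synchronized String accept(String quoteId, Instant now) {
        QuoteStatus status = statusByQuote.get(quoteId);
        if (status == null) {
            throw new IllegalArgumentException("unknown quote " + quoteId);
        }
        // Duplicate accept: return the order that already exists.
        String existing = orderIdByQuote.get(quoteId);
        if (existing != null) {
            return existing;
        }
        if (status != QuoteStatus.APPROVED) {
            throw new IllegalStateException("quote is " + status + ", not APPROVED");
        }
        if (!now.isBefore(validUntilByQuote.get(quoteId))) {
            throw new IllegalStateException("quote expired");
        }
        String orderId = "ORD-" + (++orderSequence);
        orderIdByQuote.put(quoteId, orderId);
        statusByQuote.put(quoteId, QuoteStatus.ACCEPTED);
        return orderId;
    }

    int orderCount() {
        return orderIdByQuote.size();
    }
}
```

The time is passed in as a parameter rather than read from the system clock, which keeps the expiry rule easy to test.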

Stage 4 — Orchestration

Build or verify:

BPMN order orchestration
service task delegates
message correlation
timer retry/escalation
incident handling
manual repair/resume

Acceptance criteria:

process test passes
delegate retry is idempotent
incident can be reproduced
incident runbook resolves test incident
order state remains authoritative outside Camunda

Stage 5 — Event Processing

Build or verify:

outbox publisher
Kafka topics
consumer inbox
retry topic
DLT
replay utility

Acceptance criteria:

duplicate event safe
consumer crash safe
DLT alert generated
replay does not duplicate side effect
lag dashboard visible
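
The "duplicate event safe" and "consumer crash safe" criteria both come down to the ordering between the side effect and the inbox record. A minimal sketch, with hypothetical names (`OrderEvent`, `InboxConsumer`) and a `HashSet` standing in for an inbox table keyed by event ID:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

record OrderEvent(String eventId, String orderId, String type) {}

// Hypothetical consumer wrapper; the set stands in for an inbox table with a unique event ID.
final class InboxConsumer {
    private final Set<String> processedEventIds = new HashSet<>();
    private final Consumer<OrderEvent> sideEffect;

    InboxConsumer(Consumer<OrderEvent> sideEffect) {
        this.sideEffect = sideEffect;
    }

    // Returns true if the event was applied, false if it was a duplicate.
    boolean handle(OrderEvent event) {
        if (processedEventIds.contains(event.eventId())) {
            return false;
        }
        // If the side effect throws, the event ID is not recorded,
        // so a redelivery after a crash is processed again.
        sideEffect.accept(event);
        processedEventIds.add(event.eventId());
        return true;
    }
}
```

In a real consumer the side effect and the inbox insert would need to commit in the same database transaction; the in-memory ordering here only approximates that guarantee.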

Stage 6 — Redis Runtime

Build or verify:

catalog published cache
pricing cache
idempotency fast path
rate limiter
Redis degradation fallback

Acceptance criteria:

cache miss falls back safely
Redis unavailable does not corrupt state
hot key metrics visible
rate limiter fail-mode tested
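
The first two criteria can be checked against a cache-aside reader that treats the cache as optional. The sketch below uses a hypothetical `PriceCache` interface instead of a real Redis client, so it makes no claim about any client library's API; the `Function` stands in for the PostgreSQL read.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical cache abstraction; a real implementation would wrap a Redis client.
interface PriceCache {
    Optional<String> get(String key);
    void put(String key, String value);
}

final class CachedPriceReader {
    private final PriceCache cache;
    private final Function<String, String> sourceOfTruth;
    private int fallbackReads = 0;

    CachedPriceReader(PriceCache cache, Function<String, String> sourceOfTruth) {
        this.cache = cache;
        this.sourceOfTruth = sourceOfTruth;
    }

    String read(String key) {
        try {
            Optional<String> hit = cache.get(key);
            if (hit.isPresent()) {
                return hit.get();
            }
        } catch (RuntimeException cacheUnavailable) {
            // Degrade: a cache failure is treated like a miss.
        }
        fallbackReads++;
        String value = sourceOfTruth.apply(key);
        try {
            cache.put(key, value);
        } catch (RuntimeException ignored) {
            // Failing to repopulate the cache must not fail the read.
        }
        return value;
    }

    int fallbackReads() {
        return fallbackReads;
    }
}
```

A counter such as `fallbackReads` is the kind of signal that would feed a degradation metric; swallowing the exception silently without one would hide the outage.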

Stage 7 — Security

Build or verify:

JWT tenant context
object-level authorization
approval authorization
repair authorization
PII log redaction
negative tests

Acceptance criteria:

cross-tenant access denied
sales cannot approve own restricted quote
repair command requires privileged role
audit records privileged action
logs do not leak sensitive fields
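
The first two criteria are object-level checks, not role checks alone. A minimal sketch of such a policy, assuming hypothetical `Caller` and `QuoteRef` records and an assumed `APPROVER` role name:

```java
import java.util.Set;

// Hypothetical types: Caller models the JWT-derived context, QuoteRef the target object.
record Caller(String userId, String tenantId, Set<String> roles) {}

record QuoteRef(String quoteId, String tenantId, String createdBy, boolean restricted) {}

final class ApprovalPolicy {
    private ApprovalPolicy() {}

    static boolean canApprove(Caller caller, QuoteRef quote) {
        // Tenant isolation comes first: a role never crosses tenants.
        if (!caller.tenantId().equals(quote.tenantId())) {
            return false;
        }
        // Role check (role name is an assumption for this example).
        if (!caller.roles().contains("APPROVER")) {
            return false;
        }
        // Separation of duties: the author cannot approve their own restricted quote.
        if (quote.restricted() && caller.userId().equals(quote.createdBy())) {
            return false;
        }
        return true;
    }
}
```

Keeping the policy a pure function of caller and object makes the negative tests in this stage cheap to write.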

Stage 8 — Observability and Operations

Build or verify:

dashboards
alerts
structured logs
traces
business metrics
runbooks
reconciliation job

Acceptance criteria:

stuck order detectable
stuck approval detectable
Kafka lag detectable
Camunda incident detectable
outbox stuck detectable
operator can reconstruct evidence chain
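
"Stuck order detectable" needs a concrete definition before it can be alerted on. One possible definition, sketched below, is an order in a non-terminal state whose last transition is older than a threshold. The names (`OrderSnapshot`, `StuckOrderDetector`) and the two terminal state names are assumptions for illustration, and the code assumes Java 17 or later.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Set;

record OrderSnapshot(String orderId, String state, Instant lastTransitionAt) {}

// Hypothetical detector; a real job would run the equivalent query against the order table.
final class StuckOrderDetector {
    // Assumed terminal states for this example.
    private static final Set<String> TERMINAL_STATES = Set.of("FULFILLED", "CANCELLED");

    private StuckOrderDetector() {}

    static List<OrderSnapshot> findStuck(List<OrderSnapshot> orders, Instant now, Duration maxAge) {
        return orders.stream()
                .filter(o -> !TERMINAL_STATES.contains(o.state()))
                .filter(o -> Duration.between(o.lastTransitionAt(), now).compareTo(maxAge) > 0)
                .toList();
    }
}
```

The same shape would apply to stuck approvals and stuck outbox rows, with different states and thresholds.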

Stage 9 — Failure Drills

Run:

duplicate quote acceptance
Kafka consumer crash
Redis unavailable
Camunda delegate failure
PostgreSQL lock contention
bad deployment rollback/roll-forward
DLT replay
manual repair

Acceptance criteria:

system remains consistent
failure is visible
runbook works
audit trail remains complete
post-drill action items captured

19. Final Rubric

Use this rubric to self-assess.

Area	Junior Signal	Senior Signal	Top-Tier Signal
API	Can build endpoint	Can define contract	Can evolve contract safely
Schema	Uses DTOs	Separates schemas	Governs compatibility
Database	Creates tables	Models transactions	Enforces invariants
MyBatis	Writes queries	Structures mappers	Controls SQL performance and correctness
Catalog	CRUD products	Publishes catalog	Preserves commercial snapshots
Configuration	Validates inputs	Models rules	Explains invalidity and versioning
Pricing	Calculates totals	Handles discounts	Produces defensible pricing evidence
Quote	Tracks status	Models lifecycle	Prevents invalid transitions under concurrency
Approval	Creates task	Applies policy	Provides defensible decision trail
Order	Stores order	Models lifecycle	Handles partial failure and repair
Camunda	Draws BPMN	Orchestrates flow	Keeps workflow separate from business truth
Kafka	Publishes events	Designs topics	Enables replay and idempotency
Redis	Caches data	Controls TTL	Prevents cache from becoming hidden truth
Security	Adds auth	Applies RBAC	Enforces object/tenant policy everywhere
Testing	Unit tests	Integration tests	Contract/failure/invariant tests
Observability	Logs errors	Adds metrics	Answers business/debug questions quickly
Resilience	Adds retries	Uses circuit breaker	Designs retry-safe side effects
Operations	Fixes manually	Writes runbook	Builds repair/reconciliation as product
Compliance	Stores logs	Stores audit	Produces explainable evidence chain

20. Common Final Anti-Patterns

20.1 The Shared Database Platform

All services write the same database schema.

Why it fails:

ownership unclear
migrations conflict
transaction coupling grows
service boundary becomes fake
audits become ambiguous

Better:

service-owned schema
explicit APIs/events
read projections when needed
controlled reporting pipeline

20.2 The God Common Library

All services depend on a shared common-domain library.

Why it fails:

hidden coupling
coordinated deployments
accidental domain leakage
version conflicts

Better:

small technical libraries
generated contract models
service-local domain model
stable shared primitives only

20.3 The Workflow-as-Database Trap

Camunda variables store order state.

Why it fails:

hard to query
weak constraints
difficult audit
migration pain
business logic leaks into BPMN

Better:

Order Service owns state
Camunda orchestrates next steps
variables hold IDs and small process metadata
reconciliation links process and order state

20.4 The Kafka-as-RPC Trap

A service sends an event and expects an immediate reaction, as if it were a synchronous call.

Why it fails:

hidden latency assumptions
weak error handling
unclear ownership
replay danger

Better:

command API for immediate decision
event for durable facts
saga for long-running outcome
timeout/compensation model

20.5 The Cache-as-Truth Trap

Redis contains data that cannot be reconstructed.

Why it fails:

eviction loses state
restart loses evidence
failover corrupts assumptions
audit impossible

Better:

PostgreSQL as source of truth
Redis as acceleration
TTL and invalidation policy
fallback path

20.6 The Retry Without Idempotency Trap

Retry is added to improve reliability but duplicates side effects.

Why it fails:

duplicate orders
duplicate fulfillment
duplicate notifications
inconsistent audit

Better:

idempotency key
unique constraint
inbox/outbox
state transition guard
external request reference
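
The state transition guard is the piece that makes a retried command harmless. A minimal sketch follows; the four order states and the allowed transitions are assumptions chosen for illustration, not the series' actual order state machine.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Assumed, simplified order lifecycle for this example.
enum OrderState { CAPTURED, FULFILLING, FULFILLED, CANCELLED }

enum TransitionOutcome { APPLIED, ALREADY_APPLIED, REJECTED }

final class OrderTransitionGuard {
    private static final Map<OrderState, Set<OrderState>> ALLOWED = new EnumMap<>(OrderState.class);

    static {
        ALLOWED.put(OrderState.CAPTURED, EnumSet.of(OrderState.FULFILLING, OrderState.CANCELLED));
        ALLOWED.put(OrderState.FULFILLING, EnumSet.of(OrderState.FULFILLED, OrderState.CANCELLED));
        ALLOWED.put(OrderState.FULFILLED, EnumSet.noneOf(OrderState.class));
        ALLOWED.put(OrderState.CANCELLED, EnumSet.noneOf(OrderState.class));
    }

    private OrderTransitionGuard() {}

    static TransitionOutcome evaluate(OrderState current, OrderState target) {
        // A retry that asks for the state we are already in is a no-op, not an error.
        if (current == target) {
            return TransitionOutcome.ALREADY_APPLIED;
        }
        return ALLOWED.get(current).contains(target)
                ? TransitionOutcome.APPLIED
                : TransitionOutcome.REJECTED;
    }
}
```

Distinguishing `ALREADY_APPLIED` from `REJECTED` lets a caller skip the side effect on a retry while still surfacing genuinely invalid transitions.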

21. Final Mental Model

The final mental model can be compressed into a single ordered list. Read it from top to bottom:

Business invariants define correctness.
State machines encode lifecycle.
Command APIs request state transitions.
PostgreSQL commits durable truth.
Outbox emits durable facts.
Kafka distributes facts.
Camunda orchestrates long-running work but does not own business truth.
Redis accelerates but does not define truth.
Security constrains every boundary.
Observability explains behavior.
Runbooks and repair close the loop when reality deviates.

22. What to Build Next After This Series

After completing this series, natural advanced continuations are:

22.1 Build a Rule Engine from Scratch

Useful for:

pricing rules
eligibility rules
approval policy
configuration constraints

Focus:

DSL design
rule compilation
decision trace
versioning
simulation
explainability
performance

22.2 Build a Workflow Engine from Scratch

Useful for understanding Camunda/Temporal/Zeebe-like systems.

Focus:

process definition model
job queue
timer wheel
persistence
retry
incident
deterministic execution
migration

22.3 Build a Contract Governance Platform

Useful for large organizations.

Focus:

OpenAPI registry
schema registry
compatibility checker
API linting
consumer mapping
breaking-change workflow
documentation portal

22.4 Build a Multi-Tenant SaaS Control Plane

Useful for enterprise CPQ/OMS.

Focus:

tenant provisioning
quota
feature flag
entitlement
data isolation
billing
audit
control plane vs data plane

22.5 Build an Operational Repair Console

Useful for real production systems.

Focus:

stuck order dashboard
repair command authorization
evidence timeline
replay tooling
reconciliation diff
operator workflow
audit export

23. Final Review Questions

Use these questions to test whether you truly understand the system.

Why should an accepted quote use its pricing snapshot instead of recalculating from the current price book?
Why is an idempotency key not enough without a database unique constraint?
Why should Camunda not be the source of truth for order state?
How do outbox and inbox work together?
What makes an event replay-safe?
How does tenant isolation fail in subtle ways?
What is the difference between an audit log and an application log?
How do you handle approval policy changes while an approval is in progress?
Why can retries make reliability worse?
When should a failure become a BPMN incident instead of an automatic retry?
How do you prove that quote acceptance did not create duplicate orders?
How do you recover a stuck order without hiding the original failure?
How do you know whether a Redis outage is safe?
How do you deploy a NOT NULL database column safely?
How do you change an event schema without breaking consumers?
How do you design a Kafka partition key for order events?
How do you know if the order root state contradicts its line states?
What is the minimum evidence needed to defend a pricing dispute?
Why is manual repair a product feature?
What would you remove if the system had to be simplified without losing correctness?

24. Completion Criteria

This series counts as truly complete in practice if you can:

draw the architecture diagram from memory
explain every service boundary
write a safe OpenAPI command endpoint
design an evolvable event schema
write PostgreSQL constraints for critical invariants
write MyBatis mappers for hot path queries
build a quote state machine
build an order line state machine
create a retry-safe BPMN orchestration
implement outbox/inbox
explain Redis failure modes
write contract tests
run integration tests with real dependencies
observe the quote-to-order flow
run a failure drill
repair a stuck order with an audit trail
conduct a production readiness review
write ADRs for major decisions
explain trade-offs without relying on buzzwords

25. Final Summary

A CPQ/OMS platform is a textbook example of a complex business system: many states, many actors, many rules, many integrations, many risks, and many consequences when something goes wrong.

Building a system like this is not about combining Java, PostgreSQL, Kafka, Redis, and Camunda.

Building a system like this is about preserving business truth through:

request retry
stale catalog
price changes
approval conflicts
partial fulfillment
event duplication
consumer lag
cache outage
workflow incidents
bad deployments
operator repair
audit investigation
regulatory review

Technology is the tool. Invariants are the center.

If you can view the system in terms of invariants, lifecycle, transaction boundaries, event facts, operational recovery, and defensible evidence, you are already thinking far beyond CRUD implementation.

That is the goal of this series.

Series Completion Notice

The Learn Java Microservices CPQ OMS Platform series has reached its final part.

This is Part 035, the maximum length set for the series.

Series status: COMPLETE.
