
Concurrency Control and Race Conditions

Learn Enterprise CPQ OMS Camunda 7 - Part 046

Concurrency control and race condition design for a production-grade Java microservices CPQ and OMS platform using PostgreSQL, EclipseLink JPA, Kafka, Redis, and Camunda 7.


Part 046 — Concurrency Control and Race Conditions

Concurrency bugs in CPQ/OMS are expensive because they do not look like normal bugs.

They look like business inconsistency:

  • two users edit the same quote and one change disappears
  • approval is granted for a price that is no longer current
  • quote is accepted twice and creates two orders
  • order is cancelled while fulfillment callback marks it completed
  • inventory reservation is released after being confirmed
  • Kafka event is consumed twice and projection increments wrong counter
  • Camunda external task completes after timeout and another worker already retried
  • Redis lock expires while work is still running
  • optimistic lock retry creates a duplicated side effect

A strong engineer treats concurrency as a domain-design problem, not only a database problem.

The core question is:

what must be true if two valid actions happen close together?


1. Why CPQ/OMS Is Race-Prone

CPQ/OMS has many naturally concurrent actors:

| Actor | Concurrent action |
|---|---|
| Sales agent | edits quote lines |
| Sales manager | approves discount |
| Customer/partner | accepts quote |
| Pricing system | republishes price book |
| Product catalog | publishes new catalog version |
| Order service | creates order |
| Camunda workflow | advances fulfillment step |
| External fulfillment system | sends callback |
| Kafka consumer | updates projection |
| Reconciliation job | repairs stale state |
| Case worker | manually resolves fallout |

Concurrency is not exceptional.

It is the normal operating condition.

Therefore the design must state which actor wins, which action is rejected, which action is retried, and which action creates a new revision.


2. The Three Layers of Concurrency

Concurrency control has three layers.

Do not jump straight to locks.

First define business semantics.

Example:

If quote revision changed after approval, the old approval cannot be used for acceptance.

Only then choose the mechanism:

  • approved_revision_id must equal accepted_revision_id
  • command checks expected revision
  • DB constraint/order creation query enforces it
  • event/audit records the rejection

3. Concurrency Vocabulary

Use precise terms.

| Term | Meaning |
|---|---|
| Lost update | two writers update same state; one silently overwrites the other |
| Double submit | same command executed twice |
| Stale decision | approval/price/config result used after input changed |
| Check-then-act race | code checks a condition, another transaction changes it, then code acts on stale assumption |
| ABA problem | state changes from A → B → A; simple state check misses intervening change |
| Duplicate event | same event processed more than once |
| Out-of-order event | later fact observed before earlier fact |
| Split brain authority | two services both think they own the same truth |
| Lock expiry race | distributed lock expires before protected work completes |
| Retry side effect | retry repeats a non-idempotent action |
| Unknown outcome | caller times out and cannot know whether callee acted |

These are not academic labels.

They are production failure categories.


4. Core Rule: Commands Must Carry Preconditions

A command should not say only:

{
  "discountPercent": 25
}

It should say:

{
  "quoteId": "q-123",
  "expectedQuoteVersion": 17,
  "expectedRevisionId": "qr-004",
  "discountPercent": 25,
  "reasonCode": "STRATEGIC_DEAL"
}

Preconditions make concurrency explicit.

Common CPQ/OMS preconditions:

| Command | Required precondition |
|---|---|
| edit quote line | expected quote version/revision |
| price quote | expected configuration version/catalog version |
| submit for approval | current price result id and config result id |
| approve quote | expected approval task id and revision id |
| accept quote | approved revision id, document artifact id, expected quote version |
| create order | accepted quote revision id, idempotency key |
| cancel order | expected order version/state |
| complete fulfillment step | expected step version/status |
| resolve fallout | expected fallout case version/status |

A command without preconditions is an invitation to race conditions.
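A minimal in-memory sketch of a command that carries its own preconditions, using the fields from the JSON above. `ApplyDiscountCommand`, `Quote`, `PreconditionFailedException` and the `QUOTE_REVISION_CONFLICT` code are hypothetical names; a real service would run the same checks against the persisted aggregate inside its transaction.

```java
import java.util.Objects;

// Hypothetical command: the precondition fields travel with the change itself.
record ApplyDiscountCommand(String quoteId, long expectedQuoteVersion,
                            String expectedRevisionId, int discountPercent,
                            String reasonCode) {}

// Hypothetical exception carrying a conflict code as its message.
class PreconditionFailedException extends RuntimeException {
    PreconditionFailedException(String code) { super(code); }
}

// Minimal in-memory quote aggregate used only for illustration.
class Quote {
    final String id;
    long version;
    String revisionId;
    int discountPercent;

    Quote(String id, long version, String revisionId) {
        this.id = id;
        this.version = version;
        this.revisionId = revisionId;
    }

    // Applies the discount only if the caller's view of the quote is current.
    void applyDiscount(ApplyDiscountCommand cmd) {
        if (!Objects.equals(id, cmd.quoteId())) {
            throw new IllegalArgumentException("QUOTE_ID_MISMATCH");
        }
        if (version != cmd.expectedQuoteVersion()) {
            throw new PreconditionFailedException("QUOTE_VERSION_CONFLICT");
        }
        if (!revisionId.equals(cmd.expectedRevisionId())) {
            throw new PreconditionFailedException("QUOTE_REVISION_CONFLICT"); // hypothetical code
        }
        discountPercent = cmd.discountPercent();
        version++; // every accepted change produces a new version
    }
}
```

A second command still based on version 17 is rejected instead of silently replacing the first change.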


5. Optimistic Locking as Default

For CPQ/OMS business aggregates, optimistic locking should usually be the default.

Why?

Because conflicting edits to the same quote or order are relatively rare, blocking every writer up front costs more than it saves; but when two edits do overlap, a silent overwrite is unacceptable.

JPA optimistic locking uses a version field to detect conflicting concurrent updates. In a typical mapping:

@Entity
@Table(name = "quote")
public class QuoteEntity {

    @Id
    private UUID id;

    @Version
    @Column(name = "version", nullable = false)
    private long version;

    @Enumerated(EnumType.STRING)
    @Column(name = "status", nullable = false)
    private QuoteStatus status;

    // ...
}

The command checks expected version:

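@Transactional
public QuoteDto updateQuote(UUID quoteId, long expectedVersion, UpdateQuoteCommand command) {
    QuoteEntity quote = quoteRepository.findByIdForUpdateIntent(quoteId);

    // Reject early if the caller edited an older version than the one now stored.
    if (quote.getVersion() != expectedVersion) {
        throw new ConflictException("QUOTE_VERSION_CONFLICT");
    }

    quote.apply(command);
    // Flush so the versioned UPDATE runs now: a concurrent commit surfaces here as an
    // OptimisticLockException, and the entity carries its new version before mapping.
    // (entityManager is assumed to be the injected JPA EntityManager.)
    entityManager.flush();
    return mapper.toDto(quote);
}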

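When the UPDATE runs (at flush or commit), JPA puts the version it read into the WHERE clause and increments it. If another transaction committed first, no row matches and the persistence provider reports an OptimisticLockException (frameworks may wrap or translate it).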

Do not hide this from the user.

Return a conflict that says:

  • quote changed
  • reload required
  • current version
  • conflicting fields if safe to expose
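The version check JPA performs can be modelled without a database. In this sketch a `ConcurrentHashMap` stands in for the quote table, and `replace(key, expected, next)` plays the role of `UPDATE ... WHERE id = ? AND version = ?`: it succeeds only if the stored row still equals what the caller read. `QuoteRow` and `QuoteTable` are hypothetical names, not JPA or EclipseLink types.

```java
import java.util.concurrent.ConcurrentHashMap;

// Immutable snapshot of a quote row, including its version column.
record QuoteRow(String id, long version, int termMonths, int discountPercent) {}

// In-memory stand-in for the quote table (not JPA): it only mimics the
// compare-on-version UPDATE that @Version generates.
class QuoteTable {
    private final ConcurrentHashMap<String, QuoteRow> rows = new ConcurrentHashMap<>();

    void insert(QuoteRow row) { rows.put(row.id(), row); }

    QuoteRow load(String id) { return rows.get(id); }

    // Returns true if the row was updated, false on a version conflict.
    // The version inside 'changed' is ignored; the next version is derived from 'expected'.
    boolean updateIfVersionMatches(QuoteRow expected, QuoteRow changed) {
        QuoteRow next = new QuoteRow(changed.id(), expected.version() + 1,
                changed.termMonths(), changed.discountPercent());
        // Atomic compare-and-replace: fails if another writer already replaced the row.
        return rows.replace(expected.id(), expected, next);
    }
}
```

Replaying the Alice/Bob timeline from section 8 against this model: Alice's save moves the row to version 18, and Bob's save, still based on version 17, returns `false` instead of overwriting her term change.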

Bad Retry Pattern

Do not blindly retry user-edit commands after optimistic lock failure.

Bad:

user edit fails optimistic lock
system reloads latest state
system reapplies user edit automatically
system saves

This can silently override another person’s decision.

Retry is acceptable for technical commands whose operation is known to be commutative/idempotent.

Retry is dangerous for semantic edits.


6. Pessimistic Locking: Use Narrowly

Pessimistic locking means blocking other writers earlier.

Use it only when business semantics require exclusivity or conflict cost is too high.

Examples:

| Case | Possible mechanism |
|---|---|
| allocate unique sequence-like business number | DB sequence/unique constraint, not application lock |
| claim outbox batch | FOR UPDATE SKIP LOCKED style claim |
| assign fallout case to one worker | row update with status/owner precondition |
| reserve scarce local resource | authority service transaction |
| process idempotency record | unique key + status transition |
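For assigning a fallout case to one worker, a hypothetical in-memory sketch of a claim guarded by a status and version precondition. `AtomicReference.compareAndSet` stands in for a conditional `UPDATE fallout_case ... WHERE id = ? AND status = 'OPEN' AND version = ?` that affects zero or one rows; `FalloutCase` and `FalloutCaseRow` are illustrative names. Because `compareAndSet` compares references, callers must pass the snapshot they obtained from `read()`.

```java
import java.util.concurrent.atomic.AtomicReference;

// Immutable snapshot of a fallout case row.
record FalloutCase(String id, String status, String owner, long version) {}

class FalloutCaseRow {
    private final AtomicReference<FalloutCase> current;

    FalloutCaseRow(FalloutCase initial) { current = new AtomicReference<>(initial); }

    FalloutCase read() { return current.get(); }

    // Succeeds only if the case is still OPEN and unchanged since 'seen' was read,
    // so two workers racing for the same case cannot both become owner.
    boolean claim(FalloutCase seen, String workerId) {
        if (!"OPEN".equals(seen.status())) return false;
        FalloutCase claimed = new FalloutCase(seen.id(), "ASSIGNED", workerId, seen.version() + 1);
        return current.compareAndSet(seen, claimed);
    }
}
```

The lock, if any, lasts only as long as the conditional update; no lock is held while the worker investigates the case.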

Do not use pessimistic locks as a general solution for quote editing.

A quote can be open for minutes or hours. Database locks must not live that long.

Lock Window Rule

A lock should protect a short critical section, not a human session.

Bad:

user opens quote
system locks quote
user goes to lunch
all other users blocked

Better:

user opens quote
system shows version 17
user submits command with expected version 17
system rejects if quote changed

7. PostgreSQL Constraints Beat Application Hope

If a business invariant must never be violated, put it in the database when possible.

Application checks are necessary.

They are not sufficient.

Example: One Order Per Accepted Quote Revision

CREATE UNIQUE INDEX uq_order_quote_revision_primary
ON sales_order (quote_revision_id)
WHERE order_kind = 'PRIMARY';

Application logic may check first:

SELECT id FROM sales_order WHERE quote_revision_id = :revisionId;

But the unique index is what prevents a race between two submitters.

Example: Idempotency Key

CREATE TABLE idempotency_record (
  tenant_id uuid NOT NULL,
  idempotency_key text NOT NULL,
  command_name text NOT NULL,
  request_hash text NOT NULL,
  status text NOT NULL,
  response_ref text,
  created_at timestamptz NOT NULL,
  updated_at timestamptz NOT NULL,
  PRIMARY KEY (tenant_id, command_name, idempotency_key)
);

The primary key is the concurrency control.

Not a SELECT before INSERT.

Use insert-first semantics:

INSERT INTO idempotency_record (...)
VALUES (...)
ON CONFLICT DO NOTHING;

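Then check the affected row count. One inserted row means this request owns execution; zero means a record already exists, so load it and return the stored result (same request hash) or a conflict (different request hash).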
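A sketch of insert-first idempotency under stated assumptions: `ConcurrentHashMap.putIfAbsent` models the primary key plus `ON CONFLICT DO NOTHING`, so exactly one caller per key becomes the owner. The `IN_PROGRESS` outcome (a retry that arrives while the first request is still running) and all class names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

enum ClaimOutcome { OWNER, REPLAY, IN_PROGRESS, HASH_CONFLICT }

// Mirrors the primary key (tenant_id, command_name, idempotency_key).
record IdemKey(String tenantId, String commandName, String idempotencyKey) {}

// Mirrors request_hash, status and response_ref.
final class IdemRecord {
    final String requestHash;
    volatile String responseRef;
    volatile String status = "IN_PROGRESS";

    IdemRecord(String requestHash) { this.requestHash = requestHash; }
}

class IdempotencyStore {
    private final ConcurrentHashMap<IdemKey, IdemRecord> records = new ConcurrentHashMap<>();

    // Insert first, then inspect: no SELECT-before-INSERT window exists.
    ClaimOutcome claim(IdemKey key, String requestHash) {
        IdemRecord existing = records.putIfAbsent(key, new IdemRecord(requestHash));
        if (existing == null) return ClaimOutcome.OWNER;
        if (!existing.requestHash.equals(requestHash)) return ClaimOutcome.HASH_CONFLICT;
        return "COMPLETED".equals(existing.status) ? ClaimOutcome.REPLAY : ClaimOutcome.IN_PROGRESS;
    }

    // responseRef is written before status, so a reader that sees COMPLETED also sees the result.
    void complete(IdemKey key, String responseRef) {
        IdemRecord r = records.get(key);
        r.responseRef = responseRef;
        r.status = "COMPLETED";
    }

    String storedResponse(IdemKey key) { return records.get(key).responseRef; }
}
```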


8. Quote Edit Race

Scenario:

T1: Alice loads quote version 17
T2: Bob loads quote version 17
T3: Alice changes term from 12 months to 24 months
T4: Bob adds discount 20%
T5: Alice saves version 18
T6: Bob saves based on version 17

Without concurrency control, Bob may overwrite Alice.

Correct behavior:

Bob's command is rejected with QUOTE_VERSION_CONFLICT
Bob reloads version 18
Bob decides whether discount still applies

(Figure not reproduced: quote edit race state diagram.)

API Shape

PATCH /quotes/q-123
If-Match: "17"
Idempotency-Key: edit-q123-abc

Conflict response:

{
  "type": "https://errors.example.com/quote-version-conflict",
  "title": "Quote has changed",
  "status": 409,
  "code": "QUOTE_VERSION_CONFLICT",
  "detail": "The quote was modified after this workspace was loaded.",
  "quoteId": "q-123",
  "expectedVersion": 17,
  "currentVersion": 18,
  "safeUserAction": "RELOAD_AND_REAPPLY"
}

9. Price Staleness Race

Scenario:

T1: Quote priced using catalog v2026.07 and price book p-44
T2: New price book p-45 is published
T3: User accepts quote based on p-44

The system must know whether old price is still valid.

Possible policies:

| Policy | Meaning |
|---|---|
| valid until quote expiry | old price result remains acceptable |
| reprice required on publish | quote must be repriced before submit/accept |
| reprice required on material change | only certain price book changes invalidate |
| approval required if stale | accept allowed only with authority |

Do not bury this in code.

Model it explicitly.

Quote price result must include:

{
  "priceResultId": "pr-991",
  "priceBookVersion": "p-44",
  "catalogVersion": "2026.07",
  "pricedAt": "2026-07-02T10:00:00+07:00",
  "validUntil": "2026-07-09T23:59:59+07:00",
  "stalenessPolicy": "VALID_UNTIL_QUOTE_EXPIRY",
  "inputHash": "sha256:..."
}

Acceptance guard:

accept quote only if:
  quote.revision_id == approved_revision_id
  price_result_id == approved_price_result_id
  price_result still acceptable under policy
  document artifact matches same revision
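The acceptance guard above, written as a hypothetical Java check. It assumes a simple time-based staleness policy (a price result is acceptable until its `validUntil`); the other policies in the table would replace that single comparison. Error codes other than `STALE_APPROVAL` are illustrative.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

record PriceResult(String priceResultId, Instant validUntil) {}
record ApprovalDecision(String quoteRevisionId, String priceResultId, String decision) {}
record AcceptRequest(String quoteRevisionId, String priceResultId, String documentRevisionId) {}

class AcceptanceGuard {
    // Returns every reason acceptance must be rejected; an empty list means allowed.
    static List<String> violations(String currentRevisionId, AcceptRequest req,
                                   ApprovalDecision approval, PriceResult price, Instant now) {
        List<String> errors = new ArrayList<>();
        if (!currentRevisionId.equals(req.quoteRevisionId())) errors.add("STALE_REVISION");
        // The approval must be for exactly this revision, not just "some approval".
        if (!"APPROVED".equals(approval.decision())
                || !approval.quoteRevisionId().equals(req.quoteRevisionId())) errors.add("STALE_APPROVAL");
        if (!approval.priceResultId().equals(req.priceResultId())
                || !price.priceResultId().equals(req.priceResultId())) errors.add("STALE_PRICE");
        if (now.isAfter(price.validUntil())) errors.add("PRICE_EXPIRED");
        if (!req.documentRevisionId().equals(req.quoteRevisionId())) errors.add("DOCUMENT_MISMATCH");
        return errors;
    }
}
```

Returning all violations rather than the first one gives the user (and the audit record) the full reason for the rejection.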

10. Approval Staleness Race

Scenario:

T1: Quote revision 4 priced at IDR 100M with 20% discount
T2: Manager approves revision 4
T3: Sales changes line quantity, revision 5
T4: Sales attempts to accept quote using approval from revision 4

Correct behavior:

approval for revision 4 cannot authorize revision 5.

Approval decision should bind to immutable facts:

{
  "approvalDecisionId": "ap-123",
  "quoteId": "q-123",
  "quoteRevisionId": "qr-004",
  "priceResultId": "pr-991",
  "approvalRequirementId": "req-456",
  "decision": "APPROVED",
  "approvedBy": "manager-17",
  "authoritySnapshot": {
    "role": "REGIONAL_SALES_MANAGER",
    "maxDiscountPercent": 25,
    "region": "ID-JAVA"
  }
}

Acceptance must check the same IDs.

Never model approval as quote.approved = true.

That boolean is not enough evidence.


11. Double Submit Race

Scenario:

T1: User clicks Accept Quote
T2: Browser retries due to network timeout
T3: User clicks again
T4: Partner integration also submits same accept command

Expected result:

  • one order is created
  • repeated same command returns same result
  • different command with same idempotency key is rejected
  • duplicate order is impossible by database constraint

(Figure not reproduced: idempotency flow.)

Database Protection

Use both:

  1. idempotency record
  2. unique order constraint

Why both?

The idempotency record handles API retry semantics.

The unique order constraint protects business invariant even if the idempotency layer has a bug.
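A hypothetical in-memory model of the two layers working together. One map stands in for the idempotency record, another for the partial unique index on `quote_revision_id`; `ConcurrentHashMap.computeIfAbsent` runs atomically per key, so each map admits one winner. The last call in the usage below uses a different idempotency key (for example the partner integration from the scenario) and still gets the existing order, because the second layer does not depend on the first.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class OrderCreation {
    // Stand-in for idempotency_record: idempotency key -> order id.
    private final ConcurrentHashMap<String, String> byIdempotencyKey = new ConcurrentHashMap<>();
    // Stand-in for the partial unique index on sales_order(quote_revision_id).
    private final ConcurrentHashMap<String, String> primaryOrderByRevision = new ConcurrentHashMap<>();
    private final AtomicInteger sequence = new AtomicInteger();

    // Layer 1: repeats of the same request (same key) get the same answer.
    String acceptQuote(String quoteRevisionId, String idempotencyKey) {
        return byIdempotencyKey.computeIfAbsent(idempotencyKey,
                k -> createPrimaryOrder(quoteRevisionId));
    }

    // Layer 2: whatever the key, one accepted revision yields one primary order.
    private String createPrimaryOrder(String quoteRevisionId) {
        return primaryOrderByRevision.computeIfAbsent(quoteRevisionId,
                r -> "ord-" + sequence.incrementAndGet());
    }

    int orderCount() { return primaryOrderByRevision.size(); }
}
```

```java
OrderCreation svc = new OrderCreation();
String a = svc.acceptQuote("qr-004", "accept-browser-1");
String b = svc.acceptQuote("qr-004", "accept-browser-1"); // browser retry
String c = svc.acceptQuote("qr-004", "accept-partner-9"); // partner, different key
// a, b and c are the same order id, and orderCount() is 1
```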


12. Order Cancel vs Fulfillment Complete Race

Scenario:

T1: Customer requests cancellation
T2: System sends cancel to fulfillment provider
T3: Provider sends complete callback for original fulfillment step
T4: Provider sends cancellation confirmed callback

Which state wins?

Wrong answer:

whichever event arrives last.

Correct answer:

state transition rules decide.

(Figure not reproduced: example fulfillment step states.)

If completion arrives after the cancel request, it may still be valid: the external system may have completed the step before the cancellation took effect.

The order service must record facts:

  • cancel requested at
  • external complete received at
  • external event id
  • external event timestamp if trustworthy
  • internal received timestamp
  • final transition reason

Do not delete the contradiction.

Explain it.
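One possible set of transition rules for this race, sketched in Java. The states, facts and the `COMPLETED_BEFORE_CANCEL_TOOK_EFFECT` reason are hypothetical; the point is that the outcome comes from the transition table rather than from arrival order, and a fact the table cannot accept is recorded for a fallout case instead of being dropped.

```java
import java.util.ArrayList;
import java.util.List;

enum StepStatus { PENDING, IN_PROGRESS, CANCEL_REQUESTED, COMPLETED, CANCELLED }
enum StepFact { CANCEL_REQUESTED, PROVIDER_COMPLETED, PROVIDER_CANCEL_CONFIRMED }

record Transition(StepStatus from, StepStatus to, StepFact fact, String reason) {}

class FulfillmentStep {
    StepStatus status = StepStatus.IN_PROGRESS;
    final List<Transition> history = new ArrayList<>(); // facts are kept, never deleted
    final List<String> fallout = new ArrayList<>();

    void apply(StepFact fact) {
        StepStatus from = status;
        StepStatus to = switch (fact) {
            case CANCEL_REQUESTED ->
                (from == StepStatus.PENDING || from == StepStatus.IN_PROGRESS) ? StepStatus.CANCEL_REQUESTED : null;
            // Completion is still accepted after a cancel request: the provider
            // may have finished before the cancellation took effect.
            case PROVIDER_COMPLETED ->
                (from == StepStatus.IN_PROGRESS || from == StepStatus.CANCEL_REQUESTED) ? StepStatus.COMPLETED : null;
            case PROVIDER_CANCEL_CONFIRMED ->
                from == StepStatus.CANCEL_REQUESTED ? StepStatus.CANCELLED : null;
        };
        if (to == null) {
            // The contradiction is explained in a fallout record, not silently ignored.
            fallout.add(fact + " received while " + from);
            return;
        }
        String reason = (fact == StepFact.PROVIDER_COMPLETED && from == StepStatus.CANCEL_REQUESTED)
                ? "COMPLETED_BEFORE_CANCEL_TOOK_EFFECT" : fact.name();
        history.add(new Transition(from, to, fact, reason));
        status = to;
    }
}
```

Replaying T1, T3 and T4 from the scenario leaves the step `COMPLETED`, with the late cancellation confirmation recorded as fallout that a compensation path can act on.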


13. External Callback Race

External callbacks can be:

  • duplicate
  • delayed
  • out of order
  • contradictory
  • missing
  • retried after timeout

Every callback must have an idempotency key or deduplication identity.

Example:

CREATE TABLE external_callback_record (
  tenant_id uuid NOT NULL,
  provider_name text NOT NULL,
  external_event_id text NOT NULL,
  received_at timestamptz NOT NULL,
  payload_hash text NOT NULL,
  processed_status text NOT NULL,
  PRIMARY KEY (tenant_id, provider_name, external_event_id)
);

Processing flow:

insert callback record
if duplicate same hash -> return success
if duplicate different hash -> raise provider conflict
load fulfillment step
apply state transition if allowed
if transition impossible -> create fallout case
emit event/outbox

Never process callback side effects before deduplication.
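The processing flow above as a single-threaded, in-memory sketch. `putIfAbsent` models inserting into `external_callback_record`; the `stepStatus` field and the `sideEffects` counter stand in for the fulfillment step row and the outbox. In a real handler all three writes would share one database transaction. Class and enum names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;

enum CallbackResult { PROCESSED, DUPLICATE_IGNORED, PROVIDER_CONFLICT, FALLOUT }

// Mirrors the primary key (tenant_id, provider_name, external_event_id).
record CallbackKey(String tenantId, String providerName, String externalEventId) {}

class CallbackProcessor {
    private final ConcurrentHashMap<CallbackKey, String> payloadHashByKey = new ConcurrentHashMap<>();
    String stepStatus = "IN_PROGRESS";
    int sideEffects = 0; // number of events "emitted" to the outbox

    CallbackResult handleCompleted(CallbackKey key, String payloadHash) {
        // 1. Deduplicate before anything else happens.
        String previous = payloadHashByKey.putIfAbsent(key, payloadHash);
        if (previous != null) {
            return previous.equals(payloadHash) ? CallbackResult.DUPLICATE_IGNORED
                                                : CallbackResult.PROVIDER_CONFLICT;
        }
        // 2. Apply the transition only if the current state allows it.
        if (!stepStatus.equals("IN_PROGRESS")) return CallbackResult.FALLOUT;
        stepStatus = "COMPLETED";
        // 3. Emit the event (an outbox row in the same transaction in a real system).
        sideEffects++;
        return CallbackResult.PROCESSED;
    }
}
```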


14. Kafka Consumer Race

Kafka consumers are at-least-once in many practical designs.

Therefore consumers must be idempotent.

Inbox Table

CREATE TABLE consumer_inbox (
  consumer_name text NOT NULL,
  event_id uuid NOT NULL,
  processed_at timestamptz NOT NULL,
  event_type text NOT NULL,
  aggregate_id uuid NOT NULL,
  PRIMARY KEY (consumer_name, event_id)
);

Consumer transaction:

begin
  insert into consumer_inbox
  if duplicate -> commit and skip
  apply projection update
commit

Projection Race

Bad projection update:

UPDATE quote_search
SET approval_count = approval_count + 1
WHERE quote_id = :quoteId;

If duplicate event arrives, count is wrong.

Better:

  • store decision rows keyed by decision id
  • derive count from unique facts
  • use upsert with event identity

INSERT INTO quote_approval_projection_decision (
  quote_id,
  approval_decision_id,
  decision,
  decided_at
)
VALUES (...)
ON CONFLICT (approval_decision_id) DO NOTHING;

Then update aggregate projection from facts or use deterministic merge.
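A hypothetical in-memory sketch combining the inbox check with a fact-based projection. The inbox set models `consumer_inbox`, the map keyed by `approval_decision_id` models the `ON CONFLICT DO NOTHING` insert, and the count is derived from stored facts instead of being incremented. The second guard matters when the same decision is re-published under a new event id, which the inbox alone would not catch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

record ApprovalDecided(UUID eventId, String quoteId, String approvalDecisionId, String decision) {}

class ApprovalProjectionConsumer {
    // consumer_inbox stand-in: event ids this consumer has already processed.
    private final Set<UUID> inbox = new HashSet<>();
    // quote_approval_projection_decision stand-in, keyed by approval_decision_id.
    private final Map<String, ApprovalDecided> decisions = new HashMap<>();

    // In a real consumer both writes share one database transaction.
    void onEvent(ApprovalDecided e) {
        if (!inbox.add(e.eventId())) return;              // duplicate delivery: skip
        decisions.putIfAbsent(e.approvalDecisionId(), e); // ON CONFLICT DO NOTHING
    }

    // Derived from unique facts, so replays cannot inflate it.
    long approvalCount(String quoteId) {
        return decisions.values().stream()
                .filter(d -> d.quoteId().equals(quoteId) && d.decision().equals("APPROVED"))
                .count();
    }
}
```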


15. Out-of-Order Event Race

Scenario:

OrderCreated version 1
OrderCancelled version 3
OrderSubmitted version 2

Consumer sees version 3 before version 2.

Options:

| Strategy | Use when |
|---|---|
| version-aware ignore older | projection only needs latest state |
| buffer missing versions | strict sequence needed |
| rebuild from authority | projection can query source |
| event-sourced reducer | complete ordered stream per aggregate |

For most CPQ/OMS projections, a version-aware merge is practical:

UPDATE order_search_projection
SET status = :eventStatus,
    aggregate_version = :eventVersion,
    updated_at = now()
WHERE order_id = :orderId
  AND aggregate_version < :eventVersion;

This prevents older events from moving projection backward.

But it does not recover missing details if version 2 carried data needed by version 3.

Design events accordingly.
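The version-aware merge, sketched in memory. `ConcurrentHashMap.compute` applies the change atomically per order and keeps the current row when the event is not newer. Unlike the SQL above, this sketch also inserts a row for the first event it sees; `OrderSearchProjection` and `OrderProjectionRow` are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;

record OrderProjectionRow(String orderId, String status, long aggregateVersion) {}

class OrderSearchProjection {
    private final ConcurrentHashMap<String, OrderProjectionRow> rows = new ConcurrentHashMap<>();

    // Mirrors "WHERE aggregate_version < :eventVersion": an older or equal
    // version leaves the row untouched, so the projection never moves backward.
    void apply(String orderId, String status, long eventVersion) {
        rows.compute(orderId, (id, current) ->
                (current == null || current.aggregateVersion() < eventVersion)
                        ? new OrderProjectionRow(id, status, eventVersion)
                        : current);
    }

    OrderProjectionRow find(String orderId) { return rows.get(orderId); }
}
```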


16. Camunda External Task Race

External task worker pattern introduces its own races.

Scenario:

T1: worker A locks task for 30s
T2: worker A calls external provider, slow response
T3: lock expires
T4: worker B locks same task
T5: both workers attempt side effect

The external side effect must be idempotent.

Camunda lock is not the final business protection.

Design worker command:

{
  "orderId": "ord-123",
  "fulfillmentStepId": "fs-88",
  "workflowTaskId": "camunda-task-xyz",
  "attemptId": "attempt-001",
  "idempotencyKey": "fulfillment:fs-88:reserve-inventory"
}

Domain service protects:

execute fulfillment step only if:
  step status is PENDING or RETRYABLE
  command name not already completed for step
  expected step version matches

External provider receives stable idempotency key if supported.

If not supported, reconciliation must handle unknown outcome.
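The domain-side guard, sketched as a hypothetical in-memory class. It does not touch the Camunda API; it only shows why the second worker in the timeline is refused: the step is no longer `PENDING` or `RETRYABLE`, and once completed, its idempotency key is remembered. The `IN_PROGRESS` status is an assumption, and a real service would implement `tryStart` as a conditional `UPDATE` on the step row.

```java
import java.util.HashSet;
import java.util.Set;

enum ExecStatus { PENDING, RETRYABLE, IN_PROGRESS, COMPLETED }

class StepExecutionGuard {
    ExecStatus status = ExecStatus.PENDING;
    long version = 1;
    final Set<String> completedIdempotencyKeys = new HashSet<>();

    // Single-threaded model: returns true only for the worker allowed to act.
    boolean tryStart(String idempotencyKey, long expectedVersion) {
        if (completedIdempotencyKeys.contains(idempotencyKey)) return false; // already done
        if (status != ExecStatus.PENDING && status != ExecStatus.RETRYABLE) return false;
        if (version != expectedVersion) return false;                         // stale worker
        status = ExecStatus.IN_PROGRESS;
        version++;
        return true;
    }

    void complete(String idempotencyKey) {
        completedIdempotencyKeys.add(idempotencyKey);
        status = ExecStatus.COMPLETED;
        version++;
    }
}
```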


17. Camunda Process Start Race

Scenario:

T1: order service commits order
T2: starts Camunda process
T3: timeout occurs before response
T4: retry starts another process

Use a deterministic business key and process start idempotency strategy.

Options:

  1. store workflow correlation record in order DB
  2. use unique business key discipline at application level
  3. reconcile orders with missing workflow
  4. do not assume timeout means process did not start

Correlation table:

CREATE TABLE workflow_correlation (
  tenant_id uuid NOT NULL,
  aggregate_type text NOT NULL,
  aggregate_id uuid NOT NULL,
  workflow_name text NOT NULL,
  business_key text NOT NULL,
  process_instance_id text,
  status text NOT NULL,
  created_at timestamptz NOT NULL,
  updated_at timestamptz NOT NULL,
  PRIMARY KEY (tenant_id, aggregate_type, aggregate_id, workflow_name)
);

If process start times out:

  • mark correlation START_UNKNOWN
  • reconciliation queries Camunda by business key or process variables
  • if found, attach process instance id
  • if not found after safe interval, retry start

Unknown outcome is a state, not an exception log.
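A hypothetical sketch of the `START_UNKNOWN` reconciliation step. The `lookup` and `restart` functions are placeholders for "query Camunda by business key" and "start the process again"; no real Camunda client call is implied. The decision logic is what is being illustrated.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;
import java.util.function.Function;

enum CorrelationStatus { STARTING, STARTED, START_UNKNOWN }

class WorkflowCorrelation {
    final String businessKey;
    CorrelationStatus status = CorrelationStatus.STARTING;
    String processInstanceId;
    Instant unknownSince;

    WorkflowCorrelation(String businessKey) { this.businessKey = businessKey; }

    // Called when the start request timed out: record the ambiguity as state.
    void markStartUnknown(Instant now) {
        status = CorrelationStatus.START_UNKNOWN;
        unknownSince = now;
    }

    void reconcile(Function<String, Optional<String>> lookup,
                   Function<String, String> restart,
                   Instant now, Duration safeInterval) {
        if (status != CorrelationStatus.START_UNKNOWN) return;
        Optional<String> found = lookup.apply(businessKey);
        if (found.isPresent()) {
            processInstanceId = found.get();            // the timed-out start did happen
            status = CorrelationStatus.STARTED;
        } else if (!now.isBefore(unknownSince.plus(safeInterval))) {
            processInstanceId = restart.apply(businessKey); // safe interval passed: retry
            status = CorrelationStatus.STARTED;
        }
        // Otherwise stay START_UNKNOWN and check again on the next run.
    }
}
```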


18. Redis Lock Race

Redis locks are often overused.

The dangerous pattern:

acquire Redis lock TTL 30s
perform work 45s
lock expires
another worker acquires lock
both perform work

A Redis lock can reduce duplicate work.

It should not be the only correctness mechanism for money/order/approval.

For critical commands, use database constraints and idempotency records.

Redis lock is acceptable for:

  • cache rebuild stampede prevention
  • low-value duplicate background work reduction
  • short-lived local coordination where correctness is still enforced elsewhere

It is not acceptable as sole protection for:

  • one order per quote
  • one approval decision per task
  • inventory reservation authority
  • payment capture
  • contract signing

19. ABA Problem in CPQ/OMS

ABA means state returns to the same visible value, hiding intermediate changes.

Scenario:

Quote status = DRAFT
User loads quote
Quote becomes SUBMITTED
Approval rejected
Quote returns to DRAFT
User saves old draft command because status is DRAFT again

If command checks only status, it passes incorrectly.

Use version/revision, not status alone.

Bad precondition:

status == DRAFT

Good precondition:

status == DRAFT
and quote_version == expectedVersion
and quote_revision_id == expectedRevisionId

State is not enough.

History matters.
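The two preconditions side by side, as a hypothetical Java sketch. The snapshot the user loaded and the current quote both have status `DRAFT`, but the version moved on during the submit and reject cycle, so only the second check catches the stale command.

```java
// Minimal view of the quote fields the precondition needs.
record QuoteSnapshot(String status, long version, String revisionId) {}

class DraftEditPrecondition {
    // Bad: cannot tell "DRAFT now" from "DRAFT when the user loaded it".
    static boolean statusOnly(QuoteSnapshot current) {
        return current.status().equals("DRAFT");
    }

    // Good: status plus the version and revision the user actually saw.
    static boolean statusVersionAndRevision(QuoteSnapshot current, QuoteSnapshot loaded) {
        return current.status().equals("DRAFT")
                && current.version() == loaded.version()
                && current.revisionId().equals(loaded.revisionId());
    }
}
```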


20. Check-Then-Act Race

Bad:

if (!orderRepository.existsForQuoteRevision(revisionId)) {
    orderRepository.createOrder(revisionId);
}

Two requests can both pass the check.

Better:

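try {
    // Run this insert in its own transaction (or behind a savepoint): in PostgreSQL a
    // failed INSERT aborts the surrounding transaction, so the lookup below would fail.
    orderRepository.createOrderWithUniqueQuoteRevision(revisionId);
} catch (UniqueConstraintViolation e) { // placeholder for the persistence layer's duplicate-key exception
    // Another request won the race; return its order instead of creating a second one.
    return orderRepository.findByQuoteRevision(revisionId);
}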

Or use explicit idempotency flow.

The rule:

do not separate eligibility check from state change unless storage enforces the invariant.


21. Retry Rules

Retries are dangerous unless classified.

| Operation | Retry? | Why |
|---|---|---|
| read catalog | yes | no side effect |
| price preview | yes if deterministic | no committed side effect, but watch CPU |
| save quote edit | usually no automatic semantic retry | may overwrite human edit |
| create order | yes with idempotency key | must return same result |
| start workflow | yes with correlation/unknown outcome handling | timeout ambiguity |
| send notification | yes with delivery idempotency | duplicate communication risk |
| external reservation | yes only with provider idempotency/reconciliation | unknown outcome |
| payment capture | yes only with provider idempotency | financial duplicate risk |

Retry must include:

  • max attempts
  • backoff
  • idempotency identity
  • timeout budget
  • error classification
  • audit/failure record

No infinite retry loops.

No retry without ownership of side-effect semantics.
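A minimal retry helper that follows the checklist above, under stated assumptions: only exceptions of the hypothetical type `RetryableFailure` are retried, every attempt reuses the same idempotency key, the backoff doubles, and the loop stops at `maxAttempts` or when the next backoff would exceed the time budget. The sleeper is injected so the behaviour can be tested without waiting.

```java
import java.time.Duration;
import java.util.Objects;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical marker for failures classified as safe to retry (e.g. timeouts).
class RetryableFailure extends RuntimeException {
    RetryableFailure(String message) { super(message); }
}

final class Retry {
    private Retry() {}

    static <T> T withIdempotency(String idempotencyKey, int maxAttempts,
                                 Duration initialBackoff, Duration timeBudget,
                                 Consumer<Duration> sleeper,
                                 Function<String, T> operation) {
        Objects.requireNonNull(idempotencyKey, "no retry without an idempotency identity");
        Duration backoff = initialBackoff;
        Duration spent = Duration.ZERO;
        for (int attempt = 1; ; attempt++) {
            try {
                // Every attempt carries the same key, so the callee can deduplicate.
                return operation.apply(idempotencyKey);
            } catch (RetryableFailure e) { // any other exception propagates at once
                boolean outOfAttempts = attempt >= maxAttempts;
                boolean outOfBudget = spent.plus(backoff).compareTo(timeBudget) > 0;
                if (outOfAttempts || outOfBudget) {
                    throw e; // the caller records the failure (audit / fallout)
                }
                sleeper.accept(backoff);
                spent = spent.plus(backoff);
                backoff = backoff.multipliedBy(2); // exponential backoff
            }
        }
    }
}
```

In production the sleeper would call `Thread.sleep` and restore the interrupt flag on `InterruptedException`; there is deliberately no unbounded loop.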


22. Transaction Boundary Rules

Do not hold DB transactions while doing remote work.

Bad:

begin transaction
  update order status
  call inventory service
  call payment service
  start Camunda process
commit

Better:

begin transaction
  validate command
  update order state
  write audit
  write outbox/workflow command
commit

async worker/process performs remote work with idempotency

Transaction should protect local truth.

Workflow/events coordinate external side effects.

Short Transaction Pattern

@Transactional
public OrderCreatedResult createOrder(CreateOrderCommand command) {
    IdempotencyDecision idem = idempotency.claim(command.identity());
    if (idem.isReplay()) return idem.replay();

    QuoteSnapshot quote = quoteRepository.loadAcceptedRevision(command.quoteRevisionId());
    OrderEntity order = OrderEntity.fromAcceptedQuote(quote, command);

    orderRepository.insert(order);
    audit.record(OrderAudit.created(order));
    outbox.record(OrderEvents.created(order));
    workflowOutbox.record(StartOrderWorkflow.forOrder(order));

    idempotency.complete(command.identity(), order.id());
    return new OrderCreatedResult(order.id());
}

Remote calls happen after commit.
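The "after commit" half can be sketched as an outbox relay. This in-memory version is hypothetical: it publishes unpublished entries and marks each one only after the publish call returns, so a failure leaves the entry for the next run. That makes delivery at-least-once, which is why the consumers described in section 14 deduplicate by event id. With several relay instances, claiming entries would use the `FOR UPDATE SKIP LOCKED` pattern listed in section 6.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class OutboxEntry {
    final String eventId;
    final String payload;
    boolean published;

    OutboxEntry(String eventId, String payload) {
        this.eventId = eventId;
        this.payload = payload;
    }
}

class OutboxRelay {
    final List<OutboxEntry> outbox = new ArrayList<>();

    // Publishes pending entries in insertion order and returns how many were sent.
    // An entry is marked only after publish succeeds; a crash between the two
    // steps re-publishes it later (at-least-once delivery).
    int relayOnce(Consumer<OutboxEntry> publisher) {
        int sent = 0;
        for (OutboxEntry e : outbox) {
            if (e.published) continue;
            publisher.accept(e); // e.g. a Kafka send in a real relay
            e.published = true;
            sent++;
        }
        return sent;
    }
}
```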


23. Aggregate Design and Hot Rows

A common performance/concurrency bug is over-centralizing state.

Bad design:

order header row updated every time any fulfillment step changes

If an order has many parallel steps, the header becomes a hot row.

Better:

  • fulfillment steps have their own rows and versions
  • order header updates only on meaningful aggregate state transition
  • derived progress is computed/projected
  • state transition function decides when order status changes

Example:

UPDATE fulfillment_step
SET status = :newStatus,
    version = version + 1
WHERE id = :stepId
  AND version = :expectedVersion;

Then evaluate whether order status can move:

UPDATE sales_order
SET status = 'FULFILLED', version = version + 1
WHERE id = :orderId
  AND status = 'IN_PROGRESS'
  AND NOT EXISTS (
    SELECT 1 FROM fulfillment_step
    WHERE order_id = :orderId
      AND status NOT IN ('COMPLETED', 'SKIPPED')
  );

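This update cannot mark the order FULFILLED while any step is still open, because the condition is evaluated by the UPDATE itself. It can, however, miss a transition: under PostgreSQL's default READ COMMITTED isolation, two steps completing concurrently may each see the other step as still open (its change is not yet committed), so neither transaction moves the order. Serialize step completions per order (for example by locking the order row with SELECT ... FOR UPDATE before updating the step) or let a reconciliation job re-evaluate orders left in IN_PROGRESS.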


24. Concurrency Control Matrix

| Race | Primary protection | Secondary protection |
|---|---|---|
| quote edit lost update | expected version + JPA @Version | transition log |
| stale price acceptance | price result id/revision check | acceptance guard + audit |
| stale approval | approval bound to revision/price result | Camunda task completion validation |
| double order creation | idempotency key | unique index on quote revision |
| duplicate external callback | callback dedup table | state transition guard |
| duplicate Kafka event | inbox table | idempotent projection merge |
| out-of-order Kafka event | aggregate version | rebuild/reconciliation |
| external task duplicate execution | domain step idempotency | provider idempotency key |
| workflow start timeout | workflow correlation record | reconciliation by business key |
| Redis lock expiry | DB invariant | idempotency record |
| manual fallout conflict | fallout case version | assignment/authority guard |
| order cancel vs complete | transition matrix | compensation path |

This matrix should exist in architecture docs.

If nobody can explain the protection for each race, the system is not production-grade.


25. Testing Race Conditions

Race conditions must be tested intentionally.

Unit tests are not enough.

Example: Double Submit Test

@Test
void acceptingSameQuoteTwiceCreatesOnlyOneOrder() throws Exception {
    AcceptedQuote quote = fixture.acceptedQuote();

    ExecutorService pool = Executors.newFixedThreadPool(2);
    CountDownLatch start = new CountDownLatch(1);

    Callable<OrderResult> task = () -> {
        start.await();
        return quoteApi.acceptQuote(
            quote.id(),
            quote.revisionId(),
            "same-idempotency-key"
        );
    };

    Future<OrderResult> f1 = pool.submit(task);
    Future<OrderResult> f2 = pool.submit(task);
    start.countDown();

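    // Bounded waits so a deadlock fails the test instead of hanging it.
    OrderResult r1 = f1.get(10, TimeUnit.SECONDS);
    OrderResult r2 = f2.get(10, TimeUnit.SECONDS);
    pool.shutdown();

    assertThat(r1.orderId()).isEqualTo(r2.orderId());
    assertThat(orderRepository.countByQuoteRevision(quote.revisionId())).isEqualTo(1);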
}

Example: Stale Approval Test

@Test
void approvalCannotBeUsedAfterQuoteRevisionChanges() {
    Quote quote = fixture.quotePricedAndSubmitted();
    Approval approval = approvalService.approve(quote.currentRevisionId());

    quoteService.editLine(quote.id(), quote.version(), editQuantity());

    assertThatThrownBy(() ->
        quoteService.acceptQuote(
            quote.id(),
            approval.quoteRevisionId(),
            approval.priceResultId()
        )
    ).isInstanceOf(ConflictException.class)
     .hasMessageContaining("STALE_APPROVAL"); // assumes the error code is carried in the message
}

Example: Out-of-Order Projection Test

@Test
void olderOrderEventDoesNotMoveProjectionBackwards() {
    projection.apply(orderCancelled(version(3)));
    projection.apply(orderSubmitted(version(2)));

    OrderProjection row = projection.find(orderId);
    assertThat(row.status()).isEqualTo("CANCELLED");
    assertThat(row.aggregateVersion()).isEqualTo(3);
}

Required Race Tests

  • two users edit same quote
  • approval completes after quote revision changes
  • accept quote twice
  • accept quote while quote expires
  • create order while price book changes
  • cancel order while fulfillment completes
  • duplicate external callback
  • out-of-order external callback
  • duplicate Kafka event
  • out-of-order Kafka event
  • Camunda external task lock expires
  • process start timeout then retry
  • fallout case claimed by two workers
  • Redis lock expires before work completes
  • reconciliation races with normal callback

26. Observability for Concurrency

You cannot debug races without evidence.

Log structured fields:

| Field | Why |
|---|---|
| tenantId | isolate tenant impact |
| aggregateType | quote/order/step/case |
| aggregateId | trace object |
| expectedVersion | detect stale command |
| actualVersion | conflict analysis |
| commandId | command identity |
| idempotencyKey | retry/double-submit analysis |
| eventId | Kafka duplicate analysis |
| processInstanceId | Camunda trace |
| businessKey | business workflow trace |
| externalRequestId | provider call trace |
| externalEventId | callback dedup |
| transitionFrom / transitionTo | lifecycle correctness |
| conflictCode | race classification |

Metrics:

| Metric | Meaning |
|---|---|
| optimistic lock conflicts by command | edit contention |
| idempotency replay count | retry/double-submit frequency |
| idempotency hash conflict count | client misuse/bug |
| duplicate callback count | provider behavior |
| stale approval rejection count | approval/revision friction |
| stale price rejection count | pricing/catalog drift impact |
| Kafka duplicate skipped count | consumer idempotency activity |
| out-of-order event ignored count | partitioning/versioning issue |
| workflow start unknown count | Camunda/API timeout issue |
| lock wait p95 | DB contention |
| deadlock count | lock ordering bug |

A race should produce a named metric, not only an exception.
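A hypothetical sketch of turning race outcomes into named counters rather than bare exceptions. `LongAdder` keeps increments cheap under contention; in a real service these counters would be exported through whatever metrics library the platform already uses, with the command name as a tag rather than part of the name.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

class RaceMetrics {
    private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Increments a counter named after the race outcome, tagged by command,
    // e.g. "QUOTE_VERSION_CONFLICT{command=editQuoteLine}".
    void record(String conflictCode, String commandName) {
        counters.computeIfAbsent(conflictCode + "{command=" + commandName + "}",
                k -> new LongAdder()).increment();
    }

    // Sorted copy of the current counts, e.g. for a scrape endpoint or a test.
    Map<String, Long> snapshot() {
        Map<String, Long> out = new TreeMap<>();
        counters.forEach((name, adder) -> out.put(name, adder.sum()));
        return out;
    }
}
```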


27. Anti-Patterns

Anti-Pattern 1 — “Just Use Synchronized”

synchronized protects only one JVM instance.

It does not protect:

  • multiple service pods
  • database writes from another service
  • Kafka consumers
  • Camunda workers
  • external callbacks

Use it only for local in-memory structures, not business correctness.

Anti-Pattern 2 — “Redis Lock Solves It”

Redis lock can reduce duplicate work.

It does not replace database invariants.

Anti-Pattern 3 — “Retry Everything”

Retry without idempotency creates duplicate side effects.

Retry without semantic awareness overwrites human decisions.

Anti-Pattern 4 — “Status Check Is Enough”

Status can return to the same value.

Use version/revision/decision IDs.

Anti-Pattern 5 — “Events Are Ordered Globally”

Kafka orders records only within a partition; it gives no useful global business ordering across aggregates.

Design per-aggregate ordering and version-aware consumers.

Anti-Pattern 6 — “Camunda Is the Source of Truth”

Camunda owns workflow execution state.

Domain service owns business truth.

Anti-Pattern 7 — “UI Disable Button Prevents Double Submit”

UI prevention is helpful.

Server-side idempotency is mandatory.


28. Design Review Questions

Ask these before approving any CPQ/OMS feature:

  1. What happens if the command is sent twice?
  2. What happens if two users perform this command concurrently?
  3. What precondition does the command carry?
  4. What database constraint enforces the invariant?
  5. What is the aggregate version/revision check?
  6. Can the approval/price/config become stale?
  7. What happens if Kafka publishes/consumes duplicate event?
  8. What happens if event arrives out of order?
  9. What happens if Camunda external task is executed twice?
  10. What happens if process start times out?
  11. What happens if external provider acts but response is lost?
  12. What happens if Redis lock expires?
  13. Is retry safe? Why?
  14. Is manual resolution versioned and audited?
  15. Which metric proves this race happened?

If a feature owner cannot answer these, the feature is not ready.


29. Mental Model

Concurrency control is not one mechanism.

It is a stack:

business invariant
  -> command precondition
  -> aggregate version/revision
  -> idempotency identity
  -> database constraint
  -> transaction boundary
  -> event identity/version
  -> workflow correlation
  -> external idempotency/reconciliation
  -> audit/observability

Every layer catches a different failure.

A production-grade CPQ/OMS system does not rely on luck that commands arrive in a nice order.

It assumes disorder.

Then it makes disorder safe.


30. Closing

A race condition is not merely two threads touching the same variable.

In enterprise CPQ/OMS, a race condition is any situation where two true business facts compete to change the same commercial or fulfillment reality.

The solution is not “add locks everywhere”.

The solution is to model the business truth precisely:

  • what can change
  • who can change it
  • based on which version
  • with which authority
  • producing which immutable evidence
  • protected by which storage invariant
  • recoverable by which workflow/reconciliation path

That is the difference between a demo system and a production-grade order platform.

Lesson Recap

You just completed lesson 46 in deepen practice. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.
