
Learn Java Payment Systems Part 036 Payment Routing Engine


title: Build From Scratch: Large Production Grade Java Payment Systems - Part 036
description: Payment routing engine design for enterprise Java payment systems, covering eligibility, ranking, provider health, cost, success rate, risk, BIN, currency, fallback, experimentation, and deterministic routing evidence.
series: learn-java-payment-systems
seriesTitle: Build From Scratch: Large Production Grade Java Payment Systems
order: 36
partTitle: Payment Routing Engine
tags:
  • java
  • payments
  • payment-routing
  • orchestration
  • risk
  • provider-routing
  • payment-systems
  • enterprise-architecture
date: 2026-07-02

Part 036 — Payment Routing Engine

A payment routing engine answers one deceptively simple question:

Where should this payment attempt go?

Naive answer:

always Provider A

Slightly better answer:

if Provider A fails, use Provider B

Production answer:

choose an eligible route based on payment method, currency, country, merchant capability,
provider capability, BIN/card attributes, risk, cost, success rate, latency, health,
contractual constraints, compliance constraints, retry history, and operational policy;
then preserve the decision evidence so the platform can explain and replay it later.

Routing is not just optimization.

It is a financial control plane.

A bad routing decision can cause:

declined payments
higher processing cost
fraud exposure
compliance breach
duplicate charge through unsafe retry
provider outage blast radius
unexplainable merchant settlement
incorrect payment method display

This part builds a production-grade routing engine for a Java payment platform.


1. Mental Model: Routing Is a Decision, Not an If-Else

A routing decision should be treated as a domain artifact.

input context + rule/policy version + observed metrics + candidate routes -> selected route + explanation

You should be able to answer later:

Why did payment pay_123 go to provider_adyen instead of provider_stripe?
Why was bank transfer hidden for this customer?
Why did we retry on Provider B after Provider A timeout?
Why did high-risk merchant traffic stop using route X?
Why did authorization rate drop after rule release 2026.07.02?

So the output is not just:

ProviderId provider = chooseProvider(payment);

It is:

public record RouteDecision(
        RouteDecisionId id,
        PaymentAttemptId attemptId,
        List<RouteCandidate> evaluatedCandidates,
        RouteCandidate selectedCandidate,
        RouteDecisionReason reason,
        RoutingPolicyVersion policyVersion,
        MetricsSnapshot metricsSnapshot,
        Instant decidedAt
) {}

The routing engine is a deterministic decision function around changing inputs.

same input + same policy version + same metric snapshot = same decision

This is how routing becomes auditable.


2. What Is a Route?

A route is more than a provider.

A route may include:

provider
merchant account / sub-merchant account
acquirer
payment method
card brand/network
capture mode
authentication strategy
currency handling
settlement currency
MCC / merchant category
risk mode
processor endpoint
credential set
fee plan
settlement account

Example route:

{
  "provider": "adyen",
  "providerMerchantAccount": "platform_us_cards_high_volume",
  "paymentMethodFamily": "card",
  "cardNetwork": "visa",
  "authCaptureMode": "manual_capture",
  "settlementCurrency": "USD",
  "riskProfile": "standard",
  "credentialProfile": "adyen-us-prod-2026q3"
}

Another route:

{
  "provider": "bank_partner_a",
  "rail": "instant_credit_transfer",
  "scheme": "BI_FAST",
  "settlementAccount": "id_bifast_operational_001",
  "riskProfile": "low_value_transfer",
  "cutoffMode": "always_on"
}

A route is a complete execution plan, not just a URL.


3. Routing Pipeline

The engine has stages.

A clean pipeline:

candidate generation
eligibility filtering
policy filtering
health filtering
scoring/ranking
selection
persistence
execution

Do not mix these stages into one 900-line method.
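
One way to keep the stages separate is to give every filtering stage the same small interface and fold the candidates through them in order. The following is a minimal sketch under that assumption, not the implementation used later in this part; `Stage`, `StageResult`, and the plain string route IDs are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stage: a name plus a predicate that keeps or rejects a route ID.
record Stage(String name, Predicate<String> keep) {}

// Survivors plus a record of which stage rejected which route.
record StageResult(List<String> survivors, Map<String, String> rejectedBy) {}

final class RoutingPipelineSketch {
    // Runs the stages in order; a route rejected by one stage never reaches the next.
    static StageResult run(List<String> candidates, List<Stage> stages) {
        List<String> current = new ArrayList<>(candidates);
        Map<String, String> rejectedBy = new LinkedHashMap<>();
        for (Stage stage : stages) {
            List<String> next = new ArrayList<>();
            for (String route : current) {
                if (stage.keep().test(route)) {
                    next.add(route);
                } else {
                    rejectedBy.put(route, stage.name());
                }
            }
            current = next;
        }
        return new StageResult(List.copyOf(current), rejectedBy);
    }
}
```

Because each stage is a separate value, it can be unit-tested alone, and the `rejectedBy` map already carries the name of the stage that removed each route.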


4. Payment Context

The router needs a normalized context.

public record RoutingContext(
        PaymentIntentId paymentIntentId,
        PaymentAttemptId attemptId,
        MerchantId merchantId,
        OperationType operationType, // which operation is being routed; used for operation-specific health lookups
        Money amount,
        CustomerContext customer,
        PaymentMethodContext paymentMethod,
        OrderContext order,
        RiskContext risk,
        RetryContext retry,
        FulfillmentContext fulfillment,
        Instant requestedAt
) {}

Important fields:

merchant country
merchant capabilities
merchant risk tier
merchant MCC/category
customer country
billing/shipping country
IP/device country
currency
amount
payment method family
card BIN/IIN metadata
card brand
card funding type
issuer country
wallet type
prior attempt result
provider outage state
authentication result
capture mode
settlement preference

Routing quality depends on context quality.

Garbage context gives garbage route.


5. Candidate Generation

Candidate generation answers:

Which routes might possibly process this payment?

Example table:

CREATE TABLE payment_route (
    id UUID PRIMARY KEY,
    provider_id TEXT NOT NULL,
    provider_merchant_account TEXT NOT NULL,
    payment_method_family TEXT NOT NULL,
    country TEXT,
    currency CHAR(3),
    card_brand TEXT,
    merchant_risk_tier TEXT,
    capture_mode TEXT,
    enabled BOOLEAN NOT NULL DEFAULT true,
    priority INT NOT NULL DEFAULT 100,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_payment_route_lookup
ON payment_route(payment_method_family, country, currency, enabled);

Candidate generation should be broad enough not to exclude valid routes too early.

It should not decide the best route.

It should produce candidates.
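
A broad lookup can be sketched in memory as follows. This assumes that a NULL in a nullable column of `payment_route` (such as `country` or `currency`) means "not restricted"; the table definition does not state that, so treat it as an assumption. `RouteRow` and `CandidateGenerationSketch` are hypothetical names.

```java
import java.util.List;

// Simplified stand-in for a payment_route row; null means "not restricted" (an assumption).
record RouteRow(String id, String methodFamily, String country, String currency, boolean enabled) {}

final class CandidateGenerationSketch {
    // A null column matches any value; a non-null column must match exactly.
    static boolean matchesOrWildcard(String columnValue, String requested) {
        return columnValue == null || columnValue.equals(requested);
    }

    // Broad lookup: only disabled routes and hard structural mismatches are excluded here.
    static List<RouteRow> findCandidates(List<RouteRow> routes, String methodFamily,
                                         String country, String currency) {
        return routes.stream()
                .filter(RouteRow::enabled)
                .filter(r -> r.methodFamily().equals(methodFamily))
                .filter(r -> matchesOrWildcard(r.country(), country))
                .filter(r -> matchesOrWildcard(r.currency(), currency))
                .toList();
    }
}
```

Nothing here ranks or scores. Amount limits, onboarding state, and health are left to the later stages so that their rejections can be recorded with reasons.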


6. Eligibility Filter

Eligibility means the route can legally/technically process the payment.

Examples:

provider supports payment method
provider supports currency
provider supports merchant country
provider supports customer country
provider supports amount range
provider supports capture mode
provider supports refund/partial capture if required
merchant is onboarded for provider route
merchant capability is active
credential exists and is valid
card network supported
wallet supported
rail availability window allows processing

Eligibility result should carry reasons.

public record EligibilityResult(
        RouteCandidate candidate,
        boolean eligible,
        List<IneligibilityReason> reasons
) {}

Example reasons:

currency_not_supported
merchant_not_onboarded
amount_below_minimum
amount_above_maximum
payment_method_disabled
provider_credential_missing
capture_mode_not_supported
country_not_supported

Never silently drop candidates.

You need explainability.
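
A minimal sketch of an eligibility check that collects every failing reason instead of stopping at the first one. The reason codes come from the list above; `CapabilityRow` and `EligibilityOutcome` are simplified, hypothetical stand-ins for the real capability registry and for `EligibilityResult`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified capability data for one route and one merchant.
record CapabilityRow(Set<String> currencies, long minAmountMinor, long maxAmountMinor,
                     boolean merchantOnboarded) {}

record EligibilityOutcome(boolean eligible, List<String> reasons) {}

final class EligibilitySketch {
    // Evaluates every check so the decision record lists all reasons, not just the first.
    static EligibilityOutcome evaluate(CapabilityRow cap, String currency, long amountMinor) {
        List<String> reasons = new ArrayList<>();
        if (!cap.currencies().contains(currency)) reasons.add("currency_not_supported");
        if (!cap.merchantOnboarded()) reasons.add("merchant_not_onboarded");
        if (amountMinor < cap.minAmountMinor()) reasons.add("amount_below_minimum");
        if (amountMinor > cap.maxAmountMinor()) reasons.add("amount_above_maximum");
        return new EligibilityOutcome(reasons.isEmpty(), List.copyOf(reasons));
    }
}
```

Collecting all reasons costs a few extra checks per candidate, but it lets support answer "what would have to change for this route to work?" from a single decision record.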


7. Policy Filter

Policy is not the same as eligibility.

Eligibility asks:

Can this route process the payment?

Policy asks:

Should this route be allowed for this business/risk/compliance context?

Policy examples:

Do not route high-risk MCC to provider X.
Do not route EU cards to non-EU acquirer for this merchant group.
Use local acquiring when available for issuer country.
Block route Y for merchant under compliance review.
Disable instant payout route for merchant risk tier HIGH.
Do not retry hard declines on another provider.
Route low-value transactions to low-cost provider.
Force provider A for merchant contract group Enterprise-001.

Represent policy as versioned rules.

CREATE TABLE routing_policy_version (
    id UUID PRIMARY KEY,
    version TEXT NOT NULL UNIQUE,
    status TEXT NOT NULL,
    effective_from TIMESTAMPTZ NOT NULL,
    created_by TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE routing_rule (
    id UUID PRIMARY KEY,
    policy_version_id UUID NOT NULL REFERENCES routing_policy_version(id),
    rule_name TEXT NOT NULL,
    priority INT NOT NULL,
    condition_json JSONB NOT NULL,
    action_json JSONB NOT NULL,
    enabled BOOLEAN NOT NULL DEFAULT true
);

Changing a policy rule should not require a hidden Java deploy every time.

Equally, do not build a chaotic no-code rules system that lacks testing and versioning.

For payment routing, rule changes are production risk.
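
Versioned rule evaluation can be sketched as below. Several things are assumptions rather than part of the schema above: conditions are plain Java predicates instead of parsed `condition_json`, a lower `priority` number wins, the first matching rule decides, and a candidate is allowed when no rule matches. All type names are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

enum PolicyAction { ALLOW, DENY }

// Hypothetical facts a rule can inspect; a real engine would evaluate condition_json.
record PolicyFacts(String providerId, String merchantRiskTier, String mcc) {}

record PolicyRule(String name, int priority, Predicate<PolicyFacts> condition, PolicyAction action) {}

// Carries the policy version and the matched rule so the decision stays explainable.
record PolicyOutcome(boolean allowed, String policyVersion, Optional<String> matchedRule) {}

final class PolicyEngineSketch {
    // Lower priority number wins (an assumption); the first matching rule decides.
    static PolicyOutcome evaluate(String policyVersion, List<PolicyRule> rules, PolicyFacts facts) {
        return rules.stream()
                .sorted(Comparator.comparingInt(PolicyRule::priority)
                        .thenComparing(PolicyRule::name))
                .filter(r -> r.condition().test(facts))
                .findFirst()
                .map(r -> new PolicyOutcome(r.action() == PolicyAction.ALLOW, policyVersion,
                        Optional.of(r.name())))
                .orElse(new PolicyOutcome(true, policyVersion, Optional.empty()));
    }
}
```

Sorting by priority and then by rule name keeps the outcome deterministic even when two rules share a priority.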


8. Health Filter

Provider health is not binary.

Health dimensions:

availability
latency
timeout rate
authorization success rate
error code distribution
webhook delay
settlement/report delay
refund API health
capture API health
provider incident/manual override
regional degradation
merchant-specific degradation
payment-method-specific degradation

A provider can be healthy for refunds but unhealthy for authorizations. A provider can be healthy for cards but unhealthy for wallets. A provider can be healthy globally but failing for one region.

Model health by route segment.

public record RouteHealth(
        RouteId routeId,
        OperationType operationType,
        HealthState state,
        double successRate5m,
        double timeoutRate5m,
        double p95LatencyMs,
        int consecutiveFailures,
        Instant measuredAt
) {}

Health states:

HEALTHY
DEGRADED
PROBATION
DISABLED_AUTOMATIC
DISABLED_MANUAL

Manual disable must override scoring.

incident commander disabled route -> router must not select it
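
The precedence of manual disable over measured metrics can be sketched as a small classifier. The thresholds are illustrative assumptions, not recommended values, and the `PROBATION` state is left out to keep the example short; `HealthStateSketch` and `HealthClassifierSketch` are hypothetical names.

```java
enum HealthStateSketch { HEALTHY, DEGRADED, DISABLED_AUTOMATIC, DISABLED_MANUAL }

final class HealthClassifierSketch {
    // Thresholds are illustrative assumptions, not recommended values.
    static final double DEGRADED_TIMEOUT_RATE = 0.05;
    static final double DISABLE_TIMEOUT_RATE = 0.30;
    static final int DISABLE_CONSECUTIVE_FAILURES = 20;

    static HealthStateSketch classify(boolean manuallyDisabled, double timeoutRate5m,
                                      int consecutiveFailures) {
        // Manual disable is checked first so that no metric can re-enable the route.
        if (manuallyDisabled) {
            return HealthStateSketch.DISABLED_MANUAL;
        }
        if (timeoutRate5m >= DISABLE_TIMEOUT_RATE
                || consecutiveFailures >= DISABLE_CONSECUTIVE_FAILURES) {
            return HealthStateSketch.DISABLED_AUTOMATIC;
        }
        if (timeoutRate5m >= DEGRADED_TIMEOUT_RATE) {
            return HealthStateSketch.DEGRADED;
        }
        return HealthStateSketch.HEALTHY;
    }
}
```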

9. Scoring and Ranking

Once candidates are eligible and allowed, rank them.

Factors:

expected authorization success
processing cost
latency
risk score
provider health
merchant preference
local acquiring benefit
retry suitability
settlement speed
refund/dispute capability
contractual volume commitments

Simple scoring:

score =
  success_weight * normalized_success_rate
- cost_weight    * normalized_cost
- latency_weight * normalized_latency
- risk_weight    * normalized_risk_penalty
+ preference_bonus
- degradation_penalty

Do not start with ML.

Start with transparent rule-based scoring.

public final class RouteScorer {
    public ScoredRoute score(RouteCandidate c, RoutingContext ctx, MetricsSnapshot m) {
        BigDecimal score = BigDecimal.ZERO;

        score = score.add(weight("success").multiply(m.successRate(c.routeId())));
        score = score.subtract(weight("cost").multiply(c.estimatedCost().normalized()));
        score = score.subtract(weight("latency").multiply(m.latencyPenalty(c.routeId())));
        score = score.subtract(weight("risk").multiply(ctx.risk().routePenalty(c.routeId())));
        score = score.add(c.preferenceBonus(ctx.merchantId()));

        return new ScoredRoute(c, score, explanation(c, ctx, m, score));
    }
}

The explanation is not optional.

{
  "selectedRoute": "route_cards_us_adyen_001",
  "score": "0.8421",
  "reasons": [
    "eligible_for_currency_usd",
    "merchant_onboarded",
    "provider_healthy",
    "higher_success_rate_15m",
    "cost_within_policy"
  ],
  "rejectedCandidates": [
    {
      "route": "route_cards_us_provider_b_001",
      "reason": "provider_degraded_timeout_rate"
    }
  ]
}
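
To make the explanation cheap to produce, the scorer can return the signed contribution of every factor next to the total. The sketch below assumes three factors already normalized to 0..1 and uses made-up weights; `ScoreBreakdown` and `ExplainableScorerSketch` are hypothetical names.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;

// Score plus the signed contribution of each factor, so the decision can be explained.
record ScoreBreakdown(BigDecimal total, Map<String, BigDecimal> components) {}

final class ExplainableScorerSketch {
    // Illustrative weights; real values would come from the versioned policy.
    static final BigDecimal SUCCESS_WEIGHT = new BigDecimal("1.0");
    static final BigDecimal COST_WEIGHT = new BigDecimal("0.3");
    static final BigDecimal LATENCY_WEIGHT = new BigDecimal("0.2");

    // All inputs are assumed to be normalized to the range 0..1.
    static ScoreBreakdown score(BigDecimal successRate, BigDecimal cost, BigDecimal latency) {
        Map<String, BigDecimal> components = new LinkedHashMap<>();
        components.put("success", SUCCESS_WEIGHT.multiply(successRate));
        components.put("cost", COST_WEIGHT.multiply(cost).negate());
        components.put("latency", LATENCY_WEIGHT.multiply(latency).negate());
        BigDecimal total = components.values().stream()
                .reduce(BigDecimal.ZERO, BigDecimal::add);
        return new ScoreBreakdown(total, components);
    }
}
```

With a success rate of 0.92, cost of 0.40, and latency of 0.25, the components are 0.92, -0.12, and -0.05, which sum to 0.75. Persisting the components next to the total shows which factor decided a close ranking.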

10. Cost Model

Payment routing often optimizes cost.

Cost may include:

MDR
interchange estimate
scheme fee
provider markup
cross-border fee
FX spread
refund fee
chargeback fee expectation
minimum fee
monthly volume commitment
local acquiring benefit
settlement delay cost
operational/reconciliation cost

Cost is not always known exactly at authorization time.

Use estimate + later settlement truth.

estimated_route_cost_at_decision
actual_route_cost_after_settlement
variance

Cost model table:

CREATE TABLE route_cost_model (
    id UUID PRIMARY KEY,
    route_id UUID NOT NULL,
    version TEXT NOT NULL,
    effective_from TIMESTAMPTZ NOT NULL,
    condition_json JSONB NOT NULL,
    fixed_fee_minor BIGINT NOT NULL DEFAULT 0,
    variable_bps INT NOT NULL DEFAULT 0,
    min_fee_minor BIGINT,
    max_fee_minor BIGINT,
    currency CHAR(3) NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (route_id, version)
);

Never overwrite old cost models.

Routing decisions must preserve the cost model version used.
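
The fee columns of `route_cost_model` translate into a small estimate function. This sketch assumes the variable part is rounded half-up to whole minor units and that a NULL minimum or maximum means "no bound"; both are assumptions, since real contracts define their own rounding. `CostModelSketch` is a hypothetical name.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Mirrors the fee columns of route_cost_model; null min/max means "no bound" (an assumption).
record CostModelSketch(String version, long fixedFeeMinor, int variableBps,
                       Long minFeeMinor, Long maxFeeMinor) {

    // Estimated fee in minor units: fixed + amount * bps / 10_000, then clamped.
    long estimateFeeMinor(long amountMinor) {
        long variable = BigDecimal.valueOf(amountMinor)
                .multiply(BigDecimal.valueOf(variableBps))
                .divide(BigDecimal.valueOf(10_000), 0, RoundingMode.HALF_UP)
                .longValueExact();
        long fee = fixedFeeMinor + variable;
        if (minFeeMinor != null) fee = Math.max(fee, minFeeMinor);
        if (maxFeeMinor != null) fee = Math.min(fee, maxFeeMinor);
        return fee;
    }
}
```

For an amount of 10000 minor units with a fixed fee of 30 and 290 bps, the estimate is 30 + 290 = 320 minor units. Storing the `version` alongside that number in the route decision is what makes the later variance against settlement explainable.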


11. Success Rate Model

Authorization success rate is not global.

Bad metric:

Provider A success rate = 82%

Better metric:

Provider A success rate for:
  merchant segment = enterprise retail
  payment method = card
  card network = visa
  issuer country = ID
  currency = IDR
  amount bucket = 100k-500k
  last 15 minutes = 91.2%

But segmentation can become sparse.

Use fallback hierarchy:

exact segment metric
merchant + method + country
method + country
method global
provider global

Store metric confidence.

public record SuccessMetric(
        RouteId routeId,
        MetricSegment segment,
        BigDecimal successRate,
        long sampleSize,
        Duration window,
        BigDecimal confidence
) {}

Do not route aggressively on 3 samples.
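
The fallback hierarchy can be sketched as "take the most specific metric that has enough samples". The minimum sample size is an assumed tuning parameter, and `SegmentMetric` is a simplified, hypothetical stand-in for `SuccessMetric`.

```java
import java.util.List;
import java.util.Optional;

// One level of the fallback hierarchy, ordered from most to least specific.
record SegmentMetric(String level, double successRate, long sampleSize) {}

final class SuccessRateFallbackSketch {
    // Returns the most specific metric that has enough samples to be trusted.
    static Optional<SegmentMetric> pick(List<SegmentMetric> mostSpecificFirst, long minSamples) {
        return mostSpecificFirst.stream()
                .filter(m -> m.sampleSize() >= minSamples)
                .findFirst();
    }
}
```

An exact segment with 3 samples and a 100% success rate is skipped in favor of a broader level with hundreds of samples. An empty result means no level is trustworthy yet, and the scorer should fall back to a neutral default rather than guess.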


12. BIN/IIN and Card Attribute Routing

Card routing often uses card attributes:

BIN/IIN
card brand
issuer country
issuer bank
funding type: credit/debit/prepaid
commercial vs consumer
3DS capability
network token availability
local acquiring availability

Rules:

Use local acquirer for domestic issuer when available.
Avoid provider X for issuer bank Y if timeout spike detected.
Require 3DS route for high-risk BIN range.
Block prepaid cards for merchant category Z.
Prefer network token route if credential-on-file and token present.

Be careful with BIN data:

BIN ranges change
8-digit BIN migration exists in many card ecosystems
issuer metadata can be stale
card attributes are not always available before tokenization

Therefore:

BIN metadata is routing signal, not absolute truth unless contractually guaranteed
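
A range lookup that treats missing metadata as "unknown" rather than "blocked" might look like the sketch below. It assumes non-overlapping ranges keyed by inclusive 8-digit prefixes, with older 6-digit ranges padded to 8 digits when the table is loaded. `BinRange` and `BinTableSketch` are hypothetical names, and the sample range in the test is invented.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical BIN range entry; start and end are inclusive 8-digit prefixes.
record BinRange(long start, long end, String issuerCountry, String funding) {}

final class BinTableSketch {
    // The table is keyed by range start; ranges are assumed not to overlap.
    // The caller is expected to pass exactly the first 8 digits, already validated as numeric.
    static Optional<BinRange> lookup(TreeMap<Long, BinRange> byStart, String first8Digits) {
        long prefix = Long.parseLong(first8Digits);
        // floorEntry finds the range with the greatest start that is <= prefix.
        Map.Entry<Long, BinRange> entry = byStart.floorEntry(prefix);
        if (entry == null || prefix > entry.getValue().end()) {
            return Optional.empty(); // unknown BIN: a missing signal, not a rejection
        }
        return Optional.of(entry.getValue());
    }
}
```

Returning `Optional` forces the calling rule to decide explicitly what happens when the BIN is unknown or the table is stale.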

13. Risk-Aware Routing

Routing and risk are linked.

Risk engine may say:

allow
challenge with 3DS
manual review
block
route only through provider with stronger fraud tooling
route with manual capture
route with delayed fulfillment

Risk-aware route example:

low risk -> frictionless route, low cost provider
medium risk -> provider supporting 3DS/challenge
high risk -> block or manual review
merchant under investigation -> force manual capture

Never let routing bypass risk.

In the pipeline, risk evaluation runs before route selection, and its output becomes a routing constraint.

public record RiskRoutingConstraint(
        boolean require3ds,
        boolean requireManualCapture,
        Set<ProviderId> blockedProviders,
        Set<RouteCapability> requiredCapabilities,
        RiskAction action
) {}
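
Applying such a constraint is a plain filter over the candidates. The sketch below uses simplified, hypothetical stand-ins (`RiskAwareRoute`, `RiskConstraintSketch`) with string provider IDs instead of the domain types above.

```java
import java.util.List;
import java.util.Set;

// Simplified stand-ins for a route and for the risk engine output.
record RiskAwareRoute(String routeId, String providerId, boolean supports3ds,
                      boolean supportsManualCapture) {}

record RiskConstraintSketch(boolean require3ds, boolean requireManualCapture,
                            Set<String> blockedProviders) {}

final class RiskFilterSketch {
    // Keeps only routes that satisfy every constraint emitted by the risk engine.
    static List<RiskAwareRoute> apply(List<RiskAwareRoute> routes, RiskConstraintSketch c) {
        return routes.stream()
                .filter(r -> !c.blockedProviders().contains(r.providerId()))
                .filter(r -> !c.require3ds() || r.supports3ds())
                .filter(r -> !c.requireManualCapture() || r.supportsManualCapture())
                .toList();
    }
}
```

Because the filter runs before scoring, a cheaper or higher-success route can never win if it violates a risk requirement.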

14. Retry-Aware Routing

Retry is routing with history.

A retry should know:

previous provider
previous route
previous error code
previous error class
previous auth response code
whether decline was hard or soft
whether provider may have processed request
whether idempotency allows retry
whether customer action is required
whether retrying another provider is allowed

Bad:

Provider A timeout -> immediately charge via Provider B

Why bad:

Provider A may have authorized successfully
Provider B may also authorize
customer may see two holds/charges

Retry matrix:

| Previous Result | Same Provider Retry | Different Provider Retry | Notes |
| --- | --- | --- | --- |
| network connect failed before request write | maybe | maybe | depends on certainty request not sent |
| HTTP timeout after request sent | idempotent status check first | no until resolved | outcome unknown |
| provider 5xx with idempotency key | same key retry | no until known | provider may have created payment |
| soft decline insufficient funds | no immediate blind retry | usually no | may retry later/customer update |
| issuer unavailable | maybe route alternative | policy-dependent | ensure no prior auth |
| hard decline stolen card | no | no | risk/compliance block |
| provider manual outage before request | no | yes | safe fallback if no request sent |

The route engine needs retry context.

public record RetryContext(
        int attemptNumber,
        List<PreviousRouteAttempt> previousAttempts,
        boolean previousOutcomeKnown,
        Optional<DeclineClassification> declineClassification
) {}
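
The core of the retry matrix can be reduced to one switch over the classified previous outcome. This is a deliberately coarse sketch: it collapses the matrix into five outcome classes, treats soft declines as "do not retry on another provider", and all names are hypothetical.

```java
// Coarse outcome classes for the previous attempt; names are illustrative.
enum PreviousOutcome {
    NOT_SENT,                 // request never left the platform
    UNKNOWN,                  // timeout or 5xx after the request was sent
    KNOWN_TECHNICAL_FAILURE,  // provider confirmed that nothing was created
    SOFT_DECLINE,
    HARD_DECLINE
}

enum RetryAction { RETRY_OTHER_PROVIDER, RESOLVE_STATUS_FIRST, DO_NOT_RETRY }

final class RetryPolicySketch {
    // Decides whether a different provider may be tried after the previous attempt.
    static RetryAction nextAction(PreviousOutcome previous) {
        return switch (previous) {
            case NOT_SENT, KNOWN_TECHNICAL_FAILURE -> RetryAction.RETRY_OTHER_PROVIDER;
            case UNKNOWN -> RetryAction.RESOLVE_STATUS_FIRST;
            case SOFT_DECLINE, HARD_DECLINE -> RetryAction.DO_NOT_RETRY;
        };
    }
}
```

The switch has no default branch on purpose: adding a new outcome class fails compilation until someone decides how it should be retried.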

15. Fallback Routing

Fallback does not mean "retry everything elsewhere".

Fallback is safe only when:

previous operation did not create financial effect
or previous outcome is known terminal failed
or rail/provider supports idempotent transfer of attempt semantics
or customer explicitly reattempts after clear failure

Fallback policy examples:

If provider disabled before request, use next eligible provider.
If provider returns retryable technical failure before auth created, use backup route.
If provider returns unknown, hold and resolve.
If provider returns hard decline, do not fallback.
If provider returns fraud decline, block.
If provider returns 3DS required, route to 3DS-capable flow.

Fallback must preserve evidence.

attempt_1 -> provider A -> result technical_failure_known_no_auth
attempt_2 -> provider B -> result authorized

Do not mutate attempt 1 into attempt 2.


16. Payment Method Display Routing

Routing begins before payment execution.

Checkout may ask:

Which payment methods should be shown to this customer?

This is not the same as provider route selection, but it is related.

Inputs:

merchant capability
customer country
currency
amount
device/channel
risk pre-score
payment method availability
provider availability
commercial preference
conversion expectations

Output:

eligible payment method list
ordering
method-specific display metadata
constraints/warnings

Example:

{
  "paymentMethods": [
    {"type": "card", "rank": 1},
    {"type": "wallet", "rank": 2},
    {"type": "bank_transfer", "rank": 3}
  ],
  "policyVersion": "pm-display-2026.07.02"
}

Payment method display should be versioned, because it affects conversion and compliance.

Do not hardcode checkout order in the frontend.


17. Route Decision Persistence

Persist the decision before execution.

CREATE TABLE route_decision (
    id UUID PRIMARY KEY,
    payment_attempt_id UUID NOT NULL,
    merchant_id UUID NOT NULL,
    selected_route_id UUID NOT NULL,
    policy_version TEXT NOT NULL,
    cost_model_version TEXT,
    metric_snapshot_id UUID,
    decision_context_hash TEXT NOT NULL,
    decision_json JSONB NOT NULL,
    selected_reason TEXT NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    UNIQUE (payment_attempt_id)
);

CREATE TABLE route_decision_candidate (
    id UUID PRIMARY KEY,
    route_decision_id UUID NOT NULL REFERENCES route_decision(id),
    route_id UUID NOT NULL,
    eligible BOOLEAN NOT NULL,
    score NUMERIC(18,8),
    rejection_reasons JSONB NOT NULL DEFAULT '[]'::jsonb,
    explanation_json JSONB NOT NULL DEFAULT '{}'::jsonb
);

Decision evidence should include:

input context hash
candidate list
rejection reasons
scores
selected reason
policy version
cost model version
metric snapshot reference
manual override if any
operator/incident reference if any

This helps with:

merchant disputes
provider incident analysis
A/B experiment analysis
finance cost analysis
audit/compliance review
routing regression investigation
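
One possible way to fill `decision_context_hash` is to hash the routing inputs in a canonical order. The sketch below assumes the context has already been flattened to string key/value pairs and uses a simple `key=value` line format, which is only safe if values never contain newlines; a production system would more likely hash a canonical JSON form. `DecisionContextHashSketch` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.TreeMap;

final class DecisionContextHashSketch {
    // Hashes routing inputs in sorted key order so equal inputs always hash equally.
    static String hash(Map<String, String> contextFields) {
        StringBuilder canonical = new StringBuilder();
        for (Map.Entry<String, String> field : new TreeMap<>(contextFields).entrySet()) {
            canonical.append(field.getKey()).append('=').append(field.getValue()).append('\n');
        }
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] bytes = digest.digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(bytes); // 64 lowercase hex characters
        } catch (NoSuchAlgorithmException ex) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException("SHA-256 unavailable", ex);
        }
    }
}
```

Only non-sensitive, normalized fields belong in the hashed map; raw card data should never be part of it.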

18. Routing Configuration Lifecycle

Routing config changes are dangerous.

Minimum lifecycle:

draft
validated
approved
scheduled
active
rolled back
archived

Controls:

schema validation
static rule conflict detection
simulation against historical traffic
approval workflow
canary rollout
shadow evaluation
automatic rollback threshold
full audit trail

Example validation:

No active route for IDR QR payment after rule change.
Provider X receives 100% traffic accidentally.
High-risk merchant route loses required 3DS capability.
Currency EUR routed to account settling only USD.
Rule priority creates unreachable fallback.

Build a route simulator before giving non-engineers route editing power.
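
The lifecycle can be enforced with an explicit transition table. The allowed transitions below are an assumption for illustration; the exact set depends on the approval process in use. `ConfigState` and `ConfigLifecycleSketch` are hypothetical names.

```java
import java.util.Map;
import java.util.Set;

enum ConfigState { DRAFT, VALIDATED, APPROVED, SCHEDULED, ACTIVE, ROLLED_BACK, ARCHIVED }

final class ConfigLifecycleSketch {
    // Allowed transitions are an assumption; ARCHIVED is terminal and has no entry.
    static final Map<ConfigState, Set<ConfigState>> ALLOWED = Map.of(
            ConfigState.DRAFT, Set.of(ConfigState.VALIDATED),
            ConfigState.VALIDATED, Set.of(ConfigState.APPROVED, ConfigState.DRAFT),
            ConfigState.APPROVED, Set.of(ConfigState.SCHEDULED),
            ConfigState.SCHEDULED, Set.of(ConfigState.ACTIVE, ConfigState.DRAFT),
            ConfigState.ACTIVE, Set.of(ConfigState.ROLLED_BACK, ConfigState.ARCHIVED),
            ConfigState.ROLLED_BACK, Set.of(ConfigState.ARCHIVED));

    // Any transition not listed above is rejected, including skipping validation.
    static boolean canTransition(ConfigState from, ConfigState to) {
        return ALLOWED.getOrDefault(from, Set.of()).contains(to);
    }
}
```

With this table a draft cannot become active directly: it has to pass through validation, approval, and scheduling first.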


19. Shadow Routing

Shadow routing means computing a decision without executing it.

production selected route: A
shadow policy selected route: B
execute A
record B for analysis

Use shadow mode for:

new scoring algorithm
new cost model
new provider onboarding
new policy version
new ML/bandit model
new health threshold

Schema:

CREATE TABLE route_shadow_decision (
    id UUID PRIMARY KEY,
    payment_attempt_id UUID NOT NULL,
    experiment_key TEXT NOT NULL,
    shadow_policy_version TEXT NOT NULL,
    selected_route_id UUID,
    decision_json JSONB NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

Shadow routing prevents experimenting with real money before evidence exists.
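
A minimal sketch of evaluating a shadow policy next to the production policy. Both policies are modeled as hypothetical functions from a payment attempt ID to a route ID; the important property is that a failing shadow policy cannot change or break the production result.

```java
import java.util.Optional;
import java.util.function.Function;

// Production decision plus the shadow result, if the shadow policy produced one.
record ShadowComparison(String executedRoute, Optional<String> shadowRoute) {
    boolean differs() {
        return shadowRoute.isPresent() && !shadowRoute.get().equals(executedRoute);
    }
}

final class ShadowRoutingSketch {
    // Evaluates both policies but only ever returns the production route for execution.
    static ShadowComparison decide(String paymentAttemptId,
                                   Function<String, String> productionPolicy,
                                   Function<String, String> shadowPolicy) {
        String executed = productionPolicy.apply(paymentAttemptId);
        Optional<String> shadow;
        try {
            shadow = Optional.ofNullable(shadowPolicy.apply(paymentAttemptId));
        } catch (RuntimeException ex) {
            // A failing shadow policy must never affect production routing.
            shadow = Optional.empty();
        }
        return new ShadowComparison(executed, shadow);
    }
}
```

In a real service the shadow evaluation would usually also run off the request path or under a strict time budget, so that a slow shadow policy cannot add latency to the payment.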


20. Experimentation and Traffic Allocation

Routing experiments can improve success/cost, but they can also harm merchants.

Guardrails:

small initial allocation
merchant opt-in/segmenting
exclude high-risk or high-value payments
monitor auth rate and timeout rate
monitor chargeback/fraud delayed metrics
monitor settlement/reconciliation breaks
stop-loss threshold
manual kill switch

Traffic allocation:

hash(payment_intent_id + experiment_key) % 100 < allocation_percent

Do not randomize per retry attempt; that makes behavior unstable.

Use stable assignment.

public boolean assignedToExperiment(PaymentIntentId id, String experimentKey, int percent) {
    // floorMod keeps the bucket in 0..99 even if stableHash returns a negative int.
    int bucket = Math.floorMod(stableHash(id.value() + ":" + experimentKey), 100);
    return bucket < percent;
}
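
The snippet above leaves `stableHash` undefined. One possible implementation, shown here as a sketch, uses CRC32 from the JDK: it is deterministic across JVMs and restarts, and its value is never negative. CRC32 is an assumption chosen for simplicity, not a requirement; any stable, well-distributed hash works. `ExperimentAssignmentSketch` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class ExperimentAssignmentSketch {
    // CRC32 serves only as a stable bucket function here, not as a security primitive.
    static int bucket(String paymentIntentId, String experimentKey) {
        CRC32 crc = new CRC32();
        crc.update((paymentIntentId + ":" + experimentKey).getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative long, so the remainder is always in 0..99.
        return (int) (crc.getValue() % 100);
    }

    // The same payment intent and experiment key always give the same answer.
    static boolean assigned(String paymentIntentId, String experimentKey, int percent) {
        return bucket(paymentIntentId, experimentKey) < percent;
    }
}
```

Including the experiment key in the hashed string gives each experiment its own bucketing, so the same payment intents are not always the ones selected.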

Do not optimize only authorization success.

A route that improves auth success but doubles chargebacks may be worse.


21. Adaptive Routing Without Losing Control

Dynamic routing can use live metrics.

But payment systems need stability.

Problems with naive adaptive routing:

oscillation: all traffic moves to provider B, B degrades, traffic moves back
small sample overreaction
feedback loop caused by retry behavior
provider-specific decline code differences
cost ignored while chasing success
fraud delayed signal ignored

Controls:

minimum sample size
cooldown windows
maximum traffic shift per interval
confidence thresholds
manual override
route probation state
merchant-level caps
delayed fraud/dispute feedback

A safe adaptive system changes gradually.

Provider A success falls from 92% to 86% over 5 minutes.
Provider B is 91% with sufficient samples.
Shift 10% traffic to B.
Observe.
Shift another 10% if stable.
Do not instantly shift 100%.
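
The gradual shift can be expressed as a bounded step function. The sketch assumes allocations are whole percentages and that the caller supplies the sample count observed for the target route; `TrafficShiftSketch` and its parameter names are hypothetical.

```java
final class TrafficShiftSketch {
    // Moves the current allocation toward the target by at most maxStepPercent per interval.
    static int nextAllocation(int currentPercent, int targetPercent, int maxStepPercent,
                              long targetSamples, long minSamples) {
        // Without enough evidence for the target route, hold the current split.
        if (targetSamples < minSamples) {
            return currentPercent;
        }
        int delta = targetPercent - currentPercent;
        // Clamp the change to the range [-maxStepPercent, +maxStepPercent].
        int step = Math.max(-maxStepPercent, Math.min(maxStepPercent, delta));
        return currentPercent + step;
    }
}
```

Calling this once per cooldown window with a step of 10 gives the 0, 10, 20, ... progression described above, and the same bound applies when traffic moves back.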

22. Operational Overrides

There must be an emergency control plane.

Operations need to:

disable route globally
disable route for merchant
disable route for payment method
disable route for country/currency
force route for merchant temporarily
pause fallback
set provider to probation
add incident reference
schedule automatic expiry of override

Override table:

CREATE TABLE routing_override (
    id UUID PRIMARY KEY,
    scope_type TEXT NOT NULL,
    scope_value TEXT NOT NULL,
    action TEXT NOT NULL,
    route_id UUID,
    provider_id TEXT,
    payment_method_family TEXT,
    reason TEXT NOT NULL,
    incident_ref TEXT,
    effective_from TIMESTAMPTZ NOT NULL,
    expires_at TIMESTAMPTZ,
    created_by TEXT NOT NULL,
    approved_by TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

Every override must expire or be reviewed.

Permanent emergency overrides become invisible architecture.
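
Checking whether an override applies at decision time is a small time-window test over `effective_from` and `expires_at`. The sketch below treats the window as half-open, so the override is no longer active at the exact expiry instant, which is an assumption; `OverrideSketch` is a hypothetical, simplified view of a `routing_override` row.

```java
import java.time.Instant;
import java.util.Optional;

// Simplified view of a routing_override row.
record OverrideSketch(String action, Instant effectiveFrom, Optional<Instant> expiresAt) {

    // Active when now is within [effectiveFrom, expiresAt); no expiry means still active.
    boolean isActiveAt(Instant now) {
        boolean started = !now.isBefore(effectiveFrom);
        boolean notExpired = expiresAt.map(now::isBefore).orElse(true);
        return started && notExpired;
    }

    // Overrides without an expiry should surface in a periodic review queue.
    boolean needsReview() {
        return expiresAt.isEmpty();
    }
}
```

Passing `now` in, rather than reading the system clock inside the record, keeps the check deterministic and easy to test.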


23. Java Architecture

Interfaces:

public interface PaymentRouter {
    RouteDecision decide(RoutingContext context);
}

public interface CandidateRouteRepository {
    List<RouteCandidate> findCandidates(RoutingContext context);
}

public interface RouteEligibilityPolicy {
    EligibilityResult evaluate(RouteCandidate candidate, RoutingContext context);
}

public interface RoutingPolicyEngine {
    PolicyEvaluation evaluate(RouteCandidate candidate, RoutingContext context);
}

public interface RouteHealthProvider {
    RouteHealth getHealth(RouteId routeId, OperationType operationType);
}

public interface RouteScoringModel {
    ScoredRoute score(RouteCandidate candidate, RoutingContext context, MetricsSnapshot metrics);
}

Implementation skeleton:

public final class DefaultPaymentRouter implements PaymentRouter {
    private final CandidateRouteRepository candidates;
    private final List<RouteEligibilityPolicy> eligibilityPolicies;
    private final RoutingPolicyEngine policyEngine;
    private final RouteHealthProvider healthProvider;
    private final RouteScoringModel scoringModel;
    private final RouteDecisionRepository decisions;
    private final RoutingClock clock;
    // Constructor omitted for brevity; all collaborators above are injected.

    @Override
    public RouteDecision decide(RoutingContext context) {
        List<RouteCandidate> initial = candidates.findCandidates(context);

        List<EvaluatedRoute> evaluated = initial.stream()
                .map(candidate -> evaluate(candidate, context))
                .toList();

        List<ScoredRoute> scored = evaluated.stream()
                .filter(EvaluatedRoute::selectable)
                .map(e -> scoringModel.score(e.candidate(), context, e.metrics()))
                .sorted(Comparator.comparing(ScoredRoute::score).reversed())
                .toList();

        if (scored.isEmpty()) {
            RouteDecision noRoute = RouteDecision.noRoute(context, evaluated, clock.now());
            decisions.insert(noRoute);
            return noRoute;
        }

        RouteDecision decision = RouteDecision.selected(
                context,
                evaluated,
                scored.get(0),
                clock.now()
        );
        decisions.insert(decision);
        return decision;
    }

    private EvaluatedRoute evaluate(RouteCandidate candidate, RoutingContext context) {
        List<DecisionReason> reasons = new ArrayList<>();

        for (RouteEligibilityPolicy p : eligibilityPolicies) {
            EligibilityResult r = p.evaluate(candidate, context);
            reasons.addAll(r.reasons());
            if (!r.eligible()) {
                return EvaluatedRoute.rejected(candidate, reasons);
            }
        }

        PolicyEvaluation policy = policyEngine.evaluate(candidate, context);
        reasons.addAll(policy.reasons());
        if (!policy.allowed()) {
            return EvaluatedRoute.rejected(candidate, reasons);
        }

        RouteHealth health = healthProvider.getHealth(candidate.routeId(), context.operationType());
        if (health.state() == HealthState.DISABLED_MANUAL || health.state() == HealthState.DISABLED_AUTOMATIC) {
            reasons.add(DecisionReason.providerUnhealthy(health.state()));
            return EvaluatedRoute.rejected(candidate, reasons);
        }

        return EvaluatedRoute.selectable(candidate, reasons, health.toMetricsSnapshot());
    }
}

The router should not call provider APIs.

It decides.

The orchestrator executes.


24. API Shape

Internal route decision API:

POST /internal/routing/decisions
Content-Type: application/json

{
  "paymentAttemptId": "pa_123",
  "merchantId": "m_123",
  "amount": {"valueMinor": 10000, "currency": "USD"},
  "paymentMethod": {
    "family": "card",
    "brand": "visa",
    "issuerCountry": "US",
    "funding": "credit"
  },
  "risk": {
    "level": "medium",
    "require3ds": true
  },
  "retry": {
    "attemptNumber": 1
  }
}

Response:

{
  "routeDecisionId": "rd_123",
  "selectedRouteId": "route_us_cards_adyen_001",
  "providerId": "adyen",
  "policyVersion": "routing-2026.07.02",
  "reason": "best_score_after_eligibility_policy_health",
  "candidates": [
    {
      "routeId": "route_us_cards_adyen_001",
      "eligible": true,
      "score": "0.84210000"
    },
    {
      "routeId": "route_us_cards_provider_b_001",
      "eligible": false,
      "rejectionReasons": ["provider_degraded_timeout_rate"]
    }
  ]
}

The public API should usually not expose the provider route.

Merchant-facing API may expose generic status:

payment method unavailable
payment route unavailable
payment processing unavailable

Do not leak internal provider incident details unless there is a merchant contract reason.


25. Observability

Routing metrics:

route selected count
route rejected count by reason
no route found count
provider health state
auth success by route
decline code by route
technical failure by route
timeout rate by route
fallback count
retry count
cost estimate by route
cost variance after settlement
manual override active count
policy version distribution
shadow policy difference rate

Dashboard questions:

Which routes are receiving traffic right now?
Which routes were disabled automatically?
Which merchants are affected by no-route errors?
Did policy version X reduce auth success?
Did route B reduce cost but increase disputes?
Are retries causing duplicate unknown states?

Logs must include:

payment_attempt_id
route_decision_id
selected_route_id
policy_version
provider_id
candidate_count
rejection_reasons
score components
metric_snapshot_id

Never log raw PAN/card data.


26. Testing Strategy

26.1 Unit Tests

currency unsupported -> route rejected
merchant not onboarded -> route rejected
provider disabled manually -> route rejected
risk requires 3DS -> non-3DS route rejected
hard decline retry -> no fallback route
timeout unknown -> no different-provider retry
cost scoring selects lower cost when success equal
success scoring selects higher success when cost threshold allows

26.2 Golden Decision Tests

Create fixture files:

routing-input-001.json
routing-policy-2026.07.02.json
metrics-snapshot-001.json
expected-decision-001.json

Run deterministic comparison.

Golden tests prevent accidental routing drift.

26.3 Historical Simulation

Replay the last 30 days of traffic through the new policy in dry-run mode.

Measure:

route distribution change
auth success predicted impact
cost predicted impact
no-route rate
risk policy violations
merchant-level impact

26.4 Chaos/Incident Tests

provider A manual disabled -> traffic shifts safely
provider A timeout spike -> automatic degradation
metrics stale -> router uses fallback policy
all routes disabled -> no-route with explainable reason
policy config invalid -> deployment blocked
shadow policy unavailable -> production routing unaffected

27. Failure Model

| Failure | Naive System | Production Routing Engine |
| --- | --- | --- |
| Provider outage | all payments fail | health filter/fallback before request |
| Timeout after provider request | retry elsewhere | mark unknown; resolve first |
| Rule change disables all card routes | checkout outage | validation and simulation block release |
| Low sample success spike | sends all traffic to unstable route | confidence threshold and gradual shift |
| Manual override forgotten | silent permanent behavior | expiry/review/audit |
| Cost model overwritten | cannot explain margin | versioned cost model stored in decision |
| Merchant not onboarded | provider error at execution | eligibility rejects before execution |
| Risk-required 3DS lost | fraud/compliance issue | risk constraints filter route |
| Duplicate route decision | inconsistent provider selection | one decision per payment attempt |

28. Common Anti-Patterns

28.1 Routing Inside Provider Adapter

Bad:

PaymentService calls StripeAdapter
StripeAdapter decides to call AdyenAdapter if failed

Provider adapter should not route.

It should execute one provider contract.

28.2 Routing After Provider Failure Without Outcome Classification

Bad:

catch Exception -> try next provider

This causes duplicate authorizations.

28.3 No Decision Persistence

Bad:

route = router.choose(ctx)
provider.call(route)

If you do not persist the decision, you cannot explain behavior later.

28.4 ML Before Rules

If the team cannot build transparent eligibility, policy, health, and scoring first, ML will amplify confusion.

28.5 Frontend Hardcoded Payment Methods

Checkout display is part of routing. If the frontend hardcodes payment methods, backend policy cannot control availability safely.


29. Build Order

Implement in this order:

1. Define RouteCandidate and RouteDecision domain model.
2. Build route/capability registry.
3. Implement eligibility filters.
4. Implement policy version model.
5. Persist decision evidence.
6. Add manual route override.
7. Add health provider and health filter.
8. Add simple transparent scoring.
9. Add retry-aware/fallback-safe policy.
10. Add cost model versioning.
11. Add metrics and dashboards.
12. Add shadow routing.
13. Add historical simulation.
14. Add controlled experimentation.
15. Only then consider adaptive/ML routing.

Do not begin with a dynamic optimization algorithm.

Begin with deterministic correctness.


30. Readiness Checklist

A production routing engine is not ready until:

[ ] route is more precise than provider ID
[ ] route decision is persisted before execution
[ ] every rejected route has reason codes
[ ] routing policy is versioned
[ ] cost model is versioned
[ ] metric snapshot is referenced
[ ] manual override exists with audit and expiry
[ ] provider health is operation-specific
[ ] retry policy prevents duplicate financial effects
[ ] risk constraints are applied before route selection
[ ] no-route response is explainable
[ ] rule changes can be simulated on historical traffic
[ ] shadow routing exists for new policy/model
[ ] dashboards show route distribution and failures
[ ] incident kill switch exists
[ ] checkout payment method display is policy-controlled

31. What You Should Internalize

Payment routing is not an optimization afterthought.

It is where business, risk, reliability, cost, compliance, and customer experience meet.

A strong routing engine does not merely choose the provider with the highest success rate.

It produces an explainable execution plan:

eligible
allowed
healthy
scored
selected
persisted
observable
safe to retry/fallback

The production mindset:

Do not route money through a path you cannot explain.
Do not retry money through a path when the previous outcome is unknown.
Do not change routing policy without simulation, audit, and rollback.

That is the difference between a payment gateway wrapper and a payment orchestration platform.

