Risk-Based Authentication

Learn Java Authentication Pattern - Part 031

Risk-based authentication for Java engineers: risk signals, defensible device fingerprinting, velocity, impossible travel, risk scoring, adaptive challenges, privacy boundaries, data model, Spring/Jakarta implementation, observability, testing, and failure modes.


Part 031 — Risk-Based Authentication

Goal of this part: build risk-based authentication that holds up in real systems. Not just "if the IP differs, ask for an OTP", but a system with clear signals, scoring, decisions, challenge orchestration, audit, privacy boundaries, and a failure model.

Risk-based authentication is a pattern for adapting the authentication process to the risk of the login context.

The simplest form:

Normal login:
  password/passkey/session/token valid -> authenticated

Risk-based login:
  credential valid + context low risk -> authenticated
  credential valid + context medium risk -> step-up challenge
  credential valid + context high risk -> deny / hold / recovery review

The problem: many risk-based auth implementations fail because the risk engine is treated like a small if-else.

A bad example:

if (!request.ip().equals(lastLoginIp)) {
  requireOtp();
}

This is fragile.

VPNs, mobile carrier NAT, corporate proxies, IPv6 rotation, travel, remote work, and privacy relays all mean that an IP address is not an identity. An IP is a signal, not proof.

A more accurate mental model:

Authentication decision = proof + context + policy + history + risk tolerance

The credential proves control over an authenticator. The risk engine judges whether the context in which the authenticator is used is plausible. Step-up raises assurance when the context cannot be trusted enough.


1. The Problem Being Solved

Ordinary authentication answers:

Can the claimant prove control over an authenticator?

Risk-based authentication adds a question:

Are the method, location, device, time, pattern, and target action plausible for this subject?

Example cases:

| Case | Credential valid? | Risk | Response |
| --- | --- | --- | --- |
| User logs in from the usual device and usual location | Yes | Low | Allow |
| User logs in from a new browser after a correct password | Yes | Medium | Step-up MFA |
| Login from a new country 3 minutes after a domestic login | Yes | High | Deny / hold |
| Old session tries to change an MFA factor | Yes | High | Reauthentication + existing factor |
| Valid API token used from a region never seen before | Yes | Medium/High | Limit / revoke / alert |
| Many failed logins across accounts from the same IP | No | Abuse | Throttle / block |

Risk-based auth does not replace authentication.

It governs the assurance transition.

credential proof -> baseline assurance
context evaluation -> risk classification
policy decision -> allow / step-up / deny / review

2. Core Invariant

Risk-based authentication must preserve the following invariants.

Invariant 1: Risk is advisory until policy consumes it

A risk score must not automatically become a decision without a policy.

Risk engine: "login risk = 72 / high"
Policy engine: "for password login, high risk requires passkey or recovery review"

Why?

Because the same risk can call for different decisions.

| Context (action at high risk) | Decision |
| --- | --- |
| View public dashboard | Allow + log |
| Export customer data | Step-up |
| Change recovery email | Reauth + existing MFA + delay |
| Create admin API key | Step-up + approval |
| Suspected account takeover | Deny + reset session family |
Invariant 2 — Risk signal is not identity

IP address, user agent, geolocation, ASN, timezone, device fingerprint, and behavior are not the principal.

They are only signals.

Principal = the subject successfully proven through an authenticator.
Signal = an environmental fact that helps assess anomalies.

Invariant 3 — New device is not always attacker

If the policy is too aggressive, legitimate users get locked out.

Risk-based auth must offer a safe recovery path:

new device -> step-up -> verified -> bind device -> reduce future friction

Not:

new device -> deny forever

Invariant 4 — Recovery must not bypass risk

Attackers often target recovery because recovery tends to be weaker than normal login.

forgot password
change email
replace MFA
disable passkey
invite admin
rotate API key

All of these must pass through the risk policy.

Invariant 5 — Risk data must be explainable enough for audit

For enterprise or regulated systems, decisions must be explainable.

Not:

denied because model said 0.87392

At minimum:

Denied because impossible travel + new device + high-risk ASN + failed MFA attempts.

Invariant 6 — Do not leak risk decision details to attacker

Internal audit records can be detailed. User-facing responses must be careful.

Bad:
  "Login blocked because your IP is from ASN 12345 and impossible travel was detected."

Better:
  "We could not verify this sign-in. Please complete additional verification."

Invariant 7 — Step-up must actually raise assurance

Asking for a factor that is easy to steal does not necessarily raise assurance.

A weak example:

Password compromised -> send OTP to compromised mailbox -> allow MFA reset

Step-up must take the independence of factors into account.
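One way to make factor independence concrete is to record what each factor ultimately relies on and reject a step-up factor that shares a dependency with something already suspected. The sketch below is only an illustration: `FactorIndependence`, the `Dependency` categories, and `raisesAssurance` are hypothetical names, not part of any library.

```java
import java.util.Set;

/** Hypothetical sketch: a step-up factor only helps if it is independent of what is suspected. */
final class FactorIndependence {

    /** What a factor ultimately relies on (illustrative categories, assumed for this sketch). */
    enum Dependency { PASSWORD, EMAIL_MAILBOX, PHONE_NUMBER, DEVICE_KEY }

    record Factor(String name, Set<Dependency> dependsOn) {}

    /** True when none of the factor's dependencies are in the suspected set. */
    static boolean raisesAssurance(Factor factor, Set<Dependency> suspected) {
        return factor.dependsOn().stream().noneMatch(suspected::contains);
    }

    public static void main(String[] args) {
        Factor emailOtp = new Factor("EMAIL_OTP", Set.of(Dependency.EMAIL_MAILBOX));
        Factor passkey = new Factor("PASSKEY", Set.of(Dependency.DEVICE_KEY));
        // Password and mailbox are both suspected, as in the weak example above.
        Set<Dependency> suspected = Set.of(Dependency.PASSWORD, Dependency.EMAIL_MAILBOX);

        System.out.println(raisesAssurance(emailOtp, suspected)); // false
        System.out.println(raisesAssurance(passkey, suspected));  // true
    }
}
```

A real policy would probably derive the suspected set from reason codes rather than hard-coding it.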


3. Risk-Based Auth in the Authentication Lifecycle

The risk engine can be used at many points.

Risk evaluation also happens after login.

Examples of sensitive actions:

  • change password
  • change email
  • add/remove MFA factor
  • create API key
  • export data
  • update payout bank account
  • approve enforcement action
  • impersonate user
  • grant admin role

4. Data Model

Risk-based authentication needs historical data. But that data must be limited so it does not become a privacy liability.

4.1 Core Tables

CREATE TABLE auth_risk_event (
  id UUID PRIMARY KEY,
  tenant_id UUID NOT NULL,
  subject_id UUID,
  account_id UUID,
  event_type TEXT NOT NULL,
  occurred_at TIMESTAMPTZ NOT NULL,

  ip_hash BYTEA,
  ip_prefix TEXT,
  asn INTEGER,
  country_code CHAR(2),
  region_code TEXT,
  city_hash BYTEA,

  user_agent_hash BYTEA,
  browser_family TEXT,
  os_family TEXT,
  device_id UUID,
  device_known BOOLEAN,

  credential_type TEXT,
  authenticator_id UUID,
  assurance_before TEXT,
  assurance_after TEXT,

  risk_score INTEGER,
  risk_level TEXT,
  decision TEXT,
  reason_codes TEXT[] NOT NULL DEFAULT '{}',

  correlation_id TEXT,
  request_id TEXT,
  trace_id TEXT
);

CREATE INDEX idx_auth_risk_event_subject_time
  ON auth_risk_event (tenant_id, subject_id, occurred_at DESC);

CREATE INDEX idx_auth_risk_event_ip_time
  ON auth_risk_event (tenant_id, ip_prefix, occurred_at DESC);

Notes:

  • Do not store raw IPs unless necessary.
  • Store prefixes or hashed values for abuse correlation.
  • Store reason_codes, not just the score.
  • Apply a retention policy.
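The notes above can be sketched as a small helper that derives a coarse prefix and a keyed hash from an address before anything is persisted. This is a minimal sketch under assumptions: the class name `IpMinimizer`, the /24 and /48 prefix widths, and the choice of HMAC-SHA256 are illustrative, and the input is assumed to be a literal IP address, not a hostname.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper that reduces a raw IP to values suitable for ip_prefix and ip_hash. */
final class IpMinimizer {

    /** Parses a literal IP; for literals, getByName only validates the format (no DNS lookup). */
    private static byte[] parse(String ipLiteral) {
        try {
            return InetAddress.getByName(ipLiteral).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid IP literal: " + ipLiteral, e);
        }
    }

    /** Coarse prefix: keeps 24 bits for IPv4 and 48 bits for IPv6 (assumed widths). */
    static String prefixOf(String ipLiteral) {
        byte[] addr = parse(ipLiteral);
        int keepBytes = addr.length == 4 ? 3 : 6;
        byte[] masked = new byte[addr.length];
        System.arraycopy(addr, 0, masked, 0, keepBytes);
        try {
            return InetAddress.getByAddress(masked).getHostAddress() + "/" + (keepBytes * 8);
        } catch (UnknownHostException e) {
            throw new IllegalStateException(e); // unreachable: length is 4 or 16
        }
    }

    /** Keyed hash of the address bytes, so stored values cannot be brute-forced without the key. */
    static byte[] keyedHash(String ipLiteral, byte[] secretKey) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secretKey, "HmacSHA256"));
            return mac.doFinal(parse(ipLiteral));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }
}
```

A keyed hash is used here because the IPv4 space is small enough to enumerate against a plain hash. The key would normally come from secret management and be rotated together with the retention window.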

4.2 Known Device

CREATE TABLE trusted_device (
  id UUID PRIMARY KEY,
  tenant_id UUID NOT NULL,
  subject_id UUID NOT NULL,
  device_binding_id BYTEA NOT NULL,
  display_name TEXT,
  first_seen_at TIMESTAMPTZ NOT NULL,
  last_seen_at TIMESTAMPTZ NOT NULL,
  last_ip_prefix TEXT,
  last_country_code CHAR(2),
  assurance_level TEXT NOT NULL,
  trust_state TEXT NOT NULL,
  revoked_at TIMESTAMPTZ,
  revoked_reason TEXT,
  UNIQUE (tenant_id, subject_id, device_binding_id)
);

device_binding_id should preferably not be an aggressive passive fingerprint.

Prefer:

server-issued device cookie + signed binding + rotation + revocation

Over:

canvas fingerprint + font list + audio fingerprint + invasive tracking

For enterprise systems, extreme passive fingerprinting is usually a privacy problem, unstable, and hard to explain.

4.3 Risk Profile Snapshot

CREATE TABLE subject_risk_profile (
  tenant_id UUID NOT NULL,
  subject_id UUID NOT NULL,
  updated_at TIMESTAMPTZ NOT NULL,

  usual_country_codes TEXT[] NOT NULL DEFAULT '{}',
  usual_asns INTEGER[] NOT NULL DEFAULT '{}',
  known_device_count INTEGER NOT NULL DEFAULT 0,
  last_successful_login_at TIMESTAMPTZ,
  last_mfa_success_at TIMESTAMPTZ,
  last_password_change_at TIMESTAMPTZ,
  last_recovery_change_at TIMESTAMPTZ,

  failed_login_count_1h INTEGER NOT NULL DEFAULT 0,
  failed_login_count_24h INTEGER NOT NULL DEFAULT 0,
  account_takeover_suspected BOOLEAN NOT NULL DEFAULT FALSE,

  PRIMARY KEY (tenant_id, subject_id)
);

The risk profile is a cache/read model.

The source of truth remains the events.


5. Risk Signal Taxonomy

Risk signals should be classified.

5.1 Network Signals

| Signal | Example | Strength | Risk of being wrong |
| --- | --- | --- | --- |
| IP reputation | known proxy/botnet/VPN | Medium | Medium |
| ASN | cloud provider / residential ISP | Medium | Medium |
| Geo country | Indonesia, Singapore, US | Medium | High for VPN/travel |
| Impossible travel | Jakarta -> London in 10 minutes | High | Medium |
| IP velocity | many accounts from the same prefix | High for abuse | Medium |

An IP is not proof of an attacker. But IP velocity is very useful for abuse detection.
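IP velocity can be sketched as a sliding window that counts how many distinct accounts have failed from one prefix. The class below is an in-memory illustration only; `PrefixVelocityTracker` and its method names are hypothetical, it assumes attempts are recorded in time order, and a production system would more likely keep these counters in a shared store.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory tracker: distinct accounts failing from one IP prefix within a window. */
final class PrefixVelocityTracker {

    private record Attempt(Instant at, String accountId) {}

    private final Duration window;
    private final Map<String, Deque<Attempt>> attemptsByPrefix = new HashMap<>();

    PrefixVelocityTracker(Duration window) {
        this.window = window;
    }

    /** Records a failed attempt and returns the number of distinct accounts in the window. */
    synchronized int recordFailure(String ipPrefix, String accountId, Instant now) {
        Deque<Attempt> attempts = attemptsByPrefix.computeIfAbsent(ipPrefix, k -> new ArrayDeque<>());
        attempts.addLast(new Attempt(now, accountId));

        // Evict attempts that fell out of the window (assumes calls arrive in time order).
        Instant cutoff = now.minus(window);
        while (!attempts.isEmpty() && attempts.peekFirst().at().isBefore(cutoff)) {
            attempts.removeFirst();
        }

        Set<String> distinctAccounts = new HashSet<>();
        for (Attempt attempt : attempts) {
            distinctAccounts.add(attempt.accountId());
        }
        return distinctAccounts.size();
    }
}
```

The returned count is a signal for throttling decisions; consistent with the point above, it says nothing about who the caller is.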

5.2 Device Signals

| Signal | Meaning | Notes |
| --- | --- | --- |
| Known device cookie | Previously verified | More stable than a passive fingerprint |
| User-Agent family | Browser/OS | Easily changed and spoofed |
| WebAuthn credential | Strong device-bound credential | Good for phishing resistance |
| Mobile app device binding | Device keypair / attestation | Strong if designed correctly |
| Device trust age | How long the device has been known | Useful for step-up |

Device trust must be revocable.

trusted today does not mean trusted forever

5.3 Account Behavior Signals

| Signal | Meaning |
| --- | --- |
| Failed attempts spike | Credential stuffing or brute force |
| Password reset just happened | Increased account takeover risk |
| MFA factor recently changed | High risk for sensitive action |
| New email added | High risk for recovery path |
| Dormant account suddenly active | Risk may be elevated |
| Admin privileges newly granted | Step-up for critical operations |

5.4 Session Signals

| Signal | Meaning |
| --- | --- |
| Session age | Reauth needed for sensitive action |
| Last MFA time | Step-up freshness |
| Authenticator type | Password vs passkey vs mTLS |
| Assurance level | Current confidence |
| Device binding mismatch | Possible hijack |
| IP/ASN jump mid-session | Possible proxy/mobile/VPN/hijack |

5.5 Action Signals

Risk does not depend on login alone.

The action determines the risk tolerance.

| Action | Risk tolerance |
| --- | --- |
| View own profile | Low |
| Change password | Medium |
| Change recovery email | High |
| Disable MFA | High |
| Create admin user | Critical |
| Export regulated data | Critical |
| Approve enforcement sanction | Critical |

6. An Operable Risk Score Model

Do not start with machine learning.

Start with deterministic scoring that can be tested.

// Imports needed by the listings in this section.
import java.util.ArrayList;
import java.util.List;

public enum RiskLevel {
    LOW,
    MEDIUM,
    HIGH,
    CRITICAL
}

public record RiskReason(
    String code,
    int weight,
    String description
) {}

public record RiskAssessment(
    int score,
    RiskLevel level,
    List<RiskReason> reasons
) {
    public boolean hasReason(String code) {
        return reasons.stream().anyMatch(r -> r.code().equals(code));
    }
}

Example evaluator:

public final class AuthenticationRiskEvaluator {

    public RiskAssessment evaluate(AuthRiskInput input) {
        List<RiskReason> reasons = new ArrayList<>();

        if (!input.knownDevice()) {
            reasons.add(new RiskReason(
                "NEW_DEVICE",
                25,
                "Login from device not previously verified for subject"
            ));
        }

        if (input.impossibleTravelDetected()) {
            reasons.add(new RiskReason(
                "IMPOSSIBLE_TRAVEL",
                60,
                "Login location is inconsistent with recent successful login"
            ));
        }

        if (input.highRiskAsn()) {
            reasons.add(new RiskReason(
                "HIGH_RISK_ASN",
                20,
                "Network ASN is associated with elevated abuse risk"
            ));
        }

        if (input.failedAttemptsForSubjectLastHour() >= 5) {
            reasons.add(new RiskReason(
                "SUBJECT_FAILED_ATTEMPT_SPIKE",
                30,
                "Subject has many recent failed authentication attempts"
            ));
        }

        if (input.passwordChangedRecently()) {
            reasons.add(new RiskReason(
                "RECENT_PASSWORD_CHANGE",
                15,
                "Password was changed recently"
            ));
        }

        int score = reasons.stream().mapToInt(RiskReason::weight).sum();
        return new RiskAssessment(score, classify(score, reasons), reasons);
    }

    private RiskLevel classify(int score, List<RiskReason> reasons) {
        if (reasons.stream().anyMatch(r -> r.code().equals("IMPOSSIBLE_TRAVEL")) && score >= 80) {
            return RiskLevel.CRITICAL;
        }
        if (score >= 70) return RiskLevel.HIGH;
        if (score >= 35) return RiskLevel.MEDIUM;
        return RiskLevel.LOW;
    }
}

Why do reason codes matter?

Because audit, debugging, and policy all need an explanation.

score=75 is not actionable
reasons=[NEW_DEVICE, HIGH_RISK_ASN, SUBJECT_FAILED_ATTEMPT_SPIKE] is actionable

7. Risk Policy Engine

The risk score does not determine the outcome directly. The policy does.

public enum AuthDecisionType {
    ALLOW,
    STEP_UP,
    DENY,
    HOLD_FOR_REVIEW
}

public record AuthDecision(
    AuthDecisionType type,
    String requiredAssuranceLevel,
    Set<String> allowedChallengeTypes,
    String userMessageCode,
    List<String> auditReasonCodes
) {}

Example policy:

public final class AuthenticationRiskPolicy {

    public AuthDecision decide(LoginContext ctx, RiskAssessment risk) {
        if (risk.level() == RiskLevel.CRITICAL) {
            return new AuthDecision(
                AuthDecisionType.DENY,
                "AAL2",
                Set.of(),
                "AUTH_VERIFICATION_FAILED",
                reasonCodes(risk)
            );
        }

        if (risk.level() == RiskLevel.HIGH) {
            return new AuthDecision(
                AuthDecisionType.STEP_UP,
                "AAL2",
                Set.of("PASSKEY", "TOTP", "RECOVERY_REVIEW"),
                "AUTH_ADDITIONAL_VERIFICATION_REQUIRED",
                reasonCodes(risk)
            );
        }

        if (risk.level() == RiskLevel.MEDIUM && ctx.credentialType().equals("PASSWORD")) {
            return new AuthDecision(
                AuthDecisionType.STEP_UP,
                "AAL2",
                Set.of("TOTP", "PASSKEY"),
                "AUTH_ADDITIONAL_VERIFICATION_REQUIRED",
                reasonCodes(risk)
            );
        }

        return new AuthDecision(
            AuthDecisionType.ALLOW,
            ctx.currentAssuranceLevel(),
            Set.of(),
            "AUTH_OK",
            reasonCodes(risk)
        );
    }

    private List<String> reasonCodes(RiskAssessment risk) {
        return risk.reasons().stream().map(RiskReason::code).toList();
    }
}

8. Assurance Level Model

Risk-based auth makes more sense when the session or token carries an assurance level.

public enum AssuranceLevel {
    ANONYMOUS,
    AAL1,
    AAL2,
    AAL3
}

Simplified mapping:

| Evidence | Approx. assurance |
| --- | --- |
| Anonymous request | Anonymous |
| Password only | AAL1-ish |
| Password + TOTP | AAL2-ish |
| Passkey with user verification | Strong phishing-resistant AAL2-ish / potentially stronger depending on policy |
| Hardware-backed phishing-resistant authenticator under strict policy | Higher assurance candidate |
| mTLS workload credential | Service assurance, not user assurance |

Do not claim compliance casually.

Formal AAL depends on enrollment, proofing, authenticator requirements, verifier requirements, lifecycle, and federation profile. As an internal engineering model, though, AAL1/AAL2/AAL3 is useful for avoiding a boolean authenticated=true that carries too little information.

public record AuthSession(
    UUID sessionId,
    UUID subjectId,
    AssuranceLevel assuranceLevel,
    Instant authenticatedAt,
    Instant lastStepUpAt,
    Set<String> completedFactors,
    UUID deviceId
) {}

Sensitive actions can require assurance freshness.

public record AssuranceRequirement(
    AssuranceLevel minimumLevel,
    Duration maxAge,
    Set<String> acceptedFactors
) {}

Examples:

Change display name:
  minimum=AAL1, maxAge=24h

Change password:
  minimum=AAL2, maxAge=15m

Disable MFA:
  minimum=AAL2, maxAge=5m, acceptedFactors must include existing enrolled factor

Create admin API key:
  minimum=AAL2, maxAge=5m, plus admin approval
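The requirements above can be evaluated with a small framework-free check. This is a sketch under assumptions: it redeclares simplified versions of the types so the block stands alone, `SessionAssurance` and `satisfies` are illustrative names, and it treats acceptedFactors as "any one of these is enough", which is one possible reading of the requirement.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Set;

/** Hypothetical, framework-free check of a session against an assurance requirement. */
final class AssuranceCheck {

    enum AssuranceLevel { ANONYMOUS, AAL1, AAL2, AAL3 }

    record AssuranceRequirement(AssuranceLevel minimumLevel, Duration maxAge, Set<String> acceptedFactors) {}

    record SessionAssurance(AssuranceLevel level, Instant lastVerifiedAt, Set<String> completedFactors) {}

    static boolean satisfies(SessionAssurance session, AssuranceRequirement requirement, Instant now) {
        // Enum order doubles as assurance order: ANONYMOUS < AAL1 < AAL2 < AAL3.
        boolean levelOk = session.level().compareTo(requirement.minimumLevel()) >= 0;

        // Fresh means the last verification is no older than maxAge.
        boolean fresh = session.lastVerifiedAt() != null
            && !session.lastVerifiedAt().isBefore(now.minus(requirement.maxAge()));

        // Assumption: an empty accepted set means "no factor constraint".
        boolean factorOk = requirement.acceptedFactors().isEmpty()
            || session.completedFactors().stream().anyMatch(requirement.acceptedFactors()::contains);

        return levelOk && fresh && factorOk;
    }
}
```

Passing `now` explicitly instead of calling `Instant.now()` inside the check keeps the freshness rule deterministic in tests.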

9. Step-Up Orchestration

Step-up is not a random redirect.

Step-up is a state machine.

A step-up challenge must be bound to:

  • subject
  • session id
  • tenant
  • action
  • redirect/continuation target
  • expiration time
  • nonce/challenge id

CREATE TABLE step_up_challenge (
  id UUID PRIMARY KEY,
  tenant_id UUID NOT NULL,
  subject_id UUID NOT NULL,
  session_id UUID NOT NULL,
  action_code TEXT NOT NULL,
  challenge_type TEXT NOT NULL,
  status TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL,
  expires_at TIMESTAMPTZ NOT NULL,
  verified_at TIMESTAMPTZ,
  attempt_count INTEGER NOT NULL DEFAULT 0,
  max_attempts INTEGER NOT NULL,
  risk_reason_codes TEXT[] NOT NULL DEFAULT '{}',
  continuation_token_hash BYTEA NOT NULL
);

The continuation target must not be an arbitrary raw URL.

Use a signed continuation token or a server-side record.
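The state-machine view of step-up can be sketched as a small class whose terminal states never change again. This is an illustration only: the status names and `StepUpChallengeState` are assumptions that loosely mirror the status, attempt_count, max_attempts, and expires_at columns above, and persistence is left out.

```java
import java.time.Instant;

/** Hypothetical in-memory model of one step-up challenge and its allowed transitions. */
final class StepUpChallengeState {

    enum Status { PENDING, VERIFIED, FAILED, EXPIRED }

    private final int maxAttempts;
    private final Instant expiresAt;
    private Status status = Status.PENDING;
    private int attemptCount = 0;

    StepUpChallengeState(int maxAttempts, Instant expiresAt) {
        this.maxAttempts = maxAttempts;
        this.expiresAt = expiresAt;
    }

    Status status() {
        return status;
    }

    /** Applies one verification attempt and returns the resulting status. */
    Status attempt(boolean proofValid, Instant now) {
        if (status != Status.PENDING) {
            return status; // VERIFIED, FAILED, and EXPIRED are terminal
        }
        if (!now.isBefore(expiresAt)) {
            status = Status.EXPIRED; // expiry wins even if the proof would be valid
            return status;
        }
        attemptCount++;
        if (proofValid) {
            status = Status.VERIFIED;
        } else if (attemptCount >= maxAttempts) {
            status = Status.FAILED;
        }
        return status;
    }
}
```

Because a challenge can only leave PENDING once, replaying a verified or failed challenge has no effect, which is the property the bindings listed above are meant to protect.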


10. Device Binding Pattern

Device trust is often misunderstood.

There are three common approaches.

10.1 Passive Fingerprint

Combines user-agent, screen size, fonts, canvas, timezone, and so on.

Weaknesses:

  • high privacy risk
  • unstable
  • can be spoofed
  • hard to explain
  • often affected by browser privacy protections

Use it only with great care, as a low-confidence signal.

10.2 Server-Issued Device Cookie

After a successful step-up, the server creates a device binding.

device_cookie = random_id + signature / encrypted envelope
server stores hash(random_id)

Notes:

  • A device cookie is not a session cookie.
  • A device cookie does not prove the user is present.
  • A device cookie can be stolen if the endpoint or browser is compromised.
  • A device cookie must be revocable.
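The random-id-plus-stored-hash idea can be sketched with the JDK alone. This is a simplified illustration: `DeviceCookieSketch` and its methods are hypothetical, the signature or encrypted envelope mentioned above is deliberately left out, and mapping the stored hash to trusted_device.device_binding_id is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

/** Hypothetical sketch: issue a random device cookie value and keep only its hash server-side. */
final class DeviceCookieSketch {

    private static final SecureRandom RANDOM = new SecureRandom();

    /** Generates a 256-bit random value, URL-safe Base64 encoded, to place in the cookie. */
    static String newCookieValue() {
        byte[] id = new byte[32];
        RANDOM.nextBytes(id);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(id);
    }

    /** The server stores only this SHA-256 hash, never the cookie value itself. */
    static byte[] storedHash(String cookieValue) {
        try {
            return MessageDigest.getInstance("SHA-256")
                .digest(cookieValue.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    /** Compares a presented cookie against the stored hash in constant time. */
    static boolean matches(String presentedCookieValue, byte[] stored) {
        return MessageDigest.isEqual(storedHash(presentedCookieValue), stored);
    }
}
```

Storing only the hash means a leaked table does not directly yield usable cookies. Revocation then amounts to marking or deleting the stored row.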

10.3 Public-Key Device Binding

The device generates a keypair; the server stores the public key; the device signs a challenge.

This is stronger, but more complex to implement.

WebAuthn/passkeys are the standard example in a browser context. Mobile apps can use the platform keystore or secure enclave, but must be careful with attestation and portability.


11. Impossible Travel

Impossible travel looks simple, but it has many edge cases.

Pseudo logic:

public boolean impossibleTravel(LoginLocation previous, LoginLocation current, Instant previousAt, Instant currentAt) {
    if (previous == null || current == null) return false;
    if (previous.countryCode().equals(current.countryCode())) return false;

    double distanceKm = haversine(previous.lat(), previous.lon(), current.lat(), current.lon());
    double hours = Duration.between(previousAt, currentAt).toMinutes() / 60.0;
    if (hours <= 0.0) return true;

    double speedKmh = distanceKm / hours;
    return speedKmh > 900; // intentionally rough threshold
}
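The pseudo logic calls a haversine helper that is not shown. A minimal sketch of the standard great-circle formula follows; the class name `GeoDistance` and the mean Earth radius of 6371 km are assumptions made for this illustration.

```java
/** Hypothetical helper: great-circle distance between two coordinates using the haversine formula. */
final class GeoDistance {

    private static final double EARTH_RADIUS_KM = 6371.0; // mean radius, an approximation

    /** Returns the distance in kilometers between two points given in decimal degrees. */
    static double haversine(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
            + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
            * Math.pow(Math.sin(dLon / 2), 2);
        // Clamp to 1.0 to guard against floating-point rounding before asin.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }
}
```

IP geolocation is usually far less precise than this formula, so the rough speed threshold in the pseudo logic matters more than the exact distance.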

Problems:

| Edge case | Impact |
| --- | --- |
| VPN | Geo appears remote |
| Corporate proxy | Many users share egress |
| Mobile carrier NAT | IP geo inaccurate |
| Privacy relay | Location distorted |
| Cloud desktop/VDI | Login location != user physical location |
| Travel | Legitimate anomaly |

So impossible travel should raise risk, not trigger an automatic permanent block.

impossible travel + known passkey + verified device -> step-up
impossible travel + password only + new device + high-risk ASN -> deny/hold

12. Spring Security Integration

Risk-based auth can be integrated at several points.

12.1 After Credential Verification

For form/password login, a custom AuthenticationSuccessHandler can evaluate risk before the final session is established.

Simplified pattern:

@Component
public final class RiskAwareAuthenticationSuccessHandler implements AuthenticationSuccessHandler {

    private final RiskService riskService;
    private final StepUpService stepUpService;
    private final AuthenticationSuccessHandler defaultSuccessHandler;

    public RiskAwareAuthenticationSuccessHandler(
        RiskService riskService,
        StepUpService stepUpService
    ) {
        this.riskService = riskService;
        this.stepUpService = stepUpService;
        this.defaultSuccessHandler = new SavedRequestAwareAuthenticationSuccessHandler();
    }

    @Override
    public void onAuthenticationSuccess(
        HttpServletRequest request,
        HttpServletResponse response,
        Authentication authentication
    ) throws IOException, ServletException {

        LoginRiskContext ctx = LoginRiskContext.from(request, authentication);
        RiskAssessment assessment = riskService.evaluate(ctx);
        AuthDecision decision = riskService.decide(ctx, assessment);

        if (decision.type() == AuthDecisionType.ALLOW) {
            riskService.recordDecision(ctx, assessment, decision);
            defaultSuccessHandler.onAuthenticationSuccess(request, response, authentication);
            return;
        }

        if (decision.type() == AuthDecisionType.STEP_UP) {
            StepUpChallenge challenge = stepUpService.createChallenge(ctx, assessment, decision);
            riskService.recordDecision(ctx, assessment, decision);
            response.sendRedirect("/auth/step-up?challenge=" + challenge.publicId());
            return;
        }

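        // Deny: drop the authentication. The security context may already have been
        // saved to the HTTP session before this handler ran, so invalidate that too.
        SecurityContextHolder.clearContext();
        HttpSession session = request.getSession(false);
        if (session != null) {
            session.invalidate();
        }
        riskService.recordDecision(ctx, assessment, decision);
        response.sendRedirect("/login?error=verification_required");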
    }
}

Important caveat:

If credential succeeded but step-up is still pending, do not treat the user as fully authenticated for the whole app.

Patterns:

  1. Store partial authentication in dedicated temporary state, not normal full session.
  2. Or create session with low assurance and block protected actions until step-up.
  3. Avoid giving normal application authorities before step-up is complete.

12.2 Authorization Manager for Sensitive Actions

Sensitive action needs assurance-aware authorization.

public final class AssuranceAuthorizationManager
        implements AuthorizationManager<RequestAuthorizationContext> {

    private final AssuranceRequirement requirement;

    public AssuranceAuthorizationManager(AssuranceRequirement requirement) {
        this.requirement = requirement;
    }

    @Override
    public AuthorizationDecision check(
        Supplier<Authentication> authentication,
        RequestAuthorizationContext context
    ) {
        Authentication auth = authentication.get();
        if (auth == null || !auth.isAuthenticated()) {
            return new AuthorizationDecision(false);
        }

        AuthenticatedPrincipal principal = (AuthenticatedPrincipal) auth.getPrincipal();
        SessionAssurance assurance = principal.assurance();

        boolean sufficientLevel = assurance.level().compareTo(requirement.minimumLevel()) >= 0;
        boolean fresh = assurance.lastVerifiedAt().isAfter(Instant.now().minus(requirement.maxAge()));
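        // Any one accepted factor is enough; an empty set means no factor constraint.
        boolean factorOk = requirement.acceptedFactors().isEmpty()
            || assurance.completedFactors().stream().anyMatch(requirement.acceptedFactors()::contains);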

        return new AuthorizationDecision(sufficientLevel && fresh && factorOk);
    }
}

If the check fails because of insufficient assurance, do not return a plain 403. Redirect to step-up, or return a structured 401 step_up_required for APIs.

12.3 OAuth2 Resource Server

For APIs, the risk decision is usually not made on every request because it is expensive.

Approach:

  • validate the token on every request
  • evaluate risk at token issuance / token exchange / sensitive endpoints
  • use token claims for assurance and auth time
  • enforce reauth/step-up for sensitive actions

Example claims:

{
  "iss": "https://id.example.com/realms/acme",
  "sub": "01J...",
  "aud": "case-management-api",
  "scope": "case:read case:approve",
  "acr": "aal2",
  "amr": ["pwd", "otp"],
  "auth_time": 1783090100,
  "risk_level": "low",
  "tenant_id": "acme"
}

Do not trust arbitrary risk_level claim unless token issuer is trusted and audience is correct.
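Once the token signature, issuer, and audience have been validated, a sensitive endpoint can check the acr and auth_time claims shown above. The sketch below works on a plain claims map and is not tied to any JWT library; `StepUpClaimCheck` and its parameters are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Set;

/** Hypothetical check of already-validated token claims against a step-up requirement. */
final class StepUpClaimCheck {

    /** True when acr is one of the accepted values and auth_time is within maxAuthAge. */
    static boolean satisfies(Map<String, Object> claims, Set<String> acceptedAcr,
                             Duration maxAuthAge, Instant now) {
        Object acr = claims.get("acr");
        Object authTime = claims.get("auth_time");
        if (!(acr instanceof String) || !(authTime instanceof Number)) {
            return false; // missing or malformed claims never satisfy the requirement
        }
        // auth_time is expressed in seconds since the epoch.
        Instant authenticatedAt = Instant.ofEpochSecond(((Number) authTime).longValue());
        boolean fresh = !authenticatedAt.isBefore(now.minus(maxAuthAge));
        return acceptedAcr.contains((String) acr) && fresh;
    }
}
```

When this returns false, the endpoint would answer with a step-up challenge, not a generic authorization failure.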


13. Jakarta Security Integration

With Jakarta Security, risk can be placed inside custom HttpAuthenticationMechanism.

@ApplicationScoped
public class RiskAwareAuthenticationMechanism implements HttpAuthenticationMechanism {

    @Inject IdentityStoreHandler identityStoreHandler;
    @Inject RiskService riskService;
    @Inject StepUpService stepUpService;

    @Override
    public AuthenticationStatus validateRequest(
        HttpServletRequest request,
        HttpServletResponse response,
        HttpMessageContext context
    ) throws AuthenticationException {

        Optional<UsernamePasswordCredential> credential = extractCredential(request);
        if (credential.isEmpty()) {
            return context.doNothing();
        }

        CredentialValidationResult result = identityStoreHandler.validate(credential.get());
        if (result.getStatus() != CredentialValidationResult.Status.VALID) {
            riskService.recordFailure(request);
            return context.responseUnauthorized();
        }

        LoginRiskContext riskContext = LoginRiskContext.from(request, result);
        RiskAssessment assessment = riskService.evaluate(riskContext);
        AuthDecision decision = riskService.decide(riskContext, assessment);

        if (decision.type() == AuthDecisionType.ALLOW) {
            return context.notifyContainerAboutLogin(
                result.getCallerPrincipal(),
                result.getCallerGroups()
            );
        }

        if (decision.type() == AuthDecisionType.STEP_UP) {
            stepUpService.createChallenge(riskContext, assessment, decision);
            return context.redirect("/auth/step-up");
        }

        return context.responseUnauthorized();
    }
}

Principle:

credential validation remains separate from risk policy
risk policy decides final authentication status / assurance level

14. API Response Semantics

For browser:

low risk -> 302 /home
medium risk -> 302 /auth/step-up
high risk -> generic verification failure

For API:

HTTP/1.1 401 Unauthorized
Content-Type: application/json
WWW-Authenticate: Bearer error="insufficient_user_authentication", error_description="additional verification required"

{
  "error": "step_up_required",
  "challenge_uri": "/auth/challenges/abc",
  "accepted_methods": ["passkey", "totp"]
}

For high-risk deny:

HTTP/1.1 401 Unauthorized
Content-Type: application/json

{
  "error": "verification_failed"
}

Do not expose exact risk reasons in API response. Audit log can store reasons.


15. Risk-Based Auth Untuk Regulated Case Management

Because many enterprise Java systems run sensitive workflows, risk-based auth must be integrated with action semantics.

Example enforcement lifecycle:

Read case -> normal session OK
Draft action -> AAL1 OK
Submit recommendation -> AAL2 required
Approve sanction -> AAL2 fresh + known device + no high risk signals
Override escalation -> AAL2 fresh + privileged role + supervisor approval
Export evidence -> AAL2 fresh + audit reason

Key design:

Authentication risk informs workflow controls.
It must not be hidden inside login code only.

16. Observability

Risk-based auth without observability is impossible to tune.

16.1 Metrics

auth_risk_assessment_total{level="low"}
auth_risk_assessment_total{level="medium"}
auth_risk_assessment_total{level="high"}
auth_step_up_required_total{method="totp"}
auth_step_up_success_total{method="passkey"}
auth_step_up_failure_total{method="totp"}
auth_high_risk_deny_total{reason="impossible_travel"}
auth_known_device_bind_total
auth_known_device_revoke_total
auth_false_positive_report_total

16.2 Logs

Log internal event:

{
  "event": "AUTH_RISK_DECISION",
  "tenant_id": "acme",
  "subject_id": "01J...",
  "risk_score": 75,
  "risk_level": "HIGH",
  "decision": "STEP_UP",
  "reason_codes": ["NEW_DEVICE", "HIGH_RISK_ASN", "SUBJECT_FAILED_ATTEMPT_SPIKE"],
  "credential_type": "PASSWORD",
  "device_known": false,
  "correlation_id": "req-123"
}

Do not log:

  • full IP if not required
  • raw user agent if policy prohibits
  • OTP codes
  • tokens
  • password values
  • raw device fingerprint material

16.3 Dashboards

Useful dashboards:

| Dashboard | Purpose |
| --- | --- |
| Step-up conversion | Detect excessive friction |
| High-risk denies by reason | Tune policies |
| New device login trend | Detect attack or rollout issue |
| Failed MFA trend | Detect MFA fatigue / bot automation |
| Risk false positive reports | Reduce user pain |
| Risk reason distribution | Avoid one signal dominating incorrectly |

17. Privacy and Data Minimization

Risk-based auth is powerful but dangerous.

Minimize collection.

Collect what is needed.
Hash what can be hashed.
Aggregate where possible.
Expire aggressively.
Explain policy internally.
Restrict access to risk logs.

Bad pattern:

Collect every browser fingerprint property forever because maybe useful.

Better:

Store known-device binding, coarse geo, ASN, reason codes, and event history with retention.

Retention examples:

| Data | Suggested retention idea |
| --- | --- |
| Failed login counters | hours/days |
| Risk event audit | 90 days / policy-dependent |
| Sensitive action audit | longer, compliance-dependent |
| Device binding | until revoked / inactive timeout |
| Raw enrichments | avoid or short retention |

18. Testing Strategy

18.1 Unit Tests

Risk evaluator must have deterministic tests.

@Test
void newDeviceAndFailedAttemptsRequiresMediumRisk() {
    AuthRiskInput input = AuthRiskInput.builder()
        .knownDevice(false)
        .failedAttemptsForSubjectLastHour(6)
        .highRiskAsn(false)
        .impossibleTravelDetected(false)
        .build();

    RiskAssessment assessment = evaluator.evaluate(input);

    assertThat(assessment.level()).isEqualTo(RiskLevel.MEDIUM);
    assertThat(assessment.reasons())
        .extracting(RiskReason::code)
        .contains("NEW_DEVICE", "SUBJECT_FAILED_ATTEMPT_SPIKE");
}

18.2 Policy Tests

@Test
void highRiskPasswordLoginRequiresStepUp() {
    RiskAssessment risk = new RiskAssessment(
        75,
        RiskLevel.HIGH,
        List.of(new RiskReason("NEW_DEVICE", 25, ""), new RiskReason("HIGH_RISK_ASN", 50, ""))
    );

    AuthDecision decision = policy.decide(passwordLogin(), risk);

    assertThat(decision.type()).isEqualTo(AuthDecisionType.STEP_UP);
    assertThat(decision.requiredAssuranceLevel()).isEqualTo("AAL2");
}

18.3 Integration Tests

Scenarios:

| Scenario | Expected |
| --- | --- |
| Known device + password | allow |
| New device + password | step-up |
| New device + passkey UV | allow or lower friction depending on policy |
| Impossible travel + password | deny/step-up |
| Change MFA after fresh login only | require existing factor |
| High-risk action with stale MFA | step-up |
| Step-up success | assurance raised |
| Step-up failure | no assurance raised |

18.4 Abuse Simulation

Simulate:

  • credential stuffing across many accounts
  • one account from many IPs
  • one IP against many accounts
  • repeated MFA failure
  • device cookie theft
  • session hijack with IP jump
  • recovery email replacement

19. Failure Modes

| Failure mode | Root cause | Consequence | Mitigation |
| --- | --- | --- | --- |
| Risk score used as identity | Signal confused with proof | Account takeover or false trust | Keep proof separate from risk |
| Step-up bypass | Sensitive endpoint ignores assurance | Privilege escalation | Central assurance enforcement |
| Overblocking | Policy too aggressive | User lockout, support load | Graceful step-up/recovery |
| Weak recovery path | Recovery bypasses MFA | Account takeover | Risk-aware recovery |
| Raw fingerprint hoarding | Excess data collection | Privacy/legal risk | Minimize and retain less |
| Risk detail leaked | Error messages expose signals | Attacker adapts | Generic user response |
| Device binding never expires | Stale trust | Long-term compromise | Expiry/revocation |
| Risk event not audited | No explainability | Cannot investigate | Structured event log |
| ML black box only | No reason code | Non-defensible decision | Deterministic baseline + explanations |
| IP-based hard deny | IP treated as proof | False positives | Use IP as signal only |

20. Anti-Patterns

Anti-pattern: “MFA means no risk engine needed”

MFA does not solve:

  • stolen session
  • MFA fatigue
  • recovery takeover
  • SIM swap
  • device theft
  • malicious insider
  • token replay

Anti-pattern: “Risk-based auth means hidden ML model”

Start deterministic. Only add ML when you have labels, observability, human review, and rollback.

Anti-pattern: “New country equals attacker”

A new country is an anomaly. It does not prove that an attacker is present.

Anti-pattern: “Known device means safe”

A known device can be stolen, infected, shared, or controlled remotely.

Anti-pattern: “Step-up with email OTP is strong enough for all actions”

The email account may be compromised or recoverable by an attacker. Critical actions need a stronger challenge and sometimes administrative approval.
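One way to avoid treating every challenge as equivalent is to give each sensitive action a minimum challenge strength. The sketch below is illustrative only: the `ChallengeKind` ranking, the `SensitiveAction` names, and the admin-approval flag are assumptions, not a standard ordering or the lesson's own model.

```java
// Hypothetical ranking of challenge types; the order is an assumption.
enum ChallengeKind {
    EMAIL_OTP(1), TOTP(2), PASSKEY_UV(3);

    final int strength;

    ChallengeKind(int strength) {
        this.strength = strength;
    }
}

// Hypothetical actions, each with a minimum challenge and an optional
// administrative approval requirement.
enum SensitiveAction {
    UPDATE_DISPLAY_NAME(ChallengeKind.EMAIL_OTP, false),
    CHANGE_RECOVERY_EMAIL(ChallengeKind.TOTP, false),
    CHANGE_MFA(ChallengeKind.PASSKEY_UV, false),
    DISABLE_TENANT_SSO(ChallengeKind.PASSKEY_UV, true);

    final ChallengeKind minimum;
    final boolean needsAdminApproval;

    SensitiveAction(ChallengeKind minimum, boolean needsAdminApproval) {
        this.minimum = minimum;
        this.needsAdminApproval = needsAdminApproval;
    }

    // True only if the completed challenge is strong enough and any
    // required administrative approval has been given.
    boolean satisfiedBy(ChallengeKind completed, boolean adminApproved) {
        return completed.strength >= minimum.strength
            && (!needsAdminApproval || adminApproved);
    }
}
```

With this shape, an email OTP can still unlock low-impact actions while `CHANGE_MFA` rejects it outright.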


21. Production Checklist

Before shipping risk-based auth:

  • Risk signal taxonomy documented.
  • Risk score has reason codes.
  • Policy decision separated from scoring.
  • User-facing error messages do not leak sensitive risk detail.
  • Step-up state is bound to session, subject, tenant, action, and expiry.
  • Sensitive actions enforce assurance centrally.
  • Recovery flow goes through risk policy.
  • Device trust is revocable and expires.
  • Raw signals minimized and protected.
  • Audit logs include reason codes and decisions.
  • Metrics measure false positives and challenge success.
  • Abuse simulation included in test suite.
  • Incident runbook exists for account takeover spike.
  • Support tooling can explain decision safely.
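The checklist item about step-up state can be made concrete with a small value type that is only honored when every bound attribute matches. This is a sketch under assumptions: `StepUpGrant` and `covers` are hypothetical names, and identifiers are plain strings for brevity.

```java
import java.time.Instant;

// Hypothetical record of a completed step-up, bound to the exact context
// in which it was earned.
record StepUpGrant(String sessionId,
                   String subjectId,
                   String tenantId,
                   String action,
                   Instant expiresAt) {

    // The grant applies only if session, subject, tenant, and action all
    // match and the grant has not expired. A grant is treated as expired
    // at the exact expiry instant.
    boolean covers(String sessionId, String subjectId, String tenantId,
                   String action, Instant now) {
        return this.sessionId.equals(sessionId)
            && this.subjectId.equals(subjectId)
            && this.tenantId.equals(tenantId)
            && this.action.equals(action)
            && now.isBefore(expiresAt);
    }
}
```

Binding to the action matters: a step-up completed for one operation should not silently authorize a different, more sensitive one in the same session.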

22. Minimal Implementation Roadmap

Phase 1 — Baseline

  • login event table
  • failed/successful attempts
  • known device cookie
  • simple risk evaluator
  • step-up for new device
  • sensitive action assurance requirement

Phase 2 — Abuse Controls

  • IP prefix velocity
  • subject velocity
  • ASN reputation
  • recovery path risk
  • structured audit dashboard

Phase 3 — Stronger Assurance

  • passkey support
  • phishing-resistant step-up
  • transaction-bound challenge for critical actions
  • admin review for suspicious recovery

Phase 4 — Advanced

  • device key binding
  • adaptive policy by tenant
  • supervised model / anomaly detection
  • user self-service device management
  • automated account takeover response

23. Reference Implementation Shape

Package layout:

com.example.auth.risk
  RiskAssessment.java
  RiskReason.java
  RiskLevel.java
  RiskSignal.java
  AuthRiskInput.java
  AuthenticationRiskEvaluator.java
  AuthenticationRiskPolicy.java
  RiskDecisionRecorder.java

com.example.auth.stepup
  StepUpChallenge.java
  StepUpService.java
  StepUpController.java
  AssuranceRequirement.java
  AssuranceAuthorizationManager.java

com.example.auth.device
  DeviceBinding.java
  DeviceBindingService.java
  DeviceCookieSigner.java
  TrustedDeviceRepository.java

Service boundary:

AuthService verifies credential.
RiskService evaluates context.
StepUpService orchestrates challenges.
SessionService raises/lowers assurance.
AuditService records decision.

Do not put all logic in one login controller.
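A minimal sketch of how those services might be composed outside the controller. The interface names follow the service boundary above, but every method signature, the `RiskBand` and `LoginResult` enums, and `LoginOrchestrator` itself are hypothetical. The low/medium/high mapping mirrors the simple form shown at the start of this part.

```java
// Hypothetical result and risk types for this sketch.
enum RiskBand { LOW, MEDIUM, HIGH }
enum LoginResult { REJECTED, AUTHENTICATED, STEP_UP_REQUIRED, DENIED }

// Hypothetical single-method signatures for each service boundary.
interface AuthService { boolean verifyCredential(String subject, String secret); }
interface RiskService { RiskBand evaluate(String subject, String ip); }
interface StepUpService { void startChallenge(String subject); }
interface SessionService { void establish(String subject); }
interface AuditService { void record(String subject, String detail); }

final class LoginOrchestrator {
    private final AuthService auth;
    private final RiskService risk;
    private final StepUpService stepUp;
    private final SessionService sessions;
    private final AuditService audit;

    LoginOrchestrator(AuthService auth, RiskService risk, StepUpService stepUp,
                      SessionService sessions, AuditService audit) {
        this.auth = auth;
        this.risk = risk;
        this.stepUp = stepUp;
        this.sessions = sessions;
        this.audit = audit;
    }

    LoginResult login(String subject, String secret, String ip) {
        // Proof comes first: risk never substitutes for a valid credential.
        if (!auth.verifyCredential(subject, secret)) {
            audit.record(subject, "credential=invalid");
            return LoginResult.REJECTED;
        }
        RiskBand band = risk.evaluate(subject, ip);
        LoginResult result = switch (band) {
            case LOW -> {
                sessions.establish(subject);
                yield LoginResult.AUTHENTICATED;
            }
            case MEDIUM -> {
                stepUp.startChallenge(subject);
                yield LoginResult.STEP_UP_REQUIRED;
            }
            case HIGH -> LoginResult.DENIED;
        };
        // Every decision is recorded, including denials.
        audit.record(subject, "risk=" + band + " result=" + result);
        return result;
    }
}
```

Because each collaborator is an interface, the orchestrator can be unit-tested with lambdas or fakes, and the controller shrinks to request parsing and response mapping.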


24. Closing Mental Model

Risk-based authentication is not a magic security add-on.

It is a control loop:

observe context -> score risk -> apply policy -> challenge/deny/allow -> record outcome -> tune policy

The hardest part is not calculating the score.

The hardest part is preserving these boundaries:

proof != risk
risk != decision
authentication != authorization
session != assurance
recovery != bypass

If those boundaries stay clean, risk-based auth becomes a powerful production pattern. If they collapse, the system becomes arbitrary, leaky, and hard to defend.


References

  • NIST SP 800-63B-4 — Digital Identity Guidelines: Authentication and Authenticator Management
  • OWASP Authentication Cheat Sheet
  • OWASP Multifactor Authentication Cheat Sheet
  • OWASP Session Management Cheat Sheet
  • Spring Security Reference — Servlet Authentication Architecture
  • Spring Security Reference — OAuth2 Resource Server