
Observability, Auditability, and Decision Explainability

Learn Java Authorization Pattern - Part 038

Observability, auditability, and explainability for Java authorization systems: decision logs, reason codes, regulatory evidence, safe errors, and incident investigation.

2026-07-03 | 16 min read | 3190 words

Lesson 38 of 40 | Final Stretch (lessons 34–40)

Goal of this part: build an authorization system that does more than “allow or deny”. Its decisions should be explainable, provable, observable, investigable, and accountable, without leaking sensitive data.

An authorization decision is one of the most important decisions in an enterprise system.

It determines:

Can this person see this evidence?
Can this officer approve this enforcement action?
Can this service export this customer dataset?
Can this admin grant themselves access?
Can this worker execute a command that was authorized yesterday?

If an authorization answer is wrong, the consequences can include a data breach, a regulatory finding, fraud, privilege escalation, an operational outage, or loss of trust.

That is why production-grade authorization needs three capabilities:

Capability	Question
Observability	What is happening now?
Auditability	What happened, who did it, and why was it allowed/denied?
Explainability	Why did this decision happen, safely and usefully?

Without these three capabilities, the authorization system becomes a black box. Black-box authorization is hard to operate, hard to debug, and weak in terms of regulatory defensibility.

1. Logging vs Audit vs Decision Trace

Do not lump every event together as a “log”. The following types differ in important ways.

Type	Audience	Purpose	Durability	Sensitivity
Application log	engineers/SRE	debugging runtime behavior	medium	medium-high
Security log	security/SOC	detect suspicious behavior	high	high
Audit log	auditors/compliance/domain ops	evidence of business/security action	very high	high
Authorization decision log	authz/platform/security	prove and debug decisions	high	high
Policy trace	policy engineers	explain rule evaluation	low-medium	very high

Java applications often do this:

log.info("Access denied for user {}", userId);

That is not enough. The line has no action, resource, reason, or policy version, so it cannot be investigated later.

Production-grade authorization needs structured events:

{
  "eventType": "authorization.decision",
  "decisionId": "dec_01J...",
  "timestamp": "2026-07-03T10:15:30.145Z",
  "correlationId": "req_abc123",
  "subject": {
    "type": "user",
    "id": "user-alice",
    "tenantId": "tenant-a"
  },
  "actor": {
    "type": "user",
    "id": "user-alice"
  },
  "action": "case.approve",
  "resource": {
    "type": "case",
    "id": "case-123",
    "tenantId": "tenant-a"
  },
  "decision": "DENY",
  "reasonCode": "MAKER_CHECKER_VIOLATION",
  "policyVersion": "case-policy-v42",
  "pep": "case-service:CaseApprovalService.approve",
  "pdp": "opa-sidecar:bundle-2026-07-03.3",
  "latencyMs": 7,
  "externalStatus": 403
}

This is evidence. Not noise.

2. What an Authorization Decision Event Must Contain

Minimum contents of an authorization decision event:

Field	Why it matters
`decisionId`	unique evidence reference
`timestamp`	sequence and forensic reconstruction
`correlationId` / `traceId`	link to request, service call, async job
`subject`	who/what is being authorized
`actor`	who initiated action, especially delegated/service flows
`action`	protected operation
`resource`	protected object reference
`context` summary	relevant conditions: channel, time, tenant, risk, workflow state
`decision`	allow/deny/indeterminate
`reasonCode`	stable explanation category
`policyVersion`	reproducibility
`modelVersion`	ReBAC/schema version if applicable
`obligations`	required follow-up behavior
`pep`	where enforcement happened
`pdp`	where decision was made
`latencyMs`	performance and degradation analysis
`cacheStatus`	hit/miss/stale/forced-refresh
`externalStatus`	API response semantics

A strong event distinguishes subject and actor.

Example:

Subject: service:report-worker
Actor: user:alice
Action: case.export
Resource: case-query:monthly-high-risk-report

This matters in delegated and async flows. The worker executed the action, but Alice caused it.

3. Decision, Reason, Obligation, Advice

Do not reduce an authorization result to a boolean.

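import java.util.List;

public enum DecisionEffect {
    ALLOW,
    DENY,
    INDETERMINATE   // the PDP could not decide; enforce as DENY (fail closed)
}

// Obligation, Advice, CacheDirective, AuditDirective and DecisionDiagnostics
// are domain types defined elsewhere.
public record AuthorizationDecision(
    DecisionEffect effect,
    String reasonCode,          // internal, stable reason category
    String externalReasonCode,  // safe code that may be returned to callers
    String policyVersion,
    String modelVersion,        // ReBAC/schema version, if applicable
    String pdpName,             // which PDP produced the decision
    List<Obligation> obligations,
    List<Advice> advice,
    CacheDirective cache,
    AuditDirective audit,
    DecisionDiagnostics diagnostics
) {
    public boolean allowed() {
        return effect == DecisionEffect.ALLOW;
    }
}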

Reason code answers:

Which stable category explains this decision?

Obligation answers:

What must the PEP do if it proceeds?

Advice answers:

What helpful non-mandatory information can be returned internally?

Examples:

Decision	Reason	Obligation
ALLOW	`ASSIGNED_INVESTIGATOR`	`REDACT_PII`
ALLOW	`BREAK_GLASS_APPROVED`	`ESCALATE_AUDIT`
DENY	`TENANT_MISMATCH`	none
DENY	`MAKER_CHECKER_VIOLATION`	none
DENY	`INSUFFICIENT_CLEARANCE`	none
INDETERMINATE	`PDP_TIMEOUT`	none; fail closed

Do not expose all reason details to end users. Internal reason and external reason are different.

4. Safe External Error vs Internal Explanation

External response should be safe:

{
  "code": "ACCESS_DENIED",
  "message": "Access denied",
  "correlationId": "req_abc123"
}

Internal audit can be precise:

{
  "decision": "DENY",
  "reasonCode": "MAKER_CHECKER_VIOLATION",
  "reasonDetails": {
    "requestCreatedBy": "user-alice",
    "approver": "user-alice",
    "policyRule": "case.approve.requiresDifferentApprover"
  }
}

Never leak sensitive details externally:

Denied because this case is SEALED due to witness-protection matter.
Denied because user lacks clearance for terrorism-financing category.
Denied because case exists in tenant-bank-secret.
Denied because policy rule high_value_customer_investigation matched.

External errors should avoid object enumeration and sensitive classification leakage. Internal audit must retain enough evidence for investigation.
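One way to make the external/internal split mechanical is to map internal reason codes to a small set of safe responses at the PEP. The sketch below uses hypothetical `ExternalError` and `SafeErrorMapper` types. The choice of which reasons get a 404 instead of a 403 is an illustrative policy decision, not a standard.

```java
import java.util.Set;

// Hypothetical safe error body returned to API callers.
record ExternalError(int httpStatus, String code, String message, String correlationId) {}

// Hypothetical mapper from internal reason codes to a safe external error.
final class SafeErrorMapper {
    // Assumption: for these reasons, even confirming that the object exists would leak
    // information, so the caller gets the same 404 it would get for a missing object.
    private static final Set<String> HIDE_EXISTENCE = Set.of(
        "TENANT_MISMATCH", "NOT_ASSIGNED", "CASE_SEALED", "INSUFFICIENT_CLEARANCE");

    static ExternalError toExternal(String internalReasonCode, String correlationId) {
        if (internalReasonCode != null && HIDE_EXISTENCE.contains(internalReasonCode)) {
            return new ExternalError(404, "NOT_FOUND", "Resource not found", correlationId);
        }
        // All other denies collapse to one generic message. The precise reason stays
        // in the internal audit event, which shares the same correlationId.
        return new ExternalError(403, "ACCESS_DENIED", "Access denied", correlationId);
    }
}
```

Whatever rule is chosen, apply it consistently. Otherwise the response codes themselves become an oracle for which objects exist.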

5. Observability Model

Authorization observability answers operational questions:

Are decisions fast?
Are denies increasing?
Which policy version changed deny rate?
Are PDP calls timing out?
Is cache hit ratio healthy?
Are any endpoints missing authorization decision events?
Are any users repeatedly probing object IDs?

Core metrics:

Metric	Dimension
`authz_decision_total`	action, resource_type, decision, reason_code, service, policy_version
`authz_decision_latency_ms`	action, PDP, cache_status
`authz_pdp_error_total`	error_type, service, policy_version
`authz_cache_hit_ratio`	cache_name, action, resource_type
`authz_deny_rate`	tenant, action, endpoint, reason_code
`authz_indeterminate_total`	reason_code, PDP, service
`authz_obligation_failure_total`	obligation_type, service
`authz_break_glass_total`	tenant, actor_type, action
`authz_policy_diff_total`	old_version, new_version, changed_effect

High-cardinality warning:

Do not use a raw user ID or resource ID as a Prometheus label.

Use logs/traces for exact IDs. Use metrics labels for bounded dimensions.
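Here is a minimal sketch of building metric labels from bounded dimensions only. `AuthzMetricLabels` is hypothetical and uses no metrics library. With Micrometer or a Prometheus client, the returned map would become the tags or labels of a counter such as `authz_decision_total`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper that builds metric labels from bounded dimensions only.
final class AuthzMetricLabels {
    // Assumption: this allowlist is generated from the reason-code taxonomy.
    // Unknown values fold into "OTHER" so a bug cannot create unbounded series.
    private static final Set<String> KNOWN_REASONS = Set.of(
        "ASSIGNED_INVESTIGATOR", "NOT_ASSIGNED", "TENANT_MISMATCH",
        "MAKER_CHECKER_VIOLATION", "INSUFFICIENT_CLEARANCE", "PDP_TIMEOUT");

    static Map<String, String> labels(String action, String resourceType,
                                      String decision, String reasonCode) {
        Map<String, String> labels = new LinkedHashMap<>();
        labels.put("action", action);
        labels.put("resource_type", resourceType);
        labels.put("decision", decision);
        labels.put("reason_code",
            reasonCode != null && KNOWN_REASONS.contains(reasonCode) ? reasonCode : "OTHER");
        // No user_id or resource_id label on purpose: exact IDs go to logs/traces.
        return labels;
    }
}
```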

6. Trace Authorization in Distributed Systems

Authorization decisions should appear in distributed traces, but carefully.

Trace span attributes should include safe bounded values:

authz.decision = DENY
authz.reason_code = MAKER_CHECKER_VIOLATION
authz.action = case.approve
authz.resource_type = case
authz.policy_version = case-policy-v42
authz.cache_status = MISS

Avoid:

authz.resource_title = "Witness retaliation case"
authz.policy_trace = full Rego/Cedar evaluation details
authz.subject_email = alice@example.com

Traces are often widely accessible internally. Treat trace attributes as semi-sensitive.

7. Java Implementation: Decision Event Model

A practical Java model:

import java.time.Instant;
import java.util.List;
import java.util.Map;

public record AuthorizationDecisionEvent(
    String eventType,
    String decisionId,
    Instant timestamp,
    String correlationId,
    String traceId,
    SubjectRef subject,
    ActorRef actor,
    String action,
    ResourceRef resource,
    DecisionEffect decision,
    String reasonCode,
    String externalReasonCode,
    String policyVersion,
    String modelVersion,
    String pep,
    String pdp,
    List<String> obligations,
    String cacheStatus,
    long latencyMs,
    Integer externalStatus,
    Map<String, Object> safeContext
) {}

Builder example:

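import java.time.Clock;

public final class AuthorizationAuditEmitter {
    private final AuditSink auditSink;
    private final Clock clock;   // injected so tests can use a fixed clock

    public AuthorizationAuditEmitter(AuditSink auditSink, Clock clock) {
        this.auditSink = auditSink;
        this.clock = clock;
    }
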
    public void emit(AuthorizationRequest request,
                     AuthorizationDecision decision,
                     EnforcementMetadata enforcement) {

        AuthorizationDecisionEvent event = new AuthorizationDecisionEvent(
            "authorization.decision",
            DecisionIds.newId(),
            clock.instant(),
            enforcement.correlationId(),
            enforcement.traceId(),
            SubjectRef.from(request.subject()),
            ActorRef.from(request.actor()),
            request.action().value(),
            ResourceRef.safe(request.resource()),
            decision.effect(),
            decision.reasonCode(),
            decision.externalReasonCode(),
            decision.policyVersion(),
            decision.modelVersion(),
            enforcement.pepName(),
            decision.pdpName(),
            decision.obligations().stream().map(Obligation::type).toList(),
            decision.cache().status().name(),
            enforcement.latencyMs(),
            enforcement.externalStatus(),
            SafeContextExtractor.extract(request.context())
        );

        auditSink.write(event);
    }
}

Do not let each controller log authorization decisions manually. Centralize it in the PEP or authorization-service wrapper.

8. Structured Logging with MDC

Use MDC for correlation, but do not rely on MDC as audit storage.

try (MDC.MDCCloseable ignored1 = MDC.putCloseable("correlationId", correlationId);
     MDC.MDCCloseable ignored2 = MDC.putCloseable("tenantId", tenantId)) {

    AuthorizationDecision decision = authorizationService.check(request);
    auditEmitter.emit(request, decision, metadata);

    if (!decision.allowed()) {
        throw new AccessDeniedException(decision.externalReasonCode());
    }
}

MDC helps connect logs. The audit event is still separate, durable evidence.

Threading warning:

MDC does not automatically propagate across async executors, Reactor pipelines, virtual thread boundaries, or Kafka consumers unless configured.

For async jobs, carry correlation and actor metadata explicitly in the command/job envelope.
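The usual fix for executors is to capture the context when work is submitted and restore it on the thread that runs it. The sketch below shows that capture-and-restore idea with a plain `ThreadLocal`. It uses a hypothetical `RequestContext` class instead of SLF4J so it runs without dependencies; with SLF4J, the same pattern is typically built on `MDC.getCopyOfContextMap()` and `MDC.setContextMap(...)`. It does not replace the job envelope for Kafka or scheduled work, where the submitting thread lives in a different process.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical per-thread context standing in for MDC's ThreadLocal-backed map.
final class RequestContext {
    private static final ThreadLocal<Map<String, String>> CURRENT =
        ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CURRENT.get().put(key, value); }

    static String get(String key) { return CURRENT.get().get(key); }

    // Capture the caller's context at submission time and install it on whichever thread
    // runs the task. Afterwards, restore that thread's previous context so pooled threads
    // do not carry one request's correlation data into the next task.
    static <T> Supplier<T> wrap(Supplier<T> task) {
        Map<String, String> captured = new HashMap<>(CURRENT.get());
        return () -> {
            Map<String, String> previous = CURRENT.get();
            CURRENT.set(new HashMap<>(captured));
            try {
                return task.get();
            } finally {
                CURRENT.set(previous);
            }
        };
    }
}
```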

9. Audit Sink Design

The audit sink should not be a best-effort debug log.

Options:

Sink	Strength	Risk
database table	queryable, transactional option	may affect request latency
Kafka topic	scalable, decoupled	delivery/ordering semantics matter
append-only storage	tamper resistance	query complexity
SIEM pipeline	security monitoring	cost/cardinality/sensitivity
cloud audit service	managed retention	vendor constraints

Common production pattern:

For write operations, the audit record should be aligned with the business transaction.

Example:

If the case approval commits, the authorization decision event must be durably recorded.
If a deny occurs before the transaction starts, the deny event must still be recorded in the security log/audit path.

10. Outbox Pattern for Authorization Audit

For sensitive state-changing actions, use transactional outbox.

CREATE TABLE audit_outbox (
    id UUID PRIMARY KEY,
    event_type TEXT NOT NULL,
    aggregate_type TEXT NOT NULL,
    aggregate_id TEXT NOT NULL,
    payload JSONB NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    published_at TIMESTAMPTZ NULL
);

Service example:

@Transactional
public void approveCase(Subject subject, String caseId) {
    CaseFile c = caseRepository.findForUpdate(caseId);

    AuthorizationDecision decision = authorizationService.check(
        request(subject, Action.CASE_APPROVE, c.asResource())
    );

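    if (!decision.allowed()) {
        // AccessDeniedException is a RuntimeException, so this transaction rolls back.
        // The deny event must be written through a path that does not depend on this commit.
        auditSecurityDenyOutsideBusinessCommit(subject, caseId, decision);
        throw new AccessDeniedException("ACCESS_DENIED");
    }

    c.approveBy(subject.id());
    caseRepository.save(c);

    // Same transaction as the state change: both outbox rows commit or neither does.
    auditOutbox.insert(AuthorizationDecisionEventFactory.allow(subject, c, decision));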
}

Deny events often happen without any business state change. Design a separate durable security-event path for denies.

11. Tamper Resistance and Evidence Integrity

Audit logs are evidence. Treat them accordingly.

Controls:

append-only write path,
restricted delete/update privileges,
retention policy,
immutability or WORM storage for high-value audit,
checksum/hash chain for tamper evidence,
separation between app operators and audit administrators,
clock synchronization,
signed policy artifacts,
policy version included in event,
backup and restore validation.

Simple hash-chain concept:

event_hash_n = hash(event_payload_n + previous_event_hash)

This does not solve all tampering problems, but makes deletion/reordering detectable if implemented correctly and anchored periodically.
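Here is a minimal in-memory sketch of the hash-chain formula above, using SHA-256 from the JDK (`HexFormat` needs Java 17). `AuditHashChain` and its `Entry` record are hypothetical. A production version would persist entries, serialize payloads canonically, and periodically anchor the latest hash somewhere the application cannot rewrite.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.List;

// Hypothetical in-memory hash chain over serialized audit event payloads.
final class AuditHashChain {
    record Entry(String payload, String previousHash, String hash) {}

    static final String GENESIS = "0".repeat(64);
    private final List<Entry> entries = new ArrayList<>();

    // event_hash_n = SHA-256(event_payload_n + previous_event_hash). The previous hash
    // has a fixed length, so the concatenation is unambiguous.
    static String hash(String payload, String previousHash) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(payload.getBytes(StandardCharsets.UTF_8));
            digest.update(previousHash.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    Entry append(String payload) {
        String previous = entries.isEmpty() ? GENESIS : entries.get(entries.size() - 1).hash();
        Entry entry = new Entry(payload, previous, hash(payload, previous));
        entries.add(entry);
        return entry;
    }

    List<Entry> entries() { return List.copyOf(entries); }

    // Recomputes every link. An edited, removed, or reordered entry breaks the chain.
    // Dropping entries from the end is only caught by comparing the last hash
    // against a periodically anchored copy stored elsewhere.
    static boolean verify(List<Entry> chain) {
        String expectedPrevious = GENESIS;
        for (Entry e : chain) {
            if (!e.previousHash().equals(expectedPrevious)) return false;
            if (!e.hash().equals(hash(e.payload(), e.previousHash()))) return false;
            expectedPrevious = e.hash();
        }
        return true;
    }
}
```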

Do not over-engineer before threat modeling. But do not pretend plain application logs are regulatory-grade audit evidence.

12. Reason Code Taxonomy

Reason codes should be stable, bounded, and domain meaningful.

Example taxonomy:

Category	Reason code
identity/context	`NO_SUBJECT`, `SUBJECT_INACTIVE`, `MFA_REQUIRED`
tenant	`TENANT_MISMATCH`, `TENANT_SUSPENDED`
capability	`MISSING_PERMISSION`, `ROLE_NOT_ALLOWED`
relationship	`NOT_OWNER`, `NOT_ASSIGNED`, `NOT_TEAM_MEMBER`
state	`INVALID_RESOURCE_STATE`, `CASE_SEALED`, `CASE_CLOSED`
SoD	`MAKER_CHECKER_VIOLATION`, `SELF_APPROVAL_DENIED`
classification	`INSUFFICIENT_CLEARANCE`, `FIELD_REDACTED`
delegation	`DELEGATION_EXPIRED`, `DELEGATION_SCOPE_MISMATCH`
policy/runtime	`NO_MATCHING_POLICY`, `PDP_TIMEOUT`, `POLICY_EVALUATION_ERROR`
emergency	`BREAK_GLASS_REQUIRED`, `BREAK_GLASS_EXPIRED`

Avoid reason codes like:

DENIED
ERROR
RULE_17_FAILED
NOT_ALLOWED_2

Reason codes are part of your operating model.
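Reason codes stay bounded most easily when they are a closed type in code rather than free-form strings. The enum below is a hypothetical sketch that covers a subset of the taxonomy above. `ReasonCode.parse` rejects anything outside the taxonomy instead of passing it through.

```java
import java.util.Optional;

// Hypothetical categories mirroring the taxonomy table.
enum ReasonCategory {
    IDENTITY, TENANT, CAPABILITY, RELATIONSHIP, STATE, SOD,
    CLASSIFICATION, DELEGATION, POLICY_RUNTIME, EMERGENCY
}

// Hypothetical closed set of reason codes (subset of the table), each with its category.
enum ReasonCode {
    NO_SUBJECT(ReasonCategory.IDENTITY),
    MFA_REQUIRED(ReasonCategory.IDENTITY),
    TENANT_MISMATCH(ReasonCategory.TENANT),
    MISSING_PERMISSION(ReasonCategory.CAPABILITY),
    NOT_ASSIGNED(ReasonCategory.RELATIONSHIP),
    CASE_SEALED(ReasonCategory.STATE),
    MAKER_CHECKER_VIOLATION(ReasonCategory.SOD),
    INSUFFICIENT_CLEARANCE(ReasonCategory.CLASSIFICATION),
    DELEGATION_EXPIRED(ReasonCategory.DELEGATION),
    PDP_TIMEOUT(ReasonCategory.POLICY_RUNTIME),
    BREAK_GLASS_REQUIRED(ReasonCategory.EMERGENCY);

    private final ReasonCategory category;

    ReasonCode(ReasonCategory category) { this.category = category; }

    ReasonCategory category() { return category; }

    // Strings coming from a PDP are parsed strictly. Unknown or missing codes are
    // surfaced as empty instead of silently becoming a vague "DENIED".
    static Optional<ReasonCode> parse(String raw) {
        try {
            return Optional.of(ReasonCode.valueOf(raw));
        } catch (IllegalArgumentException | NullPointerException e) {
            return Optional.empty();
        }
    }
}
```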

13. Explainability Levels

Not every audience should get the same explanation.

Audience	Explanation level
external API caller	generic safe error + correlation ID
normal end user	user-actionable but non-sensitive message
support operator	safe reason category and access request path
security analyst	full reason code, subject/resource refs, context summary
policy engineer	policy trace, rule IDs, input snapshot
auditor	evidence chain, policy version, actor, decision, obligation

Example end-user message:

You do not have access to approve this case. Ask a supervisor or request access.

Example support message:

Denied by maker-checker rule. The requester and approver are the same user.

Example policy engineer trace:

{
  "matchedRules": ["case.approve.requiresDifferentApprover"],
  "failedConditions": ["principal.id != resource.createdBy"],
  "inputHash": "sha256:...",
  "policyVersion": "case-policy-v42"
}

Do not expose policy traces broadly.
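Here is a sketch of rendering one internal decision at different explanation levels. `Audience`, `InternalDecision`, and `ExplanationRenderer` are hypothetical, and the message texts reuse the examples above. Policy traces are intentionally absent because they belong in a separately controlled tool.

```java
import java.util.Map;

// Hypothetical audiences; the policy-engineer level is deliberately not included.
enum Audience { END_USER, SUPPORT, SECURITY_ANALYST }

// Hypothetical internal view of a decision, as stored in the audit event.
record InternalDecision(String decision, String reasonCode, String subjectId,
                        String resourceId, String policyVersion, String correlationId) {}

final class ExplanationRenderer {
    private static final Map<String, String> END_USER_TEXT = Map.of(
        "MAKER_CHECKER_VIOLATION",
        "You do not have access to approve this case. Ask a supervisor or request access.");
    private static final Map<String, String> SUPPORT_TEXT = Map.of(
        "MAKER_CHECKER_VIOLATION",
        "Denied by maker-checker rule. The requester and approver are the same user.");

    static String render(InternalDecision d, Audience audience) {
        return switch (audience) {
            // End users: actionable, non-sensitive text plus a reference for support.
            case END_USER -> END_USER_TEXT.getOrDefault(d.reasonCode(),
                    "You do not have access to this action.") + " (ref " + d.correlationId() + ")";
            // Support: safe reason category and policy version.
            case SUPPORT -> SUPPORT_TEXT.getOrDefault(d.reasonCode(), "Denied: " + d.reasonCode())
                    + " Policy " + d.policyVersion() + ", ref " + d.correlationId();
            // Security analysts: full reason code and subject/resource references.
            case SECURITY_ANALYST -> String.join(" ", d.decision(), d.reasonCode(),
                    "subject=" + d.subjectId(), "resource=" + d.resourceId(),
                    "policy=" + d.policyVersion(), "ref=" + d.correlationId());
        };
    }
}
```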

14. Redaction and Sensitive Data Handling

Authorization events often contain sensitive metadata. Log less, but enough.

Do log:

subject id, tenant id, resource type, resource id, action, decision, reason code, policy version

Be careful with:

email, full name, national ID, account number, case title, investigation category, evidence filename, full policy input, full JWT, access token, request body

Never log:

access token
refresh token
password
client secret
private key
full Authorization header
raw PII fields
full evidence content

Safe resource reference:

public record SafeResourceRef(String type, String id, String tenantId) {
    static SafeResourceRef from(CaseFile c) {
        return new SafeResourceRef("case", c.id(), c.tenantId());
    }
}

Bad:

log.info("Denied access to case {}", caseFile); // toString may include sensitive fields

Good:

log.info("Denied authorization decisionId={} subject={} action={} resourceType={} resourceId={} reason={}",
    decisionId, subject.id(), action, resource.type(), resource.id(), reasonCode);

15. Decision Input Snapshot: How Much to Store?

For reproducibility, you may want to store policy input. But full input may be sensitive and huge.

Options:

Option	Use when
store no input	low risk, low audit requirement
store normalized safe summary	most enterprise systems
store hashed input	prove same input was used without exposing values
store encrypted full input	high assurance/debug need, strict access control
store input in short-lived debug trace	policy engineering only

Recommended default:

Store safe summary + input hash + policy version.

Example:

{
  "inputSummary": {
    "subjectType": "user",
    "subjectTenant": "tenant-a",
    "action": "case.read",
    "resourceType": "case",
    "resourceTenant": "tenant-a",
    "resourceState": "SEALED",
    "relationshipHints": ["assigned"]
  },
  "inputHash": "sha256:2b4f...",
  "policyVersion": "case-policy-v42"
}

This is usually enough for audit and debugging without dumping raw request state.
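Here is a sketch of a deterministic input hash. Keys are sorted before hashing, so the same logical input produces the same value regardless of map ordering. `InputHasher` is hypothetical and assumes a flat string map; a real system would more likely hash a canonical JSON encoding (for example, the RFC 8785 JSON Canonicalization Scheme). If the hashed values are guessable, such as user IDs or tenant names, an unkeyed hash can be reversed by brute force, and a keyed HMAC is the safer choice.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical hasher for a flattened policy input summary.
final class InputHasher {
    // Canonical form: one "key=value" line per entry, keys in sorted order.
    // Assumes keys and values contain no '=' or newline characters.
    static String inputHash(Map<String, String> flatInput) {
        StringBuilder canonical = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(flatInput).entrySet()) {
            canonical.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            return "sha256:" + HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("SHA-256 is not available", ex);
        }
    }
}
```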

16. Obligation Observability

If PDP returns obligations, the PEP must prove obligations were executed.

Example decision:

{
  "decision": "ALLOW",
  "reasonCode": "ASSIGNED_INVESTIGATOR",
  "obligations": [
    { "type": "REDACT_FIELDS", "fields": ["nationalId", "witnessAddress"] },
    { "type": "WATERMARK_EXPORT", "value": "user-alice" }
  ]
}

PEP audit should record obligation execution:

{
  "eventType": "authorization.obligation_applied",
  "decisionId": "dec_01J...",
  "obligationType": "REDACT_FIELDS",
  "status": "APPLIED",
  "fields": ["nationalId", "witnessAddress"]
}

If obligation fails:

Fail closed unless the obligation is explicitly non-mandatory advice.

Java guard:

AuthorizationDecision decision = authorizationService.check(request);

if (!decision.allowed()) {
    throw denied(decision);
}

ObligationResult result = obligationExecutor.apply(decision.obligations(), response);
if (!result.success()) {
    auditEmitter.emitObligationFailure(decision, result);
    throw new AccessDeniedException("ACCESS_DENIED");
}

Do not allow “authorized but redaction failed” responses.
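Here is a minimal sketch of a PEP-side obligation executor that fails closed. `Obligation`, `ObligationResult`, and `ObligationExecutor` are hypothetical. Only `REDACT_FIELDS` is implemented, so any other obligation type (including `WATERMARK_EXPORT` from the example above) is reported as a failure, which makes the Java guard above deny the response.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical obligation: a type plus the fields it applies to.
record Obligation(String type, List<String> fields) {}

// Hypothetical result: success flag, failing obligation type, and the transformed response.
record ObligationResult(boolean success, String failedType, Map<String, Object> response) {}

final class ObligationExecutor {
    // Works on a copy so a partial failure never leaks a half-redacted original.
    static ObligationResult apply(List<Obligation> obligations, Map<String, Object> response) {
        Map<String, Object> out = new HashMap<>(response);
        for (Obligation o : obligations) {
            switch (o.type()) {
                case "REDACT_FIELDS" -> {
                    for (String field : o.fields()) {
                        if (out.containsKey(field)) {
                            out.put(field, "[REDACTED]");
                        }
                    }
                }
                default -> {
                    // Unknown or unsupported obligation: report failure so the PEP fails closed.
                    return new ObligationResult(false, o.type(), Map.of());
                }
            }
        }
        return new ObligationResult(true, null, out);
    }
}
```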

17. Detecting Suspicious Authorization Behavior

Authorization logs can detect attacks and misuse.

Signals:

Signal	Possible meaning
many denies for sequential IDs	object enumeration / IDOR probing
many tenant mismatch denies	tenant breakout attempt or bug
sudden deny spike after deploy	policy/config regression
sudden allow spike	dangerous policy expansion
repeated `PDP_TIMEOUT`	authz infrastructure issue
break-glass usage outside hours	emergency misuse
deny then admin grant then allow	suspicious privilege escalation
service account accessing unusual resource type	confused deputy or credential compromise
high export allow volume	data exfiltration risk

Detection example:

Alert: user has >50 TENANT_MISMATCH denies across >20 resource IDs in 10 minutes.

Alert quality matters. Too many noisy denies will be ignored.
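The example alert can be expressed as a sliding-window counter per subject. `TenantProbeDetector` is a hypothetical in-memory sketch that assumes a single process and events arriving in timestamp order. In practice, this rule would usually live in a SIEM or stream processor over the decision-event topic.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical detector: more than denyThreshold TENANT_MISMATCH denies across more than
// resourceThreshold distinct resource IDs for one subject within the window.
final class TenantProbeDetector {
    private record Deny(Instant at, String resourceId) {}

    private final Duration window;
    private final int denyThreshold;
    private final int resourceThreshold;
    private final Map<String, Deque<Deny>> bySubject = new HashMap<>();

    TenantProbeDetector(Duration window, int denyThreshold, int resourceThreshold) {
        this.window = window;
        this.denyThreshold = denyThreshold;
        this.resourceThreshold = resourceThreshold;
    }

    // Returns true when this decision puts the subject over both thresholds.
    boolean onDecision(String subjectId, String reasonCode, String resourceId, Instant at) {
        if (!"TENANT_MISMATCH".equals(reasonCode)) {
            return false;
        }
        Deque<Deny> recent = bySubject.computeIfAbsent(subjectId, k -> new ArrayDeque<>());
        recent.addLast(new Deny(at, resourceId));
        Instant cutoff = at.minus(window);
        while (!recent.isEmpty() && recent.peekFirst().at().isBefore(cutoff)) {
            recent.removeFirst(); // drop denies that fell out of the window
        }
        Set<String> distinct = new HashSet<>();
        for (Deny d : recent) {
            distinct.add(d.resourceId());
        }
        return recent.size() > denyThreshold && distinct.size() > resourceThreshold;
    }
}
```

Usage for the rule above: `new TenantProbeDetector(Duration.ofMinutes(10), 50, 20)`.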

18. Dashboards

Useful dashboards:

Authorization Health

decision latency p50/p95/p99
PDP error rate
cache hit/miss rate
indeterminate count
fail-closed count
policy version distribution

Security Posture

deny rate by action/resource type
tenant mismatch trend
BOLA-like probing signals
field redaction count
denied export attempts
break-glass usage

Release Safety

decisions by policy version
allow/deny delta after rollout
shadow evaluation diff
reason code distribution change
obligation failure count

Operational Support

top deny reason codes
access request candidates
user-facing 403/404 count
most denied actions

Dashboard must not expose sensitive user/resource identifiers broadly. Drill-down should require privileged access.

19. Authorization Incident Investigation

When a possible authorization incident occurs, you need reconstructable evidence.

Questions:

Who accessed what?
When?
Through which endpoint/service/job?
What policy version was active?
What attributes/relationships were used?
Was decision cached?
Was the user recently granted/revoked?
Was this human, service, or delegated execution?
Were obligations applied?
Was the response/export actually delivered?
Were similar decisions made for other resources?

Investigation data sources:

Source	Purpose
authorization decision logs	allow/deny evidence
application logs	runtime flow
distributed traces	cross-service path
access grant history	entitlement changes
policy version history	rule changes
tuple/relationship history	ReBAC graph state
audit outbox/topic	durable business evidence
export/download logs	data movement
admin action logs	privilege changes
SIEM alerts	suspicious patterns

If these sources cannot be correlated by correlationId, decisionId, subjectId, and resourceId, incident response becomes guesswork.

20. Access Review and Recertification Support

Authorization audit data should support periodic access review.

Questions:

Who has access?
Who used access?
Who granted access?
Was access still justified?
Which roles/permissions are unused?
Which break-glass grants were used?
Which service accounts accessed high-risk resources?

Decision logs help distinguish:

Access state	Meaning
assigned but never used	candidate for removal
frequently denied	user may need training or access correction
break-glass used	requires post-review
export permission used	requires data movement review
admin grant followed by sensitive read	requires escalation review

Design audit schema so access review does not require fragile ad-hoc log parsing.

21. Break-Glass Observability

Break-glass is an intentional bypass under controlled conditions. It must be louder than normal access.

Break-glass event should include:

actor,
subject if delegated,
resource/action,
justification,
approval/reference if required,
start/end time,
scope,
decision ID,
policy version,
post-review status.

Example:

{
  "eventType": "authorization.break_glass_used",
  "decisionId": "dec_01J...",
  "actor": "user-supervisor-1",
  "action": "case.read",
  "resource": "case:case-789",
  "reasonCode": "BREAK_GLASS_APPROVED",
  "justification": "urgent safety review",
  "expiresAt": "2026-07-03T12:00:00Z",
  "reviewRequired": true
}

Break-glass should generate alert/review workflow, not just a log line.

22. Policy Version and Artifact Provenance

Every decision should identify which policy/model decided it.

For local Java policy:

policyVersion = git commit SHA or semantic policy package version

For OPA:

policyVersion = bundle revision / ETag / signed bundle version

For Cedar/AVP:

policyVersion = policy store version / deployment revision / schema version

For OpenFGA:

modelVersion = authorization model ID
relationshipVersion = tuple store checkpoint/change token if available

This matters when a user asks:

Why was access allowed last Tuesday?

You need to know the policy that existed last Tuesday, not today's policy.

23. Authorization Decision Replay

For high-assurance systems, support decision replay.

Replay requires:

policy version,
input snapshot or reconstructable attributes,
relationship state at time of decision,
code version/PDP version,
time/context,
deterministic evaluator or recorded nondeterministic facts.

Replay modes:

Mode	Description
exact replay	reproduce original decision with original policy/input
current-policy replay	evaluate old input with current policy
what-if replay	evaluate changed attribute/policy scenario
shadow replay	compare proposed policy against historical decisions

Do not promise exact replay if you do not store enough state. Be explicit.
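Shadow replay is the easiest mode to sketch: run a candidate policy over stored decision inputs and report every decision whose effect would change. `RecordedDecision`, `ReplayDiff`, and `ShadowReplay` are hypothetical, and a policy is modeled as a plain Java function. This is not exact replay. It is only as faithful as the stored input summary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stored decision: recorded input summary and the effect that was enforced.
record RecordedDecision(String decisionId, Map<String, String> input, String effect) {}

// Hypothetical diff row for one decision whose outcome would change.
record ReplayDiff(String decisionId, String recordedEffect, String replayedEffect) {}

final class ShadowReplay {
    // Evaluates the candidate policy against each historical input and keeps only changes.
    static List<ReplayDiff> diff(List<RecordedDecision> history,
                                 Function<Map<String, String>, String> candidatePolicy) {
        List<ReplayDiff> diffs = new ArrayList<>();
        for (RecordedDecision d : history) {
            String replayed = candidatePolicy.apply(d.input());
            if (!replayed.equals(d.effect())) {
                diffs.add(new ReplayDiff(d.decisionId(), d.effect(), replayed));
            }
        }
        return diffs;
    }
}
```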

24. Explainability API for Internal Tools

Internal support/security tools often need explanation.

Example endpoint:

POST /internal/authorization/explain

Request:

{
  "subjectId": "user-alice",
  "action": "case.approve",
  "resourceType": "case",
  "resourceId": "case-123"
}

Response:

{
  "decision": "DENY",
  "reasonCode": "MAKER_CHECKER_VIOLATION",
  "explanation": "Requester and approver are the same subject.",
  "policyVersion": "case-policy-v42",
  "relatedFacts": {
    "resourceState": "PENDING_APPROVAL",
    "createdBySameSubject": true,
    "tenantMatched": true
  },
  "recommendedAction": "Ask a different approver with CASE_APPROVE permission."
}

Access to explain endpoint must itself be authorized and audited. Explanation can reveal sensitive facts.

25. Admin UX and Supportability

Authorization observability is not only backend logs. Admin/support UX should show useful permission diagnosis.

Example support view:

User: Alice
Action: Approve case
Resource: CASE-123
Decision: Denied
Reason: Maker-checker violation
Policy version: case-policy-v42
What to do: assign a different approver
Correlation ID: req_abc123

Bad support UX:

403 Forbidden

A good support UX reduces unsafe admin behavior. If support staff cannot explain a denial, they may grant broad roles to “fix it”.

26. Handling Indeterminate Decisions

Indeterminate means the system could not decide safely.

Causes:

Cause	Example
PDP timeout	OPA/AVP/OpenFGA unavailable
attribute unavailable	user department service down
malformed request	missing tenant/resource type
policy error	runtime evaluation failure
schema mismatch	Java sends unexpected field
cache corruption	cannot trust cached decision

Default:

Indeterminate => deny/fail closed for protected actions.

Decision event:

{
  "decision": "INDETERMINATE",
  "effectiveDecision": "DENY",
  "reasonCode": "PDP_TIMEOUT",
  "failMode": "FAIL_CLOSED"
}

Metrics should track indeterminate decisions separately from normal denies. A spike in PDP_TIMEOUT is an operational incident, not user behavior.
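Here is a sketch of a PDP call wrapped with a timeout. It records both the raw INDETERMINATE outcome and the enforced DENY, matching the event shape above. `Effect`, `GuardedDecision`, and `FailClosedPdpClient` are hypothetical, and the PDP call is modeled as a `Supplier`. Note that `CompletableFuture.orTimeout` (Java 9+) only completes the future exceptionally; it does not interrupt the underlying call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

enum Effect { ALLOW, DENY, INDETERMINATE }

// Keeps the raw outcome and the enforced one, so metrics can count them separately.
record GuardedDecision(Effect decision, Effect effectiveDecision, String reasonCode, String failMode) {}

final class FailClosedPdpClient {
    static GuardedDecision check(Supplier<Effect> pdpCall, long timeoutMs) {
        try {
            Effect effect = CompletableFuture.supplyAsync(pdpCall)
                .orTimeout(timeoutMs, TimeUnit.MILLISECONDS)
                .join();
            // Anything other than an explicit ALLOW is enforced as DENY.
            // In a real client the reason code would come from the PDP response.
            return new GuardedDecision(effect, effect == Effect.ALLOW ? Effect.ALLOW : Effect.DENY, null, null);
        } catch (CompletionException e) {
            String reason = e.getCause() instanceof TimeoutException ? "PDP_TIMEOUT" : "POLICY_EVALUATION_ERROR";
            return new GuardedDecision(Effect.INDETERMINATE, Effect.DENY, reason, "FAIL_CLOSED");
        }
    }
}
```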

27. Authorization Log Volume and Sampling

Authorization can happen on every request, so volume can be high.

Do not sample high-risk audit events.

Never sample:

sensitive write allow,
deny for protected actions,
break-glass,
admin role grants,
export/download,
PDP errors/indeterminate,
policy rollout diff,
cross-tenant deny,
service account high-risk access.

May sample:

low-risk repeated read allows,
static asset authorization,
health checks,
internal low-risk cache hits.

Even when sampling metrics/logs, preserve durable audit for legally required actions.
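Here is a sketch of a sampling decision for application/log streams; it never applies to the durable audit path. `AuthzEventSampler` and its list of high-risk actions are hypothetical. Sampling is keyed on `decisionId` with a CRC32 checksum, so the keep/drop outcome for a given decision is reproducible rather than random.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.zip.CRC32;

// Hypothetical sampler for log/trace volume only; durable audit is written regardless.
final class AuthzEventSampler {
    // Assumption: illustrative high-risk actions; a real list comes from the domain.
    private static final Set<String> NEVER_SAMPLE_ACTIONS =
        Set.of("case.export", "document.download", "role.grant");

    private final double lowRiskReadRate; // e.g. 0.05 keeps about 5% of low-risk read allows

    AuthzEventSampler(double lowRiskReadRate) { this.lowRiskReadRate = lowRiskReadRate; }

    boolean keep(String decisionId, String action, String decision, boolean breakGlass, boolean readOnly) {
        // Denies, indeterminate results, break-glass, writes, and high-risk actions are always kept.
        if (!"ALLOW".equals(decision) || breakGlass || !readOnly || NEVER_SAMPLE_ACTIONS.contains(action)) {
            return true;
        }
        // Deterministic bucket in [0, 10000) derived from the decision ID.
        CRC32 crc = new CRC32();
        crc.update(decisionId.getBytes(StandardCharsets.UTF_8));
        return (crc.getValue() % 10_000) < (long) (lowRiskReadRate * 10_000);
    }
}
```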

28. Correlation Across Service Boundaries

In microservices, each service may make its own authorization decision.

Example flow:

Gateway: request-level coarse auth
Case service: object-level case auth
Document service: document relation auth
Export service: export entitlement and field policy
Storage service: signed download auth

Each decision should carry:

traceId
correlationId
parentDecisionId if derived from earlier decision
actor/subject continuity
delegation context

Decision chain example:

{
  "decisionId": "dec_export_2",
  "parentDecisionId": "dec_case_read_1",
  "action": "document.download",
  "resource": "document:doc-55",
  "actor": "user-alice",
  "subject": "service:export-service"
}

Without this chain, confused-deputy investigations become nearly impossible.

29. Observability for Query Scoping

Query scoping usually does not call the PDP per row. It still needs observability.

Log safe scope summary:

{
  "eventType": "authorization.query_scope_applied",
  "subjectId": "user-alice",
  "action": "case.search",
  "resourceType": "case",
  "scopeType": "tenant_assignment_jurisdiction",
  "tenantId": "tenant-a",
  "policyVersion": "case-query-scope-v12",
  "returnedRows": 25,
  "totalVisibleRows": 193,
  "redactedFields": ["nationalId", "sealedReason"]
}

Do not log full SQL with sensitive literals unless access to logs is tightly controlled.

Important query-scope metrics:

returned row count distribution,
visible count vs raw count if computed safely,
scope predicate version,
slow scoped queries,
export rows by subject/action,
field redaction count.

30. Authorization Observability in Kafka/Workers

Worker decision events should include job metadata.

{
  "eventType": "authorization.decision",
  "executionMode": "ASYNC_WORKER",
  "jobId": "job-123",
  "commandId": "cmd-456",
  "submittedAt": "2026-07-03T09:00:00Z",
  "executedAt": "2026-07-03T09:15:00Z",
  "actor": "user-alice",
  "subject": "service:export-worker",
  "action": "case.export.execute",
  "decision": "DENY",
  "reasonCode": "ACCESS_REVOKED_BEFORE_EXECUTION"
}

Worker observability must show:

authorized at submission,
rechecked at execution,
snapshot used or not,
actor preserved,
resource scope preserved,
output delivered or suppressed,
DLQ reason.

Async authorization without actor/resource trace is a future incident.

31. Performance Observability

Decision latency breakdown matters.

Total authz latency = subject resolution + attribute fetch + PDP eval + relationship check + cache + audit emission

Emit timing where useful:

{
  "latency": {
    "totalMs": 18,
    "subjectMs": 1,
    "attributeMs": 6,
    "pdpMs": 8,
    "auditMs": 3
  }
}

But keep metrics cardinality under control. For detailed breakdown, logs/traces are better than labels.

Latency alerts:

p95 authz decision > 50ms for 10 minutes
PDP timeout rate > 1%
cache hit ratio drops from 90% to 20%
indeterminate decisions > baseline

Tie performance alerts to fail-closed behavior: an authorization outage may look like mass 403/500 errors to users.

32. Security of the Authorization Logs Themselves

Authorization logs are sensitive.

Protect them with:

access control,
least privilege,
purpose-bound access,
encryption at rest,
encryption in transit,
retention limits,
immutable storage for audit-grade events,
monitoring of log access,
redaction/tokenization,
segregation of duties.

A system can leak data through logs even if the API response is correct.

Examples of log leakage:

Denied access to sealed case "Witness retaliation involving John Doe".
Policy input: { full JWT, nationalId, address, evidenceNotes }
Debug trace: matched rule high_risk_customer_investigation.

Secure logging is part of authorization design.

33. Testing Observability and Audit

Audit/observability must be tested.

Decision event test:

@Test
void denied_case_read_emits_decision_event() {
    Subject alice = subject("alice", "tenant-a");
    CaseResource hidden = caseResource("case-hidden", "tenant-a").assignedTo("bob");

    assertThatThrownBy(() -> caseService.readCase(alice, hidden.id()))
        .isInstanceOf(AccessDeniedException.class);

    assertThat(auditSink.events()).anySatisfy(event -> {
        assertThat(event.eventType()).isEqualTo("authorization.decision");
        assertThat(event.subject().id()).isEqualTo("alice");
        assertThat(event.action()).isEqualTo("case.read");
        assertThat(event.resource().id()).isEqualTo("case-hidden");
        assertThat(event.decision()).isEqualTo(DecisionEffect.DENY);
        assertThat(event.reasonCode()).isEqualTo("NOT_ASSIGNED");
        assertThat(event.policyVersion()).isNotBlank();
        assertThat(event.correlationId()).isNotBlank();
    });
}

Redaction test:

@Test
void decision_event_does_not_contain_raw_token_or_pii() throws Exception { // writeValueAsString throws a checked exception
    AuthorizationDecisionEvent event = exerciseSensitiveDecision();

    String serialized = objectMapper.writeValueAsString(event);

    assertThat(serialized).doesNotContain("Bearer ");
    assertThat(serialized).doesNotContain("refresh_token");
    assertThat(serialized).doesNotContain("nationalId");
    assertThat(serialized).doesNotContain("fullAddress");
}

Obligation event test:

@Test
void redaction_obligation_is_logged_as_applied() {
    CaseDto dto = caseService.readCase(subjectWithoutPiiClearance(), "case-123");

    assertThat(dto.nationalId()).isEqualTo("***-**-1234");
    assertThat(auditSink.events()).anySatisfy(event ->
        assertThat(event.eventType()).isEqualTo("authorization.obligation_applied")
    );
}

34. Common Anti-Patterns

Anti-pattern 1: Boolean Decision Only

boolean allowed = authorizationService.can(user, action, resource);

This discards reason, policy version, obligations, cacheability, and audit metadata.

Anti-pattern 2: Logs Without Resource

Access denied for user alice

This is not investigable.

Anti-pattern 3: Logs With Too Much Resource

Access denied for case CaseFile{id=..., title=..., notes=..., evidence=...}

This leaks sensitive data.

Anti-pattern 4: External Error Leaks Internal Policy

Denied because sealed witness-protection case requires tier-4 clearance.

Useful to attacker, harmful to confidentiality.

Anti-pattern 5: No Policy Version

Without policy version, you cannot explain historical decisions after policy changes.

Anti-pattern 6: No Correlation ID

Without correlation, you cannot reconstruct cross-service incidents.

Anti-pattern 7: Audit Best-Effort Only

For sensitive writes, “we tried to log but failed silently” is unacceptable.

Anti-pattern 8: Metrics With Raw IDs

Raw user/resource IDs as metric labels can explode metric cardinality and leak sensitive data.

Anti-pattern 9: Unprotected Explain Endpoint

An explain endpoint is itself a data exfiltration risk.

Anti-pattern 10: Debug Trace in Production Response

Never return PDP traces to normal users.

35. Production Checklist

Decision Event

Safety

Observability

Auditability

sensitive allow decisions audited
sensitive deny decisions audited
admin grants audited
break-glass audited and reviewed
export/download audited
policy changes linked to decisions
decision replay strategy defined

Explainability

stable reason code taxonomy
audience-specific explanation levels
support/operator view
policy engineer trace controlled
explanation endpoint protected and audited

36. Final Mental Model

Authorization decision without observability is a hidden runtime behavior.

Authorization decision without auditability is weak evidence.

Authorization decision without explainability becomes operational friction.

Production-grade authorization should produce this chain:

Request
-> AuthorizationRequest
-> AuthorizationDecision
-> Enforcement
-> Safe external response
-> Durable decision event
-> Metrics/traces
-> Audit evidence
-> Explainability path

The best authorization systems are not only strict. They are inspectable, explainable, and defensible.

Previous Lesson

Lesson 37

Authorization Testing: Unit, Integration, Property-Based, and Abuse Cases

Next Lesson

Lesson 39

Operational Governance: Admin UX, Approval, SoD, and Access Review