
Learn Java Security Cryptography Integrity Part 029 Secure Observability Logging And Sensitive Data Control


title: Learn Java Security, Cryptography and Integrity - Part 029
description: Secure observability for Java applications: safe logging, telemetry hygiene, sensitive-data control, trace/log leakage prevention, security signals, and operational controls.
series: learn-java-security-cryptography-integrity
seriesTitle: Learn Java Security, Cryptography and Integrity
order: 29
partTitle: Secure Observability, Logging & Sensitive Data Control
tags:

  • java
  • security
  • observability
  • logging
  • opentelemetry
  • sensitive-data
  • secure-engineering

date: 2026-06-30

Part 029 — Secure Observability, Logging & Sensitive Data Control

Goal of this part: you can design observability for Java systems that supports debugging, operations, security detection, and forensic readiness, without turning the observability pipeline into a source of data leaks, secret leaks, or a path for tampering with evidence.

The previous part covered the tamper-evident audit trail as an evidence system. This part is different: it covers production observability (logs, metrics, traces, events, alerts, dashboards, and the telemetry pipeline) as an operational system that must be secure, carry minimal sensitive data, and still be informative enough for incident response.

Good security observability does not mean “log as much as possible”. It means:

  1. recording the right events,
  2. with a consistent structure,
  3. without leaking sensitive data,
  4. in a form an attacker cannot easily manipulate,
  5. that can be correlated across services,
  6. with appropriate retention and access control,
  7. and that produces actionable signals.

The OWASP Logging Cheat Sheet emphasizes that logs can contain personal or sensitive information and must be protected from misuse such as tampering, unauthorized access, modification, and deletion. OpenTelemetry likewise emphasizes that telemetry can inadvertently capture sensitive or personal information and needs special handling.


1. Kaufman Deconstruction

Following Josh Kaufman's approach, a complex skill should be broken down into small sub-skills that can be practiced and measured. For secure observability, the skill map looks like this:

Minimum effective learning target

After this part, you should be able to answer clearly:

  • Which data must never enter a log, metric label, trace attribute, baggage, exception, or dashboard?
  • Which security events must be recorded for auth, authz, data access, admin actions, secret/key usage, and integration failures?
  • How do you keep logging safe when the input comes from an attacker?
  • How do you make logs and traces useful for incident response without violating privacy or data minimization?
  • How do you test that the application does not leak tokens, passwords, credentials, PII, secrets, or sensitive raw payloads?

2. Mental Model: Observability Is a Data Product With Attack Surface

Observability is often treated as a “side effect” of the application. That is wrong. In a production system, observability is a data product that has:

  • producer: application, sidecar, agent, collector;
  • transport: stdout, filebeat, OTLP, syslog, Kafka, HTTP exporter;
  • processor: redactor, sampler, enricher, aggregator;
  • storage: log backend, trace backend, SIEM, data lake;
  • consumer: developer, SRE, security analyst, auditor;
  • policy: retention, access, masking, export, deletion;
  • failure mode: leakage, tampering, flooding, missing signal, false positives.

Security implication: attacker-controlled data can reach your observability plane. If you blindly log user input, exception messages, headers, query strings, payloads, file names, or claims, an attacker can:

  • leak secrets into logs;
  • inject fake log lines;
  • poison dashboards;
  • create high-cardinality metrics that increase cost or degrade backend performance;
  • hide malicious actions in noise;
  • exfiltrate sensitive data through third-party telemetry providers;
  • trigger retention violations.

3. Four Different Things: Logs, Metrics, Traces, Audit

Do not collapse these concepts.

| Surface | Primary Question | Typical Payload | Security Risk | Security Use |
|---|---|---|---|---|
| Logs | “What happened?” | structured events, errors, state transitions | sensitive-data leakage, injection, excessive detail | investigation, detection, debugging |
| Metrics | “How much/how often/how slow?” | counters, gauges, histograms, labels | high-cardinality leakage, tenant/user leakage | anomaly detection, SLO, abuse detection |
| Traces | “Where did request time go?” | spans, attributes, propagation context | header/payload leakage, cross-tenant correlation leak | distributed investigation |
| Audit Trail | “Who did what, under what authority, to what object?” | evidence-grade security event | missing proof, tampering, ambiguous actor | accountability, regulatory evidence |

The audit trail was covered in Part 023. This part mentions audit only to clarify the boundary.

Practical rule

  • Logs can be verbose but must be sanitized.
  • Metrics must avoid sensitive/high-cardinality labels.
  • Traces must avoid payload/secret attributes and baggage leaks.
  • Audit must be complete, immutable enough, and semantically precise.

4. Security Invariants for Observability

Use these as non-negotiable rules.

Invariant 1 — Sensitive data minimization

Telemetry must not include raw secrets, credentials, session tokens, refresh tokens, private keys, OTPs, password reset tokens, full payment data, or unnecessary personal data.

Invariant 2 — Structured events only for important security signals

Security-relevant logs should be structured. Free-text logs are hard to detect, correlate, parse, and verify.

Invariant 3 — Input data is hostile even inside logs

Log output is an interpreter boundary. Newlines, tabs, terminal escape sequences, JSON-breaking characters, and unbounded strings must be controlled.

Invariant 4 — Correlation IDs are not identity

A correlation ID identifies a flow, not a user. Actor, tenant, client, subject, and authority must be explicitly represented where needed.

Invariant 5 — Observability must degrade safely

If the telemetry backend fails, the business request should usually continue unless the event is evidence-critical. The system must still surface the telemetry failure as an operational alert.
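A minimal sketch of this invariant, assuming a hypothetical `TelemetrySink` interface and `SafeEmitter` wrapper (neither comes from a real library). Ordinary events are emitted best-effort and failures are counted; evidence-critical events let the failure propagate so the caller can fail closed.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sink abstraction; stands in for a log appender or exporter. */
interface TelemetrySink {
    void emit(String event) throws Exception;
}

/** Wraps a sink so telemetry failures never break the caller but stay visible. */
final class SafeEmitter {
    private final TelemetrySink sink;
    private final AtomicLong dropped = new AtomicLong();

    SafeEmitter(TelemetrySink sink) {
        this.sink = sink;
    }

    /** Best-effort emit: returns false and increments the drop counter on failure. */
    boolean emitBestEffort(String event) {
        try {
            sink.emit(event);
            return true;
        } catch (Exception e) {
            dropped.incrementAndGet();
            return false;
        }
    }

    /** Evidence-critical emit: the failure is counted and then propagated. */
    void emitRequired(String event) {
        try {
            sink.emit(event);
        } catch (Exception e) {
            dropped.incrementAndGet();
            throw new IllegalStateException("evidence-critical telemetry failed", e);
        }
    }

    /** Expose this value as a metric and alert when it grows. */
    long droppedCount() {
        return dropped.get();
    }
}
```

Which events count as evidence-critical is a policy decision for the team; the sketch only shows that the two paths should be separate and that both feed the same failure counter.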

Invariant 6 — Security signals need owner and runbook

An alert without owner and response procedure is noise.

Invariant 7 — Log access is production data access

Anyone who can read logs may read sensitive operational data. Log access must be least-privilege, monitored, and time-bound.


5. Sensitive Data Taxonomy

A serious team defines a telemetry data classification table. Example:

| Class | Examples | Allowed in Logs? | Allowed in Traces? | Allowed as Metrics Label? | Handling |
|---|---|---|---|---|---|
| Secret | password, API key, private key, refresh token, session cookie | No | No | No | never collect; redact at source and collector |
| Authentication material | OTP, magic link, reset token, authorization code | No | No | No | never collect; store hash only if operationally required |
| Sensitive identity | NIK, SSN, passport, full DOB | Usually no | No | No | tokenize/hash with strict need |
| Personal data | email, phone, address, name | limited | limited | usually no | minimize; consider keyed hash or internal ID |
| Financial data | PAN, bank account, card token | usually no | no | no | comply with domain rules; mask/tokenize |
| Authorization context | roles, scopes, tenant ID, policy decision | yes, selected | yes, selected | tenant maybe with caution | structured, avoid excessive detail |
| Technical identifiers | request ID, trace ID, object ID | yes | yes | maybe | ensure not guessable secret |
| Security outcome | login failed, access denied, risk score band | yes | yes | yes | structured signal |

The hidden trap: “It is just metadata”

Metadata can be sensitive. Examples:

  • tenant ID can expose customer relationship;
  • object ID can expose business volume;
  • endpoint name can expose internal capability;
  • trace topology can expose architecture;
  • error class can reveal implementation;
  • metric label with user ID can leak personal data and explode cardinality.

6. Redaction, Masking, Tokenization, Hashing

These terms are often confused.

| Technique | What It Does | Reversible? | Good For | Bad For |
|---|---|---|---|---|
| Redaction | removes value | no | secrets, tokens, passwords | analysis needing grouping |
| Masking | shows partial value | partly | display/debug with limited reveal | secrets; attackers can combine partials |
| Tokenization | replaces with mapped token | yes through vault/system | regulated identifiers | complex lifecycle, access control |
| Hashing | deterministic digest | no, but brute-force possible | grouping non-secret high-entropy values | low-entropy PII like email unless keyed |
| HMAC/keyed hash | deterministic keyed digest | no without key | grouping sensitive identifiers | key management required |

Practical guidance

  • Passwords, access tokens, refresh tokens, private keys: redact.
  • Email/phone for correlation: prefer keyed hash or internal subject ID.
  • Payment/regulated identifiers: use tokenization through approved boundary.
  • Request body: do not log by default; log schema/size/content type/result.
  • Headers: allowlist, never dump all headers.
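The keyed-hash option for email or phone correlation can be sketched with the JDK's `javax.crypto.Mac`. The class name `KeyedRef`, the `subject_` prefix, and the truncation to 8 bytes are assumptions made for illustration; in practice the key would come from a secret manager rather than a constructor argument.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.HexFormat;
import java.util.Locale;

/** Derives a stable, non-reversible reference from a sensitive identifier. */
final class KeyedRef {
    private final SecretKeySpec key;

    /** Assumption: the caller loads keyBytes from a secret manager. */
    KeyedRef(byte[] keyBytes) {
        this.key = new SecretKeySpec(keyBytes, "HmacSHA256");
    }

    /** Returns "subject_" followed by 16 hex characters; same key and input give the same ref. */
    String subjectRef(String email) {
        // Normalize first so "Alice@Example.com " and "alice@example.com" correlate.
        String normalized = email.trim().toLowerCase(Locale.ROOT);
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            byte[] digest = mac.doFinal(normalized.getBytes(StandardCharsets.UTF_8));
            // Truncation keeps log lines short; 8 bytes of the digest are kept.
            return "subject_" + HexFormat.of().formatHex(digest, 0, 8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }
}
```

Unlike a plain hash, the result cannot be brute-forced from a list of candidate emails without the key, which is why the key itself becomes something to protect and rotate.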

7. Java Logging Architecture

A typical Java application uses a facade plus implementation:

  • facade: SLF4J;
  • implementation: Logback or Log4j2;
  • structured encoder: JSON encoder;
  • context propagation: MDC/thread context;
  • export path: stdout, file, OTLP, collector, sidecar;
  • runtime environment: container, Kubernetes, cloud logging agent.

Rule: application code should not assemble ad-hoc security strings

Bad:

log.info("Login failed for user=" + email + " password=" + password);

Better:

securityEvents.authenticationFailed(
    new AuthenticationFailureEvent(
        safeSubjectRef(subjectRef),
        clientRef,
        FailureReason.BAD_CREDENTIALS,
        requestContext
    )
);

The application should expose security semantics, not random strings.


8. Structured Security Event Schema

Define a minimal schema and version it.

{
  "schema_version": "security-event.v1",
  "event_type": "authorization.denied",
  "event_time": "2026-06-30T10:15:30.123Z",
  "service": "case-api",
  "environment": "prod",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "correlation_id": "req-01J...",
  "tenant_ref": "tenant_9f3a",
  "actor_ref": "subject_7c12",
  "client_ref": "oauth-client-risk-engine",
  "source_ip_class": "public",
  "action": "case.read",
  "object_type": "case",
  "object_ref": "case_8842",
  "decision": "deny",
  "reason_code": "missing_scope",
  "risk_level": "medium"
}

Field principles

  • Use stable machine-readable event_type.
  • Use references, not raw personal identifiers.
  • Use reason codes, not full internal policy dumps.
  • Use schema_version for compatibility.
  • Include service/environment to support multi-service investigation.
  • Include trace/correlation IDs for navigation, not as security proof.
  • Include action/object/decision for authorization events.
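One way to keep these principles from eroding is to enforce them in the event type itself. The record below is a hypothetical, reduced form of the schema above; the regular expressions and the `allow`/`deny` decision set are assumptions, not part of any standard.

```java
import java.time.Instant;
import java.util.Objects;
import java.util.regex.Pattern;

/** Hypothetical typed form of a subset of the security-event.v1 schema. */
record SecurityEvent(
    String schemaVersion,
    String eventType,
    Instant eventTime,
    String service,
    String actorRef,
    String action,
    String objectRef,
    String decision,
    String reasonCode
) {
    // Lowercase dotted names such as "authorization.denied".
    private static final Pattern EVENT_TYPE = Pattern.compile("[a-z_]+(\\.[a-z_]+)+");
    // Reason codes are short machine-readable tokens, never free text.
    private static final Pattern REASON_CODE = Pattern.compile("[a-z0-9_]{1,64}");

    SecurityEvent {
        Objects.requireNonNull(eventTime, "eventTime");
        require("security-event.v1".equals(schemaVersion), "unsupported schema_version");
        require(eventType != null && EVENT_TYPE.matcher(eventType).matches(),
            "event_type must be a dotted lowercase name");
        require(reasonCode != null && REASON_CODE.matcher(reasonCode).matches(),
            "reason_code must be a short machine-readable code");
        require("allow".equals(decision) || "deny".equals(decision),
            "decision must be allow or deny");
    }

    private static void require(boolean ok, String message) {
        if (!ok) throw new IllegalArgumentException(message);
    }
}
```

A reason code such as `"Access denied for alice@example.com"` is rejected at construction time, so free text with personal data cannot reach the logger through this field.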

9. Security Event Taxonomy

Minimum event families for Java enterprise systems:

| Family | Event Examples | Notes |
|---|---|---|
| Authentication | login success/failure, MFA challenge, recovery started/completed, suspicious login | avoid passwords, OTP, reset token |
| Session | session created, renewed, revoked, logout, refresh token rotation | log token family ID, not token |
| Authorization | access denied, privilege escalation attempt, policy override, admin grant | include action/object/decision |
| Data access | sensitive object viewed/exported/deleted, bulk read, search over sensitive data | avoid raw data values |
| Administrative action | role change, config change, key policy change, feature flag for security control | high-value alert candidates |
| Secret/key usage | KMS decrypt, key rotation, signing operation, failed key lookup | do not log key material |
| Integration | webhook verification failed, invalid signature, replay detected, partner cert changed | include partner ref |
| Input boundary | validation failure, parser rejection, SSRF blocked, file upload rejected | avoid raw payload |
| Abuse signal | rate-limit triggered, credential stuffing pattern, enumeration attempt | aggregate carefully |
| Observability health | telemetry exporter failed, redaction rule error, collector dropped events | detect blind spots |

10. Java Example: Safe Security Event Logger

A basic design is to make unsafe logging harder.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

public final class SecurityEventLogger {
    private static final Logger log = LoggerFactory.getLogger("SECURITY_EVENT");

    public void authorizationDenied(AuthzDenied event) {
        Objects.requireNonNull(event, "event");

        // Map.of accepts at most 10 key/value pairs and this event has 12,
        // so build the map explicitly. LinkedHashMap also keeps the field
        // order stable in the emitted JSON.
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("schema_version", "security-event.v1");
        fields.put("event_type", "authorization.denied");
        fields.put("event_time", Instant.now().toString());
        fields.put("service", Ref.safe(event.service()));
        fields.put("tenant_ref", Ref.safe(event.tenantRef()));
        fields.put("actor_ref", Ref.safe(event.actorRef()));
        fields.put("action", Ref.safe(event.action()));
        fields.put("object_type", Ref.safe(event.objectType()));
        fields.put("object_ref", Ref.safe(event.objectRef()));
        fields.put("decision", "deny");
        fields.put("reason_code", Ref.safe(event.reasonCode()));
        fields.put("correlation_id", Ref.safe(event.correlationId()));

        // Use structured logging support when available.
        // This compact example emits JSON manually only to show the shape.
        log.warn("{}", Json.safeObject(fields));
    }

    public record AuthzDenied(
        String service,
        String tenantRef,
        String actorRef,
        String action,
        String objectType,
        String objectRef,
        String reasonCode,
        String correlationId
    ) {}
}

Supporting utility:

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.util.Map;

final class Json {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    static String safeObject(Map<String, ?> fields) {
        try {
            return MAPPER.writeValueAsString(fields);
        } catch (JsonProcessingException e) {
            return "{\"schema_version\":\"security-event.v1\",\"event_type\":\"logging.serialization_failed\"}";
        }
    }
}

And reference sanitizer:

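final class Ref {
    private static final int MAX_LENGTH = 128;

    static String safe(String value) {
        if (value == null || value.isBlank()) return "unknown";

        // Replace every control character (CR, LF, tab, ESC, ...) and the
        // Unicode line/paragraph separators, so a value cannot start a new
        // log line or carry a terminal escape sequence.
        StringBuilder normalized = new StringBuilder();
        for (int i = 0; i < value.length() && normalized.length() < MAX_LENGTH; i++) {
            char c = value.charAt(i);
            boolean unsafe = Character.isISOControl(c) || c == '\u2028' || c == '\u2029';
            normalized.append(unsafe ? '_' : c);
        }

        // Bound the length and mark truncated values.
        return value.length() > MAX_LENGTH ? normalized + "..." : normalized.toString();
    }
}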

This is not a full production logger, but it demonstrates the invariant: security logs must be structured, bounded, and sanitized.


11. Log Injection and Log Forging

Log injection happens when attacker-controlled input changes how logs are interpreted.

Example attack input:

alice@example.com\n{"event_type":"admin.role_granted","actor_ref":"attacker"}

Bad log:

log.info("login failed user={}", username);

The logger inserts the username as-is, including the newline. If the backend treats each line as one event, as with newline-delimited JSON, the attacker has forged an extra log entry.

Defenses

  • Use JSON encoder that escapes fields correctly.
  • Do not manually concatenate JSON.
  • Strip/control CRLF from reference-like fields.
  • Bound field length.
  • Use allowlisted fields for security events.
  • Avoid logging raw payloads.
  • Avoid terminal escape sequences in interactive logs.

12. Exception Logging Without Data Leakage

Exceptions often carry sensitive information through message strings.

Bad:

try {
    paymentGateway.charge(request);
} catch (PaymentException e) {
    log.error("Charge failed for request {}", request, e);
}

Problems:

  • request may include card token, billing address, user data;
  • exception message may include provider response body;
  • stack traces can expose internal packages and architecture;
  • repeated failures can flood logs with sensitive data.

Better:

try {
    paymentGateway.charge(request);
} catch (PaymentException e) {
    log.error("payment.charge_failed provider={} request_ref={} reason_code={}",
        Ref.safe(providerRef),
        Ref.safe(request.id()),
        Ref.safe(e.reasonCode()),
        e instanceof RetryablePaymentException ? e : null
    );
    throw e;
}

For high-risk boundaries, map the external exception to an internal reason code before logging, so the provider's message never reaches the log.
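Such a mapping can be sketched as a small function over the exception's cause chain. The names `PaymentReasonCode`, `PaymentErrorMapper`, and `ProviderRejectedException` are hypothetical; the exception classes a real payment client throws will differ.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;

/** Hypothetical internal reason codes; only these names ever reach the log. */
enum PaymentReasonCode { PROVIDER_TIMEOUT, PROVIDER_UNAVAILABLE, PROVIDER_REJECTED, UNKNOWN }

/** Hypothetical exception standing in for a provider-side decline. */
final class ProviderRejectedException extends RuntimeException {
    ProviderRejectedException(String message) {
        super(message);
    }
}

final class PaymentErrorMapper {
    private static final int MAX_CAUSE_DEPTH = 10;

    /** Classifies by exception type only; messages are never read or copied. */
    static PaymentReasonCode toReasonCode(Throwable error) {
        Throwable current = error;
        // Bounded walk so a cyclic cause chain cannot loop forever.
        for (int depth = 0; current != null && depth < MAX_CAUSE_DEPTH; depth++) {
            if (current instanceof SocketTimeoutException) return PaymentReasonCode.PROVIDER_TIMEOUT;
            if (current instanceof ConnectException) return PaymentReasonCode.PROVIDER_UNAVAILABLE;
            if (current instanceof ProviderRejectedException) return PaymentReasonCode.PROVIDER_REJECTED;
            current = current.getCause();
        }
        return PaymentReasonCode.UNKNOWN;
    }
}
```

Because the log line carries only the enum name, a provider response body embedded in an exception message stays out of the observability pipeline.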


13. MDC, Correlation ID, and Context Leakage

Mapped Diagnostic Context is useful but dangerous if treated casually.

Common fields:

  • correlation_id
  • trace_id
  • tenant_ref
  • actor_ref
  • client_ref
  • request_path_template
  • deployment_unit

Avoid:

  • raw email,
  • username if personal,
  • session ID,
  • access token,
  • request body,
  • Authorization header,
  • raw query string,
  • sensitive object content.

Java pattern with try-with-resources

import org.slf4j.MDC;

public final class LoggingContext implements AutoCloseable {
    private final String[] keys;

    private LoggingContext(String... keys) {
        this.keys = keys;
    }

    public static LoggingContext putRequestContext(RequestContext ctx) {
        MDC.put("correlation_id", Ref.safe(ctx.correlationId()));
        MDC.put("tenant_ref", Ref.safe(ctx.tenantRef()));
        MDC.put("actor_ref", Ref.safe(ctx.actorRef()));
        return new LoggingContext("correlation_id", "tenant_ref", "actor_ref");
    }

    @Override
    public void close() {
        for (String key : keys) {
            MDC.remove(key);
        }
    }
}

Usage:

try (LoggingContext ignored = LoggingContext.putRequestContext(ctx)) {
    service.handle(command);
}

Threading warning

MDC is thread-local in common implementations. In async code, thread pools, virtual threads, reactive pipelines, or executor handoff, context propagation must be explicit and tested. Otherwise, logs may be missing context or, worse, use stale context from a different request.
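The capture-and-restore shape of explicit propagation can be sketched as follows. To stay dependency-free, the sketch uses a plain `ThreadLocal` map named `DiagnosticContext` as a stand-in for MDC; with SLF4J the same shape applies using `MDC.getCopyOfContextMap()` and `MDC.setContextMap(...)`.

```java
import java.util.HashMap;
import java.util.Map;

/** Stand-in for MDC: a per-thread map of diagnostic context. */
final class DiagnosticContext {
    private static final ThreadLocal<Map<String, String>> CTX =
        ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CTX.get().put(key, value); }
    static String get(String key) { return CTX.get().get(key); }
    static void clear() { CTX.remove(); }

    private static Map<String, String> snapshot() { return new HashMap<>(CTX.get()); }
    private static void restore(Map<String, String> values) { CTX.set(new HashMap<>(values)); }

    /** Captures the caller's context now and installs it around the task later. */
    static Runnable propagating(Runnable task) {
        Map<String, String> captured = snapshot();
        return () -> {
            Map<String, String> previous = snapshot();
            restore(captured);
            try {
                task.run();
            } finally {
                // Put back what the executing thread had before, so nothing
                // from this request leaks into the next task on that thread.
                restore(previous);
            }
        };
    }
}
```

Usage would be along the lines of `executor.submit(DiagnosticContext.propagating(task))`. The `finally` block is the part that prevents the stale-context case described above.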


14. OpenTelemetry Hygiene

OpenTelemetry is powerful because it propagates context across services. That also means mistakes propagate across services.

Trace attribute rules

Allowed examples:

service.name=case-api
http.route=/cases/{caseId}
http.request.method=GET
app.tenant_ref=tenant_9f3a
app.case_ref=case_8842
security.decision=deny
security.reason_code=missing_scope

Avoid:

http.request.header.authorization=Bearer eyJ...
http.request.body={...}
user.email=alice@example.com
password=...
reset_token=...

Baggage is especially dangerous

Baggage propagates application-defined key/value pairs downstream. Do not put sensitive data in baggage. Treat baggage like an outbound header to every participating service.
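The attribute rules above lend themselves to an allowlist applied before attributes or baggage entries are attached. The sketch below works on a plain map rather than the OpenTelemetry API; `SpanAttributePolicy` is a hypothetical helper and its key set simply mirrors the allowed examples listed earlier.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class SpanAttributePolicy {
    // Assumption: exact keys approved for export, taken from the allowed examples.
    private static final Set<String> ALLOWED_KEYS = Set.of(
        "service.name",
        "http.route",
        "http.request.method",
        "app.tenant_ref",
        "app.case_ref",
        "security.decision",
        "security.reason_code");

    /** Keeps only allowlisted keys; unknown keys are dropped entirely. */
    static Map<String, String> filter(Map<String, String> attributes) {
        Map<String, String> safe = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : attributes.entrySet()) {
            if (ALLOWED_KEYS.contains(entry.getKey())) {
                safe.put(entry.getKey(), entry.getValue());
            }
        }
        return safe;
    }
}
```

An allowlist fails closed: a new attribute such as `user.email` is dropped until someone approves it, whereas a denylist would let it through until someone notices.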

Collector as second line of defense

Application-side minimization is first line. Collector redaction is second line. Do not rely only on collector redaction because:

  • bad telemetry may still exist locally;
  • redaction config can drift;
  • third-party agents may export before redaction;
  • sampling may preserve sensitive spans;
  • processors may not cover all fields.

15. Metrics Security

Metrics can leak sensitive data through labels.

Bad:

login_failure_total{email="alice@example.com", reason="bad_password"} 1
case_read_total{tenant="big-bank", case_id="CASE-2026-000012"} 1

Better:

login_failure_total{reason="bad_credentials", auth_surface="password"} 1
case_read_total{object_type="case", outcome="success"} 1

Cardinality risk

High-cardinality labels can:

  • explode storage cost;
  • overload metrics backend;
  • reveal user/customer/object population;
  • become DoS vector if attacker controls label values.

Metric label invariant

A metric label must be:

  • low-cardinality;
  • non-secret;
  • non-personal unless explicitly approved;
  • not attacker-controlled raw input;
  • stable enough for dashboard/alert semantics.
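A small sketch of how the invariant can be enforced at the point where a label value is produced. `MetricLabels` and the specific reason values are assumptions; the idea is that any input outside a fixed set collapses to one bucket, so an attacker cannot mint new label values.

```java
import java.util.Locale;
import java.util.Set;

final class MetricLabels {
    // Assumption: the closed set of reason labels this service reports.
    private static final Set<String> KNOWN_REASONS =
        Set.of("bad_credentials", "account_locked", "mfa_failed", "rate_limited");

    /** Maps arbitrary input onto the fixed label set; unknown values become "other". */
    static String reason(String raw) {
        if (raw == null) {
            return "other";
        }
        String candidate = raw.trim().toLowerCase(Locale.ROOT);
        return KNOWN_REASONS.contains(candidate) ? candidate : "other";
    }
}
```

With this in place the cardinality of the `reason` label is bounded by the size of the set plus one, regardless of what reaches the service.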

16. Sensitive Data in HTTP Logging

HTTP request/response logging is one of the most common leakage sources.

Dangerous by default

  • full URL with query string;
  • all headers;
  • request body;
  • response body;
  • cookies;
  • multipart file names/content;
  • upstream/downstream error bodies;
  • debug proxy logs.

Safer approach

Log:

  • method;
  • route template, not raw path if path has identifiers;
  • status code;
  • duration bucket;
  • request size;
  • response size;
  • client classification;
  • correlation ID;
  • selected safe reason code.

Example:

log.info("http.request_completed method={} route={} status={} duration_ms={} correlation_id={}",
    request.method(),
    routeTemplate,
    response.status(),
    duration.toMillis(),
    Ref.safe(correlationId)
);

Avoid:

log.info("request={} headers={} body={}", request.getRequestURI(), headers, body);
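Two of the safer fields listed above, a path without its query string and a duration bucket, can be derived with small helpers. `HttpLogFields` and the bucket boundaries are assumptions chosen for illustration.

```java
import java.net.URI;
import java.time.Duration;

final class HttpLogFields {
    /** Returns only the path of a request target, dropping query string and fragment. */
    static String pathOnly(String requestTarget) {
        try {
            String path = URI.create(requestTarget).getRawPath();
            return (path == null || path.isEmpty()) ? "/" : path;
        } catch (IllegalArgumentException e) {
            // Do not echo input that fails to parse; it is attacker-controlled.
            return "unparseable";
        }
    }

    /** Coarse latency bucket; boundaries here are illustrative. */
    static String durationBucket(Duration duration) {
        long ms = duration.toMillis();
        if (ms < 100) return "lt_100ms";
        if (ms < 1_000) return "lt_1s";
        if (ms < 10_000) return "lt_10s";
        return "gte_10s";
    }
}
```

Dropping the query string removes tokens passed as parameters, but the raw path can still contain identifiers, so the route template remains the better value to log when the framework exposes it.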

17. Security Alert Engineering

Logs are not enough. You need detection rules.

| Signal | Possible Detection | Common Noise Problem | Better Design |
|---|---|---|---|
| Login failures | threshold per subject/IP/client | NAT/shared IP noise | combine velocity, reputation, subject, ASN, device change |
| Access denied | repeated deny on sensitive object | normal user mistakes | alert on deny after policy probing pattern |
| Signature verification failed | invalid HMAC/JWT/webhook | client bugs | separate invalid format, invalid key id, bad signature, replay |
| Role change | admin privilege granted | legitimate admin ops | alert outside change window or without ticket ref |
| Secret access | unusual KMS decrypt volume | batch job | baseline per service/key/environment |
| Telemetry gap | no logs from service | deployments | correlate with health checks and exporter errors |
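Detection rules like these usually live in a SIEM or alerting backend, not in application code. Still, the "repeated deny on sensitive object" row can be illustrated with a minimal in-memory sliding window. `DenyBurstDetector` is a hypothetical class; it assumes events arrive roughly in timestamp order and does not evict idle keys.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class DenyBurstDetector {
    private final int threshold;
    private final Duration window;
    private final Map<String, Deque<Instant>> denials = new HashMap<>();

    DenyBurstDetector(int threshold, Duration window) {
        this.threshold = threshold;
        this.window = window;
    }

    /** Records one deny and returns true when the window holds at least threshold denies. */
    synchronized boolean recordDeny(String actorRef, String objectRef, Instant at) {
        String key = actorRef + "|" + objectRef;
        Deque<Instant> times = denials.computeIfAbsent(key, k -> new ArrayDeque<>());
        times.addLast(at);

        // Drop entries that fell out of the window ending at this event.
        Instant cutoff = at.minus(window);
        while (!times.isEmpty() && times.peekFirst().isBefore(cutoff)) {
            times.removeFirst();
        }
        return times.size() >= threshold;
    }
}
```

Keying on actor and object together addresses the noise column: one user mistyping once does not fire, while the same actor hitting the same object repeatedly within the window does.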

Alert quality criteria

A good security alert has:

  • clear condition;
  • severity rationale;
  • owner;
  • runbook;
  • suppression rule;
  • expected false-positive causes;
  • evidence fields;
  • link to trace/log query;
  • test event.

18. Access Control for Observability Backends

Log backend access is production data access.

Minimum controls:

  • role-based access per environment;
  • separation for production vs staging;
  • field-level masking where supported;
  • just-in-time elevated access;
  • query audit logs;
  • retention policy;
  • export/download restrictions;
  • break-glass procedure;
  • monitoring for bulk export;
  • legal/privacy deletion workflow if applicable.

Anti-pattern

“Developers can access all production logs because they need debugging.”

Better:

  • default: limited production log access;
  • elevated access with ticket/timebox;
  • sensitive fields redacted before storage;
  • replayable sanitized traces for debugging;
  • incident room with auditable access.

19. Retention and Data Lifecycle

Observability data has lifecycle:

Design questions:

  • How long do we need hot searchable logs?
  • Which security events need longer retention?
  • Which fields must be removed before storage?
  • Which fields require restricted access?
  • Which backends replicate data to third parties?
  • How do we delete or reprocess telemetry after a redaction bug?
  • What happens when a secret was accidentally logged?

Secret accidentally logged: response checklist

  1. Identify scope: value, services, timeframe, backends.
  2. Revoke/rotate secret immediately.
  3. Stop further leakage at source.
  4. Purge/redact backend if possible.
  5. Invalidate caches/exports.
  6. Review access logs for who viewed/exported it.
  7. Add regression test/redaction rule.
  8. Document incident and preventive control.

20. Secure Observability for Multi-Tenant Systems

Multi-tenant systems need special care.

Risks

  • tenant identifier leaks in shared dashboards;
  • support engineer sees unrelated tenant data;
  • trace spans cross tenant boundary;
  • metrics labels reveal tenant volume;
  • alert notifications include sensitive tenant data;
  • screenshots of dashboards leak customer names.

Controls

  • use tenant reference, not customer legal name;
  • avoid tenant as high-cardinality metrics label unless backend access is controlled;
  • separate dashboards for internal vs customer-facing;
  • row/field-level access if backend supports it;
  • sanitize alert notification payloads;
  • test tenant isolation in telemetry queries;
  • restrict trace search by tenant where possible.

21. Pattern: Security Event Logger as Domain Boundary

Instead of scattering log.warn(...) everywhere, centralize security event construction.

Benefits:

  • consistent schema;
  • easier redaction testing;
  • stable detection rules;
  • code review surface is smaller;
  • security can evolve event schema without hunting random log lines.
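As a sketch of what that boundary could look like: application code depends on a narrow interface with one method per security event, and tests swap in a recording implementation. All names here (`SecurityEvents`, `AuthzDeniedEvent`, `RecordingSecurityEvents`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical event type; fields mirror the authorization example in this part. */
record AuthzDeniedEvent(
    String actorRef,
    String action,
    String objectType,
    String objectRef,
    String reasonCode
) {}

/** The only way application code reports security events. */
interface SecurityEvents {
    void authorizationDenied(AuthzDeniedEvent event);
}

/** Test double that records events instead of writing them to a log. */
final class RecordingSecurityEvents implements SecurityEvents {
    private final List<AuthzDeniedEvent> events = new ArrayList<>();

    @Override
    public synchronized void authorizationDenied(AuthzDeniedEvent event) {
        events.add(event);
    }

    /** Immutable snapshot of everything recorded so far. */
    synchronized List<AuthzDeniedEvent> events() {
        return List.copyOf(events);
    }
}
```

The production implementation would serialize the same event types through the structured logger, so sanitization and schema changes happen in one place.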

22. Pattern: Redaction at Source and Pipeline

Use two layers.

Source minimization

  • do not create sensitive telemetry in the first place;
  • structured safe event APIs;
  • safe exception mapping;
  • no raw body/header dumps.

Pipeline redaction

  • denylist known secret patterns;
  • allowlist approved attributes;
  • redact before export to third-party backend;
  • detect unknown sensitive fields;
  • alert on redaction count spikes.
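The denylist and redaction-count items can be sketched with plain regular expressions. `SecretPatternRedactor` and its three patterns are illustrative assumptions; a real pipeline needs patterns for its own token formats and will still miss secrets that match none of them, which is why source minimization comes first.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SecretPatternRedactor {
    private static final String MASK = "[REDACTED]";

    // Illustrative denylist, not an exhaustive one.
    private static final List<Pattern> PATTERNS = List.of(
        // Bearer tokens in header-like text.
        Pattern.compile("(?i)bearer\\s+[A-Za-z0-9._~+/=-]+"),
        // Three base64url segments separated by dots, the usual JWT shape.
        Pattern.compile("\\beyJ[A-Za-z0-9_-]+\\.[A-Za-z0-9_-]+\\.[A-Za-z0-9_-]+"),
        // key=value pairs whose key name suggests a secret, e.g. reset_token=...
        Pattern.compile("(?i)\\b[\\w-]*(?:password|passwd|secret|api[_-]?key|token)[\\w-]*=[^\\s&\"]+"));

    private long redactions;

    /** Replaces every match with a fixed mask and counts the replacements. */
    synchronized String redact(String line) {
        String out = line;
        for (Pattern pattern : PATTERNS) {
            Matcher matcher = pattern.matcher(out);
            StringBuilder buffer = new StringBuilder();
            while (matcher.find()) {
                redactions++;
                matcher.appendReplacement(buffer, Matcher.quoteReplacement(MASK));
            }
            matcher.appendTail(buffer);
            out = buffer.toString();
        }
        return out;
    }

    /** Export as a metric; a spike suggests a new leak at the source. */
    synchronized long redactionCount() {
        return redactions;
    }
}
```

The counter matters as much as the masking: every redaction is evidence that something upstream created sensitive telemetry and should be fixed there.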

23. Testing Secure Observability

Testing should prove absence of obvious leaks and presence of required signals.

Test 1 — No token in logs

@Test
void loginFailureDoesNotLogPasswordOrToken() {
    var appender = new InMemoryAppender("SECURITY_EVENT");
    appender.start();

    authenticationService.login("alice@example.com", "P@ssw0rd!", "Bearer abc.def.ghi");

    String output = appender.joinedMessages();
    assertThat(output).doesNotContain("P@ssw0rd!");
    assertThat(output).doesNotContain("abc.def.ghi");
    assertThat(output).contains("authentication.failed");
}

Test 2 — Log injection is neutralized

@Test
void attackerControlledUsernameCannotForgeLogLine() {
    String username = "alice@example.com\n{\"event_type\":\"admin.role_granted\"}";

    securityEvents.authenticationFailed(username, "bad_credentials", requestContext);

    String output = testLogs.singleLine();
    assertThat(output).doesNotContain("\n{\"event_type\":\"admin.role_granted\"}");
    assertThat(output).contains("authentication.failed");
}

Test 3 — Required security event exists

@Test
void authorizationDeniedEmitsStructuredSecurityEvent() {
    assertThatThrownBy(() -> caseService.readCase(actorWithoutScope, caseId))
        .isInstanceOf(AccessDeniedException.class);

    assertThat(testSecurityEvents.events())
        .anySatisfy(event -> {
            assertThat(event.type()).isEqualTo("authorization.denied");
            assertThat(event.action()).isEqualTo("case.read");
            assertThat(event.objectType()).isEqualTo("case");
            assertThat(event.decision()).isEqualTo("deny");
        });
}

24. Common Anti-Patterns

Anti-pattern 1 — Debug logging raw payloads

log.debug("request body: {}", body);

Even debug logs can be enabled accidentally, collected in staging, or captured by support bundles.

Anti-pattern 2 — Logging all headers

log.info("headers={}", headers);

Headers may contain Authorization, cookies, API keys, internal routing data, or personal data.
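An allowlist is the safer alternative when some header information is needed for debugging. The sketch below assumes headers are available as a simple name-to-value map; `HeaderLogPolicy`, the chosen header names, and the length cap are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class HeaderLogPolicy {
    // Assumption: headers this service considers safe to log.
    private static final Set<String> LOGGABLE =
        Set.of("content-type", "content-length", "accept", "user-agent");
    private static final int MAX_VALUE_LENGTH = 200;

    /** Returns only allowlisted headers, with control characters replaced and length bounded. */
    static Map<String, String> loggable(Map<String, String> headers) {
        Map<String, String> out = new LinkedHashMap<>();
        headers.forEach((name, value) -> {
            // Header names are case-insensitive, so compare in lower case.
            String key = name.toLowerCase(Locale.ROOT);
            if (LOGGABLE.contains(key) && value != null) {
                String clean = value.replaceAll("\\p{Cntrl}", "_");
                out.put(key, clean.length() > MAX_VALUE_LENGTH
                    ? clean.substring(0, MAX_VALUE_LENGTH)
                    : clean);
            }
        });
        return out;
    }
}
```

Even allowlisted headers such as `User-Agent` are attacker-controlled, which is why the values are still sanitized and bounded before logging.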

Anti-pattern 3 — Metrics by user ID

request_total{user_id="123"}

This leaks identity and explodes cardinality.

Anti-pattern 4 — Treating CORS/auth failures as “noise”

Repeated browser boundary failures may be probing, misconfiguration, or active attack. Log structured reason codes with rate controls.

Anti-pattern 5 — Logging stack traces for expected security denials

Access denied is often expected. Do not flood logs with stack traces. Emit structured security event instead.

Anti-pattern 6 — Putting secrets in OpenTelemetry baggage

Baggage propagates. Do not put tokens, email, tenant legal names, or sensitive state in it.

Anti-pattern 7 — Relying only on backend redaction

By the time backend redaction runs, data may already exist in local stdout, sidecar buffers, collector queues, or exported copies.


25. Design Review Checklist

Use this checklist for PRs and architecture reviews.

Event selection

  • Are auth/authz/session/admin/security-boundary events captured?
  • Are success and failure events differentiated?
  • Are reason codes stable and non-sensitive?
  • Are event schemas versioned?
  • Are important security events alertable?

Data minimization

  • No passwords, OTPs, tokens, reset links, API keys, private keys.
  • No raw request/response body by default.
  • No full headers dump.
  • No full query string for sensitive endpoints.
  • PII has explicit justification and approved handling.

Logging safety

  • User input is escaped/sanitized/bounded.
  • Logs are structured.
  • No manual JSON concatenation.
  • CRLF/log forging is tested.
  • Exceptions are mapped before logging where needed.

Telemetry pipeline

  • Collector/exporter uses TLS where applicable.
  • Redaction rules exist at pipeline boundary.
  • Telemetry backend access is least-privilege.
  • Retention is defined.
  • Alert exists for telemetry export failure or ingestion gap.

Metrics/traces

  • Metric labels are low-cardinality and non-sensitive.
  • Trace attributes do not include body/header/token/PII.
  • Baggage does not include sensitive data.
  • Tenant/user/object references are safe and intentional.

26. Lab: Secure Observability Hardening

Scenario

You own a Java service case-api with endpoints:

  • POST /login
  • GET /cases/{caseId}
  • POST /cases/{caseId}/decision
  • POST /webhooks/payment-provider
  • POST /admin/roles

Task A — Define telemetry schema

Create structured events for:

  • login success;
  • login failure;
  • MFA failure;
  • authorization denied;
  • admin role granted;
  • webhook signature failed;
  • case decision submitted;
  • telemetry exporter failed.

For each field, classify:

  • safe;
  • sensitive;
  • derived;
  • forbidden.

Task B — Build redaction tests

Write tests proving logs do not include:

  • password;
  • authorization header;
  • cookie;
  • reset token;
  • raw webhook body;
  • email unless intentionally hashed;
  • private key material;
  • full exception response from payment provider.

Task C — Build alert candidates

Define alert rules for:

  • repeated access denied on same object;
  • signature verification failures above baseline;
  • admin role changes outside change window;
  • sudden drop to zero security events;
  • redaction processor errors;
  • high KMS decrypt failure rate.

Task D — Write incident runbook

Write runbook for “secret accidentally logged”. Include detection, rotation, purge, access review, and regression test.


27. Production Readiness Rubric

| Level | Description |
|---|---|
| L1 | Basic logs exist, but raw payload/header leakage is possible. |
| L2 | Structured logs for major flows; some redaction; inconsistent schema. |
| L3 | Security event taxonomy, redaction tests, safe context propagation, backend access control. |
| L4 | Detection rules, telemetry health alerts, retention policy, field-level minimization, incident runbooks. |
| L5 | Observability treated as governed data product with schema lifecycle, automated leak tests, access audit, and continuous tuning. |

For a top-tier engineering environment, aim for L4 as default and L5 for regulated/high-risk domains.


28. What to Remember

Secure observability is not “more logs”. It is controlled visibility.

The right model:

observe enough to operate and investigate,
collect little enough to avoid creating a new breach surface,
structure enough to automate detection,
protect enough to preserve trust.

Your logs, traces, metrics, dashboards, and alerts are part of the system’s security boundary. Treat them as production data, not debug leftovers.

