
Sensitive Data Leakage Prevention

Learn Java Microservices File Handling, State, Configuration and Secret Management - Part 056

Sensitive data leakage prevention for Java microservices: logs, metrics, traces, exceptions, config dumps, file exports, headers, MDC, OpenTelemetry, and operational tooling.


Part 056 — Sensitive Data Leakage Prevention

The most common data breach in internal systems is not a cinematic exploit.

It is a log line someone thought was harmless.

Sensitive data leakage prevention is the discipline of ensuring that sensitive data does not leave the boundary it is supposed to stay within.

In Java microservices, leakage often happens through:

  • logs;
  • metrics;
  • traces;
  • exception messages;
  • HTTP access logs;
  • audit payload;
  • config dumps;
  • actuator endpoints;
  • heap dump;
  • thread dump;
  • generated file/export;
  • dead-letter queue;
  • retry payload;
  • CI/CD logs;
  • support tooling;
  • dashboards;
  • alert messages;
  • presigned URL logs;
  • object key naming.

This part is not just “don't log passwords”. We will cover a systematic design for classifying data, limiting propagation, applying redaction, and testing that leakage does not occur.

OpenTelemetry explicitly places responsibility for handling sensitive data on the implementer, because a telemetry library cannot know by itself which data is sensitive in a particular domain. The OWASP logging guidance likewise stresses that sensitive data should not be stored in logs without a need and appropriate controls.


1. What Counts as Sensitive?

Do not limit sensitive data to passwords.

For file/config/secret/state services, sensitive data includes:

| Category | Examples |
| --- | --- |
| Secret material | password, token, API key, private key, signing key |
| Authentication data | session ID, JWT, refresh token, auth code |
| Authorization capability | presigned URL, one-time download token |
| PII | name, email, phone, address, national ID |
| Financial data | account number, payment detail, invoice sensitive fields |
| Health/legal data | medical record, investigation note, enforcement evidence |
| Security metadata | internal IP, service account token path, KMS key use |
| File payload | uploaded document, attachment, image, evidence |
| File metadata | filename, case ID, owner, content hash in some contexts |
| Config sensitive value | endpoint with embedded credential, feature flag revealing security posture |
| Operational secret | database URL with password, broker credential |
| Correlation-sensitive data | request ID combined with user ID and case ID |

Important:

Data can be non-sensitive alone but sensitive when combined.

Example:

caseId + userId + timestamp + action = sensitive audit context

2. Leakage Surfaces

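Every arrow in a data-flow diagram of the service, from the application to each log, metric, trace, and export destination, is a potential exfiltration path.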

Common weak assumption:

"It's internal logs, so it's okay."

Wrong. Logs are often copied to:

  • central observability vendor;
  • data lake;
  • developer laptop;
  • incident ticket;
  • Slack alert;
  • long-term archive;
  • SIEM;
  • AI analysis tool;
  • support dashboard.

A log line can travel farther than production database access.


3. Data Classification Model

Use classification before redaction.

Example classification:

| Level | Meaning | Handling |
| --- | --- | --- |
| Public | safe for public docs | no special control |
| Internal | internal operational info | avoid external exposure |
| Confidential | customer/business sensitive | minimize logs, access control |
| Restricted | secrets, credentials, regulated data | never log raw |
| Evidence | legally/audit sensitive artifact | strict access, retention, audit |

In code, model data semantics.

public enum Sensitivity {
    PUBLIC,
    INTERNAL,
    CONFIDENTIAL,
    RESTRICTED,
    EVIDENCE
}

Data wrapper:

public record SensitiveValue(String label, String value, Sensitivity sensitivity) {
    @Override
    public String toString() {
        return "[REDACTED:" + label + "]";
    }

    public String revealForAuthorizedUse() {
        return value;
    }
}

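This is not perfect memory protection, but it makes accidental logging harder. Note that a record still exposes the generated value() accessor, so serializers and reflection-based tooling can read the raw value unless they are configured not to.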


4. Logging Rules

4.1 Golden Rule

Log decisions and identifiers, not raw sensitive payloads.

Good:

fileId=FILE-01JZ status=QUARANTINED scanResult=CLEAN actor=user-123 correlationId=req-abc

Bad:

filename=john-smith-medical-report.pdf contentBase64=...
Authorization=Bearer eyJ...
presignedUrl=https://bucket.s3...

4.2 Log Stable Internal IDs

Prefer:

  • fileId;
  • caseId if allowed;
  • actorId;
  • tenantId;
  • correlationId;
  • secretVersion;
  • configVersion.

Avoid raw:

  • filename with personal data;
  • email;
  • token;
  • document content;
  • URL with signed query string;
  • database URL with password.

4.3 Structured Logging

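Use structured logging so fields can be redacted by key.

Example using an SLF4J parameterized message with consistent key=value names (with a Logback/logstash-style JSON encoder, emit the same fields as structured key-value pairs so the encoder can redact by key):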

log.info("File accepted fileId={} status={} scanDecision={} correlationId={}",
    fileId,
    status,
    scanDecision,
    correlationId
);

Avoid:

log.info("Upload request: {}", request);

request.toString() may include headers, body, token, filename, or form fields.

4.4 MDC Hygiene

Mapped Diagnostic Context is useful but dangerous.

Good MDC:

MDC.put("correlationId", correlationId);
MDC.put("tenantId", tenantId);
MDC.put("actorId", actorId);

Bad MDC:

MDC.put("authorization", authHeader);
MDC.put("email", userEmail);
MDC.put("fileName", originalFilename);
MDC.put("presignedUrl", url);

Always clear MDC after request in thread-pool environments.

try {
    MDC.put("correlationId", correlationId);
    chain.doFilter(request, response);
} finally {
    MDC.clear();
}

Virtual threads reduce some thread reuse concerns, but MDC propagation still needs explicit design depending on logging framework and async boundaries.


5. Redaction Strategy

Redaction should be layered.

5.1 Prevent

Best: do not create log event with sensitive value.

5.2 Type-Safe Redaction

Use value types whose toString() redacts.

public final class SecretValue {
    private final String value;

    public SecretValue(String value) {
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException("Secret cannot be blank");
        }
        this.value = value;
    }

    public String reveal() {
        return value;
    }

    @Override
    public String toString() {
        return "[REDACTED]";
    }
}

5.3 Field-Based Redaction

Redact by structured keys:

password
token
authorization
cookie
set-cookie
apiKey
secret
privateKey
presignedUrl
signature
credential
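As a minimal sketch of key-based redaction, the helper below takes a map of structured log fields and masks any whose key matches the list above. The class name FieldRedactor and the substring matching rule are assumptions for illustration; a real setup would usually hook this logic into the log encoder.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: masks structured log fields by key before encoding. */
final class FieldRedactor {
    // Keys from the list above, lower-cased with separators removed.
    private static final Set<String> SENSITIVE_KEYS = Set.of(
        "password", "token", "authorization", "cookie", "apikey",
        "secret", "privatekey", "presignedurl", "signature", "credential"
    );

    private FieldRedactor() {}

    /** Returns a copy of the fields with sensitive values replaced. */
    static Map<String, Object> redact(Map<String, ?> fields) {
        Map<String, Object> safe = new LinkedHashMap<>();
        for (Map.Entry<String, ?> entry : fields.entrySet()) {
            boolean sensitive = isSensitive(entry.getKey());
            safe.put(entry.getKey(), sensitive ? "[REDACTED]" : entry.getValue());
        }
        return safe;
    }

    /** Substring match, so "dbPassword", "Set-Cookie" and "X-Api-Key" are caught too. */
    static boolean isSensitive(String key) {
        String normalized = key.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
        return SENSITIVE_KEYS.stream().anyMatch(normalized::contains);
    }
}
```

Substring matching deliberately over-redacts: keys such as secretVersion or tokenId would also be masked, so fields that are safe by policy need an explicit exception list or a different name.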

5.4 Pattern-Based Redaction

Useful for fallback:

  • JWT regex;
  • AWS access key pattern;
  • Authorization: Bearer;
  • URL query param X-Amz-Signature;
  • credit card number;
  • email depending policy.

But pattern redaction is not enough. It has false negatives and false positives.
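A fallback pattern redactor could look roughly like the sketch below. The class name PatternRedactor and the exact regular expressions are assumptions chosen for illustration; they cover only a few of the shapes listed above and will miss variants, which is exactly the false-negative risk just described.

```java
import java.util.regex.Pattern;

/** Hypothetical backstop: masks a few well-known secret shapes in free text. */
final class PatternRedactor {
    // Three base64url segments separated by dots, starting with the usual "eyJ" prefix.
    private static final Pattern JWT =
        Pattern.compile("eyJ[A-Za-z0-9_-]+\\.[A-Za-z0-9_-]+\\.[A-Za-z0-9_-]*");
    // "Bearer <credentials>"; the scheme name is kept, the credential is masked.
    private static final Pattern BEARER =
        Pattern.compile("(?i)(bearer\\s+)[A-Za-z0-9._~+/=-]+");
    // Hex signature value of an X-Amz-Signature query parameter.
    private static final Pattern AMZ_SIGNATURE =
        Pattern.compile("(?i)(x-amz-signature=)[0-9a-f]+");
    // Long-term AWS access key IDs: "AKIA" followed by 16 upper-case alphanumerics.
    private static final Pattern AWS_ACCESS_KEY =
        Pattern.compile("\\bAKIA[0-9A-Z]{16}\\b");

    private PatternRedactor() {}

    static String redact(String message) {
        String out = JWT.matcher(message).replaceAll("[REDACTED_JWT]");
        out = BEARER.matcher(out).replaceAll("$1[REDACTED]");
        out = AMZ_SIGNATURE.matcher(out).replaceAll("$1[REDACTED]");
        out = AWS_ACCESS_KEY.matcher(out).replaceAll("[REDACTED_AWS_KEY]");
        return out;
    }
}
```

The JWT pattern runs first so that a bearer token that is a JWT is labelled as such; the remaining patterns then handle opaque tokens and signed query strings.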

5.5 Collector-Level Redaction

If using OpenTelemetry Collector or log pipeline, redact before export to external backend.

Architecture:

Redaction should happen as close to source as practical, but collector-level protection is a useful backstop.


6. Exception Leakage

Exceptions often leak more than ordinary log statements because developers tend to attach the whole object context to the message or the log call.

Bad:

throw new IllegalStateException("Failed with request " + request);

Bad:

log.error("Failed to call dependency headers={} body={}", headers, body, ex);

Better:

log.error("Failed to call dependency dependency={} operation={} status={} correlationId={}",
    "scanner",
    "scanFile",
    statusCode,
    correlationId,
    ex
);

6.1 Safe Error Response

Never return internal details to client.

Bad response:

{
  "error": "Failed to connect jdbc:postgresql://db/evidence?user=evidence&password=secret"
}

Better:

{
  "errorCode": "DEPENDENCY_UNAVAILABLE",
  "message": "The request cannot be completed right now.",
  "correlationId": "req-abc"
}

6.2 Exception Classification

public enum ErrorExposure {
    CLIENT_SAFE,
    INTERNAL_ONLY,
    SECURITY_SENSITIVE
}

Have the exception mapper return a client-safe error shape:

public record ApiError(
    String code,
    String message,
    String correlationId
) {}

Expose safe message, log internal context with redaction.
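One way to wire the classification into a mapper is sketched below. The enum and record are repeated as nested types so the example is self-contained; ClassifiedException, SafeErrorMapper, and the INTERNAL_ERROR code are hypothetical names, not part of any framework.

```java
/** Hypothetical mapper: only CLIENT_SAFE errors keep their own code and message. */
final class SafeErrorMapper {
    enum ErrorExposure { CLIENT_SAFE, INTERNAL_ONLY, SECURITY_SENSITIVE }

    record ApiError(String code, String message, String correlationId) {}

    /** Assumed application exception that carries its exposure class. */
    static final class ClassifiedException extends RuntimeException {
        final String code;
        final ErrorExposure exposure;

        ClassifiedException(String code, String message, ErrorExposure exposure) {
            super(message);
            this.code = code;
            this.exposure = exposure;
        }
    }

    private SafeErrorMapper() {}

    static ApiError toApiError(Throwable error, String correlationId) {
        if (error instanceof ClassifiedException ce
                && ce.exposure == ErrorExposure.CLIENT_SAFE) {
            return new ApiError(ce.code, ce.getMessage(), correlationId);
        }
        // Unclassified, internal, and security-sensitive errors all collapse
        // into one generic response; details belong in the redacted server log.
        return new ApiError(
            "INTERNAL_ERROR",
            "The request cannot be completed right now.",
            correlationId
        );
    }
}
```

Defaulting to the generic response means a new exception type is safe until someone explicitly marks it CLIENT_SAFE, which mirrors the allowlist approach used elsewhere in this part.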


7. HTTP Header Leakage

Headers that must be redacted:

Authorization
Cookie
Set-Cookie
X-Api-Key
X-Amz-Security-Token
X-Amz-Signature
Proxy-Authorization
X-Forwarded-Client-Cert

If logging inbound/outbound HTTP, implement explicit allowlist.

Good:

log headers: content-type, content-length, user-agent, x-request-id

Bad:

log all headers except a few known ones

Allowlist beats blocklist.


8. Presigned URL Leakage

Presigned URLs often contain signature and credential scope in query parameters.

Bad:

log.info("Generated presigned URL {}", presignedUrl);

Better:

log.info("Generated presigned URL fileId={} method={} expiresAt={}",
    fileId,
    "GET",
    expiresAt
);

If you must log URL shape:

s3://bucket/evidence/.../payload?signature=[REDACTED]

But usually do not log URL at all.

Also avoid:

  • returning presigned URL in error body;
  • storing presigned URL in audit event;
  • putting presigned URL in metrics label;
  • sending presigned URL to third-party telemetry.

9. Metrics Leakage

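Metric labels are dangerous because they are widely visible and every distinct label value creates a new time series, so unbounded values such as filenames or emails cause both leakage and cardinality blow-up.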

Bad:

file_download_total{filename="john-smith-report.pdf", userEmail="john@example.com"}

Better:

file_download_total{status="success", fileType="pdf", tenantTier="enterprise"}

9.1 Metrics Rules

Do not put these in labels:

  • user email;
  • raw user ID if policy forbids;
  • file name;
  • file ID if high cardinality;
  • URL;
  • token;
  • case title;
  • error message;
  • SQL query;
  • object key with semantic data;
  • exception stack trace.

Use bounded labels:

  • operation;
  • status;
  • error class;
  • dependency;
  • region;
  • environment;
  • lifecycle state.

9.2 Secret Version Metrics

In Part 054, we mentioned secret version metrics. Be careful.

Good:

secret_refresh_success_total{secret="evidence-db"}
secret_seconds_until_expiry{secret="evidence-db"}

Maybe acceptable with bounded version:

secret_current_version_info{secret="evidence-db", version="v42"} 1

Avoid raw secret manager version IDs if long/high-cardinality or sensitive. Use sanitized release version if needed.


10. Trace Leakage

Distributed tracing captures:

  • HTTP route;
  • headers;
  • query params;
  • attributes;
  • exception event;
  • database statements;
  • messaging payload metadata.

Threat:

GET /download?token=abc123

If the full URL is captured, token leaks.

10.1 Trace Attribute Policy

Allowed:

http.request.method
http.route
http.response.status_code
service.name
deployment.environment
file.lifecycle.status

Dangerous:

http.url with query string
authorization header
request body
response body
file original name
presigned URL
SQL with literal values

Prefer route template:

/files/{fileId}/download

not:

/files/FILE-123/download?token=...
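The allowlist above can be enforced in code before attributes are attached to a span. The sketch below is plain Java and does not use the OpenTelemetry API; TraceAttributePolicy and its method names are hypothetical. It only drops the query string and does not produce a route template, which normally has to come from the web framework's routing information.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical policy check applied before attributes are set on a span. */
final class TraceAttributePolicy {
    private static final Set<String> ALLOWED_KEYS = Set.of(
        "http.request.method",
        "http.route",
        "http.response.status_code",
        "service.name",
        "deployment.environment",
        "file.lifecycle.status"
    );

    private TraceAttributePolicy() {}

    /** Returns a copy that contains only allowlisted attribute keys. */
    static Map<String, Object> filter(Map<String, ?> attributes) {
        Map<String, Object> safe = new LinkedHashMap<>();
        attributes.forEach((key, value) -> {
            if (ALLOWED_KEYS.contains(key)) {
                safe.put(key, value);
            }
        });
        return safe;
    }

    /** Drops the query string and fragment from a request target. */
    static String stripQuery(String target) {
        int cut = target.length();
        int query = target.indexOf('?');
        int fragment = target.indexOf('#');
        if (query >= 0) {
            cut = Math.min(cut, query);
        }
        if (fragment >= 0) {
            cut = Math.min(cut, fragment);
        }
        return target.substring(0, cut);
    }
}
```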

10.2 Span Events

Bad:

span.addEvent("request", Attributes.of(
    stringKey("body"), requestBody
));

Better:

span.addEvent("file.upload.validated", Attributes.of(
    stringKey("file.lifecycle.status"), "UPLOADED",
    longKey("file.size.bytes"), sizeBytes
));

Even file.size.bytes may be sensitive in some domains, but it is usually safer than filename or content.


11. Audit Log vs Application Log

Do not mix audit and app logs.

| Application Log | Audit Log |
| --- | --- |
| debugging/operations | accountability/evidence |
| may be sampled/rotated | retention governed |
| contains operational events | contains material decisions |
| often accessible by engineers | restricted access |
| redacted | redacted but evidence-grade |
| can include error context | includes actor/action/object/result |

Audit log should answer:

who did what to which artifact, when, under which policy, with what result

But audit log should still avoid raw sensitive payload.

Good audit event:

{
  "eventType": "FILE_DOWNLOAD_GRANTED",
  "actorId": "user-123",
  "artifactType": "EVIDENCE_FILE",
  "artifactId": "FILE-01JZ",
  "policyVersion": "case-access-v7",
  "decision": "ALLOW",
  "correlationId": "req-abc",
  "occurredAt": "2026-07-05T10:00:00Z"
}

Bad audit event:

{
  "eventType": "FILE_DOWNLOAD_GRANTED",
  "presignedUrl": "https://bucket.s3...?X-Amz-Signature=...",
  "jwt": "eyJ..."
}

12. Config and Actuator Leakage

Spring Boot Actuator can expose useful operational endpoints. In production, configure exposure carefully.

Dangerous surfaces:

  • /actuator/env;
  • /actuator/configprops;
  • /actuator/heapdump;
  • /actuator/threaddump;
  • /actuator/logfile;
  • /actuator/prometheus if labels leak;
  • custom debug endpoint.

Rules:

Do not expose config dump endpoints publicly.
Do not assume sanitization catches all secret names.
Do not put secret in config if it belongs in secret store.

Use:

  • endpoint exposure allowlist;
  • management port/network restriction;
  • authentication/authorization;
  • sanitizer customization;
  • disable heapdump in normal production path;
  • no public actuator.

Example:

management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus
  endpoint:
    env:
      show-values: never

13. Heap Dump and Thread Dump Leakage

Heap dump can contain:

  • tokens;
  • passwords;
  • request bodies;
  • file content chunks;
  • decrypted secrets;
  • user data;
  • cached authorization decisions.

Thread dump can contain:

  • stack frames with sensitive string values in some cases;
  • thread names containing user/request data;
  • SQL/debug context;
  • file names.

Controls:

  • restrict heapdump access;
  • encrypt dump storage;
  • define dump retention;
  • do not auto-upload dumps to broad buckets;
  • sanitize before sharing externally;
  • limit who can trigger dumps;
  • prefer ephemeral secure incident storage.

Java cannot guarantee secret strings disappear immediately from memory. Do not overclaim memory secrecy. Minimize lifetime and avoid unnecessary copies.


14. Dead Letter Queue and Retry Payload Leakage

DLQ often stores failed message payloads.

If message contains:

  • file metadata with personal data;
  • presigned URL;
  • token;
  • raw request body;
  • secret;
  • user detail;

then DLQ becomes sensitive storage.

Rules:

DLQ must be classified according to payload sensitivity.
Retry payload must not contain secret material unless absolutely required.

Better message design:

{
  "eventId": "evt-123",
  "fileId": "FILE-01JZ",
  "operation": "SCAN_FILE",
  "attempt": 3
}

Avoid:

{
  "presignedUrl": "...",
  "fileContentBase64": "...",
  "authorizationHeader": "Bearer ..."
}

Worker can fetch payload by authorized service identity using fileId.


15. Generated Files and Exports

Reports, CSV exports, and debug bundles are a major leakage path.

Controls:

  • classification per export type;
  • explicit user authorization;
  • watermark/audit if needed;
  • row/column filtering by policy;
  • no hidden columns with sensitive data;
  • short-lived download link;
  • retention and cleanup;
  • encryption for archive/export;
  • generated file lifecycle state;
  • export audit event.

CSV-specific risk:

  • formula injection (=HYPERLINK(...));
  • embedded sensitive fields;
  • accidental extra columns;
  • filenames with PII;
  • export stored in public bucket.

Mitigation:

  • escape formula-leading characters if opened in spreadsheet;
  • explicit schema;
  • approved columns;
  • server-side generated safe filename;
  • no direct public access.
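The formula-escaping mitigation can be sketched as below, following the commonly recommended approach of prefixing a single quote to cells that start with a formula trigger character and always quoting the field. The class name CsvCells is an assumption; the trigger set (=, +, -, @, tab, carriage return) should be checked against the spreadsheet tools your users actually open exports with.

```java
/** Hypothetical helper: renders one CSV cell that is safe to open in a spreadsheet. */
final class CsvCells {
    // Leading characters that spreadsheet applications may interpret as a formula.
    private static final String FORMULA_TRIGGERS = "=+-@\t\r";

    private CsvCells() {}

    static String escape(String value) {
        if (value == null) {
            return "";
        }
        String cell = value;
        if (!cell.isEmpty() && FORMULA_TRIGGERS.indexOf(cell.charAt(0)) >= 0) {
            // A leading single quote makes the spreadsheet treat the cell as text.
            cell = "'" + cell;
        }
        // Always quote the field and double any embedded quotes (RFC 4180 style).
        return "\"" + cell.replace("\"", "\"\"") + "\"";
    }
}
```

One trade-off of this sketch: legitimate values that start with a minus sign, such as negative numbers, are also prefixed, so numeric columns may need a typed code path that bypasses the text escaping.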

16. Object Key and Filename Leakage

Even if payload is protected, object key can leak.

Bad:

s3://evidence-prod/cases/CASE-123/john-smith-police-report.pdf

Better:

s3://evidence-prod/evidence/2026/07/05/FILE-01JZ/payload

Original filename can still be stored as metadata if needed, but access to metadata must be controlled.

Downloaded filename should be sanitized:

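public String safeDownloadName(String originalName) {
    String fallback = "download.bin";
    if (originalName == null || originalName.isBlank()) {
        return fallback;
    }

    String cleaned = originalName
        .replace("\\", "_")            // no path separators
        .replace("/", "_")
        .replaceAll("\\p{Cntrl}", "")  // drop CR/LF and other control characters
        .replace("\"", "'");           // keep a quoted filename="..." parameter well-formed

    // Sanitizing makes the name safe for a header; it does not remove PII from it.
    return cleaned.isBlank() ? fallback : cleaned;
}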

Also avoid response header injection through filename.


17. Java Redaction Utilities

17.1 Redacting Headers

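import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public final class HeaderRedactor {
    // Allowlist: only these header values are ever logged in clear text.
    private static final Set<String> ALLOWED_HEADERS = Set.of(
        "content-type",
        "content-length",
        "user-agent",
        "x-request-id",
        "traceparent"
    );

    public Map<String, String> safeHeaders(Map<String, List<String>> headers) {
        Map<String, String> result = new LinkedHashMap<>();

        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            // Header names are case-insensitive, so normalize before the lookup.
            String key = entry.getKey().toLowerCase(Locale.ROOT);

            if (ALLOWED_HEADERS.contains(key)) {
                result.put(key, String.join(",", entry.getValue()));
            } else {
                // Keep the header name for debugging, never the value.
                result.put(key, "[REDACTED]");
            }
        }

        return result;
    }
}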

17.2 Redacting URL

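import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public final class UrlRedactor {
    private static final Set<String> SENSITIVE_QUERY_KEYS = Set.of(
        "token",
        "signature",
        "x-amz-signature",
        "x-amz-security-token",
        "access_token",
        "refresh_token",
        "code"
    );

    public URI redact(URI uri) {
        if (uri.getRawQuery() == null) {
            return uri;
        }

        String safeQuery = Arrays.stream(uri.getRawQuery().split("&"))
            .map(pair -> {
                int idx = pair.indexOf('=');
                String key = idx >= 0 ? pair.substring(0, idx) : pair;
                String normalized = URLDecoder.decode(key, StandardCharsets.UTF_8)
                    .toLowerCase(Locale.ROOT);

                if (SENSITIVE_QUERY_KEYS.contains(normalized)) {
                    return key + "=[REDACTED]";
                }

                return pair;
            })
            .collect(Collectors.joining("&"));

        // Rebuild from the raw (still percent-encoded) components. The
        // multi-argument URI constructors always quote '%', which would
        // double-encode escapes such as %20 already present in the query.
        // Note: credentials in the userinfo part (user:pass@host) are not redacted here.
        StringBuilder sb = new StringBuilder();
        if (uri.getScheme() != null) {
            sb.append(uri.getScheme()).append(':');
        }
        if (uri.getRawAuthority() != null) {
            sb.append("//").append(uri.getRawAuthority());
        }
        if (uri.getRawPath() != null) {
            sb.append(uri.getRawPath());
        }
        sb.append('?').append(safeQuery);
        if (uri.getRawFragment() != null) {
            sb.append('#').append(uri.getRawFragment());
        }

        // URI.create throws IllegalArgumentException if the result is not a valid URI.
        return URI.create(sb.toString());
    }
}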

17.3 Safe DTO Logging

Bad:

log.info("Request {}", uploadRequest);

Better:

public record UploadRequestLogView(
    String fileId,
    long sizeBytes,
    String detectedContentType,
    String actorId
) {}

Log the log view, not the domain/request object.


18. Logback Redaction Concept

Example conceptual TurboFilter/encoder approach:

application code should avoid logging sensitive data
+
log encoder redacts known fields
+
log pipeline redacts patterns as backstop
+
backend access controlled

Do not rely only on regex at backend. It is too late if logs are already forwarded to multiple sinks.


19. OpenTelemetry Sensitive Data Policy

OpenTelemetry instrumentation can be automatic. Automatic instrumentation is useful, but it may capture more than expected.

Policy:

Telemetry must be treated as data export.

Controls:

  • disable capture of request/response bodies;
  • sanitize headers;
  • drop query string;
  • use route template;
  • configure DB statement sanitization;
  • collector processor redaction;
  • environment-specific exporter policy;
  • restrict backend access.

Span naming:

Good:

HTTP POST /files/{fileId}/download

Bad:

HTTP POST /files/FILE-123/download?token=abc

20. CI/CD Leakage

CI/CD often handles:

  • decrypted SOPS files;
  • rendered manifests;
  • docker build args;
  • test config;
  • Maven settings;
  • cloud credentials;
  • kubeconfig;
  • secret scanner output.

Rules:

CI logs are production data if they can contain production secret.

Controls:

  • masked secrets;
  • no shell tracing around secret commands;
  • no upload decrypted artifacts;
  • ephemeral runners;
  • restricted job permissions;
  • environment approvals;
  • no secrets in build args;
  • use workload identity/OIDC where possible;
  • scan logs/artifacts for leaks.

Bad:

ARG DB_PASSWORD
RUN echo $DB_PASSWORD

Also bad:

set -x
sops -d secret.sops.yaml

21. Access Control for Observability

If logs contain sensitive operational context, log backend needs access control.

Minimum:

  • environment separation;
  • team-based access;
  • audit access to log search;
  • restricted raw log export;
  • retention limit;
  • masking in UI;
  • break-glass process for sensitive logs;
  • no broad vendor/admin access without review.

Do not make developers query production logs with unrestricted full-text access to all tenants unless policy allows it.


22. Data Minimization Patterns

22.1 Tokenize

Log token reference, not token.

tokenId=tok_123

22.2 Hash with Salt/Pepper

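For some identifiers, log a keyed, irreversible hash (for example an HMAC with a secret pepper) so that events can still be correlated without exposing the raw value.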

Careful: unsalted hash of low-entropy values like email can be brute-forced.
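A keyed hash avoids the brute-force problem as long as the key stays secret. The sketch below uses HMAC-SHA256 from the JDK; the class name Pseudonymizer and the normalization step (trim and lower-case) are assumptions, and the pepper is expected to come from a secret store rather than from code or config.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.HexFormat;
import java.util.Locale;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: turns an identifier into a stable, keyed pseudonym for logs. */
final class Pseudonymizer {
    private static final String ALGORITHM = "HmacSHA256";

    private final SecretKeySpec key;

    /** The pepper is secret key material and must not be logged or hard-coded. */
    Pseudonymizer(byte[] pepper) {
        this.key = new SecretKeySpec(pepper, ALGORITHM);
    }

    /** Same identifier and pepper always give the same 64-character hex string. */
    String pseudonym(String identifier) {
        try {
            Mac mac = Mac.getInstance(ALGORITHM);
            mac.init(key);
            String normalized = identifier.trim().toLowerCase(Locale.ROOT);
            byte[] digest = mac.doFinal(normalized.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (GeneralSecurityException ex) {
            throw new IllegalStateException("HmacSHA256 is not available", ex);
        }
    }
}
```

Rotating the pepper breaks correlation with older log lines, so the rotation schedule is a trade-off between linkability and limiting the damage if the pepper itself leaks.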

22.3 Truncate

Useful for debugging but still risky.

fileId prefix maybe okay
token prefix not okay unless policy permits

22.4 Classify and Store Separately

Sensitive audit evidence may need restricted audit store, not generic app logs.

22.5 Separate Payload from Control Message

Message carries fileId, not file content or presigned URL.


23. Leakage Prevention Tests

23.1 Unit Test Secret Redaction

@Test
void secretToStringIsRedacted() {
    SecretValue secret = new SecretValue("super-secret");
    assertEquals("[REDACTED]", secret.toString());
}

23.2 Log Capture Test

Use a test appender to assert that the sensitive value is not logged.

@Test
void uploadFailureDoesNotLogAuthorizationHeader() {
    String token = "Bearer secret-token";

    service.handleFailure(token);

    assertThat(logs()).doesNotContain("secret-token");
}
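The logs() helper in the test above is left undefined. As one possible shape, the sketch below captures messages with a java.util.logging handler so that it runs on the JDK alone; CapturingHandler is a hypothetical name, and with Logback the same idea is usually implemented by attaching a list-backed appender to the logger under test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

/** Hypothetical test helper: collects formatted log messages in memory. */
final class CapturingHandler extends Handler {
    private final List<String> messages = new ArrayList<>();
    private final SimpleFormatter formatter = new SimpleFormatter();

    @Override
    public synchronized void publish(LogRecord record) {
        // formatMessage resolves {0}-style parameters, so leaked arguments are visible.
        messages.add(formatter.formatMessage(record));
    }

    @Override
    public void flush() {
        // Nothing is buffered.
    }

    @Override
    public void close() {
        // No resources to release.
    }

    /** All captured messages joined by newlines, for substring assertions. */
    synchronized String logs() {
        return String.join("\n", messages);
    }

    /** Attaches a new capturing handler and detaches the logger from parent handlers. */
    static CapturingHandler attachTo(Logger logger) {
        CapturingHandler handler = new CapturingHandler();
        logger.setUseParentHandlers(false);
        logger.setLevel(Level.ALL);
        logger.addHandler(handler);
        return handler;
    }
}
```

A test would attach the handler, run the code path under test, and then assert that logs() contains the expected identifiers and none of the secret values passed in.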

23.3 Integration Test Error Response

Given dependency error contains internal secret-looking detail
When API returns error
Then response contains safe code and correlationId
And does not contain JDBC URL/password/token

23.4 Telemetry Test

Validate exported spans:

[ ] no Authorization header
[ ] no query token
[ ] no request body
[ ] no presigned URL
[ ] route template used

23.5 CI Policy Test

Fail build if plaintext secret detected in:

  • repository;
  • rendered manifests;
  • logs;
  • generated docs;
  • test snapshots.

24. Incident Response for Leakage

If sensitive data leaks into logs:

1. Identify data type.
2. Stop further leakage.
3. Rotate secret if credential/capability leaked.
4. Restrict log backend access.
5. Determine exposure window.
6. Delete/purge if policy and system allow.
7. Audit who accessed logs.
8. Notify required parties based on classification/regulation.
9. Add test and redaction rule.
10. Update runbook.

If secret leaked:

Assume compromised. Rotate.

If presigned URL leaked:

Expire quickly, rotate object key/version if necessary,
review access logs, reduce TTL/control issuance.

If PII leaked:

Follow privacy/compliance incident process.

25. Production Checklist

Logging

[ ] No request/response body logging by default
[ ] Header logging allowlist
[ ] Query string redaction
[ ] Secret wrapper redacts toString
[ ] DTO logging uses safe log view
[ ] MDC does not contain sensitive fields
[ ] Logs access controlled

Metrics

[ ] No high-cardinality sensitive labels
[ ] No user email/filename/token/object key labels
[ ] Error labels bounded
[ ] Secret metrics expose health, not values

Tracing

[ ] No auth headers
[ ] No request/response body
[ ] Route templates used
[ ] Query params dropped/redacted
[ ] DB statements sanitized
[ ] Collector redaction configured

Exceptions

[ ] Client responses safe
[ ] Internal exceptions sanitized
[ ] Dependency errors classified
[ ] Stack traces not exposed externally

Config/Actuator

[ ] Actuator env/configprops restricted
[ ] Heapdump endpoint disabled/restricted
[ ] Management endpoint protected
[ ] Config values redacted

Files/Exports

[ ] Object key avoids PII
[ ] Filename sanitized
[ ] Exports have explicit schema
[ ] CSV formula injection handled
[ ] Generated file lifecycle and cleanup defined

CI/CD

[ ] No decrypted secrets in artifacts
[ ] No shell tracing for secret commands
[ ] Secret scanning enabled
[ ] Build logs access controlled

26. Key Takeaways

  1. Telemetry is data export. Treat it as such.
  2. Sensitive data includes more than passwords: presigned URLs, filenames, headers, case IDs, and payload-derived metadata may be sensitive.
  3. Prevent sensitive logging at source; pipeline redaction is only a backstop.
  4. Use allowlists for headers and structured logging fields.
  5. Do not put sensitive data in metrics labels.
  6. Traces must avoid query strings, auth headers, bodies, and raw file metadata.
  7. Audit logs need evidence-grade events, not raw secrets or payload.
  8. Config dumps, heap dumps, DLQs, and CI logs are common leakage paths.
  9. Leakage prevention must be tested like any other invariant.
  10. If credential material leaks, rotate; do not debate intent.

Next, we move deeper into data protection mechanics: Encryption in Transit and at Rest, including TLS, KMS, envelope encryption, object storage encryption, DB encryption, and Java implementation boundaries.

