Data Privacy and Sensitive Data Flow

Learn Java Microservices Design and Architect - Part 060

Data privacy and sensitive data flow in Java microservices: classification, minimization, purpose limitation, redaction, tokenization, retention, deletion, and privacy-aware service collaboration.

2026-07-05 | 14 min read | 2771 words

Part 060 — Data Privacy and Sensitive Data Flow

1. Core idea

In a microservices system, sensitive data does not stay in one place.

It moves through:

APIs
events
queues
logs
traces
metrics
caches
read models
search indexes
object storage
workflow variables
dead-letter queues
batch exports
admin tools
backups
analytics pipelines

A service may “own” data, but once it emits the wrong field into an event or log, many other systems become accidental processors of that data.

Privacy architecture is therefore not only about encryption or access control.

It is about controlling data flow.

The core rule:

Sensitive data should move only when there is a clear purpose, explicit authority, minimal payload, bounded retention, observable access, and a defined deletion/redaction strategy.

2. Privacy is an architecture constraint, not a legal afterthought

Engineers often treat privacy as a checklist handled by legal/compliance.

That fails in microservices because privacy is implemented through technical choices:

Which service owns personal data?
Which event payload includes personal data?
Which read model duplicates personal data?
Which log field exposes a name, email, phone number, token, or case detail?
Which team can query which data?
Which cache retains sensitive payloads?
Which backup contains deleted user data?
Which workflow variable stores evidence content?
Which service exports CSV files?

Legal may define obligations. Architecture determines whether the system can actually satisfy them.

GDPR Article 5 sets out principles such as lawfulness/fairness/transparency, purpose limitation, data minimisation, accuracy, storage limitation, and integrity/confidentiality. The NIST Privacy Framework is designed to help organizations identify and manage privacy risk. The OWASP logging guidance also warns against logging sensitive data unnecessarily.

The engineering translation is simple:

Every data field needs an owner, purpose, classification, propagation rule, retention rule, and protection rule.

3. Sensitive data is broader than PII

PII is important, but privacy-sensitive data includes more than name/email/address.

Category	Examples	Risk
Direct identifiers	Name, email, phone, national ID, passport	Re-identification
Indirect identifiers	Date of birth, location, employer, device ID	Linkability
Sensitive attributes	Health, finance, biometrics, religion, political affiliation	Harm/discrimination
Case/regulatory data	Allegation, evidence, enforcement status	Legal/reputational harm
Security data	Session token, API key, password reset token	Account compromise
Operational secrets	DB password, signing key, webhook secret	System compromise
Behavioral data	Access history, risk score, model signal	Surveillance/profiling
Tenant data	Tenant ID, contract tier, internal segmentation	Commercial exposure

Do not build privacy rules around a narrow definition of PII only.

4. Data classification model

A practical classification model for microservices:

PUBLIC
  Safe to disclose publicly.

INTERNAL
  Internal business data. Not public, but not highly sensitive.

CONFIDENTIAL
  Business-sensitive or customer-sensitive data. Access controlled.

RESTRICTED
  High-impact personal, regulatory, financial, security, legal, or secret data.

SECRET
  Credentials, tokens, private keys, signing keys, passwords.

Classification should attach to fields, not just databases.

Example:

resource: EnforcementCase
fields:
  caseId:
    classification: INTERNAL
    owner: case-service
    retention: case_retention
  partyName:
    classification: RESTRICTED
    owner: party-service
    retention: party_retention
    propagation: reference_only
  allegationSummary:
    classification: RESTRICTED
    owner: allegation-service
    retention: enforcement_retention
    propagation: restricted_event_only
  riskScore:
    classification: CONFIDENTIAL
    owner: risk-service
    propagation: reason_code_only
  accessToken:
    classification: SECRET
    propagation: never

The classification must drive API design, event design, logging, tracing, caching, analytics, and retention.
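A minimal sketch of how the catalog above could be loaded into typed policy objects so that code can ask "may this field be logged?" instead of relying on memory. The names `FieldPolicy`, `Propagation`, and `FieldCatalog` are hypothetical, the `UNRESTRICTED` rule for caseId is an assumption (the catalog does not state one), and the rule that only PUBLIC and INTERNAL fields are loggable is an assumed policy, not something the lesson prescribes. Assumes Java 17 or later.

```java
import java.util.Map;

// Ordered from least to most sensitive, mirroring the classification model above.
enum Classification { PUBLIC, INTERNAL, CONFIDENTIAL, RESTRICTED, SECRET }

// Propagation rules named after the values used in the example catalog.
enum Propagation { UNRESTRICTED, REFERENCE_ONLY, RESTRICTED_EVENT_ONLY, REASON_CODE_ONLY, NEVER }

record FieldPolicy(String owner, Classification classification, Propagation propagation) {
    // Assumed policy: only PUBLIC and INTERNAL fields may appear in logs.
    boolean mayLog() {
        return classification.compareTo(Classification.INTERNAL) <= 0;
    }
}

final class FieldCatalog {
    // A few entries from the EnforcementCase example; caseId's propagation is assumed.
    static final Map<String, FieldPolicy> ENFORCEMENT_CASE = Map.of(
        "caseId", new FieldPolicy("case-service", Classification.INTERNAL, Propagation.UNRESTRICTED),
        "partyName", new FieldPolicy("party-service", Classification.RESTRICTED, Propagation.REFERENCE_ONLY),
        "riskScore", new FieldPolicy("risk-service", Classification.CONFIDENTIAL, Propagation.REASON_CODE_ONLY)
    );

    // Unknown fields are treated as not loggable, so new fields fail closed.
    static boolean mayLog(String field) {
        FieldPolicy policy = ENFORCEMENT_CASE.get(field);
        return policy != null && policy.mayLog();
    }
}
```

The fail-closed lookup matters more than the exact rule: a field that nobody classified should never be treated as safe by default.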

5. Data-flow map

You cannot protect what you cannot see.

Create a data-flow map for every sensitive field class.

For each edge, ask:

What fields move?
Why do they move?
Who approved this purpose?
Is the receiver allowed to store them?
How long are they retained?
Are they encrypted?
Are they logged?
Are they indexed?
Can they be deleted or redacted?
Is the transfer observable?

A data-flow diagram without field classification is just a network diagram.
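The per-edge questions can be turned into a small review check. The sketch below is illustrative only: `DataFlowEdge` and `DataFlowReview` are hypothetical names, and it covers just two of the questions (purpose and retention). Assumes Java 17 or later.

```java
import java.time.Duration;
import java.util.List;
import java.util.Set;

// One edge of the data-flow map. All names here are illustrative.
record DataFlowEdge(
    String source,
    String target,
    Set<String> fields,
    String purpose,           // null or blank: nobody recorded why the data moves
    Duration retention,       // null: retention at the receiver is undefined
    boolean receiverMayStore
) {}

final class DataFlowReview {
    // Returns edges with no recorded purpose, or with storage allowed but no retention.
    static List<DataFlowEdge> findUngoverned(List<DataFlowEdge> edges) {
        return edges.stream()
            .filter(e -> e.purpose() == null || e.purpose().isBlank()
                || (e.receiverMayStore() && e.retention() == null))
            .toList();
    }
}
```

Running a check like this in CI against a declared flow inventory is one way to keep the map from drifting into a network diagram.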

6. Data minimization

Data minimization means a service should receive and retain only the data it needs for a defined purpose.

It is one of the most important privacy principles because it reduces breach impact, compliance scope, cognitive load, and accidental coupling.

6.1 Bad command design

{
  "caseId": "case-123",
  "party": {
    "partyId": "party-456",
    "name": "Jane Doe",
    "email": "jane@example.com",
    "phone": "+62...",
    "dateOfBirth": "1990-01-01",
    "nationalId": "...",
    "address": "..."
  },
  "reason": "ESCALATE"
}

If the escalation service only needs partyId and risk category, this payload is over-collected.

6.2 Better command design

{
  "caseId": "case-123",
  "partyRef": "party-456",
  "riskCategory": "HIGH",
  "reason": "ESCALATE"
}

Even better, if the service can fetch authorized data from the owning service when required, pass references and reason codes rather than copying sensitive fields.

6.3 Java DTO discipline

Avoid reusing rich internal DTOs at API boundaries.

public record EscalateCaseRequest(
    String caseId,
    String partyId,
    String escalationReason,
    String expectedCaseVersion,
    String idempotencyKey
) {}

Do not accept PartyDto “just in case”.

A DTO should express the minimum command payload needed for the use case.
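To keep a command DTO minimal over time, a reflection check can fail the build when someone adds a personal field. This is a sketch under assumptions: `DtoBoundaryCheck` and its `FORBIDDEN` name list are hypothetical, and matching by component name is a heuristic that only inspects the record's own components, not nested types. Assumes Java 17 or later.

```java
import java.lang.reflect.RecordComponent;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

record EscalateCaseRequest(
    String caseId,
    String partyId,
    String escalationReason,
    String expectedCaseVersion,
    String idempotencyKey
) {}

final class DtoBoundaryCheck {
    // Hypothetical list of names that must never appear on this command.
    static final Set<String> FORBIDDEN = Set.of(
        "name", "email", "phone", "dateOfBirth", "nationalId", "address");

    // Returns the forbidden component names declared directly on a record type.
    static List<String> forbiddenComponents(Class<? extends Record> type) {
        return Arrays.stream(type.getRecordComponents())
            .map(RecordComponent::getName)
            .filter(FORBIDDEN::contains)
            .toList();
    }
}
```

A unit test would assert that `DtoBoundaryCheck.forbiddenComponents(EscalateCaseRequest.class)` is empty.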

7. Purpose limitation as a service contract

A service should not process data merely because it can access it.

Every sensitive-data flow should have a purpose label:

flow: case-service -> notification-service
fields:
  - partyId
  - notificationTemplateId
  - communicationChannel
purpose: notify_party_about_case_deadline
legal_basis_or_authority: internal_policy:data-processing-notification-v3
retention: 30_days
receiver_storage: transient_only

Purpose labels help answer:

Why does this service receive this data?
Can the data be reused for analytics?
Can the data be stored permanently?
Can the data be joined with another dataset?
Can the data be exported?

In regulated systems, “we already had the data” is not a sufficient purpose.
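A purpose label only helps if something checks it. The sketch below shows one possible runtime guard; `FlowContract` and `PurposeGuard` are hypothetical names, and the contract is reduced to the parts the guard needs.

```java
import java.util.Set;

// A flow contract like the YAML above, reduced to what the guard needs.
record FlowContract(String source, String target, Set<String> fields, String purpose) {}

final class PurposeGuard {
    private final FlowContract contract;

    PurposeGuard(FlowContract contract) {
        this.contract = contract;
    }

    // Allows processing only for the declared purpose and only for declared fields.
    boolean permits(String requestedPurpose, Set<String> requestedFields) {
        return contract.purpose().equals(requestedPurpose)
            && contract.fields().containsAll(requestedFields);
    }
}
```

With this in place, reusing notification data for analytics requires a new contract and a new review, not just a new query.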

8. Privacy-aware event design

Events are one of the easiest ways to accidentally spread sensitive data.

8.1 Bad event

{
  "eventType": "case.created",
  "caseId": "case-123",
  "partyName": "Jane Doe",
  "email": "jane@example.com",
  "phone": "+62...",
  "allegationText": "...",
  "evidenceSummary": "..."
}

This event makes every consumer a processor of sensitive party and allegation data.

8.2 Better event

{
  "eventType": "case.created",
  "eventVersion": "2.0",
  "caseId": "case-123",
  "partyRef": "party-456",
  "classification": "RESTRICTED",
  "jurisdiction": "ID",
  "occurredAt": "2026-07-05T10:15:30Z"
}

Consumers that truly need party details can request them from the owning service under explicit authorization and purpose.
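One way to keep the better event shape honest is to build the payload in a factory that never touches personal fields. This is a sketch: `CaseSnapshot` and `CaseCreatedEventFactory` are hypothetical names, and a plain map stands in for whatever serialization the event bus uses.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical internal view of a case, including fields that must not leave the service.
record CaseSnapshot(String caseId, String partyId, String partyName, String email,
                    String jurisdiction) {}

final class CaseCreatedEventFactory {
    // Builds the payload from references and metadata only; partyName and email are never read.
    static Map<String, Object> create(CaseSnapshot snapshot, Instant occurredAt) {
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("eventType", "case.created");
        event.put("eventVersion", "2.0");
        event.put("caseId", snapshot.caseId());
        event.put("partyRef", snapshot.partyId());
        event.put("classification", "RESTRICTED");
        event.put("jurisdiction", snapshot.jurisdiction());
        event.put("occurredAt", occurredAt.toString());
        return event;
    }
}
```

Because the factory is the only place the event is assembled, a unit test on its output covers every producer path.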

8.3 When event-carried state is justified

Event-carried state transfer is useful for decoupling, but dangerous for privacy.

Use it when:

The data is not highly sensitive, or the receiver has a legitimate, stable need for it, and in either case:
Retention is defined.
Consumers are known and controlled.
Payload classification is explicit.
A deletion/redaction strategy exists.

Avoid it when:

The field is a secret.
The field is highly sensitive and rarely needed.
The event is broadly subscribed.
Consumer list is unknown.
The field may need deletion/correction.
The event bus retains payloads long-term.

9. Sensitive data in observability

Logs, traces, and metrics are frequent privacy leaks.

9.1 Logs

Never log:

Passwords
Access tokens
Refresh tokens
Session IDs
API keys
Private keys
Full national IDs
Payment card data
Full evidence content
Large personal payloads

Be careful with:

Email addresses
Phone numbers
Names
Addresses
Dates of birth
IP addresses
User agents
Case descriptions
Error messages containing payload snippets

9.2 Traces

Trace attributes are often indexed and searchable.

Bad:

span.setAttribute("party.email", request.email());
span.setAttribute("evidence.text", evidence.text());

Better:

span.setAttribute("case.id", caseId.value());
span.setAttribute("party.ref", partyId.value());
span.setAttribute("evidence.count", evidenceCount);
span.setAttribute("data.classification", "RESTRICTED");

9.3 Metrics

Metrics labels must have low cardinality and must not expose personal data.

Bad:

case_submitted_total{email="jane@example.com"}

Better:

case_submitted_total{tenant="public-sector", jurisdiction="ID", channel="portal"}

Even labels like caseId or userId are usually dangerous because they explode cardinality and expose identifiers in monitoring systems.
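A label allowlist can enforce this at the point where metrics are recorded. The sketch below is independent of any metrics library; `MetricLabelGuard` is a hypothetical name and the approved keys are an assumption taken from the "better" example.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class MetricLabelGuard {
    // Assumption: these label keys are low-cardinality and non-personal in this system.
    private static final Set<String> ALLOWED_KEYS = Set.of("tenant", "jurisdiction", "channel");

    // Returns label keys that are not approved, sorted for stable output.
    static Set<String> rejectedKeys(Map<String, String> labels) {
        Set<String> rejected = new TreeSet<>(labels.keySet());
        rejected.removeAll(ALLOWED_KEYS);
        return rejected;
    }

    // Throws before an unapproved label can reach the metrics backend.
    static void requireSafe(Map<String, String> labels) {
        Set<String> rejected = rejectedKeys(labels);
        if (!rejected.isEmpty()) {
            throw new IllegalArgumentException("Unapproved metric labels: " + rejected);
        }
    }
}
```

Whether to throw or silently drop the label in production is a design choice; throwing in tests and dropping in production is a common compromise.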

10. Redaction as code

Do not rely on every developer remembering what to redact.

Build redaction into shared infrastructure.

10.1 Sensitive annotation

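import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Marks a field or record component whose value must be redacted in logs, traces, and exports.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.RECORD_COMPONENT})
public @interface Sensitive {
    Sensitivity value();
}

public enum Sensitivity {
    PII,
    RESTRICTED,
    SECRET
}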

Example DTO:

public record PartyProfile(
    String partyId,
    @Sensitive(Sensitivity.PII) String fullName,
    @Sensitive(Sensitivity.PII) String email,
    @Sensitive(Sensitivity.RESTRICTED) String nationalId,
    @Sensitive(Sensitivity.SECRET) String resetToken
) {}

10.2 Safe logging wrapper

public final class SafeLog {
    private final Redactor redactor;

    public SafeLog(Redactor redactor) {
        this.redactor = redactor;
    }

    public Object field(String name, Object value) {
        return redactor.redact(name, value);
    }
}

Usage:

log.info("party lookup failed partyId={} reason={}",
    safeLog.field("partyId", partyId),
    reasonCode
);

Do not serialize full request bodies into logs unless a strict allowlist controls fields.
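The annotation becomes useful once shared code reads it. The sketch below shows one possible annotation-driven renderer for records; `RecordRedactor` is a hypothetical helper, not the `Redactor` used by `SafeLog`, and it masks every annotated component regardless of sensitivity level. The annotation and a shortened `PartyProfile` are repeated so the example is self-contained. Assumes Java 17 or later.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.RecordComponent;
import java.util.StringJoiner;

enum Sensitivity { PII, RESTRICTED, SECRET }

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.RECORD_COMPONENT})
@interface Sensitive {
    Sensitivity value();
}

record PartyProfile(
    String partyId,
    @Sensitive(Sensitivity.PII) String email,
    @Sensitive(Sensitivity.SECRET) String resetToken
) {}

final class RecordRedactor {
    // Renders a record for logging, masking every component marked @Sensitive.
    static String describe(Record record) {
        StringJoiner out = new StringJoiner(", ", record.getClass().getSimpleName() + "[", "]");
        for (RecordComponent component : record.getClass().getRecordComponents()) {
            String shown;
            if (component.isAnnotationPresent(Sensitive.class)) {
                shown = "REDACTED";
            } else {
                try {
                    shown = String.valueOf(component.getAccessor().invoke(record));
                } catch (ReflectiveOperationException e) {
                    shown = "UNREADABLE"; // fail closed rather than guess
                }
            }
            out.add(component.getName() + "=" + shown);
        }
        return out.toString();
    }
}
```

For `new PartyProfile("party-456", "jane@example.com", "tok-1")` this renders `PartyProfile[partyId=party-456, email=REDACTED, resetToken=REDACTED]`. Note that unannotated components are still printed, so this is a denylist by annotation and works best combined with an allowlist at the logging boundary.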

11. Allowlist beats denylist

A denylist says:

“Log everything except fields named password, token, secret.”

This fails when new sensitive fields appear:

credential
apiKey
authorizationCode
nationalId
documentText
evidenceContent

An allowlist says:

“Only these approved fields may enter logs/traces/events.”

For sensitive systems, prefer allowlists at boundaries:

loggable_fields:
  EscalateCaseRequest:
    - caseId
    - escalationReason
    - expectedCaseVersion
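In code, an allowlist is a copy of approved keys rather than a removal of bad ones. A minimal sketch, assuming the payload is available as a map; `LogAllowlist` is a hypothetical name and the key list mirrors the configuration above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LogAllowlist {
    // Mirrors the loggable_fields entry for EscalateCaseRequest; list order defines output order.
    private static final List<String> ESCALATE_CASE_REQUEST =
        List.of("caseId", "escalationReason", "expectedCaseVersion");

    // Copies only approved keys; anything else is dropped, including fields added later.
    static Map<String, Object> loggable(Map<String, Object> payload) {
        Map<String, Object> safe = new LinkedHashMap<>();
        for (String key : ESCALATE_CASE_REQUEST) {
            if (payload.containsKey(key)) {
                safe.put(key, payload.get(key));
            }
        }
        return safe;
    }
}
```

A new field such as apiKey never reaches the log because nothing copies it, which is the property a denylist cannot give.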

12. Tokenization, pseudonymization, and encryption

These mechanisms solve different problems.

Mechanism	What it does	Main use
Encryption	Makes data unreadable without key	Protect storage/transport
Hashing	One-way digest	Integrity checks; matching, provided the salting/keying strategy resists guessing
Tokenization	Replaces sensitive value with token	Reduce exposure in downstream systems
Pseudonymization	Replaces direct identifiers with a pseudonym	Reduce linkability; still re-linkable under controlled conditions
Anonymization	Removes ability to identify individual	Analytics/public sharing, hard to guarantee
Redaction	Removes/masks data from output	Logs, UI, exports, support tooling

Encryption is not minimization.

Encrypted personal data is still personal/sensitive data if the organization can decrypt or link it.
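As an illustration of pseudonymization, the sketch below derives a stable pseudonym from an identifier with a keyed HMAC-SHA256 from the JDK. `Pseudonymizer` is a hypothetical class; key storage, rotation, and the choice of HMAC over a token vault are assumptions, not recommendations from this lesson. Assumes Java 17 or later for `HexFormat`.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.HexFormat;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class Pseudonymizer {
    private final byte[] key;

    // The key should come from a secrets manager; whoever holds it can recompute
    // pseudonyms for known identifiers and therefore re-link them.
    Pseudonymizer(byte[] key) {
        this.key = key.clone();
    }

    // Keyed HMAC-SHA256: the same identifier and key always yield the same pseudonym.
    String pseudonym(String identifier) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] digest = mac.doFinal(identifier.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC-SHA256 unavailable", e);
        }
    }
}
```

The output is still personal data for anyone holding the key, which is exactly the distinction between pseudonymization and anonymization in the table above.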

13. Field-level access control

A user may be allowed to view a case but not every field in the case.

Example:

{
  "caseId": "case-123",
  "status": "UNDER_REVIEW",
  "party": {
    "partyId": "party-456",
    "displayName": "Jane D.",
    "nationalId": "REDACTED"
  }
}

Field-level access should be based on:

Role/permission
Purpose
Case assignment
Tenant
Jurisdiction
Sensitivity level
Break-glass status
Data subject relationship
Time-bound authorization

13.1 Java field projection model

public record CaseViewPolicy(
    boolean canSeePartyName,
    boolean canSeeNationalId,
    boolean canSeeEvidenceSummary,
    boolean canSeeInternalNotes
) {}

public CaseResponse toResponse(CaseAggregate aggregate, CaseViewPolicy policy) {
    return new CaseResponse(
        aggregate.id().value(),
        aggregate.status().name(),
        policy.canSeePartyName() ? aggregate.partyName() : "REDACTED",
        policy.canSeeNationalId() ? aggregate.nationalId() : "REDACTED",
        policy.canSeeEvidenceSummary() ? aggregate.evidenceSummary() : null,
        policy.canSeeInternalNotes() ? aggregate.internalNotes() : null
    );
}

Do not rely on frontend hiding alone. Redaction must happen server-side.

14. Sensitive data in workflow engines

Workflow variables are often overlooked.

A workflow engine may persist variables for a long time. If you store full personal/evidence data as workflow variables, the workflow database becomes another sensitive data store.

Prefer:

{
  "caseId": "case-123",
  "partyRef": "party-456",
  "evidenceBundleRef": "evidence-bundle-789",
  "riskCategory": "HIGH"
}

Avoid:

{
  "partyName": "Jane Doe",
  "nationalId": "...",
  "evidenceText": "full evidence content..."
}

A workflow should hold references and state, not unnecessary sensitive payloads.

15. Sensitive data in DLQ and retry systems

Dead-letter queues are production graveyards for bad payloads.

They often retain:

Raw command payloads
Raw event payloads
Error stack traces
Failed export data
Third-party responses
Authentication headers

A DLQ must have:

Retention limit
Access control
Redaction/field minimization
Replay authorization
Replay audit trail
Poison payload quarantine
Deletion policy

Do not let DLQ become an ungoverned long-term data lake.
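Several of these requirements can be applied when the dead letter is created. The sketch below stores a payload reference instead of the raw body, strips credential-bearing headers, and stamps an expiry. `DeadLetter`, `DeadLetterFactory`, and the header list are hypothetical; a real broker may need its own retention settings as well.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// What is stored in the DLQ: a reference to the payload, not the raw body.
record DeadLetter(String messageId, String payloadRef, String errorCode,
                  Map<String, String> headers, Instant expiresAt) {}

final class DeadLetterFactory {
    // Assumption: header names that commonly carry credentials, compared in lower case.
    private static final Set<String> SECRET_HEADERS = Set.of("authorization", "cookie", "x-api-key");

    static DeadLetter create(String messageId, String payloadRef, String errorCode,
                             Map<String, String> headers, Instant failedAt, Duration retention) {
        Map<String, String> safeHeaders = new TreeMap<>();
        headers.forEach((name, value) -> {
            if (!SECRET_HEADERS.contains(name.toLowerCase(Locale.ROOT))) {
                safeHeaders.put(name, value);
            }
        });
        // The expiry is derived from the failure time plus the configured retention.
        return new DeadLetter(messageId, payloadRef, errorCode,
            Map.copyOf(safeHeaders), failedAt.plus(retention));
    }
}
```

A header denylist is used here only because header sets are open-ended; where the expected headers are known, an allowlist is the safer choice.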

16. Caches, search indexes, and read models

Duplication is normal in microservices, but every duplicate copy increases privacy scope.

For every cache/read model/search index, define:

Source owner
Fields copied
Purpose
Refresh mechanism
Staleness tolerance
Retention/TTL
Deletion/redaction propagation
Encryption
Access control
Rebuild process
Breach impact

16.1 Search index danger

Search indexes often tokenize and replicate sensitive text.

If evidence content, notes, allegations, or names are indexed, deletion/redaction must handle:

Primary database
Search index
Cache
Snapshots
Backups
Analytics copy
Export history

Search is not a harmless performance optimization. It is a data-processing system.

17. Deletion, correction, and retention in distributed systems

Microservices make deletion hard because copies exist everywhere.

A privacy-aware deletion flow should be modeled as a workflow that reaches every known copy and records the outcome for each one.

Deletion is not always physical deletion. Depending on legal/business context, the operation may be:

Delete
Redact
Anonymize
Pseudonymize
Suppress from view
Mark as legally retained
Stop processing
Remove from search/export

The service must know which one applies.
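Making the choice explicit in code prevents each handler from improvising. The sketch below is deliberately simplified: `ErasureMode`, `ErasureContext`, and the decision rule (legal hold wins, then audit needs force redaction instead of deletion) are assumptions for illustration, and the real rule must come from the applicable legal and retention policy.

```java
// A subset of the operations listed above.
enum ErasureMode { DELETE, REDACT, MARK_LEGALLY_RETAINED }

// Hypothetical inputs to the decision; real systems will have more.
record ErasureContext(boolean legalHold, boolean neededForAuditTrail) {}

final class ErasurePolicy {
    // Assumed rule: legal hold wins; otherwise redact if the record shell is
    // still needed for audit, and delete only when neither applies.
    static ErasureMode decide(ErasureContext context) {
        if (context.legalHold()) {
            return ErasureMode.MARK_LEGALLY_RETAINED;
        }
        return context.neededForAuditTrail() ? ErasureMode.REDACT : ErasureMode.DELETE;
    }
}
```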

18. Privacy event pattern

Privacy operations should themselves produce events.

Examples:

personal-data.redaction-requested.v1
personal-data.redaction-confirmed.v1
personal-data.deletion-requested.v1
personal-data.deletion-confirmed.v1
personal-data.legal-hold-applied.v1
personal-data.processing-restricted.v1

Payload should avoid including the sensitive data itself:

{
  "eventType": "personal-data.redaction-requested.v1",
  "requestId": "privacy-request-123",
  "subjectRef": "party-456",
  "scope": "CASE_READ_MODELS",
  "reason": "RETENTION_EXPIRED",
  "requestedAt": "2026-07-05T10:15:30Z"
}

Each service receiving the event should reply or publish confirmation:

{
  "eventType": "personal-data.redaction-confirmed.v1",
  "requestId": "privacy-request-123",
  "service": "case-read-model-service",
  "resourceType": "CASE_SEARCH_INDEX",
  "status": "COMPLETED",
  "completedAt": "2026-07-05T10:16:11Z"
}
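Something has to collect those confirmations and notice when one is missing. A minimal in-memory sketch, assuming the coordinator knows which services are expected to confirm; `RedactionTracker` is a hypothetical name, and a real implementation would persist its state and handle FAILED statuses and timeouts.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class RedactionTracker {
    // requestId -> services that still owe a confirmation
    private final Map<String, Set<String>> pending = new HashMap<>();

    // Registers a request together with every service expected to confirm it.
    void requested(String requestId, Set<String> expectedServices) {
        pending.put(requestId, new TreeSet<>(expectedServices));
    }

    // Records a COMPLETED confirmation; returns true once every service has confirmed.
    // Duplicate confirmations are harmless because removal is idempotent.
    boolean confirmed(String requestId, String service) {
        Set<String> remaining = pending.get(requestId);
        if (remaining == null) {
            throw new IllegalArgumentException("Unknown request: " + requestId);
        }
        remaining.remove(service);
        return remaining.isEmpty();
    }

    // Services that have not confirmed yet; empty for unknown or completed requests.
    Set<String> outstanding(String requestId) {
        return Set.copyOf(pending.getOrDefault(requestId, Set.of()));
    }
}
```

The `outstanding` view is what a reconciliation job or alert would query when a request stays open too long.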

19. Data owner vs data processor service

In microservices, distinguish:

Role	Meaning
Data owner service	Authoritative owner of field/entity
Processor service	Receives/uses data for a purpose
Projector service	Maintains read model from owner events
Exporter service	Produces external data extracts
Audit service	Records material actions/evidence
Analytics service	Processes data for aggregate insights

A processor does not automatically become owner.

Example:

Party Service owns partyName, nationalId, dateOfBirth.
Case Service references partyId and may keep a display snapshot if approved.
Notification Service temporarily processes email/phone for delivery.
Search Service indexes allowed display fields only.
Audit Store stores reason codes and references, not full party profile unless required.

20. Privacy-aware service catalog extension

Add data-flow metadata to the service catalog:

service: case-service
ownerTeam: enforcement-platform
sensitivity: restricted
personalData:
  owns:
    - caseId
    - caseStatus
    - assignedOfficerId
  references:
    - partyId
    - evidenceBundleId
  storesSnapshots:
    - field: partyDisplayName
      source: party-service
      purpose: case_listing_display
      retention: until_case_closure_plus_policy
      redaction: supported
  emits:
    - event: case.created.v2
      classification: restricted
      piiFields: []
      references:
        - partyId
  logs:
    strategy: allowlist
    piiAllowed: false
  retention:
    policy: enforcement_case_retention_v4
  deletion:
    mode: legal_hold_aware_redaction

This makes privacy visible during architecture review.

21. Java pattern: typed sensitive value

A useful technique is to avoid plain String for sensitive values.

public record EmailAddress(String value) implements SensitiveValue {
    @Override
    public String redacted() {
        int at = value.indexOf('@');
        if (at <= 1) return "REDACTED";
        return value.charAt(0) + "***" + value.substring(at);
    }

    // The generated record toString() would print the raw value, so override it.
    @Override
    public String toString() {
        return redacted();
    }
}

public interface SensitiveValue {
    String redacted();
}

Then logging code can recognize sensitive types:

public String safe(Object value) {
    if (value instanceof SensitiveValue sensitive) {
        return sensitive.redacted();
    }
    return String.valueOf(value);
}

This does not solve every privacy problem, but it makes unsafe handling more visible.

22. Java pattern: outbound data policy

Before sending data to another service, apply a policy:

public interface DataDisclosurePolicy {
    DisclosureDecision decide(DisclosureRequest request);
}

public record DisclosureRequest(
    String sourceService,
    String targetService,
    String purpose,
    String tenantId,
    String jurisdiction,
    Set<String> requestedFields,
    String actorId
) {}

public record DisclosureDecision(
    boolean allowed,
    Set<String> allowedFields,
    String policyVersion,
    String reasonCode
) {}

Usage:

DisclosureDecision decision = disclosurePolicy.decide(new DisclosureRequest(
    "case-service",
    "notification-service",
    "notify_party_about_case_deadline",
    tenantId,
    jurisdiction,
    Set.of("partyId", "email", "deadlineDate"),
    actorId
));

if (!decision.allowed()) {
    throw new DataDisclosureDenied(decision.reasonCode());
}

NotificationPayload payload = payloadFactory.create(caseData, decision.allowedFields());

This pattern makes data disclosure reviewable and auditable.
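The interface needs at least one implementation to be testable. The sketch below is one possible rule-table version; `TableDisclosurePolicy`, the "target|purpose" key format, the policy version string, and the reason codes are all hypothetical. The request, decision, and interface types are repeated from above so the example compiles on its own.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

record DisclosureRequest(String sourceService, String targetService, String purpose,
                         String tenantId, String jurisdiction,
                         Set<String> requestedFields, String actorId) {}

record DisclosureDecision(boolean allowed, Set<String> allowedFields,
                          String policyVersion, String reasonCode) {}

interface DataDisclosurePolicy {
    DisclosureDecision decide(DisclosureRequest request);
}

// Hypothetical rule table keyed by "targetService|purpose".
final class TableDisclosurePolicy implements DataDisclosurePolicy {
    private static final String VERSION = "example-1";
    private final Map<String, Set<String>> permittedFields;

    TableDisclosurePolicy(Map<String, Set<String>> permittedFields) {
        this.permittedFields = Map.copyOf(permittedFields);
    }

    @Override
    public DisclosureDecision decide(DisclosureRequest request) {
        Set<String> permitted =
            permittedFields.get(request.targetService() + "|" + request.purpose());
        if (permitted == null) {
            return new DisclosureDecision(false, Set.of(), VERSION, "NO_RULE_FOR_PURPOSE");
        }
        // Grant the intersection of what was requested and what the rule permits.
        Set<String> granted = new TreeSet<>(request.requestedFields());
        granted.retainAll(permitted);
        if (granted.isEmpty()) {
            return new DisclosureDecision(false, Set.of(), VERSION, "NO_PERMITTED_FIELDS");
        }
        return new DisclosureDecision(true, Set.copyOf(granted), VERSION, "ALLOWED");
    }
}
```

Note the design choice: a request for a field the rule does not permit is narrowed rather than rejected, so the caller must build the payload from `allowedFields()` and not from its original field list.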

23. Data privacy testing

Privacy controls should be tested like business rules.

23.1 Unit tests

Redaction function masks expected fields.
Sensitive value toString() does not leak raw value.
DTO mapper excludes restricted fields.
Event factory excludes PII.
Field-level policy hides unauthorized fields.

23.2 Contract tests

API response schema does not expose forbidden fields.
Event payload schema contains classification metadata.
Consumer contract does not depend on restricted fields.

23.3 Integration tests

Logs do not contain known secret/test PII markers.
Trace attributes do not contain raw payload.
DLQ payload is redacted/minimized.
Delete/redact workflow reaches all processors.

23.4 Production guardrails

Secret scanning on logs/events.
DLP scanning on object storage/export.
Canary markers for synthetic sensitive values.
Audit alert for unexpected field disclosure.
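The canary idea is simple to sketch: seed test data with synthetic values that can never occur naturally, then scan captured output for them. `CanaryScanner` is a hypothetical helper and the marker format is up to the team; how log lines are captured depends on the logging setup and is not shown. Assumes Java 17 or later.

```java
import java.util.List;
import java.util.Set;

final class CanaryScanner {
    // Returns every captured line that contains at least one synthetic marker.
    static List<String> leaks(List<String> capturedLines, Set<String> markers) {
        return capturedLines.stream()
            .filter(line -> markers.stream().anyMatch(line::contains))
            .toList();
    }
}
```

An integration test would submit a request containing a marker such as a fake national ID, collect the service's log output, and assert that `leaks(...)` is empty.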

24. Privacy failure modes

24.1 Event payload leaks personal data

Root cause:

Event designed as convenience DTO.
Unknown future consumers.
No classification review.

Defense:

Reference-first event design.
Event schema review.
Sensitive-field scanner.
Consumer allowlist.

24.2 Logs leak request body

Root cause:

Generic request logging filter.
Exception handler includes payload.
Debug logs left enabled.

Defense:

Allowlist logging.
Redaction middleware.
Log review tests.
Runtime log sampling without payload.

24.3 Read model retains deleted data

Root cause:

Projection only handles create/update.
Delete/redact event missing.
Rebuild uses old snapshots.

Defense:

Redaction event type.
Projection deletion handler.
Rebuild from privacy-filtered source.
Reconciliation job.

24.4 Analytics copy becomes uncontrolled processor

Root cause:

CDC sends all tables.
No field minimization.
Analysts get raw personal data.

Defense:

Data product contract.
Field-level masking.
Purpose-limited analytics view.
Access review.
Aggregation/anonymization where appropriate.

24.5 Support tool exposes too much

Root cause:

Admin UI bypasses product authorization.
Support role has broad DB access.
Break-glass not audited.

Defense:

Purpose-based support access.
Time-limited elevation.
Field-level redaction.
Access audit.
Approval/ticket linkage.

25. Privacy architecture review checklist

For every service, ask:

What personal/sensitive fields does it own?
What personal/sensitive fields does it reference?
What personal/sensitive fields does it duplicate?
Why does it need each field?
What purpose label applies to each data flow?
Which APIs expose sensitive fields?
Which events carry sensitive fields?
Which logs/traces/metrics may contain sensitive fields?
Which caches/read models/search indexes duplicate sensitive fields?
What is the retention policy?
How is deletion/redaction propagated?
How are backups handled?
How are DLQs handled?
How is tenant/jurisdiction isolation enforced?
How are exports controlled?
How is support/admin access constrained?
How are data disclosures audited?
How is schema evolution reviewed for new sensitive fields?
What automated tests prevent leaks?
What incident response exists for privacy leakage?

26. Privacy ADR template

# ADR: Sensitive Data Flow for <Feature>

## Context

Describe the feature, data subjects, data classes, tenant/jurisdiction constraints, and processing purpose.

## Sensitive fields

| Field | Owner service | Classification | Purpose | Retention |
|---|---|---|---|---|

## Decision

Describe which fields move, to which services, through which APIs/events, and why.

## Data minimization

Describe fields intentionally excluded.

## Protection

Describe encryption, tokenization, redaction, field-level access, and logging controls.

## Retention and deletion

Describe TTL, redaction, legal hold, deletion propagation, and backup implications.

## Auditability

Describe disclosure audit event, access logging, and reconstruction path.

## Alternatives rejected

Describe why broader payloads, raw event-carried state, or shared database access were rejected.

## Consequences

Describe operational cost, coupling, query limitations, and review obligations.

27. Minimal production checklist

A privacy-aware Java microservice should have:

Field-level data classification
Data owner/processor map
Purpose labels for sensitive flows
API response minimization
Event payload minimization
Log/trace/metric redaction
Sensitive value handling in Java code
Field-level authorization where needed
Retention policy
Deletion/redaction workflow
DLQ retention/access control
Search/read-model privacy controls
Export controls
Support/admin access audit
Automated privacy leak tests
Privacy ADR for material flows

28. Practical exercise

Pick one sensitive field, for example:

party.nationalId

Trace it through the system:

Which service owns it?
Which APIs accept it?
Which APIs return it?
Which events include it?
Which logs could contain it?
Which traces could contain it?
Which read models duplicate it?
Which search indexes include it?
Which exports include it?
Which backups retain it?
Which teams can access it?
What is the retention period?
How is it deleted/redacted?
What audit event records its disclosure?

If you cannot answer these questions, the data is not under architectural control.

29. References

GDPR Article 5 — https://gdpr-info.eu/art-5-gdpr/
NIST Privacy Framework — https://www.nist.gov/privacy-framework
OWASP Logging Cheat Sheet — https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html
OWASP Secrets Management Cheat Sheet — https://cheatsheetseries.owasp.org/cheatsheets/Secrets_Management_Cheat_Sheet.html
Microsoft Privacy by Design principles — https://www.microsoft.com/en-us/trust-center/privacy/privacy-by-design
Kubernetes Secrets — https://kubernetes.io/docs/concepts/configuration/secret/

30. Key takeaways

Privacy in microservices is a data-flow problem.
Sensitive data includes more than obvious PII.
Classification must attach to fields and payloads, not just databases.
Reference-first event design reduces uncontrolled data propagation.
Logs, traces, metrics, DLQs, caches, search indexes, and workflow variables are common leakage points.
Encryption is not a substitute for minimization.
Deletion/redaction must be modeled as a distributed workflow.
A top-tier engineer designs privacy as an architectural invariant, not as a late-stage compliance patch.
