Learn Java Microservices File Handling, State, Configuration and Secret Management - Part 069

Engineering playbook for Java microservices file, state, configuration, and secret management: decision trees, ADR templates, runbooks, review gates, operational patterns, and reusable implementation guides.


Part 069 — Engineering Playbook

Senior engineering is not knowing every answer.

It is knowing which question must be answered before code is allowed to exist.

This part turns the whole series into a practical engineering playbook.

A playbook is not a tutorial. It is a set of reusable decision tools:

  • decision trees;
  • review gates;
  • ADR templates;
  • runbooks;
  • design invariants;
  • testing matrices;
  • failure models;
  • implementation skeletons;
  • production readiness checks.

The goal:

When a Java microservice needs file handling, state, config, or secret logic,
the team should not reinvent the mental model from scratch.

This playbook is designed for tech leads, staff engineers, platform engineers, security reviewers, and service owners.


1. How to Use This Playbook

Use it at six moments:

| Moment | Use |
|---|---|
| New service design | choose file/state/config/secret architecture |
| Feature design | identify ownership, lifecycle, and invariants |
| Code review | catch boundary and failure mistakes |
| Security review | validate threat controls and least privilege |
| Production readiness | confirm observability, runbook, and rollback |
| Incident review | map failure back to missing invariant/control |

Do not apply every section mechanically. Use the parts relevant to the risk level.


2. Risk Classification

Before deciding how deep the design work needs to go, classify the artifact.

Risk levels:

| Level | Example | Required Review |
|---|---|---|
| Low | temp export cache | basic code review |
| Medium | internal config timeout | service + SRE review |
| High | file upload/download | security + operational review |
| Critical | evidence/legal-hold file | security + compliance + architecture review |
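
The risk table can also be encoded so that a review checklist or CI step can ask which sign-offs are still missing. The sketch below is only an illustration: the `Review` and `RiskLevel` names are hypothetical, and the mapping simply mirrors the rows of the table.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical review types, taken from the "Required Review" column. */
enum Review { CODE, SERVICE, SRE, SECURITY, OPERATIONAL, COMPLIANCE, ARCHITECTURE }

/** Hypothetical encoding of the risk table; each level lists the reviews it requires. */
enum RiskLevel {
    LOW(List.of(Review.CODE)),
    MEDIUM(List.of(Review.SERVICE, Review.SRE)),
    HIGH(List.of(Review.SECURITY, Review.OPERATIONAL)),
    CRITICAL(List.of(Review.SECURITY, Review.COMPLIANCE, Review.ARCHITECTURE));

    private final List<Review> requiredReviews;

    RiskLevel(List<Review> requiredReviews) {
        this.requiredReviews = requiredReviews;
    }

    List<Review> requiredReviews() {
        return requiredReviews;
    }

    /** True when every review required at this level has been completed. */
    boolean isSatisfiedBy(Set<Review> completed) {
        return completed.containsAll(requiredReviews);
    }
}
```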

3. Decision Tree: File Handling

Rules:

Do not store a user file directly in the accepted state.
Do not use client filename as storage path.
Do not expose bucket/key as domain identity.
Do not skip metadata-payload consistency checks.
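
One way to respect the filename and identity rules is to build storage keys only from server-generated values and to treat the client filename as display-only metadata. The following is a minimal sketch under assumptions: `FileId`, `StorageKeys`, the `quarantine/` prefix, and the 100-character cap are all hypothetical choices, not something this series prescribes.

```java
import java.util.UUID;
import java.util.regex.Pattern;

/** Public identity of a file; it carries no bucket or key information. */
record FileId(UUID value) {
    static FileId newId() {
        return new FileId(UUID.randomUUID());
    }
}

final class StorageKeys {
    // Anything outside this allowlist is replaced in display names.
    private static final Pattern UNSAFE = Pattern.compile("[^A-Za-z0-9._-]");
    private static final int MAX_DISPLAY_LENGTH = 100; // assumed cap

    private StorageKeys() {}

    /** Storage key derived only from the server-generated ID (assumed key layout). */
    static String quarantineKey(FileId id) {
        return "quarantine/" + id.value();
    }

    /** Cleans a client filename for display only: drops path parts, replaces odd characters. */
    static String displayName(String clientFilename) {
        if (clientFilename == null) {
            return "unnamed";
        }
        String base = clientFilename.replace('\\', '/');
        base = base.substring(base.lastIndexOf('/') + 1);
        String cleaned = UNSAFE.matcher(base).replaceAll("_");
        // Names that are empty or only dots ("." or "..") carry no useful information.
        if (cleaned.isEmpty() || cleaned.chars().allMatch(c -> c == '.')) {
            return "unnamed";
        }
        return cleaned.length() > MAX_DISPLAY_LENGTH
            ? cleaned.substring(0, MAX_DISPLAY_LENGTH)
            : cleaned;
    }
}
```

The cleaned display name would live in the metadata row next to the `FileId`; it never becomes part of the object key.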

4. Decision Tree: Upload Architecture

| Question | Choose |
|---|---|
| File > service memory budget? | streaming/direct upload |
| Client can reach object storage? | presigned direct upload |
| Need synchronous inspection before storage? | proxy upload |
| Need malware scan? | quarantine-first |
| Need resumable upload? | multipart upload/session |
| Need regulatory retention? | versioning/object lock/legal hold |
| Need low-latency small files? | proxy may be simpler |
| Need mobile/browser direct upload? | presigned with strict TTL/CORS |

Recommended default:

Large or untrusted files:
upload session -> presigned upload -> quarantine -> scan -> accepted/rejected
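
The question table can be turned into a small decision function so that the choice is reviewable and testable. This is a sketch, not a definitive rule set: the type names are hypothetical, the precedence between questions is an assumption, and "needs malware scan" is used here as a stand-in for "untrusted file".

```java
/** Hypothetical answers to some of the questions in the table above. */
record UploadRequirements(
    long maxFileBytes,
    long serviceMemoryBudgetBytes,
    boolean clientCanReachObjectStorage,
    boolean needsSynchronousInspection,
    boolean needsMalwareScan,
    boolean needsResumableUpload
) {}

enum UploadModel { PROXY, PROXY_STREAMING, PRESIGNED_DIRECT, PRESIGNED_MULTIPART }

/** Chosen upload model plus whether the object must land in quarantine first. */
record UploadDecision(UploadModel model, boolean quarantineFirst) {}

final class UploadDecisionTree {
    private UploadDecisionTree() {}

    static UploadDecision decide(UploadRequirements r) {
        boolean exceedsMemory = r.maxFileBytes() > r.serviceMemoryBudgetBytes();
        boolean quarantineFirst = r.needsMalwareScan();

        // Inspection before storage, or no client route to storage, forces the proxy path.
        if (r.needsSynchronousInspection() || !r.clientCanReachObjectStorage()) {
            UploadModel model = exceedsMemory ? UploadModel.PROXY_STREAMING : UploadModel.PROXY;
            return new UploadDecision(model, quarantineFirst);
        }
        if (r.needsResumableUpload()) {
            return new UploadDecision(UploadModel.PRESIGNED_MULTIPART, quarantineFirst);
        }
        // Recommended default for large or untrusted files.
        if (exceedsMemory || r.needsMalwareScan()) {
            return new UploadDecision(UploadModel.PRESIGNED_DIRECT, quarantineFirst);
        }
        // Small files that need no scan: the proxy path may be simpler.
        return new UploadDecision(UploadModel.PROXY, false);
    }
}
```

Retention, CORS, and TTL questions are left out on purpose; they shape storage and URL policy rather than the upload path itself.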

5. Decision Tree: State Placement

Rules:

No correctness-critical state only in local disk, heap, or pod memory.
No shared database mutation across service ownership boundaries.
No cache as hidden source of truth.

6. Decision Tree: Configuration

Rules:

Config must have owner, schema, validation, provenance, and safe default.
Feature flags must have owner and expiry.
Secrets must never be stored in ConfigMap/plain config.
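
The first two rules can be enforced at construction time with typed records, so an invalid config group or an ownerless flag never becomes a live object. The sketch below assumes Java 17 records; `DownloadConfig`, `FeatureFlag`, the key names, the limits, and the `platform-team` default owner are all hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Objects;

/** Typed config group; the compact constructor acts as the schema check. */
record DownloadConfig(Duration presignTtl, int maxConcurrentScans, String owner, String sourceVersion) {

    /** Safe default used when no valid config has been delivered (assumed values). */
    static final DownloadConfig SAFE_DEFAULT =
        new DownloadConfig(Duration.ofMinutes(5), 2, "platform-team", "built-in");

    DownloadConfig {
        Objects.requireNonNull(presignTtl, "presignTtl");
        requireText(owner, "owner");
        requireText(sourceVersion, "sourceVersion"); // provenance: which version produced this value
        if (presignTtl.isZero() || presignTtl.isNegative()
                || presignTtl.compareTo(Duration.ofHours(1)) > 0) {
            throw new IllegalArgumentException("presignTtl must be in (0, 1h]");
        }
        if (maxConcurrentScans < 1) {
            throw new IllegalArgumentException("maxConcurrentScans must be >= 1");
        }
    }

    private static void requireText(String value, String name) {
        if (value == null || value.isBlank()) {
            throw new IllegalArgumentException(name + " is required");
        }
    }
}

/** Feature flag that cannot exist without an owner and an expiry. */
record FeatureFlag(String name, String owner, Instant expiresAt) {
    FeatureFlag {
        if (name == null || name.isBlank()) {
            throw new IllegalArgumentException("name is required");
        }
        if (owner == null || owner.isBlank()) {
            throw new IllegalArgumentException("owner is required");
        }
        Objects.requireNonNull(expiresAt, "expiresAt");
    }

    /** A flag is expired from its expiry instant onward. */
    boolean isExpired(Instant now) {
        return !now.isBefore(expiresAt);
    }
}
```

An expired flag is a cleanup signal for its owner; whether it should also change runtime behavior is a policy decision this sketch does not make.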

7. Decision Tree: Secret Delivery

| Situation | Recommended Pattern |
|---|---|
| Cloud workload accessing cloud resources | workload identity/IAM role |
| DB credential needing rotation | secret manager + dual credential/alternating users |
| Short-lived dynamic credential | Vault dynamic secret + lease-aware consumer |
| GitOps with external manager | External Secrets Operator/reference |
| GitOps without external manager | SOPS with KMS or Sealed Secrets |
| TLS cert lifecycle | cert-manager/service mesh/app reload |
| Java app cannot hot reload secret | rolling restart with overlap |
| High-risk secret | audit, access review, rotation evidence |

Rules:

Secret delivery is not secret rotation.
Secret update is not consumer adoption.
Revoking the old credential is the final step.
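
These three rules describe separate facts that a rotation has to track: the new version was delivered, each consumer actually uses it, and only then may the old credential go away. A minimal in-memory sketch of that separation follows; `RotationTracker` and its method names are hypothetical, and a real implementation would read adoption from metrics or dependency auth logs rather than from direct calls.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class RotationTracker {
    private final Set<String> consumers;
    private final String newVersion;
    private final Map<String, String> versionInUse = new HashMap<>(); // consumer -> version it reports
    private boolean delivered;

    RotationTracker(Set<String> consumers, String newVersion) {
        this.consumers = Set.copyOf(consumers);
        this.newVersion = newVersion;
    }

    /** The secret store holds the new version. This says nothing about consumers yet. */
    void markDelivered() {
        delivered = true;
    }

    /** A consumer reports which version it is actually authenticating with. */
    void reportInUse(String consumer, String version) {
        if (!consumers.contains(consumer)) {
            throw new IllegalArgumentException("Unknown consumer: " + consumer);
        }
        versionInUse.put(consumer, version);
    }

    /** Adoption is proven only when every known consumer reports the new version. */
    boolean allConsumersAdopted() {
        return delivered
            && consumers.stream().allMatch(c -> newVersion.equals(versionInUse.get(c)));
    }

    /** Guard to call before revoking the old credential. */
    void requireSafeToRevokeOld() {
        if (!allConsumersAdopted()) {
            throw new IllegalStateException("Old credential still needed: adoption not proven");
        }
    }
}
```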

8. ADR Template: File Architecture

# ADR: File Handling Architecture for <Service>

## Status
Proposed | Accepted | Deprecated

## Context
- What files are handled?
- Who uploads/generates them?
- Data classification:
- Size range:
- Retention/compliance requirements:
- Access model:
- Expected volume:

## Decision
- Upload model:
- Storage:
- Metadata source of truth:
- Lifecycle states:
- Scan/validation:
- Download model:
- Retention/legal hold:
- Audit events:
- Reconciliation:

## Alternatives Considered
1. Proxy upload
2. Direct upload
3. DB BLOB
4. Shared filesystem
5. External DMS

## Consequences
### Positive
- ...

### Negative
- ...

## Invariants
- ...

## Failure Handling
- ...

## Observability
- ...

## Security
- ...

## Rollback/Migration
- ...

9. ADR Template: Configuration

# ADR: Configuration Delivery for <Service>

## Context
- Config categories:
- Environments:
- Reload requirements:
- Security/compliance impact:
- Existing platform constraints:

## Decision
- Source of truth:
- Delivery mechanism:
- Schema:
- Validation:
- Promotion:
- Runtime reload:
- Drift detection:
- Rollback:

## Reload Classification
| Key/Group | Reloadable | Owner | Risk |
|---|---|---|---|

## Invariants
- ...

## Operational Model
- ...

## Alternatives
- Spring Cloud Config
- Kubernetes ConfigMap
- GitOps rendered config
- Feature flag platform
- Database-backed config

## Consequences
- ...

10. ADR Template: Secret Management

# ADR: Secret Management for <Service>

## Context
- Secret list:
- Consumers:
- Dependency:
- Rotation requirement:
- Availability impact:
- Compliance/security requirement:

## Decision
- Secret authority:
- Delivery:
- Runtime consumption:
- Rotation strategy:
- Reload/rollout:
- Least privilege:
- Audit:
- Emergency rotation:

## Secret Inventory
| Secret | Source | Consumer | Rotation | Reload |
|---|---|---|---|---|

## Invariants
- Secret not logged
- Consumer refresh defined
- Old credential not revoked before proof
- ...

## Failure Handling
- Secret source down:
- Bad new credential:
- Expired secret:
- Rollback:

## Observability
- ...

11. Runbook Template

# Runbook: <Problem>

## Symptoms
- Alerts:
- User-visible impact:
- Metrics:
- Logs:

## Severity
- Sev:
- Escalation:

## Immediate Checks
1. ...
2. ...
3. ...

## Diagnosis
- Query/dashboard:
- Expected normal:
- Abnormal patterns:

## Mitigation
- Safe action:
- Risk:
- Approval required:

## Recovery
- Steps:
- Validation:
- Rollback:

## Evidence to Capture
- Audit events:
- Logs:
- Metrics:
- Config/secret versions:
- Timeline:

## Post-Incident
- Root cause:
- Corrective actions:
- Tests to add:
- Runbook updates:

12. Runbook: Metadata-Payload Mismatch

Symptoms:

metadata_payload_mismatch_total > 0
download returns 404 from object storage
accepted file missing object

Diagnosis:

1. Identify fileId.
2. Load metadata row.
3. Check lifecycle status.
4. Check object bucket/key/version.
5. Query object storage HEAD.
6. Check audit events around upload/acceptance.
7. Check recent storage/delete events.
8. Check reconciliation reports.

Mitigation:

| Case | Action |
|---|---|
| UPLOADING/temp object missing | expire upload session |
| QUARANTINED object missing | mark failed/rejected and notify |
| ACCEPTED object missing | severity high; freeze deletion; start forensic investigation |
| object exists with checksum mismatch | quarantine/lock and investigate tampering |
| metadata wrong key | repair only with audit/approval |

Never silently recreate accepted evidence unless domain policy allows it and the evidence source is provable.
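
The automatable rows of the mitigation table can be expressed as a pure triage function that a reconciliation job calls for each mismatch it finds. The sketch below is an assumption-laden illustration: `FileState` is a trimmed-down status enum, `MismatchAction` and `MismatchTriage` are hypothetical names, and the "metadata wrong key" case is deliberately left out because it requires human approval.

```java
/** Trimmed-down lifecycle states relevant to this runbook (assumed subset). */
enum FileState { UPLOADING, QUARANTINED, ACCEPTED }

/** Actions from the mitigation table that a job may trigger or escalate. */
enum MismatchAction {
    NONE,
    EXPIRE_UPLOAD_SESSION,
    MARK_FAILED_AND_NOTIFY,
    FREEZE_DELETION_AND_INVESTIGATE,
    LOCK_AND_INVESTIGATE_TAMPERING
}

final class MismatchTriage {
    private MismatchTriage() {}

    /**
     * Maps what reconciliation observed to the action the runbook prescribes.
     * checksumMatches is only meaningful when the object exists.
     */
    static MismatchAction decide(FileState state, boolean objectExists, boolean checksumMatches) {
        if (!objectExists) {
            return switch (state) {
                case UPLOADING -> MismatchAction.EXPIRE_UPLOAD_SESSION;
                case QUARANTINED -> MismatchAction.MARK_FAILED_AND_NOTIFY;
                // Highest severity: an accepted file has lost its payload.
                case ACCEPTED -> MismatchAction.FREEZE_DELETION_AND_INVESTIGATE;
            };
        }
        return checksumMatches
            ? MismatchAction.NONE
            : MismatchAction.LOCK_AND_INVESTIGATE_TAMPERING;
    }
}
```

Keeping the function free of side effects means the job can log the decision as an audit event before anything is changed.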


13. Runbook: Secret Rotation Failure

Symptoms:

dependency_auth_failure_total spike
secret_rotation_failed_total
pods mixed secret versions
secret_seconds_until_expiry low

Immediate:

1. Do not revoke old credential.
2. Pause rollout.
3. Identify new version and affected consumers.
4. Check canary readiness.
5. Check dependency auth logs.
6. Confirm old credential still valid.

Mitigation:

| Failure | Action |
|---|---|
| new credential invalid | revert the secret to the previous version |
| ESO sync delayed | wait for or fix ESO before rollout |
| app cannot reload | rolling restart |
| old credential already revoked | recreate it or issue a third credential via the emergency path |
| secret leaked | emergency rotation and incident process |

Completion criteria:

all consumers healthy
new version used
old usage zero for observation window
old revoked
audit event recorded

14. Runbook: Config Drift

Symptoms:

config_drift_detected_total
live ConfigMap differs from Git
pods report unknown config version
behavior changed without deployment

Diagnosis:

1. Compare desired Git version vs live ConfigMap.
2. Check Kubernetes audit for update actor.
3. Check GitOps controller events.
4. Check app effective config version endpoint.
5. Identify high-risk keys changed.
6. Check correlated incidents after drift timestamp.

Mitigation:

- If unsafe: revert immediately to the desired state in Git or to the previous safe version.
- If the change was a valid emergency fix: record an exception and backfill it into Git.
- Restrict the actor's RBAC permissions if the change was unauthorized.
- Add a policy to prevent recurrence.
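
The first diagnosis step, comparing the desired version against the live one, can be sketched as a plain map diff. The example assumes both sides have already been loaded into `Map<String, String>` (for instance from the rendered Git manifest and the live ConfigMap data); `ConfigDrift` is a hypothetical name. It reports key names only, so the drift report itself does not copy config values into logs.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

/** Key-level difference between desired and live config; values are never included. */
record ConfigDrift(Set<String> missingInLive, Set<String> unexpectedInLive, Set<String> changed) {

    boolean hasDrift() {
        return !(missingInLive.isEmpty() && unexpectedInLive.isEmpty() && changed.isEmpty());
    }

    static ConfigDrift between(Map<String, String> desired, Map<String, String> live) {
        // Keys declared in Git but absent from the live object.
        Set<String> missing = new TreeSet<>(desired.keySet());
        missing.removeAll(live.keySet());

        // Keys present live that Git does not know about.
        Set<String> unexpected = new TreeSet<>(live.keySet());
        unexpected.removeAll(desired.keySet());

        // Keys present on both sides with different values.
        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, String> entry : desired.entrySet()) {
            String key = entry.getKey();
            if (live.containsKey(key) && !Objects.equals(entry.getValue(), live.get(key))) {
                changed.add(key);
            }
        }
        return new ConfigDrift(missing, unexpected, changed);
    }
}
```

The `changed` set is the natural input for the "identify high-risk keys changed" step.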

15. Runbook: Audit Outbox Backlog

Symptoms:

audit_outbox_oldest_age_seconds > threshold
audit_publish_failure_total increasing
audit sink unavailable

Diagnosis:

1. Check audit sink health.
2. Check outbox table growth.
3. Check publisher logs.
4. Check duplicate/idempotency errors.
5. Check network/credential issues.

Mitigation:

- Restart/fix publisher.
- Scale publisher if throughput issue.
- Pause high-risk operations if policy requires.
- Do not delete outbox rows manually.
- Replay idempotently.

Validation:

oldest age decreasing
publish success increasing
no gaps in event IDs
audit sink confirms receipt
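
"Replay idempotently" and "no gaps in event IDs" can both be made concrete. The sketch below is an in-memory illustration under assumptions: `AuditEvent`, `AuditSink`, `InMemoryAuditSink`, and `OutboxReplayer` are hypothetical names, the sink deduplicates by event ID, and a real publisher would mark outbox rows as published instead of returning a list.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

record AuditEvent(long sequence, String eventId, String payload) {}

/** Returns true only once the sink has confirmed receipt. */
interface AuditSink {
    boolean publish(AuditEvent event);
}

/** Test double that deduplicates by event ID, which is what makes replay safe. */
final class InMemoryAuditSink implements AuditSink {
    private final Map<String, AuditEvent> received = new LinkedHashMap<>();

    @Override
    public boolean publish(AuditEvent event) {
        received.putIfAbsent(event.eventId(), event);
        return true;
    }

    int size() {
        return received.size();
    }
}

final class OutboxReplayer {
    private OutboxReplayer() {}

    /**
     * Publishes pending events in sequence order and returns the IDs the sink confirmed.
     * Stops at the first failure so ordering is preserved; it never deletes anything.
     */
    static List<String> replay(List<AuditEvent> pending, AuditSink sink) {
        List<AuditEvent> ordered = new ArrayList<>(pending);
        ordered.sort(Comparator.comparingLong(AuditEvent::sequence));
        List<String> confirmed = new ArrayList<>();
        for (AuditEvent event : ordered) {
            if (!sink.publish(event)) {
                break;
            }
            confirmed.add(event.eventId());
        }
        return confirmed;
    }

    /** Lists sequence numbers missing between the smallest and largest value seen. */
    static List<Long> gaps(Collection<Long> sequences) {
        List<Long> missing = new ArrayList<>();
        Long previous = null;
        for (long current : new TreeSet<>(sequences)) {
            if (previous != null) {
                for (long s = previous + 1; s < current; s++) {
                    missing.add(s);
                }
            }
            previous = current;
        }
        return missing;
    }
}
```

Replaying the same batch twice leaves the sink unchanged, which is the property the mitigation step relies on.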

16. Design Review Gate

Before implementation:

[ ] Artifact classified
[ ] Owner defined
[ ] Source of truth defined
[ ] Lifecycle state machine defined
[ ] Invariants defined
[ ] Threat model completed
[ ] Access model defined
[ ] Config/secret classification done
[ ] Failure modes listed
[ ] Observability plan defined
[ ] Audit events defined
[ ] Reconciliation plan defined
[ ] ADR written

17. Code Review Gate

During code review:

[ ] No `MultipartFile.getBytes()` for large/unbounded file
[ ] No client filename used as path
[ ] No bucket/key exposed as public domain ID
[ ] No raw secret in logs/exceptions
[ ] No request/response body logging by default
[ ] Typed config uses validation
[ ] Invalid lifecycle transition impossible
[ ] Idempotency implemented for retryable commands
[ ] External side effects have compensation/reconciliation
[ ] Metrics labels bounded and non-sensitive
[ ] Audit events emitted for material decisions
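
For the first checklist item, the usual alternative to `MultipartFile.getBytes()` is to stream from `getInputStream()` with an explicit size limit, so memory use stays at one buffer regardless of file size. The sketch below uses only the JDK; `BoundedCopy` is a hypothetical helper, and the 8 KiB buffer is an arbitrary choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class BoundedCopy {
    private static final int BUFFER_SIZE = 8192; // arbitrary buffer size

    private BoundedCopy() {}

    /**
     * Copies the stream in fixed-size chunks and fails once more than maxBytes have been read.
     * Returns the number of bytes copied. Bytes written before the limit is hit remain in
     * the target, so the caller is expected to discard a partial upload on failure.
     */
    static long copy(InputStream in, OutputStream out, long maxBytes) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            total += read;
            if (total > maxBytes) {
                throw new IOException("Upload exceeds limit of " + maxBytes + " bytes");
            }
            out.write(buffer, 0, read);
        }
        return total;
    }
}
```

The limit is enforced on bytes actually read, not on a client-supplied content length, which the client controls.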

18. Security Review Gate

[ ] File upload allowlist/size/content validation
[ ] Quarantine before trust
[ ] Malware scan decision model
[ ] Payload access authorization separate from metadata
[ ] Presigned URL TTL and logging policy
[ ] Least privilege IAM/RBAC
[ ] Kubernetes Secret access restricted
[ ] Secret rotation strategy documented
[ ] Config cannot disable security controls in prod
[ ] Actuator/debug endpoints restricted
[ ] Logs/traces/metrics redacted
[ ] Audit store protected

19. Production Readiness Gate

[ ] Readiness checks required dependencies
[ ] Liveness not tied to transient dependency failure
[ ] Dashboards exist
[ ] Alerts have runbooks
[ ] Reconciliation jobs deployed
[ ] DLQ monitored
[ ] Storage lifecycle/incomplete multipart cleanup configured
[ ] Secret version/expiry observable
[ ] Config version observable
[ ] Audit outbox monitored
[ ] Rollback tested
[ ] Chaos/failure scenario tested in staging
[ ] Incident owner defined

20. Migration Playbook

When migrating an existing file, config, or secret system:

20.1 Inventory

- all buckets/prefixes
- DB metadata tables
- file types
- object keys with PII
- configs and sources
- secrets and consumers
- access policies
- retention rules
- audit gaps

20.2 Stabilize

- stop direct bucket access where possible
- add metadata-payload reconciliation
- add audit events for critical operations
- add config schema validation
- add secret scanning
- add least privilege

20.3 Migrate

- introduce stable fileId
- backfill checksums
- backfill object metadata/tags
- map lifecycle states
- move secrets to manager
- move config to governed path
- add runbooks and alerts

20.4 Verify

- compare counts
- verify checksums
- test downloads
- test retention
- test secret rotation
- test rollback
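
The "verify checksums" step can be done by streaming each migrated object through a digest and comparing it with the checksum recorded in metadata. The sketch below assumes SHA-256 and Java 17 (for `HexFormat`); `Checksums` is a hypothetical helper, and the series does not mandate a particular algorithm.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

final class Checksums {
    private Checksums() {}

    /** Streams the payload through SHA-256 and returns the lowercase hex digest. */
    static String sha256Hex(InputStream in) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            digest.update(buffer, 0, read);
        }
        return HexFormat.of().formatHex(digest.digest());
    }

    /** Compares the recorded checksum with the payload's actual checksum. */
    static boolean matches(String expectedHex, InputStream payload) throws IOException {
        return expectedHex != null && expectedHex.equalsIgnoreCase(sha256Hex(payload));
    }
}
```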

21. Implementation Skeleton: File Lifecycle Guard

import java.util.Map;
import java.util.Set;

// FileStatus and InvalidLifecycleTransitionException are assumed to be defined elsewhere in the service.
public final class FileLifecycleGuard {
    // Each status maps to the only statuses it may move to. DELETED is terminal.
    private static final Map<FileStatus, Set<FileStatus>> ALLOWED = Map.of(
        FileStatus.UPLOADING, Set.of(FileStatus.UPLOADED),
        FileStatus.UPLOADED, Set.of(FileStatus.QUARANTINED),
        FileStatus.QUARANTINED, Set.of(FileStatus.SCANNING),
        FileStatus.SCANNING, Set.of(FileStatus.ACCEPTED, FileStatus.REJECTED),
        FileStatus.ACCEPTED, Set.of(FileStatus.ARCHIVED, FileStatus.DELETION_REQUESTED),
        FileStatus.ARCHIVED, Set.of(FileStatus.DELETION_REQUESTED),
        FileStatus.DELETION_REQUESTED, Set.of(FileStatus.DELETED),
        FileStatus.REJECTED, Set.of(FileStatus.DELETED),
        FileStatus.DELETED, Set.of()
    );

    // Throws unless `to` is a legal next status for `from`; unknown statuses allow nothing.
    public void requireAllowed(FileStatus from, FileStatus to) {
        if (!ALLOWED.getOrDefault(from, Set.of()).contains(to)) {
            throw new InvalidLifecycleTransitionException(from, to);
        }
    }
}

22. Implementation Skeleton: Safe Config Snapshot

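import java.time.Instant;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Predicate;

// Immutable view of one config version, swapped as a whole so readers never see a partial reload.
public record ConfigSnapshot<T>(
    String version,
    T value,
    Instant loadedAt
) {}

// InvalidConfigurationException is assumed to be defined elsewhere in the service.
public final class ReloadableConfig<T> {
    private final AtomicReference<ConfigSnapshot<T>> current;

    public ReloadableConfig(ConfigSnapshot<T> initial) {
        this.current = new AtomicReference<>(Objects.requireNonNull(initial, "initial"));
    }

    public ConfigSnapshot<T> current() {
        return current.get();
    }

    // Validates before swapping; on failure the previous snapshot stays active.
    public void reload(ConfigSnapshot<T> candidate, Predicate<T> validator) {
        Objects.requireNonNull(candidate, "candidate");
        if (!validator.test(candidate.value())) {
            throw new InvalidConfigurationException(candidate.version());
        }

        current.set(candidate);
    }
}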

23. Implementation Skeleton: Safe Secret Display

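// Wrapper that keeps a secret out of logs, exception messages, and string concatenation.
public final class SecretValue {
    private final String value;

    private SecretValue(String value) {
        if (value == null || value.isBlank()) {
            // The message must never include the value itself.
            throw new IllegalArgumentException("Secret value is required");
        }
        this.value = value;
    }

    public static SecretValue of(String value) {
        return new SecretValue(value);
    }

    // The only way to read the raw value; call it at the point of use and do not store the result.
    public String revealForUse() {
        return value;
    }

    // Accidental printing shows a placeholder instead of the secret.
    @Override
    public String toString() {
        return "[REDACTED]";
    }
}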

24. Review Questions for Staff-Level Engineers

Ask these in design reviews.

What is the source of truth?
What happens if this operation succeeds halfway?
What retries this command?
What makes retry safe?
What state can become stale?
Who can mutate this value?
How is this audited?
Can we prove the file was not changed?
What happens if config changes during traffic?
What happens if secret expires during traffic?
What happens if old pods and new pods run different config?
What data leaks if logs are exported?
What is the recovery path?
What is the smallest blast radius?
What is the rollback path?

If the team cannot answer, the design is not done.


25. Key Takeaways

  1. A playbook converts engineering judgment into repeatable review practice.
  2. Decision trees prevent teams from defaulting to unsafe convenience.
  3. ADR templates capture consequences before code hardens them.
  4. Runbooks must be written before production incidents.
  5. Review gates catch different classes of failure: design, code, security, operations.
  6. Migration should inventory, stabilize, migrate, and verify.
  7. Reusable skeletons should encode invariants, not just wrappers.
  8. The best senior engineers ask boundary, failure, ownership, and proof questions early.
  9. Production-grade systems make correct operation boring and unsafe operation difficult.
  10. Playbooks improve only when incidents and game days feed back into them.

Next is the final part: a capstone that combines the entire series into an end-to-end production design exercise and next learning path.

