

Swarm Secrets, Configs, Volumes, and Stateful Service Design

Learn Docker, Containerization, Docker Compose, Docker Swarm - Part 029

Deep dive Docker Swarm secrets, configs, volumes, stateful services, data placement, backup, restore, and operational governance.

2026-07-01 · 19 min read · 3681 words


Lesson 29 · 35-lesson track · Deepen Practice (lessons 20–29)



Goal of this part: design Swarm services that correctly separate artifact, configuration, secret, and durable state. Knowing the secrets: and volumes: syntax is not enough; we need to understand the failure modes: secret leaks, config drift, node-local volumes, misplaced tasks, failed backups, untested restores, and databases that appear to be running but cannot be recovered.

In Part 028 we treated the stack as the release unit. Now we dissect the most sensitive components of a stack: secrets, configs, and state.

Mental model utama:

Image is the artifact. Config is behavior. A secret is a trust boundary. A volume is a durability boundary. A stateful service is a combination of placement, persistence, recovery, and ownership.

1. Kaufman Skill Deconstruction

To master this topic, break the skill into the following subskills:

Subskill	What to Master	Evidence of Mastery
Secret model	Can explain secret lifecycle, mount path, permissions, immutability, rotation	Never puts passwords/API keys into images, Git, or casually into environment variables
Config model	Can separate non-sensitive config from the image and from secrets	Can change config without rebuilding the image
Volume semantics	Can distinguish container writable layer, named volume, bind mount, external volume	No data is lost when a task is rescheduled
Stateful scheduling	Can design placement constraints, node labels, drain behavior, and failure recovery	Can explain which node holds the data and how recovery happens
Backup/restore	Can write a runbook for backup, restore, integrity verification, and RPO/RTO	Restore has actually been tested, not just backups created
Governance	Can define naming, versioning, ownership, retention, and incident rules	No orphan secret/config/volume without an owner

Kaufman-style focus:

Deconstruct: separate image, config, secret, state, placement, and backup.
Learn enough to self-correct: understand the inspect and service ps commands, mount paths, and task reschedule behavior.
Remove friction: build a stack template, a secret rotation script, a backup script, and a failure lab.
Practice deliberately: simulate secret rotation, config rollout, node drain, volume restore, and a failed database migration.

2. Four Runtime Data Categories

Containerized systems often become messy because all data is treated the same way. A mature design distinguishes four categories:

Category	Example	Should Live In Image?	Should Live In Secret?	Should Live In Config?	Should Live In Volume?
Application artifact	JAR, binary, static asset	Yes	No	No	No
Non-sensitive config	log level, feature flag default, endpoint name	Sometimes default only	No	Yes	Rarely
Sensitive material	DB password, private key, token	No	Yes	No	No, unless encrypted store with strict policy
Durable business state	database data, uploaded files, queue data	No	No	No	Yes / external state service

Rule:

If data must differ between environments, do not embed it as an image constant unless it is a safe default. If data proves identity or grants access, treat it as a secret. If data must survive task replacement, treat it as state.
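The three rules above can be read as a small decision procedure. The sketch below encodes them in Java, checking the most restrictive rule first (identity or access wins over everything else). The DataPlacementRule class, its DataItem fields, and that precedence order are illustrative assumptions, not part of Docker.

```java
import java.util.Objects;

// Hypothetical encoding of the three placement rules; not a Docker API.
final class DataPlacementRule {

    enum Home { IMAGE, CONFIG, SECRET, STATE }

    // One piece of runtime data, described by the three questions the rules ask.
    record DataItem(String name,
                    boolean grantsAccessOrIdentity,
                    boolean mustSurviveTaskReplacement,
                    boolean variesByEnvironment) {}

    static Home classify(DataItem item) {
        Objects.requireNonNull(item, "item");
        // Identity or access material is a secret, even if it also differs per environment.
        if (item.grantsAccessOrIdentity()) {
            return Home.SECRET;
        }
        // Data that must outlive a task is durable state (volume or external service).
        if (item.mustSurviveTaskReplacement()) {
            return Home.STATE;
        }
        // Non-sensitive values that differ per environment go into a config object.
        if (item.variesByEnvironment()) {
            return Home.CONFIG;
        }
        // Everything else can live in the immutable image (including safe defaults).
        return Home.IMAGE;
    }
}
```

For example, a database password that differs per environment still classifies as SECRET, because the access rule is checked before the environment rule.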

3. Swarm Object Model for Secrets, Configs, and Volumes

Swarm has a different object model for each kind of data.

Key distinction:

Object	Stored In Swarm Control Plane?	Mounted Into Container?	Sensitive?	Mutable?	Data Follows Task?
Secret	Yes	Yes, as file	Yes	Effectively immutable	Yes, secret follows service task
Config	Yes	Yes, as file	No	Effectively immutable	Yes, config follows service task
Volume	No, not data itself	Yes	Depends	Mutable	Only if storage backend supports it

Important:

Secret/config metadata follows the service. Volume data usually does not magically follow a task across nodes.

This is the most important stateful-service lesson in Swarm.

4. Swarm Secrets: Mental Model

Swarm secrets are designed for sensitive data needed by services at runtime.

Examples:

database password
TLS private key
API token
signing key
encryption key material
service credential

A secret should answer:

Who owns it?
Which services may read it?
Where is it mounted?
How is it rotated?
How is access revoked?
How do we know it is not leaked?

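Docker's model: when a secret is created, it is sent to a swarm manager over a mutual TLS connection and stored in the Raft log, which is encrypted and replicated across all managers. When a service is granted the secret, only nodes running that service's tasks receive it; there it is mounted into the container on an in-memory filesystem, by default at /run/secrets/<secret_name> in Linux containers. When the task stops, the secret is unmounted and flushed from the node's memory.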

Operational invariant:

A service only receives the secrets explicitly granted to that service.

Do not mount all secrets into all services. Secret access should be minimal and auditable.
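Least privilege is easy to state and easy to drift from. A minimal audit sketch, assuming you already have two maps per service: the secrets granted in the stack file (or read from docker service inspect) and the secrets the service owner says it needs. Collecting those maps is out of scope, and SecretGrantAudit is a hypothetical name.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical least-privilege audit; collecting the two maps is out of scope.
final class SecretGrantAudit {

    // For each service, returns the granted secrets that are not in its required set.
    static Map<String, Set<String>> excessGrants(Map<String, Set<String>> granted,
                                                 Map<String, Set<String>> required) {
        Map<String, Set<String>> excess = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : granted.entrySet()) {
            Set<String> extra = new TreeSet<>(entry.getValue());
            // A service missing from 'required' is treated as needing no secrets at all.
            extra.removeAll(required.getOrDefault(entry.getKey(), Set.of()));
            if (!extra.isEmpty()) {
                excess.put(entry.getKey(), extra);
            }
        }
        return excess;
    }
}
```

An empty result means every grant is justified; any entry is a candidate for --secret-rm.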

5. Creating and Inspecting Secrets

Create secret from file:

printf '%s' 'super-secret-password' > db_password.txt

docker secret create orders_db_password db_password.txt

rm db_password.txt

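Alternatively, create the secret from stdin, which avoids writing a plaintext file (the literal value in this example still ends up in shell history):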

printf '%s' 'super-secret-password' | docker secret create orders_db_password -

List:

docker secret ls

Inspect metadata:

docker secret inspect orders_db_password

The secret value is not shown by inspect. This is expected.

6. Mounting Secrets Into Services

CLI example:

docker service create \
  --name orders_api \
  --secret orders_db_password \
  registry.example.com/orders-api:2026.07.01

Inside the container:

cat /run/secrets/orders_db_password

Stack file example:

services:
  api:
    image: registry.example.com/orders-api:2026.07.01
    secrets:
      - orders_db_password

secrets:
  orders_db_password:
    external: true

Custom target:

services:
  api:
    image: registry.example.com/orders-api:2026.07.01
    secrets:
      - source: orders_db_password
        target: db_password
        uid: "10001"
        gid: "10001"
        mode: 0400

secrets:
  orders_db_password:
    external: true

The application should read:

/run/secrets/db_password

not:

DB_PASSWORD=plain-text-value

7. Why Environment Variables Are a Weak Secret Boundary

Environment variables are convenient but weak for secrets.

Potential leakage surfaces:

process inspection inside container
crash dumps
logs that print full environment
debug endpoints
support bundles
shell history
CI logs
Compose files committed to Git
application framework diagnostics

A pragmatic rule:

Use environment variables for non-sensitive runtime switches. Use secret files for sensitive material.

Example safe-ish env:

environment:
  LOG_LEVEL: "INFO"
  FEATURE_X_ENABLED: "false"

Example unsafe env:

environment:
  DB_PASSWORD: "prod-password"
  JWT_PRIVATE_KEY: "-----BEGIN PRIVATE KEY-----..."

8. Secret Immutability and Versioning

Secrets are effectively immutable. Treat them like versioned objects.

Bad naming:

db_password

Better naming:

orders_db_password_2026_07_v1
orders_db_password_2026_07_v2
orders_api_jwt_private_key_2026_q3

Why version names matter:

clear rotation history
rollback clarity
audit evidence
safe staged rollout
no ambiguity over which service uses which credential

Naming convention:

<system>_<purpose>_<scope>_<yyyy_mm>_v<n>

Examples:

orders_postgres_password_primary_2026_07_v1
orders_api_stripe_token_2026_07_v3
orders_tls_private_key_public_2026_q3_v1
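A convention only helps if it is enforced. This sketch builds and validates names of the form <system>_<purpose>_<scope>_<yyyy_mm>_v<n> and computes the next name for a rotation. It assumes lowercase letters, digits, and underscores only, and it assumes (this is not stated above) that the version counter restarts at v1 when the month changes. Quarter-based names such as orders_tls_private_key_public_2026_q3_v1 would need a separate pattern.

```java
import java.time.YearMonth;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the naming convention <system>_<purpose>_<scope>_<yyyy_mm>_v<n>.
final class SecretNames {

    // Group 1: system_purpose_scope (purpose may contain underscores),
    // group 2: year, group 3: month 01-12, group 4: version number.
    private static final Pattern CONVENTION =
            Pattern.compile("^([a-z0-9]+(?:_[a-z0-9]+)+)_(\\d{4})_(0[1-9]|1[0-2])_v(\\d+)$");

    static String build(String system, String purpose, String scope, YearMonth month, int version) {
        if (version < 1) {
            throw new IllegalArgumentException("version must be >= 1");
        }
        // Locale.ROOT keeps ASCII digits regardless of the JVM's default locale.
        return String.format(Locale.ROOT, "%s_%s_%s_%04d_%02d_v%d",
                system, purpose, scope, month.getYear(), month.getMonthValue(), version);
    }

    static boolean isValid(String name) {
        return CONVENTION.matcher(name).matches();
    }

    // Assumption: the version restarts at v1 when the rotation happens in a new month.
    static String nextVersion(String current, YearMonth rotationMonth) {
        Matcher m = CONVENTION.matcher(current);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a convention name: " + current);
        }
        YearMonth currentMonth = YearMonth.of(Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3)));
        int version = currentMonth.equals(rotationMonth) ? Integer.parseInt(m.group(4)) + 1 : 1;
        return String.format(Locale.ROOT, "%s_%04d_%02d_v%d",
                m.group(1), rotationMonth.getYear(), rotationMonth.getMonthValue(), version);
    }
}
```

A CI check can call isValid on every secret name in the stack file and reject unversioned names such as db_password.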

9. Secret Rotation Pattern

Rotation is not “edit secret”. Rotation is controlled replacement.

9.1 Simple Password Rotation

Create new secret:

printf '%s' 'new-password' | docker secret create orders_db_password_2026_07_v2 -

Update service:

docker service update \
  --secret-add source=orders_db_password_2026_07_v2,target=db_password_new \
  orders_api

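After the database credential has been changed and the application reads the new file (for example, DB_PASSWORD_FILE now points at /run/secrets/db_password_new), remove the old secret from the service: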

docker service update \
  --secret-rm orders_db_password_2026_07_v1 \
  orders_api

Finally, delete the old secret object (docker secret rm refuses to delete a secret that a service still uses):

docker secret rm orders_db_password_2026_07_v1

9.2 Rotation With Zero-Downtime Requirement

For zero-downtime rotation, the dependent system must support overlap:

DB accepts old and new credential temporarily, or
application supports dual credential lookup, or
token issuer supports multiple active keys, or
load balancer drains old tasks before revocation.

Without overlap, rotation becomes a downtime event.
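When the application itself provides the overlap, one option is to read every mounted candidate file and try them in preference order. The sketch assumes the layout from 9.1, where the new secret is mounted at /run/secrets/db_password_new next to the old /run/secrets/db_password; the authenticates predicate is a hypothetical stand-in for a real login attempt such as opening a JDBC connection.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Sketch of application-side overlap during rotation; paths and the predicate are assumptions.
final class DualCredentialLookup {

    // Reads every candidate secret file that is currently mounted, keeping preference order.
    static List<String> mountedCredentials(List<Path> preferenceOrder) throws IOException {
        List<String> values = new ArrayList<>();
        for (Path path : preferenceOrder) {
            if (Files.isRegularFile(path)) {
                // trim() drops a trailing newline left by editors or echo.
                values.add(Files.readString(path, StandardCharsets.UTF_8).trim());
            }
        }
        return values;
    }

    // Returns the first credential the backend accepts; 'authenticates' stands in for a real login.
    static Optional<String> firstAccepted(List<String> candidates, Predicate<String> authenticates) {
        for (String candidate : candidates) {
            if (authenticates.test(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

A typical preference order is List.of(Path.of("/run/secrets/db_password_new"), Path.of("/run/secrets/db_password")). Once the old secret is removed from the service, only the new file remains and the fallback disappears without a code change.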

9.3 Rotation Failure Mode

Common broken sequence:

Update app to use new secret.
Remove old secret.
Forget to update database password.
All new tasks fail healthcheck.
Rollback cannot restore because old secret was removed.

Safer sequence:

Create new credential in backend.
Create new Swarm secret.
Update service to use new secret.
Observe health and auth success.
Revoke old credential.
Remove old secret only after rollback window closes.
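The ordering in the safer sequence can be checked mechanically before anyone runs a command. This sketch treats a rotation plan as a list of steps and reports every step that appears before its prerequisite. The Step names and the prerequisite pairs are assumptions distilled from the two sequences above, not a Docker feature.

```java
import java.util.ArrayList;
import java.util.List;

// Checks a rotation plan against prerequisite pairs taken from the sequences above.
final class RotationPlanCheck {

    enum Step {
        CREATE_BACKEND_CREDENTIAL,
        CREATE_SWARM_SECRET,
        UPDATE_SERVICE,
        OBSERVE_HEALTH,
        REVOKE_OLD_CREDENTIAL,
        ROLLBACK_WINDOW_CLOSED,
        REMOVE_OLD_SECRET
    }

    // Each pair is {prerequisite, dependent}: if the dependent step is planned,
    // the prerequisite must be planned earlier.
    private static final Step[][] PREREQUISITES = {
        {Step.CREATE_BACKEND_CREDENTIAL, Step.UPDATE_SERVICE},
        {Step.CREATE_SWARM_SECRET, Step.UPDATE_SERVICE},
        {Step.UPDATE_SERVICE, Step.OBSERVE_HEALTH},
        {Step.OBSERVE_HEALTH, Step.REVOKE_OLD_CREDENTIAL},
        {Step.ROLLBACK_WINDOW_CLOSED, Step.REMOVE_OLD_SECRET},
    };

    static List<String> violations(List<Step> plan) {
        List<String> problems = new ArrayList<>();
        for (Step[] rule : PREREQUISITES) {
            int dependent = plan.indexOf(rule[1]);
            if (dependent < 0) {
                continue; // dependent step not planned, nothing to check
            }
            int prerequisite = plan.indexOf(rule[0]);
            if (prerequisite < 0 || prerequisite > dependent) {
                problems.add(rule[1] + " requires " + rule[0] + " first");
            }
        }
        return problems;
    }
}
```

Run against the broken sequence (create secret, update service, remove old secret), it reports two problems: the service was updated before the backend credential existed, and the old secret was removed before the rollback window closed.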

10. Swarm Configs: Mental Model

Configs are for non-sensitive configuration data mounted into services.

Examples:

Nginx config
application YAML config without secret values
feature flag bootstrap file
logback/log4j config
Prometheus scrape config
static routing table
trusted public certificate bundle

Configs solve this problem:

We want generic immutable images, but runtime behavior differs by environment.

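A Docker config object keeps the image generic without bind-mounting host files. Unlike secrets, configs are not encrypted at rest and are not mounted on an in-memory filesystem, which is why they must never hold credentials.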

11. Creating and Mounting Configs

Create config:

docker config create orders_api_config_2026_07_v1 application-prod.yml

Inspect:

docker config inspect orders_api_config_2026_07_v1

Service usage:

services:
  api:
    image: registry.example.com/orders-api:2026.07.01
    configs:
      - source: orders_api_config_2026_07_v1
        target: /app/config/application.yml
        uid: "10001"
        gid: "10001"
        mode: 0444

configs:
  orders_api_config_2026_07_v1:
    external: true

Application launch:

services:
  api:
    command:
      - "java"
      - "-jar"
      - "/app/orders-api.jar"
      - "--spring.config.location=file:/app/config/application.yml"

12. Config Immutability and Rollout

Config should be versioned like code.

Bad:

application-prod.yml

Better:

orders_api_application_prod_2026_07_01_sha7f3a9c
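One way to get names like the one above without manual bookkeeping is to derive the suffix from the file content, so any edit produces a new name and therefore a new immutable config object. This is a sketch assuming SHA-256 and an 8-hex-character suffix; both choices, and the ConfigNames class, are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Content-addressed config naming sketch: same content, same name; any edit, new name.
final class ConfigNames {

    static String contentAddressed(String prefix, String content) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256, so this should not happen.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < 4; i++) { // first 4 bytes give 8 lowercase hex characters
            hex.append(Character.forDigit((digest[i] >> 4) & 0xF, 16));
            hex.append(Character.forDigit(digest[i] & 0xF, 16));
        }
        return prefix + "_sha" + hex;
    }
}
```

Because identical content always maps to the same name, an unchanged file does not produce a new config reference, so the service spec does not change either.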

When config changes:

Create new config object.
Update service to use new config.
Allow rolling update.
Observe health.
Remove old config after rollback window.

Example:

docker config create orders_api_config_2026_07_v2 application-prod.yml

docker service update \
  --config-rm orders_api_config_2026_07_v1 \
  --config-add source=orders_api_config_2026_07_v2,target=/app/config/application.yml \
  orders_api

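Changing the config reference changes the service spec, so Swarm replaces tasks according to the service's update_config (parallelism, delay, failure_action). With default settings, tasks are replaced one at a time.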

13. Config vs Secret Decision Matrix

Data	Config	Secret	Reason
log level	Yes	No	Not sensitive
DB host	Yes	Usually no	Infrastructure metadata
DB password	No	Yes	Credential
public certificate	Yes	No	Public trust data
private key	No	Yes	Identity material
OAuth client id	Usually config	Depends	Public identifier in many systems
OAuth client secret	No	Yes	Credential
feature flag default	Yes	No	Runtime behavior
license key	No	Yes	Often confidential/commercial secret

Rule:

If disclosure grants access, impersonation, privilege, or commercial loss, treat it as secret.

14. Volumes in Swarm: The Dangerous Mental Model

The dangerous assumption:

“A named volume in Swarm is cluster-wide.”

Usually false.

A named volume created with the default local driver is local to a node.

If a database task originally runs on Node A and writes to orders_pgdata, then later reschedules to Node B with the same volume name, it may get a different empty local volume.

This is the classic Swarm stateful-service trap.
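The trap is easier to see in a toy model. This is not Docker code: it only mimics the rule that a local-driver volume is identified by the pair (node, volume name) and is created empty the first time a task on that node mounts it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the local volume driver's scoping rule; it does not talk to Docker.
final class LocalVolumeModel {

    // node name -> (volume name -> stored records)
    private final Map<String, Map<String, List<String>>> volumesByNode = new HashMap<>();

    // A task mounting a named volume on a node: the volume is created empty
    // if that node does not have one with this name yet.
    List<String> mount(String node, String volumeName) {
        return volumesByNode
                .computeIfAbsent(node, n -> new HashMap<>())
                .computeIfAbsent(volumeName, v -> new ArrayList<>());
    }
}
```

Data written through node-a's orders_pgdata is invisible to a task that lands on node-b, even though the volume name is identical.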

15. Volume Types and Swarm Implications

Storage Type	Scope	Good For	Risk
Container writable layer	task/container-local	ephemeral files	lost on task replacement
Local named volume	node-local	single-node durable state	task reschedule may lose access
Bind mount	node-local host path	explicit host integration	brittle path/permission/security
tmpfs	memory/ephemeral	cache, sockets, temp secret-like files	data lost on restart
External volume driver	backend-dependent	durable/movable state	driver complexity and split-brain risk
Managed external DB/storage	service-level	production durable state	operational dependency outside Swarm

Top 1% engineer behavior:

Never say “we have a volume” as proof of durability. Ask: where is the data physically, how is it replicated, how is it backed up, and what happens during reschedule?

16. Stateful Service Categories

Not every stateful service has the same risk.

Category	Example	Swarm Fit	Notes
Stateless	API, worker, web	Excellent	scale horizontally
Soft state	cache, local temp index	Good	rebuildable; no backup needed
Durable single-primary	PostgreSQL single node	Risky but possible	requires pinning, backup, restore runbook
Durable clustered	Kafka, Elasticsearch, database cluster	Advanced / risky	requires protocol-level clustering and storage design
External managed state	RDS, Cloud SQL, S3, managed queue	Often best	Swarm runs compute, external service owns durability

Practical principle:

Use Swarm for compute orchestration. Use dedicated state systems for critical durable business data unless your team owns storage operations deeply.

17. Single-Primary Database on Swarm

Sometimes acceptable:

internal tool
low-to-medium criticality system
small deployment
strong backup discipline
one-node or pinned-node design
clear recovery expectation

Usually risky for:

high-volume transactional systems
strict RPO/RTO
multi-node HA requirement
regulated evidence systems without tested restore
workloads requiring automatic failover

A minimal pinned PostgreSQL example:

version: "3.9"

services:
  postgres:
    image: postgres:16.4
    environment:
      POSTGRES_DB: orders
      POSTGRES_USER: orders_app
      POSTGRES_PASSWORD_FILE: /run/secrets/orders_postgres_password_2026_07_v1
    secrets:
      - orders_postgres_password_2026_07_v1
    volumes:
      - orders_pgdata:/var/lib/postgresql/data
    networks:
      - app
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.orders.db == true
      restart_policy:
        condition: on-failure
      update_config:
        order: stop-first
        failure_action: rollback
      rollback_config:
        order: stop-first

secrets:
  orders_postgres_password_2026_07_v1:
    external: true

volumes:
  orders_pgdata:
    driver: local

networks:
  app:
    driver: overlay

Node label:

docker node update --label-add orders.db=true worker-1

Critical caveat:

This pins scheduling, not high availability. If worker-1 dies, the data is still on worker-1 unless storage/backend strategy says otherwise.

18. Why `replicas: 3` Does Not Make a Database HA

Bad example:

services:
  postgres:
    image: postgres:16.4
    deploy:
      replicas: 3
    volumes:
      - pgdata:/var/lib/postgresql/data

This is wrong for a normal PostgreSQL image because three independent database processes do not automatically form a safe cluster.

Potential outcomes:

independent databases with divergent data
three nodes writing to separate local volumes
corrupted shared filesystem if using unsafe shared volume
client sees inconsistent behavior
backups are meaningless because there are multiple truths

Correct HA database requires a database-level clustering/replication protocol:

primary/replica replication
leader election
fencing
WAL/archive strategy
split-brain prevention
backup consistency
failover process

Swarm can schedule processes. It does not magically turn stateful software into a distributed database.

19. Placement Strategy for Stateful Services

Stateful service scheduling must be explicit.

19.1 Node Labels

docker node update --label-add storage=ssd worker-1
docker node update --label-add zone=az-a worker-1
docker node update --label-add orders.pgdata=true worker-1

Stack:

deploy:
  placement:
    constraints:
      - node.labels.orders.pgdata == true

19.2 Avoid Scheduling on Managers

deploy:
  placement:
    constraints:
      - node.role == worker

19.3 Resource Reservation

deploy:
  resources:
    reservations:
      cpus: "1.0"
      memory: 2G
    limits:
      cpus: "2.0"
      memory: 4G

For databases, memory limit needs careful tuning. If memory limit conflicts with DB buffer/cache expectations, performance and OOM behavior can become unstable.

20. Drain Behavior and Stateful Risk

docker node update --availability drain <node> tells Swarm to stop tasks on that node and reschedule them elsewhere.

For stateless service, this is normal.

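For a stateful service on a local volume, this can be dangerous: the replacement task may start on another node with a new, empty volume of the same name, or stay pending if its placement constraint matches no other active node.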

Stateful maintenance runbook must include:

Identify local-volume services on node.
Confirm backup freshness.
Stop or migrate service intentionally.
Avoid accidental reschedule to node without data.
Verify data after restart.
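The first two runbook items can be turned into a pre-drain gate. The sketch assumes you have already gathered, for each task on the node (from docker node ps plus your own inventory), whether it mounts a local volume and when its last verified backup finished; the TaskOnNode record and the backup-age threshold are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-drain gate for the runbook above.
final class DrainPreCheck {

    // One task on the node to drain; lastVerifiedBackup may be null if none is known.
    record TaskOnNode(String service, boolean usesLocalVolume, Instant lastVerifiedBackup) {}

    // Returns reasons not to drain yet; an empty list means no local-volume task blocks the drain.
    static List<String> blockers(List<TaskOnNode> tasks, Instant now, Duration maxBackupAge) {
        List<String> reasons = new ArrayList<>();
        for (TaskOnNode task : tasks) {
            if (!task.usesLocalVolume()) {
                continue; // stateless tasks can be rescheduled freely
            }
            // Local-volume tasks must be stopped or migrated on purpose, never drained blindly.
            reasons.add(task.service() + ": uses a local volume; stop or migrate it intentionally first");
            Instant backup = task.lastVerifiedBackup();
            if (backup == null || Duration.between(backup, now).compareTo(maxBackupAge) > 0) {
                reasons.add(task.service() + ": no verified backup newer than " + maxBackupAge);
            }
        }
        return reasons;
    }
}
```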

List the tasks running on a node:

docker node ps worker-1

Show where each task of a service is running or has run, including shut-down and failed tasks:

docker service ps orders_postgres --no-trunc

21. External Volume Drivers and Shared Storage

External volume drivers can make volumes available across nodes depending on backend.

But they introduce their own risks:

network storage latency
filesystem semantics mismatch
lock/split-brain behavior
backup consistency
driver availability
credential management
mount failure during task start
IO performance unpredictability

Decision question:

Does the storage backend provide the consistency, locking, durability, and performance semantics required by the application?

Do not assume every shared filesystem is safe for every database.

22. Pattern: Swarm Compute + External Managed State

For serious production systems, the clean pattern is often:

Benefits:

Swarm handles stateless compute.
Managed DB handles durability, replication, backup, failover.
Object storage handles file durability.
Queue service handles message durability.
Platform team avoids reinventing storage operations.

Trade-off:

external service dependency
network/security configuration
cloud/vendor cost
cross-environment parity concerns

23. Pattern: Local Single-Node Stateful Service With Strong Runbook

Acceptable for small systems if documented:

services:
  minio:
    image: minio/minio:RELEASE.2026-01-01T00-00-00Z
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER_FILE: /run/secrets/minio_root_user
      MINIO_ROOT_PASSWORD_FILE: /run/secrets/minio_root_password
    secrets:
      - minio_root_user
      - minio_root_password
    volumes:
      - minio_data:/data
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.minio.data == true

volumes:
  minio_data:
    driver: local

secrets:
  minio_root_user:
    external: true
  minio_root_password:
    external: true

Runbook must state:

node label owner
backup schedule
restore test cadence
acceptable downtime
how to replace node
how to migrate data
how to rotate secrets

24. Backup and Restore Mental Model

Backup is not the goal. Restore is the goal.

Questions:

What is backed up?
When is it backed up?
Where is it stored?
Is it encrypted?
Who can restore it?
How long does restore take?
How much data can be lost?
When was restore last tested?
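"How much data can be lost?" can be answered from backup history rather than from the schedule on paper. A minimal sketch, assuming the inputs are the points in time captured by successful (ideally restore-verified) backups: the worst-case loss is the largest gap between consecutive backups, including the gap from the newest backup to now. RecoveryPointCheck is a hypothetical name.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Measures achieved RPO from backup history instead of the planned schedule.
final class RecoveryPointCheck {

    // Largest gap between consecutive successful backups, including newest backup -> now.
    static Duration worstCaseLoss(List<Instant> successfulBackups, Instant now) {
        if (successfulBackups.isEmpty()) {
            throw new IllegalArgumentException("no successful backups: nothing can be restored");
        }
        List<Instant> sorted = successfulBackups.stream().sorted().toList();
        Duration worst = Duration.ZERO;
        for (int i = 1; i < sorted.size(); i++) {
            Duration gap = Duration.between(sorted.get(i - 1), sorted.get(i));
            if (gap.compareTo(worst) > 0) {
                worst = gap;
            }
        }
        Duration sinceNewest = Duration.between(sorted.get(sorted.size() - 1), now);
        return sinceNewest.compareTo(worst) > 0 ? sinceNewest : worst;
    }

    static boolean meetsRpo(List<Instant> successfulBackups, Instant now, Duration rpo) {
        return worstCaseLoss(successfulBackups, now).compareTo(rpo) <= 0;
    }
}
```

For backups at 00:00, 01:00, and 03:00 checked at 03:30, the worst case is the two-hour gap, so a one-hour RPO is not being met even if the schedule says hourly.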

25. PostgreSQL Backup Example

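For PostgreSQL in a container, do not copy the raw data directory while the database is running unless you use a database-safe method.

Logical backup (run it on the node where the orders_postgres task is running, because docker ps only sees local containers). Do not pass -t to docker exec here: a pseudo-TTY rewrites line endings and can corrupt the dump.

docker exec $(docker ps -q -f label=com.docker.swarm.service.name=orders_postgres) \
  pg_dump -U orders_app -d orders \
  > orders_$(date +%Y%m%d_%H%M%S).sql

Compressed:

docker exec $(docker ps -q -f label=com.docker.swarm.service.name=orders_postgres) \
  pg_dump -U orders_app -d orders | gzip \
  > orders_$(date +%Y%m%d_%H%M%S).sql.gz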

Restore drill:

gunzip -c orders_20260701_010000.sql.gz | \
  docker exec -i $(docker ps -q -f name=orders_postgres_restore) \
  psql -U orders_app -d orders

For production-grade databases, prefer a native backup strategy:

base backup
WAL archiving
point-in-time recovery
replica verification
backup catalog
retention policy

26. Volume Backup Example

For a generic volume backup (run it on the node that holds the volume, because a local volume exists only on that node):

docker run --rm \
  -v orders_pgdata:/source:ro \
  -v "$PWD/backups:/backup" \
  busybox \
  tar czf /backup/orders_pgdata_$(date +%Y%m%d_%H%M%S).tar.gz -C /source .

Restore:

docker run --rm \
  -v orders_pgdata:/target \
  -v "$PWD/backups:/backup" \
  busybox \
  sh -c 'cd /target && tar xzf /backup/orders_pgdata_20260701_010000.tar.gz'

Warning:

For databases, filesystem-level backup must respect database consistency rules. Use database-aware backup unless you have quiesced the service or the database supports snapshot-safe backup procedure.

27. Secrets and Backups

Backups can leak secrets indirectly.

Possible leakage:

database contains third-party credentials
config file stored in volume contains passwords
application log contains secret by mistake
backup archive contains mounted secret copied accidentally
support dump includes /run/secrets

Backup policy must include:

encryption at rest
access control
retention
deletion
audit logging
restore access governance
redaction rules for support bundles

28. Stack Example: API + Worker + Postgres With Secrets and Configs

version: "3.9"

services:
  api:
    image: registry.example.com/orders-api:2026.07.01
    networks:
      - app
      - public
    ports:
      - target: 8080
        published: 8080
        protocol: tcp
        mode: ingress
    secrets:
      - source: orders_db_password_2026_07_v1
        target: db_password
        uid: "10001"
        gid: "10001"
        mode: 0400
    configs:
      - source: orders_api_config_2026_07_v1
        target: /app/config/application.yml
        uid: "10001"
        gid: "10001"
        mode: 0444
    environment:
      DB_PASSWORD_FILE: /run/secrets/db_password
      SPRING_CONFIG_ADDITIONAL_LOCATION: file:/app/config/application.yml
    deploy:
      replicas: 3
      placement:
        constraints:
          - node.role == worker
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
        monitor: 30s
      rollback_config:
        parallelism: 1
        delay: 5s
        failure_action: pause

  worker:
    image: registry.example.com/orders-worker:2026.07.01
    networks:
      - app
    secrets:
      - source: orders_db_password_2026_07_v1
        target: db_password
    configs:
      - source: orders_worker_config_2026_07_v1
        target: /app/config/application.yml
    environment:
      DB_PASSWORD_FILE: /run/secrets/db_password
    deploy:
      replicas: 2
      placement:
        constraints:
          - node.role == worker

  postgres:
    image: postgres:16.4
    networks:
      - app
    environment:
      POSTGRES_DB: orders
      POSTGRES_USER: orders_app
      POSTGRES_PASSWORD_FILE: /run/secrets/postgres_password
    secrets:
      - source: orders_db_password_2026_07_v1
        target: postgres_password
        mode: 0400
    volumes:
      - orders_pgdata:/var/lib/postgresql/data
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.orders.postgres == true
      restart_policy:
        condition: on-failure

networks:
  public:
    driver: overlay
  app:
    driver: overlay
    internal: true

volumes:
  orders_pgdata:
    driver: local

secrets:
  orders_db_password_2026_07_v1:
    external: true

configs:
  orders_api_config_2026_07_v1:
    external: true
  orders_worker_config_2026_07_v1:
    external: true

Review points:

API and worker share DB secret explicitly.
Postgres is pinned to labeled node.
App config is separate from image.
Internal network isolates DB from public ingress.
API is scalable; Postgres is intentionally single replica.
Production risk remains: local volume is node-bound.

29. Application Pattern: Reading Secret Files

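The application should support _FILE-style environment variables: KEY_FILE holds the path of a file containing the value, and KEY is only a fallback.

A Java version (Files.readString and Path.of require Java 11+):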

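import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class RuntimeConfig {

    // Prefer KEY_FILE (path to a mounted secret file); fall back to the plain KEY variable.
    static String readConfigValue(String key) throws IOException {
        String fileKey = key + "_FILE";
        String path = System.getenv(fileKey);
        if (path != null && !path.isBlank()) {
            // trim() removes a trailing newline (and any other leading/trailing whitespace).
            return Files.readString(Path.of(path), StandardCharsets.UTF_8).trim();
        }

        String value = System.getenv(key);
        if (value != null) {
            return value;
        }

        throw new IllegalStateException("Missing required config: " + key + " or " + fileKey);
    }
}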

This makes the app work with:

DB_PASSWORD_FILE=/run/secrets/db_password

without forcing the secret into an environment variable value.

30. Governance: Ownership and Inventory

Every secret/config/volume should have metadata outside Docker.

Example inventory table:

Object	Owner	Environment	Services	Rotation	Backup	Last Review
orders_db_password_2026_07_v1	Payments Platform	prod	api, worker, postgres	90 days	n/a	2026-07-01
orders_api_config_2026_07_v1	Orders Team	prod	api	per release	Git source	2026-07-01
orders_pgdata	DBA/Platform	prod	postgres	n/a	hourly logical + daily full	2026-07-01

Labels can help but are not a full governance system.

deploy:
  labels:
    com.example.owner: "orders-platform"
    com.example.data-classification: "confidential"
    com.example.runbook: "https://internal/runbooks/orders"
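Labels record intent but do not detect drift. A small audit sketch that compares what Docker reports against the external inventory. It assumes you feed it object names from docker secret ls, docker config ls, and docker volume ls (volumes must be listed on every node, because local volumes are node-scoped), plus the set of names referenced by service specs. InventoryAudit and its Entry record are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical governance audit: Docker objects vs. the external inventory.
final class InventoryAudit {

    // One inventory row; owner may be null or blank when nobody has claimed the object.
    record Entry(String object, String owner) {}

    // dockerObjects: names from docker secret ls / config ls / volume ls.
    // referenced: names used by at least one service spec.
    static List<String> findings(Set<String> dockerObjects, Set<String> referenced, List<Entry> inventory) {
        Set<String> owned = new TreeSet<>();
        for (Entry entry : inventory) {
            if (entry.owner() != null && !entry.owner().isBlank()) {
                owned.add(entry.object());
            }
        }
        List<String> result = new ArrayList<>();
        for (String name : new TreeSet<>(dockerObjects)) { // sorted for stable output
            if (!owned.contains(name)) {
                result.add(name + ": no owner in inventory");
            }
            if (!referenced.contains(name)) {
                result.add(name + ": not referenced by any service");
            }
        }
        return result;
    }
}
```

An unreferenced object is not automatically garbage (it may be inside a rollback window), so treat findings as review items, not deletion orders.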

31. Failure Mode Catalog

31.1 Secret Missing

Symptom:

secret not found

Likely causes:

secret not created on target swarm
stack references wrong name
external secret omitted
deployed to wrong Docker context

Checks:

docker secret ls
docker stack config -c stack.yml
docker context show

31.2 Secret Permission Error

Symptom:

Permission denied: /run/secrets/db_password

Likely causes:

app runs as non-root UID
secret mode too restrictive
wrong uid/gid in stack file

Fix:

secrets:
  - source: orders_db_password
    target: db_password
    uid: "10001"
    gid: "10001"
    mode: 0400

31.3 Config Changed But App Not Updated

Likely causes:

config object immutable; service still mounts old config
app reads config once at startup
stack deploy did not replace task because reference unchanged

Fix:

create new config name
update service to new config
trigger rolling update

31.4 Stateful Task Rescheduled to Empty Volume

Symptom:

database starts but data missing
service appears healthy but records gone
volume exists but on wrong node

Checks:

docker service ps orders_postgres --no-trunc
docker node ps worker-1
docker volume ls
docker volume inspect orders_pgdata

Fix:

stop service
locate original data node
restore from backup or move data intentionally
apply node placement constraints

31.5 Backup Exists But Restore Fails

Likely causes:

backup was inconsistent
missing secrets/credentials
wrong database version
backup not encrypted/decrypted correctly
restore process never tested

Fix:

implement restore drill
pin backup tool version
store metadata with backup
verify checksum
document restore dependencies
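Checksum verification is the cheapest of these fixes to automate. This sketch streams a backup archive through SHA-256 and compares the result with the checksum recorded at backup time; where that recorded value is stored (a sidecar file, a backup catalog) is an assumption left to the runbook, and BackupChecksum is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Verifies a backup archive against the checksum recorded when it was created.
final class BackupChecksum {

    // Streams the file through SHA-256 so large archives are not loaded into memory.
    static String sha256Hex(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(Character.forDigit((b >> 4) & 0xF, 16)).append(Character.forDigit(b & 0xF, 16));
        }
        return hex.toString();
    }

    // Case-insensitive compare; trim() tolerates a trailing newline in the recorded value.
    static boolean matchesRecorded(Path archive, String recordedHex) throws IOException {
        return sha256Hex(archive).equalsIgnoreCase(recordedHex.trim());
    }
}
```

Run the check before every restore drill and on a schedule, so a corrupted archive is found before it is needed.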

32. Operational Runbook: Secret Rotation

Template:

# Runbook: Rotate <secret-name>

## Preconditions
- Current secret: <name-v1>
- New secret: <name-v2>
- Services affected: <list>
- Backend credential created: yes/no
- Rollback window: <duration>

## Steps
1. Create new backend credential.
2. Create new Docker secret.
3. Add new secret to service.
4. Update app config to read new target if required.
5. Observe service health.
6. Validate authentication metrics/logs.
7. Revoke old backend credential after rollback window.
8. Remove old secret from service.
9. Remove old Docker secret.

## Rollback
- Re-add old credential if still active.
- Roll back service spec.
- Restore previous config if needed.

## Evidence
- command output
- service ps output
- health metrics
- audit ticket

33. Operational Runbook: Stateful Node Maintenance

Template:

# Runbook: Maintain Node <node-name> With Stateful Services

## Pre-check
- List services on node.
- Identify stateful services.
- Confirm latest backup timestamp.
- Confirm restore test status.
- Confirm placement constraints.

## Safe Procedure
1. Announce maintenance window.
2. Stop or migrate stateful service intentionally.
3. Backup before maintenance.
4. Drain node only after stateful risk is handled.
5. Perform maintenance.
6. Reactivate node.
7. Start service on intended node.
8. Validate data and app behavior.

## Do Not
- Blindly drain a node with local-volume database.
- Assume named volume data follows task.
- Remove old backup before new restore test succeeds.

34. Security Review Checklist

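For each service:

No secret is baked into the image or committed to Git.
Sensitive values are mounted as secret files (read via _FILE variables), not passed as plain environment variables.
The service is granted only the secrets it actually uses.
Secret mounts set uid, gid, and a restrictive mode that match the application user.
Secret and config names are versioned, and every object has an owner in the inventory.
Each secret has a rotation path and a rollback window.
Internal dependencies such as the database sit on an internal overlay network, not behind published ports.
Backups that may contain secrets are encrypted and access-controlled.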

35. Stateful Design Checklist

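For each stateful service:

You can name where the data physically lives: which node, which volume driver, or which external service.
Placement constraints pin the service to the node that holds its data, or the storage backend makes the data available elsewhere.
replicas: 1 unless the software runs its own clustering/replication protocol.
Update and rollback use stop-first when only one instance may own the data.
Backup is database-aware, with a documented schedule, RPO, and RTO.
A restore has been tested recently, and the date is recorded.
A node maintenance runbook covers drain, backup, and data validation.
An owner is recorded for the service and its volume.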

36. Practice Lab

Lab 1 — Secret Lifecycle

Create a secret lab_db_password_v1.
Mount it into a service.
Read it from /run/secrets.
Rotate to lab_db_password_v2.
Remove v1 after service becomes healthy.

Expected learning:

secret immutability
service update behavior
secret permission
rollback window

Lab 2 — Config Rollout

Create nginx_conf_v1.
Deploy Nginx service with config.
Create nginx_conf_v2.
Update service.
Validate response changed.

Expected learning:

config versioning
task replacement
service convergence

Lab 3 — Local Volume Reschedule Trap

Deploy a single-replica service with named volume and node constraint.
Write data.
Remove constraint or drain node.
Observe behavior when task lands elsewhere.
Restore intended placement.

Expected learning:

node-local volume risk
service ps diagnosis
placement discipline

Lab 4 — Backup Restore Drill

Populate database.
Take logical backup.
Remove database volume in lab.
Restore into fresh volume.
Validate row count and application behavior.

Expected learning:

backup is not restore
backup metadata matters
database-aware procedure matters

37. Common Anti-Patterns

Anti-Pattern	Why It Fails	Better Approach
Secret in image	Anyone with image can extract it	Runtime secret file
Secret in env file committed to Git	Plaintext leak	External secret management
Unversioned config	Rollback ambiguity	Immutable versioned config names
`replicas: 3` for non-clustered DB	data divergence	DB-specific replication or single-primary
Local volume without placement	accidental empty data on reschedule	node label + backup or external storage
Blind node drain	stateful task moves unsafely	stateful maintenance runbook
Backup without restore test	false confidence	scheduled restore drill
Shared volume for unsafe multi-writer app	corruption/split-brain	app-level clustering protocol
All services mount all secrets	blast radius too large	least-privilege secret mapping

38. Mental Model Summary

Final invariant:

Stateless services can be rescheduled freely. Stateful services can only be rescheduled safely when storage, placement, and recovery semantics are explicitly designed.

39. What Good Looks Like

A production-ready Swarm stateful design has:

immutable image artifact
versioned config object
least-privilege secret access
secret rotation path
explicit placement constraints for local state
external managed state for critical data where possible
database-aware backup
tested restore
node maintenance runbook
incident path for secret leak
evidence trail for changes

This is the difference between “it runs” and “it survives operational reality”.

40. References

Docker Docs — Manage sensitive data with Docker secrets: https://docs.docker.com/engine/swarm/secrets/
Docker Docs — Store configuration data using Docker Configs: https://docs.docker.com/engine/swarm/configs/
Docker Docs — Deploy services to a swarm: https://docs.docker.com/engine/swarm/services/
Docker Docs — Deploy a stack to a swarm: https://docs.docker.com/engine/swarm/stack-deploy/
Docker Docs — Drain a node on the swarm: https://docs.docker.com/engine/swarm/swarm-tutorial/drain-node/
Docker Docs — Administer and maintain a swarm of Docker Engines: https://docs.docker.com/engine/swarm/admin_guide/
