Swarm Secrets, Configs, Volumes, and Stateful Service Design
Learn Docker, Containerization, Docker Compose, Docker Swarm - Part 029
Deep dive Docker Swarm secrets, configs, volumes, stateful services, data placement, backup, restore, and operational governance.
Goal of this part: be able to design Swarm services that correctly separate artifact, configuration, secret, and durable state. The aim is not just to know the secrets: or volumes: syntax, but to understand the failure modes: secret leaks, config drift, node-local volumes, misplaced tasks, failed backups, untested restores, and databases that appear to run but are not recoverable.
In Part 028 we covered the stack as a release unit. Now we dissect the most sensitive components of a stack: secrets, configs, and state.
Core mental model:
The image is the artifact. Config is behavior. A secret is a trust boundary. A volume is a durability boundary. A stateful service is the combination of placement, persistence, recovery, and ownership.
1. Kaufman Skill Deconstruction
To master this topic, break the skill into the following subskills:
| Subskill | What to Master | Evidence of Mastery |
|---|---|---|
| Secret model | Can explain the secret lifecycle, mount path, permissions, immutability, and rotation | Never puts a password or API key into an image, Git, or a carelessly used environment variable |
| Config model | Can separate non-sensitive config from the image and from secrets | Can change config without rebuilding the image |
| Volume semantics | Can distinguish the container writable layer, named volume, bind mount, and external volume | Loses no data when a task is rescheduled |
| Stateful scheduling | Can design placement constraints, node labels, drain behavior, and failure recovery | Can explain which node holds the data and how recovery happens |
| Backup/restore | Can write a runbook covering backup, restore, integrity verification, and RPO/RTO | Restore has actually been tested, not just backups taken |
| Governance | Can define naming, versioning, ownership, retention, and incident rules | No orphan secret/config/volume without an owner |
Kaufman-style focus:
- Deconstruct: separate image, config, secret, state, placement, and backup.
- Learn enough to self-correct: understand the inspect and service ps commands, mount paths, and task reschedule behavior.
- Remove friction: prepare a stack template, a secret rotation script, a backup script, and a failure lab.
- Practice deliberately: simulate secret rotation, config rollout, node drain, volume restore, and a failed database migration.
2. Four Runtime Data Categories
Containerized systems often become chaotic because all data is treated the same way. In a mature design, we distinguish four categories:
| Category | Example | Should Live In Image? | Should Live In Secret? | Should Live In Config? | Should Live In Volume? |
|---|---|---|---|---|---|
| Application artifact | JAR, binary, static asset | Yes | No | No | No |
| Non-sensitive config | log level, feature flag default, endpoint name | Sometimes default only | No | Yes | Rarely |
| Sensitive material | DB password, private key, token | No | Yes | No | No, unless encrypted store with strict policy |
| Durable business state | database data, uploaded files, queue data | No | No | No | Yes / external state service |
Rule:
If data must change between environments, do not embed it as an image constant unless it is a safe default. If data proves identity or grants access, treat it as a secret. If data must survive task replacement, treat it as state.
3. Swarm Object Model for Secrets, Configs, and Volumes
Swarm has a different object model for each kind of data.
Key distinction:
| Object | Stored In Swarm Control Plane? | Mounted Into Container? | Sensitive? | Mutable? | Data Follows Task? |
|---|---|---|---|---|---|
| Secret | Yes | Yes, as file | Yes | Effectively immutable | Yes, secret follows service task |
| Config | Yes | Yes, as file | No | Effectively immutable | Yes, config follows service task |
| Volume | No, not data itself | Yes | Depends | Mutable | Only if storage backend supports it |
Important:
Secret/config metadata follows the service. Volume data usually does not magically follow a task across nodes.
This is the most important stateful-service lesson in Swarm.
4. Swarm Secrets: Mental Model
Swarm secrets are designed for sensitive data needed by services at runtime.
Examples:
- database password
- TLS private key
- API token
- signing key
- encryption key material
- service credential
A secret should answer:
- Who owns it?
- Which services may read it?
- Where is it mounted?
- How is it rotated?
- How is access revoked?
- How do we know it is not leaked?
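Docker's model: the secret is stored encrypted in the swarm's Raft log on the manager nodes, delivered only to nodes that run tasks of a service granted that secret, and exposed to the container as a file on an in-memory filesystem under /run/secrets/.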
Operational invariant:
A service only receives the secrets explicitly granted to that service.
Do not mount all secrets into all services. Secret access should be minimal and auditable.
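One way to keep secret access auditable is to list what each service is actually granted. The sketch below assumes it runs against a manager node; list_service_secrets is a hypothetical helper name, and it relies only on docker service ls and docker service inspect with Go-template formatting.

```shell
# Sketch: print the secrets each service may read, one service per line.
# list_service_secrets is an illustrative name, not a Docker command.
list_service_secrets() {
  docker service ls --format '{{.Name}}' | while read -r svc; do
    # SecretName is the Swarm secret object referenced by the service spec.
    secrets=$(docker service inspect "$svc" \
      --format '{{range .Spec.TaskTemplate.ContainerSpec.Secrets}}{{.SecretName}} {{end}}')
    secrets=${secrets% }   # drop the trailing space left by the template
    printf '%s: %s\n' "$svc" "${secrets:-<none>}"
  done
}
```

Reviewing this output during a security review makes it easy to spot a service that mounts credentials it has no reason to hold.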
5. Creating and Inspecting Secrets
Create secret from file:
printf '%s' 'super-secret-password' > db_password.txt
docker secret create orders_db_password db_password.txt
rm db_password.txt
Create secret from stdin:
printf '%s' 'super-secret-password' | docker secret create orders_db_password -
List:
docker secret ls
Inspect metadata:
docker secret inspect orders_db_password
The secret value is not shown by inspect. This is expected.
6. Mounting Secrets Into Services
CLI example:
docker service create \
--name orders-api \
--secret orders_db_password \
registry.example.com/orders-api:2026.07.01
Inside the container:
cat /run/secrets/orders_db_password
Stack file example:
services:
  api:
    image: registry.example.com/orders-api:2026.07.01
    secrets:
      - orders_db_password
secrets:
  orders_db_password:
    external: true
Custom target:
services:
  api:
    image: registry.example.com/orders-api:2026.07.01
    secrets:
      - source: orders_db_password
        target: db_password
        uid: "10001"
        gid: "10001"
        mode: 0400
secrets:
  orders_db_password:
    external: true
The application should read:
/run/secrets/db_password
not:
DB_PASSWORD=plain-text-value
7. Why Environment Variables Are a Weak Secret Boundary
Environment variables are convenient but weak for secrets.
Potential leakage surfaces:
- process inspection inside container
- crash dumps
- logs that print full environment
- debug endpoints
- support bundles
- shell history
- CI logs
- Compose files committed to Git
- application framework diagnostics
A pragmatic rule:
Use environment variables for non-sensitive runtime switches. Use secret files for sensitive material.
Example safe-ish env:
environment:
  LOG_LEVEL: "INFO"
  FEATURE_X_ENABLED: "false"
Example unsafe env:
environment:
  DB_PASSWORD: "prod-password"
  JWT_PRIVATE_KEY: "-----BEGIN PRIVATE KEY-----..."
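A rough way to catch the unsafe pattern before it reaches Git is a text scan of the stack file. This is only a heuristic sketch: find_secret_like_env is a hypothetical helper, and the key patterns (PASSWORD, SECRET, TOKEN, PRIVATE_KEY) are assumptions to tune for your own codebase.

```shell
# Flag environment entries whose key looks secret-bearing but is not a *_FILE pointer.
# Prints "line-number:line" for each hit; exits non-zero when nothing is found.
find_secret_like_env() {
  grep -nE '^[[:space:]]*-?[[:space:]]*[A-Za-z0-9_]*(PASSWORD|SECRET|TOKEN|PRIVATE_KEY)[A-Za-z0-9_]*[:=]' "$1" |
    grep -vE '_FILE[:=]'
}
```

Keys such as DB_PASSWORD_FILE pass the check because they carry a path, not the secret itself. A text scan cannot see values injected from CI variables, so treat it as a first filter rather than proof.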
8. Secret Immutability and Versioning
Secrets are effectively immutable. Treat them like versioned objects.
Bad naming:
db_password
Better naming:
orders_db_password_2026_07_v1
orders_db_password_2026_07_v2
orders_api_jwt_private_key_2026_q3
Why version names matter:
- clear rotation history
- rollback clarity
- audit evidence
- safe staged rollout
- no ambiguity over which service uses which credential
Naming convention:
<system>_<purpose>_<scope>_<yyyy_mm>_v<n>
Examples:
orders_postgres_password_primary_2026_07_v1
orders_api_stripe_token_2026_07_v3
orders_tls_private_key_public_2026_q3_v1
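The convention is easier to keep when names are generated and checked rather than typed by hand. A minimal sketch, assuming the pattern shown above with a required version suffix; versioned_name and is_versioned_name are hypothetical helper names.

```shell
# Build a name following <system>_<purpose>_<scope>_<yyyy_mm>_v<n>.
# $1=system $2=purpose $3=scope $4=period (yyyy_mm or yyyy_qN) $5=version number
versioned_name() {
  printf '%s_%s_%s_%s_v%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Return 0 only if the name ends with a period and a version suffix.
is_versioned_name() {
  printf '%s\n' "$1" |
    grep -Eq '^[a-z0-9]+(_[a-z0-9]+)*_[0-9]{4}_([0-9]{2}|q[1-4])_v[0-9]+$'
}
```

For example, versioned_name orders postgres_password primary 2026_07 1 prints orders_postgres_password_primary_2026_07_v1, while is_versioned_name db_password fails. A check like this could sit in CI in front of any stack deploy.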
9. Secret Rotation Pattern
Rotation is not “edit secret”. Rotation is controlled replacement.
9.1 Simple Password Rotation
Create new secret:
printf '%s' 'new-password' | docker secret create orders_db_password_2026_07_v2 -
Update service:
docker service update \
--secret-add source=orders_db_password_2026_07_v2,target=db_password_new \
orders_api
After database credential and application config are updated, remove old secret:
docker service update \
--secret-rm orders_db_password_2026_07_v1 \
orders_api
Remove unused secret:
docker secret rm orders_db_password_2026_07_v1
9.2 Rotation With Zero-Downtime Requirement
For zero-downtime rotation, the dependent system must support overlap:
- DB accepts old and new credential temporarily, or
- application supports dual credential lookup, or
- token issuer supports multiple active keys, or
- load balancer drains old tasks before revocation.
Without overlap, rotation becomes a downtime event.
9.3 Rotation Failure Mode
Common broken sequence:
- Update app to use new secret.
- Remove old secret.
- Forget to update database password.
- All new tasks fail healthcheck.
- Rollback cannot restore because old secret was removed.
Safer sequence:
- Create new credential in backend.
- Create new Swarm secret.
- Update service to use new secret.
- Observe health and auth success.
- Revoke old credential.
- Remove old secret only after rollback window closes.
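The Swarm-side steps of the safer sequence (create the new secret, then switch the service) can be sketched as one function. It assumes the backend credential already exists, that the app reads a stable target path, and that rotate_secret is a hypothetical helper name. It deliberately does not delete the old secret object, so a service rollback still has something to mount.

```shell
# Usage: printf '%s' "$NEW_VALUE" | rotate_secret <service> <old_secret> <new_secret> <target>
rotate_secret() {
  svc=$1; old=$2; new=$3; target=$4
  # The value arrives on stdin, so it never appears in argv or shell history.
  docker secret create "$new" - || return 1
  # One spec change: drop the old reference and mount the new secret at the same path.
  docker service update \
    --secret-rm "$old" \
    --secret-add "source=$new,target=$target" \
    "$svc"
}
```

Revoking the old backend credential and running docker secret rm on the old object stay manual steps, to be done after the rollback window closes.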
10. Swarm Configs: Mental Model
Configs are for non-sensitive configuration data mounted into services.
Examples:
- Nginx config
- application YAML config without secret values
- feature flag bootstrap file
- logback/log4j config
- Prometheus scrape config
- static routing table
- trusted public certificate bundle
Configs solve this problem:
We want generic immutable images, but runtime behavior differs by environment.
Docker config object keeps image generic without bind-mounting host files.
11. Creating and Mounting Configs
Create config:
docker config create orders_api_config_2026_07_v1 application-prod.yml
Inspect:
docker config inspect orders_api_config_2026_07_v1
Service usage:
services:
  api:
    image: registry.example.com/orders-api:2026.07.01
    configs:
      - source: orders_api_config_2026_07_v1
        target: /app/config/application.yml
        uid: "10001"
        gid: "10001"
        mode: 0444
configs:
  orders_api_config_2026_07_v1:
    external: true
Application launch:
services:
  api:
    command:
      - "java"
      - "-jar"
      - "/app/orders-api.jar"
      - "--spring.config.location=file:/app/config/application.yml"
12. Config Immutability and Rollout
Config should be versioned like code.
Bad:
application-prod.yml
Better:
orders_api_application_prod_2026_07_01_sha7f3a9c
When config changes:
- Create new config object.
- Update service to use new config.
- Allow rolling update.
- Observe health.
- Remove old config after rollback window.
Example:
docker config create orders_api_config_2026_07_v2 application-prod.yml
docker service update \
--config-rm orders_api_config_2026_07_v1 \
--config-add source=orders_api_config_2026_07_v2,target=/app/config/application.yml \
orders_api
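Changing the config reference changes the service spec, so Swarm replaces the tasks according to the service's update_config (parallelism, delay, failure action); if none is set, the defaults apply.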
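Because config objects are immutable, one option is to derive the object name from the file content, so a changed file always yields a new name. The sketch below assumes GNU sha256sum is available; config_name_for and ensure_config are hypothetical helper names, and the seven-character hash length is an arbitrary choice.

```shell
# Derive a config name from a prefix and the first 7 hex digits of the file's SHA-256.
config_name_for() {
  [ -r "$2" ] || return 1
  digest=$(sha256sum "$2" | cut -c1-7)
  printf '%s_sha%s\n' "$1" "$digest"
}

# Create the config object only if that name does not exist yet, then print the name.
ensure_config() {
  name=$(config_name_for "$1" "$2") || return 1
  docker config inspect "$name" >/dev/null 2>&1 ||
    docker config create "$name" "$2" >/dev/null
  printf '%s\n' "$name"
}
```

The printed name can then be passed to docker service update --config-add, which keeps the rollout step identical whether or not the file changed.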
13. Config vs Secret Decision Matrix
| Data | Config | Secret | Reason |
|---|---|---|---|
| log level | Yes | No | Not sensitive |
| DB host | Yes | Usually no | Infrastructure metadata |
| DB password | No | Yes | Credential |
| public certificate | Yes | No | Public trust data |
| private key | No | Yes | Identity material |
| OAuth client id | Usually config | Depends | Public identifier in many systems |
| OAuth client secret | No | Yes | Credential |
| feature flag default | Yes | No | Runtime behavior |
| license key | No | Yes | Often confidential/commercial secret |
Rule:
If disclosure grants access, impersonation, privilege, or commercial loss, treat it as secret.
14. Volumes in Swarm: The Dangerous Mental Model
The dangerous assumption:
“A named volume in Swarm is cluster-wide.”
Usually false.
A named volume created with the default local driver is local to a node.
If a database task originally runs on Node A and writes to orders_pgdata, then later reschedules to Node B with the same volume name, it may get a different empty local volume.
This is the classic Swarm stateful-service trap.
15. Volume Types and Swarm Implications
| Storage Type | Scope | Good For | Risk |
|---|---|---|---|
| Container writable layer | task/container-local | ephemeral files | lost on task replacement |
| Local named volume | node-local | single-node durable state | task reschedule may lose access |
| Bind mount | node-local host path | explicit host integration | brittle path/permission/security |
| tmpfs | memory/ephemeral | cache, sockets, temp secret-like files | data lost on restart |
| External volume driver | backend-dependent | durable/movable state | driver complexity and split-brain risk |
| Managed external DB/storage | service-level | production durable state | operational dependency outside Swarm |
Top 1% engineer behavior:
Never say “we have a volume” as proof of durability. Ask: where is the data physically, how is it replicated, how is it backed up, and what happens during reschedule?
16. Stateful Service Categories
Not every stateful service has the same risk.
| Category | Example | Swarm Fit | Notes |
|---|---|---|---|
| Stateless | API, worker, web | Excellent | scale horizontally |
| Soft state | cache, local temp index | Good | rebuildable; no backup needed |
| Durable single-primary | PostgreSQL single node | Risky but possible | requires pinning, backup, restore runbook |
| Durable clustered | Kafka, Elasticsearch, database cluster | Advanced / risky | requires protocol-level clustering and storage design |
| External managed state | RDS, Cloud SQL, S3, managed queue | Often best | Swarm runs compute, external service owns durability |
Practical principle:
Use Swarm for compute orchestration. Use dedicated state systems for critical durable business data unless your team owns storage operations deeply.
17. Single-Primary Database on Swarm
Sometimes acceptable:
- internal tool
- low-to-medium criticality system
- small deployment
- strong backup discipline
- one-node or pinned-node design
- clear recovery expectation
Usually risky for:
- high-volume transactional systems
- strict RPO/RTO
- multi-node HA requirement
- regulated evidence systems without tested restore
- workloads requiring automatic failover
A minimal pinned PostgreSQL example:
version: "3.9"
services:
  postgres:
    image: postgres:16.4
    environment:
      POSTGRES_DB: orders
      POSTGRES_USER: orders_app
      POSTGRES_PASSWORD_FILE: /run/secrets/orders_postgres_password_2026_07_v1
    secrets:
      - orders_postgres_password_2026_07_v1
    volumes:
      - orders_pgdata:/var/lib/postgresql/data
    networks:
      - app
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.orders.db == true
      restart_policy:
        condition: on-failure
      update_config:
        order: stop-first
        failure_action: rollback
      rollback_config:
        order: stop-first
secrets:
  orders_postgres_password_2026_07_v1:
    external: true
volumes:
  orders_pgdata:
    driver: local
networks:
  app:
    driver: overlay
Node label:
docker node update --label-add orders.db=true worker-1
Critical caveat:
This pins scheduling; it does not provide high availability. If worker-1 dies, the data is still on worker-1 unless the storage/backend strategy says otherwise.
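Pinning only helps if exactly one node carries the label: with two labeled nodes the task could land on either, each with its own local volume. A small pre-deploy check, assuming it runs on a manager; labeled_node_count and assert_single_pinned_node are hypothetical helper names.

```shell
# Count nodes that carry a given node label, e.g. orders.db=true.
labeled_node_count() {
  docker node ls --filter "node.label=$1" --quiet | wc -l | tr -d ' '
}

# Fail unless exactly one node matches the label.
assert_single_pinned_node() {
  count=$(labeled_node_count "$1")
  if [ "$count" -ne 1 ]; then
    echo "expected exactly 1 node with label $1, found $count" >&2
    return 1
  fi
}
```

A call such as assert_single_pinned_node orders.db=true before docker stack deploy catches both the missing-label case and the accidental second label.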
18. Why replicas: 3 Does Not Make a Database HA
Bad example:
services:
  postgres:
    image: postgres:16.4
    deploy:
      replicas: 3
    volumes:
      - pgdata:/var/lib/postgresql/data
This is wrong for a normal PostgreSQL image because three independent database processes do not automatically form a safe cluster.
Potential outcomes:
- independent databases with divergent data
- three nodes writing to separate local volumes
- corrupted shared filesystem if using unsafe shared volume
- client sees inconsistent behavior
- backups are meaningless because there are multiple truths
Correct HA database requires a database-level clustering/replication protocol:
- primary/replica replication
- leader election
- fencing
- WAL/archive strategy
- split-brain prevention
- backup consistency
- failover process
Swarm can schedule processes. It does not magically turn stateful software into a distributed database.
19. Placement Strategy for Stateful Services
Stateful service scheduling must be explicit.
19.1 Node Labels
docker node update --label-add storage=ssd worker-1
docker node update --label-add zone=az-a worker-1
docker node update --label-add orders.pgdata=true worker-1
Stack:
deploy:
  placement:
    constraints:
      - node.labels.orders.pgdata == true
19.2 Avoid Scheduling on Managers
deploy:
  placement:
    constraints:
      - node.role == worker
19.3 Resource Reservation
deploy:
  resources:
    reservations:
      cpus: "1.0"
      memory: 2G
    limits:
      cpus: "2.0"
      memory: 4G
For databases, memory limit needs careful tuning. If memory limit conflicts with DB buffer/cache expectations, performance and OOM behavior can become unstable.
20. Drain Behavior and Stateful Risk
docker node update --availability drain <node> tells Swarm to stop tasks on that node and reschedule them elsewhere.
For stateless service, this is normal.
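For a stateful service on a local volume, this can be dangerous: the replacement task may start on a node that has no copy of the data, or stay pending if a placement constraint leaves no eligible node.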
Stateful maintenance runbook must include:
- Identify local-volume services on node.
- Confirm backup freshness.
- Stop or migrate service intentionally.
- Avoid accidental reschedule to node without data.
- Verify data after restart.
Command to inspect services on node:
docker node ps worker-1
Command to inspect service tasks:
docker service ps orders_postgres --no-trunc
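The first runbook item, identifying services with mounts on the node, can be approximated from the manager. This is a sketch: stateful_services_on_node is a hypothetical helper name, it derives the service name by stripping the slot suffix from the task name, and it reports every mount type in the service spec (including tmpfs), so the output still needs a human read.

```shell
# Print "<service> <type>:<source> ..." for each service that has a running
# task on the given node and declares at least one mount.
stateful_services_on_node() {
  docker node ps "$1" --filter desired-state=running --format '{{.Name}}' |
    sed 's/\.[^.]*$//' | sort -u |
    while read -r svc; do
      mounts=$(docker service inspect "$svc" \
        --format '{{range .Spec.TaskTemplate.ContainerSpec.Mounts}}{{.Type}}:{{.Source}} {{end}}')
      if [ -n "$mounts" ]; then
        printf '%s %s\n' "$svc" "${mounts% }"
      fi
    done
}
```

An empty result suggests the node is safe to drain blindly; any line in the output is a reason to follow the stateful maintenance steps first.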
21. External Volume Drivers and Shared Storage
External volume drivers can make volumes available across nodes depending on backend.
But they introduce their own risks:
- network storage latency
- filesystem semantics mismatch
- lock/split-brain behavior
- backup consistency
- driver availability
- credential management
- mount failure during task start
- IO performance unpredictability
Decision question:
Does the storage backend provide the consistency, locking, durability, and performance semantics required by the application?
Do not assume every shared filesystem is safe for every database.
22. Pattern: Swarm Compute + External Managed State
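For serious production systems, the clean pattern is often to let Swarm run only the stateless services (API, workers) while a managed database, object storage, and a queue service hold the durable state.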
Benefits:
- Swarm handles stateless compute.
- Managed DB handles durability, replication, backup, failover.
- Object storage handles file durability.
- Queue service handles message durability.
- Platform team avoids reinventing storage operations.
Trade-off:
- external service dependency
- network/security configuration
- cloud/vendor cost
- cross-environment parity concerns
23. Pattern: Local Single-Node Stateful Service With Strong Runbook
Acceptable for small systems if documented:
services:
  minio:
    image: minio/minio:RELEASE.2026-01-01T00-00-00Z
    command: server /data --console-address ":9001"
    environment:
      MINIO_ROOT_USER_FILE: /run/secrets/minio_root_user
      MINIO_ROOT_PASSWORD_FILE: /run/secrets/minio_root_password
    secrets:
      - minio_root_user
      - minio_root_password
    volumes:
      - minio_data:/data
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.minio.data == true
volumes:
  minio_data:
    driver: local
secrets:
  minio_root_user:
    external: true
  minio_root_password:
    external: true
Runbook must state:
- node label owner
- backup schedule
- restore test cadence
- acceptable downtime
- how to replace node
- how to migrate data
- how to rotate secrets
24. Backup and Restore Mental Model
Backup is not the goal. Restore is the goal.
Questions:
- What is backed up?
- When is it backed up?
- Where is it stored?
- Is it encrypted?
- Who can restore it?
- How long does restore take?
- How much data can be lost?
- When was restore last tested?
25. PostgreSQL Backup Example
For Postgres in a container, do not copy raw data directory while database is running unless using a database-safe method.
Logical backup:
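# Run on the node that hosts the task. No -t: a TTY would add carriage returns to the dump.
# The name filter is a substring match, so make sure it selects exactly one container.
docker exec "$(docker ps -q -f name=orders_postgres)" \
  pg_dump -U orders_app -d orders \
  > "orders_$(date +%Y%m%d_%H%M%S).sql"
Compressed:
docker exec "$(docker ps -q -f name=orders_postgres)" \
  pg_dump -U orders_app -d orders | gzip \
  > "orders_$(date +%Y%m%d_%H%M%S).sql.gz"
Restore drill:
# -i keeps stdin open so psql can read the dump from the pipe.
gunzip -c orders_20260701_010000.sql.gz | \
  docker exec -i "$(docker ps -q -f name=orders_postgres_restore)" \
  psql -U orders_app -d orders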
For production-grade databases, prefer native backup strategy:
- base backup
- WAL archiving
- point-in-time recovery
- replica verification
- backup catalog
- retention policy
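Whatever strategy produces the dump, a checksum stored next to it gives a cheap integrity check before any restore attempt. A minimal sketch, assuming GNU sha256sum and gzip-compressed dumps; seal_backup and verify_backup are hypothetical helper names.

```shell
# Record a SHA-256 checksum file next to the archive (same directory, .sha256 suffix).
seal_backup() {
  ( cd "$(dirname "$1")" &&
    sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

# Verify before trusting: checksum must match and the gzip stream must be intact.
verify_backup() {
  ( cd "$(dirname "$1")" &&
    sha256sum -c "$(basename "$1").sha256" >/dev/null 2>&1 ) || return 1
  gzip -t "$1" 2>/dev/null
}
```

This only shows the file is the one that was written. It says nothing about whether the dump restores cleanly, which still requires a restore drill.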
26. Volume Backup Example
For generic volume backup:
docker run --rm \
-v orders_pgdata:/source:ro \
-v "$PWD/backups:/backup" \
busybox \
tar czf /backup/orders_pgdata_$(date +%Y%m%d_%H%M%S).tar.gz -C /source .
Restore:
docker run --rm \
-v orders_pgdata:/target \
-v "$PWD/backups:/backup" \
busybox \
sh -c 'cd /target && tar xzf /backup/orders_pgdata_20260701_010000.tar.gz'
Warning:
For databases, filesystem-level backup must respect database consistency rules. Use database-aware backup unless you have quiesced the service or the database supports snapshot-safe backup procedure.
27. Secrets and Backups
Backups can leak secrets indirectly.
Possible leakage:
- database contains third-party credentials
- config file stored in volume contains passwords
- application log contains secret by mistake
- backup archive contains mounted secret copied accidentally
- support dump includes /run/secrets
Backup policy must include:
- encryption at rest
- access control
- retention
- deletion
- audit logging
- restore access governance
- redaction rules for support bundles
28. Stack Example: API + Worker + Postgres With Secrets and Configs
version: "3.9"
services:
  api:
    image: registry.example.com/orders-api:2026.07.01
    networks:
      - app
      - public
    ports:
      - target: 8080
        published: 8080
        protocol: tcp
        mode: ingress
    secrets:
      - source: orders_db_password_2026_07_v1
        target: db_password
        uid: "10001"
        gid: "10001"
        mode: 0400
    configs:
      - source: orders_api_config_2026_07_v1
        target: /app/config/application.yml
        uid: "10001"
        gid: "10001"
        mode: 0444
    environment:
      DB_PASSWORD_FILE: /run/secrets/db_password
      SPRING_CONFIG_ADDITIONAL_LOCATION: file:/app/config/application.yml
    deploy:
      replicas: 3
      placement:
        constraints:
          - node.role == worker
      update_config:
        parallelism: 1
        delay: 10s
        failure_action: rollback
        monitor: 30s
      rollback_config:
        parallelism: 1
        delay: 5s
        failure_action: pause
  worker:
    image: registry.example.com/orders-worker:2026.07.01
    networks:
      - app
    secrets:
      - source: orders_db_password_2026_07_v1
        target: db_password
    configs:
      - source: orders_worker_config_2026_07_v1
        target: /app/config/application.yml
    environment:
      DB_PASSWORD_FILE: /run/secrets/db_password
    deploy:
      replicas: 2
      placement:
        constraints:
          - node.role == worker
  postgres:
    image: postgres:16.4
    networks:
      - app
    environment:
      POSTGRES_DB: orders
      POSTGRES_USER: orders_app
      POSTGRES_PASSWORD_FILE: /run/secrets/postgres_password
    secrets:
      - source: orders_db_password_2026_07_v1
        target: postgres_password
        mode: 0400
    volumes:
      - orders_pgdata:/var/lib/postgresql/data
    deploy:
      replicas: 1
      placement:
        constraints:
          - node.labels.orders.postgres == true
      restart_policy:
        condition: on-failure
networks:
  public:
    driver: overlay
  app:
    driver: overlay
    internal: true
volumes:
  orders_pgdata:
    driver: local
secrets:
  orders_db_password_2026_07_v1:
    external: true
configs:
  orders_api_config_2026_07_v1:
    external: true
  orders_worker_config_2026_07_v1:
    external: true
Review points:
- API and worker share DB secret explicitly.
- Postgres is pinned to labeled node.
- App config is separate from image.
- Internal network isolates DB from public ingress.
- API is scalable; Postgres is intentionally single replica.
- Production risk remains: local volume is node-bound.
29. Application Pattern: Reading Secret Files
Application should support _FILE style environment variables.
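Java sketch (Java 11 or later):
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// RuntimeConfig is an illustrative class name.
final class RuntimeConfig {
    // Resolve KEY from the file named by KEY_FILE first, then from KEY itself.
    static String readConfigValue(String key) throws IOException {
        String fileKey = key + "_FILE";
        String path = System.getenv(fileKey);
        if (path != null && !path.isBlank()) {
            // strip() drops the trailing newline that secret files often carry.
            return Files.readString(Path.of(path), StandardCharsets.UTF_8).strip();
        }
        String value = System.getenv(key);
        if (value != null) {
            return value;
        }
        throw new IllegalStateException("Missing required config: " + key + " or " + fileKey);
    }
}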
This makes the app work with:
DB_PASSWORD_FILE=/run/secrets/db_password
without forcing secret into environment value.
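When the application itself cannot be changed, the same _FILE convention can live in a container entrypoint script. A POSIX shell sketch under that assumption; read_config_value is a hypothetical helper name, and the key is validated because it is passed to eval.

```shell
# Print the value for KEY, preferring the file named by KEY_FILE over KEY itself.
# Returns 1 if neither is set, 2 if the key is not a plain identifier.
read_config_value() {
  key=$1
  case $key in
    ''|*[!A-Za-z0-9_]*) echo "invalid key: $key" >&2; return 2 ;;
  esac
  eval "path=\${${key}_FILE:-}"
  if [ -n "$path" ]; then
    cat "$path"
    return
  fi
  eval "value=\${${key}:-}"
  if [ -n "$value" ]; then
    printf '%s\n' "$value"
    return
  fi
  echo "Missing required config: $key or ${key}_FILE" >&2
  return 1
}
```

Capturing the result with DB_PASSWORD=$(read_config_value DB_PASSWORD) also drops the trailing newline of the secret file. Note that exporting the captured value would put the secret back into the process environment, so pass it to the program another way where possible.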
30. Governance: Ownership and Inventory
Every secret/config/volume should have metadata outside Docker.
Example inventory table:
| Object | Owner | Environment | Services | Rotation | Backup | Last Review |
|---|---|---|---|---|---|---|
| orders_db_password_2026_07_v1 | Payments Platform | prod | api, worker, postgres | 90 days | n/a | 2026-07-01 |
| orders_api_config_2026_07_v1 | Orders Team | prod | api | per release | Git source | 2026-07-01 |
| orders_pgdata | DBA/Platform | prod | postgres | n/a | hourly logical + daily full | 2026-07-01 |
Labels can help but are not a full governance system.
deploy:
  labels:
    com.example.owner: "orders-platform"
    com.example.data-classification: "confidential"
    com.example.runbook: "https://internal/runbooks/orders"
31. Failure Mode Catalog
31.1 Secret Missing
Symptom:
secret not found
Likely causes:
- secret not created on target swarm
- stack references wrong name
- external secret omitted
- deployed to wrong Docker context
Checks:
docker secret ls
docker stack config -c stack.yml
docker context show
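These checks can be folded into a pre-deploy guard that compares the secret names a stack expects with what the current swarm actually holds. A sketch, assuming the expected names are passed as arguments; missing_secrets is a hypothetical helper name.

```shell
# Print every expected secret that does not exist in the current swarm.
# Returns 0 if all exist, 1 if any is missing, 2 if the listing itself failed.
missing_secrets() {
  existing=$(docker secret ls --format '{{.Name}}') || return 2
  rc=0
  for name in "$@"; do
    # -F fixed string, -x whole-line match: no partial-name false positives.
    if ! printf '%s\n' "$existing" | grep -Fxq -- "$name"; then
      printf '%s\n' "$name"
      rc=1
    fi
  done
  return "$rc"
}
```

Because the listing comes from whatever context the CLI points at, a non-empty result on a swarm that should have the secrets is also a hint to re-check docker context show.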
31.2 Secret Permission Error
Symptom:
Permission denied: /run/secrets/db_password
Likely causes:
- app runs as non-root UID
- secret mode too restrictive
- wrong uid/gid in stack file
Fix:
secrets:
  - source: orders_db_password
    target: db_password
    uid: "10001"
    gid: "10001"
    mode: 0400
31.3 Config Changed But App Not Updated
Likely causes:
- config object immutable; service still mounts old config
- app reads config once at startup
- stack deploy did not replace task because reference unchanged
Fix:
- create new config name
- update service to new config
- trigger rolling update
31.4 Stateful Task Rescheduled to Empty Volume
Symptom:
- database starts but data missing
- service appears healthy but records gone
- volume exists but on wrong node
Checks:
docker service ps orders_postgres --no-trunc
docker node ps worker-1
docker volume ls
docker volume inspect orders_pgdata
Fix:
- stop service
- locate original data node
- restore from backup or move data intentionally
- apply node placement constraints
31.5 Backup Exists But Restore Fails
Likely causes:
- backup was inconsistent
- missing secrets/credentials
- wrong database version
- backup not encrypted/decrypted correctly
- restore process never tested
Fix:
- implement restore drill
- pin backup tool version
- store metadata with backup
- verify checksum
- document restore dependencies
32. Operational Runbook: Secret Rotation
Template:
# Runbook: Rotate <secret-name>
## Preconditions
- Current secret: <name-v1>
- New secret: <name-v2>
- Services affected: <list>
- Backend credential created: yes/no
- Rollback window: <duration>
## Steps
1. Create new backend credential.
2. Create new Docker secret.
3. Add new secret to service.
4. Update app config to read new target if required.
5. Observe service health.
6. Validate authentication metrics/logs.
7. Revoke old backend credential after rollback window.
8. Remove old secret from service.
9. Remove old Docker secret.
## Rollback
- Re-add old credential if still active.
- Roll back service spec.
- Restore previous config if needed.
## Evidence
- command output
- service ps output
- health metrics
- audit ticket
33. Operational Runbook: Stateful Node Maintenance
Template:
# Runbook: Maintain Node <node-name> With Stateful Services
## Pre-check
- List services on node.
- Identify stateful services.
- Confirm latest backup timestamp.
- Confirm restore test status.
- Confirm placement constraints.
## Safe Procedure
1. Announce maintenance window.
2. Stop or migrate stateful service intentionally.
3. Backup before maintenance.
4. Drain node only after stateful risk is handled.
5. Perform maintenance.
6. Reactivate node.
7. Start service on intended node.
8. Validate data and app behavior.
## Do Not
- Blindly drain a node with local-volume database.
- Assume named volume data follows task.
- Remove old backup before new restore test succeeds.
34. Security Review Checklist
For each service:
- Does it mount only secrets it needs?
- Are secrets mounted as files, not environment values?
- Are secret target names stable and app-friendly?
- Are secret file permissions compatible with non-root UID?
- Are configs free of sensitive values?
- Are config names versioned?
- Are old secrets/configs removed after rollback window?
- Are logs checked for accidental secret output?
- Are support bundles redacted?
- Is Docker socket not mounted into app containers?
35. Stateful Design Checklist
For each stateful service:
- What is the state?
- Where is it physically stored?
- Is it node-local or cluster-backed?
- What happens when task restarts on same node?
- What happens when task reschedules to another node?
- What happens during node drain?
- What happens during node loss?
- Is backup database-aware?
- Is restore tested?
- What are RPO and RTO?
- Who owns recovery?
- Is there a migration path to external managed state?
36. Practice Lab
Lab 1 — Secret Lifecycle
- Create a secret lab_db_password_v1.
- Mount it into a service.
- Read it from /run/secrets.
- Rotate to lab_db_password_v2.
- Remove v1 after service becomes healthy.
Expected learning:
- secret immutability
- service update behavior
- secret permission
- rollback window
Lab 2 — Config Rollout
- Create nginx_conf_v1.
- Deploy Nginx service with config.
- Create nginx_conf_v2.
- Update service.
- Validate response changed.
Expected learning:
- config versioning
- task replacement
- service convergence
Lab 3 — Local Volume Reschedule Trap
- Deploy a single-replica service with named volume and node constraint.
- Write data.
- Remove constraint or drain node.
- Observe behavior when task lands elsewhere.
- Restore intended placement.
Expected learning:
- node-local volume risk
- service ps diagnosis
- placement discipline
Lab 4 — Backup Restore Drill
- Populate database.
- Take logical backup.
- Remove database volume in lab.
- Restore into fresh volume.
- Validate row count and application behavior.
Expected learning:
- backup is not restore
- backup metadata matters
- database-aware procedure matters
37. Common Anti-Patterns
| Anti-Pattern | Why It Fails | Better Approach |
|---|---|---|
| Secret in image | Anyone with image can extract it | Runtime secret file |
| Secret in env file committed to Git | Plaintext leak | External secret management |
| Unversioned config | Rollback ambiguity | Immutable versioned config names |
| replicas: 3 for non-clustered DB | data divergence | DB-specific replication or single-primary |
| Local volume without placement | accidental empty data on reschedule | node label + backup or external storage |
| Blind node drain | stateful task moves unsafely | stateful maintenance runbook |
| Backup without restore test | false confidence | scheduled restore drill |
| Shared volume for unsafe multi-writer app | corruption/split-brain | app-level clustering protocol |
| All services mount all secrets | blast radius too large | least-privilege secret mapping |
38. Mental Model Summary
Final invariant:
Stateless services can be rescheduled freely. Stateful services can only be rescheduled safely when storage, placement, and recovery semantics are explicitly designed.
39. What Good Looks Like
A production-ready Swarm stateful design has:
- immutable image artifact
- versioned config object
- least-privilege secret access
- secret rotation path
- explicit placement constraints for local state
- external managed state for critical data where possible
- database-aware backup
- tested restore
- node maintenance runbook
- incident path for secret leak
- evidence trail for changes
This is the difference between “it runs” and “it survives operational reality”.
40. References
- Docker Docs, Manage sensitive data with Docker secrets: https://docs.docker.com/engine/swarm/secrets/
- Docker Docs, Store configuration data using Docker Configs: https://docs.docker.com/engine/swarm/configs/
- Docker Docs, Deploy services to a swarm: https://docs.docker.com/engine/swarm/services/
- Docker Docs, Deploy a stack to a swarm: https://docs.docker.com/engine/swarm/stack-deploy/
- Docker Docs, Drain a node on the swarm: https://docs.docker.com/engine/swarm/swarm-tutorial/drain-node/
- Docker Docs, Administer and maintain a swarm of Docker Engines: https://docs.docker.com/engine/swarm/admin_guide/