Capstone: Production-Grade Container Platform from Laptop to Swarm
Learn Docker, Containerization, Docker Compose, Docker Swarm - Part 035
An end-to-end capstone for building a production-grade container platform from laptop to Swarm: image, Compose dev/test, registry promotion, SBOM, secrets, stack deploy, rolling update, rollback, observability, incident drill, and a 20-hour deliberate practice checklist.
This is the final part of the series.
We will not introduce major new concepts.
Instead, we combine the whole series into one end-to-end system:
- the application is built into a reproducible image;
- developers run local dependencies with Compose;
- integration tests run on an ephemeral Compose stack;
- the image is promoted to the registry with correct tags and digests;
- SBOM, provenance, and security scans serve as evidence;
- secrets and configs are kept out of the image;
- the stack is deployed to Docker Swarm;
- rolling updates have a health gate and a rollback path;
- observability is sufficient to debug real incidents;
- failure drills are run deliberately;
- skill is measured with a rubric, not by feeling.
This capstone is written in the style of an internal engineering handbook: not a toy tutorial but a blueprint you can adapt for small-to-medium platforms, internal tools, regulated systems, service labs, or a migration runway before Kubernetes.
1. Kaufman Lens: From Knowledge to Fluency
Kaufman teaches that learning fast does not mean learning shallowly.
For a complex skill, we need to:
- deconstruct the skill into small subskills;
- learn enough to self-correct;
- remove barriers to practice;
- practice deliberately on real scenarios;
- create fast feedback loops.
In the Docker context, the end goal is not to memorize commands.
The end goal is to be able to answer questions such as:
- why this container is stuck in a restart loop;
- why the build is slow;
- why the image has bloated;
- why a service connects on the laptop but fails in Swarm;
- why a secret can leak into an image layer;
- why a rolling update causes an outage;
- why a stateful service is not safe to move between nodes;
- when Compose is enough;
- when Swarm makes sense;
- when another orchestrator is needed;
- what evidence is needed to make a deployment defensible.
This capstone forces us to work through all of those boundaries in a single lifecycle.
2. Capstone System: Regulatory Case API Platform
We will use a reasonably realistic example domain:
Regulatory Case API Platform: a backend system that receives reports, creates enforcement cases, stores case state, publishes audit events, and exposes health/metrics endpoints.
The architecture is deliberately simple but rich enough:
| Component | Function | Runtime |
|---|---|---|
| case-api | HTTP API for the case lifecycle | custom application image |
| worker | asynchronous worker for event processing | same image, different command |
| postgres | transactional database | official image, dev/test only, or managed/external in production |
| redis | lightweight queue/cache | official image |
| reverse-proxy | example local ingress / production edge | nginx/traefik/caddy style |
| prometheus | metrics scrape lab | optional profile |
| grafana | dashboard lab | optional profile |
| migrate | one-shot migration job | same image, migration command |
| seed | one-shot fixture job | same image, seed command |
System targets:
- build the image once;
- run it in local Compose;
- test it in CI Compose;
- push it to the registry;
- deploy it to Swarm as a stack;
- update the image by digest;
- roll back if health fails;
- keep audit artifacts available.
3. End-to-End Mental Model
The flow runs from the developer laptop to Swarm.
There are three important states:
- source state: Git commit, Dockerfile, Compose file, stack file;
- artifact state: image digest, SBOM, provenance, scan result;
- runtime state: containers/tasks/services, logs, metrics, health, events.
Engineers often make mistakes by mixing up these three states.
Examples:
- deploying by mutable tag instead of digest;
- patching a live container instead of fixing the source/build;
- storing secret config in the image;
- assuming the dev Compose file is the same as the production deployment spec;
- assuming a healthcheck is the same as application readiness;
- assuming rollback is safe without considering migration/state.
4. Repository Layout
A tidy layout reduces practice friction and operational ambiguity.
reg-case-platform/
├── app/
│ ├── src/
│ ├── package.json | pom.xml | build.gradle | go.mod
│ └── migrations/
├── docker/
│ ├── case-api.Dockerfile
│ ├── nginx.conf
│ └── entrypoint.sh
├── compose.yml
├── compose.dev.yml
├── compose.test.yml
├── compose.observability.yml
├── stack.yml
├── env/
│ ├── dev.env.example
│ ├── test.env.example
│ └── prod.env.example
├── secrets/
│ └── README.md
├── scripts/
│ ├── dev-up.sh
│ ├── dev-reset.sh
│ ├── test-integration.sh
│ ├── build-image.sh
│ ├── push-image.sh
│ ├── deploy-stack.sh
│ ├── rollback-stack.sh
│ └── collect-diagnostics.sh
├── ops/
│ ├── release-checklist.md
│ ├── incident-runbook.md
│ ├── capacity-worksheet.md
│ └── threat-model.md
└── README.md
Principles:
- compose.yml contains the base application model;
- compose.dev.yml only overrides for the developer loop;
- compose.test.yml only overrides for test isolation;
- compose.observability.yml is an optional profile for metrics/log dashboards;
- stack.yml is the deployment contract for Swarm;
- secrets are never committed;
- scripts only wrap commands that are already clear; they do not hide logic.
5. Production-Grade Dockerfile
The following example is generic for a modern compiled/runtime application.
If your application is Java, Node, Go, .NET, or Python, the builder/runtime details will differ, but the invariants are the same.
# syntax=docker/dockerfile:1.7
ARG RUNTIME_BASE=debian:bookworm-slim
ARG BUILD_BASE=debian:bookworm-slim
FROM ${BUILD_BASE} AS build
WORKDIR /src
# Install build dependencies in one layer.
# Replace this with Maven/Gradle/npm/go/dotnet specific dependency resolution.
# The apt directories are cache mounts, so they never land in the image layer
# and need no manual cleanup.
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates curl build-essential
COPY app/ ./app/
# Build artifact. Keep output deterministic.
RUN --mount=type=cache,target=/root/.cache \
mkdir -p /out && \
cp -r app /out/app
FROM ${RUNTIME_BASE} AS runtime
LABEL org.opencontainers.image.title="reg-case-platform" \
org.opencontainers.image.description="Regulatory Case API Platform" \
org.opencontainers.image.source="https://example.internal/reg-case-platform" \
org.opencontainers.image.vendor="internal-platform"
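# curl is required by the HEALTHCHECK below; the slim base image does not ship it.
RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates curl && \
    rm -rf /var/lib/apt/lists/* && \
    groupadd --system app && \
    useradd --system --gid app --home-dir /app --shell /usr/sbin/nologin app && \
    mkdir -p /app /var/log/app /tmp/app && \
    chown -R app:app /app /var/log/app /tmp/app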
WORKDIR /app
COPY --from=build --chown=app:app /out/app/ ./
COPY --chown=app:app docker/entrypoint.sh /usr/local/bin/entrypoint.sh
RUN chmod 0555 /usr/local/bin/entrypoint.sh
USER app:app
ENV APP_PORT=8080 \
APP_ENV=production \
TMPDIR=/tmp/app
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=3s --start-period=20s --retries=3 \
CMD curl -fsS http://127.0.0.1:${APP_PORT}/health || exit 1
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
CMD ["api"]
Review invariants:
| Area | Expected invariant |
|---|---|
| Base image | explicit, reviewed, scanned |
| Build stage | contains build dependencies only |
| Runtime stage | contains runtime artifact only |
| User | non-root by default |
| Secrets | not copied, not passed via ARG |
| Cache | dependency/build cache isolated |
| Metadata | OCI labels present |
| Entrypoint | signal-safe, no shell trap if avoidable |
| Health | local, cheap, dependency-aware enough |
| Filesystem | writable paths explicit |
Do not make the Dockerfile the home of all operational logic.
The Dockerfile should answer:
- what the runtime artifact is;
- what the minimum dependencies are;
- which user runs the process;
- what the default command is;
- what the minimum health signal is.
It should not answer:
- how to deploy;
- how to rotate secrets;
- how to run production migrations without control;
- how to patch a live container.
6. Entrypoint Contract
The entrypoint must be small and defensible.
#!/usr/bin/env sh
set -eu
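mode="${1:-api}"
# Guard the shift: with no arguments, a bare `shift` aborts POSIX shells such as dash.
if [ "$#" -gt 0 ]; then shift; fi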
case "$mode" in
api)
exec /app/bin/case-api "$@"
;;
worker)
exec /app/bin/case-worker "$@"
;;
migrate)
exec /app/bin/case-migrate "$@"
;;
seed)
exec /app/bin/case-seed "$@"
;;
*)
echo "unknown mode: $mode" >&2
exit 64
;;
esac
Principles:
- use exec so the application process becomes PID 1;
- validate the mode explicitly;
- do not run migrations automatically on every API startup;
- do not write secrets to logs;
- do not swallow signals;
- do not create hidden infinite loops.
7. Compose Base Model
compose.yml is the application graph.
name: reg-case
services:
  case-api:
    image: ${APP_IMAGE:-reg-case-platform:local}
    build:
      context: .
      dockerfile: docker/case-api.Dockerfile
    command: ["api"]
    environment:
      APP_ENV: ${APP_ENV:-development}
      APP_PORT: "8080"
      DATABASE_URL: postgres://case:${POSTGRES_PASSWORD:-case_dev}@postgres:5432/case
      REDIS_URL: redis://redis:6379/0
    ports:
      - "8080:8080"
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started
      migrate:
        condition: service_completed_successfully
    healthcheck:
      test: ["CMD-SHELL", "curl -fsS http://127.0.0.1:8080/health || exit 1"]
      interval: 10s
      timeout: 3s
      retries: 5
      start_period: 20s
    networks:
      - app-net
    read_only: true
    tmpfs:
      - /tmp/app
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
  worker:
    image: ${APP_IMAGE:-reg-case-platform:local}
    build:
      context: .
      dockerfile: docker/case-api.Dockerfile
    command: ["worker"]
    environment:
      APP_ENV: ${APP_ENV:-development}
      DATABASE_URL: postgres://case:${POSTGRES_PASSWORD:-case_dev}@postgres:5432/case
      REDIS_URL: redis://redis:6379/0
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_started
      migrate:
        condition: service_completed_successfully
    networks:
      - app-net
    read_only: true
    tmpfs:
      - /tmp/app
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
  migrate:
    image: ${APP_IMAGE:-reg-case-platform:local}
    build:
      context: .
      dockerfile: docker/case-api.Dockerfile
    command: ["migrate"]
    environment:
      DATABASE_URL: postgres://case:${POSTGRES_PASSWORD:-case_dev}@postgres:5432/case
    depends_on:
      postgres:
        condition: service_healthy
    restart: "no"
    networks:
      - app-net
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: case
      POSTGRES_USER: case
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-case_dev}
    volumes:
      - postgres-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U case -d case"]
      interval: 5s
      timeout: 3s
      retries: 10
    networks:
      - app-net
  redis:
    image: redis:7
    command: ["redis-server", "--appendonly", "yes"]
    volumes:
      - redis-data:/data
    networks:
      - app-net
networks:
  app-net:
    driver: bridge
volumes:
  postgres-data:
  redis-data:
Note a few things:
- case-api and worker use the same image with different commands;
- migrate is one-shot and does not restart;
- the dev/test database lives in Compose, which does not mean the production database must run in Swarm;
- depends_on is used for minimum readiness, not as a guarantee that the application is race-free;
- read_only, tmpfs, cap_drop, and no-new-privileges are included from dev onward so hardening is not a production surprise;
- the database port is not published to the host by default.
8. Compose Dev Override
compose.dev.yml speeds up the inner loop.
services:
  case-api:
    build:
      target: runtime
    environment:
      APP_ENV: development
      LOG_LEVEL: debug
    volumes:
      - ./app:/app:ro
    develop:
      watch:
        - action: sync+restart
          path: ./app
          target: /app
          ignore:
            - node_modules/
            - target/
            - build/
  worker:
    environment:
      APP_ENV: development
      LOG_LEVEL: debug
  seed:
    image: ${APP_IMAGE:-reg-case-platform:local}
    command: ["seed"]
    profiles: ["seed"]
    environment:
      DATABASE_URL: postgres://case:${POSTGRES_PASSWORD:-case_dev}@postgres:5432/case
    depends_on:
      postgres:
        condition: service_healthy
      migrate:
        condition: service_completed_successfully
    networks:
      - app-net
  mailhog:
    image: mailhog/mailhog
    profiles: ["debug"]
    ports:
      - "8025:8025"
    networks:
      - app-net
To run:
docker compose -f compose.yml -f compose.dev.yml up --build
With debug tools:
docker compose -f compose.yml -f compose.dev.yml --profile debug up --build
With seed fixtures:
docker compose -f compose.yml -f compose.dev.yml --profile seed run --rm seed
Compose dev is not production.
But Compose dev should still teach production boundaries:
- service discovery via service name;
- dependency readiness via healthcheck;
- secrets/config kept separate;
- explicit writable paths;
- non-root user still works;
- logs go to stdout/stderr;
- no hidden host dependency.
9. Compose Test Harness
compose.test.yml must be deterministic, disposable, and easy to clean up.
name: reg-case-test-${BUILD_ID:-local}
services:
  case-api:
    # A plain `ports: []` is merged with the base list and leaves 8080 published;
    # the !reset tag (supported by recent Compose v2 releases) clears it.
    ports: !reset []
    environment:
      APP_ENV: test
      LOG_LEVEL: info
  test-runner:
    image: ${APP_IMAGE:-reg-case-platform:local}
    command: ["test-integration"]
    environment:
      APP_ENV: test
      API_URL: http://case-api:8080
      DATABASE_URL: postgres://case:${POSTGRES_PASSWORD:-case_test}@postgres:5432/case
      REDIS_URL: redis://redis:6379/0
    depends_on:
      case-api:
        condition: service_healthy
    networks:
      - app-net
CI script:
#!/usr/bin/env bash
set -euo pipefail
export BUILD_ID="${BUILD_ID:-$(date +%s)}"
export APP_IMAGE="reg-case-platform:${GIT_SHA:-local}"
cleanup() {
docker compose -f compose.yml -f compose.test.yml down -v --remove-orphans || true
}
trap cleanup EXIT
docker buildx build \
--load \
-t "$APP_IMAGE" \
-f docker/case-api.Dockerfile \
.
docker compose -f compose.yml -f compose.test.yml up \
--abort-on-container-exit \
--exit-code-from test-runner
Testing invariant:
| Invariant | Why it matters |
|---|---|
| unique project name | parallel CI isolation |
| down -v cleanup | no data contamination |
| health-gated test runner | avoids startup race |
| no host ports | avoids CI port collision |
| same image as deploy candidate | tests artifact, not source checkout illusion |
| diagnostics on failure | makes flakiness actionable |
Add diagnostics on failure:
docker compose -f compose.yml -f compose.test.yml ps
docker compose -f compose.yml -f compose.test.yml logs --no-color
# `docker compose events` streams until interrupted, so use a bounded docker events query instead.
docker events --since 10m --until "$(date +%s)" --filter type=container || true
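One way to wire these diagnostics into the CI script is to collect them only when the run fails, and to clean up in every case. The following is a minimal sketch under stated assumptions: `run_compose`, `finish`, and the `DIAG_DIR` variable are hypothetical names, not part of the original scripts.

```shell
# Hypothetical wrapper so the compose file list lives in one place.
run_compose() {
  docker compose -f compose.yml -f compose.test.yml "$@"
}

# Collect diagnostics only when the exit status is non-zero, then always clean up.
# DIAG_DIR is an assumed variable name; it defaults to ./diagnostics.
finish() {
  finish_status="$1"
  finish_dir="${DIAG_DIR:-diagnostics}"
  if [ "$finish_status" -ne 0 ]; then
    mkdir -p "$finish_dir"
    run_compose ps > "$finish_dir/ps.txt" 2>&1 || true
    run_compose logs --no-color > "$finish_dir/logs.txt" 2>&1 || true
  fi
  run_compose down -v --remove-orphans > /dev/null 2>&1 || true
  return 0
}

# In the CI script, register it in place of the plain cleanup trap:
#   trap 'finish $?' EXIT
```

Capturing before `down -v` matters: once the stack is removed, the container logs are gone with it.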
10. Build and Release Pipeline
A minimal ideal pipeline starts with the build.
Build command example:
#!/usr/bin/env bash
set -euo pipefail
IMAGE_REPO="registry.example.internal/platform/reg-case-platform"
GIT_SHA="${GIT_SHA:?GIT_SHA required}"
VERSION="${VERSION:-0.0.0-$GIT_SHA}"
IMAGE_SHA_TAG="$IMAGE_REPO:$GIT_SHA"
IMAGE_VERSION_TAG="$IMAGE_REPO:$VERSION"
# Use docker-container builder for multi-platform builds and attestations.
docker buildx create --name reg-case-builder --use --bootstrap || docker buildx use reg-case-builder
docker buildx build \
--platform linux/amd64,linux/arm64 \
--file docker/case-api.Dockerfile \
--tag "$IMAGE_SHA_TAG" \
--tag "$IMAGE_VERSION_TAG" \
--sbom=true \
--provenance=true \
--push \
.
Deployment should consume digest, not only tag.
Resolve digest:
docker buildx imagetools inspect "$IMAGE_SHA_TAG"
Or store digest from CI output.
Release evidence should include:
- Git commit;
- image tags;
- image digest;
- Dockerfile path;
- build command/pipeline run;
- SBOM reference;
- provenance reference;
- vulnerability scan result;
- integration test result;
- stack file version;
- deploy timestamp;
- smoke test result;
- rollback plan.
11. Registry Promotion Model
Do not treat registry as a dumping ground.
Use promotion tiers:
| Tier | Tag example | Meaning |
|---|---|---|
| CI candidate | :sha-abc123 | built from commit |
| Release candidate | :0.9.0-rc.1 | passed integration gate |
| Production release | :1.4.2 | approved immutable release |
| Environment marker | :prod | optional pointer, never sole source of truth |
Hard rule:
Deployment records must include digest.
Tag is human-friendly.
Digest is evidence-friendly.
registry.example.internal/platform/reg-case-platform@sha256:...
Operational anti-pattern:
image: registry.example.internal/platform/reg-case-platform:latest
Better:
image: registry.example.internal/platform/reg-case-platform@sha256:012345...
Or if tooling forces tags, store resolved digest beside the deployment record.
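The hard rule can be enforced mechanically before a deploy. The sketch below is illustrative only: `is_digest_pinned` and `require_digest` are hypothetical helper names, and the check assumes sha256 digests with 64 lowercase hex characters.

```shell
# Return 0 when the image reference ends with a sha256 digest, 1 otherwise.
is_digest_pinned() {
  printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

# Hypothetical deploy guard: refuse references that rely on a mutable tag only.
require_digest() {
  if ! is_digest_pinned "$1"; then
    echo "refusing to deploy mutable reference: $1" >&2
    return 1
  fi
}

# Example use in a deploy script (APP_IMAGE_DIGEST as used by stack.yml):
#   require_digest "$APP_IMAGE_DIGEST" || exit 1
```

A reference such as `repo:1.4.2@sha256:...` also passes, which keeps the human-friendly tag visible while the digest stays authoritative.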
12. Swarm Stack File
stack.yml is not just Compose copied to production.
It is deployment intent.
version: "3.9"
services:
  case-api:
    image: ${APP_IMAGE_DIGEST}
    command: ["api"]
    environment:
      APP_ENV: production
      APP_PORT: "8080"
      DATABASE_URL_FILE: /run/secrets/database_url
      REDIS_URL: redis://redis:6379/0
    secrets:
      - database_url
    configs:
      - source: case_api_config_v1
        target: /app/config/application.yml
    networks:
      - public-net
      - app-net
    ports:
      - target: 8080
        published: 8080
        protocol: tcp
        mode: ingress
    healthcheck:
      test: ["CMD-SHELL", "curl -fsS http://127.0.0.1:8080/health || exit 1"]
      interval: 10s
      timeout: 3s
      retries: 3
      start_period: 30s
    deploy:
      mode: replicated
      replicas: 4
      endpoint_mode: vip
      placement:
        constraints:
          - node.labels.workload == app
      resources:
        reservations:
          cpus: "0.25"
          memory: 256M
        limits:
          cpus: "1.00"
          memory: 768M
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
      update_config:
        parallelism: 1
        delay: 15s
        order: start-first
        failure_action: rollback
        monitor: 60s
        max_failure_ratio: 0
      rollback_config:
        parallelism: 1
        delay: 10s
        order: stop-first
        monitor: 60s
        failure_action: pause
  worker:
    image: ${APP_IMAGE_DIGEST}
    command: ["worker"]
    environment:
      APP_ENV: production
      DATABASE_URL_FILE: /run/secrets/database_url
      REDIS_URL: redis://redis:6379/0
    secrets:
      - database_url
    networks:
      - app-net
    deploy:
      mode: replicated
      replicas: 2
      placement:
        constraints:
          - node.labels.workload == app
      resources:
        reservations:
          cpus: "0.25"
          memory: 256M
        limits:
          cpus: "1.00"
          memory: 512M
      restart_policy:
        condition: on-failure
      update_config:
        parallelism: 1
        delay: 15s
        failure_action: rollback
        monitor: 60s
  redis:
    image: redis:7
    command: ["redis-server", "--appendonly", "yes"]
    volumes:
      - redis-data:/data
    networks:
      - app-net
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints:
          - node.labels.stateful == true
      restart_policy:
        condition: any
networks:
  public-net:
    driver: overlay
  app-net:
    driver: overlay
    attachable: false
volumes:
  redis-data:
secrets:
  database_url:
    external: true
configs:
  case_api_config_v1:
    external: true
Production notes:
- database is represented as external URL secret;
- Redis is shown as stateful example, but production Redis may also be external/managed;
- configs are versioned because Swarm configs are immutable in practice;
- update_config.failure_action: rollback encodes release safety;
- placement.constraints prevent accidental scheduling to wrong nodes;
- resource reservations improve scheduling honesty;
- limits create failure predictability;
- start-first requires enough capacity to run old and new tasks during update.
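The capacity requirement of start-first can be estimated with simple arithmetic. As a rough sketch, assuming the peak number of tasks during an update is replicas plus update parallelism, the reservation the scheduler must be able to place is that task count times the per-task reservation. `peak_reserved_mb` is a hypothetical helper; the numbers come from the case-api service in the stack file above.

```shell
# Rough upper bound on reserved memory (in MB) during a start-first update:
# (replicas + parallelism) * per-task reservation.
peak_reserved_mb() {
  replicas="$1"
  parallelism="$2"
  reservation_mb="$3"
  echo $(( (replicas + parallelism) * reservation_mb ))
}

# case-api: 4 replicas, parallelism 1, 256M reservation.
peak_reserved_mb 4 1 256   # prints 1280
```

The estimate only counts reservations. The extra task must also fit on a node that satisfies the placement constraint, so headroom spread thinly across ineligible nodes does not help.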
13. Swarm Deployment Script
#!/usr/bin/env bash
set -euo pipefail
STACK_NAME="reg-case"
APP_IMAGE_DIGEST="${APP_IMAGE_DIGEST:?APP_IMAGE_DIGEST required}"
export APP_IMAGE_DIGEST
# Validate final config after interpolation.
docker stack config -c stack.yml > /tmp/${STACK_NAME}.resolved.yml
# Optional: show image digest and deploy summary.
grep "image:" /tmp/${STACK_NAME}.resolved.yml
# Deploy/update stack. Must run on manager node.
docker stack deploy \
--with-registry-auth \
-c /tmp/${STACK_NAME}.resolved.yml \
"$STACK_NAME"
# Observe convergence.
docker stack services "$STACK_NAME"
docker stack ps "$STACK_NAME" --no-trunc
Post-deploy smoke:
curl -fsS http://platform.example.internal:8080/health
curl -fsS http://platform.example.internal:8080/ready
Convergence check:
docker service ls --filter label=com.docker.stack.namespace=reg-case
docker service ps reg-case_case-api --no-trunc
docker service inspect reg-case_case-api --pretty
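For an automated gate, the update state reported by Swarm can be mapped to a simple decision. This is a minimal sketch: `classify_update_state` and `current_update_state` are hypothetical helpers, and treating a completed rollback as a failed release is a policy assumption rather than something Docker prescribes.

```shell
# Map a Swarm UpdateStatus.State value to a deploy-gate decision.
classify_update_state() {
  case "$1" in
    ""|completed)
      echo ok ;;      # empty means the service has never been updated
    updating|rollback_started)
      echo wait ;;    # still converging, poll again
    paused|rollback_paused|rollback_completed)
      echo fail ;;    # the release did not land as intended
    *)
      echo unknown ;;
  esac
}

# Read the current state from a manager node. The template guards against a
# missing UpdateStatus, which is the case for a freshly created service.
current_update_state() {
  docker service inspect "$1" \
    --format '{{if .UpdateStatus}}{{.UpdateStatus.State}}{{end}}'
}

# Example: classify_update_state "$(current_update_state reg-case_case-api)"
```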
Rollback:
docker service rollback reg-case_case-api
docker service rollback reg-case_worker
Remember: rollback is safe only if data/schema changes are compatible.
Application compatibility is part of release engineering, not Docker magic.
14. Migration Strategy
Migration is the most dangerous part of a “simple” container deployment.
Bad pattern:
Every API replica runs migration on startup.
This creates:
- lock contention;
- duplicate migration attempts;
- partial rollout hazards;
- rollback impossibility;
- unclear ownership.
Better pattern: a single, controlled migration step that runs outside application startup.
Migration rules:
| Rule | Reason |
|---|---|
| expand-contract | old and new versions can coexist |
| no destructive pre-deploy change | rollback remains possible |
| one migration owner | avoids concurrency race |
| migration evidence recorded | auditability |
| migration timeout bounded | prevents stuck deploy |
| migration logs preserved | incident diagnosis |
For Swarm, migration can be done as:
- external CI job with network access to DB;
- temporary one-shot service;
- manually controlled maintenance runbook.
Do not let every app task mutate schema on boot.
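The "migration timeout bounded" rule can be sketched with the coreutils `timeout` command, whichever of the three execution options is used. `run_bounded` is a hypothetical helper, and the exit code 124 on timeout is the GNU coreutils behaviour, so this assumes a GNU userland.

```shell
# Run a command with an upper time bound (seconds).
# GNU coreutils `timeout` exits with 124 when the limit is hit.
run_bounded() {
  limit="$1"
  shift
  bounded_status=0
  timeout "$limit" "$@" || bounded_status=$?
  if [ "$bounded_status" -eq 124 ]; then
    echo "command exceeded ${limit}s, aborting" >&2
  fi
  return "$bounded_status"
}

# Example (assumed invocation, run by the single migration owner):
#   run_bounded 300 /usr/local/bin/entrypoint.sh migrate
```

A bounded run turns a stuck migration into a visible, non-zero exit that the pipeline can stop on, instead of a deploy that hangs indefinitely.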
15. Secrets and Configs Runbook
Create secret:
printf '%s' 'postgres://case:REDACTED@db.internal:5432/case' | \
docker secret create database_url -
Create versioned config:
docker config create case_api_config_v1 ./config/application.prod.yml
Rotate secret:
printf '%s' 'postgres://case:NEW_REDACTED@db.internal:5432/case' | \
docker secret create database_url_v2 -
Update service to use new secret:
docker service update \
--secret-rm database_url \
--secret-add source=database_url_v2,target=database_url \
reg-case_case-api
Then remove old secret after all consumers are migrated:
docker secret rm database_url
Governance invariants:
- secrets are not passed through ARG;
- secrets are not stored in image;
- secrets are not printed in logs;
- secrets are injected at runtime;
- config versions are named explicitly;
- rotation has a rollback path;
- secret consumers are known.
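The stack file passes `DATABASE_URL_FILE` rather than the value itself. If the application only reads `DATABASE_URL`, the entrypoint can resolve the file variant at startup. This is a minimal sketch under that assumption; `load_secret_env` is a hypothetical helper, and its argument must be a trusted variable name because it is passed through `eval`.

```shell
# If NAME_FILE is set, read the file and export its content as NAME.
# The secret value is never printed.
load_secret_env() {
  secret_name="$1"
  eval "secret_path=\${${secret_name}_FILE:-}"
  if [ -n "$secret_path" ]; then
    if [ ! -r "$secret_path" ]; then
      echo "secret file for $secret_name is not readable" >&2
      return 1
    fi
    secret_value="$(cat "$secret_path")"
    export "$secret_name=$secret_value"
  fi
}

# Example use in the entrypoint, before the exec line:
#   load_secret_env DATABASE_URL
```

The trade-off is that the value then lives in the process environment. An application that reads the file directly avoids that exposure.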
16. Observability Design
A production-grade container platform needs enough signal to answer:
- what version is running;
- where it is running;
- why it restarted;
- whether it is healthy;
- whether it is overloaded;
- whether logs are flowing;
- whether deployment is converging;
- whether rollback happened;
- what changed before incident.
Minimum telemetry:
| Signal | Source | Example question |
|---|---|---|
| logs | app stdout/stderr, Docker logging driver | what failed? |
| health | container healthcheck, service ps | is task healthy? |
| events | Docker events | did task die/restart/update? |
| metrics | app metrics, daemon metrics, host metrics | is it saturated? |
| traces | app instrumentation | where latency occurs? |
| labels | image/service labels | what release is this? |
Useful labels:
deploy:
  labels:
    com.example.service: case-api
    com.example.owner: platform-enforcement
    com.example.release: "1.4.2"
    com.example.git_sha: "abc123"
    com.example.runtime: swarm
Diagnostics command set:
docker service ps reg-case_case-api --no-trunc
docker service logs --since 30m reg-case_case-api
docker service inspect reg-case_case-api --pretty
docker events --since 30m --filter type=service --filter type=container
docker node ls
docker node ps <node-name> --no-trunc
docker stats --no-stream
Incident bundle:
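# Keep the directory name in a variable; `cd incident-*` breaks once a second bundle exists.
bundle_dir="incident-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$bundle_dir"
cd "$bundle_dir"
docker node ls > nodes.txt
docker service ls > services.txt
docker stack ps reg-case --no-trunc > stack-ps.txt
docker service inspect reg-case_case-api > case-api.inspect.json
docker service logs --since 1h reg-case_case-api > case-api.logs.txt 2>&1
# --until makes docker events exit instead of streaming forever.
docker events --since 1h --until "$(date +%s)" > docker-events.txt 2>&1 || true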
This is not fancy.
It is enough to start debugging under pressure.
17. Failure Drills
A top engineer does not wait for production to teach failure.
Practice controlled failure.
Drill 1 — Container Crash
Inject app crash.
Expected:
- service detects task failure;
- restart policy applies;
- logs show cause;
- alert triggers if repeated;
- deployment not silently marked healthy.
Commands:
docker service ps reg-case_case-api --no-trunc
docker service logs reg-case_case-api --since 10m
Drill 2 — Bad Image Rollout
Deploy image with failing healthcheck.
Expected:
- rolling update starts;
- monitor window detects failure;
- update pauses or rolls back depending policy;
- previous version remains serving;
- evidence records failure.
Commands:
docker service update --image registry.example.internal/platform/reg-case-platform:bad reg-case_case-api
docker service ps reg-case_case-api --no-trunc
docker service inspect reg-case_case-api --pretty
Drill 3 — Node Drain
Drain a worker node.
Expected:
- node receives no new tasks;
- existing tasks are rescheduled if replicated;
- capacity remains sufficient;
- stateful workloads are handled intentionally.
docker node update --availability drain worker-01
docker node ps worker-01 --no-trunc
docker service ps reg-case_case-api --no-trunc
Restore:
docker node update --availability active worker-01
Drill 4 — Secret Rotation
Rotate database URL secret to a test credential.
Expected:
- only intended services updated;
- old secret is not removed before successful rollout;
- rollback can restore previous secret;
- logs do not reveal secret.
Drill 5 — Resource Pressure
Lower memory limit temporarily.
Expected:
- OOM/restart visible;
- metrics show memory pressure;
- capacity worksheet updated;
- limit adjusted with evidence.
Drill 6 — Registry Unavailable
Simulate registry pull failure on a node.
Expected:
- existing tasks keep running;
- new task scheduling/pull fails visibly;
- release process blocks until registry recovers;
- runbook identifies image availability dependency.
18. Release Checklist
Use this before every production deployment.
Source
- Git commit is known.
- Dockerfile reviewed.
- Compose/stack file reviewed.
- Migration plan reviewed.
- Rollback compatibility reviewed.
- New secrets/configs identified.
- Capacity impact estimated.
Build
- Build uses BuildKit/buildx.
- Image is tagged with commit SHA.
- Image digest is recorded.
- SBOM generated.
- Provenance generated where available.
- Vulnerability/policy scan reviewed.
- Base image is approved.
Test
- Unit tests pass.
- Integration tests pass using image artifact.
- Migration tests pass.
- Healthcheck tested.
- Container starts as non-root.
- Read-only filesystem compatibility tested.
- Compose test stack cleaned up.
Deploy
- Swarm manager quorum healthy.
- Nodes have enough capacity for start-first update.
- Stack config rendered and reviewed.
- Secrets/configs exist.
- External dependencies reachable.
- docker stack deploy executed from manager.
- Convergence monitored.
Verify
- Service replicas desired = running.
- Healthchecks passing.
- Smoke test passing.
- Logs free from immediate errors.
- Metrics normal.
- No unexpected restarts.
- Release evidence archived.
Rollback
- Previous image digest known.
- docker service rollback path known.
- Data/schema rollback risk understood.
- Secret/config rollback path known.
- Incident owner assigned.
19. Architecture Decision Record Template
For container platform decisions, use ADRs.
# ADR: Deploy Regulatory Case API on Docker Swarm
## Status
Accepted
## Context
We need a production runtime for a small internal case-management platform.
The team already operates Docker Engine and requires simple orchestration,
rolling updates, secrets, overlay networking, and service scheduling.
## Decision
Use Docker Swarm with stack deploy for the initial production runtime.
Use Compose for local development and integration testing.
Use immutable image digests for deployment.
Use external managed PostgreSQL.
## Consequences
Positive:
- lower operational complexity than Kubernetes for this scope;
- native Docker workflow from dev to deploy;
- built-in service orchestration and secrets;
- rolling update/rollback supported.
Negative:
- smaller ecosystem than Kubernetes;
- stateful workloads need careful placement/externalization;
- advanced traffic management requires extra components;
- team must maintain Swarm manager quorum and backup.
## Invariants
- no production deployment by mutable tag only;
- all app containers run non-root;
- no Docker socket mounted into app containers;
- secrets injected at runtime;
- healthcheck required for rolling updates;
- rollback compatibility reviewed before deployment.
ADR quality matters because container decisions often look obvious until failure.
A good ADR preserves reasoning.
20. Platform Maturity Rubric
Use this rubric to measure your Docker maturity.
| Level | Description | Observable evidence |
|---|---|---|
| 1 — Command user | Can run containers manually | ad hoc docker run, few conventions |
| 2 — Compose user | Can define local multi-service apps | compose.yml, volumes, networks, healthchecks |
| 3 — Image engineer | Can build reproducible images | multi-stage, cache, non-root, labels, small runtime |
| 4 — Runtime engineer | Can reason about lifecycle/resources/security | signals, restart, cgroups, mounts, capabilities |
| 5 — Platform engineer | Can operate release lifecycle | CI build, registry, scan, deploy, rollback, observability |
| 6 — Systems engineer | Can model failures and govern decisions | threat model, ADRs, incident drills, capacity model |
Top 1% practical target:
You can design, review, debug, and operate containerized systems under uncertainty, explain trade-offs clearly, and leave behind evidence that another engineer can trust.
21. Final 20-Hour Practice Plan
This is the Kaufman-style finishing plan.
Hour 1–2 — Rebuild the Mental Model
Deliverable:
- one-page diagram of Docker Engine, image, container, network, volume, registry, Compose, Swarm;
- explain image vs container vs service vs task without notes.
Practice:
docker image inspect <image>
docker container inspect <container>
docker network inspect <network>
docker volume inspect <volume>
docker service inspect <service>
Hour 3–4 — Dockerfile Mastery
Deliverable:
- multi-stage Dockerfile;
- build cache optimization;
- non-root runtime;
- read-only-compatible runtime.
Practice:
docker buildx build --progress=plain -t app:local .
docker history app:local
docker scout quickview app:local || true
Hour 5–6 — Runtime Lifecycle
Deliverable:
- app handles SIGTERM;
- healthcheck works;
- restart behavior understood;
- logs are clean.
Practice:
docker run --rm app:local
docker stop <container>
docker inspect <container>
docker events --since 10m
Hour 7–8 — Networking and Storage
Deliverable:
- service-to-service DNS works;
- no accidental host port exposure;
- volumes and bind mounts understood;
- backup/restore tested.
Practice:
docker network create lab-net
docker run --network lab-net --name a ...
docker run --network lab-net --name b ...
docker volume create lab-data
Hour 9–10 — Compose Dev Stack
Deliverable:
- compose.yml base;
- compose.dev.yml override;
- health-gated startup;
- profile for debug/seed tools.
Practice:
docker compose config
docker compose up --build
docker compose ps
docker compose logs -f
Hour 11–12 — Compose Test Harness
Deliverable:
- ephemeral integration test stack;
- unique project name;
- cleanup trap;
- diagnostics on failure.
Practice:
docker compose -f compose.yml -f compose.test.yml up \
--abort-on-container-exit \
--exit-code-from test-runner
Hour 13–14 — Supply Chain Evidence
Deliverable:
- image pushed to registry;
- digest captured;
- SBOM/provenance generated;
- vulnerability report reviewed.
Practice:
docker buildx build --sbom=true --provenance=true --push ...
docker buildx imagetools inspect <image>
docker scout sbom <image> || true
Hour 15–16 — Swarm Stack Deploy
Deliverable:
- swarm initialized;
- overlay network works;
- stack deployed;
- services converge;
- placement constraints tested.
Practice:
docker swarm init
docker stack deploy -c stack.yml reg-case
docker stack services reg-case
docker stack ps reg-case --no-trunc
Hour 17 — Rolling Update and Rollback
Deliverable:
- successful rolling update;
- failed rollout triggers rollback/pause;
- previous digest known.
Practice:
docker service update --image <new-image> reg-case_case-api
docker service ps reg-case_case-api --no-trunc
docker service rollback reg-case_case-api
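The "previous digest known" and "failed rollout triggers rollback" deliverables can be folded into one update wrapper. This is a sketch, not a hardened release tool: the function names are hypothetical, the 30s monitor window is an arbitrary example value, and the state names are the ones Swarm reports in `UpdateStatus.State`.

```shell
# update_outcome STATE
# Map Swarm's UpdateStatus.State to a short verdict.
update_outcome() {
  case "$1" in
    completed)                 echo "ok" ;;
    updating|rollback_started) echo "in-progress" ;;
    paused|rollback_paused)    echo "needs-attention" ;;
    rollback_completed)        echo "rolled-back" ;;
    *)                         echo "unknown" ;;
  esac
}

# safe_update SERVICE IMAGE
# Records the current image, updates with a health gate, reports the outcome.
safe_update() (
  service="$1"; image="$2"
  previous="$(docker service inspect \
    --format '{{.Spec.TaskTemplate.ContainerSpec.Image}}' "$service")"
  echo "previous image: $previous"

  docker service update \
    --image "$image" \
    --update-order start-first \
    --update-failure-action rollback \
    --update-monitor 30s \
    "$service" >/dev/null || echo "update command reported a failure" >&2

  state="$(docker service inspect \
    --format '{{if .UpdateStatus}}{{.UpdateStatus.State}}{{end}}' "$service")"
  echo "update state: ${state:-none} ($(update_outcome "$state"))"
)
```

Printing the previous image before the update leaves a known-good reference for a manual `docker service update --image` if the automatic rollback itself is not enough. `start-first` needs spare capacity on the nodes, since old and new tasks briefly run side by side.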
Hour 18 — Security Hardening Review
Deliverable:
- no root container unless justified;
- no Docker socket mount;
- caps dropped;
- secrets externalized;
- read-only filesystem tested.
Practice:
docker inspect <container> | jq '.[0].Config.User'
docker inspect <container> | jq '.[0].HostConfig.CapDrop'
docker inspect <container> | jq '.[0].HostConfig.ReadonlyRootfs'
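The same inspect fields can be turned into a pass/fail review. The sketch below uses hypothetical function names and covers four of the deliverables (root user, read-only filesystem, dropped capabilities, Docker socket mount). Externalized secrets still need a manual look at the environment and image history.

```shell
# audit_fields USER READONLY CAPDROP MOUNT_SOURCES
# Prints one finding per line; prints nothing when all checks pass.
audit_fields() {
  case "$1" in
    ""|0|root|0:*|root:*) echo "runs as root (Config.User='$1')" ;;
  esac
  [ "$2" = "true" ] || echo "root filesystem is writable"
  case "$3" in
    *ALL*) ;;
    *) echo "capabilities not dropped with ALL (CapDrop=$3)" ;;
  esac
  case "$4" in
    *docker.sock*) echo "Docker socket is mounted" ;;
  esac
  return 0
}

# audit_container CONTAINER
# Reads the relevant fields with docker inspect and audits them.
audit_container() {
  audit_fields \
    "$(docker inspect --format '{{.Config.User}}' "$1")" \
    "$(docker inspect --format '{{.HostConfig.ReadonlyRootfs}}' "$1")" \
    "$(docker inspect --format '{{.HostConfig.CapDrop}}' "$1")" \
    "$(docker inspect --format '{{range .Mounts}}{{.Source}} {{end}}' "$1")"
}
```

An empty `Config.User` is treated as root because the container then runs as the image default, which is root unless the Dockerfile sets `USER`. Each finding is a prompt for a written justification, not necessarily a blocker.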
Hour 19 — Observability and Incident Drill
Deliverable:
- collect logs/events/inspect output;
- identify restart reason;
- correlate deployment to failure;
- write incident note.
Practice:
docker service logs --since 30m reg-case_case-api
docker events --since 30m
docker stats --no-stream
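During a drill, evidence is easier to correlate when it is captured in one pass into a single directory. The collector below is a sketch with a hypothetical function name and file layout. It adds `--until` to `docker events` because, without an end time, that command keeps streaming and would block a script.

```shell
# collect_diagnostics SERVICE [OUT_DIR]
# Saves task state, logs, spec, daemon events, and resource usage to files,
# then prints the directory name. Failures are captured, not fatal.
collect_diagnostics() (
  service="$1"
  out="${2:-incident-$(date -u +%Y%m%dT%H%M%SZ)}"
  mkdir -p "$out" || exit 1

  docker service ps --no-trunc "$service"                 > "$out/service-ps.txt" 2>&1
  docker service logs --since 30m --timestamps "$service" > "$out/service-logs.txt" 2>&1
  docker service inspect "$service"                       > "$out/service-inspect.json" 2>&1
  docker events --since 30m --until "$(date +%s)"         > "$out/events.txt" 2>&1
  docker stats --no-stream                                > "$out/stats.txt" 2>&1

  echo "$out"
)
```

For the lab service this would be `collect_diagnostics reg-case_case-api`. The task error column in `service-ps.txt` and the timestamps in `events.txt` are usually the quickest way to tie a restart to a specific deployment when writing the incident note.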
Hour 20 — Final Review and Teachback
Deliverable:
- explain platform lifecycle end-to-end;
- produce ADR;
- produce release checklist;
- identify three remaining risks;
- teach another engineer the design.
This final hour matters.
If you cannot teach it, you do not own it yet.
22. Common Capstone Failure Modes
| Failure | Typical cause | Better response |
|---|---|---|
| Works on laptop, fails in Swarm | hidden host dependency | use service DNS, overlay networks, external configs |
| Rolling update causes outage | no health gate/capacity | use healthcheck, start-first, enough headroom |
| Rollback fails | destructive migration | expand-contract migration |
| Secret appears in image history | passed via ARG/COPY | use BuildKit secrets/runtime secrets |
| Image pull fails on worker | registry/auth issue | use registry auth, preflight, digest evidence |
| App cannot write temp files | read-only FS without tmpfs | declare writable path explicitly |
| CI tests flaky | startup race | health-gated dependency + diagnostics |
| Disk full | logs/images/volumes unbounded | log rotation, prune policy, monitoring |
| Container runs as root | default base image behavior | explicit USER, ownership, permission test |
| Stateful service moved incorrectly | volume locality ignored | placement constraints/external storage |
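The "secret appears in image history" row is one of the few failure modes in this table that can be checked with a simple text scan, because build arguments and `ENV` values are recorded in the layer metadata. The sketch below uses hypothetical function names and a deliberately rough keyword pattern. It will miss secrets with unusual names and may flag harmless variables.

```shell
# suspicious_layers
# Reads layer commands on stdin and prints lines that appear to assign a
# credential-like variable (case-insensitive keyword match).
suspicious_layers() {
  grep -Ei '(password|passwd|secret|token|api[_-]?key)[a-z0-9_]*=' || true
}

# scan_image_history IMAGE
# Feeds the untruncated "created by" column of docker history to the scan.
scan_image_history() {
  docker history --no-trunc --format '{{.CreatedBy}}' "$1" | suspicious_layers
}
```

Any hit means the value is recoverable by anyone who can pull the image. The usual fix is to rotate the credential and rebuild with a BuildKit secret mount (`docker build --secret ...` together with `RUN --mount=type=secret,...`) so the value never enters a layer.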
23. What “Done” Means for This Series
You are done with this series when you can:
- build a secure, reproducible image;
- explain every Dockerfile instruction you used;
- run a realistic local Compose stack;
- debug DNS, port, volume, permission, health, and restart issues;
- design a test harness with ephemeral Compose;
- promote images through registry using tags and digests;
- produce SBOM/provenance/scan evidence;
- deploy a Swarm stack;
- operate rolling update and rollback;
- model stateful workload risk;
- rotate secrets/configs safely;
- collect incident diagnostics;
- write an ADR defending the architecture;
- teach the entire lifecycle clearly.
That is the difference between “I know Docker” and “I can operate containerized systems”.
24. Final Mental Model
Containerization is not packaging.
Containerization is boundary design.
Dockerfile defines artifact boundary.
Image defines distribution boundary.
Container defines process boundary.
Network defines communication boundary.
Volume defines state boundary.
Compose defines local application boundary.
Swarm defines cluster desired-state boundary.
Secrets/configs define trust boundary.
Observability defines feedback boundary.
Release evidence defines accountability boundary.
A top engineer sees those boundaries, designs them intentionally, and verifies them under failure.
25. References
Primary references for this capstone:
- Docker Docs, Get Docker / Docker overview: https://docs.docker.com/get-started/get-docker/
- Docker Docs, Compose profiles: https://docs.docker.com/compose/how-tos/profiles/
- Docker Docs, Compose file reference: https://docs.docker.com/reference/compose-file/
- Docker Docs, Compose Deploy Specification: https://docs.docker.com/reference/compose-file/deploy/
- Docker Docs, Deploy a stack to a swarm: https://docs.docker.com/engine/swarm/stack-deploy/
- Docker Docs, Swarm services: https://docs.docker.com/engine/swarm/services/
- Docker Docs, Swarm networking: https://docs.docker.com/engine/swarm/networking/
- Docker Docs, Rolling updates: https://docs.docker.com/engine/swarm/swarm-tutorial/rolling-update/
- Docker CLI, docker stack deploy: https://docs.docker.com/reference/cli/docker/stack/deploy/
- Docker CLI, docker service rollback: https://docs.docker.com/reference/cli/docker/service/rollback/
- Docker Build, SBOM/provenance attestations: https://docs.docker.com/build/ci/github-actions/attestations/
- Docker Scout, SBOM: https://docs.docker.com/scout/how-tos/view-create-sboms/
Series Complete
This is Part 035, the final part of the series:
Learn Docker, Containerization, Docker Compose, Docker Swarm
Total completed: 35 / 35 parts.