Debugging Containers Like a Systems Engineer

Learn Docker, Containerization, Docker Compose, Docker Swarm - Part 014

A systems-engineering approach to debugging Docker containers using inspect, logs, events, exec, Docker Debug, namespaces, filesystem inspection, and failure-mode runbooks.


Part 014 — Debugging Containers Like a Systems Engineer

Debugging containers is not about memorizing more commands.

It is about building a repeatable evidence path from symptom to boundary to root cause.

Many Docker debugging sessions fail because the engineer jumps directly into the container and starts poking around. That sometimes works, but it often creates noise. A better approach is to first locate the failure layer:

  1. Is the image wrong?
  2. Is the container configuration wrong?
  3. Is the process failing?
  4. Is the filesystem boundary wrong?
  5. Is the network boundary wrong?
  6. Is DNS wrong?
  7. Is the host resource envelope wrong?
  8. Is Docker daemon state wrong?
  9. Is the assumption about lifecycle wrong?

This part gives a systems-debugging method for Docker.

1. Learning Objective

After this part, we want to be able to:

  1. Debug a container without randomly trying flags.
  2. Read docker inspect as a runtime contract.
  3. Use logs, events, stats, and exit codes as evidence.
  4. Distinguish image failure from container configuration failure.
  5. Debug minimal images that do not contain shells or network tools.
  6. Diagnose permission, DNS, port, mount, healthcheck, signal, and OOM failures.
  7. Build reusable runbooks for common Docker incidents.
  8. Avoid unsafe debugging practices that mutate production containers.

2. Kaufman Skill Deconstruction

The debugging skill decomposes into smaller subskills:

| Subskill | Practice Target | Feedback Signal |
| --- | --- | --- |
| Container state reading | ps, inspect, lifecycle status, exit code | Can explain what Docker thinks exists |
| Process debugging | logs, exec, top, PID 1, signals | Can explain what the main process did |
| Filesystem debugging | mounts, ownership, diff, cp, read-only paths | Can explain what files are visible/mutable |
| Network debugging | ports, networks, DNS, routes | Can explain who can reach whom |
| Resource debugging | stats, OOM, CPU throttling, disk pressure | Can explain resource-induced failures |
| Daemon debugging | events, daemon logs, Docker Desktop diagnose | Can explain infrastructure-level symptoms |
| Minimal image debugging | sidecar/debug toolbox, docker debug, debug variants | Can debug without polluting prod image |
| Runbook writing | symptom -> evidence -> cause -> fix | Can help teammates repeat the reasoning |

3. Debugging Mental Model: Evidence Layers

A container incident is usually visible at multiple layers.

Do not start at the deepest layer. Start with the cheapest facts.

A good order:

  1. docker ps -a
  2. docker inspect <container>
  3. docker logs <container>
  4. docker events --since ...
  5. docker stats if running
  6. docker exec or docker debug if process is running or image needs inspection
  7. network/mount/volume inspect
  8. daemon logs if Docker itself behaves unexpectedly

4. First Principle: Separate Desired State, Actual State, and Observed Symptom

Every debugging session should separate three things:

| Layer | Question | Example |
| --- | --- | --- |
| Desired state | What did we ask Docker to run? | image, command, env, mounts, ports, user |
| Actual Docker state | What did Docker create/start/stop? | created, running, exited, health status |
| Observed symptom | What did user/app see? | HTTP 502, connection refused, permission denied |

Example:

Symptom: API returns 502 from reverse proxy.
Desired state: app should listen on 8080 inside container.
Actual Docker state: container running but healthcheck unhealthy.
Process evidence: app logs say failed to connect to database at localhost:5432.
Root cause: dependency host should be postgres, not localhost.

Without this separation, teams often fix the proxy when the root cause is database configuration.

5. Command Map: What Each Tool Answers

| Command | Primary Question | Notes |
| --- | --- | --- |
| docker ps -a | What containers exist and in what state? | First command for lifecycle issues |
| docker inspect | What exact config and runtime metadata exist? | Truth source for mounts, env, network, state |
| docker logs | What did the container write to stdout/stderr? | Depends on logging driver availability |
| docker events | What did Docker observe over time? | Useful for restarts, kills, OOM, health events |
| docker exec | Can we run a command inside a running container? | Requires process/container to be running |
| docker debug | Can we debug a slim image/container without built-in tools? | Useful for minimal production images |
| docker stats | What resources are being used now? | CPU, memory, network IO, block IO |
| docker top | What processes are running? | Quick process tree signal |
| docker port | How are ports published? | Validates host-to-container mapping |
| docker network inspect | Which containers are attached to a network? | DNS/connectivity debugging |
| docker volume inspect | Where/how is a volume managed? | Persistence and data debugging |
| docker diff | What changed in writable layer? | Finds unexpected runtime mutation |
| docker cp | Extract files for inspection | Useful when container exited |
| docker system df | What consumes Docker disk? | Cleanup/capacity debugging |

6. docker inspect as Runtime Contract

docker inspect is not just verbose JSON. It is the realized runtime contract.

docker inspect app | jq '.[0] | {
  Id,
  Name,
  State,
  Config: {
    Image: .Config.Image,
    User: .Config.User,
    Entrypoint: .Config.Entrypoint,
    Cmd: .Config.Cmd,
    Env: .Config.Env,
    WorkingDir: .Config.WorkingDir,
    Healthcheck: .Config.Healthcheck
  },
  HostConfig: {
    Binds: .HostConfig.Binds,
    PortBindings: .HostConfig.PortBindings,
    NetworkMode: .HostConfig.NetworkMode,
    RestartPolicy: .HostConfig.RestartPolicy,
    Memory: .HostConfig.Memory,
    NanoCpus: .HostConfig.NanoCpus,
    CapAdd: .HostConfig.CapAdd,
    CapDrop: .HostConfig.CapDrop,
    Privileged: .HostConfig.Privileged
  },
  Mounts,
  NetworkSettings
}'

Important fields:

| Field | Why It Matters |
| --- | --- |
| .State.Status | created, running, paused, restarting, exited, dead |
| .State.ExitCode | process termination clue |
| .State.OOMKilled | memory limit failure signal |
| .State.Health | healthcheck state and recent outputs |
| .Config.Entrypoint + .Config.Cmd | actual process contract |
| .Config.User | runtime identity |
| .Config.Env | environment contract; be careful with secrets |
| .Mounts | actual mounted paths |
| .HostConfig.PortBindings | host port publishing |
| .NetworkSettings.Networks | network attachment and IPs |
| .HostConfig.RestartPolicy | restart behavior |
| .HostConfig.Privileged | broad boundary expansion |
| .HostConfig.CapAdd/CapDrop | kernel capability contract |

A senior engineer should be able to look at inspect and say, “This container could never have worked because the process listens on port 8080 but only port 8081 is published,” or “The app is running as UID 10001 but the bind mount is host-owned by UID 1000 with mode 700.”

7. Logs: Useful, but Not Complete

docker logs fetches logs captured from the container's stdout/stderr stream for logging drivers that support it.

Basic usage:

docker logs app

docker logs --tail=100 app

docker logs -f app

docker logs --since=30m app

docker logs --timestamps app

Important constraints:

  • logs show what the process emitted, not everything Docker knows
  • some logging drivers change retrieval behavior
  • logs may contain secrets if the app logs environment/config carelessly
  • logs may not exist if the process fails before emitting output
  • logs do not prove a port is listening or dependency is reachable

Use logs as one evidence stream, not the entire debugging model.

8. Events: Debugging Time and Lifecycle

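docker events streams events from the Docker daemon in real time. With --since it first replays recent history, and without --until it keeps streaming until interrupted.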

Examples:

docker events --since 30m

docker events \
  --filter container=app \
  --since 1h

docker events \
  --filter type=container \
  --filter event=die \
  --since 2h

Events help answer:

  • Did Docker restart the container?
  • Did it die repeatedly?
  • Was it killed?
  • Did the health status change?
  • Did a network connect/disconnect happen?
  • Did a volume or image operation happen around the incident?

Example interpretation:

2026-07-01T10:00:03 container start app
2026-07-01T10:00:05 container health_status: unhealthy app
2026-07-01T10:00:07 container die app exitCode=1
2026-07-01T10:00:08 container restart app

This points to a startup/restart loop, not necessarily a network outage.
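That reading can be turned into a rough check by counting die events in a bounded window. The sketch below is a minimal illustration, not a Docker feature: count_die_events is a hypothetical helper, the threshold of 3 is an arbitrary assumption, and the sample lines reuse the simplified event shape shown above rather than real daemon output.

```bash
#!/bin/sh
# count_die_events: read `docker events` text on stdin and print how many
# container "die" events it contains. Hypothetical helper, not a Docker command.
count_die_events() {
  # grep -c exits non-zero when the count is 0, so "|| true" keeps callers happy.
  grep -c ' container die ' || true
}

# Simplified sample in the same shape as the example above (not real daemon output).
sample_events='2026-07-01T10:00:03 container start app
2026-07-01T10:00:07 container die app exitCode=1
2026-07-01T10:00:09 container start app
2026-07-01T10:00:13 container die app exitCode=1
2026-07-01T10:00:15 container start app
2026-07-01T10:00:19 container die app exitCode=1'

deaths=$(printf '%s\n' "$sample_events" | count_die_events)
if [ "$deaths" -ge 3 ]; then   # threshold of 3 is an assumption, tune per service
  echo "possible restart loop: $deaths die events"
else
  echo "no restart loop detected: $deaths die events"
fi
```

Against a real daemon the same helper could be fed with docker events --since 2h --until "$(date +%s)" --filter container=app --filter event=die. The --until bound matters because it lets the command return instead of streaming forever.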

9. Exec vs Debug vs Recreate

9.1 docker exec

docker exec runs a new command in a running container.

docker exec -it app sh

docker exec app id

docker exec app ps aux

docker exec app cat /etc/resolv.conf

Limitations:

  • the container must be running
  • the image must contain the command you want to run
  • slim/distroless images may not include a shell
  • mutating a running container can invalidate evidence

9.2 docker debug

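Docker Desktop provides docker debug to enter or inspect containers and images even when the image itself does not include common debugging tools. It ships with Docker Desktop rather than with a plain Docker Engine install, so confirm it is available in your environment before relying on it. This matters because production images should be small and secure, not packed with shells, package managers, and troubleshooting utilities.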

Typical intent:

docker debug app

Use it to inspect:

  • process state
  • network settings
  • mounted files
  • filesystem content
  • runtime assumptions

The key engineering idea:

Do not bloat production images just to make debugging convenient. Attach debugging capability when needed through controlled tooling.

9.3 Recreate for Clean Evidence

Sometimes the safest debug step is not to mutate the running container but to create a comparable one:

docker run --rm -it \
  --entrypoint sh \
  my-app:prod

or inspect the image:

docker image inspect my-app:prod

For crashed containers, preserve evidence before cleanup:

docker ps -a

docker logs crashed-app

docker inspect crashed-app > crashed-app.inspect.json

docker cp crashed-app:/app/logs ./extracted-logs || true

10. Debugging Runbook: Container Exits Immediately

Symptom:

docker run my-app
# container exits immediately

Evidence path:

docker ps -a --no-trunc

docker logs <container>

docker inspect <container> | jq '.[0].State'

docker inspect <container> | jq '.[0].Config.Entrypoint, .[0].Config.Cmd, .[0].Config.Env'

Common causes:

| Cause | Evidence | Fix |
| --- | --- | --- |
| wrong command/entrypoint | exec: not found, exit 127 | fix Dockerfile ENTRYPOINT/CMD |
| missing executable permission | permission denied | chmod +x at build time, correct copy mode |
| app config missing | logs mention missing env/config | inject required config properly |
| dependency unavailable | connection refused/name not resolved | fix dependency host/readiness/retry |
| app runs foreground incorrectly | main process exits | run actual long-lived process |
| architecture mismatch | exec format error | build correct platform |
| file path wrong | no such file | correct WORKDIR, copy path, command |
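The exit code itself narrows the cause before any log is read. The sketch below is a minimal illustration that assumes the conventions documented for docker run (125 for a Docker-side error, 126 for a command that cannot be invoked, 127 for a command that cannot be found) and the common convention that 128+N means termination by signal N. explain_exit_code is a hypothetical helper name.

```bash
#!/bin/sh
# explain_exit_code CODE: map a container exit code to a likely meaning.
# Hypothetical helper; the wording of each message is illustrative.
explain_exit_code() {
  code=$1
  case "$code" in
    0)   echo "clean exit" ;;
    125) echo "docker run itself failed" ;;
    126) echo "command found but could not be invoked" ;;
    127) echo "command not found" ;;
    *)
      if [ "$code" -gt 128 ] && [ "$code" -le 255 ]; then
        # 137 = 128 + 9 (SIGKILL), 143 = 128 + 15 (SIGTERM)
        echo "terminated by signal $((code - 128))"
      else
        echo "application-defined failure, read the logs"
      fi ;;
  esac
}

for code in 1 127 137; do
  echo "$code: $(explain_exit_code "$code")"
done
```

A real code can be supplied with docker inspect -f '{{.State.ExitCode}}' <container>. A clean exit (0) on a service that should stay up usually means the main process was never a long-lived foreground process.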

Decision tree:

11. Debugging Runbook: Container Is Running but App Is Unreachable

Symptom:

curl localhost:8080 fails

Evidence path:

docker ps

docker port app

docker inspect app | jq '.[0].NetworkSettings.Ports'

docker exec app ss -ltnp || docker exec app netstat -ltnp || true

docker logs --tail=100 app

Common causes:

| Cause | Evidence | Fix |
| --- | --- | --- |
| port not published | docker port empty | add -p host:container or Compose ports |
| app listens on wrong port | process listens on 8081 not 8080 | align app config and port mapping |
| app binds to 127.0.0.1 inside container | only loopback listener | bind to 0.0.0.0 inside container |
| host firewall blocks port | container port mapping exists | fix host firewall/security group |
| wrong protocol | HTTP vs HTTPS/TCP | test correct protocol |
| reverse proxy target wrong | proxy logs 502 | use service name and correct internal port |

Important principle:

Publishing a port maps host traffic to a container port. It does not force the application inside the container to listen on that port or on the correct interface.

Wrong app behavior:

Listening on 127.0.0.1:8080

Better container behavior:

Listening on 0.0.0.0:8080
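The difference between the two listeners can be read straight from ss -ltn output. The sketch below is a minimal illustration under stated assumptions: listener_scope is a hypothetical helper, the sample text imitates the usual ss column layout (local address in the fourth column), and the image is assumed to contain ss when run against a real container.

```bash
#!/bin/sh
# listener_scope PORT: read `ss -ltn` output on stdin and classify how PORT is bound.
# Prints one of: all-interfaces, specific-address, loopback-only, not-listening.
listener_scope() {
  awk -v port="$1" '
    $1 == "LISTEN" {
      n = split($4, parts, ":")            # IPv6 addresses contain colons; port is last
      if (parts[n] != port) next
      addr = substr($4, 1, length($4) - length(port) - 1)
      if (addr == "0.0.0.0" || addr == "*" || addr == "[::]") any = 1
      else if (addr == "127.0.0.1" || addr == "[::1]") loopback = 1
      else other = 1
    }
    END {
      if (any) print "all-interfaces"
      else if (other) print "specific-address"
      else if (loopback) print "loopback-only"
      else print "not-listening"
    }
  '
}

# Hypothetical sample that imitates `ss -ltn` output for the wrong behavior.
sample_ss='State  Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0      128    127.0.0.1:8080     0.0.0.0:*'

printf '%s\n' "$sample_ss" | listener_scope 8080
```

With a running container, the input could come from docker exec app ss -ltn. A loopback-only result explains why a correctly published port still refuses connections from the host.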

12. Debugging Runbook: Compose Service Cannot Reach Another Service

Symptom:

api cannot connect to postgres

Evidence path:

docker compose ps

docker compose logs api postgres

docker compose exec api cat /etc/resolv.conf

docker compose exec api getent hosts postgres

docker compose exec api sh -c 'nc -vz postgres 5432' || true

docker network ls

docker network inspect <project>_default

Common causes:

| Cause | Evidence | Fix |
| --- | --- | --- |
| using localhost | env shows localhost:5432 | use postgres:5432 |
| service not on same network | network inspect missing peer | attach services to same network |
| database not ready | DNS resolves but connection fails | healthcheck/retry/backoff |
| wrong exposed port | using host-published port internally | use container port inside network |
| wrong credentials/db name | auth errors | fix env/secrets |
| dependency crashed | compose ps shows exited | debug dependency first |

Internal Compose connectivity should use service names and container ports.

Do not do this inside Compose:

DATABASE_URL: postgres://localhost:5432/app

Use:

DATABASE_URL: postgres://postgres:5432/app
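This trap is easy to catch before deployment by extracting the host part of each connection URL. The sketch below is a minimal illustration: url_host and is_loopback_host are hypothetical helpers, and the parsing is deliberately simple (it does not handle bracketed IPv6 hosts or query strings).

```bash
#!/bin/sh
# url_host URL: print the host part of scheme://[user[:password]@]host[:port]/path.
url_host() {
  rest=${1#*://}        # drop the scheme
  rest=${rest%%/*}      # drop the path
  rest=${rest##*@}      # drop optional user:password@
  echo "${rest%%:*}"    # drop optional :port
}

# is_loopback_host HOST: succeed when HOST points back at the container itself.
is_loopback_host() {
  case "$1" in
    localhost|127.*) return 0 ;;
    *) return 1 ;;
  esac
}

for url in postgres://localhost:5432/app postgres://postgres:5432/app; do
  host=$(url_host "$url")
  if is_loopback_host "$host"; then
    echo "$url -> $host (loopback: resolves inside the api container)"
  else
    echo "$url -> $host (not loopback)"
  fi
done
```

Inside the api container, localhost is the api container's own network namespace, so the first URL can only work if the database runs in the same container.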

13. Debugging Runbook: Permission Denied on Mounted Path

Symptom:

Permission denied: /data/file

Evidence path:

docker inspect app | jq '.[0].Config.User, .[0].Mounts'

docker exec app id

docker exec app ls -lna /data

ls -lna ./data

Common causes:

| Cause | Evidence | Fix |
| --- | --- | --- |
| container UID cannot write host dir | numeric UID mismatch | align ownership or runtime user |
| bind mount read-only | inspect shows RW false | make writable only if safe |
| rootfs read-only | write outside allowed path fails | write to declared tmpfs/volume |
| named volume initialized as root | volume content root-owned | initialization/chown strategy |
| AppArmor/SELinux denial | host audit logs | correct label/profile/policy |

Do not fix blindly with:

chmod -R 777 data

A better fix is to understand the numeric identity contract.
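The numeric identity contract can be reasoned about with the numbers that id and ls -lna already print. The sketch below is a deliberately simplified model: can_write is a hypothetical helper that checks only the classic owner/group/other write bit. It ignores supplementary groups, ACLs, SELinux/AppArmor, read-only mounts, user namespace remapping, and the execute bit that directories also need.

```bash
#!/bin/sh
# can_write UID GID OWNER_UID OWNER_GID MODE
# Prints "yes" or "no" using only the classic permission triplets.
# MODE is the octal mode as shown by stat or ls, for example 700 or 755.
can_write() {
  cw_uid=$1 cw_gid=$2 cw_owner=$3 cw_group=$4
  cw_bits=$((0$5))                            # leading 0 makes the shell parse octal
  if [ "$cw_uid" -eq 0 ]; then
    echo yes                                  # root bypasses mode bits in this model
    return 0
  fi
  if [ "$cw_uid" -eq "$cw_owner" ]; then
    cw_write=$(( (cw_bits >> 6) & 2 ))        # owner triplet
  elif [ "$cw_gid" -eq "$cw_group" ]; then
    cw_write=$(( (cw_bits >> 3) & 2 ))        # group triplet
  else
    cw_write=$(( cw_bits & 2 ))               # other triplet
  fi
  if [ "$cw_write" -ne 0 ]; then echo yes; else echo no; fi
}

# App runs as 10001:10001, bind-mounted host directory is 1000:1000 with mode 700.
echo "uid 10001 on 1000:1000 mode 700 -> $(can_write 10001 10001 1000 1000 700)"
# Same directory after aligning ownership with the runtime user.
echo "uid 10001 on 10001:10001 mode 700 -> $(can_write 10001 10001 10001 10001 700)"
```

The second line shows the targeted fix: align ownership (or the runtime user) so the owner triplet applies, instead of opening the directory to everyone.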

14. Debugging Runbook: Healthcheck Is Failing

Symptom:

docker ps
# STATUS: Up 2 minutes (unhealthy)

Evidence path:

docker inspect app | jq '.[0].State.Health'

docker logs --tail=100 app

docker exec app sh -c '<healthcheck-command>'

Common causes:

| Cause | Evidence | Fix |
| --- | --- | --- |
| command missing | health log says executable not found | use available binary or app endpoint |
| endpoint too strict | app works but health fails | separate liveness/readiness semantics |
| startup takes longer | fails early then recovers | add start period / interval tuning |
| dependency included in liveness | unhealthy during dependency blip | don't confuse liveness with readiness |
| wrong port/path | 404/connection refused | align healthcheck command |

Healthcheck design matters. A healthcheck is not just a command; it is a lifecycle signal consumed by humans and orchestrators.

Bad healthcheck:

HEALTHCHECK CMD curl -f http://localhost:8080/full-dependency-check || exit 1

Better liveness-oriented healthcheck:

HEALTHCHECK --interval=30s --timeout=3s --start-period=20s --retries=3 \
  CMD wget -qO- http://localhost:8080/health/live || exit 1

Readiness often belongs at the orchestrator/reverse proxy layer or in application startup logic, depending on platform.

15. Debugging Runbook: OOMKilled or Memory Pressure

Symptom:

docker inspect app | jq '.[0].State.OOMKilled'
# true

Evidence path:

docker inspect app | jq '.[0].State'

docker inspect app | jq '.[0].HostConfig.Memory, .[0].HostConfig.MemorySwap'

docker stats app

docker logs --tail=200 app

Common causes:

| Cause | Evidence | Fix |
| --- | --- | --- |
| memory limit too low | OOMKilled true | set realistic limit |
| JVM/Go/runtime unaware of limit | heap too large | configure runtime memory behavior |
| memory leak | increasing RSS over time | fix app, add profiling |
| bursty workload | spikes before death | queue/backpressure/concurrency limit |
| page cache/IO pressure | system-level pressure | capacity planning |

For Java containers, do not only set Docker memory limit. Make sure JVM ergonomics are understood. Modern JVMs are container-aware, but application-specific heap, direct memory, metaspace, thread stacks, native buffers, and off-heap caches still need a memory budget.

Example memory budget:

Container limit: 1024 MiB
JVM heap:        512 MiB
Metaspace:       128 MiB
Direct/native:   128 MiB
Thread stacks:    64 MiB
OS/libs/headroom:192 MiB
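The budget only helps if the parts are actually added up and compared with the limit Docker enforces. The sketch below is a minimal illustration using the numbers above: sum_mib and bytes_to_mib are hypothetical helpers, and 1073741824 is the byte value that .HostConfig.Memory reports for a 1024 MiB limit (0 would mean no limit is set).

```bash
#!/bin/sh
# sum_mib PART...: add up memory budget components given in MiB.
sum_mib() {
  total=0
  for part in "$@"; do
    total=$((total + part))
  done
  echo "$total"
}

# bytes_to_mib BYTES: convert the byte value from .HostConfig.Memory to MiB.
bytes_to_mib() {
  echo $(( $1 / 1048576 ))
}

limit_mib=$(bytes_to_mib 1073741824)        # 1024 MiB container limit
budget_mib=$(sum_mib 512 128 128 64 192)    # heap, metaspace, direct/native, stacks, headroom

if [ "$budget_mib" -le "$limit_mib" ]; then
  echo "budget ${budget_mib} MiB fits limit ${limit_mib} MiB"
else
  echo "budget ${budget_mib} MiB exceeds limit ${limit_mib} MiB"
fi
```

Raising the heap to 768 MiB without touching the other parts pushes the total to 1280 MiB, which is the kind of configuration that ends in OOMKilled even though the heap alone is below the limit.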

16. Debugging Runbook: CPU Throttling and Slow App

Symptom:

app works but latency is high under load

Evidence path:

docker stats app

docker inspect app | jq '.[0].HostConfig.NanoCpus, .[0].HostConfig.CpuQuota, .[0].HostConfig.CpuPeriod, .[0].HostConfig.CpuShares'

Host-level checks may be needed:

cat /sys/fs/cgroup/*/cpu.stat 2>/dev/null || true

Common causes:

| Cause | Evidence | Fix |
| --- | --- | --- |
| CPU quota too low | high CPU %, throttling | increase quota or reduce concurrency |
| too many app threads | context switching, latency | tune thread pools |
| noisy neighbor | host saturation | isolate workload/capacity |
| build tasks in same host | CPU spikes | separate build/runtime capacity |
| app blocks on IO | CPU low, latency high | debug network/storage dependency |

Do not confuse CPU limit with CPU reservation. A container can be throttled even though the application logic is correct.
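Throttling becomes concrete once the counters in cpu.stat are turned into a ratio. The sketch below is a minimal illustration: throttled_percent and nano_cpus_to_millicores are hypothetical helpers, the sample imitates a cgroup cpu.stat file with made-up numbers, and it assumes the nr_periods and nr_throttled counters are present.

```bash
#!/bin/sh
# throttled_percent: read cpu.stat text on stdin and print the share of
# enforcement periods in which the cgroup was throttled (integer percent).
throttled_percent() {
  awk '
    $1 == "nr_periods"   { periods = $2 }
    $1 == "nr_throttled" { throttled = $2 }
    END {
      if (periods > 0) printf "%d\n", (100 * throttled) / periods
      else print 0
    }
  '
}

# nano_cpus_to_millicores N: .HostConfig.NanoCpus uses 1e9 units per CPU.
nano_cpus_to_millicores() {
  echo $(( $1 / 1000000 ))
}

# Hypothetical sample values, not measurements.
sample_cpu_stat='usage_usec 5000000
user_usec 4000000
system_usec 1000000
nr_periods 1000
nr_throttled 250
throttled_usec 750000'

echo "limit: $(nano_cpus_to_millicores 1500000000) millicores"
echo "throttled in $(printf '%s\n' "$sample_cpu_stat" | throttled_percent)% of periods"
```

A high throttled share combined with acceptable average CPU usage points at a quota that is too tight for bursts, not at slow application code.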

17. Debugging Runbook: Image Builds but Runtime File Missing

Symptom:

No such file or directory: /app/app.jar

Evidence path:

docker image inspect my-app

docker run --rm --entrypoint sh my-app -c 'ls -lna /app || true'

docker history my-app

docker build --no-cache --progress=plain .

Common causes:

| Cause | Evidence | Fix |
| --- | --- | --- |
| wrong COPY path | build logs | fix build context/path |
| .dockerignore excluded file | file absent in context | update .dockerignore |
| multi-stage wrong source | COPY --from wrong | correct stage alias/path |
| mount masks image content | app exists in image but not container | inspect mounts |
| architecture mismatch | exec format error | build correct platform |

Important boundary:

A file can exist in the image but disappear in the running container if a mount overlays the target path.

Example:

services:
  app:
    image: my-app
    volumes:
      - ./empty-dir:/app

This hides the image's /app directory behind the host directory, so files baked into the image at that path are not visible in the container.

18. Debugging Runbook: Minimal Image Has No Shell

Symptom:

docker exec -it app sh
# executable file not found

This is not a bad image. It may be a good production image.

Options:

  1. Use docker debug.
  2. Run a separate debug container on the same network.
  3. Use a development/debug image variant.
  4. Recreate with an alternate entrypoint only if the image has necessary tools.
  5. Inspect artifacts using docker cp or image export techniques.

Example sidecar network debug:

docker run --rm -it \
  --network container:app \
  nicolaka/netshoot

Use third-party debug images carefully. In restricted environments, maintain an internal approved debug toolbox image.

19. Debugging Network Namespaces

For advanced Linux debugging, the key concept is namespace sharing.

Debug from a helper container sharing the target's network namespace:

docker run --rm -it \
  --network container:app \
  alpine sh

Inside helper:

ip addr
ip route
cat /etc/resolv.conf
wget -qO- http://127.0.0.1:8080/health || true

This helps when the target image lacks tools.

On a Linux host, nsenter can be used when you know the target process PID:

PID=$(docker inspect -f '{{.State.Pid}}' app)
sudo nsenter -t "$PID" -n ip addr
sudo nsenter -t "$PID" -n ip route

Use host-level namespace debugging carefully in production. It is powerful and can bypass some operational guardrails.

20. Debugging Filesystem Mutation

Find unexpected writes:

docker diff app

Output meanings:

A = Added
C = Changed
D = Deleted

Example:

C /etc/app/config.yml
A /tmp/cache.bin
A /app/generated.log

Interpretation:

  • /tmp/cache.bin may be fine if /tmp is declared writable.
  • /app/generated.log may indicate the app writes into the immutable application directory.
  • /etc/app/config.yml changed at runtime may be dangerous unless explicitly designed.
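The same interpretation can be applied mechanically by declaring which paths are allowed to change. The sketch below is a minimal illustration: unexpected_writes is a hypothetical helper that reads docker diff output on stdin and prints every entry that is not under one of the allowed prefixes passed as arguments.

```bash
#!/bin/sh
# unexpected_writes PREFIX...: filter `docker diff` output on stdin and print
# entries whose path is not equal to, or below, one of the allowed prefixes.
unexpected_writes() {
  while read -r kind path; do
    allowed=no
    for prefix in "$@"; do
      case "$path" in
        "$prefix"|"$prefix"/*) allowed=yes ;;
      esac
    done
    [ "$allowed" = yes ] || echo "$kind $path"
  done
}

# The example output from above.
sample_diff='C /etc/app/config.yml
A /tmp/cache.bin
A /app/generated.log'

# Only /tmp is declared writable in this example.
printf '%s\n' "$sample_diff" | unexpected_writes /tmp
```

Real docker diff output also lists parent directories as changed (for example C /app), so expect a few extra lines when piping docker diff app into the helper.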

Extract files:

docker cp app:/app/generated.log ./generated.log

Compare with expected image state:

docker run --rm --entrypoint sh my-app -c 'find /app -maxdepth 2 -type f -ls'

21. Debugging Docker Daemon and Desktop Issues

Sometimes the container is not the problem. Docker itself or Docker Desktop integration is.

Symptoms:

  • Docker commands hang
  • pulls fail despite internet working on host
  • DNS works on host but not container
  • volumes behave differently on macOS/Windows
  • file sharing path not mounted
  • WSL integration issue
  • daemon restart disrupted containers

Evidence path:

docker version

docker info

docker system df

docker events --since 1h

On Linux hosts:

journalctl -u docker --since "1 hour ago"

Docker daemon debug logging can be enabled through daemon.json, but do not leave verbose debug logging on by default in production unless approved.

Docker Desktop has diagnostic tooling and logs accessible through its troubleshooting UI. For team environments, record the exact Desktop version, backend, OS version, and resource settings.

22. Debugging Compose as an Application Graph

Compose debugging is graph debugging.

Useful commands:

docker compose config

docker compose ps

docker compose logs -f --tail=100

docker compose events

docker compose top

docker compose exec api sh

docker compose down --remove-orphans

22.1 Render the Effective Config

Always inspect effective Compose config:

docker compose config

This catches:

  • environment interpolation mistakes
  • override file effects
  • profile activation assumptions
  • invalid indentation hidden by YAML anchors
  • unexpected default network/volume names

22.2 Orphans and Project Names

Compose project names affect network, volume, and container names.

Symptoms of project-name confusion:

  • old containers still running
  • wrong network used
  • app connects to stale database
  • volume from previous run reused

Useful cleanup:

docker compose down --remove-orphans

For test isolation:

docker compose -p "test_${BUILD_ID}" up -d

Then cleanup:

docker compose -p "test_${BUILD_ID}" down -v --remove-orphans
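CI identifiers are not always valid project names: Compose expects lowercase letters, digits, dashes, and underscores, starting with a lowercase letter or digit. The sketch below is a minimal illustration of normalizing an identifier first. compose_project_name is a hypothetical helper, and PR-42/main is a made-up build identifier.

```bash
#!/bin/sh
# compose_project_name ID: build a project name of the form test_<id> that
# contains only lowercase letters, digits, dashes, and underscores.
compose_project_name() {
  printf 'test_%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed 's/[^a-z0-9_-]/_/g'
}

compose_project_name 'PR-42/main'
```

The result can be passed to both commands above, for example docker compose -p "$(compose_project_name "$BUILD_ID")" up -d, so that startup and cleanup always target the same project.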

23. Debugging Build Failures

Build failures often involve context, cache, platform, credentials, or multi-stage references.

Evidence path:

docker build --progress=plain .

docker build --no-cache --progress=plain .

docker buildx build --progress=plain --platform linux/amd64 .

Common causes:

| Cause | Evidence | Fix |
| --- | --- | --- |
| missing file in context | COPY failed | check build context and .dockerignore |
| cache hides stale state | build succeeds unexpectedly | use --no-cache for diagnosis |
| wrong platform | exec format or package mismatch | set --platform intentionally |
| private dependency auth | 401/403 | use BuildKit secrets/SSH, not copied credentials |
| multi-stage alias wrong | missing from stage | name stages clearly |
| package repo transient issue | apt/apk timeout | retry strategy, mirror policy |

Do not put credentials into Dockerfile or build args as a lazy debug step. Use BuildKit secret/SSH mechanisms.

24. Debugging Decision Tree

The decision tree is intentionally boring. Good incident response is often boring because it follows evidence instead of drama.

25. Safe Debugging Rules

  1. Preserve evidence before cleanup.
  2. Avoid mutating production containers unless required for mitigation.
  3. Do not install tools into a running production container as the default debugging method.
  4. Prefer debug sidecar/toolbox patterns.
  5. Export inspect, logs, events, and Compose config into incident artifacts.
  6. Do not print secrets from environment variables into shared channels.
  7. Use read-only inspection where possible.
  8. Reproduce in disposable environment before changing platform policy.
  9. Do not solve unknown permission issues with --privileged.
  10. Convert repeated incidents into runbooks and tests.
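Rule 6 is easier to follow when redaction is part of the evidence pipeline. The sketch below is a minimal illustration: redact_env is a hypothetical filter, the key patterns are an assumption that will not match every secret name, and the sample values are made up. It does not catch credentials embedded inside values such as connection URLs.

```bash
#!/bin/sh
# redact_env: read KEY=VALUE lines on stdin and replace the value of any key
# that looks secret-bearing. The pattern list is a heuristic, not a guarantee.
redact_env() {
  awk -F= '
    {
      key = $1
      if (tolower(key) ~ /(password|passwd|secret|token|api_?key|credential)/)
        print key "=<redacted>"
      else
        print
    }
  '
}

# Hypothetical environment for illustration only.
sample_env='DATABASE_HOST=postgres
DATABASE_PORT=5432
DATABASE_PASSWORD=hunter2
API_KEY=abc=def'

printf '%s\n' "$sample_env" | redact_env
```

For a real container, the input could come from docker inspect -f '{{range .Config.Env}}{{println .}}{{end}}' app. Review the output before sharing it, since a heuristic filter can miss unusually named secrets.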

26. Incident Artifact Template

When debugging a non-trivial Docker issue, collect a structured artifact:

# Docker Incident Artifact

## Summary
- Symptom:
- First observed:
- Impact:
- Environment:

## Docker Version
```bash
docker version
docker info
```

## Container State
```bash
docker ps -a --no-trunc
docker inspect <container>
```

## Logs
```bash
docker logs --timestamps --tail=500 <container>
```

## Events
```bash
docker events --since <time> --until <time>
```

## Compose Effective Config
```bash
docker compose config
```

## Network Evidence
```bash
docker network inspect <network>
docker port <container>
```

## Mount/Volume Evidence
```bash
docker inspect <container> | jq '.[0].Mounts'
docker volume inspect <volume>
```

## Resource Evidence
```bash
docker stats --no-stream <container>
```

## Root Cause

## Fix

## Prevention

This makes debugging reviewable and teachable.

27. Practice Labs

Lab 1: Exit Code and Entrypoint

  1. Build an image with a wrong ENTRYPOINT path.
  2. Run it and observe failure.
  3. Use docker ps -a, logs, and inspect to identify the error.
  4. Fix the Dockerfile.

Expected learning:

  • Container exit is often command contract failure.
  • inspect shows what Docker actually tried to run.

Lab 2: Port Binding Failure

  1. Run an app that listens on 127.0.0.1 inside the container.
  2. Publish the port.
  3. Try to reach it from the host.
  4. Change app binding to 0.0.0.0.
  5. Explain the difference.

Expected learning:

  • Published ports do not fix app listener binding.

Lab 3: Compose localhost Trap

  1. Create api and postgres services.
  2. Configure api to connect to localhost.
  3. Observe failure.
  4. Change to postgres.
  5. Use getent hosts postgres from inside api.

Expected learning:

  • Container localhost is not dependency localhost.

Lab 4: Mount Masks Image Content

  1. Build an image with /app/app.jar.
  2. Run it with -v ./empty:/app.
  3. Observe missing file.
  4. Inspect mounts.
  5. Explain why the file exists in the image but not the container.

Expected learning:

  • Mount targets overlay image paths.

Lab 5: OOMKilled

  1. Run a memory-hungry process with a low memory limit.
  2. Observe OOMKilled in inspect.
  3. Check logs and stats.
  4. Increase memory or reduce workload.

Expected learning:

  • Not every crash is an application exception.

Lab 6: Minimal Image Debugging

  1. Run a minimal image with no shell.
  2. Try docker exec -it app sh.
  3. Use docker debug or a toolbox container sharing the network namespace.
  4. Inspect network/process/filesystem.

Expected learning:

  • Production images can be minimal and still debuggable with the right method.

28. Patterns for Better Debuggability

28.1 Emit Logs to stdout/stderr

Container platforms expect logs on stdout/stderr.

Avoid hidden log files unless an agent explicitly collects them.

28.2 Make Config Visible Without Exposing Secrets

At startup, log safe config summary:

profile=prod
http.port=8080
database.host=postgres
database.port=5432
feature.x.enabled=true
secret.database.password=<redacted>

This saves hours.

28.3 Include Health Endpoints

Expose clear endpoints:

  • /health/live — process is alive
  • /health/ready — ready to serve traffic
  • /metrics — operational metrics

Do not make liveness depend on every downstream dependency.

28.4 Label Containers

Labels make filtering easier:

LABEL org.opencontainers.image.source="https://example.com/repo"
LABEL com.example.service="billing-api"
LABEL com.example.team="payments-platform"

Compose:

services:
  app:
    labels:
      com.example.service: billing-api
      com.example.team: payments-platform

Then:

docker ps --filter label=com.example.team=payments-platform

28.5 Provide a Debug Variant

Production image:

my-app:1.4.2

Debug image:

my-app:1.4.2-debug

Rules:

  • same app artifact
  • extra tools allowed
  • not used as production runtime
  • stored in same registry with clear policy
  • scanned separately

28.6 Make Startup Failures Explicit

Bad:

Error occurred

Good:

ConfigurationError: DATABASE_URL is required. Expected format: jdbc:postgresql://<host>:<port>/<db>

Container debugging quality often begins in application error quality.

29. Common Root Cause Patterns

| Symptom | Root Cause Pattern | Prevention |
| --- | --- | --- |
| Works on laptop, fails in CI | hidden bind mount, env var, platform difference | render config, isolate test project, avoid host assumptions |
| Works with docker run, fails in Compose | different network/env/command | compare inspect and compose config |
| Works in Compose, fails in production | Compose conveniences not present | define production runtime contract separately |
| Random permission errors | UID/GID not specified | fixed non-root identity and volume init policy |
| Random DNS failures | host resolver/proxy/custom DNS drift | explicit DNS/network policy |
| Unreachable service | wrong bind address or port mapping | bind 0.0.0.0, document internal/external ports |
| Repeated restarts | restart policy masking app crash | inspect events and exit code |
| Image too hard to debug | minimal image without debug plan | docker debug, toolbox, debug variant |
| Disk fills host | unused images/volumes/logs | system df, log rotation, prune policy |
| Secrets exposed | printed env/logs or copied into image | runtime secrets and redaction |

30. The Senior Engineer's Docker Debug Loop

Use this loop until it becomes automatic: symptom -> evidence -> cause -> fix -> prevention.

The prevention step is where top engineers separate themselves.

A fix without prevention is local heroism.

A fix with runbook, test, config validation, or platform default is engineering leverage.

31. Summary

Docker debugging is structured boundary reasoning.

The main skill is to avoid guessing. Start with state, then inspect the realized contract, then read logs/events, then go into the container or debug namespace only when needed.

The most important debugging questions are:

  • What did Docker create?
  • What process actually ran?
  • What user did it run as?
  • What filesystem did it see?
  • What network namespace did it use?
  • What DNS resolver did it use?
  • What resource envelope constrained it?
  • What did Docker events say happened over time?
  • What changed between the environment where it works and the one where it fails?

A container is easy to run. A containerized system is easy to misunderstand.

Debugging is the discipline that closes that gap.
