Logging, Diagnostics, dan Runtime Visibility
Part 020 — Logging, Diagnostics, dan Runtime Visibility
Membahas logging dan diagnostics Python: logging levels, logger hierarchy, structured context, exception logging, correlation id, metrics mindset, debugging, runtime visibility, dan operational design.
Part 020 — Logging, Diagnostics, dan Runtime Visibility
1. Tujuan Part Ini
Kode yang benar di test belum tentu mudah dioperasikan.
Saat aplikasi berjalan di mesin user, CI, server, container, job scheduler, atau production environment, kamu perlu menjawab:
- Apa yang terjadi?
- Command apa yang dijalankan?
- Input apa yang diproses?
- Case mana yang gagal?
- Error-nya apa?
- Root cause di mana?
- Apakah failure terjadi di domain, storage, atau boundary?
- Apakah data berhasil disimpan?
- Berapa lama operasi berlangsung?
- Berapa banyak item diproses?
- Apakah retry terjadi?
- Apakah user perlu aksi?
- Apakah engineer perlu investigasi?
Logging dan diagnostics memberi runtime visibility.
Part ini membahas:
logging;- log levels;
- logger hierarchy;
- exception logging;
- context;
- correlation id;
- diagnostic messages;
printvs logging;- debugging;
- timing;
- operational events;
- testing logs;
- penerapan ke
case-tracker.
Target setelah part ini:
- Memahami kapan memakai logging vs print.
- Memahami logger per module.
- Memahami log levels.
- Mengkonfigurasi logging di boundary.
- Melog exception dengan traceback.
- Menambahkan context yang cukup.
- Menghindari logging secret.
- Mengukur durasi operasi.
- Mendesain diagnostic messages.
- Menambahkan runtime visibility ke
case-tracker.
2. Print vs Logging
print cocok untuk:
- CLI user output;
- quick scratch debugging;
- command result;
- educational examples.
logging cocok untuk:
- diagnostics;
- internal application events;
- warnings/errors;
- operational visibility;
- troubleshooting;
- libraries;
- long-running apps;
- structured runtime information.
Example CLI:
print(f"Created {case.id}: {case.title}")
This is user-facing output.
Example logging:
logger.info("Created case %s", case.id)
This is diagnostic output.
Rule:
Boundary may print user output. Internal modules should log diagnostics or raise errors, not print.
3. Logger per Module
Use:
import logging
logger = logging.getLogger(__name__)
In case_tracker.service, logger name becomes:
case_tracker.service
In case_tracker.storage:
case_tracker.storage
Benefits:
- logs show source module;
- log levels can be configured per module;
- library code does not configure global logging;
- diagnostics are organized.
Do not call basicConfig in library/domain modules. Configure logging at application boundary.
4. Logging Configuration at Boundary
In CLI:
import logging
def configure_logging(verbose: bool) -> None:
level = logging.DEBUG if verbose else logging.INFO
logging.basicConfig(
level=level,
format="%(levelname)s %(name)s: %(message)s",
)
In main:
def main(argv: list[str] | None = None) -> int:
parser = build_parser()
args = parser.parse_args(argv)
configure_logging(args.verbose)
...
Parser:
parser.add_argument("--verbose", action="store_true", help="Enable verbose diagnostic logs.")
5. Log Levels
Common levels:
| Level | Meaning |
|---|---|
DEBUG | Detailed diagnostic info for developers |
INFO | Normal significant events |
WARNING | Something unexpected but operation may continue |
ERROR | Operation failed |
CRITICAL | Severe failure, system may be unusable |
Examples:
logger.debug("Loaded raw case data from %s", path)
logger.info("Created case %s", case.id)
logger.warning("Case store is empty: %s", path)
logger.error("Failed to save cases to %s", path)
logger.critical("Cannot initialize application")
5.1 Level Judgment
Bad:
logger.error("User entered invalid status")
If invalid status is normal user error, INFO or WARNING may be enough.
Bad:
logger.info("Unexpected exception")
Unexpected exception should be ERROR.
Log level should reflect operational severity.
6. Message Style
Prefer lazy formatting:
logger.info("Transitioning case %s to %s", case_id, target_status.value)
Avoid:
logger.info(f"Transitioning case {case_id} to {target_status.value}")
Why?
- logging can defer formatting;
- style is consistent;
- structured logging adapters can work better;
- avoids cost when level disabled.
7. What to Log
Log events that help reconstruct behavior:
- application start/stop;
- command received;
- important domain action;
- persistence read/write;
- external call start/failure;
- validation failure if operationally relevant;
- retry;
- fallback;
- migration;
- unexpected exception;
- performance timing;
- data counts.
Do not log every line. Noise hides signal.
7.1 Case Tracker Useful Logs
- store path selected;
- number of cases loaded;
- case created;
- transition attempted;
- transition succeeded;
- invalid transition rejected;
- save succeeded;
- store corrupted;
- CSV export completed;
- command duration.
8. What Not to Log
Avoid logging:
- passwords;
- tokens;
- API keys;
- full personal data if not needed;
- sensitive case content;
- large payloads;
- raw user input that may contain secrets;
- excessive stack traces for expected errors;
- data regulated by privacy/security policy.
For regulatory/case systems, logs themselves can become sensitive records. Design retention and access accordingly.
9. Logging Exceptions
Use logger.exception inside except.
try:
run_command()
except Exception:
logger.exception("Unexpected command failure")
return 1
logger.exception logs at ERROR level and includes traceback.
Equivalent:
logger.error("Unexpected command failure", exc_info=True)
Do not log exception and then swallow silently unless boundary deliberately handles it.
10. Expected vs Unexpected Errors
Expected user/domain error:
try:
case.transition_to(target_status)
except InvalidCaseTransitionError as error:
logger.info("Rejected invalid transition: %s", error)
print(error)
return 1
Unexpected bug:
try:
...
except Exception:
logger.exception("Unexpected error")
print("Unexpected error. See logs for details.")
return 1
During learning/development, letting traceback show can be useful. For production CLI/user-facing app, show friendly message and log detail.
11. Exception Context
Log context, not just error.
Weak:
logger.exception("Save failed")
Better:
logger.exception("Failed to save %d cases to %s", len(cases), path)
Context should answer:
- operation?
- entity id?
- path/resource?
- count?
- target state?
- user/request/job id if available?
- duration?
But avoid sensitive data.
12. Correlation ID
A correlation ID ties logs from one operation together.
For CLI command:
from uuid import uuid4
command_id = str(uuid4())
Log:
logger.info("Starting command %s", command_id)
Better with LoggerAdapter:
adapter = logging.LoggerAdapter(logger, {"command_id": command_id})
adapter.info("Starting command")
But default formatter must include extra field or adapter context will not show.
Simple approach for CLI:
logger.info("command_id=%s Starting command", command_id)
For web systems, correlation/request IDs become critical.
13. Structured Logging Mindset
Plain text log:
INFO case_tracker.service: Created case CASE-001
Structured style:
event=case_created case_id=CASE-001 status=DRAFT
Or JSON logs in production.
Standard library logging can support structured-ish messages, but full structured logging often uses additional tooling.
For now, write messages with consistent key context:
logger.info("event=case_created case_id=%s", case.id)
This is searchable.
14. Logging in Libraries
Libraries should not configure global logging.
Library/module:
logger = logging.getLogger(__name__)
No basicConfig.
Application:
logging.basicConfig(...)
Why?
- application owns output format;
- application owns level;
- library should be composable;
- global config in imported module surprises users.
case-tracker as application can configure in CLI boundary.
15. Diagnostics vs Observability
Diagnostics:
- logs;
- stack traces;
- debug output;
- timing;
- local troubleshooting.
Observability broader:
- logs;
- metrics;
- traces;
- events;
- health checks;
- dashboards;
- alerts;
- correlation;
- sampling;
- SLOs.
This part focuses on foundational logging/diagnostics. Production observability will be covered later.
16. Metrics Mindset
Even without metrics system, think in counters and timings.
For command:
- cases_loaded_count;
- cases_saved_count;
- command_duration_ms;
- invalid_transition_count;
- store_load_error_count;
- export_row_count.
Local timing:
from time import perf_counter
start = perf_counter()
...
elapsed_ms = (perf_counter() - start) * 1000
logger.info("event=command_completed duration_ms=%.2f", elapsed_ms)
Use perf_counter for measuring durations.
17. Timing Context Manager
from contextlib import contextmanager
from time import perf_counter
from collections.abc import Iterator
@contextmanager
def log_duration(operation: str) -> Iterator[None]:
start = perf_counter()
try:
yield
finally:
elapsed_ms = (perf_counter() - start) * 1000
logger.info("event=operation_completed operation=%s duration_ms=%.2f", operation, elapsed_ms)
Use:
with log_duration("load_cases"):
cases = load_cases(path)
In real code, pass logger or define utility carefully to avoid global coupling.
18. Debugging with breakpoint
Use:
breakpoint()
Run code and inspect variables interactively.
Commands in pdb:
n next line
s step into
c continue
p var print variable
l list source
q quit
Do not commit breakpoints.
Ruff/linter can catch accidental breakpoint() if configured.
19. Traceback Reading
Traceback is diagnostic artifact.
Example:
Traceback (most recent call last):
File "cli.py", line 80, in main
transition_case(...)
File "service.py", line 42, in transition_case
case.transition_to(target_status)
File "domain.py", line 33, in transition_to
raise InvalidCaseTransitionError(...)
InvalidCaseTransitionError: Cannot transition case from DRAFT to CLOSED
Read from bottom:
- Exception type.
- Message.
- Line where exception raised.
- Call chain above.
Logs should preserve tracebacks for unexpected errors.
20. Diagnostic Error Messages
Good diagnostic message:
Case store is corrupted: data/cases.json. Reason: Invalid JSON
Poor message:
Failed
Good message includes:
- what failed;
- resource/entity;
- reason;
- maybe next action;
- no sensitive data.
For user-facing CLI:
Could not load case store data/cases.json: invalid JSON.
For internal log:
logger.exception("Failed to load case store path=%s", path)
21. Logging Boundary Pattern
Lower layer:
def load_cases(path: Path) -> list[Case]:
...
raise CaseStoreCorruptedError(path, "Invalid JSON")
Boundary:
try:
cases = load_cases(path)
except CaseStoreCorruptedError as error:
logger.error("event=case_store_corrupted path=%s reason=%s", error.path, error.reason)
print(error)
return 1
Do not log same error at every layer.
Rule:
Raise with context in lower layers. Log at boundary/recovery point.
22. Avoid Duplicate Logs
Bad:
def load_cases(path: Path) -> list[Case]:
try:
...
except Exception:
logger.exception("load failed")
raise
Then CLI:
except Exception:
logger.exception("command failed")
Same exception logged twice.
This creates noisy logs.
Better:
- lower layer raises contextual exception;
- boundary logs once.
Exception message/attributes carry context.
23. Logging Domain Events
Domain event concept:
@dataclass(frozen=True)
class CaseTransitioned:
case_id: CaseId
from_status: CaseStatus
to_status: CaseStatus
For now, simple log:
logger.info(
"event=case_transitioned case_id=%s from_status=%s to_status=%s",
case.id,
from_status.value,
target_status.value,
)
Later, domain events may be persisted/audited. Logs are not substitute for audit trail.
Important distinction:
| Log | Audit |
|---|---|
| Operational diagnostics | Business/legal record |
| May be sampled/rotated | Must meet retention/integrity |
| Engineer-focused | Compliance/user/business-focused |
| Not always complete | Should be complete for required events |
For regulatory systems, do not rely on logs as audit source unless explicitly designed.
24. Audit Trail vs Logging
Case management often needs audit trail:
- who changed status;
- when;
- from/to;
- reason;
- actor;
- source;
- request id;
- policy decision.
Logging can record operational event, but audit trail should be domain data.
Example audit event:
@dataclass(frozen=True)
class AuditEvent:
id: str
case_id: CaseId
event_type: str
occurred_at: datetime
actor: str
details: dict[str, str]
Store audit separately from logs if required.
25. CLI Verbosity
Add CLI flags:
parser.add_argument("-v", "--verbose", action="store_true")
parser.add_argument("--quiet", action="store_true")
Config:
def configure_logging(verbose: bool, quiet: bool) -> None:
if quiet:
level = logging.WARNING
elif verbose:
level = logging.DEBUG
else:
level = logging.INFO
logging.basicConfig(level=level, format="%(levelname)s %(name)s: %(message)s")
Be careful: user output should still be controlled separately from logs.
26. Logging to stderr vs stdout
By default, logging often goes to stderr. This is good.
CLI stdout can be machine-readable output:
case-tracker list > cases.txt
Logs on stderr do not pollute stdout.
User-facing output via print defaults to stdout. Error messages may go to stderr:
import sys
print("Error: invalid status", file=sys.stderr)
For CLI tools, stdout/stderr separation matters.
27. Testing Logs with caplog
def test_transition_logs_event(caplog, tmp_path) -> None:
path = tmp_path / "cases.json"
created = create_new_case(path, "Late reporting")
with caplog.at_level(logging.INFO):
transition_case(path, created.id, CaseStatus.SUBMITTED)
assert "case_transitioned" in caplog.text
Do not over-test logs unless logs are part of required behavior.
Test logs when:
- important operational event;
- warning/error expected;
- regression around noisy/missing logs;
- compliance-ish diagnostic requirement.
28. Testing stderr/stdout
capsys:
def test_main_prints_error_to_stderr(capsys) -> None:
exit_code = main(["transition", "CASE-001", "INVALID"])
captured = capsys.readouterr()
assert exit_code == 1
assert "Invalid status" in captured.err
If you choose errors to stdout, test that. But for CLI convention, stderr is often better for errors.
29. Runtime Diagnostics Command
Add command:
case-tracker diagnostics
Output:
Python: 3.x
Package: case-tracker 0.1.0
Store path: cases.json
Store exists: yes
Case count: 10
Useful for support.
Implementation:
import platform
from importlib.metadata import version, PackageNotFoundError
def get_package_version(package_name: str) -> str:
try:
return version(package_name)
except PackageNotFoundError:
return "unknown"
Diagnostics commands help operational troubleshooting.
30. Health Check Mindset
For services, health checks answer:
- process alive?
- dependencies reachable?
- database migration compatible?
- config valid?
- disk writable?
- queue reachable?
For CLI case-tracker, diagnostics might check:
- store path parent exists/writable;
- store file parseable;
- package version;
- Python version;
- configured log level.
This habit scales to web services later.
31. Progressive Logging for case-tracker
Add logger to modules:
service.py:
logger = logging.getLogger(__name__)
Create:
def create_new_case(path: Path, title: str) -> Case:
logger.debug("event=create_case_started path=%s", path)
cases = load_cases(path)
case = create_case(title)
cases.append(case)
save_cases(path, cases)
logger.info("event=case_created case_id=%s", case.id)
return case
Storage:
logger.debug("event=cases_loaded path=%s count=%d", path, len(cases))
CLI:
logger.info("event=command_started command=%s", args.command)
32. Logging Invalid Transitions
def transition_case(path: Path, case_id: str, target_status: CaseStatus) -> Case:
cases = load_cases(path)
case = get_case_from_list(cases, case_id)
from_status = case.status
try:
case.transition_to(target_status)
except InvalidCaseTransitionError:
logger.info(
"event=case_transition_rejected case_id=%s from_status=%s to_status=%s",
case_id,
from_status.value,
target_status.value,
)
raise
save_cases(path, cases)
logger.info(
"event=case_transitioned case_id=%s from_status=%s to_status=%s",
case_id,
from_status.value,
target_status.value,
)
return case
This logs expected domain rejection as info. Depending on ops policy, warning may be appropriate if repeated.
33. Logging Storage Corruption
In boundary:
except CaseStoreCorruptedError as error:
logger.error(
"event=case_store_corrupted path=%s reason=%s",
error.path,
error.reason,
)
print(error, file=sys.stderr)
return 1
If you use logger.exception, include traceback. For corrupted user file, traceback may be less useful at user level, but useful in debug logs.
Maybe:
logger.debug("case store corruption details", exc_info=True)
Design depends on audience.
34. Avoid Logging Sensitive Case Details
Bad:
logger.info("Created case title=%s notes=%s", case.title, case.notes)
Case title/notes may contain sensitive data.
Better:
logger.info("event=case_created case_id=%s status=%s", case.id, case.status.value)
Use identifiers and metadata, not full content, unless policy allows.
35. Logging and Performance
Logging disabled levels are cheap but not free.
Avoid expensive computation inside log call:
Bad:
logger.debug("cases=%s", expensive_render_cases(cases))
expensive_render_cases runs even if DEBUG disabled.
Better:
if logger.isEnabledFor(logging.DEBUG):
logger.debug("cases=%s", expensive_render_cases(cases))
For normal simple variables, no need to guard.
36. Warnings
Python has warnings module for warning library users about deprecated behavior.
import warnings
warnings.warn("old_function is deprecated", DeprecationWarning, stacklevel=2)
Use warnings for:
- deprecation;
- library API changes;
- runtime caution for developers.
Use logging for operational events.
Do not confuse warnings and logs.
37. Diagnostics and Exit Codes
CLI exit codes:
| Code | Meaning |
|---|---|
| 0 | success |
| 1 | general handled error |
| 2 | usage/argument error commonly used by argparse |
| >2 | specific application errors if desired |
Example:
return 0
On invalid domain input:
return 1
Argparse may exit with 2 for parse error.
For advanced CLI, define constants:
EXIT_SUCCESS = 0
EXIT_ERROR = 1
38. Debug Mode
Debug mode may:
- set logging level DEBUG;
- include traceback for expected errors;
- output config;
- enable timing;
- avoid friendly suppression.
Example:
parser.add_argument("--debug", action="store_true")
Boundary:
except CaseTrackerError as error:
if args.debug:
raise
print(error, file=sys.stderr)
return 1
Useful for local support.
39. Operational Runbook
For even small tools, write a runbook:
# Case Tracker Diagnostics
## Check version
```bash
case-tracker diagnostics
Enable verbose logs
case-tracker --verbose list
Validate store
case-tracker diagnostics --store cases.json
Common Errors
Case store is corrupted
Cause:
- invalid JSON
- partial write
- manual edit
Action:
- restore backup
- validate JSON
- run migration if available
Runbooks turn logs into action.
---
## 40. Observability Evolution Path
For `case-tracker`:
Stage 1:
- logging;
- verbose flag;
- clear error messages;
- diagnostics command.
Stage 2:
- audit events;
- duration logging;
- structured key-value logs.
Stage 3:
- metrics if long-running service;
- tracing if distributed;
- dashboards/alerts;
- correlation IDs across requests.
Do not add production observability stack to a small CLI prematurely. But design messages and boundaries with future in mind.
---
## 41. Logging Smell Checklist
Watch for:
1. `print` in domain/service/storage.
2. `basicConfig` in imported module.
3. Error logged at every layer.
4. `except Exception: logger.error(...); pass`.
5. f-string logging everywhere.
6. Logs missing entity id/context.
7. Logs include secrets/sensitive content.
8. Expected user error logged as scary exception.
9. Unexpected exception logged without traceback.
10. Huge payloads in logs.
11. Debug logs require expensive computation.
12. Logs are not searchable.
13. No command/request correlation.
14. CLI logs pollute stdout machine output.
15. Tests assert too much exact log text.
---
## 42. Practice: Add Logging Config
Add to `cli.py`:
```python
def configure_logging(verbose: bool) -> None:
level = logging.DEBUG if verbose else logging.INFO
logging.basicConfig(
level=level,
format="%(levelname)s %(name)s: %(message)s",
)
Parser:
parser.add_argument("--verbose", action="store_true")
Call after parse.
43. Practice: Add Service Logs
In service.py:
logger = logging.getLogger(__name__)
Add logs to:
create_new_case;transition_case;add_case_note;list_cases.
Keep logs contextual but not sensitive.
44. Practice: Log Duration
Implement:
@contextmanager
def log_duration(logger: logging.Logger, operation: str) -> Iterator[None]:
start = perf_counter()
try:
yield
finally:
elapsed_ms = (perf_counter() - start) * 1000
logger.info("event=operation_completed operation=%s duration_ms=%.2f", operation, elapsed_ms)
Use in CLI command handling.
45. Practice: Capture Logs in Test
def test_create_case_logs_event(tmp_path, caplog) -> None:
path = tmp_path / "cases.json"
with caplog.at_level(logging.INFO):
create_new_case(path, "Late reporting")
assert "event=case_created" in caplog.text
Keep assertion stable.
46. Practice: stderr for Errors
In CLI:
print(error, file=sys.stderr)
Test:
def test_invalid_status_prints_to_stderr(capsys) -> None:
exit_code = main(["transition", "CASE-001", "INVALID"])
captured = capsys.readouterr()
assert exit_code == 1
assert "Invalid status" in captured.err
47. Practice: Diagnostics Command
Add:
case-tracker diagnostics
Report:
- Python version;
- package version;
- store path;
- store exists;
- store parseable;
- case count.
Test with tmp_path.
48. Self-Check
Jawab tanpa melihat materi:
- Kapan memakai print?
- Kapan memakai logging?
- Kenapa logger per module?
- Kenapa logging dikonfigurasi di boundary?
- Apa arti DEBUG, INFO, WARNING, ERROR, CRITICAL?
- Kenapa lazy logging format disarankan?
- Apa yang perlu dilog untuk transition case?
- Apa yang tidak boleh dilog?
- Bagaimana melog exception dengan traceback?
- Apa beda expected dan unexpected error?
- Apa itu correlation ID?
- Apa itu structured logging mindset?
- Kenapa library tidak boleh
basicConfig? - Apa beda log dan audit trail?
- Kenapa logs biasanya ke stderr?
- Bagaimana test logs dengan
caplog? - Bagaimana test stdout/stderr dengan
capsys? - Apa guna
perf_counter? - Apa isi diagnostics command yang berguna?
- Apa logging smell paling berbahaya?
49. Definition of Done Part 020
Kamu selesai part ini jika bisa:
- Menambahkan logger per module.
- Mengkonfigurasi logging di CLI boundary.
- Menambahkan
--verbose. - Memakai log levels dengan tepat.
- Memakai lazy logging formatting.
- Melog expected domain rejection.
- Melog unexpected exception dengan traceback.
- Menghindari duplicate logs.
- Menghindari sensitive data di logs.
- Mengukur durasi operasi.
- Memahami log vs audit.
- Mengarahkan error CLI ke stderr.
- Menulis test dengan
caplog. - Menulis test dengan
capsys. - Membuat diagnostics command sederhana.
50. Ringkasan
Runtime visibility membuat aplikasi bisa dioperasikan dan didiagnosis.
Inti part ini:
printuntuk user output, logging untuk diagnostics;- logger per module memberi struktur;
- logging config dilakukan di application boundary;
- log levels harus mencerminkan severity;
- gunakan lazy formatting;
- log context yang membantu, bukan payload sensitif;
- log unexpected exceptions dengan traceback;
- expected domain errors tidak selalu harus ERROR;
- avoid duplicate logging across layers;
- stdout/stderr separation penting untuk CLI;
- timing dengan
perf_countermembantu performance diagnostics; - logs bukan audit trail;
- diagnostics command dan runbook meningkatkan operability;
- observability berkembang bertahap dari logs ke metrics/tracing.
Part berikutnya akan membahas concurrency model Python: threads, processes, GIL, workload classification, dan cara memilih concurrency approach yang benar.
51. Referensi
- Python Documentation —
logging. - Python Documentation —
traceback. - Python Documentation —
time.perf_counter. - Python Documentation —
warnings. - Python Documentation —
pdb. - Python Documentation —
sys. - Python Documentation —
importlib.metadata.
You just completed lesson 20 in deepen practice. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.
Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.