Deepen PracticeOrdered learning track

Logging, Diagnostics, dan Runtime Visibility

Part 020 — Logging, Diagnostics, dan Runtime Visibility

Membahas logging dan diagnostics Python: logging levels, logger hierarchy, structured context, exception logging, correlation id, metrics mindset, debugging, runtime visibility, dan operational design.

13 min read2426 words
PrevNext
Lesson 2035 lesson track2029 Deepen Practice
#python#logging#diagnostics#observability+3 more

Part 020 — Logging, Diagnostics, dan Runtime Visibility

1. Tujuan Part Ini

Kode yang benar di test belum tentu mudah dioperasikan.

Saat aplikasi berjalan di mesin user, CI, server, container, job scheduler, atau production environment, kamu perlu menjawab:

  • Apa yang terjadi?
  • Command apa yang dijalankan?
  • Input apa yang diproses?
  • Case mana yang gagal?
  • Error-nya apa?
  • Root cause di mana?
  • Apakah failure terjadi di domain, storage, atau boundary?
  • Apakah data berhasil disimpan?
  • Berapa lama operasi berlangsung?
  • Berapa banyak item diproses?
  • Apakah retry terjadi?
  • Apakah user perlu aksi?
  • Apakah engineer perlu investigasi?

Logging dan diagnostics memberi runtime visibility.

Part ini membahas:

  • logging;
  • log levels;
  • logger hierarchy;
  • exception logging;
  • context;
  • correlation id;
  • diagnostic messages;
  • print vs logging;
  • debugging;
  • timing;
  • operational events;
  • testing logs;
  • penerapan ke case-tracker.

Target setelah part ini:

  1. Memahami kapan memakai logging vs print.
  2. Memahami logger per module.
  3. Memahami log levels.
  4. Mengkonfigurasi logging di boundary.
  5. Melog exception dengan traceback.
  6. Menambahkan context yang cukup.
  7. Menghindari logging secret.
  8. Mengukur durasi operasi.
  9. Mendesain diagnostic messages.
  10. Menambahkan runtime visibility ke case-tracker.

2. Print vs Logging

print cocok untuk:

  • CLI user output;
  • quick scratch debugging;
  • command result;
  • educational examples.

logging cocok untuk:

  • diagnostics;
  • internal application events;
  • warnings/errors;
  • operational visibility;
  • troubleshooting;
  • libraries;
  • long-running apps;
  • structured runtime information.

Example CLI:

print(f"Created {case.id}: {case.title}")

This is user-facing output.

Example logging:

logger.info("Created case %s", case.id)

This is diagnostic output.

Rule:

Boundary may print user output. Internal modules should log diagnostics or raise errors, not print.


3. Logger per Module

Use:

import logging

logger = logging.getLogger(__name__)

In case_tracker.service, logger name becomes:

case_tracker.service

In case_tracker.storage:

case_tracker.storage

Benefits:

  • logs show source module;
  • log levels can be configured per module;
  • library code does not configure global logging;
  • diagnostics are organized.

Do not call basicConfig in library/domain modules. Configure logging at application boundary.


4. Logging Configuration at Boundary

In CLI:

import logging


def configure_logging(verbose: bool) -> None:
    level = logging.DEBUG if verbose else logging.INFO

    logging.basicConfig(
        level=level,
        format="%(levelname)s %(name)s: %(message)s",
    )

In main:

def main(argv: list[str] | None = None) -> int:
    parser = build_parser()
    args = parser.parse_args(argv)

    configure_logging(args.verbose)
    ...

Parser:

parser.add_argument("--verbose", action="store_true", help="Enable verbose diagnostic logs.")

5. Log Levels

Common levels:

LevelMeaning
DEBUGDetailed diagnostic info for developers
INFONormal significant events
WARNINGSomething unexpected but operation may continue
ERROROperation failed
CRITICALSevere failure, system may be unusable

Examples:

logger.debug("Loaded raw case data from %s", path)
logger.info("Created case %s", case.id)
logger.warning("Case store is empty: %s", path)
logger.error("Failed to save cases to %s", path)
logger.critical("Cannot initialize application")

5.1 Level Judgment

Bad:

logger.error("User entered invalid status")

If invalid status is normal user error, INFO or WARNING may be enough.

Bad:

logger.info("Unexpected exception")

Unexpected exception should be ERROR.

Log level should reflect operational severity.


6. Message Style

Prefer lazy formatting:

logger.info("Transitioning case %s to %s", case_id, target_status.value)

Avoid:

logger.info(f"Transitioning case {case_id} to {target_status.value}")

Why?

  • logging can defer formatting;
  • style is consistent;
  • structured logging adapters can work better;
  • avoids cost when level disabled.

7. What to Log

Log events that help reconstruct behavior:

  • application start/stop;
  • command received;
  • important domain action;
  • persistence read/write;
  • external call start/failure;
  • validation failure if operationally relevant;
  • retry;
  • fallback;
  • migration;
  • unexpected exception;
  • performance timing;
  • data counts.

Do not log every line. Noise hides signal.

7.1 Case Tracker Useful Logs

  • store path selected;
  • number of cases loaded;
  • case created;
  • transition attempted;
  • transition succeeded;
  • invalid transition rejected;
  • save succeeded;
  • store corrupted;
  • CSV export completed;
  • command duration.

8. What Not to Log

Avoid logging:

  • passwords;
  • tokens;
  • API keys;
  • full personal data if not needed;
  • sensitive case content;
  • large payloads;
  • raw user input that may contain secrets;
  • excessive stack traces for expected errors;
  • data regulated by privacy/security policy.

For regulatory/case systems, logs themselves can become sensitive records. Design retention and access accordingly.


9. Logging Exceptions

Use logger.exception inside except.

try:
    run_command()
except Exception:
    logger.exception("Unexpected command failure")
    return 1

logger.exception logs at ERROR level and includes traceback.

Equivalent:

logger.error("Unexpected command failure", exc_info=True)

Do not log exception and then swallow silently unless boundary deliberately handles it.


10. Expected vs Unexpected Errors

Expected user/domain error:

try:
    case.transition_to(target_status)
except InvalidCaseTransitionError as error:
    logger.info("Rejected invalid transition: %s", error)
    print(error)
    return 1

Unexpected bug:

try:
    ...
except Exception:
    logger.exception("Unexpected error")
    print("Unexpected error. See logs for details.")
    return 1

During learning/development, letting traceback show can be useful. For production CLI/user-facing app, show friendly message and log detail.


11. Exception Context

Log context, not just error.

Weak:

logger.exception("Save failed")

Better:

logger.exception("Failed to save %d cases to %s", len(cases), path)

Context should answer:

  • operation?
  • entity id?
  • path/resource?
  • count?
  • target state?
  • user/request/job id if available?
  • duration?

But avoid sensitive data.


12. Correlation ID

A correlation ID ties logs from one operation together.

For CLI command:

from uuid import uuid4

command_id = str(uuid4())

Log:

logger.info("Starting command %s", command_id)

Better with LoggerAdapter:

adapter = logging.LoggerAdapter(logger, {"command_id": command_id})
adapter.info("Starting command")

But default formatter must include extra field or adapter context will not show.

Simple approach for CLI:

logger.info("command_id=%s Starting command", command_id)

For web systems, correlation/request IDs become critical.


13. Structured Logging Mindset

Plain text log:

INFO case_tracker.service: Created case CASE-001

Structured style:

event=case_created case_id=CASE-001 status=DRAFT

Or JSON logs in production.

Standard library logging can support structured-ish messages, but full structured logging often uses additional tooling.

For now, write messages with consistent key context:

logger.info("event=case_created case_id=%s", case.id)

This is searchable.


14. Logging in Libraries

Libraries should not configure global logging.

Library/module:

logger = logging.getLogger(__name__)

No basicConfig.

Application:

logging.basicConfig(...)

Why?

  • application owns output format;
  • application owns level;
  • library should be composable;
  • global config in imported module surprises users.

case-tracker as application can configure in CLI boundary.


15. Diagnostics vs Observability

Diagnostics:

  • logs;
  • stack traces;
  • debug output;
  • timing;
  • local troubleshooting.

Observability broader:

  • logs;
  • metrics;
  • traces;
  • events;
  • health checks;
  • dashboards;
  • alerts;
  • correlation;
  • sampling;
  • SLOs.

This part focuses on foundational logging/diagnostics. Production observability will be covered later.


16. Metrics Mindset

Even without metrics system, think in counters and timings.

For command:

  • cases_loaded_count;
  • cases_saved_count;
  • command_duration_ms;
  • invalid_transition_count;
  • store_load_error_count;
  • export_row_count.

Local timing:

from time import perf_counter

start = perf_counter()
...
elapsed_ms = (perf_counter() - start) * 1000
logger.info("event=command_completed duration_ms=%.2f", elapsed_ms)

Use perf_counter for measuring durations.


17. Timing Context Manager

from contextlib import contextmanager
from time import perf_counter
from collections.abc import Iterator


@contextmanager
def log_duration(operation: str) -> Iterator[None]:
    start = perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (perf_counter() - start) * 1000
        logger.info("event=operation_completed operation=%s duration_ms=%.2f", operation, elapsed_ms)

Use:

with log_duration("load_cases"):
    cases = load_cases(path)

In real code, pass logger or define utility carefully to avoid global coupling.


18. Debugging with breakpoint

Use:

breakpoint()

Run code and inspect variables interactively.

Commands in pdb:

n       next line
s       step into
c       continue
p var   print variable
l       list source
q       quit

Do not commit breakpoints.

Ruff/linter can catch accidental breakpoint() if configured.


19. Traceback Reading

Traceback is diagnostic artifact.

Example:

Traceback (most recent call last):
  File "cli.py", line 80, in main
    transition_case(...)
  File "service.py", line 42, in transition_case
    case.transition_to(target_status)
  File "domain.py", line 33, in transition_to
    raise InvalidCaseTransitionError(...)
InvalidCaseTransitionError: Cannot transition case from DRAFT to CLOSED

Read from bottom:

  1. Exception type.
  2. Message.
  3. Line where exception raised.
  4. Call chain above.

Logs should preserve tracebacks for unexpected errors.


20. Diagnostic Error Messages

Good diagnostic message:

Case store is corrupted: data/cases.json. Reason: Invalid JSON

Poor message:

Failed

Good message includes:

  • what failed;
  • resource/entity;
  • reason;
  • maybe next action;
  • no sensitive data.

For user-facing CLI:

Could not load case store data/cases.json: invalid JSON.

For internal log:

logger.exception("Failed to load case store path=%s", path)

21. Logging Boundary Pattern

Lower layer:

def load_cases(path: Path) -> list[Case]:
    ...
    raise CaseStoreCorruptedError(path, "Invalid JSON")

Boundary:

try:
    cases = load_cases(path)
except CaseStoreCorruptedError as error:
    logger.error("event=case_store_corrupted path=%s reason=%s", error.path, error.reason)
    print(error)
    return 1

Do not log same error at every layer.

Rule:

Raise with context in lower layers. Log at boundary/recovery point.


22. Avoid Duplicate Logs

Bad:

def load_cases(path: Path) -> list[Case]:
    try:
        ...
    except Exception:
        logger.exception("load failed")
        raise

Then CLI:

except Exception:
    logger.exception("command failed")

Same exception logged twice.

This creates noisy logs.

Better:

  • lower layer raises contextual exception;
  • boundary logs once.

Exception message/attributes carry context.


23. Logging Domain Events

Domain event concept:

@dataclass(frozen=True)
class CaseTransitioned:
    case_id: CaseId
    from_status: CaseStatus
    to_status: CaseStatus

For now, simple log:

logger.info(
    "event=case_transitioned case_id=%s from_status=%s to_status=%s",
    case.id,
    from_status.value,
    target_status.value,
)

Later, domain events may be persisted/audited. Logs are not substitute for audit trail.

Important distinction:

LogAudit
Operational diagnosticsBusiness/legal record
May be sampled/rotatedMust meet retention/integrity
Engineer-focusedCompliance/user/business-focused
Not always completeShould be complete for required events

For regulatory systems, do not rely on logs as audit source unless explicitly designed.


24. Audit Trail vs Logging

Case management often needs audit trail:

  • who changed status;
  • when;
  • from/to;
  • reason;
  • actor;
  • source;
  • request id;
  • policy decision.

Logging can record operational event, but audit trail should be domain data.

Example audit event:

@dataclass(frozen=True)
class AuditEvent:
    id: str
    case_id: CaseId
    event_type: str
    occurred_at: datetime
    actor: str
    details: dict[str, str]

Store audit separately from logs if required.


25. CLI Verbosity

Add CLI flags:

parser.add_argument("-v", "--verbose", action="store_true")
parser.add_argument("--quiet", action="store_true")

Config:

def configure_logging(verbose: bool, quiet: bool) -> None:
    if quiet:
        level = logging.WARNING
    elif verbose:
        level = logging.DEBUG
    else:
        level = logging.INFO

    logging.basicConfig(level=level, format="%(levelname)s %(name)s: %(message)s")

Be careful: user output should still be controlled separately from logs.


26. Logging to stderr vs stdout

By default, logging often goes to stderr. This is good.

CLI stdout can be machine-readable output:

case-tracker list > cases.txt

Logs on stderr do not pollute stdout.

User-facing output via print defaults to stdout. Error messages may go to stderr:

import sys

print("Error: invalid status", file=sys.stderr)

For CLI tools, stdout/stderr separation matters.


27. Testing Logs with caplog

def test_transition_logs_event(caplog, tmp_path) -> None:
    path = tmp_path / "cases.json"
    created = create_new_case(path, "Late reporting")

    with caplog.at_level(logging.INFO):
        transition_case(path, created.id, CaseStatus.SUBMITTED)

    assert "case_transitioned" in caplog.text

Do not over-test logs unless logs are part of required behavior.

Test logs when:

  • important operational event;
  • warning/error expected;
  • regression around noisy/missing logs;
  • compliance-ish diagnostic requirement.

28. Testing stderr/stdout

capsys:

def test_main_prints_error_to_stderr(capsys) -> None:
    exit_code = main(["transition", "CASE-001", "INVALID"])
    captured = capsys.readouterr()

    assert exit_code == 1
    assert "Invalid status" in captured.err

If you choose errors to stdout, test that. But for CLI convention, stderr is often better for errors.


29. Runtime Diagnostics Command

Add command:

case-tracker diagnostics

Output:

Python: 3.x
Package: case-tracker 0.1.0
Store path: cases.json
Store exists: yes
Case count: 10

Useful for support.

Implementation:

import platform
from importlib.metadata import version, PackageNotFoundError


def get_package_version(package_name: str) -> str:
    try:
        return version(package_name)
    except PackageNotFoundError:
        return "unknown"

Diagnostics commands help operational troubleshooting.


30. Health Check Mindset

For services, health checks answer:

  • process alive?
  • dependencies reachable?
  • database migration compatible?
  • config valid?
  • disk writable?
  • queue reachable?

For CLI case-tracker, diagnostics might check:

  • store path parent exists/writable;
  • store file parseable;
  • package version;
  • Python version;
  • configured log level.

This habit scales to web services later.


31. Progressive Logging for case-tracker

Add logger to modules:

service.py:

logger = logging.getLogger(__name__)

Create:

def create_new_case(path: Path, title: str) -> Case:
    logger.debug("event=create_case_started path=%s", path)

    cases = load_cases(path)
    case = create_case(title)
    cases.append(case)
    save_cases(path, cases)

    logger.info("event=case_created case_id=%s", case.id)
    return case

Storage:

logger.debug("event=cases_loaded path=%s count=%d", path, len(cases))

CLI:

logger.info("event=command_started command=%s", args.command)

32. Logging Invalid Transitions

def transition_case(path: Path, case_id: str, target_status: CaseStatus) -> Case:
    cases = load_cases(path)
    case = get_case_from_list(cases, case_id)
    from_status = case.status

    try:
        case.transition_to(target_status)
    except InvalidCaseTransitionError:
        logger.info(
            "event=case_transition_rejected case_id=%s from_status=%s to_status=%s",
            case_id,
            from_status.value,
            target_status.value,
        )
        raise

    save_cases(path, cases)

    logger.info(
        "event=case_transitioned case_id=%s from_status=%s to_status=%s",
        case_id,
        from_status.value,
        target_status.value,
    )

    return case

This logs expected domain rejection as info. Depending on ops policy, warning may be appropriate if repeated.


33. Logging Storage Corruption

In boundary:

except CaseStoreCorruptedError as error:
    logger.error(
        "event=case_store_corrupted path=%s reason=%s",
        error.path,
        error.reason,
    )
    print(error, file=sys.stderr)
    return 1

If you use logger.exception, include traceback. For corrupted user file, traceback may be less useful at user level, but useful in debug logs.

Maybe:

logger.debug("case store corruption details", exc_info=True)

Design depends on audience.


34. Avoid Logging Sensitive Case Details

Bad:

logger.info("Created case title=%s notes=%s", case.title, case.notes)

Case title/notes may contain sensitive data.

Better:

logger.info("event=case_created case_id=%s status=%s", case.id, case.status.value)

Use identifiers and metadata, not full content, unless policy allows.


35. Logging and Performance

Logging disabled levels are cheap but not free.

Avoid expensive computation inside log call:

Bad:

logger.debug("cases=%s", expensive_render_cases(cases))

expensive_render_cases runs even if DEBUG disabled.

Better:

if logger.isEnabledFor(logging.DEBUG):
    logger.debug("cases=%s", expensive_render_cases(cases))

For normal simple variables, no need to guard.


36. Warnings

Python has warnings module for warning library users about deprecated behavior.

import warnings

warnings.warn("old_function is deprecated", DeprecationWarning, stacklevel=2)

Use warnings for:

  • deprecation;
  • library API changes;
  • runtime caution for developers.

Use logging for operational events.

Do not confuse warnings and logs.


37. Diagnostics and Exit Codes

CLI exit codes:

CodeMeaning
0success
1general handled error
2usage/argument error commonly used by argparse
>2specific application errors if desired

Example:

return 0

On invalid domain input:

return 1

Argparse may exit with 2 for parse error.

For advanced CLI, define constants:

EXIT_SUCCESS = 0
EXIT_ERROR = 1

38. Debug Mode

Debug mode may:

  • set logging level DEBUG;
  • include traceback for expected errors;
  • output config;
  • enable timing;
  • avoid friendly suppression.

Example:

parser.add_argument("--debug", action="store_true")

Boundary:

except CaseTrackerError as error:
    if args.debug:
        raise

    print(error, file=sys.stderr)
    return 1

Useful for local support.


39. Operational Runbook

For even small tools, write a runbook:

# Case Tracker Diagnostics

## Check version

```bash
case-tracker diagnostics

Enable verbose logs

case-tracker --verbose list

Validate store

case-tracker diagnostics --store cases.json

Common Errors

Case store is corrupted

Cause:

  • invalid JSON
  • partial write
  • manual edit

Action:

  • restore backup
  • validate JSON
  • run migration if available
Runbooks turn logs into action. --- ## 40. Observability Evolution Path For `case-tracker`: Stage 1: - logging; - verbose flag; - clear error messages; - diagnostics command. Stage 2: - audit events; - duration logging; - structured key-value logs. Stage 3: - metrics if long-running service; - tracing if distributed; - dashboards/alerts; - correlation IDs across requests. Do not add production observability stack to a small CLI prematurely. But design messages and boundaries with future in mind. --- ## 41. Logging Smell Checklist Watch for: 1. `print` in domain/service/storage. 2. `basicConfig` in imported module. 3. Error logged at every layer. 4. `except Exception: logger.error(...); pass`. 5. f-string logging everywhere. 6. Logs missing entity id/context. 7. Logs include secrets/sensitive content. 8. Expected user error logged as scary exception. 9. Unexpected exception logged without traceback. 10. Huge payloads in logs. 11. Debug logs require expensive computation. 12. Logs are not searchable. 13. No command/request correlation. 14. CLI logs pollute stdout machine output. 15. Tests assert too much exact log text. --- ## 42. Practice: Add Logging Config Add to `cli.py`: ```python def configure_logging(verbose: bool) -> None: level = logging.DEBUG if verbose else logging.INFO logging.basicConfig( level=level, format="%(levelname)s %(name)s: %(message)s", )

Parser:

parser.add_argument("--verbose", action="store_true")

Call after parse.


43. Practice: Add Service Logs

In service.py:

logger = logging.getLogger(__name__)

Add logs to:

  • create_new_case;
  • transition_case;
  • add_case_note;
  • list_cases.

Keep logs contextual but not sensitive.


44. Practice: Log Duration

Implement:

@contextmanager
def log_duration(logger: logging.Logger, operation: str) -> Iterator[None]:
    start = perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (perf_counter() - start) * 1000
        logger.info("event=operation_completed operation=%s duration_ms=%.2f", operation, elapsed_ms)

Use in CLI command handling.


45. Practice: Capture Logs in Test

def test_create_case_logs_event(tmp_path, caplog) -> None:
    path = tmp_path / "cases.json"

    with caplog.at_level(logging.INFO):
        create_new_case(path, "Late reporting")

    assert "event=case_created" in caplog.text

Keep assertion stable.


46. Practice: stderr for Errors

In CLI:

print(error, file=sys.stderr)

Test:

def test_invalid_status_prints_to_stderr(capsys) -> None:
    exit_code = main(["transition", "CASE-001", "INVALID"])
    captured = capsys.readouterr()

    assert exit_code == 1
    assert "Invalid status" in captured.err

47. Practice: Diagnostics Command

Add:

case-tracker diagnostics

Report:

  • Python version;
  • package version;
  • store path;
  • store exists;
  • store parseable;
  • case count.

Test with tmp_path.


48. Self-Check

Jawab tanpa melihat materi:

  1. Kapan memakai print?
  2. Kapan memakai logging?
  3. Kenapa logger per module?
  4. Kenapa logging dikonfigurasi di boundary?
  5. Apa arti DEBUG, INFO, WARNING, ERROR, CRITICAL?
  6. Kenapa lazy logging format disarankan?
  7. Apa yang perlu dilog untuk transition case?
  8. Apa yang tidak boleh dilog?
  9. Bagaimana melog exception dengan traceback?
  10. Apa beda expected dan unexpected error?
  11. Apa itu correlation ID?
  12. Apa itu structured logging mindset?
  13. Kenapa library tidak boleh basicConfig?
  14. Apa beda log dan audit trail?
  15. Kenapa logs biasanya ke stderr?
  16. Bagaimana test logs dengan caplog?
  17. Bagaimana test stdout/stderr dengan capsys?
  18. Apa guna perf_counter?
  19. Apa isi diagnostics command yang berguna?
  20. Apa logging smell paling berbahaya?

49. Definition of Done Part 020

Kamu selesai part ini jika bisa:

  1. Menambahkan logger per module.
  2. Mengkonfigurasi logging di CLI boundary.
  3. Menambahkan --verbose.
  4. Memakai log levels dengan tepat.
  5. Memakai lazy logging formatting.
  6. Melog expected domain rejection.
  7. Melog unexpected exception dengan traceback.
  8. Menghindari duplicate logs.
  9. Menghindari sensitive data di logs.
  10. Mengukur durasi operasi.
  11. Memahami log vs audit.
  12. Mengarahkan error CLI ke stderr.
  13. Menulis test dengan caplog.
  14. Menulis test dengan capsys.
  15. Membuat diagnostics command sederhana.

50. Ringkasan

Runtime visibility membuat aplikasi bisa dioperasikan dan didiagnosis.

Inti part ini:

  • print untuk user output, logging untuk diagnostics;
  • logger per module memberi struktur;
  • logging config dilakukan di application boundary;
  • log levels harus mencerminkan severity;
  • gunakan lazy formatting;
  • log context yang membantu, bukan payload sensitif;
  • log unexpected exceptions dengan traceback;
  • expected domain errors tidak selalu harus ERROR;
  • avoid duplicate logging across layers;
  • stdout/stderr separation penting untuk CLI;
  • timing dengan perf_counter membantu performance diagnostics;
  • logs bukan audit trail;
  • diagnostics command dan runbook meningkatkan operability;
  • observability berkembang bertahap dari logs ke metrics/tracing.

Part berikutnya akan membahas concurrency model Python: threads, processes, GIL, workload classification, dan cara memilih concurrency approach yang benar.


51. Referensi

  • Python Documentation — logging.
  • Python Documentation — traceback.
  • Python Documentation — time.perf_counter.
  • Python Documentation — warnings.
  • Python Documentation — pdb.
  • Python Documentation — sys.
  • Python Documentation — importlib.metadata.
Lesson Recap

You just completed lesson 20 in deepen practice. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.