
Performance Engineering: Measurement Before Optimization

Part 023 — Performance Engineering: Measurement Before Optimization

Covers professional Python performance engineering: measurement before optimization, latency vs throughput, profiling, benchmarking, timeit, cProfile, pstats, the flame graph mindset, algorithmic complexity, and optimization strategy.



1. Goals of This Part

Performance engineering is not just “making code fast”.

Performance engineering is a systematic process for:

  • defining targets;
  • measuring a baseline;
  • finding bottlenecks;
  • understanding trade-offs;
  • choosing the right optimization;
  • proving the impact;
  • preserving correctness;
  • avoiding over-optimization;
  • controlling regressions.

Common mistakes Python engineers make:

  • optimizing based on gut feeling;
  • focusing on micro-optimization before the algorithm;
  • using async/threads/processes without measuring the bottleneck;
  • making code unreadable for a small speedup;
  • not distinguishing latency from throughput;
  • not using a profiler;
  • benchmarking without warmup or repetition;
  • comparing results from different environments;
  • ignoring I/O, database, and network;
  • assuming Python is slow without examining the data design;
  • not writing tests before optimizing;
  • not checking memory allocation;
  • not considering observability after optimizing.

This part treats performance engineering as a discipline.

By the end of this part, you should be able to:

  1. Understand the principle of measurement before optimization.
  2. Distinguish latency, throughput, CPU, memory, and I/O.
  3. Create a baseline.
  4. Use timeit for microbenchmarks.
  5. Use perf_counter for application timing.
  6. Use cProfile and pstats.
  7. Read profiler output.
  8. Recognize algorithmic bottlenecks.
  9. Recognize I/O bottlenecks.
  10. Choose sensible optimizations.
  11. Avoid performance anti-patterns.
  12. Apply the performance loop to case-tracker.

2. Performance Engineering Loop

The main rule:

Do not optimize before you know the bottleneck.

Real bottlenecks are often surprising. Code that looks “inefficient” is not necessarily the problem, while ordinary-looking code can become the bottleneck because it is called millions of times.


3. Define Performance Goal

A performance goal must be concrete.

Bad:

Make it faster.

Better:

List 100,000 cases in under 500 ms on a developer laptop.

Or:

CSV import of 1 million rows should finish in under 60 seconds and use under 1 GB of memory.

Or:

API p95 latency for case lookup should be under 100 ms.

A goal should state:

  • the operation;
  • the input size;
  • the environment;
  • the metric;
  • the target;
  • the constraints.

4. Metrics: Latency, Throughput, CPU, Memory

Metric            Question
Latency           How long does one operation take to complete?
Throughput        How many operations complete per unit of time?
CPU time          How much CPU is consumed?
Wall-clock time   How much real time passes from the user's perspective?
Memory peak       What is the peak memory usage?
Allocation rate   How many objects are allocated?
I/O wait          How long is spent waiting on files, network, or database?
Error rate        Does the optimization increase failures?
Tail latency      What are p95/p99, not just the average?

Examples:

  • A CLI command cares about wall-clock latency.
  • A batch job cares about throughput and peak memory.
  • A web API cares about p95/p99 latency.
  • A worker cares about throughput and retry/error rate.
  • A data pipeline cares about memory and backpressure.

5. Wall-Clock vs CPU Time

Wall-clock time:

the real time that elapses.

CPU time:

the CPU time consumed by a process/thread.

An I/O-bound operation can have high wall-clock time but low CPU time.

A CPU-bound operation usually has high CPU time, close to its wall-clock time.

Use time.perf_counter() to measure elapsed wall-clock time.

from time import perf_counter

start = perf_counter()
do_work()  # placeholder for the operation being measured
elapsed = perf_counter() - start
print(f"Elapsed: {elapsed:.3f}s")

To see which function calls the time is spent in, use cProfile.
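A minimal sketch of the difference, measuring time.process_time() (CPU time of the current process, which excludes time spent sleeping) next to perf_counter(). The two workload functions are hypothetical stand-ins: one sleeps as if waiting on I/O, the other computes in a pure Python loop.

```python
import time
from collections.abc import Callable


def sleepy_work() -> None:
    # Waits without using the CPU, similar to waiting on I/O.
    time.sleep(0.2)


def busy_work() -> int:
    # Pure Python arithmetic keeps the CPU busy, like CPU-bound code.
    total = 0
    for number in range(1_000_000):
        total += number * number
    return total


def measure(function: Callable[[], object]) -> tuple[float, float]:
    """Return (wall_clock_seconds, cpu_seconds) for a single call."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # user + system CPU time of this process
    function()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


for work in (sleepy_work, busy_work):
    wall, cpu = measure(work)
    print(f"{work.__name__}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

Expect sleepy_work to show roughly 0.2 s of wall-clock time and almost no CPU time, while busy_work shows the two numbers close together.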


6. Baseline First

Before optimizing:

  1. Make sure tests pass.
  2. Create reproducible input.
  3. Record the environment.
  4. Measure the baseline.
  5. Save the results.
  6. Form a hypothesis.

Example baseline note:

# Performance Baseline: case list command

Date: 2026-06-26
Python: 3.12
Machine: laptop
Dataset: 100,000 cases JSON, 35 MB
Command: case-tracker --store cases-large.json list --no-color

Result:
- wall-clock: 2.8s
- peak memory: 260 MB
- cProfile top function: case_from_dict

Without a baseline, you cannot tell whether an optimization worked or the code merely feels faster.
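The baseline note can also be kept in machine-readable form. This is a sketch under assumptions: the helper names, the JSON Lines file, and the result keys are hypothetical; only the standard platform, sys, and json modules are used to capture the environment.

```python
import json
import platform
import sys
from datetime import date
from pathlib import Path


def make_baseline_entry(label: str, results: dict[str, float]) -> dict[str, object]:
    """Bundle measured results with the environment they were measured in."""
    return {
        "label": label,
        "date": date.today().isoformat(),
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "results": results,
    }


def append_baseline(path: Path, entry: dict[str, object]) -> None:
    """Append one entry per line (JSON Lines) so the history accumulates."""
    with path.open("a", encoding="utf-8") as file:
        file.write(json.dumps(entry) + "\n")
```

For example, append_baseline(Path("perf-baselines.jsonl"), make_baseline_entry("case list command", {"wall_clock_s": 2.8})) records the note above next to the Python version and platform it was measured on, which makes later comparisons across environments visible.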


7. Correctness Before Speed

Optimizing without tests multiplies risk.

Before optimizing, make sure that:

  • domain tests pass;
  • serialization tests pass;
  • failure path tests pass;
  • the performance scenario has an expected output;
  • the benchmark input is deterministic.

Example:

def test_optimized_index_matches_baseline() -> None:
    cases = make_many_cases()

    assert build_index_optimized(cases) == build_index_baseline(cases)

An optimization should preserve behavior unless you are deliberately changing semantics.


8. Big-O Before Micro-Optimization

Algorithmic complexity often dominates.

Example bad lookup:

def get_case(cases: list[Case], case_id: CaseId) -> Case:
    for case in cases:
        if case.id == case_id:
            return case

    raise CaseNotFoundError(case_id)

For a single lookup in 100 cases, this is fine.

For 100,000 lookups over 100,000 cases, it is terrible. With n cases and m lookups, the cost is:

O(n * m)

Build an index instead:

case_by_id = {case.id: case for case in cases}

Lookup:

case_by_id[case_id]

Now:

O(n) build + O(1) average lookup

Changing the algorithm beats micro-optimization.
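A small sketch that makes the difference measurable. Case here is a simplified, hypothetical stand-in for the lesson's Case (plain string ids, no notes); the lookups are requested in reverse order so the list scan has to walk far.

```python
from dataclasses import dataclass
from time import perf_counter


@dataclass(frozen=True)
class Case:
    # Simplified stand-in for the lesson's Case: plain string ids, no notes.
    id: str
    title: str


def lookup_by_scan(cases: list[Case], ids: list[str]) -> list[Case]:
    # O(n * m): every lookup walks the list from the start.
    found: list[Case] = []
    for case_id in ids:
        for case in cases:
            if case.id == case_id:
                found.append(case)
                break
    return found


def lookup_by_index(cases: list[Case], ids: list[str]) -> list[Case]:
    # O(n) to build the dict once, then O(1) average per lookup.
    case_by_id = {case.id: case for case in cases}
    return [case_by_id[case_id] for case_id in ids]


cases = [Case(f"CASE-{index:06d}", f"Case {index}") for index in range(2_000)]
ids = [case.id for case in reversed(cases)]  # late ids force long scans

for lookup in (lookup_by_scan, lookup_by_index):
    start = perf_counter()
    lookup(cases, ids)
    print(f"{lookup.__name__}: {perf_counter() - start:.4f}s")
```

Increase the number of cases and rerun: the scan time grows roughly with the square of the size, while the index version stays nearly linear.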


9. Example: O(n²) Hidden Bottleneck

Bad:

def find_duplicate_case_ids(cases: list[Case]) -> set[CaseId]:
    duplicates: set[CaseId] = set()

    for case in cases:
        count = sum(1 for other in cases if other.id == case.id)

        if count > 1:
            duplicates.add(case.id)

    return duplicates

This is O(n²).

Better:

from collections import Counter


def find_duplicate_case_ids(cases: list[Case]) -> set[CaseId]:
    counts = Counter(case.id for case in cases)
    return {case_id for case_id, count in counts.items() if count > 1}

This is O(n).


10. Timing with perf_counter

Use perf_counter for application-level timing.

from time import perf_counter


def measure_load_cases(path: Path) -> list[Case]:
    start = perf_counter()

    try:
        return load_cases(path)
    finally:
        elapsed_ms = (perf_counter() - start) * 1000
        logger.info("event=load_cases_completed duration_ms=%.2f", elapsed_ms)

The finally clause logs the duration even when load_cases raises, which is useful when failed calls should be timed too.

Do not litter the code with manual timers. Use targeted instrumentation or a context manager.


11. Timing Context Manager

from collections.abc import Iterator
from contextlib import contextmanager
from time import perf_counter


@contextmanager
def timer(label: str) -> Iterator[None]:
    start = perf_counter()
    try:
        yield
    finally:
        elapsed = perf_counter() - start
        print(f"{label}: {elapsed:.3f}s")

Use:

with timer("load cases"):
    cases = load_cases(path)

For production, use logger instead of print.
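A minimal logger-based variant, assuming the same key=value log style used in section 10. The logger name and the outcome field are hypothetical additions: outcome records whether the block finished or raised, and the exception still propagates to the caller.

```python
import logging
from collections.abc import Iterator
from contextlib import contextmanager
from time import perf_counter

logger = logging.getLogger("case_tracker.perf")  # hypothetical logger name


@contextmanager
def log_duration(event: str) -> Iterator[None]:
    """Log the duration of the with-block, including when it raises."""
    start = perf_counter()
    outcome = "error"
    try:
        yield
        outcome = "ok"  # only reached if the block did not raise
    finally:
        duration_ms = (perf_counter() - start) * 1000
        logger.info("event=%s outcome=%s duration_ms=%.2f", event, outcome, duration_ms)


logging.basicConfig(level=logging.INFO)
with log_duration("sum_range"):
    total = sum(range(1_000_000))
```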


12. timeit for Microbenchmark

timeit is useful for small snippets.

Command line:

python -m timeit "sum(range(1000))"

Python API:

from timeit import timeit

duration = timeit("sum(range(1000))", number=10000)
print(duration)

With setup:

duration = timeit(
    "target in allowed",
    setup="allowed = {'DRAFT', 'SUBMITTED', 'CLOSED'}; target = 'CLOSED'",
    number=1_000_000,
)

12.1 Microbenchmark Caveats

Microbenchmarks can mislead because:

  • they isolate code from real context;
  • CPU scaling/turbo affects results;
  • cache effects matter;
  • small differences may be noise;
  • faster micro operation may not matter in app;
  • readability trade-off may not be worth it;
  • benchmark environment matters.

Use microbenchmarks for tight loops or for choosing between local implementations. Use profiling for the whole program.
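One way to reduce noise is timeit.repeat, which runs the whole measurement several times. The timeit documentation suggests that the minimum is usually the most informative value, because slower runs mostly reflect interference from other processes. A sketch reusing the membership snippet above:

```python
from timeit import repeat

NUMBER = 1_000_000

# repeat() returns one total duration (in seconds) per run.
runs = repeat(
    "target in allowed",
    setup="allowed = {'DRAFT', 'SUBMITTED', 'CLOSED'}; target = 'CLOSED'",
    repeat=5,
    number=NUMBER,
)

# The fastest run is the least disturbed; the spread between fastest and
# slowest hints at how noisy the machine currently is.
best_ns = min(runs) / NUMBER * 1e9
print(f"best: {best_ns:.1f} ns per loop")
print(f"spread: {max(runs) - min(runs):.4f}s across {len(runs)} runs")
```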


13. Benchmark Harness

Simple harness:

from collections.abc import Callable
from statistics import mean, stdev
from time import perf_counter


def benchmark(function: Callable[[], object], *, repeats: int = 5) -> None:
    # statistics.stdev needs at least two data points.
    if repeats < 2:
        raise ValueError("repeats must be at least 2")

    durations: list[float] = []

    for _ in range(repeats):
        start = perf_counter()
        function()
        durations.append(perf_counter() - start)

    print(f"mean={mean(durations):.4f}s")
    print(f"stdev={stdev(durations):.4f}s")
    print(f"min={min(durations):.4f}s")

Use it with deterministic input.

For serious benchmarking, consider dedicated tools and a controlled environment. Still, a simple harness is better than guessing.
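A variant of the harness above that adds untimed warmup calls (section 1 lists benchmarking without warmup as a common mistake) and returns the numbers instead of printing them, so they can be stored with a baseline. The names run_benchmark and BenchmarkResult are hypothetical.

```python
from collections.abc import Callable
from dataclasses import dataclass
from statistics import median
from time import perf_counter


@dataclass(frozen=True)
class BenchmarkResult:
    min_s: float
    median_s: float
    max_s: float
    repeats: int


def run_benchmark(
    function: Callable[[], object], *, warmup: int = 1, repeats: int = 5
) -> BenchmarkResult:
    """Call `function` `warmup` times untimed, then time `repeats` calls."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    for _ in range(warmup):
        function()  # not timed: absorbs first-call costs such as lazy imports

    durations: list[float] = []
    for _ in range(repeats):
        start = perf_counter()
        function()
        durations.append(perf_counter() - start)

    return BenchmarkResult(min(durations), median(durations), max(durations), repeats)


result = run_benchmark(lambda: sorted(range(100_000, 0, -1)), warmup=2, repeats=7)
print(result)
```

The median is less sensitive to a single outlier than the mean, and the min/max pair shows the spread at a glance.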


14. Profiling with cProfile

Run module:

python -m cProfile -o profile.out -m case_tracker --store cases.json list

Then inspect:

python -m pstats profile.out

Programmatic:

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()

run_workload()

profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats("cumtime").print_stats(20)

cProfile tells how often functions were called and how much time they took.
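A self-contained version of the programmatic example, with a hypothetical workload standing in for run_workload and case_from_dict. Passing stream= to pstats.Stats captures the report as text, which is handy for saving it next to a baseline.

```python
import cProfile
import io
import pstats


def parse_record(index: int) -> dict[str, object]:
    # Hypothetical per-record work, standing in for case_from_dict.
    return {"id": f"CASE-{index:06d}", "title": f"Case {index}".upper()}


def run_workload() -> list[dict[str, object]]:
    return [parse_record(index) for index in range(50_000)]


profiler = cProfile.Profile()
profiler.enable()
run_workload()
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
# strip_dirs() shortens file paths; sort by cumulative time, show top 10 rows.
stats.strip_dirs().sort_stats(pstats.SortKey.CUMULATIVE).print_stats(10)
report = buffer.getvalue()
print(report)
```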


15. Reading cProfile Output

The columns are:

Column                      Meaning
ncalls                      Number of calls
tottime                     Time spent in the function itself, excluding subcalls
percall                     tottime divided by ncalls (a second percall column is cumtime divided by primitive calls)
cumtime                     Time spent in the function including subcalls
filename:lineno(function)   Function location

Interpretation:

  • High cumtime: function and children take time.
  • High tottime: function body itself takes time.
  • High ncalls: maybe called too often.
  • High percall: expensive single call.

Focus on high cumulative time and high call count.


16. tottime vs cumtime

Example:

def outer():
    inner()


def inner():
    expensive_work()

outer may have high cumtime but low tottime.

inner may also show high cumtime with low tottime, while expensive_work carries the high tottime.

The optimization target depends on what the profile shows:

  • If outer orchestrates too many calls, reduce the calls.
  • If inner itself is slow, optimize inner.
  • If the subcall is I/O, change the I/O strategy.
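A runnable sketch of the outer/inner/expensive_work example, with a concrete loop assumed as the expensive work. It reads the numbers through pstats.Stats.get_stats_profile() (Python 3.9+) instead of parsing printed text.

```python
import cProfile
import pstats


def expensive_work() -> int:
    # The loop runs in this function's own frame, so its cost is tottime here.
    total = 0
    for number in range(300_000):
        total += number * number
    return total


def inner() -> int:
    return expensive_work()


def outer() -> int:
    return inner()


profiler = cProfile.Profile()
profiler.enable()
outer()
profiler.disable()

# get_stats_profile() exposes the per-function numbers as data.
func_profiles = pstats.Stats(profiler).get_stats_profile().func_profiles
for name in ("outer", "inner", "expensive_work"):
    row = func_profiles[name]
    print(f"{name:>15}: tottime={row.tottime:.3f}s cumtime={row.cumtime:.3f}s")
```

outer and inner show near-zero tottime but large cumtime; expensive_work shows large values in both, which points at it as the place to optimize.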

17. Profiling Example: Case Loading

Suppose profile shows:

ncalls  tottime  cumtime function
100000  0.300    1.800   case_from_dict
100000  0.900    0.900   CaseId.__post_init__
1       0.100    2.500   load_cases

Possible hypotheses:

  1. CaseId.__post_init__ normalization runs repeatedly.
  2. Validation is too expensive.
  3. JSON parsing might not be the top bottleneck.
  4. Object construction dominates.

Actions:

  • avoid unnecessary re-normalization;
  • optimize validation;
  • use simpler representation if needed;
  • use streaming if memory issue;
  • consider SQLite if repeated loads dominate.

But measure after each change.


18. Sampling Profilers and Flame Graph Mindset

cProfile is a deterministic profiler: it records every function call and return. Sampling profilers instead take periodic snapshots of the call stack.

Sampling profiler benefits:

  • lower overhead;
  • good for long-running apps;
  • flame graph visualization;
  • some tools can attach to a running process.

Even if you do not use flame graphs yet, learn the mental model:

wide frame = more time spent there
stack depth = call chain

Flame graphs help identify where time accumulates.


19. Line-Level Profiling

Sometimes function-level profile is not enough.

Options:

  • manual timing around sections;
  • line profilers via external tools;
  • break function into meaningful subfunctions;
  • use sampling profiler with line support if available.

Before reaching for a specialized line profiler, ask whether the function is too large.

If you need line-level profiling often, the code may need clearer boundaries.


20. Memory Profiling Preview

Performance can be memory-bound.

Symptoms:

  • high memory usage;
  • swapping;
  • GC pauses;
  • slow due to allocation churn;
  • huge intermediate lists;
  • reading full files unnecessarily.

Tools/concepts:

  • tracemalloc;
  • sys.getsizeof;
  • process RSS via external tools;
  • streaming;
  • generators;
  • data-oriented representation;
  • object overhead.

Part 024 goes deeper.


21. I/O Profiling

If wall-clock time is high but CPU time is low, the bottleneck may be I/O.

Examples:

  • reading large file;
  • network call;
  • database query;
  • subprocess;
  • filesystem latency;
  • logging too much;
  • terminal output.

For the CLI list command, printing 100,000 lines may dominate.

Test:

case-tracker list > /dev/null

If this is much faster, terminal output is the bottleneck.

Alternatively, profile a count/report mode that does not print every row.


22. Database/Network Bottlenecks

In real services, Python code may not be the bottleneck.

The bottleneck could be:

  • missing database index;
  • N+1 query;
  • slow network;
  • DNS;
  • TLS handshake;
  • connection pool exhaustion;
  • API rate limits;
  • serialization overhead;
  • large response payloads.

Do not optimize Python loops if a database query accounts for 95% of latency.

Measure timings at each boundary: database, network, file system, and external APIs.


23. N+1 Pattern

N+1 query:

cases = repository.list_cases()

for case in cases:
    case.notes = repository.list_notes(case.id)

This does 1 query for cases + N queries for notes.

Better:

cases = repository.list_cases_with_notes()

or batch:

notes_by_case_id = repository.list_notes_for_cases([case.id for case in cases])

The N+1 pattern exists beyond databases too:

  • HTTP call per item;
  • file read per item;
  • permission lookup per item.

Batching often beats concurrency.
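A sketch that makes N+1 visible by counting round trips instead of timing them. FakeRepository is a hypothetical in-memory stand-in for the repository above; each method call represents one query. Counting calls is deterministic, so a check like this can also serve as a regression test.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FakeRepository:
    """Hypothetical in-memory repository that counts simulated queries."""

    notes: list[tuple[str, str]]  # (case_id, text)
    calls: int = 0

    def list_notes(self, case_id: str) -> list[str]:
        self.calls += 1  # one simulated query per case
        return [text for note_case_id, text in self.notes if note_case_id == case_id]

    def list_notes_for_cases(self, case_ids: list[str]) -> dict[str, list[str]]:
        self.calls += 1  # one simulated query for the whole batch
        wanted = set(case_ids)
        notes_by_case_id: dict[str, list[str]] = defaultdict(list)
        for case_id, text in self.notes:
            if case_id in wanted:
                notes_by_case_id[case_id].append(text)
        return dict(notes_by_case_id)


case_ids = [f"CASE-{index:03d}" for index in range(100)]
repository = FakeRepository(notes=[(case_id, "checked") for case_id in case_ids])

per_item = {case_id: repository.list_notes(case_id) for case_id in case_ids}
print("N+1 round trips:", repository.calls)  # N+1 round trips: 100

repository.calls = 0
batched = repository.list_notes_for_cases(case_ids)
print("batched round trips:", repository.calls)  # batched round trips: 1
```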


24. Caching

Caching can improve performance but introduces complexity.

Example:

from functools import lru_cache


@lru_cache(maxsize=1024)
def parse_case_status(raw_status: str) -> CaseStatus:
    return CaseStatus(raw_status.strip().upper())

This cache is probably unnecessary, because parsing an enum value is already cheap.

Good caching candidates:

  • expensive pure function;
  • repeated same input;
  • stable result;
  • bounded memory;
  • invalidation clear.

Bad caching candidates:

  • mutable output;
  • user-specific sensitive data;
  • rapidly changing data;
  • unbounded key space;
  • function with side effects;
  • hidden global state.
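A sketch of a better candidate: a pure function, repeated inputs, an immutable result, and a bounded cache. expensive_normalize is hypothetical and sleep simulates its cost; cache_info() is the built-in way to check whether the cache is actually being hit.

```python
from functools import lru_cache
from time import perf_counter, sleep


@lru_cache(maxsize=256)  # bounded: at most 256 distinct inputs are kept
def expensive_normalize(raw: str) -> str:
    # Hypothetical pure function; sleep simulates real cost.
    sleep(0.005)
    return " ".join(raw.split()).casefold()


inputs = ["  Fraud  Report ", "Refund request", "  Fraud  Report "] * 20

start = perf_counter()
results = [expensive_normalize(value) for value in inputs]
elapsed = perf_counter() - start

print(expensive_normalize.cache_info())
# CacheInfo(hits=58, misses=2, maxsize=256, currsize=2)
print(f"{len(inputs)} calls in {elapsed:.3f}s")
```

If the underlying data can change, expensive_normalize.cache_clear() is the explicit invalidation point; a cache whose hit count stays near zero is only adding complexity.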

25. Batch Processing

Instead of:

for case in cases:
    save_case(case)

Maybe:

save_cases(cases)

Batch benefits:

  • fewer I/O calls;
  • fewer transactions;
  • fewer network round trips;
  • better compression;
  • easier atomicity.

Trade-offs:

  • higher memory use;
  • partial failure handling harder;
  • latency for individual item may increase;
  • transaction size matters.
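A minimal sketch of per-item saving versus batch saving, assuming an in-memory SQLite table (the schema is hypothetical). The per-row version commits once per row; the batch version uses executemany inside a single transaction. With a database file on disk, each commit also has to reach storage, so the gap is usually larger than in memory.

```python
import sqlite3
from time import perf_counter

Row = tuple[str, str, str]

rows: list[Row] = [(f"CASE-{index:06d}", f"Case {index}", "DRAFT") for index in range(20_000)]


def fresh_connection() -> sqlite3.Connection:
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE cases (id TEXT PRIMARY KEY, title TEXT, status TEXT)")
    return connection


def save_one_by_one(connection: sqlite3.Connection, rows: list[Row]) -> None:
    for row in rows:
        with connection:  # commits after every single row
            connection.execute("INSERT INTO cases VALUES (?, ?, ?)", row)


def save_batch(connection: sqlite3.Connection, rows: list[Row]) -> None:
    with connection:  # one transaction for the whole batch
        connection.executemany("INSERT INTO cases VALUES (?, ?, ?)", rows)


def count_rows(connection: sqlite3.Connection) -> int:
    return connection.execute("SELECT COUNT(*) FROM cases").fetchone()[0]


for save in (save_one_by_one, save_batch):
    connection = fresh_connection()
    start = perf_counter()
    save(connection, rows)
    elapsed = perf_counter() - start
    print(f"{save.__name__}: {count_rows(connection)} rows in {elapsed:.3f}s")
    connection.close()
```

Note the trade-off listed above: if one row in save_batch violates the primary key, the whole batch is rolled back.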

26. Streaming

Instead of reading everything at once:

data = path.read_text(encoding="utf-8")

For huge files:

with path.open("r", encoding="utf-8") as file:
    for line in file:
        process(line)

Streaming reduces memory but complicates:

  • error recovery;
  • multiple passes;
  • sorting/grouping;
  • progress reporting;
  • transaction boundary.

Use streaming when data size justifies it.


27. Vectorization and Native Extensions

For numeric/data workloads, Python loops may be bottleneck.

Options:

  • use built-in operations;
  • use standard library optimized functions;
  • use NumPy/Pandas if appropriate;
  • use database for aggregation;
  • use C/Rust extension if extreme;
  • use multiprocessing;
  • use PyPy if compatible;
  • use algorithm/data structure changes.

Example:

sum(values)

is often better than a manual loop for both readability and speed.

But do not add heavy dependency for trivial math.


28. Built-ins Are Often Fast

Prefer built-ins when clear:

any(...)
all(...)
sum(...)
min(...)
max(...)
sorted(...)
set(...)
dict(...)

Example:

has_closed = any(case.status is CaseStatus.CLOSED for case in cases)

This is better than a manual flag loop unless you need custom behavior.

Built-ins are implemented efficiently (in C, in CPython) and read idiomatically.


29. Avoiding Unnecessary Work

The fastest code is code that never runs.

Examples:

  • validate once at boundary;
  • build index once, not per lookup;
  • early return;
  • short-circuit with any/all;
  • lazy iteration;
  • avoid duplicate serialization;
  • avoid repeated regex compilation;
  • avoid repeated file reads;
  • avoid logging expensive messages at disabled level;
  • avoid re-parsing config per request.

30. Regex Compilation

If regex used many times:

import re

CASE_ID_PATTERN = re.compile(r"^CASE-\d+$")


def is_case_id(value: str) -> bool:
    return bool(CASE_ID_PATTERN.match(value))

Do not compile inside hot loop:

def is_case_id(value: str) -> bool:
    return bool(re.compile(r"^CASE-\d+$").match(value))

If the function is called rarely, the performance difference is irrelevant; the re module also caches recently used patterns internally. Clarity still favors a module-level pattern that names the rule.


31. Allocation Awareness

Creating many temporary objects can cost time and memory.

Example:

closed_cases = [case for case in cases if case.status is CaseStatus.CLOSED]
return len(closed_cases)

Better:

return sum(1 for case in cases if case.status is CaseStatus.CLOSED)

This avoids building an intermediate list.

But if you need the list later, create it once.
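The allocation difference can be observed with tracemalloc from the standard library. A sketch, assuming plain status strings instead of Case objects; the input list is built before tracing starts, so only the counting functions' own allocations are measured.

```python
import tracemalloc
from collections.abc import Callable


def count_closed_with_list(statuses: list[str]) -> int:
    closed = [status for status in statuses if status == "CLOSED"]  # materialized
    return len(closed)


def count_closed_with_generator(statuses: list[str]) -> int:
    return sum(1 for status in statuses if status == "CLOSED")  # no list built


def peak_bytes(function: Callable[[list[str]], int], statuses: list[str]) -> int:
    """Peak memory traced by tracemalloc while `function` runs."""
    tracemalloc.start()
    try:
        function(statuses)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


statuses = ["CLOSED", "DRAFT"] * 200_000
for function in (count_closed_with_list, count_closed_with_generator):
    print(f"{function.__name__}: peak={peak_bytes(function, statuses):,} bytes")
```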


32. Case Tracker Performance Scenario

Scenario:

Load 100,000 cases from JSON, list summaries.

Possible bottlenecks:

  1. File read.
  2. JSON parse.
  3. Dict validation.
  4. Domain object construction.
  5. Summary string construction.
  6. Terminal output.
  7. Logging.
  8. Memory allocation.

Measurement plan:

A. Time load_cases only.
B. Time render summaries without printing.
C. Time printing with stdout redirected.
D. Profile full command.
E. Measure memory peak with tracemalloc.

Do not guess.


33. Case Tracker Benchmark Dataset

Create deterministic dataset:

import json
from pathlib import Path


def generate_cases(path: Path, count: int) -> None:
    cases = [
        {
            "id": f"CASE-{index:06d}",
            "title": f"Case {index}",
            "status": "DRAFT",
            "notes": [],
        }
        for index in range(count)
    ]

    path.write_text(json.dumps(cases), encoding="utf-8")

Use:

python scripts/generate_cases.py --count 100000 --output cases-large.json

Keep dataset generation deterministic.


34. Case Tracker Timing Script

from pathlib import Path
from time import perf_counter

from case_tracker.storage import load_cases


def main() -> None:
    path = Path("cases-large.json")

    start = perf_counter()
    cases = load_cases(path)
    elapsed = perf_counter() - start

    print(f"Loaded {len(cases)} cases in {elapsed:.3f}s")


if __name__ == "__main__":
    main()

Run it several times and record the min and mean.


35. Case Tracker cProfile

python -m cProfile -o load_cases.prof scripts/measure_load_cases.py
python -m pstats load_cases.prof

Inside pstats:

sort cumtime
stats 20

Look at:

  • json.loads;
  • case_from_dict;
  • CaseId;
  • CaseStatus;
  • validation helpers;
  • dataclass init.

36. Optimization Example: Repeated Lookup

Bad service:

def add_notes_to_cases(cases: list[Case], notes: list[Note]) -> None:
    for note in notes:
        case = get_case_from_list(cases, note.case_id)
        case.add_note(note.text)

If both lists are large, this is O(n * m).

Better:

def add_notes_to_cases(cases: list[Case], notes: list[Note]) -> None:
    case_by_id = {case.id: case for case in cases}

    for note in notes:
        case = case_by_id[note.case_id]
        case.add_note(note.text)

This is a typical Python performance win: choose the right data structure.


37. Optimization Example: Printing

Printing line-by-line can be slow.

for summary in summaries:
    print(summary)

For large output, build the text and write it once:

import sys

sys.stdout.write("\n".join(summaries))
sys.stdout.write("\n")

But this materializes all summaries in memory.

Streaming alternative:

for summary in summaries:
    sys.stdout.write(summary)
    sys.stdout.write("\n")

Measure. Terminal output itself may dominate.


38. Optimization Example: JSON Format

Pretty JSON:

json.dumps(data, indent=2)

Readable, but larger and slower.

Compact:

json.dumps(data, separators=(",", ":"))

Trade-off:

  • pretty useful for manual editing/debugging;
  • compact smaller/faster;
  • for large data, compact may matter;
  • for user-editable store, pretty may be worth it.

Measure and choose.
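A quick way to measure both sides of the trade-off on data shaped like the benchmark dataset (50,000 synthetic cases, an arbitrary size chosen for the sketch):

```python
import json
from time import perf_counter

data = [
    {"id": f"CASE-{index:06d}", "title": f"Case {index}", "status": "DRAFT", "notes": []}
    for index in range(50_000)
]

variants: list[tuple[str, dict[str, object]]] = [
    ("pretty", {"indent": 2}),
    ("compact", {"separators": (",", ":")}),
]

for label, options in variants:
    start = perf_counter()
    text = json.dumps(data, **options)
    elapsed = perf_counter() - start
    # Report encoded size, since that is what lands on disk.
    print(f"{label}: {len(text.encode('utf-8')):,} bytes in {elapsed:.3f}s")
```

Both variants parse back to the same data, so the choice only affects size, speed, and how comfortable the file is to edit by hand.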


39. Optimization Example: Lazy Summaries

Eager:

summaries = [render_case_summary(case) for case in cases]
write_output(summaries)

Lazy:

summaries = (render_case_summary(case) for case in cases)
write_output(summaries)

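If write_output streams the iterable, the lazy version reduces memory. Note that a generator can be consumed only once.

from collections.abc import Iterable


def write_lines(lines: Iterable[str]) -> None:
    for line in lines:
        print(line)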

40. Optimization Example: Avoid Re-Parsing

Bad:

def get_case(path: Path, case_id: CaseId) -> Case:
    cases = load_cases(path)
    ...


def transition_case(path: Path, case_id: CaseId, status: CaseStatus) -> Case:
    case = get_case(path, case_id)
    cases = load_cases(path)
    ...

This loads the file twice.

Better:

def transition_case(path: Path, case_id: CaseId, status: CaseStatus) -> Case:
    cases = load_cases(path)
    case = get_case_from_list(cases, case_id)
    ...

In a service class or repository, coordinate loading and saving so that each operation reads the file once.


41. Performance Regression Tests

Unit tests should not usually assert strict timings because environments vary.

But you can have optional benchmark tests.

Example marker:

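import pytest


@pytest.mark.benchmark
def test_load_cases_large_dataset_performance():
    ...

Run these tests manually or in controlled CI, and deselect them from the default run, for example with pytest -m "not benchmark". Register the marker in the pytest configuration so pytest does not warn about an unknown mark.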

Better regression guard:

  • algorithmic tests;
  • complexity-aware tests;
  • profile in performance suite;
  • benchmark dashboard for serious systems.

Avoid flaky timing asserts:

assert elapsed < 0.01

unless the environment is controlled.


42. Optimization Documentation

When optimizing, document:

# Optimization: case lookup index

## Problem

Transitioning 10,000 cases with notes used a repeated list scan.

## Baseline

100k cases, 100k notes: 74s

## Change

Build dict index once.

## Result

100k cases, 100k notes: 1.8s

## Trade-offs

Uses additional memory proportional to number of cases.
Duplicate ids now rejected explicitly.

This helps future maintainers understand why code changed.


43. Performance Smell Checklist

Watch for:

  1. Repeated list scans in loops.
  2. O(n²) duplicate detection.
  3. Reading same file repeatedly.
  4. Query/API call inside loop.
  5. Building large intermediate lists unnecessarily.
  6. Sorting repeatedly.
  7. Regex compiled in hot loop.
  8. Logging expensive debug values.
  9. Pretty serialization for huge data without reason.
  10. Printing huge output line-by-line without measurement.
  11. Catch-all caching without invalidation.
  12. Threading CPU-bound code.
  13. Multiprocessing tiny tasks with huge serialization overhead.
  14. Blocking I/O inside async code.
  15. Optimization without before/after measurement.

44. Practice: Benchmark List vs Set Membership

Use timeit:

from timeit import timeit

list_time = timeit(
    "'CLOSED' in statuses",
    setup="statuses = ['DRAFT', 'SUBMITTED', 'UNDER_REVIEW', 'ESCALATED', 'CLOSED']",
    number=1_000_000,
)

set_time = timeit(
    "'CLOSED' in statuses",
    setup="statuses = {'DRAFT', 'SUBMITTED', 'UNDER_REVIEW', 'ESCALATED', 'CLOSED'}",
    number=1_000_000,
)

print(list_time, set_time)

Then answer:

  1. Is the difference meaningful for a tiny status set?
  2. Is a set still semantically better for membership checks?
  3. Would this matter in the application profile?

45. Practice: Profile load_cases

  1. Generate 100,000 cases.
  2. Time load_cases.
  3. Profile with cProfile.
  4. Print top 20 by cumulative time.
  5. Identify top 3 functions.
  6. Write one hypothesis.
  7. Optimize one thing.
  8. Measure again.

Do not optimize more than one thing at once.


46. Practice: Fix O(n²)

Implement both:

def find_duplicates_slow(cases: list[Case]) -> set[CaseId]:
    ...


def find_duplicates_fast(cases: list[Case]) -> set[CaseId]:
    ...

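Test that both return the same result.

Benchmark with 1k, 10k, and 100k cases; expect the O(n²) version to take a very long time at 100k.

Observe how the running time grows.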
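One possible harness for observing growth, sketched with plain string ids instead of Case objects and with smaller sizes so the slow version finishes quickly. make_ids and time_call are hypothetical helpers; every tenth id is duplicated so both functions have something to find.

```python
from collections import Counter
from collections.abc import Callable
from time import perf_counter


def find_duplicates_slow(case_ids: list[str]) -> set[str]:
    # O(n²): counts each id by rescanning the whole list.
    duplicates: set[str] = set()
    for case_id in case_ids:
        if sum(1 for other in case_ids if other == case_id) > 1:
            duplicates.add(case_id)
    return duplicates


def find_duplicates_fast(case_ids: list[str]) -> set[str]:
    # O(n): one counting pass, then one pass over the counts.
    counts = Counter(case_ids)
    return {case_id for case_id, count in counts.items() if count > 1}


def make_ids(count: int) -> list[str]:
    # Deterministic input: every 10th id appears twice.
    ids = [f"CASE-{index:06d}" for index in range(count)]
    return ids + ids[::10]


def time_call(function: Callable[[list[str]], set[str]], case_ids: list[str]) -> float:
    start = perf_counter()
    function(case_ids)
    return perf_counter() - start


for size in (500, 1_000, 2_000):
    case_ids = make_ids(size)
    assert find_duplicates_slow(case_ids) == find_duplicates_fast(case_ids)
    slow = time_call(find_duplicates_slow, case_ids)
    fast = time_call(find_duplicates_fast, case_ids)
    print(f"n={len(case_ids):>5}: slow={slow:.4f}s fast={fast:.4f}s")
```

Each doubling of n should roughly quadruple the slow time while the fast time roughly doubles; exact ratios depend on the machine.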


47. Practice: Output Bottleneck

Measure:

case-tracker --store cases-large.json list
case-tracker --store cases-large.json list > /dev/null

If the redirected run is much faster or slower (this varies by platform and terminal), analyze the stdout/terminal cost.

Add a command:

case-tracker summary

that prints counts instead of all rows.
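A sketch of what such a summary could render, assuming the command has access to each case's status as a string. render_status_summary is a hypothetical helper: the output is one line per status plus a total, regardless of how many cases there are.

```python
from collections import Counter
from collections.abc import Iterable


def render_status_summary(statuses: Iterable[str]) -> str:
    """Render one line per status plus a total, instead of one line per case."""
    counts = Counter(statuses)
    lines = [f"{status}: {count}" for status, count in sorted(counts.items())]
    lines.append(f"TOTAL: {sum(counts.values())}")
    return "\n".join(lines)


print(render_status_summary(["DRAFT", "CLOSED", "DRAFT", "SUBMITTED"]))
# CLOSED: 1
# DRAFT: 2
# SUBMITTED: 1
# TOTAL: 4
```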


48. Self-Check

Answer without looking at the material:

  1. What is the principle of measurement before optimization?
  2. Why must a performance goal be concrete?
  3. What is the difference between latency and throughput?
  4. What is the difference between wall-clock time and CPU time?
  5. When should you use perf_counter?
  6. When should you use timeit?
  7. What are the caveats of microbenchmarks?
  8. What does cProfile do?
  9. What is the difference between tottime and cumtime?
  10. What does ncalls mean?
  11. Why does Big-O matter more than micro-optimization?
  12. What is an example of a hidden O(n²)?
  13. What is the N+1 pattern?
  14. When is caching useful?
  15. When is caching dangerous?
  16. Why are built-ins often a good choice?
  17. Why can terminal output become a bottleneck?
  18. Why are performance tests prone to flakiness?
  19. What should be documented after an optimization?
  20. What is the most common performance smell in Python?

49. Definition of Done Part 023

You have finished this part if you can:

  1. Define a concrete performance goal.
  2. Create a baseline.
  3. Measure with perf_counter.
  4. Build a microbenchmark with timeit.
  5. Run cProfile.
  6. Read pstats output.
  7. Distinguish tottime from cumtime.
  8. Identify O(n²) code.
  9. Replace repeated lookups with a dict index.
  10. Explain the N+1 pattern.
  11. Explain the caching trade-off.
  12. Measure an output bottleneck.
  13. Write an optimization note.
  14. Avoid flaky timing tests.
  15. Explain why correctness must be secured before optimizing.

50. Summary

Performance engineering is an evidence-based process.

Key points of this part:

  • do not optimize without measurement;
  • goals must be concrete;
  • latency, throughput, CPU, memory, and I/O are different metrics;
  • a baseline is mandatory before any change;
  • correctness tests must exist before optimizing;
  • algorithms and data structures often beat micro-optimization;
  • perf_counter for application timing;
  • timeit for microbenchmarks;
  • cProfile/pstats for function-level profiling;
  • tottime, cumtime, and ncalls help you read bottlenecks;
  • I/O, database, and network often matter more than Python loops;
  • caching, batching, streaming, and indexing all have trade-offs;
  • document optimizations and their results.

The next part covers the memory model, object overhead, allocation, sys.getsizeof, tracemalloc, __slots__, dataclass memory, and data-oriented Python.

