Start HereOrdered learning track

Learn Java Dsa Part 002 Complexity As Engineering Contract

[]18 min read3405 words

In This Lesson

1. Complexity Bukan Sekadar Notasi 2. Cost Model: Apa yang Kita Hitung?3. Big-O: Upper Bound Pertumbuhan

Lesson 0235 lesson track01–06 Start Here

title: Learn Java Data Structures and Algorithms - Part 002 description: Big-O, amortized analysis, lower bounds, constant factors, and Java-specific cost model sebagai kontrak engineering untuk memilih struktur data dan algoritma. series: learn-java-dsa seriesTitle: Learn Java Data Structures and Algorithms order: 2 partTitle: Complexity as an Engineering Contract tags:

java
data-structures
algorithms
complexity
performance
series date: 2026-06-27

Part 002 — Complexity as an Engineering Contract

Complexity analysis sering diajarkan sebagai formalitas di akhir solusi:

Time: O(n log n)
Space: O(n)

Untuk engineer yang membangun sistem nyata, itu tidak cukup. Complexity adalah kontrak engineering: klaim tentang bagaimana solusi akan bertumbuh ketika input, traffic, atau state membesar.

Jika kontrak itu salah, sistem bisa gagal dalam bentuk:

latency spike;
CPU saturation;
memory blow-up;
GC pause;
timeout;
queue backlog;
database overload;
degraded user journey;
incident operasional.

Part ini membangun fondasi yang akan dipakai di semua part berikutnya.

1. Complexity Bukan Sekadar Notasi

Ketika kita mengatakan:

lookup in HashMap is O(1) average

itu bukan mantra. Itu klaim dengan asumsi:

hash function menyebarkan key cukup baik;
equals() dan hashCode() benar;
load factor dijaga;
tidak ada adversarial collision yang ekstrem;
resize cost diamortisasi;
memory cukup;
tidak ada contention concurrent di luar struktur tersebut.

Jika asumsi berubah, kontraknya berubah.

Jadi complexity harus dibaca sebagai:

Under these assumptions, this operation has this growth behavior.

2. Cost Model: Apa yang Kita Hitung?

Sebelum menganalisis algoritma, tentukan dulu model biayanya.

2.1 Unit-Cost RAM Model

Model klasik DSA biasanya mengasumsikan operasi dasar bernilai konstan:

assignment;
arithmetic;
array access;
comparison;
pointer/reference read;
method call sederhana.

Model ini berguna untuk reasoning awal, tetapi terlalu sederhana untuk Java production.

2.2 Java Runtime Cost Model

Di Java, biaya nyata dapat mencakup:

Cost	Contoh
Allocation	membuat object baru, node linked list, wrapper Integer
Indirection	membaca object melalui reference
Cache Miss	traversal pointer-heavy structure
Branch Misprediction	conditional logic tidak predictable
Virtual Dispatch	pemanggilan method polymorphic tertentu
Boxing/Unboxing	`int` menjadi `Integer` dan sebaliknya
Comparator Cost	sorting object dengan comparator kompleks
GC Pressure	banyak object short-lived
Synchronization	lock, CAS retry, contention
Bounds Check	akses array dengan index

Artinya, dua algoritma sama-sama O(n) bisa punya performa sangat berbeda.

Contoh:

int sumArray(int[] values) {
    int sum = 0;
    for (int value : values) {
        sum += value;
    }
    return sum;
}

versus:

int sumList(List<Integer> values) {
    int sum = 0;
    for (Integer value : values) {
        sum += value;
    }
    return sum;
}

Keduanya O(n), tetapi versi List<Integer> dapat membawa biaya boxing, indirection, layout object, dan cache locality yang lebih buruk tergantung implementasi list dan asal data.

3. Big-O: Upper Bound Pertumbuhan

Big-O menyatakan batas atas pertumbuhan fungsi biaya.

Secara praktis:

O(f(n)) means the cost grows no faster than a constant multiple of f(n), after n is large enough.

Contoh:

boolean contains(int[] values, int target) {
    for (int value : values) {
        if (value == target) {
            return true;
        }
    }
    return false;
}

Worst-case: target tidak ada atau berada di akhir array.

T(n) = c1 * n + c2
T(n) = O(n)

Big-O mengabaikan constant factor dan lower-order term:

5n + 100      -> O(n)
2n log n + n -> O(n log n)
n^2 + 100n   -> O(n^2)

Namun untuk engineering, jangan buang constant factor terlalu cepat. Untuk n kecil, constant factor bisa dominan.

4. Big-O, Big-Theta, Big-Omega

Tiga notasi yang sering tertukar:

Notasi	Makna Praktis
O(f(n))	upper bound: tidak lebih buruk dari f(n) dalam skala besar
Ω(f(n))	lower bound: setidaknya sebesar f(n) dalam skala besar
Θ(f(n))	tight bound: tumbuh sekelas f(n) dari atas dan bawah

Contoh linear scan:

int indexOf(int[] values, int target) {
    for (int i = 0; i < values.length; i++) {
        if (values[i] == target) {
            return i;
        }
    }
    return -1;
}

Best-case:

target at index 0 -> Θ(1)

Worst-case:

target absent -> Θ(n)

Jika kita hanya bilang O(n), itu benar tetapi tidak lengkap. Dalam desain sistem, best-case jarang menjadi kontrak yang aman.

5. Worst-Case, Average-Case, Expected, Amortized

Empat istilah ini harus dipisahkan.

5.1 Worst-Case

Biaya maksimum untuk input berukuran n.

Contoh:

boolean hasDuplicate(int[] values) {
    for (int i = 0; i < values.length; i++) {
        for (int j = i + 1; j < values.length; j++) {
            if (values[i] == values[j]) {
                return true;
            }
        }
    }
    return false;
}

Worst-case terjadi ketika tidak ada duplicate.

comparisons = n(n - 1) / 2
complexity = Θ(n^2)

5.2 Average-Case

Rata-rata terhadap distribusi input tertentu.

Masalahnya: distribusi input harus jelas.

Average over what?
Uniform random input?
Real production traffic?
Adversarial user input?
Sorted data?
Skewed tenant data?

Tanpa distribusi, average-case sering menjadi klaim kosong.

5.3 Expected Time

Expected time biasanya muncul pada algoritma randomized.

Contoh randomized quicksort memiliki expected O(n log n) jika pivot random membuat pembagian cukup seimbang secara probabilistik. Namun worst-case tetap O(n^2).

5.4 Amortized Time

Amortized bukan average input. Amortized berarti biaya mahal sesekali disebarkan ke banyak operasi dalam sequence.

Contoh: dynamic array addLast.

Kebanyakan append O(1).
Saat capacity penuh, resize dan copy O(n).
Dalam sequence panjang, total biaya append tetap linear.
Maka append adalah amortized O(1).

6. Amortized Analysis: Dynamic Array

Misalkan dynamic array mulai dengan capacity 1 dan selalu double capacity ketika penuh.

Operasi append:

final class IntArray {
    private int[] elements = new int[1];
    private int size;

    void add(int value) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, elements.length * 2);
        }
        elements[size++] = value;
    }
}

Untuk menambahkan 8 elemen:

Append ke-	Capacity Sebelum	Copy?	Elemen Dicopy
1	1	no	0
2	1	yes	1
3	2	yes	2
4	4	no	0
5	4	yes	4
6	8	no	0
7	8	no	0
8	8	no	0

Total copy sampai n append kurang dari:

1 + 2 + 4 + 8 + ... + n/2 < n

Total operasi insert asli n, total copy < n, sehingga total < 2n.

n append = O(n)
1 append amortized = O(1)

Kontraknya:

addLast is amortized O(1), but individual add may be O(n) during resize.

Ini penting untuk latency-sensitive systems. Amortized O(1) tidak berarti tidak ada spike.

7. Capacity Growth: Mengapa Faktor Pertumbuhan Penting?

Jika capacity bertambah satu per satu:

elements = Arrays.copyOf(elements, elements.length + 1);

Maka total copy untuk n append:

1 + 2 + 3 + ... + (n - 1) = n(n - 1)/2 = Θ(n^2)

Jika capacity dikalikan dua:

1 + 2 + 4 + ... < 2n = Θ(n)

Trade-off doubling:

Growth Strategy	Append Amortized	Memory Waste	Resize Frequency
+1	O(n)	low	very high
x1.5	O(1)	moderate	moderate
x2	O(1)	higher	low

Engineering implication:

growth terlalu kecil membuat CPU/copy mahal;
growth terlalu besar membuang memory;
untuk workload besar, pre-sizing sering lebih baik.

Contoh:

List<Order> orders = new ArrayList<>(expectedOrderCount);

Pre-sizing bukan micro-optimization jika cardinality diketahui dan list besar.

8. Hidden Complexity in Java Collections

8.1 `List.contains` Bisa Menjadi Trap

boolean allAllowed(List<String> inputs, List<String> allowed) {
    for (String input : inputs) {
        if (!allowed.contains(input)) {
            return false;
        }
    }
    return true;
}

Jika allowed adalah ArrayList, maka contains adalah linear scan.

inputs size = n
allowed size = m
complexity = O(n * m)

Untuk membership check, gunakan HashSet bila ordering tidak diperlukan:

boolean allAllowed(List<String> inputs, Collection<String> allowedValues) {
    Set<String> allowed = new HashSet<>(allowedValues);
    for (String input : inputs) {
        if (!allowed.contains(input)) {
            return false;
        }
    }
    return true;
}

Complexity:

build set: O(m) average
checks: O(n) average
total: O(n + m) average

Tetapi ada trade-off: memory tambahan O(m).

8.2 `remove` While Iterating

for (String item : items) {
    if (shouldRemove(item)) {
        items.remove(item);
    }
}

Masalah:

bisa ConcurrentModificationException;
untuk ArrayList, remove menggeser elemen O(n);
repeated remove bisa O(n^2).

Lebih baik:

items.removeIf(this::shouldRemove);

Atau buat list baru bila ingin explicit filtering:

List<String> kept = new ArrayList<>();
for (String item : items) {
    if (!shouldRemove(item)) {
        kept.add(item);
    }
}

8.3 Sorting with Expensive Comparator

users.sort(Comparator.comparing(user -> expensiveScore(user)));

Sorting comparison-based biasanya O(n log n) comparisons. Jika comparator mahal, total cost bisa didominasi expensiveScore.

Lebih baik precompute key jika score deterministic:

record ScoredUser(User user, int score) {}

List<ScoredUser> scored = new ArrayList<>(users.size());
for (User user : users) {
    scored.add(new ScoredUser(user, expensiveScore(user)));
}

scored.sort(Comparator.comparingInt(ScoredUser::score));

Trade-off:

waktu comparator turun;
memory naik O(n);
kode lebih panjang;
harus pastikan score tidak berubah selama sort.

9. Data Size Regimes

Complexity harus dibaca bersama ukuran data.

N	Yang Biasanya Aman	Yang Mulai Berbahaya
10	hampir apa pun	overengineering
100	O(n²) sering masih aman	struktur terlalu kompleks
10,000	O(n²) mulai berbahaya	nested scan
1,000,000	O(n log n) / O(n) lebih masuk akal	boxing besar, memory overhead
100,000,000	memory layout kritis	object-per-element

Ini bukan angka absolut. Machine, JVM, workload, dan SLA menentukan hasil. Namun tabel ini membantu intuition.

Contoh:

n = 1,000
n² = 1,000,000

n = 1,000,000
n² = 1,000,000,000,000

Perbedaan dari O(n) ke O(n²) bukan “sedikit lebih lambat”. Pada skala tertentu, itu mengubah sistem dari feasible menjadi mustahil.

10. Common Complexity Classes

Complexity	Nama Praktis	Contoh
O(1)	constant	array access, stack push amortized
O(log n)	logarithmic	binary search, heap height, balanced tree lookup
O(n)	linear	scan array
O(n log n)	linearithmic	comparison sorting optimal class
O(n²)	quadratic	pairwise comparison
O(n³)	cubic	naive matrix multiplication, Floyd-Warshall
O(2ⁿ)	exponential	subset enumeration
O(n!)	factorial	brute-force permutations

Growth intuition:

log n grows slowly
n grows directly
n log n is often acceptable for large n
n² collapses quickly at scale
2ⁿ is only for small n unless heavily pruned

11. Logarithm Mental Model

Dalam DSA, log n biasanya berarti “berapa kali kita bisa membagi problem sebelum habis”.

Binary search:

n -> n/2 -> n/4 -> n/8 -> ... -> 1

Jumlah step:

k steps until n / 2^k = 1
2^k = n
k = log2(n)

Untuk n = 1,000,000, log2(n) sekitar 20.

Itulah mengapa balanced tree lookup O(log n) sangat kuat: satu juta elemen bisa dicari dengan sekitar puluhan level, bukan jutaan scan.

Namun O(log n) tidak otomatis lebih cepat dari O(1) average hash lookup, dan tidak otomatis lebih lambat dari array scan untuk n kecil. Context matters.

12. Nested Loops: Jangan Hanya Menghitung Jumlah Loop

Tidak semua nested loop adalah O(n²).

Contoh O(n²):

for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        work();
    }
}

Contoh tetap O(n):

int j = 0;
for (int i = 0; i < n; i++) {
    while (j < n && condition(i, j)) {
        j++;
    }
}

Jika j hanya bergerak maju dari 0 ke n sepanjang keseluruhan program, total iterasi inner loop adalah O(n), bukan O(n²). Ini disebut pola two pointers atau monotonic progress.

Pertanyaan penting:

Apakah variabel inner reset untuk setiap outer iteration?
Atau bergerak total sepanjang eksekusi?

13. Recursion Complexity

Untuk recursion, pikirkan:

berapa banyak subproblem;
ukuran setiap subproblem;
biaya combine per level;
kedalaman recursion.

13.1 Linear Recursion

int sum(int[] values, int index) {
    if (index == values.length) {
        return 0;
    }
    return values[index] + sum(values, index + 1);
}

Ada n call, tiap call O(1).

T(n) = T(n - 1) + O(1) = O(n)

Space call stack:

O(n)

Catatan Java: Java tidak menjamin tail-call optimization umum, jadi recursion dalam bisa menyebabkan StackOverflowError.

13.2 Divide and Conquer

Mergesort:

T(n) = 2T(n/2) + O(n)

Ada log n level, tiap level total kerja O(n).

T(n) = O(n log n)

13.3 Exponential Recursion

Naive Fibonacci:

long fib(int n) {
    if (n <= 1) {
        return n;
    }
    return fib(n - 1) + fib(n - 2);
}

Banyak subproblem dihitung ulang.

T(n) approximately O(2^n)

Memoization mengubahnya menjadi O(n) time dan O(n) space.

14. Space Complexity

Space complexity bukan hanya array tambahan.

Hitung:

output space jika relevan;
auxiliary data structure;
recursion stack;
object overhead;
boxing;
cached/precomputed values;
temporary arrays during sorting;
retained references yang mencegah GC.

Contoh:

Set<Integer> unique = new HashSet<>(values.length);
for (int value : values) {
    unique.add(value);
}

Secara algoritmik:

O(n) extra space

Secara Java runtime:

Integer boxing jika value berasal dari int;
hash table bucket/table overhead;
node/object overhead tergantung implementation;
load factor membuat capacity lebih besar dari size.

Untuk data sangat besar, “O(n) space” harus diterjemahkan menjadi estimasi memory kasar.

15. Memory Budget Estimation

Misalkan kita menyimpan 10 juta integer.

Dengan int[]:

10,000,000 * 4 bytes = about 40 MB plus array header

Dengan List<Integer>:

backing array menyimpan references;
setiap Integer adalah object;
ada object header dan alignment;
locality lebih buruk.

Estimasi kasar bisa jauh lebih besar dari 40 MB.

Pelajaran:

O(n) space is not enough information when n is large.

Gunakan primitive array atau primitive-specialized collection bila memory menjadi constraint utama.

16. Lower Bounds: Batas yang Tidak Bisa Ditembus

Lower bound membantu mencegah pencarian solusi mustahil.

16.1 Finding Maximum

Untuk menemukan maksimum dari unsorted array, setiap elemen harus dilihat setidaknya sekali.

Lower bound = Ω(n)

Tidak ada algoritma general yang bisa menjamin maximum tanpa membaca semua elemen.

16.2 Comparison Sorting

Sorting berbasis comparison memiliki lower bound:

Ω(n log n)

Artinya, jika hanya boleh membandingkan dua elemen, kita tidak bisa general-purpose sort lebih baik dari n log n dalam model comparison.

Untuk melewati batas ini, harus memakai asumsi tambahan:

integer range kecil -> counting sort;
fixed-width digits -> radix sort;
distribution tertentu -> bucket sort.

16.3 Membership Query

Jika data unsorted dan hanya query sekali:

scan O(n)

Jika query banyak kali:

preprocess into HashSet O(n), then query O(1) average

Lower bound memaksa kita bertanya:

Apakah biaya preprocessing layak dibanding jumlah query?

17. Preprocessing Trade-Off

Banyak algoritma production memakai pola:

pay once, query many times

Contoh:

final class UserDirectory {
    private final Map<String, User> byEmail;

    UserDirectory(List<User> users) {
        this.byEmail = new HashMap<>(users.size() * 2);
        for (User user : users) {
            byEmail.put(user.email(), user);
        }
    }

    Optional<User> findByEmail(String email) {
        return Optional.ofNullable(byEmail.get(email));
    }
}

Trade-off:

Approach	Build	Query	Space
Scan list	O(1)	O(n)	O(1) extra
Hash index	O(n) average	O(1) average	O(n)
Tree index	O(n log n) or O(n) if sorted build	O(log n)	O(n)

Decision depends on:

jumlah query;
update frequency;
memory budget;
need for ordered/range query;
consistency requirement.

18. Complexity of Multiple Operations

Jangan analisis per method secara terisolasi jika workload adalah sequence operasi.

Contoh API:

interface EventStore {
    void append(Event event);
    List<Event> latest(int limit);
    void deleteOlderThan(Instant threshold);
}

Pertanyaan complexity:

Berapa biaya append?
Berapa biaya latest?
Berapa biaya cleanup?
Apakah cleanup eager atau lazy?
Apakah biaya cleanup muncul di request path?
Apakah operasi satu tenant bisa memengaruhi tenant lain?

Sering kali desain terbaik bukan operasi individual tercepat, tetapi distribusi biaya paling aman.

19. Latency vs Throughput

Complexity sering bicara total work, tetapi sistem production peduli latency dan throughput.

Concern	Pertanyaan
Latency	Berapa lama satu request menunggu?
Throughput	Berapa banyak request per detik?
Tail Latency	Bagaimana p95/p99/p999?
Burst Behavior	Apa yang terjadi saat traffic spike?
Backpressure	Apakah queue tumbuh tak terkendali?

Amortized O(1) bisa bagus untuk throughput tetapi buruk untuk tail latency jika resize besar terjadi di request kritis.

Mitigasi:

pre-size collections;
move expensive maintenance off critical path;
chunk work;
cap per-request processing;
use bounded queues;
use incremental cleanup.

20. Complexity and Invariants

Complexity sering bergantung pada invariant.

Binary Search

O(log n) bergantung pada invariant:

array is sorted under the same ordering used by search

Jika array tidak sorted, binary search tidak hanya lambat; ia salah.

Heap

poll O(log n) bergantung pada invariant:

parent priority <= child priority

Jika elemen di dalam heap dimutasi setelah insertion, heap property bisa rusak.

HashMap

Average O(1) bergantung pada:

hashCode stable while key is in map
if a.equals(b), then a.hashCode() == b.hashCode()

Jika key mutable dan field yang dipakai hash berubah, lookup bisa gagal.

Contoh:

final class MutableKey {
    String value;

    MutableKey(String value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof MutableKey that && Objects.equals(value, that.value);
    }

    @Override
    public int hashCode() {
        return Objects.hash(value);
    }
}

Map<MutableKey, String> map = new HashMap<>();
MutableKey key = new MutableKey("A");
map.put(key, "stored");

key.value = "B";

System.out.println(map.get(key)); // likely null

Complexity contract runtuh karena correctness contract runtuh.

21. Constant Factor: Kapan Penting?

Big-O mengabaikan constant factor, tetapi engineer tidak boleh selalu mengabaikannya.

Constant factor biasanya penting ketika:

n kecil tetapi operasi sering sekali;
kode berada di hot path;
operasi dasar mahal, misalnya comparator melakukan parsing;
struktur menyebabkan banyak allocation;
latency target ketat;
memory bandwidth menjadi bottleneck.

Constant factor biasanya kurang penting ketika:

growth class salah besar;
O(n²) dipakai untuk n besar;
operasi tidak berada di hot path;
readability dan correctness lebih penting;
data size kecil dan stabil.

Aturan praktis:

First choose the right complexity class.
Then measure before fighting constants.

22. Complexity Smells in Java Code Review

Cari pola berikut saat review.

22.1 Nested `contains`

for (User user : users) {
    if (allowedUsers.contains(user)) {
        result.add(user);
    }
}

Jika allowedUsers adalah list besar, smell.

22.2 Repeated Sorting

for (Request request : requests) {
    candidates.sort(comparator);
    process(request, candidates.get(0));
}

Sorting di dalam loop sering bisa diganti priority queue, pre-sort, atau incremental update.

22.3 `stream().filter().findFirst()` Repeatedly

for (String id : ids) {
    User user = users.stream()
        .filter(u -> u.id().equals(id))
        .findFirst()
        .orElseThrow();
}

Ini sering O(n*m). Buat index:

Map<String, User> byId = users.stream()
    .collect(Collectors.toMap(User::id, Function.identity()));

22.4 Accidental String Concatenation in Loop

String result = "";
for (String part : parts) {
    result += part;
}

Gunakan StringBuilder untuk repeated concatenation.

22.5 `LinkedList` as Default Queue

Queue<Job> queue = new LinkedList<>();

Untuk kebanyakan queue single-threaded, ArrayDeque lebih baik karena lebih cache-friendly dan tidak membuat node object per element.

23. Decision Table: Choosing by Operation

Need	Candidate	Typical Cost	Notes
random index access	`ArrayList`, array	O(1)	good locality
frequent append	`ArrayList`, `ArrayDeque`	amortized O(1)	watch resize spike
frequent insert middle	depends	O(n) often	maybe need tree/rope/gap buffer
membership check	`HashSet`	O(1) average	requires correct hash/equality
ordered lookup	`TreeMap`/`TreeSet`	O(log n)	comparator correctness critical
min/max repeated	`PriorityQueue`	O(log n) update	no efficient decrease-key
range sum static	prefix sum	O(1) query after O(n) build	updates expensive
range sum dynamic	Fenwick/segment tree	O(log n)	more implementation complexity
graph traversal	adjacency list	O(V+E)	representation matters
dependency resolution	topological sort	O(V+E)	cycle handling required

24. Worked Example: From O(n²) to O(n)

Problem:

Given two lists of user IDs, return IDs that appear in both.

Naive:

List<String> intersection(List<String> a, List<String> b) {
    List<String> result = new ArrayList<>();
    for (String x : a) {
        if (b.contains(x)) {
            result.add(x);
        }
    }
    return result;
}

If b is an ArrayList:

Time: O(a.size * b.size)
Space: O(k) result

Improved:

List<String> intersection(List<String> a, List<String> b) {
    Set<String> bSet = new HashSet<>(b);
    List<String> result = new ArrayList<>();

    for (String x : a) {
        if (bSet.contains(x)) {
            result.add(x);
        }
    }
    return result;
}

Complexity:

Build HashSet: O(b.size) average
Loop over a: O(a.size) average
Total: O(a.size + b.size) average
Extra space: O(b.size)

But clarify semantics:

Should duplicates from a be preserved?
Should result be unique?
Should order follow a?
What if a or b is huge?
Is String comparison case-sensitive?

A complexity improvement without semantic clarity can still be wrong.

25. Worked Example: Top-K

Problem:

Given 10 million scores, return top 100.

Option 1: sort all.

scores.sort(Comparator.reverseOrder());
return scores.subList(0, 100);

Complexity:

O(n log n)

Option 2: min-heap of size k.

List<Integer> topK(List<Integer> scores, int k) {
    PriorityQueue<Integer> heap = new PriorityQueue<>();

    for (int score : scores) {
        if (heap.size() < k) {
            heap.offer(score);
        } else if (score > heap.peek()) {
            heap.poll();
            heap.offer(score);
        }
    }

    ArrayList<Integer> result = new ArrayList<>(heap);
    result.sort(Comparator.reverseOrder());
    return result;
}

Complexity:

O(n log k + k log k)

For n = 10,000,000 and k = 100, log k is much smaller than log n.

Trade-off:

heap solution better when k is small relative to n;
sorting simpler and may be fine for moderate n;
primitive arrays may be better than List<Integer> for very large numeric data;
if scores stream in online, heap solution works incrementally.

26. Worked Example: Range Sum

Problem:

Given an array and many range sum queries [l, r], answer each query quickly.

Naive per query:

long rangeSum(int[] values, int left, int right) {
    long sum = 0;
    for (int i = left; i <= right; i++) {
        sum += values[i];
    }
    return sum;
}

Cost:

O(length of range) per query

If many queries, build prefix sum:

final class RangeSum {
    private final long[] prefix;

    RangeSum(int[] values) {
        this.prefix = new long[values.length + 1];
        for (int i = 0; i < values.length; i++) {
            prefix[i + 1] = prefix[i] + values[i];
        }
    }

    long queryInclusive(int left, int right) {
        if (left < 0 || right < left || right >= prefix.length - 1) {
            throw new IndexOutOfBoundsException();
        }
        return prefix[right + 1] - prefix[left];
    }
}

Cost:

Build: O(n)
Query: O(1)
Space: O(n)

Invariant:

prefix[i] = sum of values[0..i-1]

This invariant makes query derivation obvious:

sum(left..right) = prefix[right + 1] - prefix[left]

27. Complexity and API Design

API shape can force complexity.

Bad API:

interface UserStore {
    List<User> allUsers();
}

Caller then repeatedly scans:

User findByEmail(UserStore store, String email) {
    for (User user : store.allUsers()) {
        if (user.email().equals(email)) {
            return user;
        }
    }
    throw new NoSuchElementException();
}

Better API if lookup is first-class:

interface UserStore {
    Optional<User> findByEmail(String email);
}

Even better if semantics are explicit:

interface UserDirectory {
    Optional<User> findActiveUserByNormalizedEmail(String normalizedEmail);
}

DSA is not only inside method body. It affects service interfaces, indexes, caches, and data ownership.

28. Complexity and Domain Modelling

A poor domain model can make every algorithm expensive.

Example:

record Case(String id, List<Action> actions) {}
record Action(String id, String assigneeId, Instant createdAt) {}

Requirement:

Find all open cases assigned to a given officer.

If only stored as List<Case>, query may require scanning all cases and actions.

Potential index:

Map<String, Set<String>> caseIdsByAssignee;

But now update complexity matters:

when action assigned;
when action completed;
when case closed;
when assignee changes;
when data is rebuilt;
when index becomes inconsistent.

Complexity improvement introduces consistency invariant:

For every open case C and active action A assigned to officer O,
caseIdsByAssignee[O] contains C.id.

Indexes are data structures with correctness obligations.

29. Complexity Contract Template

Gunakan template ini dalam design doc atau code review.

Operation:
  find active session by token

Data structure:
  HashMap<Token, Session>

Assumptions:
  Token is immutable.
  Token.hashCode is stable and well-distributed.
  Map is not accessed concurrently without synchronization.
  Expected max sessions: 500k.

Complexity:
  get: O(1) average, O(n) worst-case under severe collision
  put: O(1) amortized average, occasional resize O(n)
  memory: O(n)

Failure modes:
  Mutable token breaks lookup.
  High cardinality increases memory and GC pressure.
  Resize can cause latency spike.
  No eviction causes unbounded growth.

Mitigations:
  Use immutable token type.
  Pre-size map if cardinality known.
  Add eviction/TTL.
  Monitor size and latency.

This is how complexity becomes engineering communication.

30. Anti-Patterns

30.1 “It’s Only O(n)” Without N

O(n) for n = 100 and n = 100,000,000 are different engineering realities.

Always ask:

What is n?
How fast can n grow?
Is n per request, per tenant, per day, or global?

30.2 Confusing Amortized With Worst-Case

ArrayList.add is amortized O(1)

does not mean every add is worst-case O(1).

30.3 Assuming Library Means Optimal

Java Collections are excellent general-purpose tools, but not magic. LinkedList, HashMap, TreeMap, PriorityQueue, and ConcurrentHashMap have very different contracts.

30.4 Ignoring Memory

A time-optimal solution can be unusable if memory blows up.

30.5 Ignoring Semantics

Replacing List with Set changes duplicate behavior and ordering. Complexity improvement that changes semantics is a bug.

31. Complexity Review Examples

Example A: Duplicate Detection

boolean hasDuplicate(int[] values) {
    Set<Integer> seen = new HashSet<>();
    for (int value : values) {
        if (!seen.add(value)) {
            return true;
        }
    }
    return false;
}

Analysis:

Time: O(n) average, O(n²) pathological worst-case if hash degenerates severely
Space: O(n)
Java cost: boxing int to Integer
Alternative: sort int[] then scan, O(n log n) time, O(1) or O(n) depending sort, avoids boxing if array mutable

Decision:

if input can be mutated and memory matters, sort then scan may be better;
if preserving order and average speed matter, hash set is better;
if integer range small, boolean bitmap may be best.

Example B: First Repeated Character

OptionalInt firstRepeatedAsciiChar(String s) {
    boolean[] seen = new boolean[128];
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c >= 128) {
            throw new IllegalArgumentException("non-ASCII character");
        }
        if (seen[c]) {
            return OptionalInt.of(c);
        }
        seen[c] = true;
    }
    return OptionalInt.empty();
}

Analysis:

Time: O(n)
Space: O(1), because 128 fixed-size array

But semantic constraint is explicit: ASCII only. Without that, Unicode handling changes representation and correctness.

32. How to Analyze a New Solution

Use this sequence:

Define n, m, V, E, or other variables.
Identify dominant operations.
Count loops and monotonic movement.
Analyze data structure operations used inside loops.
Separate build/preprocess/query/update costs.
Separate average, worst, expected, and amortized claims.
Analyze extra space.
Add Java-specific costs.
Validate against constraints.
State assumptions.

Example statement:

Let n be number of users and m be number of allowed IDs.
Building the HashSet costs O(m) average time and O(m) space.
For each user, membership check is O(1) average, so filtering costs O(n) average.
Total average time is O(n + m), with worst-case degradation if hashing behaves pathologically.
The solution preserves input user order but does not deduplicate users.

Notice how this is much stronger than:

Time O(n), space O(n)

33. Exercises

Exercise 1: Identify Hidden Complexity

Analyze this code:

List<Order> findOrders(List<Order> orders, List<String> targetIds) {
    List<Order> result = new ArrayList<>();
    for (Order order : orders) {
        if (targetIds.contains(order.id())) {
            result.add(order);
        }
    }
    return result;
}

Answer:

define n = orders.size();
define m = targetIds.size();
if targetIds is list, complexity O(n*m);
improvement: convert target IDs to HashSet, but preserve semantics around duplicates/order.

Exercise 2: Preprocessing Decision

You receive 1 million products and 5 lookup queries by SKU. Should you build HashMap<Sku, Product>?

Answer depends on:

are there only 5 queries once?
is map reused?
how expensive is scan?
memory budget?
latency target?

For one-time 5 queries, scan may be acceptable. For repeated service lookup, index is better.

Exercise 3: Analyze Dynamic Array

Explain why doubling capacity gives amortized O(1) append, but single-step growth gives O(n) amortized append.

Expected reasoning:

doubling causes geometric copy series;
geometric series sums to linear;
+1 growth causes arithmetic series;
arithmetic series sums to quadratic.

Exercise 4: Complexity Contract

Write a complexity contract for:

Map<CustomerId, TreeSet<Invoice>> invoicesByCustomer;

Consider:

key equality;
invoice comparator;
insert cost;
find oldest unpaid invoice;
range query by due date;
memory overhead;
mutation hazard if invoice fields used in comparator change.

34. Self-Correction Checklist

Before moving on, verify:

[ ] I can explain Big-O without treating it as exact runtime.
[ ] I can separate worst-case, average-case, expected, and amortized.
[ ] I can analyze hidden complexity inside Java collection calls.
[ ] I can explain why dynamic array append is amortized O(1).
[ ] I can estimate whether O(n²) is feasible for a given n.
[ ] I can identify when memory overhead matters.
[ ] I can state assumptions behind a complexity claim.
[ ] I can write a complexity contract for an API.

35. Summary

Complexity is not an academic afterthought. It is the contract between your solution and future scale.

Key takeaways:

Big-O describes growth, not exact runtime.
Complexity claims depend on assumptions.
Worst-case, average-case, expected, and amortized are different.
Java-specific costs matter: allocation, boxing, indirection, comparator cost, GC, and concurrency.
Hidden complexity often appears inside innocent collection calls.
Preprocessing can trade build time and memory for faster query.
Correctness invariants often determine whether complexity claims are valid.
For production engineering, always state assumptions and failure modes.

Part berikutnya akan membahas Java Memory Model for Data Structures: object layout, primitive arrays, reference chasing, cache locality, allocation pressure, dan bagaimana runtime Java memengaruhi desain struktur data.

References

Oracle Java Collections Framework Documentation.
OpenJDK Java Microbenchmark Harness, JMH.
OpenJDK JDK 25 Project and Release Notes.
Joshua Bloch, Effective Java, especially object equality, hash code, and comparator contracts.
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, Introduction to Algorithms.

Lesson Recap

You just completed lesson 02 in start here. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Back To Series Next Lesson

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.

Previous Lesson

Lesson 01

Learn Java Dsa Part 001 Kaufman Skill Map

Next Lesson

Lesson 03

Learn Java Dsa Part 003 Java Memory Model For Data Structures

Learn Java Dsa Part 002 Complexity As Engineering Contract

Part 002 — Complexity as an Engineering Contract

1. Complexity Bukan Sekadar Notasi

2. Cost Model: Apa yang Kita Hitung?

2.1 Unit-Cost RAM Model

2.2 Java Runtime Cost Model

3. Big-O: Upper Bound Pertumbuhan

4. Big-O, Big-Theta, Big-Omega

5. Worst-Case, Average-Case, Expected, Amortized

5.1 Worst-Case

5.2 Average-Case

5.3 Expected Time

5.4 Amortized Time

6. Amortized Analysis: Dynamic Array

7. Capacity Growth: Mengapa Faktor Pertumbuhan Penting?

8. Hidden Complexity in Java Collections

8.1 List.contains Bisa Menjadi Trap

8.2 remove While Iterating

8.3 Sorting with Expensive Comparator

9. Data Size Regimes

10. Common Complexity Classes

11. Logarithm Mental Model

12. Nested Loops: Jangan Hanya Menghitung Jumlah Loop

13. Recursion Complexity

13.1 Linear Recursion

13.2 Divide and Conquer

13.3 Exponential Recursion

14. Space Complexity

15. Memory Budget Estimation

16. Lower Bounds: Batas yang Tidak Bisa Ditembus

16.1 Finding Maximum

16.2 Comparison Sorting

16.3 Membership Query

17. Preprocessing Trade-Off

18. Complexity of Multiple Operations

19. Latency vs Throughput

20. Complexity and Invariants

Binary Search

Heap

HashMap

21. Constant Factor: Kapan Penting?

Constant factor biasanya penting ketika:

Constant factor biasanya kurang penting ketika:

22. Complexity Smells in Java Code Review

22.1 Nested contains

22.2 Repeated Sorting

22.3 stream().filter().findFirst() Repeatedly

22.4 Accidental String Concatenation in Loop

22.5 LinkedList as Default Queue

23. Decision Table: Choosing by Operation

24. Worked Example: From O(n²) to O(n)

25. Worked Example: Top-K

26. Worked Example: Range Sum

27. Complexity and API Design

28. Complexity and Domain Modelling

29. Complexity Contract Template

30. Anti-Patterns

30.1 “It’s Only O(n)” Without N

30.2 Confusing Amortized With Worst-Case

30.3 Assuming Library Means Optimal

30.4 Ignoring Memory

30.5 Ignoring Semantics

31. Complexity Review Examples

Example A: Duplicate Detection

Example B: First Repeated Character

32. How to Analyze a New Solution

33. Exercises

Exercise 1: Identify Hidden Complexity

Exercise 2: Preprocessing Decision

Exercise 3: Analyze Dynamic Array

Exercise 4: Complexity Contract

34. Self-Correction Checklist

35. Summary

References

8.1 `List.contains` Bisa Menjadi Trap

8.2 `remove` While Iterating

22.1 Nested `contains`

22.3 `stream().filter().findFirst()` Repeatedly

22.5 `LinkedList` as Default Queue