Learn Java Dsa Part 002 Complexity As Engineering Contract
title: Learn Java Data Structures and Algorithms - Part 002 description: Big-O, amortized analysis, lower bounds, constant factors, and Java-specific cost model sebagai kontrak engineering untuk memilih struktur data dan algoritma. series: learn-java-dsa seriesTitle: Learn Java Data Structures and Algorithms order: 2 partTitle: Complexity as an Engineering Contract tags:
- java
- data-structures
- algorithms
- complexity
- performance
- series date: 2026-06-27
Part 002 — Complexity as an Engineering Contract
Complexity analysis sering diajarkan sebagai formalitas di akhir solusi:
Time: O(n log n)
Space: O(n)
Untuk engineer yang membangun sistem nyata, itu tidak cukup. Complexity adalah kontrak engineering: klaim tentang bagaimana solusi akan bertumbuh ketika input, traffic, atau state membesar.
Jika kontrak itu salah, sistem bisa gagal dalam bentuk:
- latency spike;
- CPU saturation;
- memory blow-up;
- GC pause;
- timeout;
- queue backlog;
- database overload;
- degraded user journey;
- incident operasional.
Part ini membangun fondasi yang akan dipakai di semua part berikutnya.
1. Complexity Bukan Sekadar Notasi
Ketika kita mengatakan:
lookup in HashMap is O(1) average
itu bukan mantra. Itu klaim dengan asumsi:
- hash function menyebarkan key cukup baik;
equals()danhashCode()benar;- load factor dijaga;
- tidak ada adversarial collision yang ekstrem;
- resize cost diamortisasi;
- memory cukup;
- tidak ada contention concurrent di luar struktur tersebut.
Jika asumsi berubah, kontraknya berubah.
Jadi complexity harus dibaca sebagai:
Under these assumptions, this operation has this growth behavior.
2. Cost Model: Apa yang Kita Hitung?
Sebelum menganalisis algoritma, tentukan dulu model biayanya.
2.1 Unit-Cost RAM Model
Model klasik DSA biasanya mengasumsikan operasi dasar bernilai konstan:
- assignment;
- arithmetic;
- array access;
- comparison;
- pointer/reference read;
- method call sederhana.
Model ini berguna untuk reasoning awal, tetapi terlalu sederhana untuk Java production.
2.2 Java Runtime Cost Model
Di Java, biaya nyata dapat mencakup:
| Cost | Contoh |
|---|---|
| Allocation | membuat object baru, node linked list, wrapper Integer |
| Indirection | membaca object melalui reference |
| Cache Miss | traversal pointer-heavy structure |
| Branch Misprediction | conditional logic tidak predictable |
| Virtual Dispatch | pemanggilan method polymorphic tertentu |
| Boxing/Unboxing | int menjadi Integer dan sebaliknya |
| Comparator Cost | sorting object dengan comparator kompleks |
| GC Pressure | banyak object short-lived |
| Synchronization | lock, CAS retry, contention |
| Bounds Check | akses array dengan index |
Artinya, dua algoritma sama-sama O(n) bisa punya performa sangat berbeda.
Contoh:
int sumArray(int[] values) {
int sum = 0;
for (int value : values) {
sum += value;
}
return sum;
}
versus:
int sumList(List<Integer> values) {
int sum = 0;
for (Integer value : values) {
sum += value;
}
return sum;
}
Keduanya O(n), tetapi versi List<Integer> dapat membawa biaya boxing, indirection, layout object, dan cache locality yang lebih buruk tergantung implementasi list dan asal data.
3. Big-O: Upper Bound Pertumbuhan
Big-O menyatakan batas atas pertumbuhan fungsi biaya.
Secara praktis:
O(f(n)) means the cost grows no faster than a constant multiple of f(n), after n is large enough.
Contoh:
boolean contains(int[] values, int target) {
for (int value : values) {
if (value == target) {
return true;
}
}
return false;
}
Worst-case: target tidak ada atau berada di akhir array.
T(n) = c1 * n + c2
T(n) = O(n)
Big-O mengabaikan constant factor dan lower-order term:
5n + 100 -> O(n)
2n log n + n -> O(n log n)
n^2 + 100n -> O(n^2)
Namun untuk engineering, jangan buang constant factor terlalu cepat. Untuk n kecil, constant factor bisa dominan.
4. Big-O, Big-Theta, Big-Omega
Tiga notasi yang sering tertukar:
| Notasi | Makna Praktis |
|---|---|
| O(f(n)) | upper bound: tidak lebih buruk dari f(n) dalam skala besar |
| Ω(f(n)) | lower bound: setidaknya sebesar f(n) dalam skala besar |
| Θ(f(n)) | tight bound: tumbuh sekelas f(n) dari atas dan bawah |
Contoh linear scan:
int indexOf(int[] values, int target) {
for (int i = 0; i < values.length; i++) {
if (values[i] == target) {
return i;
}
}
return -1;
}
Best-case:
target at index 0 -> Θ(1)
Worst-case:
target absent -> Θ(n)
Jika kita hanya bilang O(n), itu benar tetapi tidak lengkap. Dalam desain sistem, best-case jarang menjadi kontrak yang aman.
5. Worst-Case, Average-Case, Expected, Amortized
Empat istilah ini harus dipisahkan.
5.1 Worst-Case
Biaya maksimum untuk input berukuran n.
Contoh:
boolean hasDuplicate(int[] values) {
for (int i = 0; i < values.length; i++) {
for (int j = i + 1; j < values.length; j++) {
if (values[i] == values[j]) {
return true;
}
}
}
return false;
}
Worst-case terjadi ketika tidak ada duplicate.
comparisons = n(n - 1) / 2
complexity = Θ(n^2)
5.2 Average-Case
Rata-rata terhadap distribusi input tertentu.
Masalahnya: distribusi input harus jelas.
Average over what?
Uniform random input?
Real production traffic?
Adversarial user input?
Sorted data?
Skewed tenant data?
Tanpa distribusi, average-case sering menjadi klaim kosong.
5.3 Expected Time
Expected time biasanya muncul pada algoritma randomized.
Contoh randomized quicksort memiliki expected O(n log n) jika pivot random membuat pembagian cukup seimbang secara probabilistik. Namun worst-case tetap O(n^2).
5.4 Amortized Time
Amortized bukan average input. Amortized berarti biaya mahal sesekali disebarkan ke banyak operasi dalam sequence.
Contoh: dynamic array addLast.
- Kebanyakan append
O(1). - Saat capacity penuh, resize dan copy
O(n). - Dalam sequence panjang, total biaya append tetap linear.
- Maka append adalah amortized
O(1).
6. Amortized Analysis: Dynamic Array
Misalkan dynamic array mulai dengan capacity 1 dan selalu double capacity ketika penuh.
Operasi append:
final class IntArray {
private int[] elements = new int[1];
private int size;
void add(int value) {
if (size == elements.length) {
elements = Arrays.copyOf(elements, elements.length * 2);
}
elements[size++] = value;
}
}
Untuk menambahkan 8 elemen:
| Append ke- | Capacity Sebelum | Copy? | Elemen Dicopy |
|---|---|---|---|
| 1 | 1 | no | 0 |
| 2 | 1 | yes | 1 |
| 3 | 2 | yes | 2 |
| 4 | 4 | no | 0 |
| 5 | 4 | yes | 4 |
| 6 | 8 | no | 0 |
| 7 | 8 | no | 0 |
| 8 | 8 | no | 0 |
Total copy sampai n append kurang dari:
1 + 2 + 4 + 8 + ... + n/2 < n
Total operasi insert asli n, total copy < n, sehingga total < 2n.
n append = O(n)
1 append amortized = O(1)
Kontraknya:
addLast is amortized O(1), but individual add may be O(n) during resize.
Ini penting untuk latency-sensitive systems. Amortized O(1) tidak berarti tidak ada spike.
7. Capacity Growth: Mengapa Faktor Pertumbuhan Penting?
Jika capacity bertambah satu per satu:
elements = Arrays.copyOf(elements, elements.length + 1);
Maka total copy untuk n append:
1 + 2 + 3 + ... + (n - 1) = n(n - 1)/2 = Θ(n^2)
Jika capacity dikalikan dua:
1 + 2 + 4 + ... < 2n = Θ(n)
Trade-off doubling:
| Growth Strategy | Append Amortized | Memory Waste | Resize Frequency |
|---|---|---|---|
| +1 | O(n) | low | very high |
| x1.5 | O(1) | moderate | moderate |
| x2 | O(1) | higher | low |
Engineering implication:
- growth terlalu kecil membuat CPU/copy mahal;
- growth terlalu besar membuang memory;
- untuk workload besar, pre-sizing sering lebih baik.
Contoh:
List<Order> orders = new ArrayList<>(expectedOrderCount);
Pre-sizing bukan micro-optimization jika cardinality diketahui dan list besar.
8. Hidden Complexity in Java Collections
8.1 List.contains Bisa Menjadi Trap
boolean allAllowed(List<String> inputs, List<String> allowed) {
for (String input : inputs) {
if (!allowed.contains(input)) {
return false;
}
}
return true;
}
Jika allowed adalah ArrayList, maka contains adalah linear scan.
inputs size = n
allowed size = m
complexity = O(n * m)
Untuk membership check, gunakan HashSet bila ordering tidak diperlukan:
boolean allAllowed(List<String> inputs, Collection<String> allowedValues) {
Set<String> allowed = new HashSet<>(allowedValues);
for (String input : inputs) {
if (!allowed.contains(input)) {
return false;
}
}
return true;
}
Complexity:
build set: O(m) average
checks: O(n) average
total: O(n + m) average
Tetapi ada trade-off: memory tambahan O(m).
8.2 remove While Iterating
for (String item : items) {
if (shouldRemove(item)) {
items.remove(item);
}
}
Masalah:
- bisa
ConcurrentModificationException; - untuk
ArrayList, remove menggeser elemenO(n); - repeated remove bisa
O(n^2).
Lebih baik:
items.removeIf(this::shouldRemove);
Atau buat list baru bila ingin explicit filtering:
List<String> kept = new ArrayList<>();
for (String item : items) {
if (!shouldRemove(item)) {
kept.add(item);
}
}
8.3 Sorting with Expensive Comparator
users.sort(Comparator.comparing(user -> expensiveScore(user)));
Sorting comparison-based biasanya O(n log n) comparisons. Jika comparator mahal, total cost bisa didominasi expensiveScore.
Lebih baik precompute key jika score deterministic:
record ScoredUser(User user, int score) {}
List<ScoredUser> scored = new ArrayList<>(users.size());
for (User user : users) {
scored.add(new ScoredUser(user, expensiveScore(user)));
}
scored.sort(Comparator.comparingInt(ScoredUser::score));
Trade-off:
- waktu comparator turun;
- memory naik
O(n); - kode lebih panjang;
- harus pastikan score tidak berubah selama sort.
9. Data Size Regimes
Complexity harus dibaca bersama ukuran data.
| N | Yang Biasanya Aman | Yang Mulai Berbahaya |
|---|---|---|
| 10 | hampir apa pun | overengineering |
| 100 | O(n²) sering masih aman | struktur terlalu kompleks |
| 10,000 | O(n²) mulai berbahaya | nested scan |
| 1,000,000 | O(n log n) / O(n) lebih masuk akal | boxing besar, memory overhead |
| 100,000,000 | memory layout kritis | object-per-element |
Ini bukan angka absolut. Machine, JVM, workload, dan SLA menentukan hasil. Namun tabel ini membantu intuition.
Contoh:
n = 1,000
n² = 1,000,000
n = 1,000,000
n² = 1,000,000,000,000
Perbedaan dari O(n) ke O(n²) bukan “sedikit lebih lambat”. Pada skala tertentu, itu mengubah sistem dari feasible menjadi mustahil.
10. Common Complexity Classes
| Complexity | Nama Praktis | Contoh |
|---|---|---|
| O(1) | constant | array access, stack push amortized |
| O(log n) | logarithmic | binary search, heap height, balanced tree lookup |
| O(n) | linear | scan array |
| O(n log n) | linearithmic | comparison sorting optimal class |
| O(n²) | quadratic | pairwise comparison |
| O(n³) | cubic | naive matrix multiplication, Floyd-Warshall |
| O(2ⁿ) | exponential | subset enumeration |
| O(n!) | factorial | brute-force permutations |
Growth intuition:
log n grows slowly
n grows directly
n log n is often acceptable for large n
n² collapses quickly at scale
2ⁿ is only for small n unless heavily pruned
11. Logarithm Mental Model
Dalam DSA, log n biasanya berarti “berapa kali kita bisa membagi problem sebelum habis”.
Binary search:
n -> n/2 -> n/4 -> n/8 -> ... -> 1
Jumlah step:
k steps until n / 2^k = 1
2^k = n
k = log2(n)
Untuk n = 1,000,000, log2(n) sekitar 20.
Itulah mengapa balanced tree lookup O(log n) sangat kuat: satu juta elemen bisa dicari dengan sekitar puluhan level, bukan jutaan scan.
Namun O(log n) tidak otomatis lebih cepat dari O(1) average hash lookup, dan tidak otomatis lebih lambat dari array scan untuk n kecil. Context matters.
12. Nested Loops: Jangan Hanya Menghitung Jumlah Loop
Tidak semua nested loop adalah O(n²).
Contoh O(n²):
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
work();
}
}
Contoh tetap O(n):
int j = 0;
for (int i = 0; i < n; i++) {
while (j < n && condition(i, j)) {
j++;
}
}
Jika j hanya bergerak maju dari 0 ke n sepanjang keseluruhan program, total iterasi inner loop adalah O(n), bukan O(n²). Ini disebut pola two pointers atau monotonic progress.
Pertanyaan penting:
Apakah variabel inner reset untuk setiap outer iteration?
Atau bergerak total sepanjang eksekusi?
13. Recursion Complexity
Untuk recursion, pikirkan:
- berapa banyak subproblem;
- ukuran setiap subproblem;
- biaya combine per level;
- kedalaman recursion.
13.1 Linear Recursion
int sum(int[] values, int index) {
if (index == values.length) {
return 0;
}
return values[index] + sum(values, index + 1);
}
Ada n call, tiap call O(1).
T(n) = T(n - 1) + O(1) = O(n)
Space call stack:
O(n)
Catatan Java: Java tidak menjamin tail-call optimization umum, jadi recursion dalam bisa menyebabkan StackOverflowError.
13.2 Divide and Conquer
Mergesort:
T(n) = 2T(n/2) + O(n)
Ada log n level, tiap level total kerja O(n).
T(n) = O(n log n)
13.3 Exponential Recursion
Naive Fibonacci:
long fib(int n) {
if (n <= 1) {
return n;
}
return fib(n - 1) + fib(n - 2);
}
Banyak subproblem dihitung ulang.
T(n) approximately O(2^n)
Memoization mengubahnya menjadi O(n) time dan O(n) space.
14. Space Complexity
Space complexity bukan hanya array tambahan.
Hitung:
- output space jika relevan;
- auxiliary data structure;
- recursion stack;
- object overhead;
- boxing;
- cached/precomputed values;
- temporary arrays during sorting;
- retained references yang mencegah GC.
Contoh:
Set<Integer> unique = new HashSet<>(values.length);
for (int value : values) {
unique.add(value);
}
Secara algoritmik:
O(n) extra space
Secara Java runtime:
Integerboxing jika value berasal dariint;- hash table bucket/table overhead;
- node/object overhead tergantung implementation;
- load factor membuat capacity lebih besar dari size.
Untuk data sangat besar, “O(n) space” harus diterjemahkan menjadi estimasi memory kasar.
15. Memory Budget Estimation
Misalkan kita menyimpan 10 juta integer.
Dengan int[]:
10,000,000 * 4 bytes = about 40 MB plus array header
Dengan List<Integer>:
- backing array menyimpan references;
- setiap
Integeradalah object; - ada object header dan alignment;
- locality lebih buruk.
Estimasi kasar bisa jauh lebih besar dari 40 MB.
Pelajaran:
O(n) space is not enough information when n is large.
Gunakan primitive array atau primitive-specialized collection bila memory menjadi constraint utama.
16. Lower Bounds: Batas yang Tidak Bisa Ditembus
Lower bound membantu mencegah pencarian solusi mustahil.
16.1 Finding Maximum
Untuk menemukan maksimum dari unsorted array, setiap elemen harus dilihat setidaknya sekali.
Lower bound = Ω(n)
Tidak ada algoritma general yang bisa menjamin maximum tanpa membaca semua elemen.
16.2 Comparison Sorting
Sorting berbasis comparison memiliki lower bound:
Ω(n log n)
Artinya, jika hanya boleh membandingkan dua elemen, kita tidak bisa general-purpose sort lebih baik dari n log n dalam model comparison.
Untuk melewati batas ini, harus memakai asumsi tambahan:
- integer range kecil -> counting sort;
- fixed-width digits -> radix sort;
- distribution tertentu -> bucket sort.
16.3 Membership Query
Jika data unsorted dan hanya query sekali:
scan O(n)
Jika query banyak kali:
preprocess into HashSet O(n), then query O(1) average
Lower bound memaksa kita bertanya:
Apakah biaya preprocessing layak dibanding jumlah query?
17. Preprocessing Trade-Off
Banyak algoritma production memakai pola:
pay once, query many times
Contoh:
final class UserDirectory {
private final Map<String, User> byEmail;
UserDirectory(List<User> users) {
this.byEmail = new HashMap<>(users.size() * 2);
for (User user : users) {
byEmail.put(user.email(), user);
}
}
Optional<User> findByEmail(String email) {
return Optional.ofNullable(byEmail.get(email));
}
}
Trade-off:
| Approach | Build | Query | Space |
|---|---|---|---|
| Scan list | O(1) | O(n) | O(1) extra |
| Hash index | O(n) average | O(1) average | O(n) |
| Tree index | O(n log n) or O(n) if sorted build | O(log n) | O(n) |
Decision depends on:
- jumlah query;
- update frequency;
- memory budget;
- need for ordered/range query;
- consistency requirement.
18. Complexity of Multiple Operations
Jangan analisis per method secara terisolasi jika workload adalah sequence operasi.
Contoh API:
interface EventStore {
void append(Event event);
List<Event> latest(int limit);
void deleteOlderThan(Instant threshold);
}
Pertanyaan complexity:
- Berapa biaya
append? - Berapa biaya
latest? - Berapa biaya cleanup?
- Apakah cleanup eager atau lazy?
- Apakah biaya cleanup muncul di request path?
- Apakah operasi satu tenant bisa memengaruhi tenant lain?
Sering kali desain terbaik bukan operasi individual tercepat, tetapi distribusi biaya paling aman.
19. Latency vs Throughput
Complexity sering bicara total work, tetapi sistem production peduli latency dan throughput.
| Concern | Pertanyaan |
|---|---|
| Latency | Berapa lama satu request menunggu? |
| Throughput | Berapa banyak request per detik? |
| Tail Latency | Bagaimana p95/p99/p999? |
| Burst Behavior | Apa yang terjadi saat traffic spike? |
| Backpressure | Apakah queue tumbuh tak terkendali? |
Amortized O(1) bisa bagus untuk throughput tetapi buruk untuk tail latency jika resize besar terjadi di request kritis.
Mitigasi:
- pre-size collections;
- move expensive maintenance off critical path;
- chunk work;
- cap per-request processing;
- use bounded queues;
- use incremental cleanup.
20. Complexity and Invariants
Complexity sering bergantung pada invariant.
Binary Search
O(log n) bergantung pada invariant:
array is sorted under the same ordering used by search
Jika array tidak sorted, binary search tidak hanya lambat; ia salah.
Heap
poll O(log n) bergantung pada invariant:
parent priority <= child priority
Jika elemen di dalam heap dimutasi setelah insertion, heap property bisa rusak.
HashMap
Average O(1) bergantung pada:
hashCode stable while key is in map
if a.equals(b), then a.hashCode() == b.hashCode()
Jika key mutable dan field yang dipakai hash berubah, lookup bisa gagal.
Contoh:
final class MutableKey {
String value;
MutableKey(String value) {
this.value = value;
}
@Override
public boolean equals(Object other) {
return other instanceof MutableKey that && Objects.equals(value, that.value);
}
@Override
public int hashCode() {
return Objects.hash(value);
}
}
Map<MutableKey, String> map = new HashMap<>();
MutableKey key = new MutableKey("A");
map.put(key, "stored");
key.value = "B";
System.out.println(map.get(key)); // likely null
Complexity contract runtuh karena correctness contract runtuh.
21. Constant Factor: Kapan Penting?
Big-O mengabaikan constant factor, tetapi engineer tidak boleh selalu mengabaikannya.
Constant factor biasanya penting ketika:
nkecil tetapi operasi sering sekali;- kode berada di hot path;
- operasi dasar mahal, misalnya comparator melakukan parsing;
- struktur menyebabkan banyak allocation;
- latency target ketat;
- memory bandwidth menjadi bottleneck.
Constant factor biasanya kurang penting ketika:
- growth class salah besar;
O(n²)dipakai untuknbesar;- operasi tidak berada di hot path;
- readability dan correctness lebih penting;
- data size kecil dan stabil.
Aturan praktis:
First choose the right complexity class.
Then measure before fighting constants.
22. Complexity Smells in Java Code Review
Cari pola berikut saat review.
22.1 Nested contains
for (User user : users) {
if (allowedUsers.contains(user)) {
result.add(user);
}
}
Jika allowedUsers adalah list besar, smell.
22.2 Repeated Sorting
for (Request request : requests) {
candidates.sort(comparator);
process(request, candidates.get(0));
}
Sorting di dalam loop sering bisa diganti priority queue, pre-sort, atau incremental update.
22.3 stream().filter().findFirst() Repeatedly
for (String id : ids) {
User user = users.stream()
.filter(u -> u.id().equals(id))
.findFirst()
.orElseThrow();
}
Ini sering O(n*m). Buat index:
Map<String, User> byId = users.stream()
.collect(Collectors.toMap(User::id, Function.identity()));
22.4 Accidental String Concatenation in Loop
String result = "";
for (String part : parts) {
result += part;
}
Gunakan StringBuilder untuk repeated concatenation.
22.5 LinkedList as Default Queue
Queue<Job> queue = new LinkedList<>();
Untuk kebanyakan queue single-threaded, ArrayDeque lebih baik karena lebih cache-friendly dan tidak membuat node object per element.
23. Decision Table: Choosing by Operation
| Need | Candidate | Typical Cost | Notes |
|---|---|---|---|
| random index access | ArrayList, array | O(1) | good locality |
| frequent append | ArrayList, ArrayDeque | amortized O(1) | watch resize spike |
| frequent insert middle | depends | O(n) often | maybe need tree/rope/gap buffer |
| membership check | HashSet | O(1) average | requires correct hash/equality |
| ordered lookup | TreeMap/TreeSet | O(log n) | comparator correctness critical |
| min/max repeated | PriorityQueue | O(log n) update | no efficient decrease-key |
| range sum static | prefix sum | O(1) query after O(n) build | updates expensive |
| range sum dynamic | Fenwick/segment tree | O(log n) | more implementation complexity |
| graph traversal | adjacency list | O(V+E) | representation matters |
| dependency resolution | topological sort | O(V+E) | cycle handling required |
24. Worked Example: From O(n²) to O(n)
Problem:
Given two lists of user IDs, return IDs that appear in both.
Naive:
List<String> intersection(List<String> a, List<String> b) {
List<String> result = new ArrayList<>();
for (String x : a) {
if (b.contains(x)) {
result.add(x);
}
}
return result;
}
If b is an ArrayList:
Time: O(a.size * b.size)
Space: O(k) result
Improved:
List<String> intersection(List<String> a, List<String> b) {
Set<String> bSet = new HashSet<>(b);
List<String> result = new ArrayList<>();
for (String x : a) {
if (bSet.contains(x)) {
result.add(x);
}
}
return result;
}
Complexity:
Build HashSet: O(b.size) average
Loop over a: O(a.size) average
Total: O(a.size + b.size) average
Extra space: O(b.size)
But clarify semantics:
- Should duplicates from
abe preserved? - Should result be unique?
- Should order follow
a? - What if
aorbis huge? - Is
Stringcomparison case-sensitive?
A complexity improvement without semantic clarity can still be wrong.
25. Worked Example: Top-K
Problem:
Given 10 million scores, return top 100.
Option 1: sort all.
scores.sort(Comparator.reverseOrder());
return scores.subList(0, 100);
Complexity:
O(n log n)
Option 2: min-heap of size k.
List<Integer> topK(List<Integer> scores, int k) {
PriorityQueue<Integer> heap = new PriorityQueue<>();
for (int score : scores) {
if (heap.size() < k) {
heap.offer(score);
} else if (score > heap.peek()) {
heap.poll();
heap.offer(score);
}
}
ArrayList<Integer> result = new ArrayList<>(heap);
result.sort(Comparator.reverseOrder());
return result;
}
Complexity:
O(n log k + k log k)
For n = 10,000,000 and k = 100, log k is much smaller than log n.
Trade-off:
- heap solution better when k is small relative to n;
- sorting simpler and may be fine for moderate n;
- primitive arrays may be better than
List<Integer>for very large numeric data; - if scores stream in online, heap solution works incrementally.
26. Worked Example: Range Sum
Problem:
Given an array and many range sum queries
[l, r], answer each query quickly.
Naive per query:
long rangeSum(int[] values, int left, int right) {
long sum = 0;
for (int i = left; i <= right; i++) {
sum += values[i];
}
return sum;
}
Cost:
O(length of range) per query
If many queries, build prefix sum:
final class RangeSum {
private final long[] prefix;
RangeSum(int[] values) {
this.prefix = new long[values.length + 1];
for (int i = 0; i < values.length; i++) {
prefix[i + 1] = prefix[i] + values[i];
}
}
long queryInclusive(int left, int right) {
if (left < 0 || right < left || right >= prefix.length - 1) {
throw new IndexOutOfBoundsException();
}
return prefix[right + 1] - prefix[left];
}
}
Cost:
Build: O(n)
Query: O(1)
Space: O(n)
Invariant:
prefix[i] = sum of values[0..i-1]
This invariant makes query derivation obvious:
sum(left..right) = prefix[right + 1] - prefix[left]
27. Complexity and API Design
API shape can force complexity.
Bad API:
interface UserStore {
List<User> allUsers();
}
Caller then repeatedly scans:
User findByEmail(UserStore store, String email) {
for (User user : store.allUsers()) {
if (user.email().equals(email)) {
return user;
}
}
throw new NoSuchElementException();
}
Better API if lookup is first-class:
interface UserStore {
Optional<User> findByEmail(String email);
}
Even better if semantics are explicit:
interface UserDirectory {
Optional<User> findActiveUserByNormalizedEmail(String normalizedEmail);
}
DSA is not only inside method body. It affects service interfaces, indexes, caches, and data ownership.
28. Complexity and Domain Modelling
A poor domain model can make every algorithm expensive.
Example:
record Case(String id, List<Action> actions) {}
record Action(String id, String assigneeId, Instant createdAt) {}
Requirement:
Find all open cases assigned to a given officer.
If only stored as List<Case>, query may require scanning all cases and actions.
Potential index:
Map<String, Set<String>> caseIdsByAssignee;
But now update complexity matters:
- when action assigned;
- when action completed;
- when case closed;
- when assignee changes;
- when data is rebuilt;
- when index becomes inconsistent.
Complexity improvement introduces consistency invariant:
For every open case C and active action A assigned to officer O,
caseIdsByAssignee[O] contains C.id.
Indexes are data structures with correctness obligations.
29. Complexity Contract Template
Gunakan template ini dalam design doc atau code review.
Operation:
find active session by token
Data structure:
HashMap<Token, Session>
Assumptions:
Token is immutable.
Token.hashCode is stable and well-distributed.
Map is not accessed concurrently without synchronization.
Expected max sessions: 500k.
Complexity:
get: O(1) average, O(n) worst-case under severe collision
put: O(1) amortized average, occasional resize O(n)
memory: O(n)
Failure modes:
Mutable token breaks lookup.
High cardinality increases memory and GC pressure.
Resize can cause latency spike.
No eviction causes unbounded growth.
Mitigations:
Use immutable token type.
Pre-size map if cardinality known.
Add eviction/TTL.
Monitor size and latency.
This is how complexity becomes engineering communication.
30. Anti-Patterns
30.1 “It’s Only O(n)” Without N
O(n) for n = 100 and n = 100,000,000 are different engineering realities.
Always ask:
What is n?
How fast can n grow?
Is n per request, per tenant, per day, or global?
30.2 Confusing Amortized With Worst-Case
ArrayList.add is amortized O(1)
does not mean every add is worst-case O(1).
30.3 Assuming Library Means Optimal
Java Collections are excellent general-purpose tools, but not magic. LinkedList, HashMap, TreeMap, PriorityQueue, and ConcurrentHashMap have very different contracts.
30.4 Ignoring Memory
A time-optimal solution can be unusable if memory blows up.
30.5 Ignoring Semantics
Replacing List with Set changes duplicate behavior and ordering. Complexity improvement that changes semantics is a bug.
31. Complexity Review Examples
Example A: Duplicate Detection
boolean hasDuplicate(int[] values) {
Set<Integer> seen = new HashSet<>();
for (int value : values) {
if (!seen.add(value)) {
return true;
}
}
return false;
}
Analysis:
Time: O(n) average, O(n²) pathological worst-case if hash degenerates severely
Space: O(n)
Java cost: boxing int to Integer
Alternative: sort int[] then scan, O(n log n) time, O(1) or O(n) depending sort, avoids boxing if array mutable
Decision:
- if input can be mutated and memory matters, sort then scan may be better;
- if preserving order and average speed matter, hash set is better;
- if integer range small, boolean bitmap may be best.
Example B: First Repeated Character
OptionalInt firstRepeatedAsciiChar(String s) {
boolean[] seen = new boolean[128];
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (c >= 128) {
throw new IllegalArgumentException("non-ASCII character");
}
if (seen[c]) {
return OptionalInt.of(c);
}
seen[c] = true;
}
return OptionalInt.empty();
}
Analysis:
Time: O(n)
Space: O(1), because 128 fixed-size array
But semantic constraint is explicit: ASCII only. Without that, Unicode handling changes representation and correctness.
32. How to Analyze a New Solution
Use this sequence:
- Define
n,m,V,E, or other variables. - Identify dominant operations.
- Count loops and monotonic movement.
- Analyze data structure operations used inside loops.
- Separate build/preprocess/query/update costs.
- Separate average, worst, expected, and amortized claims.
- Analyze extra space.
- Add Java-specific costs.
- Validate against constraints.
- State assumptions.
Example statement:
Let n be number of users and m be number of allowed IDs.
Building the HashSet costs O(m) average time and O(m) space.
For each user, membership check is O(1) average, so filtering costs O(n) average.
Total average time is O(n + m), with worst-case degradation if hashing behaves pathologically.
The solution preserves input user order but does not deduplicate users.
Notice how this is much stronger than:
Time O(n), space O(n)
33. Exercises
Exercise 1: Identify Hidden Complexity
Analyze this code:
List<Order> findOrders(List<Order> orders, List<String> targetIds) {
List<Order> result = new ArrayList<>();
for (Order order : orders) {
if (targetIds.contains(order.id())) {
result.add(order);
}
}
return result;
}
Answer:
- define
n = orders.size(); - define
m = targetIds.size(); - if
targetIdsis list, complexityO(n*m); - improvement: convert target IDs to
HashSet, but preserve semantics around duplicates/order.
Exercise 2: Preprocessing Decision
You receive 1 million products and 5 lookup queries by SKU. Should you build HashMap<Sku, Product>?
Answer depends on:
- are there only 5 queries once?
- is map reused?
- how expensive is scan?
- memory budget?
- latency target?
For one-time 5 queries, scan may be acceptable. For repeated service lookup, index is better.
Exercise 3: Analyze Dynamic Array
Explain why doubling capacity gives amortized O(1) append, but single-step growth gives O(n) amortized append.
Expected reasoning:
- doubling causes geometric copy series;
- geometric series sums to linear;
- +1 growth causes arithmetic series;
- arithmetic series sums to quadratic.
Exercise 4: Complexity Contract
Write a complexity contract for:
Map<CustomerId, TreeSet<Invoice>> invoicesByCustomer;
Consider:
- key equality;
- invoice comparator;
- insert cost;
- find oldest unpaid invoice;
- range query by due date;
- memory overhead;
- mutation hazard if invoice fields used in comparator change.
34. Self-Correction Checklist
Before moving on, verify:
[ ] I can explain Big-O without treating it as exact runtime.
[ ] I can separate worst-case, average-case, expected, and amortized.
[ ] I can analyze hidden complexity inside Java collection calls.
[ ] I can explain why dynamic array append is amortized O(1).
[ ] I can estimate whether O(n²) is feasible for a given n.
[ ] I can identify when memory overhead matters.
[ ] I can state assumptions behind a complexity claim.
[ ] I can write a complexity contract for an API.
35. Summary
Complexity is not an academic afterthought. It is the contract between your solution and future scale.
Key takeaways:
- Big-O describes growth, not exact runtime.
- Complexity claims depend on assumptions.
- Worst-case, average-case, expected, and amortized are different.
- Java-specific costs matter: allocation, boxing, indirection, comparator cost, GC, and concurrency.
- Hidden complexity often appears inside innocent collection calls.
- Preprocessing can trade build time and memory for faster query.
- Correctness invariants often determine whether complexity claims are valid.
- For production engineering, always state assumptions and failure modes.
Part berikutnya akan membahas Java Memory Model for Data Structures: object layout, primitive arrays, reference chasing, cache locality, allocation pressure, dan bagaimana runtime Java memengaruhi desain struktur data.
References
- Oracle Java Collections Framework Documentation.
- OpenJDK Java Microbenchmark Harness, JMH.
- OpenJDK JDK 25 Project and Release Notes.
- Joshua Bloch, Effective Java, especially object equality, hash code, and comparator contracts.
- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, Introduction to Algorithms.
You just completed lesson 02 in start here. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.
Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.