Series MapLesson 02 / 35
Start HereOrdered learning track

Learn Java Dsa Part 002 Complexity As Engineering Contract

18 min read3405 words
PrevNext
Lesson 0235 lesson track0106 Start Here

title: Learn Java Data Structures and Algorithms - Part 002 description: Big-O, amortized analysis, lower bounds, constant factors, and Java-specific cost model sebagai kontrak engineering untuk memilih struktur data dan algoritma. series: learn-java-dsa seriesTitle: Learn Java Data Structures and Algorithms order: 2 partTitle: Complexity as an Engineering Contract tags:

  • java
  • data-structures
  • algorithms
  • complexity
  • performance
  • series date: 2026-06-27

Part 002 — Complexity as an Engineering Contract

Complexity analysis sering diajarkan sebagai formalitas di akhir solusi:

Time: O(n log n)
Space: O(n)

Untuk engineer yang membangun sistem nyata, itu tidak cukup. Complexity adalah kontrak engineering: klaim tentang bagaimana solusi akan bertumbuh ketika input, traffic, atau state membesar.

Jika kontrak itu salah, sistem bisa gagal dalam bentuk:

  • latency spike;
  • CPU saturation;
  • memory blow-up;
  • GC pause;
  • timeout;
  • queue backlog;
  • database overload;
  • degraded user journey;
  • incident operasional.

Part ini membangun fondasi yang akan dipakai di semua part berikutnya.


1. Complexity Bukan Sekadar Notasi

Ketika kita mengatakan:

lookup in HashMap is O(1) average

itu bukan mantra. Itu klaim dengan asumsi:

  • hash function menyebarkan key cukup baik;
  • equals() dan hashCode() benar;
  • load factor dijaga;
  • tidak ada adversarial collision yang ekstrem;
  • resize cost diamortisasi;
  • memory cukup;
  • tidak ada contention concurrent di luar struktur tersebut.

Jika asumsi berubah, kontraknya berubah.

Jadi complexity harus dibaca sebagai:

Under these assumptions, this operation has this growth behavior.

2. Cost Model: Apa yang Kita Hitung?

Sebelum menganalisis algoritma, tentukan dulu model biayanya.

2.1 Unit-Cost RAM Model

Model klasik DSA biasanya mengasumsikan operasi dasar bernilai konstan:

  • assignment;
  • arithmetic;
  • array access;
  • comparison;
  • pointer/reference read;
  • method call sederhana.

Model ini berguna untuk reasoning awal, tetapi terlalu sederhana untuk Java production.

2.2 Java Runtime Cost Model

Di Java, biaya nyata dapat mencakup:

CostContoh
Allocationmembuat object baru, node linked list, wrapper Integer
Indirectionmembaca object melalui reference
Cache Misstraversal pointer-heavy structure
Branch Mispredictionconditional logic tidak predictable
Virtual Dispatchpemanggilan method polymorphic tertentu
Boxing/Unboxingint menjadi Integer dan sebaliknya
Comparator Costsorting object dengan comparator kompleks
GC Pressurebanyak object short-lived
Synchronizationlock, CAS retry, contention
Bounds Checkakses array dengan index

Artinya, dua algoritma sama-sama O(n) bisa punya performa sangat berbeda.

Contoh:

int sumArray(int[] values) {
    int sum = 0;
    for (int value : values) {
        sum += value;
    }
    return sum;
}

versus:

int sumList(List<Integer> values) {
    int sum = 0;
    for (Integer value : values) {
        sum += value;
    }
    return sum;
}

Keduanya O(n), tetapi versi List<Integer> dapat membawa biaya boxing, indirection, layout object, dan cache locality yang lebih buruk tergantung implementasi list dan asal data.


3. Big-O: Upper Bound Pertumbuhan

Big-O menyatakan batas atas pertumbuhan fungsi biaya.

Secara praktis:

O(f(n)) means the cost grows no faster than a constant multiple of f(n), after n is large enough.

Contoh:

boolean contains(int[] values, int target) {
    for (int value : values) {
        if (value == target) {
            return true;
        }
    }
    return false;
}

Worst-case: target tidak ada atau berada di akhir array.

T(n) = c1 * n + c2
T(n) = O(n)

Big-O mengabaikan constant factor dan lower-order term:

5n + 100      -> O(n)
2n log n + n -> O(n log n)
n^2 + 100n   -> O(n^2)

Namun untuk engineering, jangan buang constant factor terlalu cepat. Untuk n kecil, constant factor bisa dominan.


4. Big-O, Big-Theta, Big-Omega

Tiga notasi yang sering tertukar:

NotasiMakna Praktis
O(f(n))upper bound: tidak lebih buruk dari f(n) dalam skala besar
Ω(f(n))lower bound: setidaknya sebesar f(n) dalam skala besar
Θ(f(n))tight bound: tumbuh sekelas f(n) dari atas dan bawah

Contoh linear scan:

int indexOf(int[] values, int target) {
    for (int i = 0; i < values.length; i++) {
        if (values[i] == target) {
            return i;
        }
    }
    return -1;
}

Best-case:

target at index 0 -> Θ(1)

Worst-case:

target absent -> Θ(n)

Jika kita hanya bilang O(n), itu benar tetapi tidak lengkap. Dalam desain sistem, best-case jarang menjadi kontrak yang aman.


5. Worst-Case, Average-Case, Expected, Amortized

Empat istilah ini harus dipisahkan.

5.1 Worst-Case

Biaya maksimum untuk input berukuran n.

Contoh:

boolean hasDuplicate(int[] values) {
    for (int i = 0; i < values.length; i++) {
        for (int j = i + 1; j < values.length; j++) {
            if (values[i] == values[j]) {
                return true;
            }
        }
    }
    return false;
}

Worst-case terjadi ketika tidak ada duplicate.

comparisons = n(n - 1) / 2
complexity = Θ(n^2)

5.2 Average-Case

Rata-rata terhadap distribusi input tertentu.

Masalahnya: distribusi input harus jelas.

Average over what?
Uniform random input?
Real production traffic?
Adversarial user input?
Sorted data?
Skewed tenant data?

Tanpa distribusi, average-case sering menjadi klaim kosong.

5.3 Expected Time

Expected time biasanya muncul pada algoritma randomized.

Contoh randomized quicksort memiliki expected O(n log n) jika pivot random membuat pembagian cukup seimbang secara probabilistik. Namun worst-case tetap O(n^2).

5.4 Amortized Time

Amortized bukan average input. Amortized berarti biaya mahal sesekali disebarkan ke banyak operasi dalam sequence.

Contoh: dynamic array addLast.

  • Kebanyakan append O(1).
  • Saat capacity penuh, resize dan copy O(n).
  • Dalam sequence panjang, total biaya append tetap linear.
  • Maka append adalah amortized O(1).

6. Amortized Analysis: Dynamic Array

Misalkan dynamic array mulai dengan capacity 1 dan selalu double capacity ketika penuh.

Operasi append:

final class IntArray {
    private int[] elements = new int[1];
    private int size;

    void add(int value) {
        if (size == elements.length) {
            elements = Arrays.copyOf(elements, elements.length * 2);
        }
        elements[size++] = value;
    }
}

Untuk menambahkan 8 elemen:

Append ke-Capacity SebelumCopy?Elemen Dicopy
11no0
21yes1
32yes2
44no0
54yes4
68no0
78no0
88no0

Total copy sampai n append kurang dari:

1 + 2 + 4 + 8 + ... + n/2 < n

Total operasi insert asli n, total copy < n, sehingga total < 2n.

n append = O(n)
1 append amortized = O(1)

Kontraknya:

addLast is amortized O(1), but individual add may be O(n) during resize.

Ini penting untuk latency-sensitive systems. Amortized O(1) tidak berarti tidak ada spike.


7. Capacity Growth: Mengapa Faktor Pertumbuhan Penting?

Jika capacity bertambah satu per satu:

elements = Arrays.copyOf(elements, elements.length + 1);

Maka total copy untuk n append:

1 + 2 + 3 + ... + (n - 1) = n(n - 1)/2 = Θ(n^2)

Jika capacity dikalikan dua:

1 + 2 + 4 + ... < 2n = Θ(n)

Trade-off doubling:

Growth StrategyAppend AmortizedMemory WasteResize Frequency
+1O(n)lowvery high
x1.5O(1)moderatemoderate
x2O(1)higherlow

Engineering implication:

  • growth terlalu kecil membuat CPU/copy mahal;
  • growth terlalu besar membuang memory;
  • untuk workload besar, pre-sizing sering lebih baik.

Contoh:

List<Order> orders = new ArrayList<>(expectedOrderCount);

Pre-sizing bukan micro-optimization jika cardinality diketahui dan list besar.


8. Hidden Complexity in Java Collections

8.1 List.contains Bisa Menjadi Trap

boolean allAllowed(List<String> inputs, List<String> allowed) {
    for (String input : inputs) {
        if (!allowed.contains(input)) {
            return false;
        }
    }
    return true;
}

Jika allowed adalah ArrayList, maka contains adalah linear scan.

inputs size = n
allowed size = m
complexity = O(n * m)

Untuk membership check, gunakan HashSet bila ordering tidak diperlukan:

boolean allAllowed(List<String> inputs, Collection<String> allowedValues) {
    Set<String> allowed = new HashSet<>(allowedValues);
    for (String input : inputs) {
        if (!allowed.contains(input)) {
            return false;
        }
    }
    return true;
}

Complexity:

build set: O(m) average
checks: O(n) average
total: O(n + m) average

Tetapi ada trade-off: memory tambahan O(m).

8.2 remove While Iterating

for (String item : items) {
    if (shouldRemove(item)) {
        items.remove(item);
    }
}

Masalah:

  • bisa ConcurrentModificationException;
  • untuk ArrayList, remove menggeser elemen O(n);
  • repeated remove bisa O(n^2).

Lebih baik:

items.removeIf(this::shouldRemove);

Atau buat list baru bila ingin explicit filtering:

List<String> kept = new ArrayList<>();
for (String item : items) {
    if (!shouldRemove(item)) {
        kept.add(item);
    }
}

8.3 Sorting with Expensive Comparator

users.sort(Comparator.comparing(user -> expensiveScore(user)));

Sorting comparison-based biasanya O(n log n) comparisons. Jika comparator mahal, total cost bisa didominasi expensiveScore.

Lebih baik precompute key jika score deterministic:

record ScoredUser(User user, int score) {}

List<ScoredUser> scored = new ArrayList<>(users.size());
for (User user : users) {
    scored.add(new ScoredUser(user, expensiveScore(user)));
}

scored.sort(Comparator.comparingInt(ScoredUser::score));

Trade-off:

  • waktu comparator turun;
  • memory naik O(n);
  • kode lebih panjang;
  • harus pastikan score tidak berubah selama sort.

9. Data Size Regimes

Complexity harus dibaca bersama ukuran data.

NYang Biasanya AmanYang Mulai Berbahaya
10hampir apa punoverengineering
100O(n²) sering masih amanstruktur terlalu kompleks
10,000O(n²) mulai berbahayanested scan
1,000,000O(n log n) / O(n) lebih masuk akalboxing besar, memory overhead
100,000,000memory layout kritisobject-per-element

Ini bukan angka absolut. Machine, JVM, workload, dan SLA menentukan hasil. Namun tabel ini membantu intuition.

Contoh:

n = 1,000
n² = 1,000,000

n = 1,000,000
n² = 1,000,000,000,000

Perbedaan dari O(n) ke O(n²) bukan “sedikit lebih lambat”. Pada skala tertentu, itu mengubah sistem dari feasible menjadi mustahil.


10. Common Complexity Classes

ComplexityNama PraktisContoh
O(1)constantarray access, stack push amortized
O(log n)logarithmicbinary search, heap height, balanced tree lookup
O(n)linearscan array
O(n log n)linearithmiccomparison sorting optimal class
O(n²)quadraticpairwise comparison
O(n³)cubicnaive matrix multiplication, Floyd-Warshall
O(2ⁿ)exponentialsubset enumeration
O(n!)factorialbrute-force permutations

Growth intuition:

log n grows slowly
n grows directly
n log n is often acceptable for large n
n² collapses quickly at scale
2ⁿ is only for small n unless heavily pruned

11. Logarithm Mental Model

Dalam DSA, log n biasanya berarti “berapa kali kita bisa membagi problem sebelum habis”.

Binary search:

n -> n/2 -> n/4 -> n/8 -> ... -> 1

Jumlah step:

k steps until n / 2^k = 1
2^k = n
k = log2(n)

Untuk n = 1,000,000, log2(n) sekitar 20.

Itulah mengapa balanced tree lookup O(log n) sangat kuat: satu juta elemen bisa dicari dengan sekitar puluhan level, bukan jutaan scan.

Namun O(log n) tidak otomatis lebih cepat dari O(1) average hash lookup, dan tidak otomatis lebih lambat dari array scan untuk n kecil. Context matters.


12. Nested Loops: Jangan Hanya Menghitung Jumlah Loop

Tidak semua nested loop adalah O(n²).

Contoh O(n²):

for (int i = 0; i < n; i++) {
    for (int j = 0; j < n; j++) {
        work();
    }
}

Contoh tetap O(n):

int j = 0;
for (int i = 0; i < n; i++) {
    while (j < n && condition(i, j)) {
        j++;
    }
}

Jika j hanya bergerak maju dari 0 ke n sepanjang keseluruhan program, total iterasi inner loop adalah O(n), bukan O(n²). Ini disebut pola two pointers atau monotonic progress.

Pertanyaan penting:

Apakah variabel inner reset untuk setiap outer iteration?
Atau bergerak total sepanjang eksekusi?

13. Recursion Complexity

Untuk recursion, pikirkan:

  1. berapa banyak subproblem;
  2. ukuran setiap subproblem;
  3. biaya combine per level;
  4. kedalaman recursion.

13.1 Linear Recursion

int sum(int[] values, int index) {
    if (index == values.length) {
        return 0;
    }
    return values[index] + sum(values, index + 1);
}

Ada n call, tiap call O(1).

T(n) = T(n - 1) + O(1) = O(n)

Space call stack:

O(n)

Catatan Java: Java tidak menjamin tail-call optimization umum, jadi recursion dalam bisa menyebabkan StackOverflowError.

13.2 Divide and Conquer

Mergesort:

T(n) = 2T(n/2) + O(n)

Ada log n level, tiap level total kerja O(n).

T(n) = O(n log n)

13.3 Exponential Recursion

Naive Fibonacci:

long fib(int n) {
    if (n <= 1) {
        return n;
    }
    return fib(n - 1) + fib(n - 2);
}

Banyak subproblem dihitung ulang.

T(n) approximately O(2^n)

Memoization mengubahnya menjadi O(n) time dan O(n) space.


14. Space Complexity

Space complexity bukan hanya array tambahan.

Hitung:

  • output space jika relevan;
  • auxiliary data structure;
  • recursion stack;
  • object overhead;
  • boxing;
  • cached/precomputed values;
  • temporary arrays during sorting;
  • retained references yang mencegah GC.

Contoh:

Set<Integer> unique = new HashSet<>(values.length);
for (int value : values) {
    unique.add(value);
}

Secara algoritmik:

O(n) extra space

Secara Java runtime:

  • Integer boxing jika value berasal dari int;
  • hash table bucket/table overhead;
  • node/object overhead tergantung implementation;
  • load factor membuat capacity lebih besar dari size.

Untuk data sangat besar, “O(n) space” harus diterjemahkan menjadi estimasi memory kasar.


15. Memory Budget Estimation

Misalkan kita menyimpan 10 juta integer.

Dengan int[]:

10,000,000 * 4 bytes = about 40 MB plus array header

Dengan List<Integer>:

  • backing array menyimpan references;
  • setiap Integer adalah object;
  • ada object header dan alignment;
  • locality lebih buruk.

Estimasi kasar bisa jauh lebih besar dari 40 MB.

Pelajaran:

O(n) space is not enough information when n is large.

Gunakan primitive array atau primitive-specialized collection bila memory menjadi constraint utama.


16. Lower Bounds: Batas yang Tidak Bisa Ditembus

Lower bound membantu mencegah pencarian solusi mustahil.

16.1 Finding Maximum

Untuk menemukan maksimum dari unsorted array, setiap elemen harus dilihat setidaknya sekali.

Lower bound = Ω(n)

Tidak ada algoritma general yang bisa menjamin maximum tanpa membaca semua elemen.

16.2 Comparison Sorting

Sorting berbasis comparison memiliki lower bound:

Ω(n log n)

Artinya, jika hanya boleh membandingkan dua elemen, kita tidak bisa general-purpose sort lebih baik dari n log n dalam model comparison.

Untuk melewati batas ini, harus memakai asumsi tambahan:

  • integer range kecil -> counting sort;
  • fixed-width digits -> radix sort;
  • distribution tertentu -> bucket sort.

16.3 Membership Query

Jika data unsorted dan hanya query sekali:

scan O(n)

Jika query banyak kali:

preprocess into HashSet O(n), then query O(1) average

Lower bound memaksa kita bertanya:

Apakah biaya preprocessing layak dibanding jumlah query?

17. Preprocessing Trade-Off

Banyak algoritma production memakai pola:

pay once, query many times

Contoh:

final class UserDirectory {
    private final Map<String, User> byEmail;

    UserDirectory(List<User> users) {
        this.byEmail = new HashMap<>(users.size() * 2);
        for (User user : users) {
            byEmail.put(user.email(), user);
        }
    }

    Optional<User> findByEmail(String email) {
        return Optional.ofNullable(byEmail.get(email));
    }
}

Trade-off:

ApproachBuildQuerySpace
Scan listO(1)O(n)O(1) extra
Hash indexO(n) averageO(1) averageO(n)
Tree indexO(n log n) or O(n) if sorted buildO(log n)O(n)

Decision depends on:

  • jumlah query;
  • update frequency;
  • memory budget;
  • need for ordered/range query;
  • consistency requirement.

18. Complexity of Multiple Operations

Jangan analisis per method secara terisolasi jika workload adalah sequence operasi.

Contoh API:

interface EventStore {
    void append(Event event);
    List<Event> latest(int limit);
    void deleteOlderThan(Instant threshold);
}

Pertanyaan complexity:

  • Berapa biaya append?
  • Berapa biaya latest?
  • Berapa biaya cleanup?
  • Apakah cleanup eager atau lazy?
  • Apakah biaya cleanup muncul di request path?
  • Apakah operasi satu tenant bisa memengaruhi tenant lain?

Sering kali desain terbaik bukan operasi individual tercepat, tetapi distribusi biaya paling aman.


19. Latency vs Throughput

Complexity sering bicara total work, tetapi sistem production peduli latency dan throughput.

ConcernPertanyaan
LatencyBerapa lama satu request menunggu?
ThroughputBerapa banyak request per detik?
Tail LatencyBagaimana p95/p99/p999?
Burst BehaviorApa yang terjadi saat traffic spike?
BackpressureApakah queue tumbuh tak terkendali?

Amortized O(1) bisa bagus untuk throughput tetapi buruk untuk tail latency jika resize besar terjadi di request kritis.

Mitigasi:

  • pre-size collections;
  • move expensive maintenance off critical path;
  • chunk work;
  • cap per-request processing;
  • use bounded queues;
  • use incremental cleanup.

20. Complexity and Invariants

Complexity sering bergantung pada invariant.

O(log n) bergantung pada invariant:

array is sorted under the same ordering used by search

Jika array tidak sorted, binary search tidak hanya lambat; ia salah.

Heap

poll O(log n) bergantung pada invariant:

parent priority <= child priority

Jika elemen di dalam heap dimutasi setelah insertion, heap property bisa rusak.

HashMap

Average O(1) bergantung pada:

hashCode stable while key is in map
if a.equals(b), then a.hashCode() == b.hashCode()

Jika key mutable dan field yang dipakai hash berubah, lookup bisa gagal.

Contoh:

final class MutableKey {
    String value;

    MutableKey(String value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof MutableKey that && Objects.equals(value, that.value);
    }

    @Override
    public int hashCode() {
        return Objects.hash(value);
    }
}
Map<MutableKey, String> map = new HashMap<>();
MutableKey key = new MutableKey("A");
map.put(key, "stored");

key.value = "B";

System.out.println(map.get(key)); // likely null

Complexity contract runtuh karena correctness contract runtuh.


21. Constant Factor: Kapan Penting?

Big-O mengabaikan constant factor, tetapi engineer tidak boleh selalu mengabaikannya.

Constant factor biasanya penting ketika:

  • n kecil tetapi operasi sering sekali;
  • kode berada di hot path;
  • operasi dasar mahal, misalnya comparator melakukan parsing;
  • struktur menyebabkan banyak allocation;
  • latency target ketat;
  • memory bandwidth menjadi bottleneck.

Constant factor biasanya kurang penting ketika:

  • growth class salah besar;
  • O(n²) dipakai untuk n besar;
  • operasi tidak berada di hot path;
  • readability dan correctness lebih penting;
  • data size kecil dan stabil.

Aturan praktis:

First choose the right complexity class.
Then measure before fighting constants.

22. Complexity Smells in Java Code Review

Cari pola berikut saat review.

22.1 Nested contains

for (User user : users) {
    if (allowedUsers.contains(user)) {
        result.add(user);
    }
}

Jika allowedUsers adalah list besar, smell.

22.2 Repeated Sorting

for (Request request : requests) {
    candidates.sort(comparator);
    process(request, candidates.get(0));
}

Sorting di dalam loop sering bisa diganti priority queue, pre-sort, atau incremental update.

22.3 stream().filter().findFirst() Repeatedly

for (String id : ids) {
    User user = users.stream()
        .filter(u -> u.id().equals(id))
        .findFirst()
        .orElseThrow();
}

Ini sering O(n*m). Buat index:

Map<String, User> byId = users.stream()
    .collect(Collectors.toMap(User::id, Function.identity()));

22.4 Accidental String Concatenation in Loop

String result = "";
for (String part : parts) {
    result += part;
}

Gunakan StringBuilder untuk repeated concatenation.

22.5 LinkedList as Default Queue

Queue<Job> queue = new LinkedList<>();

Untuk kebanyakan queue single-threaded, ArrayDeque lebih baik karena lebih cache-friendly dan tidak membuat node object per element.


23. Decision Table: Choosing by Operation

NeedCandidateTypical CostNotes
random index accessArrayList, arrayO(1)good locality
frequent appendArrayList, ArrayDequeamortized O(1)watch resize spike
frequent insert middledependsO(n) oftenmaybe need tree/rope/gap buffer
membership checkHashSetO(1) averagerequires correct hash/equality
ordered lookupTreeMap/TreeSetO(log n)comparator correctness critical
min/max repeatedPriorityQueueO(log n) updateno efficient decrease-key
range sum staticprefix sumO(1) query after O(n) buildupdates expensive
range sum dynamicFenwick/segment treeO(log n)more implementation complexity
graph traversaladjacency listO(V+E)representation matters
dependency resolutiontopological sortO(V+E)cycle handling required

24. Worked Example: From O(n²) to O(n)

Problem:

Given two lists of user IDs, return IDs that appear in both.

Naive:

List<String> intersection(List<String> a, List<String> b) {
    List<String> result = new ArrayList<>();
    for (String x : a) {
        if (b.contains(x)) {
            result.add(x);
        }
    }
    return result;
}

If b is an ArrayList:

Time: O(a.size * b.size)
Space: O(k) result

Improved:

List<String> intersection(List<String> a, List<String> b) {
    Set<String> bSet = new HashSet<>(b);
    List<String> result = new ArrayList<>();

    for (String x : a) {
        if (bSet.contains(x)) {
            result.add(x);
        }
    }
    return result;
}

Complexity:

Build HashSet: O(b.size) average
Loop over a: O(a.size) average
Total: O(a.size + b.size) average
Extra space: O(b.size)

But clarify semantics:

  • Should duplicates from a be preserved?
  • Should result be unique?
  • Should order follow a?
  • What if a or b is huge?
  • Is String comparison case-sensitive?

A complexity improvement without semantic clarity can still be wrong.


25. Worked Example: Top-K

Problem:

Given 10 million scores, return top 100.

Option 1: sort all.

scores.sort(Comparator.reverseOrder());
return scores.subList(0, 100);

Complexity:

O(n log n)

Option 2: min-heap of size k.

List<Integer> topK(List<Integer> scores, int k) {
    PriorityQueue<Integer> heap = new PriorityQueue<>();

    for (int score : scores) {
        if (heap.size() < k) {
            heap.offer(score);
        } else if (score > heap.peek()) {
            heap.poll();
            heap.offer(score);
        }
    }

    ArrayList<Integer> result = new ArrayList<>(heap);
    result.sort(Comparator.reverseOrder());
    return result;
}

Complexity:

O(n log k + k log k)

For n = 10,000,000 and k = 100, log k is much smaller than log n.

Trade-off:

  • heap solution better when k is small relative to n;
  • sorting simpler and may be fine for moderate n;
  • primitive arrays may be better than List<Integer> for very large numeric data;
  • if scores stream in online, heap solution works incrementally.

26. Worked Example: Range Sum

Problem:

Given an array and many range sum queries [l, r], answer each query quickly.

Naive per query:

long rangeSum(int[] values, int left, int right) {
    long sum = 0;
    for (int i = left; i <= right; i++) {
        sum += values[i];
    }
    return sum;
}

Cost:

O(length of range) per query

If many queries, build prefix sum:

final class RangeSum {
    private final long[] prefix;

    RangeSum(int[] values) {
        this.prefix = new long[values.length + 1];
        for (int i = 0; i < values.length; i++) {
            prefix[i + 1] = prefix[i] + values[i];
        }
    }

    long queryInclusive(int left, int right) {
        if (left < 0 || right < left || right >= prefix.length - 1) {
            throw new IndexOutOfBoundsException();
        }
        return prefix[right + 1] - prefix[left];
    }
}

Cost:

Build: O(n)
Query: O(1)
Space: O(n)

Invariant:

prefix[i] = sum of values[0..i-1]

This invariant makes query derivation obvious:

sum(left..right) = prefix[right + 1] - prefix[left]

27. Complexity and API Design

API shape can force complexity.

Bad API:

interface UserStore {
    List<User> allUsers();
}

Caller then repeatedly scans:

User findByEmail(UserStore store, String email) {
    for (User user : store.allUsers()) {
        if (user.email().equals(email)) {
            return user;
        }
    }
    throw new NoSuchElementException();
}

Better API if lookup is first-class:

interface UserStore {
    Optional<User> findByEmail(String email);
}

Even better if semantics are explicit:

interface UserDirectory {
    Optional<User> findActiveUserByNormalizedEmail(String normalizedEmail);
}

DSA is not only inside method body. It affects service interfaces, indexes, caches, and data ownership.


28. Complexity and Domain Modelling

A poor domain model can make every algorithm expensive.

Example:

record Case(String id, List<Action> actions) {}
record Action(String id, String assigneeId, Instant createdAt) {}

Requirement:

Find all open cases assigned to a given officer.

If only stored as List<Case>, query may require scanning all cases and actions.

Potential index:

Map<String, Set<String>> caseIdsByAssignee;

But now update complexity matters:

  • when action assigned;
  • when action completed;
  • when case closed;
  • when assignee changes;
  • when data is rebuilt;
  • when index becomes inconsistent.

Complexity improvement introduces consistency invariant:

For every open case C and active action A assigned to officer O,
caseIdsByAssignee[O] contains C.id.

Indexes are data structures with correctness obligations.


29. Complexity Contract Template

Gunakan template ini dalam design doc atau code review.

Operation:
  find active session by token

Data structure:
  HashMap<Token, Session>

Assumptions:
  Token is immutable.
  Token.hashCode is stable and well-distributed.
  Map is not accessed concurrently without synchronization.
  Expected max sessions: 500k.

Complexity:
  get: O(1) average, O(n) worst-case under severe collision
  put: O(1) amortized average, occasional resize O(n)
  memory: O(n)

Failure modes:
  Mutable token breaks lookup.
  High cardinality increases memory and GC pressure.
  Resize can cause latency spike.
  No eviction causes unbounded growth.

Mitigations:
  Use immutable token type.
  Pre-size map if cardinality known.
  Add eviction/TTL.
  Monitor size and latency.

This is how complexity becomes engineering communication.


30. Anti-Patterns

30.1 “It’s Only O(n)” Without N

O(n) for n = 100 and n = 100,000,000 are different engineering realities.

Always ask:

What is n?
How fast can n grow?
Is n per request, per tenant, per day, or global?

30.2 Confusing Amortized With Worst-Case

ArrayList.add is amortized O(1)

does not mean every add is worst-case O(1).

30.3 Assuming Library Means Optimal

Java Collections are excellent general-purpose tools, but not magic. LinkedList, HashMap, TreeMap, PriorityQueue, and ConcurrentHashMap have very different contracts.

30.4 Ignoring Memory

A time-optimal solution can be unusable if memory blows up.

30.5 Ignoring Semantics

Replacing List with Set changes duplicate behavior and ordering. Complexity improvement that changes semantics is a bug.


31. Complexity Review Examples

Example A: Duplicate Detection

boolean hasDuplicate(int[] values) {
    Set<Integer> seen = new HashSet<>();
    for (int value : values) {
        if (!seen.add(value)) {
            return true;
        }
    }
    return false;
}

Analysis:

Time: O(n) average, O(n²) pathological worst-case if hash degenerates severely
Space: O(n)
Java cost: boxing int to Integer
Alternative: sort int[] then scan, O(n log n) time, O(1) or O(n) depending sort, avoids boxing if array mutable

Decision:

  • if input can be mutated and memory matters, sort then scan may be better;
  • if preserving order and average speed matter, hash set is better;
  • if integer range small, boolean bitmap may be best.

Example B: First Repeated Character

OptionalInt firstRepeatedAsciiChar(String s) {
    boolean[] seen = new boolean[128];
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c >= 128) {
            throw new IllegalArgumentException("non-ASCII character");
        }
        if (seen[c]) {
            return OptionalInt.of(c);
        }
        seen[c] = true;
    }
    return OptionalInt.empty();
}

Analysis:

Time: O(n)
Space: O(1), because 128 fixed-size array

But semantic constraint is explicit: ASCII only. Without that, Unicode handling changes representation and correctness.


32. How to Analyze a New Solution

Use this sequence:

  1. Define n, m, V, E, or other variables.
  2. Identify dominant operations.
  3. Count loops and monotonic movement.
  4. Analyze data structure operations used inside loops.
  5. Separate build/preprocess/query/update costs.
  6. Separate average, worst, expected, and amortized claims.
  7. Analyze extra space.
  8. Add Java-specific costs.
  9. Validate against constraints.
  10. State assumptions.

Example statement:

Let n be number of users and m be number of allowed IDs.
Building the HashSet costs O(m) average time and O(m) space.
For each user, membership check is O(1) average, so filtering costs O(n) average.
Total average time is O(n + m), with worst-case degradation if hashing behaves pathologically.
The solution preserves input user order but does not deduplicate users.

Notice how this is much stronger than:

Time O(n), space O(n)

33. Exercises

Exercise 1: Identify Hidden Complexity

Analyze this code:

List<Order> findOrders(List<Order> orders, List<String> targetIds) {
    List<Order> result = new ArrayList<>();
    for (Order order : orders) {
        if (targetIds.contains(order.id())) {
            result.add(order);
        }
    }
    return result;
}

Answer:

  • define n = orders.size();
  • define m = targetIds.size();
  • if targetIds is list, complexity O(n*m);
  • improvement: convert target IDs to HashSet, but preserve semantics around duplicates/order.

Exercise 2: Preprocessing Decision

You receive 1 million products and 5 lookup queries by SKU. Should you build HashMap<Sku, Product>?

Answer depends on:

  • are there only 5 queries once?
  • is map reused?
  • how expensive is scan?
  • memory budget?
  • latency target?

For one-time 5 queries, scan may be acceptable. For repeated service lookup, index is better.

Exercise 3: Analyze Dynamic Array

Explain why doubling capacity gives amortized O(1) append, but single-step growth gives O(n) amortized append.

Expected reasoning:

  • doubling causes geometric copy series;
  • geometric series sums to linear;
  • +1 growth causes arithmetic series;
  • arithmetic series sums to quadratic.

Exercise 4: Complexity Contract

Write a complexity contract for:

Map<CustomerId, TreeSet<Invoice>> invoicesByCustomer;

Consider:

  • key equality;
  • invoice comparator;
  • insert cost;
  • find oldest unpaid invoice;
  • range query by due date;
  • memory overhead;
  • mutation hazard if invoice fields used in comparator change.

34. Self-Correction Checklist

Before moving on, verify:

[ ] I can explain Big-O without treating it as exact runtime.
[ ] I can separate worst-case, average-case, expected, and amortized.
[ ] I can analyze hidden complexity inside Java collection calls.
[ ] I can explain why dynamic array append is amortized O(1).
[ ] I can estimate whether O(n²) is feasible for a given n.
[ ] I can identify when memory overhead matters.
[ ] I can state assumptions behind a complexity claim.
[ ] I can write a complexity contract for an API.

35. Summary

Complexity is not an academic afterthought. It is the contract between your solution and future scale.

Key takeaways:

  • Big-O describes growth, not exact runtime.
  • Complexity claims depend on assumptions.
  • Worst-case, average-case, expected, and amortized are different.
  • Java-specific costs matter: allocation, boxing, indirection, comparator cost, GC, and concurrency.
  • Hidden complexity often appears inside innocent collection calls.
  • Preprocessing can trade build time and memory for faster query.
  • Correctness invariants often determine whether complexity claims are valid.
  • For production engineering, always state assumptions and failure modes.

Part berikutnya akan membahas Java Memory Model for Data Structures: object layout, primitive arrays, reference chasing, cache locality, allocation pressure, dan bagaimana runtime Java memengaruhi desain struktur data.


References

  • Oracle Java Collections Framework Documentation.
  • OpenJDK Java Microbenchmark Harness, JMH.
  • OpenJDK JDK 25 Project and Release Notes.
  • Joshua Bloch, Effective Java, especially object equality, hash code, and comparator contracts.
  • Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, Introduction to Algorithms.
Lesson Recap

You just completed lesson 02 in start here. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.