
Collectors Deep Dive: Grouping, Partitioning, Mapping, Reducing

Learn Java Array, Collections, Iterator/Iterable, Stream - Part 024

Deep dive into Java Collectors: mutable reduction, collector anatomy, toList, toSet, toMap, groupingBy, partitioningBy, mapping, filtering, flatMapping, reducing, summarizing, teeing, ordering, map suppliers, merge policies, and production failure modes.

2026-06-30 · 11 min read · 2012 words


Part 024 — Collectors Deep Dive: Grouping, Partitioning, Mapping, Reducing

Target: after this part, you should be able to choose and compose Collectors to build aggregation results that are correct, deterministic, explicit about duplicate policy, clear about ordering, and safe at the mutability boundary. You will also be able to recognize common bugs such as duplicate keys in toMap, grouping that loses order, leaking mutable results, the wrong downstream collector, and collectors that look elegant but are hard to maintain.

collect is a terminal operation that turns a stream into a result structure.

Map<String, List<Order>> ordersByCustomer = orders.stream()
    .collect(Collectors.groupingBy(Order::customerId));

But Collectors are not just helper methods. They are a model of mutable reduction.

The most important mental model:

Collector = strategy for accumulating stream elements into a result container

A collector answers these questions:

what container is created?
how are elements added to it?
how are partial results combined?
does the final result need a transformation?
is the result mutable or unmodifiable?
is order preserved?
how are duplicate keys resolved?
how are nested aggregations built?

1. Where This Part Fits in the Kaufman Framework

Kaufman-style deconstruction for Collectors:

Do not memorize every collector.
Learn the small set of collector shapes and compose them correctly.

Shapes:

materialize into collection
index by key
group by classifier
partition by boolean predicate
transform before collecting downstream
reduce/summarize downstream
finish/lock result

2. `collect` vs `reduce`

reduce fits immutable value reduction.

BigDecimal total = invoices.stream()
    .map(Invoice::amount)
    .reduce(BigDecimal.ZERO, BigDecimal::add);

collect fits mutable accumulation.

List<OrderDto> result = orders.stream()
    .map(OrderDto::from)
    .collect(Collectors.toCollection(ArrayList::new));

Do not use reduce to mutate a container.

Anti-pattern:

List<OrderDto> result = orders.stream()
    .reduce(
        new ArrayList<>(),
        (list, order) -> {
            list.add(OrderDto.from(order));
            return list;
        },
        (left, right) -> {
            left.addAll(right);
            return left;
        }
    );

Problems:

the mutable identity is reused dangerously: in a parallel stream, several partitions add to the same ArrayList
parallel semantics are easy to get wrong
the intent is unclear
collect is already designed for this

Correct:

List<OrderDto> result = orders.stream()
    .map(OrderDto::from)
    .collect(Collectors.toCollection(ArrayList::new));

Rule:

Use reduce for immutable scalar/value result.
Use collect for mutable accumulation container.
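A minimal runnable sketch of the same supplier/accumulator/combiner idea expressed with the three-argument Stream.collect instead of reduce. It uses plain Integer values rather than the hypothetical Order/OrderDto types, and the class name is illustrative. Because collect creates a fresh container for each partition, the parallel result should equal the sequential one, in the same order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class CollectVsReduceDemo {

    // Mutable reduction via collect(supplier, accumulator, combiner).
    // Every partition of a parallel stream gets its own fresh ArrayList,
    // so no container is shared between threads.
    static List<Integer> collectInOrder(boolean parallel) {
        Stream<Integer> numbers = IntStream.range(0, 10_000).boxed();
        if (parallel) {
            numbers = numbers.parallel();
        }
        return numbers.collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    }

    public static void main(String[] args) {
        List<Integer> sequential = collectInOrder(false);
        List<Integer> parallel = collectInOrder(true);
        // For an ordered source, collect keeps encounter order even in parallel.
        System.out.println(sequential.equals(parallel)); // true
        System.out.println(parallel.size());             // 10000
    }
}
```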

3. Anatomy Collector

The Collector documentation specifies a collector through four functions:

supplier    -> create result container
accumulator -> incorporate element into container
combiner    -> combine two containers
finisher    -> final transform

Concept:

Collector<T, A, R>

T: input element type
A: mutable accumulation type
R: final result type

Conceptual example for toList:

T = Order
A = ArrayList<Order>
R = List<Order>

supplier:    () -> new ArrayList<Order>()
accumulator: (list, order) -> list.add(order)
combiner:    (left, right) -> { left.addAll(right); return left; }
finisher:    maybe identity
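The four functions can be written out explicitly with Collector.of. This is a sketch, not how the JDK implements toList: the finisher here is List::copyOf instead of identity, purely to show where a final transform plugs in. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

class CollectorAnatomyDemo {

    // T = String, A = ArrayList<String>, R = List<String>.
    static Collector<String, ArrayList<String>, List<String>> toListExplicit() {
        return Collector.of(
            ArrayList::new,                                          // supplier: new container
            ArrayList::add,                                          // accumulator: add one element
            (left, right) -> { left.addAll(right); return left; },  // combiner: merge partitions
            List::copyOf                                             // finisher: lock the result
        );
    }

    public static void main(String[] args) {
        List<String> result = Stream.of("a", "b", "c").collect(toListExplicit());
        System.out.println(result); // [a, b, c]
    }
}
```

Because the combiner appends the right container to the left one, the collector stays order-correct when the stream runs in parallel.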

You don't always need to write a custom collector. But understanding this anatomy lets you judge:

is the collector safe on a parallel stream?
does the combiner make sense?
is the result mutable?
is the downstream collector appropriate?

Custom collectors are covered in Part 025. This part focuses on the predefined collectors.

4. Materialization Collectors

4.1 `Stream.toList()` vs `Collectors.toList()`

Since Java 16, Stream has a toList() terminal operation.

List<OrderDto> dtos = orders.stream()
    .map(OrderDto::from)
    .toList();

Stream.toList() returns an unmodifiable list; that is part of its API contract.

Collectors.toList():

List<OrderDto> dtos = orders.stream()
    .map(OrderDto::from)
    .collect(Collectors.toList());

The Collectors.toList() contract gives no guarantees on the type, mutability, serializability, or thread-safety of the returned List.

Practical rule:

Need unmodifiable materialized result? Prefer stream.toList().
Need specific mutable collection? Use toCollection(ArrayList::new).
Need API boundary immutability? Prefer toList() or collectingAndThen(..., List::copyOf).
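A small sketch that checks the two contracts side by side: mutating a Stream.toList() result should throw UnsupportedOperationException, while toCollection(ArrayList::new) guarantees a mutable ArrayList. The class and helper names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class MaterializationDemo {

    // Stream.toList() (Java 16+): unmodifiable by contract.
    static List<String> unmodifiable() {
        return Stream.of("a", "b").toList();
    }

    // toCollection(ArrayList::new): the concrete, mutable type is guaranteed.
    static ArrayList<String> mutable() {
        return Stream.of("a", "b").collect(Collectors.toCollection(ArrayList::new));
    }

    // Probes mutability by attempting an add.
    static boolean isModifiable(List<String> list) {
        try {
            list.add("c");
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isModifiable(unmodifiable())); // false
        System.out.println(isModifiable(mutable()));      // true
    }
}
```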

4.2 `toSet`

Set<String> customerIds = orders.stream()
    .map(Order::customerId)
    .collect(Collectors.toSet());

Do not rely on the iteration order of toSet().

If first-seen order matters:

Set<String> customerIds = orders.stream()
    .map(Order::customerId)
    .collect(Collectors.toCollection(LinkedHashSet::new));

If you need sorted order:

Set<String> customerIds = orders.stream()
    .map(Order::customerId)
    .collect(Collectors.toCollection(TreeSet::new));
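A runnable sketch comparing the two explicit choices on the same input, assuming customer IDs are plain strings (the sample values are made up). LinkedHashSet keeps first-seen order; TreeSet sorts; both drop the duplicate.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class SetOrderDemo {

    static final List<String> CUSTOMER_IDS = List.of("c-3", "c-1", "c-3", "c-2");

    // First-seen order, duplicates removed.
    static Set<String> firstSeen() {
        return CUSTOMER_IDS.stream()
            .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // Natural (sorted) order, duplicates removed.
    static Set<String> sorted() {
        return CUSTOMER_IDS.stream()
            .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        System.out.println(firstSeen()); // [c-3, c-1, c-2]
        System.out.println(sorted());    // [c-1, c-2, c-3]
    }
}
```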

4.3 `toCollection`

Use it when the concrete result type is part of an internal contract.

ArrayDeque<Task> queue = tasks.stream()
    .filter(Task::ready)
    .collect(Collectors.toCollection(ArrayDeque::new));

EnumSet<Permission> permissions = roles.stream()
    .flatMap(role -> role.permissions().stream())
    .collect(Collectors.toCollection(() -> EnumSet.noneOf(Permission.class)));

5. `toMap`: Indexing, Duplicate Policy, and Map Supplier

toMap is the collector that most often causes production bugs.

Basic:

Map<String, User> usersById = users.stream()
    .collect(Collectors.toMap(User::id, Function.identity()));

If there is a duplicate key, the collector throws an IllegalStateException when the collection operation runs.

That is desirable when a duplicate is a data integrity error.

Map<String, User> usersById = users.stream()
    .collect(Collectors.toMap(
        User::id,
        Function.identity()
    ));

The code above states an invariant:

User id must be unique in this stream.
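A minimal sketch of that invariant failing, using a hypothetical two-field User record. The two-argument toMap is documented to throw IllegalStateException on duplicate keys; the exact message text is a JDK detail, so the sketch only prints it.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class DuplicateKeyDemo {

    // Hypothetical minimal model for this sketch.
    record User(String id, String email) {}

    // Two-argument toMap: a duplicate key is treated as an error.
    static Map<String, User> usersById(List<User> users) {
        return users.stream()
            .collect(Collectors.toMap(User::id, Function.identity()));
    }

    public static void main(String[] args) {
        List<User> users = List.of(
            new User("u1", "a@example.com"),
            new User("u1", "b@example.com"));
        try {
            usersById(users);
        } catch (IllegalStateException e) {
            // Thrown when the collect operation reaches the second "u1".
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```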

5.1 Merge Policy

If duplicates are valid, the merge policy must be explicit.

Keep first:

Map<String, User> usersByEmail = users.stream()
    .collect(Collectors.toMap(
        User::email,
        Function.identity(),
        (first, duplicate) -> first
    ));

Keep last:

Map<String, User> usersByEmail = users.stream()
    .collect(Collectors.toMap(
        User::email,
        Function.identity(),
        (previous, latest) -> latest
    ));

Merge domain object:

Map<String, AccountSummary> summaries = rows.stream()
    .collect(Collectors.toMap(
        AccountRow::accountId,
        AccountSummary::from,
        AccountSummary::merge
    ));

Top engineer rule:

Never use a merge function just to silence duplicate-key errors.
The merge function is a business rule.
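A runnable sketch of keep-first versus keep-last on the same input, with a hypothetical User(email, name) record. toMap resolves collisions through Map.merge, which passes the existing value first and the new value second, so in a sequential stream "first" means earliest in encounter order.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class MergePolicyDemo {

    // Hypothetical minimal model: two users share one email.
    record User(String email, String name) {}

    static final List<User> USERS = List.of(
        new User("a@example.com", "first"),
        new User("a@example.com", "second"));

    static Map<String, User> keepFirst() {
        return USERS.stream().collect(Collectors.toMap(
            User::email,
            Function.identity(),
            (first, duplicate) -> first));  // rule: the earliest record wins
    }

    static Map<String, User> keepLast() {
        return USERS.stream().collect(Collectors.toMap(
            User::email,
            Function.identity(),
            (previous, latest) -> latest)); // rule: the latest record wins
    }

    public static void main(String[] args) {
        System.out.println(keepFirst().get("a@example.com").name()); // first
        System.out.println(keepLast().get("a@example.com").name());  // second
    }
}
```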

5.2 Map Supplier

The default map implementation is unspecified (in practice it is usually a HashMap). If ordering matters, specify the map supplier.

Preserve encounter order:

Map<String, User> usersById = users.stream()
    .collect(Collectors.toMap(
        User::id,
        Function.identity(),
        (a, b) -> { throw new IllegalStateException("Duplicate id: " + a.id()); },
        LinkedHashMap::new
    ));

Sorted keys:

Map<String, User> usersById = users.stream()
    .collect(Collectors.toMap(
        User::id,
        Function.identity(),
        (a, b) -> { throw new IllegalStateException("Duplicate id"); },
        TreeMap::new
    ));

Enum keys (shown here with groupingBy, which takes the same kind of map supplier):

Map<Status, Long> countsByStatus = orders.stream()
    .collect(Collectors.groupingBy(
        Order::status,
        () -> new EnumMap<>(Status.class),
        Collectors.counting()
    ));
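A runnable sketch of the two toMap map suppliers on the same input, with a hypothetical User(id, name) record and made-up IDs. The fail-fast merge function is extracted into a typed helper, which keeps the call sites short.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

class MapSupplierDemo {

    record User(String id, String name) {}

    static final List<User> USERS = List.of(
        new User("u3", "Cy"), new User("u1", "Ann"), new User("u2", "Bo"));

    // Duplicate ids are a data error in this sketch.
    static BinaryOperator<User> failOnDuplicate() {
        return (a, b) -> { throw new IllegalStateException("Duplicate id: " + a.id()); };
    }

    // LinkedHashMap: keys iterate in encounter (first-seen) order.
    static Map<String, User> inEncounterOrder() {
        return USERS.stream().collect(Collectors.toMap(
            User::id, Function.identity(), failOnDuplicate(), LinkedHashMap::new));
    }

    // TreeMap: keys iterate in sorted order.
    static Map<String, User> sortedById() {
        return USERS.stream().collect(Collectors.toMap(
            User::id, Function.identity(), failOnDuplicate(), TreeMap::new));
    }

    public static void main(String[] args) {
        System.out.println(inEncounterOrder().keySet()); // [u3, u1, u2]
        System.out.println(sortedById().keySet());       // [u1, u2, u3]
    }
}
```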

6. `groupingBy`: One-to-Many Classification

groupingBy groups elements by the key that a classifier function returns.

Map<String, List<Order>> ordersByCustomer = orders.stream()
    .collect(Collectors.groupingBy(Order::customerId));

Mental model:

classifier(element) -> key
append element to group for that key

Use case:

orders by customer
events by type
errors by code
users by organization
records by effective date
tasks by status

6.1 `groupingBy` vs `toMap`

Use toMap when each key must have exactly one value.

Map<String, User> userById = users.stream()
    .collect(Collectors.toMap(User::id, Function.identity()));

Use groupingBy when a key can have many values.

Map<String, List<User>> usersByDepartment = users.stream()
    .collect(Collectors.groupingBy(User::department));

Wrong model:

Map<String, User> usersByDepartment = users.stream()
    .collect(Collectors.toMap(
        User::department,
        Function.identity(),
        (a, b) -> a
    ));

This silently drops all but one user per department, without any stated policy.
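A runnable sketch of that data loss with a hypothetical User(name, department) record: the same three users go through groupingBy and through the wrong-model toMap, and only groupingBy keeps both engineers.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class GroupingVsToMapDemo {

    record User(String name, String department) {}

    static final List<User> USERS = List.of(
        new User("Ann", "eng"), new User("Bo", "eng"), new User("Cy", "ops"));

    // One-to-many: every user is kept.
    static Map<String, List<User>> byDepartment() {
        return USERS.stream().collect(Collectors.groupingBy(User::department));
    }

    // Wrong model: the merge function keeps one user per department.
    static Map<String, User> lossyByDepartment() {
        return USERS.stream().collect(Collectors.toMap(
            User::department, Function.identity(), (a, b) -> a));
    }

    public static void main(String[] args) {
        System.out.println(byDepartment().get("eng").size());        // 2
        System.out.println(lossyByDepartment().get("eng").name());   // Ann (Bo is gone)
    }
}
```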

6.2 Downstream Collector

By default, groupingBy(classifier) collects each group's values into a List<T>.

If you need a count:

Map<String, Long> orderCountByCustomer = orders.stream()
    .collect(Collectors.groupingBy(
        Order::customerId,
        Collectors.counting()
    ));

If you need a set:

Map<String, Set<String>> productIdsByCustomer = orders.stream()
    .collect(Collectors.groupingBy(
        Order::customerId,
        Collectors.mapping(Order::productId, Collectors.toSet())
    ));

If you need a numeric summary:

Map<String, IntSummaryStatistics> itemStatsByCustomer = orders.stream()
    .collect(Collectors.groupingBy(
        Order::customerId,
        Collectors.summarizingInt(Order::itemCount)
    ));
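The three downstream shapes above, run against one small data set. The Order record here is a hypothetical three-field stand-in for the lesson's Order type, and the sample values are made up.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class DownstreamDemo {

    record Order(String customerId, String productId, int itemCount) {}

    static final List<Order> ORDERS = List.of(
        new Order("c1", "p1", 2),
        new Order("c1", "p2", 5),
        new Order("c1", "p1", 1),
        new Order("c2", "p3", 4));

    static Map<String, Long> countByCustomer() {
        return ORDERS.stream().collect(Collectors.groupingBy(
            Order::customerId, Collectors.counting()));
    }

    static Map<String, Set<String>> productIdsByCustomer() {
        return ORDERS.stream().collect(Collectors.groupingBy(
            Order::customerId,
            Collectors.mapping(Order::productId, Collectors.toSet())));
    }

    static Map<String, IntSummaryStatistics> itemStatsByCustomer() {
        return ORDERS.stream().collect(Collectors.groupingBy(
            Order::customerId,
            Collectors.summarizingInt(Order::itemCount)));
    }

    public static void main(String[] args) {
        System.out.println(countByCustomer().get("c1"));       // 3
        System.out.println(productIdsByCustomer().get("c1"));  // p1 and p2, set order unspecified
        IntSummaryStatistics c1 = itemStatsByCustomer().get("c1");
        System.out.println(c1.getSum() + " " + c1.getMax());   // 8 5
    }
}
```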

7. `partitioningBy`: Boolean Split

partitioningBy is a special case of grouping whose key is a Boolean.

Map<Boolean, List<Order>> partitioned = orders.stream()
    .collect(Collectors.partitioningBy(Order::isHighRisk));

List<Order> highRisk = partitioned.get(true);
List<Order> normal = partitioned.get(false);

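Use it when there really are exactly two buckets defined by a boolean predicate. The resulting map always contains both the true and the false key, even when a bucket is empty.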

Downstream example:

Map<Boolean, Long> counts = orders.stream()
    .collect(Collectors.partitioningBy(
        Order::isHighRisk,
        Collectors.counting()
    ));
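A sketch showing that both partition keys are present even when one bucket is empty, assuming a hypothetical Order(id, highRisk) record. This is a practical difference from groupingBy on a boolean, which only creates keys that actually occur.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitionDemo {

    record Order(String id, boolean highRisk) {}

    static Map<Boolean, Long> riskCounts(List<Order> orders) {
        return orders.stream().collect(Collectors.partitioningBy(
            Order::highRisk, Collectors.counting()));
    }

    public static void main(String[] args) {
        // No high-risk orders at all, yet the true key is still present.
        Map<Boolean, Long> counts = riskCounts(List.of(
            new Order("o1", false), new Order("o2", false)));
        System.out.println(counts.get(true));  // 0
        System.out.println(counts.get(false)); // 2
    }
}
```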

Do not use partitioningBy for more than two categories.

Wrong (every status other than OPEN collapses into a single false bucket):

orders.stream()
    .collect(Collectors.partitioningBy(order -> order.status() == Status.OPEN));

If you need every status:

Map<Status, List<Order>> byStatus = orders.stream()
    .collect(Collectors.groupingBy(Order::status));

8. Downstream `mapping`

mapping transforms each element before handing it to the downstream collector.

Map<String, Set<String>> productIdsByCustomer = orders.stream()
    .collect(Collectors.groupingBy(
        Order::customerId,
        Collectors.mapping(Order::productId, Collectors.toSet())
    ));

Without mapping, you have to group the orders first and then transform the values afterward.

Less direct:

Map<String, List<Order>> grouped = orders.stream()
    .collect(Collectors.groupingBy(Order::customerId));

Map<String, Set<String>> result = new HashMap<>();
for (var entry : grouped.entrySet()) {
    result.put(entry.getKey(), entry.getValue().stream()
        .map(Order::productId)
        .collect(Collectors.toSet()));
}

mapping keeps the transformation exactly where the aggregation happens.

9. Downstream `flatMapping`

flatMapping (Java 9+) is useful when one source element produces many downstream values.

Map<String, Set<String>> skuByCustomer = orders.stream()
    .collect(Collectors.groupingBy(
        Order::customerId,
        Collectors.flatMapping(
            order -> order.lines().stream().map(OrderLine::sku),
            Collectors.toSet()
        )
    ));

Mental model:

group order by customer
within each group, flatten order lines
collect SKUs into set

Without flatMapping, you often need to pre-flatten with a helper record:

record CustomerSku(String customerId, String sku) {}

Map<String, Set<String>> result = orders.stream()
    .flatMap(order -> order.lines().stream()
        .map(line -> new CustomerSku(order.customerId(), line.sku())))
    .collect(Collectors.groupingBy(
        CustomerSku::customerId,
        Collectors.mapping(CustomerSku::sku, Collectors.toSet())
    ));

Both are valid. Pick whichever reads more clearly in your domain.
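A runnable sketch that puts both versions side by side and checks they agree. Order and OrderLine are minimal hypothetical records, and the pre-flatten version uses a local record (Java 16+) in place of the lesson's top-level CustomerSku.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class FlatMappingDemo {

    record OrderLine(String sku) {}
    record Order(String customerId, List<OrderLine> lines) {}

    static final List<Order> ORDERS = List.of(
        new Order("c1", List.of(new OrderLine("A"), new OrderLine("B"))),
        new Order("c1", List.of(new OrderLine("B"), new OrderLine("C"))),
        new Order("c2", List.of(new OrderLine("D"))));

    // Flatten inside the group: each order contributes all of its SKUs.
    static Map<String, Set<String>> skusByCustomer(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(
            Order::customerId,
            Collectors.flatMapping(
                order -> order.lines().stream().map(OrderLine::sku),
                Collectors.toSet())));
    }

    // Pre-flatten into (customerId, sku) pairs, then group.
    static Map<String, Set<String>> skusByCustomerPreFlattened(List<Order> orders) {
        record CustomerSku(String customerId, String sku) {}
        return orders.stream()
            .flatMap(order -> order.lines().stream()
                .map(line -> new CustomerSku(order.customerId(), line.sku())))
            .collect(Collectors.groupingBy(
                CustomerSku::customerId,
                Collectors.mapping(CustomerSku::sku, Collectors.toSet())));
    }

    public static void main(String[] args) {
        System.out.println(skusByCustomer(ORDERS).get("c1").size());                     // 3
        System.out.println(skusByCustomer(ORDERS).equals(skusByCustomerPreFlattened(ORDERS))); // true
    }
}
```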

10. Downstream `filtering`

filtering (Java 9+) filters elements inside the downstream context.

Map<String, List<Order>> highRiskOrdersByCustomer = orders.stream()
    .collect(Collectors.groupingBy(
        Order::customerId,
        Collectors.filtering(Order::isHighRisk, Collectors.toList())
    ));

Contrast this with filtering before grouping:

Map<String, List<Order>> highRiskOnlyCustomers = orders.stream()
    .filter(Order::isHighRisk)
    .collect(Collectors.groupingBy(Order::customerId));

Semantic difference:

filter before grouping:
  customers with no high-risk orders do not appear at all

filtering downstream:
  every customer with at least one order appears, possibly with an empty downstream result

This matters for report completeness.

Example:

Map<String, Long> failedCountByService = logs.stream()
    .collect(Collectors.groupingBy(
        LogEvent::service,
        Collectors.filtering(
            LogEvent::failed,
            Collectors.counting()
        )
    ));

If every service that appears in the logs must show up in the report, downstream filtering is the better fit. A service with no log events at all still will not appear.
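A runnable sketch of the two filter placements on the same logs, using a hypothetical LogEvent(service, failed) record. The "billing" service has events but no failures: it disappears when filtering first and shows up with 0 when filtering downstream.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FilterPlacementDemo {

    record LogEvent(String service, boolean failed) {}

    static final List<LogEvent> LOGS = List.of(
        new LogEvent("auth", true),
        new LogEvent("auth", false),
        new LogEvent("billing", false));

    // Filter before grouping: services without failures vanish.
    static Map<String, Long> filterFirst() {
        return LOGS.stream()
            .filter(LogEvent::failed)
            .collect(Collectors.groupingBy(LogEvent::service, Collectors.counting()));
    }

    // Filter downstream: every service seen in the logs stays, possibly with 0.
    static Map<String, Long> filteringDownstream() {
        return LOGS.stream().collect(Collectors.groupingBy(
            LogEvent::service,
            Collectors.filtering(LogEvent::failed, Collectors.counting())));
    }

    public static void main(String[] args) {
        System.out.println(filterFirst().containsKey("billing")); // false
        System.out.println(filteringDownstream().get("billing")); // 0
    }
}
```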

11. Numeric Collectors

For numeric aggregation by group:

Map<String, Integer> totalItemsByCustomer = orders.stream()
    .collect(Collectors.groupingBy(
        Order::customerId,
        Collectors.summingInt(Order::itemCount)
    ));

Long:

Map<String, Long> totalBytesByService = events.stream()
    .collect(Collectors.groupingBy(
        Event::service,
        Collectors.summingLong(Event::payloadBytes)
    ));

Double:

Map<String, Double> averageScoreBySegment = users.stream()
    .collect(Collectors.groupingBy(
        User::segment,
        Collectors.averagingDouble(User::score)
    ));

Summary:

Map<String, LongSummaryStatistics> latencyStatsByService = events.stream()
    .collect(Collectors.groupingBy(
        Event::service,
        Collectors.summarizingLong(Event::latencyNanos)
    ));

Remember from Part 023:

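summingInt can overflow if the int total is too large
summingLong can overflow if the long total is too large
summingDouble has floating-point semantics
summary statistics of an empty group need careful interpretation: an empty IntSummaryStatistics reports min = Integer.MAX_VALUE, max = Integer.MIN_VALUE, and average 0.0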
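A small sketch of two of those pitfalls: summingInt wraps around silently, summingLong widens each element and does not overflow here, and an empty IntSummaryStatistics carries sentinel min/max values rather than failing. The input values are chosen only to trigger the overflow.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

class NumericPitfallsDemo {

    static final List<Integer> BIG = List.of(Integer.MAX_VALUE, 1);

    // summingInt accumulates in an int: the total silently wraps around.
    static int intTotal() {
        return BIG.stream().collect(Collectors.summingInt(Integer::intValue));
    }

    // summingLong widens each element to long before adding.
    static long longTotal() {
        return BIG.stream().collect(Collectors.summingLong(Integer::longValue));
    }

    // No elements: count 0, sentinel min/max, average 0.0.
    static IntSummaryStatistics emptyStats() {
        return List.<Integer>of().stream()
            .collect(Collectors.summarizingInt(Integer::intValue));
    }

    public static void main(String[] args) {
        System.out.println(intTotal());  // -2147483648
        System.out.println(longTotal()); // 2147483648
        IntSummaryStatistics s = emptyStats();
        System.out.println(s.getCount() + " " + s.getMin() + " " + s.getMax());
        // 0 2147483647 -2147483648
    }
}
```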

12. `reducing` Collector

Collectors.reducing is most useful as a downstream collector.

Example: the maximum order amount per customer.

Map<String, Optional<BigDecimal>> maxAmountByCustomer = orders.stream()
    .collect(Collectors.groupingBy(
        Order::customerId,
        Collectors.mapping(
            Order::amount,
            Collectors.reducing(BigDecimal::max)
        )
    ));

For a simple whole-stream reduction, however, use Stream.reduce.

Optional<BigDecimal> maxAmount = orders.stream()
    .map(Order::amount)
    .reduce(BigDecimal::max);

Rule:

Use Collectors.reducing mostly as downstream of groupingBy/partitioningBy.
Use Stream.reduce for direct whole-stream reduction.

With an identity:

Map<String, BigDecimal> totalAmountByCustomer = orders.stream()
    .collect(Collectors.groupingBy(
        Order::customerId,
        Collectors.mapping(
            Order::amount,
            Collectors.reducing(BigDecimal.ZERO, BigDecimal::add)
        )
    ));

But a custom domain collector or a toMap merge is often clearer:

Map<String, BigDecimal> totalAmountByCustomer = orders.stream()
    .collect(Collectors.toMap(
        Order::customerId,
        Order::amount,
        BigDecimal::add
    ));

Choose the form that states the intent most directly.
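A runnable sketch that checks the downstream reducing form and the toMap-merge form produce the same totals, plus the Optional-returning max. Order(customerId, amount) is a hypothetical record; all amounts share scale 2, so BigDecimal.equals (which compares scale) is safe here.

```java
import java.math.BigDecimal;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class ReducingDemo {

    record Order(String customerId, BigDecimal amount) {}

    static final List<Order> ORDERS = List.of(
        new Order("c1", new BigDecimal("10.00")),
        new Order("c1", new BigDecimal("2.50")),
        new Order("c2", new BigDecimal("7.25")));

    static Map<String, BigDecimal> totalsViaReducing() {
        return ORDERS.stream().collect(Collectors.groupingBy(
            Order::customerId,
            Collectors.mapping(Order::amount,
                Collectors.reducing(BigDecimal.ZERO, BigDecimal::add))));
    }

    static Map<String, BigDecimal> totalsViaToMap() {
        return ORDERS.stream().collect(Collectors.toMap(
            Order::customerId, Order::amount, BigDecimal::add));
    }

    // Without an identity, each group's result is an Optional.
    static Map<String, Optional<BigDecimal>> maxViaReducing() {
        return ORDERS.stream().collect(Collectors.groupingBy(
            Order::customerId,
            Collectors.mapping(Order::amount, Collectors.reducing(BigDecimal::max))));
    }

    public static void main(String[] args) {
        System.out.println(totalsViaReducing().equals(totalsViaToMap())); // true
        System.out.println(totalsViaToMap().get("c1"));                   // 12.50
        System.out.println(maxViaReducing().get("c1").orElseThrow());     // 10.00
    }
}
```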

13. `collectingAndThen`: Finish and Lock

collectingAndThen runs a finisher after the downstream collector completes.

Map<String, List<Order>> immutableOrdersByCustomer = orders.stream()
    .collect(Collectors.groupingBy(
        Order::customerId,
        Collectors.collectingAndThen(
            Collectors.toList(),
            List::copyOf
        )
    ));

However, the outer map is still mutable.

To lock the outer map as well:

Map<String, List<Order>> immutableOrdersByCustomer = orders.stream()
    .collect(Collectors.collectingAndThen(
        Collectors.groupingBy(
            Order::customerId,
            Collectors.collectingAndThen(
                Collectors.toList(),
                List::copyOf
            )
        ),
        Map::copyOf
    ));

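This matters at API boundaries. Keep in mind that List.copyOf and Map.copyOf reject null elements, and that Map.copyOf does not preserve the source map's iteration order.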

Mental model:

inner finishing locks values
outer finishing locks map
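A runnable sketch that verifies both locks: after the inner and outer finishers, adding a key to the map and adding an element to a group list should both throw UnsupportedOperationException. Order(id, customerId) and the rejects helper are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class LockResultDemo {

    record Order(String id, String customerId) {}

    static Map<String, List<Order>> lockedByCustomer(List<Order> orders) {
        return orders.stream().collect(Collectors.collectingAndThen(
            Collectors.groupingBy(
                Order::customerId,
                Collectors.collectingAndThen(Collectors.toList(), List::copyOf)), // inner lock
            Map::copyOf));                                                          // outer lock
    }

    // True if the mutation attempt is rejected.
    static boolean rejects(Runnable mutation) {
        try {
            mutation.run();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Map<String, List<Order>> locked = lockedByCustomer(List.of(
            new Order("o1", "c1"), new Order("o2", "c1")));
        System.out.println(rejects(() -> locked.put("c2", List.of())));                // true
        System.out.println(rejects(() -> locked.get("c1").add(new Order("o3", "c1")))); // true
    }
}
```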

14. `joining`: String Aggregation

joining is useful for strings.

String csv = users.stream()
    .map(User::email)
    .collect(Collectors.joining(","));

With prefix/suffix:

String label = users.stream()
    .map(User::email)
    .collect(Collectors.joining(", ", "[", "]"));

Production caution:

avoid using joining to build SQL queries manually
avoid huge unbounded string aggregation without size policy
ensure escaping if output format matters

For audit/debug report, deterministic ordering matters:

String customerIds = orders.stream()
    .map(Order::customerId)
    .distinct()
    .sorted()
    .collect(Collectors.joining(","));
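A short sketch of the joining variants above, with made-up inputs. Note that joining with a prefix and suffix still emits them for an empty stream, which can surprise report code that expects an empty string.

```java
import java.util.List;
import java.util.stream.Collectors;

class JoiningDemo {

    // Deterministic: de-duplicate, sort, then join.
    static String csv(List<String> ids) {
        return ids.stream().distinct().sorted().collect(Collectors.joining(","));
    }

    static String label(List<String> emails) {
        return emails.stream().collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(csv(List.of("c2", "c1", "c2", "c3")));             // c1,c2,c3
        System.out.println(label(List.of("a@example.com", "b@example.com"))); // [a@example.com, b@example.com]
        System.out.println(label(List.of()));                                 // []
    }
}
```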

15. `teeing`: Two Aggregations, One Result

teeing combines two collectors and merges their results.

Example: min/max range.

record Range(int min, int max) {}

Range range = values.stream()
    .collect(Collectors.teeing(
        Collectors.minBy(Integer::compareTo),
        Collectors.maxBy(Integer::compareTo),
        (min, max) -> new Range(
            min.orElseThrow(),
            max.orElseThrow()
        )
    ));

But for primitive numbers, summary statistics may be better:

IntSummaryStatistics stats = values.stream()
    .mapToInt(Integer::intValue)
    .summaryStatistics();

teeing shines when result combines different derived values:

record AuditSummary(long total, long failed) {}

AuditSummary summary = events.stream()
    .collect(Collectors.teeing(
        Collectors.counting(),
        Collectors.filtering(Event::failed, Collectors.counting()),
        AuditSummary::new
    ));

Use it when it improves intent. Avoid it when a small loop is clearer.
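A runnable sketch of both teeing examples (teeing is available since Java 12). Event, AuditSummary, and Range are minimal hypothetical records. The merger is written as a lambda; the lesson's AuditSummary::new form expresses the same thing. The range merger throws NoSuchElementException on empty input, which is a deliberate policy choice here.

```java
import java.util.List;
import java.util.stream.Collectors;

class TeeingDemo {

    record Event(String id, boolean failed) {}
    record AuditSummary(long total, long failed) {}
    record Range(int min, int max) {}

    static AuditSummary summarize(List<Event> events) {
        return events.stream().collect(Collectors.teeing(
            Collectors.counting(),                                    // branch 1: all events
            Collectors.filtering(Event::failed, Collectors.counting()), // branch 2: failures
            (total, failed) -> new AuditSummary(total, failed)));
    }

    static Range range(List<Integer> values) {
        return values.stream().collect(Collectors.teeing(
            Collectors.minBy(Integer::compareTo),
            Collectors.maxBy(Integer::compareTo),
            // Empty input has no range: orElseThrow fails fast.
            (min, max) -> new Range(min.orElseThrow(), max.orElseThrow())));
    }

    public static void main(String[] args) {
        AuditSummary s = summarize(List.of(
            new Event("e1", true), new Event("e2", false), new Event("e3", false)));
        System.out.println(s.total() + " " + s.failed());   // 3 1
        Range r = range(List.of(5, -2, 9));
        System.out.println(r.min() + " " + r.max());        // -2 9
    }
}
```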

16. Ordering Semantics

Collectors can preserve or lose order depending on source, collector, and container.

Examples:

List<Order> list = orders.stream().toList();

For an ordered stream, the list preserves encounter order.

Set:

Set<String> ids = orders.stream()
    .map(Order::id)
    .collect(Collectors.toSet());

There is no guarantee of a deterministic iteration order.

Preserve order:

Set<String> ids = orders.stream()
    .map(Order::id)
    .collect(Collectors.toCollection(LinkedHashSet::new));

Group map ordering:

Map<String, List<Order>> byCustomer = orders.stream()
    .collect(Collectors.groupingBy(Order::customerId));

Do not assume map key iteration order.

Preserve first-seen key order:

Map<String, List<Order>> byCustomer = orders.stream()
    .collect(Collectors.groupingBy(
        Order::customerId,
        LinkedHashMap::new,
        Collectors.toList()
    ));

Sorted key order:

Map<String, List<Order>> byCustomer = orders.stream()
    .collect(Collectors.groupingBy(
        Order::customerId,
        TreeMap::new,
        Collectors.toList()
    ));

Production rule:

If output is used for audit, tests, serialization, reports, or diffs,
make ordering explicit.

17. Null Policy

Collectors often interact badly with unclear null policy.

Examples:

Map<String, User> byId = users.stream()
    .collect(Collectors.toMap(User::id, Function.identity()));

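If User::id can return null, the behavior depends on the map implementation and the collector path. In current OpenJDK, for example, groupingBy throws NullPointerException for a null classifier key, and toMap throws NullPointerException for a null value. Do not let null drift into collector logic accidentally.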

Better:

Map<String, User> byId = users.stream()
    .filter(user -> user.id() != null)
    .collect(Collectors.toMap(User::id, Function.identity()));

But filtering might hide data quality problems.

More defensible:

Map<String, User> byId = users.stream()
    .peek(user -> {
        if (user.id() == null) {
            throw new IllegalArgumentException("User id must not be null: " + user);
        }
    })
    .collect(Collectors.toMap(User::id, Function.identity()));

However, using peek for validation side effects is debatable: its Javadoc says it exists mainly to support debugging. A clearer loop, or a validation method used as the key mapper, may be better:

static String requireUserId(User user) {
    if (user.id() == null) {
        throw new IllegalArgumentException("User id must not be null: " + user);
    }
    return user.id();
}

Map<String, User> byId = users.stream()
    .collect(Collectors.toMap(
        CollectorsLesson::requireUserId,
        Function.identity()
    ));

Rule:

Classifier/key mapper/value mapper should encode null policy explicitly.
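A self-contained version of the requireUserId approach. The lesson's CollectorsLesson class is replaced by a hypothetical NullPolicyDemo, and User is a minimal two-field record. The null id surfaces as an IllegalArgumentException with domain context instead of an NPE from inside the collector.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class NullPolicyDemo {

    record User(String id, String name) {}

    // Key mapper that encodes the null policy: a missing id is a data error.
    static String requireUserId(User user) {
        if (user.id() == null) {
            throw new IllegalArgumentException("User id must not be null: " + user);
        }
        return user.id();
    }

    static Map<String, User> byId(List<User> users) {
        return users.stream().collect(Collectors.toMap(
            NullPolicyDemo::requireUserId, Function.identity()));
    }

    public static void main(String[] args) {
        try {
            byId(List.of(new User("u1", "Ann"), new User(null, "Bo")));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```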

18. Duplicate Policy Patterns

18.1 Fail on Duplicate with Helpful Message

The default duplicate-key exception from Collectors.toMap may not carry enough domain context. You can encode a stronger policy. Note that toMap passes the two conflicting values, not the key, to the merge function:

static <T> BinaryOperator<T> duplicateKey(String keyName) {
    return (a, b) -> {
        throw new IllegalStateException("Duplicate " + keyName + ": " + a + " vs " + b);
    };
}

Use:

Map<String, User> byEmail = users.stream()
    .collect(Collectors.toMap(
        User::email,
        Function.identity(),
        duplicateKey("email"),
        LinkedHashMap::new
    ));

18.2 Keep Latest by Version

Map<String, Contract> latestById = contracts.stream()
    .collect(Collectors.toMap(
        Contract::id,
        Function.identity(),
        BinaryOperator.maxBy(Comparator.comparing(Contract::version))
    ));
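A runnable sketch combining 18.1 and 18.2, with hypothetical User(email, name) and Contract(id, version) records. It uses Comparator.comparingInt because version is an int here. BinaryOperator.maxBy keeps the first argument on ties, so equal versions resolve to the earlier record.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.stream.Collectors;

class DuplicatePolicyDemo {

    record User(String email, String name) {}
    record Contract(String id, int version) {}

    // Fails loudly; a and b are the conflicting VALUES, not the key.
    static <T> BinaryOperator<T> duplicateKey(String keyName) {
        return (a, b) -> {
            throw new IllegalStateException("Duplicate " + keyName + ": " + a + " vs " + b);
        };
    }

    static Map<String, User> byEmail(List<User> users) {
        return users.stream().collect(Collectors.toMap(
            User::email, Function.identity(), duplicateKey("email"), LinkedHashMap::new));
    }

    // Keeps the highest version per id.
    static Map<String, Contract> latestById(List<Contract> contracts) {
        return contracts.stream().collect(Collectors.toMap(
            Contract::id,
            Function.identity(),
            BinaryOperator.maxBy(Comparator.comparingInt(Contract::version))));
    }

    public static void main(String[] args) {
        Map<String, Contract> latest = latestById(List.of(
            new Contract("k1", 1), new Contract("k1", 3), new Contract("k1", 2)));
        System.out.println(latest.get("k1").version()); // 3
    }
}
```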

18.3 Merge Lists

Map<String, List<ErrorDetail>> errorsByEntity = errors.stream()
    .collect(Collectors.toMap(
        ErrorDetail::entityId,
        error -> new ArrayList<>(List.of(error)),
        (left, right) -> {
            left.addAll(right);
            return left;
        }
    ));

But this is exactly the groupingBy shape:

Map<String, List<ErrorDetail>> errorsByEntity = errors.stream()
    .collect(Collectors.groupingBy(ErrorDetail::entityId));

Prefer groupingBy if result is one-to-many.

19. Grouping Pattern Catalogue

19.1 Count by Status

Map<Status, Long> countByStatus = orders.stream()
    .collect(Collectors.groupingBy(
        Order::status,
        () -> new EnumMap<>(Status.class),
        Collectors.counting()
    ));

19.2 IDs by Customer

Map<String, Set<String>> orderIdsByCustomer = orders.stream()
    .collect(Collectors.groupingBy(
        Order::customerId,
        LinkedHashMap::new,
        Collectors.mapping(Order::id, Collectors.toCollection(LinkedHashSet::new))
    ));

19.3 Stats by Service

Map<String, LongSummaryStatistics> latencyByService = events.stream()
    .collect(Collectors.groupingBy(
        Event::service,
        Collectors.summarizingLong(Event::latencyNanos)
    ));

19.4 High-Risk Count by Region

Map<String, Long> highRiskCountByRegion = orders.stream()
    .collect(Collectors.groupingBy(
        Order::region,
        Collectors.filtering(
            Order::isHighRisk,
            Collectors.counting()
        )
    ));

19.5 Nested Grouping

Map<String, Map<Status, Long>> countByRegionAndStatus = orders.stream()
    .collect(Collectors.groupingBy(
        Order::region,
        Collectors.groupingBy(
            Order::status,
            () -> new EnumMap<>(Status.class),
            Collectors.counting()
        )
    ));

Nested grouping is powerful, but readability degrades quickly. For complex reporting, consider an explicit accumulator type.
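A runnable version of the nested grouping, with one change labeled as an assumption: the outer map uses TreeMap::new so regions iterate in sorted order, which makes the printed result deterministic. Status and Order are minimal hypothetical types; the inner EnumMap iterates statuses in declaration order.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class NestedGroupingDemo {

    enum Status { OPEN, PAID, CANCELLED }
    record Order(String region, Status status) {}

    static final List<Order> ORDERS = List.of(
        new Order("west", Status.OPEN),
        new Order("east", Status.PAID),
        new Order("west", Status.OPEN),
        new Order("west", Status.CANCELLED));

    static Map<String, Map<Status, Long>> countByRegionAndStatus(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(
            Order::region,
            TreeMap::new,                              // regions in sorted order
            Collectors.groupingBy(
                Order::status,
                () -> new EnumMap<>(Status.class),     // statuses in declaration order
                Collectors.counting())));
    }

    public static void main(String[] args) {
        System.out.println(countByRegionAndStatus(ORDERS));
        // {east={PAID=1}, west={OPEN=2, CANCELLED=1}}
    }
}
```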

20. When Collectors Become Too Clever

Collectors are expressive, but deeply nested collectors can become unreadable.

Example smell:

var result = orders.stream()
    .collect(Collectors.groupingBy(
        Order::region,
        Collectors.mapping(
            Order::customer,
            Collectors.collectingAndThen(
                Collectors.groupingBy(
                    Customer::segment,
                    Collectors.flatMapping(
                        customer -> customer.permissions().stream(),
                        Collectors.filtering(
                            Permission::enabled,
                            Collectors.mapping(Permission::code, Collectors.toSet())
                        )
                    )
                ),
                Map::copyOf
            )
        )
    ));

This may be technically valid but not maintainable.

Refactor options:

Extract classifier methods.
Extract downstream collector factory methods.
Use intermediate record.
Use loop with named accumulator.
Split into multiple transformations if dataset size allows.

Example:

static Collector<Order, ?, Set<String>> enabledPermissionCodes() {
    return Collectors.flatMapping(
        order -> order.customer().permissions().stream(),
        Collectors.filtering(
            Permission::enabled,
            Collectors.mapping(Permission::code, Collectors.toSet())
        )
    );
}

Top engineer rule:

Collector composition is good when it compresses accidental complexity.
It is bad when it hides business rules.

21. Production Case Study: Validation Report

Problem:

From a list of ValidationError, build a report with:

errors by entity id

count by severity

field names by entity id

deterministic ordering

immutable result boundary

Model:

record ValidationError(
    String entityId,
    String field,
    Severity severity,
    String message
) {}

enum Severity {
    INFO, WARNING, ERROR
}

record ValidationReport(
    Map<String, List<ValidationError>> errorsByEntity,
    Map<Severity, Long> countBySeverity,
    Map<String, Set<String>> fieldsByEntity
) {}

Implementation:

static ValidationReport report(List<ValidationError> errors) {
    // Collections.unmodifiableMap/unmodifiableSet keep the iteration order of the
    // LinkedHashMap, LinkedHashSet, and EnumMap they wrap; Map.copyOf and Set.copyOf would not.
    Map<String, List<ValidationError>> errorsByEntity = errors.stream()
        .collect(Collectors.collectingAndThen(
            Collectors.groupingBy(
                ValidationError::entityId,
                LinkedHashMap::new,
                Collectors.collectingAndThen(
                    Collectors.toList(),
                    List::copyOf
                )
            ),
            Collections::unmodifiableMap
        ));

    Map<Severity, Long> countBySeverity = errors.stream()
        .collect(Collectors.collectingAndThen(
            Collectors.groupingBy(
                ValidationError::severity,
                () -> new EnumMap<>(Severity.class),
                Collectors.counting()
            ),
            Collections::unmodifiableMap
        ));

    Map<String, Set<String>> fieldsByEntity = errors.stream()
        .collect(Collectors.collectingAndThen(
            Collectors.groupingBy(
                ValidationError::entityId,
                LinkedHashMap::new,
                Collectors.mapping(
                    ValidationError::field,
                    Collectors.collectingAndThen(
                        Collectors.toCollection(LinkedHashSet::new),
                        Collections::unmodifiableSet
                    )
                )
            ),
            Collections::unmodifiableMap
        ));

    return new ValidationReport(errorsByEntity, countBySeverity, fieldsByEntity);
}

Analysis:

LinkedHashMap preserves first-seen entity order, and Collections::unmodifiableMap keeps that order at the immutable boundary (Map.copyOf would not, because its iteration order is unspecified).
EnumMap is appropriate for enum severity and iterates in declaration order.
List::copyOf and Collections::unmodifiableSet protect the inner collections.
Collections::unmodifiableMap protects the outer maps; the wrapped maps are never exposed, so the views act as immutable results.
Three traversals are acceptable if clarity and dataset size are reasonable.

If report generation is on a hot path and the data volume is huge, use an explicit accumulator.
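One possible shape for that explicit accumulator, sketched as a plain loop (custom Collector implementations are left to Part 025). ReportAccumulator and the sample data are hypothetical. A single pass fills all three views, and finish() locks every level while keeping first-seen and enum order by wrapping with Collections.unmodifiableMap/unmodifiableSet instead of Map.copyOf/Set.copyOf.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ValidationReportAccumulatorDemo {

    enum Severity { INFO, WARNING, ERROR }
    record ValidationError(String entityId, String field, Severity severity, String message) {}
    record ValidationReport(
        Map<String, List<ValidationError>> errorsByEntity,
        Map<Severity, Long> countBySeverity,
        Map<String, Set<String>> fieldsByEntity) {}

    // Hypothetical named accumulator: one pass fills all three views.
    static final class ReportAccumulator {
        private final Map<String, List<ValidationError>> errorsByEntity = new LinkedHashMap<>();
        private final EnumMap<Severity, Long> countBySeverity = new EnumMap<>(Severity.class);
        private final Map<String, Set<String>> fieldsByEntity = new LinkedHashMap<>();

        void add(ValidationError error) {
            errorsByEntity.computeIfAbsent(error.entityId(), id -> new ArrayList<>()).add(error);
            countBySeverity.merge(error.severity(), 1L, Long::sum);
            fieldsByEntity.computeIfAbsent(error.entityId(), id -> new LinkedHashSet<>()).add(error.field());
        }

        // Locks every level with ordered copies wrapped as unmodifiable views.
        ValidationReport finish() {
            Map<String, List<ValidationError>> errors = new LinkedHashMap<>();
            errorsByEntity.forEach((id, list) -> errors.put(id, List.copyOf(list)));
            Map<String, Set<String>> fields = new LinkedHashMap<>();
            fieldsByEntity.forEach((id, set) ->
                fields.put(id, Collections.unmodifiableSet(new LinkedHashSet<>(set))));
            return new ValidationReport(
                Collections.unmodifiableMap(errors),
                Collections.unmodifiableMap(new EnumMap<>(countBySeverity)),
                Collections.unmodifiableMap(fields));
        }
    }

    static ValidationReport report(List<ValidationError> errors) {
        ReportAccumulator acc = new ReportAccumulator();
        for (ValidationError error : errors) {
            acc.add(error);
        }
        return acc.finish();
    }

    static final List<ValidationError> SAMPLE = List.of(
        new ValidationError("b", "name", Severity.ERROR, "required"),
        new ValidationError("a", "email", Severity.WARNING, "suspicious domain"),
        new ValidationError("b", "age", Severity.ERROR, "negative"),
        new ValidationError("b", "name", Severity.INFO, "trimmed"));

    public static void main(String[] args) {
        ValidationReport r = report(SAMPLE);
        System.out.println(r.errorsByEntity().keySet()); // [b, a]
        System.out.println(r.countBySeverity());         // {INFO=1, WARNING=1, ERROR=2}
        System.out.println(r.fieldsByEntity().get("b")); // [name, age]
    }
}
```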

22. Collector Selection Matrix

Need	Collector shape	Example
Immutable list	`stream.toList()`	`orders.stream().map(...).toList()`
Specific mutable list	`toCollection(ArrayList::new)`	queue/build buffer
Preserve unique order	`toCollection(LinkedHashSet::new)`	audit/report IDs
Index by unique key	`toMap(key, value)`	`userById`
Index with duplicate rule	`toMap(key, value, merge)`	latest version by id
Preserve map order	`toMap(..., LinkedHashMap::new)`	deterministic output
Group one-to-many	`groupingBy(classifier)`	orders by customer
Count by key	`groupingBy(key, counting())`	status counts
Boolean split	`partitioningBy(predicate)`	valid/invalid
Extract values per group	`groupingBy(key, mapping(...))`	product IDs by customer
Flatten values per group	`flatMapping` downstream	SKUs by customer
Filter inside group	`filtering` downstream	failed count per service
Numeric stats by group	`summarizingLong` downstream	latency by service
Finish immutable result	`collectingAndThen`	`List::copyOf`
Combine two aggregations	`teeing`	total + failed count

23. Failure Catalogue

23.1 Duplicate Key Explosion

Map<String, User> byEmail = users.stream()
    .collect(Collectors.toMap(User::email, Function.identity()));

This fails if an email is duplicated. Decide:

should duplicate fail?
keep first?
keep latest?
merge?
group?

23.2 Accidental Data Loss

Map<String, User> byDepartment = users.stream()
    .collect(Collectors.toMap(
        User::department,
        Function.identity(),
        (a, b) -> a
    ));

This silently drops users. It should probably be:

Map<String, List<User>> byDepartment = users.stream()
    .collect(Collectors.groupingBy(User::department));

23.3 Ordering Assumption

Set<String> ids = users.stream()
    .map(User::id)
    .collect(Collectors.toSet());

Do not assume deterministic iteration order. Use LinkedHashSet or sorted collection.

23.4 Mutable Result Leak

class UserIndex {
    private final Map<String, List<User>> usersByDepartment;

    UserIndex(List<User> users) {
        this.usersByDepartment = users.stream()
            .collect(Collectors.groupingBy(User::department));
    }

    Map<String, List<User>> usersByDepartment() {
        return usersByDepartment;
    }
}

Callers can mutate both the map and its lists.

Better:

this.usersByDepartment = users.stream()
    .collect(Collectors.collectingAndThen(
        Collectors.groupingBy(
            User::department,
            Collectors.collectingAndThen(Collectors.toList(), List::copyOf)
        ),
        Map::copyOf
    ));

23.5 Wrong Filter Placement

Map<String, List<Order>> highRisk = orders.stream()
    .filter(Order::isHighRisk)
    .collect(Collectors.groupingBy(Order::customerId));

Customers without high-risk orders disappear. If report needs all customers, use downstream filtering.

23.6 Overly Clever Nested Collector

If a collector takes longer to explain than the business rule it implements, extract named methods or use a loop.

24. Code Review Checklist

For every collector pipeline, ask:

What is the result shape?
Is the collector shape appropriate: materialize, index, group, partition, summarize?
If toMap, what is duplicate policy?
If groupingBy, is one-to-many really intended?
Is ordering explicit where needed?
Is map implementation explicit where needed?
Is result mutability acceptable?
Are inner collections also protected if needed?
Is null policy explicit?
Is downstream collector correct?
Is filter placement semantically correct?
Would a loop be clearer?
Would an explicit accumulator type be more maintainable?
Is numeric aggregation safe for overflow/precision?
Is this collector safe if stream becomes parallel later?

25. Latihan Terarah

Latihan 1 — Unique Index

Diberikan:

record User(String id, String email, String department) {}

Buat:

Map<String, User> userById(List<User> users)

Constraint:

duplicate id harus fail
error message harus helpful
result preserve encounter order

Latihan 2 — Group with Downstream Mapping

Buat:

Map<String, Set<String>> emailsByDepartment(List<User> users)

Constraint:

department order first-seen
email order first-seen per department
immutable outer map dan inner set

Latihan 3 — Partition Validation

Diberikan:

record ValidationResult(String id, boolean valid, List<String> errors) {}

Buat partition valid/invalid dan count masing-masing.

Bandingkan:

partitioningBy(ValidationResult::valid)
groupingBy(ValidationResult::valid)

Latihan 4 — Report Completeness

Diberikan semua service dan logs. Buat failed count per service, termasuk service dengan count 0.

Hint:

downstream filtering only retains keys that occur in the source
if a service must appear even when it has no logs, you need a seed map or a post-fill step
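A post-fill sketch for Exercise 4. It assumes a hypothetical LogEntry record with service and failed components; the lesson does not define the log shape. The known services drive the report, and missing counts default to 0.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical log shape for this sketch.
record LogEntry(String service, boolean failed) {}

final class FailureReport {

    static Map<String, Long> failedCountPerService(List<String> services, List<LogEntry> logs) {
        // Count failures only for services that actually appear in the logs.
        Map<String, Long> failures = logs.stream()
            .filter(LogEntry::failed)
            .collect(Collectors.groupingBy(LogEntry::service, Collectors.counting()));

        // Post-fill: every known service gets a row, in the order of the services list.
        Map<String, Long> report = new LinkedHashMap<>();
        for (String service : services) {
            report.put(service, failures.getOrDefault(service, 0L));
        }
        return report;
    }
}
```

This variant silently ignores logs from services outside the known list. If those logs should surface, compare failures.keySet() against services as well.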

Exercise 5: Replace a Clever Collector

Take one complex nested collector from your codebase and refactor it into either:

named helper collector methods, or
an explicit accumulator class

Compare readability and testability.
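A sketch of the explicit-accumulator option. It assumes a hypothetical Request record and ServiceStats class, neither of which comes from the lesson. Collector.of wires the class's methods in as supplier, accumulator, and combiner. No finisher is passed, so the accumulator object itself is the result.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical input record for this sketch.
record Request(String service, long latencyMillis, boolean failed) {}

// Explicit accumulator: each aggregation rule has a name and can be unit-tested directly.
final class ServiceStats {
    long total;
    long failed;
    long maxLatencyMillis; // assumes latencies are non-negative, so 0 is a safe start

    // Accumulator step: fold one request into this partial result.
    void add(Request request) {
        total++;
        if (request.failed()) {
            failed++;
        }
        maxLatencyMillis = Math.max(maxLatencyMillis, request.latencyMillis());
    }

    // Combiner step: merge another partial result (used by parallel streams).
    ServiceStats merge(ServiceStats other) {
        total += other.total;
        failed += other.failed;
        maxLatencyMillis = Math.max(maxLatencyMillis, other.maxLatencyMillis);
        return this;
    }

    // Supplier, accumulator, combiner; no finisher, so the accumulator is the result.
    static Collector<Request, ServiceStats, ServiceStats> collector() {
        return Collector.of(ServiceStats::new, ServiceStats::add, ServiceStats::merge);
    }

    static Map<String, ServiceStats> byService(List<Request> requests) {
        return requests.stream()
            .collect(Collectors.groupingBy(Request::service, collector()));
    }
}
```

Compared with a nested chain of summarizing, counting, and maxBy collectors, each rule here is a plain method. That makes the rules easier to read and to unit-test one by one.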

26. Summary

Collectors are tools for mutable reduction and structured aggregation.

Key ideas:

collect differs from reduce; use collect to accumulate into mutable containers.
a Collector has a supplier, an accumulator, a combiner, and a finisher.
toMap needs an explicit duplicate-key policy.
groupingBy fits one-to-many classification.
partitioningBy fits a boolean split.
downstream collectors such as mapping, flatMapping, filtering, counting, summarizingX, and reducing make aggregations more precise.
never assume ordering; specify the collection or map supplier when order matters.
the mutability boundary must be explicit, including for inner collections.
collector composition should clarify the business rule, not hide it.
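A small sketch tying the first two ideas together; the class and method names are illustrative. reduce folds immutable Integer values, while collect mutates one container per partial result. The commaJoin collector spells out all four Collector parts by hand with java.util.StringJoiner. In real code, Collectors.joining(",") already does this.

```java
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.Collector;

final class CollectVsReduce {

    // reduce: folds immutable values; every step yields a new Integer.
    static int totalLength(List<String> words) {
        return words.stream().map(String::length).reduce(0, Integer::sum);
    }

    // collect: mutates a container, spelled out as the four Collector parts.
    // Collectors.joining(",") already provides this; it is rebuilt here only to show the anatomy.
    static Collector<String, StringJoiner, String> commaJoin() {
        return Collector.of(
            () -> new StringJoiner(","), // supplier: a fresh empty container
            StringJoiner::add,           // accumulator: add one element to the container
            StringJoiner::merge,         // combiner: merge two partial containers
            StringJoiner::toString);     // finisher: container -> final result
    }
}
```

On an ordered parallel stream, commaJoin still yields elements in encounter order. The framework combines the left partial result with the right one, and the collector does not declare UNORDERED.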

Final mental model:

Collector is not just a convenience method.
Collector is an explicit aggregation contract.

The next part covers custom collectors. It explains when to write one, how to ensure identity, associativity, and combiner correctness, and how to test a collector so it is safe under both sequential and parallel execution.
