Build CoreOrdered learning track

Learn Postgresql In Action Part 014 Indexing Specialized Indexes

[]18 min read3554 words

In This Lesson

1. Kaufman Skill Target 2. First Principle: Index Type Follows Predicate Semantics 3. PostgreSQL Index Families at a Glance

PrevNext

Lesson 1435 lesson track07–19 Build Core

title: Learn PostgreSQL in Action - Part 014 description: Specialized PostgreSQL indexes for advanced workloads: GIN, GiST, SP-GiST, BRIN, Hash, JSONB, arrays, full-text search, trigram search, ranges, and time-series access patterns. series: learn-postgresql-in-action seriesTitle: Learn PostgreSQL in Action order: 14 partTitle: Specialized Indexes: GIN, GiST, SP-GiST, BRIN, Hash tags:

postgresql
indexing
gin
gist
brin
jsonb
full-text-search
java
series date: 2026-07-01

Part 014 — Specialized Indexes: GIN, GiST, SP-GiST, BRIN, Hash

B-tree indexes dominate OLTP because most applications ask equality, range, and ordered-list questions.

But PostgreSQL is not limited to B-tree. It has multiple index access methods because different data shapes need different search structures.

This part answers one question:

When the query is not naturally a scalar equality/range/order problem, which PostgreSQL index type should we reach for, and what are the trade-offs?

We will cover:

GIN;
GiST;
SP-GiST;
BRIN;
Hash;
trigram indexes;
full-text search indexes;
JSONB indexes;
array indexes;
range indexes;
operational failure modes.

1. Kaufman Skill Target

The target is not memorizing every index type.

The target is classification:

Given a query predicate and data shape, classify the problem into the correct index family, choose a safe initial index, and validate it with EXPLAIN and workload behavior.

After this part, you should be able to answer:

Is this a B-tree problem or not?
Should JSONB use jsonb_ops, jsonb_path_ops, expression index, or generated column?
Should search use full-text search or trigram?
Should time-series data use B-tree or BRIN?
Why can a GIN index be fast for reads but expensive for writes?
Why does a GiST index often return candidates that require recheck?
Why can range/exclusion constraints depend on GiST?
Why is a Hash index rarely the first index you choose?
Why should Java APIs expose search semantics clearly before choosing an index?

Skill map:

2. First Principle: Index Type Follows Predicate Semantics

Do not start with:

Should we use GIN?

Start with:

What question is the query asking?

Query Question	Typical Index Family
Is scalar equal to value?	B-tree
Is scalar in range?	B-tree
Return rows in sorted order?	B-tree
Does JSON contain this object/path?	GIN or expression/B-tree
Does array contain element?	GIN
Does text match words linguistically?	Full-text with GIN/GiST
Does text look similar / contain substring?	Trigram GIN/GiST
Does range overlap another range?	GiST or SP-GiST depending type/operator
Does geometry overlap/intersect?	GiST/SP-GiST via PostGIS/operator classes
Is huge append-only table filtered by naturally ordered column?	BRIN
Is equality only, no ordering/range?	B-tree usually; Hash only selectively

The access method is a consequence, not the starting point.

3. PostgreSQL Index Families at a Glance

Short mental model:

Index	Mental Model	Common Use
B-tree	sorted keys	equality, range, order
GIN	inverted index from element to row	JSONB, arrays, full-text
GiST	generalized tree of bounding predicates	ranges, geometry, nearest-neighbor, exclusion
SP-GiST	partitioned search space	certain spatial/prefix/range patterns
BRIN	block-level min/max summaries	huge naturally ordered tables
Hash	hash equality lookup	narrow equality-only cases

4. GIN: Generalized Inverted Index

GIN indexes are designed for composite values where a row contains multiple searchable elements.

Think:

row -> many tokens/elements

GIN builds the reverse mapping:

element/token -> rows containing it

GIN is common for:

JSONB containment;
arrays;
full-text search tsvector;
trigram search through pg_trgm;
hstore if used.

GIN is usually read-powerful and write-expensive.

5. GIN for JSONB

Suppose we have flexible case attributes:

CREATE TABLE case_event (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id uuid NOT NULL,
    event_type text NOT NULL,
    attributes jsonb NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

Example JSON:

{
  "channel": "email",
  "risk": "high",
  "actor": {
    "type": "officer",
    "id": "u123"
  },
  "tags": ["fraud", "priority"]
}

Containment query:

SELECT *
FROM case_event
WHERE attributes @> '{"risk":"high"}'::jsonb;

GIN index:

CREATE INDEX idx_case_event_attributes_gin
ON case_event USING gin (attributes);

This supports JSONB containment-style queries.

However, this is broad. It indexes many keys/values and can grow large.

6. JSONB Operator Class: `jsonb_ops` vs `jsonb_path_ops`

For JSONB GIN, two common operator classes matter:

CREATE INDEX idx_event_attr_ops
ON case_event USING gin (attributes jsonb_ops);

CREATE INDEX idx_event_attr_path_ops
ON case_event USING gin (attributes jsonb_path_ops);

High-level trade-off:

Operator Class	Strength	Trade-off
`jsonb_ops`	broad operator support	larger index
`jsonb_path_ops`	efficient containment-focused indexing	narrower operator support

Use jsonb_path_ops when your dominant query is containment:

attributes @> '{"risk":"high"}'::jsonb

Use default jsonb_ops when you need broader JSONB operator coverage.

Do not choose blindly. Choose based on actual operators.

7. JSONB: GIN vs Expression B-tree vs Generated Column

Not every JSONB query needs a GIN index.

Query:

SELECT *
FROM case_event
WHERE attributes ->> 'externalReference' = ?;

This is a scalar equality lookup on one extracted value.

Option 1: expression B-tree index:

CREATE INDEX idx_case_event_external_ref
ON case_event ((attributes ->> 'externalReference'));

Option 2: generated column:

ALTER TABLE case_event
ADD COLUMN external_reference text
GENERATED ALWAYS AS (attributes ->> 'externalReference') STORED;

CREATE INDEX idx_case_event_external_reference
ON case_event (external_reference);

Option 3: broad JSONB GIN:

CREATE INDEX idx_case_event_attributes_gin
ON case_event USING gin (attributes);

Decision:

Need	Better Choice
Search many arbitrary JSON keys	GIN
Search one stable scalar field	Expression B-tree or generated column
Field becomes domain-important	Generated column
Need relational constraint on extracted value	Generated column/domain schema

A mature PostgreSQL system tends to move stable JSON attributes toward relational/generator boundaries.

8. JSONB Indexing Failure Modes

8.1 Over-indexing arbitrary JSON

A broad GIN index on a huge JSONB column can be large and write-heavy.

Bad default:

CREATE INDEX idx_every_payload_gin
ON audit_log USING gin (payload);

If the audit log is append-heavy and rarely queried by arbitrary JSON containment, this may be pure overhead.

8.2 Query operator mismatch

Index built for containment:

attributes @> '{"risk":"high"}'

Application emits extraction comparison:

attributes ->> 'risk' = 'high'

Those are different operator shapes. The planner may choose different index strategies.

8.3 Unbounded JSON schema drift

If each tenant stores different keys, a GIN index may become large while still not serving stable business queries well.

8.4 Type ambiguity

JSON string and number are different:

{"amount": "100"}

is not the same as:

{"amount": 100}

Your Java serialization contract matters.

9. GIN for Arrays

PostgreSQL arrays can be indexed with GIN for containment/overlap queries.

Example:

CREATE TABLE document (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id uuid NOT NULL,
    title text NOT NULL,
    tags text[] NOT NULL DEFAULT '{}'
);

CREATE INDEX idx_document_tags_gin
ON document USING gin (tags);

Queries:

-- contains tag
SELECT *
FROM document
WHERE tags @> ARRAY['fraud'];

-- overlaps any tag
SELECT *
FROM document
WHERE tags && ARRAY['fraud', 'priority'];

But arrays should not replace relational modeling when elements have lifecycle, ownership, or metadata.

Bad use:

case_file.assigned_user_ids uuid[]

If assignments have time, actor, role, and audit semantics, use a child table.

Good use:

small immutable labels
simple feature flags
search tags without metadata

10. GIN Pending List and Write Behavior

GIN indexes can use a pending list to optimize writes. Inserts accumulate and later get merged into the main index structure.

Operational implication:

reads may occasionally pay cleanup cost;
bulk writes can create pending-list pressure;
autovacuum and maintenance settings matter;
sudden latency spikes can appear if cleanup occurs on user queries.

For write-heavy GIN workloads, monitor:

SELECT *
FROM pg_stat_user_indexes
WHERE indexrelname = 'idx_case_event_attributes_gin';

and index size:

SELECT
    relname,
    pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname LIKE 'idx_case_event%';

Practical rule:

GIN is powerful for search-like reads, but it should be justified on write-heavy tables.

11. Full-Text Search

Full-text search is for linguistic token search, not substring matching.

Example table:

CREATE TABLE article (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id uuid NOT NULL,
    title text NOT NULL,
    body text NOT NULL,
    search_vector tsvector GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(body, '')), 'B')
    ) STORED
);

Index:

CREATE INDEX idx_article_search_vector
ON article USING gin (search_vector);

Query:

SELECT id, title
FROM article
WHERE search_vector @@ plainto_tsquery('english', ?)
ORDER BY ts_rank(search_vector, plainto_tsquery('english', ?)) DESC
LIMIT 20;

Full-text search is good when users search for words/concepts:

"payment dispute"
"regulatory breach"
"late filing"

It is not ideal for arbitrary substring:

"abc" inside "XX-abc-123"

That is trigram territory.

12. Trigram Search with `pg_trgm`

Trigram indexing helps similarity and substring search.

Enable extension:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

Index:

CREATE INDEX idx_customer_name_trgm
ON customer USING gin (name gin_trgm_ops);

Substring query:

SELECT *
FROM customer
WHERE name ILIKE '%' || ? || '%'
LIMIT 50;

Similarity query:

SELECT *
FROM customer
WHERE name % ?
ORDER BY similarity(name, ?) DESC
LIMIT 20;

Trigram is useful for:

user-facing fuzzy search;
partial identifiers;
names;
email fragments;
case references;
external IDs with inconsistent formatting.

But it is not a substitute for normalized exact lookup.

If you need exact lookup:

WHERE reference_no = ?

use B-tree.

If you need fuzzy search:

WHERE reference_no ILIKE '%ABC%'

consider trigram.

13. Java API Semantics: Search Must Be Explicit

Do not expose one vague endpoint:

GET /cases?search=abc

without defining semantics.

The database index depends on the meaning of search.

Possible meanings:

API Meaning	Query	Index
exact reference	`reference_no = ?`	B-tree
prefix reference	`reference_no LIKE 'ABC%'`	B-tree pattern ops
substring reference	`reference_no ILIKE '%ABC%'`	Trigram
full-text notes	`search_vector @@ query`	GIN full-text
JSON attribute	`attributes @> ...`	GIN JSONB

A top-tier engineer forces this distinction into API design.

Bad API contract:

search means whatever product wants this month

Good API contract:

referenceExact
referencePrefix
freeText
attributeFilter

Indexing starts at the product/API semantic boundary.

14. GiST: Generalized Search Tree

GiST is a framework for building balanced tree indexes over many kinds of data and predicates.

Mental model:

GiST stores bounding/summary information that can quickly eliminate impossible rows, then rechecks candidates.

It is common for:

range types;
exclusion constraints;
geometric/spatial indexing;
nearest-neighbor search for supported operator classes;
certain full-text scenarios;
PostGIS indexes.

Example with range type:

CREATE TABLE room_booking (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    room_id uuid NOT NULL,
    during tstzrange NOT NULL
);

CREATE INDEX idx_room_booking_during_gist
ON room_booking USING gist (during);

Overlap query:

SELECT *
FROM room_booking
WHERE room_id = ?
  AND during && tstzrange(?, ?);

The range overlap operator && is not a simple scalar equality question. GiST is a natural fit.

15. GiST and Exclusion Constraints

Exclusion constraints can enforce rules like "no overlapping active periods".

Example: no two bookings for the same room may overlap.

CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE room_booking (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    room_id uuid NOT NULL,
    during tstzrange NOT NULL,
    EXCLUDE USING gist (
        room_id WITH =,
        during WITH &&
    )
);

This is more robust than checking in Java:

SELECT count(*)
FROM room_booking
WHERE room_id = ?
  AND during && ?;

then inserting.

The check-then-insert pattern races under concurrency unless protected by proper locks or constraints.

Exclusion constraint turns the invariant into database-enforced concurrency control.

16. Range Indexing for Temporal Systems

PostgreSQL range types are powerful for temporal modeling.

Example:

CREATE TABLE policy_version (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    policy_id uuid NOT NULL,
    valid_during tstzrange NOT NULL,
    status text NOT NULL
);

CREATE EXTENSION IF NOT EXISTS btree_gist;

ALTER TABLE policy_version
ADD CONSTRAINT ex_policy_no_overlap
EXCLUDE USING gist (
    policy_id WITH =,
    valid_during WITH &&
)
WHERE (status = 'ACTIVE');

Invariant:

A policy cannot have overlapping active validity periods.

This is relevant to regulatory systems, pricing engines, entitlement systems, and workflow rules.

Application implication:

Java should model validity as an interval, not just two loose timestamps;
service code should handle exclusion violation as a domain conflict;
tests should include concurrent insert attempts.

17. GiST Recheck Behavior

GiST often returns candidates that require recheck.

In EXPLAIN, you may see:

Index Cond: (during && tstzrange(...))
Rows Removed by Index Recheck: ...

This is not automatically bad.

GiST may use bounding approximations. It eliminates impossible rows quickly, then verifies exact conditions.

The key is the ratio:

few rechecks: good selectivity;
many rechecks: index is too lossy for the query/data shape;
high buffer reads: check physical layout and selectivity;
bad estimates: check statistics and operator class assumptions.

18. SP-GiST: Space-Partitioned GiST

SP-GiST supports partitioned search structures. It is useful when the data can be partitioned into non-overlapping regions or prefixes depending on operator class.

Mental model:

GiST: bounding hierarchy
SP-GiST: partitioned search space

Common use cases can include:

certain geometric data;
network address types;
text prefix/patricia-trie-like operator classes;
point data where partitioning is effective;
ranges for some operator classes.

A simplified conceptual diagram:

SP-GiST is less commonly the first index a Java backend engineer reaches for, but it matters in specialized workloads.

Decision rule:

Use SP-GiST when the data type/operator class documentation and workload indicate partitioned search is better than B-tree/GiST/GIN.

Do not use it because it sounds advanced.

19. BRIN: Block Range Index

BRIN stands for Block Range Index.

Mental model:

Instead of indexing every row, BRIN stores summaries for ranges of heap pages.

For each block range, it may store min/max or other summary data.

BRIN is tiny compared to B-tree and excellent when:

table is very large;
data is naturally correlated with physical insertion order;
queries scan time ranges or ordered ranges;
exact row lookup is not the goal;
append-only or append-mostly workload.

Common table:

CREATE TABLE audit_event (
    id uuid DEFAULT gen_random_uuid(),
    tenant_id uuid NOT NULL,
    event_type text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    payload jsonb NOT NULL
);

BRIN index:

CREATE INDEX idx_audit_event_created_brin
ON audit_event USING brin (created_at);

Query:

SELECT count(*)
FROM audit_event
WHERE created_at >= now() - interval '1 day';

If rows are inserted roughly in created_at order, BRIN can skip many irrelevant block ranges.

20. BRIN vs B-tree for Time-Series-Like Tables

For a 2-billion-row append-only audit table:

CREATE INDEX idx_audit_created_btree
ON audit_event (created_at);

may be large and expensive.

BRIN:

CREATE INDEX idx_audit_created_brin
ON audit_event USING brin (created_at);

is much smaller.

Trade-off:

Need	Better Fit
Exact lookup by timestamp and small result	B-tree may be better
Large time range scan	BRIN often good
Append-only huge table	BRIN often good
Randomly inserted timestamps	BRIN weaker
Ordered pagination by tenant/time	B-tree composite likely better
Retention by partition/time	Partitioning + BRIN or local B-tree

BRIN is not a replacement for all time indexes. It is a block-skipping strategy.

21. BRIN Tuning: `pages_per_range`

BRIN summarizes ranges of pages. The range size affects precision.

Example:

CREATE INDEX idx_audit_created_brin
ON audit_event USING brin (created_at)
WITH (pages_per_range = 64);

Smaller pages_per_range:

more precise;
larger index;
more summary entries.

Larger pages_per_range:

smaller index;
less precise;
more false positives/rechecks.

Use workload measurement.

22. Hash Indexes

Hash indexes support equality comparisons.

CREATE INDEX idx_session_token_hash
ON session USING hash (token);

But B-tree also supports equality and is usually more versatile.

B-tree can support:

equality;
ordering;
range;
uniqueness;
merge strategies;
composite access patterns.

Hash index is rarely the default choice for application engineering.

Consider Hash only when:

equality-only workload is proven;
B-tree size/performance is not sufficient;
ordering/range will never be needed;
operational team accepts the narrower semantics;
benchmark shows benefit under realistic workload.

Practical rule:

If you cannot clearly explain why B-tree is inadequate, use B-tree.

23. Combining Specialized and B-tree Indexes

Specialized indexes often need B-tree companions.

Example:

SELECT id, created_at
FROM case_event
WHERE tenant_id = ?
  AND attributes @> '{"risk":"high"}'::jsonb
ORDER BY created_at DESC
LIMIT 50;

Indexes:

CREATE INDEX idx_case_event_attr_gin
ON case_event USING gin (attributes jsonb_path_ops);

CREATE INDEX idx_case_event_tenant_created
ON case_event (tenant_id, created_at DESC);

PostgreSQL may use one, the other, or combine paths depending on selectivity.

But maybe a generated column is better:

ALTER TABLE case_event
ADD COLUMN risk text
GENERATED ALWAYS AS (attributes ->> 'risk') STORED;

CREATE INDEX idx_case_event_tenant_risk_created
ON case_event (tenant_id, risk, created_at DESC);

If risk is a first-class query dimension, this is often cleaner than relying on broad JSONB GIN.

24. Multi-Column Specialized Indexes

Some specialized index methods support multi-column indexes, but do not assume B-tree-like behavior.

Example:

CREATE INDEX idx_booking_room_during_gist
ON room_booking USING gist (room_id, during);

This may require btree_gist for scalar equality combined with range overlap.

The semantics come from operator classes, not just columns.

For mixed access patterns, sometimes two indexes are clearer:

CREATE INDEX idx_booking_room_id
ON room_booking (room_id);

CREATE INDEX idx_booking_during_gist
ON room_booking USING gist (during);

Sometimes one combined GiST index is correct. Validate with actual query plans and constraints.

25. Specialized Indexes and Recheck

You may see:

Recheck Cond: ...
Rows Removed by Index Recheck: ...

This is common for lossy index strategies.

Mental model:

Specialized indexes often answer:

Which rows might match?

Then PostgreSQL verifies:

Which rows actually match?

This is not bad by itself. The issue is whether the candidate set is too large.

26. Index Type Decision Matrix

Data Shape	Query	Recommended Start
scalar id	`id = ?`	B-tree
tenant list by time	tenant + status + order	composite B-tree or partial B-tree
JSON containment	`payload @> ...`	GIN JSONB
JSON scalar equality	`payload ->> 'x' = ?`	expression B-tree or generated column
text word search	`@@ tsquery`	GIN on `tsvector`
text substring	`ILIKE '%x%'`	`pg_trgm` GIN/GiST
array contains	`tags @> ARRAY[...]`	GIN
range overlap	`during && range`	GiST/SP-GiST depending type
no overlapping intervals	exclusion constraint	GiST + `btree_gist`
huge append-only time filtering	`created_at BETWEEN ...`	BRIN, often with partitioning
equality-only hashable value	`token = ?`	B-tree first, Hash only after proof

27. Case Study: Regulatory Case Search

Suppose product asks:

Users need to search enforcement cases by reference number, party name, notes, risk flags, and assigned officer.

Do not create one giant generic search index immediately.

Decompose search semantics.

27.1 Exact reference lookup

WHERE tenant_id = ?
  AND reference_no = ?

Index:

CREATE UNIQUE INDEX ux_case_tenant_reference
ON enforcement_case (tenant_id, reference_no);

27.2 Prefix reference search

WHERE tenant_id = ?
  AND reference_no LIKE ? || '%'
ORDER BY reference_no
LIMIT 20;

Index:

CREATE INDEX idx_case_tenant_reference_pattern
ON enforcement_case (tenant_id, reference_no text_pattern_ops);

27.3 Party name fuzzy search

WHERE party_name ILIKE '%' || ? || '%'

Index:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX idx_case_party_name_trgm
ON enforcement_case USING gin (party_name gin_trgm_ops);

If tenant boundary is required, test whether adding a separate tenant B-tree or generated search table is better.

27.4 Notes full-text search

WHERE notes_vector @@ plainto_tsquery('english', ?)

Index:

CREATE INDEX idx_case_notes_vector_gin
ON enforcement_case USING gin (notes_vector);

27.5 JSON risk flags

WHERE attributes @> '{"risk":"high"}'::jsonb

Index:

CREATE INDEX idx_case_attributes_gin
ON enforcement_case USING gin (attributes jsonb_path_ops);

Or promote risk to generated column:

ALTER TABLE enforcement_case
ADD COLUMN risk text
GENERATED ALWAYS AS (attributes ->> 'risk') STORED;

CREATE INDEX idx_case_tenant_risk_status
ON enforcement_case (tenant_id, risk, status);

A top-tier solution may use multiple precise indexes, not one universal index.

28. Case Study: Outbox Table

Outbox table:

CREATE TABLE outbox_event (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate_type text NOT NULL,
    aggregate_id uuid NOT NULL,
    event_type text NOT NULL,
    payload jsonb NOT NULL,
    status text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    published_at timestamptz
);

Worker query:

SELECT id, payload
FROM outbox_event
WHERE status = 'READY'
ORDER BY created_at ASC
LIMIT 100
FOR UPDATE SKIP LOCKED;

Best initial index is not GIN on payload.

CREATE INDEX idx_outbox_ready_created
ON outbox_event (created_at ASC, id ASC)
WHERE status = 'READY';

But if support needs to search payload for a rare incident:

SELECT *
FROM outbox_event
WHERE payload @> '{"customerTier":"VIP"}'::jsonb;

Do not automatically add a GIN index to the hot outbox table. Consider:

incident frequency;
table retention;
replica-only index;
archive/search table;
generated column for stable fields;
external search system if semantics grow.

29. Case Study: Audit Log

Audit logs are often append-heavy and huge.

CREATE TABLE audit_log (
    id bigserial PRIMARY KEY,
    tenant_id uuid NOT NULL,
    actor_id uuid,
    action text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    payload jsonb NOT NULL
);

Common queries:

Recent tenant audit

WHERE tenant_id = ?
ORDER BY created_at DESC
LIMIT 100

Index:

CREATE INDEX idx_audit_tenant_created
ON audit_log (tenant_id, created_at DESC, id DESC);

Large time-range reporting

WHERE created_at >= ? AND created_at < ?

BRIN:

CREATE INDEX idx_audit_created_brin
ON audit_log USING brin (created_at);

Rare JSON payload investigation

WHERE payload @> '{"ip":"1.2.3.4"}'::jsonb

Decision depends on frequency. A broad GIN may be too expensive. Options:

add GIN only on archive partition;
route logs to search/analytics system;
extract stable fields into columns;
create temporary incident-specific index and drop later;
use a read replica for forensic indexing.

30. EXPLAIN Patterns for Specialized Indexes

30.1 GIN JSONB

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM case_event
WHERE attributes @> '{"risk":"high"}'::jsonb;

Look for:

Bitmap Index Scan on idx_case_event_attributes_gin
Bitmap Heap Scan on case_event
Recheck Cond: (attributes @> ...)

30.2 Trigram

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM customer
WHERE name ILIKE '%andi%';

Look for:

Bitmap Index Scan on idx_customer_name_trgm

30.3 BRIN

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM audit_log
WHERE created_at >= now() - interval '1 day';

Look for:

Bitmap Index Scan on idx_audit_created_brin
Rows Removed by Index Recheck

Some recheck is expected. Too much recheck means the BRIN summary is too coarse or physical correlation is poor.

31. Operational Signals

For specialized indexes, monitor:

SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC;

Size:

SELECT
    c.relname,
    pg_size_pretty(pg_relation_size(c.oid)) AS relation_size
FROM pg_class c
WHERE c.relkind IN ('i', 'r')
ORDER BY pg_relation_size(c.oid) DESC
LIMIT 20;

Table/index relation:

SELECT
    t.relname AS table_name,
    pg_size_pretty(pg_relation_size(t.oid)) AS table_size,
    pg_size_pretty(pg_indexes_size(t.oid)) AS indexes_size
FROM pg_class t
WHERE t.relkind = 'r'
ORDER BY pg_indexes_size(t.oid) DESC
LIMIT 20;

Watch for:

GIN index size explosion;
high write latency after adding GIN/trigram;
BRIN returning too many false positives;
GiST recheck ratio too high;
unused specialized indexes;
duplicate broad and narrow indexes;
index creation causing WAL/replication lag.

32. Specialized Index Creation in Production

Use concurrent creation for large production tables:

CREATE INDEX CONCURRENTLY idx_customer_name_trgm
ON customer USING gin (name gin_trgm_ops);

But remember:

concurrent creation still consumes resources;
GIN/GiST/BRIN index builds can create significant I/O;
replicas must replay WAL;
index build may hurt cache locality;
failed concurrent builds can leave invalid indexes;
extension creation may require elevated privileges;
managed PostgreSQL may restrict some extensions.

Migration plan should include:

1. create extension if approved
2. create index concurrently
3. analyze table
4. verify plan
5. monitor latency, WAL, replica lag
6. remove old/redundant index later

33. Extension Governance

Some specialized indexes depend on extensions.

Examples:

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE EXTENSION IF NOT EXISTS btree_gist;

Treat extensions as platform dependencies.

Review:

availability in managed PostgreSQL;
version compatibility;
backup/restore behavior;
security approval;
migration order;
local dev/test parity;
rollback plan.

Do not let an application migration casually introduce extensions without platform review.

34. Specialized Index Anti-Patterns

34.1 GIN on every JSONB column

Bad:

CREATE INDEX idx_payload_gin ON event USING gin (payload);

without known queries.

34.2 Full-text search for identifiers

Full-text tokenization may transform input. Exact identifiers need exact or trigram/prefix search, not linguistic search.

34.3 Trigram for exact lookup

If you always query email = ?, use B-tree, not trigram.

34.4 BRIN on uncorrelated data

If physical order does not correlate with indexed column, BRIN may scan too many blocks.

34.5 Hash index because "hash is O(1)"

Database performance is not textbook hash-table complexity. B-tree is usually more versatile and competitive.

34.6 Ignoring recheck cost

Lossy indexes can produce large candidate sets. Always measure.

34.7 Adding specialized indexes to hot write tables without write testing

GIN/trigram indexes can materially affect insert/update latency.

35. Hands-On Lab

35.1 JSONB GIN

DROP TABLE IF EXISTS case_event;

CREATE TABLE case_event (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    tenant_id uuid NOT NULL,
    attributes jsonb NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

INSERT INTO case_event (tenant_id, attributes, created_at)
SELECT
    ('00000000-0000-0000-0000-' || lpad(((g % 100) + 1)::text, 12, '0'))::uuid,
    jsonb_build_object(
        'risk', CASE WHEN g % 10 = 0 THEN 'high' ELSE 'normal' END,
        'channel', CASE WHEN g % 3 = 0 THEN 'email' ELSE 'portal' END,
        'tags', jsonb_build_array('case', CASE WHEN g % 10 = 0 THEN 'priority' ELSE 'standard' END)
    ),
    now() - ((g % 365) || ' days')::interval
FROM generate_series(1, 500000) AS g;

ANALYZE case_event;

Baseline:

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM case_event
WHERE attributes @> '{"risk":"high"}'::jsonb;

Add index:

CREATE INDEX idx_case_event_attributes_path_gin
ON case_event USING gin (attributes jsonb_path_ops);

ANALYZE case_event;

Re-test:

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM case_event
WHERE attributes @> '{"risk":"high"}'::jsonb;

35.2 Expression B-tree alternative

CREATE INDEX idx_case_event_risk_expr
ON case_event ((attributes ->> 'risk'));

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM case_event
WHERE attributes ->> 'risk' = 'high';

Compare:

GIN containment query;
expression B-tree extraction query;
index size;
plan shape.

35.3 Trigram

CREATE EXTENSION IF NOT EXISTS pg_trgm;

DROP TABLE IF EXISTS customer_search;

CREATE TABLE customer_search (
    id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    name text NOT NULL
);

INSERT INTO customer_search (name)
SELECT 'Customer ' || g || ' Andika ' || md5(g::text)
FROM generate_series(1, 300000) AS g;

ANALYZE customer_search;

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM customer_search
WHERE name ILIKE '%andika%'
LIMIT 20;

CREATE INDEX idx_customer_search_name_trgm
ON customer_search USING gin (name gin_trgm_ops);

ANALYZE customer_search;

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM customer_search
WHERE name ILIKE '%andika%'
LIMIT 20;

35.4 BRIN

DROP TABLE IF EXISTS audit_log_lab;

CREATE TABLE audit_log_lab (
    id bigserial PRIMARY KEY,
    created_at timestamptz NOT NULL,
    payload jsonb NOT NULL
);

INSERT INTO audit_log_lab (created_at, payload)
SELECT
    timestamp '2025-01-01' + (g || ' seconds')::interval,
    jsonb_build_object('n', g)
FROM generate_series(1, 1000000) AS g;

ANALYZE audit_log_lab;

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM audit_log_lab
WHERE created_at >= timestamp '2025-01-10'
  AND created_at < timestamp '2025-01-11';

CREATE INDEX idx_audit_log_lab_created_brin
ON audit_log_lab USING brin (created_at);

ANALYZE audit_log_lab;

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM audit_log_lab
WHERE created_at >= timestamp '2025-01-10'
  AND created_at < timestamp '2025-01-11';

36. Self-Correction Drills

Drill 1: JSONB containment or scalar extraction?

Query:

WHERE payload ->> 'customerId' = ?

Should you start with GIN JSONB or expression/generated-column B-tree?

Explain.

Drill 2: Notes search

Users search free-form notes by words like "late filing penalty".

Should you use trigram or full-text search?

What if they search arbitrary fragments like "FIL-2026"?

Drill 3: Audit table

A 4-billion-row append-only audit table is queried by monthly date ranges. The table is physically ordered by insertion time.

Should you start with B-tree or BRIN?

What would make BRIN fail?

Drill 4: No overlapping validity periods

You need to enforce no overlapping active policy versions.

Should this be Java check-then-insert, trigger, or exclusion constraint?

Explain concurrency risk.

Drill 5: Hot outbox payload search

A hot outbox table gets 2,000 inserts/sec. Support wants occasional JSON search on payload.

Should you add broad GIN immediately?

List safer alternatives.

37. Production Readiness Checklist

Before approving a specialized index:

38. Key Takeaways

Specialized index choice starts from predicate semantics, not from index popularity.
GIN is an inverted index useful for JSONB, arrays, full-text, and trigram search, but it can be write-expensive.
JSONB scalar equality often belongs to expression B-tree or generated columns, not broad GIN.
Full-text search and trigram search solve different problems.
GiST is powerful for ranges, overlap, spatial-style predicates, and exclusion constraints.
SP-GiST is useful for partitioned search-space operator classes, but should be chosen based on data/operator fit.
BRIN is excellent for huge naturally ordered tables and range scans, but it is a block-skipping strategy, not exact row lookup.
Hash indexes are equality-only and rarely the first choice because B-tree is more versatile.
Recheck is normal for many specialized indexes; the question is whether the candidate set is acceptably small.
Java API design must expose search semantics clearly, or the database will be forced into vague and inefficient indexing.

In Part 015, we will step back from individual index types and build an index selection strategy: index budget, write amplification, redundant indexes, governance, anti-patterns, and production review discipline.

Lesson Recap

You just completed lesson 14 in build core. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Back To Series Next Lesson

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.

Previous Lesson

Lesson 13

Learn Postgresql In Action Part 013 Btree Index Deep Dive

Next Lesson

Lesson 15

Index Selection Strategy and Anti-Patterns

Learn Postgresql In Action Part 014 Indexing Specialized Indexes

Part 014 — Specialized Indexes: GIN, GiST, SP-GiST, BRIN, Hash

1. Kaufman Skill Target

2. First Principle: Index Type Follows Predicate Semantics

3. PostgreSQL Index Families at a Glance

4. GIN: Generalized Inverted Index

5. GIN for JSONB

6. JSONB Operator Class: jsonb_ops vs jsonb_path_ops

7. JSONB: GIN vs Expression B-tree vs Generated Column

8. JSONB Indexing Failure Modes

8.1 Over-indexing arbitrary JSON

8.2 Query operator mismatch

8.3 Unbounded JSON schema drift

8.4 Type ambiguity

9. GIN for Arrays

10. GIN Pending List and Write Behavior

11. Full-Text Search

12. Trigram Search with pg_trgm

13. Java API Semantics: Search Must Be Explicit

14. GiST: Generalized Search Tree

15. GiST and Exclusion Constraints

16. Range Indexing for Temporal Systems

17. GiST Recheck Behavior

18. SP-GiST: Space-Partitioned GiST

19. BRIN: Block Range Index

20. BRIN vs B-tree for Time-Series-Like Tables

21. BRIN Tuning: pages_per_range

22. Hash Indexes

23. Combining Specialized and B-tree Indexes

24. Multi-Column Specialized Indexes

25. Specialized Indexes and Recheck

26. Index Type Decision Matrix

27. Case Study: Regulatory Case Search

27.1 Exact reference lookup

27.2 Prefix reference search

27.3 Party name fuzzy search

27.4 Notes full-text search

27.5 JSON risk flags

28. Case Study: Outbox Table

29. Case Study: Audit Log

Recent tenant audit

Large time-range reporting

Rare JSON payload investigation

30. EXPLAIN Patterns for Specialized Indexes

30.1 GIN JSONB

30.2 Trigram

30.3 BRIN

31. Operational Signals

32. Specialized Index Creation in Production

33. Extension Governance

34. Specialized Index Anti-Patterns

34.1 GIN on every JSONB column

34.2 Full-text search for identifiers

34.3 Trigram for exact lookup

34.4 BRIN on uncorrelated data

34.5 Hash index because "hash is O(1)"

34.6 Ignoring recheck cost

34.7 Adding specialized indexes to hot write tables without write testing

35. Hands-On Lab

35.1 JSONB GIN

35.2 Expression B-tree alternative

35.3 Trigram

35.4 BRIN

36. Self-Correction Drills

Drill 1: JSONB containment or scalar extraction?

Drill 2: Notes search

Drill 3: Audit table

Drill 4: No overlapping validity periods

Drill 5: Hot outbox payload search

37. Production Readiness Checklist

38. Key Takeaways

6. JSONB Operator Class: `jsonb_ops` vs `jsonb_path_ops`

12. Trigram Search with `pg_trgm`

21. BRIN Tuning: `pages_per_range`