Build CoreOrdered learning track

Query Shape Engineering

Learn PostgreSQL in Action - Part 016

Query shape engineering for PostgreSQL: sargability, predicate design, CTE materialization, EXISTS, IN, LATERAL, pagination, batching, optional filters, JSONB predicates, and Java/Hibernate query patterns that help or hurt the planner.

[2026-07-01]12 min read2380 words

In This Lesson

1. What Problem This Solves 2. Mental Model: SQL as a Physical Request 3. The Planner Can Only Use What the Query Exposes

PrevNext

Lesson 1635 lesson track07–19 Build Core

#postgresql#database#sql#query-planner+4 more

Part 016 — Query Shape Engineering

A slow PostgreSQL query is not always missing an index. Often the index exists, the data model is acceptable, and the problem is the SQL shape.

The planner optimizes the statement it receives, not the repository method name, not the domain intent, and not the diagram in the architecture document.

This part focuses on engineering SQL shapes that allow PostgreSQL to use good plans predictably.

1. What Problem This Solves

A query can be logically correct but physically hostile.

Examples:

WHERE lower(email) = lower($1)

may be fine if an expression index exists, but bad otherwise.

WHERE ($1 IS NULL OR owner_id = $1)

is convenient for dynamic filters, but can make selectivity hard and plan choice unstable.

ORDER BY created_at DESC
OFFSET 500000
LIMIT 50

is syntactically simple, but operationally expensive.

WITH x AS (...)
SELECT ... FROM x JOIN x ...

may materialize or not depending on the shape, references, and explicit MATERIALIZED / NOT MATERIALIZED choice.

Query shape engineering is the skill of writing SQL so the planner can see the useful constraints, indexes, ordering, and cardinality boundaries.

2. Mental Model: SQL as a Physical Request

Application developers often think of SQL as a declarative description of desired rows.

For performance work, think of SQL as a physical request with these questions:

1. Which rows are eligible?
2. Can eligibility be tested through an index?
3. How many rows will probably pass?
4. Does the query require order?
5. Can the order come from an index?
6. Can the query stop early?
7. Does the query force materialization, sorting, hashing, or repeated work?
8. Does the query force per-row correlated execution?
9. Does the query return more columns than needed?
10. Does the query shape remain stable across bind values?

Good SQL gives the planner useful structure.

Bad SQL hides structure behind functions, casts, OR conditions, dynamic filters, unbounded sorting, wide projection, or ORM-generated accidental joins.

3. The Planner Can Only Use What the Query Exposes

The following two queries can be logically similar but physically different.

Bad for a normal index on created_at:

SELECT *
FROM case_event
WHERE date(created_at) = date '2026-07-01';

Better:

SELECT *
FROM case_event
WHERE created_at >= timestamptz '2026-07-01 00:00:00+07'
  AND created_at <  timestamptz '2026-07-02 00:00:00+07';

The second shape lets PostgreSQL use a B-tree range scan on created_at.

Core principle:

Keep indexed columns visible and unwrapped on the side PostgreSQL needs to navigate.

4. Sargability in PostgreSQL Terms

“Sargable” means the predicate can use an index search argument effectively.

PostgreSQL does not use the term as centrally as some systems, but the idea applies.

4.1 Function-wrapped column

Bad unless expression index exists:

WHERE lower(email) = lower($1)

Better with normalized column:

WHERE email_norm = lower($1)

or with expression index:

CREATE INDEX idx_user_lower_email
ON app_user (lower(email));

4.2 Cast on column

Bad:

WHERE order_id::text = $1

Better:

WHERE order_id = $1::uuid

Cast the parameter, not the indexed column.

4.3 Arithmetic on column

Bad:

WHERE amount_cents / 100 > 1000

Better:

WHERE amount_cents > 100000

4.4 Date extraction

Bad:

WHERE extract(year from created_at) = 2026

Better:

WHERE created_at >= timestamptz '2026-01-01 00:00:00+00'
  AND created_at <  timestamptz '2027-01-01 00:00:00+00'

4.5 Leading wildcard

Bad for normal B-tree:

WHERE name LIKE '%bank%'

Better options:

prefix search: name LIKE 'bank%' with suitable operator class/collation strategy;
trigram index for contains search;
full-text search for language-aware search;
external search engine when ranking/faceting/linguistics dominate.

5. Predicate Shape Patterns

5.1 Equality + range

Good:

WHERE tenant_id = $1
  AND customer_id = $2
  AND created_at >= $3
  AND created_at < $4
ORDER BY created_at DESC, id DESC
LIMIT 100;

Index:

CREATE INDEX idx_event_customer_created_desc
ON customer_event (tenant_id, customer_id, created_at DESC, id DESC);

5.2 Optional filters

Convenient but dangerous:

WHERE ($1::uuid IS NULL OR owner_id = $1)
  AND ($2::text IS NULL OR state = $2)

This single generic query represents several different physical access paths:

no owner, no state;
owner only;
state only;
owner and state.

A single plan may be bad for some combinations.

Better for hot paths: generate distinct SQL shapes.

-- shape 1: owner worklist
WHERE tenant_id = $1
  AND owner_id = $2
  AND closed_at IS NULL
ORDER BY due_at, id
LIMIT $3;

-- shape 2: tenant active worklist
WHERE tenant_id = $1
  AND closed_at IS NULL
ORDER BY due_at, id
LIMIT $2;

This lets each query have an index and stable plan.

5.3 Dynamic search endpoints

Do not let a single search endpoint generate arbitrary WHERE clauses against an OLTP table and then “solve it with indexes.”

Use tiers:

Search type	Recommended path
Primary lookup	Exact index-backed query
Worklist/filter screen	Bounded set of supported query shapes
Admin search	Separate read model or limited indexes
Analytics	Warehouse/materialized view/replica path
Fuzzy text search	Full-text/trigram/search service

6. `OR` Conditions and Rewrites

OR can be valid and PostgreSQL can sometimes combine indexes using bitmap OR. But OR often causes broader scans, lost ordering, or unstable plans.

Example:

SELECT id, created_at
FROM case_event
WHERE tenant_id = $1
  AND (case_id = $2 OR actor_id = $3)
ORDER BY created_at DESC
LIMIT 100;

Possible rewrite:

(
    SELECT id, created_at
    FROM case_event
    WHERE tenant_id = $1
      AND case_id = $2
    ORDER BY created_at DESC
    LIMIT 100
)
UNION ALL
(
    SELECT id, created_at
    FROM case_event
    WHERE tenant_id = $1
      AND actor_id = $3
    ORDER BY created_at DESC
    LIMIT 100
)
ORDER BY created_at DESC
LIMIT 100;

This can let each branch use a specific index.

But this rewrite changes duplicate behavior. If the same row can match both branches, use UNION or deduplicate explicitly:

SELECT DISTINCT ON (id) id, created_at
FROM (
    ...
) s
ORDER BY id, created_at DESC;

Then re-apply final ordering if needed.

Rule:

Rewrite OR only when you preserve semantics and improve access paths measurably.

7. `IN`, `ANY`, and Large Lists

Small list:

WHERE state IN ('OPEN', 'ESCALATED')

Usually fine.

Parameterized array:

WHERE state = ANY($1)

Also common, but watch partial-index implication and cardinality estimation.

Large list:

WHERE id IN (... thousands of ids ...)

Potential issues:

huge SQL text if literals are expanded;
poor plan choices depending on list size;
parameter limit or driver overhead;
unstable latency.

Better options:

pass array and use = ANY($1) for moderate lists;
load ids into a temporary table and join;
use a staging table for batch processing;
chunk intentionally at application boundary.

Temp table pattern:

CREATE TEMP TABLE tmp_target_id (
    id uuid PRIMARY KEY
) ON COMMIT DROP;

-- bulk insert ids via JDBC batch or COPY

SELECT c.*
FROM enforcement_case c
JOIN tmp_target_id t ON t.id = c.id;

For large batch operations, a temp table makes the data visible to the planner as a relation rather than a giant predicate blob.

8. `NOT IN`, `NULL`, and Anti-Joins

This can be semantically dangerous:

WHERE id NOT IN (SELECT case_id FROM blocked_case)

If the subquery can return NULL, SQL three-valued logic can produce surprising results.

Safer common pattern:

WHERE NOT EXISTS (
    SELECT 1
    FROM blocked_case b
    WHERE b.case_id = c.id
)

For PostgreSQL, NOT EXISTS often maps naturally to anti-join strategies.

Guideline:

use EXISTS / NOT EXISTS for existence semantics;
use joins when returning columns from both sides;
use IN for small clear membership lists;
be explicit about NULL semantics.

9. `EXISTS` vs `JOIN` for Existence

Bad if you only need existence:

SELECT c.*
FROM enforcement_case c
JOIN case_attachment a ON a.case_id = c.id
WHERE c.tenant_id = $1;

This can duplicate case rows if multiple attachments exist.

Better:

SELECT c.*
FROM enforcement_case c
WHERE c.tenant_id = $1
  AND EXISTS (
      SELECT 1
      FROM case_attachment a
      WHERE a.case_id = c.id
  );

This encodes intent: return cases for which at least one attachment exists.

Index:

CREATE INDEX idx_attachment_case_id
ON case_attachment (case_id);

10. `LATERAL` for Top-N Per Group

Problem: get the latest event for each active case.

Bad pattern:

SELECT c.id, e.id, e.created_at
FROM enforcement_case c
JOIN case_event e ON e.case_id = c.id
WHERE c.closed_at IS NULL
ORDER BY e.created_at DESC;

This can explode rows before filtering.

LATERAL pattern:

SELECT c.id AS case_id,
       latest_event.id AS event_id,
       latest_event.created_at
FROM enforcement_case c
LEFT JOIN LATERAL (
    SELECT e.id, e.created_at
    FROM case_event e
    WHERE e.case_id = c.id
    ORDER BY e.created_at DESC, e.id DESC
    LIMIT 1
) latest_event ON true
WHERE c.closed_at IS NULL
LIMIT 100;

Index:

CREATE INDEX idx_case_event_case_created_desc
ON case_event (case_id, created_at DESC, id DESC);

Mental model:

For each outer row, run a bounded index-backed inner lookup.

This works well when the outer row count is bounded. It can be bad if the outer relation is huge. Always check EXPLAIN (ANALYZE, BUFFERS).

11. CTE Materialization and Optimization Boundaries

CTEs are useful for readability, recursion, data-modifying statements, and intentional materialization. But they are not always free.

PostgreSQL can fold a side-effect-free non-recursive CTE into the parent query when referenced once. If referenced more than once, it may be materialized by default. PostgreSQL lets you force the choice using MATERIALIZED or NOT MATERIALIZED.

11.1 Good CTE for readability

WITH active_case AS (
    SELECT id, tenant_id, owner_id
    FROM enforcement_case
    WHERE tenant_id = $1
      AND closed_at IS NULL
)
SELECT *
FROM active_case
WHERE owner_id = $2;

If folded, this may behave like a normal query.

11.2 Dangerous repeated CTE

WITH active_case AS (
    SELECT *
    FROM enforcement_case
    WHERE tenant_id = $1
      AND closed_at IS NULL
)
SELECT *
FROM active_case a1
JOIN active_case a2 ON a2.parent_case_id = a1.id;

This may materialize a large intermediate result.

11.3 Force merge when appropriate

WITH active_case AS NOT MATERIALIZED (
    SELECT *
    FROM enforcement_case
    WHERE tenant_id = $1
      AND closed_at IS NULL
)
SELECT *
FROM active_case a1
JOIN active_case a2 ON a2.parent_case_id = a1.id;

11.4 Force materialization when appropriate

WITH expensive AS MATERIALIZED (
    SELECT id, expensive_stable_function(payload) AS score
    FROM enforcement_case
    WHERE tenant_id = $1
)
SELECT *
FROM expensive e1
JOIN expensive e2 ON e1.score = e2.score;

Rule:

Use CTEs intentionally: readability, recursion, data modification, or materialization control. Do not assume they are always optimized away.

12. Pagination: Offset vs Keyset

12.1 Offset pagination

SELECT id, created_at
FROM case_event
WHERE tenant_id = $1
ORDER BY created_at DESC, id DESC
OFFSET 500000
LIMIT 50;

Problem:

PostgreSQL still has to walk past many rows;
latency grows with page depth;
concurrent inserts/deletes can shift pages;
user experience often does not require arbitrary deep page jumps.

12.2 Keyset pagination

First page:

SELECT id, created_at
FROM case_event
WHERE tenant_id = $1
ORDER BY created_at DESC, id DESC
LIMIT 50;

SELECT id, created_at
FROM case_event
WHERE tenant_id = $1
  AND (created_at, id) < ($2, $3)
ORDER BY created_at DESC, id DESC
LIMIT 50;

Index:

CREATE INDEX idx_event_tenant_created_id_desc
ON case_event (tenant_id, created_at DESC, id DESC);

Keyset pagination is not always suitable. It does not support “jump to page 500” naturally. But for timelines, audit logs, work queues, notifications, and event streams, it is usually the right production shape.

13. `ORDER BY` Without Determinism

Bad:

ORDER BY created_at DESC
LIMIT 50;

If many rows share the same timestamp, order is not deterministic.

Better:

ORDER BY created_at DESC, id DESC
LIMIT 50;

This also supports keyset pagination.

Rule:

Every paginated query should have a deterministic ORDER BY with a stable tie-breaker.

14. Projection Shape: Stop Selecting Full Entities

Bad for list screens:

SELECT *
FROM enforcement_case
WHERE tenant_id = $1
  AND closed_at IS NULL
ORDER BY due_at
LIMIT 50;

Better:

SELECT id, external_ref, state, priority, due_at
FROM enforcement_case
WHERE tenant_id = $1
  AND closed_at IS NULL
ORDER BY due_at, id
LIMIT 50;

Why:

less heap data read;
less network transfer;
less Java object allocation;
easier covering index design;
fewer accidental TOAST reads;
less pressure on serialization.

In Hibernate, use DTO projection for list views:

public record CaseWorklistRow(
    UUID id,
    String externalRef,
    String state,
    Instant dueAt
) {}

Do not load full aggregate roots for simple worklist cards.

15. Counting Patterns

15.1 Exact count on huge filtered set

SELECT count(*)
FROM audit_event
WHERE tenant_id = $1;

This can be expensive because PostgreSQL must count visible rows matching the predicate.

Alternatives:

Need	Better approach
Display “many results”	Avoid exact count
Paginated UI	Fetch `limit + 1` to detect next page
Dashboard	Pre-aggregate/materialized view
Approximate admin insight	Use estimates carefully
Compliance report	Exact count in batch/reporting path

15.2 `exists` instead of count

Bad:

SELECT count(*) > 0
FROM case_attachment
WHERE case_id = $1;

Better:

SELECT EXISTS (
    SELECT 1
    FROM case_attachment
    WHERE case_id = $1
);

The latter can stop when the first matching row is found.

16. Update and Delete Shape

Bad unbounded update:

UPDATE case_event
SET archived = true
WHERE created_at < now() - interval '2 years';

This can create massive WAL, locks, bloat, replication lag, and autovacuum pressure.

Better batch shape:

WITH batch AS (
    SELECT id
    FROM case_event
    WHERE created_at < now() - interval '2 years'
      AND archived = false
    ORDER BY created_at, id
    LIMIT 5000
)
UPDATE case_event e
SET archived = true
FROM batch
WHERE e.id = batch.id;

Repeat from application or job runner with backoff and metrics.

For deletes:

WITH batch AS (
    SELECT id
    FROM case_event
    WHERE created_at < $1
    ORDER BY created_at, id
    LIMIT 5000
)
DELETE FROM case_event e
USING batch
WHERE e.id = batch.id;

But if retention is the dominant operation, partitioning may be better than batch deletes.

17. JSONB Query Shape

Bad if you need frequent filtering:

WHERE payload->>'region' = $1

Options:

17.1 Expression index

CREATE INDEX idx_case_payload_region
ON enforcement_case ((payload->>'region'));

Query:

WHERE payload->>'region' = $1

17.2 Generated column

ALTER TABLE enforcement_case
ADD COLUMN region text
GENERATED ALWAYS AS (payload->>'region') STORED;

CREATE INDEX idx_case_region
ON enforcement_case (tenant_id, region);

Query:

WHERE tenant_id = $1
  AND region = $2

17.3 GIN containment

CREATE INDEX idx_case_payload_gin
ON enforcement_case USING gin (payload jsonb_path_ops);

Query:

WHERE payload @> '{"region":"R1"}'::jsonb

Decision:

Shape	Use when
Expression index	One or two stable extracted fields
Generated column	Field has domain meaning and frequent queries
GIN containment	Flexible containment search over JSONB
Relational column	Field is core to filtering, joining, constraints, or lifecycle

Do not use JSONB to avoid schema design when the field is actually part of the query contract.

18. Prepared Statements, Generic Plans, and Skew

Java applications use prepared statements heavily. This is usually good.

But some queries are highly sensitive to parameter values.

Example:

WHERE tenant_id = $1

If tenant sizes are extremely skewed, the best plan for a tiny tenant may differ from the best plan for a huge tenant.

Another example:

WHERE state = $1

state = 'OPEN' may be rare while state = 'CLOSED' may be common.

Potential symptoms:

query fast for some tenants, slow for others;
plan changes after repeated executions;
generic plan chosen where custom plan would be better;
partial index not used with bind parameters;
stable staging plan but unstable production plan due to skew.

Mitigations:

use better composite indexes aligned with tenant and lifecycle;
avoid over-generic dynamic query shapes for hot paths;
separate rare-state and common-state queries when necessary;
inspect plans with representative parameter values;
understand pgJDBC server-side prepare behavior;
use EXPLAIN (ANALYZE, BUFFERS) on realistic data.

19. Hibernate/JPA Query Shape Failure Modes

19.1 N+1 is not only a query count problem

N+1 also destroys access-path locality:

1 query loads 100 cases.
100 queries load notes by case_id.

Even if each note query uses an index, total overhead can dominate.

Better options:

DTO projection;
batch fetching;
explicit query for needed child rows;
EXISTS for presence checks;
limited LATERAL query for top child row.

19.2 Fetch join with pagination

A fetch join over a collection plus pagination can produce unexpected SQL and duplicate rows.

Bad conceptual shape:

@Query("select c from Case c join fetch c.notes where c.tenantId = :tenantId")
Page<Case> findCases(...);

Better:

page parent ids first;
fetch details for those ids;
use DTO read model for list screens.

19.3 Case-insensitive search without normalized contract

Bad:

where lower(u.email) = lower(:email)

Better:

normalized column;
expression index;
PostgreSQL citext if its semantics fit;
clear application-level normalization policy.

19.4 Optional filters in one repository query

Bad:

@Query("""
select c from Case c
where c.tenantId = :tenantId
  and (:ownerId is null or c.ownerId = :ownerId)
  and (:state is null or c.state = :state)
""")

Better for hot screens:

define supported query shapes;
generate different SQL for each shape;
keep indexes aligned with those shapes;
route ad hoc search elsewhere.

20. Query Shape Review Template

For each important query, record:

Name:
Business path:
Frequency:
Latency target:
Rows expected:
Worst-case tenant/cardinality:
SQL text:
Parameters:
Required ordering:
Pagination strategy:
Indexes expected:
Expected plan nodes:
Failure modes:
Fallback:
Owner:

Example:

Name: active-owner-worklist
Business path: officer opens work queue
Frequency: high
Latency target: p95 < 100 ms
Rows expected: 50 returned, < 500 scanned
Worst-case tenant/cardinality: tenant 7 has 4M cases, 120k active
SQL text: tenant_id + owner_id + closed_at is null order by due_at,id limit 50
Parameters: tenant_id, owner_id, limit
Required ordering: due_at asc, id asc
Pagination strategy: keyset
Indexes expected: idx_case_owner_active_due
Expected plan nodes: Index Scan using idx_case_owner_active_due, no Sort
Failure modes: owner filter optional, state filter introduced, due_at nullable
Fallback: route broad search to search endpoint/read model
Owner: case-management-service

21. Hands-On Lab

Use the lab_case table from Part 015 or create it again.

21.1 Sargability test

Create index:

CREATE INDEX idx_lab_case_due
ON lab_case (due_at);

ANALYZE lab_case;

Bad shape:

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM lab_case
WHERE date(due_at) = current_date;

Better shape:

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM lab_case
WHERE due_at >= date_trunc('day', now())
  AND due_at <  date_trunc('day', now()) + interval '1 day';

Observe index usage and buffer difference.

21.2 Offset vs keyset

Index:

CREATE INDEX idx_lab_case_tenant_due_id
ON lab_case (tenant_id, due_at DESC, id DESC);

Offset:

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, due_at
FROM lab_case
WHERE tenant_id = 5
ORDER BY due_at DESC, id DESC
OFFSET 50000
LIMIT 50;

Keyset:

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, due_at
FROM lab_case
WHERE tenant_id = 5
  AND (due_at, id) < (now(), 900000)
ORDER BY due_at DESC, id DESC
LIMIT 50;

Observe whether page depth affects latency.

21.3 Optional filter shape

Generic:

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, tenant_id, owner_id, due_at
FROM lab_case
WHERE tenant_id = 5
  AND (100 IS NULL OR owner_id = 100)
  AND closed_at IS NULL
ORDER BY due_at, id
LIMIT 50;

Specific:

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, tenant_id, owner_id, due_at
FROM lab_case
WHERE tenant_id = 5
  AND owner_id = 100
  AND closed_at IS NULL
ORDER BY due_at, id
LIMIT 50;

Compare plan stability.

21.4 CTE materialization

EXPLAIN (ANALYZE, BUFFERS)
WITH active_case AS MATERIALIZED (
    SELECT *
    FROM lab_case
    WHERE closed_at IS NULL
)
SELECT id, due_at
FROM active_case
WHERE tenant_id = 5
ORDER BY due_at, id
LIMIT 50;

Then:

EXPLAIN (ANALYZE, BUFFERS)
WITH active_case AS NOT MATERIALIZED (
    SELECT *
    FROM lab_case
    WHERE closed_at IS NULL
)
SELECT id, due_at
FROM active_case
WHERE tenant_id = 5
ORDER BY due_at, id
LIMIT 50;

Observe whether filters can be pushed down and indexes used.

22. Production Diagnostic Workflow

When a query is slow:

Do not jump directly to CREATE INDEX. First identify whether the problem is:

bad cardinality estimate;
bad predicate shape;
missing access path;
unbounded result;
sort/hash spill;
lock wait;
client-side fetch/serialization;
connection pool starvation;
data skew;
stale stats.

23. Query Shape Anti-Patterns

23.1 `SELECT *` for list endpoints

Use DTO projection.

23.2 Function-wrapped indexed columns

Use range predicates, normalized columns, generated columns, or expression indexes.

23.3 Deep offset pagination

Use keyset pagination for timelines and worklists.

23.4 One generic query for every screen filter

Use stable query shapes for hot paths.

23.5 Missing deterministic order

Always include a stable tie-breaker.

23.6 Counting when existence is enough

Use EXISTS.

23.7 Joining when existence is enough

Use EXISTS to avoid duplicate multiplication.

23.8 Unbounded update/delete

Batch, partition, or archive intentionally.

23.9 JSONB everywhere

Promote frequently queried fields to relational/generated columns.

23.10 Querying production OLTP as analytics warehouse

Separate OLTP paths from reporting paths.

24. Engineering Heuristics

Make hot queries boring and explicit.
Use separate SQL shapes for materially different access paths.
Keep indexed columns visible in predicates.
Cast parameters, not columns.
Avoid deep offset for large ordered sets.
Encode existence as EXISTS.
Encode top-N per parent with bounded LATERAL when appropriate.
Use CTE materialization intentionally.
Treat JSONB query fields as schema once they become operationally important.
Test with production-like cardinality and skew.

25. Self-Correction Questions

Why can date(created_at) = current_date block normal B-tree range usage?
Why is keyset pagination usually better than deep offset for event streams?
When can an OR rewrite to UNION ALL improve performance?
What semantic risk does UNION ALL introduce?
Why is NOT EXISTS usually safer than NOT IN when nulls are possible?
When should a CTE be MATERIALIZED?
When should a CTE be NOT MATERIALIZED?
Why can optional filter queries produce unstable plans?
Why does SELECT * hurt more than network bandwidth?
How can Hibernate generate SQL that defeats an otherwise good index?

26. Summary

Query shape engineering is the bridge between application intent and PostgreSQL execution.

Good query shape:

exposes selective predicates;
matches available indexes;
preserves deterministic ordering;
allows early stop with LIMIT;
avoids unnecessary materialization;
avoids row explosion;
separates hot paths from ad hoc paths;
respects cardinality and skew;
returns only needed columns;
remains stable under Java parameter binding and ORM behavior.

A top-tier PostgreSQL engineer does not ask only, “What index do we need?” They ask:

What exact SQL shape should this business access path have so PostgreSQL can execute it predictably under production data, concurrency, and latency constraints?

The next part moves deeper into execution behavior: join strategies. Once you understand query shape, the next skill is reading how PostgreSQL chooses nested loop, hash join, merge join, and how those choices fail under skew, bad statistics, or memory pressure.

References

PostgreSQL 18 Documentation — WITH Queries / CTE Materialization: https://www.postgresql.org/docs/current/queries-with.html
PostgreSQL 18 Documentation — Indexes: https://www.postgresql.org/docs/current/indexes.html
PostgreSQL 18 Documentation — Combining Multiple Indexes: https://www.postgresql.org/docs/current/indexes-bitmap-scans.html
PostgreSQL 18 Documentation — EXPLAIN: https://www.postgresql.org/docs/current/sql-explain.html
PostgreSQL 18 Documentation — Query Planning: https://www.postgresql.org/docs/current/planner-optimizer.html
PostgreSQL 18 Documentation — JSON Types: https://www.postgresql.org/docs/current/datatype-json.html

Lesson Recap

You just completed lesson 16 in build core. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Back To Series Next Lesson

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.

Previous Lesson

Lesson 15

Index Selection Strategy and Anti-Patterns

Next Lesson

Lesson 17

Join Strategies and Execution Behavior

Query Shape Engineering

Part 016 — Query Shape Engineering

1. What Problem This Solves

2. Mental Model: SQL as a Physical Request

3. The Planner Can Only Use What the Query Exposes

4. Sargability in PostgreSQL Terms

4.1 Function-wrapped column

4.2 Cast on column

4.3 Arithmetic on column

4.4 Date extraction

4.5 Leading wildcard

5. Predicate Shape Patterns

5.1 Equality + range

5.2 Optional filters

5.3 Dynamic search endpoints

6. OR Conditions and Rewrites

7. IN, ANY, and Large Lists

8. NOT IN, NULL, and Anti-Joins

9. EXISTS vs JOIN for Existence

10. LATERAL for Top-N Per Group

11. CTE Materialization and Optimization Boundaries

11.1 Good CTE for readability

11.2 Dangerous repeated CTE

11.3 Force merge when appropriate

11.4 Force materialization when appropriate

12. Pagination: Offset vs Keyset

12.1 Offset pagination

12.2 Keyset pagination

13. ORDER BY Without Determinism

14. Projection Shape: Stop Selecting Full Entities

15. Counting Patterns

15.1 Exact count on huge filtered set

15.2 exists instead of count

16. Update and Delete Shape

17. JSONB Query Shape

17.1 Expression index

17.2 Generated column

17.3 GIN containment

18. Prepared Statements, Generic Plans, and Skew

19. Hibernate/JPA Query Shape Failure Modes

19.1 N+1 is not only a query count problem

19.2 Fetch join with pagination

19.3 Case-insensitive search without normalized contract

19.4 Optional filters in one repository query

20. Query Shape Review Template

21. Hands-On Lab

21.1 Sargability test

21.2 Offset vs keyset

21.3 Optional filter shape

21.4 CTE materialization

22. Production Diagnostic Workflow

23. Query Shape Anti-Patterns

23.1 SELECT * for list endpoints

23.2 Function-wrapped indexed columns

23.3 Deep offset pagination

23.4 One generic query for every screen filter

23.5 Missing deterministic order

23.6 Counting when existence is enough

23.7 Joining when existence is enough

23.8 Unbounded update/delete

23.9 JSONB everywhere

23.10 Querying production OLTP as analytics warehouse

24. Engineering Heuristics

25. Self-Correction Questions

26. Summary

References

6. `OR` Conditions and Rewrites

7. `IN`, `ANY`, and Large Lists

8. `NOT IN`, `NULL`, and Anti-Joins

9. `EXISTS` vs `JOIN` for Existence

10. `LATERAL` for Top-N Per Group

13. `ORDER BY` Without Determinism

15.2 `exists` instead of count

23.1 `SELECT *` for list endpoints