Deepen PracticeOrdered learning track

JSON, Semi-Structured, and Hybrid Relational Design

Learn SQL in Action - Part 024

JSON, semi-structured data, and hybrid relational design: when to use JSON, how to index it, how to validate it, how to migrate out of blobs, and how to keep relational truth governable.

[2026-07-01]13 min read2517 words

In This Lesson

1. Why This Part Exists 2. Kaufman Framing: The Sub-Skill We Are Training 3. Mental Model: JSON Is a Boundary, Not a Data Model Escape Hatch

PrevNext

Lesson 2435 lesson track20–29 Deepen Practice

#sql#json#semi-structured-data#relational-design+4 more

Part 024 — JSON, Semi-Structured, and Hybrid Relational Design

1. Why This Part Exists

JSON in SQL databases is powerful.

It is also one of the easiest ways to destroy data quality.

A column like this looks harmless:

metadata jsonb NOT NULL

But after two years it often contains:

{
  "priority": "high",
  "sla": "48h",
  "reviewer": { "id": 123, "name": "Ari" },
  "riskScore": "0.87",
  "risk_score": 0.91,
  "isEscalated": true,
  "escalated": "yes",
  "documents": [
    { "id": 1, "type": "pdf" },
    { "documentId": 2, "kind": "evidence" }
  ]
}

Now the database cannot reliably answer:

which cases are escalated?
what is the risk score type?
who owns the reviewer relationship?
which fields are PII?
which JSON shape is current?
how do we index this?
how do we enforce required keys?
what breaks if the application changes field names?

JSON should be a deliberate design boundary, not a shortcut around modelling.

2. Kaufman Framing: The Sub-Skill We Are Training

The sub-skill is:

Given semi-structured or evolving data, decide which parts belong in relational columns, which parts can live in JSON, how to validate/index/query them, and when to promote JSON fields into first-class schema.

You are training to:

separate core relational truth from flexible payload,
identify JSON-safe and JSON-dangerous fields,
design hybrid relational/document models,
query JSON without destroying performance,
index JSON intentionally,
validate JSON shape at boundaries,
avoid EAV and JSON blob anti-patterns,
version JSON payloads,
preserve audit payloads without making them operational source of truth,
migrate frequently used JSON attributes into columns.

Kaufman-style drills:

Take a JSON payload.
Classify each field: identity, relationship, invariant, search key, display-only, audit context, extension.
Promote core fields to columns.
Keep volatile/extensible fields in JSON.
Add validation.
Add indexes only for proven access paths.
Write extraction queries.
Write migration path from JSON to relational columns.

3. Mental Model: JSON Is a Boundary, Not a Data Model Escape Hatch

Think of JSON in SQL as one of these boundaries:

Use JSON when:

input shape varies by source system,
payload is mostly audit/evidence,
fields are sparse and rarely queried,
extension fields are tenant-specific,
data is stored for replay/debugging,
document representation is an API contract,
schema changes faster than relational migrations can safely follow.

Avoid JSON when:

field participates in joins,
field is required for integrity,
field drives workflow state,
field is frequently filtered/sorted/grouped,
field needs foreign key enforcement,
field is part of uniqueness,
field contains high-value business facts,
field is used by many services,
field must be governed as PII or regulated attribute.

Rule:

If the business depends on it as truth, it probably deserves schema.

4. Relational vs JSON Decision Matrix

Field type	Prefer relational column/table	JSON acceptable?
Primary identity	Yes	No
Foreign key relationship	Yes	No
Workflow status	Yes	No
Monetary amount	Yes	Rarely
Risk score used in decisions	Yes	Rarely
Search/filter key	Yes	Sometimes, with expression/generated index
Display-only metadata	Optional	Yes
Inbound raw API payload	No	Yes
Audit event context	Optional core columns + JSON	Yes
Tenant-specific extension	Sometimes extension table	Yes, with governance
Feature flags/config	Sometimes normalized	Yes, with versioning
Sparse optional attributes	Sometimes	Yes
Repeated child entities	Usually child table	Avoid JSON if queryable

5. The Hybrid Relational Pattern

A robust pattern:

CREATE TABLE case_intake (
  intake_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  external_reference text NOT NULL,
  source_system text NOT NULL,
  received_at timestamptz NOT NULL DEFAULT now(),

  -- Relational core extracted from payload
  subject_customer_id bigint,
  intake_type text NOT NULL,
  priority text NOT NULL,
  occurred_at timestamptz,

  -- Raw and semi-structured data
  payload jsonb NOT NULL,
  payload_schema_version int NOT NULL,

  -- Processing state
  normalization_status text NOT NULL DEFAULT 'pending',
  normalization_error text,

  CONSTRAINT intake_priority_check CHECK (priority IN ('low', 'normal', 'high', 'critical')),
  CONSTRAINT intake_normalization_status_check CHECK (
    normalization_status IN ('pending', 'normalized', 'failed', 'ignored')
  )
);

CREATE UNIQUE INDEX uq_intake_source_external_ref
  ON case_intake (source_system, external_reference);

CREATE INDEX idx_intake_priority_received
  ON case_intake (priority, received_at DESC);

This gives you:

stable relational fields for operations,
raw payload for investigation,
schema versioning,
source idempotency,
processing lifecycle,
indexable operational queries.

Bad alternative:

CREATE TABLE case_intake (
  id bigint PRIMARY KEY,
  payload jsonb NOT NULL
);

This is easy on day one and expensive on day 300.

6. PostgreSQL JSON vs JSONB Mental Model

PostgreSQL has both json and jsonb.

Practical distinction:

Type	Mental model	Common use
`json`	Stores textual JSON representation	Preserve exact input formatting/order if needed
`jsonb`	Stores decomposed binary representation	Querying, indexing, containment, operational usage

Typical production default in PostgreSQL:

payload jsonb NOT NULL

But do not confuse jsonb with full schema governance.

jsonb improves query/index capability; it does not magically enforce domain correctness.

7. Basic JSON Query Patterns

Assume:

CREATE TABLE case_event (
  event_id uuid PRIMARY KEY,
  event_type text NOT NULL,
  occurred_at timestamptz NOT NULL,
  payload jsonb NOT NULL
);

7.1 Extract JSON Object Field

SELECT payload -> 'risk' AS risk_object
FROM case_event;

7.2 Extract JSON Text Field

SELECT payload ->> 'riskScore' AS risk_score_text
FROM case_event;

7.3 Cast Extracted Value

SELECT (payload ->> 'riskScore')::numeric(8, 4) AS risk_score
FROM case_event
WHERE event_type = 'RISK_ASSESSED';

Be careful: bad payload values can fail casts.

Safer pattern:

SELECT
  event_id,
  CASE
    WHEN payload ->> 'riskScore' ~ '^[0-9]+(\.[0-9]+)?$'
      THEN (payload ->> 'riskScore')::numeric(8, 4)
    ELSE NULL
  END AS risk_score
FROM case_event;

7.4 JSON Containment

SELECT *
FROM case_event
WHERE payload @> '{"escalated": true}'::jsonb;

7.5 JSON Path-Like Predicate

SELECT *
FROM case_event
WHERE payload -> 'risk' ->> 'category' = 'high';

8. JSON Arrays: Powerful but Dangerous

Example payload:

{
  "documents": [
    { "documentId": 101, "type": "evidence" },
    { "documentId": 102, "type": "notice" }
  ]
}

Query:

SELECT
  e.event_id,
  d.doc ->> 'documentId' AS document_id,
  d.doc ->> 'type' AS document_type
FROM case_event e
CROSS JOIN LATERAL jsonb_array_elements(e.payload -> 'documents') AS d(doc)
WHERE e.event_type = 'EVIDENCE_ATTACHED';

This is acceptable for investigation and low-volume extraction.

But if documents are first-class business entities, model them relationally:

CREATE TABLE case_document (
  case_document_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  case_id bigint NOT NULL,
  document_id bigint NOT NULL,
  document_type text NOT NULL,
  attached_at timestamptz NOT NULL,
  attached_by_user_id bigint NOT NULL,

  UNIQUE (case_id, document_id)
);

JSON arrays are especially risky when:

they need uniqueness,
they are joined to other tables,
they are updated independently,
they contain lifecycle state,
they grow without bound,
they are frequently filtered.

9. JSON Validation in SQL

SQL databases usually do not validate full JSON Schema natively across all engines.

But you can enforce important invariants.

9.1 Required Key

ALTER TABLE case_event
ADD CONSTRAINT event_payload_has_case_id CHECK (payload ? 'caseId');

9.2 Type Check

ALTER TABLE case_event
ADD CONSTRAINT event_payload_case_id_is_string CHECK (
  jsonb_typeof(payload -> 'caseId') = 'string'
);

9.3 Event-Type-Specific Constraint

ALTER TABLE case_event
ADD CONSTRAINT risk_event_has_score CHECK (
  event_type <> 'RISK_ASSESSED'
  OR (
    payload ? 'riskScore'
    AND jsonb_typeof(payload -> 'riskScore') IN ('number', 'string')
  )
);

9.4 Generated Column for Typed Access

Depending on engine support, use generated/computed columns for frequently queried JSON values.

Example conceptual pattern:

ALTER TABLE case_intake
ADD COLUMN external_case_type text
GENERATED ALWAYS AS ((payload ->> 'caseType')) STORED;

CREATE INDEX idx_case_intake_external_case_type
  ON case_intake (external_case_type);

Generated columns make JSON-derived fields more visible to the optimizer and easier to query consistently.

10. Indexing JSON

10.1 Expression Index

Use when one JSON field is frequently filtered.

CREATE INDEX idx_case_event_payload_risk_category
  ON case_event ((payload -> 'risk' ->> 'category'));

Query must match expression shape closely:

SELECT *
FROM case_event
WHERE payload -> 'risk' ->> 'category' = 'high';

10.2 Partial Expression Index

Use when only one event type has the field.

CREATE INDEX idx_risk_assessed_score
  ON case_event (((payload ->> 'riskScore')::numeric))
  WHERE event_type = 'RISK_ASSESSED';

Query:

SELECT *
FROM case_event
WHERE event_type = 'RISK_ASSESSED'
  AND (payload ->> 'riskScore')::numeric >= 0.8;

10.3 GIN Index for Containment

CREATE INDEX idx_case_event_payload_gin
  ON case_event
  USING gin (payload);

Useful for containment-style predicates:

SELECT *
FROM case_event
WHERE payload @> '{"escalated": true}'::jsonb;

10.4 Do Not Index JSON Blindly

A broad GIN index can be large and expensive to maintain.

Before adding a JSON index, ask:

Which query will use it?
Is the predicate stable?
Is the field type consistent?
Is the field selective?
Is the write overhead acceptable?
Would a relational column be better?

If a JSON key becomes important enough to need a carefully tuned index, it may be important enough to become a column.

11. JSON and Sargability

Bad:

WHERE lower(payload ->> 'customerEmail') = lower(:email)

This is hard to optimize unless an exact expression index exists.

Better:

customer_email_normalized text GENERATED ALWAYS AS (
  lower(payload ->> 'customerEmail')
) STORED

Then:

CREATE INDEX idx_intake_customer_email_normalized
  ON case_intake (customer_email_normalized);

SELECT *
FROM case_intake
WHERE customer_email_normalized = lower(:email);

Best if the field is core:

customer_email_normalized text NOT NULL

and validate/populate it during ingestion.

12. Versioning JSON Payloads

Never assume all rows have the same JSON shape.

Add a schema version:

CREATE TABLE integration_message (
  message_id uuid PRIMARY KEY,
  source_system text NOT NULL,
  message_type text NOT NULL,
  schema_version int NOT NULL,
  received_at timestamptz NOT NULL DEFAULT now(),
  payload jsonb NOT NULL,

  UNIQUE (source_system, message_type, message_id)
);

Version-aware extraction:

SELECT
  message_id,
  CASE schema_version
    WHEN 1 THEN payload ->> 'risk_score'
    WHEN 2 THEN payload -> 'risk' ->> 'score'
    ELSE NULL
  END AS risk_score_text
FROM integration_message
WHERE message_type = 'RiskAssessed';

Better: normalize into a stable projection.

CREATE TABLE risk_assessment (
  risk_assessment_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  source_message_id uuid NOT NULL UNIQUE,
  case_id bigint NOT NULL,
  risk_score numeric(8, 4) NOT NULL,
  risk_category text NOT NULL,
  assessed_at timestamptz NOT NULL,
  model_version text NOT NULL
);

Raw JSON is preserved; operational truth is normalized.

13. Raw Landing + Normalized Projection

This is the most reliable pattern for integrations.

DDL:

CREATE TABLE raw_inbound_message (
  raw_message_id uuid PRIMARY KEY,
  source_system text NOT NULL,
  external_message_id text NOT NULL,
  message_type text NOT NULL,
  schema_version int NOT NULL,
  received_at timestamptz NOT NULL DEFAULT now(),
  payload jsonb NOT NULL,
  processing_status text NOT NULL DEFAULT 'pending',
  processing_error text,

  UNIQUE (source_system, external_message_id),

  CONSTRAINT raw_message_status_check CHECK (
    processing_status IN ('pending', 'processed', 'failed', 'ignored')
  )
);

CREATE TABLE normalized_case_signal (
  case_signal_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  raw_message_id uuid NOT NULL UNIQUE,
  case_id bigint NOT NULL,
  signal_type text NOT NULL,
  signal_strength numeric(8, 4),
  occurred_at timestamptz NOT NULL,
  source_system text NOT NULL
);

Processing transaction:

BEGIN;

INSERT INTO normalized_case_signal (
  raw_message_id,
  case_id,
  signal_type,
  signal_strength,
  occurred_at,
  source_system
)
SELECT
  raw_message_id,
  (payload ->> 'caseId')::bigint,
  payload ->> 'signalType',
  (payload ->> 'strength')::numeric(8, 4),
  (payload ->> 'occurredAt')::timestamptz,
  source_system
FROM raw_inbound_message
WHERE raw_message_id = :raw_message_id
  AND processing_status = 'pending';

UPDATE raw_inbound_message
SET processing_status = 'processed'
WHERE raw_message_id = :raw_message_id
  AND processing_status = 'pending';

COMMIT;

If normalization fails, mark it explicitly:

UPDATE raw_inbound_message
SET processing_status = 'failed',
    processing_error = :error_message
WHERE raw_message_id = :raw_message_id;

14. JSON as Audit Payload

Audit/event payload is one of the best uses of JSON.

Example:

INSERT INTO case_event (
  event_id,
  event_type,
  occurred_at,
  payload
)
VALUES (
  gen_random_uuid(),
  'CASE_ESCALATED',
  now(),
  jsonb_build_object(
    'caseId', :case_id,
    'oldPriority', :old_priority,
    'newPriority', :new_priority,
    'reasonCode', :reason_code,
    'ruleVersion', :rule_version,
    'evidenceIds', :evidence_ids
  )
);

Why JSON fits:

event types have different shapes,
payload is contextual,
many fields are not queried often,
preserving evidence matters,
schema can evolve by event type.

But do not hide core event identity in JSON only.

Better:

CREATE TABLE case_event (
  event_id uuid PRIMARY KEY,
  case_id bigint NOT NULL,
  event_type text NOT NULL,
  occurred_at timestamptz NOT NULL,
  actor_id text NOT NULL,
  payload jsonb NOT NULL
);

Core searchable fields are columns; contextual fields are JSON.

15. JSON as Extension Attributes

Sometimes different tenants or products need optional fields.

Pattern:

CREATE TABLE case_extension_attribute (
  case_id bigint NOT NULL,
  namespace text NOT NULL,
  attributes jsonb NOT NULL,
  schema_version int NOT NULL,
  updated_at timestamptz NOT NULL DEFAULT now(),

  PRIMARY KEY (case_id, namespace)
);

This is acceptable when:

extension attributes are not core workflow state,
each namespace has ownership,
schema version is documented,
reporting expectations are controlled,
validation exists outside or inside the database,
extension data has governance.

If tenants start demanding cross-tenant reports on extension fields, JSON stops being cheap.

16. JSON vs EAV

EAV means entity-attribute-value:

CREATE TABLE case_attribute (
  case_id bigint NOT NULL,
  attribute_name text NOT NULL,
  attribute_value text,
  PRIMARY KEY (case_id, attribute_name)
);

EAV is flexible but often terrible for correctness.

Problems:

values lose type,
constraints become application-only,
joins explode,
indexing is awkward,
query readability collapses,
required fields are hard to enforce,
schema governance disappears.

JSON has similar risks but can preserve nested structure and typed primitives.

Decision:

Need	Prefer
Sparse display metadata	JSON
Dynamic searchable attributes	Carefully governed extension table or typed attribute tables
High-integrity core fields	Relational columns
Audit context	JSON payload
User-defined form responses	Dedicated response model, sometimes JSON plus extracted indexes

For form systems, a good compromise:

CREATE TABLE form_submission (
  submission_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  form_id bigint NOT NULL,
  form_version int NOT NULL,
  submitted_by_user_id bigint NOT NULL,
  submitted_at timestamptz NOT NULL,
  answers jsonb NOT NULL
);

CREATE TABLE form_answer_index (
  submission_id bigint NOT NULL,
  question_key text NOT NULL,
  answer_text text,
  answer_number numeric,
  answer_date date,
  PRIMARY KEY (submission_id, question_key)
);

Raw answers are preserved; common query dimensions are indexed.

17. Migrating Out of JSON Blob

When a JSON field becomes important, promote it.

Example starting point:

CREATE TABLE case_intake (
  intake_id bigint PRIMARY KEY,
  payload jsonb NOT NULL
);

Application often queries:

WHERE payload ->> 'priority' = 'critical'

Migration plan:

Step 1: Add Nullable Column

ALTER TABLE case_intake
ADD COLUMN priority text;

Step 2: Backfill

UPDATE case_intake
SET priority = payload ->> 'priority'
WHERE priority IS NULL
  AND payload ? 'priority';

Step 3: Add Check Constraint as Not Valid Where Supported

ALTER TABLE case_intake
ADD CONSTRAINT case_intake_priority_check
CHECK (priority IN ('low', 'normal', 'high', 'critical'));

For very large tables, use engine-specific online validation if supported.

Step 4: Dual Write

Application writes both JSON and column during transition.

Step 5: Read from Column

Update application queries:

WHERE priority = 'critical'

Step 6: Add Index

CREATE INDEX idx_case_intake_priority
  ON case_intake (priority);

Step 7: Enforce Not Null if Required

ALTER TABLE case_intake
ALTER COLUMN priority SET NOT NULL;

Step 8: Decide Whether to Keep JSON Copy

Options:

keep for raw audit,
remove redundant key from new payloads,
keep but treat relational column as source of truth,
add assertion to detect drift.

Drift assertion:

SELECT intake_id
FROM case_intake
WHERE payload ? 'priority'
  AND payload ->> 'priority' IS DISTINCT FROM priority;

18. JSON and Data Quality

Run assertions.

18.1 Missing Required Key

SELECT event_id
FROM case_event
WHERE event_type = 'CASE_ESCALATED'
  AND NOT (payload ? 'reasonCode');

18.2 Inconsistent Type

SELECT event_id, payload -> 'riskScore' AS bad_value
FROM case_event
WHERE event_type = 'RISK_ASSESSED'
  AND jsonb_typeof(payload -> 'riskScore') NOT IN ('number', 'string');

18.3 Unknown Schema Version

SELECT schema_version, count(*)
FROM raw_inbound_message
GROUP BY schema_version
HAVING schema_version NOT IN (1, 2, 3);

18.4 Field Name Drift

SELECT
  count(*) FILTER (WHERE payload ? 'riskScore') AS camel_case,
  count(*) FILTER (WHERE payload ? 'risk_score') AS snake_case
FROM case_event
WHERE event_type = 'RISK_ASSESSED';

Field-name drift is a real production smell.

19. JSON Security and Governance

JSON payloads often become a hiding place for sensitive data.

Governance questions:

Does the JSON contain PII?
Are sensitive keys documented?
Are keys encrypted, tokenized, masked, or redacted?
Can access control inspect JSON fields?
Are logs accidentally emitting payloads?
Are payloads copied to analytics systems?
Are retention policies different for JSON payload and relational columns?
Are downstream consumers depending on undocumented keys?

Bad:

{
  "customerName": "...",
  "nationalId": "...",
  "address": "...",
  "debug": "full request body including secrets"
}

Better:

store sensitive identifiers in governed columns,
tokenize where possible,
redact raw payloads after normalization when allowed,
classify payload keys,
separate operational payload from secure evidence storage,
avoid dumping full JSON into logs.

JSON does not bypass data governance obligations.

It increases the need for them.

20. JSON and API Design

Sometimes relational storage should produce JSON output.

That does not mean JSON must be the storage model.

Pattern:

Example:

SELECT jsonb_build_object(
  'caseId', c.case_id,
  'status', c.status,
  'assignedTo', jsonb_build_object(
    'userId', u.user_id,
    'displayName', u.display_name
  ),
  'deadlines', jsonb_agg(
    jsonb_build_object(
      'type', d.deadline_type,
      'dueAt', d.deadline_at
    )
    ORDER BY d.deadline_at
  )
) AS case_document
FROM case_current_state c
JOIN app_user u ON u.user_id = c.assigned_to_user_id
LEFT JOIN case_deadline d ON d.case_id = c.case_id
WHERE c.case_id = :case_id
GROUP BY c.case_id, c.status, u.user_id, u.display_name;

This is often better than storing the entire API document as source of truth.

Storage model and API model do not have to be identical.

21. JSON-Relational Duality

Modern database vendors increasingly support ways to expose relational data as JSON documents while keeping relational integrity underneath.

The architectural idea is powerful:

One logical business object can be accessed as a document,
while being stored and constrained relationally.

This matters because many teams wrongly frame the choice as:

relational database OR document database

A better framing:

relational integrity for source of truth,
document shape for API and aggregate access.

Even where vendor-specific duality features are unavailable, you can implement the same principle using:

normalized tables,
views,
SQL JSON construction functions,
materialized views,
API-level assemblers,
CDC-fed document projections.

Do not abandon relational correctness just because clients prefer JSON.

22. Performance Failure Modes

22.1 JSON Field Used in Every Query

Symptom:

WHERE payload ->> 'status' = 'open'

Fix:

status text NOT NULL

22.2 JSON Cast in Predicate

Symptom:

WHERE (payload ->> 'riskScore')::numeric > 0.8

Issues:

cast can fail,
expression index required,
statistics may be weak,
values may be inconsistent.

Fix:

risk_score numeric(8, 4)

22.3 Huge Payload Read Amplification

If each row contains a large JSON document, even simple queries may read too much data.

Fix options:

split large payload into side table,
store only normalized fields in hot table,
archive raw payload,
use projection tables,
avoid selecting payload unless needed.

22.4 GIN Index Write Overhead

Broad inverted indexes can be expensive on write-heavy event tables.

Fix options:

partial GIN index by event type,
expression indexes for specific fields,
move hot fields to columns,
partition event table,
query from projection table.

23. Production Design Review Checklist

Before approving a JSON column, ask:

Why is this not a relational column/table?
Which fields are source-of-truth?
Which fields are required?
Which fields are indexed?
Which fields are joined?
Which fields contain PII or regulated data?
Is there a schema version?
Is there a validation strategy?
Is there an ownership model for keys?
How will field-name drift be detected?
How will frequently queried fields be promoted?
What is the retention/redaction policy?
Are raw payload and normalized projection separated?
Can we reproduce the original inbound message?
Can we query operational state without parsing JSON?
Are JSON indexes justified by actual query plans?

24. Practice Lab: Intake Payload to Relational Truth

Start with a raw payload:

{
  "externalReference": "SRC-991",
  "caseType": "complaint",
  "priority": "critical",
  "occurredAt": "2026-06-15T09:30:00Z",
  "customer": {
    "externalId": "C-123",
    "name": "Example Customer"
  },
  "risk": {
    "score": 0.91,
    "category": "high",
    "modelVersion": "risk-v4"
  },
  "evidence": [
    { "documentId": 501, "type": "pdf" },
    { "documentId": 502, "type": "image" }
  ]
}

Tasks:

Create raw_inbound_message.
Create case_intake with relational core fields.
Preserve raw JSON.
Normalize customer reference.
Normalize risk assessment.
Normalize evidence documents.
Add schema version.
Add unique source idempotency key.
Add validation checks for required keys.
Add indexes for priority and received time.
Write query for critical intakes.
Write query for high-risk intakes.
Write query extracting evidence from raw JSON for investigation.
Write drift query comparing normalized risk score to payload risk score.
Promote one more field from JSON to a relational column.

Success criterion:

Operational queries should not depend on deep JSON parsing, while raw payload remains available for audit and replay.

25. Common Interview-Level Traps

Trap 1: “JSON Means No Schema”

Wrong.

JSON has a schema. The only question is whether the schema is explicit, tested, versioned, and enforced.

Trap 2: “JSON Is Faster Because No Joins”

Sometimes.

But large JSON documents can cause read amplification, weak statistics, expensive parsing, and difficult indexes.

Trap 3: “Put All Optional Fields in JSON”

Optional does not mean unimportant.

An optional field can still be governed, indexed, or legally sensitive.

Trap 4: “Audit Payload Can Be the Operational Model”

Audit payload is evidence.

Operational model needs constraints, indexes, and stable semantics.

Trap 5: “We Can Normalize Later Easily”

You can normalize later only if you preserved schema version, data quality, and migration path.

Without that, JSON becomes archaeological work.

26. References

PostgreSQL Documentation — JSON Types: https://www.postgresql.org/docs/current/datatype-json.html
PostgreSQL Documentation — JSON Functions and Operators: https://www.postgresql.org/docs/current/functions-json.html
PostgreSQL Documentation — GIN Indexes: https://www.postgresql.org/docs/current/gin.html
PostgreSQL Documentation — Indexes on Expressions: https://www.postgresql.org/docs/current/indexes-expressional.html
PostgreSQL Documentation — Partial Indexes: https://www.postgresql.org/docs/current/indexes-partial.html
Oracle Documentation — JSON-Relational Duality Views: https://docs.oracle.com/en/database/oracle/oracle-database/23/jsnvu/
Oracle Documentation — JSON in Oracle Database: https://docs.oracle.com/en/database/oracle/oracle-database/26/adjsn/json-oracle-ai-database.html

27. What You Should Be Able to Do Now

You should now be able to:

decide when JSON is appropriate,
separate raw payload from relational truth,
design hybrid relational/JSON tables,
query JSON safely,
avoid JSON array modelling traps,
add targeted JSON indexes,
validate required JSON shape,
version JSON payloads,
migrate important JSON fields into columns,
govern sensitive JSON content.

The next part moves into partitioning, sharding, and large-table strategy.

The key question becomes:

How do we keep relational models operational when tables become too large for naive scanning, maintenance, and retention?

Lesson Recap

You just completed lesson 24 in deepen practice. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Back To Series Next Lesson

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.

Previous Lesson

Lesson 23

Temporal SQL, History, and Auditability

Next Lesson

Lesson 25

Partitioning, Sharding, and Large Table Strategy