Deepen PracticeOrdered learning track

Operational vs Analytical Data Boundaries

Learn Database Design and Architect - Part 066

Operational vs analytical data boundaries: OLTP, OLAP, reporting correctness, workload isolation, freshness, reconciliation, semantic layer, CDC/ETL, and ownership.

19 min read3634 words
PrevNext
Lesson 6684 lesson track46–69 Deepen Practice
#database#database-design#database-architecture#oltp+6 more

Part 066 — Operational vs Analytical Data Boundaries

Target bagian ini: kamu bisa memisahkan database operasional dan analitik secara benar: bukan karena “best practice”, tetapi karena workload, correctness, ownership, freshness, security, dan failure modes-nya berbeda.

Banyak sistem tumbuh seperti ini:

  1. Aplikasi mulai dengan satu database OLTP.
  2. Product manager butuh dashboard.
  3. Engineer menulis query agregasi langsung ke tabel produksi.
  4. Dashboard makin banyak.
  5. Query makin berat.
  6. Replica ditambahkan.
  7. Replica lag mulai memengaruhi angka.
  8. Business logic report tersebar di SQL dashboard.
  9. Migration aplikasi jadi takut karena “takut report rusak”.
  10. Database produksi menjadi campuran command store, integration store, search store, dan reporting warehouse.

Ini bukan masalah tooling. Ini masalah boundary.

Architectural rule:

A database should not be forced to serve contradictory workload contracts without an explicit architecture.


1. OLTP and OLAP Are Different Contracts

OLTP: Online Transaction Processing.

Fokus:

  • banyak transaksi kecil
  • write correctness
  • low latency
  • concurrency control
  • current state
  • strong invariants
  • index untuk lookup spesifik
  • normalized-ish model
  • short transactions

OLAP: Online Analytical Processing.

Fokus:

  • scan besar
  • aggregation
  • historical analysis
  • dimensional slicing
  • long-running queries
  • columnar access often preferred
  • read-heavy workload
  • denormalized or dimensional model
  • batch/stream load

Perbandingan:

DimensionOLTPOLAP
Primary goalCorrect command processingInsight and reporting
Data shapeCurrent operational stateHistorical events/snapshots
Query patternPoint lookup, small joinsLarge scan, group, aggregate
Write patternFrequent small writesBatch/stream loads
Latency targetmilliseconds to low secondsseconds to minutes depending report
Schemanormalized, invariant-richstar, wide, columnar, semantic
Concurrencymany users modifying statemany readers scanning data
Correctnesstransaction/invariant correctnessmetric/reconciliation correctness
Failure impactuser cannot perform operationbusiness cannot see/report insight

Mental model:


2. The Core Boundary Question

Do not start with “Should we use a data warehouse?”

Start with:

What decisions depend on this data, and what correctness contract do those decisions require?

There are different decision types:

Decision typeExampleStore usually involved
Operational commandapprove payment, assign caseOLTP
Operational readshow case detail pageOLTP/read model
Near-real-time monitoringqueue backlog, SLA breachOLTP projection / low-latency mart
Management reportingmonthly case volumeOLAP
Regulatory/statutory reportofficial submitted numbergoverned OLAP with reproducibility
Investigation analyticsidentify risk patternwarehouse/lake/search/graph
Product analyticsfunnel, retentionevent warehouse
ML feature generationrisk score featuresfeature store/warehouse

Boundary is determined by decision risk.

If wrong data causes illegal state or user-facing action error, do not use stale analytical projection as authority.

If query needs broad historical aggregation, do not punish the OLTP system.


3. Operational Store: What It Must Own

Operational database owns:

  • current authoritative state
  • command validity
  • invariants
  • transactional writes
  • concurrency control
  • workflow state
  • access-control-critical facts
  • correction path
  • audit event generation
  • source event/order identity

Example operational tables:

CREATE TABLE app.case_file (
    case_id bigint generated always as identity primary key,
    case_number text not null unique,
    tenant_id bigint not null,
    case_type_code text not null,
    status text not null,
    priority text not null,
    opened_at timestamptz not null,
    closed_at timestamptz,
    version bigint not null default 0,
    check ((status = 'CLOSED') = (closed_at is not null))
);

CREATE TABLE app.case_transition (
    case_transition_id bigint generated always as identity primary key,
    case_id bigint not null references app.case_file(case_id),
    from_status text not null,
    to_status text not null,
    transition_reason text,
    actor_id bigint not null,
    transitioned_at timestamptz not null,
    command_id uuid not null unique
);

The operational schema should care about:

  • illegal status prevention
  • duplicate command prevention
  • transactional correctness
  • access control
  • audit completeness

It should not be distorted to make quarterly dashboard SQL convenient.


4. Analytical Store: What It Must Own

Analytical store owns:

  • historical facts
  • reporting grain
  • dimensional context
  • metric definitions
  • snapshots
  • aggregations
  • reproducibility
  • lineage
  • reconciliation
  • semantic layer
  • report-specific performance

Example analytical tables:

CREATE TABLE analytics.fact_case_transition (
    case_transition_fact_key bigint generated always as identity primary key,
    case_transition_id bigint not null,
    case_id bigint not null,
    transition_date_key integer not null,
    from_status_key bigint not null,
    to_status_key bigint not null,
    actor_officer_key bigint,
    case_type_key bigint not null,
    region_key bigint not null,
    transition_count integer not null default 1,
    source_loaded_at timestamptz not null,
    unique (case_transition_id)
);

Analytical schema should care about:

  • correct grain
  • business-readable dimensions
  • historical interpretation
  • query performance for aggregation
  • stable metric contracts

It should not be used to decide whether a command is legal right now.


5. Boundary Anti-Pattern: Reporting Directly on OLTP

Reporting directly on production OLTP feels efficient until it creates hidden coupling.

Failure modes:

5.1 Workload Contention

Heavy report scans compete with application transactions for:

  • CPU
  • memory/cache
  • I/O
  • locks
  • connection pool
  • temporary disk
  • WAL/checkpoint pressure

Example bad query:

SELECT
    date_trunc('month', opened_at) AS month,
    case_type_code,
    status,
    count(*)
FROM app.case_file
WHERE opened_at >= now() - interval '3 years'
GROUP BY 1, 2, 3
ORDER BY 1, 2, 3;

This may be acceptable on small data. At scale, it becomes an operational risk.

5.2 Schema Coupling

Application migration becomes harder because dashboards depend on internal tables.

Now app.case_file.status cannot be refactored because unknown downstream consumers depend on it.

5.3 Metric Drift

Every dashboard defines its own logic.

-- Dashboard A
WHERE status <> 'CLOSED'

-- Dashboard B
WHERE closed_at IS NULL

-- Dashboard C
WHERE status IN ('OPEN', 'IN_REVIEW', 'ESCALATED')

-- Dashboard D
WHERE status NOT IN ('CLOSED', 'CANCELLED', 'DUPLICATE')

They all claim to show “open cases”.

5.4 Historical Context Loss

OLTP current state often cannot answer historical questions correctly.

Question:

How many cases were assigned to Unit A when they were opened last year?

If officer/unit data is overwritten in OLTP, answer may be impossible unless audit/history was designed.

5.5 Access Control Confusion

Operational authorization and analytical access are not the same.

A user who can see one case detail may not be allowed to export all cases matching a condition.


6. Boundary Anti-Pattern: Using Warehouse as Authority

The opposite mistake is using analytical data to make operational decisions.

Bad:

Why risky:

  • warehouse may be stale
  • load may fail
  • dimensions may be Type 1 overwritten
  • aggregation may lose row-level detail
  • security may be broader than operational scope
  • snapshot may represent end-of-day, not current state

Warehouse can inform operations, but should not replace operational truth unless explicitly designed as a serving store with freshness/consistency guarantees.


7. The Data Movement Boundary

Typical movement patterns:

PatternLatencyGood forRisk
Nightly batch ETLhours/daystable reportingstale data
Incremental batchminutes/hoursdashboardslate data complexity
CDC streamingseconds/minutesnear-real-time martsordering/replay complexity
Transactional outboxseconds/minutesdomain events/projectionsschema/event governance
API extractionvariablesmall controlled data productsthrottling/coupling
Direct replica querylowoperational read/report lightstale read + workload coupling

Architecture must define:

  • source of extraction
  • ordering guarantee
  • idempotency key
  • delete semantics
  • correction semantics
  • schema evolution contract
  • PII handling
  • replay path
  • load observability
  • reconciliation

8. Freshness Contract

Analytical data is almost always stale by some amount.

That is not inherently bad. It is bad when unstated.

Define freshness per data product:

Data Product: Case Backlog Dashboard

Freshness target:
- 5 minutes under normal operation

Freshness SLO:
- 99% of refreshes complete within 10 minutes

Staleness display:
- dashboard must show last successful source watermark

Operational use:
- not valid for legal command authorization
- valid for management monitoring

Failure mode:
- if stale > 30 minutes, dashboard displays degraded banner

Freshness metadata table:

CREATE TABLE analytics_control.data_product_watermark (
    data_product_name text primary key,
    source_system text not null,
    source_high_watermark timestamptz not null,
    last_successful_load_at timestamptz not null,
    last_failed_load_at timestamptz,
    status text not null check (status in ('HEALTHY', 'DEGRADED', 'FAILED')),
    error_message text
);

Dashboard should show:

  • last refreshed at
  • source watermark
  • data completeness warning
  • provisional period notice

Architectural rule:

Stale data is acceptable when visible and bounded. Invisible staleness is a correctness bug.


9. Reproducibility Contract

Some reports must be reproducible.

Examples:

  • statutory submissions
  • board reports
  • financial close
  • enforcement performance reports
  • audit evidence packages

A report is reproducible when you can answer:

  • what source data was used?
  • what code/query version produced it?
  • what reference data/taxonomy version was used?
  • what time window and timezone were used?
  • what corrections were included or excluded?
  • who approved it?
  • can the exact result be regenerated?

Report run table:

CREATE TABLE analytics_control.report_run (
    report_run_id uuid primary key,
    report_name text not null,
    report_version text not null,
    parameters jsonb not null,
    source_watermarks jsonb not null,
    reference_versions jsonb not null,
    executed_by text not null,
    executed_at timestamptz not null,
    result_hash text not null,
    status text not null check (status in ('RUNNING', 'SUCCEEDED', 'FAILED', 'APPROVED', 'REVOKED'))
);

For high-stakes reports, do not rely on mutable dashboards as evidence.


10. Workload Isolation Patterns

10.1 Read Replica for Light Reporting

Good for:

  • operational read scaling
  • admin screens
  • low-complexity dashboards
  • short queries

Risks:

  • replica lag
  • schema coupling remains
  • heavy queries can still damage replica
  • no dimensional modelling
  • no governed metric layer

Use when query is close to operational read model and freshness is understood.

10.2 Dedicated Reporting Schema in Same Engine

Good for:

  • early-stage reporting
  • moderate volume
  • teams not ready for full warehouse
  • controlled marts

Risks:

  • same physical database may still share resources
  • transform jobs can compete with OLTP
  • may become permanent accidental warehouse

10.3 Separate Warehouse

Good for:

  • large analytical scans
  • historical storage
  • dimensional models
  • BI workloads
  • cross-domain reporting
  • governed metrics

Risks:

  • data latency
  • pipeline complexity
  • reconciliation needed
  • security duplication
  • cost governance

10.4 Real-Time Projection Store

Good for:

  • near-real-time operational dashboard
  • queue count
  • SLA monitoring
  • status board

Risks:

  • projection drift
  • replay complexity
  • idempotency bugs
  • authorization freshness

11. Choosing the Boundary Pattern

Decision matrix:

RequirementBetter pattern
exact current state for commandOLTP
simple admin listOLTP read model / replica
large historical aggregationwarehouse/mart
dashboard refresh every few minutesCDC/incremental mart
real-time queue statusprojection store or carefully designed OLTP read model
official reportgoverned warehouse + report run evidence
exploratory analysiswarehouse/lakehouse sandbox with governed source
ML featuresfeature store/warehouse with lineage
search/filter over documentssearch projection
graph explorationgraph projection or graph store

12. Data Ownership Across the Boundary

Operational domain owns meaning of operational events.

Analytics team may own mart transformation and metrics.

But ownership must be explicit.

Example RACI:

AssetAccountableResponsibleConsulted
app.case_fileCase Platform TeamCase Platform TeamAnalytics, Compliance
case_opened eventCase Platform TeamCase Platform TeamAnalytics
fact_case_openedAnalytics PlatformAnalytics PlatformCase Platform
dim_case_typeGovernance/Reference Data TeamAnalytics PlatformCase Platform, Compliance
metric.average_triage_timeOperations AnalyticsAnalytics PlatformCompliance
Official monthly reportRegulatory ReportingAnalytics PlatformCase Platform

Rule:

Ownership must follow semantic authority, not just table location.


13. Source Contract for Analytics

Do not let analytics scrape arbitrary tables silently.

Expose stable source contract:

Options:

  • CDC contract on selected tables
  • domain events
  • versioned views
  • extract tables
  • data product API
  • outbox event stream

Example source view:

CREATE VIEW integration.v_case_opened_v1 AS
SELECT
    c.case_id,
    c.case_number,
    c.tenant_id,
    c.case_type_code,
    c.region_code,
    c.opened_at,
    c.created_by_actor_id,
    c.source_channel,
    c.created_at AS source_record_created_at,
    c.updated_at AS source_record_updated_at
FROM app.case_file c;

The view becomes a boundary. Internal OLTP table can evolve behind it.

Contract metadata:

CREATE TABLE integration.data_contract_registry (
    contract_name text primary key,
    version integer not null,
    owner_team text not null,
    compatibility_policy text not null,
    schema_definition jsonb not null,
    created_at timestamptz not null,
    deprecated_at timestamptz
);

14. CDC vs Domain Events

CDC emits data changes.

Domain events express business meaning.

AspectCDCDomain event
Sourcedatabase log/table changeapplication/domain logic
Meaningrow changedbusiness event occurred
Couplingtable/schema couplingevent contract coupling
Orderingusually DB commit order per stream/partitionapp-defined
Best forreplication, projections, warehouse loadingintegration, business workflows
Risklow semantic meaningevent design/versioning burden

Example CDC record says:

row in case_file changed status from TRIAGE to INVESTIGATION.

Domain event says:

CaseAssignedToInvestigation occurred because triage accepted the case.

Analytics may use either, but do not confuse them.

CDC is excellent for completeness and replay. Domain events are better for semantic intent.

Many systems use both:


15. Reporting Correctness Levels

Not all reports need the same rigor.

LevelExampleRequired discipline
Exploratoryad-hoc notebookdocumented caveat
Operational dashboardqueue statusfreshness watermark
Management dashboardmonthly volumemetric contract + reconciliation
Financial/regulatory reportofficial submissionreproducible run + approval + lineage
Legal evidencecase audit packageimmutable evidence + provenance

Do not over-engineer every chart. Do not under-engineer official reports.

Architectural maturity is matching rigor to consequence.


16. Handling Corrections

Operational systems correct data.

Analytical systems must decide how correction affects history.

Types:

Correction typeExampleAnalytical handling
Typo correctionwrong display nameType 1 update
Backdated eventcase opened date correctedrestate affected facts
Legal reclassificationcase type changedType 2 / taxonomy version
Duplicate mergetwo cases mergedbridge/merge event + restatement
Deletion/erasurePII removalprivacy workflow across projections
Reversalpayment reversedreversal fact, not silent delete

Correction event table:

CREATE TABLE analytics_control.data_restatement_run (
    restatement_run_id uuid primary key,
    reason text not null,
    affected_data_products text[] not null,
    affected_period_start date not null,
    affected_period_end date not null,
    source_change_reference text not null,
    executed_at timestamptz not null,
    executed_by text not null,
    validation_result jsonb not null,
    status text not null
);

Rule:

Analytical history is not always immutable, but every restatement must be explainable.


17. Security Boundary

Operational access and analytical access differ.

Operational access:

  • user sees specific case based on assignment/membership
  • user can perform command if role + state + ownership allow
  • row-level enforcement often required

Analytical access:

  • user may see aggregate only
  • user may export data
  • user may query across tenants/regions
  • user may access de-identified data
  • sensitive attributes may need masking

Security patterns:

NeedPattern
aggregate onlysemantic view hides row-level detail
tenant isolationtenant-filtered mart + RLS
PII maskingmasked dimension or tokenized column
export controlseparate export permission and audit
privileged analyticscontrolled break-glass workflow
subject erasurepropagation workflow to warehouse/search/cache

Bad assumption:

“It is only analytics, so security is less strict.”

Often analytics is more dangerous because it aggregates and exports more data.


18. Privacy Boundary

Do not blindly copy all OLTP columns into warehouse.

For each column ask:

  • Is it needed for a known use case?
  • Is it PII/sensitive?
  • Can it be minimized?
  • Can it be tokenized?
  • Can it be aggregated before exposure?
  • What is retention period?
  • Does it need erasure propagation?
  • Does it appear in logs or BI extracts?

Example safer dimension split:

CREATE TABLE analytics.dim_customer_public (
    customer_key bigint primary key,
    customer_id_hash text not null,
    customer_type text not null,
    risk_segment text,
    country_code text,
    effective_from timestamptz not null,
    effective_to timestamptz,
    is_current boolean not null
);

CREATE TABLE analytics_secure.dim_customer_sensitive (
    customer_key bigint primary key,
    legal_name text,
    email text,
    phone text,
    national_id_token text,
    access_policy text not null
);

Most BI users should not need analytics_secure.


19. Performance Boundary

Analytical queries often have different physical needs:

  • columnar storage
  • partition pruning
  • compression
  • materialized aggregates
  • vectorized execution
  • distributed scans
  • large memory grants
  • workload queues

Operational queries need:

  • low-latency index lookup
  • short transactions
  • stable p99
  • predictable write latency
  • small working set

If both run in one engine, enforce:

  • query timeouts
  • statement classes
  • resource groups/queues
  • replica separation
  • reporting windows
  • materialized summaries
  • connection pool separation
  • kill-switch for report users

Example PostgreSQL guardrails:

ALTER ROLE reporting_user SET statement_timeout = '30s';
ALTER ROLE reporting_user SET idle_in_transaction_session_timeout = '10s';

And use separate connection pools:

app-write-pool      -> primary OLTP, strict low-latency
app-read-pool       -> primary/replica, operational reads
reporting-pool      -> replica/warehouse, longer timeout
etl-pool            -> controlled batch windows
admin-pool          -> restricted break-glass

20. Semantic Layer Boundary

The semantic layer is where business metrics become governed interfaces.

It can be implemented through:

  • SQL views
  • dbt semantic models
  • BI semantic model
  • metrics service
  • cube layer
  • data product API

Core responsibilities:

  • define metric names
  • define formulas
  • define grain
  • define allowed dimensions
  • define filters/exclusions
  • define time semantics
  • expose lineage
  • enforce access
  • version changes

Example:

CREATE VIEW analytics_semantic.case_monthly_volume AS
SELECT
    d.fiscal_year,
    d.fiscal_month,
    r.region_name,
    ct.case_group,
    SUM(f.opened_count) AS cases_opened
FROM analytics.fact_case_opened f
JOIN analytics.dim_date d ON d.date_key = f.opened_date_key
JOIN analytics.dim_region r ON r.region_key = f.region_key
JOIN analytics.dim_case_type ct ON ct.case_type_key = f.case_type_key
WHERE f.is_test_case = false
GROUP BY d.fiscal_year, d.fiscal_month, r.region_name, ct.case_group;

If every dashboard writes this from scratch, semantic layer is missing.


21. Data Product Boundary

A mature organization treats analytical outputs as data products.

Data product contract:

Data Product: Case Monthly Volume

Owner:
- Enforcement Analytics Team

Source:
- Case Platform v_case_opened_v1

Refresh:
- hourly incremental load

Freshness SLO:
- 99% within 90 minutes

Grain:
- one row per month, region, case group

Metric:
- cases_opened = sum fact_case_opened.opened_count excluding test cases

Consumers:
- Operations dashboard
- Monthly board report
- Regulatory performance pack

Access:
- aggregate only for regional managers
- row-level mart access restricted to analytics engineers

Lineage:
- raw.case_opened -> fact_case_opened -> case_monthly_volume

Quality checks:
- monthly count reconciles to source within 0.1%

Data product thinking prevents anonymous tables from becoming critical dependencies.


22. Operational Read Model vs Analytical Mart

CQRS-style read model is not the same as warehouse mart.

AspectOperational read modelAnalytical mart
Purposeserve application UI/APIserve BI/reporting
Freshnessnear-real-timedefined by data product
Shapeoptimized for screen/use caseoptimized for analysis/grain
Authoritymay support user workflownot command authority
Data volumecurrent/recent statehistorical
Querylookup/filter/listaggregate/scan
Ownershipapplication/domain teamanalytics/data team or shared

Example operational read model:

CREATE TABLE read_model.case_worklist_item (
    case_id bigint primary key,
    tenant_id bigint not null,
    case_number text not null,
    status text not null,
    priority text not null,
    queue_id bigint not null,
    assigned_to bigint,
    sla_due_at timestamptz,
    last_activity_at timestamptz not null
);

Example analytical mart:

CREATE TABLE analytics.fact_case_daily_snapshot (
    case_id bigint not null,
    snapshot_date_key integer not null,
    status_key bigint not null,
    queue_key bigint not null,
    priority_key bigint not null,
    age_days integer not null,
    open_case_count integer not null,
    primary key (case_id, snapshot_date_key)
);

Both may be derived from same source. They serve different contracts.


23. Boundary Review Method

When reviewing a system, classify every data consumer.

Questions:

  1. Who consumes this data?
  2. What decision do they make?
  3. What is the consequence of wrong/stale data?
  4. What is the required freshness?
  5. What is the required history?
  6. Does query shape threaten OLTP?
  7. Does access scope differ from operational access?
  8. Does the metric need reproducibility?
  9. Who owns the definition?
  10. Can it be rebuilt/reconciled?

24. Example: Case Backlog Dashboard

Requirement:

Managers need to see open backlog by region, queue, priority, and SLA status.

Bad design:

SELECT region_code, queue_id, priority, count(*)
FROM app.case_file
WHERE closed_at IS NULL
GROUP BY region_code, queue_id, priority;

Problems:

  • current-only; cannot show historical trend
  • definition of backlog implicit
  • status semantics ignored
  • query hits operational table
  • no freshness/reproducibility
  • no SLA snapshot history

Better design:

24.1 Operational Source

case_file, case_transition, task, sla_clock remain operational.

24.2 Daily Snapshot Fact

CREATE TABLE analytics.fact_case_daily_snapshot (
    snapshot_date_key integer not null,
    case_id bigint not null,
    region_key bigint not null,
    queue_key bigint not null,
    priority_key bigint not null,
    status_key bigint not null,
    sla_status_key bigint not null,
    age_days integer not null,
    open_case_count integer not null default 1,
    primary key (snapshot_date_key, case_id)
);

24.3 Semantic View

CREATE VIEW analytics_semantic.case_backlog_daily AS
SELECT
    d.calendar_date,
    r.region_name,
    q.queue_name,
    p.priority_name,
    sla.sla_status_name,
    SUM(f.open_case_count) AS backlog_cases
FROM analytics.fact_case_daily_snapshot f
JOIN analytics.dim_date d ON d.date_key = f.snapshot_date_key
JOIN analytics.dim_region r ON r.region_key = f.region_key
JOIN analytics.dim_queue q ON q.queue_key = f.queue_key
JOIN analytics.dim_priority p ON p.priority_key = f.priority_key
JOIN analytics.dim_sla_status sla ON sla.sla_status_key = f.sla_status_key
JOIN analytics.dim_case_status s ON s.status_key = f.status_key
WHERE s.is_backlog_status = true
GROUP BY d.calendar_date, r.region_name, q.queue_name, p.priority_name, sla.sla_status_name;

24.4 Freshness Contract

  • Near-real-time dashboard: refresh every 5 minutes from projection.
  • Historical trend: daily snapshot final after midnight + late correction window.
  • Official monthly backlog: frozen report run with source watermark.

One use case may need multiple serving models.


25. Example: Regulatory Official Monthly Report

Requirement:

Produce official monthly report of case opened, closed, escalated, and breached SLA by jurisdiction.

Architecture:

Required tables:

  • source raw table
  • fact tables
  • dimension tables
  • metric semantic views
  • report run table
  • report approval table
  • report artifact table
  • restatement table

Official report should record:

  • report version
  • query/model version
  • source watermark
  • dimension/taxonomy version
  • generated file hash
  • approver
  • approval timestamp
  • restatement status

This is not overengineering when report has legal/regulatory consequence.


26. Migration Boundary

Operational schema evolves. Analytical consumers must not break unexpectedly.

Safe pattern:

Rules:

  • analytics should consume contract, not random internal table
  • breaking source changes require versioning
  • deprecation needs consumer registry
  • warehouse transformation tests must run before OLTP migration contract phase
  • metric regression test must compare old/new outputs

27. Observability Across Boundary

You need end-to-end visibility:

  • source change rate
  • CDC lag
  • ingestion lag
  • transform duration
  • failed records
  • unknown dimension count
  • reconciliation diff
  • report freshness
  • query latency
  • dashboard usage
  • semantic model errors

Control table:

CREATE TABLE analytics_control.pipeline_run (
    pipeline_run_id uuid primary key,
    pipeline_name text not null,
    started_at timestamptz not null,
    finished_at timestamptz,
    status text not null,
    source_low_watermark text,
    source_high_watermark text,
    rows_read bigint,
    rows_inserted bigint,
    rows_updated bigint,
    rows_rejected bigint,
    validation_result jsonb,
    error_message text
);

Alert examples:

  • CDC lag > 15 minutes
  • unknown dimension rows > threshold
  • daily reconciliation diff > tolerance
  • report freshness SLO breach
  • sudden drop in fact count
  • duplicate source event detected
  • load completed but semantic view query fails

28. Boundary Failure Modes

FailureSymptomRoot causeMitigation
OLTP slowed by dashboardapp p99 spikesanalytical query on productionworkload isolation, warehouse, timeout
dashboard number changedhistorical report changedType 1 overwrite or source correctionSCD, restatement process
report stalelast hour missingpipeline lag invisiblewatermark + freshness alert
duplicate metricscount too highreplay without idempotencysource event unique key
missing factscount too lowlate dimensions quarantinedunknown row/retry monitoring
unauthorized exportdata leakanalytics access too broadmasking/RLS/export control
schema migration broke BIdashboard failsdirect dependency on OLTP internalscontract view/versioning
official report unreproducibleaudit challengeno report run metadatareport-run artifact + lineage

29. Architecture Review Checklist

Workload

  • Which queries are operational and which are analytical?
  • Are large scans isolated from OLTP?
  • Are timeouts/resource groups defined?
  • Are reporting users separated from app users?

Correctness

  • Which store is authoritative for commands?
  • Which store is authoritative for metrics?
  • Are freshness and staleness visible?
  • Are official reports reproducible?

Data Movement

  • What extraction pattern is used?
  • Is load idempotent?
  • Are deletes/corrections handled?
  • Can pipeline replay from source?
  • Are schema changes versioned?

Security

  • Is analytical access broader than operational access?
  • Is PII minimized/masked?
  • Are exports audited?
  • Are tenant/region restrictions preserved?

Ownership

  • Who owns source contract?
  • Who owns metric definition?
  • Who owns mart transformation?
  • Who approves official reports?

Operations

  • Are pipeline runs observable?
  • Are reconciliation checks automated?
  • Are watermarks exposed?
  • Is there a stale-report incident runbook?

30. Practical Heuristics

  • Do not let dashboards depend on internal OLTP tables by accident.
  • Do not use warehouse projections for command authorization.
  • Do not hide staleness; expose watermarks everywhere.
  • Do not treat CDC as semantic event design.
  • Do not copy every PII field into analytics “just in case”.
  • Do not let each dashboard define metrics independently.
  • Use read replicas for operational read scaling, not as permanent warehouse substitute.
  • Use governed marts for high-value repeated reporting.
  • Use report run metadata for official reports.
  • Make every boundary explicit: source, contract, freshness, ownership, security, reconciliation.

31. Closing Mental Model

Operational and analytical systems are both “databases”, but they are not the same product.

Operational database is a state machine and invariant enforcement engine.

Analytical database is a measurement and interpretation engine.

Top-tier database architects do not collapse those responsibilities accidentally. They design the boundary deliberately.

The mature question is not:

“Can the production database answer this query?”

The mature question is:

“Should this query be answered by the operational authority, an operational read model, a projection store, a governed mart, or an official reproducible reporting pipeline?”

That question is the difference between a database that survives scale and a database that becomes an accidental dependency swamp.

Lesson Recap

You just completed lesson 66 in deepen practice. Use the series map if you want to review the broader track, or continue directly into the next lesson while the context is still warm.

Continue The Track

Keep the momentum while the lesson is still fresh. Move backward for review or continue forward into the next concept.