Consistency, Invalidation, and Stampede Control

Learn Java Redis In Action - Part 015

Production consistency patterns for Redis-backed caches: invalidation, versioned keys, freshness windows, stampede control, stale-while-revalidate, refresh-ahead, single-flight, logical expiry, and correctness envelopes in Java systems.

2026-07-02 · 22 min read · 4327 words

Lesson 15 of 34 · Build Core (lessons 07–18)

#java #redis #caching #consistency

Part 015 — Consistency Patterns: Invalidation, Versioned Cache, Stampede Control

Part 014 introduced major cache patterns. This part goes deeper into the hardest part of caching:

How do we keep Redis useful without letting it silently corrupt business behavior?

The main enemy is not stale data by itself. The real enemy is unbounded, invisible, unjustified staleness.

A senior engineer does not say:

Redis is eventually consistent.

A senior engineer says:

This Redis projection may be stale for at most 30 seconds under normal operation, may serve stale for 5 minutes during source outage, is invalidated by these domain events, and is never used for payment authorization.

That is the difference between using Redis as a speed hack and using Redis as a production engineering component.

1. Kaufman Skill Decomposition

Target skill:

Design Redis cache consistency behavior explicitly, including invalidation, freshness bounds, stampede control, stale fallback, and operational failure modes.

Sub-skills:

Sub-skill	What you must be able to do
Consistency envelope	Define allowed staleness per user journey
Invalidation design	Decide delete, update, version bump, or event-driven refresh
Versioned cache	Prevent stale overwrite and cross-version pollution
Stampede control	Stop hot misses from overwhelming source systems
Logical expiry	Separate usability expiry from Redis physical expiry
Single-flight	Ensure only one request refreshes a hot key
Stale fallback	Serve controlled stale data during source failures
Negative consistency	Prevent not-found poisoning and creation races
Observability	Measure staleness, refresh failures, and miss amplification
Failure modeling	Know what happens during Redis, DB, and event bus failures

Practice rule for this part:

Every cache pattern must answer: what can be stale, for how long, who refreshes it, who invalidates it, and what happens when refresh fails?

2. The Core Problem: Redis and the Source of Truth Are Separate Systems

Most Java services cache data from a durable source:

PostgreSQL
MySQL
Oracle
MongoDB
Elasticsearch/OpenSearch
external HTTP service
another microservice
event-derived projection

Redis is usually not updated in the same atomic transaction as the source.

The weak point is obvious:

DB commit and Redis invalidation are not one atomic operation.

If the DB commit succeeds but the Redis deletion fails, the cache may remain stale.
If the Redis deletion succeeds but the DB commit later fails, the cache is unnecessarily cold.
If a read and a write race, a stale writer may overwrite a fresh cache value.
If many hot keys expire at once, many clients may hit the DB together.

Therefore consistency must be designed, not assumed.

3. Taxonomy of Cache Consistency Requirements

Not every cache needs the same correctness.

Use case	Correctness need	Example Redis behavior
Product description	Low/medium	TTL + event invalidation
User display name	Medium	TTL + delete after update
Authorization permission	High	short TTL or direct source check for critical actions
Pricing quote	Very high	do not trust cache for final price confirmation
Inventory badge	Medium	stale allowed for browse, not checkout
Feature flag	High	versioned config + short TTL + fallback policy
Fraud/risk decision	High	cache intermediate features, not final irreversible decision
Search result	Medium	index version + TTL + rebuild path
Dashboard stats	Low/medium	precomputed projection + freshness timestamp

A useful rule:

Redis can accelerate decisions, but it should not secretly become the authority for decisions whose wrongness has high business cost.

Ask these questions before choosing a pattern:

What is the source of truth?
Is stale data acceptable?
How stale is acceptable?
Is stale data acceptable during outage?
Can users see stale data, or only internal services?
Can stale data cause financial, legal, security, or compliance damage?
Is read latency more important than freshness?
How expensive is source recomputation?
How often does the data change?
How many clients may request the same key simultaneously?

4. Consistency Envelope

A consistency envelope is a written contract around cached data.

Example:

cache: customer-profile
source: customer_db.customer
owner: customer-service
read_path: customer-service GET /customers/{id}
write_path: customer-service PATCH /customers/{id}
normal_freshness: <= 60 seconds
max_stale_during_source_outage: <= 10 minutes
physical_ttl: 15 minutes
logical_ttl: 60 seconds
refresh_policy: single-flight stale-while-revalidate
invalidation_policy: delete-after-commit on profile update event
critical_paths:
  - never use for KYC enforcement
  - never use for billing address validation

This looks heavy, but for important caches it prevents months of ambiguous production behavior.

Minimal Envelope Template

Cache name:
Source of truth:
Owner service:
Key pattern:
Value schema version:
Allowed normal staleness:
Allowed degraded staleness:
Invalidation trigger:
Refresh trigger:
Fallback when Redis unavailable:
Fallback when source unavailable:
Forbidden usage:
Observability metrics:

If a cache cannot be described this way, the team probably does not understand the cache.

5. Four Kinds of Expiry

Many Redis cache bugs come from mixing different meanings of “expired”.

Expiry type	Meaning
Physical TTL	Redis automatically removes the key
Logical TTL	Application considers value stale after timestamp
Business validity	Domain rule says the value is no longer valid
Client freshness	The caller’s journey requires data no older than X

These are not equivalent.

Example:

{
  "schemaVersion": 3,
  "loadedAtEpochMs": 1783000000000,
  "freshUntilEpochMs": 1783000060000,
  "staleUntilEpochMs": 1783000600000,
  "sourceVersion": 928172,
  "payload": {
    "customerId": "c-123",
    "tier": "GOLD"
  }
}

Redis key TTL may be 15 minutes. Logical freshness may be 60 seconds. Business validity may end when customer tier changes. Client freshness may differ between journeys.

The pattern:

physical TTL > staleUntil > freshUntil

Why?

freshUntil controls normal freshness.
staleUntil allows degraded serving during source failure.
physical TTL bounds memory and cleans abandoned keys.
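
The relationship between the three limits can be sketched as a small value type. This is a minimal illustration, not the lesson's actual class: the record name `CacheEnvelope`, its factory, and the `physicalTtl` helper are assumptions chosen to mirror the JSON fields above.

```java
import java.time.Duration;

/** Hypothetical envelope; field names mirror the JSON example above. */
record CacheEnvelope<T>(T payload, long loadedAtEpochMs, long freshUntilEpochMs, long staleUntilEpochMs) {

    CacheEnvelope {
        if (freshUntilEpochMs > staleUntilEpochMs) {
            throw new IllegalArgumentException("freshUntil must not exceed staleUntil");
        }
    }

    /** Builds an envelope whose windows are measured from the load time. */
    static <T> CacheEnvelope<T> fresh(T payload, long nowEpochMs, Duration freshFor, Duration staleFor) {
        return new CacheEnvelope<>(payload, nowEpochMs,
            nowEpochMs + freshFor.toMillis(), nowEpochMs + staleFor.toMillis());
    }

    /** Normal serving window. */
    boolean isFresh(long nowEpochMs) {
        return nowEpochMs < freshUntilEpochMs;
    }

    /** Degraded serving window: past freshUntil but not yet past staleUntil. */
    boolean isUsablyStale(long nowEpochMs) {
        return nowEpochMs >= freshUntilEpochMs && nowEpochMs < staleUntilEpochMs;
    }

    /** Physical TTL to hand to Redis: remaining stale window plus a cleanup margin. */
    Duration physicalTtl(long nowEpochMs, Duration margin) {
        return Duration.ofMillis(Math.max(0, staleUntilEpochMs - nowEpochMs)).plus(margin);
    }
}
```

With a 60 second fresh window, a 10 minute stale window, and a 5 minute margin, `physicalTtl` at load time yields the 15 minute key TTL used in the example above.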

6. Basic Invalidation: Delete After Commit

The safest default for many write paths:

1. Write source of truth.
2. Commit transaction.
3. Delete Redis cache key.
4. Next read reloads from source.

Why delete instead of update?

Updating the cache after write looks attractive, but delete is often safer:

Approach	Benefit	Risk
Update cache after DB write	Keeps cache warm	stale writer can overwrite fresh value; incomplete projection risk
Delete cache after DB write	Forces fresh reload	next read pays source cost
Version bump	Avoids old key pollution	needs version lookup and cleanup
Event-driven invalidation	decouples writer/read model	event delay/loss must be handled

Delete-after-commit makes the next read rebuild from the actual source.

Java Pseudocode

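import org.springframework.transaction.annotation.Transactional;
import org.springframework.transaction.support.TransactionSynchronization;
import org.springframework.transaction.support.TransactionSynchronizationManager;

@Transactional
public Customer updateCustomer(String customerId, PatchCustomerCommand command) {
    Customer updated = customerRepository.patch(customerId, command);

    // The DB transaction commits when this method returns.
    // Register an after-commit hook so the key is deleted only once the new row is visible.
    TransactionSynchronizationManager.registerSynchronization(new TransactionSynchronization() {
        @Override
        public void afterCommit() {
            // If this delete fails, the DB change is already committed; the physical TTL
            // (or a retry/event repair path) is what bounds the resulting staleness.
            redisTemplate.delete("cache:customer:v3:" + customerId);
        }
    });

    return updated;
}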

Important:

Do not delete before commit unless you deliberately accept refill-before-commit races.

7. The Classic Race: Delete Then Stale Refill

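Consider this sequence:

1. Reader misses the cache and reads the old value from the DB.
2. Writer updates the DB and commits.
3. Writer deletes the cache key.
4. Reader writes the old value it loaded in step 1 into Redis.

Result: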

Redis contains stale old value after writer invalidated it.

This happens when a reader loads old data before the writer commits and writes it after invalidation.

Solutions:

short TTL only
double delete
versioned cache
compare source version before setting cache
logical timestamp check in Lua
cache only after reading committed source version

There is no universal answer. Choose based on data criticality and write/read concurrency.
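
To make the race concrete, the interleaving can be replayed deterministically in a single thread. This is only an illustration: plain maps stand in for the DB and Redis, and the class name and values are made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Single-threaded replay of the interleaving; plain maps stand in for the DB and Redis. */
final class StaleRefillRace {

    static String replay() {
        Map<String, String> db = new HashMap<>();
        Map<String, String> cache = new HashMap<>();
        String key = "customer:c-123";
        db.put(key, "tier=SILVER");

        // 1. Reader misses the cache and reads the old row from the source.
        String readerSnapshot = cache.containsKey(key) ? cache.get(key) : db.get(key);

        // 2. Writer commits the new row.
        db.put(key, "tier=GOLD");

        // 3. Writer invalidates the cache (delete-after-commit).
        cache.remove(key);

        // 4. Reader, delayed until now, writes the snapshot it loaded in step 1.
        cache.put(key, readerSnapshot);

        // The cache now holds the old value although the source holds the new one.
        return cache.get(key);
    }
}
```

`StaleRefillRace.replay()` returns the old tier even though the writer invalidated correctly, which is why delete-after-commit alone bounds staleness only through the TTL.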

8. Double Delete Pattern

Pattern:

1. Delete cache.
2. Write DB.
3. Delete cache again after a small delay.

Or safer in many Java services:

1. Write DB.
2. Commit.
3. Delete cache.
4. Schedule second delete after expected stale refill window.

When it helps:

read-through cache can refill stale data during write race
no source version available
stale risk is moderate
delayed second delete is cheap

When it is weak:

delay is guessed
long source query can exceed delay
event scheduler may fail
repeated writes complicate behavior
it is mitigation, not proof

Double delete is a pragmatic patch. It is not a strong correctness protocol.

Java Sketch

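public void invalidateCustomerAfterCommit(String customerId) {
    String key = customerKey(customerId);

    // First delete: removes whatever was cached before the write.
    redisTemplate.delete(key);

    // Second delete: removes a stale value that a racing reader may have refilled.
    // delayedExecutor is assumed to be a java.util.concurrent.ScheduledExecutorService.
    delayedExecutor.schedule(
        () -> redisTemplate.delete(key),
        750,
        TimeUnit.MILLISECONDS
    );
}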

Better version:

Delay should be based on observed p99 source-read + p99 cache-set latency, not folklore.

9. Versioned Cache Keys

Versioned keys avoid stale overwrite by changing the key namespace when source changes.

Example:

cache:customer:{c-123}:v42

A separate version pointer tells readers which key to use:

cachever:customer:{c-123} = 42
cache:customer:{c-123}:v42 = payload

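Flow:

1. Reader GETs the version pointer cachever:customer:{c-123}.
2. Reader GETs the payload key for that version.
3. On a payload miss, the reader loads from the source and SETs the payload under the version it actually loaded.
4. After a committed write, the writer advances the version pointer, and readers switch to the new payload key.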

Advantages:

old refills write to old key only
latest pointer selects new version
avoids stale overwrite on the active key
useful for materialized projections

Costs:

two Redis reads unless pipelined
old keys require TTL cleanup
version pointer consistency matters
source must expose version or update counter
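
The pointer-plus-payload layout can be modelled in memory to show why a late refill is harmless. This is a sketch under assumptions: a `ConcurrentHashMap` stands in for Redis, and the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class VersionedCustomerCache {
    private final Map<String, String> redis = new ConcurrentHashMap<>(); // stand-in for Redis

    static String versionKey(String id) { return "cachever:customer:{" + id + "}"; }
    static String payloadKey(String id, long v) { return "cache:customer:{" + id + "}:v" + v; }

    /** Writer side, after commit: advance the pointer, never move it backwards. */
    void publishVersion(String id, long sourceVersion) {
        redis.merge(versionKey(id), Long.toString(sourceVersion),
            (old, candidate) -> Long.parseLong(candidate) > Long.parseLong(old) ? candidate : old);
    }

    /** Refill side: a value is stored only under the version it was loaded at. */
    void storePayload(String id, long loadedVersion, String payload) {
        redis.put(payloadKey(id, loadedVersion), payload);
    }

    /** Reader side: pointer first, then the payload for exactly that version. */
    Optional<String> read(String id) {
        String pointer = redis.get(versionKey(id));
        if (pointer == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(redis.get(payloadKey(id, Long.parseLong(pointer))));
    }
}
```

A reader that loaded version 42 and stores it after the pointer moved to 43 only touches the v42 key, so `read` reports a miss for v43 instead of returning the old payload.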

Version Sources

Source	Suitability
DB row `version` column	strong and simple
`updated_at` timestamp	useful but watch clock precision
monotonically increasing sequence	best for strict ordering
domain event offset	good for projection caches
Redis `INCR` version	ok if Redis drives invalidation, weak if DB is source

Java Key Builder

public final class CustomerCacheKeys {
    public static String versionKey(String customerId) {
        return "cachever:customer:{" + customerId + "}";
    }

    public static String payloadKey(String customerId, long version) {
        return "cache:customer:{" + customerId + "}:v" + version;
    }
}

In Redis Cluster, the hash tag {customerId} keeps related keys in the same hash slot. This matters for Lua and multi-key operations.

10. Value-Level Version Guard

If you cannot version the key, version the value.

Payload envelope:

{
  "schemaVersion": 3,
  "sourceVersion": 43,
  "loadedAtEpochMs": 1783000000000,
  "payload": {
    "customerId": "c-123",
    "name": "Ari"
  }
}

Before writing cache:

Only set payload if existing sourceVersion is missing or <= candidate sourceVersion.

This should be atomic. Use Lua or Redis Functions.

Lua Version Guard

-- KEYS[1] = cache key
-- ARGV[1] = candidate source version
-- ARGV[2] = candidate payload
-- ARGV[3] = ttl seconds

local current = redis.call('GET', KEYS[1])
if current then
  local currentVersion = tonumber(string.match(current, '"sourceVersion"%s*:%s*(%d+)'))
  local candidateVersion = tonumber(ARGV[1])

  if currentVersion and currentVersion > candidateVersion then
    return 0
  end
end

redis.call('SET', KEYS[1], ARGV[2], 'EX', ARGV[3])
return 1

Do not parse large JSON in Lua in a hot path if avoidable. A more practical design stores version separately:

cache:customer:{c-123}:value
cache:customer:{c-123}:version

Then a script can compare numeric version cheaply.
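
The set-if-newer rule itself is small. The sketch below is an in-process analogue, assuming a `ConcurrentHashMap` in place of Redis, where `compute` provides the per-key atomicity that Lua provides on the server; the class name and `putIfNewer` method are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class VersionGuardedCache<T> {
    record Versioned<T>(long sourceVersion, T payload) {}

    private final Map<String, Versioned<T>> store = new ConcurrentHashMap<>();

    /** Returns true if the candidate was stored, false if a newer version was already cached. */
    boolean putIfNewer(String key, long candidateVersion, T payload) {
        boolean[] written = {false};
        // compute() runs atomically per key, playing the role of the Lua script.
        store.compute(key, (k, current) -> {
            if (current != null && current.sourceVersion() > candidateVersion) {
                return current; // stale writer loses
            }
            written[0] = true;
            return new Versioned<>(candidateVersion, payload);
        });
        return written[0];
    }

    Versioned<T> get(String key) {
        return store.get(key);
    }
}
```

Equal versions are allowed to overwrite, matching the "missing or <= candidate" rule above; only a strictly newer cached version rejects the write.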

11. Event-Driven Invalidation

When writes happen in one service and reads happen in another, direct invalidation is insufficient.

Pattern:

1. Writer commits source change.
2. Writer emits domain event.
3. Cache-owning service consumes event.
4. Cache-owning service invalidates or refreshes Redis keys.

Important:

Use outbox if losing invalidation events would create unacceptable stale behavior.

Do not rely on “publish event after commit” without a retryable mechanism if correctness matters.

Invalidate or Refresh?

Strategy	When useful	Risk
Invalidate on event	common, simple, source reloads on demand	first read after event pays miss
Refresh on event	hot data stays warm	event consumer may overload source
Version bump on event	strong stale avoidance	needs version pointer and cleanup
Refresh-ahead	predictable hot keys	extra compute and stale scheduling

For most systems:

invalidate on event + refresh on demand

is safer than eagerly refreshing everything.

12. Invalidation Fanout

One source change may affect many cache keys.

Example: product price update affects:

cache:product:{p-1}:detail:v3
cache:category:{cat-7}:products:page:1:v5
cache:search:q:{hash}:page:1:v2
cache:recommendation:user:{u-9}:v4
cache:quote-preview:{tenant}:{cart-hash}:v1

This is where naive caching collapses.

You need a dependency model.

Options

Approach	How it works	Trade-off
Direct key invalidation	writer knows all affected keys	tight coupling, brittle
Tag/index invalidation	maintain reverse index of affected keys	extra write/memory overhead
Version namespace	bump domain version used in key	simple, may orphan old keys
Short TTL	avoid explicit dependency tracking	stale window and source load
Rebuild projection	async refresh derived views	eventual freshness

Version Namespace Example

cachever:product:{p-1} = 18
cachever:category:{cat-7} = 81
cache:category:{cat-7}:products:v81:page:1

When category membership changes:

INCR cachever:category:{cat-7}

Old pages become unreachable and expire physically later.

This is often simpler than tracking every page key.
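
A minimal sketch of the namespace bump, assuming an in-memory counter in place of the Redis version key (`AtomicLong.incrementAndGet` mirrors `INCR`, and an unseen category reads as version 0). The class name is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class CategoryPageKeys {
    // Stand-in for the Redis counters; a missing counter reads as 0, as a missing key would.
    private final Map<String, AtomicLong> versions = new ConcurrentHashMap<>();

    long currentVersion(String categoryId) {
        return versions.computeIfAbsent(categoryId, id -> new AtomicLong(0)).get();
    }

    /** Called when category membership changes; mirrors INCR on the version key. */
    long bump(String categoryId) {
        return versions.computeIfAbsent(categoryId, id -> new AtomicLong(0)).incrementAndGet();
    }

    /** Every page key embeds the current namespace version. */
    String pageKey(String categoryId, int page) {
        return "cache:category:{" + categoryId + "}:products:v"
            + currentVersion(categoryId) + ":page:" + page;
    }
}
```

After one `bump`, every page key for that category changes at once, so no page has to be enumerated or deleted individually.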

13. Cache Stampede Mental Model

A cache stampede happens when many requests miss at the same time and all recompute the same value.

Miss amplification:

source_load = miss_count * rebuild_cost

For hot keys, a single expiration can become a backend outage.

Stampede causes:

synchronized TTL
Redis restart flushes hot keys
deploy changes cache namespace
source outage causes refresh failures
high-cardinality keys with uneven traffic
manual invalidation of many hot keys
short TTL on expensive values

A top engineer treats stampede as a capacity problem, not only a cache problem.

14. TTL Jitter

Without jitter:

10,000 product keys loaded at 10:00
all expire at 10:15
backend spike at 10:15

With jitter:

ttl = baseTtl + random(0, jitterRange)

Example:

Duration ttlWithJitter(Duration base, Duration jitter) {
    long jitterMs = ThreadLocalRandom.current().nextLong(jitter.toMillis() + 1);
    return base.plusMillis(jitterMs);
}

Better for some cases:

ttl = baseTtl * random(0.8, 1.2)

Guideline:

Situation	Jitter
low traffic, cheap source	optional
many keys loaded in batch	mandatory
hot keys with same TTL	mandatory
scheduled refresh job	mandatory
cache namespace migration	mandatory

Jitter spreads load. It does not solve hot-key recomputation by itself.
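
The multiplicative form `ttl = baseTtl * random(0.8, 1.2)` could be written as follows. This is a sketch; the class name and the `spread` parameter are assumptions made for the example.

```java
import java.time.Duration;
import java.util.concurrent.ThreadLocalRandom;

final class TtlJitter {

    /** Scales base by a random factor in [1 - spread, 1 + spread), e.g. spread = 0.2 for 0.8..1.2. */
    static Duration proportional(Duration base, double spread) {
        if (spread < 0 || spread >= 1) {
            throw new IllegalArgumentException("spread must be in [0, 1)");
        }
        // nextDouble(origin, bound) requires origin < bound, so handle spread == 0 separately.
        double factor = spread == 0
            ? 1.0
            : ThreadLocalRandom.current().nextDouble(1 - spread, 1 + spread);
        return Duration.ofMillis(Math.round(base.toMillis() * factor));
    }
}
```

Proportional jitter keeps the spread meaningful across cache families: a 30 second TTL varies by a few seconds, while a 15 minute TTL varies by a few minutes.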

15. Single-Flight Refresh

Single-flight means:

For a given key, only one caller rebuilds the value; others wait, return stale, or fail fast.

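Basic lock-based flow:

1. Read the cache; return immediately if the value is fresh.
2. On miss or stale, try to acquire a per-key refresh lock.
3. The lock holder re-checks the cache, loads from the source, writes the cache, and releases the lock.
4. Callers that do not get the lock return stale data, wait for a bounded time, or fail fast.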

Redis lock:

SET lock:refresh:customer:{c-123} <token> NX PX 5000

Rules:

lock TTL must be longer than expected refresh p99
lock token must be random
release only if token matches
never block request threads indefinitely
define fallback for lock wait timeout

Safe Unlock Lua

if redis.call('GET', KEYS[1]) == ARGV[1] then
  return redis.call('DEL', KEYS[1])
else
  return 0
end

This prevents one worker from deleting another worker’s lock after timeout/reacquire.
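
The Redis lock coordinates across service instances. Within one JVM, the same idea can be applied before Redis is even contacted, so that each instance sends at most one rebuild per key. The sketch below is an in-process analogue, not a replacement for the Redis lock; the class name is hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

final class InProcessSingleFlight<T> {
    private final ConcurrentMap<String, CompletableFuture<T>> inFlight = new ConcurrentHashMap<>();

    /** Concurrent callers for the same key share one loader invocation. */
    CompletableFuture<T> load(String key, Supplier<T> loader) {
        CompletableFuture<T> mine = new CompletableFuture<>();
        CompletableFuture<T> existing = inFlight.putIfAbsent(key, mine);
        if (existing != null) {
            return existing; // someone else is already rebuilding this key
        }
        try {
            // The first caller runs the loader on its own thread.
            mine.complete(loader.get());
        } catch (RuntimeException e) {
            mine.completeExceptionally(e);
        } finally {
            inFlight.remove(key, mine); // allow later refreshes
        }
        return mine;
    }
}
```

Followers should wait on the returned future with a timeout (for example `future.get(200, TimeUnit.MILLISECONDS)`) and fall back to stale data or a fast failure, in line with the rule that request threads never block indefinitely.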

16. Single-Flight Java Sketch

public Optional<CustomerDto> getCustomer(String customerId) {
    String cacheKey = customerKey(customerId);
    String lockKey = "lock:refresh:customer:{" + customerId + "}";

    Optional<CacheEnvelope<CustomerDto>> cached = redisCache.get(cacheKey, customerType);
    if (cached.isPresent() && cached.get().isFresh(clock)) {
        return Optional.of(cached.get().payload());
    }

    String token = UUID.randomUUID().toString();
    boolean lockAcquired = redisLock.tryAcquire(lockKey, token, Duration.ofSeconds(5));

    if (lockAcquired) {
        try {
            // Double-check after acquiring lock.
            Optional<CacheEnvelope<CustomerDto>> afterLock = redisCache.get(cacheKey, customerType);
            if (afterLock.isPresent() && afterLock.get().isFresh(clock)) {
                return Optional.of(afterLock.get().payload());
            }

            CustomerDto loaded = customerRepository.findDto(customerId)
                .orElseThrow(() -> new NotFoundException(customerId));

            CacheEnvelope<CustomerDto> envelope = CacheEnvelope.fresh(
                loaded,
                clock.instant(),
                Duration.ofSeconds(60),
                Duration.ofMinutes(10)
            );

            redisCache.set(cacheKey, envelope, Duration.ofMinutes(15));
            return Optional.of(loaded);
        } finally {
            redisLock.releaseIfTokenMatches(lockKey, token);
        }
    }

    if (cached.isPresent() && cached.get().isUsablyStale(clock)) {
        metrics.increment("redis.cache.stale_served", "cache", "customer");
        return Optional.of(cached.get().payload());
    }

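    // Last resort: short local wait, then retry the cache once.
    // Still reject values past their stale limit instead of returning them unchecked.
    sleepSmallBoundedDelay();
    return redisCache.get(cacheKey, customerType)
        .filter(e -> e.isFresh(clock) || e.isUsablyStale(clock))
        .map(CacheEnvelope::payload);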
}

Key points:

read before lock
acquire lock only on miss/stale
double-check after lock
stale fallback if another worker refreshes
bounded wait only
no infinite retry loop

17. Stale-While-Revalidate

Stale-while-revalidate separates response availability from refresh latency.

State model:

fresh      -> return immediately
stale      -> return stale and trigger refresh
too stale  -> block on refresh or fail
missing    -> block on refresh or fail

Value envelope:

{
  "loadedAtEpochMs": 1783000000000,
  "freshUntilEpochMs": 1783000060000,
  "staleUntilEpochMs": 1783000600000,
  "payload": {}
}

Behavior:

State	Action
fresh	return value
stale but usable	return stale; one worker refreshes async
too stale	try foreground refresh
refresh fails and stale usable	return stale with metric
refresh fails and too stale	fail or degraded response
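
The first three rows of the table reduce to a pure decision function, which is easy to unit test separately from any Redis access. This is a minimal sketch; the enum and method names are assumptions, and the failure rows are left to the caller's fallback policy.

```java
final class StaleWhileRevalidate {
    enum State { FRESH, STALE_USABLE, TOO_STALE, MISSING }
    enum Action { RETURN_VALUE, RETURN_STALE_AND_REFRESH_ASYNC, FOREGROUND_REFRESH }

    /** Classifies a cached entry; pass null timestamps for a missing key. */
    static State classify(Long freshUntilEpochMs, Long staleUntilEpochMs, long nowEpochMs) {
        if (freshUntilEpochMs == null || staleUntilEpochMs == null) {
            return State.MISSING;
        }
        if (nowEpochMs < freshUntilEpochMs) {
            return State.FRESH;
        }
        if (nowEpochMs < staleUntilEpochMs) {
            return State.STALE_USABLE;
        }
        return State.TOO_STALE;
    }

    /** Maps each state to the action the request path takes before any refresh outcome is known. */
    static Action actionFor(State state) {
        return switch (state) {
            case FRESH -> Action.RETURN_VALUE;
            case STALE_USABLE -> Action.RETURN_STALE_AND_REFRESH_ASYNC;
            case TOO_STALE, MISSING -> Action.FOREGROUND_REFRESH;
        };
    }
}
```

Keeping classification separate from I/O means the freshness policy can be tested with fixed timestamps, without a clock or a Redis instance.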

This pattern is excellent for:

profile summary
product detail
CMS page
dashboard widgets
risk features that tolerate degraded freshness
external API response cache

It is dangerous for:

final price authorization
payment state
permission revocation
inventory reservation
compliance decision

18. Refresh-Ahead

Refresh-ahead means refreshing hot keys before clients observe expiry.

Example:

if now > freshUntil - refreshAheadWindow:
    trigger refresh in background

Use it for:

known hot keys
expensive source queries
predictable dashboards
product homepage data
configuration snapshots

Avoid it for:

unbounded high-cardinality keyspace
rarely used keys
source systems with tight capacity
data where refresh does not matter until requested

Refresh-ahead requires admission control. Without admission control it becomes a cache warming DDoS against your own database.

Hot-Key Refresh Queue

Sorted set key:

cache-refresh:customer-profile:due

Member:

customer:{c-123}

Score:

epoch millis refresh due time
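
The due-time queue can be modelled in memory to show how a batch limit acts as admission control. This is a sketch with assumed names; against Redis the same operations would typically be ZADD to schedule and a score-range read with a limit to fetch due members.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RefreshQueue {
    private final Map<String, Long> dueAtByMember = new HashMap<>(); // member -> score

    /** Like ZADD: re-adding a member replaces its due time. */
    synchronized void schedule(String member, long dueAtEpochMs) {
        dueAtByMember.put(member, dueAtEpochMs);
    }

    /** Removes and returns at most maxBatch members whose due time has passed, earliest first. */
    synchronized List<String> pollDue(long nowEpochMs, int maxBatch) {
        List<String> due = new ArrayList<>();
        dueAtByMember.entrySet().stream()
            .filter(e -> e.getValue() <= nowEpochMs)
            .sorted(Map.Entry.comparingByValue())
            .limit(maxBatch)
            .forEach(e -> due.add(e.getKey()));
        due.forEach(dueAtByMember::remove);
        return due;
    }
}
```

The `maxBatch` cap is the admission control: however many keys are overdue, one polling cycle sends at most that many refreshes to the source.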

19. Negative Cache Consistency

Negative cache stores “not found” results.

Example:

cache:customer:{c-404}:negative = { "reason": "not_found" }

Benefits:

blocks repeated DB hits for missing IDs
protects source from malicious/random probes
reduces expensive external API calls

Risks:

newly created record hidden until negative TTL expires
permission-sensitive not-found may leak semantics
invalidation on creation is often forgotten

Rules:

Rule	Why
Use short TTL	not-found can become found
Include scope	tenant/user/permission affects visibility
Invalidate on create	creation must remove negative marker
Separate not-found vs forbidden	avoid security confusion
Do not cache transient source errors as not-found	prevents false negative poisoning

Negative key pattern:

cache:customer:{tenant-a:c-123}:neg:v1

TTL examples:

Data	Negative TTL
random ID lookup	30s-5m
external API missing object	1m-15m
user-created object	5s-30s
security-sensitive resource	often do not cache negative result
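
Three of the rules (short TTL, tenant scope in the key, invalidate on create) fit into a small sketch. It assumes an in-memory map in place of Redis and a caller-supplied clock value; the class, enum, and method names are hypothetical.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class NegativeAwareLookup {
    enum Outcome { FOUND, NOT_FOUND_FROM_SOURCE, NOT_FOUND_FROM_NEGATIVE_CACHE }

    private final Map<String, Long> negativeUntil = new ConcurrentHashMap<>(); // key -> expiry epoch ms
    private final long negativeTtlMs;

    NegativeAwareLookup(long negativeTtlMs) {
        this.negativeTtlMs = negativeTtlMs;
    }

    /** The tenant is part of the key so one tenant's not-found never hides another tenant's row. */
    static String negativeKey(String tenant, String id) {
        return "cache:customer:{" + tenant + ":" + id + "}:neg:v1";
    }

    Outcome lookup(String tenant, String id, long nowMs, Function<String, Optional<String>> source) {
        String key = negativeKey(tenant, id);
        Long until = negativeUntil.get(key);
        if (until != null && nowMs < until) {
            return Outcome.NOT_FOUND_FROM_NEGATIVE_CACHE;
        }
        // A source exception propagates from here, so transient errors are never cached as not-found.
        if (source.apply(id).isPresent()) {
            return Outcome.FOUND;
        }
        negativeUntil.put(key, nowMs + negativeTtlMs);
        return Outcome.NOT_FOUND_FROM_SOURCE;
    }

    /** Call after the create transaction commits. */
    void onCreated(String tenant, String id) {
        negativeUntil.remove(negativeKey(tenant, id));
    }
}
```

If `onCreated` is missed, the short TTL is what bounds how long a new record stays hidden, which is why both rules are needed together.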

20. Permission-Aware Cache Keys

One of the worst cache consistency bugs is missing authorization dimensions.

Bad key:

cache:case-detail:{case-123}

If the payload depends on the viewer, this is wrong.

Better:

cache:case-detail:{tenant-a:case-123}:viewer-role:{role-hash}:v4

Or split the cache:

cache:case-public-summary:{tenant-a:case-123}:v2
cache:case-sensitive-fields:{tenant-a:case-123}:permission-snapshot:{hash}:v1

A cache key must include every input that changes the output:

tenant
locale
currency
role
entitlement snapshot
experiment bucket
API version
projection version
source version
feature flag version

Invariant:

If two requests can legitimately receive different responses, they must not share the same cached value unless the cached value contains only their common subset.
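
One way to keep such keys consistent is to build them in a single place and derive the role component from a canonical, order-independent hash. The sketch below is an assumption about how that could look; the class name, the 8-byte truncation, and the choice of SHA-256 are illustrative, and the braces around the role hash are dropped because only the first `{...}` acts as the Redis Cluster hash tag.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collection;
import java.util.TreeSet;

final class CaseDetailKeys {

    /** Order-independent short hash of the viewer's roles. */
    static String roleHash(Collection<String> roles) {
        String canonical = String.join(",", new TreeSet<>(roles)); // sorted, de-duplicated
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(canonical.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                hex.append(String.format("%02x", digest[i]));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    /** Tenant and case ID form the hash tag; role hash and projection version follow. */
    static String detailKey(String tenant, String caseId, Collection<String> roles, int projectionVersion) {
        return "cache:case-detail:{" + tenant + ":" + caseId + "}:viewer-role:"
            + roleHash(roles) + ":v" + projectionVersion;
    }
}
```

Sorting the roles before hashing matters: two viewers with the same roles in a different order should share a cached value, while any difference in the role set must produce a different key.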

21. Local In-Process Cache + Redis Cache

Many Java systems use two levels:

Caffeine local cache -> Redis -> DB

This improves latency, but complicates invalidation.

Problems:

Redis invalidation does not automatically clear local caches
local cache may serve stale after Redis changed
instance restart changes behavior
Pub/Sub invalidation is at-most-once

Safer design:

Layer	TTL	Role
local Caffeine	very short, e.g. 1-5 seconds	absorb microbursts
Redis	longer, e.g. 1-15 minutes	distributed cache
DB/source	authoritative	correctness

Do not put long TTL in local memory unless you have reliable invalidation.

Local Cache Rule

local_ttl <= smallest acceptable staleness window

For permission/security data, local cache should often be extremely short or disabled.

22. Cache Rebuild Admission Control

When Redis misses, you need to protect the source.

Admission strategies:

Strategy	Behavior
single-flight per key	one rebuild per key
global rebuild semaphore	cap total concurrent source loads
per-tenant rebuild limit	prevent noisy tenant overload
priority queue	rebuild critical caches first
fail fast	reject low-priority rebuilds during pressure
stale fallback	serve stale instead of rebuilding

Java sketch:

public <T> T loadWithAdmission(String cacheName, Supplier<T> loader) {
    if (!rebuildLimiter.tryAcquire()) {
        throw new CacheRebuildRejectedException(cacheName);
    }
    try {
        return loader.get();
    } finally {
        rebuildLimiter.release();
    }
}

Metric:

cache_rebuild_rejected_total{cache="customer-profile"}

If this metric is non-zero, the rebuild limiter is protecting the source by rejecting cache rebuilds. That is often better than taking the source down.

23. Cache Miss Is Not One Thing

Do not observe only hit and miss.

A miss can mean:

Miss type	Meaning	Action
cold miss	never loaded	normal load
expired miss	physical TTL removed key	maybe stampede risk
invalidated miss	writer deleted key	expected source load
version miss	new version pointer, payload absent	rebuild latest
negative miss	no negative marker	source check needed
decode miss	value exists but cannot decode	delete and reload
too-stale miss	logical stale limit exceeded	foreground refresh
Redis error miss	cache unavailable	fallback or fail

Use reason codes in cache abstraction:

public enum CacheReadStatus {
    FRESH_HIT,
    STALE_HIT,
    MISS_ABSENT,
    MISS_TOO_STALE,
    MISS_DECODE_ERROR,
    MISS_REDIS_ERROR,
    NEGATIVE_HIT
}

Without reason codes, cache metrics become misleading.

24. Stale Overwrite Prevention

The stale overwrite problem:

Old reader loads source version 10 slowly.
New writer updates source to version 11.
Fast reader caches version 11.
Old reader finishes and overwrites cache with version 10.

Prevention options:

Option	Strength	Cost
versioned key	high	version pointer + old key cleanup
Lua compare version	high	envelope/version management
short TTL	low	stale window remains
delete on write	medium	race remains
double delete	medium	guessed delay

Preferred for important data:

sourceVersion + atomic set-if-newer

Separate Version Key Lua

-- KEYS[1] = value key
-- KEYS[2] = version key
-- ARGV[1] = candidate version
-- ARGV[2] = payload
-- ARGV[3] = ttl seconds

local currentVersion = redis.call('GET', KEYS[2])
local candidateVersion = tonumber(ARGV[1])

if currentVersion and tonumber(currentVersion) > candidateVersion then
  return 0
end

redis.call('SET', KEYS[1], ARGV[2], 'EX', ARGV[3])
redis.call('SET', KEYS[2], ARGV[1], 'EX', ARGV[3])
return 1

Cluster note:

cache:customer:{c-123}:value
cache:customer:{c-123}:version

The hash tag ensures both keys are in the same slot.

25. Freshness Budget

A freshness budget is the maximum age tolerated by a path.

Example:

Path	Allowed age
product browse	5 minutes
product detail price preview	30 seconds
checkout price confirmation	0 seconds or source-authoritative
admin config page	10 seconds
homepage recommendation	15 minutes
fraud feature cache	1-5 minutes depending on feature

Do not attach one TTL to an entity globally. The same entity may have different freshness budgets in different journeys.

Better:

cache:product-summary:{p-1}:browse:v3        TTL 5m
cache:product-price-preview:{p-1}:v5         TTL 30s
no-cache final checkout price authority      source read

This avoids over-constraining cheap paths and under-protecting critical paths.

26. Cache Key Versioning for Deployments

Schema changes require cache versioning.

Bad:

cache:customer:{id}

Better:

cache:customer:v3:{id}

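With Redis Cluster, wrap the ID in a hash tag so that all keys for one entity share a hash slot:

cache:customer:{id}:v3

Only the text inside the first {...} is hashed, so the position of the tag within the key does not affect slot assignment.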

Use version bump when:

serialized schema changes incompatibly
value meaning changes
key dimension changes
source query changes materially
permission model changes

Do not use version bump casually for every deploy. It can cold-start your entire cache.

Migration Strategy

Strategy	Behavior
read old, write new	gradual migration
dual write	higher write cost
namespace cutover	simple but cold start risk
prewarm	reduces cold start, adds source load
fallback decoder	supports mixed versions

For high traffic caches, use gradual migration:

GET v3 -> if miss GET v2 -> transform -> SET v3

Then expire v2 naturally.
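
The read-old/write-new path could look like the sketch below. It assumes an in-memory map in place of Redis and a caller-supplied `v2ToV3` transform; the class name and the `putRaw` helper exist only for the example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

final class ReadOldWriteNew {
    private final Map<String, String> redis = new ConcurrentHashMap<>(); // stand-in for Redis
    private final UnaryOperator<String> v2ToV3;

    ReadOldWriteNew(UnaryOperator<String> v2ToV3) {
        this.v2ToV3 = v2ToV3;
    }

    /** Test/demo helper: seed a raw key as an older deployment would have written it. */
    void putRaw(String key, String value) {
        redis.put(key, value);
    }

    Optional<String> get(String id) {
        String v3Key = "cache:customer:{" + id + "}:v3";
        String v3 = redis.get(v3Key);
        if (v3 != null) {
            return Optional.of(v3);
        }
        String v2 = redis.get("cache:customer:{" + id + "}:v2");
        if (v2 == null) {
            return Optional.empty(); // true miss: caller loads from source
        }
        String upgraded = v2ToV3.apply(v2);
        redis.put(v3Key, upgraded); // the v2 entry is left to expire by TTL
        return Optional.of(upgraded);
    }
}
```

This only works when v2 values can be transformed without consulting the source; if the new schema needs data the old value never had, treat the v2 hit as a miss instead.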

27. Redis Unavailable: What Should Happen?

Every cache user needs a Redis failure policy.

Cache role	Redis down behavior
performance-only cache	bypass Redis, read source with admission control
source-protecting cache	degrade or fail fast to avoid DB collapse
coordination lock	do not pretend lock succeeded
idempotency marker	fail closed or use source uniqueness
rate limiter	fail open or fail closed based on risk
feature flag cache	use last-known-good local snapshot

For cache consistency, the most dangerous policy is accidental:

catch RedisException -> call DB with no limit

Under Redis outage, every service instance may stampede the DB.

Safer:

circuit breaker around Redis
bounded source fallback
per-cache fallback policy
stale local snapshot for selected caches
global rebuild semaphore
degraded response for non-critical data

28. Source Unavailable: What Should Happen?

If source is down and Redis has stale value, should you serve it?

Depends.

Data	Serve stale during source outage?
product content	yes, bounded
customer display profile	often yes
permissions	rarely, or very short bounded
payment status	usually no
rate plan definition	maybe, if effective date included
fraud rule	maybe last-known-good if governed
final quote price	no unless business explicitly accepts

Stale serving should be logged and measured:

cache_stale_served_total{cache="product-detail", reason="source_failure"}
cache_stale_age_seconds{cache="product-detail"}

Do not alert on every stale serve; alert on excessive stale age or volume.

29. Testing Cache Consistency

Unit tests are not enough. You need concurrency and failure tests.

Test Cases

Test	What it proves
concurrent read miss single-flight	only one source call
read/write race	stale overwrite prevented or bounded
Redis delete failure after DB commit	stale bounded by TTL/event repair
source failure with stale value	stale fallback policy works
source failure without stale value	correct failure response
decode failure	cache is deleted/reloaded
namespace migration	old value not decoded incorrectly
negative cache create race	creation invalidates negative marker
Redis cluster cross-slot	Lua/multi-key script keys co-located

Deterministic Race Test Sketch

@Test
void staleReaderMustNotOverwriteNewerCacheValue() throws Exception {
    CountDownLatch readerLoadedOld = new CountDownLatch(1);
    CountDownLatch writerCommittedNew = new CountDownLatch(1);

    CompletableFuture<Void> oldReader = CompletableFuture.runAsync(() -> {
        CustomerDto old = repository.loadVersion(10);
        readerLoadedOld.countDown();
        await(writerCommittedNew);
        cache.putIfNewer("c-123", 10, old);
    });

    CompletableFuture<Void> writer = CompletableFuture.runAsync(() -> {
        await(readerLoadedOld);
        repository.updateToVersion(11);
        cache.putIfNewer("c-123", 11, repository.loadVersion(11));
        writerCommittedNew.countDown();
    });

    CompletableFuture.allOf(oldReader, writer).join();

    assertThat(cache.get("c-123").sourceVersion()).isEqualTo(11);
}

This is the kind of test that reveals senior-level cache bugs.

30. Operational Metrics

Minimum metrics:

cache_read_total{cache,status}
cache_write_total{cache,status}
cache_delete_total{cache,status}
cache_hit_ratio{cache}
cache_stale_served_total{cache,reason}
cache_stale_age_seconds{cache}
cache_rebuild_total{cache,status}
cache_rebuild_duration_seconds{cache}
cache_rebuild_inflight{cache}
cache_lock_acquire_total{cache,status}
cache_lock_wait_duration_seconds{cache}
cache_decode_failure_total{cache,schemaVersion}
cache_payload_bytes{cache}
cache_negative_hit_total{cache}
cache_source_fallback_total{cache,reason}

Redis/server-side signals:

instantaneous_ops_per_sec
used_memory
evicted_keys
expired_keys
keyspace_hits
keyspace_misses
connected_clients
blocked_clients
cmdstat_get usec_per_call (INFO commandstats)
cmdstat_set usec_per_call (INFO commandstats)
slow log length (SLOWLOG LEN)
latency spike events (LATENCY LATEST)

Business-facing signals:

checkout_price_revalidation_failure_total
permission_cache_stale_denied_total
permission_cache_stale_allowed_total
quote_preview_stale_age_seconds
customer_profile_cache_age_seconds

Do not stop at hit ratio. A high hit ratio can still hide stale, wrong, or oversized values.

31. Pattern Selection Matrix

Problem	Recommended pattern
normal product read cache	cache-aside + TTL jitter + delete-after-commit
hot expensive key	stale-while-revalidate + single-flight
source version available	versioned key or set-if-newer
many derived keys affected	namespace version bump
creation race with missing objects	short negative cache + invalidate on create
cache schema migration	versioned key + read-old/write-new
cross-service writes	outbox event invalidation
Redis outage risk to DB	source fallback admission control
stale overwrite unacceptable	atomic version guard
local cache above Redis	very short local TTL + invalidation hint only

32. Production Checklist

Before approving a Redis cache in a serious Java system:

33. Anti-Patterns

Anti-pattern 1 — TTL as a Guess

TTL = 1 hour because it feels reasonable

Better:

TTL = based on freshness budget + source load + stale fallback policy

Anti-pattern 2 — Cache Key Missing Security Context

cache:case:{caseId}

Better:

cache:case-summary:{tenantId:caseId}:visibility:{permissionHash}:v2

Anti-pattern 3 — Redis Exception Means Unlimited DB Fallback

catch (RedisException e) {
    return db.load(id);
}

Better:

catch (RedisException e) {
    return sourceFallbackLimiter.executeOrDegrade(() -> db.load(id));
}
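
The original does not define `sourceFallbackLimiter`. One possible shape, shown here as an assumption rather than the author's implementation, is a semaphore-backed limiter that returns an empty result when no permit is free, so that the caller can choose a degraded response.

```java
import java.util.Optional;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

final class SourceFallbackLimiter {
    private final Semaphore permits;

    SourceFallbackLimiter(int maxConcurrentSourceLoads) {
        this.permits = new Semaphore(maxConcurrentSourceLoads);
    }

    /** Runs the loader if a permit is free; otherwise returns empty so the caller can degrade. */
    <T> Optional<T> executeOrDegrade(Supplier<T> loader) {
        if (!permits.tryAcquire()) {
            return Optional.empty(); // source is at its fallback budget
        }
        try {
            return Optional.ofNullable(loader.get());
        } finally {
            permits.release();
        }
    }
}
```

Because `tryAcquire` never blocks, a Redis outage turns excess load into fast degraded responses instead of a queue of threads waiting on the database. A caller that must distinguish "rejected" from "loaded but absent" would need a richer result type than `Optional`.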

Anti-pattern 4 — Hit Ratio as the Only Cache KPI

A cache can have 99% hit ratio and still serve unauthorized or stale data.

Add:

stale age
decode failures
source fallback rate
lock contention
rebuild duration
forbidden-path usage guard

Anti-pattern 5 — Pub/Sub as Reliable Invalidation

Redis Pub/Sub is an ephemeral signaling mechanism. Treat Pub/Sub invalidation as a hint unless you have another repair mechanism such as TTL, versioning, or event replay.

34. Engineering Heuristics

Use these defaults unless you have a reason not to:

Prefer cache-aside for simple derived reads.
Prefer delete-after-commit over update-after-commit.
Prefer versioned keys for high-write/high-read race-prone caches.
Prefer short local cache TTL above Redis.
Prefer stale-while-revalidate for hot expensive reads where stale is acceptable.
Prefer source-authoritative reads for final irreversible decisions.
Prefer outbox-driven invalidation when writer and reader are separate services.
Prefer TTL jitter on all batch-loaded or hot cache families.
Prefer explicit degraded behavior over accidental best effort.

The top 1% skill is not knowing many cache patterns. The top 1% skill is knowing which pattern is safe under the actual failure modes.

35. Part Summary

Cache consistency is not binary. It is an envelope containing freshness, invalidation, refresh, fallback, and observability.

Key takeaways:

Redis and the source of truth are usually not updated atomically together.
Define consistency envelope per cache and per user journey.
Physical TTL, logical TTL, business validity, and client freshness are different concepts.
Delete-after-commit is often the safest default invalidation strategy.
Double delete reduces some races but is not strong correctness.
Versioned keys and version-guarded writes prevent stale overwrites.
Event-driven invalidation needs reliable publication, often via outbox.
Stampede control is mandatory for hot keys.
Stale-while-revalidate improves availability when stale data is acceptable.
Negative cache must be short-lived and invalidated on create.
Cache keys must include security and personalization dimensions.
Redis outage must not become uncontrolled DB fallback.
Measure stale age, rebuild pressure, decode errors, and fallback behavior.

Next part:

Part 016 — Idempotency, Deduplication, and Exactly-Once Illusions

References

Redis Docs — SET command: https://redis.io/docs/latest/commands/set/
Redis Docs — EXPIRE command and options: https://redis.io/docs/latest/commands/expire/
Redis Docs — TTL command: https://redis.io/docs/latest/commands/ttl/
Redis Docs — Scripting with Lua: https://redis.io/docs/latest/develop/programmability/eval-intro/
Redis Docs — Redis Functions: https://redis.io/docs/latest/develop/programmability/functions-intro/
Redis Docs — Distributed locks with Redis: https://redis.io/docs/latest/develop/clients/patterns/distributed-locks/
Redis Blog — Thundering herd/cache stampede: https://redis.io/blog/how-to-tame-the-thundering-herd-problem/
