
Vector Search and AI-Oriented Redis Patterns

Learn Java Redis In Action - Part 023

Production-oriented Redis vector search, Redis 8 Vector Sets, Redis Query Engine vector indexes, embedding cache, semantic cache, RAG retrieval, recommendation patterns, Java integration, and AI-serving failure models.


Part 023 — Vector Search and AI-Oriented Redis Patterns

Part 022 covered Time Series and probabilistic structures: bounded approximation for low-latency decisions. This part moves into a newer Redis capability area:

Redis as a low-latency vector and AI-serving layer.

Do not frame this as “Redis replaces every vector database”. That is the wrong mental model.

A better framing:

Redis is useful when vector similarity is part of a latency-sensitive serving path, especially when you also need cache semantics, metadata filters, TTL, counters, feature state, and Java application integration close to the request path.

Redis now has two broad vector-oriented shapes you need to understand:

  1. Redis Query Engine vector indexes over Hash or JSON documents.
  2. Redis 8 Vector Sets, a native vector set data type with commands such as VADD, VSIM, and membership/inspection commands.

They solve overlapping but not identical problems. A senior engineer must be able to choose between them without hype.


1. Kaufman Skill Decomposition

The skill is not “run a vector query”. The real skill is:

Design a bounded, observable, version-aware semantic serving path where embeddings, metadata, cache state, model versions, and retrieval correctness are explicit.

Breakdown:

Sub-skill | What you must be able to do
Embedding mental model | Explain vectors, dimensions, distance metrics, normalization, model versioning, and drift
Redis vector options | Choose between Vector Sets and Query Engine vector indexes
Data modeling | Decide key structure, metadata shape, TTL, tenant boundary, and deletion strategy
Java integration | Encode vectors safely, batch writes, run queries, and hide client-specific API churn
Semantic cache design | Cache LLM/query results by semantic similarity without serving wrong answers
RAG retrieval design | Store chunk metadata, retrieve candidates, filter by tenant/security, and rerank safely
Recommendation design | Use similarity, co-occurrence, sorted sets, and metadata filters together
Consistency | Handle document updates, embedding re-generation, stale vectors, and model migration
Performance | Budget embedding generation, vector indexing, query latency, memory, payload, and result count
Operations | Monitor recall, hit rate, drift, memory, index build time, hot tenants, and fallback behavior

Kaufman-style target for this part:

After this part, you should be able to design and implement a Redis-backed semantic lookup or RAG retrieval path in Java, explain why Redis was chosen, and state the correctness/performance boundary clearly.


2. Core Mental Model: Vector Search Is Approximate Meaning Lookup

Traditional lookup:

key = exact identifier
value = exact object

Vector lookup:

query text/image/item -> embedding vector -> nearest vectors -> candidate objects
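To make "nearest vectors" concrete, here is a minimal exact (brute-force) cosine top-K in plain Java. It only illustrates the similarity math. Redis does not work this way internally: approximate indexes such as HNSW trade exactness for speed, which is the recall trade-off discussed below. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: exact nearest-neighbor search by cosine similarity. */
final class BruteForceCosine {
    private BruteForceCosine() {}

    record Match(String id, double similarity) {}

    /** Cosine similarity = dot(a, b) / (|a| * |b|); rejects mismatched or zero vectors. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            throw new IllegalArgumentException("zero vector has no direction");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Scores every stored vector (O(n * d) per query) and keeps the k most similar. */
    static List<Match> topK(float[] query, Map<String, float[]> vectors, int k) {
        List<Match> all = new ArrayList<>();
        for (Map.Entry<String, float[]> e : vectors.entrySet()) {
            all.add(new Match(e.getKey(), cosine(query, e.getValue())));
        }
        all.sort(Comparator.comparingDouble(Match::similarity).reversed());
        return all.subList(0, Math.min(k, all.size()));
    }
}
```

The linear scan cost is the reason vector indexes exist. Even so, an exact scan like this makes a useful reference for measuring the recall of an approximate index on a small sample.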

The key difference:

Vector search does not prove truth. It returns nearby representations.

This matters because Redis may return the nearest item in embedding space, but:

  • nearest does not mean correct
  • similar does not mean authorized
  • semantically close does not mean up to date
  • embedding model changes can invalidate previous similarity behavior
  • approximate nearest-neighbor indexes can trade recall for speed
  • metadata filters can drastically change candidate quality

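For production systems, vector search is usually one stage in a pipeline: normalize and embed the input, retrieve nearest candidates, filter by tenant, security scope, and freshness, rerank, and only then build the response.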

Redis is strongest when the query path needs to be fast, stateful, and close to the Java service.


3. Redis Vector Capability Map

3.1 Redis Query Engine Vector Index

Use this when you need:

  • vector fields inside Hash or JSON documents
  • metadata filtering
  • text search + vector search hybrid queries
  • numeric/tag/geospatial filters
  • document-style indexing
  • richer query syntax
  • more explicit schema/index management

Example conceptual document:

Key: doc:{tenantId}:{docId}:{chunkId}
Type: Hash or JSON
Fields:
  tenant_id      = acme
  doc_id         = policy-2026-001
  chunk_id       = 017
  content        = "Refund requests older than 90 days..."
  embedding      = <float32 vector bytes or JSON array>
  model_version  = text-embedding-v4
  security_scope = compliance-team
  updated_at     = 2026-07-02T12:00:00Z

The Query Engine style is closer to a search database. You define indexes, fields, vector dimensions, a distance metric, and query expressions.

3.2 Redis 8 Vector Sets

Use this when you need:

  • simpler native vector collection semantics
  • nearest-neighbor lookup by vector set member
  • low-friction vector similarity for online features
  • item-to-item recommendation style lookup
  • semantic matching where metadata filtering is simple or external
  • embedding collections that behave like a Redis-native structure

Conceptual shape:

Key: vset:tenant:{tenantId}:product-embedding:model:{modelVersion}
Type: Vector Set
Members:
  product:1001 -> [0.021, -0.44, ...]
  product:1002 -> [0.019, -0.39, ...]
  product:1003 -> [0.70,  0.02, ...]

A Vector Set is simpler. Its elements can carry small attributes for filtering, but richer metadata, popularity, and active state still live in other keys:

vset:tenant:acme:product:model:v4       -> vector set
product:tenant:acme:1001                -> product hash/json
z:tenant:acme:product:popularity        -> sorted set
set:tenant:acme:product:active          -> active product ids

3.3 Decision Matrix

Need | Prefer
Vector + rich metadata filtering | Query Engine vector index
Vector + text search hybrid | Query Engine vector index
JSON/Hash document search | Query Engine vector index
Simple item-to-item nearest neighbors | Vector Set
Online recommendation candidate generation | Vector Set or Query Engine, depending on filters
RAG chunk retrieval with security filters | Query Engine vector index
Semantic cache with bounded candidates | Vector Set can be enough
Strict audit/reporting truth | Neither alone; pair with a durable source of truth

The rule:

If metadata filtering determines correctness, keep filtering inside the index or strictly enforce it immediately after retrieval.


4. Embedding Model Versioning Is Not Optional

A vector is not just a number array. It is an output of:

  • model provider
  • model name/version
  • preprocessing rules
  • language handling
  • normalization policy
  • dimension count
  • distance metric assumption
  • chunking strategy
  • content version

That means this is dangerous:

doc:{id}:embedding

This is better:

embedding:{tenantId}:{entityType}:{modelVersion}:{entityId}

For RAG chunks:

rag:chunk:{tenantId}:{corpusId}:{modelVersion}:{docId}:{chunkNo}

For semantic cache:

semcache:{tenantId}:{intent}:{modelVersion}:{bucket}:{cacheId}

For Vector Sets:

vset:{tenantId}:{entityType}:{modelVersion}

Model version must be part of your key/index boundary because similarity from model A is not guaranteed to be comparable with similarity from model B.
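One way to keep the model version inside the key boundary is to build every key through a single helper that refuses blank or unsafe segments. This is a minimal sketch. The class name, the segment rule (no ':' or whitespace), and the prefixes are assumptions that mirror the patterns above.

```java
import java.util.Objects;
import java.util.regex.Pattern;

/** Hypothetical key builder: every vector-related key carries tenant and model version. */
final class VectorKeys {
    // Assumed rule: a segment is non-empty and contains no ':' separator or whitespace.
    private static final Pattern SAFE_SEGMENT = Pattern.compile("[^:\\s]+");

    private VectorKeys() {}

    static String embedding(String tenantId, String entityType, String modelVersion, String entityId) {
        return join("embedding", tenantId, entityType, modelVersion, entityId);
    }

    static String ragChunk(String tenantId, String corpusId, String modelVersion, String docId, int chunkNo) {
        return join("rag:chunk", tenantId, corpusId, modelVersion, docId, String.valueOf(chunkNo));
    }

    static String vectorSet(String tenantId, String entityType, String modelVersion) {
        return join("vset", tenantId, entityType, modelVersion);
    }

    private static String join(String prefix, String... segments) {
        StringBuilder key = new StringBuilder(prefix);
        for (String segment : segments) {
            Objects.requireNonNull(segment, "key segment is null");
            if (!SAFE_SEGMENT.matcher(segment).matches()) {
                throw new IllegalArgumentException("unsafe key segment: '" + segment + "'");
            }
            key.append(':').append(segment);
        }
        return key.toString();
    }
}
```

A missing model version then fails at key construction, not as a silent quality regression weeks later.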

Bad migration

Same vector index
Old and new embeddings mixed
No model_version filter
Similarity quality slowly degrades
No obvious exception thrown

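Better migration

Separate index/set per model version
Backfill new embeddings into the new index/set
Validate retrieval quality before switching
Switch reads through an active-model pointer
Remove the old index/set once nothing reads it

Key pattern for the pointer: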

config:tenant:{tenantId}:semantic-search:active-model -> v4

Do not hard-code the active model version in application code if you expect a controlled rollout.
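A sketch of resolving the active model from the config key at request time follows. It makes three assumptions. First, a `Function<String, Optional<String>>` stands in for a Redis GET, so the sketch runs without a client. Second, there is an explicit set of model versions whose index is fully built. Third, a tenant with no pointer uses a default. All names are hypothetical.

```java
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical resolver for config:tenant:{tenantId}:semantic-search:active-model. */
final class ActiveModelResolver {
    private final Function<String, Optional<String>> configGet; // stand-in for a Redis GET
    private final Set<String> readyModelVersions;                // versions with a fully built index
    private final String defaultModelVersion;

    ActiveModelResolver(Function<String, Optional<String>> configGet,
                        Set<String> readyModelVersions,
                        String defaultModelVersion) {
        this.configGet = configGet;
        this.readyModelVersions = Set.copyOf(readyModelVersions);
        this.defaultModelVersion = defaultModelVersion;
    }

    String activeModelFor(String tenantId) {
        String key = "config:tenant:" + tenantId + ":semantic-search:active-model";
        String version = configGet.apply(key).orElse(defaultModelVersion);
        if (!readyModelVersions.contains(version)) {
            // Fail loudly: querying an unbuilt index would silently return poor results.
            throw new IllegalStateException("model version not ready: " + version + " (key " + key + ")");
        }
        return version;
    }
}
```

In a real service you would likely cache the resolved value briefly instead of issuing a GET on every request. That choice is left out here.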


5. Pattern 1 — Embedding Cache

Embedding generation is often slower and more expensive than Redis lookup. So a simple, high-value Redis use case is:

Cache normalized input → embedding vector.

5.1 Key Design

embcache:{tenantId}:{modelVersion}:{sha256(normalizedInput)}
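The key can be derived with the JDK alone. This sketch assumes a normalization policy of trim, lowercase with `Locale.ROOT`, and collapsing whitespace runs, in the spirit of the "lowercase-trim-collapse-space-v2" label below. That exact policy is an assumption, and its version string would go in the cached metadata. `HexFormat` requires Java 17.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Locale;

/** Hypothetical helper for embcache:{tenantId}:{modelVersion}:{sha256(normalizedInput)}. */
final class EmbeddingCacheKeys {
    private EmbeddingCacheKeys() {}

    /** Assumed policy: trim, lowercase (ROOT locale, so results do not vary by JVM locale), collapse whitespace. */
    static String normalize(String input) {
        return input.strip().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    static String sha256Hex(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(text.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e); // every JDK is required to provide it
        }
    }

    static String key(String tenantId, String modelVersion, String rawInput) {
        return "embcache:" + tenantId + ":" + modelVersion + ":" + sha256Hex(normalize(rawInput));
    }
}
```

Hashing the normalized text also keeps raw user input out of the key name.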

Value envelope:

{
  "modelVersion": "text-embedding-v4",
  "dimension": 1536,
  "normalization": "lowercase-trim-collapse-space-v2",
  "inputHash": "...",
  "createdAt": "2026-07-02T12:00:00Z",
  "vectorEncoding": "float32-base64",
  "vector": "..."
}

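For binary efficiency, store only the vector bytes plus minimal metadata in a Hash (the TTL below is 2,592,000 seconds, or 30 days):

HSET embcache:acme:text-embedding-v4:<sha256> \
  model_version text-embedding-v4 \
  preprocessing_version lowercase-trim-collapse-space-v2 \
  dimension 1536 \
  created_at 2026-07-02T12:00:00Z \
  vector <binary bytes>
EXPIRE embcache:acme:text-embedding-v4:<sha256> 2592000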

5.2 Correctness Rules

Embedding cache is safe when:

  • the normalized input is deterministic
  • model version is part of the key
  • preprocessing version is part of the key or metadata
  • cache miss falls back to model generation
  • TTL is aligned with provider/model migration policy

Embedding cache is unsafe when:

  • user-specific context changes meaning but is not in the key
  • language detection is nondeterministic and not recorded
  • model aliases change silently
  • sensitive input is cached without governance
  • vector bytes are not dimension-checked before reuse

5.3 Java Interface

Keep embedding cache behind a narrow interface:

import java.time.Duration;
import java.util.Optional;

public interface EmbeddingCache {
    Optional<float[]> get(EmbeddingRequest request);
    void put(EmbeddingRequest request, float[] vector, Duration ttl);
}

public record EmbeddingRequest(
        String tenantId,
        String modelVersion,
        String preprocessingVersion,
        String normalizedInputHash,
        int expectedDimension
) {}

Dimension validation is mandatory:

static void validateVector(float[] vector, int expectedDimension) {
    if (vector == null) {
        throw new IllegalArgumentException("vector is null");
    }
    if (vector.length != expectedDimension) {
        throw new IllegalArgumentException(
            "Embedding dimension mismatch: expected " + expectedDimension + " but got " + vector.length
        );
    }
}

5.4 Failure Model

Failure | Impact | Mitigation
Cache miss | Higher latency/cost | Generate embedding and populate cache
Redis unavailable | Higher latency or degraded feature | Bypass cache; rate-limit model calls
Wrong model version reused | Bad similarity | Include model version in key and validate metadata
Large vectors increase memory | Eviction/cost | TTL, compression, binary encoding, max payload policy
Sensitive text stored | Compliance risk | Store hash as key, avoid raw text, encrypt where required
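The first two rows can be sketched as a cache-aside wrapper. The sketch makes four assumptions. `EmbeddingCache` and `EmbeddingRequest` are the types from 5.3, redeclared so the block compiles on its own. The hypothetical `generator` function stands in for the embedding provider. Any `RuntimeException` from the cache counts as "Redis unavailable". Rate limiting of model calls is left out.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Function;

record EmbeddingRequest(String tenantId, String modelVersion, String preprocessingVersion,
                        String normalizedInputHash, int expectedDimension) {}

interface EmbeddingCache {
    Optional<float[]> get(EmbeddingRequest request);
    void put(EmbeddingRequest request, float[] vector, Duration ttl);
}

/** Hypothetical cache-aside wrapper: the cache is an optimization, never a hard dependency. */
final class CachedEmbeddings {
    private final EmbeddingCache cache;
    private final Function<EmbeddingRequest, float[]> generator; // stand-in for the model call
    private final Duration ttl;

    CachedEmbeddings(EmbeddingCache cache, Function<EmbeddingRequest, float[]> generator, Duration ttl) {
        this.cache = cache;
        this.generator = generator;
        this.ttl = ttl;
    }

    float[] embed(EmbeddingRequest request) {
        try {
            Optional<float[]> cached = cache.get(request);
            if (cached.isPresent() && cached.get().length == request.expectedDimension()) {
                return cached.get();
            }
            // Miss, or a cached vector with the wrong dimension: regenerate instead of reusing it.
            // A real service should emit a metric for the mismatch case; it signals drift.
        } catch (RuntimeException redisUnavailable) {
            // Redis unavailable: bypass the cache and go to the model.
        }
        float[] vector = generator.apply(request);
        if (vector.length != request.expectedDimension()) {
            throw new IllegalStateException("model returned dimension " + vector.length);
        }
        try {
            cache.put(request, vector, ttl);
        } catch (RuntimeException ignored) {
            // Failing to populate the cache must not fail the request.
        }
        return vector;
    }
}
```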

6. Pattern 2 — Semantic Cache

A semantic cache stores answers or intermediate results for semantically similar inputs.

Exact cache:

same text -> same result

Semantic cache:

similar meaning -> maybe reuse result

This is powerful and dangerous.

6.1 Suitable Use Cases

Good candidates:

  • FAQ answers
  • product search query expansion
  • support macro suggestions
  • non-sensitive summarization hints
  • expensive classification where false reuse is tolerable
  • retrieval candidate cache

Poor candidates:

  • legal/regulatory final decision
  • payment authorization
  • medical advice
  • personalized account state
  • time-sensitive facts
  • permission-sensitive answers

The core question:

If we reuse an answer for a similar query, what harm happens when similarity is wrong?

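6.2 Semantic Cache Flow

normalize query -> embedding (embedding cache) -> nearest cached answers
  -> apply every reuse check (6.4) -> hit: return cached answer
                                   -> miss: generate answer, store entry + vector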

6.3 Cache Entry Model

semcache:answer:{tenantId}:{modelVersion}:{answerId}

Fields:

query_hash
query_normalized_preview
answer_payload_ref
embedding_model_version
answer_model_version
created_at
expires_at
tenant_id
security_scope
source_snapshot_version
quality_score

The vector index member points to answerId. Store the large payload separately:

semcache:payload:{tenantId}:{answerId}

6.4 Thresholds

Never use one global threshold blindly.

Domain | Threshold policy
FAQ | Moderate threshold may be acceptable
Support suggestion | Lower threshold if a human reviews the suggestion
Automatic answer | Higher threshold
Security-sensitive answer | Exact scope match + high threshold + freshness check
Regulatory explanation | Prefer retrieval reuse, not final answer reuse

A semantic hit requires all checks:

same tenant
same security scope
compatible language
same model family/version
fresh enough
similarity above threshold
answer type compatible
source snapshot still valid

6.5 Java Guardrail

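import java.time.Instant;

// Hypothetical request shape holding only the fields this guardrail reads.
public record SemanticCacheRequest(
        String tenantId,
        String securityScope,
        String sourceSnapshotVersion,
        double minSimilarity
) {}

public record SemanticCacheCandidate(
        String answerId,
        double similarity,
        String tenantId,
        String securityScope,
        String sourceSnapshotVersion,
        Instant createdAt,
        Instant expiresAt
) {
    // Enforces tenant, scope, snapshot, similarity, and expiry. The language, model version,
    // and answer-type checks from 6.4 need matching fields on both records.
    boolean reusableFor(SemanticCacheRequest request, Instant now) {
        return tenantId.equals(request.tenantId())
                && securityScope.equals(request.securityScope())
                && sourceSnapshotVersion.equals(request.sourceSnapshotVersion())
                && similarity >= request.minSimilarity()
                && now.isBefore(expiresAt);
    }
}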

This guardrail is more important than the vector query itself.


7. Pattern 3 — RAG Retrieval Index

RAG retrieval is one of the most common vector use cases.

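The core pipeline:

ingest: document -> chunks -> embeddings + metadata -> Redis index
query:  question -> embedding -> filtered vector search -> rerank -> prompt context -> answer with citations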

7.1 Chunk Key Design

rag:chunk:{tenantId}:{corpusId}:{modelVersion}:{docId}:{chunkNo}

Metadata fields:

tenant_id
corpus_id
doc_id
chunk_no
source_uri
source_version
content_hash
language
security_scope
created_at
updated_at
embedding_model_version
chunking_version
text
embedding

7.2 Index Design

For Query Engine, index fields typically include:

Field | Type | Why
embedding | VECTOR | Similarity search
tenant_id | TAG | Mandatory isolation filter
corpus_id | TAG | Restrict corpus
security_scope | TAG | Authorization boundary
doc_id | TAG | Debug/citation grouping
source_version | TAG | Staleness control
language | TAG | Query-language matching
updated_at_epoch | NUMERIC | Freshness filter
text | TEXT | Hybrid search or debug
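The field table maps onto an `FT.CREATE` schema. The sketch below only assembles the argument list, so that the dimension comes from configuration instead of literals scattered across services. The index name, key prefix, HNSW algorithm, and COSINE metric are assumptions. How you send raw arguments depends on your Java client.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical builder for the RAG chunk index described in the table above. */
final class RagIndexSchema {
    private RagIndexSchema() {}

    static List<String> ftCreateArgs(String indexName, String keyPrefix, int dimension) {
        if (dimension <= 0) {
            throw new IllegalArgumentException("dimension must be positive");
        }
        List<String> args = new ArrayList<>(List.of(
                "FT.CREATE", indexName, "ON", "HASH", "PREFIX", "1", keyPrefix, "SCHEMA",
                // "6" is the number of vector attribute tokens that follow.
                "embedding", "VECTOR", "HNSW", "6",
                "TYPE", "FLOAT32", "DIM", String.valueOf(dimension), "DISTANCE_METRIC", "COSINE"));
        for (String tag : List.of("tenant_id", "corpus_id", "security_scope", "doc_id",
                "source_version", "language")) {
            args.add(tag);
            args.add("TAG");
        }
        args.addAll(List.of("updated_at_epoch", "NUMERIC", "text", "TEXT"));
        return args;
    }
}
```

`TYPE FLOAT32` must match the encoding the writer uses (see section 11). A mismatch between the declared and written dimension shows up as documents that never become searchable.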

7.3 Query Rule

A RAG vector query is not complete until authorization and freshness are enforced.

Bad:

Find top 10 nearest chunks globally.

Better:

Find top 50 nearest chunks where:
  tenant_id = request tenant
  corpus_id in allowed corpora
  security_scope in user scopes
  source_version is active
Then rerank and select top 5.
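The "Better" query can be expressed as a pre-filtered KNN query string. This sketch makes four assumptions. It uses Query Engine query dialect 2. The field names come from the table in 7.2. TAG values are escaped conservatively: every character other than a letter, digit, or underscore gets a backslash. An empty allow-list is an error rather than "no filter". The source_version check is left to post-retrieval validation because "active" usually needs current metadata. Class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical builder for a filtered KNN query string. */
final class RagQueryBuilder {
    private RagQueryBuilder() {}

    /** Backslash-escapes every character that is not a letter, digit, or underscore. */
    static String escapeTag(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (!Character.isLetterOrDigit(c) && c != '_') {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    private static String tagAny(String field, List<String> values) {
        if (values == null || values.isEmpty()) {
            // No allowed values must mean no results, never "unfiltered".
            throw new IllegalArgumentException("filter " + field + " requires at least one value");
        }
        return "@" + field + ":{" + values.stream().map(RagQueryBuilder::escapeTag)
                .collect(Collectors.joining(" | ")) + "}";
    }

    static String knnQuery(String tenantId, List<String> corpora, List<String> scopes, int topK) {
        String filter = String.join(" ",
                tagAny("tenant_id", List.of(tenantId)),
                tagAny("corpus_id", corpora),
                tagAny("security_scope", scopes));
        return "(" + filter + ")=>[KNN " + topK + " @embedding $vec AS vector_score]";
    }
}
```

The string would be sent as roughly `FT.SEARCH <index> <query> PARAMS 2 vec <float32 bytes> SORTBY vector_score DIALECT 2`. With the COSINE metric the returned score is a distance, so lower means closer.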

7.4 Candidate Count

Do not retrieve only the final number of chunks.

Typical flow:

vector topK = 50
filter/rerank
prompt topK = 5-12

Reason:

  • ANN retrieval may not return perfectly ranked results
  • metadata filters may remove many candidates
  • reranker needs enough candidates
  • prompt construction may need diversity by source document
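The over-fetch step can be sketched as "take the broad candidate list, drop what fails the filters, cap chunks per document, stop at the prompt budget". The sketch assumes the vector layer already returns candidates sorted by similarity, best first. Reranking is omitted, and the record and class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical candidate shape returned by the vector layer. */
record ChunkCandidate(String chunkId, String docId, double similarity) {}

final class CandidateSelector {
    private CandidateSelector() {}

    /** Filters, limits chunks per document to maxPerDoc, and stops at promptTopK. */
    static List<ChunkCandidate> select(List<ChunkCandidate> retrieved, Predicate<ChunkCandidate> allowed,
                                       int maxPerDoc, int promptTopK) {
        Map<String, Integer> perDoc = new HashMap<>();
        List<ChunkCandidate> selected = new ArrayList<>();
        for (ChunkCandidate c : retrieved) {
            if (selected.size() == promptTopK) {
                break;
            }
            if (!allowed.test(c)) {
                continue; // e.g. failed authorization or freshness
            }
            int used = perDoc.getOrDefault(c.docId(), 0);
            if (used < maxPerDoc) {
                perDoc.put(c.docId(), used + 1);
                selected.add(c);
            }
        }
        return selected;
    }
}
```

With vector topK = 50 and a prompt budget of 5 to 12, the filter and diversity cap can discard most candidates and still fill the prompt.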

7.5 Retrieval Quality Metrics

Track:

  • retrieval hit rate
  • no-result rate
  • average similarity of selected chunks
  • answer-with-citation rate
  • citation source diversity
  • stale chunk rate
  • user feedback by source
  • hallucination incidents tied to retrieval misses
  • reranker rejection rate

RAG quality is not only LLM quality. Bad retrieval creates bad generation.


8. Pattern 4 — Hybrid Search: Keyword + Vector

Vector search is good at semantic similarity. Keyword search is good at exact terms, identifiers, error codes, product SKUs, law references, names, and acronyms.

Hybrid search combines both.

Example query:

"ORA-00060 deadlock graph retry policy"

A pure vector query may retrieve broad database deadlock content. A keyword query may retrieve exact error-code references. Hybrid is often better.

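8.1 Hybrid Architecture

query -> vector search (embedding)      -> ranked candidates
query -> keyword search (exact terms)   -> ranked candidates
both ranked lists -> score fusion (8.2) -> filters -> final results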

8.2 Score Fusion

A simple reciprocal-rank fusion style approach:

score(doc) = 1 / (k + rank_vector) + 1 / (k + rank_keyword)

Then apply:

  • tenant filter
  • source freshness
  • security scope
  • diversity limit per document
  • max total token budget

Do not assume Redis alone must do all ranking logic. Often Redis retrieves candidates; Java service applies domain-specific ranking.


9. Pattern 5 — Recommendation Candidate Generation

Redis can support low-latency recommendation candidate generation:

  • similar product by embedding
  • similar content by embedding
  • users with similar preferences
  • item-to-item nearest neighbors
  • freshness/popularity blending with Sorted Sets
  • availability filtering with Sets/Hashes

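9.1 Candidate Flow

item or user embedding -> nearest items (Vector Set)
  -> drop inactive and already-seen items (Sets)
  -> blend with popularity/freshness (Sorted Sets)
  -> top N candidates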

9.2 Redis Structures

vset:tenant:{tenant}:item:model:{modelVersion}        -> item vectors
z:tenant:{tenant}:item:popularity:7d                  -> popularity score
z:tenant:{tenant}:item:freshness                      -> freshness score
set:tenant:{tenant}:item:active                       -> active item ids
set:user:{userId}:item:seen                           -> dedup seen items
hash:item:{itemId}:metadata                           -> category, price, stock, etc.

9.3 Score Blend

finalScore =
  0.65 * semanticSimilarity
+ 0.20 * normalizedPopularity
+ 0.10 * freshnessScore
+ 0.05 * businessBoost
- penaltyIfAlreadySeen
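The blend is plain arithmetic once each signal is on a comparable scale. This sketch assumes every input is already normalized to [0, 1] and uses min-max normalization for raw popularity, for example a score read from the 7-day Sorted Set. The weights are the example values above, not recommendations, and the names are hypothetical.

```java
/** Hypothetical blend using the example weights above. */
final class RecommendationScore {
    private RecommendationScore() {}

    static double blend(double semanticSimilarity, double normalizedPopularity, double freshnessScore,
                        double businessBoost, boolean alreadySeen, double seenPenalty) {
        double score = 0.65 * semanticSimilarity
                + 0.20 * normalizedPopularity
                + 0.10 * freshnessScore
                + 0.05 * businessBoost;
        return alreadySeen ? score - seenPenalty : score;
    }

    /** Min-max normalization into [0, 1]; returns 0 when all values are equal. */
    static double minMax(double value, double min, double max) {
        return max == min ? 0.0 : (value - min) / (max - min);
    }
}
```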

For high-scale systems, do not compute everything online. Use Redis for serving:

  • precomputed candidate lists
  • active filters
  • short-term session signals
  • fast exclusion sets
  • recent popularity

10. Pattern 6 — LLM Response Guard and Prompt Fragment Cache

Redis can cache prompt fragments or retrieval bundles, not only final answers.

Useful cache levels:

Level | Key idea | Risk
Embedding cache | Input → vector | Low if versioned
Retrieval cache | Query intent → chunk ids | Medium; source freshness matters
Prompt fragment cache | Chunk ids → formatted context | Medium; token budget/model changes matter
Final answer cache | Query → answer | High; personalization/freshness/security matter

A safer RAG cache is often:

query -> retrieved chunk ids

not:

query -> final generated answer

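The reason: cached chunk ids can still be revalidated (tenant, security scope, source version) before generation, while a cached final answer has already baked in whatever it retrieved.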


11. Java Vector Encoding

Vector encoding must be deterministic.

11.1 Float Array to Little-Endian Bytes

Many Redis vector examples for Hash vector fields use binary float32 encoding. Keep this conversion isolated and tested.

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public final class VectorEncoding {
    private VectorEncoding() {}

    public static byte[] float32LittleEndian(float[] vector) {
        if (vector == null) {
            throw new IllegalArgumentException("vector is null");
        }
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        for (float value : vector) {
            if (!Float.isFinite(value)) {
                throw new IllegalArgumentException("vector contains non-finite value: " + value);
            }
            buffer.putFloat(value);
        }
        return buffer.array();
    }

    public static float[] fromFloat32LittleEndian(byte[] bytes) {
        if (bytes.length % Float.BYTES != 0) {
            throw new IllegalArgumentException("invalid float32 byte length: " + bytes.length);
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        float[] result = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < result.length; i++) {
            result[i] = buffer.getFloat();
        }
        return result;
    }
}

11.2 Dimension Guard

public final class VectorGuard {
    public static void requireDimension(float[] vector, int expected) {
        if (vector.length != expected) {
            throw new IllegalArgumentException(
                    "Expected vector dimension " + expected + " but got " + vector.length);
        }
    }

    public static void requireNormalizedIfCosine(float[] vector, double tolerance) {
        double sum = 0;
        for (float value : vector) {
            sum += (double) value * value;
        }
        double norm = Math.sqrt(sum);
        if (Math.abs(norm - 1.0d) > tolerance) {
            throw new IllegalArgumentException("Expected normalized vector; norm=" + norm);
        }
    }
}

Do not hide dimension mismatch. It indicates model/config drift.
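If your index uses cosine and you want vectors to pass `requireNormalizedIfCosine`, an L2 normalization helper is short. Whether it is needed depends on the metric and on whether your model already returns unit-length vectors, so verify that rather than assume it. For unit vectors, the inner product equals cosine similarity. The class name is hypothetical.

```java
/** Hypothetical helper: scale a vector to unit length (L2 norm = 1). */
final class VectorNormalizer {
    private VectorNormalizer() {}

    static float[] l2Normalize(float[] vector) {
        double sum = 0;
        for (float v : vector) {
            sum += (double) v * v;
        }
        double norm = Math.sqrt(sum);
        if (norm == 0 || !Double.isFinite(norm)) {
            throw new IllegalArgumentException("cannot normalize vector with norm " + norm);
        }
        float[] out = new float[vector.length];
        for (int i = 0; i < vector.length; i++) {
            out[i] = (float) (vector[i] / norm);
        }
        return out;
    }
}
```

Normalize once at write time and once at query time, with the same code path, so stored and query vectors follow the same policy.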


12. Java Integration Boundary

Because vector APIs can evolve faster than common Redis primitives, design your app with a narrow adapter.

public interface VectorSearchPort {
    List<VectorMatch> search(VectorSearchRequest request);
    void upsert(VectorDocument document);
    void delete(VectorDocumentId id);
}

public record VectorSearchRequest(
        String tenantId,
        String indexOrSetName,
        String modelVersion,
        float[] queryVector,
        int topK,
        Map<String, String> requiredTags,
        double minSimilarity
) {}

public record VectorMatch(
        String id,
        double score,
        Map<String, String> metadata
) {}

Benefits:

  • hide Jedis/Lettuce command details
  • support Query Engine today and Vector Sets later
  • make integration tests stable
  • centralize dimension checking
  • centralize model version enforcement
  • add tracing and metrics once

Do not scatter raw vector commands across services.


13. Query Engine vs Vector Set Adapter Shape

13.1 Query Engine Adapter

Best for document retrieval:

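public final class RedisQueryEngineVectorSearch implements VectorSearchPort {
    private final int expectedDimension; // from index configuration, not a literal in code

    public RedisQueryEngineVectorSearch(int expectedDimension) {
        this.expectedDimension = expectedDimension;
    }

    @Override
    public List<VectorMatch> search(VectorSearchRequest request) {
        VectorGuard.requireDimension(request.queryVector(), expectedDimension);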

        // Conceptual steps:
        // 1. Encode query vector.
        // 2. Build parameterized Redis Query Engine vector query.
        // 3. Include tenant/security/model filters.
        // 4. Return ids + scores + metadata.
        //
        // Keep concrete command construction in this adapter, not in domain services.
        throw new UnsupportedOperationException("implementation-specific");
    }

    @Override
    public void upsert(VectorDocument document) {
        // Store Hash/JSON document and rely on index update.
        throw new UnsupportedOperationException("implementation-specific");
    }

    @Override
    public void delete(VectorDocumentId id) {
        // Delete source key and optionally cleanup auxiliary metadata.
        throw new UnsupportedOperationException("implementation-specific");
    }
}

13.2 Vector Set Adapter

Best for item-to-item matching or semantic cache candidates:

public final class RedisVectorSetSearch implements VectorSearchPort {
    @Override
    public List<VectorMatch> search(VectorSearchRequest request) {
        // Conceptual steps:
        // 1. Send VSIM against the vset key with the query vector.
        // 2. Request scores (WITHSCORES) and cap results (COUNT topK).
        // 3. Apply minSimilarity in Java if needed.
        // 4. Hydrate metadata from Hash/JSON keys.
        throw new UnsupportedOperationException("implementation-specific");
    }

    @Override
    public void upsert(VectorDocument document) {
        // Conceptual VADD + metadata upsert.
        throw new UnsupportedOperationException("implementation-specific");
    }

    @Override
    public void delete(VectorDocumentId id) {
        // Conceptual VREM of the vector member + metadata delete.
        throw new UnsupportedOperationException("implementation-specific");
    }
}

The design point:

Domain services should not know whether Redis is using a vector index or native Vector Set.


14. Multi-Tenant Safety

Never build a global semantic index unless your authorization model proves it is safe.

Safer patterns:

14.1 Tenant-Scoped Index/Set

idx:rag:{tenantId}:{modelVersion}
vset:{tenantId}:item:{modelVersion}

Pros:

  • strong isolation
  • simpler reasoning
  • easier tenant deletion
  • lower blast radius

Cons:

  • many indexes/sets
  • more operational objects
  • uneven tenant sizes

14.2 Shared Index with Tenant Filter

idx:rag:shared:{modelVersion}

Required filter:

tenant_id = request.tenantId
security_scope in request.allowedScopes

Pros:

  • fewer indexes
  • simpler global operations
  • better for many small tenants

Cons:

  • every query must enforce filters
  • filter bugs become data leaks
  • hot tenants can affect others
  • tenant deletion needs careful cleanup

For regulatory/case-management platforms, prefer tenant or domain partitioning unless there is a clear operational reason not to.


15. Deletion and Right-to-Be-Forgotten

Vector data is derived data, but it may still be sensitive.

When a source document is deleted, work through this checklist:

  • delete source metadata
  • delete vector representation
  • delete semantic cache entries derived from source
  • delete prompt fragments containing source text
  • expire stale retrieval caches
  • log deletion job result
  • scan for orphan vectors

Do not assume deleting the relational document automatically deletes its embedding.


16. Freshness and Drift

Vector systems rot quietly.

Common drift cases:

Drift | Example | Mitigation
Source drift | Document changed but vector not updated | Source version hash; re-embed on update
Model drift | New embedding model deployed | Model-versioned index
Chunking drift | Chunk boundaries changed | Chunking version in key
Vocabulary drift | New product/domain terms | Periodic quality evaluation
Business drift | Popularity/availability changed | Blend vector score with current Redis state
Access drift | User permissions changed | Runtime authorization filter

A candidate result must be validated against current metadata.
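A sketch of that validation follows, with two assumptions. First, each candidate carries the source version it was embedded from. Second, the caller has hydrated a map of docId to current source version from the source of truth or hot metadata, where a missing entry means the document was deleted. The record and class names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical retrieved chunk with the version it was embedded from. */
record RetrievedChunk(String chunkId, String docId, String sourceVersion, String securityScope) {}

final class FreshnessValidator {
    private FreshnessValidator() {}

    /** Keeps chunks whose embedded version is current and whose scope the caller may read. */
    static List<RetrievedChunk> validate(List<RetrievedChunk> candidates, Map<String, String> currentVersions,
                                         Set<String> allowedScopes) {
        return candidates.stream()
                // Deleted docs map to null, and equals(null) is false, so they are dropped too.
                .filter(c -> c.sourceVersion().equals(currentVersions.get(c.docId())))
                .filter(c -> allowedScopes.contains(c.securityScope()))
                .collect(Collectors.toList());
    }
}
```

Counting the dropped candidates gives the `rag.retrieval.stale_chunk_rejected` metric from section 20 almost for free.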


17. Performance Budget

A vector request has multiple latency components:

T_total = T_normalize
        + T_embedding_model
        + T_redis_query
        + T_metadata_hydration
        + T_rerank
        + T_downstream_generation_or_response

Redis query may be fast, but embedding generation may dominate. So optimize the whole path, not only Redis.

17.1 Typical Optimization Levers

Bottleneck | Lever
Embedding generation latency | Embedding cache, batch embedding, local model, provider timeout
Redis vector query latency | Lower topK, better filters, index tuning, smaller vectors
Metadata hydration | Pipeline HMGET/JSON fetch, return required metadata in the search result
Reranking cost | Rerank fewer candidates, cache rerank results
Payload size | Store references, not huge text in hot keys
Tenant hot spot | Tenant partitioning, rate limit, dedicated Redis DB/cluster

18. Memory Budgeting

Approximate vector memory:

raw_vector_bytes = dimension * 4 bytes

For 1 million 1536-dim float32 vectors:

1,000,000 * 1536 * 4 = 6,144,000,000 bytes ≈ 6.14 GB (about 5.7 GiB) of raw vector bytes
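The same arithmetic extends to the launch questions below: retained model versions and replicas multiply the footprint. In this sketch `overheadFactor` is a placeholder for index, key, and metadata overhead, not a measured Redis figure. Measure it on a representative sample before trusting any plan. The class name is hypothetical.

```java
/** Hypothetical capacity sketch; overheadFactor must come from your own measurements. */
final class VectorMemoryEstimate {
    private VectorMemoryEstimate() {}

    static long rawVectorBytes(long vectorCount, int dimension, int bytesPerComponent) {
        // multiplyExact throws on overflow instead of silently wrapping.
        return Math.multiplyExact(Math.multiplyExact(vectorCount, (long) dimension), (long) bytesPerComponent);
    }

    /** raw float32 bytes * overheadFactor * model versions retained * (1 primary + replicas). */
    static long plannedBytes(long vectorCount, int dimension, double overheadFactor,
                             int modelVersionsRetained, int replicas) {
        double perCopy = rawVectorBytes(vectorCount, dimension, Float.BYTES) * overheadFactor * modelVersionsRetained;
        return (long) Math.ceil(perCopy * (1 + replicas));
    }
}
```

Keeping two model versions during a migration doubles the vector footprint, which is easy to forget when the old index is "temporary".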

Index overhead can be significant. Metadata and key overhead also matter.

Questions before launch:

  • How many vectors per tenant?
  • What dimension?
  • How many model versions retained?
  • What metadata fields are indexed?
  • What TTL or retention policy exists?
  • How often do documents update?
  • What is the deletion SLA?
  • What is the expected query QPS and topK?
  • What is the memory headroom after replicas/persistence overhead?

Do not store every embedding forever without an explicit retention model.


19. Consistency Envelope

Vector search is rarely the system of record.

A robust system separates:

Layer | Role
Source DB/object store | Authoritative document/content state
Redis vector index/set | Serving-time semantic candidate lookup
Redis metadata/cache | Hot retrieval state and derived values
Event pipeline | Re-embedding/update/delete propagation
Audit log | Explainability and compliance trail

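Update flow:

source change -> event pipeline -> re-embed changed chunks -> upsert vector index/set
  -> invalidate retrieval/semantic caches derived from the old version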

Serving reads must tolerate temporary lag.

Expose this as product behavior:

Search index freshness target: under 2 minutes for normal updates.

For compliance-critical workflows, do not let vector index lag drive final truth.


20. Observability

Metrics:

vector.query.count
vector.query.latency.p50/p95/p99
vector.query.top_k
vector.query.result_count
vector.query.no_result_rate
vector.query.avg_similarity
vector.query.min_similarity_pass_rate
vector.embedding.cache.hit_rate
vector.embedding.model.latency
vector.index.upsert.count
vector.index.delete.count
vector.index.update_lag_seconds
vector.index.memory_bytes
vector.semantic_cache.hit_rate
vector.semantic_cache.false_hit_reported
rag.retrieval.selected_chunk_count
rag.retrieval.stale_chunk_rejected

Logs should include:

  • tenant id
  • corpus id
  • model version
  • topK
  • min similarity
  • result count
  • selected ids
  • source versions
  • query latency
  • fallback reason

Do not log raw user text or raw vectors by default.


21. Testing Strategy

21.1 Unit Tests

  • vector dimension validation
  • float encoding round trip
  • key naming
  • model version selection
  • min similarity filter
  • authorization filter
  • cache hit/miss rules

21.2 Integration Tests

  • create index/set
  • insert sample vectors
  • query nearest neighbors
  • update vector
  • delete vector
  • metadata hydration
  • tenant isolation
  • stale model rejection

21.3 Golden Retrieval Tests

Create a dataset:

query -> expected relevant document ids

Track:

  • recall@k
  • precision@k
  • mean reciprocal rank
  • no-result rate
  • latency

Example:

Query: "refund after ninety days"
Expected: policy-refund-90-days
Acceptable: refund-policy-summary
Rejected: shipping-policy
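The golden-set metrics are small functions over each query's ranked result ids. This sketch scores one query at a time; you would average the values across the dataset. Whether "Acceptable" ids count as relevant is a policy choice. Here only the ids passed in `relevant` count, and the names are hypothetical.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical per-query golden-set metrics. */
final class RetrievalMetrics {
    private RetrievalMetrics() {}

    /** Fraction of the relevant ids that appear in the first k results. */
    static double recallAtK(List<String> results, Set<String> relevant, int k) {
        if (relevant.isEmpty()) {
            throw new IllegalArgumentException("golden query needs at least one relevant id");
        }
        long hits = results.stream().limit(k).filter(relevant::contains).count();
        return (double) hits / relevant.size();
    }

    /** Fraction of the first k result slots that hold a relevant id. */
    static double precisionAtK(List<String> results, Set<String> relevant, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        long hits = results.stream().limit(k).filter(relevant::contains).count();
        return (double) hits / k;
    }

    /** 1 / rank of the first relevant result, or 0 if none was returned. */
    static double reciprocalRank(List<String> results, Set<String> relevant) {
        for (int i = 0; i < results.size(); i++) {
            if (relevant.contains(results.get(i))) {
                return 1.0 / (i + 1);
            }
        }
        return 0.0;
    }
}
```

Mean reciprocal rank is the average of `reciprocalRank` over all golden queries.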

Vector systems need quality tests, not just API tests.


22. Failure Modes

Failure mode | Symptom | Mitigation
Wrong dimension | command error or silent adapter failure | Validate dimension before Redis call
Mixed model versions | degraded relevance | versioned index/set; active pointer
Missing tenant filter | data leak | mandatory query builder guard; tests
Stale deleted content | old docs appear | deletion pipeline + TTL + orphan scanner
Low recall | relevant docs missing | increase topK, tune index, rerank, improve chunking
High latency | p99 breach | embedding cache, lower topK, batch hydration
Over-cached final answers | wrong answers | cache retrieval candidates instead
Memory growth | eviction/OOM | retention, capacity budget, per-tenant quota
Hot tenant | noisy neighbor | rate limit, partition, dedicated index
Provider outage | no embeddings | fallback exact search, cached embeddings, degrade feature

23. Production Checklist

Before launch:

  • Vector use case has explicit correctness boundary.
  • Model version is part of key/index/set naming.
  • Dimension and distance metric are configured explicitly.
  • Tenant/security filters are mandatory.
  • Source version/freshness is checked.
  • Deletion path removes vectors and derived caches.
  • Embedding cache has TTL and sensitive-data policy.
  • Vector query has topK limit.
  • Metadata hydration is pipelined or minimized.
  • Retrieval quality dataset exists.
  • Drift/migration plan exists.
  • Metrics cover latency, hit rate, no-result rate, and update lag.
  • Fallback path is defined.

24. Practice Exercise

Build a Java service module with this interface:

public interface SemanticKnowledgeSearch {
    SearchResponse search(SearchRequest request);
}

Requirements:

  1. Normalize query text.
  2. Generate or retrieve embedding from Redis embedding cache.
  3. Query Redis vector layer.
  4. Enforce tenant and security scope.
  5. Reject stale source versions.
  6. Return top chunks with source metadata.
  7. Record metrics.
  8. Support model version switch through Redis config key.

Then write tests for:

  • wrong dimension
  • stale model version
  • tenant leakage attempt
  • deleted document
  • semantic cache false hit
  • Redis timeout fallback

The goal is not to build a demo. The goal is to make correctness boundaries executable.


25. Summary

Redis vector capabilities are useful when semantic lookup belongs in the low-latency serving path. But vector search must be treated as candidate generation, not truth.

The senior-engineering rules:

  1. Version everything: model, preprocessing, chunking, source snapshot.
  2. Keep tenant/security filters mandatory.
  3. Prefer retrieval/prompt-fragment caches over final-answer caches for sensitive workflows.
  4. Validate vector dimension before Redis calls.
  5. Budget memory explicitly.
  6. Test retrieval quality with golden queries.
  7. Monitor update lag and stale-result rejection.
  8. Keep Redis as serving state unless you intentionally choose it as a primary store.

Part 024 will move from capability design into the performance model that underlies all Redis production engineering: latency, throughput, pipelining, batching, command cost, payload size, and Java client behavior.
