Transactions, Lua Scripts, Functions, and Atomic Workflows

Learn Java Redis In Action - Part 025

Production-grade Redis atomic workflow engineering for Java engineers covering MULTI/EXEC, WATCH, Lua scripting, Redis Functions, Cluster constraints, Java client integration, correctness boundaries, testing, and operational safety.

Part 025 — Transactions, Lua Scripts, Functions, and Atomic Workflows

Part 024 covered Redis performance: latency, throughput, pipelining, batching, payload size, and benchmark discipline. Now we move into one of the most misunderstood Redis topics:

How to build correct multi-step workflows on top of Redis.

Redis makes single commands atomic. That does not automatically make an application workflow atomic. A workflow usually contains:

  • read current state
  • validate precondition
  • compute next state
  • write one or more keys
  • set TTL
  • publish or enqueue a side effect
  • return an externally meaningful result

If those steps are executed as multiple round trips from Java, concurrency can interleave between them. The result can be duplicate processing, lost update, inconsistent TTL, broken quota, incorrect lock ownership, or replayed requests that observe partial state.

The core mental model:

Redis atomic workflow design is about moving the read-decide-write critical section into one of four places (a Redis transaction, optimistic concurrency with WATCH, a Lua script, or a Redis Function) while keeping the correctness boundary explicit.


1. Kaufman Skill Decomposition

The skill is not “know MULTI” or “write Lua”. The real skill is:

Given a Redis-backed business invariant, choose the weakest atomicity mechanism that preserves that invariant under concurrency, timeout, retry, failover, and Cluster topology.

Breakdown:

| Sub-skill | What you must be able to do |
| --- | --- |
| Atomicity boundary design | Identify which operations must be indivisible and which can be eventually consistent |
| Command-level atomicity | Know when a native Redis command is already enough |
| Transaction use | Use MULTI/EXEC safely for queued writes and optimistic CAS with WATCH |
| Lua scripting | Move read-decide-write logic into Redis for atomic server-side execution |
| Redis Functions | Package server-side logic as versioned, persistent functions instead of ad-hoc scripts |
| Cluster key design | Keep all keys in an atomic operation in the same hash slot |
| Java integration | Execute scripts/functions through Lettuce, Jedis, or Spring Data Redis without SHA-cache fragility |
| Retry safety | Distinguish retryable transport failures from unsafe duplicate writes |
| Operational safety | Avoid long-running scripts, unbounded loops, dynamic key discovery, and hidden blocking |
| Testing | Prove atomicity with concurrent tests, deterministic fixtures, and invariant checks |

Kaufman-style outcome:

After this part, you should be able to design and review Redis-backed atomic workflows such as idempotency, rate limiting, session mutation, delayed queue claiming, lock release, and quota update without relying on accidental timing.


2. The Redis Atomicity Ladder

Do not start with Lua. Use the simplest mechanism that protects the invariant.

| Level | Mechanism | Use when | Avoid when |
| --- | --- | --- | --- |
| 1 | Single command | Redis already has the exact atomic primitive | You need conditional logic across multiple values |
| 2 | Single command with options | SET NX PX, HSETNX, ZADD NX, EXPIRE NX/XX/GT/LT expresses the rule | You need read-modify-write based on current value |
| 3 | Pipeline | You only need fewer round trips, not atomicity | You require all-or-nothing or no interleaving |
| 4 | MULTI/EXEC | You need commands executed together without interleaving | You need to branch based on intermediate results inside the transaction |
| 5 | WATCH + MULTI/EXEC | You need optimistic compare-and-set from the client | High contention would cause many retries |
| 6 | Lua script | You need read-decide-write atomically in one server-side operation | Logic is long, slow, non-deterministic, or operationally hard to manage |
| 7 | Redis Function | You need reusable, deployed, versioned server-side logic | You only need a tiny one-off conditional update |
| 8 | External transactional system | Redis cannot safely own the invariant | Correctness requires consensus, durable transactions, or cross-partition ACID |

Important:

Pipelining is not atomicity. It is network optimization.

A pipeline reduces round trips. A transaction controls interleaving. A script moves logic to the server. A function packages server-side logic as a manageable unit.


3. Single-Command Atomicity First

Redis commands execute atomically with respect to other commands. Many production workflows can be solved with one command plus options.

Examples:

| Workflow | Command shape | Why it works |
| --- | --- | --- |
| Claim idempotency key | SET key value NX PX ttl | only first claimant succeeds |
| Increment quota counter | INCR key | counter update is atomic |
| Create object if absent | HSETNX key field value | field is only written once |
| Add unique event id | SADD dedup eventId | membership mutation is atomic |
| Insert leaderboard score only if new | ZADD NX board score member | avoids overwriting existing score |
| Extend TTL only when key exists | EXPIRE key seconds XX | avoids resurrecting absent state |
| Extend TTL only forward | EXPIRE key seconds GT | avoids accidentally shortening lifetime |

When a single command exists, prefer it. It is faster, simpler, easier to observe, and easier to reason about than Lua.

Java example: idempotency claim with Lettuce

import io.lettuce.core.SetArgs;
import io.lettuce.core.api.sync.RedisCommands;

public final class IdempotencyClaimStore {
    private final RedisCommands<String, String> redis;

    public IdempotencyClaimStore(RedisCommands<String, String> redis) {
        this.redis = redis;
    }

    public boolean claim(String idempotencyKey, String ownerId, long ttlMillis) {
        String redisKey = "idem:v1:" + idempotencyKey;

        String result = redis.set(
            redisKey,
            ownerId,
            SetArgs.Builder.nx().px(ttlMillis)
        );

        return "OK".equals(result);
    }
}

This is a complete atomic claim. No GET is required before the SET.

Bad version:

if (redis.get(key) == null) {
    redis.setex(key, ttlSeconds, ownerId);
}

That version has a race between GET and SETEX. Two clients can observe absence and both write.
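
The interleaving can be replayed deterministically without a Redis server. The sketch below is only an in-memory analogy: `FakeKeyspace` and `ClaimRaceDemo` are hypothetical names, and `setIfAbsent` stands in for `SET key value NX`. It shows that when both clients read before either writes, both believe they own the key, while the atomic form admits exactly one.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory stand-in for a Redis string keyspace (hypothetical, illustration only). */
final class FakeKeyspace {
    private final Map<String, String> data = new HashMap<>();

    String get(String key) {
        return data.get(key);
    }

    void set(String key, String value) {
        data.put(key, value);
    }

    /** Models SET key value NX: writes only if absent and reports whether it wrote. */
    boolean setIfAbsent(String key, String value) {
        return data.putIfAbsent(key, value) == null;
    }
}

final class ClaimRaceDemo {
    private static final String KEY = "idem:v1:order-1";

    /** Replays the unsafe order: both clients GET before either SETs. */
    static int checkThenSetClaimants() {
        FakeKeyspace redis = new FakeKeyspace();
        boolean clientASawAbsent = redis.get(KEY) == null;
        boolean clientBSawAbsent = redis.get(KEY) == null; // interleaved before A writes
        int claimants = 0;
        if (clientASawAbsent) {
            redis.set(KEY, "A");
            claimants++;
        }
        if (clientBSawAbsent) {
            redis.set(KEY, "B"); // silently overwrites A's claim
            claimants++;
        }
        return claimants;
    }

    /** Same two clients, but the check and the write are one indivisible step. */
    static int atomicClaimants() {
        FakeKeyspace redis = new FakeKeyspace();
        int claimants = 0;
        if (redis.setIfAbsent(KEY, "A")) {
            claimants++;
        }
        if (redis.setIfAbsent(KEY, "B")) {
            claimants++;
        }
        return claimants;
    }
}
```

The unsafe variant reports two claimants and the atomic variant reports one, which is the same outcome the Lettuce `SetArgs.Builder.nx()` call guarantees on a real server.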


4. MULTI/EXEC Mental Model

Redis transactions are centered around:

  • MULTI
  • queued commands
  • EXEC
  • DISCARD
  • optionally WATCH

A basic transaction:

MULTI
INCR account:123:debit-count
HSET account:123:last-debit amount 100 currency USD
EXPIRE account:123:last-debit 86400
EXEC

Important properties:

  1. Commands are queued after MULTI.
  2. Commands are executed at EXEC.
  3. Other clients do not interleave while the queued transaction executes.
  4. Redis transactions are not SQL transactions.
  5. There is no rollback of already executed commands in the SQL sense.
  6. You cannot read a value inside the transaction and branch client-side before queuing the next command.

What MULTI/EXEC is good for

Use it when:

  • you already know all commands before execution
  • you need commands executed as a contiguous unit
  • you do not need server-side branching
  • you want cheaper correctness than Lua
  • command results are not needed to decide subsequent commands

Example: write object + index + TTL together.

MULTI
HSET user:123 name "Ari" status "ACTIVE"
SADD users:by-status:ACTIVE 123
EXPIRE user:123 3600
EXEC

This avoids an observer seeing only part of the write due to interleaving during transaction execution.

What MULTI/EXEC is not good for

Avoid it when the logic is:

value = GET key
if value < limit:
    INCR key
    return allowed
else:
    return denied

With MULTI, the GET result is not available until EXEC. So you cannot branch inside the transaction from Java. For this, use WATCH or Lua.


5. WATCH as Optimistic Concurrency Control

WATCH turns a transaction into a compare-and-set workflow.

Pattern:

  1. WATCH key
  2. GET key
  3. compute new value in Java
  4. MULTI
  5. write new value
  6. EXEC
  7. if EXEC returns null/empty conflict indicator, retry

Use WATCH when:

  • contention is low
  • logic is easier in Java than Lua
  • conflicting updates can retry
  • keys are few
  • you can tolerate extra round trips

Avoid WATCH when:

  • key is hot
  • high QPS causes repeated aborts
  • branch logic is small enough for Lua
  • retry storms are likely
  • the value is large and expensive to deserialize repeatedly

Java CAS example with Jedis-style pseudocode

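import java.util.List;
import redis.clients.jedis.Transaction;

public boolean updateQuotaWithWatch(String userId, int limit) {
    String key = "quota:v1:{" + userId + "}:minute";

    // WATCH state belongs to a connection: `jedis` must be the same dedicated
    // connection for WATCH, GET, MULTI, and EXEC, not a fresh pooled one per call.
    for (int attempt = 0; attempt < 5; attempt++) {
        jedis.watch(key);

        String raw = jedis.get(key);
        int current = raw == null ? 0 : Integer.parseInt(raw);

        if (current >= limit) {
            jedis.unwatch();
            return false;
        }

        Transaction tx = jedis.multi();
        tx.set(key, Integer.toString(current + 1));
        // Note: EXPIRE on every update restarts the 60 s TTL from the latest
        // accepted request. Set it only on creation for a strict fixed window.
        tx.expire(key, 60);

        // Jedis returns null when EXEC is aborted because a watched key changed.
        List<Object> result = tx.exec();
        if (result != null) {
            return true;
        }

        // Another client modified the watched key.
        // Retry with bounded attempts and jitter in real systems.
    }

    // TooMuchContentionException is an application-defined exception.
    throw new TooMuchContentionException("quota update contention");
}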

This is correct under low contention. It is not necessarily efficient under high contention.
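
The code comment above mentions bounded attempts with jitter. One possible shape for that delay calculation is sketched here: capped exponential backoff with "full jitter", where the sleep is drawn uniformly between zero and the capped exponential bound. `RetryBackoff` and its parameters are hypothetical names chosen for illustration, not part of Jedis.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Capped exponential backoff with full jitter (hypothetical helper, not a Jedis API). */
final class RetryBackoff {
    private RetryBackoff() {}

    /** Largest delay allowed before retry number `attempt` (0-based): base * 2^attempt, capped. */
    static long maxDelayMillis(int attempt, long baseMillis, long capMillis) {
        if (attempt < 0 || baseMillis <= 0 || capMillis < baseMillis || capMillis == Long.MAX_VALUE) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        long delay = baseMillis;
        for (int i = 0; i < attempt && delay < capMillis; i++) {
            // Double without overflowing: jump straight to the cap once doubling would pass it.
            delay = (delay > capMillis / 2) ? capMillis : delay * 2;
        }
        return delay;
    }

    /** Random delay in [0, maxDelayMillis], so contending clients do not retry in lockstep. */
    static long jitteredDelayMillis(int attempt, long baseMillis, long capMillis) {
        long max = maxDelayMillis(attempt, baseMillis, capMillis);
        return ThreadLocalRandom.current().nextLong(max + 1);
    }
}
```

Inside the WATCH loop, a call such as `Thread.sleep(RetryBackoff.jitteredDelayMillis(attempt, 2, 50))` after an aborted EXEC would spread retries out; the base and cap values here are placeholders to tune against the measured abort rate.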

WATCH and business invariants

WATCH protects keys, not abstract business concepts. If the invariant depends on multiple keys, watch all relevant keys. If the invariant depends on external database state, Redis cannot protect it alone.

Example:

Invariant:
A tenant cannot have more than 10 active exports.

Redis keys:
- export:tenant:{tenantId}:active-count
- export:{exportId}:status

Watching only the count key may not protect against races involving status repair, manual cancellation, or delayed worker cleanup. The model must be explicit.


6. Lua Scripting Mental Model

Lua scripting lets you execute logic inside Redis. A script can:

  • read keys
  • branch based on values
  • write keys
  • set TTL
  • return structured results

All within one atomic server-side execution.

The big benefit:

Lua eliminates the race between read, decision, and write.

The big danger:

Lua can hide complex, blocking, hard-to-debug application logic inside Redis.

Use it for small critical sections, not full business services.


7. Lua Script Anatomy

Example: fixed-window rate limiter.

-- KEYS[1] = rate-limit key
-- ARGV[1] = max requests
-- ARGV[2] = ttl seconds

local current = redis.call('INCR', KEYS[1])

if current == 1 then
  redis.call('EXPIRE', KEYS[1], tonumber(ARGV[2]))
end

if current > tonumber(ARGV[1]) then
  return {0, current}
end

return {1, current}

Call shape:

EVAL "...script..." 1 rl:v1:{tenant-123}:api:2026-07-02T14:00 100 60

Rules:

| Rule | Reason |
| --- | --- |
| Put key names in KEYS | Redis Cluster and command routing need key visibility |
| Put values/config in ARGV | Avoid treating non-key values as keys |
| Keep scripts short | Redis is blocked while script runs |
| Avoid unbounded loops | One slow script can damage global latency |
| Return small structured values | Java mapping should be predictable |
| Do not generate dynamic key names inside Lua | Cluster correctness and reviewability suffer |
| Version scripts | Return contracts and semantics evolve over time |

8. Lua Pattern: Safe Lock Release

The classic lock release bug:

redis.del(lockKey);

This can delete another client's lock if the lease expired and was reacquired.

Correct pattern:

-- KEYS[1] = lock key
-- ARGV[1] = expected owner token

if redis.call('GET', KEYS[1]) == ARGV[1] then
  return redis.call('DEL', KEYS[1])
end

return 0

Java wrapper:

public boolean releaseLock(String lockKey, String ownerToken) {
    String script = """
        if redis.call('GET', KEYS[1]) == ARGV[1] then
          return redis.call('DEL', KEYS[1])
        end
        return 0
        """;

    Long result = redis.eval(script, ScriptOutputType.INTEGER, new String[] { lockKey }, ownerToken);
    return result != null && result == 1L;
}

Invariant:

Only the current owner token can release the lock.

This still does not solve stale owner writes to an external resource. For that, use fencing tokens as discussed in Part 018.


9. Lua Pattern: Idempotency State Machine

A robust idempotency key often needs more than SET NX. It may need states:

  • IN_PROGRESS
  • COMPLETED
  • FAILED_RETRYABLE
  • FAILED_FINAL

Claim script:

-- KEYS[1] = idempotency hash key
-- ARGV[1] = owner token
-- ARGV[2] = now millis
-- ARGV[3] = in-progress ttl seconds

local state = redis.call('HGET', KEYS[1], 'state')

if not state then
  redis.call('HSET', KEYS[1],
    'state', 'IN_PROGRESS',
    'owner', ARGV[1],
    'startedAt', ARGV[2]
  )
  redis.call('EXPIRE', KEYS[1], tonumber(ARGV[3]))
  return {'CLAIMED'}
end

if state == 'COMPLETED' then
  local status = redis.call('HGET', KEYS[1], 'status')
  local body = redis.call('HGET', KEYS[1], 'body')
  return {'REPLAY', status or '', body or ''}
end

return {'BUSY', state}

Completion script:

-- KEYS[1] = idempotency hash key
-- ARGV[1] = expected owner token
-- ARGV[2] = response status
-- ARGV[3] = response body
-- ARGV[4] = completed ttl seconds

local owner = redis.call('HGET', KEYS[1], 'owner')
local state = redis.call('HGET', KEYS[1], 'state')

if owner ~= ARGV[1] or state ~= 'IN_PROGRESS' then
  return 0
end

redis.call('HSET', KEYS[1],
  'state', 'COMPLETED',
  'status', ARGV[2],
  'body', ARGV[3]
)
redis.call('EXPIRE', KEYS[1], tonumber(ARGV[4]))
return 1

Why Lua helps:

  • claim is atomic
  • replay detection is atomic
  • owner validation is atomic
  • TTL is attached in the same critical section
  • Java does not interpret partial state between commands

Failure boundary:

Redis can make the idempotency state atomic. It cannot make your downstream database write and Redis completion marker one atomic distributed transaction.

You still need outbox, reconciliation, or recovery if the process crashes between DB commit and Redis completion update.
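
On the Java side, the claim script's positional reply (`{'CLAIMED'}`, `{'REPLAY', status, body}`, or `{'BUSY', state}`) is easier to handle once it is mapped to a typed value. The following is a minimal sketch of such a mapper; `ClaimOutcome` and its field names are hypothetical, and the reply is assumed to arrive as a `List` whose elements render to the script's strings via `String.valueOf`.

```java
import java.util.List;

/** Typed view of the claim script's reply (hypothetical class, illustration only). */
final class ClaimOutcome {
    enum Kind { CLAIMED, REPLAY, BUSY }

    final Kind kind;
    final String responseStatus; // set only for REPLAY
    final String responseBody;   // set only for REPLAY
    final String currentState;   // set only for BUSY

    private ClaimOutcome(Kind kind, String responseStatus, String responseBody, String currentState) {
        this.kind = kind;
        this.responseStatus = responseStatus;
        this.responseBody = responseBody;
        this.currentState = currentState;
    }

    /** Maps the positional script reply to a typed outcome and fails loudly on an unknown shape. */
    static ClaimOutcome fromReply(List<?> reply) {
        if (reply == null || reply.isEmpty()) {
            throw new IllegalStateException("empty claim reply");
        }
        String tag = String.valueOf(reply.get(0));
        switch (tag) {
            case "CLAIMED":
                return new ClaimOutcome(Kind.CLAIMED, null, null, null);
            case "REPLAY":
                requireSize(reply, 3);
                return new ClaimOutcome(Kind.REPLAY,
                    String.valueOf(reply.get(1)), String.valueOf(reply.get(2)), null);
            case "BUSY":
                requireSize(reply, 2);
                return new ClaimOutcome(Kind.BUSY, null, null, String.valueOf(reply.get(1)));
            default:
                throw new IllegalStateException("unknown claim reply tag: " + tag);
        }
    }

    private static void requireSize(List<?> reply, int expected) {
        if (reply.size() < expected) {
            throw new IllegalStateException("claim reply too short: " + reply.size());
        }
    }
}
```

Failing on an unknown tag rather than defaulting to BUSY is a deliberate choice in this sketch: a script version mismatch then surfaces as an error instead of as silently blocked requests.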


10. Lua Pattern: Sliding Window Rate Limit

Sorted-set sliding window:

-- KEYS[1] = zset key
-- ARGV[1] = now millis
-- ARGV[2] = window millis
-- ARGV[3] = limit
-- ARGV[4] = request id
-- ARGV[5] = ttl seconds

local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local requestId = ARGV[4]

redis.call('ZREMRANGEBYSCORE', KEYS[1], 0, now - window)
local count = redis.call('ZCARD', KEYS[1])

if count >= limit then
  local oldest = redis.call('ZRANGE', KEYS[1], 0, 0, 'WITHSCORES')
  return {0, count, oldest[2] or ''}
end

redis.call('ZADD', KEYS[1], now, requestId)
redis.call('EXPIRE', KEYS[1], tonumber(ARGV[5]))

return {1, count + 1, ''}

Atomic invariant:

Because Redis runs each script invocation to completion before starting the next, no two clients can both observe spare capacity and together insert beyond the limit.

Trade-offs:

| Dimension | Impact |
| --- | --- |
| Accuracy | High; tracks individual requests |
| Memory | O(number of requests in window) |
| CPU | ZREMRANGEBYSCORE, ZCARD, ZADD per request |
| Hot key risk | High for very active tenant/global limits |
| Cluster | All keys for one limiter must be in the same slot |

For extreme QPS, use sliding-window counter or token bucket to reduce cardinality.
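
When writing fixture tests for the sorted-set script, it can help to have a small reference model of its decision logic to compare against. The class below is such a model under stated assumptions: it covers a single key, ignores the key TTL, and `SlidingWindowModel` is a hypothetical name. Each step is annotated with the Redis command it mirrors.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory reference model of the sliding-window script for one key (no Redis involved). */
final class SlidingWindowModel {
    // member (request id) -> score (timestamp in millis), like the sorted set
    private final Map<String, Long> scoreByRequestId = new HashMap<>();

    /** Evicts expired entries, counts the rest, then admits or denies, in that order. */
    synchronized boolean allow(long nowMillis, long windowMillis, int limit, String requestId) {
        long cutoff = nowMillis - windowMillis;
        // ZREMRANGEBYSCORE key 0 cutoff (the upper bound is inclusive)
        scoreByRequestId.values().removeIf(score -> score <= cutoff);
        // ZCARD key
        if (scoreByRequestId.size() >= limit) {
            return false;
        }
        // ZADD key now requestId (re-adding an existing member only updates its score)
        scoreByRequestId.put(requestId, nowMillis);
        return true;
    }

    synchronized int size() {
        return scoreByRequestId.size();
    }
}
```

A fixture test can feed the same sequence of `(now, requestId)` pairs to this model and to the real script and assert that the allow/deny decisions match.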


11. Lua Pattern: Atomic Multi-Structure Update

Example: write session state and update reverse index.

-- KEYS[1] = session hash
-- KEYS[2] = user session set
-- ARGV[1] = session id
-- ARGV[2] = user id
-- ARGV[3] = now millis
-- ARGV[4] = ttl seconds

redis.call('HSET', KEYS[1],
  'sessionId', ARGV[1],
  'userId', ARGV[2],
  'lastSeenAt', ARGV[3]
)
redis.call('EXPIRE', KEYS[1], tonumber(ARGV[4]))

redis.call('SADD', KEYS[2], ARGV[1])
redis.call('EXPIRE', KEYS[2], tonumber(ARGV[4]) + 60)

return 1

Cluster-safe key design:

session:v1:{user-42}:sess-abc
user-session:v1:{user-42}

The hash tag {user-42} ensures both keys map to the same hash slot.

Bad key design:

session:v1:sess-abc
user-session:v1:user-42

Those may be in different slots in Redis Cluster. A single script cannot atomically update them in Cluster mode.
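
Key designs like these can be checked in a unit test before they reach a cluster. The sketch below follows the slot rule from the Redis Cluster specification: CRC16 (XMODEM variant) of the key modulo 16384, hashing only the hash tag when the key contains a non-empty `{...}` section. `ClusterSlots` is a hypothetical helper name; client libraries ship their own slot utilities, which are preferable in production code.

```java
import java.nio.charset.StandardCharsets;

/** Computes Redis Cluster hash slots locally (hypothetical helper for tests). */
final class ClusterSlots {
    private ClusterSlots() {}

    /** CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no reflection. */
    static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    /** Returns the text between the first '{' and the next '}' if non-empty, else the whole key. */
    static String hashTagOrKey(String key) {
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) {
                return key.substring(open + 1, close);
            }
        }
        return key;
    }

    static int slot(String key) {
        return crc16(hashTagOrKey(key).getBytes(StandardCharsets.UTF_8)) % 16384;
    }
}
```

With this helper, a test can assert that `ClusterSlots.slot("session:v1:{user-42}:sess-abc")` equals `ClusterSlots.slot("user-session:v1:{user-42}")`, which holds because both keys hash only the tag `user-42`.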


12. Redis Cluster Constraints for Atomic Workflows

In Redis Cluster, multi-key atomic operations must respect hash slots.

Design rule:

Every key touched by one transaction/script/function must be known up front and must belong to the same hash slot.

Use hash tags intentionally:

quota:v1:{tenant-123}:api:minute:202607021400
quota:v1:{tenant-123}:api:day:20260702
quota:v1:{tenant-123}:meta

This allows tenant-scoped multi-key atomic logic.

Do not use hash tags casually:

cache:{global}:product:1
cache:{global}:product:2
cache:{global}:product:3

That creates a global hot slot.

Good hash tag selection:

| Invariant scope | Hash tag |
| --- | --- |
| Per user | {userId} |
| Per tenant | {tenantId} |
| Per order | {orderId} |
| Per account | {accountId} |
| Per API client | {clientId} |

Bad hash tag selection:

| Hash tag | Problem |
| --- | --- |
| {global} | destroys sharding |
| {cache} | puts unrelated cache keys in one slot |
| {today} | creates time-window hot slot |
| {status} | concentrates by low-cardinality dimension |

13. EVAL, EVALSHA, and Script Cache

EVAL sends the full script source. EVALSHA sends a hash of a script that Redis already has in its script cache.

Operational pattern:

  1. load script at application startup using SCRIPT LOAD
  2. keep SHA in memory
  3. execute using EVALSHA
  4. fall back to EVAL or reload on NOSCRIPT

But there is a caveat:

NOSCRIPT handling is harder inside pipelines because responses are returned later.

For simple systems, using client libraries' script abstraction is often safer than hand-rolling SHA lifecycle. For large systems, package scripts or functions as versioned deployment artifacts.
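
If you do manage the SHA lifecycle yourself, the four steps can be kept in one small class. This is a sketch under assumptions: `ScriptPort` is a hypothetical interface standing in for whichever client you use, and detecting NOSCRIPT by the error message prefix is an assumption (some clients expose a dedicated exception type instead). The SHA itself is the lowercase hex SHA-1 of the script source, which is what SCRIPT LOAD returns, so it can be computed locally.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical minimal port over the Redis client in use. */
interface ScriptPort {
    Object evalsha(String sha1, String[] keys, String[] args);
    Object eval(String source, String[] keys, String[] args);
}

/** Holds a script's source and SHA-1, and falls back to EVAL when the cache was flushed. */
final class CachedScript {
    private final String source;
    private final String sha1;

    CachedScript(String source) {
        this.source = source;
        this.sha1 = sha1Hex(source);
    }

    String sha1() {
        return sha1;
    }

    Object run(ScriptPort port, String[] keys, String[] args) {
        try {
            return port.evalsha(sha1, keys, args);
        } catch (RuntimeException e) {
            String message = e.getMessage();
            // Assumption: the client surfaces the server's NOSCRIPT error text in the message.
            if (message != null && message.startsWith("NOSCRIPT")) {
                // EVAL runs the script and also puts it back into the server's script cache.
                return port.eval(source, keys, args);
            }
            throw e;
        }
    }

    static String sha1Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                .digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xFF));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

This fallback only works for calls issued one at a time. Inside a pipeline the NOSCRIPT error arrives after later commands were already sent, which is the caveat described above.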

Versioned script naming

Even though Redis identifies scripts by SHA, humans need names:

rate_limit_sliding_window_v3.lua
idempotency_claim_v2.lua
lock_release_v1.lua
session_touch_v4.lua

Store metadata in code:

public enum RedisScriptName {
    RATE_LIMIT_SLIDING_WINDOW_V3("rate_limit_sliding_window_v3"),
    IDEMPOTENCY_CLAIM_V2("idempotency_claim_v2"),
    LOCK_RELEASE_V1("lock_release_v1");

    private final String logicalName;

    RedisScriptName(String logicalName) {
        this.logicalName = logicalName;
    }

    // Used for resource lookup (logicalName + ".lua") and as a metrics/log tag.
    public String logicalName() {
        return logicalName;
    }
}

Do not let random inline script strings spread across service code.


14. Redis Functions Mental Model

Redis Functions (available since Redis 7.0) are server-side libraries loaded into Redis and called by name. They are operationally different from ad-hoc scripts:

| Aspect | Lua script via EVAL | Redis Function |
| --- | --- | --- |
| Deployment | application sends script | function library loaded into Redis |
| Invocation | EVAL / EVALSHA | FCALL / FCALL_RO |
| Lifecycle | script cache can be flushed | library/function managed as database asset |
| Reuse | per application convention | named reusable function |
| Versioning | app-level naming | library-level deployment strategy |
| Operational fit | small app-owned logic | shared server-side primitives |

Use Redis Functions when:

  • multiple services need the same primitive
  • script logic is stable and reusable
  • you want explicit server-side deployment
  • you need cleaner operational lifecycle than script cache
  • functions are part of platform infrastructure

Avoid Redis Functions when:

  • logic changes frequently with one application
  • deployment coordination is weak
  • different services need incompatible versions
  • the logic belongs in application/domain layer
  • rollback strategy is unclear

Example function library shape:

#!lua name=quota_lib

redis.register_function('fixed_window_allow', function(keys, args)
  local current = redis.call('INCR', keys[1])
  if current == 1 then
    redis.call('EXPIRE', keys[1], tonumber(args[2]))
  end

  if current > tonumber(args[1]) then
    return {0, current}
  end

  return {1, current}
end)

Invocation:

FCALL fixed_window_allow 1 quota:v1:{tenant-123}:api:minute 100 60

Platform rule:

Treat Redis Functions like database migrations: reviewed, versioned, deployed, tested, and rollback-aware.


15. Java Integration Patterns

Pattern A: Script wrapper class

Do not call raw script strings from business services. Wrap them.

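import io.lettuce.core.ScriptOutputType;
import io.lettuce.core.api.sync.RedisCommands;
import java.util.List;

public final class RedisRateLimiterScripts {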
    private final RedisCommands<String, String> redis;
    private final String slidingWindowScript;

    public RedisRateLimiterScripts(RedisCommands<String, String> redis, String slidingWindowScript) {
        this.redis = redis;
        this.slidingWindowScript = slidingWindowScript;
    }

    public RateLimitDecision allow(String key, long nowMillis, long windowMillis, int limit, String requestId) {
        @SuppressWarnings("unchecked")
        List<Object> result = redis.eval(
            slidingWindowScript,
            ScriptOutputType.MULTI,
            new String[] { key },
            Long.toString(nowMillis),
            Long.toString(windowMillis),
            Integer.toString(limit),
            requestId,
            Long.toString((windowMillis / 1000) + 5)
        );

        boolean allowed = "1".equals(String.valueOf(result.get(0)));
        long count = Long.parseLong(String.valueOf(result.get(1)));
        String oldest = String.valueOf(result.get(2));

        return new RateLimitDecision(allowed, count, oldest);
    }
}

Business service calls a domain method:

RateLimitDecision decision = limiter.allowApiRequest(tenantId, apiName, requestId);

if (!decision.allowed()) {
    throw new RateLimitExceededException(decision.retryAfterMillis());
}

Pattern B: Stable return envelope

Avoid return values like:

return 1

if future versions may need more detail.

Prefer:

return {'ALLOWED', current, resetAt}

or:

return cjson.encode({
  decision = 'ALLOWED',
  count = current,
  resetAt = resetAt
})

Trade-off:

| Return format | Pros | Cons |
| --- | --- | --- |
| Array | fast, compact | positional, brittle |
| JSON string | self-describing | serialization overhead |
| Integer code | compact | hard to evolve |
| Status + fields | balanced | needs mapping discipline |

For hot paths, arrays are fine if wrapped tightly. For platform scripts, self-describing results may be worth the cost.

Pattern C: Typed domain response

public record AtomicWorkflowResult(
    String status,
    Map<String, String> fields
) {
    public boolean isAllowed() {
        return "ALLOWED".equals(status);
    }
}

Do not leak raw Redis script arrays beyond the infrastructure boundary.


16. Error and Timeout Semantics

Atomic operation wrappers must define what happens on:

  • Redis timeout
  • connection reset
  • script error
  • NOSCRIPT
  • MOVED/ASK redirect
  • READONLY during failover
  • cluster slot migration
  • response mapping error
  • Java thread interruption

Critical distinction:

A client timeout does not prove Redis did not execute the operation.

Example:

Client sends script.
Redis executes script and writes state.
Network stalls before response reaches client.
Client times out.
Client retries.

If the script is not idempotent, retry may duplicate mutation.

Therefore:

  • make atomic scripts idempotent where possible
  • include request ids for mutation workflows
  • store result markers for replay
  • use owner tokens for lock-like operations
  • avoid blind retry of non-idempotent scripts
  • distinguish read-only scripts/functions from mutating ones

Timeout classification

| Operation | Safe blind retry? | Reason |
| --- | --- | --- |
| GET key | Usually yes | read-only |
| SET key value | Sometimes | overwrites may be safe if value deterministic |
| INCR key | No | duplicate increment changes state |
| idempotency claim script | Usually yes | if keyed by request id and returns existing state |
| lock release script | Usually yes | owner-token compare makes it stable |
| queue claim script | No unless request/worker idempotent | may move jobs or update attempts |
| rate limiter script | Usually no | retry consumes extra quota unless request id dedup is included |
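
One way to keep this classification from living only in a document is to encode it next to the operation wrappers. The enums below are a simplified sketch of the table: the names `BlindRetry` and `RedisOperation` are hypothetical, and "usually" and "sometimes" are collapsed into three coarse categories, which loses nuance the table keeps.

```java
/** How an operation behaves if it is re-sent after an ambiguous failure such as a timeout. */
enum BlindRetry { YES, ONLY_IF_IDEMPOTENT, NO }

/** Simplified encoding of the timeout classification table (hypothetical names). */
enum RedisOperation {
    GET(BlindRetry.YES),
    SET(BlindRetry.ONLY_IF_IDEMPOTENT),        // safe when the written value is deterministic
    INCR(BlindRetry.NO),
    IDEMPOTENCY_CLAIM(BlindRetry.YES),
    LOCK_RELEASE(BlindRetry.YES),
    QUEUE_CLAIM(BlindRetry.ONLY_IF_IDEMPOTENT), // safe when request/worker is idempotent
    RATE_LIMIT(BlindRetry.ONLY_IF_IDEMPOTENT);  // safe when the script dedups by request id

    private final BlindRetry blindRetry;

    RedisOperation(BlindRetry blindRetry) {
        this.blindRetry = blindRetry;
    }

    /**
     * Decides whether the wrapper may re-send after a timeout.
     * `callIsIdempotent` is the caller's assertion that this specific call is safe to repeat.
     */
    boolean mayRetryAfterTimeout(boolean callIsIdempotent, int attemptsSoFar, int maxAttempts) {
        if (attemptsSoFar >= maxAttempts) {
            return false;
        }
        switch (blindRetry) {
            case YES:
                return true;
            case ONLY_IF_IDEMPOTENT:
                return callIsIdempotent;
            default:
                return false;
        }
    }
}
```

The point of the explicit `callIsIdempotent` argument is to force the call site to state its assumption instead of inheriting a generic client-level retry setting.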

17. Atomic Workflow Design Templates

Template 1: Read-decide-write

Use Lua when all state is Redis-owned.

Examples:

  • quota allow/deny
  • idempotency claim
  • sliding window insertion
  • lock release
  • session touch with max idle

Template 2: Compare owner token

Use for leases and owned state transitions.

if currentOwner != expectedOwner:
    return NOT_OWNER
else:
    mutate
    return OK

Examples:

  • lock release
  • worker heartbeat
  • job completion
  • in-progress idempotency completion

Template 3: Monotonic version/fencing

Use for ordered mutation.

version = INCR version-key
write state with version
return version

Consumers must reject stale versions. Redis can generate the token. The external resource must enforce it.
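
The enforcement side can be illustrated with a small in-memory stand-in for the external resource. `FencedResource` is a hypothetical class; in a real system the same check would live in the database, storage service, or downstream API. The sketch accepts a token equal to the highest seen (the same holder writing again) and rejects anything lower, which is one common convention rather than the only one.

```java
/** In-memory stand-in for an external resource that enforces fencing tokens (hypothetical). */
final class FencedResource {
    private long highestTokenSeen = 0;
    private String state;

    /** Applies the write only if the token is not older than one already accepted. */
    synchronized boolean write(long fencingToken, String newState) {
        if (fencingToken < highestTokenSeen) {
            return false; // stale writer: a newer token has already been accepted
        }
        highestTokenSeen = fencingToken;
        state = newState;
        return true;
    }

    synchronized String state() {
        return state;
    }
}
```

A writer that paused after obtaining token 33 and resumes after token 34 was accepted has its write rejected, which is the protection a Redis-side lock alone cannot provide.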

Template 4: Request-id dedup inside mutating script

Use for retry-safe mutation.

if requestId already seen:
    return previous result
else:
    perform mutation
    remember requestId/result
    return result

This is how you make INCR-like semantics retry-safe.
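
The pseudocode maps to a very small amount of logic. The model below shows it in plain Java with no Redis involved: `DedupCounterModel` is a hypothetical name, the `synchronized` method plays the role of the script's atomic execution, and the remembered results never expire here, whereas a Redis version would need a TTL on them.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory model of a retry-safe increment keyed by request id (illustration only). */
final class DedupCounterModel {
    private long counter;
    private final Map<String, Long> resultByRequestId = new HashMap<>();

    /** Increments at most once per request id; a retry gets the originally returned value. */
    synchronized long incrementOnce(String requestId) {
        Long previous = resultByRequestId.get(requestId);
        if (previous != null) {
            return previous; // retry of an already applied request
        }
        counter++;
        resultByRequestId.put(requestId, counter);
        return counter;
    }
}
```

A client that times out and re-sends the same request id observes the same result and the counter moves once, which is what distinguishes this from a bare INCR.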


18. Anti-Patterns

Anti-pattern 1: GET then SET for conditional updates

Bad:

String current = redis.get(key);
if (current == null) {
    redis.setex(key, 60, value);
}

Use:

SET key value NX EX 60

or Lua if condition is more complex.

Anti-pattern 2: Lua as business process engine

Bad script responsibilities:

  • parse complex domain JSON
  • implement long state machine
  • scan thousands of keys
  • generate reports
  • call large aggregations on request path
  • encode business policy that changes weekly

Redis scripts should protect small invariants. They should not become a hidden microservice.

Anti-pattern 3: Dynamic key discovery inside script

Bad:

local keys = redis.call('KEYS', 'tenant:*:quota')
for i, key in ipairs(keys) do
  redis.call('DEL', key)
end

Problems:

  • blocks Redis
  • breaks Cluster key-slot reasoning
  • unsafe at scale
  • hard to test
  • unpredictable latency

Anti-pattern 4: Huge result payloads

Bad:

return redis.call('HGETALL', KEYS[1])

inside a hot path for a large hash.

Prefer returning only fields required by the decision.

Anti-pattern 5: Treating transaction as rollback-capable SQL transaction

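Redis transactions do not roll back. If a command is rejected while it is being queued (for example, a syntax error), EXEC aborts and nothing runs. If a command fails during EXEC (for example, an operation against a key of the wrong type), the remaining queued commands still run and the earlier writes stay applied. Design transactions so that no invariant depends on SQL-style rollback.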

Anti-pattern 6: No script versioning

Bad:

redis.eval("if redis.call('GET', KEYS[1]) then ...", ...)

scattered across the codebase.

Better:

scripts/
  idempotency_claim_v1.lua
  idempotency_complete_v1.lua
  rate_limit_sliding_window_v2.lua
  lock_release_v1.lua

with tests and checksums.


19. Testing Atomic Workflows

Atomicity bugs are concurrency bugs. Unit tests alone are not enough.

Test layers

| Layer | Purpose |
| --- | --- |
| Lua unit fixture | Validate return values for known Redis states |
| Integration test | Execute against real Redis using Testcontainers |
| Concurrent stress test | Many threads/processes hit same key and assert invariant |
| Retry simulation | Timeout/retry duplicate request id and assert no duplicate mutation |
| Cluster test | Validate hash slot key design and MOVED/ASK behavior |
| Failover test | Observe behavior during connection reset and primary switch |
| Property test | Generate random interleavings and assert invariant |

Example invariant test: rate limit never exceeds limit

@Test
void slidingWindowLimiterNeverAllowsMoreThanLimit() throws Exception {
    int limit = 100;
    String key = "rl:v1:{tenant-42}:api:test";

    ExecutorService pool = Executors.newFixedThreadPool(32);
    CountDownLatch start = new CountDownLatch(1);
    AtomicInteger allowed = new AtomicInteger();

    try {
        // Future<Void> as the element type makes submit() resolve to its Callable overload.
        List<Future<Void>> futures = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            String requestId = "req-" + i;
            futures.add(pool.submit(() -> {
                start.await();
                RateLimitDecision decision = limiter.allow(key, requestId);
                if (decision.allowed()) {
                    allowed.incrementAndGet();
                }
                return null;
            }));
        }

        start.countDown();

        for (Future<Void> future : futures) {
            future.get(10, TimeUnit.SECONDS);
        }
    } finally {
        // Release worker threads even when an assertion or timeout fails.
        pool.shutdownNow();
    }

    assertThat(allowed.get()).isLessThanOrEqualTo(limit);
}

Do not only test happy path with sequential calls.


20. Observability for Atomic Workflows

You need visibility at four levels:

| Level | Signal |
| --- | --- |
| Application | operation name, status, business result, latency, retry count |
| Redis command | EVAL, EVALSHA, FCALL, command latency, errors |
| Script/function | logical script name/version, input key count, return status |
| System | slowlog, latency spikes, CPU, memory, blocked clients, cluster redirects |

Recommended app metrics:

redis.atomic.operation.count{operation,status}
redis.atomic.operation.latency{operation}
redis.atomic.operation.retry.count{operation,reason}
redis.atomic.operation.timeout.count{operation}
redis.atomic.script.noscript.count{script}
redis.atomic.script.error.count{script,errorType}
redis.atomic.result.count{operation,result}

Log fields:

{
  "event": "redis_atomic_workflow",
  "operation": "rate_limit_sliding_window",
  "version": "v3",
  "keyHashTag": "tenant-123",
  "result": "DENIED",
  "latencyMs": 3,
  "attempt": 1
}

Do not log full Redis keys when they contain user identifiers or secrets. Hash or redact sensitive dimensions.
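
A minimal sketch of the hashing option follows. `KeyRedactor` is a hypothetical helper that logs a short SHA-256 prefix instead of the raw identifier. Note the assumption baked in: an unsalted hash of a low-entropy identifier (such as a sequential user id) can be reversed by enumeration, so a keyed hash (HMAC) with a secret is the safer variant where that matters.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Produces a short, stable fingerprint of a sensitive key dimension for logs (hypothetical). */
final class KeyRedactor {
    private KeyRedactor() {}

    /** Returns the first 12 hex characters (6 bytes) of SHA-256 over the UTF-8 value. */
    static String fingerprint(String sensitiveValue) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(sensitiveValue.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 6; i++) {
                hex.append(String.format("%02x", digest[i] & 0xFF));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

The fingerprint is stable across log lines, so requests for the same tenant can still be correlated without the raw identifier appearing in the `keyHashTag` field.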


21. Operational Safety

Keep scripts bounded

Every loop must have a clear maximum.

Bad:

while true do
  -- keep scanning
end

Better:

local maxItems = tonumber(ARGV[1])
for i = 1, maxItems do
  -- bounded work
end

Avoid heavy commands in scripts

Be careful with:

  • KEYS
  • large HGETALL
  • large SMEMBERS
  • large ZRANGE
  • unbounded SCAN-like loops
  • deleting huge keys synchronously

Prefer small critical sections

A good Lua script often does:

  • 1–5 reads
  • 1–5 writes
  • simple branching
  • simple numeric/string comparisons
  • small return envelope

If the script needs a design document to understand business semantics, it may belong in application code.

Plan rollout and rollback

For each script/function:

| Deployment question | Required answer |
| --- | --- |
| How is it versioned? | filename/function name/version field |
| Who loads it? | app startup, migration job, platform bootstrap |
| How is SHA cached? | client abstraction or explicit registry |
| What happens on NOSCRIPT? | reload/fallback policy |
| Can old and new app versions coexist? | stable return contract or dual scripts |
| How is rollback performed? | keep old function/script until consumers migrate |
| How is it tested? | fixture + integration + concurrency |

22. Decision Matrix

| Problem | Preferred mechanism | Why |
| --- | --- | --- |
| Set value only if absent with TTL | SET NX EX/PX | native atomic primitive |
| Increment counter | INCR | native atomic primitive |
| Write several known values together | MULTI/EXEC | no branch needed |
| Optimistic object update under low contention | WATCH + MULTI | easier in Java |
| Rate limit read-count-insert | Lua | read-decide-write atomicity |
| Lock release by owner token | Lua | compare-and-delete must be atomic |
| Shared platform quota primitive | Redis Function | reusable server-side logic |
| Cross-tenant/global invariant in Cluster | External system or redesigned keying | cross-slot atomicity problem |
| Money/accounting correctness | Database transaction/ledger system | Redis can assist but should not own ledger invariant |
| Long workflow with side effects | Application saga/outbox/workflow engine | Redis script is too narrow |

23. Production Checklist

Before shipping a Redis atomic workflow:

  • The invariant is written in one sentence.
  • The keys participating in the invariant are listed.
  • The atomicity mechanism is justified.
  • Cluster hash slot behavior is verified.
  • The operation is safe or explicitly unsafe to retry after timeout.
  • Script/function return contract is versioned.
  • No unbounded loops exist.
  • No large unbounded reads are used.
  • TTL behavior is part of the atomic section if required.
  • Owner/request tokens are used where duplicate/retry risk exists.
  • Concurrent tests assert the invariant.
  • Application metrics expose result, latency, retry, timeout, and errors.
  • Rollout and rollback plan exists.

24. 20-Hour Practice Block

Use this part as deliberate practice, not passive reading.

Hour 1–3: Native atomic primitives

Implement:

  • idempotency claim with SET NX PX
  • unique event dedup with SADD
  • fixed counter with INCR + TTL
  • safe TTL extension with EXPIRE GT

Write concurrency tests.

Hour 4–6: MULTI/EXEC and WATCH

Implement:

  • profile update + secondary index in one transaction
  • optimistic quota update with WATCH
  • retry limit with jitter

Measure abort rate under contention.

Hour 7–11: Lua scripts

Implement scripts for:

  • safe lock release
  • idempotency claim/complete
  • sliding window limiter
  • worker heartbeat with owner token

Write fixture tests and integration tests.

Hour 12–15: Cluster-safe key design

For each workflow:

  • list keys
  • choose hash tag
  • verify same slot
  • identify hot-slot risk

Hour 16–18: Failure simulation

Inject:

  • client timeout
  • duplicate retry
  • Redis restart
  • script cache flush
  • connection reset

Record which operations are retry-safe.

Hour 19–20: Review and playbook

Create a one-page decision guide:

  • invariant
  • mechanism
  • retry policy
  • key design
  • test coverage
  • operational metrics

25. Part Summary

Redis atomic workflow engineering is about choosing the correct place for the critical section.

Use this ladder:

single command
→ command with options
→ MULTI/EXEC
→ WATCH + MULTI
→ Lua script
→ Redis Function
→ external transactional/consensus system

The key lessons:

  • Pipelining is not atomicity.
  • MULTI/EXEC queues commands but does not provide SQL-style rollback.
  • WATCH is optimistic concurrency control and works best under low contention.
  • Lua is excellent for small read-decide-write critical sections.
  • Redis Functions are better for shared, versioned server-side primitives.
  • In Cluster, all keys in one atomic operation must be in the same hash slot.
  • Client timeout does not prove the operation did not execute.
  • Atomic Redis state does not make external database or side-effect workflows atomic.

Top 1% Redis engineers do not ask, “Can I write this in Lua?” They ask:

What invariant am I protecting, where is the critical section, what happens on retry, and what is the weakest mechanism that preserves correctness?

