Rate Limiting and Quota Enforcement

Learn Java Redis In Action - Part 017

Production-grade rate limiting and quota enforcement with Redis and Java: fixed window, sliding window, token bucket, leaky bucket, weighted quota, multi-dimensional limits, Lua atomicity, cluster-safe key design, HTTP contracts, observability, and failure modeling.

2026-07-02 | 25 min read | 4938 words

Part 017 — Rate Limiting and Quota Enforcement

Part 016 covered idempotency and deduplication. Now we move to one of the most common production uses of Redis:

Deciding whether a request, command, event, or background action is allowed to proceed right now.

Rate limiting is not just a security feature. It is a control mechanism for system stability.

A mature engineering team uses rate limiting to:

protect downstream dependencies
enforce tenant contracts
prevent accidental traffic explosions
reduce brute-force and scraping abuse
shape background workloads
guard expensive endpoints
isolate noisy tenants
protect write paths during incidents
slow retry storms
keep user experience predictable

Redis is a strong fit because rate limiting usually needs:

low latency
atomic read-decide-write
TTL-based cleanup
simple counters
sorted time windows
small state per subject
high concurrency support
centralized decision state across app instances

But Redis is also easy to misuse. A naive INCR limiter can be good enough for some paths and dangerously unfair for others. A precise sorted-set limiter can be correct but memory-heavy. A token bucket can handle bursts but needs careful time arithmetic. A distributed quota can look easy until tenant hierarchy, retry behavior, fail-open policy, and clock semantics enter the design.

This part builds the mental model and implementation patterns for production-grade Java systems.

1. Kaufman Skill Decomposition

The skill is not “know a rate limiter algorithm”. The skill is:

Given a business and technical constraint, design, implement, operate, and evolve a Redis-backed admission-control mechanism whose fairness, latency, failure mode, and memory cost are explicit.

Break the skill into sub-skills:

Sub-skill	What You Must Be Able To Do
Limit modeling	Define who is limited, what action is limited, and what cost model applies.
Algorithm choice	Choose fixed window, sliding window, token bucket, leaky bucket, or hybrid.
Atomicity	Keep read-decide-update atomic under concurrency.
Key design	Build cluster-safe, tenant-safe, memory-aware keys.
Java integration	Expose a clean decision API to filters, interceptors, consumers, and workers.
HTTP/API contract	Return clear `429`, retry metadata, and user-safe error semantics.
Quota governance	Support plan limits, override rules, hierarchy, and rollout.
Failure behavior	Decide fail-open, fail-closed, degraded local limiter, or bypass.
Observability	Measure allowed, rejected, near-limit, Redis latency, hot keys, and script failures.
Testing	Prove concurrent correctness, boundary behavior, and memory cleanup.

The practice goal for this part:

Implement three Redis-backed limiters in Java: fixed window, sliding window log, and token bucket. Then stress-test them with concurrent traffic and explain which one you would deploy for each endpoint class.

2. Mental Model: Admission Control

A rate limiter is an admission controller.

It does not process the request. It decides whether the request should be allowed to enter the expensive part of the system.

This means the limiter must run before expensive work:

before DB writes
before external API calls
before CPU-heavy computation
before fanout
before message publication when publication itself causes load
before brute-force-sensitive authentication actions, when possible

The limiter answers a small but important question:

Given this subject, action, cost, and time, may this operation proceed?

A production decision usually includes more than allow/deny:

public record RateLimitDecision(
    boolean allowed,
    String ruleId,
    String subject,
    long limit,
    long remaining,
    long retryAfterMillis,
    long resetAtEpochMillis,
    String algorithm
) {}

That metadata matters because upstream code needs to decide whether to:

return 429 Too Many Requests
enqueue for later
reduce batch size
switch to degraded behavior
reject only optional features
attach rate-limit headers
emit metrics

3. Rate Limit vs Quota

These terms are often mixed. Keep them separate.

Concept	Meaning	Example
Rate limit	Maximum rate over a short time window.	100 requests per minute.
Quota	Maximum consumption over a longer accounting period.	1 million API calls per month.
Burst limit	Temporary allowance above average rate.	Allow 20 requests instantly, refill 5/sec.
Concurrency limit	Maximum simultaneous in-flight work.	10 active exports per tenant.
Cost limit	Limit based on weighted units, not request count.	Search query costs 5 units, simple lookup costs 1.
Budget	A shared allowance consumed by multiple actions.	Tenant has 10,000 compute units/day.

Redis can support all of them, but not with the same data structure.

Do not use one generic limiter for every case. Different constraints imply different failure and fairness behavior.

4. The Core Invariant

Every rate limiter has the same core invariant:

For a defined subject and action, the allowed cost within the relevant control period must not exceed the configured allowance beyond the explicitly accepted error bound.

This sounds abstract, but it is the difference between engineering and copy-pasting.

Examples:

Invariant A:
A user may perform at most 5 login attempts per 60 seconds per account.

Invariant B:
A tenant may consume at most 1,000 write units per minute across all app instances.

Invariant C:
A webhook sender may create at most 50 pending jobs per second, with burst capacity 200.

Invariant D:
A free-plan tenant may consume at most 100,000 API units per calendar month.

The hidden parts are:

what is the subject?
what is the action?
what is the cost unit?
what is the window?
what is the allowed error?
what happens if Redis is unavailable?
what happens under concurrency?
what happens when config changes?
what happens near boundary times?

A top-tier engineer makes these explicit.

5. Define the Limiting Dimension

Before choosing an algorithm, define the limiting key.

A limiter dimension is usually a tuple:

environment + product + tenant + subject + action + rule + time/window

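Examples (the hash tag wraps the tenant, not the environment; a {prod} tag would hash every production key to the same Cluster slot):

rl:prod:{tenant:acme}:user:42:login:fixed:202607021430
rl:prod:{tenant:acme}:api-key:k_123:search:sliding
rl:prod:{tenant:acme}:global:write:token-bucket
quota:prod:{tenant:acme}:monthly:2026-07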

A good limiter key has these properties:

stable across app instances
scoped enough to avoid cross-tenant interference
specific enough to reflect the real business rule
compact enough to avoid memory waste
cluster-safe when multiple keys are used atomically
versioned when algorithm semantics change
does not contain raw PII where avoidable

Bad key:

rate:user@example.com

Better key:

rl:v1:{tenant-acme}:login:user:sha256_8f23:60s

For Redis Cluster, hash tags are important. All keys inside one Lua script must live in the same hash slot. Use {tenant-acme} or another common hash tag when a script touches multiple keys.
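The "better key" above can be produced with a small helper. This is a minimal sketch, not the lesson's implementation: the class name LimiterKeySketch, the trim-and-lowercase normalization, and the 16-character truncation are all illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Locale;

// Hypothetical helper: builds a login limiter key without raw PII in it.
final class LimiterKeySketch {
    private LimiterKeySketch() {}

    /** First 16 hex characters (8 bytes) of SHA-256 over the normalized subject. */
    static String subjectHash(String rawSubject) {
        try {
            // Assumption: subjects are case-insensitive and may carry stray whitespace.
            String normalized = rawSubject.trim().toLowerCase(Locale.ROOT);
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha256.digest(normalized.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest, 0, 8);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    /** The hash tag {tenant-<id>} keeps all of a tenant's limiter keys in one slot. */
    static String loginKey(String tenantId, String rawSubject) {
        return "rl:v1:{tenant-" + tenantId + "}:login:user:" + subjectHash(rawSubject) + ":60s";
    }
}
```

An unkeyed hash of a low-entropy value such as an email address can still be reversed by guessing, so a keyed HMAC is a common hardening step when the keys may appear in dumps or logs.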

6. Algorithm Selection Matrix

Algorithm	Accuracy	Burst Handling	Memory	Fairness	Complexity	Best For
Fixed window	Low/medium	Poor near boundary	Very low	Boundary unfairness	Low	Simple endpoint protection.
Fixed window with sub-buckets	Medium	Better	Low/medium	Better	Medium	Cheap approximate sliding limits.
Sliding window log	High	Precise	High	High	Medium	Security-sensitive low/medium volume limits.
Sliding window counter	Medium/high	Good	Medium	Good	Medium	API limits requiring smoother behavior.
Token bucket	Medium/high	Excellent	Low	Configurable	Medium/high	Burst-friendly throughput shaping.
Leaky bucket	Medium	Smooth output	Low/medium	Smooth but queue-sensitive	Medium	Worker dispatch shaping.
Concurrency semaphore	Exact active count if implemented safely	Not rate-based	Low	Depends on lease cleanup	Medium/high	Limit in-flight jobs.

There is no best algorithm. There is only the algorithm that fits the invariant.

7. Fixed Window Counter

Fixed window is the simplest Redis limiter.

For each subject and time bucket:

increment counter
set expiry if this is first use
allow if count is within limit

The problem: INCR and EXPIRE must be coupled. If the app crashes after INCR and before EXPIRE, the key may live forever. Use Lua to make the mutation atomic.

Fixed Window Lua

-- fixed_window.lua
-- KEYS[1] = counter key
-- ARGV[1] = limit
-- ARGV[2] = ttl millis
-- ARGV[3] = cost

local limit = tonumber(ARGV[1])
local ttl = tonumber(ARGV[2])
local cost = tonumber(ARGV[3])

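-- Note: rejected requests still increment the counter in this version.
local current = redis.call('INCRBY', KEYS[1], cost)

-- Set the expiry whenever the key has none (PTTL returns -1), not only on
-- first use, so a key left without a TTL by older code cannot live forever.
local pttl = redis.call('PTTL', KEYS[1])
if pttl < 0 then
  redis.call('PEXPIRE', KEYS[1], ttl)
  pttl = ttl
end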
local allowed = current <= limit
local remaining = limit - current
if remaining < 0 then remaining = 0 end

return {
  allowed and 1 or 0,
  current,
  remaining,
  pttl
}

Java Wrapper

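import io.lettuce.core.ScriptOutputType;
import io.lettuce.core.api.sync.RedisCommands;

import java.time.Duration;
import java.util.List;

public final class FixedWindowRateLimiter {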
    private final RedisCommands<String, String> redis;
    private final String scriptSha;

    public FixedWindowRateLimiter(RedisCommands<String, String> redis, String scriptSha) {
        this.redis = redis;
        this.scriptSha = scriptSha;
    }

    public RateLimitDecision check(
            String key,
            long limit,
            Duration ttl,
            long cost,
            String ruleId,
            String subject
    ) {
        @SuppressWarnings("unchecked")
        List<Object> result = redis.evalsha(
            scriptSha,
            ScriptOutputType.MULTI,
            new String[] { key },
            Long.toString(limit),
            Long.toString(ttl.toMillis()),
            Long.toString(cost)
        );

        boolean allowed = ((Number) result.get(0)).longValue() == 1L;
        long current = ((Number) result.get(1)).longValue();
        long remaining = ((Number) result.get(2)).longValue();
        long retryAfterMs = Math.max(0, ((Number) result.get(3)).longValue());

        return new RateLimitDecision(
            allowed,
            ruleId,
            subject,
            limit,
            remaining,
            allowed ? 0 : retryAfterMs,
            System.currentTimeMillis() + retryAfterMs,
            "fixed-window"
        );
    }
}

Fixed Window Boundary Problem

Fixed window is unfair at boundaries.

Assume limit = 100/minute. A user can send:

100 requests at 12:00:59
100 requests at 12:01:00

That is 200 requests in roughly 1 second, twice the nominal per-minute limit.

This is acceptable for many coarse protections. It is not acceptable for brute-force login, expensive write paths, or strict fairness.
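The boundary effect can be reproduced without Redis. The sketch below is an in-memory stand-in for the fixed-window counter (the class FixedWindowBoundaryDemo and its method names are illustrative, not part of the lesson's code). It sends 150 requests in the last second of one window and 150 in the first second of the next.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the Redis counter: one count per time bucket.
final class FixedWindowBoundaryDemo {
    private final long limit;
    private final long windowMillis;
    private final Map<Long, Long> countsByBucket = new HashMap<>();

    FixedWindowBoundaryDemo(long limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    boolean tryAcquire(long nowMillis) {
        long bucket = nowMillis / windowMillis;              // same bucketing as the key builder
        long count = countsByBucket.merge(bucket, 1L, Long::sum);
        return count <= limit;
    }

    /** Counts how many of 300 requests pass when they straddle a window boundary. */
    static int allowedAcrossBoundary() {
        FixedWindowBoundaryDemo limiter = new FixedWindowBoundaryDemo(100, 60_000);
        long lastSecondOfWindow = 59_000;   // stands in for 12:00:59
        long firstSecondOfNext = 60_000;    // stands in for 12:01:00
        int allowed = 0;
        for (int i = 0; i < 150; i++) {
            if (limiter.tryAcquire(lastSecondOfWindow)) allowed++;
        }
        for (int i = 0; i < 150; i++) {
            if (limiter.tryAcquire(firstSecondOfNext)) allowed++;
        }
        return allowed;
    }
}
```

Calling FixedWindowBoundaryDemo.allowedAcrossBoundary() returns 200: each window admits its full 100, even though the two bursts are one second apart.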

8. Fixed Window Key Construction

A fixed window key usually includes the bucket timestamp.

public final class RateLimitKeys {
    public static String fixedWindowKey(
            String env,
            String tenantId,
            String subjectType,
            String subjectHash,
            String action,
            Instant now,
            Duration window
    ) {
        long bucket = now.toEpochMilli() / window.toMillis();
        return "rl:v1:{tenant:" + tenantId + "}:" + env
            + ":" + subjectType
            + ":" + subjectHash
            + ":" + action
            + ":fw:" + bucket;
    }
}

Why include {tenant:<id>}?

cluster scripts stay slot-local for tenant-level multi-key rules
tenant hotness can be observed
rule migration can be tenant-scoped

Why hash subject?

avoid storing raw email/API key/IP where possible
reduce accidental PII leakage in Redis dumps/logs
keep key length controlled

9. Sliding Window Log

Sliding window log gives precise rolling-window enforcement.

For each request:

remove entries older than window
count current entries
if under limit, add current request timestamp
set TTL
return decision

Redis Sorted Set fits this because scores can be timestamps.

Sliding Window Log Lua

For unit-cost requests:

-- sliding_window_log.lua
-- KEYS[1] = sorted set key
-- ARGV[1] = now millis
-- ARGV[2] = window millis
-- ARGV[3] = limit
-- ARGV[4] = member id
-- ARGV[5] = ttl millis

local now = tonumber(ARGV[1])
local window = tonumber(ARGV[2])
local limit = tonumber(ARGV[3])
local member = ARGV[4]
local ttl = tonumber(ARGV[5])

local min = now - window
redis.call('ZREMRANGEBYSCORE', KEYS[1], '-inf', min)

local count = redis.call('ZCARD', KEYS[1])

if count < limit then
  redis.call('ZADD', KEYS[1], now, member)
  redis.call('PEXPIRE', KEYS[1], ttl)
  return {1, count + 1, limit - count - 1, 0}
end

local oldest = redis.call('ZRANGE', KEYS[1], 0, 0, 'WITHSCORES')
local retryAfter = window
if oldest[2] ~= nil then
  retryAfter = math.max(0, tonumber(oldest[2]) + window - now)
end

return {0, count, 0, retryAfter}

Java Member ID

The sorted-set member must be unique. If two requests use the same member, ZADD overwrites instead of adding.

Use:

String member = now.toEpochMilli() + ":" + requestId + ":" + randomSuffix;

For security-sensitive limiters, prefer a request ID from your ingress layer. For internal worker limiters, a monotonic local sequence plus instance ID is often enough.

Memory Cost

Sliding window log stores one member per accepted request within the window.

Memory rough model:

memory ≈ active_subjects × limit_per_window × average_member_overhead

If you allow 10,000 requests/minute for 10,000 active API keys, a precise log may hold up to 100 million sorted-set members at once. Use it where precision matters.

Good use cases:

login attempts
password reset attempts
OTP verification attempts
expensive export requests
abuse-sensitive endpoints

Less ideal:

very high QPS generic API gateway limits
monthly quota
massive low-risk traffic shaping

10. Sliding Window Counter

Sliding window counter approximates the rolling window with buckets. Instead of storing every request, store counts per sub-window.

Example:

limit: 1,000 requests/minute
bucket size: 10 seconds
retain 6 buckets

12:00:00-12:00:09 => 120
12:00:10-12:00:19 => 180
12:00:20-12:00:29 => 160
12:00:30-12:00:39 => 150
12:00:40-12:00:49 => 200
12:00:50-12:00:59 => 170

This reduces memory from one entry per request to one counter per active bucket.

You can implement it as:

multiple string counters
one hash with bucket fields
sorted set of bucket ids

A hash version:

key: rl:v1:{tenant:acme}:api-key:abc:search:swc
field: 29411522 -> 120
field: 29411523 -> 180

Since Redis 7.4, hash field expiration (HEXPIRE and related commands) can simplify field-level cleanup for some designs, but you still need a fallback if your deployment includes older Redis versions.

Approximation Trade-off

Sliding counter is not exact unless bucket size approaches request granularity. Smaller buckets improve accuracy but increase operations and memory.

Bucket Size	Accuracy	Redis Work	Memory
1 second	High	Higher	Higher
5 seconds	Good	Medium	Medium
10 seconds	Medium	Lower	Lower
30 seconds	Low	Low	Low

A practical choice is often:

bucket_size = window / 6 to window / 12

For a 60-second window, use 5s or 10s buckets.
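To make the bucket arithmetic concrete, here is an in-memory sketch of a bucketed sliding counter. It is one possible variant, under stated assumptions: whole buckets are dropped once they fall outside the retained range, there is no weighting of the oldest bucket, and the class name SlidingWindowCounterSketch is illustrative. A Redis version would keep the same counts in hash fields and run the logic in a script.

```java
import java.util.TreeMap;

// Bucketed sliding counter: one count per sub-window instead of one entry per request.
final class SlidingWindowCounterSketch {
    private final long limit;
    private final long bucketMillis;
    private final int bucketCount;
    private final TreeMap<Long, Long> countsByBucket = new TreeMap<>();

    SlidingWindowCounterSketch(long limit, long bucketMillis, int bucketCount) {
        this.limit = limit;
        this.bucketMillis = bucketMillis;
        this.bucketCount = bucketCount;
    }

    synchronized boolean tryAcquire(long nowMillis, long cost) {
        long current = nowMillis / bucketMillis;
        long oldestKept = current - bucketCount + 1;

        // Drop every bucket that is older than the retained range.
        countsByBucket.headMap(oldestKept).clear();

        long used = 0;
        for (long count : countsByBucket.values()) {
            used += count;
        }
        if (used + cost > limit) {
            return false;                       // reject without recording the request
        }
        countsByBucket.merge(current, cost, Long::sum);
        return true;
    }
}
```

Because a whole bucket expires at once, the effective window in this variant varies between the full window and the window minus one bucket. That is the approximation the accuracy table above describes.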

11. Token Bucket

Token bucket is the most useful algorithm when you want to allow bursts while enforcing an average rate.

Mental model:

bucket has capacity
tokens refill over time
each request consumes tokens
if enough tokens exist, allow
otherwise reject or delay

Example:

capacity = 100 tokens
refill = 10 tokens/second
request cost = 1 token

This permits a burst of 100 requests, then settles at 10 requests/second.
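The burst-then-steady behavior can be checked with a few lines of arithmetic. The following is an in-memory sketch of the refill-and-consume step only. The class name TokenBucketSketch and its method signature are assumptions for illustration, and time is passed in explicitly so the example is deterministic.

```java
// In-memory token bucket: refill lazily on each call, then try to consume.
final class TokenBucketSketch {
    private final double capacity;
    private final double refillPerSecond;
    private double tokens;
    private long updatedAtMillis;

    TokenBucketSketch(double capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;            // a new bucket starts full
        this.updatedAtMillis = nowMillis;
    }

    /** Returns 0 when allowed, otherwise the wait in millis until enough tokens exist. */
    synchronized long tryConsume(double cost, long nowMillis) {
        // A clock that moved backward contributes no refill.
        long elapsedMillis = Math.max(0, nowMillis - updatedAtMillis);
        tokens = Math.min(capacity, tokens + elapsedMillis * refillPerSecond / 1000.0);
        // Never rewind the stored timestamp.
        updatedAtMillis = Math.max(updatedAtMillis, nowMillis);

        if (tokens >= cost) {
            tokens -= cost;
            return 0;
        }
        double missing = cost - tokens;
        return (long) Math.ceil(missing * 1000.0 / refillPerSecond);
    }
}
```

With capacity 100 and refill 10 tokens/second, the first 100 unit-cost calls at the same instant succeed, and the 101st is told to wait 100 ms, which is the time needed to refill one token. One second later exactly 10 more calls succeed.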

Redis State

Use one hash:

key: rl:v1:{tenant:acme}:api-key:abc:search:tb
fields:
  tokens = 73.5
  updatedAt = 1783010400123

Token Bucket Lua

-- token_bucket.lua
-- KEYS[1] = bucket key
-- ARGV[1] = now millis
-- ARGV[2] = capacity
-- ARGV[3] = refill tokens per second
-- ARGV[4] = cost
-- ARGV[5] = ttl millis

local now = tonumber(ARGV[1])
local capacity = tonumber(ARGV[2])
local refillPerSecond = tonumber(ARGV[3])
local cost = tonumber(ARGV[4])
local ttl = tonumber(ARGV[5])

local data = redis.call('HMGET', KEYS[1], 'tokens', 'updatedAt')
local tokens = tonumber(data[1])
local updatedAt = tonumber(data[2])

if tokens == nil then
  tokens = capacity
  updatedAt = now
end

if updatedAt == nil then
  updatedAt = now
end

local elapsedMillis = math.max(0, now - updatedAt)
local refill = (elapsedMillis / 1000.0) * refillPerSecond
tokens = math.min(capacity, tokens + refill)

local allowed = tokens >= cost
local retryAfter = 0

if allowed then
  tokens = tokens - cost
else
  local missing = cost - tokens
  retryAfter = math.ceil((missing / refillPerSecond) * 1000.0)
end

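-- Never rewind updatedAt: a lagging clock would otherwise cause over-refill later.
redis.call('HSET', KEYS[1], 'tokens', tokens, 'updatedAt', math.max(now, updatedAt))
redis.call('PEXPIRE', KEYS[1], ttl)

local remaining = math.floor(tokens)
return {
  allowed and 1 or 0,
  remaining,
  retryAfter,
  -- Lua numbers are truncated to integers in replies; return the fraction as a string.
  tostring(tokens)
}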

Time Source

Use application time or Redis server time?

Options:

Time Source	Pros	Cons
Application clock	Simple, no extra Redis command	Clock skew between app instances.
Redis `TIME` inside script	Single authoritative Redis time	Script depends on Redis wall clock; harder to simulate.
Gateway-provided ingress time	Consistent per request path	Requires ingress infrastructure.

For single-region Java services, app clock with NTP discipline is usually acceptable for non-financial rate limiting. For high-stakes shared tenant quotas, prefer server-side time or a single time authority.

Token Bucket Failure Edge

If now moves backward, refill becomes negative unless guarded. Use:

local elapsedMillis = math.max(0, now - updatedAt)

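The guard means a skewed app instance may temporarily under-refill, which is safer than over-refill. The guard alone is not enough, though: if the script stores the older timestamp as updatedAt, the next call from a correctly clocked instance sees an inflated elapsed time and over-refills. Persisting math.max(now, updatedAt) avoids rewinding the stored timestamp.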

12. Leaky Bucket

Leaky bucket shapes output at a steady rate. It is commonly modeled as a queue draining at fixed speed.

For synchronous HTTP APIs, token bucket is usually easier. For background dispatch, leaky bucket can be useful.

Example:

External provider allows 5 calls/sec.
Your system receives 200 webhook events/sec.
You queue events and release 5/sec to provider workers.

Redis implementation options:

sorted set scheduled by eligible time
stream consumer with delayed scheduling
list queue plus worker sleep
token bucket at worker dispatch time

For production, prefer explicit queue/scheduler semantics over pretending a rate limiter is a durable job queue. Part 019 covers Redis work queues and delayed jobs in more depth.

13. Weighted Rate Limits

Not all requests cost the same.

Examples:

Action	Cost
GET `/profile`	1
GET `/search?q=...`	5
POST `/bulk-import`	100
Export CSV	500
AI embedding request	variable by tokens

A production limiter should support cost.

Fixed window:

INCRBY key cost

Token bucket:

if tokens >= cost then tokens = tokens - cost end

Sliding window log is harder for weighted requests because one member represents one request. Options:

Store one member per unit cost. Precise but memory-heavy.
Encode the cost in each member and sum the costs of all members in the window on every check. Expensive.
Use bucketed counters instead of log.
Use token bucket for weighted actions.

Rule of thumb:

Use sorted-set sliding logs for request-count limits. Use token buckets or counters for weighted consumption.

14. Multi-Dimensional Limits

Real systems often need multiple rules at once.

Example:

Rules in force for one tenant:
- 100 requests/minute per API key
- 1,000 requests/minute per tenant
- 20 writes/minute per user
- 10 expensive exports/hour per tenant

The request is allowed only if all relevant rules allow it.

Atomicity Challenge

If you evaluate and mutate four independent Redis keys sequentially:

rule A consumes token and allows
rule B consumes token and allows
rule C rejects

Now A and B have consumed capacity even though the request was rejected.

Solutions:

Approach	Trade-off
Check-only then commit	Race unless implemented carefully.
Reserve all, rollback on reject	Rollback may fail and complicates correctness.
One Lua script with all keys	Atomic but keys must be same slot in Cluster.
Hierarchical coarse-to-fine	Some token waste accepted.
Evaluate most restrictive first	Reduces waste but not perfect.
Use local prefilter + Redis final limiter	Reduces Redis load but adds approximation.

For high-value quota, use one atomic script or a database ledger. For low-risk rate shaping, small token waste may be acceptable.

15. Multi-Rule Token Bucket Pattern

If all keys are in one Redis Cluster slot, one Lua script can evaluate multiple token buckets.

Key idea:

rl:v1:{tenant:acme}:api-key:k123:search:tb
rl:v1:{tenant:acme}:tenant:global:search:tb
rl:v1:{tenant:acme}:user:u42:search:tb

All share hash tag {tenant:acme}.

Pseudo-flow:

for each bucket:
  refill
  if tokens < cost:
    reject without mutating any bucket

for each bucket:
  consume cost
  persist new state

return most restrictive remaining/retryAfter

This avoids partial consumption.
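The two-pass flow is easy to get subtly wrong, so here is an in-memory sketch of it. Assumptions: each bucket has already been refilled, the names MultiBucketSketch and Bucket are illustrative, and the caller serializes access. In Redis that serialization comes from running both passes inside one Lua script.

```java
import java.util.List;

// Check every bucket first, mutate only if all of them can pay.
final class MultiBucketSketch {
    static final class Bucket {
        final String name;
        double tokens;

        Bucket(String name, double tokens) {
            this.name = name;
            this.tokens = tokens;
        }
    }

    private MultiBucketSketch() {}

    /** Consumes cost from every bucket, or from none if any bucket is short. */
    static boolean tryConsumeAll(List<Bucket> buckets, double cost) {
        // Pass 1: read-only check, so a rejection leaves all state untouched.
        for (Bucket bucket : buckets) {
            if (bucket.tokens < cost) {
                return false;
            }
        }
        // Pass 2: every bucket can pay, so commit to all of them.
        for (Bucket bucket : buckets) {
            bucket.tokens -= cost;
        }
        return true;
    }
}
```

With buckets holding 5, 1, and 5 tokens, the first unit-cost request succeeds and leaves 4, 0, and 4. The second is rejected by the middle bucket and the other two keep their 4 tokens.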

However, it concentrates tenant-level keys into one slot. For very large tenants, this may create a hot shard. You may need per-action sharding or a dedicated Redis deployment for rate limiting.

16. HTTP Contract

For HTTP APIs, rate limiting must become a clear client contract.

Status:

429 Too Many Requests

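Useful headers (RateLimit-Reset is shown here as epoch seconds; the IETF RateLimit header drafts express reset as seconds remaining, so document which form your API uses):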

RateLimit-Limit: 100
RateLimit-Remaining: 0
RateLimit-Reset: 1783010460
Retry-After: 12

Response body:

{
  "error": "rate_limit_exceeded",
  "message": "Too many search requests. Please retry later.",
  "ruleId": "search-per-api-key-minute",
  "retryAfterMillis": 12000
}

Rules:

Do not expose internal Redis keys.
Do not expose sensitive tenant configuration.
Include stable error code.
Include retry metadata when safe.
Avoid saying “try again immediately”.
Make client retry behavior explicit.

For authentication endpoints, be careful not to reveal whether an account exists.

Bad:

{"message":"User alice@example.com has exceeded login attempts"}

Better:

{"message":"Too many attempts. Please retry later."}

17. Java API Design

A clean domain-level API prevents Redis details from leaking into controllers.

public interface RateLimiter {
    RateLimitDecision check(RateLimitRequest request);
}

public record RateLimitRequest(
    String ruleId,
    String tenantId,
    String subjectType,
    String subjectHash,
    String action,
    long cost,
    Instant now
) {}

A config object:

public sealed interface RateLimitRule permits FixedWindowRule, SlidingWindowRule, TokenBucketRule {
    String id();
    String action();
    boolean enabled();
}

public record FixedWindowRule(
    String id,
    String action,
    long limit,
    Duration window,
    boolean enabled
) implements RateLimitRule {}

public record TokenBucketRule(
    String id,
    String action,
    long capacity,
    double refillPerSecond,
    Duration idleTtl,
    boolean enabled
) implements RateLimitRule {}

Do not make controllers know about:

Redis scripts
key naming
Redis Cluster hash tags
TTL decisions
serializer details
fallback policy

Controllers should only know:

RateLimitDecision decision = limiter.check(request);
if (!decision.allowed()) {
    throw new TooManyRequestsException(decision);
}

18. Spring Web Filter Example

@Component
public final class RateLimitFilter extends OncePerRequestFilter {
    private final RateLimiter rateLimiter;
    private final SubjectResolver subjectResolver;
    private final RateLimitRuleResolver ruleResolver;

    public RateLimitFilter(
            RateLimiter rateLimiter,
            SubjectResolver subjectResolver,
            RateLimitRuleResolver ruleResolver
    ) {
        this.rateLimiter = rateLimiter;
        this.subjectResolver = subjectResolver;
        this.ruleResolver = ruleResolver;
    }

    @Override
    protected void doFilterInternal(
            HttpServletRequest request,
            HttpServletResponse response,
            FilterChain filterChain
    ) throws ServletException, IOException {
        Optional<RateLimitRule> rule = ruleResolver.resolve(request);
        if (rule.isEmpty()) {
            filterChain.doFilter(request, response);
            return;
        }

        Subject subject = subjectResolver.resolve(request);
        RateLimitRequest check = new RateLimitRequest(
            rule.get().id(),
            subject.tenantId(),
            subject.type(),
            subject.hash(),
            rule.get().action(),
            estimateCost(request),
            Instant.now()
        );

        RateLimitDecision decision = rateLimiter.check(check);
        addHeaders(response, decision);

        if (!decision.allowed()) {
            response.setStatus(429);
            response.setContentType("application/json");
            response.getWriter().write("""
                {"error":"rate_limit_exceeded","message":"Too many requests. Please retry later."}
                """);
            return;
        }

        filterChain.doFilter(request, response);
    }

    private long estimateCost(HttpServletRequest request) {
        return switch (request.getMethod()) {
            case "GET" -> 1L;
            case "POST", "PUT", "PATCH" -> 3L;
            default -> 1L;
        };
    }

    private void addHeaders(HttpServletResponse response, RateLimitDecision decision) {
        response.setHeader("RateLimit-Limit", Long.toString(decision.limit()));
        response.setHeader("RateLimit-Remaining", Long.toString(decision.remaining()));
        response.setHeader("RateLimit-Reset", Long.toString(decision.resetAtEpochMillis() / 1000));
        if (!decision.allowed()) {
            // Round up so clients are never told to retry before the limit resets.
            response.setHeader("Retry-After", Long.toString(Math.max(1, (decision.retryAfterMillis() + 999) / 1000)));
        }
    }
}

In production, SubjectResolver must handle:

authenticated user
anonymous IP
API key
service account
tenant
trusted proxy headers
internal calls
test traffic

Never trust raw X-Forwarded-For unless your proxy boundary is well-defined.

19. Async and Reactive Considerations

Rate limiting sits on the hot path. If your service is reactive, avoid blocking Redis calls.

Bad reactive design:

boolean allowed = blockingRedisLimiter.check(request).allowed();
return allowed ? next.handle(exchange) : reject(exchange);

Better:

public interface ReactiveRateLimiter {
    Mono<RateLimitDecision> check(RateLimitRequest request);
}

But do not use reactive Redis only because it looks modern. Use it when your request path is already non-blocking and you understand backpressure.

Failure risk:

Redis latency increases
pending reactive commands accumulate
application memory grows
timeout settings are too generous
gateway thread starvation shifts to event-loop pressure

Limit Redis command concurrency if needed.

20. Fail-Open vs Fail-Closed

What happens if Redis is unavailable?

Policy	Meaning	Use When
Fail-open	Allow request if limiter unavailable.	User-facing low-risk availability-sensitive paths.
Fail-closed	Reject request if limiter unavailable.	Security-critical or cost-critical paths.
Local fallback	Use in-memory limiter temporarily.	Moderate protection needed during Redis incident.
Degraded mode	Disable expensive features but allow basic request.	Product supports reduced behavior.
Static emergency limit	Apply coarse local cap by instance.	Incident containment.

There is no universal answer.

Examples:

Login brute force limiter unavailable -> fail closed or local fallback.
Public product listing read limiter unavailable -> fail open.
Expensive AI inference quota unavailable -> fail closed or degraded.
Payment submission limiter unavailable -> do not rely only on Redis; use DB/business idempotency.

Make this a rule property:

public enum RateLimitFailurePolicy {
    FAIL_OPEN,
    FAIL_CLOSED,
    LOCAL_FALLBACK,
    DEGRADED
}
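One way to apply such a policy is a thin wrapper around the Redis check. The sketch below is illustrative only: FailurePolicySketch, Outcome, and the nested Policy enum (which mirrors the four values above) are assumed names, and the Redis call is modeled as a BooleanSupplier that throws when Redis is unreachable.

```java
import java.util.function.BooleanSupplier;

// Wraps the authoritative check and applies a per-rule policy when it fails.
final class FailurePolicySketch {
    enum Policy { FAIL_OPEN, FAIL_CLOSED, LOCAL_FALLBACK, DEGRADED }

    /** degraded = request allowed with expensive features off; limiterAvailable = Redis answered. */
    record Outcome(boolean allowed, boolean degraded, boolean limiterAvailable) {}

    private FailurePolicySketch() {}

    static Outcome decide(BooleanSupplier redisCheck, BooleanSupplier localFallback, Policy policy) {
        try {
            return new Outcome(redisCheck.getAsBoolean(), false, true);
        } catch (RuntimeException redisFailure) {
            // Assumption: timeouts and connection errors surface as runtime exceptions.
            return switch (policy) {
                case FAIL_OPEN -> new Outcome(true, false, false);
                case FAIL_CLOSED -> new Outcome(false, false, false);
                case LOCAL_FALLBACK -> new Outcome(localFallback.getAsBoolean(), false, false);
                case DEGRADED -> new Outcome(true, true, false);
            };
        }
    }
}
```

Returning limiterAvailable separately lets the caller emit a metric for every decision made without Redis, which is usually the first signal that a policy is being exercised in production.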

21. Local Prefilter Pattern

At very high traffic, Redis can become the bottleneck. A local prefilter can reduce load.

Pattern:

local in-memory approximate limiter catches obvious excess
Redis global limiter makes authoritative decision for remaining requests

This is useful for:

abusive IPs
bots
high-volume public endpoints
gateway-level protection

But local prefilters are not globally fair. They are a load-shedding optimization, not a quota source of truth.

22. Quota Enforcement

Long-term quota differs from short-term rate limiting.

Example:

Tenant ACME may consume 10,000,000 API units in July 2026.

A Redis counter can track usage:

quota:v1:{tenant:acme}:api-units:2026-07 -> 734928

But quota is often business-critical. Questions:

Does quota affect billing?
Must usage survive Redis loss?
Are corrections/audits needed?
Can admins override quota?
Does quota reset by UTC, tenant timezone, or contract timezone?
Are there grace allowances?
Are quota units eventually reconciled?

For billing-grade quota, Redis should usually be a fast gate plus write-through/async ledger to a durable database.

Redis gives hot-path speed. The durable ledger gives auditability.

23. Quota With Reservation

For expensive async jobs, do not merely count after execution. Reserve quota before work starts.

State model:

AVAILABLE -> RESERVED -> CONSUMED
                     -> RELEASED

Example:

export job estimated cost: 500 units
reserve 500 units before enqueue
consume when job succeeds
release if job is cancelled before execution
adjust if actual cost differs
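The reservation lifecycle can be sketched in memory before deciding where the durable state lives. Everything here is an illustrative assumption: the class QuotaReservationSketch, job IDs as reservation keys, and treating a repeated reserve for the same job as an idempotent success.

```java
import java.util.HashMap;
import java.util.Map;

// Reserve before work starts, then consume the actual cost or release the hold.
final class QuotaReservationSketch {
    private final long quota;
    private long consumed;
    private final Map<String, Long> reservedByJob = new HashMap<>();

    QuotaReservationSketch(long quota) {
        this.quota = quota;
    }

    synchronized boolean reserve(String jobId, long estimatedCost) {
        if (reservedByJob.containsKey(jobId)) {
            return true;                              // retry of the same job holds one reservation
        }
        if (consumed + reservedTotal() + estimatedCost > quota) {
            return false;
        }
        reservedByJob.put(jobId, estimatedCost);
        return true;
    }

    /** Converts a reservation into real usage, adjusting to the actual cost. */
    synchronized void consume(String jobId, long actualCost) {
        if (reservedByJob.remove(jobId) != null) {
            consumed += actualCost;
        }
    }

    synchronized void release(String jobId) {
        reservedByJob.remove(jobId);
    }

    synchronized long available() {
        return quota - consumed - reservedTotal();
    }

    private long reservedTotal() {
        return reservedByJob.values().stream().mapToLong(Long::longValue).sum();
    }
}
```

With a quota of 1,000 units, reserving 500 for one export blocks a second 600-unit export until the first one is consumed or released. If the first job finishes at an actual cost of 450, the remaining 550 units become available again.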

Redis can store reservation state temporarily, but durable job/quota state should live in the database if business-critical.

24. Concurrency Limit Is Not Rate Limit

A concurrency limit controls active work, not rate over time.

Example:

Tenant may run at most 3 active report exports.

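A plain Redis counter is tempting:

INCR active_exports:{tenant}
DECR when done

But if the worker crashes before DECR, its slot is never returned and the counter leaks. A TTL on the whole counter does not fix this, because it would expire every holder's slot at once. You need a lease/semaphore pattern: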

acquire slot with owner token
heartbeat or TTL
release only by owner
recover expired slots
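The four steps can be illustrated with an in-memory sketch. It is not a Redis implementation. The class LeaseSemaphoreSketch is a hypothetical name, and calling acquire again with the same owner token is treated as a heartbeat.

```java
import java.util.HashMap;
import java.util.Map;

// Each slot is a lease: owner token -> expiry time. Crashed owners simply expire.
final class LeaseSemaphoreSketch {
    private final int maxSlots;
    private final long leaseMillis;
    private final Map<String, Long> expiryByOwner = new HashMap<>();

    LeaseSemaphoreSketch(int maxSlots, long leaseMillis) {
        this.maxSlots = maxSlots;
        this.leaseMillis = leaseMillis;
    }

    /** Acquires a slot, or extends the lease if this owner already holds one. */
    synchronized boolean acquire(String ownerToken, long nowMillis) {
        // Recover slots whose owners stopped heartbeating.
        expiryByOwner.values().removeIf(expiry -> expiry <= nowMillis);

        boolean alreadyOwner = expiryByOwner.containsKey(ownerToken);
        if (!alreadyOwner && expiryByOwner.size() >= maxSlots) {
            return false;
        }
        expiryByOwner.put(ownerToken, nowMillis + leaseMillis);
        return true;
    }

    /** Releases only the caller's own slot; unknown tokens release nothing. */
    synchronized boolean release(String ownerToken) {
        return expiryByOwner.remove(ownerToken) != null;
    }

    synchronized int activeSlots(long nowMillis) {
        expiryByOwner.values().removeIf(expiry -> expiry <= nowMillis);
        return expiryByOwner.size();
    }
}
```

A worker that crashes never calls release, but its lease runs out and the slot returns to the pool, which is exactly what the INCR/DECR counter cannot do.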

Part 018 covers locks/leases. Part 019 covers queues/workers.

25. Redis Cluster Considerations

Lua scripts in Redis Cluster can only access keys in the same hash slot.

Bad multi-key script keys:

rl:v1:tenant:acme:user:42:search
rl:v1:tenant:acme:global:search

They may map to different slots.

Better:

rl:v1:{tenant:acme}:user:42:search
rl:v1:{tenant:acme}:global:search

But beware hot tenant concentration. For very large tenants, a single hash tag can overload one shard.
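To see why the hash tag matters, the slot calculation can be reproduced locally. The sketch below follows the rule in the Redis Cluster specification: take CRC16 (XMODEM variant) of the key, or of the text inside the first non-empty {...} when one exists, modulo 16384. The class name ClusterSlotSketch is illustrative, and in practice your client library already performs this calculation.

```java
import java.nio.charset.StandardCharsets;

// Local reproduction of Redis Cluster key-to-slot mapping, for reasoning about key design.
final class ClusterSlotSketch {
    private ClusterSlotSketch() {}

    static int slot(String key) {
        String hashed = key;
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            // Only a non-empty tag counts; "{}" falls back to hashing the whole key.
            if (close > open + 1) {
                hashed = key.substring(open + 1, close);
            }
        }
        return crc16(hashed.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    /** CRC16 with polynomial 0x1021, initial value 0, no reflection (XMODEM). */
    static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int bit = 0; bit < 8; bit++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }
}
```

Both tagged keys above hash only the text tenant:acme, so slot(...) returns the same value for them by construction. The untagged keys are hashed in full, so nothing ties their slots together.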

Strategies:

Strategy	Benefit	Cost
Tenant hash tag	Easy atomic multi-rule scripts	Hot tenant risk.
Action-specific hash tag	Spreads some traffic	Harder tenant-wide atomicity.
Dedicated rate-limit Redis	Isolates limiter load	More infrastructure.
Approximate distributed counters	Better spread	Weaker exactness.
Gateway-level local limit + Redis global	Reduces Redis pressure	Approximation and complexity.

26. Hot Key Mitigation

A global limiter key can become hot.

Example:

rl:v1:{global}:public-search:tb

Every request hits the same key.

Mitigation options:

shard counter into N keys and approximate aggregate
local prefilter per instance
gateway-level limiter
per-tenant/per-api-key limiter instead of global
use Redis Enterprise / managed shard scaling where appropriate
add admission control before Redis for known bad traffic

For exact global rate limiting, a hot key may be unavoidable. Then the real question is whether Redis capacity and latency budget support it.

27. TTL and Memory Cleanup

Rate limiter keys must self-clean.

For fixed windows:

ttl = window + safety_margin

For sliding logs:

ttl = window + inactivity_margin

For token buckets:

ttl = time_to_full_refill + inactivity_margin

Example:

capacity = 100
refill = 10/sec
time_to_full = 10 sec
idle_ttl = 60 sec

Do not keep idle token bucket keys forever.

Memory failure pattern:

You deploy per-IP rate limiting.
Attackers generate millions of random IP-like subjects.
Redis fills with limiter keys.
Eviction begins.
Important cache keys disappear.

Mitigations:

hash or normalize subjects
cap unauthenticated subject cardinality
use local prefilter
shorter TTL for anonymous limits
separate Redis deployment/db for abuse-control data
monitor key cardinality
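One way to normalize subjects is to collapse client addresses before they become keys. The sketch below is an assumption-laden example, not a rule: it keeps IPv4 addresses whole and groups IPv6 addresses by their /64 prefix, on the assumption that a single client can often rotate through many addresses inside one /64. The class name `SubjectNormalizer` and the `ip4:` / `ip6:` output formats are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Illustrative sketch: turns IP literals into bounded-cardinality limiter subjects. */
final class SubjectNormalizer {

    private SubjectNormalizer() {}

    static String normalizeIp(String literal) {
        return literal.indexOf(':') >= 0 ? normalizeIpv6(literal) : normalizeIpv4(literal);
    }

    /** Parsed by hand so a malformed value can never trigger a DNS lookup. */
    private static String normalizeIpv4(String literal) {
        String[] parts = literal.split("\\.", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 literal");
        }
        StringBuilder out = new StringBuilder("ip4:");
        for (int i = 0; i < 4; i++) {
            if (!parts[i].matches("\\d{1,3}") || Integer.parseInt(parts[i]) > 255) {
                throw new IllegalArgumentException("not an IPv4 literal");
            }
            if (i > 0) {
                out.append('.');
            }
            out.append(Integer.parseInt(parts[i]));
        }
        return out.toString();
    }

    private static String normalizeIpv6(String literal) {
        // Only literal characters are accepted, so this is never resolved as a host name.
        if (!literal.matches("[0-9a-fA-F:.]+")) {
            throw new IllegalArgumentException("not an IPv6 literal");
        }
        byte[] bytes;
        try {
            bytes = InetAddress.getByName(literal).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IPv6 literal", e);
        }
        if (bytes.length == 4) { // IPv4-mapped form such as ::ffff:203.0.113.7
            return "ip4:" + (bytes[0] & 0xFF) + "." + (bytes[1] & 0xFF) + "."
                    + (bytes[2] & 0xFF) + "." + (bytes[3] & 0xFF);
        }
        StringBuilder out = new StringBuilder("ip6:");
        for (int i = 0; i < 8; i++) { // keep only the first 64 bits
            out.append(String.format("%02x", bytes[i] & 0xFF));
        }
        return out.append("/64").toString();
    }
}
```

The address passed in should already be the one established at the trusted proxy boundary. Normalizing a spoofable header value only moves the cardinality problem.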

28. Rule Configuration and Rollout

Rate limit rules change often. Treat them as configuration with lifecycle.

Rule fields:

id: search-per-api-key-minute
algorithm: token-bucket
action: search
scope: api-key
capacity: 120
refillPerSecond: 2
costExpression: default
failurePolicy: fail-open
enabled: true
shadowMode: false

Support:

disabled mode
shadow mode
tenant override
plan override
emergency override
gradual rollout
audit trail

Shadow mode is critical. It records whether a request would have been rejected without actually rejecting it.

allowed = true
shadowWouldReject = true

This lets you tune limits before enforcement.
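The mapping from rule flags to the final decision can be sketched in a few lines. The types below (`Rule`, `Decision`, `ShadowMode.apply`) are hypothetical names for illustration. The field names mirror the `enabled` and `shadowMode` rule fields and the `allowed` / `shadowWouldReject` pair shown above.

```java
/** Illustrative sketch: applies disabled and shadow modes to a raw limiter result. */
final class ShadowMode {

    private ShadowMode() {}

    static final class Rule {
        final String id;
        final boolean enabled;
        final boolean shadowMode;

        Rule(String id, boolean enabled, boolean shadowMode) {
            this.id = id;
            this.enabled = enabled;
            this.shadowMode = shadowMode;
        }
    }

    static final class Decision {
        final boolean allowed;
        final boolean shadowWouldReject;

        Decision(boolean allowed, boolean shadowWouldReject) {
            this.allowed = allowed;
            this.shadowWouldReject = shadowWouldReject;
        }
    }

    /** limiterAllowed is the raw answer from the limiter for this rule. */
    static Decision apply(Rule rule, boolean limiterAllowed) {
        if (!rule.enabled) {
            return new Decision(true, false);            // disabled: never rejects, records nothing
        }
        if (rule.shadowMode) {
            return new Decision(true, !limiterAllowed);  // shadow: always allow, record the verdict
        }
        return new Decision(limiterAllowed, false);      // enforcing: the verdict is the decision
    }
}
```

A caller would typically increment `rate_limit_shadow_rejected_total` whenever `shadowWouldReject` is true, which gives the data needed to tune a limit before enforcing it.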

29. Security and Abuse Considerations

Rate limiting often touches security-sensitive flows.

For login:

limit by account identifier
limit by IP or IP prefix
limit by device/session fingerprint where safe
limit by tenant
avoid user enumeration
use exponential backoff when appropriate
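Exponential backoff for login can be expressed as a pure function of the consecutive failure count, which would typically live in a Redis counter with a TTL. The sketch below is one possible shape, with hypothetical names (`LoginBackoff`, `delayFor`) and illustrative parameters. The doubling schedule and the cap are assumptions to tune per product.

```java
import java.time.Duration;

/** Illustrative sketch: delay before the next login attempt is accepted. */
final class LoginBackoff {

    private LoginBackoff() {}

    /**
     * No delay for the first freeAttempts failures, then base, 2*base, 4*base, ...
     * capped at max.
     */
    static Duration delayFor(int consecutiveFailures, int freeAttempts, Duration base, Duration max) {
        if (consecutiveFailures <= freeAttempts) {
            return Duration.ZERO;
        }
        long capMillis = max.toMillis();
        long delayMillis = base.toMillis();
        int doublings = consecutiveFailures - freeAttempts - 1;
        // Stop doubling once the cap is reached, which also avoids overflow.
        for (int i = 0; i < doublings && delayMillis < capMillis; i++) {
            delayMillis *= 2;
        }
        return Duration.ofMillis(Math.min(delayMillis, capMillis));
    }

    public static void main(String[] args) {
        for (int failures = 5; failures <= 10; failures++) {
            System.out.println(failures + " failures -> "
                    + delayFor(failures, 5, Duration.ofSeconds(1), Duration.ofMinutes(15)));
        }
    }
}
```

To avoid user enumeration, the same delay and response shape should apply whether or not the account identifier exists.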

For API keys:

limit by API key hash, not raw key
support revoked/disabled key fast path
separate user and machine limits
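Hashing the API key before it reaches a Redis key name can be sketched with the JDK's `MessageDigest`. The helper names (`SubjectHasher`, `apiKeyLimiterKey`) and the exact key layout are illustrative assumptions that follow the `rl:v1:{tenant:...}` pattern used earlier.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Illustrative sketch: builds limiter keys from hashed API keys. */
final class SubjectHasher {

    private SubjectHasher() {}

    static String sha256Hex(String raw) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(raw.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b & 0xFF));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** The raw API key never appears in the resulting Redis key. */
    static String apiKeyLimiterKey(String tenantId, String rawApiKey, String action) {
        return "rl:v1:{tenant:" + tenantId + "}:api-key:" + sha256Hex(rawApiKey) + ":" + action;
    }
}
```

A plain hash is reasonable for high-entropy secrets such as API keys. For low-entropy subjects such as email addresses, a keyed hash (for example HMAC with a server-side secret) is the safer choice, because an unkeyed hash of a guessable value can be reversed by brute force.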

For public endpoints:

normalize IP through trusted proxy boundary
consider ASN/country/risk signals outside Redis
avoid unlimited cardinality from spoofable headers

For admin endpoints:

lower limits for destructive actions
higher limits for read-only screens
audit rejections

Redis is one layer. Do not replace authentication, authorization, bot detection, WAF, or business validation with Redis rate limiting.

30. Observability

Metrics to emit:

Metric	Labels
`rate_limit_checks_total`	rule, algorithm, outcome
`rate_limit_rejected_total`	rule, subject_type, reason
`rate_limit_shadow_rejected_total`	rule
`rate_limit_redis_latency_ms`	command/script, outcome
`rate_limit_script_errors_total`	script, error_type
`rate_limit_fallback_total`	policy
`rate_limit_near_limit_total`	rule
`rate_limit_remaining`	rule (sampled gauge)

Do not label by raw user ID, API key, or IP in high-cardinality metrics. Use logs or traces for specific subject diagnostics.

Structured log example:

{
  "event": "rate_limit_rejected",
  "ruleId": "search-per-api-key-minute",
  "tenantId": "acme",
  "subjectType": "api-key",
  "subjectHashPrefix": "8f23ab",
  "algorithm": "token-bucket",
  "limit": 120,
  "remaining": 0,
  "retryAfterMillis": 800,
  "redisLatencyMillis": 2
}

Dashboard panels:

rejection rate by rule
allowed vs rejected
Redis latency percentile
Redis errors
fallback activations
top rejecting rules
shadow-mode would-reject
hot key warnings
memory used by limiter key namespace

31. Testing Strategy

Unit Tests

Test pure key and config logic:

key includes tenant scope
subject is hashed
bucket calculation is stable
TTL is derived correctly
cost expression works
fail policy maps correctly

Script Tests

Run Lua scripts against real Redis in Testcontainers.

Test cases:

first request allowed
request at limit allowed
request above limit rejected
TTL is set
retry-after is positive when rejected
weighted cost works
no partial mutation in multi-rule script
script handles missing state
script handles backward time safely

Concurrent Tests

For fixed window:

int threads = 64;
int attempts = 10_000;
long limit = 1_000;

// Run attempts concurrently against same key.
// Assert allowed count <= limit.
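The outline above can be turned into a reusable harness. In the sketch below, the class `ConcurrentLimitHarness` is hypothetical, and `inMemoryFixedWindow` is an in-memory stand-in used only to make the example self-contained. In a real test the `BooleanSupplier` would call the Redis-backed limiter against a Testcontainers instance.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

/** Illustrative sketch: fires many concurrent attempts and counts how many were allowed. */
final class ConcurrentLimitHarness {

    private ConcurrentLimitHarness() {}

    static long countAllowed(BooleanSupplier tryAcquire, int threads, int attempts) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch start = new CountDownLatch(1);
        AtomicLong allowed = new AtomicLong();
        try {
            for (int i = 0; i < attempts; i++) {
                pool.submit(() -> {
                    try {
                        start.await(); // hold workers so they hit the limiter together
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    if (tryAcquire.getAsBoolean()) {
                        allowed.incrementAndGet();
                    }
                });
            }
            start.countDown();
            pool.shutdown();
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("attempts did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
        return allowed.get();
    }

    /** Stand-in for the Redis limiter: atomic increment, then compare with the limit. */
    static BooleanSupplier inMemoryFixedWindow(long limit) {
        AtomicLong counter = new AtomicLong();
        return () -> counter.incrementAndGet() <= limit;
    }

    public static void main(String[] args) {
        long allowed = countAllowed(inMemoryFixedWindow(1_000), 64, 10_000);
        System.out.println("allowed=" + allowed);
    }
}
```

With more attempts than the limit and a single window, an atomic limiter should allow exactly `limit` requests. The `<=` assertion from the outline is the safer form when the run may cross a window boundary.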

For sliding log:

all concurrent accepted members must be unique
accepted count must not exceed limit
old entries are removed

For token bucket:

burst capacity is respected
refill rate works over simulated time
negative elapsed time does not over-refill
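The token bucket cases are easiest to pin down against a small reference model driven by an explicit clock value. The class below is a hypothetical single-threaded model (`TokenBucketModel`), not the Lua script itself. It illustrates the arithmetic the script tests should agree with, including clamping negative elapsed time to zero.

```java
/** Illustrative single-threaded reference model of a token bucket with an injected clock. */
final class TokenBucketModel {

    private final double capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastRefillMillis;

    TokenBucketModel(long capacity, double refillPerSecond, long nowMillis) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;          // a new bucket starts full
        this.lastRefillMillis = nowMillis;
    }

    boolean tryAcquire(long cost, long nowMillis) {
        // Clamp: a clock that moved backward must not produce negative or extra refill.
        long elapsedMillis = Math.max(0L, nowMillis - lastRefillMillis);
        tokens = Math.min(capacity, tokens + elapsedMillis * refillPerSecond / 1000.0);
        // Never move the stored timestamp backward, or the same interval is refilled twice.
        lastRefillMillis = Math.max(lastRefillMillis, nowMillis);
        if (tokens >= cost) {
            tokens -= cost;
            return true;
        }
        return false;
    }

    double availableTokens() {
        return tokens;
    }

    public static void main(String[] args) {
        TokenBucketModel bucket = new TokenBucketModel(10, 1.0, 0L);
        System.out.println(bucket.tryAcquire(10, 0L));     // drains the burst capacity
        System.out.println(bucket.tryAcquire(1, -5_000L)); // backward clock: no refill
        System.out.println(bucket.tryAcquire(3, 3_000L));  // three seconds of refill
    }
}
```

Running the same sequence of `(cost, now)` inputs through the model and through the Lua script gives a direct way to assert that both agree on every decision.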

Boundary Tests

fixed window edge burst
window rollover
TTL expiry
Redis reconnect
script cache miss
Redis timeout
config change during traffic

32. Failure Injection

Inject failures before production.

Failure	Expected Behavior
Redis timeout	Apply rule failure policy.
Redis unavailable	Fail-open/closed/local fallback as configured.
High Redis latency	Requests should time out quickly, not pile up forever.
Script NOSCRIPT	Reload script and retry safely.
Cluster MOVED/ASK	Client handles topology.
App clock jumps backward	No over-refill.
App instance restart	No local-only state needed for global correctness.
Massive new subject cardinality	TTL/memory limits prevent Redis meltdown.

Rate limiter failure should never be surprising during an incident.
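The timeout and unavailable rows can be exercised without Redis by wrapping the limiter call in a guard that applies the rule's failure policy. The sketch below uses hypothetical names (`GuardedLimiter`, `FailurePolicy`, `decide`) and bounds the wait with `CompletableFuture`. In production code the Redis client's own command timeout would normally do this job. The wrapper is only an illustration that is easy to inject failures into.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Illustrative sketch: bounds limiter latency and applies an explicit failure policy. */
final class GuardedLimiter {

    enum FailurePolicy { FAIL_OPEN, FAIL_CLOSED }

    private GuardedLimiter() {}

    /**
     * redisCheck returns true when the request is allowed. Any timeout or error
     * is resolved by the policy instead of propagating to the caller.
     */
    static boolean decide(Supplier<Boolean> redisCheck, FailurePolicy policy, long timeoutMillis) {
        CompletableFuture<Boolean> future = CompletableFuture.supplyAsync(redisCheck);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            future.cancel(true);
            return policy == FailurePolicy.FAIL_OPEN;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return policy == FailurePolicy.FAIL_OPEN;
        }
    }

    public static void main(String[] args) {
        Supplier<Boolean> redisDown = () -> {
            throw new IllegalStateException("simulated connection failure");
        };
        System.out.println(decide(redisDown, FailurePolicy.FAIL_OPEN, 50));
        System.out.println(decide(redisDown, FailurePolicy.FAIL_CLOSED, 50));
    }
}
```

Each fallback should also increment `rate_limit_fallback_total` so that fail-open periods are visible on the dashboard.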

33. Common Anti-Patterns

Anti-pattern 1 — One Limit for Everything

100 requests/minute/user

This ignores action cost. A profile lookup and a bulk export are not equivalent.

Anti-pattern 2 — Non-Atomic Check Then Increment

GET count
if count < limit:
  INCR count

Concurrent requests can all observe a below-limit count and then all increment, exceeding the allowance. Use a Lua script or an atomic primitive instead.
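The race is easy to reproduce deterministically. The sketch below is a plain-Java illustration with hypothetical names. `racyInterleaving` replays one bad ordering of two GET-then-INCR requests, and `incrementThenCheck` models the atomic alternative, relying on the fact that Redis `INCR` returns the new value.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative sketch: why check-then-increment overshoots and increment-then-check does not. */
final class CheckThenIncrementRace {

    private CheckThenIncrementRace() {}

    /** Replays one interleaving: both requests read before either one writes. */
    static long racyInterleaving(long startCount, long limit) {
        long count = startCount;
        long seenByA = count;          // GET by request A
        long seenByB = count;          // GET by request B, before A's INCR lands
        if (seenByA < limit) {
            count++;                   // INCR by A
        }
        if (seenByB < limit) {
            count++;                   // INCR by B, based on a stale read
        }
        return count;
    }

    /** Atomic form: increment first, then decide from the value the increment returned. */
    static boolean incrementThenCheck(AtomicLong counter, long limit) {
        return counter.incrementAndGet() <= limit;
    }

    public static void main(String[] args) {
        System.out.println("racy final count: " + racyInterleaving(9, 10));
        AtomicLong counter = new AtomicLong(9);
        System.out.println(incrementThenCheck(counter, 10));
        System.out.println(incrementThenCheck(counter, 10));
    }
}
```

One consequence of the atomic form is that rejected requests still bump the counter. That is harmless for a fixed window that expires, but it matters if the same counter is later read as usage.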

Anti-pattern 3 — No Expiry

Limiter keys accumulate forever. This becomes a memory incident.

Anti-pattern 4 — Raw PII in Keys

Keys are visible in Redis inspection, dumps, logs, and metrics. Hash sensitive subjects.

Anti-pattern 5 — Treating Redis Quota as Billing Truth

Redis can help enforce hot-path quota. Billing-grade usage needs a durable ledger and reconciliation.

Anti-pattern 6 — Boundary-Blind Fixed Window

A fixed window can admit up to 2x the limit in a short span that straddles a window boundary. That may or may not be acceptable. Document it.

Anti-pattern 7 — No Failure Policy

If Redis is down, the app behavior becomes accidental. Make it explicit per rule.

Anti-pattern 8 — High-Cardinality Metrics

Labeling metrics with user ID or IP can break the metrics backend.

34. Pattern Catalog

Pattern A — Simple Endpoint Protection

Use fixed window.

GET /public/products
Limit: 600/minute/IP
Failure: fail open

Good enough because boundary bursts are tolerable.

Pattern B — Login Attempt Protection

Use sliding window log or strict counter + progressive backoff.

POST /login
Limit: 5/10 minutes/account + 20/10 minutes/IP
Failure: fail closed or local fallback

Avoid user enumeration.

Pattern C — API Gateway Tenant Limit

Use token bucket.

Tenant ACME Pro Plan
Capacity: 1,000
Refill: 100/sec
Failure: fail open for reads, fail closed for expensive writes

Allows bursts but controls average.

Pattern D — Expensive Async Export

Use quota reservation + concurrency lease.

Max active exports: 3
Daily export units: 10,000

Redis can help, but durable DB state should own business truth.

Pattern E — Webhook Provider Protection

Use token bucket at dispatch worker.

External provider rate: 10/sec
Burst: 50

Rejecting an incoming webhook may be wrong; queue it and shape outbound calls.

35. Production Checklist

Before shipping a Redis rate limiter:

36. Decision Framework

Ask these questions in order:

Is this a short-term rate, long-term quota, burst policy, or concurrency limit?
Is exactness required or is approximation acceptable?
Is burst allowed?
Is the request cost always 1 or variable?
What is the maximum subject cardinality?
What is the memory budget?
What happens when Redis is unavailable?
Does this affect billing, legal entitlement, or security?
Does the limiter need to be globally consistent across instances?
Will multiple rules be checked atomically?

Mapping:

Answer	Likely Choice
Simple, low-risk, count-based	Fixed window.
Security-sensitive low volume	Sliding window log.
High-QPS smooth API shaping	Token bucket.
Weighted cost	Token bucket or counter quota.
Billing-grade quota	Redis fast gate + durable ledger.
In-flight limit	Lease/semaphore, not rate limiter.
Multi-rule exactness	Same-slot Lua script or durable transaction.

37. Practice Exercises

Exercise 1 — Fixed Window

Implement a fixed-window limiter with Lettuce and Lua.

Requirements:

cost supported
TTL is set atomically
returns allowed, remaining, retryAfterMillis
tested with 64 concurrent threads

Exercise 2 — Sliding Log

Implement a sorted-set sliding window limiter.

Requirements:

unique member IDs
old entries are trimmed
rejected response includes retry-after
memory usage is estimated for 1 million active subjects

Exercise 3 — Token Bucket

Implement a token bucket limiter.

Requirements:

capacity and refill rate configurable
weighted request supported
clock skew does not over-refill
idle bucket expires

Exercise 4 — Multi-Rule Evaluation

Design a limiter for:

100 req/min/api-key
1,000 req/min/tenant
10 export/hour/user

Explain whether your design is atomic, approximate, or intentionally wasteful.

38. Part Summary

Redis rate limiting is not a single pattern. It is a family of admission-control patterns.

Key points:

Fixed window is cheap but boundary-unfair.
Sliding window log is precise but memory-heavy.
Sliding window counter is a useful approximation.
Token bucket is strong for burst-friendly throughput shaping.
Weighted limits require explicit cost modeling.
Long-term quota often needs durable ledger reconciliation.
Multi-rule atomicity is harder than single-key examples show.
Redis Cluster key design matters for Lua scripts.
Failure policy must be explicit.
Observability must include rejections, Redis latency, fallback, and near-limit signals.

A senior engineer does not ask, “Which rate limiter is best?” A senior engineer asks:

What invariant am I enforcing, what error can I tolerate, and how will the system behave when Redis, traffic, or time behaves badly?

Next, Part 018 covers distributed coordination: locks, leases, fencing tokens, and the Redlock debate.

References

Redis Docs — Rate limiter use case: https://redis.io/docs/latest/develop/use-cases/rate-limiter/
Redis Tutorial — Build 5 Rate Limiters with Redis: https://redis.io/tutorials/howtos/ratelimiting/
Redis Docs — Strings and INCR: https://redis.io/docs/latest/develop/data-types/strings/
Redis Docs — Sorted Sets: https://redis.io/docs/latest/develop/data-types/sorted-sets/
Redis Docs — EXPIRE: https://redis.io/docs/latest/commands/expire/
Redis Docs — SET: https://redis.io/docs/latest/commands/set/
Redis Docs — Programmability / Lua: https://redis.io/docs/latest/develop/programmability/eval-intro/

