Sub-Millisecond Rate Limiting with Redis Lua Scripts
Building a rate limiter for a single Node.js server is easy. You can just keep an in-memory dictionary mapping IP addresses to request counts. But what happens when you deploy behind a load balancer with 10 instances of your API running concurrently?
Suddenly, that local memory is useless. Instance A doesn't know how many requests Instance B has processed. To solve this, you need a centralized datastore. The industry standard for this is Redis.
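To make the problem concrete, here is a minimal sketch in Python that simulates the per-instance dictionary approach. The quota of 100, the round-robin load balancer, and the names `LocalLimiter` and `simulate` are assumptions made for illustration, not part of any real deployment.

```python
from collections import defaultdict

LIMIT = 100  # per-client quota, assumed for illustration


class LocalLimiter:
    """Rate limiter that only sees the requests its own instance handled."""

    def __init__(self):
        self.counts = defaultdict(int)

    def allow(self, client_ip):
        if self.counts[client_ip] >= LIMIT:
            return False
        self.counts[client_ip] += 1
        return True


def simulate(instances, requests, client_ip="203.0.113.7"):
    """Round-robin `requests` across `instances`; return how many were allowed."""
    limiters = [LocalLimiter() for _ in range(instances)]
    return sum(limiters[i % instances].allow(client_ip) for i in range(requests))


allowed_single = simulate(instances=1, requests=2000)
allowed_fleet = simulate(instances=10, requests=2000)
print(allowed_single, allowed_fleet)
```

With one instance the client is held to the quota. With ten instances each one enforces the quota independently, so the same client gets through roughly ten times as often.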
The Distributed Race Condition
A naive implementation using Redis looks like this:
- API reads the current token count for User X from Redis (GET user_x_tokens).
- If tokens > 0, API decrements the count locally.
- API writes the new count back to Redis (SET user_x_tokens).
This introduces a critical flaw: a race condition. If two concurrent requests arrive at the same time, both read "100 tokens", both decrement to 99, and both write "99" back to Redis. You've just processed two requests but deducted only one token. Under high load these lost updates pile up, and your rate limiter admits far more traffic than it should.
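The lost update can be reproduced deterministically. The sketch below is a simulation, not real Redis traffic: a plain dictionary stands in for Redis, and each `yield` marks a point where another request could run between the GET and the SET. The name `naive_request` is hypothetical.

```python
store = {"user_x_tokens": 100}  # stand-in for Redis, assumed for illustration


def naive_request():
    """Naive read-modify-write flow; each `yield` is a point where
    another request may run (a network round trip to Redis)."""
    tokens = store["user_x_tokens"]  # GET user_x_tokens
    yield
    if tokens > 0:
        store["user_x_tokens"] = tokens - 1  # SET user_x_tokens
    yield


a, b = naive_request(), naive_request()
next(a)  # request A reads 100
next(b)  # request B reads 100 before A has written
next(a)  # A writes 99
next(b)  # B also writes 99, overwriting A's update

processed = 2
deducted = 100 - store["user_x_tokens"]
print(processed, deducted)
```

Two requests were served, yet the stored count dropped by only one.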
The Solution: Redis Lua Scripts
Redis has a superpower: it executes commands on a single thread, and it can run embedded Lua scripts atomically. While a Lua script is running, Redis processes no other command on that instance, so no other client can observe or modify the data halfway through. This gives us atomicity without the overhead of complex distributed locks.
By moving the read-decrement-write logic entirely inside Redis, we eliminate race conditions.
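local tokens_key = KEYS[1]
local capacity = tonumber(ARGV[1])
local tokens = tonumber(redis.call("get", tokens_key))
if not tokens then
  -- First request for this key: start from a full bucket.
  -- Without this, "decr" would create the key at -1 and block the client forever.
  redis.call("set", tokens_key, capacity)
  tokens = capacity
end
if tokens <= 0 then
  return -1 -- Blocked
end
redis.call("decr", tokens_key)
return 1 -- Allowed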
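To see why running the whole check as one uninterruptible unit matters, here is a rough Python simulation. It does not talk to Redis: `FakeRedis` is a hypothetical in-process stand-in whose lock plays the role of the single command thread, and `take_token` is a Python rendering of the read-decrement-write logic, assuming a missing key means a full bucket.

```python
import threading


class FakeRedis:
    """In-process stand-in for Redis: one lock plays the role of the single
    command thread, so a 'script' runs with nothing interleaved."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def run_script(self, script, key, *args):
        with self._lock:
            return script(self._data, key, *args)


def take_token(data, key, capacity):
    """Read, check and decrement in one step."""
    tokens = data.get(key, capacity)  # missing key: assume a full bucket
    if tokens <= 0:
        return -1  # Blocked
    data[key] = tokens - 1
    return 1  # Allowed


fake = FakeRedis()
results = []  # list.append is thread-safe in CPython


def worker():
    for _ in range(50):
        results.append(fake.run_script(take_token, "user_x_tokens", 100))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

allowed = results.count(1)
blocked = results.count(-1)
print(allowed, blocked)
```

Eight threads issue 400 requests against a capacity of 100. Because every check runs under the lock, exactly 100 are allowed regardless of how the threads interleave.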
Performance Overhead
Is blocking the Redis thread bad for performance? Not if the script is fast. Because Lua scripts execute entirely in memory within Redis, they typically run in microseconds.
At LimitYourAPI, our entire rate-limiting decision pipeline averages less than 15 milliseconds globally, measured from the moment the request hits our Go backend, through the round trip to Upstash Redis and the Lua execution, to the response returned to the client. The actual Lua execution inside Redis takes less than 0.5ms.
Advanced Pattern: Sliding Window Log with ZSET
While the Token Bucket is great for simple rate limiting, some high-precision billing use cases require a Sliding Window Log. Because it counts the requests in the trailing window rather than per calendar minute, a user cannot spend their entire 100-request quota in the last second of one minute and another 100 in the first second of the next.
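The boundary burst described above is easy to reproduce. The sketch below assumes, purely for illustration, a baseline limiter that keeps one counter per calendar minute; the names `fixed_window_allow`, `LIMIT` and `WINDOW_MS` are hypothetical.

```python
from collections import defaultdict

LIMIT = 100         # requests per window
WINDOW_MS = 60_000  # one minute

counters = defaultdict(int)  # one counter per calendar minute


def fixed_window_allow(now_ms):
    window_id = now_ms // WINDOW_MS
    if counters[window_id] >= LIMIT:
        return False
    counters[window_id] += 1
    return True


# 100 requests in the last second of minute 0, then 100 more in the
# first second of minute 1 (one request every 10 ms).
burst = [59_000 + i * 10 for i in range(100)] + [60_000 + i * 10 for i in range(100)]
allowed = sum(fixed_window_allow(t) for t in burst)
span_ms = burst[-1] - burst[0]
print(allowed, span_ms)
```

All 200 requests pass within about two seconds, twice the intended per-minute quota. This is the gap a sliding window closes.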
In Lua, we can implement this efficiently using a Redis Sorted Set (ZSET):
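-- KEYS[1]: Sorted set holding one entry per accepted request
-- ARGV[1]: Current Timestamp (ms)
-- ARGV[2]: Window Size (ms)
-- ARGV[3]: Max Requests
-- ARGV[4]: Unique request ID (an assumed extra argument supplied by the caller)
local window_start = tonumber(ARGV[1]) - tonumber(ARGV[2])
-- 1. Remove old requests outside the window
redis.call("ZREMRANGEBYSCORE", KEYS[1], "-inf", window_start)
-- 2. Count remaining requests
local count = redis.call("ZCARD", KEYS[1])
if count < tonumber(ARGV[3]) then
  -- 3. Add current request and allow. The member must be unique: using the
  --    timestamp as the member lets two requests in the same millisecond
  --    overwrite each other and be counted once.
  redis.call("ZADD", KEYS[1], ARGV[1], ARGV[4])
  -- 4. Let idle keys expire once the window has passed
  redis.call("PEXPIRE", KEYS[1], ARGV[2])
  return 1
else
  return -1 -- Rate limited
end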
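For readers who want to step through the logic without a Redis server, the same three steps can be sketched in plain Python. This is an illustrative model only: a sorted list stands in for the ZSET, timestamps are assumed to arrive in non-decreasing order, and the class name `SlidingWindowLog` is hypothetical.

```python
from bisect import bisect_right


class SlidingWindowLog:
    def __init__(self, window_ms, max_requests):
        self.window_ms = window_ms
        self.max_requests = max_requests
        self.log = []  # accepted timestamps, ascending (plays the role of the ZSET)

    def allow(self, now_ms):
        window_start = now_ms - self.window_ms
        # 1. Remove entries at or before window_start (inclusive, like
        #    ZREMRANGEBYSCORE key -inf window_start)
        del self.log[:bisect_right(self.log, window_start)]
        # 2. Count remaining requests
        if len(self.log) < self.max_requests:
            # 3. Record the current request and allow it
            self.log.append(now_ms)
            return 1
        return -1  # Rate limited


limiter = SlidingWindowLog(window_ms=60_000, max_requests=100)

# 100 requests in the last second of minute 0, then 100 in the first second of minute 1
first = [limiter.allow(59_000 + i * 10) for i in range(100)]
second = [limiter.allow(60_000 + i * 10) for i in range(100)]

just_too_early = limiter.allow(118_999)  # oldest entry (59_000) is still in the window
after_expiry = limiter.allow(119_000)    # oldest entry has now aged out
print(first.count(1), second.count(1), just_too_early, after_expiry)
```

The second burst is rejected in full, and capacity only returns as the individual entries from the first burst leave the 60-second window.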
Cost-Aware Limiting for AI & LLMs
In the age of LLMs, not all requests are created equal. A request that generates 5 tokens is significantly cheaper than one that generates 2,000. At LimitYourAPI, we support Cost-Aware Limiting.
Instead of deducting 1 token per request, our Lua script can accept a cost argument. This allows you to rate limit based on actual resource consumption or dollar value:
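-- KEYS[1]: Token bucket key
-- ARGV[1]: Cost of this request in tokens (DECRBY requires an integer)
local cost = tonumber(ARGV[1])
local current_tokens = redis.call("GET", KEYS[1])
if current_tokens and tonumber(current_tokens) >= cost then
  redis.call("DECRBY", KEYS[1], cost)
  return 1
else
  return -1
end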
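A short worked example may help show how the cost argument changes the outcome. The following Python sketch mirrors the comparison and deduction in memory. The class name `CostAwareBucket`, the starting balance of 2,500 tokens and the request costs are all assumptions chosen for illustration.

```python
class CostAwareBucket:
    """In-memory model of a bucket that deducts a per-request cost."""

    def __init__(self, tokens):
        self.tokens = tokens

    def consume(self, cost):
        if self.tokens >= cost:
            self.tokens -= cost  # corresponds to DECRBY key cost
            return 1
        return -1


bucket = CostAwareBucket(tokens=2_500)

# A cheap request, an expensive one, a second expensive one, then a mid-sized one
results = [bucket.consume(cost) for cost in (5, 2_000, 2_000, 400)]
print(results, bucket.tokens)
```

The second 2,000-token request is rejected because only 495 tokens remain, while the smaller 400-token request that follows still fits. A rejected request deducts nothing.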
Latency Benchmarks
We benchmarked several distributed rate limiting strategies to see how they compared to our Redis+Lua approach:
- PostgreSQL (ACID Transactions): 85ms - 120ms (Too slow for high-volume APIs)
- Redis (GET/SET via Middleware): 12ms - 18ms (Subject to race conditions)
- Redis+Lua (LimitYourAPI): 0.8ms - 2.5ms (Atomic and deterministic)
Conclusion
If you are building a distributed system and need rate limiting, do not rely on local memory or non-atomic database operations. Leverage Redis and Lua for guaranteed consistency and blazing fast performance.
Don't want to build and manage this infrastructure yourself? That's exactly why we built LimitYourAPI. We handle the Redis clusters, the Lua scripts, and the global edge routing, so you can just drop in our SDK and get back to building your product.