Architecture

Token Bucket vs Sliding Window Rate Limiting

By Yash · Published April 2026

When you implement rate limiting, the first technical decision you have to make is choosing the underlying mathematical algorithm. The algorithm you choose dictates how strictly traffic is shaped, how memory-intensive the operation is, and how bursts of traffic are handled.

The two most dominant algorithms in modern distributed systems are the Token Bucket and the Sliding Window. Let's explore how they work, their pros and cons, and when to use which.

The Token Bucket Algorithm

Imagine a physical bucket that holds a maximum number of tokens. Every time an API request is made, a token is removed from the bucket. If the bucket is empty, the request is rejected. Concurrently, a separate process adds new tokens to the bucket at a constant, fixed rate.

How it works technically:

Capacity (Burst): The maximum number of tokens the bucket can hold.
Refill Rate: The rate at which new tokens are added (e.g., 10 tokens per second).

// Token Bucket Logic
tokens = min(capacity, current_tokens + (time_passed * refill_rate))
if tokens >= 1 {
  tokens--
  return ALLOWED
} else {
  return BLOCKED
}

Pros of Token Bucket:

Allows Bursts: If a user hasn't made requests in a while, their bucket fills up. They can then make a sudden burst of requests up to the bucket capacity without being blocked.
Memory Efficient: You only need to store two variables per user: the current token count and the timestamp of the last request.

The Sliding Window Algorithm

Instead of abstract tokens, the Sliding Window algorithm keeps an exact log of timestamps for every request made by a user. When a new request comes in, the algorithm looks back exactly N seconds (the window) and counts how many requests occurred in that exact timeframe.

How it works technically:

If the limit is 100 requests per minute, the system looks at the exact timestamps of requests made in the last 60 seconds rolling window. If the count is < 100, the request is allowed and its timestamp is logged. If it's >= 100, it's blocked.

Pros of Sliding Window:

Perfect Accuracy: It enforces limits perfectly. There are no edge cases where a user can double their quota by making requests right at the boundary of a reset window (a flaw in Fixed Window algorithms).
Smooth Traffic: It completely smooths out traffic spikes, ensuring that the backend is never hit with sudden bursts.

The Leaky Bucket Algorithm: Traffic Shaping

While often confused with the Token Bucket, the Leaky Bucket is fundamentally different. Imagine a bucket with a small hole at the bottom. Water (requests) can be poured into the bucket at any rate, but it only leaks out (gets processed) at a constant, fixed rate. If the bucket overflows, the requests are dropped.

Key Difference:

Unlike the Token Bucket, which allows for bursts, the Leaky Bucket enforces a strictly uniform output rate. This is ideal for traffic shaping where you want to protect a backend service that can only handle exactly X transactions per second without any deviation.

The Mathematical Model

For a Token Bucket, the maximum burst size \( B \) is equal to the bucket capacity. The average rate \( R \) is the refill rate. The maximum number of requests a user can make in time \( T \) is:

Total Requests = B + (R * T)

Comparison Matrix: Choosing Your Weapon

Feature Token Bucket Sliding Window Leaky Bucket

Memory O(1) - Low O(N) - High O(1) - Low

Bursting Allowed Strict Zero Burst

Accuracy High Absolute High

Use Case Public APIs Critical Quotas Traffic Shaping

Which Should You Choose?

Use Token Bucket when: You are protecting public APIs, external developer endpoints, or any service where you want to allow legitimate bursts of traffic while maintaining a steady long-term rate. (This is what AWS, Stripe, and LimitYourAPI use by default).

Use Sliding Window when: You are protecting highly sensitive, resource-intensive endpoints (like an expensive AI model inference route) where absolute strictness is required and memory cost is not an issue.

At LimitYourAPI, we provide both algorithms out of the box, implemented natively in Redis Lua scripts for sub-millisecond execution regardless of which one you choose.

Try both algorithms for free