Architecture

Token Bucket vs Sliding Window Rate Limiting

By Yash · Published April 2026

When you implement rate limiting, the first technical decision you have to make is choosing the underlying mathematical algorithm. The algorithm you choose dictates how strictly traffic is shaped, how memory-intensive the operation is, and how bursts of traffic are handled.

The two most dominant algorithms in modern distributed systems are the Token Bucket and the Sliding Window. Let's explore how they work, their pros and cons, and when to use which.

The Token Bucket Algorithm

Imagine a physical bucket that holds a maximum number of tokens. Every time an API request is made, a token is removed from the bucket. If the bucket is empty, the request is rejected. Concurrently, a separate process adds new tokens to the bucket at a constant, fixed rate.

How it works technically:

// Token Bucket Logic
tokens = min(capacity, current_tokens + (time_passed * refill_rate))
if tokens >= 1 {
  tokens--
  return ALLOWED
} else {
  return BLOCKED
}

Pros of Token Bucket:

The Sliding Window Algorithm

Instead of abstract tokens, the Sliding Window algorithm keeps an exact log of timestamps for every request made by a user. When a new request comes in, the algorithm looks back exactly N seconds (the window) and counts how many requests occurred in that exact timeframe.

How it works technically:

If the limit is 100 requests per minute, the system looks at the exact timestamps of requests made in the last 60 seconds rolling window. If the count is < 100, the request is allowed and its timestamp is logged. If it's >= 100, it's blocked.

Pros of Sliding Window:

The Leaky Bucket Algorithm: Traffic Shaping

While often confused with the Token Bucket, the Leaky Bucket is fundamentally different. Imagine a bucket with a small hole at the bottom. Water (requests) can be poured into the bucket at any rate, but it only leaks out (gets processed) at a constant, fixed rate. If the bucket overflows, the requests are dropped.

Key Difference:

Unlike the Token Bucket, which allows for bursts, the Leaky Bucket enforces a strictly uniform output rate. This is ideal for traffic shaping where you want to protect a backend service that can only handle exactly X transactions per second without any deviation.

The Mathematical Model

For a Token Bucket, the maximum burst size \( B \) is equal to the bucket capacity. The average rate \( R \) is the refill rate. The maximum number of requests a user can make in time \( T \) is:

Total Requests = B + (R * T)

Comparison Matrix: Choosing Your Weapon

Feature Token Bucket Sliding Window Leaky Bucket
Memory O(1) - Low O(N) - High O(1) - Low
Bursting Allowed Strict Zero Burst
Accuracy High Absolute High
Use Case Public APIs Critical Quotas Traffic Shaping

Which Should You Choose?

Use Token Bucket when: You are protecting public APIs, external developer endpoints, or any service where you want to allow legitimate bursts of traffic while maintaining a steady long-term rate. (This is what AWS, Stripe, and LimitYourAPI use by default).

Use Sliding Window when: You are protecting highly sensitive, resource-intensive endpoints (like an expensive AI model inference route) where absolute strictness is required and memory cost is not an issue.

At LimitYourAPI, we provide both algorithms out of the box, implemented natively in Redis Lua scripts for sub-millisecond execution regardless of which one you choose.


Try both algorithms for free