Best Rate Limiting Architectures for APIs
When your API serves thousands of requests per second, naive rate limiting implementations become bottlenecks that either block legitimate traffic or fail to prevent abuse. The difference between token bucket, leaky bucket, fixed window, and sliding window algorithms determines whether your rate limiter accurately enforces limits across distributed systems or creates false rejections during traffic spikes.
This guide explains how to architect rate limiters that scale to millions of requests while providing consistent enforcement across multiple API servers. You'll learn which algorithms suit which use cases, how to implement distributed rate limiting with Redis without race conditions, and the specific tradeoffs between accuracy, performance, and memory consumption.
We cover per-user rate limits, per-endpoint limits, global system limits, and how to combine multiple limit types. You'll understand why sliding windows prevent burst abuse that fixed windows allow, why token buckets tolerate bursts while leaky buckets smooth traffic to a constant rate, and when simple algorithms outperform sophisticated ones.
Rate Limiting Algorithms
Five primary algorithms dominate production rate limiting implementations. Each has distinct characteristics that make it suitable for specific scenarios.
Fixed Window Counter
Fixed window counting divides time into fixed intervals (windows) and counts requests within each window. When a window expires, the counter resets. This is the simplest rate limiting algorithm.
class FixedWindowRateLimiter {
constructor(maxRequests, windowSeconds) {
this.maxRequests = maxRequests;
this.windowSeconds = windowSeconds;
this.windows = new Map(); // userId -> { count, windowStart }
}
async allowRequest(userId) {
const now = Date.now();
const currentWindow = Math.floor(now / (this.windowSeconds * 1000));
const userWindow = this.windows.get(userId) || {
count: 0,
windowStart: currentWindow
};
// Check if we're in a new window
if (userWindow.windowStart < currentWindow) {
userWindow.count = 0;
userWindow.windowStart = currentWindow;
}
// Check if request exceeds limit
if (userWindow.count >= this.maxRequests) {
const resetTime = (currentWindow + 1) * this.windowSeconds * 1000;
return {
allowed: false,
resetAt: new Date(resetTime)
};
}
userWindow.count++;
this.windows.set(userId, userWindow);
return {
allowed: true,
remaining: this.maxRequests - userWindow.count
};
}
}
// Usage: Allow 100 requests per 60 seconds
const limiter = new FixedWindowRateLimiter(100, 60);
Fixed windows are memory efficient and simple to implement, but they suffer from boundary problems. A user can make 100 requests at 12:59:59, then immediately make another 100 requests at 13:00:00, effectively getting 200 requests in one second despite a "100 requests per minute" limit.
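The boundary problem is easy to reproduce with a stripped-down counter that takes the current time as an argument. This is an illustrative sketch rather than the class above: `makeFixedWindow`, `burst`, and the injected `nowMs` clock are assumed names chosen so the timing can be controlled.

```javascript
// Minimal fixed-window counter for one client with an injected clock
// (hypothetical helper, not part of the class above).
function makeFixedWindow(maxRequests, windowMs) {
  let windowId = -1;
  let count = 0;
  return function allow(nowMs) {
    const id = Math.floor(nowMs / windowMs);
    if (id !== windowId) {
      windowId = id; // a new window starts, so the counter resets
      count = 0;
    }
    if (count >= maxRequests) return false;
    count++;
    return true;
  };
}

// Send `n` requests at the same instant and count how many are accepted.
function burst(allow, nowMs, n) {
  let accepted = 0;
  for (let i = 0; i < n; i++) {
    if (allow(nowMs)) accepted++;
  }
  return accepted;
}

const allow = makeFixedWindow(100, 60000);
const lateInWindow = burst(allow, 59000, 150); // one second before the boundary
const earlyInNext = burst(allow, 60000, 150);  // first instant of the next window
console.log(lateInWindow + earlyInNext); // 200 accepted within one second
```

Each window on its own respects the limit of 100, yet the two bursts together pass twice the nominal rate across the boundary.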
Sliding Window Log
Sliding window log stores a timestamp for each request and counts only requests within the sliding time window. This eliminates the boundary problem but requires more memory to store individual timestamps.
class SlidingWindowLogRateLimiter {
constructor(maxRequests, windowSeconds) {
this.maxRequests = maxRequests;
this.windowMs = windowSeconds * 1000;
this.requestLogs = new Map(); // userId -> [timestamps]
}
async allowRequest(userId) {
const now = Date.now();
const windowStart = now - this.windowMs;
// Get user's request log
let userLog = this.requestLogs.get(userId) || [];
// Remove requests outside the sliding window
userLog = userLog.filter(timestamp => timestamp > windowStart);
// Check if request exceeds limit
if (userLog.length >= this.maxRequests) {
const oldestRequest = userLog[0];
const resetTime = oldestRequest + this.windowMs;
return {
allowed: false,
resetAt: new Date(resetTime),
retryAfter: Math.ceil((resetTime - now) / 1000)
};
}
// Add current request timestamp
userLog.push(now);
this.requestLogs.set(userId, userLog);
return {
allowed: true,
remaining: this.maxRequests - userLog.length
};
}
// Clean up old logs to prevent memory leaks
cleanup() {
const now = Date.now();
const windowStart = now - this.windowMs;
for (const [userId, log] of this.requestLogs.entries()) {
const filteredLog = log.filter(ts => ts > windowStart);
if (filteredLog.length === 0) {
this.requestLogs.delete(userId);
} else {
this.requestLogs.set(userId, filteredLog);
}
}
}
}
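// Periodic cleanup (assumes a SlidingWindowLogRateLimiter instance)
const logLimiter = new SlidingWindowLogRateLimiter(100, 60);
setInterval(() => logLimiter.cleanup(), 60000);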
Sliding window log provides accurate rate limiting but consumes memory proportional to the request rate. For 100 requests per minute across 10,000 users, this stores up to 1 million timestamps in memory.
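The arithmetic behind that figure can be sketched as a back-of-envelope estimate. The 8 bytes per value is an assumption (one 64-bit number) and ignores Map and array overhead, so real usage will be higher; `estimateBytes` is a hypothetical helper.

```javascript
// Rough payload size: users x values stored per user x bytes per value.
// bytesPerValue = 8 is an assumption and excludes container overhead.
function estimateBytes(users, valuesPerUser, bytesPerValue = 8) {
  return users * valuesPerUser * bytesPerValue;
}

// Sliding window log with every user at the 100-request limit.
const logBytes = estimateBytes(10000, 100);
// A counter-based limiter that keeps three numbers per user, for comparison.
const counterBytes = estimateBytes(10000, 3);

console.log(logBytes);     // 8000000 (about 8 MB of raw timestamps)
console.log(counterBytes); // 240000
```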
Sliding Window Counter
Sliding window counter approximates the accuracy of the sliding window log while keeping the constant per-user memory of a fixed window. It maintains counters for the current and previous windows and weights the previous count by how much of that window still overlaps the sliding window.
class SlidingWindowCounterRateLimiter {
constructor(maxRequests, windowSeconds) {
this.maxRequests = maxRequests;
this.windowSeconds = windowSeconds;
this.windows = new Map(); // userId -> { current, previous, windowStart }
}
async allowRequest(userId) {
const now = Date.now();
const currentWindowStart = Math.floor(now / (this.windowSeconds * 1000));
let userWindows = this.windows.get(userId) || {
current: 0,
previous: 0,
windowStart: currentWindowStart
};
// Check if we need to roll windows
if (userWindows.windowStart < currentWindowStart) {
if (userWindows.windowStart === currentWindowStart - 1) {
// Rolling to next window, current becomes previous
userWindows.previous = userWindows.current;
} else {
// More than one window passed, reset previous
userWindows.previous = 0;
}
userWindows.current = 0;
userWindows.windowStart = currentWindowStart;
}
// Calculate weighted count based on position in current window
const windowElapsedMs = now % (this.windowSeconds * 1000);
const windowProgress = windowElapsedMs / (this.windowSeconds * 1000);
const previousWeight = 1 - windowProgress;
const estimatedCount = Math.floor(
userWindows.previous * previousWeight + userWindows.current
);
// Check if request exceeds limit
if (estimatedCount >= this.maxRequests) {
return {
allowed: false,
resetAt: new Date((currentWindowStart + 1) * this.windowSeconds * 1000)
};
}
userWindows.current++;
this.windows.set(userId, userWindows);
return {
allowed: true,
remaining: this.maxRequests - estimatedCount - 1
};
}
}
// Example: At window position 30% (18 seconds into 60-second window)
// with previous=80, current=15, limit=100
// estimatedCount = 80 * 0.7 + 15 = 56 + 15 = 71 requests
Sliding window counter provides good accuracy with constant memory per user. The estimate assumes requests in the previous window were spread evenly, so it can undercount when they clustered near the end of that window; for most use cases the resulting error is small.
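To make the weighting concrete, here is a small sketch of the estimate on its own, written with integer milliseconds to keep the arithmetic exact. `estimateCount` is a hypothetical helper that mirrors the calculation in the class above; the second call is a deliberately skewed scenario chosen to show where the approximation drifts.

```javascript
// Weighted estimate: the previous window contributes in proportion to how
// much of it still overlaps the sliding window.
function estimateCount(previous, current, elapsedMs, windowMs) {
  const weighted = (previous * (windowMs - elapsedMs)) / windowMs;
  return Math.floor(weighted + current);
}

// 18 seconds into a 60-second window with previous=80 and current=15.
console.log(estimateCount(80, 15, 18000, 60000)); // 71

// Skewed case: all 100 previous requests arrived in the final second of the
// previous window and we are 30 seconds into the current one. An exact
// sliding log would still count all 100; the estimate assumes an even spread.
console.log(estimateCount(100, 0, 30000, 60000)); // 50
```

In the skewed case the limiter would admit roughly 50 more requests than an exact log, which is the price paid for constant memory.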
Token Bucket
Token bucket maintains a bucket of tokens that refills at a constant rate. Each request consumes a token. If no tokens are available, the request is rejected. This allows traffic bursts up to the bucket capacity while maintaining average rate limits.
class TokenBucketRateLimiter {
constructor(capacity, refillRate) {
this.capacity = capacity; // Maximum tokens
this.refillRate = refillRate; // Tokens per second
this.buckets = new Map(); // userId -> { tokens, lastRefill }
}
async allowRequest(userId, tokensNeeded = 1) {
const now = Date.now();
let bucket = this.buckets.get(userId) || {
tokens: this.capacity,
lastRefill: now
};
// Calculate tokens to add based on time elapsed
const timePassed = (now - bucket.lastRefill) / 1000;
const tokensToAdd = timePassed * this.refillRate;
bucket.tokens = Math.min(this.capacity, bucket.tokens + tokensToAdd);
bucket.lastRefill = now;
// Check if enough tokens available
if (bucket.tokens < tokensNeeded) {
const tokensNeededToWait = tokensNeeded - bucket.tokens;
const waitTime = tokensNeededToWait / this.refillRate;
return {
allowed: false,
retryAfter: Math.ceil(waitTime),
tokensAvailable: Math.floor(bucket.tokens)
};
}
// Consume tokens
bucket.tokens -= tokensNeeded;
this.buckets.set(userId, bucket);
return {
allowed: true,
tokensRemaining: Math.floor(bucket.tokens)
};
}
async allowBurst(userId, burstSize) {
// Allow burst up to capacity
return this.allowRequest(userId, burstSize);
}
}
// Example: 10 tokens capacity, refills at 1 token per second
// Allows burst of 10 requests, then 1 request per second average
const limiter = new TokenBucketRateLimiter(10, 1);
Token bucket is ideal for APIs that should allow occasional bursts while enforcing average rate limits. It handles variable-cost requests naturally by consuming multiple tokens per expensive operation.
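The burst-then-steady behavior can be traced with a single-client bucket and a controllable clock. This is a minimal sketch under assumed names (`makeTokenBucket`, `take`), not a replacement for the class above.

```javascript
// Token bucket for one client with an injected clock (hypothetical helper).
function makeTokenBucket(capacity, refillPerSecond) {
  let tokens = capacity;
  let last = 0;
  return function take(nowMs, cost = 1) {
    // Refill for the time elapsed since the last call, capped at capacity.
    tokens = Math.min(capacity, tokens + ((nowMs - last) / 1000) * refillPerSecond);
    last = nowMs;
    if (tokens < cost) return false;
    tokens -= cost;
    return true;
  };
}

const take = makeTokenBucket(10, 1);
let burstAccepted = 0;
for (let i = 0; i < 15; i++) {
  if (take(0)) burstAccepted++; // 15 requests arrive at t=0
}
console.log(burstAccepted); // 10: the burst drains the bucket
console.log(take(500));     // false: only half a token has refilled
console.log(take(1000));    // true: one full token is available again
console.log(take(1000, 5)); // false: a cost-5 request needs 5 tokens
```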
Leaky Bucket
Leaky bucket processes requests at a constant rate regardless of incoming traffic patterns. Requests enter a queue that drains at a fixed rate. When the queue fills, new requests are rejected.
class LeakyBucketRateLimiter {
constructor(capacity, leakRate) {
this.capacity = capacity; // Maximum queue size
this.leakRate = leakRate; // Requests processed per second
this.queues = new Map(); // userId -> queue
this.processQueues();
}
async allowRequest(userId, request) {
let queue = this.queues.get(userId) || [];
if (queue.length >= this.capacity) {
return {
allowed: false,
queueFull: true,
retryAfter: Math.ceil(queue.length / this.leakRate)
};
}
// Add request to queue
queue.push({
request: request,
timestamp: Date.now()
});
this.queues.set(userId, queue);
return {
allowed: true,
queuePosition: queue.length
};
}
processQueues() {
// Drain one request per user every 1/leakRate seconds, so rates
// below 10 per second still make progress
const intervalMs = 1000 / this.leakRate;
this.timer = setInterval(() => {
for (const [userId, queue] of this.queues.entries()) {
const item = queue.shift();
if (item) {
this.processRequest(item.request);
}
if (queue.length === 0) {
this.queues.delete(userId);
}
}
}, intervalMs);
}
processRequest(request) {
// Actually process the rate-limited request
console.log('Processing request:', request);
}
}
Leaky bucket enforces strictly constant output rate, making it suitable for protecting downstream services that can't handle traffic spikes. The tradeoff is added latency as requests wait in the queue.
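The added latency is simple to estimate: a request that lands at a given queue position waits roughly that position divided by the leak rate. The helper below is a hypothetical sketch of that arithmetic and ignores timer jitter.

```javascript
// Approximate queueing delay in milliseconds for a request that lands at
// 1-based `position` in a queue drained at `leakRatePerSecond`.
function expectedWaitMs(position, leakRatePerSecond) {
  return (position * 1000) / leakRatePerSecond;
}

console.log(expectedWaitMs(1, 5));  // 200
console.log(expectedWaitMs(50, 5)); // 10000: worst case for capacity 50
```

Sizing the capacity is therefore also a decision about the worst acceptable wait, which is capacity divided by leak rate.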
Distributed Rate Limiting with Redis
When multiple API servers handle requests, each server needs access to shared rate limiting state. Redis provides the distributed coordination needed for consistent rate limiting across servers.
Redis Fixed Window Implementation
class RedisFixedWindowLimiter {
constructor(redisClient, maxRequests, windowSeconds) {
this.redis = redisClient;
this.maxRequests = maxRequests;
this.windowSeconds = windowSeconds;
}
async allowRequest(userId) {
const now = Date.now();
const window = Math.floor(now / (this.windowSeconds * 1000));
const key = `ratelimit:${userId}:${window}`;
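// Increment and set the expiry in one atomic script, so a crash between
// the two commands cannot leave a counter without a TTL
const count = await this.redis.eval(
"local c = redis.call('INCR', KEYS[1]) " +
"if c == 1 then redis.call('EXPIRE', KEYS[1], ARGV[1]) end " +
"return c",
1,
key,
this.windowSeconds * 2
);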
if (count > this.maxRequests) {
return {
allowed: false,
current: count - 1,
limit: this.maxRequests,
resetAt: new Date((window + 1) * this.windowSeconds * 1000)
};
}
return {
allowed: true,
remaining: this.maxRequests - count
};
}
}
Redis Sliding Window Implementation
class RedisSlidingWindowLimiter {
constructor(redisClient, maxRequests, windowSeconds) {
this.redis = redisClient;
this.maxRequests = maxRequests;
this.windowSeconds = windowSeconds;
}
async allowRequest(userId) {
const now = Date.now();
const windowStart = now - (this.windowSeconds * 1000);
const key = `ratelimit:${userId}`;
// Use Lua script for atomic operations
const script = `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window_start = tonumber(ARGV[2])
local max_requests = tonumber(ARGV[3])
local ttl = tonumber(ARGV[4])
-- Remove old requests outside window
redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start)
-- Count requests in window
local current = redis.call('ZCARD', key)
if current >= max_requests then
return {0, current}
end
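-- Add current request under a caller-supplied unique member
redis.call('ZADD', key, now, ARGV[5])
redis.call('EXPIRE', key, ttl)
return {1, current + 1}
`;
// Unique member so two requests in the same millisecond are both counted;
// math.random() inside a script may repeat across runs on some Redis versions
const member = `${now}-${Math.random().toString(36).slice(2)}`;
const result = await this.redis.eval(
script,
1,
key,
now,
windowStart,
this.maxRequests,
this.windowSeconds * 2,
member
);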
const [allowed, current] = result;
if (allowed === 0) {
// Get oldest request to calculate reset time
const oldest = await this.redis.zrange(key, 0, 0, 'WITHSCORES');
const oldestTimestamp = parseInt(oldest[1]);
const resetTime = oldestTimestamp + (this.windowSeconds * 1000);
return {
allowed: false,
current: current,
limit: this.maxRequests,
resetAt: new Date(resetTime),
retryAfter: Math.ceil((resetTime - now) / 1000)
};
}
return {
allowed: true,
remaining: this.maxRequests - current
};
}
}
Redis Token Bucket Implementation
class RedisTokenBucketLimiter {
constructor(redisClient, capacity, refillRate) {
this.redis = redisClient;
this.capacity = capacity;
this.refillRate = refillRate;
}
async allowRequest(userId, tokensNeeded = 1) {
const key = `ratelimit:bucket:${userId}`;
const script = `
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local refill_rate = tonumber(ARGV[2])
local tokens_needed = tonumber(ARGV[3])
local now = tonumber(ARGV[4])
-- Get current bucket state
local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(bucket[1]) or capacity
local last_refill = tonumber(bucket[2]) or now
-- Calculate tokens to add
local time_passed = (now - last_refill) / 1000
local tokens_to_add = time_passed * refill_rate
tokens = math.min(capacity, tokens + tokens_to_add)
-- Check if enough tokens
if tokens < tokens_needed then
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, 3600)
return {0, math.floor(tokens)}
end
-- Consume tokens
tokens = tokens - tokens_needed
redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
redis.call('EXPIRE', key, 3600)
return {1, math.floor(tokens)}
`;
const result = await this.redis.eval(
script,
1,
key,
this.capacity,
this.refillRate,
tokensNeeded,
Date.now()
);
const [allowed, tokensRemaining] = result;
if (allowed === 0) {
const tokensNeededToWait = tokensNeeded - tokensRemaining;
const waitTime = tokensNeededToWait / this.refillRate;
return {
allowed: false,
tokensRemaining: tokensRemaining,
retryAfter: Math.ceil(waitTime)
};
}
return {
allowed: true,
tokensRemaining: tokensRemaining
};
}
}
Multi-Tier Rate Limiting
Production APIs typically enforce multiple rate limit tiers simultaneously: per-user limits, per-IP limits, global system limits, and per-endpoint limits. Implementing these tiers requires careful coordination to avoid false rejections.
Hierarchical Rate Limiting
class MultiTierRateLimiter {
constructor(redisClient) {
this.redis = redisClient;
// Define rate limit tiers
this.tiers = {
global: new RedisSlidingWindowLimiter(redisClient, 10000, 60),
perUser: new RedisSlidingWindowLimiter(redisClient, 100, 60),
perIP: new RedisSlidingWindowLimiter(redisClient, 200, 60),
perEndpoint: new Map() // endpoint -> limiter
};
// Expensive endpoints get stricter limits
this.tiers.perEndpoint.set(
'/api/search',
new RedisSlidingWindowLimiter(redisClient, 20, 60)
);
}
async checkAllLimits(request) {
const { userId, ip, endpoint } = request;
const results = [];
// Check global limit first (cheapest to evaluate)
const globalResult = await this.tiers.global.allowRequest('global');
if (!globalResult.allowed) {
return {
allowed: false,
tier: 'global',
...globalResult
};
}
results.push({ tier: 'global', ...globalResult });
// Check per-IP limit
if (ip) {
const ipResult = await this.tiers.perIP.allowRequest(ip);
if (!ipResult.allowed) {
// Rollback global limit increment
await this.rollbackLimit('global', 'global');
return {
allowed: false,
tier: 'per-ip',
...ipResult
};
}
results.push({ tier: 'per-ip', ...ipResult });
}
// Check per-user limit
if (userId) {
const userResult = await this.tiers.perUser.allowRequest(userId);
if (!userResult.allowed) {
// Rollback previous limits
await this.rollbackLimit('global', 'global');
if (ip) await this.rollbackLimit('perIP', ip);
return {
allowed: false,
tier: 'per-user',
...userResult
};
}
results.push({ tier: 'per-user', ...userResult });
}
// Check per-endpoint limit if defined
if (this.tiers.perEndpoint.has(endpoint)) {
const endpointLimiter = this.tiers.perEndpoint.get(endpoint);
const endpointResult = await endpointLimiter.allowRequest(
`${userId || ip}:${endpoint}` // fall back to IP for anonymous callers
);
if (!endpointResult.allowed) {
// Rollback all previous limits
await this.rollbackLimit('global', 'global');
if (ip) await this.rollbackLimit('perIP', ip);
if (userId) await this.rollbackLimit('perUser', userId);
return {
allowed: false,
tier: 'per-endpoint',
endpoint: endpoint,
...endpointResult
};
}
results.push({ tier: 'per-endpoint', ...endpointResult });
}
return {
allowed: true,
limits: results
};
}
async rollbackLimit(tier, key) {
// The tiers above are sorted-set sliding windows, so DECR would fail
// with a WRONGTYPE error. Removing the newest entry (ZPOPMAX, Redis 5.0+)
// undoes one recorded request instead.
const limiterKey = `ratelimit:${key}`;
await this.redis.zpopmax(limiterKey);
}
}
Cost-Based Rate Limiting
Different API operations consume different resources. Search queries are more expensive than simple GETs. Assign cost weights to operations and consume multiple tokens per expensive operation.
class CostBasedRateLimiter {
constructor(redisClient, tokenCapacity, refillRate) {
this.limiter = new RedisTokenBucketLimiter(
redisClient,
tokenCapacity,
refillRate
);
// Define operation costs
this.operationCosts = {
'GET /api/users/:id': 1,
'POST /api/users': 5,
'GET /api/search': 10,
'POST /api/bulk-import': 50,
'POST /api/export': 20
};
}
async allowRequest(userId, operation) {
const cost = this.operationCosts[operation] || 1;
const result = await this.limiter.allowRequest(userId, cost);
return {
...result,
cost: cost,
operation: operation
};
}
async getOperationCost(method, path) {
// Match path patterns to defined costs
for (const [pattern, cost] of Object.entries(this.operationCosts)) {
if (this.matchesPattern(method, path, pattern)) {
return cost;
}
}
return 1; // Default cost
}
matchesPattern(method, path, pattern) {
const [patternMethod, patternPath] = pattern.split(' ');
if (method !== patternMethod) return false;
// Simple pattern matching (production would use proper router)
const pathRegex = patternPath.replace(/:[\w]+/g, '[^/]+');
return new RegExp('^' + pathRegex + '$').test(path);
}
}
Rate Limit Headers and Client Communication
APIs should communicate rate limit status through standard HTTP headers so clients can adjust their behavior before hitting limits.
Standard Rate Limit Headers
class RateLimitMiddleware {
constructor(rateLimiter) {
this.rateLimiter = rateLimiter;
}
async middleware(req, res, next) {
const userId = req.user?.id || req.ip;
const result = await this.rateLimiter.allowRequest(userId);
// Set rate limit headers (the widely used X-RateLimit-* convention).
// setHeader throws on undefined values, so skip fields the limiter omitted.
if (result.limit !== undefined) {
res.setHeader('X-RateLimit-Limit', result.limit);
}
res.setHeader('X-RateLimit-Remaining', result.remaining ?? 0);
if (result.resetAt) {
res.setHeader('X-RateLimit-Reset', result.resetAt.toISOString());
}
if (!result.allowed) {
if (result.retryAfter !== undefined) {
res.setHeader('Retry-After', result.retryAfter);
}
return res.status(429).json({
error: 'Too Many Requests',
message: 'Rate limit exceeded',
limit: result.limit,
resetAt: result.resetAt,
retryAfter: result.retryAfter
});
}
next();
}
}
// Express usage: bind the method so `this.rateLimiter` resolves when Express calls it
const rateLimitMiddleware = new RateLimitMiddleware(rateLimiter);
app.use(rateLimitMiddleware.middleware.bind(rateLimitMiddleware));
Dynamic Rate Limits Based on User Tier
class TieredRateLimiter {
constructor(redisClient) {
this.redis = redisClient;
this.tiers = {
free: { requests: 100, window: 3600 },
basic: { requests: 1000, window: 3600 },
pro: { requests: 10000, window: 3600 },
enterprise: { requests: 100000, window: 3600 }
};
}
async allowRequest(user) {
const tier = user.subscriptionTier || 'free';
const limits = this.tiers[tier];
const limiter = new RedisSlidingWindowLimiter(
this.redis,
limits.requests,
limits.window
);
const result = await limiter.allowRequest(user.id);
return {
...result,
tier: tier,
tierLimits: limits
};
}
async getUserLimits(userId) {
const user = await db.users.findOne({ id: userId });
const tier = user.subscriptionTier || 'free';
return {
tier: tier,
...this.tiers[tier]
};
}
}
Rate Limiting Edge Cases
Handling Clock Skew in Distributed Systems
When API servers have slightly different system times, rate limiting based on timestamps can produce inconsistent results. Use centralized time from Redis or rely on monotonic counters instead.
class ClockSkewResistantLimiter {
constructor(redisClient, maxRequests, windowSeconds) {
this.redis = redisClient;
this.maxRequests = maxRequests;
this.windowSeconds = windowSeconds;
}
async allowRequest(userId) {
// Use Redis TIME command for consistent timestamp
const redisTime = await this.redis.time();
const now = redisTime[0] * 1000 + Math.floor(redisTime[1] / 1000);
const windowStart = now - (this.windowSeconds * 1000);
const key = `ratelimit:${userId}`;
const script = `
local key = KEYS[1]
local now = tonumber(ARGV[1])
local window_start = tonumber(ARGV[2])
local max_requests = tonumber(ARGV[3])
redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start)
local current = redis.call('ZCARD', key)
if current >= max_requests then
return {0, current}
end
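redis.call('ZADD', key, now, ARGV[5])
redis.call('EXPIRE', key, ARGV[4])
return {1, current + 1}
`;
const result = await this.redis.eval(
script,
1,
key,
now,
windowStart,
this.maxRequests,
this.windowSeconds * 2,
// Unique member so requests in the same millisecond are counted separately
`${now}-${Math.random().toString(36).slice(2)}`
);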
const [allowed, current] = result;
return {
allowed: allowed === 1,
current: current,
limit: this.maxRequests
};
}
}
Preventing Rate Limit Bypass with Multiple Identifiers
Users might try to bypass rate limits by making requests from multiple IPs or creating multiple accounts. Implement composite limits that track both authenticated and unauthenticated requests.
class BypassPreventionLimiter {
constructor(limiter) {
this.limiter = limiter; // any limiter exposing allowRequest(key)
}
async checkRequest(request) {
const identifiers = [];
// Collect all identifiers
if (request.userId) identifiers.push(`user:${request.userId}`);
if (request.ip) identifiers.push(`ip:${request.ip}`);
if (request.deviceId) identifiers.push(`device:${request.deviceId}`);
// Check limits for all identifiers
for (const identifier of identifiers) {
const result = await this.limiter.allowRequest(identifier);
if (!result.allowed) {
return {
allowed: false,
blockedBy: identifier,
...result
};
}
}
// All checks passed
return { allowed: true };
}
}
Whitelisting and Blacklisting
class WhitelistBlacklistLimiter {
constructor(rateLimiter) {
this.rateLimiter = rateLimiter;
this.whitelist = new Set(); // Users exempt from limits
this.blacklist = new Set(); // Users completely blocked
}
async allowRequest(userId) {
// Check blacklist first
if (this.blacklist.has(userId)) {
return {
allowed: false,
reason: 'User is blacklisted',
permanent: true
};
}
// Whitelist bypasses rate limits
if (this.whitelist.has(userId)) {
return {
allowed: true,
whitelisted: true
};
}
// Apply normal rate limiting
return await this.rateLimiter.allowRequest(userId);
}
addToWhitelist(userId) {
this.whitelist.add(userId);
}
addToBlacklist(userId, reason) {
this.blacklist.add(userId);
// Persist to database
db.blacklist.create({ userId, reason, createdAt: new Date() });
}
}
Testing Rate Limiters
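const Redis = require('ioredis'); // assumed client; the calls used above match its API
describe('Rate Limiter', () => {
let limiter;
let redis;
beforeEach(async () => {
redis = new Redis();
await redis.flushall();
limiter = new RedisSlidingWindowLimiter(redis, 10, 60);
});
afterEach(async () => {
// Close the connection so the test runner can exit
await redis.quit();
});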
test('allows requests under limit', async () => {
for (let i = 0; i < 10; i++) {
const result = await limiter.allowRequest('user1');
expect(result.allowed).toBe(true);
}
});
test('blocks requests over limit', async () => {
// Use up allowance
for (let i = 0; i < 10; i++) {
await limiter.allowRequest('user1');
}
// Next request should be blocked
const result = await limiter.allowRequest('user1');
expect(result.allowed).toBe(false);
expect(result.retryAfter).toBeGreaterThan(0);
});
test('resets after window expires', async () => {
// Use up allowance
for (let i = 0; i < 10; i++) {
await limiter.allowRequest('user1');
}
// Wait for window to expire
await new Promise(resolve => setTimeout(resolve, 61000));
// Should allow requests again
const result = await limiter.allowRequest('user1');
expect(result.allowed).toBe(true);
}, 70000); // raise the per-test timeout above the 61-second wait
test('enforces limits independently per user', async () => {
// User 1 uses their allowance
for (let i = 0; i < 10; i++) {
await limiter.allowRequest('user1');
}
// User 2 should still have full allowance
const result = await limiter.allowRequest('user2');
expect(result.allowed).toBe(true);
});
test('handles concurrent requests correctly', async () => {
const requests = [];
for (let i = 0; i < 20; i++) {
requests.push(limiter.allowRequest('user1'));
}
const results = await Promise.all(requests);
const allowed = results.filter(r => r.allowed).length;
// Exactly 10 should be allowed due to atomic operations
expect(allowed).toBe(10);
});
});
Frequently Asked Questions
Should rate limits apply to authenticated and unauthenticated requests differently?
Yes, typically authenticated users get higher limits than unauthenticated requests. Unauthenticated requests should be rate limited by IP address with conservative limits to prevent abuse, while authenticated users can have generous per-account limits. This encourages users to authenticate while protecting against anonymous abuse. Some APIs also offer higher limits for paid tiers to monetize API access.
How do you rate limit WebSocket connections?
Rate limit WebSocket connection establishment the same as HTTP requests, but implement separate rate limiting for messages sent over established connections. Track message rate per connection and disconnect clients that exceed limits. For real-time applications, consider implementing backpressure mechanisms where the server tells clients to slow down rather than immediately disconnecting.
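A per-connection message limiter with a backpressure step might look like the sketch below. It is not tied to any WebSocket library: `ConnectionMessageLimiter`, the `'slow_down'` and `'disconnect'` outcomes, and the strike count are assumptions for illustration, and the caller decides how to act on each outcome.

```javascript
// Per-connection message limiter (hypothetical class). Create one instance
// per connection and call onMessage() for each inbound message.
class ConnectionMessageLimiter {
  constructor(maxPerSecond, maxStrikes) {
    this.maxPerSecond = maxPerSecond;
    this.maxStrikes = maxStrikes; // violations tolerated before disconnecting
    this.windowId = -1;
    this.count = 0;
    this.strikes = 0;
  }

  onMessage(nowMs) {
    const id = Math.floor(nowMs / 1000);
    if (id !== this.windowId) {
      this.windowId = id; // new one-second window
      this.count = 0;
    }
    this.count++;
    if (this.count <= this.maxPerSecond) return 'ok';
    this.strikes++;
    // Ask the client to back off first; disconnect only on repeat offenses.
    return this.strikes > this.maxStrikes ? 'disconnect' : 'slow_down';
  }
}

const conn = new ConnectionMessageLimiter(2, 1);
console.log(conn.onMessage(0));  // 'ok'
console.log(conn.onMessage(10)); // 'ok'
console.log(conn.onMessage(20)); // 'slow_down'
console.log(conn.onMessage(30)); // 'disconnect'
```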
What happens to rate limit counters when Redis fails?
When Redis is unavailable, you have three options: fail open (allow all requests), fail closed (block all requests), or fall back to local in-memory rate limiting. Failing open risks abuse during outages but maintains availability. Failing closed protects backend services but creates a bad user experience. Local fallback provides the best balance, but counts won't be accurate across servers. Choose based on whether you prioritize security or availability.
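The three policies can be expressed as a thin wrapper around any limiter that exposes `allowRequest(key)`. This is a sketch under assumptions: `ResilientLimiter`, the policy names, and the `degraded` flag are hypothetical, and the stubs stand in for a Redis-backed limiter and an in-memory one.

```javascript
// Wraps a primary (for example Redis-backed) limiter with a failure policy.
// policy: 'open' | 'closed' | 'local'. With 'local' and no fallback
// configured, the wrapper fails closed.
class ResilientLimiter {
  constructor(primary, { policy = 'local', fallback = null } = {}) {
    this.primary = primary;
    this.policy = policy;
    this.fallback = fallback;
  }

  async allowRequest(key) {
    try {
      return await this.primary.allowRequest(key);
    } catch (err) {
      if (this.policy === 'open') {
        return { allowed: true, degraded: true };
      }
      if (this.policy === 'local' && this.fallback) {
        const result = await this.fallback.allowRequest(key);
        return { ...result, degraded: true };
      }
      return { allowed: false, degraded: true }; // fail closed
    }
  }
}

// Stubs for illustration only.
const failingPrimary = {
  allowRequest: async () => { throw new Error('Redis unavailable'); }
};
const localStub = {
  allowRequest: async () => ({ allowed: true, remaining: 9 })
};

new ResilientLimiter(failingPrimary, { policy: 'local', fallback: localStub })
  .allowRequest('user1')
  .then((result) => console.log(result));
// { allowed: true, remaining: 9, degraded: true }
```

When falling back to local counters, one option is to give each server a fraction of the shared limit so the combined total stays close to the intended quota.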
How do you implement rate limiting for batch operations?
Batch operations should consume tokens proportional to their size. A request to delete 100 users should consume 100 tokens from a token bucket. For operations where cost isn't directly proportional to size, define cost functions based on actual resource consumption. Monitor execution time and resource usage of different batch sizes to calibrate costs accurately.
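One way to express such a cost function is sketched below. The parameter names (`baseCost`, `perItemCost`, `maxCost`) and the example values are illustrative assumptions to be calibrated against measured resource usage; the result would be passed as the token count to a token bucket limiter.

```javascript
// Token cost for a batch request: a fixed base plus a per-item charge,
// optionally capped. All parameters are illustrative defaults.
function batchCost(itemCount, { baseCost = 0, perItemCost = 1, maxCost = Infinity } = {}) {
  if (!Number.isInteger(itemCount) || itemCount < 0) {
    throw new RangeError('itemCount must be a non-negative integer');
  }
  return Math.min(maxCost, baseCost + itemCount * perItemCost);
}

console.log(batchCost(100));                                   // 100
console.log(batchCost(100, { baseCost: 5, perItemCost: 0.5 })); // 55
console.log(batchCost(1000, { maxCost: 500 }));                 // 500
```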
Should rate limits be per-endpoint or global per user?
Implement both. Global per-user limits prevent total system abuse, while per-endpoint limits protect specific expensive operations. A user might have 1000 requests/hour globally but only 50 search requests/hour since searches are expensive. This granular approach prevents users from monopolizing expensive operations while allowing generous limits for cheap operations.
How do you handle rate limiting for internal services?
Internal service-to-service calls typically need rate limiting to prevent cascading failures when one service misbehaves. Use much higher limits than public API limits but still enforce boundaries. Consider implementing circuit breakers alongside rate limits for internal APIs. Whitelist internal service accounts but still set generous upper bounds to catch bugs that create infinite loops or retry storms.
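For readers unfamiliar with circuit breakers, the core state machine is small. The sketch below is a minimal, hypothetical version with an injected clock; production libraries add features such as limiting the number of trial requests while half-open.

```javascript
// Minimal circuit breaker (hypothetical sketch). The circuit opens after
// `failureThreshold` consecutive failures and admits a trial request once
// `cooldownMs` has elapsed.
class CircuitBreaker {
  constructor(failureThreshold, cooldownMs) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openedAt = null; // null means the circuit is closed
  }

  canRequest(nowMs) {
    if (this.openedAt === null) return true;
    // After the cooldown, let a trial request through (half-open).
    return nowMs - this.openedAt >= this.cooldownMs;
  }

  recordSuccess() {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(nowMs) {
    this.failures++;
    if (this.failures >= this.failureThreshold) {
      this.openedAt = nowMs; // open, or re-open after a failed trial
    }
  }
}

const breaker = new CircuitBreaker(3, 5000);
breaker.recordFailure(0);
breaker.recordFailure(0);
breaker.recordFailure(0);
console.log(breaker.canRequest(1000)); // false: circuit is open
console.log(breaker.canRequest(5000)); // true: cooldown elapsed, trial allowed
breaker.recordSuccess();
console.log(breaker.canRequest(5001)); // true: circuit closed again
```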
What's the best way to communicate rate limits to developers?
Document rate limits clearly in API documentation with specific numbers. Return rate limit headers with every response so developers can track their usage programmatically. Provide a dashboard where developers can view their current usage and limits. Send email alerts when users approach 80% of their limits. Make upgrade paths clear for users who need higher limits.
How do you prevent clock synchronization issues in distributed rate limiting?
Use the Redis TIME command to get consistent timestamps across all API servers rather than relying on server system clocks. Alternatively, use counter-based algorithms like fixed window counters, where the timestamp only selects the window and skew matters only near window boundaries. For sliding window algorithms, small clock skew (a few seconds) usually doesn't matter because the window is typically measured in minutes.
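A short sketch shows how skew plays out near a boundary when each server derives the fixed-window key from its own clock. The key format follows the Redis fixed window example in this guide; the one-second skew and the `windowKey` helper are assumptions for illustration.

```javascript
// Fixed-window key as one server would compute it from its local clock.
function windowKey(userId, nowMs, windowSeconds) {
  return `ratelimit:${userId}:${Math.floor(nowMs / (windowSeconds * 1000))}`;
}

const trueNowMs = 59500; // half a second before a 60-second boundary
const skewMs = 1000;     // server B's clock runs one second fast (assumed)

console.log(windowKey('u1', trueNowMs, 60));          // ratelimit:u1:0
console.log(windowKey('u1', trueNowMs + skewMs, 60)); // ratelimit:u1:1
```

For that brief interval the two servers increment different counters, so the same client is counted against two windows at once.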
Conclusion
Rate limiting algorithms each serve specific use cases: fixed windows for simplicity, sliding windows for accuracy, token buckets for burst tolerance, and leaky buckets for a constant downstream rate. Multi-server deployments need shared state, and Redis with Lua scripts keeps each check-and-update atomic so that concurrent requests cannot race.
Effective rate limiting combines multiple tiers including per-user, per-IP, per-endpoint, and global limits. Cost-based limiting assigns token costs proportional to operation expense, preventing expensive operations from consuming the same quota as cheap ones. Communicate limits clearly through HTTP headers and comprehensive documentation.
The critical implementation details are using atomic Redis operations to prevent race conditions, handling edge cases like clock skew and bypass attempts, and testing concurrent request scenarios to verify that limits are enforced correctly under load. Choose algorithms that match your specific requirements for accuracy, memory efficiency, and burst handling.