Best Rate Limiting Architectures for APIs

Bright SEO Tools in SaaS · Published: Apr 04, 2026 | Updated: Apr 04, 2026

When your API serves thousands of requests per second, naive rate limiting implementations become bottlenecks that either block legitimate traffic or fail to prevent abuse. Your choice among token bucket, leaky bucket, fixed window, and sliding window algorithms determines whether your rate limiter enforces limits accurately across distributed systems or creates false rejections during traffic spikes.

This guide explains how to architect rate limiters that scale to millions of requests while providing consistent enforcement across multiple API servers. You'll learn which algorithms suit which use cases, how to implement distributed rate limiting with Redis without race conditions, and the specific tradeoffs between accuracy, performance, and memory consumption.

We cover per-user rate limits, per-endpoint limits, global system limits, and how to combine multiple limit types. You'll understand why sliding windows prevent the boundary bursts that fixed windows allow, why token buckets tolerate bursts that leaky buckets smooth into a constant rate, and when simple algorithms outperform sophisticated ones.

Rate Limiting Algorithms

Five algorithms dominate production rate limiting implementations (two of them are variants of the sliding window). Each has distinct characteristics that make it suitable for specific scenarios.

Fixed Window Counter

Fixed window counting divides time into fixed intervals (windows) and counts requests within each window. When a window expires, the counter resets. This is the simplest rate limiting algorithm.

class FixedWindowRateLimiter {
    constructor(maxRequests, windowSeconds) {
        this.maxRequests = maxRequests;
        this.windowSeconds = windowSeconds;
        this.windows = new Map(); // userId -> { count, windowStart }
    }

    async allowRequest(userId) {
        const now = Date.now();
        const currentWindow = Math.floor(now / (this.windowSeconds * 1000));

        const userWindow = this.windows.get(userId) || {
            count: 0,
            windowStart: currentWindow
        };

        // Check if we're in a new window
        if (userWindow.windowStart < currentWindow) {
            userWindow.count = 0;
            userWindow.windowStart = currentWindow;
        }

        // Check if request exceeds limit
        if (userWindow.count >= this.maxRequests) {
            const resetTime = (currentWindow + 1) * this.windowSeconds * 1000;
            return {
                allowed: false,
                resetAt: new Date(resetTime)
            };
        }

        userWindow.count++;
        this.windows.set(userId, userWindow);

        return {
            allowed: true,
            remaining: this.maxRequests - userWindow.count
        };
    }
}

// Usage: Allow 100 requests per 60 seconds
const limiter = new FixedWindowRateLimiter(100, 60);

Fixed windows are memory efficient and simple to implement, but they suffer from boundary problems. A user can make 100 requests at 12:59:59, then immediately make another 100 requests at 13:00:00, effectively getting 200 requests in one second despite a "100 requests per minute" limit.
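To make the boundary problem concrete, here is a stripped-down, single-user variant of the counter above. It is a sketch rather than a drop-in replacement: it takes the current time as a parameter (a simplification assumed here so the timestamps can be chosen by hand), and the class name FixedWindowCounter is hypothetical.

```javascript
// Hypothetical single-user fixed-window counter with an injected clock (ms)
class FixedWindowCounter {
    constructor(maxRequests, windowMs) {
        this.maxRequests = maxRequests;
        this.windowMs = windowMs;
        this.windowIndex = -1;
        this.count = 0;
    }

    allow(nowMs) {
        const index = Math.floor(nowMs / this.windowMs);
        if (index !== this.windowIndex) {
            // New window: the counter resets regardless of recent traffic
            this.windowIndex = index;
            this.count = 0;
        }
        if (this.count >= this.maxRequests) return false;
        this.count++;
        return true;
    }
}

// 100 requests per minute. Windows are aligned to multiples of 60000 ms,
// so 59999 ms plays the role of 12:59:59 and 60000 ms of 13:00:00.
const counter = new FixedWindowCounter(100, 60000);
let allowed = 0;
for (let i = 0; i < 100; i++) if (counter.allow(59999)) allowed++; // end of window 0
for (let i = 0; i < 100; i++) if (counter.allow(60000)) allowed++; // start of window 1
console.log(allowed); // 200 requests admitted within 1 ms
```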

Sliding Window Log

Sliding window log stores a timestamp for each request and counts only requests within the sliding time window. This eliminates the boundary problem but requires more memory to store individual timestamps.

class SlidingWindowLogRateLimiter {
    constructor(maxRequests, windowSeconds) {
        this.maxRequests = maxRequests;
        this.windowMs = windowSeconds * 1000;
        this.requestLogs = new Map(); // userId -> [timestamps]
    }

    async allowRequest(userId) {
        const now = Date.now();
        const windowStart = now - this.windowMs;

        // Get user's request log
        let userLog = this.requestLogs.get(userId) || [];

        // Remove requests outside the sliding window
        userLog = userLog.filter(timestamp => timestamp > windowStart);

        // Check if request exceeds limit
        if (userLog.length >= this.maxRequests) {
            const oldestRequest = userLog[0];
            const resetTime = oldestRequest + this.windowMs;

            return {
                allowed: false,
                resetAt: new Date(resetTime),
                retryAfter: Math.ceil((resetTime - now) / 1000)
            };
        }

        // Add current request timestamp
        userLog.push(now);
        this.requestLogs.set(userId, userLog);

        return {
            allowed: true,
            remaining: this.maxRequests - userLog.length
        };
    }

    // Clean up old logs to prevent memory leaks
    cleanup() {
        const now = Date.now();
        const windowStart = now - this.windowMs;

        for (const [userId, log] of this.requestLogs.entries()) {
            const filteredLog = log.filter(ts => ts > windowStart);

            if (filteredLog.length === 0) {
                this.requestLogs.delete(userId);
            } else {
                this.requestLogs.set(userId, filteredLog);
            }
        }
    }
}

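// Usage: 100 requests per 60 seconds, with periodic cleanup of idle users
const logLimiter = new SlidingWindowLogRateLimiter(100, 60);
setInterval(() => logLimiter.cleanup(), 60000);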

Sliding window log provides accurate rate limiting but consumes memory proportional to the request rate. For 100 requests per minute across 10,000 users, this stores up to 1 million timestamps in memory.
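Replaying the 12:59:59 / 13:00:00 burst against a sliding window log shows the difference. The sketch below is a hypothetical single-user, synchronous version with an injected clock; it assumes timestamps arrive in non-decreasing order, which lets it evict expired entries from the front of the array.

```javascript
// Hypothetical single-user sliding window log with an injected clock (ms)
class SlidingWindowLog {
    constructor(maxRequests, windowMs) {
        this.maxRequests = maxRequests;
        this.windowMs = windowMs;
        this.log = []; // request timestamps, oldest first
    }

    allow(nowMs) {
        // Evict timestamps at or before the start of the sliding window
        while (this.log.length > 0 && this.log[0] <= nowMs - this.windowMs) {
            this.log.shift();
        }
        if (this.log.length >= this.maxRequests) return false;
        this.log.push(nowMs);
        return true;
    }
}

const slidingLog = new SlidingWindowLog(100, 60000);
let admitted = 0;
for (let i = 0; i < 100; i++) if (slidingLog.allow(59999)) admitted++;
for (let i = 0; i < 100; i++) if (slidingLog.allow(60000)) admitted++;
console.log(admitted); // 100: the second burst is rejected

// Capacity frees up once the first burst is a full window old
console.log(slidingLog.allow(119999)); // true
```

The log holds up to maxRequests timestamps per user, which is exactly the memory cost described above.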

Sliding Window Counter

Sliding window counter approximates the accuracy of a sliding window log with the memory footprint of a fixed window. It keeps counters for the current and previous windows and weights the previous window's count by how much of that window still overlaps the sliding window.

class SlidingWindowCounterRateLimiter {
    constructor(maxRequests, windowSeconds) {
        this.maxRequests = maxRequests;
        this.windowSeconds = windowSeconds;
        this.windows = new Map(); // userId -> { current, previous, windowStart }
    }

    async allowRequest(userId) {
        const now = Date.now();
        const currentWindowStart = Math.floor(now / (this.windowSeconds * 1000));

        let userWindows = this.windows.get(userId) || {
            current: 0,
            previous: 0,
            windowStart: currentWindowStart
        };

        // Check if we need to roll windows
        if (userWindows.windowStart < currentWindowStart) {
            if (userWindows.windowStart === currentWindowStart - 1) {
                // Rolling to next window, current becomes previous
                userWindows.previous = userWindows.current;
            } else {
                // More than one window passed, reset previous
                userWindows.previous = 0;
            }
            userWindows.current = 0;
            userWindows.windowStart = currentWindowStart;
        }

        // Calculate weighted count based on position in current window
        const windowElapsedMs = now % (this.windowSeconds * 1000);
        const windowProgress = windowElapsedMs / (this.windowSeconds * 1000);

        const previousWeight = 1 - windowProgress;
        const estimatedCount = Math.floor(
            userWindows.previous * previousWeight + userWindows.current
        );

        // Check if request exceeds limit
        if (estimatedCount >= this.maxRequests) {
            return {
                allowed: false,
                resetAt: new Date((currentWindowStart + 1) * this.windowSeconds * 1000)
            };
        }

        userWindows.current++;
        this.windows.set(userId, userWindows);

        return {
            allowed: true,
            remaining: this.maxRequests - estimatedCount - 1
        };
    }
}

// Example: At window position 30% (18 seconds into 60-second window)
// with previous=80, current=15, limit=100
// estimatedCount = 80 * 0.7 + 15 = 56 + 15 = 71 requests

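Sliding window counter provides good accuracy with minimal memory overhead: two counters per user, regardless of traffic. The estimate assumes the previous window's requests were spread evenly across it, so it can over- or under-count when that window's traffic was bursty, but the error is small enough for most use cases.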
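The weighting can be checked by hand with a small standalone function. estimateCount is a hypothetical helper equivalent to the calculation inside allowRequest; it multiplies before dividing so that millisecond inputs give exact results instead of floating-point fractions.

```javascript
// Weighted estimate used by the sliding window counter:
// previous * (share of the previous window still inside the sliding window) + current
function estimateCount(previous, current, elapsedMs, windowMs) {
    return Math.floor((previous * (windowMs - elapsedMs)) / windowMs + current);
}

// Worked example from the comment above: 18 s into a 60 s window
console.log(estimateCount(80, 15, 18000, 60000)); // 71

// Boundary burst: 100 requests at the end of one window, then a request
// at the very start of the next (elapsed = 0) is still counted against 100
console.log(estimateCount(100, 0, 0, 60000)); // 100, so rejected at a limit of 100

// Halfway through the next window, half of the old burst has aged out
console.log(estimateCount(100, 0, 30000, 60000)); // 50
```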

Token Bucket

Token bucket maintains a bucket of tokens that refills at a constant rate. Each request consumes a token. If no tokens are available, the request is rejected. This allows traffic bursts up to the bucket capacity while maintaining average rate limits.

class TokenBucketRateLimiter {
    constructor(capacity, refillRate) {
        this.capacity = capacity; // Maximum tokens
        this.refillRate = refillRate; // Tokens per second
        this.buckets = new Map(); // userId -> { tokens, lastRefill }
    }

    async allowRequest(userId, tokensNeeded = 1) {
        const now = Date.now();

        let bucket = this.buckets.get(userId) || {
            tokens: this.capacity,
            lastRefill: now
        };

        // Calculate tokens to add based on time elapsed
        const timePassed = (now - bucket.lastRefill) / 1000;
        const tokensToAdd = timePassed * this.refillRate;

        bucket.tokens = Math.min(this.capacity, bucket.tokens + tokensToAdd);
        bucket.lastRefill = now;

        // Check if enough tokens available
        if (bucket.tokens < tokensNeeded) {
            const tokensNeededToWait = tokensNeeded - bucket.tokens;
            const waitTime = tokensNeededToWait / this.refillRate;

            return {
                allowed: false,
                retryAfter: Math.ceil(waitTime),
                tokensAvailable: Math.floor(bucket.tokens)
            };
        }

        // Consume tokens
        bucket.tokens -= tokensNeeded;
        this.buckets.set(userId, bucket);

        return {
            allowed: true,
            tokensRemaining: Math.floor(bucket.tokens)
        };
    }

    async allowBurst(userId, burstSize) {
        // Allow burst up to capacity
        return this.allowRequest(userId, burstSize);
    }
}

// Example: 10 tokens capacity, refills at 1 token per second
// Allows burst of 10 requests, then 1 request per second average
const limiter = new TokenBucketRateLimiter(10, 1);

Token bucket is ideal for APIs that should allow occasional bursts while enforcing average rate limits. It handles variable-cost requests naturally by consuming multiple tokens per expensive operation.
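The burst-then-steady behavior is easiest to see with a synchronous sketch whose clock is passed in explicitly. TokenBucket and tryConsume are hypothetical names; the refill logic mirrors the class above.

```javascript
// Hypothetical synchronous token bucket with an injected clock (ms)
class TokenBucket {
    constructor(capacity, refillPerSecond, nowMs) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity; // start full, so an initial burst is allowed
        this.lastRefillMs = nowMs;
    }

    tryConsume(cost, nowMs) {
        // Refill for the elapsed time, never beyond capacity
        const elapsedSeconds = (nowMs - this.lastRefillMs) / 1000;
        this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
        this.lastRefillMs = nowMs;

        if (this.tokens < cost) return false;
        this.tokens -= cost;
        return true;
    }
}

// 10 tokens capacity, 1 token per second, as in the usage example above
const bucket = new TokenBucket(10, 1, 0);
let burst = 0;
for (let i = 0; i < 12; i++) if (bucket.tryConsume(1, 0)) burst++;
console.log(burst);                      // 10: the whole capacity is usable at once

console.log(bucket.tryConsume(1, 500));  // false: only 0.5 tokens have refilled
console.log(bucket.tryConsume(1, 1000)); // true: a full token is back after 1 s
console.log(bucket.tryConsume(5, 1000)); // false: a cost-5 request needs 5 tokens
```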

Leaky Bucket

Leaky bucket processes requests at a constant rate regardless of incoming traffic patterns. Requests enter a queue that drains at a fixed rate. When the queue fills, new requests are rejected.

class LeakyBucketRateLimiter {
    constructor(capacity, leakRate) {
        this.capacity = capacity; // Maximum queue size
        this.leakRate = leakRate; // Requests processed per second
        this.queues = new Map(); // userId -> { items: [], credit: 0 }
        this.processQueues();
    }

    async allowRequest(userId, request) {
        const queue = this.queues.get(userId) || { items: [], credit: 0 };

        if (queue.items.length >= this.capacity) {
            return {
                allowed: false,
                queueFull: true,
                retryAfter: Math.ceil(queue.items.length / this.leakRate)
            };
        }

        // Add request to queue
        queue.items.push({
            request: request,
            timestamp: Date.now()
        });

        this.queues.set(userId, queue);

        return {
            allowed: true,
            queuePosition: queue.items.length
        };
    }

    processQueues(intervalMs = 100) {
        // Requests each queue may release per tick (0.5 at 5 requests/second)
        const leakPerInterval = this.leakRate * (intervalMs / 1000);

        setInterval(() => {
            for (const [userId, queue] of this.queues.entries()) {
                // Accumulate fractional credit: flooring leakRate / 10 per tick
                // would never drain a queue whose leak rate is below 10/s
                queue.credit += leakPerInterval;

                while (queue.credit >= 1 && queue.items.length > 0) {
                    const item = queue.items.shift();
                    queue.credit -= 1;
                    this.processRequest(item.request);
                }

                if (queue.items.length === 0) {
                    // Deleting the current entry while iterating a Map is safe
                    this.queues.delete(userId);
                }
            }
        }, intervalMs);
    }

    processRequest(request) {
        // Actually process the rate-limited request
        console.log('Processing request:', request);
    }
}

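Leaky bucket enforces a strictly constant output rate, making it suitable for protecting downstream services that can't handle traffic spikes. The tradeoff is added latency as requests wait in the queue. Note also that in the queue-based version above, allowRequest only reports that a request was queued; returning the eventual response to the caller requires an extra mechanism such as a callback or promise.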
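To quantify the latency tradeoff, the following hypothetical simulation computes when each request would leave a leaky bucket queue. It models timing only, assumes arrivals are given in time order, and releases the first request in an empty bucket immediately.

```javascript
// Event-driven simulation of a leaky bucket used as a queue (hypothetical).
// Each accepted request is released at least 1000 / leakRate ms after the
// previous one; a request is rejected if `capacity` requests are already waiting.
function simulateLeakyBucket(arrivalsMs, leakRate, capacity) {
    const intervalMs = 1000 / leakRate;
    const departures = []; // release time of every accepted request, in order
    const rejected = [];

    for (const arrival of arrivalsMs) {
        // Requests still waiting at this instant (released strictly later)
        const waiting = departures.filter(d => d > arrival).length;
        if (waiting >= capacity) {
            rejected.push(arrival);
            continue;
        }
        const last = departures.length > 0 ? departures[departures.length - 1] : -Infinity;
        departures.push(Math.max(arrival, last + intervalMs));
    }
    return { departures, rejected };
}

// Six requests arrive at once; the bucket releases 2 per second and queues up to 3
const { departures, rejected } = simulateLeakyBucket([0, 0, 0, 0, 0, 0], 2, 3);
console.log(departures.join(', ')); // 0, 500, 1000, 1500
console.log(rejected.length);       // 2
```

The fourth accepted request leaves 1.5 seconds after it arrived, even though it arrived alongside the first: that is the latency cost of a constant output rate.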

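Algorithm Selection Guide: Use fixed window for simple use cases where low memory matters more than accuracy at window boundaries. Use sliding window log when you need exact counts and can afford one stored timestamp per request. Use sliding window counter for near-exact limiting with constant memory. Use token bucket when you need to allow bursts. Use leaky bucket when downstream services require a constant rate.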

Distributed Rate Limiting with Redis

When multiple API servers handle requests, each server needs access to shared rate limiting state. Redis provides the distributed coordination needed for consistent rate limiting across servers.

Redis Fixed Window Implementation

class RedisFixedWindowLimiter {
    constructor(redisClient, maxRequests, windowSeconds) {
        this.redis = redisClient;
        this.maxRequests = maxRequests;
        this.windowSeconds = windowSeconds;
    }

    async allowRequest(userId) {
        const now = Date.now();
        const windowMs = this.windowSeconds * 1000;
        const window = Math.floor(now / windowMs);
        const key = `ratelimit:${userId}:${window}`;

        // INCR and EXPIRE run inside one Lua script, so a crash between the
        // two commands can never leave a counter key without a TTL
        const script = `
            local count = redis.call('INCR', KEYS[1])
            if count == 1 then
                redis.call('EXPIRE', KEYS[1], ARGV[1])
            end
            return count
        `;
        const count = await this.redis.eval(script, 1, key, this.windowSeconds * 2);

        const resetAt = new Date((window + 1) * windowMs);

        if (count > this.maxRequests) {
            return {
                allowed: false,
                current: count,
                limit: this.maxRequests,
                resetAt: resetAt,
                retryAfter: Math.ceil((resetAt.getTime() - now) / 1000)
            };
        }

        return {
            allowed: true,
            limit: this.maxRequests,
            remaining: this.maxRequests - count,
            resetAt: resetAt
        };
    }
}

Redis Sliding Window Implementation

const { randomUUID } = require('crypto');

class RedisSlidingWindowLimiter {
    constructor(redisClient, maxRequests, windowSeconds) {
        this.redis = redisClient;
        this.maxRequests = maxRequests;
        this.windowSeconds = windowSeconds;
    }

    async allowRequest(userId) {
        const now = Date.now();
        const windowMs = this.windowSeconds * 1000;
        const windowStart = now - windowMs;
        const key = `ratelimit:${userId}`;
        // Unique sorted-set member per request. Generated client-side because
        // Lua's math.random is deterministically seeded in older Redis versions,
        // and two requests in the same millisecond must not overwrite each other.
        const member = `${now}-${randomUUID()}`;

        // Lua script: prune, count, and record in one atomic step
        const script = `
            local key = KEYS[1]
            local now = tonumber(ARGV[1])
            local window_start = tonumber(ARGV[2])
            local max_requests = tonumber(ARGV[3])
            local ttl = tonumber(ARGV[4])
            local member = ARGV[5]

            -- Remove requests outside the window
            redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start)

            -- Count requests in window
            local current = redis.call('ZCARD', key)

            if current >= max_requests then
                -- Also return the oldest score so the caller can compute Retry-After
                local oldest = redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')
                return {0, current, oldest[2]}
            end

            -- Record the current request
            redis.call('ZADD', key, now, member)
            redis.call('EXPIRE', key, ttl)

            return {1, current + 1}
        `;

        const [allowed, current, oldestScore] = await this.redis.eval(
            script,
            1,
            key,
            now,
            windowStart,
            this.maxRequests,
            this.windowSeconds * 2,
            member
        );

        if (allowed === 0) {
            // The oldest request leaving the window frees the next slot
            const resetTime = Number(oldestScore) + windowMs;

            return {
                allowed: false,
                current: current,
                limit: this.maxRequests,
                resetAt: new Date(resetTime),
                retryAfter: Math.ceil((resetTime - now) / 1000)
            };
        }

        return {
            allowed: true,
            limit: this.maxRequests,
            remaining: this.maxRequests - current
        };
    }
}

Redis Token Bucket Implementation

class RedisTokenBucketLimiter {
    constructor(redisClient, capacity, refillRate) {
        this.redis = redisClient;
        this.capacity = capacity;
        this.refillRate = refillRate;
    }

    async allowRequest(userId, tokensNeeded = 1) {
        const key = `ratelimit:bucket:${userId}`;

        const script = `
            local key = KEYS[1]
            local capacity = tonumber(ARGV[1])
            local refill_rate = tonumber(ARGV[2])
            local tokens_needed = tonumber(ARGV[3])
            local now = tonumber(ARGV[4])
            local ttl = tonumber(ARGV[5])

            -- Get current bucket state (a missing key means a full bucket)
            local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
            local tokens = tonumber(bucket[1]) or capacity
            local last_refill = tonumber(bucket[2]) or now

            -- Refill for elapsed time, capped at capacity; clamp negative
            -- elapsed time caused by clock differences between servers
            local time_passed = math.max(0, now - last_refill) / 1000
            tokens = math.min(capacity, tokens + time_passed * refill_rate)

            local allowed = 0
            if tokens >= tokens_needed then
                tokens = tokens - tokens_needed
                allowed = 1
            end

            -- HSET accepts multiple field/value pairs (HMSET is deprecated)
            redis.call('HSET', key, 'tokens', tokens, 'last_refill', now)
            redis.call('EXPIRE', key, ttl)

            -- Redis truncates Lua numbers to integers in replies
            return {allowed, math.floor(tokens)}
        `;

        // Keep idle buckets at least twice as long as a full refill takes
        const ttl = Math.max(1, Math.ceil((this.capacity / this.refillRate) * 2));

        const result = await this.redis.eval(
            script,
            1,
            key,
            this.capacity,
            this.refillRate,
            tokensNeeded,
            Date.now(),
            ttl
        );

        const [allowed, tokensRemaining] = result;

        if (allowed === 0) {
            const tokensNeededToWait = tokensNeeded - tokensRemaining;
            const waitTime = tokensNeededToWait / this.refillRate;

            return {
                allowed: false,
                tokensRemaining: tokensRemaining,
                retryAfter: Math.ceil(waitTime)
            };
        }

        return {
            allowed: true,
            tokensRemaining: tokensRemaining
        };
    }
}

Pro Tip: Use Lua scripts for multi-step Redis rate limiting logic so that each check-and-update runs atomically. Executing the read and the write as separate Redis commands creates race conditions where two requests can both pass the limit check before either one is recorded.
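The race the tip describes can be reproduced without Redis. In this hypothetical sketch, an in-memory Map stands in for Redis and two servers' steps are interleaved by hand: both read the count before either writes it back.

```javascript
// In-memory stand-in for Redis (hypothetical; no Redis client involved)
const store = new Map([['ratelimit:user1', 99]]);
const limit = 100;

// Non-atomic pattern: read, decide, and write are separate steps
function readCount(key) {
    return store.get(key) || 0;
}

function writeCount(key, value) {
    store.set(key, value);
}

// Servers A and B interleave: both read before either writes
const seenByA = readCount('ratelimit:user1'); // 99
const seenByB = readCount('ratelimit:user1'); // 99
const admittedA = seenByA < limit;
const admittedB = seenByB < limit;
if (admittedA) writeCount('ratelimit:user1', seenByA + 1);
if (admittedB) writeCount('ratelimit:user1', seenByB + 1);

console.log(admittedA && admittedB);       // true: both requests pass
console.log(store.get('ratelimit:user1')); // 100, although 101 requests were admitted

// INCR-style alternative: increment and read in one indivisible step,
// then decide based on the returned value
function incrementAndGet(key) {
    const next = (store.get(key) || 0) + 1;
    store.set(key, next);
    return next;
}

store.set('ratelimit:user1', 99);
console.log(incrementAndGet('ratelimit:user1') <= limit); // true  (request 100)
console.log(incrementAndGet('ratelimit:user1') <= limit); // false (request 101)
```

Redis executes each command, and each Lua script, without interleaving commands from other clients, which is what the atomic version models.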

Multi-Tier Rate Limiting

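Production APIs typically enforce multiple rate limit tiers simultaneously: per-user limits, per-IP limits, global system limits, and per-endpoint limits. Combining tiers requires care: when a later tier rejects a request, the tiers that already passed have recorded it, and unless those entries are rolled back, rejected requests eat into quotas and cause false rejections.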

Hierarchical Rate Limiting

class MultiTierRateLimiter {
    constructor(redisClient) {
        this.redis = redisClient;

        // Define rate limit tiers
        this.tiers = {
            global: new RedisSlidingWindowLimiter(redisClient, 10000, 60),
            perUser: new RedisSlidingWindowLimiter(redisClient, 100, 60),
            perIP: new RedisSlidingWindowLimiter(redisClient, 200, 60),
            perEndpoint: new Map() // endpoint -> limiter
        };

        // Expensive endpoints get stricter limits
        this.tiers.perEndpoint.set(
            '/api/search',
            new RedisSlidingWindowLimiter(redisClient, 20, 60)
        );
    }

    async checkAllLimits(request) {
        const { userId, ip, endpoint } = request;
        const client = userId ? `user:${userId}` : `ip:${ip}`;

        // Most specific tiers first: one client exceeding its own limit is the
        // common rejection, and checking global last avoids writing (then
        // rolling back) an entry on the shared, heavily contended global key.
        // Identifiers are namespaced so tiers never share a Redis key.
        const checks = [];
        const endpointLimiter = this.tiers.perEndpoint.get(endpoint);
        if (endpointLimiter) {
            checks.push({
                tier: 'per-endpoint',
                limiter: endpointLimiter,
                id: `endpoint:${endpoint}:${client}`
            });
        }
        if (userId) {
            checks.push({ tier: 'per-user', limiter: this.tiers.perUser, id: `user:${userId}` });
        }
        if (ip) {
            checks.push({ tier: 'per-ip', limiter: this.tiers.perIP, id: `ip:${ip}` });
        }
        checks.push({ tier: 'global', limiter: this.tiers.global, id: 'tier:global' });

        const results = [];
        const recorded = []; // identifiers whose tier already counted this request

        for (const check of checks) {
            const result = await check.limiter.allowRequest(check.id);

            if (!result.allowed) {
                // Undo the entries written by the tiers that passed
                await Promise.all(recorded.map(id => this.rollbackLimit(id)));
                return { ...result, allowed: false, tier: check.tier, endpoint };
            }

            recorded.push(check.id);
            results.push({ tier: check.tier, ...result });
        }

        return {
            allowed: true,
            limits: results
        };
    }

    async rollbackLimit(identifier) {
        // RedisSlidingWindowLimiter keeps a sorted set per identifier, so DECR
        // would fail with WRONGTYPE. ZPOPMAX (Redis >= 5.0) removes the newest
        // entry, which cancels one recorded request. Other limiter types need
        // their own undo operation.
        await this.redis.zpopmax(`ratelimit:${identifier}`);
    }
}
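An alternative to rolling back is to check every tier before recording the request in any of them. The sketch below does this in memory, where single-threaded JavaScript makes check-then-record safe. Against Redis, the same idea would have to run inside a single Lua script that receives every tier's key (in Redis Cluster, all keys used by one script must hash to the same slot). InMemoryMultiTierLimiter is a hypothetical class, and window expiry is omitted for brevity.

```javascript
// Hypothetical in-memory sketch: check every tier, then record in all of them.
// Window expiry is omitted; pair the counters with any window scheme.
class InMemoryMultiTierLimiter {
    constructor(tierLimits) {
        this.tierLimits = tierLimits; // tier name -> max requests per window
        this.counts = new Map();      // "tier:identifier" -> requests recorded
    }

    // identifiers: tier name -> identifier, e.g. { user: 'alice', ip: '10.0.0.1' }
    allow(identifiers) {
        const keys = Object.entries(identifiers).map(([tier, id]) => [tier, `${tier}:${id}`]);

        // Phase 1: check every tier without recording anything
        for (const [tier, key] of keys) {
            if ((this.counts.get(key) || 0) >= this.tierLimits[tier]) {
                return { allowed: false, tier };
            }
        }

        // Phase 2: all tiers passed, so record the request everywhere
        for (const [, key] of keys) {
            this.counts.set(key, (this.counts.get(key) || 0) + 1);
        }
        return { allowed: true };
    }
}

const tierLimiter = new InMemoryMultiTierLimiter({ global: 1000, user: 2, ip: 3 });
const requestFrom = user => ({ global: 'all', user, ip: '10.0.0.1' });

console.log(tierLimiter.allow(requestFrom('alice')).allowed); // true
console.log(tierLimiter.allow(requestFrom('alice')).allowed); // true
console.log(tierLimiter.allow(requestFrom('alice')));         // { allowed: false, tier: 'user' }
console.log(tierLimiter.allow(requestFrom('bob')).allowed);   // true (third request from this IP)
console.log(tierLimiter.allow(requestFrom('carol')));         // { allowed: false, tier: 'ip' }
console.log(tierLimiter.counts.get('global:all'));            // 3: rejected requests were never recorded
```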

Cost-Based Rate Limiting

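Different API operations consume different resources: a search query costs far more than a simple GET. Assign a cost weight to each operation and consume that many tokens per request. The bucket capacity must be at least the largest single cost (50 tokens for the bulk import below), or that operation can never be admitted.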

class CostBasedRateLimiter {
    constructor(redisClient, tokenCapacity, refillRate) {
        this.limiter = new RedisTokenBucketLimiter(
            redisClient,
            tokenCapacity,
            refillRate
        );

        // Define operation costs
        this.operationCosts = {
            'GET /api/users/:id': 1,
            'POST /api/users': 5,
            'GET /api/search': 10,
            'POST /api/bulk-import': 50,
            'POST /api/export': 20
        };
    }

    async allowRequest(userId, operation) {
        const cost = this.operationCosts[operation] || 1;

        const result = await this.limiter.allowRequest(userId, cost);

        return {
            ...result,
            cost: cost,
            operation: operation
        };
    }

    async getOperationCost(method, path) {
        // Match path patterns to defined costs
        for (const [pattern, cost] of Object.entries(this.operationCosts)) {
            if (this.matchesPattern(method, path, pattern)) {
                return cost;
            }
        }
        return 1; // Default cost
    }

    matchesPattern(method, path, pattern) {
        const [patternMethod, patternPath] = pattern.split(' ');
        if (method !== patternMethod) return false;

        // Simple pattern matching (production would use proper router)
        const pathRegex = patternPath.replace(/:[\w]+/g, '[^/]+');
        return new RegExp('^' + pathRegex + '$').test(path);
    }
}
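The matchesPattern method above builds a new regular expression on every call and does not escape literal characters in the path. A sketch that compiles each pattern once and escapes literal segments might look like this; OPERATION_COSTS, escapeRegex, and resolveCost are hypothetical names, and the cost table copies the one above.

```javascript
// Hypothetical cost table mirroring the one above
const OPERATION_COSTS = {
    'GET /api/users/:id': 1,
    'POST /api/users': 5,
    'GET /api/search': 10,
    'POST /api/bulk-import': 50,
    'POST /api/export': 20
};

// Escape regex metacharacters in literal path text (e.g. "." in "/v1.2")
function escapeRegex(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Compile each "METHOD /path/:param" pattern once, at startup
const COMPILED_COSTS = Object.entries(OPERATION_COSTS).map(([pattern, cost]) => {
    const [method, path] = pattern.split(' ');
    const source = path
        .split('/')
        .map(segment => (segment.startsWith(':') ? '[^/]+' : escapeRegex(segment)))
        .join('/');
    return { method, regex: new RegExp(`^${source}$`), cost };
});

function resolveCost(method, path, defaultCost = 1) {
    const match = COMPILED_COSTS.find(c => c.method === method && c.regex.test(path));
    return match ? match.cost : defaultCost;
}

console.log(resolveCost('GET', '/api/users/42'));     // 1
console.log(resolveCost('GET', '/api/search'));       // 10
console.log(resolveCost('DELETE', '/api/users/42'));  // 1 (default)

// A full 100-token bucket covers ten searches or two bulk imports
console.log(Math.floor(100 / resolveCost('GET', '/api/search')));       // 10
console.log(Math.floor(100 / resolveCost('POST', '/api/bulk-import'))); // 2
```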

Rate Limit Headers and Client Communication

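APIs should communicate rate limit status through HTTP response headers so clients can adjust their behavior before hitting limits. The 429 Too Many Requests status code and the Retry-After header are standardized; the X-RateLimit-* headers are a widely used convention (an IETF draft proposes standardized RateLimit header fields).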

Standard Rate Limit Headers

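class RateLimitMiddleware {
    constructor(rateLimiter) {
        this.rateLimiter = rateLimiter;
        // Bind so `this` still refers to the instance when Express calls the method
        this.middleware = this.middleware.bind(this);
    }

    async middleware(req, res, next) {
        try {
            const userId = req.user?.id || req.ip;
            const result = await this.rateLimiter.allowRequest(userId);

            // X-RateLimit-* is a de facto convention (GitHub's API uses it, for
            // example), not a ratified standard. Node's setHeader throws on
            // undefined values, so only set the fields the limiter returned.
            if (result.limit !== undefined) {
                res.setHeader('X-RateLimit-Limit', result.limit);
            }
            res.setHeader('X-RateLimit-Remaining', Math.max(0, result.remaining ?? 0));
            if (result.resetAt) {
                // Unix epoch seconds, the format GitHub's API uses for this header
                res.setHeader('X-RateLimit-Reset', Math.ceil(result.resetAt.getTime() / 1000));
            }

            if (!result.allowed) {
                if (result.retryAfter !== undefined) {
                    res.setHeader('Retry-After', result.retryAfter);
                }

                return res.status(429).json({
                    error: 'Too Many Requests',
                    message: 'Rate limit exceeded',
                    limit: result.limit,
                    resetAt: result.resetAt,
                    retryAfter: result.retryAfter
                });
            }

            next();
        } catch (err) {
            // Express 4 does not catch rejected promises from async middleware
            next(err);
        }
    }
}

// Express usage
app.use(new RateLimitMiddleware(rateLimiter).middleware);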
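The header logic can also live in a pure function that is easy to unit test. buildRateLimitHeaders is a hypothetical helper; it assumes the limiter result uses the field names shown above (allowed, limit, remaining, resetAt), and it derives Retry-After from resetAt in delta-seconds, one of the two forms HTTP allows (the other is an HTTP-date).

```javascript
// Hypothetical helper: turns a limiter result into response header values.
// X-RateLimit-Reset uses Unix epoch seconds; Retry-After uses delta-seconds.
function buildRateLimitHeaders(result, nowMs = Date.now()) {
    const headers = {};
    if (result.limit !== undefined) {
        headers['X-RateLimit-Limit'] = String(result.limit);
    }
    headers['X-RateLimit-Remaining'] = String(Math.max(0, result.remaining ?? 0));

    if (result.resetAt instanceof Date) {
        headers['X-RateLimit-Reset'] = String(Math.ceil(result.resetAt.getTime() / 1000));
        if (!result.allowed) {
            // Never advertise a negative wait if the reset time has already passed
            const seconds = Math.ceil((result.resetAt.getTime() - nowMs) / 1000);
            headers['Retry-After'] = String(Math.max(0, seconds));
        }
    }
    return headers;
}

const headers = buildRateLimitHeaders(
    { allowed: false, limit: 100, remaining: 0, resetAt: new Date(1700000030000) },
    1700000000000
);
console.log(headers['Retry-After']);       // '30'
console.log(headers['X-RateLimit-Reset']); // '1700000030'
```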

Dynamic Rate Limits Based on User Tier

class TieredRateLimiter {
    constructor(redisClient) {
        this.redis = redisClient;

        this.tiers = {
            free: { requests: 100, window: 3600 },
            basic: { requests: 1000, window: 3600 },
            pro: { requests: 10000, window: 3600 },
            enterprise: { requests: 100000, window: 3600 }
        };

        // Create one limiter per tier up front instead of one per request
        this.limiters = {};
        for (const [tier, limits] of Object.entries(this.tiers)) {
            this.limiters[tier] = new RedisSlidingWindowLimiter(
                redisClient,
                limits.requests,
                limits.window
            );
        }
    }

    resolveTier(user) {
        // Missing or unrecognized tiers fall back to 'free' instead of crashing
        const tier = user?.subscriptionTier;
        return tier && Object.hasOwn(this.tiers, tier) ? tier : 'free';
    }

    async allowRequest(user) {
        const tier = this.resolveTier(user);
        const result = await this.limiters[tier].allowRequest(user.id);

        return {
            ...result,
            tier: tier,
            tierLimits: this.tiers[tier]
        };
    }

    async getUserLimits(userId) {
        // `db` is a placeholder for your data-access layer
        const user = await db.users.findOne({ id: userId });
        const tier = this.resolveTier(user);

        return {
            tier: tier,
            ...this.tiers[tier]
        };
    }
}

Rate Limiting Edge Cases

Handling Clock Skew in Distributed Systems

When API servers have slightly different system times, rate limiting based on timestamps can produce inconsistent results. Use centralized time from Redis or rely on monotonic counters instead.

const crypto = require('crypto');

class ClockSkewResistantLimiter {
    constructor(redisClient, maxRequests, windowSeconds) {
        this.redis = redisClient;
        this.maxRequests = maxRequests;
        this.windowSeconds = windowSeconds;
    }

    async allowRequest(userId) {
        // Use the Redis TIME command so every server shares one clock.
        // TIME replies with [seconds, microseconds] as strings.
        const [seconds, microseconds] = await this.redis.time();
        const now = Number(seconds) * 1000 + Math.floor(Number(microseconds) / 1000);

        const windowStart = now - (this.windowSeconds * 1000);
        const key = `ratelimit:${userId}`;
        // Unique member: using `now` alone would let two requests in the same
        // millisecond collapse into a single sorted-set entry
        const member = `${now}-${crypto.randomUUID()}`;

        const script = `
            local key = KEYS[1]
            local now = tonumber(ARGV[1])
            local window_start = tonumber(ARGV[2])
            local max_requests = tonumber(ARGV[3])

            redis.call('ZREMRANGEBYSCORE', key, '-inf', window_start)
            local current = redis.call('ZCARD', key)

            if current >= max_requests then
                return {0, current}
            end

            redis.call('ZADD', key, now, ARGV[5])
            redis.call('EXPIRE', key, ARGV[4])
            return {1, current + 1}
        `;

        const [allowed, current] = await this.redis.eval(
            script,
            1,
            key,
            now,
            windowStart,
            this.maxRequests,
            this.windowSeconds * 2,
            member
        );

        return {
            allowed: allowed === 1,
            current: current,
            limit: this.maxRequests
        };
    }
}

Preventing Rate Limit Bypass with Multiple Identifiers

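Users might try to bypass rate limits by sending requests from multiple IPs or creating multiple accounts. Implement composite limits that check every identifier attached to a request, authenticated or not: a user rotating IPs still hits the per-user limit, and many accounts behind one IP still hit the per-IP limit.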

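class BypassPreventionLimiter {
    constructor(limiter) {
        // Any limiter exposing allowRequest(identifier), such as RedisSlidingWindowLimiter
        this.limiter = limiter;
    }

    async checkRequest(request) {
        const identifiers = [];

        // Collect all identifiers, namespaced so they never share a Redis key
        if (request.userId) identifiers.push(`user:${request.userId}`);
        if (request.ip) identifiers.push(`ip:${request.ip}`);
        // Device IDs are client-supplied and easy to spoof; treat them as an extra signal
        if (request.deviceId) identifiers.push(`device:${request.deviceId}`);

        // Check limits for all identifiers. Identifiers checked before a
        // rejection keep their recorded entry for this request.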
        for (const identifier of identifiers) {
            const result = await this.limiter.allowRequest(identifier);

            if (!result.allowed) {
                return {
                    allowed: false,
                    blockedBy: identifier,
                    ...result
                };
            }
        }

        // All checks passed
        return { allowed: true };
    }
}

Whitelisting and Blacklisting

class WhitelistBlacklistLimiter {
    constructor(rateLimiter) {
        this.rateLimiter = rateLimiter;
        this.whitelist = new Set(); // Users exempt from limits
        this.blacklist = new Set(); // Users completely blocked
    }

    async allowRequest(userId) {
        // Check blacklist first
        if (this.blacklist.has(userId)) {
            return {
                allowed: false,
                reason: 'User is blacklisted',
                permanent: true
            };
        }

        // Whitelist bypasses rate limits
        if (this.whitelist.has(userId)) {
            return {
                allowed: true,
                whitelisted: true
            };
        }

        // Apply normal rate limiting
        return await this.rateLimiter.allowRequest(userId);
    }

    addToWhitelist(userId) {
        this.whitelist.add(userId);
    }

    async addToBlacklist(userId, reason) {
        this.blacklist.add(userId);
        // Persist so the block survives restarts (`db` is a placeholder for your data layer)
        await db.blacklist.create({ userId, reason, createdAt: new Date() });
    }
}

Warning: Whitelists can become a security liability if not properly audited. Ensure whitelisted users still have reasonable upper bounds to prevent abuse if accounts are compromised.
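One way to follow this advice is to give whitelisted users a much higher ceiling instead of an unconditional bypass. The sketch below uses a per-user fixed window with an injected clock; BoundedWhitelistLimiter and the ceiling values are hypothetical.

```javascript
// Hypothetical sketch: whitelisted users get a higher ceiling, not a bypass
class BoundedWhitelistLimiter {
    constructor(normalLimit, whitelistCeiling, windowMs) {
        this.normalLimit = normalLimit;
        this.whitelistCeiling = whitelistCeiling;
        this.windowMs = windowMs;
        this.whitelist = new Set();
        this.counters = new Map(); // userId -> { windowIndex, count }
    }

    allow(userId, nowMs) {
        const limit = this.whitelist.has(userId) ? this.whitelistCeiling : this.normalLimit;
        const windowIndex = Math.floor(nowMs / this.windowMs);

        // Fixed-window counter per user; a new window starts a fresh count
        let counter = this.counters.get(userId);
        if (!counter || counter.windowIndex !== windowIndex) {
            counter = { windowIndex, count: 0 };
            this.counters.set(userId, counter);
        }

        if (counter.count >= limit) return false;
        counter.count++;
        return true;
    }
}

const wl = new BoundedWhitelistLimiter(100, 10000, 60000);
wl.whitelist.add('partner-service');

let partnerAllowed = 0;
for (let i = 0; i < 10001; i++) if (wl.allow('partner-service', 0)) partnerAllowed++;
console.log(partnerAllowed); // 10000: generous, but still capped
```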

Testing Rate Limiters

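// Assumes the ioredis client, whose API the examples above use
const Redis = require('ioredis');

describe('Rate Limiter', () => {
    let limiter;
    let redis;

    beforeEach(async () => {
        // Point this at a dedicated test instance: FLUSHALL wipes every database
        redis = new Redis();
        await redis.flushall();
        limiter = new RedisSlidingWindowLimiter(redis, 10, 60);
    });

    afterEach(async () => {
        // Close the connection so the test runner can exit
        await redis.quit();
    });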

    test('allows requests under limit', async () => {
        for (let i = 0; i < 10; i++) {
            const result = await limiter.allowRequest('user1');
            expect(result.allowed).toBe(true);
        }
    });

    test('blocks requests over limit', async () => {
        // Use up allowance
        for (let i = 0; i < 10; i++) {
            await limiter.allowRequest('user1');
        }

        // Next request should be blocked
        const result = await limiter.allowRequest('user1');
        expect(result.allowed).toBe(false);
        expect(result.retryAfter).toBeGreaterThan(0);
    });

    test('resets after window expires', async () => {
        // A 1-second window keeps the test well under Jest's default 5 s timeout
        const shortLimiter = new RedisSlidingWindowLimiter(redis, 10, 1);

        // Use up allowance
        for (let i = 0; i < 10; i++) {
            await shortLimiter.allowRequest('user1');
        }

        // Wait for window to expire
        await new Promise(resolve => setTimeout(resolve, 1100));

        // Should allow requests again
        const result = await shortLimiter.allowRequest('user1');
        expect(result.allowed).toBe(true);
    });

    test('enforces limits independently per user', async () => {
        // User 1 uses their allowance
        for (let i = 0; i < 10; i++) {
            await limiter.allowRequest('user1');
        }

        // User 2 should still have full allowance
        const result = await limiter.allowRequest('user2');
        expect(result.allowed).toBe(true);
    });

    test('handles concurrent requests correctly', async () => {
        const requests = [];
        for (let i = 0; i < 20; i++) {
            requests.push(limiter.allowRequest('user1'));
        }

        const results = await Promise.all(requests);
        const allowed = results.filter(r => r.allowed).length;

        // Exactly 10 should be allowed due to atomic operations
        expect(allowed).toBe(10);
    });
});

Frequently Asked Questions

Should rate limits apply to authenticated and unauthenticated requests differently?

Yes. Authenticated users typically get higher limits than unauthenticated clients. Rate limit unauthenticated requests by IP address with conservative limits to prevent abuse, and give authenticated users generous per-account limits. This encourages users to authenticate while protecting against anonymous abuse. Some APIs also offer higher limits on paid tiers to monetize API access.
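A minimal sketch of how a server might pick both the rate-limit key and the ceiling from the request's authentication state. The request shape (`user`, `ip`, `plan`) and the tier numbers are assumptions for illustration, not recommended values.

```javascript
// Sketch: choose the limiter key and ceiling from the auth state.
// Request shape and tier values are hypothetical.
const LIMITS_PER_HOUR = {
    anonymous: 60,   // conservative, keyed by IP address
    free: 1000,      // per authenticated account
    paid: 10000,     // higher tier for paying customers
};

function resolveRateLimit(request) {
    if (!request.user) {
        // No account: fall back to the client IP as the identity
        return { key: `ip:${request.ip}`, limit: LIMITS_PER_HOUR.anonymous };
    }
    const tier = request.user.plan === 'paid' ? 'paid' : 'free';
    return { key: `user:${request.user.id}`, limit: LIMITS_PER_HOUR[tier] };
}

console.log(resolveRateLimit({ ip: '203.0.113.7' }));
// → { key: 'ip:203.0.113.7', limit: 60 }
```

The returned `key` and `limit` can then be passed to whichever limiter algorithm the API uses.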

How do you rate limit WebSocket connections?

Rate limit WebSocket connection establishment the same way as HTTP requests, but apply separate rate limiting to messages sent over established connections. Track the message rate per connection and disconnect clients that exceed limits. For real-time applications, consider backpressure mechanisms where the server tells clients to slow down rather than disconnecting them immediately.
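One way to sketch per-connection message limiting with a backpressure step is shown below. It keeps a sliding log of message timestamps per connection, sends a hypothetical `slow_down` notice when the client nears the limit, and closes with WebSocket status code 1008 (policy violation) once the limit is exceeded. The injected `send` and `close` callbacks stand in for whatever WebSocket library is in use; that wiring is an assumption.

```javascript
// Sketch: per-connection message limiter with a "slow down" warning.
// `send`/`close` are injected stand-ins for the socket's own methods.
class ConnectionMessageLimiter {
    constructor({ maxMessages, windowMs, warnAt, send, close }) {
        this.maxMessages = maxMessages;
        this.windowMs = windowMs;
        this.warnAt = warnAt;   // warn before disconnecting (backpressure)
        this.send = send;
        this.close = close;
        this.timestamps = [];   // sliding log of message arrival times
        this.warned = false;
    }

    onMessage(now = Date.now()) {
        // Drop arrivals that have left the sliding window
        while (this.timestamps.length && this.timestamps[0] <= now - this.windowMs) {
            this.timestamps.shift();
        }
        this.timestamps.push(now);
        const count = this.timestamps.length;

        if (count > this.maxMessages) {
            this.close(1008, 'Message rate limit exceeded'); // 1008 = policy violation
            return false;
        }
        if (count >= this.warnAt && !this.warned) {
            // Ask the client to back off once per burst
            this.send(JSON.stringify({ type: 'slow_down', retryAfterMs: this.windowMs }));
            this.warned = true;
        } else if (count < this.warnAt) {
            this.warned = false;
        }
        return true;
    }
}
```

A sliding log costs memory proportional to `maxMessages` per connection, which is usually acceptable for per-connection message limits in the tens or hundreds.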

What happens to rate limit counters when Redis fails?

When Redis is unavailable, you have three options: fail open (allow all requests), fail closed (block all requests), or fall back to local in-memory rate limiting. Failing open risks abuse during outages but maintains availability. Failing closed protects backend services but creates a poor user experience. A local fallback usually provides the best balance, but counts won't be accurate across servers because each server only sees its own traffic. Choose based on whether you prioritize security or availability.
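The three failure policies can be expressed as a thin wrapper around the distributed limiter. This sketch assumes that both the Redis-backed limiter and the local fallback expose an async `allowRequest(key)` returning `{ allowed }`; the class name and the `'open' | 'closed' | 'fallback'` policy strings are hypothetical.

```javascript
// Sketch: apply an explicit policy when the distributed limiter fails.
// `distributed` and `local` are any objects with async allowRequest(key).
class ResilientRateLimiter {
    constructor({ distributed, local, onRedisError = 'fallback' }) {
        this.distributed = distributed;
        this.local = local;
        this.policy = onRedisError; // 'open' | 'closed' | 'fallback'
    }

    async allowRequest(key) {
        try {
            return await this.distributed.allowRequest(key);
        } catch (err) {
            // Treats any error as an outage; log `err` in production
            switch (this.policy) {
                case 'open':
                    return { allowed: true, degraded: true };   // availability first
                case 'closed':
                    return { allowed: false, degraded: true };  // protect the backend
                default: {
                    // Local counts are per server, so a client spread across
                    // N servers may get up to N times the intended limit
                    const result = await this.local.allowRequest(key);
                    return { ...result, degraded: true };
                }
            }
        }
    }
}
```

Marking degraded decisions makes it easy to log or alert on how often the fallback path is taken. Narrowing the `catch` to connection errors would avoid masking genuine bugs in the limiter.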

How do you implement rate limiting for batch operations?

Batch operations should consume tokens proportional to their size. A request to delete 100 users should consume 100 tokens from a token bucket. For operations where cost isn't directly proportional to size, define cost functions based on actual resource consumption. Monitor execution time and resource usage of different batch sizes to calibrate costs accurately.
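Cost-based consumption is a small change to a token bucket: `tryConsume` takes a cost instead of always removing one token. In this sketch the class name and the `costFor` weights are hypothetical; the 5x multiplier for exports is a placeholder to be replaced by measured costs.

```javascript
// Sketch: token bucket where each request spends a variable number of tokens.
class CostAwareTokenBucket {
    constructor(capacity, refillPerSecond, now = Date.now()) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;   // start full
        this.lastRefill = now;
    }

    tryConsume(cost, now = Date.now()) {
        // Refill based on elapsed time, capped at capacity
        const elapsedSec = Math.max(0, now - this.lastRefill) / 1000;
        this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
        this.lastRefill = now;

        if (cost > this.capacity) return false; // could never succeed; reject outright
        if (this.tokens < cost) return false;
        this.tokens -= cost;
        return true;
    }
}

// Hypothetical cost function: proportional for deletes, weighted for exports
function costFor(operation, batchSize) {
    switch (operation) {
        case 'bulkDelete': return batchSize;
        case 'export': return 5 * batchSize; // placeholder weight
        default: return 1;
    }
}
```

Rejecting requests whose cost exceeds the bucket capacity tells clients to split oversized batches instead of retrying a request that can never be admitted.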

Should rate limits be per-endpoint or global per user?

Implement both. Global per-user limits prevent total system abuse, while per-endpoint limits protect specific expensive operations. A user might have 1000 requests/hour globally but only 50 search requests/hour since searches are expensive. This granular approach prevents users from monopolizing expensive operations while allowing generous limits for cheap operations.
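Combining the two limit types means a request must pass every applicable layer before any counter is incremented. The sketch below uses the numbers from the example above with simple in-memory fixed windows; the class and constant names are hypothetical, and old windows are never evicted, which a real implementation (for example, Redis keys with a TTL) would handle.

```javascript
// Sketch: layered per-user global limit plus per-endpoint limits.
const HOUR_MS = 60 * 60 * 1000;
const GLOBAL_LIMIT = 1000;                 // requests/hour across all endpoints
const ENDPOINT_LIMITS = { '/search': 50 }; // endpoints not listed only hit the global limit

class LayeredLimiter {
    constructor() {
        this.counts = new Map(); // "scope|windowStart" -> count (never evicted here)
    }

    count(scope, windowStart) {
        return this.counts.get(`${scope}|${windowStart}`) || 0;
    }

    allowRequest(userId, endpoint, now = Date.now()) {
        const windowStart = now - (now % HOUR_MS);
        const checks = [{ scope: `user:${userId}`, limit: GLOBAL_LIMIT }];
        if (ENDPOINT_LIMITS[endpoint] !== undefined) {
            checks.push({ scope: `user:${userId}:${endpoint}`, limit: ENDPOINT_LIMITS[endpoint] });
        }

        // Check every layer first so a rejected request consumes no quota
        if (checks.some(c => this.count(c.scope, windowStart) >= c.limit)) return false;
        for (const c of checks) {
            this.counts.set(`${c.scope}|${windowStart}`, this.count(c.scope, windowStart) + 1);
        }
        return true;
    }
}
```

With Redis, the check across both layers and the increments would need to run in a single Lua script so concurrent requests cannot slip between the check and the update.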

How do you handle rate limiting for internal services?

Internal service-to-service calls typically need rate limiting to prevent cascading failures when one service misbehaves. Use much higher limits than public API limits but still enforce boundaries. Consider implementing circuit breakers alongside rate limits for internal APIs. Whitelist internal service accounts but still set generous upper bounds to catch bugs that create infinite loops or retry storms.
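Circuit breakers complement internal rate limits: the limiter caps how fast a caller may send, while the breaker stops calls to a dependency that keeps failing. The following is a deliberately minimal sketch with hypothetical names and thresholds; after the cooldown it lets all traffic through (a simplified half-open state) rather than a single probe request.

```javascript
// Sketch: minimal circuit breaker with an injectable clock.
class CircuitBreaker {
    constructor({ failureThreshold, cooldownMs, now = () => Date.now() }) {
        this.failureThreshold = failureThreshold;
        this.cooldownMs = cooldownMs;
        this.now = now;
        this.failures = 0;
        this.openedAt = null; // null means the circuit is closed
    }

    canRequest() {
        if (this.openedAt === null) return true;
        // After the cooldown, allow calls again (simplified half-open);
        // the next failure re-opens the circuit immediately
        return this.now() - this.openedAt >= this.cooldownMs;
    }

    recordSuccess() {
        this.failures = 0;
        this.openedAt = null;
    }

    recordFailure() {
        this.failures++;
        if (this.failures >= this.failureThreshold) this.openedAt = this.now();
    }
}
```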

What's the best way to communicate rate limits to developers?

Document rate limits clearly in API documentation with specific numbers. Return rate limit headers with every response so developers can track their usage programmatically. Provide a dashboard where developers can view their current usage and limits. Send email alerts when users approach 80% of their limits. Make upgrade paths clear for users who need higher limits.
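A sketch of building the per-response headers, plus the flag that would drive an 80% usage alert. The `X-RateLimit-*` names follow a common convention rather than a formal standard (check which names your API already documents), while `Retry-After` is a standard HTTP header; the function name and input shape are hypothetical.

```javascript
// Sketch: compute rate-limit headers and whether to send a usage alert.
function rateLimitHeaders({ limit, used, resetEpochSec, nowEpochSec }) {
    const remaining = Math.max(0, limit - used);
    const headers = {
        'X-RateLimit-Limit': String(limit),
        'X-RateLimit-Remaining': String(remaining),
        'X-RateLimit-Reset': String(resetEpochSec), // Unix time when the window resets
    };
    if (remaining === 0) {
        // Seconds until the client may retry
        headers['Retry-After'] = String(Math.max(0, resetEpochSec - nowEpochSec));
    }
    // Signal for the 80% email alert; actually sending it is out of scope here
    const shouldAlert = used >= 0.8 * limit;
    return { headers, shouldAlert };
}
```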

How do you prevent clock synchronization issues in distributed rate limiting?

Use the Redis TIME command to get consistent timestamps across all API servers rather than relying on each server's system clock. Alternatively, use algorithms that tolerate skew, such as fixed window counters, where a few seconds of drift only changes which window a request near a boundary is counted in. For sliding window algorithms, small clock skew (a few seconds) usually doesn't matter when the window is measured in minutes.
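Reading the shared clock from Redis is a single call in most clients. The sketch assumes an ioredis-style client where `redis.time()` resolves to the `TIME` reply, a pair of strings holding Unix seconds and microseconds; the helper name `redisNowMs` is hypothetical.

```javascript
// Sketch: derive a millisecond timestamp from the Redis server clock.
// `redis` is any client whose time() resolves to [seconds, microseconds].
async function redisNowMs(redis) {
    const [seconds, micros] = await redis.time();
    return Number(seconds) * 1000 + Math.floor(Number(micros) / 1000);
}
```

Fetching TIME separately adds a round trip per request; when the limiter already runs a Lua script, calling `redis.call('TIME')` inside the script (supported on recent Redis versions) avoids that extra hop.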

Conclusion

Rate limiting algorithms each serve specific use cases: fixed windows for simplicity, sliding windows for accuracy, token buckets for burst tolerance, and leaky buckets for a constant downstream rate. Implementations that run across multiple servers need shared state, typically Redis with Lua scripts, so that each check-and-update happens atomically and concurrent requests cannot race past the limit.

Effective rate limiting combines multiple tiers including per-user, per-IP, per-endpoint, and global limits. Cost-based limiting assigns token costs proportional to operation expense, preventing expensive operations from consuming the same quota as cheap ones. Communicate limits clearly through HTTP headers and comprehensive documentation.

The critical implementation details are using atomic Redis operations to prevent race conditions, handling edge cases like clock skew and bypass attempts, and testing concurrent request scenarios to verify that limits are enforced correctly under load. Choose algorithms that match your specific requirements for accuracy, memory efficiency, and burst handling.
