
Rate Limiting at the Gateway

Fixed window, sliding window, token bucket — protect your backends from abuse and enforce fair usage without touching service code.

rate limiting · token bucket · sliding window · Redis · throttling

Real-World Analogy

A turnstile at a subway station — it allows one person through at a time, enforces a pace, and doesn’t care who you are or where you’re going. The platform (your backend) never sees the crowd; it just sees a steady stream.

Why at the Gateway

Implementing rate limiting separately in every service duplicates work and tends to produce inconsistent limits. At the gateway you get:

One config to change limits globally
Limits enforced before requests consume any service resources
Aggregated view: limit per user across all services, not per-service buckets

Fixed Window

Count requests per fixed time window (e.g., the current calendar minute) and reject once the count exceeds the limit. Simple, but it allows bursts at window edges.

import type { RedisClientType } from 'redis'; // assumes the node-redis v4 client

class FixedWindowLimiter {
	constructor(
		private redis: RedisClientType,
		private limit: number,
		private windowSeconds: number
	) {}

	async isAllowed(key: string): Promise<{ allowed: boolean; remaining: number }> {
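		// One counter per window: the key embeds the current window index
		const windowIndex = Math.floor(Date.now() / (this.windowSeconds * 1000));
		const windowKey = `ratelimit:fw:${key}:${windowIndex}`;

		// INCR and EXPIRE in one MULTI/EXEC, so a crash between them cannot leave a counter
		// without a TTL. Re-setting the TTL is harmless because the key is unique to this window.
		const [count] = (await this.redis
			.multi()
			.incr(windowKey)
			.expire(windowKey, this.windowSeconds)
			.exec()) as [number, unknown];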

		const allowed = count <= this.limit;
		return { allowed, remaining: Math.max(0, this.limit - count) };
	}
}

The edge burst problem: with a 60-requests-per-minute limit, a client can send 60 requests at 11:59:59 and another 60 at 12:00:00. Each batch lands in a different window, so all 120 requests are allowed within two seconds. Sliding window fixes this.
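To make the edge burst concrete, here is a small in-memory simulation. It is a sketch that reuses the window-index rule from `FixedWindowLimiter` (`Math.floor(now / windowMs)`) but keeps counters in a `Map` instead of Redis and takes explicit millisecond timestamps, so the outcome is deterministic. `makeFixedWindow` and `edgeBurstAllowed` are illustrative names, not part of any library.

```typescript
// Hypothetical in-memory fixed window counter, same bucketing rule as FixedWindowLimiter
function makeFixedWindow(limit: number, windowMs: number) {
	const counts = new Map<number, number>();
	return (nowMs: number): boolean => {
		const windowIndex = Math.floor(nowMs / windowMs);
		const count = (counts.get(windowIndex) ?? 0) + 1;
		counts.set(windowIndex, count);
		return count <= limit;
	};
}

// 60 requests in the last second before a minute boundary, 60 in the first second after it
function edgeBurstAllowed(): number {
	const isAllowed = makeFixedWindow(60, 60_000);
	const boundary = 60_000 * 1000; // an exact minute boundary, in ms since the epoch
	let allowed = 0;
	for (let i = 0; i < 60; i++) {
		if (isAllowed(boundary - 1000 + i)) allowed++; // old window
	}
	for (let i = 0; i < 60; i++) {
		if (isAllowed(boundary + i)) allowed++; // new window, fresh counter
	}
	return allowed;
}

console.log(edgeBurstAllowed()); // 120
```

All 120 requests arrive within about one second, yet none is rejected, because the counter resets at the boundary.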

Sliding Window

Count requests in the last N seconds, not in the current calendar window:

import type { RedisClientType } from 'redis'; // assumes the node-redis v4 client

class SlidingWindowLimiter {
	constructor(
		private redis: RedisClientType,
		private limit: number,
		private windowMs: number
	) {}

	async isAllowed(key: string): Promise<{ allowed: boolean; remaining: number }> {
		const now = Date.now();
		const windowStart = now - this.windowMs;
		const redisKey = `ratelimit:sw:${key}`;

		const [, , count] = (await this.redis
			.multi()
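			.zRemRangeByScore(redisKey, '-inf', windowStart) // remove entries older than the window
			// Random suffix keeps members unique within one millisecond. Rejected requests are
			// recorded too, so a client that keeps retrying while over the limit stays limited.
			.zAdd(redisKey, { score: now, value: `${now}-${Math.random()}` })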
			.zCard(redisKey)
			.expire(redisKey, Math.ceil(this.windowMs / 1000))
			.exec()) as [unknown, unknown, number, unknown];

		const allowed = count <= this.limit;
		return { allowed, remaining: Math.max(0, this.limit - count) };
	}
}

This variant (often called a sliding window log) is more accurate, but it stores one sorted-set entry per request, including rejected ones. For very high-traffic keys the set grows large; cap it with ZREMRANGEBYRANK so only the most recent N entries are kept.
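Replaying the same edge burst against an in-memory version of the sliding window log shows the difference. This sketch assumes a plain array stands in for the sorted set and, like the Redis code, records every request before counting it, rejected ones included. `makeSlidingLog` is a hypothetical name.

```typescript
// Hypothetical in-memory sliding window log: one timestamp per request, like the sorted set
function makeSlidingLog(limit: number, windowMs: number) {
	let timestamps: number[] = [];
	return (nowMs: number): boolean => {
		// Like ZREMRANGEBYSCORE key -inf (now - windowMs): drop entries at or before the window start
		timestamps = timestamps.filter((t) => t > nowMs - windowMs);
		// Like the Redis version: record the request (even if it ends up rejected), then count
		timestamps.push(nowMs);
		return timestamps.length <= limit;
	};
}

// Same burst as before: 60 requests just before a minute boundary, 60 just after
const isAllowed = makeSlidingLog(60, 60_000);
const boundary = 60_000 * 1000;
let allowed = 0;
for (let i = 0; i < 60; i++) if (isAllowed(boundary - 1000 + i)) allowed++;
for (let i = 0; i < 60; i++) if (isAllowed(boundary + i)) allowed++;
console.log(allowed); // 60
```

Only the first batch gets through: the second batch still sees all 60 earlier timestamps inside its 60-second lookback.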

Token Bucket

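Token bucket separates burst size from sustained rate. A bucket holds at most `capacity` tokens and refills at a constant rate (the refill rate). Each request consumes one token, and a request that finds the bucket empty is rejected. Bursts are allowed up to the bucket capacity, while the long-run rate never exceeds the refill rate.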

import type { RedisClientType } from 'redis'; // assumes the node-redis v4 client

class TokenBucketLimiter {
	constructor(
		private redis: RedisClientType,
		private capacity: number, // max tokens (burst size)
		private refillRate: number // tokens per second
	) {}

	async isAllowed(key: string): Promise<{ allowed: boolean; remaining: number }> {
		const now = Date.now() / 1000; // seconds
		const bucketKey = `ratelimit:tb:${key}`;

		// Lua script for atomicity
		const script = `
      local key = KEYS[1]
      local capacity = tonumber(ARGV[1])
      local refill_rate = tonumber(ARGV[2])
      local now = tonumber(ARGV[3])
      local requested = tonumber(ARGV[4])

      local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
      local tokens = tonumber(bucket[1]) or capacity
      local last_refill = tonumber(bucket[2]) or now

      -- Refill tokens based on elapsed time
      local elapsed = now - last_refill
      tokens = math.min(capacity, tokens + elapsed * refill_rate)

      local allowed = 0
      if tokens >= requested then
        tokens = tokens - requested
        allowed = 1
      end

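      -- HSET with several field/value pairs replaces the deprecated HMSET
      redis.call('HSET', key, 'tokens', tokens, 'last_refill', now)
      -- A missing key reads as a full bucket, so it can expire once a full refill has elapsed
      redis.call('EXPIRE', key, math.ceil(capacity / refill_rate) + 1)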

      return { allowed, math.floor(tokens) }
    `;

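		// node-redis v4 takes keys and (string) arguments as an options object
		const [allowed, remaining] = (await this.redis.eval(script, {
			keys: [bucketKey],
			arguments: [String(this.capacity), String(this.refillRate), String(now), '1']
		})) as [number, number];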

		return { allowed: allowed === 1, remaining };
	}
}
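The refill arithmetic is easier to follow without Redis. The class below is an in-memory sketch that applies the same rule as the Lua script (`tokens = min(capacity, tokens + elapsed * refillRate)`), with time passed in explicitly. `InMemoryTokenBucket` is a hypothetical name, and a single-process bucket like this only suits tests or a single gateway instance.

```typescript
// Hypothetical in-memory token bucket using the Lua script's refill rule
class InMemoryTokenBucket {
	private tokens: number;
	private lastRefill: number;

	constructor(
		private capacity: number, // max tokens (burst size)
		private refillRate: number, // tokens per second
		startSeconds: number
	) {
		this.tokens = capacity; // a new bucket starts full, like a missing Redis key
		this.lastRefill = startSeconds;
	}

	tryConsume(nowSeconds: number, requested = 1): boolean {
		// Refill based on elapsed time, never above capacity
		const elapsed = nowSeconds - this.lastRefill;
		this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
		this.lastRefill = nowSeconds;
		if (this.tokens >= requested) {
			this.tokens -= requested;
			return true;
		}
		return false;
	}
}

// Capacity 10, refill 2 tokens/s
const bucket = new InMemoryTokenBucket(10, 2, 0);
let burst = 0;
for (let i = 0; i < 15; i++) if (bucket.tryConsume(0)) burst++;
console.log(burst); // 10: the full burst is allowed, then requests are rejected
let later = 0;
for (let i = 0; i < 5; i++) if (bucket.tryConsume(1.5)) later++;
console.log(later); // 3: 1.5 s at 2 tokens/s refilled 3 tokens
```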

Response Headers

Always tell clients their rate limit status:

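import type { Response } from 'express';

function applyRateLimitHeaders(
	res: Response,
	limit: number,
	remaining: number,
	resetSeconds: number
): void {
	res.set({
		'X-RateLimit-Limit': String(limit),
		'X-RateLimit-Remaining': String(remaining),
		// Absolute Unix time (in seconds) when the quota resets
		'X-RateLimit-Reset': String(Math.floor(Date.now() / 1000) + resetSeconds)
	});
}

// When limited: send Retry-After only on the 429 itself. (Passing `undefined` as a value
// to res.set() would make Express send the literal string "undefined".)
function sendTooManyRequests(res: Response, resetSeconds: number): void {
	res.set('Retry-After', String(resetSeconds));
	res.status(429).json({
		error: 'Too Many Requests',
		retryAfter: resetSeconds
	});
}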

The Retry-After header lets well-behaved clients back off automatically instead of hammering you harder.
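The snippets above take `resetSeconds` as given. One way to derive it, sketched here as two hypothetical helpers: for a fixed window it is the time left until the window index changes, and for a token bucket it is the time until enough tokens have refilled. The token bucket helper assumes you have the un-rounded token count, which the Lua script above does not return (it floors `tokens`), so the script would need to return it as well.

```typescript
// Hypothetical helper: seconds until the current fixed window rolls over
// (same window-index rule as FixedWindowLimiter)
function fixedWindowResetSeconds(nowMs: number, windowSeconds: number): number {
	const windowMs = windowSeconds * 1000;
	const windowEnd = (Math.floor(nowMs / windowMs) + 1) * windowMs;
	return Math.ceil((windowEnd - nowMs) / 1000);
}

// Hypothetical helper: seconds until a token bucket holds enough tokens for the next request
function tokenBucketRetrySeconds(tokens: number, refillRate: number, requested = 1): number {
	if (tokens >= requested) return 0;
	return Math.ceil((requested - tokens) / refillRate);
}

console.log(fixedWindowResetSeconds(125_000, 60)); // 55: window ends at 180 s
console.log(tokenBucketRetrySeconds(0.25, 0.5)); // 2: needs 0.75 tokens at 0.5 tokens/s
```

Rounding up errs on the side of telling clients to wait slightly longer, so a client that honors `Retry-After` does not retry a moment too early.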

Limit Keys

The key you limit on decides who shares a bucket and how easily the limit can be evaded:

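import { createHash } from 'node:crypto';
import type { Request } from 'express';

// Assumed helper: store a SHA-256 digest rather than the raw API key in Redis key names
function hashApiKey(apiKey: string): string {
	return createHash('sha256').update(apiKey).digest('hex');
}

function getLimitKey(req: Request): string {
	// Option 1: by authenticated user (most fair). Trust x-user-id only if the gateway's
	// auth layer set it after verifying credentials and stripped any client-sent value.
	const userId = req.header('x-user-id');
	if (userId) {
		return `user:${userId}`;
	}

	// Option 2: by API key
	const apiKey = req.header('x-api-key');
	if (apiKey) {
		return `apikey:${hashApiKey(apiKey)}`;
	}

	// Option 3: by IP (for unauthenticated routes). Behind a proxy or load balancer,
	// enable Express's "trust proxy" setting so req.ip is the client, not the proxy.
	return `ip:${req.ip}`;
}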

Layered limits — apply multiple limits simultaneously:

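import type { Request } from 'express';

// Assumed limiter API (not shown in this article): check() rejects once `key`
// has made more than `limit` requests in the last `windowSeconds`
declare const limiter: {
	check(key: string, limit: number, windowSeconds: number): Promise<void>;
};

async function checkRateLimits(req: Request): Promise<void> {
	// Reuse getLimitKey so anonymous callers don't all share a "global:undefined" bucket
	const key = getLimitKey(req);

	// All three checks run (and count the request); Promise.all rejects if any one fails
	await Promise.all([
		// Global: 1000 req/min per caller
		limiter.check(`global:${key}`, 1000, 60),
		// Per-route: 100 req/min on each route. req.path includes IDs (/users/123),
		// so prefer a route template if your router exposes one
		limiter.check(`route:${req.path}:${key}`, 100, 60),
		// Burst: max 20 req/sec
		limiter.check(`burst:${key}`, 20, 1)
	]);
}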
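With layered limits, whichever limit is tightest for the current traffic pattern decides. The sketch below assumes each layer is an in-memory fixed-window counter standing in for the hypothetical `limiter.check` above and, like the `Promise.all` version, counts every attempt in every layer. `makeLayeredLimiter` and the `Layer` shape are illustrative names.

```typescript
// Hypothetical layer definition: one fixed-window counter per layer
interface Layer {
	name: string;
	limit: number;
	windowMs: number;
}

// Allow a request only if every layer is under its limit; every attempt is counted in every layer
function makeLayeredLimiter(layers: Layer[]) {
	const counts = new Map<string, number>();
	return (key: string, nowMs: number): string | null => {
		let blockedBy: string | null = null;
		for (const layer of layers) {
			const windowIndex = Math.floor(nowMs / layer.windowMs);
			const counterKey = `${layer.name}:${key}:${windowIndex}`;
			const count = (counts.get(counterKey) ?? 0) + 1;
			counts.set(counterKey, count);
			if (count > layer.limit && blockedBy === null) blockedBy = layer.name;
		}
		return blockedBy; // null means allowed, otherwise the first layer that was exceeded
	};
}

const check = makeLayeredLimiter([
	{ name: 'global', limit: 1000, windowMs: 60_000 },
	{ name: 'route', limit: 100, windowMs: 60_000 },
	{ name: 'burst', limit: 20, windowMs: 1_000 }
]);

let allowed = 0;
let lastBlocker: string | null = null;
for (let i = 0; i < 25; i++) {
	const blocker = check('user:42', i); // 25 requests within the same second
	if (blocker === null) allowed++;
	else lastBlocker = blocker;
}
console.log(allowed, lastBlocker); // 20 burst
```

Twenty-five requests in one second stay well under the global and per-route limits but trip the 20 req/sec burst limit, so only 20 are allowed.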

Kong Rate Limiting Plugin

In production, use battle-tested plugins rather than rolling your own:

# Kong declarative config (deck)
plugins:
  - name: rate-limiting
    config:
      minute: 1000
      hour: 10000
      policy: redis
      redis_host: redis
      redis_port: 6379
      limit_by: consumer # or ip, credential, header
      hide_client_headers: false

Kong handles the Redis atomicity, header injection, and 429 responses. Your job is configuring the limits per route and per consumer tier.