Routing & Load Balancing

Path matching, header-based routing, weighted splits, and health-aware balancing — how the gateway decides where each request goes.

Tags: routing, load balancing, weighted traffic, health checks, nginx

Real-World Analogy

Think of a traffic management system at a busy intersection: it reads the destination on every car (URL, headers), knows which roads are clear (healthy backends), and directs each car accordingly. If one road is closed (unhealthy instance), it stops sending cars there without anyone having to redirect traffic manually.

Path-Based Routing

The most common pattern: route each request to a backend service by its URL prefix.

nginx:

upstream user_service {
    server user-service-1:3001;
    server user-service-2:3001;
}

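upstream order_service {
    server order-service-1:3002;
}

# Every upstream named in proxy_pass must be defined; host and port here are hypothetical
upstream product_service {
    server product-service-1:3003;
}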

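server {
    listen 443 ssl;
    # "listen ... ssl" requires a certificate; these paths are placeholders
    ssl_certificate     /etc/nginx/certs/gateway.crt;
    ssl_certificate_key /etc/nginx/certs/gateway.key;

    location /api/users/ {
        # The trailing slash on proxy_pass replaces the matched prefix:
        # /api/users/42 reaches the backend as /42
        proxy_pass http://user_service/;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }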

    location /api/orders/ {
        proxy_pass http://order_service/;
        proxy_set_header Host $host;
    }

    location /api/products/ {
        proxy_pass http://product_service/;
    }
}

Traefik (docker-compose labels):

services:
  user-service:
    image: user-service:latest
    labels:
      - "traefik.http.routers.users.rule=PathPrefix(`/api/users`)"
      - "traefik.http.services.users.loadbalancer.server.port=3001"

  order-service:
    image: order-service:latest
    labels:
      - "traefik.http.routers.orders.rule=PathPrefix(`/api/orders`)"
      - "traefik.http.services.orders.loadbalancer.server.port=3002"
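
To make the matching rule concrete, here is a minimal TypeScript sketch of prefix routing. It assumes longest-prefix-wins semantics, which is how nginx chooses among plain prefix location blocks. The `Route` type, the `matchRoute` function, and the `default_service` catch-all are hypothetical names for illustration, not part of nginx or Traefik.

```typescript
// Hypothetical route table mirroring the nginx example above.
interface Route {
  prefix: string;   // URL prefix to match
  upstream: string; // name of the backend pool
}

const routes: Route[] = [
  { prefix: "/api/users/", upstream: "user_service" },
  { prefix: "/api/orders/", upstream: "order_service" },
  { prefix: "/api/", upstream: "default_service" }, // assumed catch-all
];

// Return the upstream for the longest prefix that matches the path,
// or undefined when nothing matches (a gateway would answer 404).
function matchRoute(path: string, table: Route[]): string | undefined {
  let best: Route | undefined;
  for (const route of table) {
    const matches = path.startsWith(route.prefix);
    if (matches && (!best || route.prefix.length > best.prefix.length)) {
      best = route;
    }
  }
  return best?.upstream;
}
```

Because the longest prefix wins, the order of entries in the table does not matter: `/api/users/42` goes to `user_service` even though `/api/` also matches.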

Header-Based Routing

Route by request headers — useful for versioning, A/B tests, or tenant routing.

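nginx:

upstream api_v1 {
    server api-v1:3000;
}

upstream api_v2 {
    server api-v2:3000;
}

# Route by API version header.
# Mapping to upstream names lets proxy_pass use a variable without a resolver.
map $http_x_api_version $backend {
    "v2"     api_v2;
    default  api_v1;
}

server {
    location /api/ {
        proxy_pass http://$backend;
    }
}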
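
TypeScript:

// Kong plugin or custom middleware: route by tenant
import type { IncomingMessage } from 'node:http';

// Illustrative allow-list; in practice this would come from config or a tenant registry
const enterpriseTenants = new Set<string>(['acme', 'globex']);

function tenantRouter(req: IncomingMessage): string {
  // A header can be a string, a string[], or missing; accept only a single string
  const header = req.headers['x-tenant-id'];
  const tenantId = typeof header === 'string' ? header : undefined;

  // Enterprise tenants get dedicated instances
  if (tenantId && enterpriseTenants.has(tenantId)) {
    return `http://enterprise-cluster-${tenantId}:3000`;
  }

  return 'http://shared-cluster:3000';
}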

Load Balancing Algorithms

Once a route is matched, the gateway picks which backend instance handles the request.

Round robin — requests distributed evenly, one at a time:

upstream backend {
    server backend-1:3000;
    server backend-2:3000;
    server backend-3:3000;
    # default: round-robin
}
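
As a rough illustration of what round robin does internally, the sketch below cycles through a fixed list of backends. The `RoundRobin` class is a hypothetical helper written for this page, not nginx's implementation.

```typescript
// Minimal round-robin picker: hands out backends in order and wraps around.
class RoundRobin {
  private readonly backends: string[];
  private next = 0; // index of the backend to hand out next

  constructor(backends: string[]) {
    if (backends.length === 0) throw new Error("at least one backend is required");
    this.backends = backends;
  }

  pick(): string {
    const backend = this.backends[this.next];
    this.next = (this.next + 1) % this.backends.length;
    return backend;
  }
}
```

Each call to `pick()` returns the next backend in the list, so three backends each receive every third request regardless of how long those requests take.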

Least connections — send to the instance with fewest active requests. Better when requests have variable duration:

upstream backend {
    least_conn;
    server backend-1:3000;
    server backend-2:3000;
    server backend-3:3000;
}
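
The idea behind least connections can be sketched as a counter per backend. This is a simplified model under the assumption that the gateway sees both the start and the end of every request; `LeastConnections`, `acquire`, and `release` are hypothetical names.

```typescript
// Tracks in-flight requests per backend and picks the least loaded one.
class LeastConnections {
  private readonly active = new Map<string, number>();

  constructor(backends: string[]) {
    for (const backend of backends) this.active.set(backend, 0);
  }

  // Choose the backend with the fewest active requests (first one wins ties)
  // and count the new request against it.
  acquire(): string {
    let chosen: string | undefined;
    let min = Infinity;
    this.active.forEach((count, backend) => {
      if (count < min) {
        min = count;
        chosen = backend;
      }
    });
    if (chosen === undefined) throw new Error("no backends configured");
    this.active.set(chosen, min + 1);
    return chosen;
  }

  // Call when the request finishes so the backend's count drops again.
  release(backend: string): void {
    const count = this.active.get(backend) ?? 0;
    this.active.set(backend, Math.max(0, count - 1));
  }
}
```

A backend stuck on slow requests keeps a high count and is passed over until it catches up, which is why this algorithm suits workloads with uneven request durations.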

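IP hash (session affinity): the same client IP maps to the same backend while that backend stays available. nginx hashes only the first three octets of an IPv4 address, so clients in the same /24 network share a backend: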

upstream backend {
    ip_hash;
    server backend-1:3000;
    server backend-2:3000;
}
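
A minimal sketch of the idea follows. The hash function here is a simple illustrative one, not the function nginx uses, and `pickByIp` is a hypothetical name. The sketch handles IPv4 addresses only.

```typescript
// Map a client IPv4 address to a backend with a deterministic hash.
// The hash is illustrative only; it is not nginx's actual function.
function pickByIp(clientIp: string, backends: string[]): string {
  if (backends.length === 0) throw new Error("at least one backend is required");
  // Like nginx's ip_hash for IPv4, key on the first three octets,
  // so clients in the same /24 network land on the same backend.
  const key = clientIp.split(".").slice(0, 3).join(".");
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash * 31 + key.charCodeAt(i)) >>> 0; // keep an unsigned 32-bit value
  }
  return backends[hash % backends.length];
}
```

The mapping depends on the number of backends, so adding or removing a server reshuffles many clients. Treat IP-based affinity as a convenience rather than a guarantee.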

Weighted — send more traffic to higher-capacity instances:

upstream backend {
    server backend-1:3000 weight=3;  # receives about 3 of every 4 requests
    server backend-2:3000 weight=1;
}
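
One common way to implement weighting is smooth weighted round robin, which spreads the heavier backend's turns out instead of sending them in a burst. The sketch below is an illustration under that assumption; `WeightedBackend`, `makePool`, and `pickWeighted` are hypothetical names.

```typescript
interface WeightedBackend {
  address: string;
  weight: number;  // configured weight
  current: number; // running score, starts at 0
}

function makePool(entries: Array<[string, number]>): WeightedBackend[] {
  return entries.map(([address, weight]) => ({ address, weight, current: 0 }));
}

// One selection step: raise every score by its weight, take the highest,
// then lower the winner by the total weight so the others can catch up.
function pickWeighted(pool: WeightedBackend[]): string {
  if (pool.length === 0) throw new Error("at least one backend is required");
  let total = 0;
  let best = pool[0];
  for (const backend of pool) {
    backend.current += backend.weight;
    total += backend.weight;
    if (backend.current > best.current) best = backend;
  }
  best.current -= total;
  return best.address;
}
```

With weights 3 and 1, every cycle of four picks sends three requests to the first backend and one to the second.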

Health Checks

The gateway must stop sending traffic to unhealthy backends automatically.

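Passive health checks (built into open-source nginx): a backend is marked unavailable after max_fails failed attempts within fail_timeout, then skipped for the next fail_timeout. The defaults are max_fails=1 and fail_timeout=10s: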

upstream backend {
    server backend-1:3000 max_fails=3 fail_timeout=30s;
    server backend-2:3000 max_fails=3 fail_timeout=30s;
}
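
The bookkeeping behind passive checks can be sketched as a sliding window of failures observed on real traffic. This is a simplified model loosely based on the max_fails and fail_timeout parameters, not nginx's exact logic; `PassiveHealth` is a hypothetical class, and timestamps are passed in explicitly to keep the example deterministic.

```typescript
// Passive health tracking for a single backend.
class PassiveHealth {
  private failures: number[] = []; // timestamps (ms) of recent failures
  private downUntil = 0;           // the backend is skipped until this time
  private readonly maxFails: number;
  private readonly failTimeoutMs: number;

  constructor(maxFails: number, failTimeoutMs: number) {
    this.maxFails = maxFails;
    this.failTimeoutMs = failTimeoutMs;
  }

  // Record a failed attempt seen while proxying a real request.
  recordFailure(now: number): void {
    // Keep only the failures that are still inside the window.
    this.failures = this.failures.filter((t) => now - t < this.failTimeoutMs);
    this.failures.push(now);
    if (this.failures.length >= this.maxFails) {
      this.downUntil = now + this.failTimeoutMs; // take the backend out of rotation
      this.failures = [];
    }
  }

  isAvailable(now: number): boolean {
    return now >= this.downUntil;
  }
}
```

Passive checks cost nothing extra, but they only notice a problem after real requests have already failed.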

Active health checks (nginx Plus / open-source alternatives):

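# nginx Plus
upstream backend {
    zone backend 64k;
    server backend-1:3000;
    server backend-2:3000;
}

server {
    location / {
        proxy_pass http://backend;
        # health_check belongs in the location that proxies to the upstream
        health_check interval=5s fails=2 passes=2 uri=/health;
    }
}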
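
The fails and passes thresholds keep a backend from flapping on a single bad or good probe. A minimal sketch of that state machine is shown below; `ProbeState` is a hypothetical class that models only the counting, not the probing itself.

```typescript
// Tracks backend health from periodic probe results, with separate
// thresholds for going down (fails) and coming back up (passes).
class ProbeState {
  healthy = true;
  private streak = 0; // consecutive probes that contradict the current state
  private readonly fails: number;
  private readonly passes: number;

  constructor(fails: number, passes: number) {
    this.fails = fails;
    this.passes = passes;
  }

  // Feed one probe result and return the resulting health state.
  record(ok: boolean): boolean {
    if (ok === this.healthy) {
      this.streak = 0; // the probe agrees with the current state
    } else {
      this.streak += 1;
      const needed = this.healthy ? this.fails : this.passes;
      if (this.streak >= needed) {
        this.healthy = !this.healthy;
        this.streak = 0;
      }
    }
    return this.healthy;
  }
}
```

With fails=2 and passes=2, one failed probe leaves the backend in rotation, two in a row remove it, and it takes two consecutive successes to bring it back.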

Traefik health checks:

services:
  api:
    labels:
      - "traefik.http.services.api.loadbalancer.healthcheck.path=/health"
      - "traefik.http.services.api.loadbalancer.healthcheck.interval=10s"
      - "traefik.http.services.api.loadbalancer.healthcheck.timeout=3s"

Your backend /health endpoint should check its own dependencies:

app.get('/health', async (req, res) => {
  try {
    await db.query('SELECT 1'); // verify DB connection
    await redis.ping();          // verify cache connection
    res.json({ status: 'ok' });
  } catch (err) {
    res.status(503).json({ status: 'degraded', error: err.message });
  }
});

Weighted Traffic Splits (Canary Deploys)

Send a small percentage of traffic to a new version before full rollout:

upstream stable {
    server api-v1-1:3000;
    server api-v1-2:3000;
}

upstream canary {
    server api-v2-1:3000;
}

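# Split: 95% stable, 5% canary.
# The key is client IP + URI, so one client can reach different pools for
# different URLs; key on "${remote_addr}" alone to keep each client on one pool.
split_clients "${remote_addr}${request_uri}" $backend_pool {
    5%   canary;
    *    stable;
}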

server {
    location /api/ {
        proxy_pass http://$backend_pool;
    }
}
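
Percentage splits of this kind work by hashing a key into a fixed range and comparing the result against a threshold. The sketch below illustrates that with an FNV-1a hash, chosen here only for brevity (nginx uses MurmurHash2 for split_clients). `hashToBucket` and `choosePool` are hypothetical names.

```typescript
// Hash a key into one of 10,000 buckets using 32-bit FNV-1a.
function hashToBucket(key: string): number {
  let hash = 2166136261; // FNV-1a offset basis
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 16777619) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return hash % 10000;
}

// Send the lowest `canaryPercent` share of buckets to the canary pool.
function choosePool(key: string, canaryPercent: number): "canary" | "stable" {
  return hashToBucket(key) < canaryPercent * 100 ? "canary" : "stable";
}
```

Because the hash is deterministic, the same key always lands in the same pool, and raising the percentage only moves additional keys to the canary without reshuffling those already there.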

Traefik weighted service (file provider):

# Traefik weighted round-robin; stable and canary are services defined elsewhere
http:
  services:
    weighted:
      weighted:
        services:
          - name: stable
            weight: 95
          - name: canary
            weight: 5

Timeouts

Every route should have explicit timeouts. nginx defaults to 60s for each proxy timeout, so without tuning, a slow backend can hold a connection for a minute or more:

location /api/ {
    proxy_pass http://backend;

    proxy_connect_timeout 2s;    # time to establish the connection
    proxy_send_timeout    10s;   # max gap between two successive writes to the backend
    proxy_read_timeout    30s;   # max gap between two successive reads from the backend

    # When a timeout fires, nginx returns 504 Gateway Timeout to the client
}

Match timeouts to your SLOs. A 30-second timeout on a route that should respond in 200ms means 30 seconds of degraded user experience before you detect the problem.
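
The same principle applies to calls made from application code. As a minimal sketch, assuming a promise-based client, a deadline can be enforced with a small wrapper; `withTimeout` is a hypothetical helper, and it only stops waiting, it does not cancel the underlying work.

```typescript
// Reject if the wrapped promise does not settle within `ms` milliseconds.
// Note: this stops waiting but does not cancel the underlying operation.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

A route with a 200ms SLO might wrap its backend call with a deadline a little above that figure, so a stalled dependency fails fast instead of tying up the caller.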