← Performance · intermediate · 9 min · 05 / 06 বাংলা

Load Testing

k6, autocannon, realistic traffic models, finding the breaking point — and why load testing in staging is different from production.

load testingk6autocannonthroughputbreaking pointperformance baseline

Real-World Analogy

A fire drill vs an actual fire: a fire drill reveals whether evacuation procedures work under controlled conditions, before the real emergency. Load testing does the same for traffic — you find out whether the system breaks, where it breaks, and how it breaks, while you’re in control and can stop the test. Discovering it during a traffic spike is the fire.

Types of Load Tests

Baseline: measure normal behavior. What are the P50/P99 latencies at expected traffic?

Stress: increase load until something breaks. Find the system’s limits.

Soak: run at expected load for hours. Find memory leaks, connection pool exhaustion, log file growth.

Spike: sudden 10x traffic jump. Does the system recover? How long does it take?

Breakpoint: same as stress but you’re looking for the exact RPS where P99 crosses your SLO.

autocannon

Fast, simple HTTP benchmarking for Node.js:

# 100 connections, 30 seconds
npx autocannon -c 100 -d 30 http://localhost:3000/api/orders

# With a body (POST)
npx autocannon -c 100 -d 30 \
  -m POST \
  -H 'Content-Type: application/json' \
  -b '{"customerId":"cust-123","items":[{"productId":"prod-456","quantity":1}]}' \
  http://localhost:3000/api/orders

# Ramp up (10 connections, then 50, then 100)
npx autocannon -c 10 -d 10 http://localhost:3000/api/orders
npx autocannon -c 50 -d 10 http://localhost:3000/api/orders
npx autocannon -c 100 -d 10 http://localhost:3000/api/orders

Output:

Stat         | 2.5% | 50%  | 97.5% | 99%  | Avg   | Stdev | Max
Latency      | 14ms | 22ms | 89ms  | 145ms| 23.4ms| 19.1ms| 2341ms

Req/Sec      | 3240 | 4100 | 4380  | 4410 |
Bytes/Sec    | 2.1M | 2.7M | 2.9M  | 2.9M |

35672 requests in 10s, 236 MB read

Watch for: P99 climbing, error rate appearing, Max exploding above P99 (outliers).

k6

k6 is a scripting tool for complex load test scenarios — think realistic user journeys, not just “hammer this endpoint”:

// load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate } from 'k6/metrics';

const errorRate = new Rate('errors');

export const options = {
	stages: [
		{ duration: '2m', target: 10 }, // ramp up to 10 users
		{ duration: '5m', target: 10 }, // stay at 10
		{ duration: '2m', target: 50 }, // ramp to 50
		{ duration: '5m', target: 50 }, // stay at 50
		{ duration: '2m', target: 100 }, // ramp to 100
		{ duration: '5m', target: 100 }, // stay at 100
		{ duration: '2m', target: 0 } // ramp down
	],
	thresholds: {
		http_req_duration: ['p(99)<500'], // 99% of requests < 500ms
		errors: ['rate<0.01'] // error rate < 1%
	}
};

const BASE_URL = 'http://staging.api.example.com';

export default function () {
	// Realistic user journey
	// 1. Login
	const loginRes = http.post(
		`${BASE_URL}/auth/login`,
		JSON.stringify({
			email: 'test@example.com',
			password: 'testpass'
		}),
		{ headers: { 'Content-Type': 'application/json' } }
	);

	check(loginRes, {
		'login succeeded': (r) => r.status === 200
	});

	const token = loginRes.json('token');

	// 2. Browse products
	const productsRes = http.get(`${BASE_URL}/api/products`, {
		headers: { Authorization: `Bearer ${token}` }
	});

	check(productsRes, {
		'products loaded': (r) => r.status === 200,
		'has products': (r) => r.json('items').length > 0
	});

	errorRate.add(productsRes.status >= 400);
	sleep(1); // user "thinks"

	// 3. Create order
	const product = productsRes.json('items')[0];
	const orderRes = http.post(
		`${BASE_URL}/api/orders`,
		JSON.stringify({
			items: [{ productId: product.id, quantity: 1 }]
		}),
		{
			headers: {
				Authorization: `Bearer ${token}`,
				'Content-Type': 'application/json'
			}
		}
	);

	check(orderRes, {
		'order created': (r) => r.status === 201
	});

	errorRate.add(orderRes.status >= 400);
	sleep(2);
}

# Run the test
k6 run load-test.js

# With output to InfluxDB (for Grafana dashboard)
k6 run --out influxdb=http://localhost:8086/k6 load-test.js

# HTML report
k6 run --out json=results.json load-test.js
k6-html-reporter results.json

Finding the Breaking Point

Binary search on RPS to find where P99 crosses your SLO:

// breakpoint-test.js
import http from 'k6/http';
import { check } from 'k6';

export const options = {
	executor: 'ramping-arrival-rate', // RPS-based (not VU-based)
	stages: [
		{ target: 100, duration: '1m' }, // 100 RPS for 1 minute
		{ target: 200, duration: '1m' }, // 200 RPS
		{ target: 400, duration: '1m' }, // 400 RPS
		{ target: 800, duration: '1m' }, // 800 RPS
		{ target: 1600, duration: '1m' } // 1600 RPS — will this break it?
	],
	preAllocatedVUs: 200,
	maxVUs: 500,
	thresholds: {
		http_req_duration: ['p(99)<500']
	}
};

Watch the Grafana dashboard while the test runs. The point where P99 starts climbing steeply is your inflection point — where queuing begins. Your sustainable RPS is about 70% of the breaking point.

Realistic Test Data

Load testing with cust-123 hardcoded produces unrealistic cache hit rates and database behavior:

// k6 — parameterized test data
import { SharedArray } from 'k6/data';

const users = new SharedArray('users', function () {
	return JSON.parse(open('./test-users.json')); // 10,000 test users
});

const products = new SharedArray('products', function () {
	return JSON.parse(open('./test-products.json'));
});

export default function () {
	const user = users[Math.floor(Math.random() * users.length)];
	const product = products[Math.floor(Math.random() * products.length)];

	// Now the test exercises different code paths, database queries,
	// and cache keys — closer to production behavior
}

Monitoring During Load Tests

Watch these metrics while the test runs:

# Node.js process
watch -n1 "node -e \"const p=process; console.log(JSON.stringify(p.memoryUsage()))\""

# PostgreSQL connections and queries
watch -n1 "psql -c \"SELECT count(*), state FROM pg_stat_activity GROUP BY state\""

# Redis
watch -n1 "redis-cli info stats | grep -E 'connected_clients|used_memory_human|instantaneous_ops_per_sec'"

# System
htop  # CPU and memory
iostat -x 1  # disk I/O
ss -s  # connection counts

When latency spikes: check which resource saturated first. CPU? Disk I/O? Connection pool? The first one to saturate is the bottleneck.

Staging vs Production Differences

Load test in staging, but be aware of the gaps:

	Staging	Production
Database size	Small (100k rows)	Large (10M+ rows)
Cache state	Cold	Warm
Index effectiveness	Artificially good	Real-world performance
External API latency	Mocked	Variable
Background jobs	Off	Running and consuming resources

Mitigation:

Seed staging with production-scale data (anonymized)
Run load test with cache cold AND warm — measure both
Enable background jobs during load test
Mock external APIs with realistic latency (p50=100ms, p99=500ms)

A load test on empty-table staging will not reveal the index you forgot. Test with real data scale.