
Cloud vs Bare Metal vs VPS

Unit economics for each deployment model — when managed convenience costs more than it saves, and when it doesn't.

Tags: cloud, bare metal, VPS, unit economics, TCO

Real-World Analogy

Renting vs buying a car: renting (cloud) costs more per mile but you can return it when you don’t need it, get a bigger one for a road trip, and never worry about maintenance. Buying (bare metal) is cheaper per mile if you drive a lot, but you’re on the hook for repairs, and it sits depreciating when unused.

The Three Models

Cloud (AWS, GCP, Azure):

Pay per hour of use — no upfront cost
Provision in minutes, deprovision immediately
Managed services for everything (RDS, ElastiCache, S3)
Premium pricing: 3-5x the cost of equivalent bare metal

VPS (Hetzner, Linode, DigitalOcean, Vultr):

Fixed monthly cost for a virtual machine
Good managed add-ons (managed Postgres, load balancers)
60-80% cheaper than AWS for equivalent specs
No spot instances, limited auto-scaling

Bare Metal (Hetzner Dedicated, OVH, Equinix):

Physical server, rented or owned
Cheapest per-core and per-GB-RAM at scale
No virtualization overhead
Lead time to provision: days to weeks
You manage everything at the OS level

Unit Cost Comparison

Comparing a roughly equivalent 8-core / 32GB RAM setup (2024 pricing):

Provider	Type	Monthly Cost	Notes
AWS (m7g.2xlarge)	Cloud	~$230 on-demand	More with data transfer
AWS (m7g.2xlarge)	Reserved 1yr	~$140	Commit upfront
Hetzner CX52	VPS	~$55	ARM-based, EU/US regions
DigitalOcean	VPS	~$96	More regions, better support
Hetzner AX102	Bare Metal	~$90	14-core, 64GB, NVMe

The same workload that costs $2,000/month on AWS on-demand runs for $500-600/month on Hetzner. The gap grows with data transfer and managed service costs.

True Cost of Cloud

Cloud bills have multipliers that the headline instance price hides:

Data transfer (egress):

AWS: $0.09/GB out to internet
At 10TB/month: $900/month just for egress

Hetzner: 20TB included in VPS plans, $1/TB after
Cloudflare (for static): free egress
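
The egress arithmetic above can be sketched as a small function. This is a simplified model that assumes one flat per-GB rate and decimal units (1 TB = 1000 GB); real cloud bills usually have tiers and free allowances that it ignores. The function name and parameters are illustrative, not part of any provider's API.

```javascript
// Sketch: monthly egress cost for a flat per-GB price and an included allowance.
// Assumes decimal units (1 TB = 1000 GB), matching the arithmetic in the text.
function egressCostUsd(transferTb, pricePerGbUsd, includedTb = 0) {
  // Only traffic beyond the included allowance is billed.
  const billableTb = Math.max(0, transferTb - includedTb);
  return billableTb * 1000 * pricePerGbUsd;
}

// Figures from the text: AWS at $0.09/GB, Hetzner at $1/TB beyond 20 TB included.
const awsAt10Tb = egressCostUsd(10, 0.09);          // about $900
const hetznerAt10Tb = egressCostUsd(10, 0.001, 20); // inside the allowance
const hetznerAt30Tb = egressCostUsd(30, 0.001, 20); // 10 TB of overage, about $10

console.log({ awsAt10Tb, hetznerAt10Tb, hetznerAt30Tb });
```

Running the same function across a few traffic levels shows how quickly egress can overtake the instance price itself.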

Managed services premium:

RDS db.t3.medium (2 vCPU, 4GB):
  AWS RDS:        $60/month
  Self-hosted Postgres on $6 VPS: $6/month

ElastiCache cache.t3.micro (2 vCPU, 0.5GB):
  AWS ElastiCache: $25/month
  Self-hosted Redis on shared VPS: ~$5/month amortized

Operational overhead of self-hosting:

Self-hosted Postgres:
  Backup setup: 4 hours
  Monitoring setup: 4 hours
  Ongoing ops: 1-2 hours/month

RDS buys back this time — worth it until you're large enough to hire DBAs
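
One way to test the claim that a managed service buys back time is to put a price on the ongoing ops hours. The sketch below uses the RDS and self-hosted figures from above; `hourlyRateUsd` is a hypothetical loaded engineering cost and not a figure from this article, and the one-time setup hours are left out.

```javascript
// Sketch: compare the managed-service premium against the engineer time it replaces.
// hourlyRateUsd is a hypothetical assumption; substitute your own number.
function managedVsSelfHosted({ managedUsd, selfHostedUsd, opsHoursPerMonth, hourlyRateUsd }) {
  const premiumUsd = managedUsd - selfHostedUsd;
  const opsCostUsd = opsHoursPerMonth * hourlyRateUsd;
  return {
    premiumUsd,
    opsCostUsd,
    managedIsCheaper: premiumUsd < opsCostUsd,
    // Hourly rate below which self-hosting wins on ongoing cost alone.
    breakEvenHourlyRateUsd: opsHoursPerMonth > 0 ? premiumUsd / opsHoursPerMonth : Infinity,
  };
}

const result = managedVsSelfHosted({
  managedUsd: 60,        // RDS db.t3.medium, from the text
  selfHostedUsd: 6,      // self-hosted Postgres VPS, from the text
  opsHoursPerMonth: 1.5, // midpoint of the 1-2 hours/month estimate
  hourlyRateUsd: 75,     // hypothetical
});

console.log(result);
```

With these inputs the premium is $54/month against $112.50 of engineer time, so the managed option wins. The balance shifts as the fleet grows, because the premium is paid per database while ops practices are shared across them.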

The Real Decision Framework

Use cloud when:

Team is small and ops bandwidth is limited — managed services are worth the premium
Traffic is spiky or unpredictable — auto-scaling and pay-per-use matter
You need global regions quickly
You’re early and burning runway — time-to-market beats cost optimization
You need specific managed services (ML, analytics, compliance tools)

Use VPS when:

Steady, predictable load
You have ops bandwidth to manage your own infra
Cost matters — common at Series A and beyond
You want simplicity without the complexity of cloud primitives

Use bare metal when:

High, sustained compute need (ML training, video transcoding, large databases)
Your team has infrastructure engineering capacity
You’ve measured the workload and know the capacity will be used; idle dedicated hardware costs the same as busy hardware
Per-core performance matters (no virtualization overhead)

Spot / Preemptible Instances

Cloud providers sell excess capacity at a 70-90% discount as spot (AWS) or preemptible (GCP) instances, but the provider can reclaim them at short notice: AWS gives a 2-minute interruption warning, GCP about 30 seconds.

Good uses:

Stateless workers pulling from a queue (a terminated worker just loses its current job, which retries)
Batch processing jobs that checkpoint progress
CI/CD runners

Bad uses:

Primary database: termination forces an unplanned failover, and writes that were not yet replicated can be lost
Stateful services with no fast failover
Jobs longer than 2 minutes without checkpointing

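# Kubernetes on EKS: mix of on-demand and spot node groups
# (assumes eksctl config syntax, where `spot: true` is a managedNodeGroups field)
managedNodeGroups:
  - name: on-demand
    instanceType: m5.xlarge
    minSize: 2 # always-on baseline
    maxSize: 10

  - name: spot
    instanceTypes: [m5.xlarge, m5.2xlarge, m4.xlarge] # several types improve spot availability
    spot: true
    minSize: 0
    maxSize: 20 # burst on spot
    taints: # only pods with a matching toleration are scheduled on spot nodes
      - key: spot
        effect: NoSchedule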
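
The checkpointing requirement from the lists above can be illustrated with a small batch loop. This is a minimal sketch, not a production worker: `store` is a hypothetical in-memory checkpoint object, and `shouldStop` stands in for whatever signal tells the job that the instance is about to be reclaimed. A real job would persist the offset somewhere that outlives the instance.

```javascript
// Sketch: a batch job that records progress so a replacement instance can resume.
// `store` and `shouldStop` are hypothetical stand-ins (see the paragraph above).
function processBatch(items, store, handleItem, shouldStop = () => false) {
  let index = store.offset ?? 0; // resume from the last checkpoint, if any
  while (index < items.length) {
    if (shouldStop()) return { done: false, processed: index };
    handleItem(items[index]);
    index += 1;
    store.offset = index; // checkpoint after every item
  }
  return { done: true, processed: index };
}

// Simulate an interruption after three items, then a resume on a new instance.
const store = {};
const seen = [];
let checks = 0;
const firstRun = processBatch([1, 2, 3, 4, 5], store, (x) => seen.push(x), () => ++checks > 3);
const secondRun = processBatch([1, 2, 3, 4, 5], store, (x) => seen.push(x));

console.log(firstRun, secondRun, seen);
```

If the instance dies between handling an item and writing the checkpoint, that item is handled again on resume, so the per-item work should be safe to repeat.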

Rightsizing in Practice

Most teams overprovision by 2-4x. Measure first:

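# AWS Cost Explorer: rightsizing recommendations
# (--service must be AmazonEC2; both configuration fields are required)
aws ce get-rightsizing-recommendation \
  --service AmazonEC2 \
  --configuration '{"RecommendationTarget": "SAME_INSTANCE_FAMILY", "BenefitsConsidered": true}'

# Actual CPU use for one instance (repeat per instance ID to cover the fleet)
# --start-time and --end-time are required; the dates below are an example window
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --start-time 2024-06-01T00:00:00Z \
  --end-time 2024-06-15T00:00:00Z \
  --period 3600 \
  --statistics Average Maximum \
  --dimensions Name=InstanceId,Value=i-xxxx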

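If your instances average 10-15% CPU and the hourly maximum stays well below saturation, you’re massively overprovisioned. Check memory use before resizing (it is not among the default EC2 metrics and needs the CloudWatch agent), then downsize or pack more workloads per instance.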
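
The CPU check can be automated over the `Datapoints` array that `get-metric-statistics` returns as JSON. The sketch below is one possible classification; the thresholds `lowAvgPct` and `safePeakPct` are illustrative assumptions, not AWS guidance, and the sample data is made up.

```javascript
// Sketch: classify an instance from CloudWatch CPUUtilization datapoints.
// Thresholds are illustrative assumptions; tune them for your workload.
function classifyCpuUsage(datapoints, { lowAvgPct = 15, safePeakPct = 60 } = {}) {
  if (datapoints.length === 0) return { verdict: "no-data" };
  const avg = datapoints.reduce((sum, d) => sum + d.Average, 0) / datapoints.length;
  const peak = Math.max(...datapoints.map((d) => d.Maximum));
  let verdict = "keep";
  if (avg <= lowAvgPct && peak <= safePeakPct) verdict = "downsize-candidate";
  else if (avg <= lowAvgPct) verdict = "low-average-but-spiky";
  return { avg, peak, verdict };
}

// Made-up sample in the shape of the "Datapoints" array (Average and Maximum in percent).
const sample = [
  { Average: 10, Maximum: 30 },
  { Average: 14, Maximum: 45 },
  { Average: 12, Maximum: 38 },
];

console.log(classifyCpuUsage(sample));
```

Separating the spiky case matters: an instance with a low average but high peaks may need its current size for short bursts.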

Packing workloads (multi-tenancy on a single host):

Instead of: 4 × m5.xlarge (4 vCPU, 16GB each) for 4 services at 20% CPU
Consider:   1 × m5.2xlarge (8 vCPU, 32GB) running all 4 services at ~40% CPU

Cost: ~2x cheaper (EC2 prices scale linearly with size within a family, so one m5.4xlarge would cost the same as four m5.xlarge)
Trade-off: half the memory, one noisy neighbor affects all, less blast radius isolation
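
Before consolidating, it helps to estimate what CPU utilization the shared host would run at. The sketch below assumes average utilization simply adds up across services, which ignores coinciding peaks and says nothing about memory; the function name is illustrative.

```javascript
// Sketch: expected CPU utilization after moving several services onto one host.
// Assumes average CPU demand is additive; peaks and memory need a separate check.
function packedUtilization(services, hostVcpus) {
  const usedVcpus = services.reduce((sum, s) => sum + s.vcpus * s.avgUtilization, 0);
  return usedVcpus / hostVcpus;
}

// Four services, each on 4 vCPUs at 20% average CPU, as in the example above.
const services = Array.from({ length: 4 }, () => ({ vcpus: 4, avgUtilization: 0.2 }));

const on16Vcpus = packedUtilization(services, 16); // same total vCPUs: still about 20%
const on8Vcpus = packedUtilization(services, 8);   // half the vCPUs: about 40%

console.log({ on16Vcpus, on8Vcpus });
```

Consolidation only reduces the bill when the new host has fewer total vCPUs than the instances it replaces, so the useful question is how small the host can get while leaving headroom for peaks.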

Reserved Instance Math

If you can predict your baseline compute need, commit to 1- or 3-year reserved instances:

On-demand m5.xlarge: $0.192/hr = $140/month
1-year reserved:     $0.122/hr = $89/month  (36% savings)
3-year reserved:     $0.077/hr = $56/month  (60% savings)

Annual savings on a single instance: $612 (1yr) or $1,008 (3yr)
For 10 instances: $6,120 or $10,080/year

Buy reserved instances for your steady-state baseline. Use on-demand or spot for burst above baseline.
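
The reserved-instance arithmetic can be reproduced with a short helper, which also gives a break-even figure: the fraction of hours an instance has to run before the commitment beats paying on-demand. This sketch assumes a 730-hour month, consistent with the rounded figures above, and a reservation billed hourly with no upfront payment.

```javascript
// Sketch: reserved vs on-demand math, assuming a 730-hour average month
// and a reservation billed for every hour whether or not the instance runs.
const HOURS_PER_MONTH = 730;

function reservedSavings(onDemandHourlyUsd, reservedHourlyUsd) {
  return {
    onDemandMonthlyUsd: onDemandHourlyUsd * HOURS_PER_MONTH,
    reservedMonthlyUsd: reservedHourlyUsd * HOURS_PER_MONTH,
    savingsPct: (1 - reservedHourlyUsd / onDemandHourlyUsd) * 100,
    // Fraction of hours the instance must run for the reservation to pay off.
    breakEvenUtilization: reservedHourlyUsd / onDemandHourlyUsd,
  };
}

// Hourly rates from the m5.xlarge example above.
const oneYear = reservedSavings(0.192, 0.122);
const threeYear = reservedSavings(0.192, 0.077);

console.log({ oneYear, threeYear });
```

With these rates the 1-year reservation pays off once the instance runs more than about 64% of the time, and the 3-year one at about 40%. Capacity that runs less than that is better left on-demand or on spot.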

Cost Per Request as a North Star

Instead of tracking raw cloud spend, track cost per 1000 requests (CPR):

Monthly spend: $5,000
Monthly requests: 50,000,000

CPR = $5,000 / 50,000 (thousands) = $0.10 per 1000 requests

As you scale, CPR should hold steady or decrease (economies of scale). If CPR is rising, spend is growing faster than traffic and your architecture isn’t scaling efficiently; investigate query costs, caching, or instance types.

// Dashboard metric
const costPerThousandRequests = totalMonthlyCostUsd / (totalMonthlyRequests / 1000);

Track this weekly. It tells you whether your infrastructure spend is growing proportionally to usage (expected) or faster (a problem).
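
A weekly check might compare the latest CPR against the start of the window and label the trend. The sketch below is one simple way to do that; the 5% tolerance and the weekly figures are made-up assumptions, and it expects a nonzero cost in the first week.

```javascript
// Sketch: label the cost-per-1000-requests trend over a window of weeks.
// The tolerance and the sample data are illustrative assumptions.
function costPerThousand(costUsd, requests) {
  return requests > 0 ? costUsd / (requests / 1000) : null;
}

function cprTrend(weeks, tolerance = 0.05) {
  const series = weeks
    .map((w) => costPerThousand(w.costUsd, w.requests))
    .filter((v) => v !== null); // skip weeks with no traffic
  if (series.length < 2) return { series, trend: "insufficient-data" };
  // Relative change between the first and last week in the window.
  const change = (series[series.length - 1] - series[0]) / series[0];
  const trend = change > tolerance ? "rising" : change < -tolerance ? "falling" : "flat";
  return { series, change, trend };
}

// Hypothetical weekly totals: spend grows faster than traffic.
const weeks = [
  { costUsd: 1150, requests: 11500000 },
  { costUsd: 1250, requests: 12000000 },
  { costUsd: 1400, requests: 12500000 },
];

console.log(cprTrend(weeks));
```

Here CPR moves from about $0.10 to about $0.112 per 1000 requests, a 12% increase, so the window is labeled as rising.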