Skip to content
← Capacity & Cost · intermediate · 10 min · 02 / 06

Cloud vs Bare Metal vs VPS

Unit economics for each deployment model — when managed convenience costs more than it saves, and when it doesn't.

cloudbare metalVPSunit economicsTCO

Real-World Analogy

Renting vs buying a car: renting (cloud) costs more per mile but you can return it when you don’t need it, get a bigger one for a road trip, and never worry about maintenance. Buying (bare metal) is cheaper per mile if you drive a lot, but you’re on the hook for repairs, and it sits depreciating when unused.

The Three Models

Cloud (AWS, GCP, Azure):

  • Pay per hour of use — no upfront cost
  • Provision in minutes, deprovision immediately
  • Managed services for everything (RDS, ElastiCache, S3)
  • Premium pricing: 3-5x the cost of equivalent bare metal

VPS (Hetzner, Linode, DigitalOcean, Vultr):

  • Fixed monthly cost for a virtual machine
  • Good managed add-ons (managed Postgres, load balancers)
  • 60-80% cheaper than AWS for equivalent specs
  • No spot instances, limited auto-scaling

Bare Metal (Hetzner Dedicated, OVH, Equinix):

  • Physical server, rented or owned
  • Cheapest per-core and per-GB-RAM at scale
  • No virtualization overhead
  • Lead time to provision: days to weeks
  • You manage everything at the OS level

Unit Cost Comparison

Comparing a roughly equivalent 8-core / 32GB RAM setup (2024 pricing):

ProviderTypeMonthly CostNotes
AWS (m7g.2xlarge)Cloud~$230 on-demandMore with data transfer
AWS (m7g.2xlarge)Reserved 1yr~$140Commit upfront
Hetzner CX52VPS~$55ARM-based, EU/US regions
DigitalOceanVPS~$96More regions, better support
Hetzner AX102Bare Metal~$9014-core, 64GB, NVMe

The same workload that costs $2,000/month on AWS on-demand runs for $500-600/month on Hetzner. The gap grows with data transfer and managed service costs.

True Cost of Cloud

Cloud bills have multipliers that the headline instance price hides:

Data transfer (egress):

AWS: $0.09/GB out to internet
At 10TB/month: $900/month just for egress

Hetzner: 20TB included in VPS plans, $1/TB after
Cloudflare (for static): free egress

Managed services premium:

RDS db.t3.medium (2 vCPU, 4GB):
  AWS RDS:        $60/month
  Self-hosted Postgres on $6 VPS: $6/month

ElastiCache cache.t3.micro (1 vCPU, 0.5GB):
  AWS ElastiCache: $25/month
  Self-hosted Redis on shared VPS: ~$5/month amortized

Operational overhead of self-hosting:

Self-hosted Postgres:
  Backup setup: 4 hours
  Monitoring setup: 4 hours
  Ongoing ops: 1-2 hours/month

RDS buys back this time — worth it until you're large enough to hire DBAs

The Real Decision Framework

Use cloud when:

  • Team is small and ops bandwidth is limited — managed services are worth the premium
  • Traffic is spiky or unpredictable — auto-scaling and pay-per-use matter
  • You need global regions quickly
  • You’re early and burning runway — time-to-market beats cost optimization
  • You need specific managed services (ML, analytics, compliance tools)

Use VPS when:

  • Steady, predictable load
  • You have ops bandwidth to manage your own infra
  • Cost matters — common at Series A and beyond
  • You want simplicity without the complexity of cloud primitives

Use bare metal when:

  • High, sustained compute need (ML training, video transcoding, large databases)
  • Your team has infrastructure engineering capacity
  • You’ve verified the workload — no over-provisioning on unused capacity
  • Per-core performance matters (no virtualization overhead)

Spot / Preemptible Instances

Cloud providers sell excess capacity at 70-90% discount as spot (AWS) or preemptible (GCP) instances — but they can be terminated with 2 minutes warning.

Good uses:

  • Stateless workers pulling from a queue (a terminated worker just loses its current job, which retries)
  • Batch processing jobs that checkpoint progress
  • CI/CD runners

Bad uses:

  • Primary database — termination mid-write causes corruption
  • Stateful services with no fast failover
  • Jobs longer than 2 minutes without checkpointing
# Kubernetes: mix of on-demand and spot
nodeGroups:
  - name: on-demand
    instanceType: m5.xlarge
    minSize: 2         # always-on baseline
    maxSize: 10

  - name: spot
    instanceTypes: [m5.xlarge, m5.2xlarge, m4.xlarge]
    spot: true
    minSize: 0
    maxSize: 20        # burst on spot
    taints:
      - key: spot
        effect: NoSchedule

Rightsizing in Practice

Most teams overprovision by 2-4x. Measure first:

# AWS Cost Explorer: rightsizing recommendations
aws ce get-rightsizing-recommendation \
  --service EC2 \
  --configuration '{"RecommendationTarget": "SAME_INSTANCE_FAMILY"}'

# Actual CPU use across your fleet
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --period 3600 \
  --statistics Average Maximum \
  --dimensions Name=InstanceId,Value=i-xxxx

If your instances run at 10-15% CPU, you’re massively overprovisioned. Downsize or pack more workloads per instance.

Packing workloads (multi-tenancy on a single host):

Instead of: 4 × m5.xlarge (4 vCPU, 16GB each) for 4 services at 20% CPU
Consider:   1 × m5.4xlarge (16 vCPU, 64GB) running all 4 services

Cost: ~4x cheaper, same total resources
Trade-off: one noisy neighbor affects all, less blast radius isolation

Reserved Instance Math

If you can predict your baseline compute need, commit to 1 or 3-year reserved instances:

On-demand m5.xlarge: $0.192/hr = $140/month
1-year reserved:     $0.122/hr = $89/month  (35% savings)
3-year reserved:     $0.077/hr = $56/month  (60% savings)

Annual savings on a single instance: $612 (1yr) or $1,008 (3yr)
For 10 instances: $6,120 or $10,080/year

Buy reserved instances for your steady-state baseline. Use on-demand or spot for burst above baseline.

Cost Per Request as a North Star

Instead of tracking raw cloud spend, track cost per 1000 requests (CPR):

Monthly spend: $5,000
Monthly requests: 50,000,000

CPR = $5,000 / 50,000 (thousands) = $0.10 per 1000 requests

As you scale, CPR should decrease (economies of scale). If CPR is flat or rising, your architecture isn’t scaling efficiently — investigate query costs, caching, or instance types.

// Dashboard metric
const costPerThousandRequests =
  totalMonthlyCostUsd / (totalMonthlyRequests / 1000);

Track this weekly. It tells you whether your infrastructure spend is growing proportionally to usage (expected) or faster (a problem).