
Storage Primitives

Block, file, and object storage — what each abstraction provides, where each breaks down, and how to choose.

Tags: block storage, file storage, object storage, NFS, S3, storage architecture

Real-World Analogy

Three ways to store physical documents: block storage is blank notebooks — raw pages, you organize them however you want. File storage is a filing cabinet — pre-organized into folders and drawers, multiple people can open it at once. Object storage is a flat warehouse with numbered bins — each bin has a label, you fetch the whole bin at once, no filing system, infinite warehouse.

Block Storage

Raw, addressable storage presented as a disk. The operating system handles the filesystem on top.

Application
    ↓
Filesystem (ext4, XFS, APFS)
    ↓
Block device (/dev/sda, /dev/nvme0n1)
    ↓
Physical disk or network-attached block storage (EBS, Longhorn, Ceph RBD)

Characteristics:

  • Low-level — the OS decides how to organize data
  • Random access — seek to any byte position
  • Single-attach — one server mounts a block device at a time (with exceptions)
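  • Lowest latency of the three: the usual choice for databases, VMs, and OS volumes
  • Not inherently sharable: a standard EBS volume attaches to one EC2 instance at a time (io1/io2 Multi-Attach is the exception, and it needs a cluster-aware filesystem)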

Use for: databases (Postgres, MySQL, MongoDB data directories), VM images, OS volumes.
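
To make "random access" concrete, here is a minimal sketch of block-addressed reads and writes. It assumes a 4096-byte block size and uses a temp file as a stand-in for a real device such as /dev/sda, so it runs without root; `writeBlock` and `readBlock` are hypothetical helper names, not part of any library.

```typescript
import { closeSync, fstatSync, mkdtempSync, openSync, readSync, writeSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Assumed block size for this sketch; real devices commonly use 512 or 4096 bytes.
const BLOCK_SIZE = 4096;

// Write one full block at block index `n` without touching any other block.
function writeBlock(fd: number, n: number, data: Buffer): void {
  if (data.length !== BLOCK_SIZE) throw new Error(`block must be ${BLOCK_SIZE} bytes`);
  writeSync(fd, data, 0, BLOCK_SIZE, n * BLOCK_SIZE);
}

// Read one full block at block index `n`.
function readBlock(fd: number, n: number): Buffer {
  const out = Buffer.alloc(BLOCK_SIZE);
  readSync(fd, out, 0, BLOCK_SIZE, n * BLOCK_SIZE);
  return out;
}

// A temp file stands in for the block device (assumption for this sketch).
const devicePath = join(mkdtempSync(join(tmpdir(), 'blockdev-')), 'disk.img');
const fd = openSync(devicePath, 'w+');

writeBlock(fd, 3, Buffer.alloc(BLOCK_SIZE, 0xab)); // jump straight to block 3
const block3 = readBlock(fd, 3);
const block0 = readBlock(fd, 0); // never written, reads back as zeros
const deviceSize = fstatSync(fd).size;
closeSync(fd);
```

A filesystem or database engine does essentially this underneath: it decides which block holds which data and updates blocks in place, which is exactly what object storage cannot do.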

File Storage (Network Filesystem)

A filesystem shared over a network — multiple servers mount the same directory and see the same files.

Server A              Server B
    \                 /
     NFS / SMB / CIFS
         |
    Fileserver (Synology, EFS, Azure Files)

Characteristics:

  • Familiar filesystem API (open, read, write, ls)
  • Multi-mount — many servers read/write simultaneously
  • Distributed locking (with caveats — file locking over NFS is unreliable)
  • Slower than local block (network RTT on every operation)
  • Risky for databases: NFS caching and locking semantics differ from a local POSIX filesystem, which can corrupt data files
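
The per-operation round trip is easy to underestimate, so a back-of-the-envelope calculation helps. The latency figures below are assumptions chosen for illustration, not measurements; substitute numbers from your own network and disks.

```typescript
// Estimate wall-clock time for `ops` sequential operations that each pay a fixed latency.
function sequentialLatencyMs(ops: number, perOpMs: number): number {
  return ops * perOpMs;
}

// Assumed figures for illustration only.
const LOCAL_DISK_MS = 0.1; // hypothetical local block device latency per operation
const NETWORK_RTT_MS = 0.5; // hypothetical same-datacenter round trip

const files = 10_000; // e.g. one metadata lookup per file in a large directory tree
const localMs = sequentialLatencyMs(files, LOCAL_DISK_MS);
const nfsMs = sequentialLatencyMs(files, LOCAL_DISK_MS + NETWORK_RTT_MS);

console.log(`local: ${localMs.toFixed(0)} ms, over NFS: ${nfsMs.toFixed(0)} ms`);
```

Workloads that touch many small files one at a time feel this most; large sequential reads amortize the round trip much better.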

Use for: shared configuration files, legacy application data that expects a filesystem, media files accessed by a rendering cluster, WordPress uploads shared across app servers.

NFS quick setup:

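# Server (run as root)
apt install nfs-kernel-server
mkdir -p /exports/shared   # -p also creates /exports if it is missing
echo "/exports/shared 10.0.0.0/24(rw,sync,no_subtree_check)" >> /etc/exports
exportfs -ra               # re-read /etc/exports and apply the changes

# Client (run as root)
apt install nfs-common
mkdir -p /mnt/shared       # the mount point must exist before mounting
mount -t nfs 10.0.0.10:/exports/shared /mnt/shared

# Add this line to /etc/fstab for persistence
10.0.0.10:/exports/shared /mnt/shared nfs defaults,_netdev 0 0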

Object Storage

Data stored as flat objects (key → binary blob). No hierarchy and no random writes: you write whole objects, and reads fetch an object or a byte range of it.

PUT /bucket/user-avatars/user-123.jpg    (store)
GET /bucket/user-avatars/user-123.jpg    (retrieve)
DELETE /bucket/user-avatars/user-123.jpg (delete)

Characteristics:

  • Effectively unlimited scale: no capacity planning, no “disk full”
  • HTTP API — presigned URLs, direct browser upload
  • Consistency varies by provider: historically eventual, though AWS S3 has been strongly consistent for reads after writes since December 2020
  • Immutable by default — can’t append to an object, must replace it
  • Cheap: about $0.023/GB/month for S3 Standard vs $0.10/GB/month for EBS gp2 (us-east-1 list prices)
  • Not a filesystem — no ls that scales, no random writes, no POSIX

Use for: user uploads (images, documents), media files, backups, static assets, logs, large datasets, database dumps.
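
The flat key → blob model is easier to see in code. The sketch below is an in-memory stand-in for a bucket, not a real client; `FlatBucket` and its methods are hypothetical names used only to show that "folders" are a key-naming convention and that a write always replaces the whole object.

```typescript
// In-memory stand-in for a bucket: a flat map from key to bytes.
class FlatBucket {
  private objects = new Map<string, Uint8Array>();

  put(key: string, body: Uint8Array): void {
    this.objects.set(key, body); // replaces any existing object under this key
  }

  get(key: string): Uint8Array | undefined {
    return this.objects.get(key);
  }

  delete(key: string): boolean {
    return this.objects.delete(key);
  }

  // There are no directories: listing just filters the flat keyspace by prefix.
  list(prefix = ''): string[] {
    return [...this.objects.keys()].filter((k) => k.startsWith(prefix)).sort();
  }
}

const bucket = new FlatBucket();
const bytes = (s: string) => new TextEncoder().encode(s);

bucket.put('user-avatars/user-123.jpg', bytes('v1'));
bucket.put('user-avatars/user-456.jpg', bytes('v1'));
bucket.put('backups/2024-01-01.sql', bytes('dump'));

// Overwrite: the whole object is replaced; there is no partial write.
bucket.put('user-avatars/user-123.jpg', bytes('v2'));

const avatars = bucket.list('user-avatars/');
const current = new TextDecoder().decode(bucket.get('user-avatars/user-123.jpg')!);
```

Real object stores expose the same shape over HTTP, with listing paginated by prefix rather than returned in one call.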

Comparison

               | Block                | File (NFS)             | Object (S3)
Abstraction    | Raw disk             | Filesystem             | Key-value
Access pattern | Random read/write    | Sequential + random    | Full-object GET/PUT
Multi-mount    | No (mostly)          | Yes                    | Yes (HTTP)
Scale          | Finite (volume size) | Finite (NAS capacity)  | Effectively infinite
Speed          | Fastest              | Medium                 | Medium (HTTP overhead)
Cost           | Highest              | Medium                 | Cheapest
Use case       | Databases, VMs       | Legacy apps, shared FS | User files, media, backups

Choosing in Practice

User avatar uploads: object storage. Cheap, scales to millions, serve via CDN, presigned URLs for direct upload from browser.

Database data directory: block storage. The database needs a local filesystem on top of a block device, with low-latency random I/O.

Shared config across app servers: NFS or object storage. For small files read at startup: object storage. For files the app writes to and expects filesystem semantics: NFS.

Logs: object storage. Write sequentially, read rarely, keep for months, cheap.

Video files: object storage. Large, immutable, serve via CDN with range requests.

Temp files for a job: local disk or ephemeral block storage. Cheap, fast, throw away when done.
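
The rules of thumb above can be condensed into a small decision function. This is one possible encoding, not a definitive policy; the `Workload` fields and `chooseStorage` are hypothetical names invented for the sketch.

```typescript
type StorageKind = 'block' | 'file' | 'object';

// Hypothetical workload description; the field names are assumptions for this sketch.
interface Workload {
  needsRandomWrites: boolean; // in-place updates, e.g. a database data directory
  sharedAcrossServers: boolean; // more than one server reads or writes the data
  needsFilesystemApi: boolean; // app expects open/read/write, locks, or append
}

function chooseStorage(w: Workload): StorageKind {
  // Shared data that still needs filesystem semantics: network filesystem.
  if (w.sharedAcrossServers && (w.needsFilesystemApi || w.needsRandomWrites)) return 'file';
  // Single-server data with random I/O or filesystem needs: block device or local disk.
  if (w.needsRandomWrites || w.needsFilesystemApi) return 'block';
  // Everything else is whole-object reads and writes: object storage.
  return 'object';
}

const database = chooseStorage({ needsRandomWrites: true, sharedAcrossServers: false, needsFilesystemApi: true });
const avatarUploads = chooseStorage({ needsRandomWrites: false, sharedAcrossServers: true, needsFilesystemApi: false });
const sharedWritableConfig = chooseStorage({ needsRandomWrites: false, sharedAcrossServers: true, needsFilesystemApi: true });
```

Cost, durability, and latency targets would add more branches in a real decision; the point here is only that sharing and write pattern drive most of the choice.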

The Problem with Local Disk in Distributed Systems

Server 1: user uploads avatar.jpg → stored at /data/uploads/avatar.jpg
Server 2: user requests avatar.jpg → 404 (the file is on server 1)

Every stateless server instance must reach the same storage. Local disk breaks horizontal scaling. Object storage solves this — all servers make HTTP calls to the same endpoint.

// BAD — local disk, breaks with multiple servers
import fs from 'fs/promises';

async function saveAvatar(userId: string, buffer: Buffer) {
  await fs.writeFile(`/data/uploads/${userId}.jpg`, buffer);
}

// GOOD — object storage, works across any number of servers
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';

const s3 = new S3Client({ region: 'us-east-1' });

async function saveAvatar(userId: string, buffer: Buffer) {
  await s3.send(new PutObjectCommand({
    Bucket: 'my-uploads',
    Key: `avatars/${userId}.jpg`,
    Body: buffer,
    ContentType: 'image/jpeg',
  }));
}
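
The 404 scenario can be reproduced without any infrastructure. In this sketch each `AppServer` is a hypothetical stand-in for one application instance, and a `Map` plays the role of its storage: private per server for local disk, a single shared instance for object storage.

```typescript
type FileStore = Map<string, string>;

// Stand-in for one application instance behind a load balancer.
class AppServer {
  private store: FileStore;

  constructor(store: FileStore) {
    this.store = store;
  }

  upload(name: string, data: string): void {
    this.store.set(name, data);
  }

  // Returns an HTTP-style status code for a download request.
  fetch(name: string): number {
    return this.store.has(name) ? 200 : 404;
  }
}

// Local disk: every server gets its own private store.
const local1 = new AppServer(new Map());
const local2 = new AppServer(new Map());
local1.upload('avatar.jpg', 'image bytes');
const localStatus = local2.fetch('avatar.jpg'); // the request lands on the other server

// Shared storage: every server talks to the same store.
const sharedStore: FileStore = new Map();
const shared1 = new AppServer(sharedStore);
const shared2 = new AppServer(sharedStore);
shared1.upload('avatar.jpg', 'image bytes');
const sharedStatus = shared2.fetch('avatar.jpg');

console.log({ localStatus, sharedStatus });
```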

POSIX vs S3 Semantics

File operations you can do on a filesystem that don’t work on S3:

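# POSIX: works on local disk (and on NFS, subject to its locking caveats)
flock -x /data/file.lock -c 'echo "lock held"'   # exclusive lock around a command
echo "new line" >> /data/app.log    # append
tail -f /data/app.log               # follow the file as it grows
find /data -name "*.log"            # recursive directory search
sed -i 's/old/new/' /data/config    # in-place edit

# S3: no locks, no append, no in-place edit; listing is by key prefix only
# To change an object you download it, modify it, and re-upload the whole thing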

If your application does any of these, it can’t use S3 directly. Use a local disk or NFS for that specific workload.
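
For completeness, this is roughly what emulating an in-place edit or an append looks like against a whole-object API. The `ObjectStore` interface, `MemoryStore`, `editObject`, and `appendLine` are hypothetical names for this sketch; the in-memory store is there only so the code runs without a cloud account.

```typescript
// Minimal whole-object interface: no append, no partial write, no lock.
interface ObjectStore {
  get(key: string): Promise<string | undefined>;
  put(key: string, body: string): Promise<void>;
}

// In-memory stand-in for a bucket (assumption for this sketch).
class MemoryStore implements ObjectStore {
  private data = new Map<string, string>();

  async get(key: string): Promise<string | undefined> {
    return this.data.get(key);
  }

  async put(key: string, body: string): Promise<void> {
    this.data.set(key, body);
  }
}

// Rough equivalent of `sed -i`: download, transform locally, upload the full result.
async function editObject(store: ObjectStore, key: string, edit: (body: string) => string): Promise<void> {
  const current = (await store.get(key)) ?? '';
  await store.put(key, edit(current));
}

// Rough equivalent of `>>`: an append is just an edit that rewrites the whole object.
function appendLine(store: ObjectStore, key: string, line: string): Promise<void> {
  return editObject(store, key, (body) => body + line + '\n');
}

async function demo(): Promise<string> {
  const store = new MemoryStore();
  await store.put('config', 'mode=old\n');
  await editObject(store, 'config', (body) => body.replace('old', 'new'));
  await appendLine(store, 'config', 'debug=true');
  return (await store.get('config')) ?? '';
}
```

Every edit transfers the full object twice, and two writers running this concurrently can silently overwrite each other because nothing holds a lock. That cost and that hazard are why these workloads belong on a filesystem.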

Storage Tiers

Cloud object storage offers multiple tiers at different price/access trade-offs:

S3 Standard              $0.023/GB/month     frequently accessed data
S3 Standard-IA           $0.0125/GB/month    infrequent access, per-GB retrieval fee
S3 Glacier               $0.004/GB/month     archive, hours to retrieve
S3 Glacier Deep Archive  $0.00099/GB/month   rare access, up to 12h retrieval

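Use lifecycle policies to move data automatically:
  Logs → Standard (days 0-29) → Standard-IA (from day 30) → Glacier (from day 90) → deleted (day 365)

A lifecycle configuration implementing that schedule (the rule ID and the logs/ prefix are placeholders; every rule needs a filter that selects which objects it applies to):

{
  "Rules": [{
    "ID": "archive-logs",
    "Filter": { "Prefix": "logs/" },
    "Status": "Enabled",
    "Transitions": [
      { "Days": 30, "StorageClass": "STANDARD_IA" },
      { "Days": 90, "StorageClass": "GLACIER" }
    ],
    "Expiration": { "Days": 365 }
  }]
}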
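
To see what the schedule means for a bill, the sketch below maps an object's age to its tier and estimates monthly storage cost. The prices are the per-GB figures from the tier list above and should be treated as illustrative; `tierForAge` and `monthlyCost` are hypothetical helpers, and request, retrieval, and minimum-duration charges are deliberately ignored.

```typescript
type Tier = 'STANDARD' | 'STANDARD_IA' | 'GLACIER';

// Per-GB monthly prices taken from the tier list; illustrative, not a quote.
const PRICE_PER_GB: Record<Tier, number> = {
  STANDARD: 0.023,
  STANDARD_IA: 0.0125,
  GLACIER: 0.004,
};

// Mirrors the lifecycle rule: IA at day 30, Glacier at day 90, expired at day 365.
function tierForAge(ageDays: number): Tier | null {
  if (ageDays >= 365) return null; // the object has been deleted
  if (ageDays >= 90) return 'GLACIER';
  if (ageDays >= 30) return 'STANDARD_IA';
  return 'STANDARD';
}

// Monthly storage cost in dollars for `gb` of data at a given age.
function monthlyCost(gb: number, ageDays: number): number {
  const tier = tierForAge(ageDays);
  return tier === null ? 0 : gb * PRICE_PER_GB[tier];
}

// 1 TB of logs at three different ages.
for (const age of [10, 45, 120]) {
  console.log(`${age} days: ${tierForAge(age)} $${monthlyCost(1000, age).toFixed(2)}/month`);
}
```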