Skip to content
← System Design · advanced · 25 min · 17 / 26

Microservices Patterns

Implement the saga pattern, circuit breakers, service discovery, and distributed transactions.

saga patterncircuit breakerservice discoverydistributed transactions

What are Microservices Patterns?

When you break a monolith into microservices, you trade one set of problems for another. A single database transaction that used to be atomic now spans multiple services. A service call that used to be a function call can now fail due to network issues. Microservices patterns are battle-tested solutions to these distributed systems challenges – the saga pattern for distributed transactions, circuit breakers for fault tolerance, and service discovery for dynamic routing.

Think of it like an orchestra. In a small band, everyone can see each other and stay in sync. But in a 100-piece orchestra, you need a conductor (saga orchestrator) to coordinate, section leaders (circuit breakers) to handle individual failures gracefully, and a seating chart (service registry) so everyone knows where to find each other.

Saga Orchestration Pattern
API Gateway
Entry Point
--->
Saga Orchestrator
Coordinator
v
Order Service
Step 1
--->
Inventory Service
Step 2
--->
Payment Service
Step 3

Real-World Analogy

Real-World Analogy

Like a shopping mall — instead of one mega-store, there are specialized shops (clothing, electronics, food), each independently run with their own staff and inventory.

When you order on Amazon, a single “Place Order” click triggers a multi-step saga across services: the order service creates the order, the inventory service reserves the items, and the payment service charges your card. If payment fails, the saga runs compensating transactions in reverse – unreserving inventory and cancelling the order. Netflix uses circuit breakers so that when their recommendation service goes down, the homepage still loads with a default list instead of showing an error page.

Building a Saga Orchestrator

Here’s a complete saga orchestrator with circuit breakers, service discovery, retry with exponential backoff, and compensating transactions. This implements the full order flow used by companies like Uber and Amazon.

// --- Types ---
type SagaStatus = "pending" | "running" | "completed" | "compensating" | "failed";
type StepStatus = "pending" | "success" | "failed" | "compensated";

interface SagaStep {
  name: string;
  execute: (context: Record<string, unknown>) => Promise<Record<string, unknown>>;
  compensate: (context: Record<string, unknown>) => Promise<void>;
}

interface SagaState {
  id: string;
  status: SagaStatus;
  steps: { name: string; status: StepStatus; error?: string }[];
  context: Record<string, unknown>;
  startedAt: number;
  completedAt?: number;
}

// --- Circuit Breaker ---
enum CircuitState {
  CLOSED = "CLOSED",
  OPEN = "OPEN",
  HALF_OPEN = "HALF_OPEN",
}

class CircuitBreaker {
  private state: CircuitState = CircuitState.CLOSED;
  private failureCount: number = 0;
  private lastFailureTime: number = 0;
  private successCount: number = 0;

  constructor(
    private readonly name: string,
    private readonly failureThreshold: number = 5,
    private readonly resetTimeoutMs: number = 30_000,
    private readonly halfOpenMaxAttempts: number = 3
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === CircuitState.OPEN) {
      if (Date.now() - this.lastFailureTime > this.resetTimeoutMs) {
        this.state = CircuitState.HALF_OPEN;
        this.successCount = 0;
        console.log(`[CIRCUIT:${this.name}] OPEN -> HALF_OPEN`);
      } else {
        throw new Error(`Circuit breaker ${this.name} is OPEN`);
      }
    }

    try {
      const result = await fn();

      if (this.state === CircuitState.HALF_OPEN) {
        this.successCount++;
        if (this.successCount >= this.halfOpenMaxAttempts) {
          this.state = CircuitState.CLOSED;
          this.failureCount = 0;
          console.log(`[CIRCUIT:${this.name}] HALF_OPEN -> CLOSED`);
        }
      } else {
        this.failureCount = 0;
      }

      return result;
    } catch (error) {
      this.failureCount++;
      this.lastFailureTime = Date.now();

      if (this.failureCount >= this.failureThreshold) {
        this.state = CircuitState.OPEN;
        console.log(`[CIRCUIT:${this.name}] -> OPEN (failures: ${this.failureCount})`);
      }

      throw error;
    }
  }

  getState(): CircuitState {
    return this.state;
  }
}

// --- Service Registry ---
interface ServiceInstance {
  id: string;
  name: string;
  url: string;
  healthy: boolean;
  lastHealthCheck: number;
}

class ServiceRegistry {
  private services = new Map<string, ServiceInstance[]>();
  private healthCheckInterval: ReturnType<typeof setInterval> | null = null;

  register(name: string, url: string): string {
    const id = `${name}-${crypto.randomUUID().slice(0, 8)}`;
    const instance: ServiceInstance = {
      id,
      name,
      url,
      healthy: true,
      lastHealthCheck: Date.now(),
    };

    if (!this.services.has(name)) {
      this.services.set(name, []);
    }
    this.services.get(name)!.push(instance);
    console.log(`[REGISTRY] Registered ${name} at ${url} (id: ${id})`);
    return id;
  }

  deregister(id: string): void {
    for (const [name, instances] of this.services) {
      const idx = instances.findIndex((i) => i.id === id);
      if (idx >= 0) {
        instances.splice(idx, 1);
        console.log(`[REGISTRY] Deregistered ${id} from ${name}`);
        if (instances.length === 0) this.services.delete(name);
        return;
      }
    }
  }

  resolve(name: string): ServiceInstance | null {
    const instances = this.services.get(name);
    if (!instances || instances.length === 0) return null;

    // Round-robin among healthy instances
    const healthy = instances.filter((i) => i.healthy);
    if (healthy.length === 0) return null;

    const idx = Math.floor(Math.random() * healthy.length);
    return healthy[idx];
  }

  markUnhealthy(id: string): void {
    for (const instances of this.services.values()) {
      const instance = instances.find((i) => i.id === id);
      if (instance) {
        instance.healthy = false;
        console.log(`[REGISTRY] Marked ${id} as unhealthy`);
        return;
      }
    }
  }

  startHealthChecks(intervalMs: number = 10_000): void {
    this.healthCheckInterval = setInterval(() => {
      for (const instances of this.services.values()) {
        for (const instance of instances) {
          instance.lastHealthCheck = Date.now();
          // In production, you'd make an HTTP call to instance.url/health
          console.log(`[HEALTH] Checking ${instance.id}: ${instance.healthy ? "UP" : "DOWN"}`);
        }
      }
    }, intervalMs);
  }

  stopHealthChecks(): void {
    if (this.healthCheckInterval) clearInterval(this.healthCheckInterval);
  }
}

// --- Retry with exponential backoff ---
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries: number = 3,
  baseDelayMs: number = 1000
): Promise<T> {
  let lastError: Error | undefined;

  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error instanceof Error ? error : new Error(String(error));

      if (attempt === maxRetries) break;

      const delay = baseDelayMs * Math.pow(2, attempt) + Math.random() * 1000;
      console.log(`[RETRY] Attempt ${attempt + 1}/${maxRetries} failed, retrying in ${Math.round(delay)}ms`);
      await new Promise((r) => setTimeout(r, delay));
    }
  }

  throw lastError;
}

// --- Saga Orchestrator ---
class SagaOrchestrator {
  private sagas = new Map<string, SagaState>();
  private circuitBreakers = new Map<string, CircuitBreaker>();

  constructor(private registry: ServiceRegistry) {}

  private getBreaker(name: string): CircuitBreaker {
    if (!this.circuitBreakers.has(name)) {
      this.circuitBreakers.set(name, new CircuitBreaker(name));
    }
    return this.circuitBreakers.get(name)!;
  }

  async execute(sagaId: string, steps: SagaStep[], initialContext: Record<string, unknown> = {}): Promise<SagaState> {
    const state: SagaState = {
      id: sagaId,
      status: "running",
      steps: steps.map((s) => ({ name: s.name, status: "pending" as StepStatus })),
      context: { ...initialContext },
      startedAt: Date.now(),
    };

    this.sagas.set(sagaId, state);
    console.log(`\n[SAGA:${sagaId}] Starting saga with ${steps.length} steps`);

    let completedSteps: number = 0;

    for (let i = 0; i < steps.length; i++) {
      const step = steps[i];
      const breaker = this.getBreaker(step.name);

      console.log(`[SAGA:${sagaId}] Step ${i + 1}/${steps.length}: ${step.name} -> EXECUTING`);

      try {
        const result = await retryWithBackoff(() =>
          breaker.call(() => step.execute(state.context))
        );

        state.context = { ...state.context, ...result };
        state.steps[i].status = "success";
        completedSteps = i + 1;
        console.log(`[SAGA:${sagaId}] Step ${step.name} -> SUCCESS`);
      } catch (error) {
        const errMsg = error instanceof Error ? error.message : String(error);
        state.steps[i].status = "failed";
        state.steps[i].error = errMsg;
        console.log(`[SAGA:${sagaId}] Step ${step.name} -> FAILED: ${errMsg}`);

        // Start compensation
        state.status = "compensating";
        console.log(`[SAGA:${sagaId}] Starting compensation for ${completedSteps} completed steps`);

        for (let j = completedSteps - 1; j >= 0; j--) {
          const compStep = steps[j];
          try {
            console.log(`[SAGA:${sagaId}] Compensating: ${compStep.name}`);
            await compStep.compensate(state.context);
            state.steps[j].status = "compensated";
            console.log(`[SAGA:${sagaId}] ${compStep.name} -> COMPENSATED`);
          } catch (compError) {
            const compErrMsg = compError instanceof Error ? compError.message : String(compError);
            console.error(`[SAGA:${sagaId}] Compensation FAILED for ${compStep.name}: ${compErrMsg}`);
            // In production, alert and queue for manual intervention
          }
        }

        state.status = "failed";
        state.completedAt = Date.now();
        console.log(`[SAGA:${sagaId}] Saga FAILED (took ${state.completedAt - state.startedAt}ms)`);
        return state;
      }
    }

    state.status = "completed";
    state.completedAt = Date.now();
    console.log(`[SAGA:${sagaId}] Saga COMPLETED (took ${state.completedAt - state.startedAt}ms)\n`);
    return state;
  }

  getState(sagaId: string): SagaState | undefined {
    return this.sagas.get(sagaId);
  }
}

// --- Define the Order Saga steps ---
function createOrderSagaSteps(): SagaStep[] {
  return [
    {
      name: "CreateOrder",
      execute: async (ctx) => {
        console.log(`  Creating order for user ${ctx.userId}, items: ${JSON.stringify(ctx.items)}`);
        const orderId = `ORD-${Date.now()}`;
        return { orderId, orderStatus: "created" };
      },
      compensate: async (ctx) => {
        console.log(`  Cancelling order ${ctx.orderId}`);
        // Mark order as cancelled in DB
      },
    },
    {
      name: "ReserveInventory",
      execute: async (ctx) => {
        console.log(`  Reserving inventory for order ${ctx.orderId}`);
        const items = ctx.items as Array<{ sku: string; qty: number }>;
        for (const item of items) {
          console.log(`    Reserving ${item.qty}x ${item.sku}`);
        }
        return { inventoryReserved: true, reservationId: `RES-${Date.now()}` };
      },
      compensate: async (ctx) => {
        console.log(`  Releasing inventory reservation ${ctx.reservationId}`);
      },
    },
    {
      name: "ChargePayment",
      execute: async (ctx) => {
        console.log(`  Charging payment for order ${ctx.orderId}, amount: $${ctx.amount}`);
        // Simulate payment failure for amounts > 1000
        if ((ctx.amount as number) > 1000) {
          throw new Error("Payment declined: insufficient funds");
        }
        return { paymentId: `PAY-${Date.now()}`, charged: true };
      },
      compensate: async (ctx) => {
        console.log(`  Refunding payment ${ctx.paymentId}`);
      },
    },
    {
      name: "ConfirmOrder",
      execute: async (ctx) => {
        console.log(`  Confirming order ${ctx.orderId} (payment: ${ctx.paymentId})`);
        return { orderStatus: "confirmed", confirmedAt: new Date().toISOString() };
      },
      compensate: async (ctx) => {
        console.log(`  Reverting order ${ctx.orderId} confirmation`);
      },
    },
  ];
}

// --- Demo execution ---
async function main(): Promise<void> {
  const registry = new ServiceRegistry();
  registry.register("order-service", "http://localhost:3001");
  registry.register("inventory-service", "http://localhost:3002");
  registry.register("payment-service", "http://localhost:3003");

  const orchestrator = new SagaOrchestrator(registry);
  const steps = createOrderSagaSteps();

  // Successful order
  console.log("=== Scenario 1: Successful Order ===");
  await orchestrator.execute("saga-001", steps, {
    userId: "user-42",
    items: [{ sku: "WIDGET-A", qty: 2 }, { sku: "GADGET-B", qty: 1 }],
    amount: 99.99,
  });

  // Failed order (payment declined) -- triggers compensation
  console.log("=== Scenario 2: Payment Failure with Rollback ===");
  await orchestrator.execute("saga-002", steps, {
    userId: "user-42",
    items: [{ sku: "EXPENSIVE-ITEM", qty: 1 }],
    amount: 5000,
  });

  registry.stopHealthChecks();
}

main().catch(console.error);
package main

import (
	"fmt"
	"math"
	"math/rand"
	"sync"
	"time"
)

// --- Types ---
type SagaStatus string
type StepStatus string

const (
	StatusPending      SagaStatus = "pending"
	StatusRunning      SagaStatus = "running"
	StatusCompleted    SagaStatus = "completed"
	StatusCompensating SagaStatus = "compensating"
	StatusFailed       SagaStatus = "failed"

	StepPending     StepStatus = "pending"
	StepSuccess     StepStatus = "success"
	StepFailed      StepStatus = "failed"
	StepCompensated StepStatus = "compensated"
)

type SagaStep struct {
	Name       string
	Execute    func(ctx map[string]interface{}) (map[string]interface{}, error)
	Compensate func(ctx map[string]interface{}) error
}

type StepState struct {
	Name   string     `json:"name"`
	Status StepStatus `json:"status"`
	Error  string     `json:"error,omitempty"`
}

type SagaState struct {
	ID          string                 `json:"id"`
	Status      SagaStatus             `json:"status"`
	Steps       []StepState            `json:"steps"`
	Context     map[string]interface{} `json:"context"`
	StartedAt   time.Time              `json:"startedAt"`
	CompletedAt *time.Time             `json:"completedAt,omitempty"`
}

// --- Circuit Breaker ---
type CircuitState int

const (
	CircuitClosed CircuitState = iota
	CircuitOpen
	CircuitHalfOpen
)

func (cs CircuitState) String() string {
	switch cs {
	case CircuitClosed:
		return "CLOSED"
	case CircuitOpen:
		return "OPEN"
	case CircuitHalfOpen:
		return "HALF_OPEN"
	default:
		return "UNKNOWN"
	}
}

type CircuitBreaker struct {
	mu               sync.Mutex
	name             string
	state            CircuitState
	failureCount     int
	successCount     int
	failureThreshold int
	resetTimeout     time.Duration
	lastFailureTime  time.Time
	halfOpenMax      int
}

func NewCircuitBreaker(name string, threshold int, resetTimeout time.Duration) *CircuitBreaker {
	return &CircuitBreaker{
		name:             name,
		state:            CircuitClosed,
		failureThreshold: threshold,
		resetTimeout:     resetTimeout,
		halfOpenMax:      3,
	}
}

func (cb *CircuitBreaker) Call(fn func() (map[string]interface{}, error)) (map[string]interface{}, error) {
	cb.mu.Lock()

	if cb.state == CircuitOpen {
		if time.Since(cb.lastFailureTime) > cb.resetTimeout {
			cb.state = CircuitHalfOpen
			cb.successCount = 0
			fmt.Printf("[CIRCUIT:%s] OPEN -> HALF_OPEN\n", cb.name)
		} else {
			cb.mu.Unlock()
			return nil, fmt.Errorf("circuit breaker %s is OPEN", cb.name)
		}
	}
	currentState := cb.state
	cb.mu.Unlock()

	result, err := fn()

	cb.mu.Lock()
	defer cb.mu.Unlock()

	if err != nil {
		cb.failureCount++
		cb.lastFailureTime = time.Now()
		if cb.failureCount >= cb.failureThreshold {
			cb.state = CircuitOpen
			fmt.Printf("[CIRCUIT:%s] -> OPEN (failures: %d)\n", cb.name, cb.failureCount)
		}
		return nil, err
	}

	if currentState == CircuitHalfOpen {
		cb.successCount++
		if cb.successCount >= cb.halfOpenMax {
			cb.state = CircuitClosed
			cb.failureCount = 0
			fmt.Printf("[CIRCUIT:%s] HALF_OPEN -> CLOSED\n", cb.name)
		}
	} else {
		cb.failureCount = 0
	}

	return result, nil
}

// --- Service Registry ---
type ServiceInstance struct {
	ID              string `json:"id"`
	Name            string `json:"name"`
	URL             string `json:"url"`
	Healthy         bool   `json:"healthy"`
	LastHealthCheck time.Time
}

type ServiceRegistry struct {
	mu       sync.RWMutex
	services map[string][]ServiceInstance
	stopCh   chan struct{}
}

func NewServiceRegistry() *ServiceRegistry {
	return &ServiceRegistry{
		services: make(map[string][]ServiceInstance),
		stopCh:   make(chan struct{}),
	}
}

func (sr *ServiceRegistry) Register(name, url string) string {
	sr.mu.Lock()
	defer sr.mu.Unlock()

	id := fmt.Sprintf("%s-%d", name, time.Now().UnixNano()%100000)
	instance := ServiceInstance{
		ID:              id,
		Name:            name,
		URL:             url,
		Healthy:         true,
		LastHealthCheck: time.Now(),
	}
	sr.services[name] = append(sr.services[name], instance)
	fmt.Printf("[REGISTRY] Registered %s at %s (id: %s)\n", name, url, id)
	return id
}

func (sr *ServiceRegistry) Resolve(name string) (*ServiceInstance, error) {
	sr.mu.RLock()
	defer sr.mu.RUnlock()

	instances := sr.services[name]
	if len(instances) == 0 {
		return nil, fmt.Errorf("no instances for service %s", name)
	}

	var healthy []ServiceInstance
	for _, inst := range instances {
		if inst.Healthy {
			healthy = append(healthy, inst)
		}
	}
	if len(healthy) == 0 {
		return nil, fmt.Errorf("no healthy instances for service %s", name)
	}

	picked := healthy[rand.Intn(len(healthy))]
	return &picked, nil
}

func (sr *ServiceRegistry) StartHealthChecks(interval time.Duration) {
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ticker.C:
				sr.mu.RLock()
				for _, instances := range sr.services {
					for _, inst := range instances {
						status := "UP"
						if !inst.Healthy {
							status = "DOWN"
						}
						fmt.Printf("[HEALTH] Checking %s: %s\n", inst.ID, status)
					}
				}
				sr.mu.RUnlock()
			case <-sr.stopCh:
				return
			}
		}
	}()
}

func (sr *ServiceRegistry) Stop() {
	close(sr.stopCh)
}

// --- Retry with exponential backoff ---
func retryWithBackoff(fn func() (map[string]interface{}, error), maxRetries int, baseDelay time.Duration) (map[string]interface{}, error) {
	var lastErr error

	for attempt := 0; attempt <= maxRetries; attempt++ {
		result, err := fn()
		if err == nil {
			return result, nil
		}
		lastErr = err

		if attempt == maxRetries {
			break
		}

		delay := time.Duration(float64(baseDelay) * math.Pow(2, float64(attempt)))
		jitter := time.Duration(rand.Float64() * float64(time.Second))
		sleepTime := delay + jitter

		fmt.Printf("[RETRY] Attempt %d/%d failed, retrying in %v\n", attempt+1, maxRetries, sleepTime)
		time.Sleep(sleepTime)
	}

	return nil, lastErr
}

// --- Saga Orchestrator ---
type SagaOrchestrator struct {
	mu       sync.RWMutex
	sagas    map[string]*SagaState
	breakers map[string]*CircuitBreaker
	registry *ServiceRegistry
}

func NewSagaOrchestrator(registry *ServiceRegistry) *SagaOrchestrator {
	return &SagaOrchestrator{
		sagas:    make(map[string]*SagaState),
		breakers: make(map[string]*CircuitBreaker),
		registry: registry,
	}
}

func (so *SagaOrchestrator) getBreaker(name string) *CircuitBreaker {
	so.mu.Lock()
	defer so.mu.Unlock()
	if cb, ok := so.breakers[name]; ok {
		return cb
	}
	cb := NewCircuitBreaker(name, 5, 30*time.Second)
	so.breakers[name] = cb
	return cb
}

func (so *SagaOrchestrator) Execute(sagaID string, steps []SagaStep, initialCtx map[string]interface{}) *SagaState {
	state := &SagaState{
		ID:        sagaID,
		Status:    StatusRunning,
		Steps:     make([]StepState, len(steps)),
		Context:   copyMap(initialCtx),
		StartedAt: time.Now(),
	}
	for i, step := range steps {
		state.Steps[i] = StepState{Name: step.Name, Status: StepPending}
	}

	so.mu.Lock()
	so.sagas[sagaID] = state
	so.mu.Unlock()

	fmt.Printf("\n[SAGA:%s] Starting saga with %d steps\n", sagaID, len(steps))

	completedSteps := 0

	for i, step := range steps {
		breaker := so.getBreaker(step.Name)
		fmt.Printf("[SAGA:%s] Step %d/%d: %s -> EXECUTING\n", sagaID, i+1, len(steps), step.Name)

		result, err := retryWithBackoff(func() (map[string]interface{}, error) {
			return breaker.Call(func() (map[string]interface{}, error) {
				return step.Execute(state.Context)
			})
		}, 3, time.Second)

		if err != nil {
			state.Steps[i].Status = StepFailed
			state.Steps[i].Error = err.Error()
			fmt.Printf("[SAGA:%s] Step %s -> FAILED: %v\n", sagaID, step.Name, err)

			// Compensate
			state.Status = StatusCompensating
			fmt.Printf("[SAGA:%s] Starting compensation for %d completed steps\n", sagaID, completedSteps)

			for j := completedSteps - 1; j >= 0; j-- {
				compStep := steps[j]
				fmt.Printf("[SAGA:%s] Compensating: %s\n", sagaID, compStep.Name)
				if compErr := compStep.Compensate(state.Context); compErr != nil {
					fmt.Printf("[SAGA:%s] Compensation FAILED for %s: %v\n", sagaID, compStep.Name, compErr)
				} else {
					state.Steps[j].Status = StepCompensated
					fmt.Printf("[SAGA:%s] %s -> COMPENSATED\n", sagaID, compStep.Name)
				}
			}

			state.Status = StatusFailed
			now := time.Now()
			state.CompletedAt = &now
			fmt.Printf("[SAGA:%s] Saga FAILED (took %v)\n", sagaID, now.Sub(state.StartedAt))
			return state
		}

		// Merge result into context
		for k, v := range result {
			state.Context[k] = v
		}
		state.Steps[i].Status = StepSuccess
		completedSteps = i + 1
		fmt.Printf("[SAGA:%s] Step %s -> SUCCESS\n", sagaID, step.Name)
	}

	state.Status = StatusCompleted
	now := time.Now()
	state.CompletedAt = &now
	fmt.Printf("[SAGA:%s] Saga COMPLETED (took %v)\n\n", sagaID, now.Sub(state.StartedAt))
	return state
}

func copyMap(src map[string]interface{}) map[string]interface{} {
	dst := make(map[string]interface{}, len(src))
	for k, v := range src {
		dst[k] = v
	}
	return dst
}

// --- Define Order Saga steps ---
func createOrderSagaSteps() []SagaStep {
	return []SagaStep{
		{
			Name: "CreateOrder",
			Execute: func(ctx map[string]interface{}) (map[string]interface{}, error) {
				fmt.Printf("  Creating order for user %v\n", ctx["userId"])
				orderID := fmt.Sprintf("ORD-%d", time.Now().UnixMilli())
				return map[string]interface{}{"orderId": orderID, "orderStatus": "created"}, nil
			},
			Compensate: func(ctx map[string]interface{}) error {
				fmt.Printf("  Cancelling order %v\n", ctx["orderId"])
				return nil
			},
		},
		{
			Name: "ReserveInventory",
			Execute: func(ctx map[string]interface{}) (map[string]interface{}, error) {
				fmt.Printf("  Reserving inventory for order %v\n", ctx["orderId"])
				resID := fmt.Sprintf("RES-%d", time.Now().UnixMilli())
				return map[string]interface{}{"inventoryReserved": true, "reservationId": resID}, nil
			},
			Compensate: func(ctx map[string]interface{}) error {
				fmt.Printf("  Releasing inventory reservation %v\n", ctx["reservationId"])
				return nil
			},
		},
		{
			Name: "ChargePayment",
			Execute: func(ctx map[string]interface{}) (map[string]interface{}, error) {
				amount, _ := ctx["amount"].(float64)
				fmt.Printf("  Charging payment for order %v, amount: $%.2f\n", ctx["orderId"], amount)
				if amount > 1000 {
					return nil, fmt.Errorf("payment declined: insufficient funds")
				}
				payID := fmt.Sprintf("PAY-%d", time.Now().UnixMilli())
				return map[string]interface{}{"paymentId": payID, "charged": true}, nil
			},
			Compensate: func(ctx map[string]interface{}) error {
				fmt.Printf("  Refunding payment %v\n", ctx["paymentId"])
				return nil
			},
		},
		{
			Name: "ConfirmOrder",
			Execute: func(ctx map[string]interface{}) (map[string]interface{}, error) {
				fmt.Printf("  Confirming order %v (payment: %v)\n", ctx["orderId"], ctx["paymentId"])
				return map[string]interface{}{
					"orderStatus": "confirmed",
					"confirmedAt": time.Now().Format(time.RFC3339),
				}, nil
			},
			Compensate: func(ctx map[string]interface{}) error {
				fmt.Printf("  Reverting order %v confirmation\n", ctx["orderId"])
				return nil
			},
		},
	}
}

// --- Main ---
func main() {
	registry := NewServiceRegistry()
	registry.Register("order-service", "http://localhost:3001")
	registry.Register("inventory-service", "http://localhost:3002")
	registry.Register("payment-service", "http://localhost:3003")

	orchestrator := NewSagaOrchestrator(registry)
	steps := createOrderSagaSteps()

	// Successful order
	fmt.Println("=== Scenario 1: Successful Order ===")
	orchestrator.Execute("saga-001", steps, map[string]interface{}{
		"userId": "user-42",
		"items":  []map[string]interface{}{{"sku": "WIDGET-A", "qty": 2}, {"sku": "GADGET-B", "qty": 1}},
		"amount": 99.99,
	})

	// Failed order -- triggers compensation
	fmt.Println("=== Scenario 2: Payment Failure with Rollback ===")
	orchestrator.Execute("saga-002", steps, map[string]interface{}{
		"userId": "user-42",
		"items":  []map[string]interface{}{{"sku": "EXPENSIVE-ITEM", "qty": 1}},
		"amount": 5000.0,
	})

	registry.Stop()
}

What Makes This Production-Ready

  • Saga orchestration – coordinates multi-step distributed transactions with a clear state machine
  • Compensating transactions – automatic rollback in reverse order when any step fails
  • Circuit breakers – prevents cascading failures by short-circuiting calls to unhealthy services
  • Exponential backoff with jitter – retries failed calls without thundering herd problems
  • Service registry – dynamic service discovery with health checking for resilient routing
  • Full state tracking – every saga step transition is logged for debugging and auditing

Key Takeaways

  • The saga pattern replaces distributed transactions with a sequence of local transactions plus compensating actions
  • Always define compensating transactions for every saga step – they are your rollback mechanism
  • Circuit breakers prevent one failing service from taking down the entire system
  • Exponential backoff with jitter prevents retry storms that would overwhelm recovering services
  • Service discovery enables dynamic scaling – services can be added or removed without configuration changes
  • Log every state transition in your saga for debugging production issues

Real-World Usage

  • Uber uses saga orchestration for ride booking: match driver, authorize payment, start ride, with rollback at every step
  • Netflix pioneered the circuit breaker pattern with Hystrix to handle partial failures across 700+ microservices
  • Amazon uses sagas for order fulfillment across inventory, payment, and shipping services
  • Use sagas when you need consistency across services but cannot use a single database transaction