Best Microservices Design Patterns Every Dev Must Know

Microservices create distributed system problems that monoliths never face: cascading failures when one service goes down, data consistency challenges across service boundaries, and debugging complexity when requests traverse five services. Teams that build microservices without established design patterns spend months solving problems that have known solutions. The difference between a resilient microservices system and a fragile distributed monolith often comes down to implementing the right patterns at the right time.

This guide covers the essential microservices patterns that prevent common failure modes in production systems. You'll learn when to use each pattern, how to implement them in Node.js and other stacks, and what tradeoffs they involve. These patterns come from analyzing production systems at companies running microservices at scale — patterns that solve real problems, not theoretical exercises.

We'll cover communication patterns, resilience patterns, data management patterns, and deployment patterns that every microservices developer needs in their toolkit.

API Gateway Pattern

The API Gateway pattern provides a single entry point for all client requests to your microservices system. Instead of clients calling five different services directly, they call one gateway that routes requests to appropriate services. This centralizes cross-cutting concerns like authentication, rate limiting, request logging, and response caching that would otherwise need implementation in every service.

Without an API gateway, mobile apps need to know about every service's location and API contract. When you add a new service or change a service's API, you must update all clients. When you need to implement rate limiting, you implement it in every service. The gateway eliminates this duplication and client coupling.

The pattern works by placing a reverse proxy between clients and services. The gateway authenticates incoming requests, checks rate limits, then forwards requests to the appropriate backend service. It aggregates responses from multiple services if needed, returning a single response to the client.

Implementation Approaches

Cloud providers offer managed API gateway services: AWS API Gateway, Google Cloud API Gateway, Azure API Management. These handle routing, authentication, rate limiting, and monitoring without custom code. The tradeoff is vendor lock-in and cost at scale, since managed gateways often bill by request volume.

Open-source gateways like Kong, Tyk, or KrakenD provide more control and can run anywhere. You configure routing rules, authentication methods, and plugins through configuration files or APIs. These require managing infrastructure but avoid per-request pricing.

// Express-based API Gateway example
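const express = require('express');
const rateLimit = require('express-rate-limit'); // assumed rate limiter package
const httpProxy = require('http-proxy-middleware');

const app = express();

// Authentication middleware (placeholder; swap in real token validation)
function authenticateRequest(req, res, next) {
    if (!req.headers.authorization) {
        return res.status(401).json({ error: 'Missing credentials' });
    }
    next();
}
app.use(authenticateRequest);

// Rate limiting middleware: 100 requests per minute per client
app.use(rateLimit({ windowMs: 60000, max: 100 }));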

// Route to user service
app.use('/api/users', httpProxy.createProxyMiddleware({
    target: 'http://user-service:3001',
    changeOrigin: true
}));

// Route to order service
app.use('/api/orders', httpProxy.createProxyMiddleware({
    target: 'http://order-service:3002',
    changeOrigin: true
}));

// Route to payment service
app.use('/api/payments', httpProxy.createProxyMiddleware({
    target: 'http://payment-service:3003',
    changeOrigin: true
}));

app.listen(3000);

Gateway Aggregation

API gateways can aggregate data from multiple services into a single response. A product detail page needs data from the product service, inventory service, and review service. Without aggregation, the client makes three requests. With aggregation, the gateway makes three backend requests and combines results into one response.

This reduces client-side complexity and mobile data usage, but adds latency if backend calls are sequential. Implement parallel requests where possible. If the three services don't depend on each other, fetch data concurrently using Promise.all().

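// Gateway aggregation pattern
app.get('/api/products/:id/details', async (req, res) => {
    const productId = encodeURIComponent(req.params.id);

    // Helper: fetch JSON and treat non-2xx responses as failures
    const getJson = async (url) => {
        const response = await fetch(url);
        if (!response.ok) throw new Error(`${url} responded with ${response.status}`);
        return response.json();
    };

    try {
        // Fetch from multiple services in parallel
        const [product, inventory, reviews] = await Promise.all([
            getJson(`http://product-service/api/products/${productId}`),
            getJson(`http://inventory-service/api/inventory/${productId}`),
            getJson(`http://review-service/api/reviews/${productId}`)
        ]);

        // Aggregate into single response
        res.json({ product, inventory, reviews });
    } catch (error) {
        // Any backend failure fails the whole aggregate
        res.status(502).json({ error: 'Upstream service error' });
    }
});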
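
Promise.all() rejects as soon as any one request fails, so a single unavailable service fails the whole aggregate. Where partial data is acceptable, Promise.allSettled() lets the gateway return what it has. The sketch below is one possible shape, not a definitive implementation: the `aggregate` helper and the `{ data, errors }` result format are assumptions made for illustration.

```javascript
// Hypothetical helper: run named fetchers concurrently and tolerate failures.
// `fetchers` maps a response field name to a function returning a promise.
async function aggregate(fetchers) {
    const names = Object.keys(fetchers);

    // The async wrapper turns synchronous throws into rejections as well
    const settled = await Promise.allSettled(
        names.map(async (name) => fetchers[name]())
    );

    const data = {};
    const errors = {};
    settled.forEach((result, i) => {
        const name = names[i];
        if (result.status === 'fulfilled') {
            data[name] = result.value;
        } else {
            data[name] = null; // missing section; the client decides how to render it
            errors[name] = result.reason instanceof Error
                ? result.reason.message
                : String(result.reason);
        }
    });

    return { data, errors };
}
```

A gateway handler could pass one fetcher per backend service and return `data` even when `errors` is non-empty, for example rendering the product page without reviews.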

Warning:

Avoid building overly smart gateways that contain business logic. The gateway should route, authenticate, and aggregate — not make business decisions. If you find yourself writing complex logic in the gateway, that logic probably belongs in a service. Smart gateways become bottlenecks that slow down all feature development.

Circuit Breaker Pattern

The circuit breaker pattern prevents cascading failures when a service becomes unhealthy. When Service A calls Service B and Service B is down, naive retry logic hammers the failing service with requests, wasting resources and delaying error responses. The circuit breaker detects repeated failures and stops making calls, returning errors immediately until the service recovers.

The pattern works like an electrical circuit breaker. In the closed state, requests flow normally. When failures exceed a threshold (e.g., 50% of requests fail in 10 seconds), the breaker trips to the open state and immediately returns errors without calling the service. After a timeout period, it enters half-open state and tries a few requests. If those succeed, it closes again. If they fail, it opens again.
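
The state transitions described above can be sketched in a few lines. This is an illustration only, under simplifying assumptions: it counts consecutive failures rather than a failure percentage over a time window, and it lets every caller through in the half-open state. The class name `SimpleCircuitBreaker` and its options are hypothetical.

```javascript
// Minimal circuit breaker sketch to illustrate the three states.
class SimpleCircuitBreaker {
    constructor(action, { failureThreshold = 3, resetTimeout = 30000, now = Date.now } = {}) {
        this.action = action;
        this.failureThreshold = failureThreshold; // consecutive failures before opening
        this.resetTimeout = resetTimeout;         // ms to wait before a trial call
        this.now = now;                           // injectable clock, useful in tests
        this.state = 'CLOSED';
        this.failures = 0;
        this.openedAt = 0;
    }

    async fire(...args) {
        if (this.state === 'OPEN') {
            if (this.now() - this.openedAt < this.resetTimeout) {
                throw new Error('Circuit is open'); // fail fast, no downstream call
            }
            this.state = 'HALF_OPEN'; // timeout elapsed: allow a trial call
        }

        try {
            const result = await this.action(...args);
            this.state = 'CLOSED'; // any success closes the circuit
            this.failures = 0;
            return result;
        } catch (error) {
            this.failures += 1;
            // A failed trial call, or too many failures, opens the circuit
            if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
                this.state = 'OPEN';
                this.openedAt = this.now();
            }
            throw error;
        }
    }
}
```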

This prevents resource exhaustion. Without circuit breakers, threads wait for timeouts on calls to a dead service. With circuit breakers, failed calls return immediately, freeing threads for healthy requests. The system degrades gracefully rather than cascading failures across all services.

Implementation with opossum

Use libraries rather than implementing circuit breakers from scratch. In Node.js, opossum provides production-ready circuit breaker functionality. In Java, use Resilience4j; Netflix Hystrix is in maintenance mode and is no longer a good choice for new projects. These libraries handle the state machine logic, metrics collection, and configuration.

const CircuitBreaker = require('opossum');

// Function to call external service
async function callUserService(userId) {
    const response = await fetch(`http://user-service/api/users/${userId}`);
    if (!response.ok) throw new Error('User service error');
    return response.json();
}

// Wrap in circuit breaker
const breaker = new CircuitBreaker(callUserService, {
    timeout: 3000,        // Timeout after 3 seconds
    errorThresholdPercentage: 50,  // Open after 50% failures
    resetTimeout: 30000   // Try again after 30 seconds
});

// Monitor circuit breaker events
breaker.on('open', () => console.log('Circuit opened'));
breaker.on('halfOpen', () => console.log('Circuit half-open'));
breaker.on('close', () => console.log('Circuit closed'));

// Use the circuit breaker inside a route handler (assumes an Express `app`)
app.get('/api/users/:id', async (req, res) => {
    try {
        const user = await breaker.fire(req.params.id);
        res.json(user);
    } catch (err) {
        // Circuit open, call timed out, or service failed
        res.status(503).json({ error: 'User service unavailable' });
    }
});

Fallback Strategies

When the circuit opens, return fallback responses instead of errors. If the recommendation service is down, return popular items instead of personalized recommendations. If the review service is down, return "Reviews temporarily unavailable" instead of failing the entire product page. Degrade gracefully.

Implement tiered fallbacks. First try the live service. If that fails, try a cache. If the cache is empty, return a default. This maximizes the chance of returning useful data even when dependencies fail.

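// `recommendationBreaker` is assumed to wrap the recommendation service call,
// the same way `breaker` wraps the user service call
async function getProductRecommendations(userId) {
    try {
        // Try live service
        return await recommendationBreaker.fire(userId);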
    } catch (error) {
        // Circuit open or service failed, try cache
        const cached = await cache.get(`recommendations:${userId}`);
        if (cached) return cached;

        // No cache, return popular items
        return getPopularItems();
    }
}

Service Discovery Pattern

Service discovery solves the problem of services finding each other in dynamic environments. When you deploy Service A, how does it know where Service B is running? In static environments, you hardcode URLs. In dynamic environments with auto-scaling and container orchestration, service locations change constantly.

The pattern uses a service registry that tracks which services are running and where. Services register themselves on startup and deregister on shutdown. Client services query the registry to find available instances. If a service instance fails, the registry removes it and clients automatically route to healthy instances.

Two main approaches exist: client-side discovery where services query the registry directly, and server-side discovery where a load balancer queries the registry. Kubernetes uses server-side discovery built into its service abstraction. Service mesh tools like Istio handle discovery transparently.

Using Consul for Service Discovery

Consul provides service registration, health checking, and DNS-based discovery. Services register themselves with Consul on startup. Consul performs health checks and removes unhealthy instances. Clients query Consul to find healthy service instances.

const Consul = require('consul');
const consul = new Consul();

// Register service on startup
async function registerService() {
    await consul.agent.service.register({
        name: 'user-service',
        id: 'user-service-1',
        address: 'localhost',
        port: 3001,
        check: {
            http: 'http://localhost:3001/health',
            interval: '10s'
        }
    });
}

// Discover service instances
async function findUserService() {
    const result = await consul.health.service({
        service: 'user-service',
        passing: true  // Only healthy instances
    });

    const instances = result.map(r => ({
        address: r.Service.Address,
        port: r.Service.Port
    }));

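    if (instances.length === 0) {
        throw new Error('No healthy user-service instances found');
    }

    // Random selection spreads load across healthy instances
    return instances[Math.floor(Math.random() * instances.length)];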
}

// Use discovered service
async function callUserService(userId) {
    const instance = await findUserService();
    const url = `http://${instance.address}:${instance.port}/api/users/${userId}`;
    return fetch(url);
}
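
The lookup above picks an instance at random. Round-robin is a common alternative that cycles through instances in order. A minimal sketch, assuming the caller passes in the current list of healthy instances on every call; `createRoundRobin` is a hypothetical helper, not part of the Consul client.

```javascript
// Hypothetical round-robin selector over a list of discovered instances.
function createRoundRobin() {
    let next = 0; // index of the instance to hand out on the next call

    return function pick(instances) {
        if (instances.length === 0) {
            throw new Error('No healthy instances available');
        }
        // The modulo keeps the index valid even if the list shrank since last call
        const instance = instances[next % instances.length];
        next = (next + 1) % instances.length;
        return instance;
    };
}
```

Each client keeps its own counter, so the load is spread evenly per client rather than globally across all clients.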

Kubernetes Service Discovery

Kubernetes provides built-in service discovery through Services and DNS. Define a Service resource that selects pods by label. Kubernetes automatically creates DNS records and load balances traffic to healthy pods. Services discover each other using DNS names like user-service.default.svc.cluster.local.

This simplifies service discovery for Kubernetes deployments. No additional infrastructure is needed. Pods join a Service's endpoints automatically once their readiness probes pass, and they are removed when the probes fail. DNS resolution happens inside the cluster, so lookups are fast.
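
As a small illustration of the DNS naming scheme, a caller only needs the Service name and namespace to build a URL. The `serviceUrl` helper and its defaults are assumptions for this sketch; `cluster.local` is the default cluster domain and may differ in your cluster.

```javascript
// Hypothetical helper: build the in-cluster URL for a Kubernetes Service.
function serviceUrl(name, { namespace = 'default', port = 80, clusterDomain = 'cluster.local' } = {}) {
    // Pattern: <service>.<namespace>.svc.<cluster-domain>
    return `http://${name}.${namespace}.svc.${clusterDomain}:${port}`;
}

// Example: address of the user service in the "default" namespace
const userServiceUrl = serviceUrl('user-service', { port: 3001 });
```

Within the same namespace the short name (`http://user-service:3001`) resolves as well, which is what the gateway example earlier relies on.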

Pro Tip:

If you're running on Kubernetes, use the built-in service discovery rather than adding Consul or Eureka. The additional complexity isn't justified unless you have multi-cluster requirements. For non-Kubernetes environments, Consul is a mature and widely deployed service discovery option.

Saga Pattern for Distributed Transactions

The Saga pattern handles transactions that span multiple services. Traditional ACID transactions don't work across service boundaries because each service owns its database. When an order requires updating inventory, processing payment, and creating a shipment record, you can't use a single database transaction.

A saga coordinates these operations through a sequence of local transactions. Each service performs its transaction and publishes an event. If a later step fails, the services that already completed their steps run compensating transactions to undo those changes. This creates eventual consistency rather than immediate consistency.

Two implementation approaches exist: choreography where services react to events without central coordination, and orchestration where a coordinator service manages the workflow. Choreography is more decoupled but harder to monitor. Orchestration is easier to understand but creates a central point of failure.

Choreography-Based Saga

In choreography, services publish and subscribe to events without a coordinator. The order service publishes OrderCreated. The inventory service consumes it, reserves items, and publishes InventoryReserved. The payment service consumes that, charges the payment method, and publishes PaymentProcessed. Each service knows what to do when it receives specific events.

// Order Service: Start the saga
async function createOrder(orderData) {
    const order = await db.orders.create({
        ...orderData,
        status: 'PENDING'
    });

    await eventBus.publish('OrderCreated', {
        orderId: order.id,
        items: order.items,
        userId: order.userId,
        total: order.total
    });

    return order;
}

// Inventory Service: Reserve items
eventBus.subscribe('OrderCreated', async (event) => {
    try {
        await db.inventory.decrementStock(event.items);

        await eventBus.publish('InventoryReserved', {
            orderId: event.orderId,
            items: event.items
        });
    } catch (error) {
        // Compensation: Cancel order
        await eventBus.publish('InventoryReservationFailed', {
            orderId: event.orderId,
            reason: error.message
        });
    }
});

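// Payment Service: Process payment
eventBus.subscribe('InventoryReserved', async (event) => {
    try {
        await processPayment(event.orderId);

        await eventBus.publish('PaymentProcessed', {
            orderId: event.orderId
        });
    } catch (error) {
        // Signal failure so earlier steps can compensate
        await eventBus.publish('PaymentFailed', {
            orderId: event.orderId,
            items: event.items
        });
    }
});

// Inventory Service: Compensation, release the reserved items
// (incrementStock is an assumed counterpart to decrementStock)
eventBus.subscribe('PaymentFailed', async (event) => {
    await db.inventory.incrementStock(event.items);
});

// Order Service: Handle success
eventBus.subscribe('PaymentProcessed', async (event) => {
    await db.orders.update(event.orderId, {
        status: 'CONFIRMED'
    });
});

// Order Service: Handle failures from either step
const cancelOrder = async (event) => {
    await db.orders.update(event.orderId, {
        status: 'CANCELLED'
    });
};
eventBus.subscribe('PaymentFailed', cancelOrder);
eventBus.subscribe('InventoryReservationFailed', cancelOrder);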

Orchestration-Based Saga

In orchestration, a saga orchestrator coordinates the workflow. It knows the sequence of steps and which service handles each. If a step fails, the orchestrator executes compensation steps in reverse order. This centralizes workflow logic, making it easier to understand and modify.

// Saga Orchestrator
class OrderSaga {
    async execute(orderData) {
        const sagaId = generateId();
        const steps = [];

        try {
            // Step 1: Create order
            const order = await orderService.create(orderData);
            steps.push({ service: 'order', action: 'create', orderId: order.id });

            // Step 2: Reserve inventory
            await inventoryService.reserve(order.items);
            steps.push({ service: 'inventory', action: 'reserve', items: order.items });

            // Step 3: Process payment
            await paymentService.charge(order.userId, order.total);
            steps.push({ service: 'payment', action: 'charge', userId: order.userId, amount: order.total });

            // Step 4: Confirm order
            await orderService.confirm(order.id);

            return { success: true, orderId: order.id };

        } catch (error) {
            // Compensation: Roll back completed steps in reverse
            await this.compensate(steps.reverse());
            return { success: false, error: error.message };
        }
    }

    async compensate(steps) {
        for (const step of steps) {
            switch (step.service) {
                case 'payment':
                    // Refund the amount that was charged (refund signature is assumed)
                    await paymentService.refund(step.userId, step.amount);
                    break;
                case 'inventory':
                    await inventoryService.release(step.items);
                    break;
                case 'order':
                    await orderService.cancel(step.orderId);
                    break;
            }
        }
    }
}

CQRS Pattern

Command Query Responsibility Segregation (CQRS) separates read and write operations into different models. Commands change state (create order, update user). Queries retrieve state (get order details, list users). By separating these concerns, you can optimize each independently.

The write model focuses on validation, business rules, and data integrity. The read model focuses on query performance and denormalized views optimized for specific use cases. For example, the write model maintains normalized order tables. The read model maintains a denormalized "order dashboard" view that joins data for fast rendering.

This pattern shines in systems with different read and write characteristics. An e-commerce system might have 100 reads for every write. CQRS lets you scale read replicas independently, use different databases optimized for each operation, and implement aggressive caching on the read side.

Basic CQRS Implementation

Start with separate code paths for commands and queries, even if they use the same database. Commands go through domain models with business logic. Queries go directly to database views optimized for reading. This establishes the pattern before adding infrastructure complexity.

// Command: Create order (goes through domain model)
class CreateOrderCommand {
    async execute(orderData) {
        // Business logic and validation
        if (orderData.items.length === 0) {
            throw new Error('Order must have items');
        }

        // Calculate total
        const total = orderData.items.reduce((sum, item) =>
            sum + (item.price * item.quantity), 0
        );

        // Write to database
        const order = await db.orders.create({
            ...orderData,
            total,
            status: 'PENDING',
            createdAt: new Date()
        });

        // Publish event for read model
        await eventBus.publish('OrderCreated', order);

        return order.id;
    }
}

// Query: Get order details (optimized read path)
class GetOrderQuery {
    async execute(orderId) {
        // Direct SQL read that shapes the data for the page in one query.
        // At scale this would typically target a denormalized view or read replica.
        const order = await db.query(`
            SELECT
                o.id,
                o.status,
                o.total,
                o.created_at,
                u.name as customer_name,
                u.email as customer_email,
                json_agg(json_build_object(
                    'product', p.name,
                    'quantity', oi.quantity,
                    'price', oi.price
                )) as items
            FROM orders o
            JOIN users u ON o.user_id = u.id
            JOIN order_items oi ON oi.order_id = o.id
            JOIN products p ON oi.product_id = p.id
            WHERE o.id = $1
            GROUP BY o.id, u.name, u.email
        `, [orderId]);

        return order;
    }
}

Event Sourcing with CQRS

CQRS pairs naturally with event sourcing. Instead of storing current state, store all events that led to that state. The write model appends events. Read models are projections built by consuming those events, and any of them can be rebuilt from scratch by replaying the stream. This provides complete audit history and enables building multiple read models from the same event stream.

When an order is created, store an OrderCreated event. When it's confirmed, store an OrderConfirmed event. The current order state is derived by replaying these events. Different read models can subscribe to events and build optimized views: one for customer order history, one for warehouse fulfillment, one for accounting.
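
Deriving state by replaying events amounts to folding a reducer over the event list. The sketch below assumes each event is a plain object with a `type` field. The event shapes and the `OrderCancelled` event are illustrative additions, not a prescribed schema.

```javascript
// Apply a single event to the current state and return the next state.
function applyEvent(state, event) {
    switch (event.type) {
        case 'OrderCreated':
            return { id: event.orderId, items: event.items, status: 'PENDING' };
        case 'OrderConfirmed':
            return { ...state, status: 'CONFIRMED' };
        case 'OrderCancelled': // assumed event, added for illustration
            return { ...state, status: 'CANCELLED' };
        default:
            return state; // ignore event types this model does not care about
    }
}

// Rebuild the current order state from its full event history.
function replay(events) {
    return events.reduce((state, event) => applyEvent(state, event), null);
}

const orderState = replay([
    { type: 'OrderCreated', orderId: 'o-1', items: ['book'] },
    { type: 'OrderConfirmed', orderId: 'o-1' }
]);
```

Each read model can define its own `applyEvent`, which is how the same stream yields a customer history view, a fulfillment view, and an accounting view.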

Backend for Frontend (BFF) Pattern

The BFF pattern creates separate backend services tailored to specific frontend needs. Instead of one API serving web, mobile, and third-party clients, create three separate BFFs. Each optimizes responses for its client: the mobile BFF returns minimal data to save bandwidth, the web BFF returns more detailed data, the partner API BFF includes authentication and rate limiting specific to external partners.

This prevents the problem of a bloated universal API that serves every use case poorly. A single API that serves both mobile and web apps ends up with optional fields everywhere to accommodate different client needs. Changes for mobile affect web and vice versa. The BFF pattern decouples frontend evolution from backend services.

BFF Implementation Structure

Each BFF talks to backend microservices and aggregates data appropriately for its client. The mobile BFF might fetch data from five services and return a compact response. The web BFF might fetch from the same services but return more detailed information including nested relationships.

// Mobile BFF: Optimized for limited bandwidth
app.get('/mobile/api/product/:id', async (req, res) => {
    const [product, inventory] = await Promise.all([
        productService.get(req.params.id),
        inventoryService.check(req.params.id)
    ]);

    // Return minimal data for mobile
    res.json({
        id: product.id,
        name: product.name,
        price: product.price,
        image: product.thumbnailUrl,  // Small thumbnail only
        available: inventory.quantity > 0
    });
});

// Web BFF: Full details for desktop
app.get('/web/api/product/:id', async (req, res) => {
    const [product, inventory, reviews, recommendations] = await Promise.all([
        productService.get(req.params.id),
        inventoryService.check(req.params.id),
        reviewService.getRecent(req.params.id, 10),
        recommendationService.getSimilar(req.params.id, 5)
    ]);

    // Return comprehensive data for web
    res.json({
        id: product.id,
        name: product.name,
        description: product.description,  // Full description
        price: product.price,
        images: product.allImages,  // All high-res images
        specifications: product.specs,
        inventory: {
            quantity: inventory.quantity,
            warehouse: inventory.location
        },
        reviews: reviews,
        recommendations: recommendations
    });
});

Data Point:

The BFF pattern was named and popularized by engineers at SoundCloud, and Netflix took a closely related approach to handle different client needs across smart TVs, mobile devices, and web browsers. Each platform has different performance characteristics and user experience requirements. BFFs let teams optimize each platform independently while reusing backend microservices.

Sidecar Pattern

The sidecar pattern deploys helper functionality alongside your main application container. Instead of adding logging, monitoring, and service mesh capabilities to every service's code, deploy them as sidecar containers. This decouples infrastructure concerns from application code and standardizes cross-cutting concerns across all services.

Common sidecar uses include: service mesh proxies (such as Envoy or Linkerd's proxy) that handle service-to-service communication, logging agents that collect and ship logs, monitoring agents that collect metrics, and configuration managers that synchronize configuration from central stores.

In Kubernetes, sidecars deploy as additional containers in the same pod as your application. They share the same network namespace and can access the same volumes, but run separate processes. This provides tight integration without modifying application code.

Service Mesh Sidecar Example

Service meshes like Istio inject sidecar proxies that intercept all network traffic. Your application thinks it's calling another service directly. Actually, the request goes through the sidecar proxy, which handles encryption, authentication, retries, circuit breaking, and observability, then forwards to the destination service's sidecar.

# Kubernetes pod with Istio sidecar
apiVersion: v1
kind: Pod
metadata:
  name: product-service
  labels:
    app: product-service
spec:
  containers:
  # Main application container
  - name: product-service
    image: product-service:1.0
    ports:
    - containerPort: 3000

  # Istio sidecar proxy (normally injected automatically; shown here for illustration)
  - name: istio-proxy
    image: istio/proxyv2:1.14.0
    ports:
    - containerPort: 15001  # Envoy outbound traffic
    - containerPort: 15090  # Prometheus metrics

The application code remains simple, making direct HTTP calls. The sidecar transparently adds mutual TLS, distributed tracing headers, circuit breaking, and metrics collection. This standardizes infrastructure concerns across all services regardless of programming language.

Strangler Fig Pattern

The strangler fig pattern gradually replaces legacy systems by building new functionality alongside old code and incrementally routing traffic to the new implementation. Named after strangler fig trees that grow around host trees, the pattern avoids risky big-bang rewrites.

Implementation routes specific URLs or features to the new microservice while keeping everything else in the legacy system. Use a reverse proxy or API gateway to implement routing rules. Gradually expand the microservice's scope while shrinking the legacy system's scope until the legacy system can be retired.
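
The routing decision at the heart of this pattern can be sketched as a prefix lookup. The routing table below is hypothetical: the service targets mirror the gateway example earlier, and `legacy-monolith:8080` is an assumed address for the legacy system.

```javascript
// Hypothetical routing table: path prefixes already migrated to new services.
const migratedRoutes = [
    { prefix: '/api/users', target: 'http://user-service:3001' },
    { prefix: '/api/orders', target: 'http://order-service:3002' }
];
const LEGACY_TARGET = 'http://legacy-monolith:8080'; // assumed legacy address

// Decide which backend should receive a request for the given path.
function resolveTarget(path, routes = migratedRoutes) {
    const match = routes.find(
        // Match the prefix itself or anything below it, but not "/api/usersettings"
        (route) => path === route.prefix || path.startsWith(route.prefix + '/')
    );
    return match ? match.target : LEGACY_TARGET;
}
```

Migrating another feature means adding one entry to the table, and rolling back means removing it, which is what keeps each step low risk.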

This pattern appeared in the monolith decomposition article but deserves mention here as a fundamental microservices migration pattern. It minimizes risk through incremental change and provides rollback capability at each step.

Database per Service Pattern

Each microservice owns its database and no other service accesses it directly. This provides loose coupling, independent scaling, and technology choice per service. The payment service can use PostgreSQL while the analytics service uses MongoDB and the cache service uses Redis.

The challenge is data consistency and querying across services. If you need to display order information with user details, you can't just join the orders and users tables — they're in different databases. Solutions include: API calls between services (synchronous), event-driven data replication (eventual consistency), or CQRS with separate read models that aggregate data.

Data Synchronization Strategies

When services need data owned by other services, replicate that data through events. The user service publishes UserCreated and UserUpdated events. The order service consumes these events and maintains a local user_cache table with just the fields it needs (user ID, email, name). This denormalized cache enables fast queries without cross-service database access.

// User Service: Publish events on changes
async function updateUser(userId, updates) {
    const user = await db.users.update(userId, updates);

    await eventBus.publish('UserUpdated', {
        userId: user.id,
        email: user.email,
        name: user.name,
        updatedAt: user.updatedAt
    });

    return user;
}

// Order Service: Maintain local user cache
eventBus.subscribe('UserUpdated', async (event) => {
    await db.userCache.upsert({
        userId: event.userId,
        email: event.email,
        name: event.name,
        syncedAt: new Date()
    });
});

// Order Service: Query local cache
async function getOrderWithUser(orderId) {
    const order = await db.orders.findById(orderId);
    const user = await db.userCache.findById(order.userId);

    return { ...order, user };
}

Health Check Pattern

Every microservice should expose health check endpoints that report service status. Load balancers and orchestrators query these endpoints to determine if a service instance is healthy. Unhealthy instances get removed from rotation automatically.

Implement two types of health checks: liveness checks determine if the service is running at all, and readiness checks determine if it's ready to handle traffic. A service might be running (liveness = healthy) but not ready (database connection pool warming up). Don't route traffic until both checks pass.

// Liveness check: Is the service running?
app.get('/health/live', (req, res) => {
    // Simplest check - if this responds, the service is alive
    res.status(200).json({ status: 'alive' });
});

// Readiness check: Can the service handle traffic?
app.get('/health/ready', async (req, res) => {
    try {
        // Check database connection
        await db.raw('SELECT 1');

        // Check required dependencies
        const cacheAvailable = await checkRedis();
        const queueAvailable = await checkMessageQueue();

        if (!cacheAvailable || !queueAvailable) {
            return res.status(503).json({
                status: 'not ready',
                cache: cacheAvailable,
                queue: queueAvailable
            });
        }

        res.status(200).json({ status: 'ready' });
    } catch (error) {
        res.status(503).json({
            status: 'not ready',
            error: error.message
        });
    }
});

Health Check Best Practices

Keep health checks fast — under 100ms. They run frequently (every few seconds) and slow checks waste resources. Don't perform expensive operations like querying large datasets. Just verify that dependencies are reachable.

Include dependency checks in readiness but not liveness. If the database is down, the service should report not ready (don't route traffic) but still alive (don't restart the pod). This distinction matters for orchestrators like Kubernetes that automatically restart pods failing liveness checks.

Return detailed health information in development but minimal information in production. Development health checks can expose database connection details and internal state. Production health checks should return only healthy/unhealthy status to avoid leaking system internals.
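
One way to keep readiness checks inside a time budget is to race each dependency check against a timer and treat a slow answer as unhealthy. This is a sketch under that assumption. `withTimeout` is a hypothetical helper, and the 100ms default simply mirrors the guideline above.

```javascript
// Resolve to false if a dependency check fails or exceeds `ms` milliseconds.
function withTimeout(check, ms = 100) {
    let timer;
    const timeout = new Promise((resolve) => {
        timer = setTimeout(() => resolve(false), ms);
    });

    const attempt = Promise.resolve()
        .then(() => check())
        .then(
            (result) => Boolean(result),
            () => false // errors count as unhealthy
        );

    // Whichever settles first wins; always clear the timer afterwards
    return Promise.race([attempt, timeout]).finally(() => clearTimeout(timer));
}
```

In the readiness handler this could wrap each dependency, for example `await withTimeout(checkRedis)`, so one hung dependency cannot stall the probe.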

Retry Pattern with Exponential Backoff

Network calls between services fail for transient reasons: temporary network issues, service restarts, brief resource exhaustion. The retry pattern handles these gracefully by retrying failed requests instead of immediately returning errors. Exponential backoff prevents retry storms that overwhelm recovering services.

Fixed-interval retries (wait 1 second, retry, wait 1 second, retry) hammer the failing service with constant load. Exponential backoff doubles the wait time after each failure: wait 1 second, then 2 seconds, then 4 seconds, then 8 seconds. This gives the service time to recover while reducing load.

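// Small helper: resolve after the given number of milliseconds
const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms));

// Retry with exponential backoff
async function callWithRetry(fn, maxRetries = 3) {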
    let lastError;

    for (let attempt = 0; attempt < maxRetries; attempt++) {
        try {
            return await fn();
        } catch (error) {
            lastError = error;

            // Don't retry on client errors (4xx)
            if (error.status >= 400 && error.status < 500) {
                throw error;
            }

            // Last attempt, don't wait
            if (attempt === maxRetries - 1) {
                break;
            }

            // Exponential backoff: 1s, 2s, 4s, 8s...
            const delay = Math.pow(2, attempt) * 1000;

            // Add jitter to prevent thundering herd
            const jitter = Math.random() * 1000;

            await sleep(delay + jitter);
        }
    }

    throw lastError;
}

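// Usage: fetch resolves even on HTTP error statuses, so convert them into
// thrown errors that carry `status` for the retry logic to inspect
const user = await callWithRetry(async () => {
    const response = await fetch('http://user-service/api/users/123');
    if (!response.ok) {
        const error = new Error(`Request failed with status ${response.status}`);
        error.status = response.status;
        throw error;
    }
    return response.json();
});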

Idempotency and Retries

Retrying requests safely requires idempotent operations — operations that produce the same result whether executed once or multiple times. GET requests are naturally idempotent. POST requests often aren't — retrying "create order" might create duplicate orders.

Make POST requests idempotent using idempotency keys. Clients generate a unique key (UUID) and include it with the request. The server stores which keys it has processed. If it receives a request with a key it has already processed, it returns the original result instead of creating a duplicate.

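// Client: Generate the idempotency key once and reuse it for every retry of this request
const { randomUUID } = require('node:crypto'); // built into Node.js 14.17+
const idempotencyKey = randomUUID();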
const response = await fetch('http://order-service/api/orders', {
    method: 'POST',
    headers: {
        'Idempotency-Key': idempotencyKey,
        'Content-Type': 'application/json'
    },
    body: JSON.stringify(orderData)
});

// Server: Check for duplicate requests
app.post('/api/orders', async (req, res) => {
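    const idempotencyKey = req.headers['idempotency-key'];

    // Reject requests without a key; duplicates cannot be detected without it
    if (!idempotencyKey) {
        return res.status(400).json({ error: 'Idempotency-Key header is required' });
    }
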
    // Check if we've processed this key before
    const existing = await db.idempotencyKeys.findOne({
        key: idempotencyKey
    });

    if (existing) {
        // Return original result
        return res.json(existing.result);
    }

    // Process the request
    const order = await createOrder(req.body);

    // Store result with idempotency key
    await db.idempotencyKeys.create({
        key: idempotencyKey,
        result: order,
        createdAt: new Date()
    });

    res.json(order);
});
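
One gap in the handler above: two requests carrying the same key can arrive at nearly the same time, both miss the lookup, and both create an order. One way to close that gap, shown here as a minimal single-process sketch, is to record the in-flight promise under the key before doing any work. The `runOnce` helper and the in-memory `Map` are illustrative assumptions; a service running several instances would more likely need a shared store with a unique constraint on the key.

```javascript
// Sketch only: single-process, in-memory idempotency store.
// runOnce and results are hypothetical names, not part of any library.
const results = new Map();

function runOnce(key, operation) {
    // Known key: return the stored promise (finished or still in flight)
    if (results.has(key)) {
        return results.get(key);
    }

    // Store the promise synchronously, before any await, so a concurrent
    // duplicate reuses it instead of running the operation again
    const pending = Promise.resolve()
        .then(operation)
        .catch((error) => {
            // Forget failed attempts so the client can retry with the same key
            results.delete(key);
            throw error;
        });

    results.set(key, pending);
    return pending;
}

// Demo: two concurrent "create order" calls that share one key
async function demo() {
    let ordersCreated = 0;
    const createOrder = async () => ({ orderId: ++ordersCreated });

    const [first, second] = await Promise.all([
        runOnce('key-1', createOrder),
        runOnce('key-1', createOrder)
    ]);

    console.log(first, second, 'orders created:', ordersCreated);
}

demo();
```

Both callers receive the same order object and `createOrder` runs only once.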

Bulkhead Pattern

The bulkhead pattern isolates resources so failures in one area don't affect others. Named after ship bulkheads that contain flooding to specific compartments, the pattern prevents cascading resource exhaustion. If one feature consumes all available threads or database connections, other features continue functioning.

Implementation creates separate resource pools for different operations or clients. A service might allocate 50 threads for API requests and 10 for background jobs. If background jobs hang, they consume only their 10 threads. API requests still have 50 threads available. Without bulkheads, hung background jobs would consume all 60 threads and block API requests.

// Bulkhead pattern using separate pools
// createWorkerPool is a placeholder for your own pool implementation
// (for example, a concurrency limiter); it is not a Node.js built-in

// Separate pools for different operations
const apiWorkerPool = createWorkerPool({ size: 50 });
const backgroundWorkerPool = createWorkerPool({ size: 10 });

// API requests use API pool
app.get('/api/products', async (req, res) => {
    const result = await apiWorkerPool.execute(() =>
        fetchProducts(req.query)
    );
    res.json(result);
});

// Background jobs use background pool
async function processReportGeneration(reportId) {
    const result = await backgroundWorkerPool.execute(() =>
        generateReport(reportId)
    );
    return result;
}

Database Connection Pooling

Database connections are another resource to bulkhead. Create separate connection pools for different workloads: one for user-facing queries (high priority, the largest pool, a short wait time), one for analytics (low priority, a small pool, a long wait time), and one for background jobs (medium priority, a medium-sized pool). Because analytics queries can only draw from their own pool, they cannot exhaust the connections needed for user-facing operations.

Pool Type	Size	Max Wait Time	Use Case
API Pool	20 connections	5 seconds	User-facing requests
Analytics Pool	5 connections	30 seconds	Long-running reports
Background Pool	10 connections	60 seconds	Background jobs
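
The table above could translate into configuration along these lines. This is a sketch, assuming the node-postgres (`pg`) driver, whose `Pool` accepts `max` and `connectionTimeoutMillis` options; the `poolSettings` object and `toPgOptions` helper are hypothetical names, and the numbers are simply the values from the table.

```javascript
// Sketch only: pool sizes and wait times taken from the table above.
// poolSettings and toPgOptions are hypothetical names for illustration.
const poolSettings = {
    api:        { size: 20, maxWaitSeconds: 5 },   // user-facing requests
    analytics:  { size: 5,  maxWaitSeconds: 30 },  // long-running reports
    background: { size: 10, maxWaitSeconds: 60 }   // background jobs
};

// Map one entry to the option names used by node-postgres (pg) pools
function toPgOptions({ size, maxWaitSeconds }) {
    return {
        max: size,                                      // maximum connections in this pool
        connectionTimeoutMillis: maxWaitSeconds * 1000  // how long a request waits for a connection
    };
}

// With pg installed, each workload would get its own pool, for example:
//   const { Pool } = require('pg');
//   const analyticsPool = new Pool(toPgOptions(poolSettings.analytics));
console.log(toPgOptions(poolSettings.analytics));
```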

Frequently Asked Questions

Do I need to implement all these patterns from day one?

No. Start with the essential patterns: API Gateway for routing, Health Checks for monitoring, and Circuit Breakers for resilience. Add other patterns as you encounter the problems they solve. Implementing CQRS before you have read/write scaling needs adds complexity without benefit. Let pain points guide which patterns to adopt.

Which patterns work together and which conflict?

Most patterns complement each other. API Gateway + Circuit Breaker + Retry is a common combination for resilient service communication. CQRS + Event Sourcing + Saga work well together for complex domains. BFF + API Gateway lets you create client-specific gateways. The main conflict is complexity — implementing too many patterns simultaneously creates confusion. Adopt incrementally.

How do I choose between choreography and orchestration for sagas?

Use choreography for simple workflows with 2-3 steps where services are naturally decoupled. Use orchestration for complex workflows with 5+ steps, conditional logic, or requirements to monitor workflow state. Orchestration is easier to understand and debug but creates a central coordinator. Choreography is more decoupled but harder to visualize workflow state.

Should I build my own API Gateway or use a managed service?

For most teams, use a managed service (AWS API Gateway, Google Cloud API Gateway) or mature open-source gateway (Kong, Tyk). Building custom gateways makes sense only if you have very specific requirements that existing solutions don't meet. The operational overhead of managing gateway infrastructure usually exceeds the cost of managed services until you reach significant scale.

How do I implement circuit breakers for database connections?

Most circuit breaker libraries focus on HTTP calls. For databases, implement connection pool monitoring instead. Track failed connection attempts and temporarily stop accepting requests if the database is unreachable. Return cached data or degraded responses during outages. Libraries like Knex.js (Node.js) and HikariCP (Java) provide connection pool health monitoring.
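
As a rough illustration of the idea, the sketch below counts consecutive failures and serves a fallback while the database is considered unreachable. It is a simplified sketch under stated assumptions, not a production breaker; `createDbBreaker`, `failureThreshold`, and `resetMs` are hypothetical names.

```javascript
// Sketch only: a tiny failure-counting breaker around a database call.
// createDbBreaker, failureThreshold and resetMs are hypothetical names.
function createDbBreaker({ failureThreshold = 3, resetMs = 10000, now = Date.now } = {}) {
    let failures = 0;
    let openedAt = null;

    return async function guarded(query, fallback) {
        // While open, skip the database until resetMs has elapsed
        if (openedAt !== null && now() - openedAt < resetMs) {
            return fallback();
        }
        try {
            const result = await query();
            failures = 0;      // a success closes the breaker
            openedAt = null;
            return result;
        } catch (error) {
            failures++;
            if (failures >= failureThreshold) {
                openedAt = now(); // stop hitting the database for a while
            }
            return fallback(error);
        }
    };
}

// Demo: the database is down, so callers get cached data
async function demo() {
    const guarded = createDbBreaker({ failureThreshold: 2 });
    let queries = 0;
    const failingQuery = async () => {
        queries++;
        throw new Error('connection refused');
    };
    const cached = () => ['cached product'];

    for (let i = 0; i < 5; i++) {
        await guarded(failingQuery, cached);
    }
    return queries; // only the first 2 calls reach the database
}

demo().then((queries) => console.log('queries attempted:', queries));
```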

What's the difference between API Gateway and Service Mesh?

API Gateway handles north-south traffic (external clients to services). Service Mesh handles east-west traffic (service-to-service communication). You often use both: API Gateway for client requests, Service Mesh for internal service communication. Service meshes like Istio provide features similar to API gateways (routing, authentication, observability) but for internal traffic.

How do I test saga compensation logic?

Write integration tests that deliberately fail specific saga steps and verify compensation executes correctly. For orchestration sagas, inject failures into service calls and verify the orchestrator runs compensation. For choreography sagas, publish failure events and verify services react appropriately. Chaos engineering tools can help by randomly failing services in test environments.
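
For an orchestration saga, such a test might look like the sketch below. The `runSaga` function is a hypothetical, minimal stand-in for a real orchestrator, and the step names are made up for illustration; the point is the shape of the test: force one step to fail, then assert that the completed steps were compensated in reverse order.

```javascript
const assert = require('node:assert');

// Sketch only: a minimal orchestrator that runs steps in order and,
// on failure, compensates the completed steps in reverse order.
async function runSaga(steps) {
    const completed = [];
    try {
        for (const step of steps) {
            await step.action();
            completed.push(step);
        }
        return { status: 'completed' };
    } catch (error) {
        for (const step of completed.reverse()) {
            await step.compensate();
        }
        return { status: 'compensated', reason: error.message };
    }
}

// Test: force the payment step to fail and check what was undone
async function testPaymentFailureCompensates() {
    const log = [];
    const steps = [
        {
            action: async () => log.push('reserve stock'),
            compensate: async () => log.push('release stock')
        },
        {
            action: async () => log.push('create order'),
            compensate: async () => log.push('cancel order')
        },
        {
            action: async () => { throw new Error('payment declined'); },
            compensate: async () => log.push('refund payment')
        }
    ];

    const result = await runSaga(steps);

    assert.strictEqual(result.status, 'compensated');
    // The failed step is not compensated; earlier steps are undone newest first
    assert.deepStrictEqual(log, [
        'reserve stock', 'create order', 'cancel order', 'release stock'
    ]);
    return log;
}

testPaymentFailureCompensates().then(() => console.log('compensation test passed'));
```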

Can I use CQRS without event sourcing?

Yes. CQRS simply separates command and query models. You can use traditional database storage for both sides. Event sourcing is a common complement to CQRS but not required. Start with CQRS using conventional databases, add event sourcing later if you need audit history or temporal queries.

Conclusion

Microservices design patterns solve common distributed system problems through proven approaches. Start with foundational patterns — API Gateway for routing, Circuit Breaker for resilience, Health Checks for monitoring. Add patterns as you encounter specific problems: Saga for distributed transactions, CQRS for read/write scaling, BFF for client-specific APIs. Each pattern solves specific problems and introduces specific complexity. Adopt patterns incrementally based on measured pain points rather than theoretical benefits.

The most successful microservices systems combine multiple patterns appropriately. API Gateway routes traffic, Circuit Breakers prevent cascading failures, Retry patterns handle transient errors, and Service Discovery enables dynamic environments. Understanding when and how to apply each pattern separates resilient production systems from fragile distributed monoliths.
