Top Saga Pattern for Distributed Transactions

Top Saga Pattern for Distributed Transactions

Profile-Image
Bright SEO Tools in saas Published: Apr 04, 2026 | Updated: Apr 04, 2026 · 2 months ago
0:00

Top Saga Pattern for Distributed Transactions

Distributed transactions fail when a single database transaction can't span multiple microservices. The saga pattern solves this by breaking long-lived transactions into a sequence of local transactions, each with a compensating transaction that undoes its changes if a later step fails. This approach trades ACID guarantees for eventual consistency, which is the only viable option when your data lives across service boundaries.

This guide explains how to implement sagas using choreography and orchestration patterns, when each approach makes sense, and how to handle the partial failure scenarios that make distributed transactions difficult. You'll learn how to design compensating transactions that actually work, how to ensure saga execution survives system crashes, and the specific consistency guarantees you can and cannot provide.

We cover implementation patterns in Node.js with event sourcing, saga coordination strategies for complex business workflows, and the monitoring approaches that let you debug failed sagas without drowning in distributed trace logs.

Understanding the Saga Pattern

The saga pattern manages data consistency across services without distributed transactions by coordinating a sequence of local transactions. Each local transaction updates data within a single service and publishes an event or message that triggers the next transaction in the sequence. If any transaction fails, saga executes compensating transactions to undo the impact of preceding transactions.

Traditional ACID transactions guarantee that either all operations succeed or all fail atomically. Distributed transactions attempt to extend this guarantee across service boundaries using two-phase commit protocols, but these protocols introduce blocking, reduce availability, and create coupling between services that defeats the purpose of microservices architecture.

Sagas accept eventual consistency as a necessary tradeoff. During saga execution, the system may be in an inconsistent state, but it eventually reaches consistency when the saga completes or all compensations execute. This means your application must handle intermediate states where some operations succeeded and others haven't yet occurred.

Key Insight: Sagas don't provide isolation. Other transactions can observe partial results before the saga completes. Your application logic must handle reading data that represents an in-progress saga.

Choreography-Based Sagas

Choreography-based sagas coordinate through events. Each service listens for events, performs its local transaction, and publishes an event that triggers the next service. There's no central coordinator; services collaborate by reacting to each other's events.

How Choreography Works

In an order processing saga, the order service creates an order and publishes an OrderCreated event. The payment service listens for OrderCreated, processes payment, and publishes PaymentProcessed. The inventory service listens for PaymentProcessed, reserves items, and publishes InventoryReserved. Each service knows only about the events it publishes and the events it consumes.

When a step fails, the service publishes a failure event. Other services listen for failure events and execute compensating transactions. If payment fails, the payment service publishes PaymentFailed. The order service listens for PaymentFailed and cancels the order.

Implementation with Event Sourcing

// Order Service
class OrderService {
    async createOrder(orderData) {
        const order = await this.db.orders.create({
            ...orderData,
            status: 'PENDING',
            sagaState: 'CREATED'
        });

        await this.eventBus.publish('OrderCreated', {
            orderId: order.id,
            customerId: order.customerId,
            amount: order.amount,
            items: order.items
        });

        return order;
    }

    // Listen for payment events
    async onPaymentProcessed(event) {
        await this.db.orders.update(event.orderId, {
            status: 'PAID',
            sagaState: 'PAYMENT_COMPLETED'
        });
    }

    async onPaymentFailed(event) {
        await this.db.orders.update(event.orderId, {
            status: 'CANCELLED',
            sagaState: 'COMPENSATED',
            reason: event.reason
        });
    }
}

// Payment Service
class PaymentService {
    async processPayment(event) {
        try {
            const payment = await this.paymentGateway.charge({
                customerId: event.customerId,
                amount: event.amount
            });

            await this.db.payments.create({
                orderId: event.orderId,
                transactionId: payment.id,
                status: 'COMPLETED'
            });

            await this.eventBus.publish('PaymentProcessed', {
                orderId: event.orderId,
                paymentId: payment.id,
                amount: event.amount
            });
        } catch (error) {
            await this.eventBus.publish('PaymentFailed', {
                orderId: event.orderId,
                reason: error.message
            });
        }
    }

    // Compensating transaction for payment
    async refundPayment(event) {
        const payment = await this.db.payments.findOne({
            orderId: event.orderId
        });

        if (payment && payment.status === 'COMPLETED') {
            await this.paymentGateway.refund(payment.transactionId);
            await this.db.payments.update(payment.id, {
                status: 'REFUNDED'
            });

            await this.eventBus.publish('PaymentRefunded', {
                orderId: event.orderId,
                paymentId: payment.id
            });
        }
    }
}

// Inventory Service
class InventoryService {
    async reserveInventory(event) {
        try {
            for (const item of event.items) {
                const available = await this.db.inventory.findOne({
                    productId: item.productId
                });

                if (available.quantity < item.quantity) {
                    throw new Error(`Insufficient inventory for ${item.productId}`);
                }

                await this.db.inventory.update({
                    productId: item.productId
                }, {
                    quantity: available.quantity - item.quantity,
                    reserved: available.reserved + item.quantity
                });
            }

            await this.eventBus.publish('InventoryReserved', {
                orderId: event.orderId,
                items: event.items
            });
        } catch (error) {
            // Trigger compensation
            await this.eventBus.publish('InventoryReservationFailed', {
                orderId: event.orderId,
                reason: error.message
            });
        }
    }

    // Compensating transaction for inventory
    async releaseInventory(event) {
        const items = event.items;
        for (const item of items) {
            const current = await this.db.inventory.findOne({
                productId: item.productId
            });

            await this.db.inventory.update({
                productId: item.productId
            }, {
                quantity: current.quantity + item.quantity,
                reserved: current.reserved - item.quantity
            });
        }

        await this.eventBus.publish('InventoryReleased', {
            orderId: event.orderId
        });
    }
}

Each service maintains its own database and publishes events after successfully committing local transactions. The event bus ensures events are delivered at least once, which means services must implement idempotent handlers that produce the same result when processing duplicate events.

Advantages of Choreography

Choreography creates loosely coupled services. No service knows about the complete saga workflow; each service only understands its immediate neighbors through events. This makes it easy to add new services to the saga by subscribing to existing events and publishing new ones.

Services can be developed and deployed independently. Changes to one service's internal logic don't affect other services as long as event contracts remain stable. This independence enables parallel development by different teams.

Choreography naturally supports multiple sagas using the same events. If both order processing and analytics workflows need to react to PaymentProcessed events, they simply subscribe to the same event independently.

Disadvantages of Choreography

Understanding the complete saga flow requires reading code across multiple services. There's no single place that documents which services participate in the saga or in what order. This makes debugging difficult because tracing a saga execution requires correlating logs and events across all services.

Adding new steps to the saga requires changes to multiple services. If you need to add a loyalty points step between payment and inventory, you must modify the payment service to publish a different event, add the loyalty service, and modify the inventory service to listen for the new event instead of PaymentProcessed.

Circular dependencies emerge when compensations trigger more compensations. If inventory failure triggers payment refund, which triggers order cancellation, which triggers a notification that could potentially fail and need compensation, the event chains become difficult to reason about.

Orchestration-Based Sagas

Orchestration-based sagas use a central coordinator that tells each service what operation to perform. The orchestrator maintains the saga state, decides which step executes next, and handles compensation when failures occur.

How Orchestration Works

The saga orchestrator receives a request to start the saga, invokes each service in sequence through synchronous calls or commands, tracks which steps completed successfully, and triggers compensating transactions in reverse order when a step fails.

class OrderSagaOrchestrator {
    constructor(orderService, paymentService, inventoryService, shippingService) {
        this.orderService = orderService;
        this.paymentService = paymentService;
        this.inventoryService = inventoryService;
        this.shippingService = shippingService;
    }

    async execute(orderData) {
        const saga = await this.createSagaRecord(orderData);
        const compensations = [];

        try {
            // Step 1: Create order
            const order = await this.orderService.createOrder(orderData);
            await this.updateSagaState(saga.id, 'ORDER_CREATED', { orderId: order.id });
            compensations.push(() => this.orderService.cancelOrder(order.id));

            // Step 2: Process payment
            const payment = await this.paymentService.processPayment({
                orderId: order.id,
                customerId: order.customerId,
                amount: order.amount
            });
            await this.updateSagaState(saga.id, 'PAYMENT_COMPLETED', { paymentId: payment.id });
            compensations.push(() => this.paymentService.refundPayment(payment.id));

            // Step 3: Reserve inventory
            await this.inventoryService.reserveInventory({
                orderId: order.id,
                items: order.items
            });
            await this.updateSagaState(saga.id, 'INVENTORY_RESERVED');
            compensations.push(() => this.inventoryService.releaseInventory(order.items));

            // Step 4: Create shipment
            const shipment = await this.shippingService.createShipment({
                orderId: order.id,
                address: order.shippingAddress
            });
            await this.updateSagaState(saga.id, 'SHIPMENT_CREATED', { shipmentId: shipment.id });

            // All steps succeeded
            await this.updateSagaState(saga.id, 'COMPLETED');
            return { success: true, orderId: order.id };

        } catch (error) {
            // Execute compensations in reverse order
            console.error('Saga failed, executing compensations:', error);
            await this.compensate(saga.id, compensations);
            await this.updateSagaState(saga.id, 'COMPENSATED', { error: error.message });

            return {
                success: false,
                reason: error.message
            };
        }
    }

    async compensate(sagaId, compensations) {
        // Execute compensations in reverse order (LIFO)
        for (let i = compensations.length - 1; i >= 0; i--) {
            try {
                await compensations[i]();
                await this.updateSagaState(sagaId, `COMPENSATED_STEP_${i}`);
            } catch (error) {
                // Log compensation failure but continue with remaining compensations
                console.error(`Compensation ${i} failed:`, error);
                await this.updateSagaState(sagaId, `COMPENSATION_FAILED_STEP_${i}`, {
                    error: error.message
                });
            }
        }
    }

    async createSagaRecord(orderData) {
        return await this.db.sagas.create({
            type: 'ORDER_PROCESSING',
            status: 'STARTED',
            data: orderData,
            steps: []
        });
    }

    async updateSagaState(sagaId, status, data = {}) {
        await this.db.sagas.update(sagaId, {
            status: status,
            updatedAt: new Date(),
            $push: {
                steps: {
                    status: status,
                    data: data,
                    timestamp: new Date()
                }
            }
        });
    }

    // Recover incomplete sagas after system restart
    async recoverSagas() {
        const incompleteSagas = await this.db.sagas.find({
            status: { $in: ['STARTED', 'ORDER_CREATED', 'PAYMENT_COMPLETED', 'INVENTORY_RESERVED'] }
        });

        for (const saga of incompleteSagas) {
            console.log(`Recovering saga ${saga.id}, status: ${saga.status}`);

            // Determine which compensations need to run based on completed steps
            const compensations = this.buildCompensationsFromSteps(saga.steps);
            await this.compensate(saga.id, compensations);
        }
    }

    buildCompensationsFromSteps(steps) {
        const compensations = [];

        for (const step of steps) {
            if (step.status === 'ORDER_CREATED') {
                compensations.push(() => this.orderService.cancelOrder(step.data.orderId));
            }
            if (step.status === 'PAYMENT_COMPLETED') {
                compensations.push(() => this.paymentService.refundPayment(step.data.paymentId));
            }
            if (step.status === 'INVENTORY_RESERVED') {
                compensations.push(() => this.inventoryService.releaseInventory(step.data.items));
            }
        }

        return compensations;
    }
}

The orchestrator persists saga state before each step execution. If the system crashes, recovery logic reads incomplete sagas from the database and either completes them or runs compensations based on the last recorded state.

Advantages of Orchestration

The complete saga workflow is visible in one place. Reading the orchestrator code shows exactly which services participate, in what order, and what compensations run when failures occur. This centralized view makes saga behavior easy to understand and debug.

Adding or removing steps requires changes only to the orchestrator. Services expose simple operations without needing to understand saga logic or compensation flows. This separation of concerns keeps service implementations focused on business logic.

Orchestration handles complex conditional logic naturally. If the saga flow changes based on order amount, customer type, or inventory availability, the orchestrator implements these decisions without requiring event-based state machines in each service.

Disadvantages of Orchestration

The orchestrator becomes a single point of failure. If the orchestrator service is down, no sagas can execute. This requires running multiple orchestrator instances with shared saga state in a database, adding operational complexity.

Services couple to the orchestrator. Each service needs to expose operations that the orchestrator can call, creating an API contract between the orchestrator and every participating service. Changes to saga steps may require coordinated deployments.

The orchestrator can become a god object that accumulates too much business logic. When orchestrators start making business decisions beyond coordination, they violate the single responsibility principle and become difficult to maintain.

Pro Tip: Use choreography for simple sagas with few steps and clear event flows. Use orchestration for complex sagas with conditional logic, many steps, or when visibility into saga execution is critical for debugging.

Designing Compensating Transactions

Compensating transactions undo the effects of completed saga steps. Designing compensations that actually work requires understanding the semantic meaning of operations, not just reversing database changes.

Semantic Compensations vs Database Rollbacks

Database rollbacks restore previous values, but compensations must make semantic sense in your business domain. If a user places an order and the saga fails after sending a confirmation email, the compensation isn't to un-send the email. The compensation is to send a cancellation email explaining why the order failed.

Consider a hotel booking saga that reserves a room, charges the customer, and sends a confirmation. If the payment fails, the compensation must release the room reservation. But if the system already sent a confirmation email, the compensation should send an apology email rather than pretending the reservation never happened.

Idempotent Compensations

Compensations must be idempotent because saga recovery might execute them multiple times. If the orchestrator crashes after starting a payment refund but before marking it complete, recovery will retry the refund. The refund operation must check whether it already executed and return success without double-refunding.

async function refundPayment(paymentId) {
    const payment = await db.payments.findOne({ id: paymentId });

    // Already refunded, idempotent success
    if (payment.status === 'REFUNDED') {
        return { success: true, alreadyRefunded: true };
    }

    // Can't refund what was never charged
    if (payment.status !== 'COMPLETED') {
        throw new Error('Cannot refund payment that is not completed');
    }

    const refund = await paymentGateway.refund(payment.transactionId);

    await db.payments.update(paymentId, {
        status: 'REFUNDED',
        refundId: refund.id,
        refundedAt: new Date()
    });

    return { success: true, refundId: refund.id };
}

Handling Non-Compensatable Operations

Some operations can't be compensated. Once you send a product shipment notification to a fulfillment center, you can't un-send it. The package might already be on a truck. In these cases, design your saga so non-compensatable operations occur last, after all compensatable operations succeed.

For truly critical operations, use the pivot transaction pattern. The saga executes all compensatable operations first, then executes one non-compensatable operation that serves as the commit point. After this pivot transaction succeeds, the saga only moves forward with operations that don't require compensation.

async function executeOrderSaga(order) {
    try {
        // Compensatable phase
        const payment = await processPayment(order);
        const inventory = await reserveInventory(order);

        // Pivot transaction - non-compensatable
        const shipment = await submitToFulfillmentCenter(order);

        // Post-pivot operations - no compensation needed
        await sendConfirmationEmail(order);
        await recordAnalytics(order);

        return { success: true };
    } catch (error) {
        // Only compensate if we haven't reached the pivot
        if (!shipment) {
            if (inventory) await releaseInventory(order);
            if (payment) await refundPayment(payment);
        }
        throw error;
    }
}

Saga State Management

Sagas must survive system crashes and network failures. Proper state management ensures sagas either complete successfully or compensate fully, never leaving the system in a partially committed state.

Persisting Saga State

Every saga execution needs a persistent record that tracks which steps completed, which are in progress, and what data each step produced. This state enables recovery after crashes.

const sagaSchema = {
    id: String,
    type: String, // 'ORDER_PROCESSING', 'USER_REGISTRATION', etc.
    status: String, // 'STARTED', 'IN_PROGRESS', 'COMPLETED', 'COMPENSATING', 'COMPENSATED', 'FAILED'
    currentStep: String,
    completedSteps: [{
        name: String,
        completedAt: Date,
        data: Object // Data needed for compensation
    }],
    failedStep: {
        name: String,
        error: String,
        failedAt: Date
    },
    input: Object, // Original saga input
    output: Object, // Final saga result
    startedAt: Date,
    completedAt: Date,
    timeout: Date // When to consider this saga abandoned
};

Before executing each step, update the saga record with the step name and status. After successful execution, record the step completion and any data needed for compensation. This write-ahead logging ensures that even if the orchestrator crashes mid-step, recovery logic knows which steps completed.

Saga Recovery

On startup, the orchestrator queries for sagas in non-terminal states (STARTED, IN_PROGRESS, COMPENSATING) and resumes them. The recovery logic must handle multiple scenarios.

async function recoverSagas() {
    const incompleteSagas = await db.sagas.find({
        status: { $in: ['STARTED', 'IN_PROGRESS', 'COMPENSATING'] },
        startedAt: { $lt: Date.now() - 3600000 } // Older than 1 hour
    });

    for (const saga of incompleteSagas) {
        if (saga.status === 'COMPENSATING') {
            // Resume compensation
            await resumeCompensation(saga);
        } else {
            // Decide whether to continue forward or compensate
            if (await shouldContinue(saga)) {
                await resumeSaga(saga);
            } else {
                await startCompensation(saga);
            }
        }
    }
}

async function resumeSaga(saga) {
    // Load the saga definition
    const definition = getSagaDefinition(saga.type);

    // Find which step to execute next
    const completedStepNames = saga.completedSteps.map(s => s.name);
    const nextStep = definition.steps.find(s => !completedStepNames.includes(s.name));

    if (!nextStep) {
        // All steps completed, mark saga as done
        await db.sagas.update(saga.id, { status: 'COMPLETED', completedAt: new Date() });
        return;
    }

    try {
        const result = await nextStep.execute(saga.input, saga.completedSteps);
        await db.sagas.update(saga.id, {
            $push: { completedSteps: { name: nextStep.name, data: result, completedAt: new Date() } }
        });
        // Recursively continue
        await resumeSaga(await db.sagas.findOne({ id: saga.id }));
    } catch (error) {
        await startCompensation(saga, error);
    }
}

async function resumeCompensation(saga) {
    const definition = getSagaDefinition(saga.type);

    // Execute compensations for completed steps in reverse order
    for (let i = saga.completedSteps.length - 1; i >= 0; i--) {
        const step = saga.completedSteps[i];
        const compensation = definition.compensations[step.name];

        if (!compensation) continue;

        try {
            await compensation.execute(step.data);
            await db.sagas.update(saga.id, {
                $pull: { completedSteps: { name: step.name } }
            });
        } catch (error) {
            console.error(`Compensation for ${step.name} failed:`, error);
            // Continue with remaining compensations
        }
    }

    await db.sagas.update(saga.id, {
        status: 'COMPENSATED',
        completedAt: new Date()
    });
}

Timeouts and Abandoned Sagas

Some sagas might never complete due to persistent failures or external service outages. Set timeouts on saga execution and mark sagas as abandoned if they exceed the timeout. Abandoned sagas should trigger compensations rather than remaining in limbo indefinitely.

async function checkSagaTimeouts() {
    const abandonedSagas = await db.sagas.find({
        status: { $in: ['STARTED', 'IN_PROGRESS'] },
        startedAt: { $lt: Date.now() - 86400000 } // Older than 24 hours
    });

    for (const saga of abandonedSagas) {
        console.warn(`Saga ${saga.id} abandoned after timeout`);
        await startCompensation(saga, new Error('Saga timeout'));
    }
}

// Run timeout checker periodically
setInterval(checkSagaTimeouts, 3600000); // Every hour

Monitoring and Debugging Sagas

Distributed saga execution creates complex traces across multiple services. Effective monitoring requires correlation IDs, structured logging, and saga-specific observability tools.

Correlation IDs

Every saga execution generates a unique saga ID that propagates through all service calls and events. This correlation ID appears in every log entry, event, and database record related to the saga, enabling you to reconstruct the complete execution flow.

class SagaOrchestrator {
    async execute(orderData) {
        const sagaId = uuid();
        const logger = createLogger({ sagaId });

        logger.info('Starting order saga', { orderData });

        try {
            const order = await this.orderService.createOrder(orderData, { sagaId });
            logger.info('Order created', { orderId: order.id });

            const payment = await this.paymentService.processPayment({
                orderId: order.id,
                amount: order.amount,
                sagaId
            });
            logger.info('Payment processed', { paymentId: payment.id });

            await this.inventoryService.reserveInventory({
                items: order.items,
                sagaId
            });
            logger.info('Inventory reserved');

            logger.info('Saga completed successfully');
            return { success: true };
        } catch (error) {
            logger.error('Saga failed', { error: error.message });
            throw error;
        }
    }
}

Saga Execution Visualizations

Build dashboards that visualize saga execution flow. Show each saga instance with its current state, completed steps, and time spent in each step. This makes it immediately obvious when sagas are stuck or taking unexpectedly long.

Track saga execution metrics including success rate per saga type, average completion time, compensation frequency, and which steps fail most often. These metrics identify problematic saga steps that need reliability improvements.

Failed Saga Analysis

When sagas fail, you need quick answers to specific questions: Which step failed? Why did it fail? Did compensation complete successfully? What data was lost or inconsistent?

async function analyzeSagaFailure(sagaId) {
    const saga = await db.sagas.findOne({ id: sagaId });

    const report = {
        sagaId: sagaId,
        type: saga.type,
        failedAt: saga.failedStep?.failedAt,
        failedStep: saga.failedStep?.name,
        error: saga.failedStep?.error,
        completedSteps: saga.completedSteps.map(s => s.name),
        wasCompensated: saga.status === 'COMPENSATED',
        timeline: []
    };

    // Build timeline of saga execution
    report.timeline.push({
        event: 'Saga started',
        timestamp: saga.startedAt
    });

    for (const step of saga.completedSteps) {
        report.timeline.push({
            event: `Step ${step.name} completed`,
            timestamp: step.completedAt,
            duration: step.completedAt - saga.startedAt
        });
    }

    if (saga.failedStep) {
        report.timeline.push({
            event: `Step ${saga.failedStep.name} failed`,
            timestamp: saga.failedStep.failedAt,
            error: saga.failedStep.error
        });
    }

    // Check for data inconsistencies
    report.inconsistencies = await checkDataConsistency(saga);

    return report;
}

async function checkDataConsistency(saga) {
    const issues = [];

    // Check if order exists when it should
    if (saga.completedSteps.some(s => s.name === 'CreateOrder')) {
        const order = await db.orders.findOne({ sagaId: saga.id });
        if (!order && saga.status !== 'COMPENSATED') {
            issues.push('Order should exist but not found');
        }
        if (order && saga.status === 'COMPENSATED' && order.status !== 'CANCELLED') {
            issues.push('Order exists but should be cancelled');
        }
    }

    return issues;
}
Warning: Saga logs across distributed services generate enormous data volumes. Use log aggregation with proper retention policies and only log saga execution at INFO level, reserving DEBUG for specific investigations.

Testing Saga Implementations

Testing sagas requires simulating failures at every step and verifying that compensations execute correctly. Standard unit tests aren't sufficient because the complexity lies in the interactions between services.

Testing Compensation Logic

describe('Order Saga Compensation', () => {
    let orchestrator;
    let mockServices;

    beforeEach(() => {
        mockServices = {
            orderService: {
                createOrder: jest.fn(),
                cancelOrder: jest.fn()
            },
            paymentService: {
                processPayment: jest.fn(),
                refundPayment: jest.fn()
            },
            inventoryService: {
                reserveInventory: jest.fn(),
                releaseInventory: jest.fn()
            }
        };

        orchestrator = new OrderSagaOrchestrator(mockServices);
    });

    test('compensates order and payment when inventory fails', async () => {
        mockServices.orderService.createOrder.mockResolvedValue({ id: '123' });
        mockServices.paymentService.processPayment.mockResolvedValue({ id: 'pay_123' });
        mockServices.inventoryService.reserveInventory.mockRejectedValue(
            new Error('Insufficient inventory')
        );

        await orchestrator.execute({ items: [{ productId: 'prod_1', quantity: 1 }] });

        // Verify compensations executed in reverse order
        expect(mockServices.paymentService.refundPayment).toHaveBeenCalledWith('pay_123');
        expect(mockServices.orderService.cancelOrder).toHaveBeenCalledWith('123');
    });

    test('handles compensation failures gracefully', async () => {
        mockServices.orderService.createOrder.mockResolvedValue({ id: '123' });
        mockServices.paymentService.processPayment.mockResolvedValue({ id: 'pay_123' });
        mockServices.inventoryService.reserveInventory.mockRejectedValue(
            new Error('Insufficient inventory')
        );
        mockServices.paymentService.refundPayment.mockRejectedValue(
            new Error('Refund service unavailable')
        );

        const result = await orchestrator.execute({ items: [{ productId: 'prod_1', quantity: 1 }] });

        // Saga should still complete compensation attempts
        expect(mockServices.orderService.cancelOrder).toHaveBeenCalledWith('123');
        expect(result.compensationErrors).toBeDefined();
    });

    test('saga is idempotent on retry', async () => {
        const orderData = { customerId: 'cust_1', items: [] };

        mockServices.orderService.createOrder.mockResolvedValue({ id: '123' });
        mockServices.paymentService.processPayment.mockResolvedValue({ id: 'pay_123' });
        mockServices.inventoryService.reserveInventory.mockResolvedValue({ success: true });

        // Execute saga twice
        await orchestrator.execute(orderData);
        await orchestrator.execute(orderData);

        // Should have created two separate sagas, not duplicated operations
        expect(mockServices.orderService.createOrder).toHaveBeenCalledTimes(2);
    });
});

Integration Testing with Real Services

Integration tests should use real services running in test mode with test databases. Inject failures at specific points to verify compensation behavior with actual service implementations.

describe('Order Saga Integration', () => {
    let testDb;
    let services;
    let orchestrator;

    beforeAll(async () => {
        testDb = await createTestDatabase();
        services = await startTestServices(testDb);
        orchestrator = new OrderSagaOrchestrator(services);
    });

    test('complete saga execution with real services', async () => {
        const orderData = {
            customerId: 'test_customer',
            items: [{ productId: 'test_product', quantity: 1 }],
            amount: 1000
        };

        const result = await orchestrator.execute(orderData);

        expect(result.success).toBe(true);

        // Verify database state
        const order = await testDb.orders.findOne({ id: result.orderId });
        expect(order.status).toBe('COMPLETED');

        const payment = await testDb.payments.findOne({ orderId: result.orderId });
        expect(payment.status).toBe('COMPLETED');

        const inventory = await testDb.inventory.findOne({ productId: 'test_product' });
        expect(inventory.reserved).toBeGreaterThan(0);
    });

    test('compensation leaves database consistent', async () => {
        // Configure inventory service to fail
        services.inventoryService.enableFailureMode();

        const orderData = {
            customerId: 'test_customer',
            items: [{ productId: 'test_product', quantity: 100 }],
            amount: 1000
        };

        await orchestrator.execute(orderData);

        // Verify compensations executed
        const order = await testDb.orders.findOne({ customerId: 'test_customer' });
        expect(order.status).toBe('CANCELLED');

        const payment = await testDb.payments.findOne({ orderId: order.id });
        expect(payment.status).toBe('REFUNDED');

        services.inventoryService.disableFailureMode();
    });
});

Saga Pattern Variations

Hybrid Choreography-Orchestration

Some systems benefit from combining both patterns. Use choreography for the main flow where services react to events, but introduce orchestration for complex sub-workflows that require visibility and control.

// Main flow uses choreography
orderService.on('OrderCreated', async (order) => {
    // Use orchestrator for complex payment workflow
    const paymentOrchestrator = new PaymentSagaOrchestrator();
    const paymentResult = await paymentOrchestrator.execute({
        orderId: order.id,
        amount: order.amount,
        paymentMethod: order.paymentMethod
    });

    if (paymentResult.success) {
        await eventBus.publish('PaymentCompleted', paymentResult);
    } else {
        await eventBus.publish('PaymentFailed', paymentResult);
    }
});

// Payment orchestrator handles complex payment logic
class PaymentSagaOrchestrator {
    async execute(paymentData) {
        // Complex orchestrated flow for payment processing
        await this.validatePaymentMethod();
        await this.checkFraud();
        await this.authorizePayment();
        await this.capturePayment();
        return { success: true };
    }
}

Saga with Timeouts

Long-running sagas need timeout handling. If a step takes too long, the saga should compensate rather than waiting indefinitely.

async function executeStepWithTimeout(step, timeout) {
    const timeoutPromise = new Promise((_, reject) => {
        setTimeout(() => reject(new Error('Step timeout')), timeout);
    });

    try {
        return await Promise.race([
            step.execute(),
            timeoutPromise
        ]);
    } catch (error) {
        if (error.message === 'Step timeout') {
            console.error(`Step ${step.name} timed out after ${timeout}ms`);
            throw error;
        }
        throw error;
    }
}

Parallel Saga Steps

Some saga steps can execute in parallel. If inventory check and payment authorization don't depend on each other, execute them concurrently to reduce total saga duration.

async function executeParallelSteps() {
    const [inventoryResult, paymentResult] = await Promise.all([
        inventoryService.checkAvailability(items),
        paymentService.authorizePayment(paymentData)
    ]);

    // If either fails, compensate both
    if (!inventoryResult.success || !paymentResult.success) {
        await Promise.all([
            inventoryService.releaseReservation(inventoryResult),
            paymentService.voidAuthorization(paymentResult)
        ]);
        throw new Error('Parallel steps failed');
    }

    return { inventoryResult, paymentResult };
}

Frequently Asked Questions

Can sagas provide ACID guarantees?

Sagas provide eventual consistency, not ACID guarantees. During saga execution, other transactions can observe intermediate states where some operations completed and others haven't. Sagas don't provide isolation, and compensation isn't the same as atomicity because the system was briefly in an inconsistent state. Use sagas when you can accept eventual consistency; use traditional transactions when you need strict ACID guarantees.

What happens if a compensation fails?

Failed compensations are one of the hardest saga problems. The system should log compensation failures, continue attempting remaining compensations, and trigger alerts for manual intervention. Some compensations might need human review to resolve. Design compensations to be idempotent so they can be safely retried, and implement retries with exponential backoff before marking a compensation as failed.

How do you handle reading data from an in-progress saga?

Applications must be aware that data might represent incomplete sagas. Include saga status in your data model so readers can distinguish between completed transactions and in-progress sagas. Expose saga state through APIs so clients can show appropriate UI for pending operations. For critical operations, consider implementing read-your-writes consistency where clients wait for saga completion before proceeding.

Should every microservice call use sagas?

Sagas are only necessary when you need to maintain consistency across multiple service boundaries. If a single service can handle an operation using a local transaction, use a local transaction. Sagas add complexity through state management, compensation logic, and monitoring. Only introduce sagas when the business logic genuinely requires coordinating multiple services, and when eventual consistency is acceptable.

How do you version saga definitions?

Store saga version with each saga instance. When you need to change saga logic, create a new version and handle both old and new versions during a transition period. Old saga instances continue executing with old logic while new instances use new logic. This prevents breaking in-flight sagas when you deploy changes. After all old saga versions complete, you can remove the old version's code.

Can choreography and orchestration patterns coexist?

Yes, and this is common in complex systems. Use choreography for high-level business flows and orchestration for specific sub-workflows that need centralized control. The order service might publish events that trigger multiple choreographed workflows, while each workflow uses orchestration internally for its specific steps. Choose the pattern that fits each workflow's characteristics rather than forcing a single pattern everywhere.

How do you prevent duplicate saga executions?

Use idempotency keys to prevent duplicate saga starts. When clients initiate a saga, they provide an idempotency key (usually based on the business operation, like order ID). Before starting a new saga, check if a saga with that idempotency key already exists. If it does, return the existing saga result instead of starting a duplicate. This prevents the same order from being processed twice if the client retries due to network issues.

What metrics should you track for saga health?

Track saga success rate, average completion time, compensation rate (percentage of sagas that needed compensation), per-step failure rates, saga timeout rate, and in-progress saga count. Alert when compensation rate exceeds baseline, when average completion time increases significantly, or when in-progress saga count grows without corresponding throughput increase. These metrics indicate problems in either saga implementation or downstream services.

Conclusion

The saga pattern enables consistent data management across microservices without distributed transactions by coordinating local transactions through choreography or orchestration. Choreography uses events to create loosely coupled services, while orchestration provides centralized control and visibility. Both patterns require careful compensation design, persistent state management, and comprehensive monitoring.

Choose choreography for simple workflows with clear event flows and when loose coupling is critical. Choose orchestration for complex workflows with conditional logic or when debugging visibility is important. Many systems benefit from combining both patterns, using the approach that best fits each workflow.

Successful saga implementations handle partial failures gracefully through idempotent compensations, persist saga state to survive system crashes, and provide monitoring that makes failed sagas easy to diagnose and fix. Accept eventual consistency as a necessary tradeoff and design your application to handle intermediate states where sagas are still executing.


Share on Social Media: