How to Design a Notification System at Scale

Bright SEO Tools in saas | Published: Apr 04, 2026 | Updated: Apr 04, 2026

When your notification system sends 10 million notifications per day across email, SMS, push, and in-app channels, naive implementations create bottlenecks that delay delivery, drop messages, and overwhelm external providers. A properly designed notification system needs distributed queuing, rate limiting, priority handling, delivery tracking, and provider failover to maintain reliability at scale.

This guide explains how to architect notification systems that handle high throughput without sacrificing delivery guarantees. You'll learn queue-based architectures that prevent thundering herds, templating strategies that separate content from delivery, provider abstraction layers that enable failover, and the tracking infrastructure needed to debug delivery failures across millions of notifications.

We cover the specific design decisions that determine whether your system can scale from thousands to millions of notifications: choosing between push and pull models, implementing deduplication without memory explosions, batching for efficiency without sacrificing latency, and handling partial failures where some channels succeed while others fail.

Core Architecture Components

A scalable notification system separates notification creation, queuing, routing, delivery, and tracking into distinct components. This separation enables independent scaling and prevents failures in one channel from affecting others.

Notification Service API

The notification service exposes an API that accepts notification requests from other services. This API validates input, enriches notifications with user preferences, and enqueues notifications for delivery without blocking the caller.

// Notification Service API
class NotificationAPI {
    async sendNotification(request) {
        // Validate request
        const validation = this.validateRequest(request);
        if (!validation.valid) {
            throw new Error(validation.error);
        }

        // Fetch user preferences
        const preferences = await this.getUserPreferences(request.userId);

        // Determine which channels to use
        const channels = this.selectChannels(
            request.channels,
            preferences,
            request.priority
        );

        if (channels.length === 0) {
            return {
                notificationId: null,
                status: 'SKIPPED',
                reason: 'User opted out of all requested channels'
            };
        }

        // Create notification record
        const notification = await this.db.notifications.create({
            userId: request.userId,
            type: request.type,
            priority: request.priority,
            channels: channels,
            templateId: request.templateId,
            data: request.data,
            status: 'PENDING',
            createdAt: new Date()
        });

        // Enqueue for delivery
        for (const channel of channels) {
            await this.queue.publish(`notifications.${channel}`, {
                notificationId: notification.id,
                userId: request.userId,
                channel: channel,
                priority: request.priority,
                template: request.templateId,
                data: request.data,
                attempt: 0
            });
        }

        return {
            notificationId: notification.id,
            status: 'QUEUED',
            channels: channels
        };
    }

    validateRequest(request) {
        if (!request.userId) {
            return { valid: false, error: 'userId required' };
        }
        if (!request.type) {
            return { valid: false, error: 'notification type required' };
        }
        if (!request.channels || request.channels.length === 0) {
            return { valid: false, error: 'at least one channel required' };
        }
        return { valid: true };
    }

    selectChannels(requestedChannels, preferences, priority) {
        const allowedChannels = [];

        for (const channel of requestedChannels) {
            // Critical notifications override opt-outs and quiet hours,
            // so this check must come before the preference checks
            if (priority === 'CRITICAL') {
                allowedChannels.push(channel);
                continue;
            }

            // Respect explicit channel opt-outs
            if (preferences[channel] === false) {
                continue;
            }

            // During quiet hours, suppress interruptive channels only
            if (this.isQuietHours(preferences.quietHours) &&
                (channel === 'push' || channel === 'sms')) {
                continue;
            }

            allowedChannels.push(channel);
        }

        return allowedChannels;
    }
}

The API returns immediately after enqueueing notifications. Callers receive a notification ID for tracking but don't wait for actual delivery. This async approach prevents notification delays from slowing down critical application flows like user registration or payment processing.

Message Queue Architecture

Message queues buffer notifications between creation and delivery, providing backpressure when delivery is slower than creation. Using separate queues per channel enables independent scaling and prevents one channel's problems from blocking others.

// Queue structure
notifications.email      // Email notifications
notifications.sms        // SMS notifications
notifications.push       // Push notifications
notifications.in_app     // In-app notifications
notifications.webhook    // Webhook notifications

// Priority queues for time-sensitive notifications
notifications.email.critical
notifications.sms.critical
notifications.push.critical

// Dead letter queues for failed deliveries
notifications.email.dlq
notifications.sms.dlq
notifications.push.dlq
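The API sketch earlier publishes every message to the standard `notifications.${channel}` queue, so nothing would ever reach the `.critical` queues listed here. One way to route by priority is a small helper like the one below; `queueFor` and the `CRITICAL_QUEUES` set are hypothetical names, and the set simply mirrors the channels that have a critical queue in this layout.

```javascript
// Channels that have a dedicated ".critical" queue in the layout above
const CRITICAL_QUEUES = new Set(['email', 'sms', 'push']);

// Choose the queue for a message: CRITICAL priority goes to the channel's
// priority queue when one exists; everything else (including critical
// in_app/webhook messages, which have no priority queue) goes to the
// standard channel queue.
function queueFor(channel, priority) {
    const base = `notifications.${channel}`;
    if (priority === 'CRITICAL' && CRITICAL_QUEUES.has(channel)) {
        return `${base}.critical`;
    }
    return base;
}

console.log(queueFor('email', 'CRITICAL')); // notifications.email.critical
console.log(queueFor('in_app', 'CRITICAL')); // notifications.in_app
```

In the API's enqueue loop, `this.queue.publish(queueFor(channel, request.priority), ...)` would replace the hard-coded queue name.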

Priority queues ensure that critical notifications, such as security alerts or payment confirmations, are delivered before promotional messages. Workers consume from the critical queue first and fall back to the standard queue when no critical messages are waiting.

class NotificationWorker {
    async processNotifications() {
        while (true) {
            // Try critical queue first
            let message = await this.queue.consume('notifications.email.critical', {
                timeout: 1000
            });

            // Fall back to standard queue
            if (!message) {
                message = await this.queue.consume('notifications.email', {
                    timeout: 5000
                });
            }

            if (message) {
                await this.handleNotification(message);
            }
        }
    }

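    async handleNotification(message) {
        try {
            await this.deliverNotification(message);
            await this.queue.ack(message);
        } catch (error) {
            if (message.attempt < 3) {
                // Republish a copy with the attempt counter incremented and an
                // exponential delay (1s, 2s, 4s); without the increment the
                // message would be retried forever
                const delayMs = Math.pow(2, message.attempt) * 1000;
                await this.requeueWithDelay({ ...message, attempt: message.attempt + 1 }, delayMs);
                // Ack the original so the broker doesn't redeliver it as well
                await this.queue.ack(message);
            } else {
                // Move to dead letter queue for inspection and manual replay
                await this.queue.publish('notifications.email.dlq', message);
                await this.queue.ack(message);
            }
        }
    }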
}
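The worker leaves the actual delay computation to `requeueWithDelay`. A minimal sketch of turning a retry policy shaped like the channels' `getRetryPolicy()` return value (`{ maxAttempts, backoffMultiplier, initialDelay }`) into concrete delays follows. The function name `retryDelayMs`, the 60-second cap, and the optional "full jitter" (a random delay in `[0, base)`, a common technique for spreading out synchronized retries) are assumptions of this sketch, not part of the design above.

```javascript
// Delay before retry number `attempt` (0-based), derived from a retry policy.
// Returns null when the policy's attempts are exhausted (caller should DLQ).
function retryDelayMs(policy, attempt, { jitter = false, random = Math.random, maxDelay = 60000 } = {}) {
    if (attempt >= policy.maxAttempts) {
        return null;
    }
    // Exponential growth: initialDelay * multiplier^attempt, capped at maxDelay
    const base = Math.min(
        maxDelay,
        policy.initialDelay * Math.pow(policy.backoffMultiplier, attempt)
    );
    // Full jitter: pick uniformly in [0, base) so workers that failed together
    // don't all retry at the same instant
    return jitter ? Math.floor(random() * base) : base;
}

const emailPolicy = { maxAttempts: 3, backoffMultiplier: 2, initialDelay: 1000 };
console.log([0, 1, 2, 3].map(a => retryDelayMs(emailPolicy, a)));
// [ 1000, 2000, 4000, null ]
```

Injecting `random` keeps the jittered path testable; in production the default `Math.random` is enough.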

Channel Abstraction Layer

The channel abstraction defines a common interface for all notification channels. This enables adding new channels without changing core system logic and allows provider failover when primary providers fail.

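// TypeScript-style interface. Interface members can't be marked async, and
// Promise needs a type argument; DeliveryResult is a placeholder type.
interface NotificationChannel {
    send(notification: Notification): Promise<DeliveryResult>;
    validate(destination: string): Promise<boolean>;
    getRetryPolicy(): RetryPolicy;
    getRateLimits(): RateLimits;
}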

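class EmailChannel implements NotificationChannel {
    constructor(primaryProvider, fallbackProvider, templateEngine) {
        this.primaryProvider = primaryProvider;   // e.g. SendGrid
        this.fallbackProvider = fallbackProvider; // e.g. AWS SES
        this.templateEngine = templateEngine;
    }

    async send(notification) {
        const email = await this.buildEmail(notification);

        try {
            return await this.primaryProvider.send(email);
        } catch (error) {
            // Caution: if the primary timed out after accepting the message,
            // the fallback can produce a duplicate; pair with deduplication
            console.warn('Primary email provider failed, using fallback', error);
            return await this.fallbackProvider.send(email);
        }
    }

    async validate(address) {
        // Shape check only; real deliverability is learned from bounces
        return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(address);
    }

    async buildEmail(notification) {
        // Field names match TemplateEngine.renderTemplate's output
        const rendered = await this.templateEngine.renderTemplate(
            notification.template,
            notification.data,
            'email'
        );

        return {
            to: notification.destination,
            from: this.getFromAddress(notification.type),
            subject: rendered.subject,
            html: rendered.htmlBody,
            text: rendered.textBody
        };
    }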

    getRetryPolicy() {
        return {
            maxAttempts: 3,
            backoffMultiplier: 2,
            initialDelay: 1000
        };
    }

    getRateLimits() {
        return {
            perSecond: 100,
            perHour: 10000
        };
    }
}

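class SMSChannel implements NotificationChannel {
    constructor(twilioClient, config) {
        this.client = twilioClient;
        this.config = config; // { phoneNumber: sender number in E.164 format }
    }

    async send(notification) {
        const message = await this.buildMessage(notification);

        // One GSM-7 SMS segment holds 160 characters; longer messages are split
        // into several segments, each billed separately. Unicode text fits only
        // 70 characters per segment, so this length check is approximate.
        if (message.body.length > 160) {
            message.body = message.body.substring(0, 157) + '...';
        }

        const result = await this.client.messages.create({
            to: notification.destination,
            from: this.config.phoneNumber,
            body: message.body
        });

        return {
            success: true,
            providerId: result.sid,
            cost: result.price // may be null until the final price is known
        };
    }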

    async validate(phoneNumber) {
        // Validate phone number format
        return /^\+[1-9]\d{1,14}$/.test(phoneNumber);
    }

    getRetryPolicy() {
        return {
            maxAttempts: 2, // SMS failures usually aren't transient
            backoffMultiplier: 1,
            initialDelay: 5000
        };
    }

    getRateLimits() {
        return {
            perSecond: 10,
            perHour: 1000
        };
    }
}
Pro Tip: Abstract provider-specific details behind the channel interface. When your SendGrid API key gets rate limited, switching to AWS SES should require changing one line of configuration, not rewriting notification logic.
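`EmailChannel` hard-codes exactly one primary and one fallback. The sketch below generalizes the same try-in-order idea to any number of providers. The provider objects are hypothetical stand-ins with a `name` and an async `send(message)`, not a real SDK interface, and `sendWithFailover` is an illustrative name.

```javascript
// Try each provider in order until one accepts the message.
// Collects every failure so the final error explains what happened.
async function sendWithFailover(providers, message) {
    const errors = [];
    for (const provider of providers) {
        try {
            const result = await provider.send(message);
            return { provider: provider.name, result };
        } catch (error) {
            // Record the failure and move on to the next provider
            errors.push(`${provider.name}: ${error.message}`);
        }
    }
    throw new Error(`All providers failed (${errors.join('; ')})`);
}

// Usage with fake providers: the first is "down", the second succeeds
const primary = { name: 'primary', send: async () => { throw new Error('503'); } };
const fallback = { name: 'fallback', send: async (m) => `sent:${m.to}` };
sendWithFailover([primary, fallback], { to: 'a@example.com' }).then(console.log);
```

Because a provider can time out after it has already accepted a message, failover like this can occasionally double-send, so it works best alongside deduplication.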

Template Management

Templates separate notification content from delivery logic. This enables marketing teams to update notification copy without deploying code and ensures consistent branding across channels.

Template Storage and Versioning

const templateSchema = {
    id: String,
    name: String,
    version: Number,
    channels: {
        email: {
            subject: String,
            htmlBody: String,
            textBody: String
        },
        sms: {
            body: String
        },
        push: {
            title: String,
            body: String,
            icon: String
        },
        in_app: {
            title: String,
            body: String,
            actionUrl: String
        }
    },
    variables: [String], // Required template variables
    createdAt: Date,
    status: String // 'DRAFT', 'ACTIVE', 'DEPRECATED'
};

class TemplateEngine {
    async renderTemplate(templateId, data, channel) {
        const template = await this.getTemplate(templateId, channel);

        // Render whichever fields this channel defines: subject/htmlBody/
        // textBody for email, body for SMS, title/body/icon for push, etc.
        // interpolate() throws on any missing variable.
        const rendered = {};
        for (const [field, source] of Object.entries(template)) {
            // HTML-escape values only in HTML output; plain-text channels
            // such as SMS would otherwise show literal entities like "&amp;"
            rendered[field] = this.interpolate(source, data, field === 'htmlBody');
        }

        if (rendered.htmlBody) {
            rendered.htmlBody = this.applyHTMLSanitization(rendered.htmlBody);
        }

        return rendered;
    }

    interpolate(template, data, escapeHtml = false) {
        return template.replace(/\{\{(\w+)\}\}/g, (match, key) => {
            if (!(key in data)) {
                throw new Error(`Missing template variable: ${key}`);
            }
            return escapeHtml ? this.escape(data[key]) : String(data[key]);
        });
    }

    escape(value) {
        // Neutralize HTML metacharacters so user data can't inject markup
        return String(value)
            .replace(/&/g, '&amp;')
            .replace(/</g, '&lt;')
            .replace(/>/g, '&gt;')
            .replace(/"/g, '&quot;')
            .replace(/'/g, '&#39;');
    }

    async getTemplate(templateId, channel) {
        // Cache templates to avoid database queries
        const cacheKey = `template:${templateId}:${channel}`;
        let template = await this.cache.get(cacheKey);

        if (!template) {
            const templateDoc = await this.db.templates.findOne({
                id: templateId,
                status: 'ACTIVE'
            });

            if (!templateDoc) {
                throw new Error(`Template not found: ${templateId}`);
            }

            template = templateDoc.channels[channel];
            if (!template) {
                throw new Error(`Template ${templateId} does not support channel ${channel}`);
            }

            await this.cache.set(cacheKey, template, 3600);
        }

        return template;
    }
}

Multi-Channel Template Consistency

The same logical notification may need different content per channel because of format constraints. Email supports rich HTML, a single SMS segment holds 160 GSM-7 characters (70 when the text needs Unicode), and push notifications need short titles. Templates must keep the meaning consistent while adapting to each channel's constraints.

// Example: Order confirmation template
{
    "id": "order_confirmed",
    "name": "Order Confirmation",
    "channels": {
        "email": {
            "subject": "Order {{orderNumber}} Confirmed",
            "htmlBody": "<h1>Thanks for your order!</h1><p>Your order {{orderNumber}} for {{itemCount}} items totaling {{total}} has been confirmed...</p>",
            "textBody": "Thanks for your order! Your order {{orderNumber}} for {{itemCount}} items totaling {{total}} has been confirmed..."
        },
        "sms": {
            "body": "Order {{orderNumber}} confirmed. Total: {{total}}. Track at {{trackingUrl}}"
        },
        "push": {
            "title": "Order Confirmed",
            "body": "Order {{orderNumber}} - {{total}}",
            "icon": "order_confirmed.png"
        },
        "in_app": {
            "title": "Order Confirmed",
            "body": "Your order {{orderNumber}} has been confirmed",
            "actionUrl": "/orders/{{orderNumber}}"
        }
    },
    "variables": ["orderNumber", "itemCount", "total", "trackingUrl"]
}
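Because each channel's copy is written separately, it is easy for one channel to reference a placeholder that the template never declares, which would only surface as a "Missing template variable" error at send time. A minimal lint, assuming template documents shaped like the example above, could run before a template is switched to `ACTIVE`; `lintTemplate` is a hypothetical name.

```javascript
// Report placeholders used in any channel field but absent from `variables`.
// An empty result means every {{placeholder}} is declared.
function lintTemplate(templateDoc) {
    const declared = new Set(templateDoc.variables);
    const problems = [];
    for (const [channel, fields] of Object.entries(templateDoc.channels)) {
        for (const [field, text] of Object.entries(fields)) {
            // Same placeholder syntax TemplateEngine.interpolate() uses
            for (const match of text.matchAll(/\{\{(\w+)\}\}/g)) {
                if (!declared.has(match[1])) {
                    problems.push(`${channel}.${field} uses undeclared {{${match[1]}}}`);
                }
            }
        }
    }
    return problems;
}

const draft = {
    variables: ['orderNumber', 'total'],
    channels: {
        sms: { body: 'Order {{orderNumber}} confirmed. Total: {{total}}' },
        push: { title: 'Order Confirmed', body: 'Track at {{trackingUrl}}' }
    }
};
console.log(lintTemplate(draft)); // [ 'push.body uses undeclared {{trackingUrl}}' ]
```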

Rate Limiting and Throttling

External notification providers impose rate limits. Exceeding these limits causes delivery failures and may result in account suspension. Proper throttling prevents hitting provider limits while maintaining high throughput.

Token Bucket Rate Limiter

class TokenBucketRateLimiter {
    constructor(capacity, refillRate) {
        this.capacity = capacity; // Maximum tokens
        this.tokens = capacity; // Current tokens
        this.refillRate = refillRate; // Tokens per second
        this.lastRefill = Date.now();
    }

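    async acquire(tokens = 1) {
        if (tokens > this.capacity) {
            throw new RangeError('Requested more tokens than the bucket can hold');
        }

        // Loop instead of assuming the tokens are ours after sleeping: other
        // callers may have consumed them while this one was waiting
        while (true) {
            this.refill();

            if (this.tokens >= tokens) {
                this.tokens -= tokens;
                return true;
            }

            // Wait roughly long enough for the missing tokens to refill
            const tokensNeeded = tokens - this.tokens;
            const waitTime = (tokensNeeded / this.refillRate) * 1000;
            await this.sleep(waitTime);
        }
    }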

    refill() {
        const now = Date.now();
        const timePassed = (now - this.lastRefill) / 1000;
        const tokensToAdd = timePassed * this.refillRate;

        this.tokens = Math.min(this.capacity, this.tokens + tokensToAdd);
        this.lastRefill = now;
    }

    sleep(ms) {
        return new Promise(resolve => setTimeout(resolve, ms));
    }
}

// Usage for an email provider limit of 100 emails per second (an example figure; check your plan)
const emailRateLimiter = new TokenBucketRateLimiter(100, 100);

async function sendEmail(email) {
    await emailRateLimiter.acquire(1);
    return await emailProvider.send(email);
}

Distributed Rate Limiting with Redis

When multiple notification workers send via the same provider, they need shared rate limiting state. Redis-based rate limiters coordinate across workers to prevent exceeding aggregate limits.

class RateLimitError extends Error {
    constructor(message, retryAfterMs) {
        super(message);
        this.name = 'RateLimitError';
        this.retryAfterMs = retryAfterMs;
    }
}

class RedisRateLimiter {
    constructor(redisClient, key, limit, window) {
        this.redis = redisClient;
        this.key = key;
        this.limit = limit;
        this.window = window; // Window in seconds
    }

    async acquire() {
        const now = Date.now();
        const windowStart = now - (this.window * 1000);
        // Generate the member once so this exact entry can be removed later
        const member = `${now}-${Math.random()}`;

        // MULTI/EXEC applies these commands atomically, so another worker
        // cannot slip in between the count and the add (a plain pipeline
        // only batches round trips)
        const results = await this.redis.multi()
            .zremrangebyscore(this.key, '-inf', windowStart) // drop entries outside the window
            .zcard(this.key)                                  // count requests in the window
            .zadd(this.key, now, member)                      // record this request
            .expire(this.key, this.window * 2)                // let idle keys expire
            .exec();
        // ioredis returns [[err, reply], ...]; index 1 is the ZCARD reply
        const currentCount = results[1][1];

        if (currentCount >= this.limit) {
            // Over the limit: remove the entry this call just added
            await this.redis.zrem(this.key, member);

            // The oldest entry leaves the window first; wait until then
            const oldest = await this.redis.zrange(this.key, 0, 0, 'WITHSCORES');
            const waitTime = oldest.length
                ? Math.max(0, Number(oldest[1]) + (this.window * 1000) - now)
                : 0;

            throw new RateLimitError(`Rate limit exceeded, retry after ${waitTime}ms`, waitTime);
        }

        return true;
    }
}

// Usage across multiple workers
const sendgridLimiter = new RedisRateLimiter(
    redisClient,
    'ratelimit:sendgrid',
    100, // 100 requests
    1    // per 1 second
);

Provider-Specific Rate Limit Handling

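Different providers have different rate limit characteristics. Twilio bills per SMS segment, so retries that turn into duplicate sends cost real money. HTTP APIs such as SendGrid's typically answer with a 429 status when you exceed a limit, sometimes with a Retry-After header that says how long to wait. Handle each provider's limits according to its documented behavior.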

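class ProviderRateLimitHandler {
    async sendWithRetry(provider, message, maxRetries = 3) {
        let lastError;

        for (let attempt = 0; attempt < maxRetries; attempt++) {
            try {
                return await provider.send(message);
            } catch (error) {
                lastError = error;

                // Anything other than a rate limit is rethrown so the
                // queue-level retry policy can decide what to do with it
                if (error.statusCode !== 429) {
                    throw error;
                }

                // No point sleeping after the final attempt
                if (attempt === maxRetries - 1) {
                    break;
                }

                // Honor Retry-After (seconds) when present and numeric;
                // otherwise back off exponentially: 1s, 2s, 4s, ...
                const retryAfter = Number.parseInt(error.headers?.['retry-after'], 10);
                const waitTime = Number.isFinite(retryAfter)
                    ? retryAfter * 1000
                    : Math.pow(2, attempt) * 1000;

                console.warn(`Rate limited by provider, waiting ${waitTime}ms`);
                await this.sleep(waitTime);
            }
        }

        throw lastError;
    }

    sleep(ms) {
        return new Promise(resolve => setTimeout(resolve, ms));
    }
}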
Warning: Rate limits apply at the account level, not per worker. If you scale to 10 workers but your provider allows 100 requests/second, each worker can only send 10/second on average. Coordinate through distributed rate limiters or you'll hit limits immediately.
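The handler above only understands a numeric Retry-After. RFC 9110 allows the header to be either a number of seconds (`120`) or an HTTP-date (`Wed, 21 Oct 2015 07:28:00 GMT`). A small parser covering both forms is sketched below; `retryAfterMs` and its fallback parameter are illustrative names, not part of any provider SDK.

```javascript
// Convert an HTTP Retry-After header value into milliseconds to wait.
// Accepts delay-seconds or an HTTP-date; returns fallbackMs when the
// header is missing or unparseable.
function retryAfterMs(headerValue, fallbackMs, now = Date.now()) {
    if (headerValue == null) {
        return fallbackMs;
    }
    const trimmed = String(headerValue).trim();

    // delay-seconds form: a non-negative integer
    if (/^\d+$/.test(trimmed)) {
        return Number(trimmed) * 1000;
    }

    // HTTP-date form
    const date = Date.parse(trimmed);
    if (Number.isNaN(date)) {
        return fallbackMs;
    }
    // A date in the past means "retry now"
    return Math.max(0, date - now);
}

console.log(retryAfterMs('3', 1000)); // 3000
console.log(retryAfterMs(undefined, 1000)); // 1000
```

Passing `Math.pow(2, attempt) * 1000` as `fallbackMs` preserves the handler's exponential backoff when no usable header is present.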

Deduplication

Duplicate notifications frustrate users and waste money on SMS. They are also hard to avoid: queues deliver messages at least once, so a worker that crashes after sending but before acknowledging will see the same message again. Deduplication makes delivery idempotent so these redeliveries don't reach the user twice.

Time-Window Deduplication

const crypto = require('crypto');

class NotificationDeduplicator {
    constructor(redisClient, windowMinutes = 60) {
        this.redis = redisClient;
        this.window = windowMinutes * 60; // Convert to seconds
    }

    key(userId, notificationType, contentHash) {
        return `dedup:${userId}:${notificationType}:${contentHash}`;
    }

    async shouldSend(userId, notificationType, contentHash) {
        // SET ... NX succeeds only if the key doesn't exist yet, and EX expires
        // it after the dedup window. A null reply means it was already claimed.
        const result = await this.redis.set(
            this.key(userId, notificationType, contentHash), '1', 'EX', this.window, 'NX'
        );
        return result !== null;
    }

    async release(userId, notificationType, contentHash) {
        // Undo a claim when the send fails, so the retry isn't mistaken for a duplicate
        await this.redis.del(this.key(userId, notificationType, contentHash));
    }

    generateContentHash(data) {
        // Hash notification content to detect duplicates
        const content = JSON.stringify(data);
        return crypto.createHash('sha256').update(content).digest('hex').substring(0, 16);
    }
}

// Usage in notification worker
async function processNotification(message) {
    const contentHash = deduplicator.generateContentHash({
        template: message.template,
        data: message.data
    });

    const shouldSend = await deduplicator.shouldSend(message.userId, message.type, contentHash);

    if (!shouldSend) {
        console.log(`Skipping duplicate notification for user ${message.userId}`);
        await metrics.increment('notifications.deduplicated');
        return { status: 'DEDUPLICATED' };
    }

    try {
        return await channel.send(message);
    } catch (error) {
        await deduplicator.release(message.userId, message.type, contentHash);
        throw error;
    }
}
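One subtlety in `generateContentHash`: `JSON.stringify` keeps object keys in insertion order, so `{ a: 1, b: 2 }` and `{ b: 2, a: 1 }` hash differently and the duplicate slips through. A sketch of hashing a canonical form instead, with keys sorted recursively, is below; `canonicalize` and `contentHash` are illustrative names, and values with a `toJSON` method (such as `Date`) are converted first so they keep their usual JSON representation.

```javascript
const crypto = require('crypto');

// Recursively sort object keys so logically equal payloads serialize identically
function canonicalize(value) {
    if (value !== null && typeof value === 'object' && typeof value.toJSON === 'function') {
        value = value.toJSON(); // e.g. Date -> ISO string, as JSON.stringify would do
    }
    if (Array.isArray(value)) {
        return value.map(canonicalize); // array order is meaningful; keep it
    }
    if (value !== null && typeof value === 'object') {
        const sorted = {};
        for (const key of Object.keys(value).sort()) {
            sorted[key] = canonicalize(value[key]);
        }
        return sorted;
    }
    return value;
}

function contentHash(data) {
    const json = JSON.stringify(canonicalize(data));
    return crypto.createHash('sha256').update(json).digest('hex').substring(0, 16);
}

console.log(contentHash({ a: 1, b: 2 }) === contentHash({ b: 2, a: 1 })); // true
```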

Cross-Channel Deduplication

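Within one notification, each channel should deliver at most once, even when the notification is retried after a partial failure (for example, email succeeded but SMS timed out). Track which channels have already delivered each notification ID so a replay only sends to the channels that are still pending.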

// Sends a notification to each requested channel that hasn't received it yet,
// so a replay after a partial failure skips channels that already succeeded
async function checkChannelDeduplication(notification, requestedChannels) {
    const key = `notification:${notification.id}:channels`;

    // Get channels that already received this notification
    const sentChannels = await redis.smembers(key);

    // Determine which channels still need sending
    const pendingChannels = requestedChannels.filter(
        ch => !sentChannels.includes(ch)
    );

    if (pendingChannels.length === 0) {
        return { status: 'ALL_CHANNELS_SENT' };
    }

    for (const channel of pendingChannels) {
        // If this throws, earlier channels stay recorded and a replay resumes here
        await sendToChannel(channel, notification);

        // Mark channel as sent and keep the record for 24 hours
        await redis.sadd(key, channel);
        await redis.expire(key, 86400);
    }

    return { status: 'SENT', channels: pendingChannels };
}
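The loop above stops at the first failing channel. An alternative, sketched here, sends to all pending channels concurrently with `Promise.allSettled` and reports per-channel outcomes, so the caller can record the successes in the Redis set and retry only the failures. `sendToChannels` and the `senders` map (channel name to async send function) are hypothetical stand-ins for the channel classes earlier in this guide.

```javascript
// Fan out to several channels at once and report each outcome separately,
// instead of failing the whole notification when one channel errors.
async function sendToChannels(senders, notification, channels) {
    // async wrapper turns a missing sender (TypeError) into a rejection
    const outcomes = await Promise.allSettled(
        channels.map(async channel => senders[channel](notification))
    );

    const sent = [];
    const failed = [];
    outcomes.forEach((outcome, i) => {
        if (outcome.status === 'fulfilled') {
            sent.push(channels[i]);
        } else {
            const reason = outcome.reason;
            failed.push({
                channel: channels[i],
                error: reason instanceof Error ? reason.message : String(reason)
            });
        }
    });

    const status = failed.length === 0 ? 'SENT' : sent.length === 0 ? 'FAILED' : 'PARTIAL';
    return { status, sent, failed };
}
```

A `PARTIAL` result is the case the introduction calls out: some channels succeeded, so only the entries in `failed` should be requeued.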

Delivery Tracking and Analytics

Track notification delivery status, user engagement, and provider performance to debug failures and optimize notification strategy.

Delivery Status Tracking

const deliveryStatuses = [
    'CREATED',      // Notification created
    'QUEUED',       // In delivery queue
    'SENT',         // Sent to provider
    'DELIVERED',    // Confirmed delivery
    'FAILED',       // Permanent failure
    'BOUNCED',      // Email bounced
    'OPENED',       // Email opened or push clicked
    'CLICKED'       // Link clicked
];

class DeliveryTracker {
    async updateDeliveryStatus(notificationId, channel, status, metadata = {}) {
        await this.db.deliveries.updateOne(
            { notificationId, channel },
            {
                $set: {
                    status: status,
                    updatedAt: new Date(),
                    ...metadata
                },
                $push: {
                    statusHistory: {
                        status: status,
                        timestamp: new Date(),
                        metadata: metadata
                    }
                }
            },
            { upsert: true }
        );

        // Update metrics
        await this.metrics.increment(`notifications.${channel}.${status.toLowerCase()}`);
    }

    async trackEmailOpen(notificationId) {
        await this.updateDeliveryStatus(notificationId, 'email', 'OPENED', {
            openedAt: new Date()
        });
    }

    async trackLinkClick(notificationId, linkUrl) {
        await this.db.deliveries.updateOne(
            { notificationId, channel: 'email' },
            {
                $push: {
                    clicks: {
                        url: linkUrl,
                        clickedAt: new Date()
                    }
                }
            }
        );

        await this.updateDeliveryStatus(notificationId, 'email', 'CLICKED');
    }

    async getDeliveryReport(notificationId) {
        const deliveries = await this.db.deliveries.find({ notificationId }).toArray();

        return {
            notificationId,
            channels: deliveries.map(d => ({
                channel: d.channel,
                status: d.status,
                sentAt: d.statusHistory.find(s => s.status === 'SENT')?.timestamp,
                deliveredAt: d.statusHistory.find(s => s.status === 'DELIVERED')?.timestamp,
                openedAt: d.openedAt,
                clicks: d.clicks || []
            }))
        };
    }
}
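Provider webhooks can arrive late or out of order, so a `DELIVERED` event may land after `OPENED` and, with `updateDeliveryStatus` as written, overwrite the more advanced status. One possible guard is to rank the statuses listed earlier and only ever move forward, while still appending every event to `statusHistory`. The ranking below is an assumption of this sketch (terminal failures share a rank with `DELIVERED`), and `nextStatus` is a hypothetical helper.

```javascript
// Assumed progression of delivery statuses; higher rank = further along.
const STATUS_RANK = {
    CREATED: 0,
    QUEUED: 1,
    SENT: 2,
    DELIVERED: 3,
    BOUNCED: 3,
    FAILED: 3,
    OPENED: 4,
    CLICKED: 5
};

// Return the status to store: the incoming one only if it moves forward.
// Unknown statuses never replace a known one.
function nextStatus(current, incoming) {
    if (current == null) {
        return incoming;
    }
    return STATUS_RANK[incoming] > STATUS_RANK[current] ? incoming : current;
}

console.log(nextStatus('OPENED', 'DELIVERED')); // OPENED
```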

Webhook Handling for Provider Events

Notification providers send webhooks when delivery status changes. Process these webhooks to update delivery tracking and detect problems.

// Webhook endpoint for SendGrid events
app.post('/webhooks/sendgrid', async (req, res) => {
    const events = req.body;

    for (const event of events) {
        const notificationId = event.notification_id; // Set in custom args

        switch (event.event) {
            case 'delivered':
                await tracker.updateDeliveryStatus(notificationId, 'email', 'DELIVERED', {
                    providerId: event.sg_message_id,
                    timestamp: event.timestamp
                });
                break;

            case 'bounce':
                await tracker.updateDeliveryStatus(notificationId, 'email', 'BOUNCED', {
                    reason: event.reason,
                    bounceType: event.type
                });
                await handleBounce(event.email, event.type);
                break;

            case 'open':
                await tracker.trackEmailOpen(notificationId);
                break;

            case 'click':
                await tracker.trackLinkClick(notificationId, event.url);
                break;

            case 'dropped':
                await tracker.updateDeliveryStatus(notificationId, 'email', 'FAILED', {
                    reason: event.reason
                });
                break;
        }
    }

    res.status(200).send('OK');
});

async function handleBounce(email, bounceType) {
    if (bounceType === 'hard') {
        // Hard bounce means email is invalid
        await db.users.updateOne(
            { email },
            { $set: { emailValid: false, emailBounced: true } }
        );
    } else if (bounceType === 'soft') {
        // Soft bounce might be temporary (full mailbox, etc.)
        // Track bounce count and disable after threshold
        const user = await db.users.findOne({ email });
        const bounceCount = (user?.softBounceCount || 0) + 1;

        if (bounceCount >= 3) {
            await db.users.updateOne(
                { email },
                { $set: { emailValid: false } }
            );
        } else {
            await db.users.updateOne(
                { email },
                { $inc: { softBounceCount: 1 } }
            );
        }
    }
}

User Preferences and Opt-Out Management

Respect user notification preferences to avoid annoying users and comply with regulations. Proper preference management prevents sending unwanted notifications while maintaining the ability to send critical alerts.

Preference Schema

const userPreferencesSchema = {
    userId: String,
    channels: {
        email: {
            enabled: Boolean,
            categories: {
                marketing: Boolean,
                product_updates: Boolean,
                security: Boolean,
                billing: Boolean
            }
        },
        sms: {
            enabled: Boolean,
            categories: {
                critical_only: Boolean,
                order_updates: Boolean
            }
        },
        push: {
            enabled: Boolean,
            categories: {
                messages: Boolean,
                comments: Boolean,
                mentions: Boolean
            }
        },
        in_app: {
            enabled: Boolean
        }
    },
    quietHours: {
        enabled: Boolean,
        start: String, // "22:00"
        end: String,   // "08:00"
        timezone: String
    },
    frequency: {
        digest: Boolean, // Bundle into daily/weekly digest
        digestFrequency: String // 'daily', 'weekly'
    }
};

Preference Enforcement

const moment = require('moment-timezone');

class PreferenceEnforcer {
    async shouldSendNotification(userId, notificationType, channel, priority, data) {
        const preferences = await this.getPreferences(userId);

        // Always allow critical notifications
        if (priority === 'CRITICAL') {
            return true;
        }

        // Check channel enabled
        const channelPrefs = preferences.channels[channel];
        if (!channelPrefs?.enabled) {
            return false;
        }

        // Check category preferences (in_app has no categories)
        const category = this.getNotificationCategory(notificationType);
        if (channelPrefs.categories?.[category] === false) {
            return false;
        }

        // Check quiet hours; critical notifications already returned above
        if (preferences.quietHours?.enabled &&
            this.isInQuietHours(preferences.quietHours)) {
            return false;
        }

        // Defer non-critical notification types to the user's digest
        if (preferences.frequency?.digest && !this.isCritical(notificationType)) {
            await this.addToDigest(userId, notificationType, data);
            return false;
        }

        return true;
    }

    isInQuietHours(quietHours) {
        const now = moment().tz(quietHours.timezone);
        const start = moment.tz(quietHours.start, 'HH:mm', quietHours.timezone);
        const end = moment.tz(quietHours.end, 'HH:mm', quietHours.timezone);

        if (end.isBefore(start)) {
            // Quiet hours span midnight
            return now.isAfter(start) || now.isBefore(end);
        }

        return now.isBetween(start, end);
    }

    async addToDigest(userId, notificationType, data) {
        await this.db.digests.updateOne(
            { userId, date: moment().format('YYYY-MM-DD') },
            {
                $push: {
                    notifications: {
                        type: notificationType,
                        data: data,
                        timestamp: new Date()
                    }
                }
            },
            { upsert: true }
        );
    }
}
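moment-timezone works, but the same quiet-hours check can be done with the built-in `Intl` API by comparing minutes since local midnight, which avoids constructing "today's" start and end dates. A minimal sketch, assuming `quietHours` objects shaped like the schema above (`start`/`end` as `"HH:mm"` strings plus an IANA `timezone`); `minutesInZone` and `toMinutes` are hypothetical helper names, and the window is treated as start-inclusive, end-exclusive.

```javascript
// Minutes since local midnight in `timeZone` at instant `date`
function minutesInZone(date, timeZone) {
    const parts = new Intl.DateTimeFormat('en-US', {
        timeZone, hour: '2-digit', minute: '2-digit', hourCycle: 'h23'
    }).formatToParts(date);
    const get = type => Number(parts.find(p => p.type === type).value);
    return get('hour') * 60 + get('minute');
}

// "22:30" -> 1350
function toMinutes(hhmm) {
    const [h, m] = hhmm.split(':').map(Number);
    return h * 60 + m;
}

function isInQuietHours({ start, end, timezone }, now = new Date()) {
    const current = minutesInZone(now, timezone);
    const s = toMinutes(start);
    const e = toMinutes(end);
    if (e < s) {
        // Window spans midnight, e.g. 22:00-08:00
        return current >= s || current < e;
    }
    return current >= s && current < e;
}

const qh = { start: '22:00', end: '08:00', timezone: 'Asia/Tokyo' };
// Tokyo is UTC+9 (no DST), so 14:00Z is 23:00 local
console.log(isInQuietHours(qh, new Date('2024-01-01T14:00:00Z'))); // true
```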

Batching and Digest Notifications

Instead of sending individual notifications for every event, batch related notifications into digests. This reduces notification fatigue and costs while still keeping users informed.

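class DigestProcessor {
    // Digest documents are the ones written by addToDigest:
    // { userId, date: 'YYYY-MM-DD', notifications: [...], sentAt? }
    async processDigests(frequency) {
        // Users who opted into this digest frequency
        // (the `preferences` collection name is an assumption)
        const users = await this.db.preferences.find({
            'frequency.digest': true,
            'frequency.digestFrequency': frequency
        }).toArray();

        // Cover completed days only: yesterday for daily, the past 7 days for weekly
        const days = frequency === 'weekly' ? 7 : 1;
        const today = moment().format('YYYY-MM-DD');
        const since = moment().subtract(days, 'days').format('YYYY-MM-DD');

        for (const { userId } of users) {
            const buckets = await this.db.digests.find({
                userId,
                date: { $gte: since, $lt: today },
                sentAt: { $exists: false }
            }).toArray();

            await this.sendDigest(userId, buckets);
        }
    }

    async sendDigest(userId, buckets) {
        const notifications = buckets.flatMap(bucket => bucket.notifications);

        if (notifications.length === 0) {
            return;
        }

        // Group notifications by type
        const grouped = this.groupNotifications(notifications);

        // Send the digest through the regular notification pipeline
        await this.notificationService.sendNotification({
            userId,
            type: 'digest',
            channels: ['email'],
            templateId: 'daily_digest',
            data: {
                notifications: grouped,
                count: notifications.length
            }
        });

        // Mark exactly the buckets included in this digest as sent
        await this.db.digests.updateMany(
            { _id: { $in: buckets.map(bucket => bucket._id) } },
            { $set: { sentAt: new Date() } }
        );
    }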

    groupNotifications(notifications) {
        const grouped = {};

        for (const notification of notifications) {
            if (!grouped[notification.type]) {
                grouped[notification.type] = [];
            }
            grouped[notification.type].push(notification);
        }

        return grouped;
    }
}

Frequently Asked Questions

How do you handle notifications for users in different timezones?

Store each user's timezone in your database and schedule notifications in their local time. Send immediate notifications right away, regardless of timezone. For scheduled notifications such as daily digests, convert the user's local send time (for example, 9:00 AM in their timezone) into a UTC timestamp when you enqueue the notification. Workers then only need to poll for notifications whose UTC delivery time has passed, instead of evaluating the current time in every timezone.
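The conversion step can be sketched without external libraries using the standard `Intl.DateTimeFormat` API. The function names `utcOffsetMinutes` and `nextLocalSendTime` are hypothetical, and the sketch assumes a Node.js build with full ICU timezone data (the default in current releases). It re-checks the offset once to account for daylight saving time, but a send time that falls inside a skipped or repeated hour is not handled specially.

```javascript
// Offset of `timeZone` from UTC, in minutes, at the instant `date`.
// Intl renders the instant as wall-clock fields in that zone; reading those
// fields back as if they were UTC gives the offset.
function utcOffsetMinutes(timeZone, date) {
    const formatter = new Intl.DateTimeFormat('en-US', {
        timeZone,
        hourCycle: 'h23',
        year: 'numeric', month: 'numeric', day: 'numeric',
        hour: 'numeric', minute: 'numeric', second: 'numeric'
    });
    const fields = {};
    for (const { type, value } of formatter.formatToParts(date)) {
        fields[type] = Number(value);
    }
    const wallClockAsUtc = Date.UTC(
        fields.year, fields.month - 1, fields.day,
        fields.hour, fields.minute, fields.second
    );
    return Math.round((wallClockAsUtc - date.getTime()) / 60000);
}

// Next UTC instant at which the user's local clock reads hour:minute.
function nextLocalSendTime(timeZone, hour, minute, now = new Date()) {
    // Shift `now` so its UTC fields hold the user's local calendar date
    const local = new Date(now.getTime() + utcOffsetMinutes(timeZone, now) * 60000);

    for (let dayOffset = 0; dayOffset <= 1; dayOffset++) {
        const wallClockAsUtc = Date.UTC(
            local.getUTCFullYear(), local.getUTCMonth(),
            local.getUTCDate() + dayOffset, hour, minute
        );
        // Guess the offset, then recompute it at the guessed instant (DST)
        let sendAt = wallClockAsUtc - utcOffsetMinutes(timeZone, new Date(wallClockAsUtc)) * 60000;
        sendAt = wallClockAsUtc - utcOffsetMinutes(timeZone, new Date(sendAt)) * 60000;
        if (sendAt > now.getTime()) {
            return new Date(sendAt);
        }
    }
}
```

Storing the result as a UTC `sendAt` value means the timezone logic runs once, at enqueue time; workers only compare `sendAt` against the current time.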

What's the best way to prevent notification spam?

Implement per-user rate limits that cap delivery at N notifications per time window, no matter how many events trigger them. Group related notifications into a single message when possible. Provide granular preference controls so users can opt out of specific notification types without losing critical alerts. Use digest mode for non-urgent notifications to batch many events into periodic summaries.
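A per-user cap can be sketched as a sliding-window log: keep each user's recent send timestamps and reject a send once the window already holds `limit` entries. `UserRateLimiter` is a hypothetical name, and this version keeps its state in one process's memory; workers that share the limit would need the same logic backed by a shared store such as Redis.

```javascript
// Sliding-window log limiter: at most `limit` sends per user in any `windowMs` span.
class UserRateLimiter {
    constructor(limit, windowMs) {
        this.limit = limit;
        this.windowMs = windowMs;
        this.sendTimes = new Map(); // userId -> timestamps (ms) of recent sends
    }

    // Returns true and records the send if the user is under the limit.
    tryAcquire(userId, now = Date.now()) {
        const windowStart = now - this.windowMs;
        const recent = (this.sendTimes.get(userId) || []).filter(t => t > windowStart);

        if (recent.length >= this.limit) {
            this.sendTimes.set(userId, recent);
            return false; // over the cap: suppress, or fold into a digest
        }

        recent.push(now);
        this.sendTimes.set(userId, recent);
        return true;
    }
}
```

Rejected events do not have to be discarded; routing them into the user's digest keeps the information without another interruption.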

How do you test notification delivery without spamming real users?

Use test mode flags that redirect all notifications to test email addresses or phone numbers. Many providers offer sandbox environments that accept notifications without actually delivering them. For integration tests, mock provider APIs to verify that your system makes the correct calls without touching external services. Treat address patterns such as a +test suffix (for example, alice+test@example.com) as test accounts that your system never delivers to real inboxes.
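The test-mode redirect and test-account detection can live in one routing step that runs before a message reaches any provider. The `routeEmail` function, the `testMode`/`testInbox`/`provider` config fields, and the `sandbox` provider label are hypothetical; the sketch only shows where the decision belongs.

```javascript
// Addresses like alice+test@example.com are treated as test accounts.
function isTestAddress(address) {
    return /\+test[^@]*@/i.test(address);
}

// Decide where an email actually goes before any provider is called.
function routeEmail(address, config) {
    if (isTestAddress(address)) {
        // Test accounts always go to the sandbox, never a real inbox
        return { to: address, provider: 'sandbox' };
    }
    if (config.testMode) {
        // Test mode redirects every real recipient to a shared test inbox,
        // keeping the original address for debugging
        return { to: config.testInbox, originalTo: address, provider: config.provider };
    }
    return { to: address, provider: config.provider };
}
```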

What metrics should you track for notification system health?

Track delivery success rate per channel, average delivery latency, queue depth, provider API error rates, email bounce rates, unsubscribe rates per notification type, and click-through rates. Alert when delivery success drops below a threshold, when queue depth keeps growing because workers are not draining it, or when provider errors spike. Monitor costs per channel to catch unexpected increases, such as retries triggered by provider rate limit errors or spam traffic.
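Two of these alert conditions, per-channel success rate and a queue that keeps growing, can be evaluated from periodic counters. The shape of the `stats` object and the threshold value are assumptions for illustration; in practice the numbers would come from your metrics system.

```javascript
// Returns human-readable alerts for a snapshot of notification metrics.
// stats.channels: { [channel]: { delivered, failed } } for the current window
// stats.queueDepthSamples: queue depth readings, oldest first
function evaluateHealth(stats, thresholds) {
    const alerts = [];

    for (const [channel, { delivered, failed }] of Object.entries(stats.channels)) {
        const attempts = delivered + failed;
        if (attempts === 0) continue; // no traffic, nothing to judge
        const successRate = delivered / attempts;
        if (successRate < thresholds.minSuccessRate) {
            alerts.push(`${channel}: success rate ${(successRate * 100).toFixed(1)}% below threshold`);
        }
    }

    // Depth rising on every sample means workers are not keeping up
    const depths = stats.queueDepthSamples;
    const alwaysGrowing = depths.length >= 2 &&
        depths.every((depth, i) => i === 0 || depth > depths[i - 1]);
    if (alwaysGrowing) {
        alerts.push(`queue depth grew from ${depths[0]} to ${depths[depths.length - 1]}`);
    }

    return alerts;
}
```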

How do you recover from notification provider outages?

Implement automatic failover to backup providers when primary providers fail. Keep notifications in the queue with retry logic that uses exponential backoff. Set a maximum retry period so notifications that have become stale are dropped instead of queued indefinitely. Store notification state so you can query which notifications failed during an outage. For critical notifications, send through multiple providers in parallel to improve the odds that at least one delivery succeeds, accepting that users may occasionally receive duplicates.
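Sequential failover and the parallel approach for critical notifications can both be expressed over a list of provider objects. The `{ name, send(message) }` provider shape is an assumption standing in for whatever abstraction layer wraps your real providers, and `send` is assumed to be async; `Promise.any` requires Node.js 15 or later.

```javascript
// Try providers in priority order; return the first success.
async function sendWithFailover(providers, message) {
    const errors = [];
    for (const provider of providers) {
        try {
            const result = await provider.send(message);
            return { provider: provider.name, result };
        } catch (err) {
            errors.push(`${provider.name}: ${err.message}`);
        }
    }
    // Every provider failed: let the caller requeue with backoff
    throw new Error(`all providers failed (${errors.join('; ')})`);
}

// Critical path: send through all providers at once and resolve on the first
// success. Rejects with an AggregateError only if every provider fails.
// The user may receive duplicates when more than one provider succeeds.
function sendToAllProviders(providers, message) {
    return Promise.any(providers.map(provider =>
        provider.send(message).then(result => ({ provider: provider.name, result }))
    ));
}
```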

Should you allow users to completely opt out of all notifications?

Allow users to opt out of marketing and feature notifications, but reserve the right to send critical notifications such as security alerts, password resets, and billing issues. Anti-spam laws such as CAN-SPAM generally treat transactional messages like these differently from marketing, although requirements vary by jurisdiction, and these messages are necessary for account security. Make this clear in your preference UI by showing those notifications as always enabled. For users who want no notifications at all, consider offering account deactivation instead, since an active account that receives no security alerts is harder to protect.
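The always-enabled rule is easiest to enforce in one place: the check every notification passes before delivery. The type names and the `preferences` shape below are hypothetical examples.

```javascript
// Notification types that ignore opt-outs (security and account integrity).
const ALWAYS_ENABLED = new Set(['security_alert', 'password_reset', 'billing_issue']);

// preferences: { optOutAll: boolean, types: { [type]: boolean } }
function shouldSend(notificationType, preferences) {
    if (ALWAYS_ENABLED.has(notificationType)) {
        return true; // critical alerts bypass every user setting
    }
    if (preferences.optOutAll) {
        return false;
    }
    // Types the user has not configured default to enabled
    const typeSettings = preferences.types || {};
    return typeSettings[notificationType] !== false;
}
```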

How do you handle notification delivery failures?

Retry transient failures with exponential backoff, typically 3 attempts over 10 to 30 minutes. Move permanently failed notifications to a dead letter queue for investigation. Track failure reasons to identify systemic issues such as invalid email addresses or provider configuration problems. For critical notifications, try an alternative channel if the primary channel fails. Alert the operations team when failure rates exceed thresholds.
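The retry policy reduces to one decision per failure: retry after a delay, or give up and dead-letter the notification. This sketch assumes errors carry a hypothetical `transient` flag set by the provider adapter. With a 2-minute base delay, three retries wait about 2, 4, and 8 minutes (14 minutes in total, plus up to 20% jitter), which stays inside the range described above.

```javascript
const MAX_RETRIES = 3;
const BASE_DELAY_MS = 2 * 60 * 1000; // first retry after ~2 minutes
const JITTER = 0.2;                  // spread retries by up to +20%

// Decide what to do with a notification that just failed.
// retriesSoFar: how many retries have already been attempted
function planRetry(retriesSoFar, error, random = Math.random) {
    if (!error.transient) {
        // Permanent failures (e.g. invalid address) never succeed on retry
        return { action: 'dead_letter', reason: error.message };
    }
    if (retriesSoFar >= MAX_RETRIES) {
        return { action: 'dead_letter', reason: `retries exhausted: ${error.message}` };
    }
    // Exponential backoff: 2, 4, 8 minutes, each stretched by random jitter
    const delayMs = BASE_DELAY_MS * 2 ** retriesSoFar * (1 + JITTER * random());
    return { action: 'retry', delayMs: Math.round(delayMs) };
}
```

The jitter keeps many notifications that failed during the same provider hiccup from all retrying at the same instant.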

What's the best approach for A/B testing notifications?

Assign users to test variants by hashing the user ID, so each user consistently sees the same variant. Record the variant assignment in notification metadata. Measure engagement metrics such as open rates, click-through rates, and conversion rates per variant. Run tests long enough to reach statistical significance; that typically means at least 1,000 notifications per variant, and more when the difference you want to detect is small. Test one variable at a time: subject lines, send times, content, or calls to action.
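Deterministic assignment needs only a stable hash of the user ID. This sketch uses Node's built-in `crypto` module and mixes in an experiment ID (an assumption, not part of the advice above) so the same user can land in different variants across experiments while staying fixed within one.

```javascript
const crypto = require('crypto');

// Map a user to one of `variants`, always the same one for a given experiment.
function assignVariant(userId, experimentId, variants) {
    const digest = crypto
        .createHash('sha256')
        .update(`${experimentId}:${userId}`)
        .digest();
    // First 4 bytes as an unsigned integer, reduced to a variant index
    const bucket = digest.readUInt32BE(0) % variants.length;
    return variants[bucket];
}
```

Storing the returned variant in the notification's metadata at send time lets later open and click events be joined back to it.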

Conclusion

Scalable notification systems require queue-based architectures that separate notification creation from delivery, channel abstraction layers that enable provider failover, rate limiting that respects provider constraints, and comprehensive tracking that enables debugging delivery failures. These components work together to deliver millions of notifications reliably across email, SMS, push, and in-app channels.

The critical architectural decisions are choosing message queues that provide ordering guarantees, implementing distributed rate limiters when multiple workers share provider accounts, designing deduplication that prevents spam without excessive memory usage, and building preference systems that respect user choices while maintaining the ability to send critical alerts.

Successful notification systems balance throughput with user experience by batching non-urgent notifications into digests, enforcing quiet hours for non-critical alerts, and providing granular preference controls. Monitor delivery metrics continuously and implement automatic failover to maintain reliability when providers experience outages.