Kafka/SQS/RabbitMQ/Bull topology, ordering guarantees, consumer groups, DLQ patterns. Use when decoupling async work from the request path or building event-driven pipelines.
From sde-system-designnpx claudepluginhub chavangorakh1999/sde-skills --plugin sde-system-designThis skill uses the workspace's default tool permissions.
Provides UI/UX resources: 50+ styles, color palettes, font pairings, guidelines, charts for web/mobile across React, Next.js, Vue, Svelte, Tailwind, React Native, Flutter. Aids planning, building, reviewing interfaces.
Fetches up-to-date documentation from Context7 for libraries and frameworks like React, Next.js, Prisma. Use for setup questions, API references, and code examples.
Guides Payload CMS config (payload.config.ts), collections, fields, hooks, access control, APIs. Debugs validation errors, security, relationships, queries, transactions, hook behavior.
Queues decouple producers from consumers, absorb traffic spikes, and enable async processing. Choose the right tool for your durability, ordering, and throughput requirements.
Use case or system: $ARGUMENTS
Use a queue when:
Don't use a queue when:
| Bull/BullMQ | RabbitMQ | SQS | Kafka | |
|---|---|---|---|---|
| Best for | Node.js job queues | Multi-language task routing | AWS-native async jobs | High-throughput event streaming |
| Throughput | ~10K jobs/sec | ~50K msgs/sec | ~3K msgs/sec per queue | >1M msgs/sec |
| Ordering | FIFO per queue | Per-queue FIFO | Best-effort (FIFO queue: strict) | Per-partition ordered |
| Durability | Redis persistence | Disk (durable queues) | S3-backed, 14-day retention | Configurable, up to years |
| Exactly-once | No | No (at-least-once) | No (SQS FIFO: ~yes) | No (idempotent consumer needed) |
| Complexity | Low | Medium | Low | High |
| Use when | Node.js app, Redis already used | Multi-language, complex routing | AWS shop, simple async | Event log, high-throughput, replay |
// BullMQ with Redis backend — great for Node.js microservices
import { Queue, Worker, QueueEvents } from 'bullmq';
const connection = { host: 'localhost', port: 6379 };
// Producer: add jobs to queue
const emailQueue = new Queue('email-notifications', { connection });
async function sendWelcomeEmail(userId, email) {
await emailQueue.add(
'welcome-email',
{ userId, email },
{
attempts: 3, // retry up to 3 times
backoff: {
type: 'exponential',
delay: 2000 // 2s, 4s, 8s
},
removeOnComplete: { count: 100 }, // keep last 100 completed jobs
removeOnFail: { count: 500 } // keep last 500 failed jobs for inspection
}
);
}
// Consumer: process jobs
const worker = new Worker(
'email-notifications',
async (job) => {
const { userId, email } = job.data;
await sendEmailViaProvider(email, 'Welcome!', generateWelcomeTemplate(userId));
// Return value is stored in job.returnvalue
return { sent: true, timestamp: new Date().toISOString() };
},
{
connection,
concurrency: 5, // process up to 5 jobs simultaneously
limiter: { // rate limit to 100 jobs per second
max: 100,
duration: 1000
}
}
);
// Handle failures
worker.on('failed', (job, err) => {
console.error(`Job ${job.id} failed: ${err.message}`);
// After all attempts exhausted, job moves to 'failed' state
// Monitor failed jobs, alert if count exceeds threshold
});
// Dead Letter Queue pattern with BullMQ:
// Failed jobs (after all retries) stay in 'failed' state
// Inspect with: await emailQueue.getFailed(0, 50)
// Retry manually: await job.retry()
// Or move to a separate DLQ queue for human review
// Standard Queue: at-least-once, best-effort ordering, high throughput
// FIFO Queue: exactly-once processing, strict ordering, 300 TPS (3000 with batching)
import { SQSClient, SendMessageCommand, ReceiveMessageCommand, DeleteMessageCommand } from '@aws-sdk/client-sqs';
const sqs = new SQSClient({ region: 'us-east-1' });
// Producer
async function enqueueOrderProcessing(orderId, orderData) {
await sqs.send(new SendMessageCommand({
QueueUrl: process.env.ORDER_QUEUE_URL,
MessageBody: JSON.stringify({ orderId, ...orderData }),
// For FIFO queue, add:
// MessageGroupId: customerId, // ordering within a customer's orders
// MessageDeduplicationId: orderId // exactly-once within 5-minute window
}));
}
// Consumer (long-polling pattern)
async function processOrders() {
while (true) {
const response = await sqs.send(new ReceiveMessageCommand({
QueueUrl: process.env.ORDER_QUEUE_URL,
MaxNumberOfMessages: 10,
WaitTimeSeconds: 20, // long polling — reduces empty receives
VisibilityTimeout: 60 // hide message for 60s while processing
}));
for (const message of response.Messages ?? []) {
try {
await processOrder(JSON.parse(message.Body));
// Delete only after successful processing
await sqs.send(new DeleteMessageCommand({
QueueUrl: process.env.ORDER_QUEUE_URL,
ReceiptHandle: message.ReceiptHandle
}));
} catch (err) {
console.error(`Failed to process message ${message.MessageId}:`, err);
// Don't delete — message becomes visible again after VisibilityTimeout
// After maxReceiveCount failures -> moves to DLQ automatically
}
}
}
}
// DLQ configuration (in Terraform/CDK):
// redrive_policy = {
// deadLetterTargetArn: order_dlq.arn
// maxReceiveCount: 3 -- move to DLQ after 3 failed attempts
// }
// Kafka: topics -> partitions -> offsets
// Partition = unit of parallelism and ordering
// Consumer group = logical consumer with distributed partition assignment
import { Kafka } from 'kafkajs';
const kafka = new Kafka({
clientId: 'order-service',
brokers: ['kafka-1:9092', 'kafka-2:9092', 'kafka-3:9092']
});
// Producer
const producer = kafka.producer({
// Idempotent producer: exactly-once delivery to broker (no duplicate messages)
idempotent: true,
// Required for idempotent: acks must be -1 (all in-sync replicas acknowledge)
});
await producer.connect();
async function publishOrderEvent(orderId, eventType, data) {
await producer.send({
topic: 'order-events',
messages: [{
// Key determines partition assignment — orders from same customer go to same partition
key: data.customerId,
value: JSON.stringify({ eventType, orderId, ...data, timestamp: Date.now() })
}]
});
}
// Consumer
const consumer = kafka.consumer({ groupId: 'inventory-service' });
await consumer.connect();
await consumer.subscribe({ topic: 'order-events', fromBeginning: false });
await consumer.run({
// autoCommit: false for manual offset management (process-then-commit)
autoCommitInterval: 5000, // commit offsets every 5 seconds
eachMessage: async ({ topic, partition, message, heartbeat }) => {
const event = JSON.parse(message.value.toString());
try {
await processOrderEvent(event);
// Offset committed automatically per autoCommitInterval
} catch (err) {
// Kafka has no built-in DLQ — implement your own:
// Option 1: Send to a dead-letter topic
await producer.send({
topic: 'order-events-dlq',
messages: [{ key: message.key, value: message.value, headers: { error: err.message } }]
});
// Option 2: Log error and continue (skip poison pill)
}
}
});
// Every queue MUST have a DLQ. Without it, poison pill messages block the queue forever.
// DLQ handling workflow:
// 1. Message fails maxReceiveCount times
// 2. Moves to DLQ automatically (SQS) or via code (Kafka, Bull)
// 3. Alert fires when DLQ depth > 0
// 4. On-call engineer investigates: is this a bug? bad data? transient failure?
// 5. Fix the consumer or the data, then replay from DLQ
// DLQ message should include:
// - Original message body (unchanged)
// - Error message/stack trace
// - Number of attempts
// - Timestamps of each failure
// - Source queue name
// DLQ monitoring:
// Metric: dlq_depth (messages in DLQ)
// Alert: dlq_depth > 0 for 5 minutes -> PagerDuty
// SLO: DLQ messages investigated within 1 business hour
No ordering guarantee:
SQS Standard, most distributed queues
Use when: parallel processing, order doesn't matter
Example: sending marketing emails — doesn't matter if email B sends before email A
Per-partition ordering (Kafka):
Messages with same key go to same partition, are ordered within partition
Use when: events for same entity must be processed in order
Example: user_id as key -> all events for a user are ordered
Strict global ordering (SQS FIFO, single Bull queue):
All messages processed in exact send order
Throughput limited (SQS FIFO: 300 TPS per message group)
Use when: financial ledger, inventory deductions where order is critical
## Message Queue Design: [System]
### Queue Technology Choice
[Bull / SQS / Kafka / RabbitMQ with rationale]
### Topic/Queue Structure
| Queue/Topic | Producer | Consumer(s) | Ordering | DLQ |
### Message Schema
[JSON schema for each message type]
### Error Handling
[Retry policy, backoff, DLQ configuration, alerting]
### Scalability
[Partition count, consumer scaling, throughput targets]
### Tradeoffs
[At-least-once vs exactly-once, ordering vs throughput, etc.]