From kafka-skills
Analyzes Kafka consumer group lag via Lenses MCP, diagnosing causes like throughput bottlenecks, rebalancing, partition skew, and stalled consumers. Reports findings with remediation steps.
npx claudepluginhub lensesio/agentic-engineering-for-apache-kafka --plugin kafka-skills
How this skill is triggered: by the user, by Claude, or both
Slash command
/kafka-skills:kafka-consumer-lag [required: environment name] [optional: topic name to filter by]
This skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Analyses consumer group lag across all groups and diagnoses potential causes. Consumer lag is the most commonly monitored Kafka metric and the first thing engineers check during incidents.
Analyses consumer group lag across all groups and diagnoses potential causes. Consumer lag is the most commonly monitored Kafka metric and the first thing engineers check during incidents.
Target environment: $ARGUMENTS
Copy this checklist and track your progress:
Lag Analysis Progress:
- [ ] Step 1: Fetch all consumer groups
- [ ] Step 2: Identify problematic groups
- [ ] Step 3: Diagnose root causes
- [ ] Step 4: Generate report
Use the Lenses MCP list_consumer_groups tool to fetch all consumer groups in the target environment.
For topic-specific analysis, use list_consumer_groups_by_topic to narrow the scope.
Expected output: List of all consumer groups with state, lag and member count.
Validation: If no consumer groups are returned, report this finding and stop - the cluster may have no active consumers.
Flag consumer groups in these categories:
- [...] get_topic_partitions)
- [...] get_dataset_message_metrics to check producer throughput)
For each problematic group, determine the likely cause:
- [...] max.poll.records, reduce processing time per message, check for synchronous blocking calls
- [...] session.timeout.ms, reduce max.poll.interval.ms risk
Use the Lenses MCP execute_sql tool to sample recent messages from lagging topics:
SELECT * FROM {topic} LIMIT 5
This confirms the topic has active producers and messages are flowing.
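Partition skew is one of the causes this skill looks for. As a rough illustration only, the sketch below flags partitions whose lag is far above the median for the group. The function name, the `ratio` and `min_lag` thresholds, and the sample data are all assumptions made for this example; they are not part of the Lenses MCP API.

```python
from statistics import median


def find_skewed_partitions(lag_by_partition, ratio=3.0, min_lag=1000):
    """Return partition ids whose lag is well above the group's median lag.

    ratio and min_lag are assumed thresholds; tune them for your workload.
    """
    if not lag_by_partition:
        return []
    typical = median(lag_by_partition.values())
    return sorted(
        partition
        for partition, lag in lag_by_partition.items()
        # min_lag avoids flagging tiny absolute differences on idle topics.
        if lag >= min_lag and lag > ratio * typical
    )


# Hypothetical per-partition lag for one consumer group on one topic.
sample = {0: 120, 1: 95, 2: 48000, 3: 110}
skewed = find_skewed_partitions(sample)
print(skewed)
```

If lag is concentrated on a few partitions while the rest stay near zero, the producer's key distribution is worth checking before scaling consumers.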
User says: "Consumers are falling behind, check the lag"
Actions:
User says: "Check consumer lag for orders.payment.completed"
Actions:
- [...] list_consumer_groups_by_topic to get groups consuming from that topic
- [...] execute_sql
Result: Focused report on a single topic's consumer groups
User says: "Find any dead or empty consumer groups"
Actions:
Cause: No consumer groups exist in the environment or permissions are restricted.
Solution: Verify consumers are running. Check Lenses agent permissions allow listing consumer groups.
Cause: Consumer group offsets may be stale if the consumer recently stopped.
Solution: Compare lag against partition end offsets from get_topic_partitions. Note when offsets were last committed.
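The comparison above can be sketched as follows: lag per partition is the end offset minus the committed offset, and a commit timestamp that is too old suggests the reported lag is stale. This is a minimal illustration under assumptions: the `{partition: offset}` input shape, the helper names, and the 10-minute staleness threshold are hypothetical and do not reflect the actual output format of get_topic_partitions.

```python
from datetime import datetime, timedelta, timezone


def partition_lag(end_offset: int, committed_offset: int) -> int:
    """Lag is the distance from the committed offset to the partition end offset."""
    return max(end_offset - committed_offset, 0)


def offsets_look_stale(last_commit: datetime, now: datetime,
                       max_age: timedelta = timedelta(minutes=10)) -> bool:
    """True when the last commit is older than max_age (an assumed threshold)."""
    return now - last_commit > max_age


# Hypothetical values, shaped as {partition: offset}.
end_offsets = {0: 1500, 1: 2300}
committed_offsets = {0: 1200, 1: 2300}

lag = {p: partition_lag(end_offsets[p], committed_offsets[p])
       for p in committed_offsets}
total_lag = sum(lag.values())

# Fixed timestamps keep the example deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_commit = now - timedelta(minutes=45)
stale = offsets_look_stale(last_commit, now)
print(lag, total_lag, stale)
```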
Cause: Topic is empty or has very old messages beyond the query window.
Solution: Use get_dataset_message_metrics to check if the topic has recent throughput. An empty topic with lagging consumers indicates a producer issue.
## Consumer Lag Report
### Environment: {name}
### Critical (immediate action)
| Consumer Group | Topic | Total Lag | State | Diagnosis | Remediation |
|---------------|-------|-----------|-------|-----------|-------------|
| group-name | topic | 50000 | Stable | Throughput bottleneck | Scale consumers |
### Warning (investigate)
| Consumer Group | Topic | Total Lag | State | Diagnosis | Remediation |
|---------------|-------|-----------|-------|-----------|-------------|
### Suggestion (optimise)
| Consumer Group | Topic | Total Lag | State | Diagnosis | Remediation |
|---------------|-------|-----------|-------|-----------|-------------|
### Summary
- X consumer groups analysed
- Y groups with critical lag
- Z groups with warnings
- Stale/empty groups: N
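
For illustration, the template above could be filled in programmatically. The sketch below is one possible way to do it, not something the skill prescribes: the `Finding` dataclass, the severity labels, and the `render_report` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    group: str
    topic: str
    total_lag: int
    state: str
    diagnosis: str
    remediation: str
    severity: str  # assumed labels: "critical", "warning", "suggestion"


SECTIONS = [
    ("critical", "Critical (immediate action)"),
    ("warning", "Warning (investigate)"),
    ("suggestion", "Suggestion (optimise)"),
]

TABLE_HEADER = (
    "| Consumer Group | Topic | Total Lag | State | Diagnosis | Remediation |\n"
    "|---------------|-------|-----------|-------|-----------|-------------|"
)


def render_report(environment, findings, groups_analysed, stale_groups):
    """Render findings into the report layout shown above."""
    lines = ["## Consumer Lag Report", f"### Environment: {environment}"]
    for severity, title in SECTIONS:
        lines.append(f"### {title}")
        lines.append(TABLE_HEADER)
        for f in findings:
            if f.severity == severity:
                lines.append(
                    f"| {f.group} | {f.topic} | {f.total_lag} | {f.state} "
                    f"| {f.diagnosis} | {f.remediation} |"
                )
    # Count distinct groups, since one group may lag on several topics.
    critical = len({f.group for f in findings if f.severity == "critical"})
    warnings = len({f.group for f in findings if f.severity == "warning"})
    lines += [
        "### Summary",
        f"- {groups_analysed} consumer groups analysed",
        f"- {critical} groups with critical lag",
        f"- {warnings} groups with warnings",
        f"- Stale/empty groups: {stale_groups}",
    ]
    return "\n".join(lines)


# Hypothetical finding matching the sample row in the template.
example = Finding("group-name", "topic", 50000, "Stable",
                  "Throughput bottleneck", "Scale consumers", "critical")
report = render_report("dev", [example], groups_analysed=12, stale_groups=2)
print(report)
```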