From shannon
Use when tests have race conditions, timing dependencies, or inconsistent pass/fail behavior. Replaces arbitrary timeouts with condition polling that waits for actual state changes, eliminating flaky tests and tracking reliability quantitatively.
```bash
npx claudepluginhub krzemienski/shannon-framework --plugin shannon
```

This skill uses the workspace's default tool permissions.
Flaky tests often guess at timing with arbitrary delays. This creates race conditions where tests pass on fast machines but fail under load or in CI.
Core principle: Wait for the actual condition you care about, not a guess about how long it takes.
Shannon enhancement: Track flakiness quantitatively and learn optimal wait patterns.
```dot
digraph when_to_use {
  "Test uses setTimeout/sleep?" [shape=diamond];
  "Testing timing behavior?" [shape=diamond];
  "Document WHY timeout needed" [shape=box];
  "Use condition-based waiting" [shape=box];
  "Test uses setTimeout/sleep?" -> "Testing timing behavior?" [label="yes"];
  "Testing timing behavior?" -> "Document WHY timeout needed" [label="yes"];
  "Testing timing behavior?" -> "Use condition-based waiting" [label="no"];
}
```
Use when:

- Tests rely on arbitrary delays (`setTimeout`, `sleep`, `time.sleep()`)

Don't use when:

- The test is verifying actual timing behavior; in that case, document why the timeout is needed
```typescript
// ❌ BEFORE: Guessing at timing
await new Promise(r => setTimeout(r, 50));
const result = getResult();
expect(result).toBeDefined();

// ✅ AFTER: Waiting for the condition
await waitFor(() => getResult() !== undefined, 'result is defined');
const result = getResult();
expect(result).toBeDefined();
```
| Scenario | Pattern |
|---|---|
| Wait for event | `waitFor(() => events.find(e => e.type === 'DONE'))` |
| Wait for state | `waitFor(() => machine.state === 'ready')` |
| Wait for count | `waitFor(() => items.length >= 5)` |
| Wait for file | `waitFor(() => fs.existsSync(path))` |
| Complex condition | `waitFor(() => obj.ready && obj.value > 10)` |
Generic polling function:

```typescript
async function waitFor<T>(
  condition: () => T | undefined | null | false,
  description: string,
  timeoutMs = 5000
): Promise<T> {
  const startTime = Date.now();
  while (true) {
    const result = condition();
    if (result) {
      // Shannon: Track successful wait
      trackWaitSuccess(description, Date.now() - startTime);
      return result;
    }
    if (Date.now() - startTime > timeoutMs) {
      // Shannon: Track timeout failure
      trackWaitTimeout(description, timeoutMs);
      throw new Error(`Timeout waiting for ${description} after ${timeoutMs}ms`);
    }
    await new Promise(r => setTimeout(r, 10)); // Poll every 10ms
  }
}
```
```typescript
// Shannon tracking helpers (test_name is supplied by the surrounding test harness)
function trackWaitSuccess(description: string, durationMs: number) {
  serena.write_memory(`test_reliability/waits/${test_name}`, {
    condition: description,
    duration_ms: durationMs,
    success: true,
    timestamp: new Date().toISOString()
  });
}

function trackWaitTimeout(description: string, timeoutMs: number) {
  serena.write_memory(`test_reliability/waits/${test_name}`, {
    condition: description,
    timeout_ms: timeoutMs,
    success: false,
    timestamp: new Date().toISOString()
  });
}
```
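For Python test suites, the same polling loop can be mirrored without the Shannon hooks; a minimal sketch (`wait_for` and its parameters are illustrative names, not framework APIs):

```python
import time

def wait_for(condition, description, timeout_s=5.0, poll_s=0.01):
    """Poll `condition` until it returns a truthy value; raise TimeoutError otherwise."""
    start = time.monotonic()
    while True:
        result = condition()  # call the getter inside the loop for fresh data
        if result:
            return result
        if time.monotonic() - start > timeout_s:
            raise TimeoutError(f"Timeout waiting for {description} after {timeout_s}s")
        time.sleep(poll_s)  # poll every 10ms, mirroring the TypeScript helper

# Usage: wait for an event to appear instead of sleeping a guessed duration
events = [{"type": "DONE"}]
done = wait_for(lambda: next((e for e in events if e["type"] == "DONE"), None), "DONE event")
assert done == {"type": "DONE"}
```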
See @example.ts for the complete implementation, including domain-specific helpers (waitForEvent, waitForEventCount, waitForEventMatch) drawn from an actual debugging session.
- ❌ Polling too fast (`setTimeout(check, 1)`) wastes CPU. ✅ Fix: poll every 10ms.
- ❌ No timeout: the loop runs forever if the condition is never met. ✅ Fix: always include a timeout with a clear error.
- ❌ Stale data: caching state before the loop. ✅ Fix: call the getter inside the loop for fresh data.
- ❌ Not tracking flakiness: no visibility into test stability. ✅ Fix: use Shannon tracking to measure reliability.
```typescript
// Tool ticks every 100ms - need 2 ticks to verify partial output
await waitForEvent(manager, 'TOOL_STARTED'); // First: wait for condition
await new Promise(r => setTimeout(r, 200));  // Then: wait for timed behavior
// 200ms = 2 ticks at 100ms intervals - documented and justified
```
Requirements:
Flakiness Score Formula:
```python
# Track test runs over time
test_runs = serena.query_memory(f"test_reliability/tests/{test_name}/*")
total_runs = len(test_runs)
failures = len([r for r in test_runs if not r["success"]])

# Flakiness score: 0.00 (perfect) to 1.00 (always fails)
flakiness_score = failures / total_runs if total_runs > 0 else 0.0

# Classifications:
# 0.00-0.05: STABLE (excellent)
# 0.05-0.10: ACCEPTABLE (monitor)
# 0.10-0.25: FLAKY (needs condition-based-waiting)
# 0.25+:     BROKEN (urgent fix required)
```
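The formula and classification bands above can be wrapped in two small helpers; a sketch (`flakiness_score` and `classify` are illustrative names, not Shannon APIs):

```python
def flakiness_score(runs):
    """runs: list of dicts with a boolean 'success' key."""
    if not runs:
        return 0.0
    failures = sum(1 for r in runs if not r["success"])
    return failures / len(runs)

def classify(score):
    """Map a flakiness score onto the bands defined above."""
    if score < 0.05:
        return "STABLE"
    if score < 0.10:
        return "ACCEPTABLE"
    if score < 0.25:
        return "FLAKY"
    return "BROKEN"

runs = [{"success": True}] * 92 + [{"success": False}] * 8
assert classify(flakiness_score(runs)) == "ACCEPTABLE"  # 8/100 = 0.08
```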
Track per test:

```python
test_metrics = {
    "test_name": test_name,
    "total_runs": 100,
    "failures": 8,
    "flakiness_score": 0.08,
    "status": "ACCEPTABLE",
    "avg_duration_ms": 245,
    "timeout_rate": 0.02,
    "last_failure": ISO_timestamp,  # placeholder for an ISO 8601 timestamp string
    "recommendations": [
        "Consider condition-based-waiting for async operations",
        "Monitor timeout rate"
    ]
}
serena.write_memory(f"test_reliability/tests/{test_name}/metrics", test_metrics)
```
Learn from historical data:

```python
# Query historical wait times for similar conditions
wait_history = serena.query_memory("test_reliability/waits/*:condition~'database ready'")

# Calculate optimal timeout
optimal_timeout = calculate_optimal_timeout(wait_history)

# Typical wait patterns:
patterns = {
    "p50": percentile(wait_history, 0.50),  # 50% complete within
    "p95": percentile(wait_history, 0.95),  # 95% complete within
    "p99": percentile(wait_history, 0.99),  # 99% complete within
    "max": max([w["duration_ms"] for w in wait_history])
}

# Recommend timeout based on p99 + buffer
recommended_timeout = patterns["p99"] * 1.5
```
Example output:

```
Database ready condition:
  P50: 120ms (50% of waits complete)
  P95: 380ms (95% of waits complete)
  P99: 520ms (99% of waits complete)

  Recommended timeout: 780ms (p99 × 1.5 buffer)
  Current timeout: 5000ms (too long, wastes time on failures)

SUGGESTION: Set timeout to 800ms for faster failure detection
```
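The timeout recommendation can be derived directly from a wait history; a sketch using a nearest-rank percentile (`percentile` and `recommended_timeout` are illustrative helpers, not Shannon APIs):

```python
def percentile(durations_ms, p):
    """Nearest-rank percentile of a list of wait durations (ms)."""
    ordered = sorted(durations_ms)
    rank = int(round(p * len(ordered)))
    return ordered[max(0, min(len(ordered) - 1, rank - 1))]

def recommended_timeout(durations_ms, buffer=1.5):
    """p99 of observed waits, times a safety buffer."""
    return percentile(durations_ms, 0.99) * buffer

# Synthetic history matching the example output: p50=120ms, p95=380ms, p99=520ms
waits = [120] * 50 + [380] * 45 + [520] * 5
assert recommended_timeout(waits) == 780.0  # 520 * 1.5
```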
For web testing with Puppeteer:

```typescript
// Use Puppeteer's built-in waitFor capabilities
import { Page } from 'puppeteer';

async function waitForSelector(page: Page, selector: string) {
  // Shannon: Track Puppeteer wait metrics
  const startTime = Date.now();
  try {
    const element = await page.waitForSelector(selector, { timeout: 5000 });
    // Track success
    trackWaitSuccess(`selector: ${selector}`, Date.now() - startTime);
    return element;
  } catch (error) {
    // Track timeout
    trackWaitTimeout(`selector: ${selector}`, 5000);
    throw error;
  }
}
```
For complex async scenarios:

```typescript
// Use Sequential MCP for deep analysis of why a test is flaky
if (flakiness_score > 0.10) {
  const analysis = await sequential.analyze({
    prompt: `Analyze why test "${test_name}" has ${flakiness_score} flakiness.
Review recent failures and suggest condition-based-waiting improvements.`,
    context: test_runs.slice(-10) // Last 10 runs
  });
  console.log("Sequential Analysis:", analysis.recommendations);
}
```
Pre-commit hook integration:
#!/bin/bash
# hooks/pre-commit-test-check.sh
# Run tests with tracking
npm test
# Query flaky tests
FLAKY_TESTS=$(serena_cli query "test_reliability/tests/*:flakiness_score>0.10" --format json)
if [ -n "$FLAKY_TESTS" ]; then
echo "⚠️ FLAKY TESTS DETECTED:"
echo "$FLAKY_TESTS" | jq -r '.[] | " - \(.test_name): \(.flakiness_score) flakiness"'
echo ""
echo "RECOMMENDATION: Apply condition-based-waiting skill"
echo "See: /shannon:skill condition-based-waiting"
exit 1
fi
From a debugging session (2025-10-03), Shannon tracking captured the before/after improvement:

```python
# Query before/after metrics
before = serena.query_memory("test_reliability/2025-10-02/*")
after = serena.query_memory("test_reliability/2025-10-04/*")

improvement = {
    "avg_flakiness_before": 0.42,
    "avg_flakiness_after": 0.00,
    "tests_fixed": 15,
    "avg_duration_before": 2450,  # ms
    "avg_duration_after": 1470,   # ms (40% faster)
    "speedup_percent": 40
}
```
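The summary figures above can be recomputed from raw run records; a sketch with synthetic data (`summarize` is an illustrative helper, and the record shape follows the metrics entries earlier in this document):

```python
def summarize(before, after):
    """Compare average durations between two sets of run records."""
    def avg(runs, key):
        return sum(r[key] for r in runs) / len(runs)
    b = avg(before, "duration_ms")
    a = avg(after, "duration_ms")
    return {
        "avg_duration_before": b,
        "avg_duration_after": a,
        "speedup_percent": round((b - a) / b * 100),
    }

# Synthetic run records shaped like the Shannon memory entries above
before = [{"duration_ms": 2450, "success": False}] * 10
after = [{"duration_ms": 1470, "success": True}] * 10
assert summarize(before, after)["speedup_percent"] == 40
```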
This skill works with:
Shannon integration:
Arbitrary timeouts = guessing. Condition polling = knowing.
Shannon's quantitative tracking turns test reliability from hope into science.
Measure flakiness. Learn patterns. Wait for conditions, not guesses.