From evaluation
Tracks AI product quality over time, detecting drift, degradation, and improvements using golden test sets, automated evals, dashboards, and alerts. Useful for maintaining AI reliability.
```
npx claudepluginhub owl-listener/ai-design-skills --plugin evaluation
```

This skill uses the workspace's default tool permissions.
AI products change over time — models get updated, usage patterns shift, and quality can drift without anyone noticing. Longitudinal measurement is how you track quality across time and catch degradation before users do.
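As a concrete illustration of the loop, here is a minimal TypeScript sketch of a golden-set run that appends a dated pass rate to a history file. The file names (golden-set.json, eval-history.csv), the runModel placeholder, and the exact-match scorer are illustrative assumptions, not the skill's actual implementation.

```ts
import { readFileSync, appendFileSync } from "node:fs";

type GoldenCase = { input: string; expected: string };

// Placeholder for the model or agent call under test (assumption).
async function runModel(input: string): Promise<string> {
  return `echo: ${input}`; // swap in the real product call
}

// Crude exact-match scorer; real golden sets often need graded scoring.
function score(actual: string, expected: string): number {
  return actual.trim() === expected.trim() ? 1 : 0;
}

async function main(): Promise<void> {
  const cases: GoldenCase[] = JSON.parse(readFileSync("golden-set.json", "utf8"));
  let passed = 0;
  for (const c of cases) {
    passed += score(await runModel(c.input), c.expected);
  }
  const passRate = passed / cases.length;
  // Append one dated row per run; this series is what you trend over time.
  appendFileSync("eval-history.csv", `${new Date().toISOString()},${passRate.toFixed(3)}\n`);
  console.log(`golden-set pass rate: ${(passRate * 100).toFixed(1)}%`);
}

main().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});
```

Run on a schedule, the accumulated rows form the time series that dashboards and alerts consume.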
When measurements show drift:
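One way that response might be automated, sketched under assumptions (the eval-history.csv format from the sketch above, a 7-run baseline window, a 5-point drop threshold; none of these are the skill's actual procedure), is a check that compares the latest pass rate against a trailing baseline and fails loudly on a significant drop.

```ts
import { readFileSync } from "node:fs";

const WINDOW = 7;       // trailing runs that form the baseline (assumption)
const THRESHOLD = 0.05; // alert on a drop of more than 5 points (assumption)

// Each row of eval-history.csv is "ISO timestamp,pass rate" (see sketch above).
const rates = readFileSync("eval-history.csv", "utf8")
  .trim()
  .split("\n")
  .map((line) => parseFloat(line.split(",")[1]));

const latest = rates[rates.length - 1];
const trailing = rates.slice(-1 - WINDOW, -1);

if (trailing.length === 0) {
  console.log("not enough history for a baseline yet");
  process.exit(0);
}

const baseline = trailing.reduce((sum, r) => sum + r, 0) / trailing.length;

if (latest < baseline - THRESHOLD) {
  // A real setup would page someone or post to a channel here.
  console.error(`DRIFT: latest ${latest.toFixed(2)} vs baseline ${baseline.toFixed(2)}`);
  process.exitCode = 1;
} else {
  console.log(`OK: latest ${latest.toFixed(2)} (baseline ${baseline.toFixed(2)})`);
}
```

Wired into CI or a scheduled job, a check like this turns a one-off eval into the early-warning signal described above.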