From the evaluation plugin
Defines task success metrics, such as completion rate, time to completion, and intervention rate, to evaluate whether the AI actually helps users achieve their goals, beyond output quality alone.
npx claudepluginhub owl-listener/ai-design-skills --plugin evaluation
This skill uses the workspace's default tool permissions.
Output quality doesn't guarantee task success. The AI might produce a beautiful response that doesn't actually help the user do what they came to do. Task success metrics measure the end-to-end outcome.
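One way to keep these signals separate in practice is to log task outcomes as their own events rather than inferring them from response scores. Below is a minimal sketch in Python; the `track` helper, event names, and all field values are hypothetical stand-ins for whatever analytics pipeline is actually in use.

```python
from datetime import datetime, timezone

def track(event: str, properties: dict) -> None:
    """Placeholder: forward the event to whatever analytics backend is in use."""
    print(event, properties)

# Per-response signal: how good did the output look?
track("response_scored", {"task_id": "t-123", "quality": 0.92})

# End-to-end signal: did the user actually finish what they came to do?
track("task_completed", {
    "task_id": "t-123",
    "completed": False,       # the response read well, but the goal was not met
    "interventions": 2,       # manual corrections the user had to make along the way
    "ended_at": datetime.now(timezone.utc).isoformat(),
})
```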
For each user task, define:
- Completion rate: the share of attempts where the user actually reaches their goal.
- Time to completion: how long it takes the user to get there.
- Intervention rate: how often the user has to step in to correct or take over.
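As a rough illustration of how these might be rolled up, the sketch below assumes a simple list of per-task session records; the field names (`completed`, `duration_seconds`, `interventions`) and the example values are purely illustrative, not from any particular analytics tool.

```python
from statistics import median

def task_success_metrics(sessions: list[dict]) -> dict:
    """Roll per-task session records up into the three headline metrics."""
    if not sessions:
        return {}

    completed = [s for s in sessions if s["completed"]]
    return {
        # Share of tasks where the user reached their goal.
        "completion_rate": len(completed) / len(sessions),
        # Median wall-clock seconds for tasks that did finish.
        "median_seconds_to_completion": (
            median(s["duration_seconds"] for s in completed) if completed else None
        ),
        # Share of tasks that needed at least one manual correction or takeover.
        "intervention_rate": sum(1 for s in sessions if s["interventions"] > 0) / len(sessions),
    }

# Illustrative values only: two finished tasks, one abandoned after an intervention.
print(task_success_metrics([
    {"completed": True,  "duration_seconds": 40,  "interventions": 0},
    {"completed": True,  "duration_seconds": 95,  "interventions": 2},
    {"completed": False, "duration_seconds": 310, "interventions": 1},
]))
```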
These can diverge from output quality: a response can read beautifully and still leave the user's task unfinished.