Design and plan usability tests - task creation, think-aloud protocols, moderator scripts, metrics definition, and analysis frameworks.
Design and plan usability tests with realistic tasks, think-aloud protocols, moderator scripts, and metrics like SUS. Use when creating test plans, writing tasks, or analyzing results for product validation.
/plugin marketplace add melodic-software/claude-code-plugins
/plugin install ux-research@melodic-software
This skill is limited to using the following tools:
Design and execute usability tests to evaluate how well users can accomplish tasks with your product.
Use this skill when:
Before answering ANY usability testing question:
| Aspect | Moderated | Unmoderated |
|---|---|---|
| Facilitator | Present, guides session | Absent, automated |
| Depth | Deep insights, probing | Surface-level, task-focused |
| Sample Size | 5-8 typical | 20-100+ typical |
| Cost | Higher (facilitator time) | Lower (scale) |
| Turnaround | Days-weeks | Hours-days |
| Best For | Complex flows, discovery | Validation, benchmarking |
| Format | Description | When to Use |
|---|---|---|
| In-Person Moderated | Face-to-face, controlled environment | High-fidelity prototypes, sensitive topics |
| Remote Moderated | Video call, screen share | Geographic diversity, convenience |
| Remote Unmoderated | Recorded tasks, no moderator | Scale, quick validation |
| Guerrilla Testing | Quick tests in public spaces | Early concepts, low budget |
| First-Click Testing | Where users click first | Navigation, labeling |
| 5-Second Test | First impressions | Visual hierarchy, messaging |
| Benchmark Testing | Repeated measurement | Tracking improvements |
Good tasks are:
## Task [N]: [Short Name]
**Scenario:**
[Context setting - why user is doing this]
**Task:**
[What to accomplish - goal, not steps]
**Success Criteria:**
- [ ] [Observable completion indicator]
- [ ] [Secondary success measure if applicable]
**Metrics:**
- Task success (binary or graded)
- Time on task
- Errors/assists needed
- Satisfaction rating
**Probes (Moderated):**
- What are you thinking right now?
- What did you expect to happen?
- What would you do next?
Poor Task: "Click the hamburger menu, then click Settings, then Privacy, then change your notification preferences."
Good Task: "You've been receiving too many email notifications from this app. Find where you can change your notification settings."
Poor Task: "Test the checkout flow."
Good Task: "You've been shopping for a birthday gift and found a book you want to purchase. Complete your purchase using the credit card already saved in your account."
Structure tasks from easy to difficult:
Participant verbalizes thoughts while performing tasks.
Moderator Prompts:
Advantages:
Disadvantages:
Participant reviews recording and explains thoughts after.
Moderator Approach:
Advantages:
Disadvantages:
# Usability Test Script: [Product/Feature]
## Pre-Session Setup (10 min before)
- [ ] Test recording equipment
- [ ] Verify prototype/product works
- [ ] Review participant background
- [ ] Prepare task cards/materials
- [ ] Set up note-taking template
## Introduction (5 min)
"Hello [Name], thank you for helping us today. I'm [Researcher] and
I'll be guiding our session. [Observer] is taking notes.
We're testing [product], not you. There are no wrong answers or
mistakes—everything you do helps us improve the design.
I'll ask you to complete some tasks and think out loud as you go.
Please share whatever comes to mind—your thoughts, reactions,
questions, even frustrations. There's no need to be polite about
problems you encounter.
We're recording the session to help with our notes. The recording
is confidential and only for our team.
Do you have any questions before we begin?"
## Warm-up Questions (3 min)
- Tell me briefly about your role and typical day.
- How familiar are you with [product category]?
- What tools do you currently use for [relevant activity]?
## Task Introduction
"I'm going to give you a series of tasks. I'll read each task aloud
and you'll also have it written down. Please read it back to me so
I know we're on the same page.
Remember to think out loud. If you get stuck or would normally give
up, just let me know—we can move on. Ready?"
## Tasks
### Task 1: [Name] (Warm-up)
[Read task aloud, hand written card]
**Observer Notes:**
- Start time: ___
- End time: ___
- Success: [ ] Complete [ ] Partial [ ] Fail
- Errors: ___
- Assists: ___
- Path taken: ___
**Post-task Questions:**
- How difficult was that task? (1-7 scale)
- What, if anything, was confusing?
### Task 2: [Name] (Primary)
[Continue pattern...]
### Task 3: [Name] (Primary)
...
## Post-Test Questions (5 min)
- What stood out to you about this experience?
- What was the most frustrating part?
- What was the most satisfying part?
- How does this compare to [competitor/current solution]?
- Would you recommend this to a colleague? Why/why not?
## SUS Questionnaire (3 min)
[Administer System Usability Scale]
## Wrap-up (2 min)
"Thank you so much for your time and feedback. Your insights will
directly influence how we improve [product].
Do you have any final questions for me?
[Explain incentive process]"
## Post-Session
- [ ] Save recording
- [ ] Complete observer notes
- [ ] Note immediate impressions
- [ ] Highlight key quotes/moments
- [ ] Debrief with observers
| Metric | Definition | Measurement |
|---|---|---|
| Task Success | Completed successfully | Binary (0/1) or Graded (0/0.5/1) |
| Time on Task | Duration to complete | Seconds/minutes |
| Errors | Mistakes made | Count |
| Assists | Help requests | Count |
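| Lostness | Navigation efficiency | sqrt((N/S - 1)² + (R/N - 1)²), where N = unique pages visited, S = total pages visited (including revisits), R = minimum pages required; 0 = perfectly efficient |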
| First Click | Correct initial action | Binary |
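Lostness is usually computed following Smith (1996) from three page counts. The snippet below is a minimal sketch of that calculation; the function and parameter names are illustrative assumptions, not part of this skill.

```csharp
using System;

// Lostness after Smith (1996). Names are hypothetical, chosen for this sketch.
//   uniquePages  (N): different pages visited
//   totalPages   (S): pages visited, counting revisits
//   minimumPages (R): fewest pages needed to complete the task
static double Lostness(int uniquePages, int totalPages, int minimumPages)
{
    if (uniquePages <= 0 || totalPages <= 0 || minimumPages <= 0)
        throw new ArgumentOutOfRangeException(nameof(uniquePages), "All counts must be positive.");
    if (uniquePages > totalPages)
        throw new ArgumentException("Unique pages cannot exceed total pages visited.");

    double revisitTerm = (double)uniquePages / totalPages - 1;   // N/S - 1
    double detourTerm = (double)minimumPages / uniquePages - 1;  // R/N - 1
    return Math.Sqrt(revisitTerm * revisitTerm + detourTerm * detourTerm);
}

// Direct path: 4 pages needed, 4 unique pages, 4 visits in total -> 0.
Console.WriteLine(Lostness(4, 4, 4));

// Wandering path: 4 pages needed, 8 unique pages, 12 visits in total -> about 0.60.
Console.WriteLine(Lostness(8, 12, 4));
```

A score of 0 means the participant took the shortest possible path. Smith's study is commonly cited as treating scores above roughly 0.5 as a sign that the participant was lost.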
| Metric | Formula | Benchmark |
|---|---|---|
| Task Success Rate | Successes / Attempts | 78% average (Sauro) |
| Average Time | Sum(times) / n | Task-dependent |
| Error Rate | Errors / Tasks | Lower = better |
| SUS Score | Standardized formula | 68 = average |
| SEQ (Single Ease) | 7-point post-task | 5.5 = average |
| SUPR-Q | Website UX benchmark | Percentile rank |
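The three simplest formulas in the table (success rate, average time, error rate) can be sketched as below. The sample data is made up for illustration, and the function names are assumptions rather than part of this skill.

```csharp
using System;
using System.Linq;

// Success rate = mean of per-attempt scores (binary 0/1 or graded 0/0.5/1).
static double SuccessRate(double[] scores) => scores.Average();

// Average time = Sum(times) / n, reported in seconds.
static double AverageSeconds(TimeSpan[] durations) =>
    durations.Average(d => d.TotalSeconds);

// Error rate = total errors / number of task attempts.
static double ErrorRate(int[] errorsPerAttempt) =>
    (double)errorsPerAttempt.Sum() / errorsPerAttempt.Length;

// Hypothetical results for one task attempted by five participants.
double[] taskScores = { 1, 1, 0.5, 0, 1 };
TimeSpan[] taskDurations =
{
    TimeSpan.FromSeconds(83), TimeSpan.FromSeconds(225), TimeSpan.FromSeconds(300),
    TimeSpan.FromSeconds(140), TimeSpan.FromSeconds(102)
};
int[] taskErrors = { 0, 2, 3, 1, 0 };

Console.WriteLine(SuccessRate(taskScores));       // 3.5 / 5 = 0.7
Console.WriteLine(AverageSeconds(taskDurations)); // 850 / 5 = 170
Console.WriteLine(ErrorRate(taskErrors));         // 6 / 5 = 1.2
```

With the 5-8 participants typical of moderated tests, these figures are rough indicators rather than precise estimates, so report them alongside the raw counts.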
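// SUS Score calculation
using System;
using System.Linq;

public enum SusInterpretation { Poor, BelowAverage, Average, Good, Excellent }

public class SusCalculator
{
    // Standard SUS items: odd-numbered items are positively worded,
    // even-numbered items are negatively worded.
    private static readonly string[] Questions =
    [
        "I think that I would like to use this system frequently.",
        "I found the system unnecessarily complex.",
        "I thought the system was easy to use.",
        "I think that I would need the support of a technical person to be able to use this system.",
        "I found the various functions in this system were well integrated.",
        "I thought there was too much inconsistency in this system.",
        "I would imagine that most people would learn to use this system very quickly.",
        "I found the system very cumbersome to use.",
        "I felt very confident using the system.",
        "I needed to learn a lot of things before I could get going with this system."
    ];

    public decimal Calculate(int[] responses)
    {
        if (responses.Length != 10)
            throw new ArgumentException("SUS requires exactly 10 responses");

        // Responses are 1-5 (Strongly Disagree to Strongly Agree)
        if (responses.Any(r => r is < 1 or > 5))
            throw new ArgumentOutOfRangeException(nameof(responses), "Each SUS response must be between 1 and 5");

        decimal score = 0;
        for (int i = 0; i < 10; i++)
        {
            // i is zero-based, so an even index is an odd-numbered question.
            // Odd questions (1,3,5,7,9): score = response - 1
            // Even questions (2,4,6,8,10): score = 5 - response
            score += i % 2 == 0
                ? responses[i] - 1
                : 5 - responses[i];
        }

        // Multiply by 2.5 to get 0-100 scale (a score, not a percentage)
        return score * 2.5m;
    }

    // Thresholds are approximate bands, not official SUS cut-offs.
    public SusInterpretation Interpret(decimal score) => score switch
    {
        >= 85 => SusInterpretation.Excellent,    // Roughly "excellent" on adjective rating scales
        >= 72 => SusInterpretation.Good,         // Roughly "good"
        >= 68 => SusInterpretation.Average,      // 68 is the commonly cited average
        >= 51 => SusInterpretation.BelowAverage,
        _ => SusInterpretation.Poor
    };
}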
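The SUS scoring rule is easy to check by hand. The sketch below restates it as a standalone function and works through one participant's answers; the answers are hypothetical and the function name is an assumption made for this example.

```csharp
using System;
using System.Linq;

// Scores one participant's ten SUS answers (each 1-5) on the 0-100 scale.
static decimal SusScore(int[] responses)
{
    if (responses.Length != 10)
        throw new ArgumentException("SUS requires exactly 10 responses.");
    if (responses.Any(r => r < 1 || r > 5))
        throw new ArgumentOutOfRangeException(nameof(responses), "Each response must be 1-5.");

    int sum = 0;
    for (int i = 0; i < responses.Length; i++)
    {
        // Even index = odd-numbered (positively worded) item: response - 1.
        // Odd index = even-numbered (negatively worded) item: 5 - response.
        sum += i % 2 == 0 ? responses[i] - 1 : 5 - responses[i];
    }
    return sum * 2.5m;
}

// Hypothetical answers from one participant, items 1 through 10.
int[] answers = { 4, 2, 4, 1, 5, 2, 4, 2, 4, 2 };

// Odd items contribute 3+3+4+3+3 = 16, even items 3+4+3+3+3 = 16, so 32 * 2.5 = 80.
Console.WriteLine(SusScore(answers));

// Answering 3 ("neutral") to every item gives 10 * 2 * 2.5 = 50, below the average of 68.
Console.WriteLine(SusScore(Enumerable.Repeat(3, 10).ToArray()));
```

The second case shows why SUS scores should not be read as percentages: a fully neutral participant lands at 50, well under the commonly cited average.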
public enum TaskSuccessLevel
{
    Complete = 100,     // Completed without assistance
    PartialMinor = 75,  // Completed with minor struggle
    PartialMajor = 50,  // Completed with significant difficulty
    Assisted = 25,      // Required moderator hint
    Failure = 0         // Could not complete
}

public class TaskResult
{
    public required Guid TaskId { get; init; }
    public required Guid ParticipantId { get; init; }
    public required TaskSuccessLevel Success { get; init; }
    public required TimeSpan Duration { get; init; }
    public required int ErrorCount { get; init; }
    public required int AssistCount { get; init; }
    public required int SingleEaseQuestion { get; init; } // 1-7 scale
    public string? Notes { get; init; }
    public List<string> ClickPath { get; init; } = [];
}
// Configuration for unmoderated test
public class UnmoderatedTestConfig
{
    public required string TestName { get; init; }
    public required string WelcomeMessage { get; init; }
    public required List<ScreenerQuestion> Screener { get; init; }
    public required List<UnmoderatedTask> Tasks { get; init; }
    public required List<PostTestQuestion> PostQuestions { get; init; }
    public TestSettings Settings { get; init; } = new()
    {
        RecordScreen = true,
        RecordAudio = true,
        RecordWebcam = false,
        RequireThinkAloud = true,
        MaxTestDuration = TimeSpan.FromMinutes(30),
        AllowTaskSkip = true
    };
}

public class UnmoderatedTask
{
    public required int Order { get; init; }
    public required string Scenario { get; init; }
    public required string TaskInstructions { get; init; }
    public required string PrototypeUrl { get; init; }
    public TimeSpan? TimeLimit { get; init; }
    public bool RequireRecording { get; init; } = true;
    public List<PostTaskQuestion> FollowUp { get; init; } = [];
}
# Session Analysis: P[N]
**Participant:** [ID/Code]
**Date:** [Date]
**Duration:** [Time]
## Task Performance
| Task | Success | Time | Errors | Assists | SEQ |
|------|---------|------|--------|---------|-----|
| T1 | ✓ | 1:23 | 0 | 0 | 6 |
| T2 | ~ | 3:45 | 2 | 1 | 4 |
| T3 | ✗ | 5:00+ | 3 | - | 2 |
## Key Observations
### Positive
- [What worked well]
### Issues Found
1. **[Issue Name]** - Severity: [Critical/Major/Minor/Cosmetic]
   - Location: [Where in interface]
   - Behavior: [What user did]
   - Quote: "[Participant verbalization]"
   - Impact: [Effect on task]
### Notable Quotes
- "[Quote]" - Re: [Topic]
## Recommendations
- [Immediate action]
- [Design consideration]
| Severity | Definition | Action |
|---|---|---|
| Critical (4) | Prevents task completion | Must fix before launch |
| Major (3) | Causes significant difficulty | Should fix before launch |
| Minor (2) | Causes slight hesitation | Fix if possible |
| Cosmetic (1) | Noted but didn't affect task | Consider for future |
Track issues across participants:
| Issue | P1 | P2 | P3 | P4 | P5 | Count | Severity |
|-------|----|----|----|----|-------|-----|----------|
| Can't find settings | X | X | | X | | 3/5 | Major |
| Confusing label | X | X | X | X | | 4/5 | Major |
| Slow load time | | X | | | X | 2/5 | Minor |
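One way to build the Count column is to record which participants hit each issue and rank issues by how many were affected. The sketch below assumes a simple dictionary of issue name to participant codes; the structure and the `RankByFrequency` name are hypothetical, not part of this skill.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Ranks issues by the number of distinct participants who encountered them.
static List<(string Issue, int Count)> RankByFrequency(Dictionary<string, string[]> observations) =>
    observations
        .Select(o => (Issue: o.Key, Count: o.Value.Distinct().Count()))
        .OrderByDescending(x => x.Count)
        .ToList();

// Hypothetical observations mirroring the matrix: issue -> participants who hit it.
var matrix = new Dictionary<string, string[]>
{
    ["Can't find settings"] = new[] { "P1", "P2", "P4" },
    ["Confusing label"] = new[] { "P1", "P2", "P3", "P4" },
    ["Slow load time"] = new[] { "P2", "P5" },
};
const int participantCount = 5;

// Prints each issue with its count, most frequent first.
foreach (var (issue, count) in RankByFrequency(matrix))
    Console.WriteLine($"{issue}: {count}/{participantCount}");
```

Frequency alone does not set priority. Combine it with the severity rating so that a critical issue seen once is not ranked below a cosmetic issue seen often.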
public class UsabilityTest
{
    public Guid Id { get; init; }
    public required string Name { get; init; }
    public required UsabilityTestType Type { get; init; }
    public required string ProductVersion { get; init; }
    public required List<UsabilityTask> Tasks { get; init; }
    public required int TargetParticipants { get; init; }
    public List<TestSession> Sessions { get; } = [];
    public List<UsabilityIssue> Issues { get; } = [];

    // CalculateOverallSuccess, CalculateAverageSus, and CalculateTaskMetrics
    // are helper methods not shown here.
    public UsabilityTestMetrics CalculateMetrics()
    {
        // Materialize once so the filter is not re-evaluated for every metric.
        var completedSessions = Sessions
            .Where(s => s.Status == SessionStatus.Completed)
            .ToList();

        return new UsabilityTestMetrics
        {
            TotalParticipants = completedSessions.Count,
            OverallSuccessRate = CalculateOverallSuccess(completedSessions),
            AverageSusScore = CalculateAverageSus(completedSessions),
            TaskMetrics = Tasks.Select(t => CalculateTaskMetrics(t, completedSessions)).ToList(),
            IssuesBySeverity = Issues.GroupBy(i => i.Severity)
                .ToDictionary(g => g.Key, g => g.Count())
        };
    }
}

public class UsabilityIssue
{
    public Guid Id { get; init; }
    public required string Title { get; init; }
    public required string Description { get; init; }
    public required IssueSeverity Severity { get; init; }
    public required string Location { get; init; }
    public required List<Guid> AffectedParticipants { get; init; }
    public string? Recommendation { get; init; }
    public IssueStatus Status { get; set; } = IssueStatus.Open;
}

public record UsabilityTestMetrics
{
    public required int TotalParticipants { get; init; }
    public required decimal OverallSuccessRate { get; init; }
    public required decimal AverageSusScore { get; init; }
    public required List<TaskMetrics> TaskMetrics { get; init; }
    public required Dictionary<IssueSeverity, int> IssuesBySeverity { get; init; }
}
- user-research-planning - Overall research planning
- heuristic-evaluation - Expert review methods
- accessibility-planning - Inclusive testing practices
- prototype-strategy - Prototype fidelity for testing

Last Updated: 2025-12-27