Help us improve
Share bugs, ideas, or general feedback.
From fs-dev
Use when assessing task complexity, determining workflow depth, asking "how complex is this?", "should I use a worktree?", "is this a big change?", or sizing work before planning. Analyzes 7 dimensions (scope, integration, infrastructure, knowledge, risk, testing, decomposability) to return SIMPLE, STANDARD, or COMPLEX tier with confidence scoring. Includes intent confirmation gating and review agent selection hints for consensus-review integration.
npx claudepluginhub fyrsmithlabs/marketplace --plugin fs-devHow this skill is triggered — by the user, by Claude, or both
Slash command
/fs-dev:complexity-assessmentThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Evaluate task complexity before planning to ensure appropriate workflow depth and artifact structure.
Scores task complexity 1-5, enforces quality gates with blocking thresholds for YAGNI, unknowns, and failed attempts, and manages escalation before complex tasks or when stuck.
Orchestrates a pre-task multi-agent briefing that classifies task complexity, runs parallel analysis, estimates time, maps risks, and produces an execution plan with contingencies. Use before tackling complex or critical work to reduce rework.
Guides technical evaluation of code review feedback: read fully, restate for understanding, verify against codebase, respond with reasoning or pushback before implementing.
Share bugs, ideas, or general feedback.
Evaluate task complexity before planning to ensure appropriate workflow depth and artifact structure.
If contextd MCP is available:
memory_search to find similar past assessmentsmemory_record for tier recommendations with rationalememory_outcome to track assessment accuracy over timebranch_create/return for isolated assessmentIf contextd is NOT available:
This skill is called by /brainstorm Phase 2 to determine:
Assess each dimension and aggregate to determine tier:
Question: How many files will be touched?
| Files | Score | Indicator |
|---|---|---|
| 1-2 | 1 | Single component, localized change |
| 3-10 | 2 | Multiple components, cross-cutting |
| 10+ | 3 | System-wide, architectural change |
Question: What external dependencies are involved?
| Dependencies | Score | Indicator |
|---|---|---|
| None | 1 | Self-contained, no external calls |
| 1-2 APIs/services | 2 | Moderate integration work |
| 3+ APIs/services | 3 | Complex orchestration required |
Question: Are config/infra changes required?
| Changes | Score | Indicator |
|---|---|---|
| None | 1 | Code-only change |
| Config files | 2 | Environment variables, feature flags |
| New infra | 3 | Database, Docker, cloud resources |
Question: What domain expertise is required?
| Expertise | Score | Indicator |
|---|---|---|
| Familiar patterns | 1 | Standard CRUD, common patterns |
| Some research | 2 | New library, unfamiliar domain |
| Deep expertise | 3 | Security, performance, compliance |
Question: What's the blast radius if something goes wrong?
| Risk | Score | Indicator |
|---|---|---|
| Low | 1 | Easily reversible, non-critical path |
| Medium | 2 | User-facing, requires testing |
| High | 3 | Data integrity, security, payments |
Question: What testing effort is required?
| Testing | Score | Indicator |
|---|---|---|
| Minimal | 1 | Unit tests only, existing patterns |
| Moderate | 2 | Integration tests, new test fixtures |
| Extensive | 3 | E2E tests, performance tests, security audits |
Question: Can this task be split into independent units?
| Decomposability | Score | Indicator |
|---|---|---|
| Easily decomposable | 1 | Clear subtasks, no shared state |
| Partially decomposable | 2 | Some dependencies between subtasks |
| Monolithic | 3 | Tightly coupled, must be done together |
Apply these multipliers to the base score when present:
| Factor | Multiplier | When to Apply |
|---|---|---|
| Security-sensitive | 1.2x | Auth, encryption, user data, API keys |
| Data migration | 1.3x | Schema changes, data transformations |
| Breaking changes | 1.2x | API changes, deprecated features |
| Technical debt | 1.1x | Working in legacy/poorly-tested code |
| Time pressure | 1.1x | Urgent deadline, reduced review time |
Calculation: adjusted_score = base_score * highest_applicable_multiplier
Every assessment includes a confidence percentage (0-100%):
| Confidence | Range | Criteria |
|---|---|---|
| High | 80-100% | Clear requirements, familiar domain, similar past tasks |
| Medium | 50-79% | Some ambiguity, partial familiarity |
| Low | 0-49% | Unclear scope, unfamiliar domain, novel problem |
When confidence < 60%: Use AskUserQuestion to clarify ambiguous dimensions before finalizing tier.
Confidence factors:
Adjust expectations based on team context:
| Factor | Impact | Consideration |
|---|---|---|
| Team size | +/- 1 tier | Solo dev may need COMPLEX timeline adjustment |
| Experience level | +/- 1 dimension score | Junior devs add +1 to Knowledge dimension |
| Codebase familiarity | +/- 1 dimension score | New team members add +1 to Scope |
| Concurrent priorities | +/- 1 tier | High WIP may require elevated treatment |
Sum scores across all 7 dimensions (max 21):
| Total Score | Tier | Characteristics |
|---|---|---|
| 7-11 | SIMPLE | Quick change, minimal planning needed |
| 12-16 | STANDARD | Moderate complexity, structured approach |
| 17-21 | COMPLEX | High complexity, extensive planning required |
After multipliers: Round to nearest integer before tier assignment.
Return assessment as JSON for downstream consumption:
{
"tier": "STANDARD",
"confidence": 75,
"confidence_justification": "Clear scope but unfamiliar OAuth library",
"base_score": 14,
"adjusted_score": 16.8,
"multipliers_applied": ["security-sensitive"],
"dimensions": {
"scope": { "score": 2, "rationale": "Auth routes, middleware, user model, frontend" },
"integration": { "score": 2, "rationale": "OAuth provider API" },
"infrastructure": { "score": 2, "rationale": "Environment variables, session config" },
"knowledge": { "score": 2, "rationale": "OAuth2 flow, security best practices" },
"risk": { "score": 3, "rationale": "Security-critical, user data" },
"testing": { "score": 2, "rationale": "Integration tests for OAuth flow" },
"decomposability": { "score": 1, "rationale": "Backend/frontend can be parallelized" }
},
"decomposition_suggestions": null,
"historical_comparison": "Similar OAuth task took 3 iterations (Issue #45)",
"team_velocity_adjustments": null,
"integration_hints": {
"github_artifact": "epic_with_sub_issues",
"context_branch_budget": 8192,
"worktree_recommended": true
},
"recommended_agents": {
"always": ["code-quality-reviewer"],
"if_go": ["go-reviewer"],
"if_security_sensitive": ["security-reviewer", "vulnerability-reviewer"],
"if_api_change": ["user-persona-reviewer", "documentation-reviewer"],
"all": false
},
"intent_confirmation": {
"required": true,
"mode": "STANDARD"
}
}
After tier determination, include an intent_confirmation field in the output to signal the caller whether intent confirmation is recommended:
| Tier | Recommendation |
|---|---|
| SIMPLE (7-11) | "intent_confirmation": "none" - Caller should auto-proceed. |
| STANDARD (12-16) | "intent_confirmation": "standard" - Caller should invoke intent-confirmation skill. |
| COMPLEX (17-21) | "intent_confirmation": "detailed" - Caller should invoke intent-confirmation skill in DETAILED mode. |
This skill does NOT invoke intent-confirmation itself. It outputs a recommendation that the calling workflow uses to decide whether to invoke intent-confirmation. The typical workflow is:
Caller (e.g., /brainstorm, /plan):
1. Invoke complexity-assessment → returns tier + intent_confirmation field
2. If intent_confirmation != "none":
→ Invoke intent-confirmation skill with the recommended mode
→ Wait for user confirmation
3. Proceed with execution
This avoids a circular dependency between the two skills.
For COMPLEX tasks, the intent preview includes:
Include a recommended_agents field in the structured output based on the assessment. This allows the consensus-review skill to perform adaptive agent selection without re-analyzing the diff.
{
"recommended_agents": {
"always": ["code-quality-reviewer"],
"if_go": ["go-reviewer"],
"if_security_sensitive": ["security-reviewer", "vulnerability-reviewer"],
"if_api_change": ["user-persona-reviewer", "documentation-reviewer"],
"all": false
}
}
| Tier | all | Base Agents |
|---|---|---|
| SIMPLE | false | ["code-quality-reviewer"] |
| STANDARD | false | ["code-quality-reviewer"] + conditional agents |
| COMPLEX | true | All 6 reviewers (overrides conditional logic) |
These are derived from the dimension scores:
| Condition | Trigger | Agents Added |
|---|---|---|
| Risk score >= 2 | Security-sensitive | security-reviewer, vulnerability-reviewer |
| Knowledge score >= 2 AND domain is Go | Go code | go-reviewer |
| Integration score >= 2 | API changes likely | user-persona-reviewer, documentation-reviewer |
| Infrastructure score >= 2 | Config/infra changes | security-reviewer |
| Scope score == 3 | System-wide change | All reviewers (all: true) |
The consensus-review skill reads recommended_agents from the assessment output:
1. Run complexity-assessment → get tier + recommended_agents
2. Pass recommended_agents to consensus-review adaptive agent selection
3. Consensus-review merges recommended_agents with its own override rules
4. Final agent set = union(recommended_agents, override_agents)
Tier determines GitHub artifact structure:
| Tier | Artifact Type | Details |
|---|---|---|
| SIMPLE | Single Issue | Checklist in body, no sub-issues |
| STANDARD | Epic + Sub-Issues | Parent issue links children |
| COMPLEX | Epic + Sub-Issues + Project | Board with columns, milestones |
Tier determines branch budget:
| Tier | Token Budget | Max Depth |
|---|---|---|
| SIMPLE | 4096 | 1 level |
| STANDARD | 8192 | 2 levels |
| COMPLEX | 16384 | 3 levels |
For COMPLEX tasks (score 17+), provide decomposition suggestions:
{
"decomposition_suggestions": [
{
"subtask": "Backend OAuth integration",
"estimated_tier": "STANDARD",
"dependencies": [],
"can_parallelize": true
},
{
"subtask": "Frontend auth UI components",
"estimated_tier": "SIMPLE",
"dependencies": [],
"can_parallelize": true
},
{
"subtask": "Session management middleware",
"estimated_tier": "SIMPLE",
"dependencies": ["Backend OAuth integration"],
"can_parallelize": false
}
],
"parallelization_potential": "2 of 3 subtasks can run in parallel",
"recommended_approach": "Split into 3 issues, assign backend and frontend in parallel"
}
When confidence < 60%, use AskUserQuestion to clarify:
AskUserQuestion(
question: "I'm uncertain about the complexity of this task. To provide an accurate assessment, I need clarification on:",
options: [
"What external services/APIs will this integrate with?",
"Are there existing tests I should reference for patterns?",
"What's the expected timeline pressure?",
"Are there schema/database changes involved?"
],
allow_custom: true
)
After clarification: Re-score affected dimensions and recalculate tier.
When contextd is available, search for similar past assessments:
# Search for similar tasks
mcp__contextd__memory_search(
project_id: "<project>",
query: "<task description keywords>",
tags: ["complexity-assessment"]
)
Include in output:
Example output:
Historical comparison: "Similar OAuth task (Issue #45) was assessed as STANDARD,
took 3 iterations to complete. Assessment was accurate."
Store assessment in contextd with full rationale:
mcp__contextd__memory_record(
project_id: "<project>",
title: "Complexity assessment: <task-summary>",
content: JSON.stringify({
tier: "<tier>",
confidence: <0-100>,
base_score: <sum>,
adjusted_score: <after multipliers>,
dimensions: { <all 7 with scores and rationale> },
multipliers: [<applied multipliers>],
decomposition: <suggestions if COMPLEX>,
integration_hints: { <github, context-folding settings> }
}),
outcome: "success",
tags: ["complexity-assessment", "<tier>", "confidence-<high|medium|low>"]
)
Task: "Add user authentication with OAuth2"
| Dimension | Score | Rationale |
|---|---|---|
| Scope | 2 | Auth routes, middleware, user model, frontend components |
| Integration | 2 | OAuth provider API integration |
| Infrastructure | 2 | Environment variables, session config |
| Knowledge | 2 | OAuth2 flow, security best practices |
| Risk | 3 | Security-critical, user data |
| Testing | 2 | Integration tests for OAuth flow needed |
| Decomposability | 1 | Backend/frontend can be parallelized |
Base Score: 14/21 Multiplier: 1.2x (security-sensitive) Adjusted Score: 16.8 -> 17 = COMPLEX (borderline) Confidence: 75% (clear scope, unfamiliar library)
JSON Output:
{
"tier": "COMPLEX",
"confidence": 75,
"confidence_justification": "Clear scope but unfamiliar OAuth library",
"base_score": 14,
"adjusted_score": 17,
"multipliers_applied": ["security-sensitive"],
"dimensions": {
"scope": { "score": 2, "rationale": "Auth routes, middleware, user model, frontend" },
"integration": { "score": 2, "rationale": "OAuth provider API" },
"infrastructure": { "score": 2, "rationale": "Environment variables, session config" },
"knowledge": { "score": 2, "rationale": "OAuth2 flow, security best practices" },
"risk": { "score": 3, "rationale": "Security-critical, user data" },
"testing": { "score": 2, "rationale": "Integration tests for OAuth flow" },
"decomposability": { "score": 1, "rationale": "Backend/frontend can be parallelized" }
},
"decomposition_suggestions": [
{ "subtask": "Backend OAuth routes", "estimated_tier": "STANDARD", "can_parallelize": true },
{ "subtask": "Frontend auth UI", "estimated_tier": "SIMPLE", "can_parallelize": true }
],
"historical_comparison": "Similar OAuth task took 3 iterations (Issue #45)",
"integration_hints": {
"github_artifact": "epic_with_sub_issues_and_project",
"context_branch_budget": 16384,
"worktree_recommended": true
},
"recommended_agents": {
"always": ["code-quality-reviewer"],
"if_go": [],
"if_security_sensitive": ["security-reviewer", "vulnerability-reviewer"],
"if_api_change": ["user-persona-reviewer", "documentation-reviewer"],
"all": true
},
"intent_confirmation": {
"required": true,
"mode": "DETAILED"
}
}
This skill is designed to be called by other commands/skills:
# In /brainstorm command Phase 2:
## Phase 2: Complexity Assessment (Context Folded)
mcp__contextd__branch_create(
session_id: "<session>",
description: "Assess task complexity",
budget: 4096
)
1. Search contextd for similar past assessments
2. Apply 7-dimension analysis to user's task description
3. Calculate confidence score
4. Apply risk multipliers if applicable
5. Calculate tier based on adjusted score
6. Generate decomposition suggestions if COMPLEX
7. Record assessment in contextd with full JSON
mcp__contextd__branch_return(
branch_id: "<branch>",
message: JSON.stringify({
tier: "<tier>",
confidence: <0-100>,
score: "<adjusted>/<max>",
key_factors: ["<top 2 dimensions>"],
decomposition: <if COMPLEX>,
integration_hints: { github_artifact, context_branch_budget, worktree_recommended }
})
)
EVERY assessment MUST complete ALL steps:
recommended_agents field based on tier and dimension scoresintent_confirmation field (required + mode)The assessment is NOT complete until contextd recording is done.
If you're thinking any of these, you're about to violate the skill:
| Thought | Reality |
|---|---|
| "This is obviously simple, no need to score" | Score anyway. Intuition misses edge cases. |
| "I'll just say STANDARD to be safe" | Calculate the actual score. Don't guess. |
| "Recording to contextd is optional" | Recording is MANDATORY. No exceptions. |
| "Risk doesn't matter for this task" | Risk ALWAYS matters. Score it. |
| "I can skip dimensions for simple tasks" | ALL 7 dimensions. Every time. |
| "The user already knows the complexity" | Document for future sessions. Record it. |
| "Confidence scoring is extra work" | Confidence enables better decisions. Calculate it. |
| "Decomposition is only for COMPLEX" | Consider it for STANDARD too if helpful. |
| "Historical comparison is optional" | Past data improves accuracy. Search for it. |
| "Intent confirmation slows things down" | It prevents wasted work on misunderstood requirements. |
| "I don't need to include recommended_agents" | Downstream consensus-review depends on it. Always include. |
| "SIMPLE tasks don't need intent logging" | Log anyway. Calibration data improves future accuracy. |
| Mistake | Correct Approach |
|---|---|
| Inflating simple tasks | Typo fix = SIMPLE. Don't over-engineer. |
| Deflating complex tasks | Payment/security = COMPLEX. Don't underestimate. |
| Gut-feel tier assignment | Show your math. 7 dimensions, explicit scores. |
| Skipping contextd | Recording enables future efficiency. Always record. |
| Ignoring high-risk dimensions | Risk=3 deserves attention regardless of total. |
| Not applying multipliers | Security/migrations/breaking changes need multipliers. |
| Low confidence without clarification | Use AskUserQuestion when confidence < 60%. |
| Missing decomposition for COMPLEX | Always suggest how to break down COMPLEX tasks. |
| Returning text instead of JSON | Downstream tools expect structured output. |
| Skipping intent confirmation for STANDARD | STANDARD requires plan approval before proceeding. |
| Not including recommended_agents | consensus-review uses this for adaptive agent selection. |
Setting all: true for SIMPLE | Only COMPLEX warrants all 6 reviewers. |
After task completion, update the original assessment:
mcp__contextd__memory_outcome(
memory_id: "<original assessment id>",
outcome: "success|partial|failure",
notes: "Predicted STANDARD, actual was COMPLEX due to unexpected API rate limits"
)
Track metrics over time:
Adapted from Auto-Claude complexity_assessor.md by AndyMik90 (MIT).
Enhanced with testing dimension, confidence scoring, decomposability analysis, and structured output.
See CREDITS.md for full attribution.