Development cycle feedback system - calculates assertiveness scores, analyzes prompt quality for all agents executed, aggregates cycle metrics, performs root cause analysis on failures, and generates improvement reports to docs/feedbacks/cycle-{date}/.
Install via `/plugin marketplace add lerianstudio/ring`, then `/plugin install ring-dev-team@ring`. This skill inherits all available tools. When active, it can use any tool Claude has access to.
See CLAUDE.md for canonical validation and gate requirements. This skill collects metrics and generates improvement reports.
Continuous improvement system that tracks development cycle effectiveness through assertiveness scores, identifies recurring failure patterns, and generates actionable improvement suggestions.
Core principle: What gets measured gets improved. Track every gate transition to identify systemic issues.
<cannot_skip>
⛔ HARD GATE: Before any other action, you MUST add feedback-loop to todo list.
Execute this TodoWrite call IMMEDIATELY when this skill starts:
TodoWrite tool:
todos:
- id: "feedback-loop-execution"
content: "Execute dev-feedback-loop: collect metrics, calculate scores, write report"
status: "in_progress"
priority: "high"
Why this is mandatory: the todo entry is auditable proof that the feedback loop was started and completed; without it, skipped feedback goes undetected.
After completing all feedback-loop steps, mark as completed:
TodoWrite tool:
todos:
- id: "feedback-loop-execution"
content: "Execute dev-feedback-loop: collect metrics, calculate scores, write report"
status: "completed"
priority: "high"
Anti-Rationalization:
| Rationalization | Why It's WRONG | Required Action |
|---|---|---|
| "TodoWrite slows things down" | 1 tool call = 2 seconds. Not an excuse. | Execute TodoWrite NOW |
| "I'll remember to complete it" | Memory is unreliable. Todo is proof. | Execute TodoWrite NOW |
| "Skill is simple, no tracking needed" | Simple ≠ optional. all skills get tracked. | Execute TodoWrite NOW |
You CANNOT proceed to Step 1 without executing TodoWrite above.
See shared-patterns/shared-pressure-resistance.md for universal pressure scenarios.
Feedback-specific note: Feedback MUST be collected for every completed task, regardless of outcome or complexity. "Simple tasks" and "perfect scores" still need tracking.
See shared-patterns/shared-anti-rationalization.md for universal anti-rationalizations.
Feedback-specific rationalizations:
| Excuse | Reality |
|---|---|
| "It was just a spike/experiment" | Spikes produce learnings. Track what worked and what didn't. |
| "Perfect score, no insights" | Perfect scores reveal what works. Document for replication. |
| "Reporting my own failures reflects badly" | Unreported failures compound. Self-reporting is professional. |
| "Round up to passing threshold" | Rounding is falsification. Report exact score. |
See shared-patterns/shared-red-flags.md for universal red flags.
If you catch yourself thinking any of those patterns, STOP immediately. Collect metrics for every task.
Agents must report accurately, even when scores are low:
| Bias Pattern | Why It's Wrong | Correct Behavior |
|---|---|---|
| "Round up score" | Falsifies data, masks trends | Report exact: 68, not 70 |
| "Skip failed task" | Selection bias, incomplete picture | Report all tasks |
| "Blame external factors" | Avoids actionable insights | Document factors + still log score |
| "Report only successes" | Survivorship bias | Success and failure needed |
Reporting protocol:
Self-interest check: If you're tempted to adjust a score, ask: "Would I report this score if someone else achieved it?" If yes, report as-is.
<cannot_skip>
Non-negotiable: Feedback MUST be collected for every completed task, regardless of:
| Factor | Still Collect? | Reason |
|---|---|---|
| Task complexity | ✅ YES | Simple tasks reveal patterns |
| Outcome quality | ✅ YES | 100-score tasks need tracking |
| User satisfaction | ✅ YES | Approval ≠ process quality |
| Time pressure | ✅ YES | Metrics take <5 min |
| "Nothing to report" | ✅ YES | Absence of issues is data |
Consequence: Skipping feedback breaks the continuous improvement loop and masks systemic issues.
When the same feedback appears multiple times:
| Repetition | Classification | Action |
|---|---|---|
| 2nd occurrence | RECURRING | Flag as recurring issue. Add to patterns. |
| 3rd occurrence | UNRESOLVED | Escalate. Stop current work. Report blocker. |
Recurring feedback indicates systemic issue not being addressed.
Escalation format:
## RECURRING ISSUE - Escalation Required
**Issue:** [Description]
**Occurrences:** [Count] times across [N] tasks
**Pattern:** [What triggers this issue]
**Previous Responses:** [What was tried]
**Recommendation:** [Systemic fix needed]
**Awaiting:** User decision on root cause resolution
<block_condition>
When thresholds are breached, response is REQUIRED:
| Alert | Threshold | Required Action |
|---|---|---|
| Task score | < 70 | Document what went wrong. Identify root cause. |
| Gate iterations | > 3 | STOP. Request human intervention. Document blocker. |
| Cycle average | < 80 | Deep analysis required. Pattern identification mandatory. |
You CANNOT proceed past threshold without documented response.
Always pause and report a blocker for:
| Decision Type | Examples | Action |
|---|---|---|
| Score interpretation | "Is 65 acceptable?" | STOP. Follow interpretation table. |
| Threshold override | "Skip analysis for this task" | STOP. Analysis is MANDATORY for low scores. |
| Pattern judgment | "Is this pattern significant?" | STOP. Document pattern, let user decide significance. |
| Improvement priority | "Which fix first?" | STOP. Report all findings, let user prioritize. |
Before skipping any feedback collection:
You CANNOT skip feedback collection. Period.
Base score of 100 points, with deductions for inefficiencies:
| Event | Penalty | Max Penalty | Rationale |
|---|---|---|---|
| Extra iteration (beyond 1) | -10 per iteration | -30 | Each iteration = rework |
| Review FAIL verdict | -20 | -20 | Critical/High issues found |
| Review NEEDS_DISCUSSION | -10 | -10 | Uncertainty in implementation |
| Unmet criterion at validation | -10 per criterion | -40 | Requirements gap |
| User REJECTED validation | -100 (score = 0) | -100 | Complete failure |
Formula: score = 100 - min(30, extra_iterations*10) - review_fail*20 - needs_discussion*10 - min(40, unmet_criteria*10). If the user rejected validation, score = 0.
| Score Range | Rating | Action Required |
|---|---|---|
| 90-100 | Excellent | No action needed |
| 80-89 | Good | Minor improvements possible |
| 70-79 | Acceptable | Review patterns, optimize |
| 60-69 | Needs Improvement | Root cause analysis required |
| < 60 | Poor | Mandatory deep analysis |
| 0 | Failed | Full post-mortem required |
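A minimal Python sketch of this scoring and rating logic, assuming the penalty values and thresholds above; the function and variable names are illustrative, not part of any state-file schema:

```python
def assertiveness_score(extra_iterations: int, review_fail: bool,
                        needs_discussion: bool, unmet_criteria: int,
                        user_rejected: bool) -> int:
    """Apply the penalty table: base 100, capped deductions, 0 on rejection."""
    if user_rejected:
        return 0
    score = 100
    score -= min(30, extra_iterations * 10)   # rework beyond the first iteration
    score -= 20 if review_fail else 0         # review FAIL verdict
    score -= 10 if needs_discussion else 0    # review NEEDS_DISCUSSION verdict
    score -= min(40, unmet_criteria * 10)     # unmet acceptance criteria
    return max(0, score)

def rating(score: int) -> str:
    """Map a score to the interpretation table."""
    if score == 0:
        return "Failed"
    if score < 60:
        return "Poor"
    if score < 70:
        return "Needs Improvement"
    if score < 80:
        return "Acceptable"
    if score < 90:
        return "Good"
    return "Excellent"

# Example: 2 extra iterations, one NEEDS_DISCUSSION, 1 unmet criterion
print(assertiveness_score(2, False, True, 1, False))  # 60
print(rating(60))                                     # "Needs Improvement"
```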
MANDATORY: Execute this step for all tasks, regardless of complexity, outcome quality, user satisfaction, or time pressure (see the non-negotiable table above).
Anti-exemption check: If you're thinking "perfect outcome, skip metrics" → STOP. This is the red flag at line 75 ("Perfect outcome, skip the metrics").
After task completion, gather from agent_outputs in state file:
The state file now contains structured error/issue data for direct analysis:
| Gate | Structured Fields | Use For |
|---|---|---|
| Gate 0 | implementation.standards_compliance, implementation.iterations | Implementation standards patterns |
| Gate 1 | devops.standards_compliance, devops.verification_errors[] | DevOps standards + build/deploy failures |
| Gate 2 | sre.standards_compliance, sre.validation_errors[] | SRE standards + observability gaps |
| Gate 3 | testing.standards_compliance, testing.failures[], testing.uncovered_criteria[] | Testing standards + test failures + coverage |
| Gate 4 | review.{reviewer}.standards_compliance, review.{reviewer}.issues[] | Review standards + issues by category/severity |
All gates have standards_compliance with:
- total_sections, compliant, not_applicable, non_compliant
- gaps[] - array of non-compliant sections with details
# From state file, extract standards compliance from all gates:
all_standards_gaps = [
...agent_outputs.implementation.standards_compliance.gaps,
...agent_outputs.devops.standards_compliance.gaps,
...agent_outputs.sre.standards_compliance.gaps,
...agent_outputs.testing.standards_compliance.gaps,
...agent_outputs.review.code_reviewer.standards_compliance.gaps,
...agent_outputs.review.business_logic_reviewer.standards_compliance.gaps,
...agent_outputs.review.security_reviewer.standards_compliance.gaps
]
# Gate-specific errors/issues:
devops_errors = agent_outputs.devops.verification_errors
sre_errors = agent_outputs.sre.validation_errors
test_failures = agent_outputs.testing.failures
uncovered_acs = agent_outputs.testing.uncovered_criteria
review_issues = [
...agent_outputs.review.code_reviewer.issues,
...agent_outputs.review.business_logic_reviewer.issues,
...agent_outputs.review.security_reviewer.issues
]
# Aggregate standards compliance metrics:
total_standards_sections = sum(all_gates.standards_compliance.total_sections)
total_compliant = sum(all_gates.standards_compliance.compliant)
overall_compliance_rate = total_compliant / total_standards_sections * 100
# Total extra iterations across all gates:
extra_iterations = (
max(0, implementation.iterations - 1) +
max(0, devops.iterations - 1) +
max(0, sre.iterations - 1) +
max(0, testing.iterations - 1) +
max(0, review.iterations - 1)
)
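A hedged Python sketch of this extraction, assuming the state file is JSON and agent_outputs follows the field layout in the table above; key names mirror the pseudocode, and any gate that did not run is simply skipped:

```python
import json

def load_state(path: str = "docs/dev-cycle/current-cycle.json") -> dict:
    with open(path) as f:
        return json.load(f)

def collect_metrics(agent_outputs: dict) -> dict:
    """Aggregate standards gaps, errors, and extra iterations across gates."""
    gates = ["implementation", "devops", "sre", "testing"]
    reviewers = ["code_reviewer", "business_logic_reviewer", "security_reviewer"]

    def sc(node):  # standards_compliance block of a gate/reviewer, if present
        return (node or {}).get("standards_compliance", {})

    gate_nodes = [agent_outputs.get(g) for g in gates]
    review_nodes = [agent_outputs.get("review", {}).get(r) for r in reviewers]
    all_nodes = [n for n in gate_nodes + review_nodes if n]

    all_gaps = [gap for n in all_nodes for gap in sc(n).get("gaps", [])]
    total_sections = sum(sc(n).get("total_sections", 0) for n in all_nodes)
    total_compliant = sum(sc(n).get("compliant", 0) for n in all_nodes)

    extra_iterations = sum(
        max(0, (agent_outputs.get(g) or {}).get("iterations", 1) - 1)
        for g in gates + ["review"]
    )

    return {
        "all_standards_gaps": all_gaps,
        "overall_compliance_rate": (
            total_compliant / total_sections * 100 if total_sections else None
        ),
        "devops_errors": (agent_outputs.get("devops") or {}).get("verification_errors", []),
        "sre_errors": (agent_outputs.get("sre") or {}).get("validation_errors", []),
        "test_failures": (agent_outputs.get("testing") or {}).get("failures", []),
        "uncovered_acs": (agent_outputs.get("testing") or {}).get("uncovered_criteria", []),
        "extra_iterations": extra_iterations,
    }
```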
Apply the formula: base 100 minus deductions (extra iterations, review failures, unmet criteria) gives the final score out of 100. Map the score to a rating per the interpretation table.
After calculating assertiveness, analyze prompt quality for all agents that executed in the task.
Read agent_outputs from state file (docs/dev-cycle/current-cycle.json or docs/dev-refactor/current-cycle.json):
Agents to analyze (if executed, not null):
- implementation: backend-engineer-golang | backend-engineer-typescript
- devops: devops-engineer
- sre: sre
- testing: qa-analyst
- review: code-reviewer, business-logic-reviewer, security-reviewer
<dispatch_required agent="prompt-quality-reviewer" model="opus"> Analyze prompt quality for all agents executed in this task. </dispatch_required>
Task tool:
subagent_type: "ring-dev-team:prompt-quality-reviewer"
prompt: |
Analyze prompt quality for agents in task [task_id].
Agent outputs from state:
[agent_outputs]
For each agent:
1. Load definition from dev-team/agents/ or default/agents/
2. Extract rules: MUST, MUST not, ask_when, output_schema
3. Compare output vs rules
4. Calculate score
5. Identify gaps with evidence
6. Generate improvements
Return structured analysis per agent.
Directory: docs/feedbacks/cycle-YYYY-MM-DD/
One file per agent, accumulating all tasks that used that agent.
File: docs/feedbacks/cycle-YYYY-MM-DD/{agent-name}.md
# Prompt Feedback: {agent-name}
**Cycle:** YYYY-MM-DD
**Total Executions:** N
**Average Score:** XX%
---
## Task T-001 (Gate X)
**Score:** XX/100
**Rating:** {rating}
### Gaps Found
| Category | Rule | Evidence | Impact |
|----------|------|----------|--------|
| MUST | [rule text] | [quote from output] | -X |
### What Went Well
- [positive observation]
---
## Task T-002 (Gate X)
**Score:** XX/100
...
---
## Consolidated Improvements
### Priority 1: [Title]
**Occurrences:** X/Y tasks
**Impact:** +X points expected
**File:** dev-team/agents/{agent}.md
**Current text (line ~N):**
[existing prompt]
**Suggested addition:**
```markdown
[new prompt text]
```
...
### 3.4 Append to Existing File
If the file already exists (from a previous task in the same cycle), **append** the new task section before "## Consolidated Improvements" and update (see the sketch after this list):
- Total Executions count
- Average Score
- Consolidated Improvements (re-analyze patterns)
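A sketch of this append logic, assuming the per-agent markdown file follows the template above; the marker string and header field names come from this document, while the parsing approach is illustrative:

```python
from pathlib import Path

MARKER = "## Consolidated Improvements"

def append_task_section(feedback_file: Path, task_section: str,
                        new_total_executions: int, new_average: float) -> None:
    """Insert a new task section before the consolidated block and refresh header stats."""
    text = feedback_file.read_text()

    # Refresh the header counters defined in the template above.
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("**Total Executions:**"):
            lines[i] = f"**Total Executions:** {new_total_executions}"
        elif line.startswith("**Average Score:**"):
            lines[i] = f"**Average Score:** {new_average:.0f}%"
    text = "\n".join(lines)

    # Insert the task section just before "## Consolidated Improvements".
    if MARKER in text:
        head, tail = text.split(MARKER, 1)
        text = head + task_section.rstrip() + "\n\n---\n\n" + MARKER + tail
    else:
        text = text.rstrip() + "\n\n" + task_section.rstrip() + "\n"

    feedback_file.write_text(text)
```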
## Step 3.5: Pattern Analysis from Structured Data (NEW)
**Analyze structured error/issue data to identify recurring patterns:**
### 3.5.1 Standards Compliance Patterns
```yaml
# Group gaps by section name
standards_gaps_by_section = group(all_standards_gaps, by: "section")
# Identify recurring gaps (same section fails across tasks)
recurring_standards_gaps = filter(standards_gaps_by_section, count >= 2)
# Output pattern
For each recurring gap:
- Section: [section name]
- Occurrences: [N] tasks
- Common reason: [most frequent reason]
- Recommendation: [agent prompt improvement]
# Group review issues by category
issues_by_category = group(all_review_issues, by: "category")
# Group by severity for prioritization
issues_by_severity = group(all_review_issues, by: "severity")
# Identify top recurring categories
top_categories = sort(issues_by_category, by: count, descending).take(5)
# Output pattern
For each top category:
- Category: [category name]
- Occurrences: [N] issues across [M] tasks
- Severity breakdown: [CRITICAL: X, HIGH: Y, MEDIUM: Z]
- Most common: [most frequent description pattern]
- Fix rate: [fixed_count / total_count]%
# Group failures by error_type
failures_by_type = group(all_test_failures, by: "error_type")
# Identify flaky tests (same test fails multiple times)
flaky_tests = filter(all_test_failures, count_by_test_name >= 2)
# Output pattern
For each error type:
- Type: [assertion|panic|timeout|compilation]
- Occurrences: [N] failures
- Fix iterations: avg [X] iterations to fix
# Correlate: Do standards gaps predict review issues?
correlation_standards_review = correlate(
implementation.standards_compliance.gaps[].section,
review.*.issues[].category
)
# Output insight
If correlation > 0.5:
"Standards gap in [section] correlates with review issues in [category]"
  → Recommendation: Strengthen agent prompt for [section]
```
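The grouping operations above could look like this in Python; a sketch, assuming gaps carry a `section` field and review issues carry `category` and `severity` fields as described in Step 2:

```python
from collections import Counter, defaultdict

def recurring_standards_gaps(all_standards_gaps: list[dict], min_count: int = 2) -> dict:
    """Sections whose standards gap repeats across tasks/gates."""
    counts = Counter(gap["section"] for gap in all_standards_gaps)
    return {section: n for section, n in counts.items() if n >= min_count}

def review_issue_patterns(all_review_issues: list[dict], top_n: int = 5) -> list[tuple]:
    """Top recurring review-issue categories with a severity breakdown."""
    by_category = defaultdict(list)
    for issue in all_review_issues:
        by_category[issue["category"]].append(issue)
    ranked = sorted(by_category.items(), key=lambda kv: len(kv[1]), reverse=True)[:top_n]
    return [
        (category, len(issues), Counter(i.get("severity", "UNKNOWN") for i in issues))
        for category, issues in ranked
    ]
```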
| Alert | Trigger | Action | Report Contents |
|---|---|---|---|
| Score < 70 | Individual task assertiveness < 70 | Mandatory root cause analysis | Failure events, "5 Whys" per event, Corrective actions, Prevention measures |
| Iterations > 3 | Any gate exceeds 3 iterations | STOP + human intervention | Iteration history, Recurring issue, Options: [Continue/Reassign/Descope/Cancel] |
| Avg < 80 | Cycle average below 80 | Deep analysis report | Score distribution, Failure patterns (freq/cause/fix), Improvement plan |
| Recurring Pattern | Same issue category in 3+ tasks | Pattern alert | Category, frequency, suggested prompt fix |
Report formats:
- RCA: Score → Failure Events → 5 Whys → Root cause → Corrective action
- Gate Blocked: History → Issue → BLOCKED UNTIL human decision
- Deep Analysis: Distribution → Patterns → Improvement Plan
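A minimal check of the alert thresholds in the table above (the helper name is illustrative; recurring-pattern detection is covered by the Step 3.5 sketch, and the resulting reports follow the formats listed):

```python
def threshold_alerts(task_scores: list[int], max_gate_iterations: int) -> list[str]:
    """Return the required follow-up actions triggered by the alert table."""
    alerts = []
    for score in task_scores:
        if score < 70:
            alerts.append(f"Score {score} < 70: mandatory root cause analysis (5 Whys)")
    if max_gate_iterations > 3:
        alerts.append("Gate iterations > 3: STOP and request human intervention")
    if task_scores and sum(task_scores) / len(task_scores) < 80:
        alerts.append("Cycle average < 80: deep analysis report required")
    return alerts
```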
Location: .ring/dev-team/feedback/cycle-YYYY-MM-DD.md
Required sections:
| Section | Content |
|---|---|
| Header | Date, Tasks Completed, Average Assertiveness |
| Task Summary | Table: Task ID, Score, Rating, Key Issue |
| By Gate | Table: Gate, Avg Iterations, Avg Duration, Pass Rate |
| By Penalty | Table: Penalty type, Occurrences, Points Lost |
| Patterns | Positive patterns (what works) + Negative patterns (what needs improvement) |
| Recommendations | Immediate (this sprint), Short-term (this month), Long-term (this quarter) |
| Next Review | Date, Target assertiveness, Focus areas |
Improvement types based on pattern analysis:
| Target | When to Suggest | Format |
|---|---|---|
| Agents | Same issue type recurring in reviews | Agent name → Issue → Suggestion → Specific addition to prompt |
| Skills | Gate consistently needs iterations | Skill name → Issue → Suggestion → Specific change to skill |
| Process | Pattern spans multiple tasks | Process area → Issue → Suggestion → Implementation |
⛔ HARD GATE: After all steps complete, you MUST mark feedback-loop todo as completed.
Execute this TodoWrite call to finalize:
TodoWrite tool:
todos:
- id: "feedback-loop-execution"
content: "Execute dev-feedback-loop: collect metrics, calculate scores, write report"
status: "completed"
priority: "high"
Verification before marking complete:
- Report written to .ring/dev-team/feedback/
You CANNOT mark the todo as completed until all steps above are done.
Base metrics per shared-patterns/output-execution-report.md.
| Metric | Value |
|---|---|
| Tasks Analyzed | N |
| Average Assertiveness | XX.X% |
| Threshold Alerts | X |
| Root Cause Analyses | Y |
| Improvement Suggestions | Z |
| Report Location | .ring/dev-team/feedback/cycle-YYYY-MM-DD.md |
Never:
- Round scores up or adjust them to clear a threshold
- Skip feedback for failed, simple, or "perfect" tasks
- Report only successes
Always:
- Report exact scores with evidence
- Collect metrics for every completed task
- Escalate recurring issues and threshold breaches
| Integration | Process |
|---|---|
| Retrospectives | Share metrics → Discuss trends → Prioritize improvements → Assign actions |
| Skill Updates | Document gap → Update skill → Track improvement → Iterate |
| Agent Updates | Identify behavior → Update prompt → Track change → Validate |
This skill should be used when the user asks to "create a slash command", "add a command", "write a custom command", "define command arguments", "use command frontmatter", "organize commands", "create command with file references", "interactive command", "use AskUserQuestion in command", or needs guidance on slash command structure, YAML frontmatter fields, dynamic arguments, bash execution in commands, user interaction patterns, or command development best practices for Claude Code.
This skill should be used when the user asks to "create an agent", "add an agent", "write a subagent", "agent frontmatter", "when to use description", "agent examples", "agent tools", "agent colors", "autonomous agent", or needs guidance on agent structure, system prompts, triggering conditions, or agent development best practices for Claude Code plugins.
This skill should be used when the user asks to "create a hook", "add a PreToolUse/PostToolUse/Stop hook", "validate tool use", "implement prompt-based hooks", "use ${CLAUDE_PLUGIN_ROOT}", "set up event-driven automation", "block dangerous commands", or mentions hook events (PreToolUse, PostToolUse, Stop, SubagentStop, SessionStart, SessionEnd, UserPromptSubmit, PreCompact, Notification). Provides comprehensive guidance for creating and implementing Claude Code plugin hooks with focus on advanced prompt-based hooks API.