Post-task performance evaluation. Applies first-principles reflection to evaluate how well the current session's workflow, agents, and skills performed. Produces a structured improvement report saved to docs/superomni/improvements/. Triggers: "self-improve", "evaluate performance", "reflect on execution", "how did we do", "what could be better", "evaluate this sprint", "improve process", "first principles review".
Part of the superomni plugin. Install: `npx claudepluginhub wilder1222/superomni --plugin superomni`
mkdir -p ~/.omni-skills/sessions
_PROACTIVE=$(~/.claude/skills/superomni/bin/config get proactive 2>/dev/null || echo "true")
_BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
_TEL_START=$(date +%s)
echo "Branch: $_BRANCH | PROACTIVE: $_PROACTIVE"
If PROACTIVE is false: do NOT proactively suggest skills. Only run skills the
user explicitly invokes. If you would have auto-invoked, say:
"I think [skill-name] might help here — want me to run it?" and wait.
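The gate above can be sketched as a small shell guard. This is illustrative: the `config` helper path is the one shown below, and `suggest_or_run` is a hypothetical name for the decision point.

```shell
# Read the proactive flag; fall back to "true" if the config helper is missing.
PROACTIVE=$(~/.claude/skills/superomni/bin/config get proactive 2>/dev/null || echo "true")

suggest_or_run() {
  local skill="$1"
  if [ "$PROACTIVE" = "false" ]; then
    # Proactive mode off: suggest and wait, never auto-invoke.
    echo "I think $skill might help here — want me to run it?"
  else
    echo "Auto-invoking $skill"
  fi
}

suggest_or_run "self-improvement"
```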
Report status using one of these at the end of every skill session:
Pipeline stage order: THINK → PLAN → REVIEW → BUILD → VERIFY → SHIP → REFLECT
REVIEW is the only human gate. All other stages auto-advance on DONE.
| Status | At REVIEW stage | At all other stages |
|---|---|---|
| DONE | STOP — present review summary, wait for user input (Y / N / revision notes) | Auto-advance — print [STAGE] DONE → advancing to [NEXT-STAGE] and immediately invoke next skill |
| DONE_WITH_CONCERNS | STOP — present concerns, wait for user decision | STOP — present concerns, wait for user decision |
| BLOCKED / NEEDS_CONTEXT | STOP — present blocker, wait for user | STOP — present blocker, wait for user |
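The table above can be read as a dispatch function. A minimal sketch (the function name and STOP messages are illustrative, not part of the skill's contract):

```shell
# Decide whether to auto-advance or stop, given stage, status, and next stage.
advance_or_stop() {
  local stage="$1" status="$2" next="$3"
  case "$status" in
    DONE)
      if [ "$stage" = "REVIEW" ]; then
        # REVIEW is the only human gate.
        echo "STOP: present review summary, wait for user"
      else
        echo "[$stage] DONE → advancing to [$next]"
      fi
      ;;
    DONE_WITH_CONCERNS|BLOCKED|NEEDS_CONTEXT)
      echo "STOP: present details, wait for user"
      ;;
    *)
      echo "unknown status: $status" >&2
      return 1
      ;;
  esac
}

advance_or_stop BUILD DONE VERIFY
```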
When auto-advancing:
[STAGE] DONE → advancing to [NEXT-STAGE] ([skill-name])

When the user sends a follow-up message after a completed session, before doing anything else:
ls docs/superomni/specs/spec-*.md docs/superomni/plans/plan-*.md docs/superomni/ .superomni/ 2>/dev/null | head -20
git log --oneline -3 2>/dev/null
To find the latest spec or plan:
_LATEST_SPEC=$(ls docs/superomni/specs/spec-*.md 2>/dev/null | sort | tail -1)
_LATEST_PLAN=$(ls docs/superomni/plans/plan-*.md 2>/dev/null | sort | tail -1)
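For example, the lookup can be wrapped so a missing spec falls through cleanly instead of resuming from an empty path (a sketch; `latest_spec` is an illustrative helper name):

```shell
# Date-stamped spec names sort chronologically, so sort|tail picks the newest.
latest_spec() {
  ls docs/superomni/specs/spec-*.md 2>/dev/null | sort | tail -1
}

spec=$(latest_spec)
if [ -n "$spec" ]; then
  echo "Resuming from spec: $spec"
else
  echo "No spec found; starting at THINK"
fi
```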
Then determine the current stage (see the workflow skill for the stage → skill mapping) and announce:

"Continuing in superomni mode — picking up at [stage] using [skill-name]."

(For general skill-invocation rules, see using-skills/SKILL.md.)

When asking the user a question, match the confirmation requirement to the complexity of the response:
| Question type | Confirmation rule |
|---|---|
| Single-choice — user picks one option (A/B/C, 1/2/3, Yes/No) | The user's selection IS the confirmation. Do NOT ask "Are you sure?" or require a second submission. |
| Free-text input — user types a value and presses Enter | The submitted text IS the confirmation. No secondary prompt needed. |
| Multi-choice — user selects multiple items from a list | After the user lists their selections, ask once: "Confirm these selections? (Y to proceed)" before acting. |
| Complex / open-ended discussion — back-and-forth clarification | Collect all input, then present a summary and ask: "Ready to proceed with the above? (Y/N)" before acting. |
Rule: never add a redundant confirmation layer on top of a single-choice or text-input answer.
Custom Input Option Rule: Whenever you present a predefined list of choices (A/B/C, numbered options, etc.), always append a final "Other" option that lets the user describe their own idea:
[last letter/number + 1]) Other — describe your own idea: ___________
When the user selects "Other" and provides their custom text, treat that text as the chosen option and proceed exactly as you would for any other selection. If the custom text is ambiguous, ask one clarifying question before proceeding.
Load context progressively — only what is needed for the current phase:
| Phase | Load these | Defer these |
|---|---|---|
| Planning | Latest docs/superomni/specs/spec-*.md, constraints, prior decisions | Full codebase, test files |
| Implementation | Latest docs/superomni/plans/plan-*.md, relevant source files | Unrelated modules, docs |
| Review/Debug | diff, failing test output, minimal repro | Full history, specs |
If context pressure is high: summarize prior phases into 3-5 bullet points, then discard raw content.
All skill artifacts are written to docs/superomni/ (relative to project root).
See the Document Output Convention in CLAUDE.md for the full directory map.
Agent failures are harness signals — not reasons to retry the same approach:
Use the harness-engineering skill to update the harness before retrying.

It is always OK to stop and say "this is too hard for me." Escalation is expected, not penalized.
After completing any skill session, run a 3-question self-check before writing the final status:
If any answer is NO, address it before reporting DONE. If it cannot be addressed, report DONE_WITH_CONCERNS and name the gap.
For a full performance evaluation spanning the entire sprint, use the self-improvement skill.
_TEL_END=$(date +%s)
_TEL_DUR=$(( _TEL_END - _TEL_START ))
~/.claude/skills/superomni/bin/analytics-log "SKILL_NAME" "$_TEL_DUR" "OUTCOME" 2>/dev/null || true
Nothing is sent to external servers. Data is stored only in ~/.omni-skills/analytics/.
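As a sketch, the local log can be summarized with standard tools. This assumes only the `.jsonl` convention of one JSON object per line; the record's field layout is not specified here, so nothing below parses individual fields.

```shell
# Count logged skill sessions: one JSON object per line, per the .jsonl convention.
session_count() {
  local log="$1"
  if [ -f "$log" ]; then
    # grep -c '' counts lines portably.
    echo "Sessions logged: $(grep -c '' "$log")"
  else
    echo "No telemetry yet"
  fi
}

session_count ~/.omni-skills/analytics/usage.jsonl
```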
Goal: Close the feedback loop on every sprint by systematically evaluating process adherence, agent behavior, and skill effectiveness — then produce concrete improvement actions for the next session.
A FRAMEWORK THAT CANNOT MEASURE ITS OWN PERFORMANCE CANNOT IMPROVE.
Every sprint cycle must end with a self-evaluation. A session without reflection is a missed learning opportunity.
Performance problems in AI-assisted development reduce to three root causes: process drift (stages skipped or reordered), evidence gaps (claims made without verification), and scope creep (work expanding beyond the plan).
Every metric in this skill traces back to one of these three root causes.
Collect objective data about what happened in this session:
# What was built/changed
git log --oneline -10
git diff --stat HEAD~3 2>/dev/null | tail -5
# What artifacts were produced
ls docs/superomni/specs/spec-*.md docs/superomni/plans/plan-*.md 2>/dev/null
ls docs/superomni/ .superomni/ 2>/dev/null
# Read the latest evaluation report (from verification skill)
LATEST_EVAL=$(find docs/superomni/evaluations -name "*.md" -type f 2>/dev/null | sort | tail -1)
if [ -n "$LATEST_EVAL" ]; then
echo "Latest verification evaluation:"
head -40 "$LATEST_EVAL"
fi
# Skill telemetry for this session
tail -10 ~/.omni-skills/analytics/usage.jsonl 2>/dev/null || echo "(no telemetry)"
# Current test status
npm test 2>/dev/null || bash lib/validate-skills.sh 2>/dev/null || echo "(no test suite found)"
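The evidence above can be condensed into a few headline numbers before writing the Session Evidence section (a sketch; `count_lines` is an illustrative helper, and the paths follow the conventions already used in this skill):

```shell
# Condense the collected evidence into headline counts for the report.
count_lines() { grep -c ''; }

COMMITS=$(git log --oneline -10 2>/dev/null | count_lines)
SPECS=$(ls docs/superomni/specs/spec-*.md 2>/dev/null | count_lines)
PLANS=$(ls docs/superomni/plans/plan-*.md 2>/dev/null | count_lines)
echo "Commits: $COMMITS | Specs: $SPECS | Plans: $PLANS"
```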
Document the raw facts:
Answer each question with YES / PARTIAL / NO + reason:
| Question | Answer | Evidence |
|---|---|---|
| Did each major task follow the THINK→PLAN→REVIEW→BUILD→VERIFY→SHIP→REFLECT cycle? | | |
| Was a spec or plan artifact created before implementation? | | |
| Were skills invoked for their intended triggers (not bypassed)? | | |
| Did the session end with a status report (DONE/BLOCKED/etc.)? | | |
| Law | Followed? | Notes |
|---|---|---|
| No fixes without root cause investigation | | |
| One change at a time during debugging | | |
| 3-strike escalation rule respected | | |
| Blast radius flagged when >5 files touched | | |
| Tests written before claiming done | | |
Evaluate the AI agent's performance on three dimensions:
Scope management — Score: __ / 5 — Evidence: ___
Instruction following — Score: __ / 5 — Evidence: ___
Escalation behavior — Score: __ / 5 — Evidence: ___
Agent Performance Score: __ / 15
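A quick arithmetic check keeps the three subscores and the total consistent (the subscore values here are placeholders, not real ratings):

```shell
# Placeholder subscores for the three dimensions; replace with real ratings.
SCOPE=4; INSTRUCTIONS=5; ESCALATION=3
TOTAL=$(( SCOPE + INSTRUCTIONS + ESCALATION ))
echo "Agent Performance Score: $TOTAL / 15"
```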
For each skill invoked in this session, rate its effectiveness:
| Skill | Was it the right skill? | Phases completed? | Output quality | Score (1-5) |
|---|---|---|---|---|
| [skill-1] | YES/NO | 100% / 80% / <50% | clear/partial/missing | |
| [skill-2] | YES/NO | 100% / 80% / <50% | clear/partial/missing |
Questions to answer for each skill:
Trace every deviation found back to a root cause category:
| Deviation observed | Root cause | Principle violated |
|---|---|---|
| [example: skipped plan review] | Process drift — time pressure | "Plan Lean" — even lean plans need review |
| [example: claimed done without tests] | Evidence gap | "Evidence over Claims" |
The 6 Decision Principles check:
Generate exactly 3 concrete improvement actions for the next sprint:
Format for each action:
ACTION [N]: [TITLE]
Problem: [what went wrong or what was missing]
Root cause: [which of the 3 root causes — process drift / evidence gap / scope creep]
Fix: [specific, actionable change to process or behavior]
Verify: [how to confirm this improvement was applied in the next session]
Example:
ACTION 1: WRITE SPEC BEFORE IMPLEMENTATION
Problem: Started coding directly from the issue title without a spec
Root cause: Process drift — skipped THINK stage under time pressure
Fix: Before any implementation task, spend 5 minutes writing docs/superomni/specs/spec-[branch]-[session]-[date].md with problem, goals, non-goals, acceptance criteria
Verify: Next session starts with `ls docs/superomni/specs/spec-*.md` — must exist before first code change
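The Verify step in the example can be enforced mechanically. A minimal pre-implementation guard (the function name `require_spec` is illustrative):

```shell
# Refuse to start implementation until a spec exists for this project.
require_spec() {
  if ls docs/superomni/specs/spec-*.md >/dev/null 2>&1; then
    echo "Spec found; OK to implement"
  else
    echo "No spec; run the THINK stage first" >&2
    return 1
  fi
}
```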
IMPROVE_DIR="docs/superomni/improvements"
mkdir -p "$IMPROVE_DIR"
BRANCH=$(git branch --show-current 2>/dev/null | tr '/' '-')
BRANCH=${BRANCH:-main}  # fall back to "main" outside a git repo or on a detached HEAD
TIMESTAMP=$(date +%Y-%m-%d-%H%M%S)
REPORT_FILE="$IMPROVE_DIR/improvement-${BRANCH}-${TIMESTAMP}.md"
Save the full evaluation report to $REPORT_FILE using the following structure. All scores and tables from Phases 1–6 must be included — not just the action items:
# Improvement Report: [branch]
**Date:** [date]
**Branch:** [branch]
**Task description:** [what was worked on this session]
## Session Evidence (Phase 1)
- Skills invoked: [list]
- Artifacts produced: [list of files in .superomni/ and project root]
- Tests outcome: [pass/fail counts]
- Evaluation report referenced: [path or "none"]
## Process Adherence (Phase 2)
| Question | Answer | Evidence |
|----------|--------|----------|
| THINK→PLAN→REVIEW→BUILD→VERIFY→SHIP→REFLECT followed | YES/PARTIAL/NO | |
| Spec/plan created before implementation | YES/PARTIAL/NO | |
| Skills used for intended triggers | YES/PARTIAL/NO | |
| Session ended with status report | YES/PARTIAL/NO | |
**Iron Law compliance:** [N/5 laws followed]
## Agent Evaluation (Phase 3)
| Dimension | Score | Evidence |
|-----------|-------|---------|
| Scope management | [N]/5 | |
| Instruction following | [N]/5 | |
| Escalation behavior | [N]/5 | |
**Agent total: [N]/15**
## Skill Effectiveness (Phase 4)
| Skill | Right skill? | Phases done | Output quality | Score |
|-------|-------------|-------------|---------------|-------|
| [skill-1] | YES/NO | 100%/80%/<50% | clear/partial/missing | [N]/5 |
**Skills avg: [N]/5**
## Gap Analysis (Phase 5)
| Deviation | Root cause | Principle violated |
|-----------|-----------|-------------------|
| [deviation] | [root cause] | [principle] |
## Action Items (Phase 6)
### ACTION 1: [TITLE]
Problem: ...
Root cause: ...
Fix: ...
Verify: ...
### ACTION 2: [TITLE]
Problem: ...
Root cause: ...
Fix: ...
Verify: ...
### ACTION 3: [TITLE]
Problem: ...
Root cause: ...
Fix: ...
Verify: ...
echo "Improvement report saved to $REPORT_FILE"
This report is the canonical record of agent and skill performance for this session. The workflow skill reads it at the next sprint start to apply the action items.
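Because the report's structure is fixed, the action items can be pulled back out with plain grep. A sketch of how a next-session reader might do it (`get_actions` is an illustrative helper; the pattern matches the `### ACTION` headings in the template above):

```shell
# List the action-item titles from the most recent improvement report.
get_actions() { grep '^### ACTION' "$1"; }

latest_report=$(ls docs/superomni/improvements/improvement-*.md 2>/dev/null | sort | tail -1)
if [ -n "$latest_report" ]; then
  get_actions "$latest_report"
fi
```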
SELF-IMPROVEMENT REPORT
════════════════════════════════════════
Session: [branch / date / task description]
Process adherence: [N/N checks passed]
Agent score: [N/15] (scope: N/5 | instructions: N/5 | escalation: N/5)
Skills evaluated: [N skills] — avg [N]/5
Top gap: [single most important finding]
Action 1: [title]
Action 2: [title]
Action 3: [title]
Report saved: [docs/superomni/improvements/...]
Status: DONE | DONE_WITH_CONCERNS
════════════════════════════════════════