Evaluate Model

Run Phase 1 of the learning pipeline: evaluate a session and extract lessons from it.

Usage

/rules-learning-pipeline:evaluate-model <session-file> [--verbose]

What It Does

Parse action sequences (gold vs predicted)
Calculate metrics: Success Match, F1, Precision, Recall
Identify error types:
- Standard: Skipped Step, Wrong Order, Wrong Target, Hallucinated Action
- Wasteful: BDN, CIP, Verification Theater, Redundant Tool Chains
Detect dismissive language patterns
Extract lessons with:
- Specificity score validation (≥ 2)
- Scope classification
- Domain keywords
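
As a rough illustration of the metrics step, the sketch below computes Precision, Recall, and F1 by comparing gold and predicted actions as multisets. It is an assumption, not the evaluator's actual implementation: the function name `action_metrics` and the action labels are hypothetical, action order is ignored, and Success Match is omitted because its definition is not given here.

```python
from collections import Counter


def action_metrics(gold, predicted):
    """Compare predicted actions against gold actions, ignoring order.

    Hypothetical helper: the real evaluator may match actions differently.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: each gold action can be matched at most once.
    overlap = sum((gold_counts & pred_counts).values())

    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative action labels, not taken from a real session.
gold = ["read_file", "edit_file", "run_tests"]
predicted = ["read_file", "run_tests", "grep_repo"]
metrics = action_metrics(gold, predicted)
```

Here two of the three predicted actions appear in the gold sequence, so precision and recall are both 2/3. An order-sensitive comparison would be needed to surface Wrong Order errors, which this sketch does not attempt.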

Execution

Task(
  subagent_type: "model-evaluator",
  prompt: "Evaluate session at {input_file}:

    1. PARSE action sequences
    2. CALCULATE metrics (Success Match, F1, Precision, Recall)
    3. IDENTIFY error types including:
       - Wasteful Verification (broad → dismiss → narrow)
       - Dismissive Reasoning (pre-existing, not related)
       - Redundant Tool Calls
       - Scope Mismatch
    4. EXTRACT lessons with specificity validation
    5. CLASSIFY scope for each lesson

    Output to:
    - docs/evaluations/{session}-evaluation.md
    - docs/evaluations/{session}-lessons-raw.md"
)
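
The Dismissive Reasoning check in the prompt above could be approximated with a simple phrase scan. This is a minimal sketch under stated assumptions: the phrase list contains only the two examples named in the prompt ("pre-existing", "not related"), and `find_dismissive` is a hypothetical helper, not part of the model-evaluator subagent.

```python
import re

# Only the two example phrases from the prompt; a real list would be longer.
DISMISSIVE_PHRASES = ("pre-existing", "not related")


def find_dismissive(text, phrases=DISMISSIVE_PHRASES):
    """Return (line_number, phrase) pairs for each dismissive phrase found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for phrase in phrases:
            # Case-insensitive literal match; re.escape keeps the hyphen literal.
            if re.search(re.escape(phrase), line, flags=re.IGNORECASE):
                hits.append((lineno, phrase))
    return hits


# Illustrative transcript snippet, not from a real session.
sample = "Tests fail.\nThis failure is pre-existing and Not Related to my change."
hits = find_dismissive(sample)
```

A literal scan like this will flag legitimate uses of the same words, so its hits are better treated as candidates for review than as confirmed errors.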

Output

docs/evaluations/{session}-evaluation.md - Full metrics and error analysis
docs/evaluations/{session}-lessons-raw.md - Extracted lessons with scopes
