From gilfoyle
Evaluate-Loop Step 2: EVALUATE PLAN. Use this agent to verify an execution plan before any code is written. Checks scope alignment, overlap with completed work, DAG validity, dependency correctness, task clarity, and invokes Board of Directors for major tracks. Outputs PASS/FAIL verdict. Triggered by: 'evaluate plan', 'review plan', 'check plan before executing'. Always runs after loop-planner and before loop-executor.
npx claudepluginhub ahmedelhadarey/gilfoyle --plugin gilfoyle

This skill uses the workspace's default tool permissions.
Pre-execution quality gate. Verifies that the plan is correct and properly scoped before any implementation begins. This prevents the problem that forced the PLAN-005 design system rebuild: an agent executing work that was already done.
For major tracks (architecture, features with 5+ tasks, integrations, infrastructure), this step also invokes the Board of Directors for multi-perspective expert review.
Inputs:
- plan.md: the plan to evaluate (including DAG)
- spec.md: requirements to check against
- conductor/tracks.md: completed tracks (overlap check)
- metadata.json: track type and priority

Check every task against spec.md:
| For Each Task | Check |
|---|---|
| Is it in spec? | Task must trace to a specific spec requirement |
| Is it needed? | Would removing this task leave a spec requirement unmet? |
| Is it scoped? | Does the task do only what spec asks, not more? |
Output:
### Scope Alignment: PASS ✅ / FAIL ❌
- Tasks in spec: [X]/[Y]
- Tasks NOT in spec (scope creep): [list]
- Spec requirements NOT covered: [list]
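
The traceability part of this check can be sketched as a set comparison. This is a minimal illustration, assuming each task carries a hypothetical `spec_refs` list naming the requirement IDs it implements; that field and the requirement-ID scheme are assumptions, not part of the documented plan format. The "Is it needed?" and "Is it scoped?" questions still need the evaluator's judgment.

```python
def check_scope_alignment(tasks, spec_requirement_ids):
    """Compare plan tasks against spec requirements.

    `tasks` is a list of dicts with an `id` and a hypothetical `spec_refs`
    list; `spec_requirement_ids` holds the requirement IDs found in spec.md.
    """
    spec_ids = set(spec_requirement_ids)
    covered = set()
    scope_creep = []

    for task in tasks:
        # Keep only references that point at real spec requirements
        refs = set(task.get("spec_refs", [])) & spec_ids
        if refs:
            covered |= refs
        else:
            # Task traces to no known requirement: candidate scope creep
            scope_creep.append(task["id"])

    uncovered = sorted(spec_ids - covered)
    return {
        "passed": not scope_creep and not uncovered,
        "tasks_in_spec": len(tasks) - len(scope_creep),
        "total_tasks": len(tasks),
        "scope_creep": scope_creep,
        "uncovered_requirements": uncovered,
    }
```

The returned fields map onto the three lines of the Scope Alignment output.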
Cross-reference with tracks.md and the codebase:
| Check | Method |
|---|---|
| Track overlap | Compare plan tasks against completed track deliverables |
| File overlap | Check if planned files already exist in codebase |
| Component overlap | Check if planned components already exist |
Output:
### Overlap Detection: PASS ✅ / FAIL ❌
- Overlapping tasks: [list with which track already did them]
- Files that already exist: [list]
- Recommendation: [SKIP/MODIFY/PROCEED for each overlap]
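
The file and track overlap checks could look roughly like the sketch below. It assumes each task lists its target paths in a hypothetical `files` field and that the deliverables of completed tracks have already been extracted from tracks.md into a dict; both shapes are illustrative assumptions.

```python
from pathlib import Path


def find_overlaps(tasks, completed_deliverables, repo_root="."):
    """Flag planned files that already exist or were delivered by a prior track.

    `tasks`: list of dicts with `id` and a hypothetical `files` list of
    repo-relative paths. `completed_deliverables`: dict mapping a completed
    track ID to the set of file paths it delivered (assumed to be parsed
    from tracks.md beforehand).
    """
    root = Path(repo_root)
    overlaps = []
    for task in tasks:
        for rel_path in task.get("files", []):
            # Which completed tracks already delivered this file?
            prior_tracks = sorted(
                track
                for track, files in completed_deliverables.items()
                if rel_path in files
            )
            exists = (root / rel_path).exists()
            if prior_tracks or exists:
                overlaps.append({
                    "task": task["id"],
                    "file": rel_path,
                    "already_exists": exists,
                    "delivered_by": prior_tracks,
                })
    return overlaps
```

Choosing SKIP, MODIFY, or PROCEED for each reported overlap is left to the evaluator; the sketch only collects the evidence.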
Verify task ordering and prerequisites:
| Check | Question |
|---|---|
| Track deps | Are prerequisite tracks marked complete in tracks.md? |
| Task ordering | Do later tasks depend on earlier tasks being done first? |
| External deps | Are required packages/APIs available? |
Output:
### Dependencies: PASS ✅ / FAIL ❌
- Missing track dependencies: [list]
- Misordered tasks: [list]
- Missing external dependencies: [list]
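
A minimal sketch of the track and ordering checks, assuming tasks appear in the plan in intended execution order and use the same `depends_on` field as the DAG nodes. The `required_tracks` and `completed_tracks` inputs are assumed to have been parsed from metadata.json and tracks.md; external package and API availability is not covered here.

```python
def check_dependencies(tasks, required_tracks, completed_tracks):
    """Return missing track dependencies and misordered tasks.

    `tasks` is the plan's task list in execution order; each task has an
    `id` and an optional `depends_on` list of task IDs.
    """
    missing_tracks = sorted(set(required_tracks) - set(completed_tracks))

    seen = set()
    misordered = []
    for task in tasks:
        for dep in task.get("depends_on", []):
            if dep not in seen:
                # Dependency is listed later in the plan, or not at all
                misordered.append((task["id"], dep))
        seen.add(task["id"])

    return {
        "passed": not missing_tracks and not misordered,
        "missing_track_dependencies": missing_tracks,
        "misordered_tasks": misordered,
    }
```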
Evaluate each task for clarity and completeness:
| Check | Criteria |
|---|---|
| Specific | Action is clear (not vague like "set up infrastructure") |
| Acceptance criteria | Can you objectively verify completion? |
| File targets | Expected file paths are listed? |
| Session-sized | Can be completed in one sitting? |
Output:
### Task Quality: PASS ✅ / FAIL ❌
- Vague tasks: [list with suggestions to clarify]
- Missing acceptance criteria: [list]
- Oversized tasks (should split): [list]
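
Parts of this pass can be pre-screened mechanically before a human or agent reads the plan. The sketch below is only a heuristic: the phrase list and the `description`, `acceptance_criteria`, and `files` field names are hypothetical, and whether a task is session-sized still requires judgment.

```python
# Illustrative phrases only; tune for the project's own vocabulary
VAGUE_PHRASES = ("set up", "improve", "clean up", "etc.")


def check_task_quality(tasks):
    """Flag tasks that look vague or lack acceptance criteria or file targets."""
    vague, missing_criteria, missing_files = [], [], []
    for task in tasks:
        text = task.get("description", "").lower()
        if any(phrase in text for phrase in VAGUE_PHRASES):
            vague.append(task["id"])
        if not task.get("acceptance_criteria"):
            missing_criteria.append(task["id"])
        if not task.get("files"):
            missing_files.append(task["id"])
    return {
        "passed": not (vague or missing_criteria or missing_files),
        "vague_tasks": vague,
        "missing_acceptance_criteria": missing_criteria,
        "missing_file_targets": missing_files,
    }
```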
Verify the dependency graph is valid for parallel execution:
| Check | Method |
|---|---|
| DAG exists | Plan contains dag: block with nodes and parallel_groups |
| No cycles | Topological sort succeeds (no circular dependencies) |
| Valid refs | All depends_on references point to existing task IDs |
| File conflicts | Parallel groups with shared files have coordination strategy |
| Levels correct | Tasks in same parallel_group are at same topological level |
Cycle Detection Algorithm:

```python
def detect_cycles(dag):
    """Returns True if cycle exists, False otherwise."""
    # Index nodes by ID once instead of scanning the list on every visit
    nodes = {n['id']: n for n in dag['nodes']}
    visited = set()
    rec_stack = set()  # Nodes on the current DFS path

    def dfs(node_id):
        visited.add(node_id)
        rec_stack.add(node_id)
        # Unknown IDs are treated as leaves; the "Valid refs" check reports them
        node = nodes.get(node_id, {})
        for dep in node.get('depends_on', []):
            if dep not in visited:
                if dfs(dep):
                    return True
            elif dep in rec_stack:
                return True  # Cycle detected
        rec_stack.remove(node_id)
        return False

    for node in dag['nodes']:
        if node['id'] not in visited:
            if dfs(node['id']):
                return True
    return False
```
Output:
### DAG Validation: PASS ✅ / FAIL ❌
- DAG present: yes/no
- Nodes: [count]
- Parallel groups: [count]
- Cycle detected: yes/no (list cycle path if yes)
- Invalid references: [list of broken depends_on]
- Conflict issues: [list parallel groups with unhandled file conflicts]
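
The remaining structural checks (valid refs, level consistency, file conflicts) might be sketched as follows. This assumes the DAG has already passed cycle detection, that each parallel group is a dict with `id` and `tasks`, that nodes list their files in a `files` field, and that a coordinated group declares a `conflict_strategy`; those shapes are assumptions made for illustration.

```python
def validate_dag_structure(dag):
    """Check references, topological levels, and file conflicts.

    Assumes the graph is acyclic (run cycle detection first).
    """
    nodes = {n["id"]: n for n in dag["nodes"]}

    # Every depends_on entry must name an existing task ID
    invalid_refs = [
        (node_id, dep)
        for node_id, node in nodes.items()
        for dep in node.get("depends_on", [])
        if dep not in nodes
    ]

    levels = {}

    def level_of(node_id):
        # Level 0 for roots, otherwise one more than the deepest dependency
        if node_id not in levels:
            deps = [d for d in nodes[node_id].get("depends_on", []) if d in nodes]
            levels[node_id] = 1 + max((level_of(d) for d in deps), default=-1)
        return levels[node_id]

    level_mismatches, file_conflicts = [], []
    for group in dag.get("parallel_groups", []):
        members = [t for t in group["tasks"] if t in nodes]
        if len({level_of(t) for t in members}) > 1:
            level_mismatches.append(group["id"])

        # Files touched by more than one task in the same group
        seen_files, shared = set(), set()
        for task_id in members:
            files = set(nodes[task_id].get("files", []))
            shared |= files & seen_files
            seen_files |= files
        if shared and not group.get("conflict_strategy"):
            file_conflicts.append((group["id"], sorted(shared)))

    return {
        "passed": not (invalid_refs or level_mismatches or file_conflicts),
        "invalid_refs": invalid_refs,
        "level_mismatches": level_mismatches,
        "file_conflicts": file_conflicts,
    }
```

Levels are computed as the longest dependency chain below each task, so two tasks share a level only when neither can be waiting on the other's tier.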
For major tracks, invoke the Board of Directors for expert deliberation:
When to invoke the Board:
- Track type is architecture, integration, or infrastructure
- Feature track with 5+ tasks

Board Invocation:
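```javascript
// Sketch: isMajorTrack and invokeBoardMeeting are supplied by the
// orchestrator; the options-object call shape is assumed.
async function runBoardReview(planContent, spec, metadata, dag) {
  // Skip when the track does not qualify for board review
  if (!isMajorTrack(metadata)) {
    return { status: "SKIPPED" };
  }

  // Initialize board session via message bus
  const boardResult = await invokeBoardMeeting({
    proposal: planContent, // contents of plan.md
    context: { spec, metadata, dag },
  });

  // Store board session in metadata
  metadata.loop_state.board_sessions.push({
    session_id: boardResult.session_id,
    checkpoint: "EVALUATE_PLAN",
    verdict: boardResult.verdict,
    vote_summary: boardResult.votes,
    conditions: boardResult.conditions,
    timestamp: new Date().toISOString(),
  });

  // Board verdict affects overall evaluation
  if (boardResult.verdict === "REJECTED") {
    return { status: "FAIL", conditions: boardResult.conditions };
  }
  return { status: "PASS", conditions: boardResult.conditions };
}
```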
Output:
### Board Review: PASS ✅ / FAIL ❌ / SKIPPED ⏭️
- Board invoked: yes/no (reason if no)
- Directors voted: [CA, CPO, CSO, COO, CXO]
- Verdict: APPROVED / APPROVED_WITH_REVIEW / REJECTED
- Vote breakdown: [X] APPROVE / [Y] REJECT
- Conditions from board:
1. [Condition 1] (from [Director])
2. [Condition 2] (from [Director])
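
For reference, the qualification rule and the vote breakdown can be sketched in Python, the language of the cycle-detection example. The `type` key in metadata.json and the `task_count` argument are assumptions; the criteria themselves come from the major-track definition given earlier.

```python
MAJOR_TRACK_TYPES = {"architecture", "integration", "infrastructure"}


def is_major_track(metadata, task_count):
    """Decide whether a track qualifies for Board review.

    `metadata["type"]` is a hypothetical field name for the track type.
    """
    track_type = metadata.get("type")
    if track_type in MAJOR_TRACK_TYPES:
        return True
    # Feature tracks qualify only once they reach five tasks
    return track_type == "feature" and task_count >= 5


def summarize_votes(vote_summary):
    """Format a director -> vote mapping as 'X APPROVE / Y REJECT'."""
    approve = sum(1 for vote in vote_summary.values() if vote == "APPROVE")
    reject = sum(1 for vote in vote_summary.values() if vote == "REJECT")
    return f"{approve} APPROVE / {reject} REJECT"
```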
## Plan Evaluation Report
**Track**: [track-id]
**Evaluator**: loop-plan-evaluator
**Date**: [YYYY-MM-DD]
**Execution Mode**: SEQUENTIAL | PARALLEL
### Results
| Pass | Status |
|------|--------|
| Scope Alignment | PASS ✅ / FAIL ❌ |
| Overlap Detection | PASS ✅ / FAIL ❌ |
| Dependencies | PASS ✅ / FAIL ❌ |
| Task Quality | PASS ✅ / FAIL ❌ |
| DAG Validation | PASS ✅ / FAIL ❌ |
| Board Review | PASS ✅ / FAIL ❌ / SKIPPED ⏭️ |
### Parallel Execution Summary
- **Total Tasks**: [count]
- **Parallel Groups**: [count]
- **Max Concurrency**: [max workers in a parallel group]
- **Conflict-Free Groups**: [count]
- **Coordinated Groups**: [count with shared resources]
### Board Decision (if applicable)
- **Verdict**: [APPROVED / APPROVED_WITH_REVIEW / REJECTED]
- **Vote**: [X APPROVE / Y REJECT]
- **Conditions**: [count] conditions attached
- **Session ID**: [board-{timestamp}]
### Verdict: PASS ✅ → Proceed to Parallel Execution
### Verdict: FAIL ❌ → Return to Planner with fixes:
1. [Fix 1]
2. [Fix 2]
### Board Conditions (carry forward):
1. [Condition from board that must be verified in EVALUATE_EXECUTION]
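
Deriving the overall verdict and the results table from the individual passes could look like this sketch. It assumes that any single FAIL fails the evaluation and that SKIPPED (used for Board Review on non-major tracks) does not count against the plan; the source does not spell that rule out.

```python
STATUS_ICONS = {"PASS": "✅", "FAIL": "❌", "SKIPPED": "⏭️"}


def overall_verdict(pass_results):
    """Return ('PASS' | 'FAIL', failed_pass_names).

    `pass_results` maps a pass name to 'PASS', 'FAIL', or 'SKIPPED'.
    """
    failed = [name for name, status in pass_results.items() if status == "FAIL"]
    return ("FAIL" if failed else "PASS"), failed


def render_results_table(pass_results):
    """Render the Results table of the Plan Evaluation Report."""
    lines = ["| Pass | Status |", "|------|--------|"]
    for name, status in pass_results.items():
        lines.append(f"| {name} | {status} {STATUS_ICONS[status]} |")
    return "\n".join(lines)
```

The list of failed passes gives the planner a starting point for the numbered fixes in the FAIL verdict.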
The plan evaluator MUST update the track's metadata.json at key points:
On start:

```json
{
  "loop_state": {
    "current_step": "EVALUATE_PLAN",
    "step_status": "IN_PROGRESS",
    "step_started_at": "[ISO timestamp]",
    "checkpoints": {
      "EVALUATE_PLAN": {
        "status": "IN_PROGRESS",
        "started_at": "[ISO timestamp]",
        "agent": "loop-plan-evaluator"
      }
    }
  }
}
```
On PASS:

```json
{
  "loop_state": {
    "current_step": "EXECUTE",
    "step_status": "NOT_STARTED",
    "execution_mode": "PARALLEL",
    "checkpoints": {
      "EVALUATE_PLAN": {
        "status": "PASSED",
        "completed_at": "[ISO timestamp]",
        "verdict": "PASS",
        "checks": {
          "scope_alignment": true,
          "overlap_detection": true,
          "dependencies": true,
          "task_quality": true,
          "dag_validation": true,
          "board_review": true
        },
        "cto_review": {
          "status": "PASSED",
          "reviewed_at": "[timestamp if run]"
        },
        "dag_summary": {
          "total_tasks": 8,
          "parallel_groups": 3,
          "max_concurrency": 4,
          "conflict_free_groups": 2,
          "coordinated_groups": 1
        }
      },
      "EXECUTE": {
        "status": "NOT_STARTED"
      }
    },
    "board_sessions": [
      {
        "session_id": "board-20260201-123456",
        "checkpoint": "EVALUATE_PLAN",
        "verdict": "APPROVED",
        "vote_summary": {
          "CA": "APPROVE",
          "CPO": "APPROVE",
          "CSO": "APPROVE",
          "COO": "APPROVE",
          "CXO": "APPROVE"
        },
        "conditions": [
          "Add caching layer (CA)",
          "Security audit before launch (CSO)"
        ],
        "timestamp": "[ISO timestamp]"
      }
    ]
  }
}
```
On FAIL:

```json
{
  "loop_state": {
    "current_step": "PLAN",
    "step_status": "NOT_STARTED",
    "checkpoints": {
      "EVALUATE_PLAN": {
        "status": "FAILED",
        "completed_at": "[ISO timestamp]",
        "verdict": "FAIL",
        "checks": {
          "scope_alignment": true,
          "overlap_detection": false,
          "dependencies": true,
          "task_quality": false
        },
        "failure_reasons": [
          "Overlap with existing track: component already built",
          "Task 3 is too vague"
        ]
      },
      "PLAN": {
        "status": "NOT_STARTED",
        "plan_version": 2
      }
    }
  }
}
```
1. Read metadata.json
2. Update loop_state.checkpoints.EVALUATE_PLAN with verdict and checks
3. On PASS: set current_step to EXECUTE
4. On FAIL: set current_step to PLAN, increment plan_version
5. Write the result back to metadata.json
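
The update protocol can be sketched as a pure function over the parsed metadata plus a thin file wrapper. The function names are hypothetical, and starting `plan_version` at 1 when the field is absent is an assumption chosen so that a first failure yields version 2, as in the FAIL example.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_evaluation(metadata, verdict, checks, failure_reasons=None):
    """Apply the EVALUATE_PLAN result to a metadata dict in place."""
    state = metadata.setdefault("loop_state", {})
    checkpoints = state.setdefault("checkpoints", {})
    checkpoint = checkpoints.setdefault("EVALUATE_PLAN", {})
    checkpoint.update({
        "status": "PASSED" if verdict == "PASS" else "FAILED",
        "completed_at": datetime.now(timezone.utc).isoformat(),
        "verdict": verdict,
        "checks": checks,
    })
    state["step_status"] = "NOT_STARTED"

    if verdict == "PASS":
        # Clear reasons left over from an earlier failed attempt
        checkpoint.pop("failure_reasons", None)
        state["current_step"] = "EXECUTE"
        checkpoints.setdefault("EXECUTE", {})["status"] = "NOT_STARTED"
    else:
        checkpoint["failure_reasons"] = failure_reasons or []
        state["current_step"] = "PLAN"
        plan = checkpoints.setdefault("PLAN", {})
        plan["status"] = "NOT_STARTED"
        # Assumed default of 1 when no version has been recorded yet
        plan["plan_version"] = plan.get("plan_version", 1) + 1
    return metadata


def update_metadata_file(path, verdict, checks, failure_reasons=None):
    """Read metadata.json, record the result, and write it back."""
    file = Path(path)
    metadata = json.loads(file.read_text(encoding="utf-8"))
    record_evaluation(metadata, verdict, checks, failure_reasons)
    file.write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    return metadata
```

Keeping the state transition separate from file I/O makes the PASS and FAIL branches easy to test without touching disk.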