From gilfoyle
Evaluate-Loop Step 4: EVALUATE EXECUTION. This is the dispatcher agent — it determines the track type and invokes the correct specialized evaluator. Does NOT run a generic checklist. Instead dispatches to: eval-ui-ux (screens/design), eval-code-quality (features/infrastructure), eval-integration (APIs/auth/payments), eval-business-logic (generator/rules/state). Triggered by: 'evaluate execution', 'review implementation', 'check build', '/phase-review'. Always runs after loop-executor.
npx claudepluginhub ahmedelhadarey/gilfoyle --plugin gilfoyle
This skill uses the workspace's default tool permissions.
This agent does NOT evaluate directly. It determines the track type and dispatches the correct specialized evaluator.
Different track types need fundamentally different checks, so a generic checklist misses critical issues specific to each type.
Read the track's metadata.json and spec.md to determine the track type, then dispatch:
| Track Type | Keywords in spec/metadata | Evaluator |
|---|---|---|
| UI / Design | "screen", "component", "design system", "layout", "visual", "UI shell" | eval-ui-ux |
| Feature / Code | "implement", "feature", "refactor", "infrastructure", "hook", "store" | eval-code-quality |
| Integration | "Supabase", "Stripe", "Gemini", "API", "auth", "database", "webhook" | eval-integration |
| Business Logic | "generation", "lock", "dependency", "pricing", "tier", "pipeline", "download" | eval-business-logic |
Some tracks need multiple evaluators. For example:
- eval-business-logic + eval-code-quality
- eval-integration + eval-code-quality
- eval-ui-ux only

When multiple evaluators apply, run them all. The track passes only if ALL evaluators pass.
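The keyword table can be read as a simple lookup that may return more than one evaluator. A minimal sketch in Python, assuming case-insensitive substring matching over the combined text of spec.md and metadata.json; the name `select_evaluators` and the matching strategy are illustrative assumptions, not the plugin's actual implementation:

```python
# Keywords are copied from the dispatch table. The substring-matching
# strategy is an assumption made for illustration only.
EVALUATOR_KEYWORDS = {
    "eval-ui-ux": ["screen", "component", "design system", "layout", "visual", "UI shell"],
    "eval-code-quality": ["implement", "feature", "refactor", "infrastructure", "hook", "store"],
    "eval-integration": ["Supabase", "Stripe", "Gemini", "API", "auth", "database", "webhook"],
    "eval-business-logic": ["generation", "lock", "dependency", "pricing", "tier", "pipeline", "download"],
}


def select_evaluators(track_text: str) -> list[str]:
    """Return every evaluator whose keywords appear in the track text."""
    text = track_text.lower()
    return [
        evaluator
        for evaluator, keywords in EVALUATOR_KEYWORDS.items()
        if any(keyword.lower() in text for keyword in keywords)
    ]
```

Plain substring matching over-triggers (for instance, "webhook" also contains "hook"), so a real dispatcher would likely need word-boundary matching or a judgment step on top of this.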
1. Read track metadata.json + spec.md
2. Determine track type(s)
3. Dispatch evaluator(s):
   → eval-ui-ux (if UI track)
   → eval-code-quality (if code/feature track)
   → eval-integration (if integration track)
   → eval-business-logic (if logic track)
4. Collect results from all dispatched evaluators
5. Aggregate into final verdict
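Steps 4 and 5 of this flow amount to an all-must-pass reduction over the evaluator results. A rough sketch, assuming each evaluator returns a verdict plus a list of fix actions; `EvaluatorResult`, `aggregate`, and the `fix_actions` field are hypothetical names introduced here:

```python
from dataclasses import dataclass, field


@dataclass
class EvaluatorResult:
    """Hypothetical result shape; the real evaluators may return more."""
    evaluator: str
    verdict: str  # "PASS" or "FAIL"
    fix_actions: list[str] = field(default_factory=list)


def aggregate(results: list[EvaluatorResult]) -> tuple[str, list[str]]:
    """Combine per-evaluator results into one verdict and one fix list."""
    if not results:
        # Assumption: a run that dispatched nothing cannot count as a pass.
        return "FAIL", ["No evaluator was dispatched"]
    verdict = "PASS" if all(r.verdict == "PASS" for r in results) else "FAIL"
    # Only failing evaluators contribute to the combined fix list.
    fix_actions = [a for r in results if r.verdict == "FAIL" for a in r.fix_actions]
    return verdict, fix_actions
```

Treating an empty dispatch as a failure is a design choice of this sketch: it prevents a track that matched no keywords from passing silently.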
Regardless of track type, always verify these baseline checks:
| Check | Method |
|---|---|
| plan.md updated | All completed tasks marked [x] with commit SHA and summary |
| Scope alignment | No unplanned work added without documentation |
| No skipped tasks | All [ ] tasks either completed or documented as intentionally deferred |
| Build passes | npm run build exits 0 |
| Business docs in sync | If track made pricing/model/business decisions, verify docs are flagged for Step 5.5 sync |
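The plan.md rows in the table above could be checked mechanically. The sketch below is one possible approach under stated assumptions: tasks are Markdown checkboxes, a completed task carries a 7 to 40 character hex commit SHA on the same line, and a deferred task contains the word "deferred". The actual plan.md format is not specified here, and `check_plan` is a hypothetical helper:

```python
import re

# Assumed plan.md conventions (not confirmed by the source): checkbox
# task lines, a hex commit SHA on completed lines, and the word
# "deferred" on intentionally postponed lines.
TASK = re.compile(r"^\s*[-*]\s+\[(?P<mark>[ xX])\]\s+(?P<body>.*)$")
SHA = re.compile(r"\b[0-9a-f]{7,40}\b")


def check_plan(plan_text: str) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    for line in plan_text.splitlines():
        match = TASK.match(line)
        if match is None:
            continue  # not a task line
        body = match.group("body")
        if match.group("mark") == " ":
            if "deferred" not in body.lower():
                problems.append(f"Open task not deferred: {body}")
        elif SHA.search(body) is None:
            problems.append(f"Completed task missing commit SHA: {body}")
    return problems
```

The SHA pattern is deliberately loose, so any 7-letter word made only of the letters a to f would also satisfy it. Scope alignment and the build check still need a human or a separate step.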
If the track made any business-impacting changes, verify that the evaluation records:
Business Doc Sync Required: Yes
What counts as business-impacting: see .claude/skills/business-docs-sync/SKILL.md for the full registry.
## Execution Evaluation Report
**Track**: [track-id]
**Evaluator**: loop-execution-evaluator (dispatcher)
**Date**: [YYYY-MM-DD]
### Evaluators Dispatched
| Evaluator | Reason | Verdict |
|-----------|--------|---------|
| eval-ui-ux | Track builds P0 screens | PASS ✅ / FAIL ❌ |
| eval-code-quality | Track implements features | PASS ✅ / FAIL ❌ |
### Structural Checks
- plan.md updated: YES / NO
- Scope alignment: YES / NO
- Build passes: YES / NO
- Business doc sync needed: YES / NO (if YES, list affected docs)
### Final Verdict: PASS ✅ / FAIL ❌
All evaluators must PASS for the track to pass.
[If FAIL, aggregate all fix actions from all evaluators]
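For illustration, the report template above could be filled in programmatically. This is a sketch only: `render_report` and its parameters are hypothetical, and it assumes the final verdict depends solely on the evaluator verdicts, as the template's "All evaluators must PASS" line suggests:

```python
def render_report(track_id: str, date: str,
                  rows: list[tuple[str, str, bool]],
                  structural: dict[str, bool]) -> str:
    """Render the report; each row is (evaluator, reason, passed)."""
    def mark(passed: bool) -> str:
        return "PASS ✅" if passed else "FAIL ❌"

    def yes_no(ok: bool) -> str:
        return "YES" if ok else "NO"

    lines = [
        "## Execution Evaluation Report",
        f"**Track**: {track_id}",
        "**Evaluator**: loop-execution-evaluator (dispatcher)",
        f"**Date**: {date}",
        "### Evaluators Dispatched",
        "| Evaluator | Reason | Verdict |",
        "|-----------|--------|---------|",
    ]
    lines += [f"| {name} | {reason} | {mark(passed)} |" for name, reason, passed in rows]
    lines.append("### Structural Checks")
    lines += [f"- {name}: {yes_no(ok)}" for name, ok in structural.items()]
    # Assumption: structural checks are reported but do not alter the verdict.
    final = bool(rows) and all(passed for _, _, passed in rows)
    lines.append(f"### Final Verdict: {mark(final)}")
    return "\n".join(lines)
```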
The execution evaluator MUST update the track's metadata.json at key points:
When evaluation starts:
{
  "loop_state": {
    "current_step": "EVALUATE_EXECUTION",
    "step_status": "IN_PROGRESS",
    "step_started_at": "[ISO timestamp]",
    "checkpoints": {
      "EVALUATE_EXECUTION": {
        "status": "IN_PROGRESS",
        "started_at": "[ISO timestamp]",
        "agent": "loop-execution-evaluator"
      }
    }
  }
}
On PASS (here with business sync required):
{
  "loop_state": {
    "current_step": "BUSINESS_SYNC",
    "step_status": "NOT_STARTED",
    "checkpoints": {
      "EVALUATE_EXECUTION": {
        "status": "PASSED",
        "completed_at": "[ISO timestamp]",
        "verdict": "PASS",
        "evaluators_run": [
          { "evaluator": "eval-code-quality", "verdict": "PASS", "issues": [] },
          { "evaluator": "eval-business-logic", "verdict": "PASS", "issues": [] }
        ],
        "business_sync_required": true
      },
      "BUSINESS_SYNC": {
        "status": "NOT_STARTED",
        "required": true
      }
    }
  }
}
On FAIL:
{
  "loop_state": {
    "current_step": "FIX",
    "step_status": "NOT_STARTED",
    "checkpoints": {
      "EVALUATE_EXECUTION": {
        "status": "FAILED",
        "completed_at": "[ISO timestamp]",
        "verdict": "FAIL",
        "evaluators_run": [
          { "evaluator": "eval-code-quality", "verdict": "PASS", "issues": [] },
          { "evaluator": "eval-business-logic", "verdict": "FAIL", "issues": ["Business rule violation found"] }
        ],
        "failure_items": [
          "Fix business rule enforcement in resolver",
          "Add test coverage for edge case"
        ]
      },
      "FIX": {
        "status": "NOT_STARTED",
        "cycle": 1
      }
    }
  }
}
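1. Read metadata.json
2. Update loop_state.checkpoints.EVALUATE_EXECUTION with results
3. On PASS with business sync required: set current_step to BUSINESS_SYNC
4. On PASS with no business sync required: set current_step to COMPLETE
5. On FAIL: set current_step to FIX, increment fix_cycle_count in loop_state
6. Write metadata.json
7. On FAIL: dispatch loop-fixer with combined fix list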
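The state transitions above can be sketched as a pure function over the parsed metadata.json. This is an illustration under assumptions, not the plugin's code: `record_verdict` is a hypothetical name, failure items are taken directly from each evaluator's `issues`, and a pass without business sync is assumed to move straight to COMPLETE:

```python
import copy
from datetime import datetime, timezone


def record_verdict(metadata: dict, evaluators_run: list[dict],
                   business_sync_required: bool) -> dict:
    """Return a copy of metadata with the EVALUATE_EXECUTION result applied."""
    updated = copy.deepcopy(metadata)
    loop_state = updated.setdefault("loop_state", {})
    checkpoints = loop_state.setdefault("checkpoints", {})
    passed = bool(evaluators_run) and all(e["verdict"] == "PASS" for e in evaluators_run)

    checkpoint = {
        "status": "PASSED" if passed else "FAILED",
        "completed_at": datetime.now(timezone.utc).isoformat(),
        "verdict": "PASS" if passed else "FAIL",
        "evaluators_run": evaluators_run,
    }
    if passed:
        checkpoint["business_sync_required"] = business_sync_required
        # Assumption: a pass with no business sync goes straight to COMPLETE.
        next_step = "BUSINESS_SYNC" if business_sync_required else "COMPLETE"
        if business_sync_required:
            checkpoints["BUSINESS_SYNC"] = {"status": "NOT_STARTED", "required": True}
    else:
        next_step = "FIX"
        cycle = loop_state.get("fix_cycle_count", 0) + 1
        loop_state["fix_cycle_count"] = cycle
        # Assumption: failure items are the evaluators' issues, unedited.
        checkpoint["failure_items"] = [i for e in evaluators_run for i in e["issues"]]
        checkpoints["FIX"] = {"status": "NOT_STARTED", "cycle": cycle}

    checkpoints["EVALUATE_EXECUTION"] = checkpoint
    loop_state["current_step"] = next_step
    if next_step != "COMPLETE":
        # step_status for COMPLETE is not shown in the source, so it is left as-is.
        loop_state["step_status"] = "NOT_STARTED"
    return updated
```

Returning a modified copy keeps the read and write of metadata.json as separate, explicit steps around the call.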