By alexsds
Agent-Driven Engineering: Anthropic's 3-agent harness with pluggable rubrics and testing tools for long-running app development
npx claudepluginhub alexsds/ade-workflow --plugin ade

/ade:done — Archive the current plan after completion
/ade:execute — Launch Generator + Evaluator agent team to execute the approved plan
/ade:plan — Research context, ask questions, and create a plan scaled to the scope of the work
/ade:status — Show current ADE build progress and evaluator scores
Use this agent as a team member during /ade:execute to test and score features implemented by the generator. The evaluator uses pluggable rubrics and testing tools to grade work with hard thresholds.

<example>
Context: The execute command is launching the agent team.
user: "Build the approved plan"
assistant: "Spawning the ade-evaluator as a team member to test and score features as the generator implements them."
<commentary>
The evaluator is spawned alongside the generator. It waits for features to review, loads rubrics, tests them, and sends scored feedback.
</commentary>
</example>

<example>
Context: Generator messaged that a feature is ready for review.
user: "Feature: User Registration. Status: Ready for review."
assistant: "The evaluator loads relevant rubrics, tests the feature, scores each criterion, and sends detailed feedback."
<commentary>
Evaluator receives handoff from generator, runs adversarial testing, scores against rubric thresholds, and reports back.
</commentary>
</example>
Use this agent as a team member during /ade:execute to implement features from an approved plan. The generator builds the app feature-by-feature, commits to git, and hands off to the evaluator for scoring.

<example>
Context: The execute command is launching the agent team.
user: "Build the approved plan"
assistant: "Spawning the ade-generator as a team member to implement features from the plan."
<commentary>
The generator is spawned as part of an agent team alongside the evaluator. It implements features and messages the evaluator for review.
</commentary>
</example>

<example>
Context: Generator received feedback from evaluator and needs to iterate.
user: "Evaluator scored originality 5/10 — needs distinctive color palette"
assistant: "The generator iterates on the feature based on evaluator feedback until all rubric scores pass."
<commentary>
Generator accepts evaluator feedback without argument and iterates until all criteria pass threshold.
</commentary>
</example>
This skill should be used when the user asks about evaluation methodology, scoring rubrics, testing tools, "how does scoring work", "evaluation criteria", "rubric format", "add a rubric", "create testing tool", "evaluate my feature", "run evaluation", "why did evaluation fail", or needs guidance on adversarial evaluation, graded scoring, or hard thresholds in the ADE workflow. Make sure to use this skill whenever the user mentions quality assessment, code review scoring, feature validation, or wants to understand why a feature passed or failed evaluation.
Use when the user asks about the ADE build process, how the generator works, iteration strategy, "why is the generator doing X", "how does building work", "commit conventions", "pivot vs refine", or understanding the implementation phase of the ADE workflow. This skill covers the Generator's methodology for implementing features from approved plans, including the 4-phase build cycle and strategic iteration.
Use when the user wants to plan any work — building an app, adding a feature, fixing a bug, solving a problem, refactoring, or any task that benefits from thinking before doing. Triggers on "plan", "build me", "I want to", "fix this", "add a", "we need to", "how should we", or when the user describes something they want done. This skill guides interactive discovery, research, and planning scaled to the scope of the work — from a quick task plan to a full product spec.
A Claude Code plugin implementing Anthropic's recommended 3-agent harness for long-running application development.
Based on: Harness Design for Long-Running Apps
/ade:plan "Build a task management app"
→ Planner researches context, asks questions with suggested answers
→ Creates a plan scaled to scope (app, feature, task, bug)
→ You review and approve
/ade:execute
→ Generator implements deliverables one by one
→ Evaluator tests and scores each against rubrics
→ They iterate until all criteria pass
/ade:done
→ Archives the completed plan
| Agent | Role | Key Behavior |
|---|---|---|
| Planner | Interactive planning | Researches → asks questions → writes plan scaled to scope |
| Generator | Implementation | Builds deliverable-by-deliverable, commits to git |
| Evaluator | Adversarial QA | Scores against rubrics with hard thresholds, can't modify code |
The planner adapts to the scope of the work:
| Scope | Examples | Questions | Plan Structure |
|---|---|---|---|
| Large | Full app, new product | 3-5+ | Phased features + user stories |
| Medium | New feature, integration | 1-3 | Deliverables + acceptance criteria |
| Small | Bug fix, task, refactor | 0-1 | Goal + what to change + done when |
The planner always researches before asking questions — exploring the codebase for existing projects or searching for similar products for greenfield work.
Skills are the source of truth for methodology. Agents are thin execution shells that reference skills for guidance.
skills/ → methodology, knowledge, the "why" and "how"
agents/ → execution shells that read skills
commands/ → user-facing entry points that invoke agents
rubrics/ → evaluation criteria with scored thresholds
testing-tools/ → testing configurations for the evaluator
Default rubrics in rubrics/:
frontend-design.md — UI quality, originality, craft, functionality
code-architecture.md — separation of concerns, clarity, error handling, testability
api-quality.md — API design, responses, validation, security
ux-flows.md — flow coherence, edge cases, information architecture, feedback

Add custom rubrics by dropping .md files in .ade/rubrics/ in your project. Project rubrics override plugin defaults with the same filename.
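As a sketch of what a custom rubric could look like — the file name, criteria, and threshold below are hypothetical, not shipped with the plugin; the default rubrics in rubrics/ are the authoritative reference for the format the evaluator expects:

```markdown
<!-- .ade/rubrics/accessibility.md (hypothetical example) -->
# Accessibility

<!-- Assumed convention: each criterion is scored 1-10 with a hard pass threshold -->
Threshold: 7/10 per criterion

## Criteria
- Keyboard navigation: every interactive element is reachable and operable without a mouse
- Contrast: text meets WCAG AA contrast ratios
- Semantics: headings, landmarks, and form labels are used correctly
- Feedback: focus states and validation errors are clearly perceivable
```

Because project rubrics override plugin defaults by filename, a project copy of frontend-design.md would replace the built-in one entirely rather than merge with it.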
Default tools in testing-tools/:
playwright.md — browser testing via Playwright MCP (auto-configured, falls back to curl)
api-tester.md — HTTP endpoint testing via curl
unit-test-runner.md — test suite execution (auto-detects framework)

Add custom tools by dropping .md files in .ade/testing-tools/ in your project.
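A custom testing tool follows the same pattern: a markdown file telling the evaluator how to exercise a feature. A minimal sketch, with the file name, structure, and pass criteria all assumed rather than taken from the plugin:

```markdown
<!-- .ade/testing-tools/load-smoke.md (hypothetical example) -->
# Load Smoke Test

Fire 50 sequential requests at the feature's main endpoint with curl,
recording status codes and response times.

- Pass: every response is 2xx and the slowest response is under 500 ms
- Fail: any 5xx response, or any response slower than 500 ms
```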
Project settings in .claude/ade.local.md:
---
commits_style: conventional # conventional | jira
---
| Command | Description |
|---|---|
/ade:plan [anything] | Research, ask questions, create plan scaled to scope |
/ade:execute | Launch Generator + Evaluator team |
/ade:done | Archive completed plan |
/ade:status | Show build progress |
Inside a Claude Code session, add the marketplace:
/plugin marketplace add alexsds/ade-workflow
Then install the plugin:
/plugin install ade@alexsds-ade-workflow
Or run /plugin to open the interactive plugin manager.
The three-agent structure follows Anthropic's research on harness design for long-running agents (linked above).
License: MIT