Evolve LLM agent code autonomously in Python projects using LangSmith evaluations, multi-agent proposers, and git worktrees. Run propose-evaluate-iterate loops to boost performance, check dataset quality and score stability, analyze architectures on stagnation, audit evaluators for issues, generate diverse tests, monitor progress charts, and commit tagged improvements.
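The propose-evaluate-iterate loop described above can be sketched in a few lines. This is a minimal illustration, not the plugin's actual implementation: `propose` and `evaluate` are hypothetical stand-ins (the real plugin uses multi-agent proposers and LangSmith evaluations), and the greedy keep-if-better rule is an assumption.

```python
import random

def propose(agent_code: str) -> str:
    """Hypothetical proposer: returns a mutated copy of the agent code."""
    return agent_code + f"\n# tweak {random.randint(0, 9999)}"

def evaluate(agent_code: str) -> float:
    """Hypothetical evaluator: returns a score in [0, 1].
    The real plugin would run a LangSmith evaluation here."""
    return min(1.0, len(agent_code) / 1000)

def evolve(agent_code: str, iterations: int = 5) -> tuple[str, float]:
    best_code, best_score = agent_code, evaluate(agent_code)
    for _ in range(iterations):
        candidate = propose(best_code)   # propose a change
        score = evaluate(candidate)      # evaluate it
        if score > best_score:           # keep only improvements
            best_code, best_score = candidate, score
    return best_code, best_score

best, score = evolve("def agent(x):\n    return x")
print(f"best score: {score:.3f}")
```

The greedy acceptance rule means the score reported after the loop can never be worse than the starting score, which is why the separate architecture-analysis agent exists for the case where this loop stagnates.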
npx claudepluginhub raphaelchristi/harness-evolver --plugin harness-evolver

Use this agent when the evolution loop stagnates or regresses. Analyzes the agent architecture and recommends topology changes (single-call → RAG, chain → ReAct, etc.).
Background agent for cross-iteration memory consolidation. Runs after each iteration to extract learnings and update evolution_memory.md. Read-only analysis — does not modify agent code.
Use this agent when scores converge suspiciously fast, evaluator quality is questionable, or the agent reaches high scores in few iterations. Detects gaming AND implements fixes.
Use this agent to evaluate experiment outputs using LLM-as-judge. Reads run inputs/outputs from LangSmith via langsmith-cli, judges correctness, and writes scores back as feedback. No external API keys needed.
Self-organizing agent optimizer. Investigates a data-driven lens (question), decides its own approach, and modifies real code in an isolated git worktree. May self-abstain if it cannot add meaningful value.
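The worktree isolation mentioned above can be sketched with plain git. The repo, branch, and path names below are illustrative, not the plugin's actual conventions; the demo builds a throwaway repository so it is self-contained.

```shell
set -e
# Throwaway repo for the demo; the plugin would operate on your project repo.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"

# Give the optimizer an isolated working copy on its own branch,
# so its edits never touch the main checkout.
git worktree add -q -b evolver/iter-1 ../iter-1-worktree

git worktree list
```

Each optimizer instance gets its own worktree, so a failed or abandoned experiment can be discarded with `git worktree remove` without disturbing the main branch.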
Use this agent to generate test inputs for the evaluation dataset. Spawned by the setup skill when no test data exists.
Use when the user wants to verify that the evolved agent's score is stable and reliable. Runs evaluation multiple times and reports mean ± std.
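The mean ± std report can be computed with the standard library. A minimal sketch, assuming scores land in [0, 1]; the 0.05 stability threshold and the sample scores are illustrative, not the agent's actual criteria.

```python
from statistics import mean, stdev

def stability_report(scores: list[float]) -> str:
    """Summarize repeated evaluation runs as mean +/- sample std."""
    m, s = mean(scores), stdev(scores)
    verdict = "stable" if s < 0.05 else "noisy"  # assumed threshold
    return f"{m:.3f} ± {s:.3f} ({verdict})"

# Hypothetical scores from five identical evaluation runs.
runs = [0.81, 0.84, 0.79, 0.82, 0.80]
print(stability_report(runs))  # → 0.812 ± 0.019 (stable)
```

A wide std here is a signal to audit the evaluator before trusting any single-run improvement.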
Use when the user is done evolving and wants to finalize, clean up, tag the result, or push the optimized agent.
Use when the user wants to run the optimization loop, improve agent performance, evolve the agent, or iterate on quality. Requires .evolver.json to exist (run harness:setup first).
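The `.evolver.json` requirement can be pre-checked before starting the loop. The field names in the example config are assumptions for illustration, not the plugin's documented schema; only the filename and the "run harness:setup first" requirement come from the description above.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical .evolver.json contents; field names are illustrative only.
EXAMPLE_CONFIG = """{
  "agent_entrypoint": "src/agent.py",
  "langsmith_dataset": "my-eval-dataset",
  "max_iterations": 10
}"""

def load_config(path: Path) -> dict:
    """Fail fast with a clear message if harness:setup was never run."""
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; run harness:setup first")
    return json.loads(path.read_text())

with tempfile.TemporaryDirectory() as d:
    cfg_path = Path(d) / ".evolver.json"
    cfg_path.write_text(EXAMPLE_CONFIG)
    print(load_config(cfg_path)["langsmith_dataset"])  # → my-eval-dataset
```

Failing fast with an actionable message keeps the evolve skill from running half-configured and producing misleading scores.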
Use when the user wants to check dataset quality, diagnose eval issues, or before running evolve. Checks size, difficulty distribution, dead examples, coverage, and splits. Auto-corrects issues found.
Use when the user wants to set up the evolver in their project, optimize an LLM agent, improve agent performance, or mentions evolver for the first time in a project without .evolver.json.
Use when the user asks about evolution progress, current scores, best version, how many iterations ran, or whether the loop is stagnating.