AI agent evaluation framework based on Anthropic best practices. Create use cases, LLM judges, A/B prompt tests, and model comparisons.
npx claudepluginhub markac007/cg-claude-workspaces-plugins --plugin evals
- Existing use case with test cases and a prompt
**Implements the Science Protocol for prompt experimentation.**
---
Save all generated config files to `~/Downloads/evals/<name>/` before moving to the project.
- Use case config.yaml exists
- Evaluations have been run (results stored as JSON files in `Results/`)
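
The two prerequisites above can be checked before starting a comparison. The following is a minimal sketch, not part of the plugin: the function name `missing_prerequisites` is hypothetical, and it assumes that `config.yaml` and `Results/` sit together in one use case directory, which the text above does not state explicitly.

```python
from pathlib import Path


def missing_prerequisites(use_case_dir):
    """Return a list of unmet prerequisites; an empty list means ready.

    Assumption: config.yaml and Results/ both live directly under
    use_case_dir. Adjust the paths if the real layout differs.
    """
    root = Path(use_case_dir)
    problems = []

    # Prerequisite 1: the use case config file exists.
    if not (root / "config.yaml").is_file():
        problems.append("config.yaml not found")

    # Prerequisite 2: at least one evaluation result was stored as JSON.
    results_dir = root / "Results"
    if not results_dir.is_dir() or not any(results_dir.glob("*.json")):
        problems.append("no JSON results in Results/")

    return problems
```

Calling `missing_prerequisites("path/to/use-case")` returns an empty list when both conditions hold, so a caller can stop early and report the returned messages otherwise.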
Complete collection of battle-tested Claude Code configs from an Anthropic hackathon winner - agents, skills, hooks, rules, and legacy command shims evolved over 10+ months of intensive daily use
Tools to maintain and improve CLAUDE.md files - audit quality, capture session learnings, and keep project memory current.
Comprehensive PR review agents specializing in comments, tests, error handling, type design, code quality, and code simplification
Comprehensive skill pack with 66 specialized skills for full-stack developers: 12 language experts (Python, TypeScript, Go, Rust, C++, Swift, Kotlin, C#, PHP, Java, SQL, JavaScript), 10 backend frameworks, 6 frontend/mobile, plus infrastructure, DevOps, security, and testing. Features progressive disclosure architecture for 50% faster loading.
Complete creative writing suite with 10 specialized agents covering the full writing process: research gathering, character development, story architecture, world-building, dialogue coaching, editing/review, outlining, content strategy, believability auditing, and prose style/voice analysis. Includes genre-specific guides, templates, and quality checklists.
Efficient skill management system with progressive discovery — 410+ production-ready skills across 33+ domains