Evaluation framework skills for designing scoring rubrics, running structured evaluations on LLM outputs, and comparing candidate outputs to recommend a winner.
npx claudepluginhub ats-kinoshita-iso/agent-workshop --plugin eval-framework

Compare two LLM outputs on the same evaluation criteria and recommend a winner with justification. Use this skill when asked to "compare these outputs", "which response is better", "A/B eval", or "pick the best candidate".
Design evaluation criteria and a 1-5 scoring rubric for a task or LLM output. Use this skill when asked to "create an eval", "define evaluation criteria", "build a scoring rubric", or "design how to measure quality" for any output.
Execute a structured evaluation against a set of LLM outputs and produce a scored report. Use this skill when asked to "run the eval", "score these outputs", "evaluate this response", or "generate an evaluation report".
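The rubric and comparison skills described above can be pictured with a small sketch. It is not the plugin's implementation: the `Criterion` class, the weights, the criterion names, and the `margin` tie threshold are all hypothetical, chosen only to show how 1-5 scores on shared criteria might be combined into a recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric row: a criterion name and its relative weight (hypothetical)."""
    name: str
    weight: float


def weighted_score(rubric: list[Criterion], scores: dict[str, int]) -> float:
    """Return the weight-averaged score, rejecting values outside 1-5."""
    total_weight = sum(c.weight for c in rubric)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    acc = 0.0
    for c in rubric:
        s = scores[c.name]
        if not 1 <= s <= 5:
            raise ValueError(f"{c.name}: score {s} is outside 1-5")
        acc += c.weight * s
    return acc / total_weight


def compare(rubric: list[Criterion], scores_a: dict[str, int],
            scores_b: dict[str, int], margin: float = 0.25) -> str:
    """Recommend "A", "B", or "tie"; `margin` is an assumed tie threshold."""
    a = weighted_score(rubric, scores_a)
    b = weighted_score(rubric, scores_b)
    if abs(a - b) < margin:
        return "tie"
    return "A" if a > b else "B"


# Illustrative rubric and scores, not taken from the plugin.
RUBRIC = [Criterion("accuracy", 5), Criterion("clarity", 3), Criterion("concision", 2)]
SCORES_A = {"accuracy": 4, "clarity": 5, "concision": 3}
SCORES_B = {"accuracy": 3, "clarity": 4, "concision": 5}

print(compare(RUBRIC, SCORES_A, SCORES_B))  # prints "A" (4.1 vs 3.7)
```

Scoring both candidates against the same rubric keeps the comparison on identical criteria, which is what the comparison skill's description asks for.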
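A plugin library and development framework for Claude Code. This repo serves two purposes: a marketplace of packaged plugins that you install with /plugin install, and a cookbook of baseline configs that you copy into your projects.

# Add the marketplace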
/plugin marketplace add ats-kinoshita-iso/agent-workshop
# Browse and install plugins
/plugin install planning@agent-workshop
Browse cookbook/ and copy what you need into your project:
- cookbook/claude-md/: CLAUDE.md templates by project type
- cookbook/hooks/: Reusable hook recipes for .claude/settings.json
- cookbook/mcp/: MCP server configurations for common integrations

.claude-plugin/ # Marketplace definition (marketplace.json)
plugins/ # Stable, packaged Claude Code plugins
cookbook/ # Golden baseline configs (copy into your projects)
  claude-md/ # CLAUDE.md templates
  hooks/ # Hook recipes
  mcp/ # MCP server configs
tools/ # Development & validation tooling
tests/ # Plugin validation gates
1. Develop in .claude/ → Iterate locally with Claude Code
2. Validate with test suite → uv run pytest
3. Package as plugin → Create plugin.json + SKILL.md in plugins/
4. Auto-register in marketplace → marketplace_gen.py updates marketplace.json
5. Users install from here → /plugin marketplace add ats-kinoshita-iso/agent-workshop
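Step 4 of the workflow above can be sketched as follows. This is not the repo's marketplace_gen.py: the function names are hypothetical, the plugins/<plugin>/.claude-plugin/plugin.json layout is taken from the contribution steps, and the output keys (`name`, `plugins`, `source`) are an assumption about the catalog shape that should be checked against the actual marketplace.json.

```python
import json
from pathlib import Path


def build_catalog(plugins_dir: Path, marketplace_name: str) -> dict:
    """Collect every plugin manifest under plugins_dir into one catalog dict."""
    entries = []
    # Assumed layout: <plugins_dir>/<plugin>/.claude-plugin/plugin.json
    for manifest in sorted(plugins_dir.glob("*/.claude-plugin/plugin.json")):
        meta = json.loads(manifest.read_text(encoding="utf-8"))
        plugin_root = manifest.parent.parent
        entries.append({
            "name": meta["name"],
            "description": meta.get("description", ""),
            "version": meta.get("version", "0.0.0"),
            # Relative path to the plugin directory (assumed catalog field).
            "source": f"./{plugins_dir.name}/{plugin_root.name}",
        })
    return {"name": marketplace_name, "plugins": entries}


def write_catalog(catalog: dict, out_path: Path) -> None:
    """Write the catalog as indented JSON, creating parent directories."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(catalog, indent=2) + "\n", encoding="utf-8")
```

Under these assumptions, calling `write_catalog(build_catalog(Path("plugins"), "agent-workshop"), Path(".claude-plugin/marketplace.json"))` from the repo root would regenerate the catalog. Sorting the manifests keeps the output stable between runs, so the generated file only changes when a plugin does.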
| Plugin | Description | Version |
|---|---|---|
| code-quality-gate | Unified quality orchestrator (lint, format, typecheck, test) | 1.0.0 |
| context-sync | Keeps CLAUDE.md files in sync with the codebase | 1.0.0 |
| plan-manager | Plan lifecycle management with gate tracking and archival | 1.0.0 |
| planning | Phased implementation plans with gates and tests | 2.0.0 |
| test-quality | Test generation, auditing, and knowledge extraction | 1.0.0 |
| workspace-clean | Workspace hygiene checks and cleanup | 1.0.0 |

| Plugin | Skills | License |
|---|---|---|
| anthropic-document-skills | docx, pdf, pptx, xlsx | Source-available |
| anthropic-creative-skills | algorithmic-art, brand-guidelines, canvas-design, frontend-design, slack-gif-creator, theme-factory | Apache 2.0 |
| anthropic-dev-skills | claude-api, mcp-builder, skill-creator, web-artifacts-builder, webapp-testing | Apache 2.0 |
| anthropic-enterprise-skills | doc-coauthoring, internal-comms | Apache 2.0 |
Tooling: uv, bun

uv sync
uv run pytest
uv run ruff check .
uv run ruff format .
uv run mypy .
bun install
bun test
bunx biome check --write .

1. Develop in .claude/ (skills, hooks, agents, etc.)
2. Create plugins/<your-plugin>/.claude-plugin/plugin.json with name, description, version, keywords
3. Run uv run pytest to validate structure
4. Run uv run python tools/marketplace_gen.py to update the marketplace catalog
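The contribution steps name four plugin.json fields and a structural validation run. The sketch below shows what such a check might look like. It is not the repo's test suite: `manifest_problems` is a hypothetical helper, and both the MAJOR.MINOR.PATCH version format and the list-of-strings shape for keywords are assumptions based on the versions shown in the plugin table.

```python
import re

# Fields listed in the contribution steps.
REQUIRED_FIELDS = ("name", "description", "version", "keywords")
# Assumed version format, e.g. 1.0.0 or 2.0.0.
VERSION_PATTERN = re.compile(r"^\d+\.\d+\.\d+$")


def manifest_problems(meta: dict) -> list[str]:
    """Return human-readable problems; an empty list means the manifest passed."""
    problems = [f"missing field: {field}" for field in REQUIRED_FIELDS if field not in meta]

    version = meta.get("version")
    if version is not None and not (
        isinstance(version, str) and VERSION_PATTERN.match(version)
    ):
        problems.append(f"version is not MAJOR.MINOR.PATCH: {version!r}")

    keywords = meta.get("keywords")
    if keywords is not None and not (
        isinstance(keywords, list) and all(isinstance(k, str) for k in keywords)
    ):
        problems.append("keywords must be a list of strings")

    return problems
```

Returning a list of problems instead of raising on the first one lets a test report every issue in a manifest at once, for example with `assert manifest_problems(meta) == []`.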