# ftitos-claude-code

A formal evaluation framework for Claude Code sessions, implementing eval-driven development (EDD) principles.

Install:

```sh
npx claudepluginhub nassimbf/ftitos-claude-code
```
Eval-driven development treats evals as the "unit tests of AI development".

### Capability Evals

Test whether Claude can do something it couldn't do before:

```
[CAPABILITY EVAL: feature-name]
Task: Description of what Claude should accomplish
Success Criteria:
- [ ] Criterion 1
- [ ] Criterion 2
Expected Output: Description of the expected result
```
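A template like the one above can also be tracked in code. The following is a minimal sketch; the class and field names are illustrative, not part of the skill:

```python
from dataclasses import dataclass, field

@dataclass
class CapabilityEval:
    """One capability eval: a task plus its success criteria."""
    name: str
    task: str
    criteria: list[str] = field(default_factory=list)
    results: dict[str, bool] = field(default_factory=dict)  # criterion -> passed?

    def record(self, criterion: str, passed: bool) -> None:
        self.results[criterion] = passed

    def passed(self) -> bool:
        # The eval passes only if every listed criterion was checked and passed.
        return bool(self.criteria) and all(
            self.results.get(c, False) for c in self.criteria
        )

ev = CapabilityEval(
    name="create-user",
    task="Create a new user account",
    criteria=["account is persisted", "welcome email queued"],
)
ev.record("account is persisted", True)
ev.record("welcome email queued", True)
print(ev.passed())  # -> True
```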
### Regression Evals

Ensure changes don't break existing functionality:

```
[REGRESSION EVAL: feature-name]
Baseline: SHA or checkpoint name
Tests:
- existing-test-1: PASS/FAIL
- existing-test-2: PASS/FAIL
Result: X/Y passed (previously Y/Y)
```
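The "X/Y passed (previously Y/Y)" result line can be produced mechanically by diffing current results against the recorded baseline. A sketch, with illustrative names:

```python
def regression_summary(baseline: dict[str, bool], current: dict[str, bool]) -> str:
    """Summarize current results against a recorded baseline, per the template."""
    total = len(baseline)
    prev_passed = sum(baseline.values())
    now_passed = sum(current.get(name, False) for name in baseline)
    # A regression is a test that passed at the baseline but fails now.
    regressions = [n for n in baseline if baseline[n] and not current.get(n, False)]
    line = f"Result: {now_passed}/{total} passed (previously {prev_passed}/{total})"
    if regressions:
        line += " | REGRESSED: " + ", ".join(regressions)
    return line

baseline = {"existing-test-1": True, "existing-test-2": True}
current = {"existing-test-1": True, "existing-test-2": False}
print(regression_summary(baseline, current))
# -> Result: 1/2 passed (previously 2/2) | REGRESSED: existing-test-2
```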
### Code-Based Grading

Deterministic checks implemented in code:

```sh
# Check whether the auth tests pass
npm test -- --testPathPattern="auth" && echo "PASS" || echo "FAIL"
```
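The same exit-code convention generalizes to a small runner that grades any shell check. A sketch (the eval names and commands are placeholders):

```python
import subprocess

def grade(name: str, command: str) -> str:
    """Run a shell check; exit code 0 means PASS, anything else FAIL."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    status = "PASS" if result.returncode == 0 else "FAIL"
    return f"{name}: {status}"

# `true` and `false` are POSIX commands that exit 0 and 1 respectively.
print(grade("auth-tests", "true"))  # -> auth-tests: PASS
print(grade("lint", "false"))       # -> lint: FAIL
```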
### Model-Based Grading

Use Claude to evaluate open-ended outputs:

```
Evaluate the following code change:
1. Does it solve the stated problem?
2. Is it well-structured?
3. Are edge cases handled?
Score: 1-5
```
### Manual Review

Flag outputs for manual review when automated grading is insufficient.
### Metrics

- **pass@k**: at least one success in k attempts
- **pass^k**: all k trials succeed
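Both metrics can be computed directly from a list of trial outcomes. A minimal sketch:

```python
def pass_at_k(trials: list[bool], k: int) -> bool:
    """pass@k: at least one of the first k attempts succeeded."""
    return any(trials[:k])

def pass_hat_k(trials: list[bool], k: int) -> bool:
    """pass^k: all of the first k trials succeeded."""
    return len(trials) >= k and all(trials[:k])

runs = [False, True, True]  # three attempts at the same eval
print(pass_at_k(runs, 3))   # -> True  (succeeded by attempt 2)
print(pass_hat_k(runs, 3))  # -> False (attempt 1 failed)
```

pass@k suits capability evals (did it work at all?), while pass^k suits regression evals, where flaky behavior counts as a failure.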
### Workflow

**Define.** Specify the evals before writing any code:

```
## EVAL DEFINITION: feature-xyz

### Capability Evals
1. Can create new user account
2. Can validate email format

### Regression Evals
1. Existing login still works
2. Session management unchanged

### Success Metrics
- pass@3 > 90% for capability evals
- pass^3 = 100% for regression evals
```
**Implement.** Write code to pass the defined evals.

**Evaluate.** Run each eval and record PASS/FAIL:
```
EVAL REPORT: feature-xyz

Capability Evals:
  create-user: PASS (pass@1)
  validate-email: PASS (pass@2)
  Overall: 2/2 passed

Regression Evals:
  login-flow: PASS
  session-mgmt: PASS
  Overall: 2/2 passed

Metrics:
  pass@1: 50% (1/2)
  pass@3: 100% (2/2)

Status: READY FOR REVIEW
```