Use when running benchmark competitions or improving scoring accuracy. Used by pss-agent-profiler. Trigger with /pss-benchmark.
npx claudepluginhub emasoft/emasoft-plugins --plugin perfect-skill-suggester
This skill uses the workspace's default tool permissions.
Benchmark protocol for Opus agents competing to improve the PSS scoring engine. Defines report structure, tracking format, sacred parameters, and anti-patterns.
rust/skill-suggester/src/main.rs
docs_dev/benchmark-v2-prompts-100.jsonl and docs_dev/benchmark-v2-gold-100.json
docs_dev/methodology-improvement-history.md
cargo build --release in rust/skill-suggester/

Write every file to $MAIN_ROOT/reports/pss-benchmark-agent/, which is the main repo root's reports folder, never the worktree's own. Both ./reports/ and ./reports_dev/ are gitignored project-wide. Resolve the path with this shell prologue at the start of your Bash section:
MAIN_ROOT="$(git worktree list | head -n1 | awk '{print $1}')"
REPORT_DIR="$MAIN_ROOT/reports/pss-benchmark-agent"
mkdir -p "$REPORT_DIR"
TIMESTAMP="$(date +%Y%m%d_%H%M%S%z)" # local time + GMT offset, e.g. 20260421_183012+0200
REPORT_FILE="$REPORT_DIR/$TIMESTAMP-worktree-${AGENT_ID}-report.md"
LOG_FILE="$REPORT_DIR/$TIMESTAMP-worktree-${AGENT_ID}-benchmark-log.md"
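To sanity-check the naming convention the prologue produces, a small sketch like the following may help. The helper names `pss_report_name` and `pss_valid_timestamp` and the `unknown-agent` fallback are hypothetical and not part of the PSS tooling; only the timestamp format and the file-name pattern come from the prologue above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of the PSS tooling.

# Build a file name following the <timestamp>-worktree-<agent>-<kind>.md pattern.
# $1 = timestamp, $2 = agent id, $3 = kind ("report" or "benchmark-log")
pss_report_name() {
  printf '%s-worktree-%s-%s.md\n' "$1" "$2" "$3"
}

# Succeed only if $1 looks like YYYYMMDD_HHMMSS followed by a +HHMM or -HHMM offset.
pss_valid_timestamp() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]{8}_[0-9]{6}[+-][0-9]{4}$'
}

TIMESTAMP="$(date +%Y%m%d_%H%M%S%z)"
if pss_valid_timestamp "$TIMESTAMP"; then
  # Assumed fallback label when AGENT_ID has not been exported.
  pss_report_name "$TIMESTAMP" "${AGENT_ID:-unknown-agent}" report
else
  echo "unexpected timestamp format: $TIMESTAMP" >&2
fi
```

Checking the timestamp before composing the name guards against `date` implementations that do not expand `%z` as a numeric offset.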
docs_dev/methodology-improvement-history.md and current main.rs
$REPORT_FILE (resolved via the prologue above)
$LOG_FILE (same prologue)
cargo test and cargo build --release

Token savings: Use mcp__plugin_llm-externalizer_llm-externalizer__code_task (when available) to analyze benchmark logs and scoring engine source. Use chat to compare log snapshots in parallel. Reserve Opus reasoning for scoring algorithm changes.
Copy this checklist and track your progress:
cargo build --release && uv run scripts/pss_agent_benchmark.py --binary target/release/pss
$MAIN_ROOT/reports/pss-benchmark-agent/<TS±TZ>-worktree-{AGENT_ID}-report.md -- structured report with all mandatory sections
$MAIN_ROOT/reports/pss-benchmark-agent/<TS±TZ>-worktree-{AGENT_ID}-benchmark-log.md -- per-prompt benchmark results (append-only)
<TS±TZ> is the local time plus GMT offset (e.g. 20260421_183012+0200). Both dirs are gitignored project-wide.
rust/skill-suggester/src/main.rs -- with improvements to the scoring engine

If the benchmark script fails or produces unexpected output, check docs_dev/methodology-improvement-history.md for known issues.
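Since the benchmark log is described as append-only, one way to honor that from the shell is to write exclusively with `>>`. This is a minimal sketch: the function name `pss_log_append` and the entry layout are assumptions for illustration, and the protocol's actual per-prompt format is not shown here.

```shell
#!/bin/sh
# Sketch only: function name and entry layout are hypothetical placeholders.

# Append one line to the log without ever truncating it.
# $1 = log file, $2 = entry text
pss_log_append() {
  printf '%s\n' "$2" >> "$1"
}

# Falls back to a temp file when the prologue has not set LOG_FILE.
LOG_FILE="${LOG_FILE:-$(mktemp)}"
pss_log_append "$LOG_FILE" "prompt=1 result=placeholder"
pss_log_append "$LOG_FILE" "prompt=2 result=placeholder"
wc -l < "$LOG_FILE"
```

Routing every write through a single helper makes it harder to overwrite earlier results by accident with a stray `>` redirect.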
rust/skill-suggester/src/main.rs -- scoring engine
docs_dev/benchmark-v2-prompts-100.jsonl -- prompts
docs_dev/benchmark-v2-gold-100.json -- gold standard
docs_dev/methodology-improvement-history.md -- history