npx claudepluginhub yonatangross/orchestkit --plugin ork

This skill uses the workspace's default tool permissions.
Run `claude -p --bare` for fast, clean eval/grading without plugin overhead.
Evaluates Claude Code skill output quality via assertion-based grading, blind before/after version comparisons, and variance analysis across multiple runs for benchmarking.
Runs evaluation pipelines on Claude Code skills to test triggering accuracy, workflow correctness, and output quality. Spawns sub-agents for parallel execution and generates JSON reports.
Tests and benchmarks Claude Code skills empirically via evaluation-driven development. Compares skill vs baseline performance using pass rates, timing, token metrics in quick workflow or 7-phase full pipeline.
CC 2.1.81 required. The --bare flag skips hooks, LSP, plugin sync, and skill directory walks.
# Use --bare for any -p call that doesn't need plugins (no --plugin-dir)
# --bare requires ANTHROPIC_API_KEY (OAuth/keychain disabled)
export ANTHROPIC_API_KEY="sk-ant-..."
# Verify CC version
claude --version # Must be >= 2.1.81
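Both prerequisites can be checked in one place before enabling bare mode. A minimal sketch, assuming a helper named `bare_mode_ok` that you define yourself (it is not part of the skill's scripts) and that GNU `sort -V` is available:

```shell
# Hypothetical guard: bare mode needs an API key and CC >= 2.1.81.
# bare_mode_ok is illustrative, not part of the skill.
bare_mode_ok() {
  local version="$1" api_key="$2" min="2.1.81"
  # --bare disables OAuth/keychain, so an explicit key is mandatory
  [ -n "$api_key" ] || return 1
  # Dotted-version compare: min must sort first (or equal) under sort -V
  [ "$(printf '%s\n%s\n' "$min" "$version" | sort -V | head -n1)" = "$min" ]
}
```

If `sort -V` is unavailable (e.g. BSD sort), compare the three numeric fields directly instead.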
| Call Type | Command Pattern |
|---|---|
| Grading | `claude -p "$prompt" --bare --max-turns 1 --output-format text` |
| Trigger | `claude -p "$prompt" --bare --json-schema "$schema" --output-format json` |
| Optimize | `echo "$prompt" \| claude -p --bare --max-turns 1 --output-format text` |
| Force-skill | `claude -p "$prompt" --bare --print --append-system-prompt "$content"` |
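The table rows can be centralized in one small dispatcher so every script assembles flags the same way. A sketch, where `eval_flags` is a hypothetical helper (the flag sets themselves come from the table above):

```shell
# Hypothetical dispatcher: map a call type to its flag set from the table.
eval_flags() {
  case "$1" in
    grading)     echo '--bare --max-turns 1 --output-format text' ;;
    trigger)     echo '--bare --output-format json' ;;            # plus --json-schema "$schema"
    optimize)    echo '--bare --max-turns 1 --output-format text' ;;  # prompt piped via stdin
    force-skill) echo '--bare --print' ;;                         # plus --append-system-prompt "$content"
    *)           return 1 ;;
  esac
}

# Example usage: claude -p "$prompt" $(eval_flags grading)
```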
Load detailed patterns and examples:
Read("${CLAUDE_SKILL_DIR}/references/invocation-patterns.md")
JSON schemas for structured eval output:
Read("${CLAUDE_SKILL_DIR}/references/grading-schemas.md")
OrchestKit's eval scripts (`npm run eval:skill`) auto-detect bare mode:
# eval-common.sh detects ANTHROPIC_API_KEY → sets BARE_MODE=true
# Scripts add --bare to all non-plugin calls automatically
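The detection described above can be sketched as a tiny function; `detect_bare_flag` is illustrative, not the actual `eval-common.sh` code:

```shell
# Sketch: a non-empty ANTHROPIC_API_KEY enables bare mode.
detect_bare_flag() {
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    echo "--bare"   # appended to every non-plugin claude call
  fi
}
```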
- Bare calls: trigger classification, force-skill, baseline, all grading.
- Never bare: `run_with_skill` (needs plugin context for routing tests).
| Scenario | Without --bare | With --bare | Savings |
|---|---|---|---|
| Single grading call | ~3-5s startup | ~0.5-1s | 2-4x |
| Trigger (per prompt) | ~3-5s | ~0.5-1s | 2-4x |
| Full eval (50 calls) | ~150-250s overhead | ~25-50s | 3-5x |
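The full-eval row is just the per-call row multiplied out; a quick sanity check of the arithmetic (the helper is illustrative):

```shell
# 50 calls at 3-5s startup each = 150-250s of avoidable overhead.
overhead_seconds() {
  echo $(( $1 * $2 ))   # per-call seconds * number of calls
}
```

For example, `overhead_seconds 3 50` prints 150 and `overhead_seconds 5 50` prints 250, matching the table.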
Additional references:
Read("${CLAUDE_SKILL_DIR}/rules/_sections.md")
Read("${CLAUDE_SKILL_DIR}/references/troubleshooting.md")
- `eval:skill` npm script — unified skill evaluation runner
- `eval:trigger` — trigger accuracy testing
- `eval:quality` — A/B quality comparison
- `optimize-description.sh` — iterative description improvement
- `doctor/references/version-compatibility.md`