judgeval

Judgeval Claude Code Plugin

Claude Code plugin for automatic tracing and observability with Judgeval.

Install

claude plugin marketplace add JudgmentLabs/judgeval-claude-plugin
claude plugin install trace-claude-code@judgeval-claude-plugin

See trace-claude-code/SKILL.md for setup instructions.

Setup

After installing, run the setup script in your project directory:

bash ~/.claude/plugins/marketplaces/judgeval-claude-plugin/skills/trace-claude-code/setup.sh

You'll need:

JUDGMENT_API_KEY - Get from Judgeval Settings
JUDGMENT_ORG_ID - Get from Organization Settings
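
Before running the setup script, it can help to confirm both values are available. The following is a minimal sketch that assumes the credentials are exported as environment variables under the names listed above; how `setup.sh` actually reads or prompts for them is not specified here, and `missing_credentials` is a hypothetical helper, not part of the plugin.

```python
import os

# Names come from the list above. Treating them as environment
# variables is an assumption about how setup.sh consumes them.
REQUIRED_VARS = ("JUDGMENT_API_KEY", "JUDGMENT_ORG_ID")


def missing_credentials(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("Both credentials are set.")
```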

What You Get

Claude Code Session (root trace)
├── Turn 1: "Add error handling"
│   ├── LLM: claude-opus-4-5 (3.2s, 1,240 tokens)
│   ├── Read: src/app.ts
│   ├── Edit: src/app.ts
│   └── LLM: claude-opus-4-5 (1.8s, 890 tokens)
├── Turn 2: "Now run the tests"
│   ├── LLM: claude-opus-4-5
│   ├── Terminal: npm test
│   └── LLM: claude-opus-4-5
└── Turn 3: "Commit this"
    └── ...

Captured data:

Session start/end times
Each conversation turn
All LLM calls with model, tokens, and duration
Tool invocations (file reads, edits, terminal, MCP)
Cache metrics (creation + read tokens)
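
To make the shape of a trace concrete, here is a small illustrative model of the tree shown above. It is only a sketch: the `Span` class, its field names, and the `kind` labels are hypothetical and do not reflect Judgeval's actual schema. The numbers are the ones from Turn 1 in the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    """One node in the trace tree (hypothetical structure, not Judgeval's schema)."""
    name: str
    kind: str  # illustrative labels: "session", "turn", "llm", "tool"
    duration_s: float = 0.0
    tokens: int = 0
    children: List["Span"] = field(default_factory=list)

    def total_tokens(self) -> int:
        """Sum tokens for this span and every span nested under it."""
        return self.tokens + sum(child.total_tokens() for child in self.children)


# Turn 1 from the example tree above.
turn_1 = Span("Add error handling", "turn", children=[
    Span("claude-opus-4-5", "llm", duration_s=3.2, tokens=1240),
    Span("Read: src/app.ts", "tool"),
    Span("Edit: src/app.ts", "tool"),
    Span("claude-opus-4-5", "llm", duration_s=1.8, tokens=890),
])
session = Span("Claude Code Session", "session", children=[turn_1])

print(session.total_tokens())  # 2130
```

Each turn nests its LLM calls and tool invocations, so per-turn and per-session totals fall out of a simple recursive walk.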

Development

To test locally without going through the marketplace:

claude --plugin-dir /path/to/judgeval-claude-plugin

Run the test suite (bats):

npm install -g bats   # or: brew install bats-core
bats --recursive tests

The same suite runs in CI on every pull request.

Versioning

The version is single-sourced from the repo-root VERSION file. pyproject.toml reads it dynamically (hatchling); the .claude-plugin/*.json manifests are generated from it. To bump the version:

scripts/sync_version.py 1.1.0   # writes VERSION + regenerates the manifests
git commit -am "Release 1.1.0" && git tag v1.1.0

CI fails if any manifest drifts from VERSION.
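
The kind of drift check CI performs might look roughly like the sketch below. It is not the repository's actual check: `find_version_drift` is a hypothetical helper, and it assumes each manifest in `.claude-plugin/` is a JSON object with a top-level `"version"` key, which may not match the real manifest layout.

```python
import json
from pathlib import Path


def find_version_drift(repo_root):
    """Compare VERSION against each manifest's version field.

    Returns (expected_version, [(manifest_name, found_version), ...]).
    Assumes a top-level "version" key in every manifest (an assumption).
    """
    root = Path(repo_root)
    expected = (root / "VERSION").read_text().strip()
    drifted = []
    for manifest in sorted((root / ".claude-plugin").glob("*.json")):
        data = json.loads(manifest.read_text())
        found = data.get("version")
        if found != expected:
            drifted.append((manifest.name, found))
    return expected, drifted


if __name__ == "__main__":
    if Path("VERSION").exists():
        expected, drifted = find_version_drift(".")
        for name, found in drifted:
            print(f"{name}: found {found!r}, expected {expected!r}")
        raise SystemExit(1 if drifted else 0)
```

Run from the repo root, a script like this exits non-zero when any manifest disagrees with VERSION, which is the behavior a CI step needs.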

Updating

When a new plugin version is released, update with:

claude plugin marketplace update judgeval-claude-plugin
claude plugin update trace-claude-code@judgeval-claude-plugin

License

MIT
