By judgmentlabs
Enables AI agents to use Judgeval for LLM evaluation, logging, and observability. Provides correct API usage, working examples, and helper scripts for common operations.
npx claudepluginhub judgmentlabs/judgeval-claude-plugin --plugin judgeval
Claude Code plugin for automatic tracing and observability with Judgeval.
claude plugin marketplace add JudgmentLabs/judgeval-claude-plugin
claude plugin install trace-claude-code@judgeval-claude-plugin
See trace-claude-code/SKILL.md for setup instructions.
After installing, run the setup script in your project directory:
bash ~/.claude/plugins/marketplaces/judgeval-claude-plugin/skills/trace-claude-code/setup.sh
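Before running the setup script, it can help to confirm that the two credentials it relies on, JUDGMENT_API_KEY and JUDGMENT_ORG_ID, are actually set. The sketch below is a hypothetical preflight helper, not part of the plugin: the function name `check_judgment_env` is made up for illustration, and it assumes you keep both values in environment variables of the same names. Whether setup.sh reads them from the environment or prompts for them is not stated here.

```shell
# Hypothetical helper (not shipped with the plugin): report which of the
# two credentials are missing from the current environment.
check_judgment_env() {
  missing=0
  for name in JUDGMENT_API_KEY JUDGMENT_ORG_ID; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  # Exit status 0 when both are set, 1 otherwise.
  return "$missing"
}
```

One way to use it is to chain it in front of the setup command with `&&`, so the script only runs when both variables are present.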
You'll need:
JUDGMENT_API_KEY - Get from Judgeval Settings
JUDGMENT_ORG_ID - Get from Organization Settings
Claude Code Session (root trace)
├── Turn 1: "Add error handling"
│ ├── LLM: claude-opus-4-5 (3.2s, 1,240 tokens)
│ ├── Read: src/app.ts
│ ├── Edit: src/app.ts
│ └── LLM: claude-opus-4-5 (1.8s, 890 tokens)
├── Turn 2: "Now run the tests"
│ ├── LLM: claude-opus-4-5
│ ├── Terminal: npm test
│ └── LLM: claude-opus-4-5
└── Turn 3: "Commit this"
  └── ...
Captured data:
Test locally without marketplace:
claude --plugin-dir /path/to/judgeval-claude-plugin
After plugin updates are released:
claude plugin marketplace update judgeval-claude-plugin
claude plugin update trace-claude-code@judgeval-claude-plugin
MIT