Implements LLM-as-a-Judge techniques: direct scoring, pairwise comparison, rubric generation, bias mitigation. For building eval systems, comparing model outputs, setting AI quality standards.
npx claudepluginhub shipshitdev/library
This skill uses the workspace's default tool permissions.
LLM-as-a-Judge techniques for evaluating AI outputs. This is not a single technique but a family of approaches; choosing the right one and mitigating its biases is the core competency.
Direct Scoring: Single LLM rates one response on a defined scale.
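A minimal sketch of direct scoring, assuming the judge model is wrapped in a hypothetical `judge` callable that takes a prompt string and returns the model's raw text reply. The prompt wording, the 1 to 5 default scale, and the `Score: <n>` output convention are illustrative assumptions, not part of this skill.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical judge interface: prompt in, raw model reply out.
Judge = Callable[[str], str]


def build_direct_prompt(criterion: str, response: str, scale: Tuple[int, int] = (1, 5)) -> str:
    """Ask for a short justification first, then a machine-readable score line."""
    lo, hi = scale
    return (
        f"Rate the response below for {criterion} on an integer scale from {lo} to {hi}.\n"
        "Briefly justify your rating, then finish with a line of the form 'Score: <n>'.\n\n"
        f"Response:\n{response}"
    )


def parse_score(reply: str, scale: Tuple[int, int] = (1, 5)) -> Optional[int]:
    """Return the last 'Score: <n>' value, or None if missing or out of range."""
    matches = re.findall(r"Score:\s*(\d+)", reply)
    if not matches:
        return None
    score = int(matches[-1])
    lo, hi = scale
    return score if lo <= score <= hi else None


def direct_score(judge: Judge, criterion: str, response: str,
                 scale: Tuple[int, int] = (1, 5)) -> Optional[int]:
    """Score one response in isolation against a single criterion."""
    return parse_score(judge(build_direct_prompt(criterion, response, scale)), scale)


# Usage with a stand-in judge; replace the lambda with a real model call.
fake_judge = lambda prompt: "States the correct capital.\nScore: 5"
print(direct_score(fake_judge, "factual accuracy", "Paris is the capital of France."))  # 5
```

Reading only the last `Score:` line and rejecting out-of-range values keeps a malformed reply from silently turning into a valid score.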
Pairwise Comparison: LLM compares two responses and selects the better one.
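A minimal pairwise sketch under the same assumption of a hypothetical `judge` callable (prompt in, reply text out). The `Winner: A|B|tie` verdict format and the instruction to ignore length and order are illustrative choices, not an API defined by this skill.

```python
import re
from typing import Callable, Optional

# Hypothetical judge interface: prompt in, raw model reply out.
Judge = Callable[[str], str]


def build_pairwise_prompt(criterion: str, question: str, response_a: str, response_b: str) -> str:
    """Present both responses side by side and request an explicit verdict line."""
    return (
        f"Question:\n{question}\n\n"
        f"Response A:\n{response_a}\n\n"
        f"Response B:\n{response_b}\n\n"
        f"Which response is better with respect to {criterion}? "
        "Ignore response length and the order in which they are presented. "
        "Explain briefly, then finish with 'Winner: A', 'Winner: B', or 'Winner: tie'."
    )


def parse_verdict(reply: str) -> Optional[str]:
    """Return 'A', 'B', 'tie', or None when no verdict line is found."""
    matches = re.findall(r"Winner:\s*(A|B|tie)\b", reply, flags=re.IGNORECASE)
    if not matches:
        return None
    verdict = matches[-1]
    return "tie" if verdict.lower() == "tie" else verdict.upper()


def pairwise_compare(judge: Judge, criterion: str, question: str,
                     response_a: str, response_b: str) -> Optional[str]:
    """Single-ordering comparison; still exposed to position bias."""
    prompt = build_pairwise_prompt(criterion, question, response_a, response_b)
    return parse_verdict(judge(prompt))
```

A single call like this is still exposed to position bias, which the Position row of the bias table addresses by swapping the order and checking consistency.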
| Bias | Description | Mitigation |
|---|---|---|
| Position | First-position preference | Swap positions, check consistency |
| Length | Longer responses receive higher scores | Instruct the judge to ignore length; length-normalized scoring |
| Self-Enhancement | Models rate own outputs higher | Use different model for evaluation |
| Verbosity | Unnecessary detail rated higher | Criteria-specific rubrics |
| Authority | Confident tone rated higher | Require evidence citation |
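The Position row's mitigation (swap positions, check consistency) can be sketched as follows. `judge` is again a hypothetical prompt-to-text callable, and reporting a disagreement between the two orderings as `'inconsistent'` (rather than, say, folding it into a tie) is one possible design choice, not a rule stated by this skill.

```python
import re
from typing import Callable, Optional

Judge = Callable[[str], str]  # hypothetical: prompt in, reply text out


def _ask(judge: Judge, first: str, second: str) -> Optional[str]:
    """One pairwise call with `first` shown as A and `second` shown as B."""
    prompt = (
        f"Response A:\n{first}\n\nResponse B:\n{second}\n\n"
        "Which response is better? Finish with 'Winner: A', 'Winner: B', or 'Winner: tie'."
    )
    matches = re.findall(r"Winner:\s*(A|B|tie)\b", judge(prompt), flags=re.IGNORECASE)
    if not matches:
        return None
    return "tie" if matches[-1].lower() == "tie" else matches[-1].upper()


def swap_consistent_compare(judge: Judge, resp_1: str, resp_2: str) -> str:
    """Return '1', '2', 'tie', or 'inconsistent' after judging both orderings."""
    forward = _ask(judge, resp_1, resp_2)   # resp_1 in position A
    backward = _ask(judge, resp_2, resp_1)  # resp_1 in position B
    if forward is None or backward is None:
        return "inconsistent"
    # Map each positional verdict back to the underlying response.
    forward_pick = {"A": "1", "B": "2", "tie": "tie"}[forward]
    backward_pick = {"A": "2", "B": "1", "tie": "tie"}[backward]
    return forward_pick if forward_pick == backward_pick else "inconsistent"


# A judge that always prefers position A is caught by the swap.
always_a = lambda prompt: "Winner: A"
print(swap_consistent_compare(always_a, "good answer", "weak answer"))  # inconsistent
```

The cost is two judge calls per pair; in exchange, a verdict is only accepted when it survives the change of order.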
Is there an objective ground truth?
├── Yes → Direct Scoring (factual accuracy, format compliance)
└── No → Pairwise Comparison (tone, style, creativity)
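The decision tree above reduces to a single branch. This sketch encodes it; the `Method` enum and `choose_method` name are hypothetical, and deciding whether a criterion has an objective ground truth remains the evaluator's judgment call.

```python
from enum import Enum


class Method(Enum):
    DIRECT_SCORING = "direct scoring"
    PAIRWISE_COMPARISON = "pairwise comparison"


def choose_method(has_ground_truth: bool) -> Method:
    """Objective criteria: score each response alone. Subjective: compare head to head."""
    return Method.DIRECT_SCORING if has_ground_truth else Method.PAIRWISE_COMPARISON


# Example criteria from the tree, tagged with whether a ground truth exists.
example_criteria = {
    "factual accuracy": True,
    "format compliance": True,
    "tone": False,
    "style": False,
    "creativity": False,
}

plan = {name: choose_method(gt) for name, gt in example_criteria.items()}
print(plan["tone"].value)  # pairwise comparison
```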
For detailed implementation patterns, prompt templates, examples, and metrics: references/full-guide.md
See also: references/implementation-patterns.md, references/bias-mitigation.md, references/metrics-guide.md