From create-evaluator
Scaffolds new Evaluator classes for Dokimos LLM evaluation framework with custom metrics, scoring functions, and grading logic for LLM outputs.
npx claudepluginhub dokimos-dev/dokimos --plugin create-evaluator
This skill uses the workspace's default tool permissions.
Scaffold a new Evaluator implementation for Dokimos following project conventions.
The user will describe what the evaluator should do via $ARGUMENTS. Use that description to determine the evaluator's name, scoring logic, which test case parameters it requires, and what constitutes a passing score.
- Evaluator sources: dokimos-core/src/main/java/dev/dokimos/core/evaluators/
- Tests: dokimos-core/src/test/java/dev/dokimos/core/
- Base class: dev.dokimos.core.BaseEvaluator
- Key types: EvalResult, EvalTestCase, EvalTestCaseParam, JudgeLM

Before writing code, read these files to understand the current conventions:
- dokimos-core/src/main/java/dev/dokimos/core/BaseEvaluator.java
- dokimos-core/src/main/java/dev/dokimos/core/evaluators/ExactMatchEvaluator.java (simple evaluator reference)
- dokimos-core/src/main/java/dev/dokimos/core/evaluators/LLMJudgeEvaluator.java (LLM-based evaluator reference, if the new evaluator needs a JudgeLM)

Every evaluator follows this pattern:
- Extend BaseEvaluator with a private constructor that takes a Builder
- Override runEvaluation(EvalTestCase testCase); this is where the scoring logic goes
- Provide a static Builder class with sensible defaults for name, threshold, and evaluationParams
- Return an EvalResult via EvalResult.builder().name(name).score(score).threshold(threshold).reason(reason).build()

The evaluationParams field declares which EvalTestCaseParam values the evaluator requires (INPUT, ACTUAL_OUTPUT, EXPECTED_OUTPUT). BaseEvaluator validates that these are non-null before calling runEvaluation.
If the evaluator needs an LLM to judge (semantic similarity, faithfulness, etc.), accept a JudgeLM in the builder. JudgeLM is a functional interface: String generate(String prompt).
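Because JudgeLM is described as a functional interface with a single generate(String) method, any lambda can play the judge. The sketch below shows one way an evaluator might turn a judge's reply into a score. The JudgeLM declaration here is a stand-in copied from the signature above so the snippet compiles on its own; JudgeScoring, scoreWithJudge, and the prompt wording are hypothetical and not part of Dokimos.

```java
// Stand-in for dev.dokimos.core.JudgeLM, declared here only so the sketch
// compiles by itself. The method shape follows the description above.
@FunctionalInterface
interface JudgeLM {
    String generate(String prompt);
}

// Hypothetical helper, not part of Dokimos.
final class JudgeScoring {
    private JudgeScoring() {}

    // Asks the judge for a number in [0, 1] and parses the reply defensively.
    static double scoreWithJudge(JudgeLM judge, String actualOutput, String expectedOutput) {
        String prompt = "Rate from 0.0 to 1.0 how closely the answer matches the reference.\n"
                + "Answer: " + actualOutput + "\n"
                + "Reference: " + expectedOutput + "\n"
                + "Reply with the number only.";
        String reply = judge.generate(prompt);
        if (reply == null) {
            return 0.0; // no reply counts as a failing score
        }
        try {
            double raw = Double.parseDouble(reply.trim());
            if (Double.isNaN(raw)) {
                return 0.0;
            }
            // Clamp so an out-of-range reply cannot escape [0, 1].
            return Math.max(0.0, Math.min(1.0, raw));
        } catch (NumberFormatException e) {
            // An unparseable reply is treated as a failing score rather than an error.
            return 0.0;
        }
    }
}
```

For a quick local check, a fixed-reply lambda such as `prompt -> "0.8"` can be passed as the judge. Whether an unparseable reply should score 0.0 or raise an exception is a design choice this sketch makes for simplicity.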
Start from this template, replacing Name and the bracketed placeholders:

    package dev.dokimos.core.evaluators;

    import dev.dokimos.core.BaseEvaluator;
    import dev.dokimos.core.EvalResult;
    import dev.dokimos.core.EvalTestCase;
    import dev.dokimos.core.EvalTestCaseParam;

    import java.util.List;

    /**
     * [One-line description of what this evaluator checks.]
     */
    public class NameEvaluator extends BaseEvaluator {

        // Add any evaluator-specific fields here (e.g., JudgeLM judge)

        private NameEvaluator(Builder builder) {
            super(builder.name, builder.threshold, builder.evaluationParams);
            // Initialize evaluator-specific fields from builder
        }

        /**
         * Creates a new builder for constructing [Name] evaluators.
         *
         * @return a new builder
         */
        public static Builder builder() {
            return new Builder();
        }

        @Override
        protected EvalResult runEvaluation(EvalTestCase testCase) {
            // Replace these placeholders with the real scoring logic
            double score = 0.0;
            String reason = "...";

            return EvalResult.builder()
                    .name(name)
                    .score(score)
                    .threshold(threshold)
                    .reason(reason)
                    .build();
        }

        public static class Builder {
            private String name = "Name";
            private double threshold = 1.0;
            private List<EvalTestCaseParam> evaluationParams = List.of(
                    EvalTestCaseParam.ACTUAL_OUTPUT,
                    EvalTestCaseParam.EXPECTED_OUTPUT
            );

            public Builder name(String name) { this.name = name; return this; }

            public Builder threshold(double threshold) { this.threshold = threshold; return this; }

            public Builder evaluationParams(List<EvalTestCaseParam> params) {
                this.evaluationParams = List.copyOf(params);
                return this;
            }

            // Add evaluator-specific builder methods

            public NameEvaluator build() { return new NameEvaluator(this); }
        }
    }
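To make the template more concrete, here is a minimal sketch of a filled-in evaluator that passes when the actual output contains the expected text. It is an illustration under stated assumptions, not Dokimos code: EvalTestCaseParam, EvalTestCase, EvalResult, and BaseEvaluator below are simplified stand-ins written so the snippet compiles on its own, and the real classes may differ (the stand-in EvalResult is a record rather than a builder, for brevity). ContainsEvaluator, its ignoreCase option, and the evaluate method that throws IllegalArgumentException for a missing parameter are hypothetical.

```java
import java.util.List;
import java.util.Locale;

// Stand-ins for the Dokimos types, simplified so this sketch compiles on its own.
// The real classes in dev.dokimos.core may differ in fields and accessors.
enum EvalTestCaseParam { INPUT, ACTUAL_OUTPUT, EXPECTED_OUTPUT }

record EvalTestCase(String input, String actualOutput, String expectedOutput) {
    String get(EvalTestCaseParam param) {
        return switch (param) {
            case INPUT -> input;
            case ACTUAL_OUTPUT -> actualOutput;
            case EXPECTED_OUTPUT -> expectedOutput;
        };
    }
}

record EvalResult(String name, double score, double threshold, String reason) {
    boolean success() { return score >= threshold; }
}

abstract class BaseEvaluator {
    protected final String name;
    protected final double threshold;
    protected final List<EvalTestCaseParam> evaluationParams;

    protected BaseEvaluator(String name, double threshold, List<EvalTestCaseParam> evaluationParams) {
        this.name = name;
        this.threshold = threshold;
        this.evaluationParams = List.copyOf(evaluationParams);
    }

    // Assumed behaviour: reject a test case that lacks a required parameter.
    final EvalResult evaluate(EvalTestCase testCase) {
        for (EvalTestCaseParam param : evaluationParams) {
            if (testCase.get(param) == null) {
                throw new IllegalArgumentException("Missing required parameter: " + param);
            }
        }
        return runEvaluation(testCase);
    }

    protected abstract EvalResult runEvaluation(EvalTestCase testCase);
}

// Hypothetical evaluator: passes when the actual output contains the expected text.
final class ContainsEvaluator extends BaseEvaluator {
    private final boolean ignoreCase;

    private ContainsEvaluator(Builder builder) {
        super(builder.name, builder.threshold, builder.evaluationParams);
        this.ignoreCase = builder.ignoreCase;
    }

    static Builder builder() { return new Builder(); }

    @Override
    protected EvalResult runEvaluation(EvalTestCase testCase) {
        String actual = testCase.actualOutput();
        String expected = testCase.expectedOutput();
        if (ignoreCase) {
            actual = actual.toLowerCase(Locale.ROOT);
            expected = expected.toLowerCase(Locale.ROOT);
        }
        boolean found = actual.contains(expected);
        double score = found ? 1.0 : 0.0;
        String reason = found
                ? "Actual output contains the expected text"
                : "Actual output does not contain the expected text";
        return new EvalResult(name, score, threshold, reason);
    }

    static final class Builder {
        private String name = "Contains";
        private double threshold = 1.0;
        private List<EvalTestCaseParam> evaluationParams = List.of(
                EvalTestCaseParam.ACTUAL_OUTPUT,
                EvalTestCaseParam.EXPECTED_OUTPUT);
        private boolean ignoreCase = false;

        Builder name(String name) { this.name = name; return this; }
        Builder threshold(double threshold) { this.threshold = threshold; return this; }
        Builder ignoreCase(boolean ignoreCase) { this.ignoreCase = ignoreCase; return this; }
        ContainsEvaluator build() { return new ContainsEvaluator(this); }
    }
}
```

With these stand-ins, `ContainsEvaluator.builder().ignoreCase(true).build()` followed by `evaluate(new EvalTestCase("q", "The capital is PARIS.", "Paris"))` yields a passing result, since the binary 0/1 score meets the default threshold of 1.0. A graded metric would usually set a lower default threshold in its Builder.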
Create a test class in dokimos-core/src/test/java/dev/dokimos/core/. Tests use JUnit 6 (Jupiter) and AssertJ. Name it <EvaluatorName>Test.java in the dev.dokimos.core package (not the evaluators subpackage — this matches existing convention).
Cover at minimum:
- … (IllegalArgumentException)

If the evaluator uses a JudgeLM, mock it with Mockito in tests.
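- Class is in the dev.dokimos.core.evaluators package
- Extends BaseEvaluator, not Evaluator directly
- Private constructor takes a Builder
- Has a public static Builder builder() method
- Uses List.copyOf() for defensive copying of evaluationParams
- Test class is in the dev.dokimos.core package (not evaluators)
- Tests use AssertJ assertions (assertThat(...))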