From evaluate-koog
Sets up Dokimos evaluations for Koog AI agents in Kotlin, using the agent either as the system under test or as a judge, via ExactMatchEvaluator, LLMJudgeEvaluator, or the experiment DSL.
Install the plugin:

```
npx claudepluginhub dokimos-dev/dokimos --plugin evaluate-koog
```

This skill uses the workspace's default tool permissions.
Set up Dokimos evaluation for a Koog AI agent. The user will describe their agent and evaluation goals via `$ARGUMENTS`.
Key files:

- `dokimos-koog/src/main/kotlin/dev/dokimos/koog/KoogSupport.kt`
- `dokimos-koog/src/test/kotlin/dev/dokimos/koog/`
- Maven coordinates: `dev.dokimos:dokimos-koog`

Before writing code, read KoogSupport.kt to understand the available utilities.
KoogSupport.kt provides:

- `asJudge(agentCall: suspend (String) -> String)` — wraps any suspend function into a JudgeLM
- `asJudge(agent: () -> AIAgent<String, String>)` — wraps a Koog agent factory into a JudgeLM
- `AIAgent.runBlocking(input, context)` — extension to run a Koog agent synchronously

To use the agent as the system under test, wrap it in a Task:

```kotlin
val agent: () -> AIAgent<String, String> = { createMyAgent() }

val task = Task { example ->
    val input = example.inputs()["input"] as String
    val output = agent().runBlocking(input)
    mapOf("output" to output)
}

val result = Experiment.builder()
    .name("Koog Agent Evaluation")
    .dataset(dataset)
    .task(task)
    .evaluator(ExactMatchEvaluator.builder().build())
    .build()
    .run()
```
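The Task reads `inputs()["input"]` from each dataset example. A minimal `datasets/qa.json` might look like the sketch below — the field names here are an assumption based on that access pattern, so check the Dokimos dataset documentation for the actual schema:

```json
[
  {
    "inputs": { "input": "What is the capital of France?" },
    "expected": { "output": "Paris" }
  }
]
```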
To use the agent as a judge, wrap it with `asJudge` and pass it to an LLMJudgeEvaluator:

```kotlin
val judge = asJudge { prompt -> myAgent().run(prompt) }
// or
val judge = asJudge { createMyAgent() }

val evaluator = LLMJudgeEvaluator.builder()
    .name("helpfulness")
    .judge(judge)
    .criteria("Is the response helpful and accurate?")
    .evaluationParams(listOf(
        EvalTestCaseParam.INPUT,
        EvalTestCaseParam.ACTUAL_OUTPUT,
        EvalTestCaseParam.EXPECTED_OUTPUT
    ))
    .threshold(0.7)
    .build()
```
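The `threshold(0.7)` gate can be pictured with a self-contained sketch. This is plain Kotlin, not the Dokimos API: `scoreOf` is a hypothetical parser that pulls a numeric score out of a judge's free-text verdict, and `passes` is the pass/fail decision the evaluator would make against the threshold.

```kotlin
// Hypothetical sketch of threshold gating — not the dokimos-koog implementation.
// Extract the first number from the judge's verdict; default to 0.0 if none found.
fun scoreOf(verdict: String): Double =
    Regex("""\d+(\.\d+)?""").find(verdict)?.value?.toDouble() ?: 0.0

// The evaluation passes when the parsed score meets or exceeds the threshold.
fun passes(verdict: String, threshold: Double = 0.7): Boolean =
    scoreOf(verdict) >= threshold

fun main() {
    println(passes("score: 0.85 - helpful and accurate")) // true
    println(passes("score: 0.40 - misses the question"))  // false
}
```

In the real evaluator, scoring and thresholding happen inside LLMJudgeEvaluator; the sketch only illustrates the decision rule.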
If the user has dokimos-kotlin as a dependency, use the DSL:
```kotlin
val result = experiment {
    name = "Koog Agent Eval"
    dataset = Dataset.fromJson(Path.of("datasets/qa.json"))
    task { example ->
        val output = agent().runBlocking(example.input())
        mapOf("output" to output)
    }
    evaluator(ExactMatchEvaluator.builder().build())
}
```
The user needs dokimos-koog:

```xml
<dependency>
    <groupId>dev.dokimos</groupId>
    <artifactId>dokimos-koog</artifactId>
    <version>${dokimos.version}</version>
</dependency>
```
Koog itself is a provided-scope dependency — the user must bring their own version.
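For Gradle builds, the equivalent declaration is roughly the following Kotlin DSL fragment — `dokimosVersion` is a placeholder you define yourself, and the Koog coordinate line is a reminder, not a known artifact name:

```kotlin
dependencies {
    implementation("dev.dokimos:dokimos-koog:$dokimosVersion")
    // Koog is provided-scope in dokimos-koog: declare your own Koog
    // dependency and version here alongside it.
}
```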
The skill reads `$ARGUMENTS` to learn what the Koog agent does and how it's constructed, then wires up the evaluation with the KoogSupport utilities.