Scaffold a new Evaluator implementation for the Dokimos LLM evaluation framework. Creates evaluator classes that extend BaseEvaluator and use the builder pattern, supporting both simple evaluators and LLM-judged evaluators via JudgeLM.
Create evaluation datasets for the Dokimos LLM evaluation framework in JSON, CSV, or JSONL format. Supports simple and structured example formats with inputs, expected outputs, and metadata.
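A structured dataset in the JSONL format might look like the sketch below. The field names (`input`, `expected_output`, `metadata`) are illustrative assumptions, not a confirmed Dokimos schema; consult the framework's dataset documentation for the actual keys it expects.

```jsonl
{"input": "What is the capital of France?", "expected_output": "Paris", "metadata": {"category": "geography"}}
{"input": "List three primary colors.", "expected_output": "Red, yellow, blue", "metadata": {"category": "general"}}
```

Each line is one self-contained JSON object, which makes JSONL datasets easy to append to and to stream one example at a time.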
Scaffold eval-driven tests using dokimos-junit. Creates JUnit parameterized tests with @DatasetSource and Assertions.assertEval() for running Dokimos evaluations as unit tests in CI.
Scaffold a Dokimos Experiment that wires together a dataset, task, evaluators, and optional reporter. Supports parallelism, multiple runs for variance reduction, and server-based reporting.
Set up evaluation of AI agents with tool call validation, correctness checks, task completion, and tool reliability using Dokimos. Framework-agnostic: works with any agent framework.
Set up evaluation of Koog AI agents using Dokimos. Wires Koog agents as the system under test or as LLM judges via KoogSupport utilities, with Kotlin DSL support.
Set up evaluation of LangChain4j applications and RAG pipelines using Dokimos. Provides task and judge creation via LangChain4jSupport, with evaluators for faithfulness, contextual relevance, and hallucination.
Set up evaluation of Spring AI applications using Dokimos. Provides judge creation and type conversion via SpringAiSupport, with @SpringBootTest integration for evaluations in CI.
Claude Code marketplace entries for the plugin-safe Antigravity Awesome Skills library and its compatible editorial bundles.
Directory of popular Claude Code extensions, including development tools, productivity plugins, and MCP integrations.
No description available.
Share bugs, ideas, or general feedback.