Create and manage evaluators in Adaline to score prompt outputs. Use when setting up LLM-as-a-judge, JavaScript, text-matcher, cost, latency, or response-length evaluators for quality assessment.
```bash
npx claudepluginhub adaline/skills --plugin skills
```

This skill uses the workspace's default tool permissions.
Evaluators define scoring criteria for prompt outputs. Each evaluator run produces a grade (pass/fail), a numeric score, and a reason. Multiple evaluators can be stacked on the same prompt to assess different quality dimensions simultaneously.
Key terms:

- Grade — pass or fail
- Group — ai-powered, code, or performance

Set these environment variables when your Adaline credentials are available:

- ADALINE_API_KEY — your workspace API key (from Settings > API Keys at app.adaline.ai)
- promptId — the prompt to attach evaluators to (from the Adaline editor URL)
- datasetId — the dataset to link evaluators to (from the Adaline dashboard)

Base URL: https://api.adaline.ai/v2

You can start integrating before you have credentials. All code examples use placeholder values — replace them with real values when ready.
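For local testing you can stash these in your shell. The values below are placeholders, and PROMPT_ID and DATASET_ID are just convenience variables for the curl examples, not names the API itself reads:

```bash
# Placeholders: substitute real values from app.adaline.ai.
export ADALINE_API_KEY="your-api-key"
export PROMPT_ID="your-prompt-id"
export DATASET_ID="your-dataset-id"
```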
LLM-as-a-judge: Uses a model to score outputs against a custom rubric. Most versatile option — use for qualitative assessment, tone, helpfulness, accuracy, or any criterion expressible in natural language.
Config:

```json
{
  "type": "llm-as-a-judge",
  "group": "ai-powered",
  "value": "Does the response answer the user's question completely and without hallucination? Score pass if all claims are grounded in the context provided."
}
```
The value field is the rubric string. Write it as a clear instruction to the judge model. Include what constitutes pass vs. fail.
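A rubric that spells out both outcomes might look like this (the rubric text is illustrative, not a required template):

```json
{
  "type": "llm-as-a-judge",
  "group": "ai-powered",
  "value": "Evaluate the response for factual accuracy. Pass: every claim is supported by the provided context and the question is fully answered. Fail: any claim is unsupported or contradicted, or the question is only partially answered."
}
```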
JavaScript: Runs a custom JavaScript function body against the output. Return 'pass' or 'fail' as a string. Use for structural checks, format validation, or logic that text matching cannot express.
Config:

```json
{
  "type": "javascript",
  "group": "code",
  "value": "const parsed = JSON.parse(output); return parsed.hasOwnProperty('answer') ? 'pass' : 'fail';"
}
```
The function body receives output (string), input (string), and expectedOutput (string) as variables.
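Note that JSON.parse throws on malformed output, so the example above would error rather than fail on non-JSON responses. A more defensive function body (a sketch, assuming you want malformed output to count as a failure) guards the parse:

```javascript
// Function body for a "javascript" evaluator.
// Available variables: output (string), input (string), expectedOutput (string).
try {
  const parsed = JSON.parse(output);
  // Pass only if the response is valid JSON with an "answer" field.
  return Object.prototype.hasOwnProperty.call(parsed, "answer") ? "pass" : "fail";
} catch (e) {
  // Malformed JSON counts as a failure instead of a runtime error.
  return "fail";
}
```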
Text Matcher: Checks the output string against a pattern using a built-in operator. Use for exact or partial string validation without custom code.
Operators: regex, equals, starts-with, ends-with, contains-all, contains-any, not-contains-any
Config:

```json
{
  "type": "text-matcher",
  "group": "code",
  "value": {
    "operator": "contains-all",
    "patterns": ["summary", "conclusion"]
  }
}
```
For regex, equals, starts-with, ends-with: use a single "pattern" string field.
For contains-all, contains-any, not-contains-any: use a "patterns" array.
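For example, the single-pattern form with the regex operator (the pattern itself is illustrative; it checks that the output looks like a JSON object):

```json
{
  "type": "text-matcher",
  "group": "code",
  "value": {
    "operator": "regex",
    "pattern": "^\\{[\\s\\S]*\\}$"
  }
}
```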
Cost: Passes when the token cost of the prompt run meets a threshold. Use to enforce spending SLAs.
Config:

```json
{
  "type": "cost",
  "group": "performance",
  "value": {
    "threshold": {
      "value": 0.01,
      "unit": "USD",
      "operator": "less"
    }
  }
}
```
Operators: less, greater, equals
Unit: USD
Latency: Passes when the response time of the prompt run meets a threshold. Use to enforce latency budgets.
Config:

```json
{
  "type": "latency",
  "group": "performance",
  "value": {
    "threshold": {
      "value": 2000,
      "unit": "ms",
      "operator": "less"
    }
  }
}
```
Operators: less, greater, equals
Unit: ms
Response Length: Passes when the length of the model response meets a threshold. Use to enforce brevity or minimum coverage requirements.
Config:

```json
{
  "type": "response-length",
  "group": "performance",
  "value": {
    "threshold": {
      "value": 200,
      "unit": "words",
      "operator": "less"
    }
  }
}
```
Operators: less, greater, equals
Units: words, tokens, characters
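To enforce a minimum instead of a maximum, flip the operator; for example (values illustrative), this passes only when the response exceeds 50 words:

```json
{
  "type": "response-length",
  "group": "performance",
  "value": {
    "threshold": {
      "value": 50,
      "unit": "words",
      "operator": "greater"
    }
  }
}
```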
| Goal | Evaluator to use |
|---|---|
| Need to assess tone, accuracy, or helpfulness | Use LLM-as-a-judge with an explicit rubric |
| Need to validate JSON structure or field presence | Use JavaScript evaluator |
| Need to check if output contains required phrases | Use Text Matcher with contains-all |
| Need to ensure responses stay under a cost budget | Use Cost evaluator with less operator |
| Need to catch slow prompt runs | Use Latency evaluator with less operator |
| Need to prevent overly long or too-short responses | Use Response Length evaluator |
To create an evaluator, POST its config to the prompt's evaluators endpoint:

```bash
curl -X POST https://api.adaline.ai/v2/prompts/{promptId}/evaluators \
  -H "Authorization: Bearer $ADALINE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "projectId": "your-project-id",
    "title": "Factual accuracy judge",
    "datasetId": "your-dataset-id",
    "config": {
      "type": "llm-as-a-judge",
      "group": "ai-powered",
      "value": "Does the response answer the question accurately without hallucination?"
    }
  }'
```
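The same request in JavaScript, as a sketch (assumes a Node 18+ runtime with global fetch; see references/api.md for the actual response schema):

```javascript
// Create an LLM-as-a-judge evaluator on a prompt.
// Placeholder IDs: substitute real values from your Adaline workspace.
const promptId = "your-prompt-id";

const res = await fetch(`https://api.adaline.ai/v2/prompts/${promptId}/evaluators`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.ADALINE_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    projectId: "your-project-id",
    title: "Factual accuracy judge",
    datasetId: "your-dataset-id",
    config: {
      type: "llm-as-a-judge",
      group: "ai-powered",
      value: "Does the response answer the question accurately without hallucination?",
    },
  }),
});

if (!res.ok) throw new Error(`Evaluator creation failed: ${res.status}`);
console.log(await res.json());
```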
See references/api.md for the full REST API reference with all endpoints, request schemas, and curl examples.