From r-package-skills
Use when code loads or uses vitals (`library(vitals)`, `vitals::`), evaluating LLM output quality, scoring AI responses, testing RAG retrieval accuracy, or benchmarking prompt changes in R.
`npx claudepluginhub arthurgailes/r-package-skills --plugin r-package-skills`

This skill uses the workspace's default tool permissions.
**vitals tests LLM output quality.** Create test datasets, define solvers (LLM pipelines), score outputs. Benchmark RAG systems, prompt changes, model performance.
Evaluates LLM apps using automated metrics (BLEU, ROUGE, BERTScore, MRR, perplexity), human feedback, and LLM-as-judge grading. Use it for performance testing, benchmarking, and catching regressions.
Install: `install.packages("vitals")`
Read references/API.md before writing code.
- references/API.md - Complete function reference
- references/package-docs.md - Test suite creation and scoring patterns

```r
library(vitals)
library(ellmer)

# Create a test dataset: one row per case, with input and target columns
test_cases <- tibble::tibble(
  input = c("question 1", "question 2"),
  target = c("answer 1", "answer 2")
)

# Define a solver (your LLM pipeline): generate() turns an ellmer chat
# into a solver that runs each input through the model
chat <- chat_openai()
solver <- generate(chat)

# Run the evaluation
task <- Task$new(
  dataset = test_cases,
  solver = solver,
  scorer = model_graded_qa()  # an LLM judge grades each response
)
task$eval()  # solve, score, log, and open the results viewer

# Test a RAG system: register ragnar's retrieval tool on the chat,
# then evaluate with that chat backing the solver
ragnar_register_tool_retrieve(chat, store)  # store: a ragnar store built earlier
task$eval(solver_chat = chat)
```
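When the target is an exact string, an LLM judge is overkill. A deterministic-scoring sketch, assuming your vitals version ships the `detect_includes()` string-matching scorer (check references/API.md for the scorers actually available):

```r
library(vitals)
library(ellmer)

# Small dataset with unambiguous, checkable answers
arithmetic <- tibble::tibble(
  input = c("What is 2 + 2?", "What is 10 / 5?"),
  target = c("4", "2")
)

tsk <- Task$new(
  dataset = arithmetic,
  solver = generate(chat_openai()),
  scorer = detect_includes()  # passes if the target string appears in the output
)
tsk$eval()
```

String-matching scorers are cheaper and fully reproducible, so prefer them whenever the expected answer can be checked literally.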
| Issue | Solution |
|---|---|
| No test dataset | Create a tibble with `input` and `target` columns |
| Solver is not a function | Use `generate(chat)`, or wrap the chat in a function that returns the response text |
| Using vitals for non-LLM tests | Use testthat for traditional unit testing |
| Forgetting `echo = "none"` | A hand-written solver should return text, not print it |
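The hand-written solver pattern from the table can be sketched as below (following the solver shape this document describes; check references/API.md for the exact signature your vitals version expects):

```r
library(vitals)
library(ellmer)

chat <- chat_openai()

# A solver wraps your LLM pipeline in a function of the input;
# echo = "none" stops chat$chat() from printing so the response
# text is returned instead
solver <- function(input) {
  chat$chat(input, echo = "none")
}
```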
Task Management:
- `Task$new()`: Create an evaluation task
- `task$eval()`: Run the solver and scorer, then log and view results

Scorers:
- `model_graded_qa()`: An LLM grades Q&A quality

See references/ for:
- With ellmer: Test chat quality
- With ragnar: Evaluate RAG accuracy
- Cross-package patterns: See the r-ai meta-skill
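To benchmark a prompt change, evaluate one task per prompt variant and compare the runs. A minimal sketch (the system prompts here are illustrative, and ellmer's `chat_openai(system_prompt = ...)` argument is assumed):

```r
library(vitals)
library(ellmer)

dataset <- tibble::tibble(
  input = c("What is the capital of France?"),
  target = c("Paris")
)

# Baseline prompt
tsk_v1 <- Task$new(
  dataset = dataset,
  solver = generate(chat_openai(system_prompt = "Answer concisely.")),
  scorer = model_graded_qa()
)
tsk_v1$eval()

# Revised prompt: same dataset and scorer, so any score change
# is attributable to the prompt
tsk_v2 <- Task$new(
  dataset = dataset,
  solver = generate(chat_openai(system_prompt = "Think step by step, then answer.")),
  scorer = model_graded_qa()
)
tsk_v2$eval()
```

If your vitals version exports `vitals_bind()`, the evaluated tasks can be combined into a single tibble of per-sample scores for side-by-side analysis.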