Skill

evaluating-with-promptfoo

From ai

LLM evaluation and red-teaming toolkit using promptfoo. Covers promptfooconfig.yaml configuration, 40+ assertion types (deterministic, model-graded, RAG), provider setup (OpenAI, Anthropic, Google, Ollama, HTTP, custom JS/Python), red teaming (134+ plugins, jailbreak strategies, compliance frameworks), CLI commands, caching, and CI/CD integration. Use when writing promptfooconfig.yaml, designing LLM test suites, running adversarial red team evaluations, or integrating LLM quality gates into CI/CD. Detects: promptfooconfig.yaml, or promptfoo in package.json. For general LLMOps, use designing-genai-patterns. For general test methodology (TDD/AAA), use testing-code.
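The "Detects" rule in the description can be illustrated with a minimal sketch. This is not the skill's actual detection logic, which the listing does not show. The function name `detects_promptfoo`, the choice to look only at the project root, and the choice to check only the `dependencies` and `devDependencies` sections of package.json are all assumptions made for illustration.

```python
import json
from pathlib import Path


def detects_promptfoo(project_dir: str) -> bool:
    """Hypothetical helper: return True if the project appears to use promptfoo."""
    root = Path(project_dir)

    # Rule 1: a promptfooconfig.yaml file exists (assumed: at the project root).
    if (root / "promptfooconfig.yaml").is_file():
        return True

    # Rule 2: promptfoo is listed in package.json.
    package_json = root / "package.json"
    if not package_json.is_file():
        return False
    try:
        manifest = json.loads(package_json.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        # An unreadable or malformed manifest counts as "not detected".
        return False
    if not isinstance(manifest, dict):
        return False

    # Assumption: only these two dependency sections are inspected.
    for section in ("dependencies", "devDependencies"):
        deps = manifest.get(section)
        if isinstance(deps, dict) and "promptfoo" in deps:
            return True
    return False
```

Calling `detects_promptfoo(".")` from a repository root would then report whether either trigger is present. A real implementation might also search subdirectories or accept other config file names, which this sketch deliberately leaves out.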

Invocation

How this skill is triggered — by the user, by Claude, or both

Slash command

/ai:evaluating-with-promptfoo

User invocable

Model invocable

Forked subagent

Default effort

Configuration

Agent: general-purpose

Context Preview

The summary Claude sees in its skill listing — used to decide when to auto-load this skill

See `INSTRUCTIONS.md` for detailed steps and guidelines.

Supporting Files

INSTRUCTIONS.md
references/ASSERTIONS.md
references/CI-CD.md
references/PROVIDERS.md
references/RED-TEAMING.md

SKILL.md

18 lines · ~209 tokens

Stats

Language: Python

Parent stars: 1

Maintenance: Good

Last Commit: Jun 13, 2026

