Visualizes whether skills, rules, and agent definitions are actually followed: auto-generates scenarios at three prompt strictness levels, runs agents, classifies behavioral sequences, and reports compliance rates with full tool call timelines.
This skill uses the workspace's default tool permissions.
Bundled files:

- fixtures/tdd_spec.yaml
- prompts/classifier.md
- prompts/scenario_generator.md
- prompts/spec_generator.md
Measures whether coding agents actually follow skills, rules, or agent definitions by running them headlessly with `claude -p` and capturing tool call traces via `stream-json`.

Supported targets:

- Skills (`skills/*/SKILL.md`): workflow skills like search-first, TDD guides
- Rules (`rules/common/*.md`): mandatory rules like testing.md, security.md, git-workflow.md
- Agents (`agents/*.md`): whether an agent gets invoked when expected (internal workflow verification not yet supported)

Run via the slash command:

```
/skill-comply <path>
```

Or from the CLI:

```shell
# Full run
uv run python -m scripts.run ~/.claude/rules/common/testing.md

# Dry run (no cost, spec + scenarios only)
uv run python -m scripts.run --dry-run ~/.claude/skills/search-first/SKILL.md

# Custom models
uv run python -m scripts.run --gen-model haiku --model sonnet <path>
```
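To make "capturing tool call traces via `stream-json`" concrete, here is a minimal sketch of pulling tool names out of a newline-delimited JSON event stream. The event shape (assistant events carrying `tool_use` content blocks) reflects how Claude Code's stream output is commonly structured, but treat the field names as illustrative rather than guaranteed:

```python
import json

def extract_tool_calls(stream_lines):
    """Collect tool names from newline-delimited JSON events.

    Assumes each line is one JSON object and that tool invocations
    appear as content blocks with type "tool_use" (illustrative shape).
    """
    calls = []
    for line in stream_lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # Events without a "message" key (e.g. system events) are skipped.
        for block in event.get("message", {}).get("content", []):
            if isinstance(block, dict) and block.get("type") == "tool_use":
                calls.append(block["name"])
    return calls

# Example trace with one Read and one Bash tool call
trace = [
    '{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Read","input":{}}]}}',
    '{"type":"assistant","message":{"content":[{"type":"tool_use","name":"Bash","input":{}}]}}',
]
print(extract_tool_calls(trace))  # → ['Read', 'Bash']
```

The ordered list of tool names is the raw material for the behavioral-sequence classification and timeline reporting described above.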
Measures whether a skill or rule is followed even when the prompt doesn't explicitly request it.
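One way to read the three prompt strictness levels is as separate denominators for the compliance rate. A minimal sketch of that aggregation (the level names and result format here are illustrative, not the tool's actual output):

```python
from collections import defaultdict

def compliance_by_strictness(results):
    """Aggregate (strictness_level, followed) outcomes into per-level rates.

    `results` is a list of (level, bool) pairs; level names are
    hypothetical labels for the three strictness tiers.
    """
    tally = defaultdict(lambda: [0, 0])  # level -> [followed, total]
    for level, followed in results:
        tally[level][1] += 1
        if followed:
            tally[level][0] += 1
    return {level: hits / total for level, (hits, total) in tally.items()}

runs = [
    ("explicit", True), ("explicit", True),
    ("hinted", True), ("hinted", False),
    ("unprompted", False), ("unprompted", False),
]
print(compliance_by_strictness(runs))
# → {'explicit': 1.0, 'hinted': 0.5, 'unprompted': 0.0}
```

A gap between the explicit and unprompted rates, as in this toy data, is exactly the signal the report surfaces: the skill works when asked for but isn't internalized.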
Reports are self-contained and include compliance rates and full tool call timelines.

For users familiar with hooks, reports also include hook promotion recommendations for steps with low compliance. These are informational; the main value is the compliance visibility itself.
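For context, "hook promotion" means turning a frequently-skipped step into a hook that Claude Code enforces mechanically. A hypothetical `settings.json` fragment in Claude Code's hooks format (the matcher and the command script are illustrative, not a generated recommendation):

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "python scripts/check_tests_ran.py"
          }
        ]
      }
    ]
  }
}
```

Unlike a skill or rule, a hook fires on every matching tool call, so compliance no longer depends on the model remembering the instruction.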