Skill

skill-comply

Measures coding agent compliance with skills/rules/agents by generating specs/scenarios at 3 strictness levels, running agents, classifying tool calls, and reporting timelines with scores.

Python

Bash

testing

developer-tools

Install

npx claudepluginhub colin4k1024/tsp

Tool Access

This skill uses the workspace's default tool permissions.

Supporting Assets

fixtures/compliant_trace.jsonl
fixtures/noncompliant_trace.jsonl
fixtures/tdd_spec.yaml
prompts/classifier.md
prompts/scenario_generator.md
prompts/spec_generator.md
pyproject.toml
scripts/__init__.py
scripts/classifier.py
scripts/grader.py
scripts/parser.py
scripts/report.py
scripts/run.py
scripts/runner.py
scripts/scenario_generator.py
scripts/spec_generator.py
scripts/utils.py
tests/test_grader.py
tests/test_parser.py

SKILL.md

Stats

Stars: 0

Forks: 0

Last Commit: Apr 22, 2026

Used By: 26 plugins

skill-comply: Automated Compliance Measurement

Measures whether coding agents actually follow skills, rules, or agent definitions by:

Auto-generating expected behavioral sequences (specs) from any .md file
Auto-generating scenarios with decreasing prompt strictness (supportive → neutral → competing)
Running claude -p and capturing tool call traces from its stream-json output
Classifying tool calls against spec steps using an LLM (not regex)
Checking temporal ordering deterministically
Generating self-contained reports with spec, prompts, and timelines
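
The deterministic ordering check in the steps above can be sketched as follows. This is a minimal illustration, not the code in scripts/grader.py: the `ClassifiedCall` type, the `check_order` function, and the step ids are hypothetical names, and it assumes the LLM classifier has already attached a spec step id (or nothing) to every tool call.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ClassifiedCall:
    """One tool call plus the spec step a classifier assigned to it (hypothetical shape)."""
    index: int               # position of the call in the trace
    tool: str                # tool name, e.g. "Write" or "Bash"
    step_id: Optional[str]   # None means the call matched no spec step


def check_order(spec_steps: List[str], calls: List[ClassifiedCall]) -> dict:
    """Check that the first occurrence of each spec step follows the spec's order."""
    first_seen: Dict[str, int] = {}
    for call in calls:
        if call.step_id is not None and call.step_id not in first_seen:
            first_seen[call.step_id] = call.index

    missing = [s for s in spec_steps if s not in first_seen]
    present = [s for s in spec_steps if s in first_seen]
    # A step is out of order when it first appears before the
    # observed step that precedes it in the spec.
    out_of_order = [
        later
        for earlier, later in zip(present, present[1:])
        if first_seen[later] < first_seen[earlier]
    ]
    return {
        "missing": missing,
        "out_of_order": out_of_order,
        "compliant": not missing and not out_of_order,
    }


# Hypothetical TDD-style spec: write a test, watch it fail, then implement.
spec = ["write_test", "run_test_fail", "implement"]
good = [
    ClassifiedCall(0, "Write", "write_test"),
    ClassifiedCall(1, "Bash", "run_test_fail"),
    ClassifiedCall(2, "Read", None),
    ClassifiedCall(3, "Edit", "implement"),
]
bad = [
    ClassifiedCall(0, "Edit", "implement"),
    ClassifiedCall(1, "Write", "write_test"),
]
good_result = check_order(spec, good)
bad_result = check_order(spec, bad)
```

Because the check only compares integer positions, it returns the same verdict on every run for a given set of labels; any nondeterminism is confined to the classification step.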

Supported Targets

Skills (skills/*/SKILL.md): Workflow skills like search-first, TDD guides
Rules (rules/common/*.md): Mandatory rules like testing.md, security.md, git-workflow.md
Agent definitions (agents/*.md): Whether an agent gets invoked when expected (internal workflow verification not yet supported)

When to Activate

User runs /skill-comply <path>
User asks "is this rule actually being followed?"
After adding new rules/skills, to verify agent compliance
Periodically as part of quality maintenance

Usage

# Full run
uv run python -m scripts.run ~/.claude/rules/common/testing.md

# Dry run (no cost, spec + scenarios only)
uv run python -m scripts.run --dry-run ~/.claude/skills/search-first/SKILL.md

# Custom models
uv run python -m scripts.run --gen-model haiku --model sonnet <path>

Key Concept: Prompt Independence

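Measures whether a skill/rule is followed even when the prompt doesn't explicitly support it, which is why every scenario is generated at three strictness levels: supportive, neutral, and competing.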
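
One way to read prompt independence as a number is to compare scores across the strictness levels. The sketch below is an assumption for illustration only: the `independence_gap` function and the sample scores are hypothetical, and the tool itself may summarize results differently.

```python
from typing import Dict

# Strictness levels named in this document, from most to least supportive.
LEVELS = ("supportive", "neutral", "competing")


def independence_gap(scores: Dict[str, float]) -> float:
    """Return the drop in compliance from the supportive to the competing scenario.

    A gap near 0 suggests the skill/rule is followed regardless of the prompt;
    a large gap suggests compliance depends on the prompt asking for it.
    """
    missing = [level for level in LEVELS if level not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    return scores["supportive"] - scores["competing"]


# Made-up scores, each in the range 0.0 to 1.0.
example_scores = {"supportive": 1.0, "neutral": 0.75, "competing": 0.5}
gap = independence_gap(example_scores)
```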

Report Contents

Reports are self-contained and include:

Expected behavioral sequence (auto-generated spec)
Scenario prompts (what was asked at each strictness level)
Compliance scores per scenario
Tool call timelines with LLM classification labels
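
A tool call timeline of this kind could be built roughly as follows. This is a sketch under stated assumptions: it assumes each line of the captured trace is a JSON event, and that tool calls appear as `tool_use` content blocks inside events of type `assistant`. The real schema in fixtures/compliant_trace.jsonl may differ, and `extract_tool_calls` and `render_timeline` are hypothetical helper names, not functions from scripts/parser.py or scripts/report.py.

```python
import json
from typing import List, Optional


def extract_tool_calls(jsonl_text: str) -> List[dict]:
    """Collect tool calls from a JSONL trace (assumed event shape, see lead-in)."""
    calls = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("type") != "assistant":
            continue
        message = event.get("message") or {}
        for block in message.get("content", []):
            if isinstance(block, dict) and block.get("type") == "tool_use":
                calls.append({"tool": block.get("name"), "input": block.get("input", {})})
    return calls


def render_timeline(calls: List[dict], labels: List[Optional[str]]) -> str:
    """Render one line per tool call with its classification label."""
    rows = []
    for position, (call, label) in enumerate(zip(calls, labels), start=1):
        rows.append(f"{position:>2}. {call['tool']:<8} -> {label or 'unclassified'}")
    return "\n".join(rows)


# Tiny hand-written trace in the assumed shape (not real tool output).
sample_trace = "\n".join([
    json.dumps({"type": "system", "subtype": "init"}),
    json.dumps({"type": "assistant", "message": {"content": [
        {"type": "tool_use", "name": "Write", "input": {"file_path": "test_x.py"}},
    ]}}),
    json.dumps({"type": "assistant", "message": {"content": [
        {"type": "text", "text": "Running the tests."},
        {"type": "tool_use", "name": "Bash", "input": {"command": "pytest"}},
    ]}}),
])
sample_calls = extract_tool_calls(sample_trace)
# Labels would come from the LLM classifier; these are placeholders.
sample_timeline = render_timeline(sample_calls, ["write_test", None])
```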

Advanced (optional)

For users familiar with hooks, reports also include hook promotion recommendations for steps with low compliance. These are informational; the main value is the compliance visibility itself.