Audit binary text benchmarks for dataset shortcuts before trusting high AUROC — bundled shortcut-battery CLI + 7-family auditing methodology skill.
A Claude Code plugin marketplace for auditing binary text benchmarks for dataset shortcuts. Use it before you trust (or publish) a suspiciously high AUROC from a human-vs-AI detector, an authorship classifier, or any other binary text classification task.
High classifier scores are usually a mix of construction artifacts (formatting relics, leaked tokens), brittle-but-real signals (LLM-isms, length policy), and robust signal. This plugin helps you decompose the score instead of trusting it.
/plugin marketplace add security-engineer/custom-dataset-audit
/plugin install dataset-shortcut-audit@custom-dataset-audit
Then the auditing-dataset-shortcuts skill activates whenever a text benchmark reports a suspiciously high score, and the bundled CLI is available.
auditing-dataset-shortcuts skill: the 7-family auditing methodology (length, format markers, unicode, surface stats, lexical, human-side relics, signal location) plus iron rules (match the protocol, verify the split before claiming leakage, compare rates not counts) and the normalize-and-retest workflow.
audit_shortcuts.py CLI: a dataset-agnostic shortcut battery you can run directly.
The CLI takes two JSONL files, one per class, each line a JSON object with a text field:
python plugins/dataset-shortcut-audit/skills/auditing-dataset-shortcuts/scripts/audit_shortcuts.py \
--human human.jsonl --ai ai.jsonl --text-field text --json report.json
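The input format described above can be produced and sanity-checked with a few lines of Python. This is a minimal sketch, not part of the plugin: the helper names write_jsonl and read_texts are hypothetical, and the only things taken from the README are the one-object-per-line layout and the text field name.

```python
import json
from pathlib import Path


def write_jsonl(path, texts, text_field="text"):
    """Write one JSON object per line, each holding a single text field."""
    with Path(path).open("w", encoding="utf-8") as fh:
        for t in texts:
            # json.dumps escapes embedded newlines, so each record stays on one line.
            fh.write(json.dumps({text_field: t}, ensure_ascii=False) + "\n")


def read_texts(path, text_field="text"):
    """Read the texts back, failing loudly on a missing or non-string field."""
    texts = []
    with Path(path).open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            value = json.loads(line).get(text_field)
            if not isinstance(value, str):
                raise ValueError(f"{path}:{lineno}: missing or non-string {text_field!r}")
            texts.append(value)
    return texts
```

Calling write_jsonl("human.jsonl", human_texts) and write_jsonl("ai.jsonl", ai_texts) would give the two files the command above expects. Running read_texts on each file first is a cheap way to catch malformed records before the audit.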
Reports separability AUROC (0.5 = no signal, 1.0 = perfect giveaway) for:
…then prints a verdict flagging ceiling lexical shortcuts, register confounds, length confounds, surface giveaways, and opener asymmetry.
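To make the separability AUROC scale concrete, here is a minimal stdlib sketch for a single scalar feature, using character length as the example. It is an illustration under stated assumptions, not the CLI's implementation: the function names are hypothetical, and folding the score so that either direction of separation counts toward 1.0 is an assumption about what "separability" means here.

```python
def separability_auroc(human_values, ai_values):
    """AUROC of one scalar feature via pairwise comparison (Mann-Whitney U).

    Counts the fraction of (ai, human) pairs where the AI value is larger,
    with ties counted as half. The result is folded (assumption) so that
    0.5 means no separation and 1.0 means perfect separation either way.
    """
    if not human_values or not ai_values:
        raise ValueError("both classes need at least one value")
    wins = 0.0
    for a in ai_values:
        for h in human_values:
            if a > h:
                wins += 1.0
            elif a == h:
                wins += 0.5
    auc = wins / (len(ai_values) * len(human_values))
    return max(auc, 1.0 - auc)


def length_auroc(human_texts, ai_texts):
    """Separability of the two classes using character length alone."""
    return separability_auroc(
        [len(t) for t in human_texts],
        [len(t) for t in ai_texts],
    )
```

The pairwise loop is quadratic in the number of samples, which is fine for a sketch but slow on large benchmarks; a rank-based computation gives the same number faster. A length_auroc close to 1.0 would suggest that length alone nearly separates the classes, which is the kind of length confound the verdict flags.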
Dependencies: scikit-learn and numpy for the two probes (pip install scikit-learn numpy). The surface/length battery runs on pure stdlib; the probes are skipped gracefully if those packages are absent.
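The graceful skip described above is a common optional-dependency pattern. The sketch below shows one way it could work. It is an assumption about the general technique rather than the script's actual code, and missing_packages, run_audit, and the report keys are hypothetical names.

```python
import importlib.util
from statistics import mean


def missing_packages(required=("sklearn", "numpy")):
    """Return the top-level packages from `required` that cannot be found."""
    return [name for name in required if importlib.util.find_spec(name) is None]


def run_audit(human_texts, ai_texts, required=("sklearn", "numpy")):
    """Always run the stdlib checks; record the probes as skipped if deps are absent."""
    report = {
        # A stdlib-only surface statistic that needs no third-party package.
        "mean_length": {
            "human": mean(len(t) for t in human_texts),
            "ai": mean(len(t) for t in ai_texts),
        }
    }
    missing = missing_packages(required)
    if missing:
        # Note the skip in the report instead of raising ImportError.
        report["probes"] = {"skipped": True, "missing": missing}
    else:
        # A real tool would import the packages here and fit its probes.
        report["probes"] = {"skipped": False}
    return report
```

Checking with importlib.util.find_spec avoids importing heavy packages just to test for their presence, and recording the skip in the report keeps a missing dependency from being mistaken for a clean result.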
The CLI is a fast first pass. It is necessary, not sufficient — pair it with the manual split/leakage and signal-location checks in the skill, which an automated battery cannot do.
MIT
npx claudepluginhub security-engineer/custom-dataset-audit --plugin dataset-shortcut-audit