Skill

training-data-poisoning

Checks fine-tuning pipelines, dataset loaders, and curation scripts for training data poisoning risks like unverified ingestion, missing validation, duplicates, and poor splits. Suggests OWASP LLM03 fixes.

Python

security

ai-ml

npx claudepluginhub thejefflarson/soundcheck --plugin soundcheck

Tool Access

This skill uses the workspace's default tool permissions.

Preview

Protects against malicious or low-quality examples being introduced into training or

SKILL.md

Similar Skills

AI/ML Attack Surface

Detects AI/ML security vulnerabilities like unsafe model deserialization in PyTorch/Joblib/NumPy, prompt injection in LLM prompts, and risks in Jupyter notebooks or ML pipelines.

vuln-scout

llm-supply-chain

Checks for LLM supply chain vulnerabilities including unverified model downloads, floating version tags, unapproved providers, and unchecked automated updates. Flags risks and suggests pinned SHAs, checksums, org allowlists, and human approval.

soundcheck

Ai Code Security

Audits AI-generated code and LLM applications for security vulnerabilities, covering OWASP Top 10 for LLMs, secure coding patterns, and AI-specific threat models.

3 files

omer-metin-skills-for-antigravity-2

Stats

Stars13

Forks0

Last CommitApr 18, 2026

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

Training Data Poisoning Security Check (OWASP LLM03:2025)

What this checks

Protects against malicious or low-quality examples being introduced into training or fine-tuning datasets. Poisoned data can embed backdoors, degrade accuracy, or skew model behavior in ways that are difficult to detect after training completes.

Vulnerable patterns

Ingesting scraped or user-contributed examples with no content validation
No deduplication or anomaly detection on training set statistics
Loading dataset files without verifying provenance or checksums
Using the same split for training and validation, hiding distribution shift

Fix immediately

Flag the vulnerable code and explain the risk. Then suggest a fix that establishes these properties:

Every external dataset file is checksum-verified before use. A pinned SHA-256 in version control; the loader computes the digest on load and refuses to proceed on mismatch. Pinning a URL or version alone doesn't help when the bytes behind them change.
Every example passes content validation before entering the training set: type and length checks, disallowed-pattern filtering (injection tokens like ignore previous, <|im_start|>, jailbreak signatures), and encoding/Unicode sanity. Invalid examples are dropped, not silently used.
Duplicates are removed before training. Poisoning attacks often batch the same backdoor trigger across many examples; deduplication by content hash limits the leverage of a single injected payload.
Label distribution is checked and alerts fire on imbalance above a threshold. A sudden 80%-one-class shift is a statistical signature of bulk-inserted poison; it's cheap to catch at ingestion and impossible to reverse after training.
Train and validation splits come from disjoint sources or time windows. Reusing the same split for both hides distribution shift and lets poisoned examples score well on validation.

Anchor — shape, not implementation:

require(sha256(dataset_file) == PINNED_SHA256)
rows    = [r for r in parse(dataset_file) if validate(r)]        # per-example
unique  = dedupe_by_hash(rows)
require(max_class_fraction(unique) < 0.8)                         # anomaly gate
train, val = split_by_source(unique, val_fraction=0.1)

Verification

Confirm the response:

For every external dataset load present, files are verified against pinned checksums before use
Every training example passes content validation (length limits, disallowed-pattern filtering)
Duplicates are removed before training starts
For every dataset with categorical labels present, class distribution is checked and alerted on imbalance above a threshold

References

CWE-20 (Improper Input Validation)
CWE-1021 (Improper Restriction of Rendered UI Layers)
OWASP LLM03:2025 Training Data Poisoning