Help us improve
Share bugs, ideas, or general feedback.
From vuln-scout
Detects AI/ML security vulnerabilities like unsafe model deserialization in PyTorch/Joblib/NumPy, prompt injection in LLM prompts, and risks in Jupyter notebooks or ML pipelines.
npx claudepluginhub allsmog/vuln-scout --plugin whitebox-pentestHow this skill is triggered — by the user, by Claude, or both
Slash command
/vuln-scout:ai-ml-attacksThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Detect security vulnerabilities specific to AI/ML pipelines, LLM-backed applications, and data science workflows. These attack surfaces are increasingly common and often overlooked by traditional SAST tools.
Scans AI models for malicious elements before loading in inference engines. Detects unsafe formats like pickle, backdoored models, and embedded scripts.
Detects compromised models from unverified sources, floating tags, and unreviewed registries with checksum verification, pinned revisions, and approval gates.
Provides Python security patterns for API key management with env vars/.gitignore/validation and input sanitization against path traversal.
Share bugs, ideas, or general feedback.
Detect security vulnerabilities specific to AI/ML pipelines, LLM-backed applications, and data science workflows. These attack surfaces are increasingly common and often overlooked by traditional SAST tools.
Activate this skill when reviewing code that:
The most critical ML-specific vulnerability. Many ML serialization formats execute arbitrary code on load.
Dangerous Functions:
| Framework | Dangerous | Safe Alternative |
|---|---|---|
| PyTorch | torch.load(path) | torch.load(path, weights_only=True) |
| Joblib | joblib.load(path) | Verify source, use safetensors |
| NumPy | numpy.load(path, allow_pickle=True) | numpy.load(path, allow_pickle=False) |
| Scikit-learn | joblib.load() / pickle.load() | skops.io with trusted types |
| TensorFlow | tf.saved_model.load() with custom ops | Verify model provenance |
| ONNX | Generally safe | Validate graph structure |
| SafeTensors | Safe by design | Recommended format |
Detection:
# PyTorch unsafe load
grep -rn "torch\.load(" --include="*.py" | grep -v "weights_only=True"
# Joblib/sklearn model loading
grep -rn "joblib\.load\|sklearn.*load" --include="*.py"
# NumPy with pickle enabled
grep -rn "numpy\.load\|np\.load" --include="*.py" | grep "allow_pickle"
# Generic unsafe deserialization in ML context
grep -rn "pickle\.load\|pickle\.loads\|dill\.load\|cloudpickle\.load" --include="*.py"
Exploitation: An attacker who can supply a malicious model file achieves arbitrary code execution on the server loading the model. This is especially dangerous in:
User input flowing into LLM prompts without sanitization, allowing attackers to override system instructions.
Patterns to Detect:
# Direct string formatting in prompts
grep -rn 'f".*{.*}.*prompt\|f".*{.*}.*system\|\.format(.*user' --include="*.py"
# LangChain prompt templates with user input
grep -rn "PromptTemplate\|ChatPromptTemplate\|HumanMessage" --include="*.py"
# OpenAI/Anthropic API calls with user input in system message
grep -rn "system.*content.*=.*f\"\|system.*content.*\.format" --include="*.py"
grep -rn "messages.*append\|messages.*system" --include="*.py" --include="*.ts" --include="*.js"
Vulnerable Pattern:
# User input directly in system prompt
prompt = f"You are a helpful assistant. The user's name is {user_input}. Answer their question."
response = openai.chat.completions.create(messages=[{"role": "system", "content": prompt}])
Indicators of Risk:
Untrusted .ipynb files can execute arbitrary code when opened or processed.
Detection:
# Notebook execution in pipelines
grep -rn "nbconvert\|nbclient\|ExecutePreprocessor\|execute_notebook" --include="*.py"
# Papermill execution
grep -rn "papermill\.execute\|pm\.execute" --include="*.py"
# Magic commands in notebooks
grep -rn "%system\|%sx\|!.*pip\|!.*apt\|!.*curl\|!.*wget" --include="*.ipynb"
# IPython display with JS
grep -rn "IPython\.display\.Javascript\|display\.HTML" --include="*.py" --include="*.ipynb"
Loading models from untrusted sources (user-specified repos, URLs).
Detection:
# HuggingFace from_pretrained with user-controlled repo
grep -rn "from_pretrained\|AutoModel\|AutoTokenizer\|pipeline(" --include="*.py"
# Verify if the model ID comes from user input
grep -rn "from_pretrained.*request\|from_pretrained.*params\|from_pretrained.*args" --include="*.py"
# TensorFlow Hub
grep -rn "hub\.load\|hub\.KerasLayer" --include="*.py"
# Model download from URLs
grep -rn "urllib.*model\|requests.*model.*download\|wget.*\.pt\|wget.*\.bin" --include="*.py"
Paths where an attacker can influence training data.
Detection:
# Writable training data paths
grep -rn "train.*path\|data.*dir\|dataset.*path" --include="*.py" --include="*.yaml" --include="*.yml"
# Unvalidated data pipeline inputs
grep -rn "pd\.read_csv\|pd\.read_json\|pd\.read_sql" --include="*.py" | grep -i "url\|request\|user\|input"
# S3/GCS data loading without integrity checks
grep -rn "s3://\|gs://\|blob\.download" --include="*.py" | grep -v "checksum\|hash\|verify"
grep -rn "import torch\|import tensorflow\|import sklearn\|import transformers\|import langchain\|import openai\|import anthropic" --include="*.py"
grep -rn "\.load\|from_pretrained\|load_model\|load_weights" --include="*.py"
For each loading point, determine if the source (file path, URL, repo ID) can be controlled by an attacker.
weights_only=True for PyTorch