**WHEN**: Machine Learning/Deep Learning code review, PyTorch/TensorFlow patterns, model training optimization, MLOps checks
**WHAT**: Model architecture review + training patterns + data pipeline checks + GPU optimization + experiment tracking
**WHEN NOT**: Data analysis only → python-data-reviewer; general Python → python-reviewer
Install via:

```
/plugin marketplace add physics91/claude-vibe
/plugin install claude-vibe@physics91-plugins
```

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Reviews Machine Learning and Deep Learning code for PyTorch, TensorFlow, scikit-learn, and MLOps best practices.
Activation triggers:

- `torch`, `tensorflow`, `keras`, or `sklearn` in `requirements.txt`/`pyproject.toml`
- `.pt`, `.pth`, `.h5`, `.pkl` model files
- `train.py`, `model.py`, `dataset.py` files

**Framework**: PyTorch / TensorFlow / scikit-learn
**Python**: 3.10+
**CUDA**: 11.x / 12.x
**Task**: Classification / Regression / NLP / CV
**Stage**: Research / Production
AskUserQuestion:
"Which areas to review?"
Options:
- Full ML pattern check (recommended)
- Model architecture review
- Training loop optimization
- Data pipeline efficiency
- MLOps/deployment patterns
multiSelect: true
### PyTorch Patterns

| Check | Risk | Severity |
|---|---|---|
| Missing model.eval() | Inconsistent inference | HIGH |
| Missing torch.no_grad() | Excess memory use in inference | HIGH |
| In-place operations in autograd | Gradient computation error | CRITICAL |
| DataLoader num_workers=0 | CPU bottleneck | MEDIUM |
| Missing gradient clipping | Exploding gradients | MEDIUM |
```python
# BAD: Missing eval() and no_grad()
def predict(model, x):
    return model(x)  # Dropout/BatchNorm inconsistent!

# GOOD: Proper inference mode
def predict(model, x):
    model.eval()
    with torch.no_grad():
        return model(x)

# BAD: In-place operation breaking autograd
x = torch.randn(10, requires_grad=True)
x += 1  # In-place! Breaks gradient computation

# GOOD: Out-of-place operation
x = torch.randn(10, requires_grad=True)
x = x + 1

# BAD: DataLoader bottleneck
loader = DataLoader(dataset, batch_size=32)  # num_workers=0 by default

# GOOD: Parallel data loading
loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=4,
    pin_memory=True,  # for GPU training
    persistent_workers=True,
)

# BAD: No gradient clipping
optimizer.step()

# GOOD: Clip gradients before stepping
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```
### TensorFlow Patterns

| Check | Risk | Severity |
|---|---|---|
| Missing @tf.function | Performance loss | MEDIUM |
| Eager mode in production | Slow inference | HIGH |
| Large model in memory | OOM risk | HIGH |
| Missing mixed precision | Training inefficiency | MEDIUM |
```python
# BAD: No @tf.function (every step runs eagerly)
def train_step(x, y):
    with tf.GradientTape() as tape:
        pred = model(x)
        loss = loss_fn(y, pred)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

# GOOD: Use @tf.function
@tf.function
def train_step(x, y):
    with tf.GradientTape() as tape:
        pred = model(x, training=True)
        loss = loss_fn(y, pred)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

# BAD: Missing mixed precision
model.fit(x_train, y_train, epochs=10)

# GOOD: Enable mixed precision
tf.keras.mixed_precision.set_global_policy('mixed_float16')
model.fit(x_train, y_train, epochs=10)
```
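The table above also flags "Large model in memory / OOM risk" without an example. One mitigation a review can suggest is enabling GPU memory growth so TensorFlow allocates device memory incrementally instead of reserving it all at startup. A minimal sketch using the standard `tf.config` API; it reduces spurious OOM when processes share a GPU but does not shrink the model itself:

```python
# Sketch: let TensorFlow grow GPU memory on demand instead of pre-allocating the whole device.
import tensorflow as tf

for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)  # must run before the GPUs are initialized
```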
### scikit-learn Patterns

| Check | Risk | Severity |
|---|---|---|
| fit_transform on test data | Data leakage | CRITICAL |
| Missing cross-validation | Overfitting risk | HIGH |
| No feature scaling | Model performance | MEDIUM |
| Hardcoded random_state | Reproducibility | LOW |
```python
# BAD: Data leakage
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.fit_transform(X_test)  # LEAK! Re-fitting on test data

# GOOD: transform only on test
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)  # No re-fit

# BAD: No cross-validation
model.fit(X_train, y_train)
score = model.score(X_test, y_test)

# GOOD: Use cross-validation
from sklearn.model_selection import cross_val_score
scores = cross_val_score(model, X, y, cv=5)
print(f"CV Score: {scores.mean():.3f} (+/- {scores.std():.3f})")
```
```python
# BAD: No feature scaling before a scale-sensitive model
model = LogisticRegression()
model.fit(X_train, y_train)

# GOOD: Use a Pipeline with scaling
from sklearn.pipeline import Pipeline
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', LogisticRegression()),
])
pipeline.fit(X_train, y_train)
```
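The "Hardcoded random_state" check is about literal seeds scattered across call sites; a common remedy is reading the seed from one configuration value and passing it everywhere. A minimal sketch, where `config` is a hypothetical settings dict loaded elsewhere:

```python
# Sketch: centralize the seed instead of hardcoding random_state at every call site.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

SEED = config.get("seed", 42)  # 'config' is a hypothetical settings dict

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=SEED)
model = RandomForestClassifier(random_state=SEED)
model.fit(X_train, y_train)
```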
### Data Pipeline

| Check | Problem | Solution |
|---|---|---|
| Loading full dataset to memory | OOM | Use generators/tf.data |
| No data augmentation | Overfitting | Add augmentation |
| Unbalanced classes | Biased model | Oversample/undersample/weights |
| No validation split | No early stopping | Use validation set |
```python
# BAD: Full dataset in memory
images = []
for path in all_image_paths:
    images.append(load_image(path))  # OOM for large datasets!

# GOOD: Use a generator
def data_generator(paths, batch_size):
    for i in range(0, len(paths), batch_size):
        batch_paths = paths[i:i + batch_size]
        yield np.array([load_image(p) for p in batch_paths])

# GOOD: Use tf.data
dataset = tf.data.Dataset.from_tensor_slices(paths)
dataset = dataset.map(load_and_preprocess)
dataset = dataset.batch(32).prefetch(tf.data.AUTOTUNE)
```
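The pipeline table above also flags missing data augmentation. A minimal torchvision-based sketch; the transform names are standard torchvision API, and the 224-pixel crop size is an assumption about the input resolution:

```python
# Sketch: basic image augmentation for the training split only (keep eval transforms deterministic).
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(224, padding=4),  # 224 is an assumed input size
    transforms.ToTensor(),
])
eval_transform = transforms.Compose([
    transforms.ToTensor(),
])
```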
```python
# BAD: No class weights for imbalanced data
model.fit(X_train, y_train)

# GOOD: Add class weights
from sklearn.utils.class_weight import compute_class_weight
weights = compute_class_weight('balanced', classes=np.unique(y_train), y=y_train)
class_weights = dict(enumerate(weights))
model.fit(X_train, y_train, class_weight=class_weights)
```
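The table also lists a missing validation split. With Keras the usual fix is a held-out split plus early stopping; a minimal sketch using the standard `EarlyStopping` callback:

```python
# Sketch: hold out 20% for validation and stop once val_loss stops improving.
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=3,
    restore_best_weights=True,
)
model.fit(X_train, y_train, validation_split=0.2, epochs=100, callbacks=[early_stop])
```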
### GPU Optimization

| Check | Recommendation | Severity |
|---|---|---|
| CPU tensor operations | Use GPU tensors | HIGH |
| Frequent GPU-CPU transfer | Batch transfers | HIGH |
| No gradient accumulation | OOM for large batch | MEDIUM |
| Missing torch.cuda.empty_cache() | Memory fragmentation | LOW |
```python
# BAD: CPU operations
x = torch.randn(1000, 1000)
y = torch.randn(1000, 1000)
z = x @ y  # CPU computation

# GOOD: GPU operations
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = torch.randn(1000, 1000, device=device)
y = torch.randn(1000, 1000, device=device)
z = x @ y  # GPU computation
```
```python
# BAD: Frequent CPU-GPU transfer
for x, y in dataloader:
    x = x.cuda()
    y = y.cuda()
    loss = model(x, y)
    print(loss.item())  # Forces a GPU sync every iteration!

# GOOD: Batch logging
losses = []
for step, (x, y) in enumerate(dataloader):
    x, y = x.to(device), y.to(device)
    loss = model(x, y)
    losses.append(loss.detach())
    if step % log_interval == 0:
        print(torch.stack(losses).mean().item())
        losses.clear()
```
```python
# Gradient accumulation for a large effective batch size
accumulation_steps = 4
for i, (x, y) in enumerate(dataloader):
    loss = model(x, y) / accumulation_steps
    loss.backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```
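For the low-severity `torch.cuda.empty_cache()` check, the point is to release cached blocks between phases (for example, after validation) rather than calling it inside the training loop. A minimal sketch; `val_outputs` is a hypothetical variable holding large GPU tensors:

```python
# Sketch: free cached GPU memory between phases, not on every iteration.
# empty_cache() only returns PyTorch's cached blocks to the driver; live tensors must be dropped first.
del val_outputs  # hypothetical reference to large validation tensors
torch.cuda.empty_cache()
```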
### MLOps

| Check | Risk | Severity |
|---|---|---|
| No experiment tracking | Results hard to reproduce or compare | HIGH |
| Hardcoded hyperparameters | Painful config management | MEDIUM |
| No model versioning | Deployment and rollback issues | MEDIUM |
| Missing seed setting | Non-reproducible runs | HIGH |
```python
# BAD: No seed setting
model = train_model(X, y)

# GOOD: Set all seeds
import random
import numpy as np
import torch

def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True

set_seed(42)
```
```python
# BAD: Hardcoded hyperparameters
lr = 0.001
batch_size = 32
epochs = 100

# GOOD: Use a config file or Hydra
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="configs", config_name="train")
def train(cfg: DictConfig):
    model = build_model(cfg.model)
    optimizer = torch.optim.Adam(model.parameters(), lr=cfg.lr)
```
```python
# GOOD: Use experiment tracking
import wandb

wandb.init(project="my-project", config=cfg)
for epoch in range(epochs):
    loss = train_epoch(model, dataloader)
    wandb.log({"loss": loss, "epoch": epoch})
wandb.finish()
```
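The MLOps table also flags missing model versioning. One common approach is logging and registering the trained model with MLflow; a minimal sketch, assuming an MLflow tracking store is configured and the registered model name `my-classifier` is purely illustrative:

```python
# Sketch: version the trained model in the MLflow model registry.
import mlflow
import mlflow.pytorch

with mlflow.start_run():
    mlflow.log_params({"lr": 0.001, "batch_size": 32})
    mlflow.pytorch.log_model(model, "model", registered_model_name="my-classifier")
```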
## ML Code Review Results
**Project**: [name]
**Framework**: PyTorch/TensorFlow/scikit-learn
**Task**: Classification/Regression/NLP/CV
**Files Analyzed**: X
### Model Architecture
| Severity | File | Issue |
|--------|------|-------|
| MEDIUM | models/resnet.py | Missing dropout for regularization |
| LOW | models/transformer.py | Consider gradient checkpointing |
### Training Loop
| Severity | File | Issue |
|--------|------|-------|
| HIGH | train.py | Missing model.eval() in validation (line 45) |
| HIGH | train.py | No gradient clipping (line 67) |
### Data Pipeline
| Severity | File | Issue |
|--------|------|-------|
| CRITICAL | data/dataset.py | fit_transform on test data (line 23) |
| HIGH | data/loader.py | DataLoader num_workers=0 |
### MLOps
| Severity | File | Issue |
|--------|------|-------|
| HIGH | train.py | No seed setting for reproducibility |
| MEDIUM | train.py | Hardcoded hyperparameters |
### Recommended Actions
1. [ ] Add model.eval() and torch.no_grad() for inference
2. [ ] Fix data leakage in preprocessing
3. [ ] Set random seeds for reproducibility
4. [ ] Add experiment tracking (wandb/mlflow)
Related skills:

- python-reviewer: General Python code quality
- python-data-reviewer: Data preprocessing patterns
- test-generator: ML test generation
- docker-reviewer: ML containerization