WHEN: Pandas/NumPy code review, data processing, vectorization, memory optimization WHAT: Vectorization patterns + Memory efficiency + Data validation + Performance optimization + Best practices WHEN NOT: Web framework → fastapi/django/flask-reviewer, General Python → python-reviewer
Reviews Pandas/NumPy code for vectorization opportunities, memory optimization, and data validation best practices. Use when reviewing data processing scripts or investigating slow data operations.
Install: `/plugin marketplace add physics91/claude-vibe`, then `/plugin install claude-vibe@physics91-plugins`.

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Reviews data science code for Pandas/NumPy efficiency, memory usage, and best practices.
- `pandas`, `numpy` in requirements.txt
- `.ipynb` Jupyter notebooks
- `data/`, `notebooks/` directories

**Pandas**: 2.0+
**NumPy**: 1.24+
**Other**: polars, dask, vaex
**Visualization**: matplotlib, seaborn, plotly
**ML**: scikit-learn, xgboost
AskUserQuestion:
"Which areas to review?"
Options:
- Full data code review (recommended)
- Vectorization and performance
- Memory optimization
- Data validation
- Code organization
multiSelect: true
| Check | Recommendation | Severity |
|---|---|---|
| iterrows() loop | Use vectorized operations | CRITICAL |
| apply() with simple func | Use built-in vectorized | HIGH |
| Manual loop over array | Use NumPy broadcasting | HIGH |
| List comprehension on Series | Use .map() or vectorize | MEDIUM |
```python
# BAD: iterrows (extremely slow)
for idx, row in df.iterrows():
    df.loc[idx, "total"] = row["price"] * row["quantity"]

# GOOD: Vectorized operation
df["total"] = df["price"] * df["quantity"]

# BAD: apply with simple operation
df["upper_name"] = df["name"].apply(lambda x: x.upper())

# GOOD: Built-in string method
df["upper_name"] = df["name"].str.upper()

# BAD: apply with condition
df["status"] = df["score"].apply(lambda x: "pass" if x >= 60 else "fail")

# GOOD: np.where or np.select
df["status"] = np.where(df["score"] >= 60, "pass", "fail")

# Multiple conditions
conditions = [
    df["score"] >= 90,
    df["score"] >= 60,
    df["score"] < 60,
]
choices = ["A", "B", "F"]
df["grade"] = np.select(conditions, choices)
```
| Check | Recommendation | Severity |
|---|---|---|
| int64 for small ints | Use int8/int16/int32 | MEDIUM |
| object dtype for categories | Use category dtype | HIGH |
| Loading full file | Use chunks or usecols | HIGH |
| Keeping unused columns | Drop early | MEDIUM |
```python
# BAD: Default dtypes waste memory
df = pd.read_csv("large_file.csv")  # Everything loads as int64/float64/object

# GOOD: Specify dtypes
dtype_map = {
    "id": "int32",
    "age": "int8",
    "status": "category",
    "price": "float32",
}
df = pd.read_csv("large_file.csv", dtype=dtype_map)

# GOOD: Load only needed columns
df = pd.read_csv(
    "large_file.csv",
    usecols=["id", "name", "price"],
    dtype=dtype_map,
)

# GOOD: Process in chunks
chunks = pd.read_csv("huge_file.csv", chunksize=100_000)
result = pd.concat([process_chunk(chunk) for chunk in chunks])

# Memory check (df.info prints directly and returns None, so no print() wrapper)
df.info(memory_usage="deep")

# Convert object to category if low cardinality
for col in df.select_dtypes(include=["object"]).columns:
    if df[col].nunique() / len(df) < 0.5:  # < 50% unique
        df[col] = df[col].astype("category")
```
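Numeric downcasting can be automated with `pd.to_numeric`; a minimal sketch (the dtype choices are heuristics, so verify value ranges first):

```python
# Downcast numeric columns to the smallest dtype that holds the data
for col in df.select_dtypes(include=["int64"]).columns:
    df[col] = pd.to_numeric(df[col], downcast="integer")
for col in df.select_dtypes(include=["float64"]).columns:
    df[col] = pd.to_numeric(df[col], downcast="float")

# Compare the footprint before/after
print(f"{df.memory_usage(deep=True).sum() / 1e6:.1f} MB after downcast")
```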
| Check | Recommendation | Severity |
|---|---|---|
| No null check | Validate nulls early | HIGH |
| No dtype validation | Assert expected types | MEDIUM |
| No range validation | Check value bounds | MEDIUM |
| Silent data issues | Raise or log warnings | HIGH |
```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)


# GOOD: Data validation function
def validate_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """Validate and clean input DataFrame."""
    # Required columns
    required_cols = ["id", "name", "price", "quantity"]
    missing = set(required_cols) - set(df.columns)
    if missing:
        raise ValueError(f"Missing columns: {missing}")

    # Null checks
    null_counts = df[required_cols].isnull().sum()
    if null_counts.any():
        logger.warning(f"Null values found:\n{null_counts[null_counts > 0]}")

    # Type validation
    assert df["id"].dtype in ["int32", "int64"], "id must be integer"
    assert df["price"].dtype in ["float32", "float64"], "price must be float"

    # Range validation
    invalid_prices = df[df["price"] < 0]
    if len(invalid_prices) > 0:
        logger.warning(f"Found {len(invalid_prices)} negative prices")
        df = df[df["price"] >= 0]

    # Duplicate check
    duplicates = df.duplicated(subset=["id"])
    if duplicates.any():
        logger.warning(f"Removing {duplicates.sum()} duplicates")
        df = df.drop_duplicates(subset=["id"])

    return df
```
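Example usage with a toy frame (values are illustrative):

```python
raw = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "name": ["a", "b", "b", "c"],
    "price": [9.99, 5.0, 5.0, -1.0],
    "quantity": [1, 2, 2, 4],
})
# Warns, then drops the negative price (id 3) and the duplicate id 2; 2 rows remain
clean = validate_dataframe(raw)
```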
| Check | Recommendation | Severity |
|---|---|---|
| Python loop over array | Use broadcasting | CRITICAL |
| np.append in loop | Pre-allocate array | HIGH |
| Repeated array creation | Reuse arrays | MEDIUM |
| Copy when not needed | Use views | MEDIUM |
```python
# BAD: Python loop
result = []
for i in range(len(arr)):
    result.append(arr[i] * 2 + 1)
result = np.array(result)

# GOOD: Vectorized
result = arr * 2 + 1

# BAD: np.append in loop (creates a new array each time)
result = np.array([])
for x in data:
    result = np.append(result, process(x))

# GOOD: Pre-allocate
result = np.empty(len(data))
for i, x in enumerate(data):
    result[i] = process(x)

# BETTER: np.vectorize for readability (still a Python loop internally, not a speedup)
result = np.vectorize(process)(data)

# BEST: Express the computation in pure NumPy ufuncs when possible
result = np.sqrt(data) + np.log(data)

# NOTE: Boolean/fancy indexing always returns a copy, so an extra .copy() is redundant
subset = arr[arr > 0].copy()  # BAD: duplicates an already-copied array
subset = arr[arr > 0]         # GOOD: already an independent copy

# Views come from basic slicing; use them to avoid copies when a range suffices
window = arr[10:20]  # view: writes to window are visible in arr
```
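The broadcasting row in the table above deserves a 2-D example; a short sketch:

```python
import numpy as np

# Broadcasting: center columns and normalize rows without Python loops
X = np.random.default_rng(0).random((1_000, 3))

X_centered = X - X.mean(axis=0)                       # (1000, 3) - (3,) broadcasts over rows
row_norms = np.linalg.norm(X, axis=1, keepdims=True)  # (1000, 1)
X_unit = X / row_norms                                # (1000, 3) / (1000, 1) broadcasts over columns
```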
| Check | Recommendation | Severity |
|---|---|---|
| Chained indexing | Use .loc/.iloc | HIGH |
| inplace=True | Assign result instead | MEDIUM |
| reset_index() abuse | Keep index when useful | LOW |
| df.append (deprecated) | Use pd.concat | HIGH |
```python
# BAD: Chained indexing (assigns to a temporary copy, may silently do nothing)
df[df["price"] > 100]["status"] = "premium"

# GOOD: Use .loc
df.loc[df["price"] > 100, "status"] = "premium"

# BAD: df.append (deprecated in 1.4, removed in pandas 2.0)
result = pd.DataFrame()
for chunk in chunks:
    result = result.append(chunk)

# GOOD: pd.concat
result = pd.concat(chunks, ignore_index=True)

# BAD: Multiple operations creating copies
df = df.dropna()
df = df.reset_index(drop=True)
df = df.sort_values("date")

# GOOD: Method chaining
df = (
    df
    .dropna()
    .reset_index(drop=True)
    .sort_values("date")
)
```
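Chains compose with `.assign()` for derived columns and `.pipe()` for custom steps such as the `validate_dataframe` helper above; a sketch, assuming the columns used here exist:

```python
df = (
    df
    .pipe(validate_dataframe)  # custom step from the validation section
    .assign(total=lambda d: d["price"] * d["quantity"])
    .sort_values("date")
    .reset_index(drop=True)
)
```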
## Python Data Code Review Results
**Project**: [name]
**Pandas**: 2.1 | **NumPy**: 1.26 | **Size**: ~1M rows
### Vectorization
| Status | File | Issue |
|--------|------|-------|
| CRITICAL | process.py:45 | iterrows() loop - use vectorized ops |
### Memory
| Status | File | Issue |
|--------|------|-------|
| HIGH | load.py:23 | object dtype for 'status' - use category |
### Data Validation
| Status | File | Issue |
|--------|------|-------|
| HIGH | etl.py:67 | No null value handling |
### NumPy
| Status | File | Issue |
|--------|------|-------|
| HIGH | calc.py:12 | np.append in loop - pre-allocate |
### Recommended Actions
1. [ ] Replace iterrows with vectorized operations
2. [ ] Convert low-cardinality columns to category
3. [ ] Add data validation before processing
4. [ ] Pre-allocate NumPy arrays in loops
```python
# Profile memory usage
df.info(memory_usage="deep")

# Profile time (Jupyter magics)
%timeit df.apply(func)
%timeit df["col"].map(func)

# Use query() for complex filters
subset = df.query("price > 100 and category == 'A'")

# Use eval() for complex expressions (returns a new frame; assign it back)
df = df.eval("profit = revenue - cost")
```
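Outside Jupyter, the stdlib `timeit` module gives the same comparison; a minimal sketch reusing `df` from the examples above:

```python
import timeit

vectorized = timeit.timeit(lambda: df["price"] * df["quantity"], number=100)
looped = timeit.timeit(
    lambda: [p * q for p, q in zip(df["price"], df["quantity"])],
    number=100,
)
print(f"vectorized: {vectorized:.4f}s  looped: {looped:.4f}s")
```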
- python-reviewer: General Python patterns
- perf-analyzer: Performance profiling
- coverage-analyzer: Test coverage for data code