Best practices for creating comprehensive Jupyter notebook data analyses with statistical rigor, outlier handling, and publication-quality visualizations. Includes Claude API image size helpers.
Install:

```sh
npx claudepluginhub joshuarweaver/cascade-ai-ml-engineering --plugin delphine-l-claude-global
```

This skill is limited to a specific set of tools.
Expert knowledge for creating comprehensive, statistically rigorous Jupyter notebook analyses.
When generating images to share with Claude, images must not exceed 8000 pixels in either dimension. Add this helper to your notebook imports:
```python
# Standard imports with Claude size checking
import matplotlib.pyplot as plt
import seaborn as sns
from PIL import Image

MAX_CLAUDE_DIM = 7999  # Claude API limit (8000 px) with a safety margin

def save_figure(filename, dpi=300, **kwargs):
    """Save figure with automatic Claude size constraint check."""
    plt.savefig(filename, dpi=dpi, bbox_inches='tight', **kwargs)
    # Verify and auto-resize if needed
    img = Image.open(filename)
    if img.width > MAX_CLAUDE_DIM or img.height > MAX_CLAUDE_DIM:
        print(f"Auto-resizing {filename} for Claude compatibility")
        print(f"  Original: {img.width}x{img.height}")
        # thumbnail() resizes in place, preserving aspect ratio
        img.thumbnail((MAX_CLAUDE_DIM, MAX_CLAUDE_DIM), Image.Resampling.LANCZOS)
        img.save(filename)
        print(f"  Resized: {img.width}x{img.height}")
    else:
        print(f"OK {filename}: {img.width}x{img.height}")

# Safe figure sizes for Claude (300 DPI)
FIG_SIZES = {
    'small':  (7, 5),    # 2100x1500 px
    'medium': (12, 9),   # 3600x2700 px
    'large':  (20, 15),  # 6000x4500 px
    'max':    (26, 26),  # 7800x7800 px - maximum safe
}

# Use in notebook
fig, ax = plt.subplots(figsize=FIG_SIZES['medium'])
# ... plotting code ...
save_figure('figure.png')
```
For complete image size guidance, see the data-visualization skill.
Use structured notebook patterns for multi-source data merging and enrichment. Key principles:
- Gate expensive or external steps behind configuration flags (e.g., `ENABLE_AWS_FETCH = False`, `TEST_MODE = True`), as sketched below.

For detailed patterns including data update, enrichment, and AWS GenomeArk workflows, see notebook-patterns.md.
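A minimal sketch of the flag-gated configuration cell, assuming a local JSON cache; the cache path and `fetch_from_genomeark` helper are hypothetical:

```python
# Top-of-notebook configuration cell: gate slow or external steps behind flags.
ENABLE_AWS_FETCH = False  # set True to pull fresh data from AWS GenomeArk
TEST_MODE = True          # set True to run on a small subset while iterating

import json
from pathlib import Path

CACHE_PATH = Path("data/assemblies.json")  # hypothetical local cache

if ENABLE_AWS_FETCH:
    # Expensive network step -- only runs when explicitly enabled.
    records = fetch_from_genomeark()  # hypothetical fetch helper
    CACHE_PATH.write_text(json.dumps(records))
else:
    records = json.loads(CACHE_PATH.read_text())

if TEST_MODE:
    records = records[:50]  # small slice keeps reruns fast
print(f"Loaded {len(records)} records (TEST_MODE={TEST_MODE})")
```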
Always use the NotebookEdit tool for .ipynb file modifications -- never the Edit tool, which corrupts the JSON structure.
Three modes: replace (update cell content), insert (add new cell after target), delete (remove cell).
Key rules:
- Specify the `cell_type` when inserting cells.
- Use `jq` or Python JSON parsing for bulk operations (see the sketch below).

For NotebookEdit usage, programmatic JSON manipulation, bulk operations, and cell newline handling, see notebook-editing.md.
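Since a .ipynb file is a JSON document with a top-level `cells` list, bulk operations can be done with plain JSON parsing. A minimal sketch (the filename and appended cell are illustrative):

```python
import json

with open("analysis.ipynb") as f:
    nb = json.load(f)

# Example bulk operation: clear outputs of every code cell.
for cell in nb["cells"]:
    if cell["cell_type"] == "code":
        cell["outputs"] = []
        cell["execution_count"] = None

# A cell's "source" is a list of lines, each ending in "\n" except the last.
nb["cells"].append({
    "cell_type": "markdown",
    "metadata": {},
    "source": ["## Verification\n", "Claims checked below."],
})

with open("analysis.ipynb", "w") as f:
    json.dump(nb, f, indent=1)  # indent=1 keeps diffs close to Jupyter's output
```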
Use `scipy.stats.pearsonr` (and comparable tests) to back correlation claims. BEFORE finalizing any analysis notebook, verify ALL statistical claims against actual computed values: text claims can become stale after data or code updates. Extract the claims, rerun the tests, and create a verification table.
For detailed statistical methods, outlier removal code, claim verification workflow, and confounding analysis, see statistical-methods.md.
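A minimal sketch of the verification step, assuming arrays `genome_sizes` and `heterozygosities` already exist in the notebook; the claimed values are placeholders:

```python
import numpy as np
from scipy.stats import pearsonr

# Values currently claimed in the notebook's markdown text (placeholders).
claimed_r, claimed_p = 0.42, 0.003

# Recompute from the data actually in memory.
r, p = pearsonr(np.asarray(genome_sizes, dtype=float),
                np.asarray(heterozygosities, dtype=float))

# Verification table: stale text claims show up as mismatches.
match = np.isclose(r, claimed_r, atol=0.01)
print(f"{'claim':<30}{'claimed':>10}{'computed':>10}{'match':>7}")
print(f"{'genome size vs het. (r)':<30}{claimed_r:>10.2f}{r:>10.2f}{str(match):>7}")
print(f"{'genome size vs het. (p)':<30}{claimed_p:>10.3f}{p:>10.3f}")
```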
For publication-quality figures:
- Use Blue `#0173B2` + Orange `#DE8F05` (colorblind-safe) for two-group comparisons.
- Use `<img>` tags in markdown cells for responsive SVG/PNG scaling.
- Edit SVG `viewBox` attributes directly (no ImageMagick needed).

For detailed font size tables, color palette code, imbalance handling, SVG manipulation, and DPI management, see visualization-guide.md.
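A minimal sketch applying the two-group palette with the `save_figure` helper from above; the groups and data are toy examples:

```python
import numpy as np
import matplotlib.pyplot as plt

GROUP_COLORS = {"control": "#0173B2", "treatment": "#DE8F05"}  # blue / orange

rng = np.random.default_rng(0)  # toy data for illustration
groups = {
    "control": rng.normal(10.0, 2.0, 100),
    "treatment": rng.normal(12.0, 2.0, 100),
}

fig, ax = plt.subplots(figsize=FIG_SIZES['small'])
for name, values in groups.items():
    ax.hist(values, bins=20, alpha=0.6, color=GROUP_COLORS[name], label=name)
ax.set_xlabel("Measurement")
ax.set_ylabel("Count")
ax.legend()
save_figure('two_group_comparison.png')  # helper defined earlier
```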
For analyses with 5+ figures that are being prepared for publication, split the notebook. When splitting, recreate all calculated columns and variable definitions in each split; when deprecating, create dated directories with documentation.
For figure usage analysis, splitting strategies, dual-notebook workflow, publication notebook structure, TOC generation, deprecation workflow, and migration guides, see notebook-organization.md.
For path management, HTML/PDF/LaTeX export, sharing package structure, and output preservation guidelines, see sharing-and-export.md.
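One route for HTML export from within Python is nbconvert's exporter API; a minimal sketch (file names are placeholders):

```python
from nbconvert import HTMLExporter

# Convert an executed notebook into a standalone HTML report.
body, resources = HTMLExporter().from_filename("analysis.ipynb")

with open("analysis.html", "w", encoding="utf-8") as f:
    f.write(body)
```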
For creating multiple similar analysis cells:
```python
# Template for generating similar analysis cells; fill-in fields use str.format.
template = '''
if len(data_with_species) > 0:
    print('Analyzing {display} ({unit})...\\n')
    species_data = {{}}
    for inv in data_with_species:
        {name} = safe_float_convert(inv.get('{name}'))
        if {name} is None:
            continue
        # ... analysis code
'''

characteristics = [
    {'name': 'genome_size', 'display': 'Genome Size', 'unit': 'Gb'},
    {'name': 'heterozygosity', 'display': 'Heterozygosity', 'unit': '%'},
]

for char in characteristics:
    code = template.format(**char)
    # Insert `code` as a new code cell (e.g., via NotebookEdit)
```
Define once, reuse throughout:
```python
def safe_float_convert(value):
    """Convert a string to float, handling comma separators."""
    # Check `is None` rather than truthiness so numeric 0 is not rejected.
    if value is None or not str(value).strip():
        return None
    try:
        return float(str(value).replace(',', ''))
    except (ValueError, TypeError):
        return None
```
Key pitfalls to watch for:
- Avoid `data` as a loop variable (it shadows the global).
- Check `df.columns.tolist()` before processing (see the sketch below).
- Use `jq` for notebooks larger than 256 KB.

For detailed troubleshooting, variable validation, debugging techniques, and environment setup, see troubleshooting.md.
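A minimal sketch of the column check, assuming a DataFrame `df` is already loaded; the expected column names are illustrative:

```python
# Validate expected columns up front so downstream cells fail fast and clearly.
expected = {"species", "genome_size", "heterozygosity"}
actual = set(df.columns.tolist())

missing = expected - actual
if missing:
    raise KeyError(f"Missing expected columns: {sorted(missing)}; "
                   f"available: {sorted(actual)}")
```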
Reference files:

| File | Contents |
|---|---|
| notebook-patterns.md | Data update, enrichment, AWS GenomeArk patterns |
| notebook-editing.md | NotebookEdit tool, programmatic manipulation, metrics updates |
| visualization-guide.md | Publication figures, colors, image display, SVG, DPI |
| statistical-methods.md | Outlier handling, statistical rigor, claim verification |
| notebook-organization.md | Splitting, dual-notebook, deprecation, figure analysis |
| sharing-and-export.md | Paths, HTML/PDF export, sharing packages |
| troubleshooting.md | Common pitfalls, debugging, validation, environment |