Analyze GitHub repository to extract methodology and generate the Methods section. Fourth step of writer workflow. Requires scope.md and cloned code/ repository.
Analyzes cloned repository code to extract computational methodology and draft the Methods section. Triggered after scope.md is created and code is cloned, pausing to clarify any ambiguous procedures before drafting.
/plugin marketplace add sxg/science
/plugin install writer@science

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Analyzes the cloned GitHub repository to understand the computational methodology, then generates structured notes and drafts the Methods section.
NEVER assume what the code does without user confirmation.
Code can be complex and context-dependent. Ask the user to clarify when you encounter:

- Unclear analysis purpose
- Ambiguous parameters (what does `threshold = 0.7` mean?)
- Multiple analysis paths
- Statistical method uncertainty
- Missing context (what does variable X represent in domain terms?)

Example clarification request:

I have questions about the code before writing the Methods section:
**File**: analysis.ipynb, Cell 15
**Question**: I see `model = RandomForestClassifier(n_estimators=100, max_depth=5)`.
- Were these hyperparameters tuned, or are they defaults?
- If tuned, what was the tuning method (grid search, random search)?
- Should I report these specific values in the Methods?
**Why this matters**: Reviewers often ask about hyperparameter selection.
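If the user confirms the values were tuned, capture the tuning setup for the Methods. A minimal sketch of what a grid-search setup might look like, using synthetic data and an illustrative parameter grid rather than anything from the actual repository:

```python
# Illustrative grid search; values and data are placeholders, not from the user's repository.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=10, random_state=42)  # synthetic stand-in

param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, 10]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=5,                  # 5-fold cross-validation
    scoring="roc_auc",
)
search.fit(X, y)
print(search.best_params_)  # values like these, plus the CV scheme, belong in the Methods
```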
Log clarifications in notes/code-analysis.md:
## Clarifications Received
| File | Question | User Response |
|------|----------|---------------|
| analysis.ipynb | Hyperparameter tuning? | Grid search with 5-fold CV |
| preprocess.py | Why z-score normalization? | Standard for this imaging modality |
Prerequisites:

- scope.md must exist
- notes/ethics-summary.md may exist (provides approved procedures and endpoints for cross-reference)
- notes/ethics-scope-comparison.md may exist (clarifies what was actually implemented vs. approved)
- code/ directory with cloned repository (from context-ingestion)

[Read scope.md and notes/ethics-summary.md for context]
│
▼
[Scan Repository Structure]
│
▼
[Identify Key Files] ─── Notebooks, scripts, configs
│
▼
[CHECKPOINT: Clarify Code Purpose] ─── Ask about unclear scripts/params
│
▼
[Analyze Code Flow] ─── Data loading → Processing → Analysis → Output
│
▼
[CHECKPOINT: Verify Methodology] ─── Confirm interpretation with user
│
▼
[ETHICS CROSS-REFERENCE] ─── Compare code procedures vs. approved procedures
│
▼
[Extract Methodology] ─── Generate notes/code-analysis.md
│
▼
[STATISTICAL REVIEW] ─── Validate statistical methods
│ └── agents/statistical-reviewer.md
▼
[Draft Methods] ─── drafts/methods.md (with statistical sign-off)
# Get repository overview
find code/ -type f \( -name "*.py" -o -name "*.ipynb" -o -name "*.R" \) | head -30
# Check for dependency files
ls code/requirements.txt code/environment.yml code/setup.py 2>/dev/null
# Check for README
cat code/README.md 2>/dev/null | head -50
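Alternatively, a small Python sketch (assuming the repository was cloned into `code/`) can summarize file types to help identify the primary language:

```python
# Sketch: count files by extension under code/ to gauge the primary language.
from collections import Counter
from pathlib import Path

counts = Counter(p.suffix for p in Path("code").rglob("*") if p.is_file())
for ext, n in counts.most_common(10):
    print(f"{ext or '(no extension)'}: {n}")
```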
From this overview, identify the primary language, the main entry points, any dependency files, and whether the README documents the workflow.

Prioritize analysis of the following file types:
# Notebooks - most important; they usually contain the full analysis workflow
ls code/*.ipynb code/**/*.ipynb 2>/dev/null

# Entry-point scripts
ls code/main.py code/analysis.py code/run*.py 2>/dev/null

# Preprocessing and data-loading scripts
ls code/*preprocess* code/*clean* code/*load* 2>/dev/null

# Modeling, training, and statistics scripts
ls code/*model* code/*train* code/*stat* 2>/dev/null
For each key file, trace the methodology:
Look for patterns:
# Python patterns
pd.read_csv(...)
pd.read_excel(...)
nibabel.load(...) # Neuroimaging
pydicom.dcmread(...) # DICOM
SimpleITK.ReadImage(...)
Extract: the data source, file format, and initial dataset size.
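For example, a loading cell of this shape (hypothetical, not the user's code) reveals the source format and the initial dataset size to record:

```python
# Hypothetical loading step of the kind to look for; the path is illustrative.
import pandas as pd

df = pd.read_csv("data/measurements.csv")  # source and format: CSV file
print(df.shape)                            # initial size (rows, columns) to report
```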
Look for patterns:
# Cleaning
df.dropna(...)
df.fillna(...)
# Transformation
StandardScaler()
normalize(...)
resample(...)
# Feature engineering
df['new_col'] = ...
Extract: missing-data handling, normalization method, exclusion criteria, and any engineered features.
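A typical preprocessing block might look like the following sketch; the column names and choices are placeholders, not taken from the repository:

```python
# Hypothetical preprocessing illustrating what to document:
# missing-data handling, normalization, and feature engineering.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({"age": [34.0, 45.0, None, 52.0], "score": [0.7, 0.4, 0.9, None]})

df = df.dropna()                                       # exclusion: rows with missing values
df[["age", "score"]] = StandardScaler().fit_transform(df[["age", "score"]])  # z-score normalization
df["age_x_score"] = df["age"] * df["score"]            # engineered feature
print(df)
```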
Look for patterns:
# Hypothesis tests
scipy.stats.ttest_ind(...)
scipy.stats.mannwhitneyu(...)
scipy.stats.pearsonr(...)
scipy.stats.spearmanr(...)
# Regression
statsmodels.api.OLS(...)
statsmodels.api.Logit(...)
# Multiple comparison correction
statsmodels.stats.multitest.multipletests(...)
Extract: each test used, its purpose, the significance threshold, and any multiple-comparison correction.
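For instance, a statistics cell of this shape (a sketch on synthetic data) contains everything the Methods section needs: the test, the threshold, and the correction method:

```python
# Hypothetical statistical analysis illustrating what to extract:
# the test used, the significance threshold, and the correction method.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(42)
group_a = rng.normal(0.0, 1.0, size=(30, 5))  # 30 subjects x 5 features
group_b = rng.normal(0.3, 1.0, size=(30, 5))

_, p_values = stats.ttest_ind(group_a, group_b, axis=0)  # independent t-test per feature
reject, p_corrected, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")  # Benjamini-Hochberg FDR
print(p_corrected.round(4), reject)  # report: t-test, alpha = 0.05, FDR correction
```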
Look for patterns:
# Splitting
train_test_split(..., test_size=0.2, random_state=42)
cross_val_score(...)
StratifiedKFold(...)
# Models
RandomForestClassifier(...)
LogisticRegression(...)
XGBClassifier(...)
# Evaluation
accuracy_score(...)
roc_auc_score(...)
confusion_matrix(...)
Extract: model type, train/test split, cross-validation strategy, hyperparameters, and evaluation metrics.
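A typical modelling cell (sketch on synthetic data) showing each detail worth recording: model type, split ratio, cross-validation, hyperparameters, and evaluation metric:

```python
# Hypothetical ML pipeline showing the details to record for the Methods section.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=300, n_features=12, random_state=42)  # synthetic stand-in
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42                 # split: 80/20 train/test
)

model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=42)  # hyperparameters
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")   # 5-fold CV

model.fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])                  # metric: AUC
print(cv_scores.mean().round(3), round(auc, 3))
```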
# Check pinned dependency versions
cat code/requirements.txt 2>/dev/null
Or extract from imports:
import pandas as pd
print(pd.__version__)
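If no dependency file pins the versions, they can be read programmatically; a minimal sketch, assuming the analysis environment is active:

```python
# Sketch: report installed versions for the packages the code imports.
from importlib.metadata import PackageNotFoundError, version

for pkg in ["pandas", "scikit-learn", "scipy", "matplotlib"]:
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```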
Create notes/code-analysis.md:
# Code Analysis
**Repository**: [GitHub URL]
**Analyzed**: [timestamp]
**Primary Language**: Python [version]
## Repository Structure
code/
├── analysis.ipynb      # Main analysis
├── preprocessing.py    # Data cleaning
├── models.py           # ML models
└── requirements.txt    # Dependencies
## Data Pipeline
### 1. Data Loading
- **Source**: CSV files from [source]
- **Format**: Tabular data with [n] columns
- **Initial Size**: [n] rows
### 2. Preprocessing
- Missing data: [handling approach]
- Normalization: [method]
- Exclusions: [criteria]
### 3. Analysis Approach
#### Statistical Tests
| Test | Purpose | Parameters |
|------|---------|------------|
| Independent t-test | Group comparison | α = 0.05 |
| Pearson correlation | Association | |
| Mann-Whitney U | Non-parametric comparison | |
#### Machine Learning (if applicable)
- **Model**: [type]
- **Split**: [ratio] train/test
- **Cross-validation**: [k]-fold
- **Hyperparameters**: [key params]
- **Metrics**: [accuracy, AUC, etc.]
### 4. Output Generation
- Figures saved to: [location]
- Results saved to: [location]
## Dependencies
| Package | Version | Purpose |
|---------|---------|---------|
| pandas | 2.0.3 | Data manipulation |
| scikit-learn | 1.3.0 | ML models |
| scipy | 1.11.1 | Statistical tests |
| matplotlib | 3.7.2 | Visualization |
## Key Code Snippets
### Statistical Test Implementation
```python
[relevant code snippet]
```
## Step 4b: Ethics Cross-Reference (If Ethics Docs Exist)
**Skip this step if `notes/ethics-summary.md` does not exist.**
Compare the procedures identified in code with the approved procedures to ensure consistency and identify any discrepancies.
### Cross-Reference Table
Create a comparison in `notes/code-analysis.md`:
```markdown
## Ethics Procedure Cross-Reference
| Approved Procedure | Implemented in Code? | Code Location | Notes |
|--------------------|---------------------|---------------|-------|
| [procedure from ethics] | ✓/✗ | analysis.ipynb | [matches/differs] |
| [procedure from ethics] | ✓/✗ | [file] | [notes] |
Also check the reverse (code procedures not covered by the ethics document):

| Code Procedure | In Ethics Doc? | Notes |
|----------------|----------------|-------|
| [procedure from code] | ✓/✗ | [expected/unexpected] |
```
Check notes/ethics-scope-comparison.md to see whether differences were already documented during scoping. If not, ask the user:
I found a discrepancy between the ethics document and implemented code:
**Ethics Doc**: Primary endpoint is outcome measurement at 6 months
**Code**: Analyzes measurements at 3 months and 6 months
Was this intentional? Should the Methods describe both timepoints,
or is one the primary and one exploratory?
Before drafting the Methods, invoke the statistical reviewer agent.

Read agents/statistical-reviewer.md and execute Phase 1 (Code Analysis Review). The statistical reviewer must confirm, for each method used:
## Statistical Methods Review
| Method | Appropriate? | Assumptions Checked? | Recommendation |
|--------|--------------|---------------------|----------------|
| [test 1] | ✓/✗ | ✓/✗ | [approve/revise] |
| [test 2] | ✓/✗ | ✓/✗ | [approve/revise] |
**Statistical Sign-Off**: [ ] Approved for Methods drafting
Do NOT proceed to Methods draft until statistical reviewer approves.
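As an illustration of what the "Assumptions Checked?" column can involve, here is a minimal sketch (on synthetic data) of normality and variance checks used to choose between a t-test and a Mann-Whitney U test; it is not taken from the reviewer agent itself:

```python
# Sketch: assumption checks a reviewer might expect before approving a t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, 40)
group_b = rng.normal(0.5, 1.0, 40)

normal_a = stats.shapiro(group_a).pvalue > 0.05            # normality of each group
normal_b = stats.shapiro(group_b).pvalue > 0.05
equal_var = stats.levene(group_a, group_b).pvalue > 0.05   # homogeneity of variance

if normal_a and normal_b:
    result = stats.ttest_ind(group_a, group_b, equal_var=equal_var)  # parametric
else:
    result = stats.mannwhitneyu(group_a, group_b)                    # non-parametric fallback
print(result)
```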
If issues are identified, resolve them with the user and record the resolution in notes/code-analysis.md.

Only proceed after the statistical reviewer sign-off from Step 5.
Create drafts/methods.md:
# Methods
## Study Design and Population
[From scope.md - describe study design]
[From code analysis - describe data source and selection]
## Data Acquisition
[If applicable - describe how raw data was obtained]
## Data Preprocessing
[From code analysis - describe cleaning, normalization, feature engineering]
Data preprocessing was performed using Python [version]. [Describe steps in scientific prose.]
## Statistical Analysis
[From code analysis - describe all statistical approaches]
[Test] was used to [purpose]. Statistical significance was set at p < [threshold]. [Multiple comparison correction] was applied for [reason].
## Machine Learning Analysis (if applicable)
[From code analysis - describe ML pipeline]
A [model type] was trained using [split] of the data. [Cross-validation strategy] was employed to assess model generalization. Model performance was evaluated using [metrics].
## Software
Statistical analysis was performed using Python [version] with [packages]. [Additional software if relevant.]
---
## Methods Checklist
- [ ] Study design clearly stated
- [ ] Population/sample described
- [ ] All preprocessing steps documented
- [ ] Statistical tests specified with parameters
- [ ] Significance threshold stated
- [ ] Software and versions listed
- [ ] Reproducibility considerations addressed
Save to:
- notes/code-analysis.md - Detailed analysis (includes statistical review)
- drafts/methods.md - Methods section draft (with statistical reviewer approval)

Return to parent skill with summary: