---
name: aris-autonomous-research
description: ARIS (Auto-Research-In-Sleep) — Markdown-only autonomous ML research workflows using cross-model review loops, idea discovery, experiment automation, and paper writing with Claude Code or any LLM agent.
triggers:
- run autonomous research pipeline
- set up ARIS research workflow
- use claude code for ML research
- automate paper writing with AI
- cross-model research review loop
- run experiment automation with ARIS
- install ARIS skills for claude code
- generate research ideas while sleeping
---
# ARIS — Autonomous Research In Sleep
> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.
ARIS is a **zero-dependency, Markdown-only** autonomous ML research system. Each "skill" is a plain `SKILL.md` file that any LLM agent can read and execute. The system orchestrates **cross-model collaboration**: one model executes (Claude Code / Codex) while another critiques (GPT-5.4 / Gemini / GLM / MiniMax), breaking self-review blind spots without any framework or lock-in.
Core capabilities:
- 🔬 **Idea discovery** from a research direction or existing paper
- 🧪 **Experiment automation** with GPU-ready code generation and W&B tracking
- 📝 **Paper writing** (LaTeX, Beamer slides, A0 poster)
- 🔁 **Cross-model review loops** with score progression (sketched below)
- 📬 **Rebuttal drafting** with safety gates (no fabrication, no overpromise, full coverage)
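To make the executor-critic pattern concrete, here is a minimal sketch of a review loop with score progression. This is an illustration, not ARIS's implementation: it assumes an OpenAI-compatible reviewer reachable via the `LLM_BASE_URL` / `LLM_API_KEY` / `LLM_MODEL` variables described below, and it elides the executor's revision step.

```python
# Sketch of a cross-model review loop: one model drafts, a second critiques
# and scores, and the loop repeats until the score clears a threshold.
import json
import os
from openai import OpenAI

reviewer = OpenAI(
    base_url=os.environ.get("LLM_BASE_URL", "https://api.openai.com/v1"),
    api_key=os.environ["LLM_API_KEY"],
)

def review(draft: str) -> dict:
    """Ask the second model for a 1-10 score and a critique, as JSON."""
    resp = reviewer.chat.completions.create(
        model=os.environ.get("LLM_MODEL", "gpt-4o"),
        messages=[{
            "role": "user",
            "content": "Score this research draft 1-10 and critique it. "
                       'Reply as JSON: {"score": int, "critique": str}\n\n' + draft,
        }],
        response_format={"type": "json_object"},  # supported by OpenAI-compatible endpoints
    )
    return json.loads(resp.choices[0].message.content)

draft = open("idea_proposal.md").read()
for round_num in range(3):              # bounded number of review rounds
    feedback = review(draft)
    print(f"round {round_num}: score {feedback['score']}")
    if feedback["score"] >= 8:          # good enough, stop iterating
        break
    # In ARIS the executor model (Claude Code / Codex) revises the draft
    # using feedback["critique"]; that step is elided in this sketch.
```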
---
## Installation
### 1. Clone the repository
```bash
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cd Auto-claude-code-research-in-sleep
```

### 2. Install the skills

Copy the skills directory into your project, or symlink it:

```bash
# Option A: copy skills to your project
cp -r skills/ /your/project/.claude/skills/

# Option B: symlink (keeps skills up to date)
ln -s /path/to/Auto-claude-code-research-in-sleep/skills /your/project/.claude/skills
```

Claude Code auto-discovers `SKILL.md` files in `.claude/skills/**`. No registration step is needed.
### 3. Install the llm-chat MCP server

ARIS uses the `llm-chat` MCP server so the executor model can call a second model for review. Install it:

```bash
cd mcp-servers/llm-chat
pip install -r requirements.txt   # or: uv pip install -r requirements.txt
```

Add it to your `claude_desktop_config.json` (or Claude Code MCP config):
```json
{
  "mcpServers": {
    "llm-chat": {
      "command": "python",
      "args": ["/path/to/Auto-claude-code-research-in-sleep/mcp-servers/llm-chat/server.py"],
      "env": {
        "OPENAI_API_KEY": "$OPENAI_API_KEY",
        "LLM_MODEL": "gpt-4o"
      }
    }
  }
}
```
For alternative reviewers (Kimi, GLM, MiniMax, DeepSeek), set `LLM_BASE_URL` and `LLM_MODEL` to the provider's OpenAI-compatible endpoint. No Claude or OpenAI API key is required.
To use Codex as the executor instead of Claude Code, also register the Codex MCP server:

```json
{
  "mcpServers": {
    "codex": {
      "command": "npx",
      "args": ["@openai/codex-mcp"],
      "env": {
        "OPENAI_API_KEY": "$OPENAI_API_KEY"
      }
    }
  }
}
```
### 4. Set environment variables

| Variable | Required for | Description |
|---|---|---|
| `OPENAI_API_KEY` | GPT reviewer | OpenAI API key |
| `ANTHROPIC_API_KEY` | Claude executor | Anthropic API key |
| `LLM_BASE_URL` | Alternative reviewer | OpenAI-compatible base URL |
| `LLM_MODEL` | Alternative reviewer | Model name at that endpoint |
| `WANDB_API_KEY` | Experiment tracking | Weights & Biases key |
/research-pipeline "factorized gap in discrete diffusion LMs"
With a reference paper and base codebase:
/research-pipeline "improve positional encoding in transformers" — ref paper: https://arxiv.org/abs/2104.09864, base repo: https://github.com/facebookresearch/fairseq
Parameters:
| Flag | Default | Effect |
|---|---|---|
ref paper | — | ARIS reads this paper, finds weaknesses, targets them |
base repo | — | Clone and use this repo as experiment base |
compact: true | false | Generate lean summary files (good for short-context models) |
### idea-discovery (Workflow 1)

```
/idea-discovery "sparse attention in long-context LLMs"
```

What it does: discovers candidate ideas from a research direction or existing paper, then suggests `/research-refine` to sharpen the best idea into a problem statement.

### experiment-bridge (Workflow 1.5)

```
/experiment-bridge "idea_proposal.md" — base repo: https://github.com/huggingface/diffusers
```
What it does: generates GPU-ready experiment code with an automatic review pass (`code review: true` by default) and writes all artifacts to `experiment_results/`.

Example generated experiment scaffold:
```python
# experiment_results/run_001/train.py (auto-generated by /experiment-bridge)
import os

import torch
import wandb
from torch.utils.data import DataLoader

wandb.init(
    project=os.environ.get("WANDB_PROJECT", "aris-experiment"),
    config={
        "method": "factorized_discrete_diffusion",
        "lr": 3e-4,
        "epochs": 50,
        "batch_size": 32,
    },
)
config = wandb.config

# model and dataloader are constructed earlier in the generated script (elided)
for epoch in range(config.epochs):
    for batch in dataloader:
        loss = model(batch)
        wandb.log({"loss": loss.item(), "epoch": epoch})
```
### literature-review (Workflow 2)

```
/literature-review "discrete diffusion language models"
```

Anti-hallucination: ARIS verifies every citation via DBLP, then CrossRef, and marks anything unverified as `[VERIFY]`. It never fabricates BibTeX.
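As an illustration of that DBLP-then-CrossRef order, here is a sketch of a standalone checker. It is not ARIS's actual verifier; it only assumes the public DBLP and CrossRef search APIs.

```python
# Hypothetical citation check: query DBLP first, fall back to CrossRef,
# and tag the title [VERIFY] if neither source returns a hit.
import requests

def verify_citation(title: str) -> str:
    dblp = requests.get(
        "https://dblp.org/search/publ/api",
        params={"q": title, "format": "json"},
        timeout=10,
    ).json()
    if int(dblp["result"]["hits"]["@total"]) > 0:
        return "verified: DBLP"

    crossref = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": 1},
        timeout=10,
    ).json()
    if crossref["message"]["items"]:
        return "verified: CrossRef"

    return "[VERIFY]"

print(verify_citation("RoFormer: Enhanced Transformer with Rotary Position Embedding"))
```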
### paper-write (Workflow 3)

```
/paper-write "experiment_results/ + idea_proposal.md" — venue: NeurIPS
```

Supported venue templates: CVPR, NeurIPS, ICML, ICLR, ACL, AAAI, ACM MM.
### rebuttal (Workflow 4)

```
/rebuttal "paper/ + reviews/" — venue: ICML, character limit: 5000
```

Parameters:

| Parameter | Default | Description |
|---|---|---|
| `venue` | ICML | Target conference |
| `character limit` | Required | Hard character limit |
| `quick mode` | `false` | Stop after strategy (Phase 0–3), no draft |
| `auto experiment` | `false` | Auto-run experiments when reviewers ask for new evidence |
| `max stress test rounds` | 1 | GPT-5.4 stress-test passes on the draft |
| `max followup rounds` | 3 | Per-reviewer follow-up round limit |
Three safety gates — the rebuttal will NOT finalize if any of them fails:

1. No fabrication
2. No overpromise
3. Full coverage of reviewer points
Outputs:

- `PASTE_READY.txt` — exact character count, ready to paste into the venue portal
- `REBUTTAL_DRAFT_rich.md` — extended version for manual editing

### paper-slides / paper-poster

```
/paper-slides "paper/"   # Beamer PDF + PPTX + speaker notes + Q&A prep
/paper-poster "paper/"   # A0/A1 poster PDF + PPTX + SVG (venue colors)
```
## Utility skills

| Skill | Command | What it does |
|---|---|---|
| training-check | `/training-check "train.py"` | Diagnose training instability, NaN losses, slow convergence |
| result-to-claim | `/result-to-claim "results.json"` | Convert raw numbers into paper-ready claims |
| ablation-planner | `/ablation-planner "idea.md"` | Design a minimal ablation study for a method |
| experiment-plan | `/experiment-plan "idea.md"` | Claim-driven experiment roadmap |
| research-refine | `/research-refine "idea.md"` | Sharpen a vague idea into a problem-anchored proposal |
| formula-derivation | `/formula-derivation "method.md"` | Develop and verify research formulas |
| paper-illustration | `/paper-illustration "paper/"` | Generate figures (Gemini-assisted) |
| grant-proposal | `/grant-proposal "idea.md"` | Draft a grant proposal from a research idea |
## Alternative reviewers

ARIS requires only an OpenAI-compatible endpoint for the reviewer. Set the environment variables:

```bash
# Kimi as reviewer
export LLM_BASE_URL="https://api.moonshot.cn/v1"
export LLM_MODEL="moonshot-v1-128k"
export LLM_API_KEY=$MOONSHOT_API_KEY

# DeepSeek as reviewer
export LLM_BASE_URL="https://api.deepseek.com/v1"
export LLM_MODEL="deepseek-chat"
export LLM_API_KEY=$DEEPSEEK_API_KEY

# MiniMax as reviewer
export LLM_BASE_URL="https://api.minimax.chat/v1"
export LLM_MODEL="abab6.5s-chat"
export LLM_API_KEY=$MINIMAX_API_KEY
```
In `mcp-servers/llm-chat/server.py`, the `LLM_BASE_URL` env var then overrides the OpenAI default. No code changes are needed.
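Before wiring a provider into the MCP config, you can smoke-test the endpoint with the `openai` Python client. This is a minimal sketch that assumes the three env vars above are set and the provider is OpenAI-compatible:

```python
# Quick endpoint smoke test: send one message to the configured reviewer model.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["LLM_BASE_URL"],
    api_key=os.environ["LLM_API_KEY"],
)
resp = client.chat.completions.create(
    model=os.environ["LLM_MODEL"],
    messages=[{"role": "user", "content": "Reply with 'ok' if you can read this."}],
)
print(resp.choices[0].message.content)
```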
## Codex CLI support

ARIS ships a parallel `skills-codex/` directory with the same workflows adapted for the OpenAI Codex CLI:

```bash
# Install Codex CLI
npm install -g @openai/codex

# Run a workflow
codex --skill skills/skills-codex/research-pipeline/SKILL.md \
  "improve contrastive learning in vision transformers"
```
## Cursor and Trae

**Cursor:** see `docs/CURSOR_ADAPTATION.md`. Copy `skills/` to `.cursorrules-skills/` in your project, then invoke `@research-pipeline "your research direction"`.

**Trae:** see `docs/TRAE_ARIS_RUNBOOK_EN.md` for full setup. Trae supports `SKILL.md` natively via its plugin system.
## Templates

Pre-filled templates for every workflow live in `templates/`:

```
templates/
  research-pipeline.md   # Full pipeline input
  idea-discovery.md
  experiment-bridge.md
  literature-review.md
  paper-write.md
  rebuttal.md
  paper-slides.md
  paper-poster.md
```

Use a template:

```
/research-pipeline — template: templates/research-pipeline.md
```
## Repository structure

```
Auto-claude-code-research-in-sleep/
├── skills/
│   ├── research-pipeline/SKILL.md     # Main orchestration workflow
│   ├── idea-discovery/SKILL.md        # Workflow 1
│   ├── experiment-bridge/SKILL.md     # Workflow 1.5
│   ├── literature-review/SKILL.md     # Workflow 2
│   ├── paper-write/SKILL.md           # Workflow 3
│   ├── rebuttal/SKILL.md              # Workflow 4
│   ├── paper-slides/SKILL.md
│   ├── paper-poster/SKILL.md
│   ├── training-check/SKILL.md
│   ├── result-to-claim/SKILL.md
│   ├── ablation-planner/SKILL.md
│   ├── experiment-plan/SKILL.md
│   ├── research-refine/SKILL.md
│   ├── formula-derivation/SKILL.md
│   └── skills-codex/                  # Codex CLI variants
├── mcp-servers/
│   └── llm-chat/                      # OpenAI-compatible reviewer MCP
├── templates/                         # Input templates per workflow
├── docs/
│   ├── CURSOR_ADAPTATION.md
│   ├── TRAE_ARIS_RUNBOOK_EN.md
│   ├── ANTIGRAVITY_ADAPTATION.md
│   ├── MODELSCOPE_GUIDE.md            # Free tier setup
│   ├── MiniMax-GLM-Configuration.md
│   └── CODEX_GEMINI_REVIEW_GUIDE.md
└── README.md
```
/research-pipeline "improve method" — ref paper: https://arxiv.org/abs/2406.04329, base repo: https://github.com/org/repo
ARIS reads the paper → identifies weaknesses → clones repo → generates targeted ideas → runs experiments → writes paper.
Add compact: true to any workflow. ARIS writes a lean SESSION_SUMMARY.md. On resume:
/research-pipeline — resume: SESSION_SUMMARY.md
Already have results? Jump to paper writing:
/paper-write "my_results/ + my_idea.md" — venue: NeurIPS
Already have a paper? Jump to rebuttal:
/rebuttal "paper/ + reviews/" — venue: ICML, character limit: 5000
export LLM_BASE_URL="https://api-inference.modelscope.cn/v1"
export LLM_MODEL="Qwen/Qwen2.5-72B-Instruct"
export LLM_API_KEY=$MODELSCOPE_API_KEY
See docs/MODELSCOPE_GUIDE.md for zero-cost setup.
## Troubleshooting

### Skills not discovered by Claude Code

Ensure `SKILL.md` files are under `.claude/skills/` relative to your project root. Claude Code scans this path at startup.
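A quick way to see what a scanner would find in your layout (a local sanity check, not Claude Code's actual discovery logic):

```python
# List every SKILL.md under .claude/skills/ so you can spot misplaced files.
from pathlib import Path

for skill in sorted(Path(".claude/skills").glob("**/SKILL.md")):
    print(skill)
```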
### MCP reviewer not connecting

```bash
# Test the llm-chat server directly
cd mcp-servers/llm-chat
python server.py --test
# Should print: {"status": "ok", "model": "gpt-4o"}
```
W&B logging not working in experiment-bridge
wandb login # uses WANDB_API_KEY env var, or prompts for manual entry
### Citation hallucination in literature-review

All unverified citations are tagged `[VERIFY]` in the output. Search DBLP manually for flagged entries before including them in your paper. Never remove the `[VERIFY]` tag without confirming the citation.
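To make sure no flagged entries slip into a submission, a small scan over the paper directory helps (a sketch; adjust the `paper` path and extensions to your layout):

```python
# Find any remaining [VERIFY] tags in LaTeX and BibTeX sources.
from pathlib import Path

for src in Path("paper").glob("**/*"):
    if src.suffix in {".tex", ".bib"}:
        for lineno, line in enumerate(src.read_text().splitlines(), 1):
            if "[VERIFY]" in line:
                print(f"{src}:{lineno}: {line.strip()}")
```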
### Rebuttal exceeds character limit

ARIS tracks the character count per section. If a draft exceeds the limit, it automatically trims supporting evidence (keeps claims, removes elaboration). You can also pass `quick mode: true` to get the strategy without the draft, then write targeted sections manually.
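Before pasting into the portal, you can double-check the final count yourself (a trivial sketch; `PASTE_READY.txt` is the output file named above, and `LIMIT` is whatever you passed to `/rebuttal`):

```python
# Verify the paste-ready rebuttal fits the venue's hard character limit.
from pathlib import Path

LIMIT = 5000  # the character limit from your /rebuttal invocation
text = Path("PASTE_READY.txt").read_text()
over = len(text) - LIMIT
print(f"{len(text)} characters", "(OK)" if over <= 0 else f"(over by {over})")
```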
### Cross-model review loop not running (self-review fallback)

If the llm-chat MCP is unreachable, ARIS falls back to single-model review with a warning in the output. Check the MCP server logs:

```bash
tail -f ~/.claude/mcp-logs/llm-chat.log
```
### Session context overflow

Use `compact: true` on any workflow invocation to produce a compressed `SESSION_SUMMARY.md` that fits in a fresh context window.
## Writing custom skills

Every skill is a plain Markdown file. To create a custom skill:

```markdown
# my-custom-skill

## Trigger
When the user says "run my custom analysis"...

## Steps
1. Read input files
2. Call `mcp__llm-chat__chat` with the review prompt
3. Write output to `custom_output/`

## Output
- `custom_output/analysis.md`
- `custom_output/score.json`
```

Save it as `.claude/skills/my-custom-skill/SKILL.md` and Claude Code will discover it automatically.