---
name: aris-autonomous-research
description: ARIS (Auto-Research-In-Sleep) — Markdown-only autonomous ML research workflows using cross-model review loops, idea discovery, experiment automation, and paper writing with Claude Code or any LLM agent.
triggers:
- run autonomous research pipeline
- set up ARIS research workflow
- use claude code for ML research
- automate paper writing with AI
- cross-model research review loop
- run experiment automation with ARIS
- install ARIS skills for claude code
- generate research ideas while sleeping
---
# ARIS — Autonomous Research In Sleep
> Skill by [ara.so](https://ara.so) — Daily 2026 Skills collection.
ARIS is a **zero-dependency, Markdown-only** autonomous ML research system. Each "skill" is a plain `SKILL.md` file that any LLM agent can read and execute. The system orchestrates **cross-model collaboration**: one model executes (Claude Code / Codex) while another critiques (GPT-5.4 / Gemini / GLM / MiniMax), breaking self-review blind spots without any framework or lock-in.
Core capabilities:
- 🔬 **Idea discovery** from a research direction or existing paper
- 🧪 **Experiment automation** with GPU-ready code generation and W&B tracking
- 📝 **Paper writing** (LaTeX, Beamer slides, A0 poster)
- 🔁 **Cross-model review loops** with score progression (sketched below)
- 📬 **Rebuttal drafting** with safety gates (no fabrication, no overpromise, full coverage)
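To make the executor-critic pattern concrete, here is a minimal sketch of a review loop with score progression. This is an illustration, not ARIS's implementation: it assumes an OpenAI-compatible reviewer reachable via the `LLM_BASE_URL` / `LLM_API_KEY` / `LLM_MODEL` variables described below, and it elides the executor's revision step.

```python
# Sketch of a cross-model review loop: one model drafts, a second critiques
# and scores, and the loop repeats until the score clears a threshold.
import json
import os
from openai import OpenAI

reviewer = OpenAI(
    base_url=os.environ.get("LLM_BASE_URL", "https://api.openai.com/v1"),
    api_key=os.environ["LLM_API_KEY"],
)

def review(draft: str) -> dict:
    """Ask the second model for a 1-10 score and a critique, as JSON."""
    resp = reviewer.chat.completions.create(
        model=os.environ.get("LLM_MODEL", "gpt-4o"),
        messages=[{
            "role": "user",
            "content": "Score this research draft 1-10 and critique it. "
                       'Reply as JSON: {"score": int, "critique": str}\n\n' + draft,
        }],
        response_format={"type": "json_object"},  # supported by OpenAI-compatible endpoints
    )
    return json.loads(resp.choices[0].message.content)

draft = open("idea_proposal.md").read()
for round_num in range(3):              # bounded number of review rounds
    feedback = review(draft)
    print(f"round {round_num}: score {feedback['score']}")
    if feedback["score"] >= 8:          # good enough, stop iterating
        break
    # In ARIS the executor model (Claude Code / Codex) revises the draft
    # using feedback["critique"]; that step is elided in this sketch.
```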
---
## Installation
### 1. Clone the repository
```bash
git clone https://github.com/wanshuiyin/Auto-claude-code-research-in-sleep.git
cd Auto-claude-code-research-in-sleep
```

### 2. Install the skills

Copy the skills directory into your project, or symlink it:

```bash
# Option A: copy skills to your project
cp -r skills/ /your/project/.claude/skills/

# Option B: symlink (keeps skills up to date)
ln -s /path/to/Auto-claude-code-research-in-sleep/skills /your/project/.claude/skills
```

Claude Code auto-discovers `SKILL.md` files in `.claude/skills/**`. No registration step is needed.
### 3. Install the llm-chat MCP server

ARIS uses the `llm-chat` MCP server so the executor model can call a second model for review. Install it:

```bash
cd mcp-servers/llm-chat
pip install -r requirements.txt   # or: uv pip install -r requirements.txt
```

Add it to your `claude_desktop_config.json` (or Claude Code MCP config):
```json
{
  "mcpServers": {
    "llm-chat": {
      "command": "python",
      "args": ["/path/to/Auto-claude-code-research-in-sleep/mcp-servers/llm-chat/server.py"],
      "env": {
        "OPENAI_API_KEY": "$OPENAI_API_KEY",
        "LLM_MODEL": "gpt-4o"
      }
    }
  }
}
```
For alternative reviewers (Kimi, GLM, MiniMax, DeepSeek), set `LLM_BASE_URL` and `LLM_MODEL` to the provider's OpenAI-compatible endpoint. No Claude or OpenAI API key is required.
To use Codex as the executor instead of Claude Code, also register the Codex MCP server:

```json
{
  "mcpServers": {
    "codex": {
      "command": "npx",
      "args": ["@openai/codex-mcp"],
      "env": {
        "OPENAI_API_KEY": "$OPENAI_API_KEY"
      }
    }
  }
}
```
### 4. Set environment variables

| Variable | Required for | Description |
|---|---|---|
| `OPENAI_API_KEY` | GPT reviewer | OpenAI API key |
| `ANTHROPIC_API_KEY` | Claude executor | Anthropic API key |
| `LLM_BASE_URL` | Alternative reviewer | OpenAI-compatible base URL |
| `LLM_MODEL` | Alternative reviewer | Model name at that endpoint |
| `WANDB_API_KEY` | Experiment tracking | Weights & Biases key |
/research-pipeline "factorized gap in discrete diffusion LMs"
With a reference paper and base codebase:
/research-pipeline "improve positional encoding in transformers" — ref paper: https://arxiv.org/abs/2104.09864, base repo: https://github.com/facebookresearch/fairseq
Parameters:
| Flag | Default | Effect |
|---|---|---|
ref paper | — | ARIS reads this paper, finds weaknesses, targets them |
base repo | — | Clone and use this repo as experiment base |
compact: true | false | Generate lean summary files (good for short-context models) |
### idea-discovery (Workflow 1)

```
/idea-discovery "sparse attention in long-context LLMs"
```

What it does: discovers candidate ideas from a research direction or existing paper, then suggests `/research-refine` to sharpen the best idea into a problem statement.

### experiment-bridge (Workflow 1.5)

```
/experiment-bridge "idea_proposal.md" — base repo: https://github.com/huggingface/diffusers
```
What it does: generates GPU-ready experiment code with an automatic review pass (`code review: true` by default) and writes all artifacts to `experiment_results/`.

Example generated experiment scaffold:
```python
# experiment_results/run_001/train.py (auto-generated by /experiment-bridge)
import os

import torch
import wandb
from torch.utils.data import DataLoader

wandb.init(
    project=os.environ.get("WANDB_PROJECT", "aris-experiment"),
    config={
        "method": "factorized_discrete_diffusion",
        "lr": 3e-4,
        "epochs": 50,
        "batch_size": 32,
    },
)
config = wandb.config

# model and dataloader are constructed earlier in the generated script (elided)
for epoch in range(config.epochs):
    for batch in dataloader:
        loss = model(batch)
        wandb.log({"loss": loss.item(), "epoch": epoch})
```
### literature-review (Workflow 2)

```
/literature-review "discrete diffusion language models"
```

Anti-hallucination: ARIS verifies every citation via DBLP, then CrossRef, and marks anything unverified as `[VERIFY]`. It never fabricates BibTeX.
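As an illustration of that DBLP-then-CrossRef order, here is a sketch of a standalone checker. It is not ARIS's actual verifier; it only assumes the public DBLP and CrossRef search APIs.

```python
# Hypothetical citation check: query DBLP first, fall back to CrossRef,
# and tag the title [VERIFY] if neither source returns a hit.
import requests

def verify_citation(title: str) -> str:
    dblp = requests.get(
        "https://dblp.org/search/publ/api",
        params={"q": title, "format": "json"},
        timeout=10,
    ).json()
    if int(dblp["result"]["hits"]["@total"]) > 0:
        return "verified: DBLP"

    crossref = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": 1},
        timeout=10,
    ).json()
    if crossref["message"]["items"]:
        return "verified: CrossRef"

    return "[VERIFY]"

print(verify_citation("RoFormer: Enhanced Transformer with Rotary Position Embedding"))
```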
### paper-write (Workflow 3)

```
/paper-write "experiment_results/ + idea_proposal.md" — venue: NeurIPS
```

Supported venue templates: CVPR, NeurIPS, ICML, ICLR, ACL, AAAI, ACM MM.
### rebuttal (Workflow 4)

```
/rebuttal "paper/ + reviews/" — venue: ICML, character limit: 5000
```

Parameters:

| Parameter | Default | Description |
|---|---|---|
| `venue` | ICML | Target conference |
| `character limit` | Required | Hard character limit |
| `quick mode` | `false` | Stop after strategy (Phase 0–3), no draft |
| `auto experiment` | `false` | Auto-run experiments when reviewers ask for new evidence |
| `max stress test rounds` | 1 | GPT-5.4 stress-test passes on the draft |
| `max followup rounds` | 3 | Per-reviewer follow-up round limit |
Three safety gates — the rebuttal will NOT finalize if any of them fails:

1. No fabrication
2. No overpromise
3. Full coverage of reviewer points
Outputs:

- `PASTE_READY.txt` — exact character count, ready to paste into the venue portal
- `REBUTTAL_DRAFT_rich.md` — extended version for manual editing

### paper-slides / paper-poster

```
/paper-slides "paper/"   # Beamer PDF + PPTX + speaker notes + Q&A prep
/paper-poster "paper/"   # A0/A1 poster PDF + PPTX + SVG (venue colors)
```
## Utility skills

| Skill | Command | What it does |
|---|---|---|
| training-check | `/training-check "train.py"` | Diagnose training instability, NaN losses, slow convergence |
| result-to-claim | `/result-to-claim "results.json"` | Convert raw numbers into paper-ready claims |
| ablation-planner | `/ablation-planner "idea.md"` | Design a minimal ablation study for a method |
| experiment-plan | `/experiment-plan "idea.md"` | Claim-driven experiment roadmap |
| research-refine | `/research-refine "idea.md"` | Sharpen a vague idea into a problem-anchored proposal |
| formula-derivation | `/formula-derivation "method.md"` | Develop and verify research formulas |
| paper-illustration | `/paper-illustration "paper/"` | Generate figures (Gemini-assisted) |
| grant-proposal | `/grant-proposal "idea.md"` | Draft a grant proposal from a research idea |
## Alternative reviewers

ARIS requires only an OpenAI-compatible endpoint for the reviewer. Set the environment variables:

```bash
# Kimi as reviewer
export LLM_BASE_URL="https://api.moonshot.cn/v1"
export LLM_MODEL="moonshot-v1-128k"
export LLM_API_KEY=$MOONSHOT_API_KEY

# DeepSeek as reviewer
export LLM_BASE_URL="https://api.deepseek.com/v1"
export LLM_MODEL="deepseek-chat"
export LLM_API_KEY=$DEEPSEEK_API_KEY

# MiniMax as reviewer
export LLM_BASE_URL="https://api.minimax.chat/v1"
export LLM_MODEL="abab6.5s-chat"
export LLM_API_KEY=$MINIMAX_API_KEY
```
In `mcp-servers/llm-chat/server.py`, the `LLM_BASE_URL` env var then overrides the OpenAI default. No code changes are needed.
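Before wiring a provider into the MCP config, you can smoke-test the endpoint with the `openai` Python client. This is a minimal sketch that assumes the three env vars above are set and the provider is OpenAI-compatible:

```python
# Quick endpoint smoke test: send one message to the configured reviewer model.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["LLM_BASE_URL"],
    api_key=os.environ["LLM_API_KEY"],
)
resp = client.chat.completions.create(
    model=os.environ["LLM_MODEL"],
    messages=[{"role": "user", "content": "Reply with 'ok' if you can read this."}],
)
print(resp.choices[0].message.content)
```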
## Codex CLI support

ARIS ships a parallel `skills-codex/` directory with the same workflows adapted for the OpenAI Codex CLI:

```bash
# Install Codex CLI
npm install -g @openai/codex

# Run a workflow
codex --skill skills/skills-codex/research-pipeline/SKILL.md \
  "improve contrastive learning in vision transformers"
```
## Cursor and Trae

**Cursor:** see `docs/CURSOR_ADAPTATION.md`. Copy `skills/` to `.cursorrules-skills/` in your project, then invoke `@research-pipeline "your research direction"`.

**Trae:** see `docs/TRAE_ARIS_RUNBOOK_EN.md` for full setup. Trae supports `SKILL.md` natively via its plugin system.
## Templates

Pre-filled templates for every workflow live in `templates/`:

```
templates/
  research-pipeline.md   # Full pipeline input
  idea-discovery.md
  experiment-bridge.md
  literature-review.md
  paper-write.md
  rebuttal.md
  paper-slides.md
  paper-poster.md
```

Use a template:

```
/research-pipeline — template: templates/research-pipeline.md
```
## Repository structure

```
Auto-claude-code-research-in-sleep/
├── skills/
│   ├── research-pipeline/SKILL.md     # Main orchestration workflow
│   ├── idea-discovery/SKILL.md        # Workflow 1
│   ├── experiment-bridge/SKILL.md     # Workflow 1.5
│   ├── literature-review/SKILL.md     # Workflow 2
│   ├── paper-write/SKILL.md           # Workflow 3
│   ├── rebuttal/SKILL.md              # Workflow 4
│   ├── paper-slides/SKILL.md
│   ├── paper-poster/SKILL.md
│   ├── training-check/SKILL.md
│   ├── result-to-claim/SKILL.md
│   ├── ablation-planner/SKILL.md
│   ├── experiment-plan/SKILL.md
│   ├── research-refine/SKILL.md
│   ├── formula-derivation/SKILL.md
│   └── skills-codex/                  # Codex CLI variants
├── mcp-servers/
│   └── llm-chat/                      # OpenAI-compatible reviewer MCP
├── templates/                         # Input templates per workflow
├── docs/
│   ├── CURSOR_ADAPTATION.md
│   ├── TRAE_ARIS_RUNBOOK_EN.md
│   ├── ANTIGRAVITY_ADAPTATION.md
│   ├── MODELSCOPE_GUIDE.md            # Free tier setup
│   ├── MiniMax-GLM-Configuration.md
│   └── CODEX_GEMINI_REVIEW_GUIDE.md
└── README.md
```
/research-pipeline "improve method" — ref paper: https://arxiv.org/abs/2406.04329, base repo: https://github.com/org/repo
ARIS reads the paper → identifies weaknesses → clones repo → generates targeted ideas → runs experiments → writes paper.
Add compact: true to any workflow. ARIS writes a lean SESSION_SUMMARY.md. On resume:
/research-pipeline — resume: SESSION_SUMMARY.md
Already have results? Jump to paper writing:
/paper-write "my_results/ + my_idea.md" — venue: NeurIPS
Already have a paper? Jump to rebuttal:
/rebuttal "paper/ + reviews/" — venue: ICML, character limit: 5000
export LLM_BASE_URL="https://api-inference.modelscope.cn/v1"
export LLM_MODEL="Qwen/Qwen2.5-72B-Instruct"
export LLM_API_KEY=$MODELSCOPE_API_KEY
See docs/MODELSCOPE_GUIDE.md for zero-cost setup.
## Troubleshooting

### Skills not discovered by Claude Code

Ensure `SKILL.md` files are under `.claude/skills/` relative to your project root. Claude Code scans this path at startup.
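A quick way to see what a scanner would find in your layout (a local sanity check, not Claude Code's actual discovery logic):

```python
# List every SKILL.md under .claude/skills/ so you can spot misplaced files.
from pathlib import Path

for skill in sorted(Path(".claude/skills").glob("**/SKILL.md")):
    print(skill)
```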
### MCP reviewer not connecting

```bash
# Test the llm-chat server directly
cd mcp-servers/llm-chat
python server.py --test
# Should print: {"status": "ok", "model": "gpt-4o"}
```
W&B logging not working in experiment-bridge
wandb login # uses WANDB_API_KEY env var, or prompts for manual entry
### Citation hallucination in literature-review

All unverified citations are tagged `[VERIFY]` in the output. Search DBLP manually for flagged entries before including them in your paper. Never remove the `[VERIFY]` tag without confirming the citation.
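To make sure no flagged entries slip into a submission, a small scan over the paper directory helps (a sketch; adjust the `paper` path and extensions to your layout):

```python
# Find any remaining [VERIFY] tags in LaTeX and BibTeX sources.
from pathlib import Path

for src in Path("paper").glob("**/*"):
    if src.suffix in {".tex", ".bib"}:
        for lineno, line in enumerate(src.read_text().splitlines(), 1):
            if "[VERIFY]" in line:
                print(f"{src}:{lineno}: {line.strip()}")
```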
### Rebuttal exceeds character limit

ARIS tracks the character count per section. If a draft exceeds the limit, it automatically trims supporting evidence (keeps claims, removes elaboration). You can also pass `quick mode: true` to get the strategy without the draft, then write targeted sections manually.
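Before pasting into the portal, you can double-check the final count yourself (a trivial sketch; `PASTE_READY.txt` is the output file named above, and `LIMIT` is whatever you passed to `/rebuttal`):

```python
# Verify the paste-ready rebuttal fits the venue's hard character limit.
from pathlib import Path

LIMIT = 5000  # the character limit from your /rebuttal invocation
text = Path("PASTE_READY.txt").read_text()
over = len(text) - LIMIT
print(f"{len(text)} characters", "(OK)" if over <= 0 else f"(over by {over})")
```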
### Cross-model review loop not running (self-review fallback)

If the llm-chat MCP is unreachable, ARIS falls back to single-model review with a warning in the output. Check the MCP server logs:

```bash
tail -f ~/.claude/mcp-logs/llm-chat.log
```
### Session context overflow

Use `compact: true` on any workflow invocation to produce a compressed `SESSION_SUMMARY.md` that fits in a fresh context window.
## Writing custom skills

Every skill is a plain Markdown file. To create a custom skill:

```markdown
# my-custom-skill

## Trigger
When the user says "run my custom analysis"...

## Steps
1. Read input files
2. Call `mcp__llm-chat__chat` with the review prompt
3. Write output to `custom_output/`

## Output
- `custom_output/analysis.md`
- `custom_output/score.json`
```

Save it as `.claude/skills/my-custom-skill/SKILL.md` and Claude Code will discover it automatically.