Set up a GPU sandbox for interpretability research
Sets up a GPU sandbox for interpretability research experiments with Jupyter notebooks.
```
/plugin marketplace add ajobi-uhc/seer-claude-plugin
/plugin install seer@seer-local
```

You are helping the user set up an interpretability research experiment.
IMPORTANT: Consult the seer skill for the complete API reference before writing any setup.py code, and even before asking the user what they want to do. The skill contains critical information about SandboxConfig, ModelConfig, and all available parameters.
First, understand what they're investigating. Ask them to describe their research goal.
If they just ran /seer:setup without context, ask:
What are you trying to investigate?
Listen to their description and make sure you understand the research question before proposing a setup. Don't jump to model/GPU questions yet.
Based on their research, figure out:
- Model: which model(s) they need, since this drives the GPU choice (see the sizing table below)
- Setup type: Notebook Sandbox or Scoped Sandbox
- Repos: whether any external repos need to be cloned and installed
Ask clarifying questions only if needed. If their description is clear, just confirm your understanding and proceed.
Create a directory structure:
```
experiments/<experiment-name>/
├── setup.py   # Sandbox setup script
└── task.md    # Research description
```
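The layout above can be scaffolded with a few lines of Python. This is a hypothetical convenience helper, not part of the seer API; the file names simply mirror the structure in this guide.

```python
from pathlib import Path

def scaffold_experiment(name: str, root: str = "experiments") -> Path:
    """Create experiments/<name>/ with empty setup.py and task.md."""
    exp_dir = Path(root) / name
    exp_dir.mkdir(parents=True, exist_ok=True)
    (exp_dir / "setup.py").touch(exist_ok=True)  # sandbox setup script
    (exp_dir / "task.md").touch(exist_ok=True)   # research description
    return exp_dir
```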
Document their research:
```markdown
# <Experiment Name>

## Research Question
<What they told you they're investigating>

## Approach
<Brief description of the technical approach>

## Setup
- Model: <model>
- GPU: <gpu>
- Type: Notebook / Scoped Sandbox

## Notes
<Any specific details they mentioned>
```
IMPORTANT: Consult the seer skill for exact API details, GPU options, and parameter specifications before writing this code.
For Notebook Sandbox (most common):
```python
from src.environment import Sandbox, SandboxConfig, ExecutionMode, ModelConfig
from src.workspace import Workspace
from src.execution import create_notebook_session
import json

config = SandboxConfig(
    execution_mode=ExecutionMode.NOTEBOOK,
    gpu="A100-40GB",  # or A100-80GB for larger models
    models=[ModelConfig(name="google/gemma-2-9b-it")],
    python_packages=["torch", "transformers", "accelerate", "matplotlib", "numpy"],
)

sandbox = Sandbox(config).start()
session = create_notebook_session(sandbox, Workspace())

print(json.dumps({
    "session_id": session.session_id,
    "jupyter_url": session.jupyter_url,
}))
```
For Scoped Sandbox (RPC interface):
```python
from src.environment import ScopedSandbox, SandboxConfig, ModelConfig
from src.workspace import Workspace
from src.execution import create_local_session
import json

config = SandboxConfig(
    gpu="A100-40GB",
    models=[ModelConfig(name="google/gemma-2-9b")],
    python_packages=["torch", "transformers"],
)

scoped = ScopedSandbox(config)
scoped.start()

interface_lib = scoped.serve(
    "interface.py",
    expose_as="library",
    name="model_tools",
)

print(json.dumps({"status": "ready", "interface": "model_tools"}))
```
If using a scoped sandbox, also create interface.py containing the functions they need.
IMPORTANT: Only now do you run the script.
```shell
cd experiments/<experiment-name>
uv run python setup.py
```
This takes 3-5 minutes the first time. Wait for the JSON output.
Only after Step 5 completes, parse the JSON and call:
```
attach_to_session(session_id="<from output>", jupyter_url="<from output>")
```
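Since setup.py may print log lines before its final JSON blob, a small sketch of extracting the two fields before calling attach_to_session (the field names match the print() calls in setup.py; the helper itself is hypothetical):

```python
import json

def parse_setup_output(stdout: str) -> dict:
    """Extract session info from setup.py's final JSON line."""
    # The JSON blob is the last non-empty line of the script's output.
    last_line = [ln for ln in stdout.splitlines() if ln.strip()][-1]
    info = json.loads(last_line)
    return {"session_id": info["session_id"], "jupyter_url": info["jupyter_url"]}
```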
Tell the user: the sandbox is ready, the model and tokenizer are pre-loaded, and they can now use execute_code() to run experiments.
| Model Size | GPU | Notes |
|---|---|---|
| 7B | "A10G" or "L4" | Cheapest option |
| 9B-13B | "A100-40GB" | Good default |
| 30B+ | "A100-80GB" | Need 80GB VRAM |
| 70B+ | "H100" or gpu="A100-80GB", gpu_count=2 | Multi-GPU |
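The sizing table can be encoded as a small helper that returns SandboxConfig-style kwargs. The thresholds are an assumption read off the table (which leaves the 13B-30B range implicit), so treat this as a rough default, not a definitive rule:

```python
def pick_gpu(params_b: float) -> dict:
    """Suggest a GPU for a model of `params_b` billion parameters."""
    if params_b <= 7:
        return {"gpu": "A10G", "gpu_count": 1}        # cheapest option (or "L4")
    if params_b <= 13:
        return {"gpu": "A100-40GB", "gpu_count": 1}   # good default
    if params_b < 70:
        return {"gpu": "A100-80GB", "gpu_count": 1}   # needs 80GB VRAM
    return {"gpu": "A100-80GB", "gpu_count": 2}       # multi-GPU (or "H100")
```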
Private/gated models:
```python
secrets=["HF_TOKEN"]
```
With external repos:
```python
repos=[RepoConfig(url="org/repo", install=True)]
```
Multiple models:
```python
models=[
    ModelConfig(name="model-a", var_name="model_a"),
    ModelConfig(name="model-b", var_name="model_b"),
]
```