deep-research
A Claude Code plugin for rigorous web research using three independent LLM
families as cross-checkers, worktree-sandboxed for security, and
primary-source verified via mechanical URL + passage checks.
The single goal: produce research output you can actually trust to make
decisions on — by hardening against the failure modes single-pipeline LLM
research is known to produce (consensus hallucination, fabricated citations,
prompt injection from web content, CLI/API confabulation).
/research <your question>
What it does
/research spawns one researcher agent in worktree isolation. The agent
then runs three parallel research streams:
- Claude (you) — WebSearch + WebFetch
- Gemini 3.1 Pro — via the
agy (Antigravity)
CLI, sandboxed (--sandbox)
- GPT-5 — via OpenAI's
codex CLI,
sandboxed (-s read-only)
Each stream uses a different web-search backend (Google Search inside Gemini,
OpenAI's web tool inside Codex, Claude's WebSearch). After all three return,
the researcher triages four lists — Agreements / Gemini-only / GPT-only /
Conflicts — and resolves each one via primary-source verification before
including it in the final report.
The defense model is evidence weight > vote count. One cross-checker with
a primary-source URL that passes mechanical verification (URL resolves 2xx +
quoted passage actually appears in the fetched page) beats a two-way LLM
consensus with no source. Why this matters is documented in the literature
on consensus hallucination — when training corpora overlap heavily, LLM
agreement-without-evidence is a weaker signal than minority-dissent-with-
evidence (Shared Imagination, arXiv:2407.16604;
Till et al. 2025).
Defenses
| Threat | Defense |
|---|
| Consensus hallucination (overlapping LLM training corpora produce confident agreement on false facts) | 3 different LLM families with different search backends + primary-source verification on every retained claim |
| Fabricated citations (LLM cites a paper / URL that doesn't exist or doesn't say what was claimed) | Rule C: curl -sIL HTTP code check on every URL retained + passage match (re-fetch the URL, confirm quoted value appears verbatim) |
| CLI / API / config-syntax confabulation (LLM "remembers" a flag that doesn't exist) | Rule B: empirical <cmd> --help or vendor-doc fetch before claiming a flag exists |
| Prompt injection from web content | Output wrapped in <external-research trust="untrusted"> tags before main session processes it; researcher is reminded that fetched HTML is data, not instructions; web-suggested shell commands never executed |
| Code execution from compromised pages | Worktree isolation on the researcher (sandboxed copy of repo); agy --sandbox enables kernel-level terminal restrictions; codex -s read-only enforced via macOS Seatbelt / Linux bubblewrap |
$TMPDIR race condition between concurrent /research sessions | Each researcher generates a unique RUN_ID via date +%s%N and namespaces all tempfile paths with it |
Installation
This is a local Claude Code plugin. Drop it under your plugins directory and
enable it.
# Clone into local plugins
mkdir -p ~/.claude/plugins/local/plugins
cd ~/.claude/plugins/local/plugins
git clone https://github.com/oh-rid/deep-research.git
Restart Claude Code (or reload plugins) and /research becomes available.
Requirements
The plugin works in degraded mode without optional cross-checkers, but for
full 3-way triangulation you need:
-
agy (Antigravity CLI) — runs Gemini models via Google AI Pro.
Install via Homebrew or the official installer. Model is set in
~/.gemini/antigravity-cli/settings.json ("model": "Gemini 3.1 Pro (High)"
is the recommended choice; Flash variants may return empty on long
academic prompts).
-
codex (OpenAI Codex CLI) — runs GPT-5. Web search must be enabled
in ~/.codex/config.toml:
[tools.web_search]
enabled = true
If either CLI is missing the researcher empirically detects it
(which agy / which codex) and proceeds with the remaining
cross-checker plus its own WebSearch.
Usage
/research what's the current academic consensus on r-star (natural rate of interest)?
/research is the Brand-Goy-Lemke SUERF Policy Brief 1360 about the US or the euro area?
/research what did the latest FOMC minutes say about rate hikes, and how did the curve react?
The slash command spawns the researcher, wraps the result in an
untrusted-source block, and presents a structured final answer to you with: