Multi-LLM research with deep research, second opinions, and multi-model debate.
npx claudepluginhub beorn/bearly
This skill uses the workspace's default tool permissions.
Included skills:
- Runs a multi-LLM council for adversarial debate, critique, and synthesis on implementation, architecture, review, security, research, and planning tasks via the llm-council CLI.
- Runs 3 AI models in parallel (gpt-5.2-pro, gemini-3-pro-preview, claude-opus-4-5-20251101) for diverse perspectives on code queries. Invoke via /ask-council, or let it auto-activate.
- Orchestrates parallel analysis of coding problems across AI models (Claude, GPT, Gemini, Grok) via CLI tools or APIs, collects recommendations, and synthesizes an optimal solution.
Run from your project:
bun vendor/bearly/tools/llm.ts "question"
# Standard question (~$0.02)
bun llm "What is the capital of France?"
# Deep research with web search (~$2-5)
bun llm --deep -y "Best practices for TUI testing in 2026"
# Second opinion from a different provider (~$0.02)
bun llm opinion "Is my caching approach reasonable?"
# Multi-model debate with synthesis (~$1-3)
bun llm debate -y "Monorepo vs polyrepo for our use case?"
# Quick/cheap model (~$0.01)
bun llm quick "What port does postgres use?"
# Explicit context
bun llm --deep -y --context "relevant code or info" "topic"
# Context from file
bun llm --deep -y --context-file ./src/module.ts "Review this code"
# Include session history
bun llm --deep -y --with-history "topic"
Response is ALWAYS written to a file. JSON metadata goes to stdout (single line):
{
  "query": "What is the capital of France?",
  "file": "/tmp/llm-abc12345-1738800000000-x1y2.txt",
  "chars": 5432,
  "model": "GPT-5.2",
  "tokens": 1234,
  "cost": "$0.02",
  "durationMs": 3200
}
The file path is also printed on stderr: Output written to: <path>.
Streaming tokens go to stderr ONLY if it's a TTY (interactive terminal).
In background tasks, stderr is quiet — just the file path line. No truncation risk.
Read the output file with Read tool. Stale files (>7 days) are auto-cleaned on next run.
Standard questions are fast enough to run synchronously. Stdout contains the JSON with the file path:
bun llm "question"
# stdout = JSON with "file" key — Read the file
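A minimal sketch of consuming that contract from a Bun script (the JSON shape is taken from the example above; error handling omitted):
import { $ } from "bun";
// Run a standard question; stdout is the single-line JSON metadata.
const stdout = await $`bun llm "What is the capital of France?"`.text();
const meta = JSON.parse(stdout.trim()); // { query, file, chars, model, tokens, cost, durationMs }
// The response itself is always in the file, never on stdout.
const answer = await Bun.file(meta.file).text();
console.log(`${meta.model} (${meta.cost}):`, answer);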
Deep research takes 2-15 minutes. Never poll output files manually (sleep + read loops waste turns). Use the Task tool with run_in_background=true, then TaskOutput with block=true:
# Step 1: Launch background task
Task(subagent_type="Bash", run_in_background=true,
prompt='bun llm --deep -y "topic"')
→ Returns task_id
# Step 2: Do other work while it runs...
# Step 3: Block-wait for completion (up to 10 min)
TaskOutput(task_id=<id>, block=true, timeout=600000)
# Step 4: Find the output file
# Look for "Output written to: /tmp/llm-*.txt" in the last lines.
# If truncated (deep research streams thousands of tokens to stderr):
# ls -lt /tmp/llm-*.txt | head -1
# Read the OUTPUT FILE — NOT the task output (which is just streaming tokens).
CRITICAL: Background task output captures stderr (streaming tokens) + stdout (JSON). For deep research this can exceed 30KB, causing Claude Code to truncate it. The actual response is ALWAYS in the output file. Read the file, not the task output.
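The same pattern also works from a plain script, where stderr is not a TTY and therefore stays quiet. A sketch, assuming only the stdout/stderr behavior documented above:
// Spawn a deep-research run; pipe stdout (the JSON line), ignore stderr
// (in non-TTY mode it carries only the "Output written to:" line).
const proc = Bun.spawn(["bun", "llm", "--deep", "-y", "topic"], {
  stdout: "pipe",
  stderr: "ignore",
});
const meta = JSON.parse(await new Response(proc.stdout).text());
const report = await Bun.file(meta.file).text(); // the full response lives here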
Anti-pattern — do NOT do this:
# BAD: Manual polling wastes 5+ turns on sleep/read cycles
Bash("sleep 30 && wc -l /tmp/output")
Read("/tmp/output")
Bash("sleep 30 && wc -l /tmp/output") # still not done...
Multi-model debate is similar to deep research timing-wise; use the same TaskOutput pattern for background execution.
If a run is interrupted, recover partial responses:
bun llm recover # List incomplete responses
bun llm recover <id> # Retrieve by ID from OpenAI
bun llm partials --clean # Clean up old partial files
Set API keys for the providers you use:
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_GENERATIVE_AI_API_KEY="..."
export XAI_API_KEY="..."
export PERPLEXITY_API_KEY="pplx-..."
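A small preflight sketch that warns about unset keys (names taken from the list above; which commands need which provider is not specified here, so it only warns rather than failing):
// Warn, don't fail: not every command needs every provider.
const keys = [
  "OPENAI_API_KEY",
  "ANTHROPIC_API_KEY",
  "GOOGLE_GENERATIVE_AI_API_KEY",
  "XAI_API_KEY",
  "PERPLEXITY_API_KEY",
];
for (const k of keys) {
  if (!process.env[k]) console.warn(`${k} is not set`);
}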