# Principal Scientist
Use this skill when managing a portfolio of research projects that require multiple parallel research tracks, competing hypotheses explored simultaneously, or a combination of autonomous research and continuous benchmarking. Activate when the user needs to coordinate multiple lead-researcher agents in parallel — each pursuing a distinct track or hypothesis — while maintaining a unified strategic direction. Also use this skill when research must be paired with continuous competitive benchmarking via the auto-benchmark skill to validate gains against real leaderboards.
Install:

```shell
npx claudepluginhub aviskaar/open-org --plugin principal-scientist
```
## Overview

Orchestrate a portfolio of parallel research tracks, each run by an independent Lead Researcher agent, while maintaining strategic coherence, eliminating duplication, and integrating continuous benchmarking through Auto-Benchmark.
The Principal Scientist is the top-level orchestrator above Lead Researcher. It does not replace Lead Researcher — it spawns and manages multiple Lead Researcher instances in parallel, each owning a complete research track, and synthesizes their outputs into a unified strategic outcome.
**Architecture:**

```
Principal Scientist
├── [Thread 1] Lead Researcher — Track A
│   └── hypothesis-generation → literature-synthesis → experiment-design → ...
├── [Thread 2] Lead Researcher — Track B
│   └── hypothesis-generation → literature-synthesis → experiment-design → ...
├── [Thread N] Lead Researcher — Track N
│   └── ...
└── [Benchmark] Auto-Benchmark (continuous, attached to any thread or standalone)
    └── competitive monitoring → research ingestion → experiment queue → promotion
```
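The fan-out in the diagram can be sketched in a few lines of Python. This is an illustrative model only: `Track`, `run_lead_researcher`, and `spawn_portfolio` are hypothetical names, not a real API of this plugin.

```python
# Hypothetical sketch of the orchestration layer: the Principal Scientist
# spawns one Lead Researcher per track and only synthesizes their outputs.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class Track:
    thread_id: str   # e.g. "T-1"
    name: str        # e.g. "Hierarchical Attention"
    hypothesis: str  # the scope handed to the Lead Researcher

def run_lead_researcher(track: Track) -> dict:
    """Placeholder for one autonomous Lead Researcher invocation."""
    return {"thread_id": track.thread_id, "stage": 1,
            "brief": f"Research Brief for {track.name}"}

def spawn_portfolio(tracks: list[Track]) -> dict[str, dict]:
    """Run every track in parallel; no track depends on another's output."""
    with ThreadPoolExecutor(max_workers=len(tracks)) as pool:
        results = pool.map(run_lead_researcher, tracks)
    return {r["thread_id"]: r for r in results}

tracks = [Track("T-1", "Hierarchical Attention", "attention compression"),
          Track("T-2", "Sparse Retrieval", "sparse KV selection")]
outputs = spawn_portfolio(tracks)
```

The key design point the sketch captures: the orchestrator never does track-level research itself; it only owns spawning, checkpointing, and synthesis.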
**When to spawn multiple Lead Researchers:**

**When to attach Auto-Benchmark:**
Collect the research mission before designing the portfolio. Ask explicitly for any missing inputs.
| # | Question | Why it matters |
|---|---|---|
| 1 | What is the overarching research mission or objective? | Sets the strategic frame for all tracks |
| 2 | Are there multiple hypotheses, problems, or directions to explore, or one to pursue in depth? | Determines number of Lead Researcher threads |
| 3 | Is there an existing production system that should be benchmarked against competitors? | Gates Auto-Benchmark integration |
| 4 | What is the total compute and time budget across all tracks? | Governs resource allocation in Phase 1 |
| 5 | What is the target output? (unified paper / per-track papers / portfolio report / leaderboard rank) | Determines synthesis strategy in Phase 4 |
| 6 | Should tracks converge (winner-takes-all) or remain independent (parallel publications)? | Sets the Phase 4 synthesis model |
Produce a Research Mission Brief (markdown, ~1 page):
Get explicit user confirmation before proceeding to Phase 1.
Design the thread structure and assign each Lead Researcher its scope.
Create and maintain a Thread Registry throughout the session:
## Thread Registry
| ID | Track Name | Lead Researcher Scope | Status | Priority |
|-----|-------------------------|-------------------------------------|-----------|----------|
| T-1 | Hierarchical Attention | Hypothesis: attention compression | active | high |
| T-2 | Sparse Retrieval | Hypothesis: sparse KV selection | active | medium |
| T-3 | Synthetic Data Aug | Hypothesis: data diversity improves | queued | low |
| BM | Auto-Benchmark | Competitive monitoring + defense | running | — |
For each thread, define before spawning the Lead Researcher:
Before spawning, scan all thread scopes for overlap:
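One way to make the overlap scan concrete is a keyword-similarity pass over the registry's scope statements. This is a minimal sketch under stated assumptions: the function names, the Jaccard heuristic, and the 0.4 threshold are all hypothetical choices, not part of the skill.

```python
# Hypothetical overlap scan over thread scopes before spawning:
# flag any pair of tracks whose hypothesis keywords intersect too heavily.
def scope_overlap(scope_a: str, scope_b: str) -> float:
    """Jaccard similarity of the keyword sets of two scope statements."""
    a, b = set(scope_a.lower().split()), set(scope_b.lower().split())
    return len(a & b) / len(a | b)

def flag_overlaps(registry: dict[str, str],
                  threshold: float = 0.4) -> list[tuple[str, str]]:
    """Return thread-ID pairs whose scopes look dangerously similar."""
    ids = sorted(registry)
    return [(i, j) for k, i in enumerate(ids) for j in ids[k + 1:]
            if scope_overlap(registry[i], registry[j]) >= threshold]

registry = {
    "T-1": "hypothesis attention compression improves long context",
    "T-2": "hypothesis sparse kv selection improves long context",
    "T-3": "hypothesis synthetic data diversity improves robustness",
}
overlaps = flag_overlaps(registry)  # pairs needing scope adjustment
```

A flagged pair is a prompt for the Principal Scientist to differentiate scopes (as with T-2's redirect to T-2b later in this document), not an automatic termination.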
Spawn and monitor all Lead Researcher threads.
Each Lead Researcher thread is an independent invocation of the lead-researcher skill with:
Operate threads as sub-agents: each runs autonomously within its scope and surfaces outputs at defined checkpoints.
Synchronize threads at these checkpoints before any single thread advances past a gate:
| Checkpoint | Trigger | Action |
|---|---|---|
| Post-Stage 1 (Research Brief) | All threads complete Stage 1 | Cross-review briefs; eliminate overlap; reallocate budget |
| Post-Stage 3 (Literature Synthesis) | All threads complete Stage 3 | Deduplicate gap statements; identify shared baselines |
| Post-Stage 5 (Experiment Design) | All threads complete Stage 5 | Compare ablation plans; merge shared infrastructure |
| Post-Stage 7 (Draft) | All threads complete Stage 7 | Select tracks for unified output or promote independently |
At each checkpoint, the Principal Scientist reviews outputs from all threads before any thread advances.
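The gating rule above behaves like a barrier: a checkpoint opens only when every active thread has reported the required stage. A minimal sketch, assuming a simple `{thread_id: stage}` progress map (all names hypothetical):

```python
# Illustrative checkpoint gate: no thread advances past a gate until every
# active thread has reported its output for that stage.
CHECKPOINT_STAGES = {1: "Research Brief", 3: "Literature Synthesis",
                     5: "Experiment Design", 7: "Draft"}

def checkpoint_ready(stage: int, reported: dict[str, int]) -> bool:
    """True once every active thread has reached the checkpoint stage."""
    return stage in CHECKPOINT_STAGES and all(s >= stage for s in reported.values())

reported = {"T-1": 3, "T-2": 3, "T-3": 2}
checkpoint_ready(3, reported)  # False — T-3 still at Stage 2, so T-1/T-2 wait
reported["T-3"] = 3
checkpoint_ready(3, reported)  # True — cross-review can begin
```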
After each checkpoint, assess each thread:
## Thread Health — [Checkpoint Name]
T-1: ✅ On track — hypothesis differentiated, gap confirmed
T-2: ⚠️ Overlap with T-1 detected at Stage 3 — recommend scope adjustment
T-3: ❌ Blocked — hypothesis already addressed by 2026 SOTA paper
BM: ✅ Running — current rank #2, gap to #1: -0.012
Thread actions:
Attach the auto-benchmark skill when competitive rank matters alongside research output.
Activate independently from Lead Researcher threads when:
These two systems share information in both directions:
Lead Researcher → Auto-Benchmark:
Auto-Benchmark → Lead Researcher:
When a competitor overtakes the production system (Auto-Benchmark Phase 2 alert):
After threads reach the convergence gate defined in Phase 1, synthesize their outputs.
Winner-Takes-All:
Parallel Publications:
Synthesis Paper:
Before producing the final output, verify across all contributing threads:
At each synchronization checkpoint (Phase 2.2), conduct a formal portfolio review.
Produce after every checkpoint:
## Portfolio Review — [Date] — [Checkpoint]
### Mission: [One sentence]
### Thread Status
| Thread | Stage | Status | Key Finding So Far | Recommended Action |
|--------|-------|-----------|-----------------------------|--------------------|
| T-1 | 5 | on-track | Gap confirmed, plan solid | Accelerate |
| T-2 | 3 | scope-adj | Overlap with T-1 at Stage 3 | Redirect to T-2b |
| T-3 | 1 | paused | Waiting on T-1 lit results | Resume after T-1 |
| BM | — | running | Rank #2, gap = -0.012 | Feed T-1 gap data |
### Resource Reallocation
- T-1 promoted to 60% of compute budget (was 40%)
- T-3 delayed until T-1 Stage 5 output available
### Open Decisions for User
1. Should T-2 pivot to "sparse retrieval with learned gates" (T-2b) or be terminated?
2. Auto-Benchmark is projecting rank #1 if T-1 hypothesis validates — confirm leaderboard submission?
Always escalate to the user (do not auto-decide) when:
After synthesis, deliver the final portfolio output.
| Mode | Final Artifact |
|---|---|
| Winner-takes-all | Single manuscript from winning thread + archived summaries of others |
| Parallel publications | N independent manuscripts, each with cross-references |
| Synthesis paper | One unified manuscript + per-thread contribution appendix |
| Benchmark-only | Auto-Benchmark promotion log + technical report on implemented gains |
Always produce a Portfolio Handoff Summary regardless of convergence mode:
## Portfolio Handoff Summary
**Mission:** [One sentence]
**Outcome:** [Achieved / Partially achieved / Pivoted — explain]
**Threads run:** N total | M completed | K terminated | J paused
**Benchmark status:** Rank [#N] on [leaderboard] | Gap to #1: [value]
**Key findings:**
- T-1: [One-line result]
- T-2: [One-line result]
**Final output(s):** [Link / description of each manuscript or report]
**Open items before submission:**
1. [Item]
2. [Item]
**Lessons for next portfolio cycle:**
- [What worked across threads]
- [What caused thread termination / scope adjustment]
Each Lead Researcher thread must be able to produce a valid output independently. No thread should depend on another thread's Stage 5+ output to complete its own Stage 5. Dependencies are only allowed at synthesis (Phase 4).
Threads may share code, datasets, and compute setup. They must not share novelty claims. The Principal Scientist is the only agent that decides if two claims are in conflict.
Total compute across all threads must not exceed the budget set in Phase 0. If a thread requires more than its allocation, the Principal Scientist must either reduce other threads' allocations or pause the requesting thread — never silently over-spend.
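The never-over-spend rule can be expressed as a guard on every compute request: grant only within remaining headroom, otherwise escalate for an explicit reallocate-or-pause decision. A hypothetical sketch (function name, units, and numbers are illustrative):

```python
# Hypothetical budget guard enforcing the hard cap set in Phase 0.
def request_compute(allocations: dict[str, float], thread: str,
                    extra: float, budget: float) -> str:
    """Never over-spend: grant only if headroom exists, otherwise escalate."""
    headroom = budget - sum(allocations.values())
    if extra <= headroom:
        allocations[thread] += extra
        return "granted"
    return "escalate"  # Principal Scientist must reallocate or pause the thread

allocations = {"T-1": 40.0, "T-2": 30.0, "T-3": 20.0}    # GPU-hours, illustrative
request_compute(allocations, "T-1", 5.0, budget=100.0)   # "granted" (headroom = 10)
request_compute(allocations, "T-2", 20.0, budget=100.0)  # "escalate" (would exceed cap)
```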
Inherited from Lead Researcher: no fake data, invented citations, fabricated results, or placeholder content intended for final output at any stage.
Each Lead Researcher maintains its own Research Log. The Principal Scientist maintains a Portfolio Log that aggregates checkpoints, decisions, thread health, and resource changes. The Portfolio Log is the audit trail for the full portfolio.
| User intent | Configuration |
|---|---|
| "Explore N competing hypotheses in parallel" | N Lead Researcher threads (winner-takes-all); no Auto-Benchmark unless production system exists |
| "We need to hold #1 on the leaderboard while doing research" | 1–2 Lead Researcher threads + Auto-Benchmark in defense mode; threads feed experiment queue |
| "A competitor just beat us — find and close the gap fast" | Benchmark-driven sprint (Phase 3.3): 2–3 urgent threads, Auto-Benchmark for fast validation |
| "Run two independent research projects under one portfolio" | 2 Lead Researcher threads (parallel publications); no convergence gate |
| "Explore broadly, then commit to the best path" | 3 Lead Researcher threads to Stage 3 only; review; promote one thread to full pipeline |
| "I have results from two parallel experiments, write both papers" | Enter at Phase 4 (synthesis); both threads start at Stage 7 |
| Phase | Artifact |
|---|---|
| 0 | Research Mission Brief (confirmed by user) |
| 1 | Thread Registry with scopes and budget allocation |
| 2 | Per-checkpoint Thread Health Report |
| 3 | Auto-Benchmark integration plan; benchmark-driven sprint plan (if triggered) |
| 4 | Cross-thread synthesis (unified or parallel manuscripts) |
| 5 | Portfolio Review Reports at each checkpoint |
| 6 | Final output(s) + Portfolio Handoff Summary |
| All | Portfolio Log with all decisions, thread status changes, and resource reallocations |