Generate hypotheses from accumulated knowledge, create a Research PRD, convert to JSON, and prepare for Ralph Loop execution.
Installation:
/plugin marketplace add hdubey-debug/orion
/plugin install hdubey-debug-orion@hdubey-debug/orion
/hypothesis-generation [--target <score>] [--benchmark <name>]
/hypothesis-generation
/hypothesis-generation --target 75.0 --benchmark VideoMME
IMPORTANT: This command MUST use Plan Mode. Create a plan first, get user approval, then execute.
When user invokes /hypothesis-generation, follow this process:
Use EnterPlanMode, then read accumulated skills:
# Literature knowledge
cat research/skills/literature/_overview.md
# Domain knowledge
cat research/skills/domain/_overview.md
# Benchmark knowledge
cat research/skills/benchmarks/_overview.md
# Any previous learnings
cat research/skills/learned/_overview.md
# Current project state
cat research/orion.json
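Before synthesizing, it helps to confirm these knowledge files actually exist; a minimal check, assuming the standard research/skills layout created by the earlier knowledge-gathering commands:
```bash
# Flag any knowledge area that has not been captured yet (paths are the assumed defaults).
for area in literature domain benchmarks learned; do
  [ -f "research/skills/$area/_overview.md" ] || echo "Missing: research/skills/$area/_overview.md"
done
```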
Create a synthesis of all knowledge:
## Knowledge Synthesis
### From Literature
- Method A from P001: [key insight]
- Method B from P002: [key insight]
- Gap identified: [what papers don't solve]
### From Domain Expertise
- User intuition: [relevant ideas]
- Constraints: [what limits our approach]
### From Benchmarks
- Target: [benchmark] at [score]
- Current SOTA: [method] at [score]
- Gap to close: [X points]
### Promising Directions
1. [Direction 1] - combines [Method A] with [Domain insight]
2. [Direction 2] - addresses [gap] using [technique]
3. [Direction 3] - [reasoning]
Based on synthesis, propose 5-10 hypotheses:
## Proposed Hypotheses
### H-001: [Title]
- **Rationale**: [Why this might work - cite knowledge source]
- **From**: Literature (P001) + Domain intuition
- **Implementation**: [Specific changes needed]
- **Estimated Impact**: [Low/Medium/High]
- **Complexity**: [Low/Medium/High]
- **Priority**: [Score = Impact score × inverted Complexity score]
### H-002: [Title]
...
### Priority Order
| Rank | ID | Title | Impact | Complexity | Priority |
|------|-----|-------|--------|------------|----------|
| 1 | H-003 | [Title] | High | Low | 9 |
| 2 | H-001 | [Title] | High | Medium | 6 |
| 3 | H-002 | [Title] | Medium | Low | 6 |
Generated [N] hypotheses from accumulated knowledge:
1. H-001: [Title] (Priority: High)
Rationale: [Brief]
2. H-002: [Title] (Priority: Medium)
Rationale: [Brief]
...
Options:
A. Accept all hypotheses as proposed
B. I want to modify/reorder (will show editor)
C. Add my own hypothesis
D. Regenerate with different focus
Which option?
If the user wants to modify (Option B): show the hypothesis list for editing, apply the requested changes and reordering, and confirm the updated priorities.
If the user wants to add (Option C): capture the user's hypothesis, assign it the next H-ID, and score its impact and complexity alongside the others.
Use ExitPlanMode after finalizing the hypothesis list.
Create research/research-prd.md:
# Research PRD: [Project Name]
## 1. Objective
**Goal**: [Primary research objective]
**Target**: Beat [benchmark] score of [X] (current SOTA: [Y])
**Success Metric**: [metric] > [target]
## 2. Background
### Literature Summary
[Key methods and insights from papers]
### Domain Context
[Relevant domain knowledge]
### Current State
- Baseline: [method] at [score]
- Gap: [X points]
## 3. Hypotheses (User Stories)
### H-001: [Title]
**As a** researcher
**I want to** [implement hypothesis]
**So that** [expected improvement]
**Rationale**: [Why this should work]
**Source**: [Literature/Domain/Intuition]
**Acceptance Criteria**:
- [ ] Implementation complete
- [ ] Subset test shows improvement (>= baseline)
- [ ] Full benchmark run if subset promising
- [ ] Results documented
- [ ] Typecheck passes (if code changes)
**Priority**: 1
**Estimated Complexity**: [Low/Medium/High]
### H-002: [Title]
...
## 4. Evaluation Plan
### Benchmarks
| Benchmark | Metric | Subset Size | Full Size |
|-----------|--------|-------------|-----------|
| [Name] | [Metric] | [N] | [M] |
### Testing Protocol
1. Run subset test (10% data)
2. If subset >= baseline: run full benchmark
3. If subset < baseline: analyze failure, skip or iterate
### Success Criteria
- **Hypothesis passes**: Full benchmark > baseline
- **Project succeeds**: Full benchmark >= target
## 5. Codebase Setup
**Repository**: [URL or path]
**Branch Strategy**:
- `main`: Stable baseline
- `orion/hXXX-name`: Per-hypothesis experiments
## 6. Non-Goals
- [What we're NOT trying to do]
- [Scope boundaries]
## 7. Risks
- [Risk 1]: [Mitigation]
- [Risk 2]: [Mitigation]
## 8. Timeline
[Not time-based, but order of operations]
1. Test H-001
2. If pass, merge; if fail, analyze
3. Test H-002
4. Continue until target or exhausted
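The subset-gating rule in the Evaluation Plan above can be expressed as a small script. This is only a sketch: `run_eval.py` and its flags are hypothetical stand-ins for whatever evaluation entry point the codebase provides, and the comparison assumes higher scores are better.
```bash
# Hypothetical evaluation script and flags; BASELINE_SCORE is set elsewhere.
subset_score=$(python run_eval.py --subset 0.1)
if awk -v s="$subset_score" -v b="$BASELINE_SCORE" 'BEGIN { exit !(s >= b) }'; then
  python run_eval.py --full    # subset matched or beat the baseline: worth a full run
else
  echo "Subset below baseline: analyze the failure before spending a full benchmark run"
fi
```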
Create research/research-prd.json:
{
"project": "[project-name]",
"branchName": "orion/research",
"description": "[Research goal]",
"target": {
"benchmark": "[benchmark]",
"metric": "[metric]",
"score": [target-score]
},
"baseline": {
"method": "[method]",
"score": [baseline-score]
},
"userStories": [
{
"id": "H-001",
"title": "[Hypothesis title]",
"description": "As a researcher, I want to [hypothesis] so that [benefit]",
"rationale": "[Why this should work]",
"source": "[Literature/Domain/Intuition]",
"implementation": "[Specific changes]",
"acceptanceCriteria": [
"Implementation complete",
"Subset test shows improvement (>= baseline)",
"Full benchmark run if subset promising",
"Results documented in skills/learned/",
"Typecheck passes"
],
"priority": 1,
"complexity": "medium",
"status": "pending",
"subset_result": null,
"full_result": null,
"analysis": "",
"branch": "orion/h001-[slug]"
}
],
"completed": false,
"best_result": null
}
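Before handing the file to the Ralph Loop, a quick structural check can catch malformed JSON early; a minimal sketch, assuming `jq` is available:
```bash
# Fails (non-zero exit) if the file is invalid JSON or contains no hypotheses.
jq -e '.userStories | length > 0' research/research-prd.json >/dev/null \
  && echo "research-prd.json looks well-formed" \
  || echo "research-prd.json is missing, invalid, or has no user stories"
```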
Update research/orion.json to record the new phase state:
{
"phases": {
"hypothesis_generation": "complete",
"experimentation": "ready"
},
"hypotheses": [...],
"target": {
"benchmark": "[benchmark]",
"score": [target]
}
}
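One way to record that phase change without hand-editing the file; a sketch assuming `jq` is installed (the exact orion.json schema comes from the earlier Orion commands):
```bash
# Mark hypothesis generation complete and experimentation ready, writing back via a temp file.
jq '.phases.hypothesis_generation = "complete" | .phases.experimentation = "ready"' \
  research/orion.json > research/orion.json.tmp && mv research/orion.json.tmp research/orion.json
```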
Research PRD Generated!
Target: [benchmark] >= [target] (baseline: [baseline])
Hypotheses ready for testing:
| Priority | ID | Title | Complexity |
|----------|-----|-------|------------|
| 1 | H-001 | [Title] | Medium |
| 2 | H-002 | [Title] | Low |
| 3 | H-003 | [Title] | High |
Files created:
- research/research-prd.md (human readable)
- research/research-prd.json (for Ralph Loop)
Ready to start experiments!
Next steps:
1. Set up the codebase: /orion-setup <repo-url>
2. Or start directly: /ralph-loop research/research-prd.json
Recommended command:
/ralph-loop "Test hypotheses in research/research-prd.json. For each: implement, subset test, full test if promising, document learnings. Output <promise>RESEARCH_COMPLETE</promise> when target achieved or all hypotheses tested." --max-iterations 50 --completion-promise "RESEARCH_COMPLETE"
Priority = Impact score × Complexity score, with Complexity scored inversely so that simpler work ranks higher (worked example below).
Impact:
- High (3): Could achieve the target alone
- Medium (2): Meaningful improvement expected
- Low (1): Incremental improvement
Complexity (scored inversely):
- Low (3): Config change or small code edit
- Medium (2): New component or significant changes
- High (1): Major architecture change
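As a concrete check of the scoring, a minimal sketch using the assumed mapping above:
```bash
# H-003 from the example table: High impact (3) x Low complexity (inverted score 3) = 9
impact=3
complexity_inverted=3
echo "Priority: $(( impact * complexity_inverted ))"   # prints 9
```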
After this command, run:
/ralph-loop "Implement research hypotheses from research/research-prd.json..." --max-iterations 50
Ralph will: