Interactive algorithm selection wizard - match problem characteristics to the right RL algorithm
Interactive wizard that diagnoses your RL problem and recommends the optimal algorithm based on action space, data regime, and sample efficiency constraints.
Install via:

```
/plugin marketplace add tachyon-beep/skillpacks
/plugin install yzmir-deep-rl@foundryside-marketplace
```
## Required Questions

You MUST ask these questions before recommending an algorithm.

### Question 1: Action Space

Ask: "What actions can your agent take?"
| Answer | Implication |
|---|---|
| Discrete (buttons, menu choices, grid moves) | DQN family OR PPO |
| Continuous (joint angles, forces, steering) | SAC, TD3, or PPO |
| Mixed (some discrete, some continuous) | PPO (handles both) |
**Critical Rule:** NEVER recommend DQN for continuous actions; DQN's max over Q-values requires enumerating a finite action set.
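To answer this question programmatically, here is a minimal sketch assuming a Gymnasium-style environment (the env IDs are standard Gymnasium examples, not part of this skill pack):

```python
import gymnasium as gym

def classify_action_space(env_id: str) -> str:
    """Sketch: map an environment's action space to the wizard's categories."""
    env = gym.make(env_id)
    space = env.action_space
    env.close()
    if isinstance(space, gym.spaces.Discrete):
        return "discrete"      # DQN family or PPO
    if isinstance(space, gym.spaces.Box):
        return "continuous"    # SAC, TD3, or PPO -- never DQN
    return "mixed/other"       # Dict/Tuple spaces: PPO is the safe default

print(classify_action_space("CartPole-v1"))   # discrete
print(classify_action_space("Pendulum-v1"))   # continuous
```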
Ask: "Can your agent interact with the environment during training, or do you have a fixed dataset?"
| Answer | Implication |
|---|---|
| Online (agent interacts, tries actions) | Standard algorithms (DQN, PPO, SAC) |
| Offline (fixed dataset, no interaction) | CQL, IQL (offline-rl) - standard algorithms FAIL |
**Critical Rule:** Offline data requires special algorithms. DQN/PPO/SAC will fail: PPO assumes on-policy data, and DQN/SAC bootstrap value estimates on actions the dataset never covers, errors that are never corrected without new interaction.
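A minimal sketch of the practical difference, assuming Gymnasium; the "logged" dataset here is simulated purely for illustration:

```python
import gymnasium as gym

# Online regime: the learner chooses actions and receives fresh transitions.
env = gym.make("CartPole-v1")
obs, _ = env.reset(seed=0)
transitions = []
for _ in range(200):
    action = env.action_space.sample()   # stand-in for the learner's policy
    next_obs, reward, terminated, truncated, _ = env.step(action)
    transitions.append((obs, action, reward, next_obs, terminated))
    obs = next_obs
    if terminated or truncated:
        obs, _ = env.reset()
env.close()

# Offline regime: pretend the list above was logged by a production policy.
# The learner may only iterate this fixed dataset -- it can never call
# env.step() to probe actions outside the logged distribution, which is why
# conservative algorithms (CQL, IQL) are needed instead of DQN/PPO/SAC.
offline_dataset = list(transitions)
```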
Ask: "How many environment interactions can you afford?"
| Answer | Implication |
|---|---|
| Unlimited (fast simulator) | PPO (simple, stable) |
| Limited (<100k steps, expensive sim) | SAC (off-policy, sample efficient) |
| Very limited (<10k, real robot) | Model-based RL (MBPO, Dreamer) |
Ask: "Any special requirements?"
| Requirement | Algorithm |
|---|---|
| Multiple agents | QMIX, MADDPG (multi-agent-rl) |
| Sparse rewards | Add curiosity/RND (exploration-strategies) |
| Need interpretability | Prefer simpler methods (DQN, REINFORCE) |
| Must be deterministic | TD3 (deterministic policy) |
## Decision Tree

```
START
│
├─ Offline data only?
│   └─ YES → CQL or IQL (offline-rl)
│
├─ Continuous actions?
│   ├─ YES + sample efficiency critical → SAC
│   ├─ YES + stability critical → TD3
│   └─ YES + simplicity preferred → PPO
│
├─ Discrete actions?
│   ├─ Small action space (<100) → DQN, Double DQN
│   └─ Large action space → PPO
│
├─ Multi-agent?
│   ├─ Cooperative → QMIX, COMA
│   └─ Competitive/Mixed → MADDPG
│
└─ Extreme sample efficiency needed?
    └─ YES → Model-based (MBPO, Dreamer)
```
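The same routing logic as a rough Python sketch (the function and parameter names are illustrative, not part of the skill pack's API):

```python
def recommend_algorithm(offline_only, continuous_actions, action_space_size=None,
                        sample_budget="unlimited", multi_agent=None,
                        priority="simplicity"):
    """Rough encoding of the decision tree above.

    sample_budget: "unlimited" | "limited" | "very_limited"
    multi_agent:   None | "cooperative" | "competitive"
    priority:      "simplicity" | "sample_efficiency" | "stability"
    """
    if offline_only:
        return "CQL or IQL (offline-rl)"
    if multi_agent == "cooperative":
        return "QMIX or COMA (multi-agent-rl)"
    if multi_agent == "competitive":
        return "MADDPG (multi-agent-rl)"
    if sample_budget == "very_limited":
        return "Model-based RL (MBPO, Dreamer)"
    if continuous_actions:
        if priority == "sample_efficiency":
            return "SAC"
        if priority == "stability":
            return "TD3"
        return "PPO"
    # Discrete actions
    if action_space_size is not None and action_space_size < 100:
        return "DQN / Double DQN"
    return "PPO"

# Example: robot arm (continuous), expensive simulator, online interaction.
print(recommend_algorithm(offline_only=False, continuous_actions=True,
                          sample_budget="limited", priority="sample_efficiency"))  # SAC
```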
## Output Format

After gathering the answers, present the recommendation in this format:

```markdown
## Algorithm Recommendation
### Problem Characteristics
- Action space: [discrete/continuous]
- Data regime: [online/offline]
- Sample budget: [unlimited/limited/very limited]
- Special requirements: [none/multi-agent/sparse rewards/etc.]
### Recommended Algorithm: [NAME]
**Why this algorithm:**
- [Reason 1 based on action space]
- [Reason 2 based on data regime]
- [Reason 3 based on constraints]
**Alternatives to consider:**
- [Alternative 1]: Use if [condition]
- [Alternative 2]: Use if [condition]
### Next Steps
1. Load [algorithm-skill].md for implementation details
2. Use `/deep-rl:new-experiment --algorithm=[name]` to scaffold
3. Follow rl-debugging if training issues arise
```
## Common Mistakes

| User Says | Wrong Choice | Correct Choice | Why |
|---|---|---|---|
| "I'll just use PPO" | PPO for everything | Depends on problem | PPO is good but not optimal everywhere |
| "DQN for my robot arm" | DQN for continuous | SAC or TD3 | DQN requires discrete actions |
| "I have logged data from production" | PPO on offline data | CQL or IQL | Offline needs conservative algorithms |
| "I want the newest algorithm" | Latest paper | Problem-appropriate | Newer ≠ better for your problem |
## Further Reading

For the complete decision framework:
- Load skill: yzmir-deep-rl:using-deep-rl
- Read: SKILL.md (contains the full routing decision tree)

For specific algorithm details:
- value-based-methods.md (DQN family)
- policy-gradient-methods.md (PPO, REINFORCE)
- actor-critic-methods.md (SAC, TD3)
- offline-rl.md (CQL, IQL)
- multi-agent-rl.md (QMIX, MADDPG)