From agentdb-learning
Train one of AgentDB's 9 RL algorithms on a stream of episodes. Use when the user has accumulated successful/failed episodes and wants to derive a policy, or when a task type is repeated enough to benefit from RL routing.
npx claudepluginhub ruvnet/agentdb --plugin agentdb-learning

This skill uses the workspace's default tool permissions.
Train an RL agent on episode data to derive a policy.
| Algo | Best for |
|---|---|
| Q-Learning | Tabular state spaces, discrete actions |
| SARSA | On-policy Q-Learning variant, conservative exploration |
| DQN | High-dimensional states, neural Q-function |
| PPO | Continuous control, high-dimensional action spaces |
| Actor-Critic | Baseline reduction, stable training |
| Policy Gradient | Direct policy parameterization |
| Decision Transformer | Offline RL on trajectories |
| MCTS | Tree-search planning under known dynamics |
| Model-Based RL | Sample-efficient when env model is learnable |
If you don't know which algorithm to pick, call agentdb_learning_route first; the bandit suggests one based on past performance on similar task signatures.
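A minimal sketch of that routing call, in the same pseudocode style as the trainer below. The return shape in the comment is an assumption for illustration; the skill's actual output format isn't specified here.

```
suggestion = agentdb_learning_route("summarize-pr-diffs")   // illustrative task signature
// hypothetical result: { algorithm: "PPO", expectedReward: 0.71 }
```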
Then call the trainer:

```
agentdb_learning_train(
  algorithm:   <one of above>,        // or 'auto' to let the bandit pick
  episodes:    [<episodeId>, ...],    // or a task name → fetched automatically
  hyperparams: { lr, gamma, epsilon, ... },
  iterations:  N
)
```
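For example, a hypothetical call training Q-Learning on a few stored episodes. The episode IDs are made up, and the hyperparameter values are illustrative starting points, not tuned recommendations:

```
agentdb_learning_train(
  algorithm:   "q-learning",
  episodes:    ["ep-1042", "ep-1043", "ep-1051"],     // made-up IDs
  hyperparams: { lr: 0.1, gamma: 0.95, epsilon: 0.2 },
  iterations:  500
)
```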
agentdb_learning_route(task) may pick this trained skill if it scores well. If agentdb_learning_route says "no algorithm has > 0.6 expected reward on this task", the answer is to gather more episodes, not to force-train.
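Put together, the route-first loop might look like the sketch below. The expectedReward field and the runTaskAndRecordEpisodes helper are hypothetical, named here only to make the 0.6 rule concrete:

```
suggestion = agentdb_learning_route(task)
if (suggestion.expectedReward > 0.6) {
  agentdb_learning_train(algorithm: suggestion.algorithm, episodes: task, iterations: 500)
} else {
  runTaskAndRecordEpisodes(task)   // hypothetical: accumulate more episodes first
}
```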