From agentdb-learning
Ask the AgentDB bandit which RL algorithm / skill / pattern fits the current task best. Use at task start when there are multiple plausible approaches and you want the data-driven pick.
npx claudepluginhub ruvnet/agentdb --plugin agentdb-learning

This skill uses the workspace's default tool permissions.
Train one of AgentDB's 9 RL algorithms on a stream of episodes. Use when the user has accumulated successful/failed episodes and wants to derive a policy, or when a task type is repeated enough to benefit from RL routing.
Scores candidate agent actions by utility (gain minus step cost, uncertainty, redundancy) to guide tool calls, delegation, verification, and termination in LLM orchestration.
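A minimal sketch of such a utility score, assuming binary-ish signals in [0, 1]. All field names, weights, and the threshold here are illustrative, not AgentDB's actual API:

```typescript
// Illustrative utility score for a candidate agent action: expected gain
// minus the step cost, the uncertainty penalty, and the redundancy penalty.
// Field names and weights are hypothetical, not AgentDB internals.
interface Candidate {
  expectedGain: number   // predicted progress toward the goal, in [0, 1]
  stepCost: number       // cost of executing the action (tokens, latency)
  uncertainty: number    // how unsure we are about expectedGain, in [0, 1]
  redundancy: number     // overlap with information already gathered, [0, 1]
}

function utility(c: Candidate): number {
  return c.expectedGain - c.stepCost - c.uncertainty - c.redundancy
}

// Pick the best candidate, or terminate when nothing clears the threshold.
function pickOrStop(cands: Candidate[], threshold = 0): Candidate | null {
  const best = cands.reduce((a, b) => (utility(a) >= utility(b) ? a : b))
  return utility(best) > threshold ? best : null
}
```

The same score drives all four decisions the paragraph lists: a high-utility tool call gets executed, a high-utility delegation gets handed off, and when every remaining candidate scores below the threshold the agent verifies and terminates instead of taking another step.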
Promote a validated pattern into a reusable Skill in AgentDB's skill library. Use when the same approach has worked 3+ times across episodes, or when the user explicitly says "make this a skill" / "save this as reusable".
Share bugs, ideas, or general feedback.
Ask the Thompson Sampling bandit which approach to use for the current task.
agentdb_learning_route(
  task: <description>
  candidates?: [<skill_id> | <algo>, ...]   // omit to consider everything
  context?: { stack, project, ... }
)

Returns: { picked, expectedReward, confidence, alternatives: [...] }
Thompson Sampling: each candidate has a Beta(α, β) posterior over reward. The bandit samples once from each, picks the highest sample. Exploration emerges naturally — uncertain candidates get tried until their posterior tightens.
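One round of that sample-and-pick loop can be sketched as follows. This is not AgentDB's internal code; it assumes Beta sampling via two gamma draws (X ~ Gamma(α), Y ~ Gamma(β), then X / (X + Y) ~ Beta(α, β)), with the gamma sampler using the standard Marsaglia-Tsang method:

```typescript
// One Thompson Sampling round over Beta(alpha, beta) arms.

function randNormal(): number {
  // Box-Muller transform
  let u = 0, v = 0
  while (u === 0) u = Math.random()
  while (v === 0) v = Math.random()
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v)
}

function sampleGamma(shape: number): number {
  if (shape < 1) {
    // Boost trick: Gamma(a) = Gamma(a + 1) * U^(1/a)
    return sampleGamma(shape + 1) * Math.pow(Math.random(), 1 / shape)
  }
  // Marsaglia-Tsang squeeze/acceptance sampler
  const d = shape - 1 / 3
  const c = 1 / Math.sqrt(9 * d)
  for (;;) {
    let x: number, v: number
    do { x = randNormal(); v = 1 + c * x } while (v <= 0)
    v = v * v * v
    const u = Math.random()
    if (u < 1 - 0.0331 * x ** 4) return d * v
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v
  }
}

function sampleBeta(alpha: number, beta: number): number {
  const x = sampleGamma(alpha)
  return x / (x + sampleGamma(beta))
}

// Draw once from each arm's posterior; the highest draw wins. Arms with
// wide posteriors occasionally produce high draws, which is exactly the
// "exploration emerges naturally" behavior described above.
function thompsonPick(arms: { id: string; alpha: number; beta: number }[]): string {
  let best = arms[0].id
  let bestDraw = -Infinity
  for (const arm of arms) {
    const draw = sampleBeta(arm.alpha, arm.beta)
    if (draw > bestDraw) { bestDraw = draw; best = arm.id }
  }
  return best
}
```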
Four bandit decision points across AgentDB:
The router unifies them: it returns both the picked candidate and a decisionTrace showing which decision points fired.
// route, run, then report the observed reward back to the bandit
const { picked } = await agentdb_learning_route(...)
const result = await runWith(picked)
await agentdb_bandit_update(arm: picked, reward: result.reward)
The agentdb-feedback skill (part of this plugin) wraps this close-the-loop step.
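The update itself is just Beta posterior bookkeeping. A sketch, assuming rewards in [0, 1]; the shape and names are illustrative, not the plugin's real state format:

```typescript
// Beta posterior update for one arm after observing a reward in [0, 1].
// A fractional reward r counts as partial evidence: alpha += r, beta += 1 - r.
// Repeated updates tighten the posterior around the arm's true success rate,
// which is what eventually stops the bandit from exploring that arm.
interface Arm { alpha: number; beta: number }

function update(arm: Arm, reward: number): Arm {
  const r = Math.min(1, Math.max(0, reward))   // clamp defensively
  return { alpha: arm.alpha + r, beta: arm.beta + (1 - r) }
}

// Posterior mean: the arm's expected reward given the evidence so far.
const mean = (arm: Arm) => arm.alpha / (arm.alpha + arm.beta)
```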