From agentdb-learning
Ask the AgentDB bandit which RL algorithm / skill / pattern fits the current task best. Use at task start when there are multiple plausible approaches and you want the data-driven pick.
npx claudepluginhub ruvnet/agentdb --plugin agentdb-learning

This skill uses the workspace's default tool permissions.
Train one of AgentDB's 9 RL algorithms on a stream of episodes. Use when the user has accumulated successful/failed episodes and wants to derive a policy, or when a task type is repeated enough to benefit from RL routing.
Scores candidate agent actions by utility (gain minus step cost, uncertainty, redundancy) to guide tool calls, delegation, verification, and termination in LLM orchestration.
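A minimal sketch of such a utility score, assuming binary-ish signals in [0, 1]. All field names, weights, and the threshold here are illustrative, not AgentDB's actual API:

```typescript
// Illustrative utility score for a candidate agent action: expected gain
// minus the step cost, the uncertainty penalty, and the redundancy penalty.
// Field names and weights are hypothetical, not AgentDB internals.
interface Candidate {
  expectedGain: number   // predicted progress toward the goal, in [0, 1]
  stepCost: number       // cost of executing the action (tokens, latency)
  uncertainty: number    // how unsure we are about expectedGain, in [0, 1]
  redundancy: number     // overlap with information already gathered, [0, 1]
}

function utility(c: Candidate): number {
  return c.expectedGain - c.stepCost - c.uncertainty - c.redundancy
}

// Pick the best candidate, or terminate when nothing clears the threshold.
function pickOrStop(cands: Candidate[], threshold = 0): Candidate | null {
  const best = cands.reduce((a, b) => (utility(a) >= utility(b) ? a : b))
  return utility(best) > threshold ? best : null
}
```

The same score drives all four decisions the paragraph lists: a high-utility tool call gets executed, a high-utility delegation gets handed off, and when every remaining candidate scores below the threshold the agent verifies and terminates instead of taking another step.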
Promote a validated pattern into a reusable Skill in AgentDB's skill library. Use when the same approach has worked 3+ times across episodes, or when the user explicitly says "make this a skill" / "save this as reusable".
Share bugs, ideas, or general feedback.
Ask the Thompson Sampling bandit which approach to use for the current task.
agentdb_learning_route(
  task: <description>
  candidates?: [<skill_id> | <algo>, ...]   // omit to consider everything
  context?: { stack, project, ... }
)

Returns: { picked, expectedReward, confidence, alternatives: [...] }
Thompson Sampling: each candidate has a Beta(α, β) posterior over reward. The bandit samples once from each, picks the highest sample. Exploration emerges naturally — uncertain candidates get tried until their posterior tightens.
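One round of that sample-and-pick loop can be sketched as follows. This is not AgentDB's internal code; it assumes Beta sampling via two gamma draws (X ~ Gamma(α), Y ~ Gamma(β), then X / (X + Y) ~ Beta(α, β)), with the gamma sampler using the standard Marsaglia-Tsang method:

```typescript
// One Thompson Sampling round over Beta(alpha, beta) arms.

function randNormal(): number {
  // Box-Muller transform
  let u = 0, v = 0
  while (u === 0) u = Math.random()
  while (v === 0) v = Math.random()
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v)
}

function sampleGamma(shape: number): number {
  if (shape < 1) {
    // Boost trick: Gamma(a) = Gamma(a + 1) * U^(1/a)
    return sampleGamma(shape + 1) * Math.pow(Math.random(), 1 / shape)
  }
  // Marsaglia-Tsang squeeze/acceptance sampler
  const d = shape - 1 / 3
  const c = 1 / Math.sqrt(9 * d)
  for (;;) {
    let x: number, v: number
    do { x = randNormal(); v = 1 + c * x } while (v <= 0)
    v = v * v * v
    const u = Math.random()
    if (u < 1 - 0.0331 * x ** 4) return d * v
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v
  }
}

function sampleBeta(alpha: number, beta: number): number {
  const x = sampleGamma(alpha)
  return x / (x + sampleGamma(beta))
}

// Draw once from each arm's posterior; the highest draw wins. Arms with
// wide posteriors occasionally produce high draws, which is exactly the
// "exploration emerges naturally" behavior described above.
function thompsonPick(arms: { id: string; alpha: number; beta: number }[]): string {
  let best = arms[0].id
  let bestDraw = -Infinity
  for (const arm of arms) {
    const draw = sampleBeta(arm.alpha, arm.beta)
    if (draw > bestDraw) { bestDraw = draw; best = arm.id }
  }
  return best
}
```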
Four bandit decision points across AgentDB:
The router unifies them: it returns both the picked candidate and a decisionTrace showing which decision points fired.
// route, run, then report the observed reward back to the bandit
const { picked } = await agentdb_learning_route(...)
const result = await runWith(picked)
await agentdb_bandit_update(arm: picked, reward: result.reward)
The agentdb-feedback skill (part of this plugin) wraps this close-the-loop step.
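The update itself is just Beta posterior bookkeeping. A sketch, assuming rewards in [0, 1]; the shape and names are illustrative, not the plugin's real state format:

```typescript
// Beta posterior update for one arm after observing a reward in [0, 1].
// A fractional reward r counts as partial evidence: alpha += r, beta += 1 - r.
// Repeated updates tighten the posterior around the arm's true success rate,
// which is what eventually stops the bandit from exploring that arm.
interface Arm { alpha: number; beta: number }

function update(arm: Arm, reward: number): Arm {
  const r = Math.min(1, Math.max(0, reward))   // clamp defensively
  return { alpha: arm.alpha + r, beta: arm.beta + (1 - r) }
}

// Posterior mean: the arm's expected reward given the evidence so far.
const mean = (arm: Arm) => arm.alpha / (arm.alpha + arm.beta)
```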