Create AI learning plugins using AgentDB's 9 reinforcement learning algorithms. Train Decision Transformer, Q-Learning, SARSA, and Actor-Critic models. Deploy these plugins to build self-learning agents, implement RL workflows, and optimize agent behavior through experience. Apply offline RL for safe learning from logged data.
/plugin marketplace add DNYoussef/context-cascade
/plugin install dnyoussef-context-cascade@DNYoussef/context-cascade

This skill inherits all available tools. When active, it can use any tool Claude has access to.
examples/example-1-q-learning.md
examples/example-2-sarsa.md
examples/example-3-deep-rl.md
graphviz/workflow.dot
readme.md
references/reward-design.md
references/rl-algorithms.md
resources/scripts/benchmark_9algorithms.sh
resources/scripts/readme.md
resources/scripts/test_learning.py
resources/scripts/train_rl_agent.py
resources/templates/actor-critic.json
resources/templates/decision-transformer.yaml
resources/templates/q-learning-config.yaml
resources/templates/readme.md
tests/test-1-q-learning.md
tests/test-2-policy-gradient.md
tests/test-3-decision-transformer.md

Use this skill to create, train, and deploy learning plugins for autonomous agents using AgentDB's 9 reinforcement learning algorithms. Implement offline RL (Decision Transformer) for safe learning from logged experiences. Apply value-based learning (Q-Learning) for discrete actions. Deploy policy gradients (Actor-Critic) for continuous control. Enable agents to improve through experience with WASM-accelerated neural inference.
Performance: Train models 10-100x faster with WASM-accelerated neural inference.
# Interactive wizard
npx agentdb@latest create-plugin
# Use specific template
npx agentdb@latest create-plugin -t decision-transformer -n my-agent
# Preview without creating
npx agentdb@latest create-plugin -t q-learning --dry-run
# Custom output directory
npx agentdb@latest create-plugin -t actor-critic -o ./plugins
# Show all plugin templates
npx agentdb@latest list-templates
# Available templates:
# - decision-transformer (sequence modeling RL - recommended)
# - q-learning (value-based learning)
# - sarsa (on-policy TD learning)
# - actor-critic (policy gradient with baseline)
# - curiosity-driven (exploration-based)
# List installed plugins
npx agentdb@latest list-plugins
# Get plugin information
npx agentdb@latest plugin-info my-agent
# Shows: algorithm, configuration, training status
import { createAgentDBAdapter } from 'agentic-flow/reasoningbank';
// Initialize with learning enabled
const adapter = await createAgentDBAdapter({
dbPath: '.agentdb/learning.db',
enableLearning: true, // Enable learning plugins
enableReasoning: true,
cacheSize: 1000,
});
// Store training experience
await adapter.insertPattern({
id: '',
type: 'experience',
domain: 'game-playing',
pattern_data: JSON.stringify({
embedding: await computeEmbedding('state-action-reward'),
pattern: {
state: [0.1, 0.2, 0.3],
action: 2,
reward: 1.0,
next_state: [0.15, 0.25, 0.35],
done: false
}
}),
confidence: 0.9,
usage_count: 1,
success_count: 1,
created_at: Date.now(),
last_used: Date.now(),
});
// Train learning model
const metrics = await adapter.train({
epochs: 50,
batchSize: 32,
});
console.log('Training Loss:', metrics.loss);
console.log('Duration:', metrics.duration, 'ms');
Type: Offline Reinforcement Learning
Best For: Learning from logged experiences, imitation learning
Strengths: No online interaction needed, stable training
npx agentdb@latest create-plugin -t decision-transformer -n dt-agent
Use Cases:
Configuration:
{
"algorithm": "decision-transformer",
"model_size": "base",
"context_length": 20,
"embed_dim": 128,
"n_heads": 8,
"n_layers": 6
}
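To illustrate what these fields control, the sketch below (not the AgentDB internals) shows how a Decision Transformer conditions on a target return and only attends over the last context_length timesteps; the Step type and buildContext helper are assumptions for this example.
interface Step { state: number[]; action: number; reward: number }

function buildContext(episode: Step[], targetReturn: number, contextLength = 20) {
  // Returns-to-go: the return still to be collected from each timestep onward.
  let remaining = targetReturn;
  const tokens = episode.map(step => {
    const token = { returnToGo: remaining, state: step.state, action: step.action };
    remaining -= step.reward;
    return token;
  });
  // The transformer attends over at most the last `contextLength` (return, state, action) triples.
  return tokens.slice(-contextLength);
}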
Type: Value-Based RL (Off-Policy)
Best For: Discrete action spaces, sample efficiency
Strengths: Proven, simple, works well for small/medium problems
npx agentdb@latest create-plugin -t q-learning -n q-agent
Use Cases:
Configuration:
{
"algorithm": "q-learning",
"learning_rate": 0.001,
"gamma": 0.99,
"epsilon": 0.1,
"epsilon_decay": 0.995
}
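For reference, here is a minimal tabular sketch of the update this configuration drives (illustrative only; AgentDB's implementation may differ). The target bootstraps from the greedy next action, which is what makes Q-Learning off-policy.
const cfg = { learning_rate: 0.001, gamma: 0.99, epsilon: 0.1, epsilon_decay: 0.995 };
const nActions = 4;
const qTable = new Map<string, number[]>(); // state key -> Q-value per action
let epsilon = cfg.epsilon;

function qValues(stateKey: string): number[] {
  if (!qTable.has(stateKey)) qTable.set(stateKey, new Array(nActions).fill(0));
  return qTable.get(stateKey)!;
}

function selectAction(stateKey: string): number {
  // Epsilon-greedy exploration; epsilon decays after every update below.
  if (Math.random() < epsilon) return Math.floor(Math.random() * nActions);
  const q = qValues(stateKey);
  return q.indexOf(Math.max(...q));
}

function qLearningUpdate(s: string, a: number, reward: number, s2: string, done: boolean): void {
  // Q(s,a) += alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
  const target = done ? reward : reward + cfg.gamma * Math.max(...qValues(s2));
  const q = qValues(s);
  q[a] += cfg.learning_rate * (target - q[a]);
  epsilon *= cfg.epsilon_decay;
}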
Type: Value-Based RL (On-Policy)
Best For: Safe exploration, risk-sensitive tasks
Strengths: More conservative than Q-Learning, better for safety
npx agentdb@latest create-plugin -t sarsa -n sarsa-agent
Use Cases:
Configuration:
{
"algorithm": "sarsa",
"learning_rate": 0.001,
"gamma": 0.99,
"epsilon": 0.1
}
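The only difference from Q-Learning is the target: SARSA bootstraps from the action the policy actually takes next, which is what makes it on-policy and more conservative. A minimal sketch, reusing the same kind of state-to-Q-values lookup as the Q-Learning example above:
function sarsaUpdate(
  qValues: (stateKey: string) => number[], // same lookup as in the Q-Learning sketch
  s: string, a: number, reward: number,
  s2: string, a2: number, done: boolean,
  learningRate = 0.001, gamma = 0.99,
): void {
  // Q(s,a) += alpha * [r + gamma * Q(s',a') - Q(s,a)], with a' the action actually taken in s'
  const target = done ? reward : reward + gamma * qValues(s2)[a2];
  const q = qValues(s);
  q[a] += learningRate * (target - q[a]);
}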
Type: Policy Gradient with Value Baseline
Best For: Continuous actions, variance reduction
Strengths: Stable, works for continuous/discrete actions
npx agentdb@latest create-plugin -t actor-critic -n ac-agent
Use Cases:
Configuration:
{
"algorithm": "actor-critic",
"actor_lr": 0.001,
"critic_lr": 0.002,
"gamma": 0.99,
"entropy_coef": 0.01
}
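To make the roles of actor_lr, critic_lr, and entropy_coef concrete, here is a tabular one-step actor-critic sketch for discrete actions (illustrative only; the state indices, action count, and helpers are assumptions, not AgentDB internals):
const ac = { actor_lr: 0.001, critic_lr: 0.002, gamma: 0.99, entropy_coef: 0.01 };
const nStates = 16, numActions = 4;
const prefs = Array.from({ length: nStates }, () => new Array(numActions).fill(0)); // actor preferences
const values = new Array(nStates).fill(0); // critic V(s)

function softmax(xs: number[]): number[] {
  const m = Math.max(...xs);
  const exps = xs.map(x => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

function actorCriticUpdate(s: number, a: number, reward: number, s2: number, done: boolean): void {
  // Critic: the one-step TD error doubles as the advantage estimate (the "baseline").
  const target = done ? reward : reward + ac.gamma * values[s2];
  const advantage = target - values[s];
  values[s] += ac.critic_lr * advantage;

  // Actor: policy-gradient step on softmax preferences, plus an entropy bonus that keeps exploration alive.
  const pi = softmax(prefs[s]);
  const entropy = -pi.reduce((acc, p) => acc + p * Math.log(p + 1e-8), 0);
  for (let b = 0; b < numActions; b++) {
    const logPiGrad = (b === a ? 1 : 0) - pi[b];                     // d log pi(a|s) / d pref(s,b)
    const entropyGrad = -pi[b] * (Math.log(pi[b] + 1e-8) + entropy); // d H(pi(.|s)) / d pref(s,b)
    prefs[s][b] += ac.actor_lr * (advantage * logPiGrad + ac.entropy_coef * entropyGrad);
  }
}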
Type: Query-Based Learning
Best For: Label-efficient learning, human-in-the-loop
Strengths: Minimizes labeling cost, focuses on uncertain samples
Use Cases:
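A common way to realize query-based learning is uncertainty sampling: score unlabeled states by predictive entropy and send only the most uncertain ones to a human or oracle. A minimal sketch (predictProbs is an assumed model hook, not an AgentDB API):
function selectForLabeling(
  states: number[][],
  predictProbs: (state: number[]) => number[], // assumed: returns action probabilities for a state
  budget: number,
): number[][] {
  const entropy = (p: number[]) =>
    -p.reduce((acc, x) => acc + (x > 0 ? x * Math.log(x) : 0), 0);

  return states
    .map(state => ({ state, uncertainty: entropy(predictProbs(state)) }))
    .sort((a, b) => b.uncertainty - a.uncertainty) // most uncertain first
    .slice(0, budget)                              // stay within the labeling budget
    .map(item => item.state);
}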
Type: Robustness Enhancement
Best For: Safety, robustness to perturbations
Strengths: Improves model robustness, adversarial defense
Use Cases:
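One standard recipe is FGSM-style augmentation: perturb stored states a small step in the direction that increases the loss, then train on both clean and perturbed samples. The sketch below is illustrative; the loss gradient is assumed to come from whatever model backend is in use:
function fgsmPerturb(state: number[], gradLossWrtState: number[], epsilon = 0.05): number[] {
  // x_adv = x + epsilon * sign(dLoss/dx): a worst-case small perturbation per dimension
  return state.map((x, i) => x + epsilon * Math.sign(gradLossWrtState[i]));
}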
Type: Progressive Difficulty Training
Best For: Complex tasks, faster convergence
Strengths: Stable learning, faster convergence on hard tasks
Use Cases:
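In practice, curriculum learning just needs a schedule that raises task difficulty once the agent's recent success rate clears a threshold and backs off when it collapses. A small sketch (class name, window sizes, and thresholds are assumptions):
class Curriculum {
  private difficulty = 1;        // current level, 1 = easiest
  private recent: number[] = []; // 1 = success, 0 = failure

  record(success: boolean, window = 50): void {
    this.recent.push(success ? 1 : 0);
    if (this.recent.length > window) this.recent.shift();
  }

  currentDifficulty(maxLevel = 10): number {
    if (this.recent.length < 20) return this.difficulty; // not enough evidence yet
    const successRate = this.recent.reduce((a, b) => a + b, 0) / this.recent.length;
    if (successRate > 0.8 && this.difficulty < maxLevel) {
      this.difficulty += 1; // mastered this level, move on
      this.recent = [];
    } else if (successRate < 0.2 && this.difficulty > 1) {
      this.difficulty -= 1; // too hard, step back
      this.recent = [];
    }
    return this.difficulty;
  }
}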
Type: Distributed Learning
Best For: Privacy, distributed data
Strengths: Privacy-preserving, scalable
Use Cases:
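The privacy-preserving property comes from exchanging model parameters rather than raw experiences. A minimal FedAvg-style aggregation sketch (illustrative; not the AgentDB wire format):
interface ClientUpdate {
  weights: number[]; // flattened local model parameters
  numSamples: number;
}

function federatedAverage(updates: ClientUpdate[]): number[] {
  // Weighted average of client weights by sample count; raw experiences never leave a client.
  const total = updates.reduce((sum, u) => sum + u.numSamples, 0);
  const dim = updates[0].weights.length;
  const averaged = new Array(dim).fill(0);
  for (const u of updates) {
    const share = u.numSamples / total;
    for (let i = 0; i < dim; i++) averaged[i] += share * u.weights[i];
  }
  return averaged; // broadcast back to clients as the new global model
}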
Type: Transfer Learning
Best For: Related tasks, knowledge sharing
Strengths: Faster learning on new tasks, better generalization
Use Cases:
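One way to apply this with the adapter API shown earlier is to warm-start a new domain by copying relevant experiences from a related one at reduced confidence, then fine-tuning. Domain names and parameters below are illustrative:
const sourceMemories = await adapter.retrieveWithReasoning(
  await computeEmbedding('description of the new task'),
  { domain: 'source-task', k: 100 },
);

for (const memory of sourceMemories.memories) {
  await adapter.insertPattern({
    id: '',
    type: 'experience',
    domain: 'target-task',
    pattern_data: JSON.stringify({
      embedding: await computeEmbedding(JSON.stringify(memory.pattern)),
      pattern: memory.pattern,
    }),
    confidence: 0.5, // lower than fresh target-domain experience so new data can dominate
    usage_count: 1,
    success_count: 0,
    created_at: Date.now(),
    last_used: Date.now(),
  });
}

// Fine-tune on the mixed (transferred + new) experiences
await adapter.train({ epochs: 20, batchSize: 32 });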
// Store experiences during agent execution
for (let i = 0; i < numEpisodes; i++) {
const episode = runEpisode();
for (const step of episode.steps) {
await adapter.insertPattern({
id: '',
type: 'experience',
domain: 'task-domain',
pattern_data: JSON.stringify({
embedding: await computeEmbedding(JSON.stringify(step)),
pattern: {
state: step.state,
action: step.action,
reward: step.reward,
next_state: step.next_state,
done: step.done
}
}),
confidence: step.reward > 0 ? 0.9 : 0.5,
usage_count: 1,
success_count: step.reward > 0 ? 1 : 0,
created_at: Date.now(),
last_used: Date.now(),
});
}
}
// Train on collected experiences
const trainingMetrics = await adapter.train({
epochs: 100,
batchSize: 64,
learningRate: 0.001,
validationSplit: 0.2,
});
console.log('Training Metrics:', trainingMetrics);
// {
// loss: 0.023,
// valLoss: 0.028,
// duration: 1523,
// epochs: 100
// }
// Retrieve similar successful experiences
const testQuery = await computeEmbedding(JSON.stringify(testState));
const result = await adapter.retrieveWithReasoning(testQuery, {
domain: 'task-domain',
k: 10,
synthesizeContext: true,
});
// Evaluate action quality
const suggestedAction = result.memories[0].pattern.action;
const confidence = result.memories[0].similarity;
console.log('Suggested Action:', suggestedAction);
console.log('Confidence:', confidence);
// Store experiences in buffer
const replayBuffer = [];
// Sample random batch for training
const batch = sampleRandomBatch(replayBuffer, 32); // sample 32 transitions
// Train on batch
await adapter.train({
data: batch,
epochs: 1,
batchSize: 32,
});
// Store experiences with priority (TD error)
await adapter.insertPattern({
// ... standard fields
confidence: tdError, // Use TD error as confidence/priority
// ...
});
// Retrieve high-priority experiences
const highPriority = await adapter.retrieveWithReasoning(queryEmbedding, {
domain: 'task-domain',
k: 32,
minConfidence: 0.7, // Only high TD-error experiences
});
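How the TD error itself might be computed and squashed into the confidence field is sketched below (illustrative; the qValue estimator and discrete action set are assumptions):
function tdErrorPriority(
  step: { state: number[]; action: number; reward: number; next_state: number[]; done: boolean },
  qValue: (state: number[], action: number) => number, // assumed value estimator
  actions: number[],
  gamma = 0.99,
): number {
  const bootstrap = step.done ? 0 : Math.max(...actions.map(a => qValue(step.next_state, a)));
  const tdError = Math.abs(step.reward + gamma * bootstrap - qValue(step.state, step.action));
  return Math.min(1, tdError); // clamp into [0, 1] so it is a valid confidence/priority
}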
// Collect experiences from multiple agents
for (const agent of agents) {
const experience = await agent.step();
await adapter.insertPattern({
// ... store experience with agent ID
domain: `multi-agent/${agent.id}`,
});
}
// Train shared model
await adapter.train({
epochs: 50,
batchSize: 64,
});
// Collect batch of experiences
const experiences = collectBatch(1000); // collect 1,000 experiences
// Batch insert (500x faster)
for (const exp of experiences) {
await adapter.insertPattern({ /* ... */ });
}
// Train on batch
await adapter.train({
epochs: 10,
batchSize: 128, // Larger batch for efficiency
});
// Train incrementally as new data arrives
setInterval(async () => {
const newExperiences = getNewExperiences();
if (newExperiences.length > 100) {
await adapter.train({
epochs: 5,
batchSize: 32,
});
}
}, 60000); // Every minute
Combine learning with reasoning for better performance:
// Train learning model
await adapter.train({ epochs: 50, batchSize: 32 });
// Use reasoning agents for inference
const result = await adapter.retrieveWithReasoning(queryEmbedding, {
domain: 'decision-making',
k: 10,
useMMR: true, // Diverse experiences
synthesizeContext: true, // Rich context
optimizeMemory: true, // Consolidate patterns
});
// Make decision based on learned experiences + reasoning
const decision = result.context.suggestedAction;
const confidence = result.memories[0].similarity;
# Create plugin
npx agentdb@latest create-plugin -t decision-transformer -n my-plugin
# List plugins
npx agentdb@latest list-plugins
# Get plugin info
npx agentdb@latest plugin-info my-plugin
# List templates
npx agentdb@latest list-templates
// Reduce learning rate
await adapter.train({
epochs: 100,
batchSize: 32,
learningRate: 0.0001, // Lower learning rate
});
// Use validation split
await adapter.train({
epochs: 50,
batchSize: 64,
validationSplit: 0.2, // 20% validation
});
// Enable memory optimization
await adapter.retrieveWithReasoning(queryEmbedding, {
optimizeMemory: true, // Consolidate, reduce overfitting
});
# Enable quantization for faster inference
# Use binary quantization (32x faster)
npx agentdb@latest mcp

Category: Machine Learning / Reinforcement Learning
Difficulty: Intermediate to Advanced
Estimated Time: 30-60 minutes