Install: npx claudepluginhub aznatkoiny/zai-skills --plugin AI-Toolkit
C++ Reinforcement Learning best practices using libtorch (PyTorch C++ frontend) and modern C++17/20. Use when:
- Implementing RL algorithms in C++ for performance-critical applications
- Building production RL systems with libtorch
- Creating replay buffers and experience storage
- Optimizing RL training with GPU acceleration
- Deploying RL models with ONNX Runtime
This skill uses the workspace's default tool permissions.
C++ Reinforcement Learning
Overview
This skill covers implementing reinforcement learning algorithms in C++ using LibTorch (PyTorch C++ frontend) and modern C++17/20 features. It provides patterns for building high-performance RL systems suitable for production deployment, robotics, game AI, and real-time applications.
When to Use
- Implementing DQN, PPO, SAC, or other RL algorithms in C++
- Building performance-critical RL training pipelines
- Creating efficient replay buffers with proper memory management
- Deploying trained models with ONNX Runtime
- Parallelizing environment rollouts across threads
- Integrating RL with existing C++ codebases (games, robotics, simulations)
Core Libraries
Primary: LibTorch (PyTorch C++ Frontend)
LibTorch provides the same tensor operations and autograd capabilities as PyTorch in C++.
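Installation: Download the LibTorch archive from https://pytorch.org/get-started/locally (select the LibTorch package with C++ as the language), unzip it, and pass its absolute path to CMake via -DCMAKE_PREFIX_PATH so that find_package(Torch) can locate it.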
CMake Integration:
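cmake_minimum_required(VERSION 3.18)
project(rl_project)
# Require C++17 rather than silently falling back to an older standard
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
find_package(Torch REQUIRED)
# Propagate the compiler flags that LibTorch was built with
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
add_executable(train_agent src/main.cpp)
target_link_libraries(train_agent "${TORCH_LIBRARIES}")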
Secondary Libraries
- ONNX Runtime - Cross-platform inference deployment
- cpprl (mhubii/cpprl) - Reference PPO implementation
- Gymnasium C++ bindings - Environment interfaces
Quick Start: DQN Agent
#include <torch/torch.h>
struct DQNNet : torch::nn::Module {
    torch::nn::Linear fc1{nullptr}, fc2{nullptr}, fc3{nullptr};
    DQNNet(int64_t state_dim, int64_t action_dim) {
        fc1 = register_module("fc1", torch::nn::Linear(state_dim, 128));
        fc2 = register_module("fc2", torch::nn::Linear(128, 128));
        fc3 = register_module("fc3", torch::nn::Linear(128, action_dim));
    }
    torch::Tensor forward(torch::Tensor x) {
        x = torch::relu(fc1->forward(x));
        x = torch::relu(fc2->forward(x));
        return fc3->forward(x);
    }
};
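// Training step (sketch: state_dim, action_dim, lr, gamma and the batch tensors
// states, actions, rewards, next_states, dones are assumed to be defined elsewhere)
auto policy_net = std::make_shared<DQNNet>(state_dim, action_dim);
auto target_net = std::make_shared<DQNNet>(state_dim, action_dim);
torch::optim::Adam optimizer(policy_net->parameters(), torch::optim::AdamOptions(lr));
// Compute loss
// actions: int64 tensor of shape [B, 1]; rewards and dones: float tensors of shape [B]
auto q_values = policy_net->forward(states).gather(1, actions).squeeze(1);
// In the C++ API, Tensor::max(dim) returns a (values, indices) tuple, not a struct
auto next_q = std::get<0>(target_net->forward(next_states).max(1)).detach();
auto target = rewards + gamma * next_q * (1 - dones);
auto loss = torch::mse_loss(q_values, target);
// Backward pass
optimizer.zero_grad();
loss.backward();
optimizer.step();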
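To make the target computation concrete, here is a scalar sketch of the same arithmetic in plain C++, without LibTorch. The function name td_target and its parameters are hypothetical and exist only for illustration; it assumes done is a boolean rather than a 0/1 float tensor.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical scalar version of the batched target computation:
//   target = reward + gamma * max_a' Q_target(s', a') * (1 - done)
double td_target(double reward, double gamma,
                 const std::vector<double>& next_q_values, bool done) {
    assert(!next_q_values.empty());
    const double max_next_q =
        *std::max_element(next_q_values.begin(), next_q_values.end());
    // Terminal transitions do not bootstrap from the next state.
    return done ? reward : reward + gamma * max_next_q;
}
```

For example, td_target(1.0, 0.5, {2.0, 4.0}, false) evaluates to 1.0 + 0.5 * 4.0 = 3.0, while the same call with done set to true returns only the reward, 1.0.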
Essential Patterns
Replay Buffer (Ring Buffer)
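#include <algorithm>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>
// Experience is a user-defined transition type (state, action, reward, next state, done)
class ReplayBuffer {
public:
    // capacity must be greater than zero (push() takes position_ modulo capacity_)
    explicit ReplayBuffer(size_t capacity)
        : capacity_(capacity), position_(0), size_(0) {
        buffer_.reserve(capacity);
    }
    void push(Experience exp) {
        if (buffer_.size() < capacity_) {
            buffer_.push_back(std::move(exp));
        } else {
            // Buffer is full: overwrite the oldest entry
            buffer_[position_] = std::move(exp);
        }
        position_ = (position_ + 1) % capacity_;
        size_ = std::min(size_ + 1, capacity_);
    }
    // Declaration only; the definition is not shown here
    std::vector<Experience> sample(size_t batch_size);
private:
    std::vector<Experience> buffer_;
    size_t capacity_, position_, size_;
    std::mt19937 rng_{std::random_device{}()};
};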
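The class above declares sample() without defining it. One possible body, sketched here as a free function template in standard C++, draws indices uniformly at random. The name sample_uniform is hypothetical and not part of the skill, and sampling with replacement is an assumption; other strategies (without replacement, prioritized) are equally valid.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draws batch_size elements uniformly at random,
// with replacement, from items. Elements are copied into the batch.
template <typename T>
std::vector<T> sample_uniform(const std::vector<T>& items,
                              std::size_t batch_size,
                              std::mt19937& rng) {
    assert(!items.empty() && "cannot sample from an empty buffer");
    std::uniform_int_distribution<std::size_t> pick(0, items.size() - 1);
    std::vector<T> batch;
    batch.reserve(batch_size);
    for (std::size_t i = 0; i < batch_size; ++i) {
        batch.push_back(items[pick(rng)]);
    }
    return batch;
}
```

Under these assumptions, a member definition could simply forward to it: return sample_uniform(buffer_, batch_size, rng_);. Because elements are copied, a batch stays valid even if the buffer later overwrites the sampled slots.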
GPU Device Management
torch::Device device = torch::cuda::is_available() ? torch::kCUDA : torch::kCPU;
model->to(device);
// Create tensors on device
auto tensor = torch::zeros({batch_size, state_dim},
    torch::TensorOptions().device(device).dtype(torch::kFloat32));
Inference Mode
{
    // Gradients are not recorded while no_grad is in scope
    torch::NoGradGuard no_grad;
    auto action_values = model->forward(state);
    auto action = action_values.argmax(1);
}
Common Pitfalls
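- Forgetting train/eval mode - Call model->train() before training and model->eval() before evaluation or inference
- Missing NoGradGuard - Use torch::NoGradGuard for inference to skip autograd bookkeeping and save memory
- Tensor accumulation - Call .detach() on tensors you store (for example in a replay buffer) so they do not keep their autograd graph alive
- Thread safety - Clone models for parallel threads
- Device mismatch - Verify that all tensors and the model are on the same device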
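The thread-safety advice can be illustrated without LibTorch. The sketch below is a toy under stated assumptions: ToyPolicy stands in for a model, the "environment" is a fixed per-worker state, and parallel_rollouts is a hypothetical name. It shows only the ownership pattern, where each worker thread gets its own copy of the policy and writes to its own output slot.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a model: a plain struct that is cheap to copy.
struct ToyPolicy {
    double weight = 0.5;
    double act(double state) const { return weight * state; }
};

// Runs one fixed-length toy rollout per worker thread and returns the
// per-worker totals. No locking is needed because nothing mutable is shared.
std::vector<double> parallel_rollouts(const ToyPolicy& shared,
                                      std::size_t num_workers,
                                      std::size_t steps) {
    std::vector<double> returns(num_workers, 0.0);
    std::vector<std::thread> workers;
    workers.reserve(num_workers);
    for (std::size_t w = 0; w < num_workers; ++w) {
        // policy = shared copies the policy into the lambda for this worker.
        workers.emplace_back([&returns, policy = shared, w, steps] {
            const double state = static_cast<double>(w + 1);  // toy state
            double total = 0.0;
            for (std::size_t t = 0; t < steps; ++t) {
                total += policy.act(state);  // toy reward: the action value
            }
            returns[w] = total;  // distinct index per worker, so no data race
        });
    }
    for (auto& t : workers) {
        t.join();
    }
    return returns;
}
```

With a real LibTorch module the per-worker copy would be a cloned model rather than a struct copy, but the structure is the same: copy first, then let each thread touch only its own data. Depending on the toolchain, linking may require the platform's thread library (for example -pthread).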
Reference Files
- references/libtorch.md - LibTorch setup and API guide
- references/algorithms.md - DQN, PPO, SAC implementations
- references/memory-management.md - Replay buffers, smart pointers, RAII
- references/performance.md - Optimization, parallelization, GPU
- references/testing.md - Testing and debugging strategies