cpp-reinforcement-learning

Install

Install the plugin

npx claudepluginhub aznatkoiny/zai-skills --plugin AI-Toolkit

Description

C++ reinforcement learning best practices using LibTorch (the PyTorch C++ frontend) and modern C++17/20. Use when:

Implementing RL algorithms in C++ for performance-critical applications
Building production RL systems with LibTorch
Creating replay buffers and experience storage
Optimizing RL training with GPU acceleration
Deploying RL models with ONNX Runtime

Tool Access

This skill uses the workspace's default tool permissions.

Supporting Assets

references/algorithms.md

references/libtorch.md

references/memory-management.md

references/performance.md

references/testing.md

Skill Content

C++ Reinforcement Learning

Overview

This skill covers implementing reinforcement learning algorithms in C++ using LibTorch (PyTorch C++ frontend) and modern C++17/20 features. It provides patterns for building high-performance RL systems suitable for production deployment, robotics, game AI, and real-time applications.

When to Use

Implementing DQN, PPO, SAC, or other RL algorithms in C++
Building performance-critical RL training pipelines
Creating efficient replay buffers with proper memory management
Deploying trained models with ONNX Runtime
Parallelizing environment rollouts across threads
Integrating RL with existing C++ codebases (games, robotics, simulations)

Core Libraries

Primary: LibTorch (PyTorch C++ Frontend)

LibTorch provides the same tensor operations and autograd capabilities as PyTorch in C++.

Installation: Download from https://pytorch.org/get-started/locally (select C++/LibTorch)

CMake Integration:

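cmake_minimum_required(VERSION 3.18)
project(rl_project LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
# Point CMake at the unpacked LibTorch directory when configuring, e.g.
#   cmake -DCMAKE_PREFIX_PATH=/absolute/path/to/libtorch ..
find_package(Torch REQUIRED)

add_executable(train_agent src/main.cpp)
target_link_libraries(train_agent "${TORCH_LIBRARIES}")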

Secondary Libraries

ONNX Runtime - Cross-platform inference deployment
cpprl (mhubii/cpprl) - Reference PPO implementation
Gymnasium C++ bindings - Environment interfaces

Quick Start: DQN Agent

#include <torch/torch.h>

struct DQNNet : torch::nn::Module {
    torch::nn::Linear fc1{nullptr}, fc2{nullptr}, fc3{nullptr};

    DQNNet(int64_t state_dim, int64_t action_dim) {
        fc1 = register_module("fc1", torch::nn::Linear(state_dim, 128));
        fc2 = register_module("fc2", torch::nn::Linear(128, 128));
        fc3 = register_module("fc3", torch::nn::Linear(128, action_dim));
    }

    torch::Tensor forward(torch::Tensor x) {
        x = torch::relu(fc1->forward(x));
        x = torch::relu(fc2->forward(x));
        return fc3->forward(x);
    }
};

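// Training step (fragment): assumes state_dim, action_dim, lr, and gamma are defined,
// and that states, actions (int64, shape [B, 1]), rewards, next_states, and
// dones (float 0/1, shape [B]) are batched tensors sampled from the replay buffer.
auto policy_net = std::make_shared<DQNNet>(state_dim, action_dim);
auto target_net = std::make_shared<DQNNet>(state_dim, action_dim);
torch::optim::Adam optimizer(policy_net->parameters(), torch::optim::AdamOptions(lr));

// Compute loss
// Q(s, a) for the actions actually taken; squeeze only dim 1 so a batch of one stays 1-D
auto q_values = policy_net->forward(states).gather(1, actions).squeeze(1);
// In C++, Tensor::max(dim) returns a (values, indices) tuple, so take element 0
auto next_q = std::get<0>(target_net->forward(next_states).max(1)).detach();
// Terminal transitions (dones == 1) do not bootstrap from the next state
auto target = rewards + gamma * next_q * (1 - dones);
auto loss = torch::mse_loss(q_values, target);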

// Backward pass
optimizer.zero_grad();
loss.backward();
optimizer.step();
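
The target computed in the loss step above can be checked by hand on a single transition. The following is a minimal sketch in plain C++ (no LibTorch), assuming a hypothetical helper named td_target that takes the target network's Q-values for the next state as a std::vector<float>; these names are illustrative and not part of the skill.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One-step TD target for a single transition (hypothetical helper):
//   target = reward + gamma * max_a' Q_target(s', a') * (1 - done)
// `next_q` holds the target network's Q-values for the next state.
inline float td_target(float reward, float gamma,
                       const std::vector<float>& next_q, bool done) {
    assert(!next_q.empty());
    const float max_next_q = *std::max_element(next_q.begin(), next_q.end());
    // Terminal transitions must not bootstrap from the next state.
    const float not_done = done ? 0.0f : 1.0f;
    return reward + gamma * max_next_q * not_done;
}

// Squared error between the predicted Q(s, a) and its target; averaging this
// over a batch gives the MSE loss used in the training step.
inline float squared_td_error(float q_sa, float target) {
    const float diff = q_sa - target;
    return diff * diff;
}
```

With reward 1, gamma 0.5, and next-state Q-values {1, 4, 2}, the target is 1 + 0.5 * 4 = 3 for a non-terminal transition and 1 for a terminal one.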

Essential Patterns

Replay Buffer (Ring Buffer)

class ReplayBuffer {
public:
    explicit ReplayBuffer(size_t capacity)
        : capacity_(capacity), position_(0), size_(0) {
        buffer_.reserve(capacity);
    }

    void push(Experience exp) {
        if (buffer_.size() < capacity_) {
            buffer_.push_back(std::move(exp));
        } else {
            buffer_[position_] = std::move(exp);
        }
        position_ = (position_ + 1) % capacity_;
        size_ = std::min(size_ + 1, capacity_);
    }

    std::vector<Experience> sample(size_t batch_size);

private:
    std::vector<Experience> buffer_;
    size_t capacity_, position_, size_;
    std::mt19937 rng_{std::random_device{}()};
};
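
The class above declares sample() but does not define it, and leaves Experience undefined. A self-contained sketch of one way to complete it follows. The Experience fields are an assumption made for illustration, and sampling is uniform with replacement, which is a common choice but not necessarily what the referenced implementation uses.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical transition type; the fields are assumed for illustration.
struct Experience {
    std::vector<float> state;
    int action = 0;
    float reward = 0.0f;
    std::vector<float> next_state;
    bool done = false;
};

class ReplayBuffer {
public:
    explicit ReplayBuffer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
        buffer_.reserve(capacity_);
    }

    // Appends until full, then overwrites the oldest entry (ring buffer).
    void push(Experience exp) {
        if (buffer_.size() < capacity_) {
            buffer_.push_back(std::move(exp));
        } else {
            buffer_[position_] = std::move(exp);
        }
        position_ = (position_ + 1) % capacity_;
    }

    // Draws batch_size transitions uniformly at random, with replacement.
    std::vector<Experience> sample(std::size_t batch_size) {
        assert(!buffer_.empty());
        std::uniform_int_distribution<std::size_t> pick(0, buffer_.size() - 1);
        std::vector<Experience> batch;
        batch.reserve(batch_size);
        for (std::size_t i = 0; i < batch_size; ++i) {
            batch.push_back(buffer_[pick(rng_)]);
        }
        return batch;
    }

    std::size_t size() const { return buffer_.size(); }

private:
    std::vector<Experience> buffer_;
    std::size_t capacity_;
    std::size_t position_ = 0;
    std::mt19937 rng_{std::random_device{}()};
};
```

Sampling copies each Experience, which keeps the sketch simple. For large observations, returning indices or stacking directly into batched tensors may avoid the extra copies.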

GPU Device Management

torch::Device device = torch::cuda::is_available() ? torch::kCUDA : torch::kCPU;
model->to(device);

// Create tensors on device
auto tensor = torch::zeros({batch_size, state_dim},
    torch::TensorOptions().device(device).dtype(torch::kFloat32));

Inference Mode

{
    torch::NoGradGuard no_grad;
    auto action_values = model->forward(state);
    auto action = action_values.argmax(1);
}

Common Pitfalls

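Forgetting train/eval mode - Call model->train() before training and model->eval() before evaluation; layers such as dropout and batch norm behave differently in each mode
Missing NoGradGuard - Wrap inference in torch::NoGradGuard so no autograd graph is built, which saves memory
Tensor accumulation - Call .detach() on tensors before storing them (for example in a replay buffer or a loss log); otherwise each stored tensor keeps its autograd graph alive
Thread safety - Give each worker thread its own copy of the model rather than sharing one instance
Device mismatch - Verify that the model and all input tensors are on the same device before calling forward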
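
The thread-safety item in the list above can be illustrated without LibTorch. In this sketch, TinyPolicy is a hypothetical stand-in for a model (a plain dot product, not a real network), and parallel_scores is an assumed helper name. The point is only the ownership pattern: each worker thread captures its own copy of the policy and writes to its own output slot.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for a model: scores an observation with a dot product.
struct TinyPolicy {
    std::vector<float> weights;

    float score(const std::vector<float>& obs) const {
        float total = 0.0f;
        const std::size_t n = std::min(weights.size(), obs.size());
        for (std::size_t i = 0; i < n; ++i) {
            total += weights[i] * obs[i];
        }
        return total;
    }
};

// Starts one worker per observation. Each worker holds a private copy of the
// policy, so no model state is shared between threads.
std::vector<float> parallel_scores(
        const TinyPolicy& shared,
        const std::vector<std::vector<float>>& observations) {
    std::vector<float> results(observations.size(), 0.0f);
    std::vector<std::thread> workers;
    workers.reserve(observations.size());
    for (std::size_t i = 0; i < observations.size(); ++i) {
        // `policy = shared` copies the policy into the closure (the clone).
        // Each worker writes only results[i], so the writes never overlap.
        workers.emplace_back([policy = shared, &observations, &results, i] {
            results[i] = policy.score(observations[i]);
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return results;
}
```

One thread per observation keeps the sketch short; a real rollout setup would more likely use a fixed pool of workers, each stepping its own environment with its own model copy.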

Reference Files

references/libtorch.md - LibTorch setup and API guide
references/algorithms.md - DQN, PPO, SAC implementations
references/memory-management.md - Replay buffers, smart pointers, RAII
references/performance.md - Optimization, parallelization, GPU
references/testing.md - Testing and debugging strategies

Stats

Stars: 0

Forks: 0

Last Commit: Feb 13, 2026
