High-performance RL training with PufferLib: fast parallel training, vectorized environments, and multi-agent RL. Use for scaling training or integrating Atari, Procgen, NetHack.
npx claudepluginhub alterlab-ieu/alterlab-academic-skills --plugin alterlab-writing-tools
Slash command: /alterlab-writing-tools:alterlab-pufferlib
PufferLib is a high-performance reinforcement learning library designed for fast parallel environment simulation and training. It reaches millions of training steps per second through optimized vectorization, native multi-agent support, and an efficient PPO implementation (PuffeRL). The library provides the Ocean suite of 20+ environments and integrates with Gymnasium, PettingZoo, and specialized RL frameworks.
Use this skill when:
PuffeRL is PufferLib's optimized PPO+LSTM training algorithm achieving 1M-4M steps/second.
Quick start training:
# CLI training
puffer train procgen-coinrun --train.device cuda --train.learning-rate 3e-4
# Distributed training
torchrun --nproc_per_node=4 train.py
Python training loop:
import pufferlib
import pufferlib.vector
from pufferlib.pufferl import PuffeRL

# Create vectorized environment (vector.make takes an env-constructor callable).
# make_coinrun_env and my_policy are placeholders you must define yourself.
env = pufferlib.vector.make(make_coinrun_env, num_envs=256)

# Create trainer
trainer = PuffeRL(
    env=env,
    policy=my_policy,
    device='cuda',
    learning_rate=3e-4,
    batch_size=32768,
)

# Training loop
num_iterations = 1000  # placeholder; set to your training budget
for iteration in range(num_iterations):
    trainer.evaluate()      # Collect rollouts
    trainer.train()         # Train on batch
    trainer.mean_and_log()  # Log results
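As a rough sanity check on the numbers in the loop above, the sketch below works out how many steps each environment contributes to one batch. It assumes one agent per environment by default and that a batch is filled evenly across environments; `steps_per_env` is a hypothetical helper written for illustration, not a PufferLib function.

```python
def steps_per_env(batch_size: int, num_envs: int, agents_per_env: int = 1) -> int:
    """Steps each environment runs to fill one training batch.

    Assumption: every agent in every environment contributes the same
    number of transitions to the batch.
    """
    total_agents = num_envs * agents_per_env
    if total_agents <= 0:
        raise ValueError("num_envs and agents_per_env must be positive")
    if batch_size % total_agents != 0:
        raise ValueError(
            f"batch_size={batch_size} is not a multiple of {total_agents} agents"
        )
    return batch_size // total_agents


# The values used in the training loop: 32768 transitions from 256 envs.
horizon = steps_per_env(batch_size=32768, num_envs=256)
print(horizon)  # 128
```

If the division is not exact, it is usually simpler to adjust `batch_size` or `num_envs` than to reason about a partially filled batch.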
For comprehensive training guidance, read references/training.md for:
Create custom high-performance environments with the PufferEnv API.
Basic environment structure:
import numpy as np
import gymnasium
from pufferlib import PufferEnv

class MyEnvironment(PufferEnv):
    # buf (and, in some versions, seed) may be passed in by the vectorization backend
    def __init__(self, buf=None, seed=0):
        # Define spaces BEFORE calling super().__init__(buf)
        self.single_observation_space = gymnasium.spaces.Box(
            low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32)
        self.single_action_space = gymnasium.spaces.Discrete(4)
        self.num_agents = 1
        super().__init__(buf)

    def reset(self, seed=None):
        # Reset state and return (observation, info-list)
        obs = self._get_observation()
        return obs, []

    def step(self, action):
        # Execute action, compute reward, check termination/truncation.
        # _get_observation, _compute_reward, _is_done and _is_truncated are
        # placeholders for your own environment logic.
        obs = self._get_observation()
        rewards = self._compute_reward()
        terminals = self._is_done()
        truncations = self._is_truncated()
        info = []
        return obs, rewards, terminals, truncations, info
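Before scaling up, it can help to check that an environment honors the reset/step shape shown above. The following is a minimal sketch of such a smoke test. `CountdownEnv` and `smoke_test` are hypothetical names, and the stand-in environment deliberately does not subclass PufferEnv so that the example runs without PufferLib installed.

```python
import random


class CountdownEnv:
    """Stand-in env mirroring the reset/step return shapes used above."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return [0.0, 0.0, 0.0, float(self.horizon)], []

    def step(self, action):
        self.t += 1
        obs = [0.0, 0.0, 0.0, float(self.horizon - self.t)]
        reward = 1.0 if action == 0 else 0.0
        terminal = self.t >= self.horizon
        truncated = False
        return obs, reward, terminal, truncated, []


def smoke_test(env, num_actions=4, max_steps=100, seed=0):
    """Run one random episode and return its length, checking return shapes."""
    rng = random.Random(seed)
    obs, info = env.reset(seed=seed)
    assert len(obs) == 4 and isinstance(info, list)
    for step_count in range(1, max_steps + 1):
        result = env.step(rng.randrange(num_actions))
        assert len(result) == 5, "step must return a 5-tuple"
        obs, reward, terminal, truncated, info = result
        if terminal or truncated:
            return step_count
    raise AssertionError("episode did not end within max_steps")


episode_length = smoke_test(CountdownEnv(horizon=5))
print(episode_length)  # 5
```

The same `smoke_test` function can be pointed at a real environment once its observation size and action count are adjusted.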
Use the template script: scripts/env_template.py provides complete single-agent and multi-agent environment templates with examples of:
For complete environment development, read references/environments.md for:
Achieve maximum throughput with optimized parallel simulation.
Vectorization setup:
import pufferlib.vector
# Automatic vectorization (pass an env-constructor callable)
env = pufferlib.vector.make(env_creator, num_envs=256, num_workers=8)
# Performance benchmarks:
# - Pure Python envs: 100k-500k SPS
# - C-based envs: 100M+ SPS
# - With training: 400k-4M total SPS
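To make the idea of vectorization concrete, here is a minimal single-process sketch: many environment copies are stepped together, taking a batch of actions and returning batches of results. `Counter` and `SerialVec` are hypothetical illustrations, not PufferLib classes, and the auto-reset on episode end is an assumption borrowed from common vectorized wrappers.

```python
class Counter:
    """Tiny env: the observation is the step count; ends after `limit` steps."""

    def __init__(self, limit=3):
        self.limit = limit
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.limit
        return self.t, float(action), done


class SerialVec:
    """Steps several env copies in lockstep and batches their outputs."""

    def __init__(self, env_creator, num_envs):
        self.envs = [env_creator() for _ in range(num_envs)]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        obs, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d = env.step(action)
            if d:
                o = env.reset()  # auto-reset so the batch never stalls
            obs.append(o)
            rewards.append(r)
            dones.append(d)
        return obs, rewards, dones


vec = SerialVec(Counter, num_envs=4)
first_obs = vec.reset()
for _ in range(3):
    obs, rewards, dones = vec.step([1, 0, 1, 0])
```

A multi-worker backend splits the same list of environments across processes; the batch-in, batch-out interface stays the same.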
Key optimizations:
For vectorization optimization, read references/vectorization.md for:
Build policies as standard PyTorch modules with optional utilities.
Basic policy structure:
import numpy as np
import torch.nn as nn
from pufferlib.pytorch import layer_init

class Policy(nn.Module):
    def __init__(self, observation_space, action_space):
        super().__init__()
        # Derive layer sizes from the spaces
        # (assumes a Box observation space and a Discrete action space)
        obs_dim = int(np.prod(observation_space.shape))
        num_actions = action_space.n

        # Encoder
        self.encoder = nn.Sequential(
            layer_init(nn.Linear(obs_dim, 256)),
            nn.ReLU(),
            layer_init(nn.Linear(256, 256)),
            nn.ReLU(),
        )

        # Actor and critic heads
        self.actor = layer_init(nn.Linear(256, num_actions), std=0.01)
        self.critic = layer_init(nn.Linear(256, 1), std=1.0)

    def forward(self, observations):
        features = self.encoder(observations)
        return self.actor(features), self.critic(features)
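The actor head above returns unnormalized logits, and the small `std=0.01` keeps those logits near zero at initialization, so the starting policy is close to uniform. The sketch below shows, in plain Python, how logits become action probabilities and a sampled action. `softmax` and `sample_action` are illustrative helpers, not part of PufferLib or PyTorch.

```python
import math
import random


def softmax(logits):
    """Convert logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng):
    """Draw one action index according to the softmax probabilities."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Near-zero logits (as at initialization) give a near-uniform policy.
probs = softmax([0.0, 0.0, 0.0, 0.0])
print(probs)  # [0.25, 0.25, 0.25, 0.25]

# A strongly peaked logit vector makes one action far more likely.
action = sample_action([0.0, 0.0, 10.0, 0.0], random.Random(0))
```

In a PyTorch training loop the same step is typically done with a categorical distribution over the logits rather than by hand.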
For complete policy development, read references/policies.md for:
Seamlessly integrate environments from popular RL frameworks.
Gymnasium integration:
import gymnasium as gym
import pufferlib.emulation
import pufferlib.vector

# Wrap a Gymnasium env in a GymnasiumPufferEnv, then vectorize.
# The creator accepts buf/seed because vectorization backends may pass them
# as keyword arguments.
def env_creator(buf=None, seed=0):
    return pufferlib.emulation.GymnasiumPufferEnv(
        env_creator=lambda: gym.make('CartPole-v1'), buf=buf)

env = pufferlib.vector.make(env_creator, num_envs=256)
PettingZoo multi-agent:
import pufferlib.emulation
import pufferlib.vector
from pettingzoo.butterfly import knights_archers_zombies_v10

# Wrap a PettingZoo env in a PettingZooPufferEnv, then vectorize.
# The creator accepts buf/seed because vectorization backends may pass them
# as keyword arguments.
def env_creator(buf=None, seed=0):
    return pufferlib.emulation.PettingZooPufferEnv(
        env_creator=lambda: knights_archers_zombies_v10.parallel_env(), buf=buf)

env = pufferlib.vector.make(env_creator, num_envs=128)
Supported frameworks:
For integration details, read references/integration.md for:
Use scripts/train_template.py as a starting point
See references/training.md for optimization
Start from scripts/env_template.py
Implement the reset() and step() methods
Wrap with pufferlib.emulation.GymnasiumPufferEnv and vectorize with pufferlib.vector.make()
See references/environments.md for advanced patterns
See references/vectorization.md if needed
Use layer_init for proper weight initialization
See references/policies.md
See references/vectorization.md for systematic optimization
train_template.py - Complete training script template with:
env_template.py - Environment implementation templates:
training.md - Comprehensive training guide:
environments.md - Environment development guide:
vectorization.md - Vectorization optimization:
policies.md - Policy architecture guide:
integration.md - Framework integration guide:
Start simple: Begin with Ocean environments or Gymnasium integration before creating custom environments
Profile early: Measure steps per second from the start to identify bottlenecks
Use templates: scripts/train_template.py and scripts/env_template.py provide solid starting points
Read references as needed: Each reference file is self-contained and focused on a specific capability
Optimize progressively: Start with Python, profile, then optimize critical paths with C if needed
Leverage vectorization: PufferLib's vectorization is key to achieving high throughput
Monitor training: Use WandB or Neptune to track experiments and identify issues early
Test environments: Validate environment logic before scaling up training
Check existing environments: Ocean suite provides 20+ pre-built environments
Use proper initialization: Always use layer_init from pufferlib.pytorch for policies
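The "profile early" advice above can be put into practice with a few lines of timing code. The following is a minimal sketch; `measure_sps` is a hypothetical helper and `dummy_step` stands in for a call such as stepping a vectorized environment.

```python
import time


def measure_sps(step_fn, num_steps, envs_per_step=1):
    """Return environment steps per second for `num_steps` calls of step_fn.

    envs_per_step is the number of environments advanced by one call,
    e.g. num_envs for a vectorized environment.
    """
    start = time.perf_counter()
    for _ in range(num_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return (num_steps * envs_per_step) / max(elapsed, 1e-9)


counter = {"calls": 0}


def dummy_step():
    # Placeholder workload; replace with a real environment step.
    counter["calls"] += 1


sps = measure_sps(dummy_step, num_steps=1000, envs_per_step=256)
print(f"{sps:,.0f} steps/second")
```

Measuring the environment alone and then the full training loop makes it easier to tell whether simulation or learning is the bottleneck.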
import pufferlib.vector
# Atari (pass an env-constructor callable)
env = pufferlib.vector.make(make_pong_env, num_envs=256)
# Procgen
env = pufferlib.vector.make(make_coinrun_env, num_envs=256)
# Minigrid
env = pufferlib.vector.make(make_minigrid_env, num_envs=256)
import pufferlib.vector
from pufferlib.pufferl import PuffeRL

# PettingZoo (pass an env-constructor callable)
env = pufferlib.vector.make(make_pistonball_env, num_envs=128)

# Shared policy for all agents (create_policy is a placeholder for your own factory)
policy = create_policy(env.single_observation_space, env.single_action_space)
trainer = PuffeRL(env=env, policy=policy)
import pufferlib.vector
from pufferlib import PufferEnv
from pufferlib.pufferl import PuffeRL

# Create custom environment (a native PufferEnv subclass)
class MyTask(PufferEnv):
    ...  # implement environment

# Vectorize (pass the env class/constructor callable) and train
env = pufferlib.vector.make(MyTask, num_envs=256)
trainer = PuffeRL(env=env, policy=my_policy)
import pufferlib.vector
# Maximize throughput (pass an env-constructor callable)
env = pufferlib.vector.make(
    my_env_creator,   # env constructor callable
    num_envs=1024,    # Large batch
    num_workers=16,   # Many workers
    backend=pufferlib.vector.Multiprocessing,
)
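When choosing `num_envs` and `num_workers` as in the configuration above, it is worth checking how the environments divide across workers. The sketch below assumes an even split; `envs_per_worker` is a hypothetical helper, and whether PufferLib itself enforces divisibility is not stated in this document.

```python
def envs_per_worker(num_envs: int, num_workers: int) -> int:
    """Environments handled by each worker, assuming an even split."""
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    if num_envs % num_workers != 0:
        raise ValueError(
            f"num_envs={num_envs} does not divide evenly across "
            f"{num_workers} workers"
        )
    return num_envs // num_workers


# The values from the throughput configuration: 1024 envs over 16 workers.
per_worker = envs_per_worker(1024, 16)
print(per_worker)  # 64
```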
uv pip install pufferlib