npx claudepluginhub plurigrid/asi --plugin asi
> *"The agent's job is to predict its actions by predicting its sensations."* — Patrick Kenny
Indexes K-Scale Labs robotics skills for humanoid robot development, RL training, sim-to-real transfer, and deployment. Organizes 9 skills in GF(3) triadic structure.
Guides training RL agents with Stable Baselines3 (PPO, SAC, DQN, TD3, DDPG, A2C) using Gymnasium environments: custom env creation, callbacks for monitoring, vectorized envs for parallel training, and deep RL workflows.
Second-order skill synthesizing Patrick Kenny's discrete active inference framework with K-Scale's JAX/MuJoCo robotics stack. It emerges from the constructive collision between two threads:
┌─────────────────────────────────────────────────────────────────────────────┐
│               CONSTRUCTIVE COLLISION: Two Threads Converging                │
│                                                                             │
│  Thread A: Patrick Kenny (Nov 2025)                                         │
│  ══════════════════════════════════                                         │
│  "Active inference can be formulated as constrained KL divergence           │
│   minimization solved by standard mean field methods"                       │
│                                                                             │
│  Key insight: Expected Free Energy ≈ KL Divergence + Entropy Regularizer    │
│                                                                             │
│  Thread B: K-Scale Labs (2024-2025)                                         │
│  ══════════════════════════════════                                         │
│  "RL-based closed-loop control using policies trained in simulation         │
│   has firmly won as the best way of achieving real-time control"            │
│                                                                             │
│  Key insight: Stateless vs Stateful behaviors as pure/coalgebraic semantics │
│                                                                             │
│  COLLISION POINT: Both minimize surprise about future observations          │
│  ═════════════════════════════════════════════════════════════════         │
│                                                                             │
│     Active Inference                    Robotics RL                         │
│     ────────────────                    ───────────                         │
│     Predictive Distribution   ←→    Policy π(a|s)                           │
│     Hidden Markov Model       ←→    MDP/POMDP                               │
│     Mean Field Updates        ←→    PPO Gradient Steps                      │
│     Variational Free Energy   ←→    Policy Loss                             │
│     Expected Free Energy      ←→    Value Function + Entropy                │
│     Perception/Action Loop    ←→    Observation/Action Loop                 │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
From arXiv:2511.20321:
Perception/Action Divergence = VFE(past) + KL(future states)
Where:
- VFE(past) = Standard variational free energy on observed history
- KL(future) = Divergence of predictive distribution from HMM
This differs from Expected Free Energy by an ENTROPY REGULARIZER:
EFE ≈ Pragmatic Value + Mutual Information
PAD ≈ Pragmatic Value + Entropy(Q)
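Typeset, this decomposition reads as follows. This is a sketch of the abstract-level claim in arXiv:2511.20321, not the paper's exact equations; F denotes variational free energy, Q the agent's variational posterior, and H entropy:

\begin{aligned}
\mathrm{PAD} &= \underbrace{F(O_{1:t})}_{\text{VFE on observed history}}
  + \underbrace{D_{\mathrm{KL}}\big[\,Q(S_{t+1:T}) \,\|\, P_{\mathrm{HMM}}(S_{t+1:T})\,\big]}_{\text{predicted future states vs. HMM}} \\
\mathrm{EFE} &\approx \text{pragmatic value} + \text{mutual information} \\
\mathrm{PAD} &\approx \text{pragmatic value} + H\big[Q(S_{t+1:T})\big]
\end{aligned}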
# In ksim PPO training, the entropy bonus prevents policy collapse:
loss = policy_loss + value_loss - entropy_coef * entropy
# Kenny's formulation shows this term is principled, not ad hoc:
# the entropy regularizer keeps the policy from becoming overconfident
# about its predictions. Biological rationale: an agent should know
# the limits of its own forecasts of the future.
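A self-contained sketch of that loss in plain NumPy. The shapes, the categorical policy head, and the coefficients are illustrative assumptions, not ksim's actual API:

import numpy as np

def categorical_entropy(logits: np.ndarray) -> np.ndarray:
    """Entropy of a categorical policy given unnormalized logits."""
    z = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-8)).sum(axis=-1)

def ppo_loss(ratio, advantage, value_pred, value_target, logits,
             clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
    """Clipped PPO objective with the entropy regularizer made explicit."""
    # Clipped surrogate objective (negated, since we minimize)
    policy_loss = -np.minimum(
        ratio * advantage,
        np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage,
    ).mean()
    value_loss = ((value_pred - value_target) ** 2).mean()
    entropy = categorical_entropy(logits).mean()
    # Subtracting entropy rewards staying uncertain: Kenny's regularizer
    return policy_loss + value_coef * value_loss - entropy_coef * entropy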
| Active Inference Concept | ksim Implementation |
|---|---|
| Hidden Markov Model | PhysicsEngine (MJX/MuJoCo) |
| Observation distribution | Observation.observe(state) |
| State inference Q(s) | Critic.forward(obs, carry) |
| Action inference Q(a) | Actor.forward(obs, carry) |
| Mean field factorization | Independent Q(s_t) per timestep |
| Predictive distribution | Policy rollout trajectory |
| VFE minimization | PPO policy gradient |
| EFE/PAD minimization | Value function + entropy bonus |
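Read row by row, the table amounts to one step of a rollout loop. A minimal sketch, where the observe/forward/step names follow the table but the exact signatures are assumptions rather than ksim's real interfaces:

def rollout_step(engine, actor, critic, state, actor_carry, critic_carry):
    """One perception/action cycle, annotated with the table's rows."""
    obs = engine.observe(state)                               # Observation distribution
    value, critic_carry = critic.forward(obs, critic_carry)   # State inference Q(s)
    dist, actor_carry = actor.forward(obs, actor_carry)       # Action inference Q(a)
    action = dist.sample()                                    # Draw from the policy
    state = engine.step(state, action)                        # HMM transition (physics)
    return state, action, value, actor_carry, critic_carry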
# Agent predicts proprioceptive sensations → fulfills reflexively
class ReflexiveController:
    """
    Kenny: "If the agent can successfully predict its future sensations,
    it can fulfill them unconsciously via motor reflexes."
    """
    def __init__(self, kp: float = 20.0, kd: float = 0.5):
        self.kp, self.kd = kp, kd  # PD gains (illustrative values)

    def step(self, predicted_proprio: Array, position: Array, velocity: Array) -> Action:
        # Low-level PD control drives the joints toward the predicted
        # proprioceptive state, thereby fulfilling the prediction
        return self.kp * (predicted_proprio - position) - self.kd * velocity
# When reflexive prediction fails, engage deliberative inference
class DeliberativeController:
    """
    Extends reflexive control with policy search over trajectories.
    This is where EFE differs from Kenny's PAD formulation.
    """
    def plan(self, beliefs: Distribution, horizon: int) -> Policy:
        # Search over candidate policies, scored by expected free energy.
        # EFE includes mutual information (curiosity/exploration);
        # PAD would use entropy instead (uncertainty awareness).
        best_policy, best_efe = None, float("inf")
        for policy in self.policy_space:
            efe = self.expected_free_energy(beliefs, policy, horizon)
            if efe < best_efe:
                best_policy, best_efe = policy, efe
        return best_policy
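To make that one-line difference executable, toy scoring functions for each criterion. Signs follow the decomposition above; NumPy, illustrative only:

import numpy as np

def entropy(q: np.ndarray) -> float:
    """Shannon entropy of a discrete distribution."""
    return float(-(q * np.log(q + 1e-12)).sum())

def efe_score(pragmatic: float, mutual_info: float) -> float:
    # EFE ≈ pragmatic value + mutual information (epistemic drive)
    return pragmatic + mutual_info

def pad_score(pragmatic: float, q_future: np.ndarray) -> float:
    # PAD ≈ pragmatic value + entropy of the predicted future states
    return pragmatic + entropy(q_future)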
Level 3: Goal Selection (minimize long-horizon EFE)
↓ sets reference for
Level 2: Trajectory Planning (predictive distribution)
↓ sets reference for
Level 1: Reflexive Execution (fulfill proprio predictions)
↓ actuates
Level 0: Motor Primitives (PD control, actuator dynamics)
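One way to wire the four levels together, as a toy sketch. Every class here is hypothetical; real stacks split these levels across training-time and deployment-time code:

class HierarchicalController:
    """Levels 3 → 0: goal selection, planning, reflexes, motor primitives."""
    def __init__(self, goal_selector, planner, reflex, motor):
        self.goal_selector = goal_selector  # Level 3: minimize long-horizon EFE
        self.planner = planner              # Level 2: predictive distribution
        self.reflex = reflex                # Level 1: fulfill proprio predictions
        self.motor = motor                  # Level 0: PD control, actuator dynamics

    def control(self, beliefs):
        goal = self.goal_selector.select(beliefs)             # Level 3
        predicted_proprio = self.planner.plan(beliefs, goal)  # Level 2
        setpoint = self.reflex.step(predicted_proprio)        # Level 1
        return self.motor.actuate(setpoint)                   # Level 0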
active-inference (0) ⊗ kscale-ksim (0) ⊗ mujoco-playground (0) = 0 ✓
All three are ERGODIC — coordination/infrastructure skills.
This is a "resonant triad" where all components coordinate.
For generation (+1), add: skill-creator, algorithmic-art
For verification (-1), add: sheaf-cohomology, code-review
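As a sanity check on the triad arithmetic, assuming ⊗ composes trits by addition mod 3:

def gf3(*trits: int) -> int:
    """Compose skill trits in GF(3); a resonant triad sums to 0."""
    return sum(trits) % 3

assert gf3(0, 0, 0) == 0          # active-inference ⊗ kscale-ksim ⊗ mujoco-playground
assert gf3(0, 0, 0, 1, -1) == 0   # still balanced after adding a +1 and a -1 skill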
| Skill | Trit | Color | Role |
|---|---|---|---|
| active-inference | 0 | #DF8D0F | Coordination (theory) |
| kscale-ksim | 0 | #25BC3D | Coordination (simulation) |
| mujoco-playground | 0 | #93DBDA | Coordination (framework) |
Applying prime-indexed refinement to identify domain experts:
| Prime | Expert | Domain | Key Contribution |
|---|---|---|---|
| 2 | Patrick Kenny | Active Inference | Mean field formulation, PAD criterion |
| 3 | Thomas Parr | Active Inference | 2022 textbook, EFE derivation |
| 5 | Ben Bolte | K-Scale | ksim architecture, open-source humanoids |
| 7 | Karl Friston | Free Energy Principle | FEP foundations, continuous formulation |
| 11 | (DeepMind team) | MuJoCo Playground | MJX, sim2real zero-shot |
| 13 | Wesley Maa | K-Scale | Tooling, visualization |
This skill references and is referenced by:
depends_on:
- kscale-ksim # Simulation implementation
- kscale-ecosystem # Hardware context
- mujoco-playground # Framework foundation
referenced_by:
- cognitive-superposition # Team mental models
- parametrised-optics-cybernetics # Category theory bridge
- reafference-corollary-discharge # Sensorimotor prediction
# Unified Active Inference + RL Training Loop
class ActiveInferenceTrainer:
"""
Combines Kenny's PAD criterion with ksim's PPO.
"""
def __init__(self, hmm: PhysicsEngine, config: Config):
self.hmm = hmm
self.actor = Actor(config)
self.critic = Critic(config)
def perception_action_divergence(
self,
observations: Array, # O_{1:t} (past)
q_future: Distribution # Q(S_{t+1:T}, O_{t+1:T})
) -> Scalar:
"""
Kenny's PAD = VFE(past) + KL(future states from HMM)
"""
# Past: standard VFE on observation history
vfe_past = self.variational_free_energy(observations)
# Future: KL divergence of predicted states from HMM
# Note: Observable emissions cancel out in future KL
kl_future = self.kl_future_states(q_future, self.hmm)
return vfe_past + kl_future
def train_step(self, trajectory: Trajectory) -> Metrics:
# PPO updates approximate mean field coordinate ascent
# Entropy bonus provides Kenny's regularization
return ppo_update(
self.actor,
self.critic,
trajectory,
entropy_coef=0.01 # ← The regularizer!
)
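Hypothetical usage, assuming the undefined helpers above (variational_free_energy, kl_future_states, ppo_update) plus a rollout collector exist:

# Placeholder names from the sketch above, not real ksim API
trainer = ActiveInferenceTrainer(hmm=physics_engine, config=config)

for epoch in range(num_epochs):
    trajectory = collect_rollout(trainer.actor, trainer.hmm)  # hypothetical helper
    pad = trainer.perception_action_divergence(
        trajectory.observations,  # O_{1:t}
        trajectory.q_future,      # Q(S_{t+1:T}, O_{t+1:T})
    )
    metrics = trainer.train_step(trajectory)  # PPO step ≈ mean field ascent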
@present SchActiveInferenceRobotics(FreeSchema) begin
  # Objects
  HMM::Ob          # Hidden Markov Model (generative model)
  State::Ob        # Latent state
  Observation::Ob  # Sensory observation
  Action::Ob       # Motor command
  Policy::Ob       # Action sequence

  # Morphisms (inference)
  perceive::Hom(Observation, State)       # Perception: O → S
  predict::Hom(State, Observation)        # Prediction: S → O
  act::Hom(State, Action)                 # Action selection: S → A
  transition::Hom(State × Action, State)  # Dynamics: S × A → S' (product shown
                                          # schematically; a literal Catlab schema
                                          # would need a StateAction object)

  # Attributes
  FreeEnergy::AttrType
  vfe::Attr(State, FreeEnergy)   # Variational free energy
  efe::Attr(Policy, FreeEnergy)  # Expected free energy
  pad::Attr(Policy, FreeEnergy)  # Perception/action divergence

  # The key relationship (Kenny's contribution):
  #   pad ≈ efe + entropy_regularizer
end