npx claudepluginhub plurigrid/asi --plugin asi
This skill uses the workspace's default tool permissions.
> Layer 5: SDE-Based Learning Analysis via Langevin Dynamics
"what would it mean to become the Fokker-Planck equation—identity as probability flow?" — bmorphism gist
Active Inference Connection: Langevin dynamics is the generative model underlying Active Inference in String Diagrams (Tull, Kleiner, Smithe). The gradient descent + noise duality maps to:
Philosophical Frame: bmorphism's question about "becoming the Fokker-Planck equation" points to identity as probability flow — the self is not a fixed point but a trajectory through parameter space, converging toward equilibrium while maintaining exploratory uncertainty.
Ergodic Convergence: For ergodic systems, time averages equal ensemble averages. This is the mathematical foundation for the GF(3) ERGODIC trit — the neutral state that connects BACKFILL (-1) and LIVE (+1) through mixing.
Version: 1.0.0
Trit: 0 (Ergodic - understands convergence)
Bundle: analysis
Status: ✅ New (based on Moritz Schauer's approach)
The Langevin Dynamics skill implements Moritz Schauer's approach to understanding neural network training through stochastic differential equations (SDEs). Instead of treating training as black-box optimization, this skill instruments the randomness to reveal:
Key Contribution (Schauer 2015-2025): Continuous-time theory is a guide, not gospel. Real training is discrete. We instrument and verify empirically.
Based on Moritz Schauer's work:
Schauer emphasizes that:
"Don't use continuous theory as a black box. Solve the SDE numerically, compare different discretizations, then verify empirically."
dθ(t) = -∇L(θ(t)) dt + √(2T) dW(t)
Where:
θ = network parameters
L = loss function
∇L = gradient (drift)
T = temperature (noise scale)
dW = Brownian motion (noise)
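To make the equation concrete, here is a minimal NumPy sketch of its Euler-Maruyama discretization on a toy quadratic loss. The function name `simulate_langevin`, the toy loss, and all parameter values are assumptions chosen for illustration; this is not the skill's own `solve_langevin`.

```python
import numpy as np

def simulate_langevin(grad_fn, theta0, temperature, dt, n_steps, seed=0):
    """Euler-Maruyama steps for dθ = -∇L(θ) dt + sqrt(2T) dW (illustrative helper)."""
    rng = np.random.default_rng(seed)
    theta = np.array(theta0, dtype=float)
    sigma = np.sqrt(2.0 * temperature * dt)  # noise scale per step
    path = np.empty((n_steps + 1,) + theta.shape)
    path[0] = theta
    for n in range(n_steps):
        noise = rng.standard_normal(theta.shape)
        # Drift follows the negative gradient; diffusion adds Gaussian noise.
        theta = theta - dt * grad_fn(theta) + sigma * noise
        path[n + 1] = theta
    return path

# Toy quadratic loss L(θ) = 0.5 * k * θ², so ∇L(θ) = k * θ.
k, T = 2.0, 0.5
path = simulate_langevin(lambda th: k * th, theta0=[3.0], temperature=T,
                         dt=0.01, n_steps=200_000, seed=0)

empirical_var = path[50_000:].var()  # discard the first 50k steps as burn-in
gibbs_var = T / k                    # variance of exp(-L/T) for this quadratic loss
print(f"empirical variance: {empirical_var:.4f}, Gibbs variance: {gibbs_var:.4f}")
```

For this quadratic loss the long-run variance of θ should land close to T/k, with a small offset caused by the finite step size.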
The distribution of θ evolves according to:
∂p/∂t = ∇·(∇L·p) + T∆p
Stationary distribution: p∞(θ) ∝ exp(-L(θ)/T)
Convergence to this Gibbs distribution governs learning dynamics.
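A short check that the Gibbs density is stationary for the Fokker-Planck equation above, assuming L is smooth and p decays fast enough for the divergence form to make sense:

```latex
\begin{aligned}
\frac{\partial p}{\partial t}
  &= \nabla\cdot\big(p\,\nabla L\big) + T\,\Delta p
   = \nabla\cdot\big(p\,\nabla L + T\,\nabla p\big), \\
\nabla p_\infty
  &= -\frac{\nabla L}{T}\,p_\infty
  \quad\text{for } p_\infty(\theta) \propto e^{-L(\theta)/T}, \\
p_\infty\,\nabla L + T\,\nabla p_\infty
  &= p_\infty\,\nabla L - p_\infty\,\nabla L = 0
  \;\Longrightarrow\; \frac{\partial p_\infty}{\partial t} = 0 .
\end{aligned}
```

The probability flux vanishes identically at p∞, which is why the drift and the noise balance exactly at temperature T.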
τ_mix ≈ 1 / λ_min(H)
Where H = Hessian of loss landscape
τ_mix is the time until the parameter distribution reaches equilibrium. Training that stops before equilibration can end up in different minima than the continuous theory predicts.
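As a rough illustration of the estimate τ_mix ≈ 1 / λ_min(H), the sketch below evaluates it for a hand-written Hessian. The helper `mixing_time_estimate` and the example matrix are assumptions for illustration, and the estimate is only meaningful near a minimum where H is positive definite.

```python
import numpy as np

def mixing_time_estimate(hessian):
    """Return 1 / λ_min(H) for a symmetric positive-definite Hessian (illustrative)."""
    eigvals = np.linalg.eigvalsh(hessian)  # eigenvalues in ascending order
    lam_min = eigvals[0]
    if lam_min <= 0:
        raise ValueError("estimate requires a positive-definite Hessian")
    return 1.0 / lam_min

# One stiff direction (curvature 4.0) and one flat direction (curvature 0.05).
H = np.array([[4.0, 0.0],
              [0.0, 0.05]])
tau_mix = mixing_time_estimate(H)  # set by the flattest direction
dt = 0.01
steps_needed = tau_mix / dt        # convert continuous time to discrete steps
print(f"tau_mix ≈ {tau_mix:.1f} time units ≈ {steps_needed:.0f} steps at dt={dt}")
```

The flattest direction of the loss landscape sets the time scale, so a single small eigenvalue can dominate the required training length.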
Solve Langevin SDE with multiple discretization schemes:
from langevin_dynamics import LangevinSDE, solve_langevin
# Assumption: the solver classes are exported by the same module
from langevin_dynamics import EM, SOSRI, RKMil
# Define SDE
sde = LangevinSDE(
    loss_fn=neural_network_loss,
    gradient_fn=compute_gradient,
    temperature=0.01,
    base_seed=0xDEADBEEF
)
# Solve with different solvers
solutions = {}
for solver in [EM(), SOSRI(), RKMil()]:
    sol, tracking = solve_langevin(
        sde=sde,
        θ_init=initial_params,
        time_span=(0.0, 1.0),
        solver=solver,
        dt=0.01
    )
    # Store each result under the solver's class name
    solutions[solver.__class__.__name__] = (sol, tracking)
# Compare solutions to understand discretization effects
Check if trajectory is approaching Gibbs distribution:
from langevin_dynamics import check_gibbs_convergence
convergence = check_gibbs_convergence(
    trajectory=solution,
    temperature=0.01,
    loss_fn=loss_fn,
    gradient_fn=gradient_fn
)
print(f"Mean loss (initial): {convergence['mean_initial_loss']:.5f}")
print(f"Mean loss (final): {convergence['mean_final_loss']:.5f}")
print(f"Std dev (final): {convergence['std_final']:.5f}")
print(f"Gibbs probability ratio: {convergence['gibbs_ratio']:.4f}")
if convergence['converged']:
    print("✓ Trajectory has reached Gibbs equilibrium")
else:
    print("⚠ Training stopped before equilibration")
Estimate how long until network reaches steady state:
from langevin_dynamics import estimate_mixing_time
tau_mix = estimate_mixing_time(
    solution=trajectory,
    gradient_fn=gradient_fn,
    temperature=T
)
print(f"Estimated mixing time: {tau_mix:.0f} steps")
print(f"Training length: {len(trajectory)} steps")
if len(trajectory) < tau_mix:
    print("⚠ Training likely stopped before equilibration")
    print(f" Need {tau_mix - len(trajectory)} more steps")
Study how temperature controls exploration:
from langevin_dynamics import analyze_temperature
analysis = analyze_temperature(
    temperatures=[0.001, 0.01, 0.1],
    loss_fn=loss_fn,
    gradient_fn=gradient_fn,
    n_steps=1000
)
for T, metrics in analysis.items():
    print(f"\nTemperature T = {T}:")
    print(f" Final train loss: {metrics['train_loss']:.5f}")
    print(f" Test loss: {metrics['test_loss']:.5f}")
    print(f" Gen gap: {metrics['gen_gap']:.5f}")
    print(f" Trajectory variance: {metrics['variance']:.5f}")
# Interpretation:
# Low T → Sharp basin (good train, may overfit)
# High T → Flat basin (bad train, better generalization)
Compare different step sizes (dt):
from langevin_dynamics import compare_discretizations
comparison = compare_discretizations(
    loss_fn=loss_fn,
    gradient_fn=gradient_fn,
    dt_values=[0.001, 0.01, 0.05],
    n_steps=100,
    temperature=0.01
)
for dt, result in comparison.items():
    print(f"dt = {dt}: final_loss = {result['final_loss']:.5f}")
# Schauer's insight: Different dt give different results
# The continuous limit is asymptotic - finite dt matters!
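One case where the effect of a finite dt can be computed exactly is Euler-Maruyama on a quadratic loss, where the update is a linear AR(1) recursion. The sketch below is a worked example under that assumption; `em_stationary_variance` and the values of `k` and `T` are illustrative and not part of the skill.

```python
def em_stationary_variance(k, temperature, dt):
    """Stationary variance of Euler-Maruyama on L(θ) = 0.5 * k * θ² (illustrative).

    The update θ ← (1 - k*dt) θ + sqrt(2*T*dt) ξ is an AR(1) process whose
    stationary variance v solves v = (1 - k*dt)² v + 2*T*dt.
    """
    a = 1.0 - k * dt
    if abs(a) >= 1.0:
        raise ValueError("unstable step size: need 0 < k*dt < 2")
    return 2.0 * temperature * dt / (1.0 - a * a)

k, T = 10.0, 0.01
continuous_var = T / k  # Gibbs variance in the continuous-time limit
ratios = {}
for dt in [0.001, 0.01, 0.05]:
    ratios[dt] = em_stationary_variance(k, T, dt) / continuous_var
    print(f"dt = {dt}: discrete / continuous variance = {ratios[dt]:.4f}")
```

For these values the discrete chain overestimates the Gibbs variance by roughly 0.5%, 5% and 33% respectively, since the ratio simplifies to 1 / (1 - k·dt/2).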
Track which colors affect which parameter updates:
from langevin_dynamics import instrument_langevin_noise
from gay_mcp import color_at
# Same seed as in the LangevinSDE definition
base_seed = 0xDEADBEEF
# Instrument the trajectory
audit_log = instrument_langevin_noise(
    trajectory=solution,
    seed=base_seed
)
# Example output:
# step_47 → color_0xD8267F (trit=-1) → noise_0.342 → ∆w_42 = -0.0015
# step_48 → color_0x2CD826 (trit=0) → noise_0.156 → ∆b_7 = +0.0082
# Verify GF(3) conservation
# (gf3_check is not imported in this snippet; it is assumed to be in scope)
gf3_check(audit_log['colors'], balance_threshold=0.1)
All noise is deterministically seeded via Gay.jl:
from math import sqrt
from gay_mcp import GayIndexedRNG
# Create deterministic noise generator
rng = GayIndexedRNG(base_seed=0xDEADBEEF)
# Each step gets auditable noise
for step in range(n_steps):
    color = rng.color_at(step)
    noise = rng.randn_from_color(color)
    # Update parameters: the drift is -∇L, so the gradient enters with a minus sign
    θ += -dt * gradient + sqrt(2*T*dt) * noise
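The idea of step-indexed, reproducible noise can also be sketched with plain NumPy. This is a stand-in under stated assumptions: it derives a generator from the pair (base_seed, step) and does not reproduce Gay.jl's color-indexed scheme; `noise_at` and `langevin_step` are hypothetical names.

```python
import numpy as np

def noise_at(base_seed, step, shape):
    """Standard-normal noise that depends only on (base_seed, step)."""
    rng = np.random.default_rng([base_seed, step])  # sequence of ints as seed
    return rng.standard_normal(shape)

def langevin_step(theta, grad, temperature, dt, base_seed, step):
    """One Euler-Maruyama step with auditable, step-indexed noise."""
    xi = noise_at(base_seed, step, np.shape(theta))
    # The drift is -∇L, so the gradient is subtracted.
    return theta - dt * grad + np.sqrt(2.0 * temperature * dt) * xi

base_seed = 0xDEADBEEF
theta = np.array([1.0, -2.0, 0.5])
grad = 2.0 * theta  # gradient of the toy loss L(θ) = ||θ||²

first = langevin_step(theta, grad, temperature=0.01, dt=0.01,
                      base_seed=base_seed, step=47)
replay = langevin_step(theta, grad, temperature=0.01, dt=0.01,
                       base_seed=base_seed, step=47)
print(np.array_equal(first, replay))  # the same step replays identically
```

Because the noise for a step can be regenerated from its index alone, any single update can be audited without replaying the whole trajectory.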
| Layer | Issue | Our Solution |
|---|---|---|
| Numerical | "Which discretization?" | Test multiple dt values; show differences |
| Theoretical | "Does Fokker-Planck hold?" | Verify empirically; measure convergence |
| Empirical | "Matches practice?" | Compare continuous bound vs actual |
| Trit | Skill | Role |
|---|---|---|
| -1 | fokker-planck-analyzer | Validates steady state |
| 0 | langevin-dynamics-skill | Analyzes convergence |
| +1 | entropy-sequencer | Optimizes sequences |
Conservation: (-1) + (0) + (+1) = 0 ✓
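The conservation rule amounts to a sum modulo 3. A minimal sketch, assuming trits are encoded as the integers -1, 0 and +1 (`gf3_conserved` is a hypothetical helper, not the skill's `gf3_check`):

```python
def gf3_conserved(trits):
    """Return True when the trits sum to 0 modulo 3."""
    trits = list(trits)
    if any(t not in (-1, 0, 1) for t in trits):
        raise ValueError("trits must be -1, 0 or +1")
    return sum(trits) % 3 == 0

# The triad from the table: validator (-1), analyzer (0), optimizer (+1).
triad = {"fokker-planck-analyzer": -1,
         "langevin-dynamics-skill": 0,
         "entropy-sequencer": +1}
print(gf3_conserved(triad.values()))
```

Note that the check is modular, so three skills that all carry +1 would also pass, while a pair of +1 skills would not.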
# langevin-dynamics.yaml
sde:
  temperature: 0.01
  learning_rate: 0.01
  base_seed: 0xDEADBEEF
discretization:
  solvers: [EM, SOSRI, RKMil]
  dt_values: [0.001, 0.01, 0.05]
  n_steps: 1000
verification:
  check_fokker_planck: true
  estimate_mixing_time: true
  compare_discretizations: true
instrumentation:
  track_colors: true
  verify_gf3: true
  export_audit_log: true
# 1. Solve Langevin SDE
just langevin-solve net=logistic T=0.01 dt=0.01
# 2. Check Fokker-Planck convergence
just langevin-check-gibbs
# 3. Estimate mixing time
just langevin-mixing-time
# 4. Compare discretizations
just langevin-discretization-study
# 5. Analyze temperature effects
just langevin-temperature-sweep
# 6. Verify GF(3) via color tracking
just langevin-verify-colors
entropy-sequencer (Layer 5) - Arranges sequences for learning
fokker-planck-analyzer (Validation) - Checks equilibrium
gay-mcp (Infrastructure) - Deterministic noise
agent-o-rama (Layer 4) - Temporal learning
unworld-skill (Layer 4) - Derivational alternative

Skill Name: langevin-dynamics-skill
Type: Analysis / Understanding
Trit: 0 (ERGODIC - neutral/analytic)
Key Property: Bridges continuous theory to discrete practice via empirical verification
Status: ✅ Production Ready
Based on: Moritz Schauer's work on SDEs and discretization
This skill connects to the K-Dense-AI/claude-scientific-skills ecosystem:
dynamical-systems: 41 citations in bib.duckdb
This skill maps to Cat# = Comod(P) as a bicomodule in the equipment structure:
Trit: 1 (PLUS)
Home: Prof
Poly Op: ⊗
Kan Role: Lan_K
Color: #4ECDC4
The skill participates in triads satisfying:
(-1) + (0) + (+1) ≡ 0 (mod 3)
This ensures compositional coherence in the Cat# equipment structure.