Leaky Integrate-and-Fire spiking neuron with surrogate gradient — converts continuous activations into binary spike trains over T timesteps. Used by spiking transformer architectures (CSLA-MT). Activate when the user asks "LIF neuron", "spiking neural network", "SNN", "spike encoding", "surrogate gradient", or wires up a spiking layer.
Install:

`npx claudepluginhub curryfromuestc/curry-train --plugin curry-train`

This skill uses the workspace's default tool permissions.
Leaky Integrate-and-Fire neuron: a stateful, time-stepped non-linearity that converts continuous-valued inputs into binary `{0, 1}` spike trains over `T` timesteps. The spike-time dimension is added to the tensor shape: `(B, N, D) → (B, T, N, D)`.
At each timestep `t`:

- `u[t] = beta * u[t-1] + x` (leaky integration)
- `s[t] = H(u[t] - theta)` (Heaviside step at the threshold)
- `u[t] = u[t] - s[t] * theta` (subtract the threshold from neurons that fired)

The backward pass uses a surrogate gradient for the non-differentiable Heaviside step, typically a triangular pulse centered at the threshold.
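A minimal sketch of this loop in plain PyTorch (standalone illustration, not the library's internals), showing how the membrane potential integrates, fires, and soft-resets across `T` timesteps:

```python
import torch

def lif_forward(x: torch.Tensor, T: int = 4, beta: float = 0.5,
                theta: float = 1.0) -> torch.Tensor:
    """Unroll LIF dynamics over T timesteps.

    x: (B, N, D) continuous input, re-injected at every step.
    Returns: (B, T, N, D) binary spike train.
    """
    u = torch.zeros_like(x)            # membrane potential starts at rest
    spikes = []
    for _ in range(T):
        u = beta * u + x               # leaky integration
        s = (u >= theta).to(x.dtype)   # Heaviside step at the threshold
        u = u - s * theta              # subtract theta where a spike fired
        spikes.append(s)
    return torch.stack(spikes, dim=1)  # new T dimension at dim=1
```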
Usage:

```python
from curry_train.primitives import LIFNeuron

lif = LIFNeuron(
    d_model=2048,
    T=4,               # number of spike timesteps
    init_theta=1.0,    # firing threshold (learnable, per-feature)
    init_beta=0.5,     # decay before sigmoid (learnable, per-feature)
    gamma=1.0,         # surrogate gradient width
    backend="custom",  # or "spikingjelly"
)

# Input:  x shape (B, N, D)
# Output: s shape (B, T, N, D), binary spikes
spikes = lif(x)
```
This is the only V1 primitive that changes tensor rank: it adds a `T` dimension.

- Input: `(B, N, D)`
- Output: `(B, T, N, D)`

Downstream layers (attention, MLP) must either:

- accept `(B, T, N, D)` directly (channel-aware spike processing), or
- reduce over `T` first (e.g., spike-rate mean over `T` → `(B, N, D)`) and then operate continuously, as in the sketch below.

The model's documentation should pin the rank contract at every boundary.
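A minimal sketch of the second option, assuming a rate decoder (`rate_decode` is a hypothetical helper, not part of the library):

```python
import torch

def rate_decode(spikes: torch.Tensor) -> torch.Tensor:
    """Collapse the spike-time dimension to a firing rate.

    spikes: (B, T, N, D) binary spike train.
    Returns: (B, N, D) rates in [0, 1], usable by ordinary layers.
    """
    return spikes.mean(dim=1)

# e.g. rates = rate_decode(lif(x))  # (B, T, N, D) -> (B, N, D)
```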
V1 should support two backends, selectable at construction:

- `"custom"`: a `torch.autograd.Function` implementing the surrogate gradient. Slower, fully transparent, easy to debug; see the sketch below.
- `"spikingjelly"`: SpikingJelly's `ParametricLIFNode` with an optional CuPy backend. Faster, but requires `pip install spikingjelly` (and CuPy for the fast path).

The reference implementation in `csla_mt/model/spiking_neuron.py` shows both.
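A minimal sketch of the `"custom"` path with a triangular surrogate. This is an illustration, not the reference code; it treats `theta` and `gamma` as fixed scalars, whereas the library makes them learnable per-feature:

```python
import torch

class SpikeFunction(torch.autograd.Function):
    """Heaviside in the forward pass, triangular surrogate in the backward."""

    @staticmethod
    def forward(ctx, u, theta, gamma):
        ctx.save_for_backward(u)
        ctx.theta, ctx.gamma = theta, gamma
        return (u >= theta).to(u.dtype)   # hard binary spike

    @staticmethod
    def backward(ctx, grad_output):
        (u,) = ctx.saved_tensors
        # g(u) = max(0, 1 - |u - theta| / gamma) / gamma
        surrogate = torch.clamp(
            1 - (u - ctx.theta).abs() / ctx.gamma, min=0
        ) / ctx.gamma
        # No gradients for theta/gamma in this fixed-scalar sketch.
        return grad_output * surrogate, None, None

# s = SpikeFunction.apply(u, 1.0, 1.0)
```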
Common surrogate choices:

- Triangular: `g(u) = max(0, 1 - |u - theta| / gamma) / gamma`. Piecewise linear, centered at the threshold.
- Sigmoid: `g(u) = sigmoid(alpha * (u - theta)) * (1 - sigmoid(alpha * (u - theta)))`. Smooth.
- ATan: `g(u) = (1/pi) * 1 / (1 + (alpha * (u - theta))^2)`. Most common in modern SNN papers.

`gamma` (or `alpha`) controls the width: smaller = sharper, larger = smoother. The default of 1.0 is a reasonable starting point; tune per task.
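The three formulas as plain functions, for side-by-side comparison (a sketch; the function names are mine, not the library's):

```python
import math
import torch

def triangular(u, theta=1.0, gamma=1.0):
    # max(0, 1 - |u - theta| / gamma) / gamma
    return torch.clamp(1 - (u - theta).abs() / gamma, min=0) / gamma

def sigmoid_grad(u, theta=1.0, alpha=1.0):
    # sigmoid(alpha*(u - theta)) * (1 - sigmoid(alpha*(u - theta)))
    s = torch.sigmoid(alpha * (u - theta))
    return s * (1 - s)

def atan_grad(u, theta=1.0, alpha=1.0):
    # (1/pi) * 1 / (1 + (alpha*(u - theta))^2)
    return (1 / math.pi) / (1 + (alpha * (u - theta)) ** 2)
```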
Gotchas:

- Call `node.reset()` between independent forward passes (SpikingJelly backend). Forgetting this carries membrane state across batches: a silent bug. See the sketch after this list.
- The `T` dimension makes attention memory `O(B * T * H * N²)`; `T=4` is a 4× memory tax on attention.
- V1: stub at `template/curry_train/primitives/lif_neuron.py`. Reference implementation at `/home/yanggl/code/autoresearch/csla_mt/model/spiking_neuron.py` (the user's own working code).
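A minimal sketch of the reset discipline, assuming the SpikingJelly backend (training-loop fragment; `model`, `loader`, `criterion`, and `optimizer` are assumed to be defined elsewhere):

```python
from spikingjelly.activation_based import functional

for x, y in loader:
    out = model(x)                 # stateful LIF nodes accumulate u internally
    loss = criterion(out, y)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    functional.reset_net(model)    # clears state on every stateful node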
Related skill: `skills/primitive-rmsnorm`, for non-spiking layers; SNNs often use BatchNorm instead.