Enzyme.jl provides LLVM-level automatic differentiation for Julia, enabling high-performance gradient computation for both CPU and GPU code.
Type annotations control how arguments are treated during differentiation:
| Annotation | Description | Usage |
|---|---|---|
| Const(x) | Constant, not differentiated | Parameters, hyperparameters |
| Active(x) | Scalar to differentiate (reverse mode only) | Scalar inputs |
| Duplicated(x, ∂x) | Mutable with shadow accumulator | Arrays, mutable structs |
| DuplicatedNoNeed(x, ∂x) | Like Duplicated, may skip primal | Performance optimization |
| BatchDuplicated(x, ∂xs) | Batched shadows (tuple) | Multiple derivatives at once (sketch below) |
| MixedDuplicated(x, ∂x) | Mixed active/duplicated data | Custom rules with mixed types |
using Enzyme
# Active for scalars (reverse mode)
f(x) = x^2
autodiff(Reverse, f, Active, Active(3.0)) # Returns ((6.0,),)
# Duplicated for arrays
A = [1.0, 2.0, 3.0]
dA = zeros(3)
g(A) = sum(A .^ 2)
autodiff(Reverse, g, Active, Duplicated(A, dA))
# dA now contains [2.0, 4.0, 6.0]
# Const for non-differentiated arguments
h(x, c) = c * x^2
autodiff(Reverse, h, Active, Active(2.0), Const(3.0)) # Only differentiates x
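The table above also lists BatchDuplicated; here is a minimal sketch of propagating several forward-mode tangents in one call (the function and seed directions are illustrative):

```julia
using Enzyme

g(x) = sum(x .^ 2)

x = [1.0, 2.0, 3.0]
seeds = ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])   # two tangent directions

derivs = autodiff(Forward, g, BatchDuplicated(x, seeds))
# derivs[1] holds one directional derivative per seed: 2.0 along e1, 4.0 along e2
```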
| Mode | Direction | Returns | Use Case |
|---|---|---|---|
| Forward | Tangent propagation | Derivative | Single input, many outputs |
| ForwardWithPrimal | Forward + primal | (derivative, primal) | Need both values |
| Reverse | Adjoint propagation | Gradient tuple | Many inputs, scalar output |
| ReverseWithPrimal | Reverse + primal | (gradients, primal) | Need both values |
| ReverseSplitWithPrimal | Separated passes | (forward_fn, reverse_fn) | Custom control flow |
# Forward mode: use Duplicated, not Active
autodiff(Forward, x -> x^2, Duplicated(3.0, 1.0)) # Returns (6.0,)
# Forward with primal
autodiff(ForwardWithPrimal, x -> x^2, Duplicated(3.0, 1.0)) # Returns (6.0, 9.0): derivative, then primal
# Reverse mode: scalar outputs, use Active
autodiff(Reverse, x -> x^2, Active, Active(3.0)) # Returns ((6.0,),)
Primary differentiation interface:
autodiff(mode, func, return_annotation, arg_annotations...)
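As a quick orientation, a minimal sketch mapping each slot of the signature onto a concrete call (the two-argument function is illustrative):

```julia
using Enzyme

f(x, c) = c * x^2

#                 mode     func  return_annotation  arg_annotations...
result = autodiff(Reverse, f,    Active,            Active(2.0), Const(3.0))
# result[1][1] == 12.0  (∂f/∂x at x = 2 with c held constant)
```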
Returns compiled forward/reverse thunks for repeated use:
# Split mode returns separate forward and reverse functions
forward, reverse = autodiff_thunk(
ReverseSplitWithPrimal,
Const{typeof(f)},
Active,
Duplicated{typeof(A)},
Active{typeof(v)}
)
# Forward pass returns (tape, primal, shadow)
tape, primal, shadow = forward(Const(f), Duplicated(A, dA), Active(v))
# Reverse pass uses tape
reverse(Const(f), Duplicated(A, dA), Active(v), 1.0, tape)
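Because the thunks are compiled once for a given signature, they can be reused across many calls. A minimal sketch, assuming illustrative stand-ins for f, A, and v (note that shadows accumulate and must be reset between independent evaluations):

```julia
using Enzyme

f(A, v) = sum(A) * v

A  = [1.0, 2.0, 3.0]
dA = zero(A)

fwd_thunk, rev_thunk = autodiff_thunk(
    ReverseSplitWithPrimal,
    Const{typeof(f)},
    Active,
    Duplicated{typeof(A)},
    Active{Float64}
)

for v in (0.5, 1.0, 2.0)
    fill!(dA, 0.0)   # reset the shadow before each new gradient
    tape, primal, shadow = fwd_thunk(Const(f), Duplicated(A, dA), Active(v))
    _, dv = rev_thunk(Const(f), Duplicated(A, dA), Active(v), 1.0, tape)[1]
    # dA now holds ∂f/∂A == fill(v, 3); dv == sum(A) == 6.0
end
```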
Enzyme operates at the LLVM IR level, performing activity analysis on compiled code to decide which values need derivatives. A few global switches are exposed through Enzyme.API:
# Enzyme uses LLVM-level activity analysis
# to determine which values need differentiation
using Enzyme.API
API.typeWarning!(false) # Suppress type warnings
API.strictAliasing!(true) # Enable strict aliasing optimizations
Define custom derivatives when automatic differentiation is insufficient:
using EnzymeRules
using EnzymeCore
# Custom forward rule
function EnzymeRules.forward(
::Const{typeof(my_func)},
RT::Type{<:Union{Duplicated, DuplicatedNoNeed}},
x::Duplicated
)
primal = my_func(x.val)
derivative = custom_derivative(x.val) * x.dval
return Duplicated(primal, derivative)
end
# Custom reverse rule: augmented_primal + reverse
function EnzymeRules.augmented_primal(
config,
::Const{typeof(my_func)},
RT::Type{<:Active},
x::Active
)
primal = my_func(x.val)
tape = (x.val,) # Store for reverse pass
return AugmentedReturn(primal, nothing, tape)
end
function EnzymeRules.reverse(
config,
::Const{typeof(my_func)},
dret::Active,
tape,
x::Active
)
x_val = tape[1]
dx = custom_derivative(x_val) * dret.val
return (dx,)
end
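An end-to-end sketch of wiring up the placeholders above; my_func and custom_derivative are concrete stand-ins here, and the checks hold whether or not the custom rules are hit (exact rule signatures vary slightly across EnzymeRules versions):

```julia
using Enzyme

my_func(x) = x^3
custom_derivative(x) = 3x^2   # analytic derivative the rules above rely on

# With forward / augmented_primal / reverse methods defined for typeof(my_func),
# Enzyme dispatches to them instead of differentiating the function body.
@assert autodiff(Reverse, my_func, Active, Active(2.0))[1][1] ≈ 12.0
@assert autodiff(Forward, my_func, Duplicated(2.0, 1.0))[1] ≈ 12.0
```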
using Enzyme
using ChainRulesCore
# Import existing ChainRules as Enzyme rules
@import_rrule typeof(special_func) Float64
@import_frule typeof(special_func) Float64
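For context, a hedged sketch of the full flow: special_func is a hypothetical function with hand-written ChainRules definitions that Enzyme then reuses via the import macros.

```julia
using Enzyme, ChainRulesCore

special_func(x) = x^3   # hypothetical function with existing ChainRules

function ChainRulesCore.rrule(::typeof(special_func), x::Float64)
    y = special_func(x)
    pullback(ȳ) = (NoTangent(), 3x^2 * ȳ)
    return y, pullback
end

function ChainRulesCore.frule((Δself, ẋ), ::typeof(special_func), x::Float64)
    return special_func(x), 3x^2 * ẋ
end

# Reuse the ChainRules definitions instead of differentiating the body
Enzyme.@import_rrule(typeof(special_func), Float64)
Enzyme.@import_frule(typeof(special_func), Float64)

autodiff(Reverse, special_func, Active, Active(2.0))   # ((12.0,),)
```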
Differentiate GPU kernels with autodiff_deferred:
using CUDA
using Enzyme
# GPU kernel
function mul_kernel!(A, B, C)
i = threadIdx().x
C[i] = A[i] * B[i]
return nothing
end
# Differentiate within kernel
function grad_kernel!(A, dA, B, dB, C, dC)
autodiff_deferred(
Reverse,
mul_kernel!,
Const,
Duplicated(A, dA),
Duplicated(B, dB),
Duplicated(C, dC)
)
return nothing
end
# Launch differentiated kernel
A = CUDA.rand(32)
dA = CUDA.zeros(32)
B = CUDA.rand(32)
dB = CUDA.zeros(32)
C = CUDA.zeros(32)
dC = CUDA.ones(32) # Seed adjoint
@cuda threads=32 grad_kernel!(A, dA, B, dB, C, dC)
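A quick sanity check on the launch above (hedged: the expected values follow from ∂(A[i]·B[i])/∂A[i] = B[i] and ∂(A[i]·B[i])/∂B[i] = A[i], with the all-ones seed in dC):

```julia
# After the reverse pass the shadow arrays hold the accumulated adjoints
@assert Array(dA) ≈ Array(B)   # ∂C[i]/∂A[i] = B[i]
@assert Array(dB) ≈ Array(A)   # ∂C[i]/∂B[i] = A[i]
```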
using EnzymeCore, GPUCompiler
# Enzyme uses compiler_job_from_backend for GPU compilation.
# CUDA.jl installs this hook automatically when loaded; shown here for reference.
function EnzymeCore.compiler_job_from_backend(::CUDABackend, F, TT)
    mi = GPUCompiler.methodinstance(F, TT)
    return GPUCompiler.CompilerJob(mi, CUDA.compiler_config(CUDA.device()))
end
function loss(params, data)
predictions = model(params, data.x)
return sum((predictions .- data.y).^2)
end
dparams = zero(params)
autodiff(Reverse, loss, Active, Duplicated(params, dparams), Const(data))
# dparams now contains ∇loss
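A concrete, hedged instance of the same pattern; the linear model, data, and parameter values are illustrative:

```julia
using Enzyme

model(params, x) = params[1] .* x .+ params[2]   # params = [slope, intercept]

data   = (x = [1.0, 2.0, 3.0], y = [2.0, 4.0, 6.0])
params = [0.5, 0.0]

loss(params, data) = sum((model(params, data.x) .- data.y) .^ 2)

dparams = zero(params)
autodiff(Reverse, loss, Active, Duplicated(params, dparams), Const(data))

# Compare against the analytic gradient of the squared error
resid = model(params, data.x) .- data.y
@assert dparams ≈ [2 * sum(resid .* data.x), 2 * sum(resid)]
```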
function f(x)
return [x[1]^2 + x[2], x[1] * x[2]]
end
x = [2.0, 3.0]
v = [1.0, 0.0]  # Direction vector
dx = copy(v)
jvp, = autodiff(Forward, f, Duplicated(x, dx))  # jvp == J(x) * v
function f!(y, x)
y[1] = x[1]^2 + x[2]
y[2] = x[1] * x[2]
return nothing
end
x = [2.0, 3.0]
dx = zeros(2)
y = zeros(2)
dy = [1.0, 0.0] # Adjoint seed
autodiff(Reverse, f!, Const, Duplicated(y, dy), Duplicated(x, dx))
# dx now contains VJP
Three agents maximize mutual information through complementary verification:
| Agent | Role | Verifies |
|---|---|---|
| julia-gpu-kernels | Input provider | @kernel functions to differentiate |
| enzyme-autodiff | Differentiator | Correct gradient computation |
| julia-tempering | Seed provider | Reproducible differentiation |
julia-tempering ──seed──▶ julia-gpu-kernels ──kernel──▶ enzyme-autodiff
       │                                                       │
       └───────────────────────── verify ◀─────────────────────┘
using Enzyme
# Polynomial differentiation
f(x) = x^2 + 2x + 1
∂f_∂x = autodiff(Reverse, f, Active, Active(3.0))[1][1]
@assert ∂f_∂x ≈ 8.0 # 2x + 2 at x=3
using Enzyme
g(x) = exp(x) * sin(x)
derivative, primal = autodiff(ForwardWithPrimal, g, Duplicated(1.0, 1.0))
# derivative = exp(x) * (sin(x) + cos(x)) at x = 1
@assert derivative ≈ exp(1.0) * (sin(1.0) + cos(1.0))
using CUDA, Enzyme
# Kernel from julia-gpu-kernels agent
function saxpy_kernel!(Y, a, X)
i = threadIdx().x
Y[i] += a * X[i]
return nothing
end
# enzyme-autodiff differentiates
function grad_saxpy!(Y, dY, a, X, dX)
autodiff_deferred(Reverse, saxpy_kernel!,
Const,
Duplicated(Y, dY),
Active(a),
Duplicated(X, dX))
return nothing
end
# julia-tempering provides reproducible seed
seed = 42
CUDA.seed!(seed)
X = CUDA.rand(Float32, 256)
Y = CUDA.zeros(Float32, 256)
dY = CUDA.ones(Float32, 256)
dX = CUDA.zeros(Float32, 256)
@cuda threads=256 grad_saxpy!(Y, dY, 2.0f0, X, dX)
@assert all(Array(dX) .≈ 2.0f0) # ∂(aX)/∂X = a
using Enzyme, Random
# julia-tempering seed ensures reproducibility
function reproducible_test(seed::UInt64)
Random.seed!(seed)
x = randn()
f(x) = x^3 - 2x^2 + x
grad = autodiff(Reverse, f, Active, Active(x))[1][1]
# Derivative: 3x² - 4x + 1
expected = 3x^2 - 4x + 1
return (x=x, grad=grad, expected=expected, match=isapprox(grad, expected))
end
# Same seed → same results across agents
result = reproducible_test(0x7f4a3c2b1d0e9a8f)
@assert result.match
| Test | julia-gpu-kernels | enzyme-autodiff | julia-tempering |
|---|---|---|---|
| Scalar AD | - | Reverse/Forward | RNG seed |
| Array AD | - | Duplicated | Array seed |
| GPU kernel | @cuda kernel | autodiff_deferred | CUDA.seed! |
| Batched | - | BatchDuplicated | Batch seeds |
| Custom rules | Complex kernel | EnzymeRules | Deterministic tape |
# Message format between agents
struct TriadMessage
from::Symbol # :gpu_kernels, :enzyme, :tempering
to::Symbol
payload::Any
seed::UInt64 # For reproducibility
end
# Example flow
msg1 = TriadMessage(:tempering, :gpu_kernels, seed, seed)
msg2 = TriadMessage(:gpu_kernels, :enzyme, kernel_fn, seed)
msg3 = TriadMessage(:enzyme, :tempering, gradients, seed) # Verification
This skill connects to the K-Dense-AI/claude-scientific-skills ecosystem:
general: 734 citations in bib.duckdb

This skill maps to Cat# = Comod(P) as a bicomodule in the equipment structure:
Trit: 0 (ERGODIC)
Home: Prof
Poly Op: ⊗
Kan Role: Adj
Color: #26D826
The skill participates in triads satisfying:
(-1) + (0) + (+1) ≡ 0 (mod 3)
This ensures compositional coherence in the Cat# equipment structure.