By BBuf
Runs iterative development loops with AI-driven planning and independent Codex review, transforming drafts into structured plans and refining code through continuous quality checks and gated feedback cycles.
npx claudepluginhub bbuf/kernel-pilot --plugin humanize
Cancel active RLCR loop
Generate a repo-grounded idea draft via directed-swarm exploration
Generate implementation plan from draft document
Refine an annotated implementation plan and generate a QA ledger
Start iterative loop with Codex review
Selects required BitLesson entries for a specific sub-task. Use before execution for every task or sub-task.
Checks if a draft document is relevant to the current repository. Use when validating draft content for gen-plan command.
Checks plan relevance and compliance before RLCR loop. Use when validating plan files for start-rlcr-loop command.
Analyzes a plan and generates multiple-choice technical comprehension questions to verify user understanding before RLCR loop. Use when validating user readiness for start-rlcr-loop command.
Consult Codex as an independent expert. Sends a question or task to codex exec and returns the response.
Consult Gemini as an independent expert with deep web research. Sends a question or task to Gemini CLI and returns a research-backed response.
Generate a structured implementation plan from a draft document. Validates input, checks relevance, analyzes for issues, and generates a complete plan.md with acceptance criteria.
Run an autonomous Humanize Kernel Agent Loop for GPU kernel optimization: plan/refine K/R/W into task-acceptance pairs, create a clean standalone repo, research with kernel-knowledge, iterate with benchmark/profile evidence, autotune across the workload distribution, emit kernels/dispatcher/tuning decisions, maintain ledgers, and start RLCR.
Refine an annotated implementation plan into a comment-free plan and a QA ledger while preserving the gen-plan schema.
Executes bash commands
Hook triggers when Bash tool is used
Modifies files
Hook triggers on file write and edit operations
GPU kernel knowledge-base, benchmarking, profiling, and optimization-loop skills for CUDA, Triton, CuTe DSL, CUTLASS, PyTorch, and Nsight Compute workflows.
Autonomous improvement engine for Claude Code. Runs an unbounded modify-verify-keep/discard loop against any mechanical metric. 10 subcommands: plan, debug, fix, security, ship, scenario, predict, learn, and reason.
Autonomous experiment loop that optimizes any file by a measurable metric. 5 slash commands, 8 evaluators, configurable loop intervals (10min to monthly).
Behavioral guidelines to reduce common LLM coding mistakes, derived from Andrej Karpathy's observations on LLM coding pitfalls
Humanize - An iterative development plugin that uses Codex to review Claude's work. Creates a feedback loop where Claude implements plans and Codex independently reviews progress, ensuring quality through continuous refinement.
Iterative artifact refinement - hone any artifact or workspace over multiple rounds using criteria-driven judge feedback, runnable evaluators, and focused directional improvements
Agent-ready playbooks for LLM serving benchmarks, capacity planning, torch-profiler triage, pipeline analysis, compute simulation, SGLang/vLLM SOTA Humanize loops, human code review, production incident triage, and model PR-history dossiers.
No model invocation
Executes directly as bash, bypassing the AI model
An autonomous Humanize-powered GPU kernel optimization loop with a local PR-driven CUDA knowledge base, Nsight Compute report skills, and clean standalone benchmark repos.
KernelPilot is for serious CUDA kernel tuning runs where the important facts are easy to lose: which upstream PR inspired a candidate, which shape regressed, what Nsight Compute actually said, which evidence changed the next edit, and whether the candidate belongs in a framework repo or a clean experiment.
The project packages three cooperating skills:
| Skill | Role |
|---|---|
| `humanize-kernel-agent-loop` | Turns kernel definition K, reference R, and workload distribution W into task-acceptance pairs, a standalone optimization repo, autonomous research/iteration/autotuning, correctness tests, benchmarks, ledgers, dispatcher, tuning decisions, and review-gated iteration. |
| `kernel-knowledge` | A local PR-diff-first CUDA kernel evidence corpus. It routes by architecture, repo, topic, technique, profile symptom, operator, and DSL, then opens PR diffs, source snapshots, wiki pages, docs, and blogs as needed. |
| `ncu-report` | Converts Nsight Compute reports into a reproducible profile digest: metrics, source counters, PM sampling, PTX/SASS hotspots, bottleneck diagnosis, and exactly one next kernel edit. |
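To make the "task-acceptance pairs" idea concrete, here is a minimal Python sketch of how K, R, and W might be turned into a plan. All names (`Shape`, `PlanItem`, `Plan`, `build_plan`) and the one-task-per-shape policy are assumptions made for illustration; they are not the plugin's actual schema or planning logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Shape:
    """One GEMM problem size in the workload distribution (hypothetical type)."""
    m: int
    n: int
    k: int


@dataclass
class PlanItem:
    """A task t_i paired with its acceptance criterion ac_i."""
    task: str
    acceptance: str


@dataclass
class Plan:
    kernel: str                  # K: what is being optimized
    reference: str               # R: what correctness is checked against
    workload: Dict[Shape, float]  # W: shape -> relative weight
    items: List[PlanItem] = field(default_factory=list)


def build_plan(kernel: str, reference: str, workload: Dict[Shape, float]) -> Plan:
    """Emit one task/acceptance pair per shape, heaviest shapes first.

    This ordering is an illustrative assumption, not the plugin's behavior.
    """
    plan = Plan(kernel, reference, workload)
    for shape in sorted(workload, key=workload.get, reverse=True):
        plan.items.append(PlanItem(
            task=f"optimize {kernel} for M={shape.m}, N={shape.n}, K={shape.k}",
            acceptance=f"matches {reference} and beats the baseline on this shape",
        ))
    return plan


plan = build_plan(
    "gemm_fp16_bias",
    "reference GEMM",
    {Shape(128, 2048, 2048): 0.2, Shape(64, 2048, 2048): 0.8},
)
for item in plan.items:
    print(item.task, "->", item.acceptance)
```

Keeping each task next to its acceptance criterion means a reviewer can check evidence against a specific `ac_i` rather than against a general goal.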
Together they make an optimization loop that can work from a simple request:
[$humanize-kernel-agent-loop] Optimize SGLang's GEMM path for M=64, N=2048, K=2048, fp16, bias=true, and beat the current SGLang baseline by at least 10%.
The loop decides how to plan, when to query knowledge, what to profile, how to record lineage, how to scan the workload distribution, and when to ask the Humanize review gate whether another round is needed. The human should specify the target when it is ambiguous; the loop owns the rest.
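The review gate in that request can be pictured as a check on recorded evidence. The sketch below is an assumption-laden illustration: `Evidence`, `review`, and `run_rounds` are hypothetical names, and the 1.10 threshold simply encodes the "at least 10%" target from the example request. It is not how the Humanize gate is implemented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Evidence:
    """Measurements recorded for one candidate kernel (hypothetical type)."""
    correct: bool        # did the output match the reference?
    baseline_ms: float   # baseline latency
    candidate_ms: float  # candidate latency


def review(evidence: Evidence, min_speedup: float = 1.10) -> Tuple[bool, str]:
    """Return (accepted, feedback) for one round of evidence."""
    if not evidence.correct:
        return False, "blocked: output does not match the reference"
    speedup = evidence.baseline_ms / evidence.candidate_ms
    if speedup < min_speedup:
        return False, f"blocked: speedup {speedup:.2f}x is below {min_speedup:.2f}x"
    return True, f"accepted: speedup {speedup:.2f}x"


def run_rounds(candidates: List[Evidence], max_rounds: int = 5) -> List[Tuple[int, str]]:
    """Review candidates in order and stop at the first accepted one."""
    history = []
    for round_no, evidence in enumerate(candidates[:max_rounds], start=1):
        accepted, feedback = review(evidence)
        history.append((round_no, feedback))
        if accepted:
            break
    return history


for round_no, feedback in run_rounds([
    Evidence(correct=False, baseline_ms=1.0, candidate_ms=0.5),
    Evidence(correct=True, baseline_ms=1.0, candidate_ms=0.95),
    Evidence(correct=True, baseline_ms=1.0, candidate_ms=0.8),
]):
    print(round_no, feedback)
```

Correctness is checked before speed, so a fast but wrong candidate is always blocked.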
knowledge/evidence/pull-bundles/.
ncu-report is worth running, then uses it to move from vague labels like "memory-bound" toward measured bottlenecks and one concrete next edit.

flowchart LR
K[Kernel definition K] --> P[Plan P = task and AC pairs]
R[Correctness reference R] --> P
W[Workload distribution W] --> P
P --> S[Clean standalone repo]
subgraph R0[Stage 1: Research]
KW[kernel-knowledge / KernelWiki]
B[Baseline and repo inspection]
RD[Research digest and recipes]
KW --> RD
B --> RD
end
subgraph I0[Stage 2: Iterate]
T[Writer executes task t_i]
E[Inspect, edit, compile, test, benchmark, profile]
V{Reviewer checks evidence vs ac_i}
T --> E --> V
V -->|blocked feedback| T
end
subgraph A0[Stage 3: Autotune]
PM[Performance map over W]
D[Shape-aware dispatcher]
TD[Tuning decisions]
PM --> D --> TD
end