From lllllllama-ai-paper-reproduction-skill
Diagnoses deep learning training failures like tracebacks, CUDA OOM, checkpoint loads, shape mismatches, NaN loss with conservative root-cause analysis before patching.
npx claudepluginhub lllllllama/ai-research-workflow-skillsThis skill uses the workspace's default tool permissions.
- The user provides a traceback, terminal error, or concrete training or inference failure symptom.
Diagnoses ML/AI failures like OOM, NaN, divergence, crashes, bad throughput, wrong outputs, and dependency conflicts using grounded framework docs and citations.
Provides systematic debugging framework for root cause analysis after 2+ failed fixes, complex failures, intermittent bugs, and circular debugging.
Performs root cause analysis for bugs by tracing errors through code, analyzing stack traces, forming and testing hypotheses, then hands off to fix. Auto-triggers on stack traces.
Share bugs, ideas, or general feedback.
debug_outputs/DIAGNOSIS.mddebug_outputs/PATCH_PLAN.mddebug_outputs/status.jsonUse references/debug-policy.md and the shared references/research-pitfall-checklist.md.