From the `jeremylongshore/claude-code-plugins-plus-skills` marketplace.
Configures distributed training setups for ML models with PyTorch, TensorFlow, or scikit-learn. Generates code, configs, and best practices for multi-node training tasks.
Install with:

```shell
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin framecraft
```

This skill is limited to a restricted set of tools.
This skill provides automated assistance for distributed training setup tasks within the ML Training domain.
Guides distributed training across multiple GPUs or nodes for large models: DDP, FSDP, DeepSpeed ZeRO, model and data parallelism, and gradient checkpointing. Advice is grounded in the patterns/sharp_edges/validations reference files for creation, diagnosis, and review.
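As a minimal illustration of the DDP pattern mentioned above, here is a hedged single-file sketch. The model, data, and hyperparameters are placeholders; the environment-variable defaults only exist so the file also runs as a single process, since `torchrun` normally sets them per worker.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def train_step():
    # torchrun sets RANK/WORLD_SIZE/MASTER_* per process; these defaults
    # let the sketch also run standalone as a single-process job.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")
    dist.init_process_group(backend="gloo")  # use "nccl" on GPU nodes

    model = DDP(nn.Linear(10, 1))            # placeholder model
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    x, y = torch.randn(8, 10), torch.randn(8, 1)  # placeholder batch
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()                          # gradients all-reduce across ranks here
    opt.step()

    dist.destroy_process_group()
    return loss.item()

if __name__ == "__main__":
    print(train_step())
```

A multi-worker launch would look like `torchrun --nproc_per_node=4 train.py`; each process runs the same script and DDP keeps the replicas in sync during `backward()`.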
Builds TensorFlow model trainers with guidance on data preparation, training, hyperparameter tuning, and experiment tracking. Activates on TensorFlow-trainer-related phrases.
Orchestrates end-to-end MLOps pipelines, from data preparation through model training and validation to deployment and monitoring. Use it for ML workflow automation, DAG orchestration, and productionizing models.
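To make the stage ordering concrete, here is a toy sketch of such a pipeline. All stage implementations are hypothetical stand-ins (the "model" is just a mean); a real pipeline would hand these stages to an orchestrator such as Airflow or Kubeflow.

```python
# Hypothetical pipeline stages: prepare -> train -> validate -> deploy.
def prepare(raw):
    # Toy normalization standing in for real data preparation.
    peak = max(raw)
    return [x / peak for x in raw]

def train(data):
    # Toy "model": the mean of the prepared data.
    return sum(data) / len(data)

def validate(model, data):
    # Toy validation gate: every point within 1.0 of the model's prediction.
    return all(abs(x - model) <= 1.0 for x in data)

def deploy(model):
    # Stand-in for publishing the model behind a serving endpoint.
    return {"endpoint": "/predict", "model": model}

def run_pipeline(raw):
    data = prepare(raw)
    model = train(data)
    if not validate(model, data):
        raise ValueError("validation failed; not deploying")
    return deploy(model)
```

The key design point the skill automates is the validation gate between training and deployment: a model that fails validation never reaches the serving stage.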
This skill activates automatically on matching requests.
Example: Basic Usage
Request: "Help me with distributed training setup"
Result: Provides step-by-step guidance and generates appropriate configurations
| Error | Cause | Solution |
|---|---|---|
| Configuration invalid | Missing required fields | Check documentation for required parameters |
| Tool not found | Dependency not installed | Install required tools per prerequisites |
| Permission denied | Insufficient access | Verify credentials and permissions |
Part of the ML Training skill category. Tags: ml, training, pytorch, tensorflow, sklearn