🚨 EXECUTION NOTICE FOR CLAUDE
When you invoke this command via SlashCommand, the system returns THESE INSTRUCTIONS below.
YOU are the executor. This is NOT an autonomous subprocess.
- ✅ The phases below are YOUR execution checklist
- ✅ YOU must run each phase immediately using tools (Bash, Read, Write, Edit, TodoWrite)
- ✅ Complete ALL phases before considering this command done
- ❌ DON'T wait for "the command to complete" - YOU complete it by executing the phases
- ❌ DON'T treat this as status output - it IS your instruction set
Immediately after SlashCommand returns, start executing Phase 0, then Phase 1, etc.
See @CLAUDE.md section "SlashCommand Execution - YOU Are The Executor" for detailed explanation.
Available Skills
This command has access to the following skills from the ml-training plugin:
- cloud-gpu-configs: Platform-specific configuration templates for Modal, Lambda Labs, and RunPod with GPU selection guides
- cost-calculator: Cost estimation scripts and tools for calculating GPU hours, training costs, and inference pricing across Modal, Lambda Labs, and RunPod platforms. Use when estimating ML training costs, comparing platform pricing, calculating GPU hours, budgeting for ML projects, or when user mentions cost estimation, pricing comparison, GPU budgeting, training cost analysis, or inference cost optimization.
- example-projects: Provides three production-ready ML training examples (sentiment classification, text generation, RedAI trade classifier) with complete training scripts, deployment configs, and datasets. Use when user needs example projects, reference implementations, starter templates, or wants to see working code for sentiment analysis, text generation, or financial trade classification.
- integration-helpers: Integration templates for FastAPI endpoints, Next.js UI components, and Supabase schemas for ML model deployment. Use when deploying ML models, creating inference APIs, building ML prediction UIs, designing ML database schemas, integrating trained models with applications, or when user mentions FastAPI ML endpoints, prediction forms, model serving, ML API deployment, inference integration, or production ML deployment.
- monitoring-dashboard: Training monitoring dashboard setup with TensorBoard and Weights & Biases (WandB) including real-time metrics tracking, experiment comparison, hyperparameter visualization, and integration patterns. Use when setting up training monitoring, tracking experiments, visualizing metrics, comparing model runs, or when user mentions TensorBoard, WandB, training metrics, experiment tracking, or monitoring dashboard.
- training-patterns: Templates and patterns for common ML training scenarios including text classification, text generation, fine-tuning, and PEFT/LoRA. Provides ready-to-use training configurations, dataset preparation scripts, and complete training pipelines. Use when building ML training pipelines, fine-tuning models, implementing classification or generation tasks, setting up PEFT/LoRA training, or when user mentions model training, fine-tuning, classification, generation, or parameter-efficient tuning.
- validation-scripts: Data validation and pipeline testing utilities for ML training projects. Validates datasets, model checkpoints, training pipelines, and dependencies. Use when validating training data, checking model outputs, testing ML pipelines, verifying dependencies, debugging training failures, or ensuring data quality before training.
To use a skill:
!{skill skill-name}
Use skills when you need:
- Domain-specific templates and examples
- Validation scripts and automation
- Best practices and patterns
- Configuration generators
Skills provide pre-built resources to accelerate your work.
Security Requirements
CRITICAL: All generated files must follow security rules:
@docs/security/SECURITY-RULES.md
Key requirements:
- Never hardcode API keys or secrets
- Use placeholders: your_service_key_here
- Protect .env files with .gitignore
- Create .env.example with placeholders only
- Document key acquisition for users
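For instance, generated scripts can read credentials from the environment instead of embedding them. A minimal sketch (WANDB_API_KEY is an illustrative variable name; substitute whatever key the target service expects):

```python
import os

# Read the key from the environment; never hardcode it in source.
# WANDB_API_KEY is illustrative - use the variable your service expects.
api_key = os.environ.get("WANDB_API_KEY")
if api_key is None:
    raise RuntimeError(
        "WANDB_API_KEY is not set. Copy .env.example to .env and fill in your key."
    )
```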
Arguments: $ARGUMENTS
Goal: Generate production-ready training configuration including TrainingArguments, hyperparameters, and train.py script for the specified training type.
Core Principles:
- Detect existing project structure before generating configs
- Use appropriate defaults based on training type
- Generate framework-agnostic configurations when possible
- Validate compatibility with detected ML frameworks
Phase 1: Discovery
Goal: Understand project context and training requirements
Actions:
- Parse $ARGUMENTS to extract training type (classification/generation/fine-tuning)
- Detect ML framework in use (PyTorch, TensorFlow, JAX)
- Check for existing training scripts or configurations
- Example: !{bash ls train.py training_config.yaml config/ 2>/dev/null}
- Identify project structure and data locations
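A minimal framework-detection sketch (assumes that an installed package is a sufficient proxy for the framework in use):

```python
import importlib.util

# Map framework names to their top-level import names.
FRAMEWORKS = {"PyTorch": "torch", "TensorFlow": "tensorflow", "JAX": "jax"}

detected = [
    name for name, module in FRAMEWORKS.items()
    if importlib.util.find_spec(module) is not None
]
print(f"Detected frameworks: {detected or 'none'}")
```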
Phase 2: Validation
Goal: Verify training type and environment compatibility
Actions:
- Validate training type is one of: classification, generation, fine-tuning
- Check if required dependencies are available
- Example: !{bash python -c "import transformers; import torch; import datasets" 2>&1}
- Load existing configs if present to understand patterns
- Identify GPU/CPU availability for hardware-specific settings
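- Example: !{bash python -c "import torch; print('cuda' if torch.cuda.is_available() else 'cpu')"}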
Phase 3: Configuration Design
Goal: Architect training configuration with optimal hyperparameters
Actions:
Task(description="Design training configuration", subagent_type="training-architect", prompt="You are the training-architect agent. Create a comprehensive training configuration for $ARGUMENTS.
Context:
- Training type: Extract from $ARGUMENTS (classification/generation/fine-tuning)
- Detected framework: Based on discovery phase findings
- Project structure: Based on codebase analysis
Requirements:
- Generate TrainingArguments configuration with appropriate hyperparameters
- Set learning rate, batch size, epochs based on training type
- Configure evaluation strategy and checkpointing
- Include mixed precision training settings (fp16/bf16)
- Set up gradient accumulation if needed
- Configure warmup steps and scheduler
- Add logging and early stopping parameters
Training Type Specific Settings:
- Classification: CrossEntropyLoss, accuracy metrics, class weights
- Generation: Language modeling loss, perplexity metrics, generation parameters
- Fine-tuning: LoRA/QLoRA configs, adapter settings, freeze layers
Deliverables:
- training_config.yaml - Complete TrainingArguments configuration
- train.py - Training script with data loading, model setup, trainer initialization
- hyperparameters.json - Searchable hyperparameter ranges for tuning
- README-TRAINING.md - Documentation on running training and tuning parameters
Follow best practices for reproducibility, mixed precision, and gradient checkpointing.")
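As a reference point for what the agent should produce, a minimal TrainingArguments sketch for the classification case (all values are illustrative starting defaults, not tuned recommendations):

```python
from transformers import TrainingArguments

# Illustrative defaults for a classification run; tune per dataset.
args = TrainingArguments(
    output_dir="checkpoints",
    learning_rate=2e-5,               # common starting point for fine-tuning encoders
    per_device_train_batch_size=16,
    num_train_epochs=3,
    warmup_ratio=0.1,
    lr_scheduler_type="linear",
    eval_strategy="epoch",            # "evaluation_strategy" in older transformers versions
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="accuracy", # requires a compute_metrics fn that reports accuracy
    fp16=True,                        # mixed precision; prefer bf16=True on Ampere+ GPUs
    logging_steps=50,
    seed=42,                          # reproducibility
)
```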
Phase 4: File Generation
Goal: Write configuration files to project
Actions:
- Write training_config.yaml to project root or config/ directory
- Write train.py script with proper imports and setup
- Write hyperparameters.json for reference
- Create README-TRAINING.md with usage instructions
- Ensure all files follow project conventions
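The generated train.py might take roughly this shape; a sketch only, where the config keys (model_name, num_labels, dataset_name, training_arguments) and the validation split are assumptions about what the generated YAML contains:

```python
import yaml
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Load the generated YAML config (filename from the deliverables above).
with open("training_config.yaml") as f:
    config = yaml.safe_load(f)

tokenizer = AutoTokenizer.from_pretrained(config["model_name"])
model = AutoModelForSequenceClassification.from_pretrained(
    config["model_name"], num_labels=config["num_labels"]
)

dataset = load_dataset(config["dataset_name"])
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(**config["training_arguments"]),
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],  # assumes a 'validation' split exists
)
trainer.train()
```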
Phase 5: Verification
Goal: Validate generated configurations
Actions:
- Check that all required files were created
- Validate YAML/JSON syntax
- Example: !{bash python -c "import yaml; yaml.safe_load(open('training_config.yaml'))"}
- Verify train.py has no syntax errors
- Example: !{bash python -m py_compile train.py}
- Check that hyperparameters are in valid ranges
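A minimal range-check sketch (key names and bounds are illustrative; match them to whatever hyperparameters.json actually contains):

```python
import json

with open("hyperparameters.json") as f:
    hp = json.load(f)

# Illustrative sanity bounds; adjust to the generated ranges.
assert 0 < hp["learning_rate"] <= 1e-2, "learning rate outside sane range"
assert hp["per_device_train_batch_size"] >= 1, "batch size must be positive"
assert hp["num_train_epochs"] >= 1, "need at least one epoch"
print("hyperparameters.json passed basic range checks")
```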
Phase 6: Summary
Goal: Report generated configuration and next steps
Actions:
- Summarize created files and their locations
- Display key hyperparameters set for the training type
- Provide command to start training
- Suggest next steps:
- Review and adjust hyperparameters for your dataset
- Prepare dataset using /ml-training:prepare-dataset command
- Run training with: python train.py --config training_config.yaml
- Monitor training with TensorBoard or wandb
- Experiment with hyperparameter tuning ranges provided