Router skill directing to deployment, optimization, MLOps, and monitoring guides.
Routes ML models to production by diagnosing your concern—optimization, serving, MLOps, or monitoring—then directs you to the right specialist skill for deployment, inference speed, or production observability.
```
/plugin marketplace add tachyon-beep/skillpacks
/plugin install yzmir-ml-production@foundryside-marketplace
```

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Reference sheets:

- deployment-strategies.md
- experiment-tracking-and-versioning.md
- hardware-optimization-strategies.md
- mlops-pipeline-automation.md
- model-compression-techniques.md
- model-serving-patterns.md
- production-debugging-techniques.md
- production-monitoring-and-alerting.md
- quantization-for-inference.md
- scaling-and-load-balancing.md

This meta-skill routes you to the right production deployment skill based on your concern. Load this when you need to move ML models to production but aren't sure which specific aspect to address.
Core Principle: Production concerns fall into four categories. Identify the concern first, then route to the appropriate skill. Tools and infrastructure choices are implementation details, not routing criteria.
Load this skill when:
Don't use for: Training optimization (use training-optimization), model architecture selection (use neural-architectures), PyTorch infrastructure (use pytorch-engineering)
IMPORTANT: All reference sheets are located in the SAME DIRECTORY as this SKILL.md file.
When this skill is loaded from:
skills/using-ml-production/SKILL.md
Reference sheets like quantization-for-inference.md are at:
skills/using-ml-production/quantization-for-inference.md
NOT at:
skills/quantization-for-inference.md ← WRONG PATH
When you see a link like [quantization-for-inference.md](quantization-for-inference.md), read the file from the same directory as this SKILL.md.
**Category 1: Model Optimization**

Symptoms: "Model too slow", "inference latency high", "model too large", "need to optimize for edge", "reduce model size", "speed up inference"
When to route here:
Routes to:
Key question to ask: "Is the MODEL the bottleneck, or is it infrastructure/serving?"
**Category 2: Serving Infrastructure**

Symptoms: "How to serve model", "need API endpoint", "deploy to production", "containerize model", "scale serving", "load balancing", "traffic management"
When to route here:
Routes to:
Key distinction:
**Category 3: MLOps Tooling**

Symptoms: "Track experiments", "version models", "automate deployment", "reproducibility", "CI/CD for ML", "feature store", "model registry", "experiment management"
When to route here:
Routes to:
Key distinction:
Multi-concern: Queries like "track experiments AND automate deployment" → route to BOTH skills
**Category 4: Observability**

Symptoms: "Monitor production", "model degrading", "detect drift", "production debugging", "alert on failures", "model not working in prod", "performance issues in production"
When to route here:
Routes to:
Key distinction:
"Performance" ambiguity:
```
User query → Identify primary concern
│
├─ Is the MODEL the problem (size/speed)?
│    YES → Category 1: Model Optimization
│    NO  → continue
├─ Is it about HOW to expose/deploy the model?
│    YES → Category 2: Serving Infrastructure
│    NO  → continue
├─ Is it about workflow/process/automation?
│    YES → Category 3: MLOps Tooling
│    NO  → continue
├─ Is it about monitoring/debugging in production?
│    YES → Category 4: Observability
│    NO  → Ask clarifying question
│
└─ Ambiguous? → Ask ONE question to clarify the concern category
```
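The decision tree above can be sketched as a small routing function. The question keys and the order of checks below are illustrative assumptions made for this sketch; only the four category names come from the tree itself.

```python
def route_concern(answers):
    """Walk the four yes/no questions in order and return the first match.

    `answers` maps a question key to True/False; unanswered questions
    count as "no". Returns None when nothing matches, signalling that
    ONE clarifying question should be asked instead of guessing.
    """
    decision_order = [
        ("model_is_bottleneck", "Category 1: Model Optimization"),
        ("how_to_deploy", "Category 2: Serving Infrastructure"),
        ("workflow_automation", "Category 3: MLOps Tooling"),
        ("production_debugging", "Category 4: Observability"),
    ]
    for question, category in decision_order:
        if answers.get(question, False):
            return category
    return None  # ambiguous: ask a clarifying question

# "My model is too large for the edge device" → model is the bottleneck
print(route_concern({"model_is_bottleneck": True}))
# No signal at all → None, so we must clarify before routing
print(route_concern({}))
```

The early-return loop mirrors the tree's "YES → route, NO → continue" shape: the first category whose question is answered "yes" wins, and falling off the end is the explicit "ask a clarifying question" branch.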
Ask: "Is this inference latency (how fast predictions are), or training time?"
If training → training-optimization (wrong pack)

Ask: "What's your deployment target - cloud server, edge device, or batch processing?"
Ask: "By performance, do you mean inference speed or prediction accuracy?"
Ask: "What's the current pain point - experiment tracking, automated deployment, or both?"
Some queries span multiple categories. Route to ALL relevant skills in logical order:
| Scenario | Route Order | Why |
|---|---|---|
| "Optimize and deploy model" | 1. Optimization → 2. Serving | Optimize BEFORE deploying |
| "Deploy and monitor model" | 1. Serving → 2. Observability | Deploy BEFORE monitoring |
| "Track experiments and automate deployment" | 1. Experiment tracking → 2. Pipeline automation | Track BEFORE automating |
| "Quantize model and serve with TorchServe" | 1. Quantization → 2. Serving patterns | Optimize BEFORE serving |
| "Deploy with A/B testing and monitor" | 1. Deployment strategies → 2. Monitoring | Deploy strategy BEFORE monitoring |
Principle: Route in execution order (what needs to happen first).
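The execution-order principle can be sketched by assigning each skill an illustrative pipeline stage (optimize → track → automate → serve → monitor) and sorting multi-concern routes by stage. The stage numbers below are assumptions made for this sketch, not something the pack defines.

```python
# Illustrative pipeline stages: lower numbers happen first. The
# assignments are an assumption for this sketch, not pack-defined.
STAGE = {
    "quantization-for-inference": 1,
    "model-compression-techniques": 1,
    "experiment-tracking-and-versioning": 2,
    "mlops-pipeline-automation": 3,
    "model-serving-patterns": 4,
    "deployment-strategies": 4,
    "production-monitoring-and-alerting": 5,
}

def route_order(skills):
    """Sort the relevant skills into execution order (what happens first)."""
    return sorted(skills, key=lambda skill: STAGE[skill])

# "Quantize model and serve with TorchServe" → optimize BEFORE serving
print(route_order(["model-serving-patterns", "quantization-for-inference"]))
```

`sorted` is stable, so skills sharing a stage keep the order in which the user raised them.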
ml-production covers: General serving, quantization, deployment, monitoring (universal patterns)
llm-specialist covers: LLM-specific optimization (KV cache, prompt caching, speculative decoding, token streaming)
When to use both:
Rule of thumb: LLM-specific optimization stays in llm-specialist. General production patterns use ml-production.
Clear boundary:
"Too slow" disambiguation:
pytorch-engineering covers: Foundation (distributed training, profiling, memory management)
ml-production covers: Production-specific (serving APIs, deployment patterns, MLOps)
When to use both:
| Query | Wrong Route | Correct Route | Why |
|---|---|---|---|
| "Model too slow in production" | Immediately to quantization | Ask: inference or training? Then model vs infrastructure? | Could be serving/batching issue, not model |
| "Deploy with Kubernetes" | Defer to Kubernetes docs | Category 2: serving-patterns or deployment-strategies | Kubernetes is tool choice, not routing concern |
| "Set up MLOps" | Route to one skill | Ask about specific pain point, might be both tracking AND automation | MLOps spans multiple skills |
| "Performance issues" | Assume accuracy | Ask: speed or accuracy? | Performance is ambiguous |
| "We use TorchServe" | Skip routing | Still route to serving-patterns | Tool choice doesn't change routing |
| Excuse | Reality |
|---|---|
| "User mentioned Kubernetes, route to deployment" | Tools are implementation details. Route by concern first. |
| "Slow = optimization, route to quantization" | Slow could be infrastructure. Clarify model vs serving bottleneck. |
| "They said deploy, must be serving-patterns" | Could need serving + deployment-strategies + monitoring. Don't assume single concern. |
| "MLOps = experiment tracking" | MLOps spans tracking AND automation. Ask which pain point. |
| "Performance obviously means speed" | Could mean accuracy. Clarify inference speed vs prediction quality. |
| "They're technical, skip clarification" | Technical users still benefit from clarifying questions. |
If you catch yourself thinking ANY of these, STOP and clarify:
When in doubt: Ask ONE clarifying question. 10 seconds of clarification prevents minutes of wrong-skill loading.
| User Concern | Ask Clarifying | Route To | Also Consider |
|---|---|---|---|
| Model slow/large | Inference or training? | Optimization skills | If inference, check serving too |
| Deploy model | Target (cloud/edge/batch)? | Serving patterns | Deployment strategies for gradual rollout |
| Production monitoring | Proactive or reactive? | Monitoring OR debugging | Both if setting up + fixing issues |
| MLOps setup | Tracking or automation? | Experiment tracking AND/OR automation | Often both needed |
| Performance issues | Speed or accuracy? | Optimization OR observability | Depends on clarification |
| Scale serving | Traffic pattern? | Scaling-and-load-balancing | Serving patterns if not set up yet |
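The quick-reference table above can also be encoded as a lookup, handy when scripting routing checks. The dictionary structure and key spellings are illustrative choices; the questions and routes are copied from the table.

```python
# Quick-reference routing table as data. Wording comes from the table
# above; the keys and structure are illustrative assumptions.
QUICK_REFERENCE = {
    "model slow/large": {"ask": "Inference or training?",
                         "route": "Optimization skills"},
    "deploy model": {"ask": "Target (cloud/edge/batch)?",
                     "route": "Serving patterns"},
    "production monitoring": {"ask": "Proactive or reactive?",
                              "route": "Monitoring OR debugging"},
    "mlops setup": {"ask": "Tracking or automation?",
                    "route": "Experiment tracking AND/OR automation"},
    "performance issues": {"ask": "Speed or accuracy?",
                           "route": "Optimization OR observability"},
    "scale serving": {"ask": "Traffic pattern?",
                      "route": "Scaling-and-load-balancing"},
}

def lookup(concern):
    """Return the clarifying question and route for a known concern."""
    entry = QUICK_REFERENCE[concern.lower()]
    return f'Ask: "{entry["ask"]}" then route to: {entry["route"]}'

print(lookup("Deploy model"))
```

Note that every entry leads with a clarifying question: the table deliberately never routes without asking first.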
Query: "I trained a model, now I need to put it in production"
Routing:
Query: "My inference is slow"
Routing:
Query: "We need better ML workflows"
Routing:
Skip ml-production when:
Red flag: If model isn't trained yet, probably don't need ml-production. Finish training first.
You've routed correctly when:
After routing, load the appropriate specialist skill for detailed guidance:
See also:

- docs/plans/2025-10-30-ml-production-pack-design.md
- yzmir/ai-engineering-expert/using-ai-engineering
- llm-specialist/using-llm-specialist
- training-optimization/using-training-optimization