From the jeremylongshore/claude-code-plugins-plus-skills plugin pack.
Profiles inference latency operations in ML deployments, providing guidance, code, and configs for model serving, MLOps pipelines, monitoring, and production optimization.
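As a sketch of what latency profiling involves, the snippet below times repeated calls to a model's predict function and reports tail percentiles. `infer_fn`, the warm-up count, and the percentile choices are illustrative assumptions, not this skill's actual implementation.

```python
import time
import statistics

def profile_latency(infer_fn, inputs, warmup=3):
    """Time each call to `infer_fn` and return latency percentiles in ms."""
    # Warm-up runs keep one-time costs (JIT, cache fills) out of the stats.
    for x in inputs[:warmup]:
        infer_fn(x)

    latencies = []
    for x in inputs:
        start = time.perf_counter()
        infer_fn(x)
        latencies.append((time.perf_counter() - start) * 1000.0)

    # Tail latencies (p95/p99) matter more than the mean for serving SLOs.
    qs = statistics.quantiles(latencies, n=100)
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98]}
```

Reporting p95/p99 rather than the mean reflects common serving practice, since a few slow requests dominate user-perceived latency.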
Install via:

```shell
npx claudepluginhub jeremylongshore/claude-code-plugins-plus-skills --plugin langchain-py-pack
```

This skill is limited to using the following tools:
This skill provides automated assistance for inference latency profiler tasks within the ML Deployment domain.
Optimizes ML inference latency via model compression, distillation, pruning, quantization, caching strategies, and edge deployment patterns.
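Of the strategies listed, caching is the easiest to sketch without a model framework. The example below memoizes inference results for repeated identical requests; the feature-tuple keying and the dummy scoring function are assumptions for illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def cached_infer(features: tuple) -> float:
    # Stand-in for an expensive model call. Inputs must be hashable,
    # so convert arrays/lists to tuples before lookup.
    return sum(f * 0.5 for f in features)

# The first call runs the "model"; identical repeat requests hit the cache.
cached_infer((1.0, 2.0, 3.0))
cached_infer((1.0, 2.0, 3.0))
```

This pattern only pays off when request distributions are skewed toward repeated inputs; for unique inputs, compression or quantization of the model itself is the relevant lever.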
Guides streaming inference setup for ML deployment with step-by-step instructions, production-ready code, configurations, and best practices for model serving, MLOps pipelines, monitoring, and optimization.
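The core of streaming inference is yielding partial results instead of blocking until the full response is ready. A minimal generator-based sketch, with the word-by-word "model" standing in for an incremental decode loop:

```python
from typing import Iterator

def stream_tokens(prompt: str) -> Iterator[str]:
    """Yield partial results as they become available."""
    for word in prompt.split():
        # In a real server each yield would be flushed to the client,
        # e.g. as a server-sent event or a chunked HTTP response.
        yield word.upper()

# Consumers can render output incrementally:
# for chunk in stream_tokens("hello streaming world"): print(chunk)
```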
Guides MLOps workflows for ML model deployment: readiness checklists, serving infrastructure (FastAPI, SageMaker, Triton), inference optimization, versioning, A/B testing, drift detection, retraining, and monitoring.
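As one example from the MLOps checklist above, drift detection can be as simple as comparing live feature statistics to a training baseline. This mean-shift check is a deliberately minimal sketch; production systems typically use PSI or KS tests, and the threshold here is an assumption.

```python
import statistics

def detect_drift(baseline, live, z_threshold=3.0):
    """Flag drift when the live feature mean shifts more than
    `z_threshold` standard errors from the training baseline."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    if sigma == 0:
        return statistics.mean(live) != mu
    stderr = sigma / len(live) ** 0.5
    z = abs(statistics.mean(live) - mu) / stderr
    return z > z_threshold
```

A drift alert from a check like this is what would trigger the retraining step in the workflow described above.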
This skill activates automatically when you:
Example: Basic Usage

Request: "Help me with inference latency profiler"
Result: Provides step-by-step guidance and generates appropriate configurations.
| Error | Cause | Solution |
|---|---|---|
| Configuration invalid | Missing required fields | Check documentation for required parameters |
| Tool not found | Dependency not installed | Install required tools per prerequisites |
| Permission denied | Insufficient access | Verify credentials and permissions |
Part of the ML Deployment skill category. Tags: mlops, serving, inference, monitoring, production