From zenml-io-skills
Implements ZenML quick wins to enhance MLOps workflows. Investigates codebase and stack configuration, recommends high-priority improvements, and implements metadata logging, experiment tracking, alerts, scheduling, secrets management, tags, git hooks, HTML reports, and Model Control Plane setup. Use when: user wants to improve their ZenML setup, asks about MLOps best practices, mentions "quick wins", wants to enhance pipelines, or needs help with ZenML features like experiment tracking, alerting, scheduling, or model governance.
```bash
npx claudepluginhub joshuarweaver/cascade-ai-ml-engineering --plugin zenml-io-skills
```

This skill uses the workspace's default tool permissions.
Guides users through discovering and implementing high-impact ZenML features that take ~5 minutes each. Investigates current setup, recommends priorities, and implements chosen improvements.
```
┌─────────────────────┐
│  1. INVESTIGATE     │  Understand current stack + codebase
└──────────┬──────────┘
           ▼
┌─────────────────────┐
│  2. CONFIRM         │  ⏸️ Check understanding with user
└──────────┬──────────┘
           ▼
┌─────────────────────┐
│  3. GATHER CONTEXT  │  ⏸️ Get additional context from user
└──────────┬──────────┘
           ▼
┌─────────────────────┐
│  4. RECOMMEND       │  Prioritize quick wins based on findings
└──────────┬──────────┘
           ▼
┌─────────────────────┐
│  5. PREPARE         │  ⏸️ Verify branch setup before changes
└──────────┬──────────┘
           ▼
┌─────────────────────┐
│  6. IMPLEMENT       │  Apply selected quick wins
└──────────┬──────────┘
           ▼
┌─────────────────────┐
│  7. VERIFY          │  Confirm implementation works
└─────────────────────┘
```

⏸️ = User checkpoint (uses the AskUserQuestion tool)
Use subagents to gather information efficiently. This keeps verbose output out of the main conversation while enabling parallel investigation.
Spawn both subagents in parallel using a single Task tool call with multiple invocations:
Use the Task tool to launch BOTH agents simultaneously:

1. zenml-quick-wins:zenml-stack-investigator agent:
   - Prompt: "Investigate the ZenML stack configuration for this project.
     Run all the ZenML CLI commands to understand stacks, components,
     and recent pipeline activity. Return a structured summary."

2. zenml-quick-wins:zenml-codebase-analyzer agent:
   - Prompt: "Analyze the Python codebase for ZenML patterns and quick win
     opportunities. Search for pipeline definitions, current feature usage,
     and areas for improvement. Return a structured summary."
Both agents run concurrently and return structured summaries. Synthesize their findings before proceeding to Phase 2.
zenml-quick-wins:zenml-stack-investigator (Bash-focused, Haiku model):
- Runs `zenml status`, `zenml stack list`, `zenml stack describe`

zenml-quick-wins:zenml-codebase-analyzer (Read-only, Haiku model):
- Searches for `@pipeline` and `@step` decorators
- Looks for `log_metadata`, `tags=`, `Model(`, `HTMLString`, and `Schedule` usage

| Pattern | Indicates | Quick Win Opportunity |
|---|---|---|
| `@pipeline` without `tags=` | No tagging | #9 Tags |
| `@step` without `log_metadata` | No metadata | #1 Metadata |
| No `HTMLString` imports | No HTML reports | #11 Reports |
| No `Model()` usage | No model governance | #12 Model Control Plane |
| Hardcoded credentials | Security risk | #7 Secrets |
| No `Schedule` imports | Manual runs | #5 Scheduling |
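If you need to reproduce the analyzer's scan by hand, a minimal sketch of the same signal search (the signal strings come from the table above; the search root and file filtering are assumptions):

```python
from pathlib import Path

# Signal strings from the pattern table above; adjust the root as needed
SIGNALS = ["@pipeline", "@step", "log_metadata", "tags=", "Model(", "HTMLString", "Schedule"]

for path in Path(".").rglob("*.py"):
    text = path.read_text(errors="ignore")
    hits = [s for s in SIGNALS if s in text]
    if hits:
        print(f"{path}: {', '.join(hits)}")
```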
If subagents are unavailable, run these commands directly.
First, activate the Python environment:
```bash
# Check for a uv-managed project
if [ -f "pyproject.toml" ] && command -v uv &> /dev/null; then
    # Use "uv run zenml ..." for all commands below
    echo "Using uv - prefix commands with 'uv run'"
fi

# Or activate venv/conda
source .venv/bin/activate 2>/dev/null || source venv/bin/activate 2>/dev/null || echo "No venv found"

# Verify zenml is available
zenml version
```
Then run the investigation commands:
```bash
# Core stack info
zenml status
zenml stack list --output=json
zenml stack describe

# Component details
zenml experiment-tracker list 2>/dev/null || echo "No experiment trackers"
zenml alerter list 2>/dev/null || echo "No alerters configured"
zenml secret list 2>/dev/null || echo "No secrets or no access"
zenml code-repository list 2>/dev/null || echo "No code repos connected"
zenml model list 2>/dev/null || echo "No models registered"

# Recent runs (check for metadata usage)
zenml pipeline runs list --size=10 --output=json 2>/dev/null
```
If the ZenML MCP server is available, use it for deeper exploration: query pipelines, runs, and artifacts directly through the MCP tools.
IMPORTANT: Stop and check in with the user before proceeding.
After investigation, summarize what you've learned and use the AskUserQuestion tool to confirm your understanding. This prevents wasted effort from misunderstandings.
Present your findings clearly:
- Stack Configuration
- Pipeline Execution Patterns
- Current Feature Usage
Use AskUserQuestion with questions like:
- Stack Usage: "Is the `default` stack used for local development, or do you use a different one?"
- Pipeline Execution: "Are these pipelines meant to run on a schedule, or is manual execution intentional?"
- Feature Gaps
## 📋 Here's what I understand about your ZenML setup:
**Stacks:**
- `default` - Local stack (seems to be for development)
- `aws-production` - AWS stack with SageMaker orchestrator
**Active Stack:** `default` (LocalOrchestrator, local artifact store)
**Recent Activity:**
- 47 pipeline runs in the last month
- All appear to be manual runs (no schedules detected)
- Using `training_pipeline` and `inference_pipeline`
**Current Features:**
- ✅ Basic pipeline structure
- ❌ No metadata logging detected
- ❌ No tags on pipelines
- ❌ No Model Control Plane usage
**Questions for you:**
1. Is `default` your local dev stack and `aws-production` for production?
2. Are these pipelines meant to run on a schedule, or is manual execution intentional?
3. Any other stacks or environments I should know about?
Wait for user confirmation before proceeding to Phase 3.
IMPORTANT: Gather tacit knowledge that isn't captured in code.
Before making recommendations, ask about context that affects implementation choices. Use AskUserQuestion to learn about:
## 🔍 Before I make recommendations, some questions:
**Infrastructure:**
- Any constraints I should know about (resource limits, compliance, network)?
- Anything that's worked poorly in the past with this setup?
**Development:**
- How does local development/testing work for your team?
- Any CI/CD integration with ZenML?
**Team:**
- Solo project or team? (affects things like git hooks, naming conventions)
- Any established patterns or conventions I should follow?
**Operations:**
- How do you currently learn about pipeline failures?
- Preferences for alerting (Slack, Discord, email)?
Incorporate this context into your recommendations in Phase 4.
| # | Quick Win | Complexity | Impact | Prerequisites |
|---|---|---|---|---|
| 1 | Metadata logging | ⭐ | 🔥🔥🔥 | None |
| 2 | Experiment comparison | ⭐ | 🔥🔥 | #1 + Pro |
| 3 | Autologging | ⭐⭐ | 🔥🔥 | Exp tracker |
| 4 | Slack/Discord alerts | ⭐⭐ | 🔥🔥 | Slack/Discord |
| 5 | Cron scheduling | ⭐⭐ | 🔥🔥🔥 | Orchestrator |
| 6 | Warm pools | ⭐ | 🔥🔥 | SageMaker/Vertex |
| 7 | Secrets management | ⭐⭐ | 🔥🔥🔥 | None |
| 8 | Local smoke tests | ⭐⭐ | 🔥🔥 | Docker |
| 9 | Tags | ⭐ | 🔥🔥 | None |
| 10 | Git repo hooks | ⭐⭐ | 🔥🔥🔥 | GitHub/GitLab |
| 11 | HTML reports | ⭐ | 🔥🔥 | None |
| 12 | Model Control Plane | ⭐⭐ | 🔥🔥🔥 | None |
| 13 | Parent Docker images | ⭐⭐⭐ | 🔥🔥 | Container reg |
| 14 | ZenML docs MCP server | ⭐ | 🔥🔥 | IDE with MCP |
| 15 | CLI export formats | ⭐ | 🔥 | None |
Start here (foundational):
1. #1 Metadata logging
2. #9 Tags
3. #7 Secrets management
Next tier (observability):
4. #12 Model Control Plane — Governance foundation
5. #10 Git hooks — Reproducibility
6. #3 Autologging or #11 HTML reports

Operations tier:
7. #5 Scheduling — Automation
8. #4 Alerts — Awareness
9. #8 Smoke tests — Faster iteration

Advanced:
10. #6 Warm pools, #13 Parent images — Performance optimization

Developer experience:
11. #14 ZenML docs MCP — Better IDE assistance
12. #15 CLI export — Scripting/automation
Use AskUserQuestion to ask which quick wins they want to implement based on your findings and their priorities. Present the most relevant options first based on the context gathered in previous phases.
IMPORTANT: Verify git and branch setup before making any changes.
Before implementing any quick wins, ensure the codebase is ready for changes.
```bash
# Check current branch and status
git status
git branch --show-current

# Check for uncommitted changes
git diff --stat
```
Use AskUserQuestion to confirm branch setup:
## 🌿 Before I make changes, let's confirm the branch setup:
**Current state:**
- Branch: `main` (or whatever branch)
- Status: [clean / X uncommitted changes]
**Questions:**
1. Should I create a feature branch for these changes? (Recommended: `feature/zenml-quick-wins`)
2. If yes, is `main` the correct base branch, or should I branch from somewhere else (e.g., `develop`)?
3. Any branch naming conventions I should follow?
```bash
# Create and check out a feature branch
git checkout -b feature/zenml-quick-wins

# Or with their naming convention
git checkout -b <their-preferred-name>
```
Before starting implementation, use AskUserQuestion to ask about the preferred approach:
## 📝 How would you like to proceed with implementation?
**Option A: Write a plan first (Recommended)**
I can write a detailed implementation plan as a `.md` file that you can:
- Review and share with your team before changes are made
- Use as documentation of what will be implemented
- Reference during code review
After the plan is approved, I'd recommend compacting this session to start fresh with clean context for implementation.
**Option B: Proceed directly to implementation**
I'll start implementing the selected quick wins right away.
**Questions:**
1. Would you like a written plan first, or shall we proceed directly?
2. If writing a plan: where should I save it? (e.g., `docs/zenml-quick-wins-plan.md`)
Why offer this choice: a written plan gives the team a chance to review before any changes land, and compacting the session afterwards starts implementation with clean context.
See quick-wins-catalog.md for detailed implementation guides for each quick win.
For each quick win, follow this pattern:
- [ ] Verify prerequisites met
- [ ] Implement the change
- [ ] Test the implementation
- [ ] Document what was done
After implementing, verify with:
```bash
# Check stack configuration
zenml stack describe

# Run a test pipeline (if applicable)
python your_pipeline.py

# Verify in dashboard (ZenML Pro)
# Check metadata, tags, model versions appear correctly
```
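The same spot-check can be done programmatically; a minimal sketch using ZenML's Python client (method and attribute names per current ZenML docs — verify against your installed version):

```python
from zenml.client import Client

# List the most recent runs and confirm they completed as expected
client = Client()
for run in client.list_pipeline_runs(size=5):
    print(run.name, run.status)
```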
If implementation fails or documentation seems outdated:
- Search `site:docs.zenml.io <topic>` for the latest docs

For better IDE assistance during implementation:
Claude Code (VS Code):
```bash
claude mcp add zenmldocs --transport http https://docs.zenml.io/~gitbook/mcp
```
Cursor:
```json
{
  "mcpServers": {
    "zenmldocs": {
      "transport": {
        "type": "http",
        "url": "https://docs.zenml.io/~gitbook/mcp"
      }
    }
  }
}
```
```python
from zenml import step, log_metadata

@step
def train_model(data):
    # ... training code ...
    log_metadata({"accuracy": 0.95, "loss": 0.05, "epochs": 10})
    return model
```
```python
from zenml import pipeline

@pipeline(tags=["experiment", "v2"])
def my_pipeline():
    ...
```
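For quick win #11, a minimal HTML report sketch; this assumes `HTMLString` from `zenml.types`, which the dashboard renders as a visual artifact:

```python
from zenml import step
from zenml.types import HTMLString

@step
def report_step(accuracy: float) -> HTMLString:
    # A step that returns HTMLString shows up as a rendered report
    return HTMLString(f"<h1>Training Report</h1><p>Accuracy: {accuracy:.2%}</p>")
```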
```python
from zenml import pipeline, Model

model = Model(name="my_classifier", tags=["production"])

@pipeline(model=model)
def training_pipeline():
    ...
```
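And for quick win #5, a scheduling sketch; the `with_options(schedule=...)` pattern follows ZenML's scheduling docs, but the cron expression is illustrative and the active orchestrator must support schedules — verify both:

```python
from zenml import pipeline
from zenml.config.schedule import Schedule

@pipeline
def nightly_training():
    ...

# Attach a daily 06:00 cron schedule, then run to register it
scheduled = nightly_training.with_options(
    schedule=Schedule(cron_expression="0 6 * * *")
)
scheduled()
```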