Migrate Metaflow flows and Outerbounds-flavored Metaflow projects to idiomatic ZenML pipelines. Handles concept mapping (FlowSpec->pipeline, @step->@step, self.* artifacts->explicit returns and inputs), code translation for Parameters, IncludeFile, Config, self.next transitions, branch/join, foreach, scheduling, retry/resource/dependency decorators, and flags unsupported or high-risk patterns (@catch, merge_artifacts, resume and checkpoint semantics, recursion, event triggers, @batch) for human review. Use this skill whenever the user mentions Metaflow migration, converting FlowSpec code, porting flows from Metaflow or Outerbounds, replacing Metaflow orchestration with ZenML, or asks how a Metaflow concept maps to ZenML -- even if they don't explicitly say "migrate". Also use when they paste FlowSpec code or describe workflows using Metaflow terminology (self.next, foreach, current, Parameter, IncludeFile, Config, @catch, @kubernetes, @batch, Runner, Deployer) in a ZenML context. If the user just asks a quick conceptual question ("what's the ZenML equivalent of merge_artifacts?"), answer it directly from the concept map -- no need to run the full migration workflow.
```
npx claudepluginhub joshuarweaver/cascade-ai-ml-engineering --plugin zenml-io-skills
```

This skill uses the workspace's default tool permissions.
This skill translates Metaflow flows into idiomatic ZenML pipelines. It handles the full migration workflow: analyzing `FlowSpec` code, classifying each pattern, translating what maps cleanly, flagging what needs redesign, and producing a working ZenML project.
Metaflow and ZenML are deceptively close cousins. Both talk about steps, artifacts, local vs remote execution, and moving the same code between environments. But they tell that story in different ways:
- Metaflow: a `FlowSpec` class, `@step` methods, `self.next(...)` transitions, and `self.*` assignments that become persisted artifacts.
- ZenML: a `@pipeline` function, standalone `@step` functions, and explicit step inputs and outputs that become typed, versioned artifacts.

So this is not a rename-the-primitives migration. The dangerous cases are the ones that still "look right" after a naive rewrite but silently change behavior: join semantics, `foreach`, `merge_artifacts`, `@catch`, resume/checkpoint behavior, conditional branching, recursion, and platform-specific decorators like `@batch`.
Every Metaflow concept falls into one of these categories:
| Type | Meaning | Action |
|---|---|---|
| Direct | Clean 1:1 mapping exists | Translate automatically |
| Approximate | Conceptual equivalent exists but semantics differ | Translate with caveats noted in the migration report |
| Absent | No safe ZenML equivalent exists | Flag for human review with redesign suggestions |
See `references/concept-map.md` for the full mapping tables.
Ask the user for their Metaflow flow files, supporting modules, configuration files, and any deployment/runtime commands they currently use. Read everything before writing code.
For each flow, identify:
- The `FlowSpec` class name
- `start` and `end` steps
- Every `self.next(...)` transition, including `foreach` fan-outs
- Every `self.<name> = ...` assignment
- `merge_artifacts(inputs)` calls
- `foreach`, `self.input`, and `self.index` usage
- `Parameter`, `IncludeFile`, and `Config` definitions
- Decorators: `@retry`, `@catch`, `@timeout`, `@resources`, `@batch`, `@kubernetes`, `@conda`, `@pypi`, `@conda_base`, `@environment`, `@secrets`, `@card`, `@schedule`, `@trigger`, `@trigger_on_finish`, `@project`, `@checkpoint`
- `--with <decorator>` overlays
- `current` usage
- `metaflow.client`, `Runner`, and `Deployer` usage
- `resume` usage
- `metaflow.S3` usage
- `@docker` and `@gpu_profile`

If the user gives you only a quick conceptual question, answer from the concept map and stop there. Use the full migration workflow only when there is real code or a real migration design problem to solve.
For each pattern from Phase 1, classify it as direct, approximate, or absent. Use the quick guide below plus the detailed tables in references/concept-map.md and references/gaps-and-flags.md.
Direct translations (translate automatically):
- Linear `self.next(self.a)` chains
- `@step` method logic -> ZenML `@step`
- `Parameter` values -> pipeline parameters
- `@retry` -> `StepRetryConfig`

Approximate translations (translate with caveats):
- `FlowSpec` -> `@pipeline`
- `self.*` artifacts -> explicit step returns and downstream inputs
- `foreach` -> `@pipeline(dynamic=True)` plus `.map()` and explicit reducer/join steps; manual loops may also need `.load()` for decisions and `.chunk(idx)` for DAG wiring (see the sketch after this list)
- Dynamic pipelines run only on supported orchestrators (`local`, `local_docker`, `kubernetes`, `sagemaker`, `vertex`, `azureml`)
- `@resources` -> `ResourceSettings`
- `@kubernetes` -> Kubernetes orchestrator or step operator settings
- `@conda` / `@pypi` / Fast Bakery -> `DockerSettings` and container-image design
- `@schedule` -> `Schedule(...)`, with target orchestrator support and cron semantics called out explicitly
- `Config` -> YAML config / `.with_options(config_file=...)`
- `current` -> `get_step_context()` for narrow step/run metadata lookup only; broader `current.*` usage must be flagged
- `metaflow.client` -> `zenml.client.Client` only for limited lineage/artifact lookup; richer history traversal should be flagged
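To make the `foreach` mapping concrete, here is a minimal sketch of the shape it takes, assuming the `@pipeline(dynamic=True)` and `.map()` APIs named above; the step names are illustrative, and the exact fan-out/reducer API should be verified against the dynamic-pipelines docs for your ZenML version:

```python
from zenml import pipeline, step


@step
def make_shards() -> list[int]:
    # Runtime fan-out values, like a Metaflow foreach list.
    return [0, 1, 2]


@step
def train_shard(shard: int) -> float:
    return shard * 0.1


@step
def reduce_scores(scores: list[float]) -> float:
    # Explicit reducer/join step replacing the Metaflow join + merge_artifacts.
    return max(scores)


@pipeline(dynamic=True)
def training_pipeline() -> None:
    shards = make_shards()
    scores = train_shard.map(shards)  # fan-out over runtime values
    reduce_scores(scores)
```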
Absent / must flag for review:

- `@catch`
- `merge_artifacts`
- `resume` semantics
- `@checkpoint`
- `@batch` as a direct portable equivalent
- `@timeout` semantics
- `@trigger` / `@trigger_on_finish`
- `current.*` state

Before writing migration code, summarize the flow like this:
"Here's what I found in your Metaflow flow:
- Direct translations (will migrate cleanly): [list]
- Approximate translations (will work but with caveats): [list]
- Needs redesign (cannot be auto-migrated safely): [list with explanation]
Shall I proceed with the migration?"
If there are HIGH-severity flags, explain them concretely in story form: what the Metaflow flow currently does, where the behavior lives, why ZenML cannot preserve it directly, and what redesign path is most honest.
Translate the Metaflow flow into a ZenML project. Follow these conventions strictly.
Every migrated project MUST use this layout:
```
migrated_pipeline/
├── steps/                 # One file per step
│   ├── extract.py
│   ├── transform.py
│   └── load.py
├── pipelines/
│   └── my_pipeline.py     # Pipeline definition
├── materializers/         # Custom materializers if needed
├── configs/
│   ├── dev.yaml
│   └── prod.yaml
├── run.py                 # CLI entry point (argparse, not click)
├── README.md
└── pyproject.toml
```
Key rules:
- One file per step under `steps/`
- `run.py` uses argparse (see the sketch after this list)
- `pyproject.toml` should use `requires-python = ">=3.12"` and a current ZenML dependency appropriate for the target environment
- Provide `configs/dev.yaml` and `configs/prod.yaml`
- Write a `README.md` that explains what changed, how to run, and what still needs manual attention
- Run `zenml init` at the project root
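A minimal sketch of what `run.py` can look like under these rules; the pipeline import path and config filenames follow the layout above and are illustrative, not fixed APIs:

```python
# Hedged sketch of run.py; pipeline name and config paths are the
# conventions from this guide.
import argparse

from pipelines.my_pipeline import my_pipeline


def main() -> None:
    parser = argparse.ArgumentParser(description="Run the migrated pipeline.")
    parser.add_argument(
        "--config",
        default="configs/dev.yaml",
        help="Path to a ZenML run config (dev.yaml or prod.yaml).",
    )
    args = parser.parse_args()
    # with_options applies the YAML run config, then () triggers the run.
    my_pipeline.with_options(config_file=args.config)()


if __name__ == "__main__":
    main()
```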
For each Metaflow step, apply the right translation; see `references/code-patterns.md` for side-by-side examples. Core rule: move step logic out of the `FlowSpec` class and into standalone `@step` functions. Replace implicit `self.*` state with explicit function returns and typed inputs.
```python
# Metaflow
from metaflow import FlowSpec, step

class MyFlow(FlowSpec):
    @step
    def start(self):
        self.x = 1
        self.next(self.end)

    @step
    def end(self):
        print(self.x)
```
```python
# ZenML
from zenml import pipeline, step

@step
def start() -> int:
    return 1

@step
def end(x: int) -> None:
    print(x)

@pipeline
def my_pipeline() -> None:
    x = start()
    end(x)
```
`self.*` artifacts -> explicit artifacts:

```python
# Metaflow
self.features = build_features(self.raw)

# ZenML
@step
def build_features_step(raw: list[int]) -> list[int]:
    return build_features(raw)
```
`Parameter` values -> pipeline parameters:

```python
# Metaflow
class TrainFlow(FlowSpec):
    alpha = Parameter("alpha", default=0.1)

# ZenML
@pipeline
def train_pipeline(alpha: float = 0.1) -> None:
    ...
```
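Similarly, `Config` values map to a run-config YAML loaded with `.with_options(config_file=...)`; a minimal sketch, assuming the standard ZenML run-config layout (the file name and parameter value are illustrative):

```python
# configs/dev.yaml (contents shown as a comment for brevity):
#   parameters:
#     alpha: 0.05
#
# Hedged sketch: the YAML `parameters:` block feeds the pipeline's
# function parameters, replacing Metaflow's Config object.
train_pipeline.with_options(config_file="configs/dev.yaml")()
```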
Retries -> `StepRetryConfig`:

```python
from zenml.config.retry_config import StepRetryConfig

@step(retry=StepRetryConfig(max_retries=3, delay=60, backoff=2))
def flaky_step() -> None:
    ...
```
Scheduling -> `Schedule(...)`:

```python
from zenml.config.schedule import Schedule

schedule = Schedule(cron_expression="0 2 * * *")
my_pipeline.with_options(schedule=schedule)()
```
Always note that scheduling support depends on the orchestrator. Check the scheduling table in `references/concept-map.md`.
When a pattern is close but not identical, keep the generated code honest with short inline comments:
```python
@step
def join_results(left_score: float, right_score: float) -> float:
    # Migration note: Metaflow join steps can rely on implicit artifact
    # propagation. ZenML requires the join contract to be explicit, so all
    # branch outputs needed downstream are listed here directly.
    return max(left_score, right_score)
```
Approximation comments should be short and actionable. Put the long explanation in the migration report, not in the code.
Never silently approximate absent patterns. Instead, leave a `# TODO(migration):` comment in the generated code:

```python
# TODO(migration): UNSUPPORTED -- original flow used @catch to convert step
# failure into a successful downstream continuation path. ZenML has no direct
# equivalent. Consider returning an explicit Result/Error envelope from the step
# or splitting the recovery logic into a separate pipeline.
@step
def recovery_wrapper(...) -> ...:
    ...
```
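One shape the suggested Result/Error envelope can take, as a minimal sketch (the `ScoreResult` type, step name, and scoring logic are illustrative, not from any source flow):

```python
from dataclasses import dataclass

from zenml import step


@dataclass
class ScoreResult:
    ok: bool
    score: float | None = None
    error: str | None = None


@step
def score_model_safe(threshold: float) -> ScoreResult:
    try:
        if threshold < 0:
            raise ValueError("threshold must be non-negative")
        return ScoreResult(ok=True, score=1.0 - threshold)  # placeholder scoring
    except Exception as exc:
        # Unlike Metaflow @catch, the failure is an explicit artifact that
        # downstream steps must inspect and handle.
        return ScoreResult(ok=False, error=str(exc))
```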
After generating the ZenML project, produce a `MIGRATION_REPORT.md` in the project root:
```markdown
# Migration Report: [Metaflow Flow] -> [ZenML Pipeline]
## Summary
- **Source**: Metaflow flow `[FlowSpec name]`
- **Target**: ZenML pipeline `[pipeline_name]`
- **Steps migrated**: X direct, Y approximate, Z flagged
## Direct Translations
| Metaflow Pattern | ZenML Equivalent | Notes |
|---|---|---|
| `@retry` on `train` | `StepRetryConfig` | Clean translation |
## Approximate Translations
| Metaflow Pattern | ZenML Equivalent | What Changed |
|---|---|---|
| `self.features` artifact propagation | explicit step outputs | downstream dependencies are now explicit |
| `foreach` fan-out | dynamic pipeline `.map()` | experimental and orchestrator-limited |
## Flagged for Review
| Metaflow Pattern | Severity | Issue | Suggested Redesign |
|---|---|---|---|
| `@catch` on `score_model` | HIGH | no direct placeholder-success behavior | return explicit error envelope |
| `merge_artifacts(inputs)` | HIGH | no implicit merge primitive | write explicit conflict resolution logic |
## Control-Flow Redesign Notes
[Explain branch/join, foreach, conditionals, or recursion changes.]
## Environment and Compute Mapping
[Explain dependency, Docker, step-operator, and resource changes.]
## Resume and Recovery Semantics
- **Original**: [How resume/checkpoint behaved in Metaflow]
- **Migrated**: [How caching/artifact reuse behaves in ZenML]
- **Important difference**: [Why this is approximate, not exact]
## What's NOT Migrated
[List unsupported decorators, platform features, or manual follow-ups.]
## What You Get for Free After Migration
- typed, versioned artifacts
- lineage and caching
- stack abstraction
- Model Control Plane
- service connectors
- pipeline deployments
## Recommended Next Steps
1. Run `zenml-quick-wins`
2. Install the ZenML docs MCP server
3. Review the flagged redesign items
4. Use `zenml-pipeline-authoring` for deeper customization
```
Always include the "Resume and Recovery Semantics" section when the source flow used `resume`, `@checkpoint`, `@catch`, or complex retry behavior.
After migration, always include a next-steps section in the report and summarize it to the user.
Always suggest the `zenml-quick-wins` skill first:

"Now that the migration is done, I'd recommend running the `zenml-quick-wins` skill to add metadata logging, experiment tracking, alerters, secrets, and other production features."
Use current official ZenML docs when suggesting follow-up reading:

- https://docs.zenml.io/concepts/steps_and_pipelines/dynamic_pipelines
- https://docs.zenml.io/concepts/steps_and_pipelines/scheduling
- https://docs.zenml.io/concepts/artifacts/materializers
- https://docs.zenml.io/concepts/deployment
- https://docs.zenml.io/concepts/service_connectors
- https://docs.zenml.io/concepts/stack_components
- https://docs.zenml.io/concepts/models

"For easier doc-grounded help while you work, you can install the ZenML docs MCP server:
`claude mcp add zenmldocs --transport http https://docs.zenml.io/~gitbook/mcp`"
When there are 2 or more HIGH-severity flags, generate a ready-to-send Slack message for zenml.io/slack summarizing the blocked patterns and the attempted workarounds.
If the migration surfaces a real missing ZenML capability, offer to open an issue on zenml-io/zenml with the blocked Metaflow pattern, the attempted workaround, and why the gap matters.
Always suggest running `/simplify` on the generated code after migration. Migration output often carries extra comments, duplicated plumbing, or defensive wrappers that can be cleaned up once the user has reviewed the semantics.
For deeper follow-up work, recommend the `zenml-pipeline-authoring` skill.
These are the places where users most easily get surprised after a migration.
**`self.*` artifacts != explicit step outputs.** Metaflow lets a step quietly create many persisted artifacts just by assigning to `self.<name>`. ZenML persists what you explicitly return. If you forget to return something in ZenML, the downstream step will not magically find it later.
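For a step that used to assign several `self.*` values, the usual ZenML shape is multiple named outputs; a minimal sketch (step and output names are illustrative):

```python
from typing import Annotated, Tuple

from zenml import step


@step
def split_data(raw: list[int]) -> Tuple[
    Annotated[list[int], "train_set"],
    Annotated[list[int], "test_set"],
]:
    # Each named output becomes its own versioned artifact, replacing
    # self.train_set = ... and self.test_set = ... assignments.
    cut = int(len(raw) * 0.8)
    return raw[:cut], raw[cut:]
```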
Metaflow joins can inherit artifacts implicitly and resolve ambiguity with merge_artifacts(inputs). ZenML has no equivalent "carry forward whatever is unambiguous" rule. The join contract has to be written out by hand.
Metaflow can decide graph shape at step runtime with self.next(...). ZenML static pipelines decide structure when the pipeline function runs. Runtime-dependent branching and fan-out generally require @pipeline(dynamic=True). As of the current docs, dynamic pipelines are supported on local, local_docker, kubernetes, sagemaker, vertex, and azureml, but still carry important feature and runtime limitations that should be called out in the migration report.
Metaflow resume works by step identity and prior run state. ZenML caching works by code, inputs, settings, and artifact lineage. They both help you avoid re-running work, but they are not semantic twins.
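A minimal illustration of the difference: in ZenML, reuse is a per-step cache decision, not a resume of a specific prior run (the steps here are illustrative):

```python
from zenml import step


@step(enable_cache=True)   # reused when code, inputs, and settings match
def featurize(raw: list[int]) -> list[int]:
    return [x * 2 for x in raw]


@step(enable_cache=False)  # always re-executes; there is no resume-by-run-id
def train(features: list[int]) -> float:
    return sum(features) / max(len(features), 1)
```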
Metaflow often expresses dependencies as decorators like @conda, @pypi, or Outerbounds baking workflows. ZenML expects you to think in terms of Docker images, stack components, and step runtime environments.
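As one hedged example of that shift, `@conda`/`@resources` decorators typically become `DockerSettings` and `ResourceSettings` attached to the pipeline or step (the package names and resource sizes here are illustrative):

```python
from zenml import pipeline, step
from zenml.config import DockerSettings, ResourceSettings


@step(settings={"resources": ResourceSettings(cpu_count=2, memory="4GB")})
def train() -> None:
    ...


@pipeline(settings={"docker": DockerSettings(requirements=["scikit-learn"])})
def training_pipeline() -> None:
    train()
```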
| Anti-pattern | Why it's wrong | What to do instead |
|---|---|---|
| Keeping a `FlowSpec` class and sprinkling ZenML decorators on methods | ZenML steps should be standalone callables with explicit inputs/outputs | Extract step logic into functions and rebuild the DAG in a `@pipeline` |
| Translating `self.*` to module-level mutable state | Loses artifact persistence and lineage | Return typed values from steps and pass them downstream explicitly |
| Silently replacing `merge_artifacts(inputs)` with "take one branch" | Changes join behavior | Write explicit merge/conflict logic and flag it |
| Rewriting `foreach` as a plain Python for loop without calling out the semantic change | Loses orchestrated fan-out, observability, and parallelism | Use dynamic pipelines where supported, or flag the redesign |
| Pretending `@catch` is just try/except | `@catch` changes pipeline failure semantics | Return explicit error objects or redesign the failure boundary |
| Treating `resume` as identical to ZenML caching | They decide reuse differently | Explain the difference in the migration report |
| Mapping `@batch` directly to a generic remote stack | Hides real compute and orchestration differences | Flag as redesign and choose the target compute model explicitly |
| Assuming `current.*` metadata always has a ZenML twin | ZenML context is narrower | Replace with explicit inputs, metadata logging, or step context where possible |
| Copying Outerbounds deploy semantics decorator-for-decorator | The control plane is different | Treat deployment and serving as redesign work using ZenML deployments/model deployers |
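Where `current.*` lookups do have a narrow equivalent, the mapping above points at `get_step_context()`; a minimal sketch (the step name and the exact metadata fields read are illustrative):

```python
from zenml import get_step_context, step


@step
def report_run_info() -> str:
    # Rough replacement for Metaflow's current.run_id / current.step_name:
    # the step context exposes the active run and step names.
    ctx = get_step_context()
    return f"{ctx.pipeline_run.name}:{ctx.step_run.name}"
```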
`references/code-patterns.md` covers `foreach`, `Parameter`, `IncludeFile`, retries, compute, and runtime APIs. For questions beyond the migration surface itself, use the current ZenML documentation at https://docs.zenml.io.