Migrate Kedro pipelines and projects to idiomatic ZenML pipelines. Handles concept mapping (node->step, Pipeline->pipeline, Data Catalog->explicit boundary steps plus artifacts, params:->typed parameters), catalog analysis, code translation, hooks/runners/deployment mapping, and flags unsupported patterns (transcoding, dataset lifecycle hooks, namespace remapping, SharedMemoryDataset, slicing semantics) for human review. Use this skill whenever the user mentions Kedro migration, converting a Kedro project to ZenML, porting Kedro pipelines, replacing Kedro orchestration or deployment plugins with ZenML, or asks how a Kedro concept maps to ZenML -- even if they do not explicitly say "migrate". Also use when the user pastes `catalog.yml`, `parameters.yml`, `pipeline_registry.py`, node code, hook code, or describes a workflow using Kedro terminology such as node, pipeline, Data Catalog, `params:`, namespace, modular pipeline, runner, `MemoryDataset`, or transcoding in a ZenML context. If the user just asks a quick conceptual question ("what is the ZenML equivalent of `MemoryDataset`?"), answer it directly from the concept map -- no need to run the full migration workflow.
`npx claudepluginhub joshuarweaver/cascade-ai-ml-engineering --plugin zenml-io-skills`

This skill uses the workspace's default tool permissions.
This skill translates Kedro projects into idiomatic ZenML pipelines. It handles the full migration workflow: analyze the Kedro project, classify each pattern as direct / approximate / absent, translate what maps cleanly, flag what needs redesign, and produce a working ZenML project plus a clear migration report.
Kedro and ZenML are both Python-first workflow systems, so the business logic inside node functions often survives migration surprisingly well. The hard part is not the node code. The hard part is the contract around the node code.
Kedro uses the Data Catalog as the central registry of datasets, storage details, credentials, versioning, and sometimes representation tricks such as transcoding. ZenML does not have a DataCatalog equivalent. ZenML treats internal handoffs as typed artifacts and expects external reads and writes to be made explicit in step code, materializers, stack settings, and secrets.
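For example, a catalog-driven CSV input becomes an explicit, typed loader step. A minimal sketch, assuming a `pandas.CSVDataset` entry named `raw_orders` (the name and path are illustrative, not from a real project):

```python
# Sketch only -- the dataset name and path are illustrative.
# Kedro catalog entry being replaced (catalog.yml):
#   raw_orders:
#     type: pandas.CSVDataset
#     filepath: data/01_raw/orders.csv
from typing import Annotated

import pandas as pd
from zenml import step


@step
def load_raw_orders(path: str = "data/01_raw/orders.csv") -> Annotated[pd.DataFrame, "raw_orders"]:
    # The read that catalog.yml performed implicitly is now explicit step code;
    # downstream steps receive a tracked, typed artifact instead of a catalog key.
    return pd.read_csv(path)
```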
So this migration is not a rename exercise. It is mostly a careful rewrite of:

- the IO boundary: catalog entries become explicit loader and exporter steps plus typed artifacts
- configuration: `parameters.yml` and `credentials.yml` move into ZenML parameters, YAML config, secrets, and stack settings

Every Kedro concept falls into one of these categories:
| Type | Meaning | Action |
|---|---|---|
| Direct | Clean 1:1 mapping exists | Translate automatically |
| Approximate | Similar intent exists but semantics differ | Translate with caveats noted in the migration report |
| Absent | No ZenML equivalent exists | Flag for human review with redesign suggestions |
See `references/concept-map.md` for the full tables.
Ask the user for the Kedro project files. Read them before writing any code. Prefer this order because it reveals the real contract of the system:
1. `conf/base/catalog.yml` and any environment-specific catalog files
2. `conf/base/parameters.yml` and related parameter files
3. `conf/local/credentials.yml` references, if present
4. Node code (`nodes.py` and related modules)
5. Pipeline definitions (`pipeline.py`, modular pipeline factories)
6. `pipeline_registry.py`
7. `settings.py`
8. How the project runs (`kedro run`, Airflow/Kubeflow/Vertex/Docker plugins, CI entrypoints)

For each project, inventory the following before deciding how to migrate it:
- How are `params:` references used? Are they simple scalars, nested config objects, or runtime overrides?
- Which hooks are registered: `before_node_run`, `after_node_run`, `on_node_error`, dataset lifecycle hooks, command hooks, or catalog hooks?
- Does the team rely on CLI slicing: `--from-nodes`, `--to-nodes`, `--tags`, `--from-inputs`, `--to-outputs`, or `--only-missing-outputs`?
- Does execution depend on `SequentialRunner`, `ParallelRunner`, `ThreadRunner`, `SharedMemoryDataset`, or other memory-sensitive behavior?
- Does the project use `kedro-mlflow` / Kedro-Viz?

After Phase 1, classify everything as direct / approximate / absent using `references/concept-map.md` and `references/gaps-and-flags.md`.
Direct or nearly direct translations (usually safe to generate automatically; see the sketch after this list):

- Kedro `node` -> `@step` function, with `Annotated[...]` outputs when stable output names matter
- Kedro `Pipeline` -> `@pipeline`
- `parameters.yml` + `params:` -> typed pipeline / step parameters
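A minimal sketch of the first two mappings (step and dataset names are illustrative, not from the source project):

```python
# Hypothetical before/after -- names are illustrative.
# Kedro: node(func=clean_orders, inputs="raw_orders", outputs="clean_orders")
from typing import Annotated

import pandas as pd
from zenml import pipeline, step


@step
def load_raw_orders() -> pd.DataFrame:
    # Stand-in boundary step; in a real migration this wraps the catalog read.
    return pd.DataFrame({"order_id": [1, 2], "amount": [10.0, None]})


@step
def clean_orders(raw_orders: pd.DataFrame) -> Annotated[pd.DataFrame, "clean_orders"]:
    # The node body usually survives unchanged; only the contract around it moves.
    return raw_orders.dropna()


@pipeline
def orders_pipeline():
    # Kedro's string-keyed DAG wiring becomes ordinary Python dataflow.
    clean_orders(load_raw_orders())
```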
Approximate translations (generate, but explain what changed):

- `pandas.CSVDataset`, `ParquetDataset`, `JSONDataset`, `ExcelDataset` -> explicit loader/exporter boundary steps
- `versioned: true` -> ZenML artifact versioning, explicit external version handling, or both
- `ParallelRunner` -> orchestrator-managed parallelism with isolated steps
- `kedro-mlflow` -> ZenML experiment tracker
- `kedro-docker` -> `DockerSettings` and stack-driven containerization
Absent / redesign required (must be flagged):

- `DataCatalog` as a global dataset registry
- Transcoding (`dataset@pandas`, `dataset@spark`)
- `SharedMemoryDataset` / `SharedMemoryDataCatalog`
- CLI slicing (`--from-nodes`, `--to-nodes`, `--only-missing-outputs`, etc.)

Before writing code, summarize your findings for the user:
"Here's what I found in your Kedro project:
- Direct translations (will migrate cleanly): [list]
- Approximate translations (will work but with caveats): [list]
- Needs redesign (cannot be auto-migrated safely): [list]
The main migration theme is: [for example, 'catalog-driven IO becomes explicit boundary steps']. Shall I proceed with the migration?"
If there are HIGH-severity flags, explain each one concretely before proceeding.
Translate the Kedro project into a ZenML project. Follow these conventions strictly.
Every migrated project MUST use this layout:
```
migrated_pipeline/
├── steps/                  # One file per step
│   ├── load_customers.py
│   ├── transform_features.py
│   └── export_report.py
├── pipelines/
│   └── my_pipeline.py
├── materializers/          # Only when truly needed
├── configs/
│   ├── dev.yaml
│   └── prod.yaml
├── run.py                  # argparse, not click
├── README.md
└── pyproject.toml
```
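A minimal `run.py` sketch consistent with this layout, assuming the pipeline is named `my_pipeline` (an illustration, not generated output):

```python
# Sketch of run.py -- assumes the layout above; the pipeline name is illustrative.
import argparse

from pipelines.my_pipeline import my_pipeline


def main() -> None:
    parser = argparse.ArgumentParser(description="Run the migrated pipeline")
    parser.add_argument("--config", default="configs/dev.yaml", help="pipeline config file")
    args = parser.parse_args()

    # with_options applies the YAML config (parameters, settings) before running.
    my_pipeline.with_options(config_path=args.config)()


if __name__ == "__main__":
    main()
```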
Key rules:
- One step per file under `steps/`
- `run.py` uses argparse
- `pyproject.toml` should use `requires-python = ">=3.12"` and `zenml>=0.94.1`
- Ship both `configs/dev.yaml` and `configs/prod.yaml`
- Include a `README.md`
- Run `zenml init` at the project root

See `references/code-patterns.md` for the concrete side-by-side examples. Use these rules consistently:
1. **Pure nodes become steps.** Decorate with `@step`; use `Annotated[...]` when stable output names matter.
2. **Catalog-driven IO becomes boundary steps.**
3. **`params:` becomes explicit typed parameters.** Turn `params:threshold` into a pipeline or step parameter (see the sketch after this list).
4. **Versioning must be decided, not assumed.**
5. **Hooks only map partially.** `@step(on_success=...)` and `@step(on_failure=...)` are only partial substitutes.
6. **Namespaces and modular pipelines become explicit composition.**
7. **Runners and deployment plugins become stack design.**
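A sketch combining rules 3 and 5 (the names and the hook body are illustrative):

```python
# Sketch -- step name, column, and hook behavior are illustrative.
import pandas as pd
from zenml import step


def notify_on_failure() -> None:
    # Partial substitute for a Kedro on_node_error hook: it runs when the step
    # fails, but there is no dataset lifecycle or catalog context to inspect.
    print("filter_orders failed")


@step(on_failure=notify_on_failure)
def filter_orders(orders: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    # Kedro's `params:threshold` is now an explicit, typed step parameter.
    return orders[orders["score"] >= threshold]
```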
Keep migration comments short and actionable:
- `# Migration note:` for brief inline caveats
- `# TODO(migration):` for required manual follow-up

Do not hide major semantic differences in comments alone. They must also appear in the migration report.
When the migration is approximate, explain the difference right at the point of use:
```python
@step
def load_orders(path: str) -> pd.DataFrame:
    # Migration note: Kedro previously loaded this dataset through catalog.yml
    # with credentials and versioning managed outside the node code. In ZenML
    # that boundary is explicit here and configured via parameters + secrets.
    return pd.read_csv(path)
```
Never silently approximate patterns with no real ZenML equivalent. Instead:
- Add a `# TODO(migration):` comment in the generated code
- Document it in `MIGRATION_REPORT.md`

For example:

```python
# TODO(migration): UNSUPPORTED -- this Kedro project relied on transcoding
# (`features@pandas` and `features@spark`) for the same logical dataset.
# ZenML has no equivalent hidden representation switch. Pick one canonical
# artifact representation and make conversions explicit in dedicated steps.
```
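When the user accepts that redesign, the usual replacement is a dedicated conversion step. A sketch, assuming a pandas-to-Spark handoff and that pyspark plus a Spark-capable materializer are available:

```python
# Sketch only -- assumes pyspark is installed and the artifact store can
# materialize Spark DataFrames; names mirror the transcoding example above.
from typing import Annotated

import pandas as pd
from pyspark.sql import DataFrame as SparkDataFrame
from pyspark.sql import SparkSession
from zenml import step


@step
def features_to_spark(features: pd.DataFrame) -> Annotated[SparkDataFrame, "features_spark"]:
    # What `features@pandas` / `features@spark` transcoding did implicitly in
    # the catalog is now a visible, reviewable conversion step in the DAG.
    spark = SparkSession.builder.getOrCreate()
    return spark.createDataFrame(features)
```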
After generating the ZenML project, produce a `MIGRATION_REPORT.md` in the project root:
# Migration Report: [Kedro Project] -> [ZenML Pipeline]
## Summary
- **Source**: Kedro project `[project_name]`
- **Target**: ZenML pipeline `[pipeline_name]`
- **Nodes migrated**: X direct, Y approximate, Z flagged
- **Catalog entries analyzed**: N
## Direct Translations
| Kedro Concept | ZenML Equivalent | Notes |
|---|---|---|
| node(clean_orders) | steps/clean_orders.py | Pure transform |
## Approximate Translations
| Kedro Pattern | ZenML Equivalent | What Changed |
|---|---|---|
| pandas.CSVDataset | explicit loader/exporter steps | IO boundary is now explicit |
| versioned: true | artifact versioning + optional external export versioning | File-path semantics may differ |
## Flagged for Review
| Kedro Pattern | Severity | Issue | Suggested Redesign |
|---|---|---|---|
| transcoding | HIGH | No hidden representation switching in ZenML | Use explicit conversion steps |
| before_dataset_loaded hook | HIGH | No dataset lifecycle hook equivalent | Move logic into loader step |
## Catalog Translation Summary
| Catalog Entry | Original Type | Migration Target | Notes |
|---|---|---|---|
| raw_orders | pandas.CSVDataset | load_orders step | external input |
| model | pickle.PickleDataset | artifact + exporter step | check path/version semantics |
## Configuration Migration Summary
- `parameters.yml` keys moved to: [pipeline config path(s)]
- Runtime overrides previously passed through: [Kedro mechanism]
- New ZenML config entrypoints: [dev/prod yaml files]
## Credential / Auth Migration Summary
- `credentials.yml` entries moved to ZenML secrets / env vars / service connectors
- Any manual setup still required: [list]
## Namespace / Composition Summary
- Which modular pipelines were preserved
- Which namespaces/remappings required explicit wrapper logic
## Runner / Deployment Migration Summary
- Original runner(s): [SequentialRunner / ParallelRunner / ThreadRunner]
- Original plugins: [kedro-airflow / kedro-kubeflow / kedro-vertexai / kedro-docker / kedro-mlflow]
- New ZenML stack assumptions: [orchestrator / step operator / tracking setup]
## Limitations and Key Differences
[Put the most important behavior differences here BEFORE the benefits section]
## What's NOT Migrated
[List unsupported or intentionally deferred patterns]
## What You Get for Free After Migration
- Artifact versioning and lineage
- Step caching
- Stack abstraction
- Stronger typed artifact flow
- Optional dynamic pipelines and cross-pipeline artifact reuse
- Better alignment with experiment trackers, step operators, and deployment workflows
## Recommended Next Steps
1. Run `zenml-quick-wins`
2. Install the ZenML docs MCP server
3. Use `zenml-pipeline-authoring` for deeper Docker / materializer / YAML / deployment work
4. Follow up on every HIGH-severity flag before production use
After migration is complete, always include a Recommended Next Steps section in the migration report and communicate it to the user directly.
Always suggest the `zenml-quick-wins` skill first:

"Now that the migration is done, I recommend running the `zenml-quick-wins` skill to add metadata logging, experiment tracking, alerters, and other production features."
For every flagged pattern, include relevant ZenML documentation links when they help:
- https://docs.zenml.io/concepts/steps_and_pipelines/yaml_configuration
- https://docs.zenml.io/concepts/secrets
- https://docs.zenml.io/concepts/steps_and_pipelines/dynamic_pipelines
- https://docs.zenml.io/concepts/containerization
- https://docs.zenml.io/stacks/stack-components/step-operators/custom

"For easier access to ZenML docs while you keep iterating, you can install the ZenML docs MCP server: `claude mcp add zenmldocs --transport http https://docs.zenml.io/~gitbook/mcp`"
When there are HIGH-severity flags, offer to help the user ask the ZenML community for guidance. When there are 2+ HIGH-severity flags, generate a ready-to-send Slack message for zenml.io/slack that summarizes the flagged patterns and the questions needing expert input.
When the migration reveals a real missing ZenML capability -- not merely a different design, but a genuine gap that multiple users would benefit from -- offer to open a GitHub issue on zenml-io/zenml.
After migration, always suggest running `/simplify` on the generated code so migration comments, wrappers, and duplication can be cleaned up.
Recommend `zenml-pipeline-authoring` when the user needs deeper help with:

- custom materializers
- `DockerSettings` and containerization
- YAML configuration
- deployment workflows

Always mention the relevant behavior differences in the migration report.
Kedro uses a named dataset registry that hides storage details from node code. ZenML uses typed artifacts flowing between steps. That means the migration often adds explicit loader/exporter steps at the edges of the graph.
**`MemoryDataset` != "stay in memory".** Kedro can keep intermediates ephemeral. ZenML artifacts are normally persisted and versioned. If ephemerality mattered for correctness or cost, flag it.
Kedro namespaces and remapping change how datasets and parameters resolve. ZenML reuse is explicit Python composition. These are related ideas, not the same feature.
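A sketch of that explicit composition (the factory, step, and parameter names are illustrative):

```python
# Sketch -- replaces a Kedro namespaced modular pipeline with a plain factory.
from zenml import pipeline, step


@step
def scale(value: float, factor: float) -> float:
    return value * factor


def build_features_pipeline(factor: float):
    # Where Kedro remapped datasets/parameters per namespace, ZenML reuse is
    # ordinary Python: each factory call wires its own configuration explicitly.
    @pipeline
    def features_pipeline():
        scale(value=1.0, factor=factor)

    return features_pipeline


dev_pipeline = build_features_pipeline(factor=0.5)
prod_pipeline = build_features_pipeline(factor=2.0)
```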
Kedro's CLI slicing changes which part of a graph runs. ZenML caching reuses outputs when inputs, code, and settings match. Do not present them as equivalents.
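A minimal illustration of the caching side (`enable_cache` is the per-step control):

```python
# Sketch -- caching decides per step whether to reuse a prior output;
# it never changes which steps are in the DAG, unlike Kedro slicing.
from zenml import step


@step  # cached by default: skipped when code, inputs, and settings are unchanged
def expensive_transform(x: int) -> int:
    return x * 2


@step(enable_cache=False)  # always re-executes, e.g. for fresh external reads
def load_latest() -> int:
    return 41
```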
Kedro runners are not a direct code-level concept in ZenML. Translate them into stack design, step isolation assumptions, and resource settings.
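For example, `ParallelRunner` intent usually reduces to independent branches in the graph plus an orchestrator choice. A sketch with illustrative steps:

```python
# Sketch -- runner intent becomes graph shape plus stack choice. Steps with no
# data dependency between them may run concurrently on a parallel-capable
# orchestrator; the local orchestrator gives SequentialRunner-like behavior.
from zenml import pipeline, step


@step
def branch_a() -> int:
    return 1


@step
def branch_b() -> int:
    return 2


@step
def join(a: int, b: int) -> int:
    return a + b


@pipeline
def fanout_pipeline():
    join(branch_a(), branch_b())
```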
Do not rebuild the `DataCatalog` abstraction on top of ZenML unless the user explicitly wants a transitional adapter and understands the tradeoff.