Publish or update a dataset on Kaggle from a local directory using the Kaggle CLI. Creates a new dataset on first publish, or pushes a new version to an existing one. Use when the user wants to "sync to Kaggle", "publish a Kaggle dataset", "update a Kaggle dataset", or similar.
npx claudepluginhub danielrosehill/claude-code-plugins --plugin loose-tasks

This skill uses the workspace's default tool permissions.
Use the Kaggle CLI (`kaggle`) to publish a local directory as a new Kaggle dataset, or push a new version to one that already exists.
kaggle CLI installed. Check with which kaggle. If missing, install via pip install kaggle (or pipx install kaggle).
API token configured. The CLI accepts either form:
- KAGGLE_API_TOKEN environment variable, or
- ~/.kaggle/access_token file (mode 600), single line containing the token.

If neither is present, ask the user for the token before continuing — do not invent one. Once provided, write it to ~/.kaggle/access_token with chmod 600 so subsequent invocations work without re-prompting:
mkdir -p ~/.kaggle && echo "<TOKEN>" > ~/.kaggle/access_token && chmod 600 ~/.kaggle/access_token
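The two auth forms above can be checked in order with a small POSIX-sh sketch (kaggle_auth_source is a made-up helper name, not part of the Kaggle CLI):

```shell
# Report which Kaggle auth form is configured.
# The env var wins; otherwise fall back to the token file; else "none".
kaggle_auth_source() {
  if [ -n "${KAGGLE_API_TOKEN:-}" ]; then
    echo "env"
  elif [ -s "$HOME/.kaggle/access_token" ]; then
    echo "file"
  else
    echo "none"
  fi
}
```

If this prints "none", that is the moment to prompt the user for a token rather than guessing.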
Authentication sanity check — confirm the token works before doing anything destructive:
kaggle datasets list --user <username> 2>&1 | head -5
A non-empty table = good. An auth error means re-prompt for the token.
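Interpreting that output can be sketched as follows (a hypothetical helper; it assumes auth failures surface as a 401/unauthorized message, which may vary by CLI version):

```shell
# Classify the sanity-check output: empty or 401/unauthorized means bad auth.
auth_ok() {
  out="$1"
  case "$out" in
    "" | *401* | *[Uu]nauthorized*) return 1 ;;
    *) return 0 ;;
  esac
}
```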
Look for dataset-metadata.json in the target directory.
Its id field tells you the slug (<owner>/<dataset-name>).

If updating, also confirm the slug exists on Kaggle:
mkdir -p /tmp/_kchk && kaggle datasets metadata -p /tmp/_kchk <owner>/<dataset-name> 2>&1
If that succeeds, the dataset is real and you're versioning. If it 404s, the slug doesn't exist yet — fall through to the "create new" path.
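That branch can be sketched as a small helper (a sketch only; decide_publish_mode is an invented name, and it assumes kaggle is on PATH):

```shell
# Print "version" if the slug already exists on Kaggle, else "create".
decide_publish_mode() {
  slug="$1"                       # e.g. <owner>/<dataset-name>
  tmp=$(mktemp -d)
  if kaggle datasets metadata -p "$tmp" "$slug" >/dev/null 2>&1; then
    mode="version"                # metadata fetched: dataset exists
  else
    mode="create"                 # 404/error: fall through to create
  fi
  rm -rf "$tmp"
  echo "$mode"
}
```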
Confirm the directory contains the files the user wants to publish, and only those. Kaggle uploads everything in the directory non-recursively by default; pass -r zip (or -r tar) if subdirectories are intentional. Anything you don't want published — drafts, secrets, large irrelevant files — must be excluded before the call.
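To double-check scope, a quick preview of what a default non-recursive publish would pick up (preview_upload is a hypothetical helper):

```shell
# List the top-level files a default (non-recursive) publish would upload;
# files inside subdirectories are skipped unless -r zip/-r tar is passed.
preview_upload() {
  find "$1" -maxdepth 1 -type f
}
```

Show this list to the user before the first publish if there is any doubt about what belongs in the dataset.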
Generate the metadata template inside the target directory:
kaggle datasets init -p <target-dir>
This writes dataset-metadata.json with placeholders.
Edit dataset-metadata.json:
- title — human-readable name (≤50 chars, must be unique within the user's account).
- id — <owner>/<dataset-slug>. The slug is lowercase, hyphenated, ≤50 chars. Must not collide with an existing dataset on the account.
- licenses — usually [{"name": "CC0-1.0"}] for fully open, or whatever the user specifies. Confirm before assuming.
- keywords — optional list, helps discovery.
- subtitle, description — optional but recommended.

Show the user the final metadata file before publishing — let them sanity-check the slug, title, and license.
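A filled-in example of the metadata file — the owner, slug, and every field value here are invented placeholders, so substitute the user's actual values:

```shell
# Write an example dataset-metadata.json (all values hypothetical).
cat > dataset-metadata.json <<'EOF'
{
  "title": "City Air Quality",
  "id": "exampleuser/city-air-quality",
  "licenses": [{"name": "CC0-1.0"}],
  "keywords": ["air quality", "environment"],
  "subtitle": "Hourly pollutant readings for several cities"
}
EOF
```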
Create the dataset:
kaggle datasets create -p <target-dir> # files at top level only
kaggle datasets create -p <target-dir> -r zip # zip the directory tree
kaggle datasets create -p <target-dir> -r tar # tar the directory tree
Use -r zip or -r tar only if there are subdirectories that need to be preserved. Otherwise omit -r.
Add -u (--public) to publish immediately as public. Without it the dataset is private; the user can flip visibility later in the Kaggle UI.
Confirm with the user that the dataset URL works: https://www.kaggle.com/datasets/<owner>/<dataset-slug>.
Ensure dataset-metadata.json is present in the target directory and the id field matches the live dataset slug. If the file is missing, fetch it:
kaggle datasets metadata -p <target-dir> <owner>/<dataset-name>
Refresh the directory contents to whatever should be in the new version. Same caveat: only what should be published belongs there.
Ask the user for a version-notes message. This is required and shows up in the dataset's version history. Examples: "Add deweathered series + cross-city panel", "Fix unit conversion in NO₂ column".
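The requirement can be enforced with a small guard before calling the CLI (push_version is a made-up wrapper, not a Kaggle command):

```shell
# Refuse to push a version without real version notes (the CLI requires -m).
push_version() {
  dir="$1"; notes="$2"
  if [ -z "$notes" ]; then
    echo "error: version notes are required" >&2
    return 1
  fi
  kaggle datasets version -p "$dir" -m "$notes"
}
```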
Push the new version:
kaggle datasets version -p <target-dir> -m "<version-notes>"
kaggle datasets version -p <target-dir> -m "<version-notes>" -r zip
Use -r zip/-r tar if the directory tree has meaningful subdirectories.
Note that -r is shorthand for --dir-mode, and its default (skip) ignores subdirectories entirely — it does not skip unchanged files. Every in-scope file is re-uploaded on each push.
Confirm the push by listing the account's datasets — the updated timestamp for the slug should reflect the new version:
kaggle datasets list --user <owner> 2>&1 | grep <dataset-slug>
Or open the dataset URL — Kaggle will show the new version under the "Data Explorer" → version dropdown after a short ingest delay (usually under a minute).
Gotchas:
- kaggle datasets create fails with a generic error if the slug already exists. Check with kaggle datasets metadata -p /tmp/_chk <owner>/<slug> before creating.
- Without -r zip or -r tar, files in subdirectories of the target dir are not uploaded. If a user reports "my data folder didn't make it", this is almost always why.
- Version pushes require -m. The CLI errors out without it. Don't pass an empty string — use a real description.
- title can be changed later via kaggle datasets metadata; the slug (id) is permanent — confirm with the user before creating.
- -u on create makes it public on first publish. To flip an existing dataset's visibility, the user has to do it in the Kaggle web UI — there's no CLI flag for it.

Once the publish or version-push succeeds, tell the user:
- the dataset URL (https://www.kaggle.com/datasets/<owner>/<dataset-slug>) and whether it is public or private, and
- anything that was deliberately excluded from the upload (e.g. "the directory contained a private/ subfolder which I excluded — let me know if you wanted that included").