From data-annotation
Set up a Hugging Face dataset repository — create the remote repo (asking public/private), copy prepared data over, generate a dataset card, and push. Uses the huggingface-cli, not an MCP. Use when the user says "set up a HF dataset", "publish to Hugging Face", "create the HF dataset repo", or after annotation/prep is complete.
Install via: npx claudepluginhub danielrosehill/claude-code-plugins --plugin data-annotation
This skill uses the workspace's default tool permissions.
End-to-end setup of an HF dataset repository from a prepared local dataset. Encompasses creation, data copy, dataset card, and push.
Prerequisites: huggingface-cli installed and logged in. Check with huggingface-cli whoami. If not logged in, instruct the user to run huggingface-cli login (don't try to do it programmatically).
Input: <workspace>/final/ from shape-dataset, optionally enriched with annotation outputs from scaffold-annotation-env.
License: mit, apache-2.0, cc-by-4.0, cc-by-sa-4.0, cc0-1.0, or other. Ask if unclear.
Create the remote repo:
huggingface-cli repo create <name> --type dataset [--private] [--organization <org>]
Use --private if the user chose private. Capture the resulting repo URL (https://huggingface.co/datasets/<owner>/<name>).
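The create-and-capture step above can be sketched as a small helper that assembles the CLI invocation and derives the expected repo URL. This is a sketch, not the skill's implementation; the helper names are illustrative:

```python
import subprocess


def repo_create_cmd(name, private=False, organization=None):
    """Build the huggingface-cli argv for creating a dataset repo."""
    cmd = ["huggingface-cli", "repo", "create", name, "--type", "dataset"]
    if private:
        cmd.append("--private")
    if organization:
        cmd += ["--organization", organization]
    return cmd


def dataset_url(owner, name):
    """URL the Hub assigns to a dataset repo."""
    return f"https://huggingface.co/datasets/{owner}/{name}"


# Example invocation (not run against the Hub here):
# subprocess.run(repo_create_cmd("my-dataset", private=True), check=True)
```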
If the workspace doesn't already have a git repo for the dataset, run init-dataset-repo first. Otherwise reuse it.
cd <dataset-repo>
git lfs install
huggingface-cli lfs-enable-largefiles .
git remote add origin https://huggingface.co/datasets/<owner>/<name>
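If you want to drive the three setup commands above from a script rather than by hand, one minimal sketch (command list only; assumes the repo directory already exists):

```python
import subprocess


def git_setup_cmds(owner, name):
    """Ordered commands to wire a local dataset repo to its HF remote."""
    remote = f"https://huggingface.co/datasets/{owner}/{name}"
    return [
        ["git", "lfs", "install"],
        ["huggingface-cli", "lfs-enable-largefiles", "."],
        ["git", "remote", "add", "origin", remote],
    ]


def run_in(repo_dir, cmds):
    """Run each command inside the dataset repo, stopping on first failure."""
    for cmd in cmds:
        subprocess.run(cmd, cwd=repo_dir, check=True)
```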
Configure .gitattributes for LFS on *.parquet, *.arrow, *.json, *.jsonl over a size threshold, and any media files.
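A sketch of generating the .gitattributes rules the step describes. Note that .gitattributes patterns are path-based, so a size threshold can't be expressed there directly; handling small JSON files outside LFS needs selective tracking or a pre-commit check. The pattern list, especially the media extensions, is an assumption to adapt per dataset:

```python
# Illustrative pattern list; media extensions are assumptions, adjust per dataset.
LFS_PATTERNS = ["*.parquet", "*.arrow", "*.json", "*.jsonl",
                "*.png", "*.jpg", "*.wav", "*.mp4"]


def gitattributes_lines(patterns=LFS_PATTERNS):
    """One LFS tracking rule per pattern, in git's .gitattributes syntax."""
    return [f"{p} filter=lfs diff=lfs merge=lfs -text" for p in patterns]


def write_gitattributes(path, patterns=LFS_PATTERNS):
    """Write the rules to the repo's .gitattributes file."""
    with open(path, "w") as f:
        f.write("\n".join(gitattributes_lines(patterns)) + "\n")
```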
Lay out the dataset on disk in the conventional HF structure:
<dataset-repo>/
├── README.md # the dataset card (next step)
├── data/
│ ├── train.parquet # or .jsonl
│ ├── validation.parquet
│ └── test.parquet
└── LICENSE
If the prepared data is in a different format/layout, convert it. Prefer Parquet for tabular, JSONL for variable-shape records.
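The Parquet-vs-JSONL preference above hinges on whether the records share one flat shape. A stdlib-only sketch of that check; the actual conversion would go through pyarrow or pandas, which is not shown here:

```python
import json


def records_are_tabular(jsonl_path, sample=1000):
    """True when every sampled record is flat and shares one key set,
    i.e. the file is a good candidate for Parquet; otherwise keep JSONL."""
    keys = None
    with open(jsonl_path) as f:
        for i, line in enumerate(f):
            if i >= sample:
                break
            rec = json.loads(line)
            if any(isinstance(v, (dict, list)) for v in rec.values()):
                return False  # nested values: variable-shape, keep JSONL
            if keys is None:
                keys = set(rec)
            elif set(rec) != keys:
                return False  # ragged schema across records: keep JSONL
    return True
```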
Write README.md with the YAML frontmatter HF expects:
---
license: <license>
task_categories:
- <task>
language:
- en
size_categories:
- <auto>
pretty_name: <Pretty Name>
tags:
- <tag>
---
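The <auto> placeholder under size_categories can be filled from the actual row count. The bucket labels below follow the Hub's published convention as I understand it; verify against the current Hub metadata docs before relying on them:

```python
def size_category(n_rows):
    """Map a row count to a Hub size_categories label (assumed convention)."""
    buckets = [
        (1_000, "n<1K"),
        (10_000, "1K<n<10K"),
        (100_000, "10K<n<100K"),
        (1_000_000, "100K<n<1M"),
        (10_000_000, "1M<n<10M"),
        (100_000_000, "10M<n<100M"),
        (1_000_000_000, "100M<n<1B"),
    ]
    for upper, label in buckets:
        if n_rows < upper:
            return label
    return "n>1B"
```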
Below the frontmatter, generate sections from the workspace artifacts:
Schema: schema.json, columns, splits, sizes (read from the actual files).
PII: results of the pii-scanner run, if any.
Anything the workspace doesn't have an answer for should be a clearly-marked <!-- TODO --> rather than a fabricated detail.
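The no-fabrication rule above can be sketched as a renderer that emits sections only from facts the workspace actually has, and marks everything else. The section names here are illustrative, not prescribed by the skill:

```python
def card_sections(facts):
    """Render card sections; any missing answer becomes a <!-- TODO --> marker
    instead of invented text."""
    wanted = ["Dataset Description", "Schema", "Splits", "PII Notes"]
    out = []
    for section in wanted:
        body = facts.get(section) or "<!-- TODO -->"
        out.append(f"## {section}\n\n{body}\n")
    return "\n".join(out)
```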
git add -A
git commit -m "Initial dataset upload"
git push origin main
Report back the dataset URL and remind the user that the card preview may take a minute to render on the Hub.