End-to-end data annotation toolkit. Prep raw data, design annotation schemas, annotate interactively with Claude (small scale) or scaffold Gemini batch inference (large scale), and publish to Hugging Face.
npx claudepluginhub danielrosehill/claude-code-plugins --plugin data-annotation

Recommend which columns to keep, drop, rename, or transform given the dataset profile and the target task. Produces a curation plan; applies it only on instruction.
Profile a staged dataset — file inventory, format detection, schema inference, row/record counts, null and duplicate stats, encoding issues, and representative samples. Read-only. Invoked by `shape-dataset` early in the pipeline.
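At its core, this kind of profiling pass boils down to per-column null counts, duplicate detection, and row totals. A minimal sketch in plain Python of what such a pass computes (illustrative only; `profile_rows` and the sample data are not part of the plugin, whose actual profiler logic is internal):

```python
import csv
import io
from collections import Counter

def profile_rows(rows):
    """Summarize a list of dict records: per-column null counts
    and exact-duplicate rows. Simplified illustration of a
    profiling pass, not the plugin's implementation."""
    nulls = Counter()
    seen, dups = set(), 0
    for row in rows:
        for col, val in row.items():
            if val is None or val == "":
                nulls[col] += 1
        key = tuple(sorted(row.items()))  # hashable row fingerprint
        if key in seen:
            dups += 1
        seen.add(key)
    return {"rows": len(rows), "nulls": dict(nulls), "duplicates": dups}

raw = "name,email\nAda,ada@example.com\nAda,ada@example.com\nBob,\n"
rows = list(csv.DictReader(io.StringIO(raw)))
print(profile_rows(rows))  # → {'rows': 3, 'nulls': {'email': 1}, 'duplicates': 1}
```

The real subagent additionally handles format detection, schema inference, and encoding issues, which this sketch omits.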
Execute format conversions, encoding fixes, JSON flattening, reshaping into one-record-per-task, and train/val/test splits. Operates on approved plans from the orchestrator; writes outputs to a new stage directory.
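Two of those steps, JSON flattening and train/val/test splitting, can be sketched in a few lines of plain Python. The helpers below (`flatten`, `split`) are illustrative stand-ins, not the plugin's own code:

```python
import random

def flatten(record, prefix=""):
    """Flatten nested dicts into dot-separated keys,
    e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat = {}
    for key, val in record.items():
        name = f"{prefix}{key}"
        if isinstance(val, dict):
            flat.update(flatten(val, prefix=f"{name}."))
        else:
            flat[name] = val
    return flat

def split(records, val_frac=0.1, test_frac=0.1, seed=42):
    """Deterministic shuffled train/val/test split."""
    records = records[:]
    random.Random(seed).shuffle(records)
    n = len(records)
    n_val, n_test = int(n * val_frac), int(n * test_frac)
    return (records[n_val + n_test:], records[:n_val],
            records[n_val:n_val + n_test])

print(flatten({"user": {"id": 7, "name": "Ada"}, "text": "hi"}))
# → {'user.id': 7, 'user.name': 'Ada', 'text': 'hi'}
train, val, test = split(list(range(100)))
print(len(train), len(val), len(test))  # → 80 10 10
```

Fixing the seed keeps splits reproducible across reruns, which matters when a stage directory is regenerated.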
Scan a dataset for personally identifiable information and propose a redaction strategy per column. Detects emails, phone numbers, names, addresses, government IDs, IP addresses, credit cards, and free-text leaks. Produces a redaction plan; only redacts when explicitly told to.
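The detection side of such a scan can be approximated with per-entity regex patterns. A deliberately simplified sketch (the plugin's requirements list presidio-analyzer, which covers far more entity types with contextual scoring; `PATTERNS` and `scan_column` here are illustrative only):

```python
import re

# Toy patterns for three PII types; real scanners use much
# richer detection (e.g. presidio-analyzer) than these regexes.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def scan_column(values):
    """Return counts of PII pattern hits in a column of strings."""
    hits = {name: 0 for name in PATTERNS}
    for value in values:
        for name, pattern in PATTERNS.items():
            if pattern.search(value):
                hits[name] += 1
    return hits

cells = ["reach me at ada@example.com", "server 10.0.0.1", "no pii here"]
print(scan_column(cells))  # → {'email': 1, 'phone': 0, 'ipv4': 1}
```

Per-column hit counts are what make a per-column redaction strategy possible: a column that is 95% email addresses gets a different treatment than free text with an occasional leak.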
Audit a set of completed annotations — schema validation, label distribution, sampled spot-checks, agreement analysis if multiple passes exist, and a list of records flagged for human review. Produces an audit report; does not modify annotations.
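For the agreement analysis, a standard choice when two annotation passes exist is Cohen's kappa: observed agreement corrected for the agreement expected by chance. A self-contained sketch (the audit's exact metrics are not specified by the source; this shows the general technique):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotation passes
    over the same records. Returns 1.0 for perfect agreement,
    ~0.0 for chance-level agreement."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    labels = set(labels_a) | set(labels_b)
    expected = sum((freq_a[l] / n) * (freq_b[l] / n) for l in labels)
    return (observed - expected) / (1 - expected)

a = ["pos", "pos", "neg", "neg", "pos", "neg"]
b = ["pos", "neg", "neg", "neg", "pos", "pos"]
print(round(cohens_kappa(a, b), 3))  # → 0.333
```

Low kappa on a label subset is a natural trigger for flagging those records for human review.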
Propose an annotation schema (label set, field definitions, guidelines, edge cases) for a dataset given a target task and a profile. Produces schema.json plus human-readable guidelines. Iterates with the user before the schema is locked.
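To make the shape concrete, here is a hypothetical schema for a sentiment task and a minimal validation of one annotation against it. The `SCHEMA` layout and `validate` helper are illustrative assumptions, not the plugin's actual schema.json format (the skill scripts reportedly use the jsonschema library for real validation):

```python
import json

# Hypothetical schema.json for a sentiment task; the real
# schema-designer output format is defined by the plugin.
SCHEMA = json.loads("""
{
  "task": "sentiment",
  "fields": {
    "label": {"type": "string",
              "choices": ["positive", "negative", "neutral"]},
    "confidence": {"type": "number"}
  }
}
""")

def validate(record, schema):
    """Minimal per-field check of one annotation against a schema."""
    errors = []
    for name, spec in schema["fields"].items():
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        if spec["type"] == "string" and not isinstance(value, str):
            errors.append(f"{name}: expected string")
        if spec["type"] == "number" and not isinstance(value, (int, float)):
            errors.append(f"{name}: expected number")
        if "choices" in spec and value not in spec["choices"]:
            errors.append(f"{name}: {value!r} not in choices")
    return errors

print(validate({"label": "positive", "confidence": 0.9}, SCHEMA))  # → []
print(validate({"label": "meh"}, SCHEMA))  # two errors
```

Locking the schema before annotation starts is what makes this kind of mechanical validation possible downstream.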
Annotate a small dataset directly inside the Claude Code session — Claude reads each record, applies the locked schema, and writes structured annotations to disk. For datasets where standing up Label Studio or a Gemini batch job would be overkill (typically tens to low hundreds of records). Use when the user says "annotate this dataset", "label these records with Claude", "do the annotations yourself", or "small-scale annotation".
Set up a Hugging Face dataset repository — create the remote repo (asking public/private), copy prepared data over, generate a dataset card, and push. Uses the huggingface-cli, not an MCP. Use when the user says "set up a HF dataset", "publish to Hugging Face", "create the HF dataset repo", or after annotation/prep is complete.
Stage raw data from a GitHub repo, local path, or remote URL into a working directory so downstream prep skills can operate on it. Use when the user provides a data source and wants it pulled in, or when `shape-dataset` needs to ingest before profiling.
Initialize a local git repository for a dataset, with the conventional Hugging Face layout, LFS configuration, license, and a stub dataset card. Use when the user wants to start a new dataset repo, or as a step inside `hf-setup`.
Create a self-contained annotation workspace for a prepared dataset, including a locked annotation schema and ready-to-run boilerplate for batch inference via the Gemini API. Use when the user says "set up annotation", "create annotation environment", "scaffold annotation", or after `shape-dataset` has produced a clean reshaped dataset.
Main entry point for turning raw data into a dataset. Examines a source (GitHub repo, local path, URL), works with the user to define the target task, then plans and executes a prep pipeline — profiling, PII handling, column curation, format normalization, splits, and annotation schema design. Delegates specialized work to subagents. Use when the user says "prep this data", "turn this into a dataset", "get this ready for annotation", or "build a dataset from X".
End-to-end toolkit for turning raw data into an annotated dataset and publishing it to Hugging Face. Covers the full lifecycle: ingest, profile, clean (PII, columns, format), design an annotation schema, annotate (interactively with Claude or via Gemini batch inference), review, and publish.
The plugin is built around a small set of orchestrator skills that delegate to specialized subagents — instead of one micro-skill per micro-operation, the orchestrator looks at the data, talks to the user, and figures out which steps are needed.
- `shape-dataset` — top-level prep workflow. Ingests a source, profiles it, proposes a prep plan (PII, columns, format, splits, schema), executes approved steps. Hands off to annotation or publication.
- `annotate-with-claude` — Claude annotates a small dataset interactively in-session against a locked schema. For runs where Label Studio or batch inference would be overkill (typically tens to low hundreds of records).
- `scaffold-annotation-env` — generates a workspace for large-scale annotation via the Gemini batch inference API, with Python boilerplate (run, poll, validate).
- `hf-setup` — creates a Hugging Face dataset repo (asks public/private), copies the prepared data over, generates the dataset card, pushes via huggingface-cli.
- `ingest-source` — stages raw data from GitHub, a local path, or a remote URL into a known working directory.
- `init-dataset-repo` — initializes a local git repo with the conventional HF dataset layout, LFS rules, license, and card stub.

The orchestrator skills delegate to these — they are not invoked directly by the user.
- `data-profiler` — schema inference, stats, encoding, samples; flags concerns.
- `pii-scanner` — detects PII (direct and quasi), proposes a redaction strategy per column; can apply on approval.
- `column-curator` — recommends keep/drop/rename/recast/derive per column for the target task.
- `schema-designer` — proposes annotation schema and guidelines from data + task; iterates with the user before locking.
- `format-normalizer` — executes format conversions, encoding fixes, JSON flattening, reshape-for-annotation, and splits.
- `review-annotations` — audits finished annotations: schema validation, label distribution, sampled spot-checks, agreement analysis.

```
ingest-source
      ↓
shape-dataset ──→ data-profiler, pii-scanner, column-curator,
      ↓           format-normalizer, schema-designer
annotate-with-claude (small) OR scaffold-annotation-env (large)
      ↓
review-annotations
      ↓
hf-setup
```
claude plugins marketplace update danielrosehill
claude plugins install data-annotation@danielrosehill
After installing, restart Claude Code.
- `huggingface-cli` (logged in) for `hf-setup`.
- `GEMINI_API_KEY` in the annotation workspace `.env` for `scaffold-annotation-env`.
- `pandas`, `pyarrow`, `presidio-analyzer`, `jsonschema` (skill scripts install via uv on first run).

Working data lives under `${CLAUDE_USER_DATA:-${XDG_DATA_HOME:-$HOME/.local/share}/claude-plugins}/data-annotation/`. Dataset repos live wherever the user chooses (e.g. `~/repos/` or `~/Documents/`); only a pointer is stored under `$CLAUDE_USER_DATA`.

License: MIT.