From huggingface-skills
Read-only Hugging Face Dataset Viewer API workflows: validate datasets, list splits, preview rows, paginate, search text, filter, get parquet URLs, size, and statistics.
Slash command: /huggingface-skills:huggingface-datasets
Use this skill to execute read-only Dataset Viewer API calls for dataset exploration and extraction.
Core workflow:
Validate the dataset with /is-valid.
Resolve config + split with /splits.
Preview rows with /first-rows.
Paginate with /rows using offset and length (max 100).
Use /search for text matching and /filter for row predicates.
Get parquet URLs via /parquet and totals/metadata via /size and /statistics.

Defaults:
Base URL: https://datasets-server.huggingface.co
Method: GET
offset is 0-based.
length max is usually 100 for row-like endpoints.
Gated or private datasets need the header Authorization: Bearer <HF_TOKEN>.

Endpoints:
Validate dataset: /is-valid?dataset=<namespace/repo>
List subsets and splits: /splits?dataset=<namespace/repo>
Preview first rows: /first-rows?dataset=<namespace/repo>&config=<config>&split=<split>
Paginate rows: /rows?dataset=<namespace/repo>&config=<config>&split=<split>&offset=<int>&length=<int>
Search text: /search?dataset=<namespace/repo>&config=<config>&split=<split>&query=<text>&offset=<int>&length=<int>
Filter with predicates: /filter?dataset=<namespace/repo>&config=<config>&split=<split>&where=<predicate>&orderby=<sort>&offset=<int>&length=<int>
List parquet shards: /parquet?dataset=<namespace/repo>
Get size totals: /size?dataset=<namespace/repo>
Get column statistics: /statistics?dataset=<namespace/repo>&config=<config>&split=<split>
Get Croissant metadata (if available): /croissant?dataset=<namespace/repo>

Pagination pattern:
curl "https://datasets-server.huggingface.co/rows?dataset=stanfordnlp/imdb&config=plain_text&split=train&offset=0&length=100"
curl "https://datasets-server.huggingface.co/rows?dataset=stanfordnlp/imdb&config=plain_text&split=train&offset=100&length=100"
When pagination is partial, use response fields such as num_rows_total, num_rows_per_page, and partial to drive continuation logic.
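The continuation logic can be sketched in Python as a generator that advances offset by the number of rows actually returned and stops once num_rows_total is reached. This is a minimal sketch, not a reference client: the helper names (rows_url, get_json, iter_rows) are hypothetical, and the "rows" key in the response body is an assumption about the payload shape.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

BASE_URL = "https://datasets-server.huggingface.co"
PAGE_SIZE = 100  # usual per-request cap for row-like endpoints


def rows_url(dataset: str, config: str, split: str,
             offset: int, length: int = PAGE_SIZE) -> str:
    """Build a /rows URL; urlencode handles the query-string escaping."""
    params = {"dataset": dataset, "config": config, "split": split,
              "offset": offset, "length": length}
    return f"{BASE_URL}/rows?{urlencode(params)}"


def get_json(url: str) -> dict:
    """Default fetcher: a plain GET that decodes the JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_rows(dataset: str, config: str, split: str,
              fetch: Callable[[str], dict] = get_json,
              limit: Optional[int] = None) -> Iterator[dict]:
    """Yield row entries page by page until the split (or limit) is exhausted."""
    offset = 0
    yielded = 0
    while limit is None or yielded < limit:
        page = fetch(rows_url(dataset, config, split, offset))
        rows = page.get("rows", [])  # assumed key holding the page's row entries
        if not rows:
            return  # empty page: nothing left to read
        for entry in rows:
            yield entry
            yielded += 1
            if limit is not None and yielded >= limit:
                return
        offset += len(rows)  # advance by what was actually returned
        total = page.get("num_rows_total")
        if total is not None and offset >= total:
            return
```

Passing fetch as a parameter keeps the loop independent of the HTTP layer, so a token-aware fetcher can be swapped in for gated or private datasets. For example, iter_rows("stanfordnlp/imdb", "plain_text", "train", limit=250) would issue three /rows requests.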
Search/filter notes:
/search matches string columns (full-text style behavior is internal to the API).
/filter requires predicate syntax in where and optional sort in orderby.
For CLI-based parquet URL discovery or SQL, use the hf-cli skill with hf datasets parquet and hf datasets sql.
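Because where and orderby values contain quotes, operators, and spaces, they need URL encoding before they go into the query string. The sketch below builds /search and /filter URLs and an optionally authenticated GET request. The function names are hypothetical, and the predicate in the usage note is only an illustrative guess at the syntax (column name in double quotes); check the API documentation for the exact grammar.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://datasets-server.huggingface.co"


def search_url(dataset: str, config: str, split: str, query: str,
               offset: int = 0, length: int = 100) -> str:
    """Build a /search URL with the free-text query safely encoded."""
    params = {"dataset": dataset, "config": config, "split": split,
              "query": query, "offset": offset, "length": length}
    return f"{BASE_URL}/search?{urlencode(params)}"


def filter_url(dataset: str, config: str, split: str, where: str,
               orderby: Optional[str] = None,
               offset: int = 0, length: int = 100) -> str:
    """Build a /filter URL; orderby is only sent when provided."""
    params = {"dataset": dataset, "config": config, "split": split, "where": where}
    if orderby:
        params["orderby"] = orderby
    params["offset"] = offset
    params["length"] = length
    return f"{BASE_URL}/filter?{urlencode(params)}"


def build_request(url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Create a GET request, adding the Bearer header for gated/private datasets."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return urllib.request.Request(url, headers=headers, method="GET")
```

A call such as filter_url("stanfordnlp/imdb", "plain_text", "train", where='"label"=1') yields a URL with no raw quotes or spaces, and build_request(url, token) can then be passed to urllib.request.urlopen.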
To create a dataset repo and upload parquet files, use one of these flows depending on dependency constraints.
Zero local dependencies (Hub UI):
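Create the dataset repo at https://huggingface.co/new-dataset, upload the parquet files in the browser, then confirm the shards are listed:
curl -s "https://datasets-server.huggingface.co/parquet?dataset=<namespace>/<repo>"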
Low dependency CLI flow (npx @huggingface/hub / hfjs):
export HF_TOKEN=<your_hf_token>
npx -y @huggingface/hub upload datasets/<namespace>/<repo> ./local/parquet-folder data
npx -y @huggingface/hub upload datasets/<namespace>/<repo> ./local/parquet-folder data --private
After upload, call /parquet to discover <config>/<split>/<shard> values for querying with @~parquet.
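As a rough illustration of that discovery step, the following sketch groups the shards in a /parquet response by config and split and formats an @~parquet path for each. It assumes the response carries a parquet_files list whose entries have config, split, and filename fields, and that the hf:// prefix is the form your query tool expects. Both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_parquet_shards(payload: dict) -> Dict[Tuple[str, str], List[str]]:
    """Map (config, split) to the shard filenames listed in a /parquet response."""
    shards = defaultdict(list)
    # "parquet_files" and its entry fields are assumptions about the payload shape.
    for entry in payload.get("parquet_files", []):
        shards[(entry["config"], entry["split"])].append(entry["filename"])
    return dict(shards)


def parquet_ref_path(dataset: str, config: str, split: str, shard: str) -> str:
    """Format a path into the auto-converted parquet ref via the @~parquet shorthand."""
    return f"hf://datasets/{dataset}@~parquet/{config}/{split}/{shard}"
```

With the decoded JSON from /parquet in payload, group_parquet_shards(payload) gives the <config>/<split>/<shard> combinations, and parquet_ref_path turns each one into a queryable path.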
The Hub supports raw agent session traces from Claude Code, Codex, and Pi Agent. Upload them to Hugging Face Datasets as original JSONL files and the Hub can auto-detect the trace format, tag the dataset as Traces, and enable the trace viewer for browsing sessions, turns, tool calls, and model responses. Common local session directories:
Claude Code: ~/.claude/projects
Codex: ~/.codex/sessions
Pi Agent: ~/.pi/agent/sessions
Default to private dataset repos because traces can contain prompts, file paths, tool outputs, secrets, or PII. Preserve the raw .jsonl files and nest them by project/cwd instead of uploading every session at the dataset root.
hf repos create <namespace>/<repo> --type dataset --private --exist-ok
hf upload <namespace>/<repo> ~/.codex/sessions codex/<project-or-cwd> --type dataset
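Before uploading, it can help to preview where each trace file would land in the repo. The sketch below is one possible way to plan that layout: it pairs every raw .jsonl file under a session directory with a destination nested under an agent and project prefix. The function name and the agent/project arguments are hypothetical, and it performs no upload itself.

```python
from pathlib import Path, PurePosixPath
from typing import List, Tuple


def plan_trace_uploads(sessions_dir: Path, agent: str, project: str) -> List[Tuple[Path, str]]:
    """Pair each raw .jsonl file with a nested destination path in the dataset repo."""
    plan = []
    for local in sorted(sessions_dir.rglob("*.jsonl")):
        # Keep the original relative layout below <agent>/<project>/ so nothing
        # ends up at the dataset root and the raw files stay unmodified.
        relative = local.relative_to(sessions_dir)
        destination = PurePosixPath(agent) / project / PurePosixPath(*relative.parts)
        plan.append((local, str(destination)))
    return plan
```

For example, plan_trace_uploads(Path.home() / ".codex" / "sessions", "codex", "my-project") lists the local files next to their codex/my-project/... destinations, which can be reviewed for secrets or PII before running the upload command.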