Help us improve
Share bugs, ideas, or general feedback.
From training
Generate Datasheet, Model Card, and Data Statement from a dataset manifest
npx claudepluginhub jmagly/aiwg-trainingHow this skill is triggered — by the user, by Claude, or both
Slash command
/training:dataset-docsThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Generate standards-compliant dataset documentation — Datasheet (Gebru et al. 2021), Model Card (Mitchell et al. 2019), and Data Statement (Bender & Friedman 2018) — by auto-populating templates from a dataset manifest and related AIWG training artifacts.
Generates Model Cards per Mitchell et al. and HuggingFace standards, covering intended use, limitations, training data provenance, ethical considerations, and regulatory alignment (EU AI Act, NIST AI RMF).
Generates standardized model cards in HuggingFace and NVIDIA Model Card++ formats for ML models, covering details, intended uses, training data, metrics, limitations, and ethics. Use when preparing models for deployment or handoff.
Create, configure, and update datasets on Hugging Face Hub with SQL-based querying, streaming row updates, and multi-format template support.
Share bugs, ideas, or general feedback.
Generate standards-compliant dataset documentation — Datasheet (Gebru et al. 2021), Model Card (Mitchell et al. 2019), and Data Statement (Bender & Friedman 2018) — by auto-populating templates from a dataset manifest and related AIWG training artifacts.
Invoke this skill after dataset-version has produced a finalized manifest and the downstream artifacts (quality report, license ledger, decontamination report, provenance record) exist in .aiwg/training/. The skill produces the compliance documentation bundle required by ADR-022 D9 before a dataset is released or used to train a published model.
Typical trigger points:
| Parameter | Required | Default | Description |
|---|---|---|---|
manifest-path | yes | — | Path to the dataset manifest YAML (e.g., .aiwg/training/datasets/v1.2.0-manifest.yaml). |
--type | no | all | One of datasheet, model-card, data-statement, or all. |
--interactive | no | false | If set, prompt the operator to fill <!-- HUMAN FILL --> fields inline; otherwise leave markers for later review. |
manifest-path. Validate required top-level fields (dataset_name, version, modality, instance_count, license_id). Fail fast with a clear error if the manifest is malformed..aiwg/training/ gather the quality report, license ledger, decontamination report, and W3C PROV provenance record keyed by {{version}}. Record missing artifacts as warnings — do not hard-fail, but flag the affected template fields as unknown.--type, load one or more of:
templates/datasheet-for-datasets.mdtemplates/model-card.mdtemplates/data-statement.md{{field_name}} placeholders. Substitute values from the manifest and related artifacts. Target ≥60% of fields auto-filled on a typical well-instrumented dataset (per REF-451 feasibility study). Unresolved placeholders are replaced with UNKNOWN — see manifest rather than left literal.--interactive is set, prompt the operator once per <!-- HUMAN FILL --> marker using the platform-native UX tool (see native-ux-tools rule); otherwise leave the markers in place for downstream editorial review..aiwg/training/datasets/<version>-{datasheet,model-card,data-statement}.md. Update manifest.yaml with documentation: block pointing at the generated files. Append an entry to .aiwg/activity.log per the activity-log rule.The skill aims for the ≥60% auto-fill rate documented in REF-451 as the threshold at which datasheets become practical to maintain. Fields mapped from the manifest include dataset identity, composition counts, splits, source URLs, license, collection window, preprocessing pipeline references, IRB identifiers, retention policy, and provenance links. Fields requiring human judgment (bias analysis, intended users, ethical considerations, out-of-scope uses) remain explicit <!-- HUMAN FILL --> markers.
Generated datasheets validate against the HuggingFace dataset card schema (YAML frontmatter fields: license, language, task_categories, size_categories, pretty_name). The skill emits a post-write validation report listing any HuggingFace fields that could not be derived from the manifest so the operator can decide whether to source them manually before upload.
.aiwg/training/datasets/<version>-datasheet.md.aiwg/training/datasets/<version>-model-card.md.aiwg/training/datasets/<version>-data-statement.md.aiwg/training/datasets/<version>-manifest.yaml documentation block.aiwg/activity.logdataset-version (produces manifest), provenance-create (produces PROV record), grade-on-ingest (produces quality report).