Generates standardized model cards in HuggingFace and NVIDIA Model Card++ formats for ML models, covering details, intended uses, training data, metrics, limitations, and ethics. Use when preparing models for deployment or handoff.
Install with `npx claudepluginhub andikarachman/data-science-plugin --plugin ds`. This skill uses the workspace's default tool permissions.
Generate a standardized model card that documents a trained ML model's purpose, performance, limitations, and ethical considerations. Based on HuggingFace Model Card format and NVIDIA Model Card++ extensions.
| Field | Description |
|---|---|
| Name | Human-readable model name |
| Version | Model version (e.g., v1.0.0) |
| Type | Algorithm family (e.g., gradient boosting, neural network, linear regression) |
| Framework | Library used (scikit-learn, statsmodels, aeon, xgboost, etc.) |
| Task | What the model does (classification, regression, forecasting, anomaly detection, etc.) |
| Date trained | When the model was last trained |
| Author | Who developed the model |
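A minimal sketch of capturing these fields programmatically before rendering the card (the values below are placeholders, not a required schema):

```python
# Placeholder values; the keys mirror the fields in the table above.
model_details = {
    "name": "Churn Risk Classifier",
    "version": "v1.0.0",
    "type": "gradient boosting",
    "framework": "xgboost",
    "task": "binary classification",
    "date_trained": "2024-05-01",
    "author": "Data Science Team",
}
```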
Document the model's intended use case clearly:
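One common way to structure this, loosely following the HuggingFace convention of separating direct use from out-of-scope use (the wording below is placeholder text, not project guidance):

```python
# Placeholder text; replace with the model's actual intended and out-of-scope uses.
intended_use = {
    "primary_use": "Score active customers for churn risk on a weekly batch cadence.",
    "intended_users": "Retention analysts and CRM automation.",
    "out_of_scope": "Credit or lending decisions; real-time scoring of brand-new customers.",
}
```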
| Field | Description |
|---|---|
| Source | Where the training data comes from |
| Date range | Time period of training data |
| Size | Number of samples and features |
| Data hash | SHA-256 hash for version tracking |
| Preprocessing | Key transformations applied |
| Known biases | Any known biases in the training data |
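The data hash is easy to automate. A sketch using the standard library, with an assumed file path and placeholder field values:

```python
import hashlib
from pathlib import Path

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large training sets do not need to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

training_data = {
    "source": "data/train.parquet",             # assumed path for illustration
    "date_range": "2022-01-01 to 2024-04-30",   # placeholder
    "size": "120,000 samples x 42 features",    # placeholder
    "data_hash": sha256_of_file("data/train.parquet"),
    "preprocessing": "median imputation, standard scaling",
    "known_biases": "under-represents customers acquired after 2023-10",
}
```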
| Field | Description |
|---|---|
| Source | Same or different from training? |
| Date range | Time period of evaluation data |
| Size | Number of samples |
| Split strategy | How train/eval was split |
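Recording the split strategy next to the numbers keeps the evaluation reproducible. A sketch assuming `X` and `y` are already loaded; for forecasting tasks, a chronological split is usually more appropriate than a random one:

```python
from sklearn.model_selection import train_test_split

# Sketch: X and y are assumed to be the loaded feature matrix and labels.
X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

evaluation_data = {
    "source": "held-out split of the training source",
    "size": len(X_eval),
    "split_strategy": "stratified 80/20 random split, random_state=42",
}
```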
Report performance metrics with context:
| Metric | Value | Baseline | Improvement | Confidence Interval |
|---|---|---|---|---|
| [Primary] | | | | |
| [Secondary] | | | | |
Include: a baseline comparison, confidence intervals, and metrics broken out for meaningful subgroups.
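A sketch of how the Baseline and Confidence Interval columns might be produced, using a majority-class baseline and a bootstrap interval (assumes the model and the split from the previous steps):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Assumes model, X_train/y_train, X_eval/y_eval exist from earlier steps.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
baseline_acc = accuracy_score(y_eval, baseline.predict(X_eval))
model_acc = accuracy_score(y_eval, model.predict(X_eval))

# Bootstrap the evaluation set for a rough 95% confidence interval.
rng = np.random.default_rng(42)
y_true, y_pred = np.asarray(y_eval), model.predict(X_eval)
boot = []
for _ in range(1000):
    idx = rng.integers(0, len(y_true), len(y_true))
    boot.append(accuracy_score(y_true[idx], y_pred[idx]))
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
```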
Document known limitations honestly:
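To make a limitation specific, measure performance on the affected slice. A sketch for the missing-values case, assuming the evaluation arrays from earlier and a model that tolerates missing inputs (e.g., gradient boosting):

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Quantify the limitation instead of stating it vaguely.
X_arr, y_arr = np.asarray(X_eval, dtype=float), np.asarray(y_eval)
missing_frac = np.isnan(X_arr).mean(axis=1)   # fraction of missing features per row
high_missing = missing_frac > 0.5

overall_acc = accuracy_score(y_arr, model.predict(X_eval))
if high_missing.any():
    slice_acc = accuracy_score(y_arr[high_missing], model.predict(X_arr[high_missing]))
    # Report both numbers, e.g. "accuracy drops from 0.91 overall to 0.76
    # on rows with more than 50% missing features."
```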
Provide concrete usage examples:
```python
# Example: Loading and using the model
import joblib

model = joblib.load("path/to/model.pkl")
# X_new must match the feature layout used during training.
predictions = model.predict(X_new)
```
Include:
Before shipping, verify:
| Mistake | Impact | Fix |
|---|---|---|
| Vague limitations ("may not work for all data") | Users can't assess risk | Be specific: "Accuracy drops 15% on samples with >50% missing values" |
| Missing subgroup metrics | Hides fairness issues | Report metrics for all meaningful slices |
| No baseline comparison | Can't assess model value | Always include baseline performance |
| Outdated training data dates | Users assume data is fresh | Include data recency and staleness risk |
| Missing dependency versions | Can't recreate environment | Pin exact versions in requirements |
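For the last row in particular, exact versions can be captured at card-generation time with the standard library (the package names below are examples):

```python
import platform
from importlib.metadata import version

# Example packages only; list whatever the model actually depends on.
packages = ["scikit-learn", "joblib", "numpy"]
pins = [f"{pkg}=={version(pkg)}" for pkg in packages]

environment = {
    "python": platform.python_version(),
    "requirements": pins,   # e.g. ["scikit-learn==1.4.2", "joblib==1.4.0", ...]
}
```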