From grimoire
Measures ML model performance across demographic groups to detect discriminatory outcomes. Required for regulatory compliance (EU AI Act, CFPB, EEOC) and ethical AI deployment.
npx claudepluginhub jeffreytse/grimoire --plugin grimoire
How this skill is triggered — by the user, by Claude, or both
Slash command
/grimoire:audit-model-fairness
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Systematically measure and document a model's performance across demographic groups to identify discriminatory outcomes before or after deployment.
Adopted by: Required by the EU AI Act (2024) for high-risk AI systems; CFPB fair lending requirements; EEOC guidelines for employment AI; NIST AI RMF, adopted by US federal agencies
Impact: Biased models create legal liability (CFPB fines up to $1M/day for fair lending violations); Amazon scrapped an AI hiring tool after discovering gender bias; proactive audits prevent reputational and regulatory harm
Why best: Models trained on historical data encode historical discrimination; without measurement, unfairness is invisible until harm occurs
Sources: NIST AI RMF 1.0 (2023); Barocas, Hardt & Narayanan "Fairness and Machine Learning" (2019); IEEE Ethically Aligned Design v2 (2019)
Define the protected attributes — Identify legally and ethically relevant attributes for your context: race, gender, age, disability status, national origin, religion, sexual orientation. Determine which you can directly measure and which must be inferred from proxies. Document legal basis and jurisdiction.
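Select fairness metrics — Choose metrics appropriate to the decision context. Demographic parity: equal positive prediction rates across groups (appropriate for representation goals). Equalized odds: equal true positive rate (TPR) and false positive rate (FPR) across groups (appropriate when the costs of false positives and false negatives both matter, as in most classification decisions). Calibration: a given predicted score corresponds to the same observed outcome rate in every group (appropriate for risk scoring). When base rates differ across groups, these criteria generally cannot all hold at once (the fairness impossibility results), so choose based on the type of harm you most need to prevent.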
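As a rough illustration of the first two metrics, the sketch below computes per-group selection rate, TPR, and FPR for binary predictions and reports the largest gap between groups. It uses only the standard library; the function names (`group_rates`, `gap`) and the toy data are made up for this example and do not come from Fairlearn or AIF360.

```python
from collections import defaultdict

def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, TPR, and FPR for binary (0/1) labels and predictions."""
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0, "tp": 0, "neg": 0, "fp": 0})
    for yt, yp, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["pred_pos"] += yp
        if yt == 1:
            c["pos"] += 1
            c["tp"] += yp
        else:
            c["neg"] += 1
            c["fp"] += yp
    rates = {}
    for g, c in counts.items():
        rates[g] = {
            "selection_rate": c["pred_pos"] / c["n"],
            # TPR/FPR are undefined when a group has no positives/negatives.
            "tpr": c["tp"] / c["pos"] if c["pos"] else float("nan"),
            "fpr": c["fp"] / c["neg"] if c["neg"] else float("nan"),
        }
    return rates

def gap(rates, key):
    """Largest between-group difference for one rate."""
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)

# Toy data (illustrative only): two groups of four examples each.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = group_rates(y_true, y_pred, groups)
dp_gap = gap(rates, "selection_rate")               # demographic parity gap
eo_gap = max(gap(rates, "tpr"), gap(rates, "fpr"))  # equalized odds gap
print(rates, dp_gap, eo_gap)
```

A gap of zero means the groups are treated identically on that metric; how large a gap is tolerable depends on the decision context.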
Assemble a stratified evaluation dataset — Evaluation data must be representative of the deployment population. Oversample minority groups so that per-group estimates are statistically reliable (as a rule of thumb, at least 100 samples per subgroup), and reweight to deployment proportions when computing overall metrics. Use held-out data, not training data. Document dataset construction methodology.
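A quick check of subgroup sizes can be sketched as follows. The 100-sample floor is the rule of thumb from the step above; the confidence-interval helper uses the normal approximation for a proportion and is only meant to show how quickly uncertainty grows for small groups. Function names and group labels are illustrative.

```python
import math
from collections import Counter

MIN_PER_GROUP = 100  # rule-of-thumb floor from the step above

def underpowered_groups(groups, minimum=MIN_PER_GROUP):
    """Return {group: count} for every subgroup with fewer than `minimum` samples."""
    counts = Counter(groups)
    return {g: n for g, n in counts.items() if n < minimum}

def ci_half_width(rate, n, z=1.96):
    """Approximate 95% confidence half-width for a proportion (normal approximation)."""
    return z * math.sqrt(rate * (1 - rate) / n)

# Toy evaluation set (illustrative group labels).
groups = ["a"] * 400 + ["b"] * 120 + ["c"] * 35
small = underpowered_groups(groups)
print(small)  # group "c" falls below the floor

# With a rate near 0.5, n=100 gives roughly +/-0.10; n=35 gives roughly +/-0.17.
print(round(ci_half_width(0.5, 100), 3), round(ci_half_width(0.5, 35), 3))
```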
Measure overall model performance — Establish baseline accuracy, precision, recall, and AUC for the full population. This is the reference point for group-level comparisons. Document evaluation date and model version.
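A minimal baseline computation, assuming binary labels and a model that outputs scores in [0, 1], might look like the sketch below. AUC is computed directly as the fraction of positive/negative pairs ranked correctly, which is fine for small evaluation sets but quadratic in size. The `model_version` value is a placeholder, not a real identifier.

```python
from datetime import date

def baseline_metrics(y_true, scores, threshold=0.5):
    """Accuracy, precision, recall, and AUC for binary labels and real-valued scores."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # AUC as the probability that a random positive outranks a random negative (ties count half).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else float("nan"),
        "recall": tp / (tp + fn) if (tp + fn) else float("nan"),
        "auc": wins / (len(pos) * len(neg)) if pos and neg else float("nan"),
    }

# Toy data (illustrative only).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]

report = {
    "model_version": "example-v1",            # placeholder identifier
    "evaluated_on": date.today().isoformat(),
    **baseline_metrics(y_true, scores),
}
print(report)
```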
Measure per-group performance — Compute the same performance metrics for every protected group. Calculate disparity ratios: group metric / reference group metric, where the reference is typically the majority group or, under the EEOC four-fifths (80%) rule, the group with the highest selection rate. Flag ratios below 0.8 as potential adverse impact. Visualize as a fairness dashboard.
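The ratio check can be sketched in a few lines. Under the four-fifths rule, a group whose selection rate is less than 80% of the highest group's rate is flagged for potential adverse impact. The helper below defaults to the highest-rate group as reference but accepts an explicit reference group; names and data are illustrative.

```python
from collections import defaultdict

def selection_rates(y_pred, groups):
    """Fraction of positive predictions per group."""
    totals, selected = defaultdict(int), defaultdict(int)
    for p, g in zip(y_pred, groups):
        totals[g] += 1
        selected[g] += p
    return {g: selected[g] / totals[g] for g in totals}

def impact_ratios(rates, reference=None):
    """Each group's rate divided by the reference rate (highest rate if no reference is given)."""
    ref_rate = rates[reference] if reference is not None else max(rates.values())
    if ref_rate == 0:
        raise ValueError("reference group has a selection rate of zero")
    return {g: r / ref_rate for g, r in rates.items()}

# Toy data: 100 people per group, with 50, 35, and 45 selected respectively.
y_pred = [1] * 50 + [0] * 50 + [1] * 35 + [0] * 65 + [1] * 45 + [0] * 55
groups = ["a"] * 100 + ["b"] * 100 + ["c"] * 100

rates = selection_rates(y_pred, groups)
ratios = impact_ratios(rates)
flagged = sorted(g for g, r in ratios.items() if r < 0.8)
print(ratios, flagged)  # group "b" has a ratio of about 0.7 and is flagged
```

The same ratio can be applied to other metrics (TPR, precision), but the 0.8 threshold has regulatory standing only for selection rates.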
Investigate sources of disparity — Analyze: Is disparity in the training data (historical bias)? In feature selection (proxy discrimination)? In model architecture? In label quality (human labeling bias)? Use SHAP values to identify which features drive differential predictions across groups.
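Before reaching for SHAP, a cheap first screen for proxy features is to correlate each input feature with group membership. The sketch below does that with a hand-rolled Pearson correlation. It is a simpler stand-in for, not an implementation of, SHAP-based attribution, and it only catches linear, single-feature proxies. The feature names and the 0.5 threshold are assumptions for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def proxy_screen(features, in_group, threshold=0.5):
    """Return features whose |correlation| with group membership meets the threshold.

    features: dict mapping feature name -> list of numeric values
    in_group: list of 0/1 indicators for membership in the protected group
    """
    scores = {name: pearson(vals, in_group) for name, vals in features.items()}
    return {name: r for name, r in scores.items() if abs(r) >= threshold}

# Toy data with hypothetical feature names.
in_group = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "zip_code_income": [30, 32, 28, 35, 60, 62, 58, 65],  # tracks group membership closely
    "years_experience": [5, 2, 8, 3, 5, 2, 8, 3],         # same distribution in both groups
}

suspects = proxy_screen(features, in_group)
print(suspects)  # only zip_code_income is flagged as a likely proxy
```

Features flagged this way are candidates for closer inspection with attribution methods, not automatic removals.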
Apply mitigation techniques — Pre-processing: reweight training data, resample underrepresented groups. In-processing: add fairness constraints to the loss function (adversarial debiasing, regularization). Post-processing: adjust decision thresholds per group to equalize error rates. Document trade-offs with overall model performance.
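As one example of the pre-processing option, the sketch below computes instance weights in the style of the reweighing scheme described by Kamiran and Calders: each (group, label) cell gets weight P(group) * P(label) / P(group, label), so that group and label are statistically independent in the weighted data. It is an illustration under that assumption, not the AIF360 implementation.

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """One weight per example: P(group) * P(label) / P(group, label)."""
    n = len(labels)
    g_counts = Counter(groups)
    y_counts = Counter(labels)
    gy_counts = Counter(zip(groups, labels))
    return [
        (g_counts[g] * y_counts[y]) / (n * gy_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]

def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(w for g, y, w in zip(groups, labels, weights) if g == group and y == 1)
    return positive / total

# Toy training set: group "a" has 4 of 6 positives, group "b" has 1 of 4.
groups = ["a"] * 6 + ["b"] * 4
labels = [1, 1, 1, 1, 0, 0] + [1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
rate_a = weighted_positive_rate(groups, labels, weights, "a")
rate_b = weighted_positive_rate(groups, labels, weights, "b")
print(rate_a, rate_b)  # both groups end up with the same weighted positive rate
```

The weights would then be passed to a learner that accepts per-sample weights; compare overall and per-group metrics before and after to document the trade-off.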
Conduct human review of edge cases — Sample 50-100 misclassified cases per protected group. Have domain experts review for patterns. Automated metrics miss contextual harms that human review surfaces (e.g., stereotyped language in text models).
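Drawing the review sample can be as simple as the sketch below, which picks up to a fixed number of misclassified cases per group with a seeded random generator so the sample is reproducible. The record schema (`id`, `group`, `y_true`, `y_pred`) is hypothetical.

```python
import random
from collections import defaultdict

def sample_misclassified(records, per_group=50, seed=0):
    """Return {group: sorted list of sampled ids} drawn from misclassified records.

    records: iterable of dicts with 'id', 'group', 'y_true', 'y_pred' keys (hypothetical schema).
    """
    by_group = defaultdict(list)
    for r in records:
        if r["y_true"] != r["y_pred"]:
            by_group[r["group"]].append(r["id"])
    rng = random.Random(seed)  # fixed seed so reviewers can reproduce the sample
    return {
        g: sorted(rng.sample(ids, min(per_group, len(ids))))
        for g, ids in by_group.items()
    }

# Toy records: group "a" has 100 errors, group "b" has only 20.
records = []
for i in range(200):
    records.append({"id": i, "group": "a", "y_true": 1, "y_pred": 0 if i % 2 == 0 else 1})
for i in range(200, 300):
    records.append({"id": i, "group": "b", "y_true": 1, "y_pred": 0 if i % 5 == 0 else 1})

review_sample = sample_misclassified(records, per_group=50)
print({g: len(ids) for g, ids in review_sample.items()})
```

When a group has fewer errors than the target, as group "b" does here, all of its errors are reviewed.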
Produce a model card or fairness audit report — Document: model purpose, intended use, evaluation methodology, per-group performance metrics, known limitations, and mitigation steps taken. Publish internally and externally per your disclosure policy. The EU AI Act requires technical documentation for high-risk systems.
Establish ongoing monitoring — Deploy fairness monitoring in production. Track per-group prediction distributions monthly. Set alerts if demographic disparity increases above threshold post-deployment. Retrain or retune when drift is detected. Fairness is not a one-time audit.
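A monitoring job along these lines could recompute the selection-rate ratio for each reporting window and raise an alert when it crosses a threshold. In the sketch below, disparity is expressed as the lowest group rate divided by the highest, and an alert fires when that ratio falls below 0.8. The window labels, threshold, and data are assumptions for illustration.

```python
from collections import defaultdict

def window_disparity(y_pred, groups, min_ratio=0.8):
    """Selection rates for one window, the min/max rate ratio, and an alert flag."""
    totals, selected = defaultdict(int), defaultdict(int)
    for p, g in zip(y_pred, groups):
        totals[g] += 1
        selected[g] += p
    rates = {g: selected[g] / totals[g] for g in totals}
    top = max(rates.values())
    # If no group has any positive predictions, treat the window as having no disparity.
    ratio = min(rates.values()) / top if top > 0 else 1.0
    return {"rates": rates, "ratio": ratio, "alert": ratio < min_ratio}

def make_window(a_selected, b_selected, size=100):
    """Build toy predictions for two groups of `size` people each."""
    y_pred = [1] * a_selected + [0] * (size - a_selected)
    y_pred += [1] * b_selected + [0] * (size - b_selected)
    groups = ["a"] * size + ["b"] * size
    return y_pred, groups

# Hypothetical monthly windows of production predictions.
windows = {
    "2024-01": make_window(40, 36),
    "2024-02": make_window(40, 28),
}

results = {month: window_disparity(*data) for month, data in windows.items()}
alerts = [month for month, r in results.items() if r["alert"]]
print(alerts)  # the second window drifts below the 0.8 floor
```

In production the alert would feed whatever paging or ticketing system is already in place, and a triggered alert should prompt the investigation and mitigation steps described earlier.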