Guides biomedical data analysis using MATLAB Statistics and Machine Learning Toolbox: t-tests/ANOVA, SVM/random forest classification, regression, clustering with k-means/DBSCAN, PCA/t-SNE, survival analysis, cross-validation, SHAP explanations.
npx claudepluginhub rrmaram2000/matlab-toolbox-skills --plugin matlab-toolbox-skills

This skill uses the workspace's default tool permissions.
**Version:** R2025b | **Focus:** Biomedical-specific workflows and template scripts
This skill provides expert guidance for applying the Statistics and Machine Learning Toolbox to biomedical data analysis. The model already has strong knowledge of the toolbox API. This skill adds domain-specific workflows, clinical best practices, and ready-to-use template scripts.
Biomedical data analysis has unique challenges that generic ML tutorials do not address:
- **Class imbalance is the norm:** Disease prevalence is often 1-10%. Always use cost-sensitive learning or resampling. Never report accuracy alone; report sensitivity, specificity, AUC, and likelihood ratios.
- **p >> n is common:** Genomics and proteomics panels often have thousands of features but only hundreds of samples. Feature selection and dimensionality reduction are mandatory, not optional.
- **Data leakage is catastrophic:** All preprocessing (normalization, feature selection, imputation) must happen inside the cross-validation folds. Leakage inflates reported performance, and models validated this way can fail in clinical use.
- **Clinical interpretability matters:** Clinicians need odds ratios, hazard ratios, and feature importance, not just predictions. Use fitglm for odds ratios, coxphfit for hazard ratios, and shapley for model explanation.
- **Missing data is expected:** Use fillmissing(X, 'knn') for clinical imputation (requires Stats & ML Toolbox, not Bioinformatics Toolbox).
- **Censored data requires special handling:** Use ecdf(..., 'Censoring', ...) for Kaplan-Meier estimates and coxphfit for Cox regression instead of standard regression.
- **Multiple testing correction is essential:** When testing many biomarkers, use Benjamini-Hochberg false discovery rate (FDR) control. Do not rely on Bonferroni alone, which becomes very conservative across thousands of tests.
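Accuracy alone can mislead. On a cohort with 5% prevalence, a model that always predicts "healthy" scores 95% accuracy and has zero sensitivity.

The sketch below computes the recommended diagnostic metrics from predicted and true labels. The ten labels are made up purely for illustration, and the variable names are assumptions, not part of any template script.

```matlab
% Illustrative labels only (1 = disease, 0 = healthy); replace with real test-set labels
yTrue = [1 1 1 1 0 0 0 0 0 0]';
yPred = [1 1 1 0 0 0 0 0 1 0]';
positiveClass = 1;

% Confusion-matrix counts with respect to the positive (disease) class
TP = sum(yTrue == positiveClass & yPred == positiveClass);
FN = sum(yTrue == positiveClass & yPred ~= positiveClass);
TN = sum(yTrue ~= positiveClass & yPred ~= positiveClass);
FP = sum(yTrue ~= positiveClass & yPred == positiveClass);

sensitivity = TP / (TP + FN);             % true positive rate
specificity = TN / (TN + FP);             % true negative rate
LRpos = sensitivity / (1 - specificity);  % positive likelihood ratio
LRneg = (1 - sensitivity) / specificity;  % negative likelihood ratio
accuracy = (TP + TN) / numel(yTrue);      % reported only for contrast

fprintf('Sens %.2f | Spec %.2f | LR+ %.2f | LR- %.2f | Acc %.2f\n', ...
    sensitivity, specificity, LRpos, LRneg, accuracy);
```

For the cost-sensitive side, fitcsvm and fitcensemble accept a 'Cost' matrix. Cost(i,j) is the cost of predicting class j when the true class is i, with rows and columns ordered as in the model's ClassNames.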
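To make the leakage rule concrete, here is a minimal sketch of fold-wise preprocessing. It assumes synthetic data (200 patients, 50 features, 10% positives) and a plain logistic model from fitclinear, chosen only for brevity. The standardization statistics and the fscmrmr feature ranking are computed from each training fold and then applied unchanged to that fold's test rows.

```matlab
% Synthetic data (assumption): 200 patients, 50 features, 20 positives
rng(1);
n = 200; p = 50;
X = randn(n, p);
y = [true(20, 1); false(n - 20, 1)];
X(y, 1:3) = X(y, 1:3) + 1.5;          % make the first 3 features informative

cv = cvpartition(y, 'KFold', 5, 'Stratify', true);
foldAUC = zeros(cv.NumTestSets, 1);
for k = 1:cv.NumTestSets
    trIdx = training(cv, k);
    teIdx = test(cv, k);
    % Standardization parameters come from the training fold only
    mu = mean(X(trIdx, :), 1);
    sigma = std(X(trIdx, :), 0, 1);
    XTr = (X(trIdx, :) - mu) ./ sigma;
    XTe = (X(teIdx, :) - mu) ./ sigma;   % test rows reuse training statistics
    % Feature ranking is also computed on the training fold only
    idx = fscmrmr(XTr, y(trIdx));
    keep = idx(1:10);
    mdl = fitclinear(XTr(:, keep), y(trIdx), 'Learner', 'logistic');
    [~, s] = predict(mdl, XTe(:, keep)); % column 2 = posterior for class true
    [~, ~, ~, foldAUC(k)] = perfcurve(y(teIdx), s(:, 2), true);
end
fprintf('Mean CV AUC: %.3f\n', mean(foldAUC));
```

Standardizing or ranking features on all 200 rows before the loop would let each test fold influence its own preprocessing. That is the leakage the checklist warns about.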
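For the interpretability point, the sketch below turns a logistic fitglm model into an odds-ratio table. The clinical table (Age, Smoker, Disease) is simulated and hypothetical. Odds ratios are the exponentiated coefficients, and their 95% intervals are the exponentiated bounds returned by coefCI.

```matlab
% Simulated clinical table (hypothetical variables, not real data)
rng(2);
n = 300;
Age = 40 + 20 * rand(n, 1);
Smoker = double(rand(n, 1) < 0.3);          % 0/1 numeric indicator
eta = -6 + 0.08 * Age + 0.9 * Smoker;       % log-odds used to simulate outcomes
Disease = double(rand(n, 1) < 1 ./ (1 + exp(-eta)));
tbl = table(Age, Smoker, Disease);

mdl = fitglm(tbl, 'Disease ~ Age + Smoker', 'Distribution', 'binomial');

% Exponentiate log-odds coefficients and their 95% confidence bounds
ci = coefCI(mdl, 0.05);
OR = table(exp(mdl.Coefficients.Estimate), exp(ci(:, 1)), exp(ci(:, 2)), ...
    mdl.Coefficients.pValue, ...
    'VariableNames', {'OddsRatio', 'CI_Lower', 'CI_Upper', 'pValue'}, ...
    'RowNames', mdl.CoefficientNames);
disp(OR(2:end, :));                          % skip the intercept row
```

The Age odds ratio is per one-year increase. Rescale the predictor (for example, to decades) if a larger unit reads better clinically.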
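For censored outcomes, here is a minimal sketch of the ecdf plus coxphfit route on simulated two-arm data. The exponential event times, the assumed true hazard ratio of exp(-0.5), and the uniform censoring times are all illustrative assumptions. In ecdf and coxphfit, the 'Censoring' flag is 1 (true) when the event was not observed.

```matlab
% Simulated two-arm survival data (assumptions: exponential times, HR = exp(-0.5))
rng(3);
n = 120;
treated = [zeros(n/2, 1); ones(n/2, 1)];
trueTime = exprnd(12 ./ exp(-0.5 * treated)); % treated arm has longer mean survival
censorTime = 24 * rand(n, 1);                 % random censoring times
time = min(trueTime, censorTime);
censored = trueTime > censorTime;             % true = event NOT observed

% Kaplan-Meier survivor function for each arm
[fC, tC] = ecdf(time(treated == 0), 'Censoring', censored(treated == 0), 'Function', 'survivor');
[fT, tT] = ecdf(time(treated == 1), 'Censoring', censored(treated == 1), 'Function', 'survivor');
stairs(tC, fC); hold on; stairs(tT, fT); hold off;
xlabel('Months'); ylabel('Survival probability'); legend('Control', 'Treated');

% Cox proportional hazards: exp(b) is the hazard ratio, treated vs control
[b, logL, H, stats] = coxphfit(treated, time, 'Censoring', censored);
HR = exp(b);
HRci = exp(b + [-1 1] * 1.96 * stats.se);     % Wald 95% interval on the log scale
fprintf('HR = %.2f (95%% CI %.2f-%.2f), p = %.3g\n', HR, HRci, stats.p);
```

Treating the censored times as event times in an ordinary regression would bias survival estimates downward. Keeping the censoring flag avoids that.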
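The Benjamini-Hochberg procedure is simple enough to write by hand. Sort the m p-values, scale the i-th smallest by m/i, and enforce monotonicity from the largest rank downward. This base-MATLAB sketch uses ten made-up p-values. The Bioinformatics Toolbox's mafdr also offers a BH option ('BHFDR', true), but the version below needs no toolbox.

```matlab
% Illustrative biomarker p-values (made up for this example)
p = [0.001 0.008 0.039 0.041 0.042 0.060 0.074 0.205 0.212 0.216];
q = 0.05;                                 % target false discovery rate
m = numel(p);

[pSorted, order] = sort(p(:));
ranks = (1:m)';
adj = pSorted .* m ./ ranks;              % raw BH scaling: p_(i) * m / i
adj = flipud(cummin(flipud(adj)));        % make adjusted values monotone
adj = min(adj, 1);                        % adjusted p-values cannot exceed 1

pAdj = zeros(m, 1);
pAdj(order) = adj;                        % restore the original biomarker order
isDiscovery = pAdj <= q;
```

Here the first two biomarkers pass at an FDR of 0.05. A Bonferroni threshold of 0.05/10 = 0.005 would keep only the first.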
| Task | Start Here |
|---|---|
| Build a diagnostic classifier | scripts/template_svm_classification.m or scripts/template_random_forest_ensemble.m |
| Analyze survival / time-to-event data | scripts/template_cox_survival_analysis.m + cards/survival-analysis.md |
| Discover biomarkers | cards/biomedical.md (consensus feature selection) |
| Optimize hyperparameters | scripts/template_bayesopt_hyperparameter.m + cards/bayesian.md |
| Compare treatment groups | scripts/template_hypothesis_testing.m + cards/hypothesis-testing.md |
| Cluster patients into subtypes | scripts/template_kmeans_patient_clustering.m |
| Fit distributions to clinical data | scripts/template_distribution_fitting.m |
| Reduce dimensions (genomics, etc.) | scripts/template_pca_feature_reduction.m |
| Interpret a black-box model | scripts/template_shapley_interpretability.m |
| Handle missing clinical data | scripts/template_missing_data_handling.m |
| Build a regression model | scripts/template_glm_regression.m |
| Cross-validate properly | scripts/template_cross_validation_pipeline.m |
The scripts/ directory contains 12 ready-to-use MATLAB scripts covering the most common biomedical analysis workflows:
| Script | Purpose |
|---|---|
| template_svm_classification.m | SVM classifier with RBF kernel, standardization, optimization |
| template_random_forest_ensemble.m | Random Forest with feature importance and OOB error |
| template_cox_survival_analysis.m | Kaplan-Meier curves, Cox regression, hazard ratios |
| template_bayesopt_hyperparameter.m | Bayesian hyperparameter optimization with bayesopt |
| template_hypothesis_testing.m | t-test, ANOVA, nonparametric tests with effect sizes |
| template_kmeans_patient_clustering.m | Patient subtyping with optimal k selection |
| template_distribution_fitting.m | Multi-distribution fitting with AIC/BIC comparison |
| template_pca_feature_reduction.m | PCA pipeline for high-dimensional biodata |
| template_shapley_interpretability.m | SHAP values for model explanation |
| template_missing_data_handling.m | Missing data strategies (kNN, interpolation) |
| template_glm_regression.m | GLM with odds ratios and clinical reporting |
| template_cross_validation_pipeline.m | Nested CV for unbiased performance estimation |
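```matlab
%% Complete Biomedical Classification Pipeline
% Assumes patient_data.csv has a binary 'Diagnosis' column; all other columns are predictors.

%% 1. Load and explore clinical data
data = readtable('patient_data.csv');
data.Diagnosis = categorical(data.Diagnosis);
predictors = setdiff(data.Properties.VariableNames, {'Diagnosis'}, 'stable');
summary(data);
grpstats(data, 'Diagnosis', {'mean', 'std'}, 'DataVars', predictors);

%% 2. Split first, so no test-set information leaks into preprocessing
cv = cvpartition(data.Diagnosis, 'Holdout', 0.2, 'Stratify', true);
XTrain = data(training(cv), :);
XTest  = data(test(cv), :);

%% 3. Handle missing data separately within each partition
XTrain = fillmissing(XTrain, 'knn', 'DataVariables', predictors);  % kNN imputation (Stats & ML Toolbox)
XTest  = fillmissing(XTest,  'knn', 'DataVariables', predictors);

%% 4. Feature selection (training set only!)
[idx, scores] = fscmrmr(XTrain(:, predictors), XTrain.Diagnosis);
selectedFeatures = predictors(idx(1:min(10, numel(idx))));

%% 5. Train a bagged ensemble; hyperparameters tuned by internal cross-validation
Mdl = fitcensemble(XTrain(:, selectedFeatures), XTrain.Diagnosis, ...
    'Method', 'Bag', ...
    'OptimizeHyperparameters', {'NumLearningCycles', 'MinLeafSize'}, ...
    'HyperparameterOptimizationOptions', struct('ShowPlots', false));

%% 6. Evaluate on the held-out test set
[label, score] = predict(Mdl, XTest(:, selectedFeatures));
posClass = Mdl.ClassNames(2);                 % score(:,2) belongs to this class
[fpr, tpr, thresholds, AUC] = perfcurve(XTest.Diagnosis, score(:, 2), posClass);
fprintf('Test AUC: %.3f\n', AUC);

%% 7. Interpret the model with Shapley values for one test patient
explainer = shapley(Mdl, XTrain(:, selectedFeatures), ...
    'QueryPoints', XTest(1, selectedFeatures));
plot(explainer);
```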
See knowledge/INDEX.md for the complete card index. Key cards:
- cards/survival-analysis.md: Kaplan-Meier, Cox regression, coxphfit patterns
- cards/biomedical.md: Diagnostic classifiers, biomarker discovery, class imbalance
- cards/bayesian.md: bayesopt, MCMC sampling, Bayesian model comparison
- …trainnet, unet, dlnetwork)
- …medicalVolume, radiomics)
- …regionprops, imgaussfilt)
- …wavedec2, wdenoise2)