Guides building hierarchical Bayesian cognitive models with Stan/PyMC: prior specification respecting cognitive constraints, model structure, MCMC diagnostics, posterior predictive checks.
Install via: `npx claudepluginhub neuroaihub/awesome_cognitive_and_neuroscience_skills --plugin awesome-cognitive-and-neuroscience-skills`

This skill uses the workspace's default tool permissions.
Related skills:
- Guides statistical modeling for cognitive science/neuroscience: mixed-effects models vs. ANOVA, multiple comparison corrections, Bayesian methods, RT data, effect sizes.
- Builds and fits Bayesian models with PyMC: hierarchical models, MCMC (NUTS), variational inference, LOO/WAIC comparison, posterior predictive checks.
This skill encodes expert knowledge for building hierarchical Bayesian cognitive models using probabilistic programming languages (Stan, PyMC). It addresses the modeling decisions that require domain expertise beyond knowing Stan/PyMC syntax: how to choose priors that respect cognitive constraints, when to use hierarchical structure, how to diagnose MCMC pathologies, and how to evaluate model adequacy through posterior predictive checks.
It covers what a competent programmer without cognitive modeling training would get wrong: which prior families are appropriate for cognitive parameters (e.g., RTs must be positive, learning rates are bounded in [0,1]), when partial pooling outperforms complete pooling or no pooling, how to detect non-identifiability in cognitive models, and what constitutes adequate MCMC convergence for publishable results.
For general statistical modeling questions, see the cogsci-statistics skill; for neuroimaging analyses, see the erp-analysis or fmri-glm-analysis-guide skills. Before executing the domain-specific steps below, you MUST review the detailed methodology guidance in the research-literacy skill.
This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.
Choosing the right level of pooling is a fundamental modeling decision that a non-specialist routinely gets wrong.
If your data has a natural grouping structure (e.g., multiple trials per participant, participants within conditions), you need to decide on a pooling strategy. If not, fit a single model.
| Strategy | Structure | When Appropriate | Risk |
|---|---|---|---|
| Complete pooling | One set of parameters for all participants | Large homogeneous groups, nuisance individual differences | Ignores meaningful individual variation; biased group estimates if heterogeneity exists (Gelman et al., 2013, Ch. 5) |
| No pooling | Separate parameters per participant | Many trials per participant (>200), individual-level inference is the goal | Noisy estimates for participants with few trials; no borrowing of strength (Gelman et al., 2013, Ch. 5) |
| Partial pooling (hierarchical) | Individual parameters drawn from group distribution | Default choice for cognitive modeling; few-to-moderate trials per participant; individual differences are scientifically meaningful | Requires MCMC; potential convergence issues with centered parameterization (Gelman et al., 2013, Ch. 5) |
Critical domain knowledge: Hierarchical (partial pooling) models should be the default in cognitive science. They automatically regularize extreme individual estimates toward the group mean -- a property called "shrinkage" -- which is especially valuable with typical cognitive science sample sizes of 20-40 participants with 50-200 trials each (Lee & Wagenmakers, 2014, Ch. 8).
For hierarchical models, the parameterization choice affects MCMC efficiency:
- Centered parameterization: theta_j ~ Normal(mu, sigma). Use when there are many observations per group (>100 trials per participant) and the data are informative relative to the prior (Betancourt & Girolami, 2015).
- Non-centered parameterization: theta_j = mu + sigma * eta_j, where eta_j ~ Normal(0, 1). Use when there are few observations per group, the group-level variance is small, or you encounter divergent transitions with centered parameterization (Betancourt & Girolami, 2015; Stan User's Guide, Section 1.13).

When in doubt, use the non-centered parameterization. It is more robust across a wider range of data configurations and is the Stan Development Team's default recommendation.
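A minimal non-centered sketch in PyMC (the simulated log-RT data, variable names, and hyperprior scales below are illustrative assumptions, not prescriptions from this skill):

```python
import numpy as np
import pymc as pm

# Hypothetical data: 30 participants x 80 trials of log-transformed RTs.
n_participants = 30
rng = np.random.default_rng(1)
participant_idx = np.repeat(np.arange(n_participants), 80)
log_rt = rng.normal(-0.5, 0.4, size=participant_idx.size)

with pm.Model() as model:
    mu = pm.Normal("mu", 0.0, 1.0)        # group-level mean
    sigma = pm.HalfNormal("sigma", 0.5)   # group-level SD (positive-only prior)
    # Non-centered: sample standardized offsets, then scale and shift,
    # so that theta_j = mu + sigma * eta_j.
    eta = pm.Normal("eta", 0.0, 1.0, shape=n_participants)
    theta = pm.Deterministic("theta", mu + sigma * eta)
    pm.Normal("obs", mu=theta[participant_idx], sigma=0.4, observed=log_rt)
    idata = pm.sample(1000, tune=1000, target_accept=0.9)
```

Because the `eta` draws are independent of `sigma` in the prior, this parameterization removes the funnel geometry that causes divergences in the centered version.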
Use weakly informative priors that encode known constraints without dominating the likelihood. The goal is to rule out impossible or implausible parameter values while remaining agnostic about the precise value (Gelman et al., 2008; Gelman et al., 2013, Ch. 2).
Domain-critical principle: Cognitive parameters have natural constraints that generic "flat" or "diffuse" priors violate. Reaction times cannot be negative. Probabilities must lie in [0,1]. Learning rates are bounded. Firing rates are non-negative. Encoding these constraints in the prior is not "being subjective" -- it is encoding physical and psychological reality (Lee & Wagenmakers, 2014, Ch. 4).
| Parameter Type | Recommended Prior | Rationale | Source |
|---|---|---|---|
| Location (unbounded) | Normal(0, sd) or Student-t(3, 0, sd) | Weakly informative; heavier tails with Student-t for robustness | Gelman et al., 2008 |
| Scale / variance | Half-Normal(0, sd) or Half-Cauchy(0, sd) | Positive-only; Half-Cauchy allows heavier tails for group-level SDs | Gelman, 2006; Polson & Scott, 2012 |
| Probability (0 to 1) | Beta(a, b) | Natural conjugate for binomial; Beta(1,1) = Uniform; Beta(2,2) = weakly informative centered at 0.5 | Kruschke, 2015, Ch. 6 |
| Rate (0 to 1) | Beta(1.1, 1.1) or logit-Normal | Gently regularizes away from boundaries | Gelman et al., 2013, Ch. 2 |
| Positive continuous | Gamma(shape, rate) or Lognormal(mu, sigma) | For RT, non-decision time, threshold parameters | Lee & Wagenmakers, 2014, Ch. 4 |
| Correlation matrix | LKJ(eta) | eta=1: uniform over matrices; eta=2: weakly informative (Stan default recommendation) | Lewandowski et al., 2009; Stan User's Guide |
| Simplex (sums to 1) | Dirichlet(alpha) | alpha=1: uniform on simplex; alpha>1: concentrates toward center | Gelman et al., 2013, Ch. 2 |
For detailed cognitive-domain-specific prior tables, see references/prior-selection-guide.md.
Always run a prior predictive check before fitting to data (Schad et al., 2021; Gabry et al., 2019):
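A minimal prior predictive check in PyMC/ArviZ, assuming the `model` object from the sketch above:

```python
import arviz as az
import pymc as pm

with model:
    prior_idata = pm.sample_prior_predictive(500)

# Inspect simulated datasets before fitting: draws that imply impossible
# values (e.g., negative RTs after back-transformation) signal that the
# priors need tightening.
az.plot_ppc(prior_idata, group="prior", num_pp_samples=100)
```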
- See references/prior-selection-guide.md for parameter-specific recommendations.
- See the drift-diffusion-model skill for detailed DDM guidance.
- See the signal-detection-analysis skill.

Every Bayesian analysis requires thorough convergence diagnostics. Never report posterior summaries without first verifying convergence. See references/diagnostics-checklist.md for the full step-by-step protocol.
| Diagnostic | Threshold | Interpretation | Source |
|---|---|---|---|
| R-hat (split R-hat) | < 1.01 | Between-chain vs. within-chain variance; values > 1.01 indicate non-convergence | Vehtari et al., 2021 |
| Bulk-ESS | > 400 (100 per chain with 4 chains) | Effective independent draws for posterior mean/median estimation | Vehtari et al., 2021 |
| Tail-ESS | > 400 | Effective draws for tail quantiles (credible intervals) | Vehtari et al., 2021 |
| Divergent transitions | 0 | Any divergences indicate the sampler failed to explore the posterior faithfully | Betancourt, 2017 |
| E-BFMI | > 0.3 | Energy Bayesian Fraction of Missing Information; low values indicate poor exploration | Betancourt, 2017 |
| Tree depth saturation | Rare (<1% of transitions) | Hitting maximum tree depth suggests difficult geometry | Stan User's Guide |
Critical domain knowledge: The older threshold of R-hat < 1.1 is outdated. Vehtari et al. (2021) demonstrated that the traditional R-hat can miss convergence failures. Use the rank-normalized split R-hat with a threshold of 1.01 and always report both bulk-ESS and tail-ESS.
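A sketch of these checks with ArviZ, assuming the `idata` object from the PyMC sketch above (thresholds follow the table):

```python
import arviz as az

summary = az.summary(idata, var_names=["mu", "sigma"])
print(summary[["r_hat", "ess_bulk", "ess_tail"]])

# Rank-normalized split R-hat and bulk/tail ESS thresholds.
assert (summary["r_hat"] < 1.01).all(), "R-hat >= 1.01: chains have not converged"
assert (summary["ess_bulk"] > 400).all() and (summary["ess_tail"] > 400).all()

# Divergent transitions must be exactly zero.
n_divergent = int(idata.sample_stats["diverging"].sum())
print(f"divergent transitions: {n_divergent}")
```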
See references/diagnostics-checklist.md for remediation steps. The most common fixes in cognitive modeling: increase adapt_delta to 0.95-0.99, and switch to a non-centered parameterization if divergences persist.

| Method | When to Use | Implementation | Source |
|---|---|---|---|
| PSIS-LOO-CV | Default choice for comparing predictive accuracy; more robust than WAIC with weak priors or influential observations | loo package (R), az.loo (Python/ArviZ) | Vehtari et al., 2017 |
| WAIC | Asymptotically equivalent to LOO; acceptable when PSIS diagnostics are clean (all Pareto k < 0.7) | loo package (R), az.waic (Python/ArviZ) | Watanabe, 2010; Vehtari et al., 2017 |
| Bayes factors | When testing a precise null hypothesis (e.g., parameter = 0); sensitive to prior specification | Bridge sampling, Savage-Dickey density ratio | Kass & Raftery, 1995; Lee & Wagenmakers, 2014, Ch. 7 |
Critical domain knowledge: Prefer LOO-CV over WAIC for cognitive models. Vehtari et al. (2017) showed that PSIS-LOO is more robust in the finite-sample case, especially with weak priors or influential observations common in cognitive data. Always check the Pareto k diagnostic: values > 0.7 indicate unreliable LOO estimates for those observations.
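A sketch of a PSIS-LOO comparison with ArviZ; `idata_model_a` and `idata_model_b` are hypothetical fits that stored pointwise log-likelihoods (e.g., via pm.sample(..., idata_kwargs={"log_likelihood": True}) in PyMC):

```python
import arviz as az

loo_a = az.loo(idata_model_a)
print(loo_a)  # elpd_loo, p_loo, and the Pareto k diagnostic table

# LOO estimates are unreliable for observations with Pareto k > 0.7.
n_bad_k = int((loo_a.pareto_k > 0.7).sum())
print(f"observations with Pareto k > 0.7: {n_bad_k}")

comparison = az.compare({"model_a": idata_model_a, "model_b": idata_model_b}, ic="loo")
print(comparison)  # models ranked by expected log predictive density
```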
| Bayes Factor (BF10) | Evidence Category | Source |
|---|---|---|
| 1 - 3 | Anecdotal / not worth more than a bare mention | Jeffreys, 1961; Lee & Wagenmakers, 2014 |
| 3 - 10 | Moderate evidence | Jeffreys, 1961; Lee & Wagenmakers, 2014 |
| 10 - 30 | Strong evidence | Jeffreys, 1961; Lee & Wagenmakers, 2014 |
| 30 - 100 | Very strong evidence | Jeffreys, 1961; Lee & Wagenmakers, 2014 |
| > 100 | Extreme / decisive evidence | Jeffreys, 1961; Lee & Wagenmakers, 2014 |
Caution: Bayes factors are highly sensitive to prior specification. A diffuse prior on the alternative hypothesis inflates evidence for the null (the Jeffreys-Lindley paradox). Always conduct a prior sensitivity analysis when reporting Bayes factors (Schad et al., 2021).
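For a point null on a single parameter, the Savage-Dickey density ratio mentioned above can be sketched directly from posterior draws. This assumes the null model is nested in the alternative and that `mu` had a Normal(0, 1) prior, as in the earlier PyMC sketch:

```python
import numpy as np
from scipy import stats

# Posterior draws for the parameter under the alternative model.
draws = idata.posterior["mu"].values.ravel()

# Savage-Dickey: BF01 = posterior density at the null value / prior density there.
posterior_at_null = stats.gaussian_kde(draws)(0.0)[0]
prior_at_null = stats.norm(0.0, 1.0).pdf(0.0)

bf01 = posterior_at_null / prior_at_null
print(f"BF10 = {1.0 / bf01:.2f}")
```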
After model comparison, the selected model must demonstrate it can reproduce key features of the observed data:
Domain-specific checks: For RT models, always check the fit to the full RT distribution (not just the mean). Cognitive models derive their power from fitting distributional shape -- a model that matches mean RT but misses the right tail is inadequate (Ratcliff & McKoon, 2008).
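A minimal posterior predictive check, assuming the `model` and `idata` objects from the sketches above:

```python
import arviz as az
import pymc as pm

with model:
    idata.extend(pm.sample_posterior_predictive(idata))

# Overlay replicated datasets on the observed data; for RT models,
# inspect quantiles and the right tail, not just the mean.
az.plot_ppc(idata, num_pp_samples=100)
```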
Problem: Two or more parameters trade off so that many parameter combinations yield equivalent likelihoods. Common in RL models (learning rate vs. inverse temperature) and DDM (boundary vs. drift rate with few conditions).
Detection: Pairwise posterior scatter plots show strong correlations or ridges; marginal posteriors are much wider than expected.
Fix: Add conditions that differentially constrain parameters; use informative priors; reparameterize (e.g., the ratio v/a in DDM; Wilson & Collins, 2019).
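A sketch of the detection step with ArviZ; `idata_rl` and the parameter names are hypothetical placeholders for a fitted RL model:

```python
import arviz as az

# Ridges or strong correlations between these marginals suggest the
# parameters are trading off (weak identifiability).
az.plot_pair(
    idata_rl,
    var_names=["learning_rate", "inverse_temperature"],
    kind="scatter",
    divergences=True,  # overlay divergent transitions, if any
)
```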
Problem: Posterior conclusions change substantially when priors are varied within a reasonable range.
Detection: Re-fit with 2-3 alternative prior specifications and compare posteriors (Schad et al., 2021).
Fix: Collect more data; use more informative priors justified by previous literature; report sensitivity analysis in the paper.
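A sketch of the sensitivity re-fit, reusing the hierarchical model above with alternative (illustrative) group-SD prior scales:

```python
import arviz as az
import pymc as pm

posteriors = {}
for sd in [0.25, 0.5, 1.0]:  # alternative HalfNormal scales for sigma
    with pm.Model():
        mu = pm.Normal("mu", 0.0, 1.0)
        sigma = pm.HalfNormal("sigma", sd)
        eta = pm.Normal("eta", 0.0, 1.0, shape=n_participants)
        theta = mu + sigma * eta
        pm.Normal("obs", mu=theta[participant_idx], sigma=0.4, observed=log_rt)
        posteriors[sd] = pm.sample(1000, tune=1000, progressbar=False)

# Conclusions should be stable across prior settings.
for sd, idata_s in posteriors.items():
    print(sd)
    print(az.summary(idata_s, var_names=["mu", "sigma"]))
```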
Problem: In mixture models, MCMC chains swap component labels, creating multimodal marginal posteriors even when the model is well-identified.
Detection: Trace plots show "switching" between modes; R-hat is high even with long chains.
Fix: Impose ordering constraints (e.g., mu_1 < mu_2); use label-invariant summaries; post-hoc relabeling (Stephens, 2000).
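In PyMC, the ordering constraint can be imposed with the ordered transform (a minimal sketch; the component count and initial values are illustrative):

```python
import numpy as np
import pymc as pm

with pm.Model():
    # Enforce mu_1 < mu_2 to break label switching; the initial value
    # must already satisfy the ordering.
    mu = pm.Normal(
        "mu", mu=0.0, sigma=2.0, shape=2,
        transform=pm.distributions.transforms.ordered,
        initval=np.array([-1.0, 1.0]),
    )
```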
Problem: Funnel-shaped posterior in hierarchical models where the group SD approaches zero, creating an increasingly narrow funnel that the sampler cannot traverse.
Detection: Divergent transitions concentrated near low group-SD values.
Fix: Non-centered parameterization, the standard remedy (Betancourt & Girolami, 2015); see the parameterization guidance above.
Problem: With too few trials, individual-level parameters are poorly constrained even in hierarchical models.
Guideline: For DDM, minimum 40-60 trials per condition per participant for stable hierarchical estimation (Wiecki et al., 2013; Ratcliff & Childers, 2015). For simpler models (e.g., binomial SDT), 20-30 trials may suffice with hierarchical priors (Lee & Wagenmakers, 2014).
Fix: If data are already collected, rely more heavily on hierarchical shrinkage and report wide credible intervals honestly.
When reporting Bayesian cognitive models in a manuscript, at a minimum:
- State all prior distributions with justification, and report a prior sensitivity analysis (Schad et al., 2021).
- Report convergence diagnostics: rank-normalized split R-hat, bulk-ESS, tail-ESS, and the number of divergent transitions (Vehtari et al., 2021).
- Show posterior predictive checks against the full data distribution, not just condition means.
- Report credible intervals honestly, including when limited data leave them wide.