Core mathematical concepts and theoretical frameworks for statistics
Provides theoretical frameworks for sufficiency, completeness, UMVUE, exponential families, and decision theory.
Install: `npx claudepluginhub data-wise/scholar`

This skill inherits all available tools. When active, it can use any tool Claude has access to.
Core mathematical statistics theory for rigorous methodology development
Use this skill when working on: theoretical derivations requiring classical statistics, optimality arguments, exponential family manipulations, decision-theoretic comparisons, or foundational probability theory.
A statistic $T(X)$ is sufficient for parameter $\theta$ if the conditional distribution of $X$ given $T(X)$ does not depend on $\theta$:
$$P(X = x \mid T(X) = t, \theta) = P(X = x \mid T(X) = t)$$
$T(X)$ is sufficient for $\theta$ if and only if the likelihood can be factored as:
$$f(x; \theta) = g(T(x), \theta) \cdot h(x)$$
where $g$ depends on $x$ only through $T(x)$, and $h$ does not depend on $\theta$.
A sufficient statistic $T$ is minimal sufficient if for any other sufficient statistic $U$, there exists a function $g$ such that $T = g(U)$.
Criterion: $T(x) = T(y)$ if and only if $\frac{f(x; \theta)}{f(y; \theta)}$ is constant in $\theta$.
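As a quick numerical illustration of this criterion (a sketch, assuming the $N(\theta, 1)$ model, where $T(x) = \sum_i x_i$ is minimal sufficient): the log likelihood ratio between two datasets is constant in $\theta$ exactly when their sums agree.

```r
# Check the minimal-sufficiency criterion numerically for N(theta, 1).
# log f(x; theta) - log f(y; theta) is constant in theta iff sum(x) == sum(y).
log_lik_ratio <- function(x, y, theta) {
  sum(dnorm(x, mean = theta, log = TRUE)) - sum(dnorm(y, mean = theta, log = TRUE))
}

theta_grid <- seq(-2, 2, by = 0.5)

x <- c(1.0, 2.0, 3.0)
y <- c(0.5, 2.5, 3.0)  # same sum as x
z <- c(1.0, 1.0, 1.0)  # different sum

ratios_same <- sapply(theta_grid, function(th) log_lik_ratio(x, y, th))
ratios_diff <- sapply(theta_grid, function(th) log_lik_ratio(x, z, th))

# Constant across theta when the sums agree; varies in theta when they differ
c(constant = diff(range(ratios_same)) < 1e-10,
  varies   = diff(range(ratios_diff)) > 1e-10)
```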
#' Check Sufficiency via Factorization
#'
#' @param likelihood Function returning log-likelihood given data and theta
#' @param statistic Function computing candidate sufficient statistic
#' @param data Observed data
#' @return List with the statistic value and a reminder of the manual check
check_sufficiency <- function(likelihood, statistic, data) {
  t_value <- statistic(data)
  # Check whether the likelihood ratio is constant across different x with the same T(x).
  # This is a conceptual scaffold; actual verification requires problem-specific analysis.
  list(
    statistic_value = t_value,
    check_factorization = "Verify g(T(x), theta) * h(x) structure manually"
  )
}
The family of distributions $\{P_\theta : \theta \in \Theta\}$ of a statistic $T$ is complete if:
$$E_\theta[g(T)] = 0 \text{ for all } \theta \in \Theta \implies P_\theta(g(T) = 0) = 1 \text{ for all } \theta$$
Equivalently: the only unbiased estimator of zero is zero itself.
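A standard worked example: for $X_1, \ldots, X_n$ iid Bernoulli$(p)$, the statistic $T = \sum_i X_i \sim \text{Binomial}(n, p)$ is complete. Writing $\rho = p/(1-p)$,

$$E_p[g(T)] = \sum_{t=0}^{n} g(t)\binom{n}{t} p^t (1-p)^{n-t} = (1-p)^n \sum_{t=0}^{n} g(t)\binom{n}{t} \rho^t.$$

If this vanishes for all $p \in (0, 1)$, the polynomial in $\rho$ has infinitely many roots, so every coefficient $g(t)\binom{n}{t}$ is zero, hence $g(t) = 0$ for all $t \in \{0, \ldots, n\}$.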
Bounded completeness: a weaker condition in which the implication is required to hold only for bounded functions $g$.
For exponential families with $k$-dimensional natural parameter and $k$-dimensional minimal sufficient statistic, the sufficient statistic is complete when the parameter space contains an open set.
Key Result: If $T$ is complete and sufficient, then any unbiased estimator that is a function of $T$ alone is UMVUE.
If $\hat{\theta}$ is any unbiased estimator of $\theta$ and $T$ is sufficient for $\theta$, then:
$$\tilde{\theta} = E[\hat{\theta} \mid T]$$
is also unbiased and has variance no greater than that of $\hat{\theta}$:
$$\text{Var}(\tilde{\theta}) \leq \text{Var}(\hat{\theta})$$
with equality if and only if $\hat{\theta}$ is already a function of $T$.
If $T$ is complete sufficient and $\hat{\theta} = g(T)$ is unbiased for $\theta$, then $\hat{\theta}$ is the unique UMVUE (Uniformly Minimum Variance Unbiased Estimator).
Strategy 1: Conditioning Method
Strategy 2: Direct Construction
#' Rao-Blackwell Improvement
#'
#' @param estimator_values Vector of estimator values from bootstrap/simulation
#' @param sufficient_stat Vector of corresponding sufficient statistic values
#' @return List with Rao-Blackwell improved estimates and variance comparison
rao_blackwell_improve <- function(estimator_values, sufficient_stat) {
  # Group by sufficient statistic value and average:
  # an empirical version of E[estimator | T]
  improved <- tapply(estimator_values, sufficient_stat, mean)
  improved_values <- unname(improved[as.character(sufficient_stat)])
  list(
    improved_estimates = improved_values,
    original_var = var(estimator_values),
    improved_var = var(improved_values),
    variance_reduction = 1 - var(improved_values) / var(estimator_values)
  )
}
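A concrete Lehmann-Scheffé example (a simulation sketch, not tied to the helper above): for $X_1, \ldots, X_n$ iid Poisson$(\lambda)$, $T = \sum_i X_i$ is complete sufficient, and conditioning the crude unbiased estimator $I(X_1 = 0)$ on $T$ gives $E[I(X_1 = 0) \mid T = t] = (1 - 1/n)^t$, the UMVUE of $e^{-\lambda}$.

```r
# Rao-Blackwell / Lehmann-Scheffe in action: both estimators are unbiased
# for exp(-lambda), but the conditioned version has much smaller variance.
set.seed(1)
n <- 10; lambda <- 2; n_sims <- 20000

sims <- replicate(n_sims, {
  x <- rpois(n, lambda)
  c(crude = as.numeric(x[1] == 0),   # unbiased but noisy
    umvue = (1 - 1/n)^sum(x))        # E[crude | T], a function of T alone
})

rowMeans(sims)        # both close to exp(-lambda) ~= 0.135
apply(sims, 1, var)   # UMVUE variance is strictly smaller
```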
A distribution belongs to the exponential family if it can be written as:
$$f(x; \theta) = h(x) \exp\left(\eta(\theta)^T T(x) - A(\theta)\right)$$
Or in natural parameterization:
$$f(x; \eta) = h(x) \exp\left(\eta^T T(x) - A(\eta)\right)$$
where $h(x)$ is the base measure, $\eta$ the natural parameter, $T(x)$ the sufficient statistic, and $A$ the log-partition (cumulant) function.
Moments from Log-Partition: $$E[T(X)] = \nabla A(\eta), \quad \text{Cov}(T(X)) = \nabla^2 A(\eta)$$
Maximum Likelihood: For iid sample $X_1, \ldots, X_n$, the MLE satisfies: $$\nabla A(\hat{\eta}) = \frac{1}{n} \sum_{i=1}^n T(X_i)$$
Completeness: Natural exponential families with full-rank $\eta$ have complete sufficient statistics.
| Distribution | $T(x)$ | $\eta$ | $A(\eta)$ |
|---|---|---|---|
| Normal($\mu$, $\sigma^2$) | $(x, x^2)$ | $(\mu/\sigma^2, -1/(2\sigma^2))$ | $-\eta_1^2/(4\eta_2) - \log(-2\eta_2)/2$ |
| Poisson($\lambda$) | $x$ | $\log\lambda$ | $e^\eta$ |
| Binomial($n$, $p$) | $x$ | $\log(p/(1-p))$ | $n\log(1 + e^\eta)$ |
| Gamma($\alpha$, $\beta$) | $(x, \log x)$ | $(-\beta, \alpha-1)$ | $\log\Gamma(\eta_2+1) - (\eta_2+1)\log(-\eta_1)$ |
#' Exponential Family MLE via Log-Partition Gradient
#'
#' @param sufficient_stats Matrix of sufficient statistics (n x k)
#' @param log_partition_gradient Function computing gradient of A(eta)
#' @param initial_eta Initial natural parameter
#' @return MLE natural parameter
exponential_family_mle <- function(sufficient_stats, log_partition_gradient, initial_eta) {
  # MLE solves: gradient(A(eta)) = sample mean of T(X)
  target_mean <- colMeans(sufficient_stats)
  objective <- function(eta) {
    sum((log_partition_gradient(eta) - target_mean)^2)
  }
  result <- optim(initial_eta, objective, method = "BFGS")
  list(
    eta_mle = result$par,
    convergence = result$convergence == 0
  )
}
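A self-contained check of the gradient equation for the Poisson row of the table above: with $A(\eta) = e^\eta$, the MLE equation $\nabla A(\hat\eta) = \bar{x}$ reduces to $\hat\eta = \log \bar{x}$, so the numerical solution should match the closed form.

```r
# Poisson case: grad A(eta) = exp(eta), so the MLE equation
# exp(eta_hat) = mean(x) has closed-form solution eta_hat = log(mean(x)).
set.seed(42)
x <- rpois(500, lambda = 3)
target_mean <- mean(x)
objective <- function(eta) (exp(eta) - target_mean)^2
eta_hat <- optim(0, objective, method = "BFGS")$par
c(numerical = eta_hat, closed_form = log(mean(x)))
```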
| Loss | Formula | Properties |
|---|---|---|
| Squared error | $(a - \theta)^2$ | Penalizes larger errors more |
| Absolute error | $\lvert a - \theta \rvert$ | Robust; Bayes estimator is the posterior median |
| 0-1 loss | $I(a \neq \theta)$ | Classification |
| Weighted squared | $w(\theta)(a - \theta)^2$ | Heterogeneous importance |
An estimator $\delta$ is admissible if no other estimator $\delta'$ dominates it, i.e., there is no $\delta'$ with $R(\theta, \delta') \leq R(\theta, \delta)$ for all $\theta$ and strict inequality for some $\theta$.
The Bayes estimator minimizes the posterior expected loss:
$$\delta^{\pi}(x) = \arg\min_a E[L(\theta, a) \mid X = x]$$
Key Result: Under squared error loss, the Bayes estimator is the posterior mean.
Admissibility Connection: Unique Bayes estimators are admissible.
An estimator $\delta^*$ is minimax if:
$$\sup_\theta R(\theta, \delta^*) = \inf_\delta \sup_\theta R(\theta, \delta)$$
Finding Minimax: Often found as the Bayes estimator with respect to the "least favorable prior."
#' Compare Estimators by Risk Function
#'
#' @param estimators List of estimator functions
#' @param theta_grid Grid of parameter values to evaluate
#' @param dgp Function to generate data given theta
#' @param loss Loss function L(theta, estimate)
#' @param n_sims Number of simulations per theta
#' @return Risk function evaluations
compare_estimator_risks <- function(estimators, theta_grid, dgp, loss, n_sims = 1000) {
  results <- expand.grid(
    theta = theta_grid,
    estimator = names(estimators)
  )
  results$risk <- NA
  for (i in seq_along(theta_grid)) {
    theta <- theta_grid[i]
    data_samples <- replicate(n_sims, dgp(theta), simplify = FALSE)
    for (j in seq_along(estimators)) {
      estimates <- sapply(data_samples, estimators[[j]])
      results$risk[results$theta == theta &
                     results$estimator == names(estimators)[j]] <-
        mean(loss(theta, estimates))
    }
  }
  # To assess admissibility, check whether any estimator's risk curve
  # dominates another's at every theta value
  results
}
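A minimal self-contained risk comparison in the same spirit (assuming a normal mean model and squared error loss): the sample mean versus a shrinkage estimator $0.9\,\bar{x}$. Neither dominates, which is why admissibility must be checked across the whole parameter grid.

```r
# Monte Carlo risk curves for two estimators of a normal mean (sigma = 1, n = 20).
# Risk of mean(x) is 1/n everywhere; 0.9 * mean(x) trades bias for variance,
# winning near theta = 0 and losing for large |theta|.
set.seed(7)
theta_grid <- c(0, 1, 3)
n <- 20; n_sims <- 5000
risks <- sapply(theta_grid, function(theta) {
  ests <- replicate(n_sims, {
    x <- rnorm(n, mean = theta)
    c(mean = mean(x), shrink = 0.9 * mean(x))
  })
  rowMeans((ests - theta)^2)
})
colnames(risks) <- paste0("theta=", theta_grid)
risks
```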
A sequence $(M_n)_{n \geq 0}$ is a martingale with respect to a filtration $(\mathcal{F}_n)$ if $M_n$ is $\mathcal{F}_n$-measurable, $E|M_n| < \infty$, and $E[M_{n+1} \mid \mathcal{F}_n] = M_n$ for all $n$.
If $(M_n)$ is a martingale with differences $D_i = M_i - M_{i-1}$, and under regularity conditions:
$$\frac{M_n}{\sqrt{\sum_{i=1}^n E[D_i^2 \mid \mathcal{F}_{i-1}]}} \xrightarrow{d} N(0, 1)$$
Key Condition: Lindeberg condition for triangular arrays: $$\frac{1}{s_n^2} \sum_{i=1}^n E[D_i^2 I(|D_i| > \epsilon s_n) \mid \mathcal{F}_{i-1}] \xrightarrow{p} 0$$
If $\tau$ is a stopping time and $(M_n)$ is a martingale, then under boundedness conditions:
$$E[M_\tau] = E[M_0]$$
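A simulation sketch of optional stopping, assuming a symmetric $\pm 1$ random walk stopped on first exit from $(-5, 5)$ (capped at 1000 steps so $\tau$ is bounded, which satisfies the boundedness condition):

```r
# E[M_tau] should equal E[M_0] = 0 for a bounded stopping time.
set.seed(11)
stopped_value <- function(max_steps = 1000, barrier = 5) {
  m <- 0
  for (step in 1:max_steps) {
    m <- m + sample(c(-1, 1), 1)
    if (abs(m) >= barrier) break  # tau: first exit from (-barrier, barrier)
  }
  m
}
vals <- replicate(5000, stopped_value())
mean(vals)  # near E[M_0] = 0
```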
In sequential mediation analysis or adaptive designs, martingale theory provides asymptotic normality (via the martingale CLT) and valid inference at data-dependent stopping times (via optional stopping).
#' Verify Martingale Property
#'
#' @param sequence Vector of observed martingale values
#' @return Mean-zero test on the increments
test_martingale <- function(sequence) {
  # Under the martingale property, the increments D_i = M_i - M_{i-1} satisfy
  # E[D_i | F_{i-1}] = 0. A simple necessary check is a t-test that the
  # increments have mean zero; this does not test the full conditional property.
  differences <- diff(sequence)
  m <- mean(differences)
  se <- sd(differences) / sqrt(length(differences))
  list(
    mean_differences = m,
    se_differences = se,
    t_stat = m / se,
    p_value = 2 * pnorm(-abs(m / se))
  )
}
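A self-contained illustration of the same increment-based check (using base R's `t.test` directly): a driftless Gaussian random walk is a martingale, while a walk with drift is not, and the test on increments separates the two.

```r
# t-test on increments: fails to reject for a fair walk, rejects under drift.
set.seed(3)
walk_fair  <- cumsum(rnorm(500))          # martingale
walk_drift <- cumsum(rnorm(500) + 0.3)    # drift: E[D_i] = 0.3, not a martingale
p_value <- function(s) t.test(diff(s))$p.value
c(fair = p_value(walk_fair), drift = p_value(walk_drift))
```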
A U-statistic of order $r$ is:
$$U_n = \binom{n}{r}^{-1} \sum_{1 \leq i_1 < \cdots < i_r \leq n} h(X_{i_1}, \ldots, X_{i_r})$$
where $h$ is a symmetric kernel function.
$$\text{Var}(U_n) = \binom{n}{r}^{-1} \sum_{c=1}^{r} \binom{r}{c} \binom{n-r}{r-c} \zeta_c$$
where $\zeta_c = \text{Cov}(h(X_1, \ldots, X_r), h(X_1, \ldots, X_c, X_{r+1}, \ldots, X_{2r-c}))$.
Under regularity conditions:
$$\sqrt{n}(U_n - \theta) \xrightarrow{d} N(0, r^2 \zeta_1)$$
where $\zeta_1$ is the first-order variance component.
The product of coefficients $\hat{a}\hat{b}$ can be viewed through a U-statistic lens for variance estimation.
#' Compute U-statistic with Bootstrap Variance
#'
#' @param data Data vector or matrix
#' @param kernel Symmetric kernel function
#' @param r Order of U-statistic
#' @param B Bootstrap replicates
#' @return U-statistic with variance estimate
compute_ustat <- function(data, kernel, r = 2, B = 1000) {
  n <- NROW(data)
  indices <- combn(n, r)
  subset_rows <- function(d, idx) if (is.matrix(d)) d[idx, ] else d[idx]
  # Compute the U-statistic: average of the kernel over all size-r subsets
  u_values <- apply(indices, 2, function(idx) kernel(subset_rows(data, idx)))
  u_stat <- mean(u_values)
  # Bootstrap variance: recompute the U-statistic on resampled data,
  # reusing the same index combinations
  boot_ustats <- replicate(B, {
    boot_data <- subset_rows(data, sample(n, replace = TRUE))
    mean(apply(indices, 2, function(idx) kernel(subset_rows(boot_data, idx))))
  })
  list(
    estimate = u_stat,
    se = sd(boot_ustats),
    ci = quantile(boot_ustats, c(0.025, 0.975))
  )
}
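A quick self-contained correctness check (a standard identity, computed directly rather than through the helper above): with the order-2 kernel $h(x, y) = (x - y)^2 / 2$, the U-statistic equals the unbiased sample variance exactly.

```r
# U-statistic with kernel (x - y)^2 / 2 reproduces var(x) (divisor n - 1).
set.seed(5)
x <- rnorm(30)
idx <- combn(30, 2)
u_stat <- mean(apply(idx, 2, function(ii) (x[ii[1]] - x[ii[2]])^2 / 2))
c(u_stat = u_stat, sample_var = var(x))  # identical up to floating point
```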
The Fisher information matrix is:
$$I(\theta)_{jk} = E\left[\frac{\partial \log f(X; \theta)}{\partial \theta_j} \frac{\partial \log f(X; \theta)}{\partial \theta_k}\right]$$
Or equivalently (under regularity):
$$I(\theta)_{jk} = -E\left[\frac{\partial^2 \log f(X; \theta)}{\partial \theta_j \partial \theta_k}\right]$$
For any unbiased estimator $\hat{\theta}$:
$$\text{Var}(\hat{\theta}) \geq I(\theta)^{-1}$$
An estimator achieving this bound is efficient.
For a mediated effect $\theta = ab$, the delta method combines the component uncertainties:
$$\text{Var}(\hat{a}\hat{b}) \approx \left(\frac{\partial(ab)}{\partial a}\right)^2 \text{Var}(\hat{a}) + \left(\frac{\partial(ab)}{\partial b}\right)^2 \text{Var}(\hat{b}) + 2\,\frac{\partial(ab)}{\partial a}\frac{\partial(ab)}{\partial b}\,\text{Cov}(\hat{a}, \hat{b}) = b^2 \text{Var}(\hat{a}) + a^2 \text{Var}(\hat{b}) + 2ab\,\text{Cov}(\hat{a}, \hat{b})$$
#' Compute Fisher Information Matrix Numerically
#'
#' @param log_likelihood Log-likelihood function(data, theta)
#' @param theta Parameter vector
#' @param data Data
#' @return Fisher information matrix and Cramér-Rao bound
fisher_information <- function(log_likelihood, theta, data) {
  k <- length(theta)
  eps <- .Machine$double.eps^(1/3)  # step size suited to central differences
  # Score function via central finite differences
  score <- function(x, th) {
    grad <- numeric(k)
    for (j in 1:k) {
      th_plus <- th_minus <- th
      th_plus[j] <- th[j] + eps
      th_minus[j] <- th[j] - eps
      grad[j] <- (log_likelihood(x, th_plus) - log_likelihood(x, th_minus)) / (2 * eps)
    }
    grad
  }
  # Average outer product of per-observation scores estimates I(theta)
  scores <- t(sapply(1:NROW(data), function(i) {
    score(if (is.matrix(data)) data[i, ] else data[i], theta)
  }))
  info_matrix <- (t(scores) %*% scores) / NROW(data)
  list(
    information = info_matrix,
    cramer_rao_bound = solve(info_matrix)
  )
}
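A self-contained sanity check of the same outer-product-of-scores recipe: for $X \sim N(\theta, 1)$ the score is $x - \theta$ and $I(\theta) = 1$, so the numerical estimate should be close to 1.

```r
# Numerical Fisher information for N(theta, 1) at theta = 2 via the
# average squared score; the exact value is I(theta) = 1.
set.seed(9)
x <- rnorm(5000, mean = 2)
loglik <- function(xi, th) dnorm(xi, mean = th, log = TRUE)
eps <- 1e-5
scores <- sapply(x, function(xi) {
  (loglik(xi, 2 + eps) - loglik(xi, 2 - eps)) / (2 * eps)
})
mean(scores^2)  # approximately 1
```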
Measurability: Random variables are measurable functions from $(\Omega, \mathcal{F})$ to $(\mathbb{R}, \mathcal{B})$.
Dominated Convergence Theorem: If $|f_n| \leq g$ with $E[g] < \infty$ and $f_n \to f$ a.s., then $E[f_n] \to E[f]$.
Fubini-Tonelli Theorem: For non-negative measurable functions or integrable functions: $$\int \left(\int f(x, y) \, d\mu(x)\right) d\nu(y) = \int \left(\int f(x, y) \, d\nu(y)\right) d\mu(x)$$
Causal identification often requires measure-theoretic care: well-defined conditional distributions, absolute continuity (positivity) of intervention distributions, and valid interchange of limits and integrals.
If $P \ll Q$ ($P$ is absolutely continuous with respect to $Q$), the Radon-Nikodym derivative $\frac{dP}{dQ}(x)$ exists and plays the role of a likelihood ratio.
Critical for likelihood-ratio methods, importance sampling, and change-of-measure arguments.
Version: 1.0.0
Created: 2025-12-08
Domain: Mathematical statistics foundations for methodology research
Prerequisites: Probability theory, linear algebra, real analysis