From quoth
Hierarchical Thompson sampling with cluster-level posteriors + 10% exploration + SNIPS counterfactual updates. Use when building retrieval/recommendation systems with implicit feedback that need to balance exploitation with exploration at scale (10k+ items).
Slash command
/quoth:contextual-bandits
- Large item catalog (10k+) where per-item Beta(α,β) is infeasible as sole signal
Per-item LinTS stores an O(d²) matrix per arm: at d=1024 that is about 8 MB per item (assuming 8-byte floats), so 100k items need ≈ 800GB. Infeasible.
Hierarchical decomposition:
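Memory at 100k items, K=316 clusters (≈ √100k): ~5KB of cluster stats, i.e. one (α, β) pair per cluster. K here counts clusters; the K=3 used below counts result slots.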
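The two memory figures can be checked with a few lines of arithmetic. This is a rough sketch, assuming 8-byte floats and two stored parameters (α, β) per cluster; neither assumption is stated in the source, and the function names are hypothetical.

```javascript
// Assumption: every number is stored as an 8-byte float64.
const BYTES_PER_FLOAT = 8;

// Per-item LinTS: one d x d matrix per arm.
function linTsBytes(d, items) {
  return d * d * BYTES_PER_FLOAT * items;
}

// Cluster-level Beta posteriors: two parameters (alpha, beta) per cluster.
function clusterBetaBytes(clusters) {
  return clusters * 2 * BYTES_PER_FLOAT;
}

const linTsGB = linTsBytes(1024, 100_000) / 1e9;
const clusterKB = clusterBetaBytes(316) / 1e3;
console.log(linTsGB.toFixed(0), "GB vs", clusterKB.toFixed(1), "KB");
// prints: 839 GB vs 5.1 KB
```

Under these assumptions the per-item matrices come to about 839 GB, which the text rounds to 800GB, while the cluster stats stay near 5KB.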
Input: candidates (pre-filtered via HNSW top-N), clusterMap, K=3, queryEmbedding
1. Group candidates by cluster_id
2. For each cluster c: sample s_c ~ Beta(α_c, β_c)
3. Sort clusters by s_c desc
4. From each cluster (top-sampled first), rank items by:
   score = 0.6·cosine(query, item.embedding) + 0.4·(α_i/(α_i+β_i))
5. Take top items until K reached; record the cluster-level and within-cluster propensities
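The five steps above can be sketched roughly as follows. Several details are assumptions, not taken from the source: the item shape `{ id, embedding, alpha, beta }`, the `clusterStats` map, the Beta(1,1) fallback for clusters without stats, and the choice to exhaust each cluster in sampled order before moving to the next. The Beta sampler is passed in as a parameter so the caller decides how to draw.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// candidates:   [{ id, embedding, alpha, beta }]   (hypothetical item shape)
// clusterMap:   Map(itemId -> clusterId)
// clusterStats: Map(clusterId -> { alpha, beta })  (hypothetical)
// sampleBeta:   (alpha, beta) => number in (0, 1), injected by the caller
function selectHierarchical(candidates, clusterMap, clusterStats, queryEmbedding, sampleBeta, K = 3) {
  // 1. Group candidates by cluster id.
  const groups = new Map();
  for (const item of candidates) {
    const c = clusterMap.get(item.id);
    if (!groups.has(c)) groups.set(c, []);
    groups.get(c).push(item);
  }
  // 2. Sample s_c ~ Beta(alpha_c, beta_c); assumed Beta(1,1) prior if stats are missing.
  const sampled = [...groups.keys()].map((c) => {
    const { alpha, beta } = clusterStats.get(c) ?? { alpha: 1, beta: 1 };
    return { clusterId: c, s: sampleBeta(alpha, beta) };
  });
  // 3. Sort clusters by sampled value, descending.
  sampled.sort((x, y) => y.s - x.s);
  const sumS = sampled.reduce((acc, x) => acc + x.s, 0);

  const selected = [];
  for (const { clusterId, s } of sampled) {
    // 4. Rank items inside the cluster by the blended score.
    const ranked = groups.get(clusterId)
      .map((item) => ({
        item,
        score: 0.6 * cosine(queryEmbedding, item.embedding) +
               0.4 * (item.alpha / (item.alpha + item.beta)),
      }))
      .sort((x, y) => y.score - x.score);
    // 5. Fill slots until K is reached, keeping what the propensity needs.
    for (let r = 0; r < ranked.length && selected.length < K; r++) {
      selected.push({
        id: ranked[r].item.id,
        clusterId,
        clusterShare: s / sumS,     // s_c / sum of all sampled s
        rankWithin: r + 1,          // 1-based rank inside the cluster
        clusterSize: ranked.length, // cluster size among the candidates
      });
    }
    if (selected.length >= K) break;
  }
  return selected;
}
```

Each returned slot carries the cluster share, the within-cluster rank, and the cluster size, which are the inputs the propensity estimate needs.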
Critical for counterfactual updates (SNIPS):
θ_i ≈ (s_c_i / Σs) × (1 / (rank_within × |cluster|))
Clip θ_i to a floor of 0.01, i.e. θ_i ← max(θ_i, 0.01), to prevent weight explosion (the weight 1/θ_i then never exceeds 100).
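A minimal sketch of the propensity estimate with its floor. The function and parameter names are hypothetical; only the formula and the 0.01 floor come from the text.

```javascript
// theta_i ~ (s_c / sum of s) * (1 / (rankWithin * clusterSize)), floored at minPropensity.
function propensity(clusterSample, sumSamples, rankWithin, clusterSize, minPropensity = 0.01) {
  const clusterPart = clusterSample / sumSamples;    // chance the cluster is preferred
  const withinPart = 1 / (rankWithin * clusterSize); // chance of the slot inside the cluster
  return Math.max(clusterPart * withinPart, minPropensity);
}

console.log(propensity(0.5, 1, 1, 2));  // 0.25
console.log(propensity(0.1, 1, 5, 10)); // 0.01 (raw value 0.002 hits the floor)
```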
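Beta sampling via two Gamma draws (Marsaglia-Tsang gamma method):
// Beta(α, β) sample as the ratio g1 / (g1 + g2) of two independent Gamma draws.
// sampleGamma(shape) is the Marsaglia-Tsang Gamma(shape, 1) sampler, defined separately.
function sampleBeta(α, β) {
  const g1 = sampleGamma(α), g2 = sampleGamma(β)
  return g1 / (g1 + g2)
}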
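The snippet above relies on a `sampleGamma` helper that is not shown. One possible version, following the published Marsaglia-Tsang squeeze-rejection scheme, is sketched below. The Box-Muller normal sampler, the injectable `rand` parameter, and the ASCII parameter names are assumptions added for illustration, not part of the source.

```javascript
// Standard normal draw via the Box-Muller transform.
function sampleNormal(rand = Math.random) {
  let u = 0;
  while (u === 0) u = rand(); // avoid log(0)
  const v = rand();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Gamma(shape, 1) draw using the Marsaglia-Tsang method.
function sampleGamma(shape, rand = Math.random) {
  if (shape < 1) {
    // Boost for shape < 1: Gamma(a) = Gamma(a + 1) * U^(1/a).
    let u = 0;
    while (u === 0) u = rand();
    return sampleGamma(shape + 1, rand) * Math.pow(u, 1 / shape);
  }
  const d = shape - 1 / 3;
  const c = 1 / Math.sqrt(9 * d);
  for (;;) {
    const x = sampleNormal(rand);
    const v = Math.pow(1 + c * x, 3);
    if (v <= 0) continue;                        // proposal outside support, retry
    const u = rand();
    if (u < 1 - 0.0331 * x ** 4) return d * v;   // cheap squeeze test
    if (Math.log(u) < 0.5 * x * x + d * (1 - v + Math.log(v))) return d * v;
  }
}

// Beta(alpha, beta) as a ratio of two independent Gamma draws.
function sampleBeta(alpha, beta, rand = Math.random) {
  const g1 = sampleGamma(alpha, rand);
  const g2 = sampleGamma(beta, rand);
  return g1 / (g1 + g2);
}
```

Passing `rand` explicitly makes it possible to plug in a seeded generator for reproducible runs; with the default, draws come from `Math.random`.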
Why: without exploration, the system converges on whatever was initially popular. Exploration creates clean counterfactual data for unbiased SNIPS updates.
Mechanism: with probability ε=0.10, replace one of the K=3 ranked slots with a uniformly random candidate from the pool (excluding already-selected).
IF random() < ε:
    slot = random(0, K-1)
    replacement = uniform_random_from(pool - selected)
    selected[slot] = replacement    # mark is_exploration=true
    propensity = ε / |available|
Why this matters for SNIPS: without exploration, the propensity of items the ranker rarely picks approaches 0, making SNIPS weights (1/θ) unbounded. Exploration guarantees θ_i ≥ ε / pool_size, capping SNIPS weights at pool_size / ε, which is 100-1000 for pools of 10-100 candidates at ε=0.10.
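The exploration step can be sketched as below. The function name, the slot record shape, and the injectable `rand` are hypothetical. Non-exploration slots keep `propensity: null` here on the assumption that their propensity is assigned by the ranking step.

```javascript
// selected: array of K item ids (the ranked slots); pool: array of all candidate ids.
// rand: () => number in [0, 1), injectable so tests can be deterministic.
function injectExploration(selected, pool, epsilon = 0.10, rand = Math.random) {
  const result = selected.map((id) => ({ id, isExploration: false, propensity: null }));
  if (rand() >= epsilon) return result;        // exploit: leave all slots untouched

  const chosen = new Set(selected);
  const available = pool.filter((id) => !chosen.has(id));
  if (available.length === 0) return result;   // nothing left to explore

  const slot = Math.floor(rand() * selected.length); // uniform slot in 0..K-1
  const replacement = available[Math.floor(rand() * available.length)];
  result[slot] = {
    id: replacement,
    isExploration: true,
    propensity: epsilon / available.length,    // epsilon / |available|
  };
  return result;
}
```

With ε=0.10 and 50 unselected candidates, an exploration slot gets propensity 0.002, so its unclipped SNIPS weight is 500.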
At injection time, persist per-slot:
INSERT INTO injection_log (session_id, pattern_id, cluster_id, rank, propensity, is_exploration, query_text, injected_at)
VALUES (?, ?, ?, ?, ?, ?, ?, CURRENT_TIMESTAMP)
Critical for offline SNIPS evaluation — DO NOT drop this log.
Problem: we log injections with propensities θ_i and observe rewards r_i. Naive IPS (1/N) Σ r_i / θ_i has unbounded variance when θ_i is small.
SNIPS (Swaminathan & Joachims 2015):
r̂(cluster) = Σ_i (w_i · r_i) / Σ_i w_i where w_i = clip(1/θ_i, cap)
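Self-normalization does not remove the bias introduced by clipping, and SNIPS itself is slightly biased (though consistent without clipping). What it buys is much lower variance than IPS: the estimate is a weighted average of the observed rewards, so it always stays within their range. Reportedly common in production at Netflix/Spotify.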
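A minimal sketch of the SNIPS estimate next to naive IPS for comparison. The observation shape `{ reward, propensity }` and the default `cap = 100` are assumptions; the source does not give a cap value.

```javascript
// observations: [{ reward, propensity }]; cap bounds each importance weight.
function snips(observations, cap = 100) {
  let num = 0, den = 0;
  for (const { reward, propensity } of observations) {
    const w = Math.min(1 / propensity, cap); // w_i = clip(1/theta_i, cap)
    num += w * reward;
    den += w;
  }
  return den === 0 ? null : num / den;       // null when there is nothing to average
}

// Naive IPS: (1/N) * sum of r_i / theta_i, with no clipping or normalization.
function ips(observations) {
  if (observations.length === 0) return null;
  const total = observations.reduce((acc, o) => acc + o.reward / o.propensity, 0);
  return total / observations.length;
}

const logged = [
  { reward: 1, propensity: 0.5 },
  { reward: 0, propensity: 0.25 },
];
console.log(ips(logged));   // 1
console.log(snips(logged)); // one third: weights 2 and 4, so 2 / 6
```

In this toy log the IPS value sits at the edge of the reward range while the SNIPS value lies inside it, which is the stabilizing effect of dividing by Σ w_i rather than N.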
Given n observations and SNIPS estimate r̂:
α_new = α_old + n · r̂
β_new = β_old + n · (1 - r̂)
Cap n ≤ 10 per batch to prevent overshoot from correlated samples.
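The update rule and the batch cap can be sketched as follows, assuming r̂ lies in [0, 1]. The function name and the posterior object shape are hypothetical.

```javascript
// Fold a SNIPS estimate rHat into a Beta posterior as n pseudo-observations.
function updatePosterior({ alpha, beta }, rHat, n, maxN = 10) {
  const nEff = Math.min(n, maxN); // cap n at 10 per batch
  return {
    alpha: alpha + nEff * rHat,
    beta: beta + nEff * (1 - rHat),
  };
}

console.log(updatePosterior({ alpha: 1, beta: 1 }, 0.5, 4));   // { alpha: 3, beta: 3 }
console.log(updatePosterior({ alpha: 1, beta: 1 }, 0.25, 50)); // { alpha: 3.5, beta: 8.5 }
```

In the second call the 50 observations count as only 10, so a single large batch cannot move the posterior further than ten pseudo-observations.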
ESS = (Σw)² / Σw²
If ESS << n, the weights are concentrated (a few observations dominate), so the estimate is less reliable and its confidence interval should be widened.
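A minimal sketch of the ESS check, assuming the inputs are the same clipped weights w_i used in the SNIPS estimate. The function name is hypothetical.

```javascript
// Effective sample size: ESS = (sum of w)^2 / (sum of w^2).
function effectiveSampleSize(weights) {
  let sum = 0, sumSq = 0;
  for (const w of weights) {
    sum += w;
    sumSq += w * w;
  }
  return sumSq === 0 ? 0 : (sum * sum) / sumSq;
}

console.log(effectiveSampleSize([1, 1, 1, 1]));   // 4
console.log(effectiveSampleSize([100, 1, 1, 1])); // about 1.06
```

Equal weights give ESS = n, while one dominant weight pulls ESS toward 1 even though four observations were logged.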
npx claudepluginhub montinou/quoth