Help us improve
Share bugs, ideas, or general feedback.
From coscientist-deep-research
Runs a checklist of named adversarial attacks against a paper or manuscript — p-hacking, HARKing, selective baselines, missing controls, underpowered, circular reasoning, oversold deltas, irreproducibility. Each attack returns either "pass", "minor", or "fatal" with evidence. Used by the `red-team` sub-agent.
npx claudepluginhub epireve/coscientist --plugin coscientist-deep-researchHow this skill is triggered — by the user, by Claude, or both
Slash command
/coscientist-deep-research:attack-vectorsThe summary Claude sees in its skill listing — used to decide when to auto-load this skill
Generic critique is cheap. Named attacks are useful. This skill runs a structured checklist of known methodological failure modes and produces an attack log with severity per attack.
Creates p5.js generative art with seeded randomness, noise fields, and interactive parameter exploration. Use for algorithmic art, flow fields, or particle systems.
Share bugs, ideas, or general feedback.
Generic critique is cheap. Named attacks are useful. This skill runs a structured checklist of known methodological failure modes and produces an attack log with severity per attack.
| # | Attack | What it catches | Evidence required |
|---|---|---|---|
| 1 | p-hacking | Multiple comparisons without correction; unexplained p-values clustered near 0.05 | Report must show correction (Bonferroni, FDR, pre-registration); OR suspicious p-value distribution |
| 2 | HARKing | Hypothesizing After Results Known — hypothesis framed to match findings | Pre-registration present? Hypotheses in intro match exploratory analyses? |
| 3 | Selective baselines | Compared only to weak baselines; SOTA baseline omitted | List of baselines actually used vs. current literature's strong baselines |
| 4 | Missing controls | No negative control, no ablation, no placebo condition | Controls section present and adequate for the claim |
| 5 | Confounders | Known confounding variables not measured or controlled | Methods section addresses the obvious confounders for this domain |
| 6 | Underpowered | Sample size too small to detect claimed effect | Power analysis reported; OR n justified by pre-specified criterion |
| 7 | Circular reasoning | Evaluation uses training data, or defines the outcome in terms of the predictor | Data splits explicit; outcome and predictor independent |
| 8 | Oversold delta | Abstract claims larger improvement than tables show | Abstract numbers match table numbers; headline claim within CI |
| 9 | Irreproducibility | No code, no data, insufficient method detail | Code link valid? Data available? Hyperparameters complete? |
| 10 | Cherry-picked test set | Performance on one favorable test set generalized | Multiple datasets tested, or one dataset explicitly justified |
| 11 | Inappropriate statistics | Wrong test for the data distribution or sample size | Test matches data type; assumptions checked |
| 12 | Goodhart's law | Optimizes a metric that doesn't capture the stated goal | Metric discussed as a proxy; secondary metrics reported |
Add more attacks per domain. Keep the checklist small and sharp, not comprehensive-but-useless.
content.md + metadata.json + any figures/, tables/, equations.json.pass, minor, or fatal, with one-sentence evidence.fatal findings, steelman the paper first — is there a reading under which this isn't a fatal flaw? If yes, demote to minor.uv run python .claude/skills/attack-vectors/scripts/check.py \
--input /tmp/attack-findings.json \
--target-canonical-id <cid>
The checker validates structure and writes the attack log to the paper's artifact under attack_findings.json. On fatal findings, it logs a row in the attack_findings table with severity='fatal'.
From RESEARCHER.md: 4 (Tension, not fake consensus), 8 (Steelman before attack).
publishability-check)novelty-check)red-team sub-agent turns into a review