From sensei
Evaluate what the tests actually prove and what they miss. Use when reviewing a PR's test coverage, when a developer says "I wrote tests", or when you want to challenge the developer to reason about test quality, not just test quantity.
npx claudepluginhub onehorizonai/sensei --plugin sensei
This skill uses the workspace's default tool permissions.
Evaluate what the tests prove and what they fail to prove.
A passing test is not the same as a test that verifies behavior.
A suite that checks only the happy path, mocks every dependency, and never exercises error paths can report 80% coverage while proving almost nothing real.
The question is not "did the tests pass?" but "what would have to be true of the code for these tests to catch a regression?"
Ask the developer to answer that question before reviewing the tests yourself.
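A minimal sketch of the gap described above, using hypothetical names (charge, PaymentGateway-style fakes are illustrative, not from any real library): a happy-path test with a fully mocked dependency covers most lines yet proves nothing about the error paths, which is exactly what the questions below are meant to surface.

```python
class GatewayError(Exception):
    pass

def charge(gateway, amount):
    """Submit a charge; reject bad amounts, swallow gateway failures."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    try:
        return gateway.submit(amount)
    except GatewayError:
        return None  # silently swallowed -- is this the intended behavior?

class FakeGateway:
    # A mock that always succeeds: the only path it can ever exercise.
    def submit(self, amount):
        return {"status": "ok", "amount": amount}

def test_charge_happy_path():
    # Covers most lines of charge(), proves nothing about either error path.
    assert charge(FakeGateway(), 10)["status"] == "ok"

# The tests a regression would actually need: pin the error paths too.
def test_charge_rejects_nonpositive_amount():
    try:
        charge(FakeGateway(), 0)
        assert False, "expected ValueError"
    except ValueError:
        pass

class FailingGateway:
    # A mock that fails, so the except branch is exercised for real.
    def submit(self, amount):
        raise GatewayError("declined")

def test_charge_returns_none_on_gateway_failure():
    assert charge(FailingGateway(), 10) is None
```

Run only the first test and a regression in either error branch passes review unnoticed; run all three and each branch of charge() is pinned.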
What does each test actually verify?
What is not tested?
Are security-sensitive behaviors tested?
Are characterization tests present for legacy or unfamiliar code?
Are the mocks meaningful?
Is this the right test level?
What is the failure mode?
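The "are the mocks meaningful?" question can be made concrete with a hypothetical sketch (all names are illustrative): a mock that embeds a wrong assumption about the dependency's return shape lets the test pass while the code would fail against the real dependency.

```python
def top_user_name(client):
    # Production code assumes list_users() returns a bare list of dicts.
    users = client.list_users()
    return users[0]["name"]

class OptimisticMock:
    # Embeds the same assumption as the code under test. If the real API
    # actually wraps results as {"items": [...]}, this mock hides the bug.
    def list_users(self):
        return [{"name": "ada"}]

def test_top_user_name_with_optimistic_mock():
    # Passes -- but only because mock and code share the same wrong guess.
    assert top_user_name(OptimisticMock()) == "ada"

class RealisticMock:
    # Mirrors the (assumed) real wire format instead; now the test fails
    # with a KeyError, exposing the mismatch before production does.
    def list_users(self):
        return {"items": [{"name": "ada"}]}
```

When reviewing mocks, ask what the real dependency returns and whether anyone has checked the mock against it; a mock copied from the code's own assumptions can only confirm those assumptions.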
Plain-English takeaway:
[Whether a non-technical owner should feel confident, cautious, or blocked]
What the tests prove:
[Concrete list — not "tests happy path" but "proves that X returns Y when Z"]
What the tests do not prove:
[Specific uncovered scenarios]
Characterization coverage:
[If legacy or unfamiliar code changed: what current behavior was pinned before the change, "missing", or "not applicable"]
Security coverage:
[What sign-in, permission, input, secret, privacy, or customer-boundary behavior is proven, missing, or not applicable]
Riskiest uncovered case:
[The one missing test most likely to correspond to a real bug]
Mock quality:
[Are mocks realistic? What assumptions do they embed?]
Evidence to add next:
[The smallest test or check that would reduce the biggest risk]
Question for you:
[A specific question about what the developer was trying to prove]