flaker
flaker is a test-intelligence toolkit for:
- sampling a smaller local test run from history and changed files
- detecting flaky tests in noisy CI environments
- measuring how well local sampled runs predict CI
- embedding the same core logic in MoonBit as a library
It is designed for repositories where:
- the full test suite is too expensive to run on every change
- CI failures are noisy because flaky tests are mixed with real regressions
- developers need a smaller local test run that still correlates well with CI
flaker helps answer:
- Which tests should I run for this change?
- How much can I shrink local execution without losing too much confidence?
- Which tests are actually flaky?
- How well does local sampled execution predict CI outcomes?
Upgrading from 0.0.x / 0.1.x? See docs/how-to-use.md#config-migration for the full key rename map. Starting with 0.2.0, the CLI refuses to start on legacy configs and points to the migration guide.
Upgrading from 0.4.x? See docs/migration-0.4-to-0.5.md or docs/migration-0.4-to-0.5.ja.md. 0.5.x keeps existing profiles working, but the recommended user-facing commands are now gate-oriented.
Install as a CLI
pnpm add -D @mizchi/flaker
Or run it without installing:
pnpm dlx @mizchi/flaker --help
Requirements:
Install as a Claude Code plugin
This repo also ships a Claude Code plugin with two skills:
- flaker-setup: Introduce flaker on a fresh repository. Covers the Day 0 → Week 4 onboarding flow, decision points, copy-paste commands, and pitfalls.
- flaker-management: Operate flaker after setup. Covers advisory vs required gating, nightly triage, quarantine, flaky tag management, and staged Playwright E2E / VRT rollout.
# In Claude Code
/plugin marketplace add mizchi/flaker
/plugin install flaker@flaker
Then ask the agent something like:
- "I want to set up flaker on a new project"
- "I want to decide the conditions for promoting flaker's advisory gate to required"
- "I want to design nightly triage for E2E VRT"
The setup reference checklist lives at docs/new-project-checklist.ja.md and docs/new-project-checklist.md.
The 0.4.x -> 0.5.x migration guide lives at docs/migration-0.4-to-0.5.ja.md and docs/migration-0.4-to-0.5.md.
The user guide lives at docs/usage-guide.ja.md and docs/usage-guide.md.
The operations guide lives at docs/operations-guide.ja.md and docs/operations-guide.md.
The operations quick start lives at docs/flaker-management-quickstart.ja.md and docs/flaker-management-quickstart.md.
Use as a MoonBit Library
flaker also publishes a MoonBit library surface at mizchi/flaker.
The root package re-exports both:
- pure computation APIs
- the shared contract types they consume and return
If you prefer a stricter import boundary, the same types are still available
from mizchi/flaker/contracts.
import {
  "mizchi/flaker" @flaker,
}

test "sample from historical runs" {
  let meta = @flaker.build_sampling_meta(
    [
      @flaker.SamplingHistoryRowInput::{
        suite: "tests/login.spec.ts",
        test_name: "login works",
        task_id: Some("web-login"),
        filter: None,
        variant: None,
        test_id: None,
        status: "passed",
        retry_count: 0,
        duration_ms: 1200,
        created_at: "2026-04-03T00:00:00.000Z",
      },
    ],
    [
      @flaker.SamplingListedTestInput::{
        suite: "tests/login.spec.ts",
        test_name: "login works",
        task_id: Some("web-login"),
        filter: None,
        variant: None,
        test_id: None,
      },
    ],
  )
  let sampled = @flaker.sample_weighted(meta, count=1, seed=1UL)
  assert_eq(sampled.length(), 1)
}
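To make the idea behind a seeded, weighted sample concrete, here is a minimal TypeScript sketch. It is not flaker's implementation: the `TestMeta` shape, the `failureRate` weight, the `+ 0.01` floor, and the `sampleWeighted` helper are all assumptions made for illustration. It only shows why passing a seed makes the selection reproducible and how tests with a worse history can be favored without starving the rest.

```typescript
// Hypothetical shape for illustration; not flaker's real contract types.
interface TestMeta {
  testName: string;
  failureRate: number; // assumed: fraction of historical runs that failed, in [0, 1]
}

// mulberry32: a small deterministic PRNG, so the same seed yields the same sample.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Weighted sampling without replacement: tests with a higher failure rate are
// more likely to be picked, but every test keeps a non-zero chance.
function sampleWeighted(tests: TestMeta[], count: number, seed: number): TestMeta[] {
  const rand = mulberry32(seed);
  const pool = tests.map((t) => ({ test: t, weight: t.failureRate + 0.01 }));
  const picked: TestMeta[] = [];
  while (picked.length < count && pool.length > 0) {
    const total = pool.reduce((sum, p) => sum + p.weight, 0);
    let r = rand() * total;
    let index = pool.length - 1; // fallback guards against floating-point drift
    for (let i = 0; i < pool.length; i++) {
      r -= pool[i].weight;
      if (r < 0) {
        index = i;
        break;
      }
    }
    picked.push(pool[index].test);
    pool.splice(index, 1); // remove so the same test is never picked twice
  }
  return picked;
}
```

Calling `sampleWeighted(tests, 2, 1)` twice with the same inputs returns the same two tests, which is the property that makes a sampled local run repeatable when debugging.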
The root library surface intentionally re-exports pure logic only:
- flaky detection: detect_flaky
- sampling: build_sampling_meta, sample_random, sample_weighted, sample_hybrid
- affected analysis: resolve_affected, build_affected_report, build_affected_report_from_input
- stable identity: create_stable_test_id, resolve_test_identity
- graph helpers: find_affected_nodes, expand_transitive, topological_sort
- report reducers: summarize_report, classify_report_diff, aggregate_report
- policy: summarize_quarantine, compute_quarantine_exit_code, run_config_check
- metrics: build_sampling_kpi
Contracts remain separate so the API boundary stays explicit and reusable from
other packages.
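As a rough illustration of what "pure logic" means for something like flaky detection, the TypeScript sketch below classifies tests from history rows alone, with no I/O. It is a heuristic written for this guide, not the behavior of detect_flaky: the `HistoryRow` and `FlakyVerdict` shapes, the `minRuns` threshold, and the two signals (mixed pass/fail outcomes, or a pass that needed a retry) are all assumptions.

```typescript
// Hypothetical row shape, loosely modelled on the history fields shown in the
// MoonBit example (suite, test name, status, retry count).
interface HistoryRow {
  suite: string;
  testName: string;
  status: "passed" | "failed";
  retryCount: number;
}

interface FlakyVerdict {
  key: string;
  runs: number;
  failures: number;
  passedAfterRetry: number;
  flaky: boolean;
}

// Assumed heuristic: a test is flagged when it has enough runs and either
// shows both passes and failures, or passed only after at least one retry.
function detectFlakySketch(rows: HistoryRow[], minRuns = 3): FlakyVerdict[] {
  const byTest = new Map<string, FlakyVerdict>();
  for (const row of rows) {
    const key = `${row.suite} > ${row.testName}`;
    const verdict =
      byTest.get(key) ?? { key, runs: 0, failures: 0, passedAfterRetry: 0, flaky: false };
    verdict.runs += 1;
    if (row.status === "failed") verdict.failures += 1;
    if (row.status === "passed" && row.retryCount > 0) verdict.passedAfterRetry += 1;
    byTest.set(key, verdict);
  }
  byTest.forEach((verdict) => {
    const mixedOutcomes = verdict.failures > 0 && verdict.failures < verdict.runs;
    verdict.flaky =
      verdict.runs >= minRuns && (mixedOutcomes || verdict.passedAfterRetry > 0);
  });
  return Array.from(byTest.values());
}
```

A test that fails on every run is reported as not flaky here, since a consistent failure looks like a real regression rather than noise. A heuristic this simple cannot tell a flaky test from one that was broken and then fixed within the window, so a real detector would need more context, such as the commit each run belongs to.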
Experimental Direct MoonBit CLI