From flaker
Manages @mizchi/flaker operations post-setup: daily runs, flaky metrics review, advisory/required CI gate design, Playwright E2E/VRT promotions/demotions, PR budget tuning, nightly triage, quarantine/@flaky tags in OSS repos.
npx claudepluginhub mizchi/flaker --plugin flaker
This skill uses the workspace's default tool permissions.
`flaker-management` is the operational companion to `flaker-setup`.
- assets/flaker.playwright-vrt-bootstrap.toml
- assets/flaker.playwright-vrt-standard.toml
- assets/flaker.playwright-vrt-strict.toml
- assets/test-contract-template.md
- assets/weekly-review-template.md
- assets/workflow-nightly-learning.yml
- assets/workflow-pr-advisory.yml
- references/management-guide.ja.md
- references/presets.ja.md
- references/theory.ja.md
- `flaker-setup`: install, initialize, and wire the first advisory lane.
- `flaker-management`: run the lane over time, review health via drift, promote or demote checks, and keep flaky tests from eroding trust.

If the repository has no `flaker.toml` and no CI lane yet, use `flaker-setup` first.
- `flaker.toml` is the desired state (gates, profiles, `[promotion]` thresholds, `[quarantine].auto`).
- `flaker apply` is the reconciler: idempotent and safe to run hourly, daily, or on demand. It auto-runs collect / calibrate / quarantine apply as needed based on current DB state.
- `flaker status` is the drift detector: it reports which `[promotion]` thresholds are unmet, so promotion readiness is a boolean (ready / not ready), not a judgement call.

The canonical daily loop is:
flaker apply && flaker status
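Where the loop is driven from a scheduler script rather than a shell one-liner, the same sequencing can be sketched in Python. This is a minimal sketch and not part of flaker: `run_command`, `daily_loop`, and the injectable `run` parameter are hypothetical names, and the only flaker invocations used are the two commands in the loop above. It assumes both commands report failure through a nonzero exit code, which is what the `&&` form relies on.

```python
import subprocess
from typing import Callable, Sequence


def run_command(argv: Sequence[str]) -> int:
    """Hypothetical helper: run a command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def daily_loop(run: Callable[[Sequence[str]], int] = run_command) -> int:
    """Mirror `flaker apply && flaker status`: skip status if apply fails."""
    code = run(["flaker", "apply"])
    if code != 0:
        return code  # same short-circuit behaviour as `&&`
    return run(["flaker", "status"])
```

Passing a fake `run` callable lets the ordering be checked without the CLI installed; with the default runner, `daily_loop()` returns the exit code of whichever command ran last.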
- `../../docs/operations-guide.ja.md` or `../../docs/operations-guide.md` first, depending on the user's language.
- `../../docs/flaker-management-quickstart.ja.md` or `../../docs/flaker-management-quickstart.md` for the first 10 minutes.
- `references/management-guide.ja.md` for the full operating model.
- `references/theory.ja.md`.
- `references/presets.ja.md`.
- `assets/` instead of rewriting them.

- `flaker.toml`, especially the `[promotion]` thresholds (defaults are documented; overriding signals intent)
- `pull_request`, `push`, `schedule`
- `flaker status` output (drift + activity + health in one page)
- `flaker status --gate merge --detail --json` when you need exact promotion metrics
- `flaker ops weekly` for quarantine / flaky trend bundles
- `@flaky` tagging or quarantine manifest is already in use

When applying this skill, return:
- learning / verdict / rebalance
- `[promotion]` in `flaker.toml`; override only with justification)
- `flaker apply && flaker status`)
- `flaker` commands, config, and workflow snippets

- Do not promote `--gate merge` to required until `flaker status` drift reports ready.

`analyze kpi`, `analyze eval`, `collect ci`, `debug doctor`, `quarantine suggest/apply`, and `gate review/history/explain` all print deprecation warnings in 0.7.0 and will be removed in 0.8.0. Use the primary commands instead.

# Daily
flaker apply
flaker status
# Weekly operator review
flaker status --markdown > .artifacts/status-weekly.md
flaker ops weekly --output .artifacts/flaker-weekly.md
flaker status --gate merge --detail --json > .artifacts/merge-gate.json
# Promotion snapshot (authoritative metrics)
flaker gate review merge --json > .artifacts/gate-review-merge.json # DEPRECATED in 0.7.0; use status --gate merge --detail --json
# Incident
flaker debug retry
flaker debug confirm "<suite>:<test>" --repeat 10
flaker debug bisect --test "<name>"
Note: ops daily / weekly / incident are still first-class primary commands — apply does NOT emit the daily artifact yet. Use them directly.
Promote --gate merge advisory → required iff flaker status drift reports ready (all 5 [promotion] thresholds met). Primary signal is flaker status — the drift section shows ready or lists unmet thresholds:
- `matched_commits` ≥ `[promotion].matched_commits_min` (default 20)
- `false_negative_rate` ≤ `[promotion].false_negative_rate_max_percentage` (default 5%)
- `pass_correlation` ≥ `[promotion].pass_correlation_min_percentage` (default 95%)
- `holdout_fnr` ≤ `[promotion].holdout_fnr_max_percentage` (default 10%)
- `data_confidence` ≥ `[promotion].data_confidence_min` (default moderate)

Demote back to advisory when ANY of the following holds for 1+ week:
Common mistakes to avoid:

- Running `flaker collect ci` / `flaker collect calibrate` (deprecated in 0.7.0) in daily cron when `flaker apply` already handles the ordering and idempotency.
- Using `flaker analyze kpi` (deprecated) instead of `flaker status`, or `flaker analyze eval --markdown` (deprecated) instead of `flaker status --markdown`.
- Relying on `flaker status` numbers alone when they look close: `flaker status --gate merge --detail --json` is the authoritative source for exact values (the deprecated `flaker gate review merge --json` form also still works, with a stderr warning).
- `flaker status` drift `holdout_fnr` when `holdout_ratio = 0`: if holdout isn't configured, the threshold cannot be evaluated and drift treats it as unmet. Either configure `[sampling].holdout_ratio` or accept that holdout FNR will gate promotion.
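The five `[promotion]` thresholds and the holdout rule can be read as one boolean check. The sketch below illustrates that logic only; it is not flaker's implementation. The threshold keys and defaults come from the list above, while the `metrics` dictionary shape, the function names, and the ordering of confidence levels in `CONFIDENCE_ORDER` are assumptions made for illustration (only `moderate` is named in this document). Rates are assumed to be percentages, matching the threshold units.

```python
from typing import Any, List, Mapping

# Defaults as documented for the [promotion] section.
DEFAULT_PROMOTION = {
    "matched_commits_min": 20,
    "false_negative_rate_max_percentage": 5.0,
    "pass_correlation_min_percentage": 95.0,
    "holdout_fnr_max_percentage": 10.0,
    "data_confidence_min": "moderate",
}

# ASSUMPTION: hypothetical ordering; only "moderate" appears in the source.
CONFIDENCE_ORDER = ["low", "moderate", "high"]


def unmet_thresholds(
    metrics: Mapping[str, Any],
    promotion: Mapping[str, Any] = DEFAULT_PROMOTION,
) -> List[str]:
    """Return the names of thresholds that are not met (empty means ready)."""
    unmet = []
    if metrics["matched_commits"] < promotion["matched_commits_min"]:
        unmet.append("matched_commits")
    if metrics["false_negative_rate"] > promotion["false_negative_rate_max_percentage"]:
        unmet.append("false_negative_rate")
    if metrics["pass_correlation"] < promotion["pass_correlation_min_percentage"]:
        unmet.append("pass_correlation")
    # No holdout configured means no holdout FNR, which counts as unmet.
    holdout = metrics.get("holdout_fnr")
    if holdout is None or holdout > promotion["holdout_fnr_max_percentage"]:
        unmet.append("holdout_fnr")
    have = CONFIDENCE_ORDER.index(metrics["data_confidence"])
    need = CONFIDENCE_ORDER.index(promotion["data_confidence_min"])
    if have < need:
        unmet.append("data_confidence")
    return unmet


def is_ready(
    metrics: Mapping[str, Any],
    promotion: Mapping[str, Any] = DEFAULT_PROMOTION,
) -> bool:
    """Promotion readiness as a boolean: every threshold must be met."""
    return not unmet_thresholds(metrics, promotion)
```

Returning the list of unmet names rather than a bare boolean mirrors how the drift section is described: either ready, or a list of what is still missing.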