Reads the PRD and design doc, proposes verifiable completion criteria and mandatory hard gates for Flutter Flame games, and marks the contract as AGREED for the code generation pipeline.
Slash command: /flutter-flame-harness:flame-harness-contract
Phase 4 of the flutter-flame-harness pipeline. Reads the PRD and design doc, proposes verifiable completion criteria (Mandatory Hard Gates + Functional Criteria), and marks the contract AGREED before handing off to the generator.
All file schemas (config.md, state.md, contract.md) are defined in
docs/harness-protocol.md — refer to that document as the single source of truth. Do not
redefine schemas here.
docs/harness/config.md

Extract:

| Key | Use |
|---|---|
| app_name | Contract title header |
| app_slug | Used to identify the game |
| strict_mode | If true, run --strict negotiation; if false (or absent), run default 1-pass |
Find the most recent file matching docs/harness/plans/*-prd.md (sort by filename descending,
take the first). Abort with a clear message if none exists.
Find the most recent file matching docs/harness/plans/*-design.md (sort by filename
descending, take the first). Abort if none exists.
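The "most recent file" lookup above can be sketched as a small shell function. This is a minimal sketch, not the harness's own implementation: the function name `latest_plan` is hypothetical, and it assumes plan filenames sort chronologically (for example through a date prefix), which the source implies but does not state.

```shell
# Hypothetical helper: print the newest plan file for a suffix ("prd" or
# "design"). Assumes filenames sort chronologically, e.g. by a date prefix.
latest_plan() {
  local dir="$1" suffix="$2" latest
  # Expand the glob, sort filenames descending, keep the first entry.
  latest=$(printf '%s\n' "$dir"/*-"$suffix".md | LC_ALL=C sort -r | head -n 1)
  # With no match the glob stays a literal pattern, which is not a real file.
  if [ ! -e "$latest" ]; then
    echo "No *-${suffix}.md file found in ${dir}; aborting." >&2
    return 1
  fi
  printf '%s\n' "$latest"
}
```

A caller might use it as `prd=$(latest_plan docs/harness/plans prd) || exit 1`, so that a missing PRD stops the phase before anything is written.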
These 8 criteria are non-negotiable and must appear verbatim in every contract.md. A FAIL on
any one results in an immediate FAIL verdict, regardless of other passing criteria.
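Source: docs/harness-protocol.md §3, Mandatory Hard Gates block. Always cite; never restate differently.

1. `flutter analyze` returns zero issues.
2. `flutter test` — all tests pass.
3. No TODO, stub, or placeholder in game logic (grep-checkable).
4. All tuning constants centralized in `game_config.dart` — no magic numbers in gameplay code.
5. Game content (enemies / levels / waves) is defined as data, not hardcoded.
6. Localization complete for all configured locales — `default_language`, plus English when `default_language` ≠ `en` — no missing l10n keys.
7. Core loop works end to end: start → play → win/lose → restart.
8. Runs on a simulator/emulator with zero crashes and zero console errors.

Copy these 8 lines verbatim into the ## Mandatory Hard Gates section of contract.md. Do not
paraphrase, reorder, or omit any gate.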
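One way to enforce the verbatim rule is to compare the contract against a reference copy of the gate lines. The sketch below is illustrative only: `check_gates_verbatim` and the idea of a separate reference file are assumptions, and the check covers presence of each line, not ordering.

```shell
# Hypothetical check: every non-empty line of a reference gate list must
# appear verbatim (fixed-string, whole-line match) in the contract.
check_gates_verbatim() {
  local reference="$1" contract="$2" missing=0 gate
  while IFS= read -r gate || [ -n "$gate" ]; do
    [ -n "$gate" ] || continue   # skip blank lines in the reference
    if ! grep -Fxq -- "$gate" "$contract"; then
      echo "Missing or altered gate: $gate" >&2
      missing=1
    fi
  done < "$reference"
  return "$missing"
}
```

Using `grep -F -x` means a paraphrased gate fails the check even when the change is a single word.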
In addition to the 8 core gates above, every contract includes a ## Platform-Robustness Gates
section requiring the patterns in docs/game-gotchas.md (cite it). These are mandatory:

- AudioPool (not per-call FlameAudio.play()); BGM stops on game-over, app-background, and teardown.
- Haptics helper with platform guard + throttle + enabled toggle + try/catch; gameplay never calls HapticFeedback.* raw.
- WidgetsBindingObserver; on background → pauseEngine() + BGM pause; resume reverses; teardown cleans up audio/timers.
- No world.children.whereType<...>() in a hot path (cache once per frame); Paint/shaders not recreated per frame.
- Display name (CFBundleDisplayName/android:label) is set to app_name, not "Runner"/the slug.
- […] config.orientation (the unused orientation removed; no launch rotate); iPhone-only (TARGETED_DEVICE_FAMILY = 1, no iPad); ITSAppUsesNonExemptEncryption = false; root back-button = SnackBar double-press-to-exit; bundle id byte-identical across iOS (PRODUCT_BUNDLE_IDENTIFIER) and Android (applicationId) == config.bundle_id (lowercase, no _/-).
- […] pubspec.yaml asset exists); a .github/workflows CI (analyze + test) is present.
- […] supply (metadata/android/<locale>/images/icon.png + featureGraphic.png); iOS has localized screenshots.
- […] shared_preferences alone): a SaveRepository mirrors the save blob to iOS Keychain (flutter_secure_storage, first_unlock) + Android Block Store (play_services_block_store) + a shared_preferences cache, reading durable-first and writing all tiers in try/catch; PreferencesService routes through it.
- Reduce motion (MediaQuery.disableAnimations) is read and damps screen-shake/particles/flashing; menu/overlay buttons are ≥48×48 dp with Semantics labels (icon-only buttons especially); in-game HUD keeps withNoTextScaling. (Full in-game screen-reader support is out of scope.)
- Beyond flutter test merely passing, the game ships ≥3 system unit tests (score/spawn/difficulty/economy, real logic), ≥1 widget test (a menu/overlay renders + a button taps), and ≥1 integration test (core loop: boot → play → game-over). Mirrors the shipped games; prevents a green flutter test that actually covers nothing.

Per-game acceptance criteria derived from the PRD. Each criterion must be verifiable by one of: a command (e.g. grep -r "TODO" lib/ returns no matches), a screenshot, or a code path.

Read the PRD sections carefully:
| PRD section | What to extract |
|---|---|
| Core Loop | One criterion per loop step (each step must be observable) |
| Game Mechanics | One criterion per mechanic; specify measurable threshold |
| Win/Lose Conditions | Explicit pass/fail states |
| Content Metrics | Count targets (e.g. "≥ 3 enemy types defined in data files") |
| Progression & Economy | Score increments, reward triggers |
Criteria that are NOT verifiable (e.g. "the game is fun") must be rewritten as observable checks or removed. Every criterion that references a timing threshold must cite a concrete number (e.g. "responds within 100 ms" not "responds quickly").
Aim for 5–10 Functional Criteria per game. More is not better — specificity is.
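The 5–10 guideline can be checked mechanically once the contract exists. The following is a rough sketch under stated assumptions: the function names are hypothetical, and it assumes each functional criterion is written as a numbered line ("N. ...") under the ## Functional Criteria heading, which the source does not prescribe.

```shell
# Hypothetical helper: count numbered items under "## Functional Criteria".
# Assumes one criterion per line, written as "N. ...".
count_functional_criteria() {
  awk '
    /^## /                     { in_section = ($0 ~ /^## Functional Criteria/); next }
    in_section && /^[0-9]+\. / { n++ }
    END                        { print n + 0 }
  ' "$1"
}

# Report a failure when the count falls outside the suggested 5-10 range.
check_criteria_count() {
  local n
  n=$(count_functional_criteria "$1")
  if [ "$n" -lt 5 ] || [ "$n" -gt 10 ]; then
    echo "Found $n functional criteria; the guideline is 5-10." >&2
    return 1
  fi
}
```

The count only measures quantity; whether each criterion is specific and observable still needs a read-through.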
Before marking AGREED, run:
grep -rn "TODO\|stub\|placeholder\|스텁\|미구현" lib/ --include="*.dart"
If this command returns any output, the contract must NOT be marked AGREED. Remove or resolve every stub before proceeding.
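Because grep exits 0 when it finds a match, the gate logic is inverted relative to the usual "zero means success" convention. A minimal sketch of a wrapper that makes this explicit follows; `stub_check` is a hypothetical name, and it relies on the `--include` and `\|` extensions available in GNU and BSD grep, as the command above already does.

```shell
# Hypothetical wrapper: treat any match from the stub grep as a failed gate.
stub_check() {
  local src_dir="${1:-lib/}" status=0
  grep -rn --include="*.dart" "TODO\|stub\|placeholder\|스텁\|미구현" "$src_dir" || status=$?
  case "$status" in
    0) echo "Stub markers found; do not mark the contract AGREED." >&2; return 1 ;;
    1) return 0 ;;   # grep exit code 1 means no matches
    *) echo "grep failed on ${src_dir} (exit ${status})." >&2; return 2 ;;
  esac
}
```

Separating exit code 2 from exit code 1 keeps a missing lib/ directory from being mistaken for a clean result.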
When strict_mode is false or absent:

- Writes docs/harness/contract.md with: […] docs/harness-protocol.md §3).
- Marks ## Status: AGREED in the same pass.
- Updates state.md and hands off to the generator phase.

When strict_mode: true in config.md, or when the skill is invoked with --strict:

- Writes contract.md (same as default, steps 1–3 above).
- […] ## Status: AGREED and proceeds (same state.md write as default mode).

Phase B note: Phase B may reintroduce Evaluator-side contract negotiation with a Generator↔Evaluator round-trip; Phase A uses rigorous self-review only.
In both modes the contract file must contain ## Status: AGREED before the generator phase
begins coding.
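A guard for this precondition could look like the sketch below. The function name `contract_agreed` is hypothetical; the only thing taken from the source is the exact status line.

```shell
# Hypothetical guard: succeed only if the contract carries the exact status line.
contract_agreed() {
  grep -Fxq '## Status: AGREED' "$1"
}
```

The generator phase might call it as `contract_agreed docs/harness/contract.md || exit 1` before touching any code.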
docs/harness/contract.md

Use the layout defined in docs/harness-protocol.md §3:
# Contract — <app_name>
## Mandatory Hard Gates
These criteria are non-negotiable. A FAIL on any one of these results in an immediate FAIL verdict,
regardless of other passing criteria.
1. `flutter analyze` returns zero issues.
2. `flutter test` — all tests pass.
3. No TODO, stub, or placeholder in game logic (grep-checkable).
4. All tuning constants centralized in `game_config.dart` — no magic numbers in gameplay code.
5. Game content (enemies / levels / waves) is defined as data, not hardcoded.
6. Localization complete for all configured locales — `default_language`, plus English when `default_language` ≠ `en` — no missing l10n keys.
7. Core loop works end to end: start → play → win/lose → restart.
8. Runs on a simulator/emulator with zero crashes and zero console errors.
## Functional Criteria (per game)
<!-- Generator adds game-specific acceptance criteria here. -->
<!-- Each criterion must be verifiable by command, screenshot, or code path. -->
## Status: AGREED
If docs/harness/ does not exist, create it before writing.
state.md

Update docs/harness/state.md per the schema in docs/harness-protocol.md §2:
status: running
current_phase: contract
next_role: generator
current_round: 1
updated_at: "<ISO-8601 UTC now>"
Leave all other keys unchanged. Use Edit for a targeted update.
Per docs/harness-protocol.md §7 (Phase Transition Table): the contract → AGREED event sets
next_role: generator and current_round: 1.
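For readers scripting this transition outside the Edit tool, a targeted update can be sketched as below. This is an assumption-laden illustration: both function names are hypothetical, and it assumes state.md stores each key as a top-level `key: value` line, which matches the snippet above but is ultimately defined by docs/harness-protocol.md §2.

```shell
# Hypothetical helper: replace the value of one top-level "key: value" line
# and leave every other line untouched. Fails if the key is not present.
set_state_key() {
  local file="$1" key="$2" value="$3" tmp
  tmp=$(mktemp) || return 1
  if awk -v k="$key" -v v="$value" '
       index($0, k ": ") == 1 { print k ": " v; found = 1; next }
       { print }
       END { exit (found ? 0 : 1) }
     ' "$file" > "$tmp"; then
    cat "$tmp" > "$file"   # overwrite in place, keeping file permissions
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Apply the contract -> AGREED transition to the five keys listed above.
mark_contract_agreed() {
  local file="$1" now
  now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  set_state_key "$file" status running &&
  set_state_key "$file" current_phase contract &&
  set_state_key "$file" next_role generator &&
  set_state_key "$file" current_round 1 &&
  set_state_key "$file" updated_at "\"$now\""
}
```

Refusing to append a missing key is a deliberate choice in this sketch: it avoids writing outside whatever block the real schema uses.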
pipeline-log.md

Append one row to docs/harness/pipeline-log.md per the schema in docs/harness-protocol.md §6:
| <ISO-8601 UTC now> | complete | contract | contract AGREED; <N> functional criteria; next: generator |
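Appending the row can be sketched in a few lines of shell. The function name `log_contract_agreed` is hypothetical, and the column layout is copied from the row above rather than from docs/harness-protocol.md §6, which remains the authority.

```shell
# Hypothetical helper: append one "contract AGREED" row to the pipeline log.
# $1 = log file, $2 = number of functional criteria in the contract.
log_contract_agreed() {
  local log="$1" count="$2" now
  now=$(date -u +%Y-%m-%dT%H:%M:%SZ)   # ISO-8601 timestamp in UTC
  printf '| %s | complete | contract | contract AGREED; %s functional criteria; next: generator |\n' \
    "$now" "$count" >> "$log"
}
```

For example, `log_contract_agreed docs/harness/pipeline-log.md 7` appends a row reporting seven functional criteria.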
- […] state.md to status: paused, pause_reason: manual_action.
- […] state.md to status: paused, pause_reason: manual_action.
- If config.md cannot be read, abort immediately (do not write partial output).
- In --strict mode, the Generator self-marks AGREED after the rigorous second-pass review; there is no multi-round Evaluator loop in Phase A. max_rounds applies only to the generator→evaluator cycle (Phase 5–6), not to contract negotiation.

npx claudepluginhub tjdrhs90/flutter-flame-harness --plugin flutter-flame-harness