From wicked-garden
Re-baseline procedure for the AC-11 gate-result benchmark lane (`tests/crew/test_gate_result_benchmark.py`). The benchmark enforces a 2× p95 SLO on `gate-result.json` ingestion. When a deliberate perf change lands on main (validator hardening, cache tuning, schema expansion), the baseline needs updating. Never re-baseline to silence a regression. Use when: "re-baseline AC-11 benchmark", "gate-result benchmark regression", "p95 benchmark baseline out of date", "update benchmark_baseline.json", "benchmark.yml failure", "gate-result p95 exceeds 2x baseline", "rebaseline procedure", or `AC-11` baseline drift.
`npx claudepluginhub mikeparcewski/wicked-garden --plugin wicked-garden`

This skill uses the workspace's default tool permissions.
Operational procedure for updating `tests/crew/benchmark_baseline.json` after a legitimate perf change on `main`.
Re-baseline only when a deliberate perf change lands on main. Examples:
- Validator hardening (`gate_result_schema.py`)
- Cache tuning (`phase_manager.py::_load_gate_result`)
- Schema expansion (`gate-result.json`)
- Content-sanitizer changes (`content_sanitizer.py`)

Do not re-baseline to silence a regression. If the benchmark fails on a PR that didn't intend perf work, treat it as a real regression and find the cause.
How the benchmark lane works:

- Workflow: `.github/workflows/benchmark.yml`
- Trigger: PRs that touch `scripts/crew/phase_manager.py`, `gate_result_schema.py`, `content_sanitizer.py`, `dispatch_log.py`, or the benchmark test/baseline themselves. Other PRs skip the lane to keep default CI cost flat.
- Test: `tests/crew/test_gate_result_benchmark.py::test_load_gate_result_p95_within_2x_baseline`
- Marker: `benchmark` (opt-in). A plain local `uv run pytest` deselects it (see `pyproject.toml`: `addopts = "-m 'not benchmark'"`).
- Run locally: `uv run pytest -m benchmark`
- Workload: `gate-result.json` files cycling through 1 KB / 4 KB / 16 KB / 60 KB (bounded by `MAX_SUMMARY_BYTES`). Each round clears the memoization cache, so the full validate + sanitize + cache-insert path is measured. Cache-hit timing is not part of the SLO.
- Baseline: `tests/crew/benchmark_baseline.json`, with `p95_ns` in nanoseconds.
- SLO: `p95_current ≤ slo_multiplier × p95_baseline` (default `slo_multiplier = 2.0`). The workflow comments the delta on every triggered PR.
- Missing baseline: if `benchmark_baseline.json` is deleted or malformed, the test soft-skips with a directive to record one. The SLO is not enforced until a valid baseline is present, so a re-baseline PR can merge without a circular dependency.

Procedure:

1. Check out the target `main` commit.

   `git checkout main && git pull`
2. Run the benchmark three times with `uv run pytest -m benchmark tests/crew/test_gate_result_benchmark.py -s`. The SLO is evaluated against p95 and CI noise varies, so taking the highest of three local runs adds margin.
3. Record the highest `p95_current` reading from the three runs.
4. Update `tests/crew/benchmark_baseline.json`:

   | Field | New value |
   |---|---|
   | `p95_ns` | The highest `p95_current` from step 3, rounded up |
   | `recorded_on` | Today's date (`YYYY-MM-DD`) |
   | `recorded_from_commit` | Short SHA of the target `main` commit |
   | `slo_multiplier` | Leave at `2.0` unless the AC-11 contract changes |
   | `rebaseline_procedure` | Leave at `wicked-garden:platform:gate-benchmark-rebaseline` |
5. Commit the baseline update in a dedicated PR, using this title convention:

   `chore(benchmark): re-baseline AC-11 after {change-summary}`
6. The benchmark workflow runs on the PR and must pass. p95 should be well under 2× the new baseline, since you just measured it.
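
Steps 3, 4, and 6 can be sketched in Python as follows. This is a minimal illustration that assumes the baseline file is a flat JSON object with the fields from the table above. The rounding granularity (`ROUND_TO_NS`) and all helper names are hypothetical. The real test in `tests/crew/test_gate_result_benchmark.py` may load and compare the baseline differently. `load_baseline` returns `None` for a missing or malformed file, which models the soft-skip behavior described for the lane.

```python
import json
from pathlib import Path
from typing import Optional, Sequence

# Hypothetical granularity: the procedure only says "rounded up".
ROUND_TO_NS = 1_000
PROCEDURE_ID = "wicked-garden:platform:gate-benchmark-rebaseline"


def new_baseline(p95_readings_ns: Sequence[int], commit_sha: str,
                 recorded_on: str, slo_multiplier: float = 2.0) -> dict:
    """Build a baseline record from the local runs (steps 3 and 4)."""
    if len(p95_readings_ns) < 3:
        raise ValueError("record at least three p95 readings")
    highest = max(p95_readings_ns)
    # Integer ceiling to the next multiple of ROUND_TO_NS.
    rounded = ((highest + ROUND_TO_NS - 1) // ROUND_TO_NS) * ROUND_TO_NS
    return {
        "p95_ns": rounded,
        "recorded_on": recorded_on,          # YYYY-MM-DD
        "recorded_from_commit": commit_sha,  # short SHA of the target main commit
        "slo_multiplier": slo_multiplier,    # leave at 2.0 unless AC-11 changes
        "rebaseline_procedure": PROCEDURE_ID,
    }


def load_baseline(path: Path) -> Optional[dict]:
    """Return the parsed baseline, or None if it is missing or malformed.

    None models the soft-skip case: no SLO is enforced without a valid baseline.
    """
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # missing file, bad encoding, or invalid JSON
        return None
    if not isinstance(data, dict) or not isinstance(data.get("p95_ns"), (int, float)):
        return None
    return data


def within_slo(p95_current_ns: float, baseline: dict) -> bool:
    """AC-11 check: p95_current <= slo_multiplier * p95_baseline."""
    multiplier = baseline.get("slo_multiplier", 2.0)
    return p95_current_ns <= multiplier * baseline["p95_ns"]
```

Consider three hypothetical readings whose maximum is 830,417 ns. Under the assumed 1,000 ns rounding, they would be recorded as `p95_ns = 831000`, and the default 2× SLO would then accept a `p95_current` of up to 1,662,000 ns.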
Strict mode for gate-result ingestion (`WG_GATE_RESULT_STRICT_AFTER`, default `2026-06-18`) requires the AC-11 benchmark lane to be active in CI. Rationale: once strict mode activates, a silent 2×+ perf regression would push every `approve_phase` call over the SLO without a guard.

If the benchmark lane is broken or disabled on `main`, push `WG_GATE_RESULT_STRICT_AFTER` out (via the env var or by updating the default constant). Do not let strict mode activate without benchmark enforcement.

For production rollback of a specific ingestion check, prefer an env-var soft-disable over a git revert:

| Variable | Effect |
|---|---|
| `WG_GATE_RESULT_SCHEMA_VALIDATION=off` | Skip schema validator |
| `WG_GATE_RESULT_CONTENT_SANITIZATION=off` | Skip content sanitizer |
| `WG_GATE_RESULT_DISPATCH_CHECK=off` | Skip dispatch-log orphan check |
All flags auto-expire at WG_GATE_RESULT_STRICT_AFTER.
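
The Python sketch below shows one way the strict-after date and the soft-disable flags could interact. It assumes `WG_GATE_RESULT_STRICT_AFTER` holds an ISO `YYYY-MM-DD` date and that a flag stops working on (not after) that date. The boundary and the helper names (`strict_after`, `check_enabled`) are assumptions. This procedure does not show the actual ingestion logic, and that logic may differ.

```python
import os
from datetime import date
from typing import Mapping, Optional

# Default stated in this procedure; the real default constant may live elsewhere.
DEFAULT_STRICT_AFTER = date(2026, 6, 18)

SOFT_DISABLE_FLAGS = (
    "WG_GATE_RESULT_SCHEMA_VALIDATION",     # schema validator
    "WG_GATE_RESULT_CONTENT_SANITIZATION",  # content sanitizer
    "WG_GATE_RESULT_DISPATCH_CHECK",        # dispatch-log orphan check
)


def strict_after(env: Mapping[str, str]) -> date:
    """Resolve the strict-mode date: env var override, else the default."""
    raw = env.get("WG_GATE_RESULT_STRICT_AFTER")
    return date.fromisoformat(raw) if raw else DEFAULT_STRICT_AFTER


def check_enabled(flag: str, env: Mapping[str, str],
                  today: Optional[date] = None) -> bool:
    """Return True if the ingestion check guarded by `flag` should run."""
    if today is None:
        today = date.today()
    if today >= strict_after(env):
        # Assumed boundary: the soft-disable flags have expired, so the check always runs.
        return True
    # Before strict mode, only the exact value "off" skips the check.
    return env.get(flag) != "off"


if __name__ == "__main__":
    for name in SOFT_DISABLE_FLAGS:
        state = "on" if check_enabled(name, os.environ) else "soft-disabled"
        print(f"{name}: {state}")
```

In this model, pushing the strict-after date out sets `WG_GATE_RESULT_STRICT_AFTER` to a later date. That is what the procedure requires when the benchmark lane is down, and it also extends how long the `=off` flags keep working.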
Related files:

- `tests/crew/test_gate_result_benchmark.py`
- `tests/crew/benchmark_baseline.json`
- `.github/workflows/benchmark.yml`
- `scripts/crew/phase_manager.py::_load_gate_result`
- `scripts/crew/gate_result_schema.py`
- `scripts/crew/content_sanitizer.py`
- `scripts/crew/dispatch_log.py`