Runs automated field tests on all registered GNOME extensions with ego-lint, diffs against baselines, classifies findings as TP/FP/borderline, and generates regression reports. Useful for extension quality assurance.
```
npx claudepluginhub zvibaratz/gnome-extension-reviewer
```

This skill uses the workspace's default tool permissions.
Run batch ego-lint across all field test extensions with baseline diffing, finding classification, and regression reporting.
- `--review`: Run ego-review on ALL extensions via headless `claude -p` (~$0.50-2.00/ext, ~5-10 min each)
- `--review-changed`: Run ego-review only on extensions with changed lint results
- `--review-dry-run`: Hydrate review prompts without invoking `claude -p` (for testing templates)
- `--parallel N`: Max concurrent `claude -p` sessions (default: 3)
- `--update-baselines`: Save current lint results as new baselines after review

```bash
bash scripts/field-test-runner.sh --no-fetch
```
If extensions aren't cached yet:
```bash
bash scripts/field-test-runner.sh
```
This runs ego-lint on all extensions in field-tests/manifest.yaml, produces JSON results per extension, diffs against baselines, and appends to history.
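At its core, the baseline diff is set arithmetic over finding IDs. A minimal sketch of that step, assuming a simple `{"findings": [{"id": ...}]}` result schema (not necessarily the exact shape `diff-baselines.py` uses):

```python
import json

def diff_findings(baseline: dict, current: dict) -> dict:
    """Compare two lint result dicts by finding id."""
    base_ids = {f["id"] for f in baseline.get("findings", [])}
    curr_ids = {f["id"] for f in current.get("findings", [])}
    return {
        "new": sorted(curr_ids - base_ids),       # regressions or new true positives
        "resolved": sorted(base_ids - curr_ids),  # fixed or no-longer-flagged
        "unchanged": sorted(base_ids & curr_ids),
    }

baseline = {"findings": [{"id": "R-SEC-22"}, {"id": "R-GJS-04"}]}
current = {"findings": [{"id": "R-SEC-22"}, {"id": "R-UI-11"}]}
print(json.dumps(diff_findings(baseline, current), indent=2))
```

Anything landing in `new` is what the annotation workflow below asks you to classify.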
To also run ego-review (headless Claude):
```bash
# Review all extensions (~$5-20 total, ~20-30 min with parallel 3)
bash scripts/field-test-runner.sh --no-fetch --review

# Review only changed extensions (cheaper, faster)
bash scripts/field-test-runner.sh --no-fetch --review-changed

# Test prompt hydration without invoking Claude
bash scripts/field-test-runner.sh --no-fetch --review-dry-run --extension hara-hachi-bu

# Control concurrency
bash scripts/field-test-runner.sh --no-fetch --review --parallel 5
```
Read the summary:
`field-tests/results/<latest-timestamp>/summary.json`
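Since run directories are named by timestamp, the latest run can be located by sorting directory names. A sketch, assuming the timestamp format sorts lexically:

```python
import json
from pathlib import Path

def latest_summary(results_dir: str = "field-tests/results") -> dict:
    """Load summary.json from the newest timestamped run directory."""
    runs = sorted(p for p in Path(results_dir).iterdir() if p.is_dir())
    if not runs:
        raise FileNotFoundError(f"no runs under {results_dir}")
    return json.loads((runs[-1] / "summary.json").read_text())
```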
For each extension, read:
- `field-tests/results/<latest-timestamp>/<name>.lint.json` — full results
- `field-tests/results/<latest-timestamp>/<name>.diff.json` — diff from baseline (if baseline exists)

For each extension with `unannotated_findings` in the diff, update `field-tests/annotations/<name>.yaml` with the classification.

Classification guide:
The runner handles this automatically when `--review` or `--review-changed` is passed:
- Hydrates `scripts/review-prompt.md` with lint JSON, diff, and annotations
- Invokes `claude -p` with the hydrated prompt, `--plugin-dir`, `--add-dir`, a 10-minute timeout, and a $4 budget cap
- Runs up to `--parallel N` (default 3) concurrent sessions using background subshells + a poll loop
- Writes each review to `field-tests/results/<timestamp>/<name>.review.md`
- Records a per-extension status: `ok`, `timeout`, `error`, `skipped`, `excluded`, `dry-run`, `no-report`, `none`

Use `--review-dry-run` to test prompt hydration without invoking Claude. Hydrated prompts are saved to `<name>.review-prompt.md`.
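The "background subshells + poll loop" pattern can be sketched in Python with `subprocess.Popen`; the commands here are stand-ins for the actual `claude -p` invocations:

```python
import subprocess
import time

def run_bounded(commands: list, max_parallel: int = 3) -> list:
    """Run commands with at most max_parallel in flight; poll until done."""
    pending = list(commands)
    running = []
    codes = []
    while pending or running:
        # top up the running pool
        while pending and len(running) < max_parallel:
            running.append(subprocess.Popen(pending.pop(0)))
        # reap finished processes, keep the rest
        still_running = []
        for proc in running:
            if proc.poll() is None:
                still_running.append(proc)
            else:
                codes.append(proc.returncode)
        running = still_running
        if running:
            time.sleep(0.05)  # poll interval
    return codes

exit_codes = run_bounded([["true"]] * 5, max_parallel=2)
```

The bash version additionally enforces the per-job timeout and budget cap; this sketch only shows the concurrency-limiting loop.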
Produce `field-tests/reports/<date>-regression.md` with:
- `field-tests/history.jsonl` — FP count on approved extensions over the last N runs

Update the "Latest Lint Results" and "Annotation Coverage" tables in `field-tests/README.md` to reflect the current run:
- "Latest Lint Results" from the `history.jsonl` entries
- "Annotation Coverage" from `annotations/*.yaml` and the diff output

Keep the "Extension Catalog" and "Code Metrics" sections unchanged unless a new extension was added to the manifest.
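Pulling the FP trend out of `history.jsonl` is a short scan; a sketch, where the `fp_count` field name is an assumption about the entry schema:

```python
import json
from pathlib import Path

def fp_trend(history_path: str, last_n: int = 10) -> list:
    """Return FP counts from the last N history.jsonl entries, oldest first."""
    lines = Path(history_path).read_text().splitlines()
    entries = [json.loads(line) for line in lines if line.strip()]
    return [entry.get("fp_count", 0) for entry in entries[-last_n:]]
```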
For new false positives on EGO-approved extensions that are confirmed FP (not borderline):
Create a GitHub issue:
- Label: `false-positive`
- Title: `False positive: R-XXXX-NN on <extension>`

```bash
bash scripts/field-test-runner.sh --update-baselines --no-fetch
```
```
field-tests/
├── manifest.yaml       # Extension sources (committed)
├── cache/              # Downloaded extensions (gitignored)
├── baselines/          # Golden JSON snapshots (committed)
├── annotations/        # Finding classifications (committed)
├── results/            # Timestamped run output (gitignored)
├── history.jsonl       # Trend data (committed)
└── reports/            # Regression reports (committed)
scripts/
├── field-test-runner.sh       # Bash orchestrator
├── parse-manifest.py          # Manifest YAML → JSON
├── parse-lint-results.py      # ego-lint output → JSON
├── diff-baselines.py          # Baseline comparison
├── review-prompt.md           # Review prompt template ({{PLACEHOLDER}} tokens)
└── hydrate-review-prompt.py   # Template hydrator (lint JSON + diff + annotations)
```
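The `{{PLACEHOLDER}}` hydration is plain token substitution; a minimal sketch (the token names here are illustrative, not the actual placeholders in `review-prompt.md`):

```python
import re

def hydrate(template: str, values: dict) -> str:
    """Replace {{NAME}} tokens; fail loudly on any token with no value."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"unfilled placeholder: {name}")
        return values[name]
    return re.sub(r"\{\{(\w+)\}\}", substitute, template)

prompt = hydrate("Review {{EXTENSION}} using {{LINT_JSON}}",
                 {"EXTENSION": "hara-hachi-bu", "LINT_JSON": "{...}"})
```

Raising on unfilled tokens (rather than leaving them in place) keeps a broken template from silently producing a malformed review prompt.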
```yaml
findings:
  - id: "R-SEC-22::dconf CLI spawn"
    classification: tp
    notes: "dconf import/export — legitimate but needs disclosure"
  - id: "init/shell-modification::constructor"
    classification: fp
    notes: "Constructor called from enable(). Fixed in PR #21"
```
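Once an annotation file of this shape is parsed (e.g. with PyYAML), tallying classifications for the coverage table is a one-liner over the resulting dict:

```python
from collections import Counter

def classification_counts(annotation: dict) -> Counter:
    """Count tp/fp/borderline classifications in one annotation file."""
    return Counter(f["classification"] for f in annotation.get("findings", []))

counts = classification_counts({"findings": [
    {"id": "R-SEC-22::dconf CLI spawn", "classification": "tp"},
    {"id": "init/shell-modification::constructor", "classification": "fp"},
]})
```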
- Run `/ego-field-test` — see immediate impact across all extensions
- Update the `field-tests/README.md` results and annotation tables
- Run `/ego-field-test --update-baselines` to snapshot the improved state