From skylights-kb
Run the golden Q&A evaluation against the current wiki. For each question in tests/golden-qna.md, ask the KB and judge the answer in-conversation. Writes vault/meta/golden-qna-report.md.
npx claudepluginhub popmechanic/skylights-agents --plugin skylights-kb
How this command is triggered (by the user, by Claude, or both): Slash command
/skylights-kb:kb-eval [--fail-below 0.9]
This command is limited to the following tools:
The summary Claude sees in its command listing (used to decide when to auto-load this command):
Run the golden Q&A evaluation.
## Steps
1. Get the question list:

       cd ${CLAUDE_PLUGIN_ROOT}
       tools/py.sh tools/eval_golden_qna.py list

   Parse the JSON: each entry has `id`, `question`, `expected_facts`.
2. For each question:
   a. Run `/kb-ask <question>` (or invoke the same logic: read the context, synthesize an answer with citations).
   b. Compare the answer to `expected_facts`. Be a strict but fair judge: only mark `pass` if every expected fact is clearly covered. Mark `partial` if most are covered. Mark `fail` if many are missing.
   c. Record the result:

          tools/py.sh tools/eval_golden_qna.py write-result \
            --id Q01 --verdict pass --missing "" --answer "<the answer>"

3. Finalize:

       tools/py.sh tools/eval_golden_qna.py finalize

   Optionally pass `--fail-below 0.9` to exit non-zero if the pass rate is below 90%.
Report the summary to the owner. Show the path to the full report (vault/meta/golden-qna-report.md).
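
The judging rule (`pass`, `partial`, `fail`) and the `--fail-below` check described above can be sketched in Python, the language of the `eval_golden_qna.py` tool. This is a minimal illustration only, not the tool's actual code. The function names (`judge`, `pass_rate`, `exit_code`), the 0.5 cut-off used to read "most are covered", and the choice to count only `pass` verdicts toward the pass rate are all assumptions. Fact coverage is modeled as an already-decided set rather than judged from the answer text.

```python
from typing import Optional


def judge(expected_facts: list[str], covered: set[str],
          partial_cutoff: float = 0.5) -> tuple[str, list[str]]:
    """Return (verdict, missing_facts) for one question.

    `covered` holds the expected facts the judge found in the answer.
    `partial_cutoff` is a hypothetical stand-in for "most are covered".
    """
    missing = [fact for fact in expected_facts if fact not in covered]
    if not missing:
        return "pass", missing  # every expected fact is covered
    covered_share = (len(expected_facts) - len(missing)) / len(expected_facts)
    verdict = "partial" if covered_share > partial_cutoff else "fail"
    return verdict, missing


def pass_rate(verdicts: list[str]) -> float:
    """Share of questions judged `pass` (assumption: `partial` does not count)."""
    return sum(v == "pass" for v in verdicts) / len(verdicts) if verdicts else 0.0


def exit_code(verdicts: list[str], fail_below: Optional[float] = None) -> int:
    """Return 1 when a threshold is given and the pass rate falls below it."""
    if fail_below is not None and pass_rate(verdicts) < fail_below:
        return 1
    return 0


if __name__ == "__main__":
    facts = ["fact A", "fact B", "fact C"]  # placeholder facts, not real data
    print(judge(facts, {"fact A", "fact B"}))
    print(exit_code(["pass"] * 8 + ["fail"] * 2, fail_below=0.9))
```

The `missing` list returned by `judge` is the kind of information the `--missing` argument of `write-result` appears to carry, though the expected format of that argument is not stated here.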