Slash Command

/kb-eval

Run the golden Q&A evaluation against the current wiki. For each question in tests/golden-qna.md, ask the KB and judge the answer in-conversation. Writes vault/meta/golden-qna-report.md.

npx claudepluginhub popmechanic/skylights-agents --plugin skylights-kb


Invocation

How this command is triggered — by the user, by Claude, or both

Slash command

/skylights-kb:kb-eval [--fail-below 0.9]

Model invocable

No pre-commands

Tool Access

This command is limited to the following tools:

Read, Write, Bash(tools/py.sh *)

Context Preview

The summary Claude sees in its command listing — used to decide when to auto-load this command

Run the golden Q&A evaluation.



Stats

Language: Python

Stars: 0

Forks: 1

Maintenance: Excellent

Last Commit: Jun 8, 2026

Command Content

35 lines · ~332 tokens

Run the golden Q&A evaluation.

## Steps

1. Get the question list:
   ```
   cd ${CLAUDE_PLUGIN_ROOT}
   tools/py.sh tools/eval_golden_qna.py list
   ```
   Parse the JSON: each entry has `id`, `question`, `expected_facts`.

2. For each question:
   a. Run `/kb-ask <question>` (or invoke the same logic: read the context, synthesize an answer with citations).
   b. Compare the answer to `expected_facts`. Be a strict but fair judge: only mark `pass` if every expected fact is clearly covered. Mark `partial` if most are covered. Mark `fail` if many are missing.
   c. Record the result:
      ```
      tools/py.sh tools/eval_golden_qna.py write-result \
        --id Q01 --verdict pass --missing "" --answer "<the answer>"
      ```

3. Finalize:
   ```
   tools/py.sh tools/eval_golden_qna.py finalize
   ```
   Optionally pass `--fail-below 0.9` to exit non-zero if the pass rate is below 90%.

4. Report the summary to the owner. Show the path to the full report (`vault/meta/golden-qna-report.md`).
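
The bookkeeping behind these steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code in `tools/eval_golden_qna.py`: the sample JSON, the function names (`parse_questions`, `pass_rate`, `exit_code`), and the choice to count only `pass` verdicts toward the pass rate (so `partial` counts as not passing) are all hypothetical. Only the field names `id`, `question`, `expected_facts`, the three verdicts, and the `--fail-below` threshold behavior come from the command text.

```python
from __future__ import annotations

import json
from collections import Counter

# Hypothetical sample of what the `list` subcommand might print. Only the
# field names (id, question, expected_facts) come from the command text.
SAMPLE_LIST_JSON = """
[
  {"id": "Q01", "question": "Example question one?", "expected_facts": ["fact a", "fact b"]},
  {"id": "Q02", "question": "Example question two?", "expected_facts": ["fact c"]}
]
"""

REQUIRED_FIELDS = {"id", "question", "expected_facts"}
VALID_VERDICTS = {"pass", "partial", "fail"}


def parse_questions(raw: str) -> list[dict]:
    """Parse the question list and check each entry has the documented fields."""
    entries = json.loads(raw)
    for entry in entries:
        missing = REQUIRED_FIELDS - set(entry)
        if missing:
            raise ValueError(f"entry {entry.get('id', '?')} lacks {sorted(missing)}")
    return entries


def pass_rate(verdicts: dict[str, str]) -> float:
    """Fraction of questions judged `pass`.

    Assumption: `partial` does not count toward the pass rate.
    """
    unknown = set(verdicts.values()) - VALID_VERDICTS
    if unknown:
        raise ValueError(f"unknown verdicts: {sorted(unknown)}")
    if not verdicts:
        return 0.0
    return Counter(verdicts.values())["pass"] / len(verdicts)


def exit_code(rate: float, fail_below: float | None = None) -> int:
    """Return 1 when a threshold is set and the rate falls below it, else 0."""
    return 1 if fail_below is not None and rate < fail_below else 0


if __name__ == "__main__":
    questions = parse_questions(SAMPLE_LIST_JSON)
    # Verdicts would come from judging each answer in-conversation.
    verdicts = {"Q01": "pass", "Q02": "partial"}
    rate = pass_rate(verdicts)
    print(f"{len(questions)} questions, pass rate {rate:.0%}")
    raise SystemExit(exit_code(rate, fail_below=0.9))
```

With one `pass` and one `partial`, this sketch computes a pass rate of 50%, which falls below a `--fail-below 0.9` threshold and so yields a non-zero exit code. Whether the real script weights `partial` differently is not stated in the command text.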
