Collects the actual content of all evidence sources identified in a PR's Evidence Bundle Manifest and produces a structured Evidence Bundle. Stage B step 1: transforms identifier references into full content for downstream candidate extraction. Called by the batch-refine orchestrator once per PR.
Install: `npx claudepluginhub ether-moon/knowledge-distillery --plugin knowledge-distillery`

This skill uses the workspace's default tool permissions.
- Called by `/knowledge-distillery:batch-refine` orchestrator as a subagent per PR
- Invoked as `/knowledge-distillery:collect-evidence`

Requirements:
- GitHub MCP with the `pull_requests`, `issues`, and `labels` toolsets
- `git` with access to `refs/notes/commits`: `git log`, `git show`, `git notes show` for commit and memento data
- Tools: Bash, Read, Glob, Grep
- No `knowledge-gate` CLI access

| Field | Source | Format |
|---|---|---|
| PR number | Passed by orchestrator | Integer |
| Repository | Derived via GitHub MCP if not provided | owner/repo |
| Manifest JSON | Parsed from PR comment — strict delimiter parsing preferred, LLM fallback for non-standard formats | JSON per evidence-manifest.spec.md |
An Evidence Bundle — a structured JSON object held in memory. NOT written to disk. Returned to the calling context for consumption by /knowledge-distillery:extract-candidates.
Follow these steps in exact order.
Fetch all PR comments and locate the Manifest:
Use GitHub MCP to list all issue-level comments on PR #{pr_number}.
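Strict parsing (preferred):
1. Find the comment containing `<!-- EVIDENCE_BUNDLE_MANIFEST_START -->`.
2. Extract the content between `<!-- EVIDENCE_BUNDLE_MANIFEST_START -->` and `<!-- EVIDENCE_BUNDLE_MANIFEST_END -->`.
3. Strip the surrounding Markdown code fence (the `json`-tagged opening fence and the closing fence).
4. Parse the JSON and validate it:
   - `version` must be `"1"`
   - `pr.number` must be a positive integer
   - `pr.merge_sha` must match `/^[0-9a-f]{7,40}$/`
   - All `identifiers` sub-keys (`linear`, `slack`, `memento`, `greptile`, `notion`) must be present (even if empty arrays)

If strict parsing succeeds, proceed to Step 2.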
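The strict-parsing rules above can be sketched as a single function. This is a minimal illustration, not the skill's actual implementation: the function and constant names are hypothetical, and it treats "present" sub-keys as "present and an array", which matches the "even if empty arrays" wording.

```python
import json
import re

START = "<!-- EVIDENCE_BUNDLE_MANIFEST_START -->"
END = "<!-- EVIDENCE_BUNDLE_MANIFEST_END -->"
REQUIRED_IDENTIFIER_KEYS = ("linear", "slack", "memento", "greptile", "notion")


def parse_manifest_strict(comment_body: str):
    """Return the Manifest dict, or None so the caller can try the fallback."""
    start = comment_body.find(START)
    end = comment_body.find(END, start + len(START)) if start != -1 else -1
    if start == -1 or end == -1:
        return None
    raw = comment_body[start + len(START):end].strip()
    # Strip an optional ```json ... ``` fence around the payload.
    fence = re.match(r"^```(?:json)?\s*\n(.*?)\n?```$", raw, re.DOTALL)
    if fence:
        raw = fence.group(1)
    try:
        manifest = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(manifest, dict) or manifest.get("version") != "1":
        return None
    pr = manifest.get("pr")
    if not isinstance(pr, dict):
        return None
    number = pr.get("number")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(number, int) or isinstance(number, bool) or number <= 0:
        return None
    sha = pr.get("merge_sha")
    if not isinstance(sha, str) or not re.fullmatch(r"[0-9a-f]{7,40}", sha):
        return None
    identifiers = manifest.get("identifiers")
    if not isinstance(identifiers, dict):
        return None
    # Every sub-key must exist as an array, even if empty.
    if not all(isinstance(identifiers.get(k), list) for k in REQUIRED_IDENTIFIER_KEYS):
        return None
    return manifest
```

Returning `None` rather than raising keeps the control flow simple: any `None` result routes to the LLM fallback described next.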
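If no comment contains the `EVIDENCE_BUNDLE_MANIFEST_START` delimiter, or if the JSON between the delimiters is malformed, fall back to LLM parsing: find the comment that most closely resembles a Manifest (even in a non-standard format) and build a Manifest from this template: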
{
"version": "1",
"pr": {
"number": "<from orchestrator input>",
"merge_sha": "<from PR merge commit — query GitHub MCP if not in comment>",
"base_branch": "<from PR base branch — query GitHub MCP if not in comment>",
"changed_files": ["<from PR changed files — query GitHub MCP if not in comment>"]
},
"identifiers": {
"linear": [],
"slack": [],
"memento": [],
"greptile": [],
"notion": []
},
"collected_at": "<current ISO 8601 timestamp>"
}
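Then fill it in:
- Linear issue keys (`[A-Z]+-\d+`) mentioned in the comment → populate `identifiers.linear`
- Slack thread URLs (`https://*.slack.com/archives/*/p*`) → populate `identifiers.slack`
- Memento references in the comment → populate `identifiers.memento`
- Greptile references in the comment → populate `identifiers.greptile`
- Notion page URLs (`https://(www.)?notion.(so|site)/*`) → populate `identifiers.notion`
- For `pr` fields not present in the comment, query GitHub MCP directly to fill them in.

If fallback parsing produces a valid Manifest, proceed to Step 2.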
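The pattern-based part of the fallback can be approximated with regular expressions before (or alongside) the LLM pass. This is a sketch under assumptions: the regexes are translations of the globs listed above, not an official URL grammar; the function name is hypothetical; and `memento`/`greptile` are left empty because their extraction rules are not spelled out here. The Linear pattern will also match strings like `UTF-8`, so results still need review.

```python
import re

# Translations of the documented patterns (approximate).
LINEAR_KEY_RE = re.compile(r"\b[A-Z]+-\d+\b")
SLACK_URL_RE = re.compile(r"https://[\w-]+\.slack\.com/archives/[^/\s]+/p\d+")
NOTION_URL_RE = re.compile(r"https://(?:www\.)?notion\.(?:so|site)/[^\s)>\]]+")


def _unique(items):
    """De-duplicate while preserving first-seen order."""
    return list(dict.fromkeys(items))


def extract_fallback_identifiers(comment_body: str) -> dict:
    """Build the identifiers object from free text in a non-standard comment."""
    return {
        "linear": _unique(LINEAR_KEY_RE.findall(comment_body)),
        "slack": _unique(SLACK_URL_RE.findall(comment_body)),
        "memento": [],   # not derivable from these patterns
        "greptile": [],  # not derivable from these patterns
        "notion": _unique(NOTION_URL_RE.findall(comment_body)),
    }
```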
If no comment resembling a Manifest exists at all (not even in non-standard format), return an Evidence Bundle with:
{
"sufficiency": {
"verdict": "insufficient",
"missing": ["manifest"],
"reason": "No Evidence Bundle Manifest comment found on PR #{pr_number}."
}
}
Stop processing — do not proceed to subsequent steps.
Carry forward from the parsed Manifest into the Evidence Bundle's root-level fields:
- `pr_number` from `pr.number`
- `merge_sha` from `pr.merge_sha`
- `base_branch` from `pr.base_branch`
- `changed_files` from `pr.changed_files`

Then collect PR content:
Title and body:
Use GitHub MCP to fetch PR #{pr_number} title and body.
Changed file list:
The full PR diff is on-demand evidence — extract-candidates fetches specific file diffs selectively. At this stage, collect only the list of changed files:
Use GitHub MCP to fetch the list of changed files in PR #{pr_number}. Extract relative file paths.
Store as changed_files in the Evidence Bundle. If the Manifest already contains pr.changed_files, verify and use that; otherwise populate from this query.
Note for downstream:
`extract-candidates` can selectively fetch specific file diffs as needed using GitHub MCP or `git diff`.
Commits:
Use GitHub MCP to list all commits in PR #{pr_number}. Extract each commit's SHA (short, 7 chars) and full message.
Review comments (inline on diff):
Use GitHub MCP to list all review comments (inline on diff) for PR #{pr_number}. Extract: author (login), body, path, line (or original_line).
Issue-level comments:
Use GitHub MCP to list all issue-level comments on PR #{pr_number}. Include all comments EXCEPT the Manifest comment itself.
Note: The full PR diff is not pre-collected. The changed file list is sufficient for this step.
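A sketch of shaping the raw PR data into `evidence.pr`. It assumes the GitHub MCP returns objects shaped like the GitHub REST API (`sha` and `commit.message` on commits; `user.login`, `body`, `path`, `line`, `original_line` on review comments); adjust the field access if your MCP server returns a different shape. The helper names are hypothetical, and the Manifest comment is identified by its delimiter, so a fallback-parsed Manifest would need to be excluded by comment ID instead.

```python
MANIFEST_MARKER = "<!-- EVIDENCE_BUNDLE_MANIFEST_START -->"


def _login(obj: dict) -> str:
    return (obj.get("user") or {}).get("login", "")


def build_pr_evidence(title, body, commits, review_comments, issue_comments) -> dict:
    """Map GitHub REST-shaped objects onto the evidence.pr structure."""
    return {
        "title": title,
        "body": body,
        "commits": [
            # Short SHA (7 chars) plus the full commit message.
            {"sha": c["sha"][:7], "message": c["commit"]["message"]}
            for c in commits
        ],
        "review_comments": [
            {
                "author": _login(rc),
                "body": rc.get("body", ""),
                "path": rc.get("path"),
                # `line` can be null (e.g. outdated diffs); fall back to original_line.
                "line": rc["line"] if rc.get("line") is not None else rc.get("original_line"),
            }
            for rc in review_comments
        ],
        "issue_comments": [
            {"author": _login(ic), "body": ic.get("body", "")}
            for ic in issue_comments
            # Skip the Manifest comment itself.
            if MANIFEST_MARKER not in (ic.get("body") or "")
        ],
    }
```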
For each entry in identifiers.linear:
Fetch the issue via Linear MCP and collect:
- `title`
- `description` (full body)
- `comments`: array of `{ author, body, created_at }`
- `labels`: array of label names
- `status_changes`: status transition history (e.g., `[{ "from": "In Progress", "to": "Done", "changed_at": "ISO 8601", "actor": "username" }]`). Best-effort collection: try Linear MCP `getIssueHistory` or the issue activity/audit log. Most Linear MCP implementations do not expose a dedicated history endpoint; if it is unavailable, set `status_changes: []` and move on. The evidence bundle remains valuable without transition history.

If Linear MCP is unavailable or the issue cannot be found, record:
`{ "id": "...", "title": null, "description": null, "comments": [], "labels": [], "status_changes": [], "retrieved": false }`

Missing Linear content does NOT trigger insufficient. Linear issues are supplementary context that enriches the evidence bundle.
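The graceful-degradation pattern for Linear can be sketched as below. `fetch_issue` and `fetch_history` are hypothetical callables standing in for whatever Linear MCP calls are available; any failure or `None` result produces the `retrieved: false` placeholder, and history stays best-effort.

```python
from typing import Callable, Optional


def linear_placeholder(issue_id: str) -> dict:
    """Entry recorded when an issue cannot be retrieved."""
    return {"id": issue_id, "title": None, "description": None, "comments": [],
            "labels": [], "status_changes": [], "retrieved": False}


def collect_linear(issue_ids, fetch_issue: Callable[[str], Optional[dict]],
                   fetch_history: Optional[Callable[[str], list]] = None) -> list:
    entries = []
    for issue_id in issue_ids:
        try:
            issue = fetch_issue(issue_id)
        except Exception:
            issue = None  # MCP unavailable or call failed: optional source
        if issue is None:
            entries.append(linear_placeholder(issue_id))
            continue
        history = []
        if fetch_history is not None:
            try:
                history = fetch_history(issue_id)
            except Exception:
                history = []  # best-effort: never block on missing history
        entries.append({
            "id": issue_id,
            "title": issue.get("title"),
            "description": issue.get("description"),
            "comments": issue.get("comments", []),
            "labels": issue.get("labels", []),
            "status_changes": history,
            "retrieved": True,
        })
    return entries
```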
For each entry in identifiers.slack:
- Retrieved: `{ "url": "...", "content": "...", "retrieved": true }`
- Unretrievable: `{ "url": "...", "content": null, "retrieved": false }`

Missing Slack content does NOT trigger insufficient. Slack threads are supplementary context.
Ensure notes refs are available before collecting:
git fetch origin refs/notes/commits:refs/notes/commits 2>/dev/null || true
For each entry in identifiers.memento where has_notes is true:
Summary notes:
git notes --ref=refs/notes/commits show {sha}
If successful, store the output as summary.
If git notes show fails for a commit, skip that entry silently.
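Reading one note with "skip silently" semantics might look like this. It is a sketch that assumes the notes ref has already been fetched with the command above and that the working directory is the repository checkout; the function name is hypothetical.

```python
import subprocess
from typing import Optional


def read_memento_summary(sha: str, repo_dir: str = ".") -> Optional[str]:
    """Return the refs/notes/commits note for `sha`, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "notes", "--ref=refs/notes/commits", "show", sha],
            cwd=repo_dir, capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # git missing or hung: treat as no note
    if result.returncode != 0:
        return None  # no note (or unknown object): skip this entry silently
    return result.stdout
```

Only entries whose manifest record has `has_notes: true` are passed in; a `None` result means the entry is dropped rather than recorded as an error.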
Parse structured sections from the memento note (7-section format):
- `## Recorded Decisions` section → extract decision slugs and commit SHAs as `decision_refs`. Each line has the form ``- `{slug}` ({sha}): {description}``.
- `## Vault Entries Referenced` section → extract entry IDs, signals, and notes as `vault_refs`. Each line has the form ``- `{entry_id}` [{signal}]: {note}``.
- If a section is absent, use `vault_refs: []` or `decision_refs: []` respectively.
- Signal values: `followed`, `outdated`, `conflicted`, `insufficient`.

Missing memento notes do NOT trigger insufficient.
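Parsing the two sections can be done line by line. This sketch assumes the section headings appear exactly as `## Recorded Decisions` and `## Vault Entries Referenced`, and that the four signal values listed above are the only valid ones (lines with other signals are ignored). Function names are hypothetical.

```python
import re

DECISION_LINE = re.compile(r"^-\s+`(?P<slug>[^`]+)`\s+\((?P<sha>[0-9a-f]{7,40})\):")
VAULT_LINE = re.compile(
    r"^-\s+`(?P<entry_id>[^`]+)`\s+"
    r"\[(?P<signal>followed|outdated|conflicted|insufficient)\]:\s*(?P<note>.*)$"
)


def _section(note: str, title: str) -> list:
    """Return the lines under `## {title}` up to the next `## ` heading."""
    lines, inside = [], False
    for line in note.splitlines():
        if line.startswith("## "):
            inside = line[3:].strip() == title
            continue
        if inside:
            lines.append(line.strip())
    return lines


def parse_memento_refs(note: str) -> dict:
    decision_refs = [
        {"slug": m["slug"], "commit_sha": m["sha"]}
        for m in map(DECISION_LINE.match, _section(note, "Recorded Decisions")) if m
    ]
    vault_refs = [
        {"entry_id": m["entry_id"], "signal": m["signal"], "note": m["note"].strip()}
        for m in map(VAULT_LINE.match, _section(note, "Vault Entries Referenced")) if m
    ]
    return {"vault_refs": vault_refs, "decision_refs": decision_refs}
```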
For each entry in identifiers.greptile:
Fetch PR comments from the Greptile bot:
Use GitHub MCP to list all review comments for PR #{pr_number}. Filter for comments by users whose login contains "greptile" (case-insensitive).
Also check issue-level comments for Greptile bot comments.
Collect: { "path": "...", "line": N, "body": "..." } for each comment.
Missing Greptile data does NOT trigger insufficient.
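The Greptile filter reduces to a case-insensitive substring check on the author login. This sketch assumes REST-shaped comment objects (`user.login`, `path`, `line`, `body`) and, as a hypothetical choice, records `path` and `line` as `null` for issue-level comments, which have no diff position.

```python
def collect_greptile_comments(review_comments, issue_comments) -> list:
    """Keep comments whose author login contains "greptile" (case-insensitive)."""
    def is_greptile(c: dict) -> bool:
        return "greptile" in ((c.get("user") or {}).get("login") or "").lower()

    collected = [
        {"path": c.get("path"), "line": c.get("line"), "body": c.get("body")}
        for c in review_comments if is_greptile(c)
    ]
    # Issue-level comments are not anchored to a file or line.
    collected += [
        {"path": None, "line": None, "body": c.get("body")}
        for c in issue_comments if is_greptile(c)
    ]
    return collected
```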
For each entry in identifiers.notion:
Use Notion MCP notion-fetch to retrieve the page:
Use Notion MCP to fetch the page at the URL from the identifier. Extract page title and content (returned as Markdown).
If retrieved successfully: { "url": "...", "title": "Page Title", "content": "markdown content", "retrieved": true }
If Notion MCP is unavailable or the page is not found: { "url": "...", "title": null, "content": null, "retrieved": false }
Missing Notion content does NOT trigger insufficient. Notion pages are supplementary context — design documents, decision records, and meeting notes that enrich the evidence bundle.
Evaluate evidence completeness using these rules:
| Condition | Verdict |
|---|---|
| Changed file list present AND commit messages present | sufficient (required baseline met) |
| Changed file list or commit messages missing | insufficient |
| All optional sources (Linear, Slack, memento, Greptile, Notion) missing but required baseline met | sufficient |
| No Manifest found | insufficient |
| GitHub MCP authentication failed (401/403) on any required call | insufficient (see GitHub MCP Auth Failure below) |
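
The baseline rule from the table can be expressed as a small check. This is a sketch: Manifest and auth failures are assumed to be handled earlier (they return their own objects), and the `changed_files` and `commits` missing tags are hypothetical names, not tags defined by the skill.

```python
def evaluate_sufficiency(bundle: dict) -> dict:
    """Apply the required-baseline rule; optional sources never change the verdict."""
    missing = []
    if not bundle.get("changed_files"):
        missing.append("changed_files")
    commits = bundle.get("evidence", {}).get("pr", {}).get("commits", [])
    if not commits or not all(c.get("message") for c in commits):
        missing.append("commits")
    if missing:
        return {
            "verdict": "insufficient",
            "missing": missing,
            "reason": "Required baseline missing: " + ", ".join(missing) + ".",
        }
    return {"verdict": "sufficient", "missing": [], "reason": ""}
```

Note that Linear, Slack, memento, Greptile, and Notion are never inspected, mirroring the second table row.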
Composing the sufficiency object:
If sufficient:
{
"verdict": "sufficient",
"missing": [],
"reason": ""
}
If insufficient:
{
"verdict": "insufficient",
"missing": ["<specific items, e.g., 'linear:PAY-123', 'pr_diff', 'manifest'>"],
"reason": "<Human-readable explanation of what is missing and why it matters>"
}
Whenever the verdict is insufficient, still return the Evidence Bundle with whatever evidence was collected. Missing optional sources (e.g., Linear unavailable, Notion down) never make a bundle insufficient on their own. The orchestrator decides how to handle insufficient bundles (e.g., keeping the PR in knowledge:pending for a later retry).
Auth failure is a special case that does not follow this rule — see GitHub MCP Auth Failure (401/403) below. Partial data on auth failure must be discarded.
A GitHub MCP call may return 401/403 mid-collection — typically because the workflow's installation token expired before the workflow finished. When this happens:
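- Stop collection and discard all partial data gathered so far. Returning it would hand `extract-candidates` a misleading bundle.
- Return an Evidence Bundle containing only:
{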
"sufficiency": {
"verdict": "insufficient",
"missing": ["github_auth"],
"reason": "GitHub MCP authentication failed (401/403). Token likely expired mid-run. PR will be retried on next batch."
}
}
- The orchestrator (`batch-refine`) treats the `github_auth` missing tag as a special signal: it stops the per-PR loop and follows the Unexpected 401 path (no graceful handoff, no retrigger). The PR remains `knowledge:pending` and is naturally picked up by the next cron run.

This rule supersedes the "graceful degradation" guidance for optional sources: auth failure on the required GitHub baseline is never acceptable. There is no partial-data path here.
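The discard-on-auth-failure rule can be sketched as a wrapper around the whole collection run. `GitHubAuthError` is a hypothetical exception that a GitHub MCP wrapper would raise on HTTP 401/403; the point of the sketch is that nothing from `collect()` survives once it is raised.

```python
from typing import Callable


class GitHubAuthError(Exception):
    """Hypothetical error raised by a GitHub MCP wrapper on HTTP 401/403."""


def auth_failure_bundle() -> dict:
    return {
        "sufficiency": {
            "verdict": "insufficient",
            "missing": ["github_auth"],
            "reason": "GitHub MCP authentication failed (401/403). "
                      "Token likely expired mid-run. PR will be retried on next batch.",
        }
    }


def collect_with_auth_guard(collect: Callable[[], dict]) -> dict:
    try:
        return collect()
    except GitHubAuthError:
        # Partial data lives only inside collect(); it is dropped here.
        return auth_failure_bundle()
```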
The final Evidence Bundle must follow this structure:
{
"pr_number": 1234,
"merge_sha": "abc123def456",
"base_branch": "main",
"changed_files": ["path/to/file.rb", "..."],
"evidence": {
"pr": {
"title": "PR title text",
"body": "PR body markdown",
"commits": [
{ "sha": "a1b2c3d", "message": "Full commit message" }
],
"review_comments": [
{ "author": "username", "body": "comment text", "path": "file.rb", "line": 42 }
],
"issue_comments": [
{ "author": "username", "body": "comment text" }
]
},
"linear": [
{
"id": "LIN-456",
"title": "Issue title",
"description": "Full issue body",
"comments": [
{ "author": "person", "body": "comment text", "created_at": "ISO 8601" }
],
"labels": ["decision", "bug"],
"status_changes": [
{ "from": "In Progress", "to": "Done", "changed_at": "ISO 8601", "actor": "username" }
],
"retrieved": true
}
],
"slack": [
{
"url": "https://team.slack.com/archives/C0123/p1709901234",
"content": "thread content or null",
"retrieved": true
}
],
"memento": [
{
"sha": "a1b2c3d",
"summary": "git notes content from refs/notes/commits",
"vault_refs": [
{ "entry_id": "entry-id", "signal": "followed", "note": "description of usage" }
],
"decision_refs": [
{ "slug": "decision-slug", "commit_sha": "abc1234" }
]
}
],
"greptile": [
{
"review_id": "greptile-pr-1234",
"comments": [
{ "path": "file.rb", "line": 10, "body": "review comment" }
]
}
],
"notion": [
{
"url": "https://notion.so/workspace/Design-Doc-abc123",
"title": "Design Doc: Payment Flow",
"content": "markdown content of the page",
"retrieved": true
}
]
},
"sufficiency": {
"verdict": "sufficient",
"missing": [],
"reason": ""
}
}
| Failure Mode | Behavior |
|---|---|
| No Manifest comment on PR | Return insufficient with reason. Do not proceed to collection steps. |
| Manifest JSON malformed and fallback parsing fails | Return insufficient with reason. Do not proceed. |
| Linear MCP unavailable | Set retrieved: false for all Linear entries. Continue — optional source. |
| Linear issue deleted/moved/not found | Set retrieved: false for that entry. Continue — optional source. |
| Slack content unretrievable | Set retrieved: false for that entry. Continue — optional source. |
| `git notes show` fails | Skip that memento entry. Continue (optional source). |
| Notion MCP unavailable | Set retrieved: false for all Notion entries. Continue — optional source. |
| Notion page not found / access denied | Set retrieved: false for that entry. Continue — optional source. |
| Changed file list unavailable | Use pr.changed_files from Manifest as fallback. |
| GitHub API rate limit | Report failure to orchestrator. Orchestrator retries in next batch. |

- Do not run `knowledge-gate` commands (no CLI access in this step).
- Next step: `/knowledge-distillery:extract-candidates` (Stage B step 2).