Skill

collect-evidence

Collects the actual content of all evidence sources identified in a PR's Evidence Bundle Manifest and produces a structured Evidence Bundle. Stage B step 1 — transforms identifier references into full content for downstream candidate extraction. Called by batch-refine orchestrator per PR.

npx claudepluginhub ether-moon/knowledge-distillery --plugin knowledge-distillery

Tool Access

This skill uses the workspace's default tool permissions.

Preview

- Called by `/knowledge-distillery:batch-refine` orchestrator as a subagent per PR

Stats

Parent Repo Stars: 0

Parent Repo Forks: 0

Last Commit: Apr 27, 2026

SKILL.md

collect-evidence — Stage B-1 Evidence Collection

When This Skill Runs

- Called by `/knowledge-distillery:batch-refine` orchestrator as a subagent per PR
- Runs within the same subagent context (Evidence Bundle is returned in-memory)
- Invoked as `/knowledge-distillery:collect-evidence`

Prerequisites

- GitHub MCP server configured with the `pull_requests`, `issues`, and `labels` toolsets
- git with access to `refs/notes/commits`
- Linear MCP server (optional; graceful degradation if unavailable)
- Notion MCP server (optional; graceful degradation if unavailable)

Allowed Tools

- GitHub MCP (read-only by behavioral contract): PR data and review comments
- Linear MCP: issue details and comments (read-only)
- Notion MCP: page content retrieval (read-only)
- `git log`, `git show`, `git notes show`: commit and memento data
- Bash, Read, Glob, Grep
- No file writes. No vault.db access. No knowledge-gate CLI.
- MUST NOT create, modify, or delete any GitHub resources (comments, labels, PRs). Read operations only.

Input

| Field | Source | Format |
| --- | --- | --- |
| PR number | Passed by orchestrator | Integer |
| Repository | Derived via GitHub MCP if not provided | `owner/repo` |
| Manifest JSON | Parsed from PR comment: strict delimiter parsing preferred, LLM fallback for non-standard formats | JSON per evidence-manifest.spec.md |

Output

An Evidence Bundle — a structured JSON object held in memory. NOT written to disk. Returned to the calling context for consumption by /knowledge-distillery:extract-candidates.

Execution Steps

Follow these steps in exact order.

Step 1: Parse the Evidence Bundle Manifest

Fetch all PR comments and locate the Manifest:

Use GitHub MCP to list all issue-level comments on PR #{pr_number}.

1a. Strict parsing (preferred)

1. Find the comment whose body contains the EVIDENCE_BUNDLE_MANIFEST_START delimiter.
2. Extract the text between that delimiter and its matching end delimiter.
3. Strip the markdown code fence (opening ```json and closing ```).
4. Parse the remaining text as JSON.
5. Validate the parsed JSON:
   - version must be "1"
   - pr.number must be a positive integer
   - pr.merge_sha must match /^[0-9a-f]{7,40}$/
   - All identifiers sub-keys (linear, slack, memento, greptile, notion) must be present (even if empty arrays)

If strict parsing succeeds, proceed to Step 2.
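
The strict path in 1a can be sketched in Python as below. This is a minimal illustration, not the skill's actual implementation. Only the EVIDENCE_BUNDLE_MANIFEST_START name appears in this document, so the `END` string and the function names `validate_manifest` and `parse_manifest_strict` are assumptions made for the example.

```python
import json
import re
from typing import Optional

# START is named in this document; END is a hypothetical counterpart.
START = "EVIDENCE_BUNDLE_MANIFEST_START"
END = "EVIDENCE_BUNDLE_MANIFEST_END"
IDENTIFIER_KEYS = ("linear", "slack", "memento", "greptile", "notion")
SHA_RE = re.compile(r"[0-9a-f]{7,40}")


def validate_manifest(manifest: dict) -> list:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    if manifest.get("version") != "1":
        errors.append('version must be "1"')
    pr = manifest.get("pr")
    if not isinstance(pr, dict):
        pr = {}
    number = pr.get("number")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(number, bool) or not isinstance(number, int) or number <= 0:
        errors.append("pr.number must be a positive integer")
    sha = pr.get("merge_sha")
    if not isinstance(sha, str) or SHA_RE.fullmatch(sha) is None:
        errors.append("pr.merge_sha must be 7-40 lowercase hex characters")
    identifiers = manifest.get("identifiers")
    if not isinstance(identifiers, dict):
        identifiers = {}
    for key in IDENTIFIER_KEYS:
        if not isinstance(identifiers.get(key), list):
            errors.append(f"identifiers.{key} must be present as an array")
    return errors


def parse_manifest_strict(comment_body: str) -> Optional[dict]:
    """Return the validated Manifest, or None so the caller can try 1b."""
    start = comment_body.find(START)
    if start == -1:
        return None
    end = comment_body.find(END, start + len(START))
    if end == -1:
        return None
    between = comment_body[start + len(START):end]
    # Locate the fenced block anywhere between the two delimiters.
    fenced = re.search(r"```(?:json)?\s*\n(.*?)\n\s*```", between, re.DOTALL)
    if fenced is None:
        return None
    try:
        manifest = json.loads(fenced.group(1))
    except json.JSONDecodeError:
        return None
    if not isinstance(manifest, dict) or validate_manifest(manifest):
        return None
    return manifest
```

Returning `None` for every failure keeps the caller's branching simple: any `None` means "fall through to 1b".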

1b. LLM fallback parsing

If no comment contains the EVIDENCE_BUNDLE_MANIFEST_START delimiter, or if the JSON between delimiters is malformed:

Search all PR comments for one that resembles an Evidence Bundle Manifest. Look for comments containing keywords like "Evidence Bundle Manifest", "evidence", identifier references (Linear IDs, Slack URLs, commit SHAs), or structured lists of evidence sources.
If a candidate comment is found, extract structured data from it by reading its content and mapping it to the Manifest schema:
```
{
  "version": "1",
  "pr": {
    "number": "<from orchestrator input>",
    "merge_sha": "<from PR merge commit — query GitHub MCP if not in comment>",
    "base_branch": "<from PR base branch — query GitHub MCP if not in comment>",
    "changed_files": ["<from PR changed files — query GitHub MCP if not in comment>"]
  },
  "identifiers": {
    "linear": [],
    "slack": [],
    "memento": [],
    "greptile": [],
    "notion": []
  },
  "collected_at": "<current ISO 8601 timestamp>"
}
```
- Extract any Linear issue IDs (pattern: [A-Z]+-\d+) mentioned in the comment → populate identifiers.linear
- Extract any Slack URLs (pattern: https://*.slack.com/archives/*/p*) → populate identifiers.slack
- Extract any commit SHAs referenced as memento sources → populate identifiers.memento
- Extract any Greptile review references → populate identifiers.greptile
- Extract any Notion URLs (pattern: https://(www.)?notion.(so|site)/*) → populate identifiers.notion
- For pr fields not present in the comment, query GitHub MCP directly to fill them in.
Validate the reconstructed Manifest using the same rules as 1a step 5. Empty identifier arrays are valid.

If fallback parsing produces a valid Manifest, proceed to Step 2.
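
The pattern-based part of the fallback can be approximated with regular expressions, as in the rough sketch below. The Slack and Notion patterns above are written glob-style, so the regexes here are one interpretation of them, not the skill's own. `extract_identifiers` is a hypothetical helper, and returning plain strings for each entry is an assumption about the Manifest entry shape. Memento and Greptile references have no stated pattern, so they are left for the LLM to fill in.

```python
import re

LINEAR_ID_RE = re.compile(r"\b[A-Z]+-\d+\b")
SLACK_URL_RE = re.compile(r"https://[\w.-]+\.slack\.com/archives/[\w-]+/p\d+")
NOTION_URL_RE = re.compile(r"https://(?:www\.)?notion\.(?:so|site)/[^\s)>\]\"']+")


def unique(items):
    """Drop duplicates while preserving first-seen order."""
    return list(dict.fromkeys(items))


def extract_identifiers(comment_body: str) -> dict:
    """Populate the identifier arrays that have a stated pattern."""
    # Trailing sentence punctuation is not part of a Notion URL.
    notion = [url.rstrip(".,;:") for url in NOTION_URL_RE.findall(comment_body)]
    return {
        "linear": unique(LINEAR_ID_RE.findall(comment_body)),
        "slack": unique(SLACK_URL_RE.findall(comment_body)),
        "memento": [],   # no stated pattern; left to the LLM reading
        "greptile": [],  # no stated pattern; left to the LLM reading
        "notion": unique(notion),
    }
```

The Linear pattern is deliberately loose and also matches strings such as `UTF-8`, so candidates found this way still need a sanity check before they are queried.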

1c. No Manifest found

If no comment resembling a Manifest exists at all (not even in non-standard format), return an Evidence Bundle with:

{
  "sufficiency": {
    "verdict": "insufficient",
    "missing": ["manifest"],
    "reason": "No Evidence Bundle Manifest comment found on PR #{pr_number}."
  }
}

Stop processing — do not proceed to subsequent steps.

Step 2: Collect PR Evidence (Required)

Carry forward from the parsed Manifest into the Evidence Bundle's root-level fields:

- `pr_number` from `pr.number`
- `merge_sha` from `pr.merge_sha`
- `base_branch` from `pr.base_branch`
- `changed_files` from `pr.changed_files`

Then collect PR content:

Title and body:

Use GitHub MCP to fetch PR #{pr_number} title and body.

Changed file list: The full PR diff is on-demand evidence — extract-candidates fetches specific file diffs selectively. At this stage, collect only the list of changed files:
```
Use GitHub MCP to fetch the list of changed files in PR #{pr_number}. Extract relative file paths.
```
Store as changed_files in the Evidence Bundle. If the Manifest already contains pr.changed_files, verify and use that; otherwise populate from this query.

Note for downstream: extract-candidates can selectively fetch specific file diffs as needed using GitHub MCP or git diff.

Commits:

Use GitHub MCP to list all commits in PR #{pr_number}. Extract each commit's SHA (short, 7 chars) and full message.

Review comments (inline on diff):

Use GitHub MCP to list all review comments (inline on diff) for PR #{pr_number}. Extract: author (login), body, path, line (or original_line).

Issue-level comments:

Use GitHub MCP to list all issue-level comments on PR #{pr_number}. Include all comments EXCEPT the Manifest comment itself.

Note: The full PR diff is not pre-collected. The changed file list is sufficient for this step.
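
As a sketch of how the raw responses from this step might be shaped into the `evidence.pr` fields, the function below assumes the inputs are dicts laid out like GitHub REST API responses (`sha`, `commit.message`, `user.login`, `path`, `line`, `original_line`, `id`). A given GitHub MCP server may return a different shape, and `shape_pr_evidence` and `manifest_comment_id` are hypothetical names.

```python
def shape_pr_evidence(commits, review_comments, issue_comments, manifest_comment_id):
    """Reduce raw GitHub-style records to the fields Step 2 asks for."""
    return {
        "commits": [
            # Short SHA (7 chars) plus the full, untruncated message.
            {"sha": c["sha"][:7], "message": c["commit"]["message"]}
            for c in commits
        ],
        "review_comments": [
            {
                "author": r["user"]["login"],
                "body": r["body"],
                "path": r["path"],
                # Fall back to original_line when line is absent or null.
                "line": r["line"] if r.get("line") is not None else r.get("original_line"),
            }
            for r in review_comments
        ],
        "issue_comments": [
            {"author": i["user"]["login"], "body": i["body"]}
            for i in issue_comments
            if i["id"] != manifest_comment_id  # exclude the Manifest comment itself
        ],
    }
```

Bodies and messages are copied verbatim, in line with the constraint that raw content is preserved without summarization.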

Step 3: Collect Linear Evidence (Optional)

For each entry in identifiers.linear:

Query Linear MCP for the issue by ID. Collect:
- title
- description (full body)
- comments — array of { author, body, created_at }
- labels — array of label names
- status_changes — status transition history (e.g., [{ "from": "In Progress", "to": "Done", "changed_at": "ISO 8601", "actor": "username" }]). Best-effort collection: try Linear MCP getIssueHistory or issue activity/audit log. Most Linear MCP implementations do not expose a dedicated history endpoint — if unavailable, set status_changes: [] and move on. The evidence bundle remains valuable without transition history.
If Linear MCP is unavailable or the specific issue is not found:
- Record: { "id": "...", "title": null, "description": null, "comments": [], "labels": [], "status_changes": [], "retrieved": false }

Missing Linear content does NOT trigger insufficient. Linear issues are supplementary context that enriches the evidence bundle.

Step 4: Collect Slack Evidence (Optional)

For each entry in identifiers.slack:

Attempt to retrieve the Slack thread content using available integration
If retrieved successfully: { "url": "...", "content": "...", "retrieved": true }
If retrieval fails: { "url": "...", "content": null, "retrieved": false }

Missing Slack content does NOT trigger insufficient. Slack threads are supplementary context.

Step 5: Collect Memento Evidence (Optional)

Ensure notes refs are available before collecting:

git fetch origin refs/notes/commits:refs/notes/commits 2>/dev/null || true

For each entry in identifiers.memento where has_notes is true:

Summary notes:
```
git notes --ref=refs/notes/commits show {sha}
```
If successful, store the output as summary.
If git notes show fails for a commit, skip that entry silently.
Parse structured sections from the memento note (7-section format):
- Look for ## Recorded Decisions section → extract decision slugs and commit SHAs as decision_refs
  - Expected line format: - `{slug}` ({sha}): {description}
- Look for ## Vault Entries Referenced section → extract entry IDs, signals, and notes as vault_refs
  - Expected line format: - `{entry_id}` [{signal}]: {note}
- If these sections are absent (5-section legacy format), set both to empty arrays: vault_refs: [], decision_refs: []
- Valid signals: followed, outdated, conflicted, insufficient

Missing memento notes do NOT trigger insufficient.
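
The section parsing above can be sketched as follows, assuming the note uses the two line formats exactly as written. `parse_memento` and `section_lines` are hypothetical names. Dropping entries whose signal is not in the valid list is also an assumption, since the text lists the valid signals without saying how to treat others.

```python
import re

VALID_SIGNALS = {"followed", "outdated", "conflicted", "insufficient"}
# - `{slug}` ({sha}): {description}
DECISION_RE = re.compile(r"^- `([^`]+)` \(([0-9a-f]{7,40})\):")
# - `{entry_id}` [{signal}]: {note}
VAULT_RE = re.compile(r"^- `([^`]+)` \[(\w+)\]:\s*(.*)$")


def section_lines(note: str, heading: str) -> list:
    """Return the stripped lines under a '## heading' section, if present."""
    lines, inside = [], False
    for line in note.splitlines():
        if line.startswith("## "):
            inside = line.strip() == f"## {heading}"
            continue
        if inside:
            lines.append(line.strip())
    return lines


def parse_memento(sha: str, note: str) -> dict:
    """Build one memento entry; legacy notes yield empty ref arrays."""
    decision_refs = []
    for line in section_lines(note, "Recorded Decisions"):
        m = DECISION_RE.match(line)
        if m:
            decision_refs.append({"slug": m.group(1), "commit_sha": m.group(2)})
    vault_refs = []
    for line in section_lines(note, "Vault Entries Referenced"):
        m = VAULT_RE.match(line)
        # Assumption: entries with an unrecognized signal are skipped.
        if m and m.group(2) in VALID_SIGNALS:
            vault_refs.append(
                {"entry_id": m.group(1), "signal": m.group(2), "note": m.group(3)}
            )
    # The raw note is kept whole as the summary.
    return {"sha": sha, "summary": note, "vault_refs": vault_refs, "decision_refs": decision_refs}
```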

Step 6: Collect Greptile Evidence (Optional)

For each entry in identifiers.greptile:

Fetch PR comments from the Greptile bot:

Use GitHub MCP to list all review comments for PR #{pr_number}. Filter for comments by users whose login contains "greptile" (case-insensitive).

Also check issue-level comments for Greptile bot comments.
Collect: { "path": "...", "line": N, "body": "..." } for each comment.

Missing Greptile data does NOT trigger insufficient.
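
A minimal sketch of the Greptile filter, again assuming GitHub REST-style comment dicts (`user.login`, `path`, `line`, `body`). `collect_greptile_comments` is a hypothetical name, and using `None` for `path` and `line` on issue-level comments is an assumption, since those comments are not attached to a diff position.

```python
def collect_greptile_comments(review_comments, issue_comments):
    """Keep only comments whose author login contains 'greptile'."""

    def is_greptile(comment):
        return "greptile" in comment["user"]["login"].lower()

    collected = []
    for c in review_comments:
        if is_greptile(c):
            collected.append({"path": c.get("path"), "line": c.get("line"), "body": c["body"]})
    for c in issue_comments:
        if is_greptile(c):
            # Issue-level comments carry no diff position.
            collected.append({"path": None, "line": None, "body": c["body"]})
    return collected
```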

Step 7: Collect Notion Evidence (Optional)

For each entry in identifiers.notion:

Use Notion MCP notion-fetch to retrieve the page:

Use Notion MCP to fetch the page at the URL from the identifier. Extract page title and content (returned as Markdown).

If retrieved successfully: { "url": "...", "title": "Page Title", "content": "markdown content", "retrieved": true }
If Notion MCP is unavailable or the page is not found: { "url": "...", "title": null, "content": null, "retrieved": false }

Missing Notion content does NOT trigger insufficient. Notion pages are supplementary context — design documents, decision records, and meeting notes that enrich the evidence bundle.
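
Steps 3, 4, and 7 share one degradation pattern: try the fetch, and on any failure record a placeholder with `retrieved: false`. The sketch below shows that pattern in isolation. The placeholder shapes are taken from those steps, while `fetch_or_placeholder` and the zero-argument `fetch` callable are hypothetical stand-ins for the real MCP calls.

```python
from typing import Callable


def linear_placeholder(issue_id: str) -> dict:
    return {"id": issue_id, "title": None, "description": None, "comments": [],
            "labels": [], "status_changes": [], "retrieved": False}


def slack_placeholder(url: str) -> dict:
    return {"url": url, "content": None, "retrieved": False}


def notion_placeholder(url: str) -> dict:
    return {"url": url, "title": None, "content": None, "retrieved": False}


def fetch_or_placeholder(fetch: Callable[[], dict], placeholder: dict) -> dict:
    """Run an optional-source fetch; degrade to the placeholder on failure."""
    try:
        record = fetch()
    except Exception:
        # Optional sources never abort collection; the gap is just recorded.
        return placeholder
    return {**record, "retrieved": True}
```

This wrapper is meant only for the optional sources. A 401/403 from GitHub on a required call must not be swallowed this way.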

Step 8: Sufficiency Judgment

Evaluate evidence completeness using these rules:

| Condition | Verdict |
| --- | --- |
| Changed file list present AND commit messages present | Required baseline met: `sufficient` |
| All optional sources (Linear, Slack, memento, Greptile, Notion) missing but required baseline met | `sufficient` |
| No Manifest found | `insufficient` |
| GitHub MCP authentication failed (401/403) on any required call | `insufficient` (see GitHub MCP Auth Failure below) |

Composing the sufficiency object:

If sufficient:

{
  "verdict": "sufficient",
  "missing": [],
  "reason": ""
}

If insufficient:

{
  "verdict": "insufficient",
  "missing": ["<specific items, e.g., 'linear:PAY-123', 'pr_diff', 'manifest'>"],
  "reason": "<Human-readable explanation of what is missing and why it matters>"
}

When the verdict is insufficient, still return the Evidence Bundle with whatever evidence was collected, including entries for optional sources that could not be retrieved (e.g., Linear unavailable, Notion down). Gaps in optional sources alone never cause an insufficient verdict (see Steps 3 to 7). The orchestrator decides how to handle insufficient bundles (e.g., keeping the PR in knowledge:pending for a later retry).

Auth failure is a special case that does not follow this rule — see GitHub MCP Auth Failure (401/403) below. Partial data on auth failure must be discarded.
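
The rules in this step are mechanical enough to sketch as a pure function. The table does not spell out the case where the Manifest exists but the baseline is not met, so treating it as insufficient is an assumption. The `changed_files` and `commit_messages` tags and the name `judge_sufficiency` are likewise hypothetical.

```python
def _insufficient(missing, reason):
    return {"verdict": "insufficient", "missing": missing, "reason": reason}


def judge_sufficiency(pr_number, manifest_found, auth_failed, changed_files, commits):
    """Apply the Step 8 rules; optional sources play no part in the verdict."""
    if auth_failed:
        return _insufficient(["github_auth"], "GitHub MCP authentication failed (401/403).")
    if not manifest_found:
        return _insufficient(
            ["manifest"],
            f"No Evidence Bundle Manifest comment found on PR #{pr_number}.",
        )
    missing = []
    if not changed_files:
        missing.append("changed_files")      # hypothetical tag
    if not any(c.get("message") for c in commits):
        missing.append("commit_messages")    # hypothetical tag
    if missing:
        # Assumption: an unmet required baseline is insufficient.
        return _insufficient(missing, "Required baseline not met: " + ", ".join(missing) + ".")
    return {"verdict": "sufficient", "missing": [], "reason": ""}
```

Keeping the function free of Linear, Slack, memento, Greptile, and Notion inputs makes it impossible for an optional source to influence the verdict by accident.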

GitHub MCP Auth Failure (401/403)

A GitHub MCP call may return 401/403 mid-collection — typically because the workflow's installation token expired before the workflow finished. When this happens:

Stop collecting immediately. Do not continue with partially fetched data. PR title/body without comments, or comments without authors, would feed extract-candidates a misleading bundle.
Discard any partial evidence for this PR. Do not write incomplete fields into the bundle that would look "sufficient" to downstream stages.

Return the bundle with:

{
  "sufficiency": {
    "verdict": "insufficient",
    "missing": ["github_auth"],
    "reason": "GitHub MCP authentication failed (401/403). Token likely expired mid-run. PR will be retried on next batch."
  }
}

The orchestrator (batch-refine) treats the github_auth missing tag as a special signal: it stops the per-PR loop and follows the Unexpected 401 path (no graceful handoff, no retrigger). The PR remains knowledge:pending and is naturally picked up by the next cron run.

This rule supersedes the "graceful degradation" guidance for optional sources — auth failure on the required GitHub baseline is never acceptable. There is no partial-data path here.
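
One way to make the "no partial-data path" rule hard to violate is to build the bundle inside a single guarded call and return only the sufficiency object when auth fails. In this sketch, `GitHubAuthError` is a hypothetical exception that the caller's GitHub wrapper would raise on 401/403, and `collect_pr_evidence` is a hypothetical callable performing Steps 2 to 7.

```python
class GitHubAuthError(Exception):
    """Hypothetical error raised by a GitHub wrapper on 401/403."""


def collect_or_discard(pr_number, collect_pr_evidence):
    """Return the full bundle, or only the auth-failure verdict."""
    try:
        return collect_pr_evidence(pr_number)
    except GitHubAuthError:
        # Anything gathered before the failure lived only inside
        # collect_pr_evidence and is dropped along with its stack frame.
        return {
            "sufficiency": {
                "verdict": "insufficient",
                "missing": ["github_auth"],
                "reason": (
                    "GitHub MCP authentication failed (401/403). "
                    "Token likely expired mid-run. PR will be retried on next batch."
                ),
            }
        }
```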

Evidence Bundle Structure

The final Evidence Bundle must follow this structure:

{
  "pr_number": 1234,
  "merge_sha": "abc123def456",
  "base_branch": "main",
  "changed_files": ["path/to/file.rb", "..."],
  "evidence": {
    "pr": {
      "title": "PR title text",
      "body": "PR body markdown",
      "commits": [
        { "sha": "a1b2c3d", "message": "Full commit message" }
      ],
      "review_comments": [
        { "author": "username", "body": "comment text", "path": "file.rb", "line": 42 }
      ],
      "issue_comments": [
        { "author": "username", "body": "comment text" }
      ]
    },
    "linear": [
      {
        "id": "LIN-456",
        "title": "Issue title",
        "description": "Full issue body",
        "comments": [
          { "author": "person", "body": "comment text", "created_at": "ISO 8601" }
        ],
        "labels": ["decision", "bug"],
        "status_changes": [
          { "from": "In Progress", "to": "Done", "changed_at": "ISO 8601", "actor": "username" }
        ],
        "retrieved": true
      }
    ],
    "slack": [
      {
        "url": "https://team.slack.com/archives/C0123/p1709901234",
        "content": "thread content or null",
        "retrieved": true
      }
    ],
    "memento": [
      {
        "sha": "a1b2c3d",
        "summary": "git notes content from refs/notes/commits",
        "vault_refs": [
          { "entry_id": "entry-id", "signal": "followed", "note": "description of usage" }
        ],
        "decision_refs": [
          { "slug": "decision-slug", "commit_sha": "abc1234" }
        ]
      }
    ],
    "greptile": [
      {
        "review_id": "greptile-pr-1234",
        "comments": [
          { "path": "file.rb", "line": 10, "body": "review comment" }
        ]
      }
    ],
    "notion": [
      {
        "url": "https://notion.so/workspace/Design-Doc-abc123",
        "title": "Design Doc: Payment Flow",
        "content": "markdown content of the page",
        "retrieved": true
      }
    ]
  },
  "sufficiency": {
    "verdict": "sufficient",
    "missing": [],
    "reason": ""
  }
}
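
A small skeleton builder can help ensure every key in the structure above exists before collection starts. The sketch below carries the root-level fields forward from a validated Manifest and leaves everything else empty. `new_bundle` is a hypothetical helper, and initializing `sufficiency` to `None` until Step 8 runs is an assumption.

```python
def new_bundle(manifest: dict) -> dict:
    """Create an empty Evidence Bundle with root fields from the Manifest."""
    pr = manifest["pr"]
    return {
        "pr_number": pr["number"],
        "merge_sha": pr["merge_sha"],
        "base_branch": pr.get("base_branch"),
        "changed_files": list(pr.get("changed_files", [])),
        "evidence": {
            "pr": {
                "title": None,
                "body": None,
                "commits": [],
                "review_comments": [],
                "issue_comments": [],
            },
            "linear": [],
            "slack": [],
            "memento": [],
            "greptile": [],
            "notion": [],
        },
        # Filled in by the sufficiency judgment (Step 8).
        "sufficiency": None,
    }
```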

Error Handling

| Failure Mode | Behavior |
| --- | --- |
| No Manifest comment on PR | Return `insufficient` with reason. Do not proceed to collection steps. |
| Manifest JSON malformed and fallback parsing (1b) cannot reconstruct a valid Manifest | Return `insufficient` with reason. Do not proceed. |
| Linear MCP unavailable | Set `retrieved: false` for all Linear entries. Continue (optional source). |
| Linear issue deleted/moved/not found | Set `retrieved: false` for that entry. Continue (optional source). |
| Slack content unretrievable | Set `retrieved: false` for that entry. Continue (optional source). |
| `git notes show` fails | Skip that memento entry. Continue (optional source). |
| Notion MCP unavailable | Set `retrieved: false` for all Notion entries. Continue (optional source). |
| Notion page not found / access denied | Set `retrieved: false` for that entry. Continue (optional source). |
| Changed file list unavailable | Use `pr.changed_files` from Manifest as fallback. |
| GitHub API rate limit | Report failure to orchestrator. Orchestrator retries in next batch. |

Constraints

- MUST NOT write any files to disk
- MUST NOT access or modify vault.db
- MUST NOT call any knowledge-gate commands (no CLI access in this step)
- MUST NOT extract knowledge candidates (that is `/knowledge-distillery:extract-candidates`, Stage B step 2)
- MUST NOT make sufficiency decisions beyond the defined rules: no subjective "I think this is enough"
- MUST return the Evidence Bundle in memory for the next step in the same subagent context
- MUST preserve all raw content without summarization or interpretation
- MUST NOT modify the PR (no comments, no label changes; the orchestrator handles that)