From session-orchestrator
Runs modular probes adapted to detected tech stack (JS/TS, Python, Docker, Vercel, Supabase, etc.) to discover quality issues. Enables interactive triage and auto-creates VCS issues.
npx claudepluginhub kanevry/session-orchestrator --plugin session-orchestrator

This skill uses the workspace's default tool permissions.
Two modes of operation:
- Standalone (`/discovery [scope]`): full 6-phase flow with interactive triage (Phases 0-6)
- Embedded (`discovery-on-close: true`): Phases 0-4 only, returns structured findings to session-end

The scope argument accepts: `all` (default), `code`, `infra`, `ui`, `arch`, `session`, `audit`, `vault`, or comma-separated values like `code,session`.
Read skills/_shared/bootstrap-gate.md and execute the gate check. If the gate is CLOSED, invoke skills/bootstrap/SKILL.md and wait for completion before proceeding. If the gate is OPEN, continue to Phase 1.
Read and parse Session Config per skills/_shared/config-reading.md. Store result as $CONFIG.
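The authoritative parsing rules live in `skills/_shared/config-reading.md`; as a minimal sketch, assuming a flat `key: value` Session Config (nested keys and full YAML are out of scope here), the read might look like:

```python
# Sketch only: assumes a flat "key: value" config; real parsing is defined
# in skills/_shared/config-reading.md.
def read_session_config(text: str) -> dict:
    config = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" in line:
            key, _, value = line.partition(":")
            config[key.strip()] = value.strip()
    return config

cfg = read_session_config("discovery-on-close: true\ndiscovery-confidence-threshold: 60")
```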
Discovery-relevant fields (parse these specifically):
- `discovery-on-close`, `discovery-probes`, `discovery-exclude-paths`, `discovery-severity-threshold`, `discovery-confidence-threshold`
- `test-command`, `typecheck-command`, `lint-command`
- `pencil`, `vcs`, `cross-repos`, `stale-issue-days`

Detect the project's tech stack via marker file checks. Use Glob and run checks in parallel:
| Marker File(s) | Activates |
|---|---|
| `package.json` | JS/TS probes |
| `tsconfig.json` | TypeScript probes |
| `requirements.txt` / `pyproject.toml` | Python probes |
| `Dockerfile` / `docker-compose.yml` | Container probes |
| `vercel.json` / `.vercel/` | Vercel probes |
| `.github/workflows/` | GitHub CI probes |
| `.gitlab-ci.yml` | GitLab CI probes |
| `supabase/` | Supabase probes |
| `next.config.*` / `nuxt.config.*` | SSR probes |
| `tailwind.config.*` | Tailwind probes |
| Pencil in Session Config | design-drift probe |
| `.orchestrator/bootstrap.lock` | harness-audit probe |
| `.vault.yaml` OR Session Config `vault-integration.enabled: true` | vault probes |
- If `discovery-probes` is set in config, intersect with that list
- If a `scope` argument was passed, restrict to that category

The audit probe activates when `bootstrap.lock` is present OR when `discovery-probes` config explicitly lists `audit`.
The vault probe activates when .vault.yaml is present in the repo root OR when vault-integration.enabled: true in Session Config OR when discovery-probes config explicitly lists vault.
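As a sketch of the marker checks above (a subset of the table; the real flow runs these checks in parallel via the Glob tool, and the probe names here are illustrative):

```python
from pathlib import Path

# Marker-file -> probe-set mapping, abbreviated from the table above.
MARKERS = {
    "package.json": "js-ts",
    "tsconfig.json": "typescript",
    "requirements.txt": "python",
    "pyproject.toml": "python",
    "Dockerfile": "container",
    "docker-compose.yml": "container",
    ".gitlab-ci.yml": "gitlab-ci",
    ".vault.yaml": "vault",
}

def detect_probes(root: str) -> set[str]:
    base = Path(root)
    active = {probe for marker, probe in MARKERS.items() if (base / marker).exists()}
    # Directory markers are checked separately from plain files.
    if (base / ".github" / "workflows").is_dir():
        active.add("github-ci")
    if (base / "supabase").is_dir():
        active.add("supabase")
    if (base / ".orchestrator" / "bootstrap.lock").exists():
        active.add("harness-audit")
    return active
```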
Default exclude paths (always apply):
`node_modules/`, `.git/`, `dist/`, `build/`, `.next/`, `.nuxt/`, `coverage/`

Add any paths from `discovery-exclude-paths` in Session Config.
VCS Reference: Detect the VCS platform per the "VCS Auto-Detection" section of the gitlab-ops skill.
Report: "Discovery: [N] probes active across [categories]. Stack: [detected]. Threshold: [severity]."
Dispatch probe agents IN PARALLEL using the Agent tool. Group by category (max 5 agents):
Cursor IDE: No Agent() tool available. Run probes sequentially within the current session — one category at a time. Complete each category's analysis before moving to the next.
Vault category (`skills/discovery/probes-vault.md`): invokes `skills/discovery/probes/vault-staleness.mjs` and `skills/discovery/probes/vault-narrative-staleness.mjs` directly via `node`. Each probe returns `{findings, metrics, duration_ms}`. The runner reports `FINDING:` blocks per finding and appends summary records to `.orchestrator/metrics/vault-staleness.jsonl` and `vault-narrative-staleness.jsonl`.

Each agent receives:
- `probes-intro.md` (confidence scoring reference) AND the category-specific `probes-<category>.md` file for this agent's category (include the actual grep commands/patterns in the prompt)

Each finding is reported as a block:

FINDING:
probe: <probe_name>
category: <category>
severity: <critical|high|medium|low>
file_path: <absolute path>
line_number: <number>
matched_text: <exact text from tool output>
title: <short title for the finding>
description: <1-2 sentence description>
recommended_fix: <concrete fix suggestion>
- If a probe's activation condition is not met, skip it with: `SKIPPED: <probe_name> -- `
- If a probe command fails, skip it with: `FAILED: <probe_name> -- `
- Do NOT fabricate findings. Only report what tool output confirms.
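Collecting agent output later (Phase 4) means parsing these markers back out. A minimal sketch of such a parser, assuming one field per line in the block format above:

```python
# Sketch: turn a probe agent's text output into finding dicts.
# Field names match the FINDING: block format; SKIPPED:/FAILED: lines
# close any open block and are not treated as fields.
def parse_findings(output: str) -> list[dict]:
    findings, current = [], None
    for raw in output.splitlines():
        line = raw.strip()
        if line == "FINDING:":
            current = {}
            findings.append(current)
        elif line.startswith(("SKIPPED:", "FAILED:")):
            current = None  # skip/fail markers end any open finding block
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return findings
```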
CRITICAL: set `run_in_background: false` for all agents.
Skip categories with no activated probes (don't dispatch empty agents).
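The grouping rule (one agent per non-empty category, at most 5 agents per wave) can be sketched as follows; the category names are illustrative:

```python
# Sketch: group activated probes by category, skip empty categories,
# and cap each parallel wave at max_agents (5 per the dispatch rule).
def plan_agents(active: dict[str, list[str]], max_agents: int = 5) -> list[list[str]]:
    categories = [cat for cat, probes in active.items() if probes]
    return [categories[i:i + max_agents] for i in range(0, len(categories), max_agents)]

waves = plan_agents({"code": ["security-basics"], "infra": ["docker"], "ui": []})
```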
After all probe agents complete:
Collect all FINDING: blocks from agent outputs into a unified findings list.
For EACH finding:
- Open `file_path:line_number` using the Read tool
- Confirm `matched_text` appears at or near that line (+/-3 lines tolerance)

For each verified finding, assign a confidence score (0-100) based on three factors:
| Factor | Low (+0) | Medium (+10) | High (+20) |
|---|---|---|---|
| Pattern specificity | Generic match (URL, TODO) | Moderate (orphaned annotation, magic number) | Specific (API key regex, eval(), SQL injection) |
| File context | Test fixture, example, seed data, docs | Utility, config, scripts | Production source, API handler, middleware |
| Historical signal | Previously dismissed as false positive | No prior data (first occurrence) | Recurring issue (confirmed in learnings.jsonl) |
Scoring rules:
- Findings with `critical` severity get a minimum confidence of 70 — they are NEVER auto-deferred
- Threshold: read `discovery-confidence-threshold` from Session Config; if not configured, default to 60.
Annotate each finding with its confidence score for Phase 5 presentation.
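A sketch of the verify-then-score step. The factor points follow the table above (+0 / +10 / +20 each); the base score of 40 is an assumption chosen so three high factors reach 100, not a value the skill specifies:

```python
# Factor points from the scoring table; BASE_SCORE is an assumed value.
FACTOR_POINTS = {"low": 0, "medium": 10, "high": 20}
BASE_SCORE = 40  # assumption: three high factors then total 100

def verify(lines: list[str], line_number: int, matched_text: str, tolerance: int = 3) -> bool:
    # Check matched_text within +/- tolerance lines of the 1-indexed line_number.
    lo = max(0, line_number - 1 - tolerance)
    hi = min(len(lines), line_number + tolerance)
    return any(matched_text in line for line in lines[lo:hi])

def confidence(pattern: str, context: str, history: str, severity: str) -> int:
    score = BASE_SCORE + sum(FACTOR_POINTS[f] for f in (pattern, context, history))
    if severity == "critical":
        score = max(score, 70)  # critical findings are never auto-deferred
    return min(score, 100)
```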
Two findings are duplicates if:
- same `file_path` AND

Keep the higher-severity finding. Merge descriptions.
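A minimal dedup sketch. The source keys duplicates on `file_path`; keying additionally on `line_number` here is an assumption, since the second condition is truncated in the source:

```python
# Sketch: collapse duplicates, keeping the higher-severity finding and
# merging descriptions. The (file_path, line_number) key is an assumption.
SEV_RANK = {"critical": 3, "high": 2, "medium": 1, "low": 0}

def dedupe(findings: list[dict]) -> list[dict]:
    merged: dict[tuple, dict] = {}
    for f in findings:
        key = (f["file_path"], f.get("line_number"))
        kept = merged.get(key)
        if kept is None:
            merged[key] = dict(f)
            continue
        if SEV_RANK[f["severity"]] > SEV_RANK[kept["severity"]]:
            kept["severity"] = f["severity"]
        if f["description"] not in kept["description"]:
            kept["description"] += " / " + f["description"]
    return list(merged.values())
```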
- Apply `discovery-severity-threshold` from Session Config.
- Apply `discovery-confidence-threshold` (default: 60). Log filtered-out findings: "Auto-dismissed N low-confidence findings (below threshold [T]). Use `discovery-confidence-threshold: 0` to see all."

Group remaining findings by category for Phase 5 presentation.
If in embedded mode (called from session-end): STOP HERE. Return structured findings to the caller using this schema:
Embedded mode return schema:
{
"findings": [
{"probe": "string", "category": "string", "severity": "critical|high|medium|low", "confidence": 0-100, "file": "string", "line": number, "description": "string", "recommendation": "string"}
],
"stats": {
"probes_run": number,
"findings_raw": number,
"findings_verified": number,
"false_positives": number,
"user_dismissed": 0,
"issues_created": 0,
"by_category": {"<category>": {"findings": number, "actioned": 0}}
}
}
Present both as structured data in your final output. Do not proceed to Phase 5.
Before presenting findings for triage, separate by confidence threshold:
- At or above the threshold: present for interactive triage
- Below the threshold: auto-defer; they can be reviewed later with `/discovery --include-deferred`.

Present findings using AskUserQuestion -- NEVER plain text options. On Codex CLI, where AskUserQuestion is unavailable, present as numbered Markdown lists.
Include confidence scores in the presentation:
[CRITICAL] (confidence: 85) hardcoded-values: API key found in src/config.ts:42
[HIGH] (confidence: 72) security-basics: eval() usage in src/utils/parser.ts:18
[MEDIUM] (confidence: 61) orphaned-annotations: TODO without issue in src/lib/auth.ts:55
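Lines in that format can be produced with a small formatter, sketched here against the finding fields defined earlier:

```python
# Sketch: render a verified finding in the "[SEVERITY] (confidence: N)" line format.
def render(f: dict) -> str:
    return (f"[{f['severity'].upper()}] (confidence: {f['confidence']}) "
            f"{f['probe']}: {f['title']} in {f['file_path']}:{f['line_number']}")

line = render({
    "severity": "critical", "confidence": 85, "probe": "hardcoded-values",
    "title": "API key found", "file_path": "src/config.ts", "line_number": 42,
})
```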
Present a findings overview table:
## Discovery Results
Probes run: [N] | Findings verified: [N] | False positives discarded: [N]
| Category | Critical | High | Medium | Low | Total |
|----------|----------|------|--------|-----|-------|
| Code | ... | ... | ... | ... | ... |
| Infra | ... | ... | ... | ... | ... |
| UI | ... | ... | ... | ... | ... |
| Arch | ... | ... | ... | ... | ... |
| Session | ... | ... | ... | ... | ... |
For each Critical or High finding, use AskUserQuestion (on Codex CLI where AskUserQuestion is unavailable, present as numbered Markdown lists):
AskUserQuestion({
questions: [{
question: "<finding title>\n\n<file_path>:<line_number>\n```\n<matched_text with +/-3 lines context>\n```\n\n<description>\n\nRecommended fix: <recommended_fix>",
header: "<severity>",
options: [
{ label: "Create issue (<severity>)", description: "Create a priority:<severity> issue for this finding" },
{ label: "Adjust priority", description: "Create issue with different priority" },
{ label: "Dismiss -- intentional", description: "This is by design, skip" },
{ label: "Dismiss -- false positive", description: "Detection was wrong, skip" }
]
}]
})
If user selects "Adjust priority", ask which priority with another AskUserQuestion. On Codex CLI where AskUserQuestion is unavailable, present as numbered Markdown lists.
Group remaining findings by category. For each category with medium/low findings (on Codex CLI where AskUserQuestion is unavailable, present as numbered Markdown lists):
AskUserQuestion({
questions: [{
question: "[N] medium/low findings in [category]:\n\n1. [title] -- [file_path]:[line] ([severity])\n2. [title] -- [file_path]:[line] ([severity])\n...",
header: "[Category]",
options: [
{ label: "Accept all (Recommended)", description: "Create issues for all [N] findings" },
{ label: "Review individually", description: "Walk through each finding one by one" },
{ label: "Dismiss all", description: "Skip all medium/low findings in this category" }
]
}]
})
If "Review individually" selected, walk through each like Step 2.
Before creating any issues (on Codex CLI where AskUserQuestion is unavailable, present as numbered Markdown lists):
AskUserQuestion({
questions: [{
question: "Ready to create [N] issues?\n\n- [X] critical\n- [Y] high\n- [Z] medium\n- [W] low",
header: "Confirm",
options: [
{ label: "Create all [N] issues", description: "Proceed with issue creation" },
{ label: "Review list first", description: "Show full list before creating" },
{ label: "Cancel", description: "Do not create any issues" }
]
}]
})
VCS Reference: Detect the VCS platform per the "VCS Auto-Detection" section of the gitlab-ops skill. Use CLI commands per the "Common CLI Commands" section.
For each approved finding:
- Build the issue body per `issue-templates.md`
- Labels: `type:discovery` + `priority:<level>` + `area:<inferred from category/filepath>` + `status:ready`
- GitLab: `glab issue create --title "[Discovery] <title>" --label "type:discovery,priority:<level>,area:<area>,status:ready" --description "<body>"`
- GitHub: `gh issue create --title "[Discovery] <title>" --label "type:discovery,priority:<level>,area:<area>,status:ready" --body "<body>"`

## Discovery Report
### Summary
- Probes run: [N] across [categories]
- Raw findings: [N]
- Verified: [N] (false positives discarded: [M])
- User approved: [N]
- Issues created: [N]
### Created Issues
| # | Title | Priority | Area | Probe |
|---|-------|----------|------|-------|
| <IID> | <title> | <priority> | <area> | <probe> |
### Dismissed Findings
- [N] dismissed as intentional
- [M] dismissed as false positive
### Recommendations
- [suggestions based on finding patterns]
- `discovery-severity-threshold` -- filter before presenting to user
- Default exclude paths always apply: `node_modules/`, `.git/`, `dist/`, `build/`, `.next/`, `.nuxt/`, `coverage/`

This phase runs only in standalone mode. Embedded mode returns findings to the caller.
After Phase 6 (Issue Creation) completes, prepare discovery statistics for session metrics:
Count totals from the triage results:
- `probes_run`: number of probes that were activated and executed
- `findings_raw`: total findings before verification
- `findings_verified`: findings that passed Phase 4.2 verification
- `false_positives`: findings discarded during verification
- `user_dismissed`: findings the user declined during Phase 5 triage
- `issues_created`: issues created in Phase 6
- `by_category`: per-category breakdown of findings and actioned items

Report stats summary:
Discovery stats: [probes_run] probes, [findings_raw] raw → [findings_verified] verified ([false_positives] false positives). User dismissed [user_dismissed]. Created [issues_created] issues.
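As a sketch, the summary line above can be assembled directly from the stats fields:

```python
# Sketch: format the discovery stats summary line from the counted fields.
def stats_summary(stats: dict) -> str:
    return (f"Discovery stats: {stats['probes_run']} probes, "
            f"{stats['findings_raw']} raw → {stats['findings_verified']} verified "
            f"({stats['false_positives']} false positives). "
            f"User dismissed {stats['user_dismissed']}. "
            f"Created {stats['issues_created']} issues.")

summary = stats_summary({
    "probes_run": 6, "findings_raw": 14, "findings_verified": 11,
    "false_positives": 3, "user_dismissed": 2, "issues_created": 9,
})
```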
These stats are available for session-end to include in sessions.jsonl under the discovery_stats field. The discovery skill does NOT write to sessions.jsonl directly — session-end handles that.
- When given the `code` scope, don't scan infrastructure
- Never write to `sessions.jsonl` directly — session-end handles metrics persistence