Help us improve
Share bugs, ideas, or general feedback.
From nexus-agents
Runs real end-to-end loops across all nexus-agents feature families — research, synthesize, vote, plan, dev-pipeline, graph workflow, memory, audit — to validate claims against actual tool outputs. Use after releases, weekly, or when several behavior fixes have landed.
npx claudepluginhub nexus-substrate/nexus-agentsHow this skill is triggered — by the user, by Claude, or both
Slash command
/nexus-agents:e2e-validationThis skill is limited to the following tools:
The summary Claude sees in its skill listing — used to decide when to auto-load this skill
Unit tests prove a function does what its test says. They do **not** prove the
Runs multi-agent verification loop post-implementation, dispatching specialized agents for review with autonomous subagent fixes and retries until unanimous approval.
Runs end-to-end QA validation workflows with structured pass/fail reports and evidence. Useful for acceptance testing, release verification, and QA validation issues.
Orchestrates post-implementation validation by delegating to /vibe, /post-mortem, /retro, /forge, and lifecycle skills. Produces a slice-validation roll-up mapping every acceptance criterion to a passing test.
Share bugs, ideas, or general feedback.
Unit tests prove a function does what its test says. They do not prove the
real loops work: that the live voter panel actually returns 7 votes, that
run_dev_pipeline wires every stage, that routing picks a reachable adapter,
that the audit chain verifies. This skill runs the real loops with live
adapters and compares observed behavior against what the code, docs, and recent
"fixed" claims assert. Where reality diverges, it files a tracked issue.
Why this exists: this is dogfooding for validation, not implementation.
dogfooding-issuesimplements open issues;dev-pipelineruns one pipeline;system-reviewis a static health check. This skill is the periodic real-usage smoke test of the whole substrate.
Record the trigger in the report. One run per trigger is enough — don't loop it.
nexus-agents doctor). This skill validates real behavior —
never use consensus_vote { simulateVotes: true } (random output, #2319)
or any simulate/mock path. If no live adapter is available, record every step
as BLOCKED and stop — do not fabricate a pass.--quick votes,
dryRun pipelines) is an acceptable partial — label it as such.nexus-agents version and git SHA
so the report is reproducible.Pick one real, non-trivial seed task (a genuine backlog item works best, so the run produces real value). Drive it through the families below, capturing the actual tool output each step. Don't paraphrase success — paste/observe the real result and judge it.
| # | Family | Tools to exercise | What "real" validates |
|---|---|---|---|
| 1 | Research | research_discover, research_synthesize, research_query | discovery returns results; synthesis preserves provenance |
| 2 | Consensus | consensus_vote (run BOTH --quick 3-voter AND full 7-voter, higher_order + simple_majority) | all voters return — catches dead/404 voters (e.g. #3497), OAuth races, escalation |
| 3 | Planning/exec | orchestrate, execute_expert, delegate_to_model | adapter routing picks a reachable model; expert personas resolve |
| 4 | Pipelines | run_dev_pipeline (dryRun → real), run_pipeline, run_graph_workflow | every stage wired; no missing-stage / template-resolution gaps |
| 5 | Memory | memory_write, memory_query, memory_stats | writes persist and read back; counts move |
| 6 | Audit/health | verify_audit_chain, ci_health_check, query_trace | the run left a verifiable audit trail; traces resolve |
| 7 | Repo/analysis | repo_analyze, search_codebase, extract_symbols | analysis tools run against this repo |
Adapt the set to what changed since the last run — but a full periodic run should touch all seven families at least once.
For each step record: what you ran, the actual output (or error), the
claimed/expected behavior (from code/docs/changelog), and a verdict —
PASS / FAIL / BLOCKED / PARTIAL. A vague "seemed to work" is a FAIL of the
capture, not a PASS of the step.
On a real failure, follow the orchestrator-fallback rule (one retry max, then treat as a blocker — never silently retry past it) and apply the Q Protocol to triage.
When observed behavior contradicts a claim (a "fixed" bug that reproduces, a doc that lies, a dead voter, a missing stage), file a GitHub issue per the Discovered-Issues 4-point gate — this is canonical tracking, not a memory note.
USAi / government / org references from titles, bodies, and
pasted output; generalize to "the configured provider" / "an upstream provider".
Keep the technical content.Close with a report (post to the tracking issue and/or docs/ops/):
E2E Validation — <date> — nexus-agents <version> (<sha>)
Trigger: <release | weekly | N fixes | on-demand>
Coverage: 7/7 families | Adapters live: <which>
Result: <PASS n> / <FAIL n> / <BLOCKED n> / <PARTIAL n>
Issues filed: #.... , #....
Notes: <one or two lines on anything surprising>
Keep it to ~10 lines. The value is the issues filed and the honest verdict, not prose.