Validate agent produces correct, well-structured outputs for typical inputs.
Test: Agent follows workflow phases | Outputs match expected format/structure | Domain-specific rules correctly applied | Token efficiency within bounds
How: Manual invocation with representative inputs. Check against acceptance criteria in agent description.
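Once an agent's output has been captured as text, part of the format and token checks above can be automated. The following is a minimal Python sketch, not part of nWave or Claude Code: it assumes the output is markdown with `## ` section headings, and `check_output`, the section names, and the 4-characters-per-token estimate are all hypothetical.

```python
import re


def check_output(text, required_sections, max_tokens):
    """Return a list of problems found in a captured agent output."""
    problems = []
    # Collect level-2 markdown headings (assumed output convention).
    headings = set(re.findall(r"^## +(.+?)\s*$", text, flags=re.MULTILINE))
    for section in required_sections:
        if section not in headings:
            problems.append(f"missing section: {section}")
    # Crude estimate (~4 characters per token); an assumption, not a real tokenizer.
    estimated_tokens = len(text) // 4
    if estimated_tokens > max_tokens:
        problems.append(f"estimated {estimated_tokens} tokens exceeds {max_tokens}")
    return problems


sample = "## Summary\nAll phases completed.\n\n## Findings\nNone.\n"
print(check_output(sample, ["Summary", "Findings", "Next Steps"], max_tokens=500))
# prints ['missing section: Next Steps']
```

An empty list means the structural checks passed; whether domain-specific rules were applied correctly still needs a manual read against the acceptance criteria.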
Validate correct input/output between agents in workflows.
Test: Input parsing handles upstream format | Output format matches downstream expectations | Error signals propagate correctly | Subagent mode activation works (skips greeting, executes autonomously)
How: End-to-end workflow execution through full agent chain (e.g., DISCUSS -> DESIGN -> DELIVER).
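The handoff checks above can be sketched as a small contract validator placed between two agents. This is an illustration under stated assumptions: the source does not say how agents exchange data, so the JSON payload, the `REQUIRED_FIELDS` contract, and `validate_handoff` are hypothetical.

```python
import json

# Hypothetical contract: fields the downstream agent expects from upstream.
REQUIRED_FIELDS = {"phase": str, "artifacts": list, "status": str}


def validate_handoff(raw):
    """Parse an upstream payload and return (payload, errors)."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return None, ["payload must be a JSON object"]
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            errors.append(f"wrong type for {field}")
    # An upstream error must surface instead of being passed along silently.
    if payload.get("status") == "error":
        errors.append("upstream reported an error")
    return payload, errors


good = '{"phase": "DESIGN", "artifacts": ["design.md"], "status": "ok"}'
bad = '{"phase": "DESIGN", "status": "error"}'
print(validate_handoff(good)[1])  # prints []
print(validate_handoff(bad)[1])   # prints ['missing field: artifacts', 'upstream reported an error']
```

Running a check like this at each boundary of the chain localizes a failure to one handoff, which an end-to-end run alone does not.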
Challenge validity of agent outputs rather than accepting at face value.
Test: Source verification (cited sources real and accurate?) | Bias detection (favors one approach without evidence?) | Edge case coverage | Completeness (required sections present?)
How: Peer review by -reviewer agent using structured critique dimensions.
Independent review to catch biases and blind spots in agent design.
Test: Definition follows validation checklist? | Redundant Claude default instructions? | Over/under-specified? | Could simpler agent achieve same results?
How: @nw-agent-builder validates via 11-point checklist or @agent-builder-reviewer runs structured review.
Test resilience against misuse and prompt injection.
Test: Tool restriction enforcement | maxTurns respected | Permission mode correctly scoped | Agent stays within declared scope
How: Frontmatter fields are enforced at the platform level, so verify the configuration rather than the prompt text.
Claude Code platform provides injection resistance through: subagent isolation (own context, no sub-subagents) | Tool restriction via frontmatter tools | Permission modes via permissionMode | Hook-based validation (PreToolUse, PostToolUse)
Do NOT add prose-based injection defense. Configure platform features:
---
tools: Read, Glob, Grep # Only tools this agent needs
maxTurns: 30 # Prevents runaway execution
permissionMode: default # User approves dangerous actions
---
tools restricted to minimum necessary (least privilege) | maxTurns set to prevent runaway execution | permissionMode appropriate for risk level | No Bash unless agent requires command execution | No Write unless agent creates/modifies files
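The configuration checklist can be verified mechanically. Below is a minimal Python sketch, assuming frontmatter of flat `key: value` lines like the example above. `parse_frontmatter` and `audit` are hypothetical helpers, the parser is deliberately naive (a real one would use a YAML library), and checking that `permissionMode` is present only approximates "appropriate for risk level".

```python
def parse_frontmatter(text):
    """Parse flat `key: value` frontmatter between two `---` lines."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("frontmatter must start with ---")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        line = line.split("#", 1)[0]  # drop trailing comments (naive)
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter is missing its closing ---")


def audit(fields, needs_bash=False, needs_write=False):
    """Return checklist violations for a parsed frontmatter dict."""
    issues = []
    tools = [t.strip() for t in fields.get("tools", "").split(",") if t.strip()]
    if not tools:
        issues.append("tools not restricted")
    if "Bash" in tools and not needs_bash:
        issues.append("Bash granted but not required")
    if "Write" in tools and not needs_write:
        issues.append("Write granted but not required")
    if not fields.get("maxTurns", "").isdigit():
        issues.append("maxTurns not set")
    if "permissionMode" not in fields:
        issues.append("permissionMode not set")
    return issues


sample = """---
tools: Read, Glob, Grep # Only tools this agent needs
maxTurns: 30 # Prevents runaway execution
permissionMode: default # User approves dangerous actions
---"""
print(audit(parse_frontmatter(sample)))  # prints []
```

Passing `needs_bash=True` or `needs_write=True` records an explicit justification for agents that do run commands or modify files.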