From octo
Performs multi-AI validation, scoring, and review of code or documents using Codex and Gemini CLIs in Double Diamond Deliver phase. Auto-activates on 'review X', 'validate Y', 'score this', 'quality check'.
npx claudepluginhub nyldn/claude-octopus --plugin octoThis skill uses the workspace's default tool permissions.
> This file is generated from a template. Edit the `.tmpl` file, not this file directly.
Creates isolated Git worktrees for feature branches with prioritized directory selection, gitignore safety checks, auto project setup for Node/Python/Rust/Go, and baseline verification.
Executes implementation plans in current session by dispatching fresh subagents per independent task, with two-stage reviews: spec compliance then code quality.
Dispatches parallel agents to independently tackle 2+ tasks like separate test failures or subsystems without shared state or dependencies.
This file is generated from a template. Edit the
.tmplfile, not this file directly. Runscripts/gen-skill-docs.shto regenerate after changes.
Before starting delivery:
.octo/STATE.md to verify Develop phase complete# Verify Develop phase is complete
if [[ -f ".octo/STATE.md" ]]; then
develop_status=$("${HOME}/.claude-octopus/plugin/scripts/octo-state.sh" get_phase_status 3)
if [[ "$develop_status" != "complete" ]]; then
echo "⚠️ Warning: Develop phase not marked complete. Consider completing development first."
fi
fi
# Update state for Delivery phase
"${HOME}/.claude-octopus/plugin/scripts/octo-state.sh" update_state \
--phase 4 \
--position "Delivery" \
--status "in_progress"
This skill uses ENFORCED execution mode. You MUST follow this exact sequence.
Analyze the user's prompt and project to determine context:
Knowledge Context Indicators:
Dev Context Indicators:
Also check: What is being reviewed? Code files -> Dev, Documents -> Knowledge
Capture context_type = "Dev" or "Knowledge"
When context_type is Dev, determine the subtype to inject domain-appropriate validation criteria into the review prompt. Append the matching validation supplement after the user's prompt when calling orchestrate.sh in Step 4.
| Subtype | Trigger keywords | Validation supplement |
|---|---|---|
frontend-ui | "page", "widget", "component", "UI", "HTML", "CSS", "form", "dashboard", "layout" | Verify: all referenced files exist (scripts, stylesheets, images). Check ARIA labels and roles, keyboard navigability, touch target sizes (44px min). Flag innerHTML usage. Confirm progressive enhancement (fallbacks for navigator.share, localStorage, etc). Test self-containment: does this work if opened/run with zero setup? |
cli-tool | "CLI", "command-line", "terminal", "script", "flag", "argument" | Verify: --help flag works, exit codes are meaningful (0/1/2), stderr vs stdout used correctly, argument edge cases handled (missing args, invalid input, --unknown-flag). |
api-service | "API", "endpoint", "REST", "GraphQL", "gRPC", "server", "route" | Verify: input validation at every endpoint, consistent error response format, auth on protected routes, rate limiting considered, schema/contract documented. |
infra | "deploy", "terraform", "docker", "CI", "pipeline", "Kubernetes", "helm" | Verify: operations are idempotent, no hardcoded secrets, rollback path exists, health checks included, destroy operations require confirmation. |
data | "ETL", "pipeline", "migration", "schema", "database", "SQL" | Verify: migrations are reversible, data validation at ingestion, backup strategy documented, no data loss on failure. |
general | Default if no subtype matches | No supplement — use standard review criteria. |
How to apply: When calling orchestrate.sh in Step 4, append the validation supplement:
orchestrate.sh deliver "<user prompt>\n\nDomain-specific validation criteria:\n<supplement text>"
DO NOT PROCEED TO STEP 1c until context determined. Context type (Dev vs Knowledge) and dev subtype determine which validation supplements to inject — wrong context produces a review that checks irrelevant criteria.
For Dev context only. Skip this step entirely for Knowledge context.
Run the scope drift detection skill to compare the actual diff against stated intent before the full review:
# Load the scope drift skill
# See: skills/skill-scope-drift/SKILL.md
git diff --stat against the base branchThis is informational — NEVER blocks the review. Display the report and proceed to Step 2 regardless of the result. The report gives reviewers (both human and AI) awareness of potential drift before they begin detailed analysis.
If no intent sources are found (no TODOS.md, no PR, no commit messages), skip this step silently.
MANDATORY: Run the centralized provider check BEFORE displaying the banner:
bash "${HOME}/.claude-octopus/plugin/scripts/helpers/check-providers.sh"
Use the ACTUAL results. PROHIBITED: Showing only "🔵 Claude: Available ✓" without listing all providers.
Validation:
/octo:setupDisplay this banner BEFORE orchestrate.sh execution (list ALL providers from check output):
For Dev Context:
🐙 **CLAUDE OCTOPUS ACTIVATED** - Multi-provider validation mode
✅ [Dev] Deliver Phase: [Brief description of code review]
Provider Availability:
🔴 Codex CLI: [status from check] - Code quality analysis
🟡 Gemini CLI: [status from check] - Security and edge cases
🟢 Copilot CLI: [status from check] - GitHub integration
🟣 Qwen CLI: [status from check] - Additional perspective
🟤 OpenCode CLI: [status from check] - Multi-provider routing
🔵 Claude: Available ✓ - Synthesis and recommendations
💰 Estimated Cost: $0.02-0.08
⏱️ Estimated Time: 3-7 minutes
For Knowledge Context:
🐙 **CLAUDE OCTOPUS ACTIVATED** - Multi-provider validation mode
✅ [Knowledge] Deliver Phase: [Brief description of document review]
Provider Availability:
🔴 Codex CLI: ${codex_status} - Structure and logic analysis
🟡 Gemini CLI: ${gemini_status} - Content quality and completeness
🔵 Claude: Available ✓ - Synthesis and recommendations
💰 Estimated Cost: $0.02-0.08
⏱️ Estimated Time: 3-7 minutes
DO NOT PROCEED TO STEP 3 until banner displayed. The banner shows users which providers will run and what costs they'll incur — starting API calls without this visibility violates cost transparency.
Before executing the workflow, read full project context:
# Initialize state if needed
"${HOME}/.claude-octopus/plugin/scripts/state-manager.sh" init_state
# Set current workflow
"${HOME}/.claude-octopus/plugin/scripts/state-manager.sh" set_current_workflow "flow-deliver" "deliver"
# Get all prior decisions (critical for validation)
prior_decisions=$("${HOME}/.claude-octopus/plugin/scripts/state-manager.sh" get_decisions "all")
# Get context from all prior phases
discover_context=$("${HOME}/.claude-octopus/plugin/scripts/state-manager.sh" get_context "discover")
define_context=$("${HOME}/.claude-octopus/plugin/scripts/state-manager.sh" get_context "define")
develop_context=$("${HOME}/.claude-octopus/plugin/scripts/state-manager.sh" get_context "develop")
# Display what you found (validation needs full context)
echo "📋 Validation Context Summary:"
if [[ "$discover_context" != "null" ]]; then
echo " Discovery: $discover_context"
fi
if [[ "$define_context" != "null" ]]; then
echo " Definition: $define_context"
fi
if [[ "$develop_context" != "null" ]]; then
echo " Development: $develop_context"
fi
if [[ "$prior_decisions" != "[]" && "$prior_decisions" != "null" ]]; then
echo " Decisions to validate against:"
echo "$prior_decisions" | jq -r '.[] | " - \(.decision) (\(.phase))"'
fi
This provides full context for validation:
search, timeline, get_observations) are available — use them to check for past delivery issues or quality patternsDO NOT PROCEED TO STEP 4 until state read.
You MUST execute this command via the Bash tool:
${HOME}/.claude-octopus/plugin/scripts/orchestrate.sh deliver "<user's validation request>"
CRITICAL: You are PROHIBITED from:
You MUST use the Bash tool to invoke orchestrate.sh.
If running in Claude Code v2.1.16+, users will see real-time progress indicators in the task spinner:
Phase 1 - External Provider Execution (Parallel):
Phase 2 - Synthesis (Sequential):
These spinner verb updates happen automatically - orchestrate.sh calls update_task_progress() before each agent execution. Users see exactly which provider is working and what it's doing.
If NOT running in Claude Code v2.1.16+: Progress indicators are silently skipped, no errors shown.
After orchestrate.sh completes, verify it succeeded:
# Find the latest validation file (created within last 10 minutes)
VALIDATION_FILE=$(find ~/.claude-octopus/results -name "ink-validation-*.md" -mmin -10 2>/dev/null | head -n1)
if [[ -z "$VALIDATION_FILE" ]]; then
echo "❌ VALIDATION FAILED: No validation file found"
echo "orchestrate.sh did not execute properly"
exit 1
fi
echo "✅ VALIDATION PASSED: $VALIDATION_FILE"
cat "$VALIDATION_FILE"
If validation fails:
~/.claude-octopus/logs/After validation is complete, record final metrics:
# Update deliver phase context with validation summary
validation_summary=$(head -30 "$VALIDATION_FILE" | grep -A 2 "## Summary\|Pass\|Fail" | tail -2 | tr '\n' ' ')
"${HOME}/.claude-octopus/plugin/scripts/state-manager.sh" update_context \
"deliver" \
"$validation_summary"
# Update final metrics (completion of full workflow)
"${HOME}/.claude-octopus/plugin/scripts/state-manager.sh" update_metrics "phases_completed" "1"
# Track actual providers used (dynamic — not hardcoded)
for _provider in $(bash "${HOME}/.claude-octopus/plugin/scripts/helpers/check-providers.sh" | grep ":available" | cut -d: -f1) claude; do
"${HOME}/.claude-octopus/plugin/scripts/state-manager.sh" update_metrics "provider" "$_provider"
done
# Display final state summary
echo ""
echo "📊 Session Complete - Final Metrics:"
"${HOME}/.claude-octopus/plugin/scripts/state-manager.sh" show_summary
DO NOT PROCEED TO STEP 7 until state updated.
Read the validation file and present:
Include attribution:
---
*Multi-AI Validation powered by Claude Octopus*
*Providers: 🔴 Codex | 🟡 Gemini | 🔵 Claude*
*Full validation report: $VALIDATION_FILE*
After presenting the report, check if the current branch has an open PR and post the validation summary as a PR comment:
CURRENT_BRANCH=$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "")
PR_NUM=""
if [[ -n "$CURRENT_BRANCH" && "$CURRENT_BRANCH" != "main" && "$CURRENT_BRANCH" != "master" ]]; then
if command -v gh &>/dev/null; then
PR_NUM=$(gh pr list --head "$CURRENT_BRANCH" --json number --jq '.[0].number' 2>/dev/null || echo "")
fi
fi
if [[ -n "$PR_NUM" ]]; then
# Extract summary section from validation file for PR comment
REVIEW_SUMMARY=$(head -60 "$VALIDATION_FILE")
gh pr comment "$PR_NUM" --body "## Deliver Phase — Validation Report
${REVIEW_SUMMARY}
---
*Multi-AI validation by Claude Octopus (/octo:deliver)*
*Providers: 🔴 Codex | 🟡 Gemini | 🔵 Claude*"
echo "Validation report posted to PR #${PR_NUM}"
# Update agent registry
REGISTRY="${HOME}/.claude-octopus/plugin/scripts/agent-registry.sh"
if [[ -x "$REGISTRY" ]]; then
AGENT_ID="$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo "")"
"$REGISTRY" update "$AGENT_ID" --pr "$PR_NUM" 2>/dev/null || true
fi
fi
Behavior:
/octo:embrace or /octo:factory/octo:deliver, asks user first:
"PR #N found. Post validation report as a PR comment?"
Options: "Yes, post to PR", "No, terminal only"
gh CLI unavailable, skips silentlyBEFORE executing ANY workflow actions, you MUST:
Analyze the user's prompt and project to determine context:
Knowledge Context Indicators (in prompt):
Dev Context Indicators (in prompt):
Also check: What is being reviewed? Code files -> Dev, Documents -> Knowledge
Step 1b: Detect Dev Subtype — see EXECUTION CONTRACT Step 1b above for subtype table and validation supplements. Append the matching supplement to the prompt before calling orchestrate.sh.
For Dev Context:
🐙 **CLAUDE OCTOPUS ACTIVATED** - Multi-provider validation mode
✅ [Dev] Deliver Phase: [Brief description of code review]
📋 Session: ${CLAUDE_SESSION_ID}
Providers:
🔴 Codex CLI - Code quality analysis
🟡 Gemini CLI - Security and edge cases
🔵 Claude - Synthesis and recommendations
For Knowledge Context:
🐙 **CLAUDE OCTOPUS ACTIVATED** - Multi-provider validation mode
✅ [Knowledge] Deliver Phase: [Brief description of document review]
📋 Session: ${CLAUDE_SESSION_ID}
Providers:
🔴 Codex CLI - Structure and logic analysis
🟡 Gemini CLI - Content quality and completeness
🔵 Claude - Synthesis and recommendations
| Indicator | Provider | Cost Source |
|---|---|---|
| 🔴 | Codex CLI | User's OPENAI_API_KEY |
| 🟡 | Gemini CLI | User's GEMINI_API_KEY |
| 🟣 | Perplexity Sonar | User's PERPLEXITY_API_KEY |
| 🔵 | Claude | Included with Claude Code |
This is NOT optional. Users need to see which AI providers are active and understand they are being charged for external API calls (🔴 🟡).
Part of Double Diamond: DELIVER (convergent thinking)
DELIVER (ink)
\ /
\ /
\ /
\ /
\ /
Converge to
delivery
The deliver phase validates and reviews implementations using external CLI providers:
This is the convergent phase for delivery - we ensure quality before shipping.
Use deliver when you need:
Don't use deliver for:
Before execution, you'll see:
🐙 **CLAUDE OCTOPUS ACTIVATED** - Multi-provider validation
✅ Deliver Phase: Reviewing and validating implementation
Providers:
🔴 Codex CLI - Code quality and best practices
🟡 Gemini CLI - Security and edge cases
🔵 Claude - Synthesis and validation report
${HOME}/.claude-octopus/plugin/scripts/orchestrate.sh deliver "<user's validation request>"
The orchestrate.sh script will:
The ink phase includes automatic quality validation via PostToolUse hook:
Results are saved to:
~/.claude-octopus/results/${SESSION_ID}/ink-validation-<timestamp>.md
Read the synthesis and present findings with quality scores to the user.
When this skill is invoked, follow the EXECUTION CONTRACT above exactly. The contract includes:
Each step is mandatory and blocking - you cannot proceed to the next step until the current one completes successfully.
Create tasks to track execution progress:
// At start of skill execution
TaskCreate({
subject: "Execute deliver workflow with multi-AI providers",
description: "Run orchestrate.sh deliver for validation",
activeForm: "Running multi-AI deliver workflow"
})
// Mark in_progress when calling orchestrate.sh
TaskUpdate({taskId: "...", status: "in_progress"})
// Mark completed ONLY after validation report presented
TaskUpdate({taskId: "...", status: "completed"})
If any step fails:
/octo:setup and STOPNever fall back to direct review if orchestrate.sh execution fails. Report the failure and let the user decide how to proceed.
After successful execution, present validation report with:
# Validation Report: <task>
## Overall Status: ✅ PASSED / ⚠️ PASSED WITH WARNINGS / ❌ FAILED
**Quality Score**: XX/100
## Summary
[Brief summary of validation findings]
## Critical Issues (Must Fix)
- [ ] Issue 1: [Description]
- [ ] Issue 2: [Description]
## Warnings (Should Fix)
- [ ] Warning 1: [Description]
- [ ] Warning 2: [Description]
## Recommendations (Nice to Have)
- [ ] Recommendation 1: [Description]
- [ ] Recommendation 2: [Description]
## Validation Details
### Code Quality (Codex Analysis)
**Score**: XX/100
- [Specific findings about code quality]
- [Best practice violations]
- [Improvement suggestions]
### Security Audit (Gemini Analysis)
**Score**: XX/100
- [Security vulnerabilities found]
- [Authentication/authorization issues]
- [Input validation concerns]
### Final Validation (Claude Synthesis)
**Score**: XX/100
- [Integrated assessment]
- [Risk analysis]
- [Go/no-go recommendation]
## Quality Gates
- [X] Code Quality: XX/100 (PASSED/FAILED)
- [X] Security: XX/100 (PASSED/FAILED)
- [X] Best Practices: XX/100 (PASSED/FAILED)
- [X] Completeness: XX/100 (PASSED/FAILED)
## Next Steps
[Recommended actions based on findings]
Full validation report saved to: <validation file path>
User: Review the authentication implementation in src/auth/
Claude:
🐙 **CLAUDE OCTOPUS ACTIVATED** - Multi-provider validation
✅ Deliver Phase: Reviewing authentication implementation
[Executes: ${HOME}/.claude-octopus/plugin/scripts/orchestrate.sh deliver "Review authentication implementation in src/auth/"]
[After completion, reads validation and presents:]
# Validation Report: Authentication Implementation
## Overall Status: ⚠️ PASSED WITH WARNINGS
**Quality Score**: 78/100
## Summary
The authentication implementation is functional and follows most best practices, but has security concerns with token storage and missing rate limiting. Code quality is good with TypeScript types and error handling. Recommend addressing security warnings before production deployment.
## Critical Issues (Must Fix)
- [ ] **Security**: JWT secret is hardcoded in auth/config.ts (should use env variable)
- [ ] **Security**: No rate limiting on /login endpoint (vulnerability to brute force)
## Warnings (Should Fix)
- [ ] **Best Practice**: Password validation regex could be stronger (min 12 chars, special chars)
- [ ] **Testing**: Missing unit tests for token refresh logic
- [ ] **Error Handling**: Generic error messages leak whether email exists (account enumeration)
## Recommendations (Nice to Have)
- [ ] **UX**: Add "remember me" functionality for longer sessions
- [ ] **Monitoring**: Add logging for failed login attempts
- [ ] **Documentation**: Add JSDoc comments for public API methods
## Validation Details
### Code Quality (Codex Analysis)
**Score**: 85/100
**Strengths**:
- Clean TypeScript with proper types
- Good separation of concerns (controller/service/middleware)
- Proper async/await error handling
- Express middleware pattern used correctly
**Issues**:
- Missing JSDoc documentation
- Some functions exceed 50 lines (refactor recommended)
- No unit tests for edge cases
**Code Examples**:
\`\`\`typescript
// Good: Proper error handling
try {
const token = jwt.sign(payload, JWT_SECRET, { expiresIn: '15m' });
return token;
} catch (error) {
logger.error('Token generation failed', error);
throw new AuthError('Failed to generate token');
}
// Issue: Hardcoded secret (security risk)
const JWT_SECRET = 'my-secret-key-123'; // Should be process.env.JWT_SECRET
\`\`\`
### Security Audit (Gemini Analysis)
**Score**: 68/100
**Critical Vulnerabilities**:
- ❌ Hardcoded JWT secret (HIGH severity)
- ❌ No rate limiting on auth endpoints (MEDIUM severity)
- ❌ Account enumeration via error messages (MEDIUM severity)
**Good Security Practices**:
- ✅ bcrypt password hashing with 12 rounds
- ✅ JWT tokens with short expiration (15min)
- ✅ Refresh token rotation implemented
- ✅ HTTPS-only cookies for token storage
**Recommendations**:
- Use environment variables for secrets
- Add express-rate-limit middleware
- Standardize error messages ("Invalid credentials" for all auth failures)
- Consider adding 2FA hooks for future enhancement
### Final Validation (Claude Synthesis)
**Score**: 78/100
**Risk Assessment**:
- **High**: Hardcoded secrets must be fixed before production
- **Medium**: Rate limiting should be added (prevents abuse)
- **Low**: Missing tests and docs can be addressed post-launch
**Go/No-Go Recommendation**:
- ⚠️ **CONDITIONAL GO**: Fix critical security issues, then ready for production
- Timeline: 2-4 hours to address critical issues
- Post-launch: Address warnings and recommendations in next sprint
**Architecture Assessment**:
- Good: Follows Express.js patterns, scalable design
- Good: Proper separation of concerns
- Improve: Add integration tests for full auth flow
## Quality Gates
- [X] Code Quality: 85/100 ✅ PASSED
- [X] Security: 68/100 ⚠️ WARNING (below 75 threshold)
- [X] Best Practices: 80/100 ✅ PASSED
- [X] Completeness: 75/100 ✅ PASSED (minimum viable)
## Next Steps
1. **Immediate (Before Deploy)**:
- Fix hardcoded JWT secret -> use process.env.JWT_SECRET
- Add rate limiting -> npm install express-rate-limit
- Standardize error messages -> update auth/controller.ts
2. **Short Term (Next Sprint)**:
- Add unit tests for auth service
- Improve password validation
- Add JSDoc documentation
3. **Long Term (Future)**:
- Add 2FA support
- Implement "remember me" functionality
- Add comprehensive logging and monitoring
Full validation report saved to: ~/.claude-octopus/results/abc-123/ink-validation-20250118-145600.md
User: Validate the new API endpoints are ready to ship
Claude:
🐙 **CLAUDE OCTOPUS ACTIVATED** - Multi-provider validation
✅ Deliver Phase: Validating API endpoints
[Executes ink workflow]
[Presents detailed validation with:]
- API contract compliance (OpenAPI/Swagger)
- Error handling coverage
- Security (auth, input validation)
- Performance (query optimization)
- Documentation completeness
[Provides go/no-go decision with quality scores]
The ink phase automatically runs comprehensive quality checks via .claude/hooks/quality-gate.sh:
# Triggered after ink execution (PostToolUse hook)
./hooks/quality-gate.sh
v8.49.0: Project-specific lint/typecheck (before quality scoring):
Before running conceptual quality gates, detect and run the project's actual lint/typecheck commands:
package.json for lint, typecheck, tsc, check scriptsruff, mypy, cargo clippy, go vet, make lintQuality Dimensions:
| Dimension | Weight | Criteria |
|---|---|---|
| Code Quality | 25% | Complexity, maintainability, documentation |
| Security | 35% | OWASP compliance, auth, input validation |
| Best Practices | 20% | Error handling, logging, testing |
| Completeness | 20% | Feature completeness, edge cases |
Scoring Thresholds:
Ink is the final phase of the Double Diamond:
PROBE (Discover) → GRASP (Define) → TANGLE (Develop) → INK (Deliver)
Complete workflow example:
Or use ink standalone for validation of existing code.
Before marking validation complete, ensure:
External API Usage:
Ink workflows typically cost $0.02-0.08 per validation depending on codebase size and complexity.
After validation passes (go decision), run documentation synchronization to keep project docs current with shipped code. This step is automatic when running as part of /octo:embrace and offered when running standalone.
Invoke the doc-sync skill:
Skill(skill: "octo:auto", args: "sync docs for the changes on this branch")
The doc-sync skill will:
.md files (max depth 2, cap 30 files)git diff to find stale contentSkip conditions: Skip doc-sync if:
.md files exist in the project--no-docs or declines when asked)After delivery validation and doc-sync complete:
.octo/STATE.md:
/octo:ship to finalize and archive."# Update state after Delivery completion
"${HOME}/.claude-octopus/plugin/scripts/octo-state.sh" update_state \
--status "complete" \
--history "All phases complete, ready to ship"
# Display completion message with next steps
echo ""
echo "🎉 **EMBRACE WORKFLOW COMPLETE**"
echo ""
echo "All four phases have been completed:"
echo " ✅ Discover - Research and exploration"
echo " ✅ Define - Requirements and scope"
echo " ✅ Develop - Implementation"
echo " ✅ Deliver - Validation and quality"
echo ""
echo "📦 **Project ready! Run \`/octo:ship\` to finalize and archive.**"
Ready to validate! This skill activates automatically when users request code review, validation, or quality checks.