AI engineering OS for Claude Code and Codex — 80 agents, 41 commands, 17 hooks.
npx claudepluginhub shaheerkhawaja/productionos --plugin productionos
Niche-agnostic agentic evaluator using CLEAR v2.0 framework: 6-domain assessment, 8 analysis dimensions, 6-tier source prioritization, evidence strength ratings, and decision trees. Evaluates any plan, codebase, or research output.
Idea-to-running-code lifecycle orchestration. 10-phase pipeline with 5 hard decision gates, wave-based parallelism, and STATE.json resumability. Composes /deep-research, /auto-swarm-nth, /production-upgrade, /security-audit, and /ship into a single end-to-end flow.
Self-improving agent optimization — generates challenger variants of any agent/command, benchmarks against baseline, promotes winners, logs learnings to instincts. Inspired by Karpathy's autoresearch pattern.
Nth-iteration agent swarm — spawns parallel agent waves, evaluates strictly per wave, re-swarms gaps until 100% coverage and 10/10 quality. Can invoke any ProductionOS skill or command within waves.
Distributed agent swarm orchestrator — spawns parallel subagent clusters for any task with configurable depth, swarm size, and convergence criteria
Autonomous recursive improvement loop for a single target. Runs gap analysis, recursive refinement, evaluation, and convergence checks until the target reaches quality threshold or converges.
Idea exploration before building — understand the problem, propose approaches, present design, get approval. HARD-GATE: no implementation until design is approved.
Headless browser for QA testing, site inspection, and interaction verification. Navigate, screenshot, click, fill forms, capture snapshots.
ProductionOS smart router — single entry point that routes to the right pipeline based on intent. The ONLY command new users need to know.
Context engineering agent — researches context window optimization from arxiv, builds token-efficient context packages for downstream agents, manages cross-session persistence via MetaClaw.
Systematic debugging with hypothesis tracking — reproduce, hypothesize, test, narrow, fix. Never guess-and-check.
8-phase autonomous research pipeline with multi-source discovery, 4-layer citation verification, hypothesis generation, and PIVOT/REFINE/PROCEED decision loops. Confidence-gated — loops until 95%+ confidence.
Full UI/UX redesign pipeline — audits design, creates design systems, generates interactive HTML mockups, launches local browser for user interaction. Fuses /production-upgrade rigor with design agency methodology.
ProductionOS Mission Control — launch Claude DevTools, show session dashboard with eval convergence, agent dispatches, cost tracking, and hot file intelligence.
Post-ship documentation update — reads all project docs, cross-references the diff, updates README/ARCHITECTURE/CONTRIBUTING/CLAUDE.md to match what shipped.
Full-stack frontend upgrade pipeline — fuses /production-upgrade iterative audit with /plan-ceo-review vision and /plan-eng-review rigor. Deploys parallel auto-swarm agents for iterative audit and execution. Enriched with /deep-research for competitive parity.
Interactive code tutor — breaks down codebase logic, explains complexities, translates technical concepts for the user. Ideal after /btw commands. Teaches the WHY behind the code, not just the WHAT.
Business idea → production-ready plan pipeline. User provides an idea or business plan, agent researches market, competitors, existing solutions, challenges assumptions, identifies flaws, and builds a comprehensive execution plan with auto-document population.
Nuclear-scale autonomous research — deploys 500-1000 agents in ONE massive simultaneous wave for exhaustive topic saturation. Deep-research methodology × auto-swarm scale = maximum parallel intelligence. WARNING: Extreme resource consumption.
Nth-iteration omni-plan — recursive orchestration that chains ALL ProductionOS skills and agents, evaluates strictly per iteration, and loops until 10/10 is achieved. Each iteration can invoke any command or skill in the system.
ProductionOS flagship — 13-step orchestrative pipeline with tri-tiered evaluation, recursive convergence, CEO/Eng/Design review chain, CLEAR framework evaluation, multi-model judge tribunal, and autonomous PIVOT/REFINE/PROCEED decisions. Targets 100% production-ready output.
CEO/founder-mode plan review — rethink the problem, find the 10-star product, challenge premises. Four modes: SCOPE EXPANSION, SELECTIVE EXPANSION, HOLD SCOPE, SCOPE REDUCTION.
Engineering architecture review — lock in execution plan with data flow diagrams, error paths, test matrix, performance budget, and dependency analysis.
Run the full product upgrade pipeline — 55-agent iterative review with CEO/Engineering/UX/QA parallel loops
Show how to use ProductionOS — explains commands, recommended workflows, best flows to run, and usage guidelines.
Save current pipeline state for later resumption. Creates a checkpoint at .productionos/CHECKPOINT.json with all active context.
Resume a paused pipeline from .productionos/CHECKPOINT.json. Restores context and routes to the correct step.
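The save/resume pair above can be pictured with a minimal sketch. Only the path `.productionos/CHECKPOINT.json` comes from the descriptions; the function names and the shape of the state dictionary are hypothetical, since the actual checkpoint schema is not documented here.

```python
import json
from pathlib import Path
from typing import Optional

# Path taken from the command descriptions; everything else is illustrative.
CHECKPOINT = Path(".productionos") / "CHECKPOINT.json"


def save_checkpoint(state: dict, path: Path = CHECKPOINT) -> None:
    """Write the state to disk, replacing any previous checkpoint."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so an interrupted save
    # cannot leave a half-written checkpoint behind.
    tmp = path.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(state, indent=2))
    tmp.replace(path)


def load_checkpoint(path: Path = CHECKPOINT) -> Optional[dict]:
    """Return the saved state, or None when there is nothing to resume."""
    if not path.exists():
        return None
    return json.loads(path.read_text())
```

A resume step would call `load_checkpoint()` and, if it returns a dictionary, route to whichever step that dictionary records (for example a hypothetical `"step"` key).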
Display ProductionOS system statistics — agent count, command count, hook count, test count, version, instinct count, and session history.
Update the ProductionOS plugin to the latest version from GitHub.
Report-only QA testing — produces structured report with health score, screenshots, and repro steps. No fixes applied.
Systematic QA testing with health scoring — tests web app, finds bugs, fixes them iteratively. Regression mode for re-testing known issues.
Review and refine flagged RLM outputs — reads pending signals, dispatches L17 SelfRefine (generate critique, refine, converge), updates signals with human verdicts
Engineering retrospective — analyzes commit history, work patterns, code quality metrics, self-eval scores, and ProductionOS health with persistent trend tracking.
Pre-landing code review — analyzes diff for SQL safety, LLM trust boundaries, conditional side effects, missing tests, dependency risks, and security issues.
7-domain security hardening audit — OWASP Top 10 2025, MITRE ATT&CK mapping, NIST CSF 2.0 alignment, secret detection, supply chain audit, container security, DevSecOps pipeline. Grounded in 734 cybersecurity skills.
Run self-evaluation on recent work — questions quality, necessity, correctness, dependencies, completeness, learning, and honesty. Enabled by default in all flows. Standalone invocation for on-demand evaluation.
End-of-session self-training — captures session metrics, extracts patterns via metaclaw-learner, updates instincts, and generates optimization hypotheses for the next run.
Ship workflow — detect base branch, merge, run tests, review diff, bump VERSION, update CHANGELOG, commit, push, create PR.
Test-driven development — write tests first, then implement minimal code to pass. Enforces red-green-refactor cycle with coverage targets.
UX improvement pipeline — creates user stories from UI guidelines, maps user journeys, identifies friction, dispatches fix agents. The user-experience equivalent of /production-upgrade.
Create step-by-step implementation plans with risk assessment, dependency mapping, and effort estimation. Used after brainstorming, before execution.
Red-team agent that attacks every assumption, breaks every feature, and finds every way to abuse the system. READ-ONLY — never modifies code. Uses hostile-user thinking to surface issues other agents miss.
AI/ML integration specialist. Designs model pipelines (inference, fine-tuning, LoRA adapters), selects infrastructure (GPU provisioning, model serving), implements evaluation frameworks, and optimizes for cost/latency tradeoffs. Covers Hugging Face, Replicate, Modal, RunPod, vLLM, and managed APIs.
API contract validation agent that ensures frontend API calls match backend endpoints, request/response types align, error codes are handled, and the API surface is consistent and well-documented.
HumanLayer-inspired approval gate. Enforces human-in-the-loop for HIGH-stakes operations. Classifies actions by risk level, blocks until approved.
System architecture generation agent — designs tech stack, service boundaries, data model, API contract, infrastructure topology, and security model from SRS requirements. Produces SYSTEM-ARCHITECTURE.md, DATA-MODEL.md, API-CONTRACT.md with Architecture Decision Records for every major choice.
AI asset generation agent — connects to image generation APIs (Nano Banana, FAL AI, Replicate), manages asset storage pipelines, generates responsive variants, and integrates assets into frontend code.
Headless browser control via Playwright for QA testing, screenshots, form interaction, page snapshots, and accessibility auditing. The eyes of ProductionOS.
Business logic validation agent that audits pricing calculations, approval workflows, state machines, authorization rules, and business rule consistency. Catches the bugs that pass code review but break the business.
Systematic code review agent with two-pass review (CRITICAL then INFORMATIONAL), fix-first heuristic (auto-fix mechanical issues, ask about ambiguous ones), battle-tested pattern detection, suppression list, and evidence-backed findings with file:line citations.
Communication assistant — generates and audits README, CHANGELOG, PR descriptions, commit messages, release notes, and API documentation. Cross-references docs against actual code for accuracy.
Comparative analysis agent — performs side-by-side codebase comparison, architecture A/B analysis, competitive analysis, before/after delta analysis, and technology evaluation with structured comparison matrices.
Code complexity analysis agent — measures cyclomatic complexity, cognitive complexity, code duplication, and function length to identify maintainability risks. Produces actionable refactoring recommendations prioritized by risk.
RAG-in-pipeline context management agent — retrieves relevant documentation, past decisions, library docs, and memory entries to ground every agent's work in authoritative context.
Cross-iteration convergence tracker — runs 6 detection algorithms (score-based, semantic, diminishing returns, oscillation, plateau+pivot, EMA velocity) to produce a unified convergence verdict with strategy recommendations.
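To make the convergence tracker's description concrete, here is a rough sketch of three of the named detectors (score-based, oscillation, EMA velocity) folded into one verdict. The function names, thresholds, and verdict labels are assumptions for illustration, not the agent's actual algorithm.

```python
def ema_velocity(scores, alpha=0.5):
    """Exponential moving average of per-iteration score changes."""
    velocity = 0.0
    for prev, curr in zip(scores, scores[1:]):
        velocity = alpha * (curr - prev) + (1 - alpha) * velocity
    return velocity


def is_oscillating(scores, window=4):
    """True when the last `window` score changes strictly alternate in sign."""
    deltas = [b - a for a, b in zip(scores, scores[1:])][-window:]
    if len(deltas) < window:
        return False
    return all(d1 * d2 < 0 for d1, d2 in zip(deltas, deltas[1:]))


def convergence_verdict(scores, target=9.0, min_velocity=0.05):
    """Combine the detectors into a single label (labels are hypothetical)."""
    if scores and scores[-1] >= target:
        return "CONVERGED"      # score-based: target reached
    if is_oscillating(scores):
        return "OSCILLATING"    # fixes keep undoing each other
    if len(scores) >= 3 and abs(ema_velocity(scores)) < min_velocity:
        return "PLATEAU"        # progress has stalled
    return "CONTINUE"
```

In a pipeline, a `PLATEAU` or `OSCILLATING` label is the kind of signal that could feed a strategy recommendation such as changing focus rather than running another identical iteration.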
Database schema and query audit agent. Checks normalization, indexes, naming conventions, migration safety, RLS/tenant isolation, N+1 queries, connection pool sizing, and data integrity constraints. Supports PostgreSQL, MySQL, SQLite, MongoDB.
Semgrep SAST scanner — runs deterministic static analysis with 695+ community rules, parses SARIF output into ProductionOS findings format, integrates with security-hardener and quality gates.
Manages cross-session context preservation, progressive context loading (L0/L1/L2), context compression, handoff artifact generation, and session continuity. Prevents context rot across long sessions and multi-session projects.
Stub and placeholder detector — distinguishes 'file exists' from 'feature works' by scanning for placeholder patterns, TODO comments, mock data, hardcoded values, and incomplete implementations that masquerade as working features.
Distributed swarm coordination agent — manages agent lifecycle, work distribution, convergence tracking, and inter-agent communication for auto-swarm operations.
Database schema architect that designs schemas from requirements, generates migrations, validates data models, and audits existing database structures. Supports PostgreSQL, Supabase, Pinecone, SQLite, MongoDB, and ORM-specific patterns (Prisma, Drizzle, SQLAlchemy, Django ORM).
Multi-agent structured debate protocol where 3-5 persona-driven debaters argue positions through claim-evidence-rebuttal rounds, moderated by a judge with adaptive convergence detection. Implements recursive debate within Claude Code's single-model constraint.
Autonomous PIVOT/REFINE/PROCEED decision agent — evaluates iteration results and autonomously decides whether to continue, adjust focus, or fundamentally change strategy. Inspired by AutoResearchClaw's Stage 15.
Deep research agent that investigates tech stacks, libraries, competitor patterns, market niches, and best practices using Chain-of-Thought reasoning and multi-source synthesis. Use before any implementation to gather authoritative context for product decisions.
Chain of Density inter-iteration summarizer — progressively compresses findings across iterations into information-dense handoff documents that prevent context rot.
Dependency vulnerability scanner and health checker. Runs npm audit, pip-audit, checks for outdated packages, license conflicts, and abandoned dependencies.
Creates comprehensive design system specifications from codebase analysis and competitive research. Produces token systems, component inventories, pattern libraries, and theming configurations. The design equivalent of architecture-designer.
UI/UX redesign orchestrator — audits design, creates design systems, generates interactive HTML mockups, launches local browser for user selection and annotation. The design equivalent of /production-upgrade.
Pre-pipeline decision capture agent — conducts structured user interview to lock requirements, constraints, and non-negotiables before any review or fix agents run. Prevents pipeline from optimizing in wrong direction.
Document parsing agent — converts PDF, DOCX, PPTX, and other document formats to structured markdown text using Docling. Enables agents to consume non-code context like specs, PRDs, research papers, and slide decks.
Documentation accuracy auditor — cross-references README, CLAUDE.md, ARCHITECTURE.md, and inline comments against actual code behavior. Detects stale docs, wrong counts, broken links, and claims that no longer match the codebase.
Dynamic planning orchestrator that synthesizes findings from all review agents, produces prioritized fix plans, generates TDD specs, and sequences execution batches. Uses Chain-of-Thought reasoning and step-back prompting for strategic planning.
Comprehensive test strategy designer that analyzes coverage gaps, generates TDD specs and test stubs, plans test infrastructure across unit/integration/E2E layers, prioritizes tests by risk, and integrates with CI pipelines. Produces executable test files and architecture documentation.
TextGrad-inspired prompt optimizer. Evaluates agent/command prompts, computes textual gradients (what to improve), applies gradient descent on text, and converges to higher-quality prompts. Integrates with rubric-evolver for evaluation criteria.
Graph of Thought aggregation agent that connects findings from all review agents into a causal network, revealing systemic issues through centrality analysis and root cause identification.
Maps user stories into organized story maps with backbone activities, walking skeletons, and iteration layers. Creates visual journey maps and identifies story gaps. Sub-agent of ux-genie.
UX/UI audit agent that evaluates design consistency, accessibility, responsive behavior, interaction patterns, and identifies improvement opportunities through competitor comparison.
UX improvement orchestrator — creates detailed user stories from UI guidelines, maps user journeys, identifies friction points, and dispatches agents to implement UX improvements. The user-experience equivalent of /production-upgrade.
Completion verification agent — enforces 'NO COMPLETION WITHOUT FRESH EVIDENCE' protocol. Validates that every claimed fix, improvement, or deliverable has verifiable proof before marking as done.
Silent end-to-end system architect that observes all agent activity, validates architecture consistency, evaluates implementation quality, and sub-orchestrates specialist agents when gaps are detected. Runs as a background observer on every pipeline execution.
Ecosystem intelligence scanner — monitors Claude Code skill repositories, plugin marketplaces, and community contributions for new capabilities worth adopting. Produces ECOSYSTEM-INTEL.md reports.
Frontend design agent — generates design system tokens, component architecture, empathy maps, user journey maps, TTFV analysis, and motion design patterns. Orchestrates frontend-scraper and ux-auditor.
Playwright screenshot and Lighthouse performance capture agent — takes screenshots at multiple breakpoints, runs Lighthouse audits, and captures accessibility scores for visual evidence.
Ecosystem gap analyzer — compares ProductionOS capabilities against the broader Claude Code skill/plugin ecosystem, identifies missing features, and recommends adoption priorities. Produces actionable gap reports.
GitOps orchestrator agent — ensures clean code reaches the repository through pre-contribution analysis, branch management, commit hygiene, PR creation, issue tracking, pre-push validation, and repository health monitoring. Coordinates code-reviewer and self-healer before any push.
Safety and human-in-the-loop enforcement agent — monitors all pipeline operations for scope violations, protected file access, budget overruns, and security regressions. Can halt the pipeline.
Cross-session context persistence agent. Creates structured handoff artifacts with unique names, maintains changelog of reasoning and changes, and enables any agent to save/restore rich context across sessions. The foundational sub-agent invoked by all other agents for state management.
OWASP Top 10 + attack surface mapping agent — systematically checks for injection, broken auth, sensitive data exposure, XXE, broken access control, misconfig, XSS, deserialization, component vulns, and logging gaps.
Local RAG agent that progressively loads SecondBrain context using the 4-level drill-down (hot cache -> index -> domain -> entity). Returns minimal context sufficient to answer the query.
Graph RAG agent that traverses SecondBrain wikilinks to build context graphs for cross-project queries. Reads wiki pages, follows link chains, and returns structured context with relevance scoring.
Worktree lifecycle orchestrator — creates isolated git worktrees for parallel agent execution, assigns non-overlapping file scopes, runs preflight checks, manages merge sequence, and handles crash recovery for orphaned worktrees. Implements m13v's production patterns.
Infrastructure and backend setup specialist. Designs and scaffolds production backends with database support (Supabase, PostgreSQL, Pinecone, MongoDB), auth (Clerk, Auth0), payments (Stripe), and deployment (Vercel, Railway, Fly.io). Creates runnable scaffolds, not just plans.
Product-level user interview agent for greenfield projects — conducts structured intake to capture problem, audience, solution, business model, constraints, and success criteria. Outputs INTAKE-BRIEF.md, INTAKE-ASSUMPTIONS.md, INTAKE-PERSONAS.md for downstream pipeline consumption.
Independent LLM evaluator that scores codebases on 10 quality dimensions with evidence-based citations, confidence calibration, and self-consistency validation. Controls the recursive improvement loop convergence. READ-ONLY — never modifies code.
Autonomous merge conflict resolution agent. Detects conflicts during worktree merge, analyzes both sides using git diff3, proposes semantic resolutions, and applies them with test-gate verification. Designed for recursive autonomous operation in auto-swarm-nth waves.
Cross-run learning system inspired by AutoResearchClaw's MetaClaw — extracts structured lessons from pipeline failures, converts them to reusable rules, injects them into future runs. +18.3% robustness improvement.
Migration safety agent — plans database migrations, dependency upgrades, API version transitions, and breaking changes with rollback procedures and feature flag strategies.
Creates interactive HTML/CSS mockups from design specs and audit findings. Mockups include annotation overlay, dark mode toggle, responsive preview, and side-by-side comparison. Served via local HTTP for browser-based review.
Naming convention enforcer that audits variable names, function names, file names, class names, and database columns against language-specific best practices. Produces a renaming plan for clean, consistent codebases.
Test gap analyzer that identifies requirements with no automated test coverage and generates tests to close the gaps. Named after the Nyquist sampling theorem: every requirement needs at least one happy-path test and one boundary test.
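The "one happy-path test and one boundary test per requirement" rule can be checked mechanically. The sketch below assumes tests are tagged with a requirement ID and a kind; the tag names and the function are hypothetical and only illustrate the gap check, not test generation.

```python
# Kinds of test every requirement must have (names are illustrative).
REQUIRED_KINDS = ("happy_path", "boundary")


def coverage_gaps(requirement_ids, tests):
    """Map each under-tested requirement to the test kinds it is missing.

    `tests` is an iterable of (requirement_id, kind) pairs.
    """
    covered = {req: set() for req in requirement_ids}
    for req, kind in tests:
        if req in covered:
            covered[req].add(kind)
    return {
        req: [k for k in REQUIRED_KINDS if k not in kinds]
        for req, kinds in covered.items()
        if any(k not in kinds for k in REQUIRED_KINDS)
    }
```

A requirement with both kinds of test does not appear in the result, so an empty dictionary means the rule is satisfied for every requirement.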
Performance benchmarking agent — profiles API response times, database query efficiency, bundle sizes, memory usage, and identifies bottlenecks with before/after comparison.
Three-persona evaluation agent that scores the codebase from Technical, Human, and Meta perspectives — then synthesizes a holistic verdict using weighted averaging.
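As a small illustration of the weighted averaging step, the sketch below combines three persona scores into one number. The weights are made up for the example; the description does not say which weights the agent uses.

```python
def holistic_score(scores, weights):
    """Weighted average of persona scores; weights need not sum to 1."""
    total_weight = sum(weights[persona] for persona in scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(scores[p] * weights[p] for p in scores) / total_weight


# Hypothetical scores and weights, purely for illustration.
persona_scores = {"technical": 8.0, "human": 6.0, "meta": 7.0}
persona_weights = {"technical": 0.5, "human": 0.3, "meta": 0.2}
verdict = holistic_score(persona_scores, persona_weights)
```

Dividing by the weight total keeps the result on the same scale as the inputs even if the weights are later retuned and no longer sum to one.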
Pre-execution plan validator — reads plans before execution and verifies they will achieve the stated goal, have no circular dependencies, fit context budget, and honor locked user decisions from discuss-phase.
Product Requirements Document generator — transforms intake brief, validated assumptions, and research findings into a complete PRD with user stories, journey maps, feature backlog, success metrics, and MoSCoW prioritization. Produces machine-readable output for downstream SRS and architecture agents.
Hypothesis-driven prompt variant generator for the autoresearch loop — analyzes agent instructions, generates challenger variants with specific hypotheses, evaluates prompt composition effectiveness. Part of the /auto-optimize pipeline.
Quality gate enforcer — evaluates configurable quality thresholds from quality-gates.yml before commits, deploys, and pipeline completions. Blocks on failures, warns on threshold proximity.
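The block-or-warn behaviour described above might look like the following sketch. The gate names, the 5% warning margin, and the in-memory dictionary standing in for `quality-gates.yml` are all assumptions; the real file format is not shown in this listing.

```python
def evaluate_gate(value, minimum, warn_margin=0.05):
    """Classify one metric against its minimum threshold."""
    if value < minimum:
        return "BLOCK"   # below threshold: fail the gate
    if value < minimum * (1 + warn_margin):
        return "WARN"    # passing, but close to the threshold
    return "PASS"


def evaluate_gates(metrics, thresholds):
    """Evaluate every configured gate; a missing metric blocks."""
    results = {}
    for name, minimum in thresholds.items():
        if name not in metrics:
            results[name] = "BLOCK"
        else:
            results[name] = evaluate_gate(metrics[name], minimum)
    return results


# Hypothetical thresholds standing in for a parsed quality-gates.yml.
thresholds = {"test_coverage": 80.0, "self_eval_score": 8.0}
```

Treating a missing metric as a block is a deliberate choice in this sketch: a gate that cannot be measured should not pass silently.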
Orchestrates self-check, self-evaluation, and self-healing loops. Monitors agent output quality in real-time, triggers re-evaluation when quality drifts, and manages the learn→eval→heal→learn cycle. The QA brain of ProductionOS.
RAG pipeline architect that designs, implements, and optimizes Retrieval-Augmented Generation systems. Handles chunking strategies, embedding model selection, retrieval methods, reranking, and context window optimization for any target codebase.
Recursive LLM orchestrator — manages recursion depth, context budgets, convergence detection, and branch merging for recursive agent execution. Implements the RLM pattern within Claude Code's constraints.
Code refactoring specialist that eliminates dead code, reduces complexity, extracts functions, consolidates duplicates, and improves code structure without changing behavior. Follows the boy scout rule — leave code cleaner than you found it.
Regression detection agent — compares current codebase state against baseline metrics (self-eval scores, test coverage, complexity, performance) to detect quality regressions before they ship. Integrates with convergence engine for trend analysis.
Requirements traceability and SRS generation agent — produces Software Requirements Specification with business rules (BL-XX-XXX), decision trees (DT-X.X), acceptance criteria, and a full traceability matrix linking every requirement from PRD user story through technical spec to test case.
Autonomous deep research pipeline inspired by AutoResearchClaw — 8-phase literature discovery with multi-source search (arxiv, Semantic Scholar, OpenAlex), 4-layer citation verification, hypothesis generation via multi-agent debate, self-healing code execution, and autonomous PIVOT/REFINE/PROCEED decision loops.
Reverse engineering agent — extracts architecture, decision archaeology, design patterns, API surfaces, security models, and performance architecture from any production codebase. Produces replication guides.
Orchestrates recursive refinement of agent outputs using L16-L21 layers. Manages depth, convergence, and context compression within Claude Code's depth-3 agent limit.
OPRO-based rubric self-evolution agent. Generates rubric variants, scores them against calibration sets, and promotes winners. Feeds into llm-judge and self-evaluator for improved evaluation quality over time.
Unified rule evaluation agent — wraps Semgrep, ast-grep, ruff, oxlint, and custom quality-gates.yml rules into a single deterministic analysis pass. Returns structured findings with severity, CWE references, and fix suggestions.
Project scaffold generation agent — initializes a working project from architecture specifications. Creates directory structure, package configs, Docker setup, CI/CD pipelines, environment templates, and CLAUDE.md. Output builds and lints clean on first run.
Comprehensive security audit agent grounded in 734 cybersecurity skills — OWASP Top 10 2025, MITRE ATT&CK mapping, NIST CSF alignment, secret detection, supply chain audit, container security, and DevSecOps pipeline verification.
Self-evaluation agent that questions work quality, necessity, correctness, and dependency mapping. Runs after every agent action by default. Implements the 7-question self-eval protocol with self-heal loops.
Auto-fix agent with 10-round iterative healing, NaN/Infinity fast-fail detection, AST validation, and partial result capture. Inspired by AutoResearchClaw's self-healing execution. Runs after every batch to ensure validation gates pass.
Niche-agnostic agentic evaluator using CLEAR v2.0 framework — 6-domain assessment, 8 analysis dimensions, 6-tier source prioritization, evidence strength ratings, and decision trees. Evaluates any plan, codebase, or research output.
Composite: security audit -> production upgrade -> self-evaluation. Use when user says 'audit', 'check the codebase', 'find and fix issues', or 'is this production-ready'.
Idea-to-running-code lifecycle orchestration. 10-phase pipeline with 5 hard decision gates, wave-based parallelism, and STATE.json resumability. Composes /deep-research, /auto-swarm-nth, /production-upgrade, /security-audit, and /ship into a single end-to-end flow.
Self-improving agent optimization — generates challenger variants of any agent/command, benchmarks against baseline, promotes winners, logs learnings to instincts. Inspired by Karpathy's autoresearch pattern.
Nth-iteration agent swarm — spawns parallel agent waves, evaluates strictly per wave, re-swarms gaps until 100% coverage and 10/10 quality. Can invoke any ProductionOS skill or command within waves.
Distributed agent swarm orchestrator — spawns parallel subagent clusters for any task with configurable depth, swarm size, and convergence criteria
Autonomous recursive improvement loop for a single target. Runs gap analysis, recursive refinement, evaluation, and convergence checks until the target reaches quality threshold or converges.
Idea exploration before building — understand the problem, propose approaches, present design, get approval. HARD-GATE: no implementation until design is approved.
Headless browser for QA testing, site inspection, and interaction verification. Navigate, screenshot, click, fill forms, capture snapshots.
ProductionOS smart router — single entry point that routes to the right pipeline based on intent. The ONLY command new users need to know.
Context engineering agent — researches context window optimization from arxiv, builds token-efficient context packages for downstream agents, manages cross-session persistence via MetaClaw.
Systematic debugging with hypothesis tracking — reproduce, hypothesize, test, narrow, fix. Never guess-and-check.
8-phase autonomous research pipeline with multi-source discovery, 4-layer citation verification, hypothesis generation, and PIVOT/REFINE/PROCEED decision loops. Confidence-gated — loops until 95%+ confidence.
Full UI/UX redesign pipeline — audits design, creates design systems, generates interactive HTML mockups, launches local browser for user interaction. Fuses /production-upgrade rigor with design agency methodology.
ProductionOS Mission Control — launch Claude DevTools, show session dashboard with eval convergence, agent dispatches, cost tracking, and hot file intelligence.
Post-ship documentation update — reads all project docs, cross-references the diff, updates README/ARCHITECTURE/CONTRIBUTING/CLAUDE.md to match what shipped.
Full-stack frontend upgrade pipeline — fuses /production-upgrade iterative audit with /plan-ceo-review vision and /plan-eng-review rigor. Deploys parallel auto-swarm agents for iterative audit and execution. Enriched with /deep-research for competitive parity.
Composite: audit -> upgrade -> research -> plan -> swarm fix -> eval -> ship. The complete ProductionOS pipeline. Use when user says 'do everything', 'full cycle', 'end to end', or 'make it production-ready'.
Composite: SEO -> content -> ads -> analytics audit for marketing and growth. Use when user mentions 'marketing', 'SEO', 'growth', 'ads', 'conversion', or 'traffic'.
Interface Craft by ProductionOS Design — a toolkit for building polished, animated interfaces in React. Includes Motion System (human-readable animation DSL with stage-driven sequencing), DialKit (live control panels for tuning animation values), and Design Evaluator (systematic UI review based on ProductionOS Design's methodology). Triggers on: animate, animation, transition, storyboard, entrance, motion, spring, easing, timing, finetune-control, sliders, controls, tune, tweak, critique, review, feedback, audit, improve, polish, refine, redesign.
Interactive code tutor — breaks down codebase logic, explains complexities, translates technical concepts for the user. Ideal after /btw commands. Teaches the WHY behind the code, not just the WHAT.
Business idea -> production-ready plan pipeline. User provides an idea or business plan, agent researches market, competitors, existing solutions, challenges assumptions, identifies flaws, and builds a comprehensive execution plan with auto-document population.
Nuclear-scale autonomous research — deploys 500-1000 agents in ONE massive simultaneous wave for exhaustive topic saturation. Deep-research methodology x auto-swarm scale = maximum parallel intelligence. WARNING: Extreme resource consumption.
Nth-iteration omni-plan — recursive orchestration that chains ALL ProductionOS skills and agents, evaluates strictly per iteration, and loops until 10/10 is achieved. Each iteration can invoke any command or skill in the system.
ProductionOS flagship — 13-step orchestrative pipeline with tri-tiered evaluation, recursive convergence, CEO/Eng/Design review chain, CLEAR framework evaluation, multi-model judge tribunal, and autonomous PIVOT/REFINE/PROCEED decisions. Targets 100% production-ready output.
CEO/founder-mode plan review — rethink the problem, find the 10-star product, challenge premises. Four modes: SCOPE EXPANSION, SELECTIVE EXPANSION, HOLD SCOPE, SCOPE REDUCTION.
Engineering architecture review — lock in execution plan with data flow diagrams, error paths, test matrix, performance budget, and dependency analysis.
Run the full product upgrade pipeline — 55-agent iterative review with CEO/Engineering/UX/QA parallel loops
Show how to use ProductionOS — explains commands, recommended workflows, best flows to run, and usage guidelines.
Save current pipeline state for later resumption. Creates a checkpoint at .productionos/CHECKPOINT.json with all active context.
Resume a paused pipeline from .productionos/CHECKPOINT.json. Restores context and routes to the correct step.
Display ProductionOS system statistics — agent count, command count, hook count, test count, version, instinct count, and session history.
Update the ProductionOS plugin to the latest version from GitHub.
ProductionOS — dual-target AI engineering operating system for repo-wide audits, upgrade plans, code reviews, strategic product reviews, security sweeps, UX audits, and recursive quality improvement.
Report-only QA testing — produces structured report with health score, screenshots, and repro steps. No fixes applied.
Systematic QA testing with health scoring — tests web app, finds bugs, fixes them iteratively. Regression mode for re-testing known issues.
Review and refine flagged outputs, using critique and focused iteration to improve weak results.
Composite: deep research -> CEO review -> eng review. Use when the user says 'research', 'plan', 'design', 'architect', or 'spec out'.
Retrospective workflow that summarizes what shipped, what broke, and what should improve next.
Enforces code review quality before commits and pushes across ALL projects. 6-gate sequence: diff size, PII/secrets, conventions, cross-project boundaries, completeness, self-review reminder. Only the PII/secrets gate blocks; the rest are advisory. Triggers on: "review before push", "pre-commit review", "quality gate", "/review-gate".
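The "one blocking gate, the rest advisory" behavior can be sketched as follows. This is an illustration under stated assumptions: the `GateResult` type, the function names, the 400-line threshold, and the two secret patterns are all hypothetical, and only two of the six gates are shown.

```python
import re
from dataclasses import dataclass


@dataclass
class GateResult:
    gate: str
    passed: bool
    blocking: bool  # only a failed blocking gate stops the push
    note: str = ""


# Hypothetical patterns; a real scanner would use a much larger rule set.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
]


def _changed_lines(diff: str) -> list[str]:
    """Added/removed lines of a unified diff, excluding the +++/--- file headers."""
    return [
        line for line in diff.splitlines()
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


def diff_size_gate(diff: str, max_lines: int = 400) -> GateResult:
    n = len(_changed_lines(diff))
    return GateResult("diff size", n <= max_lines, blocking=False, note=f"{n} changed lines")


def secrets_gate(diff: str) -> GateResult:
    added = [line for line in _changed_lines(diff) if line.startswith("+")]
    hit = any(p.search(line) for p in SECRET_PATTERNS for line in added)
    return GateResult("PII/secrets", not hit, blocking=True)


def run_gates(diff: str) -> tuple[bool, list[GateResult]]:
    """Run every gate; the push is allowed unless a blocking gate failed."""
    results = [diff_size_gate(diff), secrets_gate(diff)]
    allowed = all(r.passed or not r.blocking for r in results)
    return allowed, results
```

Running every gate, rather than stopping at the first failure, means the advisory findings still reach the author even when the secrets gate blocks.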
Pre-landing code review — analyzes diff for SQL safety, LLM trust boundaries, conditional side effects, missing tests, dependency risks, and security issues.
7-domain security hardening audit — OWASP Top 10 2025, MITRE ATT&CK mapping, NIST CSF 2.0 alignment, secret detection, supply chain audit, container security, DevSecOps pipeline. Grounded in 734 cybersecurity skills.
Run self-evaluation on recent work — questions quality, necessity, correctness, dependencies, completeness, learning, and honesty. Enabled by default in all flows. Standalone invocation for on-demand evaluation.
End-of-session self-training — captures session metrics, extracts patterns via metaclaw-learner, updates instincts, and generates optimization hypotheses for the next run.
Scaffold and wire a persistent SecondBrain (Obsidian vault + LLM wiki) for cross-session knowledge management. Creates PARA structure, wiki domains/entities/concepts, cross-project references, and RAG integration. Runs once per user, then the wiki compounds over time. Triggers on: "setup secondbrain", "create knowledge base", "setup wiki", "persistent memory", "second brain", "/setup-secondbrain".
Composite: self-eval -> review -> ship. Use when the user says 'ship', 'deploy', 'push', 'merge', or 'create PR'. Ensures quality before shipping.
Ship workflow — detect base branch, merge, run tests, review diff, bump VERSION, update CHANGELOG, commit, push, create PR.
Test-driven development workflow that writes failing tests first, implements minimally, and refactors safely.
UX improvement pipeline — creates user stories from UI guidelines, maps user journeys, identifies friction, dispatches fix agents. The user-experience equivalent of /production-upgrade.
Local RAG and Graph RAG over the SecondBrain wiki vault. Progressive context loading (hot cache -> index -> domain -> entity). Graph traversal via wikilink resolution. Use when agents need cross-project context, when answering questions that span multiple domains, or when building context for planning tasks. Triggers on: "wiki context", "cross-project context", "what do we know about", "check the wiki", "graph context", "/wiki-rag".
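Graph traversal via wikilink resolution can be illustrated with a short breadth-first walk. This sketch assumes Obsidian-style `[[Page]]`, `[[Page|alias]]`, and `[[Page#heading]]` links and an in-memory `pages` dict mapping page names to text; the function names and the `max_hops` parameter are hypothetical, not the skill's actual interface.

```python
import re
from collections import deque

# Captures the page name, ignoring an optional "|alias" or "#heading" suffix.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")


def linked_pages(text: str) -> list[str]:
    """Return the page names referenced by wikilinks in text, in order."""
    return [name.strip() for name in WIKILINK.findall(text)]


def graph_context(pages: dict[str, str], start: str, max_hops: int = 2) -> list[str]:
    """Breadth-first walk from start, following links that resolve to known pages."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand pages at the hop limit
        for target in linked_pages(pages.get(name, "")):
            if target in pages and target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order
```

Capping the walk with `max_hops` keeps the assembled context small, in the same spirit as loading the closest material first and widening only when needed.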
Implementation planning workflow that turns approved ideas into dependency-aware execution plans.
ProductionOS instinct-based learning system. Observes sessions via hooks, extracts patterns, creates atomic instincts with confidence scoring, and auto-promotes high-confidence patterns across projects.
ProductionOS frontend quality scanner. Auto-activates when editing React/Vue/Svelte components, CSS, or layout files. Checks accessibility, responsive design, performance, and design system consistency.
ProductionOS security scanner. Auto-activates when editing auth, payment, credential, or admin files. Runs OWASP Top 10 checks, dependency audit, and secret detection.