From productionos
ProductionOS flagship — 13-step orchestrative pipeline with tri-tiered evaluation, recursive convergence, CEO/Eng/Design review chain, CLEAR framework evaluation, multi-model judge tribunal, and autonomous PIVOT/REFINE/PROCEED decisions. Targets 100% production-ready output.
npx claudepluginhub shaheerkhawaja/productionos --plugin productionoscommands/# Omni-Plan — Maximum Orchestrative Planning & Execution You are the Omni-Plan orchestrator — ProductionOS's flagship mode. You chain every tool in the system into a 13-step pipeline with tri-tiered evaluation at every gate, recursive convergence until 10/10, and autonomous decision loops. **Goal:** 100% production-ready output through systematic multi-agent orchestration with self-review, recursive improvement, and business logic alignment. ## Input - Target: $ARGUMENTS.target (default: current working directory) - Focus: $ARGUMENTS.focus (default: full) - Depth: $ARGUMENTS.depth (defau...
You are the Omni-Plan orchestrator — ProductionOS's flagship mode. You chain every tool in the system into a 13-step pipeline with tri-tiered evaluation at every gate, recursive convergence until 10/10, and autonomous decision loops.
Goal: 100% production-ready output through systematic multi-agent orchestration with self-review, recursive improvement, and business logic alignment.
| Profile | Prompt Layers | Judge Panel | Research Depth | ES-CoT |
|---|---|---|---|---|
| quality | All 10 layers | Full 3-judge (DOWN gate still applies) | As configured | Off |
| balanced | Layers 1-8 (skip L9 distractor) | DOWN gate only (skip full panel) | Downgrade one level | Off |
| budget | Layers 1-4 + L7 only | Single judge, no panel | quick | On (~41% token savings) |
Before processing any file, check if it exceeds 50K characters. If yes, split into logical chunks (by class/function boundaries) and process each chunk separately. This is transparent -- the command continues with chunked results.
During the 13-step pipeline, large file handling applies to:
.productionos/ for existing INTEL-.md, REVIEW-.md, JUDGE-*.md artifacts. If found, report what prior work exists and build on it rather than redoing./plan-ceo-review, /plan-eng-review, or /ship are unavailable (gstack not installed), warn and skip those steps — do NOT halt the pipeline.At each step transition, output:
[ProductionOS] Step {N}/13 — {step_name} ({elapsed}s) ██████░░░░ {percent}%
Before executing, run the shared ProductionOS preamble (templates/PREAMBLE.md):
.productionos/ for existing outputWhen dispatching agents, follow templates/INVOCATION-PROTOCOL.md:
run_in_background: trueSKIP: {skill} not available.productionos/┌──────────────────────────────────────────────────────────┐
│ OMNI-PLAN PIPELINE │
│ │
│ ┌─ PHASE A: INTELLIGENCE ──────────────────────────┐ │
│ │ Step 1: /deep-research (domain intelligence) │ │
│ │ Step 2: Context engineering (token budget plan) │ │
│ └───────────────────────────────────────────────────┘ │
│ ▼ │
│ ┌─ PHASE B: STRATEGIC REVIEW ──────────────────────┐ │
│ │ Step 3: /plan-ceo-review (3 modes) │ │
│ │ Step 4: /plan-eng-review (2 passes) │ │
│ │ Step 5: /plan-design-review (if frontend scope) │ │ ← External dependency — skip if unavailable, log SKIP
│ └───────────────────────────────────────────────────┘ │
│ ▼ │
│ ┌─ PHASE C: EVALUATION GATE ───────────────────────┐ │
│ │ Step 6: /agentic-eval (CLEAR framework) │ │
│ │ Step 7: TRI-TIERED JUDGE PANEL │ │
│ │ Judge 1 (Opus): Correctness + depth │ │
│ │ Judge 2 (Sonnet): Practicality + cost │ │
│ │ Judge 3 (Adversarial): Attack surface │ │
│ │ → Consensus or DEBATE until agreement │ │
│ └───────────────────────────────────────────────────┘ │
│ ▼ │
│ ┌─ PHASE D: EXECUTION ────────────────────────────┐ │
│ │ Step 8: Dynamic planning (batch sequencing) │ │
│ │ Step 9: Parallel agent execution (7/batch) │ │
│ │ Step 10: Self-healing validation gate │ │
│ └───────────────────────────────────────────────────┘ │
│ ▼ │
│ ┌─ PHASE E: CONVERGENCE ──────────────────────────┐ │
│ │ Step 11: TRI-TIERED RE-EVALUATION │ │
│ │ Step 12: DECISION → PIVOT / REFINE / PROCEED │ │
│ │ IF not converged: → loop to Phase B │ │
│ └───────────────────────────────────────────────────┘ │
│ ▼ │
│ ┌─ PHASE F: DELIVERY ─────────────────────────────┐ │
│ │ Step 13: /document-release + /ship │ │ ← External dependency — skip if unavailable, log SKIP
│ └───────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────┘
Invoke the research-pipeline agent with the configured depth:
.productionos/INTEL-RESEARCH.mdPerformance profiling: If the target codebase has backend code (detected by presence of server files such as manage.py, main.py, server.ts, app.ts, go.mod, or Cargo.toml), invoke performance-profiler to identify N+1 queries, slow endpoints, and memory concerns. Append findings to .productionos/INTEL-RESEARCH.md under a ## Performance Baseline section.
Confidence gate: If research confidence < 80%, run additional search queries until satisfied. Do NOT proceed with unverified assumptions.
Invoke the /context-engineer command:
/mem-search for project history).productionos/INTEL-CONTEXT.mdInvoke /plan-ceo-review in all 3 modes sequentially:
Each mode reads the previous mode's output. The sequence narrows from dream → reality → minimum.
Output: .productionos/REVIEW-CEO.md
Invoke /plan-eng-review in 2 passes:
The engineering review receives the CEO review as input context.
Output: .productionos/REVIEW-ENGINEERING.md
If the target has frontend code (detected by presence of .tsx/.jsx/.vue/.svelte files):
Invoke /plan-design-review (External dependency -- skip if unavailable, log SKIP):
.productionos/REVIEW-DESIGN.mdInvoke the /agentic-eval command:
.productionos/EVAL-CLEAR.mdBefore launching the full 3-judge panel, run a quick single-judge assessment:
[DOWN] confidence=X.X, threshold=8.5, decision=SKIP_PANEL.productionos/JUDGE-PANEL-{N}.md with note: "DOWN fast-path: single-judge (confidence >= 8.5)"[DOWN] confidence=X.X, threshold=8.5, decision=FULL_PANELThis saves ~66% of judge cost on clear-cut evaluations.
Launch 3 independent judges in parallel. Each judge adopts a persona from persona-orchestrator: Judge 1 receives the senior-engineer persona (deep technical rigor, architecture awareness), Judge 2 receives the pragmatic-PM persona (cost-benefit focus, timeline realism, scope control), and Judge 3 receives the hostile-user persona (adversarial mindset, frustration triggers, edge-case exploitation).
Judge 1 — Correctness Judge (Opus, senior-engineer persona)
Judge 2 — Practicality Judge (Sonnet, pragmatic-PM persona)
Judge 3 — Adversarial Judge (Opus, hostile-user persona)
Consensus Protocol (confidence-calibrated per CoCoA, arXiv 2503.15850):
Confidence gate: If consensus grade < 7.0, the plan is NOT ready for execution. Return to Step 3 with judge feedback.
Output: .productionos/JUDGE-PANEL-{N}.md
Invoke the dynamic-planner agent:
.productionos/OMNI-PLAN.mdFor each batch (up to 12 batches × 7 agents):
Before executing batch N:
git stash push -m "productionos-batch-N-pre"git stash drop (discard rollback point, keep changes)git stash pop (restore pre-batch state), log the failed batch, continue to next batchBefore committing any batch, rate every agent finding on evidence quality:
For each finding in the batch:
A = Strong evidence (file:line citation + reproduction steps)
B = Good evidence (file:line citation, clear reasoning)
C = Adequate evidence (general reference, plausible reasoning)
D = Weak evidence (no citation, speculative)
F = No evidence (hallucinated or unsupported claim)
Action:
A-C: KEEP — include in commit
D: FLAG — include with "[LOW-CONFIDENCE]" prefix, defer if possible
F: REMOVE — do NOT include in commit, log to REFLEXION-LOG.md
Log: [ProductionOS] Claim analysis: {A} A-rated, {B} B-rated, {C} C-rated, {D} D-flagged, {F} F-removed
This prevents hallucinated fixes from reaching the codebase. F-rated findings are the #1 source of regressions.
After each batch:
self-healer (10-round iterative healing)Re-invoke the 3-judge panel on the MODIFIED codebase:
Before making the PIVOT/REFINE/PROCEED decision, run the executable convergence engine:
.productionos/CONVERGENCE-DATA.jsonbun run scripts/convergence.ts --file .productionos/CONVERGENCE-DATA.json{ decision, delta, velocity, focusDimensions }bun run scripts/convergence-dashboard.ts --file .productionos/CONVERGENCE-DATA.json to display progressDensity summarization: Before the PIVOT/REFINE/PROCEED decision, invoke density-summarizer to compress prior iteration findings (from CONVERGENCE-LOG.md and REFLEXION-LOG.md) into a Chain of Density summary. This summary becomes the context handoff for the next iteration, preventing context bloat across loops.
Convergence analysis: After running convergence.ts, invoke convergence-monitor to analyze the grade trajectory across all completed iterations. The monitor identifies stalling dimensions, detects oscillation patterns, and recommends specific focus dimensions for the next iteration. Feed its output into the decision-loop agent alongside the engine output.
Invoke the decision-loop agent with the convergence engine output and convergence-monitor recommendations:
When converged:
Document Sync (auto-detect drift): Before shipping, verify docs match code:
agents/ directory, compare to CLAUDE.md and README.md agent count claims.claude/commands/, compare to CLAUDE.md command list[ProductionOS] Doc sync: {N} drifts detected, {M} auto-fixedIf /document-release skill is available (gstack), invoke it for comprehensive doc sync. Otherwise, run the 6-point check above. (External dependency -- skip if unavailable, log SKIP)
Ship:
/ship — test → version → commit → push → PR (External dependency -- skip if unavailable, log SKIP).productionos/OMNI-REPORT.mdThe tri-tiered evaluation is ProductionOS's core innovation. It runs at TWO points:
This prevents two failure modes:
┌─────────────────┐
│ YOUR WORK │
└────────┬────────┘
│
┌──────────────┼──────────────┐
▼ ▼ ▼
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ JUDGE 1 │ │ JUDGE 2 │ │ JUDGE 3 │
│ Correctness │ │ Practicality│ │ Adversarial │
│ (Opus) │ │ (Sonnet) │ │ (Opus) │
│ "Is it │ │ "Can it be │ │ "How would │
│ right?" │ │ built?" │ │ I break it?"│
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
└───────────────┼───────────────┘
▼
┌─────────────────┐
│ CONSENSUS? │
│ Agree → Score │
│ Disagree → DEBATE│
└─────────────────┘
If a task comes out of scope during execution:
metaclaw-learner creates a new skill spec for the missing capabilityresearch-pipeline researches frameworks for the missing capabilityThis means ProductionOS grows its own toolset as it encounters new problem types.
| Resource | Per Loop | Total Session |
|---|---|---|
| Tokens | 800K | 5M |
| Agents | 21 max | 147 max |
| Web fetches | 200 | 1500 |
| Judge panels | 2 | 14 |
| Time | ~30 min | ~3.5 hours |
.productionos/
├── INTEL-RESEARCH.md # Deep research findings
├── INTEL-CONTEXT.md # Context package
├── REVIEW-CEO.md # CEO review (3 modes)
├── REVIEW-ENGINEERING.md # Engineering review (2 passes)
├── REVIEW-DESIGN.md # Design review (if applicable)
├── EVAL-CLEAR.md # CLEAR framework evaluation
├── JUDGE-PANEL-{N}.md # Tri-tiered judge results per loop
├── OMNI-PLAN.md # Prioritized execution plan
├── OMNI-LOG.md # Execution log
├── OMNI-REPORT.md # Final delivery report
├── REFLEXION-LOG.md # Cross-iteration learning
├── CONVERGENCE-LOG.md # Grade progression
└── DECISION-{N}.md # PIVOT/REFINE/PROCEED decisions