By romabeckman
Orchestrates autonomous TDD/DDD software engineering workflows with phase-isolated agents for adversarial QA, critical code review, and persistent project memory. Coordinates backend and frontend development, automated testing, documentation generation, and trace-driven skill optimization.
Senior Software Architect specialized in DDD, system design, technical refinement, and technical decision-making. Use for architecture decisions, scope refinement, design refinement, and technical quality gates.
Chief Technology Officer for the autonomous-orchestrator. Governs execution strategies, evaluates metrics, and triggers the autonomous loop.
Senior Backend Developer specialized in TDD, API design, database modeling, security, and performance. Use for writing backend code (APIs, services, workers), fixing server bugs, implementing business logic, and backend testing.
Systematic Investigation and Debugging Specialist. Uses the systematic-debugging skill and the "5 Whys" to identify the root cause of bugs before implementation.
Senior Frontend Developer specialized in TDD, UI/UX implementation, accessibility, and performance. Use for writing frontend code (React, Vue, CSS, HTML), fixing UI bugs, implementing designs, and frontend testing.
Autonomous Adversarial QA agent. Reads machine-readable specs and code to execute edge-case and security testing, returning a JSON verdict.
Sovereign loop manager. Handles file initialization, feature lifecycle tracking, and recursive TDD-Validation-Optimization cycles. Strictly delegates all technical tasks to sub-agents.
Harness performance evaluator. Reads all execution traces in docs/harness-history/traces/, computes composite scores per skill_chain, identifies the Pareto frontier of best harness configurations, and recommends the optimal chain for the next session. Run periodically or on demand to guide harness optimization.
Execution trace recorder. Captures what happened during a skill session and persists structured logs to docs/harness-history/traces/. Enables retrospective analysis and harness optimization via harness-evaluator and meta-harness.
Autonomous Meta-Harness proposer. Reads the full harness history filesystem, diagnoses failure patterns, proposes a targeted improvement, stores the candidate, and outputs a JSON decision.
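The harness-evaluator description above mentions a Pareto frontier of harness configurations. The listing does not say which metrics are compared or how traces are stored, so the following is only a minimal sketch of the general idea. The `ChainResult` type and the `quality` and `cost` metrics are hypothetical, and the results are passed in memory rather than read from docs/harness-history/traces/.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainResult:
    """One evaluated skill chain (hypothetical shape, not the plugin's trace format)."""
    skill_chain: str
    quality: float  # assumed metric: higher is better
    cost: float     # assumed metric: lower is better


def dominates(a: ChainResult, b: ChainResult) -> bool:
    """True if a is no worse than b on both metrics and strictly better on one."""
    no_worse = a.quality >= b.quality and a.cost <= b.cost
    strictly_better = a.quality > b.quality or a.cost < b.cost
    return no_worse and strictly_better


def pareto_frontier(results: list[ChainResult]) -> list[ChainResult]:
    """Keep every result that no other result dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


results = [
    ChainResult("chain-a", quality=0.9, cost=10.0),
    ChainResult("chain-b", quality=0.8, cost=5.0),
    ChainResult("chain-c", quality=0.7, cost=8.0),   # dominated by chain-b
    ChainResult("chain-d", quality=0.9, cost=12.0),  # dominated by chain-a
]
frontier = pareto_frontier(results)
```

Here `frontier` keeps chain-a and chain-b: neither beats the other on both metrics, so choosing between them is a trade-off rather than a clear win.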
Uses power tools
Uses Bash, Write, or Edit tools

Harness Engineering: A reliable AI agent is not just a raw model. It is defined as: $$\text{Reliable Agent} = \text{Model (AI)} + \text{Harness (Controls)} + \text{Human Auditor}$$
HarnessKit is a complete AI-assisted software engineering methodology built on Harness Engineering—the principle that true reliability comes from enclosing generative models inside structured execution scaffolds and human-driven governance loops.
At the heart of HarnessKit is the autonomous-orchestrator skill. Once provided with the initial task scope, it runs an atomic, continuous execution cycle without stopping, pausing, or asking redundant questions—fully automating domain planning, TDD execution, and multi-agent code reviews.
However, the human engineer is never replaced: your role evolves into Live Auditing.
While the orchestrator executes continuously, you act as the Human Auditor in the cockpit, tracking the live stream through your coding workspace (Claude Code, Cursor, OpenCode, Gemini, Copilot, etc.). The AI moves with sovereignty, but you maintain continuous telemetry and oversight.
Because the engine runs seamlessly without waiting for permissions at every step, you use this live observability to dynamically intercept the loop when necessary:
Interrupt the loop (Ctrl+C) the moment you notice the AI has adopted an incorrect architectural premise.
Adjust maxReworks directly inside the configuration files.
To prevent systemic risks (such as N+1 queries, memory leaks, security vulnerabilities, or database connection exhaustion), HarnessKit employs a Socratic Code Review model. The orchestrator invokes the-grumpy-tech-lead, which validates the code inferentially by asking deep architectural questions rather than providing copy-paste solutions.
Below is a visual example of how this interactive code review occurs under your watch:

The loop is driven by a robust Product State Machine. It tracks feature backlogs and dynamic transition gates, ensuring that a feature only progresses when quality criteria are fully satisfied.
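As a rough illustration of a gate-driven transition, the sketch below maps one gate evaluation to one of the four status names this section lists (COMPLETED, RETRY, BLOCKED, FAILED). The threshold, the rework limit, and the exact rules are assumptions made for the example. The source names maxReworks but gives no default value, and it does not state how scores are computed or when a feature is marked FAILED.

```python
from enum import Enum


class Status(Enum):
    COMPLETED = "COMPLETED"
    RETRY = "RETRY"
    BLOCKED = "BLOCKED"
    FAILED = "FAILED"


# Hypothetical values, chosen only for illustration.
PASS_THRESHOLD = 0.8
MAX_REWORKS = 3


def next_status(gate_score: float, critical_failure: bool, reworks_done: int) -> Status:
    """Map one gate evaluation to a feature status (illustrative rules only)."""
    if critical_failure:
        # Circuit breaker: halt and wait for a human.
        return Status.BLOCKED
    if gate_score >= PASS_THRESHOLD:
        return Status.COMPLETED
    if reworks_done < MAX_REWORKS:
        # Score fell short and rework budget remains: loop back to coding.
        return Status.RETRY
    # Assumed rule: once the rework budget is spent, log the debt and move on.
    return Status.FAILED
```

Checking the critical-failure case first reflects the circuit-breaker behavior described for BLOCKED: a crash halts the loop regardless of the score.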

Based on gate scores, the orchestrator moves each feature into one of four statuses:
COMPLETED: Approved and ready for your final PR review.
RETRY: Scores fell short; the engine compiles a REWORK-LOG.md and loops back to code automatically.
BLOCKED: Critical crash or break; the engine triggers a circuit breaker and halts for immediate human intervention.
FAILED: Non-blocking tech debt; the pipeline logs the issue and moves to the next feature, leaving the debt for you to audit later.

HarnessKit is distributed as a command-line plugin compatible with major AI developer ecosystems.
⚠️ IMPORTANT! This project requires the Superpowers skill. Install it before initializing HarnessKit:
/plugin install superpowers@claude-plugins-official
/plugin marketplace add romabeckman/harness-kit
/plugin install harness-kit@harness-kit
/harness-kit:project-memory --help
copilot plugin marketplace add romabeckman/harness-kit
copilot plugin install harness-kit@harness-kit
# Install the extension
agy plugin install https://github.com/romabeckman/harness-kit
To prevent role contamination, the orchestrator isolates operational contexts by dispatching highly specialized agent personas equipped with dedicated skills.
npx claudepluginhub romabeckman/harness-kit --plugin harness-kit